Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache misses due to path-based versioning and its normalization #330

Open
mzabaluev opened this issue May 26, 2020 · 15 comments
Open

Cache misses due to path-based versioning and its normalization #330

mzabaluev opened this issue May 26, 2020 · 15 comments
Assignees

Comments

@mzabaluev
Copy link

mzabaluev commented May 26, 2020

This run using actions/cache@v2 shows how a cache created by the update-deps job is not hit by the jobs in the test matrix that depend on update-deps and use the same key. This works in actions/cache@v1, so it looks like a regression to me.

What's more peculiar, each of the three test jobs reported to have successfully stored its cache state under this same key.
In the next run, all jobs hit the cache, but which state did they get?

@dhadka
Copy link
Member

dhadka commented May 26, 2020

Hey @mzabaluev, thanks for spotting this. Can you please enable step debugging, change the key, and re-run the workflow? This will provide some more output so we can see what's going on.

@mzabaluev
Copy link
Author

mzabaluev commented May 27, 2020

@dhadka Here you are

I see that the 'Post Cache cargo registry index' step in the producer job gets to write its cache first:

Resource Url: https://artifactcache.actions.githubusercontent.com/SwZ48ggH2fmptrBol7q4o6DwD74REZ2KvEjuDiE8Jb9dgsMc8M/_apis/artifactcache/caches/58

Then the 'Post Cache cargo registry' step allocates at the same resource:

Resource Url: https://artifactcache.actions.githubusercontent.com/SwZ48ggH2fmptrBol7q4o6DwD74REZ2KvEjuDiE8Jb9dgsMc8M/_apis/artifactcache/caches/59

There is also intra-action state CACHE_KEY which is set by the caching steps prior to the above, but in the opposite order:

2020-05-27T02:11:21.4628234Z ##[debug]Save intra-action state CACHE_KEY = cargo-registry-debug1-39f2b8d68761c35ef5f7317c495a2e84f0aed5d34a5d724b2144d58bf6ed66b3
...
2020-05-27T02:11:22.0139963Z ##[debug]Save intra-action state CACHE_KEY = cargo-index-debug1-5d9c7f87b512b22e0543a050c81e91b650b44aff

A consumer job queries and fails with:

Resource Url: https://artifactcache.actions.githubusercontent.com/SwZ48ggH2fmptrBol7q4o6DwD74REZ2KvEjuDiE8Jb9dgsMc8M/_apis/artifactcache/cache?keys=cargo-registry-debug1-39f2b8d68761c35ef5f7317c495a2e84f0aed5d34a5d724b2144d58bf6ed66b3&version=55646160c9f2cc2a37adf24b94db1ec7546d85f0e81c7dde1c978bdca39a013e

Then in the 'Post Cache cargo registry' step, it saves a new value:

Resource Url: https://artifactcache.actions.githubusercontent.com/SwZ48ggH2fmptrBol7q4o6DwD74REZ2KvEjuDiE8Jb9dgsMc8M/_apis/artifactcache/caches/60

@dhadka
Copy link
Member

dhadka commented May 27, 2020

@mzabaluev I think I see what's going on. @v2 introduced a version field, which is used to detect when the configuration of the cache action changes. This includes what kind of compression it's using (gzip vs zstd) and the path. From what I see, update-deps is using the path

  with:
    path: ~/.cargo/registry
          !~/.cargo/registry/src

whereas the test step is using:

  with:
    path: ~/.cargo/registry

If you make the path identical in both, you should see it behave as expected.

This was intended to prevent a user from changing the path but using the same key, since the existing cache entry (if any) wouldn't have the right content. However, in your particular case where you're just excluding a path, it's less helpful.

@mzabaluev
Copy link
Author

mzabaluev commented May 27, 2020

@dhadka

If you make the path identical in both, you should see it behave as expected.

On Linux and Mac it does, on Windows it doesn't: https://github.com/mzabaluev/github-actions-playground/actions/runs/116588403

I'm guessing that CRLF line endings screw up the version hash.
Consider making the normalization function invariant to whitespace separating the paths, including any line ending characters. The path parameter really could help accepting a list, but I gather that actions/toolkit#184 is preventing that.

@mzabaluev
Copy link
Author

mzabaluev commented May 27, 2020

For more fun with cache key normalization across OSes, see actions/runner#498

@mzabaluev mzabaluev changed the title Cache created by one job not passed to dependent jobs in the same workflow Cache misses due to path-based versioning and its normalization May 27, 2020
aragon999 added a commit to aragon999/gpt that referenced this issue May 29, 2020
aragon999 added a commit to aragon999/gpt that referenced this issue May 29, 2020
@mzabaluev
Copy link
Author

mzabaluev commented Jun 2, 2020

Any updates on this?

I'm trying to decide which actions/cache version is a less buggy fit for our needs, between this and v1 somehow failing to hit the primary key and going for a truncated key it has somehow stored at some point, despite not being given any restore-keys to try prefix searching.

@dhadka
Copy link
Member

dhadka commented Jun 2, 2020

@mzabaluev I don't think this is related to newlines, since both Windows and Linux are reading from the same file, so the line endings wouldn't be changing. Your issue about new lines and hashFiles is valid, however, since the newlines are being changed by Git. Edit: I realized that this is a bit more involved...hopefully this makes sense. You're using the checkout action to clone the Git repo, which will alter the newlines depending on how Git is configured. hashFiles operates on these checked out files, so it would be affected by newlines. However, Actions isn't using the .github/workflows/*.yml file checked out on the VM, it reads it using a different mechanism. So as long as Actions isn't changing newlines when reading that .yml file, it shouldn't be an issue. I put together a PR to fix this (see my next comment), but I still believe your issue is caused by the gzip/zstd bug I mentioned below.

I suspect this is the result of #301. There's an issue with Windows bsd tar and zstd causing it to hang, so it falls back to gzip compression.

@dhadka
Copy link
Member

dhadka commented Jun 2, 2020

@mzabaluev I've created this PR to trim any leading and trailing whitespace from the path and restore keys, which would also remove any leftover Windows newline characters (\r) after the split - #343.

@mzabaluev
Copy link
Author

mzabaluev commented Jun 2, 2020

@dhadka No, the issue reproduced in that test run is not about hashFiles – the hashed file is actually passed as an artefact from a Linux job fixed up specificaly for the line endings, so it's exactly the same on Windows. This is also seen in the logs, as the cache key is exactly the same, but the version query string parameter ends up being different.

@dhadka
Copy link
Member

dhadka commented Jun 2, 2020

@mzabaluev Agreed. I meant to say that #343 will fix any newline differences in the path which would cause a different version hash, but you'll likely still see Windows creating a different cache because it uses a different compression method.

The version hash includes the compression method, but we fall back to using Gzip instead of Zstd on Windows due to bug #301.

@mzabaluev
Copy link
Author

mzabaluev commented Jun 2, 2020

The version hash includes the compression method, but we fall back to using Gzip instead of Zstd on Windows due to bug #301.

I hope in the end state it would be possible to share cache state no matter which runner created it, with the restore action being able to adapt to a possibly different archival format; or maybe this and other tar-related issues will be solved by settling on a single flavor of tar that works across all OSes. The disparate cache entries for Windows and Unix would increase our cache pressure twofold and duplicate the dependency fetching work for Windows jobs without a good reason for it.

@github-actions
Copy link

github-actions bot commented Dec 23, 2021

This issue is stale because it has been open for 365 days with no activity. Leave a comment to avoid closing this issue in 5 days.

@github-actions github-actions bot added the stale label Dec 23, 2021
@mzabaluev
Copy link
Author

mzabaluev commented Dec 23, 2021

Yes, we still need tar in the cache action to be fixed.

@pdotl
Copy link
Contributor

pdotl commented Sep 7, 2022

👋 @mzabaluev we are currently looking into this. For now you could try this approach if it works for you.

@y-nk
Copy link

y-nk commented Nov 9, 2022

I created this repo to illustrate the issue i'm having which is related to this bug.

https://github.com/y-nk/cache-in-container/actions/runs/3427866818

The problem being that an action building outside of a container has a certain path, and so that path cannot be restored once inside a container (bc the context of the container is different)

@Phantsure Phantsure assigned pdotl and Phantsure and unassigned aparna-ravindra Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants