Type: Improvement
Resolution: Fixed
Priority: Minor
Affects Version/s: 3.10.7, 4.0
Affected Branches: MOODLE_310_STABLE, MOODLE_400_STABLE
Fixed Branches: MOODLE_400_STABLE
Pull Master Branch: MDL-72837-master
In order to support the way modinfo works across multi-layer caches, we need an API that allows versioned data in the cache. Currently the modinfo cache is only partly versioned: the system checks the 'cacherev' number from the course table, but the cache API itself is not aware of the versioning.
Why the current situation causes a bug
It is possible (and, for performance, a good idea on a large system) to configure the modinfo cache with two levels, i.e. a local cache plus a shared cache. Modinfo is normally loaded from the local cache; if it is not available locally, it is loaded from the shared cache.
The problem arises when the modinfo is available locally but out of date compared with the shared version. In that case the cache is rebuilt, even though it has already been built and the shared version is current. Where a course cache rebuild takes a long time (e.g. 10 seconds), requests to the course are frequent, and there are many server instances (e.g. 20), some requests after a cache rebuild can take up to 200 seconds (roughly rebuild time × number of server/container instances).
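A minimal Python sketch of the current behaviour, under a simplified model: `TwoLevelCache`, `build_modinfo` and `get_modinfo` are hypothetical names (not Moodle code, which is PHP), and both cache layers are plain dicts. The cache layer knows nothing about versions; the caller compares the cached `cacherev` with the course's, so a stale local copy triggers a rebuild even when the shared cache already holds current data.

```python
from typing import Any, Optional


class TwoLevelCache:
    """Hypothetical model of a local cache stacked on a shared cache.

    Not version-aware: get() returns whatever the first layer holding the
    key contains, even if the shared layer has newer data.
    """

    def __init__(self, shared: dict) -> None:
        self.local: dict = {}
        self.shared = shared  # the same dict object is shared by all containers

    def get(self, key: str) -> Optional[Any]:
        if key in self.local:
            return self.local[key]  # a local hit wins, current or not
        if key in self.shared:
            self.local[key] = self.shared[key]  # copy into local on a shared hit
            return self.shared[key]
        return None

    def set(self, key: str, value: Any) -> None:
        self.local[key] = value
        self.shared[key] = value


rebuilds = 0


def build_modinfo(cacherev: int) -> dict:
    """Hypothetical stand-in for the expensive course cache rebuild."""
    global rebuilds
    rebuilds += 1
    return {"cacherev": cacherev}


def get_modinfo(cache: TwoLevelCache, key: str, course_cacherev: int) -> dict:
    """Caller-side versioning: compare the cached cacherev with the course's."""
    data = cache.get(key)
    if data is None or data["cacherev"] != course_cacherev:
        data = build_modinfo(course_cacherev)
        cache.set(key, data)
    return data


shared: dict = {}
c2, c3 = TwoLevelCache(shared), TwoLevelCache(shared)
get_modinfo(c2, "course:42", 1)  # first build (V1), stored in C2 local + shared
shared.clear()                   # course edited: shared copy purged, cacherev is now 2
get_modinfo(c3, "course:42", 2)  # C3 builds V2 and stores it in shared
get_modinfo(c2, "course:42", 2)  # C2's local V1 hides the shared V2: rebuilt again
print(rebuilds)                  # 3 (only 2 builds were actually needed)
```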
In our system the front-end server times out after 60 seconds, so lots of people briefly see error pages every time somebody edits a course.
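A rough worst-case estimate using the example figures above, assuming a rebuild time of N ≈ 10 s and k ≈ 20 instances that each rebuild once in turn behind the lock:

```latex
T_{\text{worst}} \approx N \times k = 10\,\text{s} \times 20 = 200\,\text{s} \gg 60\,\text{s}
```

Under the same assumptions, any request queued behind more than about 60 s / 10 s = 6 rebuilds exceeds the front-end timeout.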
To explain how this works, consider a situation with three containers (C1, C2 and C3). Each has a local cache in addition to the global shared cache. Initially, all the containers have the current version (V1) of the modinfo cache for a particular course.
- A user causes the cache to be cleared; their request is handled by C1. This clears the modinfo cache for the course in C1's local cache and in the shared cache.
- A user requests the course; the request is handled by C3. C3 gets the lock for building the course cache and starts building it, which takes a while (N seconds).
- During this time two more requests come in, handled by C1 and C2. Both also try to get the lock to build the cache, because the cached data is missing (C1) or out of date (C2). They wait for the lock.
- C3 finishes building modinfo cache V2, saves it to its local cache and to the shared cache, then releases the lock. C3's request was answered in about N seconds.
- C1 now gets the lock. It retries the cache request (this retry is supposed to stop this sort of duplicated rebuild). There is nothing in C1's local cache, so it requests the data from the shared cache, which already has V2. C1's request is also answered in about N seconds; no problem.
- C2 now gets the lock. It retries the cache request, which uses C2's local cache, which still has V1. As a result, it decides to build the modinfo cache, which takes another N seconds. This request takes about 2N seconds to answer because it effectively waited for a full build twice: once for C3's build while queued on the database lock, and once for its own.
I'll attach a diagram which may or may not help:
With more than three containers, several of them can be in the same position as C2. They can all be waiting for the database lock, so the cache can be rebuilt as many times as there are containers.
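The walkthrough above can be replayed with a small Python model. This is a sketch under stated assumptions, not Moodle code: `simulate` is a hypothetical helper, every request arrives at t = 0, the build lock is granted in FIFO order, cache reads take no time, and a rebuild takes N = 10 s. Passing `fixed=True` models the behaviour the issue proposes (on a stale local copy, read the shared cache before rebuilding).

```python
N = 10.0  # illustrative rebuild time in seconds


def simulate(order, local, shared, current, fixed=False):
    """Replay requests that all queue on one build lock at t=0.

    order:   containers in the (assumed FIFO) order they obtain the lock
    local:   container -> version in its local cache (None = missing)
    shared:  version in the shared cache (None = missing)
    current: version the course currently requires (its cacherev)
    fixed:   if True, a stale local copy falls back to the shared cache
    Returns container -> seconds until that request was answered.
    """
    clock = 0.0
    answered = {}
    for c in order:
        v = local.get(c)
        # Missing locally (or stale, once fixed): read the shared layer instead.
        if v is None or (fixed and v < current):
            v = shared
            local[c] = v
        if v != current:
            # Still missing or out of date: rebuild while holding the lock.
            clock += N
            shared = current
            local[c] = current
        answered[c] = clock
    return answered


# Three-container scenario: C1 purged its local copy and the shared copy,
# C2 and C3 still hold V1 locally, and V2 is now required.
print(simulate(["C3", "C1", "C2"], {"C1": None, "C2": 1, "C3": 1}, None, 2))
# {'C3': 10.0, 'C1': 10.0, 'C2': 20.0}
print(simulate(["C3", "C1", "C2"], {"C1": None, "C2": 1, "C3": 1}, None, 2, fixed=True))
# {'C3': 10.0, 'C1': 10.0, 'C2': 10.0}

# Twenty containers that all hold a stale local copy.
many = {f"C{i}": 1 for i in range(1, 21)}
print(max(simulate(list(many), dict(many), None, 2).values()))              # 200.0
print(max(simulate(list(many), dict(many), None, 2, fixed=True).values()))  # 10.0
```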
The solution is that, instead of finding outdated data in the local cache and then deciding to rebuild the course cache, the system should get the current data from the shared cache. Achieving this requires an API change.
New API
The new API allows versioned data to be stored in a cache by using the set_versioned() and get_versioned() functions instead of set() and get().
These functions take an integer version number. The get function returns a cached value if the cache holds something with either the requested version or a higher version. (The result is a cache_version_wrapper object, so you can find out the actual version returned if you need it.) It automatically handles retrieving the value from higher-level caches if necessary.
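A Python model of the versioned get/set semantics described above, for illustration only: the real Moodle API is PHP and its exact signatures are not reproduced here. `VersionedTwoLevelCache` and `VersionWrapper` are hypothetical names (the latter stands in for cache_version_wrapper), and the layers are plain dicts ordered from local to shared. A get accepts any stored version at or above the required one and refreshes closer layers that were missing or stale.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class VersionWrapper:
    """Hypothetical stand-in for cache_version_wrapper: data plus its version."""
    version: int
    data: Any


class VersionedTwoLevelCache:
    """Model of versioned get/set over layers ordered local -> shared."""

    def __init__(self, *layers: dict) -> None:
        self.layers = layers

    def set_versioned(self, key: str, version: int, data: Any) -> None:
        wrapped = VersionWrapper(version, data)
        for layer in self.layers:
            layer[key] = wrapped

    def get_versioned(self, key: str, required: int) -> Optional[VersionWrapper]:
        for i, layer in enumerate(self.layers):
            hit = layer.get(key)
            if hit is not None and hit.version >= required:
                # Refresh the closer layers that were missing or too old.
                for closer in self.layers[:i]:
                    closer[key] = hit
                return hit
        return None  # no layer is new enough: the caller must rebuild


shared: dict = {}
c2_local: dict = {}
c2 = VersionedTwoLevelCache(c2_local, shared)
c3 = VersionedTwoLevelCache({}, shared)

c2.set_versioned("course:42", 1, "modinfo V1")
c3.set_versioned("course:42", 2, "modinfo V2")  # C3 rebuilt after an edit
hit = c2.get_versioned("course:42", 2)           # C2's local V1 is too old...
print(hit.version, hit.data)                     # 2 modinfo V2 (...so read from shared)
print(c2.get_versioned("course:42", 3))          # None: nothing new enough, rebuild
```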
After a lot of consideration (and unit tests), I think this system is robust and suitable for modinfo. It fixes the problem described above.
Note: In addition to this new API, there are other ways to safely store data in a 'localisable' (multi-layer) cache that suit other situations, for example when the version identifier can't be represented as a monotonically increasing integer. The main one is to incorporate a version identifier into the cache key. In that case, unless we are certain there will be a strictly limited number of versions between cache clears, TTL should be enabled for the cache, or it could grow without bound.
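A minimal sketch of the key-versioning alternative, assuming a simple in-memory store: `TTLCache` and `versioned_key` are hypothetical names, and the clock is injectable so expiry can be shown deterministically. Because each version gets its own key, a reader that asks for the current version never receives a stale entry; old versions are simply never read again, which is why TTL is needed to stop them piling up.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical cache where each entry expires ttl seconds after being set."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self.entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self.entries[key] = (self.clock() + self.ttl, value)

    def get(self, key: str) -> Optional[Any]:
        item = self.entries.get(key)
        if item is None:
            return None
        expires, value = item
        if self.clock() >= expires:
            del self.entries[key]  # lazily drop an expired entry
            return None
        return value

    def purge_expired(self) -> None:
        now = self.clock()
        for key in [k for k, (exp, _) in self.entries.items() if now >= exp]:
            del self.entries[key]


def versioned_key(base: str, version: str) -> str:
    """Fold a (not necessarily numeric) version identifier into the key."""
    return f"{base}@{version}"


now = [0.0]  # fake clock for the example
cache = TTLCache(ttl=60, clock=lambda: now[0])
cache.set(versioned_key("config", "a1b2"), "config A")
now[0] = 30
cache.set(versioned_key("config", "c3d4"), "config B")  # new version, new key
print(cache.get(versioned_key("config", "a1b2")))       # config A (old key still present)
now[0] = 70
cache.purge_expired()
print(sorted(cache.entries))                            # ['config@c3d4']
```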
blocks:
- MDL-72991 Regression from partial cache rebuild (Closed)
caused a regression:
- MDL-78466 A static cache value returning an empty array will be fetched from the real cache (Closed)
- MDL-74020 unit test failures in master branch (Closed)
- MDL-74032 One-shot coding error happening on first request to site (versioned-caches) (Closed)
has a non-specific relationship to:
- MDL-67020 The coursemodinfo cache item doesn't scale when localized due to global locking (Closed)
- MDL-55231 Partial course cache rebuild (Closed)
is duplicated by:
- MDL-68456 Create cache localization helper (Closed)
will help resolve:
- MDL-73382 Localize htmlpurifier cache using value versions instead of key versions (Open)