MDL-82171

Fix subtle siteidentifier race condition when alternative_cache_factory_class is used

    • MOODLE_401_STABLE
    • MOODLE_403_STABLE, MOODLE_404_STABLE
    • MDL-82171-muc-sited-MOODLE_403_STABLE
    • MDL-82171-muc-sited-MOODLE_404_STABLE
    • MDL-82171-muc-sited

      1) Terminal A, in the sitedata directory run this and confirm the existence and age of the bootstrap file:

      /var/lib/sitedata# ls -la --time-style=full-iso localcache/bootstrap.php 
      -rw-rw-rw- 1 root root 376 2024-06-13 09:49:48.448769945 +1000 localcache/bootstrap.php

      2) In terminal B purge the caches

      php admin/cli/purge_caches.php

      3) In terminal A confirm that the shared bootstrap file is instantly created and confirm that the timestamp is updated

      /var/lib/sitedata# ls -la --time-style=full-iso cache/bootstrap.php 
      -rw-rw-rw- 1 root root 376 2024-06-13 09:50:58.231449322 +1000 cache/bootstrap.php

      4) In terminal A confirm that the local bootstrap file is instantly created and confirm that the timestamp is updated

      /var/lib/sitedata# ls -la --time-style=full-iso localcache/bootstrap.php 
      -rw-rw-rw- 1 root root 376 2024-06-13 09:50:58.231449322 +1000 localcache/bootstrap.php

      5) Simulate a new front end spawning which doesn't have a local cache yet by deleting the local cache

      /var/lib/sitedata# rm -rf localcache/
      

      6) Do anything which triggers a bootstrap

      php admin/cli/cfg.php --name=version

      7) Confirm that the local bootstrap file is instantly created with no errors and confirm that the timestamp is current

      /var/lib/sitedata# ls -la --time-style=full-iso localcache/bootstrap.php
      -rw-rw-rw- 1 root root 376 2024-06-13 09:50:58.231449322 +1000 localcache/bootstrap.php

      8) Simulate a cache purge on one front end propagating to the other local caches. Reset the lastpurged time to a long time ago and then confirm it took effect

      # touch -d 2020-01-01 localcache/.lastpurged
      # ls -la --time-style=full-iso localcache/.lastpurged
      -rw-rw-rw- 1 root root 0 2020-01-01 00:00:00.000000000 +1100 localcache/.lastpurged

      9) Run MDL-82171-localcache.php, which attempts to use the localcache, detects it is stale and purges it:

      php MDL-82171-localcache.php

      10) Confirm that the local bootstrap file is instantly created with no errors and confirm that the timestamp is current

      /var/lib/sitedata# ls -la --time-style=full-iso localcache/bootstrap.php
      -rw-rw-rw- 1 root root 376 2024-06-13 09:50:58.231449322 +1000 localcache/bootstrap.php
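
The "confirm that the timestamp is current" checks in the steps above are done by eye. If you would rather script them, here is a minimal sketch. The helper names `file_mtime` and `refreshed_by` are made up for this example and are not part of Moodle, and it assumes GNU coreutils `stat` (the `-c` option).

```shell
# Hypothetical helper: print FILE's modification time as epoch seconds.
# Assumes GNU coreutils stat (the -c option), as on typical Linux hosts.
file_mtime() {
    stat -c %Y "$1"
}

# Hypothetical helper: run a command, then succeed only if FILE exists and
# its mtime is no older than the moment the command was started.
# Usage: refreshed_by FILE COMMAND [ARGS...]
refreshed_by() {
    file=$1
    shift
    start=$(date +%s)
    # Return 2 if the command itself fails, so that case can be told apart
    # from "command ran fine but the file was not refreshed" (return 1).
    "$@" >/dev/null 2>&1 || return 2
    [ -f "$file" ] && [ "$(file_mtime "$file")" -ge "$start" ]
}
```

For example, from the sitedata directory you could run `refreshed_by localcache/bootstrap.php php /path/to/moodle/admin/cli/cfg.php --name=version && echo refreshed`, where `/path/to/moodle` is a placeholder for your own Moodle root. The comparison only has one-second resolution, so a file written earlier in the same second would also pass.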


      We've found a race condition when using alternative_cache_factory_class. It occurs on the first read or write of a cache after a purge and is caused by a chicken-and-egg problem with siteidentifier. Honestly, it's pretty surprising we haven't hit it before now. It's very hard, and likely impossible, to unit test because you would be testing the bootstrap itself.

      When Moodle bootstraps, the following happens in order:

      1. it loads the DML library
      2. it loads the cache library
      3. it opens the DB connection
      4. it reads localcache/bootstrap.php, which stores the siteidentifier, if the file exists; it does not exist after a cache purge
      5. it runs initialise_cfg to load config, including $CFG->siteidentifier, from the database
      6. the DML layer uses MUC to cache metadata (in Postgres but not MySQL)
      7. MUC needs the siteidentifier in order to configure the cache

      So we have a chicken-and-egg situation in steps 5-7: the DB needs the cache in order to work, the cache needs the site id, and the site id comes from the DB. The default implementation in core works around this by using an interim value of 'unknown' for the site id:

      https://github.com/moodle/moodle/blob/master/cache/classes/config.php#L147

      This doesn't actually change anything, though: core temporarily uses that interim siteid and then replaces it later with the real one.

      However, when you set alternative_cache_factory_class, we don't seem to have a convenient way to do the same thing (I might be wrong on this).

      So I'm proposing two solutions:

      a) when we bootstrap, if we don't have the siteid yet (step 4), then we simply disable the cache factory, so the first request warms the bootstrap.php file

      b) maybe there is a better solution that lets us warm in a single request: disable MUC temporarily, just until that first DB call to mdl_config happens
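
To make option (a) concrete, here is a toy shell model of the decision. It is not Moodle code. `simulate_request`, the printed labels and the placeholder file contents are all invented for illustration. The only thing taken from the description above is the rule: no localcache/bootstrap.php means no siteid yet, so that request runs with the cache disabled and leaves a bootstrap.php behind for the next one.

```shell
# Toy model of option (a); not Moodle code.
# $1 is a sitedata directory. Prints which cache setup this "request" would use.
simulate_request() {
    sitedata=$1
    bootstrap="$sitedata/localcache/bootstrap.php"
    if [ -f "$bootstrap" ]; then
        # siteid is already available from bootstrap.php, so the
        # alternative cache factory can be configured straight away.
        echo "alternative factory"
    else
        # No siteid yet: run this request without the cache, then warm
        # bootstrap.php so that later requests take the branch above.
        echo "cache disabled"
        mkdir -p "$sitedata/localcache"
        # Placeholder contents only; real Moodle stores the siteidentifier here.
        printf '%s\n' '<?php // placeholder, not real Moodle output' > "$bootstrap"
    fi
}
```

A quick run against a scratch directory shows the two-request warm-up:

```shell
tmp=$(mktemp -d)
simulate_request "$tmp"   # prints "cache disabled"
simulate_request "$tmp"   # prints "alternative factory"
rm -rf "$tmp"
```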

       

        1. English idioms based on Destination book.docx
          31 kB
          Юрий М
        2. MDL-82171-localcache.php
          0.1 kB
          Brendan Heywood
        3. MDL-82171-test.jpg
          436 kB
          Mikel Martín Corrales

            Brendan Heywood (brendanheywood)
            Brendan Heywood (brendanheywood)
            Matthew Hilton
            Ilya Tregubov
            Mikel Martín Corrales
            Votes: 0
            Watchers: 6


                Estimated: 0m
                Remaining: 0m
                Logged: 1h 19m
