Moodle / MDL-82511

cache_cron_task causes a failover on redis/sentinel clusters

    • MOODLE_404_STABLE, MOODLE_500_STABLE
    • MDL-82511-cache-cron-MOODLE_405_STABLE
    • MDL-82511-cache-cron-MOODLE_500_STABLE
    • MDL-82511-cache-cron
    • Testing instructions

      Instructions modified from MDL-75864

      Setup

      • Have a Redis instance installed. If you don't have one set up, you can create a new instance with Docker:

        docker run --name redis -p 6379:6379 -d redis:alpine

      • Place the attached createbulksessioncache.php file in the Moodle root directory.
      • Add around one million items to a cache by requesting createbulksessioncache.php with curl 200 times. In practice there will be more users with fewer items each, but this should replicate the same conditions for testing:

        # Send 200 requests in parallel, then wait for all of them to finish.
        for i in {1..200}; do
            curl -s https://moodle.localhost/createbulksessioncache.php &
        done
        wait

      1. Testing the Redis lib changes.

      1. Navigate to Site Admin > Plugins > Caching > Configuration (/cache/admin.php)
      2. Click on "Add instance" in the "Redis" row to create a new Redis instance.
      3. Enter the name "Redis store" and the Redis server from the setup (this may be 127.0.0.1).
      4. Click save.
      5. On the Cache configuration page, click the "Edit Mappings" link at the bottom of the page.
      6. Set the Session mapping to the "Redis store" option.
      7. Save changes
      8. Navigate to Site Admin > Plugins > Caching > Cache usage (/cache/usage.php)
      9. Confirm there are around 1 million items in core/calendar_categories
      10. Now test with a lower connection timeout. In the constructor in cache/stores/redis/lib.php, before $this->redis is set (line 222), hardcode connectiontimeout to a fraction of a second to override the config. This can't be done from the UI, because it only accepts integers.

        $this->connectiontimeout = 0.2; // Temporary test override; remove after testing.

      11. Refresh the usage page and confirm that it loads. Without the patch, it fails with "read error on connection to redis:6379".

      2. Testing the task changes.

      1. Remove the hardcoded connectiontimeout added to cache/stores/redis/lib.php in the previous steps. A short timeout is also relevant to the task, but the deletion step in this test will exceed the reduced value. In practice, this many items would not be deleted at the same time.
      2. Navigate to Site Admin > Server > Session handling (admin/settings.php?section=sessionhandling)
      3. Set sessiontimeout to 2 minutes and sessiontimeoutwarning to something lower (e.g. 1 minute).
      4. Save changes
      5. Ensure at least 2 minutes have passed since the curl requests from the setup finished.
      6. From the command line, run the scheduled task:

        php admin/cli/scheduled_task.php --execute='\core\task\cache_cron_task'

      7. Confirm the task output includes trace messages like:

        Removed 1000400 old core/calendar_categories sessions from the 'Redis store' cache store

      8. Confirm the task completes quickly: under 10 seconds with the patch, compared with over 2 minutes without it.
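      The task step above removes about a million cached entries in one run. The following is a rough sketch only and is not the code in the MDL-82511 patch. It assumes the entries are fields of a single Redis hash, as the HKEYS-based find_by_prefix implementation implies. Under that assumption, a client can keep each command short by sending HDEL in fixed-size batches. The sketch is written in Python against a redis-py-style `hdel(name, *keys)` method rather than Moodle's PHP store. `delete_fields_in_batches`, `chunks` and the default batch size of 1000 are hypothetical choices for illustration.

```python
from typing import Iterable, Iterator, List, Protocol


class HashDeleter(Protocol):
    """Assumed client shape; redis-py's Redis.hdel is compatible."""

    def hdel(self, name: str, *keys: str) -> int:
        ...


def chunks(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # Emit the final, possibly shorter, batch.
        yield batch


def delete_fields_in_batches(client: HashDeleter, hash_name: str,
                             fields: Iterable[str], batch_size: int = 1000) -> int:
    """Hypothetical helper: delete `fields` from `hash_name`, one HDEL per batch.

    Returns the number of fields Redis reports as actually removed; fields
    that no longer exist are skipped by Redis and not counted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    removed = 0
    for batch in chunks(fields, batch_size):
        removed += client.hdel(hash_name, *batch)
    return removed
```

      Smaller batches mean more round trips but shorter individual commands. Between batches, Redis can keep answering other clients, such as the Sentinel and HAProxy health checks.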
    • Code verified against automated checks.

      Checked MDL-82511 using repository: https://github.com/bwalkerl/moodle

      • MOODLE_405_STABLE (0 errors / 0 warnings) [branch: MDL-82511-cache-cron-MOODLE_405_STABLE | CI Job]
      • MOODLE_500_STABLE (0 errors / 0 warnings) [branch: MDL-82511-cache-cron-MOODLE_500_STABLE | CI Job]
      • main (0 errors / 0 warnings) [branch: MDL-82511-cache-cron | CI Job]

      Built on: Tue May 13 02:08:31 UTC 2025
    • Launching automatic jobs for branch MDL-82511-cache-cron-MOODLE_405_STABLE
        • PHPUnit (sqlsrv): https://ci.moodle.org/view/Testing/job/DEV.02%20-%20Developer-requested%20PHPUnit/19293/
        • Behat (NonJS - boost and classic): https://ci.moodle.org/view/Testing/job/DEV.01%20-%20Developer-requested%20Behat/65921/
        • Behat (Firefox - boost): https://ci.moodle.org/view/Testing/job/DEV.01%20-%20Developer-requested%20Behat/65922/
      Launching automatic jobs for branch MDL-82511-cache-cron-MOODLE_500_STABLE
        • PHPUnit (sqlsrv): https://ci.moodle.org/view/Testing/job/DEV.02%20-%20Developer-requested%20PHPUnit/19294/
        • Behat (NonJS - boost and classic): https://ci.moodle.org/view/Testing/job/DEV.01%20-%20Developer-requested%20Behat/65923/
        • Behat (Firefox - boost): https://ci.moodle.org/view/Testing/job/DEV.01%20-%20Developer-requested%20Behat/65924/
      Launching automatic jobs for branch MDL-82511-cache-cron
        • PHPUnit (sqlsrv): https://ci.moodle.org/view/Testing/job/DEV.02%20-%20Developer-requested%20PHPUnit/19295/
        • Behat (NonJS - boost and classic): https://ci.moodle.org/view/Testing/job/DEV.01%20-%20Developer-requested%20Behat/65925/
        • Behat (Firefox - boost): https://ci.moodle.org/view/Testing/job/DEV.01%20-%20Developer-requested%20Behat/65926/
        • Behat (Firefox - classic): https://ci.moodle.org/view/Testing/job/DEV.01%20-%20Developer-requested%20Behat/65927/
        • App tests (stable app version): https://ci.moodle.org/view/Testing/job/DEV.01%20-%20Developer-requested%20Behat/65928/
      Built on: Tue May 13 02:25:42 UTC 2025

      I'm running a 3-node Redis + Sentinel HA setup, which is used as a cache store for a fairly large Moodle installation (~30k users). Each web server reaches the Redis master node through a local HAProxy instance, which probes for the master node every second.

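      When cache_cron_task runs, the way the Redis cache store is implemented causes the master Redis server to fail over. The sub-optimal implementation takes a very long time to run. While that long-running command executes, Redis cannot respond to other clients, including the health checks from Sentinel and HAProxy.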

      The issue was first reported to Redis and can be seen here: https://github.com/redis/redis/issues/12968

      They pinpointed the issue in the find_by_prefix implementation (https://github.com/moodle/moodle/blob/main/cache/stores/redis/lib.php#L675) and recommended using HSCAN instead of HKEYS.
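      Below is a minimal sketch of the HSCAN approach recommended above. It is written in Python against a redis-py-style `hscan(name, cursor, match, count)` method rather than Moodle's PHP store. `find_by_prefix_hscan`, `escape_glob` and the default batch size are hypothetical names and values. A single HKEYS call returns every field at once. This sketch instead walks the hash with a cursor and asks Redis to filter each batch with a MATCH pattern.

```python
from typing import Dict, Optional, Protocol, Set, Tuple

# Characters with special meaning in Redis glob-style MATCH patterns.
_GLOB_SPECIAL = set("*?[]\\")


class HashScanner(Protocol):
    """Assumed client shape; redis-py's Redis.hscan is compatible."""

    def hscan(self, name: str, cursor: int = 0, match: Optional[str] = None,
              count: Optional[int] = None) -> Tuple[int, Dict[str, str]]:
        ...


def escape_glob(text: str) -> str:
    """Backslash-escape glob metacharacters so `text` is matched literally."""
    return "".join("\\" + ch if ch in _GLOB_SPECIAL else ch for ch in text)


def find_by_prefix_hscan(client: HashScanner, hash_name: str, prefix: str,
                         batch_size: int = 1000) -> Set[str]:
    """Hypothetical HSCAN-based prefix search (not Moodle's find_by_prefix).

    Each HSCAN call examines roughly `batch_size` fields, so Redis can serve
    other clients between calls instead of blocking on one large HKEYS reply.
    """
    pattern = escape_glob(prefix) + "*"
    found: Set[str] = set()  # HSCAN may return the same field more than once.
    cursor = 0
    while True:
        cursor, batch = client.hscan(hash_name, cursor=cursor, match=pattern,
                                     count=batch_size)
        found.update(batch.keys())
        if cursor == 0:  # Redis returns cursor 0 when the iteration is complete.
            break
    return found
```

      COUNT is only a hint, and HSCAN may return a field more than once, which is why results are collected in a set. With redis-py and decode_responses=False, field names come back as bytes rather than str.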

            Benjamin Walker (benjaminwalker)
            olivierbeytrison
            Brendan Heywood
            Votes: 2
            Watchers: 3
