Amazon Web Services offers a deployment service called OpsWorks.
When you deploy PHP applications via the default OpsWorks Chef script, each node ends up with a layout similar to this:
Node 1 -> /srv/www/moodle/20131231001123/
Node 2 -> /srv/www/moodle/20131231001127/
These folder names are based on the time of deployment to that server, which differs by a few seconds from server to server because they are not deployed at the same instant.
These real folders are then symlinked to current.
All Nodes -> /srv/www/moodle/current/
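The layout above can be reproduced locally to see why the "same" path resolves differently on each node. This is only a minimal sketch: the deploy() helper and the temporary node1/node2 directories are hypothetical stand-ins for two servers, not the OpsWorks Chef script itself.

```python
import os
import tempfile


def deploy(node_root, release, app="moodle"):
    """Create a timestamped release directory and point 'current' at it.

    Hypothetical helper that mimics the layout described above.
    """
    app_dir = os.path.join(node_root, "srv", "www", app)
    release_dir = os.path.join(app_dir, release)
    os.makedirs(release_dir)
    current = os.path.join(app_dir, "current")
    os.symlink(release_dir, current)
    return current


# Two temporary directories stand in for the two servers.
base = tempfile.mkdtemp()
node1 = deploy(os.path.join(base, "node1"), "20131231001123")
node2 = deploy(os.path.join(base, "node2"), "20131231001127")

# The symlink name is identical, but the resolved real folder is not.
real1 = os.path.realpath(node1)
real2 = os.path.realpath(node2)
print(os.path.basename(node1), "->", os.path.basename(real1))
print(os.path.basename(node2), "->", os.path.basename(real2))
```

Anything that records the resolved path rather than the symlink path will therefore store a different value on each node.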
Moodle's plugin cache uses the real folder name and does not expect it to differ between frontend nodes. Technically, though, nothing on the server side requires different nodes to be exactly the same, as the configuration above shows.
The crashes that happen look like this:
Dec 24 11:06:11 www3 logger: [Tue Dec 24 11:06:11 2013] [error] [client 182.255.102.140] Default exception handler: Coding error detected, it must be fixed by a programmer:
Request for an unknown renderer class block_settings_renderer
Debug: \n
Error code: codingerror\n
* line 290 of /lib/outputfactories.php: coding_exception thrown\n
* line 1402 of /lib/outputlib.php: call to theme_overridden_renderer_factory->get_renderer()\n
* line 771 of /lib/pagelib.php: call to theme_config->get_renderer()\n
* line 141 of /blocks/settings/block_settings.php: call to moodle_page->get_renderer()\n
* line 292 of /blocks/moodleblock.class.php: call to block_settings->get_content()\n
* line 238 of /blocks/moodleblock.class.php: call to block_base->formatted_contents()\n
* line 951 of /lib/blocklib.php: call to block_base->get_content_for_output()\n
* line 1003 of /lib/blocklib.php: call to block_manager->create_block_contents()\n
* line 353 of /lib/blocklib.php: call to block_manager->ensure_content_created()\n
* line 31 of /theme/vet/layout/header.php: call to b
These crashes only happen intermittently: either when a user changes nodes, or when a race condition is hit with multiple accesses from different nodes. I cannot prove which it is, but the race condition appears to be the cause, because the frequency at which we see the error does not match what we would expect if every request were failing.
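One way to picture the suspected race is a toy model in which a cache shared by all frontend nodes stores absolute paths. This is not Moodle's code and does not prove the diagnosis; the Node class, the shared_cache dictionary, and the renderer file path are all illustrative assumptions.

```python
# Toy model only: a cache shared by all frontend nodes that stores
# absolute paths. None of these names come from Moodle itself.
shared_cache = {}


class Node:
    def __init__(self, name, release_dir):
        self.name = name
        self.release_dir = release_dir
        # Paths that "exist" on this node (illustrative file name).
        self.files = {release_dir + "/blocks/settings/renderer.php"}

    def locate_renderer(self):
        # Whichever node asks first fills the shared cache with ITS real path.
        path = shared_cache.setdefault(
            "block_settings_renderer",
            self.release_dir + "/blocks/settings/renderer.php",
        )
        # False when the cached path belongs to another node's release folder.
        return path in self.files


node1 = Node("node1", "/srv/www/moodle/20131231001123")
node2 = Node("node2", "/srv/www/moodle/20131231001127")

ok1 = node1.locate_renderer()  # node1 populates the cache and finds its file
ok2 = node2.locate_renderer()  # node2 reads node1's path, which it does not have
print(ok1, ok2)
```

In this model, which node fails depends only on which one populated the cache first, which would be consistent with errors that appear intermittently rather than on every request.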