Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-77149

core_files: Network filesystem (Amazon EFS) can cause warnings

XMLWordPrintable

    • MOODLE_400_STABLE
    • MOODLE_400_STABLE, MOODLE_401_STABLE
    • MDL-77149-m401
    • MDL-77149-master
    • Hide

      I and another developer have done testing on one our test systems with this patch applied and it fixes the problem for us.

      Beyond that there is no easy way to test this actual problem because we don't really want test instructions to depend on an Amazon EFS file system (with whatever specific configuration, who knows) and other than that there isn't an easy way to fake file_exists to return true when the file doesn't exist...

      The modified functions, including the failure (invalid file) states being tweaked here, are thoroughly tested in unit tests (both in file_system_filedir_test and file_storage_test) so I don't think manual testing is required.

      Show
      I and another developer have done testing on one our test systems with this patch applied and it fixes the problem for us. Beyond that there is no easy way to test this actual problem because we don't really want test instructions to depend on an Amazon EFS file system (with whatever specific configuration, who knows) and other than that there isn't an easy way to fake file_exists to return true when the file doesn't exist... The modified functions, including the failure (invalid file) states being tweaked here, are thoroughly tested in unit tests (both in file_system_filedir_test and file_storage_test) so I don't think manual testing is required.

      We have a plugin which will regularly create and delete the same file in response to user actions (essentially, we create the file when they start editing, delete it when they stop; it contains metadata about the thing being edited, so if you edit the same thing and don't change the metadata it will have the same content).

      To summarise:

      • Web request 1: deletes file
      • Web request 2 (e.g. 10 seconds later): creates identical file

      This causes errors in file_system_filedir::add_file_from_string(), code as follows:

              if (file_exists($hashfile)) {
                  if (filesize($hashfile) === $filesize) {
                      return array($contenthash, $filesize, false);
                  }
                  if (file_storage::hash_from_path($hashfile) === $contenthash) {
                      // Jackpot! We have a hash collision.
                      mkdir("$this->filedir/jackpot/", $this->dirpermissions, true);
                      copy($hashfile, "$this->filedir/jackpot/{$contenthash}_1");
                      file_put_contents("$this->filedir/jackpot/{$contenthash}_2", $content);
                      throw new file_pool_content_exception($contenthash);
                  }
                  debugging("Replacing invalid content file $contenthash");
                  unlink($hashfile);
                  $newfile = false;
              }
      

      The filesize(), hash_from_path(), and unlink() functions all show PHP warnings indicating that the file doesn't exist (which is indeed the expected current state), but clearly this can only happen if file_exists($hashfile) returns true when it doesn't exist.

      Note this is not a PHP stat cache issue, because the file was already deleted in a previous separate web request.

      I am unable to reproduce this on my desktop, and the problem also goes away if I use a separate script to show me the contents of the relevant directory and confirm that the file doesn't exist before I run the second script, even if I do it really quickly,

      So my conclusion is that this is related to the network file system in use (Amazon EFS). Due to caching at the file system layer, perhaps if the file is deleted on one container, it is possible that another container will return 'true' for the file_exists check. But the later stat check returns false.

      Obviously it is unavoidable to have a certain chance of failures in this kind of case, but we could stop this happening so often in this particular case by specifically checking for an error in the filesize() call and, if so, redoing the exists check. It doesn't seem to me like this would do any harm, so I'm going to investigate it.

            quen Sam Marshall
            quen Sam Marshall
            Katie Ransom Katie Ransom
            Andrew Lyons Andrew Lyons
            CiBoT CiBoT
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved:

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 3 hours
                3h

                  Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.