-
Bug
-
Resolution: Fixed
-
Minor
-
4.0.6
We have a plugin which will regularly create and delete the same file in response to user actions (essentially, we create the file when they start editing, delete it when they stop; it contains metadata about the thing being edited, so if you edit the same thing and don't change the metadata it will have the same content).
To summarise:
- Web request 1: deletes file
- Web request 2 (e.g. 10 seconds later): creates identical file
This causes errors in file_system_filedir::add_file_from_string(), code as follows:
if (file_exists($hashfile)) {
|
if (filesize($hashfile) === $filesize) {
|
return array($contenthash, $filesize, false);
|
}
|
if (file_storage::hash_from_path($hashfile) === $contenthash) {
|
// Jackpot! We have a hash collision.
|
mkdir("$this->filedir/jackpot/", $this->dirpermissions, true);
|
copy($hashfile, "$this->filedir/jackpot/{$contenthash}_1");
|
file_put_contents("$this->filedir/jackpot/{$contenthash}_2", $content);
|
throw new file_pool_content_exception($contenthash);
|
}
|
debugging("Replacing invalid content file $contenthash");
|
unlink($hashfile);
|
$newfile = false;
|
}
|
The filesize(), hash_from_path(), and unlink() functions all show PHP warnings indicating that the file doesn't exist (which is indeed the expected current state), but clearly this can only happen if file_exists($hashfile) returns true when it doesn't exist.
Note this is not a PHP stat cache issue, because the file was already deleted in a previous separate web request.
I am unable to reproduce this on my desktop, and the problem also goes away if I use a separate script to show me the contents of the relevant directory and confirm that the file doesn't exist before I run the second script, even if I do it really quickly,
So my conclusion is that this is related to the network file system in use (Amazon EFS). Due to caching at the file system layer, perhaps if the file is deleted on one container, it is possible that another container will return 'true' for the file_exists check. But the later stat check returns false.
Obviously it is unavoidable to have a certain chance of failures in this kind of case, but we could stop this happening so often in this particular case by specifically checking for an error in the filesize() call and, if so, redoing the exists check. It doesn't seem to me like this would do any harm, so I'm going to investigate it.