-
Bug
-
Resolution: Fixed
-
Blocker
-
1.9.9, 2.1.4, 2.2.1, 2.3
-
MOODLE_19_STABLE, MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE
-
MOODLE_21_STABLE, MOODLE_22_STABLE
-
wip-mdl-22896
-
Easy
-
Greetings. I believe I've found and fixed a bug in the html2text library.
In /lib/html2text.php:
---------------------------
478 // Remove unknown/unhandled entities (this cannot be done in search-and-replace block)
479 $text = preg_replace('/&[^&;]+;/i', '', $text);
---------------------------
That regular expression is too permissive. The character class [^&;] also matches spaces and line breaks, so the pattern matches any run of characters that starts with an ampersand and ends at the next semicolon, not just entities.
We've had numerous reports from users that huge chunks of forum posts are missing from the plain-text emails they receive by subscription.
The problem occurs when someone includes a bare ampersand in their text and a semicolon somewhere after it. Everything from the ampersand through that semicolon is filtered out.
Here's an example:
Gin & Tonic
- 2oz gin;
- 5oz tonic water;
- 5 cubes of ice;
- 1 lime wedge.
If you ran that through html2text, it would output this:
Gin
- 5oz tonic water;
- 5 cubes of ice;
- 1 lime wedge.
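For anyone who wants to reproduce this outside Moodle, here is a minimal standalone sketch. Only the preg_replace pattern is taken from html2text.php; the variable names are illustrative, and the rest of the library's processing is skipped on the assumption that it does not affect this step.

```php
<?php
// Reproduction sketch: the sample text is the recipe from the report.
$text = "Gin & Tonic\n- 2oz gin;\n- 5oz tonic water;\n- 5 cubes of ice;\n- 1 lime wedge.";

// Original pattern: [^&;]+ matches spaces and newlines too, so the match
// runs from the "&" all the way to the first ";" that follows it.
$stripped = preg_replace('/&[^&;]+;/i', '', $text);

// " Tonic" and the whole "- 2oz gin;" line disappear along with the "&".
echo $stripped, "\n";
```

The match covers "& Tonic", the line break, and "- 2oz gin;", which is why the first ingredient vanishes from the plain-text email.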
The simple fix I am testing now is this:
479 $text = preg_replace('/&[^&;\s]+;/i', '', $text);
Adding \s to the negated character class means a match can no longer span whitespace, so only compact tokens such as &foo; are removed.
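A quick way to compare the two patterns side by side is sketched below. The function strip_unknown_entities() and its parameter are hypothetical helpers made up for this comparison; they are not part of html2text.php.

```php
<?php
// Hypothetical helper for comparison only; not part of html2text.php.
function strip_unknown_entities($text, $stopAtWhitespace) {
    // Proposed pattern excludes whitespace; original pattern does not.
    $pattern = $stopAtWhitespace ? '/&[^&;\s]+;/i' : '/&[^&;]+;/i';
    return preg_replace($pattern, '', $text);
}

$samples = array(
    "Gin & Tonic\n- 2oz gin;",    // bare ampersand, semicolon on a later line
    "price &unknownentity; tag",  // compact entity-like token
    "R&D; budget",                // ampersand inside a word, then a semicolon
);

foreach ($samples as $sample) {
    echo "input:    ", var_export($sample, true), "\n";
    echo "original: ", var_export(strip_unknown_entities($sample, false), true), "\n";
    echo "proposed: ", var_export(strip_unknown_entities($sample, true), true), "\n\n";
}
```

With the proposed pattern the first sample comes back unchanged, while the entity-like token in the second is still removed. The third sample shows a remaining limitation: text such as "R&D;" has no whitespace between the ampersand and the semicolon, so both patterns strip "&D;" from it.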
Best regards,
-Garret