Moodle / MDL-22896

Bad regular expression in html2text library causes text to go missing from forum emails


    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: 2.1.5, 2.2.2
    • Affects Version/s: 1.9.9, 2.1.4, 2.2.1, 2.3
    • Component/s: Libraries
    • Affected Branches: MOODLE_19_STABLE, MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE
    • Fixed Branches: MOODLE_21_STABLE, MOODLE_22_STABLE
    • Branch: wip-mdl-22896
    • Difficulty: Easy

      Testing instructions:

      Note: To test this, email must be working for forum posts.

      1. Set "Email format" to plain text in your profile.
      2. Add a forum post with the following text, with "Mail now" checked:

        Gin & Tonic
        - 2oz gin;
        - 5oz tonic water;
        - 5 cubes of ice;
        - 1 lime wedge.

      3. After one minute, run cron (/admin/cron.php).
      4. Check the email you receive and make sure no text is lost.

      Greetings. I believe I've found and fixed a bug in the html2text library.

      In /lib/html2text.php, lines 478-479:
      ---------------------------
      // Remove unknown/unhandled entities (this cannot be done in search-and-replace block)
      $text = preg_replace('/&[^&;]+;/i', '', $text);
      ---------------------------

      That regular expression is too permissive: it matches any run of characters that starts with an ampersand and ends at the next semicolon, including spaces and line breaks, as long as no other ampersand or semicolon comes in between.

      We've had numerous reports from users that huge chunks of forum posts are missing from the plain-text emails they receive by subscription.

      The problem occurs when someone includes an ampersand in their text and there is also a semicolon somewhere after it. Everything from the ampersand up to and including that semicolon is filtered out.

      Here's an example...

      Gin & Tonic

      • 2oz gin;
      • 5oz tonic water;
      • 5 cubes of ice;
      • 1 lime wedge.

      If you ran that through html2text, it would output this:

      Gin

      • 5oz tonic water;
      • 5 cubes of ice;
      • 1 lime wedge.
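      The behaviour above can be reproduced outside Moodle with a few lines of standalone PHP. This is a minimal sketch that assumes only the regular expression from line 479 matters; the rest of html2text's processing is left out, and the helper name strip_unknown_entities_old is hypothetical.

```php
<?php
// Hypothetical helper: applies ONLY the original entity-stripping
// regular expression quoted from /lib/html2text.php, nothing else.
function strip_unknown_entities_old(string $text): string
{
    // [^&;]+ accepts spaces and newlines, so "& Tonic\n- 2oz gin;"
    // is treated as one "entity" and deleted in a single match.
    return preg_replace('/&[^&;]+;/i', '', $text);
}

// The forum post from the report, one ingredient per line.
$post = "Gin & Tonic\n"
      . "- 2oz gin;\n"
      . "- 5oz tonic water;\n"
      . "- 5 cubes of ice;\n"
      . "- 1 lime wedge.";

// The title loses " Tonic" and the whole first ingredient line.
echo strip_unknown_entities_old($post), "\n";
```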

      The simple fix I am testing now is to change line 479 to:
      ---------------------------
      $text = preg_replace('/&[^&;\s]+;/i', '', $text);
      ---------------------------

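      The added \s excludes whitespace from the character class, so a match stops at the first space or line break; an ampersand followed by a space, as in "Gin & Tonic", is no longer treated as the start of an entity.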
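      A standalone check of the proposed pattern might look like the sketch below. It again isolates the single preg_replace call rather than running the full html2text conversion, and strip_unknown_entities_fixed is a hypothetical name used only for illustration.

```php
<?php
// Hypothetical helper: the proposed replacement for line 479, in isolation.
function strip_unknown_entities_fixed(string $text): string
{
    // \s is excluded from the character class, so the text between
    // "&" and ";" must be a single run of non-whitespace characters.
    return preg_replace('/&[^&;\s]+;/i', '', $text);
}

$post = "Gin & Tonic\n- 2oz gin;\n- 5oz tonic water;\n- 5 cubes of ice;\n- 1 lime wedge.";

// "&" is followed by a space here, so the recipe comes through intact.
echo strip_unknown_entities_fixed($post), "\n";

// An entity-like token with no whitespace inside is still stripped.
echo strip_unknown_entities_fixed("Unknown &bogus; entity"), "\n";
```

      The narrower class is still a heuristic: a token with no whitespace between "&" and ";", such as "AT&T;", would still be reduced to "AT".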

      Best regards,
      -Garret

        Attachments (all uploaded by Troy Williams):
        1. Help Sessions and general admin.20100827140840.txt.withoutfix (3 kB)
        2. Help Sessions and general admin.20100827140938.txt.withfix (3 kB)
        3. Help Sessions and general admin.html (4 kB)

            People:
            Rajesh Taneja
            Garret Gengler (Inactive)
            Gerard Caulfield
            Aparup Banerjee
            Ankit Agarwal
            Votes: 9
            Watchers: 9

