Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-73675

Performance issue with generated SQL in an extra large course

XMLWordPrintable

    • 6

      A client with a large number of users (~760k students) in a single course is observing a significant performance issue with queries of the form:

      SELECT COUNT(eu.id) FROM (SELECT DISTINCT eu1_u.id
      FROM mdl_user eu1_u
      JOIN mdl_user_enrolments ej1_ue ON ej1_ue.userid = eu1_u.id
      JOIN mdl_enrol ej1_e ON (ej1_e.id = ej1_ue.enrolid AND ej1_e.courseid = '11149')
      JOIN (SELECT DISTINCT userid
      FROM mdl_role_assignments
      WHERE contextid IN (1,3,10094,10102,11955,1333860)
      AND roleid IN (5)
      ) ra ON ra.userid = eu1_u.id
      WHERE 1 = 1 AND ej1_ue.status = '0' AND ej1_e.status = '0' AND ej1_ue.timestart < 1638389800 AND (ej1_ue.timeend = 0 OR ej1_ue.timeend > 1638389800) AND eu1_u.deleted = 0 AND eu1_u.id <> '1' AND eu1_u.deleted = 0) eu JOIN mdl_user u ON u.id = eu.id WHERE u.firstname LIKE 'E%' ESCAPE '\\' AND u.lastname LIKE 'W%' ESCAPE '\\'
      

       

      • There are a number of variations of this query taking a long time to complete; the variations mostly appear contained to different values for `WHERE u.firstname LIKE ... AND u.lastname LIKE...`
      • These kinds of queries take a long time to complete, with logged times between 15 minutes and 3+ hours.
      • Due to the long-running nature of this kind of query, the client is experiencing performance issues on their DB server(s) - these queries generally hit reader nodes, as expected.
      • roleid IN (5) refers to the Student role
      • contextid IN (1,3,10094,10102,11955,1333860) can be traced back to a single course (1333860 is the course's context; 11955, 10102, 10094, 1 are the course's containing course categories)
      • It appears these kind of queries are triggered by visiting the Course Completion and Activity Completion reports, with the queries themselves almost certainly originating from lib/completionlib.php::get_num_tracked_users()

      Notes from a DBA working for the client:

      • looking at explain plan, mdl_role_assignments subquery causing most of the load, the DISTINCT clause causing unnecessary deduplication, user list is dereplicated again later in top select, can get rid of find JOIN to mdl_user, not needed for counts
      • can't create new index, index already exist for role, contextid for mdl_role_assignments table
      • the mdl_role_assignments subquery is forced to range scan table, which will net [almost everyone in the system] from the given context values.

      More context from the server logs

      1. Query_time: 849.897175  Lock_time: 0.000195 Rows_sent: 1  Rows_examined: 4273909

      Actions

      For the reports on the following two pages:

      • /report/completion/index.php?course=2
      • /report/progress/index.php?course=2

      We can optimise the generated SQL by:

      • When there are no selected filters only make the call to the database to get the total count once. It’s only needed to get the count twice when filters are applied. This will remove one complete query in a lot of situations.
      • Remove the unnecessary DISTINCT call from the JOIN subquery.
      • Move rearrange the filter conditions, the idea here is that the conditional will fail faster.

      For example:

       

      SELECT COUNT(eu.id) FROM (
      SELECT DISTINCT eu1_u.id
                  FROM m_user eu1_u
                  JOIN m_user_enrolments ej1_ue ON ej1_ue.userid = eu1_u.id
           JOIN m_enrol ej1_e ON (ej1_e.id = ej1_ue.enrolid AND ej1_e.courseid = 2)
           JOIN (
      SELECT DISTINCT userid
                              FROM m_role_assignments
                              WHERE contextid IN (1,3,11)  AND roleid IN (5)
                   ) ra ON ra.userid = eu1_u.id
                   WHERE 1 = 1 AND ej1_ue.status = 0 AND ej1_e.status = 0
                   AND ej1_ue.timestart < 1693525700 
                   AND (ej1_ue.timeend = 0 OR ej1_ue.timeend > 1693525700
                   AND eu1_u.deleted = 0 AND eu1_u.id <> 1 AND eu1_u.deleted = 0
      ) eu 
      JOIN m_user u ON u.id = eu.id;
       
      

      Becomes:

      SELECT COUNT(eu.id)
      FROM (
      SELECT DISTINCT eu1_u.id
      FROM m_user eu1_u
      JOIN m_user_enrolments ej1_ue ON ej1_ue.userid = eu1_u.id
      JOIN m_enrol ej1_e ON (ej1_e.id = ej1_ue.enrolid AND ej1_e.courseid = 2)
      JOIN (
           SELECT userid
           FROM m_role_assignments
           WHERE contextid IN (1,3,11) AND roleid IN (5)
      ) ra ON ra.userid = eu1_u.id
      WHERE ej1_ue.status = 0 AND ej1_e.status = 0 AND eu1_u.deleted = 0
      AND ej1_ue.timestart < 1693525700 AND (ej1_ue.timeend = 0 OR ej1_ue.timeend > 1693525700)
      AND eu1_u.id <> 1
      ) eu
      JOIN m_user u ON u.id = eu.id;
      

      Additionally adding the following index proves some further performance improvement.

      CREATE INDEX idx_m_enrol_status ON m_enrol(status);

      Link to test course backup: https://drive.google.com/file/d/1I_txoP5VK40J5DUMxKZUkLocMahyaw6n/view?usp=sharing

            Unassigned Unassigned
            matt.rice Matt Rice
            Votes:
            4 Vote for this issue
            Watchers:
            14 Start watching this issue

              Created:
              Updated:

                Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.