Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-85637

Suboptimal SQL query on searching for the latest version of a question

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Minor Minor
    • None
    • 4.5.4, 5.0
    • Questions, Quiz

      We've hit an issue with an SQL query performing quite badly while trying to find the latest version of a question.

      In our case the issue happens in a course with less than 400 questions, and the question with most versions have around a 100 versions. In this course the question bank related pages take around 55 seconds to load.

      Tracking the issue we've found the issue is in this subquery used in several places:

       

      qv.version = (SELECT MAX(v.version)
                      FROM {question_versions} v
                      JOIN {question_bank_entries} be ON be.id = v.questionbankentryid
                     WHERE be.id = qbe.id
                       AND (...)
                   )
                                                         )

       

      We've done several tries to improve it's performance:

      • The first try was adding an index to the {question_versions} table's (questionbankentryid,version) columns. It improved the performance but was still very slow (55s --> 35s)
      • The second approach was trying to refactor the query itself; after changing it to something like this

       

      (qv.questionbankentryid,qv.version) IN (SELECT be.id, MAX(v.version)
                                                FROM m_question_bank_entries be 
                                                JOIN m_question_versions v ON be.id = v.questionbankentryid
                                               WHERE (...)
                                               GROUP BY be.id
                                             )

       

      the page load improves greatly (less than 1 second). This is the change applied to all the places we've found this kind of subquery:

      https://github.com/izendegi/moodle/commit/9051a85e7194bd0b9d8f464e08a72e5214797181

      • Then we realised that apparently using more than one column on IN clauses isn't standard SQL and it's not supported on SQL Server :-/.  We've thought about changing the approach (transforming the query to use a WITH clause, for instance), but that's harder to implement because the code isn't the same in all the places.
      • In some places this subquery is used only to count the number of questions of a category (for example here), in that case the query can be rebuilt to avoid that kind of subquery

      So, summarizing:

      • Adding an index to the {question_versions} table's (questionbankentryid,version) columns seems like a good idea, but doesn't solve the problem by itself.
      • Improving that subquery is needed, but I'm not sure about the best DBMS gnostic approach to do it.

            izendegi Iñigo Zendegi Urzelai
            izendegi Iñigo Zendegi Urzelai
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:

                Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.