Improvement
Resolution: Fixed
Critical
3.4
MOODLE_34_STABLE
MOODLE_34_STABLE
MDL-59694_master
At the moment train() and predict() process all site contents without any limit. It would be good to limit the number of analysable elements that can be processed in a single train() or predict() execution.
We can limit by the number of analysable elements or by the time spent on each model. Limiting by the number of analysables may not always help, because what counts as an analysable is completely up to the model; the site itself can be a single analysable element.
I would opt for a "Time limit per model" and allow admins to configure the time through a new site setting; something similar, although more complex, has recently been integrated for search.
During training and prediction we process analysables one by one and build a dataset file for each of them; at the end we merge them all together and train / get predictions using the machine learning backend. Using the get_analysables method proposed in https://tracker.moodle.org/browse/MDL-59630?focusedCommentId=477070&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-477070 we could iterate through the available analysables in a private base class method and set up the timer there, checking the time spent since the start of get_analysable_data after each process_analysable call.
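The loop described above could look roughly like the following. This is only a language-neutral sketch in Python, not Moodle code: the function name process_with_time_limit, the time_limit_seconds parameter and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, List, Optional


def process_with_time_limit(
    analysables: Iterable[str],
    process_analysable: Callable[[str], str],
    time_limit_seconds: Optional[float],
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Process analysables one by one until the time budget is spent.

    The elapsed time is checked only after an analysable has been fully
    processed, so the limit is approximate: the last item may overrun it.
    Passing None as the limit disables the check entirely.
    """
    started = clock()
    dataset_files = []
    for analysable in analysables:
        # Hypothetical callback standing in for process_analysable();
        # here it returns the name of the dataset file it produced.
        dataset_files.append(process_analysable(analysable))
        if time_limit_seconds is not None and clock() - started >= time_limit_seconds:
            break
    # The caller would merge these files and pass them to the ML backend.
    return dataset_files
```

Analysables that were not reached would be picked up by a later train() or predict() run, assuming the analyser skips the ones it has already processed.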
Some extra comments:
- It is an approximate limit, not an exact one, because we would wait until the current analysable is fully processed and we would then train / get predictions, which also takes some time. This should be explained in the setting description.
- There is still one case where this time limit will not be that effective: prediction models that use the site as a single analysable element. I already commented in the official docs page (around https://docs.moodle.org/dev/Analytics_API#How_many_predictions_for_each_sample.3F) that models iterating through tons of samples at site level should be careful and pay attention to memory usage. I think that is enough.
- These time limits should not be applied to evaluation processes ($this->options['evaluation'] in the analyser), as we need the whole site dataset.
- This issue and MDL-59630 are related because they both need the new public get_analysables method. This new API method would also need an abstract get_analysables in the base class, implemented by the sitewide and by_course analysers. For the sitewide analyser it would return a one-item array containing the site; for by_course we would just rename get_courses to get_analysables, so people extending those analysers would not need to implement it.
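To make the last two comments more concrete, here is a rough Python sketch of the class layout, again not actual Moodle code. The class names, the effective_time_limit helper and the 3600-second default are hypothetical; only get_analysables and the 'evaluation' option come from this issue.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseAnalyser(ABC):
    def __init__(self, options: Optional[dict] = None, time_limit_seconds: float = 3600.0):
        # 3600.0 is an arbitrary placeholder for the proposed site setting.
        self.options = options or {}
        self._time_limit_seconds = time_limit_seconds

    @abstractmethod
    def get_analysables(self) -> List[str]:
        """Return the analysable elements this analyser iterates over."""

    def effective_time_limit(self) -> Optional[float]:
        # Evaluation needs the whole site dataset, so no limit applies.
        if self.options.get("evaluation"):
            return None
        return self._time_limit_seconds


class SitewideAnalyser(BaseAnalyser):
    def get_analysables(self) -> List[str]:
        # The site is the only analysable element.
        return ["site"]


class ByCourseAnalyser(BaseAnalyser):
    def __init__(self, courses: List[str], **kwargs):
        super().__init__(**kwargs)
        self._courses = list(courses)

    def get_analysables(self) -> List[str]:
        # Plays the role of the renamed get_courses method.
        return list(self._courses)
```

With this layout, analysers extending the sitewide or by_course classes inherit get_analysables and only custom analysers built directly on the base class would have to implement it.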
- is blocked by MDL-59630 Analytics tables and files cleanup (Closed)