What we want to predict (the target) is something that has not happened yet. To train a model we need past data for which the target can already be calculated. Indicators are different: at prediction time we have to calculate them before the target is known. This may sound stupid, but it is not trivial.
Examples:
- We want to predict whether students will be engaged in a course based on its settings and activities. I can look at finished courses to calculate the target (were these courses' students engaged in the course?) and the indicators (settings and activities). I can predict this straight after the course setup is ready; I don't need to look at the course activity.
- We want to predict students at risk of dropping out based on student engagement indicators. I can look at finished courses to calculate the target (which students dropped out of courses) and the indicators (were those students engaged in the course?). I cannot predict which students will drop out of the course until they generate activity, because student engagement indicators are based on student activity.
So the difference lies in the indicators that are used. In the examples above, some indicators depend on activity logs. If we abstract this dependency, we can say that they depend on time: they need time to pass before they can be calculated and, in some cases, like students at risk of dropping out, they need to be calculated at different points in time.
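To make the distinction concrete, here is a minimal sketch of the two example models. Everything in it is hypothetical and only for illustration: the `Indicator` class, the `needs_elapsed_time` flag and the `can_predict_at_setup` helper are assumptions, not part of any existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """Hypothetical indicator description (assumption, not a real API)."""
    name: str
    # True when time has to pass in the course before the indicator
    # can be calculated (e.g. it depends on student activity).
    needs_elapsed_time: bool


def can_predict_at_setup(indicators):
    """A model can predict right after course setup only if none of its
    indicators needs time to pass."""
    return not any(i.needs_elapsed_time for i in indicators)


# Indicators from the two examples above.
course_settings = Indicator("course settings", needs_elapsed_time=False)
course_activities = Indicator("course activities", needs_elapsed_time=False)
student_engagement = Indicator("student engagement", needs_elapsed_time=True)

engagement_model = [course_settings, course_activities]
dropout_model = [student_engagement]

print(can_predict_at_setup(engagement_model))  # True
print(can_predict_at_setup(dropout_model))     # False
```

The flag is deliberately about time and not about logs, so an indicator that reads something other than logs can still be marked as time dependent.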
It is important not to tie the dependency to activity logs, because the real dependency is on things that happen along the course, which are not necessarily linked to activity logs. E.g. imagine that we want to predict which students will complete a course (course completion) before they reach half of the course duration. The number of students that have already completed the course may be an indicator, and it can be extracted directly from the course completion API without having to read logs (they have a much more limited API).
What I am proposing is to add the ability to limit which indicators a model can use, so that the model can simply use a time splitting method like 'no time splitting'.
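One possible reading of this proposal, as a rough sketch: a model is restricted to a set of indicators, and 'no time splitting' is only accepted when none of them needs time to pass. The names `Indicator`, `needs_elapsed_time`, `allowed_indicators` and `supports_no_time_splitting` are hypothetical and only illustrate the idea; they do not describe an existing implementation.

```python
from dataclasses import dataclass

NO_TIME_SPLITTING = "no time splitting"


@dataclass(frozen=True)
class Indicator:
    """Hypothetical indicator description (assumption, not a real API)."""
    name: str
    needs_elapsed_time: bool  # True if time must pass before it can be calculated


def allowed_indicators(candidates, time_splitting):
    """Limit the indicators a model may use for a given time splitting method.

    With 'no time splitting' only indicators that are available straight
    away are kept; any other method keeps all candidates.
    """
    if time_splitting == NO_TIME_SPLITTING:
        return [i for i in candidates if not i.needs_elapsed_time]
    return list(candidates)


def supports_no_time_splitting(model_indicators):
    """True if every indicator of the model is available without waiting."""
    return all(not i.needs_elapsed_time for i in model_indicators)


candidates = [
    Indicator("course settings", needs_elapsed_time=False),
    Indicator("course activities", needs_elapsed_time=False),
    Indicator("student engagement", needs_elapsed_time=True),
]

limited = allowed_indicators(candidates, NO_TIME_SPLITTING)
print([i.name for i in limited])            # ['course settings', 'course activities']
print(supports_no_time_splitting(limited))  # True
```

Filtering when the model is defined keeps the restriction in one place: prediction code never has to check whether an indicator is available yet.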