There are a few steps involved in selecting up to 6 contributing factors:
- Variable importance
We use Random Forest, a popular machine learning method, to rank segments from highest to lowest impact on changes in a measure. This narrows the candidates down to segments that have higher predictive power and are more likely to significantly impact the measure.
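A minimal Python sketch of impurity-based variable importance, assuming each row is a dict of categorical segments plus a numeric measure. It is a deliberately simplified forest, not Anna's implementation. Each tree makes a single multiway split on a bootstrap sample, choosing the best segment from a random subset of segments. The chosen segment is credited with the drop in squared error, which is the mean-decrease-in-impurity idea. The helper names and the `n_trees`, `max_features` and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def variance_reduction(rows, segment, measure):
    """Drop in squared error when rows are split by every level of `segment`."""
    ys = [r[measure] for r in rows]
    centre = mean(ys)
    total_sse = sum((y - centre) ** 2 for y in ys)
    groups = defaultdict(list)
    for r in rows:
        groups[r[segment]].append(r[measure])
    within_sse = 0.0
    for g in groups.values():
        g_mean = mean(g)
        within_sse += sum((y - g_mean) ** 2 for y in g)
    return total_sse - within_sse


def forest_importance(rows, segments, measure, n_trees=200, max_features=None, seed=0):
    """Rank segments by mean impurity decrease over a toy forest of one-split trees.

    Hypothetical helper: real Random Forests grow deeper trees; one split per
    tree keeps the example short.
    """
    rng = random.Random(seed)
    k = max_features or max(1, len(segments) // 3)  # p/3 is a common regression default
    totals = dict.fromkeys(segments, 0.0)
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # bootstrap sample, with replacement
        candidates = rng.sample(segments, k)        # random subset of segments per tree
        gains = {s: variance_reduction(sample, s, measure) for s in candidates}
        best = max(gains, key=gains.get)            # the split this tree would make
        totals[best] += gains[best]                 # credit its impurity decrease
    return sorted(((s, t / n_trees) for s, t in totals.items()),
                  key=lambda pair: pair[1], reverse=True)
```

With sales driven mostly by department, `forest_importance(rows, ["department", "region", "day"], "sales")` ranks department first. In practice a library forest, such as scikit-learn's `RandomForestRegressor` with its `feature_importances_` attribute on one-hot encoded segments, is the more likely choice.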
We use Cramér's V to measure the association between every pair of segments. For example, if you wish to understand what impacts sales for grocery (a value in the department column), this method helps us find other segments that are strongly associated with department.
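A minimal standard-library sketch of Cramér's V, V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the chi-squared statistic of the contingency table between two segments with r and c levels over n rows. `cramers_v` works on any pair of segments; `associated_segments` applies it to one target segment such as department. Its name and the 0.5 threshold are hypothetical, since the text does not say what counts as a high association.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equal-length categorical sequences (0 = none, 1 = perfect)."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    r, c = len(x_counts), len(y_counts)
    if n == 0 or min(r, c) < 2:
        return 0.0  # undefined when either segment has a single level
    observed = Counter(zip(xs, ys))
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # count expected if the segments were independent
            chi2 += (observed[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(r, c) - 1)))


def associated_segments(rows, target, segments, threshold=0.5):
    """Other segments whose Cramér's V with `target` meets a hypothetical threshold."""
    target_values = [r[target] for r in rows]
    scores = {s: cramers_v(target_values, [r[s] for r in rows])
              for s in segments if s != target}
    return sorted((s for s, v in scores.items() if v >= threshold),
                  key=lambda s: scores[s], reverse=True)
```

If each department maps to exactly one aisle, department and aisle score 1.0, while a day column spread evenly across departments scores 0.0.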
- Remove the segment already in question
When you ask about factors that impact sales for grocery (a value in the department column), we do not drill down by department again.
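A minimal sketch of this rule, assuming the question's filters are held as a dict such as `{"department": "grocery"}`. The function name is hypothetical.

```python
def exclude_question_segments(candidates, filters):
    """Drop segments that the question already filters on."""
    return [s for s in candidates if s not in filters]


print(exclude_question_segments(["department", "region", "day"], {"department": "grocery"}))
# ['region', 'day']
```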
- Very high number of levels
We do not drill down by segments that have more than 5000 levels.
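A minimal sketch of the level cap, assuming rows are dicts keyed by segment name. The text does not say whether levels are counted over the whole dataset or only the filtered rows, so this version simply counts distinct values in whatever rows it is given. The function name is hypothetical.

```python
MAX_LEVELS = 5000  # threshold stated in the text


def drop_high_cardinality(rows, candidates, max_levels=MAX_LEVELS):
    """Keep only segments with at most `max_levels` distinct values in `rows`."""
    return [s for s in candidates if len({r[s] for r in rows}) <= max_levels]
```

A segment such as a customer ID, with a distinct value on almost every row, is removed, while department is kept.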
- Single result from the drill down
We found that Anna produces more relevant results when it accounts for hierarchical relationships between segments. For example, if we wish to understand the drivers for people who live in Sydney, we do not break down by state, because that breakdown would always return a single result: NSW.
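The text does not say how Anna detects these hierarchical relationships. One simple approximation, assumed here, is to drop any segment that takes only one value among the rows matching the question's filters. The function name and the sample store names are hypothetical.

```python
def drop_single_valued(rows, filters, candidates):
    """Drop segments that take only one value among rows matching `filters`.

    Approximation: a segment above the filtered one in a hierarchy (state above
    city) collapses to a single value, so breaking down by it adds nothing.
    """
    subset = [r for r in rows if all(r[k] == v for k, v in filters.items())]
    return [s for s in candidates if len({r[s] for r in subset}) > 1]


rows = [
    {"city": "Sydney", "state": "NSW", "store": "Bondi"},
    {"city": "Sydney", "state": "NSW", "store": "Parramatta"},
    {"city": "Melbourne", "state": "VIC", "store": "Carlton"},
]
print(drop_single_valued(rows, {"city": "Sydney"}, ["state", "store"]))  # ['store']
```

Because the check runs on the data rather than on a declared hierarchy, it also catches segments that happen to have one value in the current slice, which may or may not match Anna's behaviour.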