Data Quality Checklist
- There is a table that describes the label of your predictive task, which can be used to create positive and negative training examples.
- There is enough historical data to learn seasonal patterns, typically 1-2 years, though some tasks need more.
- The data fits within Kumo's recommended data size limits.
- Any PII or sensitive data can be hashed, obfuscated, or removed, at your discretion.
- There is no target leakage, for example from columns that are mutated or updated over time. If your data changes over time, each row should carry a timestamp (date and time) recording when that information was first known.
- The dataset contains all of the signals that you expect to be predictive from a business perspective.
- The tables can all be linked together with primary/foreign keys.
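
A few of the items above can be spot-checked before loading data. The following is a minimal sketch in plain Python, not a Kumo API: the tables (`users`, `orders`), the column names, the salt, and the helper functions are all hypothetical and exist only for illustration. It measures how much history a table covers, finds foreign-key values with no matching primary key, and replaces a PII column with a salted hash.

```python
import hashlib
from datetime import datetime


def hash_pii(value: str, salt: str) -> str:
    """Replace a sensitive value with a salted SHA-256 hex digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def history_span_days(rows, time_col):
    """Days between the earliest and latest timestamp in a table."""
    times = [row[time_col] for row in rows]
    return (max(times) - min(times)).days


def orphan_foreign_keys(child_rows, fk_col, parent_rows, pk_col):
    """Foreign-key values in the child table that match no primary key."""
    parent_keys = {row[pk_col] for row in parent_rows}
    child_keys = {row[fk_col] for row in child_rows}
    return sorted(child_keys - parent_keys)


# Hypothetical toy tables; real data would come from your own sources.
users = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": "b@example.com"},
]
orders = [
    {"order_id": 10, "user_id": 1, "created_at": datetime(2022, 1, 1)},
    {"order_id": 11, "user_id": 2, "created_at": datetime(2023, 6, 1)},
    {"order_id": 12, "user_id": 3, "created_at": datetime(2024, 1, 1)},
]

# How much history is available for learning seasonal patterns?
span = history_span_days(orders, "created_at")

# Which orders reference a user that does not exist in the users table?
orphans = orphan_foreign_keys(orders, "user_id", users, "user_id")

# Obfuscate the email column (the salt here is a placeholder).
hashed_users = [
    {**u, "email": hash_pii(u["email"], salt="demo-salt")} for u in users
]

print(f"history span: {span} days")
print(f"orphan user_id values: {orphans}")
```

In this toy data, the order with `user_id` 3 has no matching row in `users`, so it shows up as an orphan. Whether to fix or drop such rows depends on your data. Note that hashing a key column must be applied consistently to every table that references it, otherwise the primary/foreign key links break.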