A LocalTable wraps a pandas.DataFrame with metadata about its columns, primary key, and time column. Semantic types (stypes) are required metadata, while the primary key and time column are optional. Each table can have at most one primary key and at most one time column, but it can contain many foreign keys (primary keys of other tables).
Dtype and Metadata Inference
When creating a LocalTable, column dtypes and stypes are automatically inferred from the underlying data, based on each column's pandas data type and a set of heuristics. The key metadata that needs to be properly set includes:
- Stypes: Semantic types that determine model processing behavior
- Primary key: Unique identifier for the table (optional but recommended)
- Time column: Temporal column for time-based operations (optional)
The LocalTable.infer_metadata() method automates much of this process:
- Primary key detection: Uses heuristics to suggest potential primary keys based on column names, uniqueness, and data patterns
- Time column detection: Identifies columns with temporal data types or time-related naming patterns
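The two detection steps above can be approximated in plain pandas. The sketch below is illustrative only: the helper names guess_primary_key and guess_time_column are hypothetical, and the rules they apply (an "id"-style name plus uniqueness, a datetime dtype or a time-like name) are assumptions standing in for whatever heuristics infer_metadata() actually uses.

```python
from typing import Optional

import pandas as pd


def guess_primary_key(df: pd.DataFrame) -> Optional[str]:
    """Return the first ID-named column that is non-null and unique, else None."""
    for col in df.columns:
        name = str(col).lower()
        looks_like_id = name == "id" or name.endswith("_id")
        series = df[col]
        if looks_like_id and series.notna().all() and series.is_unique:
            return col
    return None


def guess_time_column(df: pd.DataFrame) -> Optional[str]:
    """Prefer a datetime-typed column; fall back to a time-like column name."""
    for col in df.columns:
        if pd.api.types.is_datetime64_any_dtype(df[col]):
            return col
    for col in df.columns:
        name = str(col).lower()
        if any(token in name for token in ("time", "date")):
            return col
    return None


# Hypothetical example data: one candidate key, one foreign key, one timestamp.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "user_id": [10, 10, 11],  # repeats, so it cannot be the primary key
    "created_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "amount": [9.99, 20.0, 5.5],
})

print(guess_primary_key(orders))   # order_id
print(guess_time_column(orders))   # created_at
```

Heuristics like these only produce suggestions, so it is worth reviewing the inferred primary key and time column before relying on them.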
Basic Table Creation
Inspecting Table Metadata
What Makes a Good Table
A good LocalTable should have:
- Clean dtypes: Set proper pandas dtypes at DataFrame level before table creation
- Meaningful stypes: ID columns use Stype.ID, categorical data uses Stype.categorical, text uses Stype.text, etc.
- Unique primary key: Non-null, no duplicates, uniquely identifies each row, preferably stored as integer
- Consistent naming: Foreign keys match their referenced primary key names
- Single time column: One temporal column when temporal data is available
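Several items on this checklist can be verified at the DataFrame level before the table is created. The following is a minimal pandas-only sketch; the helper primary_key_problems and the sample users data are hypothetical and not part of the LocalTable API.

```python
from typing import List

import pandas as pd


def primary_key_problems(df: pd.DataFrame, key: str) -> List[str]:
    """List the ways column `key` fails the primary key criteria above."""
    problems = []
    if df[key].isna().any():
        problems.append("contains nulls")
    if df[key].duplicated().any():
        problems.append("contains duplicates")
    if not pd.api.types.is_integer_dtype(df[key]):
        problems.append("not stored as integer")
    return problems


# Hypothetical raw data: IDs stored as strings, one duplicate, dates as text.
raw_users = pd.DataFrame({
    "user_id": ["1", "2", "2"],
    "signup": ["2024-01-01", "2024-02-15", "2024-03-01"],
})
print(primary_key_problems(raw_users, "user_id"))
# ['contains duplicates', 'not stored as integer']

# Clean dtypes and duplicates on the DataFrame before wrapping it in a table.
users = raw_users.drop_duplicates(subset="user_id", keep="first").assign(
    user_id=lambda d: d["user_id"].astype("int64"),
    signup=lambda d: pd.to_datetime(d["signup"]),
)
print(primary_key_problems(users, "user_id"))
# []
```

Keeping the first row per duplicate ID is just one possible policy here; which row to keep, or whether duplicates indicate a deeper data problem, depends on the dataset.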