Select Tables
Once your connector is set up, the next step is to select the source tables whose data you want to ingest into Kumo.
Connecting a Table
- Navigate to Tables in the side menu and click Add Table.
- Next, select a Source:
  - Connector: choose an existing data connector.
  - Local Upload: upload a local CSV or Parquet file.
Selecting a Connector Source
If you choose Connector as the source type:
- Select a connector from the drop-down.
- Kumo loads the available tables into the Source Table drop-down.
- (Native Databricks users only) Enter the Schema Name to populate the table list.
Column Preprocessing
After selecting a table, you can define column types and preprocessing steps to ensure proper data handling. For details, see Column Preprocessing.
Debugging & Data Validation
To verify a table’s schema and metadata:
- Navigate to the Tables page.
- Click the table name to view its details.
- Click a column to see its detailed stats.
The details view provides useful insights, such as:
- Column statistics (e.g., missing values, cardinality, distributions).
- Sample rows for verification.
Kumo computes these statistics from a sample of the dataset, so they are approximate. Complete statistics become available once the full dataset is ingested during model training.
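To build intuition for what the column stats report, here is a minimal Python sketch that computes similar figures (missing-value count, cardinality, most frequent values) from a random sample of a CSV's rows. It is an illustration under assumptions, not Kumo's implementation: the function name `sample_column_stats` is hypothetical, the default sample size of 1,000 rows is arbitrary, and only empty cells are counted as missing, in line with how Kumo treats blank entries.

```python
import csv
import io
import random
from collections import Counter


def sample_column_stats(csv_text, sample_size=1000, seed=0):
    """Approximate per-column statistics from a random sample of CSV rows.

    Hypothetical helper for illustration. Only empty cells count as missing;
    strings such as "NaN" or "-1" are counted as ordinary values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    rows = list(reader)
    if len(rows) > sample_size:
        # Statistics come from a sample, so they only approximate the full table.
        rows = random.Random(seed).sample(rows, sample_size)

    stats = {}
    for col in columns:
        # DictReader fills short rows with None; treat that like an empty cell.
        values = [row.get(col) or "" for row in rows]
        present = [v for v in values if v != ""]
        counts = Counter(present)
        stats[col] = {
            "missing": len(values) - len(present),
            "cardinality": len(counts),  # number of distinct non-missing values
            "top_values": counts.most_common(3),
        }
    return stats


data = "user_id,age,country\n1,34,US\n2,,DE\n3,-1,US\n4,NaN,\n"
stats = sample_column_stats(data)
print(stats["age"])
```

In this example only the empty `age` cell is counted as missing; `NaN` and `-1` show up as distinct values and inflate the cardinality, which is why such markers deserve attention before ingestion.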
Handling Missing Data
- Kumo treats blank entries as missing values.
- It does not automatically recognize special strings such as "NaN", "none", or "N/A" as missing.
- In numeric columns, missing values are often encoded with placeholder values (e.g., -1). Clear these cells (leave them blank) if you want Kumo to treat them as missing.
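If your files use such markers, one option is to blank them out before uploading. The following minimal Python sketch rewrites a CSV so that "NaN", "none", and "N/A" (matched case-insensitively, an assumption made here) become empty cells, and blanks a placeholder such as -1 only in columns you name explicitly. The function `blank_out_missing` and its parameters are hypothetical helpers for this example, not part of Kumo; adjust the token set to match your data.

```python
import csv
import io

# Assumption: the markers named above, compared case-insensitively.
# Extend this set to match how missing values appear in your own files.
MISSING_TOKENS = {"nan", "none", "n/a"}


def blank_out_missing(csv_text, placeholder_columns=(), placeholder="-1"):
    """Return a copy of csv_text in which missing-value markers are empty cells.

    Hypothetical helper. `placeholder` (e.g. -1) is blanked only in
    `placeholder_columns`, so a legitimate -1 elsewhere is left untouched.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if not reader.fieldnames:
        return csv_text  # no header row: nothing to clean
    placeholder_columns = set(placeholder_columns)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        cleaned = {}
        for col in reader.fieldnames:
            value = row.get(col) or ""  # short rows yield None; write them as blank
            stripped = value.strip()
            if stripped.lower() in MISSING_TOKENS:
                value = ""
            elif col in placeholder_columns and stripped == placeholder:
                value = ""
            cleaned[col] = value
        writer.writerow(cleaned)
    return out.getvalue()


raw = "user_id,age,country\n1,34,US\n2,-1,N/A\n3,NaN,DE\n"
print(blank_out_missing(raw, placeholder_columns=["age"]), end="")
```

Making the placeholder opt-in per column is deliberate: blanking every -1 in the file would also erase values where -1 is meaningful, such as an ID or a signed quantity.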