Once your connector is set up, the next step is to select the source tables whose data you want to ingest into Kumo.

Connecting a Table

  1. Navigate to Tables in the side menu and click Add Table.

  2. Next, select a Source:

    • Connector – Choose an existing data connector.

    • Local Upload – Upload a local CSV or Parquet file.

Selecting a Connector Source

If you choose Connector as the source type:

  1. Select a connector from the drop-down.

  2. Kumo will load available tables in the Source Table drop-down.

  3. (Native Databricks users only) Enter the Schema Name to populate the table list.

Column Preprocessing

After selecting a table, you can define column types and preprocessing steps to ensure proper data handling. For details, see Column Preprocessing.

Debugging & Data Validation

To verify a table’s schema and metadata:

  1. Navigate to the Tables page.

  2. Click on the table name to view details.

  3. Click on a column to see its detailed stats.

This section provides useful insights, such as:

  • Column statistics (e.g., missing values, cardinality, distributions).

  • Sample rows for verification.

Kumo computes this information from a smaller sample of the dataset. Complete statistics are available after full data ingestion during model training.
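To cross-check the numbers shown in the UI before ingestion, you can compute similar statistics locally. The sketch below is not how Kumo computes its statistics; it is a minimal standard-library Python illustration that counts blank entries and distinct values per column. The `column_stats` helper and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def column_stats(rows, columns):
    """Count blank entries and distinct non-blank values for each column.

    Hypothetical helper for local sanity checks; not part of Kumo.
    """
    counters = {c: Counter() for c in columns}
    missing = {c: 0 for c in columns}
    for row in rows:
        for c in columns:
            value = row.get(c)
            if value is None or value == "":
                missing[c] += 1          # blank entry counts as missing
            else:
                counters[c][value] += 1  # tally each distinct value
    return {
        c: {
            "missing": missing[c],
            "cardinality": len(counters[c]),
            "top": counters[c].most_common(3),  # rough view of the distribution
        }
        for c in columns
    }


# Illustrative sample data; replace with your own CSV file.
sample = "user_id,city\n1,Paris\n2,\n3,Paris\n4,Lyon\n"
reader = csv.DictReader(io.StringIO(sample))
stats = column_stats(reader, reader.fieldnames)
print(stats["city"])
```

Because the UI statistics come from a sample, small differences from a full local count are expected.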

Handling Missing Data

  • Kumo treats blank entries as missing values.

  • It does not automatically recognize special strings ("NaN", "none", "N/A") as missing.

  • Numeric columns in source data often use placeholder values (e.g., -1) to represent missing values. Clear these values or leave them blank if you want Kumo to treat them as missing.
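One way to apply these rules is to clean the file before uploading it. The following is a minimal sketch using only the Python standard library; the sentinel strings, the `age` column, and the `-1` placeholder are assumptions for illustration, so adjust them to match your own data.

```python
import csv
import io

# Assumed sentinel strings (compared case-insensitively); adjust as needed.
MISSING_STRINGS = {"nan", "none", "n/a"}
# Assumed numeric placeholders per column; "age" is a hypothetical column name.
# Matching is on the exact text, so "-1.0" would need its own entry.
NUMERIC_PLACEHOLDERS = {"age": {"-1"}}


def blank_out_missing(rows):
    """Yield copies of each row with sentinel values replaced by ''."""
    for row in rows:
        cleaned = {}
        for column, value in row.items():
            text = (value or "").strip()
            is_sentinel = text.lower() in MISSING_STRINGS
            is_placeholder = text in NUMERIC_PLACEHOLDERS.get(column, set())
            cleaned[column] = "" if (is_sentinel or is_placeholder) else value
        yield cleaned


def clean_csv(text):
    """Return CSV text with sentinels and placeholders blanked out."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(blank_out_missing(reader))
    return out.getvalue()


# Illustrative sample data; replace with the contents of your own file.
sample = "user_id,age,city\n1,34,Paris\n2,-1,N/A\n3,NaN,none\n"
cleaned = clean_csv(sample)
print(cleaned)
```

Only blank out a placeholder such as -1 in columns where it cannot be a legitimate value, which is why the placeholders here are listed per column rather than applied globally.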