kumoai.connector

The kumoai.connector module provides interfaces for connecting Kumo to backing data sources and inspecting raw table data. SourceTable objects returned by connectors are used to create Table and Graph objects for downstream ML.

Uploading Your Own Data

Kumo supports uploading local tables directly. Each table must be a single Parquet or CSV file on your local machine; files larger than 1 GB are handled through automatic partitioning. Use FileUploadConnector.upload() to upload a table and FileUploadConnector.delete() to remove it.

`FileUploadConnector`

A connector for uploading local files to the Kumo data plane.

`upload()`

Uploads a Parquet or CSV file to Kumo.

name (str, required): The name to assign to the uploaded table.

path (str, required): Local path to the Parquet or CSV file.

auto_partition (bool, default: True): Whether to automatically partition files larger than partition_size_mb.

partition_size_mb (int, default: 250): Target partition size in megabytes when auto_partition is enabled.

Returns SourceTable
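
As a rough sketch of how upload() might be wrapped, the helper below derives the table name from the file name and rejects unsupported file types before uploading. The helper name upload_local_file and the SUPPORTED_SUFFIXES constant are hypothetical; the connector argument is assumed to be any object exposing the upload() signature documented above.

```python
from pathlib import Path

# Extensions matching the file types this section lists as supported.
SUPPORTED_SUFFIXES = {".parquet", ".csv"}


def upload_local_file(connector, path, partition_size_mb=250):
    """Upload one local file, naming the table after the file stem.

    `connector` is assumed to expose upload(name, path, auto_partition,
    partition_size_mb) as documented above; this helper is hypothetical.
    """
    file_path = Path(path)
    if file_path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {file_path.suffix!r}")
    return connector.upload(
        name=file_path.stem,
        path=str(file_path),
        auto_partition=True,
        partition_size_mb=partition_size_mb,
    )
```

Checking the extension locally avoids starting an upload that the service would presumably reject anyway.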

`delete()`

Deletes a previously uploaded table.

name (str, required): The name of the uploaded table to delete.
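
One way to pair upload() with delete() is a context manager that removes the uploaded table when the work is done. This is a minimal sketch; temporary_upload is a hypothetical helper, and the connector is assumed to expose upload() and delete() as documented above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_upload(connector, name, path):
    """Upload a table, yield its SourceTable, and delete it on exit.

    Hypothetical helper: it relies only on upload(name, path) and
    delete(name) as documented above.
    """
    source_table = connector.upload(name=name, path=path)
    try:
        yield source_table
    finally:
        # Runs even if the body of the `with` block raises.
        connector.delete(name=name)
```

Inside the with block the yielded SourceTable can be inspected or used to build a Table; the uploaded data is removed afterwards.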

`source_type` `property`

Returns DataSourceType — The source type for this connector.

`name` `property`

Returns str — The connector name.

Connector

Connectors link Kumo to data stored in an external data warehouse or object store. The SDK supports Amazon S3, Snowflake, BigQuery, and Databricks. All connectors inherit the following methods from the base Connector class:

`table_names()`

Returns List[str] — Names of all tables accessible through this connector.

`has_table()`

name (str, required): The table name to check.

Returns bool — True if the table exists in the connector.

`table()`

Returns a SourceTable for the named table. Use the returned object to inspect raw data or construct a Table.

name (str, required): The table name.

Returns SourceTable

Raises ValueError if name does not exist in the connector.
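
The three base methods can be combined to fail early when an expected table is absent. The sketch below is illustrative only: require_tables is a hypothetical helper, and it works with any connector exposing table_names(), has_table(), and table() as documented above.

```python
def require_tables(connector, required_names):
    """Return {name: SourceTable} for each required table, or raise.

    Hypothetical helper: checks every name with has_table() first so
    the error lists all missing tables rather than only the first one.
    """
    missing = [n for n in required_names if not connector.has_table(n)]
    if missing:
        available = sorted(connector.table_names())
        raise ValueError(f"Missing tables {missing}; available: {available}")
    return {n: connector.table(n) for n in required_names}
```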

`SnowflakeConnector`

Connects Kumo to a Snowflake data warehouse.

`get_by_name()` `classmethod`

Retrieves a SnowflakeConnector by its registered name.

name (str, required): The registered connector name.

Returns SnowflakeConnector

`source_type` `property`

Returns DataSourceType

`name` `property`

Returns str

`BigQueryConnector`

Connects Kumo to a Google BigQuery data warehouse.

`get_by_name()` `classmethod`

name (str, required): The registered connector name.

Returns BigQueryConnector

`source_type` `property`

Returns DataSourceType

`name` `property`

Returns str

`DatabricksConnector`

Connects Kumo to a Databricks data lakehouse.

`get_by_name()` `classmethod`

name (str, required): The registered connector name.

Returns DatabricksConnector

`source_type` `property`

Returns DataSourceType

`name` `property`

Returns str
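
The Snowflake, BigQuery, and Databricks connectors all document the same get_by_name() classmethod, so a lookup can be dispatched by warehouse kind. The sketch below assumes a caller-supplied registry; the function name connector_by_kind, the registry argument, and the kind strings are hypothetical and not part of the SDK.

```python
def connector_by_kind(kind, name, registry):
    """Look up a registered connector via the class registered for `kind`.

    `registry` maps a caller-chosen label to a connector class, for
    example {"snowflake": SnowflakeConnector, ...} (labels are
    hypothetical). Each class must provide get_by_name(name).
    """
    try:
        connector_cls = registry[kind]
    except KeyError:
        known = sorted(registry)
        raise ValueError(f"Unknown kind {kind!r}; expected one of {known}") from None
    return connector_cls.get_by_name(name)
```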

`S3Connector`

Connects Kumo to an Amazon S3 object store.

`source_type` `property`

Returns DataSourceType

Source Data

Tables accessed from connectors are represented as SourceTable objects, with column metadata represented as SourceColumn objects.

`SourceTable`

A reference to a table stored behind a backing Connector. Use it to inspect raw data before constructing a Table for ML.

import kumoai

# Look up a registered connector, then preview the first 10 rows of a table.
connector = kumoai.SnowflakeConnector.get_by_name("my-connector")
src = connector.table("orders")
src.head(10)

name (str, required): The name of the table in the backing connector.

connector (Connector, required): The connector containing this table.

`columns` `property`

Returns List[SourceColumn] — Column metadata for all columns in this table.

`column_dict` `property`

Returns Dict[str, SourceColumn] — Column metadata keyed by column name.
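
Because column_dict is keyed by column name, it lends itself to quick schema checks before a Table is built. A minimal sketch, assuming only the column_dict property documented above; missing_columns is a hypothetical helper.

```python
def missing_columns(source_table, expected_names):
    """Return the expected column names that the table does not have.

    Hypothetical helper: uses only the keys of column_dict, so it does
    not depend on any SourceColumn attributes.
    """
    present = set(source_table.column_dict)
    return [name for name in expected_names if name not in present]
```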

`head()`

Returns the first num_rows rows by reading from the backing connector.

num_rows (int, default: 5): Number of rows to return. Returns all rows if fewer are available.

Returns pd.DataFrame
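
To get a quick look at everything a connector exposes, head() can be called on each table in turn. The sketch below is illustrative; preview_all is a hypothetical helper built from table_names(), table(), and head() as documented in this reference.

```python
def preview_all(connector, num_rows=5):
    """Return {table_name: head(num_rows)} for every table in the connector.

    Hypothetical helper. Each head() call reads from the backing
    connector, so this may be slow when there are many tables.
    """
    return {
        name: connector.table(name).head(num_rows)
        for name in connector.table_names()
    }
```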

`add_llm()`

Enriches the table with LLM-generated embeddings, adding a new embedding column.

model (str, required): The embedding model name (e.g. "text-embedding-3-small").

api_key (str, required): API key for the embedding model provider.

template (str, required): A template string defining which columns to embed (e.g. "{title} {description}").

output_dir (str, required): Output directory path (S3 or local) where the enriched table is written.

output_column_name (str, required): Name of the new embedding column in the output table.

output_table_name (str, required): Name of the output table.

dimensions (int, default: None): Embedding dimensionality. Uses the model’s default if not specified.

non_blocking (bool, default: False): If True, returns an LLMSourceTableFuture immediately rather than blocking.

Returns Union[SourceTable, LLMSourceTableFuture]
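
Since the template refers to columns by name, it can be worth checking those names against the table before starting an enrichment job. The sketch below assumes the template uses Python-style {column} placeholders, as the example "{title} {description}" suggests; that syntax match is an assumption, and both helper names are hypothetical.

```python
from string import Formatter


def template_fields(template):
    """Return the names referenced as {placeholders}, in order of appearance."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text.
    return [field for _, field, _, _ in Formatter().parse(template) if field]


def unknown_template_fields(template, column_names):
    """Return placeholders in the template that are not known column names."""
    known = set(column_names)
    return [field for field in template_fields(template) if field not in known]
```

The column names could come from the keys of a SourceTable's column_dict; an empty result from unknown_template_fields means every placeholder matches a column.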

`SourceTableFuture`

Represents an ongoing asynchronous SourceTable generation process.

`result()`

Blocks until the job completes and returns the resulting SourceTable.

Returns SourceTable

`status()`

Returns JobStatus — Current status of the job.

`future()`

Returns concurrent.futures.Future[SourceTable] — The underlying future object.

`LLMSourceTableFuture`

Extends SourceTableFuture with cancellation support for LLM enrichment jobs.

`cancel()`

Cancels the running LLM enrichment job.

Returns JobStatus
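
When add_llm() is called with non_blocking=True, the returned future can be awaited with a time limit and cancelled if the limit is exceeded. A minimal sketch, assuming the object exposes future() and cancel() as documented above; wait_or_cancel is a hypothetical helper, and returning None on timeout is a design choice of this example.

```python
import concurrent.futures


def wait_or_cancel(job, timeout_s):
    """Wait up to timeout_s seconds for the job; cancel it on timeout.

    Returns the job's result, or None if the job was cancelled.
    `job` is assumed to expose future() and cancel() as documented above.
    """
    try:
        # future() is documented to return a concurrent.futures.Future,
        # whose result() accepts a timeout in seconds.
        return job.future().result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        job.cancel()
        return None
```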

`SourceColumn`

Metadata for a single column in a SourceTable, including its name and inferred data type.
