The kumoai.connector module provides interfaces for connecting Kumo to backing data sources and inspecting raw table data. SourceTable objects returned by connectors are used to create Table and Graph objects for downstream ML.

Uploading Your Own Data

Kumo supports uploading local tables directly. Each table must be a single Parquet or CSV file on your local machine. Files larger than 1 GB are supported via automatic partitioning. Use FileUploadConnector.upload() to upload a table and FileUploadConnector.delete() to remove one.
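To get a feel for how many pieces a large file might be split into, the sketch below applies ceiling division to the file size and the documented default partition size of 250 MB. The helper name estimate_partition_count is hypothetical, and the calculation assumes one megabyte is 1024 * 1024 bytes. The SDK's actual partitioning may differ, so treat the result as a rough estimate only.

```python
def estimate_partition_count(file_size_bytes, partition_size_mb=250):
    """Rough estimate of how many partitions a file would be split into.

    Assumption: 1 MB = 1024 * 1024 bytes. The SDK's real split logic
    is not documented here and may produce a different count.
    """
    if partition_size_mb <= 0:
        raise ValueError("partition_size_mb must be positive")
    partition_bytes = partition_size_mb * 1024 * 1024
    # Ceiling division using integers only, to avoid float rounding.
    count = -(-file_size_bytes // partition_bytes)
    return max(1, count)


one_gib = 1024 * 1024 * 1024
print(estimate_partition_count(one_gib))  # 5
```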

FileUploadConnector

A connector for uploading local files to the Kumo data plane.

upload()

Uploads a Parquet or CSV file to Kumo.
name
str
required
The name to assign to the uploaded table.
path
str
required
Local path to the Parquet or CSV file.
auto_partition
bool
default:"True"
Whether to automatically partition files larger than partition_size_mb.
partition_size_mb
int
default:"250"
Target partition size in megabytes when auto_partition is enabled.
Returns SourceTable
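A minimal sketch of calling upload() with the parameters listed above. The wrapper upload_local_table is a hypothetical helper, not part of the SDK. It takes an already constructed FileUploadConnector as its first argument (how the connector is constructed is not covered in this section) and checks the file locally before starting the upload.

```python
from pathlib import Path

# File types accepted by upload(), per the description above.
SUPPORTED_SUFFIXES = {".parquet", ".csv"}


def upload_local_table(connector, name, path, partition_size_mb=250):
    """Validate a local file, then upload it through `connector`.

    `connector` is expected to be a FileUploadConnector. This wrapper
    is illustrative only and is not provided by the SDK.
    """
    file_path = Path(path)
    if file_path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Expected a Parquet or CSV file, got: {file_path.name}")
    if not file_path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")
    # Keyword names match the upload() parameters documented above.
    return connector.upload(
        name=name,
        path=str(file_path),
        auto_partition=True,
        partition_size_mb=partition_size_mb,
    )
```

The returned SourceTable can then be inspected or used to construct a Table, as with tables from any other connector.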

delete()

Deletes a previously uploaded table.
name
str
required
The name of the uploaded table to delete.
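One common use of delete() is replacing a previously uploaded table with a fresh copy. The sketch below assumes that FileUploadConnector also exposes has_table() like the connectors described under Connector; this section does not confirm that. The helper replace_uploaded_table is hypothetical.

```python
def replace_uploaded_table(connector, name, path):
    """Delete an existing uploaded table (if any), then upload a new file.

    Assumption: `connector` is a FileUploadConnector that also provides
    has_table(), as the base Connector class does.
    """
    if connector.has_table(name):
        connector.delete(name=name)
    return connector.upload(name=name, path=path)
```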

source_type property

Returns DataSourceType — The source type for this connector.

name property

Returns str — The connector name.

Connector

Connectors link Kumo to data stored in an external data warehouse or object store. The SDK supports Amazon S3, Snowflake, BigQuery, and Databricks. All connectors inherit the following methods from the base Connector class:

table_names()

Returns List[str] — Names of all tables accessible through this connector.

has_table()

name
str
required
The table name to check.
Returns bool — True if the table exists in the connector.

table()

Returns a SourceTable for the named table. Use the returned object to inspect raw data or construct a Table.
name
str
required
The table name.
Returns SourceTable
Raises ValueError if name does not exist in the connector.
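The three base methods combine naturally into a lookup that reports every missing table at once, instead of failing on the first ValueError from table(). The helper resolve_tables below is a hypothetical sketch that works with any connector exposing table_names() and table().

```python
def resolve_tables(connector, names):
    """Return {name: SourceTable} for `names`, or raise if any are missing."""
    available = set(connector.table_names())
    missing = [name for name in names if name not in available]
    if missing:
        raise ValueError(
            f"Tables not found: {missing}. Available: {sorted(available)}"
        )
    # Every name is known to exist, so table() should not raise here.
    return {name: connector.table(name) for name in names}
```

For a single optional table, checking has_table(name) before calling table(name) achieves the same thing without building the full set.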

SnowflakeConnector

Connects Kumo to a Snowflake data warehouse.

get_by_name() classmethod

Retrieves a SnowflakeConnector by its registered name.
name
str
required
The registered connector name.
Returns SnowflakeConnector

source_type property

Returns DataSourceType

name property

Returns str

BigQueryConnector

Connects Kumo to a Google BigQuery data warehouse.

get_by_name() classmethod

Retrieves a BigQueryConnector by its registered name.
name
str
required
The registered connector name.
Returns BigQueryConnector

source_type property

Returns DataSourceType

name property

Returns str

DatabricksConnector

Connects Kumo to a Databricks data lakehouse.

get_by_name() classmethod

Retrieves a DatabricksConnector by its registered name.
name
str
required
The registered connector name.
Returns DatabricksConnector

source_type property

Returns DataSourceType

name property

Returns str
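The three warehouse connectors above each expose a get_by_name() classmethod, so a lookup can be driven by a short string such as a config value. The sketch below is hypothetical: the function name and the "snowflake" / "bigquery" / "databricks" keys are illustrative. It also assumes BigQueryConnector and DatabricksConnector are importable from the top-level kumoai package, as SnowflakeConnector is in the SourceTable example on this page. S3Connector is left out because no get_by_name() is listed for it here.

```python
import importlib

# Maps an illustrative short key to a connector class name.
# Assumption: each class is exposed at the top level of `kumoai`.
_CONNECTOR_CLASSES = {
    "snowflake": "SnowflakeConnector",
    "bigquery": "BigQueryConnector",
    "databricks": "DatabricksConnector",
}


def get_registered_connector(kind, name):
    """Look up a registered connector of the given kind by name."""
    if kind not in _CONNECTOR_CLASSES:
        raise ValueError(
            f"Unsupported kind {kind!r}; expected one of {sorted(_CONNECTOR_CLASSES)}"
        )
    # Imported lazily so this module can be loaded without the SDK installed.
    kumoai = importlib.import_module("kumoai")
    connector_cls = getattr(kumoai, _CONNECTOR_CLASSES[kind])
    return connector_cls.get_by_name(name)
```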

S3Connector

Connects Kumo to an Amazon S3 object store.

source_type property

Returns DataSourceType

Source Data

Tables accessed from connectors are represented as SourceTable objects, with column metadata represented as SourceColumn objects.

SourceTable

A reference to a table stored behind a backing Connector. Use it to inspect raw data before constructing a Table for ML.
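import kumoai

# Look up a registered connector, then reference one of its tables.
connector = kumoai.SnowflakeConnector.get_by_name("my-connector")
src = connector.table("orders")
src.head(10)  # preview the first 10 rows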
name
str
required
The name of the table in the backing connector.
connector
Connector
required
The connector containing this table.

columns property

Returns List[SourceColumn] — Column metadata for all columns in this table.

column_dict property

Returns Dict[str, SourceColumn] — Column metadata keyed by column name.

head()

Returns the first num_rows rows by reading from the backing connector.
num_rows
int
default:"5"
Number of rows to return. Returns all rows if fewer are available.
Returns pd.DataFrame
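Before building a Table from a SourceTable, it can help to confirm that the columns you rely on are actually present. The sketch below uses only the column_dict property described above. The helper name missing_columns is hypothetical.

```python
def missing_columns(src, expected):
    """Return the names in `expected` that are absent from `src`.

    `src` is expected to be a SourceTable; only its `column_dict`
    property (column metadata keyed by column name) is used.
    """
    present = src.column_dict
    return [name for name in expected if name not in present]
```

If the returned list is empty, a quick look at the raw rows with head() is a reasonable next check.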

add_llm()

Enriches the table with LLM-generated embeddings, adding a new embedding column.
model
str
required
The embedding model name (e.g. "text-embedding-3-small").
api_key
str
required
API key for the embedding model provider.
template
str
required
A template string defining which columns to embed (e.g. "{title} {description}").
output_dir
str
required
Output directory path (S3 or local) where the enriched table is written.
output_column_name
str
required
Name of the new embedding column in the output table.
output_table_name
str
required
Name of the output table.
dimensions
int
default:"None"
Embedding dimensionality. Uses the model’s default if not specified.
non_blocking
bool
default:"False"
If True, returns an LLMSourceTableFuture immediately rather than blocking.
Returns Union[SourceTable, LLMSourceTableFuture]
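A sketch of an add_llm() call that supplies every required parameter listed above. The wrapper embed_text_columns, the EMBEDDING_API_KEY environment variable, and the output names "text_embedding" and "products_embedded" are all hypothetical placeholders. The model and template values are the examples given in the parameter descriptions.

```python
import os


def embed_text_columns(src, output_dir, api_key_env="EMBEDDING_API_KEY"):
    """Start an LLM embedding job on `src` (a SourceTable) without blocking.

    The API key is read from an environment variable so it does not
    need to be hard-coded. The variable name is illustrative.
    """
    api_key = os.environ.get(api_key_env)
    if not api_key:
        raise RuntimeError(f"Environment variable {api_key_env} is not set")
    return src.add_llm(
        model="text-embedding-3-small",
        api_key=api_key,
        template="{title} {description}",
        output_dir=output_dir,
        output_column_name="text_embedding",     # hypothetical column name
        output_table_name="products_embedded",   # hypothetical table name
        non_blocking=True,  # per the docs, returns an LLMSourceTableFuture
    )
```

With non_blocking left at its default of False, the same call would block and return the enriched SourceTable directly.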

SourceTableFuture

Represents an ongoing asynchronous SourceTable generation process.

result()

Blocks until complete and returns the SourceTable.
Returns SourceTable

status()

Returns JobStatus — Current status of the job.

future()

Returns concurrent.futures.Future[SourceTable] — The underlying future object.

LLMSourceTableFuture

Extends SourceTableFuture with cancellation support for LLM enrichment jobs.

cancel()

Cancels the running LLM enrichment job.
Returns JobStatus
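The future() and cancel() methods can be combined to wait for an enrichment job with a time limit. The sketch below relies on the standard concurrent.futures.Future.result(timeout=...) behaviour of the object returned by future(). The helper wait_or_cancel is hypothetical, and returning None on timeout is a design choice made for this example.

```python
import concurrent.futures


def wait_or_cancel(job, timeout_s):
    """Wait up to `timeout_s` seconds for `job`, cancelling it on timeout.

    `job` is expected to be an LLMSourceTableFuture. Returns the
    resulting SourceTable, or None if the job was cancelled.
    """
    try:
        # future() returns a concurrent.futures.Future, whose result()
        # accepts a timeout in seconds.
        return job.future().result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        job.cancel()
        return None
```

Calling result() on the job itself is the simpler option when no time limit is needed, since it blocks until the SourceTable is ready.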

SourceColumn

Metadata for a single column in a SourceTable, including its name and inferred data type.