The kumoai.connector module provides interfaces for connecting Kumo to backing data sources and inspecting raw table data. Connectors return SourceTable objects, which are used to create Table and Graph objects for downstream ML.
Uploading Your Own Data
Kumo supports uploading local tables directly. Each table must be a single Parquet or CSV file on your local machine. Files larger than 1 GB are supported via automatic partitioning. Use FileUploadConnector.upload() to upload a table and FileUploadConnector.delete() to remove it.
FileUploadConnector
A connector for uploading local files to the Kumo data plane.
upload()
Uploads a Parquet or CSV file to Kumo.
The name to assign to the uploaded table.
Local path to the Parquet or CSV file.
Whether to automatically partition files larger than partition_size_mb.
Target partition size in megabytes when auto_partition is enabled.
Returns SourceTable
delete()
Deletes a previously uploaded table.
The name of the uploaded table to delete.
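The upload call described above can be sketched as a small helper. It accepts any object that exposes upload(), so it makes no claim about how a FileUploadConnector is constructed. The keyword names name and path are assumptions inferred from the parameter descriptions, the 250 MB value is illustrative only, and upload_local_file is a hypothetical helper, not part of the SDK.

```python
import os


def upload_local_file(connector, name, path, partition_size_mb=250):
    """Upload a single local Parquet or CSV file through `connector`.

    `connector` is any object with an upload() method, for example a
    FileUploadConnector. The keyword names `name` and `path` are
    assumptions; `auto_partition` and `partition_size_mb` come from the
    parameter descriptions. The 250 MB default is illustrative.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in (".parquet", ".csv"):
        raise ValueError(f"expected a .parquet or .csv file, got {path!r}")
    return connector.upload(
        name=name,
        path=path,
        auto_partition=True,
        partition_size_mb=partition_size_mb,
    )


# Usage sketch (requires a real connector instance):
# users = upload_local_file(connector, "users", "data/users.parquet")
# connector.delete(name="users")  # keyword name is an assumption
```

Checking the file extension before calling upload() fails fast on unsupported formats without contacting the service.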
source_type property
Returns DataSourceType — The source type for this connector.
name property
Returns str — The connector name.
Connector
Connectors link Kumo to data stored in an external data warehouse or object store. The SDK supports Amazon S3, Snowflake, BigQuery, and Databricks. All connectors inherit the following methods from the base Connector class:
table_names()
Returns List[str] — Names of all tables accessible through this connector.
has_table()
The table name to check.
Returns bool — True if the table exists in the connector.
table()
Returns a SourceTable for the named table. Use the returned object to inspect raw data or construct a Table.
The table name.
Returns SourceTable
Raises ValueError if name does not exist in the connector.
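As a minimal sketch of how the three inherited methods fit together, the hypothetical helper below checks has_table() before calling table() and lists the available names when the lookup would fail. It works with any connector object and passes the table name positionally, so it assumes nothing about keyword names.

```python
def get_source_table(connector, name):
    """Return connector.table(name), with a descriptive error if it is missing.

    `connector` is any object exposing table_names(), has_table() and
    table(), as described for the base Connector class.
    """
    if not connector.has_table(name):
        available = ", ".join(sorted(connector.table_names()))
        raise ValueError(f"table {name!r} not found; available: {available}")
    return connector.table(name)


# Usage sketch (requires a real connector instance):
# orders = get_source_table(connector, "orders")
```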
SnowflakeConnector
Connects Kumo to a Snowflake data warehouse.
get_by_name() classmethod
Retrieves a SnowflakeConnector by its registered name.
The registered connector name.
Returns SnowflakeConnector
source_type property
Returns DataSourceType
name property
Returns str
BigQueryConnector
Connects Kumo to a Google BigQuery data warehouse.
get_by_name() classmethod
Retrieves a BigQueryConnector by its registered name.
The registered connector name.
Returns BigQueryConnector
source_type property
Returns DataSourceType
name property
Returns str
DatabricksConnector
Connects Kumo to a Databricks data lakehouse.
get_by_name() classmethod
Retrieves a DatabricksConnector by its registered name.
The registered connector name.
Returns DatabricksConnector
source_type property
Returns DataSourceType
name property
Returns str
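Because the Snowflake, BigQuery, and Databricks connectors all expose the same get_by_name() classmethod, a lookup can be written once and reused. The sketch below is hypothetical: load_connector and the registry mapping are illustrative names, and the import shown in the comment assumes the classes are importable from kumoai.connector.

```python
def load_connector(kind, name, registry):
    """Look up a registered connector by warehouse kind and connector name.

    `registry` maps a short label to a connector class that provides a
    get_by_name() classmethod. Both the labels and this helper are
    illustrative, not part of the SDK.
    """
    try:
        connector_cls = registry[kind]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(f"unknown connector kind {kind!r}; known: {known}") from None
    return connector_cls.get_by_name(name)


# Usage sketch (import path is an assumption):
# from kumoai.connector import (
#     SnowflakeConnector, BigQueryConnector, DatabricksConnector)
# registry = {
#     "snowflake": SnowflakeConnector,
#     "bigquery": BigQueryConnector,
#     "databricks": DatabricksConnector,
# }
# connector = load_connector("snowflake", "my_registered_connector", registry)
```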
S3Connector
Connects Kumo to an Amazon S3 object store.
source_type property
Returns DataSourceType
Source Data
Tables accessed from connectors are represented as SourceTable objects, with column metadata represented as SourceColumn objects.
SourceTable
A reference to a table stored behind a backing Connector. Use it to inspect raw data before constructing a Table for ML.
The name of the table in the backing connector.
The connector containing this table.
columns property
Returns List[SourceColumn] — Column metadata for all columns in this table.
column_dict property
Returns Dict[str, SourceColumn] — Column metadata keyed by column name.
head()
Returns the first num_rows rows by reading from the backing connector.
Number of rows to return. Returns all rows if fewer are available.
Returns pd.DataFrame
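Since head() may return fewer rows than requested, it can help to report how many rows actually came back. The hypothetical preview helper below is a sketch of that check; it relies only on len() and the columns attribute of the returned DataFrame, and it passes num_rows as a keyword on the assumption that this is the parameter's name.

```python
def preview(source_table, num_rows=5):
    """Fetch up to `num_rows` rows and summarise what was returned.

    `source_table` is any object with a head() method that returns a
    DataFrame-like result, such as a SourceTable.
    """
    df = source_table.head(num_rows=num_rows)
    return {"rows": len(df), "columns": list(df.columns)}


# Usage sketch (requires a real SourceTable):
# summary = preview(connector.table("orders"), num_rows=10)
# if summary["rows"] < 10:
#     print("table has fewer than 10 rows")
```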
add_llm()
Enriches the table with LLM-generated embeddings, adding a new embedding column.
The embedding model name (e.g. "text-embedding-3-small").
API key for the embedding model provider.
A template string defining which columns to embed (e.g. "{title} {description}").
Output directory path (S3 or local) where the enriched table is written.
Name of the new embedding column in the output table.
Name of the output table.
Embedding dimensionality. Uses the model’s default if not specified.
If True, returns an LLMSourceTableFuture immediately rather than blocking.
Returns Union[SourceTable, LLMSourceTableFuture]
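Before starting an enrichment job, it may be worth checking that every column named in the template exists in the table. The sketch below assumes the template uses Python-style {column} placeholders, as the example "{title} {description}" suggests; the service's actual template syntax may differ. Both helper names are hypothetical.

```python
from string import Formatter


def template_columns(template):
    """Return the placeholder names in `template`, in order, without repeats."""
    names = []
    for _literal, field, _spec, _conversion in Formatter().parse(template):
        if field and field not in names:
            names.append(field)
    return names


def missing_columns(template, available):
    """Return the template placeholders that are not in `available`.

    `available` can be any container of column names, for example the
    dictionary returned by SourceTable.column_dict.
    """
    return [name for name in template_columns(template) if name not in available]


# Usage sketch (requires a real SourceTable):
# missing = missing_columns("{title} {description}", source_table.column_dict)
# if missing:
#     raise ValueError(f"template references unknown columns: {missing}")
```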
SourceTableFuture
Represents an ongoing asynchronous SourceTable generation process.
result()
Blocks until complete and returns the SourceTable.
Returns SourceTable
status()
Returns JobStatus — Current status of the job.
future()
Returns concurrent.futures.Future[SourceTable] — The underlying future object.
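Because future() exposes a standard concurrent.futures.Future, a completion callback can be attached instead of blocking on result(). The following is a minimal sketch; on_table_ready is a hypothetical helper, and it deliberately ignores cancelled or failed jobs.

```python
def on_table_ready(table_future, callback):
    """Call `callback(source_table)` once the job finishes successfully.

    `table_future` is any object whose future() method returns a
    concurrent.futures.Future, such as a SourceTableFuture.
    """
    def _done(fut):
        # exception() raises CancelledError on a cancelled future,
        # so check for cancellation first.
        if fut.cancelled():
            return
        if fut.exception() is None:
            callback(fut.result())

    table_future.future().add_done_callback(_done)


# Usage sketch (requires a real SourceTableFuture):
# on_table_ready(table_future, lambda table: print(table.columns))
```

If the future has already completed when the callback is attached, the standard library invokes it immediately.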
LLMSourceTableFuture
Extends SourceTableFuture with cancellation support for LLM enrichment jobs.
cancel()
Cancels the running LLM enrichment job.
Returns JobStatus
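One way to use cancel() is to bound how long an enrichment job may run. The sketch below waits on the underlying future for a fixed time and cancels the job if it has not finished. The helper name result_or_cancel is hypothetical, and whether cancellation takes effect immediately on the service side is not something this example can confirm.

```python
import concurrent.futures


def result_or_cancel(llm_future, timeout_s):
    """Wait up to `timeout_s` seconds, cancelling the job on timeout.

    Returns (source_table, None) on success, or (None, status) where
    `status` is whatever cancel() returned.
    """
    try:
        return llm_future.future().result(timeout=timeout_s), None
    except concurrent.futures.TimeoutError:
        return None, llm_future.cancel()


# Usage sketch (requires a real LLMSourceTableFuture):
# table, status = result_or_cancel(llm_future, timeout_s=600)
# if table is None:
#     print("enrichment cancelled:", status)
```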
SourceColumn
Metadata for a single column in a SourceTable, including its name and inferred data type.
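A short sketch of reading column metadata follows. The attribute names name and dtype are assumptions based on the description above (a column's name and inferred data type); check the SourceColumn definition in your SDK version before relying on them. describe_columns is a hypothetical helper.

```python
def describe_columns(source_table):
    """Map each column name to its inferred data type, as a string.

    Assumes each SourceColumn exposes `name` and `dtype` attributes;
    these attribute names are not confirmed by this reference.
    """
    return {col.name: str(col.dtype) for col in source_table.columns}


# Usage sketch (requires a real SourceTable):
# for column_name, dtype in describe_columns(source_table).items():
#     print(f"{column_name}: {dtype}")
```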