Once you’ve initialized the SDK, the first step to working with your data is defining a connector to your source tables. The Kumo SDK supports creating connectors to data on Amazon S3 with an S3Connector, Snowflake with a SnowflakeConnector, or Databricks with a DatabricksConnector. Here, we work with data on S3, but equivalent steps can be taken with other supported data warehouses. Connecting multiple tables across multiple connectors is supported (for example, you can use S3 and Snowflake together).
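As a minimal sketch of mixing connectors (the constructor arguments shown here, including root_dir and all Snowflake credentials, are assumptions about the SDK surface and may differ in your SDK version):

```python
import kumoai as kumo

# Connector to data on S3 (root_dir is an assumed parameter name).
s3 = kumo.S3Connector(root_dir="s3://my-bucket/my-dataset/")

# Connector to data on Snowflake (all arguments below are illustrative
# assumptions; consult the SnowflakeConnector reference for the real ones).
snowflake = kumo.SnowflakeConnector(
    name="my_snowflake_connector",
    account="my_account",
    warehouse="my_warehouse",
    database="MY_DB",
)

# Tables from both connectors can then be used together downstream.
```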
Creating a Connector
Creating a connector to a dataset on S3 is as simple as specifying the root directory of your data. Individual source tables behind the connector can then be accessed with its table() method:
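A minimal sketch (the root_dir parameter name, bucket path, and table names are assumptions for illustration; the table() accessor is described above):

```python
import kumoai as kumo

# Point the connector at the root directory containing your source tables,
# e.g., one Parquet/CSV file or directory per table.
connector = kumo.S3Connector(root_dir="s3://my-bucket/my-dataset/")

# Access individual source tables by name:
customer_src = connector.table("customer")
transaction_src = connector.table("transaction")
stock_src = connector.table("stock")
```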
Inspecting Source Tables
The tables customer_src, transaction_src, and stock_src are objects of type SourceTable, which supports basic operations to verify the types and raw data you have connected to Kumo. Examples include viewing a sample of the source data (as a pandas.DataFrame) or viewing the source columns and their data types:
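For example (head() and the columns attribute are assumed accessor names; your SDK version may expose these operations differently):

```python
# View a sample of the raw source data as a pandas.DataFrame
# (assumed accessor; check your SDK version).
df = customer_src.head()
print(df)

# View the source columns and their inferred data types
# (assumed accessor; check your SDK version).
for col in customer_src.columns:
    print(col.name, col.dtype)
```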
For tables with semantically meaningful text columns, Kumo supports a language model integration that allows modeling to utilize powerful large language model embeddings, e.g., from OpenAI’s GPT. Please see add_llm() for more details.
Data Transformations
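A hedged sketch of what this might look like (the argument names below are purely illustrative assumptions, not the documented signature; consult the add_llm() reference):

```python
# Illustrative only: argument names are assumptions, not the documented API.
customer_src.add_llm(
    column="product_review",        # assumed: text column to embed
    model="openai/text-embedding",  # assumed: embedding model identifier
)
```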
In addition to viewing raw source data, you can perform data transformations with your own data platform and use the results directly with the Kumo SDK. For example, with pyspark:
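A minimal sketch (bucket paths and column names are illustrative, and this assumes your Spark session is configured with S3 access):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("kumo-prep").getOrCreate()

# Read a raw source table from S3, derive a new column, and write the
# transformed result back to a location the Kumo connector can read.
transactions = spark.read.parquet("s3://my-bucket/my-dataset/transaction/")
transformed = transactions.withColumn(
    "total_price", F.col("quantity") * F.col("unit_price")
)
transformed.write.mode("overwrite").parquet(
    "s3://my-bucket/my-dataset/transaction_transformed/"
)
```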
Uploading Local Tables
For local files, you can use upload_table() to upload Parquet or CSV files directly to Kumo. Files larger than 1GB are supported by default through automatic partitioning. Once uploaded, tables are accessed via a FileUploadConnector.
upload_table() accepts the following parameters: name (table name), path (local file path), auto_partition (automatically partition files larger than 1GB; default True), and partition_size_mb (partition size in megabytes; default 250).
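Putting this together (a sketch: the module location of upload_table() and the FileUploadConnector constructor arguments are assumptions; the parameter names come from the list above):

```python
import kumoai as kumo

# Upload a local Parquet file to Kumo. Files larger than 1GB are
# automatically split into ~250MB partitions by default.
kumo.upload_table(
    name="customer",
    path="/data/customer.parquet",
    auto_partition=True,    # default
    partition_size_mb=250,  # default
)

# Access the uploaded table through a FileUploadConnector
# (constructor arguments are assumed; check your SDK version).
connector = kumo.FileUploadConnector(file_type="parquet")
customer_src = connector.table("customer")
```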