Kumo reads Apache Parquet tables registered in AWS Glue Catalog and loads the underlying data directly from the S3 locations referenced by each table. UI support is coming soon; for now the connector is available via the SDK.
- Parquet tables only (CSV and other formats are not supported).
- The S3 paths in your Glue table partitions must cover every file you want ingested.
- You must grant access to both the Glue Catalog and the S3 buckets that store the Parquet data.
Access Prerequisites
- Grant Kumo access to the S3 bucket(s) referenced by your Glue tables using the AWS S3 connector policy. Glue provides metadata, but data still flows from S3.
- Add a Glue Catalog resource policy that lets Kumo read the database, tables, and partitions in your account.
- Share the AWS region, Glue database name(s), and S3 buckets and prefixes with Kumo so we can configure the connector.
Granting Access
Update your Glue Catalog resource policy by replacing:
- <region> with the AWS region that contains the catalog.
- <account id> with your AWS account ID (the Glue catalog ID).
- {% $customerId %} with your company name.
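As a rough illustration of how those three placeholders fit into a policy, the sketch below assembles a Glue Catalog resource policy as JSON. The principal is the Kumo role ARN quoted in the Lake Formation section of this page. The list of read-only Glue actions, the resource ARN patterns, and the `build_glue_policy` helper are assumptions made for illustration, not Kumo's published policy; if the policy Kumo gives you differs, use theirs.

```python
import json

# Kumo's AWS account ID, taken from the role ARN quoted on this page.
KUMO_ACCOUNT_ID = "926922431314"

# Assumed set of read-only Glue actions; confirm the exact list with Kumo.
READ_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartition",
    "glue:GetPartitions",
]


def build_glue_policy(region: str, account_id: str, customer_id: str) -> dict:
    """Hypothetical helper: return a Glue Catalog resource policy as a dict."""
    base = f"arn:aws:glue:{region}:{account_id}"
    principal = (
        f"arn:aws:iam::{KUMO_ACCOUNT_ID}:role/"
        f"kumo-{customer_id}-external-shared-iam-role"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": READ_ACTIONS,
                # Assumed scope: the catalog plus every database and table in it.
                "Resource": [
                    f"{base}:catalog",
                    f"{base}:database/*",
                    f"{base}:table/*/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Example values only; substitute your own region, account ID, and name.
    policy = build_glue_policy("us-west-2", "111122223333", "acme")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and applied in the Glue console's catalog settings or with `aws glue put-resource-policy --policy-in-json file://policy.json`. Narrow the `Resource` list to specific databases and tables if you do not want to expose the whole catalog.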
Tables in AWS Lake Formation
If your Glue Catalog tables are governed by AWS Lake Formation, grant the same principal (arn:aws:iam::926922431314:role/kumo-{% $customerId %}-external-shared-iam-role) two sets of permissions: data location access to each S3 path, and table-level SELECT and DESCRIBE permissions. Together these let Glue return partitions and let Kumo read the data.
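The two kinds of grant can be sketched as request arguments for the Lake Formation `GrantPermissions` API. The helper names (`kumo_principal`, `data_location_grant`, `table_grant`) and the example bucket, database, and table names are hypothetical; the block only builds the argument dictionaries and does not call AWS.

```python
def kumo_principal(customer_id: str) -> str:
    """Role ARN pattern quoted on this page; customer_id is your company name."""
    return (
        "arn:aws:iam::926922431314:role/"
        f"kumo-{customer_id}-external-shared-iam-role"
    )


def data_location_grant(principal_arn: str, s3_arn: str) -> dict:
    """Arguments granting data location access to one S3 path."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"DataLocation": {"ResourceArn": s3_arn}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }


def table_grant(principal_arn: str, database: str, table: str) -> dict:
    """Arguments granting table-level SELECT and DESCRIBE on one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT", "DESCRIBE"],
    }


if __name__ == "__main__":
    # Example values only; replace with your own names and paths.
    principal = kumo_principal("acme")
    grants = [
        data_location_grant(principal, "arn:aws:s3:::example-bucket/events"),
        table_grant(principal, "analytics", "events"),
    ]
    for grant in grants:
        print(grant)
```

With boto3 installed and credentials for a Lake Formation administrator, each dictionary could be passed as `boto3.client("lakeformation").grant_permissions(**grant)`. The S3 path generally needs to be registered as a Lake Formation data location before the data location grant succeeds.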