Kumo reads Apache Parquet tables registered in the AWS Glue Catalog and loads the underlying data directly from the S3 locations referenced by each table. UI support is coming soon; for now, the connector is available only via the SDK.
  • Parquet tables only (CSV and other formats are not supported).
  • The S3 paths in your Glue table partitions must cover every file you want ingested.
  • You must grant access to both the Glue Catalog and the S3 buckets that store the Parquet data.
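
As a rough pre-check for the Parquet-only requirement, the sketch below inspects a Glue table description and guesses whether it is Parquet-backed. It assumes the input has the shape of the `Table` object returned by the Glue `GetTable` API; the helper names `looks_like_parquet` and `table_location` are hypothetical, and this is a heuristic, not the validation Kumo itself performs.

```python
# Heuristic only: these are the Hive SerDe and input format class names
# commonly recorded for Parquet tables in the Glue Catalog.
PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
PARQUET_INPUT_FORMAT = (
    "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
)


def looks_like_parquet(table: dict) -> bool:
    """Return True if a Glue table description appears to be Parquet-backed.

    `table` is assumed to look like the "Table" object in a GetTable response.
    """
    sd = table.get("StorageDescriptor", {})
    serde = sd.get("SerdeInfo", {}).get("SerializationLibrary", "")
    classification = table.get("Parameters", {}).get("classification", "")
    return (
        serde == PARQUET_SERDE
        or sd.get("InputFormat", "") == PARQUET_INPUT_FORMAT
        or classification.lower() == "parquet"
    )


def table_location(table: dict) -> str:
    """Return the table's S3 location, or an empty string if none is set."""
    return table.get("StorageDescriptor", {}).get("Location", "")
```

The location returned by `table_location` is also the S3 path that the bucket policy needs to cover.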

Access Prerequisites

  1. Grant Kumo access to the S3 bucket(s) referenced by your Glue tables using the AWS S3 connector policy. Glue provides metadata, but data still flows from S3.
  2. Add a Glue Catalog resource policy that lets Kumo read the database, tables, and partitions in your account.
  3. Share the AWS region, Glue database name(s), and S3 bucket/prefixes with Kumo so we can configure the connector.

Granting Access

Update your Glue Catalog resource policy with the statement below, replacing:
  • <region> with your AWS region containing the catalog.
  • <account id> with your AWS account ID (the Glue catalog ID).
  • {% $customerId %} with your company name.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KumoGlueCatalogReadPermissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::926922431314:role/kumo-{% $customerId %}-external-shared-iam-role"
      },
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions",
        "glue:GetPartitionIndexes",
        "glue:ListSchemas",
        "glue:ListSchemaVersions",
        "glue:GetSchema"
      ],
      "Resource": [
        "arn:aws:glue:<region>:<account id>:catalog",
        "arn:aws:glue:<region>:<account id>:database/*",
        "arn:aws:glue:<region>:<account id>:schema/*",
        "arn:aws:glue:<region>:<account id>:table/*",
        "arn:aws:glue:<region>:<account id>:registry/*"
      ]
    }
  ]
}
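
To avoid hand-editing the placeholders, the policy above can be rendered from a small template. The following is a minimal sketch using only the Python standard library; the function name `render_glue_policy` and the parameter `customer_id` (standing in for `{% $customerId %}`) are illustrative assumptions, and the example values in the usage line are placeholders.

```python
import json

# Kumo's AWS account ID, as it appears in the principal ARN above.
KUMO_ACCOUNT_ID = "926922431314"

GLUE_READ_ACTIONS = [
    "glue:GetDatabase", "glue:GetDatabases",
    "glue:GetTable", "glue:GetTables",
    "glue:GetPartitions", "glue:GetPartitionIndexes",
    "glue:ListSchemas", "glue:ListSchemaVersions", "glue:GetSchema",
]

RESOURCE_SUFFIXES = ["catalog", "database/*", "schema/*", "table/*", "registry/*"]


def render_glue_policy(region: str, account_id: str, customer_id: str) -> dict:
    """Build the Glue Catalog resource policy with placeholders filled in."""
    # AWS account IDs are 12-digit numbers; fail early on an obvious typo.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit AWS account ID")
    principal = (
        f"arn:aws:iam::{KUMO_ACCOUNT_ID}:role/"
        f"kumo-{customer_id}-external-shared-iam-role"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KumoGlueCatalogReadPermissions",
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": list(GLUE_READ_ACTIONS),
                "Resource": [
                    f"arn:aws:glue:{region}:{account_id}:{suffix}"
                    for suffix in RESOURCE_SUFFIXES
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own region, account ID, and name.
    print(json.dumps(render_glue_policy("us-west-2", "123456789012", "acme"), indent=2))
```

The resulting JSON string can be pasted into the Glue console or supplied to the Glue `PutResourcePolicy` API (for example, as `PolicyInJson` in boto3). That call sets the catalog's resource policy, so if a policy already exists, merge this statement into it rather than overwriting.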

Tables in AWS Lake Formation

If your Glue Catalog tables are governed by AWS Lake Formation, grant the same principal (arn:aws:iam::926922431314:role/kumo-{% $customerId %}-external-shared-iam-role) two sets of permissions: data location access to each S3 path, and table-level SELECT and DESCRIBE permissions on each table. Together these let Glue return partitions and let Kumo read the data.
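
The Lake Formation grants described above can be expressed as request payloads shaped like the parameters of the Lake Formation `GrantPermissions` API. This is a sketch under assumptions: the helper name `lake_formation_grants` is hypothetical, the principal ARN is passed in with the customer placeholder already filled, and S3 locations are given as ARNs of the form `arn:aws:s3:::bucket/prefix`.

```python
def lake_formation_grants(principal_arn, tables, s3_location_arns):
    """Build GrantPermissions-style request dicts for the Kumo principal.

    tables: iterable of (database_name, table_name) pairs.
    s3_location_arns: iterable of S3 ARNs registered as data locations.
    """
    principal = {"DataLakePrincipalIdentifier": principal_arn}
    grants = []

    # Data location access for each S3 path that stores table data.
    for arn in s3_location_arns:
        grants.append({
            "Principal": principal,
            "Resource": {"DataLocation": {"ResourceArn": arn}},
            "Permissions": ["DATA_LOCATION_ACCESS"],
        })

    # Table-level SELECT and DESCRIBE for each table Kumo should read.
    for database_name, table_name in tables:
        grants.append({
            "Principal": principal,
            "Resource": {
                "Table": {"DatabaseName": database_name, "Name": table_name}
            },
            "Permissions": ["SELECT", "DESCRIBE"],
        })

    return grants
```

With boto3 installed and credentials configured, each dict could be applied with `boto3.client("lakeformation").grant_permissions(**grant)`; the same grants can also be made in the Lake Formation console.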