KumoRFM uses two complementary type systems: Data Types (Dtype) for physical storage and Semantic Types (Stype) for semantic meaning.

Data Types (Dtype)

Data types represent how data is physically stored and processed:
from kumoai import Dtype

# Numerical types
Dtype.bool      # Boolean values
Dtype.int       # Integer values
Dtype.float     # Floating point values

# String types
Dtype.string    # Text data
Dtype.binary    # Binary data

# Temporal types
Dtype.date      # Date/timestamp
Dtype.time      # Time values
Dtype.timedelta # Time differences

# List types
Dtype.floatlist   # Lists of floats (embeddings/sequences)
Dtype.intlist     # Lists of integers
Dtype.stringlist  # Lists of strings

Dtype Mapping

When you construct a LocalTable, each pandas dtype is automatically mapped to the corresponding Kumo dtype. You can read the dtype of any column in a LocalTable, but you cannot modify it. To change a data type, cast the column in the underlying pandas.DataFrame before creating the table:
import pandas as pd
import kumoai.experimental.rfm as rfm

df = pd.DataFrame({'user_id': [1, 2, 3]})
table = rfm.LocalTable(df, name="users")

print(table["user_id"].dtype)
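Because the dtype is fixed once the table exists, any casting has to happen in pandas first. The following is a minimal sketch of that step using standard pandas calls only; the `signup` column and the sample values are hypothetical, and the final `LocalTable` call is left as a comment so the snippet runs without kumoai installed.

```python
import pandas as pd

# Hypothetical raw data: both columns arrive as strings (object dtype).
df = pd.DataFrame({
    "user_id": ["1", "2", "3"],
    "signup": ["2024-01-05", "2024-02-11", "2024-03-20"],
})

# Cast in pandas first; the table's dtypes are inferred from the frame.
df["user_id"] = df["user_id"].astype("int64")
df["signup"] = pd.to_datetime(df["signup"])

print(df.dtypes)

# Then build the table from the corrected frame, as in the example above:
# table = rfm.LocalTable(df, name="users")
```

Casting before construction keeps the DataFrame as the single place where storage types are decided.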

Semantic Types (Stype)

Semantic types define the meaning of data and determine how each column is processed within the model:
from kumoai import Stype

# Core semantic types
Stype.numerical        # Numerical values for mathematical operations
Stype.categorical      # Discrete categories with limited cardinality
Stype.multicategorical # Multiple categories in single field
Stype.ID               # Unique identifiers
Stype.text             # Natural language text
Stype.timestamp        # Date/time information
Stype.sequence         # Embeddings or sequential data

Dtype-Stype Compatibility

Not all combinations are valid. Check compatibility using:
from kumoai import Dtype, Stype

# Check if a semantic type supports a data type
stype = Stype.categorical
dtype = Dtype.string

is_compatible = stype.supports_dtype(dtype)
print(f"{stype} supports {dtype}: {is_compatible}")

# Get default semantic type for a data type
default_stype = dtype.default_stype
print(f"Default stype for {dtype}: {default_stype}")
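The same `supports_dtype` check can be applied to several candidate semantic types at once. The helper below is a minimal sketch: `compatible_stypes` is a hypothetical name that is not part of kumoai, and `FakeStype` is a stand-in used only so the snippet runs without the library. The stand-in encodes just two rules stated in this guide (integers support ID, integers do not support text) and is not a complete compatibility table.

```python
def compatible_stypes(dtype, candidates):
    """Return the candidates whose supports_dtype(dtype) check passes."""
    return [stype for stype in candidates if stype.supports_dtype(dtype)]


class FakeStype:
    """Stand-in for a kumoai Stype member (assumption, demonstration only)."""

    def __init__(self, name, dtypes):
        self.name = name
        self._dtypes = set(dtypes)

    def supports_dtype(self, dtype):
        return dtype in self._dtypes


# Only pairs named in this guide are encoded; the real rules are broader.
ID = FakeStype("ID", {"int"})
text = FakeStype("text", {"string"})

names = [s.name for s in compatible_stypes("int", [ID, text])]
print(names)
```

With the real library, the equivalent call would presumably be `compatible_stypes(Dtype.int, [Stype.ID, Stype.text])`, since the helper relies only on the `supports_dtype` method shown above.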

Stype Assignment and Encoding

Semantic types can be modified after table creation, provided they are compatible with the underlying data type. The semantic type determines how values are encoded and processed by the foundation model:
# Valid stype modifications (compatible with the underlying dtype).
# Assumes the table also contains string columns 'category' and 'description'.
table['user_id'].stype = 'ID'           # Dtype.int -> Stype.ID
table['category'].stype = 'categorical' # Dtype.string -> Stype.categorical
table['description'].stype = 'text'     # Dtype.string -> Stype.text

# Invalid: Dtype.int does not support Stype.text, so this assignment is not allowed
table['user_id'].stype = 'text'
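One way to avoid an invalid assignment is to check compatibility before setting the semantic type. The sketch below assumes a column object exposes `dtype` and `stype` attributes, as in the examples above, and that `stype` can be assigned an Stype member as well as a string. `set_stype_if_supported` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the snippet runs without kumoai.

```python
from types import SimpleNamespace


def set_stype_if_supported(column, stype):
    """Assign stype only when it supports the column's dtype.

    Returns True if the assignment was made, False otherwise.
    """
    if stype.supports_dtype(column.dtype):
        column.stype = stype
        return True
    return False


# Stand-ins for an Stype member and a table column (not the real classes).
text = SimpleNamespace(supports_dtype=lambda dtype: dtype == "string")
user_id = SimpleNamespace(dtype="int", stype="ID")
description = SimpleNamespace(dtype="string", stype=None)

changed_user_id = set_stype_if_supported(user_id, text)
changed_description = set_stype_if_supported(description, text)
print(changed_user_id, changed_description)
```

The integer column keeps its existing semantic type, while the string column is updated. Returning a boolean lets the caller decide whether to log or raise on an incompatible request.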
For detailed information about how different semantic types affect column preprocessing and encoding, please refer to the Column Preprocessing Guide.