Solution Background and Business Value

Businesses invest heavily in marketing campaigns to generate leads for their sales teams. These campaigns can produce thousands of leads daily, making it impossible for sales teams to follow up on every lead effectively. To maximize conversion rates, it is crucial to prioritize leads that are most likely to convert.

Most companies rely on third-party lead scores, which are often not tuned to their specific business and do not use all of their internal data. As a result, these scores can be inaccurate, sometimes ranking leads no better than random selection.

With Kumo AI, businesses can train a predictive model on all of their internal structured data and generate lead scores tailored to their own conversion patterns, improving sales efficiency and conversion rates.

Data Requirements and Kumo Graph Schema

We start with a core set of tables that capture lead interactions and marketing responses. Over time, additional signals can be incorporated for better predictions.

Core Tables

  1. Leads Table

    • Stores information about each lead.

    • Key attributes:

      • lead_id: Unique identifier for each lead.

      • Optional: Title, industry, location, engagement level.

  2. Triggers Table

    • Tracks lead responses to marketing campaigns.

    • Key attributes:

      • lead_id: Links to a lead.

      • timestamp: Time of the response.

      • Optional: Campaign type, engagement channel.

  3. Events Table

    • Captures lead interactions across various channels.

    • Key attributes:

      • lead_id: Links to a lead.

      • timestamp: Time of the interaction.

      • Optional: Event type (email open, meeting scheduled, purchase made).
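As a rough illustration of the minimum schema described above, the sketch below builds tiny in-memory versions of the three core tables and checks that every trigger and event row links to a known lead. The sample rows, the `type` and `source` column names (borrowed from the predictive query later in this guide), and the helper `orphaned_rows` are hypothetical; real tables would live in your data warehouse or object store.

```python
from datetime import datetime

# Hypothetical sample rows; only lead_id and timestamp are required columns.
leads = [
    {"lead_id": 1, "industry": "retail"},
    {"lead_id": 2, "industry": "finance"},
]
triggers = [
    {"lead_id": 1, "timestamp": datetime(2024, 1, 1, 9, 0), "campaign_type": "email"},
    {"lead_id": 2, "timestamp": datetime(2024, 1, 1, 10, 30), "campaign_type": "webinar"},
]
events = [
    {"lead_id": 1, "timestamp": datetime(2024, 1, 1, 15, 0), "type": "email_open", "source": "marketing"},
    {"lead_id": 3, "timestamp": datetime(2024, 1, 2, 8, 0), "type": "conversion", "source": "sales"},
]


def orphaned_rows(rows, known_lead_ids):
    """Return rows whose lead_id does not match any row in the leads table."""
    return [row for row in rows if row["lead_id"] not in known_lead_ids]


known_lead_ids = {lead["lead_id"] for lead in leads}
orphaned_triggers = orphaned_rows(triggers, known_lead_ids)
orphaned_events = orphaned_rows(events, known_lead_ids)

print(f"Orphaned triggers: {len(orphaned_triggers)}")
print(f"Orphaned events: {len(orphaned_events)}")
```

In this sample, the event for lead_id 3 is flagged because no such lead exists. Rows like this are worth resolving before the tables are connected.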

Additional Tables (Optional Enhancements)

  • Additional Event Tables: Web logs, CRM activity, chat interactions.

  • Organizations Table: Attributes of companies associated with leads.

  • Sales Rep Table: Information about the sales team members interacting with leads.

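Entity Relationship Diagram (ERD)

The leads table is the central entity. The triggers and events tables each reference it through their lead_id column, with many rows per lead.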

Predictive Queries

To prioritize leads, we predict whether a lead that responded to a marketing campaign within the past day will convert in the next N days, where N is the prediction horizon you choose. The model trains only on leads that the sales team followed up with within one day, which keeps it from treating leads that were never contacted as negative examples.

PREDICT COUNT(events.* WHERE events.type = 'conversion', 0, N, days) > 0
FOR EACH leads.lead_id
WHERE COUNT(triggers.*, -1, 0, days) > 0
ASSUMING COUNT(events.* WHERE events.source = 'sales', 0, 1, days) > 0

At prediction time, the model scores all potential leads assuming they will receive outreach from the sales team.
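To make the query's semantics concrete, the following plain-Python sketch computes the training label for one lead at a single anchor time. It only illustrates the logic and is not how Kumo evaluates queries internally. The function names `in_window` and `label_for_lead`, the event dictionaries, and the window convention (start exclusive, end inclusive) are assumptions made for this example.

```python
from datetime import datetime, timedelta


def in_window(ts, anchor, start_days, end_days):
    """True if ts falls in (anchor + start_days, anchor + end_days].

    Assumption: windows exclude the start and include the end.
    """
    return anchor + timedelta(days=start_days) < ts <= anchor + timedelta(days=end_days)


def label_for_lead(trigger_times, events, anchor, horizon_days):
    """Return True/False for a training example, or None if the lead is excluded."""
    # WHERE: the lead responded to a campaign in the day before the anchor time.
    if not any(in_window(ts, anchor, -1, 0) for ts in trigger_times):
        return None
    # ASSUMING: sales followed up within one day after the anchor time.
    if not any(
        e["source"] == "sales" and in_window(e["timestamp"], anchor, 0, 1)
        for e in events
    ):
        return None
    # PREDICT: at least one conversion event within the next horizon_days.
    return any(
        e["type"] == "conversion" and in_window(e["timestamp"], anchor, 0, horizon_days)
        for e in events
    )


# Hypothetical lead: responded on Jan 9, contacted by sales on Jan 10, converted on Jan 15.
anchor = datetime(2024, 1, 10)
trigger_times = [datetime(2024, 1, 9, 18, 0)]
events = [
    {"timestamp": datetime(2024, 1, 10, 11, 0), "type": "call", "source": "sales"},
    {"timestamp": datetime(2024, 1, 15, 9, 0), "type": "conversion", "source": "web"},
]
label = label_for_lead(trigger_times, events, anchor, horizon_days=7)
print(label)
```

With a 7-day horizon this lead is a positive example. With a 3-day horizon the same lead would be a negative example, and without the sales follow-up it would be left out of training altogether.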

Building Models in the Kumo SDK

1. Initialize the Kumo SDK

import kumoai as kumo

# Replace <customer_id> with your deployment's ID and set API_KEY to your Kumo API key.
kumo.init(url="https://<customer_id>.kumoai.cloud/api", api_key=API_KEY)

2. Connect data

connector = kumo.S3Connector("s3://your-dataset-location/")

3. Select tables

# Leads is the entity table, identified by its primary key.
leads = kumo.Table.from_source_table(
    source_table=connector.table('leads'),
    primary_key='lead_id',
).infer_metadata()

# Triggers and events are time-stamped tables; each row links to a lead via lead_id.
triggers = kumo.Table.from_source_table(
    source_table=connector.table('triggers'),
    time_column='timestamp',
).infer_metadata()

events = kumo.Table.from_source_table(
    source_table=connector.table('events'),
    time_column='timestamp',
).infer_metadata()

4. Create graph schema

graph = kumo.Graph(
    tables={
        'leads': leads,
        'triggers': triggers,
        'events': events,
    },
    edges=[
        # Each edge links a foreign key in src_table to the primary key of dst_table.
        dict(src_table='triggers', fkey='lead_id', dst_table='leads'),
        dict(src_table='events', fkey='lead_id', dst_table='leads'),
    ],
)

graph.validate(verbose=True)

5. Train the model

# N is a placeholder: replace it with the prediction horizon (in days) before running.
pquery = kumo.PredictiveQuery(
    graph=graph,
    query="""
    PREDICT COUNT(events.* WHERE events.type = 'conversion', 0, N, days) > 0
    FOR EACH leads.lead_id
    WHERE COUNT(triggers.*, -1, 0, days) > 0
    ASSUMING COUNT(events.* WHERE events.source = 'sales', 0, 1, days) > 0
    """
)

pquery.validate(verbose=True)

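# Start from the model plan suggested for this query.
model_plan = pquery.suggest_model_plan()
trainer = kumo.Trainer(model_plan)

# Training table generation is launched without blocking; fit() waits for training to finish.
training_job = trainer.fit(
    graph=graph,
    train_table=pquery.generate_training_table(non_blocking=True),
    non_blocking=False,
)
print(f"Training metrics: {training_job.metrics()}")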
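The `N` in the query is a placeholder for the prediction horizon, so it has to be replaced with a concrete number of days before the query is validated. One possible way to do that is a small helper that fills in the horizon. The name `build_lead_scoring_query` and the 30-day value are hypothetical choices for illustration.

```python
def build_lead_scoring_query(horizon_days: int) -> str:
    """Return the lead-scoring query text with a concrete horizon in days."""
    if horizon_days <= 0:
        raise ValueError("horizon_days must be a positive integer")
    return (
        f"PREDICT COUNT(events.* WHERE events.type = 'conversion', 0, {horizon_days}, days) > 0\n"
        "FOR EACH leads.lead_id\n"
        "WHERE COUNT(triggers.*, -1, 0, days) > 0\n"
        "ASSUMING COUNT(events.* WHERE events.source = 'sales', 0, 1, days) > 0"
    )


# 30 days is an arbitrary example horizon, not a recommendation.
query_text = build_lead_scoring_query(30)
print(query_text)
```

The resulting string can be passed as the `query` argument of `kumo.PredictiveQuery` in place of the literal query text.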

Deployment Strategy

In production, the model scores leads in a batch process using the latest available data. These scores are stored in the data warehouse and integrated with CRM platforms like Salesforce to guide sales teams in prioritizing outreach.
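Once scores are available in the warehouse, turning them into a daily outreach list can be as simple as sorting. The sketch below assumes scores have already been exported as `(lead_id, score)` pairs. The function `top_leads`, the sample scores, and the capacity of two leads are hypothetical.

```python
def top_leads(scores, capacity):
    """Return the `capacity` highest-scoring lead IDs, best first."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return [lead_id for lead_id, _ in ranked[:capacity]]


# Hypothetical batch output: (lead_id, predicted conversion probability).
scores = [(101, 0.12), (102, 0.87), (103, 0.45), (104, 0.66)]
call_list = top_leads(scores, capacity=2)
print(call_list)
```

Setting `capacity` to the number of leads the sales team can realistically contact per day keeps the list focused on the leads most likely to convert.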