What Tables Should I Link?
In order to choose which tables to link together, you should think about the main entities in your enterprise and how they relate to/interact with each other, the same way you would design the data model for your data warehouse/BI tooling. To connect table(s) into a graph, at least one table should contain a primary key. For example, in an e-commerce graph, you may have:- Many Shoppers, each of which may have one or more sessions
- Within those sessions would be one or more web interaction events such as Clicks, Add-to-Carts, Purchases, etc. usually done in relation to products
- Products could be sold by different Merchants
- Shoppers (PK = Shopper ID)
- Products (PK = Product ID)
- Merchants (PK = Merchant ID)
- Sessions (Time Column = Session End Time and an ID Column for Shopper)
- Either a single events table or separate tables for clicks, add-to-carts, and purchases (each with Time Column = Time of Event, and where there are ID columns for the relevant shoppers, products, and merchants)
- In the case of frequently changing data for shoppers/products/merchants, you can also include tables for them where each fact represents a snapshot of the changing data at a particular point in time, with appropriate ID columns
-
At a minimum, you would need to include two tables:
- A table with a row per unique customer, with a customer ID Primary Key column.
- A table with a row per historical purchase event, with at least a time column for the time of the purchase, a customer ID FK column linking it to the customer dimension table, and a column for the numeric value of each purchase.