Raw Vault Entities
The Raw Vault consists primarily of three types of conceptual entities, which can be extended to create further entity types:
Hubs
Satellites
Links
Hubs: Hubs represent business entities in a Data Vault and store the business keys (unique identifiers) of those entities. A Hub has the same granularity and semantic meaning as the corresponding source-system entity, i.e., entities are not denormalized.
For example, a Customer object in the source system is represented by a CustomersHub in the Data Vault, and only its business key, CustomerID, is stored in this Hub, once per distinct customer.
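As a minimal sketch of what "only the business key, stored uniquely" means, the snippet below reduces a hypothetical Customer extract to the rows a CustomersHub would hold. The column names and sample values are illustrative assumptions, and plain Python lists stand in for database tables.

```python
# Hypothetical extract of a source Customer table; column names are illustrative.
source_customers = [
    {"CustomerID": "C001", "Name": "Acme Ltd", "Phone": "555-0100"},
    {"CustomerID": "C002", "Name": "Globex", "Phone": "555-0101"},
    # Same customer again with changed attributes: the Hub still gets only one row.
    {"CustomerID": "C001", "Name": "Acme Limited", "Phone": "555-0199"},
]


def hub_business_keys(rows, business_key):
    """Return each distinct business key once, in first-seen order.

    Descriptive columns (Name, Phone) are deliberately dropped; they belong in a Satellite.
    """
    return list(dict.fromkeys(row[business_key] for row in rows))


customers_hub = hub_business_keys(source_customers, "CustomerID")
print(customers_hub)  # ['C001', 'C002']
```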
Links: Link tables store relationships between two or more Hubs (business entities). These relationships may arise from transactions, associations, or hierarchies. All relationships in Data Vault Link tables are modelled as many-to-many, even when the source system enforces a one-to-many relationship, so the model does not have to change if the cardinality changes later.
For example, consider two source system objects, Employees and Department. The Employees table holds a foreign key to the Department table that identifies the department each employee works in.
When converting to a Data Vault, both tables become Hub tables, EmployeeHub and DepartmentHub, and the relationship between them becomes a Link table that holds the keys of both Hubs.
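The Employees/Department conversion can be sketched with in-memory data as follows. The sample rows are hypothetical, and for readability the sketch pairs the business keys directly; in the roles described later, the Link instead stores the Hub Hash Keys.

```python
# Hypothetical source tables: Employees carries a foreign key to Department.
departments = [
    {"DepartmentID": "D10", "DepartmentName": "Finance"},
    {"DepartmentID": "D20", "DepartmentName": "Engineering"},
]
employees = [
    {"EmployeeID": "E1", "Name": "Ana", "DepartmentID": "D10"},
    {"EmployeeID": "E2", "Name": "Ben", "DepartmentID": "D20"},
    {"EmployeeID": "E3", "Name": "Chen", "DepartmentID": "D20"},
]

# Each source table becomes a Hub holding only its own business keys.
employee_hub = sorted({e["EmployeeID"] for e in employees})
department_hub = sorted({d["DepartmentID"] for d in departments})

# The foreign key leaves EmployeeHub and becomes a Link row pairing both keys.
employee_department_link = sorted({(e["EmployeeID"], e["DepartmentID"]) for e in employees})

print(employee_hub)              # ['E1', 'E2', 'E3']
print(department_hub)            # ['D10', 'D20']
print(employee_department_link)  # [('E1', 'D10'), ('E2', 'D20'), ('E3', 'D20')]
```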
Satellites: Satellites store descriptive attributes, or context, of a business entity or relationship. A Satellite is therefore attached either to a Hub, to store the attributes of a business entity, or to a Link, to store the attributes of that relationship. Satellites keep the full history of these attributes, so every change is tracked and stored within the Satellite. Unlike dimensional models, where the slowly changing dimension (SCD) type determines how much history is kept, Satellites offer no such choice: they store 100% of the history and are completely auditable.
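A minimal sketch of this insert-only history tracking follows, assuming a Satellite is a plain Python list of (business key, load timestamp, attributes) tuples. It keys rows by business key rather than a hash key to stay short, and the change check is a simple illustration, not how Astera Data Stack implements loading.

```python
from datetime import datetime

# Satellite rows: (business_key, load_ts, attributes). Rows are only ever appended.
satellite = []


def latest_attributes(sat, key):
    """Return the most recently loaded attribute set for a key, or None if unseen."""
    versions = [row for row in sat if row[0] == key]
    return max(versions, key=lambda r: r[1])[2] if versions else None


def load_satellite(sat, key, attributes, load_ts):
    """Append a new version only if the attributes differ from the latest one.

    Older versions are never updated or deleted, so the full history is kept.
    """
    if latest_attributes(sat, key) != attributes:
        sat.append((key, load_ts, dict(attributes)))


load_satellite(satellite, "C001", {"Phone": "555-0100"}, datetime(2024, 1, 1))
load_satellite(satellite, "C001", {"Phone": "555-0100"}, datetime(2024, 2, 1))  # unchanged: no new row
load_satellite(satellite, "C001", {"Phone": "555-0199"}, datetime(2024, 3, 1))  # changed: new version

for row in satellite:
    print(row[0], row[1].date(), row[2])
# C001 2024-01-01 {'Phone': '555-0100'}
# C001 2024-03-01 {'Phone': '555-0199'}
```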
Depending on the source system, or on how often attributes change, a Satellite can also be split into several Satellites. Therefore, more than one Satellite can be attached to a Hub or Link.
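Splitting by rate of change can be pictured as projecting each source row into one attribute set per Satellite. The grouping and Satellite names below are hypothetical choices for illustration only.

```python
# Hypothetical split of customer attributes by how often they are expected to change.
SATELLITE_COLUMNS = {
    "CustomerDetailsSat": ("Name", "DateOfBirth"),  # rarely changes
    "CustomerContactSat": ("Phone", "Email"),       # changes more often
}


def split_for_satellites(row, satellite_columns):
    """Project one source row into a separate attribute set per Satellite."""
    return {
        sat_name: {col: row[col] for col in cols}
        for sat_name, cols in satellite_columns.items()
    }


source_row = {
    "CustomerID": "C001",
    "Name": "Dana Smith",
    "DateOfBirth": "1990-04-02",
    "Phone": "555-0100",
    "Email": "dana@example.com",
}
parts = split_for_satellites(source_row, SATELLITE_COLUMNS)
print(parts["CustomerContactSat"])  # {'Phone': '555-0100', 'Email': 'dana@example.com'}
```

With this split, a changed phone number adds a row only to the contact Satellite, and the slowly changing details Satellite is not touched.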
Data Vault Roles in Astera Data Stack
Hub roles:
Hub entities usually have 4 main roles:
Business Key
The Business Key role is given to the business object's unique identifier, which in many cases is the primary key in the source system, for example, CustomerID.
Primary Hash Key
Data Vault does not use the business object's identifier to identify records. Instead, it uses a hash key, calculated from the business key, as the primary key of the Hub table.
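A minimal sketch of deriving a Hub hash key from a business key is shown below. The source does not say which hash function or normalization Astera Data Stack uses; MD5 over the trimmed, upper-cased key is an assumption chosen because it is a common convention.

```python
import hashlib


def hub_hash_key(business_key: str) -> str:
    """Hash a business key into a fixed-length hex string.

    Assumptions: MD5 as the hash function, and trim + upper-case normalization
    so that formatting differences in the source do not produce different keys.
    """
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


print(hub_hash_key("C001") == hub_hash_key(" c001 "))  # True: same key after normalization
print(len(hub_hash_key("C001")))                       # 32
```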
Record Source
This role stores information about where the data is coming from.
Record Date time Stamp
This is the date and time at which the business key was first seen by the Data Vault.
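Putting the four Hub roles together, the sketch below builds Hub rows in a dictionary keyed by the Primary Hash Key. The column names (CustomerHashKey, RecordSource, LoadDate) and the MD5 hashing are hypothetical; the point is that a key already in the Hub is not reloaded, so its Record Source and timestamp stay those of the first sighting.

```python
import hashlib
from datetime import datetime


def hub_hash_key(business_key):
    # Assumed convention: MD5 over the trimmed, upper-cased business key.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


customers_hub = {}  # keyed by the Primary Hash Key


def load_hub(hub, business_key, record_source, load_ts):
    """Insert a Hub row with the four roles; known keys keep their first-seen values."""
    hk = hub_hash_key(business_key)
    if hk not in hub:
        hub[hk] = {
            "CustomerHashKey": hk,          # Primary Hash Key
            "CustomerID": business_key,     # Business Key
            "RecordSource": record_source,  # Record Source
            "LoadDate": load_ts,            # Record Date Time Stamp
        }


load_hub(customers_hub, "C001", "CRM", datetime(2024, 1, 1))
load_hub(customers_hub, "C001", "ERP", datetime(2024, 5, 1))  # already known: ignored

row = next(iter(customers_hub.values()))
print(row["RecordSource"], row["LoadDate"].date())  # CRM 2024-01-01
```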
Link roles:
Links have 4 main types of roles:
Hub Hash key
The Hub Hash Key role is given to the incoming foreign keys from the Hubs that the Link connects. For example, a Link between Employees and Department has two Hub Hash Keys, one from the Employees Hub and the other from the Department Hub.
Primary Hash
This is the unique identifier of the Link itself. It is calculated by hashing the concatenated business keys of the Hubs involved in the Link. This field is used as the primary key of the Link table.
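The two Link hash roles can be sketched as follows. MD5, the trim + upper-case normalization, the `||` delimiter, and the column names are all assumptions, not Astera Data Stack's documented behaviour. The delimiter matters because without it the keys ("AB", "C") and ("A", "BC") would concatenate to the same string and collide.

```python
import hashlib

DELIMITER = "||"  # assumed separator between concatenated business keys


def hash_keys(*business_keys):
    """MD5 over the normalized business keys joined in a fixed order (assumed convention)."""
    normalized = DELIMITER.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


employee_hk = hash_keys("E1")      # Hub Hash Key coming from EmployeeHub
department_hk = hash_keys("D10")   # Hub Hash Key coming from DepartmentHub
link_hk = hash_keys("E1", "D10")   # Primary Hash of the Link itself

link_row = {
    "EmployeeDepartmentHashKey": link_hk,  # Primary Hash
    "EmployeeHashKey": employee_hk,        # Hub Hash Key
    "DepartmentHashKey": department_hk,    # Hub Hash Key
}
print(hash_keys("AB", "C") == hash_keys("A", "BC"))  # False
```

Because the concatenation order changes the hash, every load has to concatenate the Hub keys in the same order.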
Record Source
This role stores information about where the data is coming from.
Record date time stamp
This is the date and time at which the relationship was first seen by the Data Vault.
Links can also have one more role, the Transaction Key role, which will be discussed in a separate article.
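The Record Source and Record date time stamp roles of a Link behave like those of a Hub: each relationship is stored once, stamped with the load that first saw it. In this hypothetical sketch (business keys used instead of hashes for readability), an employee moving to another department adds a second Link row and keeps the first, which is why Links are many-to-many.

```python
from datetime import datetime

# Keyed by (employee_key, department_key); business keys stand in for hash keys.
employee_department_link = {}


def load_link(link_table, employee_key, department_key, record_source, load_ts):
    """Record a relationship once, stamped with the time it was first seen."""
    pair = (employee_key, department_key)
    if pair not in link_table:
        link_table[pair] = {"RecordSource": record_source, "LoadDate": load_ts}


load_link(employee_department_link, "E1", "D10", "HR", datetime(2024, 1, 1))
load_link(employee_department_link, "E1", "D10", "HR", datetime(2024, 2, 1))  # known: not reloaded
load_link(employee_department_link, "E1", "D20", "HR", datetime(2024, 3, 1))  # move: new row, old kept

print(sorted(employee_department_link))  # [('E1', 'D10'), ('E1', 'D20')]
```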
Satellites roles:
Satellites have 5 main types of roles:
Primary Hash Key
The Primary Hash Key role is given to the identifying foreign key from the parent Hub table, which is the hashed representation of the business key from the source system. Together with the Record date time stamp field, this field forms the composite primary key of the Satellite table.
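The composite primary key can be illustrated with a dictionary keyed by the tuple (hash key, load timestamp): one business key may own many Satellite rows, but never two with the same timestamp. The MD5 hashing and the function names are assumptions for illustration.

```python
import hashlib
from datetime import datetime


def hub_hash_key(business_key):
    # Assumed convention: MD5 over the trimmed, upper-cased business key.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


customer_sat = {}  # keyed by the composite primary key (hash key, load timestamp)


def insert_sat_row(sat, business_key, load_ts, attributes):
    """Insert a Satellite row, rejecting a duplicate composite primary key."""
    pk = (hub_hash_key(business_key), load_ts)
    if pk in sat:
        raise KeyError(f"duplicate satellite key: {pk}")
    sat[pk] = dict(attributes)


insert_sat_row(customer_sat, "C001", datetime(2024, 1, 1), {"Phone": "555-0100"})
# Same hash key with a new timestamp is a new, valid version.
insert_sat_row(customer_sat, "C001", datetime(2024, 3, 1), {"Phone": "555-0199"})
```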
Attributes
The Attribute role is given to fields containing descriptive information. For example, a Customers Satellite table might have a phone number field that should be given the Attribute role.
Record Hash
The Record Hash field stores the hashed representation of all concatenated attributes. Because Satellites track attribute history, each incoming record must be compared with the latest stored version; comparing a single Record Hash is faster than comparing every attribute individually.
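A minimal sketch of computing and comparing a Record Hash follows. The attribute columns are hypothetical, and MD5, the `||` delimiter, trimming, and rendering missing values as an empty string are assumed conventions rather than anything the source specifies. Values are not upper-cased, so a change in letter case still counts as a change.

```python
import hashlib

COLUMNS = ("Name", "Phone", "Email")  # hypothetical Attribute columns, in a fixed order


def record_hash(attributes, columns):
    """Hash the concatenated attribute values in a fixed column order (assumed conventions)."""
    values = ("" if attributes.get(c) is None else str(attributes[c]).strip() for c in columns)
    return hashlib.md5("||".join(values).encode("utf-8")).hexdigest()


stored = record_hash({"Name": "Dana Smith", "Phone": "555-0100", "Email": "dana@example.com"}, COLUMNS)
incoming = record_hash({"Name": "Dana Smith", "Phone": "555-0199", "Email": "dana@example.com"}, COLUMNS)

# One string comparison replaces a column-by-column comparison.
print(stored != incoming)  # True: the phone number changed, so a new Satellite row is needed
```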
Record source
This role stores information about where the data is coming from.
Record date time stamp
This is the date and time at which the set of attributes for the given business key was first seen by the Data Vault.