Raw Vault primarily consists of 3 main types of conceptual entities, which are further extended to create more types.
Hubs
Satellites
Links
Hubs: Hubs are used to represent business entities in a Data Vault. Hubs store business keys or identifiers of the business entities. They have the same level of granularity and semantic meaning as the source system, i.e., no denormalization of entities.
For example, a Customer object from the source system will be represented by a CustomersHub object in a data vault, and only the business key, CustomerID, will be stored uniquely in this hub.
Links: Link tables are used to store relationships between two or more hubs or business entities. These relationships may exist due to transactions, associations, or hierarchies. It is worth mentioning that all relationships in Data Vault Link tables are modelled as many-to-many relationships.
For example, consider a scenario where we have two source system objects, Employees and Department. The Employees table has a foreign key from the Department table to identify which department the employee is working in.
When converting to a data vault, both tables will be converted into Hub tables, EmployeeHub and DepartmentHub, and the relationship between them will be converted into a Link table, which consists of incoming keys from both Hub tables.
Satellites: Satellites store descriptive attributes or context information of a business entity or relationship. Thus, Satellites are either attached to Hubs to store attributes of a certain Business entity, or they can be attached to a Link table to store all attributes related to that relationship. Satellites store these attributes’ historical information, so that all changes are tracked and stored within the satellite. Satellites do not provide options to store the level of history, unlike Dimensional Models where you have SCDs. Satellites store 100% history and are completely auditable.
Also, depending on either the source system type, or rate of change of attributes, you can split Satellites. Therefore, more than one satellite can be attached to a Hub or Link.
Hub roles:
Hub entities usually have 4 main roles:
Business Key
Business key is the role given to the business object’s unique identifier. This is the primary key of the source system in many cases. For example, CustomerID.
Primary Hash Key
Data Vault does not use the business object’s identifiers to identify records, instead it uses a Hash key as the primary key in Hub. This Hash is calculated based on the Business key. This field is used as the Primary key in the Hub table.
Record Source
This role stores information about where the data is coming from.
Record Date time Stamp
This is the date time of when the business key was first spotted by the Data Vault.
Link roles:
Links have 4 main types of roles:
Hub Hash key
The Hub Hash key role is given to incoming foreign keys from the Hubs between which links are created. For example, a link between Employees and Department will have 2 Hub Hash Keys, one coming in from the Employees Hub and the other from the Department Hub.
Primary Hash
This is the unique identifier of the Link itself. It is calculated by Hashing the concatenated business keys of the Hubs involved in the Link. This field is used as the Primary key of Link table.
Record Source
This role stores information about where the data is coming from.
Record date time stamp
This is the date time of when the relationship was first spotted by the Data Vault.
Links consist of one more role, the Transaction Key role. However, this will be discussed in a separate article.
Satellites roles:
Satellites have 5 main types of roles:
Primary Hash Key
The Primary Hash key role is given to the identifying foreign key from the Parent hub table. This is the Hashed representation of the business key from the source system. This field, along with the Record date time stamp field, is marked as a composite primary key in Satellite tables.
Attributes
The Attribute role is given to fields containing descriptive information. For example, in a Customers Satellite table, you might have a phone number field, that should be given the Attribute role.
Record Hash
The Record hash is stored in the satellite to store the Hashed representation of all concatenated attributes. This is done because Satellites are supposed to track attribute history, and Record Hash helps improve the comparison.
Record source
This role stores information about where the data is coming from.
Record date time stamp
This is the date time of when the set of attributes for the given business key was first spotted by the Data Vault system.
Bridge tables serve as query assistance tables to facilitate the handling of complex many-to-many relationships and hierarchical data structures in a data vault.
Relationships between business objects are represented using Links, which store the associations between Hubs.
However, when querying data from Satellites, for example, the number of joins involving multiple Hubs and Links can increase which can impact query performance negatively.
A bridge table provides a way to navigate relationships in a data vault by becoming an intermediate table that connects multiple Hubs and their associated Links.
It contains hash keys from the Hubs and Links it spans and also stores business keys of the Hubs. The bridge table is populated by querying the relevant Hubs and Links to determine the relationships that need to be represented. Once it has all the relationships, the bridge table reduces the number of joins required to query the Satellites, thereby improving query performance.
To add a bridge table to a data vault, select the Hubs you want the bridge table to cover, then right-click on any one of the selected Hubs and select the Create DV Bridge option from the context menu.
Note: If the selected Hubs are not directly connected to each other through common Links, the bridge table will not be created. This is because bridge tables can only store direct relationships between Hubs.
Once a bridge table is added, it will appear connected to the related Hubs and Links by means of ghost relationships. These are not actual relationships and simply represent a linkage between the bridge table, Hubs, and Links.
To configure the bridge table, right-click the table’s header and select Properties from the context menu or double-click the table header.
A configuration window will open. On the first screen, you can view and edit the table name and schema, but you cannot edit the entity type because doing so would alter the properties of the bridge table, such as entity roles.
Click Next to proceed to the Layout Builder.
In the layout of this bridge table, you will notice a few foreign keys, such as CustomersHub_Customers_Key, OrdersHub_Orders_Key, etc. Each of these foreign keys represents a relationship between the bridge table and the related Hub. For example, the CustomersHub_Customers_Key represents the primary key of Customers Hub.
The Snapshot field represents the snapshot date of the bridge table, and it forms the primary key of the table along with the incoming hash keys from the related Hubs and Links.
Note: You cannot edit the field names and properties of the bridge table as this would cause conflict between the layouts of the bridge table and its related Hubs and Links. This would further result in erroneous loads.
If there is a change in the properties of the Hub or Link related to the bridge table, it is necessary to refresh the layout of the bridge table before forward-engineering it to a database.
To refresh the bridge table layout, right-click the bridge table header and select the Refresh Bridge Table option from the context menu.
You can populate a bridge table by loading its related Links. This can be done through the Data Vault Table Loader object.
To view the bridge table data, right-click the bridge table header and select Database Table -> View Table Data from the context menu.
Note: The bridge table will not be populated if even one of the Links is not loaded. This is because bridge tables incorporate inner joins to load.
Once all the Links are loaded, the data in the bridge table will look like this:
Additionally, when the Links are loading, you can monitor the loading of the bridge table in the Job progress window.
This concludes using bridge tables in Astera Data Stack.
A Data Vault is a hub-and-spoke based data warehouse modeling technique developed by Dan Linstedt. Data Vault was designed to improve data warehousing scalability and flexibility with emphasis on agility in the process and improving the data’s auditability. Thus, enabling a complete audit trail of stored data and making it well suited for large and complex data sets.
There are three layers in a Data Vault: Raw Vault, Business Vault, and Information Vault (Presentation Layer).
The Raw Vault is a non-volatile layer that keeps data in an integrated, function oriented, historical, time variant, and original format where it is easily auditable and transparent. Hard Business rules are applied before loading data into the Raw Vault layer. The Raw Vault layer comprises of entities such as Hubs, Links, and Satellites.
The Business Vault layer contains objects that make querying easier and faster in a Data Vault, while also allowing for easier data loads into the Information Vault layer. It is at this stage where Soft Business rules are also applied. Some business vault entities include Bridge tables, Point-In-Time (PIT) tables, etc. These are loaded using data from the Raw Vault layer, which can be dropped and recreated at any time.
The Information Vault or presentation layer is a subject-oriented, user-friendly layer built on top of Raw and Business Vault layers, making data access easier for reporting, analytics or dashboarding. They are mostly in the form of aggregated denormalized tables, star schemas, or Dimensional models.
Data vault modeling can be time-consuming when dealing with many entities in a data model.
Astera Data Stack's Build Data Vault Model feature swiftly automates the data vault modeling process of your OLTP system.
Using this feature provides users with a good starting point to further model their data vault.
The first step is to Reverse Engineer a database. To learn more about Reverse Engineering in Astera Data Stack, click here.
Once a database has been reversed engineered, the entities will be set as general entities by default. To convert this data model into a data vault, these entities need to be changed according to their attributes, into Hubs, Links, and Satellites.
To use the data vault automation algorithm, select Data Model -> Build Data Vault Model from the Data Model drop-down menu.
Note: Before the Build Data Vault Model option can be implemented, you will be asked to save the data model.
Upon selecting the Build Data Vault Model option, the Select entities screen will prompt you to select entities required to be modeled into a data vault. Entities can also be searched using the search bar, and filtered based on entity name and schema, using the filter option. Once you have selected the entities to be modeled, click OK.
For this use case, we have selected all the entities. Once done, a data vault of these entities will be generated as a separate data model. This allows users to keep the source and data vault model separate. Each data vault entity will automatically have data vault roles assigned to each field, along with appropriate relationships between entities.
The data vault automation feature does not simply create Hubs, Links, and Satellites for every underlying entity, but also models different kinds of relationships, such as self-relationships, many-to-many relationships, unit of work relationships, and multiple relationships between entities.
It can also create Satellites on Hubs and Links based on the primary key or combination of primary keys that are described by the attributes of the underlying entity.
This functionality essentially provides users with a starting point, wherein it gives us a basic vault model that we can make further changes to, based on our specific requirements.
Point-in-Time or PIT tables serve as query assistance tables in data vault modeling to facilitate easier querying of historical data.
Satellite tables store changes in attributes over time along with the effective date of the change, thus ensuring that the entire history of changes is preserved. However, this can make querying historical data complex, especially when dealing with multiple Satellite tables.
A PIT table is created for a specific Hub or Link and their associated Satellites. It provides a snapshot of the data at a specific point in time by consolidating loaded dates from Satellite tables, essentially creating a flattened, denormalized view of the data. It contains loaded dates and corresponding hash keys from each Satellite that corresponds best with the snapshot date. The PIT table is populated by querying Satellite tables and determining the state of each attribute at the specified point in time.
To add a PIT table to a data vault, right-click on the Hub or Link consisting of more than one Satellites and select the Create Point-in-Time entity option from the context menu.
If there are less than two Satellites linked to a Hub or Link, the Create Point-in-Time entity option will be disabled.
Once a PIT table is added to an entity, it will appear automatically connected to the related entity and Satellites via ghost relationships. These relationships are indicated by blue, dash-dotted lines. These are not actual relationships and simply represent a linkage between the PIT table, Satellites, and the Hub or Link.
To configure the PIT entity’s properties, either double-click the entity header, or right-click the entity header and select Properties from the context menu.
The PIT entity’s properties window provides the same options as it does for a general entity. On the Entity Properties screen, you can view and edit the table name and schema, but you cannot edit the entity type as doing so would alter the properties of the PIT table, such as entity roles
To proceed to the Layout Builder, Click Next.
In the layout of this PIT table, you will notice a few foreign keys, such as Employees_Key, EmployeesSatellite1_LoadedDate, etc. Each of these foreign keys represents a relationship between the PIT table, one of the Satellite tables, and the related Hub or Link.
For example, the Employees_Key represents the primary key of Employees Hub, the EmployeesSatellite1_LoadedDate represents one of the primary keys of EmployeesSatellite1 Satellite, etc.
The LoadedDate field represents a snapshot date of the PIT table, and it forms the primary key of the table along with the hash key related to the Hub or Link.
Note: You cannot edit field names and properties of the PIT table as this would create conflict between the layout of the PIT table and its related Satellites and Hub/Link which would result in erroneous loads.
If there is a change in the properties of the Hub or Link related to the PIT table, it is necessary to refresh the layout of the PIT table before forward engineering it to a database.
To refresh the PIT table layout, right-click the PIT table and select the Refresh satellite fields option from the context menu.
You can populate a PIT table using the DVTableLoad object and loading its related Satellites.
To view the PIT table data, right-click the PIT table and select Database Table -> View Table Data from the context menu. If one of the Satellites is not loaded, the PIT table stores NULL values for the loaded date column of that Satellite.
Once all the Satellites are loaded and the PIT table is completely populated, the data would look like this:
Additionally, when loading the Satellites, you can monitor the loading of the PIT table from the Job Progress window.
This concludes working with PIT tables in Astera Data Stack.