Lineage and Impact Analysis

The Lineage diagram in Astera helps in tracing the data roots while the Impact diagram helps in identifying where that data is consumed in the data processes.

Overview

Data impact and lineage analysis is helpful in the following business scenarios:

  • To get a complete picture of your business data: where it originates, how it is transformed during the processing cycle, and which target systems are impacted as a result

  • To see how changing a data source will alter the established data processes

  • To understand how a specific data entity is related to other objects in a data model or an ETL flow

  • To identify who has access to data and how the data is transformed at each step

The main purpose of impact and lineage analysis is to get a bird’s-eye view of the journey of your business data and of how, at each level, a specific data entity impacts other entities and objects in your ETL project. With this information, you can:

  • Make informed business decisions

  • Do an impact analysis before adding a new data source or altering an existing one

  • Protect and govern your enterprise data

Astera lets you create lineage and impact analysis diagrams at three levels:

  1. Document level lineage

  2. Object level lineage

  3. Field level lineage

In this document, we will see how the data lineage and impact feature works in Astera.

Sample Use-Case

Here, we have a project comprising two subflows, a dataflow, and a workflow document. These flows are interlinked.

The subflows are called in the dataflow through Subflow Transformation objects. Subflow_Union performs a Union Transformation on the incoming data, followed by Subflow_Aggregate, which uses an Aggregate Transformation to calculate the totals of price, quantity, and discount for each OrderID. A Data Quality Rule is then applied to treat records with no discount values as errors, and the validated data is written to a delimited file.

The workflow orchestrates this dataflow using a Run Dataflow Task object. Based on the criterion defined in the decision object, a notification email is sent to the administrator.
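
To make the sample dataflow easier to picture, here is a minimal Python sketch of the same three steps (union, aggregate per OrderID, data quality rule). It is only an illustration of the logic described above, not how Astera implements it. The sample records and every field name except OrderID and TotalUnitPrice are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample records; the source field names are assumptions.
orders_a = [
    {"OrderID": 1, "UnitPrice": 10.0, "Quantity": 2, "Discount": 0.5},
    {"OrderID": 2, "UnitPrice": 4.0, "Quantity": 1, "Discount": None},
]
orders_b = [
    {"OrderID": 1, "UnitPrice": 5.0, "Quantity": 3, "Discount": 0.25},
]


def union(*sources):
    """Append the records of all sources into one list (the Subflow_Union step)."""
    return [record for source in sources for record in source]


def aggregate(records):
    """Sum price, quantity, and discount per OrderID (the Subflow_Aggregate step)."""
    totals = defaultdict(
        lambda: {"TotalUnitPrice": 0.0, "TotalQuantity": 0, "TotalDiscount": None}
    )
    for record in records:
        total = totals[record["OrderID"]]
        total["TotalUnitPrice"] += record["UnitPrice"]
        total["TotalQuantity"] += record["Quantity"]
        # Leave the total as None when no record for this order has a discount.
        if record["Discount"] is not None:
            total["TotalDiscount"] = (total["TotalDiscount"] or 0.0) + record["Discount"]
    return [{"OrderID": order_id, **total} for order_id, total in sorted(totals.items())]


def apply_quality_rule(records):
    """Treat records with no discount value as errors; return (valid, errors)."""
    valid = [r for r in records if r["TotalDiscount"] is not None]
    errors = [r for r in records if r["TotalDiscount"] is None]
    return valid, errors


valid, errors = apply_quality_rule(aggregate(union(orders_a, orders_b)))
```

In this sketch, `valid` stands in for the records written to the delimited file, and `errors` holds the records flagged by the rule.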

Let’s create Lineage and Impact diagrams for these flows at all three levels.

Field Level Lineage

Astera enables users to see data lineage and impact at the field level.

For instance, if you have a formula field in a dataflow and you’d like to see its lineage (where the data in that field comes from) and its impact (which other fields and objects that field affects), you can use the field lineage option.

In this example, we want to see the field lineage for the TotalUnitPrice field in the Data Quality Rules object.

To create a field lineage diagram, right-click on the field and select Show Lineage from the context menu. The Lineage and Impact window opens at the bottom of the screen.

This window provides a detailed map of the fields in the lineage and the fields in the impact of the TotalUnitPrice field.

If you are unable to see the Lineage and Impact window, go to View > Lineage and Impact or press Ctrl + Alt + O.

Moreover, a new tab will open in Astera, showing you the Lineage and Impact diagram for the TotalUnitPrice field.

In the above screenshot, notice that the representative element for the TotalUnitPrice field is visibly prominent to indicate the field under focus.

The blue links trace the flow of data through the TotalUnitPrice field, from the fields in its lineage to the fields in its impact.

Links in black represent the flow of data of other fields inside the Data Quality Rules object.
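
Conceptually, field-level lineage and impact can be thought of as walking a directed graph of field mappings: upstream for lineage, downstream for impact. The sketch below illustrates that idea in Python. It is not Astera's implementation, and every object and field name other than TotalUnitPrice and DataQualityRules is hypothetical.

```python
from collections import deque

# Each edge points from a source field to the field it feeds.
# Names other than DataQualityRules and TotalUnitPrice are hypothetical.
FIELD_EDGES = [
    ("Orders.UnitPrice", "Aggregate.TotalUnitPrice"),
    ("Aggregate.TotalUnitPrice", "DataQualityRules.TotalUnitPrice"),
    ("DataQualityRules.TotalUnitPrice", "DelimitedDest.TotalUnitPrice"),
    ("Orders.Quantity", "Aggregate.TotalQuantity"),
]


def reachable(start, edges, upstream):
    """Breadth-first walk from `start`, against the edges if `upstream` is True."""
    adjacency = {}
    for source, target in edges:
        key, value = (target, source) if upstream else (source, target)
        adjacency.setdefault(key, []).append(value)

    found, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour != start and neighbour not in found:
                found.append(neighbour)
                queue.append(neighbour)
    return found


field = "DataQualityRules.TotalUnitPrice"
lineage = reachable(field, FIELD_EDGES, upstream=True)   # where the data comes from
impact = reachable(field, FIELD_EDGES, upstream=False)   # where the data is consumed
```

Edges that never touch the field in focus, such as the Quantity mapping here, are ignored by the walk, much like the black links in the diagram belong to other fields.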

Object Level Lineage

When using several different objects in multiple flows within a single project, manually tracing the data roots of a particular object becomes difficult.

Astera makes tracing the data roots of an object simpler by providing the option to create object-level lineage and impact diagrams.

In this example, we want to create a Lineage and Impact diagram for the DataQualityRules_Task (Run Dataflow Task) object used inside the workflow.

To create an object lineage diagram, right-click on the object’s header and select Show Lineage from the context menu.

A detailed map showing the objects in the lineage and the objects in the impact of the DataQualityRules_Task object can be seen in the Lineage and Impact window.

A new tab will open in Astera, showing you the Lineage and Impact diagram for the DataQualityRules_Task object.

In the above screenshot, notice that the representative element for DataQualityRules_Task is visibly prominent to indicate the object under focus.

The source objects for the DataQualityRules_Task object are called its Lineage, and the target objects are called its Impact.

Document Level Lineage

Once field-level lineage and object-level lineage diagrams are created, users can view document-level lineage.

An Astera project can comprise multiple interlinked flow documents (subflows, dataflows, and workflows). Tracing which flows are linked to a particular flow document becomes easier with a document-level lineage graph.

To see a document-level lineage, go to the Astera tab showing object lineage, right-click on any representative element in the lineage diagram, and select Show Lineage for this Document from the context menu.

In this example, we are creating a document-level lineage for the dataflow. On the same tab, Astera will create the document-level lineage and impact analysis diagram for the dataflow document.

Observe that the lineage includes two subflows and the impact comprises a workflow, each shown by its representative element. These are the flows with which the DataQuality.Df dataflow document is interlinked: the data in this dataflow comes from the two subflow documents, and the dataflow itself is called by a workflow.

The Lineage and Impact window shows a map of documents in the lineage of DataQuality.Df and the document in its impact under separate headings.
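
The document-level result for this sample project can be summarized with a small Python sketch. It simply records which document feeds which and then groups the direct neighbors of DataQuality.Df under separate headings, as the Lineage and Impact window does. The link list and the function name are assumptions made for illustration, and the sketch only considers direct links.

```python
# Each pair reads "the first document feeds, or is called by, the second".
# The list mirrors the sample project described in this article.
DOCUMENT_LINKS = [
    ("Subflow_Union", "DataQuality.Df"),
    ("Subflow_Aggregate", "DataQuality.Df"),
    ("DataQuality.Df", "Workflow.Wf"),
]


def document_lineage_and_impact(document, links):
    """Group the direct upstream and downstream documents of `document`."""
    lineage = sorted(source for source, target in links if target == document)
    impact = sorted(target for source, target in links if source == document)
    return {"Lineage": lineage, "Impact": impact}


result = document_lineage_and_impact("DataQuality.Df", DOCUMENT_LINKS)
```

Here `result["Lineage"]` holds the two subflows and `result["Impact"]` holds the workflow.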

Alternative Method to see Data Lineage

There are two ways to access data lineage and impact in Astera. The first is described in the previous sections of this article.

The other way is directly from the Project Explorer panel.

To view document-level lineage, go to the Project Explorer panel, right-click on the DataQuality.Df node, and select Show Lineage from the context menu.

To view object-level lineage, expand the Workflow.Wf node, right-click on the DataQualityRules_Task object, and select Show Lineage from the context menu.

To view field-level lineage, expand the DataQuality.Df node, then expand the DataQualityRules object, right-click on the TotalUnitPrice field, and select Show Lineage from the context menu.

To check how many lineage graphs have been created by Astera, you can view its output.

To check the lineage output, go to View > Output or press Ctrl + Alt + O.

An Output window opens, displaying the lineage graph output of the project. This window will only show the output once the Show Lineage command has been executed.

© Copyright 2023, Astera Software