Astera Data Stack
Version 9
Version 9
  • Welcome to Astera Data Stack Documentation
  • Release Notes
    • Astera 9.0 - Release Notes
  • Setting Up
    • System Requirements
    • Product Architecture
    • Installing Client and Server Applications
    • Connecting to a Astera Server using Lean Client
    • How to Connect to a Different Astera Server from the Lean Client
    • How to Set up a Server Certificate (.pfx) File in a New Environment
    • How to Build a Cluster Database and Create a Repository
    • How to Login from Lean Client
    • Licensing Model in Astera 9
    • User Roles and Access Control
    • Offline Activation of Astera Data Stack
  • Dataflows
    • Sources
      • Data Providers and File Formats Supported in Astera Data Stack
      • Setting Up Sources
      • Excel Workbook Source
      • COBOL File Source
      • Database Table Source
      • Delimited File Source
      • File System Items Source
      • Fixed Length File Source
      • Email Source
      • Report Source
      • SQL Query Source
      • XML/JSON File Source
      • PDF Form Source
    • Transformations
      • Introducing Transformations
      • Aggregate Transformation
      • Constant Value Transformation
      • Denormalize Transformation
      • Distinct Transformation
      • Expression Transformation
      • Filter Transformation
      • Join Transformation
      • List Lookup Transformation
      • Merge Transformation
      • Normalize Transformation
      • Passthru Transformation
      • Reconcile Transformation
      • Route Transformation
      • Sequence Generator
      • Sort Transformation
      • Sources as Transformations
      • Subflow Transformation
      • Switch Transformation
      • Tree Join Transformation
      • Tree Transform
      • Union Transformation
      • Data Cleanse Transformation
      • File Lookup Transformation
      • SQL Statement Lookup
      • Database Lookup
    • Destinations
      • Setting Up Destinations
      • Database Table Destination
      • Delimited File Destination
      • Excel Workbook Destination
      • Fixed Length File Destination
      • SQL Statement Destination
      • XML File Destination
    • Data Logging and Profiling
      • Creating Data Profile
      • Creating Field Profile
      • Data Quality Mode
      • Using Data Quality Rules in Astera
      • Record Level Log
    • Database Write Strategies
      • Data Driven
      • Source Diff Processor
      • Database Diff Processor
      • Dimension Loader - Database Write
    • Text Processors
      • Delimited Parser
      • Delimited Serializer
      • Language Parser
      • Fixed Length Parser
      • Fixed Length Serializer
      • XML/JSON Parser
      • XML JSON Serializer
    • Visualizations
      • Basic Plots
      • Distribution Plots
  • Workflows
    • What are Workflows?
    • Creating Workflows in Astera
    • Decision Task
    • EDI Acknowledgment Task
    • File System Task
    • File Transfer Task
    • OR Task
    • Run Dataflow Task
    • Run Program Task
    • Run SQL File Task
    • Run SQL Script Task
    • Run Workflow Task
    • Send Mail Task
    • Workflows with a Dynamic Destination Path
    • Customizing Workflows With Parameters
    • GPG-Integrated File Decryption in Astera
  • Subflows
    • Using Subflows in Astera
  • DATA MODEL
    • Creating a Data Warehousing Project
    • Data Models
      • Introducing Data Models
      • Opening a New Data Model
      • Data Modeler - UI Walkthrough
      • Reverse Engineering an Existing Database
      • Creating a Data Model from Scratch
      • General Entity Properties
      • Creating and Editing Relationships
      • Forward Engineering
      • Verifying a Data Model
    • Dimensional Modelling
      • Introducing Dimensional Models
      • Converting a Data Model to a Dimensional Model
      • Fact Entities
      • Dimension Entities
      • Date and Time Dimension
      • Verifying a Dimensional Model
    • Documentation
      • Generating Technical and Business Documentation for Data Models
      • Lineage and Impact Analysis
    • Deployment and Usage
      • Deploying a Data Model
      • Validate Metadata and Data Integrity
      • Using Astera Data Models in ETL Pipelines
      • Connecting an Astera Data Model to a Third Party Visualization Tool
  • Functions
    • Introducing Function Transformations
    • Custom Functions
    • Logical
      • Coalesce (Any value1, Any value2)
      • IsNotNull (AnyValue)
      • IsRealNumber (AnyValue)
      • IsValidSqlDate (Date)
      • IsDate (AnyValue)
      • If (Boolean)
      • If (DateTime)
      • If (Double)
      • Exists
      • If (Int64)
      • If (String)
      • IsDate (str, strformat)
      • IsInteger (AnyValue)
      • IsNullOrWhitespace (StringValue)
      • IsNullorEmpty (StringValue)
      • IsNull (AnyValue)
      • IsNumeric (AnyValue)
    • Conversion
      • GetDateComponents (DateWithOffset)
      • ParseDate (Formats, Str)
      • GetDateComponents (Date)
      • HexToInteger (Any Value)
      • ToInteger (Any value)
      • ToDecimal (Any value)
      • ToReal (Any value)
      • ToDate (String dateStr)
      • TryParseDate (String, UnknownDate)
      • ToString (Any value)
      • ToString (DateValue)
      • ToString (Any data, String format)
    • Math
      • Abs (Double)
      • Abs (Decimal)
      • Ceiling (Real)
      • Ceiling(Decimal)
      • Floor (Decimal)
      • Floor (Real)
      • Max (Decimal)
      • Max (Date)
      • Min (Decimal)
      • Min (Date)
      • Max (Real)
      • Max (Integer)
      • Min (Real)
      • Pow (BaseExponent)
      • Min (Integer)
      • RandomReal (Int)
      • Round (Real)
      • Round (Real Integer)
      • Round (Decimal Integer)
      • Round (Decimal)
    • Financial
      • DDB
      • FV
      • IPmt
      • IPmt (FV)
      • Pmt
      • Pmt (FV)
      • PPmt
      • PPmt (FV)
      • PV (FV)
      • Rate
      • Rate (FV)
      • SLN
      • SYD
    • String
      • Center (String)
      • Chr (IntAscii)
      • Asc (String)
      • AddCDATAEnvelope
      • Concatenate (String)
      • ContainsAnyChar (String)
      • Contains (String)
      • Compact (String)
      • Find (Int64)
      • EndsWith (String)
      • FindIntStart (Int32)
      • Extract (String)
      • GetFindCount (Int64)
      • FindLast (Int64)
      • GetDigits (String)
      • GetLineFeed
      • Insert (String)
      • IsAlpha
      • GetToken
      • IndexOf
      • IsBlank
      • IsLower
      • IsUpper
      • IsSubstringOf
      • Length (String)
      • LeftOf (String)
      • Left (String)
      • IsValidName
      • Mid (String)
      • PadLeft
      • Mid (String Chars)
      • LSplit (String)
      • PadRight
      • ReplaceAllSpecialCharsWithSpace
      • RemoveChars (String str, StringCharsToRemove)
      • ReplaceLast
      • RightAlign
      • Reverse
      • Right (String)
      • RSplit (String)
      • SplitStringMultipleRecords
      • SplitStringMultipleRecords (2 Separators)
      • SplitString (3 separators)
      • SplitString
      • SplitStringMultipleRecords (3 Separators)
      • Trim
      • SubString (NoOfChars)
      • StripHtml
      • Trim (Start)
      • TrimExtraMiddleSpace
      • TrimEnd
      • PascalCaseWithSpace (String str)
      • Trim (String str)
      • ToLower(String str)
      • ToProper(String str)
      • ToUpper (String str)
      • Substring (String str, Integer startAt)
      • StartsWith (String str, String value)
      • RemoveAt (String str, Integer startAt, Integer noofChars)
      • Proper (String str)
      • Repeat (String str, Integer count)
      • ReplaceAll (String str, String lookFor, String replaceWith)
      • ReplaceFirst (String str, String lookFor, String replaceWith)
      • RightOf (String str, String lookFor)
      • RemoveChars (String str, String charsToRemove)
      • SplitString (String str, String separator1, String separator2)
    • Date Time
      • AddMinutes (DateTime)
      • AddDays (DateTimeOffset)
      • AddDays (DateTime)
      • AddHours (DateTime)
      • AddSeconds (DateTime)
      • AddMonths (DateTime)
      • AddMonths (DateTimeOffset)
      • AddMinutes (DateTimeOffset)
      • AddSeconds (DateTimeOffset)
      • AddYears (DateTimeOffset)
      • AddYears (DateTime)
      • Age (DateTime)
      • Age (DateTimeOffset)
      • CharToSeconds (Str)
      • DateDifferenceDays (DateTimeOffset)
      • DateDifferenceDays (DateTime)
      • DateDifferenceHours (DateTimeOffset)
      • DateDifferenceHours (DateTime)
      • DateDifferenceMonths (DateTimeOffset)
      • DateDifferenceMonths (DateTime)
      • DatePart (DateTimeOffset)
      • DatePart (DateTime)
      • DateDifferenceYears (DateTimeOffset)
      • DateDifferenceYears (DateTime)
      • Month (DateTime)
      • Month (DateTimeOffset)
      • Now
      • Quarter (DateTime)
      • Quarter (DateTimeOffset)
      • Second (DateTime)
      • Second (DateTimeOffset)
      • SecondsToChar (String)
      • TimeToInteger (DateTime)
      • TimeToInteger (DateTimeOffset)
      • ToDate Date (DateTime)
      • ToDate DateTime (DateTime)
      • ToDateString (DateTime)
      • ToDateTimeOffset-Date (DateTimeOffset)
      • ToDate DateTime (DateTimeOffset)
      • ToDateString (DateTimeOffset)
      • Today
      • ToLocal (DateTime)
      • ToJulianDate (DateTime)
      • ToJulianDayNumber (DateTime)
      • ToTicks (Date dateTime)
      • ToTicks (DateTimeWithOffset dateTime)
      • ToUnixEpoc (Date dateTime)
      • ToUtc (Date dateTime)
      • UnixTimeStampToDateTime (Real unixTimeStamp)
      • UtcNow ()
      • Week (Date dateTime)
      • Week (DateTimeWithOffset dateTime)
      • Year (Date dateTime)
      • Year (DateTimeWithOffset dateTime)
      • DateToJulian (Date dateTime, Integer length)
      • DateTimeOffsetUtcNow ()
      • DateTimeOffsetNow ()
      • Day (DateTimeWithOffset dateTime)
      • Day (Date dateTime)
      • DayOfWeekStr (DateTimeWithOffset dateTime)
      • DayOfWeek (DateTimeWithOffset dateTime)
      • DayOfWeek (Date dateTime)
      • DateToJulian (DateTimeWithOffset dateTime, Integer length)
      • DayOfWeekStr (Date dateTime)
      • FromJulianDate (Real julianDate)
      • DayOfYear (Date dateTime)
      • DaysInMonth(Integer year, Integer month)
      • DayOfYear (DateTimeWithOffset dateTime)
      • FromUnixEpoc
      • FromJulianDayNumber (Integer julianDayNumber)
      • FromTicksUtc(Integer ticks)
      • FromTicksLocal(Integer ticks)
      • Hour (Date dateTime)
      • Hour (DateTimeWithOffset dateTime)
      • Minute (Date dateTime)
      • JulianToDate (String julianDate)
      • Minute (DateTimeWithOffset dateTime)
      • DateToIntegerYYYYMMDD (DateTimeWithOffset dateTime)
      • DateToIntegerYYYYMMDD (Date dateTime)
    • Files
      • AppendTextToFile (String filePath, String text)
      • CopyFile (String sourceFilePath, String destFilePath, Boolean overWrite)
      • CreateDateTime (String filePath)
      • DeleteFile (String filePath)
      • DirectoryExists (String filePath)
      • FileExists (String filePath)
      • FileLength (String filePath)
      • FileLineCount (String filePath)
      • GetDirectory (String filePath)
      • GetEDIFileMetaData (String filePath)
      • GetExcelWorksheets (String excelFilePath)
      • GetFileExtension (String filePath)
      • GetFileInfo (String filePath)
      • GetFileName (String filePath)
      • GetFileNameWithoutExtension (String filePath)
      • LastUpdateDateTime (String filePath)
      • MoveFile (String filePath, String newDirectory)
      • ReadFileBytes (String filePath)
      • ReadFileFirstLine (String filePath)
      • ReadFileText (String filePath)
      • ReadFileText (String filePath, String codePage)
      • WriteBytesToFile (String filePath, ByteArray bytes)
      • WriteTextToFile (String filePath, String text)
    • Date Time With Offset
      • ToDateTimeOffsetFromDateTime (dateTime String)
      • ToUtc (DateTimeWithOffset)
      • ToDateTimeOffsetFromDateTime
      • ToDateTimeOffset (String dateTimeOffsetStr)
      • ToDateTimeFromDateTimeOffset
    • GUID
      • NewGuid
    • Encoding
      • ToBytes
      • FromBytes
      • UrlEncode
      • UrlDecode
    • Regular Expressions
      • ReplaceRegEx
      • ReplaceRegEx (Integer StartAt)
    • TimeSpan
      • Minutes
      • Hours
      • Days
      • Milliseconds
    • Matching
      • Soundex
      • DoubleMetaphone
      • RefinedSoundex
  • Report Model
    • User Guide
      • Report Model Tutorial
    • Report Model Interface
      • Report Options
      • Report Browser
      • Data Regions in Report Models
      • Region Properties Panel
      • Pattern Properties
      • Field Properties Panel
    • Use Cases
      • Applying Pattern to Line
      • Auto Creating Data Regions and Fields
      • Auto Parsing
      • Connecting to Cloud Storage
      • Creating Multi Column Data Regions
      • Defining Region End Type as Specific Text and Regular Expression
      • Defining the Start Position of Data Fields
      • Floating Patterns and Floating Fields
      • How To Work With PDF Scaling Factor in a Report Model
      • Line Count
      • Pattern Count
      • Pattern is a Regular Expression
      • Using Comma Separated Values to Define Start Position
    • Auto Generate Layout (Beta)
      • Setting Up AGL in Astera
      • UI Walkthrough Auto Generation of Layout, Fields and Table
      • Using Auto Generation Layout, Auto Create Fields, and Auto Create Table (Beta)
    • Exporting Options
      • Exporting a Report Model
      • Exporting Report Model to A Dataflow
    • Miscellaneous
      • Importing Monarch Models
      • Microsoft Word and Rich Text Format Support
      • Working With Problematic PDF Files
  • API Flows
    • API Consumption
      • Consume
        • REST Connection
        • Making API Calls with the REST Client Object in Astera
        • REST API Browser
        • Method Operations
        • Pagination
      • Authorize
        • Open APIs - Configuration Details
        • Authorizing Facebook APIs in Astera
        • Authorizing Astera's Server APIs
        • Authorizing Avaza APIs in Astera
        • Authorizing Square API in Astera
        • Authorizing ActiveCampaign API in Astera
        • Authorizing QuickBooks’ API in Astera
        • Accessing Astera's Server APIs Through a Third Party Tool
        • Astera's Server API Documentation
  • Project Management
    • Project Management
      • Deployment
      • Server Monitoring and Job Management
      • Connecting to Source Control
      • Astera Project and Project Explorer
    • Job Scheduling
      • Scheduling Jobs on the Server
      • Job Monitor
  • Use Cases
    • End-to-End Use Cases
      • Data Integration
        • Using Astera Data Stack to Create and Orchestrate an ETL Process for Partner Onboarding
      • Data Warehousing
        • Building a Data Warehouse - A Step By Step Approach
      • Data Extraction
        • Reusing The Extraction Template for Similar Layout Files
  • Connectors
    • Setting Up IBM DB2/iSeries Connectivity in Astera
    • Connecting to SAP HANA Database
    • Connecting to MariaDB Database
    • Connecting to Salesforce Database
    • Connecting to Salesforce - Legacy Database
    • Connecting to Vertica Database
    • Connecting to Snowflake Database
    • Connecting to Amazon Redshift Database
    • Connecting to Amazon Aurora Database
    • Connecting to Google Cloud SQL in Astera
    • Connecting to MySQL Database
    • Connecting to PostgreSQL in Astera
    • Connecting to Netezza Database
    • Connecting to Oracle Database
    • Connecting to Microsoft Azure Databases
    • Connecting to Amazon RDS Databases
  • Miscellaneous
    • Using Dynamic Layout Template Mapping in Astera
    • Synonym Dictionary File
    • SmartMatch Feature
    • Role Based Access Control in Astera
    • Updating Your License in Astera
    • Using Output Variables in Astera
    • Connection Vault
    • Safe Mode
    • Using the Data Source Browser in Astera
    • Pushdown Mode
    • Cloud Deployment
      • Deploying Astera on Microsoft Azure Cloud
      • Deploying Astera on Oracle Cloud
      • Deploying Astera on Amazon Web Services
    • Context Information
  • Best Practices
    • Overview of Cardinality in Data Modeling
    • Cardinality Errors FAQs
    • Astera Best Practices - Dataflows
Powered by GitBook

© Copyright 2025, Astera Software

On this page
  • Overview
  • Dataflow Design
  • Modularity and Reusability
  • Subflows
  • Shared Actions
  • Shared Connections
  • Detached Transformations
  • Performance
  • Input Parameters and Output Variables
  1. Best Practices

Astera Best Practices - Dataflows

PreviousCardinality Errors FAQs

Last updated 11 months ago

Overview

Astera represents a new generation of data integration platforms designed for superior usability, productivity, and performance. Astera has worked closely with customers from a variety of industries, including financial services, healthcare, government, nonprofits, and others to build and continually refine Astera with the specific goal of creating and maintaining complex hierarchical dataflows and workflows.

Over the years, we have assisted customers in successfully deploying Astera in a variety of usage scenarios. The Astera Best Practices Series represents a synthesis of these experiences and each part will cover a specific key area of Astera's technologies and processes. The objective is to assist customers in developing world-class solutions by establishing and communicating these best practices.

Dataflow Design

Dataflows are the cornerstone of any integration project. The Astera Dataflow Designer leads the industry in power and ease of use. Our customers consistently give high marks to the visual interface, drag-and-drop capabilities, instant preview, and full complement of sources, targets, and transformations.

The dataflow best practices outlined herein are intended to facilitate the initial creation and ongoing maintenance of Astera dataflows. Whether you are developing small dataflows with a few steps or large ones comprising scores of steps, these practices will help you manage your integration project more effectively.

Modularity and Reusability

Modularity is a key design principle in the development of integration projects. Modularity enhances the maintainability of your dataflows by making them easier to read and understand. It also promotes reusability by isolating frequently-used logic into individual components that can be leveraged as “black boxes” by other flows.

Astera supports multiple types of reusable components. These components and their usage recommendations are discussed in the following subsections.

Subflows

Subflows are reusable blocks of dataflow steps that have inputs and outputs. Subflows enable users to isolate frequently used logic in reusable components. Once created, subflows can be used just like built-in Astera transformations. Examples of reusable logic that can be housed in subflows include:

  • Validations that are applied to data coming from multiple sources, frequently in incompatible formats

  • Transformation sequences such as a combination of lookup, expression, and function transformations that occur in multiple places in the project

  • Processing of incoming data that arrives in different formats but must be normalized, validated, and boarded

Shared Actions

Shared Actions are similar to subflows but only contain a single action. Shared actions should be employed when a source or destination is used in multiple places within the project. This way, if a field is added or removed from this source, all the projects inherit that change automatically.

Shared Connections

Shared Connections contain database connection information that can be shared by multiple actions within a dataflow. Shared connections can also be used to enforce transaction management across a number of database destinations. Use shared connections whenever multiple actions in a dataflow use the same database connection information.

Detached Transformations

Detached Transformations are a capability within Astera developed for scenarios where a lookup or expression is used in multiple places within a dataflow. Detached Transformations enable you to create a single instance and use it in multiple places.

They are available in expressions as callable functions, enabling you to use them in multiple expressions. Additionally, Detached Transformations allow you to use lookups conditionally.

An example of an expression would be: The same expression needs to be used in an Excel Source's field and also in a Delimited Source's field.

Performance

Astera has been designed as a parallel processing platform to deliver superior speed and performance, so designing dataflows to take advantage of the software’s abilities can significantly affect your data integration performance. The performance best practices discussed here can result in a major performance boost in many common situations.

Data Sources

Frequently, a dataflow can be optimized by some fine-tuning at the data source. Some of the optimization techniques are discussed in this section.

Filtering in Database

When loading data from a database, enter a where clause to filter data at the source. Loading data in Astera and then filtering using the Astera Filter Transformation can significantly degrade performance.

Avoid Mapping Extra Fields

The Database Table Source automatically creates a query to load only the fields that are mapped. To take advantage of this optimization, map only the fields that are used in subsequent actions.

Sorting for Succeeding Joins

The performance of Astera joins improves by orders of magnitude when working with previously-sorted data. Where possible, avoid sorting data in Astera and sort instead in your database query by adding order by clause.

Partitioning

Astera Database and File Sources enable data partitioning, which speeds up reading by breaking a dataset into chunks and reading these chunks in parallel. Use partitioning if you are moving a large data table.

Change Data Capture

If you periodically transfer incremental data changes, consider using one of Astera's change data capture (CDC) patterns to ensure your data is as up-to-the-minute as you need it to be. Astera supports a variety of CDC strategies enabling you to select the appropriate strategy to fit your environment and requirements. Refer to Additional Resources at the end of this document for more information on Astera change data capture.

Joins

Astera Join Transformations enable you to join multiple data sources. Joining often involves sorting incoming data streams, making it the most common reason for performance issues. Here are the practices to keep in mind when using joins:

  • Joining data in Astera is time-consuming. Use Database Joins wherever possible.

  • Sort data in the source where appropriate. Joining sorted data streams is much faster.

  • When joining data from the same database, use the Database Join option in the Join Transformation. When the Database Join option is specified, Astera builds and runs a single query joining multiple tables. This can enhance performance in most situations.

  • Reducing the number of fields in the Join Transformation also improves performance, so be sure to remove any unnecessary fields.

Lookups

Lookups are another source of performance issues in integration jobs. Astera offers several caching techniques to improve the performance of lookups. Experiment with lookup caching options and select the options that work best for specific situations. Some tips are:

  • If you work with a large dataset that does not change often, consider using the new Astera Persistent Lookup Cache, which stores a snapshot of the lookup table on the server’s local drive and uses it in subsequent runs. In situations where the lookup table is updated daily, a snapshot can be taken on the first run after the update and can be used throughout the day to process incremental data.

  • If you work with a large dataset but use only a small fraction of items in a single run, consider using the Cache on First Use option

  • If a lookup is used in multiple places within the same flow, consider using a detached lookup

  • Where appropriate, use a database join instead of a lookup function

Destinations

Astera supports bulk load for popular databases. Where possible, use bulk load for database destinations and experiment with bulk load batch sizes to fine-tune performance. Astera offers a Diff Processor Transformation that can be used to compare an incoming data stream with existing data in the table and apply differences. In certain situations, this transformation can substantially speed up dataflows.

General Guidelines

  • Avoid unnecessary steps. Some steps such as Record Level Log incur major performance overhead and should only be used during development to isolate data issues.

  • The job monitoring window shows the execution time for each step. These are not precise numbers but provide a rough approximation of the time taken by each step and can be used to optimize specific steps.

Input Parameters and Output Variables

Input parameters and output variables make it possible to supply values to dataflows at runtime and return output results from dataflows. Well-designed parameter and output variable structures promote reusability and reduce ongoing maintenance costs.

Input Parameters

Input parameter values can be supplied from a calling workflow using data mapping. When designing dataflows, analyze the values that could change between different runs of the flow and define these values as input parameters. Input parameters can include file paths, database connection information, and other data values.

Output Variables

If you would like to make decisions about subsequent execution paths based on the result of a dataflow run, define output variables and use an expression transformation to store values into output variables. These output variables can be used in subsequent workflow actions to control the execution path.