Astera Data Stack
Version 8
Version 8
  • Welcome to Astera Data Stack Documentation
  • Release Notes
    • Astera 8.0 - What's New, What's Fixed, and What's Improved
    • Astera 8.0 - Known Issues
    • Astera 8.1 - Release Notes
    • Astera 8.2 Release Notes
    • Astera 8.3 Release Notes
    • Astera 8.4 Release Notes
    • Astera 8.5 Release Notes
  • Getting Started
    • Astera 8 - Important Considerations
    • Astera 8 - System Requirements
    • Configuring the Server
    • Connecting to a Different Astera Server from the Lean Client
    • Connecting to an Astera Server using Lean Client
    • How to Build a Cluster Database and Create a Repository
    • How to Login from Lean Client
    • Setting up a Server Certificate (.pfx) File in a New Environment
    • Installing Client and Server Applications
    • Licensing Model in Astera 8
    • Migrating from Astera 7.x to Astera 8
    • UI Walkthrough - Astera 8.0
    • User Roles and Access Control
  • Dataflows
    • Sources
      • Data Providers and File Formats Supported in Astera
      • Setting Up Sources
      • COBOL File Source
      • Database Table Source
      • Data Model Query Source
      • Delimited File Source
      • Email Source
      • Excel Workbook Source
      • File Systems Item Source
      • Fixed Length File Source
      • PDF Form Source
      • Report Source
      • SQL Query Source
      • XML/JSON File Source
    • Transformations
      • Introducing Transformations
      • Aggregate Transformation
      • Constant Value Transformation
      • Data Cleanse Transformation
      • Denormalize Transformation
      • Distinct Transformation
      • Database Lookup Transformation
      • Expression Transformation
      • File Lookup Transformation
      • Filter Transformation
      • Join Transformation
      • List Lookup Transformation
      • Merge Transformation
      • Normalize Transformation
      • Passthru Transformation
      • Reconcile Transformation
      • Route Transformation
      • Sequence Generator Transformation
      • Sort Transformation
      • Sources as Transformations
      • Subflow Transformation
      • SQL Statement Lookup Transformation
      • Switch Transformation
      • Tree Join Transformation
      • Tree Transform
      • Union Transformation
    • Destinations
      • Setting Up Destinations
      • Database Table Destination
      • Delimited File Destination
      • Excel Workbook Destination
      • Fixed Length File Destination
      • SQL Statement Destination
      • XML/JSON File Destination
    • Data Logging and Profiling
      • Creating Data Profile
      • Creating Field Profile
      • Data Quality Mode
      • Record Level Log
      • Using Data Quality Rules in Astera
    • Database Write Strategies
      • Database Diff Processor
      • Data Driven Write Strategy
      • Dimension Loader - Database Write
      • Source Diff Processor
    • Text Processors
      • Delimited Parser
      • Delimited Serializer
      • Fixed Length Parser
      • Fixed Length Serializer
      • Language Parser
      • XML JSON Parser
      • XML JSON Serializer
    • Data Warehouse
      • Fact Table Loader
      • Dimension Table Loader
  • WORKFLOWS
    • What Are Workflows?
    • Using the Workflow Designer
    • Creating Workflows in Astera
    • Decision Task
    • EDI Acknowledgement Task
    • File System Task
    • File Transfer Task
    • OR Task
    • Run Dataflow Task
    • Run Program Task
    • Run SQL File Task
    • Run SQL Script Task
    • Run Workflow Task
    • Send Mail Task
    • Workflows with a Dynamic Destination Path
    • Customizing Workflows with Parameters
    • GPG-Integrated File Decryption in Astera
  • Subflows
    • Using Subflows in Astera
  • Functions
    • Introducing Function Transformations
    • Custom Functions
    • Logical
      • Coalesce (Any value1, Any value2)
      • IsNotNull (AnyValue)
      • IsRealNumber (AnyValue)
      • IsValidSqlDate (Date)
      • IsDate (AnyValue)
      • If (Boolean)
      • If (DateTime)
      • If (Double)
      • Exists
      • If (Int64)
      • If (String)
      • IsDate (str, strformat)
      • IsInteger (AnyValue)
      • IsNullOrWhitespace (StringValue)
      • IsNullorEmpty (StringValue)
      • IsNull (AnyValue)
      • IsNumeric (AnyValue)
    • Conversion
      • GetDateComponents (DateWithOffset)
      • ParseDate (Formats, Str)
      • GetDateComponents (Date)
      • HexToInteger (Any Value)
      • ToInteger (Any value)
      • ToDecimal (Any value)
      • ToReal (Any value)
      • ToDate (String dateStr)
      • TryParseDate (String, UnknownDate)
      • ToString (Any value)
      • ToString (DateValue)
      • ToString (Any data, String format)
    • Math
      • Abs (Double)
      • Abs (Decimal)
      • Ceiling (Real)
      • Ceiling(Decimal)
      • Floor (Decimal)
      • Floor (Real)
      • Max (Decimal)
      • Max (Date)
      • Min (Decimal)
      • Min (Date)
      • Max (Real)
      • Max (Integer)
      • Min (Real)
      • Pow (BaseExponent)
      • Min (Integer)
      • RandomReal (Int)
      • Round (Real)
      • Round (Real Integer)
      • Round (Decimal Integer)
      • Round (Decimal)
    • Financial
      • DDB
      • FV
      • IPmt
      • IPmt (FV)
      • Pmt
      • Pmt (FV)
      • PPmt
      • PPmt (FV)
      • PV (FV)
      • Rate
      • Rate (FV)
      • SLN
      • SYD
    • String
      • Center (String)
      • Chr (IntAscii)
      • Asc (String)
      • AddCDATAEnvelope
      • Concatenate (String)
      • ContainsAnyChar (String)
      • Contains (String)
      • Compact (String)
      • Find (Int64)
      • EndsWith (String)
      • FindIntStart (Int32)
      • Extract (String)
      • GetFindCount (Int64)
      • FindLast (Int64)
      • GetDigits (String)
      • GetLineFeed
      • Insert (String)
      • IsAlpha
      • GetToken
      • IndexOf
      • IsBlank
      • IsLower
      • IsUpper
      • IsSubstringOf
      • Length (String)
      • LeftOf (String)
      • Left (String)
      • IsValidName
      • Mid (String)
      • PadLeft
      • Mid (String Chars)
      • LSplit (String)
      • PadRight
      • ReplaceAllSpecialCharsWithSpace
      • RemoveChars (String str, StringCharsToRemove)
      • ReplaceLast
      • RightAlign
      • Reverse
      • Right (String)
      • RSplit (String)
      • SplitStringMultipleRecords
      • SplitStringMultipleRecords (2 Separators)
      • SplitString (3 separators)
      • SplitString
      • SplitStringMultipleRecords (3 Separators)
      • Trim
      • SubString (NoOfChars)
      • StripHtml
      • Trim (Start)
      • TrimExtraMiddleSpace
      • TrimEnd
      • PascalCaseWithSpace (String str)
      • Trim (String str)
      • ToLower(String str)
      • ToProper(String str)
      • ToUpper (String str)
      • Substring (String str, Integer startAt)
      • StartsWith (String str, String value)
      • RemoveAt (String str, Integer startAt, Integer noofChars)
      • Proper (String str)
      • Repeat (String str, Integer count)
      • ReplaceAll (String str, String lookFor, String replaceWith)
      • ReplaceFirst (String str, String lookFor, String replaceWith)
      • RightOf (String str, String lookFor)
      • RemoveChars (String str, String charsToRemove)
      • SplitString (String str, String separator1, String separator2)
    • Date Time
      • AddMinutes (DateTime)
      • AddDays (DateTimeOffset)
      • AddDays (DateTime)
      • AddHours (DateTime)
      • AddSeconds (DateTime)
      • AddMonths (DateTime)
      • AddMonths (DateTimeOffset)
      • AddMinutes (DateTimeOffset)
      • AddSeconds (DateTimeOffset)
      • AddYears (DateTimeOffset)
      • AddYears (DateTime)
      • Age (DateTime)
      • Age (DateTimeOffset)
      • CharToSeconds (Str)
      • DateDifferenceDays (DateTimeOffset)
      • DateDifferenceDays (DateTime)
      • DateDifferenceHours (DateTimeOffset)
      • DateDifferenceHours (DateTime)
      • DateDifferenceMonths (DateTimeOffset)
      • DateDifferenceMonths (DateTime)
      • DatePart (DateTimeOffset)
      • DatePart (DateTime)
      • DateDifferenceYears (DateTimeOffset)
      • DateDifferenceYears (DateTime)
      • Month (DateTime)
      • Month (DateTimeOffset)
      • Now
      • Quarter (DateTime)
      • Quarter (DateTimeOffset)
      • Second (DateTime)
      • Second (DateTimeOffset)
      • SecondsToChar (String)
      • TimeToInteger (DateTime)
      • TimeToInteger (DateTimeOffset)
      • ToDate Date (DateTime)
      • ToDate DateTime (DateTime)
      • ToDateString (DateTime)
      • ToDateTimeOffset-Date (DateTimeOffset)
      • ToDate DateTime (DateTimeOffset)
      • ToDateString (DateTimeOffset)
      • Today
      • ToLocal (DateTime)
      • ToJulianDate (DateTime)
      • ToJulianDayNumber (DateTime)
      • ToTicks (Date dateTime)
      • ToTicks (DateTimeWithOffset dateTime)
      • ToUnixEpoc (Date dateTime)
      • ToUtc (Date dateTime)
      • UnixTimeStampToDateTime (Real unixTimeStamp)
      • UtcNow ()
      • Week (Date dateTime)
      • Week (DateTimeWithOffset dateTime)
      • Year (Date dateTime)
      • Year (DateTimeWithOffset dateTime)
      • DateToJulian (Date dateTime, Integer length)
      • DateTimeOffsetUtcNow ()
      • DateTimeOffsetNow ()
      • Day (DateTimeWithOffset dateTime)
      • Day (Date dateTime)
      • DayOfWeekStr (DateTimeWithOffset dateTime)
      • DayOfWeek (DateTimeWithOffset dateTime)
      • DayOfWeek (Date dateTime)
      • DateToJulian (DateTimeWithOffset dateTime, Integer length)
      • DayOfWeekStr (Date dateTime)
      • FromJulianDate (Real julianDate)
      • DayOfYear (Date dateTime)
      • DaysInMonth(Integer year, Integer month)
      • DayOfYear (DateTimeWithOffset dateTime)
      • FromUnixEpoc
      • FromJulianDayNumber (Integer julianDayNumber)
      • FromTicksUtc(Integer ticks)
      • FromTicksLocal(Integer ticks)
      • Hour (Date dateTime)
      • Hour (DateTimeWithOffset dateTime)
      • Minute (Date dateTime)
      • JulianToDate (String julianDate)
      • Minute (DateTimeWithOffset dateTime)
      • DateToIntegerYYYYMMDD (DateTimeWithOffset dateTime)
      • DateToIntegerYYYYMMDD (Date dateTime)
    • Files
      • AppendTextToFile (String filePath, String text)
      • CopyFile (String sourceFilePath, String destFilePath, Boolean overWrite)
      • CreateDateTime (String filePath)
      • DeleteFile (String filePath)
      • DirectoryExists (String filePath)
      • FileExists (String filePath)
      • FileLength (String filePath)
      • FileLineCount (String filePath)
      • GetDirectory (String filePath)
      • GetEDIFileMetaData (String filePath)
      • GetExcelWorksheets (String excelFilePath)
      • GetFileExtension (String filePath)
      • GetFileInfo (String filePath)
      • GetFileName (String filePath)
      • GetFileNameWithoutExtension (String filePath)
      • LastUpdateDateTime (String filePath)
      • MoveFile (String filePath, String newDirectory)
      • ReadFileBytes (String filePath)
      • ReadFileFirstLine (String filePath)
      • ReadFileText (String filePath)
      • ReadFileText (String filePath, String codePage)
      • WriteBytesToFile (String filePath, ByteArray bytes)
      • WriteTextToFile (String filePath, String text)
    • Date Time With Offset
      • ToDateTimeOffsetFromDateTime (dateTime String)
      • ToUtc (DateTimeWithOffset)
      • ToDateTimeOffsetFromDateTime
      • ToDateTimeOffset (String dateTimeOffsetStr)
      • ToDateTimeFromDateTimeOffset
    • GUID
      • NewGuid
    • Encoding
      • ToBytes
      • FromBytes
      • UrlEncode
      • UrlDecode
    • Regular Expressions
      • ReplaceRegEx
      • ReplaceRegEx (Integer StartAt)
    • TimeSpan
      • Minutes
      • Hours
      • Days
      • Milliseconds
    • Matching
      • Soundex
      • DoubleMetaphone
      • RefinedSoundex
  • Report Model
    • User Guide
      • Report Model Tutorial
    • Report Model Interface
      • Field Properties Panel
      • Region Properties Panel
      • Report Browser
      • Report Options
    • Use Cases
      • Applying Pattern to Line
      • Auto Creating Data Regions and Fields
      • Auto-Parsing
      • Creating Multi-Column Data Regions
      • Floating Patterns and Floating Fields
      • How To Work With PDF Scaling Factor in a Report Model
      • Line Count
      • Pattern Count
      • Pattern is a Regular Expression
    • Exporting Options
      • Exporting a Report Model
      • Exporting Report Model to a Dataflow
    • Miscellaneous
      • Importing Monarch Models
      • Microsoft Word and Rich Text Format Support
      • Working With Problematic PDF Files
  • API Flows
    • API Consumption
      • Consume
        • REST API Browser
        • Making HTTP Requests Through REST API Browser
        • Using REST Client Outside of the Scope of the Project
      • Authorize
        • Authorizing ActiveCampaign API in Astera
        • Authorizing Astera Server APIs
        • Authorizing Avaza APIs in Astera
        • Authorizing Facebook APIs in Astera
        • Authorizing QuickBooks API in Astera
        • Authorizing Square API in Astera
        • Open APIs - Configuration Details
  • Project Management
    • Project Management
      • Astera's Project Explorer
      • Connecting to Source Control
      • Deployment
      • Server Monitoring and Job Management
    • Job Scheduling
      • Scheduling Jobs on the Server
      • Job Monitor
  • Use Cases
    • End-to-End Use Cases
      • Data Integration
        • Using Astera to Create and Orchestrate an ETL Process for Partner Onboarding
      • Data Warehousing
        • Building a Data Warehouse - A Step-By-Step Approach
      • Data Extraction
        • Reusing The Extraction Template for Similar Layout Files
  • Connectors
    • Connecting to Amazon Aurora Database
    • Connecting to Amazon RDS Databases
    • Connecting to Amazon Redshift Database
    • Connecting to Cloud Storage
    • Connecting to Google Cloud SQL in Astera
    • Connecting to MariaDB Database
    • Connecting to Microsoft Azure Databases
    • Connecting to MySQL Database
    • Connecting to Netezza Database
    • Connecting to Oracle Database
    • Connecting to PostgreSQL in Astera
    • Connecting to Salesforce Database
    • Connecting to Salesforce - Legacy Database
    • Connecting to SAP HANA Database
    • Connecting to Snowflake Database
    • Connecting to Vertica Database
    • Setting Up IBM DB2 iSeries Connectivity in Astera
  • Miscellaneous
    • Cloud Deployment
      • Deploying Astera on Amazon Web Services
      • Deploying Astera on Microsoft Azure Cloud
      • Deploying Astera on Oracle Cloud
    • Context Information
    • Pushdown Mode
    • Role Based Access Control in Astera
    • Safe Mode
    • Server Command Line Utility
    • SmartMatch Feature
    • Synonym Dictionary File
    • Updating Your License in Astera
    • Using Dynamic Layout/Template Mapping in Astera
    • Using Output Variables in Astera
    • Using the Data Source Browser in Astera
  • Best Practices
    • Astera Best Practices - Dataflows
    • Overview of Cardinality in Data Modeling
    • Cardinality Errors FAQs
Powered by GitBook

© Copyright 2025, Astera Software

On this page
  • Working With Problematic PDF Files
  • Determining Whether a PDF File Contains Text
  • Cases in Which Astera Cannot Import the PDF File
  1. Report Model
  2. Miscellaneous

Working With Problematic PDF Files

Working With Problematic PDF Files

Quite often, you may come across a PDF file that does not get imported properly in Astera, meaning that you may have some trouble importing the data from the PDF file. There a several reasons why this may occur.

Mostly, it is because the PDF file is not a true PDF (readable text format) in the first place but a scanned or some other embedded image, or because the file’s text layer got damaged during its creation.

The first step to finding the actual reason is to determine whether the PDF file contains any text.

Determining Whether a PDF File Contains Text

To determine whether or not a PDF file contains any text, open it in a PDF reader and use the Find feature to search for the text you can simply see on the screen. If the search feature is unable to find the specified text, it signifies that the text layer of the file is damaged, or it is an image and therefore, not readable by the PDF reader or Astera.

Another way is to use the text extract tool in a PDF reader. Copy some text and then paste it onto Notepad. If the PDF reader is unable to highlight your selected text, then it is not text but a scanned or embedded image. If after pasting, you cannot see the similar text pasted on the Notepad, then the text layer of the file is damaged.

Cases in Which Astera Cannot Import the PDF File

There are several cases in which Astera is unable to import the PDF files properly. Let’s go over them and their suggested solutions one by one.

  1. Password Protected File

The author can specify the security options while publishing the PDF document to refrain anyone from extracting the data it contains. If the PDF file is password protected, Astera will throw an exception asking you to enter the password. When you provide the password only then the contents of the file be displayed, and Astera will show an empty page.

  1. Damaged PDF Files

In some scenarios, the PDF file does appear fine but Astera is unable to extract data from it. This is because the file’s text layer gets damaged beyond repair during its creation. Opening the file with a PDF reader and saving it might help cater to this issue.

  1. Scanned or Embedded Images in PDF Files

Sometimes, the PDF file contains a scanned or embedded image instead of the text. A scanned image is usually a picture taken by a scanner and embedded into a PDF document. Astera cannot read data from scanned files. For that, you will need to have a third-party OCR (Optical Character Recognition) tool integrated with it.

  1. Redacted Files

One of the reasons why Astera is unable to read the contents of your PDF file is that it has redacted data. In this case, you cannot do anything because the information has been redacted by the author for security reasons.

  1. Encoding

The encoding of the file should be UTF–8 for it to open in Astera. In case, the encoding of the file is other than that UTF – 8, you can change it from the Encoding option in the Report Options panel.

This concludes working with problematic PDF files in Astera.

PreviousMicrosoft Word and Rich Text Format SupportNextAPI Consumption

Last updated 9 months ago