You may occasionally come across a PDF file that Astera cannot import properly. There are several reasons why this may occur. Most often, the file is not a true PDF (one with a readable text layer) but a scanned or otherwise embedded image, or the file’s text layer was damaged during its creation.
The first step to finding the actual reason is to determine whether the PDF file contains any text.
To determine whether a PDF file contains any text, open it in a PDF reader and use the Find feature to search for text that you can see on the screen. If the search cannot find that text, either the text layer of the file is damaged or the page is an image, and in both cases the content is not readable by the PDF reader or by Astera.
Another way is to use the text selection tool in the PDF reader. Copy some text and paste it into Notepad. If the PDF reader cannot highlight the text you try to select, it is not text but a scanned or embedded image. If the text pasted into Notepad does not match what you see in the PDF, the text layer of the file is damaged.
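The copy-and-paste check above boils down to a small decision table. The following Python sketch records the outcome of the manual test and maps it to a likely diagnosis. The names `TextLayerStatus` and `diagnose_text_layer` are hypothetical and are not part of Astera; the sketch only restates the reasoning described above.

```python
from enum import Enum


class TextLayerStatus(Enum):
    READABLE = "readable text layer"
    DAMAGED = "damaged text layer"
    IMAGE = "scanned or embedded image"


def diagnose_text_layer(can_select_text: bool, pasted_text_matches: bool) -> TextLayerStatus:
    """Map the result of the manual copy-and-paste check to a likely diagnosis."""
    if not can_select_text:
        # Nothing can be highlighted: the page is a picture, not text.
        return TextLayerStatus.IMAGE
    if not pasted_text_matches:
        # Text can be selected, but the pasted result differs from what is shown.
        return TextLayerStatus.DAMAGED
    return TextLayerStatus.READABLE


# Example: text could be selected, but Notepad showed something different.
print(diagnose_text_layer(can_select_text=True, pasted_text_matches=False).value)
```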
There are several cases in which Astera is unable to import the PDF files properly. Let’s go over them and their suggested solutions one by one.
The author can set security options when publishing a PDF document to prevent anyone from extracting the data it contains. If the PDF file is password protected, Astera will throw an exception asking you to enter the password. The contents of the file are displayed only after you provide the password; otherwise, Astera shows an empty page.
In some scenarios, the PDF file appears fine but Astera is unable to extract data from it. This happens when the file’s text layer was damaged during its creation. Opening the file in a PDF reader and saving it again might resolve the issue.
Sometimes, the PDF file contains a scanned or embedded image instead of text. A scanned image is usually a picture taken by a scanner and embedded into the PDF document. Astera cannot read data from scanned files. To extract data from them, you will need to integrate a third-party OCR (Optical Character Recognition) tool with Astera.
Another reason Astera may be unable to read the contents of your PDF file is that it contains redacted data. In this case, nothing can be done, because the author has redacted the information for security reasons.
The encoding of the file should be UTF-8 for it to open in Astera. If the file uses an encoding other than UTF-8, you can change it from the Encoding option in the Report Options panel.
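If you want to check for protection before opening a file, a rough pre-check is possible outside Astera. In the PDF format, an encrypted document declares an `/Encrypt` entry in its trailer, so the minimal Python sketch below simply looks for that marker in the raw bytes. The function name `looks_encrypted` is hypothetical, the check is a heuristic (the marker could also appear in unrelated content), and it does not show which restrictions the author applied.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: report True when the raw PDF bytes contain an /Encrypt entry."""
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Convenience wrapper that reads the file from disk first."""
    return looks_encrypted(Path(path).read_bytes())
```

For example, `file_looks_encrypted("invoice.pdf")` (a hypothetical file name) returning `True` suggests that you should have the password ready when opening the file in Astera.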
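As an illustration of the encoding requirement, the Python sketch below checks whether a file's bytes are valid UTF-8 and re-encodes them when they are not. It assumes a plain-text source file whose original encoding you already know; the function names `is_utf8` and `to_utf8` are hypothetical, and this is a stand-alone alternative to the Encoding option, not a description of how Astera handles it internally.

```python
def is_utf8(data: bytes) -> bool:
    """Return True when the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Example: "café" saved as Latin-1 is not valid UTF-8 until it is converted.
latin1_bytes = "café".encode("latin-1")
if not is_utf8(latin1_bytes):
    latin1_bytes = to_utf8(latin1_bytes, "latin-1")
```

Note that a successful decode is not proof of the original encoding: pure ASCII text, for instance, is valid UTF-8 as well.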
This concludes working with problematic PDF files in Astera.
The .doc file extension was proprietary to Microsoft and is still one of the most popular file formats in use today. Other word processing products had trouble reading files with a .doc extension. As a result, Microsoft created a newer file format with the .docx extension, which follows the Office Open XML international standard for Office documents. The .docx format is supported by a growing number of applications from Microsoft and Apple, as well as open-source word processing programs and products from other vendors.
The .rtf file format was developed by Microsoft Corporation for cross-platform document interchange with Microsoft products.
Most word processing software can import, export, and edit RTF files. The RTF Specification uses the ANSI file format. RTF supports generic font family names and the inclusion of image file formats such as JPEG and PNG.
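The three formats can be told apart by their first few bytes, which is handy when a file carries the wrong extension. The Python sketch below is a minimal illustration: legacy .doc files are stored in an OLE compound container, .docx files are ZIP archives, and RTF files begin with `{\rtf`. The function name `sniff_document_format` is hypothetical, and the container signatures are shared with other formats (for example, other Office files), so treat the result as a hint only.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP container used by .docx
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"    # OLE container used by legacy .doc
RTF_MAGIC = b"{\\rtf"                              # RTF files start with {\rtf


def sniff_document_format(header: bytes) -> str:
    """Guess the document family from the leading bytes of a file."""
    if header.startswith(RTF_MAGIC):
        return "rtf"
    if header.startswith(OLE_MAGIC):
        return "doc (OLE container)"
    if header.startswith(ZIP_MAGIC):
        return "docx (ZIP container)"
    return "unknown"


# Example: read the first 8 bytes of a file and classify it.
# with open("report.docx", "rb") as handle:   # hypothetical file name
#     print(sniff_document_format(handle.read(8)))
```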
Astera 10 includes support for Microsoft Word and RTF formats: enjoy efficient and easy information extraction from more source files than ever before. Now, you can process invoices, purchase orders, receipts, forms and other Word/RTF-formatted files with Astera 10.
Select File > New > Report Model and choose your source document.
As seen in the screenshot below, the .docx file opens and is ready for processing.
You can now continue to create your report model.
If you are a former Monarch user, you can use your Monarch models within Astera. To load and convert your old Monarch files, open your Monarch .xmod files in Astera. These models are automatically converted to usable report models: the built-in converter translates the Monarch model logic into report model logic.
The main difference between Monarch models and report models is the type of data region used. In Monarch models, all data is appended. In Astera, models consist of data regions arranged in a parent-child hierarchy, or tree structure, which better represents how the data is extracted.
When converting your Monarch models, always preview the model to make sure everything is being extracted correctly.
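To make the difference concrete, the Python sketch below models a parent-child region tree and flattens it into appended rows in which each child row repeats its parent's fields. This is one reading of "appended data", offered as an assumption. The `Region` class, the `flatten` function, and the field names are hypothetical and do not reflect the internal format of either product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Region:
    """A data region with its own fields and any number of child regions."""
    name: str
    fields: dict[str, str] = field(default_factory=dict)
    children: list[Region] = field(default_factory=list)


def flatten(region: Region, inherited: dict[str, str] | None = None) -> list[dict[str, str]]:
    """Turn the tree into appended rows: every leaf row carries its ancestors' fields."""
    row = {**(inherited or {}), **region.fields}
    if not region.children:
        return [row]
    rows: list[dict[str, str]] = []
    for child in region.children:
        rows.extend(flatten(child, row))
    return rows


# Hierarchical view: one invoice region that owns two item regions.
invoice = Region("Invoice", {"InvoiceNo": "1001"}, [
    Region("Item", {"Sku": "A1"}),
    Region("Item", {"Sku": "B2"}),
])

# Appended view: the invoice number is repeated on every item row.
rows = flatten(invoice)
```

In the tree, the invoice number is stored once on the parent region; in the flattened rows, it appears on every item row.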
If your Monarch model is using a PDF, you may need to adjust the Scaling Factor.
Open Astera and click the Open icon in the icon bar.
A window will appear where you will need to select your Monarch model. Select your .xmod file and click Open.
Once you do this, the .xmod file is loaded into the Astera designer and a Report Options panel appears. This is where you select the source file for your .xmod model.
Once you select your file, Astera will convert the extension from .xmod to .rmd.
Now you have a usable report model. You can adjust this model or proceed to export settings. This concludes importing Monarch Models in Astera.