
# Working With Problematic PDF Files

You may occasionally come across a PDF file that does not import properly in Astera. There are several reasons why this may occur. Most often, the file is not a true PDF (one with a readable text layer) but a scanned or otherwise embedded image, or the file’s text layer was damaged during its creation.

The first step to finding the actual reason is to determine whether the PDF file contains any text.

### Determining Whether a PDF File Contains Text

To determine whether a PDF file contains any text, open it in a PDF reader and use the *Find* feature to search for text that you can see on the screen. If the search cannot find that text, either the file's text layer is damaged or the page is an image; in both cases the content is not readable by the PDF reader or by Astera.

Another way is to use the text selection tool in the PDF reader. Select and copy some text, and then paste it into Notepad. If the PDF reader cannot highlight the text you try to select, it is not text but a scanned or embedded image. If the pasted text does not match what you see in the PDF, the text layer of the file is damaged.
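The two manual checks above amount to a small decision rule, which can be sketched in Python. This is only an illustration of the reasoning, not part of Astera; the function name `diagnose_pdf_text` and its return labels are hypothetical, and the inputs are whatever you observed in the PDF reader.

```python
import re
from typing import Optional


def _normalize(text: str) -> str:
    """Collapse whitespace and ignore case so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def diagnose_pdf_text(visible_phrase: str, pasted_text: Optional[str]) -> str:
    """Classify a PDF from the copy-and-paste check (hypothetical helper).

    visible_phrase: text you can see on the page in the PDF reader.
    pasted_text:    what appeared in Notepad after copying it, or None if
                    the reader could not select any text at all.
    """
    phrase = _normalize(visible_phrase)
    if not phrase:
        raise ValueError("visible_phrase must contain some text")
    if pasted_text is None:
        # Nothing could be highlighted: the page is a scanned or embedded image.
        return "image"
    if phrase in _normalize(pasted_text):
        # The pasted text matches what is visible: the text layer is usable.
        return "text"
    # Text was selectable, but the pasted result does not match the page.
    return "damaged text layer"
```

For example, `diagnose_pdf_text("Invoice Total", None)` returns `"image"`, while a pasted string that does not contain the visible phrase yields `"damaged text layer"`.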

### Cases in Which Astera Cannot Import the PDF File

There are several cases in which Astera is unable to import a PDF file properly. Let’s go over them and their suggested solutions one by one.

#### Password Protected File

The author can set security options when publishing a PDF document to prevent anyone from extracting the data it contains. If the PDF file is password protected, Astera throws an exception asking you to enter the password. The contents of the file are displayed only after you provide the password; otherwise, Astera shows an empty page.
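If you want to screen files before importing them, a rough pre-check is possible outside Astera. The sketch below relies on the fact that an encrypted PDF carries an `/Encrypt` entry in its trailer, and simply scans the raw bytes for that key. It is a heuristic under stated assumptions: the helper names are hypothetical, a match does not prove that a password is needed to open the file (some files are encrypted only to restrict permissions), and this is not how Astera itself detects protection.

```python
import re
from pathlib import Path

# "/Encrypt" as a complete PDF name; "/EncryptMetadata" must not count.
_ENCRYPT_KEY = re.compile(rb"/Encrypt(?![A-Za-z])")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF bytes contain an /Encrypt key (heuristic)."""
    return _ENCRYPT_KEY.search(pdf_bytes) is not None


def file_looks_encrypted(path: str) -> bool:
    """Convenience wrapper that reads the file from disk first."""
    return looks_encrypted(Path(path).read_bytes())
```

Files flagged by this check are the ones for which you should have the password ready before opening them in Astera.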

#### Damaged PDF Files

In some scenarios, the PDF file appears fine but Astera is unable to extract data from it. This is because the file’s text layer was damaged during its creation. Opening the file in a PDF reader and saving it again might resolve the issue.

#### Scanned or Embedded Images in PDF Files

Sometimes, the PDF file contains a scanned or embedded image instead of text. A scanned image is usually a picture taken by a scanner and embedded into the PDF document. Astera cannot read data from scanned files. To extract data from such files, you need a third-party OCR (Optical Character Recognition) tool integrated with Astera.
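To triage a batch of files before deciding which ones need OCR, you could look for the markers that typically distinguish text from images inside a PDF: pages with real text reference `/Font` resources, while scanned pages are stored as image objects marked `/Subtype /Image`. The sketch below is a rough heuristic with a hypothetical function name. It can miss these markers when the PDF stores its objects in compressed object streams, and a scanned file that already has an OCR text layer will contain fonts, so treat the result as a hint only.

```python
import re

# PDF name tokens; the lookahead avoids matching longer names such as
# /FontDescriptor or /ImageMask.
_FONT = re.compile(rb"/Font(?![A-Za-z])")
_IMAGE = re.compile(rb"/Subtype\s*/Image(?![A-Za-z])")


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Return True if the bytes show image objects but no font resources."""
    has_font = _FONT.search(pdf_bytes) is not None
    has_image = _IMAGE.search(pdf_bytes) is not None
    return has_image and not has_font
```

A file for which this returns `True` is a likely candidate for OCR; confirm it with the manual text check in a PDF reader.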

#### Redacted Files

One reason Astera may be unable to read the contents of your PDF file is that the file contains redacted data. In this case, nothing can be done because the author has redacted the information for security reasons.

#### Encoding

The encoding of the file should be UTF-8 for it to open in Astera. If the file uses a different encoding, you can change it from the *Encoding* option in the Report Options panel.

This concludes working with problematic PDF files in Astera.

