# How to Work with Microsoft Word (Doc/Docx) Files in a Report Model

Astera supports the extraction of data from a wide range of unstructured file formats including PDF files, PDF forms, TXT, XLS/XLSX, PRN and RTF.

By integrating Astera with a third-party tool, you can also extract unstructured data from MS Word doc/docx files in bulk.

In this document, we will use the open-source tool *OfficeToPDF.exe* to convert MS Word documents into PDF files. We will orchestrate the conversion through a workflow in Astera.

The following system requirements must be met in order to use *OfficeToPDF.exe*:

* .NET Framework 4
* Office 2016, 2013, 2010, or 2007

Read more about *OfficeToPDF.exe* [here](https://github.com/cognidox/OfficeToPDF).
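For orientation, the conversion that the workflow automates can be sketched outside Astera. The Python sketch below is illustrative only and is not the content of the batch file supplied in the download. It assumes the `OfficeToPDF.exe input_file output_file` invocation described in the tool's README, and the function name `build_commands` and the folder arguments are hypothetical.

```python
from pathlib import Path

def build_commands(exe_path, source_dir, output_dir):
    """Return one OfficeToPDF argument list per Word file in source_dir.

    Assumption: the tool is called as `OfficeToPDF.exe input_file output_file`.
    """
    commands = []
    for doc in sorted(Path(source_dir).iterdir()):
        # Keep .doc and .docx files only; skip Word's "~$" lock files.
        if not doc.is_file() or doc.name.startswith("~$"):
            continue
        if doc.suffix.lower() not in {".doc", ".docx"}:
            continue
        pdf = Path(output_dir) / (doc.stem + ".pdf")
        commands.append([str(exe_path), str(doc), str(pdf)])
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` on a machine that meets the requirements above. Building the commands separately from running them makes it easy to review what would be converted first.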

## **Integration with Astera**

1. Download the zip file [*WordtoPDF.zip*](https://www.astera.com/Downloads/Misc/WordtoPDF.zip) and extract the three files shown below.

![](/files/Kynn1PW2ky8BPG3TQ3ra)

2. In the Astera client, go to the *File* menu in the menu bar at the top, click *Open*, and browse to the *Sample\_Workflow\.Wfs* file extracted in the first step.

![](/files/z55Bh2ld3S3iHlkJ1v1l)

The *Sample\_Workflow\.Wfs* workflow will open in your application as shown below.

![](/files/IO157iEzRaECxlUvOTvz)

Follow the steps below to configure this workflow.

3. Right-click on the *FilesToConvert* object and select *Properties* from the context menu. A configuration window will open.

![](/files/iUphWxDhc2BAJ7xEOdAd)

Here, provide the path to the source folder that contains all the doc/docx files. Apply the filter "**\*.doc\***" and click *OK*.

![](/files/QooB7IlzMWlE6zpFxuc6)
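To see which file names a filter such as `*.doc*` selects, here is a small Python sketch. It assumes the filter behaves like a standard, case-insensitive wildcard pattern, which is an assumption about Astera rather than documented behavior. The helper `matches_filter` and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase

def matches_filter(filename, pattern="*.doc*"):
    # Compare in lower case, since Windows file names are case-insensitive.
    return fnmatchcase(filename.lower(), pattern.lower())

names = ["invoice.doc", "Report.DOCX", "macro.docm", "summary.pdf", "notes.txt"]
selected = [n for n in names if matches_filter(n)]
print(selected)
```

Under this assumption the pattern covers both .doc and .docx, and it would also pick up other extensions that begin with "doc", such as .docm, so it is worth checking that the source folder contains only the files you intend to convert.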

4. Right-click on the header of the *Exe\_FilePath* object and select *Properties* from the context menu. In the constant value box, paste the local path to *OfficeToPDF.exe*, extracted in the first step, and click *OK*.

![](/files/osG1wsG1wuaHbORzFUNm)

5. Right-click on the header of the *Bat\_FilePath* object and select *Properties* from the context menu. In the constant value box, paste the local path to the *officetopdf.bat* Windows batch file, extracted in the first step.

![](/files/REchXP1qCTBAxcPGh9QL)

6. Right-click on the header of the *RunExe* object and select *Properties* to open its configuration window. Here, set the *Program Path* to the local path of the *officetopdf.bat* Windows batch file.

![](/files/kHuFWuZbdzn13QAXEDlZ)

7. Click on the *Run Workflow* icon to execute the workflow. This generates a PDF file for every .doc/.docx file in the folder specified in the *FilesToConvert* object.

![](/files/HRIjZQRA8bbDPjNfGkAb)
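After the run, you may want to confirm that every Word file produced a PDF. The following Python sketch is one way to check. It assumes each PDF keeps the base name of its source document, and the folder that holds the PDFs is passed in explicitly because the output location depends on your setup. The function name `missing_pdfs` is hypothetical.

```python
from pathlib import Path

def missing_pdfs(source_dir, pdf_dir):
    """Return names of Word files that have no same-named PDF in pdf_dir."""
    # Collect the base names of all PDFs, ignoring case.
    pdf_stems = {
        p.stem.lower()
        for p in Path(pdf_dir).iterdir()
        if p.suffix.lower() == ".pdf"
    }
    missing = []
    for doc in sorted(Path(source_dir).iterdir()):
        # Consider .doc and .docx files only; skip Word's "~$" lock files.
        if doc.name.startswith("~$"):
            continue
        if doc.suffix.lower() in {".doc", ".docx"} and doc.stem.lower() not in pdf_stems:
            missing.append(doc.name)
    return missing
```

An empty list means every document has a matching PDF. Any names returned are the files to inspect or convert again.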

These PDF files can now be loaded into the Astera designer to create an extraction template (report model) and extract the data.

