# Parquet File Source (Beta)

### What is the Apache Parquet File Format?

Apache Parquet is a columnar storage file format used by Hadoop ecosystem tools such as Pig, Spark, and Hive. The format is language-independent and binary. Parquet is designed to store large data sets efficiently, and its files use the .parquet extension.

The key features of Parquet with respect to Astera Data Stack are:

* It supports compression, which reduces the size of the stored file.
* It encodes the data.
* It stores data in a column layout.
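As a rough illustration of the column layout and encoding points above, the following minimal Python sketch pivots row records into columns and applies a simple run-length encoding, one of the encoding styles Parquet can use. It is not how Astera or Parquet is actually implemented; the sample rows, field names, and helper functions are hypothetical and exist only for this example.

```python
from itertools import groupby

# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"country": "US", "year": 2021},
    {"country": "US", "year": 2022},
    {"country": "PK", "year": 2022},
]


def to_columns(records):
    """Pivot a list of row dictionaries into one list of values per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded_country = run_length_encode(columns["country"])

print(columns)
print(encoded_country)
```

Keeping the values of one column together places similar values next to each other, which is why encoding and compression tend to work well on columnar data.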

### Using Parquet File Source in Astera

In Astera Data Stack, you can use a Parquet file in which the cardinality of the data is maintained, i.e., all columns must contain the same number of values.

{% hint style="info" %}
**Note:** There should only be one row for each data field.
{% endhint %}
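The cardinality requirement can be pictured with a small Python sketch. It assumes the file's data has already been loaded as a dictionary of column name to list of values; that representation and the function names are hypothetical and are not part of Astera.

```python
def column_lengths(columns):
    """Return the number of values stored in each column."""
    return {name: len(values) for name, values in columns.items()}


def has_consistent_cardinality(columns):
    """True when every column holds the same number of values."""
    return len(set(column_lengths(columns).values())) <= 1


# Hypothetical column data used only for illustration.
valid = {"id": [1, 2, 3], "name": ["a", "b", "c"]}
invalid = {"id": [1, 2, 3], "name": ["a", "b"]}

print(has_consistent_cardinality(valid))
print(has_consistent_cardinality(invalid))
```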

1. Drag and drop the *Parquet File Source* from the *Sources* section of the Toolbox onto the dataflow designer.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/5yTwrkMdTnv2w6KSusRb/01-Drag-Drop-Parquet.PNG)

2. Right-click on the *Parquet File Source* object and select *Properties* from the context menu.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/4fGYq7NhttmlnUBsQzTi/02-Right-Click-Parquet-1655366432309.PNG)

This will open a new window.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/tyT08y9EVUqNYz6W4Zc0/03-Parquet-Properties-Source.PNG)

Let’s have a look at the options present here.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/ndJeVMWmRUHkwfQJPdLg/04-Parquet-File-Path.png)

**File Location**

*File Path:* This is where you will provide the path to the .parquet file.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/zdskwaR2rYEOzs6siTOV/05-01-Data-Load-Options-Parquet.PNG)

**Data Load Option**

If you wish to control memory consumption and reduce read time, use the Data Load option.

*Batch Size:* This is where the size of each batch is defined.
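The idea behind a batch size can be sketched in Python as follows. This is only a conceptual illustration of reading records in fixed-size batches so that a limited number are held in memory at a time; `read_in_batches` is a hypothetical helper and does not reflect Astera's internal reader.

```python
from itertools import islice


def read_in_batches(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    # Because this is a generator, the check runs when iteration starts.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Ten hypothetical records read four at a time.
batches = list(read_in_batches(range(10), batch_size=4))
print(batches)
```

A smaller batch size lowers peak memory use at the cost of more read iterations, while a larger one does the opposite.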

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/DLBaYC31tC7Jr6AvcuKt/05-Checkboxes-Parquet.PNG)

**Advanced File Processing: String Processing**

*Treat empty string as null value:* When this option is checked, every empty string is read as a null value.

*Trim strings:* When this option is checked, leading and trailing whitespace is removed from string values.
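The effect of the two string-processing options can be approximated with the Python sketch below. The function and parameter names are hypothetical, and the order of operations (trimming before the empty-string check) is an assumption made for this example, not documented Astera behavior.

```python
def process_string(value, treat_empty_as_null=False, trim=False):
    """Apply the two string options to a single value."""
    if value is None:
        return None
    if trim:
        # Remove leading and trailing whitespace.
        value = value.strip()
    # Assumption: the empty check runs after trimming.
    if treat_empty_as_null and value == "":
        return None
    return value


print(process_string("  Astera  ", trim=True))
print(process_string("", treat_empty_as_null=True))
```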

3. Once done, click *Next* and you will be led to the *Layout Builder* screen.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/RAE7DYjVWPoYj7C8VE72/06-Layout-Builder-Parquet.PNG)

The layout is built automatically. Alternatively, you can build it using the *Build Layout from layout spec* option at the top of the screen.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/bjLPyRPuK9xGel4pHPdJ/07-Build-Layout-Spec-Parquet.png)

4. Once done, click *Next* and you will be taken to the *Config Parameters* screen.

![08-Config-Paramet](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/wnLTSbX1dA3hbOPpbFa0/08-Config-Parameters-Parquet.PNG)

This allows you to further configure and define dynamic parameters for the Parquet source file.

{% hint style="info" %}
**Note:** Parameters left blank will use their default values assigned on the properties page.
{% endhint %}

5. Click *Next* and you will be taken to the *General Options* screen.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/5Oy6w5oNpsJmoK6lrL3x/09-General-Options-Parquet.PNG)

Here, you can add any comments for the object.

6. Click *OK* and the *Parquet File Source* object will be configured.

![](https://content.gitbook.com/content/zEifS4h8yurLAAwiGNX2/blobs/veohuH3g6xZjCoPBuKfe/10-Parquet-Source-Configured.PNG)

You can now map these fields to other objects as part of the dataflow.

### Data Types Supported by the Parquet File Source in Astera:

* Integer
* Time/Timestamp
* Date
* String
* Float
* Real
* Decimal
* Double
* Byte Array
* Guid

### Data Types Not Supported by the Parquet File Source in Astera:

* Base64
* Integer96
* Image
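As a quick way to reason about the two lists above, the Python sketch below checks a layout against the supported type names. The type names are taken from this page; the layout dictionary, its field names, and the `unsupported_fields` helper are hypothetical and are not an Astera API.

```python
# Type names as listed on this page.
SUPPORTED_TYPES = {
    "Integer", "Time/Timestamp", "Date", "String", "Float",
    "Real", "Decimal", "Double", "Byte Array", "Guid",
}
UNSUPPORTED_TYPES = {"Base64", "Integer96", "Image"}


def unsupported_fields(layout):
    """Return the names of fields whose type is not in the supported list."""
    return [
        name for name, type_name in layout.items()
        if type_name not in SUPPORTED_TYPES
    ]


# Hypothetical layout mapping field names to type names.
layout = {"OrderID": "Integer", "Photo": "Image", "OrderDate": "Date"}
print(unsupported_fields(layout))
```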

### Limitations:

* Hierarchy is not supported.

This concludes our discussion on the definition and configuration of the *Parquet File Source* object in Astera Data Stack.


