> For the complete documentation index, see [llms.txt](https://documentation.astera.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://documentation.astera.com/astera-data-stack-v7/report-model/miscellaneous/astera-report-model-performance-on-different-hardware-settings-and-case-complexities.md).

# Astera Report Model: Performance on Different Hardware Settings and Case Complexities

The hardware requirements for Astera have been tested on the following three systems with different configurations:

1. Processor: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40 GHz 2.40 GHz, RAM: 8 GB, No of CPUs: 6
2. Processor: Intel(R) Core (TM) i5-8250U CPU @ 1.60GHz RAM: 32 GB, No of CPUs: 8
3. Processor: Intel(R) Xeon(R) D-2146 NT CPU @ 2.30 GHz 2.30 GHz, RAM: 64 GB, No of CPUs: 16

The two variables that generally affect the performance of Astera are concurrent jobs and file size. However, case complexity may also have some part to play in special cases. We will test the time taken by Astera to run each job by keeping one factor as a variable and the other facts as constant.

Let’s go over the three cases for complexity.

## **Case Complexity**

### **Normal Complexity**

In this scenario, we are orchestrating the process of extracting data from an unstructured PDF file that contains the proof of loss of an emergency management agency.

**Use Case**

In the case of normal complexity, we have applied some validation checks on the incoming data from a *Report Source* that contains the dimensions of the rooms to ensure that all the dimensions are in the ‘Real’ data type before writing it to a database destination.

### **High Complexity**

In this scenario, we are orchestrating the process of extracting data from an unstructured PDF file. The source file contains the proof of loss of an emergency management agency. However, this time, instead of one, we have used two report models to extract the required data from the PDF file. Hence, there are two *Report Source* objects in the dataflow. The complexity of the process has increased due to the higher number of sources and the additional steps required for data scrubbing, aggregation, and sorting.

**Use Case**

In the case of high complexity, the source data is in a hierarchical format and is coming from two *Report Source* objects. The purpose of using two *Report Source* objects is to cater to the complexity that may have occurred if a single source was used considering the nature of the data fields which we are extracting. We have transformed the data by applying relevant transformations while retaining its hierarchical structure using the *Passthru* transformation to avoid any cardinality issues. The datasets from both the sources are then joined using the *Join* transformation where it is flattened and loaded into a database destination.

### **Very High Complexity**

In this scenario, we are orchestrating the process of extracting data from an unstructured PDF file that contains an insurance report of a maintenance company. The complexity of this process is very high since the dataflow contains six subflows and each subflow contains four *Report Sources*. Hence, we are extracting data from 24 *Report Sources* altogether and then loading it to a database destination after applying the relevant transformations to it.

**Use Case**

In this case, we have six subflows and each one of them has the entire logic of data extraction encapsulated in it. We are extracting the data using a *Report Source* object. This data is further sorted and summarized and written into a *Subflow Output*.

In the dataflow, these *Subflow* objects are triggered, further transformed, and joined into a single dataset to be written to a database destination.

## **Factors Affecting Astera's Hardware Performance**

1. **Concurrent Jobs**

Below are observations for each hardware configuration when we run 5, 10, 15, and 20 jobs in parallel while keeping the file size constant. You can see that as the number of CPU increases, the runtime of the job decreases. However, it is important to note that the runtime may also vary depending upon the case complexity but not up to a noticeable extent.

Let’s observe how the runtime of jobs varies as the number of concurrent jobs increases with the help of a graph.

In the graph:

| On x-axis | *No. of Concurrent Jobs*                    |
| --------- | ------------------------------------------- |
| On y-axis | *Time Taken to Execute the Concurrent Jobs* |

(a) The runtime of 5, 10, 15, and 20 concurrent jobs when the case complexity was kept *normal*:

Readings:

![](/files/grWoCoxLp2pbBcIx8Vef)

Graph:

![](/files/p8AKVSdMeAWOb4NYIEtJ)

(b) The runtime of 5, 10, 15, and 20 concurrent jobs when the case complexity was kept *high*:

Readings:

![](/files/USGLoC8V4DOlzlGRk8FJ)

Graph:

![](/files/Ktihp2JXJ5feLr6GB2XP)

(c) The runtime of 5, 10, 15, and 20 concurrent jobs when the case complexity was kept *very high*:

![](/files/JwzNtL5e8IG1fSyjv9Jq)

Graph:

![](/files/7THocHsORZSaR80r4lf6)

Note that, the file size varies due to the increased complexity in the above three cases. However, as stated earlier, it does not have a noticeable impact on the runtime.

2. **File Size**

Below are the observations taken by three different file sizes while keeping the number of concurrent jobs constant. You can observe that as the file size increases, the time taken to run the jobs increases.

![](/files/724DYUpSep3fs0bFfD1i)

Let’s understand this with the help of a graph.

In the graph:

| On x-axis | *Average File Size*          |
| --------- | ---------------------------- |
| On y-axis | *Time Taken to Execute Them* |

![](/files/hvLSqn9kvf8hMijWNlLH)

This concludes the summary of Astera's performance on different case complexities and hardware settings.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://documentation.astera.com/astera-data-stack-v7/report-model/miscellaneous/astera-report-model-performance-on-different-hardware-settings-and-case-complexities.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.