# File System Items Source

The *File System Items Source* in Astera Data Stack is used to provide metadata information to a task in a dataflow or workflow. In a dataflow, it can be used in conjunction with a source object, especially in cases where you want to process multiple files through the transformation and loading process.

### Video

{% embed url="<https://www.youtube.com/watch?t=232s&v=huGpOxr9mQQ>" %}

In a workflow, the *File System Items Source* object can be used to provide input paths to a subsequent object such as a *RunDataflow* task.

Let’s see how it works in a dataflow.

### Using File System Items Source in a Dataflow

#### Scenario

Here we have a dataflow that we want to run on multiple source files that contain *Customer\_Data* from a fictitious organization. We are going to use the source object as a transformation and provide the location of the source files using a *File System Items Source* object. The *File System Items Source* will provide the path to the location where our source files reside, and the source object will pick up the source files from that location, one by one, and pass them on for further processing in the dataflow.

#### Steps to Use the File System Items Source in a Dataflow

Here, we want to sort the data, filter out records of customers from Germany and write the filtered records into a database table. The source data is stored in delimited (.csv) files.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F07Aeyty32ZFL9Pk3gBPE%2F01-file-system-items-source-dataflow.png?alt=media\&token=0b6e9904-6b6e-4d56-9b73-8bf1411df433)

First, change the source object into a Transformation object. This is because the data is stored in multiple delimited files and we want to process all of them in the dataflow. For this, right-click on the source object’s header and click *Transformation* in the context menu.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Fdx4KnJfs3PN5OQSJwxl4%2F2.png?alt=media\&token=9621650c-174e-448b-b7a9-c08c4d7e3079)

You can see that the color of the source object has changed from green to purple, which indicates that the source object has been changed into a transformation object.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FiybVVOfWPMy60AiAlmvl%2F3.png?alt=media\&token=fb86bdaf-1abd-433e-ae4e-77cc04b212ef)

Notice that the source object now has two nodes: *Input* and *Output*. The *Input* node has an input mapping port which means that it can take the path to the source file from another object.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FvKxuCIagXjTj2nyD7cBO%2F04-file-system-items-source-source-as-transformation.png?alt=media\&token=10fc2969-8960-457b-984b-0ef3150943c3)

Now we will use a *File System Items Source* object to provide a path to the *Customer\_Data* Transformation object. Go to the *Sources* section in the Toolbox and drag-and-drop the *File System Items Source* object onto the designer.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Fm19vN3MsoWj8mFK1Tboa%2F05-file-system-items-source-drag-and-drop.gif?alt=media\&token=5a8347e6-d8fb-4bf0-884c-959b806d576f)

If you look at the *File System Items Source* object, you can see that the layout is pre-populated with fields such as *FileName*, *FileNameWithoutExtension*, *Extension*, *FullPath*, *Directory*, *ReadOnly*, *Size*, and other attributes of the files.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FDdcKpgpnl4U6tb9MXpYN%2F06-file-system-items-source-object.png?alt=media\&token=37c6d1d9-b92a-49af-9e31-a388310694fe)

To configure the properties of the *File System Items Source* object, right-click on the *File System Items Source* object’s header and go to *Properties*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FHetuaJffCVRjnAAqHIN4%2F07-file-system-items-source-properties.png?alt=media\&token=ecfddf9c-39f3-4b97-b15e-ec80d88f5f89)

This will open the *File System Properties* window.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FQrX230D1TzX6ZZO9iNNK%2F8.png?alt=media\&token=a32d5f81-97a1-4c1b-bbd8-b7c4f4ffe272)

The first thing you need to do is point the *Path* to the directory or folder where your source files reside.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FSYZugTbzoZL7y8nD1Utx%2F09-file-systems-items-source-properties.png?alt=media\&token=28784c3e-34a2-401e-978b-59f8594e6131)

You can see a couple of other options on this screen:

*Filter:* If your specified source location contains files in different formats, you can use this option to read only the files in a given format. For instance, our source folder contains multiple .pdf, .txt, .doc, .xls, and .csv files, so we will write “\*.csv” in the *Filter* field to read delimited files only.

![](https://docs.astera.com/projects/centerprise/en/10/_images/10-file-system-items-source-properties.PNG)

*Include items in subdirectories:* Check this option if you want to process files present in the subdirectories.

*Include Entries for Directories:* Check this option if you want to include all items in the specified directory.
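
For readers who think in code, the options above can be approximated with a short Python sketch. This is only a conceptual analogue and not how Astera implements the object: the function `list_items` and its parameter names are hypothetical, and the sketch assumes that *Include Entries for Directories* adds the directories themselves to the listing regardless of the filter.

```python
from fnmatch import fnmatch
from pathlib import Path


def list_items(path, pattern="*", include_subdirectories=False,
               include_directories=False):
    """Return the sorted paths under `path` whose names match `pattern`."""
    root = Path(path)
    # rglob walks the whole tree; glob looks only at the top-level folder.
    candidates = root.rglob("*") if include_subdirectories else root.glob("*")
    items = []
    for entry in candidates:
        if entry.is_dir():
            # Assumption: directories are listed only when requested,
            # and the file filter is not applied to them.
            if include_directories:
                items.append(entry)
        elif fnmatch(entry.name, pattern):
            items.append(entry)
    return sorted(items)
```

Calling `list_items(folder, "*.csv")` on a folder of mixed files would return only the .csv files at the top level, which mirrors writing “\*.csv” in the *Filter* field with both checkboxes cleared.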

Once you have specified the Path and other options, click *OK*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FnGpCexczcFZgXaS26Lhl%2F11.png?alt=media\&token=3c433bd4-0697-4953-97fa-6dc93d6c449e)

Now right-click on the *File System Items Source* object’s header and select *Preview Output*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Fr57aZxglftpOe631k8Ct%2F12.png?alt=media\&token=7c948107-a7b6-4009-9e37-13c1ab23da5e)

You can see that the *File System Items Source* object has picked out only the delimited files from the specified location and has returned their metadata in the output. The output shows the *FileName*, *FileNameWithoutExtension*, *Extension*, *FullPath*, and *Directory* of each file, along with other attributes such as *ReadOnly*, *FileSize*, and *LastAccessed*.
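
To make the shape of one output row concrete, here is a minimal Python sketch that builds a comparable record for a single file. The field names are taken from the layout described above, but the function `describe_file` is hypothetical, and details such as whether *Extension* includes the leading dot or how *ReadOnly* is determined are assumptions rather than documented Astera behavior.

```python
import os
from pathlib import Path


def describe_file(path):
    """Build a metadata record for one file, loosely mirroring the preview."""
    p = Path(path).resolve()
    info = p.stat()
    return {
        "FileName": p.name,
        "FileNameWithoutExtension": p.stem,
        "Extension": p.suffix,           # includes the leading dot, e.g. ".csv"
        "FullPath": str(p),
        "Directory": str(p.parent),
        # Approximation: treat a file the current user cannot write as read-only.
        "ReadOnly": not os.access(p, os.W_OK),
        "Size": info.st_size,            # size in bytes
    }
```

Of these fields, *FullPath* is the one that matters for the next step, because it is the value handed to the downstream object.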

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Fjc1UTXJF8WyfBKgTlkXj%2F13.png?alt=media\&token=9114593c-82d8-47f7-bf54-86aa729fa620)

Now let’s start mapping. Map the *FullPath* field from the *File System Items Source* object to the *FullPath* field under the *Input* node in the *Customer\_Data* Transformation object.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FtFxSNV20RSM9mBHL9jFc%2F14.png?alt=media\&token=b5f9609f-5a41-4edc-a719-65bcc8d9cb72)

Once mapped, when we run the dataflow, the *File System Items Source* will pass the path to the source files, one by one, to the *Customer\_Data* Transformation object. The *Customer\_Data* Transformation object will read the data from the source file and pass it to the subsequent transformation object to be processed further in the dataflow.
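
Conceptually, this hand-off behaves like a loop that opens each matching file in turn and streams its records onward. The Python sketch below illustrates that idea only; `read_customer_files` is a hypothetical helper, and it assumes the delimited files are comma-separated, UTF-8 encoded, and have a header row.

```python
import csv
from pathlib import Path


def read_customer_files(folder, pattern="*.csv"):
    """Yield (file name, row) pairs, reading the matching files one at a time."""
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            # Each row is a dict keyed by the column names in the header line.
            for row in csv.DictReader(handle):
                yield path.name, row
```

Everything downstream of the source, such as the sort, the filter, and the database write, then sees one continuous stream of records regardless of how many files were found.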

### Using File System Items Source in a Workflow

In a workflow, the *File System Items Source* object can be used to provide input paths to a subsequent task such as a *RunDataflow* task. Let’s see how this works.

#### Steps to Use the File System Items Source in a Workflow

We want to design a workflow to orchestrate the process of extracting customer data stored in delimited files, sorting that data, filtering out records of customers from Germany and loading the filtered records in a database table.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FaWjiL3K23zt0zkBJeeQM%2F15.png?alt=media\&token=46a4092c-55ff-41f4-9b5f-24d99ca2691b)

We have already designed a dataflow for the process and have called this dataflow in our workflow using the *RunDataflow* task object.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F2GvYMg7RZqXOHjqhqCpI%2F16.png?alt=media\&token=3bb27f42-6a77-4aa8-b5c8-7c5f3bb5f46b)

We have multiple source files that we want to process in this dataflow. So, we will use a *File System Items Source* object to provide the path to our source files to the *RunDataflow* task. For this, go to the *Sources* section in the Toolbox and drag-and-drop the *File System Items Source* onto the designer.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FQLBPEpeIF0zYezKnqLaJ%2F17.gif?alt=media\&token=a9a4cf6b-b6a9-40ff-a4af-02c39b4a3f8b)

If you look at the *File System Items Source*, you can see that the layout is pre-populated with fields such as *FileName*, *FileNameWithoutExtension*, *Extension*, *FullPath*, *Directory*, *ReadOnly*, *Size*, and other attributes of the files. There is also a small blue icon with the letter ‘s’, which indicates that the object is set to run in *Singleton* mode.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FyicMgVUN7RKJsGEztM9C%2F18.png?alt=media\&token=1548afef-f808-46c0-aaa7-75f7f4cae401)

By default, all objects in a workflow are set to execute in *Singleton* mode. However, since we have multiple files to process in the dataflow, we will set the *File System Items Source* object to run in a loop. For this, right-click on the *File System Items Source* and click *Loop* in the context menu.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FSPcyzu6DHa6MkHYavesW%2F19.png?alt=media\&token=d2a19fef-aa42-4ea0-b360-ef78be341f6e)

You can see that the color of the object has changed to purple, and it now has this purple icon over the header which denotes the loop function.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FubHXVGrAPE56Nl6cZpUf%2F20.png?alt=media\&token=7220e80e-c559-4eef-9ce1-7f530cfd0f81)

It also has two mapping ports on the header, which are used to connect the *File System Items Source* object to the subsequent action in the workflow. Let’s map it to the *RunDataflow* task.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F40dodMLUEVQ8kbDPkMC9%2F21.png?alt=media\&token=f05f871b-da5c-4c5e-b4c6-47e927fa96f6)

To configure the properties of the *File System Items Source*, right-click on the *File System Items Source* object’s header and go to *Properties*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FSK4vJTe3O9wpkjGWkHyy%2F22.png?alt=media\&token=63a36d97-137c-46fd-bd9b-f3696d4c0a4e)

This will open the *File System Items Source Properties* window.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2F9P5y5otzzFXYTQpNeasT%2F23.png?alt=media\&token=e413cc6f-b7c6-43fc-8ca0-080ea2af48c5)

The first thing you need to do is point the *Path* to the directory or folder where your source files reside.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FyXnsb9krlx5kuQIjGSo2%2F24.png?alt=media\&token=d469fcfc-ea76-4d6d-a872-28823e676e40)

You can see a couple of other options on this window:

*Filter:* If your specified source location contains files in different formats, you can use this option to read only the files in a given format. For instance, our source folder contains multiple .pdf, .txt, .doc, .xls, and .csv files, so we will write “\*.csv” in the *Filter* field to read delimited files only.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FIPxVISpGFh4S3eUjzyWE%2F25.png?alt=media\&token=c7ca24b4-9f9a-41a0-ab52-fba8acedab4e)

*Include items in subdirectories:* Check this option if you want to process files present in the sub-directories.

*Include Entries for Directories:* Check this option if you want to include all items in the specified directory.

Once you have specified the *Path* and other options, click *OK*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Fi5vn0ddZNfHFLrGdXyQ6%2F26.png?alt=media\&token=b90e2fac-4973-4212-842a-f5e38f675e27)

Now right-click on the *File System Items Source* object’s header and click *Preview Output*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2Flwvr73d9mXikjrazjshE%2F27.png?alt=media\&token=04a525bb-a2da-4768-9344-630841650ffa)

You can see that the *File System Items Source* object has picked out only the delimited files from the specified location and has returned their metadata in the output. The output shows the *FileName*, *FileNameWithoutExtension*, *Extension*, *FullPath*, and *Directory* of each file, along with other attributes such as *ReadOnly*, *FileSize*, and *LastAccessed*.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FMUIWXA1QiiDYPnrsffSS%2F28.png?alt=media\&token=813bd3b5-b513-4adf-8024-cff76ddf7729)

Now let’s start mapping. Map the *FullPath* field from the *File System Items Source* object to the *FilePath* variable in the *RunDataflow* task.

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FTYFoocabSkRXKKb6vi53%2F29.png?alt=media\&token=80472cd0-b1fc-49f2-acb1-47b0465f07d6)

Once mapped, upon running the workflow, the *File System Items Source* object will pass the path to the source files, one by one, to the *RunDataflow* task. In other words, the *File System Items Source* acts as a driver that provides source files to the *RunDataflow* task, which will then process them in the dataflow.

When the *File System Items Source* is set to run in a loop, the dataflow will run *‘n’* times, where ‘n’ is the number of files passed by the *File System Items Source* to the *RunDataflow* task. For instance, you can see that we have six source files in the specified folder. The *File System Items Source* object will pass these six files, one by one, to the *RunDataflow* task to be processed in the dataflow.
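
The loop behavior can be pictured as one call to the dataflow per file. The Python sketch below is an illustration under stated assumptions, not Astera's execution model: `run_dataflow` is a hypothetical stand-in for the dataflow invoked by the *RunDataflow* task, and `run_in_loop` plays the role of the looping *File System Items Source*.

```python
from pathlib import Path


def run_dataflow(file_path):
    """Hypothetical stand-in for the dataflow called by the RunDataflow task."""
    return f"processed {Path(file_path).name}"


def run_in_loop(folder, pattern="*.csv"):
    """Call run_dataflow once per matching file and collect the results."""
    return [run_dataflow(path) for path in sorted(Path(folder).glob(pattern))]
```

With six matching files in the folder, `run_in_loop` returns six results, one for each run of the dataflow.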

![](https://3083465318-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsR50Wa7EwZGlmPSAMkkf%2Fuploads%2FaNd2xPDoOjohksSq175U%2F30.png?alt=media\&token=16b549cd-54ae-4fa6-adb5-84dba5179a96)

This concludes using the *File System Items Source* object in Astera Data Stack.
