VectorDB Lookup Transformation

Overview 

The VectorDB Lookup transformation object in Astera finds and returns the tokens (the most relevant pieces of text stored in the vector database) that most closely match the query provided.

Getting the VectorDB Lookup Object 

  1. To get a VectorDB Lookup object from the Toolbox, go to Toolbox > Transformations > VectorDB Lookup. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.

  2. Drag-and-drop the VectorDB Lookup object onto the designer.

Configuring the VectorDB Lookup Object

  1. To configure the VectorDB Lookup object, right-click on its header and select Properties from the context menu.  

Selecting Properties opens a dialog box where you can configure the properties for the VectorDB Lookup object.

  1. The first step is to build a connection to the Pinecone database. Enter the Environment and API Key.

  2. Once finished, click Next.

  1. The next screen is the Vector Embedding screen. Currently, OpenAI is the only Embedding provider that Astera supports (more providers will be added in the future). The embeddings it generates are used to determine the similarity score between the query passed and the tokens returned. On this screen, enter the API Key and select the Embedding Model to use. Click Next once you finish.
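For intuition about the similarity score mentioned in the step above, here is a minimal sketch of cosine similarity, a common way to compare two embedding vectors. It is an illustration only: this page does not state which metric Astera or Pinecone applies, and the `cosine_similarity` helper and the three-dimensional vectors are made up for the example (real embeddings have far more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not real model output.
query_vec = [1.0, 0.0, 1.0]
close_token_vec = [2.0, 0.0, 2.0]  # points in the same direction as the query
far_token_vec = [0.0, 1.0, 0.0]    # orthogonal to the query

close_score = cosine_similarity(query_vec, close_token_vec)
far_score = cosine_similarity(query_vec, far_token_vec)
```

A token whose vector points in nearly the same direction as the query vector scores close to 1, while an unrelated token scores close to 0.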

  1. The next screen is the Vector Database Pick Index screen. Use the dropdown, which lists all available indexes, to select the index you want to draw data from.

Next is the Reader Options tab, which has two options:

  • Return top – documents: the number of most relevant documents to return from the vector database, ranked by similarity score.

  • Include metadata field in response: select this option when you want the results to include the metadata information mapped to the fields.

  1. Select Next once you finish.
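The two Reader Options can be pictured with a short sketch. This is not Astera's or Pinecone's implementation: the `return_top` function, the dictionary keys, and the sample matches are all hypothetical, and the scores are assumed to have been computed already.

```python
import heapq


def return_top(matches, top_k, include_metadata=False):
    """Keep the top_k highest-scoring matches, optionally with their metadata."""
    best = heapq.nlargest(top_k, matches, key=lambda m: m["score"])
    results = []
    for match in best:
        row = {"id": match["id"], "score": match["score"]}
        if include_metadata:
            # Mirrors "Include metadata field in response".
            row["metadata"] = match.get("metadata", {})
        results.append(row)
    return results


# Hypothetical scored matches, as if returned by a vector database.
matches = [
    {"id": "doc-1", "score": 0.62, "metadata": {"file_name": "faq"}},
    {"id": "doc-2", "score": 0.91, "metadata": {"file_name": "manual"}},
    {"id": "doc-3", "score": 0.47, "metadata": {"file_name": "notes"}},
]

top_two = return_top(matches, top_k=2)
top_two_with_meta = return_top(matches, top_k=2, include_metadata=True)
```

Here `top_two` holds only the ids and scores of the two best matches, while `top_two_with_meta` also carries each match's metadata.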

  1. This is the Config Parameters window. Click Next.

  1. The final window is the General Options window. This window consists of options common to most objects in a dataflow.

  • Clear Incoming Record Messages: When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.

  • Do Not Process Records with Errors: When this option is checked, the object does not output records with errors. When it is unchecked, the object outputs records with errors and attaches a record message to each of them. This record message can then feed into downstream objects in the dataflow, for example a destination file that captures record messages, or a log that captures messages and collects statistics.

  • Enable Sort Optimization: Check this option to optimize the sorting of tokens.

  • The Comments input allows you to enter comments associated with this object.

  1. Once finished, click OK.
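As a rough mental model of how the first two General Options interact, consider the sketch below. It assumes a simplified record with a list of messages and invents an error condition (an empty query). The `Record` class, the `run_lookup_step` function, and the message texts are hypothetical and do not reflect Astera's internals.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    data: dict
    messages: list = field(default_factory=list)


def run_lookup_step(records, clear_incoming_messages=False, skip_error_records=False):
    """Model 'Clear Incoming Record Messages' and 'Do Not Process Records with Errors'."""
    output = []
    for rec in records:
        # Checked: drop messages attached by objects earlier in the dataflow.
        messages = [] if clear_incoming_messages else list(rec.messages)
        has_error = not rec.data.get("query")  # hypothetical error: empty query
        if has_error:
            if skip_error_records:
                continue  # checked: the record is not output at all
            # Unchecked: output the record with a message attached.
            messages.append("VectorDB Lookup: query is empty")
        output.append(Record(data=dict(rec.data), messages=messages))
    return output


incoming = [
    Record({"query": "reset password"}, ["upstream: trimmed whitespace"]),
    Record({"query": ""}),
]

both_unchecked = run_lookup_step(incoming)
both_checked = run_lookup_step(
    incoming, clear_incoming_messages=True, skip_error_records=True
)
```

With both options unchecked, both records pass through and the faulty one carries a new message. With both checked, only the valid record is output and its upstream message is gone.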

The VectorDB Lookup object is now configured according to the changes made. 

Right-click on the VectorDB Lookup object’s header and select Preview Output from the context menu. 

View the data through the Data Preview window. 

You have successfully configured your VectorDB Lookup object. The fields from the object can now be mapped to other objects in the dataflow. These fields include:

  • the Query that was passed

  • the Id

  • the Relevance Score of the Token compared to the Query

  • the File Name without any extensions

  • the resulting token for the query
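To picture what a downstream mapping might work with, the sketch below models one output row with the five fields listed above. The class name, attribute names, sample values, and the `to_downstream` helper are all hypothetical. In Astera the mapping itself is done visually in the dataflow designer rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LookupResult:
    query: str              # the Query that was passed
    id: str                 # the Id of the matched entry
    relevance_score: float  # Relevance Score of the Token compared to the Query
    file_name: str          # File Name without any extension
    token: str              # the resulting token (text) for the query


def to_downstream(rows, min_score):
    """Map sufficiently relevant rows to a hypothetical downstream layout."""
    return [
        {"question": r.query, "answer": r.token, "source": r.file_name}
        for r in rows
        if r.relevance_score >= min_score
    ]


# Made-up sample rows for illustration only.
rows = [
    LookupResult("How do I reset my password?", "vec-001", 0.91,
                 "user_guide", "Open Settings and choose Reset Password."),
    LookupResult("How do I reset my password?", "vec-002", 0.38,
                 "release_notes", "Version 2 adds a dark theme."),
]

mapped = to_downstream(rows, min_score=0.5)
```

Filtering on the relevance score before mapping, as `to_downstream` does, is one possible design choice for keeping weak matches out of downstream objects.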

© Copyright 2023, Astera Software