Complete Guide to Lookup Table Fields and Settings

Modified on Mon, 14 Jul at 11:00 PM

In this article, we'll show complete guide to lookup table fields and settings


Let’s understand what it contains and how each column is used.


Look Up Table Columns:

  1. TABLE_SRC
  2. TABLE_DEST
  3. SCHEMA_DEST
  4. PRIMARY_KEY
  5. WATERMARK_COLUMN
  6. WATERMARK_COLUMN_FORMAT
  7. FREQUENCY_IN_DAYS
  8. REFRESH_TYPE
  9. LAST_RUN_DATE

Understanding the columns in Lookup Table

  • Source and Destination Information

1. TABLE_SRC
The source table for the pipeline, which refers to the 
table name defined in the BI Connector from which the data will be pulled.

2. TABLE_DEST
The name of the 
destination table in the target database where the data will be loaded.

3. SCHEMA_DEST
The 
schema in the destination database where the TABLE_DEST exists.


  • Primary Key and Uniqueness

4. PRIMARY_KEY
The 
primary key column(s) of the source data.

  • Used to uniquely identify rows and track changes.
  • Can be a single column. Example: ERROR_ID
  • Or multiple columns separated by commas. Example: PERSON_ID,EFFECTIVE_START_DATE,EFFECTIVE_END_DATE

  • Watermarking (Incremental Load Logic)

5. WATERMARK_COLUMN
The column(s) used to detect 
new or updated records since the last run. Typically a timestamp or numeric column.

  • Simply provide the name of the column used to track new or updated records
  • Example: START_DATE

6. WATERMARK_COLUMN_FORMAT
Specifies the format of the datetime column of the WATERMARK_COLUMN field. It ensures that ADF can correctly i
nterpret timestamp values for incremental data filtering.

  • Required only if the watermark column is date/time column.
  • Not required (can be left empty) if the column is not date/time column.
  • Example: YYYY-MM-DD"T"HH24:MI:SS

               Watermark Column Configuration Scenarios using example columns

Scenarios

watermark_column

watermark_column_format

Water mark column (Date/Time Column)

START_DATE

YYYY-MM-DD"T"HH24:MI:SS

Water mark column (Not a Date/Time Column)

RUN_RESULT_ID

Not required (can be left empty)


  • Execution Control
    7. FREQUENCY_IN_DAYS
                    Indicates how often this pipeline should run.
  • Example: 1 = daily, 7 = weekly.


            8. REFRESH_TYPE
               Defines the type of data load:

  • FULL_LOAD : 
  1. Set the REFRESH_TYPE to FULL_LOAD in the lookup table for the tables you want to do full load.
  2. Make sure the LAST_RUN_DATE is NULL for the rows configured for full load.
  3. To set FULL_LOAD for all rows in the lookup table, you can run the following SQL command:
UPDATE <your_lookup_table_name>
SET 
  REFRESH_TYPE = 'FULL_LOAD',
  LAST_RUN_DATE = NULL;
  1. To set FULL_LOAD for specific rows in the lookup table, you can run the following SQL command:

UPDATE <your_lookup_table_name>
SET 
  REFRESH_TYPE = 'FULL_LOAD',
  LAST_RUN_DATE = NULL
WHERE 
  TABLE_SRC IN ('BIC_PJB_ERRORS', 'BIC_PER_ALL_PEOPLE_F', 'BIC_PAY_RUN_BALANCES');

                                         5. This query will update rows for the tables listed in the IN clause.
Note: In the IN clause, these are BI Connector table names (that you created in BI Connector) inserted within '' and separated by commas.

  • IF NOT FULL_LOAD:  
  1. First , it will check for the maximum value of the watermark column in the destination table.
  2. If the Destination Table is empty or the maximum value is empty, then it will do full load
  3. Else, It will do an incremental refresh. It does this by comparing each source row's watermark value (like LAST_UPDATED_DATE) against the maximum value already present in the destination.Only the records with newer values — meaning data that was added or updated since the last load — are fetched and loaded.


              9. LAST_RUN_DATE
              Timestamp when the pipeline was 
last executed successfully.

  • It is used to track the last execution and determine when the pipeline should run next, based on the value of FREQUENCY_IN_DAYS
  • For example, LAST_RUN_DATE= 2024-12-20 00:00:00
  • FREQUENCY_IN_DAYS=2 
  • Today’s date: 2024-12-22, the pipeline will execute since 2 days have passed since the last run