In this article, we'll show complete guide to lookup table fields and settings
Let’s understand what it contains and how each column is used.
Look Up Table Columns:
- TABLE_SRC
- TABLE_DEST
- SCHEMA_DEST
- PRIMARY_KEY
- WATERMARK_COLUMN
- WATERMARK_COLUMN_FORMAT
- FREQUENCY_IN_DAYS
- REFRESH_TYPE
- LAST_RUN_DATE
Understanding the columns in Lookup Table
- Source and Destination Information
1. TABLE_SRC
The source table for the pipeline, which refers to the table name defined in the BI Connector from which the data will be pulled.
2. TABLE_DEST
The name of the destination table in the target database where the data will be loaded.
3. SCHEMA_DEST
The schema in the destination database where the TABLE_DEST exists.
- Primary Key and Uniqueness
4. PRIMARY_KEY
The primary key column(s) of the source data.
- Used to uniquely identify rows and track changes.
- Can be a single column. Example: ERROR_ID
- Or multiple columns separated by commas. Example: PERSON_ID,EFFECTIVE_START_DATE,EFFECTIVE_END_DATE
- Watermarking (Incremental Load Logic)
5. WATERMARK_COLUMN
The column(s) used to detect new or updated records since the last run. Typically a timestamp or numeric column.
- Simply provide the name of the column used to track new or updated records
- Example: START_DATE
6. WATERMARK_COLUMN_FORMAT
Specifies the format of the datetime column of the WATERMARK_COLUMN field. It ensures that ADF can correctly interpret timestamp values for incremental data filtering.
- Required only if the watermark column is date/time column.
- Not required (can be left empty) if the column is not date/time column.
- Example: YYYY-MM-DD"T"HH24:MI:SS
Watermark Column Configuration Scenarios using example columns
- Execution Control
7. FREQUENCY_IN_DAYS
Indicates how often this pipeline should run.
- Example: 1 = daily, 7 = weekly.
8. REFRESH_TYPE
Defines the type of data load:
- FULL_LOAD :
- Set the REFRESH_TYPE to FULL_LOAD in the lookup table for the tables you want to do full load.
- Make sure the LAST_RUN_DATE is NULL for the rows configured for full load.
- To set FULL_LOAD for all rows in the lookup table, you can run the following SQL command:
UPDATE <your_lookup_table_name>
SET
REFRESH_TYPE = 'FULL_LOAD',
LAST_RUN_DATE = NULL;
To set FULL_LOAD for specific rows in the lookup table, you can run the following SQL command:
UPDATE <your_lookup_table_name>
SET
REFRESH_TYPE = 'FULL_LOAD',
LAST_RUN_DATE = NULL
WHERE
TABLE_SRC IN ('BIC_PJB_ERRORS', 'BIC_PER_ALL_PEOPLE_F', 'BIC_PAY_RUN_BALANCES');
5. This query will update rows for the tables listed in the IN clause.
Note: In the IN clause, these are BI Connector table names (that you created in BI Connector) inserted within '' and separated by commas.
- IF NOT FULL_LOAD:
- First , it will check for the maximum value of the watermark column in the destination table.
- If the Destination Table is empty or the maximum value is empty, then it will do full load
- Else, It will do an incremental refresh. It does this by comparing each source row's watermark value (like LAST_UPDATED_DATE) against the maximum value already present in the destination.Only the records with newer values — meaning data that was added or updated since the last load — are fetched and loaded.
9. LAST_RUN_DATE
Timestamp when the pipeline was last executed successfully.
- It is used to track the last execution and determine when the pipeline should run next, based on the value of FREQUENCY_IN_DAYS
- For example, LAST_RUN_DATE= 2024-12-20 00:00:00
- FREQUENCY_IN_DAYS=2
- Today’s date: 2024-12-22, the pipeline will execute since 2 days have passed since the last run