Skip to main content

Define: Surrogate Key -> Consolidated Info


Definition:

Surrogate key is a substitution for the natural primary key in Data Warehousing.
It is just a unique identifier or number for each row that can be used for the primary key to the table.
The only requirement for a surrogate primary key is that it is unique for each row in the table.
It is useful because the natural primary key can change and this makes updates more difficult.
Surrogated keys are always integer or numeric. 

Scenario overview and details

To illustrate this example, we will use two made up sources of information to provide data about customers dimension. Each extract contains customer records with a business key (natural key) assigned to it.

In order to isolate the data warehouse from source systems, we will introduce a technical surrogate key instead of re-using the source system's natural (business) key.
A unique and common surrogate key is a one-field numeric key which is shorter, easier to maintain and understand, and independent from changes in source system than using a business key. Also, if a surrogate key generation process is implemented correctly, adding a new source system to the data warehouse processing will not require major efforts.

Surrogate key generation mechanism may vary depending on the requirements, however the inputs and outputs usually fit into the design shown below:
Inputs:
- an input respresented by an extract from the source system
- datawarehouse table reference for identifying the existing records
- maximum key lookup

Outputs:
- output table or file with newly assigned surrogate keys
- new maximum key
- updated reference table with new records


Comments

Popular posts from this blog

SQL Transformation with examples

============================================================================================= SQL Transformation with examples   Use : SQL Transformation is a connected transformation used to process SQL queries in the midstream of a pipeline . We can insert, update, delete and retrieve rows from the database at run time using the SQL transformation. Use SQL transformation in script mode to run DDL (data definition language) statements like creating or dropping the tables. The following SQL statements can be used in the SQL transformation. Data Definition Statements (CREATE, ALTER, DROP, TRUNCATE, RENAME) DATA MANIPULATION statements (INSERT, UPDATE, DELETE, MERGE) DATA Retrieval Statement (SELECT) DATA Control Language Statements (GRANT, REVOKE) Transaction Control Statements (COMMIT, ROLLBACK) Scenario: Let’s say we want to create a temporary table in mapping while workflow is running for some intermediate calculation. We can use SQL transformat...

Load the session statistics such as Session Start & End Time, Success Rows, Failed Rows and Rejected Rows etc. into a database table for audit/log purpose.

                                                                                                                                                                     ...

CMN_1650 A duplicate row was attempted to be inserted into a dynamic lookup cache Dynamic lookup error.

Scenario: I have 2 ports going through a dynamic lookup, and then to a router. In the router it is a simple case of inserting new target rows (NewRowLookup=1) or rejecting existing rows (NewRowLookup=0). However, when I run the session I'm getting the error: "CMN_1650 A duplicate row was attempted to be inserted into a dynamic lookup cache Dynamic lookup error. The dynamic lookup cache only supports unique condition keys." I thought that I was bringing through duplicate values so I put a distinct on the SQ. There is also a not null filter on both ports. However, whilst investigating the initial error that is logged for a specific pair of values from the source, there is only 1 set of them (no duplicates). The pair exists on the target so surely should just return from the dynamic lookup newrowlookup=0. Is this some kind of persistent data in the cache that is causing this to think that it is duplicate data? I haven't got the persistent cache or...