Skip to main content

What is Pushdown Optimization and things to consider



                                                                                                                                                                      
Scenario: What is Pushdown Optimization and things to consider

Solution:
The process of pushing transformation logic to the source or target database by Informatica Integration service is known as Pushdown Optimization. When a session is configured to run for Pushdown Optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The Source or Target Database executes the SQL queries to process the transformations.

How does Pushdown Optimization (PO) Works?

The Integration Service generates SQL statements when native database driver is used. In case of ODBC drivers, the Integration Service cannot detect the database type and generates ANSI SQL.  The Integration Service can usually push more transformation logic to a database if a native driver is used, instead of an ODBC driver.
For any SQL Override, Integration service creates a view (PM_*) in the database while executing the session task and drops the view after the task gets complete. Similarly it also create sequences (PM_*) in the database.
Database schema (SQ Connection, LKP connection), should have the Create View / Create Sequence Privilege, else the session will fail.

Few Benefits in using PO

  • There is no memory or disk space required to manage the cache in the Informatica server for Aggregator, Lookup, Sorter and Joiner Transformation, as the transformation logic is pushed to database.
  • SQL Generated by Informatica Integration service can be viewed before running the session through Optimizer viewer, making easier to debug.
  • When inserting into Targets, Integration Service do row by row processing using bind variable (only soft parse – only processing time, no parsing time). But In case of Pushdown Optimization, the statement will be executed once.
Without Using Pushdown optimization:
INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL,
PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT,
MANAGER_ID,MANAGER_NAME,
DEPARTMENT_ID) VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13) –executes 7012352 times
With Using Pushdown optimization
INSERT INTO EMPLOYEES(ID_EMPLOYEE, EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID, MANAGER_NAME, DEPARTMENT_ID) SELECT CAST(PM_SJEAIJTJRNWT45X3OO5ZZLJYJRY.NEXTVAL AS NUMBER(15, 2)), EMPLOYEES_SRC.EMPLOYEE_ID, EMPLOYEES_SRC.FIRST_NAME, EMPLOYEES_SRC.LAST_NAME, CAST((EMPLOYEES_SRC.EMAIL || ‘@gmail.com’) AS VARCHAR2(25)), EMPLOYEES_SRC.PHONE_NUMBER, CAST(EMPLOYEES_SRC.HIRE_DATE AS date), EMPLOYEES_SRC.JOB_ID, EMPLOYEES_SRC.SALARY, EMPLOYEES_SRC.COMMISSION_PCT, EMPLOYEES_SRC.MANAGER_ID, NULL, EMPLOYEES_SRC.DEPARTMENT_ID FROM (EMPLOYEES_SRC LEFT OUTER JOIN EMPLOYEES PM_Alkp_emp_mgr_1 ON (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID)) WHERE ((EMPLOYEES_SRC.MANAGER_ID = (SELECT PM_Alkp_emp_mgr_1.EMPLOYEE_ID FROM EMPLOYEES PM_Alkp_emp_mgr_1 WHERE (PM_Alkp_emp_mgr_1.EMPLOYEE_ID = EMPLOYEES_SRC.MANAGER_ID))) OR (0=0)) –executes 1 time

Things to note when using PO

There are cases where the Integration Service and Pushdown Optimization can produce different result sets for the same transformation logic. This can happen during data type conversion, handling null values, case sensitivity, sequence generation, and sorting of data.
The database and Integration Service produce different output when the following settings and conversions are different:
  • Nulls treated as the highest or lowest value: While sorting the data, the Integration Service can treat null values as lowest, but database treats null values as the highest value in the sort order.
  • SYSDATE built-in variable: Built-in Variable SYSDATE in the Integration Service returns the current date and time for the node running the service process. However, in the database, the SYSDATE returns the current date and time for the machine hosting the database. If the time zone of the machine hosting the database is not the same as the time zone of the machine running the Integration Service process, the results can vary.
  • Date Conversion: The Integration Service converts all dates before pushing transformations to the database and if the format is not supported by the database, the session fails.
  • Logging: When the Integration Service pushes transformation logic to the database, it cannot trace all the events that occur inside the database server. The statistics the Integration Service can trace depend on the type of pushdown optimization. When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.



Comments

Popular posts from this blog

CMN_1650 A duplicate row was attempted to be inserted into a dynamic lookup cache Dynamic lookup error.

Scenario: I have 2 ports going through a dynamic lookup, and then to a router. In the router it is a simple case of inserting new target rows (NewRowLookup=1) or rejecting existing rows (NewRowLookup=0). However, when I run the session I'm getting the error: "CMN_1650 A duplicate row was attempted to be inserted into a dynamic lookup cache Dynamic lookup error. The dynamic lookup cache only supports unique condition keys." I thought that I was bringing through duplicate values so I put a distinct on the SQ. There is also a not null filter on both ports. However, whilst investigating the initial error that is logged for a specific pair of values from the source, there is only 1 set of them (no duplicates). The pair exists on the target so surely should just return from the dynamic lookup newrowlookup=0. Is this some kind of persistent data in the cache that is causing this to think that it is duplicate data? I haven't got the persistent cache or...

SQL Transformation with examples

============================================================================================= SQL Transformation with examples   Use : SQL Transformation is a connected transformation used to process SQL queries in the midstream of a pipeline . We can insert, update, delete and retrieve rows from the database at run time using the SQL transformation. Use SQL transformation in script mode to run DDL (data definition language) statements like creating or dropping the tables. The following SQL statements can be used in the SQL transformation. Data Definition Statements (CREATE, ALTER, DROP, TRUNCATE, RENAME) DATA MANIPULATION statements (INSERT, UPDATE, DELETE, MERGE) DATA Retrieval Statement (SELECT) DATA Control Language Statements (GRANT, REVOKE) Transaction Control Statements (COMMIT, ROLLBACK) Scenario: Let’s say we want to create a temporary table in mapping while workflow is running for some intermediate calculation. We can use SQL transformat...

Load the session statistics such as Session Start & End Time, Success Rows, Failed Rows and Rejected Rows etc. into a database table for audit/log purpose.

                                                                                                                                                                     ...