
Efficient SQL Statements : SQL Tuning Tips

This is an extremely brief look at some of the factors that may affect the efficiency of your SQL and PL/SQL code. It is not intended as a thorough discussion of the area and should not be used as such.

Check Your Stats

The Cost Based Optimizer (CBO) uses statistics to decide which execution plan to use. If these statistics are incorrect, the decisions made by the CBO may be incorrect too. For this reason it is important to make sure that these statistics are refreshed regularly, typically using the DBMS_STATS package.
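As a minimal sketch, statistics for a whole schema can be gathered with DBMS_STATS (the SCOTT schema name here is just a placeholder):
BEGIN
  -- Gather optimizer statistics for every object in the schema.
  DBMS_STATS.gather_schema_stats(ownname => 'SCOTT');
END;
/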

Why Indexes Aren't Used

The presence of an index on a column does not guarantee it will be used. The following is a small list of factors that will prevent an index from being used.
  • The optimizer decides it would be more efficient not to use the index. If your query is returning the majority of the data in a table, then a full table scan is probably going to be the most efficient way to access the table.
  • You perform a function on the indexed column, e.g. WHERE UPPER(name) = 'JONES'. The solution to this is to use a function-based index (see the sketch after this list).
  • You perform mathematical operations on the indexed column, e.g. WHERE salary + 1 = 10001.
  • You concatenate a column, e.g. WHERE firstname || ' ' || lastname = 'JOHN JONES'.
  • You do not include the first column of a concatenated index in the WHERE clause of your statement. For the index to be used in a partial match, the first (leading-edge) column must be referenced. Index skip scanning in Oracle 9i and above allows indexes to be used even when the leading edge is not referenced.
  • The use of OR conditions confuses the Cost Based Optimizer (CBO). It will rarely choose to use an index on a column referenced in an OR condition, and will even ignore optimizer hints in this situation. The only way of guaranteeing index use in these situations is to use an INDEX hint.
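As an illustration of the function-based index fix mentioned above (the employees table and name column are hypothetical):
-- A normal B-tree index on name cannot satisfy UPPER(name) = 'JONES'.
CREATE INDEX employees_upper_name_idx ON employees (UPPER(name));

-- With the function-based index in place, this predicate can use it.
SELECT id, name
FROM   employees
WHERE  UPPER(name) = 'JONES';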

EXISTS vs. IN

The EXISTS condition tests for the presence of a single row meeting the stated criteria, whereas the IN condition checks against all occurrences.
TABLE1 - 1000 rows
TABLE2 - 1000 rows
(A)
SELECT t1.id
FROM   table1 t1
WHERE  t1.code IN (SELECT t2.code 
                   FROM   table2 t2);
(B)
SELECT t1.id
FROM   table1 t1
WHERE  EXISTS (SELECT '1' 
               FROM   table2 t2 
               WHERE  t2.code = t1.code);
For query A, all rows in TABLE2 will be read for every row in TABLE1, giving 1,000,000 row reads from TABLE2. In the case of query B, a maximum of one row from TABLE2 will be read for each row of TABLE1, thus reducing the processing overhead of the statement.
Rule of thumb:
  • If the majority of the filtering criteria are in the subquery then the IN variation may be more performant.
  • If the majority of the filtering criteria are in the top query then the EXISTS variation may be more performant.
I would suggest that you try both variants and see which works best.
Note: In later versions of Oracle there is little difference between EXISTS and IN operations.

Presence Checking

The first question you should ask yourself is, "Do I need to check for the presence of a record?" Alternatives to presence checking include:
  • Use the MERGE statement if you are not sure if data is already present.
  • Perform an insert and trap the failure caused by an existing row using the DUP_VAL_ON_INDEX exception handler (see the sketch after this list).
  • Perform an update and test for no rows updated using SQL%ROWCOUNT.
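A minimal sketch of the insert-and-trap approach (the items table and its columns are hypothetical, and a unique constraint on item_code is assumed):
BEGIN
  INSERT INTO items (item_code, item_size)
  VALUES ('A100', 'SMALL');
EXCEPTION
  WHEN DUP_VAL_ON_INDEX THEN
    NULL; -- The row is already present, so carry on quietly.
END;
/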
If none of these options are right for you and processing is conditional on the presence of certain records in a table, you may decide to code something like the following.
SELECT COUNT(*)
INTO   v_count
FROM   items
WHERE  item_size = 'SMALL';
 
IF v_count = 0 THEN
  -- Do processing related to no small items present
END IF;
If there are many small items, time and processing will be lost retrieving multiple records which are not needed. This would be better written as one of the following.
SELECT COUNT(*)
INTO   v_count
FROM   items
WHERE  item_size = 'SMALL'
AND    rownum = 1;
 
IF v_count = 0 THEN
  -- Do processing related to no small items present
END IF;
OR
SELECT COUNT(*)
INTO   v_count
FROM   dual
WHERE  EXISTS (SELECT 1
               FROM items
               WHERE item_size = 'SMALL');
 
IF v_count = 0 THEN
  -- Do processing related to no small items present
END IF;
In these examples only a single record is retrieved in the presence/absence check.

Inequalities

If a query uses inequalities (item_no > 100), the optimizer must estimate the number of rows returned before it can decide the best way to retrieve the data. This estimation is prone to errors. If you are aware of the data and its distribution you can use optimizer hints to encourage or discourage full table scans to improve performance.
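For example, a hint can steer the optimizer when you know the distribution better than the statistics suggest (FULL is a standard Oracle hint; the items table and alias are placeholders):
-- Ask the optimizer to favour a full table scan for this query.
SELECT /*+ FULL(i) */ COUNT(*)
FROM   items i
WHERE  item_no > 100;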
If an index is being used for a range scan on the column in question, the performance can be improved by substituting >= for >. In this case, item_no > 100 becomes item_no >= 101. In the first case, a full scan of the index will occur. In the second case, Oracle jumps straight to the first index entry with an item_no of 101 and range scans from this point. For large indexes this may significantly reduce the number of blocks read.
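A minimal illustration of the rewrite, assuming item_no is an integer column with a B-tree index:
-- Original predicate: the range scan starts from the beginning of the index.
SELECT item_no FROM items WHERE item_no > 100;

-- Equivalent for integer keys: Oracle can jump straight to the entry for 101.
SELECT item_no FROM items WHERE item_no >= 101;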

When Things Look Bad!

If you have a process/script that shows poor performance you should do the following:
  • Write sensible queries in the first place!
  • Identify the specific statement(s) that are causing a problem. The simplest way to do this is to use SQL Trace, but you can also try running the individual statements using SQL*Plus and timing them (SET TIMING ON).
  • Use EXPLAIN PLAN to look at the execution plan of the statement (a worked example follows this list). Look for any full table accesses that look dubious. Remember, a full table scan of a small table is often more efficient than access by index.
  • Check to see if there are any indexes that may help performance.
  • Try adding new indexes to the system to reduce excessive full table scans. Typically, foreign key columns should be indexed as these are regularly used in join conditions. On occasion it may be necessary to add composite (concatenated) indexes that will only aid individual queries. Remember, excessive indexing can reduce INSERT, UPDATE and DELETE performance.
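For reference, a common way to display an execution plan (DBMS_XPLAN.display is available from Oracle 9i Release 2; the query itself is just a placeholder):
EXPLAIN PLAN FOR
SELECT e.first_name, e.last_name
FROM   employee e
WHERE  e.last_name = 'HALL';

SELECT * FROM TABLE(DBMS_XPLAN.display);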

Driving Tables (RBO Only)

The structure of the FROM and WHERE clauses of DML statements can be tailored to improve the performance of the statement. The rules vary depending on whether the database engine is using the Rule or Cost based optimizer. The situation is further complicated by the fact that the engine may perform a Merge Join or a Nested Loop join to retrieve the data. Despite this, there are a few rules you can use to improve the performance of your SQL.
Oracle processes result sets a table at a time. It starts by retrieving all the data for the first (driving) table. Once this data is retrieved it is used to limit the number of rows processed for subsequent (driven) tables. In the case of multiple table joins, the driving table limits the rows processed for the first driven table. Once processed, this combined set of data is the driving set for the second driven table, and so on. Roughly translated into English, this means that it is best to process tables that will retrieve a small number of rows first. The optimizer will do this to the best of its ability regardless of the structure of the DML, but the following factors may help.
Both the Rule and Cost based optimizers select a driving table for each query. If a decision cannot be made, the order of processing is from the end of the FROM clause to the start. Therefore, you should always place your driving table at the end of the FROM clause. Subsequent driven tables should be placed in order so that those retrieving the most rows are nearer to the start of the FROM clause. Confusingly, the WHERE clause should be written in the opposite order, with the driving table's conditions first and the final driven table's last, e.g.
FROM  d, c, b, a
WHERE a.join_column = 12345
AND   a.join_column = b.join_column
AND   b.join_column = c.join_column
AND   c.join_column = d.join_column;
If we now want to limit the rows brought back from the "D" table we may write the following.
FROM  d, c, b, a
WHERE a.join_column = 12345
AND   a.join_column = b.join_column
AND   b.join_column = c.join_column
AND   c.join_column = d.join_column
AND   d.name = 'JONES';
Depending on the number of rows and the presence of indexes, Oracle may now pick "D" as the driving table. Since "D" now has two limiting factors (join_column and name), it may be a better candidate as a driving table, so the statement may be better written as follows.
FROM  c, b, a, d
WHERE d.name = 'JONES'
AND   d.join_column = 12345
AND   d.join_column = a.join_column
AND   a.join_column = b.join_column
AND   b.join_column = c.join_column;
This grouping of limiting factors means table "D" will return relatively few rows, making it a more efficient driving table.
Remember, the order of the items in both the FROM and WHERE clauses will not force the optimizer to pick a specific table as the driving table, but it may influence its decision. Grouping limiting conditions onto a single table reduces the number of rows returned from that table, and therefore makes it a stronger candidate for becoming the driving table.

Caching Tables

Queries will execute much faster if the data they reference is already cached. For small frequently used tables performance may be improved by caching tables. Normally, when full table scans occur, the cached data is placed on the Least Recently Used (LRU) end of the buffer cache. This means that it is the first data to be paged out when more buffer space is required. If the table is cached (ALTER TABLE employees CACHE;) the data is placed on the Most Recently Used (MRU) end of the buffer, and so is less likely to be paged out before it is re-queried. Caching tables may alter the CBO's path through the data and should not be used without careful consideration.

Improving Parse Speed

Execution plans for SELECT statements are cached by the server, but unless the exact same statement is repeated the stored execution plan details will not be reused. Even differing spaces in the statement will cause this lookup to fail. Use of bind variables allows you to repeatedly use the same statements whilst changing the WHERE clause criteria. Assuming the statement does not have a cached execution plan it must be parsed before execution. The parse phase for statements can be decreased by efficient use of aliasing. If an alias is not present, the engine must resolve which tables own the specified columns. The following is an example.
Bad statement:
SELECT first_name,
       last_name,
       country
FROM   employee,
       countries
WHERE  country_id = id
AND    last_name  = 'HALL';

Good statement:
SELECT e.first_name,
       e.last_name,
       c.country
FROM   employee e,
       countries c
WHERE  e.country_id = c.id
AND    e.last_name  = 'HALL';
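On the bind variable point above, a minimal sketch using native dynamic SQL (the employees table and the literal id are placeholders):
DECLARE
  v_name VARCHAR2(100);
BEGIN
  -- The statement text never changes, so its cached plan can be reused
  -- no matter which id value is bound at run time.
  EXECUTE IMMEDIATE
    'SELECT last_name FROM employees WHERE employee_id = :id'
    INTO v_name
    USING 100;
END;
/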

Packages, Procedures and Functions

When an SQL statement, or anonymous block, is passed to the server it is processed in three phases.
Phase       Actions
Parse       Syntax check and object resolution.
Execution   Necessary reads and writes performed.
Fetch       Resultant rows are retrieved, assembled, sorted and returned.
The parse phase is the most time and resource intensive. This phase can be avoided if all anonymous blocks are stored as database procedures, functions, packages or views. Being database objects, their SQL text and compiled code are stored in the data dictionary and the executable copies reside in the shared pool.
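As a sketch of this point, the presence check from earlier could live in a stored function, so its compiled form is kept in the shared pool (items and item_size are the hypothetical names used above):
CREATE OR REPLACE FUNCTION small_items_exist RETURN BOOLEAN AS
  v_count NUMBER;
BEGIN
  -- Reuses the single-row EXISTS check from the Presence Checking section.
  SELECT COUNT(*)
  INTO   v_count
  FROM   dual
  WHERE  EXISTS (SELECT 1 FROM items WHERE item_size = 'SMALL');
  RETURN v_count > 0;
END;
/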

