Friday, March 25, 2011

Monitor CPU Usage

The other day we encountered another problem. One of our reports was memory intensive and was consuming too much CPU, so much so that it was affecting other processes. I won’t go into the other approaches we discussed, but the gist is that we eventually decided to cap CPU usage. The idea was to raise an alarm when CPU usage reaches Y% and remains at or above that level for more than X seconds; for example, it reaches 80% and stays at or above 80% for more than 100 seconds. At that point, the DBA steps in and checks whether the report is impacting other processes. If yes, the DBA kills the report; otherwise, it is left to run its due course.

Since this is a straightforward requirement (and we thought it’s quite common too), our team requested the DBA team to create a metric in OEM. Somehow, that particular metric has a bug in Oracle 10g R2 (or so we were told by our Oracle liaison) that was only fixed in 11g. Upgrading was out of the question, and hence we decided to write the script ourselves.

We first thought of monitoring CPU usage from the Unix server and got very good help here from Burleson. It would have sufficed; however, this approach was also discarded as the Unix box was not owned by the DB team. :( I turned to Burleson once again and he helped me here and here. I took his advice and wrote my own script. This might not be the best way to monitor CPU usage but, what the heck, it worked for me. And it’s sweet and simple. :)

CREATE OR REPLACE FUNCTION MONITOR_CPU_USAGE
                                           (
                                            p_level    NUMBER,
                                            p_time    NUMBER
                                           ) RETURN NUMBER IS

/***************************************************************************
   NAME:     MONITOR_CPU_USAGE
   PURPOSE:  To Monitor CPU usage. When CPU usage exceeds a certain 
             threshold level (passed by user) and stays there for a 
             certain amount of time (passed by user), it'll return 
             1 as Alarm, otherwise it'll return 0. This function 
             will be scheduled to execute every 15 seconds.
  
 PARAMETER: There are two parameters.
            1. p_level: NUMBER. This sets the CPU utilization
                                threshold (in %).
            2. p_time:  NUMBER. This sets the time window. It is
                                to be passed in seconds.
            For example, to check if the CPU usage is above 80%
            for 2 min, pass 80 and 120 as parameters.
***************************************************************************/

  --Local Variable Declaration
  ln_flag   NUMBER;
  ln_count NUMBER;
  ln_sum  NUMBER;
 
BEGIN
  
   --Initialize return parameter.
   ln_flag := 0;
 
  SELECT COUNT ( * ),
         SUM (FLAG)
    INTO ln_count,
         ln_sum
    FROM (SELECT CASE WHEN V.VALUE > p_level THEN 1 ELSE 0 END FLAG
            FROM V$SYSMETRIC_HISTORY V
           WHERE V.METRIC_NAME = 'Host CPU Utilization (%)'
             AND V.BEGIN_TIME > (SELECT MAX (V1.BEGIN_TIME) - p_time / 86400
                                   FROM V$SYSMETRIC_HISTORY V1));
  
  --If the count of records equals the sum of the flags, every snapshot in the 
  --window was above the user-supplied level, so raise the alarm. Otherwise the 
  --CPU utilization dipped below that level at some point within the window.

  IF ln_count = ln_sum THEN
    ln_flag := 1;
  ELSE
    ln_flag := 0;
  END IF;
 
  RETURN ln_flag;
  
  --In case of an exception, do not raise the alarm.
  EXCEPTION
    WHEN OTHERS THEN
      RETURN 0;
      
END MONITOR_CPU_USAGE;

This function takes in the level (in %) and the time (in seconds) and returns 1 or 0. It’s a pretty simple function and easy to understand. My main intention was to keep it as simple as possible so that the function itself doesn’t add much overhead. It can now be scheduled, using DBMS_SCHEDULER, to run periodically and monitor the CPU, as sketched below.
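Here is a minimal sketch of how that scheduling might look. The wrapper procedure and the ALERT_LOG table are hypothetical; replace them with whatever notification mechanism (e-mail, log table, OEM alert) your shop actually uses.

-- Hypothetical wrapper: calls the function and records an alarm somewhere.
CREATE OR REPLACE PROCEDURE CHECK_CPU_ALARM IS
  ln_alarm NUMBER;
BEGIN
  -- Alarm if CPU has been above 80% for the last 120 seconds.
  ln_alarm := MONITOR_CPU_USAGE(80, 120);

  IF ln_alarm = 1 THEN
    -- ALERT_LOG is a made-up table; swap in your own alerting.
    INSERT INTO ALERT_LOG (alert_time, alert_msg)
    VALUES (SYSDATE, 'CPU above 80% for more than 120 seconds');
    COMMIT;
  END IF;
END CHECK_CPU_ALARM;
/

-- Schedule it every 15 seconds.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB (
    job_name        => 'MONITOR_CPU_JOB',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'CHECK_CPU_ALARM',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=SECONDLY; INTERVAL=15',
    enabled         => TRUE,
    comments        => 'Polls MONITOR_CPU_USAGE every 15 seconds');
END;
/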

Conclusion: Everything is possible, so long as you’ve access to the proper tables. ;)

Monday, March 7, 2011

Implementing a Hash Map in Oracle PL/SQL

In performance tuning, there is no set path. One always has to burn one’s hands, try different routes, and then take the most optimized one. The other day we were tuning some old code which loads source data from a file into a staging table and then into the final tables. The first part of the process was handled by SQL*Loader and was already pretty fast, so we didn’t bother exploring the external-tables option. We decided to focus on the second part: loading from the staging table into the final tables.

Each record in the staging tables was at the lowest level, say child, but included both parent and grandparent fields.  So records looked something like this:

111----Grand Parent Fields---222|555---Parent Fields---666|ABC---Child Fields---DEF
111----Grand Parent Fields---222|555---Parent Fields---666|GHI---Child Fields---JKL
111----Grand Parent Fields---222|777---Parent Fields---888|MNO---Child Fields---PQR
333----Grand Parent Fields---444|999---Parent Fields---000|STU---Child Fields---XYZ

Each child record was unique, but a parent record repeats for a certain number of children, and similarly a grandparent repeats for a certain number of parent records.

The existing code was something like this:

FOR ln_loop IN (SELECT all_records FROM staging_table)
LOOP
  ----Get Grandparent level information---
  SELECT fields
    INTO local variables
    FROM table
   WHERE field1 = ln_loop.field1
     AND field2 = ln_loop.field2;

  ----Get some Parent level information---

  ----Get some Child level information---

  ----INSERT/UPDATE final tables one by one---

END LOOP;


The first approach was changing the processing from row-based to set-based using BULK COLLECT and FORALL. However, since our code had to get information at three different levels and insert into multiple tables, we had to create one more staging table to hold the final data. This reduced the time from X minutes for Y records to X/2 minutes for the same number of records. But the problem with this approach was that it required too many changes, which would have lengthened our testing cycle. A minimal sketch of the pattern is shown below.
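For reference, here is a minimal sketch of the BULK COLLECT / FORALL pattern we tried. The table and column names (staging_table, final_table, field1, field2) are placeholders, not our actual schema.

DECLARE
  TYPE t_num_tab IS TABLE OF NUMBER;
  lt_field1 t_num_tab;
  lt_field2 t_num_tab;

  CURSOR c_stage IS
    SELECT field1, field2 FROM staging_table;
BEGIN
  OPEN c_stage;
  LOOP
    -- Fetch in chunks to keep PGA memory under control.
    FETCH c_stage BULK COLLECT INTO lt_field1, lt_field2 LIMIT 1000;
    EXIT WHEN lt_field1.COUNT = 0;

    -- One context switch to the SQL engine for the whole batch.
    FORALL i IN 1 .. lt_field1.COUNT
      INSERT INTO final_table (field1, field2)
      VALUES (lt_field1(i), lt_field2(i));

    COMMIT;
  END LOOP;
  CLOSE c_stage;
END;
/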

The second approach was to introduce a hash map. We created a hash array (an associative array) whose index was the grandparent-level identifier, and stored our values in it. The next time a record belonging to the same grandparent comes along, we just check whether the value already exists in the hash array. If not, we compute and store it; otherwise we take it straight from the array. So, if there are 10 different records at the GP level, each GP has 10 P records and each P has 10 C records, there are a total of 1000 unique C-level records. Earlier we were hitting the DB 1000 times each for the GP, P and C level queries. With this approach we reduced that to 10 hits for the GP query, 100 for the P query and 1000 for the C query. The overall time was reduced to X/2.5 minutes for Y records. And the biggest advantage was less code change: we only had to add IF..ELSE and TYPE handling statements.

The pseudo-code is as follows:

Type lr_record is record(Field1 NUMBER,
                         Field2 NUMBER,
                         .
                         .
                         .
                         FieldN NUMBER);

Type lt_hash_array is table of lr_record index by varchar2(20);

la_hash_array lt_hash_array;

FOR ln_loop IN (SELECT all_records FROM staging_table)
LOOP
  IF la_hash_array.EXISTS(ln_loop.field1||'-'||ln_loop.field2) THEN
     --Take value from array
  ELSE
   SELECT fields
     INTO local variables
     FROM table
    WHERE field1 = ln_loop.field1
      AND field2 = ln_loop.field2;
 
  --Update hash array (note: no .EXISTS here; assign directly by key)
  la_hash_array(ln_loop.field1||'-'||ln_loop.field2).field1 := local_var from previous query;
  la_hash_array(ln_loop.field1||'-'||ln_loop.field2).field2 := local_var from previous query;
  .
  .
  .
  la_hash_array(ln_loop.field1||'-'||ln_loop.field2).fieldN := local_var from previous query;
 
  END IF;

END LOOP;
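To make the pattern concrete, here is a tiny, self-contained sketch of the same caching idea using an associative array indexed by VARCHAR2. The "expensive computation" stands in for the SELECT ... INTO in the real code; all names here are made up for illustration.

DECLARE
  TYPE t_cache IS TABLE OF NUMBER INDEX BY VARCHAR2(20);
  la_cache t_cache;
  ln_val   NUMBER;
  lv_key   VARCHAR2(20);
BEGIN
  FOR i IN 1 .. 6 LOOP
    -- Only two distinct keys, so the "expensive" branch runs just twice.
    lv_key := CASE WHEN MOD(i, 2) = 0 THEN 'GP-1' ELSE 'GP-2' END;

    IF la_cache.EXISTS(lv_key) THEN
      ln_val := la_cache(lv_key);        -- cache hit: no lookup needed
    ELSE
      ln_val := i * 100;                 -- stand-in for the SELECT ... INTO
      la_cache(lv_key) := ln_val;        -- store it for next time
    END IF;

    DBMS_OUTPUT.PUT_LINE(lv_key || ' => ' || ln_val);
  END LOOP;
END;
/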
 
Conclusion: Like I said, there is no fixed rule. In this case, the hash map wins over the BULK method, especially with regard to fewer changes and less work for comparable performance. However, BULK is still my favorite whenever I am loading data from one system to another.

Thursday, March 3, 2011

Performance Tuning Made Easy

Everyone can feel confused in a strange situation. When we find ourselves in an organization for the first time, we don’t know where to go or whom to talk to. When we sit down at the steering wheel for the first time, we can’t make out what to do next with the vehicle. We don’t know what to do when we first turn on our PC or go on the Internet. We gain experience over time and then go straight to the right elevator, turn on the ignition, or perform the habitual sequence of actions on our PC almost subconsciously. But at first we need a guide, and that role is going to be played by this article.

When the term ‘performance tuning’ comes to our (humble developers, hereafter referred to as HDs) mind, we think it’s something for fundoo DBAs (hereafter referred to as FDs). HDs are not supposed to tune the code, only to develop it. That’s why we have FDs: to tune it. But then there’s always good code and bad code, and HDs never ever write bad code. So what makes code good or bad? Well, there are millions of guidelines, but one of the foremost that comes to an HD’s mind instantly is performance. So, there it is again: performance tuning. OK then, if there’s no escaping the behemoth, let’s face it head on. After all, we HDs are afraid of nothing. And if FDs can do it, so can we.

Like every scenario or problem, the 80/20 rule applies here too: 80% of the problems get solved by a few easy steps that take 20% of the effort. For the remaining 20%, we’ll have to work a little harder, and we’ll do that in the last part of the article. So HDs, let us roll up our sleeves and get to work.

Tackling 80% 

Now that we have waged war against the goliath, we must know and equip ourselves with the weapon(s) that we are going to use. This will help us identify the weak areas and make our job of finishing the task easier. Our main weapon is EXPLAIN PLAN.

EXPLAIN PLAN:  

It shows us, step by step, how a query will be executed. The row source tree is the core of the execution plan. It shows the following information:

- An ordering of the tables referenced by the statement
- An access method for each table mentioned in the statement
- A join method for tables affected by join operations in the statement
- Data operations like filter, sort, or aggregation

In addition to the row source tree, the plan table contains information about the following:

- Optimization, such as the cost and cardinality of each operation
- Partitioning, such as the set of accessed partitions
- Parallel execution, such as the distribution method of join inputs

The explain plan output is stored in PLAN_TABLE. It can be generated for any SQL statement by using the following syntax:

Explain plan for <HDs SQL Statement>;
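As a quick illustration (using the familiar EMP demo table), an HD can generate and display a plan like this; DBMS_XPLAN.DISPLAY reads the most recent plan from PLAN_TABLE:

EXPLAIN PLAN FOR
  SELECT e.empno, e.ename
    FROM emp e
   WHERE e.deptno = 10;

-- Pretty-printed output of the plan just generated
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);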

A typical explain plan, sans costs, looks like this:

     Rows   Execution Plan
       12   SORT AGGREGATE
        2     SORT GROUP BY
    76563       NESTED LOOPS
    76575         NESTED LOOPS
       19           TABLE ACCESS FULL TEST_2
    76570           TABLE ACCESS BY INDEX ROWID TEST_3
    76570             INDEX RANGE SCAN (object id 171)
    76563         TABLE ACCESS BY INDEX ROWID TEST_4
 11432983           INDEX RANGE SCAN (object id 184)

HDs can also choose to see cost, CPU, IO, %CPU, object type, owner, etc., by selecting the relevant columns from PLAN_TABLE. We HDs generally work with DB tools like Toad or PL/SQL Developer, and these tools provide the necessary options to view the explain plan.

Now we have the explain plan, but how do we know where to attack? Which are the weak areas? Well, here are a few pointers:

1. The higher the cost, the more time that part of the query is taking. Quite obvious, isn’t it? But another thing to keep in mind is that sometimes small, unnecessary operations hamper performance to a great extent.
2. Look for the following in particular:
   a. Full scans
   b. Unselective range scans
   c. Late predicate filters
   d. Wrong join order
   e. Late filter operations

For example, in the above explain plan, the last step is a very unselective range scan that is executed 76563 times, accesses 11432983 rows, throws away 99% of them, and retains 76563 rows. Why access 11432983 rows only to realize that just 76563 of them are needed? Similarly, there is one full table scan (TABLE ACCESS FULL on TEST_2) that could be avoided.

There are three modes of optimization used by the Oracle engine:

1. Rule Based (RBO)
   a. Oldest and the most stable one.
   b. Uses a fixed set of rules and data-dictionary information (such as available access paths) to make decisions.
   c. The order of tables and conditions matters.
   d. Inefficient at choosing indexes.
   e. Doesn’t support newer features such as bitmap indexes, function-based indexes, or table partitions.
   f. Almost obsolete from 10g onwards.
   g. Corresponds to the RULE optimizer goal.
2. Cost Based (CBO)
   a. Makes decisions on the basis of the statistics available about the schema objects.
   b. Efficient at using indexes.
   c. Corresponds to the ALL_ROWS and FIRST_ROWS optimizer goals.
3. Choose
   a. Decides which one of the above should be used, based on whether statistics about the objects are available or not.
   b. Not recommended.
      i. For example, it can be a disaster when only incomplete statistics are available, because it picks the CBO, which then has to compute the rest of the statistics at run time.

Apart from explain plan, the other weapons are TKPROF and SQL Trace, but for an HD, explain plan is easy to understand, easy to use, and effective. Now that we have learnt how to use the weapon and identify the weak areas, let’s understand how to counter and hit those weak areas and improve performance.

INDEX & FULL TABLE SCAN:

We HDs, as I said earlier, are very simple and humble developers. We take whatever our fellow FDs say for granted. So one fine day, one of the FDs said that indexes help in improving performance and are always the best way to tune. So what did we HDs do? We created an index every time there was a performance problem. Most of the time it worked just fine; after all, it was recommended by the FDs. But sometimes it didn’t, and we never enquired why, we just passed the query on to an FD.

Let us get our facts straight for once. Indexes only help in fetching records. They always slow down normal insert and update operations. (They do help insert and update operations that are based on a query, which is why I said normal.) Now consider the following two statements:

SELECT empno, ename FROM emp WHERE ename LIKE 'smith';
SELECT empno, ename FROM emp WHERE LOWER(ename) LIKE 'smith';

Both statements are nearly identical except for one difference in the WHERE clause: the latter uses a function on the column being searched. Now let us assume that both queries have performance issues. An HD in this scenario will simply create an index on the column ename, without ever checking whether that index is actually being used, because a normal index won’t be used by the second query. At this point we HDs should ask ourselves one very important question: if we are not using the index to fetch data, and that’s the only thing an index is good at, then why have we created it?
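To make the point concrete, here is a minimal sketch using the EMP demo table; the index names are made up:

-- A normal index: serves the first query (WHERE ename LIKE 'smith')
CREATE INDEX emp_ename_idx ON emp (ename);

-- A function-based index: needed for the second query (WHERE LOWER(ename) LIKE 'smith')
CREATE INDEX emp_ename_lower_idx ON emp (LOWER(ename));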

Before creating an index on a column, the column must pass the following criteria:

1. Percentage of data to be fetched
   a. If more than 10% of the total data, don’t create one.
2. Where and how it is used
   a. If in the WHERE clause:
      i.   Direct comparison, as in the first query: create a normal index.
      ii.  Using a function, as in the second query: create a function-based index.
      iii. Along with some other columns: create a composite index.
3. The cardinality of the data in the column
   a. High cardinality: B-Tree index
   b. Low cardinality: Bitmap index

So the steps, if we now encounter a full table scan in the explain plan, are as follows:
1. Check the query for the columns of that table.
2. Identify how they are used.
3. Check whether they have indexes on them.
   a. If yes, then find the real reason why they are not being used. For example:
      i.  A composite index was created, whereas only one column is used in the query.
      ii. A normal index was created instead of a function-based one.
4. If no, proceed to the next step.
5. (Re)create an appropriate index.
   a. If an index is already present, do check whether the existing index is still needed.
      i. It may not be feasible to drop it if another query uses it.
6. Re-execute the query and check the explain plan.

Rest assured, performance will be good now. Just a word of caution for HDs here: it’s not good to have too many indexes on a table. It’ll make the job of our old friends the FDs difficult, and we don’t want that.

As a rule, religiously follow the points below:
1. Analyze the table if the optimizer mode is CBO.
   a. FDs can pitch in here too, collecting statistics using STATSPACK etc.
   b. Use dbms_stats.gather_schema_stats('<schema_name>', options => 'GATHER AUTO');
   c. ANALYZE TABLE tab_name ESTIMATE STATISTICS SAMPLE 30 PERCENT
      i. Just a sample command to estimate statistics. ANALYZE can be used to compute/estimate/delete statistics for any DB object.
2. Follow the rules in the case of the RBO.
   a. List tables in decreasing order of number of records in the FROM clause.
      i. Decreasing after applying the filter conditions.
   b. List filter conditions in increasing order in the WHERE clause, after the join conditions.
      i. Increasing, as in the condition which filters the most rows comes last.
3. Never create an index on small tables, for example Department.
4. Primary key and unique constraints create indexes automatically.
5. Always join tables on indexed columns.
6. Always make sure that the index created is being used (see the example after this list).
   a. Using NOT can suppress an index.
      i. For example, 'NOT <' is equal to '>=', so use the latter instead. Similarly, avoid '!=' and '<>'.
   b. Avoid comparing an indexed column to a different data type, as the implicit cast will suppress the index.
   c. Avoid comparing an indexed column to the same index column, as it disables index use.
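A quick illustration of point 6 (assuming a hypothetical ORDERS table whose ORDER_NO column is VARCHAR2 and is indexed):

-- Implicit conversion: Oracle rewrites this as TO_NUMBER(order_no) = 1001,
-- which suppresses the index on ORDER_NO.
SELECT * FROM orders WHERE order_no = 1001;

-- Index-friendly: compare like with like.
SELECT * FROM orders WHERE order_no = '1001';

-- Likewise, prefer sal >= 1000 over NOT sal < 1000 on an indexed SAL column.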

With these points, the index part is over and we can now safely move on to other areas.

JOINS:
No HD can claim that (s)he doesn’t have to join tables to retrieve data. And the more tables an HD joins, the more performance degrades. To stem this, always ensure:

1. An index exists on the columns on which the tables are joined.
   a. B-Tree or Bitmap index; not a composite index.
2. Always, always have N-1 join conditions for N tables, unless the functional requirements demand otherwise (see the example after this list).
   a. Fewer conditions will produce a Cartesian product.
   b. More will cause unnecessary scans.
3. The table with the least number of records comes last. (For the RBO.)
4. For a self-join, a hash join gives the fastest throughput.
5. If possible, create a join index on the columns that are regularly joined.
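For point 2, a minimal sketch with three tables (EMP and DEPT are the demo tables; LOCATIONS and its columns are made up), so N = 3 and we need exactly two join conditions:

SELECT e.ename, d.dname, l.city
  FROM emp e, dept d, locations l
 WHERE e.deptno = d.deptno      -- join condition 1
   AND d.loc_id = l.loc_id;     -- join condition 2 (N - 1 = 2 for 3 tables)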

By the time we’re done with JOINS and INDEXES, almost 90% of the cases and 80% of the query are tuned. Still, we HDs never stop at 80%. We want 100%. So let’s buckle up and move on to tackle that last 20%.

Tackling 20%

The following steps can be implemented to tune the query further. As always, use the explain plan:

1. Use a nested loop or hash join over a sort-merge join.
   a. Use a nested loop when one table is much bigger than the other.
      i. A nested loop join has the fastest response time, provided there is an index on the inner (driven) table.
   b. Force these joins using hints. The following will use a nested loop with an index on table_name (see the example after this list):
      i.   /*+ USE_NL_WITH_INDEX (table_name) */
      ii.  Always use an index with a nested loop join.
      iii. An index is not necessarily required for a hash join, though a hash join:
           1. Can be used only in the case of equi-joins.
           2. Is usually chosen when at least one of the two join tables does not have an index on the join column, or when the database server must read a large number of rows from both tables.
           3. Requires no index and no sorting on the tables being joined.
2. Go for parallel access in the case of full table scans of large tables (see the example after this list).
   a. Use the PARALLEL hint.
3. Use the best of the RBO and CBO to arrive at the solution.
   a. Using the ORDERED / ORDERED_PREDICATES hints bypasses the expensive and time-consuming evaluation of join orders.
   b. Override Oracle’s parameters that control the number of possible joins evaluated by the cost-based optimizer:
      i.  optimizer_search_limit: specifies the maximum number of table join combinations that will be evaluated. Default is 5.
      ii. optimizer_max_permutations: specifies the upper boundary for the maximum number of permutations considered by the optimizer. Default is 80,000.
          1. For example, for a query with 6 tables, there are 720 (= 6!) ways these tables can be joined.
4. Try using subqueries. They can improve performance if used correctly. For example:
   a. Use one to retrieve records from a large table instead of joining to it, when that table contributes nothing else to the query.
   b. When there are many conditions and few records are fetched. (Vice versa applies to a join.)
   c. Use one as an inline view to get distinct values and/or to avoid two time-consuming activities in the same query, such as calling a function and selecting DISTINCT records.
5. Avoid expensive functions and operations in the query.
   a. Use CASE over DECODE.
   b. GROUP BY, ORDER BY, DISTINCT, UNION, MINUS and INTERSECT are expensive. Try rewriting the query.
6. Use PL/SQL for complex logic.
   a. It will enhance cursor sharing too.
7. Avoid calling a stored function from another stored function.
   a. Also, avoid clubbing several function calls together in the same SELECT clause.
8. Don’t rely on views for performance tuning.
   a. Don’t query a subset of a complex view, as all the tables in the view will still be joined. Query the tables directly instead.
   b. Queries against views with complex WHERE clauses will often override any tuning hints placed within the view, causing suboptimal execution plans.
   c. A view can’t be treated as a finite table, and adding SQL hints to view queries will often result in suboptimal execution plans. Any time the optimizer gets “confused,” it may perform an unnecessary full-table scan.
9. Avoid using ‘*’ in the query, as it results in one extra step for the parser and is hence inefficient.
10. Use COUNT(indexed column, preferably the PK) instead of COUNT(*).
11. Commit regularly to improve performance, as it releases the resources held.
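As referenced in points 1.b and 2 above, here is a minimal sketch of what the hint syntax looks like in an actual query. EMP and DEPT are the demo tables, and emp_deptno_idx is a made-up index name:

-- Nested loop into EMP via a specific index.
SELECT /*+ USE_NL_WITH_INDEX(e emp_deptno_idx) */ d.dname, e.ename
  FROM dept d, emp e
 WHERE e.deptno = d.deptno;

-- Parallel full scan of EMP with a degree of 4.
SELECT /*+ FULL(e) PARALLEL(e, 4) */ COUNT(*)
  FROM emp e;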

The above 11 steps will tune the query further, by about another 15%. Now HDs must be wondering what happened to the remaining 5%. Well, all I can say is that in that scenario you should contact the author of this article or one of those FDs. We almost forgot about them, didn’t we?

Now that we have learnt everything, well, almost everything, about performance tuning, the term doesn’t scare us anymore. And like I said in the beginning, we HDs only write good code. We do.

Before I finish this article, there is just one more thing I’d like to share: how much should we tune, and when should we stop? Well, since I know all HDs, being one of them myself, we’ll have to exercise control. While tuning code, HDs need to measure the performance difference and the cost of their efforts, and then ask themselves: is the return on investment worth the effort? If the answer is yes, go full throttle ahead… till the target is achieved. When the agreed-upon performance goals are achieved, stop all tuning efforts. Remember to exercise control and avoid the temptation to tweak it just a little bit more. We know that may be a hard habit to break, but perseverance will prevail.




To conclude this article, I’d say performance tuning is a step-by-step process of iteratively investigating, determining, and implementing tuning solutions using a proven methodology. While tuning a system it is important to know when to tune, what to tune, how much tuning it requires, and when the tuning efforts should stop. Specific goals need to be set, and all tuning efforts need to cease when those goals are attained. This article is just a guideline, not a set of laid-down rules. The data present greatly affects a query’s performance.