The other day we encountered another problem. One of our reports was too memory intensive and was taking too much CPU space. So much so that in fact it was effecting other processes. I won’t go into other approaches that we discussed but the gist is we eventually decided to cap CPU usage. The idea was to raise an alarm when CPU usage reaches Y% and remains at or above that level for more than X secs. For example, it reaches 80% and remains at or above 80% for more than 100 Secs. At which point, the DBA will step in and check whether the report is impacting other process or not. If yes, DBA will kill the report or else let it run its due course.
Since, this is a straight forward requirement (and we thought it’s quite common too), our team requested the DBA team to create a metric in OEM. Somehow, that particular metric has a bug in Oracle 10g R2 (or that’s what was told to us by our Oracle Liaison) and was only solved in 11g. Upgrading was out of question and hence we decided to write the script ourselves.
We first thought of monitoring CPU usage from unix server and got very good help here by Burleson. It would have sufficed, however this approach was again discarded as unix box was not owned by DB team. :( I turned to Burleson once again and he helped me here and here. I took his advice and wrote my own script. This might not be the best way to monitor CPU usage but what the heck, it worked for me. And it’s sweet and simple. :)
CREATE OR REPLACE FUNCTION MONITOR_CPU_USAGE
(
p_level NUMBER,
p_time NUMBER
) RETURN NUMBER IS
/***************************************************************************
NAME: MONITOR_CPU_USAGE
PURPOSE: To Monitor CPU usage. When CPU usage exceeds a certain
threshold level (passed by user) and stays there for a
certain amount of time (passed by user), it'll return
1 as Alarm, otherwise it'll return 0. This function
will be scheduled to execute every 15 seconds.
PARAMETER: There are two parameter.
1. p_level: NUMBER. This is to set the level of CPU
utilization.
2. p_time: NUMBER: This is to set the time. It is
to be passed in seconds.
For example, to check if the CPU usage is above 80%
for 2 min, pass, 80 and 120 as parameters.
***************************************************************************/
--Local Variable Declaration
ln_flag NUMBER;
ln_count NUMBER;
ln_sum NUMBER;
BEGIN
--Initialize return parameter.
ln_flag := 0;
SELECT COUNT ( * ),
SUM (FLAG)
INTO ln_count,
ln_sum
FROM (SELECT CASE WHEN V.VALUE > p_level THEN 1 ELSE 0 END FLAG
FROM V$SYSMETRIC_HISTORY V
WHERE V.METRIC_NAME = 'Host CPU Utilization (%)'
AND V.BEGIN_TIME > (SELECT MAX (V1.BEGIN_TIME) - p_time / 86400
FROM V$SYSMETRIC_HISTORY V1));
--If count of records is not equal to sum of the records, the CPU
--utilization is not above the level set by user for the number of
--seconds passed by user.
IF ln_count = ln_sum THEN
ln_flag := 1;
ELSE
ln_flag := 0;
END IF;
RETURN ln_flag;
--In case of exception, do not raise the alarm.
EXCEPTION
WHEN OTHERS THEN
RETURN 0;
END MONITOR_CPU_USAGE;
This script will take in the level (in %age) and time (in secs) and will return 1 or 0. It’s a pretty simple function and easy to understand. My main intention was to keep the function as simple as possible to avoid the high memory usage by the function itself. This function can now be scheduled, using dbms scheduler, to run periodically and monitor CPU.
Conclusion: Everything is possible, so long as you’ve access to the proper tables. ;)
No comments:
Post a Comment