Cost-Based Optimizer - CBO Roadmap


CBO cost-based optimizer, initialization parameters, object statistics, system statistics, cost model, PGA management


The cost-based optimizer (CBO) is not only one of the most complex pieces of software that constitute the Oracle kernel, it is also one of the most unappreciated. Why? Because to effectively use the CBO you have to correctly configure it and, therefore, understand how it works! In fact, without an optimal configuration, the CBO will generate poor execution plans that lead to poor performance.

The CBO is influenced by many configuration settings. Your settings for important CBO parameters can have a dramatic impact on CBO performance, and this is the place to start when exploring the CBO. Let's start by choosing your CBO optimizer_mode and then examine other important CBO parameters.

What Are Optimizer Statistics?

Optimizer statistics are a collection of data that describe the database and the objects in it. These statistics are used by the optimizer to choose the best execution plan for each SQL statement. Statistics are stored in the data dictionary and can be accessed using data dictionary views such as USER_TAB_STATISTICS. Optimizer statistics are different from the performance statistics visible through V$ views; the information in the V$ views relates to the state of the system and the SQL workload executing on it.

Note: This information was taken from Understanding Optimizer Statistics - Oracle eBooks.

Not Using Bind Variables Correctly

From the performance point of view, bind variables introduce both an advantage and a disadvantage. The advantage of bind variables is that they allow the sharing of cursors in the library cache, and in doing so they avoid hard parses and the associated overhead. The disadvantage of using bind variables in WHERE clauses (and only in WHERE clauses) is that crucial information is sometimes hidden from the query optimizer.

See also Histograms Tuning - Quick Guide.

Key Concepts and Best Practices

Selectivity and Cardinality: Selectivity is a value between 0 and 1 representing the fraction of rows filtered by an operation. For example, if an access operation reads 120 rows from a table and, after applying a filter, returns 18 of them, the selectivity is 0.15 (18/120). Selectivity can also be expressed as a percentage, so 0.15 can also be written as 15%. A selectivity close to 0 is said to be strong; a selectivity close to 1 is said to be weak. Cardinality is the number of rows returned by an operation. The formula for cardinality is: cardinality = selectivity * num_rows.
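The arithmetic above can be sketched in a few lines. This is an illustration only (not Oracle code), reusing the example figures from the text:

```python
# Minimal sketch of the selectivity/cardinality arithmetic described above.

def selectivity(rows_returned: int, rows_read: int) -> float:
    """Fraction of rows that survive the filter (a value between 0 and 1)."""
    return rows_returned / rows_read

def cardinality(sel: float, num_rows: int) -> int:
    """Estimated rows returned: cardinality = selectivity * num_rows."""
    return round(sel * num_rows)

sel = selectivity(18, 120)          # 18 of 120 rows pass the filter
print(sel)                          # 0.15, i.e. 15% -> a strong selectivity
print(cardinality(sel, 120))        # 18 rows estimated
```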

Cursor: A memory structure (a handle) that enables a program to access a private SQL area with an associated shared SQL area. The life cycle of a cursor is: open cursor, parse cursor, define output variables, bind input variables, execute cursor, fetch cursor, close cursor. The result of a parse operation is a parent cursor and a child cursor stored in a shared SQL area inside the library cache.
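The life cycle above maps directly onto the cursor API of most database drivers. As an illustration only, here is the same open/parse/bind/execute/fetch/close sequence using Python's built-in sqlite3 module as a stand-in; Oracle drivers such as python-oracledb follow the same DB-API pattern:

```python
# Illustration of the cursor life cycle using sqlite3 (a stand-in, not Oracle).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT)")
conn.execute("INSERT INTO emp VALUES (7369, 'SMITH')")

cur = conn.cursor()                            # open cursor
sql = "SELECT ename FROM emp WHERE empno = ?"  # statement text to be parsed
cur.execute(sql, (7369,))                      # parse, bind input variable, execute
rows = cur.fetchall()                          # fetch cursor
cur.close()                                    # close cursor
print(rows)  # [('SMITH',)]
```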

Take a look at the initialization parameters:

SQL> show parameter cursor_sharing


NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cursor_sharing                       string      EXACT

SQL> show parameters optimizer


NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_capture_sql_plan_baselines boolean     FALSE
optimizer_dynamic_sampling           integer     2
optimizer_features_enable            string
optimizer_mode                       string      ALL_ROWS

SQL> select sql_id, child_number, optimizer_mode, plan_hash_value from v$sql where sql_id='b8r7afryn9grg';

SQL_ID        CHILD_NUMBER OPTIMIZER  PLAN_HASH_VALUE
------------- ------------ ---------- ---------------
b8r7afryn9grg            0 ALL_ROWS        2739518964
b8r7afryn9grg            1 ALL_ROWS
b8r7afryn9grg            7 ALL_ROWS
b8r7afryn9grg            8 ALL_ROWS

4 rows selected.

SQL> select address, hash_value, executions, loads, version_count, invalidations, parse_calls
  2  from v$sqlarea
  3  where sql_id = '&1';
Enter value for 1: b8r7afryn9grg
old   3: where sql_id = '&1'
new   3: where sql_id = 'b8r7afryn9grg'

ADDRESS          HASH_VALUE EXECUTIONS      LOADS VERSION_COUNT INVALIDATIONS PARSE_CALLS
---------------- ---------- ---------- ---------- ------------- ------------- -----------
00000000B47BA968 4249140975      14709         12             9             0        1902

1 row selected.

Any feature should be used only if the advantages related to its utilization outweigh the disadvantages. In some situations, the decision is easy. For example, there is no reason to use bind variables with SQL statements that have no WHERE clause (for example, INSERT statements). On the other hand, bind variables should be avoided at all costs whenever there is a high risk of being stung by bind variable peeking. This is especially true when one of the following three conditions is met.

  • When the query has to check whether a value is outside the range of available values (that is, lower than the minimum value or higher than the maximum value stored in the column).

  • When a predicate in the WHERE clause is based on a range condition (for example, HIREDATE > '2009-12-31')

  • When the query optimizer makes use of histograms.

As a result, for cursors that can be shared, you should not use bind variables if one of the preceding three conditions is met. In all other cases, the situation is less clear-cut. Nevertheless, two main cases can be considered:

SQL statements processing little data: Whenever little data is processed, the hard-parse time might be close to or even higher than the execution time. In that kind of situation, using bind variables and therefore avoiding hard parses is usually a must. This is especially true for SQL statements that are expected to be executed frequently. Typically, such SQL statements are used in data entry systems (OLTP systems).

SQL statements processing a lot of data: Whenever a lot of data is processed, the hard-parse time is usually several orders of magnitude less than the execution time. In that kind of situation, using bind variables is not only irrelevant for the overall response time, it also increases the risk that the query optimizer will generate very inefficient execution plans. Therefore, bind variables should not be used. Typically, such SQL statements are used for batch jobs, for reporting purposes, or, in data warehousing environments, by OLAP and BI tools.
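The cursor-sharing effect behind this trade-off can be sketched simply: with literals, every distinct value produces a distinct SQL text (and, in Oracle, a hard parse and a separate parent cursor), while a bind variable keeps the text identical so the cursor can be shared. An illustration, not Oracle code:

```python
# Sketch: why bind variables enable cursor sharing in the library cache.
values = (7369, 7499, 7521)

# Literals: each value yields a different statement text.
literal_texts = {f"SELECT * FROM emp WHERE empno = {empno}" for empno in values}

# Bind variable: the statement text is identical for every execution.
bind_texts = {"SELECT * FROM emp WHERE empno = :empno" for _ in values}

print(len(literal_texts))  # 3 distinct statements -> 3 hard parses
print(len(bind_texts))     # 1 shared statement   -> 1 hard parse
```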

Optimizer Roadmap

Object Statistics

The CBO can generate good execution plans only if the object statistics, which describe the data stored in the database, are good as well, i.e. they must reflect existing data. Absent, out-of-date, or incorrect statistics can lead to poor execution plans.

The following object statistics are available.

Table statistics:

  • Number of rows.

  • Number of blocks below the high watermark.

  • Average row length in bytes.

Column statistics:

  • Number of distinct values.

  • Number of NULL values.

  • Data distribution (a.k.a. histograms).

Index statistics:

  • Number of distinct keys.

  • Height (the number of blocks that have to be read to get a leaf block).

  • Number of leaf blocks.

  • Clustering factor (this value indicates how many adjacent index entries don’t refer to the same data block in the table).
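The clustering factor in particular is worth understanding: it grows by one every time, walking the index in key order, the current entry points to a different table block than the previous entry. A hedged sketch of that counting logic (an illustration of the concept, not Oracle's implementation):

```python
# Sketch: computing a clustering factor as described above.
def clustering_factor(blocks_in_index_order):
    """blocks_in_index_order: the table block number of each index entry,
    visited in index key order. The factor increments whenever the block
    differs from the previous entry's block."""
    cf = 0
    prev = None
    for block in blocks_in_index_order:
        if block != prev:
            cf += 1
        prev = block
    return cf

# Well clustered: index order matches table order -> low clustering factor.
print(clustering_factor([1, 1, 1, 2, 2, 2]))  # 2 (close to the block count)
# Poorly clustered: entries jump between blocks -> high clustering factor.
print(clustering_factor([1, 2, 1, 2, 1, 2]))  # 6 (close to the row count)
```

A low clustering factor (near the number of table blocks) makes index range scans cheap; a high one (near the number of rows) makes them expensive.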

For partitioned segments, object statistics exist at all levels. For example, for a sub-partitioned table, there are object statistics at the physical level (one set for each sub-partition) and at the logical levels (one set for each partition and one set for the table).

Reference Documents

How to Collect Database Statistics at a System Level
