Oracle table() function disabling optimizations
2024-10-02
I'm working in a project using Oracle database where we have many
complex views and sometimes need separate functions to calculate things.
We noticed that including table()
function anywhere in a
query disables some of Oracle's optimizations.
The examples in this blog have been tested using Oracle 19.24.
Creating test data
Let's create some test data, using a million rows to ensure we get realistic plans.
CREATE TABLE department (
NUMBER,
depid VARCHAR2(30),
depname PRIMARY KEY (depid)
);
CREATE TABLE employee (
NUMBER GENERATED BY DEFAULT AS IDENTITY,
empid NUMBER,
depid PRIMARY KEY (empid),
FOREIGN KEY(depid) REFERENCES department(depid)
);
BEGIN
FOR i IN 1..1000 LOOP
INSERT INTO department(depid, depname) VALUES (i, 'department-'||i);
FOR j IN 1..1000 LOOP
INSERT INTO employee(depid) VALUES (i);
END LOOP;
END LOOP;
COMMIT;
END;
/
CREATE INDEX depname_idx ON department(depname);
CREATE INDEX dep_idx ON employee(depid);
CALL DBMS_STATS.GATHER_TABLE_STATS(user, 'department');
CALL DBMS_STATS.GATHER_TABLE_STATS(user, 'employee');
Trying different queries
Let's first try with a simple join:
SELECT /*+ GATHER_PLAN_STATISTICS */ count(empid)
FROM department d
JOIN employee e ON e.depid = d.depid
WHERE depname = 'department-1';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));
---------------------------------------------------------------------------------------------------------------
Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
| ---------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 7 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 7 |
| 2 | NESTED LOOPS | | 1 | 1000 | 1000 |00:00:00.01 | 7 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT | 1 | 1 | 1 |00:00:00.01 | 3 |
| * 4 | INDEX RANGE SCAN | DEPNAME_IDX | 1 | 1 | 1 |00:00:00.01 | 2 |
|* 5 | INDEX RANGE SCAN | DEP_IDX | 1 | 1000 | 1000 |00:00:00.01 | 4 |
|---------------------------------------------------------------------------------------------------------------
The plan is as expected, the predicate is used for an index access path to the first table, and the result is joined to the second table again using an index access path.
The same happens when using views, where the (here inline) views are merged into the above query:
SELECT /*+ GATHER_PLAN_STATISTICS */ count(empid)
FROM (SELECT * FROM department) d
JOIN (SELECT * FROM employee) e ON e.depid = d.depid
WHERE depname = 'department-1';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));
---------------------------------------------------------------------------------------------------------------
Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
| ---------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 7 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 7 |
| 2 | NESTED LOOPS | | 1 | 1000 | 1000 |00:00:00.01 | 7 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT | 1 | 1 | 1 |00:00:00.01 | 3 |
| * 4 | INDEX RANGE SCAN | DEPNAME_IDX | 1 | 1 | 1 |00:00:00.01 | 2 |
|* 5 | INDEX RANGE SCAN | DEP_IDX | 1 | 1000 | 1000 |00:00:00.01 | 4 |
|---------------------------------------------------------------------------------------------------------------
Sometimes queries get more complex and the views become unmergeable.
In this case I've made the views unmergeable by using a
GROUP BY
:
SELECT /*+ GATHER_PLAN_STATISTICS */ emps
FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));
--------------------------------------------------------------------------------------------------------------
Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
| --------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 7 |
| 1 | NESTED LOOPS | | 1 | 1 | 1 |00:00:00.01 | 7 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT | 1 | 1 | 1 |00:00:00.01 | 3 |
| * 3 | INDEX RANGE SCAN | DEPNAME_IDX | 1 | 1 | 1 |00:00:00.01 | 2 |
|4 | VIEW PUSHED PREDICATE | | 1 | 1 | 1 |00:00:00.01 | 4 |
| * 5 | FILTER | | 1 | | 1 |00:00:00.01 | 4 |
|6 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 4 |
| * 7 | INDEX RANGE SCAN | DEP_IDX | 1 | 1000 | 1000 |00:00:00.01 | 4 |
|--------------------------------------------------------------------------------------------------------------
The plan is still great and we can see Oracle using a join-predicate
push-down (VIEW PUSHED PREDICATE
) with nested
looping to push the predicate resulted from the first view into the
second view. This example is quite artificial but in real life I've seen
this sort of thing happening all the time.
Now, let's see what happens when we make a trivial change of using a function returning multiple values:
CREATE FUNCTION repeat_value(p_value IN NUMBER, p_amount IN NUMBER) RETURN SYS.ODCINUMBERLIST AS
:= SYS.ODCINUMBERLIST();
m_ret SYS.ODCINUMBERLIST BEGIN
FOR i IN 1..p_amount LOOP
m_ret.EXTEND;COUNT) := p_value;
m_ret(m_ret.END LOOP;
RETURN m_ret;
END;
/
SELECT /*+ GATHER_PLAN_STATISTICS */ (SELECT count(*) FROM table(repeat_value(1, emps)))
FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));
---------------------------------------------------------------------------------------------------------------------------------------------
Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
| ---------------------------------------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.06 | 1958 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 26 | | | |
| 2 | COLLECTION ITERATOR PICKLER FETCH | REPEAT_VALUE | 1 | 8168 | 1000 |00:00:00.01 | 26 | | | |
| * 3 | HASH JOIN | | 1 | 1 | 1 |00:00:00.06 | 1958 | 2546K| 2546K| 251K (0)|
|4 | JOIN FILTER CREATE | :BF0000 | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 5 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 6 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 3 | 1422K| 1422K| 491K (0)|
| 7 | TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| * 8 | INDEX RANGE SCAN | DEPNAME_IDX | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|9 | VIEW | | 1 | 1000 | 1 |00:00:00.06 | 1955 | | | |
| 10 | HASH GROUP BY | | 1 | 1000 | 1 |00:00:00.06 | 1955 | 1558K| 1558K| 493K (0)|
| 11 | JOIN FILTER USE | :BF0000 | 1 | 1000K| 1000 |00:00:00.01 | 1955 | | | |
| * 12 | TABLE ACCESS FULL | EMPLOYEE | 1 | 1000K| 1000 |00:00:00.01 | 1955 | | | |
|---------------------------------------------------------------------------------------------------------------------------------------------
The plan is suddenly a lot slower. Oracle has switched into a hash join for some reason. Let's investigate by forcing it back into a nested loop with a hint:
SELECT /*+ GATHER_PLAN_STATISTICS USE_NL(d e) */ (SELECT count(*) FROM table(repeat_value(1, emps)))
FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));
--------------------------------------------------------------------------------------------------------------------------------------------
Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
| --------------------------------------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.17 | 1958 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
| 2 | COLLECTION ITERATOR PICKLER FETCH | REPEAT_VALUE | 1 | 8168 | 1000 |00:00:00.01 | 0 | | | |
| 3 | NESTED LOOPS | | 1 | 1 | 1 |00:00:00.17 | 1958 | | | |
| 4 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 5 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 3 | 1422K| 1422K| 495K (0)|
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| * 7 | INDEX RANGE SCAN | DEPNAME_IDX | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 8 | VIEW | | 1 | 1 | 1 |00:00:00.17 | 1955 | | | |
|9 | SORT GROUP BY | | 1 | 1000 | 1000 |00:00:00.17 | 1955 | 64512 | 64512 |57344 (0)|
| 10 | TABLE ACCESS FULL | EMPLOYEE | 1 | 1000K| 1000K|00:00:00.02 | 1955 | | | |
| --------------------------------------------------------------------------------------------------------------------------------------------
Turns out Oracle is using a full scan instead of index access, which is probably why it decided to go with a hash join instead. But why? By just adding an innocent function call, our plan seems to have lost its ability to push predicates into views!
There's a support case Table Function Disables Push Predicate Query Transformation (Doc ID 3003810.1) about this, which says:
Cause: CBQT (Cost Based Query Transformation) is disabled by design because the TABLE() Function is included in the query
Solution: Re-write the query without the TABLE() Function.
So, whenever we need to add table()
function for
whatever reason, some important optimizations are disabled. Sometimes
you don't need to write the table-wrapper, but that's just syntactic
sugar and doesn't change anything. Neither does trying to force
predicate push-down:
SELECT /*+ GATHER_PLAN_STATISTICS USE_NL(d e) PUSH_PRED(e) */ (SELECT count(*) FROM repeat_value(1, emps))
FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));
--------------------------------------------------------------------------------------------------------------------------------------------
Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
| --------------------------------------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.18 | 1958 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
| 2 | COLLECTION ITERATOR PICKLER FETCH | REPEAT_VALUE | 1 | 8168 | 1000 |00:00:00.01 | 0 | | | |
| 3 | NESTED LOOPS | | 1 | 1 | 1 |00:00:00.18 | 1958 | | | |
| 4 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 5 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 3 | 1422K| 1422K| 531K (0)|
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| * 7 | INDEX RANGE SCAN | DEPNAME_IDX | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 8 | VIEW | | 1 | 1 | 1 |00:00:00.18 | 1955 | | | |
|9 | SORT GROUP BY | | 1 | 1000 | 1000 |00:00:00.18 | 1955 | 64512 | 64512 |57344 (0)|
| 10 | TABLE ACCESS FULL | EMPLOYEE | 1 | 1000K| 1000K|00:00:00.02 | 1955 | | | |
| --------------------------------------------------------------------------------------------------------------------------------------------
Unfortunately table()
function is needed for various
cases:
- use a function returning a varray/table/pipelined-table
- use polymorphic table functions
- access rows from a nested table:
SELECT * FROM department, table(department.someNestedTable);
- replace varying length IN-lists with an array to reduce hard
parsing:
SELECT * FROM department WHERE depname IN (SELECT * FROM table(:1));
- something else?
Workaround
Luckily, at least for using functions, there's a workaround!
Instead of returning multiple values from a function, return them as
a VARCHAR
. This doesn't work for complex data, but it's
usable for things representable as simple strings:
CREATE FUNCTION repeat_value_str(p_value IN NUMBER, p_amount IN NUMBER) RETURN VARCHAR2 AS
VARCHAR2(32767);
m_ret BEGIN
FOR i IN 1..p_amount-1 LOOP
:= m_ret||p_value;
m_ret := m_ret||'_';
m_ret END LOOP;
:= m_ret||p_value;
m_ret RETURN m_ret;
END;
/
Then use regular
expressions to split the string into a UNION ALL
one
value at a time. This only works if you know the maximum possible amount
of values, and there can't be hundreds of them. In this case I've added
support for max 5 values, which obviously is not enough for our 1000
employee departments:
SELECT /*+ GATHER_PLAN_STATISTICS USE_NL(d e) */ (
SELECT count(*)
FROM (SELECT a, regexp_count(a, '[^_]+') b FROM (SELECT /*+ NO_MERGE */ repeat_value_str(1, LEAST(emps, 5)) a FROM DUAL)) foo,
SELECT regexp_substr(a, '[^_]+', 1, 1) c FROM DUAL UNION ALL
LATERAL (SELECT regexp_substr(a, '[^_]+', 1, 2) c FROM DUAL UNION ALL
SELECT regexp_substr(a, '[^_]+', 1, 3) c FROM DUAL UNION ALL
SELECT regexp_substr(a, '[^_]+', 1, 4) c FROM DUAL UNION ALL
SELECT regexp_substr(a, '[^_]+', 1, 5) c FROM DUAL
WHERE 1 = CASE WHEN b <= 5 THEN 1 ELSE to_number('a') END)
WHERE c IS NOT NULL
)FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));
If we look at the plan, it's back into pushing down predicates. Note
that the NO_MERGE
hint is required since otherwise Oracle
tends to inline the function invocation into all branches of the union,
which would destroy performance with any non-trivial function.
-----------------------------------------------------------------------------------------------------------------------------------------------
Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
| -----------------------------------------------------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 7 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 25 | | | |
| 2 | NESTED LOOPS | | 1 | 5 | 5 |00:00:00.01 | 25 | | | |
| 3 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 25 | | | |
| 4 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
| 5 | VIEW | VW_LAT_33CB5887 | 1 | 5 | 5 |00:00:00.01 | 0 | | | |
| 6 | UNION-ALL | | 1 | | 5 |00:00:00.01 | 0 | | | |
| * 7 | FILTER | | 1 | | 1 |00:00:00.01 | 0 | | | |
|8 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
| * 9 | FILTER | | 1 | | 1 |00:00:00.01 | 0 | | | |
|10 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
| * 11 | FILTER | | 1 | | 1 |00:00:00.01 | 0 | | | |
|12 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
| * 13 | FILTER | | 1 | | 1 |00:00:00.01 | 0 | | | |
|14 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
| * 15 | FILTER | | 1 | | 1 |00:00:00.01 | 0 | | | |
|16 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
| 17 | NESTED LOOPS | | 1 | 1 | 1 |00:00:00.01 | 7 | | | |
| 18 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 19 | HASH GROUP BY | | 1 | 1 | 1 |00:00:00.01 | 3 | 1422K| 1422K| 492K (0)|
| 20 | TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| * 21 | INDEX RANGE SCAN | DEPNAME_IDX | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|22 | VIEW PUSHED PREDICATE | | 1 | 1 | 1 |00:00:00.01 | 4 | | | |
| * 23 | FILTER | | 1 | | 1 |00:00:00.01 | 4 | | | |
|24 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 4 | | | |
| * 25 | INDEX RANGE SCAN | DEP_IDX | 1 | 1000 | 1000 |00:00:00.01 | 4 | | | |
|-----------------------------------------------------------------------------------------------------------------------------------------------
Unfortunately, there is no free lunch. Especially when the amount of branches goes into hundreds, hard parsing time tends to blow up several seconds, depending somewhat on the surrounding query. So we are essentially trading off parsing time for query execution time. You need to consider which one is more important for your use cases, although Oracle being able to share parsed queries often makes long parse times a non-issue.
Summary
Oracle disables some optimizations like join predicate pushdown
whenever a table()
function is present anywhere in the
parsed query structure. There are ways to rewrite queries to not use
table function, but sometimes it gets difficult. I hope Oracle will
consider this design choice in some future database version, since for
many cases it might be quite a silver
bullet.