Architecturally Elegant - lahteenmaki.net

Oracle table() function disabling optimizations

Jyri-Matti Lähteenmäki

2024-10-02

Tags: sql

I'm working in a project using Oracle database where we have many complex views and sometimes need separate functions to calculate things. We noticed that including table() function anywhere in a query disables some of Oracle's optimizations.

The examples in this blog have been tested using Oracle 19.24.

Creating test data

Let's create some test data, using a million rows to ensure we get realistic plans.

CREATE TABLE department (
  depid NUMBER,
  depname VARCHAR2(30),
  PRIMARY KEY (depid)
);

CREATE TABLE employee (
  empid NUMBER GENERATED BY DEFAULT AS IDENTITY,
  depid NUMBER,
  PRIMARY KEY (empid),
  FOREIGN KEY(depid) REFERENCES department(depid)
);

BEGIN
  FOR i IN 1..1000 LOOP
    INSERT INTO department(depid, depname) VALUES (i, 'department-'||i);

    FOR j IN 1..1000 LOOP
      INSERT INTO employee(depid) VALUES (i);
    END LOOP;
  END LOOP;
  COMMIT;
END;
/

CREATE INDEX depname_idx ON department(depname);
CREATE INDEX dep_idx ON employee(depid);

CALL DBMS_STATS.GATHER_TABLE_STATS(user, 'department');
CALL DBMS_STATS.GATHER_TABLE_STATS(user, 'employee');

Trying different queries

Let's first try with a simple join:

SELECT /*+ GATHER_PLAN_STATISTICS */ count(empid)
FROM department d
JOIN employee e ON e.depid = d.depid
WHERE depname = 'department-1';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  SORT AGGREGATE                       |             |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   NESTED LOOPS                        |             |      1 |   1000 |   1000 |00:00:00.01 |       7 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT  |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  4 |     INDEX RANGE SCAN                  | DEPNAME_IDX |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  5 |    INDEX RANGE SCAN                   | DEP_IDX     |      1 |   1000 |   1000 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------

The plan is as expected, the predicate is used for an index access path to the first table, and the result is joined to the second table again using an index access path.

The same happens when using views, where the (here inline) views are merged into the above query:

SELECT /*+ GATHER_PLAN_STATISTICS */ count(empid)
FROM (SELECT * FROM department) d
JOIN (SELECT * FROM employee) e ON e.depid = d.depid
WHERE depname = 'department-1';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  SORT AGGREGATE                       |             |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   NESTED LOOPS                        |             |      1 |   1000 |   1000 |00:00:00.01 |       7 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT  |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  4 |     INDEX RANGE SCAN                  | DEPNAME_IDX |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  5 |    INDEX RANGE SCAN                   | DEP_IDX     |      1 |   1000 |   1000 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------

Sometimes queries get more complex and the views become unmergeable. In this case I've made the views unmergeable by using a GROUP BY:

SELECT /*+ GATHER_PLAN_STATISTICS */ emps
FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |             |      1 |        |      1 |00:00:00.01 |       7 |
|   1 |  NESTED LOOPS                        |             |      1 |      1 |      1 |00:00:00.01 |       7 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT  |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  3 |    INDEX RANGE SCAN                  | DEPNAME_IDX |      1 |      1 |      1 |00:00:00.01 |       2 |
|   4 |   VIEW PUSHED PREDICATE              |             |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  5 |    FILTER                            |             |      1 |        |      1 |00:00:00.01 |       4 |
|   6 |     SORT AGGREGATE                   |             |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  7 |      INDEX RANGE SCAN                | DEP_IDX     |      1 |   1000 |   1000 |00:00:00.01 |       4 |
--------------------------------------------------------------------------------------------------------------

The plan is still great and we can see Oracle using a join-predicate push-down (VIEW PUSHED PREDICATE) with nested looping to push the predicate resulted from the first view into the second view. This example is quite artificial but in real life I've seen this sort of thing happening all the time.

Now, let's see what happens when we make a trivial change of using a function returning multiple values:

CREATE FUNCTION repeat_value(p_value IN NUMBER, p_amount IN NUMBER) RETURN SYS.ODCINUMBERLIST AS
    m_ret SYS.ODCINUMBERLIST := SYS.ODCINUMBERLIST();
BEGIN
    FOR i IN 1..p_amount LOOP
        m_ret.EXTEND;
        m_ret(m_ret.COUNT) := p_value;
    END LOOP;
    RETURN m_ret;
END;
/

SELECT /*+ GATHER_PLAN_STATISTICS */ (SELECT count(*) FROM table(repeat_value(1, emps)))
FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |              |      1 |        |      1 |00:00:00.06 |    1958 |       |       |          |
|   1 |  SORT AGGREGATE                         |              |      1 |      1 |      1 |00:00:00.01 |      26 |       |       |          |
|   2 |   COLLECTION ITERATOR PICKLER FETCH     | REPEAT_VALUE |      1 |   8168 |   1000 |00:00:00.01 |      26 |       |       |          |
|*  3 |  HASH JOIN                              |              |      1 |      1 |      1 |00:00:00.06 |    1958 |  2546K|  2546K|  251K (0)|
|   4 |   JOIN FILTER CREATE                    | :BF0000      |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|   5 |    VIEW                                 |              |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|   6 |     HASH GROUP BY                       |              |      1 |      1 |      1 |00:00:00.01 |       3 |  1422K|  1422K|  491K (0)|
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT   |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|*  8 |       INDEX RANGE SCAN                  | DEPNAME_IDX  |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   9 |   VIEW                                  |              |      1 |   1000 |      1 |00:00:00.06 |    1955 |       |       |          |
|  10 |    HASH GROUP BY                        |              |      1 |   1000 |      1 |00:00:00.06 |    1955 |  1558K|  1558K|  493K (0)|
|  11 |     JOIN FILTER USE                     | :BF0000      |      1 |   1000K|   1000 |00:00:00.01 |    1955 |       |       |          |
|* 12 |      TABLE ACCESS FULL                  | EMPLOYEE     |      1 |   1000K|   1000 |00:00:00.01 |    1955 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

The plan is suddenly a lot slower. Oracle has switched into a hash join for some reason. Let's investigate by forcing it back into a nested loop with a hint:

SELECT /*+ GATHER_PLAN_STATISTICS USE_NL(d e) */ (SELECT count(*) FROM table(repeat_value(1, emps)))
FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |              |      1 |        |      1 |00:00:00.17 |    1958 |       |       |          |
|   1 |  SORT AGGREGATE                        |              |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   2 |   COLLECTION ITERATOR PICKLER FETCH    | REPEAT_VALUE |      1 |   8168 |   1000 |00:00:00.01 |       0 |       |       |          |
|   3 |  NESTED LOOPS                          |              |      1 |      1 |      1 |00:00:00.17 |    1958 |       |       |          |
|   4 |   VIEW                                 |              |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|   5 |    HASH GROUP BY                       |              |      1 |      1 |      1 |00:00:00.01 |       3 |  1422K|  1422K|  495K (0)|
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT   |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|*  7 |      INDEX RANGE SCAN                  | DEPNAME_IDX  |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|*  8 |   VIEW                                 |              |      1 |      1 |      1 |00:00:00.17 |    1955 |       |       |          |
|   9 |    SORT GROUP BY                       |              |      1 |   1000 |   1000 |00:00:00.17 |    1955 | 64512 | 64512 |57344  (0)|
|  10 |     TABLE ACCESS FULL                  | EMPLOYEE     |      1 |   1000K|   1000K|00:00:00.02 |    1955 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Turns out Oracle is using a full scan instead of index access, which is probably why it decided to go with a hash join instead. But why? By just adding an innocent function call, our plan seems to have lost its ability to push predicates into views!

There's a support case Table Function Disables Push Predicate Query Transformation (Doc ID 3003810.1) about this, which says:

Cause: CBQT (Cost Based Query Transformation) is disabled by design because the TABLE() Function is included in the query

Solution: Re-write the query without the TABLE() Function.

So, whenever we need to add table() function for whatever reason, some important optimizations are disabled. Sometimes you don't need to write the table-wrapper, but that's just syntactic sugar and doesn't change anything. Neither does trying to force predicate push-down:

SELECT /*+ GATHER_PLAN_STATISTICS USE_NL(d e) PUSH_PRED(e) */ (SELECT count(*) FROM repeat_value(1, emps))
FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |              |      1 |        |      1 |00:00:00.18 |    1958 |       |       |          |
|   1 |  SORT AGGREGATE                        |              |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   2 |   COLLECTION ITERATOR PICKLER FETCH    | REPEAT_VALUE |      1 |   8168 |   1000 |00:00:00.01 |       0 |       |       |          |
|   3 |  NESTED LOOPS                          |              |      1 |      1 |      1 |00:00:00.18 |    1958 |       |       |          |
|   4 |   VIEW                                 |              |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|   5 |    HASH GROUP BY                       |              |      1 |      1 |      1 |00:00:00.01 |       3 |  1422K|  1422K|  531K (0)|
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT   |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|*  7 |      INDEX RANGE SCAN                  | DEPNAME_IDX  |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|*  8 |   VIEW                                 |              |      1 |      1 |      1 |00:00:00.18 |    1955 |       |       |          |
|   9 |    SORT GROUP BY                       |              |      1 |   1000 |   1000 |00:00:00.18 |    1955 | 64512 | 64512 |57344  (0)|
|  10 |     TABLE ACCESS FULL                  | EMPLOYEE     |      1 |   1000K|   1000K|00:00:00.02 |    1955 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Unfortunately table() function is needed for various cases:

  • use a function returning a varray/table/pipelined-table
  • use polymorphic table functions
  • access rows from a nested table: SELECT * FROM department, table(department.someNestedTable);
  • replace varying length IN-lists with an array to reduce hard parsing: SELECT * FROM department WHERE depname IN (SELECT * FROM table(:1));
  • something else?

Workaround

Luckily, at least for using functions, there's a workaround!

Instead of returning multiple values from a function, return them as a VARCHAR. This doesn't work for complex data, but it's usable for things representable as simple strings:

CREATE FUNCTION repeat_value_str(p_value IN NUMBER, p_amount IN NUMBER) RETURN VARCHAR2 AS
    m_ret VARCHAR2(32767);
BEGIN
    FOR i IN 1..p_amount-1 LOOP
        m_ret := m_ret||p_value;
        m_ret := m_ret||'_';
    END LOOP;
    m_ret := m_ret||p_value;
    RETURN m_ret;
END;
/

Then use regular expressions to split the string into a UNION ALL one value at a time. This only works if you know the maximum possible amount of values, and there can't be hundreds of them. In this case I've added support for max 5 values, which obviously is not enough for our 1000 employee departments:

SELECT /*+ GATHER_PLAN_STATISTICS USE_NL(d e) */ (
    SELECT count(*)
    FROM (SELECT a, regexp_count(a, '[^_]+') b FROM (SELECT /*+ NO_MERGE */ repeat_value_str(1, LEAST(emps, 5)) a FROM DUAL)) foo,
    LATERAL (SELECT regexp_substr(a, '[^_]+', 1, 1) c  FROM DUAL UNION ALL
             SELECT regexp_substr(a, '[^_]+', 1, 2) c  FROM DUAL UNION ALL
             SELECT regexp_substr(a, '[^_]+', 1, 3) c  FROM DUAL UNION ALL
             SELECT regexp_substr(a, '[^_]+', 1, 4) c  FROM DUAL UNION ALL
             SELECT regexp_substr(a, '[^_]+', 1, 5) c  FROM DUAL
             WHERE 1 = CASE WHEN b <= 5 THEN 1 ELSE to_number('a') END)
    WHERE c IS NOT NULL
)
FROM (SELECT depid, depname FROM department GROUP BY depid, depname) d
JOIN (SELECT depid, count(*) emps FROM employee GROUP BY depid) e ON e.depid = d.depid
WHERE depname = 'department-1';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'allstats last'));

If we look at the plan, it's back into pushing down predicates. Note that the NO_MERGE hint is required since otherwise Oracle tends to inline the function invocation into all branches of the union, which would destroy performance with any non-trivial function.

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |      1 |        |      1 |00:00:00.01 |       7 |       |       |          |
|   1 |  SORT AGGREGATE                        |                 |      1 |      1 |      1 |00:00:00.01 |      25 |       |       |          |
|   2 |   NESTED LOOPS                         |                 |      1 |      5 |      5 |00:00:00.01 |      25 |       |       |          |
|   3 |    VIEW                                |                 |      1 |      1 |      1 |00:00:00.01 |      25 |       |       |          |
|   4 |     FAST DUAL                          |                 |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   5 |    VIEW                                | VW_LAT_33CB5887 |      1 |      5 |      5 |00:00:00.01 |       0 |       |       |          |
|   6 |     UNION-ALL                          |                 |      1 |        |      5 |00:00:00.01 |       0 |       |       |          |
|*  7 |      FILTER                            |                 |      1 |        |      1 |00:00:00.01 |       0 |       |       |          |
|   8 |       FAST DUAL                        |                 |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|*  9 |      FILTER                            |                 |      1 |        |      1 |00:00:00.01 |       0 |       |       |          |
|  10 |       FAST DUAL                        |                 |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|* 11 |      FILTER                            |                 |      1 |        |      1 |00:00:00.01 |       0 |       |       |          |
|  12 |       FAST DUAL                        |                 |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|* 13 |      FILTER                            |                 |      1 |        |      1 |00:00:00.01 |       0 |       |       |          |
|  14 |       FAST DUAL                        |                 |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|* 15 |      FILTER                            |                 |      1 |        |      1 |00:00:00.01 |       0 |       |       |          |
|  16 |       FAST DUAL                        |                 |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|  17 |  NESTED LOOPS                          |                 |      1 |      1 |      1 |00:00:00.01 |       7 |       |       |          |
|  18 |   VIEW                                 |                 |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|  19 |    HASH GROUP BY                       |                 |      1 |      1 |      1 |00:00:00.01 |       3 |  1422K|  1422K|  492K (0)|
|  20 |     TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENT      |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|* 21 |      INDEX RANGE SCAN                  | DEPNAME_IDX     |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  22 |   VIEW PUSHED PREDICATE                |                 |      1 |      1 |      1 |00:00:00.01 |       4 |       |       |          |
|* 23 |    FILTER                              |                 |      1 |        |      1 |00:00:00.01 |       4 |       |       |          |
|  24 |     SORT AGGREGATE                     |                 |      1 |      1 |      1 |00:00:00.01 |       4 |       |       |          |
|* 25 |      INDEX RANGE SCAN                  | DEP_IDX         |      1 |   1000 |   1000 |00:00:00.01 |       4 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Unfortunately, there is no free lunch. Especially when the amount of branches goes into hundreds, hard parsing time tends to blow up several seconds, depending somewhat on the surrounding query. So we are essentially trading off parsing time for query execution time. You need to consider which one is more important for your use cases, although Oracle being able to share parsed queries often makes long parse times a non-issue.

Summary

Oracle disables some optimizations like join predicate pushdown whenever a table() function is present anywhere in the parsed query structure. There are ways to rewrite queries to not use table function, but sometimes it gets difficult. I hope Oracle will consider this design choice in some future database version, since for many cases it might be quite a silver bullet.