The path less taken…

Sorry I’ve been away for so long — travel and what not — and I don’t have one of those fancy EVDO cards :-)

Anyway, here’s a quiz to pass the time:

Given a table as follows:

Account (
account_id number,
parent_account_id number,
account_name varchar2(20)
)

And sample data as follows:

account_id, parent_account_id, account_name

1,null,young
26653,1,radatz
150406,26653,stanley
150507,26653,smith
150508,26653,nipper
27854,1,longborg
160403,27854,smith

You need to produce 2 queries using only SQL (no PL/SQL functions):

1. Given an account_id, return the path of account_names from the root to the account, delimited by ‘.’
2. Given a path of account_names delimited by ‘.’, return the account_id of the account

Be able to explain why your queries are the most efficient possible. :-)

10 Responses to “The path less taken…”

  1. Mathew Butler Says:

    I thought I would give this a go – a good excuse to play with sys_connect_by_path and the possibility of learning something. I’m not sure if the you were expecting the path to be reversed eg: root.name1.name2 rather than name2.name1.root.

    In terms of query performance, the first use avoids too many full table scans by using sys_connect_by_path and the max() analytic function. The second can be supported by an pure index lookup.

    I haven’t created either of the indexes that support both queries.

    SQL> select * from v$version
    2 /

    BANNER
    —————————————————————-
    Oracle9i Enterprise Edition Release 9.2.0.5.0 – Production
    PL/SQL Release 9.2.0.5.0 – Production
    CORE 9.2.0.6.0 Production
    TNS for 32-bit Windows: Version 9.2.0.5.0 – Production
    NLSRTL Version 9.2.0.5.0 – Production

    SQL>
    SQL> drop table account
    2 /

    Table dropped.

    SQL>
    SQL> create table account (
    2 account_id number,
    3 parent_account_id number,
    4 account_name varchar2(20)
    5 )
    6 /

    Table created.

    SQL>
    SQL> insert into account( account_id, parent_account_id, account_name )
    2 values ( 1, null, ‘young’)
    3 /

    1 row created.

    SQL> insert into account( account_id, parent_account_id, account_name )
    2 values ( 26653, 1, ‘radatz’)
    3 /

    1 row created.

    SQL> insert into account( account_id, parent_account_id, account_name )
    2 values ( 150406, 26653, ‘stanley’)
    3 /

    1 row created.

    SQL> insert into account( account_id, parent_account_id, account_name )
    2 values ( 150507, 26653, ‘smith’)
    3 /

    1 row created.

    SQL> insert into account( account_id, parent_account_id, account_name )
    2 values ( 150508, 26653, ‘nipper’)
    3 /

    1 row created.

    SQL> insert into account( account_id, parent_account_id, account_name )
    2 values ( 27854, 1, ‘longborg’)
    3 /

    1 row created.

    SQL> insert into account( account_id, parent_account_id, account_name )
    2 values ( 160403, 27854, ‘smith’)
    3 /

    1 row created.

    SQL> commit
    2 /

    Commit complete.

    SQL> select * from account
    2 /

    ACCOUNT_ID PARENT_ACCOUNT_ID ACCOUNT_NAME
    ———- —————– ——————–
    1 young
    26653 1 radatz
    150406 26653 stanley
    150507 26653 smith
    150508 26653 nipper
    27854 1 longborg
    160403 27854 smith

    7 rows selected.

    SQL>
    SQL> /*
    DOC>1. Given an account_id, return the path of account_names from the root to the account, delimited by `.’
    DOC>*/
    SQL> column path format a50
    SQL>
    SQL> variable accnt number
    SQL>
    SQL> exec :accnt := 150406

    PL/SQL procedure successfully completed.

    SQL>
    SQL> set autotrace on explain statistics
    SQL>
    SQL> select /* get the path where the level is the deepest level */
    2 path
    3 from (
    4 select /* identify the deepest level */
    5 max(lvl) over (partition by 1) max_lvl
    6 ,lvl
    7 ,path
    8 from (
    9 select /* use heirachical query to generate the delimited path*/
    10 level lvl
    11 ,sys_connect_by_path(account_name,’.’) path
    12 from account
    13 start with account_id = :accnt
    14 connect by prior parent_account_id = account_id
    15 )
    16 ) where lvl = max_lvl
    17 /

    PATH
    ————————————————–
    .stanley.radatz.young

    Execution Plan
    ———————————————————-
    0 SELECT STATEMENT Optimizer=CHOOSE
    1 0 VIEW
    2 1 WINDOW (BUFFER)
    3 2 VIEW
    4 3 CONNECT BY (WITH FILTERING)
    5 4 NESTED LOOPS
    6 5 TABLE ACCESS (FULL) OF ‘ACCOUNT’
    7 5 TABLE ACCESS (BY USER ROWID) OF ‘ACCOUNT’
    8 4 NESTED LOOPS
    9 8 BUFFER (SORT)
    10 9 CONNECT BY PUMP
    11 8 TABLE ACCESS (FULL) OF ‘ACCOUNT’

    Statistics
    ———————————————————-
    0 recursive calls
    0 db block gets
    13 consistent gets
    0 physical reads
    0 redo size
    394 bytes sent via SQL*Net to client
    503 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    8 sorts (memory)
    0 sorts (disk)
    1 rows processed

    SQL>
    SQL> /*
    DOC>2. Given a path of account_names delimited by `.’, return the account_id of the account
    DOC>*/
    SQL>
    SQL> variable path varchar2(255)
    SQL>
    SQL> exec :path := ‘.stanley.radatz.young’;

    PL/SQL procedure successfully completed.

    SQL>
    SQL> select /* substring out the first account name and filter */
    2 account_id
    3 from account
    4 where account_name = substr(substr(:path,2),1,instr(substr(:path,2),’.’)-1)
    5 /

    ACCOUNT_ID
    ———-
    150406

    Execution Plan
    ———————————————————-
    0 SELECT STATEMENT Optimizer=CHOOSE
    1 0 TABLE ACCESS (FULL) OF ‘ACCOUNT’

    Statistics
    ———————————————————-
    0 recursive calls
    0 db block gets
    4 consistent gets
    0 physical reads
    0 redo size
    383 bytes sent via SQL*Net to client
    503 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

    SQL>

    Regards,

  2. ddelmoli Says:

    Pretty good.

    For #1 — try to do it without the need for analytics. (Oh, and reverse the path :-)

    For #2 — Careful, try a path that ends in SMITH…

  3. Todd Barry Says:

    For #1, how about:

    sql>select ltrim(sys_connect_by_path(account_name, ‘.’), ‘.’) path
    2 from account
    3 where account_id = 150406
    4 start with parent_account_id is null
    5 connect by prior account_id = parent_account_id;

    PATH
    ——————–
    young.radatz.stanley

  4. Peter Says:

    scott@PKB> select * from account;

    ACCOUNT_ID PARENT_ACCOUNT_ID ACCOUNT_NAME
    ———- —————– ——————–
    1 young
    26653 1 radatz
    150406 26653 stanley
    150507 26653 smith
    150508 26653 nipper
    27854 1 longborg
    160403 27854 smith

    7 rows selected.

    scott@PKB> var id number
    scott@PKB> exec :id := 150508

    PL/SQL procedure successfully completed.

    SQL>select sys_connect_by_path(account_name,’.’) name
    2 from
    3 (
    4 select account_name, account_id, parent_account_id
    5 from account
    6 start with account_id = :id
    7 connect by prior parent_account_id = account_id
    8 )
    9 where account_id = :id
    10 start with parent_account_id is null
    11 connect by prior account_id = parent_account_id
    12 /

    NAME
    ————————————————————
    .young.radatz.nipper
    SQL>exec :path := ‘.young.radatz.nipper’

    PL/SQL procedure successfully completed.

    SQL>select
    2 account_id
    3 from account
    4 where substr(:path,instr(:path,’.’,-1,1)+1,length(account_name)) = account_name
    5 start with parent_account_id is null and substr(:path,2,length(account_name)) = account_name
    6 connect by prior account_id = parent_account_id
    7 and substr(:path,instr(:path,’.’,1,level)+1,length(account_name)) = account_name
    8 /

    ACCOUNT_ID
    ———-
    150508

  5. Peter Says:

    Here’s the reasoning for my two queries :-

    The first one uses the CONNECT BY syntax wíthin an inline view starting with the account_id as the root (the reverse of the hierarchy you were after). In this way, we pinpoint the part of the hierarchy we want straight away. The outer query uses the CONNECT BY syntax again to get the results in the order you required (using SYS_CONNECT_BY_PATH)

    The second query traverses the tree in the ‘right’ order from the start. However, it uses SUBSTR,INSTR and LENGTH functions in conjunction with the LEVEL pseudocolumn to compare the account_name both in the START WITH and CONNECT BY. It thereby limits the branches in the hierarchy it needs to evaluate to a minimum.

    br
    Peter

  6. Naresh Says:

    Hi,

    I tried below for question 1 – Dominic please comment. I believe it is more performant because we will scan ONLY the required accounts from the table. A very interesting question!

    Thanks, Naresh.

    SQL> select * from account;

    ACCOUNT_ID PARENT_ACCOUNT_ID ACCOUNT_NAME
    ———- —————– ——————–
    1 young
    26653 1 radatz
    150406 26653 stanley
    150507 26653 smith
    150508 26653 nipper
    27854 1 longborg
    160403 27854 smith

    This query below just to show an intermediate ‘logic test’ step:

    1 select
    2 path,
    3 instr(path, ‘.’, -1, level) offset1,
    4 instr(path, ‘.’, -1, level + 1) offset2,
    5 sys_connect_by_path(substr(path, instr(path, ‘.’, -1, level + 1) + 1,
    6 instr(path, ‘.’, -1, level) – (instr(path, ‘.’, -1, level + 1) + 1) ),
    7 ‘.’) new_path
    8 from (
    9 select sys_connect_by_path(account_name, ‘.’) || ‘.’ path, level lvl
    10 from account where parent_account_id is NULL
    11 start with account_id = 150507 connect by prior parent_account_id = account_id
    12 )
    13* connect by level /

    PATH OFFSET1 OFFSET2 NEW_PATH
    —————————————- ———- ———- —————————————-
    .smith.radatz.young. 20 14 .young
    .smith.radatz.young. 14 7 .young.radatz
    .smith.radatz.young. 7 1 .young.radatz.smith

    Final query:

    1 select
    2 path,
    3 sys_connect_by_path(substr(path, instr(path, ‘.’, -1, level + 1) + 1,
    4 instr(path, ‘.’, -1, level) – (instr(path, ‘.’, -1, level + 1) + 1) ),
    5 ‘.’) new_path
    6 from (
    7 select sys_connect_by_path(account_name, ‘.’) || ‘.’ path, level lvl
    8 from account where parent_account_id is NULL
    9 start with account_id = 150507 connect by prior parent_account_id = account_id
    10 )
    11 where level = lvl
    12* connect by level /

    PATH NEW_PATH
    —————————————- —————————————-
    .smith.radatz.young. .young.radatz.smith

    Elapsed: 00:00:00.87

    Execution Plan
    ———————————————————-
    0 SELECT STATEMENT Optimizer=CHOOSE
    1 0 FILTER
    2 1 CONNECT BY (WITHOUT FILTERING)
    3 2 COUNT
    4 3 VIEW
    5 4 FILTER
    6 5 CONNECT BY (WITH FILTERING)
    7 6 NESTED LOOPS
    8 7 TABLE ACCESS (FULL) OF ‘ACCOUNT’
    9 7 TABLE ACCESS (BY USER ROWID) OF ‘ACCOUNT’
    10 6 NESTED LOOPS
    11 10 BUFFER (SORT)
    12 11 CONNECT BY PUMP
    13 10 TABLE ACCESS (FULL) OF ‘ACCOUNT’

    Statistics
    ———————————————————-
    0 recursive calls
    0 db block gets
    13 consistent gets
    0 physical reads
    0 redo size
    589 bytes sent via SQL*Net to client
    651 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    8 sorts (memory)
    0 sorts (disk)
    1 rows processed

  7. Naresh Says:

    For some reason, the previous post did not show the connect by clause in outer query completely, it should be

    connect by level

  8. Naresh Says:

    OK, seems to be some meta character issue – the last condition in outer query should be:

    connect by level “less than or equal to” lvl

  9. Narendra Says:

    For question #1:

    SQL> variable accnt number ;
    SQL> exec :accnt := 150508 ;

    PL/SQL procedure successfully completed.
    SQL> select acct from (select account_id, sys_connect_by_path(account_name, ‘.’) acct from acct connect by prior account_id = parent_account_id start with account_id = :accnt) where account_id = :accnt ;
    ACCT
    ————————————————–
    .nipper
    Execution Plan
    ———————————————————-
    0 SELECT STATEMENT Optimizer=CHOOSE
    1 0 VIEW
    2 1 FILTER
    3 2 CONNECT BY (WITH FILTERING)
    4 3 NESTED LOOPS
    5 4 TABLE ACCESS (FULL) OF ‘ACCT’
    6 4 TABLE ACCESS (BY USER ROWID) OF ‘ACCT’
    7 3 NESTED LOOPS
    8 7 BUFFER (SORT)
    9 8 CONNECT BY PUMP
    10 7 TABLE ACCESS (FULL) OF ‘ACCT’

    Statistics
    ———————————————————-
    0 recursive calls
    0 db block gets
    15 consistent gets
    0 physical reads
    0 redo size
    217 bytes sent via SQL*Net to client
    276 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    3 sorts (memory)
    0 sorts (disk)
    1 rows processed

    How about this 3memory sorts compared to 7 memory sorts in earlier example ?

  10. ddelmoli Says:

    It doesn’t give the full path name…

Leave a Reply

Posting code can be a pain. To make sure your code doesn't get eaten, you may want to pre-format it first by using HTML Encoder