Recursive Query Used for Transitive Closure

Recursive query used for transitive closure

You can simplify in several places (assuming acct_id and parent_id are NOT NULL):

WITH RECURSIVE search_graph AS (
SELECT parent_id, ARRAY[acct_id] AS path
FROM account

UNION ALL
SELECT g.parent_id, sg.path || g.acct_id
FROM search_graph sg
JOIN account g ON g.acct_id = sg.parent_id
WHERE g.acct_id <> ALL(sg.path)
)
SELECT path[1] AS child
, path[array_upper(path,1)] AS parent
, path
FROM search_graph
ORDER BY path;
  • The columns acct_id, depth, cycle are just noise in your query.
  • The WHERE condition has to exit the recursion one step earlier, before the duplicate entry from the top node is in the result. That was an "off-by-one" in your original.

The rest is formatting.

If you know the only possible circle in your graph is a self-reference, we can have that cheaper:

WITH RECURSIVE search_graph AS (
SELECT parent_id, ARRAY[acct_id] AS path, acct_id <> parent_id AS keep_going
FROM account

UNION ALL
SELECT g.parent_id, sg.path || g.acct_id, g.acct_id <> g.parent_id
FROM search_graph sg
JOIN account g ON g.acct_id = sg.parent_id
WHERE sg.keep_going
)
SELECT path[1] AS child
, path[array_upper(path,1)] AS parent
, path
FROM search_graph
ORDER BY path;

SQL Fiddle.

Note there would be problems (at least up to pg v9.4) for data types with a modifier (like varchar(5)) because array concatenation loses the modifier but the rCTE insists on types matching exactly:

  • Surprising results for data types with type modifier

Recursive query challenge - simple parent/child example

With help from RhodiumToad on #postgresql, I've arrived at this solution:

WITH RECURSIVE node_graph AS (
SELECT ancestor_node_id as path_start, descendant_node_id as path_end,
array[ancestor_node_id, descendant_node_id] as path
FROM node_relations

UNION ALL

SELECT ng.path_start, nr.descendant_node_id as path_end,
ng.path || nr.descendant_node_id as path
FROM node_graph ng
JOIN node_relations nr ON ng.path_end = nr.ancestor_node_id
)
SELECT * from node_graph order by path_start, array_length(path,1);

The result is exactly as expected.

Transitive-Closure Table Restructure

Yes, if you want to store the transitive closure, you need to store all paths.

For some operations, it's even helpful to store the path of length 0:

| component | component_of | depth |
|-----------|--------------|-------|
| D | D | 0 |
| D | A | 1 |
| D | B | 2 |
| C | C | 0 |
| B | B | 0 |
| B | A | 1 |
| A | A | 0 |

In MySQL 8.0, none of this will be needed. We'll finally be able to use recursive queries.

How to CONSTRUCT a property's transitive closure in SPARQL?

Problems in your code

So I tried to make subquery like this: …

I'm not sure what subquery syntax you're trying to use, but it's not legal. Subqueries are just wrapped in braces. E.g.,

select ?s where {
{ select ?x where { ... } }
}

The query you've written isn't legal.

But I obtained an error.

If you don't show us the error, we certainly can't help in fixing it. I expect that you got a syntax error. You can validate your query using sparql.org's query validator. That probably won't help too much in this case, though, because you can only use select queries in subqueries. (It would be very helpful, though, if construct queries were supported for subqueries.)

Transitive closures for construct queries

In general, you can't do arbitrary recursion in SPARQL. However, in the specific case that you've got, you can use property paths in the pattern to construct the transitive closure of a pattern. E.g.,

construct { ?a :partOf ?b }
where { ?a :partOf+ ?b }

This says that if there's a path of non-zero length from ?a to ?c using just :partOf links, then include a triple ?a :partOf ?b in the constructed output.

hierarchical data in a database: recursive query vs. closure tables vs. graph database

The whole closure table is redundant if you can use recursive queries :)

I think it's much better to have a complicated recursive query that you have to figure out once than deal with the extra IO (and disk space) of a separate table and associated triggers.

I have done some simple tests with recursive queries in postgres. With a few million rows in the table queries were still < 10ms for returning all parents of a particular child. Returning all children was fast too, depending on the level of the parent. It seemed to depend more on disk IO fetching the rows rather than the query speed itself. This was done single user, so not sure how it would perform under load. I suspect it would be very fast still if you can also hold most of the table in memory (and setup postgres correctly). Clustering the table by parent id also seemed to help.

detect cycles in a graph in SQL using recursive common table expression

CREATE TABLE #myEdge (
ID INT IDENTITY(1,1) PRIMARY KEY,
NodeIDFrom INT,
NodeIDTo INT
)

INSERT INTO #myEdge(NodeIDFrom, NodeIDTo) VALUES (4, 5),(5, 6),(6, 4);

DECLARE @rootNode AS integer = 4;

--compute the transitive closure from this root. column Niveau holds the recursion nesting level.

WITH cte_transitive_closure(rootNode, NodeFrom, NodeTo, Niveau)
AS (
SELECT @rootNode, NULL, @rootNode, 0

UNION ALL

SELECT rootNode, e.NodeIDFrom, e.NodeIDTo, Niveau+1
FROM cte_transitive_closure AS cte
JOIN #myEdge AS e ON cte.NodeTo=e.NodeIdFrom
WHERE Niveau < 99
)
SELECT *
INTO #transitive_closure
FROM cte_transitive_closure;

SELECT * FROM #transitive_closure;

--starting from root as a reached target, move back in the trace until the root is hit again.

WITH cte_cycle(NodeFrom, NodeTo, Cycle)
AS (
SELECT @rootNode,NULL,0

UNION ALL

SELECT t.NodeFrom, t.NodeTo, 0
FROM cte_cycle AS cte
JOIN #transitive_closure AS t ON cte.NodeFrom = t.NodeTo
WHERE t.NodeFrom != @rootNode AND Cycle=0

UNION ALL

SELECT t.NodeFrom, t.NodeTo, 1
FROM cte_cycle AS cte
JOIN #transitive_closure AS t ON cte.NodeFrom = t.NodeTo
WHERE t.NodeFrom = @rootNode AND Cycle=0
)
SELECT DISTINCT * FROM cte_cycle;

result set:

NodeFrom -> NodeTo (Cycle)

4 -> NULL (0)

4 -> 5 (1)

5 -> 6 (0)

6 -> 4 (0)


Related Topics



Leave a reply



Submit