Experiments of Query Rewriting Algorithms (Pure and PureC)

Benchmarks

As benchmarks dedicated to existential rules are not available yet, we considered rule bases obtained by translation of description logic ontologies, more specifically DL-LiteR ontologies (which allows to compare with DL tools producing unions of conjunctive queries):

Each ontology is provided with 5 handcrafted queries.

Table 1 shows the size of these ontologies, as well as the number of hierarchical and compilable rules (see our paper at IJCAI 2015 for more details).

Table 1: Types of rules in the ontologies
Ontology # Hier. rules # Non-hier. compilable rules # Compilable rules # All rules
A 72 4 76 102
S 16 28 44 52
U 36 36 72 77
V 202 20 222 222
G 26980 836 27816 50764
O 35390 0 35390 43351

Download the full benchmark: AGOSUV-bench.tar.gz.
This archive contains the DL-Lite and Dlgp v2 versions of each ontology, as well as the associated queries in Sparql, Dlgp v2 and in the REQUIEM query syntax.

Experiments

(May 2014)

In the following, PURE denotes our basic query rewriting algorithm and PUREC its optimization with compiled rules. The output of PUREC is called a ``pivotal UCQ''; it can be unfolded into a regular UCQ (or transformed into a Datalog query, or evaluated as such, depending on the capabilities of the query engine or assumptions on the data).

Timeout was set to 10 minutes.

Impact of rule compilation on the number of rewritings

Table 2 shows the size of the output UCQ (which is the same for all systems outputing a minimal, sound and complete UCQ), the size of the pivotal UCQ computed by PUREC, the number of generated queries in each case (excluding the initial query, hence zero may occur) and the number of CQs generated by PUREC by brute unfolding of the pivotal UCQ (i.e., unfolding without elimination of redundant queries). Missing values are due to timeouts. For Q3, Q4 and Q5 on O, PURE did not end before the timeout, hence the size of the UCQ was computed with other tools (see the last table). Note that for Q4, the size of the UCQ is precisely approximated by the brute-force unfolding (i.e., no redundancy elimination is actually needed).

Table 2: Impact of rule compilation on sizes
UCQ pivotal-UCQ # generated CQs brute unfolding
PUREC PURE PUREC PUREC
A Q1 27 2 459 13 702
Q2 50 2 171 1 104
Q3 104 1 316 0 104
Q4 224 2 826 5 520
Q5 624 1 1416 0 624
S Q1 6 1 9 0 6
Q2 2 1 137 0 2
Q3 4 1 275 0 4
Q4 4 1 450 0 4
Q5 8 1 688 0 8
U Q1 2 1 1 0 2
Q2 1 1 105 0 1
Q3 4 1 42 0 4
Q4 2 1 2142 0 2
Q5 10 1 153 0 10
V Q1 15 1 14 0 15
Q2 10 1 9 0 10
Q3 72 1 117 0 72
Q4 185 1 328 0 185
Q5 30 1 59 0 30
G Q1 2 1 2 0 2
Q2 1152 1 1275 0 1152
Q3 488 5 1514 4 488
Q4 147 1 154 0 147
Q5 324 19 908 18 324
O Q1 27 20 28 21 27
Q2 1356 1264 1355 1263 1356
Q3 33887 1 - 0 33887
Q4 34733 682 - 793 34733
Q5 36612 - - - -

We find a huge gap between the sizes of the output; the pivotal UCQ is often restricted to a single CQ even when the UCQ has thousands of CQs (which also explains the gap between the numbers of generated queries).

Impact of rule compilation on rewriting times

Table 3 shows the runtimes for PURE and PUREC. Timeout (10 minutes) is denoted by TO. Time is measured by increments of 10 ms, hence 0 means less than 10 ms. All tests were performed on a DELL machine with a processor at 3.60 GHz and 16 GB of RAM.

Table 3: Impact of rule compilation on times (ms)
Rewriting time Unfolding time with redundancy elimination Total time
PUREC PUREC PURE PUREC
A Q1 10 120 180 130
Q2 0 40 90 40
Q3 10 20 170 30
Q4 0 130 280 130
Q5 0 440 1500 440
S Q1 0 0 0 0
Q2 0 0 100 0
Q3 0 0 70 0
Q4 0 0 110 0
Q5 10 10 120 20
U Q1 0 0 10 0
Q2 0 0 100 0
Q3 0 0 30 0
Q4 0 0 1500 0
Q5 0 10 20 10
V Q1 0 0 10 0
Q2 0 10 10 10
Q3 0 70 110 70
Q4 0 60 120 60
Q5 0 10 10 10
G Q1 0 10 0 10
Q2 50 570 1060 620
Q3 70 190 1020 260
Q4 10 0 20 10
Q5 30 60 890 90
O Q1 130 10 440 140
Q2 1110 760 1160 1870
Q3 90 557900 TO 557990
Q4 430 TO TO TO
Q5 TO TO TO TO

We see that PUREC remains faster than PURE when we include the time required to unfold the pivotal UCQ into a UCQ, except for Q2 on O, which comes from the fact that the pivotal UCQ is almost as large as the UCQ. Almost all the unfolding time is actually spent in checking redundancies (in the current implementation, this test is performed at the end by a pairwise comparison of queries).

Comparaison with other query rewriting tools

We compared Pure and PureC (version of April 2014) with the following query rewriting tools:

Nyaya
version of October 2013
processes general existential rules (as PURE and PUREC)
tree-witness rewriting (OnTop)
version of April 2014 on www.dcs.bbk.ac.uk/~roman/tw-rewriting/, run in Full mode
dedicated to DL-Lite
Rapid
version of May 2014 on www.image.ece.ntua.gr/~gstam/
dedicated to DL-Lite
Requiem
version of April 2014 on www.cs.ox.ac.uk/projects/requiem/home.html, run in optimized Full modality
dedicated to DL-Lite
Iqaros
version v0.2 on code.google.com/p/iqaros/
dedicated to DL-Lite

The reported runtime is computed internally by each tool. The runtime of PURE and PUREC is the CPU time (computed with the getCurrentThreadCpuTime() method from the ThreadMXBean class; it was measured by increments of 10 ms by the OpenJDK 6 Java Runtime Environment).

Note that we do not include here experiments with tools that do not provide time measurements or gave erroneous results.

Table 4: Runtime comparison
PURE PUREC PUREC to UCQ Nyaya tw-rew Rapid Requiem Iqaros
A Q1 180 10 130 1122 12 18 264 50
Q2 90 0 40 862 11 23 105 59
Q3 170 10 30 2363 9 34 134 190
Q4 280 0 130 5557 12 48 252 136
Q5 1500 0 440 33206 10 93 467 571
S Q1 0 0 0 4 5 7 6 5
Q2 100 0 0 4 7 9 164 17
Q3 70 0 0 46 8 13 443 79
Q4 110 0 0 7 8 12 674 65
Q5 120 10 20 8 8 15 5981 296
U Q1 10 0 0 8 8 6 5 13
Q2 100 0 0 4 6 9 128 21
Q3 30 0 0 12 8 7 211 47
Q4 1500 0 0 6 8 13 1091 57
Q5 20 0 10 10 8 15 2943 93
V Q1 10 0 0 13 5 9 10 10
Q2 10 0 10 51 9 5 10 13
Q3 110 0 70 21 8 25 62 22
Q4 120 0 60 28 8 32 136 35
Q5 10 0 10 22 10 26 73 42
G Q1 0 0 10 - 5 5 47 49
Q2 1060 50 620 - 11 74 209048 5864
Q3 1020 70 260 - 21 59 259103 9184
Q4 20 10 10 - 6 10 190250 771
Q5 890 30 90 - 26 40 238459 7401
O Q1 440 130 140 - 19 23 3447 6673
Q2 1160 1110 1870 - 578 953 21786 27818
Q3 TO 90 557990 - 77 611 TO TO
Q4 TO 430 TO - 1238 14698 TO 139980
Q5 TO TO TO - TO 562223 TO TO

We see that PUREC (with or without unfolding) scales well on large DL-Lite ontologies (except for extreme cases which are difficult for all tools, see Ontology O). We emphasize that, being devoted to general existential rules, it does not exploit the specificities of DL-Lite. See our paper at IJCAI 2015 for further comments on these results.