DLGP: An extended Datalog Syntax (Version 2.1)

Abstract:

This document specifies version 2.1 of dlgp, a textual format for the existential rule / Datalog± framework. This format is meant to be an exchange format that is at once human-friendly, concise and easy to parse. It can be seen as an extension of the commonly used format for plain Datalog. It is called "dlgp" for "Datalog Plus". A file may contain four kinds of knowledge elements: facts, existential rules, negative constraints and conjunctive queries.

Change logs

v2.1

v2.0

Introduction

The dlgp format encodes existential rules (and other constructs that can be seen as special kinds of existential rules: facts, negative constraints and conjunctive queries). A basic logical notion is that of an atom, which is composed of a predicate (or relation name) and arguments called terms. Terms can be variables or constants. Predicates can be of any arity greater than or equal to one.

In their simplest form, predicates and terms are encoded by string identifiers built with letters from the Latin alphabet, digits and the underscore character (_). As usual in Datalog, variables begin with an uppercase letter, while constants and predicates begin with a lowercase letter. To make the dlgp format compatible with data from the Semantic Web, more elaborate kinds of identifiers are necessary. Version 2.x introduces the notions of IRI and literal, following the Turtle format (http://www.w3.org/TR/turtle/). Specifically:

For instance, here are different ways of writing a predicate:

All these forms are usable to encode a constant. Moreover, constants can be described by literals, e.g., -5.1, true, "constant", or any other Turtle literal. Note that the tokens true and false are interpreted as Boolean literals and not as IRIs.

The special predicate = encodes equality between terms (e.g., X = Y).

Sets of such atoms are logically interpreted as conjunctions of atoms. Four kinds of knowledge elements are built upon sets of atoms: facts, existential rules, negative constraints and conjunctive queries.

A dlgp document is any sequence of such elements. The file name has extension .dlp or .dlgp. Characters are assumed to be encoded in UTF-8. Analysis directives are introduced by the symbol @ and comments by the symbol %.

Syntax of the dlgp format

In the following, the syntax is specified by a grammar in BNF style: non-terminal symbols are enclosed in angle brackets <>, terminal symbols are in bold font, the | symbol indicates a choice, parts enclosed in square brackets ([]) are optional, choice1..choicen indicates a choice within an interval; parts enclosed in braces can be repeated from 0 to n times ({repeated-pattern}*) or from 1 to n times ({repeated-pattern-at-least-once}+).

Comments

Comments are introduced by the symbol % outside Turtle tokens (i.e., outside <IRIREF> and <literal>, which may themselves contain the symbol %). A comment ends at the end of the same line or at the end of the file. Moreover, comments introduced by %% can be interpreted in a specific way. Our parser generates an event when such a specific comment is read, which can be exploited by event listeners.
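
The rule above can be sketched in Python as a single-line scanner. This is only an illustration, not the reference parser: the function names split_comment and is_special_comment are hypothetical, and the sketch assumes that string literals are single-line and delimited by " or ' (Turtle long strings in triple quotes are not handled).

```python
def split_comment(line):
    """Split one line into (content, comment); comment is None if absent.

    A % starts a comment only outside <IRIREF> tokens and quoted literals.
    Assumption: literals are single-line strings delimited by " or '.
    """
    in_iri = False   # True while scanning between < and >
    quote = None     # the active quote character while inside a literal
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == "\\":       # skip the escaped character
                i += 2
                continue
            if ch == quote:
                quote = None
        elif in_iri:
            if ch == ">":
                in_iri = False
        elif ch == "<":
            in_iri = True
        elif ch in "\"'":
            quote = ch
        elif ch == "%":
            return line[:i], line[i:]
        i += 1
    return line, None


def is_special_comment(comment):
    """True for comments introduced by %%."""
    return comment is not None and comment.startswith("%%")
```

A caller could forward every comment for which is_special_comment returns True to its own listeners, mirroring the event mechanism described above.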

Parsing Information

Please refer to the Turtle syntax for building an absolute IRI from a relative IRI (and the @base directive) or from a prefixed name (and @prefix directives). Note that the @base directive may occur at most once. Similarly, it is not possible to successively assign several IRIREFs to the same prefix (@prefix directive). These two directives may occur only in the header of the file.

A <l-ident> token is seen as a relative IRI, and the corresponding absolute IRI is obtained by adding the IRI of the @base directive in front of the <l-ident> token.
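
As a rough illustration of these rules, the following Python sketch expands a predicate or constant token into an absolute IRI. The function name expand and the dictionary representation of prefixes are assumptions made for this example. Relative IRIREFs are resolved with urllib.parse.urljoin, which approximates the resolution that Turtle prescribes. Literals (including true and false) are assumed to have been recognised before this function is called.

```python
import re
from urllib.parse import urljoin

# <l-ident>: a lowercase letter followed by letters, digits or underscores.
L_IDENT = re.compile(r"[a-z][A-Za-z0-9_]*")


def expand(token, base, prefixes):
    """Return the absolute IRI denoted by an IRIREF, l-ident or prefixed name.

    base     -- IRI given by the @base directive (hypothetical parameter)
    prefixes -- dict mapping a prefix name (without ':') to its IRI
    """
    if token.startswith("<") and token.endswith(">"):
        # <IRIREF>: absolute IRIs are kept, relative ones resolved against base.
        return urljoin(base, token[1:-1])
    if L_IDENT.fullmatch(token):
        # <l-ident>: the base IRI is added in front of the token.
        return base + token
    prefix, colon, local = token.partition(":")
    if colon and prefix in prefixes:
        return prefixes[prefix] + local
    raise ValueError("cannot expand token: " + token)
```

For example, with base "http://www.example.org/" and a prefix ex bound to the same IRI, the tokens team, ex:team and <team> all expand to "http://www.example.org/team".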

Elements used to define tokens

<uppercase-letter> ::= A..Z
<lowercase-letter> ::= a..z
<digit> ::= 0..9
<underscore> ::= _
<letter> ::= <uppercase-letter> | <lowercase-letter>
<simple-char> ::= <letter> | <digit> | <underscore>
<PN_CHARS> see Turtle grammar
<space> ::= #x20 /* #x20 = space character */

Tokens

<u-ident> ::= <uppercase-letter> {<simple-char>}*
<l-ident> ::= <lowercase-letter> {<simple-char>}*
<label> ::= {<PN_CHARS> | <space>}*
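
The two identifier tokens translate directly into regular expressions. The Python sketch below is one possible reading of the definitions above; the function name classify_ident and the returned strings are hypothetical. The tokens true and false are singled out because they are interpreted as Boolean literals rather than as identifiers.

```python
import re

# <u-ident> ::= <uppercase-letter> {<simple-char>}*
U_IDENT = re.compile(r"[A-Z][A-Za-z0-9_]*")
# <l-ident> ::= <lowercase-letter> {<simple-char>}*
L_IDENT = re.compile(r"[a-z][A-Za-z0-9_]*")


def classify_ident(token):
    """Classify a simple token; return None if it is not a simple identifier."""
    if token in ("true", "false"):
        return "boolean-literal"
    if U_IDENT.fullmatch(token):
        return "u-ident"   # used for variables
    if L_IDENT.fullmatch(token):
        return "l-ident"   # used for predicates and constants
    return None
```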

Global Grammar

<document> ::= <header> <body>
<header> ::= { <base> | <prefix> | <top> | <una> }*
<base> ::= @base <IRIREF>
<prefix> ::= @prefix <PNAME_NS> <IRIREF>
<top> ::= @top <l-ident> |
  @top <IRIREF>
<una> ::= @una
<body> ::= {<statement>}* |
  {<section>}*
<section> ::= @facts {<fact>}* |
  @rules {<rule>}* |
  @constraints {<constraint>}* |
  @queries {<query>}*
<statement> ::= <fact> | <rule> | <constraint> | <query>
<fact> ::= [ [<label>] ] <not-empty-conjunction>.
<rule> ::= [ [<label>] ] <not-empty-conjunction> :- <conjunction>.
<constraint> ::= [ [<label>] ] ! :- <not-empty-conjunction>.
<query> ::= [ [<label>] ] ? [(<term-list>)] :- <conjunction>.
<conjunction> ::= [<not-empty-conjunction>]
<not-empty-conjunction> ::= <atom> {, <atom>}*
<atom> ::= <std-atom> | <equality>
<equality> ::= <term> = <term>
<std-atom> ::= <predicate>(<not-empty-term-list>)
<term-list> ::= [<not-empty-term-list>]
<not-empty-term-list> ::= <term> {, <term>}*
<term> ::= <variable> | <constant>
<predicate> ::= <l-ident> | <IRIREF> | <PrefixedName>
<variable> ::= <u-ident>
<constant> ::= <l-ident> | <IRIREF> | <PrefixedName> | <literal>
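
The grammar makes the four kinds of statement easy to tell apart once an optional label has been skipped: a constraint starts with !, a query starts with ?, a rule contains :-, and anything else is a fact. The Python sketch below illustrates this dispatch. The function name statement_kind is hypothetical, and the sketch assumes that the sequence :- does not occur inside an IRI or a string literal of the statement.

```python
import re

# Optional label at the start of a statement: [ <label> ]
LABEL = re.compile(r"\s*\[[^\]]*\]")


def statement_kind(statement):
    """Return 'fact', 'rule', 'constraint' or 'query' for one statement.

    Assumption: ':-' never appears inside an IRI or a string literal.
    """
    match = LABEL.match(statement)
    text = statement[match.end():] if match else statement
    text = text.strip()
    if text.startswith("!"):
        return "constraint"
    if text.startswith("?"):
        return "query"
    if ":-" in text:
        return "rule"
    return "fact"
```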

The symbol @ is used to introduce several kinds of annotations:

To encode other kinds of information about the knowledge base, it is recommended to use specific comments introduced by %%.

Note on the scope of variable identifiers.

While the scope of a constant or a predicate identifier is the whole document, the scope of a variable is local to a <statement>. Thus two different facts, rules or constraints actually do not share any variable (more precisely, variables with the same name in different statements are each bound by their own quantifier).

Example:

p(X,a), q(X,Y).
q(X,b).
is logically interpreted as ∃X∃Y (p(X,a) ∧ q(X,Y)) ∧ ∃X q(X,b)

while:

  p(X,a), q(X,Y), q(X,b).
is logically interpreted as ∃X∃Y (p(X,a) ∧ q(X,Y) ∧ q(X,b)).
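
One way to make this local scope explicit is to rename variables apart, so that no two statements use the same variable name. The Python sketch below does this by appending the statement index to every variable. The function name rename_apart is hypothetical, and the sketch assumes statements written with simple identifiers only (no IRIs, prefixed names or string literals, in which an uppercase letter would not denote a variable).

```python
import re

# A variable is a <u-ident>: an uppercase letter followed by simple characters.
VARIABLE = re.compile(r"\b[A-Z][A-Za-z0-9_]*\b")


def rename_apart(statements):
    """Suffix each variable with the index of the statement it occurs in.

    Assumption: statements contain simple identifiers only.
    """
    return [
        VARIABLE.sub(lambda m, i=i: m.group(0) + "_" + str(i), statement)
        for i, statement in enumerate(statements)
    ]
```

Applied to the two facts of the first example, the variable X of the first fact and the variable X of the second fact receive different names, while constants such as a and b are left untouched.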

Examples

A simple example using annotations


    @facts
    [f1] p(a), relatedTo(a,b), q(b).
    [f2] p(X), t(X,a,b), s(a,z).
    t(X,a,b), relatedTo(Y,z).
    @constraints
    [c1] ! :- relatedTo(X,X).
    [constraint_2] ! :- X=Y, t(X,Y,b).
    ! :- p(X), q(X).
    @rules
    [r1] relatedTo(X,Y) :- p(X), t(X,Z).
    s(X,Y), s(Y,Z) :- q(X),t(X,Z).
    [rA 1] p(X) :- q(X).
    Y=Z :- t(X,Y),t(X,Z).
    s(a) :- .
    s(Z) :- a=b, X=Y, X=a, p(X,Y).
    @queries
    [q1] ? (X) :- p(X), relatedTo(X,Z), t(a,Z).
    [Query2] ? (X,Y) :- relatedTo(X,X), Y=a.
    ? :- p(X).
    ?() :- .
  

An example which illustrates the use of IRIs and literals

The first three facts have the same interpretation.

    @base <http://www.example.org/>
    @prefix ex: <http://www.example.org/>
    @prefix inria-team: <https://team.inria.fr/>
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#>
    @facts
    % use of @base
    [f 1] <Pred>(1.5).
    % use of @prefix
    [f 2] ex:Pred("1.5"^^xsd:decimal).
    % absolute IRIs
    [f 3] <http://www.example.org/Pred>("1.5"^^<http://www.w3.org/2001/XMLSchema#decimal>).
    % use of @base for the predicate and @prefix for the argument
    [f 4] team(inria-team:graphik).
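
The facts f 1 and f 2 coincide because Turtle reads an unquoted number such as 1.5 as shorthand for a literal typed xsd:decimal. The Python sketch below maps Turtle's numeric and Boolean shorthands to their datatype IRIs. The function name typed_literal and the tuple representation are assumptions made for this example.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Turtle shorthand forms and the XSD datatype each one abbreviates.
_SHORTHAND = [
    (re.compile(r"[+-]?[0-9]+"), "integer"),
    (re.compile(r"[+-]?[0-9]*\.[0-9]+"), "decimal"),
    (re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+"), "double"),
]


def typed_literal(token):
    """Return (lexical form, datatype IRI) for a shorthand literal, else None."""
    if token in ("true", "false"):
        return token, XSD + "boolean"
    for pattern, datatype in _SHORTHAND:
        if pattern.fullmatch(token):
            return token, XSD + datatype
    return None
```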
  

A syntactically correct but not human-friendly file


    [f1] p(a), relatedTo(a,b), q(b). [f2] p(X), t(X,a,b), s(a,z).
    [c1] !:-relatedTo(X
    % this is a comment
    ,X).
    [q1]?(X) :- p(X), relatedTo(X,Z), t(a,Z).
    t(X,a,b).
    [r1] relatedTo(X,Y) :- p(X), t(X,Z).
    [constraint_2] ! :- X=Y, t(X,Y,b).
    s(X,Y), s(Y,Z) :- % This is another comment
    q(X),t(X,Z).
    [rA_1] p(X)
    :-
    q(X)
    . Y=Z :- t(X,Y),t(X,Z).
    [Query2]
    ? (X,Y) :- relatedTo(X,X), Y=a.
    s(Z) :- a=b, X=Y, X=a, p(X,Y).
    !:- p(X), q(X).
    relatedTo(Y,z).?    :- p(X).
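
Such a file can still be cut into statements mechanically, because every statement ends with a period and comments run to the end of the line. The Python sketch below illustrates the idea for files that, like the one above, use simple identifiers only. The function name split_statements is hypothetical, and the sketch would not be correct for files containing IRIs or literals, where % and . may occur inside a token.

```python
def split_statements(text):
    """Split dlgp text into statements with normalised whitespace.

    Assumption: no IRIs or literals, so every % starts a comment
    and every . ends a statement.
    """
    # Drop comments: everything from % to the end of each line.
    lines = [line.split("%", 1)[0] for line in text.splitlines()]
    joined = " ".join(lines)
    # Each period closes a statement; collapse runs of whitespace.
    return [" ".join(part.split()) + "." for part in joined.split(".") if part.strip()]
```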