
UNIVERSIDAD COMPLUTENSE DE MADRID

FACULTAD DE INFORMÁTICA

Departamento de Sistemas Informáticos y Computación

DOCTORAL THESIS

Extensions of Relational and Deductive Databases: Theoretical Foundations and Implementation

DISSERTATION SUBMITTED FOR THE DEGREE OF DOCTOR

PRESENTED BY

Gabriel Aranda López

Supervisors

Susana Nieva Soto

Fernando Sáenz Pérez

Jaime Sánchez Hernández

Madrid, 2016

© Gabriel Aranda López, 2015


Doctoral thesis in publication format, presented by Gabriel Aranda López at the Departamento de Sistemas Informáticos y Computación of the Universidad Complutense de Madrid to obtain the degree of Doctor in Computer Engineering.

Completed in Madrid on 20 October 2015.

[Figure: word cloud]

Contents

Abstract

Summary

Acknowledgements

I Report

1. Introduction
    1.1. Motivation
    1.2. Objectives and contributions: from HH¬(C) to HR-SQL
    1.3. Organization of the work
    1.4. Publications associated with the thesis
    1.5. State of the art
        1.5.1. Deductive databases
        1.5.2. Constraint databases
        1.5.3. Deductive databases with hypothetical reasoning
        1.5.4. Relational databases, use of recursion and hypothetical reasoning

2. Negation, hypotheses, and quantifiers in constraint deductive databases
    2.1. Introduction
    2.2. Theoretical foundations of HH¬(C)
        2.2.1. Proof-theoretic semantics
        2.2.2. Fixpoint semantics
    2.3. The HH¬(C) system
        2.3.1. Computation phases
        2.3.2. Queries
        2.3.3. Implementation of the solvers
        2.3.4. Aggregate functions
        2.3.5. Integrity constraints
        2.3.6. Computing the fixpoint semantics
        2.3.7. The case of implication

3. Extended recursion and hypothetical reasoning in relational database systems
    3.1. Introduction
    3.2. Extending SQL
        3.2.1. The query language
        3.2.2. The view definition language
    3.3. Theoretical foundations
        3.3.1. Semantics for databases
        3.3.2. The semantics of queries
        3.3.3. The semantics of views
    3.4. The R-SQL system
        3.4.1. Computing R-SQL databases
        3.4.2. The stratification algorithm
    3.5. The HR-SQL system
        3.5.1. System structure
        3.5.2. Fixpoint computation
        3.5.3. Views and queries in HR-SQL
    3.6. Performance analysis

4. Conclusions and future work

Bibliography

II Publications

5. Publications associated with the second chapter
    5.1. Implementing a Fixpoint Semantics for a Constraint Deductive Database based on Hereditary Harrop Formulas
    5.2. Incorporating Integrity Constraints to a Deductive Database System
    5.3. An Extended Constraint Deductive Database: Theory and Implementation

6. Publications associated with the third chapter
    6.1. Formalizing a Broader Recursion Coverage in SQL
    6.2. Incorporating Hypothetical Views and Extended Recursion into SQL Database Systems
    6.3. R-SQL: An SQL Database System with Extended Recursion

Abstract

In this work we present some contributions to the field of database languages. We consider three general goals:

1. Improve the expressiveness of current database languages.

2. Develop formal semantics for our proposal of extended database languages.

3. Implement these semantics into practical database systems.

We have followed these steps moving in different database fields. On the one hand, in the deductive database field, we have proposed HH¬(C), which extends deductive database languages by allowing hypothetical queries and universal quantification. On the other hand, we have moved to the relational database field and proposed HR-SQL, which incorporates hypothetical queries as well as recursive definitions aimed at overcoming some expressive limitations of standard database languages. Next, we introduce both proposals.

The scheme of Hereditary Harrop formulas with constraints, HH(C), was proposed as a basis for Constraint Logic Programming languages. In the same way that Datalog emerges from logic programming as a deductive database language, such formulas can support a very expressive framework for constraint deductive databases, incorporating the intuitionistic implication that allows hypothetical queries and the use of quantifiers even in the constraint language. As negation is needed in the database field, HH(C) is extended with negation to get HH¬(C). The second chapter of this work presents the theoretical foundations of HH¬(C) and an implementation that shows the viability and expressive power of the proposal. Moreover, the language is designed in a flexible way in order to support different constraint systems. The implementation includes several domains, and it also supports aggregates and strong integrity constraints as found in database languages. The formal semantics of the language is defined by a proof-theoretic calculus, and for the operational mechanism we use a stratified fixpoint semantics, which is proved to be sound and complete w.r.t. the former. Hypothetical queries and aggregates require a more involved stratification than the common one used in Datalog. The resulting fixpoint semantics constitutes a suitable foundation for the system implementation.

The Structured Query Language (SQL) is one of the most recognized and used database languages. It can be considered a declarative programming language, but in its origin it lacked recursion. Although nowadays there are SQL database systems that partially support recursion, current database systems supporting recursive SQL impose restrictions on queries, such as linearity, and do not allow mutual recursion. In addition, those systems are not founded on a formal semantics.

In the third chapter of this work we introduce the database language and prototype R-SQL, an approach to overcoming those drawbacks. Another useful aspect that has been studied in the field of deductive databases is the use of hypothetical queries. We present a system, called HR-SQL, that enhances R-SQL in two main aspects.

On the one hand, it incorporates hypothetical queries as well as recursive and hypothetical view definitions, in a novel way that cannot be found in any other SQL system. In particular, it allows both positive and negative assumptions. All these features have been founded by extending the fixpoint semantics of R-SQL. On the other hand, the implementation of HR-SQL we have developed improves the efficiency of the previous prototype and is integrated in a commercial DBMS. We have also conducted some experiments to analyze its performance.

Keywords

Deductive Databases, Constraints, Hereditary Harrop Formulas, Fixpoint Semantics, Relational Databases, SQL, Recursion, Hypotheses.

UNESCO Categories

120312 Data banks

120323 Programming Languages


Summary

In this report we make contributions to the field of database languages. We set ourselves three fundamental goals:

1. Improve the expressiveness of current database languages.

2. Develop formal semantics for our proposals of extended database languages.

3. Implement these semantics in practical database systems.

We have achieved these three goals in different areas within databases. On the one hand, in the field of deductive databases, we propose HH¬(C). This language extends the capabilities of constraint deductive database languages, since it allows hypothetical queries and universal quantification. On the other hand, we take the study carried out for deductive databases and apply it to relational databases. Specifically, we propose HR-SQL, which incorporates hypothetical queries and non-linear and mutually recursive definitions. The idea behind this proposal is to overcome some expressive limitations of SQL, the standard database definition language. Next we introduce both approaches.

Hereditary Harrop formulas with constraints, HH(C), have been used as a basis for constraint logic programming languages. Just as logic programming supports deductive database languages such as Datalog (with constraints), this framework is used as the basis for a deductive database system that improves on the expressiveness of the systems that have appeared so far.

The second chapter of this report presents the theoretical results underlying the language HH¬(C) and a concrete implementation of this scheme that demonstrates its viability and expressiveness. The main contributions with respect to Datalog are the incorporation of intuitionistic implication, which allows hypotheses to be formulated, and the use of quantifiers even in the constraint language. The system is designed to support different constraint systems. The implementation includes several concrete domains as well as aggregate functions and integrity constraints, which are common in other relational database languages. The meaning of the language is defined by a proof-theoretic semantics, and the operational mechanism is defined by a fixpoint semantics that is sound and complete with respect to the former. Computing hypothetical queries and aggregate functions requires a more complex notion of stratification than the one used in Datalog. The fixpoint semantics developed provides a suitable framework that leads to the implementation of a concrete system.

The Structured Query Language (SQL) is the standard language for defining and querying relational databases. It is a declarative language that originally lacked recursion. Nowadays, however, SQL-based database languages support recursion partially, imposing restrictions such as linearity of recursive definitions and disallowing mutual recursion. Moreover, these extensions are not integrated into the semantics available for SQL.

In the third chapter of this report we propose the R-SQL language and system. This approach overcomes the limitations on recursive definitions in the standard. We have also equipped the initial language with its own view definition language and query language, which allow hypothetical reasoning; the result is the HR-SQL language. This second language improves on R-SQL in two respects.

First, it incorporates hypotheses in views and queries, allowing hypothetical reasoning with positive and negative assumptions. The semantic foundation of HR-SQL is inspired by the research on HH¬(C). Second, the efficiency of the fixpoint computation has been improved in the system, which is presented as a layer on top of existing relational database systems. Finally, we present the results of a comparison of the system's efficiency with that of other current database systems.

Keywords (Spanish summary)

Deductive databases, constraints, hereditary Harrop formulas, fixpoint semantics, relational databases, SQL, recursion, hypotheses.

UNESCO Codes

120312 Data banks

120323 Programming Languages

Acknowledgements

I would like to thank Francisco Javier López Fraguas for giving me the opportunity to work in the Grupo de Programación Declarativa and to devote myself to research during the first years of my professional life.

Thanks also to my supervisors Susana, Fernando, and Jaime for introducing me to the world of research and for the dedication shown in their explanations and reviews, which made this work possible.

This thesis is part of the work carried out in the Grupo de Programación Declarativa of the Universidad Complutense de Madrid (group 910502 in the catalogue of groups recognized by the UCM) and has been supported by the following research projects:

PROMETIDOS-CM. Programa Métodos Rigurosos de Desarrollo de Software, Comunidad de Madrid (S-2009/TIC-1465).

FAST-STAMP. Project Foundations and Applications of declarative Software Technologies, Ministerio de Ciencia e Innovación (TIN2008-06622-C03).

CAVI-ART. Project Computer Assisted ValIdation by Analysis, annotation, pRoof, and Testing, Ministerio de Economía y Competitividad (TIN2013-44742-C4-3-R).

NGREENS. Project Next-Generation Energy-Efficient Secure Software Software-CM, Comunidad de Madrid (S2013/ICE-2731).

In addition, the research group received support through the calls with references UCM-BSCH-GR35/10-A-910502 and UCM-BSCH-GR3/14-910502.

Part I

Report

Chapter 1

Introduction

Databases are an essential component of any business or commercial activity related to banking, education, telecommunications, or commerce, among other examples [110, 122]. They are routinely present in our daily activity: when we browse the web or shop online, a database is being accessed or updated, even if we are not always aware of it.

To manage a large amount of information, a relational database management system (hereafter RDBMS) must provide the user with two fundamental tools: first, a structure for storing data in an orderly way; second, a simple and efficient mechanism for querying and manipulating data. In this thesis we study different kinds of databases: relational databases and deductive databases and, within the latter, we focus on constraint databases. This chapter introduces some general ideas about them.

RDBMSs have been studied for more than four decades [28, 92, 92, 36]. An RDBMS must provide the user with a database language that allows information to be defined extensionally, as tables, and intensionally, as views. It must also provide a query language for accessing a database to retrieve information. Given the large amount of information held in current databases (such as that of a bank), it is important to design languages that carry out this task efficiently. It is also important to make these languages as expressive as possible, so that the user can retrieve information concisely.

The foundations of relational databases lie in the relational model, whose formal languages include relational algebra [28], tuple relational calculus [26, 25], and domain relational calculus [27]. The relational model is currently the most widely used model for implementing databases. Relational algebra underlies the Structured Query Language (hereafter SQL), which is recognized as the standard database language by the American National Standards Institute (ANSI) and by the International Organization for Standardization (ISO) [36]. However, this model has proved insufficient for formulating queries. An important shortcoming is its limited use of recursion, since it cannot express queries such as the transitive closure of a graph. Queries of this kind can be expressed in predicate logic and in deductive database systems [102]. At present, most relational database management systems that use SQL do not conform to the standard since, for example, they allow duplicates. They do follow it, however, in their treatment of recursion, restricting it to the linear case and disallowing mutual recursion. Only a few academic systems that handle SQL [68, 103] allow a more general use of recursion.

The application of logic programming to the database field gives rise to deductive databases [89, 11, 62, 92, 1]. A deductive database includes mechanisms for defining rules that can deduce additional information from stored facts. The rules are specified in a declarative language, and an inference engine then deduces new information. Most deductive database systems use the Datalog language [103], which arose as an extension of Prolog to databases and remains a reference in this field [11, 126, 39, 20, 99, 100]. Datalog uses stratification techniques to incorporate negation and recursion into its databases. Deductive databases are applied in different areas such as education and artificial intelligence. A large number of deductive database systems exist, such as XSB [104], bddbddb [65], LDL++ [3], DES [103], ConceptBase [54], QL [90], DLV [68], LogiQL [41], and 4QL [71].

Research on constraint databases [64, 95] began with the aim of extending the expressiveness of deductive databases, in the same way that constraint logic programming (hereafter CLP) extends logic programming [52]. Progress in this field focused above all on query languages with constraints and without recursion. These languages led to the investigation of interesting problems and resulted in applications used in many areas, such as the representation of spatial information [46], the representation of spatio-temporal data [105], and bioinformatics [94].

The HH(C) scheme [66] was originally proposed as a logic programming language extended with constraints and quantifiers. This language is based on intuitionistic logic [72] and uses hereditary Harrop formulas (HH), which underlie λ-Prolog [74], together with constraints belonging to a constraint system C that parameterizes the scheme. HH(C) improves on the expressiveness of CLP since it allows goals that include disjunctions, implications, and universal and existential quantifiers.

In the context of database languages, the fundamental goal of this thesis is to add expressiveness, in a well-founded way, to the query languages of deductive and relational databases. Another goal is to carry the semantic foundations studied over to the implementation of concrete database systems. In particular, we present the work carried out to implement two systems: a deductive one, based on the language HH¬(C) (Hereditary Harrop formulas with Negation and Constraints), and a relational one, based on the language HR-SQL (Hypothetical and Recursive Structured Query Language), which appears for the first time in the publications associated with this thesis.

HH¬(C) [83] arises from the idea of adapting HH(C) as a database language, which requires incorporating negation into the language in order to support set operations such as difference and thus achieve completeness with respect to relational algebra. The novelty of this approach is the application to databases of the expressive elements that come from HH logic. Nested implication allows hypothetical queries to be formulated and is one of the main contributions of this thesis. Other features of the HH(C) logic are also incorporated into the database language, namely existential and universal quantifiers and constraints. Another contribution presented in this report is the incorporation of aggregate functions and integrity constraints into HH¬(C). Several proposals exist for incorporating aggregate functions both into databases with geometric constraints (see Chapter 6 of [64]) and into deductive databases [31, 91, 130, 131]. Data consistency constraints, for their part, are also known as strong integrity constraints in the context of deductive databases [20, 63, 55], and must not be confused with the constraints of HH¬(C) that belong to the constraint system C parameterizing the scheme. Integrity constraints guarantee safe use of the database; examples are the primary key and the foreign key. In HH¬(C), the computation of aggregate functions is delegated to the solvers of the constraint system, and integrity constraints are also checked using the solvers, which in this case return true or false depending on whether a particular constraint holds.

HR-SQL arises from the idea of carrying the features and formalisms of HH¬(C) over to a relational language. The language, its semantic foundations, and the system that implements them are a result of this thesis, which was approached incrementally. First, with the aim of bringing the expressiveness of recursive definitions to relational databases, we developed the language R-SQL, which allows non-linear and mutually recursive definitions of relations, thus overcoming the recursion limitation of the SQL-99 standard [36]. The semantic foundations of R-SQL are close to those of HH¬(C) and are likewise based on a stratified fixpoint semantics that computes the meaning of a database by layers, or strata.

The next feature of HH¬(C) that we carry over to the relational setting is the ability to pose hypotheses in views and queries. Again using a semantics similar to that of HH¬(C), we extend the foundations of R-SQL to develop HR-SQL. This gives meaning to a language very close to SQL that allows extended recursion and hypothetical reasoning. The system that implements HR-SQL relies on existing relational database systems and extends them with capabilities common in deductive database languages (such as the possibility of defining non-linear or mutually recursive relations) and with other capabilities that come from HH¬(C) (such as the handling of hypotheses in views and queries).

1.1. Motivation

To motivate the work and to show the advantages and expressiveness of the proposed frameworks, we present some examples of databases and queries in HH¬(C) and HR-SQL.

In the context of deductive databases we use the term predicate, which corresponds to the term relation in relational databases. Likewise, goals correspond to queries. We can also distinguish two components in a database: the extensional database, made up of predicates defined by facts, and the intensional database, defined by predicates that contain at least one rule. Below we present a collection of examples accepted by the HH¬(C) and HR-SQL systems, respectively.

The HH¬(C) language

In the first HH¬(C) example we define a database of the courses taken by students. The concrete syntax of the language is very close to Prolog. In addition, not(A) represents the negation of an atom A, and the symbol => is used for nested implications (in a query or in the body of a rule or clause). We also use % to introduce comments in the code.

We have implemented several constraint systems for HH¬(C): Booleans, reals, integers, and finite domains. Each constraint system has an associated domain, which must be defined explicitly in the case of a finite domain. In the example we use the domain of real numbers and two enumerated domains that we define explicitly: one for student names (alum_dt) and one for courses (asig_dt). These domains are defined in HH¬(C) as:

domain(alum_dt,[angela, david, joseluis, nicolas]).
domain(asig_dt,[introduccion_programacion,
                programacion_declarativa,
                programacion_funcional,
                programacion_logica]).

where domain is the system's reserved word for defining a new finite domain, and the values of each domain appear between square brackets.

The relation defined next gives the name of the student (Alumno), the course taken (Asignatura), and the grade obtained (Nota).

% curso(Alumno, Asignatura, Nota)
curso(angela, introduccion_programacion, 5.0).
curso(nicolas, introduccion_programacion, 7.0).
curso(david, introduccion_programacion, 2.0).
curso(angela, programacion_declarativa, 3.0).

HH¬(C) also requires the types of predicates to be declared explicitly. As in relational databases, the notion of type is closely tied to the domains denoted by the corresponding constraint systems. For the relation curso we introduce its type declaration:

type(curso(alum_dt,asig_dt,real)).

We continue the database definition with a predicate stating that, in order to enrol in the course programacion_declarativa_avanzada, a student must have passed (and therefore taken) introduccion_programacion and must have taken programacion_declarativa (although not necessarily passed the latter):

% matriculaPDA(Alumno, Asignatura).
matriculaPDA(Alumno, programacion_declarativa_avanzada) :-
    curso(Alumno, introduccion_programacion, Nota),
    Nota >= 5.0,
    curso(Alumno, programacion_declarativa, X).

Next we present some example queries on this database. Since the language incorporates negation, a query can determine who cannot enrol in programacion_declarativa_avanzada:

hhnc> not(matriculaPDA(Alumno, programacion_declarativa_avanzada)).

The answer in our database system is a constraint:

Alumno /= angela

which specifies that any student other than Angela cannot enrol, since she is the only one who has passed and taken the required courses.
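The negated query above can be mimicked outside HH¬(C) with plain set operations. The following Python sketch is only an illustration under stated assumptions: the names `ALUMNOS`, `CURSO`, `matricula_pda`, and `no_matricula_pda` are hypothetical, the finite domain is enumerated explicitly, and the answer is a list of values rather than the constraint that the HH¬(C) solver returns.

```python
# Illustrative sketch only: this is not how the HH¬(C) system is implemented.
# All names below are hypothetical helpers introduced for this example.

ALUMNOS = ["angela", "david", "joseluis", "nicolas"]  # finite domain alum_dt

CURSO = [
    ("angela", "introduccion_programacion", 5.0),
    ("nicolas", "introduccion_programacion", 7.0),
    ("david", "introduccion_programacion", 2.0),
    ("angela", "programacion_declarativa", 3.0),
]


def matricula_pda(curso):
    """Students who passed introduccion_programacion and took programacion_declarativa."""
    passed_ip = {a for (a, s, n) in curso
                 if s == "introduccion_programacion" and n >= 5.0}
    took_pd = {a for (a, s, _) in curso if s == "programacion_declarativa"}
    return passed_ip & took_pd


def no_matricula_pda(curso, domain):
    """Negation over a finite domain: every domain value that is not derivable."""
    allowed = matricula_pda(curso)
    return [a for a in domain if a not in allowed]


print(no_matricula_pda(CURSO, ALUMNOS))
```

Enumerating the complement is only possible here because alum_dt is finite; the constraint answer `Alumno /= angela` expresses the same set without listing it.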

In addition to the possibility of posing hypothetical queries, one of the applications presented in this work is the use of aggregate functions. An example combining both is the following query: assuming that the student José Luis obtained a 9.0 in the course introduccion_programacion, what would be the average grade of the students in this course?

hhnc> curso(joseluis, introduccion_programacion, 9.0) =>
        Avg = avg(curso(Alumno, introduccion_programacion, Nota), Nota).

Aggregate functions appear inside the constraints of the language and are solved by sending them to the corresponding HH¬(C) solver. In this case the average function (avg) has two arguments: the predicate it is applied to (curso) and the variable over which the average is computed (Nota), and it is solved in the real constraint system. The answer to the query is the following constraint:

Avg = 5.75.
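The value 5.75 can be checked by hand: with the assumed fact, the grades for introduccion_programacion are 5.0, 7.0, 2.0, and 9.0, whose mean is 23/4. The Python sketch below reproduces that arithmetic; `avg_assuming` is a hypothetical helper, and the hypothesis is modelled simply as extra tuples added to a copy of the facts, which is an assumption of this example rather than a description of the HH¬(C) solver.

```python
# Illustrative sketch only; names are hypothetical and not part of HH¬(C).

CURSO = [
    ("angela", "introduccion_programacion", 5.0),
    ("nicolas", "introduccion_programacion", 7.0),
    ("david", "introduccion_programacion", 2.0),
    ("angela", "programacion_declarativa", 3.0),
]


def avg_assuming(curso, assumed, asignatura):
    """Average grade for one course, evaluated as if the assumed facts held.

    The stored facts are not modified: the hypothesis lives only in a local copy.
    """
    extended = list(curso) + [t for t in assumed if t not in curso]
    notas = [n for (_, s, n) in extended if s == asignatura]
    return sum(notas) / len(notas)


hypothesis = [("joseluis", "introduccion_programacion", 9.0)]
print(avg_assuming(CURSO, hypothesis, "introduccion_programacion"))  # 5.75
```

After the call, `CURSO` still holds the original four facts, mirroring the fact that the implication only assumes the tuple for the duration of the query.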

To check an integrity constraint, a constraint of our constraint system is generated and sent to the solvers in use. In the example, to specify that the pair (Alumno, Asignatura) forms the primary key of the predicate curso, the following declaration is used when defining the database:

:- pk(curso(Alumno, Asignatura, Nota),(Alumno, Asignatura)).

Another advantage of working with this language is the possibility of dealing with cycles in a graph. Suppose we add a new predicate definition to specify more simply when one course is a prerequisite of another, following the formulation in [102]. The following relation has an extensional part, defined by two facts, and an intensional part, defined by a rule:

% pre(Asignatura, Asignatura).
pre(programacion_funcional, introduccion_programacion).
pre(programacion_logica, programacion_funcional).

pre(Pre, Post) :- pre(Pre, Asignatura), pre(Asignatura, Post).

We can ask whether adding a given prerequisite introduces a cycle:

hhnc> pre(introduccion_programacion, programacion_logica)=>pre(X, X).

The answer is true. Next we show how to express this example in the second database language presented in this thesis: HR-SQL.
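To make the cycle query concrete, the sketch below computes the meaning of pre by naive fixpoint iteration of the non-linear rule, with and without the assumed edge. It is a minimal Python illustration: `closure` and `has_cycle` are hypothetical names, and the naive iteration is an assumption of this example, not the stratified fixpoint computation described later in the thesis.

```python
# Illustrative sketch only; a naive fixpoint, not the thesis's algorithm.

PRE = {
    ("programacion_funcional", "introduccion_programacion"),
    ("programacion_logica", "programacion_funcional"),
}


def closure(facts):
    """Least fixpoint of: pre(X, Z) :- pre(X, Y), pre(Y, Z)."""
    rel = set(facts)
    while True:
        # Non-linear step: both body atoms range over the current relation.
        derived = {(a, d) for (a, b) in rel for (c, d) in rel if b == c}
        if derived <= rel:
            return rel
        rel |= derived


def has_cycle(facts, assumed=()):
    """True if some pre(X, X) is derivable once the assumed edges are added."""
    return any(a == b for (a, b) in closure(set(facts) | set(assumed)))


print(has_cycle(PRE))
print(has_cycle(PRE, [("introduccion_programacion", "programacion_logica")]))
```

The first call prints False and the second prints True, matching the answer to the hypothetical query; the stored facts in `PRE` are left untouched by the assumption.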


The HR-SQL language

In HR-SQL, relations are defined by assigning select statements (from the SQL query language) to relation names together with their schema (a schema consists of variables together with their types, which come from the SQL standard). In this language we also distinguish the extensional part of a database from the intensional part.

We begin by defining the relation curso, which relates each student to the grade obtained in a given course (like the predicate of the same name introduced earlier). To define the extensional part of a database in HR-SQL we use from-less statements, which some relational database management systems allow for defining tuples whose data source is explicit rather than implicit (see footnote 1).

curso (alumno varchar(20), asignatura varchar(20), nota float) :=
    select 'Angela', 'Introduccion programacion', 5.0 union
    select 'Nicolas', 'Introduccion programacion', 7.0 union
    select 'David', 'Introduccion programacion', 2.0 union
    select 'Angela', 'Programacion declarativa', 3.0;

The relation that determines who can enrol in Programacion declarativa avanzada can be defined using two auxiliary relations, aprobarIP and cursarPD:

aprobarIP(alumno varchar(20)) :=
    select curso.alumno from curso where
    curso.asignatura = 'Introduccion programacion' and
    curso.nota >= 5.0;

cursarPD(alumno varchar(20)) :=
    select curso.alumno from curso where
    curso.asignatura = 'Programacion declarativa';

matriculaPDA(alumno varchar(20)) :=
    select aprobarIP.alumno from aprobarIP, cursarPD where
    aprobarIP.alumno = cursarPD.alumno;

The query asking who cannot enrol in Programacion declarativa avanzada is formulated as:

hr-sql> select curso.alumno from curso
        except
        select * from matriculaPDA;

Instead of constraints, HR-SQL returns tuples with the corresponding values as the result:

[(David,), (Nicolas,)]

The result is shown as unary tuples, following the format returned by the implemented system, which is built on top of a relational database management system. In this case the underlying system is PostgreSQL, and HR-SQL uses its notation when returning the answer to a query.
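To make the answer above easy to check by hand, the three views and the except query can be replayed over plain Python sets. This is a minimal sketch under set semantics; the function names are hypothetical transliterations of the relation names and are not part of HR-SQL.

```python
# Extensional relation curso(alumno, asignatura, nota), as defined in the text.
CURSO = {
    ("Angela", "Introduccion programacion", 5.0),
    ("Nicolas", "Introduccion programacion", 7.0),
    ("David", "Introduccion programacion", 2.0),
    ("Angela", "Programacion declarativa", 3.0),
}


def aprobar_ip(curso):
    """Students who passed Introduccion programacion (nota >= 5.0)."""
    return {a for (a, s, n) in curso
            if s == "Introduccion programacion" and n >= 5.0}


def cursar_pd(curso):
    """Students who have taken Programacion declarativa."""
    return {a for (a, s, _) in curso if s == "Programacion declarativa"}


def matricula_pda(curso):
    """Join of the two auxiliary views on alumno."""
    return aprobar_ip(curso) & cursar_pd(curso)


def no_matricula_pda(curso):
    """select alumno from curso except select * from matriculaPDA."""
    return {a for (a, _, _) in curso} - matricula_pda(curso)


if __name__ == "__main__":
    print(sorted(no_matricula_pda(CURSO)))
```

Only Angela satisfies both auxiliary views, so the set difference leaves David and Nicolas, in agreement with the answer shown above.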

1 The dual table in Oracle achieves a similar effect for returning constants or, more generally, the results of evaluating expressions.


The next example presents a hypothetical query, equivalent to the one introduced for HH:(C), that uses the aggregate function avg. To include hypotheses in a query, HR-SQL provides the construct assume <Hypothesis> in, placed before the SQL query that returns the result. The query is therefore formulated as:

hr-sql> assume 'Joseluis', 'Introduccion Programacion', 9.0
        in cursoIP
        select avg(nota) from cursoIP;

where cursoIP is an auxiliary relation that returns the students enrolled in Introduccion Programacion. The answer is the unary tuple holding the value of the aggregate function applied to the relation curso once the assumed hypothesis is taken into account: [(5.75,)].
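The value 5.75 can be reproduced by hand: the three existing grades for the subject (5.0, 7.0 and 2.0) plus the assumed grade 9.0 sum to 23.0, and 23.0 / 4 = 5.75. The sketch below replays that computation in Python. The contents of `CURSO_IP` are an assumption (the text does not list the definition of cursoIP, so it is taken here to be the rows of curso for that subject), and `avg_assuming` is a hypothetical helper name.

```python
from statistics import mean

# Assumed contents of cursoIP: rows of curso for 'Introduccion programacion'.
CURSO_IP = {
    ("Angela", "Introduccion programacion", 5.0),
    ("Nicolas", "Introduccion programacion", 7.0),
    ("David", "Introduccion programacion", 2.0),
}


def avg_assuming(relation, hypothesis):
    """Average of nota over `relation` extended with the assumed tuples.

    The extension is built on a copy, so the assumption is temporary.
    """
    extended = set(relation) | set(hypothesis)
    return mean(nota for (_, _, nota) in extended)


if __name__ == "__main__":
    assumed = {("Joseluis", "Introduccion Programacion", 9.0)}
    print(avg_assuming(CURSO_IP, assumed))
```

After the call, `CURSO_IP` still holds three tuples, which reflects that an assume clause affects only the query it precedes.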

The predicate pre is specified in HR-SQL as follows (see footnote 2):

pre(pred varchar(20), pos varchar(20)) :=
  select 'Programacion funcional', 'Introduccion programacion'
  union
  select 'Programacion logica', 'Programacion funcional'
  union
  select pre1.pred, pre2.pos from pre as pre1, pre as pre2
  where pre1.pos = pre2.pred;

We now define the equivalent query to obtain which subjects are part of a cycle when a new tuple is assumed in the prerequisite graph:

hr-sql> assume select 'Introduccion programacion', 'Programacion logica' in pre
        select pre.pred from pre where pre.pred = pre.pos;

As already noted, besides hypothetical reasoning, HR-SQL extends the recursion of SQL-99. For example, our language makes it easy to define two mutually recursive relations that represent, respectively, the even and the odd numbers up to 100:

par(x integer) :=
  select 0 union
  select impar.x+1 from impar;

impar(x integer) :=
  select par.x+1 from par where par.x<100;

Finally, as an example of a non-linear relation, we propose the following representation of the Fibonacci numbers, again bounded by 100 (here the bound applies to the index n):

fib(n integer, f integer) :=
  select 0,1 union
  select 1,1 union
  select fib1.n+1, fib1.f+fib2.f from fib as fib1, fib as fib2
  where fib1.n=fib2.n+1 and fib1.n<100;

2 The current implementation of HR-SQL allows aliases in the syntax of the language, although this feature is not reflected in the publications associated with this thesis.


As already noted, the SQL-99 standard does not admit definitions of this kind (mutually recursive or, as here, non-linear). Since arithmetic primitives can give rise to infinite relations (and hence to non-terminating computations), we use the condition in the where clause (fib1.n<100) to limit the number of calls.
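The role of the bound in the where clause can be seen by iterating the mutually recursive definitions of par and impar until neither relation changes. The following Python sketch is an assumption-laden illustration (the function name `even_odd` and the `limit` parameter are hypothetical); it starts from empty relations and re-evaluates both definitions together, which is one simple way to reach the least fixpoint.

```python
def even_odd(limit=100):
    """Iterate par/impar simultaneously until a fixpoint is reached.

    par(x)   := select 0 union select impar.x+1 from impar
    impar(x) := select par.x+1 from par where par.x < limit
    """
    par, impar = set(), set()
    while True:
        new_par = {0} | {x + 1 for x in impar}
        new_impar = {x + 1 for x in par if x < limit}
        if new_par == par and new_impar == impar:
            return par, impar       # neither relation grew: fixpoint
        par, impar = new_par, new_impar


if __name__ == "__main__":
    par, impar = even_odd(100)
    print(max(par), max(impar))
```

Without the condition `x < limit` the loop would keep producing larger numbers and never stabilise, which is the non-termination that the where clause prevents.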

Having motivated some of the expressive capabilities that these languages provide in the deductive and relational settings, we now present the objectives and contributions of this thesis.

1.2. Objectives and contributions: from HH:(C) to HR-SQL

We started this work with the goal of extending the theoretical results of [83] in order to provide foundations for, and implement, HH:(C) as a query language for deductive databases. We also set out to incorporate aggregate functions and integrity constraints. These goals have been pursued throughout the development of the thesis.

Specifically, [A.1] presents the first implementation of HH:(C). In [A.3] we address the incorporation of aggregate functions into HH:(C), taking advantage of the stratified fixpoint semantics to compute aggregates. Similarly, in [A.2] we incorporate integrity constraints for deductive databases, using the language of the constraint system to specify them in a simple way.

Our research also focuses on transferring certain advantages of deductive databases to a relational database management system, and on using semantic techniques from deductive databases to formalize the relational model. In particular, the aim is to give current relational database systems a more expressive recursion model that allows non-linear and mutual recursion, as well as the handling of hypotheses in views and queries. From this idea arises HR-SQL, a system whose language is relational in nature and which uses deductive techniques for the fixpoint computation.

In [B.1] we define the language R-SQL. That publication presents a stratified fixpoint semantics that gives meaning to databases of the language, which allows non-linear and mutually recursive definitions. It also proposes the first implementation of the system. Later, in [B.2], we present HR-SQL, which allows hypotheses in views and queries. As we will see at the end of section 1.5, other work exists on hypothetical reasoning in relational database management systems. However, HR-SQL is more expressive when incorporating hypotheses in views and queries, since it allows both positive and negative assumptions.

Regarding the theoretical foundations of this work, we present a stratified fixpoint semantics for HH:(C) (which serves as the operational semantics for the implementation of a concrete system). We also present a previously developed proof-theoretic semantics [83] and prove that the fixpoint semantics is sound and complete with respect to it. The language HH:(C) is parametric in the generic constraint system C; replacing C by a particular constraint system yields a concrete instance. We also present a fixpoint semantics for HR-SQL, inspired by the fixpoint semantics underlying HH:(C). In this way we give meaning to the databases of the language as well as to queries and views that may contain hypotheses.

Regarding the implementation, we present the systems HH:(C) and HR-SQL. As with the semantic definition, we have implemented the HH:(C) system independently of the concrete constraint system. We have also implemented several constraint systems for it: Booleans, finite domains and reals. HH:(C) is implemented in SWI-Prolog [129]. This deductive system takes databases of the language as input and, using the constraints associated with the corresponding solver, gives meaning to their predicates and to queries that combine implications, quantifiers and constraints. Aggregate functions have also been implemented by delegating their computation to the corresponding solver of the system, and integrity constraints have been implemented using the constraints of the system.

The HR-SQL system is likewise implemented following its semantic formalization and using SWI-Prolog. It is designed as a layer on top of commercial relational systems. In particular it can work with two relational database management systems: PostgreSQL and IBM DB2. The upper layer, implemented in Prolog, computes the semantics of HR-SQL databases and materializes the resulting tables in the underlying system.

We use Python and the fourth-generation language SQL PL (integrated in IBM DB2) as intermediate languages that connect the system implemented in Prolog with the relational database management system. Our systems generate programs in these intermediate languages, and those programs build the HR-SQL databases through standard SQL statements embedded in them.

We have also worked on the efficiency of HR-SQL with respect to R-SQL. The notion of stratification has evolved from one that groups the maximum number of relations into a single stratum (both in HH:(C) and in the first version of R-SQL) to one that minimizes the number of relations in each stratum (in HR-SQL). Minimizing the relations per stratum improves efficiency because it reduces the number of iterations of the loops used to reach the fixpoint. The fixpoint computation itself has also improved. In HR-SQL the definitions of recursive relations are split into a base case and a recursive case, and the base case is moved out of the loop body in each stratum. This avoids needlessly recomputing the non-recursive part in the successive iterations required to reach the fixpoint. This technique is common in the field of deductive databases [122]. Finally, we use temporary tables to obtain the tuples that are added to (or removed from) the relations of the database when computing hypothetical views and queries. Since temporary tables generate no log entries in the database management system and require no concurrency control, they have proved a suitable tool for our purpose without a large loss of performance, although this functionality is not available in every current relational database management system (see section 3.6).
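The idea of evaluating the base case once and iterating only the recursive case can be sketched as follows. This Python fragment is an illustration, not the code generated by HR-SQL: `fixpoint`, `fib_base` and `make_fib_step` are hypothetical names, and a small bound is used for the Fibonacci relation so the result is easy to verify by hand.

```python
def fixpoint(base, recursive_step):
    """Compute a recursive relation, evaluating the base case only once."""
    relation = set(base())                  # base case hoisted out of the loop
    while True:
        new = recursive_step(relation) - relation
        if not new:                         # no new tuples: fixpoint reached
            return relation
        relation |= new


def fib_base():
    """select 0,1 union select 1,1"""
    return {(0, 1), (1, 1)}


def make_fib_step(bound):
    """Recursive case of fib, with fib1.n < bound as the stopping condition."""
    def step(fib):
        return {(n1 + 1, f1 + f2)
                for (n1, f1) in fib
                for (n2, f2) in fib
                if n1 == n2 + 1 and n1 < bound}
    return step


if __name__ == "__main__":
    result = dict(fixpoint(fib_base, make_fib_step(10)))
    print(result[10])
```

With the bound set to 10 the relation holds the indices 0 to 10, and the last tuple is derived from the tuples for indices 9 and 8.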

We thus contribute to two areas of databases: on the one hand, to deductive databases, by providing and giving foundations for the language HH:(C), which extends the capabilities of Datalog; on the other hand, to relational databases, by improving existing relational database management systems with hypotheses in views and queries and with a more general treatment of recursion. We also present semantic techniques that are novel in both fields.

1.3. Organization of the work

The dissertation is divided into four chapters (including this introductory chapter), with the following content:

Chapter 2 presents the language HH:(C), its syntax, the definition of its constraint system and the semantics that underpin the language. It also presents the system implemented according to the fixpoint semantics. This summarizes the contents of publications [A.1, A.2, A.3].

Chapter 3 presents the theoretical framework of HR-SQL: its syntax and its fixpoint semantics. It also presents the implementation of a system based on this framework. This chapter summarizes publications [B.1, B.2, B.3].

Finally, chapter 4 presents the conclusions and outlines lines of future work.

1.4. Publications associated with the thesis

[A.1] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez.
Implementing a Fixpoint Semantics for a Constraint Deductive Database based on Hereditary Harrop Formulas.
In Proceedings of the 11th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming (PPDP'09), pages 117–128. ACM Press, 2009.
Page 116

[A.2] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández.
Incorporating Integrity Constraints to a Deductive Database System.
In XI Jornadas sobre Programación y Lenguajes, PROLE2011 (SISTEDES), editors: Purificación Arenas, Victor M. Gulías and Pablo Nogueira, pages 141–152, September 2011.
Page 128

[A.3] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández.
An Extended Constraint Deductive Database: Theory and implementation.
The Journal of Logic and Algebraic Programming, volume 21, pages 20–52, 2013.
Page 140

[B.1] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández.
Formalizing a Broader Recursion Coverage in SQL.
In Symposium on Practical Aspects of Declarative Languages (PADL'13), volume 7752 of LNCS, pages 93–108, 2013.
Page 176

[B.2] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández.
Incorporating Hypothetical Views and Extended Recursion into SQL Database Systems.
In Ken Mcmillan, Aart Middeldorp, Geoff Sutcliffe, and Andrei Voronkov, editors, LPAR-19, volume 26 of EPiC Series, pages 9–22. EasyChair, 2014.
Page 192

[B.3] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández.
R-SQL: An SQL Database System with Extended Recursion.
In Electronic Communications of the EASST, volume 64: Programming and Computer Languages, pages 1–18, 2013.
Page 206

We now present the state of the art.


1.5. State of the art

We begin with a review of models for deductive databases and the systems they have given rise to. We emphasize the different approaches to incorporating negation and aggregation in deductive database languages, since the dissertation shows how both are incorporated into HH:(C). We then survey several constraint database systems and their applications. We close the chapter by presenting other approaches to recursion and hypothetical reasoning in relational databases.

1.5.1. Deductive databases

Deductive database systems are those that obtain new data through an inference engine. They may include a transaction manager, security control and persistence control, among other features. Deductive databases are also called logic databases, since they originate in logic programming. One characteristic of deductive databases, shared with relational databases, is that their query languages are declarative. This means that users pose a query by stating what information they want to obtain, rather than how to carry out the operation.

According to [77], the incorporation of logic has made a large number of contributions to databases, among which the following stand out:

Formalization of database, query and answer to a query.

The recognition that logic programming extends relational databases.

Presentation of the semantics of multiple classes of databases, including alternative forms of negation and disjunction.

Understanding of integrity constraints and of how they can be exploited when performing updates and in semantic query optimization.

Formalization of, and solutions to, the problems of updating data and views.

Understanding of recursion and of how it can be implemented in practice.

Understanding of the relationships between logic-based systems and knowledge-based systems [93].

Formalization of the handling of incomplete information in knowledge base systems.

Correspondence between alternative formalisms of non-monotonic reasoning and databases and knowledge bases.

Most deductive systems are inspired by Prolog. When designing and implementing them, the following must be taken into account:

The evaluation strategy of Prolog can lead to infinite computations because of recursive predicates, even for programs without negation and in the absence of function or arithmetic symbols. In the most widespread relational database languages, based on SQL, queries are instead expected always to terminate.


The soundness and completeness of the evaluation method.

In a typical database application the amount of information is large enough to reside in secondary storage. Efficient access to this data is crucial for good system performance.

Finally, a primary goal of deductive databases is to handle a superset of relational algebra that allows recursion, terminates, and does not incur a large number of disk accesses.

The origin of deductive databases can be traced to work on automated theorem provers and to logic programming. Another study, also by Minker [76], suggests that Green and Raphael [40] were the first to relate theorem proving and deduction in databases. They developed a series of question-answering systems that used a version of Robinson's resolution principle [98], thereby showing that deduction can be used systematically in the context of databases.

The first systems to implement these ideas were MRPPS [78], DEDUCE [22] and DADM [60]. The first, MRPPS, was an interpreter developed by Minker's group between 1970 and 1978. It is notable for including one of the first proposals for recursive queries. DEDUCE was implemented by IBM in the 1970s and used rules based on left-linear recursive Horn clauses. Finally, DADM made explicit the difference between the extensional and intensional parts of a database and represented the intensional part as connection graphs.

In 1976, van Emden and Kowalski [32] showed that the least fixpoint of a logic program with Horn clauses is its least Herbrand model. This provided the foundation for the semantics of logic programs, and also of deductive databases, since the operational semantics consists of the fixpoint computation associated with a deductive database (at least for those based on bottom-up evaluation).
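The fixpoint construction behind this result can be illustrated on a tiny ground (propositional) Horn program by iterating the immediate consequence operator from the empty set. The following Python sketch is only an illustration; the program, the atom names and the function names are hypothetical.

```python
def immediate_consequences(facts, rules, interpretation):
    """One application of the operator: facts plus heads of satisfied rules."""
    return set(facts) | {head for (head, body) in rules
                         if all(atom in interpretation for atom in body)}


def least_fixpoint(facts, rules):
    """Iterate from the empty interpretation until nothing changes."""
    model = set()
    while True:
        nxt = immediate_consequences(facts, rules, model)
        if nxt == model:
            return model
        model = nxt


# Hypothetical ground program: p.  q :- p.  r :- q, p.  s :- t.
FACTS = {"p"}
RULES = [("q", ["p"]), ("r", ["q", "p"]), ("s", ["t"])]

if __name__ == "__main__":
    print(sorted(least_fixpoint(FACTS, RULES)))
```

The atom s is never derived because t holds in no iteration, so it is absent from the resulting model, as expected of the least model.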

Early work focused on establishing the goals of deductive databases and developing their semantic foundations. The next phase focused on efficient query evaluation. Henschen and Naqvi [49] proposed one of the first efficient techniques for evaluating queries in the database context. After this, a paper by Ullman [121] set out a framework for implementation. To this end, the author addressed not only query evaluation techniques but also drew attention to the problem of non-termination.

The area of deductive databases gained great importance in 1984 with the start of three major projects. The Nail! project at Stanford, LDL in Austin and the ECRC deductive database project represented the largest contribution to deductive databases outside universities.

The ECRC project was coordinated by J. M. Nicolas. The first phase [17] led to the study of algorithms, the development of the first prototypes, integrity checking, and an initial system that explored consistency checking. The second phase brought more functional prototypes: Megalog [12], DedGin [125] and EKS-V1 [67]. The EKS system supported integrity constraints and some aggregate functions that used recursion. This project gave rise to further research, such as that carried out by Groupe Bull, which developed commercial, object-oriented deductive databases.

The LDL project [120] also started in 1984. By 1986 it had become clear that combining Prolog with relational databases was not a satisfactory solution, so the group began developing bottom-up techniques for computing the semantics of the database. The LDL prototype was developed in 1988 and new versions followed between 1989 and 1991. It was the first general-purpose deductive database system to become available. The system incorporated stratified negation and was compiled by a system that produced C code. A presentation of the LDL language can be found in [80]. The LDL++ system at MCC [131] is its direct successor and started in 1991. It includes non-stratified negation and aggregate functions. The LDL++ system has since evolved into the Deals system [109] (http://wis.cs.ucla.edu/deals).

The Nail! project (Not Another Implementation of Logic!) started at Stanford in 1985, following the ideas in [121]. The first paper on Magic Sets [8] appeared in collaboration with the MCC group. An initial prototype was developed [79], and the project was eventually abandoned because the purely declarative paradigm proved inconvenient for building many applications.

The Aditi project started in 1988 at the University of Melbourne. Its main contributions are the formulation of a naive evaluation that has been widely used in later work [6], the adaptation of Magic Sets to stratified programs [5], and the indexing and optimization of programs with constraints [61]. The group's work was directed towards developing its prototype, with special emphasis on disk-resident relations. An overview of the system can be found in [123].

The ConceptBase system [55], developed at the Universities of Passau and Aachen since 1987, sought to combine deductive rules with a semantic data model. The system [56] also supports integrity constraints. ConceptBase has been used in numerous applications at European universities and has a commercial version.

The CORAL project at the University of Wisconsin started in 1988. The original idea was to develop the Magic Templates algorithm [85], which made it possible to use non-ground tuples. The project made major contributions to the development of multiset semantics for logic programming and to optimization in the presence of duplicate checks [70]. The first results on space-efficient bottom-up techniques appear in [82, 115]. The evaluation of programs with aggregate functions is presented in [113]. The result that bottom-up evaluation asymptotically dominates top-down evaluation (for programs with Horn clauses) was obtained within this project [114]. The first prototype of the CORAL system became operational in 1990. That version supported non-stratified aggregation and negation, using an algorithm proposed in [86]. An overview of the system can be found in [87], and the implementation is described in [88]. The extension that supports object-oriented features is Coral++ [111].

The XSB project [104], coordinated by D. S. Warren, is another related effort. It produced a system that supports stratified negation and aggregation (plus a meta-interpreter for well-founded programs), non-ground tuples and disk-resident relations. The implementation is based on OLDT resolution [116]. The Warren Abstract Machine (WAM) [127], an abstract machine for implementing Prolog systems, was adapted to the top-down evaluation used in XSB.

The educational system DES is a deductive database system developed at Universidad Complutense [103]; it is free and open source. It includes the languages Datalog, SQL and relational algebra. It supports stratified negation, declarative debugging, test case generation for SQL views, aggregate functions and predicates, join predicates, strong integrity constraints, memo tables [108] and hypothetical queries.


The interpreter Inter4QL [71] is based on a database language called 4QL, which allows negation in both the bodies and the heads of rules. 4QL uses a four-valued semantics: true, false, inconsistent and unknown. This semantics provides the means for a uniform treatment of the so-called closed world assumption (CWA). In addition, the language can represent other formalisms, such as several variants of default reasoning, autoepistemic reasoning and other formalisms for disambiguating inconsistent information.

The logic query language LogiQL [41] is a declarative programming language derived from Datalog and developed by LogicBlox Inc. for its LogicBlox database engine. It has been built with efficient techniques for query evaluation, concurrency control, network optimization and program analysis, as well as for declarative and reactive programming models.

The bddbddb system [65] (BDD-Based Deductive DataBase) is an implementation of Datalog that represents information using binary decision diagrams. These diagrams are data structures that can compactly represent relations holding large amounts of data and that offer a very efficient set of operations. This allows bddbddb to represent and operate on relations containing an extremely large number of tuples.

A comparative table of several systems can be found in [89]; we reproduce it in figure 1.1, extended with current systems. We have also included our system HH:(C) in the table. The parameters we compare are:

1. Recursion (Rec.). Many of the systems allow general recursion. Some, however, limit recursion to a number of restricted cases related to graph search.

2. Negation. Most systems allow negation in rule bodies. When this is the case there is usually more than one minimal fixpoint, and the system must select one of them according to the intended model.

3. Aggregation. A problem similar to that of negation arises with aggregation (sum, average, etc.). It gives rise to more than one minimal model, among which we must discriminate.

Expressiveness and models for negation in deductive databases

Regarding the semantic foundations of other proposals for incorporating negation, we highlight:

The Stable Models approach of Gelfond and Lifschitz [37]. This is a logic-based declarative semantics for programs with negation.

The Well-Founded Semantics of Van Gelder et al. [124]. The main idea in this approach is that of an unfounded set, which is used to formalize negation.

If we compare the expressiveness of our proposal with that of these models for negation, we should point out that both of the above approaches work with ground instances in clause bodies. In contrast, the constraint systems of HH:(C) allow intensional representation and more general answers than would be possible if we restricted ourselves to equality constraints on variables. For example, constraints over the reals make it possible to represent possibly infinite data, which cannot be represented directly with ground atoms. It should also be noted that both stable models and the well-founded semantics admit databases that would not be stratifiable in our approach. That is, the use of stratification limits the databases that can be represented in the language. However, given the resources of HH:(C) (constraints, quantifiers and implication), when a database is not stratifiable it is in some cases possible to find an equivalent database in our language (see example 1 of [A.2]).

Name         Developed at       Ref.    Rec.  Negation               Aggregation
Aditi        U. Melbourne       [123]   Yes   Stratified             Stratified
bddbddb      U. Stanford        [65]    Yes   Stratified             No
ConceptBase  U. Aachen          [56]    Yes   Locally stratified     No
CORAL        U. Wisconsin       [87]    Yes   Modularly stratified   Modularly stratified
DES          U. Complutense     [103]   Yes   Stratified             Stratified
EKS          ECRC               [67]    Yes   Stratified             Stratified
HH:(C)       U. Complutense     [A.3]   Yes   Stratified             Stratified
Inter4QL     U. Warsaw          [71]    Yes   Multivalued semantics  No
LDL          MCC                [23]    Yes   Stratified             Stratified
LDL++                                         Restricted             Restricted
LogicBlox    LogicBlox Inc.     [41]    Yes   Multivalued semantics  Partial
XSB          SUNY Stony Brook   [104]   Yes   Well-founded           Modularly stratified

Figure 1.1: Comparison of implementations of deductive database systems.

We conclude by noting that answer set programming [69] is another approach to declarative programming that incorporates negation and is well suited to search problems of combinatorial difficulty. It is based on stable models and uses solvers as its computational inference mechanism. This proposal includes constraints that are used to handle integrity constraints, to discard models and to obtain an answer. In HH:(C), by contrast, constraints are part of the answer. Moreover, this approach does not allow the handling of implication that our system provides.

1.5.2. Constraint databases

One advantage of using constraints in the context of logic programming is their ability to handle infinite data through finite representations. Constraint databases [64] inherit this feature.

Research on constraint databases began with the goal of defining a database-oriented version of CLP. The first objective was to use bottom-up techniques to process Datalog rules that also involve constraints. This made it possible to use CLP for applications in which data can be represented as sets of constraints (for example, spatial data). As research progressed, it turned out that the problem of supporting recursion in the presence of constraints was not being solved satisfactorily. Progress in this field therefore concentrated mainly on the non-recursive case. Non-recursive query languages with constraints led to the study of interesting, non-trivial problems and are used in various fields [29].

The idea of using constraints to represent objectives had been discussed in mathematics (by Whitney [128], for example) and in operations research (by Dantzig [81]), and had been used in some spatial database management applications (most notably CAD/CAM [106]).

In the database field, Kanellakis et al. [58] were the first to define a systematic framework for using constraints as a model for complex data and for query languages over such data.

Algebras and calculi have been defined for specific applications along the lines of the relational model. These systems are used to provide concrete models for temporal data, called events, that occur at regular intervals. This model is described in [57, 9] and in chapter 13 of [64].

A similar model is the proposal of [46] for representing spatial information in a GIS (Geographic Information System) as the intersection of half-planes. The language of [46] is a particular case of the query language with linear constraints and without projection. Another proposal in the same direction appears in [47], which seeks to include domain-dependent information in the database.

MLPQ (Management of Linear Programming Queries) is a database system with linear constraints, developed at the University of Nebraska-Lincoln. The first version was presented in [97] and includes SQL queries and linear programming as basic functions for implementing aggregates (such as maximum and minimum). The second version was presented in [59] and includes both recursive and non-recursive Datalog queries and a graphical user interface with two-dimensional spatial operators. MLPQ has two major application areas: operations research and the handling of spatial and spatio-temporal data.

DISCO (Datalog with Integer Set of COnstraints) is a database system that implements Datalog with Boolean constraints over integers or sets of integer types. It was developed at the University of Nebraska. The first version of the system was presented by Revesz in [18]. The second version of DISCO is described in [105] and includes Boolean equality and inequality constraints over sets of integers. In general, Boolean equality and inequality cannot be used together in a single constraint.

El prototipo DEDALE es una de las primeras implementaciones de un sistema de bases dedatos basado en un sistema de restricciones lineales. Es un proyecto de INRIA con el grupoVERSO y el grupo VERTIGO. El prototipo DEDALE se utiliza para aplicaciones geométricasen diversas áreas como GIS o bases de datos espacio temporales. El sistema se describe en[45] y su modelo de datos en [44].

Otras aplicaciones de las bases de datos con restricciones son:

En visión por computador, para la indexación de bases de datos de imagenes en funciónde su forma y contorno [53, 43].

In bioinformatics, the development of an automaton for decoding the human genome, one of the most important problems in the field, which has been addressed using the LDL system [94].

In environmental modelling, the development of thermal maps used to prevent the spread of fire [48].

1.5.3. Deductive databases with hypothetical reasoning

One of the main contributions of this work is the incorporation of hypothetical reasoning by means of nested implications. This feature is unusual in database languages, although there is related work in logic programming (LP); see, for example, [72, 66, 4].

One of the most important works in the field of deductive databases is the contribution of Anthony J. Bonner [13, 14]. Bonner's approach extends Horn clause logic so that the language admits hypothetical queries, in a way similar to our approach. In Bonner's work, tuples can be temporarily added to or deleted from the database. In both cases, addition ($A \leftarrow B[\mathit{add}{:}\,C]$) and deletion ($A \leftarrow B[\mathit{del}{:}\,C]$), the atomic term $C$ is added or deleted when a query $A$ is posed to the extensional database $B$. His approach has several limitations with respect to HH¬(C):

HH¬(C) allows clauses, not just atoms, as antecedents of hypothetical queries, which makes it possible to change dynamically the intensional database and not only the extensional one.

The database language presented in this work combines hypothetical reasoning with constraints and new connectives. Handling all these features together gives our approach an expressiveness that Bonner's proposal lacks.

A recent system that implements hypothetical Datalog based on tabling techniques is [102]. That work implements a calculus of hypothetical queries in which both positive ground atoms $A$ and negative ones $\mathit{not}(A)$ may appear in the body of a query. Queries are processed as follows:

If a query on a relation is posed for the first time, a new entry is added to the answer table. The query is decomposed to check whether any subset of it is already present in the answer table, and a most general substitution is built, which remains in the table for future queries.

If a query already present in the answer table is posed, the system answers immediately by returning the value stored in the table.

For a query, or subquery, with a negated atom $\mathit{not}(A)$, the system looks for a substitution that satisfies $A$. If none is found, the negative query is proved.

This implementation relies on the closed world assumption of [122] and implements the ideas of Bonner [15] by means of tabling techniques [108].


1.5.4. Relational databases, use of recursion and hypothetical reasoning

A relational database (RDB) is a database that follows the relational model, currently the most widely used model for implementing databases. It makes it possible to establish relationships among data (stored in tables) and, through those relationships, to connect the data of the corresponding tables; hence the name of the model. After its foundations were laid in 1970 by Edgar Frank Codd, of the IBM laboratories in San José (California), it soon became established as a new paradigm among database models.

The SQL-99 standard [36] includes a syntax for defining recursive views (see Chapter 9 of [73], Chapter 4 of [110] and Chapter 10 of [36]). To formulate a recursive view, the reserved words WITH RECURSIVE must be used explicitly, which prevents the accidental creation of recursive views that may lead to non-terminating computations (although not every RDBMS imposes this requirement). Temporary views of this kind can be formulated in many current RDBMSs, such as PostgreSQL, DB2 and Oracle. MySQL and Microsoft Access, on the other hand, do not allow any kind of recursive definition.

As noted in the first subsection of this chapter, defining recursive views in SQL is subject to several limitations [102]:

UNION ALL must be used to combine the base case and the recursive case, so that duplicates are not discarded.

When the transitive closure of a graph is defined, there is no check on whether the newly added tuples already belong to the recursive definition, which leads to non-termination in some cases.

A relation may not be called more than once in the recursive definition, i.e., recursion is limited to the linear case.

Mutual recursion is not allowed.
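To make the shape of a linear recursive view concrete, the following minimal sketch computes a transitive closure with WITH RECURSIVE. It uses SQLite through Python's standard `sqlite3` module purely for illustration; SQLite is not one of the systems discussed above, and the table and column names (`edge`, `path`, `src`, `dst`) are assumptions of this sketch. SQLite accepts UNION in the recursive view, so duplicates are discarded and the query terminates even on a cyclic graph; with UNION ALL the same query over this cyclic graph would not terminate.

```python
import sqlite3


def transitive_closure(edges):
    """Compute the transitive closure of a directed graph with a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge(src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE path(src, dst) AS (
            SELECT src, dst FROM edge          -- base case
            UNION                              -- set semantics: duplicates dropped
            SELECT path.src, edge.dst          -- linear recursive case
            FROM path JOIN edge ON path.dst = edge.src
        )
        SELECT src, dst FROM path ORDER BY src, dst
        """
    ).fetchall()
    conn.close()
    return rows


# A cyclic graph: a -> b -> c -> a
closure = transitive_closure([("a", "b"), ("b", "c"), ("c", "a")])
print(closure)
```

Note that the recursive case refers to `path` only once, which is the linear form of recursion mentioned in the list above.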

In the literature on the semantics of SQL databases we have not found a formalization that combines recursion and hypothetical queries in the way proposed in this work. There is, however, some work on hypothetical reasoning in RDBs, which we introduce next. These works can be regarded as the first to address the inclusion of hypothetical information in a relational database.

In [112] a limited form of hypothetical queries is allowed. Assumptions are computed by means of the replacement operator, which keeps the assumed information until the query finishes. That work addresses two kinds of hypothetical reasoning with two approaches:

A number (possibly zero) of tuples can be added (APPEND), updated (RETRIEVE) or deleted (DELETE) in a database.

The concept of expert is introduced, which makes it possible to specify relation definitions that depend on a decision to be taken later.

This work does not include recursion and has not been implemented in a concrete system. In [42], relational algebra (RA) is extended to deal with hypothetical queries (of the form $Q$ when $\{\{U\}\}$) by means of database updates, but without recursion and without a concrete implementation. The result is the value of the query $Q$ on the database $DB$ after executing $U$.

The main differences between these works and the HR-SQL approach are the following:

HR-SQL allows a more general use of recursion, admitting non-linear and mutually recursive definitions.

Our approach allows assumptions in queries and views to involve not only tuples but also intensional relations of the database.

Our semantic developments have served as the basis for a concrete system, which moreover integrates with current database systems as an additional layer extending the RDBMS.

To conclude, we highlight again the educational system DES [103], since it also supports SQL and hypotheses in queries and views, as HR-SQL does. DES accepts the same syntax as HR-SQL (including the reserved word assume for introducing hypotheses). It currently runs on both SWI-Prolog and SICStus Prolog, although its behaviour is independent of the underlying Prolog. Hypothetical SQL in DES was initially based on the work of [42], i.e., the relations affected by the hypotheses were modified before a query and the original relations were restored once the result was returned. The current implementation builds on the work of Bonner [13], and its deductive inference engine translates SQL into hypothetical Datalog by means of tabling techniques [108]. This approach achieves an expressiveness similar to that of HR-SQL, but its semantic foundations are different and the resulting implementations rely on different techniques. In particular, HR-SQL implements a stratified fixpoint semantics following the HH¬(C) approach, which is closer to that of [122, 74].

With this system we conclude the review of the state of the art, in which we have compared the capabilities of our systems with those of other database systems. We continue by presenting the HH¬(C) framework in the next chapter.


Chapter 2

Negation, hypotheses and quantifiers in constraint deductive databases

In this chapter we present the theoretical foundations and the implementation of the database scheme HH¬(C). The scheme is based on the logic HH (hereditary Harrop formulas), a language richer than Horn clauses, since it includes quantifiers in goals as well as implication and disjunction. The database language HH¬(C) combines hypothetical queries (derived from implication), negation, constraints and also new quantifiers that do not appear in other deductive database systems. To give the language a semantics we define a proof calculus, as well as a fixpoint semantics that extends stratification techniques typical of deductive databases. We also prove that this operational semantics is sound and complete with respect to the calculus. The fixpoint semantics guides the implementation of a system that we have developed in SWI-Prolog. Throughout the chapter we present three instances of the constraint system with their corresponding solvers: Booleans, finite domains and reals. We have also integrated into the system aggregate functions and integrity constraints, which are common in relational database systems.

2.1. Introduction

In this chapter we summarize the research that appears in the publications [A.1, A.2, A.3]. In these publications we present, first of all, the database scheme HH¬(C) and its advantages over other deductive database systems. This summary also covers the implementation of the system based on the theoretical scheme. As implementation language we have used SWI-Prolog [129], and we have adapted its constraint solvers to fit the instances of the constraint system of the scheme.


Publications

We now review the contents related to HH¬(C) that can be found in each of the publications this chapter refers to:

Most of the contents of the chapter correspond to the material published in [A.3], which describes both the theoretical foundations of HH¬(C) and its implementation. This publication also explains how aggregate functions are incorporated into the system. The computation of aggregate functions relies on stratification, which is initially used to incorporate negation into deductive databases, as shown in Section 2.3.4.

Chronologically, the first published article of this thesis is [A.1]. It describes the system that implements the HH¬(C) scheme. It covers all the features of the system, a summary of the theoretical results on the semantics that gives meaning to the databases of the language, and a description of the solvers that implement the concrete constraint systems: Booleans, finite domains and reals. This article presents the core of the system, which at that time lacked aggregate functions and integrity constraints.

Finally, in [A.2] we address the incorporation of integrity constraints into the system. We have taken advantage of our framework of constraint deductive databases (specifically, the stratification techniques based on building a dependency graph associated with a database) to add this functionality. Given the expressiveness of our language, our definition of integrity constraints is very intuitive and also guarantees correct behaviour in the presence of local computations derived from the hypothetical calculus (see Section 2.3.5).

Contributions

Throughout the chapter we present the new approach that HH¬(C) represents as a constraint deductive database (DDB) language, together with its contributions to this field. We summarize these contributions below:

HH¬(C) extends the logic underlying the database language. In its early days, HH(C) [66, 35] lacked negation. It did, however, provide expressive resources such as the universal quantifier and implication, which other constraint logic programming languages do not have [52]. Just as Datalog arises from Prolog, HH¬(C) arises from HH(C) by incorporating negation so that it can be applied to the field of DDBs. Being based on an extended logic programming language, it brings to the database field the aforementioned capabilities inherited from the original language HH: greater expressiveness, new connectives and the possibility of representing infinite data. Moreover, since we have incorporated negation into HH¬(C), the language is complete with respect to the relational algebra on which relational databases are founded.

The advantage of using constraints. Our scheme uses constraints, which make it possible to represent infinite data and bring great expressiveness and simplicity to the development of databases in which information can be defined intensionally. In addition, the constraint language of the HH¬(C) scheme is more expressive than the usual one in constraint databases.


Among the contributions of our database language, the most novel is the possibility of defining hypothetical queries (called what-if queries in the relational model), which are very useful for making decisions based on speculative data. With respect to the work on hypothetical Datalog [15, 16, 13], it also adds the possibility of using quantifiers in queries and the ability to include constraints on the variables of the clauses. The result is a language with greater expressiveness, able to represent new databases and queries that cannot be defined in other systems.

A theoretical framework has been developed for HH¬(C). We present its proof semantics and its stratified fixpoint semantics, which serves as operational semantics. In addition, in [A.3] we prove the equivalence between the two.

The scheme serves as the basis for a real implementation. In the last section of this chapter we show the practical usefulness of the system based on HH¬(C), implemented in SWI-Prolog, and we present real examples processed by this system.

The database language HH¬(C) and the system that implements it are the results of this thesis. In the following sections we develop each of the features of HH¬(C) that we have introduced.

The language HH¬(C)

As already noted, HH(C) was presented as a logic programming language. Later, negation was introduced, resulting in HH¬(C), and it was shown that it could be very useful as a database language thanks to some features that we detail throughout the chapter.

We begin with some general notions about the syntax and the intended semantics of HH¬(C).

We first introduce the syntax. To build the elements of the language HH¬(C), we need to consider a countable set of variables and a signature that must contain:

defined predicate symbols, used to build atoms, which represent the names of the database relations,

predefined predicate symbols, including at least the comparison symbol $\leq$ and an equality predicate symbol $\approx$, used to build atomic constraints, and

constants and operation symbols, which depend on a concrete constraint system, used to build terms (together with the variables).

Formally, the well-formed formulas of HH¬(C) can be classified into clauses $D$ (which define the database relations) and goals $G$ (which define the queries to the database). They are defined recursively by the following rules:

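$$D ::= A \mid G \Rightarrow A \mid D_1 \wedge D_2 \mid \forall x\, D$$

$$G ::= A \mid \neg A \mid C \mid G_1 \wedge G_2 \mid G_1 \vee G_2 \mid D \Rightarrow G \mid C \Rightarrow G \mid \exists x\, G \mid \forall x\, G$$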

where $A$ is an atom, i.e., a formula of the form $p(t_1, \ldots, t_n)$, $p$ is a defined predicate symbol of arity $n$, and $t_1, \ldots, t_n$ are terms. Terms are built from a set of constant and operation symbols and a set of variables.

We write $C$ for a constraint. We will see how constraints are built later, when we specify their syntax and the conditions that a constraint system must satisfy. Note that negation is not allowed in the head of a clause, only in its body.
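The grammar can be read as a pair of mutually recursive definitions. The following sketch encodes formulas as a small abstract syntax tree and checks whether a formula is a clause $D$ or a goal $G$. All class and function names (`Atom`, `Constraint`, `is_clause`, `is_goal`, and so on) are hypothetical, chosen for this illustration; they are not part of the system described in the thesis, and constraints are kept as opaque strings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Constraint:
    text: str  # opaque constraint, e.g. "x >= 1"


@dataclass(frozen=True)
class Not:
    body: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Implies:
    antecedent: object
    consequent: object


@dataclass(frozen=True)
class Forall:
    var: str
    body: object


@dataclass(frozen=True)
class Exists:
    var: str
    body: object


def is_clause(f) -> bool:
    """D ::= A | G => A | D1 /\\ D2 | forall x D"""
    if isinstance(f, Atom):
        return True
    if isinstance(f, Implies):
        return is_goal(f.antecedent) and isinstance(f.consequent, Atom)
    if isinstance(f, And):
        return is_clause(f.left) and is_clause(f.right)
    if isinstance(f, Forall):
        return is_clause(f.body)
    return False


def is_goal(f) -> bool:
    """G ::= A | not A | C | G1 /\\ G2 | G1 \\/ G2 | D => G | C => G | exists x G | forall x G"""
    if isinstance(f, (Atom, Constraint)):
        return True
    if isinstance(f, Not):
        return isinstance(f.body, Atom)
    if isinstance(f, (And, Or)):
        return is_goal(f.left) and is_goal(f.right)
    if isinstance(f, Implies):
        ante_ok = is_clause(f.antecedent) or isinstance(f.antecedent, Constraint)
        return ante_ok and is_goal(f.consequent)
    if isinstance(f, (Forall, Exists)):
        return is_goal(f.body)
    return False


# A clause with a negated atom in its body is accepted ...
ok_clause = Forall("x", Implies(And(Atom("p", ("x",)), Not(Atom("q", ("x",)))),
                                Atom("r", ("x",))))
# ... but negation in the head is rejected.
bad_clause = Implies(Atom("p", ("x",)), Not(Atom("q", ("x",))))
# A hypothetical goal: a clause as antecedent of an implication.
hypothetical_goal = Implies(ok_clause, Exists("y", Atom("r", ("y",))))
print(is_clause(ok_clause), is_clause(bad_clause), is_goal(hypothetical_goal))
```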

Constraints in the HH¬(C) scheme

In this section we first present the constraint system of HH¬(C) and then the advantages of its use in logic programming and databases.

The constraints we use belong to a generic system

$$\mathcal{C} = \langle \mathcal{L}_{\mathcal{C}}, \vdash_{\mathcal{C}} \rangle,$$

where $\mathcal{L}_{\mathcal{C}}$ is the constraint language and $\vdash_{\mathcal{C}}$ is a deducibility relation. $\Gamma \vdash_{\mathcal{C}} C$ expresses that the constraint $C$ is inferred in the constraint system, by means of the relation $\vdash_{\mathcal{C}}$, from the set of constraints $\Gamma$. We impose some minimal conditions on $\mathcal{C}$ for it to be a valid constraint system:

$\mathcal{L}_{\mathcal{C}}$ must contain, at least, every first-order formula built using:

› $\top$ (true), $\bot$ (false),

› predefined predicate symbols,

› the connectives $\wedge$, $\neg$, and the existential quantifier $\exists$.

With respect to $\vdash_{\mathcal{C}}$:

› It must include the inference rules of intuitionistic logic for the connectives and quantifiers mentioned above.

› It must be compact, i.e., $\Gamma \vdash_{\mathcal{C}} C$ implies that there exists a finite set $\Gamma_0 \subseteq \Gamma$ such that $\Gamma_0 \vdash_{\mathcal{C}} C$.

› It must be closed under substitution, i.e., $\Gamma \vdash_{\mathcal{C}} C$ implies $\Gamma\sigma \vdash_{\mathcal{C}} C\sigma$ for every substitution $\sigma$.

Incorporating negation (the connective $\neg$) into the language HH requires that negation be incorporated into the constraint system $\mathcal{C}$ as well. We say that a constraint $C$ is $\mathcal{C}$-satisfiable if $\emptyset \vdash_{\mathcal{C}} \exists C$, where $\exists C$ denotes the existential closure of $C$. $C$ and $C'$ are $\mathcal{C}$-equivalent if $C \vdash_{\mathcal{C}} C'$ and $C' \vdash_{\mathcal{C}} C$.

In the examples, beyond the minimal conditions, we use systems that incorporate other connectives such as $\vee$, constants, arithmetic operators and more predefined predicates ($>$, $\geq$, ...). In particular, for the constraint system over the reals $\mathcal{R}$, $\mathcal{L}_{\mathcal{R}}$ is the first-order language with all the usual connectives, including negation. We define $\Gamma \vdash_{\mathcal{R}} C$ when $Ax_{\mathcal{R}} \cup \Gamma \vdash_{\approx} C$, where $Ax_{\mathcal{R}}$ is Tarski's axiomatization of the real numbers [117] and $\vdash_{\approx}$ is the deducibility relation of classical logic with equality. A concrete example of a constraint in this system is $\neg(x \approx 0.3)$, which we abbreviate as $x \not\approx 0.3$.

The advantage of using constraints in the context of logic programming is that they provide a natural way of dealing with infinite sets of data by means of finite representations. Constraint databases [64] inherit this capability, as we will see in Example 1.


In the examples that follow, we use the instance HH¬(R) when only real data are involved, and HH¬(FR) when real values (R) are used together with finite-domain values (F). More information about this hybrid system can be found in [34].

Example 1 In this example we use the instance HH¬(R) to describe regions of the plane. A region is identified by its characteristic function (a Boolean function that maps the points of the region to true and the remaining points of the plane to false). For instance, a rectangle is determined by its lower-left corner $(x_1, y_1)$ and its upper-right corner $(x_2, y_2)$. Its characteristic function can be expressed by the following clause:

$$\forall\; \mathit{rectangle}(x_1, y_1, x_2, y_2, x, y) \Leftarrow x \geq x_1 \wedge x \leq x_2 \wedge y \geq y_1 \wedge y \leq y_2.$$

By using constraints, a rectangle, which is an infinite set of points, is represented finitely. From the point of view of database expressiveness, this is a very useful feature, since it allows us to define formulas more complex than the usual ones. Although databases were conceived to deal with finite data, constraints extend their capabilities to (potentially) infinite sets of data. This is one of the contributions of HH¬(C).

The goal $\mathit{rectangle}(0, 0, 4, 4, x, y) \wedge \mathit{rectangle}(1, 1, 5, 5, x, y)$ represents the intersection of two rectangles, and its answer is represented by a constraint:

$$(x \geq 1) \wedge (x \leq 4) \wedge (y \geq 1) \wedge (y \leq 4)$$
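The answer constraint for the rectangle intersection can be reproduced with a few lines of interval arithmetic. This is only an illustrative sketch: the class `Rect` and the function `intersect` are hypothetical names introduced here, and the code handles just axis-aligned rectangles rather than a general constraint solver over the reals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: x1 <= x <= x2 and y1 <= y <= y2."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        # Characteristic function of the region.
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Conjunction of the two constraints; None if it is unsatisfiable."""
    r = Rect(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2))
    return r if r.x1 <= r.x2 and r.y1 <= r.y2 else None


answer = intersect(Rect(0, 0, 4, 4), Rect(1, 1, 5, 5))
print(answer)  # x >= 1, x <= 4, y >= 1, y <= 4
```

The finite description `Rect(1, 1, 4, 4)` stands for infinitely many points, which is the point made in the example.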

A circle, on the other hand, can be represented by its centre and its radius, in this case by means of non-linear constraints:

$$\forall\; \mathit{circle}(x_c, y_c, r, x, y) \Leftarrow (x - x_c)^2 + (y - y_c)^2 \leq r^2.$$

With this database we can ask whether a point $(x, y)$ satisfying $x^2 + y^2 = 1$ (the circumference centred at the origin with radius 1) lies inside the circle with centre $(0, 0)$ and radius 2:

$$\forall\, (x^2 + y^2 \approx 1 \Rightarrow \mathit{circle}(0, 0, 2, x, y))$$

In this case the answer is true ($\top$), since every point with $x^2 + y^2 \approx 1$ satisfies $x^2 + y^2 \leq 2^2$. This example cannot be expressed in other deductive databases since, besides constraints, it includes universal quantification and implication. Neither hypothetical Datalog [13] nor Datalog with constraints [95] can deal with this example, because it uses the universal quantifier. □

Now that we have all the elements that characterize it, we present HH¬(C) as a database language.

HH¬(C) as a database language

In the definition of HH¬(C) databases we can distinguish between the facts (ground atoms), which define the extensional part of the database, and the clauses with head and body, which we call the intensional part of the database. The intensional part corresponds to the views of relational databases, and the extensional part to the tuples of a relation.


We also establish a correspondence between the goals of logic languages and the queries to the database. In our proposal, the answer to a query is a constraint that represents the tuples or values that make the query true. A database, denoted by $\Delta$, is a set of clauses.

First, we show how the operators of relational algebra (RA) can be expressed as predicates in our database language. Figure 2.1 shows how to express projection, selection, Cartesian product, union and difference in HH¬(C).

› Projection. $E = \pi_{i_1, \ldots, i_k}(E_1)$

$$\forall\; e(x_{i_1}, \ldots, x_{i_k}) \Leftarrow e_1(x_1, \ldots, x_n).$$

› Selection. $E = \sigma_{t_1 \theta t_2}(E_1)$

$$\forall\; e(x_1, \ldots, x_n) \Leftarrow e_1(x_1, \ldots, x_n) \wedge C_\theta.$$

› Cartesian product. $E = E_1 \times E_2$

$$\forall\; e(x_1, \ldots, x_n, x_{n+1}, \ldots, x_m) \Leftarrow e_1(x_1, \ldots, x_n) \wedge e_2(x_{n+1}, \ldots, x_m).$$

› Union. $E = E_1 \cup E_2$

$$\forall\; e(x_1, \ldots, x_n) \Leftarrow e_1(x_1, \ldots, x_n) \vee e_2(x_1, \ldots, x_n).$$

› Difference. $E = E_1 - E_2$

$$\forall\; e(x_1, \ldots, x_n) \Leftarrow e_1(x_1, \ldots, x_n) \wedge \neg e_2(x_1, \ldots, x_n).$$

$E$ and $E_i$ are relational expressions, represented by the predicates $e$ and $e_i$ respectively. $C_\theta$ is the constraint corresponding to the condition $t_1\, \theta\, t_2$.

Figure 2.1: Relational operators and their corresponding HH¬(C) predicates.

Selection requires the constraint system $\mathcal{C}$ to include the operator $\theta$, so that the corresponding constraint can be built. For instance, $\sigma_{\$i \leq \$j}$ (which selects the tuples whose $i$-th component is less than or equal to the $j$-th) corresponds to $x_i \leq x_j$. As mentioned in the previous section, any constraint system $\mathcal{C}$ must include at least $\approx$ and $\leq$ in order to form a valid instance of HH¬(C).

Example 2 As an example of the use of negation, and building on Example 1, we define the shaded region of Figure 2.2 as the result of subtracting the inner rectangle from the area of the outer rectangle, by means of the goal:

$$\mathit{rectangle}(0, 0, 4, 4, x, y) \wedge \neg\mathit{rectangle}(1, 1, 3, 3, x, y).$$

Figure 2.2: Regions of the plane (outer rectangle with corners (0,0) and (4,4); inner rectangle with corners (1,1) and (3,3)).

The answer is expressed by the constraint:

$$(y > 3 \wedge y \leq 4 \wedge x \geq 0 \wedge x \leq 4) \vee (y \geq 0 \wedge y < 1 \wedge x \geq 0 \wedge x \leq 4) \vee (y \geq 0 \wedge y \leq 4 \wedge x > 3 \wedge x \leq 4) \vee (y \geq 0 \wedge y \leq 4 \wedge x \geq 0 \wedge x < 1)$$

With this constraint we express the data of the requested area (an infinite set of points of the plane) by means of a finite representation. □
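As a sanity check, the goal of Example 2 and its answer constraint can be compared point by point on a sample grid. The sketch below is a plain Python illustration under the assumption that a finite grid is a reasonable spot check; it is not how the system computes answers, and the function names `goal` and `answer` are hypothetical.

```python
def rectangle(x1, y1, x2, y2, x, y):
    """Characteristic function of the rectangle from Example 1."""
    return x1 <= x <= x2 and y1 <= y <= y2


def goal(x, y):
    """rectangle(0,0,4,4,x,y) and not rectangle(1,1,3,3,x,y)."""
    return rectangle(0, 0, 4, 4, x, y) and not rectangle(1, 1, 3, 3, x, y)


def answer(x, y):
    """The four-way disjunction given as answer constraint."""
    return (
        (3 < y <= 4 and 0 <= x <= 4)
        or (0 <= y < 1 and 0 <= x <= 4)
        or (0 <= y <= 4 and 3 < x <= 4)
        or (0 <= y <= 4 and 0 <= x < 1)
    )


# Sample points from -1.0 to 5.0 in steps of 0.25 (exactly representable floats).
grid = [k / 4 for k in range(-4, 21)]
mismatches = [(x, y) for x in grid for y in grid if goal(x, y) != answer(x, y)]
print(len(mismatches))
```

An empty list of mismatches on the grid is consistent with the two formulations describing the same region; it does not replace the symbolic proof.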

The relational model works with finite data, which can be defined directly as database tables and indirectly by means of views. A relation consists of a name and a fixed number of arguments (its arity). The meaning of a relation is a set of tuples.

In HH¬(C) a predicate also has a name and an arity. Its meaning is a constraint over its arguments.

To establish the correspondence between relational databases and HH¬(C) databases, we introduce an example that shows how to express the same relations in both approaches, relational and deductive respectively.

Example 3 In Figure 2.3 we define some relations extensionally (client and mortgageQuote) and another one intensionally (accounting), using the RA operators and also as HH¬(C) predicates.

(a) Relations defined extensionally as tables:

client
name        balance   salary
smith       2000      1200
brown       1000      1500
mcandrew    5300      3000

mortgageQuote
name        quote
brown       400
mcandrew    100

(b) Example of a relation defined using the RA operators:

$$\mathit{accounting} \equiv \pi_{\mathit{name}, \mathit{salary}, \mathit{quote}}(\sigma_{\mathit{quote} \geq 100}(\mathit{client} \bowtie \mathit{mortgageQuote}))$$

(c) The same three relations in the database language HH¬(C):

client(smith, 2000, 1200).
client(brown, 1000, 1500).
client(mcandrew, 5300, 3000).

mortgageQuote(brown, 400).
mortgageQuote(mcandrew, 100).

$$\forall \mathit{name}\; \forall \mathit{salary}\; \forall \mathit{quote}\; \forall \mathit{balance}\; (\mathit{accounting}(\mathit{name}, \mathit{salary}, \mathit{quote}) \Leftarrow \mathit{client}(\mathit{name}, \mathit{balance}, \mathit{salary}) \wedge \mathit{mortgageQuote}(\mathit{name}, \mathit{quote}) \wedge \mathit{quote} \geq 100).$$

Figure 2.3: RA relations and HH¬(C) predicates

Extensional relations are defined as tables in the relational model (a) and as extensional predicates in our database language (the facts in (c)). The relation accounting is defined using operators in the relational model (b) and as an intensional predicate in HH¬(C). The example shows the correspondence between the relational operators and the clauses of our language (c).

In RA, the result of computing the view accounting is the following relation:

accounting
name        salary    quote
brown       1500      400
mcandrew    3000      100

In HH¬(C) this view corresponds to the query $\mathit{accounting}(n, s, q)$, whose answer is:

$$(n \approx \mathit{brown} \wedge s \approx 1500 \wedge q \approx 400) \vee (n \approx \mathit{mcandrew} \wedge s \approx 3000 \wedge q \approx 100).$$

A reader familiar with deductive database models may argue that our example can be easily translated to Datalog with constraints [58]. The purpose of this example is to introduce the correspondence between RA and HH¬(C). We extend it later, in Example 6, to show the new functionalities that our language provides. □
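The accounting view of Example 3 can be mimicked in a few lines to see the join, selection and projection at work. This is an illustrative sketch with plain Python lists standing in for the extensional relations; the names `client`, `mortgage_quote` and `accounting` mirror the example, and the list-based encoding is an assumption of this sketch.

```python
# Extensional relations, as in part (a) of Figure 2.3.
client = [
    ("smith", 2000, 1200),
    ("brown", 1000, 1500),
    ("mcandrew", 5300, 3000),
]
mortgage_quote = [
    ("brown", 400),
    ("mcandrew", 100),
]


def accounting():
    """Natural join on name, selection quote >= 100, projection on (name, salary, quote)."""
    return [
        (name, salary, quote)
        for (name, _balance, salary) in client
        for (other, quote) in mortgage_quote
        if name == other and quote >= 100
    ]


for row in accounting():
    print(row)
```

Each resulting tuple corresponds to one disjunct of the answer constraint for the query on accounting.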

Hypothetical queries

One of the main contributions of HH¬(C) is the ability to formulate hypothetical queries. In deductive databases, the main references on computing hypothetical queries are the work of Bonner [13] and the recent implementation of hypothetical Datalog [102]. Both allow facts to be temporarily added to the database.

We present a simple example showing that our language is at least as expressive as those approaches, i.e., an example in which tuples are temporarily added to the database in the context of a query.

Example 4 Assume a database with information about the railway tracks built between different points of the map, as shown in Figure 2.4.

railway(madrid, talavera).
railway(talavera, navalmoral).
railway(navalmoral, caceres).
railway(caceres, badajoz).

The following clause adds all the points that can be connected by railway tracks, as the transitive closure of the predicate railway.

$$\forall\; \mathit{railway}(x, y) \Leftarrow \mathit{railway}(x, z) \wedge \mathit{railway}(z, y).$$

Now suppose that the predicate $\mathit{station}(x)$ defines which of these points have stations where a train can be boarded. For instance, we have station(madrid) and station(caceres). Intuitively, it is possible to travel between two points whenever:

there is a station at the origin,

there is a station at the destination, and

there are tracks connecting these two stations.


Figure 2.4: Some points of the railway database.

This is written in HH¬(C) as:

$$\forall\; \mathit{travel}(x, y) \Leftarrow \mathit{station}(x) \wedge \mathit{railway}(x, y) \wedge \mathit{station}(y).$$

We can ask which stations would have to be built in our railway network in order to travel from Madrid to Talavera. To do so, we formulate the query:

$$\mathit{station}(x) \Rightarrow \mathit{travel}(\mathit{madrid}, \mathit{talavera})$$

The answer constraint is $x \approx \mathit{talavera}$. □
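The hypothetical query of Example 4 can be simulated by brute force: for each candidate city, temporarily add it as a station, evaluate travel, and discard the assumption afterwards. The sketch below is a naive illustration under that assumption, not the fixpoint procedure of the system; the helper names `closure`, `travel` and `hypothetical_stations` are hypothetical.

```python
RAILWAY = {
    ("madrid", "talavera"),
    ("talavera", "navalmoral"),
    ("navalmoral", "caceres"),
    ("caceres", "badajoz"),
}
STATION = {"madrid", "caceres"}


def closure(edges):
    """Transitive closure of railway, computed as a least fixpoint."""
    reach = set(edges)
    while True:
        new = {(x, y) for (x, z) in reach for (w, y) in reach if z == w} - reach
        if not new:
            return reach
        reach |= new


def travel(x, y, stations, railway):
    """travel(x, y) <= station(x), railway(x, y), station(y)."""
    return x in stations and y in stations and (x, y) in closure(railway)


def hypothetical_stations(origin, destination):
    """Values of x for which (station(x) => travel(origin, destination)) holds."""
    cities = {city for edge in RAILWAY for city in edge}
    return sorted(
        city
        for city in cities
        # The assumption station(city) is local to this evaluation.
        if travel(origin, destination, STATION | {city}, RAILWAY)
    )


print(hypothetical_stations("madrid", "talavera"))
```

The original `STATION` set is never modified, which mirrors the fact that hypotheses only hold within the query in which they are assumed.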

Expressiveness of HH¬(C)

Having introduced the syntax and the features of HH¬(C), we now present more examples that show the advantages of our database language over the usual ones. More specifically, we will see examples that combine:

the ability to formulate hypothetical queries,

the use of the existential and universal quantifiers,

the ability to provide results by means of constraints.

For the following examples we again use an instance of HH¬(C) that combines finite-domain and real constraints, i.e., HH¬(FR).

Example 5 We use the following database for an airline. The predicate $\mathit{flight}(\mathit{Origin}, \mathit{Destination}, \mathit{Time})$ represents the extensional database of direct flights from Origin to Destination with duration Time:

flight(mad, par, 1.5).
flight(par, ny, 10).
flight(london, ny, 9).


In addition, $\mathit{travel}(\mathit{Origin}, \mathit{Destination}, \mathit{Time})$ represents the intensional database. The idea behind this relation is that it is possible to travel from Origin to Destination within a time Time, possibly by concatenating more than one flight.

$$\forall\; \mathit{travel}(x, y, t) \Leftarrow \mathit{flight}(x, y, w) \wedge t \geq w.$$

$$\forall\; \mathit{travel}(x, y, t) \Leftarrow \mathit{flight}(x, z, t_1) \wedge \mathit{travel}(z, y, t_2) \wedge t \geq t_1 + t_2.$$

We begin with queries to the flight database. For instance, in HH¬(C) we can ask what the duration of a flight from Madrid to London should be in order to travel from Madrid to New York in at most 11 hours.

$$\mathit{flight}(\mathit{mad}, \mathit{london}, t) \Rightarrow \mathit{travel}(\mathit{mad}, \mathit{ny}, 11)$$

The answer is the constraint $11 \geq t + 9$, which is FR-equivalent to $t \leq 2$.

Another example of a hypothetical query is to ask whether it is possible to fly from Madrid to some destination in any time greater than 1.5 hours. The goal

$$\forall t\, (t > 1.5 \Rightarrow \exists y\; \mathit{travel}(\mathit{mad}, y, t))$$

is also an example of universal quantification and of the explicit use of the quantifier $\exists$ to avoid returning a concrete answer for $y$. The answer to this query is $\top$.
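The first answer, $t \leq 2$, can be checked numerically by adding a hypothetical Madrid to London flight of a given duration and computing the shortest total travel time to New York. The following sketch is an assumption-laden illustration: it uses a simple shortest-path relaxation rather than constraint solving, and the names `min_travel_time` and `can_travel_with` are hypothetical.

```python
import math

FLIGHTS = {
    ("mad", "par"): 1.5,
    ("par", "ny"): 10.0,
    ("london", "ny"): 9.0,
}


def min_travel_time(flights, origin, destination):
    """Smallest t with travel(origin, destination, t), via Bellman-Ford relaxation."""
    cities = {city for pair in flights for city in pair}
    best = {city: math.inf for city in cities}
    best[origin] = 0.0
    for _ in range(len(cities) - 1):
        for (a, b), duration in flights.items():
            if best[a] + duration < best[b]:
                best[b] = best[a] + duration
    return best[destination]


def can_travel_with(hypothetical_duration, budget=11.0):
    """Evaluate flight(mad, london, t) => travel(mad, ny, budget) for a concrete t."""
    flights = dict(FLIGHTS)  # the hypothesis is local: FLIGHTS is left untouched
    flights[("mad", "london")] = hypothetical_duration
    return min_travel_time(flights, "mad", "ny") <= budget


print(can_travel_with(2.0), can_travel_with(2.5))
```

Without the hypothetical flight the shortest route takes 1.5 + 10 = 11.5 hours, so the 11-hour budget can only be met through London, which requires t + 9 to be at most 11.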

Comparing HH¬(C) with the relational calculus, we can formulate the query

$$\neg(\exists t\; \mathit{flight}(x, y, t)) \wedge x \not\approx y$$

or its equivalent $(\forall t\; \neg\mathit{flight}(x, y, t)) \wedge x \not\approx y$, to determine the cities with no direct flights between them. This formula is not safe in the domain relational calculus [27], since it contains a negated formula whose free variables are not limited. In HH¬(C) this is delegated to the constraint system of the concrete instance, as we will see later in the theoretical sections.

In fact, $(\forall t\; \neg\mathit{flight}(x, y, t)) \wedge x \not\approx y$ is a valid query in HH¬(FR), whose answer is the constraint:

$$(x \not\approx \mathit{mad} \vee y \not\approx \mathit{par}) \wedge (x \not\approx \mathit{par} \vee y \not\approx \mathit{ny}) \wedge (x \not\approx \mathit{london} \vee y \not\approx \mathit{ny})$$

over the domain of cities in the database. Again, this is a query that cannot be formulated in Datalog, in this case because of the universal quantifier.

Consider the more realistic situation in which flights may be delayed. A delay of time $d$ on the flight between $x$ and $y$ (see footnote 1) is represented by the predicate delay(x, y, d). We now define the predicate itinerary to represent the possible trips in the database, taking delays into account through delay:

\[ \forall\, \mathit{itinerary}(x, y, t, 0) \Leftarrow \mathit{flight}(x, y, t) \wedge \neg\mathit{delay}(x, y, d). \]
\[ \forall\, \mathit{itinerary}(x, y, t, d) \Leftarrow \mathit{flight}(x, y, t_1) \wedge \mathit{delay}(x, y, d) \wedge t \geq t_1 + d. \]
\[ \forall\, \mathit{itinerary}(x, y, t, d) \Leftarrow \mathit{itinerary}(x, z, t_1, d_1) \wedge \mathit{itinerary}(z, y, t_2, d_2) \wedge t \geq t_1 + t_2 \wedge d = d_1 + d_2. \]

As in the travel relation, $t$ represents a value greater than or equal to the total duration of the itinerary, and $d$ the accumulated delay. The tuples of delay can be stored in the extensional database or can be assumed in a query, as for example in:

\[ \forall x\, (\mathit{delay}(par, x, 1) \wedge \mathit{delay}(mad, par, 0.5)) \Rightarrow \mathit{itinerary}(mad, ny, t, d), \]

which represents the time needed to fly from Madrid to New York, assuming that any flight from Paris is delayed by one hour and that, in addition, the flight from Madrid to Paris is delayed by half an hour.

Footnote 1: For simplicity, in this example we assume that there is at most one flight with the same origin and destination.

To solve this query, the clause

\[ \forall x\, (\mathit{delay}(par, x, 1) \wedge \mathit{delay}(mad, par, 0.5)) \]

is added locally to the database and discarded after the computation. □
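The idea of a hypothesis that lives only for the duration of one computation can be sketched as follows. This is a minimal illustration, not the mechanism of the system: the database is assumed to be a plain set of ground facts, the helper `assuming` is a hypothetical name, and the universally quantified clause of the example is replaced by a single ground delay fact.

```python
from contextlib import contextmanager

@contextmanager
def assuming(database, hypotheses):
    """Temporarily extend `database` (a set of ground facts) with `hypotheses`."""
    added = set(hypotheses) - database   # remember only what was really added
    database |= added
    try:
        yield database
    finally:
        database -= added                # discard the hypotheses after the computation

db = {("flight", "mad", "par", 1.5), ("flight", "par", "ny", 10)}
hypothesis = ("delay", "mad", "par", 0.5)

with assuming(db, {hypothesis}) as local_db:
    inside = hypothesis in local_db      # the fact is visible while the query is solved

after = hypothesis in db                 # and it is gone once the query has finished
```

Removing only the facts that were actually added keeps the sketch safe when a hypothesis is already present in the extensional database.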

Example 6 We now extend the bank database of Example 3. The extensional database is given by the relations that provide information about the clients and their mortgage quota in euros:

    % client(Name, Balance, Salary).
    client(smith, 2000, 1200).
    client(brown, 1000, 1500).
    client(mcandrew, 5300, 3000).

    % mortgageQuote(Name, Quote).
    mortgageQuote(brown, 400).
    mortgageQuote(mcandrew, 100).

We also have extensional information about the outstanding debts and the branch assigned to each client:

    % branch(Office, Name).
    branch(lon, smith).
    branch(mad, brown).
    branch(par, mcandrew).

    % pastDue(Name, Amount).
    pastDue(smith, 3000).
    pastDue(mcandrew, 100).

To simplify, we implicitly add the constraint that a client can have at most one mortgage quota.

The first relation of the intensional part of the database represents the clients who have a mortgage quota assigned.

\[ \forall\, \mathit{hasMortgage}(x) \Leftarrow \mathit{mortgageQuote}(x, y). \]

The next relation identifies the clients who are in the red, that is, those whose debt is greater than their balance.

\[ \forall\, \mathit{debtor}(x) \Leftarrow \mathit{client}(x, y, z) \wedge \mathit{pastDue}(x, w) \wedge w > y. \]

The following relation determines the interest rate applied to each client:

\[ \forall\, \mathit{interestRate}(x, 2) \Leftarrow \mathit{client}(x, y, z) \wedge y < 1200. \]
\[ \forall\, \mathit{interestRate}(x, 5) \Leftarrow \mathit{client}(x, y, z) \wedge y \geq 1200. \]

We use the relation newMortgage(Name, Quote) to extend the mortgage with a new quota Quote for clients Name who are not in the red and who satisfy one of two conditions: either they do not have a mortgage yet, or the new quota added to the current one does not exceed 40% of their salary. In general, a new mortgage is not granted if its quota exceeds 40% of the client's salary.

\[ \forall\, \mathit{newMortgage}(x, w) \Leftarrow \mathit{client}(x, y, z) \wedge \neg\mathit{debtor}(x) \wedge \neg\mathit{hasMortgage}(x) \wedge w \leq 0.4 * z. \]
\[ \forall\, \mathit{newMortgage}(x, w) \Leftarrow \mathit{client}(x, y, z) \wedge \neg\mathit{debtor}(x) \wedge \mathit{mortgageQuote}(x, w') \wedge w + w' \leq 0.4 * z. \]

We also define a relation that contains the clients who get a mortgage.

\[ \forall\, \mathit{gotMortgage}(x) \Leftarrow \mathit{newMortgage}(x, w). \]

If a client meets the requirements for a new mortgage, a personal credit of up to 6,000 can be granted. Otherwise, the amount lies in the interval between 6,000 and 20,000, since this involves less risk. The relation personalCredit(Name, Amount) formalizes all these conditions in a simple way.

\[ \forall\, \mathit{personalCredit}(x, y) \Leftarrow (\mathit{gotMortgage}(x) \wedge y < 6000) \vee (\neg\mathit{gotMortgage}(x) \wedge y \geq 6000 \wedge y < 20000). \]

We define a new predicate that gives the salary of the clients whose mortgage quota is 100 or more, by means of the relation accounting(Name, Salary, Quote), which is the same accounting relation as in Example 3.

\[ \forall\, \mathit{accounting}(x, z, w) \Leftarrow \mathit{client}(x, y, z) \wedge \mathit{mortgageQuote}(x, w) \wedge w \geq 100. \]

We now show some example queries. A first simple example is to ask whether all clients are in the red:

\[ \forall x\, \mathit{debtor}(x), \]

whose obvious answer is $\bot$. The existence of debtors with an overdraft greater than 1,000 can be checked with:

\[ \exists x\, \exists y\; \mathit{debtor}(x) \wedge \mathit{pastDue}(x, y) \wedge y > 1000, \]

and the answer is $\top$. We quantify the variables $x$ and $y$ because we do not want an explicit answer about them. Otherwise, the answer would be a constraint over these variables.

The following query returns the interest rate of an arbitrary client provided that the client has a balance greater than 2,000.

\[ \forall x\, \exists y\, \exists z\, (\mathit{client}(x, y, z) \Rightarrow (y > 2000 \Rightarrow \mathit{interestRate}(x, w))). \]

In this example we use a nested implication to formulate a hypothetical query whose answer is the constraint $w \approx 5$.

We use the connective $\neg$ to ask which clients can be granted a mortgage of 400 but not a credit.

\[ \mathit{newMortgage}(x, 400) \wedge \neg\mathit{personalCredit}(x, y), \]

and the answer is $x \approx mcandrew \wedge y \geq 6000 \wedge y < 20000$, which means that the mortgage could be granted only to the client McAndrew, but not a credit between 6,000 and 20,000. □
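The ground part of these queries can be replayed with ordinary Python over the extensional facts of the example. This is only a sketch under assumptions: the function names are hypothetical, the two newMortgage clauses are merged by treating a missing quota as 0, and the 40% test is written with integers to avoid floating-point rounding. It checks concrete values and does not compute answer constraints.

```python
# Extensional facts of Example 6.
CLIENTS = {"smith": (2000, 1200), "brown": (1000, 1500), "mcandrew": (5300, 3000)}  # name -> (balance, salary)
MORTGAGE_QUOTE = {"brown": 400, "mcandrew": 100}
PAST_DUE = {"smith": 3000, "mcandrew": 100}

def debtor(name):
    """debtor(x): the pending debt of x is greater than the balance of x."""
    balance, _ = CLIENTS[name]
    return name in PAST_DUE and PAST_DUE[name] > balance

def new_mortgage(name, quote):
    """newMortgage(x, w) for a concrete quota w."""
    _, salary = CLIENTS[name]
    if debtor(name):
        return False
    current = MORTGAGE_QUOTE.get(name, 0)          # 0 covers the "no mortgage yet" clause
    return 10 * (quote + current) <= 4 * salary    # w + w' <= 0.4 * z, in integers

all_debtors = all(debtor(c) for c in CLIENTS)                       # forall x debtor(x)
big_debtor = any(debtor(c) and PAST_DUE[c] > 1000 for c in CLIENTS) # exists x, y ...
eligible_400 = [c for c in CLIENTS if new_mortgage(c, 400)]

print(all_debtors, big_debtor, eligible_400)   # False True ['mcandrew']
```

The three results agree with the answers $\bot$, $\top$ and $x \approx mcandrew$ given in the example.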

This example concludes the presentation of the expressive possibilities that $HH_\neg(\mathcal{C})$ offers and that are not available in other database languages. We now address the theoretical foundations of the scheme and its implementation.


2.2. Theoretical foundations of $HH_\neg(\mathcal{C})$

In this section we present how previous formalisms have been adapted to give a theoretical foundation to the scheme $HH_\neg(\mathcal{C})$. The results of this section, together with their proofs, can be found in [A.3]. We have defined two semantics for our database language: a proof-theoretic semantics and a fixpoint semantics. We close the section by showing that both are equivalent.

For the language $HH(\mathcal{C})$, a proof-theoretic semantics [66] and a fixpoint semantics [35] had been defined previously, and they laid the foundations of the scheme. In this section we present how negation is incorporated into the scheme. Introducing negation into the language raises the problem of guaranteeing the existence of a unique minimal model for a database. As pointed out in the previous chapter, this problem has been addressed in the field of logic programming through different proposals [21, 2, 107, 37, 124, 100]. In this work, however, the handling of negation is especially important because:

• We have included negation in the language so that it is complete with respect to relational algebra (RA).

• $HH_\neg(\mathcal{C})$ handles hypothetical queries that entail local computations, due to the temporary changes produced in the database when hypotheses are introduced. It is therefore necessary to adapt techniques that are well known in the deductive field, such as the dependency graph and stratification, in order to ensure a correct computation (as shown in Section 2.2.2).

2.2.1. Proof-theoretic semantics

The calculus that underlies the denotational semantics of $HH_\neg(\mathcal{C})$ is called $UC_\neg$ (Uniform Calculus handling Constraints and Negation). It is a sequent calculus obtained by adding negation to the calculus $UC$ that was introduced in [66] to formalize $HH(\mathcal{C})$. $UC_\neg$ combines inference rules of intuitionistic logic with the entailment relation $\vdash_{\mathcal{C}}$ of a generic constraint system $\mathcal{C}$. The idea is that a query $G$ is true for a database $\Delta$ if the constraint $C$ is satisfied. The calculus $UC_\neg$ only builds uniform proofs in the sense of Miller et al. [75], i.e., goal-oriented proofs. The rules of the calculus are shown in Figure 2.5.

The notation $\Delta; \Gamma \vdash_{UC_\neg} G$ denotes that the sequent $\Delta; \Gamma \vdash G$ can be proved using the rules of $UC_\neg$. In general, if $\Delta; C \vdash_{UC_\neg} G$, then $C$ is called an answer constraint for the query $G$ in the database $\Delta$, and it is identified with the answer to a query $G$ posed to that database $\Delta$. Sequents have the form $\Delta; \Gamma \vdash G$, where the databases $\Delta$ and the sets of constraints $\Gamma$ are on the left and the queries are on the right. A proof of a sequent is a finite tree. The root of the tree is the sequent to be proved; the internal nodes are also sequents, each being an instance of the conclusion of a rule of the calculus whose children are the premises of that rule; and the leaf nodes have the form $\Gamma \vdash_{\mathcal{C}} C$.

In Figure 2.5 we use the notion of elaboration of a program $\Delta$ (see its definition in Section 3.1.2 of [A.3]). Elaborated clauses are formulas of the form $\forall x_1 \ldots \forall x_n (G \Rightarrow A)$. However, the clauses inside $G$ need not be elaborated. In addition, if $A'$ and $A$ are atoms of the form $p(t'_1, \ldots, t'_n)$ and $p(t_1, \ldots, t_n)$, respectively, then $A' \approx A$ stands for the constraint $t'_1 \approx t_1 \wedge \ldots \wedge t'_n \approx t_n$.


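\[
\dfrac{\Gamma \vdash_{\mathcal{C}} C}{\Delta; \Gamma \vdash C}\;(C)
\qquad
\dfrac{\Delta; \Gamma \vdash \exists x_1 \ldots \exists x_n ((A' \approx A) \wedge G)}{\Delta; \Gamma \vdash A}\;(\mathit{Clause})\;(*)
\]

where $\forall x_1 \ldots \forall x_n (G \Rightarrow A')$ is a variant of a formula in $elab(\Delta)$

\[
\dfrac{\Delta; \Gamma \vdash G_i}{\Delta; \Gamma \vdash G_1 \vee G_2}\;(\vee)\;(i = 1, 2)
\qquad
\dfrac{\Delta; \Gamma \vdash G_1 \qquad \Delta; \Gamma \vdash G_2}{\Delta; \Gamma \vdash G_1 \wedge G_2}\;(\wedge)
\]

\[
\dfrac{\Delta, D; \Gamma \vdash G}{\Delta; \Gamma \vdash D \Rightarrow G}\;(\Rightarrow)
\qquad
\dfrac{\Delta; \Gamma, C \vdash G}{\Delta; \Gamma \vdash C \Rightarrow G}\;(\Rightarrow_C)
\]

\[
\dfrac{\Delta; \Gamma, C \vdash G[y/x] \qquad \Gamma \vdash_{\mathcal{C}} \exists y\, C}{\Delta; \Gamma \vdash \exists x\, G}\;(\exists)\;(**)
\qquad
\dfrac{\Delta; \Gamma \vdash G[y/x]}{\Delta; \Gamma \vdash \forall x\, G}\;(\forall)\;(**)
\]

\[
\dfrac{\Gamma \vdash_{\mathcal{C}} \neg C \ \text{ for every } \ \Delta; C \vdash A}{\Delta; \Gamma \vdash \neg A}\;(\neg)
\]

(*) $x_1, \ldots, x_n$ fresh for $A$. (**) $y$ fresh for the formulas in the conclusion of the rule.

Figure 2.5: Rules of the sequent calculus $UC_\neg$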

Incorporating negation requires extending the notion of derivability. The rule $(\neg)$ formalizes the derivation of negated atoms. Interpreting a query $\neg A$ for a database $\Delta$ yields an answer constraint $C$: if $C'$ is a possible answer to the query $A$ over $\Delta$, then $C \vdash_{\mathcal{C}} \neg C'$. We regard $(\neg)$ as a metarule, since its premise ranges over all derivations of the form $\Delta; C \vdash A$ of the atom $A$. In practice there is a derivation for $\neg A$ when the set of answer constraints of $A$ over $\Delta$ is finite. We now show an example of a derivation tree that uses the negation rule, and we end the section by discussing the termination of this calculus.

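Example 7 Returning to Examples 1 and 2, let $\Delta$ be the set:

\[ \{ \forall (x \geq x_1 \wedge x \leq x_2 \wedge y \geq y_1 \wedge y \leq y_2 \Rightarrow \mathit{rectangle}(x_1, y_1, x_2, y_2, x, y)) \}, \]

and let $G \equiv \mathit{rectangle}(0, 0, 4, 4, x, y) \wedge \neg\mathit{rectangle}(1, 1, 3, 3, x, y)$. The answer constraint

\[
\begin{array}{rl}
C \equiv & ((y > 3) \wedge (y \leq 4) \wedge (x \geq 0) \wedge (x \leq 4)) \;\vee \\
 & ((y \geq 0) \wedge (y < 1) \wedge (x \geq 0) \wedge (x \leq 4)) \;\vee \\
 & ((y \geq 0) \wedge (y \leq 4) \wedge (x > 3) \wedge (x \leq 4)) \;\vee \\
 & ((y \geq 0) \wedge (y \leq 4) \wedge (x \geq 0) \wedge (x < 1))
\end{array}
\]

can be obtained through the following deduction:

\[
\dfrac{
  \dfrac{
    \dfrac{C \vdash_{\mathcal{R}} \exists a_1 \exists a_2 \exists b_1 \exists b_2 \exists x_1 \exists y_1 (a_1 \approx 0 \wedge x_1 \approx x \wedge \ldots)}
          {\Delta; C \vdash \exists a_1 \exists a_2 \exists b_1 \exists b_2 \exists x_1 \exists y_1 (a_1 \approx 0 \wedge x_1 \approx x \wedge x_1 \geq a_1 \wedge a_2 \approx 0 \wedge y_1 \approx y \wedge x_1 \leq b_1 \wedge b_1 \approx 4 \wedge y_1 \geq a_2 \wedge b_2 \approx 4 \wedge y_1 \leq b_2)}\;(C)
  }{\Delta; C \vdash \mathit{rectangle}(0, 0, 4, 4, x, y)}\;(\mathit{Clause})
  \qquad D
}{\Delta; C \vdash \mathit{rectangle}(0, 0, 4, 4, x, y) \wedge \neg\mathit{rectangle}(1, 1, 3, 3, x, y)}\;(\wedge)
\]

where $D$ is a deduction for $\Delta; C \vdash \neg\mathit{rectangle}(1, 1, 3, 3, x, y)$ whose last steps have the form:

\[
\dfrac{
  C \vdash_{\mathcal{R}} \neg(x \geq 1 \wedge y \geq 1 \wedge x \leq 3 \wedge y \leq 3)
  \qquad
  \dfrac{\langle \text{rest of the derivation} \rangle}{\Delta;\; x \geq 1 \wedge y \geq 1 \wedge x \leq 3 \wedge y \leq 3 \vdash \mathit{rectangle}(1, 1, 3, 3, x, y)}
}{\Delta; C \vdash \neg\mathit{rectangle}(1, 1, 3, 3, x, y)}\;(\neg)
\]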
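As a sanity check of the answer constraint of this example, the following sketch compares the goal $G$ with the four-way disjunction $C$ on a grid of rational sample points. The function names and the grid are assumptions made for the illustration, and agreement on sample points is of course not a proof of equivalence.

```python
from fractions import Fraction
from itertools import product

def rectangle(x1, y1, x2, y2, x, y):
    """The body of the single clause of the database."""
    return x1 <= x <= x2 and y1 <= y <= y2

def goal(x, y):
    """G: inside the outer rectangle and not inside the inner one."""
    return rectangle(0, 0, 4, 4, x, y) and not rectangle(1, 1, 3, 3, x, y)

def answer_constraint(x, y):
    """C: the disjunction of four bands given in the example."""
    return ((3 < y <= 4 and 0 <= x <= 4)
            or (0 <= y < 1 and 0 <= x <= 4)
            or (0 <= y <= 4 and 3 < x <= 4)
            or (0 <= y <= 4 and 0 <= x < 1))

# Sample points -1, -1/2, 0, ..., 5 on both axes (a hypothetical test grid).
GRID = [Fraction(k, 2) for k in range(-2, 11)]
agree = all(goal(x, y) == answer_constraint(x, y) for x, y in product(GRID, GRID))
```

On this grid both formulations accept exactly the same points, which matches the reading of $C$ as the outer rectangle with the inner one removed.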


To ensure a sound and complete goal-solving process, we must impose some finiteness conditions that make the metarule $(\neg)$ feasible.

It must be guaranteed that the set of answers for an atom (one that occurs negated inside a goal) can be computed in a finite number of steps. In particular, it must be guaranteed that there are not infinitely many answer constraints for an atom. To this end, termination conditions must be imposed on the constraint systems $\mathcal{C}$ (similar to the safety conditions defined in [96]). By imposing compactness conditions on $\mathcal{C}$ we guarantee that the answer can be represented in the language of $\mathcal{C}$ by means of a finite set of constraints and that, therefore, the computation terminates provided that monotonicity is guaranteed.

Stratification is a suitable technique to guarantee monotonicity in the presence of negation. A further advantage of stratification is that it combines easily with our notion of constraint system, which leads to an operational semantics for $HH_\neg(\mathcal{C})$ that gives meaning to the whole database and takes into account safety conditions for the termination of the computation.

We now explain the fixpoint semantics of $HH_\neg(\mathcal{C})$, which underlies the implementation of the system, and we end by showing the equivalence between the proof-theoretic semantics presented here and the fixpoint semantics.

2.2.2. Fixpoint semantics

As we did with the proof-theoretic semantics, in this section we present the main contributions of this work with respect to what was previously published in [35] for interpreting $HH_\neg(\mathcal{C})$ databases. The original fixpoint semantics of $HH(\mathcal{C})$ was based on a forcing relation between programs, sets of constraints and goals, which established whether an interpretation made a goal $G$ true in the context $\langle \Delta, \Gamma \rangle$ of a program $\Delta$ and a set of constraints $\Gamma$. We follow an approach similar to the one in Chapter 3 of [122], since we use stratification techniques for a database that rely on the definition of a dependency graph.

We informally introduce interpretations as functions that are applied to each database $\Delta$ and return sets of pairs (each consisting of an atom and its associated constraint). Interpretations always depend on the context to which they are applied, because when queries with implication are computed, either $\Delta$ or $\Gamma$ may be locally extended with the antecedent of the implication. In our scheme, interpretations are functions that give meaning to the whole database by returning sets of pairs $(A, C)$.

Dependency graph and stratification

Given a set of clauses and goals $\Phi$, the corresponding dependency graph, $DG_\Phi$, is a directed graph such that:

• the nodes are the predicate symbols defined in $\Phi$,

• and the arcs are determined by the implication symbols of the formulas.

As mentioned, when building the dependency graph we must take into account that implications may occur inside a goal and, therefore, in the body of a clause. An implication of the form $F_1 \Rightarrow F_2$ produces arcs in our graph from the defined predicate symbols occurring in $F_1$ to each defined predicate symbol occurring in $F_2$.

Figure 2.6: Dependency graph of Example 6 (nodes: debtor, newMortgage, gotMortgage, hasMortgage, client, interestRate, pastDue, accounting, personalCredit, mortgageQuote; three arcs carry the label ¬)

Arcs can be labeled negatively (we depict this with the symbol $\neg$) if the corresponding atom occurs negated on the left-hand side of the implication. Constraints do not contain defined predicate symbols, so they do not produce this kind of dependency.

Example 8 Let $\Delta$ be the bank database of Example 6. Figure 2.6 shows the dependency graph for $\Delta$ (without the predicate branch, which would be drawn as an isolated node). □

We now formalize the definition of dependencies between predicates.

Definition 1 Given a set of formulas $\Phi$, its corresponding dependency graph $DG_\Phi$, and any two predicates $p$ and $q$, we say that:

• $q$ depends on $p$ if there is a path from $p$ to $q$ in $DG_\Phi$.

• $q$ depends negatively on $p$ if there is a path from $p$ to $q$ in $DG_\Phi$ with at least one negatively labeled arc. □
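Both notions of dependency can be sketched as a reachability test over labeled arcs. The arc list below is our own reading of the clauses of Example 6 (it is not copied from the figure), and the function names are hypothetical.

```python
# Arcs (p, q, negative): p occurs in the body of a clause whose head is q.
EDGES = [
    ("mortgageQuote", "hasMortgage", False),
    ("client", "debtor", False), ("pastDue", "debtor", False),
    ("client", "interestRate", False),
    ("client", "newMortgage", False), ("mortgageQuote", "newMortgage", False),
    ("debtor", "newMortgage", True), ("hasMortgage", "newMortgage", True),
    ("newMortgage", "gotMortgage", False),
    ("gotMortgage", "personalCredit", False), ("gotMortgage", "personalCredit", True),
    ("client", "accounting", False), ("mortgageQuote", "accounting", False),
]

def reachable_states(edges, source):
    """Pairs (predicate, crossed_negative_arc) reachable from source by at least one arc."""
    states, todo = set(), [(source, False)]
    while todo:
        node, neg = todo.pop()
        for p, q, is_neg in edges:
            state = (q, neg or is_neg)
            if p == node and state not in states:
                states.add(state)
                todo.append(state)
    return states

def depends(edges, q, p):
    """q depends on p: some path leads from p to q."""
    return any(node == q for node, _ in reachable_states(edges, p))

def depends_negatively(edges, q, p):
    """q depends negatively on p: some path from p to q crosses a negative arc."""
    return (q, True) in reachable_states(edges, p)
```

For instance, with these arcs personalCredit depends negatively on debtor through newMortgage and gotMortgage, while debtor depends on client only positively.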

Starting from the definition of dependencies, we define the notion of stratification for a set of predicates. We use this notion, following the approach of [122], to ensure that the meaning of a predicate is completely computed before negation is applied to it. For example, in the program $p(x) \Leftarrow \neg q(x)$ the meaning of $q$ must be computed before the meaning of $p$.

Definition 2 Let $\Phi$ be a set of formulas and $P = \{p_1, \ldots, p_n\}$ the set of predicate symbols defined in $\Phi$. A stratification of $\Phi$ is a function $s : P \rightarrow \{1, \ldots, n\}$ such that $s(p) \leq s(q)$ if $q$ depends on $p$, and $s(p) < s(q)$ if $q$ depends negatively on $p$. We say that $\Phi$ is stratifiable if a stratification for it can be found.

Example 9 A stratification for the database $\Delta$ of Example 5 places all predicates in stratum 1 except nondeltravel and trip, which belong to stratum 2. Intuitively, this shows that, in order to evaluate nondeltravel, the remaining predicates (except trip) must have been evaluated beforehand (in particular delayed). When the query

\[ G \equiv \exists t\, \mathit{deltravel}(x, y, t) \Rightarrow \mathit{delayed}(x, y) \]

is posed, the extended set $\Delta \cup \{G\}$ is still stratifiable. However, if we pose

\[ G' \equiv \mathit{trip}(mad, lon, T) \Rightarrow \mathit{delay}(mad, ny, t), \]

the extended set $\Delta \cup \{G'\}$ is not stratifiable. The reason is that $G'$ adds the dependency $\mathit{trip} \rightarrow \mathit{delay}$ and, consequently, any stratification $s$ would have to satisfy $s(\mathit{trip}) \leq s(\mathit{delay}) \leq s(\mathit{delayed}) < s(\mathit{nondeltravel}) \leq s(\mathit{trip})$, which is impossible. □
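A stratification in the sense of Definition 2 can be searched for with the usual iterative scheme: start every predicate at stratum 1 and raise strata until all arcs are respected, giving up when some stratum would exceed the number of predicates. The sketch below is one common way to do this, not necessarily the algorithm used by the system; the function name is hypothetical and the second test case encodes only the chain of inequalities stated in Example 9.

```python
def stratify(predicates, edges):
    """Return a stratification {predicate: stratum}, or None if none exists.

    `edges` holds triples (p, q, negative): an arc p -> q, negatively labeled or not.
    """
    stratum = {p: 1 for p in predicates}
    n = len(predicates)
    changed = True
    while changed:
        changed = False
        for p, q, negative in edges:
            required = stratum[p] + (1 if negative else 0)
            if stratum[q] < required:
                if required > n:      # only a cycle through a negative arc can force this
                    return None
                stratum[q] = required
                changed = True
    return stratum

# The program p(x) <= not q(x): q must be computed before p.
simple = stratify(["p", "q"], [("q", "p", True)])

# The cycle of Example 9 once the dependency trip -> delay has been added.
cycle = stratify(
    ["trip", "delay", "delayed", "nondeltravel"],
    [("trip", "delay", False), ("delay", "delayed", False),
     ("delayed", "nondeltravel", True), ("nondeltravel", "trip", False)],
)
```

The bound `n` works because a stratifiable set never needs more strata than it has predicates, so any attempt to go beyond it reveals a cycle that contains a negative arc.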

From now on, we assume that there is a stratification $s$ for the set $\Delta \cup \{G\}$. We also use the notion of stratum of an atom (the stratum of its predicate symbol). Finally, we show how to extend this notion to any formula, or set of formulas, by means of the following definition:

Definition 3 Let $F$ be a goal or a clause. The stratum of a formula $F$, denoted $str(F)$, is defined recursively as:

• $str(C) = 1$

• $str(p(t_1, \ldots, t_n)) = s(p)$

• $str(\neg A) = 1 + str(A)$

• $str(F_1 \,\square\, F_2) = \max(str(F_1), str(F_2))$, where $\square \in \{\wedge, \vee, \Rightarrow\}$

• $str(Qx\, F) = str(F)$, where $Q \in \{\exists, \forall\}$

In addition, the stratum of a set of formulas $\Phi$ is $str(\Phi) = \max\{str(F) \mid F \in \Phi\}$. □
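Definition 3 translates directly into a recursive function. In the sketch below formulas are encoded as nested tuples, which is an encoding chosen only for this illustration, and `stratum_of` is a hypothetical name; `s` is a stratification given as a dictionary.

```python
def stratum_of(formula, s):
    """Stratum of a formula, following the clauses of Definition 3."""
    tag = formula[0]
    if tag == "constr":                          # str(C) = 1
        return 1
    if tag == "atom":                            # str(p(t1, ..., tn)) = s(p)
        return s[formula[1]]
    if tag == "not":                             # str(not A) = 1 + str(A)
        return 1 + stratum_of(formula[1], s)
    if tag in ("and", "or", "imp"):              # maximum of both subformulas
        return max(stratum_of(formula[1], s), stratum_of(formula[2], s))
    if tag in ("exists", "forall"):              # quantifiers do not change the stratum
        return stratum_of(formula[2], s)
    raise ValueError(f"unknown formula tag: {tag}")

def stratum_of_set(formulas, s):
    """Stratum of a set of formulas: the maximum over its members."""
    return max(stratum_of(f, s) for f in formulas)

s = {"p": 1, "q": 2}
not_p = ("not", ("atom", "p", ("x",)))
hyp = ("imp", ("atom", "p", ("x",)), ("atom", "q", ("x",)))
print(stratum_of(not_p, s), stratum_of(hyp, s))   # 2 2
```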

All these definitions, together with their explanations and further examples, can also be found in Section 5 of [A.3].

Stratified interpretations and forcing relation

Let $\mathcal{W}$ be the set of databases that are stratifiable with respect to a fixed stratification $s$. The interpretations and the fixpoint operator are applied to the different interpretations of the databases of $\mathcal{W}$. These interpretations are computed stratum by stratum by means of a fixpoint operator that we define below (following an approach similar to [122]).

Let $At$ be the set of open atoms, i.e., predicate symbols of a signature applied to variables, and let $\mathcal{SL}_{\mathcal{C}}$ be the set of $\mathcal{C}$-satisfiable formulas modulo $\mathcal{C}$-equivalence. The set $At \times \mathcal{SL}_{\mathcal{C}}$ is finite, since we consider finite signatures and compact constraint systems. An interpretation over a stratum $i$ for a database belongs to the set of pairs $(A, [C]) \in At \times \mathcal{SL}_{\mathcal{C}}$, where $str(A) \leq i$ and $[C]$ is the set of constraints that are $\mathcal{C}$-equivalent to $C$.

We now define interpretations formally.

Definition 4 Let $i \geq 1$. An interpretation $I$ over a stratum $i$ is a function

\[ I : \mathcal{W} \rightarrow \mathcal{P}(At \times \mathcal{SL}_{\mathcal{C}}), \]

such that, for every $\Delta \in \mathcal{W}$, if $(A, [C]) \in I(\Delta)$ then $str(A) \leq i$. We write $\mathcal{I}_i$ for the set of interpretations over $i$. □


To simplify, we use the following notation:

• $(A, C) \in At \times \mathcal{SL}_{\mathcal{C}}$ instead of $(A, [C])$, assuming that this $C$ is any constraint that represents its equivalence class $[C]$.

• $[I(\Delta)]_i$ stands for $\{(A, C) \in I(\Delta) \mid str(A) = i\}$.

Note that, if $str(\Delta) = k$, then $\{[I(\Delta)]_i \mid 1 \leq i \leq k\}$ is a partition of $I(\Delta)$. For each $i \geq 1$, we define an order on $\mathcal{I}_i$ as follows:

Definition 5 Let $i \geq 1$ and $I_1, I_2 \in \mathcal{I}_i$. $I_1$ is less than or equal to $I_2$ at stratum $i$, written $I_1 \sqsubseteq_i I_2$, if for every $\Delta \in \mathcal{W}$ the following conditions hold:

• $[I_1(\Delta)]_j = [I_2(\Delta)]_j$, for every $1 \leq j < i$.

• $[I_1(\Delta)]_i \subseteq [I_2(\Delta)]_i$. □

It is easy to see that, for every $i \geq 1$, $(\mathcal{I}_i, \sqsubseteq_i)$ is a lattice. The idea behind this definition is that, when an interpretation over a stratum $i$ grows, the information of the lower strata remains unchanged. Thus, if $str(\neg A) = i$, since $str(A) = i - 1$, the meaning of $A$ does not change at stratum $i$, and this allows us to guarantee monotonicity even for negated atoms.

Lemma 1 For any $i \geq 1$, every chain of interpretations of $(\mathcal{I}_i, \sqsubseteq_i)$, $\{I_n\}_{n \geq 0}$, such that $I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \ldots$, has a supremum $\bigsqcup_{n \geq 0} I_n$, which we define as

\[ \Big(\bigsqcup_{n \geq 0} I_n\Big)(\Delta) = \bigcup \{ I_n(\Delta) \mid n \geq 0 \}, \quad \text{for every } \Delta \in \mathcal{W}. \]

The following definition formalizes the notion that an interpretation $I$ makes the query $G$ true in the context of a database $\Delta$ if the constraint $C$ is satisfied. As anticipated, we assume that $s$ is a valid stratification for $\Delta$ and also for the extended database $\Delta \cup \{G\}$.

Definition 6 Let $i \geq 1$. The forcing relation $\Vdash$ between pairs $I, \Delta$ and pairs $(G, C)$ (where $I \in \mathcal{I}_i$, $str(G) \leq i$, and $C$ is $\mathcal{C}$-satisfiable) is defined recursively by the following rules. When $I, \Delta \Vdash (G, C)$, we say that $(G, C)$ is forced by $I$ in the context of $\Delta$.

• $I, \Delta \Vdash (C', C) \iff C \vdash_{\mathcal{C}} C'$.

• $I, \Delta \Vdash (A, C) \iff (A, C) \in I(\Delta)$.

• $I, \Delta \Vdash (\neg A, C) \iff$ for each $(A, C') \in I(\Delta)$, it holds that $C \vdash_{\mathcal{C}} \neg C'$. If there are no pairs of the form $(A, C')$ in $I(\Delta)$, then $C \equiv \top$.

• $I, \Delta \Vdash (G_1 \wedge G_2, C) \iff$ for each $i \in \{1, 2\}$, $I, \Delta \Vdash (G_i, C)$.

• $I, \Delta \Vdash (G_1 \vee G_2, C) \iff$ for some $i \in \{1, 2\}$, $I, \Delta \Vdash (G_i, C)$.

• $I, \Delta \Vdash (D \Rightarrow G, C) \iff I, \Delta \cup \{D\} \Vdash (G, C)$.

• $I, \Delta \Vdash (C' \Rightarrow G, C) \iff I, \Delta \Vdash (G, C \wedge C')$.

• $I, \Delta \Vdash (\exists x\, G, C) \iff$ there is $C'$ such that $I, \Delta \Vdash (G[y/x], C')$, where $y$ does not occur free in $\Delta$, $\exists x\, G$, $C$, and $C \vdash_{\mathcal{C}} \exists y\, C'$.

• $I, \Delta \Vdash (\forall x\, G, C) \iff I, \Delta \Vdash (G[y/x], C)$, where $y$ does not occur free in $\Delta$, $\forall x\, G$, $C$. □


The meaning of the interpretation of a stratum is given by the least fixpoint of a continuous operator that transforms interpretations, which we define next:

Definition 7 Let $i \geq 1$ be a stratum. The operator $T_i : \mathcal{I}_i \rightarrow \mathcal{I}_i$ transforms interpretations over $i$ as follows. For $I \in \mathcal{I}_i$, $\Delta \in \mathcal{W}$ and $(A, C) \in At \times \mathcal{SL}_{\mathcal{C}}$, we have $(A, C) \in T_i(I)(\Delta)$ when:

• $(A, C) \in [I(\Delta)]_j$ for some $j < i$, or

• $str(A) = i$ and there is a variant $\forall x_1 \ldots \forall x_n (G \Rightarrow A')$ of a clause in $elab(\Delta)$, such that the variables $x_1 \ldots x_n$ do not occur free in $A$ and the following holds:

\[ I, \Delta \Vdash (\exists x_1 \ldots \exists x_n (A \approx A' \wedge G), C). \quad \square \]

An important aspect of the operator $T_i$ is that, for a database $\Delta$, $T_i$ adds only the information obtained from the clauses of $\Delta$ whose heads are atoms of stratum $i$, while the information of the lower strata remains unchanged. Note that, if $str(A) = i$, then $str(\exists x_1 \ldots \exists x_n (A \approx A' \wedge G)) \leq i$. The operator $T_i$ is monotone and continuous, that is:

Lemma 2 (Monotonicity of $T_i$) Let $i \geq 1$ and $I_1, I_2 \in \mathcal{I}_i$ such that $I_1 \sqsubseteq_i I_2$. Then:

\[ T_i(I_1) \sqsubseteq_i T_i(I_2). \]

Lemma 3 (Continuity of $T_i$) Let $i \geq 1$ and let $\{I_n\}_{n \geq 0}$ be a countable family of interpretations over $i$ such that $I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \ldots$. Then:

\[ T_i\Big(\bigsqcup_{n \geq 0} I_n\Big) = \bigsqcup_{n \geq 0} T_i(I_n). \]

The proofs of the previous lemmas can be found in Appendix A of [A.3]. Since the operator $T_i$ is continuous for every $i \geq 1$, by the Knaster-Tarski theorem [118], $T_1$ has a least fixpoint, denoted $fix_1$, such that:

\[ fix_1 = \bigsqcup_{n \geq 0} T_1^n(I_\bot), \]

where the interpretation $I_\bot$ is the constant function $\emptyset$. Proceeding in a similar way, a chain can be defined:

\[ fix_{i-1} \sqsubseteq_i T_i(fix_{i-1}) \sqsubseteq_i T_i(T_i(fix_{i-1})) \sqsubseteq_i \ldots \sqsubseteq_i T_i^n(fix_{i-1}) \sqsubseteq_i \ldots \]

for each stratum $i > 1$, and the fixpoint of this chain is:

\[ fix_i = \bigsqcup_{n \geq 0} T_i^n(fix_{i-1}). \]

In particular, if $k$ is the maximum stratum in $\Delta$, we simply write $fix$ for $fix_k$. Therefore, $fix(\Delta)$ represents the pairs $(A, C)$ such that $A$ can be deduced from $\Delta$ if $C$ is satisfied.
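The construction of a least fixpoint as the limit of the chain $I_\bot, T(I_\bot), T(T(I_\bot)), \ldots$ can be sketched generically. The code below assumes that the chain becomes stationary after finitely many steps, which is the situation relevant here because the underlying sets are finite. The operator `step` is a toy example over flight connections, not the operator $T_i$ of Definition 7, and all names are hypothetical.

```python
def least_fixpoint(operator, bottom):
    """Iterate a monotone operator from `bottom` until the chain becomes stationary."""
    current = bottom
    while True:
        nxt = operator(current)
        if nxt == current:
            return current
        current = nxt

# Toy operator: one round of "flight or flight followed by connection".
EDGES = frozenset({("mad", "par"), ("par", "ny"), ("london", "ny")})

def step(pairs):
    joined = {(a, d) for (a, b) in pairs for (c, d) in pairs if b == c}
    return frozenset(pairs | EDGES | joined)

closure = least_fixpoint(step, frozenset())
```

In this toy run the chain is $\emptyset$, then the three direct connections, then the same set plus the pair (mad, ny), after which nothing changes.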

We illustrate these theoretical notions in a practical way with the following example.

Example 10 Given the finite domain $[a, b, c]$, consider the program $\Delta$:

    p(a).
    p(b).

\[ \forall\, t(x) \Leftarrow p(x). \qquad \forall\, q(x) \Leftarrow \neg p(x). \qquad \forall\, u(x) \Leftarrow \neg q(x). \qquad \forall\, r(x) \Leftarrow (p(x) \Rightarrow q(x)). \]

In this program $p$ and $t$ belong to the first stratum, $q$ and $r$ to stratum 2, and $u$ to stratum 3. We now show how the fixpoint computation evolves for each stratum. The forcing relation is nondeterministic and, therefore, the constraints shown in the example are representatives of their respective equivalence classes.

• For the first stratum we consider the operator $T_1$:

› The first iteration corresponds to $(T_1(\emptyset))(\Delta)$. We consider the clause $p(a)$ and try to prove whether there is a $C$ such that:

\[ \emptyset, \Delta \Vdash ((x \approx a) \wedge \top, C). \]

A valid constraint is $C \equiv (x \approx a)$ and, therefore, the fixpoint of this first stratum contains the pair $(p(x), x \approx a)$. Analogously, it also contains the pair $(p(x), x \approx b)$, using the clause $p(b)$. Therefore:

\[ T_1(\emptyset)(\Delta) = \{(p(x), x \approx a), (p(x), x \approx b)\}. \]

This operator also considers the remaining clause of stratum 1; however, since it is applied to the empty set, there are no pairs for this clause that should appear in $T_1(\emptyset)(\Delta)$.

› The second iteration corresponds to $(T_1(T_1(\emptyset)))(\Delta)$.

Using now the remaining clause of the first stratum, $\forall x (p(x) \Rightarrow t(x))$, we try to prove that:

\[ T_1(\emptyset), \Delta \Vdash (\exists x' (x \approx x' \wedge p(x')), C). \]

To eliminate the existential quantifier, according to the definition, $x'$ is replaced by a new variable $y$, which gives:

\[ T_1(\emptyset), \Delta \Vdash ((x \approx y \wedge p(y)), C'), \]

so that $C \vdash_{\mathcal{C}} \exists y\, C'$. For the conjunction, following again the definition of the forcing relation, we must verify $T_1(\emptyset), \Delta \Vdash (x \approx y, C')$ and $T_1(\emptyset), \Delta \Vdash (p(y), C')$. For example, the constraint $C \equiv (x \approx a) \vee (x \approx b)$ satisfies the required conditions. Therefore $T_1(T_1(\emptyset))(\Delta)$ contains the pair $(t(x), x \approx a \vee x \approx b)$, which completes the fixpoint of the first stratum (see footnote 2). Thus, $fix_1$ is defined by the set $\{(p(x), x \approx a), (p(x), x \approx b), (t(x), x \approx a \vee x \approx b)\}$. From now on, the steps concerning the elimination of $\exists$ and $\approx$ are omitted for simplicity.

• For the second stratum we apply the operator $T_2$, which operates on $fix_1$ and on $\Delta$. Therefore, its first and only iteration corresponds to the computation of $(T_2(fix_1))(\Delta)$:

› Using the clause $\forall x (\neg p(x) \Rightarrow q(x))$, we have to verify:

\[ fix_1, \Delta \Vdash (\exists x' (x \approx x' \wedge \neg p(x')), C). \]

This means that $C$ must satisfy $C \vdash_{\mathcal{C}} \neg C'$ for every $C'$ such that $(p(x'), C') \in fix_1(\Delta)$. For this purpose the pairs belonging to $fix_1(\Delta)$ are used, which gives $(x \approx a)$ and $(x \approx b)$. Hence $(T_2(fix_1))(\Delta)$ contains the pair $(q(x), x \not\approx a \wedge x \not\approx b)$.

› We now consider the clause $\forall x ((p(x) \Rightarrow q(x)) \Rightarrow r(x))$. We check whether:

\[ fix_1, \Delta \Vdash (\exists x' (x \approx x' \wedge (p(x') \Rightarrow q(x'))), C). \]

This leads us to extend $\Delta$ with $p(x')$ (according to the forcing relation) and to try to prove whether:

\[ fix_1, \Delta \cup \{p(x')\} \Vdash (q(x'), C'). \]

In this case it is not possible to find a $C'$ that satisfies this relation and, therefore, $T_2(fix_1) = T_2^j(fix_1)$ for $j \geq 1$, so it is a fixpoint $fix_2$ for stratum 2.

Footnote 2: In this iteration $p(a)$ and $p(b)$ are considered again, but no new pairs can be added to the existing ones.

• For the third stratum we proceed as in the previous strata, applying $T_3$ to $fix_2$ and $\Delta$. We use the clause of the predicate in the third stratum, $\forall x (\neg q(x) \Rightarrow u(x))$. Proceeding analogously to the previous steps, the pair $(u(x), x \approx a \vee x \approx b)$ appears in $(T_3(fix_2))(\Delta)$ together with the pairs of $fix_2(\Delta)$.

We conclude by showing the final fixpoint $fix_3 = \{(p(x), x \approx a), (p(x), x \approx b), (t(x), x \approx a \vee x \approx b), (q(x), x \not\approx a \wedge x \not\approx b), (u(x), x \approx a \vee x \approx b)\}$. □
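The computation of this example can be replayed over the finite domain with a ground sketch. It is a strong simplification made for illustration: constraints are replaced by the set of domain values that satisfy them, all predicates are unary over the same variable, the stratification is given by hand, and every name in the code is hypothetical. As in the definitions above, an interpretation is kept as a function from databases to sets of atoms, so the hypothetical goal of $r$ is evaluated in the database extended with its antecedent.

```python
DOMAIN = ("a", "b", "c")
FACTS = frozenset({("p", "a"), ("p", "b")})
# Rules as (head, body); every literal refers to the same variable x.
RULES = [
    ("t", [("pos", "p")]),         # t(x) <= p(x)
    ("q", [("neg", "p")]),         # q(x) <= not p(x)
    ("u", [("neg", "q")]),         # u(x) <= not q(x)
    ("r", [("hyp", "p", "q")]),    # r(x) <= (p(x) => q(x))
]
STRATUM = {"p": 1, "t": 1, "q": 2, "r": 2, "u": 3}

def reachable_databases(facts):
    """Databases obtained from `facts` by adding ground instances of hypotheses."""
    hyps = [lit[1] for _, body in RULES for lit in body if lit[0] == "hyp"]
    seen, todo = set(), [frozenset(facts)]
    while todo:
        db = todo.pop()
        if db not in seen:
            seen.add(db)
            todo.extend(db | {(pred, v)} for pred in hyps for v in DOMAIN)
    return seen

def forces(interp, db, lit, v):
    """Ground counterpart of the forcing relation for one body literal."""
    if lit[0] == "pos":
        return (lit[1], v) in interp[db]
    if lit[0] == "neg":                       # the negated predicate lies in a lower stratum
        return (lit[1], v) not in interp[db]
    _, assumed, goal = lit                    # D => G: look at the extended database
    return (goal, v) in interp[db | {(assumed, v)}]

def fix(facts):
    """Stratum-by-stratum least fixpoint, computed for all reachable databases at once."""
    worlds = reachable_databases(facts)
    interp = {db: set() for db in worlds}
    for i in range(1, max(STRATUM.values()) + 1):
        changed = True
        while changed:
            step = {}
            for db in worlds:
                derived = {f for f in db if STRATUM[f[0]] == i}
                for head, body in RULES:
                    if STRATUM[head] == i:
                        derived |= {(head, v) for v in DOMAIN
                                    if all(forces(interp, db, lit, v) for lit in body)}
                step[db] = derived - interp[db]
            changed = any(step.values())
            for db, new in step.items():
                interp[db] |= new
    return interp[frozenset(facts)]

result = fix(FACTS)
```

Under this reading `result` contains p and t for a and b, q for c (the only value with $x \not\approx a \wedge x \not\approx b$), u for a and b, and no atom for r, which mirrors the pairs of $fix_3$.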

Soundness and completeness

Saying that the scheme $HH_\neg(\mathcal{C})$ is sound and complete with respect to the calculus $UC_\neg$ amounts to stating that the forcing relation, taken at the fixpoint of the last stratum of the database for a given query, coincides with derivability in the calculus $UC_\neg$ for that query.

More precisely, if $str(G) = i$ then $fix_i$ forces $(G, C)$ in the context of $\Delta$ if and only if $C$ is an answer constraint of $G$ in $\Delta$. The proof appears in full detail in [A.3]; in this document we give a summary.

The following proposition presents an equivalence result between the fixpoint semantics and the proof-theoretic semantics for the case without negation, which we use in the proof of the final result.

Proposition 1 For every $i \geq 1$, $\Delta \in \mathcal{W}$, and every pair $(G, C) \in \mathcal{G} \times \mathcal{SL}_{\mathcal{C}}$ such that $G$ does not contain negation, if $str(G) \leq i$ then:

\[ fix_i, \Delta \Vdash (G, C) \iff \Delta; C \vdash_{UC_\neg} G. \]

When queries do not contain negation, the proof of soundness and completeness is similar to the one given in [35] for $HH(\mathcal{C})$. We now present the soundness and completeness result relating the fixpoint semantics and the calculus $UC_\neg$.

Theorem 1 (Soundness and completeness) For every $i \geq 1$, $\Delta \in \mathcal{W}$, and every pair $(G, C) \in \mathcal{G} \times \mathcal{SL}_{\mathcal{C}}$, if $str(G) \leq i$ then:

\[ fix_i, \Delta \Vdash (G, C) \iff \Delta; C \vdash_{UC_\neg} G. \]


The proof of this result proceeds by induction on $i$. Since this is a summary, we focus on the case in which $G$ is $\neg A$, which we consider the most representative.

• Proposition 1 captures the case $i = 1$.

• For the case $i > 1$, we assume the following induction hypothesis: for every $\Delta$, $G$, $C$ with $str(G) \leq i - 1$, it holds that $fix_{i-1}, \Delta \Vdash (G, C) \iff \Delta; C \vdash_{UC_\neg} G$.

› In Proposition 1, induction is carried out on the structure of $G$, without considering the case $\neg A$.

› For the case $\neg A$, by definition $fix_i, \Delta \Vdash (\neg A, C)$ holds if and only if, for every $C'$ such that $(A, C') \in fix_i(\Delta)$, we have $C \vdash_{\mathcal{C}} \neg C'$, or there is no such $C'$ and $C \equiv \top$. Clearly $str(A) \leq i - 1$, so this is equivalent to saying that, for every constraint $C'$ such that $fix_{i-1}, \Delta \Vdash (A, C')$, we have $C \vdash_{\mathcal{C}} \neg C'$, or there is no such $C'$ and $C \equiv \top$. By the induction hypothesis, this is in turn equivalent to: for every $C'$ such that $\Delta; C' \vdash_{UC_\neg} A$, we have $C \vdash_{\mathcal{C}} \neg C'$, or there is no such $C'$ and $C \equiv \top$. This matches the rule of the calculus that yields $\Delta; C \vdash_{UC_\neg} \neg A$, as can be seen in Figure 2.5:

\[
\dfrac{\Gamma \vdash_{\mathcal{C}} \neg C \ \text{ for every } \ \Delta; C \vdash A}{\Delta; \Gamma \vdash \neg A}\;(\neg)
\]

All the case analyses mentioned appear in Appendix A of [A.3]. Finally, as a consequence of the theorem we obtain:

\[ (A, C) \in fix(\Delta) \iff \Delta; C \vdash_{UC_\neg} A, \]

which means that, as we wanted to show, the atoms of the fixpoint of the database are exactly those derived by the calculus, with the same constraint in each case.

The advantage of our fixpoint semantics is that it can be used as the basis for the implementation of the system $HH_\neg(\mathcal{C})$, which we present next. In these formalisms we use a generic constraint system $\mathcal{C}$. From the point of view of the system, $\mathcal{C}$ can be seen as a black box able to check the $\mathcal{C}$-satisfiability of an input constraint $C$.

2.3. The system $HH_\neg(\mathcal{C})$

In the previous sections we presented $HH_\neg(\mathcal{C})$ as a formal language, together with two semantics. In this section we present an implementation of a constraint deductive database system based on that language.

Two blocks can be distinguished in the implementation of the system. The first one corresponds to the implementation of the fixpoint semantics. As pointed out above, this implementation is guided by the definition of the semantics and is independent of the constraint system (see Section 2.3.6). The other block corresponds to the implementation of the constraint system (see Section 2.3.3).

This section gathers the contents of [A.1] regarding the implementation of the core of the system, and the integrity constraints introduced in [A.3].

The system is available at

https://gpd.sip.ucm.es/trac/gpd/wiki/GpdSystems


together with a battery of examples and a user manual. In the examples of the system we use a concrete syntax for clauses that is quite close to Prolog syntax, where predicates and constant symbols start with a lowercase letter and variables start with an uppercase letter. We use "," for the conjunction $\wedge$ and ";" for the disjunction $\vee$. In addition, we use not for negation, D=>G for the implication $D \Rightarrow G$, ex(X,G) for $\exists x\, G$, fa(X,G) for $\forall x\, G$, and constr(Dom,C) for a constraint C together with its domain Dom. The system also requires explicit type declarations for predicates by means of:

    type(predicate(dom_1, ..., dom_n)).

In general, different solvers can be used for the same database; however, they cannot be combined within a single constraint. That is, the intensional part of the database cannot contain relations that mix, for example, finite and real domains, as in interestRate(I,2.0) :- client(I,B,S), constr(real,B<1200.0) where I is a finite-domain variable and B is a real-domain variable.

Therefore, only extensionally defined predicates have arguments over different domains, and they are used to make the database more readable. Below is the explicit declaration of the enumerated domains of Example 6:

domain(client_dt,[smith,brown,mcandrew]).
domain(branch_dt,[lon,ny,mad,par]).

To avoid combining domains during the computation, we define the relation client_id, which associates an identifier with each client:

client_id(smith,1.0).
client_id(brown,2.0).
client_id(mcandrew,3.0).

The remaining extensional predicates of Example 6 are defined in the same way, except that client names are replaced by their identifiers. That is, we write interestRate(I,2.0) instead of interestRate(I,brown).

We also include one of the intensional predicates to show the syntax processed by the system.

interestRate(I,2.0) :- client(I,B,S), constr(real,B<1200.0).
interestRate(I,5.0) :- client(I,B,S), constr(real,B>=1200.0).

2.3.1. Computation phases

The following outlines the phases the system goes through when computing the fixpoint of a database $\Delta$. The system modules are shown in Figure 2.7. When computing the fixpoint of $\Delta$, the system:

1. Checks and infers the types of the predicates.

2. Builds the dependency graph for $\Delta$.

3. Computes a stratification s for $\Delta$, if one exists. Otherwise, the system reports an error and stops.

4. If the previous phase succeeds, computes $fix(\Delta)$ (see Section 2.3.6).

The system keeps in memory the fixpoint $fix(\Delta)$, the stratification s, and the dependency graph for $\Delta$.
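Phase 3 can be illustrated with a small sketch. The code below is not the system's Prolog implementation: it is a Python illustration, and the name `stratify`, the edge encoding, and the sample graph are assumptions made only for this example. An edge (q, p, negative) means that p depends on q; a negative edge requires s(q) < s(p), while a positive one only requires s(q) <= s(p).

```python
def stratify(predicates, edges):
    """Return a dict predicate -> stratum, or raise if no stratification exists.

    edges: iterable of (source, target, negative) triples (hypothetical encoding).
    """
    stratum = {p: 1 for p in predicates}
    limit = len(stratum)          # a stratifiable database never needs more strata
    changed = True
    while changed:
        changed = False
        for source, target, negative in edges:
            required = stratum[source] + (1 if negative else 0)
            if stratum[target] < required:
                if required > limit:
                    # a cycle through a negative edge keeps pushing strata up
                    raise ValueError("database is not stratifiable")
                stratum[target] = required
                changed = True
    return stratum


# Hypothetical dependency graph, loosely inspired by the bank example.
sample_edges = [
    ("client", "hasMortgage", False),
    ("hasMortgage", "newMortgage", True),
    ("newMortgage", "personalCredit", True),
]
sample = stratify(
    ["client", "hasMortgage", "newMortgage", "personalCredit"], sample_edges
)
print(sample)
```

With these assumed edges the sketch places client and hasMortgage in stratum 1, newMortgage in stratum 2, and personalCredit in stratum 3; a cycle containing a negative edge makes it raise an error, mirroring the behaviour described in phase 3.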

To implement nested implications and aggregates, we rely on the dependency graph to guarantee a correct computation. Besides the dependencies explained in Section 2.2.2, we add new negative dependencies to handle aggregate functions (see Section 2.3.4) and nested implications (see Section 2.3.7).

Figure 2.7: System phases

We now illustrate the computation phases with the database of Example 6. As stated above, the system first infers the predicate types and computes the stratification. For the predicates of Example 6, the computed stratification is:

s = [(client,1), (pastDue,1), (mortgageQuote,1), (debtor,1),
     (interestRate,1), (hasMortgage,1), (accounting,1), (client_id,1),
     (branch,1), (newMortgage,2), (gotMortgage,2), (personalCredit,3)].

Next we show the fixpoint $fix(\Delta)$ of this database. It must contain the pairs (each consisting of an atom and an associated constraint) for the extensional part of the database, for example:

(client(3.0, 5300.0, 3000.0), true),

as well as the pairs for the intensional part:

In stratum 1:

(debtor(1.0), true),
(interestRate(2.0,2.0), true),
(interestRate(X,Y), ((X=1.0, Y=5.0); (X=3.0, Y=5.0))),
(accounting(X,Y,Z), ((Y=400.0, Z=1500.0, X=2.0); (Y=100.0, Z=3000.0, X=3.0))),
(hasMortgage(X), (X=2.0; X=3.0))

In stratum 2:

(newMortgage(X,Y), ((Y=<200.0, X=2.0); (Y=<1100.0, X=3.0))),
(gotMortgage(X), (X=2.0; X=3.0))

In stratum 3:

(personalCredit(X,Y), ((Y>=6000.0, Y<20000.0, X/=2.0, X/=3.0);
                       (Y<6000.0, X=2.0); (Y<6000.0, X=3.0)))

2.3.2. Queries

When a query G is submitted, the system computes, if it exists, a new stratification s' for the set $\Delta \cup \{G\}$. If no such s' exists, the query cannot be computed. Otherwise, the stored fixpoint is used to compute the answer.

As explained in Section 2.2, the answer C must satisfy

$fix(\Delta), \Delta \Vdash (G, C).$

This forcing relation is implemented by the predicate:

force(Delta, Stratification, I, G, C)

which is explained in Section 2.3.6. This predicate uses the current stratification s. To compute the answer to a query we distinguish two cases:

If s = s', then, since $fix(\Delta)$ was computed with s, the answer constraint C can be obtained by running force(Delta,s,fix(Delta),G,C).

If s ≠ s', it is because G contains some subquery of the form D=>G'.

› In this case the dependency graph of $\Delta \cup \{G\}$ (from which s' was obtained) contains the arcs for $\Delta$ plus those for the implications in G.

› The new stratification s' is also valid for $\Delta$, and it yields the same fixpoint $fix(\Delta)$.

› Therefore, as in the previous case, the answer constraint C is obtained by running the predicate force with the new stratification:

force(Delta,s',fix(Delta),G,C).

We need the new stratification s' because, when solving a goal G, the implications nested inside it must also be taken into account. To solve D=>G', following the definition of the forcing relation, the database $\Delta$ is locally extended with the clause D, since the answer to the query depends on $fix(\Delta \cup \{D\})$.

In conclusion, since the stratification s' was defined taking those implications into account, it is also a valid stratification for $\Delta \cup \{D\}$. Hence the query is computed correctly in both situations.

Continuing with the bank example, we show the computation of several queries, including cases where the initial stratification remains valid and cases where it does not. As a first example, we ask whether every client belongs to the Madrid branch:

hhnc> fa(A,branch(mad,A)).

This query requires no change to the dependency graph and can be solved using the stored fixpoint. A universal quantification over a finite domain becomes a conjunctive constraint, which is solved by instantiating the variable with each element of the domain. The result is trivially:

Answer: false
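As a rough illustration of how a universal quantification over a finite domain reduces to a conjunction, consider the following Python sketch. It is not the system's code; the helper `forall_fd` and the branch facts are hypothetical and were chosen only so that the query fails, as in the answer above.

```python
def forall_fd(domain, holds):
    """Finite-domain 'fa': instantiate the variable with every domain element
    and take the conjunction of the results."""
    return all(holds(value) for value in domain)


# Hypothetical extensional facts branch(Office, Client); not taken from Example 6.
branch = {("mad", "smith"), ("lon", "brown"), ("ny", "mcandrew")}
clients = ["smith", "brown", "mcandrew"]

every_client_in_madrid = forall_fd(clients, lambda c: ("mad", c) in branch)
print(every_client_in_madrid)  # False for these assumed facts
```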

As an example of negation, we can ask for the clients that have no mortgage:

hhnc> not(hasMortgage(N)).

This query does not change the stratification either, so the system again uses the stored fixpoint. To answer it, the system looks up in the fixpoint all constraints C occurring in pairs of the form (hasMortgage(N),C) and then builds a conjunction of the negations of those constraints (N=3.0 and N=2.0 are the ones present in the fixpoint for hasMortgage), which is sent to the solver. The answer obtained is:

Answer: N/=3.0, N/=2.0
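The construction of this answer can be mimicked in a few lines. The sketch below is only an illustration in Python, assuming that every stored constraint is a single equality; the function name `negate_answers` and the pair encoding are hypothetical, and the real system delegates the negated constraint to its solver.

```python
def negate_answers(stored):
    """Build the conjunction of negated equalities.

    stored: list of (variable, value) pairs, one per constraint found in the
    fixpoint for the negated atom (assumed to be simple equalities).
    """
    return ", ".join(f"{var}/={val}" for var, val in stored)


answer = negate_answers([("N", 3.0), ("N", 2.0)])
print(answer)  # N/=3.0, N/=2.0
```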

Finally, we present a query that forces a change of stratification. This example has no natural meaning; it only illustrates the situation:

hhnc> newMortgage(N,R) => interestRate(N,R).

The query introduces a new dependency between newMortgage and interestRate; therefore interestRate moves to stratum 2 in the new stratification s'. However, as explained above, the stored fixpoint is still valid for computing this query, whose result is:

Answer: (R=2.0, N=2.0); (R=5.0, N=1.0); (R=5.0, N=3.0)

Thus, even when the query requires a different stratification, the stored fixpoint can still be used.

2.3.3. Implementation of the solvers

We implemented three constraint systems as possible instances of the HH¬(C) scheme: Booleans, reals, and finite domains.

The constraint systems use the following concrete notation for true, false, Boolean functions, and quantifiers:

true, false, not, ex(X,C).

As in Prolog, we use “;” for disjunction and “,” for conjunction. We also include, among others, the following comparison operators:

=, /=, >, >=

which have their usual meaning. Numeric constraints include arithmetic operators (such as +, -, ...) and function symbols (for example, abs for the absolute value). In addition, Booleans and finite domains support the universal quantifier (fa(X,C)).

For finite domains we provide a domain constraint “X in Range”, where Range is a subrange of values built with V1..V2, which denotes the set of values in the closed interval between V1 and V2, and R1\/R2, which denotes the union of ranges.

To implement the constraint systems, i.e., to implement $\vdash_{\mathcal{C}}$, we took as a starting point the deducibility relation of classical logic with equality. This relation satisfies the minimal requirements imposed on constraint systems in the theoretical foundation (see Section 2.1).

As outlined in Figure 2.8, to implement this relation we developed a generic interface solve(I,C,SC) for the relation $C \vdash_{\mathcal{C}} SC$. In general, solve produces either an answer constraint SC from C, if C is satisfiable, or false otherwise.

To handle queries with aggregates, we had to add to the solve interface the interpretation I over which the aggregate functions are computed.

Figure 2.8: Generic interface of the system solvers.

SC is also a constraint of the constraint system $\mathcal{C}$, but simplified and more readable; it is either a simple constraint or a disjunction of simple constraints. A simple constraint is one that contains no disjunctions, quantifiers, or negations.

The generic solve interface is implemented by the predicate:

solve(+Interpretation,+Constraint,-SolvedConstraint)

which takes a constraint Constraint as input and returns an answer constraint SolvedConstraint under an Interpretation. The solvers are built on those of SWI-Prolog [129, 119]. The HH¬(C) solvers are implemented as a layer on top of the SWI-Prolog solvers because, in general, the constraints of our system are more complex than those the underlying solvers accept. The idea behind the solve interface is therefore to:

1. Discriminate the type of the constraint to decide which solver (reals, finite domains, or Booleans) it is sent to.

2. Transform the constraint into a simpler one that is valid input for the SWI-Prolog solvers. The constraints that the underlying solvers handle are called primitive constraints.

3. Send it to the underlying solvers to obtain an answer.

4. Finally, recompose the result.

As an example of the solver implementation, consider the finite-domain constraint system FD. For this instance, our solvers associate the domain values (which in general are not integers) with integer values of the underlying system; i.e., before a constraint is sent to the solver it is rewritten using the corresponding integer values, and after solving the inverse mapping is applied.

For more complex constraints (quantifications and disjunctions) that the SWI-Prolog solvers do not handle, specific solvers have been implemented (see Appendices C.2 and C.3 of [A.3]).
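The value-to-integer translation described for FD can be sketched as follows. This is a Python illustration under stated assumptions, not the system's Prolog code: the names `make_codec` and `solve_membership` are hypothetical, and the call to the underlying integer solver is replaced by a trivial step that just keeps the integer codes.

```python
def make_codec(domain):
    """Associate each enumerated value with an integer code, and back."""
    to_int = {value: code for code, value in enumerate(domain)}
    from_int = dict(enumerate(domain))
    return to_int, from_int


def solve_membership(domain, allowed):
    """Solve 'X in allowed' over an enumerated domain via integer codes."""
    to_int, from_int = make_codec(domain)
    codes = sorted(to_int[v] for v in allowed)   # rewrite with integer codes
    # An integer solver would run here; this sketch simply keeps the codes.
    return [from_int[c] for c in codes]          # map the answer back


branches = ["lon", "ny", "mad", "par"]           # branch_dt from the text
solution = solve_membership(branches, {"mad", "lon"})
print(solution)  # ['lon', 'mad']
```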

2.3.4. Aggregate functions

Aggregate functions are commonly used in relational database systems to obtain summary values from sets of values that are not necessarily numeric. This section explains how we extend our constraint deductive database system to handle aggregates within the constraint system.

When adding aggregate functions to the constraint language of our database system, some characteristics must be taken into account. Aggregate functions take a set of values, not necessarily numeric, and return a single value. Moreover, since an answer constraint may represent an infinite set, some aggregates (for example, count) may be meaningless. To overcome these obstacles we exploit certain aspects of the fixpoint semantics of HH¬(C).

On the one hand, the stratified fixpoint semantics, which was designed to handle negation, has proved to be a suitable framework for incorporating aggregate functions. Since the computation must remain monotonic, the result of an aggregate function over a stratum must not change dynamically while the fixpoint for that stratum is being computed.

For example, the number of tuples in a stratum can only be determined once that stratum has been completely computed. To make sure this computation has finished before the aggregate function is evaluated, the relation the aggregate refers to must lie in a lower stratum; to this end new dependencies are added to the graph, analogously to the way negation is handled in the system.

Aggregates can be represented as functions of the constraint language; therefore, their computation is delegated to the corresponding constraint solver.

On the other hand, to compute an aggregate function over a predicate p, for each pair $(p(x_1, \ldots, x_n), C)$ in the current interpretation, C must restrict each variable $x_1, \ldots, x_n$ to a single value. For instance, the sum of the values of variable X in predicate p cannot be computed if the interpretation contains the pair (p(X),X>3), since its value would be infinite. When this condition does not hold, our system cannot obtain a result for the function.
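The requirement that every variable be bound to a single value can be made concrete with a short sketch. It is a Python illustration, not the system's implementation: the function `aggregate_sum` is hypothetical, and an argument that the constraint does not fix to one value (as in (p(X),X>3)) is encoded here as None.

```python
def aggregate_sum(rows, position):
    """Sum the argument at `position` over all rows of a predicate.

    rows: list of argument tuples; None marks an argument whose constraint
    does not pin it to a single value (an assumption of this sketch).
    """
    total = 0.0
    for row in rows:
        if any(value is None for value in row):
            raise ValueError("aggregate undefined: unbound variable in a pair")
        total += row[position]
    return total


# Hypothetical pastDue(Client, Amount) pairs, all bound to concrete values.
print(aggregate_sum([(1.0, 100.0), (2.0, 200.0)], 1))  # 300.0
```

Calling `aggregate_sum([(None,)], 0)` raises an error, which corresponds to the case in which the system cannot produce a result.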

The following examples illustrate aggregate functions in our system. We implemented count over a predicate, and sum, avg (average), min (minimum), and max (maximum) over the variables of a predicate. Aggregate functions occur inside a constraint of the system:

constr(dom,res_agg).

where dom is the domain of the constraint and res_agg is a constraint that may include aggregate functions. A concrete example of res_agg is:

X = min(p(A,B,C),B)

which computes the minimum aggregate over the predicate p. The result is a constraint X = val, where val is the minimum of the values taken by variable B over all pairs associated with p in the fixpoint of the database. The only function with a different notation is count, whose single argument is the predicate it applies to.

Implementation details can be found in Appendix C.2 of [A.3]. We now present some practical examples.

Example 11 The view:

liquid(Amount) :- constr(real,Amount=sum(client(N,B,S),B)).

defined for the bank database of Example 6, computes the sum of the clients' balances by including the aggregate function sum in the constraint. Another example, which uses the aggregate functions sum and count to define the average salary, is:

avg_salary(Average) :- constr(real,Average=sum(client(N1,B1,S1),B1)/count(client(N2,B2,S2))).

The richness of the HH¬(C) database language makes it possible to compute an aggregate function combined with an assumption that changes the aggregated values. For example, the following view contains the hypothesis that the client Brown has a debt:

view(X) :- pastDue(brown,200.0) => constr(real,X=sum(pastDue(N,A),A)).

This view returns the sum of all client debts in the example database, assuming a new debt for Brown. After adding this view to the database, we can compute the result:

HHn(C)> view(X).

which has the answer X=3300. □

As with a clause that has a negated atom in its body, if one of the clauses defining a predicate p contains an aggregate over the predicate q, the computation of q must have finished before the computation of p starts. This is easily achieved by introducing a negative dependency from q to p, which guarantees that the stratum of q is lower than that of p, i.e., s(q) < s(p) in the stratification. For the examples above, the following must hold:

s(client) < s(liquid), s(client) < s(avg_salary),

and also:

s(pastDue) < s(view).

2.3.5. Integrity constraints

The term data integrity refers to the correctness of the data in a database with respect to the integrity constraints specified by the user. The integrity of stored data can be lost in many ways; for example, invalid data may be added to the database, such as an order that refers to a non-existent product. Integrity constraints must be defined by the person who models the database.

This section presents our proposal for implementing integrity constraints [A.2] in the HH¬(C) system. The idea behind it is as follows:

First, the user defines a predicate in the HH¬(C) language to specify the integrity constraint.

During the fixpoint computation, a constraint is associated with that predicate.

› If the constraint is satisfied and simplifies to true, the computation ends normally.

› If, on the contrary, it simplifies to false, the integrity constraint is considered violated, so the computation stops and an error message is shown.

In general, several aspects of the system implementation must be taken into account when implementing integrity constraints; they are detailed below.

Declaring integrity constraints

As a first example of integrity constraints, we extend Example 6 again. The primary key constraint (which would prevent two clients from being entered with the same identifier) can be specified by defining a relation in the database whose answer constraint is true when the condition that all clients have different identifiers does not hold. Specifically, we add the following clause to the database:

pk_id_fails :- client_id(A,B), client_id(A,D), constr(real,(B/=D)).

This example suggests that integrity constraints can be expressed easily in the HH¬(C) language, since it includes constraints.

Violation of an integrity constraint

Our system implements an iterative fixpoint computation that adds tuples at each interpretation. Therefore, a safe moment to detect that a constraint is violated is when tuples are added. Note that, since implications are allowed, a constraint may also be violated during a local computation (needed for a hypothetical query) that is later discarded. The following example shows how an integrity constraint may be violated in a local computation.

Example 12 Consider the following program

p(0,0).
q :- p(0,1) => r.

If we assume the predicate pk_p_fails defined as

pk_p_fails :- p(A,B), p(A,D), constr(data,(B/=D)).

in the local computation where p(0,1) is added this integrity constraint would be violated. However, if the check were postponed to the end of the computation, this violation would never be detected. □

Therefore, as just indicated, a safe approach is to perform the check every time tuples are added to the interpretation.
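A minimal sketch of this "check on every insertion" policy follows. It is written in Python for illustration only; the names `IntegrityError` and `add_tuples`, and the convention that the key is a prefix of the tuple, are assumptions of the example rather than features of the system.

```python
class IntegrityError(Exception):
    """Raised as soon as a primary key violation is detected."""


def add_tuples(interpretation, new_tuples, key_len=1):
    """Add tuples to the interpretation, checking the key at every insertion.

    The first `key_len` arguments are assumed to form the primary key.
    """
    for candidate in new_tuples:
        for stored in interpretation:
            if stored[:key_len] == candidate[:key_len] and stored != candidate:
                raise IntegrityError(f"key {candidate[:key_len]} already in use")
        interpretation.add(candidate)


p = {(0, 0)}                 # p(0,0) from Example 12
add_tuples(p, [(1, 5)])      # accepted: a new key
try:
    add_tuples(p, [(0, 1)])  # hypothesis p(0,1): same key, different value
    violation_detected = False
except IntegrityError:
    violation_detected = True
print(violation_detected)    # True
```

Because the check runs at insertion time, the violation caused by the hypothetical tuple is caught even though that tuple would be discarded once the local computation ends.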

Response to the violation of an integrity constraint

Database systems usually raise an exception to signal the violation of an integrity constraint. In our system, as soon as a violation is detected the computation stops and an error message is shown.

Primary key implementation

This section focuses on primary keys, although our system also supports foreign key constraints and functional dependencies, which are handled similarly (see Section 3 of [A.2]).

When a user defines a database and wants to include a primary key integrity constraint over a subset of the arguments of the predicate pred, an additional predicate of the form pk(pred(X1,...,XN),(Xi,...,Xj)) must be added, where (Xi,...,Xj) is the subset of the variables (X1,...,XN) that make up the primary key.

Returning to Example 6, we use a Prolog directive to introduce the predicate that must be added to the database to specify the primary key constraint over a relation that assigns a password to each client identifier.

:- pk(client_pwd(Id,Pwd),Id).

This means that Id is the primary key of client_pwd. Hence, if the database already contains:

client_pwd(1,123)

this constraint prevents any other password from being assigned to identifier 1.

As noted above, the system delegates the computation of integrity constraints to the constraint solver. It computes an answer constraint that is true if the constraint holds and false otherwise. If it is false, the computation stops and the error message is shown. All the details of the implementation of integrity constraints in our system, together with a step-by-step example, can be found in [A.2].

2.3.6. Computing the fixpoint semantics

The implementation of the system is based on the fixpoint semantics presented in Section 2.2.2. However, because of nested implications, some aspects of the implementation require further explanation. This section gives an overview of the implementation and shows how the forcing relation is handled in the case of implication. From now on, the constraint solvers are invoked as black boxes through their generic interface solve.

We assume that $\Delta$ is a stratifiable database whose stratification s has been computed in earlier phases. As already noted, $\Delta$ is stored together with the stratification s (represented as a list Stratification). The fixpoint is computed stratum by stratum, although occasionally a stratum must be recomputed locally because of nested implications. This case is explained in detail in Section 2.3.7.

To begin with, the predicate:

fixPointStrat(+Delta, +Stratification, +St, -Fix)

computes Fix = $fix_{St}(\Delta)$ using Stratification, where Delta is a database such that $str(\Delta) = k$. This predicate returns $fix_k(\Delta)$ by successively computing the fixpoints of the strata from St = 1 up to St = k.
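The stratum-by-stratum strategy can be sketched for the much simpler case of ground rules with negation. The Python code below is an illustration under that assumption, not a rendering of the Prolog predicate: it ignores constraints and implications, and the rule encoding and the name `fix_point_strat` are hypothetical.

```python
def fix_point_strat(rules, stratum_of, k):
    """Compute the fixpoint of strata 1..k for ground rules.

    rules: list of (head, positive_body, negative_body) over ground atoms.
    stratum_of: dict mapping each head atom to its stratum.
    """
    fix = set()
    for st in range(1, k + 1):
        layer = [rule for rule in rules if stratum_of[rule[0]] == st]
        changed = True
        while changed:                      # iterate the operator for stratum st
            changed = False
            for head, positive, negative in layer:
                if (head not in fix
                        and all(atom in fix for atom in positive)
                        and not any(atom in fix for atom in negative)):
                    fix.add(head)
                    changed = True
    return fix


# Hypothetical ground program: q.   p :- not r.   s :- p, q.
rules = [("q", [], []), ("p", [], ["r"]), ("s", ["p", "q"], [])]
strata = {"q": 1, "p": 2, "s": 2}
print(sorted(fix_point_strat(rules, strata, 2)))  # ['p', 'q', 's']
```

Negated atoms are only looked up in strata that are already complete, which is what makes evaluating them against the current set safe in this sketch.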

Each fixpoint is computed by iterating the operator introduced in Definition 7. As noted in previous sections, the forcing relation is implemented by the predicate:

force(+Delta,+Stratification,+I,+G,-C)

Recall that the forcing relation used to compute the fixpoint works on an interpretation $I = T_i^n(fix_{i-1})(\Delta)$, for some $n \geq 0$ and a given stratum $i > 0$. The call to force succeeds if:

$T_i^n(fix_{i-1}), \Delta \Vdash (G, C).$

The predicate force is implemented deterministically, by case analysis on the syntax of the goal G. The full code implementing the forcing relation appears in Appendix D of [A.3].

Finally, note that the predicate force(Delta,Stratification,I,G,C) must build an answer constraint C such that $I, \Delta \Vdash (G, C)$. This answer constraint is either a simple constraint or a disjunction of simple constraints, as described in Section 2.3.3.

All the details of the implementation of the forcing relation, including the Prolog code, are explained in [A.1].

2.3.7. The case of implication

The implementation of force(Delta,Stratification,I,(D=>G),C) requires an additional explanation, which follows the presentation in Section 8.1 of [A.3]. The Prolog code implementing the forcing of implication in the system is shown below.

force(Delta,Stratification,I,(D=>G),C) :-
    !,
    elab(D,De),                                    % hypothesis D to internal form
    localClauses(De,Ls),
    addLocalClauses(Ls,Delta,Delta1),              % Delta1 is Delta extended with D
    getMaxStrat(G,Stratification,StG),             % highest stratum among predicates of G
    fixPointStrat(Delta1,Stratification,StG,Fix),  % local fixpoint up to stratum StG
    force(Delta1,Stratification,Fix,G,C).          % force G on the local fixpoint

The Prolog predicates elab, localClauses, and addLocalClauses transform the program clauses into the internal representation handled by the system; they are explained in detail in Appendix D of [A.1].

Following the definition of the forcing relation (see Definition 6), the database $\Delta$ is extended with the clause D (the hypothesis). So far, the interpretation I has been computed with respect to the original database $\Delta$. If we consider stratum i and iteration n, then $(A, C) \in I$ means $(A, C) \in T_i^n(I')(\Delta)$, where $I'$ is the fixpoint of stratum $i-1$ built for $\Delta$. According to the theory, the next step would be to prove:

$T_i^n(I'), \Delta \cup \{D\} \Vdash (G, C).$

But the problem is how to compute $T_i^n(I')(\Delta \cup \{D\})$. Here our interpretation I is not valid, for two reasons:

First, the relation $I(\Delta) \subseteq I(\Delta \cup \{D\})$ does not hold for every I, $\Delta$, and D.

Second, I has been built for the database $\Delta$. In particular, the fixpoint $I'$ has been computed for $\Delta$ and represents the fixpoint $fix_{i-1}(\Delta)$.

Therefore, we lack the necessary information about the required set, i.e., the information for the extended database:

$T_i^n(I')(\Delta \cup \{D\}).$

The problem is that the fixpoint operator $T_i$ is not constructive in the case of implication, because the set of clauses grows. To solve it, syntactic restrictions have been imposed that guarantee a correct and terminating computation, again relying on the stratification and the dependency graph, as explained next.

Let $StG = \max\{St \mid (p, St) \in$ Stratification, p a predicate symbol in G$\}$. The fixpoint of stratum StG for $\Delta \cup \{D\}$ is computed locally, and we must prove that:

$fix_{StG}, \Delta \cup \{D\} \Vdash (G, C).$

Now consider a clause in $\Delta$ of the form A :- D=>G, with i = str(A). From Definition 3 we can deduce $StG \leq i$. During the computation of $fix_i(\Delta)$, the pair (A,C) will be added to the interpretation I and, following the definition of the forcing relation, we must prove that:

$I, \Delta \Vdash (\exists x (A \approx A' \wedge D \Rightarrow G), C)$

After eliminating the existential quantifiers, we first run:

force(Delta,Stratification,I,A ≈ A',C)

and then the call:

force(Delta,Stratification,I,(D=>G),C).

This second force will in turn call:

fixPointStrat(Delta1,Stratification,StG,Fix),

where Delta1 = $\Delta \cup \{D\}$. Up to this point the computation is correct. However:

If StG = i, then when building fix_i(Delta1) the clause A :- D=>G must be proved again, since the stratum of A is i. This leads to a non-terminating loop, because force(Delta1,Stratification,I,(D=>G),C) is executed and Delta1 is extended once more with the elaboration of D, yielding Delta2, which is again extended with D, and so on forever.

If StG < i, then Fix = fix_StG(Delta1) can be built correctly since, as str(A) = i, the clause A :- D=>G is not considered in strata lower than i.

In conclusion, we need to guarantee that StG < str(A). This requires a negative dependency between the highest-stratum predicate in G and the predicate symbol of A; to impose it, we add to the dependency graph negatively labelled arcs from every defined predicate symbol occurring in G to the predicate symbol of A.
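The extra arcs and the condition they enforce can be sketched as follows. This Python fragment is only illustrative: the names `implication_arcs` and `satisfies_condition` and the triple encoding (source, target, negative) are assumptions of the example, with a negative arc from q to p read as s(q) < s(p).

```python
def implication_arcs(head, goal_predicates):
    """Negative arcs from every predicate in G to the head predicate of A."""
    return [(g, head, True) for g in sorted(goal_predicates)]


def satisfies_condition(stratification, head, goal_predicates):
    """Check StG < str(A) for a clause A :- D => G."""
    st_g = max(stratification[g] for g in goal_predicates)
    return st_g < stratification[head]


# Clause p(X) :- q(X) => r(X): the goal G mentions only r.
print(implication_arcs("p", {"r"}))                                   # [('r', 'p', True)]
print(satisfies_condition({"p": 2, "q": 1, "r": 1}, "p", {"r"}))      # True
print(satisfies_condition({"p": 1, "q": 1, "r": 1}, "p", {"r"}))      # False
```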

This additional condition adds further syntactic restrictions on when a database is stratifiable. It nevertheless keeps the implementation sound and complete with respect to the theoretical framework. It entails a loss of expressiveness, since more databases become non-stratifiable; in practice, however, it is hard to find examples of real databases that do not satisfy this new syntactic restriction.

The following example, taken from Section 8.2 of [A.3], shows how implication works for a concrete database.

Example 13 Consider the database $\Delta$:

q(a).
r(c).
q(b).
p(X) :- q(X) => r(X).

As explained above, with a stratification in which all predicates p, q, and r belong to stratum 1, we would obtain an infinite sequence of calls

fixPointStrat(Delta, Stratification, 1, Fix)

fixPointStrat(Delta ∪ {q(X)}, Stratification, 1, Fix)

fixPointStrat(Delta ∪ {q(X)} ∪ {q(X)}, Stratification, 1, Fix)

...

However, with the new definition of the dependency graph, Stratification forces p into stratum 2, while q and r remain in stratum 1.

For the first stratum:

fixPointStrat(Delta, Stratification, 1, Fix1)

yields Fix1={(q(X),X=a),(q(X),X=b),(r(X),X=c)}, because p(X) :- q(X)=>r(X) is no longer considered in this stratum. For the second and last stratum, in the call:

fixPointStrat(Delta,Stratification,2,Fix2)

Fix2 is built from Fix1 by adding the pairs of the relation p. In the first iteration, the clause defining p requires running:

force(Delta,Stratification,Fix1,q(X)=>r(X),C),

to compute the constraint C and add (p(X), C) to Fix1. The execution has two steps:

1. The database Delta is extended with q(X), giving Delta1, and the fixpoint of stratum 1 (the stratum of r) is evaluated locally for the extended database Delta1. This is done by the call:

fixPointStrat(Delta1,Stratification,1,Fix1').

Since p is no longer in stratum 1, Fix1' can be computed correctly as:

Fix1'={(q(X),true), (q(X),X=a), (q(X),X=b), (r(X),X=c)}.

2. Forcing of the goal r(X) with the new fixpoint through the call:

force(Delta1,Stratification,Fix1',r(X),C)

which returns the constraint C as X=c.

After this step, (p(X), X=c) is added to the previous interpretation, giving:

Fix2 = {(p(X),X=c), (q(X),X=a), (q(X),X=b), (r(X),X=c)}

as the final fixpoint for Delta that we were looking for. □
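The two steps of this example can be replayed with a ground approximation. The Python sketch below is not how the system works internally: the system treats the hypothesis q(X) symbolically, as the pair (q(X),true), whereas this illustration assumes a finite domain {a, b, c} and tries each constant in turn. The function names are hypothetical.

```python
def local_fixpoint(facts):
    """Stratum 1 of this example contains only facts for q and r,
    so its fixpoint is the fact set itself."""
    return set(facts)


def answers_for_p(facts, domain):
    """Values x for which q(x) => r(x) can be forced."""
    answers = []
    for x in domain:
        extended = facts | {("q", x)}      # Delta extended with the hypothesis
        fix1 = local_fixpoint(extended)    # step 1: local stratum-1 fixpoint
        if ("r", x) in fix1:               # step 2: force the goal r(x)
            answers.append(x)
    return answers


facts = {("q", "a"), ("r", "c"), ("q", "b")}
p_answers = answers_for_p(facts, ["a", "b", "c"])
print(p_answers)  # ['c']
```

The only value obtained is c, in agreement with the pair (p(X), X=c) added to Fix2 above.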

Con este ejemplo terminamos el segundo capítulo del trabajo, que resume las publicacio-nes [A.1,A.2,A.3], donde presentamos el lenguaje de bases de datos deductivas HH:(C), susaportaciones, su fundamento teórico y su implementación.

58

Publicaciones asociadas al capítulo 2[A.1] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez.Implementing a Fixpoint Semantics for a Constraint DeductiveDatabase based on Hereditary Harrop Formulas.En Procedings of the 11th International ACM SIGPLAN Symposium of Principles andPractice of Declarative Programing (PPDP’09), páginas 117–128. ACM Press, 2009.! Página 116

[A.2] G. Aranda, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández.Incorporating Integrity Constraints to a Deductive Database System.En XI Jornadas sobre Programación y Lenguajes, PROLE2011 (SISTEDES)editores: Purificación Arenas, Victor M. Gulías y Pablo Nogueira, páginas 141–152,Septiembre, 2011.! Página 128

[A.3] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández.An Extended Constraint Deductive Database: Theory and imple-mentation.The Journal of Logic and Algebraic Programming, volumen 21, páginas 20–52, 2013.! Página 140


Chapter 3

Extended recursion and hypothetical reasoning in relational database systems

In this chapter we present the second part of this work: the relational database languages R-SQL and HR-SQL, together with their semantic foundations and their implementations. The theoretical foundations give a semantics to both languages, which are based on the Structured Query Language (SQL). The novelty of R-SQL is an extension of standard SQL that admits relations defined using non-linear recursion and mutual recursion. HR-SQL, in addition to extended recursion, includes the definition of views and hypothetical queries. We have implemented two systems in SWI-Prolog, one for R-SQL and one for HR-SQL, as additional layers on top of existing relational database management systems. These systems process the databases of these languages according to the proposed semantics and materialize their relations as tables in the underlying database management systems.

3.1. Introduction

The Structured Query Language (SQL) is the standard language for defining and querying relational databases. SQL was introduced as a declarative language that originally lacked recursion [51]. The earliest work underpinning the relational model is the relational algebra (RA) that Codd presented in the 1970s [28]. Current database systems that use SQL (which is founded on RA) conform to the ANSI/ISO standard [36] and support recursion only partially, since they allow neither non-linear recursion nor mutual recursion.

Another expressive limitation of the standard is that hypothetical relations cannot be defined, as some deductive database systems allow [101, 14]. In the relational field there is work on hypothetical queries in online analytical processing (OLAP) environments [7, 133], business intelligence [38], and electronic commerce [132]. These works make it possible to assume a set of tuples as hypotheses in the context of a query.

In this chapter we present two extensions of SQL that overcome some of the expressive limitations of the standard. First, R-SQL extends SQL with a more general treatment of recursion, allowing non-linear recursive definitions and mutual recursion. Second, HR-SQL extends R-SQL so that hypothetical information can be handled both in views and in queries. This is a novelty, since hypotheses are added to a language based on SQL. Both languages are inspired by the HH¬(C) language presented in the previous chapter. We have also developed two implementations in SWI-Prolog [129], one for each of the proposed languages.

Publications

The publications on which this chapter is based are the following:

In [B.1] we propose the R-SQL language, whose goal is to overcome the limitations of recursion in the standard. The idea behind this publication is to adapt techniques from deductive databases, such as the stratified fixpoint semantics, to define the semantics of an extended relational database. We also adhere to the original RA model [28], which avoids duplicates and null values (see also [30]). In this dissertation we present a practical system for R-SQL based on its fixpoint semantics. The system is built as an additional layer on top of a relational database management system (RDBMS from now on). This layer is implemented in SWI-Prolog, the RDBMS is PostgreSQL, and Python is used as the communication language between Prolog and the PostgreSQL database. The system generates Python programs (which we refer to as scripts) consisting of imperative instructions together with SQL statements. Running these Python programs automates access to the RDBMS and, by means of loops, computes the fixpoint of the database following the proposed semantics. The result of running these scripts is the materialization, in tables of the RDBMS, of the tuples that correspond to the R-SQL relations.

In [B.3] we extend and improve the R-SQL system. This article focuses entirely on the system, its implementation and the analysis of its performance. In it we define a new stratification that maximizes the number of strata in order to improve efficiency (see Section 3.4.2). Besides other minor improvements, we simplify the computation of the database fixpoint by moving the base case of recursive definitions out of the loops.

Finally, in [B.2] we address our proposal for hypothetical queries over relational databases [50] by presenting the HR-SQL scheme. As with R-SQL, the implementation of the HR-SQL system is founded on a stratified fixpoint semantics. This implementation is also built as a layer, in this case on top of IBM's DB2 RDBMS in its freely distributed edition. We implement the fixpoint computation of the relations with a program written in SQL PL, the fourth-generation language integrated in DB2.

We now present the contributions of these publications.


Contributions

The first contribution of this part of the thesis is the formalization and design of two relational database systems, R-SQL and HR-SQL, which extend current RDBMSs with generalized recursion, in particular mutual and non-linear recursion (in R-SQL), and additionally with hypothetical reasoning in views and queries (in HR-SQL). Hypothetical reasoning is built on top of R-SQL by extending the syntax with the assume statement, which introduces hypotheses in queries and also makes it possible to define views that combine recursion and hypotheses.

Regarding the formalization, we provide a formal semantics for the relational model that incorporates techniques commonly used to give foundations to deductive databases [122]. A stratified fixpoint semantics has been defined for both languages.

Based on this theoretical framework we have designed two implementations in SWI-Prolog, one for R-SQL and one for HR-SQL, which are added as layers on top of existing RDBMSs and thereby extend them with new functionality. Specifically, the first extends PostgreSQL and the second extends DB2. We have also implemented a prototype of R-SQL on top of MySQL, since this RDBMS does not allow recursion in its queries.

The R-SQL and HR-SQL systems generate scripts in Python and SQL PL, respectively, which create new tables in the RDBMS holding the tuples prescribed by the proposed semantics. In addition, HR-SQL uses temporary tables to compute the result of views and queries with hypotheses. Tables of this kind are more efficient in terms of performance and are discarded once the computation finishes, which makes them well suited to this purpose. HR-SQL uses the notion of dependencies between relations, which comes from deductive databases [122], to determine which tables must be recomputed.
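As a rough illustration of the kind of loop-based script described above, the following sketch materializes the travel relation of Example 14 in a table by repeating an INSERT until no new tuples appear. It is only a sketch under explicit assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it is self-contained, the function name materialize_travel and the sample data are hypothetical, and it is not the code produced by the actual R-SQL system.

```python
import sqlite3

def materialize_travel(links):
    """Materialize travel (transitive closure of link) by fixpoint iteration.

    Assumption: sqlite3 stands in for the real RDBMS; `links` is a list of
    (ori, des, time) tuples describing an acyclic set of connections.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE link(ori TEXT, des TEXT, time REAL)")
    cur.execute("CREATE TABLE travel(ori TEXT, des TEXT, time REAL)")
    cur.executemany("INSERT INTO link VALUES (?, ?, ?)", links)

    while True:
        before = cur.execute("SELECT COUNT(*) FROM travel").fetchone()[0]
        # Evaluate the definition of travel against the current table and
        # insert only the tuples that are not already materialized.
        cur.execute("""
            INSERT INTO travel
            SELECT ori, des, time FROM (
                SELECT ori, des, time FROM link
                UNION
                SELECT link.ori, travel.des, link.time + travel.time
                FROM link, travel
                WHERE link.des = travel.ori
            )
            EXCEPT
            SELECT ori, des, time FROM travel
        """)
        after = cur.execute("SELECT COUNT(*) FROM travel").fetchone()[0]
        if after == before:  # no new tuples: the fixpoint has been reached
            break

    rows = sorted(cur.execute("SELECT ori, des, time FROM travel").fetchall())
    conn.close()
    return rows
```

With the hypothetical input [("A", "B", 1.0), ("B", "C", 2.0)] the loop stops after three iterations; on cyclic data the accumulated time attribute would keep producing new tuples, so the sketch assumes an acyclic link table.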

Finally, we also propose improvements in the fixpoint computation of HR-SQL relations with respect to R-SQL, which improve the performance of the HR-SQL system (see Section 3.6).

3.2. Extending SQL

We begin this section by presenting the database definition language of R-SQL and HR-SQL. Both languages use the same syntax for defining the relations of a database, so throughout this section we refer to HR-SQL, with the understanding that everything concerning the definition of relations also holds for R-SQL.

In Subsections 3.2.1 and 3.2.2 we present, respectively, the query language and the view definition language, which are specific to HR-SQL, which in turn extends R-SQL as presented in [B.2].

HR-SQL uses a syntax very close to SQL. Unlike the standard, however, we have made some changes in order to introduce recursive relations when databases are defined. The definition of a database consists of assignments of standard SQL query statements (which we refer to as select statements or sel_stm) to relation names (R) together with the names and types of their fields (which we refer to as the schema of the relation, or sch). Thus a relation definition in HR-SQL has the form:

R sch := sel_stm;

where R may appear in sel_stm, in which case the definition is recursive.

A database db is a non-empty sequence of relation definitions, which may therefore be recursive. The formal syntax of HR-SQL databases is defined by the grammar rules shown in Figure 3.1.

db ::= R sch := sel_stm; ... R sch := sel_stm;

sch ::= (A T,...,A T)

sel_stm ::= select exp,...,exp [from R,...,R [where cond]] |
            sel_stm union sel_stm | sel_stm except sel_stm

exp ::= C | R.A | exp m_op exp | -exp

cond ::= true | false | exp b_op exp | not cond |
         cond [and|or] cond

m_op ::= + | - | / | *

b_op ::= = | <> | < | > | >= | <=

Figure 3.1: Grammar rules of the HR-SQL language

In the figure, besides the grammatical categories already introduced, T stands for the SQL types found in the standard, such as integer, float and varchar(n); cond stands for Boolean conditions; m_op and b_op stand for arithmetic and comparison operators, respectively; and C stands for any valid SQL constant.

We use square brackets to indicate that the enclosed fragment is optional, and R.A denotes an attribute A qualified by the relation R to which it belongs. As in SQL, we use * as syntactic sugar for the projection list of all the attributes of the relations in the from clause of a sel_stm statement. For readability, we write in lowercase the HR-SQL reserved words that come from SQL (such as select, where, from), whereas they appear in uppercase in the examples of standard SQL.

Next we introduce some sets of relations that will be useful when defining the semantics of the language.

$RN_{db}$ denotes the set of relation names $\{R_1, \ldots, R_n\}$ defined in the database db.

$RN_{sel\_stm}$ denotes the set of relation names that appear in sel_stm.

For the case sel_stm = sel_stm1 except sel_stm2 we also define $RN_{\neg sel\_stm}$ as the set of relation names that appear in sel_stm2 (note that $RN_{\neg sel\_stm} \subseteq RN_{sel\_stm}$).

We assume that for every R sch := sel_stm defined in db it holds that $RN_{sel\_stm} \subseteq RN_{db}$.

Next we propose an example with another application of the transitive closure of a graph, this time to a travel database that combines different means of transport. We will extend this example throughout the chapter to illustrate different advantages of our language.

Example 14 The database shown below is inspired by an example that appears in [24] and represents the means of transport shown in Figure 3.2 for managing trips in the Canary Islands. These trips may take place between different islands or within a single island.


Specifically, this database includes relations for flights (flight), buses (bus) and boats (boat), all of them with the following schema:

(ori varchar(10), des varchar(10), time float)

which stores the origin (ori), destination (des) and duration (time) of a series of possible journeys in the Canary Islands.

Figure 3.2: Graphical representation of the possible means of transport in the Canary Islands for Example 14.

The relation link includes the possible journeys using any of the above means of transport:

link(ori varchar(10), des varchar(10), time float) :=
  select * from flight union
  select * from boat union
  select * from bus;

The relation travel is the transitive closure of link, i.e., it gives information about the possible trips that use the connections in the database, which may or may not chain several means of transport. Its definition in the proposed syntax is shown below.

travel(ori varchar(10), des varchar(10), time float) :=
  select * from link union
  select link.ori, travel.des, link.time + travel.time
  from link, travel
  where link.des = travel.ori;

Because travel appears in its own definition, this relation yields a table with the possible travel itineraries and their corresponding durations.

For its part, SQL-99 [36] admits recursive view definitions through the WITH RECURSIVE clause, as we show next by means of an example. The extensional part of the following database includes the tables mother and father, with schema (parent varchar(20), child varchar(20)), which state respectively that the first attribute is the mother or the father of the second. To establish the ancestor relation in SQL we must first define the auxiliary view parent:


CREATE VIEW parent(parent,child) AS
  SELECT * FROM mother
  UNION
  SELECT * FROM father;

The view rec_ancestor establishes the ancestor relation:

CREATE OR REPLACE VIEW ancestor(ancestor,descendant) AS
  WITH RECURSIVE rec_ancestor(ancestor,descendant) AS (
    SELECT * FROM parent
    UNION ALL
    SELECT parent,descendant
    FROM parent, rec_ancestor
    WHERE parent.child=rec_ancestor.ancestor
  )
  SELECT * FROM rec_ancestor;

It is a recursive view because it refers to itself in the FROM clause of its definition. SQL implementations do not admit more than one base case in a recursive definition. This limitation does not apply to our language, in which the same view rec_ancestor can be formulated as:

rec_ancestor(ancestor varchar(10), descendant varchar(10)) :=
  select * from mother union
  select * from father union
  select ancestor, descendant
  from mother, father, rec_ancestor
  where father.descendant=rec_ancestor.ancestor or
        mother.descendant=rec_ancestor.ancestor;

As we can see, the SQL formulation is more complex because, among other things, it requires an auxiliary relation when there are several base cases (parent in the example). Moreover, recursive definitions can only appear in views and not in the relations of the database. The definition of rec_ancestor in HR-SQL is simpler, more readable and more compact.

The view travel of Example 14 can be formulated using standard SQL syntax:

CREATE OR REPLACE VIEW travel(ori,des,time) AS
  WITH RECURSIVE rec_travel(ori,des,time) AS (
    SELECT link.* FROM link
    UNION
    SELECT rec_travel.ori, link.des, link.time + rec_travel.time
    FROM rec_travel, link
    WHERE rec_travel.des = link.ori
  )
  SELECT * FROM rec_travel;

However, this view is not meaningful in SQL, because the standard requires UNION ALL, which preserves duplicates, whenever the reserved words WITH RECURSIVE are used [102]. Consequently, this view would be rejected by DB2, Oracle and PostgreSQL. On the other hand, neither Access nor MySQL provides recursion in SQL.

Another limitation of SQL-99 is that it does not allow mutual recursion. The following example shows how two simple mutually recursive relations can be written in HR-SQL.


Example 15 The following relations even and odd represent, respectively, the sequences of even and odd numbers up to 100.

even(x integer) :=
  select 0 union
  select odd.x+1 from odd where odd.x<100;

odd(x integer) :=
  select even.x+1 from even where even.x<100;

Note that select 0 is a from-less SQL statement (accepted by many RDBMSs) that simply returns the tuple specified by the expressions of the projection list (0 in this case). Throughout this dissertation we frequently use from-less statements of this kind to specify the base cases of recursive definitions. Another alternative offered by some RDBMSs (such as Oracle) is to refer to the system table dual in the FROM clause of the select statement (select 0 from dual). HR-SQL admits both formulations for defining base cases. More examples of relations with non-linear recursion and mutual recursion can be found in Section 2 of [B.1].
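The mutually recursive definitions of Example 15 can be read as a simultaneous fixpoint computation over sets. The following minimal Python sketch mimics that reading; the function name even_odd and the parameter limit are illustrative and not part of HR-SQL.

```python
def even_odd(limit=100):
    """Compute the relations even and odd of Example 15 by naive iteration.

    Both relations start empty and are re-evaluated from their definitions
    until neither of them changes (the fixpoint).
    """
    even, odd = set(), set()
    while True:
        # even(x) := select 0 union select odd.x+1 from odd where odd.x < limit
        new_even = {0} | {x + 1 for x in odd if x < limit}
        # odd(x) := select even.x+1 from even where even.x < limit
        new_odd = {x + 1 for x in even if x < limit}
        if new_even == even and new_odd == odd:
            return even, odd
        even, odd = new_even, new_odd
```

Since the two relations depend on each other only positively, they belong to the same stratum and must be iterated together, which is why a single loop updates both sets.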

3.2.1. The query language

Queries in R-SQL, since they include no hypotheses, are restricted to the category sel_stm of Figure 3.1. To formulate hypothetical queries in HR-SQL we extend the select statements of R-SQL. The intended semantics of a hypothetical query is the result of querying the database whose definitions have been extended with the assumptions that follow the reserved word assume.

In Figure 3.3 we extend the grammar of Figure 3.1 to incorporate the query language that allows hypotheses.

query ::= sel_stm | sel_hyp

sel_hyp ::= assume hypo, ... , hypo sel_stm

hypo ::= sel_stm [not] in R

Figure 3.3: The query language of HR-SQL

Example 16 An example of an HR-SQL query for the database of Example 14 is: how long would it take to go from Madrid to Valverde, assuming that we do not use any of the boat journeys that take more than one hour? This query is expressed in HR-SQL as:

assume select * from boat where boat.time>1 not in link
(select travel.time from travel
 where travel.ori = 'MAD' and travel.des = 'VDE');

In this query, select * from boat where boat.time>1 not in link is the hypothesis (hypo) that affects the select statement (sel_stm), which we write in parentheses for readability.


3.2.2. The view definition language

In this section we explain how HR-SQL extends the definition language to allow views that may or may not be hypothetical. We use the query language of the previous subsection and define views in a similar way to relations: we give each view a name (this time without a schema) and assign a query to that name, so that it can be referenced from another view or query, and even recursively.

The approach we follow for the definition language of hypothetical views is similar to the one SQL follows for recursion, where recursive views are defined with WITH RECURSIVE. We allow hypothetical reasoning through assume once the database has already been computed, and we distinguish between relations, which define the database, and views, in which assumptions can be formulated.

From now on we use V for the names of views that are defined by a non-hypothetical query, i.e., views that are assigned a sel_stm (and are defined in the same way as the relations of the database). We use HV for hypothetical views, which are assigned a sel_hyp.

Figure 3.4 shows the syntax for defining views.

vd ::= view ... view

view ::= V sch := sel_stm; |
         HV sch := sel_hyp;

Figure 3.4: The view definition language of HR-SQL

We allow mutual recursion only for sequences of non-hypothetical view definitions, to prevent the calls of mutual recursion combined with hypotheses from leading to a non-terminating computation. Nevertheless, once a sequence of mutually recursive views has been declared, the name of any of these views can be used inside a new hypothetical view.

A sequence of view definitions for the database db, denoted vd, is a sequence of the form shown below, such that the relation names appearing in it are either relation names of db or view names of vd.

V1 sch1 := sel_stm1;

...

Vm schm := sel_stmm;

HV1 sch1 := sel_hyp1;

...

HVr schr := sel_hypr;

Note that the ordinary views are defined first and the hypothetical ones second. In addition, to avoid sequences of view definitions that are mutually recursive and hypothetical at the same time, we impose the following conditions:

For every $j = 1..m$, $V_j$ may appear in any view. However, $V_j$ cannot appear in an assume statement.

For every $j = 1..r$, $HV_j$ may appear inside the select statement of its own definition, $sel(sel\_hyp_j)$, but neither in $sel\_stm_1, \ldots, sel\_stm_m$, nor in $sel\_hyp_k$ for $k \neq j$, nor in the assumed part of $sel\_hyp_j$.


Next we show an example of the definition of a hypothetical view using the view definition language of HR-SQL.

Example 17 Building on Example 14, we pose a new hypothetical view:

Suppose that a volcanic eruption in El Hierro forces the closure of the airspace and the suspension of the bus service on the island. In addition, a boat from El Hierro to La Palma is added. Which places in the archipelago are reachable under these assumptions?

To understand this query better, the journeys are shown in Figure 3.5, where the journeys affected by the assumptions are drawn in red. We also distinguish negative assumptions (not in) from positive ones (in) by using dashed lines for the former and solid lines for the latter.

Figure 3.5: Graphical representation of the view of Example 17.

This query can be expressed with the view definition language of HR-SQL as:

reachable(ori varchar(10),des varchar(10)) :=
  assume (select * from bus where bus.ori = 'VDE' union
          select * from flight not in link),
         select 'RES','SPC',1.5 in boat
  select link.ori, link.des from link union
  select link.ori, reachable.des from link, reachable
  where link.des = reachable.ori

In this view we combine the hypotheses with the transitive closure of the journeys. The resulting tuples are:

[('TNF', 'LC'), ('GC', 'MP'), ('LC', 'GOM'), ('LP', 'TNF'),
 ('VAL', 'RES'), ('GOM', 'VAL'), ('TNF', 'GOM'), ('LP', 'LC'),
 ('GOM', 'RES'), ('LC', 'VAL'), ('LP', 'GOM'), ('TNF', 'VAL'),
 ('LC', 'RES'), ('TNF', 'RES'), ('LP', 'VAL'), ('LP', 'RES')]

As noted above, they correspond to all the possible journeys between the cities of the islands in the event of the volcanic eruption.


3.3. Theoretical foundations

This section deals with the theoretical foundations of the extended SQL language, which can be found in publications [B.1, B.2]. In [B.1] we propose a stratified fixpoint semantics for the R-SQL language (without hypothetical reasoning). In [B.2] we additionally propose a dedicated view definition language and a query language. To compute the semantics of the definition, view and query languages we use stratified fixpoint techniques, as we did for the HH¬(C) language. As is well known, stratification is based on the construction of a dependency graph [122]. The notion of a dependency graph between relations has proved useful for incorporating hypothetical reasoning into HR-SQL, since it is used to identify which relations must be recomputed when a view is defined or a query is posed.

Next we present the semantic foundations of HR-SQL and R-SQL. As usual for relational database languages, the meaning of each relation in a database is the set of tuples that satisfy its definition. In the following subsections we define the semantics of hypothetical views and queries, which are specific to HR-SQL.

3.3.1. Semantics for databases

We begin the section on theoretical foundations with the definition of the dependency graph of a database. From now on we write $RN_{db}$ for the set of relation names defined in the database db.

Definition 8 The dependency graph associated with a database db, denoted $DG_{db}$, is a directed graph where:

The nodes of $DG_{db}$ are the elements of the set $RN_{db}$, and

the arcs of $DG_{db}$ are defined as follows:

› For every relation R sch := sel_stm there is an arc from each relation name $R' \in RN_{sel\_stm}$ to R.
› The arcs that start at relation names belonging to $RN_{\neg sel\_stm}$ are labeled negatively.

For every pair of relations $R_1, R_2 \in RN_{db}$, we say that $R_2$ depends on $R_1$ if there is a path from $R_1$ to $R_2$ in $DG_{db}$, and that $R_2$ depends negatively on $R_1$ if there is a path from $R_1$ to $R_2$ in $DG_{db}$ with at least one negative arc.
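Definition 8 can be sketched in a few lines of Python. The representation below is an assumption made for illustration: a database is a dictionary mapping each relation name R to a pair (names, negated), standing for $RN_{sel\_stm}$ and $RN_{\neg sel\_stm}$ of its definition; the function names and the relation no_boat are hypothetical.

```python
from collections import deque

def dependency_arcs(db):
    """Return the arcs of the dependency graph as (source, target, negative)."""
    arcs = set()
    for target, (names, negated) in db.items():
        for source in names:
            arcs.add((source, target, source in negated))
    return arcs

def depends(arcs, r1, r2, negatively=False):
    """True if r2 depends on r1: there is a path from r1 to r2.

    With negatively=True the path must contain at least one negative arc.
    The search explores states (node, "a negative arc has been traversed").
    """
    start = [(t, neg) for (s, t, neg) in arcs if s == r1]
    seen = set(start)
    queue = deque(start)
    while queue:
        node, has_neg = queue.popleft()
        if node == r2 and (has_neg or not negatively):
            return True
        for (s, t, neg) in arcs:
            if s == node:
                state = (t, has_neg or neg)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return False

# Hypothetical database: Example 14 plus no_boat := link except boat.
db = {
    "flight": (set(), set()),
    "boat": (set(), set()),
    "bus": (set(), set()),
    "link": ({"flight", "boat", "bus"}, set()),
    "travel": ({"link", "travel"}, set()),
    "no_boat": ({"link", "boat"}, {"boat"}),
}
arcs = dependency_arcs(db)
```

Here travel depends on flight through link, while no_boat depends negatively on boat because boat occurs on the right-hand side of except.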

Stratification

Next, using the definition of $DG_{db}$ given above, we define the concept of stratification for a database db.

Definition 9 A stratification of a database db composed of n relations is a function $str : RN_{db} \to \{1, \ldots, n\}$ such that:

$str(R_i) \le str(R_j)$, if $R_j$ depends on $R_i$, and

$str(R_i) < str(R_j)$, if $R_j$ depends negatively on $R_i$.


We say that a database db is stratifiable if there exists a stratification for it. Moreover, we call str(R) the stratum of R. For sel_stm statements we define the stratum as $str(sel\_stm) = \max\{str(R_i) \mid R_i \in RN_{sel\_stm}\}$.
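One simple way to obtain a stratification in the sense of Definition 9, or to detect that none exists, is to raise strata until every arc constraint holds. The sketch below is only one possible procedure and is not claimed to be the one used by the R-SQL or HR-SQL systems; the function name stratify and the arc representation (source, target, negative) are assumptions.

```python
def stratify(relations, arcs):
    """Return a dict relation -> stratum, or None if db is not stratifiable.

    Every arc (source, target, negative) requires
    str(source) <= str(target), and str(source) < str(target) if negative.
    """
    n = len(relations)
    stratum = {r: 1 for r in relations}
    changed = True
    while changed:
        changed = False
        for (source, target, negative) in arcs:
            required = stratum[source] + (1 if negative else 0)
            if stratum[target] < required:
                if required > n:
                    # Strata above n only arise from recursion through negation.
                    return None
                stratum[target] = required
                changed = True
    return stratum

# Hypothetical arcs: travel is recursive, no_boat := link except boat.
relations = ["flight", "boat", "link", "travel", "no_boat"]
arcs = [("flight", "link", False), ("boat", "link", False),
        ("link", "travel", False), ("travel", "travel", False),
        ("link", "no_boat", False), ("boat", "no_boat", True)]
strata = stratify(relations, arcs)
```

The procedure assigns the lowest admissible stratum to every relation, so in this toy database only no_boat is pushed to stratum 2.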

For the theoretical notions we fix a concrete database, denoted db, and a stratification for it, denoted str. We also assume that every relation R has an associated tuple of attributes $A_i$ and types $T_i$ (which corresponds to the schema of R), written $R(A_1\ T_1, \ldots, A_r\ T_r)$, so that each type $T_i$, $i = 1..r$, represents its corresponding domain $D_i$. For example, the type integer represents the domain of the integers.

We also write $D$ for the universal domain, which is the union of the domains of all the types of the database. Since relations may have different arities, we use the set $\mathcal{T} = \bigcup_{n \ge 1} D^n$, in which the tuples of the relations of db take their values.

Again, as in Section 2.2.2 of Chapter 2, we use the concept of interpretation to give semantics to HR-SQL databases. Interpretations are functions that associate an element of $\mathcal{P}(\mathcal{T})$ (the powerset of $\mathcal{T}$) with each relation of $RN_{db}$, and they are also classified by strata. The following definition formalizes the notion of interpretation:

Definition 10 Let $i \ge 1$. An interpretation $I$ for a database db over a stratum $i$ is a function $I : RN_{db} \to \mathcal{P}(\mathcal{T})$ such that for every relation $R \in RN_{db}$ with schema sch:

If $sch \equiv (A_1\ T_1, \ldots, A_r\ T_r)$ and $D_1, \ldots, D_r$ are the domains associated with the types $T_1, \ldots, T_r$, respectively, then $I(R) \subseteq D_1 \times \cdots \times D_r$,

$I(R) = \emptyset$ whenever $str(R) > i$.

The semantics of HR-SQL databases is computed stratum by stratum. Each interpretation corresponds to the meaning of a stratum. We write $\mathcal{I}^{db}_i$ for the set of interpretations of db over the stratum $i$. We now define the order relation between interpretations:

Let $I_1, I_2 \in \mathcal{I}^{db}_i$. $I_1$ is less than or equal to $I_2$ at stratum $i$ (written $I_1 \sqsubseteq_i I_2$) if the following conditions hold for every $R \in RN_{db}$:

$I_1(R) = I_2(R)$, if $str(R) < i$, and

$I_1(R) \subseteq I_2(R)$, if $str(R) = i$.

For every $i$, $(\mathcal{I}^{db}_i, \sqsubseteq_i)$ is a partially ordered set. The idea is that when an interpretation over a stratum $i$ grows, the sets of tuples associated with the relations of that stratum may grow as well, whereas the sets of tuples associated with relations of lower strata remain unchanged. Moreover, $(\mathcal{I}^{db}_i, \sqsubseteq_i)$ is a complete partial order: if $\{I_n\}_{n \ge 0}$ is a chain in $(\mathcal{I}^{db}_i, \sqsubseteq_i)$, then $I$, defined as $I(R) = \bigcup_{n \ge 0} I_n(R)$ for $R \in RN_{db}$, is the least upper bound of $\{I_n\}_{n \ge 0}$.

The following definition formalizes the meaning of sel_stm in the context of an interpretation $I$.

Definition 11 Let $i \ge 1$ and $I \in \mathcal{I}^{db}_i$. Let sel_stm be a select statement such that $str(sel\_stm) \le i$. The interpretation of sel_stm with respect to $I$ for db, denoted $[\![sel\_stm]\!]^I$, is defined recursively as follows:

$[\![\texttt{select } exp_1, \ldots, exp_k]\!]^I = \{(\overline{exp_1}, \ldots, \overline{exp_k})\}$, where $\overline{exp_i}$ denotes the evaluation of $exp_i$.

$[\![\texttt{select } exp_1, \ldots, exp_k \texttt{ from } R_1, \ldots, R_m \texttt{ where } cond]\!]^I = \{(exp_1[a/A], \ldots, exp_k[a/A]) \mid a \in I(R_1) \times \cdots \times I(R_m) \text{ and } cond[a/A] \text{ is satisfied}\}$,

where $A$ stands for a sequence of attributes (prefixed with the relation to which they belong). If $A^j_1, \ldots, A^j_{r_j}$ are the attributes of $R_j$, $1 \le j \le m$, then:

› $A$ is the complete sequence $R_1.A^1_1, \ldots, R_1.A^1_{r_1}, \ldots, R_m.A^m_1, \ldots, R_m.A^m_{r_m}$;

› the notation $exp_j[a/A]$, $1 \le j \le k$, denotes the evaluation of $exp_j$ once $A$ has been replaced by $a$ in $exp_j$; and

› $cond[a/A]$ denotes the logical evaluation of cond under the same substitution.

$[\![sel\_stm_1 \texttt{ union } sel\_stm_2]\!]^I = [\![sel\_stm_1]\!]^I \cup [\![sel\_stm_2]\!]^I$, where $\cup$ is set union.

$[\![sel\_stm_1 \texttt{ except } sel\_stm_2]\!]^I = [\![sel\_stm_1]\!]^I \setminus [\![sel\_stm_2]\!]^I$, where $\setminus$ is set difference.
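The cases of Definition 11 can be mirrored by a small evaluator over Python sets. This is a sketch under stated assumptions: an interpretation is a dictionary from relation names to sets of tuples, expressions and conditions are passed as Python functions over a row that maps qualified attribute names "R.A" to values, and the name eval_select is hypothetical. Union and except then correspond to the set operators | and -.

```python
from itertools import product

def eval_select(interp, relations, exprs, cond=lambda row: True):
    """Evaluate `select exprs from relations where cond` against `interp`.

    relations: list of (name, attribute_names) pairs; an empty list models
    a from-less select, which yields exactly one tuple.
    """
    result = set()
    # a ranges over I(R1) x ... x I(Rm)
    for combo in product(*(interp[name] for name, _ in relations)):
        row = {}
        for (name, attrs), tup in zip(relations, combo):
            for attr, value in zip(attrs, tup):
                row[f"{name}.{attr}"] = value  # the substitution [a/A]
        if cond(row):
            result.add(tuple(e(row) for e in exprs))
    return result

# Hypothetical interpretation and the recursive branch of travel.
interp = {"link": {("A", "B", 1.0)}, "travel": {("B", "C", 2.0)}}
schema = ["ori", "des", "time"]
step = eval_select(
    interp,
    [("link", schema), ("travel", schema)],
    [lambda r: r["link.ori"],
     lambda r: r["travel.des"],
     lambda r: r["link.time"] + r["travel.time"]],
    lambda r: r["link.des"] == r["travel.ori"],
)
```

Because the result is a set, duplicates are discarded, in line with the original relational model adopted in this chapter.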

For every $i$ we define an operator $T^{db}_i$ on the set $\mathcal{I}^{db}_i$ of interpretations of stratum $i$ for db. This operator is continuous, as stated in Proposition 2. As in HH¬(C), the least fixpoint of $T^{db}_i$ is the interpretation that gives meaning to all the relations of the database db at stratum $i$. Unlike in the fixpoint semantics of the previous chapter, we qualify the operator with the database db, since the operator is applied to a concrete database. Using the Knaster-Tarski theorem again (as in Section 2.2.2), the fixpoint can be obtained as the supremum of the chain of interpretations produced by successive applications of this operator, starting from a minimal interpretation. The operator $T^{db}_i$ is defined next.

Definition 12 The operator $T^{db}_i : \mathcal{I}^{db}_i \to \mathcal{I}^{db}_i$ transforms interpretations over $i$ as follows. For every $I \in \mathcal{I}^{db}_i$ and every $R \in RN_{db}$:

$T^{db}_i(I)(R) = I(R)$, if $str(R) < i$.

$T^{db}_i(I)(R) = [\![sel\_stm]\!]^I$, if $str(R) = i$ and sel_stm is the definition of R in db.

$T^{db}_i(I)(R) = \emptyset$, if $str(R) > i$.

The operator $T^{db}_i$ is continuous, as stated below.

Proposition 2 (Continuity of $T^{db}_i$) Let $i \ge 1$ and let $\{I_n\}_{n \ge 0}$ be a chain of interpretations in $\mathcal{I}^{db}_i$ ($I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \cdots$). Then $T^{db}_i(\bigsqcup_{n \ge 0} I_n) = \bigsqcup_{n \ge 0} T^{db}_i(I_n)$.

Therefore, the existence of a least fixpoint, obtained stratum by stratum, is a direct consequence of the Knaster-Tarski fixpoint theorem [118], as shown in [B.1].

Theorem 2 There exists an interpretation $fix^{db} : RN_{db} \to \mathcal{P}(\mathcal{T})$ such that, for every $R \in RN_{db}$, if sel_stm is the definition of R in db, then $fix^{db}(R) = [\![sel\_stm]\!]^{fix^{db}}$.


Thus the interpretation $fix^{db}$ gives the semantics of db. Its construction stratum by stratum is defined by applying the operator successively. We first look at the cases $T^{db}_1$ and $T^{db}_2$ and then generalize to $T^{db}_n$.

The operator $T^{db}_1$ has a least fixpoint $fix^{db}_1$, which is $\bigsqcup_{n \ge 0} (T^{db}_1)^n(\emptyset)$, the supremum of the chain $\{(T^{db}_1)^n(\emptyset)\}_{n \ge 0}$, where $(T^{db}_1)^n(\emptyset)$ is the result of $n$ successive applications of $T^{db}_1$ starting from the empty interpretation. Consider now the sequence $\{(T^{db}_2)^n(fix^{db}_1)\}_{n \ge 0}$ of interpretations in $(\mathcal{I}^{db}_2, \sqsubseteq_2)$ greater than $fix^{db}_1$. Using the definition of $T^{db}_i$ and taking into account that $fix^{db}_1(R) = \emptyset$ for every $R$ such that $str(R) \ge 2$, it is easy to prove (by induction on $n \ge 0$) that this sequence is also a chain, $fix^{db}_1 \sqsubseteq_2 T^{db}_2(fix^{db}_1) \sqsubseteq_2 T^{db}_2(T^{db}_2(fix^{db}_1)) \sqsubseteq_2 \cdots \sqsubseteq_2 (T^{db}_2)^n(fix^{db}_1) \sqsubseteq_2 \cdots$, with supremum

$$fix^{db}_2 = \bigsqcup_{n \ge 0} (T^{db}_2)^n(fix^{db}_1),$$

which is the least fixpoint of $T^{db}_2$ that contains $fix^{db}_1$.

If we define $k$ as $\max\{str(R) \mid R \in RN_{db}\}$, then for every $i$, $1 < i \le k$, we can find a chain

$$\{(T^{db}_i)^n(fix^{db}_{i-1})\}_{n \ge 0}$$

and the corresponding fixpoint

$$fix^{db}_i = \bigsqcup_{n \ge 0} (T^{db}_i)^n(fix^{db}_{i-1}).$$

We write $fix^{db}$ for $fix^{db}_k$, since it contains the fixpoint semantics of the database db.
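The stratum-by-stratum construction above can be sketched in Python for finite relations. The representation is an assumption made for illustration: each relation is given as a function from an interpretation (a dictionary of sets of tuples) to the set of tuples denoted by its definition, and the names stratified_fixpoint and no_boat, as well as the toy data, are hypothetical.

```python
def stratified_fixpoint(definitions, stratum):
    """Iterate the operator T_i for i = 1..k, starting from the empty
    interpretation, and return the final interpretation fix_k."""
    interp = {name: set() for name in definitions}
    for i in range(1, max(stratum.values()) + 1):
        while True:
            # One application of T_i: relations of lower strata are kept,
            # relations of stratum i are re-evaluated against the previous
            # interpretation, relations of higher strata stay empty.
            updated = dict(interp)
            for name, define in definitions.items():
                if stratum[name] == i:
                    updated[name] = define(interp)
            if updated == interp:  # fixpoint of stratum i reached
                break
            interp = updated
    return interp

def travel(interp):
    step = {(a[0], b[1], a[2] + b[2])
            for a in interp["link"] for b in interp["travel"] if a[1] == b[0]}
    return interp["link"] | step

# Hypothetical toy database with one relation that uses except.
definitions = {
    "flight": lambda interp: {("A", "B", 1.0)},
    "boat": lambda interp: {("B", "C", 2.0)},
    "link": lambda interp: interp["flight"] | interp["boat"],
    "travel": travel,
    "no_boat": lambda interp: interp["link"] - interp["boat"],
}
stratum = {"flight": 1, "boat": 1, "link": 1, "travel": 1, "no_boat": 2}
fix = stratified_fixpoint(definitions, stratum)
```

The relation no_boat is evaluated only after stratum 1 has stabilized, so the set difference is taken against the complete link and boat relations.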

Una vez formalizada la semántica de la base de datos pasamos presentar en primer lugarla semántica de las consultas y después la semántica del lenguaje de definición de vistas.

3.3.2. Query semantics

The answer to an HR-SQL query over a stratifiable database db is the interpretation of that query with respect to the fixpoint of db. Since HR-SQL queries may be hypothetical, obtaining the correct answer in those cases requires modifying some of the relations of the database. From a logical point of view, a hypothetical query can be read as an implication of intuitionistic logic [72]: it represents the value of a consequent under the assumption of the antecedent.

The general idea behind the evaluation of hypothetical queries is that, for each query, the relevant relations of the database (those the query depends on) are modified to reflect the assumptions, and the answer is then computed. The relevant relations for a given query are:

• those that appear explicitly after the reserved word in (or not in), and

• those that depend on the former according to Definition 8, i.e., according to the dependency graph.

To represent the changes required in the current database for a hypothetical query we use the notation

    db[R sch := sel_stm' / R sch := sel_stm]

which denotes the database db after replacing the definition R sch := sel_stm by R sch := sel_stm'. We also write sel(query) for the sel_stm statement of the query query, specified as follows:

• sel(sel_stm) = sel_stm, and

• sel(assume hypo_1, ..., hypo_k sel_stm) = sel_stm.

For readability, we first describe how a single hypothesis hypo inside a given query is handled by replacement. To resolve a sequence hypo_1, ..., hypo_k, these replacements must be applied sequentially, as explained in Example 18 below.

The following definition formalizes the notion of answer for the different kinds of queries.

Definition 13 Let query be a query for the database db. Its answer with respect to db, denoted $[[\mathit{query}]]^{db}$, is defined by cases:

• Standard (non-hypothetical) query: $[[\mathit{sel\_stm}]]^{db} = [[\mathit{sel\_stm}]]^{fix^{db}}$.

• Hypothetical query: if R sch := sel_stmR is the definition of R in db, then:

› $[[\texttt{assume } \mathit{sel\_stm}' \texttt{ in R } \mathit{sel\_stm}]]^{db} = [[\mathit{sel\_stm}]]^{fix^{db'}}$,

where db' = db[R sch := sel_stmR union sel_stm' / R sch := sel_stmR].

› $[[\texttt{assume } \mathit{sel\_stm}' \texttt{ not in R } \mathit{sel\_stm}]]^{db} = [[\mathit{sel\_stm}]]^{fix^{db'}}$,

where db' = db[R sch := sel_stmR except sel_stm' / R sch := sel_stmR]. □

Example 18 Let db be the following database:

    R1 (A int) := select 1 union select 2 union select 3;
    R2 (A int) := select 1 union select 3 union select 5 except
                  select R1.A from R1 where R1.A=1 or R1.A=2;
    R3 (A int) := select R2.A from R2 union
                  select R3.A*2 from R3 where R3.A<5;

For the explanation below, we write sel_stmR2 for the sel_stm that defines R2, that is:

    sel_stmR2 ≡ select 1 union select 3 union select 5 except
                select R1.A from R1 where R1.A=1 or R1.A=2;

Consider the following hypothetical query, which we denote query:

    assume select R1.A from R1 where R1.A<3 in R2, select 3 not in R2
    select R3.A from R3

Then $[[\mathit{query}]]^{db} = [[\texttt{select R3.A from R3}]]^{fix^{db'}}$, where $db' = (db)\theta\sigma$ with:

• θ = [R2 := sel_stm'R2 / R2 := sel_stmR2],

• σ = [R2 := sel_stm'R2 except select 3 / R2 := sel_stm'R2],

• sel_stm'R2 ≡ sel_stmR2 union select R1.A from R1 where R1.A<3.

Hence db' is the following database:

    R1 (A int) := select 1 union select 2 union select 3;
    R2 (A int) := ((select 1 union select 3 union select 5 except
                    select R1.A from R1 where R1.A=1 or R1.A=2) union
                   select R1.A from R1 where R1.A<3) except select 3;
    R3 (A int) := select R2.A from R2 union
                  select R3.A*2 from R3 where R3.A<5;

This example shows how a new database is obtained by applying the corresponding substitutions in order to evaluate a hypothetical query. As stated at the beginning of this section, we must identify the relations needed to evaluate the query. In this case they are R2 and R3. □

Evaluating a non-hypothetical query over a database is straightforward, since the value of $[[\mathit{sel\_stm}]]^{db}$ is $[[\mathit{sel\_stm}]]^{fix^{db}}$, and $fix^{db}$ is known and coincides with the database instance. The case of a hypothetical query sel_hyp is more involved. Its intended meaning is the interpretation of a select statement with respect to a new database db' in which the definitions of some relations have been changed. Because the assumptions have been incorporated into their respective relations, db' must also be stratifiable for the interpretation $fix^{db'}$ to be defined.

Using the stratified fixpoint semantics we can simplify parts of the computation of $fix^{db'}$ (see also Section 4 of [B.2]):

First, the dependency graph $DG_{db'}$ is an extension of $DG_{db}$, since $RN_{db'} = RN_{db}$, i.e., every relation of db' was already in db. However, we must take into account the new definition of the relation R: R sch := sel_stmR (union | except) sel_stm'. The arcs from sel_stmR to R already belong to the initial graph $DG_{db}$.

› We can build $DG_{db'}$ from $DG_{db}$ by adding the nodes for every $R' \in RN_{\mathit{sel\_stm}'}$ and the following arcs: for every such R', an arc from R' to R. This arc is labelled negatively in the case of except, or if $R' \in RN^{\neg}_{\mathit{sel\_stm}'}$.

› The stratification $str' : RN_{db'} \to \{1, \dots, n\}$ for db', if it exists, must satisfy $str'(R) \geq str(R)$. In general, the select statement defining R in db' contains the statement sel_stmR, which also defines R in db.

Second, to compute the meaning $[[sel(\mathit{sel\_hyp})]]^{fix^{db'}}$ we only need to compute $fix^{db'}(R')$ for those relations R' on which some relation in $RN_{sel(\mathit{sel\_hyp})}$ depends. Moreover, it is not necessary to recompute $fix^{db'}$ from stratum 1: the computation can start at stratum $i = str'(R)$ (or $i = \min\{str'(R_j) \mid 1 \leq j \leq k\}$ when there are assumptions on several relations). Hence $fix^{db'}$ can be computed from $fix^{db}$:

› If there are assumptions on the relations $R_1, \dots, R_k$, then $fix^{db'}(R') = fix^{db}(R')$ for every R' with $str'(R') < i$.

› If $S = \{R'' \mid R' \in RN_{sel(\mathit{sel\_hyp})} \text{ and } R' \text{ depends on } R''\}$, then we obtain $fix^{db'}$ from $fix^{db}$ as follows:

1. Compute $fix^{db'}_i(R')$ from $fix^{db}_{i-1}$ for all relations $R' \in S$ such that $str'(R') = i$.

2. Compute $fix^{db'}_j(R')$ from $fix^{db'}_{j-1}$ for the relations $R' \in S$ with $str'(R') = j$, for $j = i+1, \dots, str'(sel(\mathit{sel\_hyp}))$.

Example 19 Consider the database and the query of Example 18. Let str be a stratification for db such that str(R1) = 1, str(R2) = 2, str(R3) = 3. In this case str is also a valid stratification for db', and it can be checked that:

• $fix^{db}(R1) = \{(1), (2), (3)\}$,

• $fix^{db}(R2) = \{(3), (5)\}$,

• $fix^{db}(R3) = \{(3), (5), (6)\}$.

Since we are computing $fix^{db'}$, note that $RN_{sel(\mathit{query})} = \{R3\}$. Then $S = \{R'' \mid R' \in \{R3\} \text{ and } R' \text{ depends on } R''\} = \{R1, R2, R3\}$, and the computation can start at stratum 2 = str(R2), with $fix^{db'}_1 = fix^{db}_1$. Now R2 is the only relation of S in stratum 2:

$$fix^{db'}_2(R2) = \{(1), (2), (5)\}.$$

We proceed analogously for the last stratum, 3, computing $fix^{db'}_3(R3)$ to obtain the answer:

$$fix^{db'}_3(R3) = \{(1), (2), (4), (5), (8)\} = [[\texttt{select R3.A from R3}]]^{fix^{db'}} = [[\mathit{query}]]^{db}. \quad \Box$$
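The replacements of Definition 13 and the values of Examples 18 and 19 can be checked with a small Python sketch. It is only an illustration under simplifying assumptions: relations are sets of integers rather than tables, each stratum holds one relation, and the helpers `solve`, `assume_in` and `assume_not_in` are hypothetical names, not part of HR-SQL.

```python
# Illustrative sketch: each definition is a function from the current interpretation to a set.
def solve(defs, order):
    """Compute the fixpoint stratum by stratum (one relation per stratum, in `order`)."""
    interp = {name: set() for name in order}
    for name in order:
        while True:
            new = defs[name](interp)
            if new == interp[name]:
                break
            interp[name] = new
    return interp

def assume_in(defs, rel, hyp):
    """db[R := sel_stmR union sel_stm' / R := sel_stmR]"""
    old = defs[rel]
    return {**defs, rel: lambda i: old(i) | hyp(i)}

def assume_not_in(defs, rel, hyp):
    """db[R := sel_stmR except sel_stm' / R := sel_stmR]"""
    old = defs[rel]
    return {**defs, rel: lambda i: old(i) - hyp(i)}

# The database db of Example 18.
db = {
    "R1": lambda i: {1, 2, 3},
    "R2": lambda i: {1, 3, 5} - {a for a in i["R1"] if a == 1 or a == 2},
    "R3": lambda i: i["R2"] | {a * 2 for a in i["R3"] if a < 5},
}
order = ["R1", "R2", "R3"]   # str(R1) = 1, str(R2) = 2, str(R3) = 3

# assume select R1.A from R1 where R1.A<3 in R2, select 3 not in R2
db_theta = assume_in(db, "R2", lambda i: {a for a in i["R1"] if a < 3})
db_prime = assume_not_in(db_theta, "R2", lambda i: {3})

fix_db = solve(db, order)
fix_db_prime = solve(db_prime, order)
answer = fix_db_prime["R3"]   # [[select R3.A from R3]] over fix of db'
```

For brevity the sketch recomputes every stratum of db'. The optimization described above would instead reuse `fix_db["R1"]` and start at stratum 2.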

Having formalized the semantics of queries, we now present the semantics of views in the HR-SQL language.

3.3.3. View semantics

As described in Section 3.2.2, a view is defined by assigning a view name to a query. Defining hypothetical views in a phase after the definition of the database makes it possible to analyse the changes that the assumptions demand on the database and to make the computation as efficient as possible when the system is implemented. Note that, depending on the complexity of the view definition, the computation may in the worst case require modifying every relation in the database. However, since we use a stratified fixpoint semantics, we can use the dependency graph to recompute only the relations that must be modified in the context of a hypothetical view.

The meaning of a set of view definitions vd must establish the correspondence between each name and its interpretation. That interpretation, however, must consider the original database extended with the new definitions appearing in vd, and must also extend the graph with the dependencies demanded by the corresponding sel_hyp statement. The stratification is designed so that a new stratum is assigned to each view name, which allows the stored fixpoint of the database to be reused when computing the semantics of hypothetical views. The details of this new stratification are given in Section 3.4.2.

For readability, we formalize how the semantics is obtained for the definition of a single view, and then generalize to the semantics of a sequence of views vd.

Definition 14 We define the meaning of a view in the context of a database db, distinguishing between V (non-hypothetical) and HV (hypothetical).

• Let V sch := sel_stm be the definition of a non-hypothetical view for db. The meaning of V with respect to db, denoted $[[V]]^{db}$, is $[[\mathit{sel\_stm}]]^{db'}$, where db' is the result of extending db with V sch := sel_stm as a new relation.

• Let HV sch := sel_hyp be the definition of a hypothetical view for db. The meaning of HV with respect to db, denoted $[[HV]]^{db}$, is $[[\mathit{sel\_hyp}]]^{db'}$, where db' is the result of extending db with HV sch := sel(sel_hyp) as a new relation. □

For V sch := sel_stm, the value $[[V]]^{db} = [[\mathit{sel\_stm}]]^{db'} = [[\mathit{sel\_stm}]]^{fix^{db'}}$ depends on the fixpoint of a new database, which must be stratifiable. Here db' is db extended with V sch := sel_stm.

The new database is not stratifiable if V appears in some except statement inside sel_stm. Otherwise, the fixpoint of the new database equals the fixpoint of db except for the relation V. Note that $RN_{db'} = RN_{db} \cup \{V\}$, provided V produces no new dependencies in db.

Therefore, if k is the highest stratum of db (with n relations), the stratification str' for db' can be defined as $str' : RN_{db'} \to \{1, \dots, n+1\}$, with $str'(R) = str(R)$ for every $R \in RN_{db}$ and $str'(V) = k+1$. Thus, for $i = 1..k$ we have $fix^{db'}_i = fix^{db}_i$, and hence:

$$fix^{db'} = fix^{db'}_{k+1} = \bigsqcup_{m \geq 0} (T^{db'}_{k+1})^m(fix^{db}),$$

where $fix^{db'}$ is an extension of the known fixpoint $fix^{db}$. This means that we only have to carry out the computation for the last stratum k+1, that is, for the relation V:

$$[[\mathit{sel\_stm}]]^{fix^{db'}} = fix^{db'}_{k+1}(V).$$

The semantics of the case HV sch := assume sel_stm' [not] in R sel_stm requires modifying the database in two ways:

1. $[[HV]]^{db} = [[\mathit{sel\_hyp}]]^{db'}$, by Definition 14, where db' is the result of extending db with HV sch := sel_stm.

2. $[[\mathit{sel\_hyp}]]^{db'} = [[\mathit{sel\_stm}]]^{fix^{db''}}$, by Definition 13, where db'' = db'[R sch := sel_stmR (union | except) sel_stm' / R sch := sel_stmR].

Step 1 extends the database with a new relation HV. The new definition of HV without its hypothesis (HV sch := sel_stm) is syntactically correct, whereas HV sch := sel_hyp would not be allowed as a valid definition in the original database.

Step 2 incorporates the assumption into the corresponding relation, as explained in Section 3.3.2. The new relation definitions in db'' are therefore:

    HV sch := sel(sel_hyp);
    R sch := sel_stmR (union | except) sel_stm';

The computation of $fix^{db''}$ can be simplified. First, the dependency graph $DG_{db''}$ is built from $DG_{db}$ by adding new arcs towards the relation R. We also add a new node for the view HV with its corresponding arcs: for every $R' \in RN_{sel(\mathit{sel\_hyp})}$ there is an arc from R' to HV, labelled negatively if $R' \in RN^{\neg}_{sel(\mathit{sel\_hyp})}$.

A stratification $str' : RN_{db''} \to \{1, \dots, n+1\}$ of db'', if it exists, can assign stratum k+1 to HV as in the non-hypothetical case, where k is the last stratum of the original db.

Thus $fix^{db''}_k$ can be computed (following the ideas of Section 3.3.2) starting from the stored fixpoint of the database with its corresponding replacements. In this case the computation of $fix^{db''}_{k+1}$ considers only HV. For stratum k+1 we have:

$$fix^{db''}_{k+1}(HV) = [[\mathit{sel\_stm}]]^{fix^{db''}}.$$

Example 20 Consider the database db of Example 18 and the hypothetical view:

    HV (A int) := assume select R1.A from R1 where R1.A<3 in R2,
                         select 3 not in R2
                  select R3.A from R3 union
                  select HV.A*3 from HV where HV.A<3;

Following Definition 14:

$$[[HV]]^{db} = [[\texttt{select R3.A from R3 union select HV.A*3 from HV where HV.A<3}]]^{fix^{db''}},$$

where db'' is an extension of the database db' of Example 18 with:

    HV (A int) := select R3.A from R3 union select HV.A*3 from HV where HV.A<3;

A function str' that extends str with str'(HV) = 4 is a valid stratification for the new database (which takes into account the definitions extended with the assumptions). For $1 \leq i \leq 3$ we have $fix^{db''}_i = fix^{db'}_i$, as in Example 19. Since $[[HV]]^{db}$ coincides with $fix^{db''}(HV)$, we only need to compute:

$$fix^{db''}_4(HV) = \left(\bigsqcup_{m \geq 0} (T^{db''}_4)^m(fix^{db''}_3)\right)(HV) = \{(1), (2), (3), (4), (5), (6), (8)\}. \quad \Box$$
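The value of Example 20 can be reproduced with a short Python sketch. It is a hand-written illustration, not output of the system: relations are sets of integers, and the helper `lfp` is a hypothetical name for a plain iteration to the least fixpoint.

```python
# Illustrative sketch: iterate a monotone step function on sets until nothing changes.
def lfp(step, start=frozenset()):
    current = frozenset(start)
    while True:
        nxt = frozenset(step(current))
        if nxt == current:
            return current
        current = nxt

# Strata 1 to 3 of db' (Example 18, with both assumptions folded into R2).
R1 = {1, 2, 3}
R2 = (({1, 3, 5} - {a for a in R1 if a == 1 or a == 2})
      | {a for a in R1 if a < 3}) - {3}
R3 = lfp(lambda r3: R2 | {a * 2 for a in r3 if a < 5})

# Stratum 4 holds only the view; it starts from the already computed lower strata.
HV = lfp(lambda hv: R3 | {a * 3 for a in hv if a < 3})
```

Only the last line corresponds to new work for the view. The lower strata would be taken from the stored fixpoint after the replacements of Section 3.3.2.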

Semantics of simultaneous view sequences

The idea is that the semantics of vd associates each view name in vd with the interpretation of the sel_stm query that defines it. However, if vd contains more than one non-hypothetical view, an additional restriction applies: we cannot define $[[V]]^{db}$ as $[[\mathit{sel\_stm}]]^{db'}$ with db' being the result of extending db with V sch := sel_stm alone. The reason is that names defined in vd other than V may appear inside sel_stm without being defined in db'. We now define the semantics of a sequence of views vd.

Definition 15 Let db be a database and let vd be a view definition for db, denoted by the following sequence:

    V1 sch1 := sel_stm1;
    ...
    Vm schm := sel_stmm;
    HV1 sch1 := sel_hyp1;
    ...
    HVr schr := sel_hypr;

We define the semantics of vd as a function that associates $[[V_j]]^{db'}$ with $V_j$ for $j = 1..m$ and $[[HV_j]]^{db'}$ with $HV_j$ for $j = 1..r$, where db' is the result of extending db with:

    V1 sch1 := sel_stm1; ...; Vm schm := sel_stmm; □

Note that, by Definition 14, $[[V_j]]^{db'} = [[\mathit{sel\_stm}_j]]^{db''}$ for every $j = 1..m$, where db'' is the result of extending db' with Vj schj := sel_stmj. However, this definition already appears in db', and therefore db'' = db'.

HV1, ..., HVr must not appear in sel_stmj, because their definitions are not present in db'. Nevertheless, for every $1 \leq j \leq r$ we have $[[HV_j]]^{db'} = [[\mathit{sel\_hyp}_j]]^{db''}$, where db'' is the result of extending db' with HVj schj := sel(sel_hypj). This allows the definition of HVj to be recursive.

To compute the semantics of all the views in a simultaneous definition, the hypothetical views must be computed one at a time. In addition, the following points must be taken into account:

• As in the case of a single view, db' must also be stratifiable. If db' is stratifiable, we can find a stratification str' for db' such that $str'(V_j) > n$ for every $1 \leq j \leq m$.

• The interpretation $fix^{db'}$ can be obtained stratum by stratum:

› Starting from $fix^{db}$ in the non-hypothetical case.

› In the hypothetical case, each hypothetical view can be processed independently, always starting from $fix^{db'}$ as the initial interpretation and proceeding as in the case of a single view. This processing must be repeated as many times as there are hypothetical views in the sequence vd.

Two instances of the theoretical framework have been implemented: on PostgreSQL and on DB2. In the following section we present the R-SQL and HR-SQL systems, how they work, and some examples of fixpoint computation for both implementations.

3.4. The R-SQL system

In this section we describe how the R-SQL system works, as presented in the publications [B.1, B.3]. The system is implemented in SWI-Prolog and uses the open-source system PostgreSQL as its RDBMS. The fixpoint of a database is computed following the operational semantics presented in Section 3.3.

Processing databases

Before processing a database in R-SQL, the source files of the system must be loaded in Prolog with the following directive:

:-[rsql].

Once the system is loaded, the user can process a database definition dbDef with the command process(dbDef). The system then parses the database definition and computes the dependency graph and the stratification, if one exists (otherwise an error message is issued and execution stops). Finally, the system generates a Python script that automates the connection to PostgreSQL and materializes the relations in this RDBMS. After this process the user can connect to PostgreSQL to query and modify the relations of the database. In this dissertation we present only the implementation of R-SQL on PostgreSQL, although it has also been implemented on MySQL.

The fixpoint computation algorithm of the system is explained in Section 3.1 of [B.3], and it is very similar to the one presented in Section 3.5.2 for HR-SQL. Below we work through an example of computing the fixpoint of a database and show the Python code that is generated. We close the section with another contribution from [B.3]: an optimization in the computation of the stratification that minimizes the number of relations in each stratum in order to improve the efficiency of the computation.

3.4.1. Computing R-SQL databases

In the following example, taken from [B.3], we present the fixpoint computation of an R-SQL database step by step. For readability, SQL reserved words are written in upper case when they belong to a Python script and in lower case when they are part of an R-SQL definition. The system, however, accepts SQL reserved words in either case, as most RDBMSs do.

Example 21 This is an example of transitive closure over a database of flights between cities, similar to Example 17. The origin and destination cities are Lisbon, Madrid, Paris, London and New York, represented respectively by the constants lis, mad, par, lon, ny.

The relation reach contains the possible trips between these cities, which may (or may not) concatenate several flights. The relation travel is similar, but it also returns the duration of each trip in time.

    flight(frm varchar(10), to varchar(10), time float) :=
      select 'lis', 'mad', 1.0 union
      select 'mad', 'par', 1.5 union
      select 'par', 'lon', 2.0 union
      select 'lon', 'ny', 7.0 union
      select 'par', 'ny', 8.0;

    reach(frm varchar(10), to varchar(10)) :=
      select flight.frm, flight.to from flight union
      select reach.frm, flight.to from reach, flight
      where reach.to = flight.frm;

    travel(frm varchar(10), to varchar(10), time float) :=
      select flight.frm, flight.to, flight.time from flight union
      select flight.frm, travel.to, flight.time+travel.time
      from flight, travel where flight.to = travel.frm;

Note that if the transitive closure of flight contains a cycle, the relation travel may be infinite, because computing it would keep adding times indefinitely while traversing the cycle over and over. To solve this problem we would have to impose a bound in its definition to guarantee termination (for instance, a maximum time). This is a problem that arises in any relational database when dealing with cycles in graph definitions.

This limitation does not affect the relation reach, however, which can be computed finitely in the R-SQL system, whereas it would lead to a non-terminating computation in other RDBMSs. The reason is that the system computes the fixpoint in a way that guarantees termination when cycles in directed graphs are allowed: at each iteration of the loop it checks whether new tuples have been added. Section 3.5.2 shows how the script code is generated.

We continue with the definitions of the relations of the example in R-SQL. The relation madAirport contains the trips that depart from or land in Madrid, while avoidMad contains those that neither land in nor depart from Madrid.

    madAirport(frm varchar(10), to varchar(10)) :=
      select reach.frm, reach.to from reach
      where (reach.frm = 'mad' or reach.to = 'mad');

    avoidMad(frm varchar(10), to varchar(10)) :=
      select reach.frm, reach.to from reach except madAirport;

This combination of the except statement and recursion is not allowed in the SQL-99 standard, as shown in [33].

The dependency graph for the relations of this database is shown in Figure 3.6. As usual, negative dependencies in the graph are annotated with the symbol ¬.

Figure 3.6: $DG_{db}$ for Example 21.

Next we show the Python code that R-SQL generates to materialize the relations as tables in the RDBMS. We use the psycopg2 library, which sends a query to PostgreSQL with the statement:

    cursor.execute("query")

where query is a valid SQL query. Since Python's syntax provides no repeat (or do-while) loops, we implement a similar construct with while True and a break statement that is executed when the exit condition holds.

The generated script first creates the tables:

    cursor.execute("CREATE TABLE flight(frm varchar(10), to varchar(10), time float);")

    cursor.execute("CREATE TABLE travel(frm varchar(10), to varchar(10), time float);")

and for stratum 1 the following code is generated:

    # Code for stratum 1
    # out fragment
    cursor.execute("""INSERT INTO flight
        (SELECT 'lis','mad',1 UNION
         SELECT 'mad','par',1.5 UNION
         SELECT 'par','lon',2 UNION
         SELECT 'lon','ny',7 UNION
         SELECT 'par','ny',8)""")

SQL code is generated with an algorithm very similar to the one presented in Section 3.5.2. In addition, definitions are partitioned to improve efficiency, as also described in that section; we call this the in/out algorithm. Its purpose is to reduce the number of calls (and insertions) to the RDBMS inside the loop and thereby improve the performance of the system. This is a technique commonly used in deductive database systems [122].

Stratum 2 contains the relation travel, whose definition is split into two parts: the recursive part (in), which goes inside the while loop, and the non-recursive part (out), made up of tuples that are not computed recursively and need not be included in the iterative loop. We show both fragments:

    # Code generated for stratum 2
    # out fragment
    cursor.execute("INSERT INTO travel (SELECT * FROM flight);")

    # in fragment
    while True:
        cursor.execute("""INSERT INTO travel
            (SELECT flight.frm,travel.to,flight.time+travel.time
             FROM flight,travel
             WHERE flight.to = travel.frm)
            EXCEPT
            SELECT * FROM travel;""")
        newSize = relSize(["travel"])
        if (newSize != size):
            size = newSize
        else:
            break

where the function relSize(<list of relations>) returns the number of tuples of the given relations. The tuples added to travel at each iteration are shown in the following table:

    Step                      Set of inserted tuples
    out fragment              {(lon, ny, 7.0), (par, lon, 2.0), (par, ny, 8.0), (mad, par, 1.5), (lis, mad, 1.0)}
    in fragment, iteration 1  {(lis, par, 2.5), (par, ny, 9.0), (mad, ny, 9.5), (mad, lon, 3.5)}
    in fragment, iteration 2  {(lis, ny, 10.5), (lis, lon, 4.5), (mad, ny, 10.5)}
    in fragment, iteration 3  {(lis, lon, 4.5), (mad, ny, 10.5), (lis, ny, 11.5)}

Analogously, the system produces the Python code for strata 3 and 4, which correspond to the relations reach and madAirport respectively. To conclude, we show the script code that generates the relation avoidMad, which completes stratum 5:

    # Code generated for stratum 5
    # out fragment
    cursor.execute("""INSERT INTO avoidMad
        (SELECT travel.frm,travel.to FROM travel
         EXCEPT SELECT * FROM madAirport)""");

This fragment completes the code that computes the fixpoint of the example database. The values of flight, madAirport and avoidMad are depicted in Figure 3.7. □

Figure 3.7: Graphical representation of the tuples of the database of Example 21.
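The termination argument for reach in the presence of cycles can be tried out with a self-contained sketch. It makes several assumptions that are not in the original: Python's built-in sqlite3 module stands in for PostgreSQL and psycopg2 so that the example runs without a server, the column to is renamed dst to avoid clashing with an SQL keyword, the three cyclic flights are invented, and `rel_size` is a hypothetical counterpart of relSize.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE flight(frm TEXT, dst TEXT)")
cur.execute("CREATE TABLE reach(frm TEXT, dst TEXT)")
# Hypothetical data containing a cycle: lis -> mad -> par -> lis
cur.executemany("INSERT INTO flight VALUES (?, ?)",
                [("lis", "mad"), ("mad", "par"), ("par", "lis")])

def rel_size(cursor, tables):
    """Total number of tuples stored in the given tables."""
    return sum(cursor.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
               for t in tables)

# out fragment: the non-recursive part, executed once
cur.execute("INSERT INTO reach SELECT frm, dst FROM flight")

# in fragment: do-while loop emulated with while True / break
size = rel_size(cur, ["reach"])
iterations = 0
while True:
    cur.execute("""INSERT INTO reach
                   SELECT reach.frm, flight.dst FROM reach, flight
                   WHERE reach.dst = flight.frm
                   EXCEPT SELECT frm, dst FROM reach""")
    iterations += 1
    new_size = rel_size(cur, ["reach"])
    if new_size != size:
        size = new_size
    else:
        break   # no new tuples were added: the fixpoint has been reached

reach = set(cur.execute("SELECT frm, dst FROM reach"))
```

Because EXCEPT removes tuples that are already stored, the table stops growing once every reachable pair is present, so the loop ends even though the graph is cyclic.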

Once the R-SQL database has been processed, the resulting tables are available for querying in PostgreSQL. The user can submit queries either directly in PostgreSQL or from the command prompt of the R-SQL system, in both cases without any further fixpoint computation.

We close this section with the stratification approach of the R-SQL system, which appears in Section 3.5 of [B.3] and improves on the version of the system presented in [B.1]. This optimization is also applied in the HR-SQL system presented in the last section of the chapter.

3.4.2. The stratification algorithm

The algorithm presented below tries to minimize the number of relations in each stratum in order to improve the fixpoint computation described in Section 3.5.2. The goal is for each stratum i to contain either a single relation or a set of mutually recursive relations. As a result, the loop of a stratum no longer has to run as many times as required by the relation of that stratum that needs the most iterations. This improvement can also be applied to the stratification of the databases of the HH¬(C) system presented in the previous chapter.

We present the new stratification algorithm by means of a dependency graph. For the example of Figure 3.8, a correct stratification is the one that assigns stratum 1 to the relations {a, b, c, d, e} and stratum 2 to the relations {f, g}. In fact, this is the stratification that the HH¬(C) system assigns to this graph.

However, Figure 3.8 shows that only the mutually recursive relations b and c need to belong to the same stratum (because of the mutual dependency between them).

Figure 3.8: Example of a dependency graph

To isolate each relation that is not mutually recursive in its own stratum, we compute a stratification that increases the number of strata, as described next. Given a database db and its associated dependency graph $DG_{db}$, the stratification algorithm proceeds as follows:

1. It computes the strongly connected components C of $DG_{db}$, initially ignoring the labels of negative dependencies. Once these components have been obtained, it checks whether there is a cycle containing a negative dependency. If so, db is not stratifiable and the computation stops. For the dependencies of Figure 3.8 the strongly connected components are:

    {a}, {f}, {g}, {b, c}, {d} and {e}.

2. It collapses the strongly connected components, obtaining a new graph with one node per component C and arcs linking each C to other components C'. If C contains only a relation R and C' contains only the relation R', the corresponding arc simply goes from R to R' in $DG_{db}$.

In our example, the component {b, c} is associated with the node bc, while the rest are trivially associated with the node containing their single relation. The resulting graph has the following arcs:

    {a → bc, bc → d, bc → e, a → f, f → g}.

3. It obtains a topological ordering of the new graph. In the example this ordering is a < f < g < bc < e < d.

4. Finally, the nodes of the components are split again to obtain the topological ordering of the strongly connected components, and the components are numbered in ascending order. For our example we obtain {a} < {f} < {g} < {b, c} < {e} < {d}.

The final stratification is:

    str(a) = 1, str(f) = 2, str(g) = 3, str(b) = str(c) = 4, str(e) = 5, str(d) = 6.

Example 22 For the dependency graph of Figure 3.6, the R-SQL system obtains the following stratification:

    {(1, flight), (2, travel), (3, reach), (4, madAirport), (5, avoidMad)}. □
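The four steps above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the code of R-SQL: it uses Tarjan's algorithm for the strongly connected components (the source does not say which method the system uses), the function name `stratify` is hypothetical, and the individual arcs of Figure 3.8 inside the collapsed node bc (a → b, b → d, c → e) are assumed, since only the collapsed arcs are given in the text.

```python
def stratify(nodes, edges, negative=frozenset()):
    """Map each relation to a stratum; an edge (s, d) means that d depends on s."""
    succ = {n: [] for n in nodes}
    for s, d in sorted(edges):
        succ[s].append(d)
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):                      # Tarjan's strongly connected components
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:         # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for n in sorted(nodes):
        if n not in index:
            visit(n)
    comp_of = {n: k for k, comp in enumerate(sccs) for n in comp}
    for s, d in negative:              # a negative arc inside a cycle
        if comp_of[s] == comp_of[d]:
            raise ValueError("database is not stratifiable")
    # Tarjan emits components in reverse topological order, so reverse them.
    return {n: stratum
            for stratum, comp in enumerate(reversed(sccs), start=1)
            for n in comp}

# Figure 3.8, with assumed arcs inside and around the component {b, c}.
fig_3_8 = {("a", "b"), ("b", "c"), ("c", "b"), ("b", "d"),
           ("c", "e"), ("a", "f"), ("f", "g")}
strata = stratify("abcdefg", fig_3_8)
```

A topological ordering is not unique, so the numbers assigned to independent components such as d and e may differ from those in the text while still forming a valid stratification.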

3.5. The HR-SQL system

In this section we present the implementation of HR-SQL on the DB2 RDBMS described in [B.2]. This system can process the same databases as R-SQL, following a similar algorithm to generate the scripts.

As seen in Section 3.3.3, the user can now add assumptions when formulating queries and defining views. This increases the complexity of the system, since these assumptions affect other relations of the database, whose contents may grow or shrink in order to obtain the answer to a query. In this section we present the structure of the system, the algorithm that generates scripts for databases, queries and views, and an efficiency improvement in the execution of those scripts obtained by splitting recursive definitions into two fragments: the base case (out) and the recursive case (in). We end the section by explaining how HR-SQL computes the semantics of hypothetical views and queries, and by presenting some performance results.

3.5.1. System structure

The structure of the system is shown in Figure 3.9. The user interface consists of the following command prompt:

hr-db2=>

which works as an extension of the DB2 command interpreter. The user can submit any valid DB2 input (labelled A in Figure 3.9), as well as HR-SQL queries and commands (labelled B in Figure 3.9). The following can be typed at the prompt:

• load_db <db_file>. Loads an HR-SQL database definition from the file db_file and computes its fixpoint. The resulting tuples of each relation are stored as DB2 tables.

• load_vd <vd_file>. Loads an HR-SQL view definition from the file vd_file, computes the tuples of these views and materializes them in DB2 as well.

• A hypothetical query (sel_hyp). In this case the system recognizes the query by the reserved word assume.


Figure 3.9: Structure of the HR-SQL system.

These new inputs are processed by the system as shown in Figure 3.9. Processing starts with syntactic analysis (the parser), after which the dependency graph is built and the stratification is computed, if one exists. Otherwise an error message is issued.

To compute the stratification, HR-SQL follows the algorithm presented in Section 3.4.2. After that, an SQL PL script is generated automatically, as explained in Section 3.5.2. The output is executed in the DB2 database system (labelled C in Figure 3.9). The implementation of hypothetical views is explained in Section 3.5.3.

Both the system (in its current version on PostgreSQL) and the examples presented in this chapter can be downloaded from:

https://gpd.sip.ucm.es/trac/gpd/wiki/GpdSystems/HR-SQLplus.

3.5.2. Fixpoint computation

Figure 3.10 shows the algorithm that computes the fixpoint of an HR-SQL database. This algorithm produces the required SELECT statements (in addition to the CREATE and INSERT statements).

This version of the algorithm (presented in [B.2,B.3]) improves on the one presented in [B.1] because it simplifies the computation of the loop by incorporating the functions in and out, as explained later in this section.

Our algorithm starts from a stratification of the database, where numStr is the number of strata and RN_i is the set of relations belonging to stratum i.

First, a table is created for each relation definition of the database of the form R sch := sel_stm_R (line 1).

Next, the while loop (lines 3-10) computes the fixpoints

$$\mathit{fix}^{db}_1,\ \mathit{fix}^{db}_2,\ \ldots,\ \mathit{fix}^{db}_{numStr}$$

iteratively. This step is equivalent to iterating the corresponding operator $T^{db}_i$ introduced in Definition 11.

The n-th iteration of the repeat loop (lines 5-9) computes $(T^{db}_i)^n(\mathit{fix}_{i-1})$.


The repeat loop iterates as long as there are new tuples to add to the tables of the current stratum. This is checked through the variable size, which differs from rel_size(RN_i) at the end of an iteration only if new tuples have been added to stratum i.

 1  for all R ∈ RN_db do CREATE TABLE R sch;
 2  i := 1
 3  while i ≤ numStr do
 4    for all R ∈ RN_i do INSERT INTO R out(sel_stm_R);
 5    repeat
 6      size := rel_size(RN_i)
 7      for all R ∈ RN_i do
 8        INSERT INTO R in(sel_stm_R) EXCEPT SELECT * FROM R;
 9    until size = rel_size(RN_i)
10    i := i + 1

Figure 3.10: Fixpoint computation algorithm
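The loop of Figure 3.10 can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not part of the original system: it uses SQLite (through the standard sqlite3 module) instead of DB2 and SQL PL, the tables are created beforehand, the out and in fragments of each relation are supplied by hand as SQL strings, and the column names frm and dst as well as the helper names compute_fixpoint and chain_closure_size are hypothetical.

```python
import sqlite3


def compute_fixpoint(conn, strata):
    """Run the loop of Figure 3.10 over tables that already exist.

    `strata` is a list (lowest stratum first) of dicts that map a relation
    name to a pair (out_sql, in_sql); either element may be None.
    """
    cur = conn.cursor()

    def rel_size(relations):
        # Total number of tuples currently stored for the given relations.
        return sum(
            cur.execute(f"SELECT COUNT(*) FROM {rel}").fetchone()[0]
            for rel in relations
        )

    for stratum in strata:                      # lines 3 and 10: one pass per stratum
        for rel, (out_sql, _) in stratum.items():
            if out_sql is not None:             # line 4: base cases, inserted once
                cur.execute(f"INSERT INTO {rel} {out_sql}")
        while True:                             # lines 5-9: repeat ... until
            size = rel_size(stratum)
            for rel, (_, in_sql) in stratum.items():
                if in_sql is not None:          # line 8: recursive part only
                    cur.execute(
                        f"INSERT INTO {rel} SELECT * FROM ({in_sql}) "
                        f"EXCEPT SELECT * FROM {rel}"
                    )
            if size == rel_size(stratum):       # no new tuples: fixpoint reached
                break


def chain_closure_size(n):
    """Size of reach for the chain of flights (1,2), (2,3), ..., (n,n+1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flight (frm INTEGER, dst INTEGER)")
    conn.execute("CREATE TABLE reach (frm INTEGER, dst INTEGER)")
    conn.executemany(
        "INSERT INTO flight VALUES (?, ?)", [(i, i + 1) for i in range(1, n + 1)]
    )
    strata = [
        {
            "reach": (
                "SELECT frm, dst FROM flight",
                "SELECT reach.frm, flight.dst FROM reach, flight "
                "WHERE reach.dst = flight.frm",
            )
        },
    ]
    compute_fixpoint(conn, strata)
    return conn.execute("SELECT COUNT(*) FROM reach").fetchone()[0]
```

Because each insertion subtracts the tuples already stored, the loop also stops on cyclic graphs: once no new tuple appears, the size check ends the repeat loop.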

The in/out algorithm

One of the features that improves the efficiency of computing relations with respect to [B.1], and which appears in [B.2], is the use of the in/out algorithm to reduce the number of iterations inside the while loop.

The idea is that iterating the operator $T^{db}_i$ is only needed to compute the recursive part of a sel_stm statement, whereas the base cases of the definition can be taken out of the loop. Thus, the functions in and out split each sel_stm into two parts:

The recursive part, which is added by the INSERT statement inside the loop (line 8 of Figure 3.10), and

the base cases of the recursive definition, which are extracted from the loop (line 4).

The in and out fragments of a sel_stm statement can be easily distinguished using the strata of the relations occurring in it. As noted before, the stratification has been designed so that:

If a relation R in stratum i depends on another relation R' (and they are not mutually recursive), then the stratum of R' must be lower than i, and R' must have been computed previously.

Otherwise, both have the same stratum i (when the two relations are mutually recursive), and R and R' must be computed together.

Example 23 Given a relation of the form:

R := sel_stm_1 union sel_stm_2;

with str(R) = i and str(sel_stm_1) < i, sel_stm_1 belongs to the out fragment: the relations it depends on have already been fully computed in earlier strata when $\mathit{fix}_i$ is computed, so they will not change and the corresponding tuples can be inserted outside the loop.

□


The functions in and out are defined recursively on the structure of sel_stm. For example, if sel_stm ≡ sel_stm_1 except sel_stm_2 and str(sel_stm) = i, then str(sel_stm_2) < i, and therefore:

in(sel_stm) = in(sel_stm_1) except sel_stm_2;

out(sel_stm) = out(sel_stm_1) except sel_stm_2.
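The split described above can be illustrated with a small Python sketch. It is not the algorithm of [B.3]: it only covers the union case of Example 23 and the except case just shown, and the tiny syntax tree (the classes Select and Compound) and the functions stratum_of and split are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Select:
    sql: str                 # text of a plain SELECT
    relations: frozenset     # relations the SELECT reads from


@dataclass(frozen=True)
class Compound:
    op: str                  # "union" or "except"
    left: "Stm"
    right: "Stm"


Stm = Union[Select, Compound]


def stratum_of(stm: Stm, strata: dict) -> int:
    """Stratum of a statement: the highest stratum among its relations."""
    if isinstance(stm, Select):
        return max((strata[r] for r in stm.relations), default=0)
    return max(stratum_of(stm.left, strata), stratum_of(stm.right, strata))


def split(stm: Stm, i: int, strata: dict) -> Tuple[Optional[Stm], Optional[Stm]]:
    """Return (in_part, out_part) of stm for stratum i; None means empty."""
    if isinstance(stm, Select):
        # A SELECT over relations of stratum i is recursive (in);
        # one that only reads lower strata is a base case (out).
        return (stm, None) if stratum_of(stm, strata) == i else (None, stm)
    left_in, left_out = split(stm.left, i, strata)
    if stm.op == "except":
        # The subtracted operand lies in a lower stratum, so it is kept
        # unchanged and applied to both fragments of the left operand.
        def wrap(part):
            return None if part is None else Compound("except", part, stm.right)
        return wrap(left_in), wrap(left_out)
    right_in, right_out = split(stm.right, i, strata)

    def join(a, b):
        if a is None or b is None:
            return a if b is None else b
        return Compound("union", a, b)

    return join(left_in, right_in), join(left_out, right_out)
```

For a definition such as base union rec, where base reads only a lower-stratum relation and rec reads the relation being defined, split returns rec as the in fragment and base as the out fragment, as in Example 23.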

The complete in/out algorithm appears in Section 3.2 of [B.3]. Next, we show step by step, by means of an example, how hypothetical views are computed in the HR-SQL system presented in [B.3].

3.5.3. Views and queries in HR-SQL

The SQL PL script generated to process views follows the ideas of Section 3.3.3. We use the view reachable of Example 17 to illustrate the steps the system carries out to compute the tuples of a hypothetical view. The example shows the computation for a recursive definition containing positive and negative assumptions once the database has already been computed. The definition of reachable is recalled next:

reachable(ori varchar(10), des varchar(10)) :=
  assume (select * from bus where bus.ori = 'VDE' union
          select * from flight not in link),
         select 'RES','SPC',1.5 in boat


where link.des = reachable.ori

First, the system extends the original dependency graph with new arcs due to the hypothetical assumptions: two negatively labeled arcs to link, one from bus and another from flight. Since the system maximizes the number of strata, the stratification computed for the original database is also valid for the extended database.

Following the explanation in Section 3.3.2, the system looks for the relations that must be recomputed (called necessary in the presentation of the semantics). Specifically, the relations necessary to obtain the tuples of reachable are boat and link.

The algorithm that generates the SQL statements to compute the semantics of these relations together with the new view is very similar to the one presented in Figure 3.10 for computing the fixpoint of the whole database. The differences are explained along with the example.

The relations necessary to compute the view are created locally and recomputed using temporary tables. The advantage of temporary tables in an RDBMS is that they allow local computations, improving memory performance. They are also well suited to computing views with hypotheses, since they are discarded once the computation has finished and are not materialized in the RDBMS.

To compute the meaning of reachable we must rebuild the fixpoint for stratum $i = \min\{str(boat), str(link)\}$. Thus, we create temporary tables for the new definitions of boat and link, which incorporate the assumptions.

Next we present the SQL PL code generated for the new (temporary) definitions of boat and link.

First, the temporary tables are declared by means of the following statements:


DECLARE GLOBAL TEMPORARY TABLE link LIKE link;
DECLARE GLOBAL TEMPORARY TABLE boat LIKE boat;

and then the tuples corresponding to their new definitions are inserted:

INSERT INTO SESSION.boat
  ((SELECT 'TFS','GMZ',1) UNION
   (SELECT 'GMZ','VDE',1.5) UNION
   (SELECT 'SPC','TFN',2) UNION
   (SELECT 'RES','SPC',1.5));

INSERT INTO SESSION.link
  ((SELECT * FROM flight UNION
    SELECT * FROM SESSION.boat UNION
    SELECT * FROM bus)
   EXCEPT
   (SELECT * FROM bus WHERE bus.ori = 'VDE' UNION
    SELECT * FROM flight));

Temporary tables can be recognized in the code because they are prefixed with the reserved word SESSION. A hypothetical view is computed as follows:

We start from the definition of the hypothetical view HV := sel_hyp_HV.

For the view to be accepted by the SQL language processed by the underlying RDBMS, the hypotheses must be removed and a new definition HV := sel_stm is used, where sel_stm is the result of replacing R with SESSION.R inside sel(sel_hyp_HV) for the relations identified as necessary, i.e., link and boat.
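The replacement of R by SESSION.R can be pictured with the following Python sketch. It is a deliberately naive, text-based approximation and not the way the HR-SQL front-end is known to work: a real implementation would rewrite the parsed statement, whereas this regular expression would also touch relation names that appear inside string literals. The function name to_session is hypothetical.

```python
import re


def to_session(sel_stm, necessary):
    """Prefix every occurrence of a necessary relation with SESSION."""
    if not necessary:
        return sel_stm
    # Longest names first, so that one name being a prefix of another is harmless.
    names = "|".join(
        re.escape(r) for r in sorted(necessary, key=len, reverse=True)
    )
    # (?<![\w.]) skips identifiers that merely end with the name and
    # names that are already qualified (e.g. SESSION.link).
    pattern = re.compile(rf"(?<![\w.])({names})\b", re.IGNORECASE)
    return pattern.sub(r"SESSION.\1", sel_stm)
```

With the necessary relations link and boat, a fragment such as `select * from link, reachable where link.des = reachable.ori` is rewritten so that both occurrences of link read from the temporary table, while reachable is left untouched.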

The tuples of boat and link are computed and stored in temporary tables, which are discarded after the computation. The final contents of the table reachable are listed below¹:

ORI   DES     ORI   DES
-----------------------
TNF   LC      GC    MP
LC    GOM     LP    TNF
VAL   RES     GOM   VAL
TNF   GOM     LP    LC
GOM   RES     LC    VAL
LP    GOM     TNF   VAL
LC    RES     TNF   RES
LP    VAL     LP    RES

which, as noted earlier, correspond to the trips that can be made in the Canary archipelago in the event of the volcanic eruption.

¹ The system presents the result in two columns (ORI and DES). Four columns are used here for readability.


Queries in the HR-SQL system

The process for obtaining the tuples of a hypothetical query is very similar to the one explained for views. The system creates the temporary tables following the same steps as for views. In this case, however, the result of the query is not materialized in the database; it is simply displayed. The process is as follows:

The necessary relations are determined using the dependency graph.

The definitions of the necessary relations are extended and their tuples are computed.

The query becomes a new sel_stm which, as for views, is the result of replacing R with SESSION.R for the necessary relations occurring in the query.

The query sel_stm is submitted to the database through an SQL PL cursor; the answer is retrieved and displayed.

The temporary tables are discarded and the result of the query is not materialized.

The right-hand side of the definition of the view reachable is a valid example of a query in the HR-SQL system. The process is the same as in the previous section except that, instead of storing the view reachable in the database, the tuples are displayed on screen as the result of submitting the query sel_stm (with the corresponding replacements) to the RDBMS.

When a non-hypothetical, standard SQL query is posed, the system sends it directly to the underlying RDBMS without any further processing, as shown in Figure 3.9 at the beginning of this section.

We close the chapter with some performance results.

3.6. Performance analysis

This section (taken from [B.3]) presents performance results for the system. First, we show the efficiency gain obtained with the in/out algorithm explained in Section 3.5.2. Second, we compare several current RDBMSs and introduce the advantages of a semi-naïve optimization for linear recursive queries based on [121].

Execution times in this section are given in milliseconds. Performance is measured as the mean over several runs of a program, after discarding the maximum and the minimum of the recorded times.
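The measurement rule just described can be written down in a few lines. The sketch below assumes that exactly one maximum and one minimum are discarded and that at least three runs are available; the function name trimmed_mean_ms is hypothetical, and the number of runs actually used in the experiments is not stated in the text.

```python
from statistics import mean


def trimmed_mean_ms(times_ms):
    """Mean of the recorded times after dropping one maximum and one minimum."""
    if len(times_ms) < 3:
        raise ValueError("at least three measurements are needed")
    ordered = sorted(times_ms)
    # Drop the smallest and the largest value, then average the rest.
    return mean(ordered[1:-1])
```

Discarding the extremes makes the reported figure less sensitive to a single run disturbed by, for instance, a cold cache or background activity.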

Analysis of the in/out algorithm

To measure performance we take as test case (benchmark) the relation reach, which implements the transitive closure of the relation flight presented in Section 3.5. The definition of reach is recalled next:


reach(frm integer, to integer) :=
  select flight.frm, flight.to from flight
  union
  select reach.frm, flight.to from reach, flight
  where reach.to = flight.frm;

As shown, the type of the fields has been changed from varchar to the numeric type integer. That is, from now on the flight connections are the tuples $\{(1,2), (2,3), \ldots, (n,n+1)\}$, where n+1 is the number of nodes in the graph.

Table 3.1 shows the results of this benchmark for a number of input tuples (in the relation flight) ranging from 100 to 350 (first column). The second column gives the number of tuples in the result. The third and fourth columns show, respectively, the time needed to solve the query in HR-SQL without the in/out improvement (without FOI, for Factoring-Out Improvement) and with it (with FOI). The fifth column (speed-up) gives the speed gain due to FOI, and the sixth column gives the absolute time difference.
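The result sizes reported for this benchmark follow directly from the shape of the input. As a brief check (our own derivation, not taken from the original text): in the chain of n flights, node k reaches each of the n-k+1 nodes that come after it, so the transitive closure contains

```latex
\[
  |\mathit{reach}| \;=\; \sum_{k=1}^{n} (n-k+1) \;=\; \frac{n(n+1)}{2}
\]
```

tuples. For n = 100 this gives 5,050, for n = 350 it gives 61,425, and for n = 500 it gives 125,250, which matches the result sizes quoted in this section.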

These benchmarks were run on a computer with an Intel Core2 Quad CPU at 2.4 GHz and 3 GB of RAM, under Windows XP 32-bit SP3 and the IBM DB2 Express Edition 10.1.0 database server with its default configuration.

Tuples   Result tuples   Without FOI   With FOI   Speed-up   Difference
100       5,050            1,135         1,050       8.1%          85
150      11,325            4,438         3,428      29.4%       1,010
200      20,100           10,048         8,172      23.0%       1,876
250      31,375           19,001        16,041      18.5%       2,960
300      45,150           32,710        28,381      15.3%       4,329
350      61,425           50,085        44,175      13.4%       5,910

Table 3.1: Results of the FOI improvement (times in ms).

These tests confirm the performance gain obtained by factoring select * from flight in the definition of reach out of the repeat loop. Extracting this fragment alone yields an improvement of almost 30%. However, as the number of tuples grows the improvement decreases, since it is diluted by the time required to execute the repeat loop (due to the except operator executed inside it).

The next section compares HR-SQL with other deductive and relational systems.

Analysis of the system

In addition to HR-SQL, this comparison includes other current database systems that support recursive queries: PostgreSQL 9.3, Oracle 11g, and DB2 10.1. All of them are used with their default configuration.

The benchmark of the previous section is used again. We also present an optimization of the computation of recursion (following the differential semi-naïve approach of [121]) aimed at improving performance.


To make a fair comparison between HR-SQL and other RDBMSs, which do not discard duplicates, the except operator is omitted.

The comparison also includes the response times of the latest version of the DLV DB system. This is a deductive system that can also work with different RDBMSs (also through an ODBC connection) and compute the transitive closure. However, it computes the result from a logic program instead of using SQL.

The measured values appear in Table 3.2. The rows list the RDBMSs considered (first column) and the concrete system connected to that RDBMS (second column); the remaining five columns give the response times for each instance (from 100 to 500 tuples in the relation flight, which return from 5,050 to 125,250 tuples as the answer to the benchmark query).

RDBMS        System             100      200      300       400       500
PostgreSQL   Native SQL         161      187      240       360       713
             HR-SQL             500    3,198   12,406    39,802    71,922
             Diff-HR-SQL        208      459    1,073     2,271     4,115
             TDiff-HR-SQL       260      578    1,323     2,745     5,693
             DLV DB             703    1,651    4,458     8,047    13,120
Oracle       Native SQL         604    1,781    5,765    13,349    26,297
             HR-SQL             880    3,802   12,057    27,989    56,641
             Diff-HR-SQL        708    1,437    3,224     6,240    11,469
             TDiff-HR-SQL       646      995    1,708     2,453     3,422
             DLV DB           6,875   12,849   18,912    30,583    42,146
DB2          Native SQL         677    1,016    1,323     2,052     3,099
             HR-SQL           1,271    5,797   97,052   129,917   150,104
             Diff-HR-SQL        698      932    2,672     2,859     3,213
             TDiff-HR-SQL       646    1,000    1,578     4,021     9,021
             DLV DB           6,339   12,666   53,552    57,349   100,391

Table 3.2: Analysis of the systems (response times in ms).

To each RDBMS row (PostgreSQL, Oracle, DB2)² correspond five rows representing the concrete systems. Among these, the first row, Native SQL, corresponds to the native execution of the benchmark, i.e., using the formulation of the transitive closure that each RDBMS supports.

For example, DB2 uses the following syntax for the proposed benchmark (where rec is the temporary recursive relation used to build the relation reach):

INSERT INTO reach
  WITH rec(frm,to) AS
    (SELECT * FROM flight
     UNION ALL
     SELECT flight.frm, rec.to FROM flight, rec
     WHERE flight.to = rec.frm)
  SELECT * FROM rec;

² MySQL does not support any kind of recursive queries.


The next row gives the results of the HR-SQL system. The row Diff-HR-SQL gives the results for HR-SQL with the differential semi-naïve optimization. Roughly speaking, for a linear recursive query this optimization generates the tuples of each iteration using only those generated in the previous iteration [121].

To obtain this behavior, a parameter IT has been added to the HR-SQL benchmark, which records the iteration in which each tuple was generated:

INSERT INTO reach
  SELECT flight.frm, reach.to, IT
  FROM flight, reach
  WHERE flight.to = reach.frm AND
        reach.it = IT-1;
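The effect of the IT column can be reproduced with a short, self-contained Python sketch. It is an approximation under explicit assumptions: SQLite (standard sqlite3 module) replaces the RDBMSs of the comparison, the columns are named frm and dst because to is a reserved word in SQLite, and the function name diff_closure_size is hypothetical. As in the benchmark, duplicates are not removed, so the loop only terminates on acyclic inputs such as the chain used here.

```python
import sqlite3


def diff_closure_size(edges):
    """Transitive closure with an iteration column, semi-naive style."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE flight (frm INTEGER, dst INTEGER)")
    cur.execute("CREATE TABLE reach (frm INTEGER, dst INTEGER, it INTEGER)")
    cur.executemany("INSERT INTO flight VALUES (?, ?)", edges)
    # Iteration 0: the base case, taken out of the loop.
    cur.execute("INSERT INTO reach SELECT frm, dst, 0 FROM flight")
    it = 1
    while True:
        # Join flight only with the tuples produced in the previous iteration.
        cur.execute(
            "INSERT INTO reach "
            "SELECT flight.frm, reach.dst, ? FROM flight, reach "
            "WHERE flight.dst = reach.frm AND reach.it = ?",
            (it, it - 1),
        )
        if cur.rowcount == 0:   # nothing new: the fixpoint has been reached
            break
        it += 1
    return cur.execute("SELECT COUNT(*) FROM reach").fetchone()[0]
```

Each pass only touches the tuples tagged with the previous iteration number, which is what keeps the join small compared with rejoining the whole of reach every time.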

Next, the row TDiff-HR-SQL corresponds to an alternative implementation of the differential semi-naïve optimization which stores the tuples generated in each iteration in a temporary table. The result of each iteration is then computed by joining the table flight with that temporary table. This avoids counting, at each pass through the loop, the tuples of the relation reach, which grows with every iteration. We therefore use two temporary tables: one to access the tuples generated in the previous iteration and another to store the new tuples. In the following example, reach_temp1 holds the tuples generated in the previous iteration and reach_temp2 is used for the current iteration.

The SQL statements (included in a script) sent to DB2 in each iteration are shown next (again, temporary tables are prefixed with the reserved word SESSION):

INSERT INTO SESSION.reach_temp2
  SELECT flight.ori, SESSION.reach_temp1.des
  FROM flight, SESSION.reach_temp1
  WHERE flight.des = SESSION.reach_temp1.ori;
...
INSERT INTO reach SELECT * FROM SESSION.reach_temp1;
DELETE FROM SESSION.reach_temp1;
INSERT INTO SESSION.reach_temp1
  SELECT * FROM SESSION.reach_temp2;
DELETE FROM SESSION.reach_temp2;

The first SQL statement stores in reach_temp2 the result just computed for the current iteration. The following statements load the result of the previous iteration into the table reach and move the contents of reach_temp2 to reach_temp1, so that they are available for the next iteration. Finally, reach_temp2 is emptied to prepare it for the next iteration as well.
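The two-table scheme can be tried out with the following Python sketch, which wraps the statements above in a loop. The assumptions are the same as before and are not part of the original system: SQLite temporary tables stand in for the DB2 SESSION tables, the columns are called frm and dst, the loop stops when the previous-iteration table is empty (the stopping test of the real script is not shown in the text), and the function name tdiff_closure_size is hypothetical.

```python
import sqlite3


def tdiff_closure_size(edges):
    """Transitive closure using two temporary tables, one per iteration."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE flight (frm INTEGER, dst INTEGER)")
    cur.execute("CREATE TABLE reach (frm INTEGER, dst INTEGER)")
    cur.execute("CREATE TEMP TABLE reach_temp1 (frm INTEGER, dst INTEGER)")
    cur.execute("CREATE TEMP TABLE reach_temp2 (frm INTEGER, dst INTEGER)")
    cur.executemany("INSERT INTO flight VALUES (?, ?)", edges)
    # reach_temp1 starts with the base case (the flights themselves).
    cur.execute("INSERT INTO reach_temp1 SELECT frm, dst FROM flight")
    while cur.execute("SELECT EXISTS (SELECT 1 FROM reach_temp1)").fetchone()[0]:
        # New tuples: flight joined with the previous iteration only.
        cur.execute(
            "INSERT INTO reach_temp2 "
            "SELECT flight.frm, reach_temp1.dst FROM flight, reach_temp1 "
            "WHERE flight.dst = reach_temp1.frm"
        )
        # Store the previous iteration, then shift temp2 into temp1.
        cur.execute("INSERT INTO reach SELECT * FROM reach_temp1")
        cur.execute("DELETE FROM reach_temp1")
        cur.execute("INSERT INTO reach_temp1 SELECT * FROM reach_temp2")
        cur.execute("DELETE FROM reach_temp2")
    return cur.execute("SELECT COUNT(*) FROM reach").fetchone()[0]
```

Compared with the IT-column variant, the join never reads reach at all, and the termination test looks only at the small table of the last iteration.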

Using temporary tables improves performance because they require neither log entries nor concurrency management. They are computed in main memory until it is exhausted; if there is not enough RAM, secondary storage is used.

According to the measured figures, the best performance is obtained by native SQL on PostgreSQL for all the benchmark instances considered. The worst results correspond to HR-SQL without optimization (including the except operator), which is also due to the fact that the join and the difference must be processed in every iteration for all tuples, including those that will not be used to generate new ones. The differential semi-naïve optimization (which also avoids the except operator) largely solves this problem, with a factor of 150,104/3,213 ≈ 46.7× when comparing HR-SQL with Diff-HR-SQL on DB2. DLV DB is the next system in efficiency: on the larger instances it behaves better than HR-SQL but worse than the rest. Depending on the RDBMS, the next best result is obtained by either Diff-HR-SQL or TDiff-HR-SQL: the former outperforms the latter on PostgreSQL, the opposite holds on Oracle, and on DB2 the outcome depends on the instance (TDiff-HR-SQL is faster for 100 and 300 tuples, Diff-HR-SQL for 200, 400 and 500). Both outperform native SQL on Oracle, except for the smallest instance, and Diff-HR-SQL behaves similarly to native SQL on DB2.

These figures show how differently similar techniques behave depending on how each RDBMS handles them. Moreover, temporary tables are of paramount importance for saving time on Oracle and, judging by the results, have the opposite effect on DB2 for the largest instances.

Taking all this into account, in the best case we outperform native SQL by a factor of 26,297/3,422 ≈ 7.7× (TDiff-HR-SQL on Oracle, 500 tuples), whereas in the worst case (with the best optimization) we are slower by a factor of 4,115/713 ≈ 5.8× (Diff-HR-SQL on PostgreSQL, 500 tuples).

To better understand this difference, note that the HR-SQL system runs an interpreted (Python) script and, in each iteration, sends several SQL statements to the RDBMS through the ODBC connection, which entails a considerable performance penalty.

Finally, we can conclude that the mechanism for computing HR-SQL databases is a good starting point for exploring further performance improvements.

The presentation of HR-SQL closes the third chapter of this thesis, which summarizes the contents of the associated publications [B.1,B.2,B.3]. The last chapter of the dissertation presents the conclusions and outlines future work.


Publications associated with Chapter 3

[B.1] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández. Formalizing a Broader Recursion Coverage in SQL. In Symposium on Practical Aspects of Declarative Languages (PADL'13), volume 7752 of LNCS, pages 93–108, 2013. → Page 176

[B.2] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández. Incorporating Hypothetical Views and Extended Recursion into SQL Database Systems. In Ken Mcmillan, Aart Middeldorp, Geoff Sutcliffe, and Andrei Voronkov, editors, LPAR-19, volume 26 of EPiC Series, pages 9–22. EasyChair, 2014. → Page 192



Chapter 4

Conclusions and future work

Nowadays there is a multitude of database applications for managing all kinds of companies, scientific environments and public institutions. They are an essential part of any business or activity, since they store the data that is strategic for its operation. The relational model is the most widespread and widely used model for implementing databases today. However, the study of deductive database (DDB) languages such as Datalog [103, 101, 41] has gained importance in recent years because of their simplicity and expressive power.

This thesis contributes to both areas. On the one hand, in the area of DDBs with constraints, we have presented, implemented and given foundations for the language HH¬(C), which provides, in addition to a constraint language richer than the usual ones, new capabilities not supported by the reference language, Datalog with constraints [95, 64]. Specifically, we have incorporated universal quantifiers and hypothetical queries, which may even occur in recursive relations.

On the other hand, within relational databases (RDBs), we have proposed the language HR-SQL as an extension of standard SQL [36], together with its semantic foundations and its implementation. Our language supports hypothetical reasoning and a more general treatment of recursion that allows mutual and non-linear recursion.

Regarding HH¬(C), we have presented:

A formalization of the language that includes, among other features, intuitionistic implication [72] to express hypothetical queries, and explicit universal and existential quantifiers. A stratified fixpoint operational semantics has also been presented, which guides the implementation of a concrete system. In addition, we have proved soundness and completeness of the operational semantics with respect to the proof-theoretic semantics previously defined in [83]. Allowing hypotheses in queries and database predicates complicates the fixpoint computation compared with other DDB systems. However, the extension of the notion of dependency graph, on which the stratification of predicates is based, has proved effective for carrying out the computation correctly. The combination of the stratified semantics with a compact constraint system guarantees termination of the computation.

A database system that implements the fixpoint semantics of the language HH¬(C). The system is, notably, the first implementation to incorporate hypothetical queries based on intuitionistic implication for a DDB language. Moreover, the theoretical framework has proved suitable for adding to the system features that are common in relational languages:

› Aggregate functions have been incorporated into the system. To this end, we proposed an approach similar to the one used to incorporate negation into deductive databases, based on stratification (Chapter 3 of [122]). We have implemented the count function over a predicate, as well as sum, average, minimum and maximum over the variables of a predicate. Stratification makes it possible to define hypothetical queries with aggregate functions while ensuring the correctness of the data even in local computations.

› Strong integrity constraints, which guarantee a consistent use of the database, have been implemented. Specifically, we have implemented primary key, foreign key and functional dependency constraints. Having the constraint system available in the scheme makes it possible to express integrity constraints in a simple way and to carry them over directly to the implementation.

Regarding HR-SQL, we have presented:

A stratified fixpoint semantics inspired by the semantics of HH¬(C). We propose carrying over certain declarative techniques to give semantics to an RDB language. When a hypothetical query is posed, the notion of dependency graph is also used to determine which relations of the original database are affected and must be recomputed. Since hypotheses are restricted to the definition of views and queries, part of the fixpoint computation is reused (and the computed graph is extended), which optimizes their computation. We thus present one of the few theoretical formalisms for an RDB language and, in addition, extend it with DDB capabilities.

The HR-SQL system carries over features from the formal framework to a concrete RDBMS. We have proposed an extension of SQL that incorporates hypothetical reasoning and a more general treatment of recursion, which includes non-linear recursion and mutual recursion, and avoids non-terminating computations when computing the meaning of relations that define graphs with cycles. Currently, many database management systems accept the limited form of recursion of the SQL standard (such as PostgreSQL, DB2 or Oracle), while others do not support recursive definitions in their relations at all (such as MySQL or Access). Nor is hypothetical reasoning available in any commercial RDB system. HR-SQL provides extended recursion and hypotheses in views and queries, and can be incorporated into most RDBMSs. The only requirement for extending an RDBMS with HR-SQL is that it can be accessed from Python or a fourth-generation language. We have also presented a performance comparison of the system with other DDB and RDB systems. Finally, HR-SQL uses the in/out algorithm, which reduces the calls to the database management system by restricting the generation and insertion of tuples inside the fixpoint loops to the recursive part of the relations (if any).

As we have seen, the use of deductive database and program transformation techniques to improve performance is an open field of study that yields good results. The studies we have carried out and the theoretical frameworks we have proposed make it possible to add new functionality to any RDBMS, backed by a sound foundation.


Building on these contributions, we propose some lines of short-term future work. From the study carried out to extend RDB management systems with a system implemented in SWI-Prolog [129], it follows that the efficiency of HH¬(C) can be improved by delegating certain parts of the fixpoint computation to the RDBMS. Specifically, taking advantage of the efficiency of current relational systems, the computation of the extensional part, the non-recursive part and the non-hypothetical part of HH¬(C) clauses could be delegated to a relational system in the cases with the worst performance.

For HR-SQL we propose, first, integrating into the system the differential semi-naïve optimization following the approach of [121], which improves the performance of the system as we did with the benchmark of Section 3.6. Along the same line, there are also optimization methods for linear recursion [84] that can be easily applied to our system, again making use of the dependency graph together with an analysis of the type of query (e.g., whether it involves a transitive closure, contains aggregates, or generates duplicates). Whether combining these two methods yields a further performance gain would be a worthwhile object of study.

Finally, another short-term step is to allow hypothetical views in HR-SQL without having to materialize them. Specifically, the fixpoint computation could be integrated into the RDBMS itself, without the front-end used so far. For this purpose we propose using the table functions available in some database management systems (for example, DB2). The goal is to obtain results on the fly, as RDBMSs do when handling views. Temporary tables also seem suitable for achieving good efficiency, as we have seen when using them to process hypothetical views and queries.

Another, medium-term line of research is improving the efficiency of HR-SQL with techniques other than those used so far, such as magic sets transformations [10] or tabling techniques [116], which could be adapted to our implementation, as other current systems do [103, 104, 19].

Finally, as long-term work, we propose carrying over more expressive features of HH¬(C) to HR-SQL, such as constraints. Representing constraints as rows of relational tables may provide even greater expressive power.


Bibliography

[1] Almendros-Jiménez, J.M. and A. Becerra-Terón: A Relational Algebra for Functional Logic Deductive Databases. In Proc. of 5th International Conference on Perspectives of System Informatics, PSI03, number 2890 in LNCS, pages 494–508. Springer, 2003.

[2] Apt, K. R., H. A. Blair and A. Walker: Towards a Theory of Declarative Knowledge. Foundations of deductive databases and logic programming, pages 89–148, 1988.

[3] Arni, F., K. Ong, S. Tsur, H. Wang and C. Zaniolo: The Deductive Database System LDL++. TPLP, 3(1):61–94, 2003.

[4] Arruabarrena, R., P. Lucio and M. Navarro: A Strong Logic Programming View for Static Embedded Implications. In Foundations of Software Science and Computation Structure, Second International Conference, FoSSaCS'99, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS'99, Amsterdam, The Netherlands, March 22-28, 1999, Proceedings, pages 56–72, 1999.

[5] Balbin, I., G. S. Port, K. Ramamohanarao and K. Meenakshi: Efficient Bottom-UP Computation of Queries on Stratified Databases. J. Log. Program., 11(3&4):295–344, 1991.

[6] Balbin, I. and K. Ramamohanarao: A Generalization of the Differential Approach to Recursive Query Evaluation. J. Log. Program., 4(3):259–262, 1987.

[7] Balmin, A., T. Papadimitriou and Y. Papakonstantinou: Hypothetical Queries in an OLAP Environment. In Proceedings of the 26th International Conference on Very Large Data Bases, VLDB '00, pages 220–231, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc., ISBN 1-55860-715-3. http://dl.acm.org/citation.cfm?id=645926.672016.

[8] Bancilhon, F., D. Maier, Y. Sagiv and J. D Ullman: Magic sets and other strange ways to implement logic programs. In PODS '86: Proceedings of the fifth ACM SIGACT-SIGMOD symposium on Principles of database systems, pages 1–15, New York, NY, USA, 1986. ACM, ISBN 0-89791-179-2.

[9] Baudinet, M., M. Nizette and P. Wolper: On the Representation of Infinite Temporal Data and Queries, 1991.

[10] Beeri, C. and R. Ramakrishnan: On the power of magic. Journal of Logic Programming, 10(3-4):255–299, 1991, ISSN 0743-1066.

[11] Bertino, E., B. Catania and R. Gori: Enhancing the Expressive Power of the U-Datalog Language. Theory and Practice of Logic Programming, 1(1):105–122, 2001.


[12] Bocca, J. B.: MegaLog - A platform for developing Knowledge Base Management Systems. In Makinouchi, Akifumi (editor): Database Systems for Advanced Applications ’91, Proceedings of the Second International Symposium on Database Systems for Advanced Applications, Tokyo, Japan, April 2-4, 1991, volume 2 of Advanced Database Research and Development Series, pages 374–380. World Scientific, 1991.

[13] Bonner, A. J.: Hypothetical Datalog: Complexity and Expressibility. Theoretical Computer Science, 76:144–160, 1988.

[14] Bonner, A. J.: A Logical Semantics For Hypothetical Rulebases With Deletion, 1997.

[15] Bonner, A. J. and L. T. McCarty: Adding Negation-as-Failure to Intuitionistic Logic Programming. In Lusk, Ewing L. and Ross A. Overbeek (editors): Logic Programming, Proc. of the North American Conference, pages 681–703. The MIT Press, 1989. citeseer.ist.psu.edu/bonner92adding.html.

[16] Bonner, A. J., L. T. McCarty and K. Vadaparty: Expressing Database Queries with Intuitionistic Logic. In Lusk, Ewing L. and Ross A. Overbeek (editors): Proceedings of the North American Conference on Logic Programming, pages 831–850, 1989, ISBN 0-262-62064-2. citeseer.ist.psu.edu/bonner89expressing.html.

[17] Bry, F., H. Decker and R. Manthey: A Uniform Approach to Constraint Satisfaction and Constraint Satisfiability in Deductive Databases. In EDBT ’88: Proceedings of the International Conference on Extending Database Technology, pages 488–505, London, UK, 1988. Springer-Verlag, ISBN 3-540-19074-0.

[18] Byon, J. H. and P. Z. Revesz: DISCO: A Constraint Database System with Sets. In CONTESSA Workshop on Constraint Databases and Applications, pages 68–83. Springer-Verlag, 1995.

[19] Cabeza, D. and M. Hermenegildo: The Ciao Module System: A New Module System for Prolog. ENTCS, 30(3):122–142, 2000, ISSN 1571-0661. Parallelism and Implementation Technology for (Constraint) Logic Programming (in connection with ICLP’99, International Conference on Logic Programming).

[20] Calì, A., G. Gottlob and T. Lukasiewicz: Datalog±: a unified approach to ontologies and integrity constraints. In ICDT ’09: Proceedings of the 12th International Conference on Database Theory, pages 14–30, New York, NY, USA, 2009. ACM, ISBN 978-1-60558-423-2.

[21] Chandra, A. K. and D. Harel: Horn Clauses Queries and Generalizations. J. Log. Program., 2(1):1–15, 1985.

[22] Chang, C. L.: DEDUCE 2: Further Investigation of Deduction in Relational Data Bases. In IBM, Res.R. No.RJ2147, San Jose; ACM Computing Reviews 40,416. ACM Computing Reviews, May 1978.

[23] Chimenti, D., R. Gamboa, R. Krishnamurthy, S. Naqvi, S. Tsur and C. Zaniolo: The LDL System Prototype. IEEE Transactions on Knowledge and Data Engineering, 2:76–90, 1990.

[24] Christiansen, H. and T. Andreasen: A Practical Approach to Hypothetical Database Queries. In Transactions and Change in Logic Databases, volume 1472 of LNCS, pages 340–355. Springer, 1998, ISBN 3-540-65305-8.

[25] Codd, E. F.: Data Base Sublanguage Founded on the Relational Calculus. IBM Research Report, San Jose, California, RJ893, 1971.

[26] Codd, E. F.: A Database Sublanguage Founded on the Relational Calculus. In Proceedings of 1971 ACM-SIGFIDET Workshop on Data Description, Access and Control, San Diego, California, November 11-12, 1971, pages 35–68, 1971.

[27] Codd, E. F.: Relational Completeness of Data Base Sublanguages. In: R. Rustin (ed.): Database Systems: 65-98, Prentice Hall and IBM Research Report RJ 987, San Jose, California, 1972. db/labs/ibm/RJ987.html.

[28] Codd, E. F.: A Relational Model for Large Shared Databanks. Communications of the ACM, 13(6):377–390, June 1970.

[29] Correas, J., J. M. Gómez, M. Carro, D. Cabeza and M. V. Hermenegildo: A Generic Persistence Model for (C)LP Systems (and Two Useful Implementations). In Jayaraman, Bharat (editor): Practical Aspects of Declarative Languages, 6th International Symposium, PADL 2004, Dallas, TX, USA, June 18-19, 2004, Proceedings, volume 3057 of Lecture Notes in Computer Science, pages 104–119. Springer, 2004, ISBN 3-540-22253-7. http://dx.doi.org/10.1007/978-3-540-24836-1_8.

[30] Date, C. J.: SQL and relational theory: how to write accurate SQL code. O’Reilly, Sebastopol, CA, 2009.

[31] Dell’Armi, T., W. Faber, G. Ielpa, N. Leone and G. Pfeifer: Aggregate Functions in DLV. In Answer Set Programming 2003 (SP03), Messina, Sicily, September 26-28, 2003.

[32] Emden, M. H. Van and R. A. Kowalski: The Semantics of Predicate Logic as a Programming Language. J. ACM, 23(4):733–742, 1976.

[33] Finkelstein, S. J., N. Mattos, I. S. Mumick and H. Pirahesh: Expressing Recursive Queries in SQL. Technical report, ISO, 1996.

[34] García-Díaz, M. and S. Nieva: Solving Constraints for an Instance of an Extended CLP Language over a Domain based on Real Numbers and Herbrand Terms. Journal of Functional and Logic Programming, 2003(2), September 2003.

[35] García-Díaz, M. and S. Nieva: Providing Declarative Semantics for HH Extended Constraint Logic Programs. In Proceedings of the 6th ACM SIGPLAN Int. Conf. on PPDP, pages 55–66, 2004.

[36] Garcia-Molina, H., J. D. Ullman and J. Widom: Database systems - the complete book (2. ed.). Pearson Education, 2009, ISBN 978-0-13-187325-4.

[37] Gelfond, M. and V. Lifschitz: The Stable Model Semantics For Logic Programming. In ICLP/SLP, pages 1070–1080. MIT Press, 1988.

[38] Golfarelli, M. and S. Rizzi: What-if Simulation Modeling in Business Intelligence. IJDWM, 5(4):24–43, 2009.

[39] Greco, S.: Dynamic Programming in Datalog with Aggregates. IEEE Trans. on Knowl. and Data Eng., 11(2):265–283, 1999, ISSN 1041-4347.

[40] Green, C. C. and B. Raphael: The use of theorem-proving techniques in question-answering systems. In ACM ’68: Proceedings of the 1968 23rd ACM national conference, pages 169–181, New York, NY, USA, 1968. ACM.

[41] Green, T. J.: LogiQL: A Declarative Language for Enterprise Applications. In Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS ’15, pages 59–64, New York, USA, 2015. ACM, ISBN 978-1-4503-2757-2. http://doi.acm.org/10.1145/2745754.2745780.

[42] Griffin, T. and R. Hull: A Framework for Implementing Hypothetical Queries. In SIGMOD Conference, pages 231–242, 1997.

[43] Grosky, W. I. and R. Mehrotra: Introduction: Image Database Management. Computer, 22(12):7–8, 1989, ISSN 0018-9162.

[44] Grumbach, S., P. Rigaux, L. Chesnay and L. Segoufin: Spatio-Temporal Data Handling with Constraints, 1998.

[45] Grumbach, S., P. Rigaux, L. Chesnay and L. Segoufin: The DEDALE System for Complex Spatial Queries, 1998.

[46] Günther, O. and J. Bilmes: Tree-Based Access Methods for Spatial Databases: Implementation and Performance Evaluation. IEEE Trans. on Knowl. and Data Eng., 3(3):342–356, 1991, ISSN 1041-4347.

[47] Hansen, M. R., B. S. Hansen, P. Lucas and P. van Emde Boas: Integrating relational databases and constraint languages. Comput. Lang., 14(2):63–82, 1989, ISSN 0096-0551.

[48] Hargrove, W. W., R. H. Gardner, M. G. Turner, W. H. Romme and D. G. Despain: Simulating fire patterns in heterogeneous landscapes, 2000.

[49] Henschen, L. J. and S. A. Naqvi: On compiling queries in recursive first-order databases. J. ACM, 31(1):47–85, 1984, ISSN 0004-5411.

[50] Inmon, W. H.: Building the data warehouse. QED Information Sciences, Inc., Wellesley, MA, USA, 2005.

[51] ISO/IEC: SQL:1986 ISO/IEC 1986 Standard, 1986.

[52] Jaffar, J. and J. L. Lassez: Constraint Logic Programming. In 14th ACM Symp. on Principles of Programming Languages (POPL’87), pages 111–119, Munich, Germany, January 1987. ACM Press.

[53] Jagadish, H. V.: A retrieval technique for similar shapes. In SIGMOD ’91: Proceedings of the 1991 ACM SIGMOD international conference on Management of data, pages 208–217, New York, NY, USA, 1991. ACM, ISBN 0-89791-425-2.

[54] Jarke, M., M. A. Jeusfeld and C. Quix: ConceptBase V7.1 User Manual. Technical report, RWTH Aachen, April 2008.

[55] Jeusfeld, M. and M. Jarke: From Relational to Object-Oriented Integrity Simplification, 1991.

[56] Jeusfeld, M. and M. Staudt: Query Optimization in Deductive Object Bases, 1993.

[57] Kabanza, F., J.-M. Stevenne and P. Wolper: Handling Infinite Temporal Data. In Journal of Computer and System Sciences, pages 392–403, 1990.

[58] Kanellakis, P. C., G. M. Kuper and P. Z. Revesz: Constraint Query Languages. In Symposium on Principles of Database Systems, pages 299–313, 1990.

[59] Kanjamala, P., P. Z. Revesz and Y. Wang: MLPQ/GIS: A GIS using Linear Constraint Databases. In Proc. Ninth International Conference on Management of Data, pages 389–393. McGraw Hill, 1998.

[60] Kellogg, C. and L. Travis: Reasoning with Data in a Deductively Augmented Data Management System. In Advances in Data Base Theory, pages 261–295, 1979.

[61] Kemp, D. B., K. Ramamohanarao, I. Balbin and K. Meenakshi: Propagating Constraints in Recursive Deduction Databases. In NACLP, pages 981–998, 1989.

[62] Kowalski, R. A.: Logic for Data Description. In Logic and Data Bases, pages 77–103, 1977.

[63] Kowalski, R. A., F. Sadri and P. Soper: Integrity Checking in Deductive Databases. In Proceedings of the VLDB International Conference, pages 61–69. Morgan Kaufmann Publishers, 1987.

[64] Kuper, G., L. Libkin and J. Paredaens (editors): Constraint Databases. Springer, 2000.

[65] Lam, M. S., J. Whaley, V. B. Livshits, M. C. Martin, D. Avots, M. Carbin and C. Unkel: Context-sensitive program analysis as database queries. In Li, Chen (editor): PODS, pages 1–12. ACM, 2005, ISBN 1-59593-062-0.

[66] Leach, J., S. Nieva and M. Rodríguez-Artalejo: Constraint Logic Programming with Hereditary Harrop Formulas. TPLP, 1(4):409–445, 2001.

[67] Lefebvre, A.: Towards an Efficient Evaluation of Recursive Aggregates in Deductive Databases. New Generation Comput., 12(2):131–160, 1994.

[68] Leone, N., G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri and F. Scarcello: The DLV system for knowledge representation and reasoning. ACM Trans. Comput. Log., 7(3):499–562, 2006.

[69] Lifschitz, V.: Introduction to Answer Set Programming. Introductory course at the 16th European Summer School in Logic, Language and Information. Unpublished Draft, 2004. Available at www.cs.utexas.edu/users/vl/mypapers/esslli.ps, 2004.

[70] Maher, M. J. and R. Ramakrishnan: Déjà vu in fixpoints of logic programs. In Proceedings of the North American Conference on Logic Programming, pages 963–980. The MIT Press, 1989.

[71] Maluszynski, J. and A. Szalas: Logical Foundations and Complexity of 4QL, a Query Language with Unrestricted Negation. CoRR, abs/1011.5105, 2010. http://arxiv.org/abs/1011.5105.

[72] McCarty, L. Thorne: Clausal Intuitionistic Logic I - Fixed-Point Semantics. J. Log. Program., 5(1):1–31, 1988.

[73] Melton, J. and A. R. Simon: SQL: 1999 - Understanding Relational Language Components. Morgan Kaufmann, May 2001, ISBN 1558604561.

[74] Miller, D.: A Logic Programming Language with Lambda-Abstraction, Function Variables, and Simple Unification. J. Log. Comput., 1(4):497–536, 1991. http://dx.doi.org/10.1093/logcom/1.4.497.

[75] Miller, D., G. Nadathur, F. Pfenning and A. Scedrov: Uniform Proofs as a Foundation for Logic Programming. Annals of Pure and Applied Logic, 51:125–157, 1991.

[76] Minker, J.: Perspectives in Deductive Databases. In PODS, page 135, 1987.

[77] Minker, J.: Logic and Databases: A 20 Year Retrospective. In Logic in Databases, pages 3–57, 1996.

[78] Minker, J. and J. M. Nicolas: On recursive axioms in deductive Databases. Information Systems, 8(1):1–13, 1983.

[79] Morris, K. A., J. F. Naughton, Y. P. Saraiya, J. D. Ullman and A. Van Gelder: YAWN! (Yet Another Window on NAIL!). IEEE Data Eng. Bull., 10(4):28–43, 1987.

[80] Naqvi, S. and S. Tsur: A logical language for data and knowledge bases. Computer Science Press, Inc., New York, NY, USA, 1989, ISBN 0-7167-8200-6.

[81] Nash, J. C.: The (Dantzig) Simplex Method for Linear Programming. Computing in Science and Engg., 2(1):29–31, 2000, ISSN 1521-9615.

[82] Naughton, J. F. and R. Ramakrishnan: How to forget the past without repeating it. Journal of ACM, 41(6):1151–1177, 1994, ISSN 0004-5411.

[83] Nieva, S., F. Sáenz-Pérez and J. Sánchez: Formalizing a Constraint Deductive Database Language based on Hereditary Harrop Formulas with Negation. In Proc. 9th International Symposium on Functional and Logic Programming (FLOPS’08), volume 4989 of LNCS, pages 289–304. Springer Verlag, 2008.

[84] Ordonez, C.: Optimization of Linear Recursive Queries in SQL. IEEE Transactions on Knowledge and Data Engineering, 22(2):264–277, 2010, ISSN 1041-4347.

[85] Ramakrishnan, R.: Magic Templates: A Spellbinding Approach to Logic Programs. In Journal of Logic Programming, pages 140–159, 1988.

[86] Ramakrishnan, R., D. Srivastava and S. Sudarshan: Controlling the Search in Bottom-Up Evaluation. In Joint Intl. Conference and Symposium on Logic Programming, pages 273–287, 1992.

[87] Ramakrishnan, R., D. Srivastava and S. Sudarshan: CORAL - Control, Relations and Logic. In Proceedings of the International Conference on Very Large Databases, pages 238–250, 1992.

[88] Ramakrishnan, R., D. Srivastava, S. Sudarshan and P. Seshadri: Implementation of the CORAL Deductive Database System, 1993.

[89] Ramakrishnan, R. and J. D. Ullman: A survey of research on Deductive Databases. The Journal of Logic Programming, 23(2):125–149, 1993.

[90] Ramalingam, G. and E. Visser (editors): Proceedings of the 2007 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-based Program Manipulation, 2007, Nice, France, January 15-16, 2007. ACM, 2007, ISBN 978-1-59593-620-2.

[91] Ramamohanarao, K. and J. Harland: An introduction to Deductive Database Languages and Systems. The VLDB Journal, 3(2):107–122, 1994, ISSN 1066-8888.

[92] Reiter, R.: Towards a Logical Reconstruction of Relational Database Theory. In On Conceptual Modelling (Intervale), pages 191–233, 1982.

[93] Reiter, R.: A Theory of Diagnosis from First Principles. Artif. Intell., 32(1):57–95, April 1987, ISSN 0004-3702. http://dx.doi.org/10.1016/0004-3702(87)90062-2.

[94] Revesz, P. Z.: Refining Restriction Enzyme Genome Maps. Constraints, 2(3/4):361–375, 1997, ISSN 1383-7133.

[95] Revesz, P. Z.: Datalog and Constraints. In Kuper, G., L. Libkin and J. Paredaens (editors): Constraint Databases, chapter 7, pages 151–174. Springer, 2000.

[96] Revesz, P. Z.: Introduction to Constraint Databases. Springer, 2002.

[97] Revesz, P. Z. and Yiming Li: MLPQ: a linear constraint database system with aggregate operators. Database Engineering and Applications Symposium, International, 0:132, 1997.

[98] Robinson, J. A.: A Machine-Oriented Logic Based on the Resolution Principle. J. ACM, 12(1):23–41, 1965, ISSN 0004-5411.

[99] Ronen, R. and O. Shmueli: Evaluating very large datalog queries on social networks. In EDBT ’09: Proceedings of the 12th International Conference on Extending Database Technology, pages 577–587, New York, NY, USA, 2009. ACM, ISBN 978-1-60558-422-5.

[100] Ross, K. A.: Modular stratification and magic sets for Datalog programs with negation. J. ACM, 41(6):1216–1266, 1994, ISSN 0004-5411.

[101] Sáenz-Pérez, F.: Implementing Tabled Hypothetical Datalog. In Proceedings of the 25th IEEE International Conference on Tools with Artificial Intelligence, ICTAI’13, November 2013.

[102] Sáenz-Pérez, F.: Towards Bridging the Expressiveness Gap Between Relational and Deductive Databases. In XIII Jornadas sobre Programación y Lenguajes, PROLE2013 (SISTEDES), September 2013.

[103] Sáenz-Pérez, F.: Datalog Educational System Version 3.10, January 2015. http://des.sourceforge.net.

[104] Sagonas, K., T. Swift and D. S. Warren: XSB as an Efficient Deductive Database Engine. In Proceedings of the ACM SIGMOD International Conference on the Management of Data, pages 442–453. ACM Press, 1994.

[105] Salomon, A.: Implementation of a Database System with Boolean Algebra Constraints. Doctoral thesis, University of Nebraska, 1998.

[106] Shah, J. J. and M. Mantyla: Parametric and Feature Based CAD/Cam: Concepts, Techniques, and Applications. John Wiley & Sons, Inc., New York, NY, USA, 1995, ISBN 0471002143.

[107] Shepherdson, J. C.: Negation in Logic Programming. In Minker, J. (editor): Foundations of Deductive Databases and Logic Programming, pages 19–88. Kaufmann, Los Altos, CA, 1988.

[108] Shih, C. and S. W. Dietrich: Extension Table Evaluation of Datalog Programs with Negation. In Proceedings of the IEEE International Phoenix Conference on Computers and Communications, volume AZ, pages 792–798. Scottsdale, March 1991.

[109] Shkapsky, A., M. Yang and C. Zaniolo: Optimizing recursive queries with monotonic aggregates in DeALS. In Gehrke, Johannes, Wolfgang Lehner, Kyuseok Shim, Sang Kyun Cha and Guy M. Lohman (editors): 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13-17, 2015, pages 867–878. IEEE, 2015.

[110] Silberschatz, A., H. Korth and S. Sudarshan: Database Systems Concepts. McGraw-Hill, Inc., New York, USA, 5th edition, 2006, ISBN 0072958863, 9780072958867.

[111] Srivastava, D., R. Ramakrishnan, P. Seshadri and S. Sudarshan: Coral++: Adding Object-Orientation to a Logic Database Language. In Proceedings of the International Conference on Very Large Data Bases, pages 158–170. Morgan Kaufmann Publishers, Inc, 1993.

[112] Stonebraker, M. and K. Keller: Embedding Expert Knowledge and Hypothetical Data Bases into a Data Base System. In The 1980 ACM SIGMOD International Conference on Management of Data, SIGMOD ’80, pages 58–66. ACM, 1980, ISBN 0-89791-018-4. http://doi.acm.org/10.1145/582250.582261.

[113] Sudarshan, S. and R. Ramakrishnan: Aggregation and Relevance in Deductive Databases. In Proceedings of the International Conference on Very Large Databases, pages 501–511, 1991.

[114] Sudarshan, S. and R. Ramakrishnan: Optimizations of bottom-up evaluation with non-ground terms: extended abstract. In ILPS ’93: Proceedings of the 1993 international symposium on Logic programming, pages 557–574, Cambridge, MA, USA, 1993. MIT Press, ISBN 0-262-63152-0.

[115] Sudarshan, S., D. Srivastava, R. Ramakrishnan and J. F. Naughton: Space Optimization in the Bottom-Up Evaluation of Logic Programs. In Proc. SIGMOD, pages 5370–6, 1990.

[116] Tamaki, H. and T. Sato: OLD resolution with tabulation. In Proceedings on Third international conference on logic programming, pages 84–98, New York, NY, USA, 1986. Springer-Verlag New York, Inc., ISBN 0-387-16492-8.

[117] Tarski, A.: A Decision Method for Elementary Algebra and Geometry. University of California Press, 1951.

[118] Tarski, A.: A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5:285–309, 1955.

[119] Triska, M.: Generalising Constraint Solving over Finite Domains. In Proceedings of the 24th International Conference on Logic Programming, ICLP ’08, pages 820–821, Berlin, Heidelberg, 2008. Springer-Verlag, ISBN 978-3-540-89981-5. http://dx.doi.org/10.1007/978-3-540-89982-2_89.

[120] Tsur, S. and C. Zaniolo: LDL: A Logic-Based Data Language. In Chu, Wesley W., Georges Gardarin, Setsuo Ohsuga and Yahiko Kambayashi (editors): VLDB’86 12th International Conference on Very Large Data Bases, August 25-28, 1986, Kyoto, Japan, Proceedings, pages 33–41. Morgan Kaufmann, 1986, ISBN 0-934613-18-4.

[121] Ullman, J. D.: Implementation of Logical Query Languages for Databases. ACM Trans. Database Syst., 10(3):289–321, 1985.

[122] Ullman, J. D.: Database and Knowledge-Base Systems Vols. I (Classical Database Systems) and II (The New Technologies). Computer Science Press, 1995.

[123] Vaghani, J., K. Ramamohanarao, D. B. Kemp, Z. Somogyi and P. J. Stuckey: Design Overview of the Aditi Deductive Database System. In Proceedings of the Seventh International Conference on Data Engineering, pages 240–247, Washington, DC, USA, 1991. IEEE Computer Society, ISBN 0-8186-2138-9.

[124] Van Gelder, A., K. A. Ross and J. S. Schlipf: The well-founded semantics for general logic programs. J. ACM, 38(3):619–649, 1991, ISSN 0004-5411.

[125] Vieille, L.: Recursive Query Processing: Fundamental Algorithms and the DedGin System. In Prolog and Databases, pages 135–158. World Scientific, 1988.

[126] Wang, H. and C. Zaniolo: Aggregates in Recursive Datalog and SQL3 Queries. Technical report 980043, IEEE Computer Society, 1998. citeseer.ist.psu.edu/wang98aggregates.html.

[127] Warren, D. S.: The XWAM: A machine that integrates prolog and deductive databases. In Technical Report, 1989.

[128] Whitney, H.: Geometric integration theory. Princeton University Press, Princeton, N. J., 1957.

[129] Wielemaker, J.: An overview of the SWI-Prolog Programming Environment. In Mesnard, Fred and Alexander Serebenik (editors): Proceedings of the 13th International Workshop on Logic Programming Environments, pages 1–16, 2003.

[130] Zaniolo, C.: Key Constraints and Monotonic Aggregates in Deductive Databases. In Computational Logic: Logic Programming and Beyond, Essays in Honour of Robert A. Kowalski, Part II, pages 109–134, London, UK, 2002. Springer-Verlag, ISBN 3-540-43960-9.

[131] Zaniolo, C., N. Arni and K. Ong: Negation and Aggregates in Recursive Rules: the LDL++ Approach, 1993.

[132] Zhang, Y., H. Chen, H. Sheng and Z. Wu: Applying Hypothetical Queries to E-Commerce Systems to Support Reservation and Personal Preferences. In Proceedings of the 11th International Database Engineering and Applications Symposium, IDEAS ’07, pages 46–53, Washington, DC, USA, 2007. IEEE Computer Society, ISBN 0-7695-2947-X. http://dx.doi.org/10.1109/IDEAS.2007.15.

[133] Zhou, G., H. Chen and Y. Zhang: Hypothetical Queries on Multidimensional Dataset. In Wang, Shouyang, Lean Yu, Fenghua Wen, Shaoyi He, Yong Fang and K. K. Lai (editors): BIFE, pages 539–543. IEEE Computer Society, 2009, ISBN 978-0-7695-3705-4.

Part II

Publications

Chapter 5

Publications associated with the second chapter

[A.1] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez. Implementing a Fixpoint Semantics for a Constraint Deductive Database based on Hereditary Harrop Formulas. In Proceedings of the 11th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming (PPDP’09), pages 117–128. ACM Press, 2009.

[A.2] G. Aranda, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández. Incorporating Integrity Constraints to a Deductive Database System. In XI Jornadas sobre Programación y Lenguajes, PROLE2011 (SISTEDES), editors: Purificación Arenas, Victor M. Gulías and Pablo Nogueira, pages 141–152, September 2011.

[A.3] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández. An Extended Constraint Deductive Database: Theory and Implementation. The Journal of Logic and Algebraic Programming, volume 21, pages 20–52, 2013.

Implementing a Fixpoint Semantics for a Constraint Deductive Database based on Hereditary Harrop Formulas

Gabriel Aranda-López
Dpto. de Sistemas Informáticos y Computación, Univ. Complutense de Madrid
[email protected]

Susana Nieva
Dpto. de Sistemas Informáticos y Computación, Univ. Complutense de Madrid
[email protected]

Fernando Sáenz-Pérez
Dpto. de Ingeniería del Software e Inteligencia Artificial, Univ. Complutense de Madrid
[email protected]

Jaime Sánchez-Hernández
Dpto. de Sistemas Informáticos y Computación, Univ. Complutense de Madrid
[email protected]

Abstract

This work presents a concrete implementation of a deductive database system based on the scheme HH¬(C) (Hereditary Harrop Formulas with Negation and Constraints), following a fixpoint semantics proposed in a previous work. We have developed a Prolog implementation of this scheme that is independent of the constraint system, so it can serve as a base for any instance of the formal scheme. We have developed several specific constraint systems: real numbers, integers, Booleans and user-defined enumerated types. We have added types to the database so that relations become typed (like tables in relational databases) and each constraint is mapped to its corresponding constraint system. The predicates that compute the fixpoint giving meaning to a database are described. In particular, we show the implementation of a forcing relation (for derivation steps) and highlight how the inherent difficulties have been overcome in a system allowing hypothetical queries, which make the database grow dynamically.

General Terms Algorithms, Languages, Performance, Theory.

Keywords Hereditary Harrop Formulas, Deductive Databases, Stratification, Constraints, Fixpoint Semantics, Prolog.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PPDP’09, September 7–9, 2009, Coimbra, Portugal. Copyright © 2009 ACM 978-1-60558-568-0/09/09. . . $5.00

1. Introduction

Deductive databases (DDBs) and their query languages have received a great deal of attention recently in many areas, including ontologies [Fikes et al. 2004], the semantic web [Calì et al. 2009], social networks [Ronen and Shmueli 2009], and policy languages [Becker et al. 2007]. The high-level expressivity of logic-based query languages has therefore been acknowledged as a useful feature for handling knowledge-based information systems. In particular, Datalog (along with its extensions), for which many current references can be found, has become a renowned language in those settings.

Current deductive database systems (such as XSB [Sagonas et al. 1994], with inputs from the company XSB, Inc.; bddbddb [Lam et al. 2005]; LDL++ [Arni et al. 2003]; DES [Sáenz-Pérez 2009]; ConceptBase [Jarke et al. 2008]; .QL [Ramalingam and Visser 2007], developed by Semmle, Ltd.; and DLV [Leone et al. 2006]) lack several features we provide in the scheme HH¬(C) (Hereditary Harrop formulas with Negation and Constraints) [Nieva et al. 2008]. These features are helpful for knowledge systems in which more expressive ways of posing queries are needed. The scheme HH(C) was presented in [Leach et al. 2001] as an extension of traditional LP (Logic Programming). On the one hand, hereditary Harrop formulas extend Horn logic by allowing disjunctions, intuitionistic implications and universal quantifiers, which improves expressivity; on the other hand, the scheme incorporates the advantages of constraints. Then, HH¬(C) was obtained by adding negation to the previous scheme in order to provide the foundations for a DDB that extends Datalog in the two orthogonal directions just mentioned.

In our system, a database is a logic program: a set of facts(ground atoms) defining the extensional database, and a set ofclauses, defining the intensional database. Clauses can be seen asthe definition of views in relational databases. The evaluation ofa query with respect to a deductive database can be seen as thecomputation of a goal from a program (database), and the answer isa constraint. Since the constraint domain is parametric, it is possibleto consider different instances (such as arithmetical constraints overreal numbers and finite domain constraints).

Let us show the expressivity of our language with the followingexample written in an instance that allows both real and finitedomain constraints.

EXAMPLE 1. Consider the following extensional database for a bank. We follow a syntax similar to Prolog. In addition, we write not for negation, => for implication, ex(X,G) for ∃X G, and fa(X,G) for ∀X G. Some other details of the syntax are deferred to the next sections.

% client(Name, Balance, Salary)
client(smith,2000,1200).
client(brown,1000,1500).
client(mcandrew,5300,3000).

% pastDue(Name, Amount)
pastDue(smith,3000).
pastDue(mcandrew,100).

% mortgageQuote(Name, Quote)
mortgageQuote(brown,400).
mortgageQuote(mcandrew,100).

where we assume that each client has at most one mortgage quote. Moreover, we can define the following views.

% hasMortgage(Name)
hasMortgage(N) :- ex(Q,mortgageQuote(N,Q)).

A debtor is a client who has a past due amount greater than his balance.

% debtor(Name)
debtor(N) :- client(N,B,S), pastDue(N,A), A>B.
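As a rough illustration of how the debtor view reads as a query over the extensional database, the following Python sketch joins client and pastDue on the client name and keeps the tuples where the past due amount exceeds the balance. It is not part of the HH¬(C) system: the encoding of relations as Python lists and the function name `debtors` are assumptions made only for this example.

```python
# Extensional relations from Example 1, encoded as plain tuples (an assumption
# of this sketch; the system described in the paper stores them as facts).
CLIENT = [("smith", 2000, 1200), ("brown", 1000, 1500), ("mcandrew", 5300, 3000)]
PAST_DUE = [("smith", 3000), ("mcandrew", 100)]


def debtors(client, past_due):
    """Mimic debtor(N) :- client(N,B,S), pastDue(N,A), A>B."""
    return sorted(
        name
        for (name, balance, _salary) in client
        for (due_name, amount) in past_due
        if name == due_name and amount > balance
    )


print(debtors(CLIENT, PAST_DUE))  # ['smith']
```

Only smith qualifies, since his past due amount (3000) exceeds his balance (2000), whereas mcandrew owes 100 against a balance of 5300.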

The interest rate that is applicable to a client is specified by thenext relation:

% interestRate(Name, Rate)
interestRate(N,2) :- client(N,B,S), B<1200.
interestRate(N,5) :- client(N,B,S), B>=1200.

The next relation specifies that a non-debtor client can be given a new mortgage in two situations. First, if he has no mortgage, a mortgage quote of at most 40% of his salary can be given. Second, if he already has a mortgage quote, then the sum of this quote and the new one must not exceed that percentage.

% newMortgage(Name, Quote)
newMortgage(N,Q) :- client(N,B,S), not(debtor(N)), not(hasMortgage(N)), Q<=0.4*S.
newMortgage(N,Q) :- client(N,B,S), not(debtor(N)), mortgageQuote(N,Q2), Q+Q2<=0.4*S.

% getMortgage(Name)
getMortgage(N) :- ex(Q,newMortgage(N,Q)).
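Because Q is constrained rather than enumerated, an answer to newMortgage(N,Q) is naturally a constraint on Q for each admissible client. The Python sketch below, which is only an illustration and not the authors' solver, computes that constraint in the form Q <= bound. The dictionary representation, the helper names and the use of exact fractions for the 40% factor are assumptions of this example.

```python
from fractions import Fraction

# Relations from Example 1 (the encoding is an assumption of this sketch).
CLIENT = [("smith", 2000, 1200), ("brown", 1000, 1500), ("mcandrew", 5300, 3000)]
PAST_DUE = [("smith", 3000), ("mcandrew", 100)]
MORTGAGE_QUOTE = {"brown": 400, "mcandrew": 100}  # at most one quote per client


def is_debtor(name, balance):
    """debtor(N): some past due amount of N exceeds the balance of N."""
    return any(n == name and amount > balance for n, amount in PAST_DUE)


def new_mortgage_bounds():
    """Return {name: bound}, read as the answer constraint Q <= bound."""
    bounds = {}
    for name, balance, salary in CLIENT:
        if is_debtor(name, balance):
            continue  # both newMortgage clauses require not(debtor(N))
        limit = Fraction(2, 5) * salary        # 40% of the salary, kept exact
        current = MORTGAGE_QUOTE.get(name, 0)  # 0 when there is no mortgage
        bounds[name] = limit - current         # covers both clauses at once
    return bounds


print(new_mortgage_bounds())
```

Under these assumptions brown may be offered a quote of at most 200 and mcandrew of at most 1100, while smith is excluded as a debtor.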

If the client satisfies the requirements to be given a new mortgage, then it is possible to apply for a personal credit whose amount is smaller than 6000. Otherwise, if the client does not satisfy those requirements, the amount must be at least 6000 and smaller than 20000.

% personalCredit(Name, Amount)
personalCredit(N,A) :-
    (getMortgage(N), A<6000)
    ;
    (not(getMortgage(N)), A>=6000, A<20000).

For this database, we can query whether every client is a debtor:

fa(N,debtor(N)).

The answer is false.

Moreover, it is possible to ask, for example, for the quote and the salary of clients whose mortgage quote is greater than or equal to 100 with the next query:

ex(B,(client(N,B,S),mortgageQuote(N,Q),(Q>=100))).

The answer constraint that provides such information is the following disjunction:

(Q=400, S=1500, N=brown);(Q=100, S=3000, N=mcandrew).
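The disjunction above can be cross-checked with a small Python sketch that joins the two relations and projects away the balance, which is what the existential quantifier over B does. This is a hedged illustration only: the list encoding and the function name `quote_query` are assumptions, and each returned dictionary stands for one disjunct of the answer constraint.

```python
# Relations from Example 1 (the encoding is an assumption of this sketch).
CLIENT = [("smith", 2000, 1200), ("brown", 1000, 1500), ("mcandrew", 5300, 3000)]
MORTGAGE_QUOTE = [("brown", 400), ("mcandrew", 100)]


def quote_query(min_quote=100):
    """Join client and mortgageQuote on N, keep Q >= min_quote, drop B."""
    return [
        {"Q": quote, "S": salary, "N": name}
        for (name, _balance, salary) in CLIENT
        for (quoted_name, quote) in MORTGAGE_QUOTE
        if name == quoted_name and quote >= min_quote
    ]


for disjunct in quote_query():
    print(disjunct)
```

With the data of the example, the sketch yields one disjunct for brown (Q=400, S=1500) and one for mcandrew (Q=100, S=3000).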

To find out whether there are debtors with a past due amount greater than 1000, the following query can be formulated:

ex(N,ex(A,(debtor(N),pastDue(N,A),(A>1000)))).

and the answer is true. Note that we are using quantifiers for N and A, meaning that there are no explicit conditions over them. Otherwise, the answer would be a constraint over them.

The next query corresponds to the question: if we assume that a client has a balance greater than 2000, what would the interest rate be?

fa(N,ex(S,ex(B,(client(N,B,S) => B>2000 => interestRate(N,R))))).

The answer is the constraint R=5. We are using nested implications to formulate hypothetical queries, in which we can assume both facts and constraints.

The next query involves negation and asks which clients can get a mortgage quote of 400 but not a personal credit.

newMortgage(N,400), not(personalCredit(N,A)).

And the answer is the constraint:

(N=mcandrew, A>=6000, A<20000)

This constraint means that it is possible to give a new mortgage to client McAndrew, but it is not possible to give him a personal credit of an amount between 6000 and 20000.

In this paper, we present an implementation of the fixpoint semantics presented in [Nieva et al. 2008], which is independent of the concrete constraint system. We also use a type system to identify the constraint system to which each constraint in a database belongs. We propose several constraint systems as instances of HH¬(C), together with their solvers, and explain how they are implemented.

The semantics of a database is computed as a set of pairs (A,C), where A is an atom and C a constraint, that can be deduced from both the extensional and intensional parts of the database. A can be understood as an n-ary relation instance whose arguments are constrained by C. These pairs are computed by strata: predicates are classified into strata by a new form of stratification driven by both the negations and the implications occurring in rules. Each stratum must be saturated before trying to saturate any higher stratum. However, as an implication may occur in a goal, the computation must take into account that the database is augmented with the hypothesis posed in the implication antecedent. Therefore, a fixpoint computation has to be started from scratch, since new pairs may be added at lower strata. Such nested subcomputations add a new level of complexity with respect to the usual bottom-up computations in deductive databases without implications.

Another source of complexity comes again from implications, since the variables in D ⇒ G can occur both in D and G. When a database ∆ is augmented with the local clause D, those variables must be distinguished from other instances of the same variables in ∆. To this end, we resort to Prolog attributed variables to identify them.

Finally, in order to find a stratification for ensuring finiteness of computations, a new dependency graph is described using a mutually recursive definition between the dependencies introduced by goals and clauses.

The rest of the paper is organized as follows. Section 2 recalls syntactical notions, the stratification needed for classifying predicates into strata due to both negation and implication, as well as stratified interpretations and the forcing relation. Section 3 introduces a user-oriented description of the system and the computation stages of the implementation. Section 4 describes the type system, the constraint systems, their solvers and how they are implemented. Section 5 explains how the fixpoint semantics has been implemented by successive applications of an operator, which in turn implements the forcing relation of HH¬(C). Section 6 describes a new form of the dependency graph needed to implement the forcing of the implication. Section 7 shows an actual, running example of the system in its current form. Section 8 summarizes some conclusions and sketches some future work.

2. Preliminaries

Here, we recall the foundations, presented in [Nieva et al. 2008], on which the implementation is based.

2.1 Syntax

We consider a set of defined predicate symbols, representing the names of database relations, to build atoms, denoted by A, and non-defined (built-in) predicate symbols, including at least the equality predicate symbol ≈, to build constraints, denoted by C. We will also assume the existence of a set of constant and operator symbols, and a set of variables to build terms, denoted by t.

The constraints we consider belong to a generic system C = ⟨L_C, ⊢_C⟩, where L_C is the constraint language and ⊢_C is a binary entailment relation. Γ ⊢_C C denotes that the constraint C is inferred in the constraint system C from the set of constraints Γ. Some minimal conditions are imposed on C to be a constraint system (see [Leach et al. 2001] for details). In particular, C is required to contain ⊤ (true) and ⊥ (false), and to deal with ∧, ¬, and the existential quantifier ∃; the constraint system is responsible for checking the satisfiability of answers in the constraint domain.

We say that a constraint C is C-satisfiable if ∅ ⊢_C ∃C, where ∃C stands for the existential closure of C. C and C′ are C-equivalent if C ⊢_C C′ and C′ ⊢_C C.

The well-formed formulas in HH¬(C) can be classified into clauses D (defining database relations) and goals (or queries) G. They are recursively defined by the following rules:

D ::= A | G ⇒ A | D1 ∧ D2 | ∀x D
G ::= A | ¬A | C | G1 ∧ G2 | G1 ∨ G2 | D ⇒ G | C ⇒ G | ∃x G | ∀x G

The programs, denoted by ∆, are sets of clauses and represent databases. Any ∆ can always be given as an equivalent set, elab(∆), of implicative clauses with atomic heads, in the way we now make precise. The elaboration of a program ∆ is the set elab(∆) = ⋃_{D∈∆} elab(D), where elab(D) is defined by:

elab(A) = {⊤ ⇒ A},
elab(D1 ∧ D2) = elab(D1) ∪ elab(D2),
elab(G ⇒ A) = {G ⇒ A},
elab(∀x D) = {∀x D′ | D′ ∈ elab(D)}.
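As an illustration only, the four elaboration rules can be sketched in Python (the system itself is written in Prolog). The tuple encoding of clauses and the names TRUE, elab and elab_program are assumptions of this sketch, not part of the implementation described in the paper.

```python
# Hypothetical encoding: ("atom", name), ("and", D1, D2),
# ("imp", G, A) for G => A, and ("forall", x, D).
TRUE = ("true",)

def elab(clause):
    """Return elab(D) as a list of implicative clauses with atomic heads."""
    kind = clause[0]
    if kind == "atom":        # elab(A) = {T => A}
        return [("imp", TRUE, clause)]
    if kind == "imp":         # elab(G => A) = {G => A}
        return [clause]
    if kind == "and":         # elab(D1 /\ D2) = elab(D1) U elab(D2)
        return elab(clause[1]) + elab(clause[2])
    if kind == "forall":      # elab(forall x D) = {forall x D' | D' in elab(D)}
        _, var, body = clause
        return [("forall", var, d) for d in elab(body)]
    raise ValueError(f"not a clause: {clause!r}")

def elab_program(delta):
    """Union of elab(D) for every clause D of the program, without duplicates."""
    result = []
    for clause in delta:
        for d in elab(clause):
            if d not in result:
                result.append(d)
    return result

p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
example = ("forall", "x", ("and", p, ("imp", q, r)))
elaborated = elab_program([example])
```

Here the conjunction under the quantifier is split into two quantified implicative clauses, one of them with ⊤ as body.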

2.2 Stratification

The notion of stratification is used as a syntactical criterion to determine whether a query to a database can potentially be computed in a finite number of steps. The idea is that when ¬A is going to be proved, the stratum of A has been previously saturated (all the answers for A are available), so ¬A can be correctly computed. A stratification for a database is based on the construction of a dependency graph for a set of formulas [Zaniolo et al. 1997].

The nodes of the graph are the defined predicate symbols of the set. An implication of the form F1 ⇒ F2 produces edges and/or paths in the graph from the defined predicate symbols inside F1 to each defined predicate symbol inside F2. An edge will be negatively labeled when the corresponding atom occurs negated on the left side of the implication. Notice that in HH¬(C) implications may occur not only between the head and the body of a clause, but also inside the goals, and therefore in the body of any clause. Since constraints do not include defined predicate symbols, they do not produce dependencies.

Those two kinds of edges are sufficient to guarantee the consistency of the following theory. However, in the implementation, an additional case that produces a negatively labeled edge will be considered. This new case will be explained in Section 6, after motivating it in Section 5.4.

DEFINITION 1. Given a set of formulas Φ, its corresponding dependency graph DG_Φ, and two predicates p and q, we say that

• q depends on p if there is a path from p to q in DG_Φ.
• q negatively depends on p if there is a path from p to q in DG_Φ with at least one negatively labeled edge.

Let P = {p1, ..., pn} be the set of defined predicate symbols of Φ. A stratification of Φ is a mapping s : P → {1, ..., n}, such that s(p) ≤ s(q) if q depends on p, and s(p) < s(q) if q negatively depends on p. Φ is stratifiable if there is a stratification for it.

The stratum of a formula F, denoted by str(F), is the maximum s(p), where p is in the set of predicate symbols occurring in F.
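To make Definition 1 concrete, the following Python sketch looks for the least mapping s satisfying both conditions, or reports that none exists. It is only an illustration under stated assumptions: edges are given as hypothetical triples (p, q, negative), the predicate names are invented, and the relaxation loop is a generic technique, not necessarily the algorithm used by the system.

```python
def stratify(predicates, edges):
    """Return a dict predicate -> stratum in 1..n, or None if not stratifiable.

    edges is a list of (p, q, negative): an edge from p to q, negatively
    labeled when negative is True.
    """
    n = len(predicates)
    s = {p: 1 for p in predicates}
    changed = True
    while changed:
        changed = False
        for p, q, negative in edges:
            # s(p) <= s(q) for a positive edge, s(p) < s(q) for a negative one.
            required = s[p] + (1 if negative else 0)
            if s[q] < required:
                if required > n:
                    # Strata would grow without bound: there is a cycle
                    # through a negatively labeled edge.
                    return None
                s[q] = required
                changed = True
    return s

ok = stratify(["p", "q", "r"], [("p", "q", False), ("q", "r", True)])
bad = stratify(["p", "q", "r"],
               [("p", "q", False), ("q", "r", True), ("r", "p", False)])
```

In the second call, r negatively depends on q and q depends on r through p, so no stratification exists.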

Figure 1 shows the dependency graph for the bank database of the introduction. Negative edges are labelled with ¬.

2.3 Stratified Interpretations and Forcing Relation

Let W be the set of stratifiable databases ∆ with respect to the same fixed stratification s, At be the set of open atoms, and SL_C be the set of C-satisfiable constraints modulo C-equivalence. Interpretations are classified by strata, and each interpretation gives information up to its corresponding stratum.

DEFINITION 2. Let i ≥ 1. An interpretation I over the stratum i is a function I : W → P(At × SL_C), such that for any ∆ ∈ W and any j > i, [I(∆)]_j = ∅, where

[I(∆)]_i = {(A,C) ∈ I(∆) | str(A) = i}.

We denote by I_i the set of interpretations over i.

For every i ≥ 1, an order on Ii can be defined.

DEFINITION 3. Let i ≥ 1 and I1, I2 ∈ I_i. I1 is less than or equal to I2 at stratum i, denoted by I1 ⊑_i I2, if for each ∆ ∈ W the following conditions are satisfied:

• [I1(∆)]_j = [I2(∆)]_j, for every 1 ≤ j < i.
• [I1(∆)]_i ⊆ [I2(∆)]_i.

[Figure 1: graph whose nodes are the predicates client, debtor, pastDue, hasMortgage, mortgageQuote, interestRate, personalCredit, getMortgage and newMortgage]

Figure 1. Dependency Graph for Example 1

For every i ≥ 1, every chain of interpretations {I_n}_{n≥0} of (I_i, ⊑_i), such that I_0 ⊑_i I_1 ⊑_i I_2 ⊑_i ..., has a least upper bound ⊔_{n≥0} I_n, which can be defined as:

(⊔_{n≥0} I_n)(∆) = ⋃_{n≥0} I_n(∆),

for any ∆ ∈ W.

The following definition formalizes what it means for a query G to be true for an interpretation I in the context of a database ∆, if the constraint C is satisfied.

DEFINITION 4. Let i ≥ 1. The forcing relation ⊩ between pairs I,∆ and pairs (G,C) (where I ∈ I_i, str(G) ≤ i, and C is C-satisfiable) is recursively defined by the rules below.

• I,∆ ⊩ (C′,C) ⇐⇒ C ⊢_C C′.
• I,∆ ⊩ (A,C) ⇐⇒ (A,C) ∈ I(∆).
• I,∆ ⊩ (¬A,C) ⇐⇒ for every (A,C′) ∈ I(∆), C ⊢_C ¬C′ holds. If there is no pair of the form (A,C′) in I(∆), then C ≡ ⊤.
• I,∆ ⊩ (G1 ∧ G2, C) ⇐⇒ for each i ∈ {1,2}, I,∆ ⊩ (Gi, C).
• I,∆ ⊩ (G1 ∨ G2, C) ⇐⇒ for some i ∈ {1,2}, I,∆ ⊩ (Gi, C).
• I,∆ ⊩ (D ⇒ G, C) ⇐⇒ I,∆ ∪ {D} ⊩ (G,C).
• I,∆ ⊩ (C′ ⇒ G, C) ⇐⇒ I,∆ ⊩ (G, C ∧ C′).
• I,∆ ⊩ (∃xG, C) ⇐⇒ there is C′ such that I,∆ ⊩ (G[y/x], C′), where y does not occur free in ∆, ∃xG, C, and C ⊢_C ∃yC′.
• I,∆ ⊩ (∀xG, C) ⇐⇒ I,∆ ⊩ (G[y/x], C), where y does not occur free in ∆, ∀xG, C.

When I,∆ ⊩ (G,C), it is said that (G,C) is forced by I,∆.

2.4 Fixpoint Semantics

The notion of truth at each stratum is given by means of the fixpoint of a continuous operator (one for every stratum) transforming interpretations.

DEFINITION 5. Let i ≥ 1 represent a stratum. The operator T_i : I_i → I_i transforms interpretations over i as follows. Let I ∈ I_i, ∆ ∈ W, and (A,C) ∈ At × SL_C; then (A,C) ∈ T_i(I)(∆) when:

• (A,C) ∈ [I(∆)]_j for some j < i, or
• s(A) = i and there is a variant ∀x(G ⇒ A′) of a clause in elab(∆), such that the variables x do not occur free in A, and I,∆ ⊩ (∃x(A ≈ A′ ∧ G), C).

The operator T_1 has a least fixpoint

fix_1 = ⊔_{n≥0} T_1^n(I_⊥),

where the interpretation I_⊥ represents the constant function ∅.

Proceeding successively in the same way, a chain

fix_{i−1} ⊑_i T_i(fix_{i−1}) ⊑_i T_i(T_i(fix_{i−1})) ⊑_i ... ⊑_i T_i^n(fix_{i−1}) ⊑_i ...

can be defined for any stratum i > 1, and a fixpoint of it,

fix_i = ⊔_{n≥0} T_i^n(fix_{i−1}),

can be found. In particular, if k is the maximum stratum in ∆, we simplify fix_k by writing fix. Then, fix(∆) represents the pairs (A,C) such that A can be deduced from ∆ if C is satisfied.
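The stratum-by-stratum construction can be illustrated with a deliberately simplified Python sketch. It assumes a propositional setting in which constraints are dropped, so an interpretation is just a set of atoms, and rules are hypothetical triples (head, positive body, negated body); it is not the operator T_i of Definition 5, only its shape.

```python
def fix_by_strata(rules, stratum):
    """Saturate stratum 1, then stratum 2, and so on, up to the maximum one.

    rules: list of (head, positive_atoms, negated_atoms).
    stratum: dict atom -> stratum number (assumed to be a stratification).
    """
    interp = set()
    for i in range(1, max(stratum.values()) + 1):
        while True:
            new = set(interp)   # pairs from lower strata are kept
            for head, pos, neg in rules:
                if (stratum[head] == i
                        and all(a in interp for a in pos)
                        and not any(a in interp for a in neg)):
                    new.add(head)
            if new == interp:   # fixpoint for stratum i reached
                break
            interp = new
    return interp

# Hypothetical program: c is derived from a when d cannot be derived.
rules = [("a", [], []), ("b", ["a"], []), ("c", ["a"], ["d"]), ("d", ["e"], [])]
strata = {"a": 1, "b": 1, "d": 1, "e": 1, "c": 2}
result = fix_by_strata(rules, strata)
```

Because d belongs to a lower stratum than c, its absence is already settled when the negated atom is inspected.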

3. System Description

In this section, we briefly introduce a user-oriented description of the system and the computation stages of the implementation.

The system incorporates the predefined data types bool (with true and false as elements) and real, an infinite data type whose numeric range is system-dependent. In addition, the user can define new enumerated data types. A data type declaration is written as:

domain(data_type, [constant_1, ..., constant_n]).

Intervals for integers are allowed in data type declarations, as in:

domain(months, 1..12).

An n-ary predicate type declaration is written as:

type(predicate(type_1, ..., type_n)).

For instance, type(client(client_dt, real)) is a type declaration, where client_dt can be defined as:

domain(client_dt, [smith, brown, mcandrew]).

The syntax for clauses is essentially as introduced in the examples of Section 1, except for constraints, for which we use the syntax constr(Dom,C), denoting a constraint C ranging over the domain Dom.

When, in the context of a database ∆, a user query Q is posed at the system prompt, it is translated into a clause D ≡ A :- Q, where A is an atom whose predicate symbol is query and whose arguments are the free variables in Q (they are implicitly existentially quantified in Q and universally quantified in D). In addition, the types for query are inferred and provided as the type declaration type(query(Types)).

Solving this query entails adding D to the current database ∆, i.e., considering ∆′ = ∆ ∪ {D} for the following computation stages: 1) Check and infer predicate types; 2) Build the dependency graph of ∆′; 3) Compute a stratification for ∆′, if there is any. If ∆′ is not stratifiable, the system throws an error message and stops; 4) If the previous step succeeds, compute fix(∆′). The answer constraint to the query Q is the constraint C such that (A,C) ∈ fix(∆′).

Next, we describe the different components of the implementation in detail.

4. Implementing Constraint Solving

This section focuses on the implementation of constraint solving for the following particular constraint systems: real numbers, integers, Boolean and user-defined enumerated types. Firstly, we comment on the type system needed to identify the types of variables, which are used to send a constraint to its corresponding solver. Then, the constraint systems are described, including their predefined data values, functions and operators. Finally, we show the implementation of the constraint solvers, which makes use of the underlying constraint solvers of SWI-Prolog [Wielemaker 2009].

4.1 Types

We have implemented a type checking and inference system for HH¬(C) programs, which is able to detect type inconsistencies and missing type declarations, and to infer types for user queries. Types are locally annotated for each predicate symbol. A type annotation consists of storing the type of a variable in an attribute of this variable (cf. attributed variables [Holzbaur 1990]). A type is known in the context of a set of clauses: either a) an atom provides its type (i.e., because of its corresponding predicate type), or b) a constraint constr(Dom,C) provides its type. A type-conflict exception is raised when an attempt is made to assign different types to the same variable. A lack-of-type-declaration exception is raised when no type is assigned to a variable.

4.2 Constraint Systems

As introduced, a constraint system provides a constraint language for expressing constraints and an entailment relation for ensuring satisfiability of constraints (this relation will be covered in the next subsection). Our constraint systems include the concrete syntax for the required values, symbols, connectives, and quantifiers as follows: “true”, “false”, “=”, “,”, “not” and “ex(X,C)”, which represent, respectively, ⊤, ⊥, ≈, ∧, ¬ and ∃X C. In addition, we also include “;” for ∨ and “/=” for the negation of ≈.

We have proposed three constraint systems for the scheme HH¬(C): Boolean, Reals, and Finite Domains. The first one consists of just the required components plus the universal quantifier. The constraint system Reals includes the type real (an infinite set of real numeric values), real constraint operators (+, -, *, ...) and functions (abs, sin, exp, min, ...).

Finite Domains represent a family of specific constraint systems ranging over denumerable sets. Enumerated types are included as well as (finite) integer numeric types. Whereas the constraint systems Boolean and Reals have attached predefined types, Finite Domains do not. This system also includes comparison operators (>, >=, ...), universally quantified constraints (fa(X,C), as above), and the domain constraint X in Range, where Range is a subset of data values built with V1..V2, which denotes the set of values in the closed interval between V1 and V2, and R1\/R2, which denotes the union of ranges. A finite domain may also include constraint operators (such as +, -, ...) and constraint functions (such as abs, min, ...). Note that the relevant primitive functions for each system should be clear from their intended semantics (+ might not be relevant for Booleans, although it can be used). We allow the same symbols to be used to build constraints in different systems; for instance, both constr(real, X>Y) and constr(month, X>Y) make sense in their respective constraint systems.

4.3 Constraint Solvers

We have considered the entailment relation of classical logic for every constraint system. This entailment satisfies the minimal conditions imposed on constraint systems. For implementing this relation, we provide a constraint solver with a generic interface solve(C,SC) for C ⊢_C SC, intended to solve a constraint C, check its satisfiability and produce a solved form SC. A solved form SC corresponding to a constraint C is a simplified, more readable form of C. A solved form can be a disjunction of simple constraints, where a simple constraint includes neither disjunctions, nor quantifications, nor negations. This generic interface is implemented as follows:

solve(C,SC) :-
  partition_ctr(C,DCs),
  solve_ctr_list(DCs,SDCs),
  ctr_list_to_ctr(SDCs,CC),
  simplify_ctr(CC,SC).

Its first call partitions the input constraint into a list whose components belong to different constraint domains. The next call posts each component to its corresponding solver as a call to the predicate solveFD (described later). Afterwards, the solved constraint, represented as a list, is transformed back into a constraint data structure. Finally, this constraint is simplified by logical axioms such as De Morgan’s laws.

In addition to the generic interface, the particular interface solve(Dom,C,SC) is also provided, which is useful when the domain Dom is already known and the constraint can be directly posted to its corresponding solver.

Next, we describe our implementation of the constraint solvers for the constraint systems we propose as practical instances of HH¬(C).

We rely on the underlying constraint solvers already available in SWI-Prolog [Wielemaker 2009] for implementing the constraint systems Finite Domains, Boolean and Reals. For certain constraints, we are able to map them to constraints in the underlying SWI-Prolog finite domain solver because we map data values to integers. Before posting to this solver, a constraint is rewritten with the mapped integer values and, after solving, the solved constraint is rewritten back with the corresponding enumerated values. On the other hand, there are constraints that the underlying solvers cannot directly handle (quantifiers and disjunctions), which we handle explicitly, as will be shown later. Since SWI-Prolog does not provide a Boolean solver, we resort to the finite domain constraint solver for solving Boolean constraints, and provide the predefined constraint system bool, which is handled as any other enumerated constraint system.
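The round trip between enumerated values and integers can be pictured with a small Python sketch. It is only an illustration: the function names, the numbering from 1 and the brute-force enumeration standing in for a finite domain solver are all assumptions of the sketch, not the system's code.

```python
def make_domain_maps(values):
    """Map each enumerated value to an integer (starting at 1) and back."""
    to_int = {v: i for i, v in enumerate(values, start=1)}
    to_value = {i: v for v, i in to_int.items()}
    return to_int, to_value

def solve_enumerated(values, holds_on_int):
    """Solve a one-variable constraint over an enumerated domain.

    holds_on_int(i, to_int) decides the constraint on the integer code i;
    solutions are translated back to the enumerated values.
    """
    to_int, to_value = make_domain_maps(values)
    return [to_value[i] for i in sorted(to_value) if holds_on_int(i, to_int)]

client_dt = ["smith", "brown", "mcandrew"]
# The constraint X /= smith, posed over the integer codes.
answer = solve_enumerated(client_dt, lambda i, m: i != m["smith"])
```

The constraint is evaluated only on integers, and the answer is rewritten with the enumerated constants afterwards.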

For the solvers of the constraint systems Finite Domains and Boolean, the following predicates are available:

• solveFD(+Domain,+Constraint,-SolvedConstraint)
  It solves the input Constraint over Domain and returns its solved form SolvedConstraint associated to Domain, if it is satisfiable.
• constr_conjFD(+Domain,-C1,+C2,+C)
  It is read as “C1,C2 = C”, and computes the component C1 of the conjunction C under the given domain.

Since we consider classical logic for these particular constraint systems, the following implementation for the second predicate is sound:

constr_conjFD(Domain,C1,C2,C) :-
  solveFD(Domain,(not(C2);C),C1),
  solveFD(Domain,(C1,C2),SC).


Whilst the second line is intended to compute C1 under the assumption of success, the following line checks that the computed constraint is satisfiable.
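The classical-logic argument behind this definition can be checked on a finite universe with a short Python sketch, where a constraint is modelled as the set of values that satisfy it. The encoding and the name constr_conj are assumptions of the sketch, not the system's code.

```python
def constr_conj(universe, c2, c):
    """Compute C1 = not(C2) ; C and check that C1 , C2 is satisfiable."""
    c1 = (universe - c2) | c   # complement of C2, or C
    if not (c1 & c2):          # the conjunction has no solution
        return None
    return c1

months = set(range(1, 13))
c2 = {x for x in months if x > 3}   # hypothetical constraint X > 3
c = {x for x in months if x > 6}    # hypothetical constraint X > 6
c1 = constr_conj(months, c2, c)
```

In this example C entails C2, so conjoining the computed C1 with C2 gives back exactly C.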

The code excerpt of Figure 2 implements the required behaviour.

Note that line (05) is intended to replace quantified variables by fresh ones in order to avoid a name clash. Line (07) maps domain data values to integers, whereas line (16) maps the computed (integer) data values back to the corresponding domain data values. The core of constraint solving lies in lines (09)-(11), where, first, an attempt is made to solve the constraint (see the next paragraph, which describes the predicate solveFD_ctr). Second, it is checked for satisfiability, that is, a single, concrete solution is sought via labeling. And, third, the underlying constraint store is projected with respect to the relevant variables (i.e., those occurring in the input constraint plus the possible new ones computed by the underlying solver). Lines (13)-(15) are simply intended for data structure formatting.

Next, we describe the predicate:

solveFD_ctr(+Constraint,-Satisfiable),

which receives a constraint and returns whether it is satisfiable or not. The first case of this predicate corresponds to a constraint supported by the constraint solver of SWI-Prolog (where #> is the finite domain constraint comparison operator provided by this solver):

solveFD_ctr(X>Y,true) :-
  !,
  X#>Y.

Negation is, as shown below, explicitly handled because it canapply to unsupported constraints. The predicate

complement(+Constraint,-ComplementedConstraint)

computes the complemented constraint before solving it.

solveFD_ctr(not(C),B) :-
  !,
  complement(C,NotC),
  solveFD_ctr(NotC,B).

An example of handling unsupported constraints is disjunction, which is computed by collecting all answers (cf. line (08) of Figure 2). This constraint is solved as follows:

solveFD_ctr((C1;_C2),true) :-
  solveFD_ctr(C1,true).

solveFD_ctr((_C1;C2),true) :-
  solveFD_ctr(C2,true).

Finally, we describe quantifiers. Firstly, the existential quantifier is implemented as follows, where the call satisfiable(FC,true) tries to find a concrete value satisfying FC:

solveFD_ctr(ex(X,C),B) :-
  !,
  % Replace X by a fresh variable _FX in C:
  swap(X,_FX,C,FC),
  get_current_domain(DN),
  constrain_domains(FC,DN),
  % Solving:
  (solveFD_ctr(FC,true),
   % Checking satisfiability:
   satisfiable(FC,true),
   B=true
  ;
   B=false).

The universal quantifier is solved by imposing a conjunctive constraint C for all the values of X in the solving domain (cf. the call to solve_forall):

solveFD_ctr(fa(X,C),B) :-
  !,
  get_current_domain(Domain),
  domain_bounds(Domain,L,U),
  (solve_forall(X,C,L,U) ->
     B=true
  ;
     B=false).
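The idea of solving a quantifier over a finite domain by ranging over all the values of the quantified variable can be sketched in Python as follows. The sketch assumes a constraint with one quantified variable X and one free variable Y, encoded as a Python function; the names below are illustrative and unrelated to the Prolog predicate solve_forall.

```python
def solve_forall(constraint, x_values, y_values):
    """Values of Y for which constraint(x, y) holds for every x (a conjunction)."""
    return [y for y in y_values if all(constraint(x, y) for x in x_values)]

def solve_exists(constraint, x_values, y_values):
    """Values of Y for which constraint(x, y) holds for at least one x."""
    return [y for y in y_values if any(constraint(x, y) for x in x_values)]

months = range(1, 13)
# fa(X, Y >= X): Y must be at least as large as every month.
all_answers = solve_forall(lambda x, y: y >= x, months, months)
# ex(X, X > Y): some month is larger than Y.
some_answers = solve_exists(lambda x, y: x > y, months, months)
```

An empty list plays the role of an unsatisfiable answer in this encoding.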

The constraint solver for Reals follows a similar but simpler route for its implementation, since there are neither universal quantifiers nor domain data values to map.

5. Implementing the Fixpoint Semantics

In this section, we present the implementation of the core system, which is independent of the concrete constraint systems explained in the previous section.

5.1 Fixpoint by Strata

For the fixpoint computation we assume a stratified database ∆, i.e., a partition st_1, ..., st_k over the predicate symbols defined in it (the stratification algorithm will be explained in Section 6). A clause of the form A :- G is interpreted as ∀X1, ..., Xn(G ⇒ A), where X1, ..., Xn are the free variables of (A, G), and is encoded as the Prolog term

rule(St,Vars,A,G)

where St = str(A) and Vars = [X1, ..., Xn].

The fixpoint is computed stratum by stratum (although a stratum may require the computation of the fixpoint for a previous stratum when the program is enlarged due to implications, as we will see in Section 5.4). The predicate

fixPointStrat(+Delta, +St, -Fix)

computes Fix = fix_St(Delta). Then, if Delta represents a database such that St = str(Delta) = k, this predicate gives fix_k(Delta), computing the previous fixpoints from St = 0 to St = k.

fixPointStrat(_Delta,0,[]) :- !.

fixPointStrat(Delta,St,FixSt) :-
  St1 is St-1,
  fixPointStrat(Delta,St1,FixSt1),
  iterT(Delta,St,FixSt1,FixSt).

Each fixpoint is evaluated by iterating the fixpoint operator as follows:

iterT(Delta,St,I,FixSt) :-
  opT(Delta,Delta,St,I,TI),
  ( I==TI, !, FixSt=I
  ; iterT(Delta,St,TI,FixSt) ).

I represents the current computed interpretation and FixSt will be the fixpoint for the stratum under consideration. The operator is iterated until no more information can be added to the interpretation (I==TI), i.e., until we have reached the fixpoint for the stratum St. The predicate opT is detailed below.


(00) solveFD(DN,C,SC) :-
(01)   set_current_domain(DN),            % A flag storing the current domain
(02)   copy_term(C,FC),                   % Input variables are kept untouched
(03)   get_vars(C,Vars),                  % Input variables are held to be
(04)   get_vars(FC,FVars),                % mapped to the solved new vars
(05)   swap_qvars_by_fvars(FC,QFC),       % Replace quantified vars by fresh ones
(06)   constrain_domains(QFC,DN),         % Constrain variables to the current domain
(07)   domain_to_int(QFC,DN,IC),          % Domain mapping from enumerated to integer
(08)   bagof((FVars,Cs,Sat),              % (Fresh vars,Constraints,Satisfiable)
(09)     (solveFD_ctr(IC,true),           % Solving
(10)      satisfiable(IC,Sat),            % Check satisfiability
(11)      project_ctrs(FVars,Vars,Cs)     % Project constraints wrt. input vars
(12)     ), LFVarsCsS), !,                % List of (Fresh vars,Constraints,Satisfiable)
(13)   filter_ctr_list(LFVarsCsS,LICs),   % Pick solved constraints
(14)   simplify_disj_list(LICs,SLICs),    % Simplify the disjunctive list
(15)   disj_list_to_ctr(SLICs,ISC),       % Convert list to constraint
(16)   int_to_domain(ISC,DN,SC).          % Map domain from integer to enumerated

Figure 2. The Predicate solveFD for solving Finite Domain Constraints

5.2 Fixpoint Operator

The predicate opT corresponds to the application of the operator T_i (for some stratum i) to a given interpretation. Following Definition 5, the predicate

opT(+Rules,+Delta,+St,+I,-TI)

receives in I the set of pairs of T_i^n(fix_{i−1})(Delta) for some n ≥ 0 and the stratum i = St, and computes TI = T_i^{n+1}(fix_{i−1})(Delta).

The call to opT from iterT has the form

opT(Delta,Delta,St,I,TI)

taking Delta twice because it uses each clause of Delta separately, but the forcing relation will need the full database Delta. This operator only uses the clauses of the current stratum St (second clause) and skips the rest (last clause).

opT([],_Delta,_St,I,I).

opT([rule(St,Vars,A,G)|Rs],Delta,St,I,TI) :-
  !,
  rename(Vars,(A,G),Vars1,(A1,G1)),
  flatHead(A1,A2,Cs),
  buildExists(Vars1,(Cs,G1),G2),
  ( force(Delta,I,G2,C), !,
    addItemLst([(A2,C)],I,I1)
  ; I1=I ),
  opT(Rs,Delta,St,I1,TI).

opT([_|Rs],Delta,St,I,I1) :-
  opT(Rs,Delta,St,I,I1).

The second clause performs some initial transformations on the rule rule(St,Vars,A,G): the predicates rename, flatHead and buildExists build the goal to be forced

G2 = ∃ Vars1 (G1 ∧ A1 ≈ A2),

where ∀ Vars1 (G1 ⇒ A1) is a variant of rule(St,Vars,A,G). Then it tries to force the obtained goal G2 using Delta and the current interpretation I. If it succeeds, we obtain the associated constraint C and we add the pair (A2,C) to this interpretation. Finally, opT performs the same operation on the rest of the rules Rs.

5.3 Forcing Relation

We implement the forcing relation ⊩ of Definition 4 by means of the predicate

force(+Delta,+I,+G,-C).

Given I = T_i^n(fix_{i−1})(Delta) for some n ≥ 0 and a fixed stratum i > 0, force is successful if T_i^n(fix_{i−1}), Delta ⊩ (G, C).

An important point for understanding the implementation is to keep in mind the deterministic nature of this predicate. The definition of ⊩ establishes conditions on a constraint C in order to satisfy I, Delta ⊩ (G,C), but the predicate force must build a concrete constraint C. In addition, each possible answer constraint for a goal must be captured in a single answer constraint (possibly) using disjunctions. There is a clause of force for each goal structure. We explain them briefly, except for the case of implication, which will be studied in the next subsection:

force(_Delta,_I,constr(Dom,C),C1) :-
  !, solve(Dom,C,C1).

force(Delta,I,(G1,G2),C) :-
  !, force(Delta,I,G1,C1),
  force(Delta,I,G2,C2),
  solve((C1,C2),C).

force(Delta,I,(G1;G2),C) :- !,
  ( force(Delta,I,G1,C1), !,
    ( force(Delta,I,G2,C2), !,
      solve((C1;C2),C)
    ; solve(C1,C) )
  ; force(Delta,I,G2,C2),
    solve(C2,C) ).

force(Delta,I,(constr(Dom,C)=>G),C2) :-
  !, force(Delta,I,G,C1),
  constr_conj(Dom,C2,C,C1).

force(Delta,I,ex(X,G),C) :-
  !, replace(X,X1,G,G1),
  force(Delta,I,G1,C1),
  solve(ex(X1,C1),C).

force(Delta,I,fa(X,G),C) :-
  !, replace(X,X1,G,G1),
  force(Delta,I,G1,C1),
  solve(fa(X1,C1),C).

force(_Delta,I,not(At),C) :-
  !, lookUpAll(At,I,Ls),
  ( Ls==[], !, C=true
  ; buildNegConj(Ls,NLs),
    solve(NLs,C) ).

force(_Delta,I,At,C) :-
  !, lookUpAll(At,I,Cs),
  buildDisj(Cs,C1),
  solve(C1,C).

The first clause stands for the forcing of a constraint C within a domain Dom, which is processed by calling the constraint solver. The second stands for a conjunction G1,G2; it forces both goals, and then solves the conjunction of the resulting answer constraints. For a disjunction G1;G2 (third clause) there are four possible (and exclusive) situations: both goals can be forced, only G1, only G2, or neither of the two; the answer constraint is obtained by solving the corresponding constraints, or the clause fails in the last case. The fourth clause of force corresponds to an implication with a constraint as antecedent; in this case the predicate constr_conj obtains a constraint C2 such that if I forces (G,C1) then the conjunction C2,C is equivalent to C1.

For the universal quantifier, according to Definition 4, to find C such that I, Delta ⊩ (∀X G, C), we obtain G1 as the result of replacing X by a new variable X1 in G; then we prove I, Delta ⊩ (G1, C1) and finally C is obtained by solving ∀X1 C1. For the existential quantifier, according to Definition 4, we find C such that there is C′ satisfying I, Delta ⊩ (G[X1/X], C′) and C ⊢_C ∃X1 C′. Then we can use C as the solved form of ∃X1 C′ in the implementation.

For negated atoms not(At), thanks to the stratification we can ensure that every possible atom At deducible from the database is already present in the current interpretation I. Then, by means of lookUpAll(At,I,Ls) we find the list Ls=[C1,...,Cn] such that (At,Ci)∈I. The variable NLs is used to build the constraint ¬C1∧...∧¬Cn (or true if Ls=[]), which we must solve to obtain the constraint C we are looking for.

The last (default) case is the forcing of an atom At. As before, we search for all the pairs (At,C1),...,(At,Cn)∈I and then we build the disjunction C1∨...∨Cn and solve it with solve.
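The two lookup-based cases can be mimicked in Python by modelling a constraint over a single variable as the set of values that satisfy it, so that disjunction becomes union and negation becomes complement. This encoding, the function names and the sample data are assumptions of the sketch; None stands for failure.

```python
def force_atom(interp, atom):
    """Disjunction C1 \\/ ... \\/ Cn of the constraints paired with atom."""
    constraints = [c for a, c in interp if a == atom]
    if not constraints:
        return None                       # no pair for the atom: fail
    answer = set().union(*constraints)
    return answer or None                 # an empty set is unsatisfiable

def force_not(interp, atom, universe):
    """Conjunction of the negations of the constraints paired with atom."""
    answer = set(universe)                # plays the role of true
    for a, c in interp:
        if a == atom:
            answer -= c                   # conjoin the negation of Ci
    return answer or None

universe = set(range(0, 30))              # hypothetical finite range of amounts
interp = [("credit", set(range(6, 12))), ("credit", set(range(10, 20)))]
positive = force_atom(interp, "credit")
negative = force_not(interp, "credit", universe)
```

When the interpretation holds no pair for the atom, the negated goal is forced with the whole universe, which corresponds to the answer true.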

5.4 The Case of D => G in the Forcing Relation

Implementing force(Delta,I,(D=>G),C) requires some special treatment. In this case, according to the definition of the relation ⊩ (see Definition 4), Delta is augmented with the clause D. Recall that the current set I has been computed in accordance with the database Delta, in such a way that if i and n are, respectively, the stratum and iteration under construction, (A,C) ∈ I ⇔ (A,C) ∈ T_i^n(I′)(Delta), where I′ is the fixpoint for the stratum i−1, built from Delta. According to the theory, the next step will be to prove T_i^n(I′), Delta ∪ {D} ⊩ (G, C). But the question is how to compute T_i^n(I′)(Delta ∪ {D}). Notice that I is not useful here. First, because I(∆) ⊆ I(∆ ∪ {D}) does not hold for every I, ∆, D. Second, because I has always been built considering Delta; in particular, the fixpoint I′ has been computed for Delta, so it represents fix_{i−1}(Delta). So nothing is known about the needed set T_i^n(I′)(Delta ∪ {D}).

What is happening is that the definition of the fixpoint operator T_i is not constructive for the case of implication, due to the increase of the set of clauses. To overcome this obstacle, we have adopted a conservative position: to compute locally the fixpoint of the stratum j for Delta ∪ {D}, where j is the stratum of G, that is, fix_j(Delta ∪ {D}), and then check whether fix_j, Delta ∪ {D} ⊩ (G, C).

Of course, the complexity of the algorithm increases considerably in this case, but the code remains simple. The corresponding clause for the predicate force is as follows:

force(Delta,I,(D=>G),C) :-
  !,
  elab(D,De),
  localRules(De,Ls),
  getStrat(G,StG),
  addLocalRules(Ls,Delta,Delta1),
  fixPointStrat(Delta1,StG,Fix),
  force(Delta1,Fix,G,C).

The calls to elab(D,De), localRules(De,Ls) and addLocalRules(Ls,Delta,Delta1) produce the elaboration of the set of clauses Delta ∪ {D}, giving the corresponding set Delta1 in the format used by the system, while getStrat(G,StG) obtains the stratum of G. The execution of

fixPointStrat(Delta1, StG, Fix)

finds Fix = fix_j(Delta1), where j = StG is the stratum of G, the consequent of the initial goal D => G. Once Fix is computed, G must be forced with it and the augmented set Delta1. This corresponds to proving

force(Delta1, Fix, G, C),

which implies T_i^n(I′), Delta ∪ {D} ⊩ (G, C), as we wanted to prove.

This solution causes the following problem. Consider a clause in Delta of the form A :- D => G, such that i = str(A) and j = str(G); from Definition 1, j ≤ i can be deduced. During the computation of fix_i(Delta), the predicate opT takes this clause into account in order to look for a pair (A,C) to be added to the current I. Then

force(Delta, I, (D => G), C)

is executed, which calls

fixPointStrat(Delta1, j, Fix),

where Delta1 = Delta ∪ {D} (up to elaboration and variable renaming). If j = i, this means building fix_i(Delta1), so the clause A :- D => G will be tried again, because the stratum of A is i. This gives rise to a non-terminating loop, since Delta1 is augmented with the elaboration of D once more, and so on. However, if j < i, Fix = fix_j(Delta1) can be correctly built. This is the reason why, in the construction of dependency graphs, a new kind of negatively labeled edge has been incorporated, which ensures str(G) < str(A) in these cases. The details are explained in the following section.
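The conservative strategy of recomputing a fixpoint for the augmented database can be illustrated with a minimal Python sketch. It assumes a propositional setting with positive rules only, without constraints or strata, so it shows only the idea of evaluating the consequent against a locally computed fixpoint; all names and the sample rules are hypothetical.

```python
def least_fixpoint(rules):
    """Bottom-up saturation of propositional rules (head, body_atoms)."""
    facts = set()
    while True:
        new = facts | {h for h, body in rules if all(b in facts for b in body)}
        if new == facts:
            return facts
        facts = new

def force_hypothetical(rules, hypothesis, goal):
    """Force D => G: add the facts of D locally, recompute, then look up G."""
    augmented = list(rules) + [(fact, []) for fact in hypothesis]
    return goal in least_fixpoint(augmented)

# Hypothetical database: a client is "gold" if he is a client and rich.
rules = [("client", []), ("gold", ["client", "rich"])]
without_assumption = "gold" in least_fixpoint(rules)
with_assumption = force_hypothetical(rules, ["rich"], "gold")
```

The original list of rules is left untouched, so the hypothesis is visible only inside the local computation.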

6. Implementing the Dependency Graph

In [Nieva et al. 2006], we defined an algorithm to compute the dependency graph of any set of HH¬(C) formulas. The main ideas and definitions are introduced in Section 2.2. Due to the problem introduced by nested implications, described in Section 5.4, a stronger definition of stratifiable database has been adopted in the current implementation. Now, these implications introduce additional negative dependencies in the dependency graph. More precisely, if G ⇒ A is a clause such that G contains a subgoal of the form D ⇒ G′, this nested implication produces negatively labeled edges from the defined predicate symbols of G′ to the predicate symbol of A.

• dpClause(A) = <∅, {pA}, ∅>
• dpClause(D1 ∧ D2) = <E1 ∪ E2, N1 ∪ N2, I1 ∪ I2>
  if dpClause(D1) = <E1, N1, I1> and dpClause(D2) = <E2, N2, I2>
• dpClause(∀x D) = dpClause(D)
• dpClause(G ⇒ A) = <EG ∪ ⋃_{n∈(NG\IG)} {n → pA} ∪ ⋃_{¬n∈NG} {n ¬→ pA} ∪ ⋃_{n∈IG} {n ¬→ pA}, {pA}, IG>
  if dpGoal(G) = <EG, NG, IG>
• dpGoal(A) = <∅, {pA}, ∅>
• dpGoal(¬A) = <∅, {¬pA}, ∅>
• dpGoal(C) = <∅, ∅, ∅>
• dpGoal(C ⇒ G) = dpGoal(G)
• dpGoal(G1 ∧ G2) = dpGoal(G1 ∨ G2) = <E1 ∪ E2, N1 ∪ N2, I1 ∪ I2>
  if dpGoal(G1) = <E1, N1, I1> and dpGoal(G2) = <E2, N2, I2>
• dpGoal(∀x G) = dpGoal(∃x G) = dpGoal(G)
• dpGoal(D ⇒ G) = <ED ∪ EG ∪ ⋃_{m∈NG} (⋃_{n∈ND} {n → m} ∪ ⋃_{¬n∈ND} {n ¬→ m}), ND ∪ NG, NG>
  if dpClause(D) = <ED, ND, ID> and dpGoal(G) = <EG, NG, IG>

Notation: pA stands for the predicate symbol of the atom A

Figure 3. Dependency Graph for Clauses and Goals

The algorithm for calculating the dependency graph is expressed by means of the mutually recursive functions dpClause and dpGoal defined in Figure 3, depending on the structure of the formula. Both return a triple <E, N, I>, where E is a set of edges of the form p → q or p ¬→ q, and N and I are auxiliary sets of link-nodes. N is used to store information about the positive-negative predicates, and I stores the predicates involved in nested implications. Using the function dpClause it is straightforward to calculate the dependency graph of a set of clauses as the union of the edges obtained for each element of the set. The dependency graph is used to define the stratification in HH¬(C), which is a syntactic condition for ensuring finiteness in the computations with negated atoms.

EXAMPLE 2. Consider the clause D ≡ ∀x(G ⇒ p(x)), where G ≡ ∃y(q(x, y) ⇒ (r(x) ∧ s(y))) ∧ ¬t(x). Then

dpGoal(G) = <{q → r, q → s}, {q, r, s, ¬t}, {r, s}>,

dpClause(D) = <{q → r, q → s, q → p, r ¬→ p, s ¬→ p, t ¬→ p}, {p}, {r, s}>.

The first component of the tuple dpClause(D) is the dependency graph associated to D. A database with just this clause is stratifiable, but if the clause

D′ ≡ ∀x∀y(p(x) ⇒ q(x, y))

is also present, the database becomes non-stratifiable.
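The functions of Figure 3 can be sketched in Python as follows. This is only an illustrative transcription, not the Prolog code of the prototype: the tuple encoding of formulas (tags "atom", "not", "constr", "cimp", "and", "or", "forall", "exists", "rule", "hyp") and the names dp_clause and dp_goal are assumptions made for this sketch. Link-nodes are encoded as (predicate, is_positive) pairs, and in the D ⇒ G case the target m of an edge is taken to be the predicate symbol of the link-node, whatever its sign.

```python
# Hypothetical encoding (not from the paper):
#   ("atom", "p")        atom with predicate symbol p
#   ("not", "p")         negated atom (goals only)
#   ("constr",)          constraint C
#   ("cimp", G)          C => G
#   ("and", F1, F2), ("or", G1, G2), ("forall", F), ("exists", G)
#   ("rule", G, "p")     clause G => A, where p is the predicate symbol of A
#   ("hyp", D, G)        goal D => G, where D is a clause
# Edges are triples (source, target, sign) with sign "+" (->) or "-" (neg ->).

def dp_clause(d):
    kind = d[0]
    if kind == "atom":
        return set(), {(d[1], True)}, set()
    if kind == "and":
        e1, n1, i1 = dp_clause(d[1])
        e2, n2, i2 = dp_clause(d[2])
        return e1 | e2, n1 | n2, i1 | i2
    if kind == "forall":
        return dp_clause(d[1])
    if kind == "rule":
        eg, ng, ig = dp_goal(d[1])
        head = d[2]
        edges = set(eg)
        # positive link-nodes outside nested implications: n -> pA
        edges |= {(n, head, "+") for (n, pos) in ng - ig if pos}
        # negated link-nodes: n neg-> pA
        edges |= {(n, head, "-") for (n, pos) in ng if not pos}
        # link-nodes coming from nested implications: n neg-> pA
        edges |= {(n, head, "-") for (n, _) in ig}
        return edges, {(head, True)}, set(ig)
    raise ValueError(f"not a clause: {d!r}")

def dp_goal(g):
    kind = g[0]
    if kind == "atom":
        return set(), {(g[1], True)}, set()
    if kind == "not":
        return set(), {(g[1], False)}, set()
    if kind == "constr":
        return set(), set(), set()
    if kind in ("cimp", "forall", "exists"):
        return dp_goal(g[1])
    if kind in ("and", "or"):
        e1, n1, i1 = dp_goal(g[1])
        e2, n2, i2 = dp_goal(g[2])
        return e1 | e2, n1 | n2, i1 | i2
    if kind == "hyp":
        ed, nd, _ = dp_clause(g[1])
        eg, ng, _ = dp_goal(g[2])
        new = {(n, m, "+" if pos else "-") for (m, _) in ng for (n, pos) in nd}
        return ed | eg | new, nd | ng, set(ng)
    raise ValueError(f"not a goal: {g!r}")

# Example 2: D = forall x (G => p(x)), G = exists y (q => (r and s)) and not t
G = ("and",
     ("exists", ("hyp", ("atom", "q"), ("and", ("atom", "r"), ("atom", "s")))),
     ("not", "t"))
D = ("forall", ("rule", G, "p"))
edges, nodes, nested = dp_clause(D)
```

Under this encoding, edges contains the six dependencies listed in Example 2, and nested holds the link-nodes r and s.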

The concrete algorithm for finding a stratification for ∆ (or for checking that it is not stratifiable) associates to each predicate symbol p an integer variable Xp ∈ [1..N], where N is the number of predicate symbols of ∆, and generates an inequation system: each dependency p → q produces Xp ≤ Xq, and each p ¬→ q produces Xp < Xq. Then, solving this system (if possible) provides the stratum of each p in Xp. The stratification algorithm ends with a concrete stratification if one exists, or stops with an error message (in polynomial time with respect to the number of predicate symbols in the database).

A stratification for the clause D of Example 2 places all the predicates in stratum 1 except p, which is placed in stratum 2. In particular Xq < Xp. Intuitively, this means that, for evaluating p, the rest of the predicates should be evaluated before, in particular q, which takes part in a nested implication. If the previous clause D′ is considered, we would also have Xq ≥ Xp and the inequation system would not have any solution.
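The inequation system can be solved in several ways. The following Python sketch uses a simple iterative relaxation that raises strata until all inequations hold; this is a technique chosen here for illustration, not necessarily the one used in the prototype. The function name stratify and the edge encoding (source, target, sign) are assumptions of the sketch.

```python
def stratify(edges):
    """Return the least stratum assignment, or None if none exists.

    edges: iterable of (p, q, sign); sign "+" encodes X_p <= X_q
    and sign "-" encodes X_p < X_q.
    """
    edges = list(edges)
    preds = {p for p, _, _ in edges} | {q for _, q, _ in edges}
    stratum = {p: 1 for p in preds}
    limit = len(preds)              # X_p ranges over [1..N]
    changed = True
    while changed:
        changed = False
        for p, q, sign in edges:
            needed = stratum[p] + (1 if sign == "-" else 0)
            if stratum[q] < needed:
                if needed > limit:  # a cycle through a negative edge
                    return None
                stratum[q] = needed
                changed = True
    return stratum

example2 = [("q", "r", "+"), ("q", "s", "+"), ("q", "p", "+"),
            ("r", "p", "-"), ("s", "p", "-"), ("t", "p", "-")]
only_d = stratify(example2)
with_d_prime = stratify(example2 + [("p", "q", "+")])   # adds p -> q from D'
```

For the edges of Example 2 the sketch places p in stratum 2 and every other predicate in stratum 1; once the edge p → q of D′ is added, the strata keep growing past N and the function reports that no stratification exists.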

The new negative dependencies introduced in the graph due to nested implications restrict the class of stratifiable programs, i.e., the syntax of our programs. Nevertheless, in practice this restriction does not mean a loss of expressivity in the language, which is much more powerful than relational algebra or Datalog.

In the next section, we show (in Figure 4) the whole dependency graph associated to the bank database plus the queries of Example 1. This set is stratifiable. Notice that the edge interestRate ¬→ query4 is due to the first nested implication inside the clause defining query4:

query4(R) :- fa(N,ex(S,ex(B,(client(N,B,S) => constr(real,B>2000) => interestRate(N,R))))).

This implication also produces client → interestRate and client → query4. So, by transitivity, query4 negatively depends on interestRate, but it also negatively depends on client, because interestRate depends on client.

7. A System Session

Next, we show the result of executing our system for the database and queries ∆ that we have shown in Example 1. In this example, the following enumerated domain and types are declared:

domain(client_dt,[smith,brown,mcandrew]).

[Figure 4: graph with nodes client, pastDue, mortgageQuote, debtor, interestRate, hasMortgage, newMortgage, getMortgage, personalCredit, query1, query2, query3, query4 and query5.]

Figure 4. Dependency Graph for Example 1 with some queries.

type(client(client_dt,real,real)).
type(pastDue(client_dt,real)).
type(mortgageQuote(client_dt,real)).
type(hasMortgageQuote(client_dt)).
type(debtor(client_dt)).
type(interestRate(client_dt,real)).
type(newMortgage(client_dt,real)).
type(getMortgage(client_dt)).
type(personalCredit(client_dt,real)).

The following clauses, corresponding to a number of queries, are added to the bank database. They are shown along with their types, which are inferred in the context of the above declarations.

type(query1).
query1 :- fa(N,debtor(N)).

type(query2(client_dt,real,real)).
query2(N,S,Q) :-
    ex(B,client(N,B,S),mortgageQuote(N,Q),constr(real,Q>=100)).

type(query3).
query3 :-
    ex(N,ex(A,(debtor(N),pastDue(N,A),constr(real,A>1000)))).

type(query4(real)).
query4(R) :-
    fa(N,ex(S,ex(B,(client(N,B,S) =>
        constr(real,B>2000) =>
        interestRate(N,R))))).

type(query5(client_dt,real)).
query5(N,A) :-
    newMortgage(N,400), not(personalCredit(N,A)).

The dependency graph calculated for the current set of clauses is shown in Figure 4 (we use dashed lines for dependencies introduced by the queries).

From this graph, the stratification algorithm associates:

• Stratum 1 to client, pastDue, mortgageQuote, debtor, interestRate, hasMortgage, query1, query2 and query3.

• Stratum 2 to newMortgage, getMortgage, and query4.

• Stratum 3 to personalCredit.

• Stratum 4 to query5.

Since ∆ is stratifiable, the computation of

fixPointStrat(∆,4, Fix)

begins calculating fix_i(∆), stratum by stratum from i = 1 to 4, in order to obtain Fix = fix_4(∆).

1. Computation of fix_1(∆).
The first iteration of T_1 over the empty set, which corresponds to the execution of opT(∆,∆,1,[],TI), obtains in TI the pairs associated to the extensional database:

(client(smith,2000,1200), true),
(client(brown,1000,1500), true),
(client(mcandrew,5300,3000), true),
(pastDue(smith,3000), true),
(pastDue(mcandrew,100), true),
(mortgageQuote(brown,400), true),
(mortgageQuote(mcandrew,100), true)

The fixpoint computation of this first stratum requires one more iteration of T_1. After this, the following pairs are added:

(debtor(X), X=smith),
(interestRate(smith, 2), true),
(interestRate(X,Y), ((X=brown, Y=5);(X=mcandrew, Y=5))),
(query2(X,Y,Z), ((Y=400, Z=1500, X=brown);(Y=100, Z=3000, X=mcandrew))),
(query3, true),
(hasMortgage(X), (X=brown;X=mcandrew))

Note that no pair due to query1 is added at this stage, since the universal quantification in this clause amounts to a conjunctive constraint over the domain of debtor, i.e., it requires that all the clients in client_dt are debtors, which is not the case.

2. Computation of fix_2(∆).
Determining whether a pair (query4(X),C) can be added to the current set of pairs requires locally recalculating fix_1, but this time for ∆ ∪ {client(N, B, S)}. To obtain fix_2(∆), in the first iteration and after the appropriate computations to calculate fix_1(∆ ∪ {client(N, B, S)}), the following pairs are added to fix_1(∆):

(query4(X), X=5),
(newMortgage(X,Y), ((Y=<200, X=brown);(Y=<1100, X=mcandrew)))

And, in the second iteration, the next pair is added:

(getMortgage(X), (X=brown;X=mcandrew))

3. Computation of fix_3(∆).
Here, a pair for the predicate personalCredit is added to the previous fixpoint:

(personalCredit(X,Y), ((Y>=6000, Y<20000, X=smith);(Y<6000, X=brown);(Y<6000, X=mcandrew)))

4. Computation of fix_4(∆).
The final fixpoint requires one iteration of T_4 over the fixpoint of the third stratum

iterT(∆,4,fix_3(∆),FixSt),

obtaining the following new pair:

(query5(X,Y), (X=mcandrew, Y>=6000, Y=<20000))

This completes the result, and fix_4(∆) = FixSt captures the semantics of our database and queries.
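The stratum-by-stratum iteration followed in the four steps above can be illustrated with a much simplified Python sketch. It works on ground propositional rules only, so it ignores constraints, the pairs (A,C) and nested implications, which are the hard part of the actual system; the function name stratified_fixpoint, the rule encoding and the toy rules are all hypothetical.

```python
def stratified_fixpoint(rules, stratum):
    """rules: list of (head, positive_body, negative_body) over ground atoms.
    stratum: dict mapping each head to its stratum (1, 2, ...).
    Returns the set of derived atoms, computed stratum by stratum.
    """
    facts = set()
    for i in range(1, max(stratum.values()) + 1):
        layer = [r for r in rules if stratum[r[0]] == i]
        changed = True
        while changed:                      # iterate T_i up to its fixpoint
            changed = False
            for head, pos, neg in layer:
                if (head not in facts
                        and all(a in facts for a in pos)
                        # negated atoms belong to lower, already closed strata
                        and all(a not in facts for a in neg)):
                    facts.add(head)
                    changed = True
    return facts

# Toy propositional rules (hypothetical, not the bank database of Example 1).
toy_rules = [
    ("client", [], []),
    ("debtor", ["client"], []),
    ("credit", ["client"], ["debtor"]),     # credit :- client, not debtor
    ("offer",  ["client"], ["credit"]),     # offer  :- client, not credit
]
toy_strata = {"client": 1, "debtor": 1, "credit": 2, "offer": 3}
toy_fix = stratified_fixpoint(toy_rules, toy_strata)
```

Because each negated atom lives in a strictly lower stratum, its truth value is already settled when the rule using it is evaluated, which is the property that stratification is meant to guarantee.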

In the example, the stratification and fixpoint have been calculated for the database together with all the queries we had formulated. Hence they can be seen as predefined views. However, the fixpoint does not need to be recomputed each time a query is posed. A more reusable behaviour is also possible in many cases. For a database Delta, a stratification s and a fixpoint Fix = fix(Delta) can be computed and stored. If the stratification s is valid for the posed query Q, then the expected answer constraint C can be obtained by executing force(Delta,Fix,Q,C).

8. Conclusions and Future Work

In [Nieva et al. 2008] we presented a formalization of the constraint logic programming scheme HH¬(C) as an expressive deductive database system that returns constraints as answers to queries. A semantics was developed, following stratification and fixpoint techniques, which are usual in the framework of deductive database semantics. But the underlying logic of our system embraces both constraints and new connectives in the goals or queries (implications, negation and quantifiers). This fact enlarges expressivity and efficiency, but introduces some penalties in the implementation.

We have developed a prototype of a deductive database system that shows the feasibility of the fixpoint semantics as a base for an actual implementation. The core of this implementation is independent of the concrete constraint system. Several constraint systems are implemented as instances of this scheme. In particular, we have considered real numbers, integers, Booleans and user-defined enumerated types (all of these, except reals, belong to the finite domain constraint family). They have been implemented by taking advantage of the underlying constraint solvers in SWI-Prolog. We have added types to programs so that relations become typed (as tables in relational databases) and each constraint is mapped to its solver.

The main difficulties in the implementation of our stratified fixpoint semantics lie in adapting the usual techniques, not only to work with constraints, but also to take into account that a database can be dynamically augmented with local clauses when a hypothetical query is formulated. The definition of the fixpoint operator is not constructive for the case of nested implications, so a stronger definition of dependency graph has been formulated to ensure a constructive and terminating fixpoint computation.

Future work. The prototype presented in this work can be enhanced to turn it into a practical system. The current implementation is very close to the theory developed in our previous works and is a valuable tool for understanding such a theory, but as a consequence it has an expected penalty in efficiency. On the one hand, we have implemented a naïve stratification algorithm for this first prototype that can be easily improved. On the other hand, a more serious source of inefficiency comes from the forcing of implications. Along this line, well-known methods such as magic set transformations [Beeri and Ramakrishnan 1991] and tabling [Tamaki and Sato 1986] could be worth adapting to the current implementation. This is also related to widening the set of computable queries and programs, by adapting the ideas found in the well-founded model [Van Gelder et al. 1991], which could relax our stratification restrictions. This can also be coupled with efficient solving methods [Shen et al. 2002]. In addition, existing efficient relational technology could be used to solve concrete queries which do not need the more powerful (less efficient) database engine we currently provide.

Moreover, in the field of databases, the useful constraint systems are often combinations of different domains. The constraint systems we have implemented work together, but do not cooperate. Due to the nature of the logic involved in our system, finding methods for proving satisfiability of constraints in a mixed domain is a complex task, because the syntax of such constraints will allow, among other aspects, combining existential and universal quantifications for variables of the considered domains. In order to develop a mixed solver, we will consider the existing works that combine concrete domains in the context of HH¬(C) [García-Díaz and Nieva 2003] and the combination of decision methods with techniques applied to constraint solvers. This line stems from a fruitful research line on combining constraint systems to cope with problems that either cannot be handled by a single domain constraint solver, or whose solving can be significantly improved by the cooperation of constraint solvers [Hofstedt and Pepper 2007, Castro and Monfroy 2004, Granvilliers et al. 2001].

Acknowledgments

This work has been partially supported by the projects STAMP (TIN2008-06622-C03-01), PROMESAS-CAM (S-0505/TIC/0407) and UCM-BSCH-GR58/08-910502. We are also grateful to Jan Wielemaker, author of SWI-Prolog, and Markus Triska, author of the finite domain constraint library for this system, who was very kind to support us in developing new features used in our finite domain constraint solver.

References

F. Arni, K. Ong, S. Tsur, Haixun Wang, and C. Zaniolo. The Deductive Database System LDL++. Theory and Practice of Logic Programming, 3(1):61–94, 2003.

M. Becker, C. Fournet, and A. Gordon. Design and Semantics of a Decentralized Authorization Language. In CSF ’07: Proceedings of the 20th IEEE Computer Security Foundations Symposium, pages 3–15, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2819-8. doi: http://dx.doi.org/10.1109/CSF.2007.18.


C. Beeri and R. Ramakrishnan. On the power of magic. Journal of Logic Programming, 10(3-4):255–299, 1991. ISSN 0743-1066. doi: http://dx.doi.org/10.1016/0743-1066(91)90038-Q.

A. Calì, G. Gottlob, and T. Lukasiewicz. Datalog±: a unified approach to ontologies and integrity constraints. In ICDT ’09: Proceedings of the 12th International Conference on Database Theory, pages 14–30, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-423-2. doi: http://doi.acm.org/10.1145/1514894.1514897.

C. Castro and E. Monfroy. Designing hybrid cooperations with a component language for solving optimisation problems. In International Conference on Artificial Intelligence: Methodology, Systems and Applications 2004, volume 3192 of LNCS, pages 447–458. Springer, 2004.

R. Fikes, P. J. Hayes, and I. Horrocks. OWL-QL - a language for deductive query answering on the Semantic Web. Journal of Web Semantics, 2(1):19–29, 2004. URL http://www.informatik.uni-trier.de/~ley/db/journals/ws/ws2.html#FikesHH04.

M. García-Díaz and S. Nieva. Solving Constraints for an Instance of an Extended CLP Language over a Domain based on Real Numbers and Herbrand Terms. Journal of Functional and Logic Programming, 2003(2), September 2003.

L. Granvilliers, E. Monfroy, and F. Benhamou. Cooperative solvers in constraint programming: a short introduction. ALP Newsletter, 14(2), 2001.

P. Hofstedt and P. Pepper. Integration of declarative and constraint programming. Theory and Practice of Logic Programming, 7(1-2):93–121, 2007. ISSN 1471-0684. doi: http://dx.doi.org/10.1017/S1471068406002833.

C. Holzbaur. Realization of forward checking in logic programming through extended unification. Report TR-90-11, Oesterreichisches Forschungsinstitut fuer Artificial Intelligence, 1990.

M. Jarke, M. A. Jeusfeld, and C. Quix. ConceptBase V7.1 User Manual. Technical report, RWTH Aachen, April 2008.

M. S. Lam, S. Whaley, B. V. Livshits, M. C. Martin, D. Avots, M. Carbin, and C. Unkel. Context-sensitive program analysis as database queries. In Chen Li, editor, Symposium on Principles of Database Systems (PODS), pages 1–12. ACM, 2005. ISBN 1-59593-062-0.

J. Leach, S. Nieva, and M. Rodríguez-Artalejo. Constraint Logic Programming with Hereditary Harrop Formulas. Theory and Practice of Logic Programming, 1(4):409–445, 2001.

N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, and F. Scarcello. The DLV system for knowledge representation and reasoning. ACM Transactions on Computational Logic, 7(3):499–562, 2006.

S. Nieva, F. Sáenz-Pérez, and J. Sánchez. Towards a constraint deductive database language based on hereditary Harrop formulas. In P. Lucio and F. Orejas, editors, Sextas Jornadas de Programación y Lenguajes, PROLE, pages 171–182, 2006.

S. Nieva, F. Sáenz-Pérez, and J. Sánchez. Formalizing a Constraint Deductive Database Language based on Hereditary Harrop Formulas with Negation. In FLOPS’08, Proceedings, volume 4989 of Lecture Notes in Computer Science, pages 289–304, Ise, Japan, 2008. Springer-Verlag.

G. Ramalingam and Eelco Visser, editors. Proceedings of the 2007 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-based Program Manipulation, 2007, Nice, France, January 15-16, 2007, 2007. ACM. ISBN 978-1-59593-620-2.

R. Ronen and O. Shmueli. Evaluating very large datalog queries on social networks. In EDBT ’09: Proceedings of the 12th International Conference on Extending Database Technology, pages 577–587, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-422-5. doi: http://doi.acm.org/10.1145/1516360.1516427.

F. Sáenz-Pérez. Datalog Educational System. User’s Manual version 1.6.2. Technical report, Faculty of Computer Science, UCM, March 2009. Available from http://des.sourceforge.net/.

K. Sagonas, T. Swift, and D. S. Warren. XSB as an efficient deductive database engine. In SIGMOD ’94: Proceedings of the 1994 ACM SIGMOD international conference on Management of data, pages 442–453, New York, NY, USA, 1994. ACM. ISBN 0-89791-639-5. doi: http://doi.acm.org/10.1145/191839.191927.

Y. Shen, L. Yuan, and J. You. SLT-resolution for the well-founded semantics. Journal of Automated Reasoning, 28:53–97, 2002.

H. Tamaki and T. Sato. OLD resolution with tabulation. In Proceedings on Third international conference on logic programming, pages 84–98, New York, NY, USA, 1986. Springer-Verlag New York, Inc. ISBN 0-387-16492-8.

A. Van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. J. ACM, 38(3):619–649, 1991. ISSN 0004-5411. doi: http://doi.acm.org/10.1145/116825.116838.

J. Wielemaker. SWI-Prolog. User’s Manual version 5.6.64, 2009. Available from http://www.swi-prolog.org/.

C. Zaniolo, S. Ceri, C. Faloutsos, R. T. Snodgrass, V. S. Subrahmanian, and R. Zicari. Advanced Database Systems. Pages 180–183, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997. ISBN 1-55860-443-X.


PROLE 2011

Incorporating Integrity Constraints to a Deductive Database System

Gabriel Aranda-López†, Susana Nieva†, Fernando Sáenz-Pérez‡ and Jaime Sánchez-Hernández†

†Dept. Sistemas Informáticos y Computación and ‡Dept. Ingeniería del Software e Inteligencia Artificial,

Universidad Complutense de Madrid, Spain
{nieva,fernan,jaime}@sip.ucm.es, [email protected]

Abstract

Hereditary Harrop Formulas with Constraints and Negation (HH¬(C)) have been proposed as a very expressive constraint deductive database scheme. The theoretical foundations lie on a fixpoint semantics that is also the basis for a Prolog implementation presented in a previous work. We have developed several solvers for specific constraint domains. In this paper we introduce, for the first time, (strong) integrity constraints in the HH¬(C) system by taking advantage of the expressiveness of our approach. Integrity constraints are used to ensure consistency of data in a database language. A (strong) integrity constraint expresses a relationship among data that every database instance is required to satisfy. We show how our language and the fixpoint semantics implementation lead to an easy specification of the most usual integrity constraints provided in other relational database languages. In addition to the usual specifications of primary key and foreign key, the system also supports functional dependencies.

Keywords: Deductive Databases, Constraints, Hereditary Harrop Formulas, Fixpoint Semantics, Integrity Constraints

1 Introduction

In [7] we presented an extension of Hereditary Harrop formulas with constraints by adding negation to obtain HH¬(C), a Constraint Deductive Database (CDDB) system like Datalog (with Constraints) [9], based on a fixpoint semantics like Coral [8]. We stress, as an important benefit of our approach, the ability to formulate hypothetical and universally quantified queries. The resulting language enjoys the expressive power of Datalog, but adds constraints and two new logical connectives: implication (to formulate hypothetical queries), and universal quantification (to encapsulate data). We have implemented a prototype as proof of concept for the theoretical framework. Two main components can be distinguished in the implementation of this prototype, which we already presented in [1]. One corresponds to the bottom-up implementation of the fixpoint semantics, which is very close to the theory. The fixpoint is computed using a stratification of the predicates of the database, which is obtained from a suitable notion of dependency graph. The other component corresponds to the implementation of the constraint solvers. The first component is independent of the particular constraint system, i.e., it is parametric on the second component. Then, the known safety results for Datalog (with constraints) [9] are valid in our case, because they rely on the constraint systems.

This paper is electronically published in Electronic Notes in Theoretical Computer Science
URL: www.elsevier.nl/locate/entcs

In [3] we showed a first approach to incorporating aggregate functions into our system. Those functions are useful for computing single values from a set of numerical or other-type values. We have taken advantage of certain aspects of the stratified semantics of our database system in order to deal with the implementation of aggregate functions.

The HH¬(C) prototype incorporates a type checking and inference system, so the clauses defining our databases are typed. Types are needed to know the constraint system the constraints belong to. We have proposed three constraint systems as possible instances of the scheme HH¬(C): Boolean, Reals, and Finite Domains.

We have relied on the underlying constraint solvers available in SWI-Prolog [12] for implementing the constraint systems. In addition, because some connectives of the language (for instance, the universal quantifier) are not supported in SWI-Prolog, it has been necessary to manage them explicitly.

Consistency constraints over data are known as strong integrity constraints in the deductive database area. Examples of such integrity constraints in the relational model are primary keys and foreign keys, to name a few. Such integrity constraints must not be confused with those belonging to the constraint system C used to parameterize the scheme HH¬(C) (or analogously CLP(C)), where constraints are first-class citizen constructions of the language, as they can occur in well-formed formulas. Constraints in deductive systems have also been studied [2,5]; in systems such as DLV [6] or XSB [10], which implement stable model [4] and well-founded model semantics [11], respectively, they are instead understood as model filters. Since a database can have several models, only those complying with the constraints are included in the answer, thereby discarding unfeasible models from the answer.

In this paper, instead, we focus on integrity constraints as understood in relational database management systems (RDBMSs) in order to provide a means to detect inconsistent data with respect to user requirements. Our system already included one such kind of constraint in the form of domain constraints (as they are known in RDBMSs), as it is strongly typed. Here, we describe for the first time the introduction of other integrity constraints: the more usual primary key and foreign key, and also the functional dependency, which in particular is useful for ensuring consistency over denormalized relations. We present how integrity constraints can be specified at the language level using the expressiveness of HH¬(C). We also explain the issues which arise in enabling integrity constraint support in a general database framework. We propose a concrete implementation and we sketch some improvements to enhance performance.

2 System Description and Fixpoint by Strata

In this section we introduce the main aspects of our system and how to use it.

2.1 Databases and Queries

As usual in CDDBs, a database is a set of clauses and a query is a goal, whose answer is a constraint. When the system is started, the computation of the fixpoint semantics of a database ∆ follows these computation stages:

(i) Check and infer the predicate types.

(ii) Build the dependency graph of ∆.

(iii) Compute a stratification s for ∆, if there is any. Otherwise, the system throws an error message and stops.

(iv) If the previous step succeeds, compute the fixpoint of ∆, fix(∆), from the first to the last stratum.

The fixpoint fix(∆) is composed of a set of pairs (A,C) such that the atom A can be derived from ∆ if C is satisfied, i.e., it captures the semantics of the database ∆. Those pairs are obtained by means of a stratified fixpoint operator, whose computation is especially difficult when dealing with nested implications. This is because when an implication D ⇒ G appears in the body of a clause, the database for which the fixpoint is being computed is augmented with D. Then a local fixpoint for the extended database must be calculated. See [1] for details.

After stage (iv), and as long as it is not processing another database, the system keeps the following information in memory: the just computed fix(∆), the stratification s, and the dependency graph of ∆.

When a query G is posed at the prompt, the system computes, if it exists, a new stratification s′ for the set ∆ ∪ {G}. If there is no such s′, the query cannot be computed and the system stops. Otherwise:

• If s = s′, the stored fixpoint, computed for ∆, is valid to evaluate G.

• If s ≠ s′, the predicate symbols involved in the computation of G can be in a different stratum than when fix(∆) was computed. So, the stored fixpoint is not valid to evaluate G and a new fixpoint, fix(∆ ∪ {query(X) :- G}), must be computed, where X are the free variables of G. The new predicate query added to the database captures the desired answer, which will be the constraint C stored in the pair (query(X),C) of the computed fixpoint.

More details of the implementation are shown in [1]. In the next section we introduce the use of the HH¬(C) system by means of examples.

2.2 A Database Example

Here we define a simple database for managing information about students in some courses related to Logic Programming. We adhere to a syntax quite similar to Prolog. In addition, we write not for negation, => for implication, ex(X,G) for ∃X G, and fa(X,G) for ∀X G. First, we define two domains, one for the student names and the other for subject names:

domain(student_dt,[angela,david,joseluis,nicolas]).

domain(subject_dt,[programming_introduction,logic_programming,
                   declarative_programming]).

Although several solvers can be used together within the same database, they cannot be combined for the moment, i.e., constraints of different types cannot be freely mixed to get a heterogeneous compound constraint. Predicates with arguments of different types are restricted to those extensionally defined.

This limitation can be overcome in some practical situations by defining domain mappings. In this example, we can easily define a mapping from the domain of student names to real values, i.e., associate a real number (intended as the identity card number) to each student:

type(student_id(student_dt,real)).

student_id(angela,350001).

student_id(david,500002).

student_id(joseluis,750003).

student_id(nicolas,900004).

We also associate each subject name to its code:

type(subject_id(subject_dt,real)).

subject_id(programming_introduction,406).

subject_id(logic_programming,428).

subject_id(declarative_programming,455).

The information about students taking a course can be stored as:

type(course(real,real,real)).



course(350001, 406, 5.0).

course(750003, 406, 7.5).

course(500002, 406, 2.0).

course(350001, 428, 3.5).

For instance, the first clause means that the student with identity card number 350001 is taking the course of Programming Introduction (identifier 406) and has a mark of 5.0.

We focus now on the intensional database. In a first view, we collect the students St who have passed a concrete subject Sb:

type(passed(real,real)).

passed(St,Sb) :- course(St,Sb,M), constr(real,M>=5.0).

The next view represents that a student is able to register in Declarative Programming (identifier 455) if he or she has passed Programming Introduction (identifier 406) and is already registered in Logic Programming (identifier 428):

type(register(real,real)).

register(St,455) :- passed(St,406), course(St,428,M).

Fixpoint of the Database

When the user processes (loads) a database, the system performs the computation stages mentioned in Section 2.1, and obtains a fixpoint for such a database. That fixpoint is a set of pairs (Atom,Constraint), meaning that the atom is true if the constraint is satisfied. The fixpoint for our working example is:

passed(A_real, B_real), (B_real=406.0, A_real=350001.0;

B_real=406.0, A_real=750003.0)

register(350001.0, 455.0), true

student_id(angela, 350001.0), true

student_id(david, 500002.0), true

student_id(joseluis, 750003.0), true

student_id(nicolas, 900004.0), true

subject_id(programming_introduction, 406.0), true

subject_id(logic_programming, 428.0), true

subject_id(declarative_programming, 455.0), true

course(350001.0, 406.0, 5.0), true

course(750003.0, 406.0, 7.5), true

course(500002.0, 406.0, 2.0), true

course(350001.0, 428.0, 3.5), true



Querying

Once the database is loaded, the user can submit queries to the system. For example, the user can ask how many students are studying Programming Introduction (identifier 406):

HHn(C)> constr(real,NumSt=count(course(St,406.0,M))).

Answer: NumSt=3.0

Who is not able to register in Declarative Programming (identifier 455):

HHn(C)> not(register(St,455.0)),student_id(N,St).

Answer: ( N=nicolas, St=900004.0;
          N=joseluis, St=750003.0;
          N=david, St=500002.0;
          N=angela, St=350001.0),
        St/=350001.0

Assuming that a student has passed Programming Introduction (identifier 406) with a mark of 9.0, what is the average mark of the class in this subject:

HHn(C)> course(750003.0,406.0,9.0)=>

constr(real,Avg=avg(course(St,406.0,C),C)).

Answer: Avg=5.875
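The arithmetic behind this answer can be checked with a short Python sketch. It only mimics the effect of the hypothetical query on the extensional facts of course and is not how the system evaluates implications; the helper names avg_mark and hypothetically are assumptions of the sketch. Note that the assumed fact is added next to the existing tuple for student 750003, so four marks are averaged.

```python
# Extensional facts of course(Student, Subject, Mark) from Section 2.2.
course = [
    (350001, 406, 5.0),
    (750003, 406, 7.5),
    (500002, 406, 2.0),
    (350001, 428, 3.5),
]

def avg_mark(facts, subject):
    """Average mark over all tuples of the given subject."""
    marks = [mark for (_, sb, mark) in facts if sb == subject]
    return sum(marks) / len(marks)

def hypothetically(facts, assumed):
    """Return a locally augmented copy; the stored facts stay untouched."""
    return facts + [assumed]

# course(750003,406,9.0) => Avg = avg(course(St,406,C),C)
avg = avg_mark(hypothetically(course, (750003, 406, 9.0)), 406)

# The count query posed earlier, over the unmodified database.
num_students = len([row for row in course if row[1] == 406])
```

Here avg evaluates to (5.0 + 7.5 + 2.0 + 9.0) / 4 = 5.875, and the stored list course still has its four original tuples once the hypothetical evaluation is finished.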

The current version of the system is available at https://gpd.sip.ucm.es/trac/gpd/wiki/GpdSystems including a bundle of examples.

3 Integrity Constraints in HH¬(C)

In this section we introduce integrity constraints in the system. We describe the different approaches that we have considered for dealing with this feature. Firstly, we show how the language by itself is able to capture the most common integrity constraints, at least for databases that do not involve some forms of nested implications (in particular, the specifications will be sound for the extensional database, which is the most common use in relational databases). Then, we explain the problems coming from nested implications and the way in which they can be overcome within the concrete implementation of the system.

3.1 Using Expressiveness of the Language to Define Integrity Constraints

The aim when designing the HH¬(C) language was to obtain a highly expressive query language. As a proof of concept, in this section we show how the user can define (strong) integrity constraints within the language itself, by adding predicates to the input database that capture information about the violation of these constraints. For the examples below, we assume an alphabetical domain data composed of all letters of the alphabet. Let us start with a primary key constraint for a defined predicate p:

p(a,b,c).

p(b,c,d).

p(e,f,g).

p(a,i,j).

A primary key constraint specifies that there are no two tuples in a predicate with the same values for a given set of columns. For example, to specify that the first parameter is the primary key for the predicate p, we add to the database:

pk_p_fails:- p(A,B,C), p(A,B2,C2),constr(data,(B/=B2;C/=C2)).

The predicate pk_p_fails expresses a failure in the intended primary key constraint, i.e., it is true iff the constraint is violated. In this case pk_p_fails is true as the value a occurs twice as the first argument of p. Moreover, we can capture information about the tuples that violate the constraint by adding parameters in the head:

pk_p_fails(A,B,C):- p(A,B,C), p(A,B2,C2),

constr(data,(B/=B2;C/=C2)).

In this case, those tuples are (a,b,c) and (a,i,j).
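As a cross-check of the rule above, the following minimal Python sketch reproduces the same test outside the system. It is only an illustration: the function name `pk_violations` and its `key_positions` parameter are hypothetical and are not part of HH¬(C). Exact duplicates of a tuple are not reported, mirroring the disequality constraint (B/=B2;C/=C2).

```python
from collections import defaultdict

# Facts for predicate p, copied from the example above.
p = [("a", "b", "c"), ("b", "c", "d"), ("e", "f", "g"), ("a", "i", "j")]


def pk_violations(tuples, key_positions):
    """Return the tuples that share their key columns with a different tuple."""
    groups = defaultdict(set)
    for t in tuples:
        key = tuple(t[i] for i in key_positions)
        groups[key].add(t)  # a set, so exact duplicates collapse
    return sorted(t for group in groups.values() if len(group) > 1 for t in group)


# The first argument (position 0) plays the role of the primary key.
violations = pk_violations(p, key_positions=(0,))
print(violations)  # [('a', 'b', 'c'), ('a', 'i', 'j')]
```

The result agrees with the two tuples reported by pk_p_fails(A,B,C).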

A foreign key constraint specifies that the values in a given set of columns of a predicate must exist already in the columns declared in the primary key constraint of another predicate. Assume for example the following predicates s and q:

s(a,b). q(a,b).

s(e,a). q(e,f).

s(h,e). q(h,j).

s(k,k). q(k,l).

s(a,b). q(c,d).

s(a,c). q(t,g).

For defining a foreign key between the first argument of s and the first argument of q we are interested in those tuples that could be defined as:

fk_not_pairs(X):-q(X,A),not(s(X,B)).

With this idea it is straightforward to define the fk_qs_fails predicate:

fk_qs_fails(V):- q(V,A), fa(B,not_s(V,B)).

not_s(V,B):- not(s(V,B)).

In this example, this predicate will capture the concrete values (V=c; V=t).
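The same check can be sketched in Python as a set difference between the referencing column of q and the referenced column of s. The helper name `fk_violations` and its parameters are assumptions made for this illustration only.

```python
# Facts for predicates s and q, copied from the example above.
s = [("a", "b"), ("e", "a"), ("h", "e"), ("k", "k"), ("a", "b"), ("a", "c")]
q = [("a", "b"), ("e", "f"), ("h", "j"), ("k", "l"), ("c", "d"), ("t", "g")]


def fk_violations(referencing, referenced, ref_pos=0, key_pos=0):
    """Values in column ref_pos of `referencing` that are missing from
    column key_pos of `referenced`."""
    known = {t[key_pos] for t in referenced}
    return sorted({t[ref_pos] for t in referencing if t[ref_pos] not in known})


fk_qs_fails = fk_violations(q, s)
print(fk_qs_fails)  # ['c', 't']
```

The universally quantified goal fa(B,not_s(V,B)) corresponds here to the membership test against the whole first column of s.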

A functional dependency constraint X → Y over a predicate p specifies that the set of arguments X of p functionally determines the set Y, i.e., each tuple of values of X in p is associated with exactly one tuple of values of Y in the same tuple of p. For instance, assume a predicate u:

u(a,b,c,d,e,f).

u(a,b,d,c,e,f).

u(a,a,a,a,a,a).

u(a,b,c,h,a,a).

For the predicate u(A1,A2,B1,B2,C,D), let us assume the functional dependency (A1, A2) → (B1, B2). Its violation can be expressed by the following predicate:

fd_u_fails(A1,A2,B1,B2):- u(A1,A2,B1,B2,_,_),

u(A1,A2,B3,B4,_,_),

constr(data,(B1/=B3 ; B2/=B4)).

This predicate fd_u_fails(A,B,C,D) will capture the tuples:

(A=a, C=c, D=d, B=b;

A=a, C=c, D=h, B=b;

A=a, C=d, D=c, B=b).
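A minimal Python sketch of the same functional dependency test follows. It groups the right-hand side values by the left-hand side values and reports every group with more than one image. The name `fd_violations` and the `lhs`/`rhs` position parameters are hypothetical.

```python
from collections import defaultdict

# Facts for predicate u, copied from the example above.
u = [
    ("a", "b", "c", "d", "e", "f"),
    ("a", "b", "d", "c", "e", "f"),
    ("a", "a", "a", "a", "a", "a"),
    ("a", "b", "c", "h", "a", "a"),
]


def fd_violations(tuples, lhs, rhs):
    """Return (lhs values + rhs values) for every lhs value that is
    associated with more than one rhs value."""
    images = defaultdict(set)
    for t in tuples:
        images[tuple(t[i] for i in lhs)].add(tuple(t[i] for i in rhs))
    return sorted(x + y for x, ys in images.items() if len(ys) > 1 for y in ys)


# (A1, A2) -> (B1, B2): positions 0,1 determine positions 2,3.
fd_u_fails = fd_violations(u, lhs=(0, 1), rhs=(2, 3))
print(fd_u_fails)
# [('a', 'b', 'c', 'd'), ('a', 'b', 'c', 'h'), ('a', 'b', 'd', 'c')]
```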

In the previous examples the user realizes that a violation of an integrity constraint occurs because of the presence of pairs for concrete predicates in the database fixpoint. This is a direct first approach, which shows the expressiveness of the language and also serves as a clear specification of the integrity constraints we are interested in. Nevertheless, as we have pointed out before, this specification is not complete for the general case. Nested implications require a sophisticated computation mechanism involving temporary fixpoint calculations (see [1] for a detailed description of this mechanism). The information about integrity constraint violation, as previously defined, may appear in some temporary fixpoint but not in the final fixpoint. To illustrate this situation, assume the following database:

q(a,b).

p:- q(a,c) => q(a,c).

And a definition of primary key for the first argument of q, following the previous ideas:

pk_q_fails:- q(A,B), q(A,C), constr(data,B/=C).

During the computation for the predicate p it is required to calculate a local fixpoint for an extended database including the tuple q(a,c). At this point, both tuples q(a,b) and q(a,c) are in the fixpoint, which means a violation of the specified primary key. In fact, the pair (pk_q_fails,true) will be added to this local fixpoint, which is discarded afterwards. So the pair (pk_q_fails,true) does not appear in the final fixpoint, which is:

p, true

q(a, b), true

And this is not consistent according to the definition of primary key for q. It is interesting that at some point of the computation the inconsistency is present and could be checked by the system with some easy modifications of it. In the next section we focus on these issues in order to get a practical implementation.

3.2 Adding Integrity Constraint Support to a Database System

Three issues must be taken into account for adding support for integrity constraints to a given database system and language.

First, how integrity constraints are declared. On the one hand, RDBMSs allow them to be declared with both DDL (Data Definition Language) and GUI interfaces (in turn, the latter use the former, hidden from the user). On the other hand, Prolog systems include either assertions or directives for such declarations. Currently, although somewhat interactive, our system reads a database which is not subject to change until the next loading, i.e., it is targeted more at being used as a batch database than as a full-fledged online database. So, including a DDL in addition to the current DML (Data Manipulation Language) does not seem to be needed. Thus, as HH¬(C) is a logic programming (LP) language, we rather follow the more usual way in these LP systems and provide support for assertions. The next section presents the concrete syntax we propose.

Second, when a constraint must be detected as violated. Our system implements an iterative fixpoint procedure with a common operation for adding new tuples to the current interpretation. So, a safe point to detect constraint violation is this operation. This can be seen as a sufficient condition, but weaker conditions can indeed be found. Given that each constraint is implemented with a predicate, each time a given atom and constraint is to be added, its predicate name can be checked for matching with those constraint predicates. Other alternatives do exist, ranging from the one just introduced to the opposite end, where constraints are checked after the whole fixpoint computation is finished. But notice that, as implications are allowed, it might be the case that some constraints had been violated in a given subcomputation that is no longer included in the outcome fixpoint, as explained in the previous section.

And, third, what to do when an inconsistency is found. Here, batch and online operations are considered as a reference for this question. Online updates to databases usually do constraint checking for each tuple to be either added, modified or deleted, discarding the update if an inconsistency is found and raising an exception. Batch updates, on the other hand, can follow another route: collect the inconsistencies and report offending tuples to the user. This is quite adequate for massive updates for which it is convenient to have all the offending tuples at hand in order to reason about and repair error sources. Several RDBMSs do allow these batch updates with different tools (Oracle, DB2, MySQL, ...).

3.3 Concrete Implementation: The Case of Primary Key

In this section we explore a first approach for a concrete implementation of integrity constraints, which is currently working in the system. The aim is to automate the generation of the specifications introduced in Section 3.1, taking reasonable decisions about the three questions presented in the previous section. The problem explained in Section 3.1 about nested implications and the associated subcomputations can now be overcome at the system implementation level.

We focus on primary key constraints, as the others (foreign keys and functional dependencies) can be treated in a similar way. For the first question, how, we introduce a new syntactical construction in the system in order to define a primary key constraint. Given a predicate p of arity n, and two tuples of variables X and Y (of arities n and m ≤ n resp.) such that Y ⊆ X, the directive pk(p(X), Y) establishes the arguments corresponding to Y as a primary key for p. For example, the primary key constraint for the predicate p of Section 3.1 would be expressed as:

:- pk(p(A,B,C),(A))

The system automates the translation of this directive, producing exactly the code for pk_p_fails(A,B,C) presented in Section 3.1.

The system must also check the possible violations of integrity constraints. As explained in Section 3.1, we cannot wait for the complete fixpoint of the database. But the system implements an iterative fixpoint procedure with a specific operation for adding new pairs to the current interpretation. Then, for the second question, when, a safe answer is whenever a new pair is added to the current interpretation, as any fixpoint (temporary or not) comes from a growing interpretation. Of course, this solves the problem pointed out in Section 3.1 related to temporary fixpoints. Note that this is analogous to the approach taken by current RDBMS implementations. The idea is to avoid speculative computations by detecting inconsistencies as soon as possible. In our case, there is another natural point to check integrity that could also be implemented, namely at the end of any fixpoint computation.

For the last question, what, we have decided to show an error message with the corresponding predicate and the values of the tuples that cause the conflict.

For instance, for the primary key constraint violation of predicate p of Section 3.1, this message is:

-- Integrity Constraint Violation --

Duplicate primary key in predicate p, duplicate value: a

After this error message, the system continues and finishes the computation.
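The three decisions (how, when, what) can be pictured with the following Python sketch. It is a strongly simplified model and not the code of the system: facts are ground tuples without constraints, the class `Interpretation` and its methods are hypothetical names, and the nested implication of Section 3.1 is simulated by a temporary copy of the interpretation. The point it illustrates is that checking on every insertion records the violation even though the temporary interpretation is discarded.

```python
class Interpretation:
    """Growing set of ground facts with a primary-key check on every insertion."""

    def __init__(self, primary_keys, log):
        self.facts = set()
        self.primary_keys = primary_keys  # predicate name -> key positions
        self.log = log                    # shared list of violation messages

    def add(self, pred, args):
        key_pos = self.primary_keys.get(pred)
        if key_pos is not None:
            key = tuple(args[i] for i in key_pos)
            for other_pred, other_args in self.facts:
                same_key = tuple(other_args[i] for i in key_pos) == key
                if other_pred == pred and other_args != args and same_key:
                    self.log.append(
                        f"Duplicate primary key in predicate {pred}, "
                        f"duplicate value: {', '.join(key)}")
                    break
        self.facts.add((pred, args))  # report and continue, as described above

    def extended_with(self, hypotheses):
        """Temporary interpretation used to solve an implication."""
        local = Interpretation(self.primary_keys, self.log)
        local.facts = set(self.facts)
        for pred, args in hypotheses:
            local.add(pred, args)
        return local


log = []
db = Interpretation({"q": (0,)}, log)  # plays the role of :- pk(q(A,B),(A))
db.add("q", ("a", "b"))

# p :- q(a,c) => q(a,c).  The antecedent is assumed only locally.
local = db.extended_with([("q", ("a", "c"))])
if ("q", ("a", "c")) in local.facts:
    db.add("p", ())

print(sorted(db.facts))  # [('p', ()), ('q', ('a', 'b'))]
print(log)               # ['Duplicate primary key in predicate q, duplicate value: a']
```

The final interpretation contains only p and q(a,b), exactly as in the final fixpoint of Section 3.1, while the log still holds the violation found in the temporary one.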

4 The next steps

Following the line of the first approach of Section 3.1, we plan to use the expressiveness of HH¬(C) to declare user-defined integrity constraints as views. As a first easy example, referring to the student database in Section 2, we can define a user integrity constraint aimed at limiting the number of students of a course:

lp_quote_exceeded :- constr(real,count(course(X,406,M))>25).

So, an integrity constraint specifies unfeasible values rather than feasible ones. This is opposed to the usual way of specifying integrity constraints in SQL with CHECK clauses, where the positive case is specified instead of the negative one as we do. Notice also that in this example we take advantage of our constraint system implementation, which supports aggregates. Following the previous philosophy, it is straightforward to define a system directive for user-defined constraints. For this example:

:- uc(lp_quote_exceeded).

This directive would specify that the predicate lp_quote_exceeded should not occur in any fixpoint. Otherwise, an informative error message would be raised.
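The negative reading of such a constraint (it holds exactly when the database is in a forbidden state) can be sketched in Python as follows. The enrolment data below are invented for the illustration, and the function signature is an assumption; only the subject identifier 406 and the limit 25 come from the rule above.

```python
def lp_quote_exceeded(course, subject=406, limit=25):
    """True iff more than `limit` tuples of `course` refer to `subject`,
    i.e. iff the user-defined constraint is violated."""
    enrolled = sum(1 for (_student, subj, _mark) in course if subj == subject)
    return enrolled > limit


# Hypothetical data: 26 students enrolled in subject 406.
course = [(1000 + i, 406, 5.0) for i in range(26)]

print(lp_quote_exceeded(course))       # True
print(lp_quote_exceeded(course[:25]))  # False
```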

In addition, we plan to explore alternative implementations of the integrity constraints. An alternative is to keep all the information about integrity constraints separated from the fixpoint, i.e., integrity constraints do not affect the semantics of the database. They are interpreted as external observers of that fixpoint that would raise information about the semantic inconsistencies defined by integrity constraints.

Acknowledgements: This work has been partially supported by the Spanish projects STAMP (TIN2008-06622-C03-01), Prometidos-CM (S2009TIC-1465) and GPD (UCM-BSCH-GR35/10-A-910502).

References

[1] G. Aranda, S. Nieva, F. Saenz-Perez, and J. Sanchez. Implementing a Fixpoint Semantics for a Constraint Deductive Database based on Hereditary Harrop Formulas. In Proceedings of the 11th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming (PPDP’09), pages 117–128. ACM Press, 2009.

[2] Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. Datalog±: a unified approach to ontologies and integrity constraints. In ICDT ’09: Proceedings of the 12th International Conference on Database Theory, pages 14–30, New York, NY, USA, 2009. ACM.

[3] G. Aranda, S. Nieva, F. Saenz-Perez, and J. Sanchez. A prototype constraint deductive database system based on HH¬(C). In X Jornadas sobre Programacion y Lenguajes (PROLE’10), pages 189–196, 2010.

[4] Michael Gelfond and Vladimir Lifschitz. The stable model semantics for logic programming. In ICLP/SLP, pages 1070–1080. MIT Press, 1988.

[5] Robert Kowalski, Fariba Sadri, and Paul Soper. Integrity checking in deductive databases. In Proceedings of the VLDB International Conference, pages 61–69. Morgan Kaufmann Publishers, 1987.

[6] Nicola Leone, Gerald Pfeifer, Wolfgang Faber, Thomas Eiter, Georg Gottlob, Simona Perri, and Francesco Scarcello. The DLV system for knowledge representation and reasoning. ACM Trans. Comput. Log., 7(3):499–562, 2006.

[7] S. Nieva, F. Saenz-Perez, and J. Sanchez. Formalizing a Constraint Deductive Database Language based on Hereditary Harrop Formulas with Negation. In FLOPS’08, Proceedings, volume 4989 of LNCS, pages 289–304, Ise, Japan, 2008. Springer-Verlag.

[8] Raghu Ramakrishnan, Divesh Srivastava, S. Sudarshan, and Praveen Seshadri. The CORAL Deductive System. The VLDB Journal, 3:161–210, 1994.

[9] P. Z. Revesz. Introduction to Constraint Databases. Springer, 2002.

[10] Konstantinos Sagonas, Terrance Swift, and David S. Warren. XSB as an efficient deductive database engine. In SIGMOD ’94: Proceedings of the 1994 ACM SIGMOD international conference on Management of data, pages 442–453, New York, NY, USA, 1994. ACM.

[11] A. Van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. J. ACM, 38(3):619–649, 1991.

[12] Jan Wielemaker. An overview of the SWI-Prolog programming environment. In Fred Mesnard and Alexander Serebenik, editors, Proceedings of the 13th International Workshop on Logic Programming Environments, pages 1–16, 2003.


The Journal of Logic and Algebraic Programming 83 (2014) 20–52


An extended constraint deductive database: Theory and implementation

Gabriel Aranda-López ∗, Susana Nieva, Fernando Sáenz-Pérez, Jaime Sánchez-Hernández

Facultad de Informática, Complutense University of Madrid, Spain

Article info

Article history: Received 28 July 2011. Received in revised form 4 April 2013. Accepted 3 July 2013. Available online 9 July 2013.

Keywords: Deductive databases; Constraints; Hereditary Harrop formulas; Fixpoint semantics

Abstract

The scheme of Hereditary Harrop formulas with constraints, HH(C), has been proposed as a basis for constraint logic programming languages. In the same way that Datalog emerges from logic programming as a deductive database language, such formulas can support a very expressive framework for constraint deductive databases, allowing hypothetical queries and universal quantifications. As negation is needed in the database field, HH(C) is extended with negation to get HH¬(C). This work presents the theoretical foundations of HH¬(C) and an implementation that shows the viability and expressive power of the proposal. Moreover, the language is designed in a flexible way in order to support different constraint domains. The implementation includes several domain instances, and it also supports aggregates as usual in database languages. The formal semantics of the language is defined by a proof-theoretic calculus, and for the operational mechanism we use a stratified fixpoint semantics, which is proved to be sound and complete w.r.t. the former. Hypothetical queries and aggregates require a more involved stratification than the common one used in Datalog. The resulting fixpoint semantics constitutes a suitable foundation for the system implementation.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

The extension of LP (Logic Programming) with constraints gave rise to the CLP (Constraint Logic Programming) scheme [26,25]. In a similar way, the HH(C) scheme (Hereditary Harrop formulas with Constraints) [31,19] extends HH by adding constraints. In both cases, a parametric domain of constraints is assumed for which it is possible to consider different instances (such as arithmetical constraints over real numbers or finite domain constraints). The extension is completely integrated into the language: constraints are allowed to occur in goals, bodies of clauses, and answers.

As a programming language, HH(C) can still be viewed as an extension of CLP in two main aspects. On the one hand, the logic HH introduces new connectives which are not available in Horn Clause logic, such as disjunction, implication and universal quantifiers [36]. On the other hand, and following Saraswat [45], in the scheme HH(C), the notion of constraint system is established in such a way that any C satisfying certain minimal conditions can be considered as a possible instance for the scheme. In [45], as minimal conditions the language of constraints incorporates ∧ and ∃. However, particular constraint systems may include more logical symbols such as ∀ and ⇒, together with the corresponding assumptions related to their behavior. Therefore, the language of constraints itself extends the common ones used in CLP, consequently facilitating the representation of more complex constraints.

* Corresponding author. E-mail address: [email protected] (G. Aranda-López).

1567-8326/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jlap.2013.07.002


This paper extends other works [38,2] in which we investigated the use of HH(C) not as a (general purpose) programming language, but as the basis for constraint deductive database (CDDB) systems [30,42]. The motivation is that, in the same way that Datalog [51,56] and Datalog with constraints [27] arise for modeling database systems inspired by LP and CLP respectively, the language HH(C) can offer a suitable starting point for the same purpose. We show that the expressive power of HH(C) improves existing languages by enriching the mechanisms for database definition and querying with new elements that are useful and natural in practice. In particular, implications can be used to write hypothetical queries, and universal quantification allows encapsulation. The existence of constraints is exploited to represent answers and to finitely model infinite databases and answers. This is also the case of constraint databases, but the syntax of our constraints is more expressive than the one commonly used in them, as is the case of Datalog with constraints.

However, HH(C), as it was originally introduced, lacks negation which, as we will see, is needed for our proposal to be complete with respect to relational algebra (RA). We have extended HH(C) with negation, to obtain HH¬(C) to be used as a database language. We have defined a proof-theoretic semantics to provide the meaning of goals (queries) and programs (databases). This meaning is represented by answer constraints, which can be obtained using the goal-oriented rules of a sequent calculus which combines intuitionistic inference rules with deductibility in a generic constraint system. Also, a stratified fixpoint semantics has been defined and proved to be sound and complete with respect to the previous one. The motivation for introducing this new semantics takes into account several aspects:

• The fixpoint semantics provides a model for the whole database, while the proof-theoretic one (as well as top-down semantics in general) focuses only on the meaning of a query in the context of a database. In fact, the fixpoint of a database will correspond to the instance of the database. Thereby, the fixpoint semantics supplies a framework in which properties such as equivalence of databases can be easily analyzed, which gives formal support to the study of query optimization.

• In order to deal with recursion and negation, we have followed the stratified negation approach used in [51], which gives semantics to Datalog. The use of a fixpoint semantics as an operational mechanism has been adopted as a good choice in several deductive database systems as it is able to avoid the non-monotonicity inherent to negation. It then guarantees termination, as long as the constraint answer sets are finite. Further, it facilitates working with different constraint systems, relegating the problem of termination to the problem of finding intensional representations of data by the constraint system. This issue is dealt with in works such as [40], where safety conditions are imposed on the constraint system, and some particular systems are identified as satisfying such conditions. Hence, termination for any query is ensured for these systems. But identifying such particular systems is out of the scope of this work.

• Introducing negation in goals may give a database several meanings [51]. Stratified negation is one of the approaches that, by imposing syntactical restrictions, guarantees a unique model for the database: the minimal fixpoint interpretation. Moreover, stratification has been previously used as a useful resource when dealing with hypothetical queries in Datalog [9], in order to get a unique minimal model for the database, as is the case in our proposal.

• Stratification is a common technique to deal with aggregates [42], because it ensures monotonicity. Our stratified design of the fixpoint semantics has become a good framework to implement aggregates.

In order to define a stratified fixpoint semantics for HH¬(C), we have adapted the usual notion of dependency graphs to include the dependencies derived from implications inside goals, as well as those derived from aggregate functions. The fixpoint of a database is computed as a set of pairs (A, C), where A is an atom and C a constraint. The atom A can be understood as an n-ary relation instance, where its arguments are constrained by C. According to the dependency graph, predicates are classified by strata and these pairs are computed by strata. Each stratum should become saturated before trying to saturate any other higher stratum. However, as an implication may occur in a goal, the computation must take into account that the database is augmented with the hypothesis posed in the implication antecedent. From the theoretical point of view, this issue does not pose any obstacle, but it can be a drawback for a concrete implementation. In our approach, the fixpoint of the augmented database must be locally computed to solve the implication. But our proposal takes advantage of the stratification to avoid cycles during the computation. In addition, the dependency graph can be useful to reduce this computation to the part of the database that is involved in the implication.
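The classification of predicates by strata can be pictured with the following Python sketch of the textbook stratification procedure for Datalog with negation. It is only an approximation of the notion used in this paper: the additional dependencies induced by implications and aggregates are not modelled, and the function `stratify` and the encoding of dependencies as triples are assumptions made for the illustration. The predicate names are borrowed from the two-player game of Example 1.

```python
def stratify(predicates, deps):
    """Assign a stratum to each predicate.

    deps is a list of (head, body_predicate, is_negative) triples.
    A positive dependency requires stratum(head) >= stratum(body);
    a negative one requires stratum(head) > stratum(body).
    Returns a dict, or None if there is a cycle through negation.
    """
    stratum = {p: 1 for p in predicates}
    changed = True
    while changed:
        changed = False
        for head, body, negative in deps:
            required = stratum[body] + (1 if negative else 0)
            if stratum[head] < required:
                if required > len(predicates):
                    return None  # strata grow without bound: not stratifiable
                stratum[head] = required
                changed = True
    return stratum


# winning(x) <= move(x,y) and not winning(y): negative cycle on winning.
original = stratify(
    ["move", "winning"],
    [("winning", "move", False), ("winning", "winning", True)],
)

# The rewritten program with canMove, oddMove and possibleWinning.
rewritten = stratify(
    ["move", "canMove", "oddMove", "possibleWinning", "winning"],
    [
        ("canMove", "move", False),
        ("oddMove", "move", False),
        ("oddMove", "oddMove", False),
        ("possibleWinning", "oddMove", False),
        ("possibleWinning", "canMove", True),
        ("winning", "move", False),
        ("winning", "possibleWinning", True),
    ],
)

print(original)   # None
print(rewritten)  # move, canMove, oddMove: 1; possibleWinning: 2; winning: 3
```

A bottom-up evaluation would then saturate stratum 1 before stratum 2, and stratum 2 before stratum 3.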

The fixpoint semantics provides support for a concrete database system. We have implemented a Prolog prototype very close to the underlying theory as a proof-of-concept. Essentially, it incorporates all the features introduced in the paper, and moreover it supports aggregate functions. Two main components can be distinguished in the implementation of this database language. One corresponds to the implementation of the fixpoint semantics, which is independent of the concrete constraint system. The other component corresponds to the implementation of the constraint system. We have considered and implemented solvers for the following constraint systems: Boolean, Reals, and Finite Domains, as instances of C in the scheme HH¬(C). The implementation is designed in such a way that more than one constraint system can be used within the same database. We have designed a type system for identifying the constraint system each constraint in a database belongs to.

1.1. Related work

It is well known that negation in logic programming is a difficult issue and there has been a great amount of work about it [1], and also in constraint logic programming [46] and deductive databases [8], for a long time. A first bag of issues comes from deriving incorrect answers or failing to derive others in the presence of classical negation. Several problems arise in this setting, such as the unsoundness of SLDNF, which is worked around by delaying the selection of a negative goal until it becomes ground [34]. However, this introduces another problem: floundering [5], which is avoided in the field of deductive databases with safety conditions [51]. Safety conditions have also been applied to constraint deductive databases [7,40,41] for particular constraint systems, enabling the development of closed-form constraints as answers. The closed-form evaluation requirement guarantees that it is possible for the query solver to calculate intensional forms for the answer. In constraint databases, where (non-ground) intensional data are managed, constructive negation [29,14,16] can be used instead of such safety conditions. Its rationale lies in allowing non-ground negative goals, which can construct answers by involving constraints on goal variables, therefore avoiding floundering. This is the very foundation of constraint databases [42], where an answer to a negated goal is a set of constraints. This idea was used in CLP [46] as an approach to constructive negation to avoid floundering. Our work can also be seen from this perspective because the answer to a negated goal is also constructive. Nevertheless, the proposal of [46] is based on classical logic, while our approach is based on intuitionistic logic including implication and universal quantification.

A second bag of issues comes from assigning a model to a program valid to the user under an intended semantics. As mentioned before, stratified negation guarantees that only one minimal model can be assigned to a program [51]. Other approaches are based on non-monotonic logic, such as inflationary semantics [13], which is also based on a two-valued logic and assigns one model to a program. Its drawback is that, in general, that model is not a minimal one and inflationary model semantics does not always meet the intended semantics. The following approaches are based on three-valued logic and in general produce several outcome models:

Gelfond and Lifschitz proposed Stable Models [21], a declarative semantics for logic programs with negation, based on autoepistemic logic. Another related approach is the Well-Founded semantics defined by Van Gelder et al. [52], where the main idea is the notion of unfounded set, which provides the basis for obtaining negative information in the semantics. Answer Set Programming (ASP) is a form of declarative programming oriented towards difficult combinatorial search problems [32,20]. It is based on the work about Stable Models, and fast propositional solvers are used as the computational mechanism for inference. Its key idea is to use ground instances of programs. Our scheme HH¬(C) is designed to work with any generic constraint language LC, and the use of ground instances would impose a serious limitation on the constraint systems allowed as instances of the scheme. This technique would be adequate for a Herbrand constraint system, but it is unfeasible for more sophisticated constraints. Notice that HH¬(C) constraint systems work with intensional representations that are able to be more general than a simple equality. For example, constraints over real numbers should be excluded as a possible instance, as no grounding is possible (at least in a straightforward way), and it would be a serious drawback for our proposal. Although ASP includes constraints, they are viewed as integrity constraints which discard models in the answer rather than syntactic objects which can be dealt with as first-class constructions à la CLP. In addition, neither implications nor quantifiers are supported. It might be expected that, when introducing the implication, which dynamically produces an increase of the database, in a system based on the approaches just mentioned, additional computations would be necessary, as happens in our system.

Although stratification imposes certain syntactic restrictions on the language, these conditions are usually adopted in deductive database systems, such as Datalog, for practical reasons. Other works add different syntactic conditions, such as [4], where the notion of guarded negation is adapted to database queries both for SQL and Datalog languages in order to improve performance. In the case of Datalog, guarded negation introduces an additional syntactical condition to stratification. Although it is computationally well-behaved and subsumes several well-known query languages such as unions of conjunctive queries, monadic Datalog and frontier-guarded tuple-generating dependencies, it is actually subsumed by stratified Datalog. As HH¬(C), in turn, subsumes stratified Datalog, it also subsumes Datalog with guarded negation.

In addition, the syntactic limitations associated with the stratification approach can be overcome in practical situations of potentially non-stratifiable programs, which can be modeled by equivalent stratifiable databases. We illustrate this point with an example shown in [21] and also discussed in [52] as a classical example of a non-stratifiable program, which can be tackled both with the Well-Founded semantics and Stable Models.

Example 1. This example shows a general scheme for a two-player game with a finite space of states. This scheme allows determining the winning states of the game (those that guarantee the victory for the player in turn) by means of one single clause reflecting a simple idea: one wins if the opponent cannot win because it cannot move. The clause is:

∀x∀y winning(x) ⇐ move(x, y) ∧ ¬winning(y),

where move is defined for the concrete game. This program is not stratifiable due to the negative cycle in winning. Nevertheless, it is easy to see that the winning strategy is based on the parity of the number of movements. Using that fact, it is straightforward to encode this strategy as follows:

∀x∀y canMove(x) ⇐ move(x, y),

∀x∀y possibleWinning(x) ⇐ oddMove(x, y) ∧ ¬canMove(y),

∀x∀y winning(x) ⇐ move(x, y) ∧ ¬possibleWinning(y).



Here oddMove(x, y) represents a change from state x to state y in an odd number of movements and it is defined as:

∀x∀y oddMove(x, y) ⇐ move(x, y),

∀x∀y∀z1∀z2 oddMove(x, y) ⇐ move(x, z1) ∧ move(z1, z2) ∧ oddMove(z2, y).

This program represents a stratifiable database which is suitable for HH¬(C), and even for Datalog. The definition of the predicate oddMove can be simplified in HH¬(C), making use of constraints. In addition, our language combines implication, negation and recursion, so more complex queries can be formulated as, for instance:

move(a,b) ⇒ winning(x),

that asks for the winning positions, assuming the existence of the new movement move(a,b). □

One of the main advantages of our language is the use of hypothetical queries, an unusual feature in a database language. However, there exist some works such as Hypothetical Datalog [9,10], which is an extension of Horn-clause logic allowing hypothetical queries in a similar way to our proposal. Queries are allowed to include the local (hypothetical) addition and deletion of tuples from the database. In both additions (A ← B [add : C]) and deletions (A ← B [del : C]), the atomic term C is temporarily added or deleted from the extensional database in order to solve the query B, which can be understood as a dynamic modification of the database, similarly to the HH¬(C) behavior. Nevertheless, this is a very restrictive form of hypothetical queries, which implies a great simplification of the approach, as HH¬(C) allows complete clauses as the antecedent in hypothetical queries, i.e., it allows the intensional database to be changed dynamically (which corresponds to the formalization within intuitionistic logic). For instance, in the previous example, the query move(a,b) ⇒ winning(x) can also be formulated in Hypothetical Datalog. But it is not possible to assume a more general rule for the predicate move, which defines the possible movements of the game. In contrast, for instance, the formula ∀x∀y(2 ∗ x ≈ y ⇒ move(x, y)) ⇒ winning(z) is a valid query in HH¬(C). Moreover, when comparing Hypothetical Datalog and its theoretical foundations to HH¬(C), we must emphasize that our logic combines connectives and constraints. In [15] another approach for hypothetical queries, also including positive and negative hypotheses, is proposed, but neither negation nor implications are allowed in clauses.
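The query move(a,b) ⇒ winning(x) of Example 1 can be mimicked with the following Python sketch, which evaluates the stratifiable version of the game program over a finite move relation and then re-evaluates it over a temporary copy extended with the hypothesis. The concrete moves between b, c and d are invented for the illustration, and all function names are hypothetical. Only the simple case of a ground fact as hypothesis is covered; a rule in the antecedent, as in the last formula above, is beyond this sketch.

```python
def odd_move(move):
    """Pairs (x, y) such that y is reachable from x in an odd number of moves."""
    odd = set(move)
    while True:
        new = {
            (x, y)
            for (x, z1) in move
            for (w1, z2) in move if w1 == z1
            for (w2, y) in odd if w2 == z2
        } - odd
        if not new:
            return odd
        odd |= new


def winning(move):
    """Evaluate canMove, oddMove, possibleWinning and winning stratum by stratum."""
    can_move = {x for (x, _y) in move}
    possible_winning = {x for (x, y) in odd_move(move) if y not in can_move}
    return {x for (x, y) in move if y not in possible_winning}


def hypothetically(move, extra_moves):
    """Solve winning(x) under the assumption of extra move facts.
    The original relation is left untouched."""
    return winning(set(move) | set(extra_moves))


base = {("b", "c"), ("c", "d")}  # hypothetical game: b -> c -> d
print(sorted(winning(base)))                         # ['c']
print(sorted(hypothetically(base, [("a", "b")])))    # ['a', 'c']
print(sorted(winning(base)))                         # ['c'] (hypothesis discarded)
```

The third line of output shows that the assumption is local: once the implication has been solved, the database is the same as before.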

Previously, we pointed out the difficulty of dealing with negation in logic programming. The semantics of negation is even more complex in the presence of implication in goals, as is the case of Hypothetical Datalog. Additional obstacles arise when considering the logic HH, due to the inclusion of universal quantifiers. There exist several proposals aimed at introducing negation in HH (see for instance [24,37,17]). [17] is the first work introducing negation as failure into N-Prolog. In [37] a natural calculus is proposed for HH with negation. [24] is closer to our approach in the sense that a sequent calculus as well as a fixpoint semantics is defined for this logic. Instead of stratification, in order to preserve monotonicity, two forcing relations (positive and negative) are introduced, and in addition completely and incompletely defined predicates must be distinguished. In practice, these issues lead to a hard computation of the fixpoint in a concrete implementation.

With respect to constraint database systems, during the last couple of decades constraint databases have received much attention because they are especially suited for applications subject to geometric interpretation, notably in geographic information systems (GIS), spatial databases and spatio-temporal databases [22,44,42,30]. Current trends are oriented towards developing efficient operators and indexing of geometric data. Specific academic systems include MLPQ/GIS [28], DISCO [11] and PReSTO [43,12]. Output from this research was transferred to commercial systems and nowadays several database vendors offer constraint-based databases, such as IBM DB2, MS SQL Server, Oracle and PostgreSQL, mainly for GIS applications. However, as in relational algebra, negation only occurs in difference operators, so that negated predicates cannot be queried, as we do allow. Negation in the specific field of CDDB systems has also been studied [23,42]. In [23], different uses of negation are identified, and the most general one requires delaying until constraints become ground, which is a severe restriction. The treatment of negation in [42] corresponds to stratified Datalog for safe constraint queries. In our language, negation is even more complex due to the presence of implication and universal quantification in goals.

1.2. Contributions and relationship to prior work

In this paper we give a complete picture of the HH¬(C) language as an expressive constraint deductive database language, including its theoretical foundation, as well as the description of a prototype system based on it. The paper provides an integrating view of previous works related to this language, [38,2], and extends them in several aspects:

• In [38] the programming scheme HH¬(C) was formalized by defining a proof-theoretic semantics and a stratified fixpoint semantics that is sound and complete w.r.t. the former. Here, we show in detail these two formalizations which support the scheme, emphasizing the purpose of the fixpoint semantics as an operational semantics that guides the implementation. In addition, we include full proofs for the equivalence results.

• In [2] we presented a first Prolog prototype implementing that scheme, based on the fixpoint semantics, that is independent of the particular constraint system. This prototype has been enhanced with different improvements. In this paper we show a detailed description of the improved HH¬(C) system, the way in which technical problems have been solved, and some selected examples evincing the usefulness of the scheme. The most relevant features added to the system in the present work are:

– We have investigated how to incorporate aggregate functions into the language. The most usual aggregate operations are integrated in the language as part of the constraint system, in such a way that the computation of aggregate functions is delegated to the constraint solver. This is a non-trivial issue in our context that has been solved by taking advantage of our stratification and dependency graph notions. With this aim, the dependency graph specification has been enriched by adding dependencies due to aggregate operations.

– We have also improved the constraint systems, which are now able to deal with a limited form of constraint combination. Specifically, the system can deal with more complex constraints because the generic interface of the constraint solvers can identify constraints belonging to different domains, split them, and send each part of the initial constraint to its corresponding domain solver.

– We have improved the computation of queries, avoiding the recomputation of the whole database from scratch in cases where stratification changes.

– Finally, we have designed a better and more complete user interface, also enhancing its performance.

1.3. Organization of the paper

The rest of the paper is organized as follows. In Section 2, HH¬(C) is informally presented as a query language by means of examples. In Section 3, the syntax of the language is formally introduced, and concrete examples illustrating its potential and expressiveness are shown. In Section 4, the proof-theoretic semantics for HH¬(C) is defined as a foundation for the scheme. In Section 5, a fixpoint semantics, based on the notion of stratification, is presented and proved to be sound and complete with respect to the proof-theoretic semantics. Section 6 gives a user-oriented description of the current system and the aggregate functions, and outlines the computation stages of the system. Section 7 is devoted to the description of constraint domains and the implementation of the corresponding solvers. Aggregate functions are introduced as a part of the constraint language. Section 8 is an overview of the fixpoint semantics implementation, with emphasis on the implementation of the forcing relation which supports the semantics of HH¬(C). In particular, the difficulties that have been overcome to implement the forcing of the implication are explained. Section 9 summarizes some conclusions and sketches future work. Appendix A includes the full proofs for the results of Section 4. Appendix B describes a form of the dependency graph needed to implement the forcing of the implication and constraints including aggregates. Finally, Appendix C provides the implementation of the constraint systems, and Appendix D includes implementation details for the forcing relation.

2. HH(C) as a database language: A first glance

This section informally presents HH(C) as a suitable language for constraint databases. In our system, a database is a logic program: a set of clauses. Facts (ground atoms) define the extensional database, and clauses with a body define the intensional database. The latter can be seen as the definition of views in relational databases. The evaluation of a query with respect to a deductive database can be seen as the computation of a goal (query) from a program (database), and the answer is a constraint.

2.1. Relations and predicates

The relational model deals with (finite) relations which can be defined both extensionally, as sets of tuples, and intensionally, by means of views. A relation has a name, an arity, and its meaning may be understood as a set of tuples. Likewise, a predicate has a name, an arity, and its meaning can be understood as a set of constraints over its arguments. Predicates can also be defined both extensionally, by means of facts, and intensionally, by means of clauses.

Example 2. Fig. 1 defines some relations extensionally (client and mortgageQuote), and intensionally (accounting), both in RA and in HH(C). Extensional relations are defined as tables in the relational model in (a) and as extensional predicates in HH(C) (facts of (c)). The relation accounting is defined as a view in the relational model in (b), and as an intensional predicate in HH(C) (clauses with a non-empty right-hand side of (c)).

In RA the result of computing the view accounting is the relation:

accounting

name      salary  quote
brown     1500    400
mcandrew  3000    100
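The computation of the view can be sketched in Python as follows. This is only an illustrative reading of the relational expression of Fig. 1(b), assuming relations are plain lists of tuples; the function name `accounting` and the variable names are ours, not part of any system described here.

```python
# Extensional relations of Fig. 1(a), as lists of tuples.
client = [("smith", 2000, 1200), ("brown", 1000, 1500), ("mcandrew", 5300, 3000)]
mortgage_quote = [("brown", 400), ("mcandrew", 100)]


def accounting(client, mortgage_quote):
    """Join on name, select quote >= 100, project onto (name, salary, quote)."""
    return [
        (name, salary, quote)
        for (name, _balance, salary) in client
        for (quote_name, quote) in mortgage_quote
        if name == quote_name and quote >= 100
    ]


result = accounting(client, mortgage_quote)
```

Under these assumptions, `result` contains exactly the two tuples of the table above.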

In HH(C) this query corresponds to the computation of the goal accounting(n, s, q), whose answer is (n ≈ brown ∧ s ≈ 1500 ∧ q ≈ 400) ∨ (n ≈ mcandrew ∧ s ≈ 3000 ∧ q ≈ 100). This introductory example shows some relations that could also be computed in Datalog with constraints [27], but it will be extended later in Example 6 with some new relations that exceed the capabilities of such a system. □

(a) Relations extensionally defined as relational tables:

client

name      balance  salary
smith     2000     1200
brown     1000     1500
mcandrew  5300     3000

mortgageQuote

name      quote
brown     400
mcandrew  100

(b) A relation defined as a relational view:

accounting ← π_{name,salary,quote}(σ_{quote≥100}(client ⋈ mortgageQuote))

(c) The above relations defined as an HH(C) program:

client(smith, 2000, 1200).
client(brown, 1000, 1500).
client(mcandrew, 5300, 3000).

mortgageQuote(brown, 400).
mortgageQuote(mcandrew, 100).

∀name ∀salary ∀quote ∀balance (accounting(name, salary, quote) ⇐ client(name, balance, salary) ∧ mortgageQuote(name, quote) ∧ quote ≥ 100).

Fig. 1. Relations vs. HH(C) predicates.

2.2. Infinite data as finite representations

One of the advantages of using constraints in the (general) context of LP is that they provide a natural way for dealing with infinite data collections using finite (intensional) representations. Constraint databases [30] have inherited this feature. We illustrate this point with the following example.

Example 3. Assume the instance HH(R), i.e., the domain of arithmetic constraints over real numbers. We are interested in describing regions in the plane. A region is a set of points identified by its characteristic function (a Boolean function which evaluates to true over the points of such a region, and to false over the rest of the points of the plane). For example, a rectangle can be determined by its bottom-left corner (x1, y1) and its top-right corner (x2, y2), and its characteristic function can be expressed by the next clause:

∀rectangle(x1, y1, x2, y2, x, y) ⇐ x ≥ x1 ∧ x ≤ x2 ∧ y ≥ y1 ∧ y ≤ y2,

where ∀ represents the universal closure of a formula. Analogously, ∃ will represent the existential closure.

Notice that a rectangle contains (in general) an infinite set of points and they are finitely represented in an easy way by means of real constraints. From a database perspective, this is a very interesting feature: databases were conceived to work with finite pieces of information, but introducing constraints makes it possible to manage (potentially) infinite sets of data.

The goal rectangle(0, 0, 4, 4, x, y) ∧ rectangle(1, 1, 5, 5, x, y) computes the intersection of two rectangles, and an answer can be represented by the constraint:

(x ≥ 1) ∧ (x ≤ 4) ∧ (y ≥ 1) ∧ (y ≤ 4).
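The way a conjunction of bound constraints collapses into a single answer can be sketched in Python. This is a minimal illustration, assuming that a rectangle constraint is stored as four bounds; the class `Box` and the function `intersect` are hypothetical names introduced here and do not reflect how an actual constraint solver works.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Closed rectangle: x_min <= x <= x_max and y_min <= y <= y_max."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Conjoin two rectangle constraints; None stands for an unsatisfiable result."""
    box = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
              min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    if box.x_min > box.x_max or box.y_min > box.y_max:
        return None
    return box


answer = intersect(Box(0, 0, 4, 4), Box(1, 1, 5, 5))
```

Here `answer` holds the bounds 1 ≤ x ≤ 4 and 1 ≤ y ≤ 4, matching the constraint above.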

A circle can be defined by its center and radius, using non-linear constraints now:

∀circle(xc, yc, r, x, y) ⇐ (x − xc)² + (y − yc)² ≤ r².

We can ask, for instance, whether any pair (x, y) such that x² + y² = 1 (the circumference centered at the origin with radius 1) is inside the circle with center (0, 0) and radius 2 by means of the goal:

∀(x² + y² ≈ 1 ⇒ circle(0, 0, 2, x, y)).

This goal is not expressible in standard deductive database languages because, in addition to constraints, it involves universal quantifiers and implication. Even Hypothetical Datalog cannot deal with this goal due to the universal quantifiers. These components constitute a big step in expressivity. □

Since HH(C) does not support negation, it is still not complete w.r.t. RA, as we show next.



• Projection. E = π_{i1,...,ik}(E1)

∀e(xi1, ..., xik) ⇐ e1(x1, ..., xn).

• Selection. E = σ_{t1 θ t2}(E1)

∀e(x1, ..., xn) ⇐ e1(x1, ..., xn) ∧ Cθ.

• Cartesian product. E = E1 × E2

∀e(x1, ..., xn, xn+1, ..., xm) ⇐ e1(x1, ..., xn) ∧ e2(xn+1, ..., xm).

• Set union. E = E1 ∪ E2

∀e(x1, ..., xn) ⇐ e1(x1, ..., xn) ∨ e2(x1, ..., xn).

• Set difference. E = E1 − E2

∀e(x1, ..., xn) ⇐ e1(x1, ..., xn) ∧ ¬e2(x1, ..., xn).

E and Ei are relational expressions, represented by the predicates e and ei, respectively. Cθ is the constraint corresponding to the condition t1 θ t2.

Fig. 2. Relational operators as HH¬(C) programs.

Fig. 3. Regions in the plane.

2.3. Need for negation

What a database user might want is to have the basic relational operations available in this language. In fact, a database language is complete w.r.t. RA if these operations can be expressed within the language. As shown in Fig. 2, HH(C) can express projection, Cartesian product, union, and selection. For the last one, it is required that the constraint system C incorporates (or can express) the operators θ, in order to build the corresponding constraint Cθ. For instance, σ_{$i ≤ $j} corresponds to xi ≤ xj. As we will see in Section 3.1, any constraint system in our scheme satisfies this requirement. But expressing set difference needs some kind of negation. So we have added the connective ¬ to HH(C), obtaining a complete database language which will be formalized in the next section. There are some other situations, besides relational database requirements, in which negation is needed.
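For finite relations, the five operators of Fig. 2 can be sketched in Python over sets of tuples. This is only an illustration of the operators themselves, assuming 0-based column positions; the function names are ours, and the sketch says nothing about how HH¬(C) evaluates the corresponding clauses.

```python
def project(rel, positions):
    """Projection: keep only the columns at the given 0-based positions."""
    return {tuple(t[i] for i in positions) for t in rel}


def select(rel, condition):
    """Selection: keep the tuples satisfying the condition."""
    return {t for t in rel if condition(t)}


def product(rel1, rel2):
    """Cartesian product: concatenate every pair of tuples."""
    return {t1 + t2 for t1 in rel1 for t2 in rel2}


def union(rel1, rel2):
    return rel1 | rel2


def difference(rel1, rel2):
    """Set difference: the operator that requires negation in the logic setting."""
    return rel1 - rel2


client = {("smith", 2000, 1200), ("brown", 1000, 1500), ("mcandrew", 5300, 3000)}
mortgage_quote = {("brown", 400), ("mcandrew", 100)}

# Names of clients without a mortgage quote, which needs set difference.
no_quote = difference(project(client, [0]), project(mortgage_quote, [0]))
```

With the tables of Fig. 1, `no_quote` contains only the tuple for smith.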

Example 4. Returning to Example 3, we define the dashed frame depicted in Fig. 3 by the inner region of a large rectangle and the outer region of a small rectangle with the goal

rectangle(0,0,4,4, x, y) ∧ ¬rectangle(1,1,3,3, x, y),

and an answer can be represented by the constraint:

(y > 3 ∧ y ≤ 4 ∧ x ≥ 0 ∧ x ≤ 4) ∨ (y ≥ 0 ∧ y < 1 ∧ x ≥ 0 ∧ x ≤ 4) ∨ (y ≥ 0 ∧ y ≤ 4 ∧ x > 3 ∧ x ≤ 4) ∨ (y ≥ 0 ∧ y ≤ 4 ∧ x ≥ 0 ∧ x < 1).
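As a sanity check, the goal and the answer constraint can be compared pointwise in Python. This is a numerical spot check on a grid of sample points, not a proof of equivalence and not the way a constraint solver would handle negation; the function names are introduced here for illustration.

```python
def in_rectangle(x1, y1, x2, y2, x, y):
    """Characteristic function of the rectangle with corners (x1, y1) and (x2, y2)."""
    return x1 <= x <= x2 and y1 <= y <= y2


def goal(x, y):
    """rectangle(0,0,4,4,x,y) and not rectangle(1,1,3,3,x,y)."""
    return in_rectangle(0, 0, 4, 4, x, y) and not in_rectangle(1, 1, 3, 3, x, y)


def answer(x, y):
    """The four-way disjunction given as answer constraint."""
    return ((3 < y <= 4 and 0 <= x <= 4) or
            (0 <= y < 1 and 0 <= x <= 4) or
            (0 <= y <= 4 and 3 < x <= 4) or
            (0 <= y <= 4 and 0 <= x < 1))


# Sample points from -1.0 to 5.0 in steps of 0.25 on both axes.
grid = [i / 4 for i in range(-4, 21)]
agree = all(goal(x, y) == answer(x, y) for x in grid for y in grid)
```

On this grid the two formulations agree at every sampled point.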

In this example, we assume that negation can be effectively handled by the constraint solver, an issue addressed later in this paper. □

3. The language HH¬(C)

The formalisms on which HH(C) is founded [31,19] do not support any kind of negation, so the language is not expressive enough in the field of database systems. We have extended HH(C) with negation to obtain a CDDB language which is complete w.r.t. RA. In this section, we make precise the syntax of the formulas of HH(C) extended with negation, denoted by HH¬(C); next we introduce more database examples.

3.1. Syntax

As usual, to build the syntactic objects of the logic, we consider a set of variables and a signature containing:



• defined predicate symbols, representing the names of database relations, to build atoms,

• non-defined (built-in) predicate symbols, including at least the comparison ≤ and the equality predicate symbol ≈, to build atomic constraints, and

• constant and operator symbols, which depend on the particular constraint system, to build terms.

Since our interest is to represent databases, we only use finite signatures.

Well-formed formulas in HH¬(C) can be classified into clauses D (defining database relations) and goals (or queries) G. They are recursively defined by the following rules:

D ::= A | G ⇒ A | D1 ∧ D2 | ∀xD,

G ::= A | ¬A | C | G1 ∧ G2 | G1 ∨ G2 | D ⇒ G | C ⇒ G | ∃xG | ∀xG.

A represents an atom, i.e., a formula of the form p(t1, ..., tn), where p is a defined predicate symbol of arity n, and the ti are terms; C represents a constraint. The incorporation of negated atoms in goals is the addition to HH(C). Negation is not allowed in the head of a clause, only inside its body.
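The two grammar rules can be read as a pair of mutually recursive well-formedness checks. The following Python sketch is one possible encoding, assuming a small abstract syntax of our own (the class names `Atom`, `Constraint`, `Not`, `And`, `Or`, `Implies`, `Forall`, `Exists` are hypothetical and constraints are kept as opaque strings).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Constraint:
    text: str  # opaque constraint, e.g. "t > 1.5"


@dataclass(frozen=True)
class Not:
    body: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Implies:
    antecedent: object
    consequent: object


@dataclass(frozen=True)
class Forall:
    var: str
    body: object


@dataclass(frozen=True)
class Exists:
    var: str
    body: object


def is_clause(f) -> bool:
    """D ::= A | G => A | D1 and D2 | forall x D"""
    if isinstance(f, Atom):
        return True
    if isinstance(f, Implies):
        return is_goal(f.antecedent) and isinstance(f.consequent, Atom)
    if isinstance(f, And):
        return is_clause(f.left) and is_clause(f.right)
    if isinstance(f, Forall):
        return is_clause(f.body)
    return False


def is_goal(f) -> bool:
    """G ::= A | not A | C | G1 and G2 | G1 or G2 | D => G | C => G | exists x G | forall x G"""
    if isinstance(f, (Atom, Constraint)):
        return True
    if isinstance(f, Not):
        return isinstance(f.body, Atom)  # only atoms may be negated
    if isinstance(f, (And, Or)):
        return is_goal(f.left) and is_goal(f.right)
    if isinstance(f, Implies):
        ant = f.antecedent
        return (is_clause(ant) or isinstance(ant, Constraint)) and is_goal(f.consequent)
    if isinstance(f, (Exists, Forall)):
        return is_goal(f.body)
    return False
```

For instance, a universally quantified implication whose consequent is existentially quantified is accepted by `is_goal` but rejected by `is_clause`, and a clause with a negated head is rejected by `is_clause`.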

3.1.1. The constraint system C

The constraints we consider belong to a generic system C = ⟨L_C, ⊢_C⟩, where L_C is the constraint language and ⊢_C is a binary entailment relation. Γ ⊢_C C denotes that the constraint C is inferred in the constraint system C from the set of constraints Γ. Some minimal conditions are imposed on C for it to be a constraint system:

• L_C contains at least every first-order formula built up using:

– ⊤ (true), ⊥ (false),

– built-in predicate symbols,

– the connectives ∧, ¬, and the existential quantifier ∃.

• Regarding ⊢_C:

– It includes logical inference rules for the considered connectives and quantifiers.

– It is compact, i.e., Γ ⊢_C C implies that there exists a finite set Γ′ ⊆ Γ such that Γ′ ⊢_C C.

– It is closed under substitution, i.e., Γ ⊢_C C implies Γσ ⊢_C Cσ for every substitution σ.

Let us remark that C is required to deal with negation, because the incorporation of the connective ¬ into the language HH creates the need for incorporating negation in the constraint system, which has the responsibility of checking the satisfiability of answers in the constraint domain.

We say that a constraint C is C-satisfiable if ∅ ⊢_C ∃C, where ∃C stands for the existential closure of C. C and C′ are C-equivalent if C ⊢_C C′ and C′ ⊢_C C.

The constraint systems of the previous examples satisfy the minimal conditions required above. Moreover, they also include the connective ∨ (as usual), constants to represent numbers and names, arithmetical operators, and more built-in predicates (>, ≥, ...).

For instance, for the constraint system R of Real-closed Fields, L_R is a first-order language with all classical logical connectives including negation, and Γ ⊢_R C holds when Ax_R ∪ Γ ⊢_≈ C, where Ax_R is Tarski's axiomatization of the real numbers, and ⊢_≈ is the entailment relation of classical logic with equality. An example of a concrete constraint is ¬(x ≈ 0.2), also written as x ≉ 0.2 for the sake of simplicity.

It is easy to see that every formula allowed in the selection operation of RA can be expressed with an equivalent constraint, since for any C, the language L_C contains the built-in predicates ≤ and ≈, and the connectives ∧ and ¬.

3.1.2. HH¬(C) programs

Programs, denoted by Δ, are sets of clauses and represent databases. As usual in Logic Programming, they can still be viewed as sets of implicative formulas with atomic head, in the way we make precise now. The elaboration of a program Δ is the set elab(Δ) = ⋃_{D∈Δ} elab(D), where elab(D) is defined by:

• elab(A) = {⊤ ⇒ A},

• elab(G ⇒ A) = {G ⇒ A},

• elab(∀xD) = {∀xD′ | D′ ∈ elab(D)},

• elab(D1 ∧ D2) = elab(D1) ∪ elab(D2).
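The four cases of elab translate directly into a recursive function. The Python sketch below assumes a tuple encoding of clauses chosen only for this illustration (`("atom", text)`, `("imp", goal, atom)`, `("and", d1, d2)`, `("forall", var, d)`), and returns a list where the definition uses a set.

```python
TOP = ("top",)  # the constraint "true"


def elab(d):
    """Elaborate a clause into a list of clauses of the form forall x... (G => A)."""
    tag = d[0]
    if tag == "atom":
        return [("imp", TOP, d)]
    if tag == "imp":
        return [d]
    if tag == "forall":
        _, var, body = d
        return [("forall", var, e) for e in elab(body)]
    if tag == "and":
        _, d1, d2 = d
        return elab(d1) + elab(d2)
    raise ValueError("not a clause: %r" % (d,))


def elab_program(program):
    """Elaboration of a program: the union of the elaborations of its clauses."""
    return [e for d in program for e in elab(d)]


# forall x (delay(par, x, 1) and delay(mad, par, 0.5))
d1 = ("atom", "delay(par, x, 1)")
d2 = ("atom", "delay(mad, par, 0.5)")
elaborated = elab(("forall", "x", ("and", d1, d2)))
```

In this example the quantified conjunction is split into two elaborated clauses, each of them universally quantified and with ⊤ as its body.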

So, elaborated clauses are formulas of the form ∀x1 ... ∀xn(G ⇒ A) (or simply ∀x(G ⇒ A)), but notice that clauses inside G are not required to be elaborated. The use of the elaborated form, instead of general HH¬(C) clauses, has some practical benefits:

• It permits specifying a database view that defines a predicate (database relation) p by means of a set of elaborated clauses whose heads are atoms beginning with the predicate symbol p, as is done in logic programs with Horn clauses.

• It permits defining a calculus governing HH¬(C) without rules introducing connectives on the left, providing the uniformity property of the calculus, which guarantees completeness of goal-oriented search for proofs. Notice that in the calculus UC¬ (introduced in Section 4) there is only the rule (Clause) to deal with atomic goals, which corresponds to the SLD-resolution rule of logic programming.

• It facilitates the formalization of the fixpoint operator used to define the fixpoint semantics introduced in Section 5.2.

For convenience, we will also use the common notation ∀x1 . . .∀xn(A ⇐ G) as in previous examples.

3.2. Examples of HH¬(C)

Once we have formalized the syntax of our language, we introduce more examples showing the advantages of our proposal w.r.t. other common database languages. As an important benefit of our approach, we stress the ability to formulate hypothetical and universally quantified queries. In addition, variables can be explicitly and existentially quantified in queries, avoiding the computation of an explicit answer for these variables.

In these examples, the instance HH¬(FR) is used, where FR is a hybrid constraint system which combines constraints over finite domains and over the real numbers, ensuring domain independence. Instantiating the scheme with mixed constraint systems will be very useful in the context of databases. In [18], a hybrid constraint system subsuming FR is presented.

Example 5. Consider the following travel database. The predicate flight(Origin, Destination, Time) represents an extensional database relation of direct flights from Origin to Destination with duration Time:

flight(mad,par,1.5),

flight(par,ny,10),

flight(london,ny,9).

In turn, travel(Origin, Destination, Time) represents an intensional database relation, expressing that it is possible to travel from Origin to Destination in a time greater than or equal to Time, possibly concatenating some flights:

∀travel(x, y, t) ⇐ flight(x, y, w) ∧ t ≥ w,

∀travel(x, y, t) ⇐ flight(x, z, t1) ∧ travel(z, y, t2) ∧ t ≥ t1 + t2.

The next goal asks for the duration of a flight from Madrid to London in order to be able to travel from Madrid to New York in 11 hours at most:

flight(mad, london, t) ⇒ travel(mad,ny,11).

The answer constraint of this query will be 11 ≥ t + 9, which is FR-equivalent to the final answer t ≤ 2.

Another hypothetical query to the previous database asks whether it is possible to travel from Madrid to some place in any time greater than 1.5. The goal ∀t(t > 1.5 ⇒ ∃y travel(mad, y, t)) also includes universal quantification, and the corresponding answer is ⊤.
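The arithmetic behind the answer t ≤ 2 can be reproduced with a small Python sketch. It assumes that travel(x, y, t) holds exactly when t is at least the duration of the shortest chain of flights from x to y; the function `min_travel_time` is a hypothetical helper written for this illustration and plays no role in the actual system, which returns the constraint directly.

```python
def min_travel_time(flights, origin, dest, visited=frozenset()):
    """Smallest t such that travel(origin, dest, t) holds, or None if dest is unreachable."""
    best = None
    seen = visited | {origin}
    for (o, d, w) in flights:
        if o != origin or d in seen:
            continue  # not an outgoing flight, or it would revisit a city
        if d == dest:
            candidate = w
        else:
            rest = min_travel_time(flights, d, dest, seen)
            candidate = None if rest is None else w + rest
        if candidate is not None and (best is None or candidate < best):
            best = candidate
    return best


flights = [("mad", "par", 1.5), ("par", "ny", 10), ("london", "ny", 9)]

# Without the hypothesis, the best Madrid to New York time is 1.5 + 10.
base = min_travel_time(flights, "mad", "ny")

# With flight(mad, london, t) assumed, the new route takes t + 9, so the
# 11-hour limit is met exactly when t <= 11 - 9.
bound = 11 - min_travel_time(flights, "london", "ny")
```

Since `base` exceeds 11, the limit can only be met through the hypothetical flight, and `bound` gives the largest admissible duration.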

We can also compare HH¬(C) to relational calculus (whose underlying logic is richer than the one used in implemented CDDB languages). For instance, the query ¬(∃t flight(x, y, t)) ∧ x ≉ y, or its equivalent (∀t ¬flight(x, y, t)) ∧ x ≉ y, which represents the cities in the database that have no direct flights between them, is not safe in the domain relational calculus, because it contains a negated formula whose free variables are not limited. This problem is avoided in our system because formulas are interpreted in the context of the constraint domain of the particular instance and no test for this kind of safety is needed. In fact, (∀t ¬flight(x, y, t)) ∧ x ≉ y represents a valid HH¬(FR) query, which has as answer constraint:

(x ≉ mad ∨ y ≉ par) ∧ (x ≉ par ∨ y ≉ ny) ∧ (x ≉ lon ∨ y ≉ ny)

in the domain of the cities registered in the current database. However, the query is not allowed in Datalog with constraints due, in this case, to the quantifier occurrence.

Assume now a more realistic situation in which flight delays may happen, which is represented by the following definition:

∀deltravel(x, y, t) ⇐ flight(x, y, t1) ∧ delay(x, y, t2) ∧ t ≥ t1 + t2,

∀deltravel(x, y, t) ⇐ flight(x, z, t1) ∧ delay(x, z, t2) ∧ deltravel(z, y, t3) ∧ t ≥ t1 + t2 + t3.

Tuples of delay may be in the extensional database or may be assumed when the query is formulated. For instance, the query

∀x(delay(par, x, 1) ∧ delay(mad, par, 0.5)) ⇒ deltravel(mad, ny, t)

represents the question: what is the time needed to travel from Madrid to New York, assuming that for any destination there is a delay of one hour from Paris, and the flight from Madrid to Paris is delayed by half an hour? According to its proof-theoretic interpretation, the clause ∀x(delay(par, x, 1) ∧ delay(mad, par, 0.5)) will be added locally to the database to solve the goal deltravel(mad, ny, t), and it will be discarded after the computation, as it is a hypothetical assumption. Since flights may or may not be delayed, a more general view can be defined in order to know the expected time of a trip:

∀trip(x, y, t) ⇐ nondeltravel(x, y, t) ∨ deltravel(x, y, t),

∀nondeltravel(x, y, t) ⇐ travel(x, y, t) ∧ ¬delayed(x, y),

∀x∀y delayed(x, y) ⇐ ∃t deltravel(x, y, t).

Notice that the last formula is equivalent to ∀delayed(x, y) ⇐ deltravel(x, y, t). Since explicit existential quantifiers are allowed in HH¬(C), they can also be used to improve readability and facilitate the specification of some predicates. □

Example 6. In this example we extend the database for a bank introduced in Example 2. The extensional database is given by facts for the relations client(Name, Balance, Salary), mortgageQuote(Name, Quote), pastDue(Name, Amount) and branch(Office, Name) as follows:

client(smith,2000,1200). mortgageQuote(brown,400).

client(brown,1000,1500). mortgageQuote(mcandrew,100).

client(mcandrew,5300,3000).

pastDue(smith,3000). branch(lon, smith).

pastDue(mcandrew,100). branch(mad,brown).

branch(par,mcandrew).

As an additional restriction we assume that each client has, at most, one mortgage quote. Next, we introduce some views defining the intensional part of the database. The first one captures the clients that have a mortgage: a client has a mortgage if there exists a mortgage quote associated with him:

∀hasMortgage(x) ⇐ mortgageQuote(x, y).

A debtor is a client who has a past due with an amount greater than his balance:

∀debtor(x) ⇐ client(x, y, z) ∧ pastDue(x, w) ∧ w > y.

The applicable interest rate to a client is specified by the next relation:

∀interestRate(x, 2) ⇐ client(x, y, z) ∧ y < 1200,

∀interestRate(x, 5) ⇐ client(x, y, z) ∧ y ≥ 1200.

The next relation newMortgage(Name, Quote) specifies that a non-debtor client Name can be given a new mortgage with Quote in two situations. First, if he has no mortgage, a mortgage quote smaller than 40% of his salary can be given. And, second, if he already has a mortgage quote, then the sum of this quote and the new one has to be smaller than that percentage:

∀newMortgage(x, w) ⇐ client(x, y, z) ∧ ¬debtor(x) ∧ ¬hasMortgage(x) ∧ w ≤ 0.4 ∗ z,

∀newMortgage(x, w) ⇐ client(x, y, z) ∧ ¬debtor(x) ∧ mortgageQuote(x, w′) ∧ w + w′ ≤ 0.4 ∗ z,

∀gotMortgage(x) ⇐ newMortgage(x, w).

If the client satisfies the requirements to be given a new mortgage, then it is possible to apply for a personal credit whose amount is smaller than 6000. Otherwise, if such a client does not satisfy those requirements, the amount must be between 6000 and 20,000. The relation personalCredit(Name, Amount) formalizes these conditions:

∀personalCredit(x, y) ⇐ (gotMortgage(x) ∧ y < 6000) ∨ (¬gotMortgage(x) ∧ y ≥ 6000 ∧ y < 20,000).



Moreover, it is possible to define a view with the quote and the salary of clients whose mortgage quote is at least 100 with the following relation accounting(Name, Salary, Quote), which corresponds to the predicate accounting of Example 2:

∀accounting(x, z, w) ⇐ client(x, y, z) ∧ mortgageQuote(x, w) ∧ w ≥ 100.

The previous predicates define the database that we are going to use for illustrating some queries. As a first example, we can query whether every client is a debtor:

∀x debtor(x),

for which the answer is ⊥.

To find out whether there are debtors with a past due amount greater than 1000, the following query can be formulated:

∃x∃y debtor(x) ∧ pastDue(x, y) ∧ y > 1000,

and the answer is ⊤. Note that we are using quantifiers for the variables x and y, meaning that there are no explicit conditions over them. Otherwise, the answer would be a constraint over such variables.
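Both closed queries can be checked against the extensional data with a short Python sketch. It assumes the quantifiers range over the clients registered in the database and uses dictionaries as a stand-in for the facts; the names `client`, `past_due` and `debtor` mirror the predicates but the encoding is ours.

```python
# name -> (balance, salary) and name -> past due amount, from the facts above.
client = {"smith": (2000, 1200), "brown": (1000, 1500), "mcandrew": (5300, 3000)}
past_due = {"smith": 3000, "mcandrew": 100}


def debtor(name):
    """A debtor has a past due amount greater than the balance."""
    balance, _salary = client[name]
    return name in past_due and past_due[name] > balance


# forall x debtor(x)
every_client_is_debtor = all(debtor(name) for name in client)

# exists x exists y (debtor(x) and pastDue(x, y) and y > 1000)
some_large_debtor = any(debtor(name) and past_due[name] > 1000 for name in client)
```

Only smith is a debtor, so the universal query fails, while the existential one succeeds because his past due amount exceeds 1000.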

The next query corresponds to the question: if we assume that an arbitrary client has a balance greater than 2000, what would the interest rate be?

∀x∃y∃z(client(x, y, z) ⇒ (y > 2000 ⇒ interestRate(x, w))).

Here we are using nested implication to formulate a hypothetical query. The answer is the constraint w ≈ 5.

The next query involves negation and can be read as: which clients can get a mortgage quote of 400 but not a personal credit?

newMortgage(x,400) ∧ ¬personalCredit(x, y),

and the answer is the constraint x ≈ mcandrew ∧ y ≥ 6000 ∧ y < 20,000, which means that it is possible to give a new mortgage to the client McAndrew, but it is not allowed to give him a personal credit of an amount between 6000 and 20,000. □

4. Proof-theoretic semantics

Several kinds of semantics have been defined for HH(C) without negation, including proof-theoretic, operational [31] and fixpoint semantics [19], as well as for its higher-order version [33]. The proof-theoretic and fixpoint approaches have been adapted in order to formalize the extension HH¬(C). In addition, we have proven that they are equivalent.

The simplest way of explaining the meaning of programs and goals in the present framework is by using a proof-theoretic semantics. Queries formulated to a database are interpreted by means of the inference system which governs the underlying logic. This proof system, called UC¬ (Uniform calculus handling Constraints and negation), is a sequent calculus that combines traditional inference rules with the entailment relation ⊢_C of the generic constraint system C.

Sequents have the form Δ; Γ ⊢ G, where programs Δ and sets of constraints Γ are on the left, and goals on the right. The notation Δ; Γ ⊢_UC¬ G means that the sequent Δ; Γ ⊢ G has a proof using the rules of UC¬. A proof of a sequent is a finite tree whose root is the sequent to be proved and whose nodes are sequents. The rules regulate the relationship between child nodes and parent nodes, and the leaves are nodes of the form Γ ⊢_C C. If Δ; C ⊢_UC¬ G, then C is called an answer constraint to the query G in the database Δ, which can be understood as the meaning of the query G formulated to the database Δ. The idea is that G is true for Δ if the constraint C is satisfied. UC¬ carries out only uniform proofs in the sense defined by Miller et al. [35], i.e., goal-oriented proofs. The rules are applied backwards and, at any step, only one rule of the calculus can be applied, namely the one corresponding to the structure of the goal. Notice that we are assuming that any constraint will be treated as a whole, and the only applicable rule in this case is (C). (Clause) is used for atoms beginning with defined predicate symbols; the rest of the rules correspond to the outermost connective or quantifier of the (non-constraint) goal to be proved.

This proof system is an extension of the calculus UC, introduced in [31], which provided a proof-theoretic semantics for HH(C). The incorporation of negation into the language makes it necessary to extend the notion of derivability, because there is no rule for this connective in UC. Therefore, UC¬ incorporates a new rule (¬) to formalize derivability of negated atoms. The rules defining the extended calculus appear in Fig. 4.

Next, we explain the rules (∃), (Clause) and (¬); the others correspond to widespread intuitionistic rules introducing connectives on the right of the sequent (see, e.g., [35]), except (C), which deals with goals that are pure constraints.

(∃) captures the fact that the witness in the proof of an existentially quantified formula can be represented by a constraint, which can be more general than an equality x ≈ t simulating a substitution (e.g., x ∗ x ≈ 2 represents the witness √2, which cannot be written as a term).

(Clause) represents backchaining and allows one to prove an atomic goal A ≡ p(t1, ..., tn), where p is a defined predicate symbol, using a program clause whose head A′ ≡ p(t′1, ..., t′n) is not required to unify with A, but rather solving a new existentially quantified goal that, by applying the (∃) rule, will result in a search for a constraint that implies the equality A′ ≈ A (this stands for t′1 ≈ t1 ∧ ··· ∧ t′n ≈ tn).

(C)
Γ ⊢_C C
--------------------
Δ; Γ ⊢ C

(Clause) (∗)
Δ; Γ ⊢ ∃x1 ... ∃xn((A′ ≈ A) ∧ G)
--------------------
Δ; Γ ⊢ A
where ∀x1 ... ∀xn(G ⇒ A′) is a variant of a formula of elab(Δ)

(∨) (i = 1, 2)
Δ; Γ ⊢ Gi
--------------------
Δ; Γ ⊢ G1 ∨ G2

(∧)
Δ; Γ ⊢ G1    Δ; Γ ⊢ G2
--------------------
Δ; Γ ⊢ G1 ∧ G2

(⇒)
Δ, D; Γ ⊢ G
--------------------
Δ; Γ ⊢ D ⇒ G

(⇒C)
Δ; Γ, C ⊢ G
--------------------
Δ; Γ ⊢ C ⇒ G

(∃) (∗∗)
Δ; Γ, C ⊢ G[y/x]    Γ ⊢_C ∃yC
--------------------
Δ; Γ ⊢ ∃xG

(∀) (∗∗)
Δ; Γ ⊢ G[y/x]
--------------------
Δ; Γ ⊢ ∀xG

(¬)
Γ ⊢_C ¬C for every Δ; C ⊢ A
--------------------
Δ; Γ ⊢ ¬A

(∗) x1, ..., xn fresh for A
(∗∗) y fresh for the formulas in the conclusion of the rule

Fig. 4. Rules of the sequent calculus UC¬.

The idea of interpreting the query ¬A from a database Δ by means of an answer constraint C is that, whenever C′ is a possible answer to the query A from Δ, then C ⊢_C ¬C′. This is formalized with (¬). We say that (¬) is a metarule since its premise considers any derivation Δ; C ⊢ A of the atom A. In practice, there is a derivation of ¬A when the set of answer constraints of A from Δ is finite.
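When the set of answer constraints of an atom is finite, the reading of (¬) amounts to conjoining the negations of those answers. The Python sketch below illustrates this for a hypothetical binary predicate direct(x, y), introduced only for this example, whose answers are the equalities x ≈ o ∧ y ≈ d for each stored flight; answers are represented as pairs and the negated answer as a Boolean test, which is an assumption of the sketch and not the solver's representation.

```python
flights = [("mad", "par", 1.5), ("par", "ny", 10), ("lon", "ny", 9)]

# Finite set of answers for direct(x, y): each pair (o, d) stands for x = o and y = d.
answers = [(o, d) for (o, d, _time) in flights]


def negated_answer(x, y):
    """Conjunction of the negated answers: for every (o, d), x differs from o or y from d."""
    return all(x != o or y != d for (o, d) in answers)
```

The test `negated_answer` holds for a pair of cities exactly when no stored flight connects them directly, which is the shape of the answer constraint obtained for the query about cities without direct flights in Example 5.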

Next we show two examples of proof derivation trees.

Example 7. Consider a fragment of the travel database of Example 5.

Let Δ = {flight(mad, par, 1.5), flight(par, ny, 10), flight(lon, ny, 9),

∀((flight(x, y, w) ∧ t ≥ w) ⇒ travel(x, y, t)),

∀((flight(x, z, t1) ∧ travel(z, y, t2) ∧ t ≥ t1 + t2) ⇒ travel(x, y, t))}

and G ≡ ∀t(t > 1.5 ⇒ ∃y travel(mad, y, t)).

The following is a derivation of the sequent Δ; {⊤} ⊢ G. We use the abbreviations Γ = {⊤, t > 1.5, y ≈ par} and Γ′ = Γ ∪ {x′ ≈ mad, y′ ≈ par, t′ ≈ t, w′ ≈ 1.5}. (∃4) denotes 4 successive applications of the rule (∃); the four corresponding side conditions referring to FR are put together and abbreviated as Γ ⊢_FR ∃x′(x′ ≈ mad) ... Γ′ ⊢_FR ∃w′(w′ ≈ 1.5). The derivation is written from the root upwards: each sequent is followed by the rule that derives it, and its premises are listed in the following steps.

1. Δ; {⊤} ⊢ ∀t(t > 1.5 ⇒ ∃y travel(mad, y, t)), by (∀) from 2.

2. Δ; {⊤} ⊢ t > 1.5 ⇒ ∃y travel(mad, y, t), by (⇒C) from 3.

3. Δ; {t > 1.5} ⊢ ∃y travel(mad, y, t), by (∃) from 4 and the side condition {t > 1.5} ⊢_FR ∃y(y ≈ par).

4. Δ; {t > 1.5, y ≈ par} ⊢ travel(mad, y, t), by (Clause) from 5.

5. Δ; Γ ⊢ ∃x′∃y′∃t′∃w′(x′ ≈ mad ∧ y′ ≈ y ∧ t ≈ t′ ∧ t′ ≥ w′ ∧ flight(x′, y′, w′)), by (∃4) from 6 and the side conditions Γ ⊢_FR ∃x′(x′ ≈ mad) ... Γ′ ⊢_FR ∃w′(w′ ≈ 1.5).

6. Δ; Γ′ ⊢ (x′ ≈ mad ∧ ··· ∧ t′ ≥ w′) ∧ flight(x′, y′, w′), by (∧) from 7 and 8.

7. Δ; Γ′ ⊢ x′ ≈ mad ∧ ··· ∧ t′ ≥ w′, by (C) from Γ′ ⊢_FR x′ ≈ mad ∧ ··· ∧ t′ ≥ w′.

8. Δ; Γ′ ⊢ flight(x′, y′, w′), by (Clause) from 9.

9. Δ; Γ′ ⊢ x′ ≈ mad ∧ y′ ≈ par ∧ w′ ≈ 1.5, by (C) from Γ′ ⊢_FR x′ ≈ mad ∧ y′ ≈ par ∧ w′ ≈ 1.5. □

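Example 8. Recall Examples 3 and 4. Let $\Delta$ be the set:

$$
\{\, \forall\big(x \ge x_1 \wedge x \le x_2 \wedge y \ge y_1 \wedge y \le y_2 \Rightarrow \mathit{rectangle}(x_1, y_1, x_2, y_2, x, y)\big) \,\},
$$

and $G \equiv \mathit{rectangle}(0,0,4,4,x,y), \neg \mathit{rectangle}(1,1,3,3,x,y)$. The answer constraint:

$$
\begin{aligned}
C \equiv {} & \big((y > 3) \wedge (y \le 4) \wedge (x \ge 0) \wedge (x \le 4)\big) \ \vee \\
& \big((y \ge 0) \wedge (y < 1) \wedge (x \ge 0) \wedge (x \le 4)\big) \ \vee \\
& \big((y \ge 0) \wedge (y \le 4) \wedge (x > 3) \wedge (x \le 4)\big) \ \vee \\
& \big((y \ge 0) \wedge (y \le 4) \wedge (x \ge 0) \wedge (x < 1)\big)
\end{aligned}
$$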



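can be obtained by the following deduction, read from the root upwards:

1. $\Delta; C \vdash \mathit{rectangle}(0,0,4,4,x,y) \wedge \neg \mathit{rectangle}(1,1,3,3,x,y)$, by (∧) from step 2 and the deduction $\mathcal{D}$.
2. $\Delta; C \vdash \mathit{rectangle}(0,0,4,4,x,y)$, by (Clause) from step 3.
3. $\Delta; C \vdash \exists a_1 \exists a_2 \exists b_1 \exists b_2 \exists x_1 \exists y_1\,(a_1 \approx 0 \wedge x_1 \approx x \wedge x_1 \ge a_1 \wedge a_2 \approx 0 \wedge y_1 \approx y \wedge x_1 \le b_1 \wedge b_1 \approx 4 \wedge y_1 \ge a_2 \wedge b_2 \approx 4 \wedge y_1 \le b_2)$, by (C), since $C \vdash_{\mathcal{R}} \exists a_1 \exists a_2 \exists b_1 \exists b_2 \exists x_1 \exists y_1\,(a_1 \approx 0 \wedge x_1 \approx x \wedge \cdots)$.

where $\mathcal{D}$ is a deduction for $\Delta; C \vdash \neg \mathit{rectangle}(1,1,3,3,x,y)$ whose last step is an application of (¬) with the following premises and conclusion:

- Side condition: $C \vdash_{\mathcal{R}} \neg(x \ge 1 \wedge y \ge 1 \wedge x \le 3 \wedge y \le 3)$.
- Premise: 〈rest of derivation〉 ending in $\Delta; x \ge 1 \wedge y \ge 1 \wedge x \le 3 \wedge y \le 3 \vdash \mathit{rectangle}(1,1,3,3,x,y)$.
- Conclusion: $\Delta; C \vdash \neg \mathit{rectangle}(1,1,3,3,x,y)$. □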

In order to define a sound and complete goal solving procedure, some finiteness conditions must be imposed to make the metarule (¬) viable. That is, it has to be guaranteed that the set of answer constraints for an atom (that occurs negated in a goal) is finite, and that this set can be computed in a finite number of steps. As usual in the constraint database field, finiteness of the set of computed answers can be ensured by imposing different safety conditions on the constraint systems [42]. A technique that guarantees termination, provided the sets of answer constraints are finite, is stratification.

We have adopted it because it is easy to combine with our notion of constraint system, giving a clear operational semantics to the scheme HH¬(C) by providing meaning to the whole database (in the presence of safety conditions).

The stratified negation that we propose is explained in detail in the next section, where a stratified fixpoint semantics is presented as the basis of an implementation of HH¬(C).

5. Fixpoint semantics

We have extended and adapted the semantics presented in [19] in order to interpret full HH¬(C) using a stratification technique. The semantics defined there was based on a forcing relation among programs, sets of constraints and goals that states whether an interpretation makes a goal $G$ true in the context $\langle \Delta, \Gamma \rangle$ of a program $\Delta$ and a set of constraints $\Gamma$. Interpretations were defined as functions able to give meaning to every pair $\langle \Delta, \Gamma \rangle$ as sets of atoms. The interpretation had to depend on this context because, when computing implicative goals, $\Delta$ or $\Gamma$ may be augmented. Here, interpretations are defined as functions able to give meaning to a database as a set of pairs (Atom, Constraint), and are classified by strata. Following [51], the stratification of a database is based on the definition of a dependency graph. Next we introduce these notions for our language.

5.1. Stratification and dependency graph

Given a set of clauses and goals $\Phi$, the corresponding dependency graph $DG_\Phi$ is a directed graph whose nodes are the defined predicate symbols in $\Phi$, and the edges are determined by the implication symbols of the formulas.

Here, we adapt those notions as a useful starting point of a fixpoint semantics for our language. But now, the construction of dependency graphs must consider the fact that implications may occur not only between the head and the body of a clause, but also inside the goals, and therefore in any clause body. This feature will be taken into account in the following way: an implication of the form $F_1 \Rightarrow F_2$ produces edges in the graph from the defined predicate symbols inside $F_1$ to every defined predicate symbol inside $F_2$. An edge will be negatively labeled when the corresponding atom occurs negated on the left of the implication. Since constraints do not include defined predicate symbols, they cannot produce dependencies.

Example 9. Let $\Delta$ be the bank database of Example 6. Fig. 5 shows the dependency graph for $\Delta$ (the predicate branch corresponds to an isolated node which is not represented in the figure). Negative edges are labeled with ¬. □

Definition 1. Given a set of formulas $\Phi$, its corresponding dependency graph $DG_\Phi$, and two predicates $p$ and $q$, we say:

• $q$ depends on $p$ if there is a path from $p$ to $q$ in $DG_\Phi$.
• $q$ negatively depends on $p$ if there is a path from $p$ to $q$ in $DG_\Phi$ with at least one negatively labeled edge.

Definition 2. Let $\Phi$ be a set of formulas and $P = \{p_1, \ldots, p_n\}$ the set of defined predicate symbols of $\Phi$. A stratification of $\Phi$ is any mapping $s : P \to \{1, \ldots, n\}$ such that $s(p) \le s(q)$ if $q$ depends on $p$, and $s(p) < s(q)$ if $q$ negatively depends on $p$. $\Phi$ is stratifiable if there is a stratification for it.
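As an illustration of Definitions 1 and 2, the following minimal Python sketch computes a stratification by repeatedly raising strata until every edge constraint holds. It is not the system's implementation (that is described in Appendix B); the function name `stratify`, the edge encoding `(p, q, negative)` and the sample predicates are assumptions made only for this example.

```python
def stratify(predicates, edges):
    """Return a mapping predicate -> stratum in {1..n}, or None if none exists.

    edges is a list of (p, q, negative) triples meaning "q depends on p";
    negative=True marks a negatively labeled edge.
    """
    n = len(predicates)
    s = {p: 1 for p in predicates}      # start with everything in stratum 1
    changed = True
    while changed:
        changed = False
        for p, q, negative in edges:
            # s(p) <= s(q) for a plain edge, s(p) < s(q) for a negative one
            needed = s[p] + (1 if negative else 0)
            if s[q] < needed:
                if needed > n:          # a cycle through a negative edge
                    return None
                s[q] = needed
                changed = True
    return s


# Hypothetical dependency graph with positive and negative edges.
EDGES = [
    ("p", "t", False),
    ("p", "q", True),
    ("p", "q", False),
    ("p", "r", False),
    ("q", "r", False),
    ("q", "u", True),
]
STRATA = stratify(["p", "t", "q", "r", "u"], EDGES)
```

Because strata only ever increase and are capped at n, the loop terminates; exceeding n can only happen when some cycle contains a negatively labeled edge, which is exactly the non-stratifiable case.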



Fig. 5. Dependency graph for Example 6.

Example 10. A stratification for the database $\Delta$ of Example 5 will collect all the predicates in stratum 1 except nondeltravel and trip, which will be in stratum 2. Intuitively, this means that for evaluating nondeltravel, the rest of the predicates (except trip) should be evaluated before (in particular, delayed). Formulating the query $G \equiv \exists t\, \mathit{deltravel}(x, y, t) \Rightarrow \mathit{delayed}(x, y)$, the augmented set $\Delta \cup \{G\}$ remains stratifiable, but if $G' \equiv \mathit{trip}(mad, lon, T) \Rightarrow \mathit{delay}(mad, ny, t)$ is formulated, the extended set $\Delta \cup \{G'\}$ becomes non-stratifiable. This is because $G'$ adds the dependency trip → delay, and then any stratification $s$ must satisfy $s(\mathit{trip}) \le s(\mathit{delay}) \le s(\mathit{delayed}) < s(\mathit{nondeltravel}) \le s(\mathit{trip})$, which is impossible. □

From now on, we assume the existence of a fixed stratification $s$ for the considered sets $\Delta \cup \{G\}$. It is useful to have a notion of the stratum of an atom (i.e., the stratum of its predicate symbol), but also to extend this notion to any formula or set of formulas.

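Definition 3. Let $F$ be a goal or a clause. The stratum of a formula $F$, denoted by $\mathit{str}(F)$, is recursively defined as:

$$
\begin{aligned}
\mathit{str}(C) &= 1 \\
\mathit{str}(\neg A) &= 1 + \mathit{str}(A) \\
\mathit{str}(p(t_1, \ldots, t_n)) &= s(p) \\
\mathit{str}(F_1 \,\square\, F_2) &= \max(\mathit{str}(F_1), \mathit{str}(F_2)), \quad \text{where } \square \in \{\wedge, \vee, \Rightarrow\} \\
\mathit{str}(Q x F) &= \mathit{str}(F), \quad \text{where } Q \in \{\exists, \forall\}
\end{aligned}
$$

The stratum of a set of formulas $\Phi$ is $\mathit{str}(\Phi) = \max\{\mathit{str}(F) \mid F \in \Phi\}$.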
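The recursive definition of the stratum of a formula translates directly into code. The sketch below is only illustrative: formulas are encoded as nested tuples, and the tags `"constr"`, `"atom"`, `"not"`, `"and"`, `"or"`, `"imp"`, `"ex"` and `"fa"` are a hypothetical encoding chosen for this example, not the system's internal representation.

```python
def stratum(formula, s):
    """Stratum of a formula; s maps predicate names to their strata."""
    kind = formula[0]
    if kind == "constr":                 # str(C) = 1
        return 1
    if kind == "atom":                   # str(p(t1, ..., tn)) = s(p)
        return s[formula[1]]
    if kind == "not":                    # str(not A) = 1 + str(A)
        return 1 + stratum(formula[1], s)
    if kind in ("and", "or", "imp"):     # maximum over both subformulas
        return max(stratum(formula[1], s), stratum(formula[2], s))
    if kind in ("ex", "fa"):             # ("ex", variable, body)
        return stratum(formula[2], s)
    raise ValueError(f"unknown formula tag: {kind}")


def stratum_of_set(formulas, s):
    """Stratum of a set of formulas: the maximum over its members."""
    return max(stratum(f, s) for f in formulas)


# Hypothetical stratification and clauses.
S = {"p": 1, "q": 2, "u": 3}
CLAUSE_Q = ("fa", "x", ("imp", ("not", ("atom", "p")), ("atom", "q")))
CLAUSE_U = ("fa", "x", ("imp", ("not", ("atom", "q")), ("atom", "u")))
```

With this encoding, a clause whose body negates a predicate of stratum 1 and whose head is in stratum 2 gets stratum 2, as the definition prescribes.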

5.2. Stratified interpretations and forcing relation

Let $\mathcal{W}$ be the set of stratifiable databases with respect to the same fixed stratification $s$, which can be built from a particular signature. Interpretations and the fixpoint operator will be applied to the databases of $\mathcal{W}$ and they operate over strata.

Let $At$ be the set of open atoms, i.e., defined predicate symbols of the signature applied to variables (up to variable renaming); and let $\mathcal{SL}_{\mathcal{C}}$ be the set of $\mathcal{C}$-satisfiable constraints modulo $\mathcal{C}$-equivalence. The set $At \times \mathcal{SL}_{\mathcal{C}}$ is finite because we consider finite signatures and compact constraint systems. As we define below, an interpretation over a stratum $i$ of a database will be a set of pairs $(A, [C]) \in At \times \mathcal{SL}_{\mathcal{C}}$, where $\mathit{str}(A) \le i$ and $[C]$ represents the set of constraints $\mathcal{C}$-equivalent to $C$.

Definition 4. Let $i \ge 1$. An interpretation $I$ over the stratum $i$ is a function $I : \mathcal{W} \to \mathcal{P}(At \times \mathcal{SL}_{\mathcal{C}})$, such that for every $\Delta \in \mathcal{W}$, if $(A, [C]) \in I(\Delta)$ then $\mathit{str}(A) \le i$. We denote by $\mathcal{I}_i$ the set of interpretations over $i$.

In order to simplify the notation, we write:



• $(A, C) \in At \times \mathcal{SL}_{\mathcal{C}}$, instead of $(A, [C])$, assuming that $C$ is any representative of its equivalence class $[C]$.
• $[I(\Delta)]_i$ to represent the set $\{(A, C) \in I(\Delta) \mid \mathit{str}(A) = i\}$.

Notice that if $\mathit{str}(\Delta) = k$, then $\{[I(\Delta)]_i \mid 1 \le i \le k\}$ is a partition of $I(\Delta)$. For every $i \ge 1$, an order on $\mathcal{I}_i$ can be defined as follows.

Definition 5. Let $i \ge 1$ and $I_1, I_2 \in \mathcal{I}_i$. $I_1$ is less than or equal to $I_2$ at stratum $i$, denoted by $I_1 \sqsubseteq_i I_2$, if for each $\Delta \in \mathcal{W}$ the following conditions are satisfied:

• $[I_1(\Delta)]_j = [I_2(\Delta)]_j$, for every $1 \le j < i$.
• $[I_1(\Delta)]_i \subseteq [I_2(\Delta)]_i$.
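The order at stratum i can be checked mechanically once an interpretation is fixed for one database. The following sketch assumes, purely for illustration, that the image of an interpretation on a given database is a Python set of (predicate name, constraint text) pairs and that `strata` maps predicate names to strata; neither encoding comes from the paper.

```python
def restrict(pairs, strata, j):
    """The pairs whose atom belongs to stratum j."""
    return {(atom, c) for (atom, c) in pairs if strata[atom] == j}


def leq_at(i, pairs1, pairs2, strata):
    """Order at stratum i: equal on strata below i, inclusion at stratum i."""
    same_below = all(
        restrict(pairs1, strata, j) == restrict(pairs2, strata, j)
        for j in range(1, i)
    )
    return same_below and restrict(pairs1, strata, i) <= restrict(pairs2, strata, i)


# Hypothetical data: p is in stratum 1, q in stratum 2.
STRATA = {"p": 1, "q": 2}
SMALL = {("p", "x=a")}
GROWN_AT_2 = {("p", "x=a"), ("q", "x=c")}
CHANGED_BELOW = {("p", "x=a"), ("p", "x=b"), ("q", "x=c")}
```

Here `SMALL` is below `GROWN_AT_2` at stratum 2, whereas `CHANGED_BELOW` is not comparable from `SMALL` at stratum 2 because it alters stratum 1.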

It is straightforward to check that for any $i \ge 1$, $(\mathcal{I}_i, \sqsubseteq_i)$ is a poset. The idea behind this definition is that when an interpretation over a stratum $i$ increases, the information of the smaller strata remains invariable. In such a way, if $\mathit{str}(\neg A) = i$, since $\mathit{str}(A) = i - 1$, the truth value of $\neg A$ at the stratum $i$ will remain invariable and monotonicity of the truth relation can be guaranteed even for negative atoms, as we will show. In addition, the following result holds.

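Lemma 1. For any $i \ge 1$, any chain of interpretations of $(\mathcal{I}_i, \sqsubseteq_i)$, $\{I_n\}_{n \ge 0}$, such that $I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \cdots$, has a least upper bound $\bigsqcup_{n \ge 0} I_n$, which can be defined as: $(\bigsqcup_{n \ge 0} I_n)(\Delta) = \bigcup \{I_n(\Delta) \mid n \ge 0\}$, for any $\Delta \in \mathcal{W}$.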

Proof. For any $\Delta \in \mathcal{W}$ we define the function $I$ as $I(\Delta) = \bigcup \{I_n(\Delta) \mid n \ge 0\}$, or simply $\bigcup_{n \ge 0} I_n(\Delta)$. It must be checked that $I$ is the least upper bound of the chain $\{I_n\}_{n \ge 0}$.

– $I$ is an upper bound of the chain. Let $k \ge 0$. For any $\Delta$, we have that $[I_k(\Delta)]_j = \bigcup_{n \ge 0} [I_n(\Delta)]_j = [I(\Delta)]_j$, for $1 \le j < i$, and $[I_k(\Delta)]_i \subseteq \bigcup_{n \ge 0} [I_n(\Delta)]_i = [I(\Delta)]_i$.

– Now, we prove that it is the least upper bound of the chain. Let us assume that $I'$ is an upper bound of $\{I_n\}_{n \ge 0}$. For each $k \ge 0$, $I_k \sqsubseteq_i I'$ implies that for any $\Delta$, $[I_k(\Delta)]_j = [I'(\Delta)]_j$, for $1 \le j < i$, and $[I_k(\Delta)]_i \subseteq [I'(\Delta)]_i$. Therefore, $\bigcup_{n \ge 0} [I_n(\Delta)]_j = [I'(\Delta)]_j$, for $1 \le j < i$, and $\bigcup_{n \ge 0} [I_n(\Delta)]_i \subseteq [I'(\Delta)]_i$, for any $\Delta \in \mathcal{W}$. Thus, $I \sqsubseteq_i I'$. □

The following definition formalizes the notion of a query $G$ being “true” for an interpretation $I$ in the context of a database $\Delta$, if the constraint $C$ is satisfied. As already said, we assume that $s$ is not only a stratification for $\Delta$, but also for $\Delta \cup \{G\}$.

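Definition 6. Let $i \ge 1$. The forcing relation $\Vdash$ between pairs $I, \Delta$ and pairs $(G, C)$ (where $I \in \mathcal{I}_i$, $\mathit{str}(G) \le i$, and $C$ is $\mathcal{C}$-satisfiable) is recursively defined by the rules below. When $I, \Delta \Vdash (G, C)$, it is said that $(G, C)$ is forced by $I$, $\Delta$.

• $I, \Delta \Vdash (C', C) \iff C \vdash_{\mathcal{C}} C'$.
• $I, \Delta \Vdash (A, C) \iff (A, C) \in I(\Delta)$.
• $I, \Delta \Vdash (\neg A, C) \iff$ for every $(A, C') \in I(\Delta)$, $C \vdash_{\mathcal{C}} \neg C'$ holds, and if there is no pair of the form $(A, C')$ in $I(\Delta)$, then $C \equiv \top$.
• $I, \Delta \Vdash (G_1 \wedge G_2, C) \iff$ for each $i \in \{1, 2\}$, $I, \Delta \Vdash (G_i, C)$.
• $I, \Delta \Vdash (G_1 \vee G_2, C) \iff$ for some $i \in \{1, 2\}$, $I, \Delta \Vdash (G_i, C)$.
• $I, \Delta \Vdash (D \Rightarrow G, C) \iff I, \Delta \cup \{D\} \Vdash (G, C)$.
• $I, \Delta \Vdash (C' \Rightarrow G, C) \iff I, \Delta \Vdash (G, C \wedge C')$.
• $I, \Delta \Vdash (\exists x G, C) \iff$ there is $C'$ such that $I, \Delta \Vdash (G[y/x], C')$, where $y$ does not occur free in $\Delta$, $\exists x G$, $C$, and $C \vdash_{\mathcal{C}} \exists y C'$.
• $I, \Delta \Vdash (\forall x G, C) \iff I, \Delta \Vdash (G[y/x], C)$, where $y$ does not occur free in $\Delta$, $\forall x G$, $C$.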
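The rule for negated atoms can be made concrete over a finite domain. The sketch below models a constraint on a single variable as the set of domain values that satisfy it, which is a simplifying assumption of this example and not the paper's constraint representation. Under that model, a constraint entails the negation of another when the two sets are disjoint, and the function returns the weakest constraint that is disjoint from every answer of the atom. The name `negation_answer` is hypothetical.

```python
def negation_answer(domain, answers):
    """Weakest constraint (as a set of values) forcing the negation of an atom.

    domain: iterable with all values of the finite domain.
    answers: list of sets, one per pair (A, C') found in the interpretation.
    """
    if not answers:
        # No pair (A, C') in the interpretation: the answer is "true",
        # i.e. every value of the domain.
        return set(domain)
    excluded = set().union(*answers)     # values allowed by some answer of A
    return set(domain) - excluded        # disjoint from every C'


# Hypothetical finite domain and answer sets.
DOMAIN = ["a", "b", "c"]
ANSWERS_P = [{"a"}, {"b"}]               # stands for x = a and x = b
NOT_P = negation_answer(DOMAIN, ANSWERS_P)
```

The definition accepts any constraint that entails each negated answer; returning the weakest one is a design choice of this sketch, made so that the result is unique.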

Those rules are well-defined because if $s$ is a stratification for $\Delta \cup \{G\}$, with $\mathit{str}(G) \le i$, and $G'$ is a subformula of $G$, then $s$ is also a stratification for $\Delta \cup \{G'\}$, and $\mathit{str}(G') \le i$. Notice that, for the particular case $G \equiv D \Rightarrow G'$, $s$ will also be a stratification for $\Delta \cup \{D, G'\}$.

From now on, when we write $I, \Delta \Vdash (G, C)$ we will assume that if $I \in \mathcal{I}_i$, then $\mathit{str}(G) \le i$ and $C$ is $\mathcal{C}$-satisfiable. The relation $\Vdash$ is not defined otherwise. Formally, $\Vdash$ should be denoted $\Vdash_i$, because there is a forcing relation for each $\mathcal{I}_i$. We avoid the subindex in order to simplify the notation.

The following lemma establishes the monotonicity of the forcing relation.



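Lemma 2. Let $i \ge 1$ and $I_1, I_2 \in \mathcal{I}_i$ such that $I_1 \sqsubseteq_i I_2$. Then, for any $\Delta \in \mathcal{W}$, and $(G, C) \in \mathcal{G} \times \mathcal{SL}_{\mathcal{C}}$, if $I_1, \Delta \Vdash (G, C)$, then $I_2, \Delta \Vdash (G, C)$.

Proof. The proof is by induction on the structure of $G$. The full proof is in Appendix A. Here only a few significant cases are presented.

($\neg A$) If $I_1, \Delta \Vdash (\neg A, C)$, then $C \vdash_{\mathcal{C}} \neg C'$ for every $C'$ such that $(A, C') \in I_1(\Delta)$, or there is no such $C'$ and $C \equiv \top$. Since $\mathit{str}(\neg A) \le i$, obviously $\mathit{str}(A) = j$, for some $j < i$. Then $[I_2(\Delta)]_j = [I_1(\Delta)]_j$, because $I_1 \sqsubseteq_i I_2$, and $I_1, \Delta \Vdash (\neg A, C)$ is equivalent to $I_2, \Delta \Vdash (\neg A, C)$.

($\forall x G'$) $I_1, \Delta \Vdash (\forall x G', C) \iff I_1, \Delta \Vdash (G'[y/x], C)$, where $y$ does not occur free in $\Delta$, $\forall x G'$, $C$. By induction hypothesis $I_2, \Delta \Vdash (G'[y/x], C)$, therefore $I_2, \Delta \Vdash (\forall x G', C)$. □

Lemma 3. Let $i \ge 1$ and let $\{I_n\}_{n \ge 0}$ be a denumerable family of interpretations over the stratum $i$, such that $I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \cdots$. Then, for any $\Delta$, $G$ and $C$,

$$
\bigsqcup_{n \ge 0} I_n, \Delta \Vdash (G, C) \iff \text{there exists } k \ge 0 \text{ such that } I_k, \Delta \Vdash (G, C).
$$

Proof. In order to simplify the notation we write $I$ for $\bigsqcup_{n \ge 0} I_n$. The implication from right to left is a consequence of Lemma 2, since $I_k \sqsubseteq_i I$ holds for any $k$. The converse is proved using the result of Lemma 1 ($I(\Delta) = \bigcup_{n \ge 0} I_n(\Delta)$), by induction on the structure of $G$. As before, we present only some cases, and the others appear in Appendix A.

($\neg A$) $I, \Delta \Vdash (\neg A, C) \iff$ for every $C'$ such that $I, \Delta \Vdash (A, C')$, $C \vdash_{\mathcal{C}} \neg C'$, or there is no such $C'$. We are assuming that $\mathit{str}(\neg A) \le i$, so $\mathit{str}(A) < i$. $I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \cdots$ implies that, for every $j < i$, $[I_0(\Delta)]_j = [I_1(\Delta)]_j = \cdots = [\bigcup_{n \ge 0} I_n(\Delta)]_j = [I(\Delta)]_j$. So for any $k \ge 1$, $I_k, \Delta \Vdash (\neg A, C)$.

($D \Rightarrow G'$) $I, \Delta \Vdash (D \Rightarrow G', C) \iff I, \Delta \cup \{D\} \Vdash (G', C) \Rightarrow$ there is $k \ge 0$ such that $I_k, \Delta \cup \{D\} \Vdash (G', C)$, by induction hypothesis $\Rightarrow$ there is $k \ge 0$ such that $I_k, \Delta \Vdash (D \Rightarrow G', C)$. □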

Next, a continuous operator for every stratum transforming interpretations is defined. Its least fixpoint supplies the expected version of truth at each stratum.

Definition 7. Let $i \ge 1$ represent a stratum. The operator $T_i : \mathcal{I}_i \to \mathcal{I}_i$ transforms interpretations over $i$ as follows. For any $I \in \mathcal{I}_i$, $\Delta \in \mathcal{W}$, and $(A, C) \in At \times \mathcal{SL}_{\mathcal{C}}$, $(A, C) \in (T_i(I))(\Delta)$ when:

• $(A, C) \in [I(\Delta)]_j$ for some $j < i$, or
• $\mathit{str}(A) = i$ and there is a variant $\forall x(G \Rightarrow A')$ of a clause in $\mathit{elab}(\Delta)$, such that the variables $x$ do not occur free in $A$, and $I, \Delta \Vdash (\exists x(A \approx A' \wedge G), C)$.

The crucial aspect of $T_i$ is the following: for a database $\Delta$, $T_i$ incorporates information obtained exclusively from the clauses of $\Delta$ whose heads are atoms of the stratum $i$, and the information of smaller strata remains invariable. Notice that if $\mathit{str}(A) = i$, then $\mathit{str}(\exists x(A \approx A' \wedge G)) \le i$ and $T_i$ is well-defined.

In order to establish the existence of a fixpoint of $T_i$, it will be proved to be monotone and continuous.

Lemma 4 (Monotonicity of $T_i$). Let $i \ge 1$ and $I_1, I_2 \in \mathcal{I}_i$ such that $I_1 \sqsubseteq_i I_2$. Then, $T_i(I_1) \sqsubseteq_i T_i(I_2)$.

Proof. Let us consider any $\Delta$ and $(A, C) \in (T_i(I_1))(\Delta)$. This implies that $\mathit{str}(A) \le i$. If $\mathit{str}(A) = j < i$, then $(A, C) \in [I_1(\Delta)]_j = [I_2(\Delta)]_j$, because $I_1 \sqsubseteq_i I_2$ and $j < i$. Hence $(A, C) \in (T_i(I_2))(\Delta)$, by definition of $T_i$. If $\mathit{str}(A) = i$, then there is a variant $\forall x(G \Rightarrow A')$ of a clause of $\Delta$, such that the variables $x$ do not occur free in $A$, and $I_1, \Delta \Vdash (\exists x(A \approx A' \wedge G), C)$. Using Lemma 2 and the fact that $I_1 \sqsubseteq_i I_2$, we obtain $I_2, \Delta \Vdash (\exists x(A \approx A' \wedge G), C)$, which implies $(A, C) \in T_i(I_2)(\Delta)$, by definition of $T_i$. □

Lemma 5 (Continuity of $T_i$). Let $i \ge 1$ and $\{I_n\}_{n \ge 0}$ be a denumerable family of interpretations over $i$, such that $I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \cdots$. Then $T_i(\bigsqcup_{n \ge 0} I_n) = \bigsqcup_{n \ge 0} T_i(I_n)$.

Proof. The inclusion $\supseteq$ is a consequence of the monotonicity of $T_i$. Let us prove the inclusion $\subseteq$. Consider any $\Delta$ and $(A, C) \in (T_i(\bigsqcup_{n \ge 0} I_n))(\Delta)$. Then $\mathit{str}(A) \le i$. If $\mathit{str}(A) = j < i$, $(A, C) \in [(T_i(\bigsqcup_{n \ge 0} I_n))(\Delta)]_j = [I_0(\Delta)]_j$, then $(A, C) \in (T_i(I_0))(\Delta) \subseteq \bigcup_{n \ge 0} (T_i(I_n))(\Delta) = (\bigsqcup_{n \ge 0} T_i(I_n))(\Delta)$. If $\mathit{str}(A) = i$, there is a variant $\forall x(G \Rightarrow A')$ of a clause of $\Delta$, such that the variables $x$ do not occur free in $A$, and $\bigsqcup_{n \ge 0} I_n, \Delta \Vdash (\exists x(A \approx A' \wedge G), C)$. Thanks to Lemma 3, there exists



Stratum   Iteration     Considered clause               Deduced pair
1         T1(∅)         p(a)                            (p(x), x ≈ a)
1         T1(∅)         p(b)                            (p(x), x ≈ b)
1         T1(∅)         ∀x(p(x) ⇒ t(x))                 None (∗)
1         T1(T1(∅))     p(a)                            None (∗∗)
1         T1(T1(∅))     p(b)                            None (∗∗)
1         T1(T1(∅))     ∀x(p(x) ⇒ t(x))                 (t(x), x ≈ a ∨ x ≈ b)
2         T2(fix1)      ∀x(¬p(x) ⇒ q(x))                (q(x), x ≉ a ∧ x ≉ b)
2         T2(fix1)      ∀x((p(x) ⇒ q(x)) ⇒ r(x))        None (∗)
3         T3(fix2)      ∀x(¬q(x) ⇒ u(x))                (u(x), x ≈ a ∨ x ≈ b)

The pairs deduced at stratum 1 form fix1; together with those of stratum 2 they form fix2; all the pairs form fix3.

Fig. 6. Stratified fixpoint of the elaborated database of Example 11.

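$k \ge 0$, such that $I_k, \Delta \Vdash (\exists x(A \approx A' \wedge G), C)$, and therefore $(A, C) \in (T_i(I_k))(\Delta)$. As a consequence, $(T_i(\bigsqcup_{n \ge 0} I_n))(\Delta) \subseteq \bigcup_{n \ge 0} (T_i(I_n))(\Delta) = (\bigsqcup_{n \ge 0} T_i(I_n))(\Delta)$. □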

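Proposition 1. The operator $T_1$ has a least fixpoint, which is $\bigsqcup_{n \ge 0} T_1^n(I_\perp)$, where the interpretation $I_\perp$ represents the constant function $\emptyset$.

Proof. By the Knaster–Tarski fixpoint theorem [49], using Lemma 5. □

Let $\mathit{fix}_1$ denote $\bigsqcup_{n \ge 0} T_1^n(I_\perp)$, i.e., the least fixpoint at stratum 1.

Consider now the following sequence $\{T_2^n(\mathit{fix}_1)\}_{n \ge 0}$ of interpretations in $(\mathcal{I}_2, \sqsubseteq_2)$. Using the properties of $T_i$, it is easy to prove by induction on $n \ge 0$ that this sequence is a chain:

$$
\mathit{fix}_1 \sqsubseteq_2 T_2(\mathit{fix}_1) \sqsubseteq_2 T_2(T_2(\mathit{fix}_1)) \sqsubseteq_2 \cdots \sqsubseteq_2 T_2^n(\mathit{fix}_1) \sqsubseteq_2 \cdots.
$$

As before, in accordance with Lemmas 1 and 5, $\{T_2^n(\mathit{fix}_1)\}_{n \ge 0}$ has a least upper bound, $\bigsqcup_{n \ge 0} T_2^n(\mathit{fix}_1)$, in $(\mathcal{I}_2, \sqsubseteq_2)$ that is a fixpoint of $T_2$, denoted by $\mathit{fix}_2$. Proceeding successively in the same way, a chain:

$$
\mathit{fix}_{i-1} \sqsubseteq_i T_i(\mathit{fix}_{i-1}) \sqsubseteq_i T_i(T_i(\mathit{fix}_{i-1})) \sqsubseteq_i \cdots \sqsubseteq_i T_i^n(\mathit{fix}_{i-1}) \sqsubseteq_i \cdots
$$

can be defined for any stratum $i > 1$, and a fixpoint of $T_i$

$$
\mathit{fix}_i = \bigsqcup_{n \ge 0} T_i^n(\mathit{fix}_{i-1})
$$

can be found. In particular, if $\mathit{str}(\Delta) = k$, we simplify $\mathit{fix}_k$ writing $\mathit{fix}$. Then, $\mathit{fix}(\Delta)$ represents the pairs $(A, C)$ such that $A$ can be deduced from $\Delta$ if $C$ is satisfied. Notice that $\mathit{fix}(\Delta)$ is computed by saturating strata sequentially from $\mathit{fix}_1(\Delta)$ up to $\mathit{fix}_k(\Delta)$, using for every $i$ only the clauses of the stratum $i$.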
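The stratum-by-stratum saturation can be sketched in a few lines of Python. This is a deliberate simplification and not the paper's operator: it works on ground atoms over a small finite domain instead of (Atom, Constraint) pairs, handles only clauses with one variable and no nested implication, and every name in it (`DOMAIN`, `FACTS`, `RULES`, `STRATUM`, `step`, `stratified_fixpoint`) is hypothetical.

```python
DOMAIN = ["a", "b", "c"]
FACTS = {("p", "a"), ("p", "b")}
# Each rule is (head predicate, body); a body literal is (predicate, positive?).
# All literals of a rule share the same single variable x.
RULES = [
    ("t", [("p", True)]),     # p(x) => t(x)
    ("q", [("p", False)]),    # not p(x) => q(x)
    ("u", [("q", False)]),    # not q(x) => u(x)
]
STRATUM = {"p": 1, "t": 1, "q": 2, "u": 3}


def step(i, interp):
    """One application of a T_i-like operator on a set of ground atoms."""
    new = set(interp)                                   # lower strata are kept
    new |= {fact for fact in FACTS if STRATUM[fact[0]] == i}
    for head, body in RULES:
        if STRATUM[head] != i:                          # only clauses of stratum i
            continue
        for x in DOMAIN:
            # A negated literal refers to a lower, already saturated stratum.
            if all(((pred, x) in interp) == positive for pred, positive in body):
                new.add((head, x))
    return new


def stratified_fixpoint():
    """Saturate stratum 1, then stratum 2, and so on."""
    interp = set()
    for i in range(1, max(STRATUM.values()) + 1):
        while True:
            nxt = step(i, interp)
            if nxt == interp:
                break
            interp = nxt
    return interp


FIX = stratified_fixpoint()
```

Each inner loop stops because the set of ground atoms is finite and `step` never removes atoms; negated literals are safe to test because their predicates were saturated in an earlier stratum.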

Example 11. Given the finite domain [a, b, c], let us consider the elaborated database:

$$
\Delta = \{\, p(a),\ p(b),\ \forall x\,(p(x) \Rightarrow t(x)),\ \forall x\,(\neg p(x) \Rightarrow q(x)),\ \forall x\,((p(x) \Rightarrow q(x)) \Rightarrow r(x)),\ \forall x\,(\neg q(x) \Rightarrow u(x)) \,\}.
$$

In this database, p and t belong to stratum 1, q and r belong to stratum 2 and u belongs to stratum 3. In Fig. 6 we summarize the pairs that compose the successive iterations of the corresponding fixpoint for each stratum. Note that there are two situations in which the considered clause does not lead to new pairs in the current interpretation: when they are already in it (∗∗), and when it is not possible to find any constraint allowing the body of the clause to be forced (∗). The forcing relation is non-deterministic, therefore the constraint we show for each pair corresponds to a representative of its equivalence class. □

5.3. Soundness and completeness

The fixpoint semantics defined in [19] for HH(C) was proven to be sound and complete with respect to the calculus UC. Now, the coexistence of constraints, negation and implication, as well as the use of HH¬(C) as a database system, have made it necessary to extend UC to UC¬ and to define the concept of stratified interpretation composed of pairs (Atom, Constraint), instead of simply atoms as was done in [19]. The new fixpoint semantics combines stratification techniques with the forcing relation. In this section we prove soundness and completeness of the new fixpoint semantics for HH¬(C) with respect to the extended calculus UC¬. This means that the forcing relation, considering the least fixpoint at the last stratum of a database and a query, coincides with derivability in UC¬. More precisely, if $\mathit{str}(G) = i$, then $(G, C)$ is forced by $\mathit{fix}_i$ in the context of $\Delta$ if and only if $C$ is an answer constraint of $G$ from $\Delta$. Although without negation, any database $\Delta$



and query $G$ have a stratification with only one stratum, and then soundness and completeness are similar to those results for HH(C), the general case is not trivial. For this reason, we present the proof of soundness and completeness in some detail.

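The definitions below introduce a measure of complexity which will be used to perform induction in the proof of soundness. Let $i \ge 1$ and $S_i = \{\langle \Delta, G, C \rangle \in \mathcal{W} \times \mathcal{G} \times \mathcal{SL}_{\mathcal{C}} \mid \mathit{fix}_i, \Delta \Vdash (G, C)\}$. Given any $\langle \Delta, G, C \rangle \in S_i$, Lemma 3 guarantees that the set $S = \{k \ge 0 \mid T_i^k(\mathit{fix}_{i-1}), \Delta \Vdash (G, C)\}$ (see footnote 1) is non-empty. Therefore, it is possible to define $\mathit{ord}(\langle \Delta, G, C \rangle) = \min S$.

Let us consider the partially ordered set $(S_i, <_i)$, where $<_i$ is defined as follows. Given any $\langle \Delta, G, C \rangle, \langle \Delta', G', C' \rangle \in S_i$, $\langle \Delta, G, C \rangle <_i \langle \Delta', G', C' \rangle$ if:

• $\mathit{ord}(\langle \Delta, G, C \rangle) < \mathit{ord}(\langle \Delta', G', C' \rangle)$, or
• $\mathit{ord}(\langle \Delta, G, C \rangle) = \mathit{ord}(\langle \Delta', G', C' \rangle)$ and $G$ is a renaming of a strict subformula of $G'$.

Such partial order is well-founded.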

Proposition 2. For every $\Delta \in \mathcal{W}$, and every pair $(G, C) \in \mathcal{G} \times \mathcal{SL}_{\mathcal{C}}$ such that $\mathit{str}(G) = 1$:

$$
\mathit{fix}_1, \Delta \Vdash (G, C) \iff \Delta; C \vdash_{UC^\neg} G.
$$

Proof. The proof is an adaptation of those presented in [19] to the forcing relation defined now for HH¬(C). Notice that, since we are assuming that $\mathit{str}(G) = 1$, the case $G \equiv \neg A$ need not be considered. □

Now, we deal with the general case.

Proposition 3. For every $i \ge 1$, $\Delta \in \mathcal{W}$, and every pair $(G, C) \in \mathcal{G} \times \mathcal{SL}_{\mathcal{C}}$, such that $G$ does not contain negation, if $\mathit{str}(G) \le i$, then:

$$
\mathit{fix}_i, \Delta \Vdash (G, C) \iff \Delta; C \vdash_{UC^\neg} G.
$$

Proof. The most difficult case to prove is that of $G$ atomic. We show here the proof of this case. The full proof is detailed in Appendix A.

⇒) This implication can be proved by induction on the structural order $(S_i, <_i)$. Let us take $\langle \Delta, G, C \rangle \in S_i$ and assume that, for any other $\langle \Delta', G', C' \rangle \in S_i$, $\langle \Delta', G', C' \rangle <_i \langle \Delta, G, C \rangle$ implies that $\Delta'; C' \vdash_{UC^\neg} G'$. Then, let us conclude $\Delta; C \vdash_{UC^\neg} G$ by case analysis on the structure of $G$. For the case $G \equiv A$:

$\langle \Delta, A, C \rangle \in S_i$ implies that $\mathit{fix}_i, \Delta \Vdash (A, C)$. Let $k = \mathit{ord}(\langle \Delta, A, C \rangle)$, then $T_i^k(\mathit{fix}_{i-1}), \Delta \Vdash (A, C)$, which is equivalent to $(A, C) \in (T_i^k(\mathit{fix}_{i-1}))(\Delta)$. Hence, there is a variant $\forall x(G \Rightarrow A')$ of a clause of $\Delta$ such that the variables $x$ do not occur free in $A$, and $T_i^{k-1}(\mathit{fix}_{i-1}), \Delta \Vdash (\exists x(A \approx A' \wedge G), C)$. In this case, $\langle \Delta, \exists x(A \approx A' \wedge G), C \rangle <_i \langle \Delta, A, C \rangle$, so the induction hypothesis can be applied, obtaining that $\Delta; C \vdash_{UC^\neg} \exists x(A \approx A' \wedge G)$. Using the rule (Clause) with the elaborated clause $\forall x(G \Rightarrow A')$, it follows that $\Delta; C \vdash_{UC^\neg} A$.

⇐) It is proved by induction on the height $h$ of the proof tree for $\Delta; C \vdash_{UC^\neg} G$. The case $G \equiv A$ is an inductive case. We suppose that $\Delta; C \vdash A$ has a proof of height $h$. Let us prove that $\mathit{fix}_i, \Delta \Vdash (A, C)$. Obviously, the rule employed at the bottom of such a proof is (Clause). So there must exist a variant $\forall x(G \Rightarrow A')$ of a clause of $\Delta$ such that $x$ do not occur free in $A$, and that $\Delta; C \vdash \exists x(A \approx A' \wedge G)$ has a proof of height $h - 1$. By induction hypothesis, $\mathit{fix}_i, \Delta \Vdash (\exists x(A \approx A' \wedge G), C)$. Using the definition of the operator $T_i$, the latter implies $(A, C) \in (T_i(\mathit{fix}_i))(\Delta) = \mathit{fix}_i(\Delta)$, then $\mathit{fix}_i, \Delta \Vdash (A, C)$. □

Termination is guaranteed by the previous lemmas if $\Delta$ is stratifiable.

Theorem 1 (Soundness and completeness). For every $i \ge 1$, $\Delta \in \mathcal{W}$, and every pair $(G, C) \in \mathcal{G} \times \mathcal{SL}_{\mathcal{C}}$, if $\mathit{str}(G) \le i$ then:

$$
\mathit{fix}_i, \Delta \Vdash (G, C) \iff \Delta; C \vdash_{UC^\neg} G.
$$

Proof. By induction on $i$. Proposition 2 is the proof of the case $i = 1$. For $i > 1$, assume the induction hypothesis: for every $\Delta$, $G$, $C$, with $\mathit{str}(G) \le i - 1$: $\mathit{fix}_{i-1}, \Delta \Vdash (G, C) \iff \Delta; C \vdash_{UC^\neg} G$.

Proposition 3 corresponds to the proof of every case of $G$ except for $\neg A$. Let us analyze this case: $\mathit{fix}_i, \Delta \Vdash (\neg A, C) \iff$ for every $C'$ such that $(A, C') \in \mathit{fix}_i(\Delta)$, it holds that $C \vdash_{\mathcal{C}} \neg C'$, or there is no such $C'$ and $C \equiv \top$. Obviously, $\mathit{str}(A) \le i - 1$, so the previous sentence is equivalent to saying that for every $C'$ such that $\mathit{fix}_{i-1}, \Delta \Vdash (A, C')$, it holds that $C \vdash_{\mathcal{C}} \neg C'$, or there is no such $C'$ and $C \equiv \top$. Applying the induction hypothesis, this is equivalent to saying that either for every $C'$ such that $\Delta; C' \vdash_{UC^\neg} A$, $C \vdash_{\mathcal{C}} \neg C'$ holds, or there is no such $C'$ and $C \equiv \top$. This is equivalent to $\Delta; C \vdash_{UC^\neg} \neg A$. □

Footnote 1: $I_\perp$ instead of $\mathit{fix}_{i-1}$ for $i = 1$.



As a consequence of this theorem: $(A, C) \in \mathit{fix}(\Delta) \iff \Delta; C \vdash_{UC^\neg} A$. This means that the atoms in the fixpoint of a database are those that can be derived by the calculus.

One of the advantages of the fixpoint semantics over the proof-theoretic one is that the former provides a model for the whole database, while the latter focuses only on the meaning of a query in the context of a database. In [31] a goal solving procedure for HH(C) based on the uniform calculus (without negation) was presented, but the presence of negation in the language makes that procedure inapplicable. The fixpoint semantics can be considered as the formal basis of particular implementations of database systems based on HH¬(C). In fact, the fixpoint of a database definition will correspond to the instance of the database. Once that instance is computed, the forcing relation, on which this semantics is based, provides a formal basis for a goal-oriented computation of answers.

Notice that the previous formalisms are defined for a generic constraint system $\mathcal{C}$ as a black box for which the existence of a solver that checks $\mathcal{C}$-satisfiability has been assumed. Thus, the implementation we have developed for the scheme HH¬(C), based on this semantics, is independent of any particular constraint system.

The rest of the paper is devoted to presenting the implementation of our language as a database system.

6. Introducing a database system based on HH¬(C)

Having stated HH¬(C) as a formal language, this section gives a general description of our proposal for the implementation of a Constraint Deductive Database including HH¬(C) as a query language. We also show some examples of use of the system.

Two main components can be distinguished in the implementation of this query language. One corresponds to the implementation of the fixpoint semantics, which is very close to the theory and is independent of the concrete constraint system (cf. Section 8). The other component corresponds to the implementation of the constraint system (cf. Section 7).

The system is available at https://gpd.sip.ucm.es/trac/gpd/wiki/GpdSystems including a user-oriented manual and a bundle of examples. In the following, the concrete syntax for clauses and queries is quite similar to the usual Prolog syntax, where predicate symbols and constants start with lowercase, whereas variables start with uppercase. In addition, we write not for negation, => for implication, ex(X,G) representing ∃xG, fa(X,G) representing ∀xG, and constr(Dom,C) denoting a constraint C together with its domain. The HH¬(C) system also requires explicit type declarations, type(predicate(dom_1, ..., dom_n)), for predicates.

Although several solvers can be used together within the same database, they cannot be combined for the moment, i.e., constraints of different domains cannot be freely mixed to get a heterogeneous compound constraint. Predicates with arguments of different domains are restricted to those extensionally defined and are only intended for informative purposes. For instance, in the bank database of Example 6, in order to avoid the combination of domains in more complex relations (that would imply the combination of constraint systems during computation), we associate with each client a real value which represents the client identifier:

client_id(smith,1.0). client_id(brown,2.0). client_id(mcandrew,3.0).

The predicates introduced in Example 6 are defined in the same way, but replacing the client names with real values.

6.1. Computation stages

Next, we briefly summarize the stages of computation performed by the system to calculate the fixpoint semantics of a database Delta:

1. Check and infer predicate types (cf. Section 7).
2. Build the dependency graph of Delta (cf. Appendix B).
3. Compute a stratification s for Delta, if there is any. Otherwise, the system throws an error message and stops (cf. Appendix B).
4. If the previous step succeeds, compute fix(Delta) (cf. Section 8).

The system keeps in memory the computed fixpoint fix(Delta), the stratification s consisting of a list of pairs 〈defined_predicate_symbol,stratum〉, and the dependency graph of Delta. Regarding the implementation of the dependency graph, defined in Section 5, new negatively labeled edges have been considered to deal with aggregate functions (cf. Section 7.1) and nested implication (cf. Section 8.1).

For instance, for the database of Example 6, s = [〈client,1〉, 〈pastDue,1〉, 〈mortgageQuote,1〉, 〈debtor,1〉, 〈interestRate,1〉, 〈hasMortgage,1〉, 〈accounting,1〉, 〈client_id,1〉, 〈branch,1〉, 〈newMortgage,2〉, 〈gotMortgage,2〉, 〈personalCredit,3〉]. And fix(Delta) contains the pairs corresponding to the extensional database, such as (client(3.0,5300.0,3000.0), true), as well as the pairs obtained from the intensional database:



• In stratum 1:

(debtor(1.0), true),
(interestRate(2.0,2.0), true),
(interestRate(X,Y), ((X=1.0, Y=5.0); (X=3.0, Y=5.0))),
(accounting(X,Y,Z), ((Y=400.0, Z=1500.0, X=2.0); (Y=100.0, Z=3000.0, X=3.0))),
(hasMortgage(X), (X=2.0; X=3.0))

• In stratum 2:

(newMortgage(X,Y), ((Y=<200.0, X=2.0); (Y=<1100.0, X=3.0))),
(gotMortgage(X), (X=2.0; X=3.0))

• In stratum 3:

(personalCredit(X,Y), ((Y>=6000.0, Y<20000.0, X/=2.0, X/=3.0); (Y<6000.0, X=2.0); (Y<6000.0, X=3.0)))

6.2. Querying

When a query G is submitted at the prompt HHn(C)>, the system computes, if it exists, a new stratification s′ for the set Delta ∪ {G}. If there is no such s′, the query cannot be computed and the system stops; otherwise, the kept fixpoint, computed for Delta, is valid to evaluate the query G. According to the theory, the answer constraint C must satisfy $\mathit{fix}(Delta), Delta \Vdash (G, C)$. This forcing relation is implemented by means of the predicate force(Delta,Stratification,I,G,C), as will be explained in Section 8. This predicate makes use of the current stratification. Let us distinguish two cases, depending on whether the previous stratification s is equal to the new s′ or not:

• If s = s′, then as fix(Delta) was calculated with s, the answer constraint C can be obtained by executing force(Delta, s, fix(Delta), G, C).

• If s ≠ s′, it is because G contains some subgoal of the form D => G’. In this case, the dependency graph of Delta ∪ {G}, from which s′ has been obtained, contains the edges of Delta plus the edges corresponding to the new implications introduced by G. Hence the new stratification s′ is also a valid stratification for Delta, and the same fixpoint fix(Delta) would have been obtained by working with s′. Therefore, as before, the answer constraint C for the submitted query G can be computed by executing force(Delta, s′, fix(Delta), G, C). The information of the stratification is needed, because solving G requires solving the implication subgoals inside it. When some D => G’ is going to be solved, in accordance with the definition of the forcing relation, Delta is locally augmented with D in order to find the answer of G’. Since s′ has been defined taking into account those implications, it is also a stratification of Delta ∪ {D}, which will be used in this local computation.

In summary, in both cases the kept fixpoint fix(Delta) is the interpretation needed to find the answer, which is obtained just by executing:

force(Delta, s′, fix(Delta), G, C).

Continuing with the bank example, the user can ask whether every client belongs to the Madrid office:

HHn(C)> fa(A,branch(mad,A)).

This query does not imply any change in the dependency graph, so it can be solved using the kept fixpoint. A universal quantification over a finite domain is naturally translated into a conjunctive constraint obtained by instantiating the quantified variable with each element in the domain. Then the result is:

Answer: false
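The translation of a universal quantification over a finite domain into a conjunction can be pictured with a short sketch. The branch facts below are invented for illustration (the paper does not list them here), and `forall_finite` is a hypothetical helper, not part of the system.

```python
def forall_finite(domain, holds):
    """Conjunction of holds(v) over every element v of a finite domain."""
    return all(holds(value) for value in domain)


# Hypothetical extensional data: (office, client identifier).
BRANCH = {("mad", 1.0), ("mad", 2.0), ("lon", 3.0)}
CLIENT_IDS = [1.0, 2.0, 3.0]

# Corresponds to instantiating A with each client identifier in turn.
EVERY_CLIENT_IN_MADRID = forall_finite(
    CLIENT_IDS, lambda a: ("mad", a) in BRANCH
)
```

With these made-up facts one client belongs to another office, so the conjunction is false, in line with the answer shown above.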

Dealing with negation, the user can ask for the clients that have no mortgage:

HHn(C)> not(hasMortgage(N)).

This query does not change the stratification and the system again uses the kept fixpoint. Every constraint C associated with a pair (hasMortgage(N),C) in the fixpoint is collected and a negative conjunction is built, which is sent to the solver. Finally, the system returns:



Answer: N/=3.0, N/=2.0

Let us show a contrived query only to illustrate a situation in which the stratification changes:

HHn(C)> newMortgage(N,R) => interestRate(N,R).

This query introduces a new dependency between newMortgage and interestRate, so the second predicate must now be in stratum 2. However, the already computed fixpoint is still valid for calculating the answer, which is:

Answer: (R=2.0, N=2.0); (R=5.0, N=1.0); (R=5.0, N=3.0)

7. Implementing constraint solving

This section focuses on the implementation of constraint solving for several particular constraint systems. We also introduce aggregates in our system, for the first time, as constraint functions belonging to concrete constraint systems.

7.1. Aggregates

Aggregate functions are useful for computing single values from a set of numerical or other-type values. For instance,common predefined functions of relational query languages are the average and the maximum of a numeric attribute. As ourdeductive database with constraints scheme gives a natural analogy of the relational calculus, a crucial question concernswith how to extend standard aggregation constructs from the relational model to our scheme in the context of a constraintdomain. Certain proposals to solve this question have been formulated both in geometric constraint databases (see, e.g.,Chapter 6 of [30]) and deductive database settings [47,55,54,39].

In attempting to add aggregates to constraint query languages, several obstacles come up. Aggregate functions take a set of values as input and return a single value, but the outputs of queries in those languages are constraints which represent intensional answers, not an explicit set of values. In addition, since a constraint answer can represent an infinite set, some aggregate functions, such as count, make no sense.

We have taken advantage of certain aspects of the operational semantics of our database system in order to deal with these problems. On the one hand, the stratified fixpoint computation, designed to support negation, is a good framework in which to incorporate aggregates. The rationale is to ensure monotonicity since, while the fixpoint for a given stratum is being built, the result of an aggregate function taking values in this stratum may change. For instance, the count of tuples corresponding to a relation is only known after the relation has been completely computed. This is guaranteed if the count is computed in a stratum above the stratum of the relation, which can be achieved by introducing a negative dependency, as we will see later in this section. On the other hand, aggregates can be represented as functions of a constraint system, so their computation can be delegated to the corresponding constraint solver.

As a general requirement for computing an aggregate function over a predicate p, each pair (p(x1, . . . , xn), C) in the current interpretation must be such that C restricts each variable x1, . . . , xn to a single value. Otherwise, the system is not able to compute it. Our supported aggregates include count, ranging over an atom, and sum (summation), avg (average), min (minimum), and max (maximum), ranging over an atom and a program variable. Implementation details are given in Appendix C.2.1.

Example 12. For instance, the view:

liquid(Amount) :- constr(real,Amount=sum(client(N,B,S),B))

defined for Example 6, computes the liquid assets as the cumulative sum of the balances of all the clients, by including the aggregate sum in a constraint expression. The average salary can be specified by:

avg_salary(Average) :- constr(real,Average=sum(client(N1,B1,S1),S1)/count(client(N2,B2,S2))).

or directly with the aggregate avg:

avg_salary(Average) :- constr(real,Average=avg(client(N,B,S),S)).

The richness of the database language HH¬(C) makes it possible to calculate an aggregate function under an assumption which changes the values used in the aggregation. For instance, consider the following view:

view(X) :- pastDue(brown,200.0) => constr(real,X=sum(pastDue(N,A),A)).

It assumes that client Brown has a past due and then computes the sum of all the past dues in the database. Adding this clause to the current database, the query

HHn(C)> view(X)

has an answer X=3300. □

Since constraints can now contain aggregate functions, and aggregates include defined predicates, additional considerations must be taken into account in order to compute them. Computing aggregate functions requires that the involved predicates are completely known, i.e., the part of the fixpoint corresponding to those predicates has been fully computed. Similarly to what happens with a clause containing a negated atom in its body, if a clause defining a predicate p contains an aggregate over a predicate q, the computation of q must be finished when the computation of p begins. This condition is easily achieved by introducing a negative dependency from q to p, which guarantees s(q) < s(p) in the stratification. For the clauses of Example 12, we get s(client) < s(liquid), s(client) < s(avg_salary), and s(pastDue) < s(view). Full details on the dependency graph construction and stratification can be found in Appendix B.
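
As a rough illustration of the groundness requirement and of aggregation under an assumption, the following Python sketch evaluates an aggregate over an interpretation in which every pair binds each variable to a single value. The representation (a list of predicate/binding pairs), the function name aggregate and the sample balances are hypothetical and are not taken from the database of Example 6.

```python
def aggregate(interp, predicate, fn, var=None):
    """Evaluate count/sum/avg/min/max over the pairs of `predicate`.

    Assumption: each pair is (predicate, binding), where binding maps every
    variable of the atom to a single value, or to None if it is unrestricted.
    """
    rows = []
    for name, binding in interp:
        if name != predicate:
            continue
        if any(value is None for value in binding.values()):
            # The constraint does not fix every variable: not computable.
            raise ValueError(f"non-ground pair for {predicate}")
        rows.append(binding)
    if fn == "count":
        return len(rows)
    values = [row[var] for row in rows]
    if fn == "sum":
        return sum(values)
    if fn == "avg":
        return sum(values) / len(values)
    if fn == "min":
        return min(values)
    if fn == "max":
        return max(values)
    raise ValueError(f"unsupported aggregate {fn}")


# Hypothetical, fully computed lower stratum for client(N,B).
clients = [("client", {"N": "a", "B": 10.0}),
           ("client", {"N": "b", "B": 20.0})]
total = aggregate(clients, "client", "sum", "B")

# Aggregating under an assumption: the assumed fact is added locally.
assumed = clients + [("client", {"N": "c", "B": 30.0})]
total_assumed = aggregate(assumed, "client", "sum", "B")
```

The sketch presupposes that the aggregated predicate has already been completely computed, which is exactly what the negative dependency from q to p guarantees.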

7.2. Constraint systems and solvers

We have proposed three constraint systems as possible instances C of the scheme HH¬(C): Boolean, Reals, and Finite Domains, where Finite Domains represents a family of specific constraint systems ranging over denumerable sets. Enumerated types are included as well as (finite) integer numeric types. Our constraint systems include the concrete syntax for the required values, symbols, connectives, and quantifiers as follows: “true”, “false”, “=”, “,”, “not” and “ex(X,C)”; in addition, we also include “;”, “/=”, “>”, “>=”, all of them with the usual meaning. Numeric constraints include arithmetic operators (such as “+”, “-”, . . .) and constraint functions (such as “abs”, . . .). Moreover, Boolean and Finite Domain constraints admit the universal quantifier “fa(X,C)”. The Finite Domain incorporates the particular domain constraint “X in Range”, where Range is a subset of data values built with both V1..V2, which denotes the set of values in the closed interval between V1 and V2, and R1\/R2, which denotes the union of ranges.

We have incorporated a simple type checking and inference mechanism in the system. It expects a type declaration for each predicate symbol in the database and warns about missing declarations or misleading use of predicates. Types for queries are inferred from the type information of the database. In addition to the well-known benefits of using a typed language, we use type information to determine the constraint system a constraint belongs to.

We have considered the entailment relation of classical logic with equality for every constraint system. This entailment satisfies the minimal condition imposed on constraint systems. For implementing this relation, we provide a constraint solver with a generic interface solve(I,C,SC) for $C \vdash_{\mathcal{C}} SC$, intended to solve a constraint C, i.e., to produce a solved form SC if it is satisfiable, or ⊥ otherwise. A solved form SC corresponding to a constraint C is a simplified, more readable form of the constraint w.r.t. C. A solved form is either a simple constraint or a disjunction of simple constraints, where a simple constraint is a constraint that includes neither disjunctions, nor quantifications, nor negations. The generic interface solve for solving constraints is as follows:

solve(+Interpretation,+Constraint,-SolvedConstraint)

which solves Constraint as SolvedConstraint under Interpretation, where the interpretation is included so that aggregates can be computed.

For implementing the constraint systems, instead of starting completely from scratch, we rely on the underlying constraint solvers already available in SWI-Prolog [53,50]. In this way, we develop a solver layer, built over these underlying (simpler) solvers, which is capable of solving all the constraints in such (more complex) C-instances. As an example to illustrate such an implementation, let us consider the C-instance Finite Domains. Let us call the constraints supported by the underlying solver primitive constraints. On the one hand, certain constraints of the constraint system Finite Domains can be mapped to primitive constraints. This mapping involves relating enumerated data values (non-integer in general) in such a constraint system with integers in the underlying solver. Before posting to this solver, a constraint is rewritten with the mapped integer values and, after solving, solved constraints in the constraint store are rewritten back with the corresponding enumerated values. On the other hand, there are constraints in the C-instance that the underlying solver cannot directly solve (quantifiers and disjunctions). For them we develop specific constraint solving as shown in Appendix C.3.
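
The rewriting of enumerated values to integers and back can be pictured with the following Python sketch. It is a toy model, not the actual solver layer: the underlying finite domain solver is replaced by brute-force enumeration, only "=" and "/=" on a single variable are handled, and the function name solve_enum is hypothetical.

```python
def solve_enum(domain, constraints):
    """Solve (op, value) constraints on one variable over an enumerated domain.

    Enumerated values are mapped to integers before solving and mapped back
    afterwards, mirroring the rewriting step described in the text.
    """
    to_int = {value: index for index, value in enumerate(domain)}
    to_val = {index: value for value, index in to_int.items()}

    # Rewrite the constraints with the mapped integer values.
    encoded = [(op, to_int[value]) for op, value in constraints]

    def satisfied(n):
        # Stand-in for the underlying integer solver: check each constraint.
        return all((n == k) if op == "=" else (n != k) for op, k in encoded)

    solutions = [n for n in range(len(domain)) if satisfied(n)]

    # Rewrite the solved values back to the enumerated type.
    return [to_val[n] for n in solutions]


colours = ["red", "green", "blue"]
print(solve_enum(colours, [("/=", "red"), ("/=", "blue")]))  # prints ['green']
```

A real implementation would post the encoded constraints to the finite domain library and decode the resulting constraint store instead of enumerating values.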

Constraint solving is expected to terminate since solver operations are always monotonic (constraints are added, possibly pruning the search space, but not removed) and enumeration (universal quantification) is only provided on finite domains. Completeness for finite domains is related to completeness of the underlying finite domain solver.

8. Implementing the fixpoint semantics

In this section we summarize the main ideas that guide the implementation of the core system: the fixpoint computation. This implementation is very close to the theoretical framework developed in previous sections, but there are some critical points, regarding nested implication, where we must take pragmatic decisions. We will provide a general view of the implementation and give the relevant details for nested implications. The constraint solvers are used as black boxes with the appropriate interface predicate solve.

In this section we assume a stratifiable database Δ, with a stratification that has been previously computed and stored as an association list Stratification. The fixpoint is then computed stratum by stratum (although a stratum may require computing the fixpoint of a previous stratum for the database locally enlarged by nested implications, as we will explain in Section 8.1). The predicate

fixPointStrat(+Delta, +Stratification, +St, -Fix)

computes Fix = $\mathit{fix}_{\mathtt{St}}(\mathtt{Delta})$, using Stratification. Then, if Delta represents a database such that $str(\mathtt{Delta}) = k$, this predicate gives $\mathit{fix}_k(\mathtt{Delta})$ by computing the previous fixpoints from stratum St = 0 to St = k.
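
The stratum-by-stratum iteration can be sketched in Python for the much simpler setting of ground rules with stratified negation. The sketch assumes no constraints and no nested implications, represents atoms as tuples and rules as (head, positive body, negative body) triples, and starts counting strata at 1; the function name fix_point_strat only echoes the predicate above and is not its implementation.

```python
def fix_point_strat(rules, stratification, top):
    """Compute the fixpoint up to stratum `top` for ground rules.

    rules: list of (head, positive_atoms, negated_atoms); atoms are tuples
    whose first component is the predicate symbol.
    stratification: dict mapping each predicate symbol to its stratum.
    """
    interp = set()
    for stratum in range(1, top + 1):
        layer = [r for r in rules if stratification[r[0][0]] == stratum]
        changed = True
        while changed:  # iterate the operator of this stratum to a fixpoint
            changed = False
            for head, positive, negated in layer:
                if head in interp:
                    continue
                # Negated atoms belong to lower strata, already complete.
                if all(a in interp for a in positive) and \
                        all(a not in interp for a in negated):
                    interp.add(head)
                    changed = True
    return interp


# Hypothetical ground program: noMortgage(X) :- client(X), not hasMortgage(X).
rules = [
    (("client", "brown"), [], []),
    (("client", "smith"), [], []),
    (("hasMortgage", "brown"), [], []),
    (("noMortgage", "brown"), [("client", "brown")], [("hasMortgage", "brown")]),
    (("noMortgage", "smith"), [("client", "smith")], [("hasMortgage", "smith")]),
]
strata = {"client": 1, "hasMortgage": 1, "noMortgage": 2}
fix2 = fix_point_strat(rules, strata, 2)
```

Because noMortgage sits in stratum 2, the negated atoms are tested only after hasMortgage has been completely computed.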

Each fixpoint is evaluated by iterating the fixpoint operator following Definition 7, which relies on the forcing relation, implemented by means of the predicate force.

Given $I = T_i^n(\mathit{fix}_{i-1})(\mathtt{Delta})$, for some $n \geq 0$ and a fixed stratum $i > 0$, force is successful if

$T_i^n(\mathit{fix}_{i-1}), \mathtt{Delta} \Vdash (G, C)$.

This predicate is implemented in a deterministic way, by making a case distinction on the syntax of the goal G to be forced (see Fig. 9 in Appendix D). But the theory still contains another source of non-determinism: the definition of $\Vdash$ establishes conditions on a constraint C in order to satisfy $I, \mathtt{Delta} \Vdash (G, C)$, and the predicate force must build a concrete constraint C in a deterministic way. In addition, each possible answer constraint for a goal must be captured in a single answer constraint (possibly) using disjunctions. The concrete Prolog code for the predicate force is presented and explained in Appendix D. Next we will explain the case of implication, which is the farthest from the theory.

8.1. The case of D => G in the forcing relation

Implementing force(Delta,I,(D => G),C) requires some special treatment. In this case, according to the definition of the relation $\Vdash$ (see Definition 6), Delta is augmented with the clause D. At this point, the current set I has been computed w.r.t. the database Delta. Then, if $i$ and $n$ are, respectively, the stratum and iteration under construction, $(A, C) \in I$ implies $(A, C) \in T_i^n(I')(\mathtt{Delta})$, where $I'$ is the fixpoint for the stratum $i-1$, built from Delta. As stated in the theory, the next step will be to prove $T_i^n(I'), \mathtt{Delta} \cup \{D\} \Vdash (G, C)$. But the question is how to compute $T_i^n(I')(\mathtt{Delta} \cup \{D\})$.

Notice that I is not useful here. First, because $I(\Delta) \subseteq I(\Delta \cup \{D\})$ does not hold for every $I$, $\Delta$, $D$. Second, because I has always been built considering Delta. In particular, the fixpoint $I'$ has been computed for Delta, and so it represents $\mathit{fix}_{i-1}(\mathtt{Delta})$. So nothing is known about the needed set $T_i^n(I')(\mathtt{Delta} \cup \{D\})$.

The fact is that the definition of the fixpoint operator $T_i$ is not constructive for the case of implication, due to the increase of the set of clauses. To overcome this obstacle, we have adopted a conservative position. Let StG = max{St | ⟨p, St⟩ ∈ Stratification, p a predicate symbol in G}. Then, the fixpoint of the stratum StG for $\mathtt{Delta} \cup \{D\}$ is locally computed, and finally it is checked whether $\mathit{fix}_{\mathtt{StG}}, \mathtt{Delta} \cup \{D\} \Vdash (G, C)$.

This solution causes the following problem. Consider a clause in Delta of the form A :- D => G, such that $i = str(A)$. From Definition 3, StG ≤ $i$ can be deduced. During the computation of $\mathit{fix}_i(\mathtt{Delta})$, the fixpoint operator takes this clause into account in order to look for a pair (A,C) to be added to the current I. Following Definition 7, $I, \mathtt{Delta} \Vdash (\exists x(A \approx A' \wedge (D \Rightarrow G)), C)$ must be proved; then, after the existential quantifiers are eliminated, the predicates

force(Delta,Stratification,I,A≈ A′,C), and

force(Delta,Stratification,I,(D => G),C)

will be executed (up to variable renaming). The second force will call

fixPointStrat(Delta1,Stratification,StG,Fix),

where Delta1 = Delta ∪ {D} (modulo elaboration and variable renaming). If StG = $i$, this means building $\mathit{fix}_i(\mathtt{Delta1})$, so the clause A :- D => G will be tried again, because the stratum of A is $i$. This gives rise to a non-terminating loop, since force(Delta1,Stratification,I,(D => G),C) will be executed and Delta1 will be augmented with the elaboration of D once more, obtaining Delta2, which is again augmented with the same clause, and so on. However, if StG < $i$, then Fix = $\mathit{fix}_{\mathtt{StG}}(\mathtt{Delta1})$ can be correctly built, because A :- D => G is not considered in strata lower than $i$.

The target StG < str(A) would be ensured if the predicate with maximum stratum in G negatively depends on the predicate symbol of A. This is achieved by negatively labeling the edges from the defined predicate symbols of G to the predicate symbol of A. Although these new dependencies impose additional syntactical conditions on databases to be stratifiable, the implementation remains sound w.r.t. the stratified fixpoint semantics defined in Section 5.

The following example sketches the computation of the fixpoint for a predicate defined using an embedded implication.

Example 13. Let us consider the following database Delta:

q(a).
q(b).
r(c).
p(X) :- q(X) => r(X).

According to the previous explanations, if we consider a Stratification where all the predicates p, q and r are in stratum 1, then we would have the following infinite sequence of calls:

fixPointStrat(Delta,Stratification,1,Fix)

fixPointStrat(Delta∪{q(X)},Stratification,1,Fix)

fixPointStrat(Delta∪{q(X)}∪{q(X)},Stratification,1,Fix)

...

But, with the current definition of the dependency graph, the Stratification raises p to stratum 2, while q and r are in stratum 1. For the first stratum, fixPointStrat(Delta,Stratification,1,Fix1) gets Fix1={(q(X),X=a), (q(X),X=b), (r(X),X=c)}, because p(X) :- q(X) => r(X) is not considered here. For the second (and last) stratum we have the call

fixPointStrat(Delta,Stratification,2,Fix2)

Fix2 will be built from Fix1 by introducing pairs related to p. In the first iteration of the fixpoint operator, and as we have remarked above, the clause defining p will require executing

force(Delta,Stratification,Fix1,q(X) => r(X),C),

in order to calculate C and then introduce a pair (p(X), C) into Fix1. Two main steps can be distinguished in this execution (see Appendix D for details):

1. Extending the database Delta with q(X), obtaining Delta1, and locally evaluating the fixpoint of stratum 1 (the stratum of r) for the extended database Delta1. This is achieved by calling

fixPointStrat(Delta1,Stratification,1,Fix1’).

Since p is not considered now in the stratum 1, Fix1’ can be correctly calculated as

Fix1’= {(q(X),true), (q(X),X=a), (q(X),X=b), (r(X),X=c)}.

2. Forcing the goal r(X) with this new fixpoint, by calling

force(Delta1,Stratification,Fix1’,r(X),C)

which computes the constraint C as X=c.

Then (p(X), X=c) is added to the previous interpretation. Since no more pairs can be added,

Fix2 = {(p(X),X=c), (q(X),X=a), (q(X),X=b), (r(X),X=c)}

is obtained as the final fixpoint for Delta we were looking for. □

We finish this section with an example illustrating how to obtain a dependency graph in which, as explained before, some negatively labeled edges are generated due to an embedded implication.

Example 14. Consider the clause:

$D \equiv \forall x\,(G \Rightarrow p(x))$, where $G \equiv \exists y\,(q(x, y) \Rightarrow (r(x) \wedge u(y))) \wedge \neg t(x)$.

From $G$ the edges $q \to r$ and $q \to u$ are added to the dependency graph. Now $G$ must be connected with the outer context of the clause $D$, i.e., the predicate symbols $q$, $r$, $u$, $t$ have to be connected with $p$. This introduces a first edge $q \to p$, and the rest of the edges will be negatively labeled: $t$ occurs explicitly negated and $r$, $u$ occur in a nested implication. So the edges $r \xrightarrow{\neg} p$, $u \xrightarrow{\neg} p$, $t \xrightarrow{\neg} p$ are introduced. Then, collecting all the edges, the dependency graph for $D$ is:

$\{q \to r,\ q \to u,\ q \to p,\ r \xrightarrow{\neg} p,\ u \xrightarrow{\neg} p,\ t \xrightarrow{\neg} p\}$.

According to Definition 2, a stratification for $D$ is any mapping $s$ from $\{p, q, r, u, t\}$ to natural numbers satisfying $s(q) \leq s(r)$, $s(q) \leq s(u)$, $s(q) \leq s(p)$, $s(r) < s(p)$, $s(u) < s(p)$, $s(t) < s(p)$. For instance, $s(p) = 2$ and $s(q) = s(r) = s(u) = s(t) = 1$. This way, a database with just this clause is stratifiable. Intuitively, this means that for evaluating $p$, the rest of the predicates should be evaluated before; in particular $q$, which takes part in a nested implication.

But if we add the clause $D' \equiv \forall x \forall y\,(p(x) \Rightarrow q(x, y))$, the edge $p \to q$ is generated, and we would also have the inequality $s(p) \leq s(q)$. Together with $s(q) \leq s(r) < s(p)$ this yields $s(p) < s(p)$, so the system of inequalities does not have any solution, i.e., the augmented database is not stratifiable. □

The complete specification of the dependency graph used in the implementation (including the additional negatively labeled edges introduced in this section and those introduced by aggregates explained in Section 7.1) and the stratification algorithm can be found in Appendix B.

9. Conclusions and future work

In this work we have presented the formalization and implementation of the constraint logic programming scheme HH¬(C) as an expressive deductive database language. The main additions of this language w.r.t. the existing ones come from the fact that the intuitionistic logic of hereditary Harrop formulas on which it is based supports implication and universal quantification in goals. This makes it possible to manage hypothetical queries and encapsulation of variables. In addition, HH¬(C) incorporates the benefits of using constraints. The proposal includes within the same language these features together with stratified negation and a flexible architecture for different constraint systems. In fact, the framework is independent of the concrete constraint system. In particular, we have provided Boolean, Real, and Finite Domains constraint systems. Also, a limited combination of constraints from different constraint systems is allowed. As a result, we obtain a very powerful database language with a well-formalized theoretical framework and a prototype system that implements it.

HH¬(C) is a mature proposal from the theoretical point of view, and the prototype shows its viability. But the prototype presented in this work must be enhanced to turn it into a practical system. The current implementation is very close to the theory and is a valuable tool for understanding and developing such a theory, but as a consequence it has an expected penalty in efficiency.

A first source of inefficiency comes from the forcing of implication, which dynamically changes the database. This involves the local computation of the fixpoint for augmented databases, which forces computations to restart from lower strata. Two important improvements oriented to reducing the number of extra computations can be made. First, when solving $D \Rightarrow G$, where $str(D) = i$, $D$ is included into the current database $\Delta$ in order to solve $G$. However, the fixpoint of the augmented database need not be computed from scratch, but from the known $\mathit{fix}_{i-1}(\Delta)$, which in fact is equal to $\mathit{fix}_{i-1}(\Delta \cup \{D\})$. Second, for computing the answer constraint associated to $G$, only the semantics of the predicates on which $G$ depends is necessary. These predicates can be identified from the dependency graph, so only the fixpoint for a fragment of the database, corresponding to the clauses defining such predicates, must be computed.

The bottom-up computation by strata that we have chosen as computation mechanism offers a number of benefits, as we have explained, but it is also a second source of inefficiency. In this line, well-known methods such as magic set transformations [6] and tabling [48] could be worth adapting to the current implementation. In this way, a more efficient top-down-driven bottom-up computation can be achieved, as it is guided by goals. This is also related to widening the set of computable queries and databases, which could relax our stratification restrictions. In particular, by using tabling we can avoid the incorporation of negative dependencies to deal with implications in the body of a clause. The idea for adapting tabling to our scheme is to memoize not only answers but also assumed clauses.

In addition, efficiency can be improved if some fixpoint components are mapped to relational tables; in this way, some operations can be solved with SQL queries, taking advantage of the performance and memory scalability of existing relational technology.

There are several interesting extensions that can be incorporated into the system, such as strong integrity constraints. These are useful in a number of circumstances, as in Example 6, where it is assumed that each client has, at most, one mortgage quote. This condition could be expressed more naturally as an integrity constraint than as an assumption. A first approach to this issue has already been addressed in [3], where we sketched how to support primary keys, foreign keys and functional dependencies in HH¬(C). Another possible extension refers to aggregate functions. We have imposed the requirement that every atom over which the aggregate works must be ground in the interpretation. While this restriction is quite natural and responds to the most common use of aggregates, the possibility of dropping such a restriction and its practical applications can be studied.

Acknowledgements

This work has been partially supported by the Spanish projects STAMP (TIN2008-06622-C03-01), Prometidos-CM (S2009TIC-1465) and GPD (UCM-BSCH-GR35/10-A-910502). Also thanks to Jan Wielemaker for providing SWI-Prolog [53] and Markus Triska [50] for both providing its FD library and adding new features we needed. Finally, we would like to express our thanks to the anonymous referees who really helped us in improving the paper.

Appendix A. Proofs of Section 5

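Lemma 2. Let $i \geq 1$ and $I_1, I_2 \in \mathcal{I}_i$ such that $I_1 \sqsubseteq_i I_2$. Then, for any $\Delta \in \mathcal{W}$ and $(G, C) \in \mathcal{G} \times \mathcal{SL}_{\mathcal{C}}$, if $I_1, \Delta \Vdash (G, C)$, then $I_2, \Delta \Vdash (G, C)$.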

Proof. The proof is inductive on the structure of G .

($C'$) This case is trivial by the definition of $\Vdash$.

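($A$) $I_1, \Delta \Vdash (A, C) \iff (A, C) \in I_1(\Delta)$. In fact $(A, C) \in [I_1(\Delta)]_j$, $1 \leq j \leq i$. Then, since $I_1 \sqsubseteq_i I_2$ implies that $[I_1(\Delta)]_j \subseteq [I_2(\Delta)]_j \subseteq I_2(\Delta)$, $(A, C) \in I_2(\Delta)$ and therefore $I_2, \Delta \Vdash (A, C)$.

($\neg A$) If $I_1, \Delta \Vdash (\neg A, C)$, then $C \vdash_{\mathcal{C}} \neg C'$ for every $C'$ such that $(A, C') \in I_1(\Delta)$, or there is no such $C'$ and $C \equiv \top$. Since $str(\neg A) \leq i$, obviously $str(A) = j$, for some $j < i$. Then $[I_2(\Delta)]_j = [I_1(\Delta)]_j$, because $I_1 \sqsubseteq_i I_2$, and $I_1, \Delta \Vdash (\neg A, C)$ is equivalent to $I_2, \Delta \Vdash (\neg A, C)$.

($G_1 \wedge G_2$) If $I_1, \Delta \Vdash (G_1 \wedge G_2, C)$, then $I_1, \Delta \Vdash (G_1, C)$ and $I_1, \Delta \Vdash (G_2, C)$. In both cases the induction hypothesis can be used (notice that the strata of $G_1$ and $G_2$ are less than or equal to $i$), so $I_2, \Delta \Vdash (G_1, C)$ and $I_2, \Delta \Vdash (G_2, C)$, which implies that $I_2, \Delta \Vdash (G_1 \wedge G_2, C)$.

($G_1 \vee G_2$) $I_1, \Delta \Vdash (G_1 \vee G_2, C) \iff$ there is $k \in \{1, 2\}$ such that $I_1, \Delta \Vdash (G_k, C)$. By induction hypothesis, $I_2, \Delta \Vdash (G_k, C)$, hence $I_2, \Delta \Vdash (G_1 \vee G_2, C)$.

($D \Rightarrow G'$) $I_1, \Delta \Vdash (D \Rightarrow G', C) \iff I_1, \Delta \cup \{D\} \Vdash (G', C)$. Then, by induction hypothesis, $I_2, \Delta \cup \{D\} \Vdash (G', C)$ holds, so $I_2, \Delta \Vdash (D \Rightarrow G', C)$.

($C' \Rightarrow G'$) $I_1, \Delta \Vdash (C' \Rightarrow G', C) \iff I_1, \Delta \Vdash (G', C \wedge C')$. $str(G') = str(C' \Rightarrow G')$, so $str(G') \leq i$. Hence, by induction hypothesis, $I_2, \Delta \Vdash (G', C \wedge C')$ holds, which implies that $I_2, \Delta \Vdash (C' \Rightarrow G', C)$.

($\exists x G'$) $I_1, \Delta \Vdash (\exists x G', C) \iff I_1, \Delta \Vdash (G'[y/x], C')$, where $y$ does not occur free in $\Delta$, $\exists x G'$, $C$, and $C \vdash_{\mathcal{C}} \exists y C'$. By induction hypothesis $I_2, \Delta \Vdash (G'[y/x], C')$, hence $I_2, \Delta \Vdash (\exists x G', C)$.

($\forall x G'$) $I_1, \Delta \Vdash (\forall x G', C) \iff I_1, \Delta \Vdash (G'[y/x], C)$, where $y$ does not occur free in $\Delta$, $\forall x G'$, $C$. By induction hypothesis $I_2, \Delta \Vdash (G'[y/x], C)$, therefore $I_2, \Delta \Vdash (\forall x G', C)$. □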

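Lemma 3. Let $i \geq 1$ and let $\{I_n\}_{n \geq 0}$ be a denumerable family of interpretations over the stratum $i$, such that $I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \cdots$. Then, for any $\Delta$, $G$ and $C$,

$\bigsqcup_{n \geq 0} I_n, \Delta \Vdash (G, C) \iff$ there exists $k \geq 0$ such that $I_k, \Delta \Vdash (G, C)$.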

Proof. In order to simplify the notation we write $I$ for $\bigsqcup_{n \geq 0} I_n$. The implication from right to left is a consequence of Lemma 2, since $I_k \sqsubseteq_i I$ holds for any $k$. The converse is proved using the result of Lemma 1 ($I(\Delta) = \bigcup_{n \geq 0} I_n(\Delta)$), by induction on the structure of $G$.

($C$) $I, \Delta \Vdash (C, C') \iff I_k, \Delta \Vdash (C, C')$ is true independently of $k \geq 0$.

($A$) $I, \Delta \Vdash (A, C) \iff (A, C) \in I(\Delta) = \bigcup_{n \geq 0} I_n(\Delta)$. Therefore, there exists $k \geq 0$ such that $(A, C) \in I_k(\Delta)$, hence, for that $k$, $I_k, \Delta \Vdash (A, C)$.

($\neg A$) $I, \Delta \Vdash (\neg A, C) \iff$ for every $C'$ such that $I, \Delta \Vdash (A, C')$, $C \vdash_{\mathcal{C}} \neg C'$, or there is no such $C'$. We are assuming that $str(\neg A) \leq i$, so $j = str(A) < i$. $I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \cdots$ implies that $[I_0(\Delta)]_j = [I_1(\Delta)]_j = \cdots = [\bigcup_{n \geq 0} I_n(\Delta)]_j = [I(\Delta)]_j$. So for any $k \geq 1$, $I_k, \Delta \Vdash (\neg A, C)$.

($G_1 \wedge G_2$) $I, \Delta \Vdash (G_1 \wedge G_2, C) \iff I, \Delta \Vdash (G_j, C)$, for each $j \in \{1, 2\}$. In both cases the induction hypothesis can be used, so there exist $k_1, k_2 \geq 0$ such that $I_{k_j}, \Delta \Vdash (G_j, C)$ for each $j \in \{1, 2\}$. Let $k = \max(k_1, k_2)$. Then $I_k, \Delta \Vdash (G_j, C)$ for each $j \in \{1, 2\}$ by virtue of Lemma 2, and hence $I_k, \Delta \Vdash (G_1 \wedge G_2, C)$.

($G_1 \vee G_2$) $I, \Delta \Vdash (G_1 \vee G_2, C) \iff$ there is $j \in \{1, 2\}$ such that $I, \Delta \Vdash (G_j, C)$. The induction hypothesis can be used, so there exists $k \geq 0$ such that $I_k, \Delta \Vdash (G_j, C)$, and therefore $I_k, \Delta \Vdash (G_1 \vee G_2, C)$.

($D \Rightarrow G'$) $I, \Delta \Vdash (D \Rightarrow G', C) \iff I, \Delta \cup \{D\} \Vdash (G', C)$. Then there is $k \geq 0$ such that $I_k, \Delta \cup \{D\} \Vdash (G', C)$, by induction hypothesis. Therefore there is $k \geq 0$ such that $I_k, \Delta \Vdash (D \Rightarrow G', C)$, by definition of the relation $\Vdash$.

($C' \Rightarrow G'$) $I, \Delta \Vdash (C' \Rightarrow G', C) \iff I, \Delta \Vdash (G', C \wedge C')$. Then there is $k \geq 0$ such that $I_k, \Delta \Vdash (G', C \wedge C')$, by induction hypothesis, which means that there is $k \geq 0$ such that $I_k, \Delta \Vdash (C' \Rightarrow G', C)$.

($\forall x G'$) $I, \Delta \Vdash (\forall x G', C) \iff$ there is a variable $y$ that does not occur free in $\Delta$, $C$, $\forall x G'$, such that $I, \Delta \Vdash (G'[y/x], C)$. By induction hypothesis, there exists $k \geq 0$ such that $I_k, \Delta \Vdash (G'[y/x], C)$. Hence $I_k, \Delta \Vdash (\forall x G', C)$ for some $k \geq 0$.

($\exists x G'$) $I, \Delta \Vdash (\exists x G', C) \iff$ there is a variable $y$ that does not occur free in $\Delta$, $C$, $\exists x G'$, and a constraint $C'$, such that $I, \Delta \Vdash (G'[y/x], C')$ and $C \vdash_{\mathcal{C}} \exists y C'$. By induction hypothesis, there exists $k \geq 0$ such that $I_k, \Delta \Vdash (G'[y/x], C')$. Hence, by definition of $\Vdash$, $I_k, \Delta \Vdash (\exists x G', C)$ for some $k \geq 0$. □

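Proposition 3. For every $i \geq 1$, $\Delta \in \mathcal{W}$, and every pair $(G, C) \in \mathcal{G} \times \mathcal{SL}_{\mathcal{C}}$ such that $G$ does not contain negation, if $str(G) \leq i$, then:

$\mathit{fix}_i, \Delta \Vdash (G, C) \iff \Delta; C \vdash_{\mathcal{UC}^{\neg}} G$.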

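Proof. ⇒) This implication can be proved by induction on the structural order $(S_i, <_i)$. Let us take $\langle \Delta, G, C \rangle \in S_i$ and assume that, for any other $\langle \Delta', G', C' \rangle \in S_i$, $\langle \Delta', G', C' \rangle <_i \langle \Delta, G, C \rangle$ implies that $\Delta'; C' \vdash_{\mathcal{UC}^{\neg}} G'$. Then, let us conclude $\Delta; C \vdash_{\mathcal{UC}^{\neg}} G$ by case analysis on the structure of $G$.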

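$C' \in \mathcal{SL}_{\mathcal{C}}$: If $\langle \Delta, C', C \rangle \in S_i$ then $C \vdash_{\mathcal{C}} C'$, therefore $\Delta; C \vdash_{\mathcal{UC}^{\neg}} C'$ by (C).

$A \in At$: $\langle \Delta, A, C \rangle \in S_i$ implies that $\mathit{fix}_i, \Delta \Vdash (A, C)$. Let $k = ord(\langle \Delta, A, C \rangle)$; then $T_i^k(\mathit{fix}_{i-1}), \Delta \Vdash (A, C)$, which is equivalent to $(A, C) \in (T_i^k(\mathit{fix}_{i-1}))(\Delta)$. Hence, there is a variant $\forall x (G \Rightarrow A')$ of a clause of $\Delta$ such that the variables $x$ do not occur free in $A$, and $T_i^{k-1}(\mathit{fix}_{i-1}), \Delta \Vdash (\exists x (A \approx A' \wedge G), C)$. In this case, $\langle \Delta, \exists x (A \approx A' \wedge G), C \rangle <_i \langle \Delta, A, C \rangle$, so the induction hypothesis can be applied, obtaining that $\Delta; C \vdash_{\mathcal{UC}^{\neg}} \exists x (A \approx A' \wedge G)$. Using the rule (Clause) with the elaborated clause $\forall x (G \Rightarrow A')$, it follows that $\Delta; C \vdash_{\mathcal{UC}^{\neg}} A$.

$G_1 \wedge G_2$: Then $\langle \Delta, G_1 \wedge G_2, C \rangle \in S_i$ implies that $\mathit{fix}_i, \Delta \Vdash (G_k, C)$ for each $k \in \{1, 2\}$. $G_1$, $G_2$ are strict subformulas of $G_1 \wedge G_2$, hence $\langle \Delta, G_k, C \rangle <_i \langle \Delta, G_1 \wedge G_2, C \rangle$, for each $k \in \{1, 2\}$. Then, by the induction hypothesis, $\Delta; C \vdash_{\mathcal{UC}^{\neg}} G_k$, for each $k \in \{1, 2\}$. So $\Delta; C \vdash_{\mathcal{UC}^{\neg}} G_1 \wedge G_2$, applying the ($\wedge$) rule.

$G_1 \vee G_2$: Similar to the previous case.

$D \Rightarrow G$: Now $\langle \Delta, D \Rightarrow G, C \rangle \in S_i$ implies that $\mathit{fix}_i, \Delta \cup \{D\} \Vdash (G, C)$. Clearly, $ord(\langle \Delta, D \Rightarrow G, C \rangle) = ord(\langle \Delta \cup \{D\}, G, C \rangle)$ and $G$ is a strict subformula of $D \Rightarrow G$, so $\langle \Delta \cup \{D\}, G, C \rangle <_i \langle \Delta, D \Rightarrow G, C \rangle$. Therefore, by the induction hypothesis, $\Delta, D; C \vdash_{\mathcal{UC}^{\neg}} G$. Thanks to the rule ($\Rightarrow$), it follows that $\Delta; C \vdash_{\mathcal{UC}^{\neg}} D \Rightarrow G$.

$C' \Rightarrow G$: Then $\langle \Delta, C' \Rightarrow G, C \rangle \in S_i$ implies that $\mathit{fix}_i, \Delta \Vdash (G, C \wedge C')$. Clearly, $\langle \Delta, G, C \wedge C' \rangle <_i \langle \Delta, C' \Rightarrow G, C \rangle$. Then, by the induction hypothesis, $\Delta; C \wedge C' \vdash_{\mathcal{UC}^{\neg}} G$. Hence it is easy to show $\Delta; C \vdash_{\mathcal{UC}^{\neg}} C' \Rightarrow G$, using the ($\Rightarrow_C$) rule.

$\exists x G$: Then $\langle \Delta, \exists x G, C \rangle \in S_i$ implies that there is $C'$ such that $C \vdash_{\mathcal{C}} \exists y C'$ and $\mathit{fix}_i, \Delta \Vdash (G[y/x], C')$, where $y$ does not occur free in $\Delta$, $\exists x G$ and $C$. Then $G[y/x]$ is a renaming of a strict subformula of $\exists x G$, and $\langle \Delta, G[y/x], C' \rangle <_i \langle \Delta, \exists x G, C \rangle$. Therefore $\Delta; C' \vdash_{\mathcal{UC}^{\neg}} G[y/x]$ by the induction hypothesis, so $\Delta; C, C' \vdash_{\mathcal{UC}^{\neg}} G[y/x]$, trivially. Hence $\Delta; C \vdash_{\mathcal{UC}^{\neg}} \exists x G$, by using the rule ($\exists$), because $C \vdash_{\mathcal{C}} \exists y C'$.

$\forall x G$: Then $\langle \Delta, \forall x G, C \rangle \in S_i$ implies that $\mathit{fix}_i, \Delta \Vdash (G[y/x], C)$, where the variable $y$ does not occur free in $\Delta$, $\forall x G$, $C$. Clearly, $ord(\langle \Delta, \forall x G, C \rangle) = ord(\langle \Delta, G[y/x], C \rangle)$ and $G[y/x]$ is a renaming of a strict subformula of $\forall x G$, so $\langle \Delta, G[y/x], C \rangle <_i \langle \Delta, \forall x G, C \rangle$. Therefore, by the induction hypothesis, we obtain $\Delta; C \vdash_{\mathcal{UC}^{\neg}} G[y/x]$. Applying now ($\forall$), it follows that $\Delta; C \vdash_{\mathcal{UC}^{\neg}} \forall x G$.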

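⇐) It is proved by induction on the height $h$ of the proof tree for $\Delta; C \vdash_{\mathcal{UC}^{\neg}} G$.

Base case: $h = 1$. The only possibility is that $G \equiv C' \in \mathcal{SL}_{\mathcal{C}}$. Obviously $\mathit{fix}_i, \Delta \Vdash (C', C)$, because we are assuming $\Delta; C \vdash_{\mathcal{UC}^{\neg}} C'$, so $C \vdash_{\mathcal{C}} C'$.

Inductive case: We suppose that $\Delta; C \vdash_{\mathcal{UC}^{\neg}} G$ has a proof of height $h$. Let us prove that $\mathit{fix}_i, \Delta \Vdash (G, C)$, by case analysis on the rule employed at the bottom of such a proof.

(Clause) There must exist a variant $\forall x (G \Rightarrow A')$ of a clause of $\Delta$ such that $x$ do not occur free in $A$, and such that $\Delta; C \vdash_{\mathcal{UC}^{\neg}} \exists x (A \approx A' \wedge G)$ has a proof of height $h - 1$. By induction hypothesis, $\mathit{fix}_i, \Delta \Vdash (\exists x (A \approx A' \wedge G), C)$. Using the definition of the operator $T_i$, the latter implies $(A, C) \in (T_i(\mathit{fix}_i))(\Delta) = \mathit{fix}_i(\Delta)$, then $\mathit{fix}_i, \Delta \Vdash (A, C)$.

($\wedge$) There must exist goals $G_1$, $G_2$ such that $G \equiv G_1 \wedge G_2$ and the sequent $\Delta; C \vdash_{\mathcal{UC}^{\neg}} G_k$ has a proof of height less than $h$ for each $k \in \{1, 2\}$. By induction hypothesis, $\mathit{fix}_i, \Delta \Vdash (G_1, C)$ and $\mathit{fix}_i, \Delta \Vdash (G_2, C)$. As a consequence, $\mathit{fix}_i, \Delta \Vdash (G_1 \wedge G_2, C)$.

($\vee$) Similar to the previous case.

($\Rightarrow$) Then $G \equiv D \Rightarrow G'$ and the sequent $\Delta, D; C \vdash_{\mathcal{UC}^{\neg}} G'$ has a proof of height $h - 1$. By induction hypothesis, $\mathit{fix}_i, \Delta \cup \{D\} \Vdash (G', C)$. Therefore $\mathit{fix}_i, \Delta \Vdash (D \Rightarrow G', C)$.

($\Rightarrow_C$) Now $G \equiv C' \Rightarrow G'$ and the sequent $\Delta; C, C' \vdash_{\mathcal{UC}^{\neg}} G'$ has a proof of height $h - 1$. By the properties of $\vdash_{\mathcal{UC}^{\neg}}$, also $\Delta; C \wedge C' \vdash_{\mathcal{UC}^{\neg}} G'$ has a proof of height $h - 1$. Applying now the induction hypothesis, $\mathit{fix}_i, \Delta \Vdash (G', C \wedge C')$. Therefore $\mathit{fix}_i, \Delta \Vdash (C' \Rightarrow G', C)$.

($\exists$) Then $G \equiv \exists x G'$, and there must exist a constraint $C'$ and a variable $y$ not occurring free in $\Delta$, $C$, $\exists x G'$, such that $\Delta; C \wedge C' \vdash_{\mathcal{UC}^{\neg}} G'[y/x]$ has a proof of height $h - 1$ and $C \vdash_{\mathcal{C}} \exists y C'$. Then $\mathit{fix}_i, \Delta \Vdash (G'[y/x], C \wedge C')$, by induction hypothesis, and therefore $\mathit{fix}_i, \Delta \Vdash (\exists x G', C)$, because $C \vdash_{\mathcal{C}} \exists y (C \wedge C')$.

($\forall$) $G$ must be of the form $\forall x G'$, and there must exist a variable $y$ not occurring free in $\Delta$, $C$, $\forall x G'$ such that $\Delta; C \vdash_{\mathcal{UC}^{\neg}} G'[y/x]$ has a proof of height $h - 1$. By induction hypothesis, $\mathit{fix}_i, \Delta \Vdash (G'[y/x], C)$, and, as a consequence, $\mathit{fix}_i, \Delta \Vdash (\forall x G', C)$. □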

Appendix B. Dependency graph and stratification

The algorithm for calculating the dependency graph is expressed by means of the mutually recursive functions dpClause and dpGoal defined in Fig. 7, depending on the structure of the formula. These functions return a pair $\langle E, N \rangle$, where $E$ is a set of edges of the form $p \to q$ or $p \xrightarrow{\neg} q$, and $N$ is an auxiliary set of link-nodes which stores the predicate symbols occurring in the formula. Moreover, each of these predicate symbols has a negative annotation in three cases: if it occurs in a negated atom, in a nested implication or in an aggregate function (notice that the link-nodes $\neg preds(C)$ in the fourth case in the definition of dpGoal correspond to the aggregate functions occurring in $C$). This annotation will then be propagated to the edges coming out of those nodes.

By using the functions dpClause and dpGoal, it is straightforward to calculate the dependency graph of a set of formulas $\Phi$ (and in particular, of a database) as the union of the edges obtained for each element of the set:

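• $dpClause(A) = \langle \emptyset, \{p_A\} \rangle$
• $dpClause(D_1 \wedge D_2) = \langle E_1 \cup E_2, N_1 \cup N_2 \rangle$ if $dpClause(D_1) = \langle E_1, N_1 \rangle$ and $dpClause(D_2) = \langle E_2, N_2 \rangle$
• $dpClause(\forall x D) = dpClause(D)$
• $dpClause(G \Rightarrow A) = \langle E_G \cup \bigcup_{n \in N_G} \{n \to p_A\} \cup \bigcup_{\neg n \in N_G} \{n \xrightarrow{\neg} p_A\}, \{p_A\} \rangle$ if $dpGoal(G) = \langle E_G, N_G \rangle$

• $dpGoal(A) = \langle \emptyset, \{p_A\} \rangle$
• $dpGoal(\neg A) = \langle \emptyset, \{\neg p_A\} \rangle$
• $dpGoal(C) = \langle \emptyset, \neg preds(C) \rangle$
• $dpGoal(C \Rightarrow G) = \langle E \cup \bigcup_{n \in preds(C),\, m \in preds(G)} \{n \xrightarrow{\neg} m\}, N \cup \neg preds(C) \rangle$ if $dpGoal(G) = \langle E, N \rangle$
• $dpGoal(G_1 \wedge G_2) = dpGoal(G_1 \vee G_2) = \langle E_1 \cup E_2, N_1 \cup N_2 \rangle$ if $dpGoal(G_1) = \langle E_1, N_1 \rangle$ and $dpGoal(G_2) = \langle E_2, N_2 \rangle$
• $dpGoal(\forall x G) = dpGoal(\exists x G) = dpGoal(G)$
• $dpGoal(D \Rightarrow G) = \langle E_D \cup E_G \cup \bigcup_{m \in preds(G)} \big( \bigcup_{n \in N_D} \{n \to m\} \cup \bigcup_{\neg n \in N_D} \{n \xrightarrow{\neg} m\} \big), N_D \cup \neg preds(G) \rangle$ if $dpClause(D) = \langle E_D, N_D \rangle$ and $dpGoal(G) = \langle E_G, N_G \rangle$

Notation:

• $p_A$: predicate symbol of the atom $A$
• $preds(F) = \{p \mid p \text{ is a defined predicate symbol occurring in } F\}$
• $\neg S = \{\neg p \mid p \in S\}$

Fig. 7. Dependency graph for clauses and goals.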

DGΦ =⋃

D∈Φ

{E D

∣∣ dpClause(D) = 〈E D , N〉} ∪⋃

G∈Φ

{EG

∣∣ dpGoal(G) = 〈EG , N〉}.

Once we have the dependency graph, the particular algorithm for finding a stratification for Δ (or for checking that it is not stratifiable) associates to each predicate symbol p an integer variable X_p ∈ [1 … N], where N is the number of predicate symbols of Δ, and generates a system of inequalities: each dependency p → q produces X_p ≤ X_q and p ¬→ q produces X_p < X_q. Then, solving this system (if possible) provides the stratum of each p in X_p. The stratification algorithm ends with a concrete stratification if one exists, or stops with an error message (in polynomial time with respect to the number of predicate symbols in the database).
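The inequality system described above can be solved by a simple iterative relaxation. The following Python sketch is an illustration only and is not the code of the system (which is written in Prolog); the function name `stratify` and the encoding of edges as `(p, q, negative)` triples are assumptions made for this example.

```python
def stratify(predicates, edges):
    """Return {predicate: stratum}, or None if no stratification exists.

    edges is an iterable of (p, q, negative) triples (hypothetical encoding):
    a plain dependency p -> q yields X_p <= X_q, and a negative
    dependency yields X_p < X_q.
    """
    n = len(predicates)
    stratum = {p: 1 for p in predicates}   # start from the least candidate
    changed = True
    while changed:
        changed = False
        for p, q, negative in edges:
            lower = stratum[p] + (1 if negative else 0)
            if stratum[q] < lower:
                if lower > n:
                    # A value above N can only come from a cycle
                    # through a negative edge: not stratifiable.
                    return None
                stratum[q] = lower
                changed = True
    return stratum


if __name__ == "__main__":
    print(stratify(["p", "q", "r"], [("p", "q", False), ("q", "r", True)]))
    print(stratify(["p", "q"], [("p", "q", True), ("q", "p", False)]))
```

Each update strictly increases one value bounded by N, so the loop performs at most N² updates, which is in line with the polynomial bound stated above.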

Appendix C. Implementation of constraint solving

This appendix includes implementation details of the constraint solvers, focusing on the finite domain and aggregates.

C.1. Implementing solve

The generic interface to the constraint solvers is implemented as follows:

    solve(I,C,SC) :-
        simplify_ground_ctr(C,SGC),
        partition_ctr(I,SGC,DCs),
        solve_ctr_list(I,DCs,SDCs),
        ctr_list_to_ctr(SDCs,CC),
        simplify_ctr(CC,SC).

This code first calls simplify_ground_ctr, which simplifies trivial ground primitive constraints (e.g., =, >, ...). Next, the call to predicate partition_ctr partitions the input constraint into a list whose components belong to different constraint domains. This partition is always possible when the constraint ranges over a single domain. When it combines different domains, the current implementation is able to achieve the partition in some cases. If it is unable to do so, an exception is raised.

The call to solve_ctr_list posts each component to its corresponding solver as a call to the predicate solveFD (described later). Afterwards, the solved constraint, which is represented as a list, is transformed back into a conjunctive constraint via ctr_list_to_ctr. Finally, this constraint is simplified using logical axioms such as De Morgan's laws. In addition to this generic interface, the particular interface

solve(+Domain,+Interpretation,+Constraint,-SolvedConstraint).

is also provided, which is useful when the domain is already known, so that the constraint can be directly posted to its corresponding solver.



(01) solveFD(Dom,I,C,SC) :-
(02)   copy_term(C,FC),                 % Input variables keep untouched
(03)   get_vars(C,Vars),                % Input variables are held to be
(04)   get_vars(FC,FVars),              % mapped to the solved new vars
(05)   swap_qvars_by_fvars(FC,QFC),     % Replace quantified vars by fresh ones
(06)   constrain_domains(QFC,Dom),      % Constrain variables to the current domain
(07)   domain_to_int(QFC,Dom,IC),       % Domain mapping from enumerated to integer
(08)   bagof((FVars,Cs,Sat),            % List of (Fresh vars,Constraints,Satisfiable)
(09)     (solveFD_ctr(IC,Dom,I,true),   % Solving
(10)      satisfiable(IC,Sat),          % Check satisfiability
(11)      project_ctrs(FVars,Vars,Cs)   % Project constraints wrt. input vars
(12)     ), LFVarsCsS), !,              % List of Fresh vars,Constraints,Satisfiable
(13)   filter_ctr_list(LFVarsCsS,LICs), % Pick solved constraints
(14)   simplify_disj_list(LICs,SLICs),  % Simplify the disjunctive list
(15)   disj_list_to_ctr(SLICs,ISC),     % Convert list to constraint
(16)   get_vars(ISC,FVSC),              % If the output constraint contains
(17)   (fresh_vars(Vars,FVSC) ->        % fresh variables
(18)     SC = C                         % Then discard the solved constraint
(19)   ;                                % and return the input constraint
(20)     int_to_domain(ISC,Dom,SC)).    % Else, map domain from integer to enumerated
(21) solveFD(_Dom,_I,_C,false).         % Return false upon unsatisfiability

Fig. 8. The predicate solveFD for solving finite domain constraints.

C.2. Implementing solveFD

For the solvers of the constraint systems Finite Domains and Boolean, the following predicates are available:

• solveFD(+Domain,+Interpretation,+Constraint,-SolvedConstraint)
  It solves the input Constraint over Domain using Interpretation and returns its solved form SolvedConstraint if it is satisfiable; otherwise, it returns false.

• constr_conjFD(+Domain,+Interpretation,-C1,+C2,+C)
  It computes (using Interpretation) the component C1 of the conjunction C1,C2 such that C1,C2 ⊢_𝒞 C, where 𝒞 is the constraint system corresponding to Domain.

Since we consider classical logic for these particular constraint systems, the following implementation for the second predicate is sound:

    constr_conjFD(Dom,I,C1,C2,C) :-
        solveFD(Dom,I,(not(C2);C),C1), !,
        C1\==false.

Predicate solveFD includes calls to the solving of constraints, which are computed with the predicate:

solveFD_ctr(+Constraint,+DomainName,+Interpretation,-Satisfiable),

which receives a constraint, a domain name and an interpretation, returning whether it is satisfiable (true) or not (false).

The code excerpt of Fig. 8 implements the required behavior for solveFD. Line (05) replaces quantified variables by fresh ones in order to avoid name clashes. Line (07) maps domain data values to integers, whereas line (20) maps the computed (integer) data values back to the corresponding domain data values. The core of constraint solving lies in lines (09)-(11): first, the solver attempts to solve the input constraint (see Section C.3, which describes the predicate solveFD_ctr). Second, the constraint is checked for satisfiability, that is, the solver tries to find a single, concrete solution via labeling. Third, the underlying constraint store is projected with respect to the relevant variables (i.e., those occurring in the input constraint plus the possible new ones computed by the underlying solver). Lines (13)-(15) are simply intended for data structure formatting.

C.2.1. Aggregates

During expression evaluation, the predicate compute_aggr is responsible for computing the outcome of each aggregate function, as illustrated by one of its clauses below:

    compute_aggr(sum(At,Var),I,Res) :-
        !,
        get_values(I,At,Var,S),
        compute_sum(S,0,Res).



The predicate get_values calls the predicate lookUpAll to get concrete values from ground equalities of the form Var = value from an interpretation I, with respect to an atom At and a variable Var. Then, these values are stored in a list S. Finally, S is used to compute the final result Res w.r.t. the particular function.

The implementation of the function count is slightly different from the others because the call to this function does not have a variable as an argument. In this case, we also call the predicate lookUpAll, but then the solver computes the number of occurrences of the atom to be counted from the interpretation.
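The two-step scheme just described (collect the values, then fold them) can be sketched as follows. This is a simplified Python illustration, not the Prolog code of the system: the representation of an interpretation as a list of `(atom_name, bindings)` pairs is an assumption made for the example, and only `sum` and `count` are covered.

```python
def get_values(interp, atom, var):
    """Collect the values bound to `var` in the entries of `interp` for `atom`.

    interp is modelled (hypothetically) as a list of (atom_name, bindings)
    pairs, where bindings maps variable names to ground values.
    """
    return [bindings[var] for name, bindings in interp
            if name == atom and var in bindings]


def compute_aggr(func, interp, atom, var=None):
    """Compute an aggregate over the entries of `interp` for `atom`."""
    if func == "count":
        # count takes no variable: it counts the occurrences of the atom
        return sum(1 for name, _ in interp if name == atom)
    if func == "sum":
        return sum(get_values(interp, atom, var))
    raise ValueError("unsupported aggregate: {!r}".format(func))


if __name__ == "__main__":
    interp = [("salary", {"X": 10}), ("salary", {"X": 32}), ("dept", {"X": 1})]
    print(compute_aggr("sum", interp, "salary", "X"))
    print(compute_aggr("count", interp, "salary"))
```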

C.3. Solving primitive constraints

Whereas some constraints can be posted to the underlying solver, others cannot. Negation is one of them: as shown below, it is handled explicitly because it can apply to constraints that are not supported by the underlying Prolog solver:

    solveFD_ctr(not(C),Dom,I,B) :-
        !,
        complement(C,NotC),
        solveFD_ctr(NotC,Dom,I,B).

Here, the predicate complement computes the complemented constraint (e.g., X#=<Y is the complemented constraint of X#>Y).
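As an illustration of what a complement computation may look like, the following Python sketch maps each comparison operator to its complement and pushes negation through conjunctions and disjunctions with De Morgan's laws. The tuple encoding of constraints and the treatment of `and`/`or` are assumptions made for this example; the source only states the behavior for primitive comparisons such as X#>Y.

```python
# Complement of each CLP(FD)-style comparison operator.
COMPLEMENT = {
    "#=": "#\\=", "#\\=": "#=",
    "#<": "#>=", "#>=": "#<",
    "#>": "#=<", "#=<": "#>",
}


def complement(c):
    """Return the complemented constraint of c.

    c is either (op, lhs, rhs) with op a comparison operator,
    ("not", c1), ("and", c1, c2) or ("or", c1, c2) (hypothetical encoding).
    """
    op = c[0]
    if op == "not":                       # double negation
        return c[1]
    if op == "and":                       # De Morgan
        return ("or", complement(c[1]), complement(c[2]))
    if op == "or":                        # De Morgan
        return ("and", complement(c[1]), complement(c[2]))
    return (COMPLEMENT[op], c[1], c[2])   # primitive comparison


if __name__ == "__main__":
    print(complement(("#>", "X", "Y")))
```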

An example of an unsupported constraint is disjunction, which is computed by collecting all answers (cf. line (08) in Fig. 8). This constraint is solved as follows:

    solveFD_ctr((C1;_C2),Dom,I,true) :-
        solveFD_ctr(C1,Dom,I,true).
    solveFD_ctr((_C1;C2),Dom,I,true) :-
        !,
        solveFD_ctr(C2,Dom,I,true).

Finally, we describe quantifiers. The existential quantifier is implemented as follows, where the call satisfiable(FC,Dom,true) tries to find a concrete value satisfying FC:

    solveFD_ctr(ex(X,C),Dom,I,B) :-
        !,
        % Replace X by a fresh variable _FX in C:
        swap(X,_FX,C,FC),
        constrain_domains(FC,Dom),
        (solveFD_ctr(FC,Dom,I,true),
         % Checking satisfiability:
         satisfiable(FC,Dom,true),
         B=true
        ;
         B=false
        ).

For the universal quantifier, a constraint fa(X,C) is replaced by the conjunction C[X/v1], ..., C[X/vn], where vi (1 ≤ i ≤ n) are the values in the domain of X. Note that cuts at the beginning of the body of each clause occur because we add a default case corresponding to an illegal constraint, which involves the raising of an exception.
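The expansion of a universally quantified constraint over a finite domain can be sketched in a few lines. The Python code below is only an illustration under stated assumptions: constraints are encoded as nested tuples, variables as strings, and the body is assumed not to rebind the quantified variable in a nested quantifier; none of these names come from the actual implementation.

```python
def substitute(c, var, value):
    """Replace every occurrence of `var` in the nested tuple c by `value`."""
    if isinstance(c, tuple):
        return tuple(substitute(arg, var, value) for arg in c)
    return value if c == var else c


def expand_forall(var, body, domain):
    """Rewrite fa(var, body) as the conjunction of body[var/v] for v in domain."""
    instances = [substitute(body, var, v) for v in domain]
    if not instances:
        return ("true",)                 # empty domain: trivially true
    result = instances[-1]
    for inst in reversed(instances[:-1]):
        result = ("and", inst, result)   # right-nested conjunction
    return result


if __name__ == "__main__":
    print(expand_forall("X", ("#>", "X", 0), [1, 2, 3]))
```

The size of the resulting conjunction is linear in the size of the domain, which is one reason why this rewriting is only practical for finite domains.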

The constraint solver for Reals follows a similar but simpler route for its implementation, since universal quantifiers are not supported and there are no domain data values to map. Both solveR and constr_conjR are provided, analogously to solveFD and constr_conjFD, respectively.

Appendix D. Implementation of the forcing relation

The forcing relation ⊩ of Definition 6 is implemented by means of the predicate

    force(Delta,Stratification,I,G,C)

whose meaning is: given I = T_i^n(fix_{i−1})(Delta), for some n ≥ 0 and a fixed stratum i > 0, force is successful if T_i^n(fix_{i−1}), Delta ⊩ (G,C). An important point for understanding the implementation is the deterministic nature of this predicate. The definition of ⊩ establishes conditions on a constraint C in order to satisfy I,Delta ⊩ (G,C), but the predicate force must build a concrete constraint C. In addition, each possible answer constraint for a goal must be captured in a single answer constraint (possibly) using disjunctions.



(1) force(_Delta,_Stratification,I,constr(Dom,C),C1) :- !,
      solve(Dom,I,C,C1).

(2) force(Delta,Stratification,I,(G1,G2),C) :- !,
      force(Delta,Stratification,I,G1,C1),
      force(Delta,Stratification,I,G2,C2),
      solve(I,(C1,C2),C).

(3) force(Delta,Stratification,I,(G1;G2),C) :- !,
      ( force(Delta,Stratification,I,G1,C1), !,
        ( force(Delta,Stratification,I,G2,C2), !,
          solve(I,(C1;C2),C)
        ; solve(I,C1,C) )
      ; force(Delta,Stratification,I,G2,C2),
        solve(I,C2,C) ).

(4) force(Delta,Stratification,I,(constr(D,C2) => G),C1) :- !,
      ( solve(D,I,C2,_), !,
        force(Delta,Stratification,I,G,C),
        constr_conj(D,I,C1,C2,C)
      ; C1=true ).

(5) force(Delta,Stratification,I,(D => G),C) :- !,
      elab(D,De),
      localClauses(De,Ls),
      addLocalClauses(Ls,Delta,Delta1),
      getMaxStrat(G,Stratification,StG),
      fixPointStrat(Delta1,Stratification,StG,Fix),
      force(Delta1,Stratification,Fix,G,C).

(6) force(Delta,Stratification,I,ex(X,G),C) :- !,
      replace(X,X1,G,G1),
      force(Delta,Stratification,I,G1,C1),
      solve(I,ex(X1,C1),C).

(7) force(Delta,Stratification,I,fa(X,G),C) :- !,
      replace(X,X1,G,G1),
      force(Delta,Stratification,I,G1,C1),
      solve(I,fa(X1,C1),C).

(8) force(_Delta,_Stratification,I,not(At),C) :- !,
      lookUpAll(At,I,Ls),
      ( Ls==[], !, C=true
      ; buildNegConj(Ls,NLs), solve(I,NLs,C) ).

(9) force(_Delta,_Stratification,I,At,C) :-
      lookUpAll(At,I,Cs),
      buildDisj(Cs,C1),
      solve(I,C1,C).

Fig. 9. Forcing relation.

Fig. 9 shows the implementation of force. There is a clause of force for each goal structure. We explain them briefly.

Clause (1) stands for the forcing of a constraint C over a domain Dom, which is processed by calling the constraint solver. Clause (2) stands for a conjunction G1,G2; it forces both goals, and then solves the conjunction of the resulting answer constraints. For a disjunction G1;G2 (clause (3)) there are four possible (and mutually exclusive) situations: both goals can be forced, only G1, only G2, or neither of the two; the answer constraint is obtained by solving the corresponding constraints, or the clause fails in the last case.

Clause (4) corresponds to an implication with a constraint as antecedent. In this case, we first try to solve the antecedent C2 in order to check its satisfiability. If it is not satisfiable (second part of the disjunction) then the implication trivially holds and the answer constraint C1 is true. If it is satisfiable, then the consequent G must be forced, obtaining an answer constraint C. The answer constraint C1 of the implication is then obtained using the predicate constr_conj, which finds C1 such that C1,C2 ⊢_𝒞 C as stated in the theory (see Definition 6).

Clause (5) corresponds to the case of an implication D => G, which introduces additional difficulties as explained in Section 8.1, as it involves the computation of a new fixpoint for the extended database. The predicate elab provides the rules corresponding to the elaboration of the clause D as explained in Section 3.1; then localClauses transforms them into the representation in use, hhcnClause(Vars,Head,Body). By calling addLocalClauses, the extended database Delta1 = Delta ∪ {D} is obtained. The execution of

    fixPointStrat(Delta1,Stratification,StG,Fix)

finds Fix = fix_StG(Delta1). Once Fix is computed, it is used to force G with the augmented set Delta1. This corresponds to proving force(Delta1,Stratification,Fix,G,C), which implies T_i^n(I′), Delta ∪ {D} ⊩ (G,C), which is what we wanted to prove.

For the existential quantifier (clause (6)), according to the definition of ⊩, to find C such that I,Delta ⊩ (ex(X,G),C) (analogously for fa(X,G), clause (7)), we obtain G1 as the result of replacing X by a new variable X1 in G; then we prove I,Delta ⊩ (G1,C1), and finally C is obtained by solving ex(X1,C1) (fa(X1,C1), respectively).



For a negated atom not(At) (clause (8)), thanks to the stratification, we can ensure that every possible atom At deducible from the database is already present in the current interpretation I. Then, by means of lookUpAll(At,I,Ls), we find the list Ls=[C1,...,Cn] such that (At,Ci) ∈ I. The variable NLs is used to build the constraint not(C1),...,not(Cn) (or true if Ls=[]), which we must solve to obtain the constraint C we are looking for.

Clause (9) (default case) is the forcing of an atom At. As before, we search for all the pairs (At,C1),...,(At,Cn) ∈ I and then we build the disjunction C1;...;Cn (bound to the variable C1 in the code) and solve it with solve.
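The way clauses (8) and (9) assemble a constraint from the entries of the interpretation can be pictured with the short Python sketch below. It is a simplification made under explicit assumptions: the interpretation is a list of `(atom, constraint)` pairs, lookup is by plain equality rather than by unification, the final call to the solver is omitted, and returning `None` for an atom with no entries is a choice made for this example only.

```python
def look_up_all(atom, interp):
    """Return the constraints Ci such that (atom, Ci) is in interp."""
    return [c for a, c in interp if a == atom]


def _fold(op, items):
    """Build a right-nested (op, x1, (op, x2, ...)) term from a non-empty list."""
    result = items[-1]
    for item in reversed(items[:-1]):
        result = (op, item, result)
    return result


def force_atom(atom, interp):
    """Disjunction C1;...;Cn of the constraints found for atom (before solving)."""
    cs = look_up_all(atom, interp)
    return _fold("or", cs) if cs else None    # None: no entry (assumption)


def force_not_atom(atom, interp):
    """Conjunction not(C1),...,not(Cn), or true when no entry exists."""
    cs = look_up_all(atom, interp)
    if not cs:
        return ("true",)
    return _fold("and", [("not", c) for c in cs])


if __name__ == "__main__":
    interp = [("p", "C1"), ("p", "C2"), ("q", "C3")]
    print(force_atom("p", interp))
    print(force_not_atom("p", interp))
```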


Chapter 6

Publications associated with the third chapter

[B.1] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández. Formalizing a Broader Recursion Coverage in SQL. In Symposium on Practical Aspects of Declarative Languages (PADL’13), volume 7752 of LNCS, pages 93–108, 2013.

[B.2] G. Aranda-López, S. Nieva, F. Sáenz-Pérez, and J. Sánchez-Hernández. Incorporating Hypothetical Views and Extended Recursion into SQL Database Systems. In Ken Mcmillan, Aart Middeldorp, Geoff Sutcliffe, and Andrei Voronkov, editors, LPAR-19, volume 26 of EPiC Series, pages 9–22. EasyChair, 2014.

Formalizing a Broader Recursion Coverage in SQL

Gabriel Aranda¹, Susana Nieva¹, Fernando Sáenz-Pérez², and Jaime Sánchez-Hernández¹ *

¹ Dept. Sistemas Informáticos y Computación, UCM, Spain
² Dept. Ingeniería del Software e Inteligencia Artificial, UCM, Spain

[email protected], {nieva,fernan,jaime}@sip.ucm.es

Abstract. SQL is the de facto standard language for relational databases and has evolved by introducing new resources and expressive capabilities, such as recursive definitions in queries and views. Recursion was included in the SQL-99 standard, but this approach is limited as only linear recursion is allowed, mutual recursion is not supported, and negation cannot be combined with recursion. In this work, we propose a new approach, called R-SQL, aimed at overcoming these limitations and others, allowing in particular cycles in recursive definitions of graphs and mutually recursive relation definitions. In order to combine recursion and negation, we import ideas from the deductive database field, such as stratified negation, based on the definition of a dependency graph between relations involved in the database. We develop a formal framework using a stratified fixpoint semantics and introduce a proof-of-concept implementation.

Keywords: Databases, SQL, Recursion, Fixpoint Semantics

1 Introduction

Codd’s famous paper on the relational model [2] sowed the seeds for current relational database management systems (RDBMS’s), such as DB2, Oracle, MySQL, SQL Server and others. Formal query languages were proposed for the relational model: relational algebra (RA) and relational calculus, which are syntactically different but semantically equivalent w.r.t. safe formulas [16]. Such RDBMS’s rely instead on the SQL query language (current standard SQL:2008 [7]), which departs from the relational model and goes beyond it. Its acknowledged success builds upon an elegant yet simple formulation of a data model with relations which can be queried with a language including some basic RA-operators, which are all about relations. The original operators became a limitation for practical applications of the model, and others emerged to fill some gaps, including, for instance, aggregate operators for, e.g., computing running sums and averages. Other additions include representing absent or unknown information, which led to the introduction of null values and outer join operators ranging over such values. Also, duplicates were introduced to account for bags (multisets) instead of sets. Finally, we mention the inclusion of recursion (Starburst [10] was the first non-commercial RDBMS to implement this whereas IBM DB2 was the first commercial one), a powerful feature to cope with queries that must otherwise be solved by intermixing SQL with a host language. However, as pointed out by many (see, e.g., [9],[13]), the relational model has several limitations. Thus, current RDBMS’s include that extended “relational” model, which is far from the original one and is even heavily criticized [3] because of nulls and duplicates.

* This work has been partially supported by the Spanish projects TIN2008-06622-C03-01 (FAST-STAMP), S2009/TIC-1465 (PROMETIDOS), and GPD-UCM-A-910502.

In this work, we focus on the inclusion of recursion in SQL, as current RDBMS’s both lack formal support for recursion and offer only a narrow coverage of it. Regarding formalization, an extension of the RA is presented in [1], with a looping construct and assignment in order to deal with the integration of recursion and negation. [5] is the source of the original SQL-99 proposal for recursion, which is based on the research in the areas of logic programming and deductive databases [16], as explained in [4]. Another example of an approach built on an extension of RA with a fixpoint construct is in [6]. However, as far as we know, these formalizations do not lead to concrete implementations, while our proposal provides an operational mechanism allowing a straightforward implementation.

Regarding recursion coverage, there are several main drawbacks in current implementations of recursion. Linearity is required, so that relation definitions with calls to more than one recursive relation are not allowed. Some other features are not supported: mutual recursion, and query solving involving an except clause. In general, termination is manually controlled by limiting the number of iterations instead of detecting that there are no further opportunities to develop new tuples.

Here, we propose R-SQL, a subset of the SQL standard to cope with recursive definitions which are not restricted as they are in current RDBMS’s, and which also allows neater formulations through concise relation definitions (closely following the assignment RA-operator) that avoid verbose code (cf. Section 2). For this language, we first develop a novel formalization based on stratified interpretations and a fixpoint operator to support theoretical results (cf. Section 3). Second, we propose a proof-of-concept implementation which takes a set of database relation definitions (in general, recursive) and computes their meanings (cf. Section 4). This implementation uses the underlying host SQL system and Python to compute the outcome, and can be easily adapted to be integrated as a part of any state-of-the-art RDBMS. Section 5 concludes and presents some further work.

2 Introducing R-SQL

In this section, we present the language R-SQL by using a minimal syntax that captures the core expressiveness of standard SQL. Namely, we consider basic SQL constructs to cover relational algebra. Nevertheless, this language is conceived so that it can be extended to incorporate other usual features. R-SQL is focused on the incorporation of recursive relation definitions. The idea is simple and effective: a relation is defined with an assignment operation as a named query (view) that can contain a self reference, i.e., a relation R can be defined as R sch := select ... from ... R ..., where sch is the relation schema. Next, we introduce the formal grammar of this language; then we show by means of examples the benefits of R-SQL w.r.t. current RDBMS’s.

2.1 Syntax of R-SQL

The formal syntax of R-SQL is defined by the grammar in Figure 1. In this grammar, productions start with lowercase letters whereas terminals start with uppercase (SQL terminal symbols use small caps). Optional statements are delimited by square brackets and alternative sentences are separated by pipes. The grammar defines the following syntactic categories:

– A database sql_db is a (non-empty) sequence of relation definitions separated by semicolons (“;”). A relation definition assigns a select statement to the relation, which is identified by its name R and its schema.
– A schema sch is a tuple of attribute names with their corresponding types.
– A select statement sel_stm is defined in the usual way. The clauses from and where are optional. We also allow union and except, but notice that the syntax for except allows only a relation name instead of a select statement as usual in SQL. This is done in order to keep the syntax simple and does not imply any loss of expressiveness, because a relation name can be identified with the select statement that defines it.
– An expression exp can be either a constant value C, an attribute of a relation (denoted by R.A), or an arithmetic expression.
– A Boolean condition wcond in the where clause of a select statement is built up in the usual way, using also the standard comparison operators.

    sql_db  ::= R sch := sel_stm;
                ...
                R sch := sel_stm;
    sch     ::= (A T,...,A T)
    sel_stm ::= select exp,...,exp [ from R,...,R [ where wcond ] ]
              | sel_stm union sel_stm
              | sel_stm except R
    exp     ::= C | R.A | exp opm exp | - exp
    wcond   ::= true | false | exp opc exp | not (wcond)
              | wcond [ and | or ] wcond
    opm     ::= + | - | / | *
    opc     ::= = | <> | < | > | >= | <=

R stands for relation names, A for attribute names, T for standard SQL types (such as integer, float, varchar(n)), and C for constants belonging to a valid SQL type.

Fig. 1. A Grammar for the R-SQL Language

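Below, we show a syntactic transformation $[\,\cdot\,]^{RA}$ that maps every select statement to an equivalent RA-expression in the usual way (see footnote 3).

– $[\texttt{select } exp_1, \ldots, exp_k \texttt{ from } R_1, \ldots, R_m \texttt{ where } wcond]^{RA} = \pi_{exp_1,\ldots,exp_k}(\sigma_{wcond}(R_1 \times \ldots \times R_m))$

– $[\mathit{sel\_stm}_1 \texttt{ union } \mathit{sel\_stm}_2]^{RA} = [\mathit{sel\_stm}_1]^{RA} \cup [\mathit{sel\_stm}_2]^{RA}$

– $[\mathit{sel\_stm} \texttt{ except } R]^{RA} = [\mathit{sel\_stm}]^{RA} - R$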

The formal meaning of every sel_stm w.r.t. an interpretation I, stated in Definition 5 (Section 3), reflects the idea that the expected interpretation of a select statement, $[\![\mathit{sel\_stm}]\!]^I$, should be the set of tuples associated to the corresponding equivalent RA-expression $[\mathit{sel\_stm}]^{RA}$.

2.2 Expressiveness of R-SQL

Next, we illustrate that R-SQL overcomes some limitations present in current RDBMSs following SQL-99. These systems use not exists and except clauses to deal with negation, and with recursive to express recursion. As pointed out in [5], SQL-99 does not allow an arbitrary collection of mutually recursive relations to be written in the with recursive clause. Although any mutual recursion can be converted to direct recursion by inlining [8], our proposal allows mutually recursive relations to be defined explicitly, which is an advantage in terms of program readability and maintenance. For instance, using R-SQL, it is easy to write the classical example for computing even and odd numbers up to a bound (100 in the example) as follows:

even(x float) := SELECT 0 UNION

SELECT odd.x+1 FROM odd WHERE odd.x<100;

odd(x float) := SELECT even.x+1 FROM even WHERE even.x<100;

Further, linear recursion in standard SQL restricts the number of allowed recursive calls to only one, so that, e.g., Fibonacci numbers cannot be specified as follows (see footnote 4):

fib1(n float, f float) := SELECT fib.n, fib.f FROM fib;

fib2(n float, f float) := SELECT fib.n, fib.f FROM fib;

fib(n float, f float) := SELECT 0,1 UNION SELECT 1,1 UNION
  SELECT fib1.n+1,fib1.f+fib2.f FROM fib1,fib2
  WHERE fib1.n=fib2.n+1 AND fib1.n<10;

This means that several graph algorithms specified using non-linear recursion cannot be directly expressed in current recursive SQL systems [17].

3 Notice that arithmetic expressions are allowed as arguments in projection (π) and selection (σ) operations.

4 The relations fib1 and fib2 simply represent two aliases for fib, which are necessary because, for simplicity, we have not added support for renamings in R-SQL from clauses.
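As an illustration of why the non-linear definition of fib is still well behaved under a fixpoint reading, the following minimal sketch iterates the defining select statement over Python sets until nothing changes. The function name t_fib and the use of integers instead of floats are assumptions made for this example; it is not the R-SQL system itself.

```python
def t_fib(fib):
    """One application of the select statement that defines fib."""
    base = {(0, 1), (1, 1)}
    recursive = {
        (n1 + 1, f1 + f2)
        for (n1, f1) in fib          # plays the role of the alias fib1
        for (n2, f2) in fib          # plays the role of the alias fib2
        if n1 == n2 + 1 and n1 < 10  # the WHERE condition
    }
    return base | recursive


# Iterate from the empty relation until a fixpoint is reached.
fib = set()
while True:
    nxt = t_fib(fib)
    if nxt == fib:
        break
    fib = nxt

for n, f in sorted(fib):
    print(n, f)
```

The loop stops because the condition fib1.n<10 bounds the first component, so only finitely many tuples can ever be produced.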

Non-termination is another problem associated with recursion. For instance, the basic transitive closure over a graph that includes a cycle makes current SQL systems (such as PostgreSQL and MySQL) either reject the query or go into an infinite loop (some systems allow a maximum number of iterations to be imposed as a simple termination condition). In contrast, the fixpoint computation used by R-SQL guarantees termination when dealing with finite relations. The following example written in R-SQL defines the relations arc (a graph with a cycle) and path (its transitive closure). The computation terminates since both relations are finite.

arc(ori varchar(1), des varchar(1)) :=
  SELECT 'a','b' UNION SELECT 'b','c' UNION SELECT 'c','a';

path(ori varchar(1), des varchar(1)) :=
  SELECT arc.ori, arc.des FROM arc UNION
  SELECT arc.ori, path.des FROM arc,path WHERE arc.des=path.ori;
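The termination argument can be checked on this small graph with a set-based sketch. The names t_path and applications are hypothetical, and Python sets stand in for relation instances; the point is only that, with set semantics, a cycle cannot generate new tuples forever.

```python
# The cyclic graph of the example: a -> b -> c -> a.
arc = {("a", "b"), ("b", "c"), ("c", "a")}


def t_path(path):
    """One application of the select statement that defines path."""
    joined = {
        (ori, des)
        for (ori, mid) in arc
        for (start, des) in path
        if mid == start  # arc.des = path.ori
    }
    return arc | joined


path = set()
applications = 0
while True:
    nxt = t_path(path)
    applications += 1
    if nxt == path:  # nothing new: the fixpoint has been reached
        break
    path = nxt

print(sorted(path), applications)
```

Since path can contain at most 3 × 3 pairs, the iteration necessarily stabilises, here with all nine pairs.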

The following running example contains a concrete relation defined using the classical transitive closure technique mentioned above.

Example 1. A database for flights. As usual, the information about direct flights can be composed of the city of origin, the city of destination, and the length of the flight. Cities (Lisboa, Madrid, Paris, London, New York) will be represented with constants (lis, mad, par, lon, ny, resp.).

flight(frm varchar(10), to varchar(10), time float) :=
  SELECT 'lis','mad',1.0 UNION SELECT 'mad','par',1.5 UNION
  SELECT 'par','lon',2.0 UNION SELECT 'lon','ny',7.0 UNION
  SELECT 'par','ny',8.0;

The relation reachable consists of all the possible trips between the cities of the database, possibly concatenating more than one flight.

reachable(frm varchar(10), to varchar(10)) :=

SELECT flight.frm, flight.to FROM flight UNION

SELECT reachable.frm, flight.to FROM reachable,flight

WHERE reachable.to = flight.frm;

The relation travel also gives time information about alternative trips.


travel(frm varchar(10), to varchar(10), time float) :=

SELECT flight.frm, flight.to, flight.time

FROM flight UNION

SELECT flight.frm, travel.to, flight.time+travel.time

FROM flight, travel WHERE flight.to = travel.frm;

Both reachable and travel represent transitive closures of the relation flight. Notice that if flight has a cycle, then the relation travel, which includes times for each trip, is infinite, while reachable is not. As pointed out before, reachable can be finitely computed in our system. But, as travel would produce an infinite set of different tuples, some limit on the computation would have to be imposed (the maximum time for a travel, for example). However, this is not a drawback of our approach, but an issue due to using infinite relations (built with arithmetic expressions).

The relation madAirport contains travels departing from or arriving in Madrid, while avoidMad contains possible travels that neither begin nor end in Madrid.

madAirport(frm varchar(10), to varchar(10)) :=
  SELECT reachable.frm, reachable.to FROM reachable
  WHERE (reachable.frm = 'mad' OR reachable.to = 'mad');

avoidMad(frm varchar(10), to varchar(10)) :=
  SELECT reachable.frm, reachable.to FROM reachable
  EXCEPT madAirport;

This definition includes negation together with recursive relations. This combination cannot be expressed in SQL-99, as shown in [4].
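To make the role of except concrete, the sketch below evaluates madAirport and avoidMad over the complete set of reachable pairs for the flight relation of Example 1. The ten tuples of reachable are written out by hand here as an assumption of the example (they follow from the five direct flights); the variable names are illustrative only.

```python
# All trips obtainable by chaining the five direct flights of Example 1.
reachable = {
    ("lis", "mad"), ("lis", "par"), ("lis", "lon"), ("lis", "ny"),
    ("mad", "par"), ("mad", "lon"), ("mad", "ny"),
    ("par", "lon"), ("par", "ny"),
    ("lon", "ny"),
}

# madAirport: trips that depart from or arrive in Madrid.
mad_airport = {(frm, to) for (frm, to) in reachable if frm == "mad" or to == "mad"}

# avoidMad: EXCEPT is read as set difference, which is only meaningful
# once both reachable and madAirport have been completely computed.
avoid_mad = reachable - mad_airport

print(sorted(avoid_mad))
```

If the difference were taken while reachable was still growing, tuples could be added to avoidMad that later turn out to belong to madAirport, which is the non-monotonic behaviour that stratification is meant to rule out.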

3 A Stratified Fixpoint Semantics for R-SQL

It is well known that combining negation and recursion in database languages is a difficult task [1]. This problem has been tackled with stratified fixpoint semantics in several works [12, 11, 14], and we have found that these techniques can also be applied to our proposal to obtain an operational semantics for R-SQL. In this section we present a novel formalization of recursive SQL relations by means of a stratified fixpoint interpretation that formalizes the meaning of R-SQL databases, and we show how to compute such a fixpoint.

Next, we introduce the notions of dependency graph and stratification, which provide the basis for the stratified negation formalization we are looking for. Then, we define the concept of stratified interpretations, and prove the existence of the fixpoint of a continuous operator as the required interpretation of a database. The obtained semantics will be the basis of the implementation of a concrete R-SQL database system.

3.1 Dependency Graph and Stratification

Stratification is based on the definition of a dependency graph for a database. In the following, we consider a database sql_db defined as $R_1\,sch_1 := \mathit{sel\_stm}_1; \ldots; R_n\,sch_n := \mathit{sel\_stm}_n$. We denote by RN the set $\{R_1, \ldots, R_n\}$ of relation names of sql_db. We assume that relations are well defined, in the sense that the relation names used inside $\mathit{sel\_stm}_1, \ldots, \mathit{sel\_stm}_n$ are in RN. The dependency graph associated to sql_db, denoted by $DG_{\mathit{sql\_db}}$, is a directed graph whose nodes are the elements of RN, and whose edges, which can be negatively labelled, are determined by the dependencies between the database relations, defined as follows. A relation definition of the form R sch := sel_stm produces edges in the graph from every relation name inside sel_stm to R. Those edges produced by the relation name that is just to the right of an except are negatively labelled.

Definition 1. For every two relations $R_1, R_2 \in RN$, we say:

– $R_2$ depends on $R_1$ if there is a path from $R_1$ to $R_2$ in $DG_{\mathit{sql\_db}}$.

– $R_2$ negatively depends on $R_1$ if there is a path from $R_1$ to $R_2$ in $DG_{\mathit{sql\_db}}$ with at least one negatively labelled edge.

Example 2. Consider the database of Example 1. Its corresponding set of relation names is RN = {flight, reachable, travel, madAirport, avoidMad}. Its dependency graph is depicted in Figure 2, where negatively labelled edges are annotated with ¬.

Definition 2. A stratification of sql_db is a mapping $\mathit{str} : RN \to \{1, \ldots, n\}$, such that:

– $\mathit{str}(R_i) \leq \mathit{str}(R_j)$, if $R_j$ depends on $R_i$,

– $\mathit{str}(R_i) < \mathit{str}(R_j)$, if $R_j$ negatively depends on $R_i$.

sql_db is stratifiable if there exists a stratification for it. In this case, for every $R \in RN$, we say that $\mathit{str}(R)$ is the stratum of R. We denote by numstr the maximum stratum of the elements of RN, and $\mathit{str}(\mathit{sel\_stm})$ represents the maximum stratum of the relations included in sel_stm.

Intuitively, a relation name preceded by an except plays the role of a negated predicate (relation) in the deductive database field. A stratification-based solving procedure ensures that, when a relation that contains an except in its definition is going to be calculated, the meaning of the inner negated relation has been completely evaluated, avoiding nonmonotonicity, as has been widely studied in Datalog [16]. The novelty lies in introducing these ideas into the field of the relational model.

Fig. 2. $DG_{\mathit{sql\_db}}$ of Example 1
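The dependency graph and a stratification for Example 1 can be derived mechanically from the relation definitions. The sketch below is one possible way to do it, not the procedure of the actual system: the dictionary DEFS is a hand-written encoding of which relation names occur in each definition, and stratify is a hypothetical helper that raises strata until the two conditions of Definition 2 hold.

```python
# relation -> (names used positively, names appearing right after EXCEPT)
DEFS = {
    "flight": (set(), set()),
    "reachable": ({"flight", "reachable"}, set()),
    "travel": ({"flight", "travel"}, set()),
    "madAirport": ({"reachable"}, set()),
    "avoidMad": ({"reachable"}, {"madAirport"}),
}

# Edges of the dependency graph as (source, target, negatively_labelled).
edges = sorted(
    [(p, r, False) for r, (pos, _) in DEFS.items() for p in pos]
    + [(q, r, True) for r, (_, neg) in DEFS.items() for q in neg]
)


def stratify(defs):
    """Return a stratification as a dict, or None if none exists."""
    stratum = {r: 1 for r in defs}
    changed = True
    while changed:
        changed = False
        for r, (pos, neg) in defs.items():
            needed = max(
                [stratum[r]]
                + [stratum[p] for p in pos]        # str(p) <= str(r)
                + [stratum[q] + 1 for q in neg]    # str(q) <  str(r)
            )
            if needed > len(defs):
                return None  # a cycle goes through a negative edge
            if needed != stratum[r]:
                stratum[r] = needed
                changed = True
    return stratum


strata = stratify(DEFS)
print(edges)
print(strata)
```

A definition such as p := ... except p has a negative edge from p to itself, so no stratum can satisfy the strict inequality and the helper reports that no stratification exists.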


3.2 Stratified Interpretations and Fixpoint Operator

From now on, we consider a stratifiable sql_db, and we assume that str is a stratification for it. In the previous section, we established that in a relation definition for R sch, the schema sch is a sequence of type declarations for the attributes of R. In order to give meaning to this relation, we assume that every type T included in sch denotes a domain D. In previous examples we have used two types: varchar, denoting the domain of strings, and float, denoting a numeric domain. We will consider a universal domain $\mathcal{D}$, which is the union of the family of the considered domains. Relations of arity k will denote a set of k-tuples included in $\mathcal{D}^k$. In general, every relation denotes a subset of $\mathcal{T} = \bigcup_{n \geq 1} \mathcal{D}^n$.

Interpretations are functions that associate an element of $\mathcal{P}(\mathcal{T})$ to each element of RN. So, considering the usual relational model terminology of schema and instance of a relation, the interpretation of a relation in our model can be seen as the relationship between the schema and the instance of the relation. Interpretations are classified by strata. An interpretation of a stratum i gives meaning to the relations of strata less than or equal to i. Next, we formalize the concept of interpretation over a stratum.

Definition 3. An interpretation I for sql_db over the stratum i, $1 \leq i \leq \mathit{numstr}$, is a function from RN to $\mathcal{P}(\mathcal{T})$, such that, for each $R \in RN$:

– If R has schema $(A_1\,T_1, \ldots, A_r\,T_r)$, and $D_1, \ldots, D_r$ are, respectively, the domains denoted by $T_1, \ldots, T_r$, then $I(R) \subseteq D_1 \times \ldots \times D_r$.

– $I(R) = \emptyset$, if $\mathit{str}(R) > i$.

The set of interpretations for sql_db over the stratum i, $1 \leq i \leq \mathit{numstr}$, is denoted by $\mathcal{I}^{\mathit{sql\_db}}_i$. The following definition provides an order on $\mathcal{I}^{\mathit{sql\_db}}_i$.

Definition 4. Let $i \geq 1$ and $I_1, I_2 \in \mathcal{I}^{\mathit{sql\_db}}_i$. $I_1$ is less than or equal to $I_2$ at stratum i, denoted by $I_1 \sqsubseteq_i I_2$, if the following conditions are satisfied for every $R \in RN$:

– $I_1(R) = I_2(R)$, if $\mathit{str}(R) < i$.

– $I_1(R) \subseteq I_2(R)$, if $\mathit{str}(R) = i$.

It is straightforward to check that for any i, $1 \leq i \leq \mathit{numstr}$, $(\mathcal{I}^{\mathit{sql\_db}}_i, \sqsubseteq_i)$ is a poset. The key point is that when an interpretation over a stratum i increases, the set of tuples associated to the relations whose stratum is i can increase, but the sets associated to relations of smaller strata remain unchanged. In addition, this poset is a cpo, as proved in the following lemma.

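Lemma 1. For any $i \geq 1$, the pair $(\mathcal{I}^{\mathit{sql\_db}}_i, \sqsubseteq_i)$ is a complete partially ordered set. Moreover, if $\{I_n\}_{n \geq 0}$ is a chain of interpretations in $(\mathcal{I}^{\mathit{sql\_db}}_i, \sqsubseteq_i)$, then $\hat{I}$, defined as $\hat{I}(R) = \bigcup_{n \geq 0} I_n(R)$, is the least upper bound of $\{I_n\}_{n \geq 0}$.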

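Proof. Let $\hat{I}$ denote the interpretation defined by $\hat{I}(R) = \bigcup_{n \geq 0} I_n(R)$. It is easy to prove that $\hat{I} \in \mathcal{I}^{\mathit{sql\_db}}_i$, and that it is an upper bound. In addition, if $I'$ is another upper bound, then: if $\mathit{str}(R) < i$, $I'(R) = I_n(R)$ for every $n \geq 0$, and hence $I'(R) = \hat{I}(R)$; if $\mathit{str}(R) = i$, $I_n(R) \subseteq I'(R)$ for every $n \geq 0$, and then $\bigcup_{n \geq 0} I_n(R) \subseteq I'(R)$. Therefore $\hat{I} \sqsubseteq_i I'$, by the definition of $\sqsubseteq_i$. □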


The following definition formalizes the meaning of a select statement sel_stm in the context of a concrete interpretation I, both associated to a concrete database sql_db. As we pointed out before, the interpretation of a sel_stm will be the set of tuples associated to its corresponding RA-expression, $[\mathit{sel\_stm}]^{RA}$, when the value of the involved relation names is given by I.

Definition 5. Let $i \geq 1$ and $I \in \mathcal{I}^{\mathit{sql\_db}}_i$. Let sel_stm be a select statement including only relation names of RN, such that $\mathit{str}(\mathit{sel\_stm}) \leq i$. We recursively define the interpretation of sel_stm w.r.t. I, denoted by $[\![\mathit{sel\_stm}]\!]^I$, as:

– $[\![\mathit{sel\_stm}_1 \texttt{ union } \mathit{sel\_stm}_2]\!]^I = [\![\mathit{sel\_stm}_1]\!]^I \cup [\![\mathit{sel\_stm}_2]\!]^I$, where $\cup$ stands for set union.

– $[\![\mathit{sel\_stm} \texttt{ except } R]\!]^I = [\![\mathit{sel\_stm}]\!]^I \setminus I(R)$, where $\setminus$ represents set difference.

– $[\![\texttt{select } exp_1, \ldots, exp_k]\!]^I = \{(\overline{exp_1}, \ldots, \overline{exp_k})\}$, where $\overline{exp_i}$ is the mathematical evaluation of $exp_i$.

– $[\![\texttt{select } exp_1, \ldots, exp_k \texttt{ from } R_1, \ldots, R_m \texttt{ where } wcond]\!]^I = \{(exp_1[\bar{a}/\bar{A}], \ldots, exp_k[\bar{a}/\bar{A}]) \mid \bar{a} \in I(R_1) \times \ldots \times I(R_m) \text{ and } wcond[\bar{a}/\bar{A}] \text{ is satisfied}\}$.

$\bar{A}$ is a sequence of attributes labelled with their corresponding relation names. More precisely, if $A^j_1, \ldots, A^j_{r_j}$ are the attributes of $R_j$, $1 \leq j \leq m$, then $\bar{A}$ represents the complete sequence $R_1.A^1_1, \ldots, R_1.A^1_{r_1}, \ldots, R_m.A^m_1, \ldots, R_m.A^m_{r_m}$. The expression $exp_j[\bar{a}/\bar{A}]$, $1 \leq j \leq k$, is the mathematical evaluation of $exp_j$ after replacing $\bar{A}$ by the tuple $\bar{a}$, and $wcond[\bar{a}/\bar{A}]$ is the evaluation of the Boolean expression wcond under the same substitution.

Example 3. Consider the definitions of the relations odd and even of Section 2.2. Let us assume a concrete interpretation I such that $I(\mathit{even}) = \{(0), (2)\}$ and $I(\mathit{odd}) = \emptyset$. Hence, the interpretation of the select statement that defines the relation odd w.r.t. I is:

$[\![\texttt{SELECT even.x+1 FROM even WHERE even.x<100}]\!]^I = \{(\texttt{even.x+1})[a/\texttt{even.x}] \mid (a) \in I(\mathit{even}) \text{ and } (\texttt{even.x<100})[a/\texttt{even.x}] \text{ is satisfied}\} = \{(1), (3)\}$.

The case of the relation even is analogous:

$[\![\texttt{SELECT 0 UNION SELECT odd.x+1 FROM odd WHERE odd.x<100}]\!]^I = [\![\texttt{SELECT 0}]\!]^I \cup [\![\texttt{SELECT odd.x+1 FROM odd WHERE odd.x<100}]\!]^I = \{(0)\} \cup \{(\texttt{odd.x+1})[a/\texttt{odd.x}] \mid (a) \in I(\mathit{odd}) \text{ and } (\texttt{odd.x<100})[a/\texttt{odd.x}] \text{ is satisfied}\} = \{(0)\}$.

Notice that the interpretation I defined by $I(\mathit{even}) = \{(0), (2), \ldots, (100)\}$ and $I(\mathit{odd}) = \{(1), (3), \ldots, (99)\}$ satisfies:

$I(\mathit{even}) = [\![\texttt{SELECT 0 UNION SELECT odd.x+1 FROM odd WHERE odd.x<100}]\!]^I$.

$I(\mathit{odd}) = [\![\texttt{SELECT even.x+1 FROM even WHERE even.x<100}]\!]^I$.
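The two computations of Example 3 can be replayed with Python sets standing in for interpretations. This is only a sketch under that assumption: sel_even and sel_odd are hypothetical names for the evaluation of the two defining select statements, and unary tuples are represented by plain integers.

```python
def sel_even(odd):
    """[[SELECT 0 UNION SELECT odd.x+1 FROM odd WHERE odd.x<100]] w.r.t. I(odd)."""
    return {0} | {x + 1 for x in odd if x < 100}


def sel_odd(even):
    """[[SELECT even.x+1 FROM even WHERE even.x<100]] w.r.t. I(even)."""
    return {x + 1 for x in even if x < 100}


# First interpretation of Example 3: I(even) = {(0), (2)}, I(odd) = {}.
first_odd = sel_odd({0, 2})
first_even = sel_even(set())

# Iterating both statements from empty relations reaches the second
# interpretation of Example 3, which is left unchanged by a further step.
even, odd = set(), set()
while True:
    nxt = (sel_even(odd), sel_odd(even))
    if nxt == (even, odd):
        break
    even, odd = nxt

print(first_odd, first_even, max(even), max(odd))
```

The final pair of sets satisfies both equations stated at the end of the example, which is exactly what it means for an interpretation to be a fixpoint of the definitions.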

The semantics of sql_db will be formalized by means of an interpretation I over numstr, such that for every $R \in RN$, if R sch := sel_stm is the definition of R in sql_db, then I maps R to the set $[\![\mathit{sel\_stm}]\!]^I$, as the interpretation I of Example 3 does. For every stratum i, the appropriate interpretation that gives the complete meaning to each relation of stratum i is the least fixpoint of a continuous operator over the set of interpretations of stratum i. These fixpoint interpretations are constructed sequentially from stratum 1 to numstr. The fixpoint of the last stratum numstr provides the semantics for the whole database. Some technical lemmas are needed in order to ensure the existence of such fixpoint interpretations.

The following lemma states that the sets of tuples denoted by a select statement of a stratum i, w.r.t. two ordered interpretations, satisfy an inclusion relation that is in accordance with the order $\sqsubseteq_i$ between the two interpretations.

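Lemma 2. Let $i \geq 1$, $R \in RN$ with $\mathit{str}(R) \leq i$, and $I_1, I_2 \in \mathcal{I}^{\mathit{sql\_db}}_i$ such that $I_1 \sqsubseteq_i I_2$. Then, every sel_stm included in the select statement that defines R satisfies:

– If $\mathit{str}(\mathit{sel\_stm}) < i$, then $[\![\mathit{sel\_stm}]\!]^{I_1} = [\![\mathit{sel\_stm}]\!]^{I_2}$.

– If $\mathit{str}(\mathit{sel\_stm}) = i$, then $[\![\mathit{sel\_stm}]\!]^{I_1} \subseteq [\![\mathit{sel\_stm}]\!]^{I_2}$.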

Proof. The proof is by induction on the structure of sel_stm. Here, we only show the most critical case. The others are similar.

$[\![\mathit{sel\_stm} \texttt{ except } R']\!]^{I_1} = [\![\mathit{sel\_stm}]\!]^{I_1} \setminus I_1(R')$. According to the definition of stratification, $\mathit{str}(R') < i$, because we are assuming that sel_stm except $R'$ occurs in the definition of R and $\mathit{str}(R) \leq i$. Hence $I_1(R') = I_2(R')$. Now, if $\mathit{str}(\mathit{sel\_stm} \texttt{ except } R') \leq i$, then $[\![\mathit{sel\_stm}]\!]^{I_1} \subseteq [\![\mathit{sel\_stm}]\!]^{I_2}$, applying the induction hypothesis. Therefore $[\![\mathit{sel\_stm} \texttt{ except } R']\!]^{I_1} \subseteq [\![\mathit{sel\_stm} \texttt{ except } R']\!]^{I_2}$, with equality for the case $\mathit{str}(\mathit{sel\_stm} \texttt{ except } R') < i$. □

The following lemma underlies the proof of the continuity of the operator whose fixpoint provides the semantics of a database (it can be proved by induction on the structure of sel_stm).

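Lemma 3. Let $i \geq 1$, $R \in RN$ with $\mathit{str}(R) \leq i$, and $\{I_n\}_{n \geq 0}$ be a chain in $\mathcal{I}^{\mathit{sql\_db}}_i$. Then, for every sel_stm included in the definition of R, if $I = \bigsqcup_{n \geq 0} I_n$, there exists $n \geq 0$ such that $[\![\mathit{sel\_stm}]\!]^I = [\![\mathit{sel\_stm}]\!]^{I_n}$.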

Next, for every i, a continuous operator $T_i$ over the set $\mathcal{I}^{\mathit{sql\_db}}_i$ of interpretations of stratum i is defined. Analogously to the theoretical foundations that support Datalog [16], we choose the least fixpoint of $T_i$ as the interpretation over i that will give meaning to the relations of stratum i. In accordance with the Knaster-Tarski theorem, this fixpoint can be obtained as the least upper bound of the chain of interpretations resulting from successively applying this operator to a minimal interpretation.

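Definition 6. Let $1 \leq i \leq \mathit{numstr}$. The operator $T_i : \mathcal{I}^{\mathit{sql\_db}}_i \longrightarrow \mathcal{I}^{\mathit{sql\_db}}_i$ transforms interpretations over i as follows. For every $I \in \mathcal{I}^{\mathit{sql\_db}}_i$ and $R \in RN$:

– $T_i(I)(R) = I(R)$, if $\mathit{str}(R) < i$.

– $T_i(I)(R) = [\![\mathit{sel\_stm}]\!]^I$, if $\mathit{str}(R) = i$ and R sch := sel_stm is the definition of R in sql_db.

– $T_i(I)(R) = \emptyset$, if $\mathit{str}(R) > i$.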

This operator is proved to be monotone (a consequence of Lemma 2) and continuous for every i.


Lemma 4. [Monotonicity of $T_i$] Let $i \geq 1$ and $I_1, I_2 \in \mathcal{I}^{\mathit{sql\_db}}_i$ such that $I_1 \sqsubseteq_i I_2$. Then, $T_i(I_1) \sqsubseteq_i T_i(I_2)$.

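Proposition 1. [Continuity of $T_i$] Let $i \geq 1$ and $\{I_n\}_{n \geq 0}$ be a chain of interpretations in $\mathcal{I}^{\mathit{sql\_db}}_i$ ($I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \ldots$). Then, $T_i(\bigsqcup_{n \geq 0} I_n) = \bigsqcup_{n \geq 0} T_i(I_n)$.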

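Proof. The proof of $\bigsqcup_{n \geq 0} T_i(I_n) \sqsubseteq_i T_i(\bigsqcup_{n \geq 0} I_n)$ is a direct consequence of the monotonicity of $T_i$ (Lemma 4). Let us prove $T_i(\bigsqcup_{n \geq 0} I_n) \sqsubseteq_i \bigsqcup_{n \geq 0} T_i(I_n)$:

– If $\mathit{str}(R) < i$, then $T_i(\bigsqcup_{n \geq 0} I_n)(R) = \bigsqcup_{n \geq 0} I_n(R)$, by the definition of $T_i$. Now, for every $n \geq 0$, $I_n(R) = T_i(I_n)(R)$, also by definition of $T_i$. Therefore, $(T_i(\bigsqcup_{n \geq 0} I_n))(R) = (\bigsqcup_{n \geq 0} T_i(I_n))(R)$.

– If $\mathit{str}(R) = i$, then $T_i(\bigsqcup_{n \geq 0} I_n)(R) = [\![\mathit{sel\_stm}]\!]^{\bigsqcup_{n \geq 0} I_n}$, by definition of $T_i$. And, in accordance with Lemma 3, for some $n \geq 0$: $[\![\mathit{sel\_stm}]\!]^{\bigsqcup_{n \geq 0} I_n} \subseteq [\![\mathit{sel\_stm}]\!]^{I_n}$. Now $[\![\mathit{sel\_stm}]\!]^{I_n} = T_i(I_n)(R)$, by definition of $T_i$, and obviously $T_i(I_n)(R) \subseteq \bigcup_{n \geq 0} T_i(I_n)(R)$, but $\bigcup_{n \geq 0} T_i(I_n)(R) = (\bigsqcup_{n \geq 0} T_i(I_n))(R)$, by Lemma 1. Hence, we conclude $T_i(\bigsqcup_{n \geq 0} I_n)(R) \subseteq (\bigsqcup_{n \geq 0} T_i(I_n))(R)$. □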

Next, the expected result on the existence of a least fixpoint, stratum by stratum, is shown.

Lemma 5. The operator $T_1$ has a least fixpoint, which is $\bigsqcup_{n \geq 0} T_1^n(\emptyset)$, where $\emptyset : RN \to \mathcal{P}(\mathcal{T})$ is the interpretation such that $\emptyset(R) = \emptyset$ for every $R \in RN$.

Proof. By the Knaster-Tarski fixpoint theorem [15], using Proposition 1. □

We will denote $\bigsqcup_{n \geq 0} T_1^n(\emptyset)$ by $\mathit{fix}_1$, i.e., $\mathit{fix}_1$ represents the least fixpoint at stratum 1. Using Example 1, Figure 3 shows the tuples corresponding to the successive applications of the operator $T_1$ until $\mathit{fix}_1(\mathit{travel})$ is obtained.

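Consider now the sequence $\{T_2^n(\mathit{fix}_1)\}_{n \geq 0}$ of interpretations in $(\mathcal{I}^{\mathit{sql\_db}}_2, \sqsubseteq_2)$ greater than or equal to $\mathit{fix}_1$. Using the definition of $T_i$ and the fact that $\mathit{fix}_1(R) = \emptyset$ for every R such that $\mathit{str}(R) \geq 2$, it is easy to prove, by induction on $n \geq 0$, that this sequence is a chain:

$\mathit{fix}_1 \sqsubseteq_2 T_2(\mathit{fix}_1) \sqsubseteq_2 T_2(T_2(\mathit{fix}_1)) \sqsubseteq_2 \ldots \sqsubseteq_2 T_2^n(\mathit{fix}_1) \sqsubseteq_2 \ldots$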

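$T_1^n(\emptyset)(\mathit{travel})$      Set of tuples
----------------------------------------------------------------------
$T_1^1(\emptyset)(\mathit{travel})$      {(lon,ny,7.0), (par,lon,2.0), (par,ny,8.0), (mad,par,1.5), (lis,mad,1.0)}
$T_1^2(\emptyset)(\mathit{travel})$      {(lis,par,2.5), (par,ny,9.0), (mad,ny,9.5), (mad,lon,3.5)}
$T_1^3(\emptyset)(\mathit{travel})$      {(lis,ny,10.5), (lis,lon,4.5), (mad,ny,10.5)}
$T_1^4(\emptyset)(\mathit{travel})$      {(lis,lon,4.5), (mad,ny,10.5), (lis,ny,11.5)}

Fig. 3. Obtaining $\mathit{fix}_1(\mathit{travel})$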
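The successive applications of $T_1$ restricted to travel can be reproduced with a few lines of Python. This is a sketch under the assumption that sets of triples stand in for relation instances; t1_travel and steps are names invented for the illustration, and steps records only the tuples that are not already present after the previous application.

```python
# Direct flights of Example 1 as (frm, to, time) triples.
flight = {
    ("lis", "mad", 1.0), ("mad", "par", 1.5), ("par", "lon", 2.0),
    ("lon", "ny", 7.0), ("par", "ny", 8.0),
}


def t1_travel(travel):
    """One application of the select statement that defines travel."""
    joined = {
        (f_frm, t_to, f_time + t_time)
        for (f_frm, f_to, f_time) in flight
        for (t_frm, t_to, t_time) in travel
        if f_to == t_frm  # flight.to = travel.frm
    }
    return flight | joined


travel = set()
steps = []  # tuples newly derived at each application
while True:
    nxt = t1_travel(travel)
    if nxt == travel:
        break
    steps.append(nxt - travel)
    travel = nxt

for n, new in enumerate(steps, start=1):
    print(n, sorted(new))
```

On this acyclic flight relation the fifth application adds nothing, so the iteration stops; with a cycle in flight the accumulated times would keep growing and the loop would not terminate without an extra bound.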


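As before, in accordance with Proposition 1, $\{T_2^n(\mathit{fix}_1)\}_{n \geq 0}$ has a least upper bound, $\bigsqcup_{n \geq 0} T_2^n(\mathit{fix}_1)$, in $(\mathcal{I}^{\mathit{sql\_db}}_2, \sqsubseteq_2)$, which is the least fixpoint of $T_2$ containing $\mathit{fix}_1$. We denote this interpretation by $\mathit{fix}_2$.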

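By proceeding successively, for every i, $1 < i \leq \mathit{numstr}$, a chain

$\mathit{fix}_{i-1} \sqsubseteq_i T_i(\mathit{fix}_{i-1}) \sqsubseteq_i T_i(T_i(\mathit{fix}_{i-1})) \sqsubseteq_i \ldots \sqsubseteq_i T_i^n(\mathit{fix}_{i-1}) \sqsubseteq_i \ldots$

can be defined, and a fixpoint of $T_i$, $\mathit{fix}_i = \bigsqcup_{n \geq 0} T_i^n(\mathit{fix}_{i-1})$, can be found.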

Theorem 1. There is a fixpoint interpretation $\mathit{fix} : RN \longrightarrow \mathcal{P}(\mathcal{T})$, such that for every $R \in RN$, if sel_stm is the definition of R, then $\mathit{fix}(R) = [\![\mathit{sel\_stm}]\!]^{\mathit{fix}}$.

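Proof. The interpretation fix we are looking for is $\mathit{fix}_{\mathit{numstr}}$, the least fixpoint of the operator $T_{\mathit{numstr}}$ applied to $\mathit{fix}_{\mathit{numstr}-1}$. As has been pointed out, this fixpoint exists and satisfies $\mathit{fix}_1 \sqsubseteq_{\mathit{numstr}} \mathit{fix}_2 \sqsubseteq_{\mathit{numstr}} \ldots \sqsubseteq_{\mathit{numstr}} \mathit{fix}_{\mathit{numstr}}$. Moreover, if $\mathit{str}(R) = i$, $1 \leq i \leq \mathit{numstr}$, and R is defined by the statement sel_stm, then $\mathit{fix}(R) = \mathit{fix}_i(R) = T_i(\mathit{fix}_i)(R)$, because $\mathit{fix}_i$ is the fixpoint of $T_i$. Now, $T_i(\mathit{fix}_i)(R) = [\![\mathit{sel\_stm}]\!]^{\mathit{fix}_i}$, by definition of $T_i$. We can conclude $\mathit{fix}(R) = [\![\mathit{sel\_stm}]\!]^{\mathit{fix}}$, trivially if $i = \mathit{numstr}$, or using Lemma 2 if $i < \mathit{numstr}$, because $\mathit{fix}_i \sqsubseteq_{\mathit{numstr}} \mathit{fix}$. □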

Therefore, the interpretation fix defines the fixpoint semantics of sql_db. This semantics is the basis of the database system prototype we have implemented, which is described next.

4 Implementing R-SQL

In this section we introduce a working proof-of-concept implementation of the R-SQL language that takes a set of relation definitions and outputs their meanings if a stratification can be found. More specifically, taking a stratifiable database definition in the R-SQL syntax as input, we get an SQL database (for a concrete SQL database system) that corresponds to the fixpoint semantics of the input database. If the database is not stratifiable, the system reports an error and stops.

4.1 An Algorithm to Compute the Database Fixpoint

Let sql_db be the definition of an R-SQL database. In order to create the corresponding SQL database we have to generate the appropriate SQL sentences for building the expected relations, which will eventually be processed by an RDBMS. The algorithm takes sql_db as input, i.e., a sequence of relation definitions, $R_1\,sch_1 := \mathit{sel\_stm}_1; \ldots; R_n\,sch_n := \mathit{sel\_stm}_n$. The computation builds the dependency graph for sql_db, as shown in Section 3.1, then calculates a stratification for it, obtaining the sets $\mathcal{R}_1, \ldots, \mathcal{R}_{\mathit{numstr}}$, where $\mathcal{R}_i$ is the set of relations of stratum i, and finally the fixpoint is computed with the following algorithm:


(1)  str := 1
(2)  while str ≤ numstr do
(3)    for each R_i ∈ R_str do create table R_i sch_i
(4)    change := true
(5)    while change do
(6)      size := rel_size_sum(R_str)
(7)      for each R_i ∈ R_str do
(8)        insert into R_i select * from sel_stm_i
(9)          except select * from R_i;
(10)     change = (size ≠ rel_size_sum(R_str))
(11)   end while
(12)   str := str + 1
(13) end while

This algorithm generates for each $R_i$ of sql_db an SQL table with the elements of $\mathit{fix}(R_i)$. Each iteration of the external while at line 2 corresponds to a stratum str, and builds the tables of the relations of this stratum by calculating $\mathit{fix}_{\mathit{str}}$. To do that, first of all an empty table with the corresponding attributes is created for every relation in the stratum str (line 3). Then, the iteration n of the innermost while at line 5 computes $T^n_{\mathit{str}}(\mathit{fix}_{\mathit{str}-1})$, as we explain next. For every relation $R_i$ of str, it submits the insert statement at line 8. The sentence select * from sel_stm_i selects all tuples as defined by the relation $R_i$ (notice that sel_stm_i is a valid SQL statement). Assuming that the current database instance coincides with the value of the interpretation $T^{n-1}_{\mathit{str}}(\mathit{fix}_{\mathit{str}-1})$, then in accordance with Definition 5, the set of tuples that satisfy that SQL statement coincides with $[\![\mathit{sel\_stm}_i]\!]^{T^{n-1}_{\mathit{str}}(\mathit{fix}_{\mathit{str}-1})}$. And this is $T^n_{\mathit{str}}(\mathit{fix}_{\mathit{str}-1})(R_i)$, by Definition 6. The tuples already present in the table are excluded to avoid repetitions (with the except clause at line 9). In this way, $T^n_{\mathit{str}}(\mathit{fix}_{\mathit{str}-1})(R_i)$ is obtained for every $R_i$ of stratum str. The expression rel_size_sum(R_str) at line 10 is equal to $\sum_{R \in \mathcal{R}_{\mathit{str}}} |R|$, where $|R|$ is the current number of tuples of the table corresponding to R. Therefore, the variable change tracks changes in the table sizes in order to stop the process, since change = false means that $T^n_{\mathit{str}}(\mathit{fix}_{\mathit{str}-1}) = T^{n-1}_{\mathit{str}}(\mathit{fix}_{\mathit{str}-1})$, so that $\mathit{fix}_{\mathit{str}}$ has been reached. Then, the last iteration of the external while calculates $\mathit{fix}_{\mathit{numstr}}$, the fixpoint of sql_db.
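The loop structure described above can be tried out directly with an embedded SQL engine. The following is a minimal sketch and not the code produced by the actual system: it assumes SQLite (through Python's standard sqlite3 module) instead of PostgreSQL, renames the attributes to src and dst to stay clear of SQL keywords, and uses a hand-written STRATA structure with only two relations of Example 1. The helper names rel_size_sum and compute_fixpoint are chosen to mirror the pseudocode.

```python
import sqlite3

# One dict per stratum: relation name -> (schema, defining select statement).
# Hypothetical encoding of part of Example 1, with renamed columns.
STRATA = [
    {
        "flight": (
            "src TEXT, dst TEXT, time REAL",
            "SELECT 'lis','mad',1.0 UNION SELECT 'mad','par',1.5 UNION "
            "SELECT 'par','lon',2.0 UNION SELECT 'lon','ny',7.0 UNION "
            "SELECT 'par','ny',8.0",
        ),
        "reachable": (
            "src TEXT, dst TEXT",
            "SELECT flight.src, flight.dst FROM flight UNION "
            "SELECT reachable.src, flight.dst FROM reachable, flight "
            "WHERE reachable.dst = flight.src",
        ),
    },
]


def rel_size_sum(conn, names):
    """Total number of tuples currently stored in the given tables."""
    return sum(
        conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
        for name in names
    )


def compute_fixpoint(conn, strata):
    for relations in strata:                       # external while (line 2)
        for name, (schema, _) in relations.items():
            conn.execute(f"CREATE TABLE {name} ({schema})")      # line 3
        change = True
        while change:                              # inner while (line 5)
            size = rel_size_sum(conn, relations)
            for name, (_, sel_stm) in relations.items():
                # Lines 8-9: insert only the tuples not already present.
                conn.execute(
                    f"INSERT INTO {name} {sel_stm} "
                    f"EXCEPT SELECT * FROM {name}"
                )
            change = size != rel_size_sum(conn, relations)       # line 10


conn = sqlite3.connect(":memory:")
compute_fixpoint(conn, STRATA)
reachable = set(conn.execute("SELECT src, dst FROM reachable"))
print(sorted(reachable))
```

Because UNION and EXCEPT are applied left to right, the trailing EXCEPT removes the already stored tuples from the whole union, so each pass only adds new rows and the size comparison is enough to detect that nothing changed.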

4.2 A Concrete Implementation

The concrete implementation of this algorithm can be done in a number of ways. We have developed a Prolog program that processes the R-SQL input file, builds the dependency graph and the stratification (if one exists), and finally produces a Python module with the code of the previous section. In fact, the external while at line 2 is expanded according to the number of strata, writing explicitly the corresponding code for each stratum. The for loop at line 7 is also expanded, as we will see in Example 4. We have chosen Python as the host language mainly because it is multiplatform and it provides easy connections with different database systems such as PostgreSQL and MySQL, or even via ODBC, which allows connectivity to almost any RDBMS. The additional features required of the host language are basic: loops, assignment and basic arithmetic.

Example 4. Below, we show the result of executing our proposed algorithm for the sql_db of Example 1. The system assigns stratum 1 to flight, reachable, travel and madAirport, and stratum 2 to avoidMad. Next, we detail some parts of the code generated stratum by stratum. Firstly, for stratum 1, we have:

while change do
  size := rel_size_sum(R_str)
  INSERT INTO flight SELECT 'lis','mad',1.0 UNION SELECT 'mad','par',1.5
    UNION SELECT 'par','lon',2.0 UNION SELECT 'lon','ny',7.0
    UNION SELECT 'par','ny',8.0 EXCEPT SELECT * FROM flight;
  INSERT INTO reachable SELECT flight.frm, flight.to
    FROM flight UNION SELECT reachable.frm, flight.to
    FROM reachable, flight
    WHERE reachable.to = flight.frm
    EXCEPT SELECT * FROM reachable;
  INSERT INTO travel SELECT * FROM flight UNION
    SELECT flight.frm, travel.to, flight.time+travel.time
    FROM flight, travel WHERE flight.to = travel.frm
    EXCEPT SELECT * FROM travel;
  INSERT INTO madAirport SELECT travel.frm,travel.to
    FROM travel EXCEPT SELECT * FROM madAirport;
  change = (size ≠ rel_size_sum(R_str))
end while

In the first iteration of this loop, we obtain all the tuples for the flight and madAirport relations. But the recursive definitions for reachable and travel need more iterations. As mentioned before, those iterations correspond to the successive applications of $T_1$. The tuples added for travel at each iteration are shown in Figure 3 (Section 3.2). After five iterations, the loop stops and the first stratum is completed. In the second stratum we consider the avoidMad relation:

INSERT INTO avoidMad SELECT travel.frm,travel.to FROM travel
  EXCEPT SELECT * FROM madAirport EXCEPT SELECT * FROM avoidMad;

This second loop ends after two iterations. This completes $\mathit{fix}_2$ for our sql_db, i.e., it yields the semantics of the working example database.

4.3 Integrating R-SQL into a RDBMS

Our proposal establishes the core of a novel approach to recursion in SQL. The current implementation of R-SQL has been conceived as a proof of concept of the theoretical foundations of the language. As we have stated, this means computing the semantics of the whole database from scratch. Nevertheless, the main goal of the proposal is not to introduce a new database language, but to allow less-restricted recursive relation definitions in existing SQL engines. In that sense, our proposal can be understood as the foundation for an existing SQL RDBMS to support extended forms of recursion, allowing users to define recursive relations as regular views using the R-SQL techniques developed in this work. Once an R-SQL database definition has been processed, the tables obtained can be stored as a database instance in a concrete RDBMS. On the one hand, the user can formulate queries that will be solved using those tables (without performing any further fixpoint computation). On the other hand, as we pointed out before, the user can define new recursive relations using views. Those views can be readily used in conjunction with other regular views, and they can be either computed on demand or materialized.

In order to compute the answer of new recursive relations, the current (relation) instance can be considered as a stratified R-SQL database. It is correct to assign higher strata to the new relations, because none of the existing relations depend on the new ones, and a relation definition does not introduce dependencies between the relations that appear in its select statement. Then, their tuples can be obtained by executing the algorithm in Section 4.1 to compute the fixpoint of their corresponding strata, thereby avoiding recalculation of the previous ones. Moreover, it is straightforward to modify the algorithm to get a lazy evaluation of such relations, performing iterations only when new values are demanded. To seamlessly integrate this into an RDBMS, we can take advantage of fourth-generation languages (e.g., SQL PL in IBM DB2 and PL/SQL in Oracle).

5 Conclusions

In this paper, we have introduced the R-SQL language as a new approach for incorporating recursion in SQL. This is not a trivial task, and it was not addressed in the initial proposals of SQL. It was first introduced in the 1999 standard, allowing only a limited form of recursion, namely linear recursion, which allows neither multiple recursive calls nor mutually recursive definitions. The difficulties increase when recursion is combined with negation.

We have developed a theoretical framework and a suitable implementation for R-SQL, inspired by the stratification techniques and fixpoint computations used, for instance, in Datalog. The stratification mechanism imposes some syntactic conditions on the database definitions, which guarantee that the fixpoint for such a database can be computed in a finite number of steps. This condition is less restrictive than the linearity conditions required by standard SQL. This means that our approach is more expressive than the one adopted in SQL; in addition, our language is supported by a solid computational semantics. We have presented a proof-of-concept implementation of the R-SQL database definition language based on this semantics. This implementation produces as output a set of standard SQL statements embedded in a Python program that builds the relational tables corresponding to the fixpoint of the input database definition. This implementation has been tested with PostgreSQL, but the architecture can be easily ported to any RDBMS. The system is available at https://gpd.sip.ucm.es/trac/gpd/wiki/GpdSystems/RSQL.

As already suggested, our approach can be integrated into a state-of-the-art RDBMS. This can be achieved by resorting to database function definitions, which allow cursor-returning functions. In addition, for this integration to be practical, performance improvements play a key role, e.g., indexing temporary relations during fixpoint computations and identifying tuple seeds in relation definitions that do not need to be recomputed.

References

1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.

2. E. Codd. A Relational Model for Large Shared Databanks. Communications of the ACM, 13(6):377–390, June 1970.

3. C. J. Date. SQL and relational theory: how to write accurate SQL code. O’Reilly, Sebastopol, CA, 2009.

4. S. J. Finkelstein, N. Mattos, I. S. Mumick, and H. Pirahesh. Expressing recursive queries in SQL. Technical report, ISO, 1996.

5. H. Garcia-Molina, J. D. Ullman, and J. Widom. Database systems - the complete book (2. ed.). Pearson Education, 2009.

6. M. A. W. Houtsma and P. M. G. Apers. Algebraic optimization of recursive queries. Data Knowl. Eng., 7:299–325, 1991.

7. ISO/IEC. SQL:2008 ISO/IEC 9075(1-4,9-11,13,14):2008 Standard, 2008.

8. O. Kaser, C. R. Ramakrishnan, and S. Pawagi. On the conversion of indirect to direct recursion. ACM Lett. Program. Lang. Syst., 2(1-4):151–164, Mar. 1993.

9. R. A. Kowalski. Logic for data description. In Logic and Data Bases, pages 77–103, 1977.

10. I. S. Mumick and H. Pirahesh. Implementation of magic-sets in a relational database system. SIGMOD Rec., 23:103–114, May 1994.

11. S. Nieva, F. Saenz-Perez, and J. Sanchez. Formalizing a Constraint Deductive Database Language based on Hereditary Harrop Formulas with Negation. In FLOPS’08, volume 4989 of LNCS, pages 289–304. Springer-Verlag, 2008.

12. K. Ramamohanarao and J. Harland. An introduction to deductive database languages and systems. The VLDB Journal, 3(2):107–122, 1994.

13. R. Reiter. Towards a logical reconstruction of relational database theory. In On Conceptual Modelling (Intervale), pages 191–233, 1982.

14. J. Shepherdson. Negation in logic programming. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pages 19–88. Kaufmann, Los Altos, CA, 1988.

15. A. Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5:285–309, 1955.

16. J. Ullman. Database and Knowledge-Base Systems Vols. I (Classical Database Systems) and II (The New Technologies). Computer Science Press, 1995.

17. C. Zaniolo, S. Ceri, C. Faloutsos, R. T. Snodgrass, V. S. Subrahmanian, and R. Zicari. Advanced Database Systems. Morgan Kaufmann Publishers Inc., 1997.


Incorporating Hypothetical Views and Extended Recursion into SQL Database Systems ∗

Gabriel Aranda-Lopez1, Susana Nieva1, Fernando Saenz-Perez2 and Jaime Sanchez-Hernandez1


[email protected], {nieva,fernan,jaime}@sip.ucm.es

Abstract

Current database systems supporting recursive SQL impose restrictions on queries such as linearity, and do not implement mutual recursion. In a previous work we presented the language and prototype R-SQL to overcome those drawbacks. Now we introduce a formalization and an implementation of the database system HR-SQL that, in addition to extended recursion, incorporates hypothetical reasoning in a novel way which cannot be found in any other SQL system, allowing both positive and negative assumptions. The formalization extends the fixpoint semantics of R-SQL. The implementation improves the efficiency of the previous prototype and is integrated in a commercial DBMS.

1 Introduction

Current relational database systems provide limited support for recursion in the ANSI/ISO standard language SQL. In [2] we proposed a new approach, called R-SQL, aimed at overcoming some of these limits. We developed a formal framework, borrowing techniques from the deductive database field, such as stratified negation [15], and following the original relational data model [7], thus avoiding both duplicates and nulls (as encouraged by [8]). But in addition to recursion, several applications require predictive and historical analysis over large amounts of data [10], typically making some sort of assumptions to deduce conclusions. Hypothetical queries, also known as "what-if" queries, can help managers to make decisions on scenarios that are somewhat changed with respect to a current state. Such queries are used, for instance, for deciding what resources must be added, changed or removed to optimize some criterion. Current applications include OLAP environments, business intelligence, and e-commerce.

Driven by these needs, in this paper we address the inclusion of hypothetical queries and views in the recursive SQL setting based on [2]. To this end, we extend a subset of standard SQL to embody both recursive definitions and hypothetical views in the language HR-SQL. We summarize the syntax and semantics of the definition language in Section 2, and introduce a novel syntax and semantics of queries and view definitions in Sections 3 and 4, respectively. An assumption (hypothetical reasoning) can either overload a relation (with the new clause assume query in relation) or restrict it (assume query not in relation). To support our approach, we propose a stratified fixpoint semantics which extends the semantics presented in [2] to give meaning to hypothetical queries and view definitions.

Since our targets are current state-of-the-art relational database management systems (DBMSs), we adhere to stratification in order to return a single answer set [15], which is a natural expectation of current database users. In Section 5, we propose an implementation of the HR-SQL language for the concrete system IBM DB2 (although it is easily adaptable to any other system), which improves the prototype introduced in [2] by factoring out those fragments of SQL queries that can be computed outside the fixpoint operator loop. In addition, we propose an efficient query solving procedure by generating SQL PL scripts and temporary tables to avoid locks and logging, therefore providing memory scalability and performance. Moreover, we provide a shell in which users can submit regular, hypothetical, and extended recursive SQL queries. Whereas regular queries are directly sent to the host DBMS, hypothetical and recursive queries are processed by such SQL PL scripts.

∗This work has been partially supported by the Spanish projects S2009/TIC-1465 (PROMETIDOS) and UCM-BSCH-GR58/08-910502 (GPD-UCM).

Related Work. To the best of our knowledge, there has been neither a formalization nor a system for SQL combining recursion and hypothetical queries as we do. However, we list some related works in both the relational and logic programming fields. With respect to hypothetical relational databases, the very first work was presented in [13], where hypotheses were stated by replacing actual data with a replace operator, and assumed data persist until the query is finished. In that early work, recursion was not considered. Works such as [9] present extensions of RA to support hypothetical queries by means of updates and with no recursion. Also, the educational system DES [12] includes hypothetical SQL queries, but neither hypothetical views nor negative assumptions are supported. In the logic programming arena, Hypothetical Datalog [3, 5] fits into intuitionistic logic programming, an extension of logic programming including both embedded implications and negation, and integrates atomic assumptions as hypothetical queries in the inference system. This proposal has been thoroughly studied from the semantic and complexity points of view, and allows atoms to be assumed in order to prove goals. Transaction logic [4] allows a database to be updated by transactions with elementary updates, and the transaction base is immutable. If bulk updates are needed, the transaction base must account for them. In [1] we developed a more expressive setting for constraint deductive databases based on Hereditary Harrop formulas. In particular, it provides support for assuming rules as hypothetical queries. Our current work can be understood as porting this feature to relational databases by adding the assume clause, which involves assumptions over relations that intensionally add new tuples (i.e., with select statements) to such relations. As a bonus, in this work we also allow tuples to be intensionally removed from such relations by negative assumptions.

2 The Definition Database Language of HR-SQL

This language is intended for defining databases using an SQL-like language that allows recursive definitions of relations. The formal syntax of a database definition is described by the following grammar:

db      ::= R sch := sel_stm; ... R sch := sel_stm;
sch     ::= (A T,...,A T)
sel_stm ::= select exp,...,exp [from R,...,R [where cond]]
          | sel_stm union sel_stm | sel_stm except sel_stm
exp     ::= C | R.A | exp m_op exp | -exp
cond    ::= true | false | exp b_op exp | not cond | cond [and|or] cond
m_op    ::= + | - | / | *
b_op    ::= = | <> | < | > | >= | <=



Uppercase identifiers denote terminal symbols and lowercase ones denote grammar productions. R stands for relation names, A for attribute names, T for standard SQL types (such as integer, float, varchar(n)), cond for Boolean conditions, m_op and b_op for mathematical and Boolean operators respectively, and C for constants of a valid SQL type.

A database db is a (non-empty) sequence of relation definitions. A relation definition assigns a select statement to a relation, which is identified by its name R and its schema sch, a tuple of attribute names with their corresponding types. As syntactic sugar, we admit * in the projection list of SQL statements. The HR-SQL definition language coincides with the R-SQL language introduced in [2], except that the except operator now allows any select statement as its right operand, instead of a simple relation name (as was the case in [2]).

Example 1. The travel database definition below (inspired by an example of [6]) represents the information sketched in Figure 1. This database includes the relations flight, bus and boat with schema (ori varchar(10), des varchar(10), time float) to store information about origin (ori), destination (des) and time (time), for traveling around the Canary Islands.

Figure 1: Travel Database for the Canary Islands.

The relation link collects all the possible transports. The relation travel is the transitive closure of link, i.e., it provides all the possible travels in the database, possibly concatenating any of the available transports. Their respective definitions written in HR-SQL syntax are:

link(ori varchar(10), des varchar(10), time float) :=
  select * from flight union select * from boat union
  select * from bus;

travel(ori varchar(10), des varchar(10), time float) :=
  select * from link union
  select link.ori, travel.des, link.time + travel.time
  from link, travel where link.des = travel.ori;

From now on, $RN_{db}$ stands for the set of relation names $\{R_1, \ldots, R_n\}$ defined in a database db. We write $RN_{sel\_stm}$ for the set of relation names occurring in a select statement sel_stm. For a select statement of the form $sel\_stm = sel\_stm_1$ except $sel\_stm_2$, we also define $RN^{\neg}_{sel\_stm}$ as the set of relation names occurring in $sel\_stm_2$ (notice that $RN^{\neg}_{sel\_stm} \subseteq RN_{sel\_stm}$). We assume that for every R sch := sel_stm defined in db it holds that $RN_{sel\_stm} \subseteq RN_{db}$.

2.1 Fixpoint Semantics

The meaning of every relation defined in a database db corresponds to the set of tuples that "satisfies" the relation definition. In [2] a stratified fixpoint semantics for the language R-SQL was introduced. Here, we recapitulate the main concepts in order to facilitate the understanding of the following sections. In addition, we introduce the semantics of the extended except select statement.

The stratified fixpoint theory relies on the notion of dependency graph for a database. The dependency graph associated to db, denoted by $DG_{db}$, is a directed graph whose nodes are the elements of $RN_{db}$, and whose edges (which can be negatively labeled) are determined as follows. For any relation definition R sch := sel_stm there is an edge from every relation name $R' \in RN_{sel\_stm}$ to R. Those edges produced by the relation names belonging to $RN^{\neg}_{sel\_stm}$ are negatively labeled. Then, for every pair of relations $R_1, R_2 \in RN_{db}$, we say that $R_2$ depends on $R_1$ if there is a path from $R_1$ to $R_2$ in $DG_{db}$, and that $R_2$ negatively depends on $R_1$ if there is a path from $R_1$ to $R_2$ in $DG_{db}$ with at least one negatively labeled edge. These concepts are needed to characterize the stratifiable databases.

Definition 1. A stratification of a database db defining n relations is a mapping $str : RN_{db} \to \{1, \ldots, n\}$ such that $str(R_i) \le str(R_j)$ if $R_j$ depends on $R_i$, and $str(R_i) < str(R_j)$ if $R_j$ negatively depends on $R_i$.

The database db is stratifiable if there exists a stratification for it. In this case, for every $R \in RN_{db}$, we say that $str(R)$ is the stratum of R. For a select statement sel_stm, we define $str(sel\_stm) = \max\{str(R_i) \mid R_i \in RN_{sel\_stm}\}$.
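As an illustration of Definition 1, the following Python sketch computes one stratification from a dependency graph by repeatedly raising strata until all edge constraints hold. It is only a sketch under stated assumptions: the function name `stratify` and the encoding of edges as `(source, target, negative)` triples are hypothetical and are not taken from the HR-SQL system.

```python
def stratify(relations, edges):
    """Return a dict relation -> stratum, or None if no stratification exists.

    relations: list of relation names (the nodes of the dependency graph).
    edges: list of (source, target, negative) triples, meaning that `target`
           uses `source` in its definition; `negative` is True when `source`
           occurs in the right operand of an except.
    """
    n = len(relations)
    stratum = {r: 1 for r in relations}
    changed = True
    while changed:
        changed = False
        for src, dst, negative in edges:
            # A negative edge forces a strictly greater stratum.
            required = stratum[src] + (1 if negative else 0)
            if stratum[dst] < required:
                if required > n:
                    # Strata would exceed n: a cycle contains a negative edge.
                    return None
                stratum[dst] = required
                changed = True
    return stratum


# Dependency graph of a database where R2 uses R1 under except,
# and R3 uses R2 and itself positively.
example = stratify(
    ["R1", "R2", "R3"],
    [("R1", "R2", True), ("R2", "R3", False), ("R3", "R3", False)],
)
```

Satisfying the constraint on every edge is enough, since the path conditions of Definition 1 follow by transitivity. The result is one admissible stratification among possibly several.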

From now on, we consider a fixed stratifiable database db and a stratification str for it. In order to give meaning to a relation R $(A_1\ T_1, \ldots, A_r\ T_r)$, we assume that every type $T_i$, $i = 1..r$, denotes a domain $D_i$. We also assume a universal domain $D$, which is the union of the family of the considered domains. Since different relations can have different arities, we use the set $\mathcal{T} = \bigcup_{n \ge 1} D^n$. Interpretations are defined as functions that associate an element of $\mathcal{P}(\mathcal{T})$ to each element of $RN_{db}$, and they are classified by strata, as we formalize next.

Definition 2. Let $i \ge 1$. An interpretation I for db over the stratum i is a function $I : RN_{db} \to \mathcal{P}(\mathcal{T})$ such that, for every $R \in RN_{db}$ with schema sch:

• If $sch \equiv (A_1\ T_1, \ldots, A_r\ T_r)$, and $D_1, \ldots, D_r$ are, respectively, the domains denoted by $T_1, \ldots, T_r$, then $I(R) \subseteq D_1 \times \ldots \times D_r$,

• $I(R) = \emptyset$, if $str(R) > i$.

The set of interpretations for db over the stratum i is denoted by $\mathcal{I}^{db}_i$. Let $I_1, I_2 \in \mathcal{I}^{db}_i$. $I_1$ is less than or equal to $I_2$ at stratum i, denoted by $I_1 \sqsubseteq_i I_2$, if the following conditions are satisfied for every $R \in RN_{db}$: $I_1(R) = I_2(R)$ if $str(R) < i$, and $I_1(R) \subseteq I_2(R)$ if $str(R) = i$.

It is straightforward to check that, for any i, $(\mathcal{I}^{db}_i, \sqsubseteq_i)$ is a poset. The key point is that when an interpretation over a stratum i increases, the set of tuples associated to the relations whose stratum is i can increase, but the sets associated to relations of smaller strata remain unchanged. In addition, $(\mathcal{I}^{db}_i, \sqsubseteq_i)$ is a complete partially ordered set: if $\{I_n\}_{n \ge 0}$ is a chain in $(\mathcal{I}^{db}_i, \sqsubseteq_i)$, then I, defined as $I(R) = \bigcup_{n \ge 0} I_n(R)$, $R \in RN_{db}$, is the least upper bound of $\{I_n\}_{n \ge 0}$.

The following definition formalizes the meaning of a select statement sel_stm in the context of a concrete interpretation I.

Definition 3. Let $i \ge 1$ and $I \in \mathcal{I}^{db}_i$. Let sel_stm be a select statement such that $str(sel\_stm) \le i$. We recursively define the interpretation of sel_stm w.r.t. I for db, denoted by $[\![sel\_stm]\!]^{I}$, as follows:

• $[\![sel\_stm_1 \texttt{ union } sel\_stm_2]\!]^{I} = [\![sel\_stm_1]\!]^{I} \cup [\![sel\_stm_2]\!]^{I}$.

• $[\![sel\_stm_1 \texttt{ except } sel\_stm_2]\!]^{I} = [\![sel\_stm_1]\!]^{I} \setminus [\![sel\_stm_2]\!]^{I}$.

• $[\![\texttt{select } exp_1, \ldots, exp_k]\!]^{I} = \{(\overline{exp_1}, \ldots, \overline{exp_k})\}$, where $\overline{exp_i}$ denotes the mathematical evaluation of $exp_i$.

• $[\![\texttt{select } exp_1, \ldots, exp_k \texttt{ from } R_1, \ldots, R_m \texttt{ where } cond]\!]^{I} = \{(exp_1[\overline{a}/\overline{A}], \ldots, exp_k[\overline{a}/\overline{A}]) \mid \overline{a} \in I(R_1) \times \ldots \times I(R_m),\ cond[\overline{a}/\overline{A}] \text{ is satisfied}\}$,

where $\overline{A}$ represents a sequence of attributes prefixed with their corresponding relation names, i.e., if $A^j_1, \ldots, A^j_{r_j}$ are the attributes of $R_j$, $1 \le j \le m$, then $\overline{A}$ is the complete sequence $R_1.A^1_1, \ldots, R_1.A^1_{r_1}, \ldots, R_m.A^m_1, \ldots, R_m.A^m_{r_m}$; the notation $exp_j[\overline{a}/\overline{A}]$, $1 \le j \le k$, stands for the mathematical evaluation of $exp_j$ after replacing the attributes $\overline{A}$ by the corresponding values of the tuple $\overline{a}$; and $cond[\overline{a}/\overline{A}]$ denotes the evaluation of the Boolean expression cond under the same substitution.
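The four cases of Definition 3 can be read as a small recursive evaluator. The Python sketch below is an illustration only: the tuple encoding of statements, the names `evaluate` and `const`, and the use of Python callables for expressions and conditions are all assumptions made for this example (in particular, attributes are accessed by position, and a relation is assumed not to occur twice in the same from list).

```python
from itertools import product


def evaluate(stm, interp):
    """Return the set of tuples denoted by `stm` under the interpretation `interp`.

    stm is one of (hypothetical encoding):
      ("union", stm1, stm2), ("except", stm1, stm2),
      ("select", exps, rels, cond) where exps is a list of callables env -> value,
      rels a list of relation names, and cond a callable env -> bool.
    interp maps relation names to sets of tuples.
    """
    kind = stm[0]
    if kind == "union":
        return evaluate(stm[1], interp) | evaluate(stm[2], interp)
    if kind == "except":
        return evaluate(stm[1], interp) - evaluate(stm[2], interp)
    _, exps, rels, cond = stm
    answer = set()
    # Cartesian product I(R1) x ... x I(Rm); with no relations it yields one
    # empty combination, which covers the "select exp1,...,expk" case.
    for combo in product(*(interp[r] for r in rels)):
        env = dict(zip(rels, combo))  # relation name -> current tuple
        if cond(env):
            answer.add(tuple(e(env) for e in exps))
    return answer


def const(c):
    """Encoding of the statement `select c` (hypothetical helper)."""
    return ("select", [lambda env: c], [], lambda env: True)


# select 1 union select 3 union select 5
#   except select R1.A from R1 where R1.A=1 or R1.A=2
interp = {"R1": {(1,), (2,), (3,)}}
picked = ("select", [lambda env: env["R1"][0]], ["R1"],
          lambda env: env["R1"][0] in (1, 2))
sel_stm_r2 = ("except", ("union", ("union", const(1), const(3)), const(5)), picked)
result = evaluate(sel_stm_r2, interp)
```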

Next, for every i, an operator $T^{db}_i$ over the set $\mathcal{I}^{db}_i$ of interpretations of stratum i for db is defined. $T^{db}_i$ is continuous, as stated in [2]. The least fixpoint of $T^{db}_i$ is the interpretation giving meaning to the relations of db in the stratum i.

Definition 4. The operator $T^{db}_i : \mathcal{I}^{db}_i \to \mathcal{I}^{db}_i$ transforms interpretations over i as follows. For every $I \in \mathcal{I}^{db}_i$ and for every $R \in RN_{db}$:

• $T^{db}_i(I)(R) = I(R)$, if $str(R) < i$.

• $T^{db}_i(I)(R) = [\![sel\_stm]\!]^{I}$, if $str(R) = i$ and sel_stm is the definition of R in db.

• $T^{db}_i(I)(R) = \emptyset$, if $str(R) > i$.

Proposition 1 (Continuity of $T^{db}_i$). Let $i \ge 1$ and let $\{I_n\}_{n \ge 0}$ be a chain of interpretations in $\mathcal{I}^{db}_i$ ($I_0 \sqsubseteq_i I_1 \sqsubseteq_i I_2 \sqsubseteq_i \ldots$). Then $T^{db}_i(\bigsqcup_{n \ge 0} I_n) = \bigsqcup_{n \ge 0} T^{db}_i(I_n)$.

Therefore, the existence of a least fixpoint stratum by stratum is a direct consequence of the Knaster-Tarski fixpoint theorem [14].

Theorem 1. There is a fixpoint interpretation $fix^{db} : RN_{db} \to \mathcal{P}(\mathcal{T})$ such that, for every $R \in RN_{db}$, if sel_stm is the definition of R in db, then $fix^{db}(R) = [\![sel\_stm]\!]^{fix^{db}}$.

The interpretation $fix^{db}$ defines the semantics of db. This fixpoint is constructed stratum by stratum, as follows.

The operator $T^{db}_1$ has a least fixpoint, called $fix^{db}_1$, which is $\bigsqcup_{n \ge 0} (T^{db}_1)^n(\emptyset)$, the least upper bound of the sequence $\{(T^{db}_1)^n(\emptyset)\}_{n \ge 0}$, where $(T^{db}_1)^n(\emptyset)$ is the result of n successive applications of $T^{db}_1$ to the empty interpretation.

Consider now the sequence $\{(T^{db}_2)^n(fix^{db}_1)\}_{n \ge 0}$ of interpretations in $(\mathcal{I}^{db}_2, \sqsubseteq_2)$ greater than $fix^{db}_1$. Using the definition of $T^{db}_i$ and the fact that $fix^{db}_1(R) = \emptyset$ for every R such that $str(R) \ge 2$, it is easy to prove (as for stratum 1) that this sequence is a chain, $fix^{db}_1 \sqsubseteq_2 T^{db}_2(fix^{db}_1) \sqsubseteq_2 T^{db}_2(T^{db}_2(fix^{db}_1)) \sqsubseteq_2 \ldots \sqsubseteq_2 (T^{db}_2)^n(fix^{db}_1) \sqsubseteq_2 \ldots$, with least upper bound $\bigsqcup_{n \ge 0} (T^{db}_2)^n(fix^{db}_1)$ in $(\mathcal{I}^{db}_2, \sqsubseteq_2)$, which is the least fixpoint of $T^{db}_2$ containing $fix^{db}_1$, called $fix^{db}_2$.

Now, if $k = \max\{str(R) \mid R \in RN_{db}\}$, by proceeding successively, for every i, $1 < i \le k$, a chain $\{(T^{db}_i)^n(fix^{db}_{i-1})\}_{n \ge 0}$ can be defined, and a fixpoint of $T^{db}_i$, $fix^{db}_i = \bigsqcup_{n \ge 0} (T^{db}_i)^n(fix^{db}_{i-1})$, can be found. In addition, $fix^{db}_1 \sqsubseteq_k \ldots \sqsubseteq_k fix^{db}_k$. We denote $fix^{db}_k$ simply by $fix^{db}$, since it contains the information of the whole database.
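The stratum-by-stratum construction can be sketched in Python as a loop that iterates the operator of each stratum until nothing changes. This is a minimal sketch, not the code generated by the system: the name `stratified_fixpoint`, the encoding of each relation definition as a function from interpretations to sets of tuples, and the sample link tuples (including the code 'TFN') are assumptions made for the example. The loop terminates only when the fixpoint is reached in finitely many steps, as with the acyclic data used here.

```python
def stratified_fixpoint(definitions, stratum):
    """Compute the fixpoint interpretation stratum by stratum.

    definitions: dict relation -> function(interp) returning the set of tuples
                 denoted by the relation's select statement under interp.
    stratum: dict relation -> stratum number (a stratification).
    """
    interp = {r: set() for r in definitions}  # the empty interpretation
    for i in range(1, max(stratum.values()) + 1):
        current = [r for r in definitions if stratum[r] == i]
        while True:
            # One application of the stratum-i operator: relations of lower
            # strata are kept, relations of stratum i are re-evaluated.
            new = {r: definitions[r](interp) for r in current}
            if all(new[r] == interp[r] for r in current):
                break
            interp.update(new)
    return interp


# Hypothetical sample data for the link relation of Example 1.
LINK_DATA = {("MAD", "TFN", 3.0), ("TFN", "VDE", 1.0)}

definitions = {
    "link": lambda I: set(LINK_DATA),
    # travel := select * from link union
    #           select link.ori, travel.des, link.time + travel.time
    #           from link, travel where link.des = travel.ori
    "travel": lambda I: I["link"] | {
        (lk[0], tr[1], lk[2] + tr[2])
        for lk in I["link"] for tr in I["travel"] if lk[1] == tr[0]
    },
}
fix = stratified_fixpoint(definitions, {"link": 1, "travel": 1})
```

With these sample tuples, travel contains the two links plus the concatenated trip from 'MAD' to 'VDE'.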



3 The Query Language of HR-SQL

As usual in SQL, users of an HR-SQL database can formulate queries by means of select statements. The novelty of the HR-SQL language w.r.t. R-SQL is the incorporation of hypothetical queries. The syntax of queries is defined as:

query   ::= sel_stm | sel_hyp
sel_hyp ::= assume hypo,...,hypo sel_stm
hypo    ::= sel_stm [not] in R

Example 2. Consider the database of Example 1, and the query: how long does it take to arrive in Valverde from Madrid, if boat links that take more than one hour are not considered? It can be expressed in HR-SQL as:

assume select * from boat where boat.time > 1 not in link
select travel.time from travel
where travel.ori = 'MAD' and travel.des = 'VDE'

From the logical point of view, a hypothetical query can be interpreted as an intuitionistic implication: it represents the value of the consequent assuming the antecedent. Next we formalize this idea.

3.1 The Semantics of a Query

As usual, the answer to a query is identified with the set of tuples that satisfy the query. So, for a stratifiable database definition db, this answer corresponds to the interpretation of the query w.r.t. the fixpoint of db. The following definition formalizes this concept for the different cases of queries. In the case of a hypothetical query, to reflect the changes introduced in the current database by assuming the hypothesis, we use the notation db[R sch := sel_stm' / R sch := sel_stm] to denote the database definition that results from the database db by replacing the relation definition R sch := sel_stm by R sch := sel_stm'. In addition, sel(query) denotes the select statement of query. More precisely, sel(sel_stm) = sel_stm and sel(assume $hypo_1, \ldots, hypo_k$ sel_stm) = sel_stm.

For readability, we give the definition only for the case of one assumption; for a sequence of assumptions it is obtained as a simple sequential extension, considering a sequence of such replacements, as shown in Example 3 later.

Definition 5. Let query be a query for db. Its answer w.r.t. db, denoted by $[\![query]\!]^{db}$, is defined by cases:

Simple query: $[\![sel\_stm]\!]^{db} = [\![sel\_stm]\!]^{fix^{db}}$.

Hypothetical query: If R sch := $sel\_stm_R$ is the definition of R in db, then:

• $[\![\texttt{assume } sel\_stm' \texttt{ in R } sel\_stm]\!]^{db} = [\![sel\_stm]\!]^{fix^{db'}}$, where db' = db[R sch := $sel\_stm_R$ union $sel\_stm'$ / R sch := $sel\_stm_R$].

• $[\![\texttt{assume } sel\_stm' \texttt{ not in R } sel\_stm]\!]^{db} = [\![sel\_stm]\!]^{fix^{db'}}$, where db' = db[R sch := $sel\_stm_R$ except $sel\_stm'$ / R sch := $sel\_stm_R$].
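The replacement of relation definitions in Definition 5, applied sequentially for several assumptions, can be sketched as a purely textual rewriting. In the Python fragment below, the function name `assume`, the dictionary encoding of a database, and the use of parentheses to delimit operands are assumptions made for illustration; they are not part of the HR-SQL grammar given above.

```python
def assume(db, hypotheses):
    """Return the database definition obtained by applying the hypotheses in order.

    db: dict relation name -> text of its defining select statement.
    hypotheses: list of (sel_stm, negated, relation) triples, standing for
                `sel_stm in relation` (negated=False) or
                `sel_stm not in relation` (negated=True).
    The original database is left untouched.
    """
    new_db = dict(db)
    for sel_stm, negated, relation in hypotheses:
        operator = "except" if negated else "union"
        new_db[relation] = f"({new_db[relation]}) {operator} ({sel_stm})"
    return new_db


db = {"R2": "select 1 union select 3"}
db_prime = assume(db, [
    ("select R1.A from R1 where R1.A < 3", False, "R2"),
    ("select 3", True, "R2"),
])
```

Each hypothesis wraps the current definition, so later assumptions are applied on top of earlier ones.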

Example 3. Let db be the following database definition (for simplicity, we omit the schema A int for all the relations):

R1 := select 1 union select 2 union select 3;
R2 := $sel\_stm_{R2}$
      where $sel\_stm_{R2}$ ≡ select 1 union select 3 union select 5
            except select R1.A from R1 where R1.A=1 or R1.A=2;
R3 := select R2.A from R2 union select R3.A*2 from R3 where R3.A<5;

Consider the following hypothetical query:

query ≡ assume select R1.A from R1 where R1.A < 3 in R2,

select 3 not in R2

select R3.A from R3

Then $[\![query]\!]^{db} = [\![\texttt{select R3.A from R3}]\!]^{fix^{db'}}$, where $db' = (db)\theta\sigma$, with:
θ = [R2 := $sel\_stm'_{R2}$ / R2 := $sel\_stm_{R2}$],
σ = [R2 := $sel\_stm'_{R2}$ except select 3 / R2 := $sel\_stm'_{R2}$],
$sel\_stm'_{R2}$ ≡ $sel\_stm_{R2}$ union select R1.A from R1 where R1.A < 3.

Therefore db' is the following database:

R1 := select 1 union select 2 union select 3;
R2 := ((select 1 union select 3 union select 5
        except select R1.A from R1 where R1.A=1 or R1.A=2)
       union select R1.A from R1 where R1.A<3) except select 3;
R3 := select R2.A from R2 union select R3.A*2 from R3 where R3.A<5;

The computation of a simple query for an existing database is easy, because the value of $[\![sel\_stm]\!]^{db}$ is $[\![sel\_stm]\!]^{fix^{db}}$, and $fix^{db}$ is known and coincides with the instance of the database. The case of a hypothetical query sel_hyp requires additional explanation: its meaning is the interpretation of a select statement w.r.t. a new database db', where some relations have changed because the assumptions are incorporated into the corresponding relations. db' must be a stratifiable database in order to define the interpretation $fix^{db'}$. By taking advantage of the stratified semantics, the computation of $fix^{db'}$ can be simplified:

First, the dependency graph $DG_{db'}$ is an extension of $DG_{db}$, because $RN_{db'} = RN_{db}$, and every relation definition of db' is in db, except for the new relation definition R sch := $sel\_stm_R$ union|except $sel\_stm'$. The edges from the relations inside $sel\_stm_R$ to R are already in $DG_{db}$. So $DG_{db'}$ can be built from $DG_{db}$ as follows: for every $R' \in RN_{sel\_stm'}$, an edge from R' to R is added; it is negatively labeled in the except case or if $R' \in RN^{\neg}_{sel\_stm'}$. A stratification for db', $str' : RN_{db'} \to \{1, \ldots, n\}$, if it exists, satisfies $str'(R) \ge str(R)$, since (as already remarked) the select statement that defines R in db' contains the select statement $sel\_stm_R$, which defines R in db.

Second, in order to obtain $[\![sel(sel\_hyp)]\!]^{fix^{db'}}$, it is only necessary to compute $fix^{db'}(R')$ for the relations R' such that the relations in $RN_{sel(sel\_hyp)}$ depend on R'. In addition, $fix^{db'}$ does not have to be computed from stratum 1, as we will see. Let $i = str'(R)$ ($i = \min\{str'(R_j) \mid 1 \le j \le k\}$ in the general case, if assumptions for the relations $R_1, \ldots, R_k$ are considered); then $fix^{db'}(R') = fix^{db}(R')$ for every R' with $str'(R') < i$. Let $S = \{R'' \mid R' \in RN_{sel(sel\_hyp)} \text{ and } R' \text{ depends on } R''\}$; then $fix^{db'}$ can be obtained from $fix^{db}$ in the following way:

1. Compute $fix^{db'}_i(R')$ from $fix^{db}_{i-1}$ for every relation $R' \in S$ with $str'(R') = i$.

2. Compute $fix^{db'}_j(R')$ from $fix^{db'}_{j-1}$ for the relations $R' \in S$ with $str'(R') = j$, $j = i+1 .. str'(sel(sel\_hyp))$.

Example 4. Consider the stratifiable database db and the query of Example 3. Let str be a stratification for db such that str(R1) = 1, str(R2) = 2, str(R3) = 3. In this case, str is also a stratification for the modified database db', detailed in Example 3, needed to answer query.

It is easy to check that:
$fix^{db}(R1) = \{(1), (2), (3)\}$, $fix^{db}(R2) = \{(3), (5)\}$, $fix^{db}(R3) = \{(3), (5), (6)\}$.

In order to obtain $fix^{db'}$, notice that $RN_{sel(query)} = \{R3\}$. So $S = \{R'' \mid R' \in \{R3\} \text{ and } R' \text{ depends on } R''\} = \{R1, R2, R3\}$, but the computation can start at stratum 2 = str(R2), with $fix^{db'}_1 = fix^{db}_1$. R2 is the only relation in S in stratum 2:
$fix^{db'}_2(R2) = \{(1), (2), (5)\}$.

Similarly, for stratum 3, only $fix^{db'}_3(R3)$ must be computed to get the answer, even if db had other relations in this stratum:
$fix^{db'}_3(R3) = \{(1), (2), (4), (5), (8)\} = [\![\texttt{select R3.A from R3}]\!]^{fix^{db'}} = [\![query]\!]^{db}$.

4 The View Definition Language of HR-SQL

In this section we extend the definition language by allowing the definition of views, which essentially consists of assigning names to queries in order to use them as relation names inside other queries, or inside the query itself to express recursive queries. The syntax is as follows:

vd   ::= view ... view
view ::= V sch := sel_stm; | HV sch := sel_hyp;

We use V for names of views that are defined by a non-hypothetical query, and HV for hypothetical views. From now on, those symbols can be considered as elements of the set $RN_{db}$, as relation names.

We say that vd is a definition of views for db if the names involved in it are relation names of db or view names defined in vd. Mutually recursive definitions are allowed among non-hypothetical views, and their names can occur inside the definition of any view (hypothetical or not). Every hypothetical view can be recursive, but its name cannot appear inside the definition of other views, which means that in a definition of views of the form:

$V_1\ sch_1$ := $sel\_stm_1$; ... $V_m\ sch_m$ := $sel\_stm_m$;
$HV_1\ sch_1$ := $sel\_hyp_1$; ... $HV_r\ sch_r$ := $sel\_hyp_r$;

for every j = 1..m, $V_j$ can occur everywhere; for every j = 1..r, $HV_j$ can occur inside $sel(sel\_hyp_j)$, but not in $sel\_stm_1, \ldots, sel\_stm_m$, nor in $sel\_hyp_k$ if $k \ne j$, nor in the assumption part of $sel\_hyp_j$.

Example 5. Referring to Example 1, assume there is a volcanic eruption in El Hierro and the airspace must be closed in the archipelago, as well as the bus service on this island. But a boat from El Hierro to La Palma is added. The hypothetical view below can be defined in HR-SQL to represent the reachable cities in the Canary Islands in this situation.

reachable(ori varchar(10), des varchar(10)) :=
  assume (select * from bus where bus.ori = 'VDE'
          union select * from flight) not in link,
         select 'RES','SPC',1.5 in boat


where link.des = reachable.ori;

4.1 The Semantics of a Definition of Views

A view name identifies a query, so the meaning of a definition of views vd sets the correspondence between every view name in vd and the interpretation of the corresponding query. But this interpretation must consider the original database definition extended with the views defined in vd as new relations. As we will show, stratification must be extended to assign a stratum to every view name. Next, these ideas are formalized. First we consider the definition of a single view; then we generalize it to the definition of a sequence of views.

Definition 6. Let V sch := sel_stm be the definition of a non-hypothetical view for db. The meaning of V w.r.t. db, denoted by $[\![V]\!]^{db}$, is equal to $[\![sel\_stm]\!]^{db'}$, where db' is the result of extending db with V sch := sel_stm as a new relation.

Let HV sch := sel_hyp be the definition of a hypothetical view for db. The meaning of HV w.r.t. db, denoted by $[\![HV]\!]^{db}$, is equal to $[\![sel\_hyp]\!]^{db'}$, where db' is the result of extending db with HV sch := sel(sel_hyp) as a new relation.

In V sch := sel_stm, the value $[\![V]\!]^{db} = [\![sel\_stm]\!]^{db'} = [\![sel\_stm]\!]^{fix^{db'}}$ depends on the fixpoint of a new database definition, which should be stratifiable. db' is equal to db extended with V sch := sel_stm; it is not stratifiable if V appears in an except clause inside sel_stm, but otherwise the fixpoint of the new database is equal to that of db, except for the new relation V. Notice that $RN_{db'} = RN_{db} \cup \{V\}$, and no relation defined in db may depend on V. So, if k is the maximum stratum of db (with n relations), then a stratification str' for db' can be defined as $str' : RN_{db'} \to \{1, \ldots, n+1\}$, with $str'(R) = str(R)$ for every $R \in RN_{db}$, and $str'(V) = k+1$. Hence, for i = 1..k, $fix^{db'}_i = fix^{db}_i$. Therefore $fix^{db'} = fix^{db'}_{k+1} = \bigsqcup_{m \ge 0} (T^{db'}_{k+1})^m(fix^{db})$, so $fix^{db'}$ is an extension of the known $fix^{db}$, and only the last stratum k+1 for the relation V (the only one in this stratum) must be calculated to find $[\![sel\_stm]\!]^{fix^{db'}} = fix^{db'}_{k+1}(V)$.

The semantics for the case HV sch := assume sel_stm' [not] in R sel_stm requires modifying the original database in two ways:

1. $[\![HV]\!]^{db} = [\![sel\_hyp]\!]^{db'}$, according to Definition 6, where db' results from extending db with HV sch := sel_stm.

2. $[\![sel\_hyp]\!]^{db'} = [\![sel\_stm]\!]^{fix^{db''}}$, in accordance with Definition 5, where db'' = db'[R sch := $sel\_stm_R$ union | except sel_stm' / R sch := $sel\_stm_R$].

In 1, the original database is extended with a new relation, HV. Notice that, considering HV as a relation identifier, the added definition, HV sch := sel_stm, is syntactically correct (however, HV sch := sel_hyp is not allowed as a relation definition). In 2, the assumption is incorporated into the corresponding relation, as explained in Section 3.1. So, the new relation definitions in db'' are:

HV sch := sel(sel_hyp); R sch := $sel\_stm_R$ union | except sel_stm';

Then, the dependency graph $DG_{db''}$ can be built from $DG_{db}$ by adding new edges to the relation R, as explained before. But there is also a new node HV and new edges: for every $R' \in RN_{sel(sel\_hyp)}$, there is an edge from R' to HV, which is negatively labeled if $R' \in RN^{\neg}_{sel(sel\_hyp)}$.

As for the non-hypothetical case, a stratification of db'', $str' : RN_{db''} \to \{1, \ldots, n+1\}$, if it exists, may assign the stratum k+1 to HV. $fix^{db''}_k$ can be computed as explained in Section 3.1 for hypothetical queries, and the computation of $fix^{db''}_{k+1}$ will consider only HV, with $fix^{db''}_{k+1}(HV) = [\![sel\_stm]\!]^{fix^{db''}}$.

Example 6. Consider the database db of Example 3 and the hypothetical view:

HV A int := assume select R1.A from R1 where R1.A < 3 in R2, select 3 not in R2
            select R3.A from R3 union select HV.A*3 from HV where HV.A < 3;

Following Definition 6, $[\![HV]\!]^{db} = [\![\texttt{select R3.A from R3 union select HV.A*3 from HV where HV.A<3}]\!]^{fix^{db''}}$, where db'' is an extension of the database db' of Example 3 with:

HV A int := select R3.A from R3 union
            select HV.A*3 from HV where HV.A < 3;

A function str' extending str in such a way that str'(HV) = 4 is a stratification of the new database. For $1 \le i \le 3$, $fix^{db''}_i = fix^{db'}_i$, which appear in Example 4. Then, since $[\![HV]\!]^{db}$ coincides with $fix^{db''}(HV)$, it is only necessary to calculate $fix^{db''}_4(HV) = (\bigsqcup_{m \ge 0} (T^{db''}_4)^m(fix^{db''}_3))(HV) = \{(1), (2), (3), (4), (5), (6), (8)\}$.
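The chain of approximations leading to the value of HV in Example 6 can be made explicit. The following Python sketch lists the successive values of HV obtained by iterating its definition from the empty set, taking the tuples of R3 computed in Example 4 as given; the name `kleene_chain` is hypothetical, and one-attribute tuples are again represented by their single value.

```python
def kleene_chain(step):
    """Return the successive approximations of the least fixpoint of `step`,
    starting from the empty set and stopping when the value stabilizes."""
    chain = [frozenset()]
    while True:
        nxt = frozenset(step(chain[-1]))
        if nxt == chain[-1]:
            return chain
        chain.append(nxt)


# Tuples of R3 under the assumptions, as obtained in Example 4.
R3 = frozenset({1, 2, 4, 5, 8})

# HV := select R3.A from R3 union select HV.A*3 from HV where HV.A < 3
chain = kleene_chain(lambda hv: R3 | {a * 3 for a in hv if a < 3})
```

The first application yields the tuples of R3, and the second one adds 3 and 6, after which the value no longer changes.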

Next we deal with the case of simultaneous view definitions for a database db. The idea is that the semantics of vd associates to every view name in vd the interpretation of the query that defines the view. But, if there is more than one non-hypothetical view definition in vd, it is not valid to identify $[\![V]\!]^{db}$ with $[\![sel\_stm]\!]^{db'}$, where db' is the result of extending db with V sch := sel_stm. This is because names defined in vd other than V can occur inside sel_stm, while they are not defined in db'. The semantics of vd is therefore defined as follows:

Definition 7. Let db be a database and let

vd ::= $V_1\ sch_1$ := $sel\_stm_1$; ...; $V_m\ sch_m$ := $sel\_stm_m$;
       $HV_1\ sch_1$ := $sel\_hyp_1$; ...; $HV_r\ sch_r$ := $sel\_hyp_r$;

be a definition of views for db. The semantics of vd is defined as the mapping that associates $V_j$ to $[\![V_j]\!]^{db'}$, for j = 1..m, and $HV_j$ to $[\![HV_j]\!]^{db'}$, for j = 1..r, where db' is the result of extending db with:

$V_1\ sch_1$ := $sel\_stm_1$; ...; $V_m\ sch_m$ := $sel\_stm_m$;

Notice that, according to Definition 6, $[\![V_j]\!]^{db'} = [\![sel\_stm_j]\!]^{db''}$ for every j = 1..m, where db'' is the result of extending db' with $V_j\ sch_j$ := $sel\_stm_j$, but this definition is already in db', so db'' = db'. Since $HV_1, \ldots, HV_r$ do not appear in $sel\_stm_j$, their definitions are not required in db'. But for every $1 \le j \le r$, $[\![HV_j]\!]^{db'} = [\![sel\_hyp_j]\!]^{db''}$, where db'' is the result of extending db' with $HV_j\ sch_j$ := $sel(sel\_hyp_j)$, allowing $HV_j$ to be recursive.

In order to compute the answer of every view included in a simultaneous definition, hypothetical views can be postponed until the other views have been processed. As in the simple case, db' must be stratifiable. In practice, if db' is stratifiable, a stratification str' for db' such that $str'(V_j) > n$ for every $1 \le j \le m$ can be found. Then the interpretation $fix^{db'}$ can be obtained stratum by stratum, starting from $fix^{db}$, as in the simple case. Now, every hypothetical view can be treated separately, starting each time with $fix^{db'}$ as initial interpretation, and processing each view as in the simple case.

5 The HR-SQL System

We present a SWI-Prolog implementation for the HR-SQL language adapted for IBM DB2. The system, with a bundle of examples, is available at https://gpd.sip.ucm.es/trac/gpd/wiki/GpdSystems/HR-SQL. The structure of the system is depicted in Figure 2. The interface consists of a prompt 'hr-db2 =>' which works as an extension of the command interpreter of DB2. The user can submit any DB2 input to manage an existing database (label A in Figure 2), and also the following ones provided by HR-SQL (label B in Figure 2):

• load db <db file> loads an HR-SQL database definition from a file and computes the corresponding fixpoint. The resulting tuples for the relations are stored as DB2 tables.

• load vd <vd file> loads an HR-SQL definition of views from a file, computes the values for each view, and materializes them as DB2 tables.

• A hypothetical query written in HR-SQL syntax (sel_hyp), which is submitted to the system and recognized as such because it starts with assume.

Figure 2: The HR-SQL System.

These new statements are preprocessed by the SWI-Prolog module as shown in Figure 2. After parsing, the dependency graph is built and a stratification is generated (if it exists; an error is thrown otherwise). The current algorithm to compute the stratification tries to minimize the number of relations in each stratum. This improves the efficiency of the fixpoint computation w.r.t. [2], because now each stratum i contains only mutually recursive relations, which avoids processing the rest of them in each iteration of the fixpoint operator at stratum i. After the stratification, an SQL PL script is produced as will be explained in Section 5.1. This output is executed by DB2 (label C in Figure 2). The code generation for hypothetical views needs an additional process, which is shown in Section 5.2.

5.1 Computing the Fixpoint

Figure 3 shows the algorithm for generating the DB2 database corresponding to the fixpoint of an HR-SQL database definition. It produces the SQL statements (create and insert) needed to build such a database. This version enhances the one in [2] with the functions in and out, which will be explained later.

 1  for all R ∈ RN_db do create table R sch;
 2  i := 1
 3  while i ≤ numStr do
 4    for all R ∈ RN_i do insert into R out(sel_stm_R);
 5    repeat
 6      size := rel_size(RN_i)
 7      for all R ∈ RN_i do
 8        insert into R in(sel_stm_R) except select * from R;
 9    until size = rel_size(RN_i)
10    i := i + 1

Figure 3: Algorithm to Compute the Fixpoint

The algorithm considers a concrete stratification for the database, where numStr denotes the number of strata and $RN_i$ the set of relations of stratum i. First of all, a table is created for each relation R sch := sel_stm_R of the database (line 1). Then, the external while (lines 3-10) computes successively the fixpoints $fix^{db}_1, fix^{db}_2, \ldots, fix^{db}_{numStr}$. According to the theory, each $fix^{db}_i$ is calculated for every relation of $RN_i$ by iterating the operators $T^{db}_i$ of Definition 3, i.e., the repeat (lines 5-9) at iteration n computes $(T^{db}_i)^n(fix^{db}_{i-1})$. The loop is iterated while some tuple is added to the tables of the current stratum; the variable size is used to check whether some tuple has been added to some relation of the current stratum.

This algorithm improves the efficiency of the one introduced in [2] by reducing the work in the iterations of the repeat with the functions in and out. The idea is that the iteration of the operator $T^{db}_i$ is only needed for recursive relations; in fact, only for the recursive fragment of the select statements defining those relations. The functions in and out split each sel_stm into the (recursive) fragment that must be used in the insert statements inside the loop (line 8), and the fragment that can be processed before the loop, as the base case of the recursive definition (line 4). The in and out fragments of a sel_stm can be easily determined using the stratum of its components because, as mentioned before, the stratification is such that if a relation R in stratum i depends on another relation R′, then either the stratum of R′ is lower than i, so R′ has already been computed, or it is exactly i (if they are mutually recursive) and both relations must be computed simultaneously. Therefore, if for instance R := sel_stm1 union sel_stm2, str(R) = i, and str(sel_stm1) < i, then sel_stm1 will be part of the out fragment, and the corresponding tuples can be inserted before the loop, because the relations involved have already been computed in a previous stratum. Functions in and out are defined by recursion on the structure of sel_stm. For example, if sel_stm ≡ ss1 except ss2 and str(sel_stm) = i, then str(ss2) < i, so:

in(sel_stm) = in(ss1) except ss2; out(sel_stm) = out(ss1) except ss2.
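The splitting performed by in and out can be sketched in Python as follows. This is only an illustrative sketch, not the code of the HR-SQL system: the classes Select, Union and Except, the function names split and render, and the convention that an empty fragment is represented by None are all assumptions made here. The union and select cases are inferred from the informal description above; only the except case is stated explicitly in the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Select:
    sql: str                    # text of a basic SELECT ... FROM ... WHERE ...
    relations: Tuple[str, ...]  # relation names used in its FROM clause


@dataclass(frozen=True)
class Union:
    left: object
    right: object


@dataclass(frozen=True)
class Except:
    left: object
    right: object  # by stratification, always of a lower stratum


def _join(a, b):
    """Union of two optional fragments (None stands for an empty fragment)."""
    if a is None:
        return b
    if b is None:
        return a
    return Union(a, b)


def split(stm, i: int, strata: Dict[str, int]):
    """Return (in_part, out_part) of stm for stratum i; either may be None."""
    if isinstance(stm, Select):
        # A basic select is recursive iff it reads a relation of stratum i.
        recursive = any(strata[r] == i for r in stm.relations)
        return (stm, None) if recursive else (None, stm)
    if isinstance(stm, Union):
        in_l, out_l = split(stm.left, i, strata)
        in_r, out_r = split(stm.right, i, strata)
        return _join(in_l, in_r), _join(out_l, out_r)
    if isinstance(stm, Except):
        # in(ss1 except ss2) = in(ss1) except ss2, and likewise for out.
        in_l, out_l = split(stm.left, i, strata)
        return (Except(in_l, stm.right) if in_l is not None else None,
                Except(out_l, stm.right) if out_l is not None else None)
    raise TypeError(f"unexpected statement: {stm!r}")


def render(stm) -> str:
    """Turn a fragment back into SQL text."""
    if isinstance(stm, Select):
        return stm.sql
    op = "UNION" if isinstance(stm, Union) else "EXCEPT"
    return f"({render(stm.left)}) {op} ({render(stm.right)})"
```

For a transitive-closure definition such as R := (select over a lower-stratum relation) union (select joining R with that relation), split returns the second select as the in fragment and the first one as the out fragment, which matches the base case and recursive case described above.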

5.2 Computing Hypothetical Views

The SQL PL script generated to process views follows the ideas of Section 4.1. We use the view reachable of Example 5 to illustrate the steps the system takes to solve a hypothetical view definition. It is interesting because it is a recursive definition containing positive and negative assumptions.

First of all, the system extends the original dependency graph with the new edges due to hypothetical assumptions: two negatively labeled edges to link, one from bus, and another from flight. Due to the expanded form of stratification we have defined, the stratification for the original database is also a stratification for the new one. Following the explanations of Section 3.1, the system looks for those relations that must be recomputed to obtain the tuples of the view reachable, in this case only boat and link. The algorithm that generates the SQL statements for computing these relations and the new view is quite similar to that presented in Figure 3 to compute the fixpoint of a database. Next we explain the differences following the example.

The relations needed to compute the view are locally created and recomputed using temporary tables, and the computation will start at stratum i = min{str(boat), str(link)}:

DECLARE GLOBAL TEMPORARY TABLE link LIKE link;

DECLARE GLOBAL TEMPORARY TABLE boat LIKE boat;

INSERT INTO SESSION.boat

((SELECT ’TFS’,’GMZ’,1) UNION (SELECT ’GMZ’,’VDE’,1.5) UNION

(SELECT ’SPC’,’TFN’,2 l) UNION (SELECT ’RES’,’SPC’,1.5));

INSERT INTO SESSION.link

(SELECT * FROM flight UNION SELECT * FROM SESSION.boat UNION

SELECT * FROM bus EXCEPT (SELECT * FROM bus WHERE bus.ori = ’VDE’)

UNION SELECT * FROM flight);

Temporary tables are prefixed with SESSION. For processing a hypothetical view HV := sel_hyp_HV, the script to compute the tuples of HV will consider the definition HV := sel_stm, where sel_stm results from replacing R by SESSION.R in sel(sel_hyp_HV). The tuples for reachable are materialized and stored, then the temporary tables are discarded. Temporary tables are adequate as they are in-memory data structures.

The computation of hypothetical queries follows the same steps, but instead of creating tables, a cursor is used to obtain the answer without materializing it.

6 Conclusions and Future Work

We have designed a practical, formally-supported SQL system, porting some techniques from the deductive database field to the relational one. Thus, we provide an original way to give semantics to SQL languages supporting recursion. In addition, our system allows both less-limited recursion (w.r.t. current SQL systems) and hypothetical reasoning (as a novel addition to such systems), acting as a front-end to DB2. Although targeted to this system, our work can be straightforwardly applied to any other SQL system. However, it can be improved in a number of ways. With respect to recursion, in-memory indexing can be applied for small search keys. These keys can be identified as the candidate keys derived from explicit functional dependencies (as already allowed in DB2) and primary keys. Also, both general and particular optimization methods can be applied to our work. For the first, the differential semi-naïve algorithm [15] saves tuples in recursive joins along fixpoint iterations. For the second sort of method, already-known linear recursion optimizations [11] can also be applied by analyzing the dependency graph and easily detecting such cases. With respect to hypothetical queries and views, we plan to extend the definition language, allowing mutual recursion in hypothetical views. Finally, we can extend this work by allowing not only materialized views, but also regular views. For this, table functions (cf. IBM DB2 concepts) can be used as a natural construction to build HR-SQL query results on-the-fly.

References

[1] G. Aranda-Lopez, S. Nieva, F. Saenz-Perez, and J. Sanchez-Hernandez. An extended constraint deductive database: Theory and implementation. The Journal of Logic and Algebraic Programming, 2013.

[2] G. Aranda-Lopez, S. Nieva, F. Saenz-Perez, and J. Sanchez-Hernandez. Formalizing a Broader Recursion Coverage in SQL. In Symposium on Practical Aspects of Declarative Languages (PADL'13), volume 7752 of LNCS, 2013. In Press.

[3] A. J. Bonner. Hypothetical datalog: Negation and linear recursion. In The ACM Symposium on the Principles of Database Systems (PODS), pages 286–300, 1989.

[4] A. J. Bonner and M. Kifer. Transaction logic programming. In ICLP, pages 257–279, 1993.

[5] A. J. Bonner and L. T. McCarty. Adding negation-as-failure to intuitionistic logic programming. In E. L. Lusk and R. A. Overbeek, editors, Logic Programming, Proc. of the North American Conference, pages 681–703. The MIT Press, 1989.

[6] H. Christiansen and T. Andreasen. A Practical Approach to Hypothetical Database Queries. In Transactions and Change in Logic Databases, volume 1472 of LNCS, pages 340–355. Springer, 1998.

[7] E. Codd. A Relational Model for Large Shared Databanks. Communications of the ACM, 13(6):377–390, June 1970.

[8] C. J. Date. SQL and relational theory: how to write accurate SQL code. O'Reilly, Sebastopol, CA, 2009.

[9] T. Griffin and R. Hull. A framework for implementing hypothetical queries. In SIGMOD Conference, pages 231–242, 1997.

[10] W. H. Inmon. Building the data warehouse. QED Information Sciences, Inc., Wellesley, MA, USA, 2005.

[11] C. Ordonez. Optimization of Linear Recursive Queries in SQL. IEEE Transactions on Knowledge and Data Engineering, 22(2):264–277, 2010.

[12] F. Saenz-Perez. Datalog Educational System, October 2011. http://des.sourceforge.net/.

[13] M. Stonebraker and K. Keller. Embedding expert knowledge and hypothetical data bases into a data base system. In The 1980 ACM SIGMOD International Conference on Management of Data, SIGMOD '80, pages 58–66. ACM, 1980.

[14] A. Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5:285–309, 1955.

[15] J. Ullman. Principles of Database and Knowledge-Base Systems Vols. I (Classical Database Systems) and II (The New Technologies). Computer Science Press, 1989.

Electronic Communications of the EASST
Volume 64 (2013)

Proceedings of the XIII Spanish Conference on Programming and Computer Languages (PROLE 2013)

R-SQL: An SQL Database System with Extended Recursion¹

Gabriel Aranda, Susana Nieva, Fernando Saenz-Perez and Jaime Sanchez-Hernandez

18 pages

Guest Editors: Clara Benac Earle, Laura Castro, Lars-Ake Fredlund
Managing Editors: Tiziana Margaria, Julia Padberg, Gabriele Taentzer
ECEASST Home Page: http://www.easst.org/eceasst/ ISSN 1863-2122

¹ This work has been partially supported by the Spanish projects TIN2013-44742-C4-3-R (CAVI-ART), TIN2008-06622-C03-01 (FAST-STAMP), S2009/TIC-1465 (PROMETIDOS), and GPD-UCM-A-910502.


Abstract:

The relational database language SQL:1999 standard supports recursion, but this approach is limited to the linear case. Moreover, mutual recursion is not supported, and negation cannot be combined with recursion. We designed the language R-SQL to overcome these limitations in [ANSS13], improving termination properties in recursive definitions. In addition, we developed a proof-of-concept implementation of an R-SQL system. In this paper we describe in detail an improved system enhancing performance. It can be integrated into existing RDBMSs, extending them with the aforementioned benefits of R-SQL. The system processes an R-SQL database definition, obtaining its extension in tables of an RDBMS (such as PostgreSQL and DB2). It is implemented in SWI-Prolog and it produces a Python script that, upon execution, computes the result of the R-SQL relations. We provide some performance results showing the efficiency gains w.r.t. the previous version. We also include a comparative analysis including some representative relational and deductive systems.

Keywords: Databases, SQL, Recursion, Fixpoint Semantics

1 Introduction

Recursion is a powerful tool nowadays included in almost all programming systems. However, for current implementations of the declarative programming language SQL, this tool is heavily compromised or even not supported at all (MySQL, MS Access, ...). Those systems including recursion suffer from several drawbacks. Linearity is required, so that relation definitions with calls to more than one recursive relation are not allowed. Mutual recursion and query solving involving an EXCEPT clause are not supported. In general, termination is manually controlled by limiting the number of iterations instead of detecting that there are no further opportunities to develop new tuples. Duplicate discarding is not supported and, so, queries that are actually terminating are not detected as such.

Starburst [MP94] was the first non-commercial RDBMS to implement recursion, whereas IBM DB2 was the first commercial one. The ANSI/ISO standard SQL:1999 included recursion in SQL for the first time. Today, we can find recursion in several systems: IBM DB2, Oracle, MS SQL Server, HyperSQL and others, with the aforementioned limitations.

In [ANSS13] we proposed a new approach, called R-SQL, aimed at overcoming these limitations and others, allowing in particular cycles in recursive definitions of graphs and mutually recursive relation definitions. In order to combine recursion and negation, we applied ideas from the deductive database field, such as stratified negation, based on the definition of a dependency graph between the relations involved in the database [Ull89]. We developed a formal framework following the original relational data model [Cod70], therefore avoiding both duplicates and nulls (as encouraged by [Dat09]). We used a stratified fixpoint semantics and we presented an R-SQL database system as a prototype implementing such a formal framework. The system can be downloaded from https://gpd.sip.ucm.es/trac/gpd/wiki/GpdSystems/RSQL. In this work, we describe in detail an improved version enhancing performance. The system processes an R-SQL database definition, obtaining its extension in tables of an RDBMS (such as PostgreSQL and DB2). It is implemented in SWI-Prolog and it generates a Python script that, upon execution, computes the result of the R-SQL relations. The improvements in efficiency rely on a new stratification and a more elaborate version of the fixpoint calculation algorithm that avoids recomputations along iterations. The new system is available at https://gpd.sip.ucm.es/trac/gpd/wiki/GpdSystems/RSQLplus. In addition, we experiment with some previously proposed optimizations [Ull89] to improve the performance of the fixpoint computation.

Related academic approaches include DLV DB [TLLP08], LDL++ [AOT+03] (now abandoned and replaced by DeALS, which does not refer to SQL queries up to now), and DES [SP13]. The first one, resulting from a spin-off at Calabria University, is the closest to our work, as it produces SQL code to be executed in the external database with a semi-naïve strategy, but it lacks formal support for its proposal and does not describe non-linear recursion. The last two also allow connecting to external databases, but the processing of recursive SQL queries is done in memory.

The paper is organized as follows: In Section 2 we recall the syntax and the meaning of R-SQL database definitions. Section 3 describes the system, including the new form of stratification, the fixpoint algorithm, and some performance measurements, showing the efficiency gains for several optimizations. We also include a comparative analysis including some representative relational and deductive systems. Conclusions and future work are summarized in Section 4.

2 Introducing R-SQL

In this section, we present an overview of the language R-SQL, which is focused on the incorporation of recursive relation definitions. The idea is simple and effective: A relation is defined with an assignment operation as a named query (view) that can contain a self reference, i.e., a relation R can be defined as R sch := SELECT ... FROM ... R ..., where sch is the relation schema.

2.1 The Definition Language of R-SQL

The formal syntax of R-SQL is defined by the grammar in Figure 1. In this grammar, productions start with lowercase letters whereas terminals start with uppercase (SQL terminal symbols use small caps). As usual, optional statements are delimited by square brackets and alternative sentences are separated by pipes.

db      ::= R sch := sel_stm; ... R sch := sel_stm;
sch     ::= (A T, ..., A T)
sel_stm ::= SELECT exp, ..., exp [FROM R, ..., R [WHERE wcond]]
          | sel_stm UNION sel_stm
          | sel_stm EXCEPT R
exp     ::= C | R.A | exp opm exp | −exp
wcond   ::= TRUE | FALSE | exp opc exp | NOT(wcond) | wcond [AND | OR] wcond
opm     ::= + | − | / | ∗
opc     ::= = | <> | < | > | >= | <=

R stands for relation names, A for attribute names, T for standard SQL types and C for constants belonging to a valid SQL type.

Figure 1: A Grammar for the R-SQL Language

The language R-SQL overcomes some limitations present in current RDBMSs following SQL:1999. These languages use NOT IN and EXCEPT clauses to deal with negation, and WITH RECURSIVE to engage recursion. As pointed out in [GUW09], SQL:1999 does not allow an arbitrary collection of mutually recursive relations to be written in the WITH RECURSIVE clause.

A bundle of R-SQL database examples can be found with the system distribution. Next, we present some of them to show the expressiveness of the definition language. Each of them is intended to illustrate a concrete aspect of the language in a simple and concise way. Section 3 explores a larger and more natural example.

Mutual Recursion Although any mutual recursion can be converted to direct recursion by inlining [KRP93], our proposal allows mutually recursive relations to be defined explicitly, which is an advantage in terms of program readability and maintenance. For instance, the following R-SQL database defines the relations even and odd, as the classical specification of even and odd numbers up to a bound (100 in the example):

even(x float) := SELECT 0 UNION SELECT odd.x+1 FROM odd WHERE odd.x<100;

odd(x float) := SELECT even.x+1 FROM even WHERE even.x<100;

Nonlinear Recursion Standard SQL restricts the number of allowed recursive calls to only one. Here we show how to specify Fibonacci numbers in R-SQL¹:

fib(n float, f float) := SELECT 0,1 UNION SELECT 1,1 UNION
  SELECT fib1.n+1,fib1.f+fib2.f FROM fib1,fib2
  WHERE fib1.n=fib2.n+1 AND fib1.n<10;

¹ The relations fib1 and fib2 simply represent two aliases for fib, which are necessary because, for simplicity, we have not introduced the usual syntax for renamings in the grammar of Figure 1.

Duplicates and Termination Non-termination is another problem that arises with recursion when it is coupled with duplicates. For instance, the following standard SQL query (which considers a finite relation t) makes current systems either reject the query or go into an infinite loop (some systems, such as DB2, allow imposing a maximum number of iterations as a simple termination condition):

WITH RECURSIVE v(a) AS (SELECT * FROM t UNION ALL SELECT * FROM v)
SELECT * FROM v

Nevertheless, the fixpoint computation for the corresponding R-SQL relation:

v(a float) := SELECT * FROM t UNION SELECT * FROM v;

guarantees termination because duplicates are discarded² and v does not grow unbounded. The very same termination problem also happens in current RDBMSs with the basic transitive closure over graphs including cycles, but not in R-SQL, which ensures termination for finite graphs.
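The difference between the two readings of v can be illustrated with a small Python sketch. It is not part of R-SQL; the function names iterate_bag and iterate_set and the sample relation t are assumptions made for illustration. Lists model the bag semantics of UNION ALL, and sets model the duplicate-free semantics of R-SQL.

```python
def iterate_bag(t, rounds):
    """Bag semantics (UNION ALL): v := t UNION ALL v, for a fixed number of rounds.

    Returns the size of v after each round; the size never stabilizes,
    so an iteration limit is the only way to stop.
    """
    v = []
    sizes = []
    for _ in range(rounds):
        v = list(t) + v  # duplicates are kept
        sizes.append(len(v))
    return sizes


def iterate_set(t):
    """Set semantics (UNION): v := t UNION v, iterated until nothing changes.

    Returns the final relation and the number of rounds that were needed.
    """
    v = set()
    rounds = 0
    while True:
        nxt = set(t) | v  # duplicates are discarded
        rounds += 1
        if nxt == v:
            return v, rounds
        v = nxt


t = [(1.0,), (2.0,)]
bag_sizes = iterate_bag(t, 4)
fixpoint, needed = iterate_set(t)
```

With bags the relation grows by |t| tuples in every round, while with sets the second round already adds nothing and the loop stops by itself.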

2.2 The meaning of an R-SQL database definition

In [ANSS13] we formalized an operational semantics for the language R-SQL based on stratified negation and fixpoint theory; here we summarize the main ideas.

Stratification is based on the definition of a dependency graph $DG_{db}$ for an R-SQL database db, which is a directed graph whose nodes are the relation names defined in db and whose edges, which can be negatively labelled, are determined as follows. A relation definition of the form R sch := sel_stm in db produces edges in the graph from every relation name inside sel_stm to R. Those edges produced by the relation name that is just to the right of an EXCEPT are negatively labelled.

If there are n relations defined in db, and we denote by RN the set of the relation names defined in db, a stratification of db is a mapping $str : RN \to \{1, \ldots, n\}$ such that for every two relations $R_1, R_2 \in RN$ it satisfies:

• $str(R_1) \le str(R_2)$, if there is a path from $R_1$ to $R_2$ in $DG_{db}$,

• $str(R_1) < str(R_2)$, if there is a path from $R_1$ to $R_2$ in $DG_{db}$ with at least one negatively labelled edge.

² Note that UNION does not require ALL, as current RDBMSs do.

An R-SQL database db is stratifiable if there exists a stratification for it. We denote by numStr the maximum stratum of the elements of RN.
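The two conditions can be checked mechanically. The following Python sketch is an illustration under assumptions of our own (the function name is_stratification and the encoding of edges as triples are not taken from the R-SQL system). It checks the conditions edge by edge, which is equivalent to the path-based formulation: every edge is a path of length one, and the inequalities compose along longer paths.

```python
def is_stratification(relations, edges, strat):
    """Check whether strat is a stratification of a dependency graph.

    relations: list of relation names (the set RN)
    edges:     iterable of (source, target, negative) triples
    strat:     dict mapping each relation name to its stratum
    """
    n = len(relations)
    # The mapping must take values in {1, ..., n}.
    if any(not (1 <= strat[r] <= n) for r in relations):
        return False
    for src, dst, negative in edges:
        if strat[src] > strat[dst]:
            return False  # violates str(R1) <= str(R2)
        if negative and strat[src] == strat[dst]:
            return False  # a negative edge requires a strict increase
    return True
```

A graph with a cycle through a negatively labelled edge admits no mapping that passes this check, which is exactly the non-stratifiable case.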

Intuitively, a relation name preceded by an EXCEPT operator plays the role of a negated predicate (relation) in the deductive database field. A stratification-based solving procedure ensures that when a relation that contains an EXCEPT in its definition is going to be calculated, the meaning of the inner negated relation has been completely evaluated, avoiding nonmonotonicity, as is widely studied in Datalog [Ull89].

We say that an interpretation I is the relationship between every relation name R and its instance I(R). Interpretations are classified by strata; an interpretation belonging to a stratum i gives meaning to the relations of strata less than or equal to i. If $I_1$, $I_2$ are two interpretations of stratum i, we say that $I_1$ is less than or equal to $I_2$ at stratum i, denoted by $I_1 \sqsubseteq_i I_2$, if the following conditions are satisfied for every $R \in RN$:

• $I_1(R) = I_2(R)$, if $str(R) < i$.

• $I_1(R) \subseteq I_2(R)$, if $str(R) = i$.

The meaning of every sel_stm w.r.t. an interpretation I can be understood as the set of tuples (in the current instance represented by I) associated to the corresponding equivalent RA-expression, denoted by $[\mathit{sel\_stm}]^I$. This RA-expression is defined as follows:³

• $[\texttt{SELECT } exp_1, \ldots, exp_k \texttt{ FROM } R_1, \ldots, R_m \texttt{ WHERE } wcond]^I = \pi_{exp_1, \ldots, exp_k}(\sigma_{wcond}(I(R_1) \times \ldots \times I(R_m)))$

• $[\mathit{sel\_stm}_1 \texttt{ UNION } \mathit{sel\_stm}_2]^I = [\mathit{sel\_stm}_1]^I \cup [\mathit{sel\_stm}_2]^I$

• $[\mathit{sel\_stm} \texttt{ EXCEPT } R]^I = [\mathit{sel\_stm}]^I - I(R)$

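Example 1 Consider the definitions of the relations odd and even of Section 2. Let us assume a concrete interpretation I such that $I(\texttt{even}) = \{(0),(2)\}$ and $I(\texttt{odd}) = \emptyset$. Hence, the interpretation of the select statement that defines the relation odd w.r.t. I is:

$$\begin{aligned} & [\texttt{SELECT even.x+1 FROM even WHERE even.x < 100}]^I \\ &\quad = \{(\texttt{even.x}+1)[a/\texttt{even.x}] \mid (a) \in I(\texttt{even}),\ (\texttt{even.x} < 100)[a/\texttt{even.x}] \text{ is satisfied}\} \\ &\quad = \{(1),(3)\} \end{aligned}$$

The case of the relation even is analogous:

$$\begin{aligned} & [\texttt{SELECT 0 UNION SELECT odd.x+1 FROM odd WHERE odd.x < 100}]^I \\ &\quad = [\texttt{SELECT 0}]^I \cup [\texttt{SELECT odd.x+1 FROM odd WHERE odd.x < 100}]^I \\ &\quad = \{(0)\} \cup \{(\texttt{odd.x}+1)[a/\texttt{odd.x}] \mid (a) \in I(\texttt{odd}),\ (\texttt{odd.x} < 100)[a/\texttt{odd.x}] \text{ is satisfied}\} \\ &\quad = \{(0)\} \end{aligned}$$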
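The computation of Example 1 can be replayed with a minimal Python sketch of the three semantic equations. The helper select and the encoding of expressions and conditions as Python functions over an environment are assumptions made for illustration; UNION and EXCEPT are simply mapped to set union and set difference. Since the grammar has no renamings, the sketch assumes that each relation name appears at most once in a FROM clause.

```python
from itertools import product


def select(interp, exps, rels=(), cond=lambda env: True):
    """[SELECT exps FROM rels WHERE cond]^I as a set of tuples.

    interp: dict relation name -> set of tuples (the interpretation I)
    exps:   list of functions env -> value (the projected expressions)
    rels:   relation names of the FROM clause (each used at most once)
    cond:   function env -> bool (the WHERE condition)
    env maps each relation name to the tuple currently chosen from it.
    """
    result = set()
    # Cartesian product I(R1) x ... x I(Rm), then selection, then projection.
    for rows in product(*(interp[r] for r in rels)):
        env = dict(zip(rels, rows))
        if cond(env):
            result.add(tuple(e(env) for e in exps))
    return result


# The interpretation of Example 1.
I = {"even": {(0,), (2,)}, "odd": set()}

# odd(x) := SELECT even.x+1 FROM even WHERE even.x<100
odd_def = select(I, [lambda e: e["even"][0] + 1], ["even"],
                 lambda e: e["even"][0] < 100)

# even(x) := SELECT 0 UNION SELECT odd.x+1 FROM odd WHERE odd.x<100
even_def = select(I, [lambda e: 0]) | select(
    I, [lambda e: e["odd"][0] + 1], ["odd"], lambda e: e["odd"][0] < 100)
```

Here odd_def and even_def hold the two sets obtained in Example 1.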

Notice that the interpretation I defined by:

$$I(\texttt{even}) = \{(0),(2),\ldots,(100)\} \quad \text{and} \quad I(\texttt{odd}) = \{(1),(3),\ldots,(99)\}$$

satisfies:

$$I(\texttt{even}) = [\texttt{SELECT 0 UNION SELECT odd.x+1 FROM odd WHERE odd.x < 100}]^I$$

$$I(\texttt{odd}) = [\texttt{SELECT even.x+1 FROM even WHERE even.x < 100}]^I$$

³ Notice that arithmetic expressions are allowed as arguments in projection (π) and select (σ) operations.

So, to give meaning to a database definition, we are interested in an interpretation, called $fix$, such that for every $R \in RN$, if sel_stm is the definition of R, then $fix(R) = [\mathit{sel\_stm}]^{fix}$. In the previous example $fix$ will be I. Since R can occur inside its definition, for every stratum i, the appropriate interpretation $fix_i$ that gives the complete meaning to each relation of stratum i is the least fixpoint of a continuous operator. These fixpoint interpretations are sequentially constructed from stratum 1 to numStr. The interpretation $fix$ represents the fixpoint of the last stratum and provides the semantics for the whole database.

For every i, $1 \le i \le numStr$, we define the continuous operator $T_i$ that transforms interpretations belonging to a stratum i as follows:

• $T_i(I)(R) = I(R)$, if $str(R) < i$.

• $T_i(I)(R) = [\mathit{sel\_stm}]^I$, if $str(R) = i$ and R sch := sel_stm is the definition of R in db.

• $T_i(I)(R) = \emptyset$, if $str(R) > i$.

The operator $T_1$ has a least fixpoint, which is $\bigsqcup_{n \ge 0} T_1^n(\emptyset)$, where $\emptyset(R) = \emptyset$ for every $R \in RN$. We will denote $\bigsqcup_{n \ge 0} T_1^n(\emptyset)$ by $fix_1$, i.e., $fix_1$ represents the least fixpoint at stratum 1.

Consider now the sequence $\{T_2^n(fix_1)\}_{n \ge 0}$ of interpretations of stratum 2, greater than $fix_1$. Using the definition of $T_i$ and the fact that $fix_1(R) = \emptyset$ for every R such that $str(R) \ge 2$, it is easy to prove, by induction on $n \ge 0$, that this sequence is a chain:

$$fix_1 \sqsubseteq_2 T_2(fix_1) \sqsubseteq_2 T_2(T_2(fix_1)) \sqsubseteq_2 \ldots \sqsubseteq_2 T_2^n(fix_1), \ldots$$

$\{T_2^n(fix_1)\}_{n \ge 0}$ is a chain that has a least upper bound, $\bigsqcup_{n \ge 0} T_2^n(fix_1)$, which is the least fixpoint of $T_2$ containing $fix_1$. We denote this interpretation by $fix_2$. By proceeding successively in the same way it is possible to find $fix_{numStr}$. In [ANSS13] we have proved that $fix_{numStr}$ is the interpretation $fix$ we are looking for, which associates the set of tuples denoted by its definition to every relation of the database.
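The stratum-by-stratum construction can be sketched directly over Python sets. This is an illustration only: the function stratified_fixpoint and the encoding of each definition as a Python function from an interpretation to a set of tuples are assumptions, and the loop stops when an application of the operator adds nothing, which is adequate for finite relations such as even and odd.

```python
def stratified_fixpoint(defs, strata):
    """Compute fix by iterating the operator of each stratum in turn.

    defs:   dict relation name -> function(interp) -> set of tuples,
            playing the role of [sel_stm]^I
    strata: dict relation name -> stratum number
    """
    interp = {r: set() for r in defs}  # the empty interpretation
    for i in range(1, max(strata.values()) + 1):
        current = [r for r in defs if strata[r] == i]
        while True:
            # One application of T_i: relations of stratum i are recomputed
            # from the same interpretation, lower strata are left untouched.
            new = {r: defs[r](interp) for r in current}
            if all(new[r] == interp[r] for r in current):
                break  # fix_i has been reached
            interp.update(new)
    return interp


# The even/odd database of Section 2; both relations share stratum 1.
defs = {
    "even": lambda I: {(0,)} | {(x + 1,) for (x,) in I["odd"] if x < 100},
    "odd": lambda I: {(x + 1,) for (x,) in I["even"] if x < 100},
}
fix = stratified_fixpoint(defs, {"even": 1, "odd": 1})
```

The result assigns the even numbers from 0 to 100 to even and the odd numbers from 1 to 99 to odd, which is the interpretation described after Example 1.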

3 The Improved R-SQL System

Here we present the R-SQL system, which is based on the fixpoint construction of the previous section. We describe its structure, focusing on the improvements that increase the efficiency of the previous prototype, presented in [ANSS13]. These enhancements are essentially due to the stratification described in Section 3.1 and to the factoring-out process incorporated in the fixpoint algorithm presented in Section 3.2.

As we show in Figure 2, the system is loaded in SWI-Prolog to process an R-SQL database definition. First, the system parses the input database, then it builds the dependency graph and the stratification if it exists (it raises an error otherwise); finally, it produces a Python script that will create the SQL database in an RDBMS. After this process, the user can connect to the RDBMS in order to query or modify the database. Although we are referring to PostgreSQL in the concrete implementation https://gpd.sip.ucm.es/trac/gpd/wiki/GpdSystems/RSQLplus, it can be straightforwardly applied to other systems.

Figure 2: R-SQL System Structure.

Next we present a database for flights that illustrates the process and will be the working example for the rest of the section. As usual, the information about direct flights can be composed of the city of origin, the city of destination, and the length of the flight. Cities (Lisbon, Madrid, Paris, London, New York) will be represented with constants (lis, mad, par, lon, ny, resp.). The relation reachable consists of all the possible trips between the cities of the database, possibly concatenating more than one flight. The relation travel is analogous but also gives time information about alternative trips.

flight(frm varchar(10), to varchar(10), time float) :=
  SELECT 'lis','mad',1.0 UNION SELECT 'mad','par',1.5 UNION
  SELECT 'par','lon',2.0 UNION SELECT 'lon','ny',7.0 UNION
  SELECT 'par','ny',8.0;

reachable(frm varchar(10), to varchar(10)) :=
  SELECT flight.frm, flight.to FROM flight UNION
  SELECT reachable.frm, flight.to
  FROM reachable,flight WHERE reachable.to = flight.frm;

travel(frm varchar(10), to varchar(10), time float) :=
  SELECT flight.frm, flight.to, flight.time FROM flight UNION
  SELECT flight.frm, travel.to, flight.time+travel.time
  FROM flight, travel WHERE flight.to = travel.frm;

Figure 3: $DG_{db}$ of the working example.

Both reachable and travel represent transitive closures of the relation flight. Notice that if flight has a cycle, then the relation travel, which includes times for each trip, is infinite, while reachable is not. As pointed out before, reachable can be finitely computed in our system. But, as travel would produce an infinite set of different tuples, some computation limitation would have to be imposed (such as a maximum time for a travel). However, this is not a drawback of our approach, but an issue due to using infinite relations (built with arithmetic expressions). The relation madAirport contains travels departing from or arriving in Madrid, while avoidMad contains the possible travels that neither begin nor end in Madrid.

madAirport(frm varchar(10), to varchar(10)) :=
  SELECT reachable.frm, reachable.to FROM reachable
  WHERE (reachable.frm = 'mad' OR reachable.to = 'mad');

avoidMad(frm varchar(10), to varchar(10)) :=
  SELECT reachable.frm, reachable.to FROM reachable EXCEPT madAirport;

This definition includes negation together with recursive relations. This combination cannot be expressed in SQL:1999, as shown in [FMMP96]. The dependency graph of this database is depicted in Figure 3, where negatively labelled edges are annotated with ¬.

3.1 Stratification

Given a database and its dependency graph, there can be a number of different stratifications for it. For instance, for the dependency graph of Figure 4 a possible stratification can assign stratum 1 to the relations {a,b,c,d,e} and stratum 2 to {f,g}.

For the graph of Figure 4, intuitively it is easy to see that only b and c must belong to the same stratum due to the mutual dependency between them. The next algorithm minimizes the number of relations in each stratum, which enhances the efficiency of the fixpoint computation as shown in Section 3.2.

• Compute the strongly connected components C from $DG_{db}$. Negative labels are not relevant initially, but once the components are evaluated, it must be checked whether there exists some cycle with a negatively labelled edge. In such a case, db is not stratifiable and the computation stops. For the example of Figure 4 the components are {a}, {f}, {g}, {b,c}, {d} and {e}.

• Collapse each strongly connected component, obtaining a new graph with a node for each component C, and with an edge from C to C′ if and only if C contains a relation R and C′ contains a relation R′ such that there is an edge from R to R′ in $DG_{db}$. In our example, the component {b,c} can be collapsed to the node bc, and the rest to their single element. The new graph has the edges {a → bc, bc → d, bc → e, a → f, f → g}.

• Obtain a topological sorting for the resulting graph. In our example we can get the sorting a < f < g < bc < e < d.

• Uncollapse the nodes of such a sorting to obtain a topological sorting for the strongly connected components, and enumerate them in ascending order. In our example, we get {a} < {f} < {g} < {b,c} < {e} < {d}. Then, the expected stratification str(a) = 1; str(f) = 2; str(g) = 3; str(b) = str(c) = 4; str(e) = 5; str(d) = 6 is obtained.

Figure 4: Dependency Graph Example

The concrete implementation of this algorithm in R-SQL uses the library ugraphs of SWI-Prolog and the module scc implemented by Markus Triska, accessible from http://www.logic.at/prolog/scc.pl. For the dependency graph of Figure 3, R-SQL assigns stratum 1 to flight, 2 to travel, 3 to reachable, 4 to madAirport, and 5 to avoidMad.
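The steps above can be sketched in Python as follows. This is not the Prolog code of the system: the function stratify is a hypothetical name, and the sketch uses Tarjan's algorithm, which emits the strongly connected components in reverse topological order, so collapsing and sorting happen in a single pass. The concrete edges used in the example are also an assumption: they are one possible graph consistent with the collapsed edges listed for Figure 4, with the negative label placed on the edge from a to f.

```python
def stratify(nodes, edges):
    """Assign one stratum per strongly connected component.

    nodes: list of relation names
    edges: list of (source, target, negative) triples
    Returns a dict name -> stratum, or raises ValueError if some cycle
    contains a negatively labelled edge (database not stratifiable).
    """
    succ = {n: [] for n in nodes}
    for s, d, _ in edges:
        succ[s].append(d)

    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):  # recursive Tarjan; adequate for small dependency graphs
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for n in nodes:
        if n not in index:
            visit(n)

    # Components come out sinks first; reversing gives a topological order.
    strat = {}
    for i, comp in enumerate(reversed(sccs), start=1):
        for n in comp:
            strat[n] = i

    # A negative edge inside one component lies on a cycle.
    for s, d, negative in edges:
        if negative and strat[s] == strat[d]:
            raise ValueError(f"not stratifiable: negative edge {s} -> {d} in a cycle")
    return strat


# Assumed concrete edges for the graph of Figure 4.
example_edges = [
    ("a", "b", False), ("b", "c", False), ("c", "b", False),
    ("c", "d", False), ("b", "e", False),
    ("a", "f", True), ("f", "g", False),
]
example_strat = stratify(list("abcdefg"), example_edges)
```

The topological order found this way may differ from the one shown in the text, since several valid orders exist, but b and c always share a stratum and every other relation gets a stratum of its own.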

3.2 The Computation of the Database Fixpoint

Next, we present the algorithm for generating the SQL database corresponding to the fixpoint of an R-SQL database definition db. This algorithm is shown in Figure 5. It produces the SQL statements (CREATE and INSERT) needed to build such a database.

     1  for all R ∈ RN_db do
     2      CREATE TABLE R sch;
     3  end for
     4  i := 1
     5  while i ≤ numStr do
     6      for all R ∈ RN_i do
     7          INSERT INTO R out(sel_stm_R);
     8      end for
     9      repeat
    10          size := rel_size(RN_i)
    11          for all R ∈ RN_i do
    12              INSERT INTO R in(sel_stm_R) EXCEPT SELECT * FROM R;
    13          end for
    14      until size = rel_size(RN_i)
    15      i := i + 1
    16  end while

Figure 5: Algorithm to Compute the Fixpoint

9 / 18 Volume 64 (2013)



The algorithm considers a concrete stratification for the database, where numStr denotes the number of strata and RN_i the set of relations of stratum i. First of all, a table is created for each relation definition R sch := sel_stm_R of the database (lines 1-3). Then, the external while at line 5 successively computes the fixpoints fix_1, fix_2, ..., fix_numStr. Following the semantics, each fix_i is calculated for every relation of RN_i by iterating the fixpoint operator T_i, i.e., the internal repeat (lines 9-14) at iteration n computes T_i^n(fix_{i-1}). The loop is iterated while some tuple is added to the tables of the current stratum; the variable size is used to check this condition.

This algorithm improves on the one introduced in [ANSS13] by reducing the work done in the iterations of the repeat, i.e., by simplifying the operations that fill the tables, and thus improving efficiency. The idea is that the iteration of the operator T_i is only needed for recursive relations and, even more precisely, only for the recursive fragment of the select statements defining those relations. With this aim, we have defined the functions in and out to split each sel_stm into, respectively, the (recursive) fragment that must be used in the INSERT statements inside the loop, and the fragment that can be processed before the loop, as the base case of the recursive definition. Then, the for at lines 6-8 processes the out fragments, and the INSERTs at lines 11-13 only process the in fragments. The in and out fragments of a sel_stm can be easily determined using the strata of its components, because the stratification defined in Section 3.1 is such that if a relation R in stratum i depends on another relation R′, then either the stratum of R′ is lower than i, so R′ has been computed previously, or it is exactly i (if they are mutually recursive) and both relations must be computed simultaneously. Therefore, if for instance R := sel_stm1 UNION sel_stm2, str(R) = i, and str(sel_stm1) < i, then sel_stm1 will be part of the out fragment, and the corresponding tuples can be inserted before the loop, because the relations involved were already computed in a previous stratum. The functions in and out can be defined using the stratification as follows:

If str(sel_stm) < i, then:

• in(sel_stm) = ∅ and out(sel_stm) = sel_stm.

If str(sel_stm) = i, then the functions are defined by recursion on the structure of sel_stm:

• sel_stm ≡ SELECT exp ... exp FROM R ... R WHERE wcond
  in(sel_stm) = sel_stm and out(sel_stm) = ∅

• sel_stm ≡ sel_stm1 UNION sel_stm2

  – If str(sel_stm1) = str(sel_stm2) = i, then:
    in(sel_stm) = in(sel_stm1) UNION in(sel_stm2) and
    out(sel_stm) = out(sel_stm1) UNION out(sel_stm2)

  – If str(sel_stm1) = i and str(sel_stm2) < i, then:
    in(sel_stm) = in(sel_stm1) and out(sel_stm) = out(sel_stm1) UNION sel_stm2

  – If str(sel_stm1) < i and str(sel_stm2) = i, then:
    in(sel_stm) = in(sel_stm2) and out(sel_stm) = sel_stm1 UNION out(sel_stm2)

• sel_stm ≡ sel_stm1 EXCEPT sel_stm2
  in(sel_stm) = in(sel_stm1) EXCEPT sel_stm2 and
  out(sel_stm) = out(sel_stm1) EXCEPT sel_stm2
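The definitions of in and out can be sketched as a single recursive Python function. The classes Select, UnionStm and ExceptStm, the function names, and the convention that the stratum of a statement is the highest stratum among the relations it mentions are assumptions made for this illustration; None stands for the empty fragment ∅. Under that convention, the three UNION cases collapse into one recursive case, because a component of lower stratum already yields (∅, itself).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Select:
    sql: str            # text of a basic SELECT ... FROM ... WHERE statement
    relations: tuple    # names of the relations in its FROM clause


@dataclass(frozen=True)
class UnionStm:
    left: object
    right: object


@dataclass(frozen=True)
class ExceptStm:
    left: object
    right: object


def stratum(stm, strata):
    """Highest stratum among the relations mentioned (assumed convention)."""
    if isinstance(stm, Select):
        return max((strata[r] for r in stm.relations), default=0)
    return max(stratum(stm.left, strata), stratum(stm.right, strata))


def union(a, b):
    """UNION of two fragments, treating None as the empty fragment."""
    if a is None:
        return b
    if b is None:
        return a
    return UnionStm(a, b)


def except_(a, b):
    """a EXCEPT b, where an empty left operand stays empty."""
    return None if a is None else ExceptStm(a, b)


def split(stm, i, strata):
    """Return the pair (in fragment, out fragment) of stm for stratum i."""
    if stratum(stm, strata) < i:
        return None, stm                      # fully computable before the loop
    if isinstance(stm, Select):
        return stm, None                      # recursive basic statement
    in1, out1 = split(stm.left, i, strata)
    if isinstance(stm, UnionStm):
        in2, out2 = split(stm.right, i, strata)
        return union(in1, in2), union(out1, out2)
    return except_(in1, stm.right), except_(out1, stm.right)
```

Applied to a definition of travel as the union of a statement over flight (stratum 1) and a statement joining flight with travel (stratum 2), split returns the join as the in fragment and the statement over flight as the out fragment.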


The concrete implementation of the algorithm of Figure 5 can be done in a number of ways. We have chosen Python as the host language mainly because it is multiplatform and provides easy connections with different database systems such as PostgreSQL, DB2, MySQL, or even via ODBC, which allows connectivity to almost any RDBMS. The additional features required of the host language are basic: loops, assignment and simple arithmetic.

Below, we show the Python code generated for the working example of flights. It uses the Python library psycopg2 (available at http://initd.org/psycopg/), which allows connecting to an RDBMS and then submitting SQL queries as:

cursor.execute("<query>")

where <query> is any valid SQL query. The generated code expands all the loops of the algorithm of Figure 5, except the repeat at lines 9-14. As Python does not provide a repeat (or do-while) loop construct, we implement it as a while True statement with the corresponding break for stopping it when the condition holds. We will show it in the code generated for stratum 2. Moreover, we also implement a Python function relSize(<list of relations>) that returns the number of tuples of the relations specified in its argument.
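As a rough, self-contained illustration of this scheme, the following sketch runs the loop structure of Figure 5 without expanding it. It is not the code generated by R-SQL. It assumes the standard-library sqlite3 module instead of psycopg2 so that it runs without a database server, renames the column to as dst, writes the statements without the outer parentheses, and uses the hypothetical names rel_size, compute_fixpoint and flights_demo.

```python
import sqlite3


def rel_size(cursor, relations):
    """Total number of tuples currently stored in the given tables."""
    total = 0
    for rel in relations:
        cursor.execute(f"SELECT COUNT(*) FROM {rel}")
        total += cursor.fetchone()[0]
    return total


def compute_fixpoint(cursor, schemas, strata):
    """strata: list of strata, each a list of (relation, out_sql, in_sql)."""
    for rel, sch in schemas.items():                 # lines 1-3 of Figure 5
        cursor.execute(f"CREATE TABLE {rel} {sch}")
    for stratum in strata:                           # while at line 5
        rels = [rel for rel, _, _ in stratum]
        for rel, out_sql, _ in stratum:              # lines 6-8: out fragments
            if out_sql:
                cursor.execute(f"INSERT INTO {rel} {out_sql}")
        while True:                                  # repeat at lines 9-14
            size = rel_size(cursor, rels)
            for rel, _, in_sql in stratum:           # lines 11-13: in fragments
                if in_sql:
                    cursor.execute(f"INSERT INTO {rel} {in_sql} "
                                   f"EXCEPT SELECT * FROM {rel}")
            if size == rel_size(cursor, rels):       # until: nothing was added
                break


def flights_demo():
    """Build flight and travel for the working example; return their sizes."""
    cursor = sqlite3.connect(":memory:").cursor()
    schemas = {"flight": "(frm TEXT, dst TEXT, time REAL)",
               "travel": "(frm TEXT, dst TEXT, time REAL)"}
    flight_out = ("SELECT 'lis','mad',1 UNION SELECT 'mad','par',1.5 UNION "
                  "SELECT 'par','lon',2 UNION SELECT 'lon','ny',7 UNION "
                  "SELECT 'par','ny',8")
    travel_out = "SELECT * FROM flight"
    travel_in = ("SELECT flight.frm, travel.dst, flight.time + travel.time "
                 "FROM flight, travel WHERE flight.dst = travel.frm")
    strata = [[("flight", flight_out, None)],
              [("travel", travel_out, travel_in)]]
    compute_fixpoint(cursor, schemas, strata)
    return rel_size(cursor, ["flight"]), rel_size(cursor, ["travel"])
```

Calling flights_demo() builds both tables in an in-memory database; the inner loop for travel stops at the first iteration that adds no tuple.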

The for at lines 1-3 is expanded as:

    cursor.execute("CREATE table flight (frm varchar(10), to varchar(10), time float);")
    cursor.execute("CREATE table travel (frm varchar(10), to varchar(10), time float);")

and so on for the rest of the relations. Now, we detail some parts of the code generated stratum by stratum. For stratum 1 the in fragment is empty and we have:

    # Code generated for Stratum 1
    cursor.execute("""INSERT INTO flight
        (SELECT 'lis','mad',1 UNION SELECT 'mad','par',1.5 UNION
         SELECT 'par','lon',2 UNION SELECT 'lon','ny',7 UNION
         SELECT 'par','ny',8) EXCEPT
        SELECT * FROM flight;""")

Stratum 2 contains the relation travel, whose definition can be split into two parts with the functions in and out.

    # Code generated for Stratum 2
    # out fragment
    cursor.execute("INSERT INTO travel (SELECT * FROM flight);")

    # in fragment
    # Initial size (not shown in the original listing; it corresponds to
    # line 10 of Figure 5)
    size = relSize(["travel"])
    while True:
        cursor.execute("""INSERT INTO travel
            (SELECT flight.frm,travel.to,flight.time+travel.time
             FROM flight,travel WHERE flight.to = travel.frm)
            EXCEPT SELECT * FROM travel;""")
        newSize = relSize(["travel"])
        if (newSize != size):
            size = newSize
        else:
            break



The tuples added for travel at each iteration of this code are shown in the next table:

    Step                       Set of added tuples
    out fragment               {(lon,ny,7.0),(par,lon,2.0),(par,ny,8.0),(mad,par,1.5),(lis,mad,1.0)}
    in fragment: iteration 1   {(lis,par,2.5),(par,ny,9.0),(mad,ny,9.5),(mad,lon,3.5)}
    in fragment: iteration 2   {(lis,ny,10.5),(lis,lon,4.5),(mad,ny,10.5)}
    in fragment: iteration 3   {(lis,lon,4.5),(mad,ny,10.5),(lis,ny,11.5)}

Analogously, the system produces the Python code for strata 3 and 4, which correspond to reachable and madAirport, respectively. Finally, in the last stratum the avoidMad relation is computed (there is no in fragment in this case):

    # Code generated for Stratum 5
    # out fragment
    cursor.execute("""INSERT INTO avoidMad
        (SELECT travel.frm,travel.to FROM travel
         EXCEPT SELECT * FROM madAirport);""")

This completes the fixpoint for the working example database. The values of the flight, madAirport and avoidMad tables are illustrated in the graph in Figure 6. Direct flights are represented in blue and labeled with their corresponding time. Paths for the madAirport relation are represented in red, and paths for the avoidMad relation are represented in black.

Once the R-SQL database definition has been processed, the tables obtained are available as a database instance in PostgreSQL. Then, the user can formulate queries that will be solved using those tables (without performing any further fixpoint computation).

3.3 Performance

This section analyzes the system performance. First, we focus on the improvement obtained by factoring out SQL fragments (as already explained in Section 3.2). Second, we develop a field analysis by targeting the system to different current state-of-the-art relational systems, introducing the benefits of a semi-naïve differential optimization [Ull85] for linear recursive queries. Numbers in the tables of this section are expressed in milliseconds and represent the average of a number of runs, where the maximum and minimum have been elided.

Figure 6: Graphical representation of resulting values of the working example.

3.3.1 Factoring-Out Improvement

As introduced, any DBMS allowing Python access can be used to implement our proposal. This section develops the connection to IBM DB2 as a target system for analyzing the performance. We consider the benchmark reachable, which implements the transitive closure of the relation flight, as introduced in Section 3. To build a parametric relation, we consider links in flights as the tuples {(1,2),(2,3), ..., (n,n+1)}, where n+1 is the number of nodes in the graph and the types of the fields have been changed to integer. Table 1 shows the results for instances of this benchmark with a number of tuples ranging from 100 to 350 (first column). The second column lists the number of tuples generated in the result set. The third and fourth columns show the elapsed running time for solving the query in R-SQL with no factoring-out improvement (No FOI) and with this improvement enabled (With FOI), respectively. The fifth column (Speed-up) shows the speed-up due to FOI as a percentage. The last column (Difference) shows the absolute time difference between both timings. Benchmarks have been run on an Intel Core2 Quad CPU at 2.4GHz with 3GB RAM, running Windows XP 32bit SP3 and IBM DB2 Express Edition 10.1.0 database server with a default configuration.

    Tuples   Result Tuples   No FOI (ms)   With FOI (ms)   Speed-up   Difference (ms)
    100              5,050         1,135           1,050       8.1%                85
    150             11,325         4,438           3,428      29.4%             1,010
    200             20,100        10,048           8,172      23.0%             1,876
    250             31,375        19,001          16,041      18.5%             2,960
    300             45,150        32,710          28,381      15.3%             4,329
    350             61,425        50,085          44,175      13.4%             5,910

Table 1: Factoring-Out Improvement (FOI)
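The derived columns of Table 1 can be cross-checked with a few lines of Python. The sketch below assumes that the result set of the chain {(1,2), ..., (n,n+1)} has n(n+1)/2 tuples, and that the Speed-up column is the time difference relative to the With FOI time; the second assumption is inferred from the published figures rather than stated in the text, so it is only checked to within 0.1 percentage points. The names TABLE_1 and check_row are hypothetical.

```python
# (tuples, result tuples, No FOI ms, With FOI ms, speed-up %, difference ms)
TABLE_1 = [
    (100, 5050, 1135, 1050, 8.1, 85),
    (150, 11325, 4438, 3428, 29.4, 1010),
    (200, 20100, 10048, 8172, 23.0, 1876),
    (250, 31375, 19001, 16041, 18.5, 2960),
    (300, 45150, 32710, 28381, 15.3, 4329),
    (350, 61425, 50085, 44175, 13.4, 5910),
]


def check_row(n, result, no_foi, with_foi, speed_up, difference):
    """Check one row of Table 1 against the assumed formulas."""
    closure_size = n * (n + 1) // 2                  # paths in a chain of n links
    relative_gain = 100 * (no_foi - with_foi) / with_foi
    return (result == closure_size
            and difference == no_foi - with_foi
            and abs(relative_gain - speed_up) < 0.1)
```

Under these assumptions every row of the table passes the check, e.g. all(check_row(*row) for row in TABLE_1) holds.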

From this experiment we confirm the expected results for factoring the fragment select * from flight out of the recursive clause and the repeat loop. Indeed, even for a single SQL fragment such as this, speed-ups of up to almost 30% are reached. However, as the number of tuples in the instances increases, the speed-up decreases, because the main computational effort corresponds to the repeat loop, owing to the EXCEPT operator.

The next section deals with other optimizations and with a comparison with other relational and deductive systems.

3.3.2 Analysis of Systems

This section considers different current state-of-the-art relational systems which include recursive queries: PostgreSQL 9.3, Oracle 11g, and DB2 10.1, all of them with a default configuration. We compare R-SQL when solving the previous benchmark with these systems and show the importance of introducing the semi-naïve differential optimization [Ull85]. To make the comparison with the RDBMSs fairer, since they do not discard duplicates, we omit the operator EXCEPT in the optimized R-SQL systems so that they behave similarly. Also, we include the last published version of DLV DB in this comparison as a deductive system which is able to project its solving to these external databases when computing a transitive closure. Another related deductive system is LDL++, but unfortunately it is not included in this comparison since it has been replaced by the system DeALS, whose binaries and/or sources are not available yet.

    RDBMS        System          100      200      300       400       500
    PostgreSQL   Native SQL      161      187      240       360       713
                 R-SQL           500    3,198   12,406    39,802    71,922
                 Diff-R-SQL      208      459    1,073     2,271     4,115
                 TDiff-R-SQL     260      578    1,323     2,745     5,693
                 DLV DB          703    1,651    4,458     8,047    13,120
    Oracle       Native SQL      604    1,781    5,765    13,349    26,297
                 R-SQL           880    3,802   12,057    27,989    56,641
                 Diff-R-SQL      708    1,437    3,224     6,240    11,469
                 TDiff-R-SQL     646      995    1,708     2,453     3,422
                 DLV DB        6,875   12,849   18,912    30,583    42,146
    DB2          Native SQL      677    1,016    1,323     2,052     3,099
                 R-SQL         1,271    5,797   97,052   129,917   150,104
                 Diff-R-SQL      698      932    2,672     2,859     3,213
                 TDiff-R-SQL     646    1,000    1,578     4,021     9,021
                 DLV DB        6,339   12,666   53,552    57,349   100,391

    (times in milliseconds; column headings give the number of tuples in flight)

Table 2: Analysis of Systems

The results for different instances of the benchmark are given in Table 2. Numbers are now arranged with the parameter n ranging along the horizontal axis, and rows include the considered RDBMS (first column), the system connected to this relational database (second column), and then (in the next five columns) the wall time for solving each instance (from 100 up to 500 tuples in the relation flight, which delivers from 5,050 up to 125,250 tuples in the result set of the query benchmark). Below the headings, lines are arranged in major rows, each one referring to a concrete RDBMS. For each RDBMS (PostgreSQL, Oracle, DB2; see footnote 4), five minor rows are listed, one per system. The first minor row, Native SQL, refers to the corresponding RDBMS itself, and is used to compare how the rest of the systems behave w.r.t. a native execution of the benchmark, i.e., resorting to the recursive query specification for the transitive closure that each RDBMS provides. For instance, DB2 uses the following syntax (where rec is the temporary recursive relation which is built to fill the relation reachable):

    INSERT INTO reachable
    WITH rec(frm,to) AS
      (SELECT * FROM flight
       UNION ALL
       SELECT flight.frm, rec.to FROM flight,rec
       WHERE flight.to = rec.frm)
    SELECT * FROM rec;

Footnote 4: Incidentally, MySQL does not support recursive queries at all.

The next minor row, R-SQL, refers to the implementation we have presented in Section 3. The minor row labeled Diff-R-SQL presents the results for R-SQL with the semi-naïve differential optimization enabled, as explained in [Ull85]. Roughly, for a linear query, this optimization consists of using in each iteration only the results that were generated in the previous iteration to build new tuples. To implement this, we have added a new integer column (it in the benchmark) holding the iteration in which a given tuple was generated. For instance, the next query is executed for each iteration $IT$ (which is replaced by the actual iteration number in each iteration):

    INSERT INTO reachable
    SELECT flight.frm, reachable.to, $IT$
    FROM flight, reachable
    WHERE flight.to = reachable.frm AND
          reachable.it = $IT$-1;
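A minimal runnable sketch of this iteration-column scheme is given below. It is an illustration under stated assumptions, not the R-SQL code: it uses the standard-library sqlite3 module, names the columns ori and des, and the function name diff_closure is hypothetical. As in the benchmark, no EXCEPT is applied, so the loop terminates only because the chain graph is acyclic.

```python
import sqlite3


def diff_closure(n):
    """Transitive closure of the chain (1,2), ..., (n,n+1); returns its size."""
    cursor = sqlite3.connect(":memory:").cursor()
    cursor.execute("CREATE TABLE flight (ori INTEGER, des INTEGER)")
    cursor.execute("CREATE TABLE reachable (ori INTEGER, des INTEGER, it INTEGER)")
    cursor.executemany("INSERT INTO flight VALUES (?, ?)",
                       [(k, k + 1) for k in range(1, n + 1)])

    # Base case: direct links are tagged with iteration 0.
    cursor.execute("INSERT INTO reachable SELECT ori, des, 0 FROM flight")

    it = 1
    while True:
        # Join flight only with the tuples produced in the previous iteration.
        cursor.execute(
            "INSERT INTO reachable "
            "SELECT flight.ori, reachable.des, ? "
            "FROM flight, reachable "
            "WHERE flight.des = reachable.ori AND reachable.it = ?",
            (it, it - 1))
        cursor.execute("SELECT COUNT(*) FROM reachable WHERE it = ?", (it,))
        if cursor.fetchone()[0] == 0:     # nothing new: the fixpoint is reached
            break
        it += 1

    cursor.execute("SELECT COUNT(*) FROM reachable")
    return cursor.fetchone()[0]
```

For a chain of 100 links, diff_closure(100) yields the 5,050 result tuples reported for that instance.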

Next, the row labeled TDiff-R-SQL refers to an alternative implementation of the semi-naïve differential optimization, which consists of storing all the tuples generated in a given iteration in a temporary table. Then, the join at each iteration is computed between flight and this temporary table, thereby avoiding a scan of the growing relation reachable in search of the tuples with a given iteration number in the extra field. In fact, two temporary tables are needed: one for accessing the tuples generated in the previous iteration, and another one to store the new tuples. Next, there is a sketch of the SQL statements submitted in each iteration to DB2, where reachable_temp1 is intended to hold the tuples generated in the previous iteration, and reachable_temp2 is for the current one (temporary tables are preceded by SESSION.):

    INSERT INTO SESSION.reachable_temp2
    SELECT flight.ori, SESSION.reachable_temp1.des
    FROM flight, SESSION.reachable_temp1
    WHERE flight.des = SESSION.reachable_temp1.ori;
    ...
    INSERT INTO reachable SELECT * FROM SESSION.reachable_temp1;
    DELETE FROM SESSION.reachable_temp1;
    INSERT INTO SESSION.reachable_temp1
    SELECT * FROM SESSION.reachable_temp2;
    DELETE FROM SESSION.reachable_temp2;

The first SQL statement loads into reachable_temp2 the results just computed for the current iteration. The next statements simply load into reachable the results from the previous iteration, and transfer the results just made available in reachable_temp2 to reachable_temp1 so that they are available for the next iteration. Finally, reachable_temp2 is emptied so that it is also ready for the next iteration.
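The two-table rotation just described can be tried out with the following sketch. It is an assumption-laden illustration rather than the R-SQL code: it uses SQLite temporary tables through the standard-library sqlite3 module instead of DB2 SESSION tables, and the function name tdiff_closure is hypothetical. The stopping test (an empty reachable_temp1) is also an assumption, since the original listing elides that part.

```python
import sqlite3


def tdiff_closure(edges):
    """Transitive closure of an acyclic edge list using two temporary tables."""
    cursor = sqlite3.connect(":memory:").cursor()
    cursor.execute("CREATE TABLE flight (ori INTEGER, des INTEGER)")
    cursor.execute("CREATE TABLE reachable (ori INTEGER, des INTEGER)")
    cursor.execute("CREATE TEMP TABLE reachable_temp1 (ori INTEGER, des INTEGER)")
    cursor.execute("CREATE TEMP TABLE reachable_temp2 (ori INTEGER, des INTEGER)")
    cursor.executemany("INSERT INTO flight VALUES (?, ?)", edges)

    # Base case: the direct links play the role of the previous iteration.
    cursor.execute("INSERT INTO reachable_temp1 SELECT * FROM flight")

    while True:
        cursor.execute("SELECT COUNT(*) FROM reachable_temp1")
        if cursor.fetchone()[0] == 0:     # previous iteration produced nothing
            break
        # New tuples: join flight with the previous iteration only.
        cursor.execute(
            "INSERT INTO reachable_temp2 "
            "SELECT flight.ori, reachable_temp1.des "
            "FROM flight, reachable_temp1 "
            "WHERE flight.des = reachable_temp1.ori")
        # Store the previous iteration, then rotate the temporary tables.
        cursor.execute("INSERT INTO reachable SELECT * FROM reachable_temp1")
        cursor.execute("DELETE FROM reachable_temp1")
        cursor.execute("INSERT INTO reachable_temp1 SELECT * FROM reachable_temp2")
        cursor.execute("DELETE FROM reachable_temp2")

    cursor.execute("SELECT COUNT(*) FROM reachable")
    return cursor.fetchone()[0]
```

With the chain (1,2), (2,3), (3,4) the function stores six tuples in reachable, one per path of the chain.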

Using temporary tables should be advantageous, as neither log records nor lock management are needed. They are computed in memory as much as possible; only when they do not fit into RAM is a memory space quota requested for them.



Finally, the row labeled DLV DB stands for this deductive system, which uses the same ODBC bridge to access those relational systems.

Looking at the numbers, it is noticeable that the best performance is achieved by the native SQL execution in PostgreSQL for all the considered instances (n ∈ {100,200, ..., 500}) of the benchmark. Also, the worst performance corresponds to R-SQL without optimizations (and including the operator EXCEPT), which is to be expected, as the join and the difference must be processed in each iteration for all the tuples, including those that will definitely not be involved in generating a new one. The semi-naïve differential optimization (which also avoids the operator EXCEPT) alleviates this enormously, by a factor of 150,104/3,213 = 46.7× when comparing R-SQL vs. Diff-R-SQL for DB2 (n = 500). DLV DB is the next system in the performance ranking, behaving better than R-SQL but worse than the rest. Depending on the RDBMS, the next best system can be either Diff-R-SQL or TDiff-R-SQL: the first one performs better than the second for PostgreSQL and the other way round for Oracle, while for DB2 the ranking depends on the instance, with Diff-R-SQL ahead for the largest one. Noticeably, both perform better than Native SQL for Oracle and, for DB2, Diff-R-SQL behaves roughly like Native SQL. These numbers highlight how similar techniques are managed differently by the different RDBMSs. For example, whereas for Oracle the use of temporary tables is of paramount importance for lowering the solving time, its effect is the contrary for DB2. We have also tested table functions, which provide a way to implement parametric views. However, they do not provide better performance than the already illustrated optimizations (and Oracle faces the mutating table problem when using them to insert tuples in the same source table).

All in all, in the best case we are able to beat an RDBMS by a factor of 26,297/3,422 = 7.7×, and in the worst case (but considering the best optimization) we are beaten by a factor of 4,115/713 = 5.8×. To better understand this slowdown, we must consider that the R-SQL system runs an interpreted script (Python) and, in each iteration, one or several SQL statements are sent to the RDBMS via the ODBC bridge. SQL statements sent in this way must be compiled by the RDBMS in each iteration, which becomes a significant burden on the system, together with the communication cost due to the bridge. Therefore, using a compiled language supporting prepared SQL statements is worth exploring for performance gains.
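The three factors quoted in this analysis follow directly from the n = 500 column of Table 2, as the short check below shows. The helper name factor is hypothetical; the timings are copied from the table.

```python
def factor(slower_ms, faster_ms):
    """How many times slower one timing is than another, to one decimal."""
    return round(slower_ms / faster_ms, 1)


# Timings in milliseconds for n = 500, taken from Table 2.
db2_rsql_vs_diff = factor(150104, 3213)       # R-SQL vs. Diff-R-SQL on DB2
oracle_native_vs_tdiff = factor(26297, 3422)  # Native SQL vs. TDiff-R-SQL on Oracle
pg_diff_vs_native = factor(4115, 713)         # Diff-R-SQL vs. Native SQL on PostgreSQL
```

These evaluate to 46.7, 7.7 and 5.8, respectively, matching the figures in the text.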

4 Conclusions

R-SQL has been designed to compute the meaning of a database definition and then to query this database. Notice that the modification of a relation of the database in the underlying RDBMS can cause inconsistencies, since the tables are not recomputed. For instance, after processing the database for flights, if the user adds or deletes a tuple for the relation flight, then the relation travel will become inconsistent with respect to its R-SQL definition. But this is the very same behavior that RDBMSs exhibit when dealing with materialized views. A future direction towards fully integrating R-SQL into an RDBMS is to provide the possibility of restoring the consistency of the database (using triggers, for instance), as well as of defining additional (possibly recursive) views. This restoring involves the recomputation of the database fixpoint. However, using the dependency graph, it is easy to determine the subset of relations that must be recalculated, instead of computing the whole fixpoint for the database. Moreover, those relations may not need to be recomputed from scratch. In addition, it is straightforward to modify the algorithm introduced in Section 3.2 to get a lazy evaluation of such relations, performing iterations only when new values are demanded.
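As a rough sketch of how the dependency graph could be used for this purpose, the function below collects the relations that depend, directly or indirectly, on a modified relation. It only illustrates the idea and is not part of R-SQL: the function name affected is hypothetical, and the graph fragment in the usage note is assumed (the text only states that travel and reachable are built from flight and that avoidMad is built from travel and madAirport; the edge from travel to madAirport is a guess).

```python
from collections import deque


def affected(dependents, changed):
    """Relations that must be recomputed after `changed` is modified.

    dependents maps each relation to the relations defined directly in
    terms of it (hypothetical representation of the dependency graph).
    """
    seen = set()
    queue = deque([changed])
    while queue:
        relation = queue.popleft()
        for dependent in dependents.get(relation, ()):
            if dependent not in seen:     # visit each relation only once
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

For example, with dependents = {"flight": ["travel", "reachable"], "travel": ["madAirport", "avoidMad"], "madAirport": ["avoidMad"]}, a change to flight affects the other four relations, whereas a change to madAirport affects only avoidMad.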

As shown in Section 3.3.2, the semi-naïve differential optimization [Ull89] for linear recursive queries has a notable impact on performance. Nonetheless, our system can be further extended for non-linear recursive queries and with enhancements as in [ZCF+97, BR87], as DLV [TLLP08] does. Implementing all these optimizations is left for future work.

Although the results show that our proposal is encouraging, efficiency can also be improved by indexing temporary relations during fixpoint computations (e.g., with tries [SW12] and BDDs [WACL05]). To seamlessly integrate this into an RDBMS, we can profit from fourth-generation languages (e.g., SQL PL in IBM DB2 and PL/SQL in Oracle) and completely integrate query solving and view maintenance into the RDBMS. This way, prepared SQL statements are available in a compiled setting, which should also improve performance. We are currently extending the R-SQL system with the aforementioned enhancements and with more features, such as hypothetical definitions and aggregates.

Bibliography

[ANSS13] G. Aranda-Lopez, S. Nieva, F. Saenz-Perez, J. Sanchez-Hernandez. Formalizing a Broader Recursion Coverage in SQL. In Symposium on Practical Aspects of Declarative Languages (PADL'13). LNCS 7752, pp. 93–108. 2013.

[AOT+03] F. Arni, K. Ong, S. Tsur, H. Wang, C. Zaniolo. The Deductive Database System LDL++. TPLP 3(1):61–94, 2003.

[BR87] I. Balbin, K. Ramamohanarao. A Generalization of the Differential Approach to Recursive Query Evaluation. J. Log. Program. 4(3):259–262, 1987.

[Cod70] E. Codd. A Relational Model for Large Shared Databanks. Communications of the ACM 13(6):377–390, June 1970.

[Dat09] C. J. Date. SQL and relational theory: how to write accurate SQL code. O'Reilly, Sebastopol, CA, 2009.

[FMMP96] S. J. Finkelstein, N. Mattos, I. S. Mumick, H. Pirahesh. Expressing Recursive Queries in SQL. Technical report, ISO, 1996.

[GUW09] H. Garcia-Molina, J. D. Ullman, J. Widom. Database systems - the complete book (2. ed.). Pearson Education, 2009.

[KRP93] O. Kaser, C. R. Ramakrishnan, S. Pawagi. On the conversion of indirect to direct recursion. ACM Lett. Program. Lang. Syst. 2(1-4):151–164, Mar. 1993.

[MP94] I. S. Mumick, H. Pirahesh. Implementation of magic-sets in a relational database system. SIGMOD Rec. 23:103–114, May 1994.

[SP13] F. Saenz-Perez. Towards Bridging the Expressiveness Gap Between Relational and Deductive Databases. In XIII Jornadas sobre Programacion y Lenguajes, PROLE2013 (SISTEDES). September 2013.

[SW12] T. Swift, D. S. Warren. XSB: Extending Prolog with Tabled Logic Programming. TPLP 12(1-2):157–187, 2012.

[TLLP08] G. Terracina, N. Leone, V. Lio, C. Panetta. Experimenting with recursive queries in database and logic programming systems. TPLP 8(2):129–165, 2008.

[Ull85] J. D. Ullman. Implementation of Logical Query Languages for Databases. ACM Trans. Database Syst. 10(3):289–321, 1985.

[Ull89] J. Ullman. Principles of Database and Knowledge-Base Systems Vols. I (Classical Database Systems) and II (The New Technologies). Computer Science Press, 1989.

[WACL05] J. Whaley, D. Avots, M. Carbin, M. S. Lam. Using Datalog with binary decision diagrams for program analysis. In Proceedings of Programming Languages and Systems: Third Asian Symposium. 2005.

[ZCF+97] C. Zaniolo, S. Ceri, C. Faloutsos, R. T. Snodgrass, V. S. Subrahmanian, R. Zicari. Advanced Database Systems. Morgan Kaufmann Publishers Inc., 1997.
