Database queries
The implementation of a database, its interface, and its query
language is a project far too ambitious for the scope of this book and
for the Objective CAML knowledge of the reader at this point. However,
restricting the problem and using the functional programming style
at its best allows us to create an interesting tool for query
processing. For instance, we show how to use iterators as well as
partial application to formulate and execute queries. We also show the
use of a data type encapsulating functional values.
For this application, we use as an example a database on the members
of an association. It is presumed to be stored in the file
association.dat.
Data format
Most database programs use a "proprietary" format to store
the data they manipulate. However, it is usually possible to store the
data as some text that has the following structure:
- the database is a list of cards separated by carriage-returns;
- each card is a list of fields separated by some given character, ':' in our case;
- a field is a string which contains neither a carriage-return nor the character ':';
- the first card is the list of the names associated with the fields, separated by the character '|'.
The association data file starts with:
Num|Lastname|Firstname|Address|Tel|Email|Pref|Date|Amount
0:Chailloux:Emmanuel:Université P6:0144274427:ec@lip6.fr:email:25.12.1998:100.00
1:Manoury:Pascal:Laboratoire PPS::pm@lip6.fr:mail:03.03.1997:150.00
2:Pagano:Bruno:Cristal:0139633963::mail:25.12.1998:150.00
3:Baro:Sylvain::0144274427:baro@pps.fr:email:01.03.1999:50.00
The meaning of the fields is the following:
- Num is the member number;
- Lastname, Firstname, Address, Tel, and Email are obvious;
- Pref indicates the means by which the member wishes to be
contacted: by mail (mail), by email (email), or by
phone (tel);
- Date and Amount are the date and the amount of the last
membership fee received, respectively.
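As an illustration of this format, the following sketch splits one data line of the sample file into its fields and pairs them with the header names. It relies on String.split_on_char from the standard library of recent OCaml versions, which is an assumption of this sketch and not the tool developed in the text.

```ocaml
(* One data line from the sample file; note the empty Tel field. *)
let line =
  "1:Manoury:Pascal:Laboratoire PPS::pm@lip6.fr:mail:03.03.1997:150.00"

(* String.split_on_char keeps empty fields, so positions stay aligned
   with the names given on the header line. *)
let fields = String.split_on_char ':' line

let names =
  String.split_on_char '|'
    "Num|Lastname|Firstname|Address|Tel|Email|Pref|Date|Amount"

(* Print each field name next to its (quoted) value. *)
let () =
  List.iter2 (fun n f -> Printf.printf "%-10s %S\n" n f) names fields
```

Both lists have nine elements, and the fifth field (Tel) is the empty string.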
We need to decide what representation the program should use internally
for a database. We could use either a list of cards
or an array of cards. On the one hand, a list has the nice property of
being easily modified: adding and removing a card are simple operations.
On the other hand, an array allows constant access time to any card.
Since our goal is to work on all the cards and not on some of them,
each query accesses all the cards. Thus a list is a good choice. The
same issue arises concerning the cards themselves: should they be
lists or arrays of strings? This time an array is a good choice,
since the format of a card is fixed for the whole database. It is not
possible to add a new field. Since a query might access only a
few fields, it is important for this access to be fast.
The most natural solution for a card would be to use an array indexed
by the names of the fields. Since such a type is not available in
Objective CAML, we can use an array (indexed by integers) and a
function associating a field name with the array index
corresponding to the field.
# type data_card = string array ;;
# type data_base = { card_index : string -> int ; data : data_card list } ;;
Access to the field named n of a card dc of
the database db is implemented by the function:
# let field db n (dc : data_card) = dc.(db.card_index n) ;;
val field : data_base -> string -> data_card -> string = <fun>
The type of dc has been set to data_card to
constrain the function field to only accept string arrays and
not arrays of other types.
Here is a small example:
# let base_ex =
{ data = [ [|"Chailloux"; "Emmanuel"|] ; [|"Manoury"; "Pascal"|] ] ;
card_index = function "Lastname"->0 | "Firstname"->1
| _->raise Not_found } ;;
val base_ex : data_base =
{card_index=<fun>;
data=[[|"Chailloux"; "Emmanuel"|]; [|"Manoury"; "Pascal"|]]}
# List.map (field base_ex "Lastname") base_ex.data ;;
- : string list = ["Chailloux"; "Manoury"]
The expression field base_ex "Lastname" evaluates to a function
which takes a card and returns the value of its
"Lastname" field. The library function List.map applies the
function to each card of the database base_ex, and returns
the list of the results: a list of the
"Lastname" fields of the database.
This example shows how we wish to use the functional style in our
program. Here, the partial application of field allows us to
define an access function for a given field, which we can use on any
number of cards. This also shows us that the implementation of the
field function is not very efficient, since although we are
always accessing the same field, its index is computed for each
access. The following implementation is better:
# let field base name =
let i = base.card_index name in fun (card : data_card) -> card.(i) ;;
val field : data_base -> string -> data_card -> string = <fun>
Here, after applying the function to two arguments, the index of the
field is computed and is used for any subsequent application.
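The difference between the two versions can be observed with a small experiment. The sketch below is only illustrative: the counter lookups and the names field_slow and db are hypothetical additions that record how many times the index function is called.

```ocaml
type data_card = string array
type data_base = { card_index : string -> int ; data : data_card list }

(* Counts the calls to the index function (for illustration only). *)
let lookups = ref 0

let db =
  { data = [ [|"Chailloux"; "Emmanuel"|] ; [|"Manoury"; "Pascal"|] ] ;
    card_index = (fun name ->
      incr lookups ;
      match name with
      | "Lastname" -> 0
      | "Firstname" -> 1
      | _ -> raise Not_found) }

(* First version: the index is looked up on every access. *)
let field_slow db n (dc : data_card) = dc.(db.card_index n)

(* Second version: the index is looked up once, when the name is given. *)
let field base name =
  let i = base.card_index name in
  fun (card : data_card) -> card.(i)

let slow_names = List.map (field_slow db "Lastname") db.data
let slow_lookups = !lookups          (* one lookup per card *)

let () = lookups := 0
let fast_names = List.map (field db "Lastname") db.data
let fast_lookups = !lookups          (* a single lookup *)
```

Both versions return the same list of names; only the number of index computations differs.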
Reading a database from a file
As seen from Objective CAML, a file containing a database is just a list of
lines. The first work that needs to be done is to read each line as a
string, split it into smaller parts according to the separating
character, and then extract the corresponding data as well as the field
indexing function.
Tools for processing a line
We need a function split that splits a string at every occurrence of
some separating character. This function uses the function
suffix which returns the suffix of a string s after
some position i. To do this, we use three
predefined functions:
- String.length returns the length of a string;
- String.sub returns the substring of s
starting at position i and of length l;
- String.index_from computes the position of the first occurrence
of character c in the string s,
starting at position n.
# let suffix s i = try String.sub s i ((String.length s)-i)
with Invalid_argument _ -> "" ;;
val suffix : string -> int -> string = <fun>
# let split c s =
let rec split_from n =
try let p = String.index_from s n c
in (String.sub s n (p-n)) :: (split_from (p+1))
with Not_found -> [ suffix s n ]
in if s="" then [] else split_from 0 ;;
val split : char -> string -> string list = <fun>
The only remarkable characteristic in this implementation is the use
of exceptions, specifically the exception Not_found.
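A few sample calls may help to see how split treats empty fields. The definitions are repeated so that the sketch stands alone; the handler matches any Invalid_argument, since the exact message carried by the exception may vary between versions of the language.

```ocaml
let suffix s i =
  try String.sub s i (String.length s - i)
  with Invalid_argument _ -> ""

let split c s =
  let rec split_from n =
    try
      let p = String.index_from s n c in
      String.sub s n (p - n) :: split_from (p + 1)
    with Not_found -> [ suffix s n ]
  in
  if s = "" then [] else split_from 0

(* Two consecutive separators produce an empty field. *)
let a = split ':' "3:Baro:Sylvain::0144274427"

(* A trailing separator produces a final empty field. *)
let b = split ':' "a:b:"

(* The empty string produces no field at all. *)
let c = split ':' ""
```

Here a is ["3"; "Baro"; "Sylvain"; ""; "0144274427"], b is ["a"; "b"; ""], and c is the empty list.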
Computing the data_base structure
There is no difficulty in creating an array of strings from a list of
strings, since this is what the of_list function in the
Array module does. It might seem more complicated to compute
the index function from a list of field names, but the List
module provides all the needed tools.
Starting from a list of strings, we need to code a function that
associates each string with an index corresponding to its position in
the list.
# let mk_index list_names =
let rec make_enum a b = if a > b then [] else a::(make_enum (a+1) b) in
let list_index = (make_enum 0 ((List.length list_names) - 1)) in
let assoc_index_name = List.combine list_names list_index in
function name -> List.assoc name assoc_index_name ;;
val mk_index : 'a list -> 'a -> int = <fun>
To create the association function between field names and indexes, we
combine the list of indexes and the list of names to obtain a
list of associations of the type (string * int) list. To look up
the index associated with a name, we use the function assoc
from the List library. The function mk_index
returns a function that takes a name and calls assoc on this
name and the previously built association list.
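The same index function can be sketched more compactly with List.mapi, which is available in the standard library of recent OCaml versions. This is a possible variant and not the definition used in the rest of the section; the field name "Phone" is a hypothetical unknown name.

```ocaml
(* Pair each name with its position, then look names up with List.assoc. *)
let mk_index list_names =
  let assoc_index_name = List.mapi (fun i name -> (name, i)) list_names in
  fun name -> List.assoc name assoc_index_name

let index = mk_index ["Num"; "Lastname"; "Firstname"]

(* A known name yields its position in the list. *)
let i_last = index "Lastname"

(* An unknown name raises Not_found, turned into None here. *)
let unknown = try Some (index "Phone") with Not_found -> None
```

As in the original definition, the association list is built once, when mk_index is applied to the list of names.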
It is now possible to create a function that reads a file of
the given format.
# let read_base filename =
let channel = open_in filename in
let split_line = split ':' in
let list_names = split '|' (input_line channel) in
let rec read_file () =
try
let data = Array.of_list (split_line (input_line channel )) in
data :: (read_file ())
with End_of_file -> close_in channel ; []
in
{ card_index = mk_index list_names ; data = read_file () } ;;
val read_base : string -> data_base = <fun>
The auxiliary function read_file reads records from the
file, and works recursively on the input channel. The base case of
the recursion corresponds to the end of the file, signaled by the
End_of_file exception. In this case, the empty list is
returned after closing the channel.
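The function read_file above is not tail-recursive, so a very large file could exhaust the stack. The following sketch shows one possible tail-recursive variant and tries it on a temporary file. It assumes a recent OCaml version (String.split_on_char, List.mapi, and exception cases in match); the file contents are a shortened, hypothetical sample. Note that String.split_on_char returns [""] for an empty line, whereas split returns the empty list.

```ocaml
type data_card = string array
type data_base = { card_index : string -> int ; data : data_card list }

(* Same contract as read_base, written with standard-library helpers. *)
let read_base filename =
  let channel = open_in filename in
  let names = String.split_on_char '|' (input_line channel) in
  let indexed = List.mapi (fun i n -> (n, i)) names in
  (* Cards are accumulated in reverse order, then reversed at the end. *)
  let rec read_cards acc =
    match input_line channel with
    | line ->
        read_cards (Array.of_list (String.split_on_char ':' line) :: acc)
    | exception End_of_file -> close_in channel ; List.rev acc
  in
  { card_index = (fun n -> List.assoc n indexed) ; data = read_cards [] }

(* Write a two-card file to a temporary location and read it back. *)
let db =
  let path = Filename.temp_file "association" ".dat" in
  let oc = open_out path in
  output_string oc
    "Num|Lastname|Firstname\n0:Chailloux:Emmanuel\n1:Manoury:Pascal\n" ;
  close_out oc ;
  let db = read_base path in
  Sys.remove path ;
  db
```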
The association's file can now be loaded:
# let base_ex = read_base "association.dat" ;;
val base_ex : data_base =
{card_index=<fun>;
data=
[[|"0"; "Chailloux"; "Emmanuel"; "Universit\233 P6"; "0144274427";
"ec@lip6.fr"; "email"; "25.12.1998"; "100.00"|];
[|"1"; "Manoury"; "Pascal"; "Laboratoire PPS"; ...|]; ...]}
General principles for database processing
The effectiveness and difficulty of processing the data in a
database is proportional to the power and complexity of the query
language. Since we want to use Objective CAML as query language, there is no
limit a priori on the requests we can express! However, we
also want to provide some simple tools to manipulate cards and their
data. This desire for simplicity requires us to limit the power of the
Objective CAML language, through the use of general goals and principles
for database processing.
The goal of database processing is to obtain a state of the
database. Building such a state may be decomposed into three
steps:
- selecting, according to some given criterion, a set of cards;
- processing each of the selected cards;
- processing all the data collected on the cards.
Figure 6.1 illustrates this decomposition.
Figure 6.1: Processing a request.
According to this decomposition, we need three functions of the
following types:
- (data_card -> bool) -> data_card list -> data_card list
- (data_card -> 'a) -> data_card list -> 'a list
- ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b
Objective CAML provides us with three higher-order functions, also known as
iterators and introduced earlier, that satisfy our
specification:
# List.find_all ;;
- : ('a -> bool) -> 'a list -> 'a list = <fun>
# List.map ;;
- : ('a -> 'b) -> 'a list -> 'b list = <fun>
# List.fold_right ;;
- : ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b = <fun>
We will be able to use them to implement the three steps of building a
state by choosing the functions they take as an argument.
For some special requests, we will also use:
# List.iter ;;
- : ('a -> unit) -> 'a list -> unit = <fun>
Indeed, if the required processing consists only of displaying some
data, there is nothing to compute.
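The three steps can be sketched on a simplified model before turning to real cards. In this sketch the cards are hypothetical triples (last name, preference, amount) and not the string arrays used in the rest of the section.

```ocaml
(* Hypothetical stand-in for cards: (last name, preference, amount). *)
let cards =
  [ ("Chailloux", "email", 100.0) ; ("Manoury", "mail", 150.0) ;
    ("Pagano", "mail", 150.0) ; ("Baro", "email", 50.0) ]

(* Step 1: select the cards satisfying a criterion. *)
let selected = List.find_all (fun (_, pref, _) -> pref = "mail") cards

(* Step 2: process each selected card. *)
let amounts = List.map (fun (_, _, amount) -> amount) selected

(* Step 3: process all the collected data. *)
let total = List.fold_right (+.) amounts 0.0
```

Two cards are selected, and the total of their amounts is 300.0.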
In the next paragraphs, we are going to see how to define functions
expressing simple selection criteria, as well as simple queries. We
conclude this section with a short example using these functions
according to the principles stated above.
Selection criteria
Concretely, the boolean function corresponding to the selection
criterion of a card is a boolean combination of properties of some or
all of the fields of the card. Each field of a card, even though it
is a string, can contain some information of another type: a float, a
date, etc.
Selection criteria on a field
Selecting on some field is usually done using a function of the type
data_base -> 'a -> string -> data_card -> bool. The
'a type parameter corresponds to the type of the information
contained in the field. The string argument corresponds to
the name of the field.
String fields
We define two simple tests on strings: equality with another string,
and non-emptiness.
# let eq_sfield db s n dc = (s = (field db n dc)) ;;
val eq_sfield : data_base -> string -> string -> data_card -> bool = <fun>
# let nonempty_sfield db n dc = ("" <> (field db n dc)) ;;
val nonempty_sfield : data_base -> string -> data_card -> bool = <fun>
Float fields
To implement tests on data of type float, it is enough to translate
the string representation of a decimal number into its
float value. Here are some examples obtained from a generic
function
tst_ffield:
# let tst_ffield r db v n dc = r v (float_of_string (field db n dc)) ;;
val tst_ffield :
('a -> float -> 'b) -> data_base -> 'a -> string -> data_card -> 'b = <fun>
# let eq_ffield = tst_ffield (=) ;;
# let lt_ffield = tst_ffield (<) ;;
# let le_ffield = tst_ffield (<=) ;;
(* etc. *)
These three functions have type:
data_base -> float -> string -> data_card -> bool.
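The order of the arguments deserves attention: the value given in the query is the left operand of the relation, and the content of the field is the right one. The sketch below illustrates this on a small hypothetical database with only two fields; the definitions of field and tst_ffield are repeated so that it stands alone.

```ocaml
type data_card = string array
type data_base = { card_index : string -> int ; data : data_card list }

let field base name =
  let i = base.card_index name in
  fun (card : data_card) -> card.(i)

let tst_ffield r db v n dc = r v (float_of_string (field db n dc))
let lt_ffield = tst_ffield (<)

(* Hypothetical two-field database. *)
let db =
  { card_index =
      (function "Lastname" -> 0 | "Amount" -> 1 | _ -> raise Not_found) ;
    data = [ [|"Chailloux"; "100.00"|] ; [|"Manoury"; "150.00"|] ;
             [|"Baro"; "50.00"|] ] }

(* lt_ffield db 100.0 "Amount" holds when 100.0 < amount. *)
let above_100 =
  List.map (field db "Lastname")
    (List.find_all (lt_ffield db 100.0 "Amount") db.data)
```

Only the card of Manoury has an amount strictly greater than 100.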
Dates
This kind of information is a little more complex to deal with, as it
depends on the representation format of dates, and requires that we define
date comparison.
We decide to represent dates in a card as a string with format
dd.mm.yyyy. In order to be able to define additional
comparisons, we also allow the replacement of the day, month or year
part with the underscore character ('_'). Dates are
compared according to the lexicographic order of lists of integers of
the form [year; month; day]. To express queries such as "is
before July 1998", we use the date pattern
"_.07.1998". Comparing a date with a pattern is
accomplished with the function tst_dfield which analyses the
pattern to create the ad hoc comparison function. To define this
generic test function on dates, we need a few auxiliary functions.
We first code two conversion functions from dates
(ints_of_string) and date patterns
(ints_of_dpat) to lists of ints.
The character '_' of a pattern will be replaced by the integer
0:
# let split_date = split '.' ;;
val split_date : string -> string list = <fun>
# let ints_of_string d =
try match split_date d with
[d;m;y] -> [int_of_string y; int_of_string m; int_of_string d]
| _ -> failwith "Bad date format"
with Failure("int_of_string") -> failwith "Bad date format" ;;
val ints_of_string : string -> int list = <fun>
# let ints_of_dpat d =
let int_of_stringpat = function "_" -> 0 | s -> int_of_string s
in try match split_date d with
[d;m;y] -> [ int_of_stringpat y; int_of_stringpat m;
int_of_stringpat d ]
| _ -> failwith "Bad date format"
with Failure("int_of_string") -> failwith "Bad date pattern" ;;
val ints_of_dpat : string -> int list = <fun>
Given a relation r on integers, we now code the test function.
It simply consists of implementing the lexicographic order, taking
into account the particular case of 0:
# let rec app_dtst r d1 d2 = match d1, d2 with
[] , [] -> false
| (0::d1) , (_::d2) -> app_dtst r d1 d2
| (n1::d1) , (n2::d2) -> (r n1 n2) || ((n1 = n2) && (app_dtst r d1 d2))
| _, _ -> failwith "Bad date pattern or format" ;;
val app_dtst : (int -> int -> bool) -> int list -> int list -> bool = <fun>
We finally define the generic function tst_dfield which
takes as arguments a relation r, a database db, a
pattern dp, a field name nm, and a card
dc. This function checks that the pattern and the field from
the card satisfy the relation.
# let tst_dfield r db dp nm dc =
r (ints_of_dpat dp) (ints_of_string (field db nm dc)) ;;
val tst_dfield :
(int list -> int list -> 'a) ->
data_base -> string -> string -> data_card -> 'a = <fun>
We now apply it to three relations.
# let eq_dfield = tst_dfield (=) ;;
# let le_dfield = tst_dfield (<=) ;;
# let ge_dfield = tst_dfield (>=) ;;
These three functions have type:
data_base -> string -> string -> data_card -> bool.
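To see what these comparisons compute, the sketch below converts a pattern and two dates to lists of integers and compares them directly with the polymorphic operator (>=), as ge_dfield does. The conversion functions are simplified versions written with String.split_on_char, an assumption of this sketch; unlike the originals, they do not translate conversion failures into a "Bad date" message.

```ocaml
let split_date = String.split_on_char '.'

(* dd.mm.yyyy to [year; month; day] *)
let ints_of_string d =
  match split_date d with
  | [d; m; y] -> [int_of_string y; int_of_string m; int_of_string d]
  | _ -> failwith "Bad date format"

(* Same conversion for patterns, where "_" stands for 0. *)
let ints_of_dpat d =
  let int_of_stringpat = function "_" -> 0 | s -> int_of_string s in
  match split_date d with
  | [d; m; y] -> [int_of_stringpat y; int_of_stringpat m; int_of_stringpat d]
  | _ -> failwith "Bad date pattern"

let pattern = ints_of_dpat "_.07.1998"      (* [1998; 7; 0] *)
let june = ints_of_string "30.06.1998"      (* [1998; 6; 30] *)
let july = ints_of_string "01.07.1998"      (* [1998; 7; 1] *)

(* Lists of integers are compared in lexicographic order. *)
let june_before = pattern >= june           (* true *)
let july_before = pattern >= july           (* false *)
```

Since the underscore is replaced by 0, the pattern is smaller than any real date of July 1998, which is why the test selects only dates before that month.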
Composing criteria
The tests we have defined above all take as first arguments a
database, a value, and the name of a field. When we write a query, the
values of these three arguments are known. For instance, when we work
on the database base_ex, the test "is before July 1998"
is written
# ge_dfield base_ex "_.07.1998" "Date" ;;
- : data_card -> bool = <fun>
Thus, we can consider a test as a function of type data_card
-> bool. We want to obtain boolean combinations of the results of
such functions applied to a given card. To this end, we implement the
iterator:
# let fold_funs b c fs dc =
List.fold_right (fun f -> fun r -> c (f dc) r) fs b ;;
val fold_funs : 'a -> ('b -> 'a -> 'a) -> ('c -> 'b) list -> 'c -> 'a = <fun>
Here b is the base value, the function c is the
boolean operator, fs is the list of test functions on a
field, and dc is a card.
We can obtain the conjunction and the disjunction of a list of tests with:
# let and_fold fs = fold_funs true (&&) fs ;;
val and_fold : ('a -> bool) list -> 'a -> bool = <fun>
# let or_fold fs = fold_funs false (||) fs ;;
val or_fold : ('a -> bool) list -> 'a -> bool = <fun>
We easily define the negation of a test:
# let not_fun f dc = not (f dc) ;;
val not_fun : ('a -> bool) -> 'a -> bool = <fun>
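These combinators do not depend on cards at all, so they can be tried on simpler values. In the sketch below, the predicates positive and even are hypothetical tests on integers standing in for tests on cards, and the boolean operators are written (&&) and (||).

```ocaml
let fold_funs b c fs dc =
  List.fold_right (fun f r -> c (f dc) r) fs b

let and_fold fs = fold_funs true (&&) fs
let or_fold fs = fold_funs false (||) fs
let not_fun f dc = not (f dc)

(* Hypothetical tests on integers, standing in for tests on cards. *)
let positive n = n > 0
let even n = n mod 2 = 0

(* Conjunction of a test and a negated test. *)
let positive_and_odd = and_fold [positive; not_fun even]
let results = List.map positive_and_odd [3; 4; -1]

(* Disjunction: -2 is not positive, but it is even. *)
let any = or_fold [positive; even] (-2)

(* With an empty list of tests, the base value is returned. *)
let empty_and = and_fold [] 0
```

Here results is [true; false; false], and both any and empty_and are true. Since the operators are passed as ordinary functions to List.fold_right, every test of the list is evaluated: there is no short-circuit.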
For instance, we can use these combinators to define a selection
function for cards whose date field is included in a given range:
# let date_interval db d1 d2 =
and_fold [(le_dfield db d1 "Date"); (ge_dfield db d2 "Date")] ;;
val date_interval : data_base -> string -> string -> data_card -> bool =
<fun>
Processing and computation
It is difficult to guess how a card might be processed, or the data that
would result from that processing. Nevertheless, we can consider two
common cases: numerical computation and data formatting for printing.
Let's take an example for each of these two cases.
Data formatting
In order to print, we wish to create a string containing the name
of a member of the association, followed by some information.
We start with a function that reverses the splitting of a line
using a given separating character:
# let format_list c =
let s = String.make 1 c in
List.fold_left (fun x y -> if x="" then y else x^s^y) "" ;;
val format_list : char -> string list -> string = <fun>
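A few calls show the behavior of format_list, including one corner case: as long as the accumulated string is empty, no separator is emitted, so a leading empty field disappears. The comparison with String.concat from the standard library is only there to make the difference visible.

```ocaml
let format_list c =
  let s = String.make 1 c in
  List.fold_left (fun x y -> if x = "" then y else x ^ s ^ y) ""

(* The usual case: fields joined by the separator. *)
let joined = format_list ':' ["Baro"; "Sylvain"; "email"]

(* A leading empty field is dropped by format_list ... *)
let dropped = format_list ':' [""; "Baro"]

(* ... whereas String.concat keeps it. *)
let kept = String.concat ":" [""; "Baro"]
```

Here joined is "Baro:Sylvain:email", dropped is "Baro", and kept is ":Baro". The function is therefore an exact inverse of split only when the first field is not empty.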
In order to build the list of fields we are interested in, we code the
function extract that returns the fields associated
with a given list of names in a given card:
# let extract db ns dc =
List.map (fun n -> field db n dc) ns ;;
val extract : data_base -> string list -> data_card -> string list = <fun>
We can now write the line formatting function:
# let format_line db ns dc =
(String.uppercase_ascii (field db "Lastname" dc))
^" "^(field db "Firstname" dc)
^"\t"^(format_list '\t' (extract db ns dc))
^"\n" ;;
val format_line : data_base -> string list -> data_card -> string = <fun>
The argument ns is the list of requested fields. In the
resulting string, fields are separated by a tab ('\t')
and the string is terminated with a newline character.
We display the list of last and first names of all members with:
# List.iter print_string (List.map (format_line base_ex []) base_ex.data) ;;
CHAILLOUX Emmanuel
MANOURY Pascal
PAGANO Bruno
BARO Sylvain
- : unit = ()
Numerical computation
We want to compute the total amount of received fees for a given set
of cards. This is easily done by composing the extraction and
conversion of the correct field with the addition. To get nicer
code, we define an infix composition operator:
# let (++) f g x = g (f x) ;;
val ++ : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c = <fun>
We use this operator in the following definition:
# let total db dcs =
List.fold_right ((field db "Amount") ++ float_of_string ++ (+.)) dcs 0.0 ;;
val total : data_base -> data_card list -> float = <fun>
We can now apply it to the whole database:
# total base_ex base_ex.data ;;
- : float = 450
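The composition used in total can be examined without a database. In the sketch below, the list amounts is a hypothetical list of strings as they would appear in the "Amount" field, and the function succ_double_show is only there to show how a chain of compositions is read.

```ocaml
let (++) f g x = g (f x)

(* Hypothetical amounts, as they would appear in the "Amount" field. *)
let amounts = ["100.00"; "150.00"; "150.00"; "50.00"]

(* float_of_string ++ (+.) has type string -> float -> float,
   which is the shape List.fold_right expects. *)
let total = List.fold_right (float_of_string ++ (+.)) amounts 0.0

(* The operator associates to the left and applies functions
   from left to right: add 1, then double, then convert. *)
let succ_double_show = ((+) 1) ++ (( * ) 2) ++ string_of_int
let shown = succ_double_show 3
```

The total is 450.0 and shown is "8".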
An example
To conclude, here is a small example of an application that uses the
principles described in the paragraphs above.
We expect two kinds of queries on our database:
- a query returning two lists, the elements of the first
containing the name of a member followed by his mail address,
the elements of the other containing the name of the member
followed by his email address, according to his preferences.
- another query returning the state of received fees for a given
period of time. This state is composed of the list of last and
first names, dates and amounts of the fees as well as the total amount
of the received fees.
List of addresses
To create these lists, we first select the relevant cards according to
the field "Pref", then we use the formatting function
format_line:
# let mail_addresses db =
let dcs = List.find_all (eq_sfield db "mail" "Pref") db.data in
List.map (format_line db ["Address"]) dcs ;;
val mail_addresses : data_base -> string list = <fun>
# let email_addresses db =
let dcs = List.find_all (eq_sfield db "email" "Pref") db.data in
List.map (format_line db ["Email"]) dcs ;;
val email_addresses : data_base -> string list = <fun>
State of received fees
Computing the state of the received fees uses the same technique:
selection then processing. In this case however the processing part is
twofold: line formatting followed by the computation of the total
amount.
# let fees_state db d1 d2 =
let dcs = List.find_all (date_interval db d1 d2) db.data in
let ls = List.map (format_line db ["Date";"Amount"]) dcs in
let t = total db dcs in
ls, t ;;
val fees_state : data_base -> string -> string -> string list * float = <fun>
The result of this query is a tuple containing a list of strings
with member information, and the total amount of received fees.
Main program
The main program is essentially an
interactive loop that displays the result of queries asked by the
user through a menu. We use here an imperative style, except for the
display of the results which uses an iterator.
# let main() =
let db = read_base "association.dat" in
let finished = ref false in
while not !finished do
print_string" 1: List of mail addresses\n";
print_string" 2: List of email addresses\n";
print_string" 3: Received fees\n";
print_string" 0: Exit\n";
print_string"Your choice: ";
match read_int() with
0 -> finished := true
| 1 -> (List.iter print_string (mail_addresses db))
| 2 -> (List.iter print_string (email_addresses db))
| 3
-> (let d1 = print_string"Start date: "; read_line() in
let d2 = print_string"End date: "; read_line() in
let ls, t = fees_state db d1 d2 in
List.iter print_string ls;
print_string"Total: "; print_float t; print_newline())
| _ -> ()
done;
print_string"bye\n" ;;
val main : unit -> unit = <fun>
This example will be extended in chapter 21
with an interface using a web browser.
Further work
A natural extension of this example would consist of adding type
information to every field of the database. This information would be
used to define generic comparison operators with type
data_base -> 'a -> string -> data_card -> bool
where the name of the field (the third argument) would trigger the
correct conversion and test functions.
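One possible way to approach this extension is sketched below. Everything in it is hypothetical: the types kind, value and typed_base, the field card_info, and the functions typed_field, tst_field and eq_field are assumptions made for illustration. The query value is wrapped in a sum type instead of being of an arbitrary type 'a, which is a simplification of the type suggested above.

```ocaml
type data_card = string array

(* Hypothetical extension: a field name gives an index and a kind. *)
type kind = KString | KFloat
type value = VString of string | VFloat of float

type typed_base =
  { card_info : string -> int * kind ; data : data_card list }

(* Convert the raw string of field n according to its declared kind. *)
let typed_field db n =
  let i, k = db.card_info n in
  fun (dc : data_card) ->
    match k with
    | KString -> VString dc.(i)
    | KFloat -> VFloat (float_of_string dc.(i))

(* Generic test: the field name selects the conversion. *)
let tst_field r db v n =
  let get = typed_field db n in
  fun dc -> r v (get dc)

let eq_field db = tst_field (=) db

let db =
  { card_info = (function
      | "Lastname" -> (0, KString)
      | "Amount" -> (1, KFloat)
      | _ -> raise Not_found) ;
    data = [ [|"Chailloux"; "100.00"|] ; [|"Manoury"; "150.00"|] ] }

(* Number of cards whose amount is 150. *)
let n =
  List.length (List.find_all (eq_field db (VFloat 150.0) "Amount") db.data)
```

With this design, a single eq_field replaces eq_sfield and eq_ffield, at the cost of tagging each query value with its kind.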