# Relational algebra

The relational algebra is a set of operations that manipulate relations as they are defined in the relational model and as such describes part of the data manipulation aspect of this data model. Because of their algebraic properties these operations are often used in database query optimization as an intermediate representation of a query to which certain rewrite rules can be applied to obtain a more efficient version of the query.

The exact set of operations may differ per definition and also depends on whether the unlabeled relational model (that uses mathematical relations) or the labeled relational model (that uses the labeled generalization of mathematical relations) is used. We will assume the labeled case here as this is the most common way the relational model is defined. That means that we assume that tuples are partial functions from attribute names to values. The attribute a of a tuple t is denoted in this article as t(a).

 Contents

## The basic operations

The six basic operations of the algebra are the selection, the projection, the Cartesian product, the set union, the set difference, and the rename. These six operations are fundamental in the sense that (1) all other operations can be expressed as combinations of these operations and (2) none of these six operations can be omitted without losing expressive power.

### The selection

The selection is a unary operation that is written as σaθb(R) or σaθv(R) where a and b are attribute names, θ a binary operation in the set {<, ≤, =, >, ≥}, v is a value constant and R is a relation. The selection σaθb(R) selects all those tuples in R for which θ holds between the a and the b attribute. The selection σaθv(R) selects all those tuples in R for which θ holds between the a attribute and the value v.

For an example consider the following tables where the first table gives the relation Person, the second table gives the result of σAge≥34(Person) and the third table gives the result of σAge=Weight(Person).

Person
Name Age Weight
Harry 34 80
Sally 28 64
George 29 70
Helena 54 54
Peter 34 80
σAge≥34(Person)
Name Age Weight
Harry 34 80
Helena 54 54
Peter 34 80
σAge=Weight(Person)
Name Age Weight
Helena 54 54

More formally the semantics of the selection is defined as follows:

σaθb(R) = { t : tR, t(a) θ t(b) }
σaθv(R) = { t : tR, t(a) θ v }

The result of the selection is only defined if the attribute names that it mentions are in the header of the relation that it operates upon.

### The projection

The projection is a unary operation that is written as πa1,..,an(R) where a1,..,an is a set of attribute names. The result of such a projection is defined as the set that is obtained when all tuples in R are restricted to the set {a1,..,an}. For an example consider the following two tables which are the relation Person and its projection on the attributes Age and Weight:

Person
Name Age Weight
Harry 34 80
Sally 28 64
George 29 70
Helena 54 54
Peter 34 80
πAge,Weight(Person)
Age Weight
34 80
28 64
29 70
54 54
Note that Harry and Peter have the same age and weight, but

since the result is a relation and therefore a set this combination only appears once in the result.

More formally the semantics of the projection is defined as follows:

πa1,..,an(R) = { t[a1,..,an] : tR }

where t[a1,..,an] is the restriction of the tuple t to the set {a1,..,an}, i.e., t[a1,..,an] = { (a' , v) | (a' , v) ∈ t, a' a1,..,an} }.

The result of a projection πa1,..,an(R) is only defined if {a1,..,an} is a subset of the header of R.

### The Cartesian product

The Cartesian product (also called cross product or cross join) is a binary operation that is very similar to the Cartesian product in set theory. It is written as R × S where R and S are relations. The result of the Cartesian product is the set of all combinations of tuples in R and S. For an example consider the tables Employee and Dept and their Cartesian product:

Employee
Name EmpId
Harry 3415
Sally 2241
Dept
DeptName Manager
Finance George
Sales Harriet
Employee × Dept
Name EmpId DeptName Manager
Harry 3415 Finance George
Harry 3415 Sales Harriet
Sally 2241 Finance George
Sally 2241 Sales Harriet

More formally the semantics of the Cartesian product is defined as follows:

R × S = { ts : tR, sS }

To ensure that the result of the Cartesian product is again a relation it is required that the headers of R and S are disjoint, i.e., do not contain the same attribute.

### The set union

The set union is a binary operation that is written as RS and is defined as the set union in set theory. For an example consider the tables Engineer, Manager and their union:

Engineer
Name EmpId
Harry 3415
Sally 2241
Harriet 2202
Manager
Name EmpId
George 3401
Harriet 2202
Engineer ∪ Manager
Name EmpId
Harry 3415
Sally 2241
George 3401
Harriet 2202

Note that Harriet only appears once in the result because the result is a set.

Formally, the union is defined as follows:

RS = { t : tRtS }

The result of the set union is only defined when the two relations have the same attributes.

### The set difference

The set difference is a binary operation that is written as R - S and is defined as the usual set difference in set theory. For an example consider the relations Engineer, Manager and their difference:

Engineer
Name EmpId
Harry 3415
Sally 2241
Harriet 2202
Manager
Name EmpId
George 3401
Harriet 2202
Engineer - Manager
Name EmpId
Harry 3415
Sally 2241

Formally, the semantics of the union is defined as follows:

R - S = { t : tR, ¬ tS }

The result of the set difference is only defined when the two relations have the same headers.

### The rename

The rename operation is a unary operation that is written as ρa/b(R) where a and b are attribute names and R is a relation. The result is identical to R except that the b field in all tuples is renamed to an a field. For an example consider the following Employee relation and its renamed version:

Employee
Name EmpId
Harry 3415
Sally 2241
ρEmpName/Name(Employee)
EmpName EmpId
Harry 3415
Sally 2241

Formally the semantics of the rename operator is defined as follows:

ρa/b(R) = { t[a/b] : tR }

where t[a/b] is defined as the tuple t with the b attribute renamed to a, i.e., t[a/b] = { (c, v) | (c, v) ∈ t, cb } ∪ { (a, t(b)) }. The result of the rename is only defined when the attribute a did not appear already in the header of the operand.

### The expressive power

These operations are enough to express all queries that can be expressed in the safe parts of tuple calculus and domain calculus which is essentially the same as first-order logic.

insert a sketch of proof here

## Other operations expressible with the basic operations

Next to the six basic operations some other operations are also often included even though they can be expressed by combinations of the basic operations. This is because they either have interesting algebraic properties or because they can be implemented more efficiently than their simulations. These operations are the set intersection, the natural join and the division.

### The set intersection

The set intersection is a binary operation that is written as RS and is defined as the usual set intersection in set theory. For an example consider the tables Engineer, Manager and their intersection:

Engineer
Name EmpId
Harry 3415
Sally 2241
Harriet 2202
Manager
Name EmpId
George 3401
Harriet 2202
Engineer ∩ Manager
Name EmpId
Harriet 2202

Formally, the union is defined as follows:

RS = { t : tR, tS }

The result of the set intersection is only defined when the two relations have the same headers. This operation can be simulated in the basic operations as follows:

RS = R - (R - S)

### The natural join

The natural join is a binary operation that is written as R |X| S where R and S are relations. The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For an example consider the tables Employee and Dept and their natural join:

Employee
Name EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Sales
Dept
DeptName Manager
Finance George
Sales Harriet
Production Charles
Employee [X]  Dept
Name EmpId DeptName Manager
Harry 3415 Finance George
Sally 2241 Sales Harriet
George 3401 Finance George
Harriet 2202 Sales Harriet

The natural join is arguably one of the most important operators since it allows the combination of relations that are associated by a foreign key. For example, in the above example there holds probably a foreign key from Employee.DeptName to Dept.DeptName and then the natural join of Employee and Dept combines all employees with their departments. Note that this works because the foreign key holds between columns with the same name. If this is not the case such as in the foreign key from Dept.manager to Emp.emp-number then we have to rename these columns before we take the natural join. Such a join is sometimes also referred to as an equijoin (see θ-join).

More formally the semantics of the natural join is defined as follows:

R |X| S = { ts : tR, sS, fun(ts) }

where fun(r) is a predicate that is true for a binary relation r iff r is a functional binary relation. It is usually required that R and S must have at least one common attribute, but if this constraint is omitted then in that special case the natural join becomes exactly the Cartesian product as defined above.

The natural join can be simulated with the basic operations as follows. Assume that a1,...,an are the attribute names unique to R, b1,...,bm are the attribute names common to R and S and c1,...,cm are the attribute unique to S. Furthermore assume that the attribute names d1,...,dm are neither in R nor in S. In a first step we can now rename the common attribute names in S: : S' := ρd1/b1(...ρdm/bm( S)...) Then we take the Cartesian product and select the tuples that are to be joined: : T := σb1=d1(...σbm=dm(R × S')...) Finally we take a projection to get rid of the renamed attributes: : U := πa1,...,an,b1,...,bm,c1,...,cm(T)

### The division

The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. For an example see the tables CompletedTask, DBProject and their division:

Completed
Fred Database1
Fred Database2
Fred Compiler1
Eugene Database1
Eugene Compiler1
Sara Database1
Sara Database2
DBProject
Database1
Database2
Completed ÷ DBProject
Student
Fred
Sara

If DBProject contains all the tasks of the Database project then the result of the division above contains exactly all the students that have completed the Database project.

More formally the semantics of the division is defined as follows:

R ÷ S = { t[a1,...,an] : ∀ sS ( (t[a1,...,an] ∪ s) ∈ R) }

where {a1,...,an} is the set of attribute names unique to R and t[a1,...,an] is the restriction of t to this set. It is usually required that the attribute names in the header of S are a subset of those of R because otherwise the result of the operation will always be empty.

The simulation of the division with the basic operations is as follows. We assume that a1,...,an are the attribute names unique to R and b1,...,bm are the attribute names of S. In the first step we project R on its unique attribute names and construction all combinations with tuples in S:

T := πa1,...,an(R) × S

In the next step we subtract R from this relation:

U := T - R

Note that in U we have the combinations that "should have been in R but weren't". So if we now take the projection on the attribute names unique to R then we have the restrictions of the tuples in R for which not all combinations with tuples in S were present in R:

V := πa1,...,an(U)

So what remains to be done is take the projection of R on its unique attribute names and subtract those in V:

W := πa1,...,an(R) - V

### The generalized selection

The generalized selection is a unary operation that is written as σφ(R) where φ is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators ∧ (and), ∨ (or) and ¬ (negation). This selection selects all those tuples in R for which φ holds. For an example consider the following tables where the first table gives the relation Person and the second the result of σAge≥30 ∧ Weight≤60(Person).

Person
Name Age Weight
Harry 34 80
Sally 28 64
George 29 70
Helena 54 54
Peter 34 80
σAge≥30∧Weight≤60(Person)
Name Age Weight
Helena 54 54

More formally the semantics of the generalized selection is defined as follows:

σφ(R) = { t : tR, φ(t) }

The result of the selection is only defined if the attribute names that it mentions are in the header of the relation that it operates upon.

The simulation of a generalized selection that is not a fundamental selection with the fundamental operators is defined by the following rules:

σφ∧ψ(R) = σφ(R) ∩ σψ(R)
σφ∨ψ(R) = σφ(R) ∪ σψ(R)
σ¬φ(R) = R - σφ(R)

### The θ-join and equijoin

If we want to combine tuples from two relations where the combination condition is not simply the equality of shared columns then it is convenient to have a more general form of join operator, which is the θ-join. The θ-join is a binary operation that is written as R |X|aθb S or R |X|aθv S where a and b are attribute names, θ a binary operation in the set {<, ≤, =, >, ≥}, v is a value constant and R and S are relations. The result of this operation consists of all combinations of tuples in R and S that satisfy the equation. The result of the θ-join is only defined if the headers of S and R are disjoint, i.e., do not contain a common attribute.

The simulation of this operation in the fundamental operations is therefore as follows:

R |X|φ S = σφ(R × S)

In case the operator θ is the equality operator (=) then this join is also called an equijoin.

### The semijoin

The semijoin is joining similar to the natural join and written as R |X S where R and S are relations. Whereas the result of the semijoin is only the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names. For an example consider the tables Employee and Dept and their semi join:

Employee
Name EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Sales
Dept
DeptName Manager
Sales Harriet
Production Charles
Employee |X  Dept
Name EmpId DeptName
Sally 2241 Sales
Harriet 2202 Sales

More formally the semantics of the semijoin is defined as follows:

R |X S = { t : tR, sS, fun(ts) }

where fun(r) is as in the definition of natural join.

The semijoin can be simulated using the natural join as follows. Assume that a1,...,an are the attribute names of R, then it holds that:

R |X S = πa1,..,an(R |X| S)

Since we can simulate the natural join with the basic operators it follows that this also holds for the semijoin.

## Operations for null values

The outer join is a binary operation that combines tuples from two relations. There are three types of outer joins.

### Left outer join

The left outer join is written as R =X S where R and S are relations. The result of the left outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in R that have no matching tuples in S.

For an consider the tables Employee and Dept and their left outer join:

Employee
Name EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Sales
Tim 1123 Executive
Dept
DeptName Manager
Sales Harriet
Production Charles
Employee =X  Dept
Name EmpId DeptName Manager
Harry 3415 Finance null
Sally 2241 Sales Harriet
George 3401 Finance null
Harriet 2202 Sales Harriet
Tim 1123 Executive null

In the resulting relation, tuples in S which have no common values in common attribute names with tuples in R take a null value, ω.

Since there are no tuples in Dept with a DeptName of Finance or Executive, ωs occur in the resulting relation where tuples in DeptName have tuples of Finance or Executive.

...(maths)...

The left outer join can be simulated using the natural join and set union as follows:

R =X S = R ∪ (R |X| S)

### Right outer join

The right outer join behaves almost identically to the left outer join, above, with the exception that all the values from the "other" relation appear in the resulting relation.

The right outer join is written as R X= S where R and S are relations. The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R.

For an consider the tables Employee and Dept and their right outer join:

Employee
Name EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Sales
Tim 1123 Executive
Dept
DeptName Manager
Sales Harriet
Production Charles
Employee X=  Dept
Name EmpId DeptName Manager
Sally 2241 Sales Harriet
Harriet 2202 Sales Harriet
ω ω Production Charles

In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω.

Since there are no tuples in Employee with a DeptName of Production, ωs occur in the Name attribute of the resulting relation where tuples in DeptName had tuples of Production.

...(maths)...

The right outer join can be simulated using the natural join and set union as follows:

RX=S = S ∪ (R |X| S)

### Outer join

The outer join or full outer join in effect combines the results of the left and right outer joins.

The full outer join is written as R =X= S where R and S are relations. The result of the full outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R and tuples in R that have no matching tuples in S in their common attribute names.

For an consider the tables Employee and Dept and their full outer join:

Employee
Name EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Sales
Tim 1123 Executive
Dept
DeptName Manager
Sales Harriet
Production Charles
Employee =X=  Dept
Name EmpId DeptName Manager
Harry 3415 Finance ω
Sally 2241 Sales Harriet
George 3401 Finance ω
Harriet 2202 Sales Harriet
Tim 1123 Executive ω
ω ω Production Charles

In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω. Tuples in S which have no common values in common attribute names with tuples in R also take a null value, ω.

The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as follows:

R=X=S = (R=XS) ∪ (RX=S) or R=X=S = RS ∪ (R |X| S)

## Operations for domain computations

### The aggregation operation

how is the aggregation operation performed?

## Use of algebraic properties for query optimization

• Art and Cultures
• Countries of the World (http://www.academickids.com/encyclopedia/index.php/Countries)
• Space and Astronomy