Thursday, April 26, 2012

Relational division

Intuitive meaning

Relational division is one of the core operators in relational algebra. Using it, one can answer questions such as "What are all of the that have/did/associate with ?"

Examples

Example 1: what are all the combinations of (n1, n2) that have values of (1, 1) and (1, 2) for (n3, n4)?
+----+----+----+----+
| n1 | n2 | n3 | n4 |
+----+----+----+----+
|  1 |  1 |  1 |  1 |
|  1 |  1 |  1 |  2 |
|  1 |  1 |  1 |  3 |
|  1 |  1 |  1 |  4 |
|  1 |  1 |  2 |  1 |     +----+----+     +----+----+
|  1 |  1 |  2 |  2 |     | n3 | n4 |     | n1 | n2 |
|  1 |  1 |  2 |  3 |  %  +----+----+  =  +----+----+
|  1 |  2 |  1 |  1 |     |  1 |  1 |     |  1 |  1 |
|  1 |  2 |  1 |  2 |     |  1 |  2 |     |  1 |  2 |
|  1 |  2 |  2 |  3 |     +----+----+     +----+----+
|  1 |  2 |  2 |  4 |
|  1 |  3 |  1 |  3 |
|  1 |  3 |  1 |  4 |
|  2 |  1 |  1 |  1 |
|  2 |  2 |  1 |  2 |
|  2 |  2 |  2 |  1 |
|  2 |  2 |  2 |  2 |
+----+----+----+----+
Example 2: what are all the combinations of (n1, n2) that have values of (1, 1) for (n3, n4)?
+----+----+----+----+
| n1 | n2 | n3 | n4 |
+----+----+----+----+
|  1 |  1 |  1 |  1 |
|  1 |  1 |  1 |  2 |
|  1 |  1 |  1 |  3 |
|  1 |  1 |  1 |  4 |
|  1 |  1 |  2 |  1 |     +----+----+     +----+----+
|  1 |  1 |  2 |  2 |     | n3 | n4 |     | n1 | n2 |
|  1 |  1 |  2 |  3 |  %  +----+----+  =  +----+----+
|  1 |  2 |  1 |  1 |     |  1 |  1 |     |  1 |  1 |
|  1 |  2 |  1 |  2 |     +----+----+     |  1 |  2 |
|  1 |  2 |  2 |  3 |                     |  2 |  1 |
|  1 |  2 |  2 |  4 |                     +----+----+
|  1 |  3 |  1 |  3 |
|  1 |  3 |  1 |  4 |
|  2 |  1 |  1 |  1 |
|  2 |  2 |  1 |  2 |
|  2 |  2 |  2 |  1 |
|  2 |  2 |  2 |  2 |
+----+----+----+----+

The code: template

Here's a template for relational division. Each numbered box must be filled according to the exact tables used and columns desired in the result:
select 1 from (
  select 
    2.*  -- don't forget the ".*" !!
  from 2
  inner join 3
  4
) as r
group by 5
having count(*) = (select count(*) from 3);
Explanation of the numbered boxes:
  1. columns in outermost select: set of dividend's columns that ARE NOT in the divisor. Also known as "quotient" columns
  2. dividend table: there should be no duplicates
  3. divisor table: its set of columns must form a PROPER subset of dividend's columns*
  4. join condition: all divisor columns must compare equal to their corresponding dividend columns
  5. group by columns: same as 1
Note that the purpose of the "having" clause is to allow selection only of those groups with number of rows equal to the number of rows in divisor.

How it works: (Example 2 from above)

dividend divisor Step 1: join dividend to divisor on divisor's columns (n3, n4) Step 2: group join results by quotient columns Step 3: select groups that have the right "count", and project onto quotient columns
+----+----+----+----+
| n1 | n2 | n3 | n4 |
+----+----+----+----+
|  1 |  1 |  1 |  1 |
|  1 |  1 |  1 |  2 |
|  1 |  1 |  1 |  3 |
|  1 |  1 |  1 |  4 |
|  1 |  1 |  2 |  1 | 
|  1 |  1 |  2 |  2 | 
|  1 |  1 |  2 |  3 | 
|  1 |  2 |  1 |  1 | 
|  1 |  2 |  1 |  2 | 
|  1 |  2 |  2 |  3 | 
|  1 |  2 |  2 |  4 |
|  1 |  3 |  1 |  3 |
|  1 |  3 |  1 |  4 |
|  2 |  1 |  1 |  1 |
|  2 |  2 |  1 |  2 |
|  2 |  2 |  2 |  1 |
|  2 |  2 |  2 |  2 |
+----+----+----+----+
+----+----+
| n3 | n4 |
+----+----+
|  1 |  1 |
|  1 |  2 |
+----+----+
+----+----+----+----+
| n1 | n2 | n3 | n4 |
+----+----+----+----+
|  1 |  1 |  1 |  1 |
|  1 |  1 |  1 |  2 |
|  1 |  2 |  1 |  1 |
|  1 |  2 |  1 |  2 |
|  2 |  1 |  1 |  1 |
|  2 |  2 |  1 |  2 |
+----+----+----+----+
+----+----+----------+
| n1 | n2 | count(*) |
+----+----+----------+
|  1 |  1 |        2 |
|  1 |  2 |        2 |
|  2 |  1 |        1 |
|  2 |  2 |        1 |
+----+----+----------+
+----+----+
| n1 | n2 |
+----+----+
|  1 |  1 |
|  1 |  2 |
+----+----+

The code: filled-in template

Using the tables shown in the examples above, and "dividend" and "divisor" as the names of the dividend and divisor tables, respectively, we can fill in the template:
select r.n1, r.n2 from (
  select 
    dividend.*
  from dividend
  inner join divisor
  using (n3, n4)
) as r 
group by r.n1, r.n2
having count(*) = (select count(*) from divisor);

Other methods

There is another popular way to do relational division, involving a double-negative ("... where not exists (where not in ...) ..."). Although others find it easier to understand, I do not. Also, one code use a code generator or stored procedure that parameterizes production of a SQL string, automatically filling in the template. I didn't use this approach because that opens the door to a host of SQL injection-related problems.

No comments:

Post a Comment