UNION, INTERSECTION, EXCEPT

R UNION S

R INTERSECT S

R EXCEPT S

UNION

SQL uses bag semantics (not set semantics)

Find salespersons in Chicago that sold at least one table lamp.

(SELECT salpers_id

FROM Salesperson

WHERE office=’Chicago’)

UNION/~~Except/Intersect~~

(SELECT salpers_id

FROM Sale, Product

WHERE Sale.prod_id = Product.prod_id

and Product.prod_desc = ‘Table Lamp’);

Except, intersect doesn’t work in MySQL. Why?

We already have an intersect operator… called **IN**, except is **NOT IN**

SELECT salpers_id

FROM Salesperson

WHERE office=’Chicago’ and salpers_ID IN

(SELECT salpers_id

FROM Sale, Product

WHERE Sale.prod_id = Product.prod_id

and Product.prod_desc = ‘Table Lamp’);

#### Controlling duplicates

SELECT does not automatically eliminate duplicates like Projection in Relational Algebra.

SELECT DISTINCT … does

Example:

SELECT DISTINCT qty

FROM Sale;

There are many functions in SQL (and its dialects) that take in a bag of tuples and returns a result.

sum, avg, count, min, max

These can be applied to a column to produce an aggregation on that column.

Example:

SELECT avg(qty)

FROM Sale

Remember this uses bag semantics, we can apply distinct to get set semantics in SQL:

SELECT avg(distinct(qty))

FROM Sale

Nulls are ignored in aggregation, but an aggregation of only null values is null.

How many salespersons are there?

SELECT count(*)

FROM Salesperson

How many manager’s are there?

SELECT count(distinct(manager_id))

FROM Salesperson

## Grouping

SELECT

FROM

WHERE

GROUP BY

The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group.

What is the avg commission per office?

SELECT office, avg(comm)

FROM salesperson

GROUP BY office;

Compute gross revenue for each product:

SELECT product.prod_desc, sum(price*qty)

FROM product, sale

WHERE product.prod_ID = sale.prod_id

GROUP BY product.prod_desc;

Whenever aggregation is used each element in SELECT must be either

- Aggregated or
- Grouped

SELECT salpers_id, avg(qty)

FROM Sale;

Does this work? No

SELECT~~salpers_id,~~avg(qty)

FROM Sale;

or

SELECT salpers_id, avg(qty)

FROM Sale

GROUP BY salpers_id;

## HAVING

The having clause is like a where clause from groups. It follows the GROUP BY clause

SELECT

FROM

WHERE

GROUP BY

HAVING

Condition applies to each group, and is used on aggregate value or grouped attribute after the grouping has occurred.

Find salespersons’ average sale quantity for those salespersons who have at least three sales.

SELECT salpers_id, avg(qty)

FROM sale

GROUP BY salpers_id

HAVING count(salpers_id) > 3

This can get very very complex.

What is the total revenue of the offices that made at least 3 sales?

SELECT office, SUM(price*qty)

FROM salesperson, sale, product

WHERE salesperson.salpers_id = sale.salpers_id and sale.prod_id = product.prod_id

GROUP BY office

HAVING count(*) >= 3;

SELECT $S$

FROM $R_1,\ldots,R_n$

WHERE $C_1$

GROUP BY $a_1,\ldots,a_k$

HAVING $C_2$

- Compute From-WHERE part, obtain table with all attributes in $R_1,\ldots,R_n$
- Group by the attributes $a_1,\ldots,a_k$
- Compute the aggregates $C_2$ and keep only groups satisfying $C_2$.
- Compute aggregates in $S$ and return the result.