Data Modelling and Databases (DMDB) Spring Semester 2017 Lecturer(s): Gustavo Alonso, Ce Zhang Assistant(s): Claude Barthels, Eleftherios Sidirourgos, Eliza Wszola, Ingo Müller, Kaan Kara, Renato Marroquín, Zsolt István
ETH Zurich Systems Group Date: March 20/March 27, 2017 Last update: August 16, 2017
Exercise 4: SQL The exercises marked with * will be discussed in the exercise session. You can solve the other exercises as practice, ask questions about them in the session, and hand them in for feedback. All exercises may be relevant for the exam. Ask Renato (
[email protected]) for feedback on this week’s exercise sheet or give it to the TA of your session (preferably stapled and with your e-mail address). In this exercise you will write SQL queries and execute them on the databases that you have to set up following the instructions found in the course website. For reference, the database schema is provided where primary keys are underlined and foreign keys are in italic.
1
Employee Database*
The following queries refer to the employee database. For reference, the schema of this database is shown in Figure 1. departments(dept_no:string, dept_name:string) dept_emp(emp_no:int, dept_no:string, from_date:date, to_date:date) dept_manager(emp_no:int, dept_no:string, from_date:date, to_date:date) employees(emp_no:int, birth_date:date, first_name:string, last_name:string, gender:enum, hire_date:date) salaries(emp_no:int, salary:int, from_date:date, to_date:date) title(emp_no:int, title:string, from_date:date, to_date:date) Figure 1: Employee database schema
Data Modelling and Databases (DMDB), Spring Semester 2017, Exercise 4: SQL, March 20/March 27, 2017 Legi-Nr:
1.1
Average salaries per department
The following three queries try to compute the average salary of all current employees per department and to display department names and salary averages in descending order of the average salary. Mark the correct solution:
SELECT d.dept_name, AVG(s.salary) AS avgs FROM employees e JOIN salaries s ON e.emp_no = s.emp_no JOIN dept_emp de ON de.emp_no = s.emp_no JOIN departments d ON d.dept_no = de.dept_no WHERE s.to_date > NOW() AND de.to_date > NOW() GROUP BY d.dept_name ORDER BY avgs DESC
1.2
SELECT d.dept_name, AVG(s.salary) AS avgs FROM employees e JOIN salaries s ON e.emp_no = s.emp_no JOIN dept_emp de ON de.emp_no = s.emp_no JOIN departments d ON d.dept_no = de.dept_no GROUP BY d.dept_name ORDER BY avgs DESC
SELECT e.dept_no, AVG(s.salary) AS avgs FROM employees e JOIN salaries s ON e.emp_no = s.emp_no JOIN dept_emp de ON de.emp_no = s.emp_no WHERE s.to_date > NOW() AND de.to_date > NOW() ORDER BY avgs DESC
Salaries
Consider the following query, which statement describes the best what the query is meant to do. SELECT e.first_name, e.last_name, s.salary, avgst.avgs FROM employees e JOIN salaries s ON s.emp_no = e.emp_no JOIN dept_manager dm ON dm.emp_no = e.emp_no JOIN ( SELECT de.dept_no, AVG(s.salary) AS avgs FROM employees e JOIN salaries s ON e.emp_no = s.emp_no JOIN dept_emp de ON de.emp_no = s.emp_no WHERE s.to_date > NOW() AND de.to_date > NOW() GROUP BY de.dept_no ) avgst ON avgst.dept_no = dm.dept_no WHERE dm.to_date > NOW()
Data Modelling and Databases (DMDB), Spring Semester 2017, Exercise 4: SQL, March 20/March 27, 2017 Legi-Nr: AND s.to_date > NOW() AND s.salary > avgst.avgs
It returns names of managers (first_name and last_name) who earn more than the average salary of all employees that they have ever managed. It also returns department’s name (dept_name), manager’s salary and department’s average salary.
It returns names of employees (first_name and last_name) who earn the most in their department. It also returns department’s name (dept_name), employees’ salary and department’s average salary.
It returns names of managers (first_name and last_name) who earn more than the average salary of the current employees that they manage. It also returns department’s name (dept_name), manager’s salary and department’s average salary.
It returns names of employees (first_name and last_name) who earn no more than the average salary of the current employees of their same department. It also returns department’s name (dept_name), and department’s average salary.
1.3
Employee Names (Substrings)
Which query or queries return the correct count of employees in the database whose first name starts with the letter B?
SELECT count(first_name) FROM employees WHERE first_name LIKE ’B%’
1.4
SELECT count(first_name) FROM employees WHERE substring(first_name,0,1) in (’B’)
SELECT count(first_name) FROM employees WHERE first_name LIKE ’B’ OR first_name LIKE ’b’
SELECT count(first_name) FROM employees WHERE substring(first_name,1,1)=’B’
First employees query
Write a query that returns a list of the 10 first employees (first name (first_name) and last name (last_name)) which the company hired (hire_date). The list should be sorted in alphabetical order of the employees’ last names. Hint: use the LIMIT keyword to constrain the number of output records.
Data Modelling and Databases (DMDB), Spring Semester 2017, Exercise 4: SQL, March 20/March 27, 2017 Legi-Nr:
1.5
1988
Write a query that determines for each department the average salary of anyone starting or receiving a salary change in 1988 from that department. It should display the department number, its name and its average.
1.6
Current managers
Return the first (first_name) and last name (last_name) of all current managers, their title, their salary, and the department name (dept_name) for which they are responsible. Hint: Use the NOW() function to obtain the current date and the from_date and to_date fields in the salary and titles relations to find the corresponding records that are valid at the current date.
2
ZVV Database
The following queries refer to the ZVV database. For reference, the schema of this database is shown in Figure 2. stops(stop_id:int, stop_name:string, stop_lat:float, stop_lon:float) stop_times(trip_id:int , departure_time:string, arrival_time:string, stop_id:int , stop_sequence:int) trips(trip_id:int, tram_number:int, trip_headsign:string) Figure 2: ZVV database schema
2.1
Trips and tram lines
Given the following query which is the statement that describes best the operation it performs. SELECT COUNT(*) AS count_trips, tram_number FROM trips
Data Modelling and Databases (DMDB), Spring Semester 2017, Exercise 4: SQL, March 20/March 27, 2017 Legi-Nr: GROUP BY tram_number ORDER BY count_trips DESC;
Finds how many tram lines (tram_number) are in a trip and orders its result in descending order.
Finds the number of trips each tram line (tram_number) makes and orders its result by number of trips in descending order.
Finds and counts the number of tram lines (tram_number) and orders them in descending order.
Finds the trips each tram line (tram_number) makes and orders its result by number of trips in ascending order.
2.2
Busiest tram stations
The following queries try to compute the top 10 stations according to the number of times that any tram stops at that station.
SELECT s.stop_id, s.stop_name FROM stop_times st, stops s WHERE s.stop_id = st.stop_id GROUP BY st.stop_id, s.stop_name ORDER BY count_stations desc;
3
SELECT count_stations, s.stop_name FROM stop_times st, stops s WHERE s.stop_id = st.stop_id GROUP BY st.stop_id ORDER BY count_stations desc LIMIT 10;
SELECT COUNT(*) as count_stations, s.stop_name FROM stop_times st, stops s WHERE s.stop_id = st.stop_id GROUP BY st.stop_id, s.stop_name ORDER BY count_stations desc LIMIT 10;
SELECT COUNT(*) as count_stations, s.stop_name FROM stop_times st, stops s WHERE s.stop_id = st.stop_id GROUP BY st.stop_id ORDER BY count_stations desc;
TPC-H Database
The following queries refer to the TPC-H database. For reference, the schema of this database is shown in Figure 3.
Data Modelling and Databases (DMDB), Spring Semester 2017, Exercise 4: SQL, March 20/March 27, 2017 Legi-Nr: Customer (customerid:integer, customername:string, nationid:integer ) Nation (nationid:integer, nationname:string, regionid:integer ) Orders (orderid:integer, customerid:integer , orderdate:date, orderpriority:string) Orderline (orderid:integer , olid:integer, partid:integer , supplierid:integer , olquantity:float, oldiscount:float, olreturnflag:char, olshipdate:date, olcommitdate:date, olreceiptdate:date) Part (partid:integer, partname:string, partbrand:string, parttype:string, partretailprice:float) Region (regionid:integer, regionname:string) Supplier (supplierid:integer, suppliername:string, nationid:integer ) Supplypart (partid:integer , supplierid:integer , supplypartcost:float) Figure 3: TPC-H Schema
3.1
Customers from Japan
Return how many customers come from ’JAPAN’ (nationname).
3.2
Nations from multiple regions *
Given the following query that tries to list all nations in the regions of ’AMERICA’, ’AFRICA’ or ’ASIA’. SELECT nationname FROM nation, region WHERE __________________ ________________________ ________________________; Which of the following predicates makes the query perform the task correctly:
regionname = ’AMERICA’ OR regionname = ’AFRICA’ OR regionname = ’ASIA’
regionname = ’AMERICA’ AND regionname = ’AFRICA’ AND regionname = ’ASIA’
nation.regionid = region.regionid AND (regionname = ’AMERICA’ OR regionname = ’AFRICA’ OR regionname = ’ASIA’)
nation.regionid = region.regionid AND (regionname = ’AMERICA’ AND regionname = ’AFRICA’ AND regionname = ’ASIA’)
Data Modelling and Databases (DMDB), Spring Semester 2017, Exercise 4: SQL, March 20/March 27, 2017 Legi-Nr:
3.3
Exclusive supplier *
How many parts are supplied by the supplier with suppliername ’Supplier#000000001’ but not supplied by the supplier with suppliername ’Supplier#000000002’?
3.4
First order *
Consider the following query that retrieves the date (orderdate) of the first order made by customer with name (customername) ’Customer#000000001’ SELECT orderdate FROM orders, customer WHERE orders.customerid = customer.customerid AND customername = ’Customer#000000001’ ORDER BY orderdate ASC LIMIT 1; Choose the correct statements: A. No join is actually needed. We could use only the data from the orders table. B. The join between the two relations is done on the WHERE clause. C. The extra predicate checking for ’Customer#00000001’ on the WHERE clause is redundant as we are going to retrieve it anyway. D. The ORDER BY statement is actually not needed to get the data needed. E. The ORDER BY and LIMIT statements help us getting the actual first order.
3.5
Customers with local shipments
Find the number of customers that have made at least one order where at least one part is supplied by the same nation as the customer.
Data Modelling and Databases (DMDB), Spring Semester 2017, Exercise 4: SQL, March 20/March 27, 2017 Legi-Nr:
3.6
Revenue and profit margin
Find the total revenue and profit margin (in percent) per year for all orders and orderlines. Revenue per orderline is defined as olquantity *partretailprice*(1 - oldiscount). Profit where cost per orderline is olquantity * supplypartcost. margin is calculated as revenue−costs revenue Sort the final result in ascending order of the year. Hint: use EXTRACT(YEAR FROM orderdate) to extract the year.