TDM 20100: Project 10 — SQL
Motivation: Now we learn how to write SQL queries that rely on three or more tables.
Context: We can use multiple JOIN
statements in SQL. Each JOIN
statement allows us to add data from an additional table. Each JOIN
needs its own ON
statement too.
Scope: We can use several JOIN
statements in the same query, to pull data from several tables.
Dataset(s)
This project will use the following dataset:
-
/anvil/projects/tdm/data/lahman/lahman.db
(Lahman baseball database)
Our page in The Examples Book about SQL (in general) is given here: the-examples-book.com/tools/SQL/
Now we learn how to join THREE OR MORE tables using multiple |
|
Questions
Using the seminar
kernel, if you run this line in a cell by itself:
%sql sqlite:////anvil/projects/tdm/data/lahman/lahman.db
If your kernel dies, then you need to re-run the line above. You also need to re-run this line at the start of any new Jupyter Lab session. |
Again, we remind students that the list of all of the tables in this database are:
AllstarFull
Appearances
AwardsManagers
AwardsPlayers
AwardsShareManagers
AwardsSharePlayers
Batting
BattingPost
CollegePlaying
Fielding
FieldingOF
FieldingOFsplit
FieldingPost
HallOfFame
HomeGames
Managers
ManagersHalf
Parks
People
Pitching
PitchingPost
Salaries
Schools
SeriesPost
Teams
TeamsFranchises
TeamsHalf
Please read the examples given here: the-examples-book.com/tools/SQL/lahman-examples-two-joins and then you are ready to start the questions for this project!
In the page of examples, sometimes we write |
Question 1 (2 pts)
Revisit Project 9, Question 1: Now add a third table, namely, the AwardsPlayers
table, so that we see the 4 Chicago Cubs who won a total of 7 awards in 2023.
For each of the 4 Chicago Cubs who won these 7 awards in 2023, in addition to printing their PlayerID
, hits (H
), home runs (HR
), nameFirst
, and nameLast
, please also add one more variable in the SELECT statement to print, namely, the awardID
from the AwardsPlayers
table.
When you join the |
-
Print the
playerID
,H
,HR
,nameFirst
,nameLast
, andawardID
values for all 4 of the players on the 2023 Chicago Cubs team who won 7 awards altogether during that year.
Question 2 (2 pts)
Revisit Project 9, Question 2: Now add a third table, namely, the People
table, so that we can extract the first and last name of the player:
Join the Batting
table to the Pitching
table by matching the playerID
, yearID
, and stint
columns, and in addition, now also join the People
table. There is only one person from 2023 appearing in both of these tables that hit more than 30 home runs. Print this person’s playerID
, home runs (HR
), first name (nameFirst
), and last name (nameLast
).
If you write |
-
Print the
PlayerID
, home runs (HR
), first name (nameFirst
), and last name (nameLast
) for the only person who is in both theBatting
andPitching
table in 2023 who had more than 30 home runs (HR
) in theBatting
table.
Question 3 (2 pts)
Revisit Project 9, Question 3, but this time study Omar Vizquel instead of Rickey Henderson: Now add a third table, namely, the Salaries
table, so that we can find the total amount of salary that Omar Vizquel earned in his career:
Join the People
and Batting
and Salaries
tables. Print only 1 row, corresponding to Omar Vizquel, displaying his playerID
, nameFirst
, nameLast
, SUM(R)
, SUM(SB)
, and SUM(salary)
values.
Omar Vizquel had 1445 runs scored altogether and 404 stolen bases, and he made more than 60 million dollars in salary during his career! Your solution will show his exact total salary during his career. |
-
Print only 1 row, corresponding to Omar Vizquel, displaying his
playerID
,nameFirst
,nameLast
,SUM(R)
,SUM(SB)
, andSUM(salary)
values.
Question 4 (2 pts)
-
Revisit Project 9, Question 4, but now join the
Batting
,People
, andAppearances
table, to find the top 5 players of all time, in terms of their total number of hits, in other words, according toSUM(H)
. For the top 5 players (in terms of the total number of hits), print theirplayerID
, theSUM(H)
(in other words, their total number of hits in their careers), theirnameFirst
andnameLast
values, and now also include a column that shows theSUM(G_all)
which is the total number of games played in their career. [Do not change the ordering from Project 9, Question 4; in other words, please continue to keep the results in order by the total number of hits.] -
Same question as 4b, but this time use home runs (according to
SUM(HR)
) instead of hits.
When you join the |
-
For the top 5 players (in terms of the total number of hits), print their
playerID
, theSUM(H)
(in other words, their total number of hits in their careers), theirnameFirst
andnameLast
values, and now also include a column that shows theSUM(G_all)
which is the total number of games played in their career. -
For the top 5 players (in terms of the total number of home runs), print their
playerID
, theSUM(HR)
(in other words, their total number of home runs in their careers), theirnameFirst
andnameLast
values, and now also include a column that shows theSUM(G_all)
which is the total number of games played in their career.
Question 5 (2 pts)
Join the CollegePlaying
and People
and HallOfFame
tables to find the playerID
, nameFirst
, nameLast
, yearID
, ballots
, needed
, votes
, and inducted
values for the only player who had schoolID = 'purdue'
in the CollegePlaying
table and who also appears in the HallOfFame
table. [There is only 1 such player!]
-
Print the
playerID
,nameFirst
,nameLast
,yearID
,ballots
,needed
,votes
, andinducted
values for the only player who hadschoolID = 'purdue'
in theCollegePlaying
table and who also appeared in theHallOfFame
table.
Submitting your Work
Now that you know how to join three tables together, you are very knowledgeable about SQL databases!
-
firstname-lastname-project10.ipynb
You must double check your You will not receive full credit if your |