TDM 20100: Project 10 — SQL
Motivation: Now we learn how to write SQL queries that rely on three or more tables.
Context: We can use multiple JOIN statements in SQL. Each JOIN statement allows us to add data from an additional table. Each JOIN needs its own ON statement too.
Scope: We can use several JOIN statements in the same query, to pull data from several tables.
Dataset(s)
This project will use the following dataset:
-
/anvil/projects/tdm/data/lahman/lahman.db(Lahman baseball database)
Our page in The Examples Book about SQL (in general) is given here: the-examples-book.com/tools/SQL/
|
Now we learn how to join THREE OR MORE tables using multiple |
|
|
Questions
Using the seminar kernel, if you run this line in a cell by itself:
%sql sqlite:////anvil/projects/tdm/data/lahman/lahman.db
|
If your kernel dies, then you need to re-run the line above. You also need to re-run this line at the start of any new Jupyter Lab session. |
Again, we remind students that the list of all of the tables in this database are:
AllstarFull
Appearances
AwardsManagers
AwardsPlayers
AwardsShareManagers
AwardsSharePlayers
Batting
BattingPost
CollegePlaying
Fielding
FieldingOF
FieldingOFsplit
FieldingPost
HallOfFame
HomeGames
Managers
ManagersHalf
Parks
People
Pitching
PitchingPost
Salaries
Schools
SeriesPost
Teams
TeamsFranchises
TeamsHalf
Please read the examples given here: the-examples-book.com/tools/SQL/lahman-examples-two-joins and then you are ready to start the questions for this project!
|
In the page of examples, sometimes we write |
Question 1 (2 pts)
Revisit Project 9, Question 1: Now add a third table, namely, the AwardsPlayers table, so that we see the 4 Chicago Cubs who won a total of 7 awards in 2023.
For each of the 4 Chicago Cubs who won these 7 awards in 2023, in addition to printing their PlayerID, hits (H), home runs (HR), nameFirst, and nameLast, please also add one more variable in the SELECT statement to print, namely, the awardID from the AwardsPlayers table.
|
When you join the |
-
Print the
playerID,H,HR,nameFirst,nameLast, andawardIDvalues for all 4 of the players on the 2023 Chicago Cubs team who won 7 awards altogether during that year.
Question 2 (2 pts)
Revisit Project 9, Question 2: Now add a third table, namely, the People table, so that we can extract the first and last name of the player:
Join the Batting table to the Pitching table by matching the playerID, yearID, and stint columns, and in addition, now also join the People table. There is only one person from 2023 appearing in both of these tables that hit more than 30 home runs. Print this person’s playerID, home runs (HR), first name (nameFirst), and last name (nameLast).
|
If you write |
-
Print the
PlayerID, home runs (HR), first name (nameFirst), and last name (nameLast) for the only person who is in both theBattingandPitchingtable in 2023 who had more than 30 home runs (HR) in theBattingtable.
Question 3 (2 pts)
Revisit Project 9, Question 3, but this time study Omar Vizquel instead of Rickey Henderson: Now add a third table, namely, the Salaries table, so that we can find the total amount of salary that Omar Vizquel earned in his career:
Join the People and Batting and Salaries tables. Print only 1 row, corresponding to Omar Vizquel, displaying his playerID, nameFirst, nameLast, SUM(R), SUM(SB), and SUM(salary) values.
|
Omar Vizquel had 1445 runs scored altogether and 404 stolen bases, and he made more than 60 million dollars in salary during his career! Your solution will show his exact total salary during his career. |
-
Print only 1 row, corresponding to Omar Vizquel, displaying his
playerID,nameFirst,nameLast,SUM(R),SUM(SB), andSUM(salary)values.
Question 4 (2 pts)
-
Revisit Project 9, Question 4, but now join the
Batting,People, andAppearancestable, to find the top 5 players of all time, in terms of their total number of hits, in other words, according toSUM(H). For the top 5 players (in terms of the total number of hits), print theirplayerID, theSUM(H)(in other words, their total number of hits in their careers), theirnameFirstandnameLastvalues, and now also include a column that shows theSUM(G_all)which is the total number of games played in their career. [Do not change the ordering from Project 9, Question 4; in other words, please continue to keep the results in order by the total number of hits.] -
Same question as 4b, but this time use home runs (according to
SUM(HR)) instead of hits.
|
When you join the |
-
For the top 5 players (in terms of the total number of hits), print their
playerID, theSUM(H)(in other words, their total number of hits in their careers), theirnameFirstandnameLastvalues, and now also include a column that shows theSUM(G_all)which is the total number of games played in their career. -
For the top 5 players (in terms of the total number of home runs), print their
playerID, theSUM(HR)(in other words, their total number of home runs in their careers), theirnameFirstandnameLastvalues, and now also include a column that shows theSUM(G_all)which is the total number of games played in their career.
Question 5 (2 pts)
Join the CollegePlaying and People and HallOfFame tables to find the playerID, nameFirst, nameLast, yearID, ballots, needed, votes, and inducted values for the only player who had schoolID = 'purdue' in the CollegePlaying table and who also appears in the HallOfFame table. [There is only 1 such player!]
-
Print the
playerID,nameFirst,nameLast,yearID,ballots,needed,votes, andinductedvalues for the only player who hadschoolID = 'purdue'in theCollegePlayingtable and who also appeared in theHallOfFametable.
Submitting your Work
Now that you know how to join three tables together, you are very knowledgeable about SQL databases!
-
firstname-lastname-project10.ipynb
|
You must double check your You will not receive full credit if your |