TDM 20100: Project 9 — SQL
Motivation: Now we learn how to write SQL queries that rely on more than one table.
Context: The JOIN
in SQL enables us to make queries that rely on information from multiple SQL tables. It is absolutely important to tell SQL which rows need to agree, by including the ON
portion of the JOIN
statement.
Scope: SQLite queries use a JOIN
to gather information from more than one table.
Dataset(s)
This project will use the following dataset:
-
/anvil/projects/tdm/data/lahman/lahman.db
(Lahman baseball database)
Our page in The Examples Book about SQL (in general) is given here: the-examples-book.com/tools/SQL/
Before you begin the project, try the examples from the Lahman baseball database found on this webpage of The Examples Book: the-examples-book.com/tools/SQL/lahman-examples-one-join All of these examples rely on one |
Questions
Using the seminar
kernel, if you run this line in a cell by itself:
%sql sqlite:////anvil/projects/tdm/data/lahman/lahman.db
If your kernel dies, then you need to re-run the line above. You also need to re-run this line at the start of any new Jupyter Lab session. |
Again, we remind students that the list of all of the tables in this database are:
AllstarFull
Appearances
AwardsManagers
AwardsPlayers
AwardsShareManagers
AwardsSharePlayers
Batting
BattingPost
CollegePlaying
Fielding
FieldingOF
FieldingOFsplit
FieldingPost
HallOfFame
HomeGames
Managers
ManagersHalf
Parks
People
Pitching
PitchingPost
Salaries
Schools
SeriesPost
Teams
TeamsFranchises
TeamsHalf
Please read the examples given here: the-examples-book.com/tools/SQL/lahman-examples-one-join and then you are ready to start the questions for this project!
In the page of examples, sometimes we write |
Question 1 (2 pts)
Join the Batting
table to the People
by matching the playerID
values in these two tables. For all 48 players on the 2023 Chicago Cubs team, print their PlayerID
(from either table), as well as their hits (H
) and home runs (HR
) from the Batting
table, and also their nameFirst
and nameLast
from the People
table.
-
Print the
playerID
,H
,HR
,nameFirst
, andnameLast
values for all 48 of the players on the 2023 Chicago Cubs team.
Question 2 (2 pts)
Join the Batting
table to the Pitching
table by matching the playerID
, yearID
, and stint
columns. There is only one person from 2023 appearing in both of these tables that hit more than 30 home runs. Print this person’s playerID
and the number of home runs (HR
) that they attained (from the Batting
table).
-
Print the
PlayerID
and the number of home runs (HR
) from theBatting
table for the only person who is in both theBatting
andPitching
table in 2023 who had more than 30 home runs (HR
) in theBatting
table.
Question 3 (2 pts)
In this question, we will accomplish everything from Project 8, Question 3abc in just one query.
Join the People
and Batting
table by matching the playerID
values in these two tables. Print only 1 row, corresponding to Rickey Henderson, displaying his playerID
, nameFirst
, nameLast
, SUM(R)
, and SUM(SB)
values.
He had 2295 runs scored altogether and 1406 stolen bases. |
-
Print only 1 row, corresponding to Rickey Henderson, displaying his
playerID
,nameFirst
,nameLast
,SUM(R)
, andSUM(SB)
values.
Question 4 (2 pts)
-
As in Project 8, Question 4a, use the
Batting
table but now alsoJOIN
thePeople
table (by matching theplayerID
values), to find the top 5 players of all time, in terms of their total number of hits, in other words, according toSUM(H)
. For the top 5 players (in terms of the total number of hits), print theirplayerID
, theSUM(H)
(in other words, their total number of hits in their careers), and theirnameFirst
andnameLast
values. -
Same question as 4b, but this time use home runs (according to
SUM(HR)
) instead of hits.
-
For the top 5 players (in terms of the total number of hits), print their
playerID
, theSUM(H)
(in other words, their total number of hits in their careers), and theirnameFirst
andnameLast
values. -
For the top 5 players (in terms of the total number of home runs), print their
playerID
, theSUM(HR)
(in other words, their total number of home runs in their careers), and theirnameFirst
andnameLast
values.
Question 5 (2 pts)
Join the CollegePlaying
and People
tables on the playerID
values. Print the DISTINCT(playerID)
and nameFirst
and nameLast
values from the People
table for each of the 15 distinct players that have schoolID = 'purdue'
in the CollegePlaying
table.
-
Print the
DISTINCT(playerID)
andnameFirst
andnameLast
values from thePeople
table for each of the 15 distinct players that haveschoolID = 'purdue'
in theCollegePlaying
table.
Submitting your Work
We hope that you enjoyed learning about databases this week! Please let us know if we can assist, as you are learning these new ideas!
-
firstname-lastname-project9.ipynb
You must double check your You will not receive full credit if your |