Every year, American high school students take SATs, which are standardized tests intended to measure literacy, numeracy, and writing skills. There are three sections - reading, math, and writing, each with a maximum score of 800 points. These tests are extremely important for students and colleges, as they play a pivotal role in the admissions process.
Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.
You have been provided with a dataset called schools.csv
, which is previewed below.
You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.
Import data from csv, import pandas library
# Import pandas library
import pandas as pd
# Read in the data
schools = pd.read_csv("./nyc_schools.csv")
Examine first 10 rows
# Preview the data
schools.head(10)
school_name | borough | building_code | average_math | average_reading | average_writing | percent_tested | |
---|---|---|---|---|---|---|---|
0 | New Explorations into Science, Technology and ... | Manhattan | M022 | 657 | 601 | 601 | NaN |
1 | Essex Street Academy | Manhattan | M445 | 395 | 411 | 387 | 78.9 |
2 | Lower Manhattan Arts Academy | Manhattan | M445 | 418 | 428 | 415 | 65.1 |
3 | High School for Dual Language and Asian Studies | Manhattan | M445 | 613 | 453 | 463 | 95.9 |
4 | Henry Street School for International Studies | Manhattan | M056 | 410 | 406 | 381 | 59.7 |
5 | Bard High School Early College | Manhattan | M097 | 634 | 641 | 639 | 70.8 |
6 | Urban Assembly Academy of Government and Law | Manhattan | M445 | 389 | 395 | 381 | 80.8 |
7 | Marta Valle High School | Manhattan | M025 | 438 | 413 | 394 | 35.6 |
8 | University Neighborhood High School | Manhattan | M446 | 437 | 355 | 352 | 69.9 |
9 | New Design High School | Manhattan | M445 | 381 | 396 | 372 | 73.7 |
Data info
# Data info
schools.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 375 entries, 0 to 374 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 school_name 375 non-null object 1 borough 375 non-null object 2 building_code 375 non-null object 3 average_math 375 non-null int64 4 average_reading 375 non-null int64 5 average_writing 375 non-null int64 6 percent_tested 355 non-null float64 dtypes: float64(1), int64(3), object(3) memory usage: 20.6+ KB
Q1. Which NYC schools have the best math results?
# Subsetting best math results that are at least 80% of the maximum possible score of 800 for math
school_math_filtered = schools[schools['average_math']>=0.8*800]
# Get the best_math_schools with school_name and average_math columns, sorted by average_math
best_math_schools = school_math_filtered.loc[:,['school_name','average_math']].sort_values('average_math', ascending=False)
# Q1. Display the top 30 schools with best math results
print(best_math_schools.head(30))
school_name average_math 88 Stuyvesant High School 754 170 Bronx High School of Science 714 93 Staten Island Technical High School 711 365 Queens High School for the Sciences at York Co... 701 68 High School for Mathematics, Science, and Engi... 683 280 Brooklyn Technical High School 682 333 Townsend Harris High School 680 174 High School of American Studies at Lehman College 669 0 New Explorations into Science, Technology and ... 657 45 Eleanor Roosevelt High School 641
Q2. What are the top 10 performing schools based on the combined SAT scores?
# Adding total_SAT column
schools['total_SAT']=schools['average_math']+schools['average_reading']+schools['average_writing']
# Get top_10_schools with highest total SAT scores
top_10_schools = schools.loc[:,['school_name','total_SAT']].sort_values('total_SAT', ascending=False).head(10)
# Q2.Display the top 10 schools with highest total SAT scores
print(top_10_schools)
school_name total_SAT 88 Stuyvesant High School 2144 170 Bronx High School of Science 2041 93 Staten Island Technical High School 2041 174 High School of American Studies at Lehman College 2013 333 Townsend Harris High School 1981 365 Queens High School for the Sciences at York Co... 1947 5 Bard High School Early College 1914 280 Brooklyn Technical High School 1896 45 Eleanor Roosevelt High School 1889 68 High School for Mathematics, Science, and Engi... 1889
Q3. Which single borough has the largest standard deviation in the combined SAT score?
# Get the count, mean and std of total_SAT column
largest_std_dev = schools.groupby("borough")['total_SAT'].agg(['count','mean','std'])
# Filter the largest standard deviation of 'total_SAT' column
largest_std_dev = largest_std_dev[largest_std_dev['std']==largest_std_dev['std'].max()].round(2)
# Rename the columns for clarity
largest_std_dev = largest_std_dev.rename(columns={"count": "num_schools", "mean": "average_SAT", "std": "std_SAT"})
# Change borough from index to column
largest_std_dev.reset_index(inplace=True)
# Q3. print the borough with the largest standard deviation in the combined SAT score
print(largest_std_dev)
borough num_schools average_SAT std_SAT 0 Manhattan 89 1340.13 230.29