Assignment 4: Pandas¶
In [1]:
# do imports here
Part I: Basic Pandas¶
1) Use the Pandas read_csv function to open the ground-level ozone data at the Rutgers site in 2022 ('EPA_AQS_Ozone_Rutgers_2022.csv') as a DataFrame¶
(Don't use any special options.) Display the first few rows and the DataFrame info.
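A minimal sketch of one possible answer, assuming the file is in the working directory and the DataFrame is named df (a hypothetical name reused below):
import pandas as pd

df = pd.read_csv('EPA_AQS_Ozone_Rutgers_2022.csv')  # no special options
df.info()    # column names, dtypes, and non-null counts (prints directly)
df.head()    # first five rows, displayed as the cell's output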
In [ ]:
In [ ]:
2) Re-read the data in such a way that the Date column is identified as date and Date is used as the index¶
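One possible approach, assuming the same file and a column literally named Date:
df = pd.read_csv('EPA_AQS_Ozone_Rutgers_2022.csv',
                 parse_dates=['Date'],   # parse the Date column as datetimes
                 index_col='Date')       # and use Date as the index
df.head()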
In [ ]:
In [ ]:
3) Rename the column 'Daily Max 8-hour Ozone Concentration' as 'ozone'¶
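A sketch using DataFrame.rename, assuming the DataFrame from the previous step is named df:
df = df.rename(columns={'Daily Max 8-hour Ozone Concentration': 'ozone'})
df.head()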
In [ ]:
4) Use describe to get the basic statistics of ozone in 2022¶
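For example, on the renamed column (assuming df and 'ozone' from the previous steps):
df['ozone'].describe()   # count, mean, std, min, quartiles, max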
In [ ]:
5) Use nlargest to get the 10 days with the highest ozone concentration in 2022¶
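A possible one-liner, assuming the date index and the 'ozone' column from above:
df.nlargest(10, 'ozone')   # 10 rows with the highest ozone values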
In [ ]:
6) Make a time series plot of daily ozone concentration in 2022¶
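A minimal sketch; with Date as the index, pandas puts time on the x-axis automatically:
df['ozone'].plot()   # daily time series; the DatetimeIndex becomes the x-axis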
In [ ]:
7) Make a time series plot of monthly average ozone concentration in 2022¶
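One way is to resample to monthly frequency first (the classic 'M' alias; newer pandas prefers 'ME'):
df['ozone'].resample('M').mean().plot()   # monthly mean ozone for 2022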
In [ ]:
8) Read the ozone data in 2021, and merge it with the data in 2022.¶
Remember to rename the ozone column in the new data before merging the two DataFrames.
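A sketch of one approach; the 2021 filename, the names df21 and df_all, and the column name 'ozone_2021' are assumptions:
df21 = pd.read_csv('EPA_AQS_Ozone_Rutgers_2021.csv',
                   parse_dates=['Date'], index_col='Date')
df21 = df21.rename(columns={'Daily Max 8-hour Ozone Concentration': 'ozone_2021'})
df_all = df[['ozone']].join(df21[['ozone_2021']], how='outer')  # align both years on the Date index
df_all.head()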
In [ ]:
In [ ]:
9) Make a time series plot of monthly average ozone concentration from 2021 to 2022¶
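Continuing the sketch above with the merged frame (df_all is an assumed name):
df_all.resample('M').mean().plot()   # one line per year's ozone column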
In [ ]:
Part II: Advanced Pandas with hurricane data¶
Use the following code to download a CSV file of the NOAA IBTrACS hurricane dataset.
In [ ]:
! wget https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.since1980.list.v04r00.csv
In [ ]:
df=pd.read_csv('ibtracs.since1980.list.v04r00.csv',usecols=range(12), skiprows = [1], parse_dates=['ISO_TIME'],na_values=[-999, ' '])
In [ ]:
df.head()
1) Get the unique values of the BASIN, SUBBASIN, and NATURE columns¶
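For example, with the DataFrame df loaded above:
print(df['BASIN'].unique())
print(df['SUBBASIN'].unique())
print(df['NATURE'].unique())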
In [ ]:
In [ ]:
In [ ]:
2) Rename the WMO_WIND column to Wind, and the WMO_PRES column to Pressure¶
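A possible sketch, reusing df:
df = df.rename(columns={'WMO_WIND': 'Wind', 'WMO_PRES': 'Pressure'})
df.head()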
In [ ]:
3) Get the 10 largest rows in the dataset by Wind¶
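For example, assuming the rename from the previous question:
df.nlargest(10, 'Wind')   # rows with the 10 highest wind speeds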
In [ ]:
You will notice some names are repeated.
4) Group the data on SID and get the 10 largest hurricanes by maximum Wind¶
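One way, grouping on storm ID and taking each storm's maximum wind:
df.groupby('SID')['Wind'].max().nlargest(10)   # top 10 storms by peak wind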
In [ ]:
In [ ]:
5) Plot the count of all datapoints by Basin¶
as a bar chart
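A minimal sketch:
df.groupby('BASIN').size().plot(kind='bar')   # number of datapoints per basin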
In [ ]:
6) Plot the count of unique hurricanes by Basin¶
as a bar chart. (You will need to call groupby twice.)
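One possible reading of the two-groupby hint: first collapse to one row per storm, then count per basin (storms is an assumed name):
storms = df.groupby('SID').first()               # one row per unique hurricane
storms.groupby('BASIN').size().plot(kind='bar')  # unique hurricanes per basin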
In [ ]:
7) Make a hexbin plot of the location of datapoints in Latitude and Longitude¶
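A sketch assuming the latitude and longitude columns are named LAT and LON, as in the IBTrACS CSV:
df.plot.hexbin(x='LON', y='LAT', gridsize=50, figsize=(10, 5))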
In [ ]:
8) Find Hurricane Sandy (from 2012) and plot its track as a scatter plot¶
Use wind speed to color the points.
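One possible approach; IBTrACS stores names in upper case, Wind comes from the rename in question 2, and if SEASON was read as strings compare against '2012' instead (sandy is an assumed name):
sandy = df[(df['NAME'] == 'SANDY') & (df['SEASON'] == 2012)]
sandy.plot.scatter(x='LON', y='LAT', c='Wind', cmap='viridis')  # color points by wind speed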
In [ ]:
9) Make time the index on your dataframe¶
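For example:
df = df.set_index('ISO_TIME')   # ISO_TIME was parsed as datetime in read_csv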
In [ ]:
10) Plot the count of all datapoints per year as a time series¶
You should use resample.
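A sketch using annual resampling on the time index (the classic 'Y' alias; newer pandas prefers 'YE'):
df.resample('Y').size().plot()   # number of datapoints per year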
In [ ]:
11) Plot all tracks from the West Pacific (BASIN:'WP') in 2005. Color the tracks by hurricane SID.¶
First create a subset dataframe by selecting on SEASON and BASIN. You will probably have to iterate through a GroupBy object built from the subset dataframe.
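A sketch assuming matplotlib is imported as plt, SEASON is numeric, and wp_2005 is a hypothetical name:
import matplotlib.pyplot as plt

wp_2005 = df[(df['BASIN'] == 'WP') & (df['SEASON'] == 2005)]
fig, ax = plt.subplots(figsize=(10, 6))
for sid, track in wp_2005.groupby('SID'):        # one track per hurricane SID
    track.plot(x='LON', y='LAT', ax=ax, legend=False)  # each SID gets its own line color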
In [ ]:
12) Create a filtered dataframe that contains only data from the West Pacific ("WP") Basin¶
Use this filtered DataFrame for the rest of the assignment.
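For example (df_wp is an assumed name, reused below):
df_wp = df[df['BASIN'] == 'WP']   # West Pacific datapoints only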
In [ ]:
13) Plot the number of datapoints per day¶
Make sure your figure is big enough to actually see the plot.
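A sketch using daily resampling on the time index, with a wide figure:
df_wp.resample('D').size().plot(figsize=(14, 4))   # datapoints per day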
In [ ]:
14) Calculate the climatology of datapoint counts as a function of dayofyear¶
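One possible interpretation: count datapoints per day, then average those daily counts by day of year (counts_daily is an assumed name):
counts_daily = df_wp.resample('D').size()                       # datapoints per calendar day
counts_daily.groupby(counts_daily.index.dayofyear).mean().plot()  # mean count for each dayofyear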
In [ ]:
In [ ]:
In [ ]: