Assignment 4: Pandas¶
In [1]:
# do imports here
Part I: Basic Pandas¶
1) Use the Pandas read_csv function to open the ground-level ozone data at the Rutgers site in 2022 ('EPA_AQS_Ozone_Rutgers_2022.csv') as a DataFrame¶
(Don't use any special options.) Display the first few rows and the DataFrame info.
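A minimal sketch of one possible answer, assuming the file is in the working directory and the DataFrame is named df (a hypothetical name reused below):
import pandas as pd

df = pd.read_csv('EPA_AQS_Ozone_Rutgers_2022.csv')  # no special options
df.info()    # column names, dtypes, and non-null counts (prints directly)
df.head()    # first five rows, displayed as the cell's output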
In [ ]:
In [ ]:
2) Re-read the data in such a way that the Date column is identified as date and Date is used as the index¶
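One possible approach, assuming the same file and a column literally named Date:
df = pd.read_csv('EPA_AQS_Ozone_Rutgers_2022.csv',
                 parse_dates=['Date'],   # parse the Date column as datetimes
                 index_col='Date')       # and use Date as the index
df.head()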
In [ ]:
In [ ]:
3) Rename the column 'Daily Max 8-hour Ozone Concentration' as 'ozone'¶
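A sketch using DataFrame.rename, assuming the DataFrame from the previous step is named df:
df = df.rename(columns={'Daily Max 8-hour Ozone Concentration': 'ozone'})
df.head()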
In [ ]:
4) Use describe to get the basic statistics of ozone in 2022¶
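For example, on the renamed column (assuming df and 'ozone' from the previous steps):
df['ozone'].describe()   # count, mean, std, min, quartiles, max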
In [ ]:
5) Use nlargest to get the 10 days with the highest ozone concentration in 2022¶
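A possible one-liner, assuming the date index and the 'ozone' column from above:
df.nlargest(10, 'ozone')   # 10 rows with the highest ozone values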
In [ ]:
6) Make a time series plot of daily ozone concentration in 2022¶
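A minimal sketch; with Date as the index, pandas puts time on the x-axis automatically:
df['ozone'].plot()   # daily time series; the DatetimeIndex becomes the x-axis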
In [ ]:
7) Make a time series plot of monthly average ozone concentration in 2022¶
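One way is to resample to monthly frequency first (the classic 'M' alias; newer pandas prefers 'ME'):
df['ozone'].resample('M').mean().plot()   # monthly mean ozone for 2022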
In [ ]:
8) Read the ozone data in 2021, and merge it with the data in 2022.¶
Remember to rename the ozone column in the new data before merging the two DataFrames.
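A sketch of one approach; the 2021 filename, the names df21 and df_all, and the column name 'ozone_2021' are assumptions:
df21 = pd.read_csv('EPA_AQS_Ozone_Rutgers_2021.csv',
                   parse_dates=['Date'], index_col='Date')
df21 = df21.rename(columns={'Daily Max 8-hour Ozone Concentration': 'ozone_2021'})
df_all = df[['ozone']].join(df21[['ozone_2021']], how='outer')  # align both years on the Date index
df_all.head()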
In [ ]:
In [ ]:
9) Make a time series plot of monthly average ozone concentration from 2021 to 2022¶
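Continuing the sketch above with the merged frame (df_all is an assumed name):
df_all.resample('M').mean().plot()   # one line per year's ozone column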
In [ ]:
Part II: Advanced Pandas with hurricane data¶
Use the following code to download a CSV file of the NOAA IBTrACS hurricane dataset.
In [ ]:
! wget https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.since1980.list.v04r00.csv
In [ ]:
df=pd.read_csv('ibtracs.since1980.list.v04r00.csv',usecols=range(12), skiprows = [1], parse_dates=['ISO_TIME'],na_values=[-999, ' '])
In [ ]:
df.head()
1) Get the unique values of the BASIN, SUBBASIN, and NATURE columns¶
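For example, with the DataFrame df loaded above:
print(df['BASIN'].unique())
print(df['SUBBASIN'].unique())
print(df['NATURE'].unique())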
In [ ]:
In [ ]:
In [ ]:
2) Rename the WMO_WIND column to Wind, and the WMO_PRES column to Pressure¶
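A possible sketch, reusing df:
df = df.rename(columns={'WMO_WIND': 'Wind', 'WMO_PRES': 'Pressure'})
df.head()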
In [ ]:
3) Get the 10 largest rows in the dataset by Wind¶
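For example, assuming the rename from the previous question:
df.nlargest(10, 'Wind')   # rows with the 10 highest wind speeds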
In [ ]:
You will notice some names are repeated.
4) Group the data on SID and get the 10 largest hurricanes by maximum Wind¶
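One way, grouping on storm ID and taking each storm's maximum wind:
df.groupby('SID')['Wind'].max().nlargest(10)   # top 10 storms by peak wind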
In [ ]:
In [ ]:
5) Plot the count of all datapoints by Basin¶
as a bar chart
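A minimal sketch:
df.groupby('BASIN').size().plot(kind='bar')   # number of datapoints per basin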
In [ ]:
6) Plot the count of unique hurricanes by Basin¶
as a bar chart. (You will need to call groupby twice.)
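One possible reading of the two-groupby hint: first collapse to one row per storm, then count per basin (storms is an assumed name):
storms = df.groupby('SID').first()               # one row per unique hurricane
storms.groupby('BASIN').size().plot(kind='bar')  # unique hurricanes per basin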
In [ ]:
7) Make a hexbin plot of the location of datapoints in Latitude and Longitude¶
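A sketch assuming the latitude and longitude columns are named LAT and LON, as in the IBTrACS CSV:
df.plot.hexbin(x='LON', y='LAT', gridsize=50, figsize=(10, 5))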
In [ ]:
8) Find Hurricane Sandy (from 2012) and plot its track as a scatter plot¶
Use wind speed to color the points.
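One possible approach; IBTrACS stores names in upper case, Wind comes from the rename in question 2, and if SEASON was read as strings compare against '2012' instead (sandy is an assumed name):
sandy = df[(df['NAME'] == 'SANDY') & (df['SEASON'] == 2012)]
sandy.plot.scatter(x='LON', y='LAT', c='Wind', cmap='viridis')  # color points by wind speed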
In [ ]:
9) Make time the index on your dataframe¶
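For example:
df = df.set_index('ISO_TIME')   # ISO_TIME was parsed as datetime in read_csv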
In [ ]:
10) Plot the count of all datapoints per year as a time series¶
You should use resample.
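A sketch using annual resampling on the time index (the classic 'Y' alias; newer pandas prefers 'YE'):
df.resample('Y').size().plot()   # number of datapoints per year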
In [ ]:
11) Plot all tracks from the West Pacific (BASIN:'WP') in 2005. Color the tracks by hurricane SID.¶
First create a subset dataframe by selecting on SEASON and BASIN. You will probably have to iterate through a GroupBy object built from the subset dataframe.
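A sketch assuming matplotlib is imported as plt, SEASON is numeric, and wp_2005 is a hypothetical name:
import matplotlib.pyplot as plt

wp_2005 = df[(df['BASIN'] == 'WP') & (df['SEASON'] == 2005)]
fig, ax = plt.subplots(figsize=(10, 6))
for sid, track in wp_2005.groupby('SID'):        # one track per hurricane SID
    track.plot(x='LON', y='LAT', ax=ax, legend=False)  # each SID gets its own line color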
In [ ]:
12) Create a filtered dataframe that contains only data from the West Pacific ("WP") Basin¶
Use this filtered DataFrame for the rest of the assignment.
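For example (df_wp is an assumed name, reused below):
df_wp = df[df['BASIN'] == 'WP']   # West Pacific datapoints only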
In [ ]:
13) Plot the number of datapoints per day¶
Make sure your figure is big enough to actually see the plot.
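A sketch using daily resampling on the time index, with a wide figure:
df_wp.resample('D').size().plot(figsize=(14, 4))   # datapoints per day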
In [ ]:
14) Calculate the climatology of datapoint counts as a function of dayofyear¶
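One possible interpretation: count datapoints per day, then average those daily counts by day of year (counts_daily is an assumed name):
counts_daily = df_wp.resample('D').size()                       # datapoints per calendar day
counts_daily.groupby(counts_daily.index.dayofyear).mean().plot()  # mean count for each dayofyear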
In [ ]:
In [ ]:
In [ ]: