Lab 5

Download the ipynb file pandas-I.ipynb.

Series

Note Exercise 1

What is the difference between Pandas and NumPy?

Note Exercise 2

What is the difference between pandas Series and DataFrame?

Note Exercise 3

Create a Series from a Python list. Print the Series and interpret the output.

Note Exercise 4

Create a Series containing the course numbers (e.g. 6001) of the courses you are taking this semester, indexed by instructor. Print only the data type of your Series.

Note Exercise 5

Create a Series containing 25 random numbers from the standard normal distribution. C

DataFrame

Note Exercise 6

Create a DataFrame from a list of lists so that at least one column is integer, one column is categorical, and one column is float.

Note Exercise 7

Create a DataFrame from a dictionary of lists so that the two columns contain, respectively, normally and uniformly distributed random numbers.

Note Exercise 8

Create a DataFrame from the Series created in Exercise 5.

Note Exercise 9

Import the file cities.csv into a DataFrame.

Note Exercise 10

Save the DataFrame from Exercise 9 as a colon-delimited CSV file without the header.

BMI Study

For our BMI study, we will generate fabricated data using numpy.random. As we discussed before, our features are weight, height, and age.

A good model for weight is a uniform distribution supported on [105, 230] lbs. Similarly, age and height can be assumed to be uniformly distributed over [15, 85] and [60, 75], respectively. Recall that taking measurements of 25 respondents every day for a year makes the shape of our data tensor (365, 25, 3).
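
As a rough illustration of the setup described above, the full-year tensor could be generated along these lines. This is only a sketch: the seed, the variable names, and the use of `numpy.random.default_rng` (rather than the legacy `numpy.random` functions) are assumptions, and the feature order weight, height, age simply follows the order in which the features are listed above. Note that `uniform` samples from the half-open interval [low, high).

```python
import numpy as np

# Assumption: a seeded Generator, chosen only so the sketch is reproducible.
rng = np.random.default_rng(seed=0)

n_days, n_respondents = 365, 25

# Draw each feature independently from its uniform range.
weight = rng.uniform(105, 230, size=(n_days, n_respondents))  # lbs
height = rng.uniform(60, 75, size=(n_days, n_respondents))
age = rng.uniform(15, 85, size=(n_days, n_respondents))

# Stack the features along a new last axis: (days, respondents, features).
data = np.stack([weight, height, age], axis=-1)

print(data.shape)
```

Here `data[0]` would be the (25, 3) array of measurements for the first day.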

Note Exercise 11

Write code to generate data just for a day for our BMI study. Create a DataFrame called bmi_df.

Note Exercise 12

Rename the index of bmi_df to Student.

Note Exercise 13

Rearrange the columns of bmi_df in this order: height, age, weight.

Note Exercise 14

Access the age column first.

Note Exercise 15

Access both age and height columns together.

Note Exercise 16

Change the indices to Student 1, Student 2, …, Student 25.

Note Exercise 17

Create a new column called Date filled with the value Jan 1.

Note Exercise 18

Return the features of Student 20. Describe the output type.

Note Exercise 19

Return the features of students 15 through 20.

Note Exercise 20

Round the age column up to the nearest integer.

Note Exercise 21

Select the last five rows and the first two columns.

Note Exercise 22

Drop the Date column.

Note Exercise 23

Select the students whose height is between 68 and 72.

Note Exercise 24

Sort bmi_df by age.

Note Exercise 25

Compute the standard deviation of each column of bmi_df. Describe the output type.