Download the ipynb file pandas-I.ipynb.
Series
What is the difference between Pandas and NumPy?
What is the difference between pandas Series and DataFrame?
Create a Series from a Python list. Print the Series and interpret the output.
Create a Series containing the numbers (e.g. 6001) of courses you are taking this semester, with index indicating the instructors. Print only the data type of your Series.
Create a Series containing 25 random numbers from the standard normal distribution. C
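As a starting point for the Series exercises, the sketch below shows the three construction patterns they rely on. The values, the instructor labels, and the random seed are made-up placeholders, not part of the exercises.

```python
import numpy as np
import pandas as pd

# A Series built from a plain list gets a default integer index 0..n-1.
s = pd.Series([10, 20, 30])
print(s)

# A Series with an explicit index; labels and values are placeholders.
courses = pd.Series([6001, 6002], index=["Instructor A", "Instructor B"])
print(courses.dtype)

# 25 draws from the standard normal distribution (the seed is arbitrary).
rng = np.random.default_rng(0)
normal_s = pd.Series(rng.standard_normal(25))
print(normal_s.head())
```

Printing a Series shows the index on the left, the values on the right, and the dtype on the last line.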
DataFrame
Create a DataFrame from a list of lists so that at least one column is integer, one column is categorical, and one column is float.
Create a DataFrame from a dictionary of lists so that the two columns contain, respectively, normally and uniformly distributed random numbers.
Create a DataFrame from the Series created in Exercise 5.
Save the DataFrame from Exercise 9 as a colon-delimited CSV file without the header.
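The DataFrame exercises above can be approached along the following lines. This is a minimal sketch with made-up column names and values; it returns the CSV text as a string instead of writing a file, so no file name is assumed.

```python
import numpy as np
import pandas as pd

# List of lists: each inner list is one row; the values are made up.
rows = [[1, "a", 0.5], [2, "b", 1.5], [3, "a", 2.5]]
df = pd.DataFrame(rows, columns=["count", "group", "score"])
df["group"] = df["group"].astype("category")  # make the column categorical
print(df.dtypes)

# Dictionary of lists/arrays: each key becomes a column name.
rng = np.random.default_rng(0)  # arbitrary seed
rand_df = pd.DataFrame({
    "normal": rng.standard_normal(5),
    "uniform": rng.uniform(0, 1, 5),
})

# A Series becomes a one-column DataFrame with to_frame().
courses = pd.Series([6001, 6002], index=["Instructor A", "Instructor B"])
courses_df = courses.to_frame(name="course")

# With no path given, to_csv returns the text; sep and header set the format.
csv_text = courses_df.to_csv(sep=":", header=False)
print(csv_text)
```

Passing a path as the first argument of to_csv writes the same text to disk. Note that the index is still written unless index=False is also passed.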
BMI Study
For our BMI study, we will generate fabricated data using numpy.random. As we discussed before, our features are weight, height, and age.
A good model for weight is a uniform distribution supported on [105, 230] lbs. Similarly, age and height can be assumed to be uniformly distributed over [15, 85] and [60, 75], respectively. Recall that taking measurements of 25 respondents every day for a year makes the shape of our data tensor (365, 25, 3).
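To make the tensor shape concrete, here is a rough sketch of the full-year data under the distributions stated above. The seed, the variable names, and the feature order (weight, height, age) along the last axis are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
days, respondents = 365, 25

# One uniform draw per respondent per day for each feature.
# Note: rng.uniform samples from the half-open interval [low, high).
weight = rng.uniform(105, 230, size=(days, respondents))
height = rng.uniform(60, 75, size=(days, respondents))
age = rng.uniform(15, 85, size=(days, respondents))

# Stack the three (365, 25) arrays along a new last axis.
data = np.stack([weight, height, age], axis=-1)
print(data.shape)  # (365, 25, 3)
```

Indexing the first axis, for example data[0], gives the (25, 3) slice of measurements for a single day.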
Write code to generate data for just one day of our BMI study. Create a DataFrame called bmi_df.
Rename the index of bmi_df to Student.
Rearrange the columns of bmi_df in this order: height, age, weight.
Access the age column first.
Access both age and height columns together.
Change the indices to Student 1, Student 2, …, Student 25.
Create a new column called Date filled with the value Jan 1.
Return the features of Student 20. Describe the output type.
Return the features of students 15 through 20.
Round the age column up to the nearest integer.
Select the last five rows and the first two columns.
Select the students with their height between 68 and 72.
Compute the standard deviation of each column of bmi_df. Describe the output type.
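The selection and summary operations used in these exercises are illustrated below on a small toy frame. The name toy_df and all of its values are made up; this is a sketch of the relevant pandas calls, not a worked solution for bmi_df.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values (four rows instead of 25).
toy_df = pd.DataFrame(
    {
        "weight": [150.2, 180.9, 121.4, 200.0],
        "height": [66.1, 70.3, 61.8, 73.5],
        "age": [20.4, 35.7, 18.2, 61.9],
    },
    index=["Student 1", "Student 2", "Student 3", "Student 4"],
)

row = toy_df.loc["Student 2"]                    # single label: a Series
block = toy_df.loc["Student 2":"Student 3"]      # label slices include both ends
corner = toy_df.iloc[-2:, :2]                    # last two rows, first two columns
two_cols = toy_df[["age", "height"]]             # list of names: a DataFrame
tall = toy_df[toy_df["height"].between(68, 72)]  # bounds are inclusive by default
ages_up = np.ceil(toy_df["age"])                 # rounds up; dtype stays float
spread = toy_df.std()                            # per-column sample std, a Series

print(type(row).__name__, type(spread).__name__)
```

Note that .loc slices by label and includes the end label, while .iloc slices by position and excludes the end position, as in ordinary Python slicing.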