Lab 2: Introduction to DataFrames - Loading and Exploring Data
Objectives
Students will learn to:
- Load CSV data into a pandas DataFrame
- Explore basic DataFrame properties and methods
- Display and examine data structure
Lab Steps
Step 1: Load the Data
import pandas as pd

# Load the healthcare data
df = pd.read_csv('healthcare-per-capita-2022.csv')

# Display the DataFrame
print("Healthcare Per Capita Data:")
print(df)
Results:
Country_Name Country_Code Health_Exp_PerCapita_2022
0 Africa Eastern and Southern AFE 228
1 Afghanistan AFG 383
2 Africa Western and Central AFW 201
3 Angola AGO 217
4 Albania ALB 1186
.. ... ... ...
233 Samoa WSM 396
234 Yemen, Rep. YEM 109
235 South Africa ZAF 1341
236 Zambia ZMB 208
237 Zimbabwe ZWE 96
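The `..` row in the printout above marks rows that pandas left out of the display; the data itself is intact. A minimal sketch of how to print every row when needed, using a made-up 100-row frame named `demo` rather than the lab's CSV:

```python
import pandas as pd

# Hypothetical stand-in data (not the lab's CSV): 100 rows, enough to trigger truncation.
demo = pd.DataFrame({"value": range(100)})

# With display.max_rows lowered, the default rendering elides the middle rows.
with pd.option_context("display.max_rows", 10):
    truncated_text = str(demo)

# to_string() renders every row regardless of the display options.
full_text = demo.to_string()

print(len(truncated_text.splitlines()), "lines when truncated")
print(len(full_text.splitlines()), "lines in full")
```

For the lab's 238-row file, `print(df.to_string())` would show all rows, at the cost of a long printout.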
Step 2: Explore DataFrame Shape and Info
# Check the shape (rows, columns)
print(f"Dataset shape: {df.shape}")
print(f"Number of rows: {df.shape[0]}")
print(f"Number of columns: {df.shape[1]}")

# Get basic information about the DataFrame
print("\nDataFrame Info:")
df.info()
Step 3: Examine Column Names and Data Types
# Display column names
print("Column names:")
print(df.columns.tolist())

# Check data types
print("\nData types:")
print(df.dtypes)
Results (from the Step 2 code):
Dataset shape: (238, 3)
Number of rows: 238
Number of columns: 3
DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 238 entries, 0 to 237
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country_Name 238 non-null object
1 Country_Code 238 non-null object
2 Health_Exp_PerCapita_2022 238 non-null int64
dtypes: int64(1), object(2)
memory usage: 5.7+ KB
Step 4: Preview the Data
# Look at first 5 rows
print("First 5 rows:")
print(df.head())

# Look at last 5 rows
print("\nLast 5 rows:")
print(df.tail())
Results:
First 5 rows:
Country_Name Country_Code Health_Exp_PerCapita_2022
0 Africa Eastern and Southern AFE 228
1 Afghanistan AFG 383
2 Africa Western and Central AFW 201
3 Angola AGO 217
4 Albania ALB 1186
Last 5 rows:
Country_Name Country_Code Health_Exp_PerCapita_2022
233 Samoa WSM 396
234 Yemen, Rep. YEM 109
235 South Africa ZAF 1341
236 Zambia ZMB 208
237 Zimbabwe ZWE 96
# Look at a random sample of 5 rows
print("\nRandom sample of 5 rows:")
print(df.sample(5))
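`df.sample(5)` picks different rows on each run, so students' outputs will not match one another. If a repeatable draw is wanted, one option is the `random_state` parameter. The sketch below is a minimal illustration; its small frame holds only the ten rows visible in the head/tail output above, as a stand-in for the full CSV.

```python
import pandas as pd

# Stand-in for df: the ten rows visible in the lab's head/tail output.
df = pd.DataFrame({
    "Country_Name": [
        "Africa Eastern and Southern", "Afghanistan", "Africa Western and Central",
        "Angola", "Albania", "Samoa", "Yemen, Rep.", "South Africa", "Zambia", "Zimbabwe",
    ],
    "Country_Code": ["AFE", "AFG", "AFW", "AGO", "ALB", "WSM", "YEM", "ZAF", "ZMB", "ZWE"],
    "Health_Exp_PerCapita_2022": [228, 383, 201, 217, 1186, 396, 109, 1341, 208, 96],
})

# Fixing random_state makes the "random" sample repeatable across runs.
sample_a = df.sample(5, random_state=42)
sample_b = df.sample(5, random_state=42)

print(sample_a)
print("Same rows both times:", sample_a.equals(sample_b))
```

The seed value 42 is arbitrary; any fixed integer gives a repeatable sample.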
Step 5: Basic Data Exploration
# Get basic statistics for numerical columns
print("Basic statistics:")
print(df.describe())

# Check for missing values
print("\nMissing values per column:")
print(df.isnull().sum())

# Count unique values in each column
print("\nUnique values per column:")
for col in df.columns:
    print(f"{col}: {df[col].nunique()} unique values")
Questions for Students
How many countries are included in this dataset?
What is the data type of each column?
Are there any missing values in the dataset?
What country has the highest healthcare expenditure per capita?
What is the average healthcare expenditure per capita across all countries?
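The last two questions can be approached with `idxmax()` and `mean()`. The sketch below is a minimal illustration that runs on only the ten rows visible in the lab output, so the numbers it prints are not the answers for the full dataset.

```python
import pandas as pd

# Stand-in for df: the ten rows visible in the lab's head/tail output.
df = pd.DataFrame({
    "Country_Name": [
        "Africa Eastern and Southern", "Afghanistan", "Africa Western and Central",
        "Angola", "Albania", "Samoa", "Yemen, Rep.", "South Africa", "Zambia", "Zimbabwe",
    ],
    "Country_Code": ["AFE", "AFG", "AFW", "AGO", "ALB", "WSM", "YEM", "ZAF", "ZMB", "ZWE"],
    "Health_Exp_PerCapita_2022": [228, 383, 201, 217, 1186, 396, 109, 1341, 208, 96],
})

col = "Health_Exp_PerCapita_2022"

# idxmax() returns the index label of the row holding the largest value.
top_row = df.loc[df[col].idxmax()]
print(f"Highest in this subset: {top_row['Country_Name']} ({top_row[col]})")

# mean() averages the column.
average = df[col].mean()
print(f"Average in this subset: {average:.1f}")
```

Because the file also contains regional aggregates such as "Africa Eastern and Southern", it is worth checking whether the top row of the full data is a country or an aggregate.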
Expected Output Discussion
Students should observe:
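The dataset has 238 rows and 3 columns; the rows include regional aggregates (for example, "Africa Eastern and Southern") as well as individual countries
Country_Name and Country_Code are text (object) data types
Health_Exp_PerCapita_2022 is numerical (int64)
There are no missing values to handle: df.info() reports 238 non-null entries in every column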
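The lab's file shows 238 non-null entries in every column, so `isnull().sum()` returns zeros there. For contrast, the sketch below uses a hypothetical CSV string with one blank cell to show what a gap looks like. The `Country_A`/`Country_B`/`Country_C` rows are invented for illustration and are not from the dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text (not the lab's file); the second data row has an empty value.
csv_text = """Country_Name,Health_Exp_PerCapita_2022
Country_A,100
Country_B,
Country_C,300
"""

demo = pd.read_csv(io.StringIO(csv_text))

# An empty cell is read as NaN, which makes the column float64 instead of int64.
print(demo.dtypes)

# Count the missing entries in each column.
missing_per_column = demo.isnull().sum()
print(missing_per_column)

# One common option: drop rows that lack a value in the column of interest.
cleaned = demo.dropna(subset=["Health_Exp_PerCapita_2022"])
print(cleaned)
```

Dropping rows is only one way to handle gaps; whether it is appropriate depends on the analysis.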
Extension Activities
For advanced students:
Sort the data by healthcare expenditure
Find countries with expenditure above/below certain thresholds
Create simple filtering operations
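The extension activities above can be sketched with `sort_values()` and boolean masks. This is a minimal illustration on the ten rows visible in the lab output; the thresholds (500, and the 200 to 400 range) are arbitrary values chosen for the example.

```python
import pandas as pd

# Stand-in for df: the ten rows visible in the lab's head/tail output.
df = pd.DataFrame({
    "Country_Name": [
        "Africa Eastern and Southern", "Afghanistan", "Africa Western and Central",
        "Angola", "Albania", "Samoa", "Yemen, Rep.", "South Africa", "Zambia", "Zimbabwe",
    ],
    "Country_Code": ["AFE", "AFG", "AFW", "AGO", "ALB", "WSM", "YEM", "ZAF", "ZMB", "ZWE"],
    "Health_Exp_PerCapita_2022": [228, 383, 201, 217, 1186, 396, 109, 1341, 208, 96],
})

col = "Health_Exp_PerCapita_2022"

# Sort from highest to lowest expenditure.
sorted_df = df.sort_values(col, ascending=False)
print(sorted_df.head(3))

# Boolean mask: keep only rows above a threshold (500 is an illustrative cutoff).
threshold = 500
above = df[df[col] > threshold]
print(above)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
mid_range = df[(df[col] >= 200) & (df[col] <= 400)]
print(mid_range["Country_Name"].tolist())
```

Sorting and filtering both return new DataFrames, so `df` itself is left unchanged.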
This lab builds naturally from counting rows to actually working with the data structure,
introducing essential pandas concepts while keeping the complexity manageable for beginners.