Data that will be used as an example in the following sections
data = {'Food': ['Taco', 'Sushi', 'Hamburger', 'Hotdog', 'Stew', 'Ribs', 'Spaghetti'],
'Calories': [27, 24, 22, 32, 99, 123, 231]
}
df = pd.DataFrame(data)
index | Food | Calories |
---|---|---|
0 | Taco | 27 |
1 | Sushi | 24 |
2 | Hamburger | 22 |
3 | Hotdog | 32 |
4 | Stew | 99 |
5 | Ribs | 123 |
6 | Spaghetti | 231 |
Ultimately it seems like the only method you need here is loc
Access a group of rows and columns by label(s) or a boolean array. Documentation
Expects an index or an index and column.
print(df.loc['Food']) # Key Error
print(df.loc['Taco']) # Key Error
print(df.loc[0]) # Returns pandas series
Food Taco
Calories 27
print(df.loc[[0]]) # Returns dataframe
Food Calories
0 Taco 27
Can access an individual value like so
print(df.loc[0, "Food"] # Returns str
Taco
You can also set a value in the same way
df.loc[0, "Food"] = "Roast"
Useful if you have reindexed the dataframe to a non integer index
df = df.set_index('Food')
df.loc['Taco'] # Return series
Calories 27
Purely integer-location based indexing for selection by position. Performs mostly the same as loc if you haven't changed the index. Deprecated since 2.2.0 Documentation
print(df.iloc[0, "Food"]) # ValueError, can't have a string, only ints
print(df.iloc[0, 0]) # Returns str
Taco
print(df.iloc[0]) # Returns series
Food Taco
Calories 27
print(df.iloc[[0]]) # Returns dataframe
Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series Documentation
df.at[0] # TypeError missing argument 'col'
df.at[1, "Food"] # Returns str
Sushi
df.at[1, "Food"] = "Roast Beef" # Set value
A way to filter down a dataframe to only rows of interest
df.loc[df['Calories'] > 100]
Food Calories
5 Ribs 123
6 Spaghetti 231
Can apply multiple filters, just make sure your parenthesis and boolean operators are in the right place
df.loc[df['Food'].str.contains('i') & (df['Calories'] > 200)]
Food Calories
6 Spaghetti 231
df.loc[df['Food'].str.contains('i') | (df['Food'].str.contains('o'))]
Food Calories
0 Taco 27
1 Sushi 24
3 Hotdog 32
5 Ribs 123
6 Spaghetti 231
Lets say we have another dataframe with street foods in it.
street_food_df =
Name | Type |
---|---|
Taco | Mexican |
Hamburger | American |
Hotdog | American |
We can filter down our food dataframe like so
final_df = df.loc[df['Food'].isin(street_food_df['Name'])]
print(final_df)
Food Calories
0 Taco 27
2 Hamburger 22
3 Hotdog 32
Here's how to specify what to do with each column in a group by operation
consolidated_df = df.groupby('cpe')[['ip', 'latest_version', 'eol_date', 'host_count']].agg({
'ip': lambda x: ','.join(x),
'latest_version': 'max',
'eol_date': 'max',
'host_count': 'count'
}).reset_index()