Sample Data

Data that will be used as an example in the following sections

data = {'Food': ['Taco', 'Sushi', 'Hamburger', 'Hotdog', 'Stew', 'Ribs', 'Spaghetti'],
        'Calories': [27, 24, 22, 32, 99, 123, 231]
        }
df = pd.DataFrame(data)
index Food Calories
0 Taco 27
1 Sushi 24
2 Hamburger 22
3 Hotdog 32
4 Stew 99
5 Ribs 123
6 Spaghetti 231

Accessing

Ultimately it seems like the only method you need here is loc

By Row - loc

Access a group of rows and columns by label(s) or a boolean array. Documentation

Expects an index or an index and column.

print(df.loc['Food'])  # Key Error
print(df.loc['Taco'])  # Key Error
print(df.loc[0])  # Returns pandas series
    Food        Taco
    Calories      27
print(df.loc[[0]])  # Returns dataframe
       Food  Calories
    0  Taco        27

Can access an individual value like so

print(df.loc[0, "Food"]  # Returns str
     Taco

You can also set a value in the same way

df.loc[0, "Food"] = "Roast"

Useful if you have reindexed the dataframe to a non integer index

df = df.set_index('Food')
df.loc['Taco']  # Return series
    Calories    27

By Row - iloc

Purely integer-location based indexing for selection by position. Performs mostly the same as loc if you haven't changed the index. Deprecated since 2.2.0 Documentation

print(df.iloc[0, "Food"])  # ValueError, can't have a string, only ints
print(df.iloc[0, 0])  # Returns str
    Taco
print(df.iloc[0])  # Returns series
    Food        Taco
    Calories      27
print(df.iloc[[0]])  # Returns dataframe

By Row - at

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series Documentation

df.at[0]  # TypeError missing argument 'col'
df.at[1, "Food"]  # Returns str
    Sushi
df.at[1, "Food"] = "Roast Beef"  # Set value

Masking

A way to filter down a dataframe to only rows of interest

Basic Masking

df.loc[df['Calories'] > 100]
        Food  Calories
5       Ribs       123
6  Spaghetti       231

Can apply multiple filters, just make sure your parenthesis and boolean operators are in the right place

df.loc[df['Food'].str.contains('i') & (df['Calories'] > 200)]
        Food  Calories
6  Spaghetti       231

df.loc[df['Food'].str.contains('i') | (df['Food'].str.contains('o'))]
        Food  Calories
0       Taco        27
1      Sushi        24
3     Hotdog        32
5       Ribs       123
6  Spaghetti       231

Compare to another list/dataframe

Lets say we have another dataframe with street foods in it.

street_food_df =

Name Type
Taco Mexican
Hamburger American
Hotdog American

We can filter down our food dataframe like so

final_df = df.loc[df['Food'].isin(street_food_df['Name'])]
print(final_df)
        Food  Calories
0       Taco        27
2  Hamburger        22
3     Hotdog        32

Grouping

Aggregating multiple columns

Here's how to specify what to do with each column in a group by operation

consolidated_df = df.groupby('cpe')[['ip', 'latest_version', 'eol_date', 'host_count']].agg({
    'ip': lambda x: ','.join(x),
    'latest_version': 'max',
    'eol_date': 'max',
    'host_count': 'count'
    }).reset_index()