[TOC]
Sample Data
==================
Data that will be used as an example in the following sections
```python
data = {'Food': ['Taco', 'Sushi', 'Hamburger', 'Hotdog', 'Stew', 'Ribs', 'Spaghetti'],
'Calories': [27, 24, 22, 32, 99, 123, 231]
}
df = pd.DataFrame(data)
```
|index | Food | Calories |
| ---- | --- | -------- |
| 0 | Taco | 27 |
| 1 | Sushi | 24 |
| 2 | Hamburger | 22 |
| 3 | Hotdog | 32 |
| 4 | Stew | 99 |
| 5 | Ribs | 123 |
| 6 | Spaghetti | 231 |
Accessing
==================
Ultimately it seems like the only method you need here is loc
By Row - loc
------------------
Access a group of rows and columns by label(s) or a boolean array. [Documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html)
Expects an index or an index and column.
```python
print(df.loc['Food']) # Key Error
print(df.loc['Taco']) # Key Error
print(df.loc[0]) # Returns pandas series
Food Taco
Calories 27
print(df.loc[[0]]) # Returns dataframe
Food Calories
0 Taco 27
```
Can access an individual value like so
```python
print(df.loc[0, "Food"] # Returns str
Taco
```
You can also set a value in the same way
```python
df.loc[0, "Food"] = "Roast"
```
Useful if you have reindexed the dataframe to a non integer index
```python
df = df.set_index('Food')
df.loc['Taco'] # Return series
Calories 27
```
By Row - iloc
------------------
Purely integer-location based indexing for selection by position. Performs mostly the same as loc if you haven't changed the index. Deprecated since 2.2.0 [Documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html#)
```python
print(df.iloc[0, "Food"]) # ValueError, can't have a string, only ints
print(df.iloc[0, 0]) # Returns str
Taco
print(df.iloc[0]) # Returns series
Food Taco
Calories 27
print(df.iloc[[0]]) # Returns dataframe
```
By Row - at
------------------
Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series [Documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.at.html)
```python
df.at[0] # TypeError missing argument 'col'
df.at[1, "Food"] # Returns str
Sushi
df.at[1, "Food"] = "Roast Beef" # Set value
```
Masking
==================
A way to filter down a dataframe to only rows of interest
Basic Masking
------------------
```python
df.loc[df['Calories'] > 100]
Food Calories
5 Ribs 123
6 Spaghetti 231
```
Can apply multiple filters, just make sure your parenthesis and boolean operators are in the right place
```python
df.loc[df['Food'].str.contains('i') & (df['Calories'] > 200)]
Food Calories
6 Spaghetti 231
df.loc[df['Food'].str.contains('i') | (df['Food'].str.contains('o'))]
Food Calories
0 Taco 27
1 Sushi 24
3 Hotdog 32
5 Ribs 123
6 Spaghetti 231
```
Compare to another list/dataframe
------------------
Lets say we have another dataframe with street foods in it.
street_food_df =
| Name | Type |
| --- | --- |
| Taco | Mexican |
| Hamburger | American |
| Hotdog | American |
We can filter down our food dataframe like so
```python
final_df = df.loc[df['Food'].isin(street_food_df['Name'])]
print(final_df)
Food Calories
0 Taco 27
2 Hamburger 22
3 Hotdog 32
```
Grouping
==================
Aggregating multiple columns
------------------
Here's how to specify what to do with each column in a group by operation
```
consolidated_df = df.groupby('cpe')[['ip', 'latest_version', 'eol_date', 'host_count']].agg({
'ip': lambda x: ','.join(x),
'latest_version': 'max',
'eol_date': 'max',
'host_count': 'count'
}).reset_index()
```