Pandas.txt

Creating a DataFrame

You can create a DataFrame from a NumPy array, a Python dictionary, or a list of lists:


import pandas as pd
import numpy as np

# From a NumPy array
data = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
print(df)

# Output:
#    a  b  c
# 0  1  2  3
# 1  4  5  6

# From a dictionary
data = {'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}
df = pd.DataFrame(data)
print(df)

# Output:
#    a  b  c
# 0  1  2  3
# 1  4  5  6

# From a list of lists
data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
print(df)

# Output:
#    a  b  c
# 0  1  2  3
# 1  4  5  6
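
The constructors above all assign the default integer row labels 0, 1, and so on. As a minimal sketch, the `index` parameter of `pd.DataFrame` can supply custom row labels instead; the labels 'x' and 'y' below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical row labels; any list with one label per row works.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=['a', 'b', 'c'],
    index=['x', 'y'],
)
print(df)

# Output:
#    a  b  c
# x  1  2  3
# y  4  5  6

print(df.shape)          # (2, 3): two rows, three columns
print(list(df.columns))  # ['a', 'b', 'c']
```

Custom labels can then be used with `loc`, for example `df.loc['y', 'c']` returns 6.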

Accessing Data

You can access columns in a DataFrame using the [] operator or the dot (.) operator, and rows or single values using the loc and at attributes:


import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Access a column
print(df['a'])

# Output:
# 0    1
# 1    2
# 2    3
# Name: a, dtype: int64

# Access multiple columns
print(df[['a', 'b']])

# Output:
#    a  b
# 0  1  4
# 1  2  5
# 2  3  6

# Access a row using the `loc` attribute
print(df.loc[0])

# Output:
# a    1
# b    4
# c    7
# Name: 0, dtype: int64

# Access a value using the `at` attribute
print(df.at[0, 'a'])

# Output: 1
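
The dot (.) operator mentioned above can be sketched as follows, together with position-based access. Note that `iloc` is an addition not covered in the text above, and dot access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Attribute-style (dot) access returns the same Series as df['a'].
col_a = df.a
print(col_a.equals(df['a']))  # True

# Position-based access with iloc: second row, third column.
print(df.iloc[1, 2])  # 8

# Label-based slicing with loc includes the end label.
subset = df.loc[0:1, ['a', 'c']]
print(subset)

# Output:
#    a  c
# 0  1  7
# 1  2  8
```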

Filtering Data

You can filter a DataFrame using a boolean index:

import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Cat', 'Dog', 'Cat'], 'Age': [3, 5, 2, 8]})

# Filter rows where age is greater than 3
print(df[df['Age'] > 3])

# Output:
#   Animal  Age
# 1    Cat    5
# 3    Cat    8
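
Boolean indexes can also be combined. The sketch below reuses the same data and shows one common pattern, assuming you want rows that satisfy two conditions at once; each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Cat', 'Dog', 'Cat'], 'Age': [3, 5, 2, 8]})

# Combine conditions with & (and) or | (or); wrap each one in parentheses.
old_cats = df[(df['Animal'] == 'Cat') & (df['Age'] > 5)]
print(old_cats)

# Output:
#   Animal  Age
# 3    Cat    8

# isin keeps the rows whose value appears in the given list.
young = df[df['Age'].isin([2, 3])]
print(young)

# Output:
#   Animal  Age
# 0    Dog    3
# 2    Dog    2
```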

Handling Missing Data

You can use the isnull method to identify missing values in a DataFrame, and the fillna method to fill missing values:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, None]})

# Identify missing values
print(df.isnull())

# Output:
#        A      B
# 0  False  False
# 1  False  False
# 2  False  False
# 3  False   True

# Fill missing values with 0
print(df.fillna(0))

# Output:
#    A    B
# 0  1  5.0
# 1  2  6.0
# 2  3  7.0
# 3  4  0.0
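
One detail worth illustrating: `fillna` returns a new DataFrame rather than changing the original. The sketch below shows this with the same data, along with `dropna`, which is an addition not covered in the text above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, None]})

# Count the missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts['B'])  # 1

# fillna returns a new DataFrame; assign the result to keep it.
filled = df.fillna(0)
print(df['B'].isnull().any())      # True: the original is unchanged
print(filled['B'].isnull().any())  # False

# dropna removes the rows that contain at least one missing value.
dropped = df.dropna()
print(len(dropped))  # 3
```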


Modifying Data

You can modify the values in a DataFrame by assigning new values to a subset of the DataFrame:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Modify multiple columns
df[['a', 'b']] = [[100, 200], [300, 400], [500, 600]]
print(df)

# Output:
#      a    b  c
# 0  100  200  7
# 1  300  400  8
# 2  500  600  9

# Modify a cell using the `at` attribute
df.at[0, 'a'] = 1000
print(df)

# Output:
#       a    b  c
# 0  1000  200  7
# 1   300  400  8
# 2   500  600  9

# Modify every row that matches a boolean condition using the `loc` attribute
# (only row 1 has a < 500, so only its `b` value changes)
df.loc[df['a'] < 500, 'b'] = 2000
print(df)

# Output:
#       a     b  c
# 0  1000   200  7
# 1   300  2000  8
# 2   500   600  9
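
Assignment can also overwrite a whole column or create a new one. A minimal sketch, where the column name 'total' is hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Assigning a scalar overwrites every row of the column.
df['c'] = 0

# Assigning to a name that does not exist yet adds a new column.
df['total'] = df['a'] + df['b']
print(df)

# Output:
#    a  b  c  total
# 0  1  4  0      5
# 1  2  5  0      7
# 2  3  6  0      9
```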

Sorting Data

You can sort a DataFrame by one or more columns using the sort_values method:


import pandas as pd

df = pd.DataFrame({'a': [1, 3, 2], 'b': [3, 2, 1], 'c': [2, 1, 3]})

# Sort by a single column
print(df.sort_values(by='a'))

# Output:
#    a  b  c
# 0  1  3  2
# 2  2  1  3
# 1  3  2  1

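# Sort by multiple columns: rows are ordered by 'a', and 'b' only breaks ties in 'a'
# (here 'a' has no repeated values, so the order matches the single-column sort)
print(df.sort_values(by=['a', 'b']))

# Output:
#    a  b  c
# 0  1  3  2
# 2  2  1  3
# 1  3  2  1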
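
To make the effect of a second sort key visible, the sketch below uses hypothetical data with a repeated value in 'a'. It also passes a list to the `ascending` parameter to set the direction per column.

```python
import pandas as pd

# Hypothetical data with a repeated value in 'a' so the second key matters.
df = pd.DataFrame({'a': [2, 1, 2], 'b': [1, 3, 0]})

# Sort by 'a' ascending, then by 'b' descending within equal 'a' values.
result = df.sort_values(by=['a', 'b'], ascending=[True, False])
print(result)

# Output:
#    a  b
# 1  1  3
# 0  2  1
# 2  2  0

# sort_values returns a new DataFrame; the original order is unchanged.
print(df.index.tolist())  # [0, 1, 2]
```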

Grouping Data

You can group a DataFrame by one or more columns and apply a function to each group using the groupby method:


import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Cat', 'Dog', 'Cat'], 'Age': [3, 5, 2, 8]})

# Group by a single column and get the mean age for each group
print(df.groupby('Animal').mean())

# Output:
#        Age
# Animal    
# Cat     6.5
# Dog     2.5

# Group by multiple columns and count the rows in each group
# (both columns become group keys, so no column is left to average)
print(df.groupby(['Animal', 'Age']).size())

# Output:
# Animal  Age
# Cat     5      1
#         8      1
# Dog     2      1
#         3      1
# dtype: int64
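
Applying a function to each group is not limited to a single statistic. As a sketch with the same data, `agg` can compute several aggregations at once after selecting the column to summarize:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Cat', 'Dog', 'Cat'], 'Age': [3, 5, 2, 8]})

# Select the column first, then apply several aggregations per group.
summary = df.groupby('Animal')['Age'].agg(['min', 'max', 'mean'])
print(summary)

# Output:
#         min  max  mean
# Animal
# Cat       5    8   6.5
# Dog       2    3   2.5
```

Individual results can be read with `loc`, for example `summary.loc['Cat', 'mean']` returns 6.5.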

Merging and Joining DataFrames

You can combine two DataFrames using the merge method, which matches rows on one or more columns, or the join method, which matches rows on the index:

import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'C': ['C0', 'C1', 'C2', 'C3'],
                     'D': ['D0', 'D1', 'D2', 'D3']})

# Merge two DataFrames using a common column (`key`)
df3 = pd.merge(df1, df2, on='key')
print(df3)

# Output:
#   key   A   B   C   D
# 0  K0  A0  B0  C0  D0
# 1  K1  A1  B1  C1  D1
# 2  K2  A2  B2  C2  D2
# 3  K3  A3  B3  C3  D3

# Join two DataFrames on their index using the `join` method
# (the suffixes distinguish the `key` column that both frames share)
df4 = df1.join(df2, lsuffix='_left', rsuffix='_right')
print(df4)

# Output:
#   key_left   A   B key_right   C   D
# 0       K0  A0  B0        K0  C0  D0
# 1       K1  A1  B1        K1  C1  D1
# 2       K2  A2  B2        K2  C2  D2
# 3       K3  A3  B3        K3  C3  D3
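
In the example above every key appears in both DataFrames, so the join type makes no difference. The sketch below uses hypothetical frames whose keys only partly overlap to show how the `how` parameter changes the result, and how `set_index` lets `join` match on `key`.

```python
import pandas as pd

# Hypothetical frames whose keys only partly overlap.
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'C': ['C1', 'C2', 'C3']})

# The default inner merge keeps only the keys present in both frames.
inner = pd.merge(left, right, on='key')
print(inner['key'].tolist())  # ['K1', 'K2']

# A left merge keeps every row of `left`; unmatched rows get NaN in 'C'.
left_merged = pd.merge(left, right, on='key', how='left')
print(left_merged['C'].isnull().tolist())  # [True, False, False]

# join aligns on the index, so set 'key' as the index to join on it.
joined = left.set_index('key').join(right.set_index('key'), how='outer')
print(joined.index.tolist())  # ['K0', 'K1', 'K2', 'K3']
```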