Creating a DataFrame
You can create a DataFrame from a NumPy array, a Python dictionary, or a list of lists:
import pandas as pd
import numpy as np
# From a NumPy array
data = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
print(df)
# Output:
# a b c
# 0 1 2 3
# 1 4 5 6
# From a dictionary
data = {'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}
df = pd.DataFrame(data)
print(df)
# Output:
# a b c
# 0 1 2 3
# 1 4 5 6
# From a list of lists
data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
print(df)
# Output:
# a b c
# 0 1 2 3
# 1 4 5 6
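As a further illustration, pandas also accepts a list of dictionaries, where each dictionary is one row. The sketch below assumes a hypothetical `records` list (not part of the examples above) in which one row lacks a key, to show that the gap is filled with NaN:

```python
import pandas as pd

# Hypothetical input: each dictionary is one row, keys become column names
records = [{'a': 1, 'b': 2}, {'a': 4, 'b': 5, 'c': 6}]
df = pd.DataFrame(records)

# The first row has no 'c' key, so that cell is NaN
print(df)
print(df.shape)  # (2, 3)
```

Because NaN is a floating-point value, a column with a missing entry is stored as float rather than int.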
Accessing Data
You can select columns with the [] operator (or, for simple column names, with attribute access such as df.a), rows with the loc attribute, and single values with the at attribute:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
# Access a column
print(df['a'])
# Output:
# 0 1
# 1 2
# 2 3
# Name: a, dtype: int64
# Access multiple columns
print(df[['a', 'b']])
# Output:
# a b
# 0 1 4
# 1 2 5
# 2 3 6
# Access a row using the `loc` attribute
print(df.loc[0])
# Output:
# a 1
# b 4
# c 7
# Name: 0, dtype: int64
# Access a value using the `at` attribute
print(df.at[0, 'a'])
# Output: 1
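The accessors above are label-based. As a minimal sketch using the same DataFrame, the following shows attribute (dot) access to a column and the position-based counterparts `iloc` and `iat`; the variable names `col`, `row`, and `val` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Attribute access works when the column name is a valid Python identifier
col = df.a            # same Series as df['a']

# iloc selects by integer position rather than by label
row = df.iloc[-1]     # last row

# iat reads a single cell by (row position, column position)
val = df.iat[1, 2]    # second row, third column

print(col.tolist())
print(row.tolist())
print(val)
```

Attribute access is convenient for reading, but the [] form is the safer choice when a column name contains spaces or clashes with a DataFrame method name.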
Filtering Data
You can filter a DataFrame using a boolean index:
import pandas as pd
df = pd.DataFrame({'Animal': ['Dog', 'Cat', 'Dog', 'Cat'], 'Age': [3, 5, 2, 8]})
# Filter rows where age is greater than 3
print(df[df['Age'] > 3])
# Output:
# Animal Age
# 1 Cat 5
# 3 Cat 8
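Boolean conditions can also be combined. The sketch below reuses the same data and assumes illustrative variable names; note that each condition needs its own parentheses, and that `&` and `|` are used instead of `and` and `or`:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Cat', 'Dog', 'Cat'], 'Age': [3, 5, 2, 8]})

# Both conditions must hold (logical AND)
old_cats = df[(df['Animal'] == 'Cat') & (df['Age'] > 5)]

# Either condition may hold (logical OR)
dog_or_young = df[(df['Animal'] == 'Dog') | (df['Age'] < 6)]

# Keep rows whose value appears in a list
selected = df[df['Age'].isin([2, 8])]

print(old_cats)
print(dog_or_young)
print(selected)
```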
Handling Missing Data
You can use the isnull method to identify missing values in a DataFrame, and the fillna method to fill missing values:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, None]})
# Identify missing values
print(df.isnull())
# Output:
# A B
# 0 False False
# 1 False False
# 2 False False
# 3 False True
# Fill missing values with 0 (returns a new DataFrame; df itself is unchanged)
print(df.fillna(0))
# Output:
# A B
# 0 1 5.0
# 1 2 6.0
# 2 3 7.0
# 3 4 0.0
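Two common variations, sketched here on the same data with illustrative variable names: dropping the incomplete rows with `dropna`, and filling a column with its own mean instead of a constant.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, None]})

# Count the missing values in each column
missing_per_column = df.isnull().sum()

# Drop every row that contains at least one missing value
dropped = df.dropna()

# Fill column B with its mean; mean() skips missing values by default
filled = df.fillna({'B': df['B'].mean()})

print(missing_per_column)
print(dropped)
print(filled)
```

Both `dropna` and `fillna` return new DataFrames here, so `df` still contains the missing value afterwards.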
Modifying Data
You can modify the values in a DataFrame by assigning new values to a subset of the DataFrame:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
# Modify multiple columns
df[['a', 'b']] = [[100, 200], [300, 400], [500, 600]]
print(df)
# Output:
# a b c
# 0 100 200 7
# 1 300 400 8
# 2 500 600 9
# Modify a cell using the `at` attribute
df.at[0, 'a'] = 1000
print(df)
# Output:
# a b c
# 0 1000 200 7
# 1 300 400 8
# 2 500 600 9
# Modify the rows selected by a boolean index using the `loc` attribute
# (only row 1 has a < 500, so only its b value changes)
df.loc[df['a'] < 500, 'b'] = 2000
print(df)
# Output:
# a b c
# 0 1000 200 7
# 1 300 2000 8
# 2 500 600 9
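Assignment can also create a new column or overwrite a whole column from an expression. This is a minimal sketch on a fresh copy of the original data; the column name `total` is an assumption made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Assigning to a name that does not exist yet adds a new column
df['total'] = df['a'] + df['b'] + df['c']

# Assigning to an existing name overwrites the whole column
df['a'] = df['a'] * 10

# A boolean index with loc updates only the matching rows
df.loc[df['total'] > 12, 'c'] = 0

print(df)
```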
Sorting Data
You can sort a DataFrame by one or more columns using the sort_values method:
import pandas as pd
df = pd.DataFrame({'a': [1, 3, 2], 'b': [3, 2, 1], 'c': [2, 1, 3]})
# Sort by a single column
print(df.sort_values(by='a'))
# Output:
# a b c
# 0 1 3 2
# 2 2 1 3
# 1 3 2 1
# Sort by multiple columns (by a first, then by b to break ties)
print(df.sort_values(by=['a', 'b']))
# Output:
# a b c
# 0 1 3 2
# 2 2 1 3
# 1 3 2 1
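The sketch below, using the same data and illustrative variable names, shows descending order and makes explicit that `sort_values` returns a new DataFrame rather than reordering the original:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 2], 'b': [3, 2, 1], 'c': [2, 1, 3]})

# Sort in descending order; the original index labels travel with the rows
desc = df.sort_values(by='a', ascending=False)

# Replace the old labels with a fresh 0..n-1 index
renumbered = desc.reset_index(drop=True)

print(desc)
print(renumbered)
print(df)  # still in its original order
```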
Grouping Data
You can group a DataFrame by one or more columns and apply a function to each group using the groupby method:
import pandas as pd
df = pd.DataFrame({'Animal': ['Dog', 'Cat', 'Dog', 'Cat'], 'Age': [3, 5, 2, 8]})
# Group by a single column and get the mean age for each group
print(df.groupby('Animal').mean())
# Output:
# Age
# Animal
# Cat 6.5
# Dog 2.5
# Group by multiple columns and count the rows in each group
# (both columns become the index, so no column is left to average)
print(df.groupby(['Animal', 'Age']).size())
# Output:
# Animal Age
# Cat 5 1
# 8 1
# Dog 2 1
# 3 1
# dtype: int64
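Several statistics can be computed per group in one call with `agg`. A minimal sketch on the same data, where `summary` is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Dog', 'Cat', 'Dog', 'Cat'], 'Age': [3, 5, 2, 8]})

# Select the column to aggregate, then name the functions to apply
summary = df.groupby('Animal')['Age'].agg(['min', 'max', 'mean', 'count'])

print(summary)
```

Each function name becomes a column of the result, and each group becomes a row.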
Merging and Joining DataFrames
You can combine two DataFrames with the merge function, which matches rows on a common column, or with the join method, which matches rows on the index:
import pandas as pd
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
# Merge two DataFrames using a common column (`key`)
df3 = pd.merge(df1, df2, on='key')
print(df3)
# Output:
# key A B C D
# 0 K0 A0 B0 C0 D0
# 1 K1 A1 B1 C1 D1
# 2 K2 A2 B2 C2 D2
# 3 K3 A3 B3 C3 D3
# Join two DataFrames on their index using the `join` method
# (the suffixes distinguish the `key` column that exists in both)
df4 = df1.join(df2, lsuffix='_left', rsuffix='_right')
print(df4)
# Output:
# key_left A B key_right C D
# 0 K0 A0 B0 K0 C0 D0
# 1 K1 A1 B1 K1 C1 D1
# 2 K2 A2 B2 K2 C2 D2
# 3 K3 A3 B3 K3 C3 D3
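In the example above every key appears in both DataFrames. The sketch below assumes two smaller, hypothetical DataFrames (`left` and `right`) whose keys only partly overlap, to show how the `how` parameter of `merge` decides which rows are kept:

```python
import pandas as pd

# Hypothetical data: K0 exists only on the left, K3 only on the right
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'C': ['C1', 'C2', 'C3']})

# Default (inner): keep only keys present in both
inner = pd.merge(left, right, on='key')

# Left: keep every row of `left`; unmatched cells become NaN
left_join = pd.merge(left, right, on='key', how='left')

# Outer: keep every key from either side
outer = pd.merge(left, right, on='key', how='outer')

print(inner)
print(left_join)
print(outer)
```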