[pandas cheat sheet](https://www.dataquest.io/blog...
import pandas as pd
Task 1: read in and partition
df_original = pd.read_csv("csv_file_name")
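count_rec = len(df_original)
# len() counts every row. df_original['attri_1'].count() skips NaN values in that column,
# and df_original.count() returns one count per column instead of a single number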
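df_train = df_original.iloc[: int(0.8*count_rec)]
df_test = df_original.iloc[int(0.8*count_rec) :]
# int(...) converts the float 0.8*count_rec into a whole row position;
# .iloc slices by position and keeps the file order (no shuffling)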
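A self-contained sketch of the 80/20 split, assuming a small in-memory DataFrame in place of the CSV (the columns `attri_1`, `attri_2` and their values are made up for illustration). It shows why `len()` is safer than `count()` when a column has missing values, and uses `DataFrame.sample` as an alternative when the rows should be shuffled before splitting.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv("csv_file_name"); note the missing value in attri_1
df_original = pd.DataFrame({
    "attri_1": [1, 2, None, 4, 5, 6, 7, 8, 9, 10],
    "attri_2": list("abcdefghij"),
})

# len() counts all 10 rows; df_original["attri_1"].count() would give 9 because it skips NaN
count_rec = len(df_original)
split = int(0.8 * count_rec)  # 0.8 * 10 = 8.0, int() turns it into the position 8

# Positional split in file order: first 80% for training, the rest for testing
df_train = df_original.iloc[:split]
df_test = df_original.iloc[split:]

# Alternative: shuffle first; random_state makes the sample reproducible
df_train_rand = df_original.sample(frac=0.8, random_state=0)
df_test_rand = df_original.drop(df_train_rand.index)

print(len(df_train), len(df_test))            # 8 2
print(len(df_train_rand), len(df_test_rand))  # 8 2
```

The positional split is fine when the file order is already random; if the CSV is sorted (by date or by class, for example), the shuffled version avoids a training set that only covers part of the range.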
Task 2: select or count rows by a specific column value
Method 1: boolean indexing
# Find rows where attri_1 equals certain_value
# df.loc[...] selects rows by index label or by a boolean mask
df_c1 = df_train.loc[df_train['attri_1'] == certain_value]
# '==' can be replaced by '!=' (or by '<', '>', '<=', '>=')
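A runnable sketch of the `==` / `!=` masks, assuming a toy `df_train` with a hypothetical `score` column. Since the task also mentions counting, it shows that summing a boolean mask gives the number of matching rows, and that `Series.value_counts()` (a swapped-in standard pandas method, not part of the original notes) counts every distinct value at once.

```python
import pandas as pd

# Hypothetical training data; attri_1 holds categorical labels
df_train = pd.DataFrame({
    "attri_1": ["x", "y", "x", "z", "x", "y"],
    "score": [3, 5, 1, 4, 2, 6],
})
certain_value = "x"

mask = df_train["attri_1"] == certain_value  # boolean Series, one entry per row
df_c1 = df_train.loc[mask]                   # rows where attri_1 == "x"
df_rest = df_train.loc[df_train["attri_1"] != certain_value]

# True counts as 1 when summed, so this is the number of matching rows
n_c1 = int(mask.sum())

# Counts for every distinct value of attri_1
counts = df_train["attri_1"].value_counts()

print(df_c1["score"].tolist())  # [3, 1, 2]
print(n_c1)                     # 3
print(counts["y"])              # 2
```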
# To match any of several values, pass a list (or set) of them to isin()
df_c1 = df_train.loc[df_train['attri_1'].isin(some_values)]
# To exclude those values, put '~' (logical NOT) at the beginning of the mask
df_c1 = df_train.loc[~df_train['attri_1'].isin(some_values)]
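A minimal sketch of `isin()` and its negation with `~`, assuming a toy `df_train`; the `score` column and the contents of `some_values` are hypothetical.

```python
import pandas as pd

df_train = pd.DataFrame({
    "attri_1": ["x", "y", "x", "z", "w"],
    "score": [3, 5, 1, 4, 2],
})
some_values = ["x", "z"]  # hypothetical values to match

in_mask = df_train["attri_1"].isin(some_values)
df_in = df_train.loc[in_mask]    # attri_1 is "x" or "z"
df_out = df_train.loc[~in_mask]  # ~ flips every True/False in the mask

print(df_in["score"].tolist())   # [3, 1, 4]
print(df_out["score"].tolist())  # [5, 2]
```

Together the two results partition the rows: every row lands in exactly one of `df_in` and `df_out`.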
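# Combine conditions with & (and) or | (or); each condition needs its own '( )'
# because & and | bind more tightly than comparison operators such as < and >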
df_c1 = df_train.loc[(df_train['attri_1']<a) & (df_train['attri_1']>b)]
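A sketch of combined conditions, assuming numeric values in `attri_1` and hypothetical bounds `a` and `b`. It also shows `|` for "or" and `Series.between`, a standard pandas alternative for range tests (its `inclusive="neither"` option assumes pandas 1.3 or later). Leaving out the parentheses would make Python evaluate `a & df_train["attri_1"]` first and typically fail with an ambiguous-truth-value error.

```python
import pandas as pd

df_train = pd.DataFrame({"attri_1": [1, 4, 6, 9, 12]})
a, b = 10, 3  # hypothetical bounds: keep values strictly between b and a

# Each comparison is parenthesised because & binds tighter than < and >
df_c1 = df_train.loc[(df_train["attri_1"] < a) & (df_train["attri_1"] > b)]

# | means "or": values outside the open interval (b, a)
df_c2 = df_train.loc[(df_train["attri_1"] >= a) | (df_train["attri_1"] <= b)]

# Same range test as df_c1 via between(); both endpoints excluded
df_c3 = df_train.loc[df_train["attri_1"].between(b, a, inclusive="neither")]

print(df_c1["attri_1"].tolist())  # [4, 6, 9]
print(df_c2["attri_1"].tolist())  # [1, 12]
```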
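Method 2: label indexing
# Add attri_1 as a second index level (drop=False also keeps it as a column),
# then take the cross-section at that level with xs()
df_c1 = df_train.set_index('attri_1', append=True, drop=False).xs(value, level=1)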
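A runnable sketch of the label-indexing approach, assuming a toy `df_train` with a hypothetical `score` column. It spells out what `append=True`, `drop=False` and `xs(..., level=1)` each contribute, and shows that the result keeps the original row labels.

```python
import pandas as pd

df_train = pd.DataFrame({
    "attri_1": ["x", "y", "x", "z"],
    "score": [3, 5, 1, 4],
})
value = "x"  # hypothetical label to select

# append=True keeps the existing index (0..3) as level 0 and adds attri_1 as level 1;
# drop=False leaves attri_1 in the columns as well
indexed = df_train.set_index("attri_1", append=True, drop=False)

# xs() selects rows whose level-1 label equals value and drops that level,
# so the result is indexed by the original row labels again
df_c1 = indexed.xs(value, level=1)

print(df_c1.index.tolist())     # [0, 2]
print(df_c1["score"].tolist())  # [3, 1]
```

Appending (rather than replacing) the index is what preserves the original row labels, which helps when the selected rows need to be matched back to `df_train` later.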
Method 3: df.query()
# '@' refers to a Python variable; without it, query() looks for a column named value
df_c1 = df_train.query('attri_1 == @value')
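A short sketch of `query()`, assuming a toy `df_train` with a hypothetical `score` column. It shows the `@` prefix for local variables, string literals written inline, `and` for combining conditions, and `in` as the query-string counterpart of `isin()`.

```python
import pandas as pd

df_train = pd.DataFrame({
    "attri_1": ["x", "y", "x", "z"],
    "score": [3, 5, 1, 4],
})
value = "x"
some_values = ["x", "z"]

# @value reads the local Python variable instead of a column
df_c1 = df_train.query("attri_1 == @value")

# Literals can be written inline; and / or combine conditions
df_c2 = df_train.query("attri_1 == 'x' and score > 2")

# 'in' inside a query string behaves like isin()
df_c3 = df_train.query("attri_1 in @some_values")

print(df_c1["score"].tolist())  # [3, 1]
print(df_c2["score"].tolist())  # [3]
print(df_c3["score"].tolist())  # [3, 1, 4]
```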
Reference
Stack Overflow. Available at https://stackoverflow.com/que...