Pandas Data Structures: DataFrame

Posted by CDA Data Analyst (CDA資料分析師) under Humanities, 2022-07-19

If you are new to pandas and want to understand its data structures, DataFrame is the one to learn first. Here is a detailed walkthrough.

A First Look at DataFrame

import numpy as np
import pandas as pd

data = {"name": ["Jack", "Tom", "LiSa"],
        "age": [20, 21, 18],
        "city": ["BeiJing", "TianJin", "ShenZhen"]}
print(data)
print("")
frame = pd.DataFrame(data)  # create the DataFrame
print(frame)
print("")
print(frame.index)    # view the row index
print("")
print(frame.columns)  # view the column index
print("")
print(frame.values)   # view the values

{'name': ['Jack', 'Tom', 'LiSa'], 'age': [20, 21, 18], 'city': ['BeiJing', 'TianJin', 'ShenZhen']}

   age      city  name
0   20   BeiJing  Jack
1   21   TianJin   Tom
2   18  ShenZhen  LiSa

RangeIndex(start=0, stop=3, step=1)

Index(['age', 'city', 'name'], dtype='object')

[[20 'BeiJing' 'Jack']
 [21 'TianJin' 'Tom']
 [18 'ShenZhen' 'LiSa']]

(This output comes from an older pandas release that sorted dictionary keys alphabetically. Current releases keep insertion order, so the columns appear as name, age, city.)

Creating a DataFrame

Method 1: from a dictionary. The dictionary keys become the column labels, and the values can be (1) lists, (2) ndarrays, or (3) Series.

# values are lists
data1 = {"a": [1, 2, 3],
         "b": [4, 5, 6],
         "c": [7, 8, 9]}
print(data1)
print("")
print(pd.DataFrame(data1))  # create the DataFrame
print("")
# Note: index sets the row labels; columns selects and orders the columns, and a column that is not in the data is filled with NaN
print(pd.DataFrame(data1, index=list("mnp"), columns=list("bcad")))
print("")

{'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}

   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9

   b  c  a    d
m  4  7  1  NaN
n  5  8  2  NaN
p  6  9  3  NaN

# values are ndarrays. Note: every array must have the same length, otherwise pandas raises an error
data2 = {"one": np.random.rand(3),
         "two": np.random.rand(3)}
print(data2)
print("")
print(pd.DataFrame(data2))

{'one': array([ 0.60720023,  0.30838024,  0.30678266]), 'two': array([ 0.21368784,  0.03797809,  0.41698718])}

        one       two
0  0.607200  0.213688
1  0.308380  0.037978
2  0.306783  0.416987
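The length rule for ndarray values is easy to check directly. The following is a minimal sketch, assuming two made-up arrays of lengths 3 and 4 (the name `bad` is illustrative only); it shows that pandas raises a `ValueError` rather than padding the shorter array.

```python
import numpy as np
import pandas as pd

# Hypothetical input: two arrays whose lengths differ (3 vs. 4).
bad = {"one": np.arange(3), "two": np.arange(4)}

try:
    pd.DataFrame(bad)
    raised = False
except ValueError as exc:
    # pandas refuses to build the frame instead of filling with NaN.
    raised = True
    print("ValueError:", exc)
```

If you need ragged columns, wrapping each array in a Series is the usual way out.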

# values are Series (labeled one-dimensional arrays). Note: the Series may differ in length; missing entries are filled with NaN
data3 = {"one": pd.Series(np.random.rand(4)),
         "two": pd.Series(np.random.rand(5))}
print(data3)
print("")
df3 = pd.DataFrame(data3)
print(df3)
print("")

{'one': 0    0.217639
1    0.921641
2    0.898810
3    0.933510
dtype: float64, 'two': 0    0.132789
1    0.099904
2    0.723495
3    0.719173
4    0.477456
dtype: float64}

        one       two
0  0.217639  0.132789
1  0.921641  0.099904
2  0.898810  0.723495
3  0.933510  0.719173
4       NaN  0.477456
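The NaN filling happens because Series are matched by index label, not by position. Here is a small sketch with made-up Series (`s1`, `s2`, and their labels are assumptions for illustration) whose labels only partly overlap.

```python
import pandas as pd

# Hypothetical Series: the labels overlap only on "b" and "c".
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# The resulting row index is the union of both label sets;
# a label missing from one Series gets NaN in that column.
aligned = pd.DataFrame({"one": s1, "two": s2})
print(aligned)
```

Row "a" has no value in column "two" and row "d" has none in column "one", so both cells are NaN.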

Method 2: directly from a two-dimensional array

arr = np.random.rand(12).reshape(3, 4)
print(arr)
print("")
df1 = pd.DataFrame(arr)
print(df1)
print("")
df2 = pd.DataFrame(arr, index=list("abc"), columns=["one", "two", "three", "four"])  # set the row and column labels with index and columns
print(df2)

[[ 0.85898536  0.40300549  0.80043098  0.87045042]
 [ 0.56918302  0.34589982  0.63733905  0.93012927]
 [ 0.47535281  0.93652147  0.57446896  0.59470213]]

          0         1         2         3
0  0.858985  0.403005  0.800431  0.870450
1  0.569183  0.345900  0.637339  0.930129
2  0.475353  0.936521  0.574469  0.594702

        one       two     three      four
a  0.858985  0.403005  0.800431  0.870450
b  0.569183  0.345900  0.637339  0.930129
c  0.475353  0.936521  0.574469  0.594702

Method 3: from a list of dictionaries

data = [{"one": 1, "two": 2}, {"one": 5, "two": 10, "three": 15}]  # each dictionary becomes one row of the DataFrame
print(data)
print("")
df1 = pd.DataFrame(data)
print(df1)
print("")
df2 = pd.DataFrame(data, index=list("ab"), columns=["one", "two", "three", "four"])
print(df2)

[{'one': 1, 'two': 2}, {'one': 5, 'two': 10, 'three': 15}]

   one  three  two
0    1    NaN    2
1    5   15.0   10

   one  two  three  four
a    1    2    NaN   NaN
b    5   10   15.0   NaN

(The first table again shows the alphabetical column order of an older pandas release; current releases list the columns in order of first appearance: one, two, three.)

Method 4: from a dictionary of dictionaries

# the outer keys become the columns; the inner keys become the row index
data = {"Jack": {"age": 1, "country": "China", "sex": "man"},
        "LiSa": {"age": 18, "country": "America", "sex": "women"},
        "Tom": {"age": 20, "country": "English"}}
df1 = pd.DataFrame(data)
print(df1)
print("")
# Note: here index cannot rename the inner keys (the row labels), but it can reorder them; a label that does not exist in the data produces a row of NaN
df2 = pd.DataFrame(data, index=["sex", "age", "country"])
print(df2)
print("")
df3 = pd.DataFrame(data, index=list("abc"))
print(df3)
print("")
# columns reorders the columns; a column label that does not exist in the data is filled with NaN
df4 = pd.DataFrame(data, columns=["Tom", "LiSa", "Jack", "TangMu"])
print(df4)

          Jack     LiSa      Tom
age          1       18       20
country  China  America  English
sex        man    women      NaN

          Jack     LiSa      Tom
sex        man    women      NaN
age          1       18       20
country  China  America  English

  Jack LiSa  Tom
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c  NaN  NaN  NaN

             Tom     LiSa   Jack TangMu
age           20       18      1    NaN
country  English  America  China    NaN
sex          NaN    women    man    NaN
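With a dictionary of dictionaries you often want one row per outer key (one row per person) instead of one column. A possible sketch, reusing the same data, uses `DataFrame.from_dict` with `orient="index"`; the variable name `people` is only illustrative.

```python
import pandas as pd

data = {"Jack": {"age": 1, "country": "China", "sex": "man"},
        "LiSa": {"age": 18, "country": "America", "sex": "women"},
        "Tom": {"age": 20, "country": "English"}}

# orient="index" makes the outer keys the row labels
# and the inner keys the columns.
people = pd.DataFrame.from_dict(data, orient="index")
print(people)
```

Tom has no "sex" entry, so that cell is NaN. Transposing the frame from Method 4 with `pd.DataFrame(data).T` gives the same layout, though the column dtypes can differ.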

Indexing a DataFrame

Selecting Rows and Columns

To select a column, use df["column label"] directly.

df = pd.DataFrame(np.random.rand(12).reshape(3, 4) * 100,
                  index=["one", "two", "three"], columns=["a", "b", "c", "d"])
print(df)
print("")
print(df["a"], " ", type(df["a"]))  # select one column (the result is a Series)
print("")
print(df[["a", "c"]], " ", type(df[["a", "c"]]))  # select several columns (the result is a DataFrame)

               a          b          c          d
one    92.905464  11.630358  19.518051  77.417377
two    91.107357   0.641600   4.913662  65.593182
three   3.152801  42.324671  14.030304  22.138608

one      92.905464
two      91.107357
three     3.152801
Name: a, dtype: float64

               a          c
one    92.905464  19.518051
two    91.107357   4.913662
three   3.152801  14.030304

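Rows cannot be selected with a plain label lookup such as df["one"], because that form looks for a column. Use df.loc["one"] instead: loc selects by label, and its first (or only) argument refers to rows.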

print(df)
print("")
print(df.loc["one"], " ", type(df.loc["one"]))  # select one row
print("")
print(df.loc[["one", "three"]], " ", type(df.loc[["one", "three"]]))  # select several non-adjacent rows
print("")

               a          b          c          d
one    92.905464  11.630358  19.518051  77.417377
two    91.107357   0.641600   4.913662  65.593182
three   3.152801  42.324671  14.030304  22.138608

a    92.905464
b    11.630358
c    19.518051
d    77.417377
Name: one, dtype: float64

               a          b          c          d
one    92.905464  11.630358  19.518051  77.417377
three   3.152801  42.324671  14.030304  22.138608
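loc also accepts a second argument for columns, so rows and columns can be picked in one step. A short sketch with deterministic values (`np.arange` is used here instead of random numbers purely so the results are predictable):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=["one", "two", "three"], columns=["a", "b", "c", "d"])

cell = df.loc["two", "c"]                      # a single value
row_part = df.loc["one", ["a", "d"]]           # part of one row, as a Series
block = df.loc[["one", "three"], ["a", "c"]]   # a sub-table, as a DataFrame
print(cell)
print(row_part)
print(block)
```

The general form is df.loc[row labels, column labels]; either side can be a single label, a list, or a slice.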

loc also supports slicing by row label, and the slice includes the end label: df.loc["one": "three"]

df = pd.DataFrame(np.random.rand(16).reshape(4, 4) * 100, index=["one", "two", "three", "four"],
                  columns=["a", "b", "c", "d"])
print(df)
print("")
print(df.loc["one": "three"])
print("")
print(df[:3])  # a plain slice selects consecutive rows by position (best avoided, since it is easy to confuse with column selection)

               a          b          c          d
one    65.471894  19.137274  31.680635  41.659808
two    31.570587  45.575849  37.739644   5.140845
three  54.930986  68.232707  17.215544  70.765401
four   45.591798  63.274956  74.056045   2.466652

               a          b          c          d
one    65.471894  19.137274  31.680635  41.659808
two    31.570587  45.575849  37.739644   5.140845
three  54.930986  68.232707  17.215544  70.765401

               a          b          c          d
one    65.471894  19.137274  31.680635  41.659808
two    31.570587  45.575849  37.739644   5.140845
three  54.930986  68.232707  17.215544  70.765401

iloc also works on rows, but it takes integer positions instead of labels, and its slices exclude the end position.

print(df)
print("")
print(df.iloc[0])  # select one row
print("")
print(df.iloc[[0, 2]])  # select several non-adjacent rows
print("")
print(df.iloc[0:3])  # the end position is excluded

               a          b          c          d
one    65.471894  19.137274  31.680635  41.659808
two    31.570587  45.575849  37.739644   5.140845
three  54.930986  68.232707  17.215544  70.765401
four   45.591798  63.274956  74.056045   2.466652

a    65.471894
b    19.137274
c    31.680635
d    41.659808
Name: one, dtype: float64

               a          b          c          d
one    65.471894  19.137274  31.680635  41.659808
three  54.930986  68.232707  17.215544  70.765401

               a          b          c          d
one    65.471894  19.137274  31.680635  41.659808
two    31.570587  45.575849  37.739644   5.140845
three  54.930986  68.232707  17.215544  70.765401
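The difference in end-point handling between loc and iloc is a common source of off-by-one mistakes. A minimal side-by-side sketch, assuming a small frame with predictable values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2),
                  index=["one", "two", "three", "four"], columns=["a", "b"])

by_label = df.loc["one":"three"]   # label slice: "three" is included, 3 rows
by_pos = df.iloc[0:2]              # position slice: position 2 is excluded, 2 rows
last_col = df.iloc[:, -1]          # iloc takes column positions too
print(len(by_label), len(by_pos))
```

Like loc, iloc accepts a second argument for columns, here given as positions rather than labels.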

Boolean Indexing

df = pd.DataFrame(np.random.rand(16).reshape(4, 4) * 100, index=["one", "two", "three", "four"],
                  columns=["a", "b", "c", "d"])
print(df)
print("")
d1 = df > 50  # d1 is a Boolean mask
print(d1)
print("")
print(df[d1])  # keeps the values where d1 is True; positions where it is False become NaN
print("")

               a          b          c          d
one    91.503673  74.080822  85.274682  80.788609
two    49.670055  42.221393  36.674490  69.272958
three  78.349843  68.090150  22.326223  93.984369
four   79.057146  77.687246  32.304265   0.567816

           a      b      c      d
one     True   True   True   True
two    False  False  False   True
three   True   True  False   True
four    True   True  False  False

               a          b          c          d
one    91.503673  74.080822  85.274682  80.788609
two          NaN        NaN        NaN  69.272958
three  78.349843  68.090150        NaN  93.984369
four   79.057146  77.687246        NaN        NaN

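Use a comparison on a single column as the Boolean index to keep every column of the rows where it is True. Note: a comparison on several columns does not filter rows this way; it produces a DataFrame mask instead.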

# convert to integers before building the DataFrame; recent pandas versions reject dtype=np.int64 for floats that are not whole numbers
df = pd.DataFrame((np.random.rand(16).reshape(4, 4) * 100).astype(np.int64), index=["one", "two", "three", "four"],
                  columns=["a", "b", "c", "d"])
print(df)
print("")
d2 = df["b"] > 50
print(d2)
print("")
print(df[d2])

        a   b   c   d
one    27  18  47  61
two    26  35  16  78
three  80  98  94  41
four   85   3  47  90

one      False
two      False
three     True
four     False
Name: b, dtype: bool

        a   b   c   d
three  80  98  94  41

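Using a comparison on several columns as the Boolean index returns the values where the mask is True and NaN where it is False; columns that are absent from the mask are filled entirely with NaN.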

# convert to integers before building the DataFrame; recent pandas versions reject dtype=np.int64 for floats that are not whole numbers
df = pd.DataFrame((np.random.rand(16).reshape(4, 4) * 100).astype(np.int64), index=["one", "two", "three", "four"],
                  columns=["a", "b", "c", "d"])
print(df)
print("")
d3 = df[["a", "c"]] > 50
print(d3)
print("")
print(df[d3])

        a   b   c   d
one    49  82  32  39
two    78   2  24  84
three   6  84  84  69
four   21  89  16  77

           a      c
one    False  False
two     True  False
three  False   True
four   False  False

          a   b     c   d
one     NaN NaN   NaN NaN
two    78.0 NaN   NaN NaN
three   NaN NaN  84.0 NaN
four    NaN NaN   NaN NaN
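To filter rows on more than one column, combine single-column conditions with & (and) or | (or). The sketch below rebuilds the table printed above by hand; each condition needs its own parentheses because & and | bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame({"a": [49, 78, 6, 21],
                   "b": [82, 2, 84, 89],
                   "c": [32, 24, 84, 16],
                   "d": [39, 84, 69, 77]},
                  index=["one", "two", "three", "four"])

both = df[(df["a"] > 20) & (df["b"] > 50)]      # rows where both conditions hold
either = df[(df["a"] > 50) | (df["c"] > 50)]    # rows where at least one holds
# Reduce a multi-column mask to one Boolean per row with any(axis=1).
any_big = df[(df[["a", "c"]] > 50).any(axis=1)]
print(both)
print(either)
```

Using `.any(axis=1)` or `.all(axis=1)` turns a multi-column mask into a row filter, which keeps whole rows instead of masking individual cells.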

Chained Indexing

print(df)

        a   b   c   d
one    49  82  32  39
two    78   2  24  84
three   6  84  84  69
four   21  89  16  77

print(df["a"].loc[["one", "three"]])  # select the column, then the rows
print("")
print(df[["a", "c"]].iloc[0:3])

one      49
three     6
Name: a, dtype: int64

        a   c
one    49  32
two    78  24
three   6  84

print(df.loc[["one", "three"]][["a", "c"]])  # select the rows, then the columns

        a   c
one    49  32
three   6  84

print(df > 50)
print("")
print(df[df > 50])
print("")
print(df[df > 50][["a", "b"]])

           a      b      c      d
one    False   True  False  False
two     True  False  False   True
three  False   True   True   True
four   False   True  False   True

          a     b     c     d
one     NaN  82.0   NaN   NaN
two    78.0   NaN   NaN  84.0
three   NaN  84.0  84.0  69.0
four    NaN  89.0   NaN  77.0

          a     b
one     NaN  82.0
two    78.0   NaN
three   NaN  84.0
four    NaN  89.0
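Chaining two selections is fine for reading, but a single loc call does the same job in one step and is the safer choice for assignment, since assigning through a chain may end up changing a temporary copy rather than df itself. A sketch under that assumption, using the same table rebuilt by hand:

```python
import pandas as pd

df = pd.DataFrame({"a": [49, 78, 6, 21],
                   "b": [82, 2, 84, 89],
                   "c": [32, 24, 84, 16],
                   "d": [39, 84, 69, 77]},
                  index=["one", "two", "three", "four"])

# Same result as df.loc[["one", "three"]][["a", "c"]], in one step.
sub = df.loc[["one", "three"], ["a", "c"]]

# Assignment through a single loc call modifies df itself:
# set column "b" to 0 on the rows where "a" is greater than 50.
df.loc[df["a"] > 50, "b"] = 0
print(sub)
print(df)
```

Only row "two" has a value above 50 in column "a", so only its "b" value changes.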

Basic DataFrame Techniques

import numpy as np
import pandas as pd

arr = np.random.rand(16).reshape(8, 2) * 10
# print(arr)
print("")
print(len(arr))
print("")
df = pd.DataFrame(arr, index=[chr(i) for i in range(97, 97 + len(arr))], columns=["one", "two"])  # chr(97) is "a", so the row labels run from a to h
print(df)

        one       two
a  2.129959  1.827002
b  8.631212  0.423903
c  6.262012  3.851107
d  6.890305  9.543065
e  6.883742  3.643955
f  2.740878  6.851490
g  6.242513  7.402237
h  9.226572  3.179664

Viewing Data

print(df)
print("")
print(df.head(2))  # view the first rows; shows 5 by default
print("")
print(df.tail(3))  # view the last rows; shows 5 by default

        one       two
a  2.129959  1.827002
b  8.631212  0.423903
c  6.262012  3.851107
d  6.890305  9.543065
e  6.883742  3.643955
f  2.740878  6.851490
g  6.242513  7.402237
h  9.226572  3.179664

        one       two
a  2.129959  1.827002
b  8.631212  0.423903

        one       two
f  2.740878  6.851490
g  6.242513  7.402237
h  9.226572  3.179664

Transposing

print(df)

        one       two
a  2.129959  1.827002
b  8.631212  0.423903
c  6.262012  3.851107
d  6.890305  9.543065
e  6.883742  3.643955
f  2.740878  6.851490
g  6.242513  7.402237
h  9.226572  3.179664

print(df.T)

            a         b         c         d         e         f         g  \
one  2.129959  8.631212  6.262012  6.890305  6.883742  2.740878  6.242513
two  1.827002  0.423903  3.851107  9.543065  3.643955  6.851490  7.402237

            h
one  9.226572
two  3.179664
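Transposing swaps the row and column labels along with the values. The sketch below uses a small made-up frame with one text column and one numeric column to show the label swap; with mixed column types like this, the transposed columns generally can no longer keep a numeric dtype.

```python
import pandas as pd

# Hypothetical two-row frame with mixed column types.
df = pd.DataFrame({"name": ["Jack", "Tom"], "age": [20, 21]})

t = df.T
print(t)
print(t.dtypes)
```

The old column labels (name, age) become the row index of `t`, and the old row labels (0, 1) become its columns.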

Adding and Modifying

df = pd.DataFrame(np.random.rand(16).reshape(4, 4), index=["one", "two", "three", "four"], columns=["a", "b", "c", "d"])
print(df)
print("")
df.loc["five"] = 100  # add a row
print(df)
print("")
df["e"] = 10  # add a column
print(df)
print("")
df["e"] = 101  # modify a column
print(df)
print("")
df.loc["five"] = 111  # modify a row
print(df)
print("")

              a         b         c         d
one    0.708481  0.285426  0.355058  0.990070
two    0.199559  0.733047  0.322982  0.791169
three  0.198043  0.801163  0.356082  0.857501
four   0.430182  0.020549  0.896011  0.503088

                a           b           c           d
one      0.708481    0.285426    0.355058    0.990070
two      0.199559    0.733047    0.322982    0.791169
three    0.198043    0.801163    0.356082    0.857501
four     0.430182    0.020549    0.896011    0.503088
five   100.000000  100.000000  100.000000  100.000000

                a           b           c           d   e
one      0.708481    0.285426    0.355058    0.990070  10
two      0.199559    0.733047    0.322982    0.791169  10
three    0.198043    0.801163    0.356082    0.857501  10
four     0.430182    0.020549    0.896011    0.503088  10
five   100.000000  100.000000  100.000000  100.000000  10

                a           b           c           d    e
one      0.708481    0.285426    0.355058    0.990070  101
two      0.199559    0.733047    0.322982    0.791169  101
three    0.198043    0.801163    0.356082    0.857501  101
four     0.430182    0.020549    0.896011    0.503088  101
five   100.000000  100.000000  100.000000  100.000000  101

                a           b           c           d    e
one      0.708481    0.285426    0.355058    0.990070  101
two      0.199559    0.733047    0.322982    0.791169  101
three    0.198043    0.801163    0.356082    0.857501  101
four     0.430182    0.020549    0.896011    0.503088  101
five   111.000000  111.000000  111.000000  111.000000  111
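Assigning a single number fills the whole row or column with it. In practice you usually want a different value per cell, which works the same way with a list or a computed Series. A small sketch with made-up values (the column name `total` is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=["one", "two"], columns=["a", "b", "c"])

df["d"] = [10, 20]                    # new column, one value per row
df["total"] = df["a"] + df["b"]       # new column computed from existing ones
df.loc["three"] = [7, 8, 9, 30, 15]   # new row, one value per column
df.loc["one", "a"] = 99               # modify a single cell
print(df)
```

The list length has to match the number of rows (for a column) or columns (for a row), otherwise pandas raises an error.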

Deleting: del (removes a column) / drop (removes rows by default; pass axis=1 to remove columns)

df = pd.DataFrame(np.random.rand(16).reshape(4, 4), index=["one", "two", "three", "four"], columns=["a", "b", "c", "d"])
print(df)
print("")
del df["a"]  # delete a column; this modifies the original DataFrame
print(df)

              a         b         c         d
one    0.339979  0.577661  0.108308  0.482164
two    0.374043  0.102067  0.660970  0.786986
three  0.384832  0.076563  0.529472  0.358780
four   0.938592  0.852895  0.466709  0.938307

              b         c         d
one    0.577661  0.108308  0.482164
two    0.102067  0.660970  0.786986
three  0.076563  0.529472  0.358780
four   0.852895  0.466709  0.938307

df = pd.DataFrame(np.random.rand(16).reshape(4, 4), index=["one", "two", "three", "four"], columns=["a", "b", "c", "d"])
print(df)
print("")
d1 = df.drop("one")  # delete a row and return a new DataFrame; the original is unchanged
print(d1)
print("")
print(df)

              a         b         c         d
one    0.205438  0.324132  0.401131  0.368300
two    0.471426  0.671785  0.837956  0.097416
three  0.888816  0.451950  0.137032  0.568844
four   0.524813  0.448306  0.875787  0.479477

              a         b         c         d
two    0.471426  0.671785  0.837956  0.097416
three  0.888816  0.451950  0.137032  0.568844
four   0.524813  0.448306  0.875787  0.479477

              a         b         c         d
one    0.205438  0.324132  0.401131  0.368300
two    0.471426  0.671785  0.837956  0.097416
three  0.888816  0.451950  0.137032  0.568844
four   0.524813  0.448306  0.875787  0.479477

df = pd.DataFrame(np.random.rand(16).reshape(4, 4), index=["one", "two", "three", "four"], columns=["a", "b", "c", "d"])
print(df)
print("")
d2 = df.drop("a", axis=1)  # delete a column and return a new DataFrame; the original is unchanged
print(d2)
print("")
print(df)

              a         b         c         d
one    0.939552  0.613218  0.357056  0.534264
two    0.110583  0.602123  0.990186  0.149132
three  0.756016  0.897848  0.176100  0.204789
four   0.655573  0.819009  0.094322  0.656406

              b         c         d
one    0.613218  0.357056  0.534264
two    0.602123  0.990186  0.149132
three  0.897848  0.176100  0.204789
four   0.819009  0.094322  0.656406

              a         b         c         d
one    0.939552  0.613218  0.357056  0.534264
two    0.110583  0.602123  0.990186  0.149132
three  0.756016  0.897848  0.176100  0.204789
four   0.655573  0.819009  0.094322  0.656406
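drop also takes a list of labels, and the `index=` and `columns=` keywords say which axis is meant without having to remember axis numbers. A brief sketch with predictable values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  index=["one", "two", "three", "four"], columns=["a", "b", "c", "d"])

fewer_rows = df.drop(["one", "three"])         # drop several rows at once
fewer_cols = df.drop(columns=["a", "d"])       # same as df.drop(["a", "d"], axis=1)
both = df.drop(index="two", columns="b")       # drop a row and a column together
print(fewer_rows)
print(fewer_cols)
print(both)
```

As in the examples above, each call returns a new DataFrame and leaves df untouched.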

Sorting

Sort by the values of the given column with .sort_values(['column']). Each row moves together with its value in that column.

# a single column
df = pd.DataFrame(np.random.rand(16).reshape(4, 4), columns=["a", "b", "c", "d"])
print(df)
print("")
print(df.sort_values(['a']))  # ascending by default
print("")
print(df.sort_values(['a'], ascending=False))  # descending

          a         b         c         d
0  0.616386  0.416094  0.072445  0.140167
1  0.263227  0.079205  0.520708  0.866316
2  0.665673  0.836688  0.733966  0.310229
3  0.405777  0.090530  0.991211  0.712312

          a         b         c         d
1  0.263227  0.079205  0.520708  0.866316
3  0.405777  0.090530  0.991211  0.712312
0  0.616386  0.416094  0.072445  0.140167
2  0.665673  0.836688  0.733966  0.310229

          a         b         c         d
2  0.665673  0.836688  0.733966  0.310229
0  0.616386  0.416094  0.072445  0.140167
3  0.405777  0.090530  0.991211  0.712312
1  0.263227  0.079205  0.520708  0.866316
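sort_values takes a list, so it can also sort by several columns, with a separate direction for each. A minimal sketch with made-up data (the `team` and `score` columns are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"team": ["b", "a", "b", "a"],
                   "score": [3, 1, 2, 4]})

# Sort by team ascending, then by score descending within each team.
ordered = df.sort_values(["team", "score"], ascending=[True, False])
print(ordered)
```

The first column in the list is the primary key; later columns only break ties among rows that are equal on the earlier ones.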

Sort by index with .sort_index()

df = pd.DataFrame(np.random.rand(16).reshape(4, 4), index=[2, 1, 3, 0], columns=["a", "b", "c", "d"])
print(df)
print("")
print(df.sort_index())  # ascending by default
print("")
print(df.sort_index(ascending=False))  # descending

          a         b         c         d
2  0.669311  0.118176  0.635512  0.248388
1  0.752321  0.935779  0.572554  0.274019
3  0.701334  0.354684  0.592998  0.402686
0  0.548317  0.966295  0.191219  0.307908

          a         b         c         d
0  0.548317  0.966295  0.191219  0.307908
1  0.752321  0.935779  0.572554  0.274019
2  0.669311  0.118176  0.635512  0.248388
3  0.701334  0.354684  0.592998  0.402686

          a         b         c         d
3  0.701334  0.354684  0.592998  0.402686
2  0.669311  0.118176  0.635512  0.248388
1  0.752321  0.935779  0.572554  0.274019
0  0.548317  0.966295  0.191219  0.307908

df = pd.DataFrame(np.random.rand(16).reshape(4, 4), index=["x", "z", "y", "t"], columns=["a", "b", "c", "d"])
print(df)
print("")
print(df.sort_index())  # sorted alphabetically

          a         b         c         d
x  0.717421  0.206383  0.757656  0.720580
z  0.969988  0.551812  0.210200  0.083031
y  0.956637  0.759216  0.350744  0.335287
t  0.846718  0.207411  0.936231  0.891330

          a         b         c         d
t  0.846718  0.207411  0.936231  0.891330
x  0.717421  0.206383  0.757656  0.720580
y  0.956637  0.759216  0.350744  0.335287
z  0.969988  0.551812  0.210200  0.083031

df = pd.DataFrame(np.random.rand(16).reshape(4, 4), index=["three", "one", "four", "two"], columns=["a", "b", "c", "d"])
print(df)
print("")
print(df.sort_index())  # sorted as strings, character by character ("three" comes before "two"), not by numeric meaning

              a         b         c         d
three  0.173818  0.902347  0.106037  0.303450
one    0.591793  0.526785  0.101916  0.884698
four   0.685250  0.364044  0.932338  0.668774
two    0.240763  0.260322  0.722891  0.634825

              a         b         c         d
four   0.685250  0.364044  0.932338  0.668774
one    0.591793  0.526785  0.101916  0.884698
three  0.173818  0.902347  0.106037  0.303450
two    0.240763  0.260322  0.722891  0.634825
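Two details of sort_index are worth a quick check: it can sort the column labels with axis=1, and string labels are compared character by character, so digits stored as text and mixed capitalisation may not sort the way you expect. A short sketch with made-up labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=["b", "a"], columns=["z", "x", "y"])

by_rows = df.sort_index()         # sort the row labels
by_cols = df.sort_index(axis=1)   # sort the column labels instead

# String labels sort as text: "10" comes before "9",
# and uppercase letters come before lowercase ones.
labels = pd.Series(range(4), index=["10", "9", "two", "Two"]).sort_index()
print(by_cols)
print(labels)
```

If the labels are really numbers, converting the index to a numeric type before sorting gives the numeric order.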
