Python 3 pandas: Series Usage (Basics)
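Using the pandas Series
- Three ways to construct/initialize a Series:
- 1) Building a Series from a list
- 1.a) By default pandas uses 0 to n-1 as the index, but you can specify your own; think of the index as the keys of a dict
- 2) Building a Series from a dict, since a Series is essentially a key-value structure
- 3) Building a Series from a NumPy array
- Selecting data:
- 1) A Series can be sliced by position, just like a list
- 2) A Series also works like a dict: the index defined earlier is used to select data
- 3) Boolean indexing, much like in NumPy
- Assigning to Series elements:
- 1) Assigning directly by index label
- 2) The boolean indexing above also works for assignment
- Arithmetic
- Missing data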
Three ways to construct/initialize a Series:
1) Building a Series from a list
```python
import pandas as pd

my_list = [7, 'Beijing', '19大', 3.1415, -10000, 'Happy']
s = pd.Series(my_list)
print(type(s))
print(s)
# <class 'pandas.core.series.Series'>
# 0          7
# 1    Beijing
# 2        19大
# 3     3.1415
# 4     -10000
# 5      Happy
# dtype: object
```
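The `dtype: object` above comes from mixing strings and numbers. A minimal sketch comparing that with homogeneous lists (the variable names are just for illustration): mixed types fall back to the generic `object` dtype, while uniform numbers get a numeric dtype.

```python
import pandas as pd

mixed = pd.Series([7, 'Beijing', 3.1415])  # strings and numbers -> object
ints = pd.Series([7, 19, -10000])          # all integers -> int64
floats = pd.Series([7, 3.1415])            # int mixed with float -> float64

print(mixed.dtype)        # object
print(ints.dtype)         # int64
print(floats.dtype)       # float64
print(list(ints.index))   # [0, 1, 2]: the default index runs from 0 to n-1
```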
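1.a) By default pandas uses 0 to n-1 (where n is the number of elements) as the Series index, but you can also specify your own. You can think of the index as the keys of a dict.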
```python
import pandas as pd

s = pd.Series([7, 'Beijing', '19大', 3.1415, -10000, 'Happy'],
              index=['A', 'B', 'C', 'D', 'E', 'F'])
print(s)
# A          7
# B    Beijing
# C        19大
# D     3.1415
# E     -10000
# F      Happy
# dtype: object
```
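Since the index plays the role of dict keys, each value needs exactly one label. A minimal sketch with illustrative values; in the pandas versions I know of, a length mismatch between values and index raises `ValueError`.

```python
import pandas as pd

values = [7, 'Beijing', '19大']
s = pd.Series(values, index=['A', 'B', 'C'])
print(s.index.tolist())  # ['A', 'B', 'C']
print(s['B'])            # Beijing

# One label per value is required; a shorter index is rejected
mismatch_raised = False
try:
    pd.Series(values, index=['A', 'B'])
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```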
2) Building a Series from a dict, since a Series is essentially a key-value structure
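```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')  # None becomes NaN, so the dtype is float64
print(apts)
# Beijing      55000.0
# Shanghai     60000.0
# shenzhen     50000.0
# Hangzhou     20000.0
# Guangzhou    45000.0
# Suzhou           NaN
# Name: income, dtype: float64
# pandas >= 0.23 on Python 3.6+ keeps the dict's insertion order;
# older versions sorted the keys instead.
```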
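When both a dict and `index=` are passed, pandas uses `index` to pick and order the keys: keys not listed are dropped, and listed labels absent from the dict become NaN. A short sketch with a trimmed-down `cities` dict (Chongqing is included only to show the NaN case):

```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'Hangzhou': 20000}

# index= selects and orders the keys: Hangzhou is left out, Chongqing is not in the dict
sub = pd.Series(cities, index=['Shanghai', 'Beijing', 'Chongqing'])
print(sub.tolist())  # [60000.0, 55000.0, nan]
print(sub.dtype)     # float64: the ints are upcast so the column can hold NaN
```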
3) Building a Series from a NumPy array
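```python
import numpy as np
import pandas as pd

d = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(d)
# Sample output (the numbers are random and differ on every run):
# a   -0.329401
# b   -0.435921
# c   -0.232267
# d   -0.846713
# e   -0.406585
# dtype: float64
```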
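Because `np.random.randn` produces new numbers each run, a seeded generator helps when the output must be reproducible. A sketch using NumPy's `default_rng` (the seed 0 is arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seeded generator: same numbers every run
d = pd.Series(rng.standard_normal(5), index=['a', 'b', 'c', 'd', 'e'])

# A fresh generator with the same seed reproduces exactly the same values
again = pd.Series(np.random.default_rng(seed=0).standard_normal(5), index=d.index)
print(d.equals(again))  # True
print(d.dtype)          # float64, inherited from the NumPy array
```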
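Selecting data:
1) A Series can be treated like a list and sliced by position in all the usual ways. In current pandas, use .iloc for positional access: plain apts[3] on a Series with string labels is deprecated.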
```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts)
# Beijing      55000.0
# Shanghai     60000.0
# shenzhen     50000.0
# Hangzhou     20000.0
# Guangzhou    45000.0
# Suzhou           NaN
# Name: income, dtype: float64

print(apts.iloc[3])          # single element by position
# 20000.0
print(apts.iloc[[3, 4, 1]])  # several positions, in the order given
# Hangzhou     20000.0
# Guangzhou    45000.0
# Shanghai     60000.0
# Name: income, dtype: float64
print(apts.iloc[1:])         # everything from position 1 on
# Shanghai     60000.0
# shenzhen     50000.0
# Hangzhou     20000.0
# Guangzhou    45000.0
# Suzhou           NaN
# Name: income, dtype: float64
print(apts.iloc[:-2])        # drop the last two
# Beijing     55000.0
# Shanghai    60000.0
# shenzhen    50000.0
# Hangzhou    20000.0
# Name: income, dtype: float64

# Addition aligns by label, not by position: a label missing on either side gives NaN
print(apts.iloc[1:] + apts.iloc[:-1])
# Beijing           NaN
# Guangzhou     90000.0
# Hangzhou      40000.0
# Shanghai     120000.0
# Suzhou            NaN
# shenzhen     100000.0
# Name: income, dtype: float64
```
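Besides positional `.iloc`, pandas has label-based `.loc`. One detail worth knowing: a label slice includes its end point, while a positional slice excludes it. A small sketch (Suzhou is omitted here so the values stay integers):

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
                  'Hangzhou': 20000, 'Guangzhou': 45000}, name='income')

by_pos = apts.iloc[1:3]                     # positions 1 and 2; end position excluded
by_label = apts.loc['Shanghai':'Hangzhou']  # labels; end label INCLUDED
print(by_pos.tolist())    # [60000, 50000]
print(by_label.tolist())  # [60000, 50000, 20000]
```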
2) A Series also behaves like a dict: the index defined earlier is what you use to select data
```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')

print(apts['Shanghai'])       # look up by label
# 60000.0
print('Hangzhou' in apts)     # `in` checks the index labels
# True
print('Chongqing' in apts)
# False
```
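Two more dict-like habits carry over: `Series.get` returns a default instead of raising `KeyError`, and `in` tests the index labels, not the values. A short sketch:

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000}, name='income')

print(apts.get('Chongqing', 0))  # 0: missing label -> default, no KeyError
print(55000 in apts)             # False: `in` looks at the labels (keys)
print(55000 in apts.values)      # True: check the values explicitly
```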
3) Boolean indexing, which works much like in NumPy
```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')

at_most_50000 = (apts <= 50000)  # boolean Series; NaN compares as False
print(apts[at_most_50000])       # keep only the rows where the mask is True
# shenzhen     50000.0
# Hangzhou     20000.0
# Guangzhou    45000.0
# Name: income, dtype: float64
```
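Masks can be combined with `&` (and), `|` (or) and `~` (not), as in NumPy; each comparison needs its own parentheses because these operators bind more tightly than `<=`. A sketch selecting a range, plus the `between` shortcut:

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
                  'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}, name='income')

mid_range = apts[(apts >= 40000) & (apts <= 55000)]
print(mid_range.index.tolist())        # ['Beijing', 'shenzhen', 'Guangzhou']
print(apts.between(40000, 55000).sum())  # 3: between() includes both ends by default
```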
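Note: a Series has aggregation methods such as mean(), median(), max() and min(), similar to NumPy's, except that missing values (NaN) are skipped by default: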
```python
print(apts.mean())  # (55000 + 60000 + 50000 + 20000 + 45000) / 5; Suzhou's NaN is skipped
# 46000.0
```
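The other aggregations behave the same way. A short sketch on the same data; `count()` shows how many values actually take part, and `skipna=False` switches the NaN-skipping off:

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
                  'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}, name='income')

print(apts.median())            # 50000.0 (middle of the 5 non-missing values)
print(apts.max(), apts.min())   # 60000.0 20000.0
print(apts.count())             # 5: only non-missing values are counted
print(apts.mean(skipna=False))  # nan: NaN propagates when skipping is disabled
```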
Assigning to Series elements:
1) Assigning directly by index label
```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print('Old income of shenzhen:{}'.format(apts['shenzhen']))
# Old income of shenzhen:50000.0

apts['shenzhen'] = 70000  # overwrite the value stored under this label
print(apts)
print('New income of shenzhen:{}'.format(apts['shenzhen']))
# Beijing      55000.0
# Shanghai     60000.0
# shenzhen     70000.0
# Hangzhou     20000.0
# Guangzhou    45000.0
# Suzhou           NaN
# Name: income, dtype: float64
# New income of shenzhen:70000.0
```
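Assigning to a label that does not exist yet does not raise an error, just as with a dict: pandas appends a new entry. A minimal sketch (Chongqing and Tianjin are sample labels):

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000}, name='income')

apts['Chongqing'] = 30000    # new label -> appended at the end
apts.loc['Tianjin'] = 40000  # .loc does the same, explicitly by label
print(apts.index.tolist())   # ['Beijing', 'Shanghai', 'Chongqing', 'Tianjin']
```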
2) Don't forget the boolean indexing shown above: it also works for assignment
```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
# New income of shenzhen:70000.0

less_than_50000 = (apts < 50000)  # boolean mask; NaN compares as False
print(less_than_50000)
# Beijing      False
# Shanghai     False
# shenzhen     False
# Hangzhou      True
# Guangzhou     True
# Suzhou       False
# Name: income, dtype: bool

apts[less_than_50000] = 40000  # assign only where the mask is True
print(apts)
# Beijing      55000.0
# Shanghai     60000.0
# shenzhen     70000.0
# Hangzhou     40000.0
# Guangzhou    40000.0
# Suzhou           NaN
# Name: income, dtype: float64
```
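If the original Series should stay untouched, `Series.mask` does the same replacement but returns a new Series. A sketch of that alternative (not what the code above does, which modifies `apts` in place):

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 70000,
                  'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}, name='income')

# mask() replaces values where the condition is True and returns a NEW Series
floored = apts.mask(apts < 50000, 40000)
print(floored.tolist())  # [55000.0, 60000.0, 70000.0, 40000.0, 40000.0, nan]
print(apts['Hangzhou'])  # 20000.0: the original is unchanged
```

Its mirror image, `where()`, keeps the values where the condition is True and replaces the rest.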
Arithmetic
```python
import numpy as np
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000 = (apts < 50000)
apts[less_than_50000] = 40000
print(apts)

# Element-wise arithmetic, as with NumPy arrays; NaN stays NaN
print(apts / 2)
print(apts ** 1.5)
print(np.log(apts))  # NumPy functions apply element-wise and keep the index

apts2 = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'shenzhen': 6000,
                   'Tianjin': 40000, 'Guangzhou': 7000, 'Chongqing': 30000})
print(apts2)
# Addition aligns on index labels; a label present in only one Series yields NaN
print(apts + apts2)
```
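If a label missing on one side should count as 0 rather than produce NaN, the method form `add` accepts a `fill_value`. A minimal sketch with two small sample Series:

```python
import pandas as pd

a = pd.Series({'Beijing': 55000, 'Hangzhou': 40000})
b = pd.Series({'Beijing': 10000, 'Tianjin': 40000})

print((a + b).to_dict())
# {'Beijing': 65000.0, 'Hangzhou': nan, 'Tianjin': nan}
print(a.add(b, fill_value=0).to_dict())
# {'Beijing': 65000.0, 'Hangzhou': 40000.0, 'Tianjin': 40000.0}
```

`sub`, `mul` and `div` accept `fill_value` the same way.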
Missing data
```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
less_than_50000 = (apts < 50000)
apts[less_than_50000] = 40000
print(apts)
# Beijing      55000.0
# Shanghai     60000.0
# shenzhen     70000.0
# Hangzhou     40000.0
# Guangzhou    40000.0
# Suzhou           NaN
# Name: income, dtype: float64

apts2 = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'shenzhen': 6000,
                   'Tianjin': 40000, 'Guangzhou': 7000, 'Chongqing': 30000})
print(apts2)
# Beijing      10000
# Shanghai      8000
# shenzhen      6000
# Tianjin      40000
# Guangzhou     7000
# Chongqing    30000
# dtype: int64

print('Hangzhou' in apts)   # True
print('Hangzhou' in apts2)  # False

print(apts.notnull())  # boolean mask: True where a value is present
# Beijing       True
# Shanghai      True
# shenzhen      True
# Hangzhou      True
# Guangzhou     True
# Suzhou       False
# Name: income, dtype: bool

print(apts.isnull())   # the inverse: True where the value is missing
# Beijing      False
# Shanghai     False
# shenzhen     False
# Hangzhou     False
# Guangzhou    False
# Suzhou        True
# Name: income, dtype: bool

print(apts[apts.isnull()])  # select the missing entries with the mask
# Suzhou   NaN
# Name: income, dtype: float64

apts = apts + apts2  # labels present in only one Series produce NaN
print(apts)
# Beijing      65000.0
# Chongqing        NaN
# Guangzhou    47000.0
# Hangzhou         NaN
# Shanghai     68000.0
# Suzhou           NaN
# Tianjin          NaN
# shenzhen     76000.0
# dtype: float64

apts[apts.isnull()] = apts.mean()  # fill the gaps with the mean (not the median) of the other values
print(apts)
# Beijing      65000.0
# Chongqing    64000.0
# Guangzhou    47000.0
# Hangzhou     64000.0
# Shanghai     68000.0
# Suzhou       64000.0
# Tianjin      64000.0
# shenzhen     76000.0
# dtype: float64
```
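pandas also has dedicated methods for the last step: `fillna` replaces missing values and `dropna` discards them, both returning a new Series. A short sketch on a small sample Series with one gap:

```python
import numpy as np
import pandas as pd

s = pd.Series({'Beijing': 65000.0, 'Chongqing': np.nan, 'Guangzhou': 47000.0,
               'Shanghai': 68000.0, 'shenzhen': 76000.0})

filled = s.fillna(s.mean())  # same effect as s[s.isnull()] = s.mean(), but non-destructive
dropped = s.dropna()         # alternatively, drop the missing entries

print(filled['Chongqing'])     # 64000.0
print(dropped.index.tolist())  # ['Beijing', 'Guangzhou', 'Shanghai', 'shenzhen']
print(s.isnull().sum())        # 1: the original Series still has its gap
```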
