Python pandas: a detailed guide to apply, map, and applymap

Official pandas documentation: http://pandas.pydata.org/pandas-docs/stable/index.html

Purpose: applying functions to data in pandas

1. Using apply (the most flexible of the three)

Common usage:

    import pandas as pd
    import numpy as np

    dates = pd.date_range('20130101', periods=6)
    # index and columns set the row labels and the column labels
    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
    df

                       A         B         C         D
    2013-01-01  0.345018 -1.152533  0.579335  0.785159
    2013-01-02  0.345413 -1.789114 -1.291485  0.748488
    2013-01-03 -0.463054  0.836781 -0.534286  0.408583
    2013-01-04  1.056680 -2.314360 -0.148108  0.518469
    2013-01-05 -1.387264 -1.029623 -1.805952 -0.904334
    2013-01-06 -0.073426  0.413364 -0.371006 -0.023453

    # Note: the results below match the modified df printed in section 3
    # (first-row A and B set to 0, D set to 5, column F added),
    # not the random frame printed above.
    df.apply(sum)

    A    -0.521652
    B    -3.882952
    C    -3.571501
    D    30.000000
    F    18.000000
    dtype: float64

    df.apply(sum, axis=0)

    A    -0.521652
    B    -3.882952
    C    -3.571501
    D    30.000000
    F    18.000000
    dtype: float64

    df.apply(sum, axis=1)

    2013-01-01    8.579335
    2013-01-02    3.264814
    2013-01-03    6.839441
    2013-01-04    6.594212
    2013-01-05    4.777161
    2013-01-06    9.968932
    Freq: D, dtype: float64

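Summary: by default the function is applied to each column. The axis argument selects the axis: axis=0 calls the function once per column, axis=1 once per row.

Applying a function to a specific column or row with apply (continuing from the code above)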
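The effect of axis is easier to check on a frame with fixed values. The following is a minimal sketch; the frame `demo` and its numbers are made up for illustration and do not come from the article.

```python
import pandas as pd

# Small fixed frame (illustrative values, not from the article)
demo = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]}, index=["x", "y", "z"])

col_sums = demo.apply(sum)          # axis=0 is the default: one call per column
row_sums = demo.apply(sum, axis=1)  # axis=1: one call per row

print(col_sums)  # A -> 6, B -> 60
print(row_sums)  # x -> 11, y -> 22, z -> 33
```

The result of axis=0 is indexed by column label, while the result of axis=1 is indexed by row label.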

    df["B"].apply(sum)  # intended to apply sum to column B

    TypeError: 'float' object is not iterable

The code above was meant to apply sum to column B, but it raised an error, so I tried a different form:

    df[["B"]].apply(sum)  # select column B with double brackets

    B   -3.882952
    dtype: float64

This version worked, so I went looking for the reason.

    print(type(df["B"]))

    <class 'pandas.core.series.Series'>

    print(type(df[["B"]]))

    <class 'pandas.core.frame.DataFrame'>

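The output shows the difference: single brackets return a Series, while double brackets return a DataFrame.
That raised the question of whether apply works only on a DataFrame and not on a Series, so I ran another test: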

    s = pd.Series([1, 3, 5, 6, 6, 8])
    s

    0    1
    1    3
    2    5
    3    6
    4    6
    5    8

    s.apply(lambda x: x + 50)

    0    51
    1    53
    2    55
    3    56
    4    56
    5    58

This test rules that explanation out: apply does work on a Series. So where is the problem? The exception is TypeError: 'float' object is not iterable, which points at sum itself, so I tested further:

    b = pd.DataFrame(s)
    b

       0
    0  1
    1  3
    2  5
    3  6
    4  6
    5  8

    b.apply(sum)

    0    29
    dtype: int64

    s.apply(sum)

    TypeError: 'int' object is not iterable

    sum([1, 2])

    3

    sum(1, 2)

    TypeError: 'int' object is not iterable

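The code above shows that sum requires an iterable argument. In sum(1, 2), the first argument 1 is an int and cannot be iterated, so an exception is raised. The same happens with the Series s: Series.apply calls the function on one element at a time, and the first element, 1, is an int. With the DataFrame b, apply passes each column to the function (here b[0], which is a Series). A Series is iterable, so the call succeeds.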
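The three cases can be put side by side in one short script. This is a sketch that reuses the Series `s` from the article; the variable names `error_name`, `frame_total`, and `series_total` are introduced here for illustration. It also shows that the direct way to total one column is the Series method `sum()`, which needs no apply at all.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 6, 6, 8])

# Series.apply hands the function one element at a time, so sum(1) fails.
try:
    s.apply(sum)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# DataFrame.apply hands the function a whole column (a Series),
# which the built-in sum can iterate over.
frame_total = pd.DataFrame(s).apply(sum)

# To total a single column, call the reduction on the Series itself.
series_total = s.sum()

print(error_name)           # TypeError
print(frame_total.iloc[0])  # 29
print(series_total)         # 29
```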

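Summary of apply:

apply applies a function along rows or columns. The axis argument selects the axis: the default, 0, calls the function on each column, and 1 calls it on each row.
The function must accept what apply passes to it: a whole row or column (a Series) when called on a DataFrame, and a single element when called on a Series.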
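Because DataFrame.apply passes a Series to each call, the function can use Series methods on its argument. The sketch below computes a max-minus-min range per column and per row; the frame `demo` and its values are hypothetical and chosen only to make the results easy to verify.

```python
import pandas as pd

# Illustrative frame (values are not from the article)
demo = pd.DataFrame({"A": [1, 4, 2], "B": [10, 5, 7]})

# Each call receives a Series, so Series methods work inside the function.
col_range = demo.apply(lambda col: col.max() - col.min())          # per column
row_range = demo.apply(lambda row: row.max() - row.min(), axis=1)  # per row

print(col_range)  # A -> 3, B -> 5
print(row_range)  # 0 -> 9, 1 -> 1, 2 -> 5
```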

2. Using map

The examples continue from the code above.

    df

                       A         B         C         D
    2013-01-01  0.345018 -1.152533  0.579335  0.785159
    2013-01-02  0.345413 -1.789114 -1.291485  0.748488
    2013-01-03 -0.463054  0.836781 -0.534286  0.408583
    2013-01-04  1.056680 -2.314360 -0.148108  0.518469
    2013-01-05 -1.387264 -1.029623 -1.805952 -0.904334
    2013-01-06 -0.073426  0.413364 -0.371006 -0.023453

    # Note: the first value below (50.000000) matches the modified df
    # printed in section 3, where the first entry of A is 0.
    df["A"].map(lambda x: x + 50)  # applied to each element of the selected column

    2013-01-01    50.000000
    2013-01-02    50.345413
    2013-01-03    49.536946
    2013-01-04    51.056680
    2013-01-05    48.612736
    2013-01-06    49.926574
    Freq: D, Name: A, dtype: float64

    df[["A"]].map(lambda x: x + 50)

    AttributeError: 'DataFrame' object has no attribute 'map'

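Summary of map: in the pandas version used here, map is available only on Series, and the exception shows that DataFrame does not have it. (pandas 2.1 and later add DataFrame.map as the element-wise replacement for applymap, so the last call no longer fails there.) The difference between single and double brackets is explained in the apply section.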
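A short sketch of Series.map on fixed values follows. Besides a function, Series.map also accepts a dict and looks each element up in it; this second form is not covered in the article and is included only as a related illustration. The Series `grades` and its labels are hypothetical.

```python
import pandas as pd

# Illustrative Series (values are not from the article)
grades = pd.Series([1, 2, 3, 2])

# Function argument: called once per element
shifted = grades.map(lambda x: x + 50)

# Dict argument: each element is replaced by its looked-up value
labels = grades.map({1: "low", 2: "mid", 3: "high"})

print(shifted.tolist())  # [51, 52, 53, 52]
print(labels.tolist())   # ['low', 'mid', 'high', 'mid']
```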

3. Using applymap

    df

                       A         B         C  D    F
    2013-01-01  0.000000  0.000000  0.579335  5  3.0
    2013-01-02  0.345413 -1.789114 -1.291485  5  1.0
    2013-01-03 -0.463054  0.836781 -0.534286  5  2.0
    2013-01-04  1.056680 -2.314360 -0.148108  5  3.0
    2013-01-05 -1.387264 -1.029623 -1.805952  5  4.0
    2013-01-06 -0.073426  0.413364 -0.371006  5  5.0

    df.applymap(lambda x: x + 50)  # applied to every element of the table

                        A          B          C   D     F
    2013-01-01  50.000000  50.000000  50.579335  55  53.0
    2013-01-02  50.345413  48.210886  48.708515  55  51.0
    2013-01-03  49.536946  50.836781  49.465714  55  52.0
    2013-01-04  51.056680  47.685640  49.851892  55  53.0
    2013-01-05  48.612736  48.970377  48.194048  55  54.0
    2013-01-06  49.926574  50.413364  49.628994  55  55.0

    df[["A", "B"]].applymap(lambda x: x + 50)  # applied to the elements of the selected columns

                        A          B
    2013-01-01  50.000000  50.000000
    2013-01-02  50.345413  48.210886
    2013-01-03  49.536946  50.836781
    2013-01-04  51.056680  47.685640
    2013-01-05  48.612736  48.970377
    2013-01-06  49.926574  50.413364

    df["A"].applymap(lambda x: x + 50)  # single brackets return a Series, so this fails

    AttributeError: 'Series' object has no attribute 'applymap'

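Summary of applymap: it applies the function to every cell of the table, but it exists only on DataFrame. Series has no such method, which is the reverse of map in this pandas version. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)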
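Since the element-wise DataFrame method is named applymap in older pandas and map from pandas 2.1 onward, code that has to run on both can pick whichever exists. The helper `elementwise` below is a hypothetical name introduced for this sketch, not a pandas function, and the frame `demo` uses made-up values.

```python
import pandas as pd

# Illustrative frame (values are not from the article)
demo = pd.DataFrame({"A": [0.0, 1.5], "B": [2.0, -1.0]})


def elementwise(frame, func):
    """Hypothetical helper: call func on every cell of a DataFrame."""
    # Check the class, not the instance, so a column named "map"
    # cannot be mistaken for the method.
    if hasattr(pd.DataFrame, "map"):   # pandas 2.1 and later
        return frame.map(func)
    return frame.applymap(func)        # older pandas


shifted = elementwise(demo, lambda x: x + 50)
subset = elementwise(demo[["A"]], lambda x: x + 50)  # double brackets keep a DataFrame

print(shifted)
print(subset)
```

For plain arithmetic such as adding a constant, the vectorized expression `demo + 50` gives the same result without calling a Python function per cell.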
