ctree() in R

Reposted from CSDN blog: Daeyeon7

Overview

Besides the traditional decision tree algorithm (rpart), the conditional inference tree (ctree) is another commonly used tree-based classification algorithm. The two differ in how they choose splitting variables: a conditional inference tree selects them based on significance tests, rather than on an information-maximization criterion (rpart uses the Gini index).
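
The trainset and testset used throughout this post are never constructed in the original text. A minimal sketch of one plausible preparation, assuming the churn data shipped with the C50 package and a 70/30 random split; the original author's preprocessing may differ:

library(C50)
data(churn)    # assumed source: loads churnTrain and churnTest
# drop identifier-like columns with no predictive value
churnTrain = churnTrain[, !names(churnTrain) %in% c("state", "area_code", "account_length")]
set.seed(2)
ind = sample(2, nrow(churnTrain), replace = TRUE, prob = c(0.7, 0.3))
trainset = churnTrain[ind == 1, ]
testset  = churnTrain[ind == 2, ]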

Procedure

Call the ctree function from the party package to build the classifier:

library(zoo)
library(party)
ctree.model = ctree(churn ~ ., data = trainset)
ctree.model

	 Conditional inference tree with 18 terminal nodes

Response:  churn
Inputs:  international_plan, voice_mail_plan, number_vmail_messages, total_day_minutes, total_day_calls, total_day_charge, total_eve_minutes, total_eve_calls, total_eve_charge, total_night_minutes, total_night_calls, total_night_charge, total_intl_minutes, total_intl_calls, total_intl_charge, number_customer_service_calls
Number of observations:  2315

1) international_plan == {no}; criterion = 1, statistic = 173.582
  2) number_customer_service_calls <= 3; criterion = 1, statistic = 133.882
    3) total_day_minutes <= 259.3; criterion = 1, statistic = 232.371
      4) total_eve_minutes <= 258.7; criterion = 1, statistic = 39.065
        5)*  weights = 1544
      4) total_eve_minutes > 258.7
        6) total_day_minutes <= 222.9; criterion = 1, statistic = 47.453
          7)*  weights = 209
        6) total_day_minutes > 222.9
          8) voice_mail_plan == {yes}; criterion = 1, statistic = 20
            9)*  weights = 8
          8) voice_mail_plan == {no}
            10)*  weights = 28
    3) total_day_minutes > 259.3
      11) voice_mail_plan == {no}; criterion = 1, statistic = 46.262
        12) total_eve_charge <= 14.09; criterion = 1, statistic = 37.877
          13)*  weights = 21
        12) total_eve_charge > 14.09
          14) total_night_minutes <= 178.3; criterion = 1, statistic = 19.789
            15)*  weights = 23
          14) total_night_minutes > 178.3
            16)*  weights = 60
      11) voice_mail_plan == {yes}
        17)*  weights = 34
  2) number_customer_service_calls > 3
    18) total_day_minutes <= 159.4; criterion = 1, statistic = 34.903
      19) total_eve_minutes <= 233.2; criterion = 0.991, statistic = 11.885
        20) voice_mail_plan == {no}; criterion = 0.99, statistic = 11.683
          21)*  weights = 40
        20) voice_mail_plan == {yes}
          22)*  weights = 7
      19) total_eve_minutes > 233.2
        23)*  weights = 16
    18) total_day_minutes > 159.4
      24)*  weights = 96
1) international_plan == {yes}
  25) total_intl_charge <= 3.51; criterion = 1, statistic = 35.28
    26) total_intl_calls <= 2; criterion = 1, statistic = 28.013
      27)*  weights = 40
    26) total_intl_calls > 2
      28) number_customer_service_calls <= 3; criterion = 0.957, statistic = 8.954
        29) total_day_minutes <= 271.5; criterion = 1, statistic = 25.328
          30) total_eve_charge <= 25.82; criterion = 0.987, statistic = 11.167
            31)*  weights = 116
          30) total_eve_charge > 25.82
            32)*  weights = 7
        29) total_day_minutes > 271.5
          33)*  weights = 11
      28) number_customer_service_calls > 3
        34)*  weights = 14
  25) total_intl_charge > 3.51
    35)*  weights = 41

Visualizing the conditional inference tree

plot(ctree.model)
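
When the tree has many terminal nodes, the fully annotated plot can be hard to read. A sketch of a more compact rendering, assuming the type argument of party's plot method for BinaryTree objects:

plot(ctree.model, type = "simple")   # compact nodes instead of full probability panels
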
The tree can also be simplified by reducing the number of features and refitting. Using total_day_charge as the only split variable produces a much simpler conditional inference tree:

daycharge.model = ctree(churn ~ total_day_charge, data = trainset)
plot(daycharge.model)

[Figure: the inference tree obtained with total_day_charge as the only split variable]

The plot shows the name of the splitting variable and its p-value at each internal node; the split conditions appear on the left and right branches, and each terminal node shows the number of samples n it contains and the probability that a sample belongs to each of the two classes. The figure shows that when total_day_charge exceeds 48.18, the light-gray area of node 9 is larger than the dark-gray area, meaning that customers whose daily charge exceeds 48.18 are very likely to churn (class label yes).
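
This reading of node 9 can be checked directly against the training data. A small sketch; the threshold 48.18 is read off the plotted tree, not produced by code in the original post:

# Churn proportions among customers above vs. below the split point 48.18
high = trainset$total_day_charge > 48.18
prop.table(table(high, trainset$churn), margin = 1)   # row-wise proportions
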
Evaluating the predictive power of the conditional inference tree

ctree.predict = predict(ctree.model, testset)
table(ctree.predict, testset$churn)

ctree.predict yes  no
          yes  99  15
          no   42 862

The same evaluation can be done with the confusionMatrix function from the caret package:

library(lattice)
library(ggplot2)
library(caret)
confusionMatrix(table(ctree.predict, testset$churn))

Confusion Matrix and Statistics

ctree.predict yes  no
          yes  99  15
          no   42 862

               Accuracy : 0.944
                 95% CI : (0.9281, 0.9573)
    No Information Rate : 0.8615
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.7449
 Mcnemar's Test P-Value : 0.0005736

            Sensitivity : 0.70213
            Specificity : 0.98290
         Pos Pred Value : 0.86842
         Neg Pred Value : 0.95354
             Prevalence : 0.13851
         Detection Rate : 0.09725
   Detection Prevalence : 0.11198
      Balanced Accuracy : 0.84251

       'Positive' Class : yes
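
The headline metrics can be reproduced by hand from the 2x2 table above, which makes caret's definitions concrete (yes is the positive class):

# Hand-computed from the confusion table (positive class = yes)
TP = 99; FN = 42; FP = 15; TN = 862
TP / (TP + FN)                     # sensitivity: 99/141   ~ 0.70213
TN / (TN + FP)                     # specificity: 862/877  ~ 0.98290
(TP + TN) / (TP + TN + FP + FN)    # accuracy:    961/1018 ~ 0.944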

Call the treeresponse() function to output the class probabilities:

tr = treeresponse(ctree.model, newdata = testset[1:5, ])
tr
[[1]]
[1] 0.03497409 0.96502591

[[2]]
[1] 0.02586207 0.97413793

[[3]]
[1] 0.02586207 0.97413793

[[4]]
[1] 0.02586207 0.97413793

[[5]]
[1] 0.03497409 0.96502591

This section first used the predict function to assign class labels to the test dataset, then called the table function to build the classification table, and finally used the confusionMatrix function built into the caret package to evaluate prediction performance. Besides predict, the treeresponse function can also be used to estimate class probabilities; a record is then labeled with the class that has the higher probability.
The example above used the first five records of the test dataset testset to obtain estimated class probabilities. Calling treeresponse returns the five concrete probability vectors, from which the class label of each record can be determined.
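
As a sketch of how these probabilities map back to labels, assuming the columns follow levels(trainset$churn) with yes first and no second, as the outputs above suggest:

# Pick the higher-probability class for each of the five records
tr = treeresponse(ctree.model, newdata = testset[1:5, ])
prob.matrix = do.call(rbind, tr)                      # 5 x 2 matrix of class probabilities
pred.labels = levels(trainset$churn)[max.col(prob.matrix)]
pred.labels                                           # all five come out "no" here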
