Lessons for Machine Learning from Econometrics | Machine Learning Mastery

2014.06.15

Hal Varian is the chief economist at Google. In November 2013 he gave a talk to the Electronic Support Group at the EECS Department at the University of California at Berkeley. The talk was titled “Machine Learning and Econometrics” and focused on what lessons the machine learning community can take away from the field of econometrics.
Hal started out by summarizing a recent paper of his titled “Big Data: New Tricks for Econometrics” (PDF) which comments on what the econometrics community can learn from the machine learning community, namely:
Train-test-validate to avoid overfitting
Cross validation
Nonlinear estimation (trees, forests, SVMs, neural nets, etc.)
Bootstrap, bagging, boosting
Variable selection (lasso and friends)
Model averaging
Computational Bayesian methods (MCMC)
Tools for manipulating big data (SQL, NoSQL databases)
Textual analysis (not discussed)
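Of the items above, cross validation is the easiest to show in a few lines. The sketch below assumes NumPy and a plain least-squares model; the helper names (kfold_indices, cv_mse) and the simulated sin(x) data are hypothetical. It estimates out-of-sample error with k-fold cross validation and uses that estimate to compare a linear fit with a cubic fit.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def fit_ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def cv_mse(X, y, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    folds = kfold_indices(len(y), k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then score on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta = fit_ols(X[train_idx], y[train_idx])
        resid = y[test_idx] - predict_ols(beta, X[test_idx])
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))


# Hypothetical data: a nonlinear signal plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=500)
y = np.sin(x) + rng.normal(scale=0.3, size=500)

# Compare two candidate models on the same folds.
linear_features = x[:, None]
cubic_features = np.column_stack([x, x ** 2, x ** 3])
print("linear CV MSE:", cv_mse(linear_features, y))
print("cubic  CV MSE:", cv_mse(cubic_features, y))
```

Because both models are scored on identical folds, the comparison reflects the models rather than a lucky split. Note that the random shuffle in kfold_indices assumes i.i.d. rows, which is exactly the assumption the next paragraph questions.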
He continued by talking about non-i.i.d. data such as time series data and panel data. For this kind of data, cross validation typically does not perform well. He suggests decomposing the data into trend and seasonal components and looking at deviations from the expected behavior. He gives an example of Google Correlate showing that auto dealer sales data correlates best with searches for Indian restaurants (madness!).
Figure: NSA auto sales and Google Correlate, to 2012
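A minimal sketch of the decompose-then-inspect-deviations idea, assuming a monthly series with a linear trend and a fixed 12-month seasonal pattern. Trend and seasonality are estimated jointly by least squares on a time term plus month-of-year indicator columns. This is one simple decomposition, not necessarily the one Varian used, and the simulated series (including the shock injected at month 40) is hypothetical.

```python
import numpy as np


def decompose(y, period=12):
    """Additive decomposition y = trend + seasonal + residual.

    Fits y on a linear time term plus one indicator column per position in
    the period, so trend and seasonality are estimated together.
    """
    t = np.arange(len(y))
    dummies = (t[:, None] % period == np.arange(period)).astype(float)
    X = np.column_stack([t, dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    slope, season_levels = coef[0], coef[1:]
    level = season_levels.mean()
    trend = level + slope * t
    seasonal = (season_levels - level)[t % period]  # centered to average zero
    residual = y - trend - seasonal
    return trend, seasonal, residual


def flag_deviations(residual, z=3.0):
    """Indices where the residual is more than z standard deviations from zero."""
    return np.flatnonzero(np.abs(residual) > z * residual.std())


# Hypothetical monthly series: upward trend, yearly cycle, noise, one shock.
rng = np.random.default_rng(0)
months = np.arange(96)
y = 100 + 0.5 * months + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, 96)
y[40] += 15

trend, seasonal, residual = decompose(y)
print("flagged months:", flag_deviations(residual))
```

The trend and seasonal parts describe the expected behavior; only the residual is inspected, so a regular yearly peak is not mistaken for an anomaly.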
The focus of the talk is causal inference, a big subject in econometrics. He covers:
Counterfactuals: What would have happened to the treated if they weren’t treated? Would they look like the control on average? Read more about counterfactuals within empirical testing.
Confounding Variables: Unobserved variables that correlate with both x and y (the other stuff). Commonly an issue when human choice is involved. Read more about confounding variables.
Natural Experiments: May or may not be randomized. An example is the draft lottery. Read more about natural experiments.
Regression Discontinuity: A cut-off or threshold above or below which the treatment is applied. You can compare cases close to the (arbitrary) threshold to estimate the average treatment effect when randomization is not possible. Once you can model the causal relationship, you can tune the threshold and play what-ifs (don’t leave randomization to chance). Read more on regression discontinuity design (RDD).
Difference in Differences (DiD): It’s not enough to look at before and after the treatment; you need to adjust the change in the treated by the change in the control. The treatment may not be randomly assigned. Read more about difference in differences.
Instrumental Variables: Variation in X that is independent of the error: something that changes X (correlates with X) but does not change the error. It provides a control lever. Randomization is an instrumental variable. Read more about instrumental variables.
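To make three of these designs concrete, here is a minimal sketch on simulated data: difference in differences, a single-instrument (Wald) IV estimate, and a sharp regression discontinuity estimated with a straight-line fit on each side of the cutoff. The function names (diff_in_diff, wald_iv, rdd_jump), the data-generating numbers, and the bandwidth are hypothetical choices for illustration, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000


def diff_in_diff(y_pre, y_post, treated):
    """Change over time among treated units minus change among control units."""
    change = y_post - y_pre
    return change[treated].mean() - change[~treated].mean()


def wald_iv(y, x, z):
    """Single-instrument IV estimate of the effect of x on y: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


def ols_slope(y, x):
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


def rdd_jump(y, running, cutoff, bandwidth):
    """Sharp RDD: fit a line on each side of the cutoff (within the bandwidth)
    and return the gap between the two fitted values at the cutoff."""
    left = (running >= cutoff - bandwidth) & (running < cutoff)
    right = (running >= cutoff) & (running < cutoff + bandwidth)
    # np.polyfit returns [slope, intercept]; centering on the cutoff makes the
    # intercept equal to the fitted value at the cutoff.
    _, at_cut_left = np.polyfit(running[left] - cutoff, y[left], 1)
    _, at_cut_right = np.polyfit(running[right] - cutoff, y[right], 1)
    return at_cut_right - at_cut_left


# Difference in differences (hypothetical panel, true effect = 2.0).
treated = rng.random(n) < 0.5
level = 5.0 + 3.0 * treated + rng.normal(0, 1, n)  # treated units start higher
y_pre = level + rng.normal(0, 1, n)
y_post = level + 1.0 + 2.0 * treated + rng.normal(0, 1, n)  # +1.0 common trend
post_gap = y_post[treated].mean() - y_post[~treated].mean()
did = diff_in_diff(y_pre, y_post, treated)

# Instrumental variable (hypothetical data, true effect of x on y = 1.5).
u = rng.normal(0, 1, n)  # unobserved confounder
z = rng.normal(0, 1, n)  # instrument: shifts x, independent of u and the noise
x = z + u + rng.normal(0, 1, n)
y = 1.5 * x + 2.0 * u + rng.normal(0, 1, n)
ols = ols_slope(y, x)
iv = wald_iv(y, x, z)

# Regression discontinuity (hypothetical score, treated at score >= 50, true jump = 4.0).
score = rng.uniform(0, 100, n)
outcome = 0.05 * score + 4.0 * (score >= 50) + rng.normal(0, 1, n)
rdd = rdd_jump(outcome, score, cutoff=50.0, bandwidth=10.0)

print(f"post-period gap: {post_gap:.2f}  DiD: {did:.2f}")
print(f"OLS slope: {ols:.2f}  IV (Wald): {iv:.2f}")
print(f"RDD jump at cutoff: {rdd:.2f}")
```

In each case the naive number is off for the reason the list gives. The post-period gap mixes the treatment with the treated group's higher starting level, and the OLS slope absorbs the confounder u. The designs recover the simulated effects because the data were built to satisfy their assumptions (parallel trends, a valid instrument, a sharp cutoff). With real data those assumptions have to be argued, not assumed.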
He summarized the lessons for the machine learning community from econometrics as follows:
Observational data (usually) can’t determine causality, no matter how big it is (big data is not enough)
Causal inference is what you want for policy
Treatment-control with random assignment is the gold standard
Sometimes you can find natural experiments, discontinuities, etc.
Prediction is critical to causal inference, for both the selection issue and the counterfactual
There is very interesting research in systems for continual testing
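The first and third lessons can be illustrated with a small simulation. It assumes a hypothetical unobserved trait u that raises the outcome and, in the observational setting, also makes units more likely to choose treatment. The names, effect size, and selection rule are illustrative assumptions. The treated-minus-untreated gap stays biased however large n gets, while the same comparison under random assignment lands near the true effect.

```python
import numpy as np

TRUE_EFFECT = 1.0


def simulate(n, randomized, seed=0):
    """Return (treatment, outcome) arrays for n hypothetical units.

    An unobserved trait u raises the outcome. Observationally, units with
    higher u are more likely to take the treatment (self-selection); under
    random assignment, treatment is a coin flip independent of u.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0, 1, n)
    if randomized:
        treat = rng.random(n) < 0.5
    else:
        treat = (u + rng.normal(0, 1, n)) > 0
    outcome = TRUE_EFFECT * treat + 2.0 * u + rng.normal(0, 1, n)
    return treat, outcome


def diff_in_means(treat, outcome):
    return outcome[treat].mean() - outcome[~treat].mean()


for n in (1_000, 1_000_000):
    obs = diff_in_means(*simulate(n, randomized=False))
    rct = diff_in_means(*simulate(n, randomized=True))
    print(f"n={n:>9,}  observational: {obs:.2f}   randomized: {rct:.2f}")
```

More data only shrinks the noise around the observational estimate; it does not remove the selection bias, which is the sense in which big data is not enough.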
Hal finished with two book recommendations:
Mostly Harmless Econometrics: An Empiricist’s Companion
An Introduction to Statistical Learning: with Applications in R
The talk was also given to the Stanford University Department of Electrical Engineering in 2014, titled “What Machine Learning Can Learn from Econometrics and Vice Versa”. You can see the PDF slides from this second talk; it’s pretty much the same.