为了更及时地扩展知识图谱,自动从海量数据中获取新的世界知识已成为必由之路。以实体关系抽取为代表的知识获取技术已经取得了一些成果,特别是近年来深度学习模型极大地推动了关系抽取的发展。但是,与实际场景的关系抽取复杂挑战的需求相比,现有技术仍有较大的局限性。我们亟需从实际场景需求出发,解决训练数据获取、少次学习能力、复杂文本语境、开放关系建模等挑战问题,建立有效而鲁棒的关系抽取系统,这也是实体关系抽取任务需要继续努力的方向。 我们课题组从2016年开始耕耘实体关系抽取任务,先后有林衍凯、韩旭、姚远、曾文远、张正彦、朱昊、于鹏飞、于志竟成、高天宇、王晓智、吴睿东等同学在多方面开展了研究工作。去年在韩旭和高天宇等同学的努力下,发布了OpenNRE工具包 [33],经过近两年来的不断改进,涵盖有监督关系抽取、远程监督关系抽取、少次学习关系抽取和文档级关系抽取等丰富场景。此外,也花费大量科研经费标注了FewRel (1.0和2.0)和DocRED等数据集,旨在推动相关方向的研究。 本文总结了我们对实体关系抽取现状、挑战和未来发展方向的认识,以及我们在这些方面做出的努力,希望能够引起大家的兴趣,对大家有些帮助。期待更多学者和同学加入到这个领域研究中来。当然,本文没有提及一个重要挑战,即以事件抽取为代表的复杂结构的知识获取,未来有机会我们再专文探讨。 限于个人水平,难免有偏颇舛误之处,还请大家在评论中不吝指出,我们努力改进。需要说明的是,我们没想把这篇文章写成严谨的学术论文,所以没有面面俱到把每个方向的所有工作都介绍清楚,如有重要遗漏,还请批评指正。 我们课题组在实体关系抽取方面开展的多项工作(如FewRel、DocRED等)是与腾讯微信模式识别中心团队合作完成的。微信模式识别中心是微信AI(WeChat AI)下辖的中心之一,主要关注自然语言处理相关的研究和产品。 研究方面,他们的研究工作涵盖对话系统、知识学习、认知推理、机器翻译等多个方向,今年在ACL、EMNLP、AAAI等会议上发表论文20多篇,也在多个比赛中获得优异成绩,学术成果颇丰。产品方面,他们开发的小微对话系统和微信对话开放平台在音箱、公众号自动客服等场景方面也有不俗的表现,但投入的人力比亚马逊Alex团队要少得多,也算是对微信“小”团队做大事风格的一种体现。 我们与腾讯微信的这些合作是基于“清华-腾讯联合实验室”开展的。我们与腾讯高校合作中心合作多年,参与了包括清华-腾讯联合实验室(与清华各院系开展合作的学校级平台)、犀牛鸟专项基金(面向各类老师的前沿探索研究性项目)、犀牛鸟精英人才培养计划(面向学生,腾讯和清华双导师联合在腾讯培养科研型人才)等项目。 作者简介 韩旭,清华大学计算机科学与技术系博士三年级同学,主要研究方向为自然语言处理、知识图谱、信息抽取。在人工智能领域国际著名会议AAAI、ACL、EMNLP、COLING、NAACL上发表多篇论文,是OpenKE、OpenNRE等开源项目的开发者之一。 高天宇,清华大学计算机系大四本科生,主要研究方向为自然语言处理、知识图谱、关系抽取。在人工智能领域国际著名会议AAAI、EMNLP上发表多篇论文,是OpenNRE等开源项目的主要开发者之一。 刘知远,清华大学计算机系副教授、博士生导师。主要研究方向为表示学习、知识图谱和社会计算。 参考文献 [1] ChunYang Liu, WenBo Sun, WenHan Chao, Wanxiang Che. Convolution Neural Network for Relation Extraction. The 9th International Conference on Advanced Data Mining and Applications (ADMA 2013).[2] Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, Jun Zhao. Relation Classification via Convolutional Deep Neural Network. The 25th International Conference on Computational Linguistics (COLING 2014).[3] Dongxu Zhang, Dong Wang. Relation Classification via Recurrent Neural Network. arXiv preprint arXiv:1508.01006 (2015).[4] Peng Zhou, Wei Shi, Jun Tian, Zhenyu Qi, Bingchen Li, Hongwei Hao, Bo Xu. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. The 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016).[5] Richard Socher, Brody Huval, Christopher D. Manning, Andrew Y. Ng. Semantic Compositionality through Recursive Matrix-Vector Spaces. The 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012).[6] Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, Stan Szpakowicz. SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations between Pairs of Nominals. The 5th International Workshop on Semantic Evaluation (SEMEVAL 2010).[7] Thien Huu Nguyen, Ralph Grishman. Relation Extraction: Perspective from Convolutional Neural Networks. The 1st Workshop on Vector Space Modeling for Natural Language Processing (LatentVar 2015).[8] Cícero dos Santos, Bing Xiang, Bowen Zhou. Classifying Relations by Ranking with Convolutional Neural Networks. The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015).[9] Shu Zhang, Dequan Zheng, Xinchen Hu, Ming Yang. Bidirectional Long Short-Term Memory Networks for Relation Classification. The 29th Pacific Asia Conference on Language, Information and Computation (PACLIC 2015).[10] Minguang Xiao, Cong Liu. Semantic Relation Classification via Hierarchical Recurrent Neural Network with Attention. The 26th International Conference on Computational Linguistics (COLING 2016).[11] Kun Xu, Yansong Feng, Songfang Huang, Dongyan Zhao. Semantic Relation Classification via Convolutional Neural Networks with Simple Negative Sampling. The 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015).[12] Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, Zhi Jin. Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths. The 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015).[13] Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, Houfeng Wang. A Dependency-Based Neural Network for Relation Classification. The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015).[14] Yan Xu, Ran Jia, Lili Mou, Ge Li, Yunchuan Chen, Yangyang Lu, Zhi Jin. Improved Relation Classification by Deep Recurrent Neural Networks with Data Augmentation. The 26th International Conference on Computational Linguistics (COLING 2016).[15] Rui Cai, Xiaodong Zhang, Houfeng Wang. Bidirectional Recurrent Convolutional Neural Network for Relation Classification. The 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016).[16] Mike Mintz, Steven Bills, Rion Snow, Daniel Jurafsky. Distant Supervision for Relation Extraction without Labeled Data. The 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2009).[17] Daojian Zeng, Kang Liu, Yubo Chen, Jun Zhao. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. The 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015).[18] Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, Maosong Sun. Neural Relation Extraction with Selective Attention over Instances. The 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016).[19] Yi Wu, David Bamman, Stuart Russell. Adversarial Training for Relation Extraction. The 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).[20] Jun Feng, Minlie Huang, Li Zhao, Yang Yang, Xiaoyan Zhu. Reinforcement Learning for Relation Classification from Noisy Data. The 32th AAAI Conference on Artificial Intelligence (AAAI 2018).[21] Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, Maosong Sun. FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation. The 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018).[22] Tianyu Gao, Xu Han, Zhiyuan Liu, Maosong Sun. Hybrid Attention-based Prototypical Networks for Noisy Few-Shot Relation Classification. The 33th AAAI Conference on Artificial Intelligence (AAAI 2019).[23] Zhi-Xiu Ye, Zhen-Hua Ling. Multi-Level Matching and Aggregation Network for Few-Shot Relation Classification. The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019).[24] Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, Tom Kwiatkowski. Matching the Blanks: Distributional Similarity for Relation Learning. The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019).[25] Tianyu Gao, Xu Han, Hao Zhu, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou. FewRel 2.0: Towards More Challenging Few-Shot Relation Classification. 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019).[26] Chris Quirk, Hoifung Poon. Distant Supervision for Relation Extraction beyond the Sentence Boundary. The 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017).[27] Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, Wen-tau Yih. Cross-Sentence N-ary Relation Extraction with Graph LSTMs. Transactions of the Association for Computational Linguistics (TACL 2017).[28] Chih-Hsuan Wei, Yifan Peng, Robert Leaman, Allan Peter Davis, Carolyn J. Mattingly, Jiao Li, Thomas C. Wiegers, Zhiyong Lu. Overview of the BioCreative V Chemical Disease Relation (CDR) Task. The 5th BioCreative Challenge Evaluation Workshop (BioC 2015).[29] Omer Levy, Minjoon Seo, Eunsol Choi, Luke Zettlemoyer. Zero-Shot Relation Extraction via Reading Comprehension. The 21st Conference on Computational Natural Language Learning (CoNLL 2017).[30] Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, Maosong Sun. DocRED: A Large-Scale Document-Level Relation Extraction Dataset. The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019).[31] Ruidong Wu, Yuan Yao, Xu Han, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun. Open Relation Extraction: Relational Knowledge Transfer from Supervised Data to Unsupervised Data. 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019).[32] Tianyu Gao, Xu Han, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun. Neural Snowball for Few-Shot Relation Learning. The 34th AAAI Conference on Artificial Intelligence (AAAI 2020).[33] Xu Han, Tianyu Gao, Yuan Yao, Deming Ye, Zhiyuan Liu, Maosong Sun. OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction. The Conference on Empirical Methods in Natural Language Processing (EMNLP 2019).