
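Chinese named entity recognition based on lexicon enhancement and table filling

Application of Electronic Technique (电子技术应用)

Chu Tianshu1, Tang Qiu1, Liang Junxue2, Xu Rui1, Wang Mingyang2, Liu Tao2

1. National Computer System Engineering Research Institute of China, Beijing 100083; 2. Unit 93216 of the Chinese People's Liberation Army, Beijing 100085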

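Abstract: Chinese named entity recognition comprises two main tasks, flat named entity recognition and nested named entity recognition, of which the nested task is the more difficult. This paper proposes TLEXNER, a unified model based on lexicon enhancement and table filling that handles both tasks at once. To address the difficulty of word segmentation in Chinese corpora, the model first uses a lexicon adapter to fuse lexical information into the BERT pre-trained model and integrates the relative position information between characters and lexical groups into BERT's embedding layer. It then uses conditional layer normalization and a biaffine model to construct and predict a character-pair table, which models the relations between characters and yields a unified representation of flat and nested entities. Finally, entity categories are determined from the values in the upper-triangular region of the character-pair table. The proposed model achieves F1 scores of 97.35% on the public flat-entity dataset Resume and 91.96% on a self-annotated nested-entity dataset in the military domain, demonstrating the effectiveness of TLEXNER.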

Keywords: lexicon enhancement; Chinese named entity recognition; table filling

CLC number: TP391; Document code: A; DOI: 10.16157/j.issn.0258-7998.233939
Chinese citation format: 褚天舒，唐球，梁军学，等. 基于词汇增强和表格填充的中文命名实体识别[J]. 电子技术应用，2024，50(2)：23-29.
English citation format: Chu Tianshu, Tang Qiu, Liang Junxue, et al. Chinese named entity recognition based on lexicon enhancement and table filling[J]. Application of Electronic Technique, 2024, 50(2): 23-29.

Chinese named entity recognition based on lexicon enhancement and table filling

Chu Tianshu1, Tang Qiu1, Liang Junxue2, Xu Rui1, Wang Mingyang2, Liu Tao2

1. National Computer System Engineering Research Institute of China, Beijing 100083, China; 2. People's Liberation Army 93216, Beijing 100085, China

Abstract: Chinese named entity recognition involves two tasks: Chinese flat named entity recognition and Chinese nested named entity recognition, of which the nested task is more difficult. This paper proposes a unified model, TLEXNER, based on lexicon enhancement and table filling, which can tackle both tasks concurrently. To address the difficulty of Chinese word segmentation, a lexicon adapter is used to integrate lexicon information into the BERT pre-trained model, and the relative position information of characters and lexical groups is integrated into the BERT embedding layer. Conditional layer normalization and a biaffine model are then used to build and predict the character-pair table, and the relationships between character pairs are modeled by the table structure to obtain a unified representation of flat and nested entities.

Key words: lexicon enhancement; Chinese named entity recognition; table filling
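
The table-filling idea in the abstract can be illustrated with a small decoding sketch. It is not the TLEXNER implementation: the label set `LABELS`, the function `decode_table`, and the hand-filled table are all assumptions made for illustration, and the sketch assumes that cell (i, j) with i <= j stores the label of the span running from character i to character j. Under that assumption, flat and nested entities are read off the upper triangle in exactly the same way.

```python
from typing import List, Tuple

# Hypothetical label ids: 0 means "no entity"; other ids index into LABELS.
LABELS = ["O", "ORG", "LOC"]


def decode_table(table: List[List[int]]) -> List[Tuple[int, int, str]]:
    """Read entities off the upper triangle of a character-pair table.

    table[i][j] with i <= j is assumed to hold the label id of the span
    that starts at character i and ends at character j (inclusive).
    """
    entities = []
    n = len(table)
    for i in range(n):
        for j in range(i, n):  # upper triangle only: start <= end
            label_id = table[i][j]
            if label_id != 0:
                entities.append((i, j, LABELS[label_id]))
    return entities


text = "北京大学"
# Hand-filled table standing in for a model's predictions.
table = [[0] * len(text) for _ in text]
table[0][1] = 2  # "北京" as LOC, nested inside the ORG span
table[0][3] = 1  # "北京大学" as ORG

for start, end, label in decode_table(table):
    print(text[start:end + 1], label)
```

Because each span occupies its own cell, an inner entity and the outer entity that contains it do not compete for the same label slot, which is what a per-character tagging scheme would force.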

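Introduction

In the era of big data, massive amounts of text are generated every day, and extracting genuinely valuable knowledge from such highly redundant data is increasingly important. Knowledge extraction methods can automatically identify and extract the required knowledge elements, providing data support for subsequent knowledge fusion, knowledge processing, and knowledge application. Named entity recognition is a key task in knowledge extraction and the foundation of downstream tasks such as knowledge graphs, data mining, intelligent retrieval, and question answering systems, so research on named entity recognition is of both theoretical and practical significance.

By granularity, Chinese named entity recognition can be divided into word-based, character-based, and hybrid character-word approaches. Unlike English, Chinese has no explicit word delimiters, so Chinese named entity recognition faces the difficulty of word segmentation.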
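
The segmentation difficulty can be made concrete with a short sketch of dictionary matching, the kind of character-to-word pairing that lexicon-enhanced models start from. This is an illustrative assumption, not the paper's lexicon adapter: the function `match_words`, the toy `lexicon`, and the `max_len` limit are all hypothetical.

```python
from typing import Dict, List, Set


def match_words(text: str, lexicon: Set[str], max_len: int = 4) -> Dict[int, List[str]]:
    """Map each character index to the lexicon words that cover it."""
    matches: Dict[int, List[str]] = {i: [] for i in range(len(text))}
    for start in range(len(text)):
        # Try every substring of 2..max_len characters beginning at `start`.
        for end in range(start + 2, min(start + max_len, len(text)) + 1):
            word = text[start:end]
            if word in lexicon:
                for i in range(start, end):
                    matches[i].append(word)
    return matches


# Toy lexicon (hypothetical) for a classic ambiguous sentence.
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
text = "南京市长江大桥"
matches = match_words(text, lexicon)

for i, ch in enumerate(text):
    print(ch, matches[i])
```

The character "长" is covered by both "市长" and "长江", so a segmenter must choose between two readings, while a character-level model that receives all matched words as extra features can defer that choice.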

For the full text of this article, please download:

http://www.ihrv.cn/resource/share/2000005850

