
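Knowledge graph visualization fusion method for multi-source heterogeneous data

Application of Electronic Technique

Liang Hao1, Fu Da2

1. Plant Resource Technology Co., Ltd.; 2. Beijing Jingneng Energy Technology Research Co., Ltd.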

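Abstract: To address data redundancy, conflicts, and missing associations, this paper studies a knowledge graph visualization fusion method for multi-source heterogeneous data, with the aim of improving the reliability of data fusion. The Web Ontology Language (OWL) is used to build a domain ontology library and a global ontology library for the multi-source heterogeneous data, so that knowledge entity extraction and knowledge fusion are carried out within the same framework. A long short-term memory network and conditional random field (LSTM-CRF) model, constrained by the ontology libraries, extracts knowledge entities that conform to the domain definitions from the multi-source heterogeneous data. A knowledge fusion model based on hierarchical filtering then fuses the extracted knowledge entities visually, resolving redundant and inconsistent information in the data and forming an accurate, complete, and reliable visualization fusion knowledge graph, which helps reveal latent data associations and fill in missing ones. Experimental results show that as the proportion of missing data increases, both the scaling coefficient and the attribute coverage decline; their lowest values are 0.86 and 0.87, respectively, both significantly above the corresponding thresholds. With four data sources, the proposed method achieves a visual clarity of 93%~97% and an information fusion degree of 92%~96%, both better than the comparison methods. These results indicate that the method can effectively extract knowledge entities from multi-source heterogeneous data, build a knowledge graph, and achieve visualization fusion of the data; that under different proportions of missing data the scaling coefficient and attribute coverage remain high, meaning the fusion effect is good; and that the method improves both the visualization effect and the degree of information integration.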

Keywords: multi-source heterogeneous data; knowledge graph; visualization fusion; ontology library; long short-term memory network

CLC number: TP391  Document code: A  DOI: 10.16157/j.issn.0258-7998.245966
Citation format (Chinese): 梁浩，付达. 面向多源异构数据的知识图谱可视化融合方法[J]. 电子技术应用，2025，51(6)：47-53.
Citation format (English): Liang Hao, Fu Da. Knowledge graph visualization fusion method for heterogeneous data from multiple sources[J]. Application of Electronic Technique, 2025, 51(6): 47-53.

Knowledge graph visualization fusion method for heterogeneous data from multiple sources

Liang Hao1, Fu Da2

1. Plant Resource Technology Co., Ltd.; 2. Beijing Jingneng Energy Technology Research Co., Ltd.

Abstract: To solve the problems of data redundancy, conflict, and missing associations, a knowledge graph visualization fusion method for multi-source heterogeneous data is studied to improve the reliability of data fusion. A domain ontology library and a global ontology library corresponding to the multi-source heterogeneous data are established using the Web Ontology Language (OWL), so that knowledge entity extraction and knowledge fusion are carried out under the same framework. Using a Long Short-Term Memory network (LSTM) and Conditional Random Field (CRF) model, knowledge entities conforming to the domain definitions are extracted from the multi-source heterogeneous data under the constraints of the ontology libraries. A knowledge fusion model based on hierarchical filtering is used to visually fuse the extracted knowledge entities, resolve redundant information and inconsistencies in the multi-source heterogeneous data, and form an accurate, complete, and reliable visualization fusion knowledge graph, which helps to find potential data associations and fill in missing ones. The experimental results show that as the proportion of missing data increases, the scaling coefficient and attribute coverage begin to decrease; their lowest values are 0.86 and 0.87, respectively, both significantly higher than the corresponding thresholds. When dealing with four data sources, the visual clarity of the proposed method is 93%~97% and the information fusion degree is 92%~96%, both better than the comparison methods. This shows that the method can effectively extract the knowledge entities of multi-source heterogeneous data, establish the knowledge graph, and realize the visualization fusion of multi-source

Key words: multi-source heterogeneous data; knowledge graph; visualization fusion; ontology library; long short-term memory network; conditional random field

Introduction

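In practical applications, data often come from many different sources and are heterogeneous, diverse, and complex, which poses great challenges for data processing, analysis, and application [1]. Multi-source heterogeneous data fusion methods have emerged in response. They aim to use advanced techniques to effectively integrate and present data from different sources, in different formats, and with different structures, giving users intuitive, comprehensive, and in-depth insight into the data [2].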

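Multi-source heterogeneous data fusion methods not only help solve the data silo problem and interconnect data [3], but can also markedly improve the efficiency and accuracy of data processing, providing strong data support for decision support, scientific research, industrial innovation, and other fields. For example, Mo Huiling et al. used a federated learning framework for data fusion: each participant extracts data features using tensor Tucker decomposition; a central server collects and aggregates the model parameters from the participants to form a global model; and the global model is optimized over multiple iterations to complete the fusion [4]. Heterogeneous data, however, contain redundant or conflicting information, and Tucker decomposition and the federated learning framework cannot fully avoid its influence, which degrades the fusion result. Wang Shu et al. used information entropy to assess the relative importance of each evidence source, obtained evidence credibility through a divergence calculation to refine the evidence, derived the amount of difference information, and determined the final weight of each data source for fusion [5]. The information entropy approach focuses mainly on assessing the amount of information and cannot directly identify redundancy between data, so redundant data are still retained during fusion, which increases the complexity and computational cost of data processing. Kuang Guangsheng et al. used a graph clustering algorithm to identify similarities in the data and then fused similar data items [6]. Graph clustering relies mainly on similarity relations between data. When associations are missing from the dataset, however, the algorithm cannot accurately assign the affected data items to the same cluster, so the fusion result cannot fully reflect the true relationships among the data. Gong et al. proposed a multi-granularity, visually guided, entity-level fusion method for named entity recognition based on a multimodal heterogeneous graph, which builds a comprehensive multimodal representation by integrating cross-modal semantic interaction between text and vision at different visual granularities [7]. It uses the multimodal heterogeneous graph to precisely describe the semantic relations between entity-level words and visual objects, and a heterogeneous graph attention network to achieve fine-grained cross-modal semantic interaction. This markedly improves recognition accuracy, but the implementation is complex, which may reduce efficiency in application.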

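In multi-source data fusion, data redundancy and conflicts are common problems. Through steps such as deduplication and error correction, together with the construction of a relation network, a knowledge graph can reduce redundancy and conflicts and improve the accuracy and reliability of data fusion. By building a relation network among entities, a knowledge graph can also uncover latent associations between data and thereby fill in missing associations. This paper therefore studies a knowledge graph visualization fusion method for multi-source heterogeneous data that makes full use of the available data resources, avoids wasting data, and improves data utilization.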
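
As a rough illustration of the two roles described here (deduplication and discovery of latent associations), the following minimal Python sketch merges triples from two sources and then looks for entity pairs that are linked only through an intermediate entity. It is not the method of this paper: the alias table, the entity and relation names, and the helper functions `canonical`, `fuse`, and `two_hop_links` are all hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical alias table: source-specific names -> one canonical entity.
ALIASES = {"Transformer-01": "T1", "T1 transformer": "T1"}


def canonical(name):
    """Map a source-specific entity name to its canonical form."""
    return ALIASES.get(name, name)


def fuse(sources):
    """Merge (head, relation, tail) triples from several sources.

    Names are normalized first, so the same fact reported by two
    sources collapses into a single triple in the resulting set.
    """
    triples = set()
    for source in sources:
        for head, relation, tail in source:
            triples.add((canonical(head), relation, canonical(tail)))
    return triples


def two_hop_links(triples):
    """Return entity pairs connected only through one intermediate entity."""
    neighbors = defaultdict(set)
    for head, _, tail in triples:
        neighbors[head].add(tail)
    direct = {(head, tail) for head, _, tail in triples}
    found = set()
    for head, mids in neighbors.items():
        for mid in mids:
            for tail in neighbors.get(mid, ()):
                if tail != head and (head, tail) not in direct:
                    found.add((head, tail))
    return found


# Illustrative data only; these entities do not come from the paper.
source_a = [("Transformer-01", "located_in", "Substation A"),
            ("Substation A", "feeds", "Line 7")]
source_b = [("T1 transformer", "located_in", "Substation A")]

graph = fuse([source_a, source_b])
links = two_hop_links(graph)
print(len(graph), links)
```

In this toy case the two differently named records of the same transformer collapse into one triple, and the graph structure suggests a candidate association between `T1` and `Line 7` that neither source states directly. A real system would need entity alignment far more robust than a fixed alias table.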

For the full text of this article, please download:

http://www.ihrv.cn/resource/share/2000006561

Author information:

Liang Hao1, Fu Da2

(1. Plant Resource Technology Co., Ltd., Shenzhen, Guangdong 518055;

2. Beijing Jingneng Energy Technology Research Co., Ltd., Beijing 100020)

