Application of Electronic Technique
HTTP/HTTPS protocol asset discovery based on clustering
Ma Yan1,2, Su Majing1,2, Yao Wangjun1,2, Quan Xiaowen3, Liu Hong1,2
1. China Information Security Research Institute Co., Ltd.; 2. National Computer System Engineering Research Institute of China; 3. WebRAY Tech (Beijing) Co., Ltd.
Abstract: Network probing and scanning is an essential method for discovering network assets. HTTP/HTTPS accounts for a large share of probe results and is a key source for identifying Internet assets. As the network environment grows more complex, the variety and number of assets that use HTTP/HTTPS are increasing rapidly. Traditional asset identification methods based on fingerprint rules therefore suffer from low recognition efficiency and poor adaptability, and cannot meet the needs of HTTP/HTTPS asset identification. This paper proposes a new method for discovering HTTP/HTTPS protocol assets. The method processes HTTP/HTTPS response data with an automated rule generator, pre-filters the raw data using term frequency statistics and similarity information, applies a text encoding model to encode the HTTP/HTTPS response body and fuse the resulting features, and uses an unsupervised clustering algorithm to discover assets. Experimental results show that the proposed method significantly improves the efficiency of HTTP/HTTPS protocol asset discovery, speeds up asset labeling, and can discover unknown assets without prior knowledge.
CLC number: TP393.08 Document code: A DOI: 10.16157/j.issn.0258-7998.256341
Citation: Ma Yan, Su Majing, Yao Wangjun, et al. HTTP/HTTPS protocol asset discovery based on clustering[J]. Application of Electronic Technique, 2025, 51(11): 98-106.
Keywords: network asset discovery; HTTP/HTTPS protocols; automated rule generation; unsupervised clustering; Word2Vec; DBSCAN

Introduction

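Driven by digital transformation, the variety and number of network assets are growing exponentially, and network security faces increasingly complex challenges. Network assets include not only traditional network devices (such as network cameras and firewalls) but also a wide range of content management systems and network services. At present, network asset identification relies mainly on matching against static fingerprint rules. This approach performs well for known asset types, but its limitations are equally clear. First, building and maintaining fingerprint rules depends on expert experience and substantial manual effort. Second, methods based on a static fingerprint library respond slowly to new types of devices, which significantly lowers the identification rate for unknown asset types. These shortcomings limit the effectiveness and adaptability of current asset identification techniques based on fingerprint rule matching.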

To address these problems, this paper proposes a discovery method for network assets that use the HTTP/HTTPS protocol. An automated rule generator produces fingerprint rules from the HTTP/HTTPS data collected by active probing and filters that data. An unsupervised clustering method then partitions the network asset data by shared features, which enables automatic discovery of HTTP/HTTPS protocol assets. The method can discover unknown assets and improves labeling efficiency. The automated rule generator is based on a hierarchical grouping strategy: it progressively refines the dataset, extracts highly discriminative feature fields, and builds fingerprint rules capable of coarse classification, so that data with no shared asset features is filtered out. To handle the diversity of HTTP/HTTPS response header fields, the authors statistically analyzed a large-scale dataset of probe results and, drawing on expert experience, selected 21 response header fields for generating the automated filtering rules. On this basis, a multi-feature-fusion asset clustering algorithm was designed for the HTTP/HTTPS response body information of the pre-filtered data. The algorithm uses Word2Vec[1] for feature encoding to convert the processed data into feature vectors, and combines feature fusion with DBSCAN[2] clustering to cluster in the multi-dimensional feature space and thereby discover potential assets. Finally, experiments validate the effectiveness of the proposed method. The method improves the efficiency of HTTP/HTTPS protocol asset discovery and can discover unknown assets, which in turn improves the efficiency of fingerprint labeling and rule extraction.
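The two stages described above (header-based pre-filtering, then density clustering of response bodies) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: the sample records, the single header field `server`, and the thresholds `min_group`, `eps` and `min_samples` are all hypothetical; plain term-frequency vectors with cosine distance stand in for the Word2Vec encoding; feature fusion is omitted; and DBSCAN is hand-written so that the example needs only the standard library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical probe results; real input would carry many more header fields.
RESPONSES = [
    {"server": "nginx", "body": "<title>Acme Camera Login</title> acme camera web console"},
    {"server": "nginx", "body": "<title>Acme Camera Login</title> acme camera web console v2"},
    {"server": "nginx", "body": "<title>Acme Camera</title> acme camera console login"},
    {"server": "nginx", "body": "welcome to the default page"},
    {"server": "Apache", "body": "<title>Blog</title> powered by example cms theme"},
    {"server": "Apache", "body": "<title>News</title> powered by example cms theme dark"},
    {"server": "custom-xyz", "body": "it works"},
]

def prefilter(records, fields, min_group=2):
    """Group level by level on each header field; drop groups smaller than min_group."""
    groups = [records]
    for field in fields:
        refined = []
        for group in groups:
            buckets = defaultdict(list)
            for record in group:
                buckets[record.get(field, "")].append(record)
            refined.extend(b for b in buckets.values() if len(b) >= min_group)
        groups = refined
    return [record for group in groups for record in group]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_distance(a, b):
    """Cosine distance between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def dbscan(vectors, eps, min_samples):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

kept = prefilter(RESPONSES, ["server"])
vectors = [Counter(tokenize(r["body"])) for r in kept]
labels = dbscan(vectors, eps=0.3, min_samples=2)
print(labels)  # [0, 0, 0, -1, 1, 1]
```

In this toy run the record with the unique `server` value is removed before clustering, the three camera pages and the two CMS pages form separate clusters, and the generic welcome page is left as noise. Each resulting cluster is a candidate asset type that an analyst could then label, which is the workflow the paragraph above describes.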


The full text of this article can be downloaded from:

http://www.ihrv.cn/resource/share/2000006847


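Author information:

Ma Yan1,2, Su Majing1,2, Yao Wangjun1,2, Quan Xiaowen3, Liu Hong1,2

(1. China Information Security Research Institute Co., Ltd., Beijing 102200;

2. National Computer System Engineering Research Institute of China, Beijing 100083;

3. WebRAY Tech (Beijing) Co., Ltd., Beijing 100084)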



This content is original to the AET website. Reproduction without authorization is prohibited.
