Determining Sentence Similarity with Transformers

1 Introduction

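The previous article presented a way to determine sentence similarity with WMD (Word Mover's Distance). That method relies on vectors trained with Doc2Vec and gives better results than plain cosine similarity. This article uses a different approach, Transformers, to determine the similarity between sentences.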

2 How Transformers Works

Transformers first needs a model. Unlike WMD, this model is not trained on your own corpus; it is based on a pretrained model.

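    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('my corpus-model')  # placeholder; substitute one of the pretrained model names listed below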

Trained on NLI data:

    bert-base-nli-mean-tokens
    bert-large-nli-mean-tokens
    roberta-base-nli-mean-tokens
    roberta-large-nli-mean-tokens
    distilbert-base-nli-mean-tokens

Trained on STS data:

    bert-base-nli-stsb-mean-tokens
    bert-large-nli-stsb-mean-tokens
    roberta-base-nli-stsb-mean-tokens
    roberta-large-nli-stsb-mean-tokens
    distilbert-base-nli-stsb-mean-tokens

These models must be downloaded before they can be used.

Read your own file:

    with open('corpus-pfc.txt', 'r', encoding='utf-8') as infile:
        _c = infile.read()

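Convert the text into a list of sentences, keeping only non-empty lines that contain at least four space-separated words:

    corpus = [i for i in _c.split('\n') if i != '' and len(i.split(' ')) >= 4]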
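To make the filtering rule concrete, here is a minimal sketch that applies the same condition to a short sample string. The sample text and the helper name build_corpus are hypothetical and stand in for the real contents of corpus-pfc.txt.

```python
# Hypothetical sample standing in for the contents of corpus-pfc.txt.
sample_text = (
    "PFC2D PFC3D slope stability simulation\n"
    "\n"
    "slope stability\n"
    "Fluid coupling in PFC2D and PFC3D\n"
)


def build_corpus(text, min_words=4):
    """Split text into lines; keep lines with at least min_words space-separated tokens."""
    return [line for line in text.split('\n')
            if line != '' and len(line.split(' ')) >= min_words]


corpus = build_corpus(sample_text)
print(corpus)
```

In this sample the empty line and the two-word line are dropped, so only the two longer sentences remain in the list.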

Compute an embedding vector for every sentence:

    corpus_embeddings = model.encode(corpus)

Compute the embedding vector for the query sentence:

    queries = ['PFC2D PFC3D slope stability simulation']
    query_embeddings = model.encode(queries)

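Find the similar sentences by computing the cosine distance between the query and every corpus sentence; the smallest distances identify the most similar sentences:

    import scipy.spatial.distance

    for query, query_embedding in zip(queries, query_embeddings):
        # cosine distance from the query to every corpus sentence
        distances = scipy.spatial.distance.cdist(
            [query_embedding], corpus_embeddings, "cosine")[0]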
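The loop stops once the distances are computed, and the article does not show how the ranked "Similarity" values are derived. The sketch below shows one plausible ranking step, assuming the reported similarity is the cosine similarity (that is, 1 minus the cosine distance). The function names and the toy three-dimensional vectors are hypothetical; real embeddings would come from model.encode.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, sentences, vectors, k=10):
    """Return the k (sentence, similarity) pairs most similar to the query."""
    scored = [(s, cosine_similarity(query_vec, v))
              for s, v in zip(sentences, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real sentence embeddings (hypothetical values).
toy_sentences = ["sentence A", "sentence B", "sentence C"]
toy_vectors = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.0, 0.0]

for sentence, score in top_k(toy_query, toy_sentences, toy_vectors, k=2):
    print(f"{sentence} (Similarity: {score:.2f})")
```

With these toy vectors, "sentence A" matches the query exactly and "sentence B" scores 0.60. Because the corpus contains the query itself, a ranking like this would also put the query first with a similarity of 1.00.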

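3 Transformers Results

corpus-pfc.txt (in E:\Geotech\mydata) is used as the corpus. This file is an optimized PFC dataset produced in the previous article. The query sentence is the same as before:

    query = 'PFC2D PFC3D slope stability simulation'

The top 10 most similar results are: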

PFC2D PFC3D slope stability simulation (Similarity: 1.00)

PFC2D PFC3D slope stability (Similarity: 0.89)

slope instability, pfc2d, numerical simulation, parallel bond model, stability analysis (Similarity: 0.84)

PFC2D rock slopes stability simulation (Similarity: 0.81)

General two-dimensional slope stability analysis (Similarity: 0.81)

Then the particle discrete element software PFC2D is used to simulate the stability of slope excavation from the meso-mechanical level. (Similarity: 0.80)

"System reliability analysis of slope stability using generalized Subset Simulation". (Similarity: 0.80)

Application of distinct element analysis in slope stability problems (Similarity: 0.78)

Fluid coupling in PFC2D and PFC3D (Similarity: 0.78)

'Then the particle discrete element software PFC2D is used to simulate the stability of slope excavation from the meso-mechanical level.' (Similarity: 0.82) (Similarity: 0.77)

Running the same similarity query with WMD gives the following top 10 results:

PFC2D PFC3D slope stability simulation (Similarity: 1.00)

PFC2D Simulation on Stability of Loose Deposits Slope in Highway Cutting Excavation (Similarity: 0.99)

PFC2D PFC3D slope stability (Similarity: 0.98)

The PFC3D simulation platform was employed to calcaulate the single-hole blasting processes with different heights,buried depths and charge amounts in the open mine slope,and the slope stability after blasting was discussed. (Similarity: 0.98)

Simulation and analysis of the earthquake stability of the tailing reservoir based on PFC3D (Similarity: 0.98)

"Study on the similar materials simulation of the slope stability of the west-l zone in Luming Molybdenum Mine". (Similarity: 0.98)

NUMERICAL SIMULATION OF A FILLED SLOPE STABILITY ON SOFT SOIL ROADBED REINFORCED BY GRAVEL PILE USING PFC2D (Similarity: 0.98)

'DEM simulation pfc2d slope', (Similarity: 0.84) (Similarity: 0.98)

"The Numerical Simulation on the Stability of Steep Rock Slope by DDA". (Similarity: 0.97)

We show by simulation that the proposed robot model can walk down a slope passively and also verify the stability of this walking by calculating the eigenvalues of the Jacobian of the Poincare map. (Similarity: 0.97)

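Comparing the two, the Transformers results are somewhat better. Apart from the exact match, the WMD scores are bunched between 0.97 and 0.99 and the list includes an unrelated sentence about a walking robot, whereas the Transformers results all stay on slope stability or PFC topics.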

4 Clustering with Transformers

Transformers can also be used for clustering, by importing the KMeans class from the sklearn module:

    from sklearn.cluster import KMeans
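The article shows only the import. As a minimal sketch of how KMeans might then be applied, the example below clusters toy two-dimensional vectors and groups the sentences by cluster label. The toy sentences, the vectors, and the choice of two clusters are assumptions made for illustration; in practice the input would be corpus_embeddings, and the article does not state how many clusters it used.

```python
from sklearn.cluster import KMeans

# Toy 2-D vectors standing in for corpus_embeddings (hypothetical values).
toy_sentences = ["failure A", "failure B", "fluid A", "fluid B"]
toy_embeddings = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]

num_clusters = 2  # assumed; the article does not say how many clusters it used
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
kmeans.fit(toy_embeddings)

# Group sentences by the cluster label assigned to each embedding.
clusters = {}
for sentence, label in zip(toy_sentences, kmeans.labels_):
    clusters.setdefault(int(label), []).append(sentence)

for label, members in sorted(clusters.items()):
    print(label, members)
```

The numeric label given to each cluster is arbitrary, so only the grouping is meaningful. Here the two "failure" sentences end up together, as do the two "fluid" sentences.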

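Below is one of the clusters obtained from the corpus. Word-frequency statistics show that the theme of this cluster is "Failure". Clustering helps us focus on one class of topics.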

'3-D Granular Simulation on the Process of Slope Failure and Collapse',

'Failure process simulation of sliding unstable rock based on PFC2D',

'rock slope; step-path failure; rock bridge; slope stability; PFC2D',

'Similar to slope stability failure',

'The effect of discontinuity Persistence an Rock Slope Stability',

'slope stability 1; wedge failure',

'Jointed rock slope Step-path failure Rock bridges Slope stability PFC',

'rock slope step-path failure rock bridge slope stability after blasting was discussed.'
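The article does not show how the word frequencies were computed. One possible check, sketched below, counts how many of the eight titles above contain each word. It assumes that the words of the query itself are excluded, since most of the corpus shares them; that exclusion and the simple tokenizer are assumptions, not the author's stated method.

```python
import re
from collections import Counter

# The eight titles of the cluster listed above.
cluster_titles = [
    "3-D Granular Simulation on the Process of Slope Failure and Collapse",
    "Failure process simulation of sliding unstable rock based on PFC2D",
    "rock slope; step-path failure; rock bridge; slope stability; PFC2D",
    "Similar to slope stability failure",
    "The effect of discontinuity Persistence an Rock Slope Stability",
    "slope stability 1; wedge failure",
    "Jointed rock slope Step-path failure Rock bridges Slope stability PFC",
    "rock slope step-path failure rock bridge slope stability after blasting was discussed.",
]

# Assumption: query words are excluded because most of the corpus shares them.
query_terms = set("PFC2D PFC3D slope stability simulation".lower().split())

# Count the number of titles in which each remaining word appears.
doc_freq = Counter()
for title in cluster_titles:
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    doc_freq.update(words - query_terms)

print(doc_freq.most_common(2))  # [('failure', 7), ('rock', 5)]
```

Under these assumptions "failure" appears in seven of the eight titles, which agrees with the theme named for this cluster.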

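5 Conclusion

This article used Transformers to determine the similarity between sentences. The results from Transformers were better than those from WMD, and clustering with Transformers helps us focus on one class of topics. Further Transformers features will be explored in future work.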

Source: 计算岩土力学
First published: 2022-09-28