Detecting Semantic-based Similarity between Verses of the Quran with Doc2vec
Abstract
Semantic similarity analysis of natural language text has recently attracted considerable attention. Semantic analysis of the Quran is especially challenging because the text is not simply factual but encodes subtle religious meanings. Investigating similarity and relatedness between Quranic verses is an active research topic and can help uncover the knowledge they contain. We therefore use an NLP method to detect semantic-based similarity between the verses of the Quran. The idea is to exploit distributed representations of text (Doc2vec) to learn an informative representation of the Quran's passages. We map the Arabic Quranic verses to numerical vectors that encode the semantic properties of the text, and then measure the similarity among those vectors. We evaluate the model by comparing the similarity scores it assigns, based on cosine similarity, against annotated textual similarity datasets. Our model achieved 76% accuracy in detecting similarity, and it can serve as a basis for further experiments and research.
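The comparison step can be illustrated with a minimal sketch. It assumes that each verse has already been mapped to a fixed-length vector (for instance, in gensim 4 a trained Doc2vec model exposes document vectors through `model.dv`). The verse vectors, their dimension, and the helper names `cosine_similarity` and `most_similar` below are hypothetical placeholders, not the authors' code or data. The sketch only shows how cosine similarity ranks verses against a query verse.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def most_similar(
    query_id: str, vectors: Dict[str, Sequence[float]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank all other verses by cosine similarity to the query verse (hypothetical helper)."""
    query = vectors[query_id]
    scores = [
        (verse_id, cosine_similarity(query, vec))
        for verse_id, vec in vectors.items()
        if verse_id != query_id
    ]
    # Highest similarity first.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Hypothetical 4-dimensional vectors standing in for learned Doc2vec verse embeddings;
# real embeddings would come from a trained model and have many more dimensions.
verse_vectors: Dict[str, List[float]] = {
    "1:1": [0.2, 0.8, 0.1, 0.0],
    "1:2": [0.25, 0.7, 0.05, 0.1],
    "2:255": [0.9, 0.1, 0.3, 0.4],
}

for verse_id, score in most_similar("1:1", verse_vectors):
    print(f"{verse_id}: {score:.3f}")
```

Cosine similarity is used here because it compares the direction of two embeddings independently of their length, which is the usual choice for Doc2vec-style vectors. A similarity threshold or a ranking like the one above could then be checked against human annotations.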