Towards improved Saudi dialectal Arabic stemming
Abstract
Arabic is a highly inflected language whose morphological structure is more complex than that of English, which makes stemming mechanisms difficult to develop. Stemming is even more challenging for dialectal Arabic, which does not follow the grammar rules of standard Arabic for term conflation and affix attachment. In this paper, we propose a new stemmer that integrates two techniques to stem Saudi words: the Information Science Research Institute (ISRI) Arabic stemmer and a rule-based stemmer developed in-house. We applied our stemming approach to a corpus collected from Saudi dialect tweets, where it demonstrated improved stemming accuracy.
Keywords
Pattern matching, Natural language processing, Tools, Manuals, Task analysis, Social networking (online), Tagging
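
The two-stage idea summarized in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the affix lists, the minimum stem length, the names `strip_dialect_affixes` and `hybrid_stem`, and the order in which the two stages run are hypothetical and are not taken from the paper. The standard-Arabic stage is passed in as a callable; in practice it could be the `stem` method of NLTK's `ISRIStemmer` (from `nltk.stem.isri`), while the default here is an identity function so the sketch runs without extra dependencies.

```python
from typing import Callable, Sequence

# Hypothetical dialect affixes, chosen only for illustration.
# They are NOT the rule set used in the paper.
DIALECT_PREFIXES: Sequence[str] = ("ح", "ب")
DIALECT_SUFFIXES: Sequence[str] = ("ين", "ون")

# Assumed guard: never strip an affix if fewer than 3 letters would remain.
MIN_STEM_LEN = 3


def strip_dialect_affixes(word: str) -> str:
    """Remove at most one listed prefix and at most one listed suffix."""
    for prefix in DIALECT_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in DIALECT_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


def hybrid_stem(
    word: str,
    standard_stemmer: Callable[[str], str] = lambda w: w,
) -> str:
    """Apply the rule-based dialect stage, then a standard-Arabic stemmer.

    The ordering (rules first, standard stemmer second) is an assumption
    of this sketch, not a description of the paper's pipeline.
    """
    return standard_stemmer(strip_dialect_affixes(word))


if __name__ == "__main__":
    for token in ("حيروح", "رايحين", "بيت"):
        print(token, "->", hybrid_stem(token))
```

With NLTK installed, the second stage could be supplied as `hybrid_stem(token, standard_stemmer=ISRIStemmer().stem)`. The length guard is there so that short words that merely begin with a listed letter are left untouched by the rule-based stage.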