Emotion Classification in Dialectal Arabic Social Media Texts Using Deep Learning and Natural Language Processing Techniques

Master's thesis

Researcher: زكريا حسين علي صالح

Supervisor: Asst. Prof. Dr. هبة جبار عبدالواحد

College: College of Computer Science and Information Technology

Specialization: Computer Science

Year of publication: 2025

Summary

Given the wide popularity and growing influence of social media, emotion classification in written text has attracted considerable attention and has been applied in areas such as sentiment analysis, public opinion mining, and mental health monitoring.

Arabic, and in particular its dialects such as Iraqi Arabic, is among the most challenging languages for computational processing because of its morphological complexity, the scarcity of labeled datasets, and the informal style that dominates Arabic text on social media platforms.

The strength of BERT lies in its ability to extract context from text through contextual word representations, whereas LSTM is effective at capturing long-range dependencies in text sequences, which helps classification. Accordingly, this study proposes a hybrid approach that combines BERT and LSTM to improve emotion classification in Arabic social media texts, with a particular focus on the Iraqi dialect.

Three emotion-labeled datasets were used: IAEC for the Iraqi dialect, ArPanEmo for the Saudi dialect, and AETD for the Egyptian dialect.

To address dialectal differences and data scarcity, several text preprocessing techniques were applied, including normalization, conversion of emojis to text, and synonym-based data augmentation.

The results show that BERT alone achieved an accuracy of 81%, while accuracy rose to 84% with the proposed hybrid model. These results indicate that combining contextual and sequential learning achieves the best performance compared with previous models in Arabic sentiment analysis. The study thus offers a reliable contribution to Arabic natural language processing (Arabic NLP) by proposing an effective and scalable approach to emotion detection in dialectal text.

Emotion Classification in Dialectal Arabic Social Media Texts Using Deep Learning and Natural Language Processing Techniques

Abstract

Owing to the widespread popularity and growing influence of social media, emotion classification in text has received considerable attention and has been applied in areas such as sentiment analysis, public opinion mining, and mental health monitoring.

Arabic, and dialectal Arabic such as Iraqi Arabic in particular, poses distinct challenges for language processing because of its morphological complexity, the limited availability of labeled datasets, and the informal style of much Arabic writing on social media.

BERT's strength lies in extracting context from text through contextual word representations, while LSTM captures long-term dependencies in text sequences, which helps classification. This work therefore proposes a hybrid deep learning approach that combines BERT and LSTM to improve emotion classification in Arabic social media texts, with a particular focus on the Iraqi dialect.
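One way to picture such a hybrid, offered as a minimal sketch rather than the architecture used in the thesis: token-level states from an encoder are passed through an LSTM, pooled, and classified. The abstract does not say how BERT and LSTM are connected, so the class name `BertLstmClassifier`, the bidirectional LSTM, the mean pooling, and all layer sizes are assumptions. `ToyEncoder` is a hypothetical stand-in so the sketch runs without downloading weights; in practice a pretrained BERT model (for example one loaded with `AutoModel.from_pretrained` from the Hugging Face `transformers` library, whose output also exposes `last_hidden_state`) would take its place.

```python
from types import SimpleNamespace

import torch
from torch import nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained BERT encoder (assumption)."""

    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids, attention_mask=None):
        # Mimic the `.last_hidden_state` attribute of a BERT output object.
        return SimpleNamespace(last_hidden_state=self.embed(input_ids))


class BertLstmClassifier(nn.Module):
    """Encoder token states -> BiLSTM -> masked mean pooling -> linear layer."""

    def __init__(self, encoder, hidden_size, lstm_units, num_classes):
        super().__init__()
        self.encoder = encoder
        self.lstm = nn.LSTM(hidden_size, lstm_units,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_units, num_classes)

    def forward(self, input_ids, attention_mask):
        # Contextual token representations: (batch, seq_len, hidden_size).
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Sequential modelling over the token states.
        # Simplification: padded positions are not packed away here.
        lstm_out, _ = self.lstm(states)
        # Average only over real (non-padding) tokens.
        mask = attention_mask.unsqueeze(-1).to(lstm_out.dtype)
        pooled = (lstm_out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        # Unnormalised class scores (logits): (batch, num_classes).
        return self.classifier(pooled)


if __name__ == "__main__":
    model = BertLstmClassifier(ToyEncoder(vocab_size=100, hidden_size=32),
                               hidden_size=32, lstm_units=16, num_classes=4)
    ids = torch.randint(0, 100, (2, 7))
    mask = torch.ones(2, 7, dtype=torch.long)
    print(model(ids, mask).shape)
```

The number of emotion classes is left as a parameter because the abstract does not state it.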

Three emotion-labeled datasets covering different dialects were used: the Iraqi Arabic Emotion Corpus (IAEC) for the Iraqi dialect, the Arabic Pan-Arab Emotion Dataset (ArPanEmo) for the Saudi dialect, and the Arabic Emotions Twitter Dataset (AETD) for the Egyptian dialect. To address dialectal differences and data scarcity, several text preprocessing techniques were applied, including normalization, emoji translation (converting emojis to text), and synonym-based data augmentation.
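The three preprocessing steps named above can be illustrated with a small standard-library sketch. It is not the thesis's pipeline: the specific normalization rules (removing diacritics and tatweel, unifying alef variants and alef maqsura, collapsing elongated letters to two), the tiny `EMOJI_TO_TEXT` and `SYNONYMS` tables, and all function names are assumptions chosen for illustration.

```python
import random
import re

# Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Unify alef variants to bare alef, and alef maqsura to ya (assumed rules).
_LETTER_MAP = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ى": "ي"})
# Three or more repeats of the same character (elongation).
_REPEATS = re.compile(r"(.)\1{2,}")

# Hypothetical lookup tables; a real system would need far larger ones.
EMOJI_TO_TEXT = {"😂": "ضحك", "😢": "حزن", "😡": "غضب"}
SYNONYMS = {"سعيد": ["فرحان", "مبسوط"]}


def normalize(text):
    """Apply simple orthographic normalization to Arabic text."""
    text = _DIACRITICS.sub("", text)
    text = text.translate(_LETTER_MAP)
    return _REPEATS.sub(r"\1\1", text)  # keep at most two repeats


def translate_emojis(text):
    """Replace known emojis with an Arabic word and tidy whitespace."""
    for emoji, word in EMOJI_TO_TEXT.items():
        text = text.replace(emoji, f" {word} ")
    return " ".join(text.split())


def augment(text, rng, p=0.5):
    """Swap each token that has a known synonym with probability p."""
    out = []
    for token in text.split():
        options = SYNONYMS.get(token)
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    cleaned = translate_emojis(normalize("أنا سعيد 😂"))
    print(cleaned)
    print(augment(cleaned, random.Random(0), p=1.0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which matters when augmented copies are added to a training set.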

Using BERT alone for classification yielded an accuracy of 81%, while the proposed hybrid model raised accuracy to 84%.
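To put the two figures in perspective, the short sketch below assumes that accuracy means the fraction of texts whose predicted label matches the gold label (the abstract does not define the metric). Both helper names are hypothetical. Under that assumption, the gain from 81% to 84% is 3 percentage points, which corresponds to removing roughly 16% of the errors made by BERT alone (0.03 / 0.19).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def relative_error_reduction(base_acc, new_acc):
    """Share of the baseline's errors removed by the new model."""
    base_err = 1.0 - base_acc
    new_err = 1.0 - new_acc
    return (base_err - new_err) / base_err


if __name__ == "__main__":
    # Reported accuracies: BERT alone vs. the hybrid model.
    print(round(relative_error_reduction(0.81, 0.84), 3))
```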

These results show that combining sequential and contextual learning achieves higher accuracy than previous Arabic sentiment analysis models. The study makes a credible contribution to Arabic natural language processing (NLP) by proposing a scalable and efficient method for emotion detection in dialectal text.