Cyberbullying Classification and Detection in Twitter Using Data Mining Techniques

Master's Thesis

Researcher: فاطمة نادي علي حسين

Supervisor: هبة جبار العقابي

Keywords:

College: College of Computer Science and Information Technology

Specialization: Computer Science

Year of publication: 2024

Summary

The rapid growth of social media has given rise to new forms of cyberbullying. Social media platforms such as Facebook, Twitter, and YouTube have become a major source of concern for individuals, organizations, and society as a whole. Early detection and interception of cyberbullying is critical to mitigating its harmful effects.

The proposed system comprises two models. The first model uses two multi-class datasets and applies text mining to classify tweets into multiple classes using different techniques. The second model uses social network analysis (SNA) to detect the influential users who spread bullying within communities, along with the bullying content associated with them.

In the first model, the feature extraction step uses TF-IDF with bag-of-words (BoW), and Word2Vec. For classification, four machine learning algorithms are used: Random Forest (RF), Support Vector Machine (SVM), K-nearest neighbors (KNN), and Naïve Bayes (NB). The second model uses three centrality measures: degree centrality (DC), betweenness centrality (BC), and closeness centrality (CC).

The results of the first model showed the effectiveness of the first dataset, the “Cyberbullying Classification Dataset”, which reached accuracy and precision of 93% and 87%, respectively. The second dataset, the “Cyber bullying Types Dataset”, reached accuracy and precision of 89% and 90%, respectively. These results led to the selection of the “Cyberbullying Classification Dataset” as suitable data for social network analysis (SNA).

The study concluded that social network analysis (SNA) revealed valuable insights into cyberbullying detection, with particular emphasis on frequent user mentions (influential users) and high centrality measures as reliable indicators. The stability of hashtags over time also plays an important role in identifying bullying-related content.

CYBERBULLYING CLASSIFICATION AND DETECTION IN TWITTER USING DATA MINING TECHNIQUES

Abstract

The rapid growth of social media has given rise to new forms of bullying. Bullying on platforms such as Facebook, Twitter, and YouTube has become a significant concern for individuals, organizations, and society as a whole. Early detection of cyberbullying on social media, and early intervention against it, are critical to mitigating its harmful effects.
The proposed system involved two models. The first model used two multi-class datasets and applied text mining to classify the tweets into multiple classes using different techniques. The second model used social network analysis (SNA) to detect the influential users who disseminated bullying in communities and the bullying content associated with them.
In the first model, the feature extraction step uses TF-IDF with bag-of-words (BoW), and Word2Vec. For classification, four supervised machine learning algorithms are used: Random Forest (RF), Support Vector Machine (SVM), K-nearest neighbors (KNN), and Naïve Bayes (NB). The second model used three centrality measures: degree centrality (DC), betweenness centrality (BC), and closeness centrality (CC).
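
As a rough illustration of the first model's pipeline, the sketch below weights bag-of-words counts with TF-IDF and classifies a tweet with KNN, one of the four algorithms listed above. It uses only the Python standard library. The toy tweets, the class labels, the whitespace tokenizer, the smoothed IDF formula, cosine similarity, and k=3 are all assumptions made for this example, not details taken from the thesis.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real tweet preprocessing)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training tweets."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """Bag-of-words counts weighted by IDF and L2-normalised; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(tweet, train_vecs, train_labels, idf, k=3):
    """Majority label among the k training tweets most similar to `tweet`."""
    query = tfidf(tweet, idf)
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data; the labels are illustrative, not the datasets' real classes.
train_tweets = [
    "you are so ugly and stupid",
    "nobody likes you stupid loser",
    "go back to your country",
    "people from your country are not welcome",
    "lovely weather in the park today",
    "great match today with friends",
]
train_labels = [
    "insult", "insult",
    "ethnicity", "ethnicity",
    "not_bullying", "not_bullying",
]

idf = fit_idf(train_tweets)
train_vecs = [tfidf(t, idf) for t in train_tweets]
prediction = knn_predict("you stupid ugly loser", train_vecs, train_labels, idf, k=3)
print(prediction)
```

In practice a library vectorizer and classifier would likely replace these hand-written helpers; the point here is only to show how the feature extraction step feeds the classification step.
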
The results of the first model demonstrated the effectiveness of the first dataset, the “Cyberbullying Classification Dataset,” which reached accuracy and precision of 93% and 87%, respectively, with minimal computational time. The second dataset, the “Cyberbullying Types Dataset,” reached accuracy and precision of 89% and 90%, respectively. These results led us to select the “Cyberbullying Classification Dataset” as a suitable candidate for social network analysis (SNA).
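
Accuracy and precision measure different things, which is why the two figures reported for each dataset need not agree. The sketch below computes both on a small set of made-up predictions. The abstract does not say how precision was averaged across classes, so the macro average used here is an assumption, and the labels and predictions are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        predicted = sum(p == label for p in y_pred)
        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        # A class that is never predicted contributes a precision of 0.
        per_class.append(true_pos / predicted if predicted else 0.0)
    return sum(per_class) / len(per_class)

# Hypothetical labels and predictions for six tweets.
y_true = ["insult", "insult", "ethnicity", "ethnicity", "not_bullying", "not_bullying"]
y_pred = ["insult", "ethnicity", "ethnicity", "ethnicity", "not_bullying", "insult"]

acc = accuracy(y_true, y_pred)
prec = macro_precision(y_true, y_pred)
print(f"accuracy={acc:.3f} macro-precision={prec:.3f}")
```
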
In conclusion, SNA revealed valuable insights into cyberbullying detection, with a particular focus on frequent user mentions (influential users) and high centrality measures as reliable indicators. The stability of hashtags over time also played a critical role in identifying problematic content.
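
To make the three centrality measures concrete, the sketch below computes degree, closeness, and betweenness centrality on a tiny mention graph using only the Python standard library. The user names and mention pairs are hypothetical, and treating mentions as undirected edges in a connected graph is a simplifying assumption; the abstract does not state how the thesis built its network. Betweenness follows Brandes' algorithm, and all three scores are normalised to the range 0 to 1.

```python
from collections import defaultdict, deque

def build_graph(mentions):
    """Undirected graph: an edge joins a tweet author and each mentioned user."""
    graph = defaultdict(set)
    for author, mentioned in mentions:
        graph[author].add(mentioned)
        graph[mentioned].add(author)
    return graph

def bfs(graph, source):
    """Distances, shortest-path counts, predecessors, and visit order from source."""
    dist = {source: 0}
    sigma = defaultdict(int)
    sigma[source] = 1
    preds = defaultdict(list)
    order = []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
                preds[v].append(u)
    return dist, sigma, preds, order

def degree_centrality(graph):
    """Share of the other users each user is directly connected to."""
    n = len(graph)
    return {u: len(nbrs) / (n - 1) for u, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable-user count divided by the summed shortest-path distances."""
    result = {}
    for u in graph:
        dist, _, _, _ = bfs(graph, u)
        total = sum(dist.values())
        result[u] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness_centrality(graph):
    """Brandes' algorithm, normalised for an undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        _, sigma, preds, order = bfs(graph, s)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(graph)
    # Each unordered pair is counted twice above, so this equals the usual
    # undirected normalisation of 2 / ((n - 1) * (n - 2)) applied to half the sum.
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {u: b * scale for u, b in score.items()}

# Hypothetical (author, mentioned user) pairs extracted from tweets.
mentions = [("u1", "bully"), ("u2", "bully"), ("u3", "bully"), ("u3", "u4")]
graph = build_graph(mentions)

dc = degree_centrality(graph)
cc = closeness_centrality(graph)
bc = betweenness_centrality(graph)
most_central = max(dc, key=dc.get)
print(most_central, round(dc[most_central], 3), round(cc[most_central], 3), round(bc[most_central], 3))
```

In this toy graph the frequently mentioned account scores highest on all three measures, which mirrors the idea that frequent mentions and high centrality can flag influential users.
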