Title : | Une approche connexionniste pour l’intelligence territoriale basée sur les réseaux sociaux |
Authors : | Wedjdane Nahili, Author ; Khaled Rezeg, Thesis supervisor |
Document type : | Doctoral thesis |
Publisher : | Biskra [Algeria] : Faculté des Sciences Exactes et des Sciences de la Nature et de la Vie, Université Mohamed Khider, 2020 |
Format : | 1 vol. (149 p.) / color illustrated cover / 30 cm |
Languages: | English |
Keywords: | sentiment analysis, natural language processing, text analytics, text mining, deep learning, convolutional neural networks, IMDb dataset |
Abstract : |
Long before the invention of the Internet, purchasing decisions and customer behaviour relied on word of mouth, as it was the only channel for obtaining feedback and customer reviews. In many cases, buying choices were made with a leap of faith, in the hope that the purchase would turn out to be everything we expected. With the rise of Web 2.0, customers share information and opinions online about products and services, politics, and current events. As a result, people and organisations refer to this information to harvest valuable insights and hence make intelligent decisions. This shared information is a goldmine that, if leveraged effectively, can provide rich and valuable insights. The problem is that it is informal and unstructured, and thus difficult to assess automatically and in huge volumes. Accordingly, these data require appropriate processing to yield useful information. Sentiment analysis (SA) is used to extract knowledge from online data; research in the field of SA seeks to extract sentiment from textual data. In this thesis, two approaches are provided to conduct sentiment analysis on text. The first is a lexicon-based approach for multi-class Twitter sentiment analysis, which develops a sentiment lexicon specific to the social media domain. The second is a deep learning approach for binary-class sentiment analysis of reviews, which proposes a convolutional neural network (CNN). This research uses publicly accessible data, i.e. Twitter and movie review datasets, to evaluate the reliability and validity of the proposed frameworks. Experiments were conducted using the proposed methodologies. First, the lexicon-based approach was evaluated on Twitter data; the results show that the developed lexicon is able to capture sentiment intensity and handle social media text. Second, the proposed CNN model was trained and tested using the IMDb dataset, with accuracy as the evaluation metric. A sizeable performance improvement was reported, whereby the proposed network yielded better results compared to prior models from the related work. |
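As a rough illustration of the lexicon-based idea summarised in the abstract, the sketch below scores a short text by summing word scores, scaling them with intensifiers and flipping them after a negation. It is only a minimal sketch under stated assumptions: the lexicon entries, the intensifier weights, the negation list, the threshold and the function names `polarity_score` and `classify` are all hypothetical and are not taken from the thesis.

```python
# Hypothetical mini-lexicon: words and scores are illustrative only,
# not the lexicon developed in the thesis.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "awful": -3.0, "lol": 1.0}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "slightly": 0.5}  # assumed weights
NEGATIONS = {"not", "no", "never"}                             # assumed list


def polarity_score(text: str) -> float:
    """Sum lexicon scores; intensifiers scale and negations flip
    the next sentiment-bearing word."""
    score = 0.0
    scale = 1.0     # multiplier carried over from preceding intensifiers
    negate = False  # True once a negation word has been seen
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in NEGATIONS:
            negate = True
        elif word in INTENSIFIERS:
            scale *= INTENSIFIERS[word]
        elif word in LEXICON:
            value = LEXICON[word] * scale
            score += -value if negate else value
            scale, negate = 1.0, False  # modifiers are consumed
    return score


def classify(text: str, threshold: float = 0.5) -> str:
    """Map the score to one of three classes (threshold is an assumption)."""
    score = polarity_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

For instance, `classify("The plot was awful!")` falls in the negative class, while a text with no lexicon words is labelled neutral. A real social media lexicon would also need to cover emoticons, slang and multi-word terms, which this sketch leaves out.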
Contents : |
Acknowledgements
List of Figures
List of Tables
Abbreviations
1 Introduction
  1.1 Research background
  1.2 Problem statement and research questions
  1.3 Aim and objectives
  1.4 Structure of the Thesis
2 Literature Review
  2.1 Introduction
  2.2 The impact of social networks
  2.3 Sentiment Analysis in Social Media
  2.4 The notion of sentiment analysis
  2.5 Depth of sentiment analysis
    2.5.1 Document level
    2.5.2 Sentence level
    2.5.3 Aspect level
    2.5.4 User level
  2.6 Development of sentiment analysis
    2.6.1 Text interpretation
    2.6.2 Text annotation for information extraction
    2.6.3 Web mining or Text mining
  2.7 Approaches of sentiment analysis
    2.7.1 Semantic orientation approach
    2.7.2 Corpus-based method for sentiment analysis
    2.7.3 Dictionary-based method for sentiment analysis
    2.7.4 Linguistic approach for sentiment analysis
  2.8 Machine learning approach
    2.8.1 Sentiment analysis using supervised learning
    2.8.2 Sentiment analysis using unsupervised learning
    2.8.3 Feature Selection
    2.8.4 Evaluation metrics
  2.9 Supervised learning methods for SA
  2.10 Domains of sentiment analysis
    2.10.1 Twitter Sentiment Analysis
    2.10.2 Sentiment analysis of reviews (product/movie)
  2.11 Comparison of semantic and supervised learning approaches
  2.12 Challenges of research in sentiment analysis
  2.13 Research gaps in sentiment analysis
  2.14 Conclusion
3 Proposed lexicon-based approach for sentiment analysis of tweets
  3.1 Introduction
  3.2 The specifics of research on Twitter data
    3.2.1 Sentiment Analysis of Twitter data
  3.3 Proposed lexicon-based approach for Twitter SA
    3.3.1 Sentiment lexicon development
      3.3.1.1 Word collection
    3.3.2 Score value assignment
    3.3.3 Acquisition of single word terms for the lexicon
    3.3.4 Acquisition of multi-word terms for the lexicon
    3.3.5 Handling intensifiers
    3.3.6 Handling negation
    3.3.7 Evaluation of sentiment lexicon
    3.3.8 Text segmentation
    3.3.9 Polarity calculation
    3.3.10 Naive Bayes classifier
  3.4 Implementation and Experimental Results
  3.5 Experimental setup
  3.6 Data collection
    3.6.1 Twitter API
  3.7 Data pre-processing
    3.7.1 Standard pre-processing
    3.7.2 Twitter specific pre-processing
  3.8 Feature selection
    3.8.1 Part of speech tagging
    3.8.2 TF-IDF (Term Frequency-Inverse Document Frequency)
  3.9 Experimental results
  3.10 Conclusion
4 Proposed deep learning approach for sentiment analysis of movie reviews
  4.1 Introduction
  4.2 Deep learning based approach for text analytics
    4.2.1 Proposed emb-CNN model
      4.2.1.1 Input layer
      4.2.1.2 Embedding layer (word2vec)
      4.2.1.3 Convolutional layer
      4.2.1.4 Flatten layer
      4.2.1.5 Regularisation
      4.2.1.6 Fully activated layer (Dense)
      4.2.1.7 Optimization
  4.3 Implementation and experimental results
    4.3.1 Proposed Model: CNN and word2vec for sentence-level SA
    4.3.2 Data collection
    4.3.3 Data pre-processing
    4.3.4 Feature selection
  4.4 Experimental setup
  4.5 Experimental results and discussion
  4.6 Conclusion
5 Conclusion
  5.1 Synopsis of the Thesis
  5.2 Thesis contributions
  5.3 Affect of the results
  5.4 Limitations and Future Research
    5.4.1 Limitations
    5.4.2 Suggestions for Future Research
A Sentiment Lexicon
B Publications and Communications
Bibliography |
Online : | http://thesis.univ-biskra.dz/id/eprint/4993 |
Availability (1)
Call number | Medium | Location | Status |
---|---|---|---|
TINF/149 | Doctoral theses | Exact Sciences library | Available for consultation |