Sentiment analysis

Sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials.

Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgement or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).

Contents

1 Subtasks

2 Methods

3 Evaluation

4 Sentiment analysis and Web 2.0

5 See also

6 References

7 Further reading

Subtasks

A basic task in sentiment analysis^[1] is classifying the polarity of a given text at the document, sentence, or feature/aspect level — whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as "angry," "sad," and "happy."

Early work in this area includes Turney ^[2] and Pang ^[3], who applied different methods for detecting the polarity of product reviews and movie reviews, respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang ^[4] and Snyder ^[5] (among others): Pang ^[4] expanded the basic task of classifying a movie review as either positive or negative to predicting star ratings on either a three- or four-star scale, while Snyder ^[5] performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).

A different method for determining sentiment is the use of a scaling system in which words commonly associated with a negative, neutral, or positive sentiment are assigned a number on a -5 to +5 scale (most negative to most positive). When a piece of unstructured text is analyzed using natural language processing, the extracted concepts are examined to determine how these words relate to each concept^{[citation needed]}. Each concept is then given a score based on the way sentiment words relate to it and on their associated scores. This allows a more sophisticated understanding of sentiment based on an 11-point scale. Alternatively, a text can be given separate positive and negative sentiment strength scores if the goal is to determine the sentiment in the text rather than its overall polarity and strength ^[6].
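The lexicon-based scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon entries, their scores, and the function names are hypothetical, and the sketch scores a whole text rather than individual concepts, since concept extraction would require a parser.

```python
import re

# Hypothetical toy lexicon: integer valence from -5 (most negative)
# to +5 (most positive). Words and scores are made up for illustration.
LEXICON = {"excellent": 5, "good": 3, "okay": 0, "bad": -3, "terrible": -5}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def overall_score(text, lexicon=LEXICON):
    """Average the scores of all lexicon words found; result lies in [-5, +5]."""
    scores = [lexicon[t] for t in tokenize(text) if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def strength_scores(text, lexicon=LEXICON):
    """Return (positive_strength, negative_strength) for a text.

    positive_strength is the largest positive word score found (0 if none);
    negative_strength is the most negative word score found (0 if none).
    """
    scores = [lexicon[t] for t in tokenize(text) if t in lexicon]
    positive = max((s for s in scores if s > 0), default=0)
    negative = min((s for s in scores if s < 0), default=0)
    return positive, negative


review = "The food was excellent but the service was terrible"
print(overall_score(review))    # single polarity value on the 11-point scale
print(strength_scores(review))  # separate positive and negative strengths
```

For a mixed review such as the one above, the averaged score cancels out while the pair of strength scores still records that strong positive and strong negative sentiment are both present, which is the distinction the paragraph draws.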

Another research direction is subjectivity/objectivity identification. This task is commonly ^[7] defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification ^[8]: the subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su ^[9], results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang ^[10] showed that removing objective sentences from a document before classifying its polarity helped improve performance.

A more fine-grained analysis model is called feature/aspect-based sentiment analysis.^[11] It refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone or the picture quality of a camera. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether an opinion expressed on each feature/aspect is positive, negative, or neutral.^[12] More detailed discussions of this level of sentiment analysis can be found in Liu's NLP Handbook chapter, "Sentiment Analysis and Subjectivity".^[13]
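As a rough illustration of the last two sub-problems (extracting aspects and assigning each a polarity), the following Python sketch looks for opinion words within a small window around each known aspect term. The aspect list, opinion lexicon, window size, and function name are all hypothetical; the cited systems use considerably richer extraction methods.

```python
import re

# Hypothetical aspect terms and opinion words, chosen for illustration only.
ASPECTS = {"screen", "battery", "camera"}
OPINIONS = {"great": 1, "sharp": 1, "poor": -1, "weak": -1}


def aspect_sentiment(text, window=2):
    """Map each aspect term found in the text to a polarity label.

    The polarity is the sign of the summed scores of opinion words
    that occur within `window` tokens of the aspect term. If an aspect
    occurs more than once, the last occurrence determines its label.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    result = {}
    for i, token in enumerate(tokens):
        if token not in ASPECTS:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        score = sum(OPINIONS.get(t, 0) for t in tokens[start:end])
        if score > 0:
            result[token] = "positive"
        elif score < 0:
            result[token] = "negative"
        else:
            result[token] = "neutral"
    return result


print(aspect_sentiment("The screen is sharp but the battery is weak"))
```

A fixed token window is a crude stand-in for real grammatical analysis: it will misattribute opinions in longer or more complex sentences.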

Methods

Computers can perform automated sentiment analysis of digital texts using techniques such as latent semantic analysis, support vector machines, "bag of words" models, and semantic orientation based on pointwise mutual information (see Peter Turney's ^[2] work in this area). More sophisticated methods try to detect the holder of a sentiment (i.e., the person who maintains that affective state) and the target (i.e., the entity about which the affect is felt).^[14] To mine the opinion in context and identify the feature on which an opinion has been expressed, the grammatical relationships between words are used. Grammatical dependency relations are obtained by deep parsing of the text.^[15]
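The semantic orientation measure can be illustrated with a small Python sketch. It computes the orientation of a phrase as its pointwise mutual information with a positive seed word minus its pointwise mutual information with a negative seed word, following the general idea in Turney's paper. The function names, the way counts are passed in, and the smoothing constant are assumptions of this sketch; in practice the counts would come from a corpus or a search engine.

```python
import math


def pmi(joint, count_a, count_b, total):
    """Pointwise mutual information: log2( p(a, b) / (p(a) * p(b)) )."""
    p_joint = joint / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_joint / (p_a * p_b))


def semantic_orientation(near_pos, near_neg, count_phrase,
                         count_pos, count_neg, total, smoothing=0.01):
    """PMI(phrase, positive seed) minus PMI(phrase, negative seed).

    near_pos / near_neg: co-occurrence counts of the phrase with each seed.
    smoothing: small constant added to the co-occurrence counts so that a
    zero count does not produce log(0); its value here is an assumption.
    """
    pmi_pos = pmi(near_pos + smoothing, count_phrase, count_pos, total)
    pmi_neg = pmi(near_neg + smoothing, count_phrase, count_neg, total)
    return pmi_pos - pmi_neg


# Made-up counts: the phrase co-occurs with the positive seed 8 times
# and with the negative seed 2 times in a corpus of 1000 windows.
so = semantic_orientation(near_pos=8, near_neg=2, count_phrase=10,
                          count_pos=100, count_neg=100, total=1000)
print(so)  # a value above zero suggests a positive orientation
```

A review can then be classified by averaging the orientation of its extracted phrases and checking the sign of the result.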

In sentic computing,^[16] a multi-disciplinary approach to opinion mining and sentiment analysis, text processing is not based on statistical learning models but rather on common sense reasoning tools and affective ontologies. Differently from statistical classification, which generally requires large inputs and thus cannot appraise texts with satisfactory granularity, sentic computing enables the analysis of documents not only on the page- or paragraph-level but also on the sentence- and clause-level.

Open source software tools deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media.^[15]

Evaluation

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by precision and recall. However, human raters typically agree about 70%^{[citation needed]} of the time (see Inter-rater reliability). Thus, a 70% accurate program is doing as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 30% of the time, since they disagree that much about any answer. More sophisticated measures can be applied, but evaluation of sentiment analysis systems remains a complex matter. For sentiment analysis tasks returning a scale rather than a binary judgement, correlation is a better measure than precision because it takes into account how close the predicted value is to the target value.
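The measures mentioned above can be computed as in the following Python sketch, which uses only the standard library. Per-class precision and recall apply to categorical judgments, and Pearson correlation applies to scaled outputs. The function names and the sample labels are illustrative assumptions, not part of any cited system.

```python
import math


def precision_recall(gold, predicted, label):
    """Precision and recall for one class, given parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / (sd_x * sd_y)


# Binary judgments: human labels versus system output (made-up data).
gold = ["pos", "pos", "neg", "neg"]
predicted = ["pos", "neg", "neg", "pos"]
print(precision_recall(gold, predicted, "pos"))

# Scaled judgments: human star ratings versus predicted ratings.
print(pearson([1, 2, 3, 4, 5], [2, 2, 3, 5, 4]))
```

Correlation rewards a prediction of 4 for a true rating of 5 more than a prediction of 1, whereas precision treats both as equally wrong.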

Sentiment analysis has been used to test the relationship between Internet financial message boards and the behavior of the stock market; the study found a strong correlation between posts and volume of stock ^[17].

Sentiment analysis and Web 2.0

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and actioning it appropriately, many are now looking to the field of sentiment analysis.^[18] If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published.^[19]

One step towards this aim is being taken in research. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis.^[20] The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social network discussions.^[21] Sentiment analysis could therefore help explain why certain e-communities die or fade away (e.g., MySpace) while others seem to grow without limits (e.g., Facebook).

Sentiment analysis (together with opinion mining) is becoming a promising topic in the field of CRM 2.0 as well. As a direct consequence of the concept of Web 2.0, CRM 2.0 refers to all CRM solutions where the customer engages with the products/services provided by the enterprise. This way, customer profiling becomes more effective and enterprises can move towards one-to-one marketing. From this perspective, social media are an important source of information for enterprises: the word-of-mouth effect on prospective customers' sentiment towards brands and products can be highly positive or highly negative. Sentiment analysis and opinion mining are therefore expected to become a key component of modern and more innovative CRM solutions [1].

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment.^[18] The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.

Sentiment analysis on the web has also been the subject of art. Artist Jonathan Harris' We Feel Fine project is an example of the depiction of emotions across the blogosphere, which uses many of the same techniques involved in the commercial application of sentiment analysis.

See also

Grammatical polarity

Sentic computing

Web Bot

References

^ Michelle de Haaff (2010), Sentiment Analysis, Hard But Worth It!, CustomerThink, http://www.customerthink.com/blog/sentiment_analysis_hard_but_worth_it, retrieved 2010-03-12.

^ ^a ^b Peter Turney (2002). "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews". Proceedings of the Association for Computational Linguistics. pp. 417–424. arXiv:cs.LG/0212032.

^ Bo Pang; Lillian Lee and Shivakumar Vaithyanathan (2002). "Thumbs up? Sentiment Classification using Machine Learning Techniques". Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 79–86. http://www.cs.cornell.edu/home/llee/papers/sentiment.home.html.

^ ^a ^b Bo Pang; Lillian Lee (2005). "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales". Proceedings of the Association for Computational Linguistics (ACL). pp. 115–124. http://www.cs.cornell.edu/home/llee/papers/pang-lee-stars.home.html.

^ ^a ^b Benjamin Snyder; Regina Barzilay (2007). "Multiple Aspect Ranking using the Good Grief Algorithm". Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL). pp. 300–307. http://people.csail.mit.edu/regina/my_papers/ggranker.ps.

^ Thelwall, Mike; Buckley, Kevan; Paltoglou, Georgios; Cai, Di; Kappas, Arvid (2010). "Sentiment strength detection in short informal text". Journal of the American Society for Information Science and Technology 61 (12): 2544–2558. http://www.scit.wlv.ac.uk/~cm1993/papers/SentiStrengthPreprint.doc.

^ Pang, Bo; Lee, Lillian (2008). "4.1.2 Subjectivity Detection and Opinion Identification". Opinion Mining and Sentiment Analysis. Now Publishers Inc. http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html.

^ Rada Mihalcea; Carmen Banea and Janyce Wiebe (2007). "Learning Multilingual Subjective Language via Cross-Lingual Projections". Proceedings of the Association for Computational Linguistics (ACL). pp. 976–983. http://www.cse.unt.edu/~rada/papers/mihalcea.acl07.pdf.

^ Fangzhong Su; Katja Markert (2008). "From Words to Senses: a Case Study in Subjectivity Recognition". Proceedings of Coling 2008, Manchester, UK. http://www.comp.leeds.ac.uk/markert/Papers/Coling2008.pdf.

^ Bo Pang; Lillian Lee (2004). "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts". Proceedings of the Association for Computational Linguistics (ACL). pp. 271–278. http://www.cs.cornell.edu/home/llee/papers/cutsent.home.html.

^ Minqing Hu; Bing Liu (2004). "Mining and Summarizing Customer Reviews". Proceedings of KDD 2004. http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html.

^ Bing Liu; Minqing Hu and Junsheng Cheng (2005). "Opinion Observer: Analyzing and Comparing Opinions on the Web". Proceedings of WWW 2005. http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html.

^ Bing Liu (2010). "Sentiment Analysis and Subjectivity". Handbook of Natural Language Processing, Second Edition (editors: N. Indurkhya and F. J. Damerau). http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html.

^ Kim, S.M. & Hovy, E.H. (2006). "Identifying and Analyzing Judgment Opinions". Proceedings of the Human Language Technology / North American Association of Computational Linguistics conference (HLT-NAACL 2006). New York, NY. http://acl.ldc.upenn.edu/P/P06/P06-2063.pdf.

^ ^a ^b Lipika Dey; S K Mirajul Haque (2008). "Opinion Mining from Noisy Text Data". Proceedings of the second workshop on Analytics for noisy unstructured text data. pp. 83–90. http://portal.acm.org/citation.cfm?id=1390763&dl=GUIDE&coll=GUIDE&CFID=92244761&CFTOKEN=30578437.

^ Erik Cambria, Amir Hussain, Catherine Havasi, Chris Eckl (2010). "Sentic Computing: Exploitation of Common Sense for the Development of Emotion-Sensitive Systems". LNCS, vol. 5967, pp. 148-156. Springer-Verlag, Berlin Heidelberg. http://springerlink.com/content/9305u22257427j24.

^ Papadopoulos, E (2001), The relationship between the Internet financial message boards and the behavior of the stock market, ProQuest, http://sunzi.lib.hku.hk/ER/detail/hkul/3067352, retrieved 2011-09-23.

^ ^a ^b Wright, Alex. "Mining the Web for Feelings, Not Facts", New York Times, 2009-08-23. Retrieved on 2009-10-01.

^ Kirkpatrick, Marshall. ", ReadWriteWeb, 2009-04-15. Retrieved on 2009-10-01.

^ CORDIS. "Collective emotions in cyberspace (CYBEREMOTIONS)", European Commission, 2009-02-03. Retrieved on 2010-12-13.

^ Condliffe, Jamie. "Flaming drives online social networks", NewScientist, 2010-12-07. Retrieved on 2010-12-13.

Further reading

Cambria, E., Song, Y., Wang, H., Hussain, A. (2011) "Isanette: A Common and Common Sense Knowledge Base for Opinion Mining". Proceedings of ICDM11

Bodendorf, F., Kaiser, C. (2010). "Mining Customer Opinions on the Internet - A Case Study in the Automotive Industry". 2010 Third International Conference on Knowledge Discovery and Data Mining, pp. 24-27.

Cambria, E., Speer, R., Havasi, C., Hussain, A. (2010) "SenticNet: a Publicly Available Semantic Resource for Opinion Mining". Proceedings of AAAI CSK

Dave, K., Lawrence, S., Pennock, D. M. (2003). "Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews". Proceedings of the Twelfth International Conference on World Wide Web. ACM Press, Budapest, pp. 519–528.

Ganapathibhotla, M., Liu, B. (2008). "Mining Opinions in Comparative Sentences". Proceedings of the 22nd International Conference on Computational Linguistics, Manchester, pp. 241-248.

Glance, N., Hurst, M., Nigam, K., Siegler, M., Stockton, R., Tomokiyo, T. (2005). "Deriving Marketing Intelligence from Online Discussion" Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, Chicago, pp. 419-428.

Kaiser, C. (2009). "Combining Text and Data Mining for Gaining Valuable Knowledge from Online Reviews". IADIS International Journal on WWW/Internet 7 (1), pp. 63-78.

Kim, S.-M., Hovy, E., (2007). "Crystal: Analysing Predictive Opinions on the Web". Proceedings of the 2007 Joint Conference on the Empirical Methods Natural Language Processing and Computational Natural Language Learning, Prague, pp. 1056-1064.

Popescu, A. M., Etzioni, O. (2007). "Extracting Product Features and Opinions from Reviews". Kao, A., Poteet, S. R. (Ed.), Natural Language Processing and Text Mining. London: Springer, pp. 9-28.
