How often do you get the urge to know what others think about you, your actions and your preferences? Whatever your answer, you are not alone. Humans, as individuals, have always been interested in the opinions of the people around them. Should I talk to her? Is it a good idea to buy that car? Who should I vote for? These are only the simplest examples of the opinion-related questions that arise in our minds every single day. A nod from your best friend, a positive gesture from your dad or a better campaign slogan goes a long way toward answering them. Similar questions come up not only in our personal lives but also in academia, business and government. The hunting ground for opinions on these questions used to be friends and family, acquaintances, consumer reports et cetera.
But have you ever wondered how this has changed in the post-web era? Over the last decade, mining public opinion has become far more organized, thanks to the abundance of information on the web. This is what we call opinion mining, otherwise known as sentiment analysis.
By definition, sentiment analysis is the extraction and analysis of public opinion from any kind of source material using natural language processing, text analysis and computational linguistics. In essence, we try to extract the subjective information from a document, usually a written one. This can help determine the attitude and emotional state of the author or speaker on a particular topic. The simplest sentiment analysis task is deciding whether a text has a positive or a negative attitude, which can even be reduced to good or bad. A slightly more complex task is to rate the attitude on a scale of 1 to 5. The problem becomes much more complicated when emotional states such as “happy”, “sad” or “angry” are added as parameters. Accordingly, sentiment analysis can be classified in a few ways:
- Polarity or valence
- Emotional state or feelings
- Subjectivity or objectivity identification
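To make the first two tasks concrete, here is a minimal sketch of lexicon-based polarity scoring. The word lists, the function names `polarity_score`, `to_label` and `to_stars`, and the linear mapping onto a 1 to 5 scale are all illustrative assumptions rather than a standard method. Emotional-state and subjectivity classification are not covered.

```python
import re

# Hypothetical toy lexicon; practical systems rely on far larger, curated word lists.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def polarity_score(text: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / (all opinion-word hits)."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no opinion words found: treat the text as neutral
    return (pos - neg) / (pos + neg)


def to_label(score: float) -> str:
    """Binary polarity, with 'neutral' reserved for a score of exactly zero."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def to_stars(score: float) -> int:
    """Map [-1, 1] linearly onto 1..5; round() sends exact halves to the even integer."""
    return round((score + 1) * 2) + 1
```

For example, "good screen but terrible battery and poor camera" contains one positive and two negative lexicon words, giving a score of -1/3, the label `negative` and 2 stars. Counting isolated words like this ignores context such as negation, which is one reason the more elaborate techniques below exist.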
Sentiment analysis draws ideas and techniques from statistics, natural language processing, machine learning and a few other fields. The massive amount of data found on the web is overwhelming unless it is sorted. Statistics helps by counting words and sentence patterns to give an overview of a document’s polarity. Natural language processing techniques improve accuracy by breaking sentences into weighted tokens. Machine learning refines the results further by learning from previously analyzed data and updating its models accordingly.
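The statistical step can be sketched with Python's standard `collections.Counter`. This is a hypothetical illustration, not any specific tool's method: it counts words and within-sentence word pairs (bigrams, used here as a crude stand-in for "sentence patterns") and tallies occurrences of a small, made-up opinion lexicon.

```python
import re
from collections import Counter

# Hypothetical opinion lexicon, for illustration only.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def word_statistics(document: str) -> tuple[Counter, Counter]:
    """Count words and within-sentence word pairs (bigrams) in a document."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    # Crude sentence split on '.', '!' and '?'; it ignores abbreviations and the like.
    for sentence in re.split(r"[.!?]+", document.lower()):
        words = re.findall(r"[a-z']+", sentence)
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def polarity_counts(unigrams: Counter) -> tuple[int, int]:
    """Total occurrences of positive and negative lexicon words."""
    pos = sum(unigrams[w] for w in POSITIVE)
    neg = sum(unigrams[w] for w in NEGATIVE)
    return pos, neg


doc = "The food was good. The service was not good."
unigrams, bigrams = word_statistics(doc)
print(polarity_counts(unigrams))  # (2, 0)
print(bigrams[("not", "good")])   # 1
```

Here the raw counts report two positive words and no negative ones, yet the bigram ("not", "good") reveals that the second sentence is actually negative. That gap is what finer-grained language processing tries to close.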
Natural language processing breaks sentences down into a language’s most basic components: nouns, verbs, adverbs, pronouns, punctuation and so on. Breaking a sentence down shows how it is formed, for example whether or not it contains negated words.
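As an illustration of breaking a sentence into tokens and spotting negation, the sketch below splits text into word and punctuation tokens and then applies a common heuristic: every word between a negation cue and the next punctuation mark is prefixed with `NOT_`, so that "good" and "not good" become different features. The regular expression, the cue list and the `NOT_` prefix are assumptions chosen for this example. Tagging parts of speech such as nouns and verbs would normally require a trained tagger and is not shown.

```python
import re

# Word tokens, the clitic "n't" split off its host (e.g. "wasn't" -> "was", "n't"),
# and single punctuation marks. The pattern is an illustrative assumption.
TOKEN_RE = re.compile(r"\w+(?=n't)|n't|\w+|[^\w\s]")
PUNCT_RE = re.compile(r"[^\w\s]")

# Hypothetical, deliberately small list of negation cues.
NEGATIONS = {"not", "no", "never", "n't"}


def tokenize(sentence: str) -> list[str]:
    """Lower-case the sentence and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence.lower())


def mark_negation(tokens: list[str]) -> list[str]:
    """Prefix words after a negation cue with NOT_ until the next punctuation mark."""
    result = []
    negated = False
    for tok in tokens:
        if PUNCT_RE.fullmatch(tok):
            negated = False  # punctuation closes the negation scope
            result.append(tok)
        elif tok in NEGATIONS:
            negated = True
            result.append(tok)
        else:
            result.append("NOT_" + tok if negated else tok)
    return result


print(mark_negation(tokenize("The movie wasn't good, but the cast was great.")))
```

On that sentence the output contains `NOT_good` for the first clause while `great` is left untouched, because the comma closes the negation scope.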
One example of machine learning applied to sentiment analysis is the Support Vector Machine (SVM). This standard machine-learning technique is often reported to be among the most accurate text classification methods. The Naive Bayes classifier is another technique: a simple probabilistic classifier based on Bayes' theorem, widely used for spam filtering, email sorting, language detection and sentiment detection.
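The Naive Bayes idea can be sketched from scratch in a few dozen lines. The following is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing; the class name, the tokenizer and the tiny training set are hypothetical, and a real system would be trained on thousands of labelled documents. An SVM is not reproduced here; in practice one is usually trained with a library such as scikit-learn (for example `sklearn.svm.LinearSVC`).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesSentiment":
        self.class_counts = Counter(labels)      # documents per label
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(docs)
        return self

    def predict(self, doc: str) -> str:
        words = tokenize(doc)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Work in log space: log P(label) + sum of log P(word | label).
            score = math.log(n_docs / self.total_docs)
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            for w in words:
                if w in self.vocab:  # skip words never seen during training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical toy training set, for illustration only.
    train_docs = [
        "great phone, love it",
        "excellent battery and great screen",
        "terrible battery, hate it",
        "poor screen and bad support",
    ]
    train_labels = ["pos", "pos", "neg", "neg"]
    model = NaiveBayesSentiment().fit(train_docs, train_labels)
    print(model.predict("great screen"))  # pos
    print(model.predict("bad support"))   # neg
```

Working with log probabilities rather than multiplying raw probabilities avoids numerical underflow on long documents, and the add-one smoothing keeps a word that was seen in only one class from zeroing out the other class's score.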
There are many examples of sentiment analysis in action on the internet. For example, Sentdex goes through thousands of news articles, analyses the words and brand names they contain, and provides infographics on their polarity.