Analyzing Real-Time Tweets with Tweepy

Ipshita
Dec 18, 2020

The introduction of the Farmers Bill has caused a stir in India. With protests rising in different parts of the country, the discussion around it is impossible to avoid. It has also been a divisive and polarizing topic, splitting the public into two camps.

In this project, we will try to understand the sentiment of the general public regarding the bill, using Twitter and the opinions of its users.

Data

The data was extracted from Twitter. To do this, we first need to authorize our access to the API. You can get Twitter API credentials through a Twitter developer account, which you apply for by filling in a form.
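As a rough sketch of the authorization step, the snippet below reads four credentials from environment variables and hands them to Tweepy's OAuth handler. The environment variable names and the helper names `load_credentials` and `make_api` are assumptions made for this illustration, not something taken from the original project.

```python
import os

# The environment variable names are an assumption made for this sketch.
CREDENTIAL_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=os.environ):
    """Read the four OAuth values, failing early if any of them is missing."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return tuple(env[name] for name in CREDENTIAL_VARS)


def make_api(credentials):
    """Build an authorized Tweepy client from the four credential strings."""
    import tweepy  # third-party package: pip install tweepy

    api_key, api_secret, access_token, access_secret = credentials
    auth = tweepy.OAuthHandler(api_key, api_secret)
    auth.set_access_token(access_token, access_secret)
    # wait_on_rate_limit makes Tweepy pause instead of failing on rate limits
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Keeping the keys out of the script itself means the notebook or code can be shared without leaking them.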

Tweepy was used to extract 3000 tweets with the search terms “#FarmerBill2020”, “#FarmersProtests”, “#FarmersBill2020” and “#agriculturebill2020”.

It was done in the following way,

tweepy.Cursor is used to extract tweets that match certain search terms. There are other ways to extract tweets, such as pulling them from specific users.
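A minimal sketch of such a search is shown here. It assumes an already authorized `api` object, joins the hashtags with `OR`, and restricts results to English; the language filter and the helper names `build_query` and `fetch_tweets` are assumptions for this example. The search method is called `search_tweets` in Tweepy 4.x and `search` in the 3.x releases that were current in 2020, so the sketch checks for both.

```python
HASHTAGS = [
    "#FarmerBill2020",
    "#FarmersProtests",
    "#FarmersBill2020",
    "#agriculturebill2020",
]


def build_query(hashtags):
    """Join hashtags with OR so a tweet matching any of them is returned."""
    return " OR ".join(hashtags)


def fetch_tweets(api, query, limit=3000):
    """Collect the text of up to `limit` tweets matching `query`."""
    import tweepy  # third-party package: pip install tweepy

    # Tweepy 4.x names the method search_tweets; 3.x called it search.
    search = getattr(api, "search_tweets", None) or api.search
    cursor = tweepy.Cursor(search, q=query, lang="en", tweet_mode="extended")
    # tweet_mode="extended" exposes the untruncated text as full_text
    return [status.full_text for status in cursor.items(limit)]
```

Calling `fetch_tweets(api, build_query(HASHTAGS))` would then return a list of tweet texts, with Cursor handling the pagination behind the scenes.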

After clearing duplicate tweets, around 1200 tweets remained.
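One simple way to drop duplicates, assuming the tweets are held as a plain list of strings and that "duplicate" means an exact text match, is sketched below. The original project may well have used a data frame method for this instead.

```python
def drop_duplicates(tweets):
    """Remove exact duplicate texts, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(tweets))


sample = ["Support farmers", "RT Support farmers", "Support farmers"]
unique = drop_duplicates(sample)
```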

The next step is to clean the data. The collected tweets contain a lot of noise, such as links and markers like RT (retweet).

The cleaning step does the following:

  1. Remove RT
  2. Remove @mentions
  3. Remove links
  4. Remove punctuation and special characters
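The four steps above can be sketched with regular expressions as follows. The exact patterns are assumptions for illustration, and the last one keeps only ASCII letters and digits, so it assumes English-language tweets. Links are removed before punctuation so that URLs are not broken into stray fragments first.

```python
import re


def clean_tweet(text):
    """Apply the four cleaning steps to one tweet and tidy the whitespace."""
    text = re.sub(r"\bRT\b", " ", text)                 # 1. retweet marker
    text = re.sub(r"@\w+", " ", text)                   # 2. @mentions
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # 3. links
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)         # 4. punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()            # collapse extra spaces


raw = "RT @user: Farmers protest!! https://t.co/abc #FarmersProtest"
cleaned = clean_tweet(raw)
```

For the sample tweet above, `cleaned` is `"Farmers protest FarmersProtest"`.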

Sentiment Labelling

The next step was to label the sentiment of each tweet. For this, VADER (Valence Aware Dictionary and sEntiment Reasoner) was used.
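A sketch of the scoring step is given below, assuming the `vaderSentiment` package (VADER is also available through NLTK, and the post does not say which one was used). The helper names `make_analyzer` and `score_tweets` are made up for this example.

```python
def make_analyzer():
    """Create a VADER analyzer (assumes: pip install vaderSentiment)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()


def score_tweets(texts, analyzer):
    """Return one dict per tweet holding the text and its VADER scores."""
    rows = []
    for text in texts:
        # polarity_scores returns a dict with keys neg, neu, pos and compound
        scores = analyzer.polarity_scores(text)
        rows.append({"text": text, **scores})
    return rows
```

Calling `score_tweets(cleaned_tweets, make_analyzer())` gives a list of rows that can be turned into a data frame, one column per score.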

This gave each row a positive, a negative and a neutral score.

To label the rows, the map function was used.
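The labelling rule itself is not shown in this post, so the sketch below assumes the commonly used convention for VADER's compound score: at or above 0.05 is positive, at or below -0.05 is negative, and anything in between is neutral. It uses Python's built-in `map`; with pandas, the same function could be passed to a column's `map` method.

```python
def label_sentiment(compound, threshold=0.05):
    """Turn a compound score in [-1, 1] into a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


compounds = [0.62, -0.41, 0.0]
labels = list(map(label_sentiment, compounds))
```

Here `labels` comes out as `["positive", "negative", "neutral"]`.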

This gave us a label for each tweet.

Exploratory Data Analysis

The first step was to look at the number of tweets in each sentiment category.

Bar plot showing sentiment distribution

A pie plot helps to understand the percentage split between the categories.

Pie plot showing sentiment distribution
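The numbers behind both plots are just counts and percentage shares per label. A small sketch using only the standard library is shown here; the sample labels are made up for illustration and are not the project's data.

```python
from collections import Counter


def sentiment_distribution(labels):
    """Map each label to a (count, percentage share) pair."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (n, round(100 * n / total, 1))
        for label, n in counts.items()
    }


# Made-up labels, only to show the shape of the result
sample_labels = ["negative", "negative", "negative", "positive"]
distribution = sentiment_distribution(sample_labels)
```

The counts would feed a bar plot and the percentages a pie plot, for example with matplotlib or seaborn.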

Another way to understand the sentiments is to use a polarity plot.

Polarity plot
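A polarity plot is essentially a histogram of per-tweet polarity scores. The sketch below bins scores between -1 and 1 into ranges of width 0.25, assuming the polarity is a single number per tweet such as VADER's compound score; the post does not say which score or bin width was actually plotted.

```python
import math
from collections import Counter


def polarity_histogram(scores, width=0.25):
    """Count scores in [-1, 1] per bin; each key is the left edge of a bin."""
    last_bin = int(round(1 / width)) - 1  # index of the bin that ends at +1.0
    counts = Counter()
    for score in scores:
        # a score of exactly +1.0 is kept in the top bin
        index = min(math.floor(score / width), last_bin)
        counts[index * width] += 1
    return counts


# Made-up scores, only to show how the bins work
histogram = polarity_histogram([-0.2, -0.1, 0.0, 0.3, 1.0])
```

With these sample scores, the fullest bin is the one starting at -0.25, which covers the range -0.25 to 0.0.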

Word clouds were made to show the common words in each sentiment category.

To make a word cloud, all the tweets with a given sentiment were first stored in a separate data frame, and the word cloud was then built from it.
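As a sketch of this step, the snippet below filters rows by label, counts word frequencies, and passes them to the `wordcloud` package. It uses a list of dicts in place of a data frame, and the tiny stopword list, the `label` and `text` keys, and the helper names are all assumptions for illustration.

```python
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "are", "to", "of", "in", "for", "on"}


def word_frequencies(rows, sentiment):
    """Count non-stopword words across all tweets with the given label."""
    counts = Counter()
    for row in rows:
        if row["label"] != sentiment:
            continue
        words = row["text"].lower().split()
        counts.update(word for word in words if word not in STOPWORDS)
    return counts


def make_word_cloud(frequencies):
    """Render a word cloud image (assumes: pip install wordcloud)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    return cloud.generate_from_frequencies(frequencies)
```

Building one cloud per label, for example `make_word_cloud(word_frequencies(rows, "negative"))`, makes it easy to compare the vocabulary of each sentiment side by side.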

Conclusion

Social media platforms such as Twitter have become an essential tool for capturing public opinion on any topic. This is vital information for companies, policymakers and governments alike. We were able to use Twitter to capture public opinion around the farmers bill and highlight the key issues and themes around each sentiment: positive, negative and neutral. This can help the government understand where the problem lies and facilitate better conversation.

In this case study, we could see that the majority of the tweets leaned negative. The words that appeared in the neutral segment were a mix of positive and negative words, which showed us that both sentiments have a fair share in the discussion of the bill. Public sentiment was high when the bill was introduced but kept declining after that, with ups and downs along the way, and at later stages it fluctuated sharply. From the polarity graph, we could see that the polarity peaked in the range -0.25 to 0.0, which can be described as negative tending towards neutral. From the tweets of the last few days, it can be concluded that the sentiment of people has been mostly negative.

Furthermore, the study brought forward the need for a better social media sentiment analyser, as most of the tweets labelled positive were actually negative, and vice versa.

Future Work

An NLP model better suited to social media text would be more beneficial, as we could see that many tweets labelled with a certain sentiment did not actually express that sentiment.

Another area for improvement is the data cleaning. A lot of stopwords remained in the tweets, so a better way to remove stopwords and punctuation would have helped.

Thank you

Photo by Kelly Sikkema on Unsplash
