Analyze Twitter sentiment trend using Python Flair

A reusable Python class for detecting the historical trend of sentiment in Twitter data. Useful for analyzing brand value over time.

There are lots of Twitter sentiment analyzers out there.

But I wanted to make one that:

  • Is reusable
  • Provides historical data, not just the current snapshot
  • Has a reasonably powerful NLP model for English-language sentiment analysis

OK, so here's what I came up with.

  1. I will be using Tweepy (https://docs.tweepy.org/en/stable/) to query Twitter.
  2. For sentiment analysis, I will be using Flair (https://github.com/flairNLP/flair). Flair has a strong NLP model that worked quite well in my tests.

Connecting to Twitter

I am using bearer-token-based authentication. The token can be obtained from the Twitter developer page.

import tweepy
auth = tweepy.OAuth2BearerHandler(bearer_token)
twitter_api = tweepy.API(auth, wait_on_rate_limit=True, 
                         retry_count=10, retry_delay=30)
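
The snippet above assumes that bearer_token is already defined. One way to supply it without hardcoding it in the script is to read it from an environment variable. This is only a minimal sketch: the helper name load_bearer_token and the variable name TWITTER_BEARER_TOKEN are my own assumptions, not something Tweepy requires.

```python
import os


def load_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Return the bearer token stored in the given environment variable.

    Both the function name and the default variable name are
    illustrative assumptions; use whatever fits your setup.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token
```

With this in place, bearer_token = load_bearer_token() can be passed to tweepy.OAuth2BearerHandler as shown above.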

Querying Data for Past Dates

I will be using Tweepy's search_tweets() method together with Cursor(), passing the start and end dates of the query via the 'since:' and 'until:' search operators. I will also filter out retweets.

# topic is the topic of interest
topic = "tata motors"

# create the query string
# (no space after the colon in "since:" and "until:")
query = topic + " -filter:retweets" \
              + " since:2022-10-01" \
              + " until:2022-10-15"

# fire the query
tweets = tweepy.Cursor(twitter_api.search_tweets,
                       q=query, lang='en', count=100,
                       tweet_mode='extended').items()
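
To get a day-by-day trend rather than one big result set, the same query can be built once per calendar day. The sketch below only builds the query strings and does not call Twitter; the helper name daily_queries is a hypothetical one, and the class further down uses the same idea inline.

```python
import datetime


def daily_queries(topic, start_date_str, end_date_str):
    """Yield (day, query) pairs, one per calendar day, end date inclusive.

    Dates are expected in YYYY-MM-DD format. Each query covers a single
    day by setting since: to that day and until: to the following day.
    """
    day = datetime.date.fromisoformat(start_date_str)
    end = datetime.date.fromisoformat(end_date_str)
    while day <= end:
        next_day = day + datetime.timedelta(days=1)
        query = (f"{topic} -filter:retweets"
                 f" since:{day.isoformat()}"
                 f" until:{next_day.isoformat()}")
        yield day.isoformat(), query
        day = next_day


for day, query in daily_queries("tata motors", "2022-10-01", "2022-10-03"):
    print(day, "->", query)
```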

Sentiment Analysis

To analyze the sentiment, I will be using Flair's English-language sentiment classifier.

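import flair
sentiment_model = flair.models.TextClassifier.load('en-sentiment')

# text is the string to classify, e.g. the text of a tweet
sentence = flair.data.Sentence(text)
sentiment_model.predict(sentence)
# sentence.labels[0] now holds the predicted label and its score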
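
The classifier returns a label (POSITIVE or NEGATIVE) together with a score for that label. To compare and average tweets, it helps to fold the two into a single signed number, which is what the class below does inline. Here is that step on its own as a minimal sketch; the function name signed_score is a hypothetical helper, not part of Flair.

```python
def signed_score(label, confidence):
    """Fold a sentiment label and its score into one signed number.

    POSITIVE keeps the score as is; anything else (i.e. NEGATIVE)
    flips the sign, so stronger negatives are further below zero.
    """
    return confidence if label == "POSITIVE" else -confidence


# With a predicted Flair sentence this would be called as:
#   label = sentence.labels[0]
#   signed_score(label.value, label.score)
print(signed_score("POSITIVE", 0.9))
print(signed_score("NEGATIVE", 0.75))
```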

Putting it together

We can put everything in a single Python class as below.

import re
import tweepy
import flair
import datetime

class TwitterSentimentAnalyzer:
    twitter_api = None
    whitespace = re.compile(r"\s+")
    web_address = re.compile(r"(?i)https?://\S+")
    user = re.compile(r"(?i)@[a-z0-9_]+")
    sentiment_model = flair.models.TextClassifier.load('en-sentiment')

    def __init__(self, bearer_token):
        auth = tweepy.OAuth2BearerHandler(bearer_token)
        self.twitter_api = tweepy.API(auth,
                                      wait_on_rate_limit=True,
                                      retry_count=10,
                                      retry_delay=30)

    def __search_twitter_history(self, topic, start_date_str, end_date_str):
        start = datetime.datetime.strptime(start_date_str, '%Y-%m-%d')
        end = datetime.datetime.strptime(end_date_str, '%Y-%m-%d')
        date_obj = start
        results = []
        while date_obj <= end:
            next_day = date_obj + datetime.timedelta(days=1)
            query = topic + " -filter:retweets" \
                    + " since:" + date_obj.strftime('%Y-%m-%d') \
                    + " until:" + next_day.strftime('%Y-%m-%d')
            tweets = tweepy.Cursor(self.twitter_api.search_tweets,
                                   q=query, lang='en', count=100,
                                   tweet_mode='extended').items()
            results.append({
                'day': date_obj.strftime('%Y-%m-%d'),
                'tweets': tweets
            })
            date_obj = next_day
        return results

    def __clean_tweet(self, text):
        # remove links and @mentions first, then collapse leftover whitespace
        text = self.web_address.sub('', text)
        text = self.user.sub('', text)
        text = self.whitespace.sub(' ', text)
        return text.strip()

    def __analyze(self, text):
        sentence = flair.data.Sentence(text)
        self.sentiment_model.predict(sentence)
        return sentence

    def __get_sentiment(self, tweets, dt):
        results = []
        for tweet in tweets:
            if tweet.lang != 'en':
                continue
            cleaned_tweet = self.__clean_tweet(tweet.full_text)
            s = self.__analyze(cleaned_tweet)
            results.append({
                "date": dt,
                "tweet": tweet.full_text,
                "sentiment": s.labels[0].value,
                "score": s.labels[0].score * (1 if s.labels[0].value == 'POSITIVE' else -1),
                "lang": tweet.lang
            })
        return results

    def historical_sentiment(self, term, start_dt, end_dt):
        tweets_across_days = self.__search_twitter_history(term, start_dt, end_dt)
        results = {}
        for day_tweets in tweets_across_days:
            day = day_tweets['day']
            results[day] = self.__get_sentiment(day_tweets['tweets'], day)
        return results
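
The cleaning step is easy to try out without touching Twitter or Flair. The following is a standalone sketch of it; the pattern names, the function name clean_text and the sample tweet are all illustrative assumptions, not part of the class above.

```python
import re

# Illustrative patterns: links, @mentions and runs of whitespace
URL = re.compile(r"(?i)https?://\S+")
MENTION = re.compile(r"@\w+")
WHITESPACE = re.compile(r"\s+")


def clean_text(text):
    """Strip links and @mentions, then collapse whitespace."""
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


# A made-up tweet, for illustration only
sample = "@someone Loving the new EV!\nDetails: https://example.com/a?b=1"
print(clean_text(sample))
```

Removing links and mentions before collapsing whitespace avoids leaving double spaces where they used to be.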

How to Use

To use the above class, all you need to do is initialize it with your bearer token. You can then pass any search query along with start and end dates in YYYY-MM-DD format. For example:

# Usage
analyzer = TwitterSentimentAnalyzer(bearer_token)
results = analyzer.historical_sentiment('tata motors', '2022-10-20', '2022-10-29')

# results is a dictionary, where the key is the date and
# the value is a list of tweets with their sentiments.

avg_score = []
for key in results:
    total = p = n = 0
    # we can iterate through each tweet
    # and check the sentiment score.
    for tweet in results[key]:
        total = total + tweet['score'] # add sentiment score
        p = p + (1 if tweet['sentiment'] == 'POSITIVE' else 0) # count positive tweets
        n = n + (1 if tweet['sentiment'] == 'NEGATIVE' else 0) # count negative tweets
    l = len(results[key])
    avg = total / l if l else 0.0 # avoid dividing by zero on days with no tweets
    avg_score.append(avg)
    print(key, ", total=", l, ", positive=", p, ", negative=", n, ", sum=", total, ", avg=", avg)

Here is the output:

2022-10-20 , total= 20 , positive= 12 , negative= 8 , sum= 3.959374248981476 , avg= 0.1979687124490738
2022-10-21 , total= 105 , positive= 64 , negative= 41 , sum= 19.04798948764801 , avg= 0.1814094236918858
2022-10-22 , total= 84 , positive= 41 , negative= 43 , sum= -1.6084647178649902 , avg= -0.019148389498392742
2022-10-23 , total= 44 , positive= 15 , negative= 29 , sum= -13.905275464057922 , avg= -0.31602898781949823
2022-10-24 , total= 83 , positive= 46 , negative= 37 , sum= 8.504245340824127 , avg= 0.1024607872388449
2022-10-25 , total= 83 , positive= 37 , negative= 46 , sum= -8.486356735229492 , avg= -0.10224526187023485
2022-10-26 , total= 90 , positive= 27 , negative= 63 , sum= -37.23467010259628 , avg= -0.4137185566955143
2022-10-27 , total= 82 , positive= 32 , negative= 50 , sum= -19.580824553966522 , avg= -0.23879054334105515
2022-10-28 , total= 85 , positive= 35 , negative= 50 , sum= -14.03496652841568 , avg= -0.16511725327547858
2022-10-29 , total= 64 , positive= 22 , negative= 42 , sum= -20.99219560623169 , avg= -0.32800305634737015
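
Besides the average score, the share of positive tweets per day is another simple way to read these numbers. A minimal sketch, using two rows copied from the output above; the helper name positive_share is hypothetical.

```python
def positive_share(positive, total):
    """Fraction of tweets labelled POSITIVE, or None if there were no tweets."""
    return positive / total if total else None


# (total, positive) pairs copied from the output above
counts = {
    "2022-10-20": (20, 12),
    "2022-10-26": (90, 27),
}

for day, (total, positive) in counts.items():
    print(day, positive_share(positive, total))
```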

If you want, you can also plot the average of the daily sentiments.

Plotting the Sentiment Trend

import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# one point per day: x is the date, y is that day's average score
x = [datetime.datetime.strptime(d, '%Y-%m-%d').date() for d in results.keys()]
y = avg_score
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator())
plt.plot(x, y)
plt.gcf().autofmt_xdate()
plt.show()

And here is the output:

Wrapping Up

The above plot shows how the average daily sentiment for "tata motors" varies from 2022-10-20 to 2022-10-29, based on the tweets collected. A negative value represents an overall negative sentiment for that day, and a positive value an overall positive one.

The code can easily be reused for historical sentiment analysis of any other search term. You can download the full code here: https://gist.github.com/akash-mitra/9d2ac17c22c04728f027891cdb648d23
