Sentiment analysis
===================
.. highlight:: Python
In this tutorial, you will learn:
* The basics of sentiment analysis
* How to collect tweets
* How to collect financial news headlines
* Common ways of analysing sentiment
* How to measure the accuracy of the sentiment prediction
Intro to sentiment analysis
---------------------------
| As discussed in the Introduction, sentiment analysis is a natural language processing technique used to
  determine whether a statement carries positive, negative or neutral sentiment.
  In this tutorial, we analyse the daily sentiment of a stock using relevant news headlines and tweets,
  and from that infer the market sentiment.
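| The idea of a daily sentiment can be sketched as follows. This is only a minimal illustration under stated assumptions: the tiny word lists, the :code:`score_text` and :code:`daily_sentiment` helpers and the sample rows are all made up for the example and are not a method prescribed by this tutorial.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny word lists (assumptions for illustration only)
POSITIVE_WORDS = {"gain", "beat", "surge", "strong"}
NEGATIVE_WORDS = {"loss", "miss", "drop", "weak"}


def score_text(text):
    """Return +1, -1 or 0 for positive, negative or neutral text."""
    words = text.lower().split()
    balance = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if balance > 0:
        return 1
    if balance < 0:
        return -1
    return 0


def daily_sentiment(rows):
    """Average the per-text scores for each date; rows are (date, text) pairs."""
    scores = defaultdict(list)
    for date, text in rows:
        scores[date].append(score_text(text))
    return {date: sum(values) / len(values) for date, values in scores.items()}


# Made-up headlines, used only to exercise the two helpers
sample_rows = [
    ("2021-03-01", "Shares surge after strong earnings"),
    ("2021-03-01", "Analysts expect a drop in sales"),
    ("2021-03-02", "Quarterly loss was worse than expected"),
]
print(daily_sentiment(sample_rows))
```

| Averaging per-text scores is one simple way to aggregate; a real pipeline would replace :code:`score_text` with a proper sentiment model.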
Collection of tweets
---------------------
**Apply for a Twitter developer account to use Tweepy**

| 1. Apply for a developer account through this link: https://developer.twitter.com/en/apply-for-access
| 2. Create a new project and connect it to a developer App in the developer portal
| 3. Enable App permissions (*Read* and *Write*)
| 4. Navigate to the **Keys and tokens** page and save your API key, API secret, Access token and Access secret
**Code example**

::

    import tweepy

    # Do not share your API keys on any public platform (e.g. GitHub, a public website)
    consumer_key = "YOUR_API_KEY"
    consumer_secret = "YOUR_API_SECRET"
    access_token = "YOUR_ACCESS_TOKEN"
    access_token_secret = "YOUR_ACCESS_SECRET"

    # Authenticate with the consumer key and secret, then attach the access token
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    # wait_on_rate_limit_notify is a Tweepy v3.x parameter
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
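| One way to keep the keys out of shared code is to read them from environment variables. The sketch below assumes that approach; the :code:`load_credentials` helper and the variable names (:code:`TWITTER_API_KEY` and so on) are hypothetical and can be replaced with whatever names you export.

```python
import os


def load_credentials(environ=os.environ):
    """Read the four Twitter credentials from environment variables.

    The variable names below are hypothetical; use whatever names you export.
    """
    names = {
        "consumer_key": "TWITTER_API_KEY",
        "consumer_secret": "TWITTER_API_SECRET",
        "access_token": "TWITTER_ACCESS_TOKEN",
        "access_token_secret": "TWITTER_ACCESS_SECRET",
    }
    # Fail early with a clear message if any variable is not set
    missing = [var for var in names.values() if var not in environ]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in names.items()}
```

| The returned values can then be passed to :code:`tweepy.OAuthHandler` and :code:`auth.set_access_token` instead of writing the keys into the script.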
**Access the relevant tweets using the Twitter API**
| Twitter provides several types of API, each with its own limitations. For further information, please visit
  https://developer.twitter.com/en/docs/twitter-api. In the following sections, you will learn how to
  retrieve tweets from a Twitter timeline, search by hashtag/cashtag, and stream tweets in real time.
Timeline tweets
^^^^^^^^^^^^^^^^
| :code:`api.user_timeline` returns the 20 most recent tweets posted by the authenticated user. It is also possible to request another
  user's timeline via the :code:`id` parameter.
| Pass the :code:`user_id` or :code:`screen_name` parameter to access a specific user's tweets. For more information about the parameters,
  please visit the official documentation: https://docs.tweepy.org/en/v3.5.0/api.html
**Code example**

::

    import csv

    # userid, screen_name and number_of_tweets are placeholders:
    # set them for the account you want to collect from
    alltweets = []

    # Extract data from the API
    timeline = api.user_timeline(user_id=userid, count=number_of_tweets)
    alltweets.extend(timeline)

    # Append one row per tweet: creation time and text
    with open('%s_tweets.csv' % screen_name, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for tweet in alltweets:
            tweet_text = tweet.text
            dates = tweet.created_at
            writer.writerow([dates, tweet_text])
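| Once tweets are saved, the file can be read back to check how many tweets were collected per day. The following is a minimal sketch: the :code:`tweets_per_day` helper is hypothetical, and an in-memory :code:`io.StringIO` with made-up rows stands in for a saved two-column CSV file (timestamp, tweet text).

```python
import csv
import io
from collections import Counter


def tweets_per_day(csv_file):
    """Count rows per calendar day; each row is (timestamp, tweet text)."""
    counts = Counter()
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        day = row[0][:10]  # keep only the YYYY-MM-DD part of the timestamp
        counts[day] += 1
    return dict(counts)


# Made-up sample standing in for a saved tweets CSV file
sample = io.StringIO(
    "2021-03-01 14:05:00,First tweet\n"
    "2021-03-01 18:30:00,\"Second tweet, with a comma\"\n"
    "2021-03-02 09:00:00,Third tweet\n"
)
print(tweets_per_day(sample))
```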
Hashtag/Cashtag tweets
^^^^^^^^^^^^^^^^^^^^^^^
| **Cashtag** is a Twitter feature that lets users retrieve tweets relevant to a particular ticker, such as $GOOG, $AAPL or $FB.
  Use :code:`tweepy.Cursor()` to access tweets by hashtag or cashtag.
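**Code example**

::

    import csv

    import tweepy

    # name is the search query, e.g. a hashtag or a cashtag such as '$AAPL'
    # api.search is the Tweepy v3.x search method; collect up to 200 tweets
    hashtags = tweepy.Cursor(api.search, q=name, lang='en', tweet_mode='extended').items(200)

    with open('%s_tweets.csv' % screen_name, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for status in hashtags:
            tweet_text = status.full_text
            dates = str(status.created_at)[:10]  # keep only YYYY-MM-DD
            writer.writerow([dates, tweet_text])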
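| Search results may mention several tickers at once, so it can help to check which cashtags a tweet actually contains. The sketch below is an illustration only: :code:`extract_cashtags` is a hypothetical helper, and it makes the simplifying assumption that a cashtag is a dollar sign followed by one to six letters.

```python
import re

# Simplifying assumption: a cashtag is "$" followed by one to six letters
CASHTAG_PATTERN = re.compile(r"\$([A-Za-z]{1,6})\b")


def extract_cashtags(text):
    """Return the upper-cased tickers mentioned as cashtags in the text."""
    return [ticker.upper() for ticker in CASHTAG_PATTERN.findall(text)]


# Dollar amounts such as "$300" are not matched because they contain no letters
print(extract_cashtags("Buying $AAPL and $goog today, sold for $300"))
```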
| To keep only tweets posted within the last :code:`day_required` days, amend the loop as follows:

::

    import datetime

    with open('%s_tweets.csv' % screen_name, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for status in hashtags:
            # Added check: skip tweets older than day_required days
            # (Tweepy v3.x reports created_at as a naive UTC datetime, hence utcnow())
            if (datetime.datetime.utcnow() - status.created_at).days <= day_required:
                tweet_text = status.full_text
                dates = str(status.created_at)[:10]
                writer.writerow([dates, tweet_text])
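| The date check can be tried on its own with fixed timestamps. This is a minimal sketch: :code:`is_recent` is a hypothetical helper wrapping the same :code:`.days <= day_required` comparison, and the dates are made up.

```python
import datetime


def is_recent(created_at, day_required, now=None):
    """Return True if created_at is at most day_required whole days before now."""
    if now is None:
        now = datetime.datetime.utcnow()
    # timedelta.days counts whole days only, so partial days are dropped
    return (now - created_at).days <= day_required


now = datetime.datetime(2021, 3, 10, 12, 0, 0)
print(is_recent(datetime.datetime(2021, 3, 8, 9, 0, 0), 3, now))  # 2 whole days old
print(is_recent(datetime.datetime(2021, 3, 1, 9, 0, 0), 3, now))  # 9 whole days old
```

| Because :code:`timedelta.days` drops partial days, a tweet that is three days and a few hours old still passes a three-day limit.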
Stream tweets
^^^^^^^^^^^^^^^
| The Twitter streaming API is used to download tweets in real time. It is useful for obtaining a high volume of
  tweets, or for creating a live feed using a site stream. For more information about the API, please visit this link:
  https://docs.tweepy.org/en/v3.5.0/streaming_how_to.html.
1. Create a class inheriting from :code:`StreamListener`

::

    import csv
    import sys

    import tweepy

    # Override tweepy.StreamListener (Tweepy v3.x)
    class MyStreamListener(tweepy.StreamListener):

        # Add logic to the initialisation function
        def __init__(self, output_file=sys.stdout, input_name=sys.stdout):
            super(MyStreamListener, self).__init__()
            self.output_file = output_file  # on_status writes the tweets here
            self.input_name = input_name
            self.max_tweets = 200
            self.tweet_count = 0  # start at zero so that max_tweets tweets are collected

        # Add logic to the on_status method
        def on_status(self, status):
            # Returning False disconnects the stream
            if self.tweet_count >= self.max_tweets:
                return False
            # Long tweets carry their full text in extended_tweet
            if hasattr(status, 'extended_tweet'):
                tweet_text = status.extended_tweet['full_text']
            else:
                tweet_text = status.text
            writer = csv.writer(self.output_file)
            writer.writerow([status.created_at, tweet_text])
            self.tweet_count += 1
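2. Create a stream

::

    # f is an open, writable file that stores the output tweets;
    # firm is a label for the tracked company, kept on the listener as input_name
    myStreamListener = MyStreamListener(output_file=f, input_name=firm)
    myStream = tweepy.Stream(auth=api.auth, tweet_mode='extended', listener=myStreamListener)

3. Start a stream

::

    # target_firm is a list of keywords to track, e.g. ['$AAPL'];
    # in Tweepy v3.x the language filter is an argument of filter()
    myStream.filter(track=target_firm, languages=["en"])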
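| The stopping rule used in :code:`on_status` can be tried without a network connection. The sketch below is a stand-in, not Tweepy code: :code:`FakeListener` is a hypothetical class that mimics only the counting logic, and the statuses are :code:`SimpleNamespace` objects made up for the example.

```python
import csv
import io
from types import SimpleNamespace


class FakeListener:
    """Stand-in that mirrors the counting logic of a stream listener."""

    def __init__(self, output_file, max_tweets):
        self.output_file = output_file
        self.max_tweets = max_tweets
        self.tweet_count = 0

    def on_status(self, status):
        # Returning False signals that no more tweets are wanted
        if self.tweet_count >= self.max_tweets:
            return False
        csv.writer(self.output_file).writerow([status.created_at, status.text])
        self.tweet_count += 1
        return True


buffer = io.StringIO()
listener = FakeListener(buffer, max_tweets=2)
statuses = [SimpleNamespace(created_at="2021-03-01", text="tweet %d" % i) for i in range(5)]
for status in statuses:
    # Stop feeding statuses as soon as the listener asks to disconnect
    if listener.on_status(status) is False:
        break
print(buffer.getvalue())
```

| Only the first two statuses are written; the third call returns :code:`False` and the loop stops, which is how a listener of this shape limits the number of collected tweets.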
Collect financial headlines
------------------------------------------
US news headlines
^^^^^^^^^^^^^^^^^^
| `Finviz.com