
Twitter sentiment classification - Part 2


This is part 2 of a series; please read part 1 before reading this.

We’ll use Recurrent Neural Networks to classify the Sentiment140 dataset into positive or negative tweets. Previously, we used a Bag of Words followed by a logistic regression classifier. That approach, however, completely ignores the semantic relationships between words, as it only considers the count of each word in a tweet. Here, we aim to take the position of words and their more complex relationships into account to achieve better classification.

Pre-processing

The exact same pre-processing steps as in part 1 will be used:

import re
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

data = pd.read_csv('training.1600000.processed.noemoticon.csv', encoding = 'ISO-8859-1', header = None)
data.columns = ['sentiment','id','date','flag','user','tweet']
def preprocess_tweets(tweet):
    tweet = re.sub(r"([A-Z]+\s?[A-Z]+[^a-z0-9\W]\b)", r"\1 <ALLCAPS> ", tweet)
    tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))','<URL> ', tweet)
    tweet = re.sub(r"/"," / ", tweet)
    tweet = re.sub('@[^\s]+', "<USER>", tweet)
    tweet = re.sub('[^A-Za-z0-9<>/.!,?\s]+', '', tweet)
    tweet = re.sub('(([!])\\2+)', '! <REPEAT> ', tweet)
    tweet = re.sub('(([?])\\2+)', '? <REPEAT> ', tweet)
    tweet = re.sub('(([.])\\2+)', '. <REPEAT> ', tweet)
    tweet = re.sub(r'#([^\s]+)', r'<HASHTAG> \1', tweet)
    tweet = re.sub(r'(.)\1{2,}\b', r'\1 <ELONG> ', tweet)
    tweet = re.sub(r'(.)\1{2,}', r'\1)', tweet)
    tweet = re.sub(r"'ll", " will", tweet)
    tweet = re.sub(r"'s", " is", tweet)
    tweet = re.sub(r"'d", " d", tweet) # Would/Had ambiguity
    tweet = re.sub(r"'re", " are", tweet)
    tweet = re.sub(r"didn't", "did not", tweet)
    tweet = re.sub(r"couldn't", "could not", tweet)
    tweet = re.sub(r"can't", "cannot", tweet)
    tweet = re.sub(r"doesn't", "does not", tweet)
    tweet = re.sub(r"don't", "do not", tweet)
    tweet = re.sub(r"hasn't", "has not", tweet)
    tweet = re.sub(r"'ve", " have", tweet)
    tweet = re.sub(r"shouldn't", "should not", tweet)
    tweet = re.sub(r"wasn't", "was not", tweet)
    tweet = re.sub(r"weren't", "were not", tweet)
    tweet = re.sub('[\s]+', ' ', tweet)
    tweet = tweet.lower()

    return tweet
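
As a quick sanity check, we can run the function on a made-up example tweet (not one taken from the dataset) that exercises a few of the rules:

# hypothetical example tweet, only to illustrate the tagging and cleaning rules
sample = "@friend OMG soooo happy!!! check this out http://example.com"
print(preprocess_tweets(sample))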

This time, however, we’ll split the data into train, validation, and test sets. The validation set is used to monitor overfitting during training. The test set should be left untouched and unseen until the final evaluation. It may sound counter-intuitive, but merely tweaking the model according to its validation performance can produce indirect overfitting (indirect because the model never sees any of the validation data), jeopardizing its generalization capability (that is, its ability to perform well on data other than what it was trained on).

from sklearn.model_selection import train_test_split

train_data, test_data = train_test_split(data, train_size = 0.8, random_state = 42)

sentiment = np.array(data['sentiment'])
tweets = np.array(data['tweet'].apply(preprocess_tweets))

sentiment_train = np.array(train_data['sentiment'])
tweets_train = np.array(train_data['tweet'].apply(preprocess_tweets))

sentiment_test = np.array(test_data['sentiment'])
tweets_test = np.array(test_data['tweet'].apply(preprocess_tweets))

train_data, val_data = train_test_split(train_data, train_size = 0.9, random_state = 42)

sentiment_train = np.array(train_data['sentiment'])
tweets_train = np.array(train_data['tweet'].apply(preprocess_tweets))

sentiment_val = np.array(val_data['sentiment'])
tweets_val = np.array(val_data['tweet'].apply(preprocess_tweets))
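
Before moving on, it’s worth checking that the splits have the expected sizes and that the classes remain roughly balanced (a quick sanity check; at this point the labels are still 0 and 4, as in the raw dataset):

for name, labels in [('train', sentiment_train), ('val', sentiment_val), ('test', sentiment_test)]:
    print(name, len(labels), np.mean(labels == 4))  # size and fraction of positive tweets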

Just like in the previous post, we’ll count word occurrences and establish a reasonable threshold. Words below this threshold will be replaced by the <OUT> tag. This way, we limit model complexity while retaining most of the information (more than 95% of word occurrences). Next, we build a words2int dictionary that assigns an integer value to each word in our vocabulary. <PAD>, <OUT>, <EOS> and <SOS> tokens are also included. However, the end-of-sentence (<EOS>) and start-of-sentence (<SOS>) tokens ended up not being used in this model.

word2count = {}
for tweet in tweets:
    for word in re.findall(r"[\w']+|[.,!?]", tweet):
        if word not in word2count:
            word2count[word] = 1
        else:
            word2count[word] += 1

total_count = np.array(list(word2count.values()))

print(sum(total_count[total_count > 75]) / sum(total_count))

threshold = 75
words2int = {}
word_number = 0
for word, count in word2count.items():
    if count > threshold:
        words2int[word] = word_number
        word_number += 1

tokens = ['<PAD>', '<OUT>', '<EOS>', '<SOS>']

for token in tokens:
    words2int[token] = len(words2int) + 1

int2word = {w_i: w for w, w_i in words2int.items()}

print(len(words2int))
0.9551287692699321
9983

Thus, our final dictionary contains 9983 unique entries. Let’s convert all of our tweets into sequences of integers according to this dictionary:

tweets_train_int = []

for _tweet in tweets_train:
    ints = []
    for word in re.findall(r"[\w']+|[.,!?]", _tweet):
        if word not in words2int:
            ints.append(words2int['<OUT>'])
        else:
            ints.append(words2int[word])
    tweets_train_int.append(ints)

tweets_val_int = []

for _tweet in tweets_val:
    ints = []
    for word in re.findall(r"[\w']+|[.,!?]", _tweet):
        if word not in words2int:
            ints.append(words2int['<OUT>'])
        else:
            ints.append(words2int[word])
    tweets_val_int.append(ints)

tweets_test_int = []

for _tweet in tweets_test:
    ints = []
    for word in re.findall(r"[\w']+|[.,!?]", _tweet):
        if word not in words2int:
            ints.append(words2int['<OUT>'])
        else:
            ints.append(words2int[word])
    tweets_test_int.append(ints)


tweets_int = tweets_train_int + tweets_val_int + tweets_test_int
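
The three loops above are identical except for their input. If you prefer, the conversion can be wrapped in a small helper function (an equivalent refactoring sketch, not what was used to produce the results below):

def tweets_to_int(tweet_array):
    # map each tokenized word to its integer id, falling back to <OUT> for rare words
    out_int = words2int['<OUT>']
    converted = []
    for _tweet in tweet_array:
        words = re.findall(r"[\w']+|[.,!?]", _tweet)
        converted.append([words2int.get(word, out_int) for word in words])
    return converted

# e.g. tweets_train_int = tweets_to_int(tweets_train)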

Our recurrent neural network receives inputs of fixed length. Therefore, our sequences will be padded: the <PAD> token is added to the beginning of every tweet until it reaches the fixed length. Again, some tweets are extremely long due to repetitions and excessive punctuation. As a reasonable length, the 99th percentile of all lengths was chosen; that is, 99% of all tweets will fit within our maximum padding length. Tweets longer than this will be truncated to the maximum length.

lens = []
for i in tweets_int:
    lens.append(len(i))

max_len = int(np.quantile(lens, 0.99))

print(max_len)
34
from keras.preprocessing.sequence import pad_sequences

pad_tweets_train = pad_sequences(tweets_train_int, maxlen = max_len, value = words2int['<PAD>'])
pad_tweets_val = pad_sequences(tweets_val_int, maxlen = max_len, value = words2int['<PAD>'])
pad_tweets_test = pad_sequences(tweets_test_int, maxlen = max_len, value = words2int['<PAD>'])

# Sentiment140 encodes negative tweets as 0 and positive tweets as 4; map 4 to 1 for binary classification
sentiment_train[sentiment_train == 4] = 1
sentiment_test[sentiment_test == 4] = 1
sentiment_val[sentiment_val == 4] = 1
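
By default, pad_sequences both pads and truncates at the beginning of the sequence ('pre'), which matches the behaviour described above. A quick check of the resulting shapes and labels:

print(pad_tweets_train.shape, pad_tweets_val.shape, pad_tweets_test.shape)
print(np.unique(sentiment_train))  # should now contain only 0 and 1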

Building our Neural Network

A research paper suggests that combining 1D convolutions with recurrent units results in higher performance than either of them alone. Thus, we built a neural network with an architecture inspired by that paper.

Neural networks cannot make sense of the dictionary-assigned integers for each word, so a one-hot-encoded vector is passed as input. That is, each word becomes a 9983-dimensional vector with all values set to 0 except one: the dimension corresponding to that word is set to one. This representation is extremely sparse and would require a staggering number of trainable parameters. Thus, a word embedding is included as the first layer of the model.

An embedding is a form of dimensionality reduction. In the one-hot-encoded vector, every word is independent of all others, as each word has a single exclusive dimension for its representation. In an embedding, each word is represented as a vector in an n-dimensional space, where n is much smaller than the number of words. This way, words are related to each other and, in a good embedding, semantically similar words lie closer together in the embedding space. In our model, the embedding will be 200-dimensional. Learning how to embed words is not a simple task and many models use pre-trained embeddings. However, as our data consists of tweets, which contain many typos and idioms, I first wanted to use an untrained embedding, trained jointly with the rest of the model.

After the embedding layer, two parallel paths exist: one is a series of three 1D convolutions that can capture relationships between adjacent words (remember that each word will be represented as a vector encoding its meaning in a 200-dimensional space); the other is a two-layer recurrent neural network composed of GRU units. GRUs are fairly recent; they are easier to train than the classic LSTM units (as they have fewer parameters) and often achieve similar performance. Thus, I wanted to give the still-young GRU a try.

A fairly large dropout (rate of 0.5) is used after the GRU layers, as recurrent neural networks overfit easily. This rate is still smaller than the one suggested in the paper (0.7), although they used LSTM units.

Model architecture

Model architecture with parallel convolutional and RNN paths

The Keras functional API must be used due to the parallel paths. The outputs of the 1D convolutions and the GRUs are concatenated and followed by a 1-unit dense output layer. The Adam optimizer with a soft weight decay will be used.

import keras
from keras.layers import Dense, Dropout, Conv1D, GRU, Embedding, Activation,\
BatchNormalization, concatenate, Input, GlobalAveragePooling1D
from keras.optimizers import Adam
from keras.models import Model

inputs = Input(shape = (34,), dtype = 'int32')

emb = Embedding(9983, 200, input_length = 34)(inputs)
emb_drop = Dropout(0)(emb)

out1 = Conv1D(128, 3)(emb_drop)
out1 = BatchNormalization()(out1)
out1 = Activation('relu')(out1)
out1 = Conv1D(64, 4)(out1)
out1 = BatchNormalization()(out1)
out1 = Activation('relu')(out1)
out1 = Conv1D(64, 3)(out1)
out1 = BatchNormalization()(out1)
out1 = Activation('relu')(out1)
out1 = GlobalAveragePooling1D()(out1)
out1_drop = Dropout(0)(out1)

out2 = GRU(128, return_sequences = True)(emb_drop)
out2 = GRU(128)(out2)
out2_drop = Dropout(0.5)(out2)

out_main = concatenate([out1_drop, out2_drop])
out_main = Dense(1, activation = 'sigmoid')(out_main)

model = Model(inputs = inputs, outputs = out_main)

early = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=1, verbose=0, mode='auto', baseline=None, restore_best_weights=True)
board = keras.callbacks.TensorBoard(log_dir='./logs_paper', histogram_freq=0, batch_size=64, write_graph=True, write_grads=False, write_images=False, update_freq= 1280)
check = keras.callbacks.ModelCheckpoint('model_paper.weights.{epoch:02d}-{val_loss:.4f}.hdf5', monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)

model.compile(optimizer = Adam(lr = 1e-3, decay = 5e-6), loss = 'binary_crossentropy', metrics = ['accuracy'])

print(model.summary())
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_7 (InputLayer)            (None, 34)           0
__________________________________________________________________________________________________
embedding_7 (Embedding)         (None, 34, 200)      1996600     input_7[0][0]
__________________________________________________________________________________________________
dropout_19 (Dropout)            (None, 34, 200)      0           embedding_7[0][0]
__________________________________________________________________________________________________
conv1d_19 (Conv1D)              (None, 32, 128)      76928       dropout_19[0][0]
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 32, 128)      512         conv1d_19[0][0]
__________________________________________________________________________________________________
activation_19 (Activation)      (None, 32, 128)      0           batch_normalization_19[0][0]
__________________________________________________________________________________________________
conv1d_20 (Conv1D)              (None, 29, 64)       32832       activation_19[0][0]
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 29, 64)       256         conv1d_20[0][0]
__________________________________________________________________________________________________
activation_20 (Activation)      (None, 29, 64)       0           batch_normalization_20[0][0]
__________________________________________________________________________________________________
conv1d_21 (Conv1D)              (None, 27, 64)       12352       activation_20[0][0]
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 27, 64)       256         conv1d_21[0][0]
__________________________________________________________________________________________________
activation_21 (Activation)      (None, 27, 64)       0           batch_normalization_21[0][0]
__________________________________________________________________________________________________
gru_17 (GRU)                    (None, 34, 128)      126336      dropout_19[0][0]
__________________________________________________________________________________________________
global_average_pooling1d_7 (Glo (None, 64)           0           activation_21[0][0]
__________________________________________________________________________________________________
gru_18 (GRU)                    (None, 128)          98688       gru_17[0][0]
__________________________________________________________________________________________________
dropout_20 (Dropout)            (None, 64)           0           global_average_pooling1d_7[0][0]
__________________________________________________________________________________________________
dropout_21 (Dropout)            (None, 128)          0           gru_18[0][0]
__________________________________________________________________________________________________
concatenate_7 (Concatenate)     (None, 192)          0           dropout_20[0][0]
                                                                    dropout_21[0][0]
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 1)            193         concatenate_7[0][0]
==================================================================================================
Total params: 2,344,953
Trainable params: 2,344,441
Non-trainable params: 512
__________________________________________________________________________________________________

The model contains 2.3 million trainable parameters and takes a fairly long time to train on a mid-range CUDA-capable GPU. Extra dropout layers with the rate set to 0 (rendering them irrelevant) were added so that dropout rates could be tweaked during training if needed.

model.fit(x = pad_tweets_train, y = sentiment_train,\
          validation_data = (pad_tweets_val, sentiment_val),\
          batch_size = 64, epochs = 20,\
          callbacks = [early, board, check])
Epoch 1/20
1152000/1152000 [==============================] - 2382s 2ms/step - loss: 0.3939 - acc: 0.8220 - val_loss: 0.3654 - val_acc: 0.8380
Epoch 2/20
1152000/1152000 [==============================] - 2426s 2ms/step - loss: 0.3480 - acc: 0.8473 - val_loss: 0.3538 - val_acc: 0.8432
Epoch 3/20
1152000/1152000 [==============================] - 2596s 2ms/step - loss: 0.3225 - acc: 0.8608 - val_loss: 0.3527 - val_acc: 0.8441
Epoch 4/20
1152000/1152000 [==============================] - 2747s 2ms/step - loss: 0.2984 - acc: 0.8734 - val_loss: 0.3643 - val_acc: 0.8411

After the third epoch, the model reached its best performance on the validation set. The EarlyStopping callback (with restore_best_weights=True) makes sure that this is the model kept in the model variable.
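
Because the ModelCheckpoint callback also saved the full model after every epoch, the best epoch can be reloaded later without retraining. A minimal sketch, assuming the epoch-3 checkpoint file was named after the validation loss reported above:

from keras.models import load_model

# filename follows the 'model_paper.weights.{epoch:02d}-{val_loss:.4f}.hdf5' pattern used above
model = load_model('model_paper.weights.03-0.3527.hdf5')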

Model Evaluation

from copy import deepcopy as dc
from sklearn.metrics import roc_auc_score, f1_score

pred = model.predict(pad_tweets_test)
pred_label = dc(pred)
pred_label[pred_label > 0.5] = 1
pred_label[pred_label <= 0.5] = 0

auc = roc_auc_score(sentiment_test, pred)
f1 = f1_score(sentiment_test, pred_label)
print('AUC: {}'.format(np.round(auc, 4)))
print('F1-score: {}'.format(np.round(f1, 4)))
AUC: 0.9231
F1-score: 0.8452
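
Besides AUC and F1-score, a confusion matrix and the raw accuracy give a complementary view of the same predictions (an extra check, not part of the original evaluation):

from sklearn.metrics import confusion_matrix, accuracy_score

print(confusion_matrix(sentiment_test, pred_label))
print('Accuracy: {}'.format(np.round(accuracy_score(sentiment_test, pred_label), 4)))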

The best AUC achieved with the bag of words approach was 0.8782, showing that the positional information added in this model really boosts performance. Still, it is a fairly mild improvement. This is expected, as performance gains become increasingly expensive as models get better.

As in the first part of this series, the top false positives and false negatives are actually mislabeled data. Thus, we’ll take a peek at random false positive and false negative examples, without explicitly choosing the most “incorrect” ones.

True Positives

from random import sample

pos_indices = sentiment_test == 1
pos_predicted = (pred > 0.5).reshape((-1))
true_positives = pos_indices & pos_predicted

samples = sample(range(sum(true_positives)), 5)
print(tweets_test[true_positives][samples])
print(pred.reshape((-1))[true_positives][samples])
Tweet | Positive Probability
feeling better about everything thank you booze! | 0.9616
some writing, some dusting, and then work 59 with tricia! | 0.6634
thats badass jd! keep it up and in no time youll be looking like camillo! have all them cougars following your group! | 0.9469
i would have liked to have a like button to press for your last comment | 0.8898
aw ! that is so sweet. | 0.8991

True Negatives

neg_indices = sentiment_test == 0
neg_predicted = (pred <= 0.5).reshape((-1))
true_negatives = neg_indices & neg_predicted

samples = sample(range(sum(true_negatives)), 5)
print(tweets_test[true_negatives][samples])
print(pred.reshape((-1))[true_negatives][samples])
Tweet | Positive Probability
well ive now got a chest infection, and it hurts like a bitch i want <allcaps> kfc <allcaps> | 0.0102
a bee stung me in the finger! its so swollen that i dont have a fingerprint. | 0.0077
i miss having tcm <allcaps> | 0.0141
kittens are going soon. sad times. i love them too much | 0.4847
<user> i took 2 weeks off work for in the sun, instead im lieing here trying to use this bastard twitter, gr <elong> i should be raving | 0.2338

False Positives

neg_indices = sentiment_test == 0
pos_predicted = (pred > 0.5).reshape((-1))
false_positives = neg_indices & pos_predicted

samples = sample(range(sum(false_positives)), 10)
print(tweets_test[false_positives][samples])
print(pred.reshape((-1))[false_positives][samples])
Tweet | Positive Probability
<user> with 1 / 40th of that following alone, i could stay on twitter 24 / 7 the only stress to tire me would come from coulterites! | 0.7520
<user> not even in the land of ketchup, thats just wrong. did u ever try the ketchup fries? | 0.6687
<user> thanks girlie i need it. <repeat> theres a lot to do | 0.7414
<user> hi ryan! why you are getting so unfashionable lately? | 0.8097
i dont know what i am doing on here. wow i joined the new fad | 0.8859
<user> thats okay. <repeat> it will take a couple hours of intense therapy to get over it, but ill manage somehow | 0.9083
? <repeat> quotbest video <url> ? <repeat> ? <repeat> ? <repeat> ? <repeat> , ? <repeat> ? <repeat> ? <repeat> ? <repeat> ? <repeat> ! i already clicked it <url> | 0.5936
wii fit day 47. hang over prevented wii this morning. late night work meant i wasnt home til near midnight. 15 min walk then situps. | 0.6522
<user> were gettin alot of rain. we must be getting yours! | 0.6998
im tired didnt do anything all day! except went to the craft store to get some hemp string | 0.8360

False Negatives

pos_indices = sentiment_test == 1
neg_predicted = (pred <= 0.5).reshape((-1))
false_negatives = pos_indices & neg_predicted

samples = sample(range(sum(false_negatives)), 10)
print(tweets_test[false_negatives][samples])
print(pred.reshape((-1))[false_negatives][samples])
Tweet | Positive Probability
okay, a bath is a must. and then studying! i really have no life. | 0.2401
<user> so you hate me | 0.3220
catching up on my reading. <repeat> twitter n bf break | 0.4715
<user> good thing when quotdj heroquot video game comes out there will be no more wanna be djs | 0.4940
gained 1 follower. i need more. haha! | 0.4358
thanks <user> it is the avatar i started with. hope all is well. had more storms here today though nothi. <repeat> )<url> | 0.2736
so <elong> have to piss right now, cant find the energy to want to unleash the fury | 0.0265
oh snap. <repeat> kinda nuts right now. <repeat> <user> ive told at least 27 thanks babes. | 0.4799
ahh, worried about tomorrow. <repeat> will they turn up. <repeat> ? haha | 0.3653
my eyes are red. should sleep but dnt feel like it, haha. lilee is sitting on my chair so i have to sit on my bed | 0.1993

We can see that the true positives and true negatives are indeed positive and negative, respectively. It is worth mentioning one example from the true positives (“some writing, some dusting, and then work 59 with tricia!”), which is not obviously positive and, accordingly, received a lower probability. From the negatives, the tweet “kittens are going soon. sad times. i love them too much” left the model almost uncertain, probably due to the “i love them too much” part.

Some false positives and false negatives don’t have an explicit feeling to them, e.g. “<user> were gettin alot of rain. we must be getting yours!” and “catching up on my reading. <repeat> twitter n bf break”; without the label, it’s hard to classify them (most likely because the emoticon information was lost, so the model classified them incorrectly but with probabilities close to 0.5). Some other mistakes are actually mislabeled data (e.g. “okay, a bath is a must. and then studying! i really have no life.”, “<user> so you hate me”, “my eyes are red. should sleep but dnt feel like it, haha. lilee is sitting on my chair so i have to sit on my bed”, “ahh, worried about tomorrow. <repeat> will they turn up. <repeat> ? haha”; they should all be true negatives).

There are, of course, some obvious mistakes, such as “wii fit day 47. hang over prevented wii this morning. late night work meant i wasnt home til near midnight. 15 min walk then situps” (incorrectly classified as positive).

It’s worth mentioning that the complexity of our model makes it a black box. That is, it’s very hard to know why a tweet was classified in some particular way. For example, the tweets “<user> not even in the land of ketchup, thats just wrong. did u ever try the ketchup fries?” (probability: 0.6687) and “<user> hi ryan! why you are getting so unfashionable lately?” (probability: 0.8097) were false positives, yet they could be classified as positive if we consider them funny. Still, it’s very far-fetched to assume that the model learned a sense of humor. It’s also a mystery why the tweet “<user> thats okay. <repeat> it will take a couple hours of intense therapy to get over it, but ill manage somehow” was classified as positive. Maybe because the person will manage it somehow, even though this might have (and probably has, in this context) a negative connotation.

Finally, it’s noticeable that this model has a greater ability to detect sentiment in the absence of obvious words (such as love, hate, pain, happy). This is the major improvement over the bag of words approach.

