Sentiment Analysis using NLTK
Training data set: 31,961 tweets
Test data set: 17,197 tweets
Text Processing
Convert the text to lowercase
Remove URLs
Remove special characters
Tokenization
Lemmatization
Vectorization
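The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code. The regular expressions and helper names are hypothetical. Tokenization is a simple whitespace split. Vectorization is shown as a bag-of-words count vector, because the project does not say which vectorizer it used. Lemmatization is left out so the sketch runs without NLTK data files. In NLTK that step would typically use `nltk.stem.WordNetLemmatizer`, which requires the WordNet corpus, and `nltk.tokenize.TweetTokenizer` is a tweet-aware alternative to the whitespace split.

```python
import re

# Hypothetical patterns and helper names; not taken from the original project.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
SPECIAL_CHARS = re.compile(r"[^a-z0-9\s]")


def clean_text(text: str) -> str:
    """Lower-case the text, then strip URLs and special characters."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)     # remove URLs first, while they are intact
    text = SPECIAL_CHARS.sub(" ", text)   # then drop punctuation, '#', '@', etc.
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer; adequate once special characters are gone."""
    return text.split()


def build_vocabulary(docs: list[list[str]]) -> dict[str, int]:
    """Map each distinct token to a column index, in first-seen order."""
    vocab: dict[str, int] = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Count vector for one tweet; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec
```

For example, `clean_text("Loving the new #Python release!! https://t.co/abc @user")` returns `"loving the new python release user"`. URLs are removed before special characters because stripping punctuation first would break a URL into fragments (such as `https`, `t`, `co`) that the URL pattern could no longer match.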
Models
1. Logistic Regression model
Accuracy: 95%
Precision: 90%
Recall: 31%
2. XGBoost model
Accuracy: 94%
Precision: 86%
Recall: 16%
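Accuracy, precision and recall, as reported above, can be computed from confusion counts as in the sketch below. The labels are toy values invented for illustration (1 marks the positive class), not the project's data. They show how accuracy can stay high while recall is low when positive examples are rare.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the given positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # missed positive
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive=1):
    """Accuracy, precision and recall; a zero denominator yields 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy data: 100 tweets, only 10 of them positive.
y_true = [1] * 10 + [0] * 90
# The classifier flags 4 tweets: 3 true positives and 1 false positive.
y_pred = [1] * 3 + [0] * 7 + [1] + [0] * 89

acc, prec, rec = scores(y_true, y_pred)
print(f"Accuracy: {acc:.0%}, Precision: {prec:.0%}, Recall: {rec:.0%}")
# prints "Accuracy: 92%, Precision: 75%, Recall: 30%"
```

Here the classifier misses 7 of the 10 positive tweets, yet accuracy is 92% because the 89 correctly rejected negatives dominate the total.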
The scores above summarize our findings.
Used:
GitHub: https://github.com/lumindak/sentiment_analysis