CS 660 - Introduction To Database Systems
and Twitter Analysis
This project is ONLY for CS 660 students
Due on: Tuesday, Dec 12, 2017 at 11:59PM.
In this project you will learn how to get tweets from Twitter in
real time (streaming mode), how to store them in a MongoDB database,
how to retrieve them in Python code using PyMongo, and how to explore
the data within the Mongo shell itself.
In this extra project, which is geared only toward grad students in
CS660, we expect students to be able to install all the necessary
packages on their own and to research ways to do things. For some of
the tasks we have provided suggestions on how to perform them, but you
may use any other method to get the task done as long as you are using
PyMongo within Python. The exception is the last task, for which you
might want to use other methods and languages. We
tried to keep it fun and engaging and we wish you a great rest of
For each part you should write related Python code, using the PyMongo
API, pure Python code, or other 3rd party libraries. You need
to submit your entire code in a zip
file in the format of
Tuesday, December 12, 2017 at 11:59PM.
In this part of the project, you use the Twitter data mining script
(pymongo_tweepy.py) given to you and modify it so that it mines
tweets with the keywords #deeplearning,
The streamer, similar to the original file, should stream on track
(search for keywords), whereas in Part 2 you stream based on location.
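The storage step that runs for each streamed message can be sketched independently of the streaming library. The helper below is a minimal sketch, not the code from pymongo_tweepy.py: the name `store_tweet` is hypothetical, and `collection` is assumed to be a PyMongo collection (any object with an `insert_one` method works).

```python
import json


def store_tweet(collection, raw_json):
    """Parse one raw stream message (a JSON string) and store it.

    `collection` is assumed to be a PyMongo collection; only its
    insert_one method is used. Returns the parsed tweet, or None when
    the message was skipped.
    """
    message = json.loads(raw_json)
    # Streams also deliver non-tweet messages (for example limit or
    # delete notices) that have no "text" field; skip those.
    if "text" not in message:
        return None
    collection.insert_one(message)
    return message
```

A listener's data callback could call `store_tweet(collection, data)` for every message it receives, so the keyword list passed to the track filter stays the only part of the script that changes between runs.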
Here is what a single tweet would look like when stored in MongoDB:
the command >
In order to find the number of tweets in your database, you could use
the following command:
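From Python, the same count could be obtained through PyMongo. This is a minimal sketch assuming PyMongo 3.7 or later (which provides `count_documents`); the helper name `count_tweets` and the database and collection names in the comment are placeholders, not names from the assignment.

```python
def count_tweets(collection, query=None):
    """Return how many stored tweets match `query` (all tweets by default).

    `collection` is assumed to be a PyMongo collection.
    """
    return collection.count_documents(query or {})


# Hypothetical usage (placeholder names, requires a running MongoDB):
#   from pymongo import MongoClient
#   tweets = MongoClient("mongodb://localhost:27017")["twitterdb"]["tweets"]
#   print(count_tweets(tweets))
```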
For the purpose of the project, please retrieve ~1000 tweets using the
instructions given in https://github.com/monajalal/mongo_tweets
. You would need to do a git
to get the latest version of the code if you already have git
the repository. For further instructions on how to get the repo and
get started with the Twitter API, please check Lab11_extra in
case you didn’t attend the lab on December 1st, 2017.
For this project, you would need to refer to the Tweet Object definitions
Find the number of tweets that have data somewhere in the tweet’s
text (case insensitive search using
Among all the data related objects, how many of them are geo_enabled?
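One way to express these two counts is with MongoDB filter documents, sketched below. The constant and function names are hypothetical. The sketch assumes the tweets are stored in the standard Tweet JSON layout, where the `geo_enabled` flag sits on the embedded `user` object, and that the count is taken with PyMongo's `count_documents`.

```python
import re

# Case-insensitive match for "data" anywhere in the tweet text.
DATA_FILTER = {"text": {"$regex": "data", "$options": "i"}}

# The same filter, restricted to tweets whose author has geo_enabled set.
# Dotted keys reach into the embedded user document.
DATA_GEO_FILTER = {**DATA_FILTER, "user.geo_enabled": True}


def mentions_data(tweet):
    """Pure-Python equivalent of DATA_FILTER for a single tweet dict."""
    return re.search("data", tweet.get("text", ""), re.IGNORECASE) is not None


def count_matching(collection, query):
    """Count documents matching `query` in a PyMongo collection."""
    return collection.count_documents(query)
```

Note that a plain substring match also counts words such as "database" or "metadata", which fits the wording "somewhere in the tweet's text".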
For all the data related tweets, use the TextBlob
Python library to detect if the Tweet’s
sentiment is “Positive”,
You are free to use other sensible methods and libraries to do
so. (Hint: To get better results you could clean the tweet text
of unwanted characters, emoji, etc. This is not obligatory and we
won’t deduct points based on accuracy whatsoever.)
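The cleaning hint and the sentiment step could be sketched as follows. This is only one possible approach. The function names are hypothetical, and the labels other than "Positive" ("Negative" and "Neutral") and the zero thresholds are assumptions, since the assignment text above names only "Positive". TextBlob is imported lazily so the cleaning and labelling helpers work without it.

```python
import re


def clean_tweet(text):
    """Strip URLs, @mentions, and unusual characters (emoji, '#', etc.)."""
    text = re.sub(r"https?://\S+", " ", text)         # drop links
    text = re.sub(r"@\w+", " ", text)                 # drop user mentions
    text = re.sub(r"[^A-Za-z0-9\s.,!?']", " ", text)  # drop emoji, '#', ...
    return " ".join(text.split())                     # collapse whitespace


def classify_polarity(polarity):
    """Map a polarity score in [-1, 1] to a label (assumed thresholds)."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def tweet_sentiment(text):
    """Label one tweet using TextBlob's polarity score."""
    from textblob import TextBlob  # third-party: pip install textblob
    return classify_polarity(TextBlob(clean_tweet(text)).sentiment.polarity)
```

Calling `tweet_sentiment(tweet["text"])` for each tweet returned by the data query yields one label per tweet, which can then be tallied with `collections.Counter`.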
Your final results should look something like the below: