Developing a Sentiment Analysis model for Covid-19 vaccine tweets with Amazon Kinesis Data Firehose and PySpark

In December 2020, the Pfizer-BioNTech and Moderna COVID-19 vaccines were authorized by multiple governments. Tens of thousands of people around the world posted about the vaccines on Twitter, and their feelings appeared to be mixed.

I decided to collect Twitter live stream data using Amazon Web Services (AWS) Kinesis Data Firehose and develop a Sentiment Analysis Model using PySpark for COVID-19 vaccine tweets.
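As a rough illustration of the ingestion step, here is a minimal sketch of pushing one tweet into a Firehose delivery stream. It is not the code from this project. The helper names `encode_tweet` and `send_to_firehose` and the stream name are hypothetical, and `client` is assumed to be a Firehose client such as the one returned by `boto3.client("firehose")`, whose `put_record` call takes `DeliveryStreamName` and `Record`.

```python
import json


def encode_tweet(tweet: dict) -> bytes:
    """Serialize one tweet as a newline-terminated JSON record (hypothetical helper)."""
    # Firehose does not add delimiters between records, so a trailing newline
    # keeps one JSON object per line in the delivered output.
    return (json.dumps(tweet, ensure_ascii=False) + "\n").encode("utf-8")


def send_to_firehose(client, stream_name: str, tweet: dict) -> dict:
    """Send a single tweet to a delivery stream and return the service response."""
    # `client` is assumed to expose put_record, as a boto3 Firehose client does.
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": encode_tweet(tweet)},
    )
```

With boto3 installed and AWS credentials configured, a call might look like `send_to_firehose(boto3.client("firehose"), "my-tweet-stream", tweet)`, where `"my-tweet-stream"` is a placeholder. Newline-delimited JSON is chosen here because it is straightforward to read back line by line, for example with PySpark's JSON reader.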

Displayed here is a diagram explaining the Data Pipeline for my project.

A concise walkthrough of the steps for building an ML model using Python libraries for machine learning and visualization

I’m an avid reader and learner. Over the past few months, I’ve read several dozen Medium blog posts on a variety of machine learning topics, and I’ve learnt a lot from them. I’ve observed that most posts focus on one specific area. Few blog posts walk through the steps for building a machine learning model from start to finish, with explanation and code for each step.

I guess one reason for this could be that discussing all the steps involved in model building in a single Medium blog post, right from data collection and preparation, data…

Web scraping with Beautiful Soup, data wrangling with Pandas, and discussing insights generated

Recently I was apartment hunting in Toronto and spent a ton of time on various websites trying to understand the rental market. I had several questions such as:

  • Would renting an apartment or condo in Mississauga, Etobicoke, or North York be significantly cheaper than living in downtown Toronto?
  • How much could I potentially save if I rented a basement unit?
  • Which suburb has the lowest rents?
  • How do suburb rents compare to Toronto city rents?

Browsing manually through listings on rental websites was proving to be very time-consuming. …

