April 4, 2016

Snowzilla - Blizzard 2016!

    From January 22–24, 2016, a major blizzard produced up to 3 ft (91 cm) of snow in parts of the Mid-Atlantic and Northeast United States. I live in Raleigh, NC, which also saw its fair share of snowfall. The internet was down for a day, but when it came back on I started collecting tweets referring to the blizzard.

    I wanted to see where the tweets were coming from, so I searched for tweets mentioning stormjonas, blizzard2016, jonas, blizzard, or snowzilla. I ran the script for around 10-12 hours and managed to collect around 4k tweets from all over the world. The count is low because geotagged tweets are far rarer than regular tweets.
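As a rough illustration of that filtering step, here is a minimal sketch. It assumes each tweet is a dict shaped like Twitter's classic JSON, with a `text` field and a GeoJSON `coordinates` field; the function names are made up for this example and this is not the script I actually ran.

```python
import re

# Search terms taken from the post.
KEYWORDS = ("stormjonas", "blizzard2016", "jonas", "blizzard", "snowzilla")

# One case-insensitive pattern covering all the terms.
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def is_storm_tweet(tweet):
    """True if the tweet mentions a search term and carries coordinates."""
    text = tweet.get("text", "")
    has_geo = tweet.get("coordinates") is not None
    return has_geo and PATTERN.search(text) is not None


def extract_point(tweet):
    """Return (latitude, longitude); GeoJSON stores the pair as [lon, lat]."""
    lon, lat = tweet["coordinates"]["coordinates"]
    return lat, lon
```

Dropping tweets without coordinates is what shrinks the collection so much: only a small fraction of tweets carry a location.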

    The data required some cleaning and sorting by location. There were tweets from all over the world, but most of them were from the United States, naturally.
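The sorting by location can be sketched roughly like this. It assumes each tweet dict carries a `place` object with a `country_code`, as in Twitter's classic JSON; the helper name `count_by_country` is hypothetical.

```python
from collections import Counter


def count_by_country(tweets):
    """Count tweets per country code, most frequent first."""
    counts = Counter()
    for tweet in tweets:
        # `place` can be missing or None when a tweet has no tagged place.
        place = tweet.get("place") or {}
        counts[place.get("country_code", "unknown")] += 1
    return counts.most_common()
```

Tweets without a usable place end up under "unknown", so they can be inspected or dropped during cleaning.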

    Let's start with tweets from the United States. On sorting the data I saw that there were tweets from every state, including Hawaii and Alaska. I wish I had collected the text of the tweets to get a general idea of what each state was talking about. Anyway, I plotted each unique location on the map, and this is how it looks.
Tweets from USA
    There are lots of tweets around the northeastern states, which were battered by the snow. The density decreases as we move west. There is some activity on the West Coast. I wonder what they are talking about: either the excellent sunny weather they have throughout the year, or how they could easily have used an extra holiday.

    The state of New York received record snowfall too. Let's compare the average snowfall in each New York City borough against the number of tweets from it.

    Manhattan produced the most tweets even though it received the least snowfall on average. Though the average snowfall in each borough is almost the same, Staten Island got hit the hardest, yet hardly any tweets came from there. (Source).

    According to this article, the six states that received the most snowfall are given below, followed by the number of tweets from each state in descending order.

Avg Snowfall in inches
Number of tweets from each state
West Virginia, which received the highest snowfall, does not even make the list, while California, which received no snow, had a considerable number of tweets.
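This kind of comparison between the two rankings can be sketched as below. The function names are hypothetical, and the inputs are plain `{state: value}` mappings that you would fill with the real snowfall and tweet numbers.

```python
def top_n(values, n):
    """Names of the n largest entries in a {name: value} mapping."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def snowy_but_quiet(snowfall, tweet_counts, n):
    """States in the top n for snowfall that miss the top n for tweets."""
    loud = set(top_n(tweet_counts, n))
    return [state for state in top_n(snowfall, n) if state not in loud]
```

Swapping the two arguments gives the opposite view: states that tweeted a lot without getting much snow.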

    About 100 tweets came from the United Kingdom. This was mostly due to reports that the blizzard was headed to the UK from the USA. (Source).
Here you can see tweets coming from all over the world, the majority from the US and Europe.

I wish I had collected the text of the tweets to see the sentiment in different states. I have presented some of the ideas I had and will add more as I think of them. Comments and suggestions are welcome!

All the code can be found here.

March 11, 2016

Tweets on a Globe!

This is going to be a short one.

While reading on the web, I came across Trump Tweets on a Globe. I was really fascinated by the fantastic work done. The best part was that it was open source. So I forked the repository and started playing with it. 

I thought it would be really cool if everybody could see, in real time, the location and content of tweets on topics they are interested in. The only way to change the search term was through the command line, so I modified the code to provide search functionality on the web page.

You can check it out at Tweets on a Globe. (The default search term, as it was before, is still Trump. Haha!)

(I have deployed the app on Heroku and there is just one instance of it running, so you will probably see somebody else's search results on the globe. Just update the search term to what you want. Also, geotagged tweets are far fewer, so searching for trending topics makes for a better experience.)

You can find all the code on my GitHub page.

Big thumbs up to Joel Grus for the awesome visualization and concept! 

October 18, 2014

125 Years of English Football!

    So the Barclays Premier League has been back for quite some time, and all the other leagues are back in action too. And for the first time, we have an Indian football league too! The ISL! (I hope it changes Indian football forever!)

    The Football League (as it was known back then) was created in 1888 by Aston Villa director William McGregor. Since then English football has grown to thousands of teams playing in hundreds of leagues. The BPL is just the tip of the iceberg. This blog gives complete information on the hierarchy of English football.

    James Curley, assistant professor of psychology at Columbia University, cobbled together data from lots of sources in his free time and compiled it into what is probably the best collection of English football scores. Sitting silently on this GitHub page are the scores of nearly 200,000 games played in the top 4 leagues since 1888. These 14 megabytes can tell us remarkable stories about 125 years of English football!

    I have used R to perform all the manipulations on the data. The code below shows how to load the data into R.

    Take scorelines, for example. In 188,060 games, there were 13,475 0-0 draws. The most common scoreline is 1-1, accounting for roughly 21,000 games (11%).
Top Five Full Time scores
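The analysis itself was done in R, but the counting idea is easy to sketch in Python. This is only an illustration: the function name is hypothetical and the five results in `sample` are made up, not taken from the dataset.

```python
from collections import Counter


def scoreline_counts(results):
    """Count full-time scorelines from (home_goals, away_goals) pairs."""
    return Counter(f"{home}-{away}" for home, away in results)


# Tiny made-up sample, only to show the shape of the input.
sample = [(1, 1), (0, 0), (2, 1), (1, 1), (1, 0)]
counts = scoreline_counts(sample)
most_common_score, games = counts.most_common(1)[0]

# Share of 0-0 draws, using the two totals quoted in the post.
goalless_share = 100 * 13475 / 188060
```

Calling `counts.most_common(5)` on the full dataset would give the kind of top-five table shown above.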
Now, let's talk about goals!

In 188,060 matches played in 125 years, a total of 542,288 goals were scored!
About 330,000 goals were scored by the home team and the rest by the visiting team.
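Those totals translate into per-game averages with some quick arithmetic. The sketch below uses only the figures quoted above; since the home total is the rounded "about 330,000", the derived values are approximate.

```python
matches = 188_060
total_goals = 542_288
home_goals = 330_000  # rounded figure from the post, so results are approximate
away_goals = total_goals - home_goals

goals_per_game = total_goals / matches        # a little under 3 goals per game
home_per_game = home_goals / matches
away_per_game = away_goals / matches
home_share = 100 * home_goals / total_goals   # percentage of goals scored at home

print(f"{goals_per_game:.2f} goals per game")
print(f"{home_per_game:.2f} home vs {away_per_game:.2f} away")
print(f"{home_share:.1f}% of goals scored by the home team")
```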

We see that average home goals have fallen significantly, while away goals keep oscillating.
This drop in average home goals and the rise in away goals over the past twenty years explain the graphs below.
So, home wins have fallen sharply to about 44%, while away wins are on a gradual rise. This means that home matches won't matter as much as they used to, and home dominance will slowly fade away.
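The home-win percentage per season can be computed along these lines. This is a Python sketch rather than the R code behind the charts, and it assumes the matches are available as `(season, home_goals, away_goals)` tuples; the function name is hypothetical.

```python
from collections import defaultdict


def home_win_share(matches):
    """Percentage of matches won by the home side, keyed by season."""
    played = defaultdict(int)
    home_wins = defaultdict(int)
    for season, home_goals, away_goals in matches:
        played[season] += 1
        if home_goals > away_goals:
            home_wins[season] += 1
    return {s: 100 * home_wins[s] / played[s] for s in sorted(played)}
```

Changing the comparison to `home_goals < away_goals` gives the matching away-win series.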

Average goals per game have also fallen.

We see huge shifts in the average goals around the years 1925 and 1965. The reason for this is rule changes.

1958 - Substitutions were allowed for the first time
This roughly corresponds with the beginning of a steep decline in scoring in the 1960s. This could make for a plausible causal explanation: Perhaps playing with an injured player left teams extremely vulnerable on defense, leading to many goals. The addition of the substitute may have mitigated these effects.

The reduction in goals in the late 1920s isn't well explained, but it is believed to have happened mainly because of tactical changes. (Teams used to play many forwards; later, the number of defensive and midfield players increased.)

All the code used for plotting the above charts and manipulating the data can be found here.

Hope that it was a good read! 
Suggestions and feedback are always welcome! 

Happy Coding :D

August 10, 2014

A look at Rakshabandhan through Instagram

With none of my sisters in town, Rakshabandhan is like any other day for me. No family get-together, no special lunch, nothing at all.

I had wanted to use the Instagram API for a long time and thought there would be no better day than this. It would be very interesting to know what people posted on Instagram for this occasion.

So after a lot of searching, reading through the Instagram documentation and a bit of hacking, I managed to write a Python script that fetches images from Instagram in real time.
I ran the script for the keyword happyrakshabandhan so I could fetch all the images hashtagged with happyrakshabandhan as and when they were uploaded.

The script ran for about 15-20 minutes, after which the rate limit was exceeded. My directory was filled with 335 images in just 15 minutes.

Here is my code snippet,

Just enter your credentials, set a search term and enter your path, and you're good to go.
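For readers who only want the general pattern (keep polling for newly tagged images until the rate limit is hit), here is a minimal sketch. It is not my original script and makes no real Instagram calls: `fetch_page` and `RateLimitExceeded` are hypothetical stand-ins for whatever your API client provides.

```python
class RateLimitExceeded(Exception):
    """Raised by the (hypothetical) fetch function when the API refuses more calls."""


def collect_image_urls(fetch_page, max_pages=1000):
    """Poll fetch_page() until it hits the rate limit; return unique URLs in order."""
    seen = set()
    ordered = []
    for _ in range(max_pages):
        try:
            urls = fetch_page()
        except RateLimitExceeded:
            break  # stop cleanly and keep what was collected so far
        for url in urls:
            if url not in seen:  # the same image can show up in several polls
                seen.add(url)
                ordered.append(url)
    return ordered
```

Each collected URL can then be downloaded to the target directory. A real script would also want a short pause between polls.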

I prepared a quick collage of some random images.

General observations after glancing through the images:

  1. Mostly, there are close-ups of the rakhis, either displayed on a plate or tied to the brother's wrist.
  2. Many of the pics show the brother-sister pair captured in a selfie.
  3. Some of the pics show all the cousins getting together for this occasion usually at someone's home.
  4. Also, girls have teasingly sent happy Rakshabandhan messages to their guy friends via trolls/selfies.

This is a very small project, but it is just a start. It gave me many ideas for the future, such as picking out the selfies and using image processing to classify whether the subject is a boy or a girl, predicting hashtags from an image, and predicting hashtags from a given hashtag.
You can look forward to many more things in the future.
Feedback and suggestions are welcome.

And yes, *blog description and about us coming soon*

Happy Hacking :D