ASA DataFest experience

May 8, 2017

Over the weekend, I attended my third hackathon ever. Technically it was only the second one I've registered for; I snuck into the first one.

As the title suggests, the hackathon I attended was the ASA DataFest, hosted by UWaterloo. Going in, my goal was to find out whether I actually enjoyed data-related activities or whether I was just bandwagoning on the term "data science" too hard.

They introduced the hackathon as an extremely open-ended competition: we were given data from Expedia and told to find anything we could in it. The judging criteria were best insight, best visuals and best use of external resources. I didn't know what to think about this, but the room seemed split 50/50: some people disliked it for being too vague, while others found it liberating and exciting to be able to work on anything they wanted.

Personally, the only thing I didn't like was that most of my teammates left or didn't bother participating, which left only me and one other teammate. But other than that, it was a really good experience. Before this, I didn't know any R at all, but after the first day I had learned pretty much all of the basics and was able to clean and analyze the data.

As it went on, I found out that my remaining teammate, Joy, was working with her friends from another group, and I decided to join them since it was probably better than working alone. Fast forward a little and I found out that Joy and her friends on the other team were all fourth-year stats students, which was a huge plus for my learning.

After the first day, I finally decided that I was going to make a time lapse of all the booking destinations from Canada to find out whether bookings were affected by the seasons. I only did Canada since it was only a million data points and my laptop barely had enough memory to process even that. Poor thing kept crashing. I found out later that Joy and her friends' team decided to cluster the data and classify the groups in order to understand the needs and wants of each cluster, and from that to decide which locations and bookings to display on the website in order to secure more bookings.

After two more days and some solid Google sessions, I was finally able to create the time lapse. I did it by first cleaning the data of all its null values, then finding all the rows that corresponded to Canada, merging the user data with the destination data by destination id, and finally selecting only the longitudes, latitudes and dates and putting them into their own data frame.
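
For anyone curious, those four steps look roughly like the sketch below. It is plain Python rather than the R I actually used, and the field names (`user_country`, `dest_id`, `latitude`, `longitude`, `date`) and the `"CANADA"` value are made up for illustration; they are not the real Expedia column names.

```python
def build_points(bookings, destinations, country="CANADA"):
    """Return (date, latitude, longitude) tuples for bookings from one country.

    `bookings` and `destinations` are lists of dicts; every key used here is a
    hypothetical stand-in for whatever the real columns were called.
    """
    # Index destinations by id so the merge is a simple dictionary lookup.
    dest_by_id = {d["dest_id"]: d for d in destinations}

    points = []
    for booking in bookings:
        # Step 1: drop rows that contain any missing value.
        if any(value is None for value in booking.values()):
            continue
        # Step 2: keep only the rows for the chosen country.
        if booking["user_country"] != country:
            continue
        # Step 3: merge with the destination data on the destination id
        # (an inner join: bookings with no usable destination are skipped).
        dest = dest_by_id.get(booking["dest_id"])
        if dest is None or dest["latitude"] is None or dest["longitude"] is None:
            continue
        # Step 4: keep only the date and the coordinates.
        points.append((booking["date"], dest["latitude"], dest["longitude"]))
    return points
```

Doing the country filter before the merge keeps the joined table small, which matters when the laptop is already running out of memory.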

Afterwards, I used ggplot and ggmap to plot the coordinates of each booking onto a map of the world, and I did that for every single day of 2015 in order to create the time lapse. With the help of ImageMagick, I was finally able to produce this.

Bookings of Canada during 2015
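
The time-lapse part boils down to one frame per day, stitched together at the end. Here is a minimal sketch of that bookkeeping in plain Python (again, not my original R code): it groups the points by day and builds an ImageMagick command, but leaves out the actual map drawing. The function names, the frame file names and the output name `bookings.gif` are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def frames_for_year(points, year=2015):
    """Return one (day, [(lon, lat), ...]) pair per calendar day, in order.

    `points` is an iterable of (date, latitude, longitude) tuples.
    """
    by_day = defaultdict(list)
    for day, lat, lon in points:
        if day.year == year:
            by_day[day].append((lon, lat))

    frames = []
    day = date(year, 1, 1)
    while day.year == year:
        # Days without bookings still get an (empty) frame, so the
        # animation keeps a steady pace.
        frames.append((day, by_day.get(day, [])))
        day += timedelta(days=1)
    return frames


def frame_name(day):
    """Hypothetical file name for the image rendered for `day`."""
    return f"frame_{day:%Y-%m-%d}.png"


def imagemagick_command(frames, output="bookings.gif", delay=10):
    """Build the argument list that stitches the frame images into a GIF.

    -delay is in hundredths of a second; -loop 0 repeats forever.
    On ImageMagick 7 the executable is usually `magick` instead of `convert`.
    """
    files = [frame_name(day) for day, _ in frames]
    return ["convert", "-delay", str(delay), "-loop", "0", *files, output]
```

Each frame's coordinates would be drawn onto the world map and saved under `frame_name(day)`; the command list can then be handed to something like `subprocess.run` once all the images exist.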

After seeing this animation, I was honestly surprised to find that the season didn't actually have much impact on where people travelled, and that people just liked travelling to the US and Europe a lot.

Plus, for the hell of it, I also created a heatmap version.

Heatmap of bookings in Canada during 2015

In the end, Joy and I didn't present, since we felt we didn't have enough to show, but honestly, the whole experience was great. I went from knowing nothing about R to learning so much about it. I was able to actually produce something in R, got to know some really great people and found out that I want to do a double major in Stats along with CS. So in the end, I achieved my goal, which made this an incredibly valuable experience.