2018 Columbia Data Science Hackathon



The Columbia Data Science Society, in collaboration with the Tow Center for Digital Journalism, proudly hosted the fourth annual Columbia Data Science Hackathon. We were excited to see what you could build with other students and mentors using novel datasets provided by our corporate sponsors. We hope you enjoyed the hackathon as much as we did, and we hope to see you again next year!

Hackathon Datasets

Tow Center for Digital Journalism

Almost every day, the White House publishes 4-6 stories under the “West Wing Reads” banner. This dataset brings those stories together: it includes the titles, publications, and publication dates, as well as the entities mentioned within the stories and other metadata.

Columbia Tech Ventures

The core of the dataset is a list of all inventions disclosed to Columbia Tech Ventures by inventors since the 1980s. Many of these inventions were discovered in the course of grant-supported research described in academic publications.

Qu Capital

Qu Capital provided two time series datasets: tick-level data for bitcoin on a major cryptocurrency exchange and a parsed corpus of Reddit comments from select subreddits.


Joseph D. Jamail Lecture Hall
Pulitzer Hall Floor 3
Columbia University
New York, NY 10027


Saturday, September 29, 2018
Start: 4 pm
Sunday, September 30, 2018
End: 12 pm


Hackathon Winners


Data Never Sleeps

1st Place

Team Members: Kedi Cui, Zhe Liu, Yang Song, Xiangtian Deng

Constructed a bitcoin trading algorithm that predicts market price using an ensemble of models (XGBoost, ARIMA, and LSTM) combined with natural language processing (NLP).


Black and White

2nd Place

Team Members: Quan Yuan, Xiaowo Sun, Jie Li, Xiaofan Zhang

Combined machine learning, NLP, and time series analysis for feature engineering, predictive modelling, and the design of an arbitrage strategy.


Sleep Beauty

3rd Place

Team Members: Jinhao Zhang, Mingfeng Li, Yinan Ling, Nan You

Identified high-value inventions from patent data using feature selection and a neural network.


Knowledge Trumps All

GCP Award

Team Members: Thompson Bliss, Jacob Klein, Alex Kim, Patrick Lewis

Created a web app that quantifies the differences in writing style between news publishers and predicts whether an article will be promoted by the West Wing.




Organized By