Want to get a head start on learning the fundamentals of programming and performing data analysis in R? Do you want to figure out what "TidyVerse" even means? Come join CDSS in our Introduction to R Workshop on Thursday March 7 from 8-9pm in Math 203. Bring a laptop if you want to follow along. No prior experience in R is required, so all are welcome to join!
Data science interviews often include a case study. Many times, interviewees are expected to come up with a machine learning/statistical approach to solve the problem. This includes working with the interviewer to identify the data needed, the KPIs and the relevant algorithms. Come join CDSS in a workshop where we show you how to approach cases step by step and give you the opportunity to practice one yourself!
As a data scientist, you are not only expected to know your expected values but also your expected runtime. Do you know basic computer science concepts like Big-O, standard data structures, and basic algorithms? Do you think about edge cases and test your code? Worry not! In this session, CDSS will go over important concepts you should know, walk through real interview questions, and share some tips regarding coding.
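To make the Big-O and edge-case questions above concrete, here is a minimal sketch in Python. The problem (binary search on a sorted list) is our own illustrative choice, not necessarily one covered in the session.

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if it is absent.

    Runs in O(log n) time because the search range halves on every step.
    """
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


# Edge cases worth testing: empty input, a single element,
# the first and last positions, and a missing value.
assert binary_search([], 3) == -1
assert binary_search([3], 3) == 0
assert binary_search([1, 3, 5, 7], 1) == 0
assert binary_search([1, 3, 5, 7], 7) == 3
assert binary_search([1, 3, 5, 7], 4) == -1
```

A linear scan would also pass these tests, but it takes O(n) time. Being able to explain that difference is the kind of reasoning the questions above are getting at.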
Machine Learning is the foundation for predictive modeling and is an important component of interviews for Data Scientist roles. Join CDSS as we cover the range of concepts that often get tested and help you ace your technical interviews!
The workshop will cover a range of topics from ML fundamentals to algorithms and specific applications. We will also look at the different ways in which questions are framed in these interviews.
Data science interviews often test your SQL skills, so to prepare you, CDSS is going to walk through everything you need to know to ace them! We'll start from basic SQL syntax and work our way up to more advanced functions, as well as walk through the approach to some typical interview questions.
Time: Thursday 10/18, 7:30 - 8:30 PM
Place: Barnard 302
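As a rough illustration of the range the SQL workshop above describes, from basic syntax up to aggregation, here is a small sketch using Python's built-in sqlite3 module. The orders table and its rows are hypothetical and invented purely for this example.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 30.0), ("ann", 70.0), ("bob", 20.0), ("cat", 50.0), ("cat", 10.0)],
)

# Basic syntax: filter rows with WHERE and sort them with ORDER BY.
big_orders = conn.execute(
    "SELECT customer, amount FROM orders WHERE amount >= 50 ORDER BY amount DESC"
).fetchall()

# A step up: aggregate per customer and keep only groups above a threshold.
totals = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 50
    ORDER BY total DESC
    """
).fetchall()

print(big_orders)
print(totals)
```

WHERE filters individual rows before grouping, while HAVING filters the groups after aggregation. Interview questions often probe that distinction.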
Want to secure interviews from top tech companies? You need more than solid coding skills. Join our resume/portfolio workshop where we will show how to use GitHub to build outstanding repositories and personal websites. There will also be experienced CDSS board members giving personalized suggestions on your resume!
Time: Wednesday 10/17, 8:00 - 9:00 PM
Place: Kent 413
Everyone and anybody is welcome to join. Show us that you are coming by clicking "Going" on this Facebook event!
Data manipulation and feature engineering are crucial steps in Data Science, making them critical skills that every data scientist must possess. Every language has its own tools for accomplishing these tasks and for R, it’s the powerful yet elegant dplyr library.
Get your laptops and join us for a hands-on workshop where we’ll cover the functionalities offered by dplyr for manipulating data. Knowledge of R is not required for this session. Please have RStudio installed on your computers. We will work on the Titanic dataset from Kaggle (https://www.kaggle.com/c/titanic/data). The dataset is also available here: https://drive.google.com/open?id=1zd5mYiLFXjHNF5eT6yDQdfkSNdiBP5cz
When: Wednesday, April 18 2018 @ 7:30pm - 9:00pm
Where: Pupin 329, Columbia University
CDSS is hosting our last data science tech talk of the semester with Dataiku! Dataiku’s core product is a complete data science software tool aimed at shortening the time-consuming load-clean-train-test-deploy cycles of building predictive applications. The French-based startup scored a $28 million Series B investment in late 2017, a super cool office space in downtown Manhattan, and is currently hiring!
At this event, Dataiku’s Lead Data Scientist, Jed Dougherty, will present a project analyzing the largest national dataset on evictions from Kansas City using Dataiku’s platform. Jed will also speak about the full-time and internship opportunities available at Dataiku.
Please RSVP to the event if you can.
For more info.
Since launching in 2009, Foursquare has collected 12 billion global check-ins, which have formed the cornerstone of its location intelligence. Using this data, Foursquare is able to detect a billion new place visits per month via the activity generated by users and business partners around the world. Foursquare now offers its proprietary location technology to hundreds of other companies, including Apple, Samsung, Microsoft, Twitter and AirBnB.
We will give an overview of Foursquare's data and how it has evolved, starting with the check-in and expanding to include continuously-detected visits on millions of smartphones. We will also describe Foursquare's core location technology, Pilgrim, how it works, and some of the data science challenges it has generated.
Adam Waksman is the Director of Engineering and Data Science at Foursquare, with oversight over Pilgrim. He has worked in various startup areas, including healthcare technology, fintech, and location intelligence. Most notably, he was Chief Technology Officer at Epickk and was a day-one member at Arcesium. Prior to that he earned his Ph.D. at Columbia University; his academic papers in computer science and neuroscience have resulted in close to a thousand citations.
When: Wednesday, April 4 2018 @ 7:30pm
Where: Pupin 329
Do you know how to file your taxes? Have you started building your credit score? What about your retirement savings? Come learn how to adult your personal finances, backed by data and math. You won’t learn this in school!
When: Monday, April 2 2018 @ 7:30pm
Where: Fayerweather 313
Join CDSS, Suraj Keshri, and Min-hwan Oh, two PhD students in the Operations Research department, for an exciting talk on advanced analytical techniques in basketball. The talk will discuss ongoing work on exploiting optical tracking data to develop new metrics to better characterize player strengths, including understanding defensive assignment and automatic event detection, and combining trajectory modeling with shot efficiency. Methodologically, this work relies on hidden Markov models, logistic regression, deep neural nets, and unidirectional and bidirectional Long Short Term Memory (LSTM) networks.
Come join McKinsey’s New Ventures, Advanced Analytics, and QuantumBlack practices on March 29th, 2018 at 7:30pm ET for a conversation about our work in data analytics and machine learning. We will explain the work these groups do and then talk through a recent project that leveraged machine learning to predict and prevent injuries for a professional sports team. We will also discuss the data analytics and machine learning roles at McKinsey and QuantumBlack. Following the talk, there will be a panel discussion to answer any questions. We look forward to meeting you!
Please arrive at 7:30pm sharp, as we’ll be beginning the presentation then.
7:30 – 8:30pm: Introductions, Analytics Case Presentation, Recruiting Process Overview
8:30 – 9:00pm: Q&A
Muneeb Alam - Muneeb is an Analytics Fellow in McKinsey’s Public Sector Analytics group. He joined McKinsey right out of university, with a BA in astrophysics from Columbia and a Master’s in Analytics from Imperial College London. He’s served clients in corrections, tax, and education.
Daniel First – Daniel is a Data Scientist at QuantumBlack, a subsidiary of McKinsey that specializes in machine learning. After pursuing graduate studies at Columbia’s Data Science Institute as an NSF research fellow, he joined McKinsey initially as a management consultant, before moving over to his current role at QuantumBlack. His work has centered around collaborating with doctors and hospitals to design innovative, data-driven solutions to improve outcomes for patients, by forecasting and preventing medical risks. He has also published on the social and political implications of Artificial Intelligence. He holds a master’s degree in philosophy from the University of Cambridge and an undergraduate degree in neuroscience from Yale University.
Ishneet Kaur – Ishneet is a Risk Advanced Analytics Fellow with experience in risk identification and stress testing. She interned with Risk Analytics in the Summer of 2016 before joining McKinsey full time in July of 2017. Ishneet holds a Master’s in Applied Economics from Cornell University.
Join IBM Research on March 28th, 2018 at 7:30pm ET for a conversation about our work using Data Science for Social Good. Members of IBM Research will give an overview of the program and walk through recent projects that leveraged machine learning to produce social good. Projects include using natural language processing-based methodology to accelerate the workflow of policy experts at UNDP and to accelerate scientific discovery. Following the talk, there will be a panel discussion to answer any questions. We look forward to meeting you!
Kush R. Varshney
Kush R. Varshney was born in Syracuse, NY in 1982. He received the B.S. degree (magna cum laude) in electrical and computer engineering with honors from Cornell University, Ithaca, NY, in 2004. He received the S.M. degree in 2006 and the Ph.D. degree in 2010, both in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge. While at MIT, he was a National Science Foundation Graduate Research Fellow.
Dr. Varshney is a research staff member and manager with IBM Research AI at the Thomas J. Watson Research Center, Yorktown Heights, NY, where he leads the Learning and Decision Making group. He is the founding co-director of the IBM Science for Social Good initiative. He applies data science and predictive analytics to human capital management, healthcare, olfaction, computational creativity, public affairs, international development, and algorithmic fairness, which has led to recognitions such as the 2013 Gerstner Award for Client Excellence for contributions to the WellPoint team and the Extraordinary IBM Research Technical Accomplishment for contributions to workforce innovation and enterprise transformation. He conducts academic research on the theory and methods of statistical signal processing and machine learning. His work has been recognized through best paper awards at the Fusion 2009, SOLI 2013, KDD 2014, and SDM 2015 conferences.
Yaoli Mao is a Ben Wood Research Fellow affiliated with the Institute for Learning Technologies and a Ph.D. student in the Cognitive Science in Education program at Teachers College, Columbia University.
She conducts research using both quantitative and qualitative methods at the intersection of cognitive psychology, human-computer interaction and learning science. Yaoli is interested in social and cognitive-affective aspects of learning (engagement, boredom, gaming, etc.), learning strategies and behavior patterns. Her dissertation concerns collective intelligence, exploring knowledge sharing and learning among diverse expertise and how human crowds’ intelligence can be properly evaluated, supported and elevated by machine learning and system design.
Jonathan will graduate in May with an MS in Data Science from Columbia University. He graduated summa cum laude with a bachelor’s in Computer Science and Mathematics. Before coming to Columbia, he served as an adjunct lecturer as well as a developer for a health-tech company. Jonathan’s main areas of interest are machine learning and natural language processing, especially their utilization for social good. DSI students might recognize Jonathan from the 2017 Columbia Data Science Hackathon, in which his team came in first place. As a Science for Social Good Fellow at IBM, Jonathan’s work was focused on sentence/paragraph embedding and semantic searching techniques & applications.
We're excited to present the FinTech Panel 2018! This panel discussion focuses on the disruption of emerging technologies such as Artificial Intelligence and the Internet of Things in the finance industry. The panel will be an hour long, followed by a reception and networking session with the attendees. The panelists for the event are:
Brian Rogers (CEO & Founder, Modgital, Inc.)
-CEO and Founder, ShoShell LLC
-MS in Technology Management (Columbia University)
Michael Wang (CEO & Co-Founder, Sirl.io)
-Adjunct Assistant Professor of Electrical Engineering (Columbia University)
-Director of Internet of Things Practice at Thynk Different
-PhD Electrical Engineering, MS Electrical Engineering (Columbia University)
Steve McClain, a senior software developer at Bloomberg, will be the moderator of the panel.
Get started with your own IoT project and learn how to stream sensor data from a Raspberry Pi to a streaming API for real time visualizations. This first event of our new IoT workshop series will introduce you to the Raspberry Pi computer, programming for IoT projects using Python, and real time data streaming. No prerequisites other than a basic understanding of Python are required!
What is it like to be a female data scientist? These unique experiences, both rewarding and challenging, will be shared by a panel of accomplished female data scientists from various companies in different industry sectors, including Capital One, Audible, NYTimes, and Macys. This is an invaluable opportunity to get a glimpse at the experiences of female leaders in the industry.
Allison Fenichel - Capital One
Jiun Kim - Audible
Anne Bauer - NYTimes
Iva Horel - Macys
Applied Data Science and the Emerging Role of Quantum Computing in Machine Learning
Data science has been rapidly growing over the past decade, and its applications have become ubiquitous in our daily lives. As these applications consume more data and need faster response times, new technologies and algorithms are needed to meet the computational demands. Quantum computing is a highly promising emerging technology that could present significant opportunities to accelerate the training of machine learning algorithms and improve data science methods.
This presentation will provide an overview of data science, with a focus on practical applications in industry. The current state of quantum computing technologies will also be explored, including some of the ways that quantum computing can be harnessed to advance machine learning.
ABOUT THE SPEAKERS:
John Kelly, Ph.D.
John Kelly, Director of Analytics at QxBranch, is leading the company’s development of advanced data analytics technologies. Previously, he was the Technical Lead for Corporate Data Analytics at Lockheed Martin. John has experience applying machine learning to a diverse set of domains including healthcare, supply chain optimization, sustainment, and program management. He completed his BS and MS in Electrical Engineering at NC State and his Ph.D. in Electrical and Computer Engineering at Carnegie Mellon University, where his work focused on machine learning and signal processing algorithms for brain-computer interfaces.
QxBranch (www.qxbranch.com) develops and deploys advanced data analytics models for global companies in finance, insurance, technology, and sports. We have a diverse team of professionals in systems and software engineering, machine learning, quantum computing, and cyber security.
QxBranch is headquartered in Washington, D.C., with offices in London and Australia. For the most up to date career opportunities, please visit www.qxbranch.com
Arthena is a YC-backed art hedge fund based in NYC. Come hear how we gather millions of records on decades of art auctions and quantitatively analyze, price, and invest in art.
We’re looking for full-time summer interns! Data science interns will build experimental models, expand our data pipeline, build data visualization tools, and contribute to the production machine learning systems we use to invest tens of millions of dollars in art. All internships are paid and based out of our office in SoHo, NYC.
About the speakers:
PAUL WARREN manages the data science team, products, and technical roadmap at Arthena. He is currently co-authoring a data science textbook with a Professor of Data Science in South Korea and has several years of professional data science experience. He studied Computer Science at Stanford, where he grew and managed a 100-person space exploration project group.
BASIL VETAS is a data scientist at Arthena and a graduate student at Columbia's Data Science Institute. He previously worked on the data visualization team at JPMorgan and Qualtrics, with an impact investing fund, and with a number of startups. Basil is originally from Salt Lake City and received his bachelor's degrees from the University of Utah.
CHIKE UDENZE is a Software Engineering intern at Arthena. He studies software engineering at Rochester Institute of Technology and has previously worked at Releaf (YC Spring '17) and Datto.
*Bring in your résumés!
The Broadway Room is located on the 2nd floor of Lerner Hall (the big glass building just to the west of Butler Library). Detailed floor plan: http://lernerhall.columbia.edu/files/lerner/floorplans/2e.jpg?width=600&height=800
When: Monday January 29 2018 @ 7:15pm - 8:00pm
Where: Lerner 569
Kaggle is the biggest online Data Science platform which brings together some of the most skilled data scientists out there. It is home to the biggest Machine Learning competitions in the world and is also a treasure trove of resources for both aspiring and seasoned Data Scientists.
Join us for a workshop that will teach you how to utilize this platform to hone your Data Science skills, attack new data-driven problems, build a portfolio that stands out and get noticed by recruiters!
Not sure where to begin this semester? Here's a Lyft! Data Scientists and Data Analysts from Lyft will be giving a Tech Talk geared toward Master's and Ph.D. students, but anyone is welcome!
Guest Speakers and Presentation Topics:
-Data Analyst, *Simone Zhou will be speaking about increasing passenger activation by 7% through a card-add funnel analysis.
-Data Analyst, *Baptist Richard will present on developing density-based models to extend ride-sharing availability to sparse cities in the US.
-Data Scientists, *Davide Crapis and *Cameron Bruggeman will give an overview of all the places Data Science is used at Lyft, and then give a deep dive into how they make ETA predictions and allocate supply-side incentives.
-Data Analyst, *Shelley Chan and Nick Chamandy, Head of Data Science, will also be in attendance!
*Indicates Columbia Alum
There will be food and swag!
We know that finals are fast approaching, but just around the corner is an equally daunting task — choosing courses for next semester — especially when so many departments are offering up their takes on a data science curriculum. Let CDSS ease the stress! We'll run through classes for all levels of ability, ranging from the best first-exposure classes to which advanced machine learning class promises to be the most engaging. We'll also leave some time for open Q&A with our undergrad and grad members, who have all taken a different smattering of data science courses over the years.
RSVP on the Facebook event.
*** Please register for the workshop here: http://bit.ly/cdss_tableau ***
The demand for graduates with data analytics and visualization skills has never been higher. Data mining, statistical analysis, and data presentation are among LinkedIn’s list of top skills that can get you hired in 2017. And Forbes recently ranked Tableau as the technical skill with the third biggest rise in demand.
We’ve partnered with CDSS for a hands-on Tableau training for Columbia University students.
1:30-1:45 PM: Overview of Tableau and how to get started as a student
1:45-3:15 PM: Hands-on Tableau Desktop training*
3:15-3:30 PM: Q&A
This is a free event. Spaces are limited - register today to reserve your seat.
*Don’t forget to bring your personal computer, along with a copy of Tableau Desktop installed (request your free student license: https://www.tableau.com/academic/students#form), so that you can follow along with the hands-on training.
So, you want to be a data scientist, but how fast can you answer this question?
If I have two decks of cards each with half blue cards and half green cards, should I draw from the deck of 10 or 100 to maximize the probability of drawing two green cards in a row in the first two draws?
Did it take you longer than you would like? No worries, CDSS has you covered! Technical interviews for data science positions are notorious for throwing hardball statistics questions at you to test not only your math skills, but also your ability to think under pressure. At Probability and Statistics for Interviews, CDSS will walk you through a repeatable process of figuring out the right answer for these kinds of questions. Whether you’re a math wizard or are taking your first probability class, come to our event for an hour of fun and collaborative problem solving!
When : Thursday, November 9 at 8:00pm to 9:00pm
Where : Pupin 214
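For the curious, the card question above can be checked with a few lines of Python. This is a sketch under one stated assumption: the two cards are drawn without replacement. With replacement, both decks would give exactly 1/4.

```python
from fractions import Fraction


def p_two_green(deck_size):
    """Probability that the first two draws are both green.

    Assumes half the deck is green and cards are drawn without replacement.
    """
    green = deck_size // 2
    first = Fraction(green, deck_size)           # first card is green
    second = Fraction(green - 1, deck_size - 1)  # second is green, given the first was
    return first * second


small = p_two_green(10)    # (5/10) * (4/9)
large = p_two_green(100)   # (50/100) * (49/99)
print(small, float(small))
print(large, float(large))
```

Under that assumption, the deck of 100 gives 49/198 (about 0.247) against 2/9 (about 0.222) for the deck of 10, so the larger deck is the better choice. Removing one green card matters less when the deck is big.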
Are you looking to learn the basics of the pandas library for Python for your next data science project or an upcoming interview? Do you want to find out why Pandas is useful for data science and how it can be used most efficiently? Join us for a short session on the basics of dataframes, file I/O, cleaning and viewing data and preparing dataframes for scikit-learn. We will guide you through the basics of pandas using the Kaggle Titanic dataset.
When : Wednesday, November 8 at 9:00pm to 10:00pm
Where : Hamilton 517
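As a small preview of the pandas topics listed above (dataframes, file I/O, cleaning and viewing data, and preparing dataframes for scikit-learn), here is a minimal sketch. The four rows are made up and only reuse a few column names from the Kaggle Titanic dataset. The choice to fill missing ages with the median is our own assumption, not necessarily the approach taken in the session.

```python
import io

import pandas as pd

# Four made-up rows that reuse a few Titanic column names; not real records.
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.28
3,1,3,female,,7.92
4,0,3,male,35,8.05
"""

# File I/O: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Viewing: peek at the rows and count missing values per column.
print(df.head())
print(df.isna().sum())

# Cleaning: fill the missing age with the median and encode Sex as 0/1.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Preparing for scikit-learn: a numeric feature matrix X and a label vector y.
X = df[["Pclass", "Sex", "Age", "Fare"]]
y = df["Survived"]
print(X.shape, y.shape)
```

With the real dataset, you would pass the path of the downloaded CSV file to read_csv instead of the in-memory string.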
Getting some experience working on real problems and real datasets is a crucial step towards becoming a data scientist, and the best way to do that while you're in school is a summer internship. But finding the perfect internship is tricky, so let us help you! During this panel, you'll hear from some current CDSS board members about their internship experiences at Facebook, Digital Reasoning, and more, from applying and interviewing to preparing for your first day and landing that return offer. Then we'll open up the floor to you, so come with questions!
When : Wednesday, October 1 at 9:00pm to 10:00pm
Where : Hamilton 517
What even is a data science interview? Do they want me to be a developer, or an analyst, or a unicorn? At this workshop we’ll go over what to expect in data science interviews, focusing especially on in-person tech interviews. From the basic layout to the right answer to (almost) every “what data structure can you use to make this faster” question, you’ll be ready to land the internship or job of your data-science-filled dreams.
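As one illustration of the "what data structure can you use to make this faster" style of question mentioned above, here is a short Python sketch. The problem itself (does any pair of numbers sum to a target?) is our own example, not taken from the workshop.

```python
def has_pair_with_sum(numbers, target):
    """Return True if two entries at different positions sum to target.

    Checking every pair takes O(n^2) time. Remembering the values seen so far
    in a set gives O(1) average-time lookups, so the whole scan is O(n).
    """
    seen = set()
    for n in numbers:
        if target - n in seen:
            return True
        seen.add(n)
    return False


print(has_pair_with_sum([2, 7, 11, 15], 9))
print(has_pair_with_sum([1, 2, 3], 7))
```

The usual answer to such questions is a hash-based structure (a set or a dict), which trades a little extra memory for much faster lookups.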
Chris Mulligan is currently a Quantitative Researcher at Two Sigma Investments, LP. He builds models to make predictions of financial markets using untraditional data, which is a sentence he never imagined he’d say about himself. Prior to Two Sigma, Chris interned at Kickstarter, Facebook, and The New York Times, and before that, he spent 7 years in political data analysis and modeling, most recently as Director of Analytics at YouGov. Chris received BA and MA degrees in computer science and statistics from Columbia in 2015, where he was a TA for COMS3157 AP and STAT4400 StatML and cofounded CDSS.