7:00-7:10 getting started: organise yourselves around the table, from beginner upwards
7:10-7:30 talk: a brief introduction to data science
7:30-9:00 coding: work at your own level with those around you
What we’ll do
We expect most people to fit into one of three categories (beginner, intermediate or advanced), and we have tasks for all levels.
If you’re completely new to coding, the first thing to do is learn a bit about programming. The basic concepts of coding are the same across lots of languages, but we’re using Python, which was designed with an emphasis on readability. A great place to get started is Codecademy, which has tutorials and exercises you can run in your browser without having to install anything.
If you already have some experience writing code and want either more practice or to get familiar with Python, we recommend Google’s Python Class. This covers some of the basics of Python, with a data science bias to the exercises.
There are instructions for installing Python, or you could use an online service like PythonAnywhere to write and run code without having to install anything on your own machine.
If you know how to write code and want to work on real problems, then we’ll get started on some data science challenges.
The first step is to install Python and some of the most useful libraries: numpy, scipy, pandas and scikit-learn. A great place to get the whole set is Anaconda, a free Python distribution for scientific computing.
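Once everything is installed, a quick check like the one below can confirm that Python can find each library. This is only a sketch: the helper name missing_libraries is ours, and the one thing worth noticing is that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names for the libraries listed above. Note that scikit-learn
# is imported as "sklearn", not "scikit-learn".
REQUIRED = ["numpy", "scipy", "pandas", "sklearn"]


def missing_libraries(names):
    """Return the names that this Python installation cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All libraries found.")
```

If anything is reported as not installed, installing Anaconda (or adding the library to your existing setup) should fix it.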
Kaggle have some ‘getting started’ tasks, including one from Data Science London. This is a binary supervised classification task where you have to identify whether each example in the dataset belongs to class 0 or to class 1. You can use scikit-learn to get started without knowing too much about what’s going on under the hood; the most important things to get to grips with are the use of training, development and test datasets, cross-validation and generalisation. These are things we can discuss on the night. There’s some starter code on GitHub which will read in the data from the Data Science London task and train a basic classifier.
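To give a flavour of how training data, held-out test data and cross-validation fit together, here is a minimal sketch using scikit-learn. It is not the starter code from GitHub: the data is synthetic (generated with make_classification as a stand-in for the competition files), and the choice of logistic regression, the dataset size and the 80/20 split are all arbitrary assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary classification data (labels are 0 or 1).
# This is a stand-in for the real task data, not the Kaggle files.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out 20% as a test set that is not touched while choosing a model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the training portion: each fold takes a
# turn as the development set while the other four are used to train.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
print("Mean cross-validation accuracy: %.3f" % cv_scores.mean())

# Fit on all the training data, then measure generalisation on the
# held-out test set.
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print("Test accuracy: %.3f" % test_accuracy)
```

If the cross-validation accuracy and the test accuracy are close, that is a good sign the classifier generalises; a large gap between them usually means something has been overfitted.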
Finally, if you want to get a really good understanding, then Coursera’s Machine Learning course started last week and covers a lot of the theory of machine learning. To do the course, you’ll need to know linear algebra (matrices and vectors) and a little calculus, and be able to program in Matlab/Octave.