Since recently announcing $16M in funding, Coursera has been making quite a bit of noise. Last fall, Stanford University decided to freely offer a couple of computer science classes online. The response was huge, and that led to the creation of Coursera.
The courses are no longer limited to computer science, and Stanford is no longer the only school involved. Here is a list of academic areas being offered and another list with the schools involved.
Although not all of the courses will be directly related to data science, many of them are very close. Naturally, the Math, Statistics, and Computer Science areas relate directly to data science. Other areas, such as Networks, Biology, and Economics, are among the most popular application areas for data science. This is very exciting. My only concern is that the courses are a bit too much like traditional university courses, with specific start/end dates and homework due dates. It will be interesting to see if the course structures change over time.
Anyhow, the following courses are starting today. Sign up and start learning.
- Machine Learning – A major focus area of data science
- Computer Science 101 – probably a good starting point if you don’t know how to program
- Compilers – good for understanding how programming languages work
- Automata – hard to explain in 1 line, but it contains some fundamental principles in computer science
- Intro to Logic – learn to reason systematically
- Computer Vision – not sure of the relation to data science, but I am sure there is one. If you know, please leave a comment.
Are you going to enroll in any of these courses?
Colleges and universities are slowly starting to notice the demand for employees with data science skills. Most of the programs are not named data science, but they all focus on producing data people. Below are a couple of the programs I have noticed so far.
Do you know of any other programs?
Here is another, similar list of colleges with big data/data science programs.
Updated: New schools added and a link to another list of graduate programs. Last update May 2013
The Coursera Probabilistic Graphical Models course officially starts today. Sign up and start learning.
The Coursera Natural Language Processing course officially starts today. Sign up and start learning.
A few days ago, I mentioned that the Stanford Machine Learning class will be starting soon. I thought I should quickly mention some of the topics covered. The list also serves as a great outline for machine learning.
In supervised learning, one has a set of data with features and labels.
- Linear Regression – one/multiple variables
- Gradient Descent – a general algorithm for minimizing a function
- Logistic Regression – This is useful for predicting classification-type results, for example when you are looking for a yes or no answer. Does the patient have cancer? Will the customer buy my new product? It can also be helpful when there are more than 2 possible results. What color will a person choose (red, blue, green, silver)?
- Neural Networks – A learning algorithm that is modeled after the brain. Think of neurons.
- Support Vector Machines
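To make gradient descent a little more concrete, here is a minimal sketch in Python that fits a one-variable linear regression. It is not taken from the course materials. The function name `fit_line`, the learning rate, the step count, and the toy data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; the defaults are arbitrary choices.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1 / 2n) * sum(error^2) with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step downhill, against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this toy data the fitted values should land close to w = 2 and b = 1. Logistic regression can be trained with the same kind of loop, with the prediction passed through a sigmoid function first.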
In unsupervised learning, one has a set of data with features but no labels. Can some structure be found in the data?
- Clustering – The most popular technique is K-means.
- PCA (Principal Components Analysis) – reduces the number of features, which can speed up a learning algorithm
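As a rough illustration of clustering, here is a small K-means sketch in plain Python. It is my own simplified version, not the course's code. The function name `kmeans`, the fixed iteration count, the toy points, and the choice to start from the first `k` points (real implementations usually pick starting centroids at random) are all assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Very small K-means sketch (hypothetical helper, illustration only)."""
    # Simplistic initialization: use the first k points as centroids.
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for i, cluster in enumerate(clusters):
            if cluster:  # leave an empty cluster's centroid where it is
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters


# Two made-up groups of points, one near (0, 0) and one near (10, 10).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
print(centroids)
```

Even though both starting centroids come from the same group here, the two centroids should end up at the centers of the two groups.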
The anomaly detection section covers methods to determine if data is bad, meaning it looks unusual compared to the rest of the data. Bad data is considered an anomaly.
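One simple way to flag bad data, sketched below, is to estimate the mean and standard deviation of known-good values and mark anything too far from the mean as an anomaly. This is only an assumed, one-feature illustration. The helper names, the threshold of 3 standard deviations, and the sensor readings are made up, and real methods usually handle many features at once.

```python
import statistics


def fit_gaussian(values):
    """Estimate mean and (population) standard deviation from normal data."""
    return statistics.mean(values), statistics.pstdev(values)


def is_anomaly(x, mean, std, z_threshold=3.0):
    """Flag x when it lies more than z_threshold standard deviations away.

    The threshold of 3.0 is an arbitrary choice for this sketch.
    """
    return abs(x - mean) > z_threshold * std


# Made-up readings that are assumed to be normal.
normal_readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
mean, std = fit_gaussian(normal_readings)

print(is_anomaly(10.1, mean, std))  # a typical reading
print(is_anomaly(15.0, mean, std))  # far outside the usual range
```

With these numbers, 10.1 is treated as normal and 15.0 is flagged.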
Like the name says, recommender systems are used to make recommendations. Companies like Netflix use recommender systems to recommend new movies to customers. LinkedIn also recommends people to connect with. This is a fairly hot topic in the tech world right now.
- Content Based (Features)
  - Modified Linear Regression
- Non-content Based (No Features)
  - Collaborative Filtering
  - Matrix Factorization
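For a feel of how matrix factorization can drive recommendations, here is a small sketch that learns a short vector of hidden factors for each user and each item, so that their dot product approximates the known ratings. It is one common formulation trained with stochastic gradient descent, not necessarily the exact method taught in the class. The function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the ratings are all assumptions.

```python
import random


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def rmse(ratings, P, Q):
    """Root mean squared error over the known ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=2000, seed=0):
    """Learn user factors P and item factors Q (hypothetical helper)."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Made-up (user, item, rating) triples; each user has one unrated item.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 1), (3, 2, 4), (3, 3, 5),
]

P0, Q0 = factorize(ratings, 4, 4, epochs=0)  # untrained, for comparison
P, Q = factorize(ratings, 4, 4)

print("error before training:", rmse(ratings, P0, Q0))
print("error after training:", rmse(ratings, P, Q))
print("predicted rating for user 0, item 3:", predict(P, Q, 0, 3))
```

The last line predicts a rating that user 0 never gave, which is the value a recommender would use to decide whether to suggest item 3.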
If any of these topics sound interesting to you, sign up for the Stanford Machine Learning class. Professor Andrew Ng will do an excellent job explaining the details.
In a matter of days, Stanford will begin the second round of the free online machine learning course. I enrolled in the course last fall, and it exceeded all expectations. Professor Andrew Ng is great. The prerequisites are minimal, so don’t worry if your math is a little rusty. Also, the videos are short (around 8 – 12 minutes), so you don’t need large blocks of time set aside. Just watch a video or two during your lunch and you should be able to keep up. There are programming assignments (optional) and review questions to go along with the videos.
Don’t worry if you fall behind. The videos will still be there. The material you learn is more important than the pace. If you don’t know machine learning, the Stanford class is a great opportunity to get started.
Here is Professor Ng’s introduction to the class.