Top ten algorithms in data mining (2007) [pdf] | Hacker News

The discussion below the link is also very good.

If you are curious, here are the 10 algorithms, and the paper is displayed below.

  1. C4.5
  2. k-Means
  3. SVM
  4. Apriori
  5. EM
  6. PageRank
  7. AdaBoost
  8. kNN
  9. Naive Bayes
  10. CART

More Free Courses from Stanford

Also this spring, Stanford will be offering two more courses that might benefit a person learning data science.

If these two classes feel a bit too advanced at this point, here are a couple of more fundamental computer science classes. If you are new to computer science and programming, CS 101 would be a good choice. If you are not as new to computer science, or are a bit rusty on your core algorithms knowledge, then Design and Analysis of Algorithms 1 might be appropriate.

Actually, the courses are no longer being offered by Stanford alone. A few other schools have been added, and the courses are now offered through Coursera. Plus, all the courses are free.

Did You Miss Strata 2012?

The Strata Conference: Making Data Work 2012 just finished up. If you (like me) were unable to attend, you may have missed out on some of the networking and excitement of actually being there, but you can still glean some knowledge from the videos.

Steve Schoettler “Learning Analytics”

This is a good video about how data can be used to help people learn.
There are many other Strata 2012 videos available as well. See below for links to them.

Other Strata 2012 Videos

See the O’Reilly Strata CA 2012 Playlist on YouTube for more videos. The videos include numerous interviews with the speakers and even a few of the talks. Also, many of the slide decks can be found on the Strata Conference website.

Have fun catching up on everything that happened at Strata Conference 2012.

What is a data scientist?

If I am going to create a blog about becoming a data scientist, I must at least provide some type of definition. One of the best definitions I have read is by Hilary Mason, Chief Scientist at Bit.ly:

A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics, and machine learning.

This definition is short and simple, but there are many more definitions out there. In fact, CITO Research, a site for CIOs and CTOs, set out to define what a data scientist is. They interviewed six leaders in the data science community and posted all of the interviews online. The interviews produced varied results, but they focused on a few main themes of what a data scientist should know.

After reading Hilary’s definition, the CITO Research interviews, a great post at Quora, and numerous other articles, I created a list of data science skills:

  • Machine Learning
  • Statistics
  • Storytelling (Communication)
  • Big Data
  • Algorithms
  • Curiosity

I am sure this list will change and evolve over time, but that is where I am going to focus for now.  If you have anything to add to the list, please leave a comment.  If you are interested in gaining some data science skills, please follow along and let’s learn together.