Probabilistic Programming and Bayesian Methods for Hackers is an open source online book. The book is developed with IPython, so it can be read in a variety of formats: on the web, as a PDF, or locally with IPython installed.
Also, contributions are welcome via the GitHub repository for the book (or you can email the authors).
This is the first IPython project I have really looked at, and IPython looks very promising.
Win-Vector Blog » Data Science, Machine Learning, and Statistics: what is in a name?
This is an excellent write-up of the differences between:
- Machine Learning
- Data Mining
- Big Data
- Predictive Analytics
- Data Science
I recently saw the article, The Best Data Mining Tools You Can Use for Free in Your Company. It contains a very brief description of each of the following tools.
- Apache Mahout
See The Best Data Mining Tools You Can Use for Free in Your Company for more details, links, and pictures.
Yhat, a new predictive modeling startup, wrote up a nice blog post about 10 R Packages I wish I knew about earlier. It is worth reading through the list.
Special thanks to Mark Nickel for pointing out this link.
Hans Rosling does an excellent job of showing how “not boring” statistics can be. This is a great, informative statistics video. It was originally posted at The Joy of Stats.
Yes, 2013 is the International Year of Statistics. Thus a video was made.
Although Tobias Mayer may be known as the first data scientist, he did not coin the term data science. According to Wikipedia, the first use of the term data science was in 2001.
Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics was published in the April 2001 issue of the International Statistical Review. The author was William S. Cleveland, currently a Professor of Statistics at Purdue University.
The paper proposes a new field of study named data science. It then goes on to list and explain six technical focus areas for a university data science department.
- Multidisciplinary Investigations
- Models and Methods for Data
- Computing with Data
- Tool Evaluation
For the most part, the paper is still relevant. I did find a couple of good quotes from the paper that deserve comment.
The primary agents for change should be university departments themselves.
That did not happen. The driving agents for change in the data science field have been some of the newer technology/web companies such as LinkedIn, Twitter, and Facebook (none of which even existed in 2001).
…knowledge among computer scientists about how to think of and approach the analysis of data is limited, just as the knowledge of computing environments by statisticians is limited. A merger of the knowledge bases would produce a powerful force for innovation.
I think this statement still applies today. The world is just starting to realize the benefits of merging knowledge from computer science and statistics. There is much more work to do. Fortunately, businesses and universities are working to address the merger.
Have you seen the paper before? What are your thoughts on it?
The Elements of Statistical Learning textbook is available for free. It is a classic, widely used textbook for statistics and machine learning. Here is a far-from-complete list of some of the topics:
- Supervised Learning
- Linear/Logistic Regression
- Model Selection
- Neural Networks
- Support Vector Machines
- Random Forests
- Unsupervised Learning
As you can see, the book is quite extensive.
Note: This book has been available for quite a while, but I realized I had not added a link to it on my blog.