Here is a data visualization of the paths of tornadoes in the US over the past 56 years. The brighter the blue, the more intense the tornado. This is also an excellent example of using open data. The raw data is available at data.gov.
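To make the "brighter blue means more intense" idea concrete, here is a minimal sketch of how an intensity value could be mapped to a shade of blue. This is not the method used by the visualization's author. The function name `intensity_to_blue`, the 0 to 5 intensity scale, and the minimum brightness of 60 are all assumptions made for illustration.

```python
def intensity_to_blue(intensity, max_intensity=5, min_level=60):
    """Map an intensity rating to an (R, G, B) tuple with brighter blue for stronger storms.

    Hypothetical helper: the 0..max_intensity scale and min_level are
    illustrative assumptions, not values taken from the original visualization.
    """
    if not 0 <= intensity <= max_intensity:
        raise ValueError("intensity must be between 0 and max_intensity")
    # Scale linearly from min_level (weakest) up to 255 (strongest),
    # so even the weakest tornado is still visible on a dark background.
    level = min_level + round((255 - min_level) * intensity / max_intensity)
    return (0, 0, level)


# Print the colour assigned to each rating on the assumed 0..5 scale.
for rating in range(6):
    print(rating, intensity_to_blue(rating))
```

A linear scale is the simplest choice. A real map might use a perceptual colour scale or vary opacity instead.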
Probabilistic Programming and Bayesian Methods for Hackers is an open source online book. The book is developed with IPython, so it can be read in a variety of formats: web, PDF, or locally with IPython installed.
Also, contributions are welcome via the GitHub repository for the book (or you can email the authors).
This is the first IPython project I have really looked at, and IPython looks very promising.
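For readers new to Bayesian methods, here is a tiny sketch of the kind of reasoning involved: updating a belief about a coin's bias after seeing some flips. It is not taken from the book. It uses the standard Beta-Binomial conjugate update, and the function names and the 7 heads / 3 tails data are made up for illustration.

```python
def beta_binomial_update(alpha, beta, heads, tails):
    """Return the Beta posterior parameters after observing coin flips.

    With a Beta(alpha, beta) prior on the probability of heads, the
    posterior is Beta(alpha + heads, beta + tails).
    """
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Start from a uniform prior, Beta(1, 1), then observe hypothetical data.
prior = (1.0, 1.0)
posterior = beta_binomial_update(*prior, heads=7, tails=3)

print("posterior parameters:", posterior)
print("posterior mean of P(heads):", beta_mean(*posterior))
```

The posterior mean lands between the prior mean (0.5) and the observed frequency (0.7), which is the usual Bayesian compromise between prior belief and data.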
Are you confused about what Hadoop is? What about HBase, Pig, or Hive? Well, this link will help you out.
It provides a nice, short explanation of the following terms:
Recently, both NYU and Columbia launched academic programs in data science. Well, another school in New York City is entering the mix. The City University of New York (CUNY) is now offering an online master's degree in data analytics. If you would like more information, there will be an online information session on May 22.
This looks to be a great webinar! It is today.
This spring, Harvard University ran a data science course. Technically, the name of the course was Stat 221 Statistical Computing and Visualization. The course recently finished, and all the course lecture slides are available.
The slides contain a bunch of useful information, plus they show one possible layout for a data science course.
If you are looking for public data, Enigma.io is a new startup just for you. Enigma searches, finds, and connects public data in a variety of formats. The data is then linked and made accessible. Watch the video below for more details.
Plot.ly is a new site that allows for web-based plotting of graphs. The site allows a user to upload data, create a number of plots, and even write Python code to generate custom graphs. The site also has numerous export options for the graphs, as well as options for sharing a graph via social networks.
Below is an example graph via a sharable image link.
I have not had a lot of time to play around with the site, but it looks very impressive. I think there are a lot of possibilities for Plot.ly. First, I could see it used for data analysis in the cloud. Also, I could see it used for sharing plots between researchers or for publishing extra graphs to go along with publications.
Can you think of some other uses for Plot.ly?
The Institute for Data Science and Engineering at Columbia University has released its first academic offering. It is a certificate program titled Certification of Professional Achievement in Data Sciences. The certificate program consists of four courses:
- Algorithms for Data Science
- Probability & Statistics
- Machine Learning for Data Science
- Exploratory Data Analysis and Visualization
Columbia is currently accepting applications for the Fall of 2013. Unfortunately, the program will not initially be offered online.
Also, Columbia is planning to start a new master’s degree in data science sometime in 2014. A PhD program is supposed to come sometime after that. Some of the future programs will also be available online. Combined with the data science program at NYU, New York City is becoming a premier academic location for learning data science.