Launching in the autumn of 2013, the Open Data Festival will be a global data festival. The details are still vague, but the organizers are looking for volunteers, cities, and speakers. Feel free to sign up.
The festival is being organized by the same team that organizes Big Data Week.
This post contains notes from the Coursera Data Analysis course.
Here are some R commands that may be helpful for cleaning data.
- sub() replaces the first occurrence of a pattern in each string
- gsub() replaces all occurrences of a pattern in each string
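Here is a minimal sketch of the difference, using a made-up character vector (the values are illustrative only). Both functions treat the pattern as a regular expression by default. An underscore has no special meaning in a regular expression, so here it matches literally.

```r
# Made-up messy values, for illustration only
x <- c("New_York_City", "Los_Angeles")

# sub() replaces only the first match in each string
first_only <- sub("_", " ", x)
print(first_only)   # "New York_City" "Los Angeles"

# gsub() replaces every match in each string
all_matches <- gsub("_", " ", x)
print(all_matches)  # "New York City" "Los Angeles"
```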
Quantitative Variables in Ranges
- cut(data$col, seq(0, 100, by=10)) breaks the data into ranges and returns a factor recording the range each value falls into. In this example, that is whether the observation is in (0,10], (10,20], (20,30], and so on. By default, each interval includes its upper bound.
- cut2(data$col, g=6) from the Hmisc package returns a factor variable with 6 quantile groups
- cut2(data$col, m=25) returns a factor variable that aims for at least 25 observations in each group
- merge() combines two data frames by matching rows on common columns
- sort() sorts a vector
- order(data$col, na.last=TRUE) returns the row indices that would put the column in ascending order, with NA values last
- data[order(data$col, na.last=TRUE),] reorders the entire data frame by that column
- melt() from the reshape2 package reshapes data from wide to long format
- rbind() adds rows to a data frame
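The following is a small base R sketch of cut(), sort(), and order() on a made-up data frame named data, which mirrors data$col above. The id and score columns are hypothetical. cut2() and melt() are left out here because they need the Hmisc and reshape2 packages.

```r
# Made-up data frame standing in for data$col; one score is missing
data <- data.frame(id = 1:6, score = c(5, 42, NA, 17, 99, 30))

# cut(): label each score with its range; intervals are (0,10], (10,20], ...
# (a value of exactly 0 would become NA unless include.lowest = TRUE)
data$score_range <- cut(data$score, seq(0, 100, by = 10))
print(table(data$score_range, useNA = "ifany"))

# sort(): sorted copy of the vector; NAs are dropped by default
sorted_scores <- sort(data$score)
print(sorted_scores)  # 5 17 30 42 99

# order(): row indices that put score in ascending order, NA last
idx <- order(data$score, na.last = TRUE)
sorted_rows <- data[idx, ]
print(sorted_rows)
```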
Obviously, these functions accept other parameters that let them do a lot more. There are also many other helpful R functions, but these are enough to get you started. Check the R help (?functionname) for more details.
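As one example of those extra parameters, here is a sketch of merge() with its by and all arguments, followed by rbind(), on two made-up data frames. The names scores and people are hypothetical.

```r
# Two made-up data frames that share an "id" column
scores <- data.frame(id = c(1, 2, 3), score = c(88, 92, 75))
people <- data.frame(id = c(2, 3, 4), name = c("Bea", "Cal", "Dee"))

# by = "id" matches rows on that column; by default only ids in both are kept
inner <- merge(scores, people, by = "id")
print(inner)   # ids 2 and 3

# all = TRUE keeps every id from both sides, filling gaps with NA
outer <- merge(scores, people, by = "id", all = TRUE)
print(outer)   # ids 1 to 4

# rbind(): append rows; the column names must match
more_scores <- rbind(scores, data.frame(id = 4, score = 60))
print(more_scores)
```

Setting all.x = TRUE or all.y = TRUE instead keeps the unmatched rows from only one side.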
This is a video infographic about pizza delivery in Manhattan. It is another good way to make data tell a story.
Yesterday, I made some predictions about the startups I thought would win at the Strata Startup Showcase. Here are the winners.
So how did I do? Well, I correctly picked one of the winners: Placed. Hopefully videos of the demos will be made available. If I find them, I will post some of them to the blog.
Blake Shaw, a data scientist at Foursquare, gave a great talk at DataGotham. The visualization of New York City check-ins at the beginning of the video is simply amazing, and it is worth watching the video just for that. After seeing it, though, you will probably want to watch the rest of the talk. It is an excellent example of what good data science and visualization can do.
DataGotham, the New York City data science conference, is live streaming its presentations today. Things are already underway, so you can go to the site and start watching.
A while back, Strata hosted a web conference titled Data in Motion, and the slides and audio are now available online. The conference focused on applications of data to things that move, such as trains, aerospace, and even car racing. The first talk, on Formula One racing, was fascinating. I had never thought about how much data analysis goes into racing.
A few professors from Stanford University have released version 1.1 of their textbook, Mining of Massive Datasets. The book grew out of material used in several Stanford computer science courses, including large-scale data mining and web mining. It looks excellent and focuses on analyzing data at a large scale, which some people would call big data. Below are some of the topics covered in the textbook.
- data mining
- recommender systems
- and more
The book is free to download, or you can get a copy from Cambridge University Press.