10 Big Data Best Practices

10 Big Data Implementation Best Practices

This is a great article and list of topics to remember when working on big data projects. Here is the list.

  1. Gather business requirements before gathering data
  2. Implementing big data is a business decision not IT
  3. Use Agile and Iterative Approach to Implementation
  4. Evaluate data requirements
  5. Ease skills shortage with standards and governance
  6. Optimize knowledge transfer with a center of excellence
  7. Embrace and plan your sandbox for prototype and performance
  8. Align with the cloud operating model
  9. Associate big data with enterprise data
  10. Embed analytics and decision-making using intelligence into operational workflow/routine

See the original article, 10 Big Data Implementation Best Practices, for details.

Big Data Journal: 5 articles to highlight

The inaugural issue of Big Data was published a few weeks ago. The journal is excellent. The articles are relevant, readable, and free. In the first issue, most of the articles were not super technical (meaning there were not a lot of equations or algorithms). I would like to highlight just 5 of the articles (feel free to read the others as well).

  1. Making Sense of Big Data – A nice brief discussion of the term big data and some goals for the journal.
  2. Big Data For Development - This is an introduction to United Nations Global Pulse, an initiative to use data to better understand human well-being.
  3. Broad Data: Exploring the Emerging Web of Data – This article is all about dealing with the explosion of open data becoming available.
  4. Data Science and Its Relationship to Big Data and Data-Driven Decision Making – The title is pretty self-explanatory. The article points out 7 fundamental concepts of data science.
  5. Educating the Next Generation of Data Scientists – This is a roundtable discussion all about data science and data science education.

Top 5 Data Science Blogs

  1. p-value.info - This blog is only about 1 month old, but it is filled with great stuff.  I just hope Carl, a data scientist at One Kings Lane, can keep up the good posts.
  2. Metamarkets Blog - Metamarkets is a startup focusing on data analytics for business users.  The blog contains lots of data science information.  During the summer, the blog ran an excellent series with data scientist interviews.
  3. Kaggle – A great startup with a great blog.  The blog has tips about data science competitions, explanations from winners, and various other data science related posts.
  4. iCrunchData – This is a job site for data-related positions.  That said, the blog is relevant and informative.  They even do data science on job postings for data science.
  5. What’s the Big Data – A frequently updated blog with great links to big data and data science resources. I especially like the “Big Data Quotes of the Week” posts.

Bonus Blogs

  1. Flowing Data – Nathan Yau, the blog’s author, is a PhD student at UCLA.  The blog focuses on visualizations.
  2. Columbia Data Science Course Blog – This was a blog to go along with the Data Science course at Columbia University.  Unfortunately, the blog will no longer be updated since the course is over.  However, it is still worth browsing, since it covers many of the topics in data science.  It also has some great visualizations.

Data Science Links from Recent Days

Big Data Education

I recently read Big Data Education: 3 Steps Universities must take.

Here are the 3 steps listed:

  1. Data Science cannot be an undergraduate degree
  2. A graduate degree should contain math, stats and computer science
  3. Research

Step 2 seems obvious. Math, stats, and computer science are some of the key areas for data science. I would add communication and presentation skills to the list because people with just math, stats, and CS skills are not known to be naturally good communicators. I agree with step 3. More research needs to be done, but most of the research will need to be interdisciplinary. Universities need to put more effort into interdisciplinary research.

Step 1 confused me a bit. The argument was that data science requires too many skills and has an applied focus area. Of course a person cannot learn everything about data science in an undergraduate degree. Earning a computer science degree does not mean you will know everything about computer science. It just means you know the fundamentals of algorithms, architecture, and operating systems. You know enough about computer science to understand the field and learn more as you go. I think 4 years should be enough time to do the same for data science.

What are your thoughts?

About the Human Face of Big Data

To start, here is a nice quote from the video. The quote is from Eric Schmidt of Google.

From the dawn of civilization until 2003, humankind generated 5 exabytes of data.
Now we produce 5 exabytes of data every two days.
…and the pace is accelerating

Rick Smolan provides a good talk. He is behind The Human Face of Big Data project. I don’t have a copy of the book, but it looks really intriguing. The talk briefly explains what the book/project is all about.