When completed, the book will be completely free and open-source. You are welcome to contribute to the fundraising efforts for the book.
Here is a great infographic from Data Science @ Berkeley: Just how big is a Gigabyte (GB)? Be sure to scroll all the way to the bottom, where it explains a few of the latest innovations in hard drives, such as helium, SMR, and HAMR, including what those acronyms stand for.
Brought to you by datascience@berkeley: Master of Information and Data Science
Internet security has been a topic for many years, but recently data science and security have joined forces. Many security applications collect vast amounts of data, and many of them operate based on activity. Data science can help collect and organize all of that past activity, and machine learning can then help predict whether new activity is malicious. Anyhow, here are two recent articles on the combination of security and data science.
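As a rough illustration of the idea above (learn from past activity, then label new activity as malicious or not), here is a toy nearest-centroid classifier written with only the Python standard library. It is a minimal sketch, not how any particular security product works: the two features (failed logins per hour and distinct ports contacted), the sample numbers, and the function names `train_centroids` and `classify` are all hypothetical.

```python
import math

# Hypothetical labeled past activity: (failed_logins_per_hour, distinct_ports_contacted).
# The numbers are made up purely for illustration.
PAST_ACTIVITY = [
    ((0, 2), "benign"),
    ((1, 3), "benign"),
    ((2, 1), "benign"),
    ((25, 40), "malicious"),
    ((30, 55), "malicious"),
    ((18, 35), "malicious"),
]


def train_centroids(examples):
    """Average the feature vectors for each label (a nearest-centroid model)."""
    sums = {}
    counts = {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = train_centroids(PAST_ACTIVITY)
print(classify(centroids, (1, 2)))    # new event that looks like past benign activity
print(classify(centroids, (22, 48)))  # new event that looks like past malicious activity
```

Real systems would scale the features, use far more data, and likely use a stronger model, but the shape is the same: summarize what past activity looked like, then compare new activity against it.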
This information goes along with last week's post, Open Data Could Be Worth $5.4 Trillion Annually. Also last week, France released an action plan for open data. Honestly, I have not read the full report, but it is great to see a government create such a plan. See the full report below.
Michael Chui of the McKinsey Global Institute provided some clear insights about the benefits of open data. Here are the four characteristics of open data he describes:
- Accessible to Everyone
- Formatted for Easy Reading by a Computer
- Free (No Cost)
- Unlimited Rights to Redistribute and Reuse
Chui also describes how an organization can get the most from its open data. It is not enough to just make the data available; the organization must build an ecosystem around the open data. Here are some of the strategies he discussed:
- Identify and Prioritize the Right Data to Open
- Get Developers and Data Scientists (Internal and External) Working with the Data
- Address Privacy and Policy Issues
- Provide Platforms and Standards, Along with Metadata
He also mentions the potential economic value of open data, estimated at $3.2 trillion to $5.4 trillion annually. For more information, see the McKinsey Global Institute's latest report on open data and/or watch the video below.
Earlier, I posted about Scientific Data, but unfortunately the site does not host any of the data.
Enter Dryad: data hosting is exactly what this site does. It hosts open data and any other digital artifacts associated with a research project. Plus, it provides a DOI (Digital Object Identifier) for citing the artifacts in research papers.
Nature.com is starting a new publication titled Scientific Data. The goal is to help researchers publish and discover data. Each piece of content in the publication is called a Data Descriptor. It describes the data, explains the data collection methods, lists the columns, and states other essential information about the dataset.
Unfortunately, the site does not host any of the data. I think it will be interesting to watch how a site like this develops. The publication is currently accepting submissions.