CONTRIBUTED BY SILVIA GAMUNDI
In 1966, the Statistics Department of North Carolina State University started a research project, funded by the National Institutes of Health, to analyze agricultural data with the goal of improving crop yields. They studied variables such as temperature, grain type and fertilizer variety in order to identify patterns that led to better harvests. This is one of the stories commonly used to trace the origins of big data, and it also marks the birth of SAS (Statistical Analysis System), a software suite for advanced analytics, business intelligence, data management and predictive analytics that currently holds the largest market share in advanced analytics.
But what is big data? Why is it important? And who is it important for?
Well, the term “big data” was coined in 1997 by Michael Cox and David Ellsworth, two NASA researchers who faced problems with massive amounts of information while visualizing computational fluid dynamics, and who wrote: “data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data”.
In 2008, the term was popularized with the publication of “Big-Data Computing: Creating Revolutionary Breakthroughs in Commerce, Science, and Society”, and it has acquired great prominence in recent years. However, as McKinsey and other experts point out, there is no rigorous definition of big data, though it could be described as something close to “the massive analysis of data to extract the most value from it and make better future decisions and predictions”.
Either way, when people talk about big data, they usually appeal to the three Vs of big data defined by industry analyst Doug Laney in 2001: Volume, Velocity and Variety, which are often complemented with two more “dimensions”: Veracity and Complexity.
Clear definition or not, what does seem clear is the importance big data has already reached in fields as disparate as banking, insurance, the NBA, airlines, social media and government security, spreading to virtually all areas of life.
Focusing on our field of expertise, big data is also gaining more and more relevance. In fact, decoding the human genome originally took 10 years; now it can be achieved in hours and at more than 10,000 times lower cost. We are in the “omics” era, and massive analyses of genes, proteins, metabolites, epigenetic modifications and almost anything else you can imagine are routinely performed. Tons of information are generated every day, but the real challenge is how to manage it all and extract the most value from it, and this is where big data strategies come in.
Some quick inquiries about big data: while writing this post, a PubMed search for the term “big data” retrieved as many as 10 publications added so recently that no abstract was yet available. And, perhaps more surprisingly, 112 out of the 760 publications containing the term “big data” were published during 2015. Considering that we are only at the beginning of March, if this trend continues, by the end of the year the publications related to “big data” DURING 2015 will surpass the total number of “big data” papers published BEFORE 2015 in PubMed.
And this is just the beginning: according to the Gartner Group, 4.4 million data scientists will be required worldwide in 2015. In fact, the Harvard Business Review called data scientist “the sexiest job of the 21st century”.
Universities aren’t alien to this phenomenon either; in point of fact, most renowned universities currently offer MSc programs in big data and data management (LINK).
Experts forecast for big data the same future the Internet had some years ago: the day will come when every single enterprise has a big data strategy, just as it has an Internet one today. Maybe the definition is not so important, but it seems pretty clear that big data will soon be part of our everyday life. You only need to take a look at Twitter, and you will see hundreds of new tweets posted every hour with the hashtag #bigdata.