Cost and Value of Big Data

Continuing the discussion about Big Data…

What is the value of Big Data? What is the cost of identifying, collecting, cleaning, analyzing, and maintaining these Big Data? Can we define an optimal “Big Data”?

In Health and Human Services (HHS), we go through cycles of buzz words; currently our buzz words are “big data” and “data lakes” (http://www.forbes.com/sites/edddumbill/2014/01/14/the-data-lake-dream/#1e36adf180d2). There is not much discussion of these questions within the realm of health and human services, yet providing services that matter, and for which we can clearly demonstrate a return on our investment, is critical. We care about designing program services that are right for the person (just like precision medicine), evaluating services that are meaningful, and funding services that add value and improve people’s lives.

The big HHS vendors now pitch the value of “Big Data” as a new remedy for all ills in human service programs. This is hardly new; I was discussing the value of quality improvement and iterative processes about 15 years ago. The challenge is not just amassing these behemoth data sets, but actually understanding how we are going to derive meaningful information from them so that we can generate knowledge and make policy decisions based on real outcomes rather than perceived and/or potential outcomes. If I could put a penny in a jar for every time the word “potential” is used in the context of “gaining health benefits and cost-savings” by increasing the use of Health IT and “Big Data,” I would be a multi-millionaire.

The latest story on the value of “Big Data” can be evaluated through IBM’s $2.5 billion investment to acquire Truven Analytics and its data to feed “Watson” (https://www-03.ibm.com/press/us/en/pressrelease/49132.wss). In some circles the value proposition of IBM buying Truven is being hotly debated. The challenge is in predicting the true value of this purchase, as it takes a small army to understand the significance and application of the findings and then to be honest about sharing what works and what doesn’t. I have been working with data since I graduated from school, and I have yet to find a dataset or an analysis that did not result in many more questions than answers. Hence, when we say “Big Data” holds the key to all the answers, I am truly bemused. To date, the work has focused on analyzing the small components that make sense and then glossing over or omitting large amounts of data that are irritatingly noisy.

The fancy idea of “data lakes” is a good one, as I do believe that plain old easy-to-manage relational databases are not structured to help find solutions to social problems: issues such as why, after declaring war on poverty, there continues to be a culture of poverty (http://lchc.ucsd.edu/MCA/Mail/xmcamail.2010_11.dir/pdfKPNFlustp6.pdf), or why being poor is expensive (http://www.economist.com/news/united-states/21663262-why-low-income-americans-often-have-pay-more-its-expensive-be-poor). We need not only to use data that explain people’s lives and the contexts in which they live, but also to “meaningfully” integrate clinical, non-clinical, cost, and social determinant data to first understand the complexity of these social problems. Perhaps then we may begin to humbly take on problems one at a time, learn from our processes, and be honest when our solutions do not work out, so that we can improve effectively and efficiently.
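
To make “meaningful” integration a little more concrete, here is a minimal, purely illustrative sketch in Python. Every table, identifier, and field name (person_id, annual_cost_usd, housing_insecure, and so on) is hypothetical; real-world linkage of clinical, cost, and social determinant data involves consent, governance, and far more careful record matching than a simple key join.

```python
# Toy sketch: person-level linkage of clinical, cost, and social determinant data.
# All tables, identifiers, and values below are made up for illustration only.
import pandas as pd

clinical = pd.DataFrame({
    "person_id": [1, 2, 3],
    "diagnosis": ["diabetes", "asthma", "hypertension"],
})

cost = pd.DataFrame({
    "person_id": [1, 2, 3],
    "annual_cost_usd": [12000, 3500, 8200],
})

social = pd.DataFrame({
    "person_id": [1, 2, 3],
    "housing_insecure": [True, False, True],
    "food_insecure": [False, False, True],
})

# Join the three sources on the person identifier so that each row describes
# one person's clinical, cost, and social context together.
integrated = clinical.merge(cost, on="person_id").merge(social, on="person_id")

# A question only the integrated view can answer:
# average annual cost for people who are housing insecure versus not.
print(integrated.groupby("housing_insecure")["annual_cost_usd"].mean())
```

The point of the sketch is simply that questions which cut across clinical, cost, and social context can only be asked of the joined view; kept in their separate silos, the same data leave those questions unasked.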