Is Big Data a Modern-Day Crystal Ball?
Fact #1: We generate 2.5 quintillion bytes of data every day.
Fact #2: 90% of the world’s data was generated in the past two years alone.
Fact #3: The collection of data grows exponentially with advancements in technology.
Every time we utilize technology, pieces of our information are automatically mapped to remote data centers, and now these centers are sharing information among one another—the result is Big Data.
The intention behind data collection isn't to creep out consumers, and the idea is certainly not new. Data analysis has long been a critical part of businesses' enterprise management systems, helping visualize workflows and inform decisions. Now, with an enormous bank of information at hand, businesses can boost productivity even further.
Predictive analytics, for instance, is one of the most powerful mechanisms driving Big Data research and development, quietly powering success behind the scenes. When tightly knit data points map out the future, the result is, in a way, a "crystal ball" for businesses. Only you can't just hover your hands over it, gaze in, and make a prediction: you have to make it pretty and give it a roof first.
It’s about relational information
Raw data by itself is messy, to say the least, with little meaning behind it. For instance, "32", "Jane", "dog", and "LA" could mean anything: Jane has 32 dogs in LA; 32-year-old Jane was born in LA and has a dog; Jane's dog, who is 32 in dog years, was adopted from LA; you get the picture. To understand information, we need to structure data in a way that makes sense. This brings us to the first step: gathering the data.
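The toy example above can be sketched in code: the same raw values mean nothing as a flat list, but become unambiguous once mapped into a schema (the field names here are purely illustrative).

```python
# Raw values with no structure: open to any interpretation.
raw = ["32", "Jane", "dog", "LA"]

# The same values, structured: each one now carries explicit meaning.
record = {
    "age": int(raw[0]),   # "32" is now explicitly an age
    "name": raw[1],       # "Jane" is a person's name
    "pet": raw[2],        # "dog" is her pet
    "city": raw[3],       # "LA" is where she lives
}

print(record["name"], "is", record["age"], "and lives in", record["city"])
```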
Gathering data from, say, clickstreams requires integrating data processing frameworks into the business's systems. Choosing a fitting application for data ingestion often means analyzing future usage up front and drawing a clear Big Data roadmap; doing so helps businesses pinpoint which architecture suits their needs.
Open source ingestion tools such as Apache Flume, Elastic's Logstash, and Mozilla's Heka can be used for logging and transporting data, as can managed services like Amazon's AWS Kinesis. Commercial options such as Splunk and Sumo Logic are available as well.
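At its core, ingestion means turning raw log lines into structured events that a pipeline can transport. A minimal sketch, assuming a hypothetical tab-separated clickstream format (real deployments would hand this job to a tool like Flume or Logstash):

```python
import json
from datetime import datetime, timezone

def ingest_click(raw_line: str) -> dict:
    """Parse one clickstream log line of the (hypothetical) form
    'user_id<TAB>url' and stamp it with an ingestion time, ready
    to forward downstream."""
    user_id, url = raw_line.rstrip("\n").split("\t")
    return {
        "user": user_id,
        "url": url,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

event = ingest_click("u123\t/products/42")
print(json.dumps(event))
```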
Clean and enrich it
Data containing disparate variations and murky relationships complicates analysis, requiring a bit of "cleaning up." Data normalization is a mechanism that minimizes redundancy, resulting in stronger relational datasets. Normalization typically involves grouping similar values together so that they appear as the "same" data during processing.
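A small sketch of that grouping idea: spelling variants of one city collapse to a single canonical value, so downstream analysis sees them as the "same" data (the variant table is made up for illustration).

```python
# Map known spelling variants to one canonical value.
CANONICAL_CITIES = {
    "la": "Los Angeles",
    "l.a.": "Los Angeles",
    "los angeles": "Los Angeles",
}

def normalize_city(value: str) -> str:
    """Return the canonical spelling for a known variant,
    or the trimmed input if no variant matches."""
    key = value.strip().lower()
    return CANONICAL_CITIES.get(key, value.strip())

rows = ["LA", "l.a.", "Los Angeles ", "Boston"]
cleaned = [normalize_city(r) for r in rows]
print(cleaned)  # ['Los Angeles', 'Los Angeles', 'Los Angeles', 'Boston']
```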
Tokenization can then be used to "mask" real data with random substitutes, preventing unauthorized access to sensitive information. The relationship between the real data and its token is stored in a vault, to be retrieved only by approved personnel. Clean, enriched data maintains Big Data's overall integrity, making open source tools like Apache Spark (often paired with Scala) worthwhile.
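A toy version of that vault pattern, sketched with Python's standard `secrets` module (a real system would keep the vault in hardened, access-controlled storage):

```python
import secrets

class TokenVault:
    """Toy tokenization: replace sensitive values with random tokens
    and keep the token-to-value mapping in a 'vault' for approved
    lookups only."""
    def __init__(self):
        self._vault = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)  # random substitute
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only approved personnel/code should be able to reach this.
        return self._vault[token]

vault = TokenVault()
masked = vault.tokenize("Jane Doe, SSN 000-00-0000")
print(masked)                    # safe to store and process downstream
print(vault.detokenize(masked))  # original, retrieved from the vault
```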
Store and structure it
Before businesses can access data, it has to be stored in a warehouse in structured form. Tools like the aforementioned Apache Spark, for example, can load data into relational tables, where it can be accessed and queried with SQL.
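What "structured and queryable" buys you can be shown in miniature with Python's built-in `sqlite3` standing in for a warehouse (the table and columns are hypothetical):

```python
import sqlite3

# In-memory stand-in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Jane", 32, "Los Angeles"), ("Ravi", 41, "Boston")],
)

# Once the data is relational, questions become one-line queries.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Los Angeles",)
).fetchall()
print(rows)  # [('Jane',)]
```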
Then, through Business Intelligence reporting tools and other useful visualizations, businesses can finally use Big Data to gain actionable insights.
Human interpretation of the data is perhaps the most fundamental aspect of Big Data, and it is what produces the "crystal ball" effect. It turns out predicting the future requires a keen eye after all. Ultimately, we fulfill Big Data's fundamental purpose, as stated by Carly Fiorina, former CEO of HP:
The goal is to turn data into information, and information into insight.
History shows that people focused on this goal can produce miraculous outcomes. Here are some powerful ways Big Data has reshaped industries through people.
Transforming Medical Fields
Prior to data analytics, doctors relied on limited information for diagnosis and treatment. This changed when biomedicine and technology wedded. The romance began with body-worn watches, skin-hugging sensors, and even digital tattoos, and it has seen no end since. Doctors and patients have co-spun large "hospital networks" as a result, networks that can, and have, treated cancer patients, including one at Beth Israel Deaconess Medical Center (BIDMC).
Technology developed at BIDMC recently yielded positive results for cancer patient Kathy Halamka. During her treatment, her husband John Halamka, the hospital's Chief Information Officer, proposed leveraging Big Data through the Shared Health Research Information Network (SHRINE) to find a more effective treatment. SHRINE linked multiple Harvard-affiliated hospitals' databases containing information on 6.1 million patients, on the assumption that surely at least one patient's situation was similar to Kathy's. Dr. Halamka explains:
We could say, ‘I’m looking for age-50 Asian females who were treated with stage III breast cancer’. What were their medications, what were their outcomes?
And if those patients were successfully treated, the same treatment might work for Kathy. After careful analysis, Kathy's doctors decided to give her chemotherapy drugs instead. It was such a success that, as Kathy recalls:
The people in radiology thought they were being punked.
Miracle indeed. But medicine isn't the only field that benefits from Big Data; e-commerce does too.
Have you ever wondered why items shipped by Amazon come to your door at lightning speed? Big, big data.
Amazon uses location-based predictive analytics to determine, ahead of time, what people will order and where it will ship. This lets them pre-package products and send them to your local shipping facilities before you even click the buy button. Put differently, if you're a frequent Amazon shopper, many of your not-yet-ordered items are likely already packed and waiting to be delivered. It's a little spooky, but it keeps customers happy and coming back to buy more, so everyone's a winner.
Now everyone’s making predictions; Dell’s executive director Darin Bartik shares his two cents:
Big data initiatives will encourage more [business] growth and investment, and additional returns on that investment will be achieved as [organizations] dive further into different datasets and embrace ever-improving analytic capabilities.
All in all, the future looks bright for Big Data. And when it comes to saving lives and improving business models, having more data is definitely better than having none at all. As Jim Barksdale, former CEO of Netscape, jokes:
If we have data, let’s look at data. If all we have are opinions, let’s go with mine.
True that, for all opinion holders, because at the moment, Big Data is probably the closest thing we can get to a true crystal ball.
How do you feel about using your data to save someone else's life? Your opinion is going to matter in the Big Data sphere, so voice it here now.