This week we're talking about the Five Vs of Big Data.
Big Data has a lot of descriptions associated with it. I've created a separate article just to cover the five Vs of Big Data: volume, velocity, veracity, variety, and value. Note: the number of Vs varies depending on who you talk to, but I think these five are the most important, and tend to cover other terms that seem to be added just because they start with V!
The Five Vs of Big Data
Volume is the amount of data in question: terabytes, petabytes, exabytes, and beyond. Sifting through large amounts of data requires different algorithms and techniques than the data processing methods of previous years. Large amounts of data also require larger storage systems. Do you use a cloud solution? Local storage? How often you access your data will play a key role in how you store it.
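To get a feel for the scale of those units, here is a minimal sketch that formats a raw byte count using decimal (SI) units, where each step up is a factor of 1,000. The function name `human_size` and the unit list are my own choices for illustration, and some tools use binary units (factors of 1,024) instead.

```python
# Decimal (SI) units: each step up is a factor of 1,000.
# Assumption: binary units (KiB, MiB, ...) are not wanted here.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_size(num_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value under 1,000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit we know.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} {UNITS[-1]}"  # not reached; keeps the return type explicit


if __name__ == "__main__":
    print(human_size(5 * 10**12))   # five terabytes
    print(human_size(25 * 10**14))  # two and a half petabytes
```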
Velocity is the rate at which data is obtained. It can vary from a slow trickle that accumulates large volumes over time to massive influxes of data over short periods of time. The more data you have coming in over short periods of time, the more you need to rely on techniques that sort and sift your data on the fly. If you try to push too much data through an Internet connection that doesn't have enough bandwidth, you'll start dropping data during transmission. Be sure your connections can handle spikes in data throughput. Many ISPs provide flexible connections that can expand to meet temporary increases in throughput.
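A quick back-of-the-envelope check can tell you whether a connection is likely to cope with a spike. The sketch below is only an illustration: the function names, the 80% headroom figure, and the sample numbers are all assumptions of mine, not measurements from a real system.

```python
def required_mbps(events_per_second: float, bytes_per_event: float) -> float:
    """Convert an event rate into megabits per second (1 Mbit = 1,000,000 bits)."""
    return events_per_second * bytes_per_event * 8 / 1_000_000


def link_can_handle(peak_mbps: float, link_mbps: float, headroom: float = 0.8) -> bool:
    """Return True if the peak load fits within a fraction of the link capacity.

    `headroom` is a hypothetical safety margin: 0.8 means we only plan to
    fill 80% of the link, leaving room for protocol overhead and other traffic.
    """
    return peak_mbps <= link_mbps * headroom


if __name__ == "__main__":
    # Hypothetical spike: 50,000 events per second at 500 bytes each.
    peak = required_mbps(50_000, 500)
    print(f"Peak load: {peak} Mbit/s")
    print("Fits a 1,000 Mbit/s link:", link_can_handle(peak, 1000))
    print("Fits a 100 Mbit/s link:", link_can_handle(peak, 100))
```

The point is to size the connection for the spikes rather than the average.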
Veracity is the degree to which your data is valid for your purpose. If you are collecting weather data, are you using data from calibrated, approved weather stations? If you are collecting stock market data, does it come from a major exchange? Veracity also means trusting the source(s) of your data. Data curation should be part of your data management system.
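One simple form of curation is to accept records only from sources you have already vetted and to set the rest aside for review. Here is a minimal sketch of that idea. The station names, the `source` field, and the `split_by_trust` function are hypothetical, made up for this example.

```python
# Hypothetical allow-list of vetted sources, e.g. calibrated weather stations.
TRUSTED_SOURCES = {"station-a", "station-b"}


def split_by_trust(records, trusted=frozenset(TRUSTED_SOURCES)):
    """Split records into (accepted, rejected) based on their 'source' field.

    Records with a missing or unknown source are rejected rather than dropped,
    so they can be reviewed later.
    """
    accepted, rejected = [], []
    for record in records:
        if record.get("source") in trusted:
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


if __name__ == "__main__":
    readings = [
        {"source": "station-a", "temp_c": 21.5},
        {"source": "backyard-sensor", "temp_c": 35.0},
        {"temp_c": 19.0},  # no source recorded at all
    ]
    good, suspect = split_by_trust(readings)
    print(len(good), "accepted;", len(suspect), "held for review")
```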
Big data comes in many varieties: social media streams, software, financial data, huge files, collections of small files, encrypted, compressed, etc. Optimizing your processes based on the type of data coming in (or going out) is essential. Streaming data requires a different acquisition approach than more traditional data. If you are transferring a fixed dataset then transmission delays just cost you time. Delays in real-time streaming data can cost you data.
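To see why a delay costs data on a stream but only time on a fixed dataset, consider a toy simulation of a bounded buffer. It is a deliberately simplified sketch: the function name, the tick-based model, and the numbers are assumptions for illustration, not a model of any particular streaming system.

```python
def simulate_stream(ticks: int, arrivals_per_tick: int,
                    processed_per_tick: int, buffer_size: int) -> int:
    """Return how many items are dropped when a stream feeds a bounded buffer.

    Each tick, new items arrive, anything beyond the buffer capacity is lost,
    and then the consumer removes up to `processed_per_tick` items.
    """
    buffered = 0
    dropped = 0
    for _ in range(ticks):
        buffered += arrivals_per_tick
        if buffered > buffer_size:
            # The stream does not wait: overflow is gone for good.
            dropped += buffered - buffer_size
            buffered = buffer_size
        buffered = max(0, buffered - processed_per_tick)
    return dropped


if __name__ == "__main__":
    # Consumer keeps up with the stream: nothing is lost.
    print(simulate_stream(10, 5, 5, 10))
    # Consumer falls behind: the buffer fills and items start dropping.
    print(simulate_stream(10, 8, 5, 10))
```

With a fixed dataset, the sender could simply pause until the receiver catches up, which is why the same slowdown only costs time there.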
At the end of the day all data is useless unless it provides value to your company. Months of Twitter comments take up vast amounts of storage space, but if your company has no Twitter presence, or does not utilize data from Twitter, what value is it to you? Decide what data is of value to you before you start collecting it. Searching for a needle in a haystack is hard enough without making the haystack needlessly larger.
Big Data requires different policies and procedures than other types of data. Make sure your data governance and data management teams are aware of the differences and have processes in place for the various types of data your business uses.
Data Size Terms
You may also be interested in the Five Cs of Data
Until next time, thanks for Talking Technology with me!