
A Crash Course in Big Data

The data revolution isn’t stopping any time soon. For businesses of every industry and size, the use of Big Data only continues to increase. After all, it’s been one of the most well-known buzzwords of the past few years for a reason. But despite how much it’s talked about, many people still don’t know what Big Data actually is. So for those uninitiated in the world of Big Data analytics, here’s your crash course:

What is Big Data?

Big Data refers to the massive amount of data that companies collect and analyze. Some people use the term to simply describe the volume of data collected. However, most of the time it’s used to describe the systems and processes used to collect, store, analyze and output data.

Big Data can use data of any kind — structured or unstructured, email clicks or in-store purchases. The end goal of Big Data analytics is to find actionable insights, i.e. data trends that reveal changes you can implement to improve your business. These insights are the basis of data-driven decision-making — making objective business decisions, such as whether or not to change suppliers, based on data trends that’ll give you a competitive advantage.

In order to produce more detailed and powerful insights, new technologies like machine learning and artificial intelligence are being used. These technologies help sort through the seemingly overwhelming amount of data points and find patterns that even some of the most skilled data scientists can’t.

The 4 V’s of Big Data

When defining Big Data in more depth, there are certain characteristics you can point to. These are known as the 4 V’s of Big Data:

Volume
The term “Big Data” wasn’t a random choice; the word “big” describes the massive amount of information used in Big Data analysis. To give you some perspective: Big Data sets aren’t measured in megabytes or gigabytes. Instead, they’re measured in terabytes and petabytes. In other words, a little analysis of an Excel spreadsheet with 100 rows isn’t Big Data, because it just isn’t big enough. However, an analysis of every Facebook user’s ad clicks is very much Big Data analysis.

Variety
Variety is more or less self-explanatory. To qualify as Big Data, a specific data analysis has to use a variety of data in that analysis. As we mentioned earlier, this could be any kind of data that’s relevant to your business.

What’s more interesting about variety, however, is the split between structured and unstructured data. Structured data is what most people typically think of when they think of data: numbers and information such as dates, money and names, neatly organized into tables of columns and rows. That tidy organization makes structured data easy for computers to analyze.

Unstructured data, on the other hand, is the data that isn’t so neat. This data is more abstract, making it much harder to analyze. Examples of unstructured data include pictures, blog posts, text messages, voice recordings and other content based on human understanding rather than fixed fields.
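As a rough illustration (the record names and data below are invented), here’s how the two kinds of data look to a program: structured fields can be queried directly, while unstructured text has to be interpreted.

```python
# Structured data: fixed fields, ready for tabular analysis.
structured_purchase = {
    "date": "2018-03-14",
    "customer": "C-1042",
    "amount_usd": 59.99,
}

# Unstructured data: free-form text; a program has to infer its meaning.
unstructured_review = "Loved the checkout, but shipping took forever!"

# Structured data can be queried directly by field name...
is_large_order = structured_purchase["amount_usd"] > 50

# ...while unstructured data needs interpretation, here via naive keyword matching.
sounds_negative = any(word in unstructured_review.lower()
                      for word in ("slow", "forever", "broken"))
```

Real unstructured-data analysis uses far more sophisticated techniques (natural language processing, image recognition), but the gap in difficulty is the same.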

Velocity
Velocity refers to the speed at which data is collected. Big Data doesn’t include any kind of data analysis where you collect a few data points per day. Big Data refers to analysis that collects data on a constant basis. Things like social media posts, retail transactions and app usage are just a few examples of the type of high-velocity activities that Big Data tracks.
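As a toy sketch of what “high velocity” means in practice, the snippet below counts hypothetical events per second to gauge an ingestion rate (the event stream is invented for illustration; real systems handle thousands of events per second):

```python
from collections import Counter

# Hypothetical stream of (timestamp_in_seconds, event) pairs, e.g. app clicks.
events = [
    (0, "click"), (0, "click"), (0, "view"),
    (1, "click"), (1, "view"), (1, "view"), (1, "click"),
    (2, "click"),
]

# Count how many events arrive in each second to measure the ingestion rate.
per_second = Counter(ts for ts, _ in events)
peak_rate = max(per_second.values())  # the busiest second
```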

Veracity
Veracity involves the accuracy of the data. In other words, how much can you trust the data you’re using? Big Data analysis is worthless if you’re using inaccurate data, because any insights you gain from inaccurate data are false and misleading. Therefore, the veracity of your data is absolutely essential. Duplicate and missing data are two of the biggest culprits when it comes to inaccurate data.
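A minimal Python sketch of the two clean-up steps just mentioned, using invented records: dropping exact duplicates, then flagging rows with missing fields instead of silently analyzing them.

```python
records = [
    {"id": 1, "email": "a@example.com", "spend": 120.0},
    {"id": 1, "email": "a@example.com", "spend": 120.0},  # exact duplicate
    {"id": 2, "email": None, "spend": 80.0},              # missing email
    {"id": 3, "email": "c@example.com", "spend": 45.5},
]

# Drop exact duplicates while preserving order.
seen = set()
deduped = []
for rec in records:
    key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
    if key not in seen:
        seen.add(key)
        deduped.append(rec)

# Keep only records with no missing fields for the actual analysis.
complete = [r for r in deduped if all(v is not None for v in r.values())]
```

In a real pipeline this kind of cleaning happens at scale with dedicated tools, but the principle is the same: validate before you analyze.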

Hadoop
With the massive amount of data that needs to be analyzed, how are you supposed to do so in any kind of reasonable timeframe? After all, the computing power required to process a large data set can easily shut down your average laptop. Enter Apache Hadoop, an open source software framework that efficiently distributes the storage of large data sets across clusters of computers and servers, called Hadoop clusters.

When using Hadoop, you set up your own physical servers and computers and network them together into a cluster. Once you have the infrastructure in place, you can turn to a Hadoop vendor such as Hortonworks or Cloudera to install, configure and manage the Hadoop framework on those machines.

The Hadoop setup stores data in chunks across different computers, so when you process the data, it’s pulled from several different sources. One of the main advantages of this, in addition to efficient storage, is that if one computer shuts down, you don’t lose all of your data. To process data with Hadoop, you use different Hadoop tools, such as MapReduce, Hive, Pig, Impala, Mahout and the newer Spark.
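To give a flavor of the split-and-combine idea behind MapReduce, here is a minimal single-machine Python sketch (a toy illustration of the concept, not the actual Hadoop API): each chunk of text is mapped to (word, 1) pairs, and the pairs are then reduced into totals.

```python
from collections import defaultdict

def map_phase(chunk):
    # Each node emits a (word, 1) pair for every word in its chunk of the data.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # The pairs are grouped by word and their counts summed into totals.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Chunks of text as they might be stored on different cluster nodes.
chunks = ["big data big insights", "big cluster"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(mapped)
```

In a real cluster, the map phase runs in parallel on the node where each chunk lives, and the framework shuffles the pairs to the reducers; that distribution is what makes petabyte-scale processing feasible.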

Big Data in Use

Need a few examples of how companies are using Big Data today? Mark Schaefer collected several case studies on the use and subsequent success of Big Data at some of the biggest companies in the world. British Airways, for example, combined data from its customer loyalty program with data on its customers’ online behavior. That combination helped the airline create more targeted, relevant offers for its customers, resulting in a more positive experience.

Schaefer also highlighted American Express, which put Big Data to work to predict customer loyalty. By analyzing historical transactions as well as 115 other variables, the financial giant uses predictive analytics to identify which accounts will close within the next four months.

The Future of Big Data

So that’s the past and present of Big Data, but what does the future hold? Gil Press from Forbes put together some predictions of what we should expect to see in 2018 and beyond.

Possibly the most important prediction is that AI will eliminate the difficulties that unstructured data presents to data analysis. According to Press, deep learning has made unstructured data analysis more accurate and scalable, which will bridge the gap between structured and unstructured data analysis.

Another big prediction is that 50% of enterprises will adopt a cloud-first Big Data analytics strategy by the end of 2018. The on-premise systems of old aren’t very flexible or scalable, and they make controlling costs difficult. All of this is already leading a migration to the cloud, which will only speed up in the years ahead.

How Will You Use Big Data?

So now you’ve graduated from our crash course in Big Data, which means you know what it is and how it works. That leaves the big question: how will you use Big Data? You could use it the way many companies do, identifying sales patterns, forecasting future demand and so on. But you just may find a use that nobody has thought of before. After all, there are practically no limits to what you can improve using Big Data. The only limit lies in how creative you can get with your analysis.

