
Downsizing big data

02/07/2019

What is big data?

"According to IBM, 2.5 quintillion bytes of data are produced every day. This means that 90% of the data present in the world today was formed during the last few years."

The term refers to data sets of at least a petabyte (a million gigabytes) that are compiled from many diverse sources. Such data sets are beyond the capabilities of the run-of-the-mill database software tools normally used to capture, store, manage and analyse data. The business value of going through such a process comes from insights that were previously impossible to obtain. The aim of implementing analytical practices based on big data is to act on these insights, or to use them to support decision making throughout the business process.

Big data > Analytics > Insights > Decision making

The Vs of big data

The Vs are dimensions that characterise what should be considered big data. The concept was first coined in 2001 by Douglas Laney, with three core Vs: Volume, Velocity and Variety. As the body of knowledge around big data grew, a fourth dimension, Veracity, came to be considered a core dimension as well. At the time of writing, some academic reports are exploring the possibility of further core dimensions.

The Volume dimension describes the sheer quantity of data that the data solution's feeding sources create every second and that the solution must make use of. The datasets have become so large that they can no longer be stored and analysed using conventional database technologies; distributed systems are required instead.

  • In simpler terms, instead of being stored in a centralised location, the data is spread across several databases that can be combined into data clusters, which allow analysts to use the data as if it were in a single database.
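To make the idea of spreading data across several databases more concrete, here is a minimal sketch in Python. It assumes a toy setting in which each "database" is just an in-memory dictionary and each record is assigned to one of them by hashing its key; real distributed databases do this across many machines and add networking, replication and fault tolerance. The class name `ShardedStore` is purely illustrative.

```python
import hashlib
from typing import Callable, List, Optional


class ShardedStore:
    """Toy store that spreads records over several 'databases' (plain dicts)
    but lets callers query it as if it were a single database."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[dict] = [dict() for _ in range(num_shards)]

    def _shard_for(self, key: str) -> dict:
        # A stable hash (unlike Python's per-process randomised hash())
        # decides which shard owns the key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key: str, value: dict) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str) -> Optional[dict]:
        # Point lookups only touch the one shard that owns the key.
        return self._shard_for(key).get(key)

    def scan(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # A query fans out to every shard and merges the partial results,
        # so the caller never needs to know where the data physically lives.
        results: List[dict] = []
        for shard in self.shards:
            results.extend(v for v in shard.values() if predicate(v))
        return results
```

Hashing the key spreads records roughly evenly over the shards, while `scan` hides the split from the analyst by querying every shard and combining the answers.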

Velocity primarily describes the speed at which data flows into the data solution, but it also pertains to the rate of change within the data cluster, between linked datasets. Certain relationships between datasets can exhibit bursts of activity rather than a steady tempo.
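Bursts of activity, as opposed to a steady tempo, can be spotted by counting events within a sliding time window. The sketch below shows one simple way of doing this, assuming events arrive as timestamps in seconds, sorted in ascending order; the function name and the window and threshold values are hypothetical and would need tuning for any real data flow.

```python
from collections import deque
from typing import Iterable, List


def detect_bursts(timestamps: Iterable[float], window_seconds: float,
                  burst_threshold: int) -> List[float]:
    """Return the timestamps at which more than burst_threshold events
    have been seen in the trailing window (t - window_seconds, t].

    Assumes timestamps are sorted in ascending order.
    """
    window: deque = deque()
    bursts: List[float] = []
    for t in timestamps:
        window.append(t)
        # Drop events that have slid out of the trailing window.
        while window and window[0] <= t - window_seconds:
            window.popleft()
        if len(window) > burst_threshold:
            bursts.append(t)
    return bursts
```

For example, with ten events spaced one second apart followed by five events at 10.0, 10.1, 10.2, 10.3 and 10.4 seconds, `detect_bursts(timestamps, 1.0, 3)` flags only 10.3 and 10.4, the points at which the burst exceeds the steady rate.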

Variety of data is important to the sustainability of the data solution across different business situations. Using both structured and unstructured data, analysts can create linked data sets that yield insights which take a myriad of possible factors into account.

  • Structured data may come from business transactions and from reports generated by business applications. It is found in databases and is usually present before any data solution is created. The data has a high degree of organisation, which makes it searchable and readily available.
  • Unstructured data is all the other data coming from the multitude of sources linked to the data solution. This includes sensor data, IoT device data, social media, digital media and anything else the data solution could use to create data sets that provide more accurate insights. This data requires time and effort to compile and turn into something useful.

Veracity measures the quality of the data itself. Of all the petabytes received by a data cluster, only 10 to 25 per cent may be usable, as the rest is inaccurate or does not conform to the data quality policy in place.
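A data quality policy can be expressed as a set of rules that every record must pass. The sketch below assumes a hypothetical policy for payment records (the field names `customer_id`, `amount` and `currency`, and the accepted currencies, are invented for illustration) and reports what share of the incoming records is usable.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Sequence

# A rule returns True when a record complies with the policy.
Rule = Callable[[dict], bool]

# Hypothetical policy for payment records; field names are illustrative only.
POLICY: Sequence[Rule] = (
    lambda r: r.get("customer_id") is not None,
    lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    lambda r: r.get("currency") in {"EUR", "USD", "GBP"},
)


@dataclass
class QualityReport:
    usable: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    @property
    def usable_ratio(self) -> float:
        # Share of records that passed every rule (0.0 for an empty batch).
        total = len(self.usable) + len(self.rejected)
        return len(self.usable) / total if total else 0.0


def apply_quality_policy(records: Iterable[dict],
                         rules: Sequence[Rule] = POLICY) -> QualityReport:
    """Split records into those that pass every rule and those that do not."""
    report = QualityReport()
    for record in records:
        target = report.usable if all(rule(record) for rule in rules) else report.rejected
        target.append(record)
    return report
```

Running a batch through `apply_quality_policy` and reading `usable_ratio` gives the kind of figure the Veracity estimate refers to; keeping the rejected records separately also leaves room to inspect why they failed.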

Such massive data sets, of course, require vast storage arrays. Improved manufacturing techniques have driven down the cost of storage (cost per gigabyte), whilst the development of solid-state drives (SSDs) has pushed read/write speeds to new heights. SSDs also allow for greater data density and better durability, since they have no moving parts.

Using cloud computing to reduce the cost of big data analytics

Systems that can support big data analysis require a large investment in computing resources, which is not viable for many small to medium enterprises. The biggest enemy of a big data solution is latency: large amounts of data require large amounts of storage space and high data throughput to keep query times to a minimum. A slow query response may render the original purpose of the query irrelevant, because the factors involved are constantly changing. Such time-critical operations are becoming increasingly common in today's commercial landscape, from the iGaming industry, where thousands of transactions happen every minute against shifting odds, to self-driving cars that constantly adjust to environmental and situational changes.
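The point that a late answer can be worthless can be illustrated with a query that runs under a time budget. This is a minimal sketch using Python's standard `concurrent.futures` module; the function name, the budget values and the stand-in queries are assumptions, and in practice the deadline would depend on how quickly the underlying factors (odds, sensor readings) change.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)


def query_with_deadline(query_fn, budget_seconds: float):
    """Run query_fn() but stop waiting once budget_seconds have elapsed.

    Returns (result, True) if the answer arrived in time, or (None, False)
    if it came too late to be useful. A late query is not killed: it keeps
    running in the background and its result is simply ignored.
    """
    future = _executor.submit(query_fn)
    try:
        return future.result(timeout=budget_seconds), True
    except FutureTimeout:
        future.cancel()  # only has an effect if the query has not started yet
        return None, False


if __name__ == "__main__":
    # Hypothetical stand-ins for a quick and a slow analytical query.
    print(query_with_deadline(lambda: sum(range(1000)), budget_seconds=1.0))  # (499500, True)
    print(query_with_deadline(lambda: time.sleep(0.5) or 42, budget_seconds=0.05))  # (None, False)
```

Returning an explicit "in time" flag lets the caller fall back to a cached or default decision instead of acting on an answer that has already gone stale.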

Cloud-based computing circumvents large initial costs by removing the need to invest in stand-alone software and servers. By using the cloud's capabilities, a company also removes the overhead costs of data storage, software updates and management, and it can scale the cloud infrastructure and its costs according to its needs.

What does big data bring to the table for local businesses?

Fine, so big data is complex and vast, yet it doesn't cost an arm and a leg to run if the right implementation methods are used … but is it for every business, and what return should be expected from such an investment?

Can big data help you retain your customers?

The large volumes of data generated by customer interactions with businesses across multiple mediums tend to be ignored because of the resources required to sort and analyse them, and to extract insights from them, using standard data management techniques and technologies. Big data can help overcome these challenges and allow businesses to turn basic interactions, including web interactions, shop visits, social media interactions and transactional data such as commercial payments, into meaningful and powerful insights. Using data matching techniques, the many dots of a customer's interactions can be connected to build a comprehensive customer profile that is updated in real time with every interaction they make.
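Data matching can be done in many ways. As one illustration, the sketch below assumes deterministic matching on normalised identifiers (email address and phone number) and uses a union-find structure, so that interactions sharing any identifier, directly or through a chain of other interactions, end up in the same profile. The field names `email`, `phone` and `channel` are hypothetical, and the sketch rebuilds profiles in a single batch rather than updating them in real time.

```python
import re
from collections import defaultdict
from typing import Dict, List


def identifiers(interaction: dict) -> List[str]:
    """Normalised identifiers (hypothetical fields) that can link an interaction to a customer."""
    ids = []
    email = (interaction.get("email") or "").strip().lower()
    if email:
        ids.append("email:" + email)
    digits = re.sub(r"\D", "", interaction.get("phone") or "")
    if digits:
        ids.append("phone:" + digits)
    return ids


def build_profiles(interactions: List[dict]) -> List[List[dict]]:
    """Group interactions into customer profiles via union-find on shared identifiers."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Identifiers seen together in one interaction belong to the same customer.
    for interaction in interactions:
        ids = identifiers(interaction)
        for other in ids[1:]:
            union(ids[0], other)

    # Collect interactions under the representative identifier of their cluster.
    profiles: Dict[str, List[dict]] = defaultdict(list)
    for interaction in interactions:
        ids = identifiers(interaction)
        if ids:  # interactions without any identifier cannot be matched
            profiles[find(ids[0])].append(interaction)
    return list(profiles.values())
```

For instance, a web visit logged under "Anna@Example.com", a shop visit logged under "+356 9912 3456" and a payment carrying both identifiers would all be merged into one profile, even though the first two share no identifier with each other.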

Once created, the profiles can be given a quality rating: a number that reflects the amount of interaction between the customer and the company. Since the profiles are updated in real time, the quality rating can be monitored in case it falls below a certain threshold. Only then should marketing and sales resources target that customer.
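The quality rating itself could be defined in many ways. As a hypothetical example, the sketch below takes it to be the number of interactions in the last 30 days and lists the customers whose rating has dropped below a chosen threshold; both the window and the threshold are assumptions, not values from any particular implementation.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def quality_rating(interaction_times: Iterable[datetime], now: datetime,
                   window_days: int = 30) -> int:
    """Hypothetical rating: how many interactions fall within the last window_days."""
    cutoff = now - timedelta(days=window_days)
    return sum(1 for t in interaction_times if cutoff < t <= now)


def customers_to_target(profiles: Dict[str, List[datetime]], now: datetime,
                        threshold: int, window_days: int = 30) -> List[str]:
    """Customer ids whose rating has fallen below the threshold, i.e. the
    customers on whom marketing and sales resources would be spent."""
    return sorted(
        customer_id
        for customer_id, times in profiles.items()
        if quality_rating(times, now, window_days) < threshold
    )
```

Because the rating is recomputed from the interaction history, re-running it whenever a profile changes keeps the watch list current without any separate bookkeeping.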

Applying such a concept will decrease business costs, since retaining existing customers is far less expensive than acquiring new ones: working leads through a complete sales funnel uses up far more marketing and sales resources.

A more customer-tailored service

Taking the customer profile concept further, these profiles could be used to adapt the customer service offered to each client based on their online social interactions. If a customer's satisfaction seems lower than expected, the customer service team may reach out with a gesture to resolve any problem and increase satisfaction. Positive customer relations are imperative to business growth and should be the main objective of a consumer-focused business.

Matthew De Giorgio is an analyst with Deloitte Digital Malta. For more information, please visit www.deloittedigital.com.mt