What is big data?

Big data has been ascribed a number of definitions and characteristics. Any study of big data must begin by defining what big data is. Over the past few years, the term has become a buzzword, used to refer to any number of characteristics of a dataset, ranging from its size to its rate of accumulation to the technology in use.[1]

Many commentators have critiqued the term “big data” as a misnomer, misleading in its emphasis on size. We surveyed various definitions and understandings of big data and document the most significant ones below.

Computational Challenges

Datasets so large that they tax the capacities of main memory, local disk and remote disk have been seen as the problem that big data addresses. While this understanding focuses on only one feature of big data, its size, other characteristics that pose a computational challenge to existing technologies have also been examined. The US National Institute of Standards and Technology (NIST) has defined big data as data which “exceed(s) the capacity or capability of current or conventional methods and systems.”[2]

These challenges are not merely a function of size. Thomas Davenport offers a cohesive definition of big data in this context. According to him, big data is “data that is too big to fit on a single server, too unstructured to fit into a row-and-column database, or too continuously flowing to fit into a static data warehouse.”[3]

Data Characteristics

The most popular definition of big data comes from a 2001 report by META Group (since acquired by Gartner), which frames it in terms of three Vs: volume,[4] velocity and variety. Gartner defines big data as “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”[5]

Aside from volume, velocity and variety, commentators have articulated other defining characteristics of big data: exhaustiveness,[6] granularity (fine-grained and uniquely indexical),[7] scalability,[8] veracity,[9] value[10] and variability.[11] It is highly unlikely that any dataset satisfies all of these characteristics. It is therefore important to determine which combination of these attributes leads us to classify something as big data.

Qualitative Attributes

Prof. Rob Kitchin has argued that big data is qualitatively different from traditional, small data. Small data is typically collected through sampling, is limited in scope, temporality and size, and its datasets are “inflexible in their administration and generation.”[12]

In this respect, two qualitative attributes distinguish big data from traditional data. The first is the ability of big data technologies to accommodate unstructured and diverse datasets which hitherto were of no use to data processors. This allows the inclusion of many new forms of data from new, data-heavy sources such as social media and digital footprints. The second attribute is the relationality of big data.[13]

Relationality relies on the presence of common fields across datasets, which allow different databases to be conjoined. This attribute is a function not of the size but of the complexity of data, enabling a high degree of permutations and interactions within and across datasets.
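To make relationality concrete, here is a minimal sketch of two datasets being conjoined through a shared field. The record layouts, the `subscriber_id` field and the `join_on` helper are hypothetical illustrations, not drawn from the cited sources; the only point is that a common key lets otherwise separate databases be combined.

```python
from collections import defaultdict

# Hypothetical source A: transactions keyed by a subscriber ID
transactions = [
    {"subscriber_id": "S1", "amount": 50},
    {"subscriber_id": "S2", "amount": 20},
    {"subscriber_id": "S1", "amount": 30},
]

# Hypothetical source B: location records keyed by the same subscriber ID
locations = [
    {"subscriber_id": "S1", "district": "North"},
    {"subscriber_id": "S3", "district": "East"},
]


def join_on(left, right, key):
    """Inner join two lists of dicts on a shared key (a simple hash join)."""
    # Index the right-hand records by the common field
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Merge each left-hand record with every right-hand record sharing its key
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


combined = join_on(transactions, locations, "subscriber_id")
for row in combined:
    print(row)
```

Neither hypothetical source on its own links spending to location; the join on the common field produces records that do. Each additional source keyed on the same field multiplies the possible combinations, which is the complexity, rather than sheer size, that relationality refers to.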

Patterns and Inferences

Instead of focussing on the ontological attributes or computational challenges of big data, Kenneth Cukier and Viktor Mayer-Schönberger define big data in terms of what it can achieve.[14]

They define big data as the ability to harness information in novel ways to produce useful insights or goods and services of significant value. Building on this definition, Rohan Samarajiva has categorised big data into non-behavioral and behavioral big data, the latter yielding insights about human behavior.[15] Samarajiva argues that behavioral big data consists of transaction-generated data (commercial as well as non-commercial) in a networked infrastructure.

[1]. Thomas Davenport, Big Data at Work: Dispelling the Myths, Uncovering the Opportunities, Harvard Business Review Press, Boston, 2014.

[2]. MIT Technology Review, The Big Data Conundrum: How to Define It?, available at https://www.technologyreview.com/s/519851/the-big-data-conundrum-how-to-define-it/

[3]. Supra note 1.

[4]. What constitutes high volume remains unresolved. Intel, for instance, has described big data volumes as emerging in organizations that generate a median of 300 terabytes of data a week.

[5]. http://www.gartner.com/it-glossary/big-data/

[6]. Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work and Think, John Murray, London, 2013.

[7]. Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and their consequences, Sage, London, 2014.

[8]. Nathan Marz and James Warren, Big Data: Principles and Best Practices of Scalable Realtime Data Systems, Manning Publications, New York, 2015.

[9]. Bernard Marr, Big Data: The 5 Vs Everyone Must Know, available at https://www.linkedin.com/pulse/20140306073407-64875646-big-data-the-5-vs-everyone-must-know.

[10]. Id.

[11]. Eileen McNulty, Understanding Big Data: the 7 Vs, available at http://dataconomy.com/sevenvs-big-data/.

[12]. Supra note 7.

[13]. Danah Boyd and Kate Crawford, Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon, Information, Communication and Society 15(5): 662–679, 2012, available at https://www.researchgate.net/publication/281748849_Critical_questions_for_big_data_Provocations_for_a_cultural_technological_and_scholarly_phenomenon

[14]. Supra note 6.

[15]. Rohan Samarajiva, What is Big Data, available at http://lirneasia.net/2015/11/what-is-bigdata/.
