From within the strange loop of self-reference, the question “What is Data?” emerges. OK, maybe more practically, the question arises from our technologically advancing world, where data is everywhere, spouting from everything. We claim to have a “data science,” we now operate “big data,” and we have evolving laws about data collection and data use. That is quite an intellectual infrastructure for something that lacks identity or even a remotely robust and reliable definition. Should we entrust our understanding and experience of the world to this infrastructure? The question may seem stupid and ignorant. However, we have taken up a confused approach in all aspects of our lives by putting data ontologically on the same level as real, physical, actual stuff. So now the question must be asked and answered, and its implications drawn out.
Data is and Data is not. Data is not data. Data is not the thing the data represents or is attached to. Data is but an ephemeral puff of exhaust from a limitless, unknowable universe of things and their relations. Let us explore.
Observe a few definitions and usage patterns:
The Latin roots point to the looming mystery. “Give” -> “Something Given”. Even back in history, data was “something”. Almost an anti-definition.
Perhaps we can find clues from clues:
http://www.wolframalpha.com/input/?i=data&a=*C.data-_*Word-
Has there ever been a crossword puzzle word with broader or more ambiguous clues than that? “Food for thought?” seems to hit the nail on the head. The clues boil down to this: data is numbers, holdings, information, facts, figures, fodder, food, grist, bits. Sometimes crunched and processed, sometimes raw. Food for thoughts, disks, banks, charts and computers.
YouTube can usually tell us anything. Here’s a video directly answering What Is Data:
Strong start in that video, Qualitative and Quantitative… and then by the end the video unwinds the definitions to include basically everything.
Maybe a technical lesson on data types will help elucidate the situation:
Perhaps sticking to computers as a frame of reference helps us. Data is stuff stored in a database, specified by data types. What exactly is stored? Bits on a magnetic or electric device (hard drive or memory chip) are arranged according to a structure defined by this “data,” which is itself defined or created or detected by sensors and programs… So is the data the bit? The electric symbol? The magnetic structures on the disk? A pure idea regardless of physical substrate?
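To make the puzzle concrete, here is a minimal sketch (my own illustration, using only Python's standard `struct` module) in which the same four bytes are read under four different data types. The byte values and variable names are arbitrary choices for the example. Nothing about the bytes changes between readings; only the type laid over them does.

```python
import struct

# Four arbitrary bytes, chosen for illustration only.
raw = bytes([0x44, 0x61, 0x74, 0x61])

# The same bits, read through four different "data types".
as_text = raw.decode("ascii")                      # four ASCII characters
as_uint = struct.unpack(">I", raw)[0]              # one big-endian unsigned 32-bit integer
as_float = struct.unpack(">f", raw)[0]             # one big-endian IEEE 754 single-precision float
as_bits = "".join(f"{byte:08b}" for byte in raw)   # the bare bit pattern

print(as_text)
print(as_uint)
print(as_float)
print(as_bits)
```

Which of the four printed values is “the data” is exactly the question the paragraph above is asking.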
The confusing self-referential nature of the situation is wonderfully exploited by Tupper’s formula:
http://mathworld.wolfram.com/TuppersSelf-ReferentialFormula.html
What exactly is that? It’s a pixel rendering (bits in memory turned into electrons shot at a screen, or LED excitations) of a formula (which is a collection of symbols) that, when fed through a brain or a computer programmed by a brain, ends up producing a picture of a formula…
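The trick behind the formula can be sketched in a few lines. This is a simplified, integer-only reading of the inequality 1/2 < floor(mod(floor(y/17) · 2^(−17·floor(x) − mod(floor(y), 17)), 2)), and it is an assumption-laden illustration: it does not use Tupper's actual published constant, the bitmap is a tiny made-up one, and the names `encode`, `tupper` and `decode` are hypothetical helpers, not anything from the linked page. The point it shows is that the formula is a decoder: the picture is already packed, bit by bit, into the number where you choose to look.

```python
HEIGHT = 17  # the formula always reads a strip 17 pixels tall


def encode(bitmap):
    """Pack a list of columns (each a list of 17 bits) into one integer k."""
    n = 0
    for x, column in enumerate(bitmap):
        for r, bit in enumerate(column):
            if bit:
                n |= 1 << (HEIGHT * x + r)
    return HEIGHT * n  # k is a multiple of 17, so k // 17 recovers n


def tupper(x, y):
    """Integer form of the inequality: is bit (17*x + y mod 17) of floor(y/17) set?"""
    return ((y // HEIGHT) >> (HEIGHT * x + y % HEIGHT)) & 1 == 1


def decode(k, width):
    """Evaluate the formula over the strip k <= y < k + 17 to rebuild the bitmap."""
    return [[1 if tupper(x, k + r) else 0 for r in range(HEIGHT)]
            for x in range(width)]


# A made-up 3-column bitmap, purely for illustration.
picture = [
    [1] * HEIGHT,
    [1, 0] * 8 + [1],
    [0] * 8 + [1] + [0] * 8,
]

k = encode(picture)
restored = decode(k, width=len(picture))
print(restored == picture)
```

So is the picture in the formula, in the number k, or in whoever decided to look at that particular strip of the plane?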
The further we dig the less convergence we seem to have. Yet we have a “data science” in the world and employ “data scientists” and we tell each other to “look at the data” to figure out “the truth.”
Sometimes philosophy is useful in such confusing situations:
Information is notoriously a polymorphic phenomenon and a polysemantic concept so, as an explicandum, it can be associated with several explanations, depending on the level of abstraction adopted and the cluster of requirements and desiderata orientating a theory.
http://plato.stanford.edu/entries/information-semantic/
Er, that doesn’t seem like a convergence. By all means we should read that entire essay; it’s certainly full of data.
OK, maybe someone can define Data Science, and in that we can figure out what is being studied:
https://beta.oreilly.com/ideas/what-is-data-science
That’s a really long article that points to data science as a duct-taped, loosely linked set of tools, processes, disciplines and activities for turning data into products and telling stories. There’s clearly no simple definition or identification of the actual substance of data to be found there, or in any other readily available description of data science.
There’s a certain impossibility of definition and identification looming. Data isn’t something concrete. It’s “of” everything. It appears to be a shadowy representational trace of phenomena and relations and objects that is itself encoded in phenomena and relations and objects.
There’s a wonderful aside in the great book “Things to Make and Do in the Fourth Dimension” by Matt Parker.
Data seems to have a finite, discrete property to it and yet is still very slippery. It is reductive – a compression of the infinite patterns in the universe – and it is also a pattern. Compressed traces of actual things. Data is wisps of existence, a subset of existence. Data is an optical and sensory illusion that is an artifact of the limitedness of the sensor and the irreducibility of connections between things.
Data is not a thing. It is of things, about things, traces of things, made up of things.
There can be no data science. There is no scientific method possible. Science is done with data, but cannot be done on data. One doesn’t do experiments on data; experiments emit and transcode data, but data itself cannot be experimental.
Data is art. Data is an interpretive literature. It is a mathematics – an infinite regress of finite compressions.
Data is undefined and belongs in the set of unexplainables: art, infinity, time, being, event.
Data = Art
The stuff we all pay homage to, and yet, there is some doubt: we don’t know what we are talking about.