The Distinctions Among Information Creation, Transformation, Gathering, and Transmission and What It Means For Data Science
For the purposes of this post, let information mean stuff that resolves uncertainty.
Different people have different relationships with information in part because they have different relationships with uncertainty.
Data scientists, those who turn data into product (Loukides), have a different relationship with information than, say, the median store manager or the median politician.
Some data scientists focus on creating new information. Some invent new sensors. Some deploy existing sensors to new places. Some combine sensors in different ways. They have very good reasons for doing that.
They aren’t the only people who deploy sensors. Camera operators deploy sensors at certain times to capture sounds and images.
Some data scientists create new information using nothing but their minds and their words. They write stories, agendas, and markets. They aren’t the only people who create new information from their minds. Arguably, artists, philosophers, and mathematicians sometimes create original ideas from pure nothingness. I’ve also heard it argued that they discover the truth about what nature has already created and merely transform it into an artifact, meaning, or knowledge.
It can be argued that transformation is just another form of creation. Entrepreneurs do this all the time by observing the world and transforming their perceptions into a new idea. Researchers do this whenever they do a literature review, often transforming the original information created by others into something new. There is, after all, a clue in the word research itself: to search again. One might even argue that the act of deploying sensors is a form of transformation, converting the analog into digital, preserving a shard of reality.
Arguing from the data governance perspective, it is extraordinarily useful to argue that every transformation of information is an act of creation. It follows that if you create it, then you’re accountable for it. I make no claim for the popularity of the argument.
It can be argued that transformation is distinct from creation because something has to exist before it can be transformed. After all, how can anything exist if it hasn’t been created? And it could still be argued that there is such a thing as original creation. What causes special angst is the relationship of the past to the future. Information can be used to inform a vision of the future, and then people bring it about. The ideas and counterfactuals about future creation and present transformation are at the heart of bias and causation.
So, it could be argued that there is no real distinction between creation and transformation, and it could be argued that there is one. Reasonable people could understand the arguments for both.
Information gathering is the act of gathering information. In many institutions, there are entire divisions of people who gather a bunch of information and put it into a single artifact, like a report, a PowerPoint presentation, a statement, a dashboard, or a spreadsheet. People within institutions have different motivations for asking for, and sometimes ignoring, the information they have requested.
It takes a lot more mental energy to disprove a bias than it does to accept information that confirms it. Is it any wonder that people tend to be more receptive to information that resolves their uncertainty about something they know than to think through what a piece of disconfirming information might mean?
Sometimes the process of information gathering involves information creation and information transformation. Sometimes a list of figures has to be transformed into a median, mean, or mode before it can be understood. Sometimes it is useful to transform the relationship between two lists of figures into another list of figures that might suggest how related they are.
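The two transformations described above can be sketched in a few lines of Python. This is only an illustration: the function names `summarize` and `pearson_r` and the sample figures are made up for this post, and the Pearson correlation coefficient is just one of several ways to turn two lists into a figure that suggests how related they are.

```python
import math
import statistics


def summarize(figures):
    """Transform a list of figures into three single-number summaries."""
    return {
        "mean": statistics.mean(figures),
        "median": statistics.median(figures),
        "mode": statistics.mode(figures),
    }


def pearson_r(xs, ys):
    """Transform two equal-length lists into one figure between -1 and 1.

    Values near 1 or -1 suggest a strong linear relationship;
    values near 0 suggest little linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    # Sum of products of deviations from each list's mean.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a list never varies")
    return co_deviation / (spread_x * spread_y)


# Hypothetical figures, invented for illustration only.
daily_sales = [3, 5, 5, 8, 14]
summary = summarize(daily_sales)  # mean 7, median 5, mode 5

ad_spend = [1, 2, 3, 4, 5]
store_visits = [2, 4, 6, 8, 10]
relatedness = pearson_r(ad_spend, store_visits)  # very close to 1.0
```

Notice that the mean (7) and the median (5) of the same list disagree. Choosing which summary to report is itself a small act of creation.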
Reasonable people can see why the act of information gathering can resemble creation and transformation, and why they could be distinct.
Information transmission is the act of transmitting information. There are different ways and protocols by which information can be transmitted: REST APIs, HTTP, IPFS. Information can also be transmitted by the sound of your voice giving a presentation, with all the protocols that entails. There seem to be as many protocols for telling a story as there are for transmitting digital data.
Politicians deploy their mouths, hands, and hair to communicate. “Like Ron, a hard-working taxpayer from Wichita who works two jobs to put bread on the table for his family of five.” See what they do? Deliberate word choice intended to resonate with the minds of those whose support they need. Facts tell, but stories sell.
Stand back and consider the fuzziness in the distinctions. Creation, Transformation, Gathering, and Transmission are much alike and much different. It just depends on how much you squint at them.
The implication for data science is a bit of confusion and a whole lot of specialization.
Some data scientists spend their careers focused on data creation. They’re among the most interesting people I know. They think very carefully about valuable uncertainties, and create new streams of information to resolve those uncertainties. Value is created through the resolution of uncertainty. This is why we experience hordes of data scientists in crypto, finance, and market creation of virtual goods. It’s where the most valuable mysteries are.
Some data scientists spend their entire careers focused on data transformation. They’re fascinating people. They think carefully about how data can be summarized into descriptions, predictions, and prescriptions. They concern themselves with variance, bias, and a whole zoo of error terms. Value is created through the resolution of uncertainty about reality through the data. This is why we experience hordes of data scientists in search, recsys, adtech, finance, and natural language processing. It’s where the most valuable mysteries are.
Some data scientists spend their entire careers focused on data gathering. They’re quirky people. They gather information, massage it, and put it into formats for people to understand. The value can range from the negative to the positive. After all, if you have no uncertainty to resolve because you are certain about a decision, then any information gathered is wasteful, isn’t it? There sure are a lot of very certain, very confident people in the world, aren’t there? And sometimes, information makes all the difference. Data scientists tend to find the environments where information matters. It’s where the most valuable mysteries are left to explore.
Very few data scientists spend their entire careers focused on transmission. Those who do can be wonderful. They get to know their audiences. They tell stories. They try to reduce uncertainty but end up increasing it from time to time. It’s somewhat of a mystery why there isn’t more focus on transmission.
Specialization is a welcome byproduct of progress. And it’s wonderful that there is enough opportunity for such specialization.
A byproduct of that specialization is the emergence of new descriptors for roles. Data analysts came to be called data scientists, and some data scientists have come to be called engineers.
It can also generate a fair amount of conflict. Different communities have different ways of deciding what is knowledge and what is not. The boundaries among some data science communities are very fuzzy. Other data communities are transformed by a flood of new members. It has felt like September for a long time.
This can be exciting and scary. Exciting because branching creates new opportunities. Scary because it means some long-established (and possibly imagined) hierarchies change.
It could be amazing.