The Economist Data Economy series
Apr 27, 2020
My notes on The Data Economy series in the February 20th 2020 edition of The Economist.
The first article introduces the topics, and the following articles dive deeper into each one. I basically paraphrase every article for my own notes.
There is an ocean of data that reflects ever more aspects of the physical world, creating a mirrored world of human life based on data. The mirror world, although an imperfect reflection, is used for optimization, artificial intelligence, and new business models, giving rise to a new economy. The data economy is large and growing fast, yet there is no agreed-upon method to measure its value.
Data has unique characteristics producing trade-offs and tensions:
- Data can be owned and traded, but also can be shared as a public good to create wealth. (Are data more like oil or sunlight?)
- Data is centralized in huge data centers, but can also be decentralized at the edge to conserve energy and reduce privacy concerns. (Should data be crunched at the centre or at the edge?)
- Data is of limited use to firms if employees lack data skills, distrust the data, or don’t share it internally, yet firms want to incorporate data into corporate applications. (Integrating data is getting harder, but also more important.)
- Data is assumed by online giants to be used on a global scale, but governments want to assert their digital sovereignty. (Governments are erecting borders for data.)
- Data is meant to mirror the physical world, but may be even more unequally distributed than in the physical world. (Who will benefit most from the data economy?)
The flow of data takes on many metaphors, indicating the malleable economics of data.
Data have two specific economic characteristics:
- “non-rivalrous,” meaning infinitely copyable, and
- “excludable,” meaning access is controllable.
The excludability of data means data can be public, private, or somewhere in between.
Supporting the oil metaphor, both oil and data must be refined to be useful and are traded widely. However, the value of data depends on who controls it, and data property rights can be tricky.
Supporting the sunlight metaphor, both sunlight and data could be harnessed by all for the greater good. However, corporations limit which data is published to avoid over-sharing sensitive information.
There are potential workarounds. “Homomorphic encryption” could allow computations to run on encrypted data, sidestepping the risks of sharing private information while still surfacing useful patterns. Blockchain could also improve data-access management.
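To make the homomorphic-encryption idea concrete, here is a minimal sketch of the Paillier cryptosystem, a classic additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a third party can add numbers it cannot read. The toy prime sizes and function names are my own; real deployments use large keys and a vetted library.

```python
import math
import random

def keygen(p, q):
    """Generate a Paillier keypair from two primes (toy sizes only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    # With generator g = n + 1, the private mu is simply lam^-1 mod n.
    mu = pow(lam, -1, n)
    return (n, n + 1), (lam, mu)  # (public key, private key)

def encrypt(pub, m):
    """Encrypt integer m (0 <= m < n): c = g^m * r^n mod n^2."""
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:  # r must be invertible mod n
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    """Recover m via the L function: m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

pub, priv = keygen(17, 19)
c1, c2 = encrypt(pub, 12), encrypt(pub, 30)
# Multiplying ciphertexts adds the hidden plaintexts: 12 + 30.
total = decrypt(pub, priv, (c1 * c2) % (pub[0] ** 2))
print(total)  # -> 42
```

The point of the pattern: the party holding only the public key can aggregate encrypted values, and only the key holder ever sees the result.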
A centralized model is where all the data are collected and crunched in a few places. Amazon Web Services (AWS) is a good example of the centralized model.
Edge computing is where data are processed in real time as close as possible to where they are collected. Analyzing traffic data at the traffic signals, as done by the startup swim.ai, is a good example of the edge-computing model.
Edge computing is growing as the Internet of Things (IoT) grows. In the future, more often than not, data will be generated as people interact with everyday objects, such as washing machines, cranes, or cars. These devices will act, i.e. compute, within the world they are embedded in. As the IoT grows, 5G mobile wireless connectivity is enabling more devices to transmit data, and more data to be transferred.
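A minimal sketch of the edge pattern (the function name, window size, and statistics are my own illustration): each device summarizes its raw readings locally and transmits only compact aggregates upstream, cutting bandwidth and keeping raw, potentially sensitive data on-device.

```python
import statistics

def edge_summarize(readings, window=5):
    """Reduce raw sensor readings to per-window summaries at the edge.

    Only the summaries leave the device; the raw stream stays local.
    """
    summaries = []
    for i in range(0, len(readings), window):
        chunk = readings[i:i + window]
        summaries.append({
            "count": len(chunk),
            "mean": statistics.fmean(chunk),
            "max": max(chunk),
        })
    return summaries

# Ten raw temperature readings become two compact summaries for the cloud.
raw = [21.0, 21.5, 22.0, 35.0, 21.2, 20.9, 21.1, 21.3, 21.0, 21.4]
print(len(edge_summarize(raw)))  # -> 2
```

The same trade-off drives the centralized model in reverse: shipping every raw reading to the cloud maximizes what can be mined together, at the cost of transfer fees, latency, and privacy exposure.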
Big cloud-computing providers prefer the centralized model because it increases their sales, and data become more profitable when mined together (known as “data gravity”). However, centralization has costs, such as steep fees to move data to the cloud and higher energy consumption.
Businesses are searching for digital truth, i.e. identifying and combining data that accurately represent reality. If firms want to benefit from artificial intelligence, their data must be in good shape.
In search of digital truth, businesses are creating centralized data repositories known as “data warehouses” or “data lakes.” Yet integrating data can be difficult and costly due to many data sources and differing formats.
Three possible areas to improve data integration are:
- Data repositories, meant to make integration less of a headache by using a less rigid approach to structuring information than traditional database methods.
- Specialized databases, meant to work with streaming data, as opposed to static blocks of data.
- AI workflow tools, meant to improve how data is cleaned, processed, and fed into models.
A couple of challenges for organizations are:
- breaking down data silos, and
- poor data literacy.
Data were supposed to float freely around the world to wherever they could be crunched most efficiently, but data flows are increasingly blocked by governments, a trend known as “digital protectionism.” Digital protectionism could turn into “AI nationalism” as countries try to build data economies of their own.
Free-flowing data would allow more efficient data processing. Data could be centrally processed in data centres located near many users, with abundant connectivity, and where land and energy are cheap and the air is cool and dry. However, virtual borders have been going up in the data cloud in places such as the EU, Russia, China, and India.
The “Osaka Track” is meant to come up with global rules for data governance, instead of more uncoordinated national efforts. The underlying, ambiguous ideal is “data free flow with trust.”
There are a few possible outcomes as countries search for data sovereignty:
- Prevent all data flow, essentially disconnecting from the global internet. (Unlikely.)
- Countries create coalitions for certain types of data, such as personal data. (Much more likely.)
- Create a “Federated Data Infrastructure,” essentially a club of clouds whose members have to comply with a set of rules and standards. A single über-cloud could then be constructed from the multiple member clouds, minimizing lock-in and probably allowing firms to move data and computing workloads between rival clouds more easily. (Less likely but intriguing.)
A primary challenge of the data economy is the distribution of wealth. It is already very unequal. Size begets size, largely due to network effects, allowing the top tech corporations to reap the majority of profits from data. This is not only a tech company issue.
As the data economy expands, the dynamics affecting the unequal benefits from data in the tech industry will increasingly apply to non-tech companies and probably even countries. America and China account for 90% of the market capitalisation of the world’s 70 largest platforms, according to The Economist, which implies other countries may not benefit from the raw data they provide.
The biggest issue may be the distribution of capital across workers. As the data economy grows, more manual data work will be required from more workers, but capital may not be distributed commensurately to them. They may become systematically undervalued.
There have been proposed solutions. Levying a “digital dividend” on tech giants and disbursing the money to citizens would burden the data economy. Giving people property rights to their personal data would create unmanageable complexity for many to navigate and doesn’t acknowledge that more than one person may have rights to a given set of personal data. Collective control of data may be the optimal solution, with an institution acting like a trade union on behalf of citizens and distributing the proceeds. Don’t expect an immediate solution; the mechanisms and institutions will take time to build.