What is Data?
Before categorizing data, it is important to understand the current nature of data.
The data economy’s value chain is complex and individuals are not the only actors at either end. The creation of personal data would not be possible without the platforms provided by technology companies, and governments and businesses also represent significant demand drivers. Simply put, data is not just owned by individuals and no one stakeholder has absolute rights on it. Therefore, it is important to categorize data, and regulate it accordingly.
Potential categorization of digital data
Before defining an approach to data sharing it is essential first to develop a landscape and categorization of different types of data. There is no global standard for data categories because context matters. Some data can be considered highly sensitive in one country but not in another, for example.
Further, different data types need to be regulated according to sensitivity levels. Certain categories of data may have elevated protections that prohibit their processing, while others might be more easily accessible. It is imperative to clearly define data at the outset and then develop solutions for personal data that keep sensitive information most protected.
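To make this concrete, the idea of categories with sensitivity levels can be sketched as a simple register. The category names, levels, and the "shareable with consent" threshold below are all illustrative assumptions, not a standard; each country would define its own.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Illustrative sensitivity levels; each country defines its own."""
    PUBLIC = 0       # freely shareable (e.g. open government statistics)
    PERSONAL = 1     # shareable only with consent (e.g. transaction history)
    SENSITIVE = 2    # elevated protections, outside the sharing framework
    PROHIBITED = 3   # processing prohibited or tightly restricted

# Hypothetical category register: category name -> sensitivity level
DATA_CATEGORIES = {
    "open_statistics": Sensitivity.PUBLIC,
    "financial_transactions": Sensitivity.PERSONAL,
    "education_records": Sensitivity.PERSONAL,
    "health_records": Sensitivity.SENSITIVE,
    "biometric_data": Sensitivity.PROHIBITED,
}

def shareable_with_consent(category: str) -> bool:
    """In this sketch, a category qualifies for the data sharing
    framework only below the SENSITIVE threshold."""
    return DATA_CATEGORIES[category] <= Sensitivity.PERSONAL
```

The point of such a register is that the sharing framework can mechanically exclude elevated categories, rather than leaving the decision to each data request.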
Along with defining types of data, another part of categorization is to streamline and standardize the form in which data exists and how it will be shared. Today, even in the most advanced data economies, such as India, various ministries within the government and corporates are using and storing data differently. This creates friction in the sharing process and makes it cumbersome for all stakeholders to participate productively.
Beyond data generated on platforms, categories should also represent derived data, which includes inferences made from multiple sources of data, such as a credit score or voting predictions. Derived data is created through algorithms applied to personal data; for example, it is possible to ascertain political views through online book purchases, or sexual orientation through page likes on social media.
The conversation on definitions of personal data has grown in the last few years. The European Union’s (EU) General Data Protection Regulation (GDPR) has paved the way for many countries. In 2018, several countries developed regulations for personal data. To see how countries define personal data, please follow these links:
It is important that countries define their own categories of data, organize them by level of sensitivity, and standardize data before designing the data sharing framework.
- Conduct stakeholder meetings to build support for data categories
- Organise data categories by levels of sensitivity and mark out the ones relevant for data sharing for empowerment
- Publish data categories and their permissions widely, and consider a charter of citizen rights in a data economy with a focus on personal data rights
Why Share Data?
National solutions to using data for empowerment are emerging for two distinct reasons.
Most practically, these can solve specific “pain points” that arise when data is locked in different databases, in different forms, and in ways that make it difficult for data to be migrated from one place to another. Often the process of accessing one’s data involves a long slog through account user names and cumbersome passwords to enable data scraping or, worse yet, navigating lengthy phone queues or physical lines to receive paper records.
Second, and just as importantly, data sharing for empowerment solutions are a means to rebalance the power dynamics of data economies, which tend to skew toward commercial platform providers and/or the state at the expense of the very individual users who generate the data.
Examples of data empowerment
Some examples of data empowerment are elaborated upon below. These are intended to illustrate the variety of purposes, scope and underlying technological solutions that data sharing can take.
Skills and employment: In India, easy access to trusted digital records such as school degrees and transcripts enables people to prove their readiness for jobs. Not only does this offer the opportunity for employment among people who otherwise have few proof points of their skill level, it cuts out the industry of corruption around false certificates.
Health services: In Estonia, the ability to access personal data digitally is helping individuals better manage their health by creating a single, consistent health history that can be shared with all healthcare providers and by improving access to supporting services like filling prescriptions online, which is done nearly 100 percent of the time in Estonia.
Financial services: In China, personal data histories such as transaction histories and consumer behavior are helping people demonstrate their creditworthiness and gain access to lending in order to start or grow new businesses.
Goals of data sharing determine approach
The examples above describe a variety of approaches to data sharing that solve particular pain points. These pain points define the different approaches to data sharing (discussed in detail below). Governments need to decide not just what the purpose of data sharing is but also the data types that can be shared, the level of consent that individuals need to give to share data, and how data will flow between stakeholders.
Estonia and India represent different approaches to creating techno-legal frameworks for data sharing, each stemming from very different original goals. Estonia’s data sharing solution was built to create efficiency in citizen-government engagement and, in doing so, build trust among its citizens for the newly formed government following independence from the Soviet Union. India’s data sharing solution has arisen with the intent to give the hundreds of millions of low-income people who are increasingly “data rich” specific opportunities to leverage their data to access improved commercial services such as credit and employment.
It is not surprising then that the approach each country has taken is different: Estonia is deeply focused on sharing government data to improve government service delivery, whereas India’s approach is designed for sharing a range of government and commercial data to reduce transaction costs and therefore expand the addressable market for all types of commercial services. This difference in goals determines each country’s architectural approach: in Estonia, consent is not required for each instance of data sharing (because the law specifies which government agencies have rights to view and use data), while India’s solution requires explicit user consent (and therefore a holistic consent architecture) each time data is requested. For more on consent, see Figure One.
Figure one: Understanding consent philosophy
Consent is a much discussed concept, and countries need to define their own philosophy with regard to it. Consent is most commonly understood as permission given by individuals to platforms to collect their data, but there is another facet to it: the consent to share, which refers to the individual’s ability to share their data in return for better goods and services that can lead to their empowerment.
Consent can exist on a spectrum: at one end, it can be protected heavily by laws and regulations; at the other, control can be given entirely to the individual to grant and revoke consent as they see fit. For example (explained in detail below), the Estonian data sharing platform requires individuals to give one-time consent for data sharing, which is managed by the government data exchange layer. In India, by contrast, the burden of consent rests entirely on the individual, who has to manage their own data and data requests.
Irrespective of the frequency of consent, it should be reliable. It should also be non-repudiable, machine readable, and digitally signed, and it can serve a variety of purposes such as defining the scope of data shared, verifying users and data processors, detailing data access permissions, and establishing the purpose of data use. A robust mechanism to share verifiable data through electronic consent can make data sharing successful. While designing technology for consent, a useful framework is “ORGANS”: open standards, revocability of consent, granularity of data permissions and access, auditable consent and data flows, notification for users on any access of their data, and security by design. If consent philosophy and design follow this framework, they are bound to liberate data and empower people.
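The properties listed above (machine readable, digitally signed, non-repudiable, revocable, purpose-bound) can be sketched as a minimal consent artifact. This is a hypothetical illustration, not any country's actual specification; a real system would use asymmetric digital signatures under a public key infrastructure rather than the shared HMAC key assumed here.

```python
import hashlib
import hmac
import json
import time

# Hypothetical signing key held by the consent manager. A real deployment
# would use PKI-based digital signatures, not a shared secret.
SIGNING_KEY = b"demo-signing-key"

def create_consent_artifact(user_id, data_processor, scope, purpose, valid_seconds):
    """Build a machine-readable consent artifact covering the attributes
    the text lists: scope, verified parties, access window, and purpose."""
    artifact = {
        "user": user_id,
        "data_processor": data_processor,
        "scope": scope,                           # granular: which data fields
        "purpose": purpose,                       # declared purpose of access
        "expires_at": int(time.time()) + valid_seconds,
        "revoked": False,                         # supports revocability
    }
    payload = json.dumps(artifact, sort_keys=True).encode()
    artifact["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return artifact

def verify_consent_artifact(artifact):
    """Non-repudiation check: recompute the signature over the signed
    fields; any tampering or revocation invalidates the artifact."""
    unsigned = {k: v for k, v in artifact.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, artifact["signature"]) and not artifact["revoked"]
```

Because the signature covers every field, changing the declared purpose or scope after the fact makes verification fail, which is what gives the artifact its evidential value.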
Estonia: X-Road
X-Road is a government-created platform for interoperability between decentralized databases and a data exchange layer that can be used by both public and, increasingly, private entities. It allows querying of 900+ government and private databases. X-Road was built to improve government efficiency in the delivery of services and, according to government estimates, saves the country over 800 years of working time annually.
If a bank wants to verify a person’s address and income, for example, the bank can query the individual’s data through X-Road. X-Road first authenticates the identity of the bank employee requesting the data to ascertain whether she is allowed to access individual data. It also sets a window for access and validates the purpose for the data request; in this case, let’s say it is a mortgage application. Thereafter, it routes the request from the bank to the Population Registry, which stores address data, and the Tax Department, which has income data. X-Road logs and timestamps requests. Permission is granted to transfer address and tax data from the servers of the concerned departments directly to the data processor, the bank. The data is encrypted end to end, and after it is used by the data processor, it is deleted from their server. Data is routed through several servers rather than a central one, limiting exposure in case of a breach or compromise.
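The request flow described above can be sketched in a few lines. This is a simplified illustration of the pattern (authenticate the requester, validate the purpose, route queries to the source registries, log and timestamp everything), not the actual X-Road protocol; the registry names, data, and authorization table are invented for the example.

```python
import time

# Hypothetical registries and access rules standing in for X-Road's
# legally defined permissions; all names and values are illustrative.
REGISTRIES = {
    "population_registry": {"alice": {"address": "Tallinn"}},
    "tax_department": {"alice": {"income": 42000}},
}
AUTHORIZED = {("bank_employee_1", "mortgage_application")}  # (requester, purpose)
AUDIT_LOG = []

def xroad_request(requester, purpose, subject, queries):
    """Sketch of the flow in the text: authenticate the requester and
    validate the purpose, route each query to the registry that owns
    the data, and log + timestamp the request."""
    if (requester, purpose) not in AUTHORIZED:
        AUDIT_LOG.append({"ts": time.time(), "requester": requester, "granted": False})
        raise PermissionError("requester not authorized for this purpose")
    result = {}
    for registry, field in queries:
        # Data travels from the source server directly to the requester;
        # no central store holds a copy in this model.
        result[field] = REGISTRIES[registry][subject][field]
    AUDIT_LOG.append({"ts": time.time(), "requester": requester, "granted": True})
    return result
```

The audit log is what gives users the "full view of who is requesting data and for what purposes" that the next paragraph describes.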
X-Road ensures confidentiality, interoperability, evidential value (digital signatures to verify the source of data), and autonomy, where the law determines which government parties have access to which data. Users have a full view of who is requesting their data and for what purposes. Medical data is not part of this data sharing and is regulated differently. In other words, users do not control the usage of their data but have full transparency into its sharing and usage.
X-Road is backed by a legal and organizational structure, a protocol stack, and software to realise the protocol stack. While usage is free for government departments, private players have to pay a small fee to use X-Road. X-Road is also being used in several other countries, such as Azerbaijan, Namibia, and Finland, and allows for free movement of data between participating countries in the EU.
India: Data Empowerment and Protection Architecture (DEPA)
DEPA is broadly a techno-legal framework that enables individuals to leverage their personal data for their own empowerment while maintaining privacy. DEPA is in its pilot stages in India, but it brings forth an alternative data sharing framework to X-Road. In DEPA, consent is sought from individuals, who own and control their own data. The purpose of DEPA is to give users control of their data and let them use it to generate value for themselves.
In DEPA, the same example as above, a bank requesting personal data to process a loan, works a little differently. There is a consent layer called the account aggregator or, in the case of DEPA, a data access fiduciary (DAF). These can be government or private entities mandated by the government to serve this purpose. Account aggregators exist to empower individuals. Banks request data from the account aggregators. Account aggregators in DEPA are domain specific, which is to say there will be different account aggregators for financial services, education, health, and so on. The data is federated, that is, not centralized in one server, and is requisitioned through a robust consent architecture based on individual consent.
The account aggregator verifies the data request from the bank, both the officer requesting the data and the purpose of the request. Thereafter, the request is sent to the individual, who stores data in a digital locker, a service that enables Indian citizens to store certain official documents in the cloud. Through the account aggregator, the individual grants the bank permission to view their income statement for a designated amount of time. The account aggregator cannot view or store the individual’s data in DEPA, though this might be regulated differently in other contexts. Permission can be given by the user for viewing of data as well as storage, if required. The individual manages every such consent through the account aggregator, which manages data and consent flows. Account aggregators, or DAFs, are paid by the data processor for their services per transaction.
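The flow above can be sketched as follows. The class and method names are illustrative, not the actual DEPA specification; the key design point the sketch tries to show is that the aggregator routes requests and consent decisions, while the data itself flows from the individual's locker to the requester without the aggregator storing it.

```python
class AccountAggregator:
    """Sketch of a DEPA-style consent flow (hypothetical names)."""

    def __init__(self):
        self.pending = {}   # request_id -> verified request details
        self.grants = {}    # request_id -> individual's decision

    def request_data(self, request_id, requester, purpose, fields):
        # Record the verified request; the aggregator holds only
        # consent metadata, never the underlying data.
        self.pending[request_id] = {
            "requester": requester, "purpose": purpose, "fields": fields,
        }

    def user_decision(self, request_id, approve: bool):
        # The individual grants or revokes consent for this request.
        self.grants[request_id] = approve

    def fetch(self, request_id, locker):
        """Data flows from the individual's digital locker straight to
        the requester, and only if consent was granted."""
        if not self.grants.get(request_id):
            raise PermissionError("consent not granted")
        fields = self.pending[request_id]["fields"]
        return {f: locker[f] for f in fields}
```

Revocation falls out of the same structure: flipping the individual's decision back to denied blocks any further fetches for that request.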
While India and Estonia represent different approaches to national solutions for data sharing as a result of different key objectives, there is yet a third model to note:
The third is a model in which governments create the environment for competing entrepreneurial models of data sharing to flourish: a data sharing marketplace. This model works in countries that do not necessarily want to over-regulate the data sharing space but at the same time want to provide individuals with the opportunity to safely share their data. It also avoids a limitation of DEPA, which relies heavily on individual consent and can therefore lead to consent fatigue or over-consenting, neither of which is optimal for data empowerment. For this model to work, governments would have to provide similar protections to personal data through policy, technology, and institutional interventions, while ensuring that no monopolies are built.
Several data marketplaces have been built over the last few years and are slowly growing as an alternative to government-led and managed data sharing. For now, these marketplaces are emerging in countries where regulation on data is limited and the rights of individuals are not explicitly protected. Examples of data marketplaces include Wibson, Datacoup, and Digi.me.
Therefore, every country must define its own objective in pursuing a data sharing solution. The objective will not only guide the design and implementation but also create the framework for measuring progress and impact.
- Bring together a multi stakeholder group to brainstorm the main objective of data sharing based on pain points and existing solutions
- Define the spectrum of consent and where your country lies on it
- Locate the purpose of data sharing in existing laws and regulations
- Sketch out a data sharing model most suitable to your context
Risks of Data Sharing for Empowerment
Before discussing risks, it is important to understand that not building a framework for data sharing is perhaps the biggest risk.
This is because, as discussed above, data is being generated at great speed and is increasingly being used to provide goods and services to individuals. Those without a data endowment are likely to be left behind in the process of development. In this scenario, individuals’ control over their own personal data is critical for a meaningful existence.
There is no way to foresee how technology and uses of data may evolve. It is nevertheless important to consider risks and mitigation strategies, which include designing a data sharing framework based on PACT. Beyond that, digital teams must be vigilant and prepared for any pitfalls that can occur while designing a new system. Some known concerns include:
Over-consenting: One key risk is that citizens will ‘over-consent’, opting to share more data than is ‘safe’ in exchange for some reward, monetary or otherwise. User consent is designed to combat institutions strong-arming or over-incentivising citizens to disclose their data.
Low adoption rates across stakeholders: Data sharing hinges on the assumption that commercial platforms and users will agree to share data generated on their platforms with other players. So far, many commercial players have maintained monopolies over user data and may not participate in a system that erodes that position. Similarly, low user awareness of data sharing and its benefits may lead to low usage.
Decreased value of privacy: Citizens express frustration and unhappiness at the thought of losing their privacy. However, privacy has been shown to take a backseat when a simple reward is offered. A Stanford University economist researched this digital privacy paradox between privacy preferences and choices: a simple incentive like pizza may cause a person to relinquish the protection of their personal privacy.
Derived data: Users may have concerns about sharing their personal data, since platforms can derive intelligence from data without the user knowing. For example, a user may consent to share their Kindle history with a platform, which can then be used to derive their political views, a growing concern for individuals.
Beyond these, there are broader social ethics to consider around the life and longevity of data, and the impossibility of changing or modifying data once recorded. Any misdemeanors or defaults recorded will remain on record and can impact the quality of services and goods received. While this is not a hard risk, it is good to think about in the development of the data sharing framework.
How can these risks be mitigated?
The data sharing framework built by countries should balance risk and benefit to users. Mechanisms for data minimization, informed consent, clarity of purpose, and other security measures can help mitigate the risks. Focusing on user factors such as awareness, as mentioned above, will also make data sharing more useful for companies and users.
Strategies for risk mitigation will also be embedded in the policy and regulation framework, as well as the approaches to technologies. Countries should prepare a risk mitigation plan to identify areas of risk, and possible strategies to address them. Figure 2 describes possible strategies to mitigate the risks highlighted above.
Figure Two: Risk Mitigation Strategies
Over-Consenting
- Transparency of purpose to ensure that users understand how their data will be used
- Increased user awareness and data literacy to ensure that people understand the value of their data
- Policy and regulation framed to bring all data sharing within the ambit of the framework, with penalties enforced on any sharing outside it
Low Adoption Rates Across Stakeholders
- Ensure protections and security of personal data to build user trust in the framework and encourage usage
- Build incentives for businesses to use the data sharing platform
Decreased Value of Privacy
- Build privacy protection in the law
- Design limits to data sharing and put sensitive personal data outside the data sharing framework
- Enforce penalties for infringement on sensitive personal data
Derived Data
- Ensure a transparency log of all data requests and the purpose of data usage
- Enforce penalties for any breach of trust or usage of data beyond the declared purpose
What are the examples from existing data-sharing platforms and protocols where these risks have been encountered?
As mentioned, data sharing is new, and the risks of sharing are still being grappled with. It is difficult to identify in detail how users and platforms that are actively involved in data sharing are preventing these risks, but some examples are beginning to crop up.
For example, Accenture has written guidelines for firms on data sharing with external partners. The guidelines include ideas such as: develop an ethical review system, identify risks of data sharing early on, minimize data where possible, and emphasize transparency when processes are not clear.
Similar thinking on data sharing has occurred in the health sector, but there are no general standards or solutions for dealing with the risks of data sharing; these need to be developed urgently.
- Identify risks to data sharing in your own context across users, corporate players and the government
- Prioritize these risks in terms of likelihood and extent of damage
- Decide redlines for each risk
- Understand the willingness of the private sector to participate in risk sharing
- Discuss mitigation strategies through an all-stakeholder meeting, develop risk mitigation plans
- Engage civil society organizations to inform users on risks