This article was authored by me and Elonnai Hickok.
Artificial Intelligence in many ways is in direct conflict with traditional data protection principles and requirements including consent, purpose limitation, data minimization, retention and deletion, accountability, and transparency. Other related privacy concerns in the context of AI center around re-identification and de-anonymisation, discrimination, unfairness, inaccuracies, bias, opacity, profiling, and misuse of data and imbedded power dynamics.
The need for large amounts of data to improve accuracy, the ability to process vast amounts of granular data, and the present relationship between explainability and result of AI systems have raised many concerns on both sides of the fence. On one hand, there is concern that heavy handed or inappropriate regulation will result in stifling innovation. If developers can only use data for pre-defined purpose – the prospects of AI are limited. On the other hand, individuals are concerned that privacy will be significantly undermined in light of AI systems that collect and process data in realtime and at a personal level not previously possible. Chatbots, house assistants, wearable devices, robot caregivers, facial recognition technology etc. have the ability to collect data from a person at an intimate level. At the sametime, some have argued that AI can work towards protecting privacy by limiting the access that humans working at respective companies have to personal data.
India is embracing AI. Two national roadmaps for AI were released in 2018 respectively by the Ministry of Commerce and Industry and Niti Aayog. Both roadmaps emphasized the importance of addressing privacy concerns in the context of AI and ensuring that a robust privacy legislation is enacted. In August 2018, the Srikrishna Committee released a draft Personal Data Protection Bill 2018 and the associated report that outlines and justifies a framework for privacy in India. As the development and use of AI in India continues to grow, it is important that India simultaneously moves forward with a privacy framework that addresses the privacy dimensions of AI.
In this article we attempt to analyse if and how the Srikrishna committee draft Bill and report has addressed AI, contrast this with developments in the EU and the passing of the GDPR, and identify solutions that are being explored towards finding a way to develop AI while upholding and safeguarding privacy.
The GDPR and Artificial Intelligence
The General Data Protection Regulation became enforceable in May 2018 and establishes a framework for the processing of personal data for individuals within the European Union. The GDPR has been described by IAAP as taking a ‘risk based’ approach to data protection that pushes data controllers to engage in risk analysis and adopt ‘risk measured responses’. Though the GDPR does not explicitly address artificial intelligence, it does have a number of provisions that address automated decision making and profiling and a number of provisions that will impact companies using artificial intelligence in their business activities. These have been outlined below:
- Data rights: The GDPR enables individuals with a number of data rights: the right to be informed, right of access, right to rectification, right to erasure, right to restrict processing, right to data portability, right to object, and rights related to automated decision making including profiling. The last right – rights related to automated decision making – seeks to address concerns arising out of automated decision making by giving the individual the right to request to not be subject to a decision based solely on automated decision making including profiling if the decision would produce legal effects or similarly significantly affects them. There are three exceptions to this right – if the automated decision making is: a. necessary for the performance of a contract, b. authorised by the Union or Member State c. is based on explicit consent.
- Transparency: Under Article 14, data controllers must enable the right to opt out of automated decision making by notifying individuals of the existence of automated decision making including profiling and providing meaningful information about the logic involved as well as the potential consequences of such processing. Importantly, this requirement has the potential of ensuring that companies do not operate complete ‘black box’ algorithms within their business processes.
- Fairness: The principle of fairness found under Article 5(1) will also apply to the processing of personal data by AI. The principle requires that personal data must be processed in a way to meet the three conditions of lawfully, fairly, and in a transparent manner in relation to the data subject. Recital 71 further clarifies that this will include implementing appropriate mathematical and statistical measures for profiling, ensuring that inaccuracies are corrected, and ensuring that processing that does not result in negative discriminatory results.
- Purpose Limitation: The principle of purpose limitation (Article 5(1)(b) requires that personal data must be collected for specified, explicit, and legitimate purposes and not be further processed in a manner incompatible with those purposes. Processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes are not considered to be incompatible with the initial purposes. It has been noted that it is unclear if research carried out through artificial intelligence would fall under this exception as the GDPR does not define ‘scientific purposes’.
- Privacy by Design and Default: Article 25 requires all data controllers to implement technical and organizational measures to meet the requirements of the regulation. This could include techniques like pseudonymisation. Data controllers also are required to implement appropriate technical and organizational measures for ensuring that by default only personal data which are necessary for a specific purpose are processed.
- Data Protection Impact Assessments: Article 35 requires data controllers to undertake impact assessments if they are undertaking processing that is likely to result in a high risk to individuals. This includes if the data controller undertakes: systematic and extensive profiling, processes special categories of criminal offence data on a large scale, systematically monitor publicly accessible places on a large scale. In implementation, some jurisdictions like the UK require impact assessments on additional conditions including if the data controller: uses new technologies, uses profiling or special category data to decide on access to services, profile individuals on a large scale, process biometric data, process genetic data, match data or combine datasets from different sources, collect personal data from a source other than the individual without providing them with a privacy notice, track individuals’ location or behaviour, profile children or target marketing or online services at them, process data that might endanger the individual’s physical health or safety in the event of a security breach.
- Security: Article 30 requires data controllers to ensure a level of security appropriate to the risk including employing methods like encryption and pseudonymization.
Srikrishna Committee Bill and AI
The Draft Data Protection Bill and associated report by the Srikrishna Committee was published in August 2018 and recommends a privacy framework for India. The Bill contains a number of provisions that will directly impact data fiduciaries using AI and that try and account for the unintended consequences of emerging technologies like AI. These include:
- Definition of Harm: The Bill defines harm as including bodily or mental injury, loss, distortion or theft of identity, financial loss or loss of property, loss of reputation or humiliation, loss of employment, any discriminatory treatment, any subjection to blackmail or extortion, any denial or withdrawal of a service, benefit or good resulting from an evaluative decision about the data principal, any restriction placed or suffered directly or indirectly on speech, movement or any other action arising out of a fear of being observed or surveilled, any observation or surveillance that is not reasonably expected by the data principal. The Bill also allows for categories of significant harm to be further defined by the data protection authority.
Many of the above are harms that have been associated with artificial intelligence – specifically loss employment, discriminatory treatment, and denial of service. Enabling the data protection authority to further define categories of significant harm, could allow for unexpected harms arising from the use of AI to come under the ambit of the Bill.
- Data Rights: Like the GDPR, the Bill creates a set of data rights for the individual including the right to confirmation and access, correction, data portability, and right to be forgotten. At the sametime the Bill is intentionally silent on the rights and obligations that have been incorporated into the GDPR that address automated decision making including: The right to object to processing, the right to opt out of automated decision making, and the obligation on the data controller to inform the individual about the use of automated decision making and basic information regarding the logic and impact of same. As justification, in their report the Committee noted the following: The right to restrict processing may be unnecessary in India as it provides only interim remedies around issues such as inaccuracy of data and the same can be achieved by a data principal approaching the DPA or courts for a stay on processing as well as simply withdraw consent. The objective of protecting against discrimination, bias, and opaque decisions that the right to object to automated processing and receive information about the processing of data in the Indian context seeks to fulfill would be better achieved through an accountability framework requiring specific data fiduciaries that will be making evaluative decisions through automated means to set up processes that ‘weed out’ discrimination. At the same time, if discrimination has taken place, individuals can seek remedy through the courts.
By taking this approach, the Bill creates a framework to address harms arising out of AI, but does not empower the individual to decide how their data is processed and remains silent on the issue of ‘black box’ algorithms.
- Data Quality: Requires data fiduciaries to ensure that personal data that is processed is complete, accurate, not misleading and updated with respect to the purposes for which it is processed. When taking steps to comply with this – data fiduciaries must take into consideration if the personal data is likely to be used to make a decision about the data principal, if it is likely to be disclosed to other individuals, if the personal data is kept in a form that distinguishes personal data based on facts from personal data based on opinions or personal assessments.
This principle, while not mandating that data fiduciaries take into account considerations such as biases in datasets, could potentially be be interpreted by the data protection authority to include in its scope, means towards ensuring that data does not contain or result in bias.
- Principle of Privacy by Design: Requires significant data fiduciaries to have in place a number policies and measures around several aspects of privacy. These include – (a) measures to ensure managerial, organizational, business practices and technical systems are designed in a manner to anticipate, identify, and avoid harm to the data principal (b) the obligations mentioned in Chapter II are embedded in organisational and business practices (c) technology used in the processing of personal data is in accordance with commercially accepted or certified standards (d) legitimate interests of business including any innovation is achieved without compromising privacy interests (e) privacy is protected throughout processing from the point of collection to deletion of personal data (f) processing of personal data is carried out in a transparent manner (g) the interest of the data principal is accounted for at every stage of processing of personal data.
A number of these (a, d, e, and g) require that the interest of the data principal is accounted for throughout the processing of personal data, This will be significant for systems driven by artificial intelligence as a number of the harms that have arisen from the use of AI include discrimination, denial of service, or loss of employment – have been brought under the definition of harm within the Bill. Placing the interest of the data principal first is also important in protecting against unintended consequences or harms that may arise from AI. If enacted, it will be important to see what policies and measures emerge in the context of AI to comply with this principle. It will also be important to see what commercially accepted or certified standard companies rely on to comply with (c.)
- Data Protection Impact Assessment: Requires data fiduciaries to undertake a data protection impact assessment when implementing new technologies or large scale profiling or use of sensitive personal data. Such assessments need to include a detailed description of the proposed processing operation, the purpose of the processing and the nature of personal data being processed, an assessment of the potential harm that may be caused to the data principals whose personal data is proposed to be processed, and measures for managing, minimising, mitigating or removing such risk of harm. If the Authority finds that the processing is likely to cause harm to the data principles, it may direct the data fiduciary to undertake processing in certain circumstances or entirely. This requirement applies to all significant data fiduciaires and all other data fiduciaries as required by the DPA.
This principle will apply to companies implementing AI systems. For AI systems, it will be important to see how much information the DPA will require under the requirement of data fiduciaries providing detailed descriptions of the proposed processing operation and purpose of processing.
- Classification of data fiduciaries as significant data fiduciaries: The Authority has the ability to notify certain categories of data fiduciaries as significant data fiduciaries based on 1. The volume of personal data processed, 2. The sensitivity of personal data processed, turnover of the data fiduciary, risk of harm resulting from any processing being undertaken by the fiduciary, use of new technologies for processing, and other factor relevant for causing harm to any data principal. If a data fiduciary falls under the ambit of any of these conditions they are required to register with the Authority. All significant data fiduciaries must undertake data protection impact assessments, maintain records as per the bill, under go data audits, and have in place a data protection officer.
As per this provision – companies deploying artificial intelligence would come under the definition of a significant data fiduciary and be subject to the principles of privacy by design etc. articulated in the chapter. The exception to this will be if the data fiduciary comes under the definition of ‘small entity’ found in section 48.
- Restrictions on cross border transfer of personal data: Requires that all data fiduciaries must store a copy of personal data on a server or data centre located in India and notified categories of critical personal data must be processed in servers located in India.
It is interesting to note that in the context of cross border sharing of data, the Bill is creating a new category of data that can be further defined beyond personal and sensitive personal data. For companies implementing artificial intelligence, this provision may prove cumbersome to comply with as many utilize cloud storage and facilities located outside of India for the processing of larger amounts of data.
- Powers and functions of the Authority: The Bill lays down a number of functions of the Authority one being to monitor technological developments and commercial practices that may affect protection of personal data.
By assumption, this will include monitoring of technological developments in the field of Artificial Intelligence.
- Fair and reasonable processing: Requires that any person processing personal data owes a duty to the data principal to process such personal data in a fair and reasonable manner that respects the privacy of the data principal. In the Srikrishna Committee report, the committee explains that the principle of the fair and reasonable is meant to address 1. Power asymmetries between data subjects and data fiduciaries – recognizing that data fiduciaires have a responsibility to act in the best interest of the data principal 2. Situations where processing may be legal but not necessary fair or in the best interest of the data principal 3. Developing trust between the data principal and the data fiduciary.
This is in contrast to the GDPR which requires processing to simultaneously meet the three conditions of fairness, lawfulness, and transparency.
- Purpose Limitation: Personal data can only be processed for the purposes specified or any other purpose that the data principal would reasonably expect.
As a note, the Srikrishna Committee Bill does not include ‘scientific purposes’ as an exception to the principle of purpose limitation as found in the GDPR, and instead creates an exception for research, archiving, or statistical purposes. The DPA has the responsibility of developing codes defining research purposes under the act.
- Security Safeguards: Every data fiduciary must implement appropriate security safeguards including the use of methods such as de-identification and encryption, steps to protect the integrity of personal data, and steps necessary to prevent misuse, unauthorised access to, modification, and disclosure or destruction of personal data.
Unlike the GDPR which explicitly refers to the technique of pseudonymization, the Srikrishna uses Bill uses term de-identification. The Srikrishna Report clarifies that the this includes techniques like pseudonymization and masking and further clarifies that because of the risk of re-identification, de-identified personal data should still receive the same level of protection as personal data. The Bill further gives the DPA the authority to define appropriate levels of anonymization. 
Technical perspectives of Privacy and AI
There is an emerging body of work that is looking at solutions to the dilemma of maintaining privacy while employing artificial intelligence and finding ways in which artificial intelligence can support and strengthen privacy. For example, there are AI driven platforms that leverage the technology to help a business to meet regulatory compliance with data protection laws, as well as research into AI privacy enhancing technologies. Standards setting bodies like IEEE have undertaken work on the ethical considerations in the collection and use of personal data when designing, developing, and/or deploying AI through the standard ‘Ethically Aligned Design’. . In the article Artificial Intelligence and Privacy by Datatilsynet – the Norwegian Data Protection Authority break such methods into three categories:
- Techniques for reducing the need for large amounts of training data: Such techniques can include
- Generative adversarial networks (GANs): GANs are used to create synthetic data and can address the need for large volumes of labelled data without relying on real data containing personal data. GANs could potentially be useful from a research and development perspective in sectors like healthcare where most data would quality as sensitive personal data.
- Federated Learning: Federated learning allows for models to be trained and improved on data from a large pool of users without directly using user data. This is achieved by running a centralized model on a client unit and subsequently improved on local data. Changes from the improvements are shared back with the centralized server. An average of the changes from multiple individual client units becomes the basis for improving the centralized model.
- Matrix Capsules: Proposed by Google researcher Geoff Hinton, Matrix Capsules improve the accuracy of existing neural networks while requiring less data.
- Techniques that uphold data protection without reducing the basic data set
- Differential Privacy: Differential privacy intentionally adds ‘noise’ to data when accessed. This allows for personal data to be accessed with revealing identifying information.
- Homomorphic Encryption: Homomorphic encryption allows for the processing of data while it is still encrypted. This addresses the need to access and use large amounts of personal data for multiple purposes
- Transfer Learning: Instead of building a new model, transfer learning relies builds upon existing models that are applied to new related purposes or tasks. This has the potential to reduce the amount of training data needed.
- RAIRD: Developed by Statistics Norway and the Norwegian Centre for Research Data, RAIRD is a national research infrastructure that allows for access to large amounts of statistical data for research while managing statistical confidentiality. This is achieved by allowing researchers access to metadata. The metadata is used to build analyses which are then run against detailed data without giving access to actual data.
- Techniques to move beyond opaque algorithms
- Explainable AI (XAI): DARPA in collaboration with Oregon State University is researching how to create explainable models and explanation interface while ensuring a high level of learning performance in order to enable individuals to interact with, trust, and manage artificial intelligence.DARPA identifies a number of entities working on different models and interfaces for analytics and autonomy AI.
- Local Interpretable Model Agnostic Explanations: Developed to enable trust between AI models and humans by generating explainers to highlight key aspects that were important to the model and its decision – thus providing insight into the rationale behind a model.
Public Sector use of AI and Privacy
The role of AI in public sector decision making has been gradually growing globally across sectors such as law enforcement, education, transportation, judicial decision making and healthcare. In India too, use of automated processing in electronic governance under the Digital India mission, domestic law enforcement agencies monitoring social media content and educational schemes is being discussed and gradually implemented. Much like the potential applications of AI across sub-sectors, the nature of regulatory issues are also diverse.
Aside from the accountability framework discussed in the Srikrishna Committee report, the Puttaswamy judgment also provides a basis for governance of AI with respect to its concerns for privacy, in limited contexts. The sources of right to privacy as articulated in the Puttaswamy judgments included the terms ‘personal liberty’ under Article 21 of the Constitution. In order to fully appreciate how constitutional principles could apply to automated processing in India, we need to look closely at the origins of privacy under liberty. In the famous case of AK Gopalan there is a protracted discussion on the contents of the rights under Article 21. Amongst the majority opinions itself, the opinion was divided. While Sastri J. and Mukherjea J. took the restrictive view that limiting the protections to bodily restraint and detention, Kania J. and Das J. take a broader view for it to include the right to sleep, play etc. Through RC Cooper and Maneka, the Supreme Court took steps to reverse the majority opinion in Gopalan and it was established that that the freedoms and rights in Part III could be addressed by more than one provision. The expansion of ‘personal liberty’ has began in Kharak Singh where the unjustified interference with a person’s right to live in his house, was held to be violative of Article 21. The reasoning in Kharak Singh draws heavily from Munn v. Illinois which held life to be “more than mere animal existence.” Curiously, after taking this position Kharak Singh fails to recognise a fundamental right to privacy (analogous to the Fourth Amendment protection in US) under Article 21. The position taken in Kharak Singh was to extrapolate the same method of wide interpretation of ‘personal liberty’ as was accorded to ‘life’. Maneka which evolved the test for enumerated rights within Part III says that the claimed right must be an integral part of or of the the same nature as the named right. It says that the claimed must be ‘in reality and substance nothing but an instance of the exercise of the named fundamental right’. The clear reading of privacy into ‘personal liberty’ in this judgment is effectively a correction of the inherent inconsistencies in the positions taken by the majority in Kharak Singh.
The other significant change in constitutional interpretation that occurred in Maneka was with respect to the phrase ‘procedure established by law’ in Article 21. In Gopalan, the majority held that the phrase ‘procedure established by law’ does not mean procedural due process or natural justice. What this meant was that, once a ‘procedure’ was ‘established by law’, Article 21 could not be said to have been infringed. This position was entirely reversed in Maneka. The ratio in Maneka said that ‘procedure established by law’ must be fair, just and reasonable, and cannot be arbitrary and fanciful. Therefore, any infringement of the right to privacy must be through a law which follows the principles of natural justice, and is not arbitrary or unfair. It follows that any instances of automated processing for public functioning by state actors or others, must meet this standard of ‘fair, just and reasonable’.
While there is a lot of focus internationally on what ethical AI must be, it is important that when we consider use of AI by the state, we pay heed to the existing constitutional principles which determine how AI must be evaluated against these standards. These principles however extend only to limited circumstances for protections under Article 21 are not horizontal in nature but only applicable against the state. Whether a party is the state or not is a question that has been considered several times by the Supreme Court and must be determined by functional tests. In our submission of the Justice Srikrishna Committee, we clearly recommended that where automated decision making is used for discharging of public functions, the data protection law must state that such actions are subject the the constitutional standards and are ‘just, fair and reasonable’ and satisfy the tests for both procedural and substantive due process. To a limited extent, the committee seems to have picked up the standards of ‘fair’ and ‘reasonable’ and made it applicable to all forms of processing, whether public or private. It is as yet unclear whether fairness and reasonableness as inserted in the bill would draw from the constitutional standard under Article 21. The report makes a reference to the twin principles of acting in a manner that upholds the best interest of the privacy of the individual, and processing within the reasonable expectations of the individual, which do not seem to cover the fullest essence of the legal standard under Article 21.
The Srikrishna Committee Bill attempts to create an accountability framework for the use of emerging technologies including AI that is focused on placing the responsibility on companies to prevent harm. Though not as robust as found in the GDPR, the protections have been enabled through requirements such as fair and reasonable processing, ensuring data quality, and implementing principles of privacy of design. At the sametime, the Srikrishna Bill does not include provisions that can begin to address the consumer facing ‘black box’ of AI by ensuring that individuals have information about the potential impact of decisions taken by automated means. In contrast, the GDPR has already taken important steps to tackle this by requiring companies to explain the logic and potential impact of decisions taken by automated means.
Most importantly, the Bill gives the Data Protection Authority the necessary tools to hold companies accountable for the use of AI through the requirements of data protection audits. If enacted, it will have to be seen how these audits and the principle of privacy by design are implemented and enforced in the context of companies using AI. Though the Bill creates a Data Protection Authority consisting of members that have significant experience in data protection, information technology, data management, data science, cyber and internet laws, and related subjects, these requirements can be further strengthened by having someone from a background of ethics and human rights.
One of the responsibilities of the DPA under the Srikrishna Bill will be to monitor technological developments and commercial practices that may affect protection of personal data and promote measures and undertake research for innovation in the field of protection of personal data. If enacted, we hope that AI and solutions towards enhancing privacy in the context of AI like described above will be one of these focus areas of the DPA. It will also be important to see how the DPA develops impact assessments related to AI and what tools associated with the principle of Privacy by Design emerge to address AI.