HIDE: A Red Flag for an Absent Data Ethics Framework for Artificial Intelligence in Malaysia
In recent weeks, the Hotspot Identification for Dynamic Engagement (HIDE) system has been brought into focus following the public disclosure of the latest HIDE list. This disclosure has been met with mixed reactions from the populace and business community. While some segments are appreciative that we are moving beyond a reactive stance with a forecasting system in place, there has been direct pushback from business owners named in the HIDE list, highlighting the underlying conflict between personal autonomy and public benefit.
The HIDE system, developed by the Crisis Preparedness and Response Centre (CPRC) of the Ministry of Health, is said to utilize a form of artificial intelligence to predict hotspots within the country that have the potential to cause a cluster outbreak. Unfortunately, it is the lack of information regarding the nature of the system and the manner in which it operates that raises numerous ethical concerns.
This article will consider three aspects of these concerns, namely:
(1) The issue of transparency in design of the HIDE algorithm;
(2) The question of accountability; and
(3) The issue of repurposing of data.
Transparency in Design
At this juncture, the exact nature of the algorithmic design of the HIDE system has not been declared. It is reported, however, that the system takes into account variables such as crowd density, space constraints and air ventilation to predict whether a particular location could potentially be a Covid-19 hotspot [1]. While the Minister of Science, Technology and Innovation declared that the purpose of HIDE is to “allow the government to take a more precise and more transparent approach in containing the pandemic”, the overarching issue that arises here is the absence of transparency of the system itself.
First, we may look towards the development of the HIDE system. The CPRC is certainly not the first entity to undertake the development of a prediction model to combat the spread of the pandemic, and it seems quite clear from other experiences that a considerable number of factors may affect the output of such a system, including the training data, methodologies, accuracy in reporting and socioeconomic intersectionality. Considering the public nature of the system’s application, there should be little concern in declaring the development methodology, exact variables and accuracy of the system.
In this vein, it is also important to consider the actual model on which the HIDE system is built, e.g. time series analysis, ARMA etc. Of particular note is any use of deep learning and neural networks within the model, as this would raise two primary questions: (1) as these models perform best with large datasets, whether the data is sufficient given that the pandemic is a relatively new phenomenon; and (2) the question of explainability, i.e. that while the model may provide a highly accurate output, its developer may not know exactly how this output was determined. This leads to the risk of bias within the system, i.e. taking into account factors that should not be taken into account in making the final determination.
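To illustrate what an explainable output could look like, the following is a minimal Python sketch of a transparent additive scoring model. It is not the HIDE algorithm, whose design has not been disclosed. The three inputs mirror the variables reported in the press, while the weights, the threshold, the 0 to 1 scaling and every name used (WEIGHTS, THRESHOLD, Premises, explain) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights and cut-off: NOT taken from HIDE, whose design is undisclosed.
WEIGHTS = {"crowd_density": 0.5, "space_constraint": 0.3, "poor_ventilation": 0.2}
THRESHOLD = 0.6


@dataclass
class Premises:
    name: str
    crowd_density: float      # each input is assumed to be scaled to the range 0..1
    space_constraint: float
    poor_ventilation: float


def explain(p: Premises) -> dict:
    """Return the overall score together with each variable's contribution to it."""
    contributions = {k: w * getattr(p, k) for k, w in WEIGHTS.items()}
    score = sum(contributions.values())
    return {
        "name": p.name,
        "score": score,
        "flagged": score >= THRESHOLD,
        "contributions": contributions,
    }


# Illustrative input values only; not real data about any premises.
report = explain(Premises("Example Mall", 0.9, 0.5, 0.4))
for variable, value in report["contributions"].items():
    print(f"{variable}: {value:.2f}")
print(f"score={report['score']:.2f} flagged={report['flagged']}")
```

With a model of this kind, a premises owner named in a list could be told exactly which variables drove the determination and by how much. A deep neural network offers no such breakdown by default, which is the explainability gap described above.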
Accountability
The effect of the issue of explainability can be seen in the use of the HIDE system. Damansara MP Tony Pua publicly opposed the system, pointing out that the overwhelming bulk of locations listed by HIDE comprised major shopping malls and supermarkets, although historical data showed that shopping malls made up less than 5 per cent of clusters. This was followed by three notable associations representing the shopping mall and retail industries nationwide urging the government to suspend making announcements based on HIDE until the information released is accompanied by a clear, accurate and precise basis.
While we advocate for data ethics committees and ethics reports to be a staple for developers and deployers of artificial intelligence systems, the situation further highlights a difficult conundrum in the form of accountability.
Should any one of these businesses intend to bring a legal action against the government for losses suffered due to the publication of the HIDE list, the traditional methods of assessing liability will only extend to the publication of the list itself. In the case of a direction to close, this may be challenged by way of judicial review. However, none of these traditional branches of law adequately addresses the ability, liability or responsibility of an entity such as the government in using an artificial intelligence system such as HIDE. As a result, nothing prevents such a system from being used behind closed doors, with its output implemented without the process of determination being made public.
It is therefore clear that the development of AI systems has far outrun the development of law in this field, and legislators should at least be aware of the risks of navigating a minefield without a map.
Repurposing of Data
While not directly linked to the issue at hand, an emerging concern is the repurposing of data obtained through the MySejahtera app. It is reported that in his May 4 special press conference, the Minister of Science, Technology and Innovation urged business premises to immediately register with MySejahtera to provide QR code scanning for visitors, so as to enable more accurate HIDE analysis. While he did not explain how HIDE would integrate with MySejahtera, it can be inferred that data would be shared between these two systems.
The sharing of data, while not objectionable in principle, raises concerns due to the lack of transparency in the collection, storage and use of the data collected in both systems. It is also notable that the privacy policy of MySejahtera, read in light of its “purpose of data collection” section, does not overtly allow for such a transfer of data to a system such as HIDE [2]. This issue may be moot considering the non-application of the Personal Data Protection Act 2010 to the government.
Conclusion
There is no doubt that the government could have taken many other routes with the HIDE list. Notices could have been issued privately to the business entities and demands made for them to increase compliance with more stringent SOPs. Even if closure was required, this could have been done by private notice as opposed to public notice, the latter having a more lasting reputational and stigmatising effect [3]. The government’s actions seem at odds with its notice on HIDE, which claims that the purpose of the system is to assist individuals and premises owners in taking proactive steps to contain the virus [4].
A compelling paper published in the Lancet [5] last year argued that assessing what is reasonable for a digital public health technology depends on two main variables: scientific evidence and risk assessment. The paper pointed to continuous monitoring and risk impact assessments, including privacy impact assessments, as necessary to predict and quantify the potential risks. Most importantly, the paper points towards a framework for the determination of ethical considerations.
The “HIDE experience” highlights the need for ethical considerations and a data ethics oversight committee from the outset of a project such as this. The potential impact of predictive AI systems, especially when used in the public context, is considerable. It is for this reason that the development of an AI National Framework is of critical importance in Malaysia, and efforts must be made to expedite its implementation.
[1] https://www.malaymail.com/news/malaysia/2021/05/07/what-is-hide-the-governments-new-ai-assisted-covid-19-early-warning-system/1972277
[2] https://mysejahtera.malaysia.gov.my/privasi_en/
[3] https://www.malaysiakini.com/news/573928
[4] https://www.mosti.gov.my/web/berita/hotspot-identification-for-dynamic-engagement-hide/
[5] https://www.thelancet.com/journals/landig/article/PIIS2589-7500(20)30137-0/fulltext
Please contact our partner Darmain Segaran for enquiries on data ethics frameworks or if you wish to collaborate in research projects in the field of data ethics and data privacy.