Royal Academy of Sciences New Zealand Open Science

Lessons learned from developing a COVID-19 algorithm governance framework in Aotearoa New Zealand


ABSTRACT

Aotearoa New Zealand’s response to the COVID-19 pandemic has included the use of algorithms that could aid decision making. Te Pokapū Hātepe o Aotearoa, the New Zealand Algorithm Hub, was established to evaluate and host COVID-19 related models and algorithms, and to provide a central and secure infrastructure to support the country’s pandemic response. A critical aspect of the Hub was the formation of an appropriate governance group to ensure that algorithms being deployed underwent cross-disciplinary scrutiny prior to being made available for quick and safe implementation. This framework necessarily canvassed a broad range of perspectives, including data science, clinical, Māori, consumer, ethical, public health, privacy, legal and governmental viewpoints. To our knowledge, this is the first implementation of national algorithm governance of this type, building upon broad local and global discussion of guidelines in recent years. This paper describes the experiences and lessons learned through this process from the perspective of governance group members, emphasising the role of robust governance processes in building a high-trust platform that enables rapid translation of algorithms from research to practice.

Introduction

The emergence of the COVID-19 pandemic in the Information Age has meant that, from the outset, data and models have been, and continue to be, used to inform decision making (Hendy et al. 2021) by governments (Stats NZ 2021), health authorities (Ministry of Health 2020) and clinicians (World Health Organisation 2022). Reliance on models makes it important to assess model suitability, to provide access to evaluated models and to document information about them (World Health Organisation 2021).

Te Pokapū Hātepe o Aotearoa, the New Zealand Algorithm Hub (https://algorithmhub.co.nz) (the Hub), was launched in November 2020 to meet these needs. The Hub aimed to host COVID-19 related models and algorithms to provide a central, shared knowledge base, and stable secure infrastructure and tools to support Aotearoa New Zealand’s (Aotearoa NZ’s) pandemic response.

In addition, the Hub sought to lay the foundation for a national algorithm management solution that would provide value for the country’s health system beyond the current pandemic response. While other endeavours have been similar in a regional sense (Syrowatka et al. 2021), we believe Aotearoa NZ to be the first country to deploy a national algorithm management solution of this kind (Stats NZ 2020).

This paper describes the experiences and lessons learned during the establishment of a governance process for the Hub, particularly the formation of a Governance Group to ensure algorithms underwent cross-disciplinary scrutiny prior to being made available for implementation. Governance Group members, the authors of this paper, represent Māori, consumer, ethical, clinical, public health, privacy/legal, data science, governmental and commercial perspectives. Robust and wide-ranging conversations within the group, and with those applying to have their algorithms included on the Hub, elicited lessons, several of which relate to ethics, Māori perspectives, localisation and intended use. These lessons may benefit others undertaking similar endeavours, and we summarise them in this paper.

Overview of the Hub

The Hub was established to evaluate and host COVID-19 related models and algorithms, and to provide a central and secure infrastructure to support the country’s pandemic response. The Hub’s intended purpose is to facilitate safe, reliable, evidence-based healthcare decision-making (World Health Organisation 2021). The Hub’s components are described below.

Hub technology

The Hub is cloud-based software that provides access to health-related algorithms and models. It offers clinicians, consumers, hospitals and government decision-makers free use of a library of models, with the ability to process data inputs and generate outputs. Models are accessible via a website for self-service, a REST API to allow technology integration, and batch processing that allows authorised users to send multiple sets of inputs as a single batch for efficient processing.
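To make these integration pathways concrete, below is a minimal sketch of how a client might call a hosted model over HTTP. This is an assumption-laden illustration only: the endpoint paths, payload shape and authentication scheme shown are invented for the example and are not the Hub’s documented API.

```python
import requests

# Illustrative sketch only: the endpoint paths, payload shape and auth
# scheme below are assumptions, not the Hub's documented API.
HUB_BASE_URL = "https://algorithmhub.co.nz/api/v1"  # hypothetical base path

def run_model(model_id: str, inputs: dict, api_key: str) -> dict:
    """Send a single set of inputs to a hosted model and return its outputs."""
    response = requests.post(
        f"{HUB_BASE_URL}/models/{model_id}/run",
        json={"inputs": inputs},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

def run_batch(model_id: str, batch: list[dict], api_key: str) -> list[dict]:
    """Send many input sets as one request, mirroring the batch facility."""
    response = requests.post(
        f"{HUB_BASE_URL}/models/{model_id}/batch",
        json={"inputs": batch},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["results"]
```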

For data scientists, researchers and others contributing algorithms and models, the Hub aims to support deployment of their work without the burden of technical support and system software maintenance falling on the creators. The hope is that the stable, publicly accessible, automated and scalable platform encourages broader use of the models, and also frees up algorithm developers to make improvements or focus on new challenges.

Process for algorithm assessment

In the current process, model developers can submit an algorithm for evaluation and hosting. Submitted algorithms are then reviewed by a multidisciplinary Governance Group with a wide range of experience before being included on the Hub.

Initial identification of candidate models involved direct engagement with the health sector, proactive enquiries from model developers, and ongoing scanning of the international literature and evidence. The Hub development team initially used focus groups and webinars to engage with potential model end users and contributors. Models that align well with the purpose of the Hub and can provide evidence of quality and adoption are progressed through to the review process, prioritised by an assessment of speed to value.

Review processes involve the following:

  • The use of model cards to summarise the opportunity and inform an initial conversation with the Governance Group (Mitchell et al. 2019); an illustrative sketch of such a card follows this list.

  • Long-form documentation of each model through an ‘Algorithm Information Request’ (discussed below), with appropriate supplemental material. The current version is provided on the Hub website (https://bit.ly/2ZUg53a).

  • Revision of the documentation as required, often in collaboration between the submitting participants and Governance Group members.

  • For endorsed models, preparation of a summary of the governance documentation and discussion, tailored for algorithm users, which is published with the algorithm on the Hub website and elsewhere as appropriate.
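As a companion to the first step above, the following is a minimal model card sketch in the spirit of Mitchell et al. (2019). The field names follow the sections proposed in that paper; all content shown is invented for illustration and is not a real Hub entry.

```python
# Minimal illustrative model card (after Mitchell et al. 2019).
# All values are invented examples, not a real Hub submission.
model_card = {
    "model_details": {
        "name": "example-risk-model",      # hypothetical name
        "version": "1.0",
        "developers": "Example research group",
    },
    "intended_use": {
        "primary_use": "Support shared discussion of individual risk",
        "out_of_scope": ["Automated rationing or prioritisation of care"],
    },
    "factors": ["ethnicity", "age", "sex", "deprivation", "rurality"],
    "metrics": ["discrimination (AUC)", "calibration slope and intercept"],
    "training_data": "Source, consent basis, known gaps or biases",
    "evaluation_data": "Held-out cohort, date range, population coverage",
    "ethical_considerations": "Potential differential performance by subgroup",
    "caveats": "Re-assess as clinical context and case mix change",
}
```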

The Algorithm Information Request was co-developed with the Governance Group. We started by preparing a long list of questions relevant to the assessment of models and algorithms for COVID-19, drawing on a range of sources (Moons et al. 2015, 2019; Collins et al. 2021), and refined these as the first algorithms were scrutinised. Informed by the experience and expertise of the Governance Group, and by the various guidelines discussed below, we organised the questions around specific areas of interest.

On account of Aotearoa NZ’s successful COVID-19 elimination strategy in 2020, the experience of the pandemic at that time differed from that of many other countries. In particular, very low case numbers meant there was insufficient data to validate most COVID-19-specific models and algorithms for the Aotearoa NZ population. Decisions about the scope of the models and algorithms to be deployed on the Hub were made with an eye on Aotearoa NZ’s unique situation and needs.

Models included on the Hub

Over 30 models and calculators were reviewed for inclusion on the Hub, including a range of standard medical calculators that did not require extensive discussion. Of more interest to the Governance Group were models of pandemic spread and relative risk calculators.

The models operated at different levels of granularity and consequently the issues raised in their consideration were diverse. One of the first models considered and included was the COVID-19 stochastic branching process model (Hendy et al. 2021) developed by researchers from Te Pūnaha Matatini to simulate the spread and effects of COVID-19 in Aotearoa NZ and support a range of planning and policy requirements. Used extensively for scenario planning, this methodology is appropriate while case numbers are low and in the absence of widespread community transmission, as was the case at the time in Aotearoa NZ. Another was nzRISK (Campbell et al. 2019), a preoperative risk prediction tool that estimates the risk of mortality post-surgery; developed and validated in Aotearoa NZ for adult patients undergoing non-cardiac surgery, nzRISK aids in processing the anticipated backlog of unmet surgical demand. The Governance Group also evaluated and included a surgical scheduling algorithm (Soh et al. 2020) which performs a smart optimisation to book a surgical waitlist against available theatres and session slots. A further example was the Measuring Multimorbidity Index (Stanley and Sarfati 2017), which has been developed in Aotearoa NZ as an index for short-term mortality risk using chronic conditions identified from routine hospital admission data. While other comorbidity indices exist, they are largely developed from international data and so may not reflect Aotearoa NZ’s unique population and healthcare settings.
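For readers unfamiliar with the technique, the following is a minimal sketch of a stochastic branching process of the general kind used by Hendy et al. (2021). It is not the Te Pūnaha Matatini model itself, and all parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_outbreak(r_eff: float, k: float, n_seed: int,
                      max_cases: int = 10_000) -> int:
    """Total outbreak size from n_seed initial cases.

    Each case produces a negative-binomially distributed number of
    secondary cases (mean r_eff, dispersion k), capturing the
    superspreading behaviour that matters at low case numbers.
    """
    total = active = n_seed
    while active > 0 and total < max_cases:
        # Offspring counts for the current generation of active cases.
        offspring = rng.negative_binomial(k, k / (k + r_eff), size=active)
        active = int(offspring.sum())
        total += active
    return total

# Example: distribution of outbreak sizes from five seed cases.
sizes = [simulate_outbreak(r_eff=1.2, k=0.5, n_seed=5) for _ in range(200)]
print(f"median outbreak size: {np.median(sizes):.0f}")
```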

Table 1 shows an overview of the categories of algorithm included, as well as an example of each. A more detailed description of the included models and the associated governance decisions is provided in the supplemental material.

 

Table 1. Summary of algorithm categories included on Te Pokapū Hātepe o Aotearoa, with example of each.

A framework for algorithm governance

The review processes and associated Algorithm Information Request were informed by contemporary frameworks and guidelines. New Artificial Intelligence (AI) technologies have spawned a profusion of ethical guidelines, globally (Reisman et al. 2018; Koene et al. 2019; Future of Privacy Forum 2021) and locally (Privacy Commissioner 2018; AI Forum NZ 2020; Stats NZ 2020; Department of Industry, Science, Energy and Resources 2021). However, comparatively little has been published about governance processes to mediate the implementation of the broad ethical goals and values found in such frameworks (Hagendorff 2020). There is also limited guidance on how to apply these tools to evaluate a specific model against a healthcare use-case. In this respect, the Governance Group began exploring new territory, which generated associated iterative learning opportunities.

Ethical considerations

The Governance Group focused on identifying ethical risks and suggesting mitigations where possible. We did not apply a rigid ‘ethics checklist’; nonetheless, some familiar ethical concerns and associated principles emerged.

Following familiar consequentialist principles, we aimed to identify the burdens and benefits posed by an algorithm and its proposed implementation. Attention to implementation decisions is crucial. Consider nzRISK, a simple and transparent tool to inform patients’ choices (Campbell et al. 2019). The tool’s scores reflect existing background inequities in Aotearoa NZ’s health system, accurately predicting poorer surgical outcomes for Māori. This does not represent a flaw in the algorithm, as the disparities are real, lived, and so embodied in the data. However, were the tool used to prioritise patients most likely to have good outcomes, it would exacerbate those inequities. Implemented in some ways, the algorithm is valuable for helping patients and their family/whānau make informed choices; used in other ways, it might worsen existing inequities.

Algorithms should offer genuine benefits that outweigh any risks they pose, and these potential risks are broad. The Hub’s governance approach ensured that, through open discussion and consideration, the risks posed by the use of an algorithm were clearly identified.

Many ethical concerns about algorithms flow from the value of respect for persons; that is, from broadly deontological principles. Standard requirements for transparency and explainability, for instance, rest on the idea that those affected by algorithms must be able to make informed choices. We know too that many patients want health care decisions to be made by humans (Longoni et al. 2019). That preference may rest on the recognition that empathy is important in health care and algorithms cannot empathise, and that no algorithm will perfectly capture the richness of any individual’s life, or their clinical, social, biological or demographic characteristics. While some patients may at least occasionally prefer models with better-than-clinician accuracy (Pezzo and Beckstead 2020), many health consumers remain uneasy about processes that leave no role for human interpretation. We must respect those genuine concerns and the inherent limitations of algorithms, as recognised in Aotearoa NZ’s Algorithm Charter (Stats NZ 2020).

Some ethical concerns overlap with other areas of review by the Governance Group. The Hub’s ethical review canvassed broad issues of justice and fairness, motivated for instance by concern that algorithms might cause or exacerbate unequal access to health resources. These are outlined elsewhere in this paper.

Many of these issues require that different concerns be balanced. That is most obvious in the case of burdens and benefits, but even conduct which appears to violate apparent rights against discrimination may be permissible if, in the words of section 5 of the Bill of Rights Act, it can be ‘demonstrably justified in a free and democratic society’. To have a right, on one view, is to have a claim upon which one can insist. At the ethical coalface, matters are rarely so simple.

Overall, our approach can be seen to have followed familiar paths (autonomy, beneficence and nonmaleficence, and justice), promoting both comprehensive and consistent review across projects.

The context for Aotearoa New Zealand: Māori perspectives

In the context of providing an algorithm platform for Aotearoa NZ, specific consideration was given to the perspectives of Māori as tangata whenua. This approach is also relevant to other indigenous peoples and should be undertaken routinely.

The notion of indigenising algorithms involves multiple dimensions. The design of AI and machine learning (ML) systems should feature a high degree of control by Māori over what the system will look like, such that their development and implementation is relevant to Māori and serves Māori aspirations. The systems must perform well for Māori, ensuring that there is at least an equivalent capacity to benefit them. They must also satisfy Māori data sovereignty principles (Te Mana Raraunga 2018) while comporting with or expressing Māori ways of framing situations and acting in the world (Hudson et al. 2017). Conversely, systems developed without Māori control, and without at least an equivalent capacity to benefit Māori, do not honour Te Tiriti obligations; the impacts of such systems are often poor and can be harmful for Māori.

Acknowledging the distinction identified in Section ‘Ethical considerations’ about intended use, there was significant concern about the possibility of exacerbating existing health inequalities through the use of models that are often trained on data from the same systems where these inequities exist. There were sometimes untested assumptions regarding the suitability or impact of models for use by Māori and Māori organisations. Sometimes this came from problematic suggestions for use cases. And sometimes Māori populations had not been considered as separate groups within a population (or such analyses were planned only for newer versions of the model).

While the space for indigenising algorithms was relatively limited by the nature of this project, this work provided an opportunity to discuss issues relating to safe algorithm use from a Te Tiriti perspective, and a separate equity lens, and there were attempts to orient developers and users to these obligations and risks. The Algorithm Information Request form filled in by those submitting their models for inclusion on the Hub required the researchers to respond to core questions:

  • What are the relevant Māori considerations for the development and use of this model, and how have/are these being addressed? Consider specifically how this work can uphold the principles of Te Tiriti o Waitangi with reference to participation, protection and partnership.

  • Describe how appropriate decision making and community engagement has taken place.

  • Does the model explore, or is it able to detect, differences in outcome by population subgroup e.g. ethnicity, gender, age? Please include a specific Māori lens in your response.

  • Has the algorithm been tested for differential accuracy or validity by population subgroups e.g. ethnicity? (Comment with respect to factors such as goodness of fit, performance metrics, treatment of missing data.) Please include a specific Māori lens in your response.

  • Is there a potential for disproportionate benefit or disproportionate harm to one group or another in applying or interpreting the results? How do you propose to mitigate this? Please include a specific Māori lens in your response.

Although responses were inconsistent in quality, their completion (along with the face-to-face interview researchers had with the Governance Group) required explicit consideration of how their models might mitigate or perpetuate existing disparities in healthcare outcomes or quality of care. In particular, researchers were asked to consider their methods for dealing with missing or unique data, the potential for differential performance by population and the presence of pre-existing bias, and to suggest remedies for uses that might give rise to inequities. The Governance Group also decided that no personal information would be collected in the use of any of the models. Finally, while the models had already been created, we used the ‘Overview’ page for each model to highlight possible unintended inequitable impacts that might be caused by certain use cases of that model.
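As an illustration of the kind of subgroup check these questions call for, the sketch below computes a discrimination metric separately for each population group. The data is simulated purely for illustration; a real analysis would use a model’s validation cohort and its recorded ethnicity groupings.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Simulated stand-in data; a real check would use the model's
# validation cohort rather than random draws.
rng = np.random.default_rng(0)
n = 1_000
group = rng.choice(["Māori", "Pacific", "European", "Other"], size=n)
y_true = rng.integers(0, 2, size=n)                          # observed outcomes
y_score = np.clip(0.6 * y_true + 0.5 * rng.random(n), 0, 1)  # model scores

# Report discrimination per subgroup; large gaps flag differential accuracy.
for g in np.unique(group):
    mask = group == g
    auc = roc_auc_score(y_true[mask], y_score[mask])
    print(f"{g:9s} n={mask.sum():4d} AUC={auc:.3f}")
```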

Legal and privacy perspectives

Key legal considerations for the Governance Group centred around privacy and intellectual property rights. Human rights law considerations under the Human Rights Act 1993 were addressed when considering the ethical implications of an algorithm, as discussed above.

Privacy law concerns for health-related algorithms can arise in relation to the collection and accuracy of data used for training algorithms, as well as with how personal information is used within an algorithm. The Algorithm Information Request form included questions about how personal information may have been used in relation to model training as well as potential use. While many Hub applicants stated that their models did not involve the use of personal information, such assertions were typically probed by the Governance Group, recognising that most health records are built upon personal information. However, it was not always possible to get assurance as to how personal information in training data had been collected and used, or whether it had been appropriately interrogated for accuracy and bias issues.

Ideally, those submitting information about their models would provide full details and evidence of their training data, the extent to which personal information was included, how that information was obtained and how issues like potential historical bias in the data had been addressed. Where appropriate information is not provided, or there are concerns about privacy-related matters, such models should not be released.

Privacy concerns for the Hub itself were mitigated by restricting users from entering personal information when using models and by ensuring any outcomes inferred by the software following a calculation were not stored (Terms of Use and Privacy Policy). This allowed for the expediency required in a pandemic, but future enhancements of the Hub and similar initiatives will need to consider this issue closely.

Details of intellectual property right ownership and applicable licence terms were also requested in the Algorithm Information Request form. Such information was not always forthcoming. In other instances, commercial confidentiality concerns meant algorithms could not be made available on the Hub.

The Governance Group was not required to advise on broader legal risks associated with the Hub because such matters were dealt with by Orion Health Holdings Limited’s in-house legal team. As a result, it was sometimes unclear whose legal interests the Governance Group was supposed to consider. Liability issues were addressed in the Hub Terms of Use, which include disclaimers that the models are not intended to be used as a substitute for medical advice and that their predictive outcomes may not be accurate. Similar governance groups in future might like to more clearly define from the outset whose legal interests should be considered, so that relevant risks can be more readily identified and addressed.

Operational perspectives on the role of algorithms

Whilst clinicians may not always have to validate outputs of intelligent systems (AI Forum NZ 2020), it remains important to be vigilant about issues such as patient privacy and agency, in addition to consent and transparency, particularly in relation to building trust and safeguarding public good interests (Powles and Hodson 2017). Transparency is at the heart of digital decision support (Larsson 2018) and highlights the need for clear explanations, in language patients and their whānau can understand, that can be integrated into shared decision-making (Magalhaes and Hounslow 2019; Heyen and Salloch 2021; Stahl et al. 2021). However, health consumers are currently predominantly passive agents (Stahl et al. 2021) who are rarely invited to share their insights and lived experiences in co-designed digital partnerships (Digital Council for Aotearoa New Zealand 2020). When considering possible models for inclusion on the Hub, the Governance Group framed conversations to explore the developers’ awareness of the possible impacts of their models on different patient and family/whānau cohorts across different contexts. Te Tiriti-informed conversations about Māori patients and their whānau drew on Te Kotahi’s Mana-mahi governance framework as it has been applied to AI (West et al. 2020), presented in the preceding section on Māori perspectives.

Understanding of the more general operational considerations for algorithm selection evolved over time, as the platform was broadened from its initial COVID-19 based frame. Broadly, algorithm selection required consideration of internal and external validity, as well as the potential real-world circumstances of usage. Internal validity was the most straightforward to address, in that most algorithms selected had undergone a scientific peer review process. This was reinforced by the ability to inspect the algorithms and interview the researchers directly. Whilst publication does not always guarantee that appropriate methodology has been used or is reported (Dhiman et al. 2021), this aspect should continue to improve with the recent introduction of reporting standards for regression-based algorithms (Moons et al. 2015, 2019) and the proposal to extend these to AI models (Collins et al. 2021). Further thought needs to be given to whether relying on pre-existing peer review and publication standards may present a barrier to the presentation of models from non-academic sources.

External validity proved more difficult to examine. This was particularly the case with COVID-19 risk prediction models that were derived from overseas datasets, with populations and healthcare systems that differ from Aotearoa NZ’s. Questions of generalisability and calibration of these models (Calster et al. 2019), both with respect to geographic differences and to the temporal evolution of the pandemic, were difficult to resolve. An example is the evolution of the pandemic in terms of both new variants of the virus and the effect of vaccination on disease transmission and severity, neither of which were considerations for models derived in the earliest phases of the pandemic.
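A common way to quantify the calibration concern raised here (Calster et al. 2019) is to regress observed outcomes on the logit of predicted risks: an intercept near 0 and a slope near 1 suggest the model remains calibrated in the new population. The following is a minimal sketch of that check, using simulated data for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Simulated example: predictions from an overseas-derived model applied to
# a population whose true risk is systematically lower than predicted.
rng = np.random.default_rng(1)
p_pred = rng.uniform(0.05, 0.90, size=2_000)           # predicted risks
y_obs = rng.binomial(1, np.clip(0.7 * p_pred, 0, 1))   # observed outcomes

# Logistic recalibration: intercept ~0 and slope ~1 indicate good calibration.
logit_p = np.log(p_pred / (1 - p_pred))
fit = sm.Logit(y_obs, sm.add_constant(logit_p)).fit(disp=0)
intercept, slope = fit.params
print(f"calibration intercept: {intercept:.2f}, slope: {slope:.2f}")
```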

In the end, a pragmatic approach was taken, in the knowledge that post-derivation calibration of these models might not be feasible for the Aotearoa NZ population until case numbers were much higher. An additional focus was exploration of the inclusion or treatment of predictor variables representing ethnicity and social determinants of disease. Derivation or training of models from datasets that exclude marginalised or socio-economically disadvantaged communities, or discriminatory treatment of such variables in algorithm production, can result in encoded bias. This was of particular concern for those algorithms that might lead to increased inequity.

Discussion: themes arising from the governance experience

Several themes arose from the application of this governance framework process.

Governance Group discussions identified that the intended use of each algorithm can easily be overlooked. As with any technological advance, the proposed use may not directly suit the context of its design. Consequently, it was important to assess whether an algorithm was appropriate for the context in which it was to be deployed. This included understanding how it was developed, particularly in terms of methodology and data, and how the operationalisation of algorithm-enhanced decision making was intended to be managed.

Sometimes it was important to be explicit about in-scope and out-of-scope uses of each algorithm. For example, in the context of COVID-19, the stochastic spread-and-effects model (Hendy et al. 2021) was designed as a scenario tool, not a planning tool; it would be inappropriate to directly automate any planning decision based on the outputs of a model run. The nzRISK model for surgical outcomes (Campbell et al. 2019) was designed to inform decisions about the benefits and suitability of surgeries. It was noted that it would be easy to use it for rationing of care, despite that not being the intent of its developers. It could also be used to avoid burdensome steps in assessment pathways, such as routine consults for low-risk patients. This could greatly benefit those with difficulty accessing care, while also potentially adjusting the balance of high- and low-risk procedures performed. The equity implications of these changes could have unintended consequences, both positive and negative.

Another important perspective on context came when recognising the current processes that an algorithm would inform or replace. Existing processes are not always ideal, and therefore restricting the use of a new model due to its imperfection could have the effect of prolonging a worse status quo. For example, while the surgical scheduling algorithm (Soh et al. 2020) may not account for all possible trade-offs, it does account for more than the current manual process.

Perspectives unique to Aotearoa NZ were another common theme. COVID-19 models were evaluated in the context of limited cases, and consideration needed to be given to the monitoring and guidance that might be required for future utility. Aotearoa NZ at the time was pursuing an elimination strategy that substantially changed the dynamics of disease spread. Additionally, both Māori and Pacific peoples were unlikely to be well represented in international studies, yet both are most affected by past and current inequities, which amplify the impacts of COVID-19.

Given these inequities, the potential differential impacts of algorithms on these populations were closely examined for ways they might exacerbate disparities. The risks and benefits needed to be explored, recognised and challenged. Deprivation, gender, rurality and ethnicity are critical lenses through which to test any algorithm, to understand whether they are directly significant as independent variables, or whether a model’s accuracy may be differentially affected. The aim was to ensure that the practices arising from the use of algorithms are well informed and guided toward an overall goal of equitable outcomes.

Algorithms can be complex, and the interpretability of results becomes very important. Governance Group members asked whether end users could understand what the results mean, and how end users might be expected to act. Concepts such as relative risk and attributable risk can be challenging even for experts, so designing the interface and guidance notes for each algorithm required discussion. Some of these issues had not been considered explicitly prior to these reviews, particularly in the case of algorithms developed outside of Aotearoa NZ.
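To make the relative/attributable distinction concrete, consider a small worked example with invented numbers:

```latex
% Worked example with invented numbers: exposed-group risk p_1 = 2\%,
% unexposed-group risk p_0 = 1\%.
\[
  \text{Relative risk} = \frac{p_1}{p_0} = \frac{0.02}{0.01} = 2,
  \qquad
  \text{Attributable risk} = p_1 - p_0 = 0.02 - 0.01 = 0.01 .
\]
% "Twice the risk" sounds alarming, yet the absolute difference is one
% additional event per 100 people; how a result is framed shapes how end
% users act on it.
```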

A particularly difficult question to answer is whether a given algorithm has social licence, or indeed cultural licence, for use in Aotearoa NZ. Social licence refers to whether a practice is considered acceptable by the majority of citizens, whereas cultural licence acknowledges that individual risk/benefit judgements may not be sufficient to take into account the risks and benefits to collectives, including iwi and hapū (Te Mana Raraunga 2017). These questions are rarely asked directly, and the perceived presence or absence of social and cultural licence can be quickly swayed by recent headlines. Despite this difficulty, it was important to acknowledge the possibility that an algorithm could come into conflict with societal values. Egregious cases of misuse of private information are actually much easier to handle than borderline cases. If any algorithm is used for prioritisation, then almost by definition someone will receive lower priority as a result, perhaps reducing their access to services and better outcomes. In some applications, algorithms are strictly used for ‘easy yes’ cases and never for denying service. This is a sensible step to help maintain social licence, but it is unlikely to be sustainable and may in itself negate many of the benefits offered by algorithms and automation.

The impact of automating an ‘easy yes’ depends on what happens with the additional capacity enabled by such automation, and must be viewed through an equity lens. If additional capacity is used to progress ‘easy’ cases faster than others, then inequities could be worsened; however, if it increases the time spent on more complex cases, then inequities could be reduced.

A concluding question asked in each case concerned who would be empowered by the use of the algorithm, and how. This framing captured the range of underlying questions, including whether the algorithm affected individual patient care or aggregate population planning; different guidance was required in each case.

Conclusions and recommendations

In conclusion, the first phase of the Hub initiative aimed to evaluate and host COVID-19 related models and algorithms to provide a central, shared knowledge base and scalable infrastructure and tools to support Aotearoa NZ’s pandemic response. It was intended to lay the foundation for a national algorithm management solution that would provide value for the country’s health system beyond the current pandemic response, and also to create a replicable governance and evaluation framework that could be applied in other situations. Our experience highlighted some features that could usefully be taken into account by those considering algorithm governance and implementation in practice. Of particular note, the diversity of the Governance Group was a significant strength: a range of views was immediately available.

On reflection, some aspects of the Hub require further development. These include: enabling stronger feedback mechanisms for those using algorithms on the platform; developing capacity to evaluate model inputs and outputs over time; exploring formalised methods for assessing bias in derivation data and application; creating standards for transparency in algorithm development; revising standard questionnaires used for model assessment in light of lessons learned; and empowering and equipping consumers and underrepresented groups to co-design models and to be involved in their implementation and monitoring.

Additionally, the Hub was developed at a time of relatively low COVID-19 infection rates in Aotearoa New Zealand. Given the emergence of highly transmissible variants such as Omicron, which have led to much increased prevalence and health services burden, as well as the impact of vaccination and changes in policy with respect to public health protections, it would be appropriate to re-examine the performance and scope of the included algorithms. The value and risks of a given algorithm are highly dependent on the context of an outbreak and the associated response, meaning that responsive governance mechanisms must continue to be developed. Further, as Aotearoa emerges from a mindset born of urgency, wider consultation should be undertaken to ensure that continuation or expansion of this capability is supported by both practitioners and citizens.

Despite these gaps, the Hub was overall an effective way of addressing one of the key motivations that prompted its establishment: the lack of guidance around implementing general frameworks such as Aotearoa NZ’s Algorithm Charter. We hope that this composite description of our Governance Group’s experiences applying the framework can serve as a reference for other groups attempting a similar task.

Disclosure statement

No potential conflict of interest was reported by the author(s).