The CancerGrid experience: Metadata-based model-driven engineering for clinical trials

Jim Davies, Jeremy Gibbons, Steve Harris, Charles Crichton
<span title="">2014</span> <i title="Elsevier BV"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/hq6x4whtd5hhlhsxzculyeamey" style="color: black;">Science of Computer Programming</a> </i> &nbsp;
Highlights
• Summary, philosophy, and lessons of the CancerGrid project and follow-ons.
• Software support for cancer clinical trials, and similar data collection exercises.
• Metadata support, to enable subsequent meta-analysis.
• Model-driven generation of software artefacts to run trials.
• Four case studies.
Abstract
The CancerGrid approach to software support for clinical trials is based on two principles: careful curation of semantic metadata about clinical observations, to enable subsequent data integration, and model-driven generation of trial-specific software artefacts from a trial protocol, to streamline the software development process. This paper explains the approach, presents four varied case studies, and discusses the lessons learned.
From a software point of view, a clinical trial is largely an exercise in data management: observations have to be specified, collected, recorded, integrated, and analysed. But the software engineering aspects of setting up and running a clinical trial are not trivial. Two particular problems that we address in this paper are data integration and tool generation.
The data integration problem arises because medical researchers want to be able to combine the results of multiple trials, a process known as meta-analysis. A single trial in isolation often lacks the statistical power to yield a robustly significant conclusion. Nevertheless, if sufficiently many trials have been conducted, investigating sufficiently similar hypotheses and collecting sufficiently similar data, it may be possible to pool the results to greater effect. In other situations, the meta-analysis aims to evaluate new hypotheses formulated long after the completion of the trials that originally collected the data; in this case, data from trials investigating quite different hypotheses may be integrated. Either way, meta-analysis is only possible if metadata expressing the 'semantics' of the data has been captured and curated. Only then is it possible to determine whether data collected in different trials are commensurate, and if so, how to relate them. For example, when measuring blood pressure, it is not enough to record a pair of numbers, or a pair of pressures, or a pair of measurements in mmHg, or even to indicate that these represent systolic and diastolic pressure.
It is also necessary to know how the data were collected (at rest, or after five minutes on a treadmill?), and perhaps also who collected them (a professional in the clinic, or the patient at home?) and how recent and reliable they are. This semantic metadata is an essential part of the context of the data. It logically forms part of the model of the trial, alongside more syntactic metadata such as the name of the trial and the types of data items.
As for tool development, current standard practice in clinical trials management is to pass a textual document containing the trial protocol to database programmers in the clinical trials unit, or to consultants from a trials management software provider. They then use it as guidance in manually configuring an information management system for that particular trial. This practice causes numerous problems. Firstly, it induces delays. Some to-ing and fro-ing between the database programmers and the medical researchers is usually needed to get the details right, but the medics are often too busy to respond immediately, and it is not uncommon for a trial to have to start on paper because the software is not ready. Secondly, it is costly. This is less of a problem for large 'phase III' trials run on behalf of pharmaceutical companies seeking regulatory approval. Such a study has thousands of participants and a stable design, so software development forms only a small proportion of the overall cost and is likely to be recouped in sales over the lifetime of the drug. However, cost is a problem for early-phase exploratory studies and late-phase post-approval studies. The former are much smaller, more dynamic, and inherently risky, as animal models are an unreliable predictor of efficacy in humans. The latter are typically funded by charities, governments and NGOs in academic settings on tight budgets.
Even then, many promising drugs are never brought to market because the cost of approval outweighs the expected return on the drug. Thirdly, it is not uncommon for an early-phase trial protocol to change during the execution of the trial, requiring adjustments to software components of the associated trial management system. Current practice is to implement these changes by manually modifying the underlying code, running the risk of introducing bugs while the system is in production use. And finally, bespoke database design on a per-trial basis is unlikely to promote the consistency and interoperability needed for meta-analysis.
All four of these problems could be addressed if the development of the software tools needed to support trial execution could be automated. Fortunately, the trial protocol contains essentially enough information to determine the relevant software artefacts completely, either by generating them from scratch or by configuring more generic components. The protocol needs to be written anyway, not least for the purposes of regulatory approval. If it were written in a more structured format (that is, as a formal model of the trial, rather than merely a textual description), then both the prose and the code could be generated from it by suitable processing. Any adjustments required by changes to the trial protocol could then be made without risky manual intervention at the level of code. Moreover, annotating the data descriptions in the trial model with semantic metadata makes that model doubly useful. It becomes a basis for supporting meta-analysis, in addition to being a specification for a software system. In other words, clinical trials management is crying out for a model-driven approach.
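Both principles can be illustrated together in a small sketch. It assumes a deliberately simplified trial model in which each data element carries two kinds of metadata, mirroring the blood pressure example. The syntactic metadata is a field name, a value type and a unit. The semantic metadata is the measured concept, the collection context and the collector. The class `DataElement` and the functions `commensurate` and `generate_form` are hypothetical names introduced for illustration; they are not CancerGrid's actual tooling. `commensurate` decides whether data from two trials may be pooled by comparing semantics rather than field names. `generate_form` derives a trial-specific artefact from the model, here a plain-text data-entry form. A protocol change then means editing the model and regenerating, rather than patching code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One data item in a hypothetical, simplified trial model."""
    # Syntactic metadata: how the item appears in one particular trial's forms.
    name: str
    value_type: type
    unit: str
    # Semantic metadata: what was measured, under which conditions, and by whom.
    concept: str
    context: str
    collector: str


def commensurate(a: DataElement, b: DataElement) -> bool:
    """Return True if data recorded against `a` and `b` may be pooled.

    Field names are ignored, since two trials may call the same observation
    different things. Every other attribute must match exactly (an assumed,
    deliberately strict rule).
    """
    def key(e: DataElement) -> tuple:
        return (e.value_type, e.unit, e.concept, e.context, e.collector)
    return key(a) == key(b)


def generate_form(trial: str, elements: list[DataElement]) -> str:
    """Generate a plain-text data-entry form for `trial` from its model."""
    lines = [f"Case report form: {trial}"]
    for e in elements:
        lines.append(f"  {e.name} ({e.value_type.__name__}, {e.unit}): ____")
    return "\n".join(lines)


# Usage: systolic pressure as recorded by two trials and by a home-monitoring study.
sbp_a = DataElement("sbp", float, "mmHg", "systolic blood pressure", "at rest", "clinician")
sbp_b = DataElement("systolic", float, "mmHg", "systolic blood pressure", "at rest", "clinician")
sbp_home = DataElement("sbp", float, "mmHg", "systolic blood pressure", "at rest", "patient")

print(commensurate(sbp_a, sbp_b))     # True: different names, same semantics
print(commensurate(sbp_a, sbp_home))  # False: collected by the patient, not a clinician
print(generate_form("Trial A", [sbp_a]))
```

Requiring exact agreement on every annotation is an assumption made to keep the sketch short. A real system might instead record known conversions, for example between units, so that some non-identical elements can still be related.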
The CancerGrid approach, in a nutshell
The CancerGrid project [1] was initiated to address the twin problems of interoperability and generativity in clinical trials, taking a model-driven approach to the development of trials management tools. It was funded in the first instance for three years from 2005 by the UK Medical Research Council, with the involvement of five UK universities: Cambridge (specializing in oncology), Oxford (software engineering), University College London (semantic modelling), Birmingham (clinical trials management), and Belfast (telemedicine). Oxford University and the Cancer Research UK Cambridge Research Institute have continued the work since the original project ended in 2008.
The CancerGrid approach addresses the two problems of data integration and tool generation. It tackles the first through the collection and management of metadata, improving the science through greater effectiveness. It tackles the second through model-driven engineering, reducing drudgery through greater efficiency. Regarding the metadata, much of the interoperability requirement pivots on some kind of consensus on (or at least machine-processable documentation of) the common data elements being recorded. There can be no magic here: if two trials have collected incompatible data, or one of them has provided insufficient metadata to allow compatibility to be determined,
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/j.scico.2013.02.010">doi:10.1016/j.scico.2013.02.010</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/xkbamlmonffspc5wxgf7zbg6vi">fatcat:xkbamlmonffspc5wxgf7zbg6vi</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190417205828/https://core.ac.uk/download/pdf/82272809.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/f5/98/f5981b1b9ed9d71a3a1d0333df3c63bcff5775a8.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/j.scico.2013.02.010"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> elsevier.com </button> </a>