Enhancing the QUAlity and Transparency Of health Research
The UK EQUATOR Centre is hosting Anna Koroleva as a visiting fellow until the end of January. This young Russian researcher in computational linguistics is developing algorithms to detect spin in reports of clinical trials. Spin in clinical trials is defined as a distortion of the interpretation of a study's results, misleading readers into believing in positive effects that were not really there. Spin is also detected when researchers change the primary outcome of a study to emphasise secondary results rather than the main focus of the investigation. Anna will use machine learning to automatically detect claims and their supporting evidence in reports of clinical trials.
Anna Koroleva at the UK EQUATOR Centre, in Oxford, working on the algorithm to detect spin in research
Tell us how a linguist ended up conducting research in health reporting…
I was working for a company that builds computer programs to tackle large amounts of information, with advanced text and data analysis in many areas. A number of clients approached us with requests from various medical fields, and it became clear that the healthcare industry and research areas are in need of solutions to manage medical coding and detect specific issues in big databases. I realised that computational linguistics solutions could be used to reveal overlooked issues in large sets of texts.
As a linguist, I had worked only with theoretical analysis in the academic field until then, but I wanted to really see things working. This was when I got to know about MiRoR (Methods in Research on Research), Cochrane and the EQUATOR Network. I saw then that maybe there would be a way to use the power of extracting content automatically from texts to help research in health.
What was your first contact with research in journalology?
MiRoR is a programme bringing together several research institutions, both academic and non-academic. It includes 15 joint PhD projects, and I was accepted in one of them, hosted by the Centre National de la Recherche Scientifique (CNRS) and the Université Paris-Saclay in France. My project applies computational linguistics to the investigation of spin in health research.
Cochrane is part of the MiRoR project, and it hosts students for secondments. The Cochrane Schizophrenia Group contacted me with a proposition to start researching clinical trials in the field of mental health. As outcomes in mental health can be very tricky to define, this seemed a good place to start looking for bad reporting! Through this first contact with their work, I got a feel for what spin really is.
Who is guiding and supervising your research?
My main supervisor is Patrick Paroubek (from CNRS), a specialist in natural language processing working in the Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur (LIMSI, in the ILES group). Paroubek works on corpus annotation studies, that is, producing representation models usable for language processing.
My co-supervisor is Patrick Bossuyt (University of Amsterdam), a clinical epidemiologist who investigates ways to reduce waste in research; in particular, he participated in the development of the STARD reporting guideline for diagnostic test accuracy studies. But I also receive a lot of guidance from Isabelle Boutron (University Paris Descartes), the deputy director of the French EQUATOR Centre, who is also involved with Cochrane. My mentor is Liz Wager, a freelance consultant and trainer with a special interest in medical publishing and peer review, and a long-time collaborator with the EQUATOR Network.
What is the focus of your project? How is it evolving?
When we decided to investigate spin using the computational linguistics approach, we had to decide which phrases we would look for in the research reports as evidence of spin. After deepening my studies on what spin really is, I felt I should narrow the focus of the research to better find something that could be used as a “spin marker” or “false claim detector”. This is a daring proposition, and we are still in the initial phases of this research.
Due to feasibility problems, we had to change both the methods and the focus twice during the investigation. We first thought about comparing the clinical trial registration protocol and the actual clinical trial report. The second idea was to focus on the abstract: to compare the primary outcome declared in the abstract against the outcomes reported in the results and conclusions of the abstract. But it is a well-known problem that outcomes are frequently switched (which indicates spin), while the information needed to detect the switch is not always present in the abstract. So we finally focussed on searching for the declared primary outcomes, in the abstract and in the full text, checking whether they were reported at all and whether they matched the results actually emphasised in the conclusion.
I built, and am currently adjusting, a demo version of an algorithm to identify outcomes in clinical trial reports and to check the consistency of their reporting. It is an ongoing process of incorporating new ideas that I receive from other investigators, my supervisors and a scoping review conducted at the outset of this research project.
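The consistency check described here can be illustrated, in a deliberately simplified form, by naive token matching between a declared primary outcome and the sentences of a conclusion. This is only a sketch with invented example sentences, not Anna's actual algorithm, which involves far richer linguistic processing.

```python
# Toy sketch of a primary-outcome consistency check (NOT the real algorithm):
# flag a report when the primary outcome declared in the abstract or methods
# is not mentioned in the sentences emphasised in the conclusion.

def normalise(text):
    """Lowercase and strip punctuation so trivial variants still match."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return cleaned.split()

def outcome_mentioned(outcome, sentence):
    """True if every word of the outcome appears in the sentence."""
    sent_tokens = set(normalise(sentence))
    return all(tok in sent_tokens for tok in normalise(outcome))

def check_consistency(declared_primary, conclusion_sentences):
    """Return the conclusion sentences that mention the declared primary outcome."""
    return [s for s in conclusion_sentences if outcome_mentioned(declared_primary, s)]

# Hypothetical example: the declared primary outcome is absent from the
# conclusion, which emphasises a secondary outcome instead.
declared = "overall survival"
conclusion = [
    "The treatment significantly improved quality of life.",
    "Adverse events were comparable between groups.",
]
hits = check_consistency(declared, conclusion)
if not hits:
    print("Possible outcome switching: primary outcome not in conclusion")
```

In practice, plain token overlap fails on exactly the abbreviation and synonym problems discussed later in the interview, which is why the real system needs dedicated lexical resources.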
What do you expect with this experience in the EQUATOR Centre?
I already had some experience with the software used for computational linguistics, but I wanted to add features to many of these tools, because they do not fully fit the purpose of researching spin. So I began to develop my own algorithms instead. I am now working on the demo of this solution, and I hope I can interact with the researchers in EQUATOR to learn more, solve some interface problems and improve the “product” by drawing on EQUATOR's experience. It will be a very good way of piloting what I have already done.
What is the sensitivity you are aiming at with your spin-detector algorithm?
For new algorithms under construction in computational linguistics, a power of 60% to detect what we are looking for is generally considered acceptable. However, as Isabelle Boutron has pointed out, in spin detection a false positive result would be very bad. Therefore, we are aiming at a high sensitivity with this algorithm while, at the same time, avoiding noise from false positives. To verify this, we will need the collaboration of specialists (human specialists, I mean) to validate what the algorithm finds.
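The trade-off Anna describes can be made concrete with the two standard evaluation quantities involved, computed here on invented counts purely for illustration (these are not results from her project).

```python
# Illustration of the sensitivity / false-positive trade-off on invented counts.
# tp: spin instances the algorithm caught; fn: spin instances it missed;
# fp: spin-free reports wrongly flagged; tn: spin-free reports correctly passed.

def sensitivity(tp, fn):
    """Proportion of true spin instances that the algorithm detects (recall)."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Proportion of spin-free reports that the algorithm wrongly flags."""
    return fp / (fp + tn)

# Hypothetical evaluation: 60 of 100 real spin cases detected (the ~60%
# level often deemed acceptable for a new algorithm), while only 5 of
# 200 clean reports are wrongly flagged.
sens = sensitivity(60, 40)
fpr = false_positive_rate(5, 195)
print(f"sensitivity={sens:.2f}, false positive rate={fpr:.3f}")
```

The point of the human validation step is precisely to estimate these two rates: expert reviewers supply the true labels against which the algorithm's flags are counted.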
What have been the main challenges so far?
We have been hitting some bumps in the road. Abbreviations are a big problem: when you are looking for an outcome such as “survival”, you find that it can be reported as “OS” (overall survival), “EFS” (event-free survival) and many other types of survival, and you have to take care to add all of these to the algorithm.
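One common way to cope with this is to harvest abbreviation definitions from the report itself, since papers usually introduce each short form once in parentheses. The regex below is a naive sketch of that idea (a real system would use something closer to the Schwartz-Hearst abbreviation algorithm); the example sentence is invented.

```python
import re

# Naive sketch: harvest "long form (ABBR)" pairs from a report so that
# later outcome matching can expand abbreviations such as OS or EFS.

def harvest_abbreviations(text):
    """Map each ABBR to the preceding words whose initials spell it out."""
    pairs = {}
    for m in re.finditer(r"((?:\w+[- ]){1,6})\(([A-Z]{2,6})\)", text):
        # split the captured words on spaces and hyphens ("event-free" -> 2 parts)
        words = re.split(r"[- ]+", m.group(1).strip())
        abbr = m.group(2)
        candidate = words[-len(abbr):]  # one word part per abbreviation letter
        # accept only if the initials line up with the abbreviation
        if [w[0].upper() for w in candidate] == list(abbr):
            pairs[abbr] = " ".join(candidate)
    return pairs

text = "Primary outcomes were overall survival (OS) and event-free survival (EFS)."
print(harvest_abbreviations(text))
```

The one-word-part-per-letter heuristic is crude: it misses definitions whose long form does not align letter by letter, which is exactly why published abbreviation-detection algorithms are more elaborate.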
Other problems are synonyms and lexical variability in outcome expressions: “body weight decrease”, “body weight reduction” and “reduction of body weight” are only three of many synonyms used, sometimes within the same report, for the same outcome. Lengthy expressions pose additional challenges, as it becomes extremely difficult to differentiate between versions of the same outcome. How do we tell whether the authors are reporting the same outcome throughout the paper, or have switched the outcomes under investigation, when they refer to them by different names?
To tackle these problems we are building a dataset of synonyms, including eponyms, noun hierarchies, international coding of medical diseases, and collections of related terms.
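A minimal sketch of how such a synonym resource could be applied is to reduce each outcome expression to a normalised bag of content-word stems, so that lexical variants collapse to one key. The stopword list and stem table here are invented stand-ins for the curated resources described above; a real system would use a proper lemmatiser and the full synonym dataset.

```python
# Toy lexical normalisation for outcome names: variants such as
# "body weight decrease" and "reduction of body weight" should map
# to the same canonical key. Stem table and stopwords are invented.

STOPWORDS = {"of", "the", "in", "a", "an"}
SYNONYM_STEMS = {"decrease": "reduc", "reduction": "reduc", "reduced": "reduc"}

def outcome_key(expression):
    """Return an order-independent key built from stemmed content words."""
    words = [w.lower().strip(",.") for w in expression.split()]
    stems = [SYNONYM_STEMS.get(w, w) for w in words if w not in STOPWORDS]
    return tuple(sorted(stems))

variants = [
    "body weight decrease",
    "body weight reduction",
    "reduction of body weight",
]
keys = {outcome_key(v) for v in variants}
print(len(keys))  # all three variants collapse to a single key, so this prints 1
```

Sorting the stems makes the key insensitive to word order, which handles the "reduction of body weight" vs "body weight reduction" case; genuinely different outcomes still produce different keys.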
Published on 28 January 2019