Unveiling the SURUS Dataset: A Comprehensive Look at Interventional Study Abstracts
In the evolving landscape of clinical research, the need for high-quality data has never been more pronounced. Our dataset, drawn from PubMed, one of the primary sources of published clinical evidence, captures the essential details of interventional study reports. Let’s take a closer look at how the dataset was constructed and why it matters for Natural Language Processing (NLP) tasks, specifically Named Entity Recognition (NER).
Dataset Composition
Our dataset comprises 400 abstracts from interventional studies, representing four key therapeutic areas defined in the World Health Organization’s ICD-11: cardiovascular diseases, endocrine disorders, neoplasms, and respiratory diseases. Each area contributes 100 randomly selected abstracts, capturing the considerable variation in interventional study reporting styles across therapeutic fields.
To further enhance versatility, an additional 123 out-of-domain abstracts were incorporated. This group consists of 90 abstracts from different therapeutic areas and 33 from various study types. The aim was clear: to reflect the real-world variety found in interventional publication abstracts.
Expert Annotations
A hallmark of the dataset is its meticulous expert annotations. Each abstract was manually labeled, with entities assigned to one of 25 distinct labels across seven classes. This granular approach was designed to extract not only key elements of PICOS (Population, Intervention, Comparator, Outcome, Study Design) but also other important information that might aid in comprehensive analysis.
For example, while "Population" may include methodologies and disease indications, other elements, such as "overall survival", can receive different labels depending on context (for instance, whether they appear in the methods or the results section). This level of detail adds to the intricacy of the annotation process and ensures that nuances in the text are captured.
Annotation Process and Quality Assurance
Quality assurance in the annotation phase was paramount. Graduate students with biomedical or pharmaceutical backgrounds carried out the work, guided by a detailed annotation manual and an intensive course on the annotation methodology. Regular “consensus sessions” and expert reviews kept the annotations consistent and of high quality.
The systematic framework resulted in 39,531 annotations across the 400 abstracts, averaging nearly 99 annotations per abstract. Inter-annotator agreement was robust, revealing a Cohen’s κ of 0.81 and an F1 score of 0.88, affirming the dataset’s reliability.
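For readers who want to see how such agreement figures are typically computed, here is a minimal token-level sketch using scikit-learn; the label names are illustrative, and the actual SURUS agreement calculation may differ (for example, it may score agreement at the entity level rather than the token level).

```python
# Minimal sketch: token-level agreement between two annotators.
# Label names are illustrative; the real evaluation may be entity-level.
from sklearn.metrics import cohen_kappa_score, f1_score

annotator_a = ["O", "B-Population", "I-Population", "O", "B-Outcome"]
annotator_b = ["O", "B-Population", "I-Population", "O", "O"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
micro_f1 = f1_score(annotator_a, annotator_b, average="micro")
print(f"Cohen's kappa: {kappa:.2f}, micro-F1: {micro_f1:.2f}")
```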
Leveraging the Dataset: Training the NER Model
Once the annotations were completed, the next step was training the NER model. The abstracts were tokenized with the BERT tokenizer. Because BERT is limited to 512 subword tokens, a sliding-window approach was used for abstracts exceeding this length, allowing longer abstracts to be processed without losing critical information.
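As a rough sketch of how such a sliding window can be set up with the Hugging Face tokenizer (the checkpoint name and stride below are our assumptions, not necessarily the authors’ exact settings):

```python
# Sketch: splitting a long abstract into overlapping 512-token windows.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # assumed checkpoint

abstract = "..."  # full abstract text
encoded = tokenizer(
    abstract,
    max_length=512,
    truncation=True,
    stride=128,                      # illustrative overlap between windows
    return_overflowing_tokens=True,  # emit one encoding per window
    return_offsets_mapping=True,     # map subwords back to character spans
)
print(f"{len(encoded['input_ids'])} windows of at most 512 subword tokens")
```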
The model was trained to assign BILOU tags (Beginning, Inside, Last, Outside, Unit), which mark entity boundaries more precisely than the traditional BIO format, and was optimized with a learning rate of 5e-5 for 8 epochs. This regimen proved important for achieving high accuracy in entity recognition.
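The following sketch makes the tagging scheme and hyperparameters concrete using the Hugging Face Trainer API; the checkpoint name, label arithmetic, and output directory are assumptions rather than the authors’ exact configuration.

```python
from transformers import AutoModelForTokenClassification, TrainingArguments

# BILOU example: the L- tag closes a multi-token span, and a single-token
# entity would receive a U- ("Unit") tag instead of B-.
tokens     = ["The", "overall", "survival", "rate", "improved"]
bilou_tags = ["O", "B-Outcome", "I-Outcome", "L-Outcome", "O"]  # label name illustrative

# Fine-tuning setup with the hyperparameters reported above.
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",       # assumed checkpoint
    num_labels=25 * 4 + 1,   # 25 entity labels x B/I/L/U variants + O
)
training_args = TrainingArguments(
    output_dir="surus-ner",  # assumed
    learning_rate=5e-5,
    num_train_epochs=8,
)
```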
Evaluating the Model’s Performance
Model evaluation occurred in two main settings: in-domain and out-of-domain. In-domain performance was assessed with tenfold cross-validation, providing a robust estimate of the model’s predictive capabilities. For out-of-domain testing, the system was evaluated on abstracts from other therapeutic areas and study types to assess its versatility.
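A minimal sketch of tenfold cross-validation over the in-domain abstracts is shown below; the helper functions `load_annotated_abstracts`, `train_ner_model`, and `evaluate_f1` are hypothetical names, not part of the SURUS code.

```python
# Sketch: tenfold cross-validation over the 400 in-domain abstracts.
from sklearn.model_selection import KFold

abstracts = load_annotated_abstracts()  # hypothetical loader returning 400 items
kfold = KFold(n_splits=10, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in kfold.split(abstracts):
    model = train_ner_model([abstracts[i] for i in train_idx])                # hypothetical
    fold_scores.append(evaluate_f1(model, [abstracts[i] for i in test_idx]))  # hypothetical

print(f"Mean F1 across folds: {sum(fold_scores) / len(fold_scores):.2f}")
```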
Practical Utility of the SURUS Dataset
The SURUS dataset’s utility extends beyond the annotations themselves; it also serves as a resource for systematic literature reviews. By comparing SURUS predictions to expert annotations from Cochrane reviews, we evaluated the accuracy of its extracted PICOS elements, using precision, recall, and F1 to gauge performance in a realistic use case.
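As an illustration, span-level precision, recall, and F1 can be computed by exact matching of predicted and reference (label, text) pairs; the matching criterion and the example spans below are ours, not necessarily those used in the SURUS evaluation.

```python
# Sketch: exact-match precision/recall/F1 for extracted PICOS elements.
def span_prf(predicted: set, reference: set) -> tuple:
    """Each element is a (label, text) pair; exact-match scoring."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

pred = {("Outcome", "overall survival"), ("Intervention", "nivolumab")}    # invented examples
ref  = {("Outcome", "overall survival"), ("Population", "advanced NSCLC")}
print(span_prf(pred, ref))  # (0.5, 0.5, 0.5)
```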
Exploring LLM Performance
In recent experiments, we also compared the performance of state-of-the-art large language models (LLMs) such as GPT-4 on the SURUS dataset. These evaluations further illustrated the comparative strengths and weaknesses of different models on NER tasks.
Conclusion
The SURUS dataset stands as a pioneering effort to synthesize high-quality annotations from a diverse set of interventional study abstracts. Its depth and granularity not only support advanced NLP tasks but also enhance the overall quality of research across various therapeutic domains. As this dataset becomes more widely accessible, it promises to advance both clinical research methodologies and AI capabilities in understanding intricate medical texts.
For those interested in delving deeper, the methods, code, and complete dataset are available in our Git repository, fostering transparency and collaboration within the research community.