Enhancing Vulnerability Management with Machine Learning: Rapid7’s Automated CVSS Scoring Solution

Introduction

This post is co-written with Jimmy Cancilla from Rapid7. Organizations are managing increasingly distributed systems, which span on-premises infrastructure, cloud services, and edge devices…

Rapid7’s Solution Architecture

Rapid7 built their end-to-end solution using Amazon SageMaker AI…

Orchestrating with SageMaker AI Pipelines

The first step in the journey toward end-to-end automation was removing manual activities…

Designing the Pipeline

After refactoring, pipeline steps were moved to SageMaker Training and Processing jobs…

Data Loading and Preprocessing

The data used to train the model comprised existing vulnerabilities…

Model Training, Evaluation, and Deployment

For the remaining pipeline steps, Rapid7 executed each step eight times…

Defining the Pipeline

With all the steps defined, a pipeline object is created with all the steps…

Managing Models with SageMaker Model Registry

SageMaker Model Registry is a repository for storing, versioning, and managing ML models…

Deploying Models with Inference Components

When a set of CVSS scoring models has been selected, they can be deployed in…

Ensuring Cost Efficiency

Cost efficiency was a key consideration in designing this workflow…

Monitoring Models in Production

Rapid7 continually monitors the models in production…

Conclusion

End-to-end automation of vulnerability scoring model development and deployment has given Rapid7 a consistent, fully automated process…

About the Authors

Jimmy Cancilla is a Principal Software Engineer at Rapid7…

Enhancing Vulnerability Management with Automated Machine Learning: A Deep Dive into Rapid7’s Approach

This post is co-written with Jimmy Cancilla from Rapid7.

Organizations today grapple with managing complex, distributed systems that extend across on-premises infrastructure, cloud services, and edge devices. As these systems interconnect and share data, the potential avenues for exploitation increase, making effective Vulnerability Management (VM) crucial in mitigating risk.

Understanding Vulnerability Management

Vulnerability Management refers to the comprehensive process of identifying, classifying, prioritizing, and remediating security weaknesses in various assets, including software, hardware, IoT devices, and virtual machines. When new vulnerabilities are identified, organizations face pressure to respond promptly; any delay can lead to exploits, data breaches, and reputational damage. For organizations managing thousands, or even millions, of software assets, it becomes essential to effectively triage and prioritize these vulnerabilities.

The Role of CVSS in Vulnerability Management

The Common Vulnerability Scoring System (CVSS) has emerged as an industry standard for evaluating the severity of software vulnerabilities. CVSS v3.1 offers a structured framework that scores vulnerabilities based on dimensions such as exploitability, impact, and attack vectors. Major organizations, including NIST, rely on CVSS to prioritize remediation efforts and comply with standards.
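To make the scoring framework concrete, here is a minimal Python sketch of how a CVSS v3.1 base vector maps to a base score. It follows the base score equations and metric weights published in the CVSS v3.1 specification; the function names (`roundup`, `base_score`) are our own, and the sketch ignores temporal and environmental metrics. It illustrates the standard, not Rapid7's code.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# The Privileges Required weight depends on whether Scope is Changed.
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer-based method in the spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector such as 'CVSS:3.1/AV:N/...'."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix}")
    m = dict(part.split(":") for part in parts)
    scope = m["S"]

    # Impact sub-score.
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15

    # Exploitability sub-score.
    exploitability = (
        8.22
        * WEIGHTS["AV"][m["AV"]]
        * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]]
        * WEIGHTS["UI"][m["UI"]]
    )

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

Because the score is a deterministic function of the vector, predicting the eight metric values is enough to recover a severity score.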

Nevertheless, a significant gap exists in the immediate aftermath of a vulnerability disclosure. Vendors are not required to provide a CVSS score, and regulatory bodies set no specific timelines for CVSS analysis. Consequently, many vulnerabilities surface without an associated CVSS score, leaving organizations unsure whether to patch immediately, monitor, or deprioritize. Rapid7 aims to close this gap by using machine learning (ML) to give its customers timely severity estimates.

Rapid7’s Automated Solution Architecture

To tackle this challenge, Rapid7 uses Amazon SageMaker AI, a managed ML service from AWS, to automate the training, validation, and deployment of ML models that predict CVSS vectors. The predicted vectors give customers a severity estimate for newly disclosed vulnerabilities before an official score is available, so they can prioritize remediation sooner.

Building End-to-End Automation with SageMaker

Rapid7’s journey toward automating vulnerability scoring began with transitioning from manual tasks to a fully automated ML pipeline. By integrating SageMaker with their existing DevOps tools (GitHub for version control and Jenkins for build automation), Rapid7 implemented continuous integration and continuous deployment (CI/CD) processes for their CVSS scoring models. This ensures that their models are continuously updated with the latest vulnerability data, minimizing operational overhead.

Orchestrating Automation with SageMaker Pipelines

The automation process starts by migrating experimental code from Jupyter notebooks to robust Python scripts. Each ML pipeline step, from data download to model deployment, is implemented as a separate Python module, which improves maintainability.

The CVSS v3.1 base vector consists of eight metrics, so Rapid7 trains eight distinct models, one per metric, in parallel while sharing common data preprocessing steps. Because the data is updated regularly, the pipeline is configured to download the newest vulnerability data, structure it for training, and execute the training jobs.
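The shape of this workflow (one shared preprocessing step fanning out to eight per-metric training jobs) can be sketched in plain Python. This is an illustration only, not SageMaker SDK code and not Rapid7's implementation: the record layout, the helper names, and the placeholder "model" (which simply picks the most frequent label) are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# The eight CVSS v3.1 base metrics, in vector order.
METRICS = ["AV", "AC", "PR", "UI", "S", "C", "I", "A"]


def preprocess(raw_records):
    """Shared step, run once: normalize descriptions and keep the labels."""
    return [
        {"text": r["description"].strip().lower(), "labels": r["labels"]}
        for r in raw_records
    ]


def train_metric_model(metric, dataset):
    """Placeholder trainer: the 'model' is just the most frequent label."""
    counts = Counter(row["labels"][metric] for row in dataset)
    return metric, counts.most_common(1)[0][0]


def run_pipeline(raw_records):
    """Preprocess once, then train one model per metric in parallel."""
    dataset = preprocess(raw_records)
    with ThreadPoolExecutor(max_workers=len(METRICS)) as pool:
        results = pool.map(lambda m: train_metric_model(m, dataset), METRICS)
        return dict(results)


def to_vector(predictions):
    """Assemble per-metric predictions into a CVSS v3.1 vector string."""
    return "CVSS:3.1/" + "/".join(f"{m}:{predictions[m]}" for m in METRICS)


# Hypothetical training records, for illustration only.
records = [
    {"description": " Remote code execution in a parser ",
     "labels": {"AV": "N", "AC": "L", "PR": "N", "UI": "N",
                "S": "U", "C": "H", "I": "H", "A": "H"}},
    {"description": "Buffer overflow reachable over the network",
     "labels": {"AV": "N", "AC": "L", "PR": "N", "UI": "N",
                "S": "U", "C": "H", "I": "H", "A": "H"}},
    {"description": "Local privilege escalation",
     "labels": {"AV": "L", "AC": "L", "PR": "L", "UI": "N",
                "S": "U", "C": "H", "I": "H", "A": "H"}},
]
print(to_vector(run_pipeline(records)))
```

In a managed pipeline, each call to `train_metric_model` would correspond to its own training job, while `preprocess` would run once as a shared upstream step.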

Managing and Deploying Models Efficiently

The SageMaker Model Registry tracks model versions and manages the evolution of the CVSS scoring models. Each model version that meets the accuracy threshold is registered automatically, while versions that fall below the threshold trigger alerts, so Rapid7 deploys only models that pass evaluation.
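The register-or-alert decision can be expressed as a small gate on evaluation results. The sketch below is a plain-Python illustration; the threshold value, the `EvaluationResult` type, and the `gate` function are hypothetical, since the post does not state the thresholds Rapid7 uses or how the check is coded.

```python
from dataclasses import dataclass

# Hypothetical threshold; the actual value is not given in the post.
ACCURACY_THRESHOLD = 0.85


@dataclass
class EvaluationResult:
    metric: str       # CVSS metric the model predicts, e.g. "AV"
    accuracy: float   # accuracy on the held-out evaluation set


def gate(results, threshold=ACCURACY_THRESHOLD):
    """Split evaluated models into those to register and those to alert on."""
    to_register = [r.metric for r in results if r.accuracy >= threshold]
    to_alert = [r.metric for r in results if r.accuracy < threshold]
    return to_register, to_alert


results = [
    EvaluationResult("AV", 0.93),
    EvaluationResult("AC", 0.81),
    EvaluationResult("PR", 0.88),
]
register, alert = gate(results)
print("register:", register)  # register: ['AV', 'PR']
print("alert:", alert)        # alert: ['AC']
```

Keeping the gate as a pure function makes the threshold easy to test independently of the registry and alerting integrations.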

Once selected, the models are deployed on SageMaker endpoints for real-time inference, so CVSS vectors can be predicted as soon as fresh vulnerability data becomes available.

Optimizing Cost Efficiency

Cost efficiency was a major consideration in this workflow. Hosting each model on its own dedicated compute risks paying for idle resources, so Rapid7 deployed the models as Inference Components on their SageMaker endpoint. This lets multiple models share resources, which lowers cost while keeping response times fast: the endpoint achieves sub-second responses during burst periods.
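As a rough sketch of what sharing an endpoint looks like in configuration terms, the helper below builds one request per model, all pointing at the same endpoint. The field names follow our understanding of the SageMaker `CreateInferenceComponent` request shape and should be checked against the current API reference; the endpoint name, model names, component naming scheme, and resource sizes are hypothetical, and the code makes no AWS calls.

```python
def build_inference_component_requests(
    endpoint_name,
    variant_name,
    model_names,
    cpu_cores=1,
    memory_mb=1024,
    copies=1,
):
    """Build one inference component request per model on a shared endpoint."""
    requests = []
    for metric, model_name in model_names.items():
        requests.append(
            {
                # Hypothetical naming scheme: one component per CVSS metric.
                "InferenceComponentName": f"cvss-{metric.lower()}",
                "EndpointName": endpoint_name,
                "VariantName": variant_name,
                "Specification": {
                    "ModelName": model_name,
                    # Each component reserves only a slice of the instance.
                    "ComputeResourceRequirements": {
                        "NumberOfCpuCoresRequired": cpu_cores,
                        "MinMemoryRequiredInMb": memory_mb,
                    },
                },
                "RuntimeConfig": {"CopyCount": copies},
            }
        )
    return requests


# Hypothetical names, for illustration only.
requests = build_inference_component_requests(
    endpoint_name="cvss-endpoint",
    variant_name="AllTraffic",
    model_names={"AV": "cvss-av-model", "AC": "cvss-ac-model"},
)
for request in requests:
    print(request["InferenceComponentName"], "->", request["EndpointName"])
```

Assuming the field names are right, each dictionary could then be passed to the SageMaker client, for example `boto3.client("sagemaker").create_inference_component(**request)`. Because every component targets the same endpoint, the models draw on one pool of instances instead of each holding its own.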

Monitoring and Improving in Production

To maintain high availability and resource efficiency, Rapid7 continually monitors deployed models using Amazon CloudWatch and Grafana. Regular monitoring allows rapid adjustments based on performance metrics, ensuring responsive operation and minimizing delays in the vulnerability remediation pipeline.

Conclusion

End-to-end automation has significantly streamlined the development of Rapid7’s vulnerability scoring models. By moving away from fragile manual operations, the engineering team has reclaimed at least 2 to 3 days of maintenance time each month. In addition, sharing compute resources across the ML models has cut cloud compute costs by nearly 50%.

Most importantly, automation guarantees that Rapid7 customers receive the most recent Common Vulnerabilities and Exposures (CVEs) with assigned CVSS scores. This is particularly vital for insights provided by Rapid7’s Active Risk Scores, which rely heavily on accurate CVSS scores to assess vulnerability impact.

As we embrace automated ML processes with technologies like Amazon SageMaker, teams can focus on driving innovation and delivering value to customers while reducing operational overhead and enhancing security posture.

About the Authors

Jimmy Cancilla is a Principal Software Engineer at Rapid7, specializing in machine learning and AI applications for cybersecurity challenges.
Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS.
Steven Warwick is a Senior Solutions Architect at AWS, focusing on SaaS architectures and Generative AI solutions.

By learning from these advancements, organizations can optimize their vulnerability management strategies to navigate an increasingly interconnected world successfully.
