Understanding the Prior and Posterior in Bayesian Inference for Anomalous User Behavior Detection

Today, I want to dive deeper into the technical details of how we calculate the values of \(\alpha_{prior}\) and \(\beta_{prior}\) in our Bayesian inference model at Fortscale. In my previous posts, I explained how we use these values to incorporate prior knowledge and prevent false alerts for users who have never acted anomalously.

The prior is a crucial component of Bayesian inference because it lets us fold what we already know into the probability calculation. In our case, it addresses a specific problem: a user whose history consists entirely of zero SMART values would otherwise trigger an alert for any positive value. By choosing \(\alpha_{prior}\) and \(\beta_{prior}\) well, we can strike a balance between incorporating organizational knowledge and giving weight to the user’s actual data.

To determine the values of \(\alpha_{prior}\) and \(\beta_{prior}\), we need to consider the organization’s overall level of anomalous activity. The more anomalous activity there is across the organization, the higher the analyst’s interest threshold. We can simulate this effect with the prior by setting \(\alpha_{prior}\) to the number of SMART values in the organization and \(\beta_{prior}\) to their sum. This way, the prior encodes how much anomalous activity the organization sees.
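As a rough illustration of this first approach, here is a minimal Python sketch. The function names and the sample values are hypothetical. The update rule assumes the standard conjugate setup of a Gamma prior over the rate of an exponential likelihood, which this post does not spell out, so treat it as an assumption rather than a description of our production code.

```python
from typing import Sequence, Tuple


def naive_prior(org_smart_values: Sequence[float]) -> Tuple[float, float]:
    """Build the prior from every SMART value seen in the organization."""
    alpha_prior = float(len(org_smart_values))  # number of SMART values
    beta_prior = float(sum(org_smart_values))   # sum of SMART values
    return alpha_prior, beta_prior


def posterior(alpha_prior: float, beta_prior: float,
              user_smart_values: Sequence[float]) -> Tuple[float, float]:
    """Assumed conjugate update: add the user's count and sum to the prior."""
    alpha_post = alpha_prior + len(user_smart_values)
    beta_post = beta_prior + sum(user_smart_values)
    return alpha_post, beta_post


# Made-up organization-wide SMART values, for illustration only.
org_values = [0.0, 0.0, 5.0, 0.0, 15.0, 0.0, 0.0, 20.0]
alpha_prior, beta_prior = naive_prior(org_values)

# A user whose history is all zeros still inherits the organization's sum.
alpha_post, beta_post = posterior(alpha_prior, beta_prior, [0.0, 0.0, 0.0])
```

With this setup, a user with an all-zero history no longer ends up with a \(\beta\) of zero, because the organization's values are already counted in the prior.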

However, setting \(\alpha_{prior}\) too high makes the prior too influential, so that the user’s own data has minimal impact on the calculated probability. To address this, we experimented with real-life data and found that setting \(\alpha_{prior}\) to a reasonably small number, such as 20, and setting \(\beta_{prior}\) to \(\alpha_{prior}\) times the average of the organization’s SMART values, strikes the right balance.
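The scaled prior can be sketched in a few lines of Python. The helper names are hypothetical, and `prior_weight` assumes a conjugate update in which the user's number of observations is added to \(\alpha_{prior}\); that update rule is an assumption on my part and is not stated in this post.

```python
from typing import Sequence, Tuple


def scaled_prior(org_smart_values: Sequence[float],
                 alpha_prior: float = 20.0) -> Tuple[float, float]:
    """Fix alpha_prior to a small number and scale beta_prior to match
    the organization's average SMART value."""
    if not org_smart_values:
        raise ValueError("need at least one organization SMART value")
    org_mean = sum(org_smart_values) / len(org_smart_values)
    beta_prior = alpha_prior * org_mean
    return alpha_prior, beta_prior


def prior_weight(alpha_prior: float, n_user_values: int) -> float:
    """Share of the posterior alpha contributed by the prior
    (assumes alpha_post = alpha_prior + n_user_values)."""
    return alpha_prior / (alpha_prior + n_user_values)


# Illustrative numbers: 10,000 organization values versus a user with 30.
weight_if_alpha_is_org_count = prior_weight(10_000.0, 30)
weight_if_alpha_is_20 = prior_weight(20.0, 30)
```

In this made-up case the prior accounts for almost all of the posterior \(\alpha\) when \(\alpha_{prior}\) is the organization-wide count, and for 40% of it when \(\alpha_{prior}\) is 20, which leaves room for the user's own 30 observations to matter.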

Choosing a smaller \(\alpha_{prior}\) reduces the prior’s influence, allowing the user’s data to affect their threshold while still taking into account the organization’s level of anomalous activities. The variance of the prior also increases, allowing for some uncertainty in the expected value. This balance between the organization’s knowledge and the user’s data is crucial in personalizing the threshold and reducing false alerts.
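To see why the variance grows, here is a short worked derivation. It assumes the prior is a Gamma distribution over a rate parameter \(\lambda\), with shape \(\alpha_{prior}\) and rate \(\beta_{prior}\); this parameterization is an assumption, as is the symbol \(m\) for the organization’s average SMART value.

\[
\beta_{prior} = \alpha_{prior}\, m
\quad\Rightarrow\quad
E[\lambda] = \frac{\alpha_{prior}}{\beta_{prior}} = \frac{1}{m},
\qquad
\mathrm{Var}[\lambda] = \frac{\alpha_{prior}}{\beta_{prior}^{2}} = \frac{1}{\alpha_{prior}\, m^{2}}
\]

Under these assumptions, the prior’s expected value depends only on the organization’s average, while its variance is inversely proportional to \(\alpha_{prior}\). Lowering \(\alpha_{prior}\) from the organization-wide count to 20 therefore keeps the same expectation and widens the distribution around it.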

In conclusion, calculating the values of \(\alpha_{prior}\) and \(\beta_{prior}\) in our Bayesian inference model requires careful consideration of the organization’s level of anomalous activities and the desired influence of the user’s data. By striking the right balance, we can effectively detect and prevent insider threats while minimizing false alerts and maintaining personalized thresholds.
