Predicting Employee Attrition: A Data-Driven Approach Using SHAP
Predicting Employee Attrition: Utilizing Machine Learning for Workforce Retention
Highly skilled employees leaving a company suddenly can create significant challenges. Employee attrition can lead to costly disruptions, as recruiting and training new hires who understand the company culture takes considerable time and resources. This brings us to a pivotal question:
“What if we could predict who might leave and understand why?”
While many attribute employee departures to disengagement at work or better opportunities elsewhere, the reality is often more nuanced. A sudden wave of resignations in an office can be alarming, and without recognizing patterns, organizations may miss valuable insights that could help them retain their top talent.
So, do companies and HR departments actively seek to minimize the loss of valuable employees? Absolutely! In this article, we’ll explore how a straightforward machine learning model can help predict employee attrition and how the SHAP (SHapley Additive exPlanations) tool can provide insights for effective action.
Understanding the Problem
According to a 2024 report by WorldMetrics, 33% of employees who leave their jobs do so because of a lack of career development opportunities, a statistic that highlights the need for proactive measures. In an example company that loses 180 employees over time, this translates to roughly 60 resignations tied to limited career development alone.
What is Employee Attrition?
As defined by Gartner, employee attrition is “the gradual loss of employees when positions are not refilled, often due to voluntary resignations, retirements, or internal transfers.” Understanding the root causes of attrition is critical for organizational sustainability.
How Does Analytics Help HR Proactively Address It?
The HR department is uniquely positioned to leverage analytics to identify the root causes of employee attrition. By employing analytics, HR teams can uncover historical attrition trends, spot demographic patterns, and design targeted retention strategies.
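As a rough illustration of what "uncovering patterns" can look like in practice, the sketch below computes the attrition rate within each group of a column. The records are hypothetical toy values, and the helper name `attrition_rate_by` is an assumption made for this example, not part of any library.

```python
import pandas as pd

# Hypothetical toy records; the values are made up purely for illustration.
records = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "R&D", "R&D", "R&D", "R&D", "HR"],
    "OverTime":   ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
    "Attrition":  ["Yes", "No", "No", "No", "No", "Yes", "No", "No"],
})

def attrition_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return the share of leavers within each group of `column`."""
    left = frame["Attrition"].eq("Yes")          # True where the employee left
    return left.groupby(frame[column]).mean().sort_values(ascending=False)

print(attrition_rate_by(records, "OverTime"))
print(attrition_rate_by(records, "Department"))
```

Comparing rates rather than raw counts keeps small and large groups on the same footing, which matters when departments differ a lot in size.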
What is the SHAP Approach?
SHAP is a method for interpreting machine learning model outputs. It attributes each prediction to the input features that drove it, helping HR understand the "why" behind a predicted resignation.
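To make the idea behind SHAP more concrete, here is a minimal sketch of the Shapley value calculation it builds on, computed exactly for a two-feature toy score. This is not how the shap library is implemented internally; `toy_risk` and its numbers are hypothetical and exist only to show how credit is shared between features.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values for a small set of features.

    value_fn maps a frozenset of feature names to the model output
    when only those features are treated as present.
    """
    n = len(features)
    result = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Weight of this coalition in the Shapley average
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        result[feature] = total
    return result

def toy_risk(present):
    """Hypothetical attrition risk score with made-up contributions."""
    score = 0.10                                   # baseline risk
    if "overtime" in present:
        score += 0.20
    if "low_satisfaction" in present:
        score += 0.15
    if "overtime" in present and "low_satisfaction" in present:
        score += 0.10                              # extra risk when both occur
    return score

phi = shapley_values(toy_risk, ["overtime", "low_satisfaction"])
print(phi)
```

The two attributions add up to the gap between the full score and the baseline, which is the additive property that gives SHAP its name.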
To get started with SHAP, install it with one of the following commands (the leading ! is only needed when running inside a Jupyter or Colab notebook cell):
!pip install shap
or
conda install -c conda-forge shap
Dataset Overview
For this analysis, we’ll use the IBM HR Analytics Employee Attrition & Performance dataset, which contains data on 1,470 employees. Key variables include:
- Attrition: Whether the employee left or stayed
- Over Time, Job Satisfaction, Monthly Income, Work-Life Balance
This dataset serves as the foundation for training a model that predicts employee attrition and for using SHAP to explain its predictions.
Steps to Predict Employee Attrition Using SHAP
Step 1: Load and Explore the Data
First, we will load the dataset and conduct preliminary exploration to understand its structure.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# Load the dataset
df = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')
print("Shape of dataset:", df.shape)
print("Attrition value counts:\n", df['Attrition'].value_counts())
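Beyond shape and class counts, a few quick data-quality checks can be useful before modeling. The sketch below is one possible approach: the helper `summarize_columns` and the miniature frame are hypothetical stand-ins, and the same function could be called on `df` once the CSV is loaded.

```python
import pandas as pd

def summarize_columns(frame: pd.DataFrame) -> dict:
    """Collect a few quick data-quality facts about a DataFrame."""
    unique_counts = frame.nunique(dropna=False)
    return {
        # Total number of missing cells across the whole frame
        "missing_values": int(frame.isna().sum().sum()),
        # Columns with a single value carry no signal for a model
        "constant_columns": unique_counts[unique_counts == 1].index.tolist(),
        # Columns where every row differs often turn out to be identifiers
        "all_unique_columns": unique_counts[unique_counts == len(frame)].index.tolist(),
    }

# Hypothetical miniature frame standing in for the real dataset.
toy = pd.DataFrame({
    "EmployeeId": [1, 2, 3, 4],
    "StandardHours": [80, 80, 80, 80],
    "MonthlyIncome": [3000, 5200, None, 3000],
})
summary = summarize_columns(toy)
print(summary)
```

Constant and identifier-like columns are usually candidates for dropping before training, since they cannot help the model generalize.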
Step 2: Preprocess the Data
Next, we will preprocess the data by encoding categorical features and splitting it into training and testing sets.
# Convert the target variable to binary
df['Attrition'] = df['Attrition'].map({'Yes': 1, 'No': 0})
# Encode categorical features
label_enc = LabelEncoder()
categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols:
    df[col] = label_enc.fit_transform(df[col])
# Define features and target
X = df.drop('Attrition', axis=1)
y = df['Attrition']
# Stratify on the target so both splits keep the same share of leavers
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
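Label encoding assigns each category an arbitrary integer, which can suggest an order that does not exist (for example, between departments). As a hedged alternative, the sketch below one-hot encodes the categorical columns with `pd.get_dummies`. The rows are hypothetical; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset, values are made up.
toy = pd.DataFrame({
    "Department": ["Sales", "Research & Development", "Human Resources"],
    "OverTime": ["Yes", "No", "Yes"],
    "MonthlyIncome": [4200, 6100, 3900],
})

# One indicator column per category; numeric columns pass through unchanged
encoded = pd.get_dummies(toy, columns=["Department", "OverTime"], dtype=int)
print(encoded.columns.tolist())
```

One-hot encoding widens the feature matrix, so whether it is worth it depends on how many distinct categories each column has.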
Step 3: Build the Model
We’ll utilize the XGBoost classifier to build the model.
from xgboost import XGBClassifier
from sklearn.metrics import classification_report
# Initialize and train the model
# (use_label_encoder is deprecated in recent XGBoost releases, so it is omitted)
model = XGBClassifier(eval_metric="logloss")
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred))
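When reading the classification report, keep in mind that leavers are usually the minority class, so accuracy alone can look better than the model really is. The sketch below works through the arithmetic with hypothetical confusion-matrix counts; the function name and the numbers are assumptions for illustration only.

```python
def minority_class_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 for the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts: 50 leavers among 300 employees; the model catches
# 15 of them and raises 10 false alarms.
metrics = minority_class_metrics(tp=15, fp=10, fn=35, tn=240)
print(metrics)
```

In this made-up case accuracy is 85% even though the model finds fewer than a third of the people who actually leave, which is why recall and precision for the "left" class deserve the most attention.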
Step 4: Explain the Model with SHAP
Using SHAP, we can see which features contributed most to the model’s attrition predictions.
import shap
# Initialize SHAP
shap.initjs()
explainer = shap.Explainer(model)
shap_values = explainer(X_test)
# Summary plot
shap.summary_plot(shap_values, X_test)
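A common way to turn per-employee SHAP values into a single ranking is to average their absolute values per feature. The sketch below shows that calculation on a hypothetical attribution matrix; with the code above, the matrix would presumably come from `shap_values.values` and the names from `X_test.columns`, but that wiring is an assumption rather than something shown in this article.

```python
import numpy as np

def rank_features_by_mean_abs(values: np.ndarray, feature_names: list) -> list:
    """Rank features by mean absolute attribution, largest first."""
    importance = np.abs(values).mean(axis=0)     # one score per feature
    order = np.argsort(importance)[::-1]         # indices from high to low
    return [(feature_names[i], float(importance[i])) for i in order]

# Hypothetical attributions: 3 employees (rows) x 3 features (columns).
toy_values = np.array([
    [0.30, -0.10, 0.05],
    [-0.20, 0.05, 0.00],
    [0.40, -0.15, -0.05],
])
ranking = rank_features_by_mean_abs(
    toy_values, ["OverTime", "JobSatisfaction", "MonthlyIncome"]
)
print(ranking)
```

Taking absolute values first matters: a feature that pushes some predictions up and others down would otherwise cancel out and look unimportant.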
Step 5: Visualize Key Relationships
Further insights can be uncovered by visualizing relationships within the data.
import seaborn as sns
import matplotlib.pyplot as plt
# Visualizing Attrition vs OverTime
# (after Step 2 both columns are encoded: 0 = No, 1 = Yes)
plt.figure(figsize=(8, 5))
sns.countplot(x='OverTime', hue="Attrition", data=df)
plt.title("Attrition vs OverTime")
plt.xlabel("OverTime")
plt.ylabel("Count")
plt.show()
Business Insights from the Data
Here are five key insights derived from the analysis:
| Feature | Insight |
|---|---|
| Over Time | High overtime increases attrition |
| Job Satisfaction | Higher satisfaction reduces attrition |
| Monthly Income | Lower income may lead to higher attrition |
| Years At Company | Newer employees are more likely to leave |
| Work-Life Balance | Poor balance is linked to higher attrition |
Key Insights for HR Departments
- Employees working overtime tend to leave more frequently.
- Low job satisfaction significantly increases the risk of attrition.
- Monthly income has an impact, albeit less than overtime and job satisfaction.
Revising Policies
To mitigate attrition, HR can:
- Revisit compensation plans: Ensure competitive salaries to retain talent.
- Reduce overtime or offer incentives: Address employee burnout and enhance job satisfaction.
- Improve job satisfaction through employee feedback: Actively seek input to guide workplace improvements.
- Promote a better work-life balance: Encourage practices that support employee wellness.
Conclusion
Predicting employee attrition through machine learning and SHAP can help companies retain their best employees and reduce the cost of turnover. By understanding who might leave and why, organizations can create proactive strategies to address these concerns before it’s too late.
Frequently Asked Questions
- What is SHAP?
  SHAP explains the impact of each feature on a model’s prediction.
- Is this model applicable to real companies?
  Yes, with proper tuning and data, it can be very useful.
- Can I use other models?
  Absolutely, logistic regression and random forests are also viable options.
- What are the primary reasons employees leave?
  Key factors include overtime demands, low job satisfaction, and poor work-life balance.
- How can HR utilize these insights?
  To formulate better policies aimed at retaining employees.
With tools like SHAP, companies can not only predict employee attrition but also understand and address what drives it.
Jyoti Makkar is a writer and an AI generalist, co-founding WorkspaceTool.com to help businesses discover and select the best software solutions.