By Janvi, Data Science Enthusiast at Analytics Vidhya
Hosting Your Machine Learning Notebook in Databricks: A Step-by-Step Guide
Databricks has emerged as one of the leading platforms for building and executing machine learning (ML) notebooks at scale. It combines the power of Apache Spark with a user-friendly notebook interface, integrated data tooling, and efficient experiment tracking capabilities. Whether you’re a data scientist, student, or just starting your journey into machine learning, this guide will take you through the steps to host your ML notebook in Databricks using the Free Edition.
Understanding Databricks Plans
Before diving in, it’s important to understand the various Databricks plans:
- Free Edition:
  - Best for individuals and small projects.
  - Features include:
    - A single-user workspace
    - Access to a small compute cluster
    - Support for languages like Python, SQL, and Scala
    - MLflow integration for experiment tracking
  - Drawbacks: limited resources, timeouts after idle periods, and some enterprise features are disabled.
- Standard Plan:
  - Suitable for small teams.
  - Offers larger compute clusters and collaboration features.
- Premium Plan:
  - Introduces advanced security features and user management.
- Enterprise/Professional Plan:
  - Designed for production environments requiring advanced governance and automation.
This tutorial will focus on the Free Edition, perfect for testing and learning without a financial commitment.
Hands-On: Hosting Your ML Notebook in Databricks
Step 1: Sign Up for Databricks Free Edition
- Visit the Databricks Free Edition signup page.
- Sign up using your email, Google, or Microsoft account.
- Once signed in, a workspace is automatically created, serving as your command center for controlling notebooks and clusters.
Step 2: Create a Compute Cluster
To execute code, you’ll need to create a compute cluster:
- Navigate to Compute in the sidebar.
- Click on Create Cluster.
- Name your cluster and select a runtime (preferably the Databricks Runtime for Machine Learning).
- Click Create and wait for it to show a status of Running.
Note: Clusters may shut down after a period of inactivity in the Free Edition, but you can restart them as needed.
Step 3: Import or Create a Notebook
You can use an existing ML notebook or create a new one:
- To import a notebook:
  - Navigate to Workspace.
  - Use the dropdown beside your folder → Import → File, then upload your .ipynb or .py file.
- To create a new one:
  - Click Create → Notebook.
  - Bind the notebook to your running cluster using the dropdown at the top.
Step 4: Install Dependencies
If your notebook requires libraries like scikit-learn, pandas, or matplotlib, you can install them directly within the notebook:
%pip install scikit-learn pandas xgboost matplotlib
Tip: Databricks may restart the Python process after installing libraries, so you might need to re-run earlier cells or restart the notebook state before the updated packages take effect.
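After the environment restarts, it can be worth confirming in a fresh cell that the packages are actually visible to the current Python process. A minimal sketch using only the standard library's importlib.metadata:

```python
# Check which of the packages from the %pip install step are visible to
# the running Python process, using only the standard library.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["scikit-learn", "pandas", "xgboost", "matplotlib"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

If a package you just installed shows up as "not installed", the environment restart mentioned above has usually not happened yet.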
Step 5: Run the Notebook
You’re now ready to execute your code:
- Press Shift + Enter to run a single cell, or use Run All to execute the entire notebook.
- Outputs will appear similarly to those in Jupyter notebooks.
Step 6: Coding in Databricks
With your environment set up, let’s look at a brief example using regression modeling to predict customer satisfaction (NPS score):
- Load and Inspect Data:
# Load the dataset from the workspace files area and preview the first rows
from pathlib import Path
import pandas as pd
DATA_PATH = Path("/Workspace/Users/[email protected]/nps_data_with_missing.csv")
df = pd.read_csv(DATA_PATH)
df.head()
- Train/Test Split:
from sklearn.model_selection import train_test_split
TARGET = "NPS_Rating"
# Hold out 20% of rows for evaluation; fix the seed for reproducibility
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
- Quick Exploratory Data Analysis (EDA):
import matplotlib.pyplot as plt
import seaborn as sns
# Check how the target is distributed before modeling
sns.histplot(train_df[TARGET], bins=10, kde=True)
plt.title("Distribution of NPS Ratings")
plt.show()
- Data Preparation with Pipelines:
Set up pipelines for data preprocessing and model training, evaluate model performance, visualize the predictions, and even analyze feature importance.
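The pipeline step above can be sketched end to end with scikit-learn. The column names and the synthetic data below are illustrative stand-ins (the real nps_data_with_missing.csv is not reproduced here); the structure — imputation and scaling inside a Pipeline, wrapped around a regressor — is the pattern being described:

```python
# A minimal sketch of the preprocessing-plus-model pipeline described above.
# Column names and the synthetic frame are illustrative stand-ins for the
# real NPS dataset; the missing values are injected deliberately.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Support_Calls": rng.integers(0, 10, n).astype(float),
    "Tenure_Months": rng.integers(1, 60, n).astype(float),
    "NPS_Rating": rng.integers(0, 11, n).astype(float),
})
# Inject missing values so the imputer has work to do
df.loc[df.sample(frac=0.1, random_state=42).index, "Support_Calls"] = np.nan

TARGET = "NPS_Rating"
features = [c for c in df.columns if c != TARGET]
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Impute missing values and scale features before fitting a simple regressor
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
model = Pipeline([
    ("prep", ColumnTransformer([("num", numeric, features)])),
    ("reg", LinearRegression()),
])

model.fit(train_df[features], train_df[TARGET])
preds = model.predict(test_df[features])
print(f"MAE: {mean_absolute_error(test_df[TARGET], preds):.2f}")
```

Keeping the imputer and scaler inside the Pipeline means they are fit only on the training split, which avoids leaking test-set statistics into preprocessing.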
Step 7: Save and Share Your Work
Databricks automatically saves your notebooks. To export and share:
- Open the notebook's File menu (or the three-dot menu) and choose the export/download option to save the notebook as .ipynb, .dbc, or .html.
- You can also link to a GitHub repository for version control.
Things to Know About Free Edition
While the Free Edition is great for experimentation, keep these limitations in mind:
- Clusters shut down after idle time (approximately 2 hours).
- Storage capacity is limited.
- Some enterprise capabilities are not included.
- Not ideal for production workloads.
Nevertheless, it’s an excellent environment for learning ML and testing models.
Conclusion
Databricks simplifies the cloud execution of ML notebooks. With no local installation required, the Free Edition is a perfect entry point to develop and test your machine learning models. As your projects grow or require more collaboration, you can easily upgrade to a paid plan.
Ready to get started? Sign up for the Databricks Free Edition today and unleash the potential of your machine learning notebooks in a seamless environment.
Frequently Asked Questions
Q1: How do I start using Databricks for free?
A: Sign up for the Databricks Free Edition at databricks.com/learn/free-edition to access a single-user workspace, small compute cluster, and MLflow support.
Q2: Do I need to install anything locally to run Databricks?
A: No, the Free Edition is completely browser-based; you can create clusters and run ML code online.
Q3: How do I install Python libraries in my notebook?
A: Use %pip install library_name inside a notebook cell, or install from a requirements.txt file using %pip install -r requirements.txt.
Hi, I’m Janvi, a passionate data science enthusiast currently working at Analytics Vidhya. My journey into the world of data started with a curiosity about how to extract meaningful insights from complex datasets.