Elevate Your Strategy: Transition from Chain-of-Thought to Chain-of-Draft on Amazon Bedrock


Revolutionizing AI Efficiency: The Chain-of-Draft (CoD) Prompting Technique

As organizations scale their generative AI implementations, they face the critical challenge of balancing quality, cost, and latency. Inference costs account for 70–90% of operational expenses in large language model (LLM) environments, and verbose prompting inflates token counts by 3–5x, so organizations need more efficient ways to interact with these models.

The Inefficiency Dilemma

Traditional prompting methods produce detailed answers, but the extra tokens they generate add overhead that raises costs and slows responses. As businesses refine their AI strategies, they need a leaner way to prompt.

This blog post introduces Chain-of-Draft (CoD), a prompting technique proposed by Zoom AI Research in the recent paper Chain of Draft: Thinking Faster by Writing Less. CoD changes how models work through reasoning tasks, offering a more concise alternative to the widely used Chain-of-Thought (CoT) prompting strategy.

Understanding Traditional Chain-of-Thought (CoT) Prompting

Chain-of-Thought prompting encourages models to reason through problems in a detailed, step-by-step manner. This technique has proven particularly effective for complex tasks such as logical puzzles and mathematical problems. CoT helps break down intricate queries, improving accuracy and transparency.

Example of CoT Prompting:

Question: If there are 5 apples and you eat 2 apples, how many apples remain?
CoT Response: Start with 5 apples. I eat 2. Subtract 2 from 5. 5 – 2 = 3 apples remaining.

Despite its effectiveness, CoT’s verbose nature leads to excessive token usage and increased latencies, making it less suitable for real-time applications. The detailed outputs can also complicate the integration of responses into downstream processes.
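A minimal sketch of how the CoT example could be packaged for a model through the Amazon Bedrock Converse API. The instruction text is modeled on the CoT prompt described in the Chain of Draft paper (an assumption; check the paper for the exact wording), and build_cot_request is a hypothetical helper that only assembles the request, so it runs without AWS credentials.

```python
# Instruction modeled on the CoT prompt described in the Chain of Draft paper
# (assumption: verify the exact wording before reuse).
COT_SYSTEM_PROMPT = (
    "Think step by step to answer the following question. "
    "Return the answer at the end of the response after a separator ####."
)


def build_cot_request(question: str) -> dict:
    """Assemble system and messages arguments in the Converse API's shape."""
    return {
        "system": [{"text": COT_SYSTEM_PROMPT}],
        "messages": [{"role": "user", "content": [{"text": question}]}],
    }


request = build_cot_request(
    "If there are 5 apples and you eat 2 apples, how many apples remain?"
)
print(request["messages"][0]["content"][0]["text"])
```

The dictionary can be unpacked into a Bedrock runtime client call, for example client.converse(modelId=..., **request).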

Introducing Chain-of-Draft (CoD) Prompting

Chain-of-Draft, by contrast, cuts verbosity by reducing each reasoning step to a concise draft. Inspired by the way people jot down only the essentials when solving a problem, CoD encourages models to produce compact, high-signal intermediate steps.

Key Features of CoD:

  • Each reasoning step is limited to five words or fewer, promoting clarity and focus.
  • It encourages minimalism without sacrificing accuracy, which is particularly effective in high-redundancy reasoning chains.
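The five-word limit is easy to check mechanically. A minimal sketch, assuming draft steps are separated by commas, semicolons, or newlines (the CoD example in this post uses commas); the helper names are hypothetical, and real model output may need different splitting.

```python
import re

MAX_WORDS_PER_STEP = 5  # CoD's per-step limit


def draft_steps(reasoning: str) -> list[str]:
    """Split a draft into steps; commas, semicolons, and newlines are assumed delimiters."""
    parts = re.split(r"[,;\n]", reasoning)
    return [part.strip() for part in parts if part.strip()]


def steps_over_limit(reasoning: str, max_words: int = MAX_WORDS_PER_STEP) -> list[str]:
    """Return the steps whose whitespace-separated word count exceeds max_words."""
    return [step for step in draft_steps(reasoning) if len(step.split()) > max_words]


concise = "Start: 20, End: 12, 20 - 12 = 8"
wordy = "Jason started with twenty lollipops in total, then gave some to Denny"

print(steps_over_limit(concise))  # []
print(steps_over_limit(wordy))    # ['Jason started with twenty lollipops in total']
```

Counting whitespace-separated tokens treats symbols such as "-" and "=" as words, which is a deliberately strict reading of the limit.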

Example of CoD Prompting:

Question: Jason had 20 lollipops. He gave Denny some. Now Jason has 12. How many did he give?
CoD Response: Start: 20, End: 12, 20 – 12 = 8.

The CoD response reaches the same answer through the same logic as a CoT response, using far fewer tokens.
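The lollipop example above can double as a few-shot demonstration. A sketch, assuming the Converse API and a system instruction modeled on the CoD prompt reported in the Zoom paper (verify the exact wording against the paper); build_cod_request and FEW_SHOT_EXAMPLES are hypothetical, and the "#### 8" suffix assumes the answer-separator convention from that prompt.

```python
# Instruction modeled on the CoD prompt reported in the Chain of Draft paper
# (assumption: verify the exact wording against the paper).
COD_SYSTEM_PROMPT = (
    "Think step by step, but only keep a minimum draft for each thinking step, "
    "with 5 words at most. Return the answer at the end of the response after "
    "a separator ####."
)

# The lollipop example, reused as a single few-shot demonstration.
FEW_SHOT_EXAMPLES = [
    (
        "Jason had 20 lollipops. He gave Denny some. Now Jason has 12. "
        "How many did he give?",
        "Start: 20, End: 12, 20 - 12 = 8 #### 8",
    ),
]


def build_cod_request(question: str) -> dict:
    """Return system and messages arguments shaped for the Bedrock Converse API."""
    messages = []
    for example_question, example_answer in FEW_SHOT_EXAMPLES:
        # Each demonstration is a user turn followed by an assistant turn.
        messages.append({"role": "user", "content": [{"text": example_question}]})
        messages.append({"role": "assistant", "content": [{"text": example_answer}]})
    # The real question goes last, as a fresh user turn.
    messages.append({"role": "user", "content": [{"text": question}]})
    return {"system": [{"text": COD_SYSTEM_PROMPT}], "messages": messages}


request = build_cod_request(
    "If there are 5 apples and you eat 2 apples, how many apples remain?"
)
print([message["role"] for message in request["messages"]])
```

The demonstration is sent as an earlier user/assistant exchange, so the model sees the draft style before it answers the new question.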

Why CoD Works

The CoD technique enhances efficiency for several reasons:

  • Reduced Output Length: Shorter outputs result in lower inference latency and cut costs associated with token usage.
  • Cleaner Outputs: Less verbose reasoning leads to simpler downstream processing, improving integration.
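Cleaner output is easiest to use downstream when the final answer sits in a predictable place. A minimal sketch, assuming the prompt tells the model to return its answer after a "####" separator, the convention used in the Chain of Draft paper's prompts (treat the separator as an assumption if your prompt differs); extract_answer is a hypothetical helper.

```python
from typing import Optional


def extract_answer(response_text: str, separator: str = "####") -> Optional[str]:
    """Return the text after the last separator, or None if the model omitted it."""
    if separator not in response_text:
        return None
    return response_text.rsplit(separator, 1)[1].strip()


print(extract_answer("Start: 20, End: 12, 20 - 12 = 8 #### 8"))  # 8
print(extract_answer("The answer is 8"))  # None
```

Returning None instead of raising lets the caller decide whether a missing separator should trigger a retry.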

CoD also keeps accuracy comparable to CoT. According to the original Zoom AI research, CoD achieved 91.4% accuracy on benchmark tests while cutting output tokens by up to 92.1% and halving latency for several models.

Practical Implementation with AWS

Using Amazon Bedrock and AWS Lambda, we can implement CoD and measure its efficiency. As a test case, consider the “Red, Blue, and Green Balls” puzzle, which requires careful logical reasoning.

To run these tests, you need the following prerequisites:

  • An AWS account with permissions to create and run Lambda functions and to access Amazon Bedrock models.
  • An IAM execution role configured for the Lambda function.
  • Basic familiarity with Python and the AWS SDK for Python (Boto3).
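For the IAM prerequisite, a least-privilege inline policy for the Lambda execution role might look like the sketch below. Assumptions: the Converse API call is authorized by the bedrock:InvokeModel action, models are invoked directly in us-east-1 (the Region used in the sample code), and CloudWatch logging is granted separately by attaching the AWS managed policy AWSLambdaBasicExecutionRole. Calling a model through a cross-Region inference profile requires additional resource ARNs.

```python
import json

REGION = "us-east-1"  # same Region as the sample Lambda client

# Least-privilege sketch: allow model invocation only; narrow the wildcard
# to the specific foundation model once you have chosen one.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "InvokeBedrockFoundationModels",
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": f"arn:aws:bedrock:{REGION}::foundation-model/*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Generating the document in code keeps the Region consistent with the client configuration; the printed JSON can be pasted into the IAM console.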

Sample Lambda Code for CoD Prompt:

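import json
import time
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Bedrock runtime client; pick a Region where your chosen model is available
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder: replace with the ID of the Amazon Bedrock model you selected
MODEL_ID = "your-model-id"

def lambda_handler(event, context):
    # Initialize the conversation with the CoD prompt. The Converse API expects
    # each content entry to be a content block such as {"text": "..."}.
    conversation = [{"role": "user", "content": [{"text": "Your CoD prompt here"}]}]
    ...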

Don’t forget to modify the MODEL_ID to match your selected model.
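The sample above elides the rest of the handler. One way the missing step could look, as a sketch rather than the authors' code: send the prompt with the Converse API and record token usage and latency, the quantities compared in the results that follow. run_prompt and FakeBedrockClient are hypothetical; the response fields follow the Converse response shape, and the fake client's token counts are arbitrary placeholders.

```python
import time


def run_prompt(client, model_id: str, prompt: str) -> dict:
    """Send one user prompt via the Bedrock Converse API and collect metrics."""
    messages = [{"role": "user", "content": [{"text": prompt}]}]
    start = time.perf_counter()
    response = client.converse(modelId=model_id, messages=messages)
    elapsed = time.perf_counter() - start  # client-side wall-clock latency
    usage = response["usage"]
    return {
        "answer": response["output"]["message"]["content"][0]["text"],
        "input_tokens": usage["inputTokens"],
        "output_tokens": usage["outputTokens"],
        "total_tokens": usage["totalTokens"],
        "latency_s": round(elapsed, 2),
    }


class FakeBedrockClient:
    """Hypothetical offline stand-in returning a Converse-shaped response."""

    def converse(self, modelId, messages):
        return {
            "output": {
                "message": {
                    "role": "assistant",
                    "content": [{"text": "Start: 20, End: 12, 20 - 12 = 8 #### 8"}],
                }
            },
            "usage": {"inputTokens": 40, "outputTokens": 15, "totalTokens": 55},
        }


result = run_prompt(FakeBedrockClient(), "example-model-id", "How many did he give?")
print(result["answer"], result["total_tokens"])
```

Inside the Lambda function, the call would be run_prompt(bedrock, MODEL_ID, prompt). Converse responses also include metrics.latencyMs, the latency reported by the service, which can be logged alongside the client-side timing.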

Testing Results: CoD vs. CoT

Running the Lambda function with each prompt type shows that CoD is considerably more efficient:

| Model | Prompt type | Input tokens | Output tokens | Total tokens | Latency (s) |
|---|---|---|---|---|---|
| Model-1 | Chain of Thought | 109 | 241 | 350 | 3.28 |
| Model-1 | Chain of Draft | 123 | 93 | 216 | 1.58 |
| Model-2 | Chain of Thought | 109 | 492 | 601 | 3.81 |
| Model-2 | Chain of Draft | 123 | 19 | 142 | 0.79 |
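The relative savings in the results table can be computed directly. The sketch below copies the four rows and reports the percentage change from CoT to CoD; a negative value means CoD used more, as with input tokens, likely because the CoD prompt is longer.

```python
# Measurements copied from the results table:
# (input tokens, output tokens, total tokens, latency in seconds)
RESULTS = {
    "Model-1": {"CoT": (109, 241, 350, 3.28), "CoD": (123, 93, 216, 1.58)},
    "Model-2": {"CoT": (109, 492, 601, 3.81), "CoD": (123, 19, 142, 0.79)},
}
FIELDS = ("input_tokens", "output_tokens", "total_tokens", "latency_s")


def reduction_pct(before: float, after: float) -> float:
    """Percentage decrease from before to after (negative means an increase)."""
    return round(100 * (before - after) / before, 1)


def summarize(results: dict) -> dict:
    """Map each model to the CoT-to-CoD reduction for every measured field."""
    return {
        model: {
            field: reduction_pct(cot, cod)
            for field, cot, cod in zip(FIELDS, rows["CoT"], rows["CoD"])
        }
        for model, rows in results.items()
    }


for model, savings in summarize(RESULTS).items():
    print(model, savings)
```

With the table's numbers, output tokens drop by 61.4% (Model-1) and 96.1% (Model-2), latency drops by 51.8% and 79.3%, and input tokens rise by 12.8% for both models.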

When Not to Use CoD

Despite its advantages, CoD is not a universal solution. Key considerations include:

  • Zero-shot Use Cases: CoD may struggle without strong few-shot examples.
  • High Interpretability Tasks: Tasks requiring thorough explanations, such as legal reviews, may benefit more from CoT.
  • Small Models: CoD underperformed in models with fewer than 3 billion parameters.
  • Creative Tasks: For open-ended tasks, a verbose approach might be more valuable.
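The cautions above can be read as a rough routing heuristic. The sketch below encodes them; the Task fields, the function name, and the order of the checks are hypothetical, while the 3-billion-parameter threshold comes from the list above. It is a starting point for experimentation, not a rule.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a prompting workload."""
    has_few_shot_examples: bool
    needs_detailed_explanation: bool
    is_open_ended: bool
    model_params_billions: float


def choose_prompt_style(task: Task) -> str:
    """Return "CoD" or "CoT" based on the cautions listed above."""
    if not task.has_few_shot_examples:
        return "CoT"  # CoD may struggle in zero-shot settings
    if task.needs_detailed_explanation:
        return "CoT"  # e.g. legal reviews need thorough explanations
    if task.is_open_ended:
        return "CoT"  # creative tasks may benefit from a more verbose approach
    if task.model_params_billions < 3:
        return "CoT"  # CoD underperformed below about 3B parameters
    return "CoD"


print(choose_prompt_style(Task(True, False, False, 70)))   # CoD
print(choose_prompt_style(Task(True, False, False, 1.5)))  # CoT
```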

Conclusion

Chain-of-Draft prompting is a meaningful step toward more efficient generative AI systems. By encouraging concise reasoning, CoD substantially reduces token usage and latency while keeping accuracy comparable to CoT.

As AI adoption grows, efficient reasoning techniques like CoD will matter more to organizations that want faster responses and lower operational costs. We encourage practitioners to try CoD in their own AI workflows.


About the Authors

Ahmed Raafat: Senior Manager at AWS, leading the AI/ML Specialist team in the UK & Ireland with over 20 years of technology experience.

Kiranpreet Chawla: Solutions Architect at AWS, driving cloud and AI transformations with over 15 years of experience.

For more in-depth resources on prompt engineering and the CoD technique, refer to the original Zoom AI research paper and other leading publications in the field.
