Elevate Your Strategy: Transition from Chain-of-Thought to Chain-of-Draft on Amazon Bedrock

Optimizing Generative AI: A Deep Dive into Chain-of-Draft Prompting

Balancing Quality, Cost, and Latency in AI Implementations

Exploring the Chain-of-Draft Technique: A Game Changer in Prompting

Understanding Chain-of-Thought Prompting: Benefits and Drawbacks

The Transformation to Chain-of-Draft Prompting: Efficiency Over Verbosity

Why Chain-of-Draft Works: The Power of Minimalism

Evaluating Chain-of-Draft through Real-world Implementations on AWS

Key Prerequisites for Implementing CoD in Amazon Bedrock

Testing and Results: A Comparative Analysis of CoD and CoT

When to Trim the Fat: Situations to Avoid Chain-of-Draft

Conclusion: Embracing Chain-of-Draft for Efficient AI Reasoning

Further Reading: Resources on Prompt Engineering and CoD Techniques

About the Authors: Pioneers in AI and Cloud Transformation

Revolutionizing AI Efficiency: The Chain-of-Draft (CoD) Prompting Technique

As organizations scale their generative AI implementations, they must balance quality, cost, and latency. Inference can account for 70–90% of operational expenses in large language model (LLM) deployments, and verbose prompting can inflate token counts by 3–5x, so there is a clear need for more token-efficient ways of interacting with these models.

The Inefficiency Dilemma

Traditional prompting methods are effective at producing detailed answers, but the extra output tokens they generate raise costs and slow response times. As businesses refine their AI strategies, a leaner approach becomes increasingly valuable.

This blog post introduces Chain-of-Draft (CoD), a prompting technique proposed in a recent Zoom AI Research paper, Chain of Draft: Thinking Faster by Writing Less. Instead of the full-sentence reasoning produced by the widely used Chain-of-Thought (CoT) prompting strategy, CoD asks the model to reason in short, draft-like steps.

Understanding Traditional Chain-of-Thought (CoT) Prompting

Chain-of-Thought prompting encourages models to reason through problems in a detailed, step-by-step manner. This technique has proven particularly effective for complex tasks such as logical puzzles and mathematical problems. CoT helps break down intricate queries, improving accuracy and transparency.

Example of CoT Prompting:

Question: If there are 5 apples and you eat 2 apples, how many apples remain?
CoT Response: Start with 5 apples. I eat 2. Subtract 2 from 5. 5 – 2 = 3 apples remaining.

Despite its effectiveness, CoT's verbose nature leads to high token usage and higher latency, making it less suitable for real-time applications. The detailed outputs can also complicate the integration of responses into downstream processes.

Introducing Chain-of-Draft (CoD) Prompting

In contrast, the Chain-of-Draft technique significantly reduces verbosity. CoD draws inspiration from the way people solve problems by jotting down only the essential intermediate results, and it encourages models to generate compact, high-signal reasoning steps in the same way.

Key Features of CoD:

  • Each reasoning step is limited to five words or fewer, promoting clarity and focus.
  • It encourages minimalism without sacrificing accuracy, which is particularly effective in high-redundancy reasoning chains.

Example of CoD Prompting:

Question: Jason had 20 lollipops. He gave Denny some. Now Jason has 12. How many did he give?
CoD Response: Start: 20, End: 12, 20 – 12 = 8.

The CoD response reaches its conclusion in a handful of tokens, where a CoT response to the same question would spell out each step in full sentences.
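The difference between the two styles can be sketched as a pair of system prompts plus a small checker for the five-word guideline. This is a minimal sketch under stated assumptions: the prompt wording is paraphrased from the Chain of Draft paper rather than taken from this post, the `####` answer separator and the one-step-per-line (or per-semicolon) layout are assumed conventions, and the helper names `split_draft` and `max_words_per_step` are hypothetical.

```python
# Assumed prompt wording, paraphrased from the Chain of Draft paper.
COT_SYSTEM_PROMPT = (
    "Think step by step to answer the following question. "
    "Return the answer at the end of the response after a separator ####."
)
COD_SYSTEM_PROMPT = (
    "Think step by step, but only keep a minimum draft for each thinking "
    "step, with 5 words at most. "
    "Return the answer at the end of the response after a separator ####."
)


def split_draft(response: str) -> tuple[list[str], str]:
    """Split a model response into its draft steps and its final answer.

    Assumes steps are separated by newlines or semicolons and that the
    final answer follows the '####' separator (hypothetical conventions).
    """
    reasoning, _, answer = response.partition("####")
    lines = reasoning.replace(";", "\n").splitlines()
    steps = [line.strip() for line in lines if line.strip()]
    return steps, answer.strip()


def max_words_per_step(steps: list[str]) -> int:
    """Return the length, in whitespace-separated words, of the longest step."""
    return max((len(step.split()) for step in steps), default=0)


if __name__ == "__main__":
    steps, answer = split_draft("Start: 20\nEnd: 12\n20 - 12 = 8\n#### 8")
    print(steps, answer, max_words_per_step(steps))
```

A check like `max_words_per_step(steps) <= 5` can be used to spot responses that drift back toward verbose reasoning. Counting whitespace-separated words is a rough proxy, not the model's tokenizer.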

Why CoD Works

The CoD technique enhances efficiency for several reasons:

  • Reduced Output Length: Shorter outputs result in lower inference latency and cut costs associated with token usage.
  • Cleaner Outputs: Less verbose reasoning leads to simpler downstream processing, improving integration.

Notably, CoD maintains accuracy comparable to CoT. According to the original Zoom AI research, CoD achieved 91.4% accuracy on benchmark tests while cutting output tokens by up to 92.1% and halving latency on multiple models.

Practical Implementation with AWS

Using Amazon Bedrock and AWS Lambda, we can implement the CoD technique and measure its efficiency. As a test case, consider the “Red, Blue, and Green Balls” puzzle, which requires careful logical reasoning.

To run tests on this problem, you must have the following prerequisites:

  • An AWS account with permissions for Lambda functions and Bedrock access.
  • Configured IAM roles for function execution.
  • Basic familiarity with Python and the AWS SDK for Python (Boto3).

Sample Lambda Code for CoD Prompt:

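import json
import boto3
import time
import logging
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Placeholder: replace with the ID of the Bedrock model you selected
MODEL_ID = "your-model-id-here"

# Bedrock runtime client; adjust the Region to one where your model is enabled
bedrock = boto3.client('bedrock-runtime', region_name="us-east-1")

def lambda_handler(event, context):
    # Initialize conversation with CoD prompt.
    # Assuming the request is sent through the Converse API, each content
    # block must be a dict with a "text" key rather than a bare string.
    conversation = [{"role": "user", "content": [{"text": "Your CoD prompt here"}]}]
    ...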

Don’t forget to modify the MODEL_ID to match your selected model.
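The elided part of the handler is not shown in this post. As a rough illustration of how a single measurement could be taken, the sketch below sends one question through the Bedrock Converse API and collects the token counts the service reports, along with client-side wall-clock latency. The function name `run_prompt`, its parameters, and the inference settings are assumptions for illustration, not the authors' implementation. The client is passed in as an argument so the function can be exercised with a stub.

```python
import time


def run_prompt(client, model_id, system_prompt, question, max_tokens=512):
    """Send one question via the Converse API and return text plus metrics.

    `client` is expected to be a boto3 'bedrock-runtime' client (or any
    object exposing a compatible `converse` method).
    """
    started = time.perf_counter()
    response = client.converse(
        modelId=model_id,
        system=[{"text": system_prompt}],
        messages=[{"role": "user", "content": [{"text": question}]}],
        # Hypothetical settings; tune for your model and task.
        inferenceConfig={"maxTokens": max_tokens, "temperature": 0.0},
    )
    elapsed = time.perf_counter() - started

    usage = response["usage"]
    return {
        "text": response["output"]["message"]["content"][0]["text"],
        "input_tokens": usage["inputTokens"],
        "output_tokens": usage["outputTokens"],
        "total_tokens": usage["totalTokens"],
        "latency_s": round(elapsed, 2),  # measured client-side
    }


# Usage inside the Lambda function (requires AWS credentials and model access):
#   result = run_prompt(bedrock, MODEL_ID, "Your CoD prompt here", "Your question here")
#   logger.info(json.dumps(result))
```

Running the same question once with a CoT prompt and once with a CoD prompt yields the kind of side-by-side figures reported in the next section.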

Testing Results: CoD vs. CoT

Testing the Lambda function shows that CoD is significantly more efficient in both token usage and latency:

Model   | Prompt Type      | Input Tokens | Output Tokens | Total Tokens | Latency (s)
Model-1 | Chain of Thought | 109          | 241           | 350          | 3.28
Model-1 | Chain of Draft   | 123          | 93            | 216          | 1.58
Model-2 | Chain of Thought | 109          | 492           | 601          | 3.81
Model-2 | Chain of Draft   | 123          | 19            | 142          | 0.79
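
To make the savings easier to compare, the short sketch below computes the percentage reduction from CoT to CoD for each model, using only the figures in the results table above. The helper names `percent_reduction` and `summarize` are hypothetical.

```python
# Figures copied from the results table:
# (output tokens, total tokens, latency in seconds)
RESULTS = {
    "Model-1": {"cot": (241, 350, 3.28), "cod": (93, 216, 1.58)},
    "Model-2": {"cot": (492, 601, 3.81), "cod": (19, 142, 0.79)},
}


def percent_reduction(before, after):
    """Percentage drop from `before` to `after`, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


def summarize(results):
    """Map each model to its (output, total, latency) reductions in percent."""
    return {
        model: tuple(
            percent_reduction(cot, cod)
            for cot, cod in zip(runs["cot"], runs["cod"])
        )
        for model, runs in results.items()
    }


if __name__ == "__main__":
    for model, (out_pct, total_pct, latency_pct) in summarize(RESULTS).items():
        print(f"{model}: output -{out_pct}%, total -{total_pct}%, latency -{latency_pct}%")
```

On these figures, output tokens fall by about 61% for Model-1 and 96% for Model-2, while latency falls by about 52% and 79% respectively. Input tokens rise slightly (109 to 123) because the CoD prompt is longer, which is why the reduction in total tokens is smaller than the reduction in output tokens.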

When Not to Use CoD

Despite its advantages, CoD is not a universal solution. Key considerations include:

  • Zero-shot Use Cases: CoD may struggle without strong few-shot examples.
  • High Interpretability Tasks: Tasks requiring thorough explanations, such as legal reviews, may benefit more from CoT.
  • Small Models: CoD underperformed in models with fewer than 3 billion parameters.
  • Creative Tasks: For open-ended tasks, a verbose approach might be more valuable.
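
Because CoD may struggle in zero-shot settings, one option is to seed the conversation with a few worked examples written in draft style. The sketch below builds such a message list in the shape used by the Bedrock Converse API (alternating user and assistant turns, each with a list of text blocks). The function name `build_few_shot_messages` is hypothetical, the example pair reuses the lollipop question from earlier in this post, and the final question is a placeholder.

```python
def build_few_shot_messages(examples, question):
    """Build Converse-style messages: example Q/A pairs, then the real question.

    `examples` is a list of (question, draft_answer) tuples written in the
    terse CoD style the model should imitate.
    """
    messages = []
    for example_question, draft_answer in examples:
        messages.append({"role": "user", "content": [{"text": example_question}]})
        messages.append({"role": "assistant", "content": [{"text": draft_answer}]})
    # The real question always comes last, as a user turn.
    messages.append({"role": "user", "content": [{"text": question}]})
    return messages


# Example pair taken from the CoD example earlier in this post.
EXAMPLES = [
    (
        "Jason had 20 lollipops. He gave Denny some. Now Jason has 12. "
        "How many did he give?",
        "Start: 20, End: 12, 20 - 12 = 8.",
    )
]

if __name__ == "__main__":
    for message in build_few_shot_messages(EXAMPLES, "Your question here"):
        print(message["role"], "->", message["content"][0]["text"])
```

The resulting list can be passed as the `messages` argument of a Converse request. How many examples are needed will depend on the model and the task.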

Conclusion

The Chain-of-Draft prompting technique is a meaningful step toward more efficient generative AI systems. By encouraging concise reasoning, CoD yields significant reductions in token usage and latency while keeping accuracy comparable to CoT on the tasks it suits.

As AI evolves, adopting more efficient reasoning techniques like CoD will be critical for organizations looking to enhance user experiences and minimize operational costs. We encourage practitioners to explore this methodology in their own AI workflows, paving the way for a future defined by smarter AI reasoning approaches.


About the Authors

Ahmed Raafat: Senior Manager at AWS, leading the AI/ML Specialist team in the UK & Ireland with over 20 years of technology experience.

Kiranpreet Chawla: Solutions Architect at AWS, driving cloud and AI transformations with over 15 years of experience.

For more in-depth resources on prompt engineering and the CoD technique, refer to the original Zoom AI research paper and other leading publications in the field.
