ConversationTokenBufferMemory

1. What is ConversationTokenBufferMemory?

ConversationTokenBufferMemory stores conversation history up to a maximum token limit, rather than up to a maximum number of messages.

When the token limit is exceeded, older messages are dropped automatically.
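Conceptually, the pruning works like this (a simplified sketch, not LangChain's actual source; count_tokens stands in for the model's token counter):

```python
# Simplified sketch: evict the oldest messages until the history fits the budget.
def prune(messages, count_tokens, max_token_limit):
    total = sum(count_tokens(m) for m in messages)
    while messages and total > max_token_limit:
        total -= count_tokens(messages.pop(0))  # drop the oldest message first
    return messages
```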


2. Why does it exist?

ConversationBufferWindowMemory limits history by message count (k), but messages vary widely in length: one long message can cost more tokens than ten short ones.

This memory gives you:

  • Protection against long messages consuming too many tokens

  • Better cost and context control

In short:

Remember as much history as fits inside a token budget.


3. Real-world analogy

Imagine a notebook with fixed pages:

  • You keep writing new notes

  • When it’s full, you erase the oldest notes

  • Recent information always fits

That’s token buffer memory.


4. Minimal working example (Gemini)
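
A minimal sketch, assuming the langchain and langchain-google-genai packages are installed and GOOGLE_API_KEY is set in your environment. The model name gemini-1.5-flash is an example; use any Gemini model you have access to.

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationTokenBufferMemory
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# The memory needs the llm so it can count tokens against the budget.
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=500)

chain = ConversationChain(llm=llm, memory=memory)

print(chain.predict(input="Hi, I'm Sam and I love astronomy."))
print(chain.predict(input="What do I love?"))  # still inside the 500-token budget
```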


5. What does it store?

  • Stores full messages (not summaries)

  • Drops oldest messages first

  • Keeps content within max_token_limit

You can inspect it:
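
For example (continuing the sketch above; load_memory_variables and chat_memory are standard attributes of LangChain's chat memory classes, and the printed output is illustrative):

```python
# See the combined history string the memory would inject into the prompt:
print(memory.load_memory_variables({}))
# -> {'history': "Human: Hi, I'm Sam and I love astronomy.\nAI: ..."}

# Or walk the raw message objects still inside the buffer:
for msg in memory.chat_memory.messages:
    print(type(msg).__name__, ":", msg.content)
```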


6. Key parameter: max_token_limit

| Value (tokens) | Effect |
| --- | --- |
| Small (100–200) | Very short-term memory |
| Medium (500–1000) | Normal chats |
| Large (2000+) | Expensive |


7. How is this different from Window Memory?

| Feature | Window | Token Buffer |
| --- | --- | --- |
| Limit by | Message count | Token count |
| Handles long messages well | No | Yes |
| Cost control | Medium | High |
| Precision | Low | High |


8. Common beginner mistakes

❌ Forgetting to pass llm to the memory (it needs the model to count tokens)

❌ Setting max_token_limit too low

❌ Assuming important facts are protected

This memory does not prioritize importance.
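
A quick way to see this, continuing the section 4 sketch (the exact eviction point depends on how the model counts tokens):

```python
# With a tiny budget, the user's name is evicted as soon as longer turns
# fill the buffer. Nothing marks a fact as "important" -- eviction is
# purely oldest-first.
tiny = ConversationTokenBufferMemory(llm=llm, max_token_limit=60)
tiny.save_context({"input": "My name is Sam."},
                  {"output": "Nice to meet you, Sam!"})
tiny.save_context({"input": "Tell me a long story."},
                  {"output": "Once upon a time... " * 20})
print(tiny.load_memory_variables({}))  # 'Sam' is likely gone already
```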


9. When should you use it?

Use ConversationTokenBufferMemory when:

  • Message lengths vary a lot

  • You want predictable token usage

  • You want recent context, not full history

Avoid it for:

  • Long-term user profiles

  • Persistent facts (use DB or RAG)


10. One-line mental model

ConversationTokenBufferMemory = a sliding window measured in tokens, not messages
