ConversationTokenBufferMemory

1. What is ConversationTokenBufferMemory?

ConversationTokenBufferMemory stores conversation history up to a maximum token limit, rather than up to a maximum number of messages.

When the token limit is exceeded, older messages are dropped automatically.
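Conceptually, the pruning works like this (a simplified sketch, not LangChain's actual source; count_tokens stands in for the model's token counter):

```python
# Simplified sketch: evict the oldest messages until the history fits the budget.
def prune(messages, count_tokens, max_token_limit):
    total = sum(count_tokens(m) for m in messages)
    while messages and total > max_token_limit:
        total -= count_tokens(messages.pop(0))  # drop the oldest message first
    return messages
```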


2. Why does it exist?

ConversationBufferWindowMemory limits history by message count (k), but messages vary widely in length: one long message can cost more tokens than ten short ones.

This memory gives you:

  • Protection against long messages consuming too many tokens

  • Better cost and context control

In short:

Remember as much history as fits inside a token budget.


3. Real-world analogy

Imagine a notebook with fixed pages:

  • You keep writing new notes

  • When it’s full, you erase the oldest notes

  • Recent information always fits

That’s token buffer memory.


4. Minimal working example (Gemini)
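
A minimal sketch, assuming the langchain and langchain-google-genai packages are installed and GOOGLE_API_KEY is set in your environment. The model name gemini-1.5-flash is an example; use any Gemini model you have access to.

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationTokenBufferMemory
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# The memory needs the llm so it can count tokens against the budget.
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=500)

chain = ConversationChain(llm=llm, memory=memory)

print(chain.predict(input="Hi, I'm Sam and I love astronomy."))
print(chain.predict(input="What do I love?"))  # still inside the 500-token budget
```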


5. What does it store?

  • Stores full messages (not summaries)

  • Drops oldest messages first

  • Keeps content within max_token_limit

You can inspect it:
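
For example (continuing the sketch above; load_memory_variables and chat_memory are standard attributes of LangChain's chat memory classes, and the printed output is illustrative):

```python
# See the combined history string the memory would inject into the prompt:
print(memory.load_memory_variables({}))
# -> {'history': "Human: Hi, I'm Sam and I love astronomy.\nAI: ..."}

# Or walk the raw message objects still inside the buffer:
for msg in memory.chat_memory.messages:
    print(type(msg).__name__, ":", msg.content)
```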


6. Key parameter: max_token_limit

| Value (tokens) | Effect |
| --- | --- |
| Small (100–200) | Very short-term memory |
| Medium (500–1000) | Normal chats |
| Large (2000+) | Expensive |


7. How is this different from Window Memory?

| Feature | Window | Token Buffer |
| --- | --- | --- |
| Limit by | Message count | Token count |
| Handles long messages well | No | Yes |
| Cost control | Medium | High |
| Precision | Low | High |


8. Common beginner mistakes

❌ Forgetting to pass llm to the memory (it needs the model to count tokens)

❌ Setting max_token_limit too low

❌ Assuming important facts are protected

This memory does not prioritize importance.
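
A quick way to see this, continuing the section 4 sketch (the exact eviction point depends on how the model counts tokens):

```python
# With a tiny budget, the user's name is evicted as soon as longer turns
# fill the buffer. Nothing marks a fact as "important" -- eviction is
# purely oldest-first.
tiny = ConversationTokenBufferMemory(llm=llm, max_token_limit=60)
tiny.save_context({"input": "My name is Sam."},
                  {"output": "Nice to meet you, Sam!"})
tiny.save_context({"input": "Tell me a long story."},
                  {"output": "Once upon a time... " * 20})
print(tiny.load_memory_variables({}))  # 'Sam' is likely gone already
```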


9. When should you use it?

Use ConversationTokenBufferMemory when:

  • Message lengths vary a lot

  • You want predictable token usage

  • You want recent context, not full history

Avoid it for:

  • Long-term user profiles

  • Persistent facts (use DB or RAG)


10. One-line mental model

ConversationTokenBufferMemory = a sliding window measured in tokens, not messages
