context.summarization

Automatic conversation summarization.

When the conversation approaches the model's token limit, older messages are compressed into a concise summary using a dedicated (cheaper) summarization model. The summary replaces the compressed messages, preserving essential context while freeing token budget.
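The trigger described above can be sketched as follows. This is a hypothetical illustration, not the library's actual code: the function names, the 0.8 threshold, and the rough 4-characters-per-token heuristic are all assumptions.

```python
from typing import Any

def estimate_tokens(messages: list[dict[str, Any]]) -> int:
    # Rough heuristic (assumption): ~4 characters per token.
    return sum(len(str(m.get("content", ""))) for m in messages) // 4

def needs_summarization(messages: list[dict[str, Any]],
                        token_limit: int,
                        threshold: float = 0.8) -> bool:
    # Trigger summarization once usage approaches the model's limit.
    return estimate_tokens(messages) > token_limit * threshold

msgs = [{"role": "user", "content": "x" * 400}] * 10
print(needs_summarization(msgs, token_limit=1000))  # usage ~1000 tokens > 800 -> True
```

A real implementation would use the model's own tokenizer rather than a character heuristic, but the shape of the check is the same.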

ConversationSummarizer Objects

class ConversationSummarizer()

Summarize conversation messages to free context window space.

Parameters

summarization_model : str | None
    Model to use for summarization. If None, a default cheap model is used.
max_summary_tokens : int
    Maximum tokens for the generated summary.

summarize_messages

async def summarize_messages(
    messages: list[dict[str, Any]],
    num_to_compress: int,
) -> tuple[list[dict[str, Any]], str]

Summarize the oldest num_to_compress messages.

Parameters

messages : list[dict]
    The full message list.
num_to_compress : int
    Number of messages from the start to compress.

Returns

tuple[list[dict], str]
    (new_messages, summary_text): the message list with the oldest messages replaced by a single summary message, and the raw summary text.
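The return contract can be illustrated with a minimal sketch. This is not the library's implementation: the summary here is a hard-coded stub standing in for the real model call, and the `"system"` role used for the summary message is an assumption.

```python
import asyncio
from typing import Any

async def summarize_messages(
    messages: list[dict[str, Any]],
    num_to_compress: int,
) -> tuple[list[dict[str, Any]], str]:
    # Split off the oldest messages to be compressed.
    old, rest = messages[:num_to_compress], messages[num_to_compress:]
    # Stub: a real implementation would call the summarization model here.
    summary = f"Summary of {len(old)} earlier messages."
    summary_msg = {"role": "system", "content": f"[Conversation summary] {summary}"}
    # The summary message replaces all compressed messages.
    return [summary_msg] + rest, summary

msgs = [{"role": "user", "content": f"msg {i}"} for i in range(5)]
new_msgs, text = asyncio.run(summarize_messages(msgs, num_to_compress=3))
print(len(new_msgs))  # 3: one summary message plus the two newest originals
```

Note that the newest messages are always kept verbatim; only the oldest `num_to_compress` entries are collapsed into the summary.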