context.summarization
Automatic conversation summarization.
When the conversation approaches the model's token limit, older messages are compressed into a concise summary using a dedicated (cheaper) summarization model. The summary replaces the compressed messages, preserving essential context while freeing token budget.
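The trigger condition described above (conversation "approaches the model's token limit") can be sketched as a simple threshold check. This is an illustrative assumption, not this module's API: the `estimate_tokens` heuristic (roughly 4 characters per token) and the 0.8 threshold are hypothetical.

```python
# Hypothetical sketch: decide when to trigger summarization based on an
# estimated token count. The 4-chars-per-token ratio and the 0.8 threshold
# are illustrative assumptions, not part of this module's API.
from typing import Any


def estimate_tokens(messages: list[dict[str, Any]]) -> int:
    """Rough token estimate: ~4 characters of content per token."""
    return sum(len(str(m.get("content", ""))) for m in messages) // 4


def should_summarize(messages: list[dict[str, Any]],
                     context_limit: int,
                     threshold: float = 0.8) -> bool:
    """Trigger summarization once usage crosses a fraction of the limit."""
    return estimate_tokens(messages) >= context_limit * threshold
```

A real implementation would use the model's own tokenizer rather than a character-count heuristic, but the shape of the decision is the same.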
ConversationSummarizer Objects
class ConversationSummarizer()
Summarize conversation messages to free context window space.
Parameters
summarization_model : str | None
    Model to use for summarization. If None, a default cheap model is used.
max_summary_tokens : int
    Maximum number of tokens for the generated summary.
summarize_messages
async def summarize_messages(
        messages: list[dict[str, Any]],
        num_to_compress: int) -> tuple[list[dict[str, Any]], str]
Summarize the oldest num_to_compress messages.
Parameters
messages : list[dict]
    The full message list.
num_to_compress : int
    Number of messages from the start of the list to compress.
Returns
tuple[list[dict], str]
    (new_messages, summary_text): the message list with the compressed
    messages replaced by a single summary message, and the raw summary text.