Parallel compaction is a new method that addresses context window limits in long-horizon LLM agents: it gives operators fine-grained control over context summarization and reduces end-to-end wall time relative to sequential compaction.
Long-horizon LLM agents often accumulate conversation histories that exceed the model's context window, which makes context compaction necessary. Current summarization methods are lossy, stall agent inference while the summary is produced, and offer poor control over the resulting context. This paper introduces parallel compaction, a novel approach for agentic flows that gives operators fine-grained, predictable control over summary volume and enables more targeted prompt engineering per block. The method is characterized across model backbones from 8B to 120B parameters, covering both dense and MoE architectures, on long-context benchmarks; the results show that parallel compaction reduces end-to-end wall time and improves compaction throughput over sequential baselines.
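To make the per-block idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the history is split into fixed-size contiguous blocks, that each block is summarized independently under a fixed word budget, and that the summarizer is a simple truncation stand-in for an LLM call. The names `split_into_blocks`, `truncate_summarizer`, and `parallel_compact`, and the parameters `block_size` and `budget_words`, are hypothetical and chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_blocks(messages: List[str], block_size: int) -> List[List[str]]:
    """Partition the history into contiguous blocks of at most block_size messages."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [messages[i:i + block_size] for i in range(0, len(messages), block_size)]


def truncate_summarizer(block: List[str], budget_words: int) -> str:
    """Stand-in for an LLM summarization call (assumption): keep the first
    budget_words words of the block. A real system would prompt a model here."""
    words = " ".join(block).split()
    return " ".join(words[:budget_words])


def parallel_compact(
    messages: List[str],
    block_size: int,
    budget_words: int,
    summarize: Callable[[List[str], int], str] = truncate_summarizer,
    max_workers: int = 4,
) -> List[str]:
    """Summarize every block concurrently and return summaries in block order.

    With a fixed per-block budget, the total summary size is bounded by
    (number of blocks) * budget_words, which is what makes the compacted
    volume predictable in this sketch.
    """
    blocks = split_into_blocks(messages, block_size)
    if not blocks:
        return []
    # Threads suit this sketch because real summarizer calls would be I/O-bound
    # requests to an inference server. Executor.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda block: summarize(block, budget_words), blocks))


if __name__ == "__main__":
    history = [f"turn {i} says something fairly long here" for i in range(10)]
    for summary in parallel_compact(history, block_size=4, budget_words=5):
        print(summary)
```

Because the `summarize` callable is a parameter, a different prompt or budget could be supplied per deployment, which is one way to read the "targeted prompt engineering per block" claim. How the paper actually partitions the history and sets budgets is not specified in this abstract.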