Token-by-token streaming over HTTP for every model in the catalog, with no cold-start tax.
Replicate now streams generations for every open model on the platform, including text, audio, and image diffusion intermediates, with sub-200 ms time-to-first-token on warm replicas. The team also rolled out aggressive replica pre-warming, which the company says cuts cold starts by 80%.
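
For readers who want to check a time-to-first-token figure on their own workloads, the following is a minimal sketch of how a client might consume such a stream and time the first chunk. It assumes the stream arrives as server-sent events (SSE), which the announcement does not state. The event names `output` and `done` and the helpers `parse_sse` and `time_to_first_chunk` are hypothetical, chosen for illustration rather than taken from Replicate's API.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def _field_value(line: str, name: str) -> str:
    """Return the value of an SSE field, dropping one leading space if present."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from text lines in SSE wire format."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))


def time_to_first_chunk(
    chunks: Iterable[object],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Drain an iterable and return (seconds until first item, item count).

    The timer starts when this function is called, so the request should be
    issued lazily by the iterable for the number to include network latency.
    """
    start = clock()
    first: Optional[float] = None
    count = 0
    for _ in chunks:
        if first is None:
            first = clock() - start
        count += 1
    return first, count


# Hypothetical stream contents; a real client would read these lines
# from an HTTP response body instead of a list.
sample = [
    "event: output\n", "data: Hel\n", "\n",
    ": keep-alive\n",
    "event: output\n", "data: lo\n", "\n",
    "event: done\n", "data: {}\n", "\n",
]
events = list(parse_sse(sample))
text = "".join(data for name, data in events if name == "output")
```

In practice the `sample` list would be replaced by the line iterator of whatever HTTP client is in use, and `time_to_first_chunk(parse_sse(...))` would be run against both a warm and a cold replica to compare the two.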