Hardware
Cerebras ships Wafer Scale Engine 3 with 4 trillion transistors
Cerebras claims its third-generation WSE cuts training time for 70B-parameter dense models by 3x compared to a 512-GPU H100 cluster.
Cerebras Systems has begun shipping WSE-3, its third wafer-scale processor, packing 4 trillion transistors and 44GB of on-chip SRAM. The company claims a 70B dense model trains in under 8 hours on a single CS-3 system versus 24+ hours on a 512-GPU H100 cluster. Initial customers include two sovereign AI compute projects in the Middle East and a top-10 US bank.
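The headline 3x figure follows directly from the quoted wall-clock numbers. A minimal arithmetic check, using the round-number bounds from the claim ("under 8 hours" vs "24+ hours") as hypothetical inputs:

```python
# Sanity-check the claimed speedup from the article's figures.
# 8 and 24 are the round-number bounds quoted in the claim, not measured data.
cs3_hours = 8      # claimed upper bound for a single CS-3 system
h100_hours = 24    # claimed lower bound for a 512-GPU H100 cluster

speedup = h100_hours / cs3_hours
print(f"claimed speedup: at least {speedup:.0f}x")  # prints "claimed speedup: at least 3x"
```

Since both figures are bounds in the same direction (CS-3 under 8h, H100 cluster over 24h), 3x is a floor on the claimed speedup, not an exact ratio.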
Source
Cerebras · cerebras.net