mHC: DeepSeek’s Manifold-Based Evolution of Residual Connections
AI Research Neural Architecture Transformer Optimization