NERV

Rather than lamenting how hard the road is, better to set out right away

Large Language Models Tag

2026

04-16

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

04-16

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

04-01

Attention Residuals

2023

06-05

A Survey of Large Language Models