Chinese AI firm DeepSeek has unveiled a new method to improve LLM reasoning, claiming it delivers more accurate and faster responses than existing approaches. The method, developed with researchers from Tsinghua University, combines generative reward modeling (GRM) with a technique called self-principled critique tuning (SPCT).
The method aims to refine how LLMs respond to general queries by better aligning their outputs with human preferences. According to a paper published on the arXiv preprint repository, the resulting DeepSeek-GRM models outperformed existing methods and proved competitive with widely used public reward models.
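The core idea of generative reward modeling, as described in the paper, is that the reward model generates judging principles and a written critique rather than emitting a bare scalar, and the score is derived from that critique. The toy sketch below illustrates only this shape of the pipeline; the function names are illustrative, and the deterministic keyword checks are stand-ins for what would actually be LLM calls (this is not DeepSeek's implementation).

```python
# Toy sketch of the GRM pipeline: principles -> critique -> score.
# All logic below is a hypothetical stand-in for real LLM calls.

def generate_principles(query: str) -> list[str]:
    """Stand-in for the model proposing query-specific judging principles."""
    return ["mentions the key entity", "gives a numeric answer"]

def critique(response: str, principle: str) -> bool:
    """Stand-in for an LLM judging one principle; here a crude keyword check."""
    if principle == "mentions the key entity":
        return "paris" in response.lower()
    if principle == "gives a numeric answer":
        return any(ch.isdigit() for ch in response)
    return False

def grm_score(query: str, response: str) -> float:
    """Score = fraction of generated principles the critique deems satisfied."""
    principles = generate_principles(query)
    met = [p for p in principles if critique(response, p)]
    return len(met) / len(principles)

good = "Paris has a population of about 2.1 million."
bad = "I am not sure."
print(grm_score("What is the population of Paris?", good))  # 1.0
print(grm_score("What is the population of Paris?", bad))   # 0.0
```

In the paper's framing, self-principled critique tuning trains the model to produce these principles and critiques adaptively per query, so the reward signal scales with inference-time compute rather than being fixed at training time.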
DeepSeek has announced intentions to release these models as open source, though no release date has been set. The move follows increased global interest in the company, which had earlier gained attention for its V3 foundation model and R1 reasoning model.