Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
Adrianne Foveaux edited this page 2025-02-10 19:40:32 +01:00


- Including reasoning "chains of thought" (CoT) in the model output significantly improves its quality, but it increases inference cost.
- Distillation transfers reasoning knowledge from an expensive teacher model to a more cost-efficient student, reducing overall inference cost.

  1. A human expert's chain of thought.
  2. The final answer.

    We expanded this dataset by adding:

    Synthetic R1 reasoning, i.e., the CoT produced by DeepSeek R1.
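As a minimal sketch of this augmentation step, assume each record is a dict with hypothetical `question`, `human_cot`, and `answer` fields, and that the teacher is any callable returning a reasoning string. The `stub_teacher` below is a placeholder for a real DeepSeek R1 call, which is not shown; the field names and function names are illustrative assumptions, not taken from the study.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def add_teacher_cot(
    records: Iterable[Record],
    teacher: Callable[[str], str],
    field: str = "r1_cot",  # hypothetical field name for the synthetic CoT
) -> List[Record]:
    """Return copies of the records with a teacher-generated CoT added."""
    augmented = []
    for record in records:
        enriched = dict(record)  # shallow copy so the input stays unchanged
        enriched[field] = teacher(record["question"])
        augmented.append(enriched)
    return augmented


def stub_teacher(question: str) -> str:
    # Placeholder only: a real pipeline would query the teacher model here.
    return f"Let me think step by step about: {question}"
```

Passing the teacher as a callable keeps the data step independent of any particular inference client, so the same function works with a stub in tests and a real model call in production.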

    Then, we fine-tuned three variants of the model (using LoRA on llama-3.1-8B-instruct), each with different training targets:

    - Direct Answer Only: Generate the final answer without showing reasoning.
    - Human Expert CoT: Generate the final answer alongside a reasoning chain resembling the human expert's.
    - Synthetic R1 CoT: Generate the final answer alongside DeepSeek R1's synthetic reasoning chain.

    The table below summarizes average accuracy and reasoning length:

    - Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation approaches, not on beating other models.
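The three training targets described earlier differ only in what the student is asked to generate for the same question. A rough sketch of how such targets could be assembled, assuming hypothetical record fields (`human_cot`, `r1_cot`, `answer`) and a made-up `Final answer:` output format that is not taken from the study:

```python
from typing import Dict

VARIANTS = ("direct", "human_cot", "r1_cot")  # illustrative variant names


def build_target(record: Dict[str, str], variant: str) -> str:
    """Build the completion text the student is fine-tuned to produce."""
    answer = f"Final answer: {record['answer']}"  # assumed output format
    if variant == "direct":
        # Direct Answer Only: no reasoning is shown.
        return answer
    if variant == "human_cot":
        # Human Expert CoT: human reasoning followed by the answer.
        return f"{record['human_cot']}\n{answer}"
    if variant == "r1_cot":
        # Synthetic R1 CoT: teacher reasoning followed by the answer.
        return f"{record['r1_cot']}\n{answer}"
    raise ValueError(f"unknown variant: {variant!r}")
```

Keeping the answer format identical across variants means any accuracy difference comes from the reasoning that precedes it, not from how the answer is parsed.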

    From this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs in boosting performance, albeit with a higher inference cost due to their longer length.
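To make the accuracy-versus-length trade-off concrete, the sketch below computes the two summary metrics for one variant and the relative output length between two variants. The result fields (`predicted`, `gold`, `output_tokens`) and function names are assumptions for illustration; no figures from the study are reproduced here.

```python
from typing import Dict, List


def summarize(results: List[Dict]) -> Dict[str, float]:
    """Average accuracy and average output length for one variant."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    accuracy = sum(r["predicted"] == r["gold"] for r in results) / n
    avg_tokens = sum(r["output_tokens"] for r in results) / n
    return {"accuracy": accuracy, "avg_tokens": avg_tokens}


def length_ratio(variant: Dict[str, float], baseline: Dict[str, float]) -> float:
    """How many times longer the variant's outputs are than the baseline's."""
    return variant["avg_tokens"] / baseline["avg_tokens"]
```

Because output tokens are typically the dominant part of generation cost, the length ratio serves as a rough proxy for the added inference cost of a longer reasoning chain.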

    Fireworks AI Inference and Fine-Tuning Platform

    DeepSeek R1 is available on the Fireworks AI platform. A user-friendly distillation interface will soon be part of FireOptimizer. If you need earlier access, please contact us to explore options.

    Conclusions

    By integrating reasoning-based data through distillation, organizations can significantly improve model performance without bearing the full burden of human-annotated datasets. DeepSeek R1's ability to produce long reasoning chains makes it a powerful teacher model, showing that, in many cases, the machine may just out-teach the human.