DeepSeek-V3.2 and V3.2-Speciale

DeepSeek has announced DeepSeek-V3.2 and a related high-compute variant, DeepSeek-V3.2-Speciale. DeepSeek describes the release as focused on (1) a sparse attention method for long-context efficiency, (2) a scaled reinforcement-learning (RL) post-training protocol, and (3) a large-scale synthetic data pipeline for tool-use and agent tasks.

What was released

  • DeepSeek-V3.2 (model weights released on Hugging Face, MIT license).
  • DeepSeek-V3.2-Speciale (model released on Hugging Face, MIT license). The model documentation states it is intended for deep reasoning and does not support tool calling.
  • API availability: DeepSeek’s API release note states V3.2-Speciale is exposed via a temporary endpoint and available until Dec 15, 2025, 15:59 UTC (with “same pricing as V3.2” and “no tool calls”).
  • API model details: DeepSeek’s API documentation lists a 128K context length and indicates that tool calls are available for V3.2 (thinking and non-thinking modes) but not for V3.2-Speciale; a hedged request sketch follows this list.
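
For orientation, the sketch below shows how a tool-call request is typically structured against an OpenAI-compatible chat-completions endpoint, the interface style DeepSeek’s API documentation follows. The base URL, the model identifier (`deepseek-chat` is a placeholder here), and the weather tool are assumptions for illustration and should be checked against the “Models & Pricing” page; per the notes above, the temporary V3.2-Speciale endpoint does not accept tools at all.

```python
# Minimal sketch of a tool-call request against an OpenAI-compatible
# chat-completions API. The base URL, model name, and tool schema are
# placeholders/assumptions -- verify them against DeepSeek's API docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",               # assumption: key issued via the DeepSeek console
    base_url="https://api.deepseek.com",  # assumption: OpenAI-compatible base URL
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",            # hypothetical tool, for illustration only
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-chat",                # placeholder id; the Speciale endpoint rejects `tools`
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```

If the model decides a tool is needed, `tool_calls` carries the structured call; otherwise it is `None` and the reply text is in `message.content`.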

Technical claims (as stated by DeepSeek)

  • DeepSeek Sparse Attention (DSA): Described as a two-part mechanism, an indexing component plus fine-grained token selection that retrieves a top-k subset of key/value entries; the indexer is described as implementable in FP8 for efficiency (a toy sketch of this selection pattern follows the list).
  • Scaled RL post-training: The paper describes a framework that allocates a post-training compute budget exceeding 10% of the pre-training cost.
  • Agent/task synthesis scale: The paper states the pipeline generated 1,800+ environments and 85,000 complex prompts to support agent-focused training.
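
To make the top-k selection pattern concrete, here is a toy, single-head reconstruction in PyTorch: a low-dimensional indexer scores the past tokens for each query, only the top-k keys/values are gathered, and exact attention runs over that subset. This is an illustrative sketch under stated assumptions, not DeepSeek’s implementation; in particular the indexer here is a plain full-precision dot product (DeepSeek describes an FP8-capable indexer), and the shapes, names, and k value are invented.

```python
# Toy single-head sketch of indexer-guided top-k sparse attention.
# Not DeepSeek's DSA implementation: the indexer is a plain dot product
# in full precision, and all shapes/names are illustrative assumptions.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, idx_q, idx_k, top_k):
    # q, k, v:      (seq, d)     per-head query/key/value projections
    # idx_q, idx_k: (seq, d_idx) low-cost indexer projections
    # top_k:        number of key/value entries each query attends to
    seq, d = q.shape
    scores = idx_q @ idx_k.T                                  # (seq, seq) cheap indexer scores
    causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))       # queries only look backwards
    k_eff = min(top_k, seq)
    sel = scores.topk(k_eff, dim=-1).indices                  # (seq, k_eff) selected positions

    k_sel, v_sel = k[sel], v[sel]                             # (seq, k_eff, d) gathered K/V
    attn = torch.einsum("sd,skd->sk", q, k_sel) / d ** 0.5    # exact attention on the subset
    invalid = torch.gather(scores, 1, sel) == float("-inf")   # early queries with < k_eff history
    attn = attn.masked_fill(invalid, float("-inf"))
    return torch.einsum("sk,skd->sd", F.softmax(attn, dim=-1), v_sel)

# toy usage: 16 tokens, each query attends to at most 4 past tokens
seq, d, d_idx = 16, 32, 8
q, k, v = (torch.randn(seq, d) for _ in range(3))
idx_q, idx_k = torch.randn(seq, d_idx), torch.randn(seq, d_idx)
out = topk_sparse_attention(q, k, v, idx_q, idx_k, top_k=4)
print(out.shape)  # torch.Size([16, 32])
```

The point of the structure is that the quadratic work is confined to the cheap indexer scores, while the exact attention, the expensive part at full head dimension, touches only k entries per query.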

Evaluation claims (as stated by DeepSeek)

DeepSeek reports results across reasoning and agentic benchmarks (including AIME 2025, HMMT 2025, Codeforces, SWE-bench Verified, Terminal-Bench 2.0, and a “Tool Decathlon”) and positions V3.2 and V3.2-Speciale against proprietary baselines in summary figures. DeepSeek also states that it published selected olympiad case materials in the model assets for secondary verification.


Primary sources to consult

  • The DeepSeek-V3.2 technical report (PDF).
  • The DeepSeek API “DeepSeek-V3.2 Release” post (endpoint window and feature notes).
  • The DeepSeek API “Models & Pricing” documentation (context length, tool-call support, endpoints).
  • Hugging Face model pages for DeepSeek-V3.2 and DeepSeek-V3.2-Speciale (release packaging and assets).

Deployment / serving notes to consult

  • vLLM support notes for DeepSeek-V3.2-Exp sparse attention (implementation details for serving).
  • The DeepSeek-V3.2-Exp repository documentation (artifacts and benchmark tables).

Independent reporting/context to consult

  • Reuters reporting on DeepSeek’s sparse attention and cost framing around the V3.2-Exp line.
  • Wall Street Journal reporting on sparse attention and efficiency claims.
  • Tom’s Hardware reporting focused on deployment ecosystems and hardware support.
