Moonshot AI Releases Kimi K2.7-Code with Enhanced Performance for Software Engineering
Moonshot AI has launched Kimi K2.7-Code, a new open-source, coding-focused agentic model. Released under a Modified MIT license, it is built upon Kimi K2.6 and is accessible via the Kimi API and Kimi Code, with its weights available on Hugging Face. The model features a 256K context window and reportedly achieves a 21.8% performance increase on Kimi Code Bench v2 compared to its predecessor, K2.6. It is designed for long-horizon software engineering tasks, including planning, editing, running tools, and debugging.
Moonshot AI recently released Kimi K2.7-Code, an agentic model specifically designed for coding tasks. The model's weights are available on Hugging Face under a Modified MIT license, and it can also be accessed through the Kimi API and Kimi Code.
Kimi K2.7-Code targets long-horizon software engineering, focusing on capabilities such as planning, editing, running tools, and debugging across multiple steps. Moonshot AI pairs this model with a subscription-based coding platform.
Technically, K2.7-Code is a Mixture-of-Experts model, featuring 1 trillion total parameters with 32 billion active per token. It incorporates 384 experts, selecting eight per token with one shared. The architecture includes 61 layers, with one dense layer, and utilizes MLA for attention and SwiGLU for the feed-forward path. A MoonViT vision encoder adds 400 million parameters, enabling image and video input. The model supports native INT4 quantization and has a 256K token context window.
Moonshot AI reported performance gains across six benchmarks when comparing K2.7-Code to K2.6. Notably, it achieved a 21.8% improvement on Kimi Code Bench v2, increasing its score from 50.9 to 62.0. The model also surpassed Claude Opus 4.8 on MCP Mark Verified, scoring 81.1 against 76.4, and demonstrated performance close to GPT-5.5 on MLS Bench Lite.
Additionally, Moonshot AI reported approximately 30% lower reasoning-token usage for K2.7-Code compared to K2.6. This efficiency gain is presented as beneficial for interactive CLI sessions, reducing output-token costs per task, and allowing more steps before context limits are reached.
Primary use cases for K2.7-Code include repo-scale refactors, where the agent can read files, edit across modules, and rerun tests until successful. Other applications include code review, providing risk analysis for pull request diffs, and MCP tool-use workflows for tasks like CI checks, ticket updates, and file edits. Its long context window also supports extensive analysis with text, image, and video input.
Constraints for the model include a mandatory 'Thinking mode' and fixed sampling parameters (temperature 1.0, top_p 0.95, n 1, penalties 0.0), with a default maximum output of 32,768 tokens. Deployment is designed for server-class environments, with self-hosting options via vLLM, SGLang, or KTransformers, and a Hugging Face repository size of approximately 595 GB.
According to Marktechpost, the company-reported benchmarks and official API pricing were released and verified on June 12, 2026.


