MLOps in Production: Agents, Security, and Reality Checks | 2026-03-12
🔥 Story of the Day
Runpod report: Qwen has overtaken Meta's Llama as the most-deployed self-hosted LLM – The New Stack
https://thenewstack.io/runpod-ai-infrastructure-reality/
Runpod's "State of AI" report challenges industry survey data by leveraging raw infrastructure exhaust logs from over 500,000 developers to reveal that Qwen has surpassed Meta's Llama as the most-deployed self-hosted model in actual production environments. Nvidia's new Nemotron 3 Super, a 120B open model with a 1M-token context window using a hybrid latent mixture-of-experts architecture, is now available via NIMs and major cloud platforms, making large-scale agentic workloads more accessible without immediate hardware acquisition.
This shift matters for DevOps engineers evaluating workload distribution strategies: it validates Qwen's efficiency for use cases where Llama's dominance rested on hype cycles rather than empirical adoption. The data-driven approach contrasts sharply with anecdotal evidence and marketing narratives, giving a concrete baseline for prioritizing model selection and resource allocation in MLOps pipelines running on Kubernetes clusters.
⚡ Quick Hits
Rakuten fixes issues twice as fast with Codex – OpenAI
https://openai.com/index/rakuten
Rakuten integrated Codex, OpenAI's coding agent, into its development workflow to accelerate software delivery and improve safety, automating CI/CD reviews and constructing full-stack builds. The integration cut Mean Time To Recovery (MTTR) by 50% and let the team deliver complete build cycles in weeks instead of months, showing that AI agents can handle the deployment and maintenance work that comes with running self-hosted LLMs while streamlining DevOps pipelines for rapid iteration.
Designing AI agents to resist prompt injection – OpenAI
https://openai.com/index/designing-agents-to-resist-prompt-injection
OpenAI outlines a security mechanism for ChatGPT designed to mitigate prompt injection and social engineering attacks within autonomous agent workflows by enforcing strict constraints on risky actions and implementing protective layers to safeguard sensitive data. While the specifics of latency impacts or reduction rates are not detailed in this overview, the core approach involves defining a defensive strategy that limits execution boundaries during complex multi-step tasks.
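The post doesn't publish implementation code, but the "strict constraints on risky actions" idea can be illustrated with a minimal tool-call gate. Everything below is a hypothetical sketch (the names `ToolCall`, `gate`, and the tool lists are illustrative, not OpenAI's API):

```python
# Hypothetical sketch: gating risky agent actions behind an allowlist.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)

# Actions the agent may take without confirmation.
SAFE_TOOLS = {"read_file", "search_docs"}
# Actions that must never be triggered directly by untrusted content
# (e.g. instructions embedded in a fetched web page).
RISKY_TOOLS = {"send_email", "delete_file", "run_shell"}

def gate(call: ToolCall, from_untrusted_input: bool) -> str:
    """Return 'allow', 'confirm', or 'block' for a proposed tool call."""
    if call.name in SAFE_TOOLS:
        return "allow"
    if call.name in RISKY_TOOLS and from_untrusted_input:
        # Injected instructions cannot reach risky actions directly.
        return "block"
    # Risky but user-initiated: escalate for explicit confirmation.
    return "confirm"
```

The key design point, consistent with the approach described above, is that the decision depends on provenance (whether the request originated from untrusted input), not just on which tool is being called.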
From model to agent: Equipping the Responses API with a computer environment – OpenAI
https://openai.com/index/equip-responses-api-computer-environment
OpenAI describes an agent runtime leveraging the newly released Responses API, shell tools, and hosted containers to facilitate secure and scalable operations for agents interacting with files, tools, and state. The architecture integrates API logic, a shell-based execution context, and containerized hosting to create a cohesive environment where AI agents can safely execute complex workflows without managing every low-level dependency from scratch.
OneCLI – Vault for AI Agents in Rust – Y Combinator
https://github.com/onecli/onecli
The GitHub repository announcement for OneCLI labels it as a "Vault for AI Agents" built in Rust; the provided summary contains no specific implementation details or technical specs beyond this designation.
An agent skill for eval-driven development of LLM-powered app – Hacker News - LLM
https://github.com/yiouli/pixie-qa
A custom tool titled "pixie-qa" was created to automate the refinement of LLM output quality, though no specific technology stack or efficiency metrics are described in the announcement text.
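Since the announcement gives no implementation details, here is only a generic sketch of what eval-driven development means in practice (pin a small golden set of prompts and checks, run the app against it on every change); the names and structure below are illustrative, not pixie-qa's actual design:

```python
# Generic eval-harness sketch: a golden set of prompts with expected
# substrings, checked against an LLM-backed callable on each change.
GOLDEN_SET = [
    {"prompt": "What is 2+2?", "must_contain": "4"},
    {"prompt": "Capital of France?", "must_contain": "Paris"},
]

def run_evals(app, cases):
    """Return (pass_count, failures) for an LLM-backed callable `app`."""
    failures = []
    for case in cases:
        output = app(case["prompt"])
        if case["must_contain"] not in output:
            failures.append({"case": case, "output": output})
    return len(cases) - len(failures), failures
```

The point of the pattern is that evals, not ad-hoc spot checks, become the regression gate for prompt and model changes.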
Slicing an 80B MoE LLM into 40B domain specialists – Hacker News - LLM
https://github.com/JThomas-CoE/College-of-Experts-AI/tree/main/CoE-Demo-v1.5
The project "College-of-Experts-AI" demonstrates a method for slicing an 80B MoE LLM into 40B domain specialists at Demo v1.5, but specific architectural diagrams or performance metrics are not included in the provided repository metadata.
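The repository metadata doesn't explain the slicing method, so the following is purely illustrative of one common way such pruning can work (and not necessarily this project's approach): keep only the experts that a domain's traffic actually routes to, shrinking the deployed model:

```python
# Illustrative sketch of expert selection by routing frequency:
# given a log of which experts the router chose on domain-specific
# data, keep the most-used fraction of experts for the sliced model.
from collections import Counter

def select_experts(routing_log, n_experts, keep_fraction=0.5):
    """routing_log: expert ids chosen by the router on domain inputs.
    Returns the ids of the top `keep_fraction` most-used experts."""
    counts = Counter(routing_log)
    keep = max(1, int(n_experts * keep_fraction))
    return {expert for expert, _ in counts.most_common(keep)}
```

In a real MoE model, the retained experts' weights would then be extracted and the router remapped to the smaller expert set.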
Researcher: qwen3.5:9b • Writer: qwen3.5:9b • Editor: qwen3.5:9b