A17news

AI news in a flash - the fastest way to know what's happening in AI. Short, sharp, and always up to date.

Models

New releases, benchmarks, and capability breakthroughs - every model that matters.

Models · Jun 9

Claude Fable 5 Integrated into GitHub Copilot

Anthropic's Claude Fable 5, the first model in its Mythos class and designed for long-horizon, autonomous coding and knowledge-work tasks, is now available in GitHub Copilot.

Source: GitHub Changelog · github.blog

Models · May 30

Creator Furious as Amazon Uses AI to License Character for New Series Without Consent

Loryn Brantz, the creator of 'The Good Advice Cupcake,' is protesting Amazon's use of her character for a new AI-animated series without securing her permission.

Source: Wired AI · wired.com

Models · May 30

StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Agents

StepFun has released Step 3.7 Flash, a 198B Mixture-of-Experts (MoE) Vision-Language Model designed specifically for coding agents and search workflows.

Source: MarkTechPost · marktechpost.com

Models · May 30

NVIDIA Introduces X-Token: New Cross-Tokenizer KD Outperforms GOLD on Llama-3.2-1B

NVIDIA has released X-Token, a new projection-guided cross-tokenizer KD method that fixes structural failures in GOLD and significantly improves accuracy on Llama models.

Source: MarkTechPost · marktechpost.com

Models · May 30

The AI Code Dilemma: Researchers Warn Coders of a Speed vs. Quality Trade-off

Researchers warn that while AI accelerates code production, the quality of the output may be compromised, posing long-term risks for developers.

Source: TechCrunch AI · techcrunch.com

Models · May 30

Gemini Agent Tries to Plan Birthday, Fails to Recognize Most Important Person

A user tested Google's Gemini AI agent by asking it to plan a birthday party, only to find that it missed the most important person, highlighting current limitations in understanding real-world personal context.

Source: Wired AI · wired.com

Models · May 30

The Interacting Complexity of LLM Serving Optimization

Tuning modern Large Language Model (LLM) serving is challenging because many deployment choices interact and shift bottlenecks across the entire stack.

Source: NVIDIA Developer Blog · developer.nvidia.com

Models · May 30

Anthropic Releases Opus 4.8 with Dynamic Workflow Tool

The new Opus 4.8 model is now integrated with Dynamic Workflows, a tool for coordinating swarms of subagents.

Source: TechCrunch AI · techcrunch.com

Models · May 30

AI Token Futures: Exchanges Design Derivative Products for AI Tokens

Large exchanges are creating derivative products around AI tokens, reflecting a growing view of these tokens as raw material inputs rather than mere computational outputs.

Source: TechCrunch AI · techcrunch.com

Models · May 29

OpenAI Launches Rosalind Biodefense to Advance Pandemic Preparedness

OpenAI is expanding trusted access to the GPT-Rosalind model for vetted developers and U.S. government partners to enhance biodefense and public health preparedness through frontier AI.

Source: OpenAI News · openai.com

Models · May 26

Co-ReAct: Using Rubrics as Step-Level Guides to Enhance Agent Reasoning

Introducing Co-ReAct, a novel framework that injects step-level rubrics into ReAct agents to guide their decision-making during inference, leading to more targeted and effective reasoning in complex, multi-step tasks.

Source: arXiv – AI · arxiv.org

Models · May 26

OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

This research introduces Oracle-Prompted Policy Optimization (OPPO), a novel reinforcement learning approach that provides per-token credit assignment for LLM reasoning. OPPO leverages Bayesian updates to accumulate oracle signals along a trajectory, yielding token-level advantage…

Source: arXiv – AI · arxiv.org

Models · May 26

AI with Model-Based Design: End-to-End Virtual Sensor Modeling Workflow

A webinar presentation detailing an end-to-end workflow for designing, training, validating, compressing, and deploying AI-based virtual sensor models onto embedded processors.

Source: IEEE Spectrum AI · content.knowledgehub.wiley.com

Models · May 26

EDGE-OPD: Internalizing Privileged Context in LLM Distillation

This paper introduces Evidence Guided On-Policy Distillation (EDGE-OPD), a novel method for distilling privileged context (like personas or private facts) into Large Language Models. It addresses the challenge that privileged information can degrade general capabilities during tr…

Source: arXiv – AI · arxiv.org

Models · May 26

Epistemic Calibration Solves Planning Failures in LLM Multi-Agent Systems

Researchers propose the Epistemic Planning Calibration Agentic Workflow (EPC-AW) to mitigate planning failures in LLM-based multi-agent systems caused by agents misjudging what they know about plan feasibility. The method improves system-level success by an average of 9.75%.

Source: arXiv – AI · arxiv.org

Models · May 26

Privacy-Preserving Federated Recommender System for Mobile Devices

A new two-stage federated recommendation system that protects sensitive user data by keeping it on the mobile device, only sharing model updates during the training process.

Source: arXiv – Machine Learning · arxiv.org

Models · May 26

PathCal: State-Aware Calibration for Efficient Reasoning in Language Models

Introducing PathCal, a novel training-free decoding controller that calibrates reasoning paths by distinguishing the functional roles of the reflection markers used in Chain-of-Thought generation, leading to improved efficiency and accuracy.

Source: arXiv – AI · arxiv.org

Models · May 26

Latent Cache Flow: Enabling Efficient Model-to-Model Communication Without Text

New research introduces Latent Cache Flow (LCF), a method for communicating efficiently between LLM agents by transferring KV caches directly. Compared to existing methods, LCF significantly reduces transfer size and latency by jointly compressing keys and values and handling differing contexts.

Source: arXiv – Machine Learning · arxiv.org

Models · May 26

Reading Calibrated Uncertainty from Language Model Trajectories

This research introduces a method to better quantify uncertainty in language model generation by analyzing the geometric trajectories of internal activation updates across layers, showing that these paths reveal where and how errors accumulate.

Source: arXiv – Machine Learning · arxiv.org

Models · May 26

FusionSense: Tri-Stage Learning for Energy-Efficient Multimodal Edge Intelligence

FusionSense introduces a novel, runtime-adaptive framework that enables energy-constrained autonomous systems to intelligently decide what to compute and transmit at the edge by learning cross-modal dependencies, achieving significant reductions in energy consumption and data transmission

Source: arXiv – Machine Learning · arxiv.org

Models · May 26

Parallel Compaction Improves Efficiency and Control for Long-Horizon LLM Agent Serving

A new method, parallel compaction, is introduced to address context window limitations in long-horizon LLM agents by providing operators with fine-grained control over context summarization and significantly reducing inference time.

Source: arXiv – AI · arxiv.org

Models · May 26

S³GNN: Efficient Global Mixing and Local Message Passing for Long-Range Graph Learning

This paper introduces S³GNN, a novel method designed to mitigate the over-squashing (OSQ) phenomenon in Message-Passing Neural Networks (MPNNs) by efficiently incorporating global information mixing, resulting in significant error reduction and parameter savings.

Source: arXiv – Machine Learning · arxiv.org

Models · May 26

Gemma 4 Achieves Up to 3x Faster Token Generation with Multi-Token Prediction

Gemma 4 models, when paired with multi-token prediction (MTP) drafters utilizing speculative decoding, can achieve up to 3x faster inference speeds without compromising output quality.

Source: InfoQ AI & ML · infoq.com

Models · May 25

Microsoft Launches MDASH: An AI Platform for Large-Scale Vulnerability Research

Microsoft has introduced MDASH, an AI-driven security platform that uses a multi-model agentic system to automate large-scale code auditing and vulnerability discovery across Microsoft software environments.

Source: InfoQ AI & ML · infoq.com

Models · May 25

Comparing Federated Learning Algorithms (FedAvg vs. FedProx) on Non-IID Data using NVIDIA FLARE

A step-by-step guide detailing an advanced federated learning experiment comparing FedAvg and FedProx on a non-IID CIFAR-10 dataset, utilizing the NVIDIA FLARE framework.

Source: MarkTechPost · marktechpost.com

Models · May 24

Building Advanced LLM Workflows with the SuperClaude Framework

A tutorial detailing how to construct an advanced workflow leveraging the SuperClaude Framework to structure interactions with the Anthropic API, incorporating concepts like commands, agents, modes, and session memory.

Source: MarkTechPost · marktechpost.com

Models · May 24

Nemotron-Labs Achieves Speed-of-Light Text Generation using Diffusion Language Models

Nemotron-Labs has developed novel Diffusion Language Models designed to drastically accelerate the generation of high-quality text, pushing the boundaries of generative AI speed.

Source: Hugging Face Blog · huggingface.co

Models · May 24

NVIDIA Releases Gated DeltaNet-2: Decoupling Erase and Write in the Delta Rule

NVIDIA introduces Gated DeltaNet-2, a novel linear attention layer that decouples the erasing and writing processes within the KV cache, achieving state-of-the-art performance across various language modeling and retrieval benchmarks.

Source: MarkTechPost · marktechpost.com
