§ 01

News

What happened in AI today

Sources: OpenAI / Anthropic / DeepMind / Moonshot / arXiv and other public RSS feeds. Updated twice daily at 06:00 / 18:00.

Jun 06, 2026 · Hugging Face
Five labs, five minds: building a multi-model finance drama on small models
Jun 06, 2026 · Hacker News (AI)
Meta confirms 1000s of Instagram accounts were hacked by abusing its AI chatbot
Jun 06, 2026 · Hacker News (AI)
Police in England and Wales told to halt AI use in court statements
Jun 06, 2026 · Hacker News (AI)
US House lawmakers release draft bill to prohibit state AI rules
Jun 06, 2026 · arXiv cs.AI
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
This study analyzes a publicly released dataset from a discontinued field experiment on Reddit's r/ChangeMyView. The intervention, conducted by unknown, external researchers and halted following ethical backlash, involved undisclosed AI-generated accounts engaging users in live debate. After public disclosure, Reddit…
Jun 06, 2026 · arXiv cs.AI
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared cont…
Jun 06, 2026 · arXiv cs.AI
I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition
Multimodal memes are dynamic and often require up-to-date background knowledge for interpretation. Existing methods often overlook such knowledge or rely on fixed parametric knowledge of pretrained models that may be incomplete, outdated, or unavailable for emerging memes. We introduce Query Retrieve Conclude, a zero…
Jun 06, 2026 · arXiv cs.AI
GITCO: Gated Inference-Time Context Optimization in TSFMs
Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches capture disproportionate attention and silently degrade zero-shot forecast quality. We propose improving TSFM accuracy at inference time by optimizing the input context rather than modifying model weights. W…
Jun 06, 2026 · arXiv cs.AI
Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory
Returned products in circular factories re-enter production with heterogeneous degradation states, usage histories, and remaining capability. Reuse cannot be decided from the current inspection alone, because future function fulfillment and component integrity may evolve differently under the next service scenario. Ex…
Jun 06, 2026 · arXiv cs.AI
SentinelBench: A Benchmark for Long-Running Monitoring Agents
AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default model of agent behavior is continuous action: issuing tool calls, refreshing pages, searching for alternatives, or otherwise trying to force progress. This is the wrong approach for many long-running tasks, which a…
Jun 06, 2026 · arXiv cs.AI
An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
Purpose: To develop an interpretable and trustworthy AI framework that combines deep learning based MRI Osteoarthritis Knee Score (MOAKS) prediction with interpretable statistical modeling to study structure-pain relationships at scale using data from the Osteoarthritis Initiative (OAI). Materials and Methods: We firs…
Jun 06, 2026 · arXiv cs.AI
Synthetic Contrastive Reasoning for Multi-Table Q&A
Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how answers are derived. To address thi…
Jun 06, 2026 · Hacker News (AI)
Ask HN: Why is the HN crowd so anti-AI?
Jun 05, 2026 · Hugging Face
Thousand Token Wood: shipping a multi-agent economy on a 3B model
Jun 05, 2026 · Hacker News (AI)
Hacker News, Sans AI
Jun 05, 2026 · Hacker News (AI)
Ask HN: What is your (AI) dev tech stack / workflow?
Jun 05, 2026 · Hacker News (AI)
Did Claude increase bugs in rsync?
Jun 05, 2026 · Hacker News (AI)
Programmers will document for Claude, but not for each other
Jun 05, 2026 · Kimi
1.47.0
What's Changed
fix(tools): include trailing output in error briefs and render brief as plain text by @liruifengv in #2389
docs: rename project to Kimi CLI and link to Kimi Code CLI successor by @RealKai42 in #2431
feat(shell): guide users to upgrade to the new Kimi Code by @RealKai42 in #2432
chore(release): bump kimi…
Jun 05, 2026 · Hacker News (AI)
Show HN: Lowfat – pluggable CLI filter that saved 91.8% of my LLM tokens
Jun 05, 2026 · Hacker News (AI)
Fine-tuning an LLM to write docs like it's 1995
Jun 05, 2026 · Hacker News (AI)
The Pentagon is running an AI propaganda mill targeting Latin America
Jun 05, 2026 · Hacker News (AI)
Open Code Review – An AI-powered code review CLI tool
Jun 04, 2026 · Hacker News (AI)
Anthropic's open-source framework for AI-powered vulnerability discovery
Jun 04, 2026 · Hugging Face
Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI
Jun 04, 2026 · Hacker News (AI)
When AI Builds Itself: Our progress toward recursive self-improvement
Jun 04, 2026 · Hacker News (AI)
Google employees internally share memes about how its AI sucks
Jun 04, 2026 · Hacker News (AI)
The LLM warnings Google fired Timnit Gebru over have all come true
Jun 04, 2026 · Hugging Face
EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios
Jun 04, 2026 · OpenAI
How Endava is redesigning software delivery around AI agents
Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.
Jun 04, 2026 · OpenAI
Dreaming: Better memory for a more helpful ChatGPT
ChatGPT introduces a new memory system to better remember preferences, keeping context fresh and relevant across conversations.
Jun 04, 2026 · arXiv cs.AI
Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
Pre-deployment verification of enterprise artificial intelligence (AI) agents remains a critical gap between large language model (LLM) capability benchmarking and production deployment. Post-deployment monitoring, human-in-the-loop controls, and prompt-level guardrails offer limited assurance once an agent is operati…
Jun 04, 2026 · arXiv cs.AI
Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection
Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that this picture is inaccurate on two accounts, both in how AI emotional su…
Jun 04, 2026 · arXiv cs.AI
Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research
Large language models are reshaping research practice while quietly eroding researchers' epistemic accountability. This commentary introduces PEEL - Protocols for Epistemically Engaged Literacy in AI, a working scaffolding that combines deterministic distant reading via Voyant Tools with LLM interpretation via Claude,…
Jun 04, 2026 · arXiv cs.AI
SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models
As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and make decisions under uncertainty. We introduce SMAC-Talk, a natural language extension…
Jun 04, 2026 · arXiv cs.AI
Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal
Multi-agent systems are commonly designed to reduce disagreement through voting, consensus protocols, debate, or fault-tolerant aggregation. We argue that this objective is insufficient for value-laden tasks, where disagreement may reflect genuine normative uncertainty rather than agent error. Building on prior work o…
Jun 04, 2026 · arXiv cs.AI
VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scie…
Jun 04, 2026 · arXiv cs.AI
StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis
Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel framework that combines stepwise trajectory modeling, process-reward modeling (PRM), and retr…
Jun 04, 2026 · arXiv cs.AI
Can Generalist Agents Automate Data Curation?
Curating training data is among the most consequential yet labor-intensive parts of modern AI development: practitioners iteratively propose, implement, evaluate, and revise data policies against noisy benchmark feedback. We ask whether generalist coding agents can automate this data-curation loop. We introduce *Curat…
Jun 04, 2026 · Hacker News (AI)
The ways we contain Claude across products
Jun 04, 2026 · Hacker News (AI)
Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes
Jun 04, 2026 · OpenAI
Biodefense in the Intelligence Age
An action plan for AI-powered biological resilience
Jun 04, 2026 · Hugging Face
Designing the hf CLI as an agent-optimized way to work with the Hub
Jun 03, 2026 · OpenAI
Introducing new capabilities to GPT-Rosalind
GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.
Jun 03, 2026 · Hugging Face
Direct Preference Optimization Beyond Chatbots
Jun 03, 2026 · Hacker News (AI)
32GB of DDR5 now costs $375 – AI shortage continues to squeeze PC building
Jun 03, 2026 · Hacker News (AI)
Uber's $1,500/month AI limit is a useful signal for AI tool pricing
Jun 03, 2026 · OpenAI
How Wasmer used Codex to build a Node.js runtime for the edge
See how Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, accelerating development 10x to 20x and shipping in weeks instead of months.
Jun 03, 2026 · Hacker News (AI)
Mathematicians issue warning as AI rapidly gains ground
Jun 03, 2026 · OpenAI
A blueprint for democratic governance of frontier AI
OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security.