§ 01
News
What happened in AI today
Sources: OpenAI / Anthropic / DeepMind / Moonshot / arXiv and other public RSS feeds. Updated twice daily at 06:00 / 18:00.
Jun 06, 2026 · Hugging Face · Five labs, five minds: building a multi-model finance drama on small models
Jun 06, 2026 · Hacker News (AI) · Meta confirms 1000s of Instagram accounts were hacked by abusing its AI chatbot
Jun 06, 2026 · Hacker News (AI) · Police in England and Wales told to halt AI use in court statements
Jun 06, 2026 · Hacker News (AI) · US House lawmakers release draft bill to prohibit state AI rules
Jun 06, 2026 · arXiv cs.AI · How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
  This study analyzes a publicly released dataset from a discontinued field experiment on Reddit's r/ChangeMyView. The intervention, conducted by unknown, external researchers and halted following ethical backlash, involved undisclosed AI-generated accounts engaging users in live debate. After public disclosure, Reddit…
Jun 06, 2026 · arXiv cs.AI · What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
  Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared cont…
Jun 06, 2026 · arXiv cs.AI · I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition
  Multimodal memes are dynamic and often require up to date background knowledge for interpretation. Existing methods often overlook such knowledge or rely on fixed parametric knowledge of pretrained models that may be incomplete, outdated, or unavailable for emerging memes. We introduce Query Retrieve Conclude, a zero…
Jun 06, 2026 · arXiv cs.AI · GITCO: Gated Inference-Time Context Optimization in TSFMs
  Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches capture disproportionate attention and silently degrade zero-shot forecast quality. We propose improving TSFM accuracy at inference time by optimizing the input context rather than modifying model weights. W…
Jun 06, 2026 · arXiv cs.AI · Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory
  Returned products in circular factories re-enter production with heterogeneous degradation states, usage histories, and remaining capability. Reuse cannot be decided from the current inspection alone, because future function fulfillment and component integrity may evolve differently under the next service scenario. Ex…
Jun 06, 2026 · arXiv cs.AI · SentinelBench: A Benchmark for Long-Running Monitoring Agents
  AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default model of agent behavior is continuous action: issuing tool calls, refreshing pages, searching for alternatives, or otherwise trying to force progress. This is the wrong approach for many long-running tasks, which a…
Jun 06, 2026 · arXiv cs.AI · An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
  Purpose: To develop an interpretable and trustworthy AI framework that combines deep learning based MRI Osteoarthritis Knee Score (MOAKS) prediction with interpretable statistical modeling to study structure-pain relationships at scale using data from the Osteoarthritis Initiative (OAI). Materials and Methods: We firs…
Jun 06, 2026 · arXiv cs.AI · Synthetic Contrastive Reasoning for Multi-Table Q&A
  Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how answers are derived. To address thi…
Jun 06, 2026 · Hacker News (AI) · Ask HN: Why is the HN crowd so anti-AI?
Jun 05, 2026 · Hugging Face · Thousand Token Wood: shipping a multi-agent economy on a 3B model
Jun 05, 2026 · Hacker News (AI) · Hacker News, Sans AI
Jun 05, 2026 · Hacker News (AI) · Ask HN: What is your (AI) dev tech stack / workflow?
Jun 05, 2026 · Hacker News (AI) · Did Claude increase bugs in rsync?
Jun 05, 2026 · Hacker News (AI) · Programmers will document for Claude, but not for each other
Jun 05, 2026 · Kimi · 1.47.0
  What's Changed: fix(tools): include trailing output in error briefs and render brief as plain text by @liruifengv in #2389; docs: rename project to Kimi CLI and link to Kimi Code CLI successor by @RealKai42 in #2431; feat(shell): guide users to upgrade to the new Kimi Code by @RealKai42 in #2432; chore(release): bump kimi…
Jun 05, 2026 · Hacker News (AI) · Show HN: Lowfat – pluggable CLI filter that saved 91.8% of my LLM tokens
Jun 05, 2026 · Hacker News (AI) · Fine-tuning an LLM to write docs like it's 1995
Jun 05, 2026 · Hacker News (AI) · The Pentagon is running an AI propaganda mill targeting Latin America
Jun 05, 2026 · Hacker News (AI) · Open Code Review – An AI-powered code review CLI tool
Jun 04, 2026 · Hacker News (AI) · Anthropic's open-source framework for AI-powered vulnerability discovery
Jun 04, 2026 · Hugging Face · Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI
Jun 04, 2026 · Hacker News (AI) · When AI Builds Itself: Our progress toward recursive self-improvement
Jun 04, 2026 · Hacker News (AI) · Google employees internally share memes about how its AI sucks
Jun 04, 2026 · Hacker News (AI) · The LLM warnings Google fired Timnit Gebru over have all come true
Jun 04, 2026 · Hugging Face · EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios
Jun 04, 2026 · OpenAI · How Endava is redesigning software delivery around AI agents
  Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.
Jun 04, 2026 · OpenAI · Dreaming: Better memory for a more helpful ChatGPT
  ChatGPT introduces a new memory system to better remember preferences, keeping context fresh and relevant across conversations.
Jun 04, 2026 · arXiv cs.AI · Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
  Pre-deployment verification of enterprise artificial intelligence (AI) agents remains a critical gap between large language model (LLM) capability benchmarking and production deployment. Post-deployment monitoring, human-in-the-loop controls, and prompt-level guardrails offer limited assurance once an agent is operati…
Jun 04, 2026 · arXiv cs.AI · Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection
  Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that this picture is inaccurate on two accounts, both in how AI emotional su…
Jun 04, 2026 · arXiv cs.AI · Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research
  Large language models are reshaping research practice while quietly eroding researchers' epistemic accountability. This commentary introduces PEEL - Protocols for Epistemically Engaged Literacy in AI, a working scaffolding that combines deterministic distant reading via Voyant Tools with LLM interpretation via Claude,…
Jun 04, 2026 · arXiv cs.AI · SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models
  As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and make decisions under uncertainty. We introduce SMAC-Talk, a natural language extension…
Jun 04, 2026 · arXiv cs.AI · Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal
  Multi-agent systems are commonly designed to reduce disagreement through voting, consensus protocols, debate, or fault-tolerant aggregation. We argue that this objective is insufficient for value-laden tasks, where disagreement may reflect genuine normative uncertainty rather than agent error. Building on prior work o…
Jun 04, 2026 · arXiv cs.AI · VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
  Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scie…
Jun 04, 2026 · arXiv cs.AI · StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis
  Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel framework that combines stepwise trajectory modeling, process-reward modeling (PRM), and retr…
Jun 04, 2026 · arXiv cs.AI · Can Generalist Agents Automate Data Curation?
  Curating training data is among the most consequential yet labor-intensive parts of modern AI development: practitioners iteratively propose, implement, evaluate, and revise data policies against noisy benchmark feedback. We ask whether generalist coding agents can automate this data-curation loop. We introduce *Curat…
Jun 04, 2026 · Hacker News (AI) · The ways we contain Claude across products
Jun 04, 2026 · Hacker News (AI) · Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes
Jun 04, 2026 · OpenAI · Biodefense in the Intelligence Age
  An action plan for AI-powered biological resilience
Jun 04, 2026 · Hugging Face · Designing the hf CLI as an agent-optimized way to work with the Hub
Jun 03, 2026 · OpenAI · Introducing new capabilities to GPT-Rosalind
  GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.
Jun 03, 2026 · Hugging Face · Direct Preference Optimization Beyond Chatbots
Jun 03, 2026 · Hacker News (AI) · 32GB of DDR5 now costs $375 – AI shortage continues to squeeze PC building
Jun 03, 2026 · Hacker News (AI) · Uber's $1,500/month AI limit is a useful signal for AI tool pricing
Jun 03, 2026 · OpenAI · How Wasmer used Codex to build a Node.js runtime for the edge
  See how Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, accelerating development 10x to 20x and shipping in weeks instead of months.
Jun 03, 2026 · Hacker News (AI) · Mathematicians issue warning as AI rapidly gains ground
Jun 03, 2026 · OpenAI · A blueprint for democratic governance of frontier AI
  OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security.