Carol: AI-Pilled Daily

§ 01

News

What happens in the AI world every day

Sources: public RSS feeds from OpenAI, Anthropic, DeepMind, Moonshot AI, arXiv, and others. Updated automatically every day at 06:00 and 18:00.

2026.07.21 Hugging Face

The State of Simulation for Physical AI: An Overview

2026.07.21 Hacker News (AI)

AI makes programming differently difficult

2026.07.21 Hacker News (AI)

Jack Dorsey launches Buzz to combine team chat, AI agents and Git hosting

2026.07.21 OpenAI

Introducing the ChatGPT for small business program

OpenAI launches the ChatGPT for Small Businesses program, helping entrepreneurs build AI skills, automate work, and grow with ChatGPT Work.

2026.07.21 Google DeepMind

Introducing Gemini 3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash Cyber

We’re introducing new Gemini models, including Gemini 3.6 Flash, 3.5 Flash-Lite and 3.5 Flash Cyber.

2026.07.21 Hacker News (AI)

Claude Is Not a Compiler

2026.07.21 OpenAI

OpenAI and Hugging Face partner to address security incident during model evaluation

OpenAI and Hugging Face share early findings from a security incident during AI model evaluation, highlighting advanced cyber capabilities and lessons for defenders.

2026.07.21 arXiv cs.AI

Rater State Bias in RLHF Preference Data: An Audit Framework

We identify a structured confound in Reinforcement Learning from Human Feedback (RLHF). Pairwise preference labels are intended to reflect the compared outputs, but they may also reflect the rater's state during annotation. Under sustained stressful or distressing conditions, raters' preferences may shift over time. A…

2026.07.21 arXiv cs.AI

Design and Validation of a Lightweight 1D CNN for Affective Touch Classification in Soft Plush Companions

Soft, sensorized companions offer a physically safe and emotionally intuitive interface for socially assistive technologies, yet their deformability and multichannel tactile sensing complicate the robust interpretation of human affect. This study presents a complete open-source MATLAB-based framework for the developme…

2026.07.21 arXiv cs.AI

Some Large Language Models Exhibit Consistent Risk Attitudes

As artificial intelligence systems are deployed in open-ended, high-stakes settings, a critical dimension remains unmeasured: how perceived risk is translated into action. We test whether large language models (LLMs) exhibit systematic and consistent risk attitudes under uncertainty. We introduce a cross-domain framew…

2026.07.21 arXiv cs.AI

A Survey on GNN-based Link Prediction: Techniques, Applications, and Challenges

Graph Neural Networks (GNNs) have emerged as the leading paradigm for link prediction, enabling the inference of missing connections and the anticipation of potential future links. However, existing reviews lack systematic exploration specifically targeting underlying GNN architectures and diverse graph structures. To…

2026.07.21 arXiv cs.AI

PlanFlip: Attacking Multi-Agent LLM Systems via Planning-Phase Prompt Injection

Multi-agent LLM systems increasingly rely on a Planner to decompose goals into sub-task sequences that downstream Executor and Critic agents execute and audit. We identify the planning phase as a critical attack surface: a single injection into the Planner's context achieves cascade amplification, corrupting all downs…

2026.07.21 arXiv cs.AI

Deterministic Replay for AI Agent Systems

AI agent systems that couple large language models (LLMs) with external tools and APIs are inherently non-deterministic: LLM sampling variance, external API state, CDN infrastructure headers, and execution-environment noise collectively prevent any prior agent run from being faithfully re-executed. Existing observabil…

2026.07.21 arXiv cs.AI

Generative Ontology Induction: Domain-Agnostic Schema Discovery from Document Corpora Using Large Language Models

Ontology engineering remains a critical bottleneck in knowledge-intensive AI systems. Existing automated approaches either depend on predefined schemas, operate within narrow domains, or produce unstructured outputs unsuitable for downstream pipelines. We introduce Generative Ontology Induction (GOI), a domain-agnosti…

2026.07.21 arXiv cs.AI

Democratizing AI with Small Language Models: Structured Benchmarking and Parameter-Efficient Fine-Tuning for Local Deployment

AI democratization is not primarily a question of matching frontier-scale generality; it is a question of whether capable models can be selected, audited, and specialized under hardware and governance constraints that ordinary institutions can actually satisfy. This paper studies that problem through a controlled eval…

2026.07.21 Hacker News (AI)

Five US tech giants' hidden debts soar to $1.65T on opaque AI funding

2026.07.21 Hugging Face

Grabette: an open system to record robot-manipulation data

2026.07.21 OpenAI

David Vélez and Robin Vince join the boards of the OpenAI Foundation and OpenAI Group PBC

David Vélez and Robin Vince join the boards of the OpenAI Foundation and OpenAI Group PBC, bringing global leadership in finance, technology, and governance.

2026.07.20 Hacker News (AI)

How we measured AI writing across arXiv, and where the measurement breaks

2026.07.20 Hugging Face

Introducing Cosmos 3 Edge

2026.07.20 Hacker News (AI)

China’s open-weights AI strategy is winning

2026.07.20 OpenAI

Safety and alignment in an era of long-horizon models

OpenAI shares lessons from deploying long-running AI models, highlighting new safety risks, observed failures, and improved safeguards through iterative deployment.

2026.07.20 arXiv cs.AI

GraphDx: A Cost-Aware Knowledge-Enhanced Multi-Agent Framework for Sequential Diagnosis

Sequential diagnosis requires balancing diagnostic accuracy against resource costs through iterative information gathering. Existing Large Language Model (LLM) approaches exhibit a critical knowledge-reasoning gap: despite encoding extensive medical knowledge, they struggle to reason systematically under cost constrai…

2026.07.20 arXiv cs.AI

Causal-Audit: Explicit and Auditable Graph-based Reasoning via Target-Aware Causal Chain Construction

Causal and intervention-based question answering is fundamental to advancing large language models (LLMs) toward reasoning beyond surface-level correlations and understanding underlying causal mechanisms. However, existing LLM-based methods often rely on implicit language-level reasoning, resulting in opaque causal as…

2026.07.20 arXiv cs.AI

Cura 1T: Specialized Model for Agentic Healthcare

Healthcare spans high-stakes communication, expert reasoning, and workflow execution, yet specialized LLMs that cover these use cases together remain limited. A healthcare model must handle patient consultation, clinical reasoning over text and images, interactive diagnosis, and electronic health record (EHR) tool use…

2026.07.20 arXiv cs.AI

AnovaX: A Local, Multi-Agent Voice Assistant with LLM Planning, Typed Executors, and Adaptive Recovery

Desktop voice assistants are still dominated by cloud pipelines that ship raw audio off the machine and expose a fixed set of skills. We describe AnovaX, a small local-first assistant that runs entirely on the user's computer and treats the desktop itself as its action surface. A single Python process wires together a…

2026.07.20 arXiv cs.AI

Precise but Uncoupled: Reviewer Precision Does Not Guarantee Critique Uptake in Multi-Agent Math Reasoning

Many math- and science-oriented agent systems use hierarchical designs with specialized reviewer roles, assuming that a dedicated review stage should help turn wrong candidates into correct ones. We test this assumption on 4,181 verifier-grounded Omni-MATH problems using matched gpt-oss-120b actors. Collaboration adds…

2026.07.20 arXiv cs.AI

DrawingVQA: A Real-World Benchmark for Multi-Depth Visual-Textual Reasoning on Construction Drawings

We introduce DrawingVQA, the first benchmark designed to evaluate multimodal large language models (MLLMs) on real-world construction drawings -- a core media in architecture, civil, and many other engineering practices. Unlike natural images or schematic floor plans, construction drawings fuse abstract geometry, symb…

2026.07.20 arXiv cs.AI

Do Coding Agents Need Executable World Models, Simplification, and Verification to Solve ARC-AGI-3?

Our previous ARC-AGI-3 agent bundled executable world modeling, scheduled simplification, and exact replay verification, leaving unclear which idea accounted for its performance. We address this attribution question with four nested Codex-based agents: a textual baseline; a flexible-interface executable world model wi…

2026.07.20 arXiv cs.AI

Beyond a Joke: Multi-Angle Reasoning for Detecting and Explaining Harmful Humor in Memes

Internet memes intertwine visual cues, textual content, and cultural context, making them particularly challenging to interpret in scenarios where humor, sarcasm, and harmful intent coexist. These complexities highlight the need for explainable meme understanding systems that can provide reliable and structured reason…

2026.07.20 Hacker News (AI)

Claude Fable produced a counterexample to the Jacobian Conjecture

2026.07.20 Anthropic

Apply for Anthropic’s AI for Science rare disease research grants

2026.07.19 Hacker News (AI)

AI advice made people less accurate but more confident – study

2026.07.18 Hacker News (AI)

Setting up your spare Mac for Claude Code to control, a step-by-step guide

2026.07.18 Hacker News (AI)

GPT-5.6 used a prompt to close a 30-year gap in convex optimization

2026.07.18 Hacker News (AI)

What AI did to stackoverflow in a graph

2026.07.18 Hacker News (AI)

Why do AI company logos look like buttholes? (2025)

2026.07.18 Hacker News (AI)

Fable 5 vs. GPT-5.6 Sol on an NP-Hard Problem: Does /goal help?

2026.07.17 Hacker News (AI)

Kaiser nurses say AI, surveillance are making their jobs and patient care worse

2026.07.17 Hugging Face

Fine-tune video and image models at scale with NVIDIA NeMo Automodel and 🤗 Diffusers

2026.07.17 Google DeepMind

Introducing Gemini 3.5 Flash Cyber

Google introduces Gemini 3.5 Flash Cyber, a lightweight cybersecurity model to find and patch vulnerabilities.

2026.07.17 Hacker News (AI)

The state of open source AI

2026.07.17 Hacker News (AI)

Claude Code: Anatomy of a Misfeature

2026.07.17 OpenAI

A scorecard for the AI age

Sarah Friar, CFO of OpenAI, introduces a practical AI scorecard to measure ROI through useful work, cost per successful task, dependability, and return on compute.

2026.07.17 arXiv cs.AI

Intelligent Three Level Learning Architecture for Autonomous UAV Swarms in Search and Rescue

This paper presents a novel three level hierarchical learning architecture for autonomous UAV swarms performing search and rescue operations. Unlike conventional approaches that apply a single learning paradigm across all hierarchy levels, the proposed architecture integrates three qualitatively different learning mec…

2026.07.17 arXiv cs.AI

HG-RAG: Hierarchy-Guided Retrieval-Augmented Generation for Structured Knowledge Graphs

Retrieval Augmented Generation (RAG) has proven to be a widely successful process at improving the quality of outputs from a Large Language Model (LLM) for wider context. However, RAG systems typically retrieve context from flat document stores, which struggles when queries require hierarchical or relational reasoning…

2026.07.17 arXiv cs.AI

IMEX Interaction-Based Model Explanation

In predictive modeling, the ability to explain why a model produces a given target prediction has become increasingly important [5, 10]. Black-box models do not provide a transparent description of the internal mechanisms that generate the prediction, making even accurate predictions difficult to interpret and validat…

2026.07.17 arXiv cs.AI

RegNetAgents: A Multi-Agent Framework for Cross-Network Regulatory Driver Identification in Cancer Genomics

We introduce RegNetAgents, an AI-oriented multi-agent framework for structured, query-driven regulatory candidate identification across heterogeneous gene regulatory networks. The system enables unified analysis of bulk tumor and single-cell-derived ARACNe networks by integrating TCGA-derived cancer networks with larg…

2026.07.17 arXiv cs.AI

DialogueVPR: Towards Conversational Visual Place Recognition

Inspired by how humans communicate spatial information, language-guided geo-localization has gained significant traction for its intuitive and practical value. Despite this progress, most methods still rely on a static, one-shot retrieval paradigm, which fails to handle the ambiguity and incompleteness inherent in rea…

2026.07.17 arXiv cs.AI

Interpretable Language Model for Closed-Loop Type 1 Diabetes Control

Type 1 Diabetes (T1D) is a chronic, life-threatening autoimmune condition characterized by the complete destruction of insulin-producing pancreatic beta cells. While Artificial Pancreas Systems (APS) powered by Reinforcement Learning (RL) have shown promise in automating insulin delivery, their "black-box" nature ma…
