Carol — The AI-Pilled Daily

§ 01

News

What happened in AI today

Sources: OpenAI / Anthropic / DeepMind / Moonshot / arXiv and other public RSS feeds. Updated twice daily at 06:00 / 18:00.

Jul 21, 2026 · Hugging Face

The State of Simulation for Physical AI: An Overview

Jul 21, 2026 · Hacker News (AI)

AI makes programming differently difficult

Jul 21, 2026 · Hacker News (AI)

Jack Dorsey launches Buzz to combine team chat, AI agents and Git hosting

Jul 21, 2026 · OpenAI

Introducing the ChatGPT for small business program

OpenAI launches the ChatGPT for Small Businesses program, helping entrepreneurs build AI skills, automate work, and grow with ChatGPT Work.

Jul 21, 2026 · Google DeepMind

Introducing Gemini 3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash Cyber

We’re introducing new Gemini models, including Gemini 3.6 Flash, 3.5 Flash-Lite and 3.5 Flash Cyber.

Jul 21, 2026 · Hacker News (AI)

Claude Is Not a Compiler

Jul 21, 2026 · OpenAI

OpenAI and Hugging Face partner to address security incident during model evaluation

OpenAI and Hugging Face share early findings from a security incident during AI model evaluation, highlighting advanced cyber capabilities and lessons for defenders.

Jul 21, 2026 · arXiv cs.AI

Rater State Bias in RLHF Preference Data: An Audit Framework

We identify a structured confound in Reinforcement Learning from Human Feedback (RLHF). Pairwise preference labels are intended to reflect the compared outputs, but they may also reflect the rater's state during annotation. Under sustained stressful or distressing conditions, raters' preferences may shift over time. A…

Jul 21, 2026 · arXiv cs.AI

Design and Validation of a Lightweight 1D CNN for Affective Touch Classification in Soft Plush Companions

Soft, sensorized companions offer a physically safe and emotionally intuitive interface for socially assistive technologies, yet their deformability and multichannel tactile sensing complicate the robust interpretation of human affect. This study presents a complete open-source MATLAB-based framework for the developme…

Jul 21, 2026 · arXiv cs.AI

Some Large Language Models Exhibit Consistent Risk Attitudes

As artificial intelligence systems are deployed in open-ended, high-stakes settings, a critical dimension remains unmeasured: how perceived risk is translated into action. We test whether large language models (LLMs) exhibit systematic and consistent risk attitudes under uncertainty. We introduce a cross-domain framew…

Jul 21, 2026 · arXiv cs.AI

A Survey on GNN-based Link Prediction: Techniques, Applications, and Challenges

Graph Neural Networks (GNNs) have emerged as the leading paradigm for link prediction, enabling the inference of missing connections and the anticipation of potential future links. However, existing reviews lack systematic exploration specifically targeting underlying GNN architectures and diverse graph structures. To…

Jul 21, 2026 · arXiv cs.AI

PlanFlip: Attacking Multi-Agent LLM Systems via Planning-Phase Prompt Injection

Multi-agent LLM systems increasingly rely on a Planner to decompose goals into sub-task sequences that downstream Executor and Critic agents execute and audit. We identify the planning phase as a critical attack surface: a single injection into the Planner's context achieves cascade amplification, corrupting all downs…

Jul 21, 2026 · arXiv cs.AI

Deterministic Replay for AI Agent Systems

AI agent systems that couple large language models (LLMs) with external tools and APIs are inherently non-deterministic: LLM sampling variance, external API state, CDN infrastructure headers, and execution-environment noise collectively prevent any prior agent run from being faithfully re-executed. Existing observabil…

Jul 21, 2026 · arXiv cs.AI

Generative Ontology Induction: Domain-Agnostic Schema Discovery from Document Corpora Using Large Language Models

Ontology engineering remains a critical bottleneck in knowledge-intensive AI systems. Existing automated approaches either depend on predefined schemas, operate within narrow domains, or produce unstructured outputs unsuitable for downstream pipelines. We introduce Generative Ontology Induction (GOI), a domain-agnosti…

Jul 21, 2026 · arXiv cs.AI

Democratizing AI with Small Language Models: Structured Benchmarking and Parameter-Efficient Fine-Tuning for Local Deployment

AI democratization is not primarily a question of matching frontier-scale generality; it is a question of whether capable models can be selected, audited, and specialized under hardware and governance constraints that ordinary institutions can actually satisfy. This paper studies that problem through a controlled eval…

Jul 21, 2026 · Hacker News (AI)

Five US tech giants' hidden debts soar to $1.65T on opaque AI funding

Jul 21, 2026 · Hugging Face

Grabette: an open system to record robot-manipulation data

Jul 21, 2026 · OpenAI

David Vélez and Robin Vince join the boards of the OpenAI Foundation and OpenAI Group PBC

David Vélez and Robin Vince join the boards of the OpenAI Foundation and OpenAI Group PBC, bringing global leadership in finance, technology, and governance.

Jul 20, 2026 · Hacker News (AI)

How we measured AI writing across arXiv, and where the measurement breaks

Jul 20, 2026 · Hugging Face

Introducing Cosmos 3 Edge

Jul 20, 2026 · Hacker News (AI)

China’s open-weights AI strategy is winning

Jul 20, 2026 · OpenAI

Safety and alignment in an era of long-horizon models

OpenAI shares lessons from deploying long-running AI models, highlighting new safety risks, observed failures, and improved safeguards through iterative deployment.

Jul 20, 2026 · arXiv cs.AI

GraphDx: A Cost-Aware Knowledge-Enhanced Multi-Agent Framework for Sequential Diagnosis

Sequential diagnosis requires balancing diagnostic accuracy against resource costs through iterative information gathering. Existing Large Language Model (LLM) approaches exhibit a critical knowledge-reasoning gap: despite encoding extensive medical knowledge, they struggle to reason systematically under cost constrai…

Jul 20, 2026 · arXiv cs.AI

Causal-Audit: Explicit and Auditable Graph-based Reasoning via Target-Aware Causal Chain Construction

Causal and intervention-based question answering is fundamental to advancing large language models (LLMs) toward reasoning beyond surface-level correlations and understanding underlying causal mechanisms. However, existing LLM-based methods often rely on implicit language-level reasoning, resulting in opaque causal as…

Jul 20, 2026 · arXiv cs.AI

Cura 1T: Specialized Model for Agentic Healthcare

Healthcare spans high-stakes communication, expert reasoning, and workflow execution, yet specialized LLMs that cover these use cases together remain limited. A healthcare model must handle patient consultation, clinical reasoning over text and images, interactive diagnosis, and electronic health record (EHR) tool use…

Jul 20, 2026 · arXiv cs.AI

AnovaX: A Local, Multi-Agent Voice Assistant with LLM Planning, Typed Executors, and Adaptive Recovery

Desktop voice assistants are still dominated by cloud pipelines that ship raw audio off the machine and expose a fixed set of skills. We describe AnovaX, a small local-first assistant that runs entirely on the user's computer and treats the desktop itself as its action surface. A single Python process wires together a…

Jul 20, 2026 · arXiv cs.AI

Precise but Uncoupled: Reviewer Precision Does Not Guarantee Critique Uptake in Multi-Agent Math Reasoning

Many math- and science-oriented agent systems use hierarchical designs with specialized reviewer roles, assuming that a dedicated review stage should help turn wrong candidates into correct ones. We test this assumption on 4,181 verifier-grounded Omni-MATH problems using matched gpt-oss-120b actors. Collaboration adds…

Jul 20, 2026 · arXiv cs.AI

DrawingVQA: A Real-World Benchmark for Multi-Depth Visual-Textual Reasoning on Construction Drawings

We introduce DrawingVQA, the first benchmark designed to evaluate multimodal large language models (MLLMs) on real-world construction drawings, a core medium in architecture, civil, and many other engineering practices. Unlike natural images or schematic floor plans, construction drawings fuse abstract geometry, symb…

Jul 20, 2026 · arXiv cs.AI

Do Coding Agents Need Executable World Models, Simplification, and Verification to Solve ARC-AGI-3?

Our previous ARC-AGI-3 agent bundled executable world modeling, scheduled simplification, and exact replay verification, leaving unclear which idea accounted for its performance. We address this attribution question with four nested Codex-based agents: a textual baseline; a flexible-interface executable world model wi…

Jul 20, 2026 · arXiv cs.AI

Beyond a Joke: Multi-Angle Reasoning for Detecting and Explaining Harmful Humor in Memes

Internet memes intertwine visual cues, textual content, and cultural context, making them particularly challenging to interpret in scenarios where humor, sarcasm, and harmful intent coexist. These complexities highlight the need for explainable meme understanding systems that can provide reliable and structured reason…

Jul 20, 2026 · Hacker News (AI)

Claude Fable produced a counterexample to the Jacobian Conjecture

Jul 20, 2026 · Anthropic

Apply for Anthropic’s AI for Science rare disease research grants

Jul 19, 2026 · Hacker News (AI)

AI advice made people less accurate but more confident – study

Jul 18, 2026 · Hacker News (AI)

Setting up your spare Mac for Claude Code to control, a step-by-step guide

Jul 18, 2026 · Hacker News (AI)

GPT-5.6 used a prompt to close a 30-year gap in convex optimization

Jul 18, 2026 · Hacker News (AI)

What AI did to Stack Overflow in a graph

Jul 18, 2026 · Hacker News (AI)

Why do AI company logos look like buttholes? (2025)

Jul 18, 2026 · Hacker News (AI)

Fable 5 vs. GPT-5.6 Sol on an NP-Hard Problem: Does /goal help?

Jul 17, 2026 · Hacker News (AI)

Kaiser nurses say AI, surveillance are making their jobs and patient care worse

Jul 17, 2026 · Hugging Face

Fine-tune video and image models at scale with NVIDIA NeMo Automodel and 🤗 Diffusers

Jul 17, 2026 · Google DeepMind

Introducing Gemini 3.5 Flash Cyber

Google introduces Gemini 3.5 Flash Cyber, a lightweight cybersecurity model to find and patch vulnerabilities.

Jul 17, 2026 · Hacker News (AI)

The state of open source AI

Jul 17, 2026 · Hacker News (AI)

Claude Code: Anatomy of a Misfeature

Jul 17, 2026 · OpenAI

A scorecard for the AI age

Sarah Friar, CFO of OpenAI, introduces a practical AI scorecard to measure ROI through useful work, cost per successful task, dependability, and return on compute.

Jul 17, 2026 · arXiv cs.AI

Intelligent Three Level Learning Architecture for Autonomous UAV Swarms in Search and Rescue

This paper presents a novel three level hierarchical learning architecture for autonomous UAV swarms performing search and rescue operations. Unlike conventional approaches that apply a single learning paradigm across all hierarchy levels, the proposed architecture integrates three qualitatively different learning mec…

Jul 17, 2026 · arXiv cs.AI

HG-RAG: Hierarchy-Guided Retrieval-Augmented Generation for Structured Knowledge Graphs

Retrieval Augmented Generation (RAG) has proven to be a widely successful process at improving the quality of outputs from a Large Language Model (LLM) for wider context. However, RAG systems typically retrieve context from flat document stores, which struggles when queries require hierarchical or relational reasoning…

Jul 17, 2026 · arXiv cs.AI

IMEX Interaction-Based Model Explanation

In predictive modeling, the ability to explain why a model produces a given target prediction has become increasingly important [5, 10]. Black-box models do not provide a transparent description of the internal mechanisms that generate the prediction, making even accurate predictions difficult to interpret and validat…

Jul 17, 2026 · arXiv cs.AI

RegNetAgents: A Multi-Agent Framework for Cross-Network Regulatory Driver Identification in Cancer Genomics

We introduce RegNetAgents, an AI-oriented multi-agent framework for structured, query-driven regulatory candidate identification across heterogeneous gene regulatory networks. The system enables unified analysis of bulk tumor and single-cell-derived ARACNe networks by integrating TCGA-derived cancer networks with larg…

Jul 17, 2026 · arXiv cs.AI

DialogueVPR: Towards Conversational Visual Place Recognition

Inspired by how humans communicate spatial information, language-guided geo-localization has gained significant traction for its intuitive and practical value. Despite this progress, most methods still rely on a static, one-shot retrieval paradigm, which fails to handle the ambiguity and incompleteness inherent in rea…

Jul 17, 2026 · arXiv cs.AI

Interpretable Language Model for Closed-Loop Type 1 Diabetes Control

Type 1 Diabetes (T1D) is a chronic, life-threatening autoimmune condition characterized by the complete destruction of insulin-producing pancreatic beta cells. While Artificial Pancreas Systems (APS) powered by Reinforcement Learning (RL) have shown promise in automating insulin delivery, their "black-box" nature ma…