- 2026-06-18 · Thu · 20 posts
-
Beyond LoRA: Can you beat the most popular fine-tuning technique?
📌 [HuggingFace tech share] Beyond LoRA: are there better parameter-efficient fine-tuning (PEFT) options?
-
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
📌 [New benchmark] Can MLLMs make decisions when they "cannot see"? RNG-Bench exposes the memory weakness of multimodal models
-
BuilderIO/agent-native
📌 [BuilderIO, new open source] Don't choose between UI and AI agents: the Agent-Native framework makes both "first-class citizens"
-
cocoindex-io/cocoindex-code
📌 [GitHub Trending] Making coding agents more precise: CocoIndex-code brings AST-based semantic code search
-
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
-
crewAIInc/crewAI
📌 [GitHub Trending] Free of the LangChain dependency, CrewAI builds an enterprise-grade multi-agent automation framework
-
GLM-5.2 is probably the most powerful text-only open weights LLM
📌 [Z.ai, latest release] The 753B-parameter monster GLM-5.2: the strongest text-only open-weights model so far?
-
Introducing LifeSciBench
📌 [OpenAI, latest release] LifeSciBench: an expert-level benchmark that defines AI's practical capability in the life sciences
-
Jun 18, 2026Frontier Red TeamProject Fetch: Phase two
📌 [Anthropic, latest research] From "assisting humans" to "operating independently": the evolution of Claude remotely controlling a robot dog
-
K-Dense-AI/scientific-agent-skills
📌 [K-Dense-AI] 147 scientific research skills, modularized, to turn an AI agent into a real AI co-scientist
-
Kilo-Org/kilocode
📌 [GitHub Trending] Kilo Code: an open-source AI coding agent supporting 500+ models with zero markup
-
microsoft/qlib
📌 [Microsoft, latest research] Quantitative investing enters the era of "automated R&D": RD-Agent automates factor mining and model optimization
-
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
📌 [New research] 10B-level performance from 0.2B parameters: Moebius redefines efficient image inpainting
-
MosaicLeaks: Can your research agent keep a secret?
📌 [HuggingFace, latest research] Will your AI research assistant "accidentally" leak secrets?
-
SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks
📌 [New framework: SciOrch] A lightweight orchestrator coordinates multiple top LLMs to tackle multimodal scientific reasoning
-
Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation
-
OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric
📌 [OpenAI, latest release] LifeSciBench: 750 real research tasks that push the limits of AI's life-science reasoning
-
The KV Cache Compression Race: TurboQuant vs OSCAR vs EpiCache
📌 [The KV cache compression race] At 1M context length, memory pressure exceeds even the model weights themselves
-
The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL
📌 [New technique] No reliance on human preferences: discriminator-guided RL makes generative models more accurate
-
Using AI to help physicians diagnose rare genetic diseases affecting children
📌 [OpenAI, latest application] Diagnosing rare genetic diseases with reasoning models: 18 unsolved cases cracked
- 2026-06-17 · Wed · 20 posts
-
A near-autonomous AI chemist improves a challenging reaction in medicinal chemistry
📌 [OpenAI, latest breakthrough] A near-autonomous AI chemist: how does GPT-5.4 optimize a drug synthesis reaction?
-
Agentic Resource Discovery: Let agents search
📌 [Hugging Face, latest release] No more hard-coded tools: Agentic Resource Discovery (ARD) teaches AI agents to "search on their own"
-
bytedance/UI-TARS-desktop
📌 [ByteDance, new open source] UI-TARS: bringing multimodal AI agents to the desktop for real
-
ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions
📌 [New benchmark] Has Chinese logical reasoning really caught up with English? ChLogic exposes the language gap in LLMs
-
calesthio/OpenMontage
📌 [Popular on GitHub] OpenMontage: the first agentic video production system, turning an AI assistant into a complete editing studio
-
From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot
📌 [AWS & Hugging Face] From Hub to hardware: building a robot learning pipeline with Strands and LeRobot
-
GLM-5.2: Built for Long-Horizon Tasks
📌 [Z.AI, latest release] A practical 1M context: GLM-5.2 is built for long-horizon complex tasks
-
infiniflow/ragflow
📌 [GitHub Trending] RAGFlow: an open-source implementation that deeply integrates a RAG engine with agent capabilities
-
How to Build Memory-Efficient Transformers with xFormers Using Packed Sequences, GQA, ALiBi, SwiGLU, and Causal Attention
📌 [xFormers technical guide] Escaping the quadratic-memory trap: five optimization techniques for high-performance Transformers
-
langfuse/langfuse
📌 [Open-source LLM engineering platform] Langfuse: LLM observability that goes from "it feels like it works" to "the data proves it works"
-
microsoft/RD-Agent
📌 [Microsoft, latest research] Letting AI fine-tune AI: how does RD-Agent automate the R&D process?
-
New research shows how AMIE, our medical AI, could help manage health conditions.
📌 [Google, latest research] Can AI manage chronic conditions like an attending physician? AMIE's challenges and breakthroughs
-
MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget
📌 [MiniMax, latest research] Tackling the long-context compute bottleneck: achieving constant-level complexity with two-branch sparse attention (MSA)
-
OpenAI’s Deployment Simulation Extends Pre-Deployment Risk Assessment to Agentic Coding Through Simulated Tool Calls
📌 [OpenAI, latest research] How do you predict a model's behavior after launch? Deployment Simulation in practice
-
PaddlePaddle/PaddleOCR
📌 [PaddlePaddle open-source tool] Turning messy documents into structured data for LLMs: an indispensable foundation for putting RAG into production
-
ProCUA-SFT Technical Report
-
openai/codex
📌 [OpenAI, latest release] Codex CLI: OpenAI's coding agent, installed right in your terminal
-
ruvnet/ruflo
📌 [New open-source project] Ruflo: a "multi-agent coordination nervous system" built for Claude Code
-
Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion
📌 [New technique] More efficient diffusion models: removing pixel-space noise with Spectral Forcing
-
Vercel Releases Eve: An Open-Source AI Agent Framework Where Each Agent is a Directory of Files Mapped to Capabilities
📌 [Vercel open-sources Eve] Defining AI agents through "folder structure" and leaving lengthy boilerplate behind
- 2026-06-16 · Tue · 20 posts
-
AlexsJones/llmfit
📌 [Open-source tool] Stop trying models blindly: llmfit computes precisely how well an LLM "fits" your hardware
-
ChromeDevTools/chrome-devtools-mcp
📌 [ChromeDevTools, new open source] AI agents drive Chrome DevTools directly, taking automated debugging to a new stage
-
datawhalechina/hello-agents
📌 [Datawhale open-source guide] From LLM user to AI-native agent builder
-
DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents
📌 [DeepRubric] "Reverse-constructed" rubrics raise the RL training efficiency of research AI by 13x
-
earendil-works/pi
📌 [GitHub Trending] Pi: a customizable, extensible framework for AI coding agents
-
diegosouzapw/OmniRoute
📌 [GitHub Trending] OmniRoute: one endpoint aggregating 226 AI providers, a developer tool for getting past token limits
-
google-research/timesfm
📌 [Google Research] Time-series forecasting enters the "foundation model" era: TimesFM 2.5 is now open source
-
Jun 17, 2026Economic ResearchAgentic coding and persistent returns to expertise
📌 [Anthropic, latest research] Can AI agents let non-engineers write code? Data reveals the real value of "expertise"
-
KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing
📌 [New research] Want to "delete" a memory from a long context? KVEraser edits the KV cache without recomputing everything
-
Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale
📌 [Instant responses at trillion-parameter scale] How do Ling & Ring 2.6 achieve efficient agentic intelligence?
-
Meet Qwen-RobotSuite: Three Embodied AI Models for VLA Manipulation, Video World Modeling, and Navigation
📌 [Alibaba Qwen, latest research] Addressing fragmented robot data: Qwen-RobotSuite releases three embodied AI models
-
microsoft/fara
📌 [Microsoft, latest research] Fara-7B: an efficient agentic model designed for "computer use"
-
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
📌 [NVIDIA, latest research] A 550B-parameter hybrid architecture: how does Nemotron 3 Ultra balance inference speed and long-context capability?
-
nocobase/nocobase
📌 [Rising open-source project] NocoBase: when AI agents meet no-code, a new workflow for building business systems
-
OpenBMB/VoxCPM
📌 [OpenBMB, latest research] Free of tokenizer limits, VoxCPM2 delivers 48kHz high-fidelity multilingual speech synthesis
-
Predicting model behavior before release by simulating deployment
📌 [OpenAI, latest research] How can AI behavior be predicted accurately before a model is released?
-
Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning
📌 [New research] No more fine-tuning? Efficient reasoning with Prompt-Level Distillation
-
Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time
📌 [New research] Retrieve, Don't Retrain: robots learn new tasks without another round of fine-tuning
-
Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving
📌 [KV cache compression breakthrough] Tangram: non-uniform compression breaks the memory bottleneck of multi-turn conversations
-
UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer
📌 [UniDDT] One architecture for both understanding and generation: a new attempt at a decoupled Diffusion Transformer
- 2026-06-15 · Mon · 20 posts
-
APPO: Agentic Procedural Policy Optimization
📌 [New algorithm] APPO: optimizing an agent's multi-turn tool calling through fine-grained decision points
-
Benchmarking AI Agents for Addressing Scientific Challenges Across Scales
📌 [SciAgentArena] Can AI agents think like scientists? This new benchmark reveals the current capability ceiling
-
A satellite just learned to find things on its own — here’s what that means
📌 [Google DeepMind x NASA] A satellite "thinks for itself" in orbit for the first time: VLMs bring space sensing into the edge-AI era
-
ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning
📌 [New benchmark for medical AI diagnosis] ClinHallu: pinpointing the stage at which medical multimodal models "hallucinate"
-
Digital Twins Reimagined: Zero-Day LLM-Powered Moving Target Defense In-Depth for Real-Time CPS
📌 [CPS security breakthrough] Using LLMs to generate "digital twins" dynamically against unpredictable zero-day vulnerabilities
-
Emanuele-web04/synara
📌 [Open-source tool] No more window switching: Synara builds a local integrated workspace for AI agents
-
Claude Code Guide 2026: 25 Features with Examples + Demo
📌 [Anthropic, latest research] From terminal assistant to layered agent system: a deep dive into Claude Code's agentic architecture
-
Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack
📌 [Open-source implementation] Hy-Embodied-0.5-VLA: a full-stack robot learning solution from model training to real-robot deployment
-
Meet Flash-KMeans: An IO-Aware, Exact K-Means That Runs Over 200× Faster Than FAISS on GPUs
📌 [UC Berkeley & UT Austin, latest research] Flash-KMeans: K-Means on GPUs made more than 200x faster
-
OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains
📌 [New dataset] OmniVideo-100K: "structured scripts" for the temporal-consistency problem in audio-visual reasoning
-
P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning
📌 [New benchmark] P3D-Bench: testing MLLMs on parametric 3D generation and structural reasoning
-
Panniantong/Agent-Reach
📌 [GitHub Trending] Getting AI agents truly online: Agent-Reach fixes agents' "internet blindness"
-
RedAct: Redacting Agent Capability Traces for Procedural Skill Protection
📌 [New research] Can an agent's execution traces leak its "core skills"? RedAct proposes a new redaction framework to protect procedural knowledge
-
Rethinking RAG in Long Videos: What to Retrieve and How to Use It?
📌 New challenges for RAG over long videos: what to retrieve and how to use it? VideoRAG's multi-granularity optimization
-
RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space
📌 [RepFusion] No more training from scratch: using an LLM's…
-
rohitg00/ai-engineering-from-scratch
📌 [GitHub Trending] Stop just calling APIs: a hands-on AI engineering guide from the underlying math to autonomous agents
-
TencentCloud/TencentDB-Agent-Memory
📌 [TencentCloud, new open source] No more forgetful AI: symbolic memory and a layered structure cut token consumption substantially
-
WaveDiT: Distribution-Aware Wavelet Flow Matching for Efficient 3D Brain MRI Synthesis
📌 [Efficient 3D image synthesis] WaveDiT: high-resolution brain MRI generation using wavelet transforms
-
World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible
📌 [New technique explained] World Tracing: pixel-aligned 3D geometry generation beyond the limits of what is visible
-
Z.ai Launches GLM-5.2 With a Usable 1M-Token Context, Two Thinking-Effort Levels, and No Benchmarks at Launch
📌 [Z.ai, latest release] GLM-5.2 brings a 1M ultra-long context, a big leap in memory for coding agents
- 2026-06-14 · Sun · 19 posts
-
Amazon security research reportedly led to the White House’s Anthropic Fable ban
📌 [Industry news] Amazon security research triggers a White House ban; Anthropic Fable 5 restricted from use by foreign nationals
-
Ar9av/obsidian-wiki
📌 [GitHub Trending] Stop asking AI the same question twice: building a "digital brain" for AI agents with Obsidian
-
ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages
-
China may have accessed Mythos
📌 [National security risk] Anthropic's powerful Mythos model may have been obtained by China: are there deeper concerns behind export controls?
-
As Anthropic suspends access to new models, India debates its AI future
📌 [Geopolitical shock] A US government directive leads Anthropic to suspend external model access, reigniting India's AI independence debate
-
Databricks Open-Sources Omnigent: A Meta-Harness That Composes, Governs, and Shares AI Agents Across Claude Code, Codex, and Pi
📌 [Databricks open source] Omnigent: a "meta-framework" that makes different AI agents interchangeable like LEGO bricks
-
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery
📌 [New research] Automating scientific discovery: is the key "environment engineering" rather than the model?
-
InterleaveThinker: Reinforcing Agentic Interleaved Generation
📌 [Multimodal trend] InterleaveThinker: "planning + critique" agent collaboration gives image generation reasoning ability
-
KPMG pulls report on AI usage due to apparent hallucinations
📌 [Industry warning] Writing an AI report with AI? KPMG withdraws a research report over "hallucinations"
-
Meta reportedly moves to unwind $2B Manus deal after Beijing’s demand
📌 [Meta's $2B acquisition falls through] At the demand of Beijing's national security authorities, Meta is forced to divest AI rising star Manus
-
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
📌 [LabVLA] Bringing VLA models into the laboratory so AI can actually run scientific experiments
-
OpenHands/OpenHands
📌 [Open-source AI agent framework] OpenHands: from SDK to cloud deployment, build your AI-driven development workflow
-
openinterpreter/openinterpreter
📌 [Open Interpreter] Efficient coding even with low-cost models: the Rust rewrite officially arrives
-
Running Python code in a sandbox with MicroPython and WASM
📌 [New open-source approach] Building a secure Python code sandbox with MicroPython + WASM
-
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
📌 [New research] SpatialClaw: treating code as the action interface to break through the spatial reasoning bottleneck of VLMs
-
Surflo: Consistent 3D Surface Flow Model with Global State
-
tinyhumansai/openhuman
📌 [GitHub Trending] OpenHuman: a personal super-intelligent AI assistant combining a local memory tree with cloud services
-
Weave: Merging based on language structure and not lines
📌 [Weave] Stop letting Git conflicts ruin the development experience: from "line-level merging" to "semantic-level merging"
-
Why hasn't Codex launched a product like Codex Design yet?
📌 [In-depth analysis] Why can AI draw a UI but not produce an "interactive prototype"?
- 2026-06-13 · Sat · 20 posts
-
[AINews] Fable and Mythos officially too dangerous to release
📌 [Anthropic emergency withdrawal] Fable and Mythos taken fully offline over "national security risk" as the model sovereignty dispute heats up
-
Amazon CEO reportedly raised Anthropic model concerns before government crackdown
📌 [AI policy storm] Did Amazon report Anthropic? Two top models hit by a government embargo
-
andrewyng/aisuite
📌 [Andrew Ng, new open source] aisuite: a lightweight framework for unified cross-model LLM calls and desktop AI agents
-
Anthropic cuts off Fable 5 and Mythos 5 access following government order
📌 [Anthropic emergency announcement] Citing national security, Fable 5 and Mythos 5 are shut down entirely
-
Anthropic’s safety warnings may have just backfired — the government has pulled the plug on its most powerful AI
📌 [Major crisis for Anthropic] Did safety warnings become the trigger? The US government forces its strongest model, Claude Mythos 5, offline
-
Anthropic Disables Claude Fable 5 and Mythos 5 After US Government Order
📌 [Anthropic breaking news] Taken down by government order: Claude Fable 5 and Mythos 5 fully disabled
-
Beyond Runtime Enforcement: Shield Synthesis as Defensibility Analysis for Adversarial Networks
📌 [A new view of formal verification] Shield Synthesis: not a runtime "straitjacket" but a design-time "diagnostic report"
-
Google Releases Gemini-SQL2: Gemini 3.1 Pro Text-to-SQL Scores 80.04% on BIRD Single-Model Leaderboard
📌 [Google, latest research] Gemini-SQL2 tops the BIRD leaderboard: natural-language-to-SQL "execution accuracy" passes 80%
-
hexo-ai/sia
📌 [hexo-ai, new open source] SIA: a closed-loop optimization framework that lets AI "self-evolve"
-
How to Build a QwenPaw Agent Workspace with Custom Skills, Model Providers, Console Access, and Streaming API Testing
📌 [Hands-on tutorial] Building a QwenPaw agent workspace from scratch: custom skills, multi-model integration, and streaming API testing
-
How to setup a local coding agent on macOS
📌 [Local AI hands-on] An offline coding agent on macOS: Gemma 4 + MTP doubles inference speed
-
HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents
📌 [New research] Stop making the LLM handle trivial tool calls: HyperTool redefines the execution granularity of agents
-
Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6
📌 [Moonshot AI, latest release] Kimi K2.7-Code: a 1-trillion-parameter MoE architecture built for long-horizon software engineering
-
NVIDIA/physicsnemo
📌 [NVIDIA, new open source] PhysicsNeMo: injecting physical laws into AI, an efficient development framework for AI4Science
-
My yard is dying, so I made an app for that
📌 [Vibe coding in practice] Without writing a line of code, turning "yard management" into an app with Gemini
-
Meta’s months-old AI unit is a soul-crushing gulag, say the engineers stuck inside it
📌 [Meta insider account] To train AI, have engineers become "glorified data labelers"?
-
Open source AI must win
📌 [Opinion] When intelligence becomes a "rental": why the fate of open-source AI determines our freedom to operate
-
OpenAI faces investigation from state attorneys general
📌 [OpenAI faces a multi-state investigation] From data privacy to "model sycophancy," regulatory pressure escalates across the board
-
Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior
📌 Can LLM "self-reports" predict behavior? A new answer from psychometrics
-
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
📌 [New research] SpatialClaw: letting VLMs solve 3D spatial reasoning like engineers, with "stateful code"
- 2026-06-12 · Fri · 20 posts
-
anomalyco/opencode
📌 [A new open-source option] Don't want to be locked into a commercial Copilot? OpenCode builds a cross-platform AI coding agent
-
anthropics/skills
📌 [Anthropic open source] Pluggable "Skills" for Claude turn the LLM into a specialized assistant
-
browser-use/browser-use
📌 [GitHub Trending] Browser-Use: a new standard for LLM browser automation built on a Rust core
-
farion1231/cc-switch
📌 [GitHub Trending] One tool to manage all your AI agents: CC Switch simplifies cross-model workflows
-
Feature Stores from Scratch: A Minimal Working Implementation
📌 [From scratch] Why does your ML project need a feature store? From underlying principles to a minimal implementation
-
HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers
📌 [New architecture explained] HYDRA-X: unifying image and video tokenization with a single Vision Transformer
-
HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents
📌 [New research] Goodbye to tedious step-wise calls: HyperTool redefines the tool-execution granularity of AI agents
-
karpathy/autoresearch
📌 [Andrej Karpathy, new open source] Automated AI research: turning an LLM into your 24-hour researcher
-
Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding
📌 [Cohere, latest release] 30B parameters with only 3B worth of compute: North Mini Code is built for agentic coding
-
mksglu/context-mode
📌 [GitHub Trending] Fixing MCP memory collapse: Context Mode cuts context usage by 98%
-
MemTensor/MemOS
📌 [GitHub Trending] MemOS: a "memory operating system" for AI agents, so long-term memory is no longer a black box
-
Moonshot AI Launches Kimi Work, a Local Desktop Agent Reportedly Running on Kimi K2.6 With a 300-Sub-Agent Agent Swarm
📌 [Moonshot AI, latest announcement] Kimi Work: moving AI agents from the cloud to the desktop for local automated workflows
-
See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents
📌 [Multi-agent collaboration breakthrough] Stop passing text: let AIs exchange KV cache directly
-
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
📌 [New research] SpatialClaw: letting AI agents do 3D spatial reasoning in Python, like engineers
-
supermemoryai/supermemory
📌 [GitHub Trending] Curing AI's "goldfish memory": Supermemory builds a persistent memory layer for AI
-
shuvonsec/claude-bug-bounty
📌 [GitHub Trending] Integrating LLMs into the bug bounty workflow: a one-stop CLI tool from reconnaissance to reporting
-
WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation
-
vllm-project/vllm
📌 [Open-source deployment powerhouse] vLLM: the core techniques that push LLM inference to maximum throughput, explained
-
WebChallenger: A Reliable and Efficient Generalist Web Agent
-
Zyphra Release Zamba2-VL: Hybrid Mamba2–Transformer Vision-Language Models That Cut Time-to-First-Token by About an Order of Magnitude
📌 Zyphra releases Zamba2-VL: hybrid Mamba2-Transformer vision-language models that cut time-to-first-token nearly tenfold!
- 2026-06-11 · Thu · 11 posts
-
comet-ml/opik
📌 [Comet-ML open-source tool] Opik: making generative AI observable, evaluable, and automatically optimizable all the way from prototype to production
-
Decart’s new world model can simulate hours of photorealistic driving — with some caveats
📌 [Decart, latest release] Oasis 3: turning the "world model" into an API to build a developer ecosystem for physical AI
-
dmtrKovalenko/fff
📌 [New open-source tool] Blazing-fast file search for AI agents: the fff file search toolkit
-
FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model
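📌 [Medical AI breakthrough] One model for fetal ultrasound: from image analysis to clinical interpretation, and it can even run offline on a phone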
-
Guides
📌 [Simon Willison's practical guide] A complete guide to best practices for coding agents (Claude Code / Codex)
-
Local Agentic Programming on the Cheap: Claude Code + Ollama + Gemma4
📌 [Google DeepMind, latest research] Low-cost local agentic programming: Claude Code + Ollama + Gemma 4
-
FareedKhan-dev/train-llm-from-scratch
📌 [GitHub Trending] Hand-write a Transformer from scratch: a "ten-million-parameter" large language model that runs on a single GPU!
-
pydantic/monty
📌 [New from pydantic] Monty: a fast, secure Python sandbox built in Rust that runs LLM-generated code on the spot!
-
TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning
📌 TRACE: allocating the rollout budget of multi-turn LLM agents down to tree-structured prefixes, making rewards more "contrastive"!
-
Sumanth077/Hands-On-AI-Engineering
📌 [GitHub Trending] From RAG to multi-agent: a practice-oriented, hands-on AI engineering guide
-
ryoppippi/ccusage
📌 [GitHub Trending] Writing code with AI agents is fast, but is your token bill still under control?
- 2026-06-10 · Wed · 3 posts
-
google/skills
📌 [Official Google open source] Instantly installable cloud and AI agent Skills that make building agents easier!
-
NVIDIA/SkillSpector
📌 [NVIDIA debut] SkillSpector, a security scanner for AI agent skills, is live: could your agent get hacked?
-
Rethinking the Divergence Regularization in LLM RL
📌 [New method] What exactly does the "hard mask" in LLM reinforcement learning hold back?
- 2026-06-09 · Tue · 20 posts
-
Anthropic’s Claude Fable 5 is a version of Mythos the public can access today
📌 [Anthropic, latest release] The most powerful model line, Mythos, opens to the public for the first time: Claude Fable 5 is live
-
anthropics/claude-code-security-review
📌 [Anthropic, new open source] Integrating Claude into CI/CD: an AI-driven automated security review tool
-
AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents
-
Ataraxy-Labs/sem
📌 [Ataraxy Labs] Stop reading line by line: bringing Git into the era of "semantic" version control
-
Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents
📌 [In-depth analysis] Is cosine similarity misleading us? The truth about the latent space of vision-language models
-
DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning
-
End-to-End Context Compression at Scale
📌 [HuggingFace, latest research] A breakthrough in context compression: Latent Context LM takes on the KV cache memory bottleneck
-
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention
📌 [Long-context inference breakthrough] FlashMemory-DeepSeek-V4: relieving GPU memory pressure with Lookahead Sparse Attention
-
Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API
📌 [Google, latest release] Gemini 3.5 Live Translate: moving past "turn-based" translation to truly continuous streaming speech translation
-
How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces
📌 [Hugging Face in practice] AI agents chaining Spaces: an automated path from image generation to a 3D gallery
-
Latent Spatial Memory for Video World Models
📌 [New research] Breaking the pixel-reconstruction bottleneck: Latent Spatial Memory gives video world models faster generation
-
OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning
-
maziyarpanahi/openmed
📌 [Open-source medical AI] OpenMed: 1,000+ medical models running locally, so data never leaves the device
-
Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text
📌 [New idea] Stop making AI think in text: Optical Reasoning treats images as a "reasoning medium"
-
Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data
-
shareAI-lab/learn-claude-code
📌 [Popular on GitHub] Want to build an AI agent? First understand the essential difference between the "model" and the "shell"
-
Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning
-
SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks
📌 [New benchmark] Can the spatial reasoning of multimodal agents really cope with the real world?
-
wanshuiyin/Auto-claude-code-research-in-sleep
📌 [Popular on GitHub] Let AI do research while you sleep: ARIS, a cross-agent automated research workflow
-
thedotmack/claude-mem
📌 [GitHub Trending] Giving Claude Code "long-term memory": claude-mem, a persistent memory compression system
- 2026-06-08 · Mon · 20 posts
-
Andyyyy64/whichllm
📌 [GitHub Trending] Which LLM should you deploy locally? whichllm automatically matches the best one for your hardware
-
Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation
-
Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback
-
danny-avila/LibreChat
📌 [Open-source tool recommendation] No more lock-in to a single AI: LibreChat builds your all-in-one AI interface
-
Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation
📌 Converting autoregressive models into diffusion LMs: lowering training cost with on-policy distillation
-
Direct 3D-Aware Object Insertion via Decomposed Visual Proxies
📌 [New research] Beyond the limits of 2D inpainting: the DIRECT framework enables object insertion with controllable 3D pose
-
Google Research Adds Agentic RAG to Gemini Enterprise Agent Platform with a Sufficient Context Agent for multi-hop queries
📌 [Google Research] Solving the "multi-hop" pain point of enterprise search: Agentic RAG gives Gemini reasoning ability
-
google/skills
📌 [Google, new open source] A "skills library" that gets AI agents up to speed on Google Cloud quickly
-
LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models
📌 [LayerRoute] Letting an LLM "skip layers" automatically based on the input: a dynamic balance between inference speed and quality
-
langchain-ai/deepagents
📌 [LangChain, new open source] DeepAgents: a plug-and-play framework that takes AI agents from "demo" to "production"
-
Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b
📌 [UIUC / UC Berkeley / Chroma, latest research] Letting AI focus on "deciding" rather than "remembering": Harness-1 redefines the retrieval agent architecture
-
NVIDIA garak Tutorial: Build a Complete Defensive LLM Red-Teaming Workflow with Custom Probes and Detectors
📌 [NVIDIA Garak] Building an LLM red-teaming workflow: from automated scanning to custom probe implementation
-
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills
-
SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents
📌 [New benchmark] AI agents can remember information, but do they really understand "relationships"?
-
Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators
📌 [New framework: Astra] Giving AI "spatial imagination": improving visual spatial reasoning with world simulators
-
Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation
📌 Speech recognition is no longer just "dictation" but an agent that corrects itself through conversation
-
Towards Retrieving Interaction Spaces for Agentic Search
📌 [The RISE framework] How can agentic search balance efficiency and accuracy over large-scale corpora?
-
WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark
-
Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs
📌 [Xiaomi, latest research] A trillion-parameter model passes 1000 TPS: fast inference even on commodity GPUs
-
When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents
📌 [The ToolMaze benchmark] When AI tools fail, can your LLM agent really rescue itself?
- 2026-06-07 · Sun · 20 posts
-
ashishpatel26/500-AI-Agents-Projects
📌 [GitHub Trending] 500+ AI agent project examples, from framework selection to industry applications in one place
-
AstrBotDevs/AstrBot
📌 [GitHub Trending] AstrBot: a one-stop open-source framework that turns LLMs into multi-platform agents
-
Crosstalk-Solutions/project-nomad
📌 [GitHub Trending] Project N.O.M.A.D.: building an offline knowledge server that never goes down
-
InsForge/InsForge
📌 [New open-source tool] InsForge: an "all-in-one backend infrastructure" built for AI coding agents
-
LLMs are eroding my software engineering career and I don't know what to do
📌 [Career reflection] AI speeds up development, but will it erode an engineer's "core competitiveness"?
-
luongnv89/claude-howto
📌 [GitHub Trending] Don't let Claude Code become just a chatbot: a hands-on guide from basic commands to agent workflows
-
moorcheh-ai/memanto
📌 [GitHub Trending] More than passive retrieval: Memanto tries to define "active memory" for AI agents
-
Mythograph Atelier #1 - Abstract Art That Means Something to You
📌 [Build Small Hackathon] No longer random generation: letting AI create abstract art that "means something to you"
-
NangoHQ/nango
📌 [GitHub Trending] The pain of integrating 800+ APIs: let AI write the integration for you with Nango
-
New U.S. college grads now have higher unemployment than the average worker
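📌 [Labor market watch] A college degree is no longer an employment "buffer": new graduates' unemployment rate exceeds the average for the first time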
-
Notion restores access to Anthropic after service disruption
📌 [Notion x Anthropic] A 12-hour service disruption reveals the stability challenges of AI integration
-
openclaw/openclaw
📌 [Rising open-source project] OpenClaw: a private AI assistant deployed on your own devices, spanning 20+ messaging apps
-
OpenAI unveils Lockdown Mode to protect sensitive data from prompt injection attacks
📌 [OpenAI, latest security update] Against prompt injection attacks, how much protection does the new Lockdown Mode provide?
-
OpenAI is still working on that ‘super app’
📌 [OpenAI strategic shift] "Chat is dead": ChatGPT will evolve into an all-purpose super app
-
plastic-labs/honcho
📌 [GitHub Trending] Giving AI agents "long-term memory": Honcho builds memory infrastructure for stateful agents
-
Sem: New primitive for code understanding – not LSPs, but entities on top of Git
📌 [New tool from Ataraxy Labs] Sem: raising Git diffs to the "function" level, with a 2.3x gain in AI interpretation accuracy!
-
Show HN: Lathe – Use LLMs to learn a new domain, not skip past it
📌 [New open-source tool] Use AI to learn a new domain, not to do your thinking for you
-
Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering
📌 [Tokenomics research] How much does it actually cost for agents to write code? A quantitative analysis of token consumption in LLM multi-agent systems
-
twentyhq/twenty
📌 [A new open-source CRM option] Twenty: developing and versioning your CRM like a software product
-
withastro/flue
📌 [New from the Astro team] Flue: turning the Claude Code experience into a programmable agent framework
- 2026-06-06 · Sat · 53 posts
-
15 Best Vibe Coding Tools in 2026 Compared: Pricing, Features, and Best Fit
-
3 SpaCy Tricks for Efficient Text Processing & Entity Recognition
-
A Deep Dive into Calibration of Language Models: Platt Scaling, Isotonic Regression, Temperature Scaling
-
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
-
agentscope-ai/agentscope
-
[AINews] not much happened today
-
ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
-
Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution
-
Complexity-Balanced Diffusion Splitting
-
CopilotKit/CopilotKit
-
EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management
📌 EvoDS: a self-evolving data science agent, but can it "learn new skills"?
-
Five labs, five minds: building a multi-model finance drama on small models
📌 [HuggingFace tech share] Using small models from 4 different labs to build a financial simulation society full of "infighting"
-
Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency
-
Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory
-
Google will pay SpaceX $920M per month for compute
-
khoj-ai/khoj
📌 [Open-source tool recommendation] Khoj: building an "AI second brain" that can index all your private documents
-
How to Stop Shipping Low-Quality RL Environments (with Examples)
-
Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models
-
LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing
-
MemPalace/mempalace
-
microsoft/agent-framework
-
Microsoft Fara Tutorial: Run a Browser-Use Agent in Google Colab with a Mock OpenAI-Compatible Endpoint
-
microsoft/VibeVoice
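You thought "long-audio recognition in one pass" was only possible with large cloud services? With just a few lines of code, Microsoft transcribes a full 60-minute audio file in a single pass, and can also generate speech in nine languages and 11 English styles in real time. How does it do it?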
-
microsoft/BitNet
-
Mira Murati steps back into the spotlight, carefully
-
New York lawmakers pass one-year ban on new data centers
-
NVIDIA AI Releases Dynamo Snapshot: A CRIU-Based Fast Startup System for AI Inference on Kubernetes
-
NVIDIA Releases Nemotron 3.5 ASR: A 600M-Parameter Cache-Aware Streaming Model Transcribing 40 Language-Locales in Real Time
📌 [NVIDIA, new open source] 600M parameters handle real-time speech recognition in 40 languages in one model
-
ogulcancelik/herdr
📌 [GitHub Trending] Herdr: managing multi-agent workflows in real time from the terminal
-
openai/codex
📌 [OpenAI, latest release] Codex CLI: bringing the AI coding agent straight into your local terminal
-
openai/whisper
📌 [OpenAI Whisper] Deploy once for multilingual speech recognition and translation
-
openclaw/openclaw
-
PaddlePaddle/PaddleOCR
-
Panniantong/Agent-Reach
-
Perplexity AI Introduces Hybrid Local-Server Inference Orchestrator for Personal Computer: Automatic On-Device and Cloud Task Routing
-
Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs
-
Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
-
Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
-
Running Python code in a sandbox with MicroPython and WASM
-
SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces
-
SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction
-
Shubhamsaboo/awesome-llm-apps
Are you still writing the RAG pipeline, agent loop, or multi-model integration from scratch for every new project?
-
The most interesting startups right now want to get you off your phone
-
The latest AI news we announced in May 2026
-
The ‘together tech’ wave might be the most intriguing startup bet of 2026
-
The token bill comes due: Inside the industry scramble to manage AI’s runaway costs
-
This is your laptop… on AI
-
unslothai/unsloth
-
VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding
-
vllm-project/vllm-omni
-
withastro/flue
-
ZhuLinsen/daily_stock_analysis
-
World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis
- 2026-06-04 · Thu · 30 posts
-
AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation
📌 AAD-1: asymmetric adversarial distillation makes one-step autoregressive video generation more stable
-
Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents
📌 [HuggingFace Daily Papers] Agent libOS: process-like isolation and security boundaries for long-running LLM agents
-
Apple approves Poke as the first AI agent on its Messages for Business platform
📌 Poke becomes the first approved iMessage AI agent
-
AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification
📌 AUDITFLOW: executable symbolic environments for structured financial report verification
-
BraveGuard: From Open-World Threats to Safer Computer-Use Agents
Safe Agent Guard
-
DayuanJiang/next-ai-draw-io
📌 Next AI Draw.io: an open-source project that generates draw.io diagrams from natural language
-
Designing the hf CLI as an agent-optimized way to work with the Hub
📌 The HF CLI, optimized for AI agents
-
Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
📌 Echo-Infinity: learnable evolving memory makes real-time infinite video generation possible
-
Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching
📌 Wide-baseline matching and spatial reasoning in MLLMs
-
Fission-AI/OpenSpec
📌 [Fission-AI] OpenSpec: an AI-guided specification workflow
-
fathah/hermes-desktop
📌 [Open-source project] Hermes Desktop: a native GUI that makes Hermes Agent easy to use
-
Gemma 4 12B: A unified, encoder-free multimodal model
📌 [Google DeepMind] Gemma 4 12B: a unified, encoder-free multimodal model
-
jundot/omlx
📌 [jundot/omlx] An LLM inference tool driven from the macOS menu bar: tiered memory + SSD caching makes local models practical
-
KVarN: Native vLLM backend for KV-cache quantization by Huawei
📌 [Huawei CSL] KVarN: native KV-cache quantization for vLLM, raising cache capacity 3-5×
-
MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation
📌 MapAgent: industrial-grade automatic lane map generation
-
Meet OpenJarvis: A Local-First Framework for On-Device Personal AI Agents with Tools, Memory, and Learning
📌 OpenJarvis: a local-first framework for personal AI agents
-
langgenius/dify
📌 Dify: an open-source, one-stop platform for LLM applications
-
MemTrain: Self-Supervised Context Memory Training
📌 MemTrain: a self-supervised memory training framework
-
microsoft/mxc
Ever worried that LLM-generated code might accidentally delete your files?
-
Miso Labs Releases MisoTTS: An 8B Emotive Text-to-Speech Model with Open Weights
📌 [Miso Labs open weights] Can the 8B-parameter MisoTTS deliver both rich emotion and ultra-low 110ms latency?
-
MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?
📌 MMG2Skill: turning online tutorials into skills AI can execute. Is closed-loop learning the key?
-
Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI
📌 [NVIDIA release] Nemotron 3.5 Content Safety
-
NVIDIA/NemoClaw
📌 [NVIDIA] NemoClaw: a reference stack for running AI agents safely in the OpenShell sandbox
-
Score-Control for Hallucination Reduction in Diffusion Models
📌 Score-Control for Hallucination Reduction in Diffusion Models
-
Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning
📌 Stable-Layers: fine-tuning image layer decomposition models with VLM-scored reinforcement learning
-
Streaming Communication in Multi-Agent Reasoning
📌 StreamMA: streaming intermediate results improves multi-agent reasoning
-
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
📌 ThoughtFold: folding reasoning chains via introspective preference learning
-
Token Budgets: An Empirical Catalog of 63 LLM-Agent Budget-Overrun Incidents, with an Affine-Typed Rust Mitigation as a Case Study
📌 Token Budgets: compile-time guarantees on LLM cost
-
vercel-labs/agent-browser
Ever tried letting an AI drive the browser itself? A single command now invokes a browser automation tool written in Rust.
-
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
📌 [HuggingFace Daily Papers] Where do deep-research agents go wrong? A look at span-level error localization
- 2026-06-03 · Wed · 30 posts
-
Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning
📌 Agentic Chain-of-Thought Steering: making LLM reasoning more token-efficient and controllable
-
AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
📌 AutoMedBench: a new benchmark for medical AutoResearch; verification is the biggest bottleneck
-
[AINews] Microsoft Build: MAI-Thinking-1 and MAI Family models
📌 Microsoft Build unveils the MAI model family
-
Benchmarking Visual State Tracking in Multimodal Video Understanding
📌 Benchmarking Visual State Tracking in Multimodal Video Understanding
-
Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching
📌 Bootstrap Your Generator: flow-matching visual editing trained without paired data
-
datawhalechina/hello-agents
📌 [DatawhaleChina open source] Hello-Agents: building truly AI-native agents from scratch
-
Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
📌 Conflict-aware decentralized instruction tuning
-
Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation
📌 Decoupled residual diffusion models improve image-to-image translation
-
Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
📌 Paper: diagnosing harmful continuation in long chain-of-thought traces
-
Direct Preference Optimization Beyond Chatbots
📌 [HuggingFace Blog] DPO lowers the OCR text degradation rate by 59.4% on average
-
Google DeepMind Releases Gemma 4 12B: An Encoder-Free Multimodal Model with Native audio that runs on a 16 GB laptop
📌 Gemma 4 12B: an encoder-free multimodal model that runs on a 16 GB laptop
-
heygen-com/hyperframes
📌 HyperFrames: letting AI coding assistants generate video directly
-
Guides
📌 Simon Willison's Guides: a collection of practical patterns for coding agents
-
HKUDS/Vibe-Trading
📌 HKUDS releases Vibe-Trading: an LLM-driven personal trading agent, extensible to full trading functionality with a single command
-
How to Fine-Tune LFM2 Using QLoRA and DPO: A Complete Step-by-Step Coding Tutorial on Google Colab
📌 [MarkTechPost] A complete tutorial on fine-tuning LFM2 with QLoRA + DPO
-
Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
📌 Humanoid-GPT: zero-shot motion tracking from hundred-million-scale motion data and a GPT architecture
-
interviewstreet/hiring-agent
📌 [Open source] Hiring Agent: AI résumé scoring
-
KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks
📌 KVarN: calibration-free KV-cache quantization that keeps errors from accumulating during reasoning
-
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling
📌 Visual bias in multimodal LLMs and how to counter it
-
NousResearch/hermes-agent
📌 Hermes Agent: a self-learning, multi-model AI agent
-
NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation
📌 [NVIDIA] OmniDreams: a real-time world model
-
NVIDIA Releases Cosmos 3: A Two-Tower Mixture-of-Transformers Foundation Model Unifying Physical Reasoning, World Generation, and Action Generation
📌 [NVIDIA release] Cosmos 3: a two-tower MoT model unifying physical reasoning, world generation, and action generation
-
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
📌 OmniOPD: Logit‑Free On‑Policy Distillation via Speculative Verification
-
OCC-RAG: Optimal Cognitive Core for Faithful Question Answering
📌 OCC-RAG: a compact task-specific model beats giant LLMs at more faithful multi-hop QA
-
QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards
📌 QUBRIC: co-designing queries and rubrics
-
Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
📌 Small RL Controller, Large Language Model: RL‑Guided Adaptive Sampling for Test‑Time Scaling
-
TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
📌 TRON: unlimited visual reasoning environments
-
Uber's $1,500/month AI limit is a useful signal for AI tool pricing
📌 Uber sets a $1,500 monthly cap on AI tools: a reference point for enterprise AI cost management
-
Value-Aware Stochastic KV Cache Eviction for Reasoning Models
📌 [Value-Aware Stochastic KV Eviction] Improving reasoning-model accuracy under compression
-
yikart/AiToEarn
📌 AiToEarn: an AI marketing agent for one-person companies
- 2026-06-02 · Tue · 30 posts
-
agentgateway/agentgateway
📌 [AgentGateway] A unified gateway solution for agents
-
[AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark
📌 [AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark at a glance
-
A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks
📌 [TASTE] Letting agent benchmarks evolve themselves: generating harder, more comprehensive tasks
-
Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform
📌 [Alibaba Qwen team] Qwen3.7-Plus launches: multimodal understanding plus autonomous, iterative agent capabilities
-
chopratejas/headroom
📌 Headroom: a reversible context-compression layer for AI agents
-
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
📌 [Domino] Decoupling causal modeling from autoregressive drafting to speed up LLM inference
-
Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization
📌 BiDPO improves complex text-to-image generation
-
Draft-OPD: On-Policy Distillation for Speculative Draft Models
📌 [Draft-OPD] On-policy distillation for draft models
-
EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers
📌 EVA01: native 3D mesh understanding and generation via Mixture-of-Transformers
-
Holo3.1: Fast & Local Computer Use Agents
📌 Holo3.1: local computer-use agents
-
koala73/worldmonitor
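Think real-time global risk monitoring is only available on paid commercial platforms? This open-source project brings multi-layer maps, AI news summaries, and cross-domain signal correlation to a laptop, so you can track military, economic, disaster, and financial developments at any time.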
-
Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
📌 Harness-1: reinforcement learning with state-externalizing harnesses improves search
-
JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines
📌 JetBrains releases Mellum2: a 12B-parameter MoE model designed for software engineering pipelines
-
MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation
📌 MCP-Persona: testing LLM agents in simulated personal environments
-
Microsoft’s Project Solara is an OS for AI agent gadgets
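At Build 2026, Microsoft announced Project Solara, an operating system designed for AI agent gadgets that, surprisingly, is built on Android rather than Windows. Two concept devices, a desktop unit resembling an Echo Show and a wearable ID badge, demonstrated face unlock, fingerprint scanning, real-time speech-to…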
-
mksglu/context-mode
📌 [mksglu/context-mode] Keeping LLM agents from drowning in raw tool output
-
Microsoft offers devs a better way to control AI agent behavior
📌 Microsoft releases a new standard for controlling AI agent behavior
-
New Microsoft tool lets devs spin up AI behavior tests using text descriptions
📌 An introduction to ASSERT, Microsoft's open-source tool
-
NVIDIA/OpenShell
📌 [NVIDIA open source] OpenShell: a secure sandbox for autonomous AI agents
-
OpenAI launches new Codex tools for white-collar work
📌 Codex adds six workplace plugins
-
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
📌 OpenWebRL: training visual web agents with online multi-turn RL on real websites
-
SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment
📌 SafeSteer: localized safety distillation for low-resource LLM alignment
-
Policy and World Modeling Co-Training for Language Agents
📌 [Policy and World Modeling Co-Training] A new training framework for language agents: better performance with no extra compute overhead
-
Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents
Have you considered that the same set of agent skills can perform completely differently on different LLMs?
-
StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement
📌 [HuggingFace Daily Paper] StressDream: steering video world models toward high-impact yet plausible futures
-
TinyFish Launches BigSet: An Open-Source Multi-Agent System That Builds Structured Live Datasets from Plain-English Descriptions
📌 BigSet: building structured live datasets from plain-English descriptions
-
TVIR: Building Deep Research Agents Towards Text--Visual Interleaved Report Generation
📌 TVIR: improving both factuality and visuals in multimodal report generation
-
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
📌 VideoMLA: a low-rank KV cache cuts memory and improves minute-scale video generation
-
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
📌 When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
-
τ_0-WM: A Unified Video-Action World Model for Robotic Manipulation
📌 τ0-WM: a unified video-action world model
- 2026-06-01 · Mon · 30 posts
-
can1357/oh-my-pi
📌 oh-my-pi: a terminal IDE coding assistant
-
dMoE: dLLMs with Learnable Block Experts
📌 dMoE: making diffusion LLMs and MoE work better together
-
dmtrKovalenko/fff
📌 fff: a blazing-fast file search toolkit that AI agents love too
-
DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization
📌 [HuggingFace Daily Papers] DRIFT: decoupled rollouts and importance-weighted fine-tuning for near-RL learning efficiency in multi-turn dialogue
-
elder-plinius/OBLITERATUS
📌 OBLITERATUS: one-click removal of LLM refusal behavior
-
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
📌 LLM agents develop coded language that evades human oversight
-
Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly
📌 Flat-Pack Bench: evaluating spatio-temporal reasoning in VLMs
-
Florida sues OpenAI, Sam Altman, in first-of-its-kind lawsuit over violent incidents
📌 [Florida's first state-level lawsuit] OpenAI and Sam Altman accused over ChatGPT's links to violent incidents
-
From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors
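Think defending against single-step prompt injection is enough? Research shows attackers can now hide malicious prompts across multiple operation steps, bypassing existing defenses.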
-
GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models
📌 [GDSD] Reinforcement learning as guided denoiser self-distillation to improve diffusion language models
-
GrepSeek: Training Search Agents for Direct Corpus Interaction
📌 [GrepSeek] A training method for search agents that interact directly with the corpus
-
Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization
📌 GCPO: contrastive token-level credit assignment
-
iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning
📌 [iVGR] Internalizing visual reasoning with RL to improve fine-grained perception
-
Launch HN: Expanse (YC P26) – Unlock Wasted GPU Capacity
📌 Expanse: raising real-world utilization of HPC/GPU clusters from 30% to 70%+
-
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
📌 [LongTraceRL] A new training method for long-context reasoning
-
jimuzhe/tiez-clipboard
📌 Tiez-clipboard: a Rust/Tauri implementation of a cross-device clipboard
-
Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent
📌 Meet Memory OS: a six-layer open-source memory stack built on Hermes Agent
-
MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding
📌 MiniMax M3: a 1M-token context window plus native multimodality
-
Nvidia Cosmos 3
📌 NVIDIA Cosmos 3: an open foundation model unifying physical reasoning, world generation, and action generation
-
RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video
📌 RayDer: a unified feed-forward Transformer brings self-supervised novel view synthesis to real-world video
-
Representation Forcing for Bottleneck-Free Unified Multimodal Models
📌 Representation Forcing: unified multimodal models no longer need an external latent space, while handling both perception and generation
-
SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
📌 SAAS: making AI agents more self-aware to cut unnecessary searches
-
ruvnet/ruflo
📌 [ruvnet/ruflo] A self-learning multi-agent collaboration framework for Claude Code
-
SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer
📌 [SANA-Streaming] A new architecture for real-time, high-resolution video editing
-
SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue
📌 SwanVoice: a new architecture for zero-shot, long-form, multi-speaker speech synthesis
-
TauricResearch/TradingAgents
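Think a single large model can handle quantitative trading? TradingAgents has GPT-5, Gemini, Claude, and other models take turns as research manager, trader, and portfolio manager, covering the whole pipeline from sentiment analysis to decision logging in one framework.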
-
The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement
📌 The SAVE framework: the flip side of RLHF
-
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
📌 SwanSphere: streaming spatial audio generation
-
Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action
📌 [NVIDIA Cosmos 3] The first open omni-model unifying world generation, physical reasoning, and action generation
-
When Confidence Misleads: Suffix Anchoring and Anchor-Proximity Confidence Modulation for Diffusion Language Models
📌 [New paper] Suffix-Anchored Confidence Modulation mitigates premature decoding in non-autoregressive diffusion language models
- 2026-05-31 · Sun · 17 posts
-
Best Text-to-Speech TTS Models in 2026: A Benchmark-Based Comparison
📌 2026 TTS model rankings
-
Build Skill-Augmented AI Agents with SkillNet for Search, Evaluation, Graph Analysis, and Task Planning
📌 SkillNet tutorial: building AI agents that search for, evaluate, and compose skills on their own
-
Comfy-Org/ComfyUI
📌 [Comfy-Org] ComfyUI: a node-graph AI creation engine
-
EY Canada published a cybersecurity report and most citations were hallucinated
📌 [EY Canada report] Were most citations in its cybersecurity report hallucinated?
-
Genesis AI Releases Nyx, Quadrants, and Genesis World 1.0 Physics Platform for Scalable Robotics Foundation Model Evaluation
📌 [Genesis AI] Genesis World 1.0: 100× faster evaluation of robotics models
-
Hermes Agent Ships Tool Search for MCP: Anthropic Evals Show 49% to 74% Accuracy Gain on Opus 4
📌 Hermes Agent ships Tool Search for MCP: accuracy rises from 49% to 74% on Opus 4
-
How to Use AgentTrove: Streaming 1.7M Agentic Traces and Building a Clean ShareGPT SFT Dataset in Python
📌 AgentTrove tutorial: stream 1.7M agentic traces in Python and quickly build a ShareGPT SFT dataset
-
I went looking for the AI weed vape that gives you Bitcoin for smoking
📌 [The Verge investigation] An AI weed vape claims every puff earns Bitcoin. Is it true?
-
jamwithai/production-agentic-rag-course
📌 jamwithai Production-Agentic-RAG course: building an arXiv paper research assistant from scratch
-
mattpocock/sandcastle
📌 Sandcastle AI
-
nesquena/hermes-webui
📌 [GitHub Trending] Hermes Web UI: a browser-based interface for the autonomous agent
-
OpenRouter raises $113M Series B
📌 OpenRouter raises $113 million
-
nicobailon/pi-subagents
📌 [nicobailon] pi-subagents: letting Pi dispatch focused subagents to collaborate
-
SoftBank says it will invest up to €75 billion to build French data centers
📌 SoftBank to invest up to €75 billion in French data centers
-
supermemoryai/supermemory
🧠 Supermemory: an open-source AI memory and context engine that tops benchmarks
-
To have a moral stance on AI is to be an outcast, and it sucks
📌 [Hacker News hot post] Opposing AI makes you an outcast, and it really hurts
-
Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× Experiment-Throughput Gain
📌 Trajectory releases a concurrent multi-LoRA training stack with a 2.81× gain in experiment throughput
- 2026-05-30 · Sat · 19 posts
-
After Nvidia’s $20B not-acqui-hire, AI chip startup Groq reportedly raising $650M
📌 Groq reportedly raising $650 million
-
[AINews] Founders and Forward Deployed Engineers
📌 [AINews] Hiring founders and forward deployed engineers, plus the latest Claude Opus 4.8 evaluations
-
anomalyco/opencode
📌 [Open source] anomalyco/opencode: a multi-language AI coding agent that is easy to install
-
Anthropic surpasses OpenAI to become most valuable AI startup
Think OpenAI is still the top dog in AI? The latest funding round shows Anthropic has overtaken it in valuation and is approaching $1 trillion.
-
As the browser wars heat up, here are the hottest alternatives to Chrome and Safari in 2026
-
Coders are refusing to work without AI — and that could come back to bite them
📌 Developers can't work without AI? The habit could come back to bite them
-
I put Google’s 24/7 AI assistant Gemini Spark to work, and it’s actually pretty useful
📌 Hands-on with Google's 24/7 AI assistant Gemini Spark: does it really save you time?
-
CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
📌 CONF-KV: uncertainty-driven KV cache management
-
Meta is reportedly developing an AI pendant
📌 [Rumor] Meta is developing an AI pendant that could carry on Limitless's legacy
-
Notes from the Mistral AI Now Summit
📌 Mistral Summit
-
Does your CEO have AI psychosis? Aaron Levie thinks most of them do.
📌 [TechCrunch Equity podcast] "AI psychosis": have CEOs really been taken in by AI?
-
Liquid AI reveals 8B-A1B MoE trained on 38T
📌 LFM2.5-8B-A1B: an edge inference model with context extended to 128K and a doubled vocabulary
-
PANDO: Efficient Multimodal AI Agents via Online Skill Distillation
📌 PANDO: improving multimodal web agent efficiency through online skill distillation
-
Proposed new US funding rules: We can cancel any grant at any time
📌 New US federal funding rules: political appointees could override peer review, and any grant could be canceled at any time
-
Reducing Political Manipulation with Consistency Training
📌 RL consistency training reduces political bias
-
Show HN: Open-source private home security camera system (end-to-end encryption)
📌 Secluso: open-source private home security cameras
-
The Regulatory Frontier
📌 The AI regulatory frontier: compliance handled in minutes
-
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
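Think models judge how far away an object is from real depth cues? In fact, they may just be looking at whether the object appears toward the top or bottom of the frame.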
-
‘What a joke’: Github Copilot’s new token-based billing spurs consternation among devs
📌 GitHub Copilot changes its billing, and developers groan
- 2026-05-29 · Fri · 28 posts
-
9 demos of Gemini Omni and Gemini 3.5 in action
📌 [Google AI Blog] Nine real-world demos of Gemini Omni and Gemini 3.5
-
AdaState: Self-Evolving Anchors for Streaming Video Generation
📌 AdaState: self-evolving anchors make video generation more dynamic
-
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
📌 AgentDoG 1.5: a lightweight, scalable alignment framework for AI agent safety
-
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
📌 [Alignment Tampering] Models may game RLHF, putting alignment safety to the test again
-
Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning
📌 3D priors improve geometric reasoning in vision-language models
-
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation
📌 [HuggingFace Daily] DynaFLIP: robot perception pretraining guided by tri-modal dynamics
-
LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
📌 [HuggingFace Daily Papers] LaRA: layer-wise representation analysis for detecting data contamination in RL post-trained LLMs
-
Learning A Unified Risk Map for Autonomous Driving in Partially Observable Environments
📌 Learning A Unified Risk Map for Autonomous Driving in Partially Observable Environments
-
CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists
📌 CausaLab: a new environment for evaluating causal discovery in LLMs
-
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models
📌 [minWM] An open-source framework that turns bidirectional video diffusion models into real-time interactive world models
-
Native Audio-Visual Alignment for Generation
📌 Native Audio-Visual Alignment improves synchronization in multimodal generation
-
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources
📌 OmniRetrieval: a unified retrieval orchestrator across heterogeneous knowledge sources
-
OpenBMB/VoxCPM
📌 OpenBMB releases VoxCPM2: tokenizer-free multilingual TTS
-
opendatalab/MinerU
📌 [opendatalab] MinerU: a multi-format document parser with one-click conversion to Markdown/JSON
-
ORACLE: Anticipating Scams from Partial Trajectories in Streaming App Usage
📌 ORACLE: a real-time inference framework that anticipates scams from partial app-usage trajectories
-
PaddlePaddle/PaddleOCR
📌 PaddleOCR: open-source OCR powering the LLM era
-
PhoneWorld: Scaling Phone-Use Agent Environments
📌 PhoneWorld: turning real phone operations into scalable agent evaluation environments
-
millionco/react-doctor
📌 [millionco/react-doctor] Static scanning of React code to help AI coding agents find problems
-
PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions
📌 Physically aware 4D generation of human-object interactions
-
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
📌 [Real-time LLM inference] 3k tokens/s per request is feasible on standard GPUs
-
ryoppippi/ccusage
📌 ccusage: usage statistics for AI coding assistants
-
The internet is being rebuilt for machines
📌 AWS launches OpenSearch Serverless designed for AI agents
-
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
Think an RL agent only needs to learn one set of skills to handle every situation?
-
Thinking Before Constraining: A Unified Decoding Framework for Large Language Models
📌 Thinking Before Constraining: a unified decoding framework that improves LLM reasoning and formatted output
-
Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation
📌 Uniform Diffusion Models Revisited: the leave-one-out denoiser and an absorbing-state reformulation
-
UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering
📌 UniSteer: text-guided flow matching in activation space
-
YoCausal: How Far is Video Generation from World Model? A Causality Perspective
📌 [YoCausal] How far are video generation models from causal understanding?
-
Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection
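What if anomaly detection with vision-language models needed only a small number of parameters? This study proposes a parameter-efficient vision-language model, paired with a new benchmark of natural-language rationales, that shows better performance and generalization across multiple time-series datasets.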
- 2026-05-28 · Thu · 29 posts
-
adithya-s-k/omniparse
📌 OmniParse: structuring multimodal data
-
A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System
📌 pgvector in practice: a hybrid vector search system
-
Advancing Creative Physical Intelligence in Large Multimodal Models
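Ever noticed that AI tends to make things up when solving visual puzzles? This study finds that training the model to weigh "the facts it sees" more heavily significantly improves creative problem-solving.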
-
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning
-
AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions
📌 AgentHijack: a systematic robustness benchmark for computer-use agents
-
anthropics/claude-code
📌 [New from Anthropic] Claude Code: an agentic coding assistant in the terminal
-
apurvsinghgautam/robin
📌 Robin: an LLM-powered dark web OSINT tool
-
ariadng/metatrader-mcp-server
🎣 Ever wished you could have an AI place trades for you with a single sentence? MetaTrader MCP Server makes that idea real.
-
Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings
📌 [Clark Hash] 32× compression of vector embeddings without compromising similarity
-
DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes
📌 DenoiseRL: learning from erroneous trajectories for noise-robust reasoning
-
Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution
📌 Scale-Invariant Diffusion: one model for both image generation and continuous super-resolution
-
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
📌 Gamma-World: a multi-agent world model
-
Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems
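Think testing an LLM in isolation is enough for safety? In multi-agent social interactions, standard evaluations seriously underestimate the risk of privacy leakage.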
-
iOfficeAI/AionUi
-
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
📌 [HuggingFace Daily Papers] SAERL: guiding LLM data engineering with sparse autoencoders
-
microsoft/RAMPART
📌 [Microsoft open source] RAMPART: a pytest-native security testing framework for agentic AI
-
How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning
📌 View Dropout and panoramic visual thinking: improving cross-view spatial reasoning
-
Models That Know How Evaluations Are Designed Score Safer
📌 Models That Know How Evaluations Are Designed Score Safer
-
Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration
📌 Reinforcement learning + multi-token prediction: coefficient calibration
-
openai/codex
📌 Using the Codex CLI locally
-
OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration
📌 Structured multimodal verification: OmniVerifier-M1
-
OpenMOSS/MOSS-TTS
📌 OpenMOSS releases MOSS-TTS v1.5 and SoundEffect v2.0
-
OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
📌 OSP-Next: sparse sequence parallelism + HiF8 quantization + reinforcement learning make text-to-video cheaper to compute
-
PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective
[PEFT-Arena] Parameter-efficient fine-tuning from a stability-plasticity perspective
-
Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face tokenizers Crate
📌 Perplexity open-sources a Unigram tokenizer with 5× lower p50 latency
-
ResearchMath-14K: Scaling Research-Level Mathematics via Agents
📌 ResearchMath-14K: improving mathematical reasoning with agent-generated reasoning traces
-
Tweaking Local Language Model Settings with Ollama
📌 A practical guide to tuning local model settings with Ollama
-
Sakana AI Proposes DiffusionBlocks: a Block-wise Training Framework That Converts Residual Networks into Independently Trainable Denoising Modules
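Ever struggled with memory blowing up while training a huge diffusion model? A new block-wise training approach claims that splitting the network into B blocks cuts memory requirements to 1/B of the original without sacrificing performance.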
-
Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization
📌 [Verus-SpecGym] An agentic environment for evaluating LLM specification autoformalization
- 2026-05-27 · Wed · 29 posts
-
anthropics/skills
📌 [Official Anthropic examples] What can Claude's "skills" system actually do?
-
Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows
📌 Auditing hallucinations in multi-step AI workflows
-
Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs
📌 An advanced guide to using Claude Code
-
D^2-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
📌 D²-Monitor: dynamic safety monitoring via hesitation detection, a lightweight safeguard for diffusion LLMs
-
Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning
📌 Mono-anchored normalization: a new approach to multi-source visual reasoning
-
ElevenLabs’ new music-generation model can switch genres mid-track
📌 ElevenLabs releases Music v2 – genre‑switching and section‑level editing for AI‑generated songs
-
Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement
📌 Efficient Agentic Reinforcement Learning with On‑Policy Intrinsic Knowledge Boundary Enhancement
-
Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
📌 Geometry‑Aware Representation Denoising for Robust Multi‑view 3D Reconstruction
-
DeepSWE: A contamination-free benchmark for long-horizon coding agents
📌 DeepSWE: a contamination-free long-horizon coding benchmark that makes gaps between frontier models clear at a glance
-
DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs
📌 DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs
-
How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning
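Think letting a model "imagine images" solves cross-view spatial reasoning? In practice, models often ignore those intermediate images and rely on text reasoning alone. This paper proposes a simple training technique that forces the model to actually look at the images it draws.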
-
harry0703/MoneyPrinterTurbo
Think making a short video takes hours? This open-source project automatically produces a complete video from a single keyword.
-
ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM
📌 The ITBench-AA benchmark
-
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
📌 LongAV-Compass: a new benchmark for unified evaluation of minute-scale audio-visual generation
-
Meet EAGLE 3.1: The Speculative Decoding Algorithm That Fixes Attention Drift in LLM Inference
📌 EAGLE 3.1: an inference acceleration algorithm that fixes attention drift
-
MEMO: A Modular Framework for Training a Dedicated Memory Model on New Knowledge Without Modifying LLM Parameters
📌 [NUS, MIT, and ASTAR] MEMO: training a separate memory model to inject new knowledge without touching LLM parameters
-
langfuse/langfuse
📌 Langfuse: an open-source LLM engineering platform
-
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research
📌 MobileGym: a verifiable, highly parallel research platform for mobile GUI agents
-
MobileMoE: Scaling On-Device Mixture of Experts
📌 MobileMoE: a breakthrough for on-device sparse mixture-of-experts models
-
NVIDIA-NeMo/Megatron-Bridge
📌 NVIDIA NeMo Megatron Bridge: a bridge for one-click conversion and fine-tuning of large models
-
NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code
📌 [NVIDIA research] Polar: GRPO training across multiple code agents without modifying them
-
Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs
📌 PIPO: unifying both ends, input and output, to improve efficiency
-
rowboatlabs/rowboat
📌 [rowboatlabs/rowboat] An open-source AI coworker that turns your work into a knowledge graph and completes tasks on your own computer
-
Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration
📌 Soap2Soap: long video generation through multi-agent collaboration
-
Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation
📌 [HuggingFace Daily Papers] Conditional diffusion on multimodal LLMs improves subject-driven image generation
-
QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents
📌 QUACK: knowledge auditing for multimodal social agents
-
Stability AI Releases Stable Audio 3: A Family of Fast Latent Diffusion Models for Audio Generation and Editing
📌 [Stability AI] Stable Audio 3: high-compression latent diffusion models
-
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
📌 VitaBench 2.0: a benchmark for personalized and proactive agents in long-term user interactions
-
unclecode/crawl4ai
📌 Crawl4AI: an LLM-friendly web crawler
- 2026-05-26 · Tue · 30 posts
-
AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond
📌 [AnyScene] Highly controllable driving scene generation from any BEV layout
-
alpic-ai/skybridge
📌 An introduction to the Skybridge framework
-
Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution
📌 Sobolev alignment improves structural fidelity in super-resolution
-
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
📌 CUA-Gym: scalable verifiable environments
-
Design a Complete Multimodal RLVR Pipeline with Open-MM-RL, Vision-Language Prompting, Reward Scoring, and GRPO Export
📌 [Open-MM-RL tutorial] Building a complete multimodal RLVR pipeline
-
Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models
📌 [HuggingFace Daily Papers] Directional Alignment: countering reward hacking in RLHF with geometric projection
-
dograh-hq/dograh
Think voice agents have to depend on paid SaaS? Dograh gets your own bot running in two minutes, fully open source and self-hosted.
-
Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints
📌 Decoupling MARL
-
Geometry-Aware Image Flow Matching
-
DRScaffold: Boosting Dense-Scene Reasoning in Lightweight Vision Language Models
📌 [DRScaffold] Can lightweight VLMs handle dense-scene reasoning too?
-
earendil-works/pi
📌 [earendil-works/pi] A self-extending coding agent
-
Guides
📌 [Simon Willison] Guides: practical patterns for using Claude Code and OpenAI Codex
-
How Far Will They Go? Red-Teaming Online Influence with Large Language Models
-
How we contain Claude across products
📌 [Anthropic Engineering] How to limit Claude's blast radius across products
-
Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs
📌 OmniVoice Studio: a local, open-source alternative to ElevenLabs
-
modelscope/FunASR
📌 [ModelScope FunASR] 170× faster speech recognition; VAD, ASR, speaker diarization, and emotion detection in one line of code
-
On-Policy Adversarial Flow Distillation for Autoregressive Video Generation
-
NangoHQ/nango
📌 [NangoHQ] Generate integration code with AI, with 800+ APIs within reach
-
Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion
📌 [Pantheon360] Generating high-fidelity 360° video with 3D-aware diffusion to build digital twins
-
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
📌 ParaVT: parallel tool calling improves long-video understanding
-
Reinforcing Few-step Generators via Reward-Tilted Distribution Matching
📌 Reinforcing Few-step Generators via Reward‑Tilted Distribution Matching
-
Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning
📌 Prism: a plug-in, reproducible infrastructure that makes multimodal continual instruction tuning research more scalable
-
shareAI-lab/learn-claude-code
📌 A hands-on guide to Claude Code
-
SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking
-
Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving
📌 Together AI open-sources OSCAR: attention-aware 2-bit KV cache quantization that eases memory pressure in long-context LLM serving
-
Towards Customized Multimodal Role-Play
📌 Towards Customized Multimodal Role-Play: keeping AI characters consistent across text and images
-
Sundar Pichai on AI, the future of search, and what’s happening to the web
📌 [New Sundar Pichai interview] Is AI search reshaping the web ecosystem? "Google Zero" sparks debate again
-
Using AI to write better code more slowly
📌 Using AI to write better code more slowly
-
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
📌 WBench: a multi-turn benchmark for interactive video world models
-
Your Embedding Model is SMARTer Than You Think
📌 Your Embedding Model is SMARTer Than You Think
- 2026-05-25 · Mon · 30 posts
-
AlexsJones/llmfit
📌 [GitHub Trending] llmfit: a terminal tool that matches LLMs precisely to your hardware
-
ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning
📌 [USTC + Alibaba + NUS] ARES: automatically synthesizing question-specific rubrics to scale LLM reinforcement learning
-
Best Authentication Platforms for AI Agents and MCP Servers in 2026
📌 2026 comparison of MCP authentication platforms
-
Beyond Binary Edits Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment
📌 Multimodal knowledge editing: adversarial subspace alignment
-
Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents
📌 [Co-ReAct] Making rubrics real-time partners for ReAct agents: step-level guidance improves multi-step reasoning quality
-
Design and Report Benchmarks for Knowledge Work
📌 [New research from Harvard and others] How do you design AI benchmarks that truly reflect knowledge work?
-
Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
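Think measuring a single inference is enough to gauge AI energy use? In complex agent workflows, that accounting can seriously underestimate the real cost.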
-
ETCHR: Editing To Clarify and Harness Reasoning
📌 ETCHR: image editing aids multimodal reasoning; plug-and-play with no retraining
-
EVE-Agent: Evidence-Verifiable Self-Evolving Agents
📌 [Fujitsu / University of Tokyo / RIKEN] EVE-Agent: giving self-evolving agents verifiable "evidence"
-
Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation
📌 [Tencent × HKUST(GZ)] When recommendation models get bigger, can their representations get smaller? How RankElastor prevents embedding collapse
-
Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving
📌 Fast-dDrive: a block-diffusion VLA makes autonomous driving faster and more accurate
-
From Correctness to Preference: A Framework for Personalized Agentic Reinforcement Learning
📌 [USTC + Alibaba] A personalized agentic RL framework: PARPO and PSGM
-
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
📌 [New research from Fudan University & Microsoft Research] Are automatically generated AI "experience skills" really a double-edged sword?
-
From Head to Tail: Asymmetric Knowledge Transfer in Long-tail Recommendation with Generative Semantic IDs
📌 [Alibaba] From Head to Tail: semantic IDs improve long-tail recommendation
-
GENSTRAT: Toward a Science of Strategic Reasoning in Large Language Models
📌 GENSTRAT: evaluating strategic reasoning in LLMs with procedurally generated card games
-
Geo-Align: Video Generation Alignment via Metric Geometry Reward
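Think enough synthetic data lets AI perfectly replicate real-world camera motion? New research shows that supervised fine-tuning alone still drifts in real scenes, while reinforcement learning with a metric-geometry reward lets the model correct itself.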
-
Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
📌 [Multi-university collaboration] Good Token Hunting: slimming visual geometry Transformers by 85%
-
Human-in-the-Loop Multi-Agent Ventilator Decision Support with Contextual Bandit Preference Learning
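📌 [Shanghai University of Engineering Science and others] How does a human-in-the-loop multi-agent framework transform ventilator decision support?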
-
Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
📌 [UC Berkeley / Google] Inductive Deductive Synthesis: AI-generated, formally verifiable distributed systems
-
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval
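Think a bigger model always means better search? Microsoft shows that, with the right recipe, small models can beat large ones while running 27× faster.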
-
MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection
📌 [New research from the Chinese Academy of Sciences] MemAudit: a causal graph for tracing memory poisoning after the fact
-
Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals
📌 Metacognition-as-Reward: teaching models to think about thinking
-
Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions
📌 [Apple research] Can lexical substitution alone make low-resource language models 2× faster?
-
PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
📌 PGT: geometric images strengthen model understanding
-
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
📌 [NVIDIA research] PiD: pixel diffusion decoding generates 8K images in seconds
-
PhotoFlow: Agentic 3D Virtual Photography Missions
📌 [New research from SJTU and others] PhotoFlow: a language-conditioned virtual photography agent
-
OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations
📌 [Tsinghua University & Alibaba Qwen] OnePred: next-query prediction in multi-turn conversations with "recursive intent memory"
-
Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks
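📌 [New research from Beijing Jiaotong University and others] A critical blind spot in long-context LLMs: positional failures cause sharp drops in reasoning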
-
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
📌 SkillOpt: a controllable text-space optimizer that lets agent skills evolve on their own
-
When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems
📌 When correct execution still fails: epistemic calibration for LLM multi-agent planning
- 2026-05-24 · Sun · 15 posts
-
cheahjs/free-llm-api-resources
📌 [Free LLM API resource roundup] Highlights from cheahjs's GitHub Trending repo
-
Constraint Decay: The Fragility of LLM Agents in Back End Code Generation
📌 Constraint Decay: The structural fragility of LLM agents in backend code generation
-
DeepSeek 的 10 万亿美元大战略
📌 DeepSeek's $10 trillion grand strategy
-
farion1231/cc-switch
📌 [GitHub Trending] cc-switch: One-click switching between AI coding assistants such as Claude Code, Codex, and Gemini
-
gitroomhq/postiz-app
📌 [Postiz] An open-source AI social media scheduling tool that simplifies automated posting
-
Hackers are learning to exploit chatbot ‘personalities’
📌 [The Verge] Are chatbot "personas" hackers' new weapon? Inside DAN and ignore-instructions jailbreaks
-
I tried Amazon’s Bee wearable and am both intrigued and slightly creeped out
📌 First impressions of the Amazon Bee wearable AI assistant
-
katanemo/plano
📌 [katanemo] Plano: A unified agent data plane
-
NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule
📌 [NVIDIA AI] Gated DeltaNet-2: A linear attention layer that decouples erase and write
-
presenton/presenton
📌 [Open Source] Presenton: A self-hosted AI presentation generator with model freedom and data privacy
-
Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents
📌 [Tencent open source] TencentDB Agent Memory: A four-tier local memory pipeline for the AI agent long-context problem
-
virattt/dexter
📌 [virattt/dexter] Dexter: An autonomous financial research agent
-
twentyhq/twenty
-
Microsoft Research Releases Webwright: A Terminal-Native Web Agent Framework That Scores 60.1% on Odysseys, Up from Base GPT-5.4’s 33.5%
📌 Microsoft Research releases Webwright: A terminal-native web agent framework that lifts the Odysseys score from 33.5% to 60.1%
-
onyx-dot-app/onyx
Think you can just install an LLM and start using it?
- 2026-05-23 · Saturday · 21 articles
-
AI is being used to resurrect the voices of dead pilots
📌 AI recreates the voices of deceased pilots
-
[AINews] All Model Labs are now Agent Labs
📌 【AINews】All Model Labs are now Agent Labs
-
Build a SuperClaude Framework Workflow with Commands, Agents, Modes, and Session Memory
📌 Build a SuperClaude Framework Workflow – A Step‑by‑Step Tutorial for Structured Claude Prompting
-
crewAIInc/crewAI
📌 crewAI: A lightweight multi-agent automation framework independent of LangChain
-
Elon Musk has given up on solar power (on Earth)
📌 [TechCrunch report] Elon Musk gives up on solar power on Earth? A pivot to space-based power generation
-
AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
📌 A rule-based reward model that lowers human annotation costs
-
Google’s new anything-to-anything AI model is wild
📌 Google's new anything-to-anything AI model
-
How VCs and founders use inflated ‘ARR’ to crown AI startups
📌 Faked ARR: The revenue illusion of AI startups
-
How Virgin Atlantic ships faster with Codex
📌 [OpenAI Blog] Virgin Atlantic uses Codex to ship mobile apps faster
-
linshenkx/prompt-optimizer
📌 Prompt Optimizer: An AI prompt optimization tool
-
Models.dev: open-source database of AI model specs, pricing, and capabilities
📌 Models.dev Hub
-
mukul975/Anthropic-Cybersecurity-Skills
📌 An open-source cybersecurity skills library: 754 skills
-
multica-ai/multica
📌 Multica: AI agents become teammates
-
Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification
📌 [Latest research from Nous Research] Contrastive Neuron Attribution (CNA): Precisely steering model refusal behavior without SAE training
-
Open source Kanban desktop app that runs parallel agents on every card
📌 Open-source Kanban desktop app: Every card can launch its own AI agent
-
OpenPipe/ART
📌 [OpenPipe] ART: An open-source GRPO framework paired with W&B Serverless RL for cheaper, faster multi-step AI agent training
-
Perplexity Open-Sources Bumblebee: A Read-Only Supply-Chain Scanner for Developer Endpoints
📌 [Perplexity open source] Bumblebee: A read-only supply-chain scanner for developer machines
-
Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models
📌 Nemotron-Labs Diffusion: A new generation of language models with parallel generation and iterative refinement
-
warpdotdev/warp
📌 Warp: An AI agent terminal IDE
-
web-infra-dev/midscene
📌 [web-infra-dev] Midscene.js: An AI-driven, vision-based cross-platform UI automation framework
-
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows
📌 π-Bench: Evaluating assistants on long-horizon workflows
- 2026-05-22 · Friday · 30 articles
-
abhigyanpatwari/GitNexus
📌 GitNexus: An X-ray lens on code for AI agents
-
AMEL: Accumulated Message Effects on LLM Judgments
📌 [OpenAI/Anthropic/Google joint research] Are LLM judgments swayed by the polarity of earlier messages? The AMEL effect revealed
-
Advancing Mathematics Research with AI-Driven Formal Proof Search
📌 [Latest DeepMind research] AI-assisted formal proofs solve 9 Erdős problems
-
Cambrian-P: Pose-Grounded Video Understanding
📌 [Cambrian-P] Teaching AI to track camera position while watching video
-
colbymchenry/codegraph
📌 CodeGraph: A local semantic knowledge graph for AI coding assistants that cuts tool calls and cost
-
Evaluating Commercial AI Chatbots as News Intermediaries
📌 [Latest Stanford evaluation] AI chatbots as news intermediaries: high accuracy, but hidden language bias and pitfalls
-
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
📌 [Latest NVIDIA research] Gated DeltaNet-2: Linear attention that decouples erase and write
-
Cohere Releases Command A+: A 218B Sparse MoE Model for Agentic Workflows That Runs on as Few as Two H100 GPUs
📌 [Cohere release] Command A+: A 218B-parameter sparse MoE model that runs on as few as two H100s
-
google-research/timesfm
📌 TimesFM 2.5: A lightweight long-context time-series model
-
GLeVE: Graph-Guided Lesion Grounding with Proposal Verification in 3D CT
📌 GLeVE: Graph-guided lesion grounding that links radiology reports to 3D CT
-
LLM Retrieval for Stable and Predictable Ad Recommendations
📌 [Latest Meta research] Using LLMs to make ad recommendations more stable
-
microsoft/agent-governance-toolkit
📌 Microsoft Agent Governance Toolkit: Zero-violation governance for AI execution
-
Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most
📌 [Forecasting Research Institute] Do stronger LLMs forecast worse? Inverse scaling in superlinear-growth scenarios
-
Multi-Stream LLMs: new paper on parallelizing/separating prompts, thinking, I/O
📌 Multi-Stream LLMs: A new instruction-tuning paradigm that lets models read, think, and write at the same time
-
Microsoft Releases Fara1.5: A Family of Browser Computer-Use Agents (4B/9B/27B) That Outperform OpenAI Operator and Gemini 2.5 Computer Use on Online-Mind2Web
📌 [Microsoft AI Frontiers] Fara1.5 outperforms OpenAI Operator and Gemini 2.5, a new milestone for browser agents
-
phodal/routa
📌 phodal/routa: A Kanban-style multi-agent collaboration platform
-
plastic-labs/honcho
📌 plastic-labs launches Honcho: Memory infrastructure that lets AI agents understand people and context over time
-
Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving
📌 [Waymo x Google DeepMind] Sensor2Sensor: Generating multimodal autonomous-driving sensor data from dashcam video
-
SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations
📌 [CMU & Microsoft Research] SynAE: A multi-axis framework for assessing synthetic data quality in tool-calling agent evaluations
-
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning
📌 Spreadsheet-RL: Improving LLM performance on realistic spreadsheet tasks with reinforcement learning
-
TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks
📌 TerminalWorld: A large-scale benchmark built automatically from real terminal recordings to evaluate agents on command-line workflows
-
Tokenization with Split Trees
📌 [Kensho+MIT] Split-tree tokenization: 11% fewer tokens
-
Think Thrice Before You Speak: Dual knowledge-enhanced Theory-of-Mind Reasoning for Persuasive Agents
📌 [Dual-knowledge ToM] Think Thrice Before You Speak
-
Towards a General Intelligence and Interface for Wearable Health Data
📌 [Google DeepMind] A general intelligence model for wearable health data
-
Tracer-Cloud/opensre
When an incident strikes, the evidence is scattered across logs, metrics, traces, runbooks, and Slack.
-
Towards Direct Evaluation of Harness Optimizers via Priority Ranking
📌 [Yonsei University and others] Directly evaluating harness optimizers: the priority-ranking method
-
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation
-
Unified Data Selection for LLM Reasoning
📌 HES data selection: A unified boost for LLM reasoning training
-
Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs
📌 [Latest research from Kyung Hee & Princeton] Why can't Video-LLMs tell left from right? Diagnosing and fixing the directional blind spot
-
WorldKV: Efficient World Memory with World Retrieval and Compression
📌 [KAIST AI & Naver AI Lab] WorldKV: Double the throughput, persistent world memory
- 2026-05-21 · Thursday · 30 articles
-
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
📌 [UC Santa Cruz & MIT] A single success rate is no longer enough to evaluate LLM agents; AgentAtlas offers a finer-grained diagnostic framework
-
CALMem : Application-Layer Dual Memory for Conversational AI
📌 CALMem: An application-layer dual-memory architecture that gives LLMs virtually unlimited conversational context (no model changes required)
-
Beyond Text-to-SQL: An Agentic LLM System for Governed Enterprise Analytics APIs
📌 [Latest Dialpad research] Beyond Text-to-SQL: An agentic LLM system for governed enterprise analytics APIs
-
ChromeDevTools/chrome-devtools-mcp
📌 Chrome DevTools opens a door for AI
-
Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards
📌 [NUS & HIT] Conflict-Aware Additive Guidance: Keeping multi-constraint generative models from drifting off the true distribution
-
CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing
📌 [HuggingFace Daily Papers] CutVerse: A compositional GUI agents benchmark for media post-production
-
DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation
📌 DeepWeb-Bench: A new benchmark for deep research
-
Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment
📌 [Latest research from HKUST and others] Is the equivalence of DPO and RLHF actually conditional?
-
DriveMA: Rethinking Language Interfaces in Driving VLAs with One-Step Meta-Actions
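📌 [Latest research from Shanghai Zhiyuan and others] Replacing lengthy language reasoning with "one-step meta-actions," DriveMA sets a new SOTA on the Waymo challenge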
-
google-gemini/gemini-cli
📌 Gemini CLI: A terminal assistant
-
From Automated to Autonomous: Hierarchical Agent-native Network Architecture (HANA)
📌 HANA: A hierarchical agent network architecture
-
Governance by Construction for Generalist Agents
📌 IBM proposes CUGA: Five layers of programmable governance for generalist LLM agents, without fine-tuning the model
-
Latent Dynamics for Full Body Avatar Animation
📌 Latent dynamics improve clothed avatars
-
Layer-wise Token Compression for Efficient Document Reranking
📌 [Amazon AGI] Layer-wise Token Compression speeds up rerankers
-
Meet Turbovec: A Rust Vector Index with Python Bindings, and Built on Google’s TurboQuant Algorithm
-
iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance
📌 iTryOn: A breakthrough in interactive virtual try-on
-
NousResearch/hermes-agent
📌 NousResearch Hermes Agent: An open-source AI agent with a built-in learning loop
-
Learning from Language Feedback via Variational Policy Distillation
📌 [Variational policy distillation driven by language feedback]
-
Open-World Evaluations for Measuring Frontier AI Capabilities
📌 Open-world evaluations: Measuring AI in the real world
-
One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration
📌 [Latest Peking University research] One-step distillation of discrete diffusion models: Can single-step generation rival multi-step?
-
One Model, Three Modalities: ByteDance Releases Lance for Image and Video Understanding, Generation, and Editing
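Have you ever imagined a single model that can view images, watch video, write text, draw, and edit, all without switching networks between understanding and generation? ByteDance's newly released Lance is attempting just that.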
-
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
📌 OScaR:Occam's Razor for Extreme KV Cache Quantization
-
OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind
📌 OSCToM: RL-guided adversarial generation improves high-order Theory of Mind
-
Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy
📌 [University of Toronto and others] Can generic persona vectors also reduce AI sycophancy?
-
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models
📌 PlanningBench: Controllable generation of planning data to improve LLM planning
-
RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution
📌 [Westlake University and others] RankE: End-to-end post-training for discrete T2I models, where decoder co-evolution breaks the alignment-fidelity trade-off
-
ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving
📌 ScenePilot: Generating solvable-yet-failed boundary scenarios to improve autonomous-driving safety testing
-
Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
📌 [HuggingFace Daily Papers] OGPSA: Mitigating the alignment tax with orthogonal gradient projection
-
software-mansion/argent
📌 Argent: An AI agent simulator
-
Stream3D: Sequential Multi-View 3D Generation via Evidential Memory
📌 [World Mind Lab/HKUST × MIT/Harvard] Stream3D: Turning a frozen single-view 3D generator into a consistent streaming model