RD-OS Dynamic Agent Scheduling

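Dynamic resource allocation, deep analysis, intelligent scheduling

“Not equal shares, but intelligent scheduling: valuable repos get more agents for deeper study.”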

Core Problem

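Problems with traditional static allocation

❌ Static Assignment (traditional approach)
├─ 400 repos, 100 agents → 0.25 agents per repo
├─ Equal time for every repo: 10 minutes of analysis each
├─ Problems:
│   ├─ An important repo (tidb) is treated the same as an unimportant one (an abandoned tool)
│   ├─ Resources cannot be added when a valuable repo is discovered
│   ├─ Losses cannot be cut when a repo turns out to be worthless
│   └─ Strategy cannot adapt to what is discovered
└─ Result: wasted resources, insufficient depth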

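Advantages of dynamic scheduling

✅ Dynamic Scheduling (RD-OS)
├─ Initial scan: quick pass over every repo (2 minutes/repo)
├─ Value assessment: score each repo against the metrics
├─ Dynamic allocation:
│   ├─ High-value repos → 5-10 agents for deep analysis
│   ├─ Medium-value repos → 1-2 agents for standard analysis
│   └─ Low-value repos → 0.5 agent each for quick archival
├─ Continuous adjustment:
│   ├─ New issue found → add agents
│   ├─ No value found → reduce or stop analysis
│   └─ Dependencies found → coordinate analysis across repos
└─ Result: focused resources, sufficient depth, high efficiency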

Value Scoring System

Repo value metrics

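# Repo value scoring model
from __future__ import annotations


class RepoValueScorer:
    """
    Scores a repo's value to decide how many agents it should receive.
    """

    def calculate_score(self, repo: Repo) -> float:
        score = 0.0

        # 1. Activity (0-25 points)
        #    - recent commit frequency
        #    - number of active contributors
        #    - recent PR/issue activity
        score += self._activity_score(repo)

        # 2. Impact (0-25 points)
        #    - number of repos that reference it
        #    - stars/forks
        #    - number of deployed instances
        score += self._impact_score(repo)

        # 3. Strategic importance (0-25 points)
        #    - core product? (tidb = 25 points)
        #    - platform component?
        #    - critical dependency?
        score += self._strategic_score(repo)

        # 4. Code quality (0-15 points)
        #    - test coverage
        #    - documentation completeness
        #    - coding standards
        score += self._quality_score(repo)

        # 5. Migration feasibility (0-10 points)
        #    - dependency complexity
        #    - team support
        #    - tech-stack fit
        score += self._feasibility_score(repo)

        return score  # 0-100 (25 + 25 + 25 + 15 + 10)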
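To make the five dimensions concrete, here is a minimal sketch of the per-dimension scores. The `RepoStats` fields, the formulas, and every divisor are illustrative assumptions (RD-OS does not specify them); the only parts taken from the model above are the five dimensions and their caps (25/25/25/15/10), which `_capped` enforces so the total stays within 0-100.

```python
from dataclasses import dataclass


@dataclass
class RepoStats:
    # Hypothetical raw signals; RD-OS does not define these field names.
    commits_last_90d: int
    active_contributors: int
    dependent_repos: int
    stars: int
    is_core_product: bool
    is_platform_component: bool
    test_coverage_pct: int  # 0-100
    has_docs: bool
    external_deps: int


def _capped(value: float, cap: float) -> float:
    """Clamp a sub-score into [0, cap] so no dimension exceeds its weight."""
    return max(0.0, min(float(cap), value))


def score_repo(r: RepoStats) -> dict:
    """Return the five sub-scores plus their total (0-100)."""
    parts = {
        # Activity (0-25): recent commits plus active contributors
        "activity": _capped(r.commits_last_90d / 10 + r.active_contributors, 25),
        # Impact (0-25): internal dependents plus popularity
        "impact": _capped(r.dependent_repos / 2 + r.stars / 1000, 25),
        # Strategic (0-25): a core product gets the full 25 (platform value assumed)
        "strategic": 25.0 if r.is_core_product else (15.0 if r.is_platform_component else 0.0),
        # Quality (0-15): coverage contributes up to 10, docs add 5
        "quality": _capped(r.test_coverage_pct / 10 + (5 if r.has_docs else 0), 15),
        # Feasibility (0-10): more external dependencies mean a harder migration
        "feasibility": _capped(10 - r.external_deps / 20, 10),
    }
    parts["total"] = sum(parts.values())
    return parts
```

With invented inputs (400 recent commits, 60 contributors, 80 dependents, 30,000 stars, core product, 70% coverage with docs, 40 external dependencies) the sketch yields 25 + 25 + 25 + 12 + 8 = 95. The caps matter more than the formulas: without them, one very active repo could outscore a strategically critical one on activity alone.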

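Scoring examples

Repo	Activity (0-25)	Impact (0-25)	Strategic (0-25)	Quality (0-15)	Feasibility (0-10)	Total	Tier
tidb	25	25	25	12	8	95	S
tiflow	20	18	20	10	7	75	A
tidb-operator	18	15	18	11	8	70	A
ossinsight	15	20	10	12	9	66	B
Abandoned tool A	2	1	2	5	8	18	D
Abandoned tool B	0	0	0	3	9	12	D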
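The totals and tiers in the table can be checked mechanically. This sketch re-adds each row and applies the tier thresholds used by the allocation algorithm (S at 85 or above, A from 70 up to 85, B from 50 up to 70, C/D below 50). That algorithm does not separate C from D, so the hypothetical helper `tier` reports both as C/D; the row keys for the abandoned tools are placeholder names.

```python
# Sub-scores copied from the table: activity, impact, strategic, quality, feasibility
TABLE = {
    "tidb": (25, 25, 25, 12, 8),
    "tiflow": (20, 18, 20, 10, 7),
    "tidb-operator": (18, 15, 18, 11, 8),
    "ossinsight": (15, 20, 10, 12, 9),
    "abandoned-tool-a": (2, 1, 2, 5, 8),
    "abandoned-tool-b": (0, 0, 0, 3, 9),
}


def tier(score: float) -> str:
    """Map a 0-100 score to a tier using the allocation algorithm's cut-offs."""
    if score >= 85:
        return "S"
    if score >= 70:
        return "A"
    if score >= 50:
        return "B"
    return "C/D"


for name, parts in TABLE.items():
    total = sum(parts)
    print(f"{name:18} {total:3} {tier(total)}")
```

Note that tidb-operator sits exactly on the A boundary (70), so a one-point change in any dimension would move it to B and halve its agent allocation.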

Agent Allocation Strategy

Three-tier analysis depth

┌─────────────────────────────────────────────────────────────────┐
│                  Three-Tier Analysis Depth                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Level 1: Deep Analysis (S/A-tier repos)                        │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Agents: 5-10 per repo                                   │   │
│  │  Time: 2-4 hours per repo                                │   │
│  │  Scope:                                                  │   │
│  │  - Full code analysis                                    │   │
│  │  - Dependency graph (detailed)                           │   │
│  │  - Test coverage analysis                                │   │
│  │  - Performance profiling                                 │   │
│  │  - Security audit                                        │   │
│  │  - Tech debt assessment                                  │   │
│  │  - Migration complexity analysis                         │   │
│  │  Output: 50-100 page report                              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Level 2: Standard Analysis (B-tier repos)                      │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Agents: 1-2 per repo                                    │   │
│  │  Time: 30-60 minutes per repo                            │   │
│  │  Scope:                                                  │   │
│  │  - Code structure overview                               │   │
│  │  - Dependency list                                       │   │
│  │  - Basic quality metrics                                 │   │
│  │  - Migration recommendation                              │   │
│  │  Output: 10-20 page report                               │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Level 3: Quick Scan (C/D-tier repos)                           │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Agents: 0.5 per repo (1 agent handles 2-3 repos)        │   │
│  │  Time: 10-15 minutes per repo                            │   │
│  │  Scope:                                                  │   │
│  │  - Basic metadata                                        │   │
│  │  - Last activity check                                   │   │
│  │  - Archive recommendation                                │   │
│  │  Output: 1-2 page summary                                │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
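To see how quickly these ranges consume an agent pool, here is a rough sketch that counts concurrent agent demand for a given mix of repos. The per-repo ranges come from the diagram; assuming every repo is analyzed at the same time, and rounding shared quick-scan agents up, are simplifying assumptions.

```python
import math
from typing import Tuple

# Agents-per-repo ranges transcribed from the three-tier diagram
DEEP_AGENTS = (5, 10)           # Level 1 (S/A tier)
STANDARD_AGENTS = (1, 2)        # Level 2 (B tier)
QUICK_REPOS_PER_AGENT = (3, 2)  # Level 3: one agent covers 3 repos (best) or 2 (worst)


def agent_demand(deep: int, standard: int, quick: int) -> Tuple[int, int]:
    """(min, max) agents needed if every repo is analyzed at the same time."""
    low = deep * DEEP_AGENTS[0] + standard * STANDARD_AGENTS[0]
    high = deep * DEEP_AGENTS[1] + standard * STANDARD_AGENTS[1]
    # Shared quick-scan agents are rounded up: a partial group still needs an agent
    low += math.ceil(quick / QUICK_REPOS_PER_AGENT[0])
    high += math.ceil(quick / QUICK_REPOS_PER_AGENT[1])
    return low, high
```

Plugging in the tier mix from the utilization snapshot later on this page (60 S/A repos, 150 B, 190 C) gives a demand of 514 to 995 agents. A pool much smaller than that cannot run everything at once, which is why the allocator needs a scale-down step.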

Agent Allocation Algorithm

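from __future__ import annotations

from typing import Dict, List


class DynamicAgentScheduler:
    """
    Allocates agent resources dynamically.
    """

    def __init__(self, scorer: RepoValueScorer, total_agents: int = 1000):
        self.scorer = scorer
        self.total_agents = total_agents
        self.available_agents = total_agents
        self.assignments: Dict[str, int] = {}

    def allocate(self, repos: List[Repo]) -> Dict[str, int]:
        """
        Decide how many agents each repo gets, based on its value score.
        """
        # 1. Score every repo
        scored_repos = [(repo, self.scorer.calculate_score(repo)) for repo in repos]

        # 2. Bucket into tiers
        s_tier = [r for r, s in scored_repos if s >= 85]  # S tier
        a_tier = [r for r, s in scored_repos if 70 <= s < 85]  # A tier
        b_tier = [r for r, s in scored_repos if 50 <= s < 70]  # B tier
        c_tier = [r for r, s in scored_repos if s < 50]  # C/D tier

        # 3. Assign agents
        allocation: Dict[str, int] = {}

        # S tier: 8 agents per repo
        for repo in s_tier:
            allocation[repo.id] = 8

        # A tier: 4 agents per repo
        for repo in a_tier:
            allocation[repo.id] = 4

        # B tier: 2 agents per repo
        for repo in b_tier:
            allocation[repo.id] = 2

        # C/D tier: 1 shared agent per 3 repos. The agent is booked on the first
        # repo of each group of three; the other two record 0 (shared agent).
        for i, repo in enumerate(c_tier):
            allocation[repo.id] = 1 if i % 3 == 0 else 0

        # 4. Check the total against the available agent pool
        total_needed = sum(allocation.values())
        if total_needed > self.available_agents:
            # Degrade: shrink the S/A-tier allocations first
            allocation = self._scale_down(allocation, self.available_agents)

        self.assignments = allocation
        return allocation

    def _scale_down(self, allocation: Dict[str, int], budget: int) -> Dict[str, int]:
        # One simple policy: repeatedly take an agent from the repo holding the
        # most, so S/A repos shrink first; never reduce a repo below 1 agent.
        scaled = dict(allocation)
        while sum(scaled.values()) > budget:
            repo_id = max(scaled, key=scaled.get)
            if scaled[repo_id] <= 1:
                break  # cannot shrink further without dropping repos entirely
            scaled[repo_id] -= 1
        return scaled

    def reallocate(self, new_info: Dict[str, float]):
        """
        Reallocate based on new information (dynamic adjustment).
        """
        # E.g. a repo turns out to be more important than expected:
        # increase its agents, drawing them from lower-priority repos
        pass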
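`reallocate` is left as a stub. One possible policy for the case its comment describes, written as a standalone function: raise one repo to a target agent count by taking agents from the lowest-scoring other repos first. The function name, its parameters, and the `floor` guard are assumptions for illustration, not part of RD-OS.

```python
from typing import Dict


def reallocate(allocation: Dict[str, int], scores: Dict[str, float],
               repo_id: str, target: int, floor: int = 1) -> Dict[str, int]:
    """Move agents so repo_id ends up with `target` agents (or as close as the
    donors allow). Donors are drained lowest score first, never below `floor`.
    Returns a new dict; the input allocation is left untouched."""
    result = dict(allocation)
    need = target - result.get(repo_id, 0)
    # Lowest-value repos give up agents first
    donors = sorted((r for r in result if r != repo_id), key=lambda r: scores[r])
    for donor in donors:
        if need <= 0:
            break
        take = min(result[donor] - floor, need)
        if take > 0:
            result[donor] -= take
            need -= take
    # If donors ran out, repo_id only gets what could be collected
    result[repo_id] = target - max(need, 0)
    return result
```

Because agents are moved rather than created, promoting a repo keeps the total allocation constant. Lowering a repo's target instead simply releases the difference, which the caller can return to the pool.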

Dynamic Reallocation Triggers

When reallocation is triggered

┌─────────────────────────────────────────────────────────────────┐
│              Dynamic Reallocation Triggers                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Value Discovery                                             │
│     ├─ Trigger: repo proves more valuable than expected         │
│     ├─ Action: add agents (1 → 5)                               │
│     └─ Example: "abandoned tool" has 50 dependent services      │
│                                                                 │
│  2. Dependency Discovery                                        │
│     ├─ Trigger: repo turns out to be a critical dependency      │
│     ├─ Action: add agents, coordinate dependency-chain analysis │
│     └─ Example: tidb depends on a "small tool"                  │
│                                                                 │
│  3. Issue Detection                                             │
│     ├─ Trigger: severe problem (vulnerability, design flaw)     │
│     ├─ Action: assign specialist agents to investigate          │
│     └─ Example: vulnerability found → assign a security agent   │
│                                                                 │
│  4. Blocker Resolution                                          │
│     ├─ Trigger: analysis blocked waiting on external input      │
│     ├─ Action: temporarily move agents to other repos           │
│     └─ Example: awaiting team sign-off, analyze others first    │
│                                                                 │
│  5. Milestone Completion                                        │
│     ├─ Trigger: a batch of repos finishes analysis              │
│     ├─ Action: release agents to the next batch                 │
│     └─ Example: P0 done, agents move to P1                      │
│                                                                 │
│  6. Human Intervention                                          │
│     ├─ Trigger: a human marks a repo as priority                │
│     ├─ Action: reassign agents immediately                      │
│     └─ Example: the CTO says "analyze this one first"           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
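The six triggers above can be modeled as an enum plus a function that returns the adjusted allocation for one repo. Only the value-discovery step (1 → 5) comes from the table; the other adjustments (one extra agent per new dependency or issue, keeping one agent on a blocked repo, releasing all agents at a milestone) are assumed policies, and returning freed agents to the pool is left to the caller. `Trigger` and `apply_trigger` are hypothetical names.

```python
from enum import Enum, auto
from typing import Dict, Optional


class Trigger(Enum):
    VALUE_DISCOVERY = auto()
    DEPENDENCY_DISCOVERY = auto()
    ISSUE_DETECTION = auto()
    BLOCKER = auto()
    MILESTONE_COMPLETE = auto()
    HUMAN_INTERVENTION = auto()


def apply_trigger(allocation: Dict[str, int], repo_id: str, trigger: Trigger,
                  requested: Optional[int] = None) -> Dict[str, int]:
    """Return a new allocation after one trigger fires for repo_id."""
    result = dict(allocation)
    current = result.get(repo_id, 0)
    if trigger is Trigger.VALUE_DISCOVERY:
        result[repo_id] = max(current, 5)      # e.g. 1 -> 5, as in the table
    elif trigger in (Trigger.DEPENDENCY_DISCOVERY, Trigger.ISSUE_DETECTION):
        result[repo_id] = current + 1          # add one specialist agent (assumed step)
    elif trigger is Trigger.BLOCKER:
        result[repo_id] = min(current, 1)      # keep one agent watching, lend the rest
    elif trigger is Trigger.MILESTONE_COMPLETE:
        result[repo_id] = 0                    # release everything to the next batch
    elif trigger is Trigger.HUMAN_INTERVENTION:
        if requested is None:
            raise ValueError("human intervention needs an explicit agent count")
        result[repo_id] = requested            # the human's number wins
    return result
```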

Deep Analysis Workflow

S-tier repo deep analysis workflow

┌─────────────────────────────────────────────────────────────────┐
│              Deep Analysis Workflow (S-Tier Repo)               │
│              Example: pingcap/tidb                              │
└─────────────────────────────────────────────────────────────────┘

Repo: tidb (Score: 95, S-Tier)
Agents Assigned: 8
Estimated Time: 4 hours

┌─────────────────────────────────────────────────────────────┐
│  Agent Team Structure                                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  lead-analyst (1)                                           │
│    ├─ Coordinates the team                                  │
│    ├─ Synthesizes findings                                  │
│    └─ Produces final report                                 │
│                                                             │
│  code-archaeologist (2)                                     │
│    ├─ Maps code structure                                   │
│    ├─ Identifies key components                             │
│    └─ Documents architecture                                │
│                                                             │
│  dependency-analyst (1)                                     │
│    ├─ Maps internal dependencies                            │
│    ├─ Maps external dependencies                            │
│    └─ Identifies circular deps                              │
│                                                             │
│  quality-auditor (1)                                        │
│    ├─ Analyzes test coverage                                │
│    ├─ Runs static analysis                                  │
│    └─ Identifies tech debt                                  │
│                                                             │
│  security-analyst (1)                                       │
│    ├─ Scans for vulnerabilities                             │
│    ├─ Reviews auth/security code                            │
│    └─ Checks compliance                                     │
│                                                             │
│  migration-planner (1)                                      │
│    ├─ Assesses migration complexity                         │
│    ├─ Identifies risks                                      │
│    └─ Creates migration plan                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│  Analysis Phases                                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Phase 1: Reconnaissance (30 min)                           │
│  ├─ Quick scan of repo structure                           │
│  ├─ Identify key directories                               │
│  └─ Create initial dependency graph                        │
│                                                             │
│  Phase 2: Deep Dive (2 hours)                               │
│  ├─ Each agent analyzes their specialty                    │
│  ├─ Continuous checkpointing                               │
│  └─ Cross-agent communication                              │
│                                                             │
│  Phase 3: Synthesis (1 hour)                                │
│  ├─ Lead analyst synthesizes findings                      │
│  ├─ Identifies cross-cutting concerns                      │
│  └─ Creates unified report                                 │
│                                                             │
│  Phase 4: Review (30 min)                                   │
│  ├─ Quality check                                          │
│  ├─ Validate findings                                      │
│  └─ Submit report + recommendations                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│  Output: Deep Analysis Report                               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Executive Summary (1 page)                              │
│     - Value score, recommendation                           │
│     - Key findings                                          │
│     - Migration priority                                    │
│                                                             │
│  2. Architecture Overview (5 pages)                         │
│     - Component diagram                                     │
│     - Data flow                                             │
│     - Key modules                                           │
│                                                             │
│  3. Dependency Analysis (10 pages)                          │
│     - Internal dependency graph                             │
│     - External dependencies                                 │
│     - Circular dependencies                                 │
│                                                             │
│  4. Quality Assessment (5 pages)                            │
│     - Test coverage                                         │
│     - Code quality metrics                                  │
│     - Tech debt inventory                                   │
│                                                             │
│  5. Security Audit (5 pages)                                │
│     - Vulnerability scan results                            │
│     - Security best practices                               │
│     - Compliance status                                     │
│                                                             │
│  6. Migration Plan (10 pages)                               │
│     - Migration strategy                                    │
│     - Risk assessment                                       │
│     - Effort estimation                                     │
│     - Recommended order                                     │
│                                                             │
│  Total: ~36 pages                                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
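The phase and report budgets in the workflow are simple sums, which a short sketch can lay out as a timeline. Durations and page counts are copied from the diagrams; `timeline` is a hypothetical helper that assumes the phases run strictly one after another.

```python
from datetime import timedelta

# Phase durations from the "Analysis Phases" box
PHASES = [
    ("Reconnaissance", timedelta(minutes=30)),
    ("Deep Dive", timedelta(hours=2)),
    ("Synthesis", timedelta(hours=1)),
    ("Review", timedelta(minutes=30)),
]

# Page budgets from the "Deep Analysis Report" box
REPORT_PAGES = {
    "Executive Summary": 1,
    "Architecture Overview": 5,
    "Dependency Analysis": 10,
    "Quality Assessment": 5,
    "Security Audit": 5,
    "Migration Plan": 10,
}


def timeline(phases):
    """Yield (name, start, end) offsets for phases that run back to back."""
    start = timedelta(0)
    for name, duration in phases:
        end = start + duration
        yield name, start, end
        start = end


schedule = list(timeline(PHASES))
for name, start, end in schedule:
    print(f"{name:15} {start} -> {end}")
print("report pages:", sum(REPORT_PAGES.values()))
```

The phases end at 0:30, 2:30, 3:30 and 4:00, matching the 4-hour estimate, and the six report sections add up to the ~36 pages listed.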

Agent Coordination Protocol

Multiple agents analyzing the same repo

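# Pseudo-code: Multi-agent coordination
# SharedContext, Repo, Agent and the helpers is_complete, synthesize,
# generate_report, needs_reallocation and reallocate are assumed to exist.
from __future__ import annotations

import asyncio
from contextlib import suppress
from typing import List


class DeepAnalysisTeam:
    """
    Multi-agent collaborative deep analysis.
    """

    def __init__(self, repo: Repo, agents: List[Agent]):
        self.repo = repo
        self.agents = agents
        self.shared_context = SharedContext()
        self.findings = []

    async def coordinate(self):
        # 1. Initialize the shared context
        self.shared_context.set('repo', self.repo)
        self.shared_context.set('phase', 'reconnaissance')

        # 2. Analyze in parallel (each agent covers a different aspect)
        tasks = [
            self.agents[0].analyze_architecture(self.shared_context),
            self.agents[1].analyze_dependencies(self.shared_context),
            self.agents[2].analyze_quality(self.shared_context),
            self.agents[3].analyze_security(self.shared_context),
            # ...
        ]

        # 3. Sync periodically (every 15 minutes) in the background
        sync_task = asyncio.create_task(self.periodic_sync())

        try:
            # 4. Wait for every analysis to finish
            results = await asyncio.gather(*tasks)
        finally:
            # Stop the background sync loop; otherwise it outlives the analysis
            sync_task.cancel()
            with suppress(asyncio.CancelledError):
                await sync_task

        # 5. Synthesize findings
        await self.synthesize(results)

        # 6. Generate the report
        report = await self.generate_report()

        return report

    async def periodic_sync(self):
        """Periodic sync so agents do not duplicate work."""
        while not self.is_complete():
            await asyncio.sleep(900)  # 15 minutes

            # Share findings
            for agent in self.agents:
                new_findings = agent.get_new_findings()
                if not new_findings:
                    continue
                self.shared_context.append('findings', new_findings)

                # Notify the other agents that need to know (not the author itself)
                for other_agent in self.agents:
                    if other_agent is not agent and other_agent.should_know(new_findings):
                        other_agent.notify(new_findings)

            # Check whether agents need to be reallocated
            if self.needs_reallocation():
                await self.reallocate()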
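Because the coordinator above depends on classes defined elsewhere, the following self-contained sketch exercises the same pattern with stand-in agents: analyses run under `asyncio.gather`, a background task shares findings on an interval, and the sync task is cancelled and awaited once the analyses finish. `FakeAgent`, `share`, and the millisecond-scale intervals are illustrative assumptions standing in for real agents and the 15-minute cadence.

```python
import asyncio
from contextlib import suppress
from typing import List, Tuple


class FakeAgent:
    """Hypothetical stand-in for an analysis agent."""

    def __init__(self, name: str, steps: int):
        self.name = name
        self.steps = steps
        self.inbox: List[str] = []      # findings forwarded by teammates
        self._pending: List[str] = []   # findings not yet shared

    async def analyze(self) -> str:
        for i in range(self.steps):
            await asyncio.sleep(0.01)   # simulate a unit of analysis work
            self._pending.append(f"{self.name}:finding-{i}")
        return f"{self.name} done"

    def get_new_findings(self) -> List[str]:
        new, self._pending = self._pending, []
        return new


def share(agents: List[FakeAgent], shared: List[str]) -> None:
    """Move each agent's new findings to the shared log and to every other agent."""
    for agent in agents:
        new = agent.get_new_findings()
        shared.extend(new)
        for other in agents:
            if other is not agent:
                other.inbox.extend(new)


async def periodic_sync(agents: List[FakeAgent], shared: List[str], interval: float) -> None:
    while True:  # runs until cancelled by coordinate()
        await asyncio.sleep(interval)
        share(agents, shared)


async def coordinate(agents: List[FakeAgent], interval: float = 0.015) -> Tuple[list, List[str]]:
    shared: List[str] = []
    sync_task = asyncio.create_task(periodic_sync(agents, shared, interval))
    try:
        results = await asyncio.gather(*(a.analyze() for a in agents))
    finally:
        sync_task.cancel()
        with suppress(asyncio.CancelledError):
            await sync_task
    share(agents, shared)  # final flush for findings made after the last tick
    return results, shared
```

Usage: `asyncio.run(coordinate([FakeAgent("arch", 3), FakeAgent("deps", 2)]))`. Each finding reaches the shared log exactly once, whether a sync tick or the final flush picks it up; the flush matters because the last findings usually land between ticks.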

Real-World Example

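Scenario: uncovering a hidden gem

Initial state:
├─ Repo: "old-tool" (looks like an abandoned tool)
├─ Initial Score: 25 (C tier)
├─ Agent Allocation: 0.5 (quick scan)
└─ Expected: done in 15 minutes, likely archived

Quick-scan findings:
├─ 50 internal services depend on it
├─ It handles critical data transformations
├─ There is no alternative
└─ The team says: "It's important, but we have no time to maintain it"

Re-evaluation triggered:
├─ New Score: 78 (A tier) ⬆️
├─ New Agent Allocation: 4 agents ⬆️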
└─ New Depth: Deep Analysis ⬆️

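Deep analysis results:
├─ 3 serious bugs found
├─ 5 performance optimization opportunities found
├─ Modernization plan created
└─ Recommendation: keep and refactor (do not archive)

Impact:
├─ Avoided archiving a critical tool
├─ Prevented outages across 50 services
├─ 40% performance improvement
└─ Value: far exceeds the cost of the analysis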

Resource Optimization

Agent utilization monitoring

import logging

logger = logging.getLogger(__name__)


class AgentUtilizationMonitor:
    """
    Monitors agent utilization to optimize allocation.
    """

    def monitor(self):
        # Sample snapshot; a live monitor would read these from the agent pool
        metrics = {
            'total_agents': 1000,
            'active': 850,
            'idle': 100,
            'blocked': 50,

            'utilization_rate': 0.85,  # active / total = 85%
            'avg_task_duration': '45min',
            'tasks_completed_today': 342,

            'by_tier': {
                'S-tier': {'agents': 80, 'repos': 10, 'utilization': 0.95},
                'A-tier': {'agents': 200, 'repos': 50, 'utilization': 0.88},
                'B-tier': {'agents': 300, 'repos': 150, 'utilization': 0.82},
                'C-tier': {'agents': 100, 'repos': 190, 'utilization': 0.75},
            }
        }

        # Alert: utilization too low
        if metrics['utilization_rate'] < 0.60:
            logger.warning("Low agent utilization - consider increasing batch size")

        # Alert: too many agents blocked
        if metrics['blocked'] > 100:
            logger.warning("Many agents blocked - investigate blockers")

        # Suggestion: reallocate
        if metrics['by_tier']['C-tier']['utilization'] < 0.50:
            logger.info("Suggestion: reallocate C-tier agents to B-tier")

        return metrics
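The monitor above hard-codes its snapshot. Here is a minimal sketch of deriving the headline numbers from per-agent states instead, assuming each agent reports one of the strings `active`, `idle` or `blocked` (an assumption; the document does not define a state model). The thresholds default to the monitor's 60% utilization and 100-blocked-agent limits, and `utilization_report` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List


def utilization_report(states: Iterable[str], low_util: float = 0.60,
                       max_blocked: int = 100) -> dict:
    """Summarize agent states and collect the same warnings as the monitor."""
    counts = Counter(states)
    total = sum(counts.values())
    rate = counts["active"] / total if total else 0.0
    warnings: List[str] = []
    if rate < low_util:
        warnings.append("Low agent utilization - consider increasing batch size")
    if counts["blocked"] > max_blocked:
        warnings.append("Many agents blocked - investigate blockers")
    return {
        "total_agents": total,
        "active": counts["active"],
        "idle": counts["idle"],
        "blocked": counts["blocked"],
        "utilization_rate": rate,
        "warnings": warnings,
    }
```

Feeding it 850 active, 100 idle and 50 blocked states reproduces the snapshot's 85% utilization with no warnings. Computing the rate rather than storing it keeps `utilization_rate` from drifting out of sync with the counts.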

Human Override

Human intervention interface

# .rd-os/config/human-override.yaml

# Humans can override the AI's allocation decisions
overrides:
  # Analyze these repos first
  priority_repos:
    - repo: pingcap/tidb
      reason: "CTO request - strategic importance"
      agents: 10  # overrides the AI's suggested 8
      deadline: 2026-03-01

    - repo: pingcap/new-feature
      reason: "Urgent customer request"
      agents: 5
      deadline: 2026-02-28

  # Skip these repos
  skip_repos:
    - repo: pingcap/old-experiment
      reason: "Confirmed obsolete by team"
      action: archive

  # Adjust analysis depth
  depth_overrides:
    - repo: pingcap/ossinsight
      depth: deep  # overrides the AI's suggested standard depth
      reason: "May become core product"
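Here is a sketch of how the override file could be applied on top of the scheduler's output. It assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that `overrides` is the mapping under the top-level `overrides:` key. The function names are hypothetical, and letting `skip_repos` win over `priority_repos` when a repo appears in both is a design choice, not something the config specifies.

```python
from typing import Dict


def apply_overrides(allocation: Dict[str, int], overrides: dict) -> Dict[str, int]:
    """Return a copy of the AI allocation with human overrides applied."""
    result = dict(allocation)
    for entry in overrides.get("priority_repos", []):
        result[entry["repo"]] = entry["agents"]  # the human's agent count wins
    for entry in overrides.get("skip_repos", []):
        result[entry["repo"]] = 0  # skipped repos get no agents (applied last)
    return result


def effective_depth(repo: str, ai_depth: str, overrides: dict) -> str:
    """A matching depth_overrides entry replaces the AI's suggested depth."""
    for entry in overrides.get("depth_overrides", []):
        if entry["repo"] == repo:
            return entry["depth"]
    return ai_depth
```

Applied to the example file, tidb goes from the AI's 8 agents to 10, old-experiment drops to 0, and ossinsight is analyzed at `deep` rather than `standard`.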

Metrics & KPIs

Evaluating scheduling effectiveness

Metric	Target	Measurement
Agent Utilization	>80%	Active agents / Total agents
Value Discovery Rate	>10%	Repos upgraded after initial scan
Reallocation Efficiency	<5 min	Time to reallocate agents
Deep Analysis ROI	>5x	Value found / Analysis cost
Human Satisfaction	>90%	Human approval of allocations
Completion Rate	>95%	Repos analyzed / Total repos
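The targets translate directly into pass/fail checks. In this sketch the comparison operators and thresholds are transcribed from the table, the KPI keys are hypothetical names, and each measured value is assumed to be precomputed as the ratio in the Measurement column (for example, repos analyzed divided by total repos).

```python
# (comparison, threshold) per KPI, transcribed from the table
TARGETS = {
    "agent_utilization": (">", 0.80),
    "value_discovery_rate": (">", 0.10),
    "reallocation_minutes": ("<", 5.0),
    "deep_analysis_roi": (">", 5.0),
    "human_satisfaction": (">", 0.90),
    "completion_rate": (">", 0.95),
}


def kpi_status(measured: dict) -> dict:
    """Return {kpi: met?} for every KPI that has a measurement."""
    status = {}
    for name, value in measured.items():
        op, threshold = TARGETS[name]
        status[name] = value > threshold if op == ">" else value < threshold
    return status
```

The comparisons are strict, so a completion rate of exactly 95% does not meet the ">95%" target; use `>=` instead if the table's targets are meant to be inclusive.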

Implementation Checklist

Phase 1: Basic Scoring

Implement repo value scorer
Define scoring criteria
Test on 10-repo sample
Tune scoring weights

Phase 2: Dynamic Allocation

Implement allocation algorithm
Create agent pool manager
Add reallocation triggers
Test dynamic scaling

Phase 3: Coordination

Implement multi-agent coordination
Create shared context system
Add periodic sync mechanism
Test team analysis

Phase 4: Optimization

Implement utilization monitoring
Add human override interface
Create optimization recommendations
Continuous tuning

Conclusion

Dynamic Scheduling vs Static Allocation:

Aspect	Static	Dynamic
Agent Distribution	Equal	Based on value
Response to Discovery	None	Immediate reallocation
Resource Efficiency	50-60%	80-90%
Depth for Critical Repos	Same as others	5-10x deeper
Adaptability	None	High

Result:

High-value repos get deep analysis
Low-value repos get quick disposition
Resources flow to where they matter most
System learns and adapts over time

This is how you analyze 400 repos intelligently, not uniformly.

“Not all repos are created equal. Treat them accordingly.”


Agentic Engineering Documentation