RD-OS Dynamic Agent Scheduling

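Dynamic resource allocation, deep analysis, intelligent scheduling

“Don't split agents evenly; schedule them intelligently: high-value repos get more agents for deeper study.”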


Core Problem

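Problems with traditional static assignment

❌ Static Assignment (traditional approach)
├─ 400 repos, 100 agents → 0.25 agents per repo
├─ Even time split: 10 minutes of analysis per repo
├─ Problems:
│   ├─ An important repo (tidb) and an unimportant one (an abandoned tool) are treated the same
│   ├─ When a valuable repo is discovered, resources cannot be added dynamically
│   ├─ When a worthless repo is discovered, losses cannot be cut promptly
│   └─ The strategy cannot adapt to what is found
└─ Result: wasted resources and shallow analysis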

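Advantages of dynamic scheduling

✅ Dynamic Scheduling (RD-OS)
├─ Initial scan: quick scan of every repo (2 minutes per repo)
├─ Value assessment: score each repo against the metrics
├─ Dynamic allocation:
│   ├─ High-value repos → 5-10 agents for deep analysis
│   ├─ Medium-value repos → 1-2 agents for standard analysis
│   └─ Low-value repos → 0.5 agents for quick archiving
├─ Continuous adjustment:
│   ├─ New issue found → add agents
│   ├─ No value found → reduce or stop analysis
│   └─ Dependency found → coordinate analysis
└─ Result: focused resources, sufficient depth, high efficiency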

Value Scoring System

Repo value assessment metrics

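# Repo value scoring model
class RepoValueScorer:
    """
    Scores a repo's value to decide how many agents it should receive.

    Repo and the five _*_score helpers are not shown in this sketch.
    """

    def calculate_score(self, repo: "Repo") -> float:
        score = 0.0

        # 1. Activity (0-25 points)
        # - Recent commit frequency
        # - Number of active contributors
        # - Recent PR/issue activity
        score += self._activity_score(repo)

        # 2. Impact (0-25 points)
        # - Number of references from other repos
        # - Stars/forks
        # - Number of deployed instances
        score += self._impact_score(repo)

        # 3. Strategic importance (0-25 points)
        # - Is it a core product? (tidb = 25 points)
        # - Is it a platform component?
        # - Is it a critical dependency?
        score += self._strategic_score(repo)

        # 4. Code quality (0-15 points)
        # - Test coverage
        # - Documentation completeness
        # - Adherence to coding standards
        score += self._quality_score(repo)

        # 5. Migration feasibility (0-10 points)
        # - Dependency complexity
        # - Level of team support
        # - Tech stack fit
        score += self._feasibility_score(repo)

        return score  # 0-100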

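Scoring examples

| Repo             | Activity | Impact | Strategic | Quality | Feasibility | Total | Tier |
|------------------|----------|--------|-----------|---------|-------------|-------|------|
| tidb             | 25       | 25     | 25        | 12      | 8           | 95    | S    |
| tiflow           | 20       | 18     | 20        | 10      | 7           | 75    | A    |
| tidb-operator    | 18       | 15     | 18        | 11      | 8           | 70    | A    |
| ossinsight       | 15       | 20     | 10        | 12      | 9           | 66    | B    |
| Abandoned tool A | 2        | 1      | 2         | 5       | 8           | 18    | D    |
| Abandoned tool B | 0        | 0      | 0         | 3       | 9           | 12    | D    |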
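
The totals and tiers in the table can be reproduced with a short sketch. The S, A, and B thresholds are taken from the allocation algorithm later in this document. The C/D boundary is not specified anywhere, so the `c_floor` value of 25 is an assumption chosen only because it agrees with the examples (25 is called C-tier, 18 and 12 are D-tier). `ScoreBreakdown` and `tier_for` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreBreakdown:
    """The five component scores of one repo."""
    activity: int      # 0-25
    impact: int        # 0-25
    strategic: int     # 0-25
    quality: int       # 0-15
    feasibility: int   # 0-10

    @property
    def total(self) -> int:
        return (self.activity + self.impact + self.strategic
                + self.quality + self.feasibility)


def tier_for(score: float, c_floor: float = 25) -> str:
    """Map a 0-100 score to a tier letter.

    S/A/B thresholds follow the allocation algorithm; c_floor is an
    assumed C/D boundary, not a documented one.
    """
    if score >= 85:
        return "S"
    if score >= 70:
        return "A"
    if score >= 50:
        return "B"
    return "C" if score >= c_floor else "D"


tidb = ScoreBreakdown(25, 25, 25, 12, 8)
ossinsight = ScoreBreakdown(15, 20, 10, 12, 9)
print(tidb.total, tier_for(tidb.total))              # 95 S
print(ossinsight.total, tier_for(ossinsight.total))  # 66 B
```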

Agent Allocation Strategy

Three levels of analysis depth

┌─────────────────────────────────────────────────────────────────┐
│                  Three-Tier Analysis Depth                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Level 1: Deep Analysis (S/A-tier repos)                        │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Agents: 5-10 per repo                                   │   │
│  │  Time: 2-4 hours per repo                                │   │
│  │  Scope:                                                  │   │
│  │  - Full code analysis                                    │   │
│  │  - Dependency graph (detailed)                           │   │
│  │  - Test coverage analysis                                │   │
│  │  - Performance profiling                                 │   │
│  │  - Security audit                                        │   │
│  │  - Tech debt assessment                                  │   │
│  │  - Migration complexity analysis                         │   │
│  │  Output: 50-100 page report                              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Level 2: Standard Analysis (B-tier repos)                      │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Agents: 1-2 per repo                                    │   │
│  │  Time: 30-60 minutes per repo                            │   │
│  │  Scope:                                                  │   │
│  │  - Code structure overview                               │   │
│  │  - Dependency list                                       │   │
│  │  - Basic quality metrics                                 │   │
│  │  - Migration recommendation                              │   │
│  │  Output: 10-20 page report                               │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Level 3: Quick Scan (C/D-tier repos)                           │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  Agents: 0.5 per repo (1 agent handles 2-3 repos)        │   │
│  │  Time: 10-15 minutes per repo                            │   │
│  │  Scope:                                                  │   │
│  │  - Basic metadata                                        │   │
│  │  - Last activity check                                   │   │
│  │  - Archive recommendation                                │   │
│  │  Output: 1-2 page summary                                │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
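
To get a feel for what the three levels cost in aggregate, the ranges in the diagram can be turned into a rough agent-hour estimate. This is a back-of-the-envelope sketch, not part of RD-OS: `DepthProfile`, `PROFILES`, and `workload` are hypothetical names, and the repo mix (60 deep, 150 standard, 190 quick) is borrowed from the sample per-tier counts in the utilization monitor later in this document.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DepthProfile:
    name: str
    agents: Tuple[float, float]    # (min, max) agents per repo
    minutes: Tuple[float, float]   # (min, max) minutes per repo

    def agent_hours(self) -> Tuple[float, float]:
        """Lower and upper bound on agent-hours for a single repo."""
        low = self.agents[0] * self.minutes[0] / 60
        high = self.agents[1] * self.minutes[1] / 60
        return low, high


# Ranges copied from the three-tier diagram.
PROFILES = {
    "deep": DepthProfile("Deep Analysis", (5, 10), (120, 240)),
    "standard": DepthProfile("Standard Analysis", (1, 2), (30, 60)),
    "quick": DepthProfile("Quick Scan", (0.5, 0.5), (10, 15)),
}


def workload(repo_counts: Dict[str, int]) -> Tuple[float, float]:
    """Sum the per-repo bounds over a mix of repos per depth level."""
    low = high = 0.0
    for level, count in repo_counts.items():
        lo, hi = PROFILES[level].agent_hours()
        low += lo * count
        high += hi * count
    return low, high


low, high = workload({"deep": 60, "standard": 150, "quick": 190})
print(f"{low:.0f}-{high:.0f} agent-hours")  # 691-2724 agent-hours
```

Under these assumptions the deep tier dominates the budget, which is the point of reserving it for S/A-tier repos.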

Agent Allocation Algorithm

from typing import Dict, List, Optional


class DynamicAgentScheduler:
    """
    Dynamically allocates agents to repos.
    """

    def __init__(self, total_agents: int = 1000,
                 scorer: Optional["RepoValueScorer"] = None):
        self.total_agents = total_agents
        self.available_agents = total_agents
        # The scorer was an undefined global in the first draft; keep it on the instance.
        self.scorer = scorer or RepoValueScorer()
        self.assignments = {}

    def allocate(self, repos: List["Repo"]) -> Dict[str, int]:
        """
        Assign an agent count to each repo based on its value score.
        """
        # 1. Score every repo
        scored_repos = [(repo, self.scorer.calculate_score(repo)) for repo in repos]

        # 2. Bucket into tiers
        s_tier = [r for r, s in scored_repos if s >= 85]  # S-tier
        a_tier = [r for r, s in scored_repos if 70 <= s < 85]  # A-tier
        b_tier = [r for r, s in scored_repos if 50 <= s < 70]  # B-tier
        c_tier = [r for r, s in scored_repos if s < 50]  # C/D-tier

        # 3. Assign agents
        allocation = {}

        # S-tier: 8 agents per repo
        for repo in s_tier:
            allocation[repo.id] = 8

        # A-tier: 4 agents per repo
        for repo in a_tier:
            allocation[repo.id] = 4

        # B-tier: 2 agents per repo
        for repo in b_tier:
            allocation[repo.id] = 2

        # C/D-tier: 1 agent per 3 repos. The agent is booked against the
        # first repo of each group of three; the other two share it.
        for i, repo in enumerate(c_tier):
            allocation[repo.id] = 1 if i % 3 == 0 else 0

        # 4. Check against the available agent pool
        total_needed = sum(allocation.values())
        if total_needed > self.available_agents:
            # Degrade gracefully: reduce the agent count for S/A-tier repos.
            # _scale_down is not defined in this sketch.
            allocation = self._scale_down(allocation, self.available_agents)

        return allocation

    def reallocate(self, new_info: Dict[str, float]):
        """
        Reallocate based on new information (dynamic adjustment).
        """
        # Example: a repo turns out to be more important than expected.
        # Give it more agents by pulling them from lower-priority repos.
        pass
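
The algorithm calls `_scale_down` without defining it. One possible reading, shown here as a standalone function, is to shrink every allocation proportionally, keep at least one agent wherever one was assigned, and then trim the largest allocations until the total fits. That policy is an assumption, not the documented RD-OS behavior, and `scale_down` is a hypothetical name.

```python
from typing import Dict


def scale_down(allocation: Dict[str, int], budget: int) -> Dict[str, int]:
    """Shrink an allocation so that it uses at most `budget` agents.

    Assumed policy: proportional scaling with a one-agent floor, then
    trimming from the largest allocation downward.
    """
    total = sum(allocation.values())
    if total <= budget:
        return dict(allocation)

    ratio = budget / total
    scaled = {
        repo_id: max(1, int(count * ratio)) if count > 0 else 0
        for repo_id, count in allocation.items()
    }

    # The one-agent floor can push the total back over budget. Trim the
    # largest allocation one agent at a time; a repo that reaches zero
    # simply waits until agents free up.
    while sum(scaled.values()) > budget:
        largest = max(scaled, key=scaled.get)
        scaled[largest] -= 1
    return scaled


requested = {"tidb": 8, "tiflow": 4, "ossinsight": 2, "old-tool": 1}
fitted = scale_down(requested, budget=10)
print(fitted)  # {'tidb': 5, 'tiflow': 2, 'ossinsight': 1, 'old-tool': 1}
```

Because the trimming step starts with the largest allocation, S/A-tier repos absorb most of the cut, which matches the comment in the algorithm about reducing S/A-tier agent counts first.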

Dynamic Reallocation Triggers

When reallocation is triggered

┌─────────────────────────────────────────────────────────────────┐
│              Dynamic Reallocation Triggers                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Value Discovery                                             │
│     ├─ Trigger: initial scan finds more value than expected     │
│     ├─ Action: add agents (1 → 5)                               │
│     └─ Example: an "abandoned tool" has 50 dependent services   │
│                                                                 │
│  2. Dependency Discovery                                        │
│     ├─ Trigger: repo is found to be a critical dependency       │
│     ├─ Action: add agents; coordinate dependency analysis       │
│     └─ Example: tidb is found to depend on a "small tool"       │
│                                                                 │
│  3. Issue Detection                                             │
│     ├─ Trigger: major issue (vulnerability, architecture flaw)  │
│     ├─ Action: add specialist agents to investigate in depth    │
│     └─ Example: vulnerability found; assign a security expert   │
│                                                                 │
│  4. Blocker Resolution                                          │
│     ├─ Trigger: repo analysis is blocked on external input      │
│     ├─ Action: temporarily move agents to other repos           │
│     └─ Example: awaiting team reply; analyze other repos first  │
│                                                                 │
│  5. Milestone Completion                                        │
│     ├─ Trigger: a batch of repos finishes analysis              │
│     ├─ Action: release agents to the next batch                 │
│     └─ Example: P0 done, agents move to P1                      │
│                                                                 │
│  6. Human Intervention                                          │
│     ├─ Trigger: a human prioritizes a specific repo             │
│     ├─ Action: reassign agents immediately                      │
│     └─ Example: the CTO says "analyze this one first"           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
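
The six triggers map naturally onto an enum plus a small policy function. The sketch below is one way to express them; `Trigger`, `ReallocationEvent`, and `target_agents` are hypothetical names. Only the "1 → 5" jump for value discovery comes from the diagram, and every other number is an illustrative assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Trigger(Enum):
    VALUE_DISCOVERY = auto()
    DEPENDENCY_DISCOVERY = auto()
    ISSUE_DETECTION = auto()
    BLOCKER = auto()
    MILESTONE_COMPLETION = auto()
    HUMAN_INTERVENTION = auto()


@dataclass
class ReallocationEvent:
    trigger: Trigger
    repo_id: str
    requested_agents: int = 0  # only read for HUMAN_INTERVENTION


def target_agents(event: ReallocationEvent, current: int) -> int:
    """Return the new agent count for the repo named in the event."""
    if event.trigger is Trigger.VALUE_DISCOVERY:
        return max(current, 5)         # the diagram's "1 -> 5" example
    if event.trigger in (Trigger.DEPENDENCY_DISCOVERY, Trigger.ISSUE_DETECTION):
        return current + 1             # assumed: add one specialist agent
    if event.trigger is Trigger.BLOCKER:
        return min(current, 1)         # assumed: park one agent on the repo
    if event.trigger is Trigger.MILESTONE_COMPLETION:
        return 0                       # release everything for the next batch
    if event.trigger is Trigger.HUMAN_INTERVENTION:
        return event.requested_agents  # the human's number wins
    raise ValueError(f"unhandled trigger: {event.trigger}")


event = ReallocationEvent(Trigger.VALUE_DISCOVERY, "old-tool")
print(target_agents(event, current=1))  # 5
```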

Deep Analysis Workflow

Deep analysis workflow for S-tier repos

┌─────────────────────────────────────────────────────────────────┐
│              Deep Analysis Workflow (S-Tier Repo)               │
│              Example: pingcap/tidb                              │
└─────────────────────────────────────────────────────────────────┘

Repo: tidb (Score: 95, S-Tier)
Agents Assigned: 8
Estimated Time: 4 hours

┌─────────────────────────────────────────────────────────────┐
│  Agent Team Structure                                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  lead-analyst (1)                                           │
│    ├─ Coordinates the team                                  │
│    ├─ Synthesizes findings                                  │
│    └─ Produces final report                                 │
│                                                             │
│  code-archaeologist (2)                                     │
│    ├─ Maps code structure                                   │
│    ├─ Identifies key components                             │
│    └─ Documents architecture                                │
│                                                             │
│  dependency-analyst (1)                                     │
│    ├─ Maps internal dependencies                            │
│    ├─ Maps external dependencies                            │
│    └─ Identifies circular deps                              │
│                                                             │
│  quality-auditor (1)                                        │
│    ├─ Analyzes test coverage                                │
│    ├─ Runs static analysis                                  │
│    └─ Identifies tech debt                                  │
│                                                             │
│  security-analyst (1)                                       │
│    ├─ Scans for vulnerabilities                             │
│    ├─ Reviews auth/security code                            │
│    └─ Checks compliance                                     │
│                                                             │
│  migration-planner (1)                                      │
│    ├─ Assesses migration complexity                         │
│    ├─ Identifies risks                                      │
│    └─ Creates migration plan                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│  Analysis Phases                                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Phase 1: Reconnaissance (30 min)                           │
│  ├─ Quick scan of repo structure                           │
│  ├─ Identify key directories                               │
│  └─ Create initial dependency graph                        │
│                                                             │
│  Phase 2: Deep Dive (2 hours)                               │
│  ├─ Each agent analyzes their specialty                    │
│  ├─ Continuous checkpointing                               │
│  └─ Cross-agent communication                              │
│                                                             │
│  Phase 3: Synthesis (1 hour)                                │
│  ├─ Lead analyst synthesizes findings                      │
│  ├─ Identifies cross-cutting concerns                      │
│  └─ Creates unified report                                 │
│                                                             │
│  Phase 4: Review (30 min)                                   │
│  ├─ Quality check                                          │
│  ├─ Validate findings                                      │
│  └─ Submit report + recommendations                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────────────────────────────┐
│  Output: Deep Analysis Report                               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Executive Summary (1 page)                              │
│     - Value score, recommendation                           │
│     - Key findings                                          │
│     - Migration priority                                    │
│                                                             │
│  2. Architecture Overview (5 pages)                         │
│     - Component diagram                                     │
│     - Data flow                                             │
│     - Key modules                                           │
│                                                             │
│  3. Dependency Analysis (10 pages)                          │
│     - Internal dependency graph                             │
│     - External dependencies                                 │
│     - Circular dependencies                                 │
│                                                             │
│  4. Quality Assessment (5 pages)                            │
│     - Test coverage                                         │
│     - Code quality metrics                                  │
│     - Tech debt inventory                                   │
│                                                             │
│  5. Security Audit (5 pages)                                │
│     - Vulnerability scan results                            │
│     - Security best practices                               │
│     - Compliance status                                     │
│                                                             │
│  6. Migration Plan (10 pages)                               │
│     - Migration strategy                                    │
│     - Risk assessment                                       │
│     - Effort estimation                                     │
│     - Recommended order                                     │
│                                                             │
│  Total: ~36 pages                                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Agent Coordination Protocol

Multiple agents collaborating on the same repo

# Pseudo-code: multi-agent coordination.
# Repo, Agent, SharedContext, and the helpers is_complete(), synthesize(),
# generate_report(), needs_reallocation(), and reallocate() are not defined here.

import asyncio
from typing import List


class DeepAnalysisTeam:
    """
    Multiple agents collaborating on a deep analysis of one repo.
    """

    def __init__(self, repo: "Repo", agents: List["Agent"]):
        self.repo = repo
        self.agents = agents
        self.shared_context = SharedContext()
        self.findings = []

    async def coordinate(self):
        # 1. Initialize the shared context
        self.shared_context.set('repo', self.repo)
        self.shared_context.set('phase', 'reconnaissance')

        # 2. Parallel analysis (each agent covers a different aspect)
        tasks = [
            self.agents[0].analyze_architecture(self.shared_context),
            self.agents[1].analyze_dependencies(self.shared_context),
            self.agents[2].analyze_quality(self.shared_context),
            self.agents[3].analyze_security(self.shared_context),
            # ...
        ]

        # 3. Periodic sync (every 15 minutes), running in the background
        sync_task = asyncio.create_task(self.periodic_sync())

        # 4. Wait for all analyses to finish, then stop the sync loop
        #    so it does not outlive the analysis
        try:
            results = await asyncio.gather(*tasks)
        finally:
            sync_task.cancel()

        # 5. Synthesize findings
        await self.synthesize(results)

        # 6. Generate the report
        report = await self.generate_report()

        return report

    async def periodic_sync(self):
        """Sync periodically to avoid duplicated work."""
        while not self.is_complete():
            await asyncio.sleep(900)  # 15 minutes

            # Share findings
            for agent in self.agents:
                new_findings = agent.get_new_findings()
                self.shared_context.append('findings', new_findings)

                # Notify other agents that care about these findings
                for other_agent in self.agents:
                    if other_agent is not agent and other_agent.should_know(new_findings):
                        other_agent.notify(new_findings)

            # Check whether agents need to be reallocated
            if self.needs_reallocation():
                await self.reallocate()
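
The pattern in `coordinate`, a background sync loop running next to a set of parallel analyses, can be tried in isolation with the standard library alone. The sketch below is a toy: `analyze`, `periodic_sync`, and `run_team` are stand-ins that sleep for a few milliseconds instead of doing real work, and the intervals are shortened from 15 minutes so the example finishes quickly.

```python
import asyncio
from typing import List, Tuple


async def analyze(aspect: str, delay: float, findings: List[str]) -> str:
    """Stand-in for one agent's analysis; sleeps instead of working."""
    await asyncio.sleep(delay)
    findings.append(f"{aspect}: done")
    return aspect


async def periodic_sync(interval: float, findings: List[str],
                        sync_log: List[int]) -> None:
    """Record the number of shared findings at every sync point."""
    while True:
        await asyncio.sleep(interval)
        sync_log.append(len(findings))


async def run_team() -> Tuple[List[str], List[str], List[int]]:
    findings: List[str] = []
    sync_log: List[int] = []
    sync_task = asyncio.create_task(periodic_sync(0.01, findings, sync_log))
    try:
        results = await asyncio.gather(
            analyze("architecture", 0.03, findings),
            analyze("dependencies", 0.05, findings),
        )
    finally:
        # Stop the background loop once the analyses are done, and wait
        # for the cancellation to be delivered.
        sync_task.cancel()
        try:
            await sync_task
        except asyncio.CancelledError:
            pass
    return results, findings, sync_log


results, findings, sync_log = asyncio.run(run_team())
print(results)  # ['architecture', 'dependencies']
```

Cancelling the sync task in a `finally` block means the loop stops even when one of the analyses raises, which keeps a failed run from leaving a stray background task behind.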

Real-World Example

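Scenario: discovering a hidden gem

Initial state:
├─ Repo: "old-tool" (looks like an abandoned tool)
├─ Initial Score: 25 (C-tier)
├─ Agent Allocation: 0.5 (quick scan)
└─ Expected: done in 15 minutes, probably archived

Quick scan findings:
├─ 50 internal services depend on it
├─ It handles critical data transformations
├─ No replacement exists
└─ The team says "this is important, but we have no time to maintain it"

Re-evaluation triggered:
├─ New Score: 78 (A-tier) ⬆️
├─ New Agent Allocation: 4 agents ⬆️
└─ New Depth: Deep Analysis ⬆️

Deep analysis results:
├─ 3 critical bugs found
├─ 5 performance optimization opportunities found
├─ Modernization plan created
└─ Recommendation: keep and refactor (do not archive)

Impact:
├─ Avoided archiving a critical tool
├─ Prevented outages across 50 services
├─ Improved performance by 40%
└─ Value: far exceeds the cost of the analysis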
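
The promotion in this scenario is what the scheduler's `reallocate` stub would have to perform. A minimal sketch follows, assuming agents are taken from the free pool first and then from the lowest-scored repos that rank below the promoted one. `agents_for_score` and `promote` are hypothetical names, the per-tier counts (8/4/2) come from the allocation algorithm, and the donor policy is an assumption.

```python
from typing import Dict


def agents_for_score(score: float) -> int:
    """Per-repo agent counts from the allocation algorithm."""
    if score >= 85:
        return 8
    if score >= 70:
        return 4
    if score >= 50:
        return 2
    return 0  # C/D-tier repos are covered by a shared quick-scan agent


def promote(repo_id: str, new_score: float, scores: Dict[str, float],
            allocation: Dict[str, int], free_agents: int) -> int:
    """Raise repo_id to the allocation its new score calls for.

    Mutates `scores` and `allocation`; returns the free agents left.
    """
    scores[repo_id] = new_score
    target = agents_for_score(new_score)
    needed = target - allocation.get(repo_id, 0)
    if needed <= 0:
        return free_agents

    # Use idle agents before disturbing any running analysis.
    from_pool = min(needed, free_agents)
    free_agents -= from_pool
    needed -= from_pool

    # Then borrow from lower-scored repos, lowest score first.
    donors = sorted(
        (r for r, n in allocation.items()
         if r != repo_id and n > 0 and scores.get(r, 0.0) < new_score),
        key=lambda r: scores.get(r, 0.0),
    )
    for donor in donors:
        if needed == 0:
            break
        taken = min(needed, allocation[donor])
        allocation[donor] -= taken
        needed -= taken

    allocation[repo_id] = target - needed
    return free_agents


scores = {"tidb": 95, "ossinsight": 66, "old-tool": 25}
allocation = {"tidb": 8, "ossinsight": 2, "old-tool": 0}
left = promote("old-tool", 78, scores, allocation, free_agents=3)
print(allocation, left)
# {'tidb': 8, 'ossinsight': 1, 'old-tool': 4} 0
```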

Resource Optimization

Agent utilization monitoring

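import logging

logger = logging.getLogger("rd_os.utilization")


def alert(message: str) -> None:
    """Placeholder alert hook; this sketch only logs a warning."""
    logger.warning(message)


def suggest(message: str) -> None:
    """Placeholder hook for non-urgent recommendations."""
    logger.info(message)


class AgentUtilizationMonitor:
    """
    Monitors agent utilization to guide reallocation.
    """

    def monitor(self):
        # Hardcoded sample snapshot; a real monitor would read these
        # values from the agent pool.
        metrics = {
            'total_agents': 1000,
            'active': 850,
            'idle': 100,
            'blocked': 50,

            'utilization_rate': 0.85,  # 85%
            'avg_task_duration': '45min',
            'tasks_completed_today': 342,

            'by_tier': {
                'S-tier': {'agents': 80, 'repos': 10, 'utilization': 0.95},
                'A-tier': {'agents': 200, 'repos': 50, 'utilization': 0.88},
                'B-tier': {'agents': 300, 'repos': 150, 'utilization': 0.82},
                'C-tier': {'agents': 100, 'repos': 190, 'utilization': 0.75},
            }
        }

        # Alert: utilization too low
        if metrics['utilization_rate'] < 0.60:
            alert("Low agent utilization - consider increasing batch size")

        # Alert: too many blocked agents
        if metrics['blocked'] > 100:
            alert("Many agents blocked - investigate blockers")

        # Suggestion: reallocate
        if metrics['by_tier']['C-tier']['utilization'] < 0.50:
            suggest("Reallocate C-tier agents to B-tier")

        return metrics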
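
The monitor above returns a fixed snapshot. A working version would derive the headline numbers from the state of each agent; the sketch below shows that derivation under the assumption that every agent reports exactly one of the states `active`, `idle`, or `blocked`. `utilization_snapshot` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def utilization_snapshot(states: Iterable[str]) -> Dict[str, Union[int, float]]:
    """Aggregate per-agent states into the monitor's headline metrics."""
    counts = Counter(states)
    total = sum(counts.values())
    active = counts["active"]
    return {
        "total_agents": total,
        "active": active,
        "idle": counts["idle"],
        "blocked": counts["blocked"],
        # Same definition as the KPI table: active agents / total agents.
        "utilization_rate": active / total if total else 0.0,
    }


states = ["active"] * 850 + ["idle"] * 100 + ["blocked"] * 50
snapshot = utilization_snapshot(states)
print(snapshot["utilization_rate"])  # 0.85
```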

Human Override

Human override interface

# .rd-os/config/human-override.yaml

# Humans can override the AI's allocation decisions
overrides:
  # Prioritize analysis of specific repos
  priority_repos:
    - repo: pingcap/tidb
      reason: "CTO request - strategic importance"
      agents: 10  # overrides the AI's suggested 8
      deadline: 2026-03-01
    
    - repo: pingcap/new-feature
      reason: "Urgent customer request"
      agents: 5
      deadline: 2026-02-28  # 2026 is not a leap year, so February has no 29th
  
  # Skip certain repos
  skip_repos:
    - repo: pingcap/old-experiment
      reason: "Confirmed obsolete by team"
      action: archive
  
  # Adjust analysis depth
  depth_overrides:
    - repo: pingcap/ossinsight
      depth: deep  # overrides the AI's suggested standard
      reason: "May become core product"
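
Applying such a file to the scheduler's output can be a small, pure function. The sketch below works on a plain dict shaped like the YAML above, so it needs no YAML library. `apply_overrides` is a hypothetical name, and it only handles `priority_repos` and `skip_repos`; `depth_overrides` would need a similar pass over whatever structure holds the analysis depth.

```python
from typing import Any, Dict


def apply_overrides(allocation: Dict[str, int],
                    overrides: Dict[str, Any]) -> Dict[str, int]:
    """Return a copy of `allocation` with human overrides applied."""
    result = dict(allocation)
    # Priority repos get exactly the agent count the human asked for.
    for entry in overrides.get("priority_repos", []):
        result[entry["repo"]] = entry["agents"]
    # Skipped repos are removed from scheduling altogether.
    for entry in overrides.get("skip_repos", []):
        result.pop(entry["repo"], None)
    return result


overrides = {
    "priority_repos": [
        {"repo": "pingcap/tidb", "agents": 10},
        {"repo": "pingcap/new-feature", "agents": 5},
    ],
    "skip_repos": [
        {"repo": "pingcap/old-experiment", "action": "archive"},
    ],
}
ai_allocation = {
    "pingcap/tidb": 8,
    "pingcap/old-experiment": 1,
    "pingcap/ossinsight": 2,
}
final = apply_overrides(ai_allocation, overrides)
print(final)
# {'pingcap/tidb': 10, 'pingcap/ossinsight': 2, 'pingcap/new-feature': 5}
```

Returning a new dict leaves the AI's original suggestion intact, which makes it easy to log both versions and to measure how often humans disagree with the scheduler.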

Metrics & KPIs

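Evaluating scheduling effectiveness

| Metric                  | Target | Measurement                       |
|-------------------------|--------|-----------------------------------|
| Agent Utilization       | >80%   | Active agents / Total agents      |
| Value Discovery Rate    | >10%   | Repos upgraded after initial scan |
| Reallocation Efficiency | <5 min | Time to reallocate agents         |
| Deep Analysis ROI       | >5x    | Value found / Analysis cost       |
| Human Satisfaction      | >90%   | Human approval of allocations     |
| Completion Rate         | >95%   | Repos analyzed / Total repos      |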
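
Three of these KPIs are plain ratios and can be checked against their targets mechanically. In the sketch below, `scheduling_kpis` is a hypothetical helper and the input figures are invented for illustration. The table does not give a denominator for the value discovery rate, so dividing by the total number of repos is an assumption.

```python
from typing import Dict, Tuple


def scheduling_kpis(active_agents: int, total_agents: int,
                    upgraded_repos: int, analyzed_repos: int,
                    total_repos: int) -> Dict[str, Tuple[float, bool]]:
    """Return each ratio KPI with a flag for whether it meets its target."""
    utilization = active_agents / total_agents
    discovery = upgraded_repos / total_repos   # assumed denominator
    completion = analyzed_repos / total_repos
    return {
        "agent_utilization": (utilization, utilization > 0.80),
        "value_discovery_rate": (discovery, discovery > 0.10),
        "completion_rate": (completion, completion > 0.95),
    }


# Illustrative inputs only; none of these are measured results.
kpis = scheduling_kpis(active_agents=850, total_agents=1000,
                       upgraded_repos=44, analyzed_repos=388,
                       total_repos=400)
for name, (value, ok) in kpis.items():
    print(f"{name}: {value:.1%} {'OK' if ok else 'BELOW TARGET'}")
```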

Implementation Checklist

Phase 1: Basic Scoring

  • Implement repo value scorer
  • Define scoring criteria
  • Test on 10-repo sample
  • Tune scoring weights

Phase 2: Dynamic Allocation

  • Implement allocation algorithm
  • Create agent pool manager
  • Add reallocation triggers
  • Test dynamic scaling

Phase 3: Coordination

  • Implement multi-agent coordination
  • Create shared context system
  • Add periodic sync mechanism
  • Test team analysis

Phase 4: Optimization

  • Implement utilization monitoring
  • Add human override interface
  • Create optimization recommendations
  • Continuous tuning

Conclusion

Dynamic Scheduling vs Static Allocation:

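| Aspect                   | Static         | Dynamic                |
|--------------------------|----------------|------------------------|
| Agent Distribution       | Equal          | Based on value         |
| Response to Discovery    | None           | Immediate reallocation |
| Resource Efficiency      | 50-60%         | 80-90%                 |
| Depth for Critical Repos | Same as others | 5-10x deeper           |
| Adaptability             | None           | High                   |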

Result:

  • High-value repos get deep analysis
  • Low-value repos get quick disposition
  • Resources flow to where they matter most
  • System learns and adapts over time

This is how you analyze 400 repos intelligently, not uniformly.


“Not all repos are created equal. Treat them accordingly.”