Mono-Repo Consolidation Plan

Agentic Engineering AI-First Initiative

“AI should be able to automatically complete a project from development to deployment.”

“Google proved monorepo scales to 2 billion lines. We’re building on that foundation with AI ownership.”


Overview

Goal: Consolidate 400+ repositories (~39GB) into an AI-friendly mono-repo with closed-loop development, testing, and progress management.

Strategic Context: This is not just a code consolidation — it’s a first-principles reimagining of AI-driven engineering. We’re building the foundation for AI to own the full lifecycle: architecture, development, testing, deployment, and iteration.

Inspired By: Google’s monorepo (2B LOC, 25K engineers, 45K commits/day)

Our Advantage: Google automated processes. We automate decisions with AI.

Timeline: Planning phase (1-2 days) → Execution phase (TBD)


AI-First Engineering Philosophy

Three Layers of AI-Driven Development

Layer   Scope                    Focus                           This Project
─────────────────────────────────────────────────────────────────────────────
Micro   Skills, MCP, Tools       Efficiency in existing systems  Foundation
Meso    Feature lifecycle        AI drives design→test→deploy    Core capability
Macro   System/org architecture  AI reorganizes everything       Ultimate goal

Relativity Framework

Special Relativity (Near-term):

AI can automatically complete a single project: development, testing, deployment, launch

General Relativity (Ultimate):

AI unifies all company repositories, system architecture, deployment, modules — all deeply designed for AI ownership

This Project’s Place

Current State → Micro layer (tools, skills, MCP)
     ↓
This Project → Meso + Macro transition
     ↓
End State → General Relativity achieved
            (AI owns full lifecycle across unified codebase)

Current State

Total Repos: ~400
Total Size: ~39GB
Categories:
  - Products: TiDB, TiDB Next-Gen (database, storage, import/export tools)
  - Platform: TiDB Cloud SaaS (control services, resource deployment, monitoring)
  - DevOps: Online operations backend
  - Forks: Third-party dependencies
  - Abandoned: Unused projects

Problem: Fragmented codebase prevents AI from having full context.
         AI cannot optimize across repo boundaries.
         Human coordination overhead scales with repo count.

Phase 1: Repository Analysis (Distributed Agent Cluster)

1.1 Agent Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Orchestrator Agent                       │
│  - Coordinates 400+ repo agents                             │
│  - Aggregates analysis results                              │
│  - Makes merge recommendations                              │
└─────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│  Repo Agent   │   │  Repo Agent   │   │  Repo Agent   │
│   (repo-001)  │   │   (repo-002)  │   │   (repo-400)  │
└───────────────┘   └───────────────┘   └───────────────┘

1.2 Per-Repo Analysis Metrics

Each agent analyzes its repo for:

Metric         Description                                  Weight
──────────────────────────────────────────────────────────────────
Freshness      Last commit date, activity frequency         High
Dependencies   Internal deps, external deps, circular refs  High
Code Quality   Test coverage, lint errors, tech debt        Medium
Documentation  README, API docs, architecture docs          Medium
Usage          Import count, deployment instances           High
Owner          Team ownership, maintenance status           Medium
Build System   CI/CD config, build scripts                  Low
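The weighted metrics above can be rolled up into a single merge-recommendation score. A minimal sketch follows; the numeric weights (3/2/1 for High/Medium/Low) and the archive threshold are illustrative assumptions, not decided values:

```python
# Illustrative scoring sketch: combine normalized per-repo metrics
# (each in [0, 1]) into one weighted score. Weights mirror the
# High/Medium/Low column above and are assumptions, not final values.
WEIGHTS = {
    "freshness": 3,      # High
    "dependencies": 3,   # High
    "usage": 3,          # High
    "code_quality": 2,   # Medium
    "documentation": 2,  # Medium
    "owner": 2,          # Medium
    "build_system": 1,   # Low
}

def merge_score(metrics: dict) -> float:
    """Weighted average of normalized metric values in [0, 1]."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * metrics.get(k, 0.0) for k in WEIGHTS) / total

def recommend(metrics: dict, archive_below: float = 0.3) -> str:
    # Hypothetical threshold: low-scoring repos become archive candidates
    return "migrate" if merge_score(metrics) >= archive_below else "archive"
```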

1.3 Agent Implementation

# Per-repo agent: check_freshness() is implemented as an example;
# the remaining collectors are stubs to fill in per metric.
import subprocess

class RepoAgent:
    def __init__(self, repo_path, repo_id):
        self.repo_path = repo_path
        self.repo_id = repo_id

    def check_freshness(self):
        # Unix timestamp of the last commit
        out = subprocess.run(
            ["git", "-C", self.repo_path, "log", "-1", "--format=%ct"],
            capture_output=True, text=True, check=True)
        return int(out.stdout.strip())

    def map_dependencies(self): ...  # parse go.mod / package.json / etc.
    def assess_quality(self): ...    # coverage, lint, tech-debt markers
    def scan_docs(self): ...         # README / API / architecture docs
    def detect_usage(self): ...      # import counts, deployment instances
    def recommend(self): ...         # migrate / archive / delete

    def analyze(self):
        return {
            'freshness': self.check_freshness(),
            'dependencies': self.map_dependencies(),
            'code_quality': self.assess_quality(),
            'documentation': self.scan_docs(),
            'usage': self.detect_usage(),
            'merge_recommendation': self.recommend(),
        }

1.4 Distributed Execution Strategy

Challenge: 400+ agents running concurrently

Solution: Batched parallel execution

  • Batch size: 50 agents (adjustable based on resources)
  • Total batches: 8 (400/50)
  • Estimated time per batch: 5-10 minutes
  • Total analysis time: ~1-2 hours
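The batching strategy above can be sketched with the standard library; `run_agent` is a placeholder standing in for a repo agent's analyze step:

```python
# Batched parallel execution sketch: process repos in batches of 50,
# running each batch's agents concurrently in a thread pool.
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 50  # adjustable based on resources

def run_agent(repo: str) -> dict:
    # Placeholder for RepoAgent(repo).analyze()
    return {"repo": repo, "status": "analyzed"}

def analyze_all(repos: list[str]) -> list[dict]:
    results = []
    for start in range(0, len(repos), BATCH_SIZE):
        batch = repos[start:start + BATCH_SIZE]
        with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
            results.extend(pool.map(run_agent, batch))
    return results
```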

Resource Requirements:

  • CPU: 8+ cores recommended
  • Memory: 16GB+ recommended
  • Disk I/O: SSD preferred (39GB read operations)

Phase 2: Mono-Repo Design

2.1 Target Structure

mono-repo/
├── products/
│   ├── tidb/                    # TiDB database core
│   │   ├── server/
│   │   ├── storage/
│   │   └── tools/
│   └── tidb-next/               # Next-gen database
│       ├── server/
│       ├── storage/
│       └── tools/
├── platform/
│   ├── cloud-saas/              # TiDB Cloud platform
│   │   ├── control-plane/
│   │   ├── resource-deploy/
│   │   ├── monitoring/
│   │   └── api-gateway/
│   └── shared-services/         # Cross-platform services
├── devops/
│   ├── ops-backend/             # Operations tools
│   ├── ci-cd/
│   └── deployment/
├── libs/                        # Shared libraries
│   ├── common/
│   ├── utils/
│   └── protocols/
├── tools/                       # Build/dev tools
├── docs/                        # Centralized documentation
└── infra/                       # Infrastructure as code

2.2 AI-Friendly Design Principles

  1. Clear Boundaries: Each component has well-defined interfaces
  2. Self-Contained: Components can be understood in isolation
  3. Documented Contracts: API specs, data schemas, protocols
  4. Testable: Clear test boundaries, mockable interfaces
  5. Versioned: Internal versioning for breaking changes

2.3 Build System

# Monorepo build orchestration
- Turborepo / Nx / Bazel (depending on tech stack)
- Incremental builds (only changed components)
- Parallel test execution
- Dependency graph visualization

Phase 3: Migration Strategy

3.1 Migration Priority

Priority  Category              Criteria                        Action
──────────────────────────────────────────────────────────────────────
P0        Active core products  High usage, active development  Migrate first
P1        Platform services     Critical infrastructure         Migrate early
P2        DevOps tools          Important but isolated          Migrate mid-phase
P3        Low-activity repos    Minor usage, stable             Migrate late
P4        Abandoned repos       No activity >1 year             Archive or delete
P5        Forked dependencies   Third-party forks               Evaluate: keep upstream?

3.2 Migration Process (Per Repo)

1. Pre-migration check
   ├── Dependency analysis
   ├── Conflict detection
   └── Build verification

2. Code transfer
   ├── Preserve git history (git filter-repo)
   ├── Map to new structure
   └── Update import paths

3. Integration
   ├── Update build configs
   ├── Fix dependency references
   └── Run tests

4. Validation
   ├── CI/CD passes
   ├── Integration tests pass
   └── Smoke tests in staging

5. Cutover
   ├── Update deployment configs
   ├── Switch CI/CD to mono-repo
   └── Archive old repo (read-only)
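The history-preserving transfer in step 2 can be expressed as a reviewable command plan. A sketch, assuming `git filter-repo` is installed and the source repo's default branch is `main` (paths, remote names, and the branch are illustrative):

```python
# Sketch: build the git command sequence that moves one repo into the
# mono-repo under a target subdirectory, preserving full history.
# Commands are returned, not executed, so the plan can be reviewed.

def migration_plan(repo_url: str, repo_name: str, target_dir: str) -> list[list[str]]:
    clone = f"/tmp/migrate-{repo_name}"
    return [
        # 1. Fresh clone of the source repo
        ["git", "clone", repo_url, clone],
        # 2. Rewrite history so all files live under target_dir
        ["git", "-C", clone, "filter-repo", "--to-subdirectory-filter", target_dir],
        # 3. Merge into the mono-repo, keeping the unrelated history
        ["git", "-C", "mono-repo", "remote", "add", repo_name, clone],
        ["git", "-C", "mono-repo", "fetch", repo_name],
        ["git", "-C", "mono-repo", "merge", "--allow-unrelated-histories",
         f"{repo_name}/main"],
        ["git", "-C", "mono-repo", "remote", "remove", repo_name],
    ]
```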

3.3 Estimated Timeline

Phase                    Repos     Duration
───────────────────────────────────────────
Planning & Analysis      All 400   2 days
P0 Migration (core)      ~50       3-5 days
P1 Migration (platform)  ~100      5-7 days
P2-P3 Migration          ~150      7-10 days
P4-P5 Cleanup            ~100      2-3 days
Total                    400       ~3-4 weeks

Phase 4: AI Closed-Loop Development

4.1 The AI-First Vision

This mono-repo is designed to enable General Relativity: AI owns the full system lifecycle.

┌─────────────────────────────────────────────────────────────────────┐
│                    AI Ownership Spectrum                            │
│                                                                     │
│  Micro          Meso              Macro              General Rel.   │
│  │              │                 │                  │              │
│  ▼              ▼                 ▼                  ▼              │
│  Tools      Feature           System              AI owns          │
│  & Skills   Lifecycle         Architecture        Everything       │
│                                                                     │
│  [Current]  [Phase 4.2]       [Phase 4.3]         [End State]      │
└─────────────────────────────────────────────────────────────────────┘

4.2 Development Loop (Meso Layer)

┌─────────────────────────────────────────────────────────────┐
│                    AI Development Loop                      │
│                                                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐  │
│  │ Plan    │───▶│ Code    │───▶│ Test    │───▶│ Review  │  │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘  │
│       ▲                                              │      │
│       └──────────────────────────────────────────────┘      │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Progress Management                     │    │
│  │  - Task tracking                                     │    │
│  │  - Sprint planning                                   │    │
│  │  - Blocker detection                                 │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

AI Capabilities

Capability         Description                           Implementation
───────────────────────────────────────────────────────────────────────
Code Generation    Generate features, fixes, refactors   LLM + context from repo
Test Generation    Auto-generate unit/integration tests  Coverage-guided
Code Review        Automated PR review, style checks     Static analysis + LLM
Bug Detection      Identify potential issues             Pattern matching + ML
Documentation      Auto-generate/update docs             Code → docs extraction
Progress Tracking  Sprint planning, task estimation      Historical data + LLM

4.3 System Architecture Ownership (Macro Layer)

AI Reorganizes System Architecture:

┌─────────────────────────────────────────────────────────────────┐
│              AI-Designed System Architecture                    │
│                                                                 │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐    │
│  │   Product    │     │   Platform   │     │    DevOps    │    │
│  │   Services   │◀───▶│   Services   │◀───▶│   Services   │    │
│  └──────────────┘     └──────────────┘     └──────────────┘    │
│         ▲                    ▲                    ▲             │
│         └────────────────────┼────────────────────┘             │
│                              │                                  │
│                     ┌────────▼────────┐                         │
│                     │  AI Orchestrator│                         │
│                     │  - Discovers    │                         │
│                     │  - Optimizes    │                         │
│                     │  - Refactors    │                         │
│                     └─────────────────┘                         │
└─────────────────────────────────────────────────────────────────┘

AI Capabilities at Macro Layer:

  • Architecture Discovery: Map service dependencies, data flows, bottlenecks
  • Automated Refactoring: Identify and execute cross-service improvements
  • Interface Optimization: Evolve APIs based on usage patterns
  • Tech Debt Management: Prioritize and fix systemic issues

4.4 Deployment & Operations Ownership (General Relativity)

AI-Managed Infrastructure:

# Auto-scaling policies (AI-optimized)
resource_policies:
  - service: control-plane
    scaling:
      min_instances: 3
      max_instances: 50
      metrics: [cpu, memory, request_latency]
      ai_optimizer: enabled
  
  - service: resource-deploy
    multi_region:
      regions: [us-east, eu-west, ap-southeast]
      ai_routing: enabled  # AI decides optimal region

AI Responsibilities:

  • Predict load patterns
  • Auto-scale before traffic spikes
  • Optimize resource allocation across regions
  • Detect and respond to anomalies
  • Cost optimization (right-sizing, spot instances)
  • Self-healing: Automatic incident response and recovery
  • Continuous Optimization: A/B test deployments, rollback on metrics

4.5 End State: General Relativity Achieved

┌─────────────────────────────────────────────────────────────────┐
│         General Relativity: AI Owns Everything                  │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                    Unified Codebase                      │   │
│  │  (400 repos → 1 mono-repo, AI-readable, AI-optimizable) │   │
│  └─────────────────────────────────────────────────────────┘   │
│                              │                                  │
│         ┌────────────────────┼────────────────────┐            │
│         ▼                    ▼                    ▼             │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐       │
│  │   AI Dev    │     │   AI Ops    │     │   AI Org    │       │
│  │  - Designs  │     │  - Deploys  │     │  - Plans    │       │
│  │  - Codes    │     │  - Scales   │     │  - Staffs   │       │
│  │  - Tests    │     │  - Monitors │     │  - Allocates│       │
│  │  - Reviews  │     │  - Heals    │     │  - Optimizes│       │
│  └─────────────┘     └─────────────┘     └─────────────┘       │
│                                                                 │
│  Result: Human engineers focus on strategy, creativity,         │
│          and high-level problem definition.                     │
│          AI handles execution at all layers.                    │
└─────────────────────────────────────────────────────────────────┘

Phase 5: Technical Considerations

5.1 Google Monorepo Lessons (Proven at 2 Billion LOC)

Key Insights from Google’s Playbook:

Principle                Google’s Approach                          TiDB Application
────────────────────────────────────────────────────────────────────────────────────
Single Source of Truth   One repo for 95% of codebase               All 400 repos → 1 mono-repo
Trunk-Based Development  Direct commits to main, pre-commit review  Adopt from day 1
Code Ownership           Default open, CODEOWNERS enforcement       Directory-based ownership
Build System             Bazel (incremental, remote cache)          Bazel/Turborepo/Nx based on stack
Dependency Mgmt          Single version graph, automated updates    Dependency visualization tool
Code Review              Automated pre-checks + OWNERS              GitHub/GitLab CODEOWNERS
Infrastructure           Piper + CitC (partial checkout)            Git + shallow clones + sparse checkout

Google’s Scale (for reference):

  • 2 billion lines of code
  • 25,000+ engineers
  • 45,000 commits/day
  • 86 TB storage
  • Automation does 24,000 commits/day

Our AI Advantage: Google automated processes. We automate decisions.


5.2 Scale Challenges

Challenge         Solution                                  Google Reference
────────────────────────────────────────────────────────────────────────────
Git repo size     git-lfs, shallow clones, sparse checkout  CitC (partial checkout)
Build time        Incremental builds, remote caching        Bazel
CI/CD complexity  Path-based triggering                     Automated pre-commit checks
Code ownership    CODEOWNERS file, clear boundaries         OWNERS files per workspace
Access control    Fine-grained permissions per directory    Default open, exceptions restricted
Search speed      Sourcegraph / Zoekt                       CodeSearch engine
Dependency hell   Dependency graph visualization            Single version, automated updates

5.3 Tooling Requirements

Category         Tools                       Recommendation
───────────────────────────────────────────────────────────────────────────────────
Build System     Bazel, Turborepo, Nx        Based on tech stack (see below)
Code Search      Sourcegraph, Zoekt          Sourcegraph (enterprise) or Zoekt (open)
Dependency Viz   Custom + graph DB           Build custom tool
CI/CD            GitHub Actions, GitLab CI   Path filtering required
Agent Framework  LangChain, AutoGen, custom  Custom (tuned for repo analysis)
Version Control  Git                         Standard Git + sparse checkout

Build System by Tech Stack:

Go          → Bazel or Please
TypeScript  → Turborepo or Nx
Java        → Bazel or Gradle
Python      → Bazel or Pants
Mixed       → Bazel (most flexible)

5.4 Risk Mitigation

Risk             Mitigation                           Google Parallel
─────────────────────────────────────────────────────────────────────────────
Data loss        Full backups before each batch       Piper (distributed storage)
Downtime         Parallel run (old + new)             Release branches + feature flags
Broken builds    Comprehensive tests, canary deploys  Pre-commit verification
Team disruption  Gradual migration, training          Trunk-based culture
Rollback needed  Keep old repos read-only 30 days     Release branch rollback
Performance      Incremental builds, caching          Bazel remote cache

5.5 Trunk-Based Development Model (Google Standard)

main (trunk)
  │
  ├── All developers commit directly to main
  ├── Pre-commit code review required
  ├── Automated checks run before merge
  │
  └── release/v1.0  (branch for deployment only)
      └── Feature flags control visibility

Rules:

  1. No long-lived feature branches
  2. All changes reviewed before merge (pre-commit)
  3. Small, frequent commits (not big bangs)
  4. Feature flags for incomplete features
  5. Release branches are for deployment, not development
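Rule 4 can be as simple as a flag check at the integration point. A minimal sketch, with flag names, the environment-variable convention, and defaults all illustrative:

```python
# Minimal feature-flag gate: incomplete features merge to main but
# stay dark until their flag is enabled (here via environment variable).
import os

def flag_enabled(name: str, default: bool = False) -> bool:
    value = os.environ.get(f"FLAG_{name.upper()}", "")
    return value.lower() in ("1", "true", "on") if value else default

def handle_request(payload: dict) -> str:
    if flag_enabled("NEW_IMPORT_PATH"):   # hypothetical in-progress feature
        return "new-path"                 # dark by default
    return "stable-path"
```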

Benefits:

  • No merge nightmares
  • Early conflict detection
  • Continuous delivery enabled
  • AI can safely make small, incremental changes

5.6 CODEOWNERS Structure

# Root CODEOWNERS file
# Format: path_pattern  @owner1 @owner2
# (gitignore-style patterns; a trailing slash matches the directory recursively)

# Products
products/tidb/          @tidb-core-team @database-leads
products/tidb-next/     @tidb-next-team @architecture-review

# Platform
platform/cloud-saas/    @cloud-platform-team @platform-leads
platform/shared-services/  @platform-architects

# DevOps
devops/                 @devops-team @sre-leads

# Shared Libraries (high scrutiny)
libs/                   @platform-architects @tech-leads

# Infrastructure
infra/                  @infra-team @security-review

# Build/Tooling
tools/                  @devex-team
BUILD                   @build-maintainers

Review Policies:

  • libs/* requires 2 approvals (shared code impact)
  • products/* requires 1 approval + team lead
  • devops/* requires 1 approval + on-call SRE
  • Security-sensitive paths require security team approval
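The per-path approval minimums above can be enforced mechanically in CI. A sketch; the path prefixes mirror the policy list, and the counts/default are assumptions:

```python
# Sketch: check a PR's approval count against per-path minimums.
REQUIRED_APPROVALS = {
    "libs/": 2,      # shared code impact
    "products/": 1,
    "devops/": 1,
}

def required_for(changed_files: list[str]) -> int:
    needed = 1  # default: one review for any change
    for path in changed_files:
        for prefix, count in REQUIRED_APPROVALS.items():
            if path.startswith(prefix):
                needed = max(needed, count)
    return needed

def approvals_ok(changed_files: list[str], approvals: int) -> bool:
    return approvals >= required_for(changed_files)
```

A real setup would layer this on top of CODEOWNERS review requests rather than replace them.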

Next Steps (Planning Phase)

Day 1: Analysis Framework

  • Set up distributed agent infrastructure
  • Define analysis metrics and scoring
  • Create repo inventory (list all 400 repos)
  • Run pilot analysis on 10 repos
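The repo-inventory step can start from local clones. A sketch, assuming all repos are checked out under one parent directory (the layout is an assumption):

```python
# Inventory sketch: list local git clones under a root directory,
# recording each repo's name and on-disk size in bytes.
import os

def repo_inventory(root: str) -> list[dict]:
    repos = []
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        # A directory containing .git is treated as a repo clone
        if entry.is_dir() and os.path.isdir(os.path.join(entry.path, ".git")):
            size = sum(
                os.path.getsize(os.path.join(dirpath, f))
                for dirpath, _, files in os.walk(entry.path)
                for f in files
            )
            repos.append({"name": entry.name, "bytes": size})
    return repos
```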

Day 2: Mono-Repo Design

  • Finalize directory structure
  • Design build system architecture
  • Plan migration tooling
  • Create detailed migration runbook

Deliverables

  1. repo-analysis-report.md — Analysis of all 400 repos
  2. mono-repo-structure.md — Detailed structure spec
  3. migration-runbook.md — Step-by-step migration guide
  4. ai-dev-loop-spec.md — AI closed-loop development spec
  5. ai-first-methodology.md — AI-First engineering methodology (this framework)
  6. ai-capability-maturity.md — AI capability maturity model (Micro→Meso→Macro→General Relativity)
  7. google-monorepo-lessons.md — Google best practices reference ✅ DONE
  8. codeowners-template.md — CODEOWNERS file template
  9. build-system-evaluation.md — Bazel vs Turborepo vs Nx analysis

Open Questions

  1. Tech stack: What languages/frameworks are in the 400 repos? (affects build system choice)
  2. Team size: How many engineers will work in the mono-repo? (affects access control design)
  3. Current CI/CD: What’s the existing pipeline? (affects migration complexity)
  4. Deployment: How are services currently deployed? (affects infra design)
  5. Agent hosting: Where will the 400 agents run? (local cluster, cloud, hybrid?)

Appendix: AI-First Methodology

Why This Matters

Most AI engineering efforts stop at the Micro layer:

  • Build some skills
  • Add some MCP tools
  • Improve individual workflows

This project goes further:

Layer       What Changes              Outcome
─────────────────────────────────────────────────────────
Micro       Tools & workflows         Faster individual tasks
Meso        Feature ownership         AI delivers features end-to-end
Macro       System architecture       AI optimizes across services
General     Everything                AI runs the engineering org

First Principles Reasoning

Question: What should AI be capable of in software engineering?

Answer: A good AI engineer should be able to:

  1. Understand the full system (not just one repo)
  2. Design improvements that span boundaries
  3. Implement, test, and deploy changes
  4. Monitor and iterate based on outcomes

Barrier: Fragmented codebases prevent #1.

Solution: Unified mono-repo designed for AI ownership.

Success Metrics

Metric                       Current  Target (6mo)  Target (12mo)
─────────────────────────────────────────────────────────────────
AI-completed features        0%       20%           50%
AI-identified optimizations  0        100/week      500/week
AI-deployed changes          0%       10%           40%
Human time on routine tasks  60%      30%           10%
System-wide tech debt        High     Reduced 25%   Reduced 60%

Last updated: Planning phase

“The goal is not to help humans do AI work. The goal is to have AI do the work, and humans define what matters.”