Mono-Repo Consolidation Plan
Agentic Engineering AI-First Initiative
“AI should be able to automatically complete a project from development to deployment.”
“Google proved monorepo scales to 2 billion lines. We’re building on that foundation with AI ownership.”
Overview
Goal: Consolidate 400+ repositories (~39GB) into an AI-friendly mono-repo with closed-loop development, testing, and progress management.
Strategic Context: This is not just a code consolidation — it’s a first-principles reimagining of AI-driven engineering. We’re building the foundation for AI to own the full lifecycle: architecture, development, testing, deployment, and iteration.
Inspired By: Google’s monorepo (2B LOC, 25K engineers, 45K commits/day)
Our Advantage: Google automated processes. We automate decisions with AI.
Timeline: Planning phase (1-2 days) → Execution phase (TBD)
AI-First Engineering Philosophy
Three Layers of AI-Driven Development
| Layer | Scope | Focus | This Project |
|---|---|---|---|
| Micro | Skills, MCP, Tools | Efficiency in existing systems | Foundation |
| Meso | Feature lifecycle | AI drives design→test→deploy | Core capability |
| Macro | System/org architecture | AI reorganizes everything | Ultimate goal |
Relativity Framework
Special Relativity (Near-term):
AI can automatically complete a single project: development, testing, deployment, launch
General Relativity (Ultimate):
AI unifies all company repositories, system architecture, deployment, modules — all deeply designed for AI ownership
This Project’s Place
Current State → Micro layer (tools, skills, MCP)
↓
This Project → Meso + Macro transition
↓
End State → General Relativity achieved
(AI owns full lifecycle across unified codebase)
Current State
Total Repos: ~400
Total Size: ~39GB
Categories:
- Products: TiDB, TiDB Next-Gen (database, storage, import/export tools)
- Platform: TiDB Cloud SaaS (control services, resource deployment, monitoring)
- DevOps: Online operations backend
- Forks: Third-party dependencies
- Abandoned: Unused projects
Problem: A fragmented codebase prevents AI from having full context.
- AI cannot optimize across repo boundaries.
- Human coordination overhead scales with repo count.
Phase 1: Repository Analysis (Distributed Agent Cluster)
1.1 Agent Architecture
┌─────────────────────────────────────────────────────────────┐
│ Orchestrator Agent │
│ - Coordinates 400+ repo agents │
│ - Aggregates analysis results │
│ - Makes merge recommendations │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Repo Agent │ │ Repo Agent │ │ Repo Agent │
│ (repo-001) │ │ (repo-002) │ │ (repo-400) │
└───────────────┘ └───────────────┘ └───────────────┘
1.2 Per-Repo Analysis Metrics
Each agent analyzes its repo for:
| Metric | Description | Weight |
|---|---|---|
| Freshness | Last commit date, activity frequency | High |
| Dependencies | Internal deps, external deps, circular refs | High |
| Code Quality | Test coverage, lint errors, tech debt | Medium |
| Documentation | README, API docs, architecture docs | Medium |
| Usage | Import count, deployment instances | High |
| Owner | Team ownership, maintenance status | Medium |
| Build System | CI/CD config, build scripts | Low |
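One way to fold the weighted metrics above into a single merge-priority score. The numeric weights (High=3, Medium=2, Low=1) and the [0, 1] normalization are illustrative assumptions, not settled policy:

```python
# Illustrative composite score for one repo. Each metric is assumed
# normalized to [0, 1]; weight values map High/Medium/Low to 3/2/1.
WEIGHTS = {
    "freshness": 3,      # High
    "dependencies": 3,   # High
    "usage": 3,          # High
    "code_quality": 2,   # Medium
    "documentation": 2,  # Medium
    "owner": 2,          # Medium
    "build_system": 1,   # Low
}

def merge_priority(metrics: dict) -> float:
    """Weighted average of normalized metric scores, in [0, 1]."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * metrics.get(k, 0.0) for k in WEIGHTS) / total
```

The orchestrator can then rank all 400 repos by this score to order the migration queue.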
1.3 Agent Implementation
# Agent spec (pseudo-code)
class RepoAgent:
    def __init__(self, repo_path, repo_id):
        self.repo_path = repo_path
        self.repo_id = repo_id

    def analyze(self):
        return {
            'freshness': self.check_freshness(),
            'dependencies': self.map_dependencies(),
            'code_quality': self.assess_quality(),
            'documentation': self.scan_docs(),
            'usage': self.detect_usage(),
            'merge_recommendation': self.recommend(),
        }
1.4 Distributed Execution Strategy
Challenge: 400+ agents running concurrently
Solution: Batched parallel execution
- Batch size: 50 agents (adjustable based on resources)
- Total batches: 8 (400/50)
- Estimated time per batch: 5-10 minutes
- Total analysis time: ~1-2 hours
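The batching strategy above can be sketched with a standard thread pool; `analyze_repo` stands in for the per-repo agent entry point, and the batch size should be tuned to available resources:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batched(repo_ids, analyze_repo, batch_size=50):
    """Run analyze_repo over all repos in fixed-size parallel batches.

    analyze_repo is a placeholder for the per-repo agent; batch_size=50
    mirrors the plan and bounds concurrent resource usage.
    """
    results = {}
    for start in range(0, len(repo_ids), batch_size):
        batch = repo_ids[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # Each batch runs fully in parallel; batches run sequentially.
            for repo_id, result in zip(batch, pool.map(analyze_repo, batch)):
                results[repo_id] = result
    return results
```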
Resource Requirements:
- CPU: 8+ cores recommended
- Memory: 16GB+ recommended
- Disk I/O: SSD preferred (39GB read operations)
Phase 2: Mono-Repo Design
2.1 Target Structure
mono-repo/
├── products/
│ ├── tidb/ # TiDB database core
│ │ ├── server/
│ │ ├── storage/
│ │ └── tools/
│ └── tidb-next/ # Next-gen database
│ ├── server/
│ ├── storage/
│ └── tools/
├── platform/
│ ├── cloud-saas/ # TiDB Cloud platform
│ │ ├── control-plane/
│ │ ├── resource-deploy/
│ │ ├── monitoring/
│ │ └── api-gateway/
│ └── shared-services/ # Cross-platform services
├── devops/
│ ├── ops-backend/ # Operations tools
│ ├── ci-cd/
│ └── deployment/
├── libs/ # Shared libraries
│ ├── common/
│ ├── utils/
│ └── protocols/
├── tools/ # Build/dev tools
├── docs/ # Centralized documentation
└── infra/ # Infrastructure as code
2.2 AI-Friendly Design Principles
- Clear Boundaries: Each component has well-defined interfaces
- Self-Contained: Components can be understood in isolation
- Documented Contracts: API specs, data schemas, protocols
- Testable: Clear test boundaries, mockable interfaces
- Versioned: Internal versioning for breaking changes
2.3 Build System
# Monorepo build orchestration
- Turborepo / Nx / Bazel (depending on tech stack)
- Incremental builds (only changed components)
- Parallel test execution
- Dependency graph visualization
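"Incremental builds (only changed components)" reduces to reverse reachability over the dependency graph: rebuild each changed component plus everything that transitively depends on it. A minimal sketch, assuming the reverse-dependency map comes from the build tool's metadata:

```python
def affected_components(changed, reverse_deps):
    """Components to rebuild: the changed set plus all transitive dependents.

    reverse_deps maps component -> set of components that depend on it;
    in practice this graph comes from Bazel/Turborepo/Nx metadata.
    """
    to_build, stack = set(), list(changed)
    while stack:
        comp = stack.pop()
        if comp in to_build:
            continue
        to_build.add(comp)
        # Anything depending on comp must also rebuild.
        stack.extend(reverse_deps.get(comp, ()))
    return to_build
```

For example, a change in `libs/common` triggers rebuilds of every product and platform service that imports it, while a change in a leaf service rebuilds only itself.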
Phase 3: Migration Strategy
3.1 Migration Priority
| Priority | Category | Criteria | Action |
|---|---|---|---|
| P0 | Active core products | High usage, active development | Migrate first |
| P1 | Platform services | Critical infrastructure | Migrate early |
| P2 | DevOps tools | Important but isolated | Migrate mid-phase |
| P3 | Low-activity repos | Minor usage, stable | Migrate late |
| P4 | Abandoned repos | No activity >1 year | Archive or delete |
| P5 | Forked dependencies | Third-party forks | Evaluate: keep upstream? |
3.2 Migration Process (Per Repo)
1. Pre-migration check
├── Dependency analysis
├── Conflict detection
└── Build verification
2. Code transfer
├── Preserve git history (git filter-repo)
├── Map to new structure
└── Update import paths
3. Integration
├── Update build configs
├── Fix dependency references
└── Run tests
4. Validation
├── CI/CD passes
├── Integration tests pass
└── Smoke tests in staging
5. Cutover
├── Update deployment configs
├── Switch CI/CD to mono-repo
└── Archive old repo (read-only)
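Step 2's history-preserving transfer could be driven like this. `--to-subdirectory-filter` is git filter-repo's real flag for moving all history under a path prefix; returning command lists (rather than executing them) keeps the pre-migration dry run cheap. The `incoming` remote name and `main` default branch are assumptions:

```python
def migration_commands(repo_path, target_subdir, default_branch="main"):
    """Commands for the history-preserving transfer of one repo.

    Returns command lists instead of running them, so the runbook's
    pre-migration check can log and review them first.
    """
    return [
        # Run inside the source repo: rewrite history so every path
        # gains the target prefix (e.g. products/tidb/).
        ["git", "filter-repo", "--to-subdirectory-filter", target_subdir],
        # Run inside the mono-repo: pull in the rewritten history.
        ["git", "remote", "add", "incoming", repo_path],
        ["git", "fetch", "incoming"],
        ["git", "merge", "--allow-unrelated-histories",
         f"incoming/{default_branch}"],
        ["git", "remote", "remove", "incoming"],
    ]
```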
3.3 Estimated Timeline
| Phase | Repos | Duration |
|---|---|---|
| Planning & Analysis | All 400 | 2 days |
| P0 Migration (core) | ~50 | 3-5 days |
| P1 Migration (platform) | ~100 | 5-7 days |
| P2-P3 Migration | ~150 | 7-10 days |
| P4-P5 Cleanup | ~100 | 2-3 days |
| Total | 400 | ~3-4 weeks |
Phase 4: AI Closed-Loop Development
4.1 The AI-First Vision
This mono-repo is designed to enable General Relativity: AI owns the full system lifecycle.
┌─────────────────────────────────────────────────────────────────────┐
│ AI Ownership Spectrum │
│ │
│ Micro Meso Macro General Rel. │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ Tools Feature System AI owns │
│ & Skills Lifecycle Architecture Everything │
│ │
│ [Current] [Phase 4.2] [Phase 4.3] [End State] │
└─────────────────────────────────────────────────────────────────────┘
4.2 Development Loop (Meso Layer)
┌─────────────────────────────────────────────────────────────┐
│ AI Development Loop │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Plan │───▶│ Code │───▶│ Test │───▶│ Review │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ ▲ │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Progress Management │ │
│ │ - Task tracking │ │
│ │ - Sprint planning │ │
│ │ - Blocker detection │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
4.2.1 AI Capabilities
| Capability | Description | Implementation |
|---|---|---|
| Code Generation | Generate features, fixes, refactors | LLM + context from repo |
| Test Generation | Auto-generate unit/integration tests | Coverage-guided |
| Code Review | Automated PR review, style checks | Static analysis + LLM |
| Bug Detection | Identify potential issues | Pattern matching + ML |
| Documentation | Auto-generate/update docs | Code → docs extraction |
| Progress Tracking | Sprint planning, task estimation | Historical data + LLM |
4.3 System Architecture Ownership (Macro Layer)
AI Reorganizes System Architecture:
┌─────────────────────────────────────────────────────────────────┐
│ AI-Designed System Architecture │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Product │ │ Platform │ │ DevOps │ │
│ │ Services │◀───▶│ Services │◀───▶│ Services │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ▲ ▲ ▲ │
│ └────────────────────┼────────────────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ AI Orchestrator│ │
│ │ - Discovers │ │
│ │ - Optimizes │ │
│ │ - Refactors │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
AI Capabilities at Macro Layer:
- Architecture Discovery: Map service dependencies, data flows, bottlenecks
- Automated Refactoring: Identify and execute cross-service improvements
- Interface Optimization: Evolve APIs based on usage patterns
- Tech Debt Management: Prioritize and fix systemic issues
4.4 Deployment & Operations Ownership (General Relativity)
AI-Managed Infrastructure:
# Auto-scaling policies (AI-optimized)
resource_policies:
  - service: control-plane
    scaling:
      min_instances: 3
      max_instances: 50
      metrics: [cpu, memory, request_latency]
      ai_optimizer: enabled
  - service: resource-deploy
    multi_region:
      regions: [us-east, eu-west, ap-southeast]
      ai_routing: enabled  # AI decides optimal region
AI Responsibilities:
- Predict load patterns
- Auto-scale before traffic spikes
- Optimize resource allocation across regions
- Detect and respond to anomalies
- Cost optimization (right-sizing, spot instances)
- Self-healing: Automatic incident response and recovery
- Continuous Optimization: A/B test deployments, rollback on metrics
4.5 End State: General Relativity Achieved
┌─────────────────────────────────────────────────────────────────┐
│ General Relativity: AI Owns Everything │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Unified Codebase │ │
│ │ (400 repos → 1 mono-repo, AI-readable, AI-optimizable) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ AI Dev │ │ AI Ops │ │ AI Org │ │
│ │ - Designs │ │ - Deploys │ │ - Plans │ │
│ │ - Codes │ │ - Scales │ │ - Staffs │ │
│ │ - Tests │ │ - Monitors │ │ - Allocates│ │
│ │ - Reviews │ │ - Heals │ │ - Optimizes│ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Result: Human engineers focus on strategy, creativity, │
│ and high-level problem definition. │
│ AI handles execution at all layers. │
└─────────────────────────────────────────────────────────────────┘
Phase 5: Technical Considerations
5.1 Google Monorepo Lessons (Proven at 2 Billion LOC)
Key Insights from Google’s Playbook:
| Principle | Google’s Approach | TiDB Application |
|---|---|---|
| Single Source of Truth | One repo for 95% of codebase | All 400 repos → 1 mono-repo |
| Trunk-Based Development | Direct commits to main, pre-commit review | Adopt from day 1 |
| Code Ownership | Default open, CODEOWNERS enforcement | Directory-based ownership |
| Build System | Bazel (incremental, remote cache) | Bazel/Turborepo/Nx based on stack |
| Dependency Mgmt | Single version graph, automated updates | Dependency visualization tool |
| Code Review | Automated pre-checks + OWNERS | GitHub/GitLab CODEOWNERS |
| Infrastructure | Piper + CitC (partial checkout) | Git + shallow clones + sparse checkout |
Google’s Scale (for reference):
- 2 billion lines of code
- 25,000+ engineers
- 45,000 commits/day
- 86 TB storage
- Automation does 24,000 commits/day
Our AI Advantage: Google automated processes. We automate decisions.
5.2 Scale Challenges
| Challenge | Solution | Google Reference |
|---|---|---|
| Git repo size | git-lfs, shallow clones, sparse checkout | CitC (partial checkout) |
| Build time | Incremental builds, remote caching | Bazel |
| CI/CD complexity | Path-based triggering | Automated pre-commit checks |
| Code ownership | CODEOWNERS file, clear boundaries | OWNERS files per workspace |
| Access control | Fine-grained permissions per directory | Default open, exceptions restricted |
| Search speed | Sourcegraph / Zoekt | CodeSearch engine |
| Dependency hell | Dependency graph visualization | Single version, automated updates |
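Path-based CI triggering, sketched as a GitHub Actions workflow (`on.pull_request.paths` is the real filter key; the workflow name, paths, and build command are illustrative):

```yaml
# .github/workflows/tidb.yml — run TiDB jobs only when TiDB code changes
name: tidb-ci
on:
  pull_request:
    paths:
      - "products/tidb/**"
      - "libs/**"   # shared libs can break any product
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make -C products/tidb test   # placeholder build entry point
```

Each top-level directory gets its own workflow with its own path filter, so a docs-only change never triggers a database build.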
5.3 Tooling Requirements
| Category | Tools | Recommendation |
|---|---|---|
| Build System | Bazel, Turborepo, Nx | Based on tech stack (see below) |
| Code Search | Sourcegraph, Zoekt | Sourcegraph (enterprise) or Zoekt (open) |
| Dependency Viz | Custom + graph DB | Build custom tool |
| CI/CD | GitHub Actions, GitLab CI | Path filtering required |
| Agent Framework | LangChain, AutoGen, custom | Custom (tuned for repo analysis) |
| Version Control | Git | Standard Git + sparse checkout |
Build System by Tech Stack:
Go → Bazel or Please
TypeScript → Turborepo or Nx
Java → Bazel or Gradle
Python → Bazel or Pants
Mixed → Bazel (most flexible)
5.4 Risk Mitigation
| Risk | Mitigation | Google Parallel |
|---|---|---|
| Data loss | Full backups before each batch | Piper (distributed storage) |
| Downtime | Parallel run (old + new) | Release branches + feature flags |
| Broken builds | Comprehensive tests, canary deploys | Pre-commit verification |
| Team disruption | Gradual migration, training | Trunk-based culture |
| Rollback needed | Keep old repos read-only 30 days | Release branch rollback |
| Performance | Incremental builds, caching | Bazel remote cache |
5.5 Trunk-Based Development Model (Google Standard)
main (trunk)
│
├── All developers commit directly to main
├── Pre-commit code review required
├── Automated checks run before merge
│
└── release/v1.0 (branch for deployment only)
└── Feature flags control visibility
Rules:
- No long-lived feature branches
- All changes reviewed before merge (pre-commit)
- Small, frequent commits (not big bangs)
- Feature flags for incomplete features
- Release branches are for deployment, not development
Benefits:
- No merge nightmares
- Early conflict detection
- Continuous delivery enabled
- AI can safely make small, incremental changes
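The feature-flag rule above can start as a default-off gate; `FLAGS` here is a stand-in for whatever config service is actually used, and the import paths are stubs:

```python
# Minimal default-off feature flag gate. FLAGS stands in for a real
# config service; unknown flags read as disabled, so partially merged
# code stays dark on trunk until explicitly enabled.
FLAGS = {"tidb-next-importer": False}

def flag_enabled(name: str) -> bool:
    return FLAGS.get(name, False)

def run_import(data):
    if flag_enabled("tidb-next-importer"):
        return ("next", data)     # stub for the incomplete new path
    return ("legacy", data)       # stub for the proven path
```

This is what lets both humans and AI merge small, incomplete increments to `main` without exposing them in production.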
5.6 CODEOWNERS Structure
# Root CODEOWNERS file
# Format: path_pattern @owner1 @owner2
# Note: a trailing "/" matches the directory recursively;
# "dir/*" would match only its direct children.
# Products
products/tidb/ @tidb-core-team @database-leads
products/tidb-next/ @tidb-next-team @architecture-review
# Platform
platform/cloud-saas/ @cloud-platform-team @platform-leads
platform/shared-services/ @platform-architects
# DevOps
devops/ @devops-team @sre-leads
# Shared Libraries (high scrutiny)
libs/ @platform-architects @tech-leads
# Infrastructure
infra/ @infra-team @security-review
# Build/Tooling
tools/ @devex-team
BUILD @build-maintainers
Review Policies:
- `libs/` requires 2 approvals (shared code impact)
- `products/` requires 1 approval + team lead
- `devops/` requires 1 approval + on-call SRE
- Security-sensitive paths require security team approval
Next Steps (Planning Phase)
Day 1: Analysis Framework
- Set up distributed agent infrastructure
- Define analysis metrics and scoring
- Create repo inventory (list all 400 repos)
- Run pilot analysis on 10 repos
Day 2: Mono-Repo Design
- Finalize directory structure
- Design build system architecture
- Plan migration tooling
- Create detailed migration runbook
Deliverables
- `repo-analysis-report.md` — Analysis of all 400 repos
- `mono-repo-structure.md` — Detailed structure spec
- `migration-runbook.md` — Step-by-step migration guide
- `ai-dev-loop-spec.md` — AI closed-loop development spec
- `ai-first-methodology.md` — AI-First engineering methodology (this framework)
- `ai-capability-maturity.md` — AI capability maturity model (Micro→Meso→Macro→General Relativity)
- `google-monorepo-lessons.md` — Google best practices reference ✅ DONE
- `codeowners-template.md` — CODEOWNERS file template
- `build-system-evaluation.md` — Bazel vs Turborepo vs Nx analysis
Open Questions
- Tech stack: What languages/frameworks are in the 400 repos? (affects build system choice)
- Team size: How many engineers will work in the mono-repo? (affects access control design)
- Current CI/CD: What’s the existing pipeline? (affects migration complexity)
- Deployment: How are services currently deployed? (affects infra design)
- Agent hosting: Where will the 400 agents run? (local cluster, cloud, hybrid?)
Appendix: AI-First Methodology
Why This Matters
Most AI engineering efforts stop at the Micro layer:
- Build some skills
- Add some MCP tools
- Improve individual workflows
This project goes further:
| Layer | What Changes | Outcome |
|---|---|---|
| Micro | Tools & workflows | Faster individual tasks |
| Meso | Feature ownership | AI delivers features end-to-end |
| Macro | System architecture | AI optimizes across services |
| General | Everything | AI runs the engineering org |
First Principles Reasoning
Question: What should AI be capable of in software engineering?
Answer: A good AI engineer should be able to:
- Understand the full system (not just one repo)
- Design improvements that span boundaries
- Implement, test, and deploy changes
- Monitor and iterate based on outcomes
Barrier: Fragmented codebases prevent #1.
Solution: Unified mono-repo designed for AI ownership.
Success Metrics
| Metric | Current | Target (6mo) | Target (12mo) |
|---|---|---|---|
| AI-completed features | 0% | 20% | 50% |
| AI-identified optimizations | 0% | 100/week | 500/week |
| AI-deployed changes | 0% | 10% | 40% |
| Human time on routine tasks | 60% | 30% | 10% |
| System-wide tech debt | High | Reduced 25% | Reduced 60% |
Last updated: Planning phase
“The goal is not to help humans do AI work. The goal is to have AI do the work, and humans define what matters.”