Mono-Repo Consolidation Plan
Agentic Engineering AI-First Initiative
“AI should be able to automatically complete a project from development to deployment.”
“Google proved monorepo scales to 2 billion lines. We’re building on that foundation with AI ownership.”
Overview
Goal: Consolidate 400+ repositories (~39GB) into an AI-friendly mono-repo with closed-loop development, testing, and progress management.
Strategic Context: This is not just a code consolidation — it’s a first-principles reimagining of AI-driven engineering. We’re building the foundation for AI to own the full lifecycle: architecture, development, testing, deployment, and iteration.
Inspired By: Google’s monorepo (2B LOC, 25K engineers, 45K commits/day)
Our Advantage: Google automated processes. We automate decisions with AI.
Timeline: Planning phase (1-2 days) → Execution phase (TBD)
AI-First Engineering Philosophy
Three Layers of AI-Driven Development
| Layer | Scope | Focus | This Project |
|---|---|---|---|
| Micro | Skills, MCP, Tools | Efficiency in existing systems | Foundation |
| Meso | Feature lifecycle | AI drives design→test→deploy | Core capability |
| Macro | System/org architecture | AI reorganizes everything | Ultimate goal |
Relativity Framework
Special Relativity (Near-term):
AI can automatically complete a single project: development, testing, deployment, launch
General Relativity (Ultimate):
AI unifies all company repositories, system architecture, deployment, modules — all deeply designed for AI ownership
This Project’s Place
Current State → Micro layer (tools, skills, MCP)
↓
This Project → Meso + Macro transition
↓
End State → General Relativity achieved
(AI owns full lifecycle across unified codebase)
Current State
Total Repos: ~400
Total Size: ~39GB
Categories:
- Products: TiDB, TiDB Next-Gen (database, storage, import/export tools)
- Platform: TiDB Cloud SaaS (control services, resource deployment, monitoring)
- DevOps: Online operations backend
- Forks: Third-party dependencies
- Abandoned: Unused projects
Problem: A fragmented codebase prevents AI from having full context.
- AI cannot optimize across repo boundaries.
- Human coordination overhead scales with repo count.
Phase 1: Repository Analysis (Distributed Agent Cluster)
1.1 Agent Architecture
┌─────────────────────────────────────────────────────────────┐
│ Orchestrator Agent │
│ - Coordinates 400+ repo agents │
│ - Aggregates analysis results │
│ - Makes merge recommendations │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Repo Agent │ │ Repo Agent │ │ Repo Agent │
│ (repo-001) │ │ (repo-002) │ │ (repo-400) │
└───────────────┘ └───────────────┘ └───────────────┘
1.2 Per-Repo Analysis Metrics
Each agent analyzes its repo for:
| Metric | Description | Weight |
|---|---|---|
| Freshness | Last commit date, activity frequency | High |
| Dependencies | Internal deps, external deps, circular refs | High |
| Code Quality | Test coverage, lint errors, tech debt | Medium |
| Documentation | README, API docs, architecture docs | Medium |
| Usage | Import count, deployment instances | High |
| Owner | Team ownership, maintenance status | Medium |
| Build System | CI/CD config, build scripts | Low |
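One way to fold the weighted metrics above into a single merge-priority score. The numeric weights (High=3, Medium=2, Low=1) and the [0, 1] normalization are illustrative assumptions, not settled policy:

```python
# Illustrative composite score for one repo. Each metric is assumed
# normalized to [0, 1]; weight values map High/Medium/Low to 3/2/1.
WEIGHTS = {
    "freshness": 3,      # High
    "dependencies": 3,   # High
    "usage": 3,          # High
    "code_quality": 2,   # Medium
    "documentation": 2,  # Medium
    "owner": 2,          # Medium
    "build_system": 1,   # Low
}

def merge_priority(metrics: dict) -> float:
    """Weighted average of normalized metric scores, in [0, 1]."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * metrics.get(k, 0.0) for k in WEIGHTS) / total
```

The orchestrator can then rank all 400 repos by this score to order the migration queue.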
1.3 Agent Implementation
# Agent spec (pseudo-code)
class RepoAgent:
    def __init__(self, repo_path, repo_id):
        self.repo_path = repo_path
        self.repo_id = repo_id

    def analyze(self):
        return {
            'freshness': self.check_freshness(),
            'dependencies': self.map_dependencies(),
            'code_quality': self.assess_quality(),
            'documentation': self.scan_docs(),
            'usage': self.detect_usage(),
            'merge_recommendation': self.recommend(),
        }
1.4 Distributed Execution Strategy
Challenge: 400+ agents running concurrently
Solution: Batched parallel execution
- Batch size: 50 agents (adjustable based on resources)
- Total batches: 8 (400/50)
- Estimated time per batch: 5-10 minutes
- Total analysis time: ~1-2 hours
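The batching strategy above can be sketched with a standard thread pool; `analyze_repo` stands in for the per-repo agent entry point, and the batch size should be tuned to available resources:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batched(repo_ids, analyze_repo, batch_size=50):
    """Run analyze_repo over all repos in fixed-size parallel batches.

    analyze_repo is a placeholder for the per-repo agent; batch_size=50
    mirrors the plan and bounds concurrent resource usage.
    """
    results = {}
    for start in range(0, len(repo_ids), batch_size):
        batch = repo_ids[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # Each batch runs fully in parallel; batches run sequentially.
            for repo_id, result in zip(batch, pool.map(analyze_repo, batch)):
                results[repo_id] = result
    return results
```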
Resource Requirements:
- CPU: 8+ cores recommended
- Memory: 16GB+ recommended
- Disk I/O: SSD preferred (39GB read operations)
Phase 2: Mono-Repo Design
2.1 Target Structure
mono-repo/
├── products/
│ ├── tidb/ # TiDB database core
│ │ ├── server/
│ │ ├── storage/
│ │ └── tools/
│ └── tidb-next/ # Next-gen database
│ ├── server/
│ ├── storage/
│ └── tools/
├── platform/
│ ├── cloud-saas/ # TiDB Cloud platform
│ │ ├── control-plane/
│ │ ├── resource-deploy/
│ │ ├── monitoring/
│ │ └── api-gateway/
│ └── shared-services/ # Cross-platform services
├── devops/
│ ├── ops-backend/ # Operations tools
│ ├── ci-cd/
│ └── deployment/
├── libs/ # Shared libraries
│ ├── common/
│ ├── utils/
│ └── protocols/
├── tools/ # Build/dev tools
├── docs/ # Centralized documentation
└── infra/ # Infrastructure as code
2.2 AI-Friendly Design Principles
- Clear Boundaries: Each component has well-defined interfaces
- Self-Contained: Components can be understood in isolation
- Documented Contracts: API specs, data schemas, protocols
- Testable: Clear test boundaries, mockable interfaces
- Versioned: Internal versioning for breaking changes
2.3 Build System
# Monorepo build orchestration
- Turborepo / Nx / Bazel (depending on tech stack)
- Incremental builds (only changed components)
- Parallel test execution
- Dependency graph visualization
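"Incremental builds (only changed components)" reduces to reverse reachability over the dependency graph: rebuild each changed component plus everything that transitively depends on it. A minimal sketch, assuming the reverse-dependency map comes from the build tool's metadata:

```python
def affected_components(changed, reverse_deps):
    """Components to rebuild: the changed set plus all transitive dependents.

    reverse_deps maps component -> set of components that depend on it;
    in practice this graph comes from Bazel/Turborepo/Nx metadata.
    """
    to_build, stack = set(), list(changed)
    while stack:
        comp = stack.pop()
        if comp in to_build:
            continue
        to_build.add(comp)
        # Anything depending on comp must also rebuild.
        stack.extend(reverse_deps.get(comp, ()))
    return to_build
```

For example, a change in `libs/common` triggers rebuilds of every product and platform service that imports it, while a change in a leaf service rebuilds only itself.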
Phase 3: Migration Strategy
3.1 Migration Priority
| Priority | Category | Criteria | Action |
|---|---|---|---|
| P0 | Active core products | High usage, active development | Migrate first |
| P1 | Platform services | Critical infrastructure | Migrate early |
| P2 | DevOps tools | Important but isolated | Migrate mid-phase |
| P3 | Low-activity repos | Minor usage, stable | Migrate late |
| P4 | Abandoned repos | No activity >1 year | Archive or delete |
| P5 | Forked dependencies | Third-party forks | Evaluate: keep upstream? |
3.2 Migration Process (Per Repo)
1. Pre-migration check
├── Dependency analysis
├── Conflict detection
└── Build verification
2. Code transfer
├── Preserve git history (git filter-repo)
├── Map to new structure
└── Update import paths
3. Integration
├── Update build configs
├── Fix dependency references
└── Run tests
4. Validation
├── CI/CD passes
├── Integration tests pass
└── Smoke tests in staging
5. Cutover
├── Update deployment configs
├── Switch CI/CD to mono-repo
└── Archive old repo (read-only)
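Step 2's history-preserving transfer could be driven like this. `--to-subdirectory-filter` is git filter-repo's real flag for moving all history under a path prefix; returning command lists (rather than executing them) keeps the pre-migration dry run cheap. The `incoming` remote name and `main` default branch are assumptions:

```python
def migration_commands(repo_path, target_subdir, default_branch="main"):
    """Commands for the history-preserving transfer of one repo.

    Returns command lists instead of running them, so the runbook's
    pre-migration check can log and review them first.
    """
    return [
        # Run inside the source repo: rewrite history so every path
        # gains the target prefix (e.g. products/tidb/).
        ["git", "filter-repo", "--to-subdirectory-filter", target_subdir],
        # Run inside the mono-repo: pull in the rewritten history.
        ["git", "remote", "add", "incoming", repo_path],
        ["git", "fetch", "incoming"],
        ["git", "merge", "--allow-unrelated-histories",
         f"incoming/{default_branch}"],
        ["git", "remote", "remove", "incoming"],
    ]
```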
3.3 Estimated Timeline
| Phase | Repos | Duration |
|---|---|---|
| Planning & Analysis | All 400 | 2 days |
| P0 Migration (core) | ~50 | 3-5 days |
| P1 Migration (platform) | ~100 | 5-7 days |
| P2-P3 Migration | ~150 | 7-10 days |
| P4-P5 Cleanup | ~100 | 2-3 days |
| Total | 400 | ~3-4 weeks |
Phase 4: AI Closed-Loop Development
4.1 The AI-First Vision
This mono-repo is designed to enable General Relativity: AI owns the full system lifecycle.
┌─────────────────────────────────────────────────────────────────────┐
│ AI Ownership Spectrum │
│ │
│ Micro Meso Macro General Rel. │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ Tools Feature System AI owns │
│ & Skills Lifecycle Architecture Everything │
│ │
│ [Current] [Phase 4.2] [Phase 4.3] [End State] │
└─────────────────────────────────────────────────────────────────────┘
4.2 Development Loop (Meso Layer)
┌─────────────────────────────────────────────────────────────┐
│ AI Development Loop │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Plan │───▶│ Code │───▶│ Test │───▶│ Review │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ ▲ │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Progress Management │ │
│ │ - Task tracking │ │
│ │ - Sprint planning │ │
│ │ - Blocker detection │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
4.2.1 AI Capabilities
| Capability | Description | Implementation |
|---|---|---|
| Code Generation | Generate features, fixes, refactors | LLM + context from repo |
| Test Generation | Auto-generate unit/integration tests | Coverage-guided |
| Code Review | Automated PR review, style checks | Static analysis + LLM |
| Bug Detection | Identify potential issues | Pattern matching + ML |
| Documentation | Auto-generate/update docs | Code → docs extraction |
| Progress Tracking | Sprint planning, task estimation | Historical data + LLM |
4.3 System Architecture Ownership (Macro Layer)
AI Reorganizes System Architecture:
┌─────────────────────────────────────────────────────────────────┐
│ AI-Designed System Architecture │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Product │ │ Platform │ │ DevOps │ │
│ │ Services │◀───▶│ Services │◀───▶│ Services │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ▲ ▲ ▲ │
│ └────────────────────┼────────────────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ AI Orchestrator│ │
│ │ - Discovers │ │
│ │ - Optimizes │ │
│ │ - Refactors │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
AI Capabilities at Macro Layer:
- Architecture Discovery: Map service dependencies, data flows, bottlenecks
- Automated Refactoring: Identify and execute cross-service improvements
- Interface Optimization: Evolve APIs based on usage patterns
- Tech Debt Management: Prioritize and fix systemic issues
4.4 Deployment & Operations Ownership (General Relativity)
AI-Managed Infrastructure:
# Auto-scaling policies (AI-optimized)
resource_policies:
  - service: control-plane
    scaling:
      min_instances: 3
      max_instances: 50
      metrics: [cpu, memory, request_latency]
      ai_optimizer: enabled
  - service: resource-deploy
    multi_region:
      regions: [us-east, eu-west, ap-southeast]
      ai_routing: enabled  # AI decides optimal region
AI Responsibilities:
- Predict load patterns
- Auto-scale before traffic spikes
- Optimize resource allocation across regions
- Detect and respond to anomalies
- Cost optimization (right-sizing, spot instances)
- Self-healing: Automatic incident response and recovery
- Continuous Optimization: A/B test deployments, rollback on metrics
4.5 End State: General Relativity Achieved
┌─────────────────────────────────────────────────────────────────┐
│ General Relativity: AI Owns Everything │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Unified Codebase │ │
│ │ (400 repos → 1 mono-repo, AI-readable, AI-optimizable) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ AI Dev │ │ AI Ops │ │ AI Org │ │
│ │ - Designs │ │ - Deploys │ │ - Plans │ │
│ │ - Codes │ │ - Scales │ │ - Staffs │ │
│ │ - Tests │ │ - Monitors │ │ - Allocates│ │
│ │ - Reviews │ │ - Heals │ │ - Optimizes│ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Result: Human engineers focus on strategy, creativity, │
│ and high-level problem definition. │
│ AI handles execution at all layers. │
└─────────────────────────────────────────────────────────────────┘
Phase 5: Technical Considerations
5.1 Google Monorepo Lessons (Proven at 2 Billion LOC)
Key Insights from Google’s Playbook:
| Principle | Google’s Approach | TiDB Application |
|---|---|---|
| Single Source of Truth | One repo for 95% of codebase | All 400 repos → 1 mono-repo |
| Trunk-Based Development | Direct commits to main, pre-commit review | Adopt from day 1 |
| Code Ownership | Default open, CODEOWNERS enforcement | Directory-based ownership |
| Build System | Bazel (incremental, remote cache) | Bazel/Turborepo/Nx based on stack |
| Dependency Mgmt | Single version graph, automated updates | Dependency visualization tool |
| Code Review | Automated pre-checks + OWNERS | GitHub/GitLab CODEOWNERS |
| Infrastructure | Piper + CitC (partial checkout) | Git + shallow clones + sparse checkout |
Google’s Scale (for reference):
- 2 billion lines of code
- 25,000+ engineers
- 45,000 commits/day
- 86 TB storage
- Automation does 24,000 commits/day
Our AI Advantage: Google automated processes. We automate decisions.
5.2 Scale Challenges
| Challenge | Solution | Google Reference |
|---|---|---|
| Git repo size | git-lfs, shallow clones, sparse checkout | CitC (partial checkout) |
| Build time | Incremental builds, remote caching | Bazel |
| CI/CD complexity | Path-based triggering | Automated pre-commit checks |
| Code ownership | CODEOWNERS file, clear boundaries | OWNERS files per workspace |
| Access control | Fine-grained permissions per directory | Default open, exceptions restricted |
| Search speed | Sourcegraph / Zoekt | CodeSearch engine |
| Dependency hell | Dependency graph visualization | Single version, automated updates |
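Path-based CI triggering, sketched as a GitHub Actions workflow (`on.pull_request.paths` is the real filter key; the workflow name, paths, and build command are illustrative):

```yaml
# .github/workflows/tidb.yml — run TiDB jobs only when TiDB code changes
name: tidb-ci
on:
  pull_request:
    paths:
      - "products/tidb/**"
      - "libs/**"   # shared libs can break any product
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make -C products/tidb test   # placeholder build entry point
```

Each top-level directory gets its own workflow with its own path filter, so a docs-only change never triggers a database build.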
5.3 Tooling Requirements
| Category | Tools | Recommendation |
|---|---|---|
| Build System | Bazel, Turborepo, Nx | Based on tech stack (see below) |
| Code Search | Sourcegraph, Zoekt | Sourcegraph (enterprise) or Zoekt (open) |
| Dependency Viz | Custom + graph DB | Build custom tool |
| CI/CD | GitHub Actions, GitLab CI | Path filtering required |
| Agent Framework | LangChain, AutoGen, custom | Custom (tuned for repo analysis) |
| Version Control | Git | Standard Git + sparse checkout |
Build System by Tech Stack:
Go → Bazel or Please
TypeScript → Turborepo or Nx
Java → Bazel or Gradle
Python → Bazel or Pants
Mixed → Bazel (most flexible)
5.4 Risk Mitigation
| Risk | Mitigation | Google Parallel |
|---|---|---|
| Data loss | Full backups before each batch | Piper (distributed storage) |
| Downtime | Parallel run (old + new) | Release branches + feature flags |
| Broken builds | Comprehensive tests, canary deploys | Pre-commit verification |
| Team disruption | Gradual migration, training | Trunk-based culture |
| Rollback needed | Keep old repos read-only 30 days | Release branch rollback |
| Performance | Incremental builds, caching | Bazel remote cache |
5.5 Trunk-Based Development Model (Google Standard)
main (trunk)
│
├── All developers commit directly to main
├── Pre-commit code review required
├── Automated checks run before merge
│
└── release/v1.0 (branch for deployment only)
└── Feature flags control visibility
Rules:
- No long-lived feature branches
- All changes reviewed before merge (pre-commit)
- Small, frequent commits (not big bangs)
- Feature flags for incomplete features
- Release branches are for deployment, not development
Benefits:
- No merge nightmares
- Early conflict detection
- Continuous delivery enabled
- AI can safely make small, incremental changes
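The feature-flag rule above can start as a default-off gate; `FLAGS` here is a stand-in for whatever config service is actually used, and the import paths are stubs:

```python
# Minimal default-off feature flag gate. FLAGS stands in for a real
# config service; unknown flags read as disabled, so partially merged
# code stays dark on trunk until explicitly enabled.
FLAGS = {"tidb-next-importer": False}

def flag_enabled(name: str) -> bool:
    return FLAGS.get(name, False)

def run_import(data):
    if flag_enabled("tidb-next-importer"):
        return ("next", data)     # stub for the incomplete new path
    return ("legacy", data)       # stub for the proven path
```

This is what lets both humans and AI merge small, incomplete increments to `main` without exposing them in production.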
5.6 CODEOWNERS Structure
# Root CODEOWNERS file
# Format: path_pattern @owner1 @owner2
# Note: a trailing "/" matches the directory recursively;
# "dir/*" would match only its direct children.
# Products
products/tidb/ @tidb-core-team @database-leads
products/tidb-next/ @tidb-next-team @architecture-review
# Platform
platform/cloud-saas/ @cloud-platform-team @platform-leads
platform/shared-services/ @platform-architects
# DevOps
devops/ @devops-team @sre-leads
# Shared Libraries (high scrutiny)
libs/ @platform-architects @tech-leads
# Infrastructure
infra/ @infra-team @security-review
# Build/Tooling
tools/ @devex-team
BUILD @build-maintainers
Review Policies:
- `libs/` requires 2 approvals (shared code impact)
- `products/` requires 1 approval + team lead
- `devops/` requires 1 approval + on-call SRE
- Security-sensitive paths require security team approval
Next Steps (Planning Phase)
Day 1: Analysis Framework
- Set up distributed agent infrastructure
- Define analysis metrics and scoring
- Create repo inventory (list all 400 repos)
- Run pilot analysis on 10 repos
Day 2: Mono-Repo Design
- Finalize directory structure
- Design build system architecture
- Plan migration tooling
- Create detailed migration runbook
Deliverables
- `repo-analysis-report.md` — Analysis of all 400 repos
- `mono-repo-structure.md` — Detailed structure spec
- `migration-runbook.md` — Step-by-step migration guide
- `ai-dev-loop-spec.md` — AI closed-loop development spec
- `ai-first-methodology.md` — AI-First engineering methodology (this framework)
- `ai-capability-maturity.md` — AI capability maturity model (Micro→Meso→Macro→General Relativity)
- `google-monorepo-lessons.md` — Google best practices reference ✅ DONE
- `codeowners-template.md` — CODEOWNERS file template
- `build-system-evaluation.md` — Bazel vs Turborepo vs Nx analysis
Open Questions
- Tech stack: What languages/frameworks are in the 400 repos? (affects build system choice)
- Team size: How many engineers will work in the mono-repo? (affects access control design)
- Current CI/CD: What’s the existing pipeline? (affects migration complexity)
- Deployment: How are services currently deployed? (affects infra design)
- Agent hosting: Where will the 400 agents run? (local cluster, cloud, hybrid?)
Appendix: AI-First Methodology
Why This Matters
Most AI engineering efforts stop at the Micro layer:
- Build some skills
- Add some MCP tools
- Improve individual workflows
This project goes further:
| Layer | What Changes | Outcome |
|---|---|---|
| Micro | Tools & workflows | Faster individual tasks |
| Meso | Feature ownership | AI delivers features end-to-end |
| Macro | System architecture | AI optimizes across services |
| General | Everything | AI runs the engineering org |
First Principles Reasoning
Question: What should AI be capable of in software engineering?
Answer: A good AI engineer should be able to:
- Understand the full system (not just one repo)
- Design improvements that span boundaries
- Implement, test, and deploy changes
- Monitor and iterate based on outcomes
Barrier: Fragmented codebases prevent #1.
Solution: Unified mono-repo designed for AI ownership.
Success Metrics
| Metric | Current | Target (6mo) | Target (12mo) |
|---|---|---|---|
| AI-completed features | 0% | 20% | 50% |
| AI-identified optimizations | 0% | 100/week | 500/week |
| AI-deployed changes | 0% | 10% | 40% |
| Human time on routine tasks | 60% | 30% | 10% |
| System-wide tech debt | High | Reduced 25% | Reduced 60% |
Last updated: Planning phase
“The goal is not to help humans do AI work. The goal is to have AI do the work, and humans define what matters.”