Code health
A 1–10 health score for every file from 25 deterministic biomarkers — complexity, cohesion, duplication, coverage, and git-behavioural risk. Zero LLM calls, defect-calibrated weights, validated to out-rank the leading commercial code-health tool at predicting real bugs.
Graph and wiki layers have competition now. Code health is where repowise is unique — and unlike most "code quality" scores, this one is calibrated against real defect history and benchmarked head-to-head against the leading commercial tool in the space.
Repowise scores every file from 1 to 10 using 25 deterministic biomarkers computed over tree-sitter ASTs and git history. No LLM calls, no cloud requirement, no new runtime dependencies — pure Python that finishes in under 30 seconds on a 3,000-file repo.
The biomarkers
Each file starts at 10.0; biomarker findings deduct from the score, with a cap per category so no single category can dominate.
| Category | Biomarkers |
|---|---|
| Structural complexity | brain_method, nested_complexity, bumpy_road, complex_conditional, complex_method, large_method, primitive_obsession |
| Cohesion & size | low_cohesion (LCOM4), god_class |
| Duplication | dry_violation (native Rabin–Karp clone detection) |
| Test coverage | untested_hotspot, coverage_gap, coverage_gradient |
| Test quality | large_assertion_block, duplicated_assertion_block |
| Organizational / git | developer_congestion, knowledge_loss, hidden_coupling, function_hotspot, code_age_volatility, ownership_risk, churn_risk, change_entropy, co_change_scatter, prior_defect |
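A minimal sketch of the start-at-10, deduct-and-cap rule, assuming each finding arrives as a (category, deduction) pair. The category keys, the cap values in `CATEGORY_CAPS`, and the floor at 1.0 are illustrative assumptions; the shipped deductions and caps aren't documented on this page.

```python
from collections import defaultdict

# Hypothetical caps per category; the shipped values are not published here.
CATEGORY_CAPS = {
    "structural_complexity": 4.0,
    "cohesion_size": 2.0,
    "duplication": 1.5,
    "test_coverage": 2.5,
    "test_quality": 1.0,
    "organizational": 4.0,
}


def health_score(findings):
    """Score one file from (category, deduction) pairs, capping each category."""
    per_category = defaultdict(float)
    for category, deduction in findings:
        per_category[category] += deduction
    # A category never deducts more than its cap, however many findings it has.
    total = sum(min(amount, CATEGORY_CAPS[cat]) for cat, amount in per_category.items())
    # Start at 10.0 and keep the result inside the 1-10 range.
    return max(1.0, 10.0 - total)


# Two structural findings (5.5 total) hit the 4.0 cap; duplication adds 0.5.
print(health_score([
    ("structural_complexity", 3.0),
    ("structural_complexity", 2.5),
    ("duplication", 0.5),
]))  # 5.5
```

Capping per category means a pile of findings in one category can only pull the score down so far, which is the "no single category can dominate" property described above.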
The three repo-level KPIs:
- Hotspot Health — NLOC-weighted average over the top-25% hotspot files.
- Average Health — NLOC-weighted average over all files.
- Worst Performer — the single lowest-scoring file.
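The three KPIs can be sketched as below. `FileHealth` and its `hotspot` field are hypothetical: the page says Hotspot Health covers the top 25% of hotspot files but doesn't define the ranking key here, so the sketch takes it as an input. Integer division picks the top quarter, with a minimum of one file.

```python
from dataclasses import dataclass


@dataclass
class FileHealth:
    path: str
    score: float    # 1-10 health score
    nloc: int       # non-comment lines of code (total assumed > 0)
    hotspot: float  # hypothetical hotspot ranking key (higher = hotter)


def nloc_weighted_average(files):
    total = sum(f.nloc for f in files)
    return sum(f.score * f.nloc for f in files) / total


def repo_kpis(files):
    ranked = sorted(files, key=lambda f: f.hotspot, reverse=True)
    top_quarter = ranked[: max(1, len(ranked) // 4)]
    return {
        "hotspot_health": nloc_weighted_average(top_quarter),
        "average_health": nloc_weighted_average(files),
        "worst_performer": min(files, key=lambda f: f.score).path,
    }


files = [
    FileHealth("api/views.py", 4.0, 300, 50.0),
    FileHealth("core/models.py", 6.0, 500, 30.0),
    FileHealth("utils/text.py", 8.0, 100, 10.0),
    FileHealth("cli.py", 9.0, 100, 5.0),
]
print(repo_kpis(files))
```

Weighting by NLOC means a large, unhealthy file moves the average more than a small one with the same score.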
Calibrated, not hand-tuned
The biomarker weights are learned offline from a real defect corpus. Each file is scored at T0, the commit immediately before a 6-month defect window, so the measurement can't leak future information. An L2-regularized logistic regression, with file size (NLOC) as an explicit control, then fits each biomarker's defect lift beyond size. Only the learned constants ship; the runtime stays fully deterministic.
The strongest calibrated predictors are co_change_scatter, change_entropy, ownership_risk, and nested_complexity.
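The calibration idea can be illustrated with a small, dependency-free L2-regularized logistic regression fitted by batch gradient descent. This is a sketch, not the offline pipeline: the synthetic data, the log(NLOC) transform for the size control, the standardization, and the hyperparameters (`l2`, `lr`, `epochs`) are all assumptions. What it shows is that once log(NLOC) sits in the design matrix, a biomarker's coefficient measures defect lift beyond size.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(columns):
    """Z-score each feature column and return row-major samples."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        scaled.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled)]


def fit_l2_logistic(X, y, l2=1.0, lr=0.5, epochs=500):
    """Minimize mean log-loss + (l2 / 2n) * ||w||^2; the intercept is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj + l2 * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def demo(seed=0, n_files=400):
    """Synthetic corpus: size, one real biomarker, and one noise biomarker."""
    rng = random.Random(seed)
    nloc = [rng.randint(20, 2000) for _ in range(n_files)]
    signal = [rng.random() for _ in range(n_files)]
    noise = [rng.random() for _ in range(n_files)]
    # Defect risk grows with size AND with the signal biomarker; noise has no effect.
    labels = [
        1 if rng.random() < sigmoid(-2.0 + 0.8 * math.log(n / 200) + 3.0 * s) else 0
        for n, s in zip(nloc, signal)
    ]
    X = standardize([[math.log(n) for n in nloc], signal, noise])
    w, _ = fit_l2_logistic(X, labels)
    return {"log_nloc": w[0], "signal": w[1], "noise": w[2]}


print(demo())
```

In a real pipeline a library solver (for example scikit-learn's `LogisticRegression`, whose default penalty is L2) would replace the hand-rolled loop.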
Does the score predict real bugs?
Yes — validated across 21 open-source repositories spanning all nine Full-tier languages (Python, TypeScript, JavaScript, Java, Kotlin, Go, Rust, C++, C#):
| Result | Value |
|---|---|
| Cross-project mean ROC AUC | 0.74 [95% CI 0.68–0.79] (up to 0.90 on individual repos) |
| Survives controlling for file size | partial Spearman ρ = −0.16 |
| Beats recent-churn baseline | +0.10 AUC (DeLong p < 1e-9) |
| Beats prior-defect baseline | +0.12 AUC |
| External, never-seen dataset (PROMISE/jEdit) | AUC 0.76–0.78 |
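ROC AUC can be read as a probability: pick one file that later had a defect and one that didn't; AUC is the chance the defective one has the lower health score. A minimal pairwise sketch, assuming lower health means higher predicted risk and counting ties as half a win. It is O(n²) for clarity; rank-based formulas give the same number in O(n log n).

```python
def roc_auc(health_scores, defective):
    """P(defective file has lower health than a clean file); ties count 0.5."""
    bad = [s for s, d in zip(health_scores, defective) if d]
    good = [s for s, d in zip(health_scores, defective) if not d]
    if not bad or not good:
        raise ValueError("need at least one defective and one clean file")
    wins = 0.0
    for b in bad:
        for g in good:
            if b < g:
                wins += 1.0  # risky file correctly ranked below the clean one
            elif b == g:
                wins += 0.5  # tie: no information either way
    return wins / (len(bad) * len(good))


# Files 1 and 4 later had defects; one clean file (3.0) scores below one of them (8.0).
print(roc_auc([2.0, 9.0, 3.0, 8.0], [True, False, False, True]))  # 0.75
```

An AUC of 0.5 is a coin flip, so the 0.74 cross-project mean means a defective file scores below a clean one roughly three times out of four.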
Head-to-head vs the leading commercial tool
On the same 2,770 files across 9 languages, scored at the same leakage-free commit against the same defect labels, with paired significance tests:
| Axis | repowise | Leading commercial tool |
|---|---|---|
| Recall @ 20%-of-lines budget | 0.173 | 0.074 |
| Effort-aware ranking (Popt) | 0.607 | 0.462 |
| Defect density, size-normalized (defects/KLOC) | 2.18× | 0.56× |
| Discrimination (ROC AUC) | 0.731 | 0.705 |
Under a fixed review budget, ranking by repowise health surfaces 2.3× as many defects as the commercial tool (Popt Δ +0.144, recall Δ +0.098, and the density Δ; all paired and significant at p = 0.003).
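The two effort-aware metrics in the table can be sketched as follows, assuming each file is a (health, nloc, defects) tuple and that reviewers read the lowest-health files first. `recall_at_effort` stops at the first file that would overrun the 20%-of-lines budget (one of several conventions). `popt` uses the normalized form 1 − (Area_opt − Area_model) / (Area_opt − Area_worst) over the cumulative %lines vs %defects curve, where the optimal and worst orderings sort by actual defect density. Neither function is repowise-bench's code.

```python
def recall_at_effort(files, budget=0.2):
    """Share of all defects found by reviewing lowest-health files first
    until `budget` (a fraction of total lines) would be exceeded."""
    total_lines = sum(nloc for _, nloc, _ in files)
    total_defects = sum(d for _, _, d in files)
    spent = found = 0
    for _, nloc, defects in sorted(files, key=lambda f: f[0]):
        if spent + nloc > budget * total_lines:
            break
        spent += nloc
        found += defects
    return found / total_defects


def _area(ordered, total_lines, total_defects):
    """Trapezoidal area under the cumulative (%lines, %defects) curve."""
    area = x = y = 0.0
    for _, nloc, defects in ordered:
        nx, ny = x + nloc / total_lines, y + defects / total_defects
        area += (nx - x) * (y + ny) / 2
        x, y = nx, ny
    return area


def _density(f):
    return f[2] / f[1]  # defects per line (nloc assumed > 0)


def popt(files):
    """Normalized effort-aware ranking score: 1.0 = optimal order, 0.0 = worst order."""
    total_lines = sum(nloc for _, nloc, _ in files)
    total_defects = sum(d for _, _, d in files)
    model = sorted(files, key=lambda f: f[0])            # lowest health first
    optimal = sorted(files, key=_density, reverse=True)  # densest defects first
    worst = sorted(files, key=_density)                  # densest defects last
    a_model, a_opt, a_worst = (
        _area(order, total_lines, total_defects) for order in (model, optimal, worst)
    )
    return 1 - (a_opt - a_model) / (a_opt - a_worst)


files = [(2.0, 100, 2), (4.0, 500, 1), (6.0, 100, 0), (9.0, 300, 1)]
print(recall_at_effort(files))  # 0.5
print(popt(files))
```

Both metrics charge the reviewer by lines read, which is why a ranking that surfaces small, defect-dense files early scores well even if its plain AUC is only slightly higher.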
Full methodology, confidence intervals, and reproduction steps live in repowise-bench — the health-defect report and the head-to-head comparison.
Using it
    repowise health                          # KPIs + lowest-scoring files
    repowise health --coverage cov.lcov      # ingest LCOV/Cobertura/Clover → untested-hotspot
    repowise health --refactoring-targets    # ranked by impact / effort
    repowise health --trend                  # snapshots + declining / predicted-decline alerts

- Coverage ingestion: LCOV, Cobertura, Clover, or normalized JSON light up the test-coverage biomarkers.
- Trend tracking: a rolling 50-row snapshot history powers the Declining Health and Predicted Decline alerts.
- Refactoring targets: deterministic, rule-based, ranked by impact / effort.
- Per-file overrides: .repowise/health-rules.json disables biomarkers per glob.
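Coverage ingestion starts with parsing. The sketch below reads the line-level `DA:<line>,<hits>` records of an LCOV tracefile, grouped under `SF:<path>` and closed by `end_of_record`, into per-file (covered, instrumented) line counts, summing hits when a line appears in more than one record. It handles only LCOV (not Cobertura, Clover, or the normalized JSON), and how repowise turns these counts into untested_hotspot or coverage_gap findings isn't shown here.

```python
from collections import defaultdict


def parse_lcov(text):
    """Map each SF: path to (covered_lines, instrumented_lines) from DA: records."""
    hits = defaultdict(dict)  # path -> {line_number: total execution count}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = line[3:]
        elif line.startswith("DA:") and current is not None:
            # DA:<line>,<count>[,<checksum>]; the optional checksum is ignored.
            fields = line[3:].split(",")
            lineno, count = int(fields[0]), int(fields[1])
            hits[current][lineno] = hits[current].get(lineno, 0) + count
        elif line == "end_of_record":
            current = None
    return {
        path: (sum(1 for c in lines.values() if c > 0), len(lines))
        for path, lines in hits.items()
    }


SAMPLE = """TN:
SF:src/app.py
DA:1,1
DA:2,0
DA:3,5
LF:3
LH:2
end_of_record
SF:src/util.py
DA:1,0
end_of_record
"""
print(parse_lcov(SAMPLE))  # {'src/app.py': (2, 3), 'src/util.py': (0, 1)}
```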
Your agent reaches the same data through the get_health MCP tool, and a single-line summary shows up in repowise status. Full CLI reference: repowise health.
How it connects to the other layers
Code health isn't a silo — it reuses signals from every other layer:
- Git feeds the organizational biomarkers (ownership, churn, co-change scatter, knowledge loss).
- Graph feeds centrality (a brain_method must be central, not just long) and hidden_coupling.
- Decisions surface as ungoverned_hotspot and stale_governance health findings.
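hidden_coupling is where the git and graph layers meet. A common definition, assumed here because the page doesn't spell out repowise's exact rule, is a pair of files that repeatedly change in the same commit but have no dependency edge between them. The sketch takes commits as sets of changed paths and the graph's dependency edges as (a, b) pairs; `min_support` is a hypothetical threshold.

```python
from collections import Counter
from itertools import combinations


def hidden_coupling(commits, dependency_edges, min_support=3):
    """Return ((a, b), count) for file pairs that co-change in at least
    `min_support` commits but have no dependency edge in either direction."""
    co_changes = Counter()
    for changed in commits:
        for pair in combinations(sorted(changed), 2):
            co_changes[pair] += 1
    linked = {tuple(sorted(edge)) for edge in dependency_edges}
    return sorted(
        (pair, count)
        for pair, count in co_changes.items()
        if count >= min_support and pair not in linked
    )


commits = [{"a.py", "b.py"}] * 3 + [{"a.py", "c.py"}] * 3 + [{"b.py", "d.py"}]
edges = {("c.py", "a.py")}  # c.py imports a.py, so that coupling is visible
print(hidden_coupling(commits, edges))  # [(('a.py', 'b.py'), 3)]
```

Real tools usually also skip very large commits (bulk reformats, vendoring), which would otherwise pair every file with every other; that filter is left out here for brevity.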
That's why a repowise health score means more than a complexity linter's: it knows not just that a file is complex, but whether it is also central, churned by many hands, and untested.