# Geometry Extraction

The project now includes a local, dependency-free mmCIF extraction script:

```text
scripts/extract_structure_geometry.py
```

It reads structure targets from:

```text
data/cofactor-targets.json
```

and writes:

```text
data/extracted-geometry.json
src/model/extracted-geometry.js
```
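The extractor is described as dependency-free, so it has to read mmCIF coordinates without a structure library. A minimal sketch of that step could parse only the `_atom_site` loop into one dict per atom row. The function name `parse_atom_site` is hypothetical, and the actual script may tokenize differently. This sketch assumes one row per line and does not handle semicolon-delimited multi-line values.

```python
import re

# One CIF token: a single- or double-quoted value (a closing quote only counts
# when followed by whitespace or end of line), or a bare run of non-space chars.
_TOKEN = re.compile(r"""'(.*?)'(?=\s|$)|"(.*?)"(?=\s|$)|(\S+)""")


def parse_atom_site(text):
    """Return one dict per row of the first _atom_site loop in an mmCIF string.

    Simplifications (assumptions of this sketch): semicolon-delimited
    multi-line values are not handled, and every row must fit on one line.
    """
    columns, rows = [], []
    in_header = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "loop_":
            if rows:
                break  # a new loop started after the atom_site rows
            in_header, columns = True, []
            continue
        if in_header and line.startswith("_atom_site."):
            columns.append(line.split(".", 1)[1])  # e.g. "Cartn_x"
            continue
        in_header = False
        if not columns:
            continue  # still outside the _atom_site loop
        if not line or line.startswith(("#", "_", "data_")):
            if rows:
                break  # end of the atom_site loop
            continue
        values = [a or b or c for a, b, c in _TOKEN.findall(line)]
        rows.append(dict(zip(columns, values)))
    return rows
```

Quoted atom names such as `"C1'"` are common in ligand records, which is why the tokenizer follows the CIF rule that a quote only closes a value when whitespace follows it.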

## Current Result

The first extraction targets are:

- RCSB PDB `5XTD`, human respiratory Complex I
- RCSB PDB `8Q1U`, bovine Complex I with ubiquinone-10 (`U10`)

The script found:

- FMN
- six `SF4` clusters
- two `FES` clusters
- cardiolipin and other bound lipids/ligands
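Counts like "six `SF4` clusters" come from grouping hetero atoms into residue instances and tallying their component IDs. The sketch below shows one way to do that from `_atom_site` row dicts. The residue key `(label_asym_id, label_comp_id, auth_seq_id)` and the helper names are assumptions, and it assumes a single model with no alternate locations.

```python
from collections import Counter, defaultdict


def group_hetero_residues(rows):
    """Group HETATM rows (dicts keyed by _atom_site column names) into residues.

    Key: (label_asym_id, label_comp_id, auth_seq_id). Value: list of
    (atom_name, (x, y, z)). Assumes a single model and no alternate locations.
    """
    residues = defaultdict(list)
    for row in rows:
        if row.get("group_PDB") != "HETATM":
            continue  # skip protein/nucleic-acid ATOM records
        key = (row["label_asym_id"], row["label_comp_id"], row["auth_seq_id"])
        xyz = tuple(float(row[c]) for c in ("Cartn_x", "Cartn_y", "Cartn_z"))
        residues[key].append((row["label_atom_id"], xyz))
    return dict(residues)


def count_comp_ids(residues):
    """Count residue instances per chemical component ID (e.g. SF4, FES, FMN)."""
    return Counter(comp_id for _, comp_id, _ in residues)
```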

The script did not assign a Q node for `5XTD` because none of the configured ubiquinone-like component IDs (`comp_id` values) were present. This is expected for some structures, and a Q node should not be forced.
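The "do not force it" rule amounts to returning nothing when no configured component ID matches, rather than falling back to the nearest lipid or ligand. A minimal sketch, where `find_q_node` and the `q_comp_ids` set are hypothetical names for the configured lookup:

```python
def find_q_node(residue_keys, q_comp_ids):
    """Return the residue key to use as the Q node, or None if no match.

    residue_keys: iterable of (asym_id, comp_id, seq_id) tuples.
    q_comp_ids: configured ubiquinone-like component IDs, e.g. {"U10"}.
    No fallback is attempted: a missing Q ligand stays missing.
    """
    matches = sorted(key for key in residue_keys if key[1] in q_comp_ids)
    # Several copies may be modelled; pick deterministically and leave the
    # choice of the catalytically relevant one to manual review.
    return matches[0] if matches else None
```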

The comparative bovine `8Q1U` pass did identify `U10` and computed a comparative `N2 -> Q` nearest-ligand-atom distance. The app marks this edge with an asterisk because it is not a direct human `5XTD` measurement and still needs redox-active headgroup validation.
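A nearest-ligand-atom distance is the minimum Euclidean distance over all atom pairs between two residues. The sketch below returns that distance together with the two atom names that produce it, which is how an edge can be reported as running from one named atom to another. The function name is hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import product


def nearest_atom_distance(atoms_a, atoms_b):
    """Return (distance, name_a, name_b) for the closest pair of atoms.

    atoms_a, atoms_b: lists of (atom_name, (x, y, z)); both must be non-empty.
    Brute force over all pairs, which is fine for cofactor-sized residues.
    """
    best = None
    for (name_a, xyz_a), (name_b, xyz_b) in product(atoms_a, atoms_b):
        d = math.dist(xyz_a, xyz_b)
        if best is None or d < best[0]:
            best = (d, name_a, name_b)
    if best is None:
        raise ValueError("both atom lists must be non-empty")
    return best
```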

The extractor now applies redox-atom filters:

- Fe-S centers: `FE1-FE4` and `S1-S4`
- FMN isoalloxazine region: ring/redox atoms, excluding most tail atoms
- U10/Q headgroup: `C1-C6` and `O2-O5`
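The filters above can be expressed as a per-component predicate on atom names, applied before any distance is measured. In this sketch the Fe-S and U10 patterns follow the list above, while the FMN ring set is an assumed list of standard isoalloxazine atom names, not necessarily the script's own filter. `REDOX_ATOM_FILTERS` and `redox_atoms` are hypothetical names.

```python
import re

# Fe-S centers: FE1-FE4 and S1-S4 (FES clusters only use FE1/FE2/S1/S2).
_FE_S = re.compile(r"FE[1-4]|S[1-4]")
# U10/Q headgroup: C1-C6 and O2-O5.
_Q_HEAD = re.compile(r"C[1-6]|O[2-5]")
# FMN isoalloxazine ring plus its carbonyl oxygens. ASSUMED atom set for this
# sketch; check it against the script's actual FMN filter.
_FMN_RING = {"N1", "C2", "O2", "N3", "C4", "O4", "C4A", "N5", "C5A",
             "C6", "C7", "C8", "C9", "C9A", "N10", "C10"}

REDOX_ATOM_FILTERS = {
    "SF4": lambda name: _FE_S.fullmatch(name) is not None,
    "FES": lambda name: _FE_S.fullmatch(name) is not None,
    "U10": lambda name: _Q_HEAD.fullmatch(name) is not None,
    "FMN": lambda name: name in _FMN_RING,
}


def redox_atoms(comp_id, atoms):
    """Keep only the redox-relevant atoms of one residue.

    atoms: list of (atom_name, (x, y, z)). Components without a configured
    filter are returned unchanged (a design choice of this sketch).
    """
    keep = REDOX_ATOM_FILTERS.get(comp_id)
    if keep is None:
        return list(atoms)
    return [(name, xyz) for name, xyz in atoms if keep(name)]
```

Using `fullmatch` matters here: a plain `match` would accept `C10` because it starts with `C1`.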

For `8Q1U`, the comparative `N2 -> Q` edge remains `24.7 Å` after this filtering, measured from `S2` on the N2-assigned cluster to U10 `O3`. The app flags it as a long comparative edge because it lies outside the usual `4-14 Å` window for a productive single electron-transfer hop.
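The flagging rule can be sketched as a small classifier that checks a distance against the window and appends an asterisk for comparative edges. `classify_edge`, the flag strings, and the label format are hypothetical; only the `4-14 Å` window and the asterisk convention come from the text.

```python
# Usual range for a productive single-hop electron transfer, as quoted in the text (Å).
PRODUCTIVE_HOP_WINDOW = (4.0, 14.0)


def classify_edge(distance, comparative=False, window=PRODUCTIVE_HOP_WINDOW):
    """Return (display_label, flags) for one cofactor-to-cofactor edge.

    distance: nearest redox-atom distance in Å.
    comparative: True when the edge comes from a different structure than
    the reference one (e.g. 8Q1U instead of 5XTD); marked with an asterisk.
    """
    low, high = window
    flags = []
    if comparative:
        flags.append("comparative")
    if distance > high:
        flags.append("long")
    elif distance < low:
        flags.append("short")
    label = f"{distance:.1f} Å" + ("*" if comparative else "")
    return label, flags
```

With these rules, the `8Q1U` edge would come out as `("24.7 Å*", ["comparative", "long"])`.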

## Important Caveat

Fe-S labels are currently inferred by greedy nearest-neighbor ordering starting from FMN. This gives a useful geometry scaffold, but the resulting names are not publication-grade until they are manually checked against the structure and the literature.
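Greedy nearest-neighbor ordering can be sketched as a walk that starts at FMN and repeatedly hops to the closest unvisited cluster. This sketch assumes distances between residue centroids; the script may instead use nearest-atom distances. `centroid` and `greedy_order` are hypothetical names.

```python
import math


def centroid(atoms):
    """Mean position of a list of (atom_name, (x, y, z)) tuples."""
    coords = [xyz for _, xyz in atoms]
    return tuple(sum(c[i] for c in coords) / len(coords) for i in range(3))


def greedy_order(start, clusters):
    """Order clusters by repeatedly hopping to the nearest unvisited one.

    start: (x, y, z) of the starting point (here, the FMN centroid).
    clusters: dict mapping a cluster key to its (x, y, z) centroid.
    Ties are broken by key so the order is reproducible.
    """
    order, current, remaining = [], start, dict(clusters)
    while remaining:
        nearest = min(remaining, key=lambda k: (math.dist(current, remaining[k]), k))
        order.append(nearest)
        current = remaining.pop(nearest)
    return order
```

A greedy walk always produces a single linear chain, so a cluster that branches off the main path still gets spliced into the sequence. This is one concrete reason the inferred labels need manual review.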

The UI therefore treats the extracted path as:

```text
mmCIF-derived distances, manual/inferred candidate labels, review required
```

`data/manual-cofactor-map.json` pins the current `5XTD` candidate residue keys so app output remains stable while review is underway. That file is deliberately labeled as review-required, not final curation.
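Pinning can be sketched as overlaying manual residue keys on the inferred ones and reporting where they disagree, so that reviewers see every override. The JSON shape (`status`, `labels`) and the `chain:comp:seq` key strings are hypothetical illustrations, not the actual layout of `data/manual-cofactor-map.json`.

```python
import json


def apply_manual_map(inferred, manual_json):
    """Overlay pinned residue keys from a manual map onto inferred labels.

    inferred: dict mapping candidate label -> residue key string.
    manual_json: text of a manual map file; its shape here is HYPOTHETICAL:
      {"status": "review-required", "labels": {"N2": "B:SF4:201", ...}}
    Returns (merged labels, labels where the pin disagrees with inference,
    review status).
    """
    manual = json.loads(manual_json)
    pinned = manual.get("labels", {})
    merged = {**inferred, **pinned}  # pinned keys win
    disagreements = sorted(
        label for label, key in pinned.items()
        if label in inferred and inferred[label] != key
    )
    # Default to review-required so an unlabeled file is never treated as final.
    return merged, disagreements, manual.get("status", "review-required")
```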

## Next Upgrade

1. Manually verify residue IDs for the Complex I Fe-S centers.
2. Review whether `8Q1U` captures a catalytically relevant Q pose or an open/deactivated-state geometry.
3. Curate edge-specific redox potentials and driving forces for each hop.
4. Compare the inferred path against manually curated Complex I cofactor nomenclature.
