Discovery Update: PRR36 & DNAH5 independently replicated in MSKCC cohort (GSE21034). Deeper ciliary pathway analysis complete. Wet lab validation is the next phase.

Independent replication achieved

Discovered in TCGA-PRAD and independently replicated in the MSKCC cohort. Wet lab is the next step.

PRR36 and DNAH5 were identified through a governed AI screen of 19,010 genes and independently replicated in the MSKCC cohort (GSE21034). Wet lab biological validation is the planned next phase.

Current report
Two prostate cancer gene candidates replicated across independent cohorts
Current status: replication complete · wet lab planning next
PRR36 replicated (MSKCC cohort) · DNAH5 replicated (p = 3.55×10⁻¹⁵) · Documentation in preparation · GSE21034 · 150 tumors · 29 normal
First successful governed AI discovery
Early discovery signal — PRR36

E-DICE-R surfaced a result for further validation.

E-DICE-R's governed AI platform screened 19,010 genes across six molecular evidence types and identified PRR36 as a prostate cancer candidate — with zero prior cancer publications at the time of discovery. No human hypothesis guided the result. It was surfaced from data alone, under full governance controls.

Gene candidate: PRR36 (Proline-rich region 36), identified from the TCGA prostate cancer dataset.
Key signal: 3.7× tumor overexpression, p = 2.57×10⁻²¹, 100% reproducibility across the tested TCGA sub-cohorts.
Literature status: Zero publications found connecting PRR36 to any cancer context at time of discovery.
Data source: Public TCGA dataset; no proprietary or patient-identifiable data used.
Discovery method: Autonomous governed AI workflow; no human guidance in candidate selection.
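As a rough illustration of how a tumor-versus-normal figure such as 3.7× can be derived, the sketch below computes a fold change from two lists of expression values. The function name `fold_change`, the use of arithmetic means on linear-scale values, and the toy numbers are all assumptions made for illustration; the report does not state the normalization or the exact statistic E-DICE-R applies.

```python
import math
from statistics import mean


def fold_change(tumor, normal):
    """Ratio of mean tumor expression to mean normal expression.

    Assumes values are on a linear scale (e.g. TPM), not log-transformed.
    """
    if not tumor or not normal:
        raise ValueError("both groups need at least one sample")
    normal_mean = mean(normal)
    if normal_mean <= 0:
        raise ValueError("mean normal expression must be positive")
    return mean(tumor) / normal_mean


# Toy values chosen for illustration only; they are not TCGA data.
tumor_expr = [7.0, 7.4, 7.8]
normal_expr = [1.9, 2.0, 2.1]

fc = fold_change(tumor_expr, normal_expr)
log2_fc = math.log2(fc)
print(f"fold change = {fc:.2f}x, log2 fold change = {log2_fc:.2f}")
```

Reporting the log2 value alongside the ratio is a common convention because it treats over- and under-expression symmetrically.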
Discovery by the numbers

Screened at scale. Found through governance.

No pre-selection, no hypothesis. Every gene screened equally — governance controlled the quality gates.

19,010
Total genes screened across all evidence types
95%+
Candidates eliminated by quality gates — no human steering
942
Candidates advanced for cross-database literature review
3.7×
Tumor overexpression in the lead discovery signal, PRR36
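The "95%+" figure follows directly from the two counts above. A short check, assuming that every gene not advanced to literature review counts as eliminated:

```python
TOTAL_GENES = 19_010  # genes screened, from the report
ADVANCED = 942        # candidates advanced to literature review

eliminated = TOTAL_GENES - ADVANCED
eliminated_pct = 100 * eliminated / TOTAL_GENES
advanced_pct = 100 * ADVANCED / TOTAL_GENES

print(f"eliminated: {eliminated} genes ({eliminated_pct:.2f}%)")
print(f"advanced:   {ADVANCED} genes ({advanced_pct:.2f}%)")
```

That works out to 18,068 genes eliminated, or about 95.04% of the screened set.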
Independent cohort replication — MSKCC

Two genes. Two independent datasets. Both replicate.

Both genes tested in an entirely independent cohort — GSE21034 from MSKCC — different patients, different platform, no data overlap with the discovery set. Both held up.

Original discovery candidate

PRR36

Proline Rich Region 36 — prostate cancer overexpression candidate identified from TCGA-PRAD by governed AI screen. Zero prior cancer publications at time of discovery.
Overexpression (TCGA-PRAD): 3.7×
p-value (original cohort): 2.57×10⁻²¹
p-value (MSKCC replication): 0.003
Confidence post-replication: 75–80%
Co-candidate — stronger MSKCC signal

DNAH5

Dynein Axonemal Heavy Chain 5 — co-identified in the same governed screen. Shows an even stronger independent replication signal than PRR36 in the MSKCC cohort.
Overexpression (TCGA-PRAD): Significant
Statistical strength (original cohort): High
p-value (MSKCC replication): 3.55×10⁻¹⁵
Confidence post-replication: 80–85%
Confidence change — before vs. after independent replication

Confidence estimates reflect strength of computational evidence — not clinical validation probability.

PRR36 (original candidate)
Confidence: 65–70% before replication, 75–80% after (+8–10 pts after MSKCC)
MSKCC replication: p = 0.003 · 150 tumor samples · 29 normal
Status: Replicated in MSKCC
DNAH5 (co-candidate)
Confidence: 60–65% before replication, 80–85% after (+18–22 pts after MSKCC)
MSKCC replication: p = 3.55×10⁻¹⁵ · extremely strong independent signal
Status: Replicated in MSKCC
Discovery cohort vs. replication cohort
Discovery cohort
TCGA-PRAD
The Cancer Genome Atlas: Prostate Adenocarcinoma. Public multi-institution dataset. Used in the original governed AI discovery screen that identified PRR36 and DNAH5 from 19,010 genes.
PRR36 p = 2.57×10⁻²¹ · 3.7× overexpression
DNAH5 Significant overexpression identified
Replication cohort — independent
MSKCC — GSE21034
Memorial Sloan Kettering Cancer Center. 150 prostate tumor samples + 29 normal samples. Independent platform, independent patients, no data overlap with TCGA discovery cohort.
PRR36 p = 0.003 — replicated ✓
DNAH5 p = 3.55×10⁻¹⁵ — strongly replicated ✓
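The report gives the replication p-values but not the test that produced them. As one plausible way to compare tumor and normal expression in a cohort of this shape, the sketch below implements a two-sided Mann-Whitney U test with the normal approximation, using only the standard library. The choice of test, the function name `mann_whitney_u`, and the synthetic values are assumptions; only the group sizes (150 tumor, 29 normal) come from the report.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for x, approximate two-sided p-value).
    Ties receive average ranks; no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sample")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mu) / sigma
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Synthetic, fully separated groups sized like GSE21034; not real expression data.
tumor = [1.0 + 0.01 * i for i in range(150)]
normal = [0.01 * i for i in range(29)]

u, p = mann_whitney_u(tumor, normal)
print(f"U = {u:.0f}, p = {p:.3g}")
```

A rank-based test is a reasonable default for a cross-platform comparison because it does not assume normally distributed expression values, but a t-test or a moderated statistic would be equally defensible.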
Deeper Analysis — April 2026

A deeper layer of findings. An understudied pathway appears.

After replication, a systematic literature search revealed something broader: not just an unstudied gene — an unstudied biological pathway. And DNAH5 overexpresses in isolation from its own protein complex.

PubMed survey — April 2026

The Ciliary Gap: 8 genes. Zero prostate cancer papers.

A PubMed search across the 8 canonical axonemal dynein and ciliary motor genes surveyed returned zero prostate cancer publications for every one of them. Conventional research has not looked here.

Ciliary Motor Protein Family: Prostate Cancer Literature
Systematic PubMed search · April 2026 · All 8 canonical genes surveyed
8 genes surveyed · 0 papers found

Gene | Role in ciliary complex | Papers found
DNAH5 (key candidate) | Axonemal dynein heavy chain; primary ciliary motor component, overexpressed alone in our dataset | 0
DNAI1 | Dynein intermediate chain; outer dynein arm assembly | 0
DNAI2 | Dynein intermediate chain; cilia motility regulation | 0
DNALI1 | Dynein light intermediate chain; inner dynein arm component | 0
NME8 | Ciliary outer dynein arm; NME/NDP kinase family | 0
DNAL1 | Dynein axonemal light chain 1; outer arm structural subunit | 0
CCDC114 | Outer dynein arm docking complex; ciliary anchor protein | 0
RSPH4A | Radial spoke head 4A; inner ciliary structural component | 0
DNAH5 — Expression Pattern

Overexpressed alone — without its normal partners

DNAH5 has 10 high-confidence STRING interaction partners (scores 0.935–0.997), all ciliary components. None of the 10 appear in the 942 passing genes from the governed screen.

If the ciliary program were being activated in prostate cancer, the whole complex would be expected to rise together. Only DNAH5 rises. Two possible interpretations:

  • Moonlighting: DNAH5 may be serving a non-ciliary function in prostate tumor cells — a recognized phenomenon in cancer biology
  • Pathway disruption: Ciliary signaling may be dysregulated in a way distinct from other cancers

Neither interpretation is established. Both are testable in wet lab. This is an atypical expression pattern — the biological role remains unknown until further study.
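The partner check described above amounts to a filtered set intersection. A minimal sketch, assuming a 0.9 cutoff for "high-confidence" STRING scores and using placeholder identifiers, since the report does not list the 10 partners or the 942 passing genes by name:

```python
def partners_in_passing_set(partner_scores, passing_genes, min_score=0.9):
    """Return high-confidence partners that also appear in the passing set."""
    high_conf = {gene for gene, score in partner_scores.items() if score >= min_score}
    return sorted(high_conf & set(passing_genes))


# Placeholder identifiers and scores inside the reported 0.935-0.997 range.
partner_scores = {f"PARTNER_{i}": 0.935 + 0.006 * i for i in range(10)}
# Tiny stand-in for the 942 passing genes.
passing_genes = {"DNAH5", "PRR36", "ARHGEF38"}

overlap = partners_in_passing_set(partner_scores, passing_genes)
print(f"{len(overlap)} of {len(partner_scores)} partners passed the screen: {overlap}")
```

With real inputs, `partner_scores` would come from a STRING export for DNAH5 and `passing_genes` from the screen's output list.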

PRR36 + ARHGEF38 — Convergence

Two independent paths converge on the same gene pair

PRR36 has one STRING interaction partner: ARHGEF38, a Rho GEF involved in cytoskeletal regulation and cell motility. ARHGEF38 independently appears in the 942 passing genes from the same governed screen — through a completely separate evidence path.

Two unrelated screening paths converging on the same gene pair is unlikely by chance. This provides functional context for an otherwise unknown gene:

  • ARHGEF38 is a cytoskeletal regulator — biology mechanistically linked to invasion and metastasis
  • The convergence is the most actionable functional lead from the current analysis
  • Pattern is consistent with dysregulated cell motility — a known driver of progression

This is a mechanistic observation. Association with clinical aggressiveness requires validation against outcomes data.
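One way to put a rough number on "unlikely by chance" is to ask how often a single, randomly chosen gene would land among the 942 passing genes. The sketch below assumes the screen behaves like a uniform random draw from all 19,010 genes, which is a simplification: interacting genes are often co-expressed, so the true chance rate may be higher.

```python
TOTAL_GENES = 19_010  # genes screened
PASSING = 942         # genes that passed the governed screen


def chance_of_overlap(n_partners, total=TOTAL_GENES, passing=PASSING):
    """Probability that at least one of n_partners random genes is in the passing set."""
    p_none = 1.0
    for i in range(n_partners):
        p_none *= (total - passing - i) / (total - i)
    return 1.0 - p_none


# PRR36 has one STRING partner (ARHGEF38).
p_convergence = chance_of_overlap(1)
print(f"chance probability of convergence: {p_convergence:.4f}")
```

Under that baseline the probability is 942 / 19,010, a little under 5%, which is consistent with treating the convergence as a lead worth testing rather than as proof.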

Preliminary interpretation — not a clinical claim

What these signals may suggest — and what they do not yet establish

PRR36 + ARHGEF38

Points toward cytoskeletal control and cell movement. This biology is mechanistically linked to invasion, metastasis, and disease progression. The signal is consistent with a more invasive phenotype — not yet demonstrated to track with clinical outcomes such as Gleason score or recurrence.

DNAH5 isolated overexpression

May reflect a cellular stress state, altered intracellular transport, or dedifferentiation — patterns that can appear in aggressive tumors, but less mechanistically direct than the PRR36/ARHGEF38 signal. Biological role remains uncharacterized.

To suggest aggression association, the following are needed:

Gleason score, metastatic vs. localized samples, biochemical recurrence, or overall/progression-free survival data. Current framing: "Consistent with more aggressive biology — not yet established." Wet lab and outcomes data are the decisive next steps.

Preliminary clinical context — not yet established

What these signals could mean for patients — if confirmed.

These are mechanistic observations from computational data, not clinical proof. What follows is what the biology may suggest — and what validation against outcomes data is needed to confirm.

Detection

Earlier, more precise risk stratification

If PRR36 and DNAH5 expression tracks with aggressive disease, they could flag higher-risk biology earlier — moving toward treating the patient's specific tumor rather than all prostate cancer identically. This requires association with Gleason score, recurrence, or outcomes data first.

Treatment context

A signal toward invasive potential

PRR36 + ARHGEF38 points to cytoskeletal regulation and cell motility — biology mechanistically tied to invasion and spread. If this pathway is active, it could inform decisions around surgery, radiation, or systemic therapy. Not yet demonstrated against clinical outcomes.

Drug targets

New mechanisms, new intervention points

Most treatments target well-studied pathways. An unstudied mechanism — whether DNAH5 repurposed outside its ciliary complex, or PRR36 feeding into cytoskeletal signaling — opens the door to more precise therapies with fewer off-target effects. Requires functional validation first.

Current framing

Consistent with more aggressive biology — not yet established

To move from "mechanistic hints" to clinical claims, validation is needed against: Gleason score, metastatic vs. localized samples, biochemical recurrence, and survival data. That is exactly what the next phase addresses.

Multi-omics methodology

Six evidence types. Each one governed.

Every candidate was evaluated across six independent molecular evidence streams. A candidate only advances if it passes the quality gate for its evidence type. The governance layer enforces these gates — there is no manual override, no bias toward prior literature.

01
Gene expression (transcriptomics)
Differential expression analysis across tumor vs. normal tissue using TCGA RNA-seq data. Minimum fold-change threshold enforced by governance gate.
Passed — 3.7× overexpression
02
Mutation profiling (genomics)
Somatic mutation frequency and pattern analysis across prostate cancer samples. Evaluates whether genetic alterations are consistent with oncogenic activity.
Passed — mutation pattern consistent
03
Survival analysis
Kaplan-Meier and Cox proportional hazards modeling across TCGA patient cohorts. High expression associated with survival outcome differences.
Passed — survival signal present
04
Cross-dataset reproducibility
Signal consistency observed across multiple sub-cohorts within TCGA. The governance layer requires perfect reproducibility before a candidate advances to literature review.
Passed — 100% reproducibility
05
Literature and database review
Automated cross-reference against PubMed, CancerGene, COSMIC, and primary cancer databases. Candidates with extensive prior publication are deprioritized — novelty is valued.
Passed — 0 cancer publications found
06
Structural druggability assessment
AlphaFold-based protein structure prediction combined with binding pocket detection and druggability scoring using standard bioinformatics pipelines.
⊙ In validation
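The "no manual override" rule above describes non-compensating gates: a strong result in one evidence type cannot offset a failure in another. A minimal sketch of that logic follows. The `Gate` class, the field names, and every threshold are hypothetical, since the report does not publish the real gate definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Gate:
    name: str
    passes: Callable[[Dict[str, float]], bool]


# Hypothetical thresholds for illustration only.
GATES: List[Gate] = [
    Gate("expression", lambda ev: ev["fold_change"] >= 2.0),
    Gate("survival", lambda ev: ev["survival_p"] < 0.05),
    Gate("reproducibility", lambda ev: ev["reproducibility"] >= 1.0),
]


def evaluate(evidence: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Non-compensating check: any single failed gate eliminates the candidate."""
    failed = [gate.name for gate in GATES if not gate.passes(evidence)]
    return (not failed, failed)


# Illustrative values; only the 3.7x fold change comes from the report.
strong = {"fold_change": 3.7, "survival_p": 0.01, "reproducibility": 1.0}
lopsided = {"fold_change": 50.0, "survival_p": 0.01, "reproducibility": 0.8}

print(evaluate(strong))
print(evaluate(lopsided))
```

The second candidate is rejected despite its very large fold change, which is the behavior that distinguishes gating from a weighted average.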
Governance-enabled discovery pipeline

How PRR36 was found autonomously.

Policy-enforced quality gates at every stage — no human steering on which genes to examine, which thresholds to apply, or which candidates to advance.

01
Dataset ingestion — TCGA prostate cancer cohort
Public TCGA dataset loaded into the governed pipeline. No proprietary or patient-identifiable data. Governance layer validates dataset integrity and provenance before any analysis begins.
Input: 19,010 genes | Source: TCGA | Type: RNA-seq + mutation + clinical
02
Multi-evidence screening — six parallel evidence streams
All 19,010 genes are screened in parallel. The primary quality gates at this stage cover expression, mutation, survival, and reproducibility; literature review and druggability assessment follow in later stages. Each evidence stream has a governed quality gate, and candidates failing any gate are eliminated at this stage.
Output: 942 candidates passed all primary quality gates (>95% eliminated)
03
Authority-weighted ranking — evidence strength scoring
Remaining 942 candidates are ranked by a governed authority score that weights evidence strength, statistical significance, reproducibility, and cross-modal consistency. Candidates with higher authority scores represent stronger, more consistent signals.
PRR36 ranked in top tier — combined authority score above threshold
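The ranking step can be pictured as a weighted sum over the four named inputs, followed by a deterministic sort. The weights, the 0-to-1 scaling of each component, and the candidate names below are hypothetical; the report names the inputs to the authority score but not how they are combined.

```python
# Hypothetical weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {
    "evidence_strength": 0.4,
    "significance": 0.3,
    "reproducibility": 0.2,
    "cross_modal_consistency": 0.1,
}


def authority_score(components):
    """Weighted sum of component scores, each expected in [0, 1]."""
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def rank(candidates):
    """Sort candidate names by descending score; ties are broken by name."""
    return sorted(candidates, key=lambda c: (-authority_score(candidates[c]), c))


# Placeholder candidates with made-up component scores.
candidates = {
    "GENE_B": {"evidence_strength": 0.6, "significance": 0.9,
               "reproducibility": 1.0, "cross_modal_consistency": 0.5},
    "GENE_A": {"evidence_strength": 0.9, "significance": 0.8,
               "reproducibility": 1.0, "cross_modal_consistency": 0.7},
}

print(rank(candidates))
```

Breaking ties by name keeps the ordering identical across runs, which matters for any pipeline that claims deterministic, replayable output.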
04
Literature cross-reference — novelty validation
Top-ranked candidates are cross-referenced against PubMed, COSMIC, CancerGene, and primary cancer literature databases. Automated search across all indexed sources. The governance layer records all searches and their results as part of the audit trail.
PRR36: 0 publications found connecting this gene to any cancer. Novel candidate.
05
Cryptographic audit trail — full execution log
Every decision in the pipeline — every quality gate pass, every evidence weight, every ranking score, every literature query — is logged in a SHA-256 hash-chained audit trail. The discovery is replayable from first principles at any time.
Hash-chained log: complete | Replayable: yes | Deterministic: yes
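A hash-chained log of the kind described above can be sketched with the standard library alone. Each entry's SHA-256 hash covers the previous entry's hash, so editing or reordering any earlier entry invalidates everything after it. The entry layout, the function names, and the all-zero genesis value are assumptions for illustration, not the E-DICE-R log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def _digest(prev_hash, event):
    """SHA-256 over a canonical JSON encoding of the previous hash and the event."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)})


def verify(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or _digest(prev_hash, entry["event"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"stage": "gate", "gene": "PRR36", "gate": "expression", "passed": True})
append_entry(log, {"stage": "literature_query", "gene": "PRR36", "hits": 0})
print(verify(log))

# Tampering with an earlier entry is detected on the next verification.
log[0]["event"]["passed"] = False
print(verify(log))
```

Serializing with `sort_keys=True` keeps the encoding stable, so the same event always produces the same hash regardless of dictionary insertion order.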
06
Output — PRR36 surfaced as lead discovery candidate
PRR36 emerged as the lead candidate from a fully autonomous, governance-controlled pipeline. It was not chosen by a human. It was not hypothesized in advance. It was surfaced by evidence, ranked by authority, and confirmed as having no prior cancer publications through automated literature review. This is what a governed AI platform can do.
Discovery candidate — currently under biological evaluation
Next phase — biological testing

Replication complete. Wet lab is next.

Computational discovery done. Independent replication complete. Wet lab experiments will determine whether PRR36 and DNAH5 behave in living cancer cells the way the data predicts.

01 — qPCR validation

Quantitative PCR — expression confirmation

Quantitative PCR experiments in prostate cancer cell lines will directly measure PRR36 and DNAH5 mRNA expression levels, confirming whether overexpression observed in public RNA-seq data holds in controlled laboratory conditions.

Planning
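For readers unfamiliar with how qPCR output becomes an expression ratio, a common approach is the 2^-ΔΔCt calculation, sketched below. The report lists qPCR as planned and does not say which quantification method will be used, so the method, the function name, and the Ct values here are assumptions for illustration.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method, assuming roughly 100% primer efficiency."""
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Made-up Ct values: target gene vs. a reference gene, in tumor and control cells.
fold = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(f"relative expression = {fold:.1f}x")
```

A target that crosses threshold two cycles earlier in the tumor line, after normalizing to the reference gene, corresponds to a fourfold higher expression.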
02 — Protein presence

Western blot & IHC — protein-level confirmation

RNA expression must be paired with protein-level evidence. Western blot and immunohistochemistry (IHC) assays will test whether the overexpression detected in transcriptomic data translates to measurable protein abundance in tumor tissue.

Planning
03 — Functional studies

Cell knockdown & overexpression assays

If a gene is truly oncogenic, disrupting its expression should affect cancer cell behavior. Knockdown and overexpression assays will test whether PRR36 and DNAH5 influence proliferation, migration, or apoptosis in prostate cancer cell lines — the first functional evidence of biological role.

Pending — follows qPCR & protein

Computational replication ≠ clinical validation

Independent computational replication across two cohorts is a strong scientific milestone — but it is not wet lab validation, and it is not a clinical claim. PRR36 and DNAH5 are computationally discovered candidates with independent replication confirmed. The next phase is biological testing. We will update this page as each stage completes and will not advance clinical claims before that evidence exists.

What comes next

Validation roadmap — careful science, open progress.

We will update this page as each stage completes.

Stage 01
Computational discovery
Governed multi-omics screen — 19,010 genes, six evidence types, PRR36 & DNAH5 identified, audit trail complete.
Complete
Stage 02
Independent replication
Both PRR36 (p=0.003) and DNAH5 (p=3.55×10⁻¹⁵) replicated in independent MSKCC cohort GSE21034 — 150 tumors, 29 normal samples.
Complete — April 2026
Stage 03
Scientific documentation
Governing documentation for "Two Previously Unreported Prostate Cancer Gene Candidates Identified Through Governed Multi-Omics Screening of TCGA-PRAD" — in preparation.
In preparation — Apr 2026
Stage 04
Wet lab validation
qPCR expression confirmation, Western blot/IHC protein presence, and functional cell-line knockdown assays for PRR36 and DNAH5. Follows computational replication phase.
Pending — next stage
Stage 05
Structural druggability
AlphaFold protein structure and binding pocket assessment. VQE-assisted druggability scoring under development for both candidates.
Pending
Stage 06
Clinical relevance review
Dr. Fontanez (DNP, FNP-BC) will assess signal relevance against clinical prostate cancer patient outcomes for both PRR36 and DNAH5.
Pending
The team behind the discovery

Interdisciplinary depth. Responsible execution.

This discovery was produced by a governed AI platform — but validating it requires human expertise. Our team spans the full validation chain: from computational biology and multi-omics analysis to applied mathematics, clinical research, and full-stack system delivery.

George Soto, MBA

Founder & Principal Investigator

System architect of the governed discovery platform. Designed the authority computation, quality gates, and deterministic execution pipeline that surfaced PRR36. Lead author, SPIE 2026.

Dr. Athar Hussain, PhD

Advisor — Data Science & Multi-omics

PhD Biotechnology, NIBGE-PIEAS. Leads biological validation, multi-omics integration, and interpretation of the PRR36 signal. Active research lab with undergraduate and graduate students.

Dr. Laura Fontanez, DNP

Advisor — Clinical Research

Board-certified Family Nurse Practitioner (FNP-BC). Provides clinical relevance assessment — bridging computational findings with real-world patient outcomes and healthcare settings.

Method and process — our proprietary IP

We claim the method that identifies, validates, and governs its use.

E-DICE-R is a proprietary governed multi-omics pipeline. The IP is the structured, reproducible, authority-scored process — not the gene. PRR36 is a result produced by that method. The method is what we sell. The method is what we protect.

E-DICE-R — Governed Pipeline

The method and process

E-DICE-R converts multi-omics discovery into a governed, repeatable decision system for identifying and prioritizing therapeutic and diagnostic targets. The invention defines a structured pipeline that ingests heterogeneous data, applies authority-based scoring and validation, and produces ranked, auditable outputs.

  • Method: defined pipeline for ingesting, processing, scoring, and ranking targets from multi-omics data
  • Decision engine: authority-based filtering, evidence weighting, and cross-cohort validation
  • Outputs: ranked candidates for diagnostics, biomarkers, and therapeutic targeting
  • System: architecture enforcing reproducibility, auditability, and deterministic workflows
  • Pharma linkage: integration into target selection, validation, and drug development workflows
PRR36 — Exploratory Signal

What our method found

PRR36 showed 3.7× overexpression in our TCGA prostate cancer analysis, with p = 2.57×10⁻²¹, supporting follow-up as an exploratory, under-studied candidate. These findings are statistically strong within the analyzed public dataset, but biomarker utility, novelty, IP position, and druggability still require separate validation.

  • Quantitative signal: 3.7× differential expression in the prostate cancer cohort
  • Statistical strength: p = 2.57×10⁻²¹ under the applied test model
  • Data source: public TCGA dataset, de-identified tier
  • Workflow: reproducible analysis pipeline with auditable outputs
  • Interpretation scope: association only — not causality or clinical utility
  • Next steps: subtype analysis, survival correlation, and functional characterization (cohort replication in MSKCC is now complete)
E-DICE-R invention position

E-DICE-R turns multi-omics discovery into a governed decision system.

E-DICE-R converts multi-omics discovery into a governed, repeatable decision system for identifying and prioritizing therapeutic and diagnostic targets. The invention defines a structured pipeline that ingests heterogeneous data, applies authority-based scoring and validation, and produces ranked, actionable outputs for clinical and drug-development use. This establishes a foundation for method, system, and application-level patent protection.

M
Method claim
A governed method for ingesting, processing, scoring, and ranking targets from multi-omics data with authority-based scoring and non-compensating validation gates.
  • Defined pipeline: ingestion → scoring → ranking → output
  • Evidence weighting and cross-cohort validation
  • Ranked outputs for diagnostics, biomarkers, and therapeutic targeting
S
System claim
A deterministic, auditable architecture that enforces governance, maintains a complete evidence chain, and guarantees replayable outputs.
  • Reproducibility and auditability by design
  • Deterministic workflows and evidence traceability
  • Architecture-level protection around the ranking engine
P
Pharma workflow claim
A pharmaceutical workflow layer connecting governed outputs to target selection, validation tracking, and downstream drug-development decision support.
  • Structured evidence packages for pharma portfolio review
  • Target selection and validation tracking
  • Application-level use of identified targets, including PRR36
SDK
E-DICE-Edge · E-DICE-Edge Partner Vault
Deployable molecular governance for partner pipelines
A programmable governance layer for pharmaceutical and multi-omics workflows
Authority-based scoring · Deterministic audit workflows · External dataset integration · Partner Vault

E-DICE-Edge SDK exposes the molecular governance layer of E-DICE-R as a deployable integration for external pipelines. This run serves as an illustrative example: governed multi-omics analysis surfacing a high-confidence, underexplored candidate with full reproducibility and evidence traceability. Through E-DICE-Edge Partner Vault, partners apply authority-based scoring, non-compensating validation gates, and deterministic audit workflows directly to their own datasets, reducing downstream validation risk.

Replicated. Deeper pattern found. Wet lab is the next step.

PRR36 and DNAH5 independently replicated in the MSKCC cohort. A deeper analysis points to an unstudied ciliary pathway, an isolated overexpression pattern in DNAH5, and a PRR36/ARHGEF38 convergence signal consistent with invasive biology. If you are a research partner, clinical collaborator, or investor, this is the moment to engage.
