Cognitive Legitimacy & Algorithmic Legitimacy Shift (ALS): A Strict Minimax-Risk Proof (Version 10.0)

Subject: Statistical Decision Theory / Cognitive Legitimacy
Author: Manny (Ghost Drift Theoretical Group)
Date: 2026-01-21 (The Perfected Proof)
Abstract: We introduce Algorithmic Legitimacy Shift (ALS): a decision-theoretic phenomenon in which, under structural coverage constraints ($B < J$), the algorithmic decision channel becomes strictly more legitimate than the human decision channel in the minimax-risk sense. Formally, the Human Channel admits a sharp, non-vanishing minimax error floor on a noiseless subclass, while the Algorithm Channel achieves exponentially decaying minimax risk in the sample size $m$ under a positive margin condition $\Delta>0$. Version 10.0 is mathematically closed by (1) defining an extended probability space including internal randomization to rigorously construct human risk; (2) giving an explicit uniform $B$-subset generator to witness achievability of the sharp bound; (3) enforcing consistent notation for induced measures; and (4) linking Bayes risk to minimax risk in the information-theoretic lower bound. These results yield an explicit sufficient condition on $m$ under which $\mathfrak{R}^\star(\mathsf{Ch}_A) < \mathfrak{R}^\star(\mathsf{Ch}_H)$ holds as a theorem, with no remaining ambiguity in the probabilistic model. This provides a formal criterion for comparing the legitimacy of evaluation channels (e.g., human review vs. third-party AI assessment) under explicit structural constraints.

1. Measure-Theoretic Foundation & Model Definition

Notation and Probability Space
Definition 1 (Data-Generating Families and Environment) Fix margin $\mu_+ \in (0, 1]$ and contamination rate $\alpha \in [0, 1)$. Let $\mathcal{X} := [-1, 1]$ and let $\mathcal{Q}$ denote the set of all Borel probability measures on $\mathcal{X}$. An environment is a triple $\omega = (\vartheta, (P_j)_{j\in[J]}, (Q_j)_{j\in[J]})$ where $\vartheta \in \{0,1\}^J$, each $Q_j \in \mathcal{Q}$, and each $P_j \in \mathcal{Q}$ satisfies the margin constraint $\mathbb{E}_{P_j}[X] \ge \mu_+$ if $\vartheta_j = 1$ and $\mathbb{E}_{P_j}[X] \le -\mu_+$ if $\vartheta_j = 0$. Let $\Omega$ denote the set of all such environments. Each query of field $j$ is answered by the contaminated mixture $M_{j,\omega} := (1-\alpha)P_j + \alpha Q_j$.
Definition 1.1 (Sampling Mechanism) All observations are drawn independently: conditional on the queried field $j$, an observation is an independent draw from $M_{j,\omega}$. The internal randomization variables $U_1, U_2, \dots$ are i.i.d. $\mathrm{Unif}[0,1]$, independent of all observations.
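To make the sampling mechanism concrete, here is a minimal NumPy sketch of one query channel, instantiated with the two-point worst-case pair that reappears in Lemma A2-Realize below. The function name `sample_field` and the use of `numpy` are illustrative choices, not part of the formal model.

```python
import numpy as np

def sample_field(theta_j, mu_plus, alpha, m, rng):
    """Draw m i.i.d. observations from M_{j,w} = (1 - alpha) P_j + alpha Q_j.

    Illustrative instantiation: P_j is the two-point law on {+1, -1} with
    mean +mu_plus (if theta_j = 1) or -mu_plus (if theta_j = 0), and Q_j
    is the adversarial point mass on the opposite sign.
    """
    if theta_j == 1:
        # M(+1) = (1 - alpha)(1 + mu_plus)/2 = (1 + Delta)/2, with Q_j = delta_{-1}
        p_plus1 = (1 - alpha) * (1 + mu_plus) / 2
    else:
        # M(+1) = (1 - alpha)(1 - mu_plus)/2 + alpha = (1 - Delta)/2, with Q_j = delta_{+1}
        p_plus1 = (1 - alpha) * (1 - mu_plus) / 2 + alpha
    return np.where(rng.random(m) < p_plus1, 1.0, -1.0)

rng = np.random.default_rng(0)
x = sample_field(1, mu_plus=0.5, alpha=0.2, m=10, rng=rng)  # here Delta = 0.2
```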

2. Procedures and Channels as Statistical Experiments

Definition 2 (Human Protocol $\Pi_H$ and Admissible Class) Fix $T\in\mathbb{N}$ (total queries) and unique-field budget $B\in\{0,\dots,J-1\}$; w.l.o.g. assume $B \le T$. A human procedure $\Pi_H=(\psi_1,\dots,\psi_T,\phi)$ consists of Borel measurable maps $$\psi_t:([J]\times\mathcal{X})^{t-1}\times[0,1]\to[J],\quad \phi:([J]\times\mathcal{X})^{T}\times[0,1]\to\{0,1\}.$$ Given the internal randomization $U=(U_1,\dots,U_T,U_{T+1})$ of Definition 1.1, define iteratively $$j_t=\psi_t(H_{t-1},U_t),\qquad x_t\sim M_{j_t,\omega},$$ where $H_{t-1}:=((j_1,x_1),\dots,(j_{t-1},x_{t-1}))$, generating a history $H_T \in \mathcal{Y}_H := ([J]\times\mathcal{X})^T$.
Define the extended outcome space $\tilde{\mathcal{Y}}_H := \mathcal{Y}_H \times [0, 1]$ equipped with the product measure $\tilde{\mathbb{P}}^{\Pi_H}_\omega := \mathbb{P}^{\Pi_H}_\omega \otimes \text{Leb}$, governing the pair $(H_T, U_{T+1})$.
Let $S_T:=\{j_t:1\le t\le T\}$. The admissible class is $$\mathrm{Adm}_H(T,B):=\{\Pi_H:\ \forall \omega\in\Omega,\ \mathbb{P}^{\Pi_H}_\omega(|S_T|\le B)=1\}.$$
Definition 3 (Algorithm Protocol $\Pi_A$) The algorithm observes $X \in \mathcal{Y}_A := \mathcal{X}^{Jm}$. The induced law is $P_\omega := \bigotimes_{j=1}^J M_{j,\omega}^{\otimes m}$. A procedure is a Borel measurable map $\delta: \mathcal{Y}_A \to \{0, 1\}$. The admissible class is $\text{Adm}_A(m)$, the set of all such maps.
Definition 4 (Channels, Estimators, Risk, Minimax Risk) Let $g(\vartheta)=\bigwedge_{j=1}^J\vartheta_j\in\{0,1\}$.
Human estimator: for $\Pi_H=(\psi,\phi)$, define $$ \hat{g}_H := \phi(H_T, U_{T+1}) \in \{0, 1\}. $$
Algorithm estimator: for $\delta$, define $$ \hat{g}_A := \delta(X) \in \{0, 1\}. $$ Define the (0–1) risk: $$ R_H(\Pi_H;\omega) := \mathbb{E}_{\tilde{\mathbb{P}}^{\Pi_H}_\omega}[\mathbf{1}\{\phi(H_T, U_{T+1}) \neq g(\vartheta)\}], \quad R_A(\delta;\omega) := \mathbb{E}_{P_\omega}[\mathbf{1}\{\hat{g}_A \neq g(\vartheta)\}]. $$ Then the minimax risks are $$ \mathfrak{R}^\star(\mathsf{Ch}_H(T,B)) := \inf_{\Pi_H\in \mathrm{Adm}_H(T,B)}\ \sup_{\omega\in\Omega} R_H(\Pi_H;\omega), $$ $$ \mathfrak{R}^\star(\mathsf{Ch}_A(m)) := \inf_{\delta\in \mathrm{Adm}_A(m)}\ \sup_{\omega\in\Omega} R_A(\delta;\omega). $$
Lemma IT (Explicit kernel construction) Fix $\Pi_H$. The measure $\mathbb{P}^{\Pi_H}_\omega$ on $\mathcal{Y}_H$ is uniquely constructed via the Ionescu-Tulcea theorem using kernels $K_t(h_{t-1}, u_t; \cdot)$ determined by $\psi_t$ and $M_{j,\omega}$. Finally, since $U_{T+1} \sim \text{Unif}[0,1]$ is independent of the history, the product extension yields a unique induced law $\tilde{\mathbb{P}}^{\Pi_H}_\omega$ on $\tilde{\mathcal{Y}}_H$.
Lemma 0 (Restriction Monotonicity) For any subset $\Omega' \subset \Omega$, $\inf_{\Pi} \sup_{\omega \in \Omega} R(\Pi; \omega) \ge \inf_{\Pi} \sup_{\omega \in \Omega'} R(\Pi; \omega)$.

3. Sharp Lower Bound for Human Channel

Definition 5 (Noiseless Subclass $\Omega_H^0$) Define $\Omega_H^0\subset\Omega$ by requiring, for every $j\in[J]$, $$P_j=\delta_{+1}\ \text{if }\vartheta_j=1,\qquad P_j=\delta_{-1}\ \text{if }\vartheta_j=0,$$ and additionally $Q_j=P_j$ for all $j$. Hence for every $\omega\in\Omega_H^0$, $M_{j,\omega}=P_j$, so observations are deterministic given $\vartheta$.
Lemma H1 (Coverage Bound) Fix an admissible $\Pi_H$. Define $p_j := \mathbb{P}^{\Pi_H}_{\omega^{(1)}}(j \in S_T)$ under the "all-true" environment $\omega^{(1)}$. Since $|S_T| \le B$ almost surely, $\sum_{j=1}^J p_j = \mathbb{E}^{\Pi_H}_{\omega^{(1)}}[|S_T|] \le B$. Thus any $K \in \arg\min_{j} p_j$ satisfies $p_K \le B/J$, whence $\mathbb{P}^{\Pi_H}_{\omega^{(1)}}(K \notin S_T) = 1 - p_K \ge 1 - B/J$.
Lemma H1' (Hitting Probability Invariance) Let $\omega^{(0,K)}$ be the environment where only $\vartheta_K=0$. For any $K$, $\mathbb{P}^{\Pi_H}_{\omega^{(1)}}(K \in S_T) = \mathbb{P}^{\Pi_H}_{\omega^{(0,K)}}(K \in S_T)$.
Proof: Work on $\Omega_H^0$. Couple the two experiments under $\omega^{(1)}$ and $\omega^{(0,K)}$ by using the same internal randomness $U$. As long as $K$ has not been queried, all observations equal $+1$ in both environments, so the histories, and hence the query sequences, coincide. Consequently the first time $K$ is queried (if ever) is the same in both runs: the event $\{K \in S_T\}$ occurs for exactly the same realizations of $U$, so the two probabilities are equal.
Lemma H2 (Joint Event Identity) On the event $E = \{K \notin S_T\}$, the full pair $(H_T, U_{T+1})$ is identical under $\omega^{(1)}$ and $\omega^{(0,K)}$ for the same $U$. Thus, for any event $A$ determined by $(H_T, U_{T+1})$ (in particular $A = \{\hat{g}_H = 1\}$), $\tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(1)}}(A \cap E) = \tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(0,K)}}(A \cap E)$.
Theorem H0 (Sharp Minimax Value on $\Omega_H^0$) $$ \inf_{\Pi_H \in \text{Adm}_H(T, B)} \sup_{\omega \in \Omega_H^0} R_H(\Pi_H; \omega) = \frac{1 - B/J}{2 - B/J}. $$
Lemma H0-Red (Two-Point Reduction on $\Omega_H^0$) For any admissible $\Pi_H$, the lower-bound argument below uses only the environments $\omega^{(1)}$ and $\{\omega^{(0,K)}\}_{K\in[J]}$; moreover, for the achieving protocol of Step 2, $\sup_{\omega\in\Omega_H^0} R_H(\Pi_H;\omega)$ is attained within this family.
Proof (Lower Bound & Tightness):
1. Lower Bound: Define the event $A := \{ \hat{g}_H = 1 \}$, which is determined by $(H_T, U_{T+1})$, and let $p := \tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(1)}}(A)$. The risk at $\omega^{(1)}$ is $1-p$; the risk at $\omega^{(0,K)}$ is $\tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(0,K)}}(A)$. Since $|S_T| \le B$ a.s., $\sum_{k=1}^J \tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(1)}}(A \cap \{k \in S_T\}) = \mathbb{E}[|S_T|\mathbf{1}_A] \le Bp$, so there exists $K$ such that $\tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(1)}}(A \cap \{K \in S_T\}) \le (B/J)p$, and hence $\tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(1)}}(A \cap \{K \notin S_T\}) \ge (1 - B/J)p$. By Lemma H2, $\tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(0,K)}}(A) \ge \tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(0,K)}}(A \cap \{K \notin S_T\}) = \tilde{\mathbb{P}}^{\Pi_H}_{\omega^{(1)}}(A \cap \{K \notin S_T\}) \ge (1 - B/J)p$. The worst-case risk is therefore at least $\max\{1-p,\ (1-B/J)p\}$; equating the two branches at $p^\star = \frac{1}{2-B/J}$ shows the minimum over $p \in [0,1]$ equals $\frac{1-B/J}{2-B/J}$.
Remark (Explicit Uniform Subset Generator) Since the collection of $B$-subsets of $[J]$ is finite, enumerate them as $S_1, \dots, S_N$ with $N = \binom{J}{B}$. Define $\Gamma: [0, 1] \to \binom{[J]}{B}$ by $\Gamma(u) = S_k$ for $u \in [\frac{k-1}{N}, \frac{k}{N})$ (and $\Gamma(1) = S_N$). This $\Gamma$ is Borel measurable and pushes $\mathrm{Unif}[0,1]$ forward to the uniform distribution over $B$-subsets.
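A direct transcription of this generator, as a sketch; the helper name `make_gamma` is ours, and `itertools.combinations` supplies the enumeration $S_1, \dots, S_N$.

```python
from itertools import combinations

def make_gamma(J, B):
    """Gamma: [0,1] -> B-subsets of [J], as in the Remark: map
    u in [(k-1)/N, k/N) to the k-th subset in a fixed enumeration."""
    subsets = list(combinations(range(1, J + 1), B))  # S_1, ..., S_N
    N = len(subsets)                                  # N = C(J, B)
    def gamma(u):
        k = min(int(u * N), N - 1)  # floor(u * N); guard the endpoint u = 1
        return set(subsets[k])
    return gamma

gamma = make_gamma(J=5, B=2)
print(gamma(0.0), gamma(0.999))  # first and last subsets in the enumeration
```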
2. Achievability: Protocol: draw $S = \Gamma(U_1)$ and query every $j \in S$ (using $B \le T$ queries). If any observation equals $-1$, output $0$; otherwise use $U_{T+1}$ to output $1$ with probability $p^\star = \frac{1}{2-B/J}$ and $0$ otherwise. On $\Omega_H^0$ this achieves the bound exactly: the risk at $\omega^{(1)}$ is $1-p^\star$, the risk at any environment with a single false field $K$ is $\mathbb{P}(K \notin S)\,p^\star = (1-B/J)p^\star$, and the two values coincide by the choice of $p^\star$; environments with more false fields have a smaller miss probability.
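The following Monte Carlo sketch checks this achievability claim on the noiseless subclass; the name `protocol_risk` is ours, and we simulate only $\omega^{(1)}$ and a single-false environment, the worst cases identified above.

```python
import numpy as np
from itertools import combinations

def protocol_risk(J, B, theta, n_trials=200_000, seed=0):
    """Risk of the achieving protocol on Omega_H^0: pick S = Gamma(U_1)
    uniformly among B-subsets, observe x_j = +1 iff theta_j = 1
    (noiseless), output 0 on any -1, else output 1 w.p. p* = 1/(2 - B/J)."""
    rng = np.random.default_rng(seed)
    subsets = list(combinations(range(J), B))
    p_star = 1.0 / (2.0 - B / J)
    g = int(all(theta))                      # g(theta) = AND_j theta_j
    errors = 0
    for _ in range(n_trials):
        S = subsets[rng.integers(len(subsets))]
        if any(theta[j] == 0 for j in S):    # a -1 was observed
            guess = 0
        else:                                # randomize via U_{T+1}
            guess = int(rng.random() < p_star)
        errors += (guess != g)
    return errors / n_trials

J, B = 10, 3
floor = (1 - B / J) / (2 - B / J)                       # = 7/17 ~ 0.4118
print(protocol_risk(J, B, [1] * J), floor)              # risk at omega^(1)
print(protocol_risk(J, B, [0] + [1] * (J - 1)), floor)  # risk at omega^(0,1)
```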
Corollary H (General Lower Bound) By Lemma 0, $\mathfrak{R}^\star(\mathsf{Ch}_H(T, B)) \ge \frac{1 - B/J}{2 - B/J} \ge \frac{1}{2}\left(1 - \frac{B}{J}\right)$.

4. Algorithm Channel: Upper and Lower Bounds

Remark (Margin Condition) $\Delta := (1-\alpha)\mu_+ - \alpha > 0 \iff \alpha < \frac{\mu_+}{1+\mu_+}$. We assume this holds.
Theorem A1 (Explicit Upper Bound) There exists $\delta \in \text{Adm}_A(m)$ with $\sup_{\omega\in\Omega} R_A(\delta;\omega) \le J \exp\left(-\frac{1}{2} m \Delta^2\right)$; hence $$ \mathfrak{R}^\star(\mathsf{Ch}_A(m)) \le J \exp\left(-\frac{1}{2} m \Delta^2\right). $$
Proof (Explicit rule and bound): For each field $j$, let $\bar X_j:=\frac{1}{m}\sum_{i=1}^m X_{j,i}$ and define the decision rule $$ \delta(X)=\mathbf{1}\{\forall j\in[J],\ \bar X_j \ge 0\}. $$ Since the infimum and supremum of $\mathbb{E}_Q[X]$ over $Q \in \mathcal{Q}$ are $-1$ and $1$: for $\vartheta_j=1$, $\mathbb{E}_{M_{j,\omega}}[X] \ge (1-\alpha)\mu_+ + \alpha(-1) = \Delta$, and for $\vartheta_j=0$, $\mathbb{E}_{M_{j,\omega}}[X] \le -(1-\alpha)\mu_+ + \alpha = -\Delta.$ Since $X \in [-1, 1]$ (range $2$), Hoeffding's inequality gives $$ \mathbb{P}_{P_\omega}[\bar X_j < 0 \mid \vartheta_j=1] \le \exp\left(-\frac{2(m\Delta)^2}{m \cdot 2^2}\right) = \exp\left(-\frac{1}{2}m\Delta^2\right), $$ and symmetrically for $\vartheta_j=0$. A union bound over the $J$ fields gives the result.
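A quick simulation of this rule against the Hoeffding bound, under the two-point environment of Lemma A2-Realize below; the name `algorithm_risk` is ours, and binomial counts stand in for the $\pm 1$ samples since $\bar X_j \ge 0$ iff at least $m/2$ draws equal $+1$.

```python
import numpy as np

def algorithm_risk(J, m, mu_plus, alpha, theta, n_trials=100_000, seed=0):
    """Monte Carlo risk of delta(X) = 1{all field means >= 0}."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta)
    # M_{j,w}(+1) under the two-point realization of Lemma A2-Realize
    p = np.where(theta == 1,
                 (1 - alpha) * (1 + mu_plus) / 2,
                 (1 - alpha) * (1 - mu_plus) / 2 + alpha)
    g = int(theta.all())
    counts = rng.binomial(m, p, size=(n_trials, J))  # number of +1 draws
    decisions = (2 * counts >= m).all(axis=1)        # field mean >= 0 iff counts >= m/2
    return np.mean(decisions.astype(int) != g)

J, m, mu_plus, alpha = 10, 200, 0.5, 0.2
Delta = (1 - alpha) * mu_plus - alpha                 # = 0.2 > 0
print(algorithm_risk(J, m, mu_plus, alpha, [1] * J))  # risk at the all-true environment
print(J * np.exp(-0.5 * m * Delta**2))                # Theorem A1 bound ~ 0.183
```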
Lemma A2-Realize (Rademacher Subclass Realizability) Assume $\Delta \in (0, 1)$. The distributions $P_{+,\Delta}(+1) = \frac{1+\Delta}{2}$ and $P_{-,\Delta}(+1) = \frac{1-\Delta}{2}$ on $\{+1,-1\}$ are realizable within $\Omega$: for $\vartheta_j = 1$ take $P_j(+1)=\frac{1+\mu_+}{2}$ and $Q_j=\delta_{-1}$, so that $M_{j,\omega}(+1) = (1-\alpha)\frac{1+\mu_+}{2} = \frac{1+\Delta}{2}$; for $\vartheta_j = 0$ take $P_j(+1)=\frac{1-\mu_+}{2}$ and $Q_j=\delta_{+1}$, so that $M_{j,\omega}(+1) = (1-\alpha)\frac{1-\mu_+}{2} + \alpha = \frac{1-\Delta}{2}$.
Lemma (Chi-square Tensorization) For probability measures $P,Q$ with $P\ll Q$, $$1+\chi^2(P^{\otimes m}\|Q^{\otimes m})=\big(1+\chi^2(P\|Q)\big)^m.$$
Lemma (Binary $\chi^2$ for $P_{-,\Delta}$ vs $P_{+,\Delta}$) Let $P_{+,\Delta}(+1)=\frac{1+\Delta}{2}$ and $P_{-,\Delta}(+1)=\frac{1-\Delta}{2}$ on $\{+1,-1\}$. Then $$\chi^2(P_{-,\Delta}\|P_{+,\Delta})=\frac{4\Delta^2}{1-\Delta^2}.$$
Lemma Chi3 (Mixture $\chi^2$ Identity) Let $P_0 = P_{+,\Delta}^{\otimes Jm}$ and $P_k$ be the law where field $k$ is $P_{-,\Delta}^{\otimes m}$. Let $\bar{P} = \frac{1}{J}\sum P_k$. Then $\chi^2(\bar{P} \| P_0) = \frac{1}{J}\chi^2(P_1 \| P_0)$.
Proof: Let $L_k := dP_k/dP_0$ on the common finite sample space. Then $$ 1+\chi^2(\bar P \| P_0)=\mathbb{E}_{P_0}\left[\left( \frac{1}{J}\sum_{k=1}^J L_k \right)^2\right] =\frac{1}{J^2}\sum_{k}\mathbb{E}_{P_0}[L_k^2]+\frac{1}{J^2}\sum_{k \neq \ell}\mathbb{E}_{P_0}[L_k L_\ell]. $$ For $k \neq \ell$, $P_k$ and $P_\ell$ differ from $P_0$ on disjoint field-blocks, so under $P_0$ the likelihood ratios factor and $\mathbb{E}_{P_0}[L_k L_\ell]=\mathbb{E}_{P_0}[L_k]\,\mathbb{E}_{P_0}[L_\ell]=1\cdot1=1$. Also $\mathbb{E}_{P_0}[L_k^2]=1+\chi^2(P_k\|P_0)$, independent of $k$ by symmetry. Hence $$ 1+\chi^2(\bar P \| P_0) = \frac{1}{J}\big(1+\chi^2(P_1\|P_0)\big) + \frac{J-1}{J} = 1 + \frac{1}{J}\chi^2(P_1\|P_0), $$ i.e. $\chi^2(\bar P \| P_0)= \frac{1}{J}\chi^2(P_1\|P_0)$.
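The binary $\chi^2$ formula, the tensorization lemma, and the mixture identity can all be verified by exact enumeration on a small instance; a sketch with $J=3$, $m=2$ chosen so the $2^{Jm}$-point sample space stays tractable (all helper names are ours).

```python
import numpy as np
from itertools import product

def chi2(P, Q):
    """chi^2(P || Q) = sum_x (P(x) - Q(x))^2 / Q(x) on a finite space."""
    return float(sum((p - q) ** 2 / q for p, q in zip(P, Q)))

Delta, J, m = 0.3, 3, 2
p_plus, p_minus = (1 + Delta) / 2, (1 - Delta) / 2

def product_law(field_probs):
    """Exact law on {+1,-1}^(J*m): coordinate (j, i) is +1 w.p. field_probs[j]."""
    probs = []
    for x in product([1, -1], repeat=J * m):
        pr = 1.0
        for j in range(J):
            for i in range(m):
                q = field_probs[j]
                pr *= q if x[j * m + i] == 1 else 1 - q
        probs.append(pr)
    return np.array(probs)

P0 = product_law([p_plus] * J)                       # all fields +
Pk = [product_law([p_minus if j == k else p_plus for j in range(J)])
      for k in range(J)]                             # field k flipped to -
Pbar = sum(Pk) / J

c = 4 * Delta**2 / (1 - Delta**2)                    # binary chi-square lemma
print(chi2(Pk[0], P0), (1 + c)**m - 1)               # tensorization: equal
print(chi2(Pbar, P0), chi2(Pk[0], P0) / J)           # Lemma Chi3: equal
```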
Lemma (Minimax Dominates Bayes Risk) For any prior $\pi$ on $\Omega$, $$\inf_{\Pi}\sup_{\omega\in\Omega}R(\Pi;\omega)\ \ge\ \inf_{\Pi}\mathbb{E}_{\omega\sim\pi}[R(\Pi;\omega)].$$
Lemma TV-$\chi^2$ Inequality For any probability measures $P,Q$ with $P\ll Q$, $$\mathrm{TV}(P,Q)\le \frac12\sqrt{\chi^2(P\|Q)}.$$ Consequently, for the uniform two-point prior on $\{P,Q\}$, the Bayes error satisfies $$R_{\mathrm{Bayes}} = \frac12\big(1-\mathrm{TV}(P,Q)\big)\ \ge\ \frac12\Big(1-\frac12\sqrt{\chi^2(P\|Q)}\Big).$$
Theorem A2 (Fundamental Lower Bound) Assuming $\Delta \in (0,1)$, if $m \le \frac{\log(1+J)}{\log(1+4\Delta^2/(1-\Delta^2))}$, then $\mathfrak{R}^\star(\mathsf{Ch}_A(m)) \ge \frac{1}{4}$.
Proof: By Lemma 0, restrict to the realizable subclass of Lemma A2-Realize. Let $P_0$ correspond to $g=1$ (all fields $+$) and $P_k$ to $g=0$ (field $k$ is $-$); define $\bar P = \frac{1}{J}\sum_k P_k$. Take the prior $\pi$ with mass $1/2$ on $P_0$ and $1/(2J)$ on each $P_k$; the induced Bayes problem is the uniform two-point test of $P_0$ against the mixture $\bar{P}$. By Lemma (Minimax Dominates Bayes Risk), $\mathfrak{R}^\star(\mathsf{Ch}_A(m)) \ge R_{\mathrm{Bayes}} = \frac{1}{2}(1 - \mathrm{TV}(P_0, \bar{P}))$. By tensorization, $\chi^2(P_k\|P_0) = \left(1+\frac{4\Delta^2}{1-\Delta^2}\right)^m-1$, and by Lemma Chi3, $\chi^2(\bar P\|P_0) = \frac{1}{J}\chi^2(P_k\|P_0)$. The stated bound on $m$ is equivalent to $\left(1+\frac{4\Delta^2}{1-\Delta^2}\right)^m \le 1+J$, i.e. $\chi^2(\bar P\|P_0) \le 1$; then $\mathrm{TV}(P_0,\bar P) \le \frac12\sqrt{\chi^2(\bar P\|P_0)} \le \frac12$, so the minimax risk is at least $\frac{1}{2}\left(1-\frac{1}{2}\right) = \frac{1}{4}$.

5. Main Result and Legitimacy

Main Theorem (Strict Minimax Dominance) Assume $B < J$ and $\Delta > 0$. If $$ m > \frac{2}{\Delta^2} \log\left(\frac{J(2-B/J)}{1-B/J}\right) $$ then $$ \mathfrak{R}^\star(\mathsf{Ch}_A(m)) < \mathfrak{R}^\star(\mathsf{Ch}_H(T, B)). $$
Proof: The condition on $m$ ensures $J\exp\left(-\frac{1}{2}m\Delta^2\right) < \frac{1-B/J}{2-B/J}$. Combining Theorem A1 and Corollary H yields the result.
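As a numeric illustration of the resulting regime, here is a short sketch computing both sample-size thresholds; the helper names are ours and the parameter values are arbitrary.

```python
import math

def m_dominance(J, B, Delta):
    """Main Theorem threshold: for m above this, Ch_A strictly dominates Ch_H."""
    return (2 / Delta**2) * math.log(J * (2 - B / J) / (1 - B / J))

def m_hard(J, Delta):
    """Theorem A2 threshold: for m at or below this, R*(Ch_A) >= 1/4."""
    return math.log(1 + J) / math.log(1 + 4 * Delta**2 / (1 - Delta**2))

J, B, Delta = 10, 3, 0.2
print(m_hard(J, Delta))          # ~ 15.6: with so few samples, Ch_A is still poor
print(m_dominance(J, B, Delta))  # ~ 160: beyond this, ALS holds
```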
Definition (Cognitive Legitimacy Order) For channels $\mathsf{Ch}_1, \mathsf{Ch}_2$, define $\mathsf{Ch}_1 \succ \mathsf{Ch}_2$ iff $\mathfrak{R}^\star(\mathsf{Ch}_1) < \mathfrak{R}^\star(\mathsf{Ch}_2)$; this is the strict part of the total preorder induced by minimax risk. We call $\mathsf{Ch}_1$ "strictly more legitimate (in the minimax sense)" than $\mathsf{Ch}_2$.

References

  [1] L. Le Cam, Asymptotic Methods in Statistical Decision Theory, Springer, 1986.
  [2] P. J. Huber, "Robust Estimation of a Location Parameter," Annals of Mathematical Statistics, 1964.
  [3] A. B. Tsybakov, Introduction to Nonparametric Estimation, Springer, 2009.
  [4] W. Hoeffding, "Probability Inequalities for Sums of Bounded Random Variables," Journal of the American Statistical Association, 1963.
  [5] C. T. Ionescu Tulcea, "Mesures dans les espaces produits," Atti Accad. Naz. Lincei Cl. Sci. Fis. Mat. Nat., 1949.