No one is under an obligation to prove a negative here. It's up to the believers in the existence of something to prove so.
As for the IAT, as I said, its reliability and validity have already been called into question in the literature. It furthermore seems to have a rather low correlation with actual behavior:
Meissner et al. (2019) wrote: Predicting Behavior With Implicit Measures: Disillusioning Findings, Reasonable Explanations, and Sophisticated Solutions
Franziska Meissner¹*, Laura Anne Grigutsch¹, Nicolas Koranyi¹, Florian Müller² and Klaus Rothermund¹
¹ General Psychology II, Institute of Psychology, Friedrich Schiller University Jena, Jena, Germany
² Department for the Psychology of Human Movement and Sport, Institute for Sports Science, Friedrich Schiller University Jena, Jena, Germany
Two decades ago, the introduction of the Implicit Association Test (IAT) sparked enthusiastic reactions. With implicit measures like the IAT, researchers hoped to finally be able to bridge the gap between self-reported attitudes on one hand and behavior on the other. Twenty years of research and several meta-analyses later, however, we have to conclude that neither the IAT nor its derivatives have fulfilled these expectations. Their predictive value for behavioral criteria is weak and their incremental validity over and above self-report measures is negligible. In our review, we present an overview of explanations for these unsatisfactory findings and delineate promising ways forward. Over the years, several reasons for the IAT’s weak predictive validity have been proposed. They point to four potentially problematic features: First, the IAT is by no means a pure measure of individual differences in associations but suffers from extraneous influences like recoding. Hence, the predictive validity of IAT-scores should not be confused with the predictive validity of associations. Second, with the IAT, we usually aim to measure evaluation (“liking”) instead of motivation (“wanting”). Yet, behavior might be determined much more often by the latter than the former. Third, the IAT focuses on measuring associations instead of propositional beliefs and thus taps into a construct that might be too unspecific to account for behavior. Finally, studies on predictive validity are often characterized by a mismatch between predictor and criterion (e.g., while behavior is highly context-specific, the IAT usually takes into account neither the situation nor the domain). 
Recent research, however, also revealed advances addressing each of these problems, namely (1) procedural and analytical advances to control for recoding in the IAT, (2) measurement procedures to assess implicit wanting, (3) measurement procedures to assess implicit beliefs, and (4) approaches to increase the fit between implicit measures and behavioral criteria (e.g., by incorporating contextual information). Implicit measures like the IAT hold an enormous potential. In order to allow them to fulfill this potential, however, we have to refine our understanding of these measures, and we should incorporate recent conceptual and methodological advancements. This review provides specific recommendations on how to do so.
Or even more interestingly, and to the point:
Kurdi et al. (2018) wrote: Relationship Between the Implicit Association Test and Intergroup Behavior: A Meta-Analysis
Using data from 217 research reports (N = 36,071, compared to 3,471 and 5,433 in previous meta-analyses), this meta-analysis investigated the conceptual and methodological conditions under which Implicit Association Tests (IATs) measuring attitudes, stereotypes, and identity correlate with criterion measures of intergroup behavior. We found significant implicit–criterion correlations (ICCs) and explicit–criterion correlations (ECCs), with unique contributions of implicit (β = .14) and explicit measures (β = .11) revealed by structural equation modeling. ICCs were found to be highly heterogeneous, making moderator analyses necessary. Basic study features or conceptual variables did not account for any heterogeneity: Unlike explicit measures, implicit measures predicted for all target groups and types of behavior, and implicit, but not explicit, measures were equally associated with behaviors varying in controllability and conscious awareness. However, ICCs differed greatly by methodological features: Studies with a declared focus on ICCs, standard IATs rather than variants, high-polarity attributes, behaviors measured in a relative (two categories present) rather than absolute manner (single category present), and high implicit–criterion correspondence (k = 13) produced a mean ICC of r = .37. Studies scoring low on these variables (k = 6) produced an ICC of r = .02. Examination of methodological properties—a novelty of this meta-analysis—revealed that most studies were vastly underpowered and analytic strategies regularly ignored measurement error. Recommendations, along with online applications for calculating statistical power and internal consistency, are provided to improve future studies on the implicit–criterion relationship.
Methodological Shortcomings of the Reviewed Studies
Statistical power. The power of inferential tests has far-reaching consequences for the validity of statistical inferences (Cohen, 1962; Fraley & Vazire, 2014). Therefore, establishing the power of the studies on ICCs is paramount to diagnosing the overall methodological soundness of this literature. The vast majority of the studies included in the present meta-analysis were underpowered: At 40, the median sample size was surprisingly, perhaps shockingly, low. This sample size is minuscule for probing individual differences and too small to reliably (i.e., with a probability of at least .80) detect any effect below the effect size of r = .43 (Cohen, 1992). Moreover, a sample size of 40 provides only .40 power to detect the mean effect size reported by Greenwald et al. (2009) and .14 power for the mean effect size reported by Oswald et al. (2013). Even though post hoc power tends to overestimate the power of studies for small effect sizes and small sample sizes (Yuan & Maxwell, 2005), median post hoc power of the included studies was found to be only .15 and mean post hoc power was .27.
These low levels of statistical power are worrisome when it comes to the interpretability and inferential value of the vast majority of individual studies conducted on implicit–criterion relationships. We can go so far as to say that many of the studies included in this meta-analysis should never have been undertaken given the potential for incorrect inferences about the population effect size. Low statistical power of individual studies also provides additional justification for this meta-analysis: Due to their ability to pool data from participants across multiple investigations, meta-analyses have the potential to derive valid conclusions about the population effect size and its moderators even when individual studies are underpowered (e.g., Card, 2016).
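The power figures quoted above can be roughly reproduced with the standard Fisher-z approximation for testing whether a Pearson correlation is zero. Note the excerpt does not state the mean effect sizes from Greenwald et al. (2009) and Oswald et al. (2013), so the values r ≈ .27 and r ≈ .15 below are my assumptions, chosen to be consistent with the reported power levels; this is a back-of-the-envelope sketch, not the meta-analysts' exact computation.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def correlation_power(r: float, n: int) -> float:
    """Approximate power of a two-sided alpha = .05 test that a
    Pearson correlation is zero, via the Fisher z-transformation:
    atanh(r_hat) is roughly normal with SD 1/sqrt(n - 3).
    """
    z_crit = 1.96  # two-sided critical value for alpha = .05
    noncentrality = math.atanh(r) * math.sqrt(n - 3)
    # Mass in the opposite rejection tail is negligible for these inputs.
    return normal_cdf(noncentrality - z_crit)

# n = 40 is the median sample size reported by Kurdi et al. (2018).
print(round(correlation_power(0.43, 40), 2))  # 0.8  -> matches the quoted .80 threshold
print(round(correlation_power(0.27, 40), 2))  # 0.39 -> assumed Greenwald-type effect, ~.40 power
print(round(correlation_power(0.15, 40), 2))  # 0.15 -> assumed Oswald-type effect, ~.14 power
```

With only 40 participants, a true correlation in the .15 range is detected less than one time in six, which is what makes the median study in this literature nearly uninformative on its own.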
Basic Study Characteristics: Target Group, Type of Behavior, and Study Setting
Target group.
Regarding the target group variable, two results seem noteworthy (see Figure 1). First, implicit attitudes were significantly associated with behavior across all target categories, with the exception of one category labeled “other intergroup,” which was highly diverse and contained a relatively small number of effect sizes (k = 19). Importantly, this result indicates that ICCs were fairly homogeneous across target group categories. On the other hand, ECCs were found to be more variable by target group than ICCs. For the former, effect sizes ranged from r = .10 (ethnicity) to r = .32 (sexuality), whereas for the latter they ranged from r = .08 (other clinical) to r = .11 (sexuality).
A correlation of around .1 between the associations the IAT detects and actual intergroup behavior (e.g., racial discrimination) is quite low. And it also seems the studies in this literature are often badly underpowered.
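To put those correlations in perspective, the proportion of variance in behavior that a correlation accounts for is its square (under a simple linear model, ignoring measurement error and the moderators discussed above), so even the headline figures translate into small shares of explained variance:

```python
# Squared correlation = proportion of criterion variance accounted
# for by the predictor in a simple linear model.
for label, r in [
    ("Kurdi et al. low-scoring subset", 0.02),
    ("ethnicity ICC discussed above", 0.10),
    ("Kurdi et al. high-scoring subset", 0.37),
]:
    print(f"{label}: r = {r:.2f} -> {100 * r**2:.1f}% of variance")

# Kurdi et al. low-scoring subset: r = 0.02 -> 0.0% of variance
# ethnicity ICC discussed above: r = 0.10 -> 1.0% of variance
# Kurdi et al. high-scoring subset: r = 0.37 -> 13.7% of variance
```

So an r of .10 means IAT scores account for about 1% of the variance in the behavioral criterion, which is why even a statistically significant ICC can be practically uninformative.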
So, even if the IAT does measure what it is claimed to measure (unconscious racial bias), that measure is not strongly correlated with actual discriminatory behavior. As such, it is not a good explanation for why systemic or individual racism exists; more precisely, it is not good evidence that such bias harms the discriminated-against population (even though I think we can agree bigotry hurts those who are being discriminated against).
I will also note that framing the issue in terms of implicit associations already presupposes that individual beliefs or preferences, conscious or unconscious, together with individual behavior, are the cause of institutionalized discrimination; it seems it is not that simple after all. Maybe those beliefs do cause racial discrimination, but the IAT is not a reliable, valid, or predictive measure of them.