Scientific study of the results of higher criticism?

The methods of higher criticism employed in biblical studies and other fields can yield intriguing results, but are the conclusions trustworthy? For ancient texts, the reconstructions produced by redaction or source criticism are speculative, and there is no authoritative standard against which to evaluate them, because we have no independent documentation of the sources or the editorial process. So if two critics reach differing conclusions that lead to a significant divergence in exegesis, there is no way to settle the matter. It becomes "he said, she said" among scholars.

While we cannot know the true history for certain in the case of ancient texts, there is a way to evaluate the various methodologies of higher criticism in general. This would help rank the relative quality of these methods, which would in turn lend confidence to judgements passed on ancient texts.

I propose a study as follows: Two pools of texts are assembled. The first pool would consist of texts known to have been written by a single author, the analogue of a placebo group. The second pool would consist of texts with documented eclectic sources and editorial history (Wikipedia, with its history of edits, would be perfect for this). These texts would then be passed to scholars in a double-blind fashion, and the scholars would be asked to analyse the history of each text. Their analyses could then be scored objectively for precision and recall against the documented history, which would reveal the best scholars and their best practices.
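To make the scoring step concrete, here is a minimal sketch in Python. It rests on assumptions that are mine and not part of any established protocol: each analysis is reduced to a list of sentence indices where the scholar believes a new source or editorial hand begins (a "seam"), the documented history is reduced to the same form, and a predicted seam counts as correct if it falls within a chosen tolerance of a documented one. The function name `score_boundaries` and the `tolerance` parameter are hypothetical.

```python
def score_boundaries(predicted, actual, tolerance=0):
    """Return (precision, recall) for a scholar's predicted seams.

    predicted: sentence indices where the scholar claims a seam.
    actual:    sentence indices where the documented history has one.
    tolerance: how many sentences off a prediction may be and still count.
    """
    predicted = sorted(set(predicted))
    unmatched = sorted(set(actual))
    n_actual = len(unmatched)

    true_positives = 0
    for p in predicted:
        # Greedy matching: each documented seam can be credited only once.
        hit = next((a for a in unmatched if abs(a - p) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            true_positives += 1

    # Convention (an assumption): with nothing predicted, or nothing to
    # find, the corresponding score is taken to be 1.0.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / n_actual if n_actual else 1.0
    return precision, recall


# Edited text: documented seams at sentences 3, 7 and 12.
precision, recall = score_boundaries([3, 8, 15], [3, 7, 12], tolerance=1)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.67

# Single-author text (the placebo pool): any claimed seam is a false positive.
print(score_boundaries([5], []))  # (0.0, 1.0)
```

Under this scheme the single-author pool measures how often a scholar sees seams that are not there (precision), while the edited pool measures how many real seams are found (recall). A real study would need a richer representation, for instance attributing each passage to a specific source instead of only marking boundaries.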

Is anyone aware of such a study having already been published? Are there any problems with the methodology I propose?