Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors
=================================================================================================================================================

* Asbjørn Hróbjartsson
* Ann Sofia Skou Thomsen
* Frida Emanuelsson
* Britta Tendal
* Jørgen Hilden
* Isabelle Boutron
* Philippe Ravaud
* Stig Brorson

## Abstract

**Objective** To evaluate the impact of non-blinded outcome assessment on estimated treatment effects in randomised clinical trials with binary outcomes.

**Design** Systematic review of trials with both blinded and non-blinded assessment of the same binary outcome. For each trial we calculated the ratio of the odds ratios: the odds ratio from non-blinded assessments relative to the corresponding odds ratio from blinded assessments. A ratio of odds ratios <1 indicated that non-blinded assessors generated more optimistic effect estimates than blinded assessors. We pooled the individual ratios of odds ratios with inverse variance random effects meta-analysis and explored reasons for variation in ratios of odds ratios with meta-regression. We also analysed rates of agreement between blinded and non-blinded assessors and calculated the number of patients needed to be reclassified to neutralise any bias.

**Data Sources** PubMed, Embase, PsycINFO, CINAHL, Cochrane Central Register of Controlled Trials, HighWire Press, and Google Scholar.

**Eligibility criteria for selecting studies** Randomised clinical trials with blinded and non-blinded assessment of the same binary outcome.

**Results** We included 21 trials in the main analysis (with 4391 patients); eight trials provided individual patient data. Outcomes in most trials were subjective, for example, qualitative assessment of the patient’s function. The ratio of the odds ratios ranged from 0.02 to 14.4.
The pooled ratio of odds ratios was 0.64 (95% confidence interval 0.43 to 0.96), indicating an average exaggeration of the non-blinded odds ratio by 36%. We found no significant association between low ratios of odds ratios and scores for outcome subjectivity (P=0.27); non-blinded assessor’s overall involvement in the trial (P=0.60); or outcome vulnerability to non-blinded patients (P=0.52). Blinded and non-blinded assessors agreed in a median of 78% of assessments (interquartile range 64-90%) in the 12 trials with available data. The exaggeration of treatment effects associated with non-blinded assessors was induced by the misclassification of a median of 3% of the assessed patients per trial (1-7%).

**Conclusions** On average, non-blinded assessors of subjective binary outcomes generated substantially biased effect estimates in randomised clinical trials, exaggerating odds ratios by 36%. This bias was compatible with a high rate of agreement between blinded and non-blinded outcome assessors and driven by the misclassification of few patients.

## Introduction

The randomised clinical trial is regarded as the most valid method for assessing the benefits and harms of healthcare interventions.1 One challenge to the validity of such trials is the tendency for assessments of outcomes to systematically deviate from the truth because of predispositions in observers, such as from hope or expectation.2 Such observer bias, also called ascertainment bias or detection bias, might be especially important when outcome assessors have strong predispositions and when outcomes are subjective, that is, when they involve personal judgment such as with qualitative scores or pattern recognition of images. Conversely, observer bias might have little practical importance when neutral assessors evaluate an objective outcome, such as death.

Many trials use blinded outcome assessors to avoid bias, though use of non-blinded outcome assessors is also common,3 4 especially in non-pharmacological trials.
For example, one study of orthopaedic trauma trials reported that blinded outcome assessment had not been implemented in 90% of trials.3 It is an empirical question to which degree the estimated effects of experimental interventions in randomised trials are affected by lack of blinding of the outcome assessors and which factors influence the degree of bias. The most reliable way of studying the impact of non-blinded outcome assessors is to analyse trials that use both blinded and non-blinded assessors for the same outcome. One such trial by Noseworthy and colleagues is often cited, reporting that the effect of plasma exchange for multiple sclerosis was significant only with assessments by non-blinded neurologists.5 The finding, however, was inconsistent across time points, seen only for one of the two experimental interventions, and might be atypical. Other studies have been based on indirect comparisons with a considerable risk of confounding.6 7 8 It is prudent to suspect possible bias in trials with non-blinded assessors. Existing analyses, however, do not provide a reliable assessment of the typical degree of observer bias in randomised clinical trials. Thus, it is not clear whether observer bias in clinical trials, on average, is negligible or large or how variable the size and direction of any observer bias is or which factors in a trial are associated with a more pronounced degree of bias. A reliable evaluation of the impact of non-blinded outcome assessors in randomised clinical trials is important, both to guide the design of future trials and to assist the balanced interpretation of trial results—for example, in the assessment of the risk of bias in trials for meta-analysis.1 It also seems important for evidence based medicine to strengthen its own evidence base. 
We systematically reviewed randomised trials with blinded and non-blinded assessors of binary outcomes to evaluate the impact of non-blinded outcome assessment on estimated treatment effects in randomised clinical trials and to examine reasons for its variation.

## Methods

We included randomised clinical trials with blinded and non-blinded assessment of the same binary outcome. We excluded trials where it was unclear which group was experimental and which was control as such trials would not allow us to determine the direction of any bias; trials in which only a subgroup of patients had been evaluated by blinded and non-blinded assessors, unless they were selected at random; trials in which blinded and non-blinded assessors had access to each other’s results (for example, blinded assessments were provided to non-blinded assessors as a quality enhancement procedure); and trials where initially blinded assessors clearly had become unblinded, for example, when radiographs showed ceramic material indicative of the experimental intervention. Finally, we excluded trials with blinded end point committees adjudicating the assessments made by non-blinded clinicians because such adjudication often involves previous knowledge of the non-blinded assessment or is restricted to adjudication of events only.

We searched standard databases (PubMed, Embase, PsycINFO, CINAHL, Cochrane Central Register of Controlled Trials) and full text databases (HighWire Press and Google Scholar). Our core search string was `random* AND (“blind* and unblind*” OR “masked and unmasked”)`, with variations according to the specific database (see appendix on bmj.com). The last search was performed on 26 January 2010. We read the references of all included trials and asked authors of all the included trials if they knew of other trials.

One author (ASST) read all abstracts from standard databases and all text fragments from full text databases.
If a study was potentially eligible, one author (ASST or AH) retrieved the full study report and excluded ineligible studies. Two authors (AH and ASST, SB, or BT) decided on the eligibility of the remaining studies. Disagreements were resolved by discussion. We selected one binary outcome from each trial. If several outcomes had been assessed by both blinded and non-blinded assessors we selected the primary outcome of the trial, and if none was stated we selected the outcome we found most clinically relevant. We included the first assessment after the end of treatment, unless the primary outcome prescribed a different time point. Two authors (AH and either SB or BT) selected the outcome independently. Disagreements were resolved by discussion. For trials with more than two groups, we pooled the results in the experimental or the control groups. We extracted background data for each trial (ASST and FE or AH and SB) and outcome data from each trial (AH and SB or BT): total number of failures and total number of successes in each group resulting from the blinded assessment and the non-blinded assessment. When possible we also extracted paired patient level data on blinded and non-blinded assessments, and constructed a 2×2 table (failure/success×blind/non-blind) for the experimental group and a corresponding table for the control group. Data from split body designed trials were treated as if they derived from parallel group trials. If data were incomplete, we emailed the corresponding author and, if necessary, at least one additional author, followed up by telephone calls, and at least two reminders. Authors were asked whether they would share unpublished data with our group. We also searched the Food and Drug Administration (FDA) website for such data. 
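The paired 2×2 table described above (failure/success by blinded/non-blinded assessment, one table per allocation group) can be sketched as a small data structure. This is an illustrative Python sketch only: the class name `PairedTable`, its field names, and any counts passed to it are assumptions made for the example, not the authors' data or their Stata code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedTable:
    """Paired assessments for one allocation group (hypothetical layout).

    Each field counts patients by (blinded result, non-blinded result).
    """
    fail_fail: int        # failure according to both assessors
    fail_success: int     # blinded: failure, non-blinded: success
    success_fail: int     # blinded: success, non-blinded: failure
    success_success: int  # success according to both assessors

    @property
    def total(self) -> int:
        return (self.fail_fail + self.fail_success
                + self.success_fail + self.success_success)

    @property
    def blinded_failures(self) -> int:
        # Marginal count of failures as seen by the blinded assessor.
        return self.fail_fail + self.fail_success

    @property
    def non_blinded_failures(self) -> int:
        # Marginal count of failures as seen by the non-blinded assessor.
        return self.fail_fail + self.success_fail

    def agreement(self) -> float:
        """Proportion of patients given the same result by both assessors."""
        if self.total == 0:
            raise ValueError("table contains no patients")
        return (self.fail_fail + self.success_success) / self.total
```

The marginal failure counts feed the blinded and non-blinded odds ratios, while the diagonal cells give the rate of agreement; one such table would be built for the experimental group and one for the control group.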
When authors chose to send us individual patient data (that is, all randomised patients listed by allocation group and result of blinded and non-blinded assessment), we checked whether all randomised patients were included in the dataset and tried to replicate a table or a main result of the published paper. Two authors (AH and BT or SB) independently derived outcome data. Any discrepancy was solved by discussion. We sent our results to the authors of the trial for comments. For each trial, we evaluated five prespecified potential confounders in the comparison between blinded and non-blinded outcome assessments: a considerable time difference between these two assessments, different types of assessors (such as nurses *v* physicians), different types of procedures (such as direct visual assessment of wounds *v* assessment of photographs of wound), a substantial risk of ineffective blinding procedure, and non-identical groups of patients assessed (such as a few patients evaluated only by the blinded outcome assessor). For 16 trials, two masked authors (IB and PR) independently evaluated the first four items at a different location from the rest of the group. Other masked authors (AH and BT or SB) scored five trials. Disagreements were resolved by discussion. The masking was implemented by manipulating pdfs of the trial reports so that tables, graphs, or text describing results of any comparison between blinded and non-blinded assessors were blanked out. There were no cases of accidental unmasking. Using the same masking procedure, we also evaluated characteristics of each outcome assessment. 
Two authors (mainly IB and PR) independently scored three factors on a scale from 1 (low) to 5 (high): the degree of outcome subjectivity (that is, the degree of assessor judgment, high in assessment of global improvement and low in reading a laboratory sheet); the non-blinded outcome assessor’s overall involvement in the trial (that is, a proxy for the degree of personal preference for a result favourable to the experimental intervention); and the vulnerability of the outcome to the reporting and behaviour of non-blinded patients (as they might influence results considerably when outcomes are based on interviews and less so when outcomes are based on pure observations, such as inspection of radiographs). Disagreements were resolved by discussion.

We calculated the odds ratio for failures (such as an unhealed wound) in each trial for both the blinded and non-blinded assessments. An odds ratio under 1 indicates a beneficial effect of the experimental intervention. For each trial we summarised the impact of non-blinded outcome assessment as the ratio of the odds ratios ($\mathrm{OR}_{\text{non-blind}} / \mathrm{OR}_{\text{blind}}$). A ratio <1 indicates that non-blinded assessments are more optimistic. We meta-analysed the individual trial ratios of odds ratios with inverse variance methods using random-effects models.9 The standard error of the ratio of odds ratios used for the main analysis disregarded the dependency between blinded and non-blinded assessments. The statistical software we used was Stata 11.

We tested the robustness of our main analysis of the ratios of the odds ratios in sensitivity analyses. Specifically, we used standard errors that took account of the dependence between blinded and non-blinded assessments (see appendix on bmj.com), gave all trials equal weight, and repeated the analysis on the basis of the ratio of risk ratios, as some readers might find risk ratios easier to interpret than odds ratios.
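The per-trial ratio of odds ratios described above can be sketched as follows. This is a minimal Python illustration, not the authors' Stata code: the function names and the tuple layout are assumptions, the variance of each log odds ratio uses the standard formula (sum of reciprocal cell counts), and the two assessments are treated as independent, as in the main analysis. Zero cells are rejected rather than corrected.

```python
import math


def log_odds_ratio(fail_exp, succ_exp, fail_ctrl, succ_ctrl):
    """Return (log odds ratio for failure, its variance) for one 2x2 table."""
    cells = (fail_exp, succ_exp, fail_ctrl, succ_ctrl)
    if any(c <= 0 for c in cells):
        # A continuity correction would be needed here; not shown in this sketch.
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((fail_exp * succ_ctrl) / (succ_exp * fail_ctrl))
    variance = sum(1.0 / c for c in cells)
    return log_or, variance


def ratio_of_odds_ratios(non_blind, blind, z=1.96):
    """Return (ROR, lower, upper) with ROR = OR_non-blind / OR_blind.

    Each argument is (fail_exp, succ_exp, fail_ctrl, succ_ctrl).
    The standard error ignores the dependence between the two assessments.
    """
    log_nb, var_nb = log_odds_ratio(*non_blind)
    log_b, var_b = log_odds_ratio(*blind)
    log_ror = log_nb - log_b
    se = math.sqrt(var_nb + var_b)  # independence assumption
    return (math.exp(log_ror),
            math.exp(log_ror - z * se),
            math.exp(log_ror + z * se))
```

A result below 1 means the non-blinded assessment gave the more optimistic odds ratio. Because blinded and non-blinded assessments of the same patients are positively correlated, the independence assumption would tend to overstate the standard error.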
We studied whether the effect differed in subgroups of trials involving various types of data; clinical problems; objectives, designs, and sources of funding; and type of non-blinded assessor; and according to risk of confounding. We also evaluated the influence of small sample size on estimated ratio of risk ratios by funnel plot inspection.1

We furthermore explored whether the variation in ratio of odds ratios was associated with the three prespecified outcome characteristics described above by random effects meta-regression of log ratio with the scores for each outcome characteristic.

To analyse the pattern of misclassifications underlying any difference between the blinded and non-blinded outcome assessments we compared the total number of failure events during non-blinded and blinded assessments in the experimental and in the control group and also compared the rate of agreement between blinded and non-blinded assessments in each trial. Finally, we calculated how many reclassifications of non-blinded assessments were needed to neutralise a difference between the blinded and non-blinded treatment effects, that is, to drive the ratio of odds ratios to 1 (see appendix on bmj.com).

## Results

We examined 537 publications based on 1835 hits in standard databases and 2200 hits in full text databases. We excluded 512 studies, mostly because they were not randomised clinical trials or lacked blinded or non-blinded outcome assessment (see appendix on bmj.com).
Thus, we included 25 trials (tables 1 and 2).10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 Of the 25 trials, six published outcomes for both types of assessments usable for our analysis.16 19 23 25 26 29 Contact with authors and searches of the FDA website increased the number of trials with outcome data to 21 (4391 randomised patients), of which eight trials provided individual patient data.11 12 13 14 15 21 23 24 Thirteen trials had strictly paired data (all patients had been assessed both by blinded and by non-blinded assessors), and eight trials provided predominantly paired data as a minority of patients had been assessed by only one type of assessor (see appendix on bmj.com).

[Table 1](https://www.bmj.com/content/344/bmj.e1119/T1): Characteristics of 25 included randomised clinical trials in study of effect of blinded or non-blinded outcome assessors

[Table 2](https://www.bmj.com/content/344/bmj.e1119/T2): Characteristics of conditions for outcome assessments in trials with blinded or non-blinded outcome assessors

In ten trials the validity of the non-blinded assessments was tested against the blinded assessments or non-blinded assessments were used as backup for missing blinded data.10 11 15 19 20 21 22 23 24 30 In four trials the main focus of the paper or abstract was a direct comparison between blinded and non-blinded outcome assessment, but it is unclear whether this was the original reason for using dual type assessors.21 23 24 30 In one trial refinement of the methods implied addition of blinded assessments without omission of the initially planned non-blinded assessments.26

Fifteen of the 21 trials (71%) studied the effect of surgery or a procedure, 19 were parallel group trials (90%), and the median sample size was 172 (10th-90th centile 35-368).
The trials were conducted in general surgery, orthopaedic surgery, plastic surgery, cardiology, gynaecology, anaesthesiology, neurology, psychiatry, dermatology, otolaryngology, infectious diseases, and ophthalmology (table 1). The outcomes of the trials were in most cases subjective, for example, qualitative assessments of patients’ function (such as severity of angina or neurological deficit) or assessment of healing status (such as wounds or ulcers or fractures) (table 2). Seventeen trials (81%) scored 4 or 5 for outcome subjectivity on the 1 to 5 scale.

The odds ratio point estimate was more optimistic when based on the non-blinded assessors in 15 trials (71%) (fig 1). The ratio of odds ratios in the 21 trials ranged from 0.02 to 14.4 (fig 2). The pooled ratio of odds ratios was 0.64 (95% confidence interval 0.43 to 0.96) with moderate heterogeneity (I²=45%, P=0.015). Thus, on average, the odds ratios based on non-blinded assessments were exaggerated by 36% compared with the odds ratios based on blinded assessments.

**Fig 1** Estimated intervention effect according to blinded or non-blinded outcome assessor ([view figure](https://www.bmj.com/content/344/bmj.e1119/F1))

**Fig 2** Impact of non-blinded outcome assessors on estimated intervention effects in randomised clinical trials measured as ratio of odds ratios (odds ratio based on non-blinded outcome assessors divided by odds ratio based on blinded outcome assessors) ([view figure](https://www.bmj.com/content/344/bmj.e1119/F2))

Individual patient data provided 48% of the weight of the main analysis. The main result was robust, though sensitivity and subgroup analyses in general had wide confidence intervals (table 3).
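The pooling step behind figures such as the pooled ratio of odds ratios and I² can be sketched with the DerSimonian and Laird random effects estimator cited in the Methods.9 The Python below is an illustrative sketch under stated assumptions: the function names and the returned dictionary keys are hypothetical, the inputs are per-trial log ratios of odds ratios with their standard errors, and it is not the authors' Stata analysis.

```python
import math


def dersimonian_laird(log_effects, std_errors, z=1.96):
    """Inverse variance random effects pooling (DerSimonian-Laird)."""
    if len(log_effects) != len(std_errors) or not log_effects:
        raise ValueError("need one standard error per effect")
    w = [1.0 / se ** 2 for se in std_errors]          # fixed effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    df = len(log_effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-trial variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_effects)) / sw_star
    se_pooled = math.sqrt(1.0 / sw_star)
    i2 = max(0.0, (q - df) / q * 100.0) if q > 0 else 0.0
    return {
        "ror": math.exp(pooled),
        "ci_low": math.exp(pooled - z * se_pooled),
        "ci_high": math.exp(pooled + z * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }


def percent_exaggeration(ror):
    """Percentage by which the non-blinded odds ratio is exaggerated."""
    return (1.0 - ror) * 100.0
```

Under this reading, a pooled ratio of odds ratios of 0.64 corresponds to (1 − 0.64) × 100 = 36%, which is how the 36% exaggeration follows from the pooled estimate.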
In the 12 trials with data on the dependence between blinded and non-blinded assessments the pooled ratio of odds ratios was 0.76 (0.61 to 0.94). In these 12 trials, the standard error accounting for the dependence was a median of 25% smaller than the corresponding standard errors assuming independence. Reducing the standard errors of the nine additional trials (without data on the dependence between blinded and non-blinded assessments) by 25% resulted in a pooled ratio of odds ratios of 0.64 (0.44 to 0.93).

No trial was free from any of the five predefined possible confounders, but results were not clearly affected (table 3). The funnel plot was symmetrical on visual inspection (data not shown). Based on a qualitative assessment, the results in the four trials with incomplete or unclear outcome data did not seem to differ from the results in the trials we did meta-analyse (see appendix on bmj.com).

[Table 3](https://www.bmj.com/content/344/bmj.e1119/T3): Sensitivity and subgroup analyses with ratio of odds ratios (ROR) between trials with blinded or non-blinded outcome assessors

Meta-regression analyses showed no significant association between low ratios of odds ratios and scores for outcome subjectivity (P=0.27), non-blinded outcome assessor’s overall involvement in the trial (P=0.60), or outcome vulnerability to the reporting and behaviour of non-blinded patients (P=0.52). The slope of the regression line between log ratio of odds ratios and scores for outcome subjectivity, however, was in the expected direction. The 17 trials with clearly subjective outcomes (scores 4-5 on a 1-5 scale) had a pooled ratio of odds ratios of 0.55 (0.32 to 0.95). The four trials with moderately subjective outcomes (scores 2-3) had a pooled ratio of 0.93 (0.56 to 1.54).

The pattern of misclassifications underlying the difference between blinded and non-blinded results was characterised by more optimistic non-blinded assessments.
The non-blinded assessors detected 26% fewer failure events (such as no wound healing) compared with the blinded assessors (984 *v* 1335). In the intervention groups the non-blinded assessors detected 35% fewer patients with treatment failure than the blinded assessors (421 *v* 649 events), whereas in the control group the proportion was 18% (563 *v* 686 events).

The pattern of misclassifications was also characterised by a preoccupation with the intervention group. In the 12 trials with data on agreement, the blinded and non-blinded assessors agreed in a median of 78% of patient assessments (interquartile range 63-91%). The proportion of concordant assessments, and the corresponding proportion of discordant assessments, however, seemed to differ according to the allocation group. The median proportion of discordant assessments between blinded and non-blinded assessors per trial was 28% (9-41%) in the intervention group and 16% (9-37%) in the control group (see appendix on bmj.com).

The number of reclassified assessments per trial needed to neutralise a difference between the estimated blinded and non-blinded treatment effects (that is, to drive the ratio of odds ratios to 1.00) ranged from 0 to 41.7, with a median of 2.5. This corresponded to 0-28% of the assessed patients per trial, with a median of 3% (see appendix on bmj.com).

## Discussion

The estimated effects of experimental interventions in randomised clinical trials tended to be considerably more optimistic when they were based on non-blinded assessment of subjective outcomes compared with blinded assessment. The pooled ratio of odds ratios was 0.64 (0.43 to 0.96), indicating that the non-blinded outcome assessors generated odds ratios that, on average, were exaggerated by 36%. We interpret this as empirical evidence for substantial observer bias.

### Strengths and weaknesses of the study

This result is based on contemporary trials representing a fair range of clinical specialties.
The unique trial design with paired data implies a low risk of confounding. The data were high quality, as individual patient data provided about half of the weight of the main analysis. Our results were robust to modifications to both type of analysis and summary statistic. For example, the ratio of relative risks was 0.78 (0.63 to 0.96), indicating that non-blinded outcome assessors generated relative risks that, on average, were exaggerated by 22%. We possibly did not identify all trials but we do not know whether they would report markedly different results. Publication bias is normally driven by the effect of a treatment35 and has less impact on our comparison between two types of assessments. Four trials in our study published papers with a main focus on observer bias. Though confidence intervals were wide, these four trials did not report significantly different findings compared with the 17 other trials. Our cohort of trials is not representative of medical trials in general. We included no trials with clearly objective outcomes, such as total mortality. The trials we did include had mainly subjective outcomes—such as qualitative assessments of patients and evaluation of fracture or wound healing—and our result is applicable to trials with similar subjective outcomes. We would anticipate less observer bias with more objective outcomes, though it is an interesting question which medical outcomes should be considered clearly objective, apart from total mortality and some laboratory outcomes. Furthermore, the extrapolation of our results to randomised trials with binary subjective outcomes hinges on the assumption that the degree of observer bias in our trials with dual observation of outcomes is essentially similar to trials with only non-blinded observers. We found no association between observer bias and five prespecified potential confounders. 
A special concern, however, is consensus classifications that could reduce observer variability and leave less room for observer bias. The only trial with consensus based non-blinded assessments11 found no observer bias (ratio of odds ratios 1.06, 0.79 to 1.43). It is unclear whether this is caused by the consensus classification, chance, or other trial characteristics. We included one trial with probable reversed direction of bias.17 The trial compared an experimental oral prodrug, valganciclovir, for cytomegalovirus retinitis with the intravenous version of the same substance, ganciclovir. The comparison between non-blinded and blinded outcome resulted in a ratio of odds ratios that was extreme, but in the reversed direction. Comparable retinitis trials, also with blinded and non-blinded assessors, have reported similar results favouring the control intervention on time to event outcomes.36 We included the trial in our main analysis without reversing the direction of bias. Had we done so, the pooled ratio of odds ratios would have been 0.57 (0.39 to 0.84), indicating an average exaggeration of the effect estimate by 43%. Several previous studies have compared treatment effects in “double blind” trials with similar trials not reported as “double blind.”7 8 An overview of seven such studies reported a pooled ratio of odds ratios of only 0.91 (0.83 to 1.00).7 Wood and colleagues’ reanalysis of three of the studies reported a similar overall result but with a ratio of odds ratios of 0.75 (0.61 to 0.93) for subjective outcomes.8 These studies do not directly evaluate the impact of blinded outcome assessors, are partly based on ambiguous terminology,3 37 and involve a considerable risk of confounding. 
Still, our findings are numerically roughly similar to those of Wood and colleagues.8

### Mechanisms of observer bias

The pattern of misclassifications underlying the observer bias can be characterised by “optimism error” and “intervention preoccupation.” The non-blinded assessors detected fewer failures than blinded assessors. This optimism error, however, was much more pronounced in the intervention group than in the control group. Thus, the non-blinded outcome assessor did not “under-rate” patients in the control group and “over-rate” patients in the intervention group. Both groups were over-rated but the intervention group considerably more so.

A third important feature of observer bias is the striking contrast between the substantial degree of observer bias we found and the surprisingly small number of misclassified patients needed to generate this bias. The median number of patients needed to be reclassified to neutralise bias in a trial was 2.5 or 3% of the assessed patients. The difference between numbers of events in the experimental group and the control group determines the estimated effect. Numbers of events are usually considerably smaller than the number of included patients, and still smaller is the number of misclassifications needed to bias the estimated effect. For example, in the trial by Noseworthy and colleagues,5 21 the ratio of odds ratios was 0.81 (0.40 to 1.61). This degree of bias was neutralised by reclassification of two of the 140 included patients. Binary outcomes seem sensitive to directional misclassifications of a few patients.

Fundamentally, observer bias is caused by the predispositions of the observers, which might vary unpredictably from trial to trial. Our cohort of trials probably consists of some trials with largely neutral assessors and some trials with predisposed assessors. The expected degree of observer bias in trials with predisposed assessors will be considerably larger than our averaged result.
Thus, in any individual trial it is not possible to safely predict either the direction or the size of any bias. We would advise against using our pooled average as a simplistic correction factor. When the possible bias in a trial with non-blinded assessors is ascertained, the range of possible observer bias should be taken into account and not only our pooled average. Furthermore, it would be prudent to also consider the type of outcome involved and any indicators for predispositions in assessors.

### Implications

Blinding outcome assessors might be seen as too cumbersome, unnecessary, or directly mistaken38 39; compared with the huge logistical challenges involved in setting up a trial, however, it is a minor procedure and one that improves reliability considerably. Fortunately, blinding the assessor is possible in nearly all trials, sometimes after the development of creative blinding procedures.40 41 In some trials a subsample of patients is blindly assessed and the result used to validate non-blinded assessments. Such comparisons are inherently underpowered and should be avoided.

Our result strengthens the hypothesis that blinding can also be important for other key people in a trial, especially patients,42 who can be seen as privileged outcome assessors of their own symptoms. Still, it is important to separately study the impact of blinding each key person. For example, one study found little impact of blinded outcome adjudicators in 10 large cardiovascular trials.43

We found no significant association between the degree of observer bias and degree of outcome subjectivity, though the association was in the expected direction. Future investigations could further analyse the role of outcome subjectivity and other factors that could modify the degree of observer bias.

The problem of observer bias goes beyond the randomised clinical trial.
Comparisons between blinded and non-blinded observers in other types of empirical investigations have reported results indicative of observer bias, for example, in an observational study of patients with primary dystonia,44 an evaluation of cancer staging,45 an assessment of surgical skills,46 and a neurophysiological laboratory study.47 Furthermore, observer bias has been reported or discussed within veterinary science,48 forensic science,49 special education studies,50 animal behaviour research,51 and broadly within psychology.52 53 Observation is fundamental to scientific activity; observer bias might be too.

In conclusion, randomised clinical trials with non-blinded assessors of subjective binary outcomes will, on average, generate substantially biased estimates of treatment effects. The bias is compatible with a high rate of agreement between blinded and non-blinded assessments and is driven by the misclassification of a few patients.

#### What is already known on this topic

* Non-blinded assessors of binary outcomes are used in many randomised trials
* It is prudent to suspect bias in randomised clinical trials with non-blinded outcome assessors
* The typical impact of non-blinded outcome assessors on trial results is unclear, partly because previous studies have been based on indirect comparisons with high risk of confounding

#### What this study adds

* Estimated effects in randomised clinical trials, measured as odds ratios, are exaggerated by an average of 36% when based on non-blinded assessments of subjective binary outcomes
* The bias is compatible with a high rate of agreement between blinded and non-blinded outcome assessors and driven by the misclassification of few patients

## Notes

**Cite this as:** *BMJ* 2012;344:e1119

## Footnotes

* We thank the following researchers for sharing unpublished outcome data with us: Andrew Jull and the Clinical Trials Research Unit (CTRU) at the University of Auckland, Alexandre Valentin-Opran, Nicky Cullum, Tim Reynolds, Peggy Vandervoort, and George C Ebers (individual patient data); and Daniel Burkhoff, Amy P Murtha, and Cheryl Iglesia (detailed outcome data). We also thank Peter C Gøtzsche for valuable comments on previous versions of the manuscript.
* Contributors: AH conceived the idea and design, organised the study, and wrote the first draft of the manuscript. ASST and AH developed the search strategy. ASST, FE, BT, SB, and AH did the non-masked data collection. IB and PR did the masked data collection, supplemented by SB, BT, and AH. AH and JH did the statistical analyses. All authors discussed the result and commented on the manuscript. AH is guarantor.
* Funding: The study was partially funded by the Danish Council for Independent Research: Medical sciences. The funder had no influence on the study design; the collection, analysis, and interpretation of data; the writing of the article; or the decision to submit it for publication.
* Competing interests: All authors have completed the ICMJE uniform disclosure form at [www.icmje.org/coi\_disclosure.pdf](http://www.icmje.org/coi_disclosure.pdf) (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
* Ethical approval: Not required.
* Data sharing: Statistical code and dataset available from the corresponding author.

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license.
See: [http://creativecommons.org/licenses/by-nc/2.0/](http://creativecommons.org/licenses/by-nc/2.0/) and [http://creativecommons.org/licenses/by-nc/2.0/legalcode](http://creativecommons.org/licenses/by-nc/2.0/legalcode).

## References

1. Higgins JPT, Green S, eds. Cochrane handbook for systematic reviews of interventions. Version 5.0.0. Cochrane Collaboration, 2008.
2. Rosenthal R. Experimenter effects in behavioral research. Appleton-Century-Crofts, 1966:13-4.
3. Karanicolas PJ, Bhandari M, Taromi B, Akl EA, Bassler D, Alonso-Coello P, et al. Blinding of outcomes in trials of orthopedic trauma: an opportunity to enhance the validity of clinical trials. J Bone Joint Surg Am 2008;90:1026-33. doi:10.2106/JBJS.G.00963. PMID:18451395.
4. Haahr M, Hróbjartsson A. Who is blind in randomised clinical trials? An analysis of 200 trials and a survey of authors. Clin Trials 2006;3:360-5.
5. Noseworthy JH, Ebers GC, Vandervoort MK, Farquhar RE, Yetisir E, Roberts R. The impact of blinding on the results of a randomized, placebo-controlled multiple sclerosis clinical trial. Neurology 1994;44:16-20.
6. Poolman RW, Struijs PA, Krips R, Sierevelt IN, Marti RK, Farrokhyar F, et al. Reporting of outcomes in orthopedic randomized trials: does blinding of outcome assessors matter? J Bone Joint Surg Am 2007;89:550-8. doi:10.2106/JBJS.F.00683. PMID:17332104.
7. Pildal J, Hróbjartsson A, Jørgensen KJ, Hilden J, Altman DG, Gøtzsche PC. Impact of allocation concealment on conclusions drawn from meta-analyses of randomised trials. Int J Epidemiol 2007;36:847-57.
8. Wood L, Egger M, Gluud LL, Schulz KF, Jüni P, Altman DG, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008;336:601-5.
9. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986;7:177-88. doi:10.1016/0197-2456(86)90046-2. PMID:3802833.
10. Burkhoff D, Schmidt S, Schulman SP, Myers J, Resar J, Becker LC, et al. Transmyocardial laser revascularisation compared with continued medical therapy for treatment of refractory angina pectoris: a prospective randomised trial. ATLANTIC Investigators. Angina Treatments-Lasers and Normal Therapies in Comparison. Lancet 1999;354:885-90. doi:10.1016/S0140-6736(99)08113-1. PMID:10489946.
11. Dumville JC, Worthy G, Bland JM, Cullum N, Dowson C, Iglesias C, et al. Larval therapy for leg ulcers (VenUS II): randomised controlled trial. BMJ 2009;338:b773.
12. Govender S, Csimma C, Genant HK, Valentin-Opran A, Amit Y, Arbel R, et al. Recombinant human bone morphogenetic protein-2 for treatment of open tibial fractures: a prospective, controlled, randomized study of four hundred and fifty patients. J Bone Joint Surg Am 2002;84-A:2123-34.
13. Jones AL, Bucholz RW, Bosse MJ, Mirza SK, Lyon TR, Webb LX, et al. Recombinant human BMP-2 and allograft compared with autogenous bone graft for reconstruction of diaphyseal tibial fractures with cortical defects: a randomized, controlled trial. J Bone Joint Surg Am 2006;88:1431-41. doi:10.2106/JBJS.E.00381. PMID:16818967.
14. Aro HT, Govender S, Patel AD, Hernigou P, Perera de Gregorio A, Popescu GI, et al. Recombinant human bone morphogenetic protein-2: a randomized trial in open tibial fractures treated with reamed nail fixation. J Bone Joint Surg Am 2011;93:801-8. doi:10.2106/JBJS.I.01763. PMID:21454742.
15. Jull A, Walker N, Parag V, Molan P, Rodgers A, for the Honey as Adjuvant Leg Ulcer Therapy Trial Collaborators. Randomized clinical trial of honey-impregnated dressings for venous leg ulcers. Br J Surg 2008;95:175-82. doi:10.1002/bjs.6059. PMID:18161896.
16. Landsman AS, Robbins AH, Angelini PF, Wu CC, Cook J, Oster M, et al. Treatment of mild, moderate, and severe onychomycosis using 870- and 930-nm light exposure. J Am Podiatr Med Assoc 2010;100:166-77. PMID:20479446.
17. Martin DF, Sierra-Madero J, Walmsley S, Wolitz RA, Macey K, Georgiou P, et al. A controlled trial of valganciclovir as induction therapy for cytomegalovirus retinitis. N Engl J Med 2002;346:1119-26. doi:10.1056/NEJMoa011759. PMID:11948271.
18. Meltzer HY, Alphs L, Green AI, Altamura AC, Anand R, Bertoldi A, et al. Clozapine treatment for suicidality in schizophrenia: International Suicide Prevention Trial (InterSePT). Arch Gen Psychiatry 2003;60:82-91. doi:10.1001/archpsyc.60.1.82. PMID:12511175.
19. Miller RS, Steward DL, Tami TA, Sillars MJ, Seiden AM, Shete M, et al. The clinical effects of hyaluronic acid ester nasal dressing (Merogel) on intranasal wound healing after functional endoscopic sinus surgery. Otolaryngol Head Neck Surg 2003;128:862-9.
20. Murtha AP, Kaplan AL, Paglia MJ, Mills BB, Feldstein ML, Ruff GL. Evaluation of a novel technique for wound closure using a barbed suture. Plast Reconstr Surg 2006;117:1769-80. doi:10.1097/01.prs.0000209971.08264.b0. PMID:16651950.
21. Noseworthy JH, Vandervoort MK, Penman M, Ebers G, Shumak K, Seland TP, et al. Cyclophosphamide and plasma exchange in multiple sclerosis. Lancet 1991;337:1540-1. doi:10.1016/0140-6736(91)93226-Y. PMID:1675382.
22. Oesterle SN, Sanborn TA, Ali N, Resar J, Ramee SR, Heuser R, et al. Percutaneous transmyocardial laser revascularisation for severe angina: the PACIFIC randomised trial. Potential Class Improvement From Intramyocardial Channels. Lancet 2000;356:1705-10. doi:10.1016/S0140-6736(00)03203-7. PMID:11095257.
23. Reynolds T, Russell L, Deeth M, Jones H, Birchall L. A randomised controlled trial comparing Drawtex with standard dressings for exuding wounds. J Wound Care 2004;13:71-4. PMID:14999992.
24. Reynolds T, Russell L. Evaluation of a wound dressing using different research methods. Br J Nurs 2004;13:S21-4. PMID:15228025.
25. Waibel KH, Golding H, Manischewitz J, King LR, Tuchscherer M, Topolski RL, et al. Clinical and immunological comparison of smallpox vaccine administered to the outer versus the inner upper arms of vaccinia-naive adults. Clin Infect Dis 2006;42:e16-20.
26. Brandstrup B, Tønnesen H, Beier-Holgersen R, Hjortsø E, Ørding H, Lindorff-Larsen K, et al. Effects of intravenous fluid restriction on postoperative complications: comparison of two perioperative fluid regimens: a randomized assessor-blinded multicenter trial. Ann Surg 2003;238:641-8. doi:10.1097/01.sla.0000094387.50865.23. PMID:14578723.
27. Smith S, Busso M, McClaren M, Bass LS. A randomized, bilateral, prospective comparison of calcium hydroxylapatite microspheres versus human-based collagen for the correction of nasolabial folds. Dermatol Surg 2007;33(suppl 2):S112-21. doi:10.1111/j.1524-4725.2007.33350.x. PMID:18086048.
28. Medicis Aesthetics. FDA PMA P40024/s051 executive summary. 2011. [www.fda.gov/downloads/AdvisoryCommittees/CommitteesMeetingMaterials/MedicalDevices/MedicalDevicesAdvisoryCommittee/GeneralandPlasticSurgeryDevicesPanel/UCM252572.pdf](http://www.fda.gov/downloads/AdvisoryCommittees/CommitteesMeetingMaterials/MedicalDevices/MedicalDevicesAdvisoryCommittee/GeneralandPlasticSurgeryDevicesPanel/UCM252572.pdf).
29. Dover JS, Rubin MG, Bhatia AC. Review of the efficacy, durability, and safety data of two nonanimal stabilized hyaluronic acid fillers from a prospective, randomized, comparative, multicenter study. Dermatol Surg 2009;35(suppl 1):322-31. doi:10.1111/j.1524-4725.2008.01060.x. PMID:19207321.
30. Iglesia CB, Sokol AI, Sokol ER, Kudish BI, Gutman RE, Peterson JL, et al. Vaginal mesh for prolapse: a randomized controlled trial. Obstet Gynecol 2010;116:293-303. doi:10.1097/AOG.0b013e3181e7d7f8. PMID:20664388.
31. Kadish A, Nademanee K, Volosin K, Krueger S, Neelagaru S, Raval N, et al. A randomized controlled trial evaluating the safety and efficacy of cardiac contractility modulation in advanced heart failure. Am Heart J 2011;161:329-37. doi:10.1016/j.ahj.2010.10.025. PMID:21315216.
32. Still J, Glat P, Silverstein P, Griswold J, Mozingo D. The use of a collagen sponge/living cell composite material to treat donor sites in burn patients. Burns 2003;29:837-41. doi:10.1016/S0305-4179(03)00164-5. PMID:14636761.
33. Swiontkowski MF, Aro HT, Donell S, Esterhai JL, Goulet J, Jones A, et al. Recombinant human bone morphogenetic protein-2 in open tibial fractures. A subgroup analysis of data combined from two prospective randomized studies. J Bone Joint Surg Am 2006;88:1258-65. doi:10.2106/JBJS.E.00499. PMID:16757759.
34. Baumann LS, Shamban AT, Lupo MP, Monheit GD, Thomas JA, Murphy DK, et al, for the JUVEDERM vs ZYPLAST Nasolabial Fold Study Group. Comparison of smooth-gel hyaluronic acid dermal fillers with cross-linked bovine collagen: a multicenter, double-masked, randomized, within-subject study. Dermatol Surg 2007;33(suppl 2):S128-35. doi:10.1111/j.1524-4725.2007.33026.x. PMID:18086050.
35. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640-5.
36. Danner SA, Matheron S. Cytomegalovirus retinitis in AIDS patients: a comparative study of intravenous and oral ganciclovir as maintenance therapy. AIDS 1996;10(suppl 4):S7-11. PMID:9110064.
37. Devereaux PJ, Manns BJ, Ghali WA, Quan H, Lacchetti C, Montori VM, et al. Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. JAMA 2001;285:2000-3. doi:10.1001/jama.285.15.2000. PMID:11308438.
38. Dodd DC. Blind slide reading or the uninformed versus the informed pathologist. Comments Toxicol 1988;2:88-91.
39. Burkhardt JE, Ennulat D, Pandher K, Solter PF, Troth SP, Boyce RW, et al. Topic of histopathology blinding in nonclinical safety biomarker qualification studies. Toxicol Pathol 2010;38:666-7.
40. Boutron I, Guittet L, Estellat C, Moher D, Hróbjartsson A, Ravaud P. Reporting methods of blinding in randomized controlled trials assessing non-pharmacological treatments. A systematic review. PLoS Med 2007;4:e61. doi:10.1371/journal.pmed.0040061. PMID:17311468.
41. Karanicolas PJ, Bhandari M, Walter SD, Heels-Ansdell D, Guyatt GH, for the Collaboration for Outcomes Assessment in Surgical Trials (COAST) Musculoskeletal Group. Radiographs of hip fractures were digitally altered to mask surgeons to the type of implant without compromising the reliability of quality ratings or making the rating process more difficult. J Clin Epidemiol 2009;62:214-23,e1. doi:10.1016/j.jclinepi.2008.05.006. PMID:18778914.
42. Nüesch E, Reichenbach S, Trelle S, Rutjes AW, Liewald K, Sterchi R, et al. The importance of allocation concealment and patient blinding in osteoarthritis trials: a meta-epidemiologic study. Arthritis Rheum 2009;61:1633-41. doi:10.1002/art.24894. PMID:19950329.
43. Pogue J, Walter SD, Yusuf S. Evaluating the benefit of event adjudication of cardiovascular outcomes in large simple RCTs. Clin Trials 2009;6:239-51.
44. Valldeoriola F, Regidor I, Mínguez-Castellanos A, Lezcano E, García-Ruiz P, Rojo A, et al. Efficacy and safety of pallidal stimulation in primary dystonia: results of the Spanish multicentric study. J Neurol Neurosurg Psychiatry 2010;81:65-9.
45. Meining A, Dittler HJ, Wolf A, Lorenz R, Schusdziarra V, Siewert JR, et al. You get what you expect? A critical appraisal of imaging methodology in endosonographic cancer staging. Gut 2002;50:599-603.
46. Miles WS, Shaw V, Risucci D. The role of blinded interviews in the assessment of surgical residency candidates. Am J Surg 2001;182:143-6. doi:10.1016/S0002-9610(01)00668-7. PMID:11574085.
47. Mason P, Back SA, Fields HL. A confocal laser microscopic study of enkephalin-immunoreactive appositions on to physiologically identified neurons in the rostral ventromedial medulla. J Neurosci 1992;12:4023-36.
48. McClure S, Evans RB, Miles KG, Reinartson EL, Hawkins JF, Honnas CM. Extracorporal shock wave therapy for treatment of navicular syndrome. Am Assoc Equine Pract Proc 2004;50:316-9.
49. Dror I, Rosenthal R. Meta-analytically quantifying the reliability and biasability of forensic experts. J Forensic Sci 2008;53:900-3. doi:10.1111/j.1556-4029.2008.00762.x. PMID:18489557.
50. Salvia JA, Meisel CJ. Observer bias: a methodological consideration in special education research. J Special Education 1980;14:261-70.
51. Marsh DM, Hanlon TJ. Seeing what we want to see: confirmation bias in animal behaviour research. Ethology 2007;113:1089-98. doi:10.1111/j.1439-0310.2007.01406.x.
52. Rosenthal R. How often are our numbers wrong? Am Psychol 1978;33:1005-8. doi:10.1037/0003-066X.33.11.1005.
53. Lyons JA, Serbin LA. Observer bias in scoring boys’ and girls’ aggression. Sex Roles 1986;14:301-13.