GRADEing outcomes step-by-step

1. Introduction


Assessment of the quality and certainty of the evidence is one of the key features of GRADEpro.

Having summarised your body of evidence in a management table or a diagnostic table, you need to assess the outcomes and results according to the GRADE method. This article shows you how to do this step by step and explains the rules GRADEpro applies along the way.

While some GRADE rules are explained here to help you understand the process, this tutorial focuses mostly on the technical aspects of performing the certainty of evidence assessment in GRADEpro. For methodological guidance, please refer to the GRADE Guidance papers on this subject or the GRADEbook.

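Whether it is a management or a diagnostic question, the elements of the certainty of evidence assessment are the same:

  • study design
  • risk of bias
  • inconsistency
  • indirectness
  • imprecision
  • other considerations (publication bias, large magnitude of an effect, effect of plausible residual confounding and dose-response gradient)

Each of them is described in more detail below.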


2. Quality of Evidence Grades


Although the quality of evidence represents a continuum, the GRADE approach results in an assessment of the quality of a body of evidence in one of four grades:

  • High -> We are very confident that the true effect lies close to that of the estimate of the effect.
  • Moderate -> We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
  • Low -> Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect.
  • Very Low -> We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of the effect.

The grades are presented as the following symbols:

  • ⨁⨁⨁⨁ High
  • ⨁⨁⨁◯ Moderate
  • ⨁⨁◯◯ Low
  • ⨁◯◯◯ Very low
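Each grade corresponds to the number of filled symbols out of four. The sketch below renders that mapping, assuming grades are stored as integers from 1 (Very low) to 4 (High); this encoding and the function name `grade_symbols` are illustrative only and are not part of GRADEpro.

```python
# Illustrative encoding (an assumption, not GRADEpro's internal format):
# 4 = High, 3 = Moderate, 2 = Low, 1 = Very low.
GRADE_LABELS = {4: "High", 3: "Moderate", 2: "Low", 1: "Very low"}


def grade_symbols(level: int) -> str:
    """Return filled (⨁) and empty (◯) circles for a certainty level from 1 to 4."""
    if level not in GRADE_LABELS:
        raise ValueError(f"level must be between 1 and 4, got {level}")
    return "⨁" * level + "◯" * (4 - level)


for level in (4, 3, 2, 1):
    print(grade_symbols(level), GRADE_LABELS[level])
```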

 

3. GRADEing rules


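The overall grade of the certainty of evidence depends on several factors, each described in detail below.

Here is a summary of how the particular criteria affect the final grade:

  • the study design sets the starting grade: High for randomised trials and for diagnostic accuracy studies, Low for non-randomised studies (High if ROBINS-I is applied)
  • serious or very serious risk of bias, inconsistency, indirectness or imprecision lowers the grade by 1 or 2 levels respectively; extremely serious imprecision (and extremely serious risk of bias, available with ROBINS-I) lowers it by 3 levels
  • strongly suspected publication bias lowers the grade by 1 level
  • a large (1 level) or very large (2 levels) effect, the effect of plausible residual confounding (1 level) and a dose-response gradient (1 level) can raise the grade, but only for non-randomised studies that were not downgraded for any other factor and were not graded with ROBINS-I
  • the final grade cannot be higher than High or lower than Very low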
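The GRADEing rules described in this article can be sketched as a small calculation for a management question. This is a minimal illustration, not GRADEpro's implementation: it assumes grades are encoded as integers from 1 (Very low) to 4 (High), it ignores ROBINS-I, and the class `OutcomeAssessment` and the function `certainty` are hypothetical names.

```python
from dataclasses import dataclass

LEVEL_NAMES = {4: "High", 3: "Moderate", 2: "Low", 1: "Very low"}

# Levels subtracted for each rating of a downgrading criterion.
# "extremely serious" is offered for imprecision (and for risk of bias with ROBINS-I).
DOWNGRADE = {"not serious": 0, "serious": 1, "very serious": 2, "extremely serious": 3}


@dataclass
class OutcomeAssessment:
    """Hypothetical record of the judgements made for one outcome."""
    randomised: bool
    risk_of_bias: str = "not serious"
    inconsistency: str = "not serious"
    indirectness: str = "not serious"
    imprecision: str = "not serious"
    publication_bias_suspected: bool = False
    upgrade_levels: int = 0  # total from large effect, confounding and dose-response


def certainty(a: OutcomeAssessment) -> str:
    # Starting level depends on study design: High for randomised trials, Low otherwise.
    start = 4 if a.randomised else 2
    down = sum(DOWNGRADE[r] for r in (a.risk_of_bias, a.inconsistency,
                                      a.indirectness, a.imprecision))
    down += 1 if a.publication_bias_suspected else 0
    # Upgrading only applies to non-randomised evidence that was not downgraded.
    up = a.upgrade_levels if not a.randomised and down == 0 else 0
    # The grade cannot rise above High or fall below Very low.
    return LEVEL_NAMES[max(1, min(4, start - down + up))]


print(certainty(OutcomeAssessment(randomised=True, risk_of_bias="serious")))  # Moderate
print(certainty(OutcomeAssessment(randomised=False, upgrade_levels=1)))       # Moderate
```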


4. Study design


The study design is critical to judgments about the quality of evidence.

Management questions

For recommendations regarding management strategies – as opposed to establishing prognosis or the accuracy of diagnostic tests – randomised trials provide, in general, far stronger evidence than non-randomised studies, and rigorous non-randomised studies provide stronger evidence than uncontrolled case series.

In the GRADE approach to quality of evidence:

  • randomised trials without important limitations provide high-quality evidence
  • non-randomised studies without special strengths or important limitations provide low-quality evidence

Limitations or special strengths can, however, modify the quality of the evidence of both randomised trials and non-randomised studies.

Diagnostic questions

In a typical test accuracy study, a consecutive series of patients suspected of having a particular condition undergo the index test (the test being evaluated). All patients then receive the reference or gold standard (the best available method to establish the presence of the target condition). In the GRADE approach, appropriate accuracy studies start as high-quality evidence about diagnostic accuracy. However, these studies are vulnerable to limitations and often lead to low-quality evidence to support guideline recommendations, mostly because of indirectness: diagnostic accuracy is only a surrogate for patient outcomes.

Cross-sectional or cohort studies in patients with diagnostic uncertainty that directly compare test results with an appropriate reference standard (the best possible alternative test strategy) are considered high quality. They can be rated down to moderate, low or very low depending on the other factors.


4.1 Study designs - management questions


In the case of management questions, two study designs are available:

  • randomised trial -> the baseline GRADEing for randomised trials is High
  • non-randomised study -> the baseline GRADEing for non-randomised studies is Low (unless ROBINS-I is applied)

When selecting the non-randomised study design, you also need to select the type of non-randomised study. The type does not affect GRADEing in any way. The available non-randomised study types are:

  • interrupted time series
  • before-after studies
  • cohort studies
  • case-control studies
  • cross-sectional studies
  • case series
  • case reports
  • case-control + other combined
  • other design
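As a sketch of how this selection sets the starting point, the function below returns a baseline level for a management question. The level encoding (4 = High, 2 = Low) and the function name `starting_level` are assumptions for illustration; the non-randomised study type is validated but, as stated above, does not change the result.

```python
from typing import Optional

# Study type labels as listed above; the set itself is an illustrative structure.
NON_RANDOMISED_TYPES = {
    "interrupted time series", "before-after studies", "cohort studies",
    "case-control studies", "cross-sectional studies", "case series",
    "case reports", "case-control + other combined", "other design",
}


def starting_level(design: str, nrs_type: Optional[str] = None,
                   robins_i: bool = False) -> int:
    """Hypothetical baseline certainty for a management question (4 = High, 2 = Low)."""
    if design == "randomised trial":
        return 4
    if design == "non-randomised study":
        if nrs_type not in NON_RANDOMISED_TYPES:
            raise ValueError("a non-randomised study type must be selected")
        # The study type is recorded but does not affect the baseline.
        return 4 if robins_i else 2
    raise ValueError(f"unknown study design: {design!r}")
```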

The study design can be selected by clicking on a cell in:

  • the Study design column in the GRADE evidence profile
  • the № of participants (studies) column in the Summary of Findings table
  • the Participants (studies) column in the GRADE profile (v2)
  • the Outcome № of participants (studies) column in the Summary of Findings table (v3)


4.2 Study designs - diagnostic questions


In the case of diagnostic questions, three study designs are available, all with a baseline GRADEing of High:

  • cross-sectional (cohort type accuracy study)
  • case-control type accuracy study
  • cohort & case-control type studies

The study design can be selected by clicking on a cell in:

  • the Study design column in Layer one and Layer two diagnostic tables
  • the Certainty of the Evidence (GRADE) column in Layer one - SoF and Layer two - SoF tables


5. Risk of bias


You should assess if the studies had limitations in design or execution that were serious enough to downgrade the quality of evidence for this outcome.


To rate study limitations:

  • If you think any limitations were negligible, choose not serious
  • If you think there were serious limitations, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level
  • If you think there were very serious limitations, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels

If you downgrade to serious or very serious, you will be required to provide an explanation.
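The explanation requirement can be pictured as a simple validation step. The sketch below, with the hypothetical function name `risk_of_bias_levels` and made-up explanation text, maps each rating to the number of levels it removes and rejects a downgrade that has no explanation; it only illustrates the rule and does not reproduce GRADEpro's behaviour.

```python
from typing import Optional

# Levels removed by each risk of bias rating (standard GRADE, without ROBINS-I).
RATING_LEVELS = {"not serious": 0, "serious": 1, "very serious": 2}


def risk_of_bias_levels(rating: str, explanation: Optional[str] = None) -> int:
    """Return how many levels a rating removes, requiring an explanation for downgrades."""
    if rating not in RATING_LEVELS:
        raise ValueError(f"unknown rating: {rating!r}")
    levels = RATING_LEVELS[rating]
    # Any downgrade (serious or very serious) must be justified with non-blank text.
    if levels > 0 and not (explanation and explanation.strip()):
        raise ValueError("an explanation is required when downgrading")
    return levels


print(risk_of_bias_levels("serious", "Most studies lacked allocation concealment"))  # 1
```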


You may seek further guidance in the GRADE Guidelines paper on the Risk of Bias assessment or in the GRADEbook.


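To rate the risk of bias, you should click on a cell in:

  • the Risk of bias column in the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables
  • the Certainty/Certainty of the Evidence (GRADE) column in the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF tables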


5.1. Risk of bias assessment tool


For the Risk of bias criterion, an additional tool based on Cochrane's risk of bias tool is available.

The underlying Cochrane approach assesses the risk of bias of individual studies in the context of a Cochrane review and proposes rating the risk of bias for an outcome across studies as low, unclear or high. These assessments may be used directly to inform the assessment of study limitations in the GRADE approach.

The tool can be accessed through the same menu as the Risk of bias rating itself.


Restriction: For the table of references to be displayed, the references need to be attached to the Number of studies cell in the evidence table. To learn how to attach references, see the related article Adding references and footnotes to tables.


Within the table, you can assess each study for each criterion on a scale of low risk of bias, unclear risk of bias and high risk of bias.


You can then assess the number of studies at particular levels of risk of bias.
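Counting the studies at each level is a simple tally. The sketch below assumes each study already has one overall judgement (low, unclear or high) for the outcome; the study labels and the function name `tally_risk_of_bias` are made up for illustration. The final rating for the outcome remains your judgement, not something the counts determine automatically.

```python
from collections import Counter
from typing import Dict

LEVELS = ("low", "unclear", "high")


def tally_risk_of_bias(judgements: Dict[str, str]) -> Dict[str, int]:
    """Count studies at each risk of bias level (hypothetical overall judgements)."""
    unknown = set(judgements.values()) - set(LEVELS)
    if unknown:
        raise ValueError(f"unexpected levels: {sorted(unknown)}")
    counts = Counter(judgements.values())
    return {level: counts[level] for level in LEVELS}


# Made-up example data for one outcome.
example = {"Study A": "low", "Study B": "unclear", "Study C": "low", "Study D": "high"}
print(tally_risk_of_bias(example))  # {'low': 2, 'unclear': 1, 'high': 1}
```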


Based on this assessment, you can mark the final level of the risk of bias for this outcome.

If you downgrade to serious or very serious, you will be required to provide an explanation.


6. Inconsistency


You should assess if the results were consistent across studies and if any inconsistency may have been serious enough to downgrade the quality of evidence for this outcome.

To rate inconsistency:

  • If you think any inconsistency was negligible, choose not serious
  • If you think there was serious inconsistency, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level.
  • If you think there was very serious inconsistency, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels.

If you downgrade to serious or very serious, you will be required to provide an explanation.

You may seek further guidance in the GRADE Guidelines paper on the inconsistency assessment or in the GRADEbook.

To rate the inconsistency, you should click on a cell in:

  • the Inconsistency column in the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables
  • the Certainty/Certainty of Evidence (GRADE) column in the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF tables


7. Indirectness


You should assess if the evidence directly answers the health care question you have asked and if any indirectness of available evidence may have been serious enough to downgrade the quality of evidence for this outcome.

To rate indirectness:

  • If you think the evidence is direct, choose not serious
  • If you have serious doubts about directness, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level
  • If you have very serious doubts about directness, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels

If you downgrade to serious or very serious, you will be required to provide an explanation.

You may seek further guidance in the GRADE Guidelines paper on the indirectness assessment or in the GRADEbook.

To rate the indirectness, you should click on a cell in:

  • the Indirectness column in the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables
  • the Certainty/Certainty of Evidence (GRADE) column in the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF tables


7.1. Indirectness assessment tool


An additional tool for assessing indirectness is available in GRADEpro.

In a table, you summarise the details of the PICO question and of the outcome itself. Then, for each detail, you answer the question "Is the evidence sufficiently direct?"


Having provided all the answers, you can determine the final indirectness judgement more easily.
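The answers recorded in the tool can be thought of as one yes/no value per detail. As a hypothetical sketch (the element names and the function `not_direct` are illustrative, not GRADEpro fields), collecting the details answered "no" gives you the points to address in your explanation; choosing not serious, serious or very serious is still your own judgement.

```python
from typing import Dict, List


def not_direct(answers: Dict[str, bool]) -> List[str]:
    """Return the details for which the evidence was judged not sufficiently direct."""
    return [element for element, direct in answers.items() if not direct]


# Illustrative answers to "Is the evidence sufficiently direct?"
answers = {"Population": True, "Intervention": True, "Comparator": False, "Outcome": True}
print(not_direct(answers))  # ['Comparator']
```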


8. Imprecision


You should assess if the results are precise enough and if any imprecision of the results may have been serious enough to downgrade the quality of evidence for this outcome. Imprecision is defined differently for authors of systematic reviews and for guideline panels.


To rate imprecision:

  • If you think the results were precise, choose not serious
  • If there was serious imprecision, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level
  • If there was very serious imprecision, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels
  • If there was extremely serious imprecision, choose extremely serious -> this will downgrade the quality of evidence for this outcome by 3 levels

If you downgrade to serious, very serious or extremely serious, you will be required to provide an explanation.
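Because there are only four grades, a large downgrade stops at Very low. The sketch below applies an imprecision rating to a current grade to show this floor; the list-based encoding and the function name `apply_imprecision` are assumptions for illustration.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]  # lowest to highest

IMPRECISION_LEVELS = {"not serious": 0, "serious": 1,
                      "very serious": 2, "extremely serious": 3}


def apply_imprecision(current: str, rating: str) -> str:
    """Lower a grade by the levels for an imprecision rating, never below Very low."""
    index = LEVELS.index(current) - IMPRECISION_LEVELS[rating]
    return LEVELS[max(0, index)]


print(apply_imprecision("High", "very serious"))  # Low
print(apply_imprecision("Low", "very serious"))   # Very low
```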

You may seek further guidance in the GRADE Guidelines paper on the imprecision assessment or in the GRADEbook.

To rate the imprecision, you should click on a cell in:

  • the Imprecision column in the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables
  • the Certainty/Certainty of the Evidence (GRADE) column in the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF tables


9. Other considerations


Other considerations is a collective term for additional factors to take into account when assessing the certainty of evidence. They include one factor that can lower the overall quality of evidence, publication bias, as well as three factors that can increase it:

  • large magnitude of an effect
  • effect of plausible residual confounding
  • dose-response gradient

Restriction: The factors that increase the quality of evidence should only be used for non-randomised studies which are not otherwise downgraded through the other factors and which were not graded using the ROBINS-I approach.
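The restriction can be expressed as a guard that returns no upgrade whenever it applies. In this sketch, the option labels follow the sections below, while the dictionary layout and the function name `upgrade_levels` are hypothetical; `downgraded_levels` stands for the total levels removed by all downgrading factors, including publication bias.

```python
from typing import Dict

# Levels gained for each option of the three upgrading factors.
UPGRADE_OPTIONS = {
    "large effect": {"no": 0, "large": 1, "very large": 2},
    "plausible residual confounding": {
        "no": 0,
        "would reduce a demonstrated effect": 1,
        "would suggest spurious effect": 1,
    },
    "dose-response gradient": {"no": 0, "yes": 1},
}


def upgrade_levels(choices: Dict[str, str], randomised: bool,
                   robins_i: bool, downgraded_levels: int) -> int:
    """Total upgrade, or 0 when the restriction on upgrading applies."""
    if randomised or robins_i or downgraded_levels > 0:
        return 0
    return sum(UPGRADE_OPTIONS[factor][option] for factor, option in choices.items())


choices = {"large effect": "large", "dose-response gradient": "yes"}
print(upgrade_levels(choices, randomised=False, robins_i=False, downgraded_levels=0))  # 2
print(upgrade_levels(choices, randomised=True, robins_i=False, downgraded_levels=0))   # 0
```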



To rate the Other considerations factors, you should click on a cell in:

  • the Other considerations column in the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables
  • the Certainty/Certainty of the Evidence (GRADE) column in the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF tables


9.1. Publication bias


Publication bias is a systematic underestimation or an overestimation of the underlying beneficial or harmful effect due to the selective publication of studies. Confidence in the combined estimates of effects from a systematic review can be reduced when publication bias is suspected, even when the included studies themselves have a low risk of bias.


You should assess if there is a probability of publication bias and if reporting bias may have been serious enough to downgrade the quality of evidence for this outcome.

To rate the probability of the publication bias:

  • If you think there is no evidence of publication bias, choose undetected
  • If there is a high probability of publication bias, choose strongly suspected -> this will downgrade the certainty of the evidence for this outcome by 1 level.

If you downgrade to strongly suspected, you will be required to provide an explanation.

You may seek further guidance in the GRADE Guidelines paper on publication bias or in the GRADEbook.


9.2. Large magnitude of an effect


When the body of evidence from non-randomised studies that were not downgraded for any of the 5 factors yields large or very large estimates of the magnitude of an intervention effect, we may be more confident about the results. In those situations, even though non-randomised studies are likely to overestimate the true effect, the more bias-prone study design is unlikely to explain all of the apparent benefit (or harm). Decisions to rate up the quality of evidence because of large or very large effects should consider not only the point estimate but also its precision (the width of the confidence interval, CI). You should rate up rarely and very cautiously because of apparently large effects if the CI overlaps substantially with effects smaller than the chosen threshold of clinical importance.


You should assess if the effect was large or very large and, if so, upgrade the quality of evidence accordingly for this outcome.

To rate the magnitude of the effect:

  • If the effect was not large, choose no
  • If the effect was large, choose large -> this will upgrade the quality of evidence for this outcome by 1 level
  • If the effect was very large, choose very large -> this will upgrade the quality of evidence for this outcome by 2 levels
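The advice to look at the confidence interval, not just the point estimate, can be sketched for a ratio measure such as a risk ratio. Two assumptions are made here: the thresholds default to 2 (large) and 5 (very large), which you should replace with the thresholds from the GRADE guidance you follow, and the interval bound closest to no effect must clear a threshold, which is a deliberately conservative simplification rather than a GRADE rule. The function name `large_effect_rating` is hypothetical, and its output is only an aid to your judgement.

```python
def large_effect_rating(ci_low: float, ci_high: float,
                        large: float = 2.0, very_large: float = 5.0) -> str:
    """Suggest 'no', 'large' or 'very large' from a ratio measure's confidence interval."""
    if not 0 < ci_low <= ci_high:
        raise ValueError("expected 0 < ci_low <= ci_high")
    if ci_low > 1.0:
        nearest = ci_low            # whole interval shows an increase
    elif ci_high < 1.0:
        nearest = 1.0 / ci_high     # whole interval shows a decrease; invert it
    else:
        return "no"                 # interval includes no effect (1.0)
    if nearest >= very_large:
        return "very large"
    if nearest >= large:
        return "large"
    return "no"


print(large_effect_rating(2.5, 6.0))    # large
print(large_effect_rating(0.05, 0.15))  # very large
print(large_effect_rating(1.5, 12.0))   # no
```

The last call shows the conservative choice: the point estimate may well be large, but because the lower bound (1.5) falls below the threshold, the sketch does not suggest rating up.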

You should not upgrade for large effect if any of the below criteria are met:

  • you set the study design as a randomised trial
  • you used ROBINS-I for GRADEing
  • you downgraded for any other criteria

You may seek further guidance in the GRADE Guidelines paper on rating up the quality of evidence or in the GRADEbook.


9.3. Effect of plausible residual confounding


On occasion, all plausible residual confounding from non-randomised studies may be working to reduce the demonstrated effect or increase the effect if no effect was observed.


You should assess if the influence of all plausible confounding would reduce a demonstrated effect or suggest a spurious effect when results show no effect. In either of these two cases, upgrade the quality of evidence for this outcome.

To rate the effect of all plausible residual confounding:

  • If there is no evidence that the influence of all plausible confounding would reduce a demonstrated effect or suggest a spurious effect when results show no effect, choose no
  • If there is evidence that the influence of all plausible confounding would reduce a demonstrated effect, choose would reduce a demonstrated effect -> this will upgrade the quality of evidence for this outcome by 1 level
  • If there is evidence that the influence of all plausible confounding would suggest a spurious effect when results show no effect, choose would suggest spurious effect -> this will upgrade the quality of evidence for this outcome by 1 level

You should not upgrade for the effect of plausible residual confounding if any of the below criteria are met:

  • you set the study design as a randomised trial
  • you used ROBINS-I for GRADEing
  • you downgraded for any other criteria

You may seek further guidance in the GRADE Guidelines paper on rating up the quality of evidence or in the GRADEbook.


9.4. Dose-response gradient


The presence of a dose-response gradient has long been recognized as an important criterion for believing in a putative cause-effect relationship. The presence of a dose-response gradient may increase our confidence in the findings of non-randomised studies and thereby increase the quality of evidence.


You should assess whether there was a dose-response gradient only in non-randomised studies that were not downgraded for any reason. If a dose-response gradient was present, upgrade the quality of evidence for this outcome.

To rate the presence of a dose-response gradient:

  • If there is no evidence of a dose-response gradient, choose no
  • If there is evidence of a dose-response gradient, choose yes -> this will upgrade the quality of evidence for this outcome by 1 level

You should not upgrade for dose-response gradient if any of the below criteria are met:

  • you set the study design as a randomised trial
  • you used ROBINS-I for GRADEing
  • you downgraded for any other criteria

You may seek further guidance in the GRADE Guidelines paper on rating up the quality of evidence or in the GRADEbook.


10. ROBINS-I


ROBINS-I (Risk Of Bias In Non-randomised Studies - of Interventions) is a tool used to assess the risk of bias in non-randomised studies of interventions.

For outcomes based on non-randomised (observational) studies, you can enable ROBINS-I assessment by ticking the checkbox provided for this purpose.

This switches the certainty of evidence calculation in the evidence table, and the options available in the risk of bias assessment, to those used in the ROBINS-I method.

Without ROBINS-I, the GRADE method rates the certainty of evidence from non-randomised studies starting from Low (reflecting the limited certainty of non-randomised designs). Any serious or very serious rating for risk of bias, inconsistency, indirectness or imprecision, or strongly suspected publication bias, lowers it to Very low.

With ROBINS-I enabled, the certainty of evidence from non-randomised studies starts from High and is downgraded by 1 level (High, Moderate, Low, Very low) for each step down the risk of bias scale. An additional risk of bias rating, extremely serious, also becomes available.
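The difference between the two calculations for non-randomised studies can be sketched as follows. The function `nrs_certainty` and its parameters are hypothetical; `other_downgrades` stands for the total levels removed by the remaining criteria, and each step down the risk of bias scale is assumed to remove one level, as described above.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]  # lowest to highest
ROB_STEPS = {"not serious": 0, "serious": 1, "very serious": 2, "extremely serious": 3}


def nrs_certainty(risk_of_bias: str, other_downgrades: int = 0,
                  robins_i: bool = False) -> str:
    """Certainty of non-randomised evidence with or without ROBINS-I (illustrative)."""
    if risk_of_bias == "extremely serious" and not robins_i:
        raise ValueError("'extremely serious' is only available with ROBINS-I")
    # Start from High with ROBINS-I, otherwise from Low.
    start = LEVELS.index("High") if robins_i else LEVELS.index("Low")
    index = start - ROB_STEPS[risk_of_bias] - other_downgrades
    return LEVELS[max(0, index)]  # never below Very low


for rating in ("not serious", "serious", "very serious"):
    print(rating, nrs_certainty(rating), nrs_certainty(rating, robins_i=True))
```

Without ROBINS-I, a serious or very serious risk of bias rating already gives Very low; with ROBINS-I, the same ratings give Moderate and Low respectively.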

You can learn more about the ROBINS-I tool on the tool's official website, in the Cochrane Handbook and in the GRADE Guidance article on the ROBINS-I tool.



RELATED ARTICLES


  1. Adding references and footnotes to tables
  2. GRADEbook
  3. GRADE Guidelines
