GRADEing outcomes step-by-step

TABLE OF CONTENTS

1. Introduction
2. Quality of Evidence Grades
3. GRADEing rules
4. Study design
- 4.1 Study designs - management questions
- 4.2 Study designs - diagnostic questions
5. Risk of bias
- 5.1. Risk of bias assessment tool
6. Inconsistency
7. Indirectness
- 7.1. Indirectness assessment tool
8. Imprecision
9. Other considerations
10. ROBINS-I

1. Introduction

Assessment of the quality and certainty of the evidence is one of the key features of GRADEpro.

Having summarised your body of evidence in a management table or a diagnostic table, you need to assess the outcomes and results according to the GRADE method. This article shows you how to do this step by step and explains the rules of this process in GRADEpro.

While some rules of the GRADE domain will be explained here for a better understanding of the process, this tutorial will focus mostly on the technical aspect of performing the certainty of evidence assessment in GRADEpro. If you are looking for domain guidance on this, please refer to the GRADE Guidance papers on this subject or the GRADEbook.

Whether it is a management or diagnostic question, the elements of the certainty of evidence assessment are the same:

Study design
Risk of bias
Indirectness
Inconsistency
Imprecision
Other considerations (Publication bias, Large effect, Plausible confounding, Dose-response gradient)

They are described in more detail below.

2. Quality of Evidence Grades

Although the quality of evidence represents a continuum, the GRADE approach results in an assessment of the quality of a body of evidence in one of four grades:

High -> We are very confident that the true effect lies close to that of the estimate of the effect.
Moderate -> We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low -> Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect.
Very low -> We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of the effect.

The grades are presented as the following symbols:

⨁⨁⨁⨁
High

⨁⨁⨁◯
Moderate

⨁⨁◯◯
Low

⨁◯◯◯
Very low
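If you need to reproduce these symbols outside GRADEpro (for example, in your own report), the mapping from a grade to its symbol can be sketched in Python as follows. This is only an illustration and not GRADEpro's own code; the names GRADES and grade_symbols are hypothetical.

```python
# Grades ordered from lowest to highest certainty.
GRADES = ["Very low", "Low", "Moderate", "High"]


def grade_symbols(grade: str) -> str:
    """Return the four-circle symbol string for a GRADE grade name."""
    filled = GRADES.index(grade) + 1  # Very low -> 1 filled circle, High -> 4
    return "⨁" * filled + "◯" * (len(GRADES) - filled)


# Print the grades from highest to lowest, as listed above.
for grade in reversed(GRADES):
    print(grade_symbols(grade), grade)
```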

3. GRADEing rules

The overall grade of the certainty of evidence depends on various factors, which have been described in detail below.

Here, we will sum up how the particular criteria affect the final grading.

decision tree with criteria affecting the final grading
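The arithmetic behind the decision tree can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not GRADEpro's implementation: the function name final_grade and its parameters are hypothetical, and the result is assumed to stay within the four grades (never below Very low and never above High).

```python
# Grades ordered from lowest to highest certainty.
GRADES = ["Very low", "Low", "Moderate", "High"]


def final_grade(baseline: str, levels_down: int = 0, levels_up: int = 0) -> str:
    """Move from the baseline grade by the total number of levels
    downgraded and upgraded across all criteria."""
    position = GRADES.index(baseline) - levels_down + levels_up
    # Assumption: the result is clamped to the four available grades.
    position = max(0, min(position, len(GRADES) - 1))
    return GRADES[position]


# A randomised trial (baseline High) downgraded by 1 level for one criterion.
print(final_grade("High", levels_down=1))
# A non-randomised study (baseline Low) upgraded by 2 levels.
print(final_grade("Low", levels_up=2))
```

The levels passed in are the sums of the individual judgements described in the sections below.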

4. Study design

The study design is critical to judgments about the quality of evidence.

Management questions

For recommendations regarding management strategies – as opposed to establishing prognosis or the accuracy of diagnostic tests – randomised trials provide, in general, far stronger evidence than non-randomised studies, and rigorous non-randomised studies provide stronger evidence than uncontrolled case series.

In the GRADE approach to quality of evidence:

randomised trials without important limitations provide high-quality evidence
non-randomised studies without special strengths or important limitations provide low-quality evidence

Limitations or special strengths can, however, modify the quality of the evidence of both randomised trials and non-randomised studies.

Diagnostic questions

In a typical test accuracy study, a consecutive series of patients suspected of a particular condition are subjected to the index test (the test being evaluated). Then all patients receive a reference or gold standard (the best available method to establish the presence of the target condition). While in the GRADE approach, appropriate accuracy studies start as high-quality evidence about diagnostic accuracy, these studies are vulnerable to limitations and often lead to low-quality evidence to support guideline recommendations, mostly owing to indirectness of evidence associated with diagnostic accuracy being only a surrogate for patient outcomes.

Cross-sectional or cohort studies in patients with diagnostic uncertainty and direct comparison of test results with an appropriate reference standard (best possible alternative test strategy) are considered high quality. They can move to moderate, low or very low, depending on other factors.

4.1 Study designs - management questions

In the case of management questions, two study designs are available:

randomised trial -> the baseline GRADEing for randomised trials is High
non-randomised study -> the baseline GRADEing for non-randomised studies is Low (unless ROBINS-I is applied)
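The two baselines above can be written as a small lookup. The Python sketch below is illustrative only; baseline_grade and the robins_i flag are hypothetical names, and the High baseline for non-randomised studies with ROBINS-I follows the description in the ROBINS-I section of this article.

```python
def baseline_grade(study_design: str, robins_i: bool = False) -> str:
    """Return the baseline grade for a management question."""
    if study_design == "randomised trial":
        return "High"
    if study_design == "non-randomised study":
        # With ROBINS-I applied, non-randomised studies start from High.
        return "High" if robins_i else "Low"
    raise ValueError(f"unknown study design: {study_design!r}")


print(baseline_grade("randomised trial"))
print(baseline_grade("non-randomised study"))
print(baseline_grade("non-randomised study", robins_i=True))
```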

study design cell with types of study designs available in case of management questions

When selecting the non-randomised study design, you need to select the type of non-randomised study as well. The type does not affect GRADEing in any way. The available non-randomised study types are:

interrupted time series
before-after studies
cohort studies
case-control studies
cross-sectional studies
case series
case reports
case-control + other combined
other design

The study design can be selected by:

clicking on a cell in the Study design column in case of GRADE evidence profile tables

study design column in GRADE evidence profile

clicking on a cell in the № of participants (studies) column in case of Summary of Findings table and the Summary of Findings table (v2)

№ of participants (studies) column in Summary of Findings table

clicking on a cell in the Participants (studies) column in case of GRADE profile (v2) and Summary of Findings (v4)

Participants (studies) column in GRADE profile (v2)

clicking on a cell in the Outcome № of participants (studies) column in case of Summary of Findings table (v3)

Outcome № of participants (studies) column in Summary of Findings table (v3)

4.2 Study designs - diagnostic questions

In the case of diagnostic questions, three study designs are available. All three have a baseline GRADEing of High.

cross-sectional (cohort type accuracy study)
case-control type accuracy study
cohort & case-control type studies

study design cell with types of study designs available in case of diagnostic questions

The study design can be selected by:

clicking on a cell in the Study design column in case of Layer one and Layer two diagnostic tables

study design column in case of Layer one and Layer two

clicking on a cell in the Certainty of the Evidence (GRADE) column in case of Layer one - SoF and Layer two - SoF diagnostic tables

Certainty of the Evidence (GRADE) column in case of Layer one - SoF and Layer two - SoF

5. Risk of bias

You should assess if the studies had limitations in design or execution that were serious enough to downgrade the quality of evidence for this outcome.

risk of bias cell

To rate study limitations:

If you think any limitations were negligible, choose not serious
If you think there were serious limitations, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level
If you think there were very serious limitations, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels

If you downgrade to serious or very serious, you will be required to provide an explanation.
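The rating options and the explanation requirement can be modelled as in the Python sketch below. It is a hypothetical data structure for illustration, not GRADEpro's internal model; the names Judgement, DOWNGRADE_LEVELS, levels_down and validate are assumptions.

```python
from dataclasses import dataclass

# Number of levels each rating downgrades the quality of evidence by.
DOWNGRADE_LEVELS = {"not serious": 0, "serious": 1, "very serious": 2}


@dataclass
class Judgement:
    rating: str
    explanation: str = ""

    def levels_down(self) -> int:
        """Return how many levels this rating downgrades by."""
        return DOWNGRADE_LEVELS[self.rating]

    def validate(self) -> None:
        """Require an explanation whenever the rating downgrades."""
        if self.levels_down() > 0 and not self.explanation.strip():
            raise ValueError("an explanation is required when downgrading")


judgement = Judgement("serious", "Allocation concealment was unclear.")
judgement.validate()
print(judgement.levels_down())
```

The same three ratings and the same explanation rule apply to the inconsistency and indirectness criteria described later.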

You may seek further guidance in the GRADE Guidelines paper on the Risk of Bias assessment or in the GRADEbook.

To rate the risk of bias, you should:

click on the cell in the Risk of bias column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables
click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

Certainty/Certainty of the Evidence (GRADE) column in Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF

5.1. Risk of bias assessment tool

For the Risk of bias criterion, an additional tool is available based on Cochrane's RoB 2.

It provides a detailed discussion of study-level assessments of the risk of bias in the context of a Cochrane review. It proposes an approach to assessing the risk of bias for an outcome across studies as low risk of bias, unclear risk of bias and high risk of bias. These assessments may be used directly to inform the assessment of study limitations in the GRADE approach.

The tool can be accessed through the same menu as the Risk of bias itself.

Restriction: For the table of references to be displayed, the references need to be attached to the Number of studies cell in the evidence table. Learn how to attach the references.

table of references to be assessed for each criterion on scale of low, unclear and high risk of bias

Within the table, you can assess each study for each criterion on a scale of low risk of bias, unclear risk of bias and high risk of bias.

cell with scale of low, unclear and high risk of bias

You can then assess the number of studies at particular levels of risk of bias.

cell with assessment of number of studies at each risk-of-bias level

Based on this assessment, you can mark the final level of the risk of bias for this outcome.

final level of risk of bias for the outcome

If you downgrade to serious or very serious, you will be required to provide an explanation.
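Counting the studies at each risk-of-bias level for one criterion can be sketched in Python as below. The study names and judgements are made-up placeholders, and tally_risk_of_bias is a hypothetical helper rather than part of GRADEpro.

```python
from collections import Counter

LEVELS = ("low", "unclear", "high")

# Hypothetical per-study judgements for a single criterion.
judgements = {
    "Study A": "low",
    "Study B": "unclear",
    "Study C": "high",
    "Study D": "low",
}


def tally_risk_of_bias(judgements: dict[str, str]) -> dict[str, int]:
    """Count how many studies were judged at each risk-of-bias level."""
    counts = Counter(judgements.values())
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unexpected level(s): {sorted(unknown)}")
    return {level: counts.get(level, 0) for level in LEVELS}


print(tally_risk_of_bias(judgements))
```

The tally only supports the decision: the final level of risk of bias for the outcome remains your own judgement.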

6. Inconsistency

You should assess if the results were consistent across studies and if any inconsistency may have been serious enough to downgrade the quality of evidence for this outcome.

To rate inconsistency:

If you think any inconsistency was negligible, choose not serious
If you think there was serious inconsistency, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level.
If you think there was very serious inconsistency, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels.

If you downgrade to serious or very serious, you will be required to provide an explanation.

You may seek further guidance in the GRADE Guidelines paper on the inconsistency assessment or in GRADEbook.

To rate the inconsistency, you should

click on the cell in the Inconsistency column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

Inconsistency column in GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

Certainty/Certainty of Evidence (GRADE) column in Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF

7. Indirectness

You should assess if the evidence directly answers the health care question you have asked and if any indirectness of available evidence may have been serious enough to downgrade the quality of evidence for this outcome.

ratings available for indirectness criterion

To rate indirectness:

If you think the evidence is direct, choose not serious
If you have serious doubts about directness, choose serious -> this will downgrade the evidence for this outcome by 1 level
If you have very serious doubts about directness choose very serious -> this will downgrade the evidence for this outcome by 2 levels

If you downgrade to serious or very serious, you will be required to provide an explanation.

You may seek further guidance in the GRADE Guidelines paper on the indirectness assessment or in the GRADEbook.

To rate the indirectness, you should

click on the cell in the Indirectness column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

Indirectness column in GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

Certainty/Certainty of Evidence (GRADE) column in Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF

7.1. Indirectness assessment tool

An additional tool for assessing indirectness is available in GRADEpro.

ratings available for indirectness criterion

In a table, you summarise details of the PICO question and the outcome itself. Then you can mark the answers to a question Is the evidence sufficiently direct? for each detail.

table with details of PICO question and outcome itself

Having provided all the answers, you can determine the final indirectness judgement more easily.

8. Imprecision

You should assess if the results are precise enough and if any imprecision of the results may have been serious enough to downgrade the quality of evidence for this outcome. Imprecision is defined differently for authors of systematic reviews and for guideline panels.

ratings available for imprecision criterion

To rate imprecision:

If you think the results were precise, choose not serious
If there was serious imprecision, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level
If there was very serious imprecision, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels
If there was extremely serious imprecision, choose extremely serious -> this will downgrade the quality of evidence for this outcome by 3 levels

If you downgrade to serious, very serious or extremely serious, you will be required to provide an explanation.
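Because imprecision has a fourth option, its effect on the grade can be sketched in Python as below. This is an illustration only; the names IMPRECISION_LEVELS and after_imprecision are hypothetical, and the floor at Very low is an assumption based on there being only four grades.

```python
# Grades ordered from lowest to highest certainty.
GRADES = ["Very low", "Low", "Moderate", "High"]

# Number of levels each imprecision rating downgrades by.
IMPRECISION_LEVELS = {
    "not serious": 0,
    "serious": 1,
    "very serious": 2,
    "extremely serious": 3,
}


def after_imprecision(current: str, rating: str) -> str:
    """Apply an imprecision rating to the current grade."""
    position = GRADES.index(current) - IMPRECISION_LEVELS[rating]
    return GRADES[max(position, 0)]  # assumed floor at Very low


print(after_imprecision("High", "serious"))
print(after_imprecision("High", "extremely serious"))
```

In this sketch, an extremely serious rating alone takes evidence that starts at High all the way down to Very low.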

You may seek further guidance in the GRADE Guidelines paper on the imprecision assessment or in the GRADEbook.

To rate the imprecision, you should:

click on the cell in the Imprecision column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

Imprecision column in GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

Certainty/Certainty of the Evidence (GRADE) column in Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF

9. Other considerations

Other considerations is a collective term for additional factors that should be taken into account while assessing the certainty of evidence. They include one factor that can possibly lower the overall quality of evidence - publication bias - as well as three factors that can possibly increase the quality of evidence:

Large magnitude of an effect
Effect of plausible residual confounding
Dose-response gradient

Restriction: The factors that increase the quality of evidence should only be used for non-randomised studies which are not otherwise downgraded through the other factors and which were not graded using the ROBINS-I approach.

window with list of other considerations

To rate the Other considerations factors, you should:

click on the cell in the Other considerations column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

Other considerations column in GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

Certainty/Certainty of the Evidence (GRADE) column in Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Layer one - SoF and Layer two - SoF

9.1. Publication bias

Publication bias is a systematic underestimation or an overestimation of the underlying beneficial or harmful effect due to the selective publication of studies. Confidence in the combined estimates of effects from a systematic review can be reduced when publication bias is suspected, even when the included studies themselves have a low risk of bias.

ratings available for publication bias criterion

You should assess if there is a probability of publication bias and if reporting bias may have been serious enough to downgrade the quality of evidence for this outcome.

To rate the probability of the publication bias:

If you think there is no evidence of publication bias, choose undetected
If there is a high probability of publication bias, choose strongly suspected -> this will downgrade the certainty of the evidence for this outcome by 1 level.

If you downgrade to strongly suspected, you will be required to provide an explanation.

You may seek further guidance in the GRADE Guidelines paper on publication bias or in GRADEbook.

9.2. Large magnitude of an effect

When the body of evidence from non-randomised studies not downgraded for any of the 5 factors yields large or very large estimates of the magnitude of an intervention effect, then we may be more confident about the results. In those situations, even though non-randomised studies are likely to provide an overestimate of the true effect, the study design that is more prone to bias is unlikely to explain all of the apparent benefit (or harm). Decisions to rate up the quality of evidence because of large or very large effects should consider not only the point estimate but also the precision (width of the CI) around that effect: one should rarely and very cautiously rate up the quality of evidence because of apparent large effects, if the CI overlaps substantially with effects smaller than the chosen threshold of clinical importance.

ratings available for magnitude of effect criterion

You should assess if the effect was large or very large and, if so, upgrade the quality of evidence accordingly for this outcome.

To rate the magnitude of the effect:

If the effect was not large, choose no
If the effect was large, choose large -> this will upgrade the quality of evidence for this outcome by 1 level
If the effect was very large, choose very large -> this will upgrade the quality of evidence for this outcome by 2 levels

You should not upgrade for large effect if any of the below criteria are met:

you set the study design as a randomised trial
you used ROBINS-I for GRADEing
you downgraded for any other criteria

You may seek further guidance in the GRADE Guidelines paper on rating up the quality of evidence or in the GRADEbook.
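The three exclusion criteria listed above can be combined into a single check, as in the Python sketch below. It is illustrative only: may_upgrade, large_effect_upgrade and their parameters are hypothetical names, and returning 0 levels when upgrading is not appropriate is a design choice of this sketch.

```python
# Number of levels each magnitude-of-effect rating upgrades by.
MAGNITUDE_LEVELS = {"no": 0, "large": 1, "very large": 2}


def may_upgrade(study_design: str, robins_i_used: bool, downgraded: bool) -> bool:
    """Return True only if none of the three exclusion criteria is met."""
    return (
        study_design != "randomised trial"
        and not robins_i_used
        and not downgraded
    )


def large_effect_upgrade(
    rating: str,
    study_design: str,
    robins_i_used: bool = False,
    downgraded: bool = False,
) -> int:
    """Return the number of levels to upgrade for the magnitude of effect."""
    if not may_upgrade(study_design, robins_i_used, downgraded):
        return 0  # upgrading is not appropriate in this situation
    return MAGNITUDE_LEVELS[rating]


print(large_effect_upgrade("very large", "non-randomised study"))
print(large_effect_upgrade("large", "randomised trial"))
```

The same eligibility check applies to the other two upgrading factors: plausible residual confounding and dose-response gradient.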

9.3 Effect of plausible residual confounding

On occasion, all plausible residual confounding from non-randomised studies may be working to reduce the demonstrated effect or increase the effect if no effect was observed.

ratings available for effect of all plausible residual confounding criterion

You should assess if the influence of all plausible confounding would reduce a demonstrated effect or suggest a spurious effect when results show no effect. In either of these two cases, upgrade the quality of evidence for this outcome.

To rate the effect of all plausible residual confounding:

If there is no evidence that the influence of all plausible confounding would reduce a demonstrated effect or suggest a spurious effect when results show no effect, choose no
If there is evidence that the influence of all plausible confounding would reduce a demonstrated effect, choose would reduce a demonstrated effect -> this will upgrade the quality of evidence for this outcome by 1 level
If there is evidence that the influence of all plausible confounding would suggest a spurious effect when results show no effect, choose would suggest spurious effect -> this will upgrade the quality of evidence for this outcome by 1 level

You should not upgrade for the effect of plausible residual confounding if any of the below criteria are met:

you set the study design as a randomised trial
you used ROBINS-I for GRADEing
you downgraded for any other criteria

You may seek further guidance in the GRADE Guidelines paper on rating up the quality of evidence or in the GRADEbook.

9.4. Dose-response gradient

The presence of a dose-response gradient has long been recognized as an important criterion for believing in a putative cause-effect relationship. The presence of a dose-response gradient may increase our confidence in the findings of non-randomised studies and thereby increase the quality of evidence.

ratings available for presence of dose-response gradient criterion

You should assess if there was a dose-response gradient only in non-randomised studies, not downgraded for any reason. If a dose-response gradient was present, upgrade the quality of evidence for this outcome.

To rate the presence of dose-response gradient:

If there is no evidence of a dose-response gradient, choose no
If there is evidence of dose-response gradient, choose yes -> this will upgrade the quality of evidence for this outcome by 1 level

You should not upgrade for dose-response gradient if any of the below criteria are met:

you set the study design as a randomised trial
you used ROBINS-I for GRADEing
you downgraded for any other criteria

You may seek further guidance in the GRADE Guidelines paper on rating up the quality of evidence or in the GRADEbook.

10. ROBINS-I

ROBINS-I (Risk Of Bias In Non-randomised Studies - of Interventions) is a tool used to assess the risk of bias in non-randomised studies of interventions.

window with checkbox enabling ROBINS-I assessment for outcomes based on observational studies

For outcomes based on non-randomised studies, it is possible to enable the ROBINS-I assessment by clicking the checkbox, as presented in the illustration above.

This will change the certainty of evidence calculation in the evidence table, and the options available in the risk of bias assessment, to those used in the ROBINS-I method.

The GRADE method calculates the certainty of evidence for non-randomised studies starting from Low certainty (which represents the lack of certainty in non-randomised studies) and downgrading to Very low with any serious or very serious assessment of risk of bias, inconsistency, indirectness or imprecision, or with strongly suspected publication bias.

The ROBINS-I option modifies this method. The certainty of evidence for non-randomised studies starts from High and is downgraded by 1 level on the High/Moderate/Low/Very low scale each time the risk of bias assessment is lowered. With ROBINS-I enabled, an additional level of downgrading for the risk of bias - extremely serious - becomes available.
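The difference between the two starting points can be illustrated with the Python sketch below. It is not GRADEpro's calculation: the function name nrs_certainty_after_rob is hypothetical, and treating extremely serious as a 3-level downgrade is an assumption made by analogy with the imprecision criterion.

```python
# Grades ordered from lowest to highest certainty.
GRADES = ["Very low", "Low", "Moderate", "High"]

# Assumption: each step of the risk of bias rating lowers the grade by 1 level.
ROB_LEVELS = {
    "not serious": 0,
    "serious": 1,
    "very serious": 2,
    "extremely serious": 3,
}


def nrs_certainty_after_rob(rating: str, robins_i: bool) -> str:
    """Certainty for a non-randomised study after the risk of bias rating."""
    if rating == "extremely serious" and not robins_i:
        raise ValueError("extremely serious is only available with ROBINS-I")
    start = GRADES.index("High" if robins_i else "Low")
    return GRADES[max(start - ROB_LEVELS[rating], 0)]


# The same serious risk of bias judgement under both methods.
print(nrs_certainty_after_rob("serious", robins_i=False))
print(nrs_certainty_after_rob("serious", robins_i=True))
```

Under these assumptions, a serious risk of bias rating gives Very low without ROBINS-I and Moderate with ROBINS-I enabled.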

ratings available for risk of bias criterion when ROBINS-I is enabled

You can learn more about the ROBINS-I tool on the official website of the tool, in the Cochrane Handbook, and in the GRADE Guidance article regarding the ROBINS-I tool.
