TABLE OF CONTENTS
- 1. Introduction
- 2. Quality of Evidence Grades
- 3. GRADEing rules
- 4. Study design
- 5. Risk of bias
- 6. Inconsistency
- 7. Indirectness
- 8. Imprecision
- 9. Other considerations
- 10. ROBINS-I

1. Introduction
Assessment of the quality and certainty of the evidence is one of the key features of GRADEpro.
Having summarised your body of evidence in a management table or a diagnostic table, you need to assess the outcomes and results according to the GRADE method. This article shows you how to do this step by step and explains the rules of this process in GRADEpro.
While some rules of the GRADE domain will be explained here for a better understanding of the process, this tutorial will focus mostly on the technical aspect of performing the certainty of evidence assessment in GRADEpro. If you are looking for domain guidance on this, please refer to the GRADE Guidance papers on this subject or the GRADEbook.
Whether it is a management or diagnostic question, the elements of the certainty of evidence assessment are the same:
- Study design
- Risk of bias
- Indirectness
- Inconsistency
- Imprecision
- Other considerations (Publication bias, Large effect, Plausible confounding, Dose-response gradient)
They have been described in more detail below.
2. Quality of Evidence Grades
Although the quality of evidence represents a continuum, the GRADE approach results in an assessment of the quality of a body of evidence in one of four grades:
- High -> We are very confident that the true effect lies close to that of the estimate of the effect.
- Moderate -> We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
- Low -> Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect.
- Very Low -> We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of the effect.
The grades are presented as the following symbols:
| ⨁⨁⨁⨁ High | ⨁⨁⨁◯ Moderate | ⨁⨁◯◯ Low | ⨁◯◯◯ Very low |
3. GRADEing rules
The overall grade of the certainty of evidence depends on various factors, which have been described in detail below.
Here, we will sum up how the particular criteria affect the final grading.
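Taken together, the rules in the sections below amount to simple arithmetic: start at the level set by the study design, subtract one level per step of downgrading, add levels for any upgrading, and never go below Very low or above High. GRADEpro does this calculation for you; the minimal Python sketch below only illustrates it, assuming each judgement has already been converted into a number of levels. The names `Certainty`, `overall_certainty` and `symbol` are illustrative and are not part of GRADEpro.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """The four GRADE certainty levels, numbered so that levels can be added and subtracted."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def overall_certainty(start: Certainty, downgrades: int, upgrades: int = 0) -> Certainty:
    """Apply downgrade and upgrade levels to a starting level, clamped to Very low..High."""
    level = int(start) - downgrades + upgrades
    return Certainty(max(int(Certainty.VERY_LOW), min(int(Certainty.HIGH), level)))


def symbol(certainty: Certainty) -> str:
    """Render a level as filled and empty circles, e.g. MODERATE -> ⨁⨁⨁◯."""
    filled = int(certainty)
    return "⨁" * filled + "◯" * (int(Certainty.HIGH) - filled)


if __name__ == "__main__":
    # Randomised trials start at High; one level of downgrading gives Moderate.
    result = overall_certainty(Certainty.HIGH, downgrades=1)
    print(result.name, symbol(result))
```

Clamping reflects the fact that there is no grade below Very low or above High, so, for example, two levels of downgrading applied to evidence that starts at Low still end at Very low.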

4. Study design
The study design is critical to judgments about the quality of evidence.
Management questions
For recommendations regarding management strategies – as opposed to establishing prognosis or the accuracy of diagnostic tests – randomised trials provide, in general, far stronger evidence than non-randomised studies, and rigorous non-randomised studies provide stronger evidence than uncontrolled case series.
In the GRADE approach to quality of evidence:
- randomised trials without important limitations provide high-quality evidence
- non-randomised studies without special strengths or important limitations provide low-quality evidence
Limitations or special strengths can, however, modify the quality of the evidence of both randomised trials and non-randomised studies.
Diagnostic questions
In a typical test accuracy study, a consecutive series of patients suspected of having a particular condition undergoes the index test (the test being evaluated). All patients then receive the reference or gold standard test (the best available method to establish the presence of the target condition). In the GRADE approach, appropriate accuracy studies start as high-quality evidence about diagnostic accuracy. However, these studies are vulnerable to limitations and often lead to low-quality evidence to support guideline recommendations, mostly because diagnostic accuracy is only a surrogate for patient outcomes, which makes the evidence indirect.
Cross-sectional or cohort studies in patients with diagnostic uncertainty and direct comparison of test results with an appropriate reference standard (best possible alternative test strategy) are considered high quality. They can move to moderate, low or very low, depending on other factors.
4.1 Study designs - management questions
In the case of management questions, two study designs are available:
- randomised trial -> the baseline GRADEing for randomised trials is High
- non-randomised study -> the baseline GRADEing for non-randomised studies is Low (unless ROBINS-I is applied)
When selecting the non-randomised study design, you need to select the type of non-randomised study as well. The type does not affect GRADEing in any way. The available non-randomised study types are:
- interrupted time series
- before-after studies
- cohort studies
- case-control studies
- cross-sectional studies
- case series
- case reports
- case-control + other combined
- other design
The study design can be selected by:
- clicking on a cell in the Study design column in case of GRADE evidence profile tables

- clicking on a cell in the № of participants (studies) column in case of Summary of Findings table and the Summary of Findings table (v2)

- clicking on a cell in the Participants (studies) column in case of GRADE profile (v2) and Summary of Findings (v4)

- clicking on a cell in the Outcome № of participants (studies) column in case of Summary of Findings table (v3)

4.2 Study designs - diagnostic questions
In the case of diagnostic questions, three study designs are available. All three have a baseline GRADEing of High.
- cross-sectional (cohort type accuracy study)
- case-control type accuracy study
- cohort & case-control type studies
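The baseline rules in sections 4.1 and 4.2 can be summarised as a small lookup. The sketch below is a hypothetical helper (`starting_certainty` is not a GRADEpro function) that uses the design labels shown in GRADEpro and returns the starting level before any downgrading or upgrading. The non-randomised study type (cohort, case-control and so on) is deliberately not an input, because it does not affect GRADEing.

```python
DIAGNOSTIC_DESIGNS = {
    "cross-sectional (cohort type accuracy study)",
    "case-control type accuracy study",
    "cohort & case-control type studies",
}


def starting_certainty(question: str, design: str, robins_i: bool = False) -> str:
    """Return the baseline certainty for an outcome before any down- or upgrading."""
    if question == "diagnostic":
        if design not in DIAGNOSTIC_DESIGNS:
            raise ValueError(f"unknown diagnostic design: {design!r}")
        return "High"  # all three diagnostic designs start at High
    if question == "management":
        if design == "randomised trial":
            return "High"
        if design == "non-randomised study":
            # Non-randomised studies start at Low, or at High when ROBINS-I is applied.
            return "High" if robins_i else "Low"
        raise ValueError(f"unknown management design: {design!r}")
    raise ValueError(f"unknown question type: {question!r}")


if __name__ == "__main__":
    print(starting_certainty("management", "non-randomised study"))
```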

The study design can be selected by:

- clicking on a cell in the Certainty of the Evidence (GRADE) column in case of Layer one - SoF and Layer two - SoF diagnostic tables

5. Risk of bias
You should assess if the studies had limitations in design or execution that were serious enough to downgrade the quality of evidence for this outcome.

To rate study limitations:
- If you think any limitations were negligible, choose not serious
- If you think there were serious limitations, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level
- If you think there were very serious limitations, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels
If you downgrade to serious or very serious, you will be required to provide an explanation.
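The rating scale above, together with the rule that a downgrade must be explained, can be sketched as follows. This is an illustration only: `rate_domain` and `DOWNGRADE_LEVELS` are hypothetical names, and the labels are the options shown in GRADEpro. The same three options are used for inconsistency and indirectness; imprecision adds a fourth option, extremely serious, which is not modelled here.

```python
DOWNGRADE_LEVELS = {"not serious": 0, "serious": 1, "very serious": 2}


def rate_domain(judgement: str, explanation: str = "") -> int:
    """Return the number of levels a judgement downgrades, requiring an explanation if it is above zero."""
    try:
        levels = DOWNGRADE_LEVELS[judgement]
    except KeyError:
        raise ValueError(f"unknown judgement: {judgement!r}") from None
    if levels > 0 and not explanation.strip():
        raise ValueError("an explanation is required when downgrading")
    return levels


if __name__ == "__main__":
    print(rate_domain("serious", "Hypothetical reason: most studies lacked blinding"))
```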
You may seek further guidance in the GRADE Guidelines paper on the Risk of Bias assessment or in the GRADEbook.
To rate the risk of bias, you should:
- click on the cell in the Risk of bias column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

- click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

5.1. Risk of bias assessment tool
For the Risk of bias criterion, an additional tool is available based on Cochrane's RoB 2.
It provides a detailed discussion of study-level assessments of the risk of bias in the context of a Cochrane review. It proposes an approach to assessing the risk of bias for an outcome across studies as low risk of bias, unclear risk of bias and high risk of bias. These assessments may be used directly to inform the assessment of study limitations in the GRADE approach.
The tool can be accessed through the same menu as the Risk of bias itself.

Restriction: For the table of references to be displayed, the references need to be attached to the Number of studies cell in the evidence table. Learn how to attach the references. 
Within the table, you can assess each study for each criterion on a scale of low risk of bias, unclear risk of bias and high risk of bias.

You can then assess the number of studies at particular levels of risk of bias.

Based on this assessment, you can mark the final level of the risk of bias for this outcome.

If you downgrade to serious or very serious, you will be required to provide an explanation.
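Counting how many studies sit at each level of risk of bias for each criterion, as the table does, can be sketched like this. The criteria and study names in the example are hypothetical placeholders, and the helper `tally_by_criterion` is illustrative; it only summarises the study-level judgements and leaves the final outcome-level judgement to you.

```python
from __future__ import annotations

from collections import Counter

RISK_LEVELS = ("low", "unclear", "high")


def tally_by_criterion(assessments: dict[str, dict[str, str]]) -> dict[str, dict[str, int]]:
    """For each criterion, count how many studies were judged at each risk of bias level."""
    summary: dict[str, dict[str, int]] = {}
    for criterion, per_study in assessments.items():
        counts = Counter(per_study.values())
        unexpected = set(counts) - set(RISK_LEVELS)
        if unexpected:
            raise ValueError(f"{criterion}: unexpected levels {sorted(unexpected)}")
        # Counter returns 0 for levels that no study received.
        summary[criterion] = {level: counts[level] for level in RISK_LEVELS}
    return summary


if __name__ == "__main__":
    # Hypothetical criteria and studies, for illustration only.
    example = {
        "randomisation": {"Study A": "low", "Study B": "low", "Study C": "unclear"},
        "missing outcome data": {"Study A": "low", "Study B": "high", "Study C": "high"},
    }
    for criterion, counts in tally_by_criterion(example).items():
        print(criterion, counts)
```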
6. Inconsistency
You should assess if the results were consistent across studies and if any inconsistency may have been serious enough to downgrade the quality of evidence for this outcome.
To rate inconsistency:
- If you think any inconsistency was negligible, choose not serious
- If you think there was serious inconsistency, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level.
- If you think there was very serious inconsistency, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels.
If you downgrade to serious or very serious, you will be required to provide an explanation.
You may seek further guidance in the GRADE Guidelines paper on the inconsistency assessment or in the GRADEbook.
To rate the inconsistency, you should:
- click on the cell in the Inconsistency column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

- click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

7. Indirectness
You should assess if the evidence directly answers the health care question you have asked and if any indirectness of available evidence may have been serious enough to downgrade the quality of evidence for this outcome.

To rate indirectness:
- If you think the evidence is direct, choose not serious
- If you have serious doubts about directness, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level
- If you have very serious doubts about directness, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels
If you downgrade to serious or very serious, you will be required to provide an explanation.
You may seek further guidance in the GRADE Guidelines paper on the indirectness assessment or in the GRADEbook.
To rate the indirectness, you should:
- click on the cell in the Indirectness column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

- click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

7.1. Indirectness assessment tool
An additional tool for assessing indirectness is available in GRADEpro.

In a table, you summarise details of the PICO question and the outcome itself. Then, for each detail, you can answer the question "Is the evidence sufficiently direct?".

Having provided all the answers, you can determine the final indirectness judgement more easily.
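As a rough illustration of how the answers feed the final judgement, the sketch below lists the elements answered "no". The element names and the helper `elements_not_direct` are assumptions for this example, not GRADEpro fields. The code only highlights where indirectness lies; whether that warrants a serious or very serious rating remains your judgement.

```python
from __future__ import annotations

PICO_ELEMENTS = ("population", "intervention", "comparison", "outcome")


def elements_not_direct(answers: dict[str, bool]) -> list[str]:
    """List the elements answered 'no' to 'Is the evidence sufficiently direct?'."""
    missing = [e for e in PICO_ELEMENTS if e not in answers]
    if missing:
        raise ValueError(f"no answer recorded for: {', '.join(missing)}")
    return [e for e in PICO_ELEMENTS if not answers[e]]


if __name__ == "__main__":
    answers = {"population": True, "intervention": True, "comparison": False, "outcome": True}
    print(elements_not_direct(answers))
```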
8. Imprecision
You should assess if the results are precise enough and if any imprecision of the results may have been serious enough to downgrade the quality of evidence for this outcome. Imprecision is defined differently for authors of systematic reviews and for guideline panels.

To rate imprecision:
- If you think the results were precise, choose not serious
- If there was serious imprecision, choose serious -> this will downgrade the quality of evidence for this outcome by 1 level
- If there was very serious imprecision, choose very serious -> this will downgrade the quality of evidence for this outcome by 2 levels
- If there was extremely serious imprecision, choose extremely serious -> this will downgrade the quality of evidence for this outcome by 3 levels
If you downgrade to serious, very serious or extremely serious, you will be required to provide an explanation.
You may seek further guidance in the GRADE Guidelines paper on the imprecision assessment or in the GRADEbook.
To rate the imprecision, you should:
- click on the cell in the Imprecision column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

- click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

9. Other considerations
Other considerations is a collective term for additional factors that should be taken into account while assessing the certainty of evidence. They include one factor that can lower the overall quality of evidence (publication bias) and three factors that can increase it: a large magnitude of an effect, the effect of plausible residual confounding, and a dose-response gradient.
Restriction: The factors that increase the quality of evidence should only be used for non-randomised studies which are not otherwise downgraded through the other factors and which were not graded using the ROBINS-I approach.
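The restriction above, combined with the rating options described in sections 9.2 to 9.4, can be sketched as a single check. `upgrade_levels` and the dictionaries are hypothetical names; the option labels and level values are those given in this article. The sketch refuses to upgrade unless the design is a non-randomised study, ROBINS-I is not used, and no domain (including publication bias) has been downgraded.

```python
# Levels added by each option, as listed in sections 9.2 to 9.4.
LARGE_EFFECT = {"no": 0, "large": 1, "very large": 2}
RESIDUAL_CONFOUNDING = {
    "no": 0,
    "would reduce a demonstrated effect": 1,
    "would suggest spurious effect": 1,
}
DOSE_RESPONSE = {"no": 0, "yes": 1}


def upgrade_levels(design: str, robins_i: bool, total_downgrades: int,
                   large_effect: str = "no", confounding: str = "no",
                   dose_response: str = "no") -> int:
    """Return the number of levels to rate up, or raise if rating up is not allowed."""
    levels = (LARGE_EFFECT[large_effect]
              + RESIDUAL_CONFOUNDING[confounding]
              + DOSE_RESPONSE[dose_response])
    eligible = design == "non-randomised study" and not robins_i and total_downgrades == 0
    if levels and not eligible:
        raise ValueError("rating up applies only to non-randomised studies that were "
                         "not graded with ROBINS-I and not downgraded")
    return levels


if __name__ == "__main__":
    print(upgrade_levels("non-randomised study", False, 0, large_effect="large"))
```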
To rate the Other considerations factors, you should:
- click on the cell in the Other considerations column in case of the GRADE evidence profile, GRADE profile (v2), Layer one and Layer two tables

- click on the cell in the Certainty/Certainty of the Evidence (GRADE) column in case of the Summary of Findings table, Summary of Findings table (v2), Summary of Findings table (v3), Summary of Findings (v4), Layer one - SoF and Layer two - SoF

9.1. Publication bias
Publication bias is a systematic underestimation or an overestimation of the underlying beneficial or harmful effect due to the selective publication of studies. Confidence in the combined estimates of effects from a systematic review can be reduced when publication bias is suspected, even when the included studies themselves have a low risk of bias.

You should assess whether publication bias is likely and, if so, whether it may have been serious enough to downgrade the quality of evidence for this outcome.
To rate the probability of the publication bias:
- If you think there is no evidence of publication bias, choose undetected
- If there is a high probability of publication bias, choose strongly suspected -> this will downgrade the certainty of the evidence for this outcome by 1 level.
If you downgrade to strongly suspected, you will be required to provide an explanation.
You may seek further guidance in the GRADE Guidelines paper on publication bias or in the GRADEbook.
9.2. Large magnitude of an effect
When the body of evidence from non-randomised studies not downgraded for any of the 5 factors yields large or very large estimates of the magnitude of an intervention effect, then we may be more confident about the results. In those situations, even though non-randomised studies are likely to provide an overestimate of the true effect, the study design that is more prone to bias is unlikely to explain all of the apparent benefit (or harm). Decisions to rate up the quality of evidence because of large or very large effects should consider not only the point estimate but also the precision (width of the CI) around that effect: one should rarely and very cautiously rate up the quality of evidence because of apparent large effects, if the CI overlaps substantially with effects smaller than the chosen threshold of clinical importance.

You should assess if the effect was large or very large and, if so, upgrade the quality of evidence accordingly for this outcome.
To rate the magnitude of the effect:
- If the effect was not large, choose no
- If the effect was large, choose large -> this will upgrade the quality of evidence for this outcome by 1 level
- If the effect was very large, choose very large -> this will upgrade the quality of evidence for this outcome by 2 levels
You should not upgrade for large effect if any of the below criteria are met:
- you set the study design as a randomised trial
- you used ROBINS-I for GRADEing
- you downgraded for any other criteria
You may seek further guidance in the GRADE Guidelines paper on rating up the quality of evidence or in the GRADEbook.
9.3 Effect of plausible residual confounding
On occasion, all plausible residual confounding from non-randomised studies may be working to reduce the demonstrated effect or increase the effect if no effect was observed.

You should assess if the influence of all plausible confounding would reduce a demonstrated effect or suggest a spurious effect when results show no effect. In either of these two cases, upgrade the quality of evidence for this outcome.
To rate the effect of all plausible residual confounding:
- If there is no evidence that the influence of all plausible confounding would reduce a demonstrated effect or suggest a spurious effect when results show no effect, choose no
- If there is evidence that the influence of all plausible confounding would reduce a demonstrated effect, choose would reduce a demonstrated effect -> this will upgrade the quality of evidence for this outcome by 1 level
- If there is evidence that the influence of all plausible confounding would suggest a spurious effect when results show no effect, choose would suggest spurious effect -> this will upgrade the quality of evidence for this outcome by 1 level
You should not upgrade for the effect of plausible residual confounding if any of the below criteria are met:
- you set the study design as a randomised trial
- you used ROBINS-I for GRADEing
- you downgraded for any other criteria
You may seek further guidance in the GRADE Guidelines paper on rating up the quality of evidence or in the GRADEbook.
9.4. Dose-response gradient
The presence of a dose-response gradient has long been recognised as an important criterion for believing in a putative cause-effect relationship. The presence of a dose-response gradient may increase our confidence in the findings of non-randomised studies and thereby increase the quality of evidence.

You should only assess whether there was a dose-response gradient in non-randomised studies that were not downgraded for any reason. If a dose-response gradient was present, upgrade the quality of evidence for this outcome.
To rate the presence of dose-response gradient:
- If there is no evidence of a dose-response gradient, choose no
- If there is evidence of dose-response gradient, choose yes -> this will upgrade the quality of evidence for this outcome by 1 level
You should not upgrade for dose-response gradient if any of the below criteria are met:
- you set the study design as a randomised trial
- you used ROBINS-I for GRADEing
- you downgraded for any other criteria
You may seek further guidance in the GRADE Guidelines paper on rating up the quality of evidence or in the GRADEbook.
10. ROBINS-I
ROBINS-I (Risk Of Bias In Non-randomised Studies - of Interventions) is a tool used to assess the risk of bias in non-randomised studies of interventions.

For outcomes based on non-randomised studies, you can enable ROBINS-I assessment by selecting the corresponding checkbox.
This changes the certainty of evidence calculation in the evidence table, and the options available in the risk of bias assessment, to those used in the ROBINS-I method.
The standard GRADE method rates the certainty of evidence from non-randomised studies starting from Low (which reflects the limited certainty associated with non-randomised studies) and downgrades it to Very low if risk of bias, inconsistency, indirectness, imprecision or publication bias is judged serious or very serious.
The ROBINS-I option modifies this method: the certainty of evidence from non-randomised studies starts from High and is downgraded by 1 level (High/Moderate/Low/Very low) for each step the risk of bias judgement is lowered. With ROBINS-I enabled, an additional risk of bias option, extremely serious, becomes available.
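The difference between the two starting points can be sketched as follows. `non_randomised_certainty` is a hypothetical helper, and it assumes that the other domains (inconsistency, indirectness, imprecision, publication bias) are summarised as a single number of downgrade levels, `other_downgrades`, that subtracts as usual. Upgrading factors are left out because they do not apply when ROBINS-I is used.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]  # index 0 (lowest) to 3 (highest)

RISK_OF_BIAS_DOWNGRADE = {
    "not serious": 0,
    "serious": 1,
    "very serious": 2,
    "extremely serious": 3,  # offered only when ROBINS-I is enabled
}


def non_randomised_certainty(risk_of_bias: str, other_downgrades: int = 0,
                             robins_i: bool = False) -> str:
    """Certainty for a non-randomised body of evidence, with or without ROBINS-I."""
    if risk_of_bias == "extremely serious" and not robins_i:
        raise ValueError("extremely serious risk of bias is only available with ROBINS-I")
    # Standard GRADE starts non-randomised evidence at Low; ROBINS-I starts it at High.
    start = LEVELS.index("High") if robins_i else LEVELS.index("Low")
    index = start - RISK_OF_BIAS_DOWNGRADE[risk_of_bias] - other_downgrades
    return LEVELS[max(0, index)]  # never below Very low


if __name__ == "__main__":
    for use_robins_i in (False, True):
        print(use_robins_i, non_randomised_certainty("serious", robins_i=use_robins_i))
```

With the same serious risk of bias judgement, the standard approach ends at Very low, whereas the ROBINS-I approach ends at Moderate, because the ROBINS-I assessment itself is expected to capture the biases that the Low starting point otherwise accounts for.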
You can learn more about ROBINS-I Tool on the official website of the tool, in the Cochrane Handbook, and in the GRADE Guidance article regarding the ROBINS-I tool.