3.5. Clean and Collate data

There are 3 categories of data in the PCF dashboard:

  1. Core data: This is evidence critical to the data analysis. It is routinely collected as part of the NTP data collection processes and is thus readily available in most countries.
  2. Supplementary data: This is data that is important to the data analysis, as it expands on the evidence of the core data. It is typically not routinely collected as part of the NTP data collection processes, and thus may not be readily available. These missing data should inform revisions to the data collection tools.
  3. Optional data: This is data that provides extra context and nuance to country issues and identified gaps but is not critical to the data analysis. It also may not be routinely collected as part of the NTP data collection processes, and thus may not be readily available. These missing data may inform revisions to the data collection tools.

Data types and Data sources

Data is extracted from a myriad of global and local data sources to provide a body of evidence that is specific to a country.

The data source is as important as the data it provides. Global data sources such as published peer-reviewed manuscripts, surveys, and periodic reports are considered validated because of the rigorous review process they undergo before publication. These types of data should serve as the core data sources, in addition to data extracted from national surveillance systems (these are typically validated during routine data validation processes in the country).

Data differences/inconsistency

In addition to data validity and data reliability, data consistency is also critical. Inconsistencies are seen mostly in in-country surveillance data. When data discrepancies between two sources occur, for example between two studies or analyses, consensus should be reached through adequate expert consultation. As a general guide, give priority to studies with nationally representative samples (for national-level data consolidation), to subnational-specific studies (for subnational-level data consolidation), or to those with more recent data, provided the studies apply a sound study methodology. The areas with the most significant discrepancies should be flagged as priorities for further evidence generation in the new NSP. To identify where differences in data or inconsistencies exist:

  • Compare data values across years and identify any outliers. Outliers may be data values over or under a set threshold, e.g., the data value is 70% higher or lower than the previous or next year. Further investigation should include validation against the original data sources.
  • Compare the trend across strata. Where divergent trends are noticed, there should be a reason for this.

Where a considerable difference in the data across years is observed, it is important to identify possible reasons to explain the difference.
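As an illustration of the year-on-year threshold check described above, the following is a minimal Python sketch rather than part of the PCF tools. The function name `flag_outliers`, the example notification figures, and the choice to measure the change relative to the neighbouring year are all assumptions made for this example. The 70% threshold is the example value from the text and should be set per indicator.

```python
def flag_outliers(values_by_year, threshold=0.70):
    """Return the years whose value differs from the previous or next
    year's value by more than `threshold` (a relative change, 0.70 = 70%)."""
    years = sorted(values_by_year)
    flagged = []
    for i, year in enumerate(years):
        current = values_by_year[year]
        if current is None:
            continue  # missing values are handled separately
        neighbours = []
        if i > 0:
            neighbours.append(values_by_year[years[i - 1]])
        if i < len(years) - 1:
            neighbours.append(values_by_year[years[i + 1]])
        for ref in neighbours:
            # Skip missing or zero neighbours to avoid dividing by zero.
            if ref and abs(current - ref) / ref > threshold:
                flagged.append(year)
                break
    return flagged


# Hypothetical annual TB notification counts, for illustration only.
notifications = {2017: 1000, 2018: 1050, 2019: 2000, 2020: 1100, 2021: 1150}
print(flag_outliers(notifications))  # -> [2019]
```

A flagged year is only a prompt for further investigation: the value should still be checked against the original data source before it is corrected or excluded.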

Common causes of difference/inconsistency in data, with suggested solutions

  • Cause: Ongoing or recently implemented interventions that impact the health systems.
    Suggested solution: The effect of these interventions should be taken into account when evaluating the data.
  • Cause: Changes to indicator definitions or different definitions and/or calculation of an indicator.
    Suggested solution: Highlight by which year the definition or calculation changed to explain why a different trend is observed.
  • Cause: Significant impact on the TB notification and treatment outcome trends due to emergency health challenges, e.g., COVID-19.
    Suggested solution: Highlight by when healthcare services were disrupted.
  • Cause: Seasonal variations in case notification.
    Suggested solution: Compare data from the same season, e.g., Quarter 1 2021 data with Quarter 1 2020 data.
  • Cause: New diagnostics recently introduced that are known to affect case notification trends. For example, the introduction of Xpert/MTB tests as the initial test can lead to a surge in bacteriologically confirmed TB patients.
    Suggested solution: Include laboratory indicators when evaluating surveillance data and consider stratified analyses for facilities with and without a specific diagnostic capacity.
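The same-season comparison suggested for seasonal variation can be sketched as follows. This is a hedged example only: the function name `same_quarter_change`, the `(year, quarter)` keying of the data, and the quarterly figures are assumptions for illustration and do not reflect the structure of any specific NTP dataset.

```python
def same_quarter_change(notifications, year, quarter):
    """Relative change between a quarter and the same quarter one year
    earlier. Returns None when either value is missing or the earlier
    value is zero."""
    current = notifications.get((year, quarter))
    previous = notifications.get((year - 1, quarter))
    if current is None or not previous:
        return None
    return (current - previous) / previous


# Hypothetical quarterly case notifications keyed by (year, quarter).
quarterly = {
    (2020, 1): 400, (2020, 2): 250,
    (2021, 1): 440, (2021, 2): 300,
}

# Compare Quarter 1 2021 with Quarter 1 2020, not with Quarter 2 2020.
change = same_quarter_change(quarterly, 2021, 1)
print(f"Q1 2021 vs Q1 2020: {change:+.0%}")
```

Comparing like with like in this way keeps the seasonal pattern from appearing as a discrepancy between consecutive quarters.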

Incomplete or missing data

Incomplete or missing data includes unavailable data for a specific period (e.g., year or quarter) or specific subnational level, and data not collected for a defined indicator. Where data is not collected for a core indicator, this should be considered an opportunity for strengthening routine surveillance systems or for initiating operational research to obtain the missing data. This should be budgeted for accordingly in the new NSP. Where data is unavailable for a specific period, further investigation is required to identify the root cause of the problem, and specific actions should be taken to improve the data collection and recording process. It is important that the subnational levels are included in this process.
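One way to locate the gaps described above is to list every expected combination of subnational unit and period and report those with no value. The sketch below assumes a simple layout in which values are keyed by `(unit, period)`. The function name `find_missing`, the region names, and the figures are hypothetical.

```python
from itertools import product


def find_missing(records, units, periods):
    """Return every (unit, period) pair that is absent from `records`
    or recorded as None."""
    return [
        (unit, period)
        for unit, period in product(units, periods)
        if records.get((unit, period)) is None
    ]


# Hypothetical subnational values for one indicator.
records = {
    ("Region A", 2019): 120, ("Region A", 2020): 135, ("Region A", 2021): None,
    ("Region B", 2019): 80, ("Region B", 2021): 95,
}

gaps = find_missing(records, ["Region A", "Region B"], [2019, 2020, 2021])
print(gaps)  # -> [('Region A', 2021), ('Region B', 2020)]
```

The resulting list can be shared with the relevant subnational teams so that the root cause of each gap is investigated at the level where the data was recorded.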

Frequently encountered challenges and proposed solutions

  • Unavailability of recent data is a major challenge for some core indicators. This particularly applies to data about persons who have not yet entered the health system and data on persons who have completed their treatment. The potential unavailability of data should be discussed during the planning phase, including consequences and possible solutions. Additionally, during the planning and preparation phase, it is recommended to make an overview of data that requires permission to be obtained. This would allow permission to be requested in a timely manner, limiting delays in the PCF process.
  • Lack of familiarity with the data consolidation tools. This challenge pertains to the following key areas: 1) lack of clarity on the difference between core and optional indicators; 2) lack of understanding of why some indicators are needed; 3) lack of clarity about where specific data needs to be uploaded; 4) difficulties using the dashboard function. These challenges can be addressed through adequate orientation during planning and preparation and regular communication amongst country teams and consultants.
  • Conflicting information from various data sources. For certain indicators, data can be abundantly available from different sources, some of which may conflict with each other. The data consolidation team should decide on the most reliable data source based on guidance from the country data manager and/or NTP coordinator.
  • Lack of dedicated teams for data consolidation. Since most NTP team members multitask, it can be difficult to find a team devoted to completing data consolidation. Boosting the data consolidation team capacity through ad hoc staffing is one solution until full capacity development takes effect.


Data cleaning is essential to the quality of the data used during data consolidation. It can be done using any statistical software package (STAT) or Microsoft Excel.

Subnational data consolidation

Consolidated data is used to inform the four planning steps (1. Problem Prioritization, 2. Root Cause Analysis, 3. Strategic Intervention identification, and 4. Strategic Intervention optimization) throughout the NSP development process. Preferably, both national and subnational consolidated data should be generated. This will lead to the identification and prioritization of programmatic gaps and strategic interventions at the national level that take subnational priorities into account. However, if subnational data consolidation is not feasible due to certain limitations (for example, human resource, time, and/or budget constraints), subnational validation could be achieved by including relevant subnational core indicators in the national consolidated data. This could be done by sharing the national consolidated data with subnational TB program staff for their review and input, or by inviting subnational representatives when reviewing consolidated data and findings.

Last Update: Friday, September 2, 2022  

Wednesday, August 18, 2021 147 Maya Van Tol  Section 3. Data Consolidation