Wizard Overview

Data Requirements

This section provides detailed information on data requirements for the PPA Wizard. For a quick reference, see the PPA Data Cheat Sheet.



The PPA Wizard accepts .csv or .dta files as inputs:


    • DTA (STATA file): The .dta extension represents a STATA file. The PPA Wizard needs to handle large files (larger than can be saved as .csv), and the STATA file format was chosen for this purpose. You do not need to know STATA or have STATA installed on your machine to work with .dta files in the PPA Wizard.
    • CSV (with UTF-8 encoding): Any .csv file used in the wizard must be UTF-8 encoded. An easy way to do this is to open the file in Excel, select “Save As,” and choose the CSV UTF-8 file format.
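
If you would rather confirm the encoding before uploading, a quick check can be scripted. The following is a minimal sketch in Python using only the standard library; it is not part of the PPA Wizard, and the function name check_utf8_csv is a hypothetical one chosen for this illustration. It tries to decode the file's bytes as UTF-8 and, if that works, returns the header row.

```python
import csv
import io


def check_utf8_csv(raw: bytes) -> list[str]:
    """Return the header row if the bytes decode as UTF-8; raise ValueError otherwise.

    Hypothetical helper for illustration only; not part of the PPA Wizard.
    """
    try:
        # "utf-8-sig" also strips the optional byte-order mark that Excel may write.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8 (problem at byte offset {exc.start})") from exc
    reader = csv.reader(io.StringIO(text, newline=""))
    return next(reader, [])  # empty list if the file has no rows
```

To use it on a file, read the file in binary mode, for example with open(path, "rb"), and pass the resulting bytes to the function. Reading the whole file at once is fine for a spot check but may be slow for very large files.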

Data must be in raw, tabular format:

    • Each row represents one person (usually a survey respondent) or one health facility
    • Each column represents a variable associated with the person or health facility
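
One common symptom of a file that is not cleanly tabular is a row with more or fewer values than the header. The sketch below, again plain Python and not part of the PPA Wizard, lists such rows for a .csv file. The function name find_ragged_rows and the column titles in the usage example are assumptions made up for this illustration.

```python
import csv
import io


def find_ragged_rows(csv_text: str) -> list[int]:
    """Return record numbers (header = 1) whose field count differs from the header.

    Hypothetical helper for illustration only; not part of the PPA Wizard.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to check
    return [
        record_number
        for record_number, row in enumerate(reader, start=2)
        if len(row) != len(header)
    ]


# Usage with made-up column titles: the third record is missing one value.
sample = "id,facility_type,sector\n1,Clinic,Public\n2,Hospital\n3,Pharmacy,Private\n"
ragged = find_ragged_rows(sample)
```

An empty result only shows that every row has the same number of values as the header. It does not confirm that each row is a single person or health facility, which still needs to be checked against how the data was collected.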


Data Source Columns

Your data sources must contain specific variables for you to create a PPA in the wizard. These variables appear in your data sources as columns, and the PPA Data Cheat Sheet summarizes the required ones. The column titles in your data sources do not matter: it is fine (and, realistically, expected) for them to differ from “Facility Type” or “Health Sector.” The only requirement is that the data itself is present.


Number of Data Sources

At minimum, you will need two data sources: one capturing people and where they sought care for illness, and a second capturing health facilities and the services they offer. Only one care-seeking data source may be used in a given PPA. However, it is common to use multiple data sources for health facilities. Often one data source will be used for the Health Facility Master List and other data sources will be used for TB Services Coverage.


Data Cleaning Tips

While not a hard technical requirement of the PPA Wizard, starting with clean datasets will make your work more efficient and your experience more enjoyable! Clean datasets are particularly beneficial when working as a team, as they improve communication between team members. Messy datasets not only cost you extra time in the wizard; they may also cause unnecessary confusion as you try to communicate your decisions on PPA inputs to other members of your team. Specific data cleaning examples are addressed in the FAQs section.


