Frequently Asked Questions (FAQ)

General information about the project
    What is IPUMS PMA (IPUMS Performance Monitoring and Accountability 2020)?
    What's in the future for IPUMS PMA?
    How does the PMA2020 project differ from the Demographic and Health Surveys?
    Why might estimates of some key indicators from PMA2020 data differ from estimates from the DHS?
    How do I get access to IPUMS PMA data?
    How do I request access to additional countries after I've already registered?
    Where can I send questions?
Getting started
    Where should a new user start?
Basic concepts
    What are microdata?
    What are "integrated variables"?
    What are "weights"?
    What does "universe" mean in the variable descriptions?
Getting data
    How do I obtain data?
    What format are the data in?
    What is the best way to use the extract system?
    How long does a data extract take?
    How does "sample selection" work on the IPUMS PMA web site?
    What does "Add to cart" mean?
    Why can't I open the data file?
    Is there a preferred statistical package for using IPUMS PMA?
    Can I get the original data?
Using IPUMS PMA data
    Can I use the household and female files together with the service delivery point files?
    Are there tricky aspects of IPUMS PMA data to be particularly aware of?
    What are the major limitations of the data?
    Can I find particular individuals in the IPUMS PMA data?
    How can I cite IPUMS PMA?
Using the variables page
    Variables page menu
    Variables page details
Using the data extract system
    Your data cart
    Why are some variables in my data cart preselected?
    What is "Type"?
    Extract request page
    Extract option: Describe your extract
    Extract option: Change data format


General information about the project

What is IPUMS PMA (IPUMS Performance Monitoring and Accountability 2020)? [top]

IPUMS PMA is a project designed to help researchers conduct comparative analyses of the Performance Monitoring and Accountability 2020 (PMA2020) project, funded by the Bill & Melinda Gates Foundation and administered by a research team at Johns Hopkins University. The data come from (mostly) nationally representative household, female, and service delivery point surveys carried out in 11 countries pledging to the Family Planning (FP2020) effort, a global partnership supporting women's right to freely decide their own childbearing. The first round of survey data was collected in 2013. PMA2020 collects data in each participating country at least annually, sometimes with more than one round per year. Enumerators use mobile devices to enter survey responses. High-frequency, low-cost collection allows PMA2020 to provide researchers with data to monitor key family planning indicators over time. Names and low-level geographic information are not included, to protect the confidentiality of respondents. The IPUMS PMA variables have consistent codes and online documentation to facilitate cross-national and cross-temporal comparisons. This "integration" process is described more fully here.

IPUMS PMA is not a collection of compiled statistics; it is composed of microdata. For material collected using the household and women's questionnaires, each record is a person, with all characteristics numerically coded. PMA2020 fields a household survey for sampled households and the household members (including resident women of childbearing age). PMA2020 also administers a separate female questionnaire to women age 15 to 49 years in the household, to collect information on family planning behaviors. Finally, PMA2020 samples family planning service delivery points in the same enumeration areas as the sampled households. Enumerators ask qualified respondents at the service delivery point (such as a clinic or pharmacy) about their provision of family planning products and services.

IPUMS PMA is similar to the IPUMS-DHS data system, which contains harmonized Demographic and Health Surveys data.



What's in the future for IPUMS PMA? [top]

By early 2019, IPUMS PMA will allow users to view the questionnaire text associated with each variable. As new household/female and service delivery point datasets are produced by the PMA2020 team at Johns Hopkins University, IPUMS PMA will harmonize and release the new samples as quickly as possible. Additionally, the creators of PMA2020 data plan to add a longitudinal component to surveys fielded in late 2019.

In 2017, the PMA2020 team modified the initial survey design. Any comparability issues caused by the adjustment are documented on a variable-specific basis.



How does the PMA2020 project differ from the Demographic and Health Surveys? [top]

While the survey design of PMA2020 is purposely similar to the Demographic and Health Surveys (DHS), there are several differences. PMA2020 began collecting data in 2013, and returns to each participating country at least annually. The DHS began fielding surveys in the 1980s, and typically several years pass between surveys in the same country. There is topical overlap, but PMA2020's focus is on family planning, water, and hygiene, whereas the focus of DHS is maternal and child health. PMA2020, which is funded by the Gates Foundation, employs mobile devices and local enumerators to reduce costs and improve efficiency. DHS collects data on births and children in addition to women of childbearing age (15 to 49), and also collects limited information on household members. PMA2020 collects information from women of childbearing age and limited information on household members.

DHS has select samples of data collected on service delivery points (SDP), but the availability is very limited. PMA2020 collects SDP data for almost every year and country for which female and household data are collected.



Why might estimates of some key indicators from PMA2020 data differ from estimates from the DHS? [top]

There are several differences between the PMA2020 data and DHS data that may influence key indicator estimates.

There is natural sampling variability that arises from stratified random sampling in a population. DHS surveys are not conducted every year like PMA2020, so a 2010 DHS sample might be compared to a 2014 PMA sample, and the key indicators within a population could change during that time period. Furthermore, some measurement differences between PMA2020 and DHS might exist. For example, the reference period for a survey question may differ between the two survey series, the wording of a question may change the way a respondent may answer the question in either survey, or the universe of the question may differ. Lastly, for some samples, PMA2020 data are not nationally representative; for example, India 2015 is representative only within the region of Rajasthan.



How do I get access to IPUMS PMA data? [top]

Access to the documentation is freely available without restriction; however, users must register before extracting data from the website. Users request access to data on a country-by-country basis, and each research description is evaluated to ensure that use of the data adheres to our terms and conditions.



How do I request access to additional countries after I've already registered? [top]

Once you are registered with IPUMS PMA and have been granted access to particular countries, you may request access to additional countries if your research question has expanded or you have a new research question that warrants access to additional countries. You may apply for access to additional countries' data here.



Where can I send questions? [top]

If you need more information about an IPUMS PMA variable, have problems making or opening a data extract, or have other questions specifically related to IPUMS PMA, you can ask the IPUMS User Support staff, whose contact information is here. If your question relates to the original PMA2020 data collection or to material in the original PMA2020 files that serve as the basis for IPUMS PMA, you can contact the PMA2020 team here.




Getting started

Where should a new user start? [top]

First, you should register a free account with IPUMS PMA to gain access to the data.

Next, the natural starting point is exploring the data selection area, to determine the contents of the IPUMS PMA database. You will be asked to choose either Persons or Service Delivery Points as the unit of analysis. Your choice will determine which survey data you will browse and which variables are displayed. You can change the unit of analysis at any time.

By default, the variables page displays one variable group at a time for all samples in the data series. You can filter the information at any point to include only the samples of interest to you (using the "Select samples" option).

After you select samples, the page will display only variables present in those samples. An "x" indicates the availability of a variable for a particular sample.

You can navigate through variables by topic area, alphabetically, or by keyword search. Clicking on the name of a variable brings you to detailed information about that variable. The first tab shows the unweighted codes and frequencies for the variable. The Description tab briefly states the meaning of the variable. The Comparability tab discusses international and intra-national comparability issues and provides information about comparable variables in IPUMS-DHS. The Universe tab lists who was asked the question or covered by the variable. The Survey Text tab shows the survey text, in English, associated with the variable.

To add a variable to your customized data file, click on the circle to the left of the variable name, in the list of variables ("Add to cart"). You can also click on the "Add to cart" button within the variable description.

The Data Cart in the upper right keeps track of your variable and sample selections. Once you have made some selections, you can click on "View Cart" to review your choices. You must be logged in with your approved user email and password to download a data file. Further instructions for the extraction system are here.




Basic concepts

What are microdata? [top]

Microdata are composed of individual records containing information collected on the unit of analysis: persons (women or household members), households, or service delivery points. Each row in the data file includes coded responses to the questions asked, organized into separate variables.

Microdata stand in contrast to more familiar "summary" or "aggregate" data. Aggregate data are compiled statistics, such as a table of marital status by sex for some locality. IPUMS PMA does not supply such tabular or summary statistics for analysis.

Microdata are inherently flexible. One need not depend on published statistics that compiled the data in a certain way, if at all. Users can generate their own statistics from the data in any manner desired, including individual-level multivariate analyses.



What are "integrated variables"? [top]

Integration -- or "harmonization" -- is the process of making data from different surveys and countries comparable. Some level of consistency is already present in the original PMA2020 data. However, the response categories within these consistently named variables often differ across samples. We recode the relevant variable from each survey into a unified coding scheme.

Because some samples provide more detail than others, a coding scheme that reduced variables down to the lowest common denominator across all samples would inevitably lose important information. As a result, some IPUMS PMA integrated variables use composite coding schemes. The first one or two digits of the code provide information available across all samples, with trailing digits providing detail only intermittently available. All meaningful detail in the original surveys is therefore available to researchers if they need it, but they can confine their attention to the less-detailed digits if they wish. One example of this type of coding scheme is the IPUMS PMA variable FPPROVIDER.
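The composite coding idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-digit codes below are invented (they are not the actual FPPROVIDER codes), and the helper name general_code is hypothetical. Special codes such as NIU or missing values may not follow the digit pattern, so check the codes tab for each variable before collapsing.

```python
def general_code(detailed_code: int, detail_digits: int = 1) -> int:
    """Drop the trailing detail digits of a composite code."""
    return detailed_code // (10 ** detail_digits)


# Hypothetical codes: the first two digits are the general category,
# the last digit carries detail that only some samples provide.
detailed = [110, 111, 112, 120, 121]
general = [general_code(code) for code in detailed]
print(general)  # [11, 11, 11, 12, 12]
```

Collapsing to the leading digits gives categories that are comparable across all samples, at the cost of the extra detail.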

The other component of integration is the variable documentation. The documentation aims to highlight important comparability issues that are not self-evident from the coding structure for the variable. The general comparability discussion emphasizes issues for international comparisons and notes changes across survey rounds.



What are "weights"? [top]

To obtain representative statistics from the samples included in IPUMS PMA, users must make use of weight variables.

1. For variables from the female questionnaire, users should employ the FQWEIGHT variable.

2. For household-level variables such as TOILETTYPE or FLOOR, users should apply the HQWEIGHT weight. For the household variables, all records represent individuals in the household (i.e., all household members, including resident women of childbearing age). Note that households that do not have women of childbearing age are also included in IPUMS PMA.

3. Service delivery point files were not designed to be nationally representative, so there are no weights to apply to these data. Facility data are collected for use in combination with female data, to describe the service delivery environment that the women experience.
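For the female-questionnaire case in item 1, a weighted proportion can be computed as the sum of weights for cases with the characteristic divided by the sum of all weights. The sketch below uses invented records and a hypothetical 0/1 indicator named USES_METHOD; only FQWEIGHT is a variable name taken from this page. It is a point estimate only and does not address variance estimation.

```python
def weighted_proportion(records, flag_key, weight_key):
    """Share of the weighted total held by records whose flag equals 1."""
    total_weight = sum(r[weight_key] for r in records)
    flagged_weight = sum(r[weight_key] for r in records if r[flag_key] == 1)
    return flagged_weight / total_weight


# Invented female-questionnaire records; USES_METHOD is a hypothetical indicator.
women = [
    {"USES_METHOD": 1, "FQWEIGHT": 0.5},
    {"USES_METHOD": 0, "FQWEIGHT": 1.5},
    {"USES_METHOD": 1, "FQWEIGHT": 1.0},
    {"USES_METHOD": 0, "FQWEIGHT": 1.0},
]

weighted = weighted_proportion(women, "USES_METHOD", "FQWEIGHT")
unweighted = sum(r["USES_METHOD"] for r in women) / len(women)
print(weighted, unweighted)  # 0.375 0.5
```

The gap between the two figures shows why unweighted tabulations should not be reported as population estimates.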

If you wish to report the proportion of households with some characteristic, limit your analysis to only one person per household, using the variable LINENO (limit to cases where LINENO=1). Limiting your analysis to one person per household ensures that every household is counted just once, rather than including multiple instances for some households.
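A minimal sketch of the one-record-per-household rule follows. The records are invented and IMPROVED_TOILET is a hypothetical household-level indicator; HHID, LINENO, and HQWEIGHT are the variable names used on this page.

```python
# Invented person records: household A has three members, household B has one.
persons = [
    {"HHID": "A", "LINENO": 1, "HQWEIGHT": 1.5, "IMPROVED_TOILET": 1},
    {"HHID": "A", "LINENO": 2, "HQWEIGHT": 1.5, "IMPROVED_TOILET": 1},
    {"HHID": "A", "LINENO": 3, "HQWEIGHT": 1.5, "IMPROVED_TOILET": 1},
    {"HHID": "B", "LINENO": 1, "HQWEIGHT": 0.5, "IMPROVED_TOILET": 0},
]


def weighted_share(records):
    """Weighted share of records with the hypothetical indicator set to 1."""
    total = sum(r["HQWEIGHT"] for r in records)
    flagged = sum(r["HQWEIGHT"] for r in records if r["IMPROVED_TOILET"] == 1)
    return flagged / total


# Keep one record per household so each household is counted once.
households = [r for r in persons if r["LINENO"] == 1]

household_share = weighted_share(households)  # proportion of households
person_share = weighted_share(persons)        # proportion of household members
print(len(households), household_share, person_share)
```

Without the LINENO filter, larger households contribute more records, so the result describes household members rather than households.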

Users should note that the weights provided with IPUMS PMA do not provide nationally representative counts, only correct proportions. The weights adjust the sample of surveyed people to become representative of the national (or, in some cases, regional) population, but the weighted number of people reported reflects the size of the survey population. IPUMS PMA has created a variable POPWT that provides nationally representative counts for samples that are nationally representative, but not for samples that are only regionally representative. See the user note on POPWT for more information.



What does "universe" mean in the variable descriptions? [top]

The universe is the population responding to the question or the population covered by the question. In IPUMS PMA, the universe for variables based on the female questionnaire generally includes the women to whom the question was asked. For example, only women who have ever had sex are asked their age at first intercourse. The universe for variables based on the household questionnaire is the households or household members covered by a question. Similarly, the universe for variables based on the service delivery point (SDP) questionnaire is the set of facilities covered by a question. For example, only facilities offering contraceptive implants are asked about whether implant supplies include anesthetics.

Cases that are outside of the universe for a variable are labeled "NIU (not in universe)" on the codes tab. Differences in a variable's universe across samples are a common data comparability issue.
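In practice, NIU cases usually need to be set aside before computing a statistic, because their codes are placeholders rather than answers. The sketch below assumes a hypothetical NIU code of 99 and invented values for an age-at-first-intercourse variable; the actual NIU code for each variable is listed on its codes tab.

```python
NIU_CODE = 99  # hypothetical; look up the real NIU code on the codes tab

# Invented values for an age-at-first-intercourse variable.
reported = [17, 99, 20, 99, 19]

# Keep only cases that are in the universe for this variable.
in_universe = [value for value in reported if value != NIU_CODE]

naive_mean = sum(reported) / len(reported)        # treats 99 as a real age
clean_mean = sum(in_universe) / len(in_universe)  # in-universe cases only
print(len(in_universe), round(naive_mean, 2), round(clean_mean, 2))
```

Treating the NIU code as a real value inflates the mean here, which is the kind of error the universe statement helps avoid.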

In some cases, the variable universe specified in IPUMS PMA will not exactly match the frequencies in the variable. For some variables in the original PMA2020 data, missing values were assigned the same codes as NIU cases, and the IPUMS PMA team is not able to disambiguate these cases. Users may find that the variables RESULTFQ and RESULTHQ, which describe why a survey was not completed, are helpful to explain missing values that should be in universe.




Getting data

How do I obtain data? [top]

All IPUMS PMA data are delivered through a data extraction system. Once logged in as a registered user, researchers can select the variables and samples they are interested in, and the system creates a custom-made extract containing only this information. The system will pool data from multiple samples into a single data file; in fact, it was primarily designed for this purpose.

Instructions for downloading and reading the data are available here.

Data are generated on a server. The system sends out an email message to the user when the extract is completed. The user must download the extract and analyze it on their local machine. The extract system is accessed through the Data Cart, which becomes clickable once you have selected variables and samples.



What format are the data in? [top]

By default, IPUMS PMA produces fixed-column ASCII data. Data are rectangular (rather than hierarchical) and mostly numeric (there are a small number of string variables). For the household and female files, there is one record per person with household data on each person's record. There are no separate records representing households. For service delivery point files, each record is a health facility.
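Reading a fixed-column file means slicing each line at known character positions. The sketch below is only illustrative: the layout, the sample code, and the AGE field are invented, and the real column positions are assumed to come from the documentation delivered with your extract.

```python
# Hypothetical layout: variable name -> (start, end) character positions,
# zero-based, with the end position exclusive.
LAYOUT = {"SAMPLE": (0, 5), "LINENO": (5, 7), "AGE": (7, 9)}


def parse_line(line, layout):
    """Slice one fixed-column record into a dict of integer values."""
    return {name: int(line[start:end]) for name, (start, end) in layout.items()}


# Two invented person records, one line each.
lines = ["854010134", "854010207"]
records = [parse_line(line, LAYOUT) for line in lines]
print(records[0])  # {'SAMPLE': 85401, 'LINENO': 1, 'AGE': 34}
```

String variables would need to be kept as text rather than converted with int().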

In addition to the ASCII data file option, users may specify that they wish to receive their data as an SPSS, SAS, or Stata file or as a CSV (comma delimited) file.

A codebook file is also created with each extract. It records the characteristics of your extract and should be downloaded for record-keeping.

All data files are created in gzip compressed format. You must uncompress the file to analyze it. Most data compression utilities will handle the files.



What is the best way to use the extract system? [top]

The data extraction system is a flexible tool. There is no need to download variables or samples you don't expect to use for your current analysis. The system records every extract you make. You can reload and modify an old extract, dropping or adding variables or samples. Go to the "My Data Extracts" page and click on the "Revise" link.

Some IPUMS PMA variables are preselected for you. They identify the sample (with the SAMPLE variable), in case your extract pools data from multiple sources, as well as the variables used for weighting (FQWEIGHT and HQWEIGHT) to create numbers representative of the sampled population. Other preselected variables, such as HHID and LINENO, uniquely identify records and households. Finally, variables required for variance estimation, such as STRATA, are also preselected for inclusion in your data extract.



How long does a data extract take? [top]

The time needed to make an extract differs depending on the number and size of samples requested and the load on the server. Most extracts will only take a few moments. Sometimes extracts will take longer. The system sends an email when the extract is completed, so there is no need to stay active on the IPUMS PMA site while the extract is being made.



How does "sample selection" work on the IPUMS PMA web site? [top]

When a user first enters the variable documentation system, all samples are displayed by default. Every variable in the system will display on all relevant screens.

Users can filter the information displayed by selecting only the samples of interest to them. Only the variables available in one of the selected samples will appear in the variable lists. The variable descriptions and codes pages will also be filtered to display only the text and columns corresponding to the selected samples. Sample selections can be altered at any time in your session. Selections do not persist beyond the current session.

Users can also choose to filter the cases in an extract.

For the person unit of analysis, the cases in PMA2020 data are female respondents, household members, female non-respondents, and persons who were not interviewed for the household questionnaire. On the samples selection page, users can choose one of the following groups:

1) Female respondents
2) Female respondents and household members
3) Female respondents and female non-respondents
4) All cases

For the service delivery point unit of analysis, the cases in PMA2020 data are service delivery point (SDP) respondents and service delivery point employees who did not complete an interview for the SDP questionnaire. On the samples selection page, users can choose one of the following groups:

1) Service delivery point respondents
2) All cases



What does "Add to cart" mean? [top]

While browsing variables in the documentation system, you can select them to include in a data extract, sending them to your data cart (assuming you have been approved to download data from the sample(s) in question). You can deselect a variable by unchecking its box in the data cart. After you proceed to "create data extract," you can return to the variable list to make more selections.



Why can't I open the data file? [top]

The data produced by the extract system are gzipped (the file has a .gz extension). You must use a data compression utility to uncompress the file before you can analyze it.
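If no separate compression utility is at hand, the Python standard library can uncompress a .gz file. The sketch below builds a tiny stand-in file so that it runs on its own; the file names are hypothetical, and the helper gunzip is defined here rather than provided by IPUMS PMA.

```python
import gzip
import os
import shutil
import tempfile


def gunzip(src_path: str, dst_path: str) -> None:
    """Write an uncompressed copy of a .gz file to dst_path."""
    with gzip.open(src_path, "rb") as compressed, open(dst_path, "wb") as plain:
        shutil.copyfileobj(compressed, plain)


# Demo with a small stand-in for a downloaded extract.
with tempfile.TemporaryDirectory() as workdir:
    gz_path = os.path.join(workdir, "extract.csv.gz")
    csv_path = os.path.join(workdir, "extract.csv")
    with gzip.open(gz_path, "wb") as out:
        out.write(b"EAID,LINENO\n101,1\n101,2\n")
    gunzip(gz_path, csv_path)
    with open(csv_path, "rb") as handle:
        restored = handle.read()

print(restored.decode("ascii"))
```

For a real extract, call gunzip with the path of the downloaded file and the path where the uncompressed copy should go.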

Detailed instructions for downloading and reading the data are available here.



Is there a preferred statistical package for using IPUMS PMA? [top]

IPUMS PMA supports SPSS, SAS and Stata. Users may also request a comma-delimited (CSV) file and read the data into R or Excel for analysis.



Can I get the original data? [top]

The source materials for IPUMS PMA are the Performance Monitoring and Accountability 2020 (PMA2020) household and female files distributed through PMA2020. Researchers go through a similar process to apply for access to the original files or the IPUMS PMA version of the data. PMA2020 variables may differ in their coding schemes and mnemonics from the IPUMS PMA version of the same variable. To apply for access to the original PMA2020 files, go here.




Using IPUMS PMA data

Can I use the household and female files together with the service delivery point files? [top]

Yes. Households (and the women of childbearing age in those households) and service delivery points are sampled from the same geographic enumeration areas (EAs). The sample selection for the service delivery point files is intended to represent the service delivery environment within each EA. Users can merge service delivery point files to household and female files using the variable EAID. For example, one could calculate the number of facilities providing IUDs in each enumeration area and merge that variable onto the record of each woman in the corresponding EA of the same survey round. This would require using separate data extracts of SDPs and women. Note that a service delivery point might serve more than one EA. The variable series EASERVED1 through EASERVED42 lists all the EAs the facility serves. Only public service delivery points have information about the EAs the facility serves beyond the EA where the facility is located. PMA2020 received this information from the national or local governments.

For more information about how to use the household and female files with the service delivery point files, see our user note.
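The IUD example above can be sketched as a count per enumeration area followed by a lookup for each woman. The records below are invented, and IUD_PROVIDED and ROUND_KEY are hypothetical names: the sketch simply assumes that some key identifying the survey round is available in both extracts. It matches facilities only on the EA where they are located and ignores the EASERVED series.

```python
from collections import Counter

# Invented facility records from a service delivery point extract.
facilities = [
    {"ROUND_KEY": 1, "EAID": 10, "IUD_PROVIDED": 1},
    {"ROUND_KEY": 1, "EAID": 10, "IUD_PROVIDED": 1},
    {"ROUND_KEY": 1, "EAID": 11, "IUD_PROVIDED": 0},
    {"ROUND_KEY": 2, "EAID": 10, "IUD_PROVIDED": 1},
]

# Invented records for women from a separate person extract.
women = [
    {"ROUND_KEY": 1, "EAID": 10},
    {"ROUND_KEY": 1, "EAID": 11},
    {"ROUND_KEY": 1, "EAID": 12},
]

# Count IUD-providing facilities per (survey round, enumeration area).
iud_counts = Counter(
    (f["ROUND_KEY"], f["EAID"]) for f in facilities if f["IUD_PROVIDED"] == 1
)

# Attach the count to each woman; EAs with no such facility get 0.
for woman in women:
    woman["N_IUD_FACILITIES"] = iud_counts.get((woman["ROUND_KEY"], woman["EAID"]), 0)

print([w["N_IUD_FACILITIES"] for w in women])  # [2, 0, 0]
```

Keying on both the round and the EA keeps facilities from one survey round from being attached to women interviewed in another.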



Are there tricky aspects of IPUMS PMA data to be particularly aware of? [top]

IPUMS PMA samples are weighted: each individual does not represent the same number of persons in the population. It is important to use the weight variables when performing analyses with these samples. Use the FQWEIGHT variable for IPUMS PMA variables on the female questionnaire; use the HQWEIGHT variable with questions asked on the household questionnaire.

The service delivery point (SDP) data are not meant to be nationally representative. Instead, these SDP samples are meant to describe the health service environment in the geographic areas from which the households and women were sampled. See our Weighting Guide for more information.

For more information on the samples included in IPUMS PMA, see the Sample Descriptions.

Be sure to examine the documentation for the variables you are using. The codes and labels for variable categories do not tell the whole story. In other words, the syntax labels are not enough. There are two things to pay particular attention to. The universe for a variable -- the population answering or covered by the question -- can differ subtly or markedly across samples. Also, read the variable comparability discussions for the samples you are interested in. Important comparability issues should be mentioned there.



What are the major limitations of the data? [top]

IPUMS PMA data are composed entirely of individual person or facility records from a survey. There are no macroeconomic, business, or aggregate statistics. We do not deliver the published statistics from the survey; for aggregate statistics, please consult the key indicators briefs on the Source Documents page for the sample of interest.

Some databases of integrated microdata (such as the IPUMS-International census database) include records for all members of the household. IPUMS PMA currently includes full information only for women of childbearing age, and very limited information (such as age and relationship to the household head) for all household members. Thus, IPUMS PMA is not well-suited to studying some subpopulations, such as the elderly.

Because the data are public-use, measures have been taken to assure confidentiality. Names and other identifying information are suppressed, as is detailed geographic information below the regional level.



Can I find particular individuals in the IPUMS PMA data? [top]

No. A variety of steps have been taken to ensure the confidentiality of the data. Most fundamentally, the samples do not contain names or addresses. The data are only samples, so there is no guarantee any given individual will be in the dataset. Low levels of geography are suppressed.



How can I cite IPUMS PMA? [top]

Our suggested citation can be found here.




Using the variables page

Variables page menu [top]

Use the left side of the menu to browse variables:
Topic: person variables by group
A-Z: integrated variables by first letter of the IPUMS PMA variable name
Search: display only variables that contain specified text in particular fields

Use the buttons and links on the right side of the menu to:
Select Samples: limit the display of variable information to selected samples
Display Options: alter how the variable list is displayed or get help for this page



Variables page details [top]

The Menu
When you first navigate to the extract system, you will be asked to choose your unit of analysis. IPUMS PMA currently provides household and female files, which contain person records, and service delivery point files, which contain facility records. These two types of files are derived from different surveys, so the sets of variables are not the same. It is not possible to pool these samples using our extract system. Choose the unit of analysis to browse the appropriate variables of interest. You may, however, merge data across these file types using the EAID variable, once you have downloaded customized data files for different record types.

The variables page allows you to browse harmonized variables while limiting and controlling how the information is displayed.

The left side of the menu is for browsing the variables.

When you select samples, you limit the variable list to display only variables that are available in at least one of those samples. The effect of selecting samples also extends to all the variable descriptions and codes pages you can access through the variable system. Only information relevant to your selected samples will be displayed in any context while you browse the variables. You can change your sample selections at any point.

Selecting samples is a good practice when exploring IPUMS PMA, because the amount of information can be unwieldy. Selecting samples also makes sense if you know you are only interested in a specific country or countries. On the other hand, sometimes you need to see everything to determine what kinds of research are possible using the database.

"Search" lets you specify search terms for specific fields of variable metadata. The system will return a list of variables that include any of the search terms you indicate.

The final choices are "Display options" and "Help." The "Display options" item brings up a screen that offers a number of choices regarding the display of the variable list. Each selection has a default choice.

Use short country codes / Use long country codes
Switch between the 2-letter country abbreviations and longer abbreviations.

View one group / View all groups together
Switch between viewing one variable group at a time and viewing all variable groups on one screen. Unless you have a limited number of samples selected, your browser may be slow to display all groups. The default view is one group at a time.

Show availability detail / Show availability summary
Switch between displaying the full sample-specific availability matrix, and a view that only displays the total number of samples that contain each variable. Both views only display or sum the samples that the user has selected in "Select samples." The default view is the detailed availability information.

View available variables / View all variables
Switch between a view that only displays variables present in one of your selected samples, and a view that displays all variables, even if they are not available in your samples of interest. The default view is to only display available variables.

Samples are displayed oldest to newest / Samples . . . newest to oldest
Display the samples columns indicating variable availability in chronological order or reverse chronological order. The default is oldest to newest.

The Variable List

As you browse the variables, they are displayed in a list containing a number of columns. The variable name links to the variable description, which will include detailed comparability discussions, universes, and survey text.

By default, all samples are displayed. The country abbreviation and sample year identify each sample at the top of every column. Hover over the country code with the mouse to see the full sample name. If a variable is available in a given sample, an "x" is printed in that column.

In the column labeled "Add to cart," each variable has a purple circle with a "+" on the far left. Click these circles to add variables to your data cart. Once you have clicked them, these icons change to a checked box, indicating that the variable is in your data cart. To remove the variable from your data cart, simply click the checkbox.




Using the data extract system

Your data cart [top]

You cannot create data from the extract system unless you are a registered user approved to download a given sample (or samples). If you are not registered, you must apply for access.

At the top right corner of the variables page is a summary of your data cart. This box displays the number of variables and samples you have selected. Clicking the circle next to a variable places it in your data cart. You can view your data cart at any time by clicking "View Cart." The "View Cart" link only becomes operative when you have selected a variable or sample.

The data cart lists the variables preselected by the extract system as well as any variables you selected while browsing the documentation. As with the variable selection page, you can remove variables from your extract in this step by clicking the checkbox next to the variable in the "Add to cart" column. If you chose a variable but subsequently altered your sample selections in such a way that the variable is no longer available, it is indicated by an "i" icon.

The data cart also includes links to codes pages and sample availability for the variables in your cart.

Buttons are provided to return to the variable list to make more selections or to alter your sample choices. If you return to the variable list, click on "View Cart" again to return to the data cart.

When you are satisfied with your data selections, click "Create data extract" to finalize your extract request.



Why are some variables in my data cart preselected? [top]

Certain variables appear in your data cart even if you did not select them, and they are not included in the constantly updated count of variables in your data cart.

Unless you are absolutely certain you will not need one of these variables, we recommend that you not remove them from your data cart. The preselected variables are needed for weighting, for variance estimation, or to identify the year, country, and/or round of a sample.



What is "Type"? [top]

The "Type" column on the variables selection pages and in your data cart indicates the record type of the variable. Although IPUMS PMA contains both person records and facility records, the column shows "P" (for "person").



Extract request page [top]

When you click "Create data extract" in the Data Cart, you come to the Extract Request page. If you wish, you can simply hit the "Submit" button and create your data extract.

The page summarizes your data extract and provides options for modifying it. A link at the top expands to show the samples you selected. Click the appropriate links to go back to the variable browsing and sample selection pages to alter your choices. You return to the extract request page via the data cart, where you can review the availability matrix for selections and easily drop variables by unchecking them.

When you submit an extract, there will be a short delay, depending on the size of the job. You do not need to wait on our site for the job to be completed. The system will send you an email when your extract is ready.

The definitions of every customized data file will remain on the server indefinitely, but the data files are subject to deletion after three days. However, the screen where you download extracts has a feature that lets you revise old extracts. When you click on "revise," all your selections for that extract will be loaded into the system, after which you can edit or regenerate it. Note, however, that each successive data release can create difficulties for recreating old extracts, because codes might change.



Extract option: Describe your extract [top]

You can describe your extract for future reference. The system will display the description on the page where you download your data extract.



Extract option: Change data format [top]

There is a link across from "Data Format" called "Change". Click the link to request your extract in .csv, .sav, .dta, or .sas7bdat instead of the default .dat file.