IPUMS Global Health Interoperable Variables
Date: January 2025
Introduction
At IPUMS Global Health, many of our users have expressed interest in combining harmonized data from IPUMS DHS, IPUMS MICS, and IPUMS PMA. To make this process easier, we are creating “Global Health” (GH) harmonized variables. These variables, identified with “_GH” at the end of the variable name, share names and codes across the three survey projects. This user note describes:
- Why use IPUMS GH interoperable variables
- What GH variables we have created
- Where to find the GH variables in each of the IPUMS Global Health projects
- How to create a comparable dataset from the downloaded datafiles
- What is coming next
Why use IPUMS GH interoperable variables?
Our goal in the original data collections is to retain all the detail of variable codes while maximizing comparability across samples. These new interoperable Global Health variables instead identify major categories and impose consistent codes and variable names across IPUMS DHS, MICS, and PMA. This consistency allows users to more easily study broad trends or calculate Sustainable Development Goal indicators across time and place.
You will still need to create a separate extract for each IPUMS Global Health data collection. Instructions about gaining access to each harmonized survey database are available through the hyperlinks below.
What GH variables have we created
To date, we have created GH variables for two units of analysis: Women and Households.
Our initial batch represents a pilot of different types of variables that can be created across IPUMS Global Health projects. Variables in this pilot phase were selected based on:
- their representation in multiple IPUMS Global Health projects
- the ease of creating interoperable variables across the projects
- relevance to Sustainable Development Goal indicators
Based on feedback from users, we will continue to add new variables and units of analysis over time.
Technical variables (available for both Women and Households)
These variables uniquely identify samples within each project, while providing consistent labels for variables shared across projects (such as country and year).
Variable Label | GH Variable Name |
---|---|
Country | COUNTRY_GH |
Sample | SAMPLE_GH |
Year of sample | YEAR_GH |
IPUMS Global Health project | PROJECT_GH |
Urban status | URBAN_GH |
Women of Childbearing Age
For most surveys, these variables relate to individual women age 15-49, but check the variable universes to confirm the age range and whether the sample is limited to ever-married women.
Variable Label | GH Variable Name |
---|---|
Demographics | |
Age of woman | AGE_GH |
Marital status | MARST_GH |
Age of partner/husband | PARTNERAGE_GH |
Fertility & Family Planning | |
Currently pregnant | PREGNANT_GH |
Number of children ever born | CHEB_GH |
Currently using family planning | FPCURRUSE_GH |
Domestic Violence Attitudes (Justifiable to beat a wife because she…) | |
Goes out without telling husband | DVAGOOUT_GH |
Neglects the children | DVNEGLECTS_GH |
Argues with husband | DVAARGUES_GH |
Refuses sex with husband | DVAREFUSESEX_GH |
Burns the food | DVABURNFOOD_GH |
Media & Information Technology | |
Woman owns a mobile phone | MOBILEWM_GH |
Frequency of reading newspaper | NEWSFREQ_GH |
Frequency of watching television | TVFREQ_GH |
Households
These variables are available for each sampled household.
Variable Label | GH Variable Name |
---|---|
Household Characteristics | |
Material of walls | WALLS_GH |
Material of roof | ROOF_GH |
Type of cooking fuel | COOKFUEL_GH |
Type of toilet | TOILET_GH |
Material of floor | FLOOR_GH |
Household Assets | |
HH has mobile phone | MOBPHONE_GH |
HH has internet | INTERNET_GH |
HH has electricity | ELECTRC_GH |
HH has car | CAR_GH |
HH has radio | RADIO_GH |
HH has television | TV_GH |
HH has personal computer | PC_GH |
HH has bicycle | BIKE_GH |
HH has motorcycle | MOTOCYCLE_GH |
HH has refrigerator | FRIDGE_GH |
HH or someone in the household has a bank account | BANKACC_GH |
Where to find the GH variables in each of the IPUMS Global Health projects
Data within each IPUMS Global Health project are structured in different ways.
For each project, first select the unit of analysis.
Unit of analysis | IPUMS DHS | IPUMS MICS | IPUMS PMA |
---|---|---|---|
Households | Constructed from "Household members" unit | Household characteristics | Constructed from "Person - Family Planning" unit |
Women | Women | Women | Constructed from "Person - Family Planning" unit |
Location of the GH variables within each project
After selecting the unit of analysis described above, you can find the GH variables using the drop-down navigation menus on the variable browsing page. Look for the following headings in the “Topics” menu.
Unit of analysis | IPUMS DHS | IPUMS MICS | IPUMS PMA |
---|---|---|---|
Households | IPUMS Global Health | IPUMS Global Health | Other > Global Health |
Women | IPUMS Global Health | IPUMS Global Health | Other > Global Health |
Other information about how to select samples and browse data
In your extract from each data collection, you may want to add additional information needed for your analysis. For guidance on creating a customized dataset for each project, consult the user guides linked below.
How to create a comparable dataset from the downloaded datafiles
After you create an extract from each project and before you append the datafiles, you will need to perform some additional data manipulation to make the units of analysis comparable.
Additional data manipulation is necessary because the data are derived from different units of analysis. You must restrict or recode the data files using the guidelines below before the files can be appended. The recommendations below use the most comparable denominator, which requires the least manipulation and recoding. Other ways of merging and combining the files are possible, given the great flexibility of the microdata.
Households
The interoperable household variables are designed for files with one observation per household.
IPUMS MICS: The household unit of analysis already has one observation per household. No further recodes are needed for IPUMS MICS households.
IPUMS DHS: The household member file contains a record for every member of the household. To achieve comparability for IPUMS DHS data, limit observations to one person per household. After you have downloaded your data extract, keep only cases for which the variable LINENO equals 1 (usually the household head). The following Stata command does this:
keep if lineno==1
IPUMS PMA: The "Person - Family Planning" unit of analysis contains an entire household roster, even for households without women of childbearing age. Select this unit of analysis, and when selecting samples, choose to keep “All Cases” under the “Sample Members” section, and include the variable LINENO. After you have downloaded your data extract, keep only cases for which the variable LINENO equals 1, which retains only one person per household. The following Stata command does this:
keep if lineno==1
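For R users, the same restriction can be sketched as follows. This is a minimal illustration on a toy data frame, not output from a real extract: HHID is a hypothetical household identifier (the actual identifier differs by project), the values are made up, and the sketch assumes the column is named LINENO in uppercase, which may differ in your file.

```r
# Toy stand-in for a downloaded household-member extract.
# HHID is a hypothetical household identifier used only for this illustration;
# the values of ELECTRC_GH are made up.
members <- data.frame(
  HHID       = c(1, 1, 1, 2, 2, 3),
  LINENO     = c(1, 2, 3, 1, 2, 1),
  ELECTRC_GH = c(1, 1, 1, 0, 0, 1)
)

# Keep one record per household: the person with line number 1.
# subset() also drops rows where LINENO is missing.
households <- subset(members, LINENO == 1)

# Sanity check: no household should appear more than once
stopifnot(!any(duplicated(households$HHID)))
```

The duplicate check is optional, but it is a quick way to confirm that the file really has one row per household before appending it to files from the other projects.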
Women
The interoperable variables apply to each woman of childbearing age (age 15-49 in most samples).
IPUMS MICS and IPUMS DHS data have the same structure. When “Women” is the chosen unit of analysis, each row of data represents one woman of childbearing age. Check the universe statements to determine exact age ranges and any marital status limitations for each sample.
IPUMS PMA: Choose the "Person - Family Planning" unit of analysis. On the samples selection page, choose the option for “Female Respondents” under the Sample Members heading. This selection includes only women of childbearing age who completed the female questionnaire in the data extract, and it is thus comparable to MICS and DHS samples.
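As a complement to reading the universe statements, the observed age range in each sample can be inspected directly in the data. The sketch below uses a toy data frame with made-up values; the sample labels "A" and "B" are placeholders rather than real SAMPLE_GH codes. It also assumes AGE_GH contains only valid ages, so any special codes for missing values would need to be excluded first.

```r
# Toy stand-in for a women's extract; sample labels and ages are illustrative only
women <- data.frame(
  SAMPLE_GH = c("A", "A", "A", "B", "B"),
  AGE_GH    = c(15, 32, 49, 18, 45)
)

# Minimum and maximum observed age within each sample
age_min <- tapply(women$AGE_GH, women$SAMPLE_GH, min, na.rm = TRUE)
age_max <- tapply(women$AGE_GH, women$SAMPLE_GH, max, na.rm = TRUE)

# Collect the results in one small table, one row per sample
age_ranges <- data.frame(
  sample  = names(age_min),
  min_age = as.vector(age_min),
  max_age = as.vector(age_max)
)
print(age_ranges)
```

Samples whose observed range differs from the others are the ones to look up in the universe statements before pooling.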
Appending data files from different GH projects together
Once you have created and downloaded an extract from each project, you will want to carry out any additional recodes and data cleaning in the individual files before appending them together.
In Stata, use the command “append”. You can append multiple files at a time. For example, if you had a file open called “dhs” and you wanted to append the “mics” and “pma” datasets to it, you could use the command:
append using mics pma
In R, download the files you would like to work with into the same directory, then insert the path of that directory in the data_dir line below. The code reads only the csv files in that directory, and it works even if the files do not have exactly the same columns (i.e., variables).
library(plyr)
#folder containing the downloaded extract files
data_dir <- "[insert filepath here]"
#list the full paths of all csv files in that folder
file_list <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
#read the files from file_list into R as csvs and create a list called myfiles
myfiles <- lapply(file_list, read.csv)
#append all data frames in the myfiles object into one data frame
#rbind.fill fills columns that are missing from a file with NA
appended <- do.call(rbind.fill, myfiles)
Either of these approaches will result in a single datafile with information from each of the different IPUMS Global Health data collections.
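To see what appending files with different columns does, the following sketch pads and stacks two toy data frames using base R only. The helper append_fill is hypothetical (it is not part of plyr or any other package), and the project labels and values are made up for illustration; real PROJECT_GH codes may differ.

```r
# Toy stand-ins for two extracts whose columns only partly overlap
dhs  <- data.frame(PROJECT_GH = "dhs",  AGE_GH = c(20, 31), CHEB_GH = c(1, 3))
mics <- data.frame(PROJECT_GH = "mics", AGE_GH = 25, TVFREQ_GH = 2)

# append_fill is a hypothetical helper written for this illustration.
# It adds any column a data frame lacks (filled with NA), then stacks the rows.
append_fill <- function(frames) {
  all_cols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- NA
    }
    df[all_cols]  # put the columns in a common order
  })
  do.call(rbind, padded)
}

appended_toy <- append_fill(list(dhs, mics))

# Count records contributed by each project
table(appended_toy$PROJECT_GH)
```

Variables that exist in only one project end up as NA for records from the other projects, so it is worth tabulating missing values by PROJECT_GH before analysis.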
What’s coming next?
We understand that this current offering of GH variables may not be comprehensive enough. We are planning to expand our Global Health variables not only for the woman and household units of analysis, but also to add variables at the child level.
Do you have a suggestion for us? Please reach out at ipums@umn.edu!