Chapter 5 Post-Stratification Weights
If you know the population distribution of the demographic variables you wish to weight on, you can create post-stratification weights yourself using an approach known as raking (iterative proportional fitting). In Stata, the user-written program `ipfweight` creates such weights; it can be installed from SSC with `ssc install ipfweight`.
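Raking repeats one simple post-stratification step for each weighting variable in turn: every respondent's weight is multiplied by the ratio of the target share to the current weighted share of that respondent's category. The Python sketch below shows that single step for one 0/1 variable. It is an illustration of the idea, not the `ipfweight` code, and the function name `poststratify_step` is hypothetical.

```python
def poststratify_step(values, weights, targets):
    """Rescale weights so the weighted share of each category of ONE
    variable matches its target share (one raking step).

    values  -- category of each respondent on the variable (e.g. 0/1)
    weights -- current weight of each respondent
    targets -- dict mapping category -> target share (shares sum to 1)
    """
    total = sum(weights)
    # Current weighted share of each category
    current = {}
    for v, w in zip(values, weights):
        current[v] = current.get(v, 0.0) + w / total
    # Multiply each weight by (target share / current share) of its category;
    # this keeps the total of the weights unchanged
    return [w * targets[v] / current[v] for v, w in zip(values, weights)]


# Hypothetical sample: 6 women (1) and 4 men (0), all starting at weight 1
female = [1] * 6 + [0] * 4
new_w = poststratify_step(female, [1.0] * 10, {0: 0.47, 1: 0.53})
print(round(new_w[0], 4), round(new_w[-1], 4))  # 0.8833 1.175
```

Women are over-represented (60% versus a 53% target), so their weights shrink and the men's weights grow. Adjusting on a second variable can knock the first one slightly off target, which is why raking cycles through all variables repeatedly.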
As an example, we will use the 2014 Massachusetts Exit Poll data. The dataset already includes a sampling weight that adjusts for its stratified cluster sampling design. However, because response rates differed across demographic groups, we also need additional post-stratification weighting. Here is the demographic composition of the 2014 electorate according to voting records:
Characteristic | % of Electorate |
---|---|
Female | 53% |
White | 88% |
Black | 4% |
Hispanic | 5% |
Age 18-29 | 7% |
Age 65 and older | 30% |
We can use this information to produce post-stratification weights for the survey. The easiest first step is to create a 0/1 indicator variable for each category we will weight on:
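```stata
* Assumes gender is coded 1 = male, 2 = female
recode gender 2=1 1=0, gen(female)

* tab ..., gen() creates one 0/1 variable per category, numbered in
* ascending order of the codes; check them first with: tab race, nolabel
tab race, gen(racecat)
ren racecat1 white
ren racecat2 black
ren racecat3 hispanic

* Assumes the lowest age category is 18-29 and the sixth is 65 and older
tab age, gen(agecat)
ren agecat1 age18_29
ren agecat6 age65_over
```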
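The renaming step only works if the category codes line up as assumed. The Python sketch below mimics the idea behind `tab ..., gen()` (one 0/1 indicator per distinct non-missing value, numbered in ascending order of the value), which shows why `racecat1` is the white indicator only when white has the lowest race code. It is an illustration, not Stata's implementation; `tab_gen` is a hypothetical name.

```python
def tab_gen(values, stub):
    """One 0/1 indicator per distinct non-missing value, numbered
    stub1, stub2, ... in ascending order of the value."""
    levels = sorted({v for v in values if v is not None})
    dummies = {}
    for i, level in enumerate(levels, start=1):
        # Missing values (None) stay missing in every indicator
        dummies[f"{stub}{i}"] = [None if v is None else int(v == level) for v in values]
    return dummies


# Hypothetical race codes: 1 = white, 2 = black, 3 = Hispanic, None = missing
race = [1, 3, 2, 1, None, 1]
racecat = tab_gen(race, "racecat")
print(racecat["racecat1"])  # [1, 0, 0, 1, None, 1]
```

If, say, Hispanic were coded 1 instead, `racecat1` would be the Hispanic indicator and the `ren` commands would mislabel the variables.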
Now we can use `ipfweight` to weight the sample to these population values:

```stata
ipfweight female white black hispanic age18_29 age65_over, gen(weight) val(47 53 12 88 96 4 95 5 93 7 70 30) maxit(25) st(sampleweight)
```
This command first lists the variables on which to weight (`female white black hispanic age18_29 age65_over`). The `gen()` option names the weight variable the command will create (here simply `weight`). The `val()` option is required: it gives the target percentage for each category of each weighting variable, in the order the variables are listed and, within each variable, from the lowest category to the highest. So the first two numbers are 47 and 53 because, on `female`, we want 47% to be 0 (male) and 53% to be 1 (female). The next two numbers are 12 and 88 because we want 12% to be 0 on `white` (non-white) and 88% to be 1 (white). The remaining pairs follow the same pattern: 96/4 for `black`, 95/5 for `hispanic`, 93/7 for `age18_29`, and 70/30 for `age65_over`.
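Because `val()` is one flat list, it is easy to misalign a pair. The Python sketch below splits such a list into per-variable targets and checks that each pair sums to 100. It assumes every weighting variable is a 0/1 indicator (true in this example); `parse_val` is a hypothetical helper for illustration, not part of `ipfweight`.

```python
def parse_val(varnames, numbers):
    """Split a flat val() list into {0: share, 1: share} per 0/1 variable,
    taking the numbers two at a time in the order the variables are listed."""
    if len(numbers) != 2 * len(varnames):
        raise ValueError("need exactly two numbers per 0/1 variable")
    targets = {}
    for i, name in enumerate(varnames):
        pct0, pct1 = numbers[2 * i], numbers[2 * i + 1]
        # Sanity check: the two categories of a variable must cover 100%
        if pct0 + pct1 != 100:
            raise ValueError(f"{name}: percentages sum to {pct0 + pct1}, not 100")
        targets[name] = {0: pct0 / 100, 1: pct1 / 100}
    return targets


varnames = ["female", "white", "black", "hispanic", "age18_29", "age65_over"]
val = [47, 53, 12, 88, 96, 4, 95, 5, 93, 7, 70, 30]
targets = parse_val(varnames, val)
print(targets["black"])  # {0: 0.96, 1: 0.04}
```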
The `st()` option specifies a starting weight for each observation. You will often not have one, but here we do, because the sample is already weighted to account for the stratified cluster design (the `sampleweight` variable). Finally, `maxit()` sets the maximum number of iterations of the raking process. It probably makes sense to set this to at least 10, and you may want a higher value when weighting on more variables.
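The sketch below puts the pieces together in Python: raking starts from the design weights (the role `st()` plays), cycles through the variables, and stops after `max_iter` passes (the role `maxit()` plays). The early stop once every margin is within `tol` of its target, and the final rescaling to mean 1, are assumptions of this sketch rather than a description of `ipfweight`'s internals. `rake` and `tol` are hypothetical names, and every category is assumed to contain at least one respondent.

```python
def rake(data, targets, start=None, max_iter=25, tol=1e-6):
    """Iterative proportional fitting (raking) on 0/1 indicator variables.

    data    -- dict: variable name -> list of 0/1 values (same length)
    targets -- dict: variable name -> {0: share, 1: share}
    start   -- optional starting (e.g. design) weights; defaults to all 1s
    Returns (weights rescaled to mean 1, number of passes used).
    """
    n = len(next(iter(data.values())))
    w = [float(x) for x in start] if start is not None else [1.0] * n
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # One pass: post-stratify on each variable in turn
        for name, values in data.items():
            total = sum(w)
            share = {}
            for wi, v in zip(w, values):
                share[v] = share.get(v, 0.0) + wi / total
            w = [wi * targets[name][v] / share[v] for wi, v in zip(w, values)]
        # Stop early once every weighted margin is within tol of its target
        total = sum(w)
        worst = 0.0
        for name, values in data.items():
            for cat, target in targets[name].items():
                got = sum(wi for wi, v in zip(w, values) if v == cat) / total
                worst = max(worst, abs(got - target))
        if worst < tol:
            break
    # Rescale so the final weights have mean 1
    mean = sum(w) / n
    return [wi / mean for wi in w], iterations


# Hypothetical 8-person sample with design (starting) weights
female = [1, 1, 1, 1, 1, 0, 0, 0]
white = [1, 1, 1, 0, 1, 1, 1, 0]
design = [1.0, 1.2, 0.8, 1.5, 1.0, 0.9, 1.1, 1.0]
weights, used = rake(
    {"female": female, "white": white},
    {"female": {0: 0.47, 1: 0.53}, "white": {0: 0.12, 1: 0.88}},
    start=design,
)
share_female = sum(w for w, f in zip(weights, female) if f == 1) / sum(weights)
print(used, round(share_female, 4))  # passes used, and a share close to 0.53
```

With only two weakly related variables the loop converges in a few passes; with more variables, or strongly related ones, more passes are needed, which is why a larger `maxit()` can help.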
Once you execute this command, a new variable called `weight` (or whatever you named it) appears in your dataset. For the exit poll, the weight variable has a mean of 1 (it should always have a mean of 1), a standard deviation of 1.15, and ranges from .22 to 21.35. Pollsters often trim their weights so that no single respondent has too much influence over the point estimates. Only 4 respondents receive a weight above 8, so we might trim the weights at 8. To do this, add the `up()` option to the `ipfweight` command:

```stata
ipfweight female white black hispanic age18_29 age65_over, gen(weighttrimmed) val(47 53 12 88 96 4 95 5 93 7 70 30) maxit(25) up(8) st(sampleweight)
```
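To make the idea of trimming concrete, the Python sketch below caps finished weights at 8 while keeping their mean at 1: capped weights are set to 8 and the remaining weights are scaled up to absorb the excess, repeating if that pushes another weight over the cap. It also reports the summary figures quoted above (mean, standard deviation, minimum, maximum, count above the cap). This is a simple trim-after-raking sketch under stated assumptions (fewer than n/cap weights end up capped); it does not re-rake, so margins can drift from their targets, and it does not reproduce how `ipfweight` applies `up()` internally. `trim_weights` and `describe` are hypothetical names.

```python
import statistics


def trim_weights(weights, cap=8.0):
    """Cap weights at `cap` while keeping their mean at 1.
    Assumes cap * (number of capped weights) < number of weights."""
    n = len(weights)
    mean = sum(weights) / n
    w = [x / mean for x in weights]  # start from mean-1 weights
    capped = set()
    while True:
        new_capped = capped | {i for i, x in enumerate(w) if x > cap}
        if new_capped == capped:
            return w
        capped = new_capped
        # Total left over for the uncapped weights so the mean stays 1
        free_total = n - cap * len(capped)
        free_sum = sum(x for i, x in enumerate(w) if i not in capped)
        scale = free_total / free_sum
        w = [cap if i in capped else x * scale for i, x in enumerate(w)]


def describe(weights, cap=8.0):
    """Mean, sample SD, min, max, and how many weights exceed `cap`."""
    return {
        "mean": statistics.mean(weights),
        "sd": statistics.stdev(weights),
        "min": min(weights),
        "max": max(weights),
        "n_above_cap": sum(x > cap for x in weights),
    }


raw = [1.0] * 19 + [20.0]  # hypothetical: one respondent far above the rest
print(describe(trim_weights(raw, cap=8.0)))  # max is 8.0, mean 1 up to rounding
```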