How exactly does religion inform decision making? Is a causal interpretation possible? This week, I began programming in Stata with the Pew Research data.
On Tuesday, I had to do some programming in Stata to calculate religious density. To do so, I collapsed the data to get the number of people of each religion in each state, then merged those counts back onto the respondents. I had to review collapsing and merging datasets through some videos on YouTube. I made some errors at first and had to troubleshoot the merge so that each respondent was matched on both state and religious category. I generated the new variable, religious density, using the religious tradition variable in the data set.
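The collapse-and-merge logic can be sketched outside Stata as well. The following is a minimal Python illustration, not the Stata code used for this project. The respondents are made up, and treating density as the share of a state's respondents in the same tradition is an assumption; the post only says the counts were collapsed by state and religion.

```python
from collections import Counter

# Hypothetical respondents as (state, religious tradition). Not real Pew data.
respondents = [
    ("VA", "Evangelical"),
    ("VA", "Evangelical"),
    ("VA", "Catholic"),
    ("VA", "Unaffiliated"),
    ("MD", "Catholic"),
    ("MD", "Catholic"),
]

# "Collapse": count respondents per (state, tradition) and per state.
group_counts = Counter(respondents)
state_counts = Counter(state for state, _ in respondents)

# "Merge": attach each respondent's group share back onto their own row.
# Assumption: density = respondents in the same state and tradition,
# divided by all respondents in that state.
with_density = [
    (state, tradition, group_counts[(state, tradition)] / state_counts[state])
    for state, tradition in respondents
]

for row in with_density:
    print(row)
```

Matching on both keys, state and tradition, is the step that is easy to get wrong: matching on state alone would give every respondent in a state the same value.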
My variables of interest:
- Religious attendance
- Religious density by state
- Race
- Sex
- Age
- Income
- Education level
Some additional variables that I thought would be cool to look at:
- Importance of religion
- Official membership in a religious organization
- Feelings of spiritual peace
- Prayer groups
- Frequency of prayer
I also had to figure out how I wanted to interpret the attendance variable (ATTEND). Gruber interpreted attendance on a log scale, “since each unit is roughly twice the previous unit in time terms.” There are a couple of ways I can do this: I can use an ordered logistic regression, because this is survey data and the order of responses does matter, or I can do a regular regression. This just affects the way I choose to interpret my results.
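Gruber's point about the scale can be checked with a little arithmetic: if each category stands for roughly twice the attendance of the previous one, the categories are evenly spaced on a log scale. Here is a minimal Python sketch using made-up yearly counts (these are not the actual ATTEND categories):

```python
import math

# Hypothetical times-per-year values, one per response category,
# chosen so that each category is exactly twice the previous one.
times_per_year = [1, 2, 4, 8, 16, 32]

# On a log2 scale, doubling becomes a constant step of 1.
log_values = [math.log2(t) for t in times_per_year]
steps = [b - a for a, b in zip(log_values, log_values[1:])]

print(log_values)
print(steps)
```

Because the steps are equal, moving up one category can be read as roughly a doubling of attendance. In the survey coding the order runs the other way (higher codes mean less attendance), which flips the sign but not the even spacing.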
Professor Parman also suggested that I look at the data visually and see if I can start to see a relationship there. In the first graph, a histogram shows the frequency of attendance. The second image is a scatter plot looking at the relationship between attendance and religious density. (The higher the number for religious attendance, the less one attends religious services.)
One point to note: I forgot to remove the missing data before making these graphs. A response of ‘9’ means that the respondent gave no response or did not know how to answer.
Looking at these graphs, I started to think that maybe there isn’t really a relationship in this data. Attendance can vary quite widely within a state – if only I had data that could pinpoint respondents by local area! These results were going to be part of my conversation with Professor Parman.
Parman recommended that I use bin scatter plots to look at my results visually. A bin scatter groups the observations into bins along the x-axis and plots the average of each bin, with one data point per bin. This plot would show how the average relationship changes as density increases.
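To make the idea concrete, here is a minimal Python sketch of the binning step only (no plotting). The function name `bin_scatter` and the sample values are hypothetical, and splitting into equal-sized bins is an assumption; Stata's own tools may bin differently.

```python
from statistics import mean

def bin_scatter(xs, ys, n_bins):
    """Return one (mean x, mean y) point per equal-sized bin of x."""
    pairs = sorted(zip(xs, ys))      # order observations by x
    size = len(pairs) / n_bins       # observations per bin (may be fractional)
    points = []
    for i in range(n_bins):
        chunk = pairs[round(i * size):round((i + 1) * size)]
        if chunk:                    # skip bins that end up empty
            points.append((mean(x for x, _ in chunk),
                           mean(y for _, y in chunk)))
    return points

# Hypothetical values: religious density (x) and attendance code (y).
density = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
attend = [5, 4, 4, 3, 2, 2]

points = bin_scatter(density, attend, n_bins=3)
for x_mean, y_mean in points:
    print(x_mean, y_mean)
```

Each printed pair is one dot on the bin scatter, so six noisy observations become three averaged points that are easier to read for a trend.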
I also need to exclude the missing data when making a graph or running a regression. The missing data seem to be random, so it should not be a problem to exclude them.
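Excluding the missing responses amounts to filtering out the code ‘9’ before any graph or regression. A minimal Python sketch with made-up rows (the field names `attend` and `density` are hypothetical):

```python
MISSING_CODE = 9  # per the post: no response / did not know how to answer

# Hypothetical rows, not real survey data.
rows = [
    {"attend": 1, "density": 0.40},
    {"attend": 9, "density": 0.25},  # missing attendance, should be dropped
    {"attend": 4, "density": 0.10},
]

# Keep only rows with a substantive attendance response.
clean = [row for row in rows if row["attend"] != MISSING_CODE]

print(len(rows), "rows before,", len(clean), "rows after")
```

Left in place, the ‘9’ would be treated as the lowest level of attendance, since higher codes mean less attendance, and would distort both the histogram and any averages.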