Economics of Faith, Race, and Community: Programming (June 21-28)

How exactly does religion inform decision making? Is a causal interpretation possible? This week, I begin programming in Stata with the Pew Research data.

On Tuesday, I did some programming in Stata to calculate religious density. To do so, I collapsed the data to count the number of people of each religion in each state. I had to review collapsing and merging datasets through some videos on YouTube. I made some errors at first and had to troubleshoot the merge so that each respondent was matched by state and religious category. I then generated the new variable, religious density, using the religious tradition variable in the data set.
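
For anyone following along, here is a minimal sketch of what a collapse-and-merge step like this can look like. It is not my exact do-file. The file name pew_rls.dta and the variable names state and reltrad are placeholders, and I am assuming here that density means the unweighted share of a state's respondents who belong to the respondent's own religious tradition.

```stata
* Sketch only: "pew_rls.dta", state, and reltrad are assumed names.
use "pew_rls.dta", clear

* Work on a temporary copy so the respondent-level data can be restored.
preserve

* Count respondents in each state-by-tradition cell.
gen byte one = 1
collapse (sum) n_trad = one, by(state reltrad)

* Total respondents per state, then each tradition's share of that total.
bysort state: egen n_state = total(n_trad)
gen rel_density = n_trad / n_state

tempfile density
save `density'
restore

* Many respondents map to one state-by-tradition row, hence m:1.
merge m:1 state reltrad using `density', keepusing(rel_density)
tab _merge
drop _merge
```

Tabulating _merge before dropping it is a quick way to catch the kind of matching errors I ran into: every respondent should show up as matched.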

My variables of interest:

  • Religious attendance
  • Religious density by state
  • Race
  • Sex
  • Age
  • Income
  • Education level

Some additional variables that I thought would be cool to look at:

  • Importance of religion
  • Official membership in a religious organization
  • Feelings of spiritual peace
  • Prayer groups
  • Frequency of prayer

I also had to figure out how I wanted to interpret the attendance variable (ATTEND). Gruber interpreted attendance on a log scale, “since each unit is roughly twice the previous unit in time terms.” There are a couple of ways I can do this: I can use an ordered logistic regression, because this is survey data and the order of responses does matter, or I can run a regular linear regression. The choice mainly affects how I interpret my results.
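
To make the two options concrete, here is a rough sketch of both specifications. The variable names (attend, rel_density, race, sex, age, income, educ) are assumptions rather than the names in the Pew file, and I am assuming the categorical controls are stored as non-negative integer codes so that the i. prefix works.

```stata
* Sketch only: all variable names are assumed.
* Code 9 on attend is nonresponse, so it is excluded from both models.

* Option 1: ordered logit, which uses only the ordering of the categories.
ologit attend rel_density i.race i.sex age i.income i.educ if attend != 9

* Option 2: linear regression, which treats the category codes as a numeric scale.
regress attend rel_density i.race i.sex age i.income i.educ if attend != 9
```

Because higher values of attend mean less frequent attendance, a positive coefficient on rel_density in either model would mean that higher density goes with attending less often.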

Professor Parman also suggested that I look at the data visually to see if a relationship starts to show up there. The first image is a histogram of the frequency of attendance. The second image is a scatter plot of the relationship between attendance and religious density. (The higher the number for religious attendance, the less one attends religious services.)

[Image 1: histogram of attendance] [Image 2: scatter plot of attendance and religious density]

One point to note: I forgot to remove the missing data before making these graphs. The response ‘9’ means that the respondent gave no response or did not know how to answer.

Looking at these graphs, I started to think that maybe there isn’t really a relationship in this data. Attendance can vary quite widely within a state – if only I had data that could pinpoint smaller areas! These results were going to be part of my conversation with Professor Parman.

Parman recommended that I use bin scatter plots to look at my results visually. A bin scatter groups the observations into bins along the x variable (here, religious density) and plots the average of the y variable (attendance) within each bin, with one data point per bin. This would show how the average relationship changes as density increases.
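
A minimal sketch of how this could be done in Stata, assuming the community-contributed binscatter command (it is not part of base Stata and has to be installed from SSC) and my placeholder variable names attend and rel_density:

```stata
* binscatter is a user-written command; install it once from SSC.
ssc install binscatter

* Sketch only: attend and rel_density are assumed variable names.
* Bins rel_density into 20 equal-sized groups (an arbitrary choice here)
* and plots mean attend within each bin, skipping the nonresponse code 9.
binscatter attend rel_density if attend != 9, nquantiles(20)
```

The first variable is the y variable and the second is the x variable, so each point is the average attendance code for one bin of density.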

I also need to exclude the missing data when making a graph or running a regression. The missing data seems to be random, so excluding it should not be a problem.
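
One way to handle this, sketched under the assumption that the attendance variable is called attend and that 9 is its only nonresponse code (race is also a placeholder name), is to flag the nonresponses, glance at whether they cluster in any group, and then recode them to Stata's missing value:

```stata
* Sketch only: attend and race are assumed variable names.

* Flag nonresponse before recoding so it can still be inspected.
gen byte attend_miss = (attend == 9)
tabulate race attend_miss, row

* Recode 9 to Stata's system missing value (.).
mvdecode attend, mv(9)

* Graphs and estimation commands now skip these observations on their own.
histogram attend, discrete frequency
```

The tabulation is only a rough check on whether nonresponse looks similar across groups. It does not prove the data is missing at random.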