Bias in Machine Learning Through Natural Language: Research Summary

This was certainly the most complex and multifaceted project that I’ve ever worked on. I went from my initial foray into Keras and machine learning, watching tutorials and reading articles, to coding and using a model, to reading about how human bias plays into big data and algorithms, to finally being able to read and (mostly) understand a research paper about natural language generation. This project has taught me so much about myself as a researcher and student, along with a number of things about machine learning and computer science as a whole. Here are just a few of the things I learned this summer, in no particular order:

  • how to create a text generation model in Keras
  • how to create a sentiment analysis model in Keras
  • how to adjust the parameters of various layers and improve the architecture of a model in Keras
  • what machine learning is and some of its different forms
  • the basic process for how a neural network operates
  • how to preprocess text for a neural network to read
  • what dense, LSTM, and CNN layers are
  • what dropout and word embeddings are
  • how to create a virtual environment
  • how to highlight and manipulate text in Emacs
  • how to run TensorBoard metrics remotely (i.e., measure metrics for a model running on another computer but display them on my computer)
  • what a transformer is and how it can function like an LSTM network
  • what a GAN is and how it can improve two models at once
  • how to write basic HTML code
  • how human biases can impact algorithms
  • what sort of algorithms can become dangerous and why
  • how overfitting can affect a model
  • what the Algorithmic Justice League is and how advocacy can help counter bias in machine learning
  • how to use dimensionality reduction through t-SNE to visualize high-dimensional data (like word embeddings) in a 2D or 3D space
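
To make one of these items concrete, here is a minimal sketch of what preprocessing text for a neural network can look like. This is not the code from my project: it is plain Python written for illustration, and the helper names (`tokenize`, `build_vocab`, `encode`), the reserved ids, and the sample sentences are all assumptions made up for this example. The idea is simply to split text into words, map each word to an integer, and pad every sequence to the same length so a model can read them in batches.

```python
from collections import Counter

# Reserved ids (an assumption for this sketch): 0 pads short sequences,
# 1 stands in for any word that is not in the vocabulary.
PAD, UNK = 0, 1


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [w for w in words if w]


def build_vocab(texts, max_words=None):
    """Map each word to an integer id, most frequent words first."""
    counts = Counter(w for t in texts for w in tokenize(t))
    # Sort by descending count, then alphabetically, so ids are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_words is not None:
        ordered = ordered[:max_words]
    # Ids start at 2 because 0 and 1 are reserved for PAD and UNK.
    return {word: i + 2 for i, (word, _) in enumerate(ordered)}


def encode(text, vocab, max_len):
    """Turn text into a fixed-length list of ids (truncated or right-padded)."""
    ids = [vocab.get(w, UNK) for w in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


texts = ["The movie was great!", "The movie was not great."]
vocab = build_vocab(texts)
encoded = [encode(t, vocab, max_len=6) for t in texts]
print(encoded)
```

In practice a library would usually handle this step, but writing it out by hand makes it easier to see what the embedding layer actually receives: rows of integers, all the same length.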

After all has been said and done, I feel pretty happy with how my project turned out. I’ve got somewhat of a deliverable with my two models and their corresponding programs, along with a pretty in-depth understanding and basic write-up of the reading that I did. I’m also proud of my blog, which has chronicled my journey and is super fun to look back on to see how much I’ve learned over the summer. I’ve also got a lot of awesome new knowledge, which I can’t wait to put to further use.

The main thing I wish I could’ve gotten around to is figuring out ways to counter bias in my models, which is technically what I set out to do. I severely underestimated the complexity of Keras and didn’t realize that just learning the code wouldn’t be enough for me: I had to understand what purpose each line served so that I could make real adjustments (as opposed to randomly changing variables). This required a pretty thorough knowledge of at least the basics of machine learning, which took up way more time than expected. To begin countering biases in complex algorithms, one must first understand the algorithms and how they’re made, then figure out where the biases lie (and what can even be considered a bias), and then finally work out how to balance them out or remove them entirely. I feel like I’ve made significant progress on the first and second steps (maybe less so on the first), but I could use a lot more time to fully understand the first step and especially the third. This is the kind of work and research that takes years to understand, so I’m happy that I know far more than I did when I started.

I’m obviously infatuated with this area of computer science and would love to continue researching it. I plan on taking an intro to machine learning course this fall semester, which will cover some of the math concepts that my knowledge base is missing and hopefully improve my understanding of the subject so I can conduct further and more extensive research in the future.

In the meantime, I will be doing my part to educate others on machine learning, the injustices of big data, and the ways human bias can play into our machines, which in turn will play into our lives. I will also be advocating for more data transparency from big companies, as a lack of transparency is one of the key components of these “Weapons of Math Destruction.” I’ve had an incredible learning experience, and I’m looking forward to continuing it for a long while.

Read my blog here, and check out the code for my project on my GitHub.