Incidental Language Acquisition Update 7

My last update summarized the summer portion of my research pretty well.

After finishing the novels, my main question was how to format them and in what form to distribute them to participants. I expect to distribute them as physical books, to make it less likely that participants look up translations or “skim” through them just to finish faster. The vocabulary has also been selected, and the test will be administered online, though I plan to be present while participants take it, to make sure there is no “cheating.”

In formatting my book, I learned a lot about copyright and the public domain. I do hold rights to my adaptation, since I altered the text substantially, but the only reason I can rework another author’s material this way at all is that it is in the public domain – Lewis Carroll died well over 70 years ago, so his work is no longer under copyright. I also learned that self-publishing is remarkably easy. Anyone can self-publish, but a writer who wants to reach a wide audience generally still needs a major publishing company. Perhaps after the experiment is run, I can take this to a publisher and ask about the official publishing process.

This led me to wonder how viable this kind of work would be on a larger scale – applied to multiple books, perhaps as a series. Hypothetically, how many novels could I translate in this way, if just one took me an entire summer? Obviously, parts of the process would need to be more fully automated. I spent a large portion of the summer figuring out how to write programs that find and isolate certain words and parts of speech, and now that I have that code, doing this again would take less time. I also spent a long time correcting mistakes I found in the free PDF I used as my source text; I would need more advanced programming skills to build something that could detect and correct those mistakes automatically.
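For what it’s worth, a first pass at that kind of error detection might not need anything too advanced. Purely as a hedged sketch (it assumes Python with NLTK installed, and the file name is a placeholder), a script could at least flag suspicious tokens for a human to review, even if it couldn’t correct them:

```python
# Sketch: flag likely transcription errors in a plain-text export of the PDF
# by checking each token against an English word list.
# Assumes: pip install nltk  (the file name below is a placeholder)
import re

import nltk

nltk.download("words")
from nltk.corpus import words

english_words = {w.lower() for w in words.words()}

with open("novel_pdf_export.txt", encoding="utf-8") as f:
    text = f.read()

for line_no, line in enumerate(text.splitlines(), start=1):
    for token in re.findall(r"[A-Za-z]+", line):
        if token.lower() not in english_words:
            # Proper names and archaic spellings get flagged too, so this
            # narrows the proofreading work rather than replacing it.
            print(f"line {line_no}: suspicious token {token!r}")
```

Proper names and old-fashioned spellings would get flagged along with real errors, so a pass like this would shorten the proofreading rather than eliminate it.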

So with fine-tuning and more advanced coding, part-of-speech tagging and frequency analysis could be done pretty easily.
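To make that concrete, here is a rough sketch of what I mean (not the exact code I wrote; it assumes Python with NLTK and its standard tokenizer and tagger data, and “novel.txt” is a placeholder file name):

```python
# Sketch: part-of-speech tagging and word-frequency analysis with NLTK.
# Assumes: pip install nltk, plus the tokenizer and tagger data downloaded below.
from collections import Counter

import nltk

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

with open("novel.txt", encoding="utf-8") as f:  # placeholder file name
    text = f.read()

tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)  # pairs like ("house", "NN")

# How often does each noun appear in the novel?
noun_counts = Counter(
    word.lower() for word, tag in tagged if tag.startswith("NN") and word.isalpha()
)

for word, count in noun_counts.most_common(20):
    print(f"{word}\t{count}")
```

A couple of dozen lines like these cover the mechanical part of selecting frequent candidate words; the judgment calls come afterward.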

The most time-intensive part of the research was figuring out which words could convey meaning through context alone. I don’t think this could be fully computerized; it would need to be done by a human researcher, at least to some extent. There are so many different ways to make a word clear through context – idiomatic language, illustrations, use of the word in a set phrase, the word’s native-language equivalent appearing nearby – that a computer program would need to be incredibly complex to detect them all.
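That said, one narrow slice of it might be automatable. Purely as a hypothetical sketch (the glossary, window size, and example sentence below are all invented for illustration), a script could flag the easiest cue – the native-language equivalent appearing nearby – while leaving idioms, illustrations, and set phrases to a human:

```python
# Hypothetical sketch: detect ONE kind of context cue, a Spanish word whose
# English equivalent appears nearby. Idioms, illustrations, and set phrases
# would still need a human judge.
import re

# Invented mini-glossary, purely for illustration.
GLOSSARY = {"casa": "house", "jardín": "garden", "reloj": "watch"}

WINDOW = 8  # how many surrounding tokens to search (an arbitrary choice)

def nearby_gloss(sentence: str) -> list[tuple[str, str]]:
    """Return (spanish, english) pairs found within WINDOW tokens of each other."""
    tokens = [t.lower() for t in re.findall(r"\w+", sentence)]
    hits = []
    for i, token in enumerate(tokens):
        if token in GLOSSARY:
            window = tokens[max(0, i - WINDOW): i + WINDOW + 1]
            if GLOSSARY[token] in window:
                hits.append((token, GLOSSARY[token]))
    return hits

print(nearby_gloss("She hurried through the garden, el jardín, toward the house."))
# Expected output: [('jardín', 'garden')]
```

Even if something like this caught the nearby-translation cases, everything subtler would still land back on the researcher’s desk.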

For me, it was also incredibly time-consuming to translate the selected words into Spanish, because I am not fluent in Spanish and spent a lot of time consulting grammar guides. The bulk of this translation work could probably be done by a computer program and then edited by a native speaker.
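If someone did try that, one possible route (a sketch only; it assumes the Hugging Face transformers library and the publicly available Helsinki-NLP/opus-mt-en-es model, which I have not actually used in this project) would be to machine-translate the selected word list into a draft for a native speaker to correct:

```python
# Sketch: machine-translate the selected vocabulary into Spanish as a first
# draft for a native speaker to review and correct.
# Assumes: pip install transformers sentencepiece torch
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

selected_words = ["garden", "house", "watch", "late"]  # placeholder word list

for word in selected_words:
    spanish = translator(word)[0]["translation_text"]
    print(f"{word}\t{spanish}\t<-- needs native-speaker review")
```

Single words translated out of context can come out oddly, which is exactly why the native-speaker editing pass would stay in the loop.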

Finally, I spent a lot more time than I’d care to admit formatting and proofreading the novels. I think this is an inevitable part of the publishing process – whether one is publishing a traditional book or an experimental one like this. It’s a time suck, but a necessary one.

So – is this project replicable? Probably. Is it viable to do again? Probably. Is it useful? That remains to be seen!