I intend to use this blog as a place to discuss my work (or fun) in progress. I will sometimes digress into observations about language as well — I am a recidivist linguist after all.

It's not you

The other day I was at the meat counter in the grocery store, and I was curious about something I hadn't seen before, so I asked the butcher, "How do you cook that?" She replied, "How do I cook it?"

Not just men and walruses

In doing some research for the Coming into Focus project I came across this sentence in a newspaper article:

Miss Sue Dorris and Dr Alice [sic] M. Smith offer the finest proof of the feasibility of two bachelor women [emphasis added — CuC] successfully maintaining a home for themselves together and without friction in the matter of building the morning fires.
Eugene, Oregon Morning Register, March 2, 1915, p. 4

Word embeddings with small corpora

One of the things I'm interested in is how techniques that work in one context might work in other contexts, and what we can learn about those techniques when we go beyond their typical applications.

Word embeddings, aka word vectors, are typically used with large corpora, such as Wikipedia or Common Crawl web pages or massive numbers of tweets. One paper said something to the effect of "As long as your corpora have 100 million words, this technique will work."

But what if your corpus doesn't have 100 million words? What if you are interested in how an author uses words in just one book?
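To make the small-corpus question concrete, here is a minimal sketch of the simplest kind of word vector: raw co-occurrence counts within a fixed window, compared by cosine similarity. This is an illustration only, not the method explored on this blog and not what trained models such as word2vec do. The toy text, the window size of 2, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-in for "just one book" (hypothetical text, whitespace-tokenized).
text = "the cat sat on the mat . the dog sat on the rug . the cat chased the dog ."
tokens = text.split()
vectors = cooccurrence_vectors(tokens, window=2)

print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["on"]))
```

With a corpus this small, every count is tiny and a single extra sentence can shift the similarities noticeably, which is exactly the instability that makes the small-corpus case interesting.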

