As the new data guy at Tumblr, my first project is to take a look at the algorithms we use to find and suggest blogs that a given user might be interested in. This graph is a simple visual sample of my initial research.
Another engineer graciously volunteered to let me peek at the list of blogs he follows, and from that list I gathered all the blogs that those blogs follow in turn. From those two lists, I was able to create a large matrix with a row for each blog and a column for each blog that it follows. Using a fairly simple SVD recommender, we can see a few distinct blog clusters (the axes here are the first three principal components).
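For the curious, here is a minimal sketch of that matrix-plus-SVD step. It is not the code behind the graph: the blog names are made up, the helper names `follow_matrix` and `project` are hypothetical, and centering the columns before the SVD is my assumption about how to get principal components out of a follow matrix.

```python
import numpy as np


def follow_matrix(follows):
    """Build a binary matrix: one row per blog, one column per followed blog."""
    rows = sorted(follows)
    cols = sorted({b for followed in follows.values() for b in followed})
    col_index = {name: j for j, name in enumerate(cols)}
    m = np.zeros((len(rows), len(cols)))
    for i, blog in enumerate(rows):
        for followed in follows[blog]:
            m[i, col_index[followed]] = 1.0
    return rows, cols, m


def project(m, k=3):
    """Project each row onto the first k principal components via SVD."""
    centered = m - m.mean(axis=0)  # assumption: center columns first
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(k, len(s))
    return u[:, :k] * s[:k]  # row coordinates along the top k components


# Toy, made-up data: blog -> set of blogs it follows.
follows = {
    "staff_a": {"staff", "david"},
    "staff_b": {"staff", "david"},
    "staff_c": {"staff", "david", "funny_1"},
    "humor_a": {"funny_1", "funny_2", "funny_3"},
    "humor_b": {"funny_1", "funny_2", "funny_3"},
    "humor_c": {"funny_2", "funny_3"},
}

rows, cols, m = follow_matrix(follows)
coords = project(m, k=3)  # one 3-D point per blog, ready to scatter-plot
```

Blogs with similar follow lists land near each other in `coords`, which is what produces the visible clusters. A real follow matrix would be far larger and sparser, so a truncated sparse SVD would likely be a better fit than the dense `np.linalg.svd` used here.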
The red dots are the blogs our guinea pig engineer follows (first degree), and the blue dots are the blogs that those blogs follow (second degree). We performed a few spot tests to make sure that the groups made sense, and sure enough they do. Up in the top left are some Tumblr staff blogs (including the official Staff Blog and David’s Log). The cluster on the far right, meanwhile, is made up of a lot of “funny things I found on the internet”-style blogs. This engineer only follows one blog in the heart of that cloud, but you can see that the other followers of that blog are very cliquey (that is, they all follow each other).

