Tumblr Summer Intern: Walter Menendez
This summer I got to join Tumblr as a search team intern. This was a dream come true: I spend a lot of my day on Tumblr sharing content with others and generally indulging in the specific Tumblr lingo and lifestyle, so to do it all day was just too unreal. (Fun fact: I got hired through Tumblr, thanks to a post on this very same blog!) I didn’t expect to like New York as much as I did and I’m so heartbroken that I have to leave Tumblr and go back to school.
The search team is a really cool team to work on because we do a lot of critical, quantitative thinking about the highly subjective, qualitative content on Tumblr. My projects required a thorough understanding of Tumblr’s users and their idiosyncrasies for our data to make any sense. It was cool to just go into our databases, pull some data and do all kinds of things with it. It definitely opened up my view of Tumblr as a community and, well, I also found a ton of stuff to follow/reblog while doing so.
I go to MIT, where a lot of my research background is in data visualization and large data sets, so one of my first projects was to work on a data visualization for our trending post stream. The search team had been working on trending content for quite some time, focusing on the three core forms of content on the site: blogs, tags, and posts. Blogs and tags had a decent amount of front-end work, but trending posts not so much. We basically had a list of post URLs that were deemed trending based on our metrics, but we had no idea what they looked like, nor did we really have any intuition about the extent to which they were trending. To fix that, I built a D3.js-based visualization that grabbed posts from our PHP framework. There was a fair amount of tech involved! From processing JSON to animating images, the visualization took a fair bit of work. At the moment, it’s more of an internal thing, but hopefully users all over Tumblr will be able to see it as well.
My second project was much more data intensive and focused on search traffic analysis. Tumblr had rolled out trending tags to mobile during my time here, which was a great way to discover more content on Tumblr. However, those metrics and algorithms were based only on post creation. Since more people consume content than create it, there was a lot of interesting data lying in tags and their search counts. I started scraping data from our page logs and effectively did a high-level count over hundreds of thousands of tags. Afterwards, I would save the data into Redis, compute some statistics and ultimately rank the tags based on how strongly they were trending.
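To give a rough idea of the shape of this kind of analysis, here is a minimal Python sketch. It is not the actual Tumblr code: the tab-separated log format, the function names, and the scoring formula are all assumptions made for illustration, and plain in-memory counters stand in for Redis.

```python
import math
from collections import Counter


def count_tag_searches(log_lines):
    """Count searches per tag.

    Assumes a hypothetical log format of "timestamp<TAB>tag" per line;
    real page logs would need their own parsing.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        tag = parts[1].strip().lower()
        if tag:
            counts[tag] += 1
    return counts


def trending_score(today, baseline, smoothing=1.0):
    """Illustrative score: growth over the baseline, weighted by volume.

    The smoothing term keeps brand-new tags (baseline of 0) from
    dividing by zero. This formula is an assumption, not a known metric.
    """
    growth = (today + smoothing) / (baseline + smoothing)
    return math.log(growth) * math.log1p(today)


def rank_trending(today_counts, baseline_counts, top_n=10):
    """Return the top_n (tag, score) pairs, highest score first."""
    scored = [
        (tag, trending_score(count, baseline_counts.get(tag, 0)))
        for tag, count in today_counts.items()
    ]
    # Sort by descending score, breaking ties alphabetically by tag.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    yesterday = ["t\tcats"] * 100 + ["t\tnews"] * 2
    today = ["t\tcats"] * 110 + ["t\tnews"] * 500
    ranking = rank_trending(count_tag_searches(today), count_tag_searches(yesterday))
    for tag, score in ranking:
        print(f"{tag}\t{score:.2f}")
```

Comparing against a baseline rather than ranking by raw counts means a tag that is always popular does not outrank one that jumps from a handful of searches to hundreds.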
This analysis was a really cool project because every morning, as I ran my scripts to collect and process the tag data, I got to see current events suddenly jump from single-digit counts to thousands of hits. It was also a great way to stay up to date with news, as I would often Google a tag when I had no idea what it referred to. Another cool part was comparing my ranking to our current rankings in production, and seeing where we aligned and where we differed.
My time at Tumblr definitely wouldn’t have been possible without the staff here. They’re all so knowledgeable and approachable and hilarious, so while my roommates would groan about going to “work”, I would skip on my merry way to a blogger’s haven. Plus, we have dogs. Who wouldn’t rush to the office to see those?