<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>Dispatches from the intrepid tinkerers behind technology at Tumblr.</description><title>Tumblr Engineering</title><generator>Tumblr (3.0; @engineering)</generator><link>http://engineering.tumblr.com/</link><item><title>Tumblr SDK updates</title><description>&lt;p&gt;Alongside yesterday’s &lt;a href="http://staff.tumblr.com/post/48773060295/now-you-can-do-more-than-just-reblog-when-you-find"&gt;Tumblr for iOS 3.3.1&lt;/a&gt; release, I’m happy to also announce a couple of enhancements to the &lt;a href="https://github.com/tumblr/TMTumblrSDK"&gt;Tumblr iOS SDK&lt;/a&gt; (version 1.0.2 is now available in the CocoaPods repository).&lt;/p&gt;

&lt;p&gt;Tumblr for iOS now exposes URL schemes that can be invoked in order to create text, quote, link, and chat posts. Additionally, developers can specify callback URLs that will send the user back to their apps once the Tumblr post is either created or cancelled. These endpoints are now wrapped and exposed by the &lt;a href="https://github.com/tumblr/TMTumblrSDK/blob/master/TMTumblrSDK/AppClient/TMTumblrAppClient.h"&gt;TMTumblrAppClient&lt;/a&gt; class, or the URLs can be accessed directly if you’d prefer (they’re documented &lt;a href="https://github.com/tumblr/TMTumblrSDK#url-schemes"&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Lastly, Tumblr for iOS has been able to create photo and video posts using media shared from other apps for some time now, via Apple’s standard UIDocumentInteractionController. As of 3.3.1, third-party applications can now also specify captions and tags to accompany the photos and videos being shared. The annotations used to do so are documented &lt;a href="https://github.com/tumblr/TMTumblrSDK#uidocumentinteractioncontroller"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you’re using the Tumblr iOS SDK in your app or plan to integrate going forward, please &lt;a href="mailto:bryan@tumblr.com"&gt;get in touch&lt;/a&gt; and let me know if there’s anything I can do to make this even easier.&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/48855322178</link><guid>http://engineering.tumblr.com/post/48855322178</guid><pubDate>Thu, 25 Apr 2013 10:57:39 -0400</pubDate><dc:creator>bryan</dc:creator></item><item><title>My Philosophy On Alerting</title><description>&lt;p&gt;&lt;a class="tumblr_blog" href="http://robewaschuk.tumblr.com/post/48822960728/my-philosophy-on-alerting"&gt;robewaschuk&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I wrote some stuff while I was at Google about writing clean alerts and keeping an oncall rotation sane; after some cleanup they’ve allowed me to make it public.  Of course, this represents my opinions and not Google’s.  They do reflect what we think are best practices at Tumblr, though.  &lt;a href="http://www.jobscore.com/jobs2/tumblr/site-reliability-engineer/a2rv-gIcmr4QdkiGakhP3Q?ref=rss&amp;amp;sid=68"&gt;We&amp;#8217;re hiring Site Reliability Engineers.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Check out &lt;a href="https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit#heading=h.fs3knmjt7fjy"&gt;My Philosophy On Alerting&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When you are auditing or writing alerting rules, consider these things to keep your oncall rotation happier:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Pages should be urgent, important, actionable, and real.&lt;/li&gt;
&lt;li&gt;They should represent either ongoing or imminent problems with your service.&lt;/li&gt;
&lt;li&gt;Err on the side of removing noisy alerts.&lt;/li&gt;
&lt;li&gt;You should almost always be able to classify the problem into one of
&lt;ul&gt;&lt;li&gt;availability &amp;amp; basic functionality&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;correctness (completeness, freshness and durability of data)&lt;/li&gt;
&lt;li&gt;and feature-specific problems.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Symptoms are a better way to capture more problems more comprehensively and robustly with less effort.&lt;/li&gt;
&lt;li&gt;&lt;span&gt;The further up your serving stack you go, the more problems you catch in a single rule. Balance this with being able to distinguish what’s going on.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/blockquote&gt;</description><link>http://engineering.tumblr.com/post/48852408813</link><guid>http://engineering.tumblr.com/post/48852408813</guid><pubDate>Thu, 25 Apr 2013 09:52:10 -0400</pubDate><category>sre</category><category>oncall</category><category>tumblr</category><category>monitoring</category><category>notdevops</category><category>engineering</category><dc:creator>robewaschuk</dc:creator></item><item><title>TMCache: fast object caching for iOS &amp; OS X</title><description>&lt;p&gt;A simple &lt;a href="http://en.wikipedia.org/wiki/File:Write_through_with_no-write_allocation.png"&gt;write-through cache&lt;/a&gt; can be a powerful optimization for any system needing fast, frequent access to the same resource (i.e., almost everything ever). It&amp;#8217;s especially true for a limited memory environment like iOS, which is also partially bound by processor and bandwidth constraints. Something as common as downloading and displaying an image touches all these pain points, and that makes it a great place to explore the potential of caching, especially if it&amp;#8217;s something your app does as often as the &lt;a href="http://www.tumblr.com/mobile"&gt;Tumblr mobile apps&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Cocoa itself does a fair bit of caching on its own without intervention, but it has many limitations. In the case of downloading an image using NSURLConnection, the entire server response is stored in a shared NSURLCache, but the inner details are &lt;a href="http://petersteinberger.com/blog/2012/nsurlcache-uses-a-disk-cache-as-of-ios5/"&gt;infamously murky&lt;/a&gt; and entirely dependent on HTTP headers from the server. After you&amp;#8217;ve loaded the image into a UIImageView it&amp;#8217;s one memory warning away from being lost, or even sooner if it gets slated for recycling from a table cell scrolling off screen.&lt;/p&gt;

&lt;p&gt;Today we&amp;#8217;re proud to announce the open sourcing of &lt;strong&gt;&lt;a href="https://github.com/tumblr/TMCache"&gt;TMCache&lt;/a&gt;&lt;/strong&gt;, which is designed for this and any other situation where there&amp;#8217;s a need to persist &amp;#8220;expensive&amp;#8221; objects and access them rapidly. TMCache is an object cache for iOS and OS X, suitable for any object conforming to the &lt;a href="https://developer.apple.com/library/ios/#documentation/Cocoa/Reference/Foundation/Protocols/NSCoding_Protocol/Reference/Reference.html"&gt;NSCoding&lt;/a&gt; protocol (including the basic Foundation data types, collections, and many UIKit objects like UIImage). It consists of two parallel caches, one in memory and one on disk, both coordinated locklessly by GCD. Like &lt;a href="http://nshipster.com/nscache/"&gt;NSCache&lt;/a&gt;, TMCache will automatically remove objects from memory when the app receives a warning or goes into the background. Unlike NSCache, TMCache can do things like transparently restore objects from disk, access objects asynchronously using blocks, and automatically limit the size of the cache by age or size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/tumblr/TMCache"&gt;TMCache&lt;/a&gt;&lt;/strong&gt; is available under the Apache 2.0 license on GitHub or as a &lt;a href="http://cocoapods.org/?q=name%3ATMCache"&gt;CocoaPod&lt;/a&gt;, including full documentation in HTML and docset format. Let us know if you find it useful!&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/48782691160</link><guid>http://engineering.tumblr.com/post/48782691160</guid><pubDate>Wed, 24 Apr 2013 13:06:10 -0400</pubDate><dc:creator>jstn</dc:creator></item><item><title>Open Source - Memcache Top</title><description>&lt;p&gt;&lt;a href="http://tumblr.mobocracy.net/post/48669341489/open-sourcing-memkeys" class="tumblr_blog"&gt;mobocracy&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;We rely on memcache pretty heavily at Tumblr, with over 10TB of cache memory available across the stack. One of the things we’ve historically had a challenging time with at Tumblr is finding hot keys. A hot key is a memcache key getting dramatically more activity than other keys. This can have a significant performance impact on your cache backend.&lt;/p&gt;
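&lt;p&gt;To make the idea of a hot key concrete, here is a minimal sketch that counts how often each key shows up in a sample of requests and reports the busiest ones. It only illustrates the concept and is not how memkeys works internally; the function name &lt;code&gt;hottestKeys&lt;/code&gt; and the sample keys are made up.&lt;/p&gt;

```javascript
// Illustrative only: tally key activity from a sample of requested keys.
// hottestKeys and the sample data are hypothetical, not part of memkeys.
function hottestKeys(requestedKeys, limit) {
  const counts = new Map();
  for (const key of requestedKeys) {
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Sort by descending count and keep the top `limit` entries.
  return Array.from(counts.entries())
    .sort(function (a, b) { return b[1] - a[1]; })
    .slice(0, limit);
}

const sample = ["user:1", "post:9", "user:1", "user:1", "post:9", "blog:4"];
console.log(hottestKeys(sample, 2));
```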

&lt;p&gt;We spent the past few days working on a C++ implementation of mctop*, which we’re happy to release today as &lt;a href="https://github.com/tumblr/memkeys"&gt;memkeys&lt;/a&gt;. We do some pretty interesting stuff in memkeys to keep from dropping packets, some of which is documented &lt;a href="https://github.com/bmatheny/memkeys/wiki"&gt;in the wiki&lt;/a&gt;. I’m particularly proud of the striped lock-free queue implementation. In some basic benchmarks I found that memkeys dropped less than 2% of packets when seeing 1Gb/s of traffic. Additionally, the latency between a packet being picked up, parsed, processed, and reported on averages less than 1ms. Here is a screenshot of memkeys in action.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/de4d1a949867ad3a94e33f5adf70736c/tumblr_inline_mlpw2l4hyv1qz4rgp.png" alt="Screenshot"/&gt;&lt;/p&gt;

&lt;p&gt;Interested in stuff like this? We’re &lt;a href="http://www.tumblr.com/jobs"&gt;hiring&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Footnote: Etsy created the excellent mctop tool which aims to be like unix top for memcache, showing you which keys are getting the most activity. Unfortunately (as noted in the known issues), mctop drops packets. It drops a lot of packets. This can be really problematic because depending on the packets being dropped, you’re getting a really incomplete view of your cache story.&lt;/p&gt;&lt;/blockquote&gt;</description><link>http://engineering.tumblr.com/post/48701285213</link><guid>http://engineering.tumblr.com/post/48701285213</guid><pubDate>Tue, 23 Apr 2013 12:53:07 -0400</pubDate><category>memcached</category><category>opensource</category><category>software</category><dc:creator>mobocracy</dc:creator></item><item><title>Tumblr Engineering @ Percona Live MySQL Conference</title><description>&lt;p&gt;We&amp;#8217;re pleased to announce that Tumblr&amp;#8217;s Database Engineering team will be attending the &lt;a href="http://www.percona.com/live/mysql-conference-2013/" target="_blank"&gt;Percona Live MySQL Conference&lt;/a&gt; next week in Santa Clara, CA!&lt;/p&gt;
&lt;p&gt;We&amp;#8217;ll be giving a &lt;a href="http://www.percona.com/live/mysql-conference-2013/sessions/introduction-sharding-jetpants" target="_blank"&gt;talk&lt;/a&gt; on our open source automation software, Jetpants, which has helped us scale to over 175 billion distinct rows of relational data to date. We&amp;#8217;re also looking forward to attending a number of amazing sessions from our friends at Percona, Facebook, Oracle, Palomino, Etsy, and more.&lt;/p&gt;
&lt;p&gt;If you haven&amp;#8217;t registered yet, use code SpeakMySQL to save 15%. Hope to see you there!&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/48304490499</link><guid>http://engineering.tumblr.com/post/48304490499</guid><pubDate>Thu, 18 Apr 2013 17:39:00 -0400</pubDate><category>MySQL</category><category>databases</category><dc:creator>evan</dc:creator></item><item><title>Last week, two of our engineers, Wolf and JC, got to stay up all...</title><description>&lt;img src="http://25.media.tumblr.com/020f628ec8717d534284461858a5028c/tumblr_ml3qoodaVA1qjk2rvo1_500.gif"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;span&gt;Last week, two of our engineers,&lt;/span&gt;&lt;a href="http://mt.tumblr.com/"&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;Wolf&lt;/span&gt;&lt;/a&gt;&lt;span&gt; and &lt;/span&gt;&lt;a href="http://seejohnrun.tumblr.com/"&gt;&lt;span&gt;JC&lt;/span&gt;&lt;/a&gt;&lt;span&gt;, got to stay up all night eating pizza with some really talented young hackers at two different hackathons —&lt;/span&gt;&lt;a href="https://www.hackerleague.org/hackathons/spring-2013-hackny-student-hackathon"&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;2013 hackNY Student Hackathon&lt;/span&gt;&lt;/a&gt;&lt;span&gt; at Columbia University and&lt;/span&gt;&lt;a href="http://www.photohackday.org/"&gt;&lt;span&gt; &lt;/span&gt;&lt;span&gt;Photo Hack Day&lt;/span&gt;&lt;/a&gt;&lt;span&gt; at Facebook HQ.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;After a long day of hacking, pizza, sleep deprivation, more pizza, and some DIY tacos, participants demoed their hacks, including a few for the Tumblr API. We were absolutely blown away by the creativity and quality of the projects and look forward to seeing what you’ll build at the next hackathon!&lt;/span&gt;&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/47711938394</link><guid>http://engineering.tumblr.com/post/47711938394</guid><pubDate>Thu, 11 Apr 2013 13:46:00 -0400</pubDate><dc:creator>codingjester</dc:creator></item><item><title>The spring 2013 hackNY Student Hackathon is this weekend at...</title><description>&lt;img src="http://24.media.tumblr.com/7d87a022c3976043b6deacdf6846fba0/tumblr_mkp3asLJ9T1qjk2rvo1_500.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;span&gt;The spring 2013 hackNY Student Hackathon is this weekend at Columbia University! For 24 hours, students will collaborate on creative coding challenges, and then present their hacks to a panel of judges. Prizes and bragging rights will be awarded to the winners.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;One of our engineers, &lt;/span&gt;&lt;a href="http://mt.tumblr.com/" target="_blank"&gt;The Wolf&lt;/a&gt;&lt;span&gt;, will be demoing the Tumblr API, helping students build awesome stuff, and eating leftover pizza.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Details&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;When: Saturday, 4/6 at 2pm – Sunday, 4/7 at 2pm &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Where: Fu Foundation School for Engineering and Applied Science, Columbia University&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;How: &lt;/span&gt;&lt;a href="https://www.hackerleague.org/hackathons/spring-2013-hackny-student-hackathon" target="_blank"&gt;&lt;span&gt;Sign up here&lt;/span&gt;&lt;/a&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;If all-night hacking isn’t your thing, be sure to check out all the student hacks at DemoFest (&lt;/span&gt;&lt;a href="http://hacknys2013.eventbrite.com/" target="_blank"&gt;tickets are free&lt;/a&gt;&lt;span&gt;!) this Sunday from 12-2 pm.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span&gt;See you soon!&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/47043099482</link><guid>http://engineering.tumblr.com/post/47043099482</guid><pubDate>Wed, 03 Apr 2013 15:56:50 -0400</pubDate><dc:creator>codingjester</dc:creator></item><item><title>Core Data as a cache</title><description>&lt;p&gt;Core Data is great but automatic migrations can be tricky. Migrations can take a long time, which could
result in &lt;a href="http://stackoverflow.com/questions/13333289/core-data-timeout-adding-persistent-store-on-application-launch"&gt;your app being terminated&lt;/a&gt; 
if it is happening on the main thread during application launch. Performing migrations on a background 
thread is also a &lt;a href="http://stackoverflow.com/a/2866725/503916"&gt;bad idea&lt;/a&gt;, meaning your application really
needs to be able to fully launch &lt;em&gt;without a Core Data stack whatsoever&lt;/em&gt; in order to safely migrate. 
This can be a huge change to make to an existing app.&lt;/p&gt;

&lt;p&gt;If you’re really only using Core Data as a cache, you don’t actually &lt;em&gt;need&lt;/em&gt; to perform a migration. 
Simply check if the existing store is compatible with your managed object model and if not, delete 
and recreate it. Here’s my approach for doing so.&lt;/p&gt;

&lt;script src="https://gist.github.com/irace/5216137.js" type="text/javascript"&gt;&lt;/script&gt;&lt;p&gt;Many thanks to Marcus Zarra for being &lt;em&gt;extremely&lt;/em&gt; 
helpful both over Twitter and on Stack Overflow. You should totally buy &lt;a href="http://pragprog.com/book/mzcd/core-data"&gt;his Core Data book&lt;/a&gt;.&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/46278886642</link><guid>http://engineering.tumblr.com/post/46278886642</guid><pubDate>Mon, 25 Mar 2013 17:16:00 -0400</pubDate><dc:creator>bryan</dc:creator></item><item><title>Over the past few weeks we’ve open-sourced quite a few official...</title><description>&lt;img src="http://25.media.tumblr.com/929da0c6b0ce1d121fff9749e31410d8/tumblr_mk2g3n26og1qb1l2uo1_r1_500.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;Over the past few weeks we’ve open-sourced &lt;a href="https://github.com/tumblr/tumblr.js"&gt;quite&lt;/a&gt; &lt;a href="https://github.com/tumblr/tumblr_client"&gt;a&lt;/a&gt; &lt;a href="https://github.com/tumblr/tumblr.php"&gt;few&lt;/a&gt; &lt;a href="https://github.com/tumblr/jumblr"&gt;official&lt;/a&gt; client libraries for the Tumblr API. Today I’m proud to announce the &lt;a href="https://github.com/tumblr/TMTumblrSDK"&gt;Tumblr iOS SDK&lt;/a&gt;, an Objective-C library for easily integrating Tumblr data into your iOS (or OS X) applications, however you see fit.&lt;/p&gt;

&lt;p&gt;The Tumblr SDK for iOS contains a few different components to start:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Authentication (OAuth and xAuth implementations)&lt;/li&gt;
&lt;li&gt;A full wrapper around all of our API endpoints&lt;/li&gt;
&lt;li&gt;Inter-app communication (this is pretty limited at the moment but we plan to expand it quite a bit going forward)&lt;/li&gt;
&lt;li&gt;A UIActivity stub for easy inclusion of a Tumblr button in a standard Apple UIActivityViewController&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;We’ve been using this SDK in production for quite some time now and are thrilled to finally be able to share it with you.&lt;/p&gt;

&lt;p&gt;If you’re interested in integrating with Tumblr on iOS or OS X in a way that the SDK doesn’t currently facilitate, please get in touch. I’m very interested in hearing any and all feedback on how we can make this as easy as possible.&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/46010473265</link><guid>http://engineering.tumblr.com/post/46010473265</guid><pubDate>Fri, 22 Mar 2013 16:08:34 -0400</pubDate><dc:creator>bryan</dc:creator></item><item><title>seejohnrun:


tumblr.php
I’m here to announce another new...</title><description>&lt;img src="http://24.media.tumblr.com/75c908efa3040fd56e6a44f2cbe77506/tumblr_mjpqrpWDTo1qzpqc3o1_500.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;a class="tumblr_blog" href="http://seejohnrun.tumblr.com/post/45429413506/tumblr-php-im-here-to-announce-another-new"&gt;seejohnrun&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;tumblr.php&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I’m here to announce another new addition to our list of official API clients. This week we have &lt;code&gt;tumblr.php&lt;/code&gt; - it’s available on &lt;a href="https://github.com/tumblr/tumblr.php"&gt;GitHub&lt;/a&gt; and &lt;a href="https://packagist.org/packages/tumblr/tumblr"&gt;composer via packagist&lt;/a&gt;. It is tested, is PSR-2 compatible, and is well documented.&lt;/p&gt;
&lt;p&gt;Like the other clients we’ve been announcing (most recently &lt;a href="https://github.com/tumblr/tumblr.js"&gt;JS&lt;/a&gt; and &lt;a href="https://github.com/tumblr/jumblr"&gt;Java&lt;/a&gt;) it has full support for all of the Tumblr v2 API endpoints.&lt;/p&gt;
&lt;p&gt;Time to make something cool!&lt;/p&gt;
&lt;/blockquote&gt;</description><link>http://engineering.tumblr.com/post/45429489874</link><guid>http://engineering.tumblr.com/post/45429489874</guid><pubDate>Fri, 15 Mar 2013 13:52:34 -0400</pubDate><dc:creator>codingjester</dc:creator></item><item><title>Using entropy to route web traffic</title><description>&lt;p&gt;&lt;a class="tumblr_blog" href="http://www.adamlaiacano.com/post/44295213078/using-entropy-to-route-web-traffic"&gt;adamlaiacano&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Earlier this week, &lt;a href="http://tumblr.mobocracy.net"&gt;Blake&lt;/a&gt; asked me for some help with a problem he’s working on. He has a couple of hash functions that are being used to route web traffic to a number of different servers. A hash function takes an input, such as a blog’s URL, and outputs a number between 0 and 2&lt;sup&gt;32&lt;/sup&gt;. If we have 1000 servers, each one will handle about 4.3 million points in the hash space.&lt;/p&gt;
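&lt;p&gt;As a rough illustration of that routing, the sketch below splits the 32-bit hash space into equal-width ranges, one per server, and maps a hash value to a server index. The equal-width split and the names &lt;code&gt;HASH_SPACE&lt;/code&gt; and &lt;code&gt;serverForHash&lt;/code&gt; are assumptions made for this example, not a description of the production router.&lt;/p&gt;

```javascript
// Illustrative only: equal-width partition of the 32-bit hash space.
// HASH_SPACE and serverForHash are hypothetical names for this sketch.
const HASH_SPACE = 2 ** 32;

function serverForHash(hashValue, serverCount) {
  // Each server owns a contiguous range of this many hash values.
  const rangeWidth = HASH_SPACE / serverCount;
  // Math.min guards the top edge in case hashValue equals HASH_SPACE.
  return Math.min(serverCount - 1, Math.floor(hashValue / rangeWidth));
}

// Server index for a hash value of 3.137e9 when there are 1000 servers.
console.log(serverForHash(3.137e9, 1000));
```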
&lt;p&gt;The data looked something like this (with fake blog names, of course):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##       blog.name        H1        H2        H3        H4
## 1 23.tumblr.com 3.137e+09 1.866e+09 6.972e+08 5.792e+08
## 2 19.tumblr.com 1.875e+09 2.545e+08 2.606e+09 1.312e+09
## 3 34.tumblr.com 1.366e+09 2.236e+09 1.106e+09 3.640e+09
## 4 43.tumblr.com 2.639e+09 1.098e+09 8.755e+08 1.507e+09
## 5 90.tumblr.com 6.564e+08 5.397e+07 3.084e+09 2.961e+09
## 6 29.tumblr.com 2.476e+09 4.532e+08 2.787e+08 4.894e+08
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One important thing to point out before we get started is that this has to be a &lt;em&gt;representative sample&lt;/em&gt; of the request data. Despite the wild popularity of my personal blog, it doesn’t get a sliver of the traffic that &lt;a href="http://beyonce.tumblr.com"&gt;Beyonce&lt;/a&gt; gets. That fact needs to be represented in the sample data, meaning that her blog should appear in more rows of the sample data than mine.&lt;/p&gt;
&lt;h2&gt;Plot the data&lt;/h2&gt;
&lt;p&gt;The first thing I always do is plot the data to get a sense of what I’m working with and what I’m trying to accomplish. The density plots below show the distribution of values in the hash space for each algorithm. If you’re not familiar with &lt;a href="http://en.wikipedia.org/wiki/Kernel_density_estimation"&gt;kernel density plots&lt;/a&gt;, you can imagine this to be a smoothed (and prettier) version of a histogram. For the electrical engineers out there, it’s the sum of the convolution of a kernel function (usually a Gaussian) with an impulse function at each of the points on the x-axis (represented here by dots).&lt;/p&gt;
&lt;p&gt;&lt;img alt="image" src="http://media.tumblr.com/38b19bb8bc204a4b6dc160e90ee895c2/tumblr_inline_mizgewlbod1qz4rgp.png"/&gt;&lt;/p&gt;
&lt;p&gt;By comparison, here are the density plots of a “near-ideal” example (1000 pulls from a uniform distribution) and a bad example (all assigned to the value 2e+09). The worst case example here shows the shape of the kernel function.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image" src="http://media.tumblr.com/8102f4803a4aa667c0b9dd642e38b690/tumblr_inline_mizgfvggEd1qz4rgp.png"/&gt;&lt;/p&gt;
&lt;h2&gt;Calculate the entropy&lt;/h2&gt;
&lt;p&gt;In information theory, entropy is the minimum number of bits required (on average) to identify an encoded symbol (stay with me here…). If we’re trying to transmit a bunch of text digitally, we would encode the alphabet where each “symbol” is a letter that will be represented in 1’s and 0’s. In order to transmit our message quickly, we want to use as few bits as possible. Since the letter “e” appears more frequently than the letter “q”, we want the symbol for “e” to have fewer bits than “q”. Make sense?&lt;/p&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Huffman_coding"&gt;Huffman Coding&lt;/a&gt; is one encoding algorithm. There’s an example implementation &lt;a href="http://en.literateprograms.org/Huffman_coding_(Python"&gt;here&lt;/a&gt;, which assigns the code &lt;code&gt;100&lt;/code&gt; to “e” and &lt;code&gt;1111110010&lt;/code&gt; to “q”. The more “uneven” your symbol distribution is, the fewer bits it will take, on average, to transmit your message (meaning you’ll have a lower entropy). The entropy value is a lower bound for the actual weighted average of the symbol lengths. There are special cases where some encoding algorithms get closer to the entropy value than others, but none will ever surpass it.&lt;/p&gt;
&lt;p&gt;The actual entropy formula is:&lt;/p&gt;
&lt;p&gt;\[ H(x)=-\sum_{i=0}^{N-1} p_i log_2(p_i) \]&lt;/p&gt;
&lt;p&gt;Where \( H(x) \) is the entropy, and \( p_i \) is the probability that symbol \( i \) will appear. In the example I linked to above, \( p_e = 0.124 \) and \( p_q=0.0009 \), so it makes sense that e’s symbol is so much shorter. In the example, the average number of bits per symbol is \( \sum S_i p_i = 4.173 \frac{bits}{symbol} \), where \( S_i \) is the number of bits in symbol \( i \). The entropy, from the above equation, is \( 4.142 \frac{bits}{symbol} \).&lt;/p&gt;
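&lt;p&gt;The same comparison can be tried on a much smaller alphabet. The sketch below uses a made-up four-symbol distribution, with the code lengths that Huffman coding produces for it, and computes both the entropy and the weighted average code length. The probabilities, code lengths, and function names are assumptions for illustration only.&lt;/p&gt;

```javascript
// Hypothetical four-symbol alphabet. The code lengths (bits) are what
// Huffman coding yields for these probabilities: 1, 2, 3 and 3 bits.
const symbols = [
  { p: 0.4, bits: 1 },
  { p: 0.3, bits: 2 },
  { p: 0.2, bits: 3 },
  { p: 0.1, bits: 3 },
];

// H = -sum(p_i * log2(p_i)); symbols with p = 0 contribute nothing.
function entropy(probs) {
  return probs.reduce(function (sum, p) {
    return p === 0 ? sum : sum - p * Math.log2(p);
  }, 0);
}

// Weighted average code length: sum(S_i * p_i).
function averageBits(table) {
  return table.reduce(function (sum, s) {
    return sum + s.bits * s.p;
  }, 0);
}

const h = entropy(symbols.map(function (s) { return s.p; }));
const avg = averageBits(symbols);
console.log(h.toFixed(3), avg.toFixed(3));
```

&lt;p&gt;For this distribution the entropy comes out to roughly 1.846 bits per symbol against an average code length of 1.9, so the entropy again sits just below what the encoding achieves.&lt;/p&gt;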
&lt;p&gt;The example problem of web traffic distribution is a little different. We’re not actually encoding anything, but rather trying to make the theoretical lower bound for the average number of bits per symbol as &lt;em&gt;high&lt;/em&gt; as possible.&lt;/p&gt;
&lt;p&gt;We can consider each server to be a symbol, and the amount of traffic that it receives is decided by the hash function that we’re trying to choose. If one of our servers is the equivalent of the letter “e”, it’s going to be totally overloaded while the “q” isn’t going to be handling much traffic at all. We want each symbol (server) to appear (receive traffic) equally often.&lt;/p&gt;
&lt;p&gt;So to calculate the entropy, we’ll take a histogram of the hash values with 20 buckets (representing 20 servers). That gives us the number of requests that go to each server. Dividing that by the total number of requests gives us each server’s probability of handling the next incoming request. These are the \( p_i \) values that we need in order to calculate the entropy. In code, it looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="r"&gt;calc.entropy &amp;lt;- function(hash.value) {
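    # One equal-width bucket per server (20 buckets across the 32-bit hash space)
    h = hist(hash.value, plot = FALSE, breaks = seq(0, 2^32, length.out = 21))
    probs = h$counts/sum(h$counts)
    # Skip empty buckets: 0 * log2(0) is NaN in R, but the term's limit is 0
    probs = probs[probs != 0]
    # Return the entropy in bits
    -sum(probs * log2(probs))
}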
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The entropy values for our four hash functions are:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##   hash.function entropy
## 1            H1   4.203
## 2            H2   4.226
## 3            H3   4.254
## 4            H4   4.180
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And while we’re at it, here’s the entropy of our best/worst case example that we plotted earlier.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##    hash.function entropy
## 1     near.ideal   4.309
## 2 worst.possible   0.000
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Why is the worst case value 0? Because if all traffic is going to one server, we wouldn’t need any bits at all to tell us which server the request is going to. The theoretical limit for a histogram with 20 buckets is: \( -20\frac{1}{20}log_2{\frac{1}{20}} = 4.32 \), which we’re close to but can never exceed.&lt;/p&gt;
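&lt;p&gt;Both ends of that range are easy to check numerically. The sketch below computes the entropy of a perfectly even split across 20 servers and of a split where one server takes everything; the function and variable names are assumptions for this example.&lt;/p&gt;

```javascript
// H = -sum(p_i * log2(p_i)); buckets with p = 0 contribute nothing.
function entropy(probs) {
  return probs.reduce(function (sum, p) {
    return p === 0 ? sum : sum - p * Math.log2(p);
  }, 0);
}

// Best case: 20 servers, each receiving 1/20 of the traffic.
const uniform = Array.from({ length: 20 }, function () { return 1 / 20; });
// Worst case: the first server receives all of the traffic.
const oneServer = Array.from({ length: 20 }, function (_, i) {
  return i === 0 ? 1 : 0;
});

console.log(entropy(uniform).toFixed(2));  // matches log2(20)
console.log(Math.log2(20).toFixed(2));
console.log(entropy(oneServer));
```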
&lt;p&gt;All of our hash functions appear to be working pretty well, especially given the small sample size I’m using for this blog post. It looks like our winner is H3, which has the highest entropy.&lt;/p&gt;
&lt;p&gt;To summarize, here’s what we did to find the optimal hashing function:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Get a &lt;em&gt;representative sample&lt;/em&gt; of your web traffic&lt;/li&gt;
&lt;li&gt;Run each request through the hashing function&lt;/li&gt;
&lt;li&gt;Take a histogram of the resulting values with N bins, where N is the number of servers you have available&lt;/li&gt;
&lt;li&gt;Divide the bin counts by the total number of requests in your sample to get the probability of handling a request for each server&lt;/li&gt;
&lt;li&gt;Calculate the entropy, \( H(x)=-\sum_{i=0}^{N-1} p_i log_2(p_i) \), for each hash function&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;There are more considerations to take, like setting an upper bound on the value of \( p_i \) to ensure that no single server ever gets so busy that it can’t handle its load.&lt;/p&gt;
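&lt;p&gt;The steps summarized above, plus the largest per-server share mentioned as a further consideration, can be sketched end to end as follows. This is a minimal illustration rather than the code used for the analysis: the function names are made up, the input is the six H1 values from the sample table earlier in the post, and four servers are used only to keep the output small.&lt;/p&gt;

```javascript
// Sketch of the summarized steps; all names here are illustrative assumptions.
const HASH_SPACE = 2 ** 32;

// Bin hash values into one equal-width bucket per server and
// convert the bucket counts into probabilities.
function serverProbabilities(hashValues, serverCount) {
  const counts = new Array(serverCount).fill(0);
  for (const value of hashValues) {
    const bucket = Math.min(
      serverCount - 1,
      Math.floor((value / HASH_SPACE) * serverCount)
    );
    counts[bucket] += 1;
  }
  return counts.map(function (c) { return c / hashValues.length; });
}

// H = -sum(p_i * log2(p_i)); empty buckets contribute nothing.
function entropy(probs) {
  return probs.reduce(function (sum, p) {
    return p === 0 ? sum : sum - p * Math.log2(p);
  }, 0);
}

// Largest share of traffic that any single server would receive.
function maxShare(probs) {
  return Math.max(...probs);
}

const h1 = [3.137e9, 1.875e9, 1.366e9, 2.639e9, 6.564e8, 2.476e9];
const probs = serverProbabilities(h1, 4);
console.log(probs, entropy(probs), maxShare(probs));
```

&lt;p&gt;Comparing &lt;code&gt;maxShare&lt;/code&gt; against a capacity limit of your choosing is one simple way to apply an upper bound on \( p_i \) alongside the entropy score.&lt;/p&gt;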
&lt;p&gt;If you want to read up more on information theory, &lt;a href="http://www.amazon.com/Elements-Information-Theory-Thomas-Cover/dp/0471062596/ref=sr_1_25?ie=UTF8&amp;amp;qid=1362113533&amp;amp;sr=8-25&amp;amp;keywords=information+theory"&gt;Elements of Information Theory&lt;/a&gt; by Thomas Cover and Joy Thomas is an excellent book that is reasonably priced (used) on Amazon.&lt;/p&gt;
&lt;p&gt;There is also, of course, Claude Shannon’s landmark paper from 1948, “&lt;a href="http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf"&gt;A Mathematical Theory of Communication&lt;/a&gt;”, in which he essentially defines the entire field.&lt;/p&gt;
&lt;/blockquote&gt;</description><link>http://engineering.tumblr.com/post/44299526996</link><guid>http://engineering.tumblr.com/post/44299526996</guid><pubDate>Fri, 01 Mar 2013 11:40:05 -0500</pubDate><category>engineering</category><category>tumblr</category><category>tech</category><category>computer science</category><dc:creator>codingjester</dc:creator></item><item><title>seejohnrun:

tumblr Java client: jumblr
Last week we announced...</title><description>&lt;img src="http://25.media.tumblr.com/4169ab7d72cfe241644acd4aff03b417/tumblr_misa13iHTr1qzpqc3o1_500.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;a class="tumblr_blog" href="http://seejohnrun.tumblr.com/post/43987451567/tumblr-java-client-jumblr-last-week-we"&gt;seejohnrun&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;h2&gt;tumblr Java client: jumblr&lt;/h2&gt;
&lt;p&gt;Last week we announced our first official API client, for JavaScript (read: &lt;a href="http://seejohnrun.tumblr.com/post/43515845065/tumblr-js-javascript-client-today-im-excited-to"&gt;the announcement&lt;/a&gt;). This week we’re back to announce the release of an official Java client, Jumblr:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Blog blog = client.blogInfo("seejohnrun.tumblr.com");
for (Post post : blog.posts()) {
    post.like(); // you're too kind..
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Like its JavaScript counterpart, Jumblr comes with full support for all of the API V2 endpoints. Check out more detail on the &lt;a href="https://github.com/tumblr/jumblr"&gt;github page&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;If you’re interested in following along with any of our open-source work, check out &lt;a href="http://tumblr.github.com/"&gt;our GitHub page&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;</description><link>http://engineering.tumblr.com/post/43987461621</link><guid>http://engineering.tumblr.com/post/43987461621</guid><pubDate>Mon, 25 Feb 2013 11:10:00 -0500</pubDate><dc:creator>codingjester</dc:creator></item><item><title>Community is extremely important to us here at Tumblr. That goes...</title><description>&lt;img src="http://25.media.tumblr.com/a57a328f34ccfab998de40221296431d/tumblr_mimxa4WL3G1qjk2rvo1_500.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;Community is extremely important to us here at Tumblr. That goes double for engineers and developer evangelists. After spending all day giving talks at the &lt;a href="http://www.apistrategyconference.com/"&gt;API Strategy &amp; Practice Conference&lt;/a&gt;, many of the speakers who were in town came over to TumblrHQ for the monthly &lt;a href="http://www.meetup.com/nycevangelists/events/104141492/"&gt;Dev Evangelist Meetup&lt;/a&gt; here in NYC. It was a great time had by all and we were even able to get a group shot of everyone!&lt;/p&gt;
&lt;p&gt;If you’re looking to be a part of a team that encourages community, &lt;a href="http://tumblr.com/jobs"&gt;We’re hiring&lt;/a&gt;.&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/43736615968</link><guid>http://engineering.tumblr.com/post/43736615968</guid><pubDate>Fri, 22 Feb 2013 13:54:09 -0500</pubDate><dc:creator>codingjester</dc:creator></item><item><title>seejohnrun:


tumblr.js JavaScript client
Today I’m excited to...</title><description>&lt;img src="http://24.media.tumblr.com/9b175d6219949ebcc959a107dd4d3e0d/tumblr_mihocseSbc1qzpqc3o1_500.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;a class="tumblr_blog" href="http://seejohnrun.tumblr.com/post/43515845065/tumblr-js-javascript-client-today-im-excited-to"&gt;seejohnrun&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;h1&gt;tumblr.js JavaScript client&lt;/h1&gt;
&lt;p&gt;Today I’m excited to announce the release of &lt;a href="https://github.com/tumblr/tumblr.js"&gt;tumblr.js&lt;/a&gt;, the first of several official API clients we’ll be rolling out over the next few months.&lt;/p&gt;
&lt;p&gt;You can install it now with &lt;code&gt;npm&lt;/code&gt;, and start making something awesome:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;var tumblr = require('tumblr.js');
var client = tumblr.createClient({
  consumer_key: 'consumer_key',
  consumer_secret: 'consumer_secret',
  token: 'oauth_token',
  token_secret: 'oauth_token_secret'
});

// Name all of the authenticating user's blogs
client.userInfo(function (err, data) {
  data.user.blogs.forEach(function (blog) {
    console.log(blog.name);
  });
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It comes with full support for all of the &lt;a href="http://www.tumblr.com/docs/en/api/v2"&gt;API V2&lt;/a&gt; endpoints including tag search, following, liking, and post creation. For more detail, see the &lt;a href="https://github.com/tumblr/tumblr.js"&gt;GitHub page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;More to come soon!&lt;/p&gt;
&lt;/blockquote&gt;</description><link>http://engineering.tumblr.com/post/43515893556</link><guid>http://engineering.tumblr.com/post/43515893556</guid><pubDate>Tue, 19 Feb 2013 17:43:33 -0500</pubDate><dc:creator>codingjester</dc:creator></item><item><title>0xa:

We have Engineering Summer Internships @ Tumblr!  We’re...</title><description>&lt;img src="http://24.media.tumblr.com/d6aec739ebb69adda5df038c26ae190a/tumblr_mhb3ney5DN1rh6s88o1_500.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;a class="tumblr_blog" href="http://0xa.tumblr.com/post/41647980830/we-have-engineering-summer-internships-tumblr"&gt;0xa&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We have &lt;a href="http://tumblr.theresumator.com/apply/yxseWO/Engineering-Summer-Intern.html?source=Tumblr+Engineering+Blog"&gt;Engineering Summer Internships @ Tumblr&lt;/a&gt;!  We’re looking for aspiring software engineers with a passion for open source software to join us for a summer of programming and fun.  You will be integrated into a small engineering team working on a real-world project as part of one of these teams:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Product Engineering: Using PHP and JavaScript, create new site features and improve existing ones that keep our growing millions of users doing amazing things with their tumblelogs.&lt;/li&gt;
&lt;li&gt;Search Engineering: Research, expand and refine Tumblr’s search infrastructure and search features.&lt;/li&gt;
&lt;li&gt;Platform Engineering: Write highly optimized distributed services that manage data and requests in real time, helping our site scale to billions of posts.&lt;/li&gt;
&lt;li&gt;Infrastructure Engineering: Work hands on with the Network team configuring, deploying and maintaining JunOS network devices.&lt;/li&gt;
&lt;li&gt;Mobile Engineering: Come work on the application that’s putting Tumblr in millions of users’ pockets.  A passion for iOS and Android is needed.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Polish your code samples, then send us your resume via our &lt;a href="http://tumblr.theresumator.com/apply/yxseWO/Engineering-Summer-Intern.html?source=Tumblr+Engineering+Blog"&gt;Engineering Summer Internship&lt;/a&gt; job page.&lt;/p&gt;
&lt;p&gt;&lt;br/&gt;(We’re also hiring full-time engineers at every level of our technical stack.  Learn more on &lt;a href="http://www.tumblr.com/jobs"&gt;Tumblr’s Jobs page&lt;/a&gt;.)&lt;/p&gt;
&lt;/blockquote&gt;</description><link>http://engineering.tumblr.com/post/41819887564</link><guid>http://engineering.tumblr.com/post/41819887564</guid><pubDate>Tue, 29 Jan 2013 18:43:25 -0500</pubDate><category>engineering</category><category>tumblr</category><category>internship</category><dc:creator>0xa</dc:creator></item><item><title>Tumblr for iPad's custom UIPopoverController</title><description>&lt;p&gt;When &lt;a href="http://blog.petervidani.com"&gt;Peter&lt;/a&gt; was designing &lt;a href="https://itunes.apple.com/us/app/tumblr/id305343404?mt=8"&gt;Tumblr for iPad&lt;/a&gt;, he came up with a great custom popover design:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/e4a02c3f693b7b33bfda10a88d42d8a4/tumblr_inline_mfc5r3Wjn01qz4rgp.png" alt=""/&gt;&lt;/p&gt;

&lt;p&gt;Implementing his design was a challenge that I wanted to share.&lt;/p&gt;

&lt;p&gt;iOS’s UIKit framework includes &lt;a href="http://developer.apple.com/library/ios/#documentation/uikit/reference/UIPopoverBackgroundView_class/Reference/Reference.html"&gt;UIPopoverBackgroundView&lt;/a&gt;, an abstract class which can be subclassed to provide a custom background for a popover. Sounds perfect, right? UIPopoverBackgroundView let me use custom assets provided by Peter and &lt;a href="http://bengold.tv"&gt;Ben&lt;/a&gt; for the background and arrow, but the design also included a custom inset stroke/shadow. Fortunately, UIPopoverBackgroundView makes it easy to hide the default inset:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;+(BOOL)wantsDefaultContentAppearance {
    return NO;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Unfortunately, adding its replacement wasn’t so easy.&lt;/p&gt;

&lt;p&gt;First I subclassed UIPopoverController and added the custom inset to the view hierarchy when the popover is presented (&lt;code&gt;_insetStroke&lt;/code&gt; is a resizable UIImageView):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;UIView *superview = self.contentViewController.view.superview;
[superview addSubview:_insetStroke];

_insetStroke.frame = superview.bounds;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Simple enough, unless the popover’s content view controller is a &lt;a href="http://developer.apple.com/library/ios/#documentation/uikit/reference/UINavigationController_Class/Reference/Reference.html"&gt;UINavigationController&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/da170cf2b4cfb70d9d4f17a3a5d9415c/tumblr_inline_mfc5u72I0I1qz4rgp.png" alt=""/&gt;&lt;/p&gt;

&lt;p&gt;In that case, the view hierarchy looks different and we have to resort to something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;UIViewController *controller = self.contentViewController;

BOOL isNavigationController = [controller isKindOfClass:[UINavigationController class]];

UIView *superview = isNavigationController 
        ? controller.view 
        : controller.view.superview;

[superview addSubview:_insetStroke];

CGRect frame = controller.view.superview.bounds;

if (isNavigationController &amp;amp;&amp;amp; 
        !((UINavigationController *)controller).isNavigationBarHidden) {
    CGFloat navBarOffset = 36; // CGRect fields are CGFloat, so use the matching type

    frame.origin.y += navBarOffset;
    frame.size.height -= navBarOffset;   
}

_insetStroke.frame = frame;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Ugly, but it works.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/7ace3fb00e3912eafc852bf33318c582/tumblr_inline_mfc5uka1NR1qz4rgp.png" alt=""/&gt;&lt;/p&gt;

&lt;p&gt;Except it doesn’t work on iOS 5 because &lt;code&gt;wantsDefaultContentAppearance&lt;/code&gt; wasn’t added until iOS 6. On iOS 5 we need to manually traverse the view hierarchy to find the default stroke and remove it. But keep in mind that the view hierarchy looks different depending on whether or not the content view controller is a navigation controller. We end up with something like this:&lt;/p&gt;

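&lt;pre&gt;&lt;code&gt;// IS_IOS_6 is a project-defined macro (its definition is not shown here).
if (!IS_IOS_6) {
    // wantsDefaultContentAppearance is unavailable before iOS 6,
    // so find the default stroke image views and hide them by hand.
    for (UIView *subview in [superview subviews]) {
        if (isNavigationController) {
            for (UIView *subsubview in [subview subviews]) {
                if ([subsubview isMemberOfClass:[UIImageView class]]) {
                    subsubview.hidden = YES;
                }
            }
        } else if ([subview isMemberOfClass:[UIImageView class]] &amp;amp;&amp;amp; subview != _insetStroke) {
            subview.hidden = YES;
        }
    }
}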
&lt;/code&gt;&lt;/pre&gt;</description><link>http://engineering.tumblr.com/post/38408700509</link><guid>http://engineering.tumblr.com/post/38408700509</guid><pubDate>Thu, 20 Dec 2012 16:35:28 -0500</pubDate><dc:creator>bryan</dc:creator></item><item><title>Orr Sella: Introducing tumblr4s: A Scala Library For The Tumblr API</title><description>&lt;a href="http://orrsella.com/post/37654002423/introducing-tumblr4s-a-scala-library-for-the-tumblr-api"&gt;Orr Sella: Introducing tumblr4s: A Scala Library For The Tumblr API&lt;/a&gt;: &lt;p&gt;&lt;a class="tumblr_blog" href="http://orrsella.com/post/37654002423/introducing-tumblr4s-a-scala-library-for-the-tumblr-api"&gt;orrsella&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Back in February of this year I stumbled upon &lt;a href="http://orrsella.com/post/18243790176/tumblr-architecture-15-billion-page-views-a-month-and"&gt;an amazing article&lt;/a&gt; on HighScalability.com about Tumblr’s infrastructure. It’s a great read for many reasons (go read it). One of the greatest takeaways I had from this article was this “new” programming language called Scala. I had never heard of…&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Awesome stuff. Thanks Orr!&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/37668666339</link><guid>http://engineering.tumblr.com/post/37668666339</guid><pubDate>Mon, 10 Dec 2012 18:14:55 -0500</pubDate><dc:creator>mobocracy</dc:creator></item><item><title>adamlaiacano:


John Myles White was kind enough to come by...</title><description>&lt;iframe src="http://player.vimeo.com/video/54970619" width="400" height="266" frameborder="0"&gt;&lt;/iframe&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;a class="tumblr_blog" href="http://adamlaiacano.tumblr.com/post/37333552557/john-myles-white-was-kind-enough-to-come-by-tumblr"&gt;adamlaiacano&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;div&gt;
&lt;p&gt;&lt;a href="http://www.johnmyleswhite.com/"&gt;John Myles White&lt;/a&gt; was kind enough to come by Tumblr HQ this week and give a talk about the advantages that MAB (Multi-Armed Bandit) testing provide over traditional A/B testing.&lt;/p&gt;
&lt;p&gt;Most of the content is drawn from his ebook &lt;a href="http://oreil.ly/WGZsKN"&gt;“Bandit Algorithms for Website Optimization”&lt;/a&gt;. &lt;/p&gt;
&lt;/div&gt;
&lt;/blockquote&gt;</description><link>http://engineering.tumblr.com/post/37333617674</link><guid>http://engineering.tumblr.com/post/37333617674</guid><pubDate>Thu, 06 Dec 2012 10:02:00 -0500</pubDate><dc:creator>radioon</dc:creator></item><item><title>adamlaiacano:

The Bad Data Handbook is finally out! It’s the...</title><description>&lt;img src="http://24.media.tumblr.com/tumblr_mdencsEa1G1r0vuydo1_500.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;a href="http://adamlaiacano.tumblr.com/post/35611365010/the-bad-data-handbook-is-finally-out-its-the" class="tumblr_blog"&gt;adamlaiacano&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;The &lt;a href="http://shop.oreilly.com/product/0636920024422.do?sortby=publicationDate"&gt;Bad Data Handbook&lt;/a&gt; is finally out! It’s the first time I’ve contributed to a publication and I’m incredibly excited about it.&lt;/p&gt;
&lt;p&gt;The top required skills of a data scientist are generally considered to be: mathematical know-how, programming capabilities, and some sort of domain knowledge enabling them to ask (and then answer) relevant questions.&lt;/p&gt;
&lt;p&gt;This book is about all of the other garbage that you have to put up with along the way. Each chapter is written by someone who has spent more time than they probably would have liked dealing with a specific issue, and they provide some tips and pitfalls. I haven’t read the other chapters yet, but some of the included topics are:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Test drive your data to see if it’s ready for analysis&lt;/li&gt;
&lt;li&gt;Work spreadsheet data into a usable form&lt;/li&gt;
&lt;li&gt;Handle encoding problems that lurk in text data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Develop a successful web-scraping effort&lt;/strong&gt; (that’s me)&lt;/li&gt;
&lt;li&gt;Use NLP tools to reveal the real sentiment of online reviews&lt;/li&gt;
&lt;li&gt;Address cloud computing issues that can impact your analysis effort&lt;/li&gt;
&lt;li&gt;Avoid policies that create data analysis roadblocks&lt;/li&gt;
&lt;li&gt;Take a systematic approach to data quality analysis&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;I hope at least a few of you buy it and enjoy it.&lt;/p&gt;&lt;/blockquote&gt;</description><link>http://engineering.tumblr.com/post/35644884844</link><guid>http://engineering.tumblr.com/post/35644884844</guid><pubDate>Tue, 13 Nov 2012 13:23:27 -0500</pubDate><category>engineering</category><category>tech</category><category>tumblr</category><category>bad data</category><category>big data</category><dc:creator>codingjester</dc:creator></item><item><title>Tumblr for iPhone is now 100% native</title><description>&lt;p&gt;My appreciation for those who build web browsers has grown dramatically over the past few months.&lt;/p&gt;

&lt;p&gt;With &lt;a href="http://staff.tumblr.com/post/35268804876/iphone-update-with-native-dashboard"&gt;today’s 3.2 release&lt;/a&gt; I’m happy to announce that Tumblr for iPhone is now 100% native. We love web technologies at Tumblr but believe this change provides a much better experience for this particular product.&lt;/p&gt;

&lt;p&gt;HTML is the first-class citizen at Tumblr. All posts, whether authored in plain text, Markdown, or using our WYSIWYG editor, are stored as HTML. This is what we use to build your dashboard both on the website and in the mobile apps, via our &lt;a href="http://tumblr.com/api"&gt;API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The decision to use a web view to render lists of posts in the 3.0 rewrite (June 2012) had nothing to do with cross-platform compatibility or ease of development and everything to do with needing to render (somewhat) arbitrary HTML, provided by our users. This makes building a Tumblr iOS app of the highest caliber an interesting challenge.&lt;/p&gt;

&lt;p&gt;As noted by &lt;a href="http://zachwill.com/tumblr-ios"&gt;Zach Williams&lt;/a&gt;, we used &lt;a href="https://github.com/groue/GRMustache"&gt;GRMustache&lt;/a&gt; to render the post lists and &lt;a href="http://zeptojs.com"&gt;Zepto.js&lt;/a&gt; to implement the web view’s behavior, with some slight modifications (the &lt;code&gt;longTap&lt;/code&gt; event didn’t work exactly the way we wanted, &lt;code&gt;tap&lt;/code&gt; wouldn’t play HTML5 audio, and we needed to prevent touches while scrolling). CSS classes were used instead of &lt;code&gt;:active&lt;/code&gt; pseudo-classes so they could be removed programmatically as scrolling began. Our JavaScript-native “bridge” was basically &lt;a href="https://gist.github.com/3688560"&gt;this example&lt;/a&gt;.&lt;/p&gt;
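
&lt;p&gt;To illustrate the class-based highlighting described above, here is a minimal sketch rather than the app’s actual code. The helper name &lt;code&gt;createPressTracker&lt;/code&gt; and the &lt;code&gt;pressed&lt;/code&gt; class are hypothetical, and &lt;code&gt;touchmove&lt;/code&gt; is assumed to be a good enough signal that scrolling has begun:&lt;/p&gt;

```javascript
// Hypothetical helper: remembers which element currently shows the
// "pressed" style so that the style can be removed on demand.
function createPressTracker(className) {
  var current = null;

  // Remove the highlight from whatever element holds it, if any.
  function clear() {
    if (current !== null) {
      current.classList.remove(className);
      current = null;
    }
  }

  return {
    // Call from a touchstart handler with the touched element.
    press: function (element) {
      clear();
      element.classList.add(className);
      current = element;
    },
    // Call when the touch ends or as soon as scrolling begins.
    release: clear,
    // Reports whether any element is highlighted right now.
    isPressed: function () {
      return current !== null;
    }
  };
}

// Wiring for a browser or web view; skipped when no DOM is available.
if (typeof document !== 'undefined') {
  var tracker = createPressTracker('pressed'); // class name is an assumption
  document.addEventListener('touchstart', function (event) {
    tracker.press(event.target);
  });
  // touchmove stands in for "scrolling began" in this sketch.
  document.addEventListener('touchmove', tracker.release);
  document.addEventListener('touchend', tracker.release);
}
```

&lt;p&gt;Because the highlight is an ordinary class rather than the &lt;code&gt;:active&lt;/code&gt; pseudo-class, script decides exactly when it goes away.&lt;/p&gt;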

&lt;p&gt;Suitable scrolling performance was difficult to achieve in the web view as lists of Tumblr posts are usually extremely media-heavy. Some of the measures we took included using images in place of &lt;code&gt;border-radius&lt;/code&gt; and &lt;code&gt;box-shadow&lt;/code&gt; CSS and scaling/compressing photos on our servers, to the exact size needed on the phone.&lt;/p&gt;

&lt;p&gt;We considered writing a JavaScript dequeuing mechanism but didn’t end up doing so. Since our 3.0 release, Airbnb open-sourced their “UITableView for the web” in &lt;a href="http://airbnb.github.com/infinity/"&gt;∞.js&lt;/a&gt; and LinkedIn published a &lt;a href="http://engineering.linkedin.com/linkedin-ipad-5-techniques-smooth-infinite-scrolling-html5"&gt;detailed overview&lt;/a&gt; of how they approached a similar problem in their iPad application. Anyone attempting to build a web/native hybrid application like we did should take a look at both of these resources first.&lt;/p&gt;
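
&lt;p&gt;For readers unfamiliar with the idea, the core of a dequeuing mechanism is deciding which rows need to be in the DOM at a given scroll position. The sketch below is only an illustration under a simplifying assumption (every row has the same height, which real post lists do not); the function name &lt;code&gt;visibleRange&lt;/code&gt; and its parameters are hypothetical and are not taken from either project linked above:&lt;/p&gt;

```javascript
// Hypothetical sketch: with fixed-height rows, compute the inclusive range
// of row indexes that should be attached for the current scroll position.
// "buffer" is the number of extra rows kept above and below the viewport.
function visibleRange(scrollTop, viewportHeight, rowHeight, rowCount, buffer) {
  // Index of the first row that intersects the top of the viewport.
  var first = Math.floor(scrollTop / rowHeight) - buffer;
  // Index of the last row that intersects the bottom of the viewport.
  var last = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1 + buffer;
  return {
    first: Math.max(0, first),
    last: Math.min(rowCount - 1, last)
  };
}

// Example: 50 rows of 100px each, a 480px viewport scrolled to 1000px,
// keeping one spare row on each side.
var range = visibleRange(1000, 480, 100, 50, 1);
console.log(range.first, range.last); // prints "9 15"
```

&lt;p&gt;Rows outside the returned range can be detached or reused for incoming content, which keeps the number of live, media-heavy elements roughly constant while scrolling.&lt;/p&gt;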

&lt;p&gt;I’m excited about the speed and stability that I believe going fully native has brought to our app, but please &lt;a href="http://bryan.io/ask"&gt;let me know&lt;/a&gt; how we can make the Tumblr iOS experience even better.&lt;/p&gt;</description><link>http://engineering.tumblr.com/post/35271768127</link><guid>http://engineering.tumblr.com/post/35271768127</guid><pubDate>Thu, 08 Nov 2012 09:57:00 -0500</pubDate><dc:creator>bryan</dc:creator></item></channel></rss>
