Tumblr Engineering

June 2018

Tumblr's 4th Annual Security Capture the Flag

security:

We’ve hosted an internal Security Capture the Flag (CTF) event for four years in a row now, with each year getting better than the last!

The event

Previously, the event was open only to Tumblr employees. This year we decided to extend an invitation to the other teams housed under our parent company, Oath.

All participants had a three hour window to hack, a buffet of tacos, beer, and wine to dive into, and a stack of prizes for the top four players (see Prizes below for details)!

Challenges were available Jeopardy-style, broken down by category. We had eight fun categories to select from:

  • Auth Bypass (authn | authz)
  • Cross Site Request Forgery (CSRF)
  • Cross Site Scripting (XSS)
  • Crypto
  • Forensics
  • Reverse Engineering
  • SQL Injection (SQLi)
  • XML Injection (+ XXE)

We also sprinkled a few “inside joke” Easter eggs around the system that awarded bonus points to anyone who discovered them! For example, if someone attempted to find a hole in the CTF system itself and navigated to /wp-admin, we’d give them a flag on a prank WordPress page; or perhaps they tested for XSS with a <marquee> tag — only the greatest of all XSS tags!

While the Security Team walked around and helped out, we also set up a mini lockpick village just because.

Jun 14, 2018 38 notes
#security #engineering #capture the flag

May 2018

Come join us!

javascript:

If you’ve been following this Tumblr, you’ll likely know that we, the Core Web team, have recently started rewriting and modernizing the Tumblr web platform. This undertaking presents some incredibly exciting opportunities to innovate with lots of fun technologies. We’re working on improving every aspect of the web: the dashboard, the archive, the blog network, you name it.

Are you a senior JavaScript engineer and wanna be a part of this adventure? Come join Core Web! You’ll help create the building blocks with which a brand new modern Tumblr will be built. Your work will directly impact and define the user experience for millions of users and the development tools for a large number of product engineers across several teams at Tumblr!

What you’ll do

We’re looking for an extraordinary senior JavaScript engineer who wants to take on the following challenges:

  • Keep making our build and deployment more delightful and futuristic
  • Help establish norms and standards for how this new web client should be architected, including setting JavaScript, CSS, performance and other best-practices, and introducing/creating the tools to achieve them
  • Raise awareness of the team’s work, internally and externally, by being active in the open-source and engineering community
  • Whatever else you think will help us create the highest quality web platform and development experience!

Who we’re looking for

An ideal team member is someone with:

  • Strong JavaScript and CSS fundamentals
  • Experience setting up Continuous Integration / Continuous Deploys
  • Expertise in build tools like Webpack, Parcel (or similar)
  • Pragmatism and the ability to decide what’s “good enough” (while planning ahead and knowing when to iterate)
  • An ability to independently drive projects
  • A desire to innovate and bring new things into the world
  • An understanding of code quality, unit test coverage, and performance
  • Empathy and the desire to elevate those around them
  • The belief that work is just as much about the journey as the destination

Our current toolkit

  • Webpack
  • ES6
  • React and React Router
  • CSS Modules
  • TypeScript
  • Jenkins and Jenkins pipelines
  • Docker
  • Node and Express
  • Kubernetes

If you’re interested, but your background does not include all of the above, please don’t let that hold you back. Let’s talk! To apply, follow the instructions at the bottom of our official job listings page! 

We can’t wait to hear from you!

Come work with the amazing Tumblr Core Web team!!

May 31, 2018 66 notes
#job openings #working at tumblr #meanwhile at nuhq #engineering

NYC PHP Meetup May 30th

tl;dr: Come and join us for the New York PHP Meetup on May 30th with Mohannad Ali, the VP of Engineering Berlin @ HelloFresh on “Lessons in Engineering Leadership: Harnessing the power of narrative”, Rohit Sodhia on “JSON Web Tokens - Auth Made Easy(er)” and Michael Butler on “Speeding up PHPUnit with Paratest”

When

May 30th 2018, 7:00pm - 9pm - Doors open 6:30pm

Where

6th Floor, 770 Broadway (9th st. btw Broadway and University place), New York, NY 10003

Details

Our featured talk this month is presented by Mohannad Ali, the VP of Engineering Berlin @ HelloFresh (https://www.hellofresh.com/) titled “Lessons in Engineering Leadership: Harnessing the power of narrative”.

Summary: How I learned the importance of narrative at the workplace, and how it could become the most powerful tool to drive motivation, innovation, and purpose.

Check out the HelloFresh engineering blog.

HelloFresh will also kindly be providing some great giveaways for you to feast over, so don’t miss out!

Lightning talks

We also have two excellent lightning talks:

Rohit Sodhia on “JSON Web Tokens - Auth Made Easy(er)” and Michael Butler on “Speeding up PHPUnit with Paratest”.

Schedule

6:30pm - Doors open 

7:00 - 7:10pm - Intro & welcome to New York PHP + refreshments 

7:15 - 7:30pm - JSON Web Tokens - Auth Made Easy(er) - Rohit Sodhia 

7:35 - 7:50pm - Speeding up PHPUnit with Paratest - Michael Butler 

8:00 - 8:45pm - Lessons in Engineering Leadership: Harnessing the power of narrative - Mohannad Ali 

8:45pm - 9pm - Questions & closing 

After - Come and join us for a drink and chat at a local bar

What to bring

  • A great attitude about all things PHP.
  • Laptops are not necessary but we will have wifi capabilities if you want to hack along.
  • Tumblr is generously providing pizza 🍕 and 🍺 beer/drinks.

How to RSVP

Please use Meetup to RSVP.

May 23, 2018 15 notes
#meetup #php

April 2018

A Big New Beautiful Future for the Web at Tumblr

javascript:

In the ten years that Tumblr’s been around, a lot has changed in web technology. We’ve kept up, of course, but it’s always been a process of addition, layering one new technology on top of another. And what we were working with—a custom framework built on top of Backbone, messily entangled with a PHP backend and its associated templates—was becoming unmanageable. Our piecemeal conversions to new technologies meant we had thousands of ways posts were rendered (only a moderate exaggeration). And each of those had to be updated individually to support new features or design changes.

It was time to step back, survey the world of web technology, and clean house in a big way. That we could finally test some of the new tech we’ve been itching to use was just a little bonus.

We started by laying out our goals:

  • A web client codebase fully separated from the PHP codebase that gets its data from the API in the same way our mobile apps do
  • A development environment that’s as painless as possible
  • Dramatically improved performance
  • Isomorphic rendering
  • Robust testing tools
  • Built on a framework with a healthy and active community, with some critical mass of adoption

With those goals in mind, we spent the beginning of the year on research - figuring out what kinds of things people were building web apps with these days, tooling around with them ourselves, and trying to assess if they would be right for Tumblr. We landed, eventually, on React, with a Node server (running Express) to make isomorphism as easy as possible. On top of that, we’re using Cosmos for developing components, React Router for routing, and TypeScript to make our lives better in general. (My colleague Paul already wrote about what went into our decision to use TypeScript here.)

As if writing an entirely new stack wasn’t enough, we realized along the way that this was our perfect chance to start deploying containerized applications with Kubernetes, a first for Tumblr. We had never previously deployed a node application to production here, and didn’t have the infrastructure for it, so it was a perfect green field on which to build another new and exciting thing. There’ll be more to come later on Kubernetes.

So where are we now? Well, we’ve launched one page powered by this new app - image pages, like this - with more to come very soon. 

Though it may seem simple, there’s a whole new technological world between you clicking that link and seeing that page. There’s a ton more exciting stuff happening now and still to happen in the future, and we’re looking forward to sharing it here. Wanna get in on the action yourself? Come work with us: https://www.tumblr.com/jobs.

- Robbie Dawson / @idiot

Apr 10, 2018 148 notes
#dr phil m&m #computer #javascript #js #engineering #react #lol

March 2018

Using srcset and sizes to make responsive HTML5 images

javascript:

If you’ve tried to implement responsive retina images on the web, you’ve probably come across one of the many informative articles on the subject. Many of the posts I found about it are really great, but they downplay or overlook a point that I think is really important:

If you set up srcset and sizes, your browser will automatically download higher density images on retina devices, if they are available.

Let’s investigate how to do that.

What is srcset?

srcset is a list of image URLs, each with a descriptor. The descriptor can either be the image width (in the form of [width in pixels]w), or the screen pixel density that is best for the image (ex. 2x, 3x, etc). Here’s an example that uses image widths: srcset="image_20.jpg 20w, image_40.jpg 40w". Here is an example that uses screen pixel density: srcset="image_20.jpg 1x, image_40.jpg 2x".

Don’t be fooled by pixel density

To my surprise, you can’t combine image width and pixel density descriptors in the srcset list. In other words, something like this is invalid and your browser will silently fall back to the src url: srcset="image_20.jpg 20w 1x, image_40.jpg 40w 2x". So, how do you get images that are responsive based on image width and screen density?

When you use an image width descriptor, the image size is chosen based on the viewport width. What if you need to display your image in a smaller size than the entire width of the viewport? sizes can help.

Sizes

sizes is a list of optional queries and sizes that correspond to the width of the image on screen. For example, sizes="(max-width: 540px) 100vw, 540px" means that the image will be displayed at 100% of the viewport width for screens up to 540px wide, and at 540px for screens 541px and wider.
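
In code terms, resolving sizes is a first-match scan over the media conditions. Here’s a simplified TypeScript sketch of that behavior (the types and function are illustrative; real parsing handles arbitrary media queries and CSS length units):

```typescript
interface SizeEntry {
  maxWidth?: number;                   // the (max-width: Npx) condition; absent = unconditional default
  value: { vw?: number; px?: number }; // the slot width, in vw or px
}

// The first entry whose media condition matches wins; the last entry
// (with no condition) acts as the default.
function resolveSizes(entries: SizeEntry[], viewportPx: number): number {
  for (const e of entries) {
    if (e.maxWidth === undefined || viewportPx <= e.maxWidth) {
      return e.value.px ?? (viewportPx * (e.value.vw ?? 100)) / 100;
    }
  }
  return viewportPx;
}

// Modeling sizes="(max-width: 540px) 100vw, 540px":
const sizes: SizeEntry[] = [
  { maxWidth: 540, value: { vw: 100 } },
  { value: { px: 540 } },
];
resolveSizes(sizes, 400);  // 400 (100vw of a 400px viewport)
resolveSizes(sizes, 1200); // 540
```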

Retina images, automatically

The ✨🎩 magic 🎩✨ part of all of this is when your browser chooses the image from srcset to fit the size at which it will be displayed, it automatically factors in screen density. So if your screen density is 1x, on a device with a viewport that is larger than 540px wide, you will get the size greater than or equal to 540w. But if your screen density is 2x, on a device with a viewport that is larger than 540px wide, you will get the size greater than or equal to 1080w.
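
A rough TypeScript sketch of this selection logic (simplified and illustrative; real browsers may also factor in cache contents and network conditions):

```typescript
interface Candidate {
  url: string;
  width: number; // the w descriptor, in pixels
}

// Roughly how the browser picks from srcset: multiply the CSS slot width
// (resolved from `sizes`) by the device pixel ratio, then take the smallest
// candidate at least that wide, falling back to the largest available.
function pickSource(candidates: Candidate[], slotCssWidth: number, dpr: number): string {
  const required = slotCssWidth * dpr;
  const sorted = [...candidates].sort((a, b) => a.width - b.width);
  const match = sorted.find((c) => c.width >= required);
  return (match ?? sorted[sorted.length - 1]).url;
}

const candidates = [
  { url: "image_540.jpg", width: 540 },
  { url: "image_1280.jpg", width: 1280 },
];
// A 540px slot on a 1x screen gets the 540w image...
pickSource(candidates, 540, 1); // "image_540.jpg"
// ...but the same slot on a 2x screen needs 1080 physical pixels, so it gets 1280w.
pickSource(candidates, 540, 2); // "image_1280.jpg"
```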

You can see it in action in this Codepen. To test srcset and sizes, you need to request the page with a new incognito window each time, so that you don’t load images from your browser cache. Try it with:

  • a wide viewport with 1x pixel density (Apple Thunderbolt Display, most random external monitors) to get the 540w image
  • a wide viewport with 2x pixel density (MacBook Pro display) to get the 1280w image
  • a narrow viewport with 1x pixel density to get the 500w or 250w image (depending on how small your viewport is)

How we use this at Tumblr

Once you have a good base of srcset and sizes, it’s pretty simple to modify sizes for different layouts. Consider Tumblr photosets: some rows may have 1 image, some rows may have 3 images. We can simply scale down the values in sizes by the number of images per row, and the browser will automatically figure out which image is the correct size. Here is an example on Codepen.

An example row in a photoset might look like this:


<div class="row">
  <div class="item">
    <img
      src="image1_540.gif"
      srcset="image1_250.gif 250w, image1_540.gif 540w"
      sizes="(max-width: 818px) 50vw, 270px" />
  </div>
  <div class="item">
    <img
      src="image2_540.gif"
      srcset="image2_250.gif 250w, image2_540.gif 540w"
      sizes="(max-width: 818px) 50vw, 270px" />
  </div>
</div>

With simple markup like this, your browser can figure out which image size will be best to display in the photoset row, based on the viewport width and display pixel density. It just goes to show that if you set up srcset and sizes correctly, the browser will take care of retina images automatically.
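
The per-row scaling described above can be generated programmatically. Here’s a hypothetical TypeScript helper (the 818px breakpoint and 540px container width are taken from the markup above; the function itself is ours for illustration, not Tumblr’s actual code):

```typescript
// Build a `sizes` attribute for a photoset row by dividing the
// full-width values by the number of images in the row.
function sizesForRow(imagesPerRow: number, containerMaxPx = 540, breakpointPx = 818): string {
  const vw = Math.round(100 / imagesPerRow);
  const px = Math.round(containerMaxPx / imagesPerRow);
  return `(max-width: ${breakpointPx}px) ${vw}vw, ${px}px`;
}

sizesForRow(1); // "(max-width: 818px) 100vw, 540px"
sizesForRow(2); // "(max-width: 818px) 50vw, 270px"
```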

– Paul Rehkugler (@pr)

Look at this amazing work!

Mar 6, 2018 74 notes
#javascript #react

February 2018

NYC PHP Reboot - First up "Getting Specific About APIs" with Phil Sturgeon

tl;dr Come and join us for the NYC PHP meetup @ Tumblr on March 1st 2018.

When

March 1st 2018, 7pm - 9pm

Where

35 East 21st Street (21st & Broadway), New York, 10010

Details

The monthly NYC PHP Meetup is getting rebooted in 2018! The first meetup will be hosted at Tumblr HQ in New York City, with Phil Sturgeon as headline speaker and two or three lightning talks.

About Phil Sturgeon

When he’s not talking about falling off bikes, Phil Sturgeon is passionate about building great APIs. Sometimes scientists forget to label their units and they crash satellites into planets. Phil’s science teacher always told him to label his units, and JSON Schema is how you do that in an HTTP API, and even AMQP too! Come and learn about the future of APIs and other PHP topics.

We want you - for lightning talks

We are looking for 2-3 lightning (10-15 min) talks about anything PHP related. If you’re interested, please get in touch with Oli Griffiths via Meetup, or @oligriffiths on Twitter. This is a great way to get into the speaking community, and to test out material you might like to submit to conferences.

What to bring

  • A great attitude to all things PHP.
  • Laptops are not necessary but we will have wifi capabilities if you want to hack along.
  • Tumblr is generously providing pizza 🍕 and 🍺 beer/drinks.

How to RSVP

Please use Meetup to RSVP. Feel free to bring a guest if they’re not using Meetup, or encourage them to join so they can stay up-to-date.

- @oli

Feb 9, 2018 28 notes
#meetup #php

January 2018

How I review code

cyle:

Reviewing code is one of the most important parts of an engineer’s job at Tumblr, even more so than writing code. Our codebases are shared by hundreds of engineers, so it’s critical to make sure we’re not just writing the best code we can, but that the code being written can be understood by others. Taking the time to review someone else’s code is the most critical opportunity to ensure all of that is happening.

At Tumblr, every code change happens in a Pull Request on an internal Github instance. We have repositories for the PHP backend, our database schemas, our iOS (Swift/Obj-C) and Android (Java/Kotlin) mobile apps, infrastructure projects written in Go, C/C++, Lua, Ruby, Perl, and many other projects written in Scala, Node.js, Python, and more. All of our code repositories rely on authors to write Pull Requests and get approvals from their peers before merging their changes to the master branch and deploying to production where real people interact with it.

How I personally review code has changed considerably over my few years at Tumblr. Before working at Tumblr, I wrote code mostly by myself and reviewed code with a very small set of people. Shifting to a huge codebase with hundreds of contributors was a big change. Thankfully I’ve had some good teachers. I went from reviewing maybe one pull request a month to currently reviewing an average of 25 pull requests a week. Here are some of the principles that help me keep my reviews timely and helpful.

Review the code with its author in mind

The first thing I ask myself after a review has been requested of me is who wrote this? Are they a junior or senior engineer? Are they new to this codebase or a seasoned veteran? Have I ever reviewed their code before? Am I familiar with the project this code change contributes to?

When I’m reviewing the code of someone I work with closely, I probably know pretty well what their thinking was when they wrote it, and I have an idea of what experiences they’ve been through. Junior engineers sometimes need a little more hand-holding, which usually means giving them more help with code examples and references. Senior engineers sometimes need to be reminded that highly performant, abstract, or clever code is often difficult to read and understand later, which usually means asking them for more inline comments and documentation.

It’s also fundamentally important to review the code as if anyone could read the review you’re about to submit, not just the author. There are two main reasons for this. First, some people learn by reading the reviews that other engineers write; as a more junior engineer that’s exactly how I found out the most about the intricacies of Tumblr’s codebase. Also, in six months’ time it’s very likely you may be looking at this code again to figure out how it works. Having a helpful code review of it around can give some insight into the decisions that went into why it works the way it does.

Review the code with everyone else in mind, too

The core of my review, no matter who is writing the code change, centers around being able to understand the code itself and the motivations and context around it. Ideally, anyone should be able to pop into a pull request and find enough context to understand what the code change does, why it was done the way it was done, and how it works. This is especially important in an old, shared codebase, where someone three years from now may be looking at your PR to figure out why you chose to do what you did. If that context isn’t included, or if there aren’t at least links out to it, something is wrong. More detail is always better.

I don’t worry as much about code style or syntax itself, as we have automated processes to ensure that new or changed code conforms to our agreed-upon coding standards. Similarly to what I wrote about in how I code now, I look for code that is well-documented (both inline and externally), and code that is clear rather than clever. I’d rather read ten lines of verbose-but-understandable code than someone’s ninja-tastic one-liner that involves four nested ternaries. Especially if the person writing the code has been around the block a few times and been burned themselves by old, undocumented, clever code.

Once I feel like I can understand the code change, I try to put myself in the shoes of someone who doesn’t deal with this area of the codebase very often (which may be the case for me at the time!) and think of how to review the code to help make it clear for them. I try to think of someone new being hired six months from now, looking at this code, wondering how it works.

Understand the PR’s scope

Sometimes not everything can get done in one pull request. At Tumblr we try to keep our PRs small so they can be reviewed quickly and safely, rather than bundling a ton of hard-to-review work into a 5,000-line-change PR. Because of this, sometimes the work has to be broken up into chunks, with PRs that build a foundation and lead to future PRs with finished implementations.

Or, alternatively, it’s common for evergreen codepaths to have known issues or work that’s been ticketed for future sprints, so it’s become a good, common practice to leave a @todo in the code with the name of the ticket where that todo will get done. That way we can unblock code changes from having to be totally complete within one pull request.

Stay on top of the whole review process

The number one thing that helps me review code in a timely manner, and stay on top of updates about PRs, is email. I check every Github email I get; I make sure that I don’t get notified for everything that happens in the repo, but I do get every email relating to a PR I’m associated with. This helps me stay on top of every step in the review process, because it’s almost always a back-and-forth that ideally shouldn’t last more than a day.

At Tumblr, most of our reviewers are selected by automated round-robin assignment when the PR author is ready to receive reviews. That assignment triggers an email and subscribes me to everything that happens relating to that PR. From there, it’s on me to stay on top of my email and make sure that I not only allocate time to do the review as soon as possible, but follow up on the PR if I leave a review and the author updates it in response to my review.

Remember to be a human

The most important advice for reviewing code (and, in other ways, writing code) is to remember to be a human. Remember that the person who wrote the code you’re reviewing is also a human. Give them the benefit of the doubt. Be nice when you write a suggestion, or have a question, or find an edge case that they don’t seem to have covered. Even if they’re a seasoned veteran coder who has written bulletproof performant code for years, treat them like a person who makes mistakes sometimes. Even if they’re someone you work with every day and you feel comfortable cracking jokes at their expense, understand that a new person might not understand.

Remember that shared, living codebases are often hectic and strange, especially ones that have been around for a decade. Remember that sometimes things are in a rush, so you can only do the best you can. We can’t halt everything in the name of perfect code, but we should make sure that everyone is doing the best they can, whether we’re writing or reviewing code.

Jan 23, 2018 216 notes
#engineering #code review

September 2017

Flow and TypeScript

javascript:

One of the Core Web team’s goals at Tumblr is to reduce the number of runtime issues that we see in our React codebase. To help move some of those issues from runtime to compile time, I evaluated the two leading type systems, Flow and TypeScript, to see if they could give us more type safety. I did a bit of background reading about the differences between Flow and TypeScript to see what the community had to say about them.

Background Reading

TypeScript, Flow and the Importance of Toolchains over Tools by Ben Teese

This post claims that Flow and TypeScript are similar enough that you should choose whichever of them is easier to integrate with your other tools. For Angular development, it recommends using TypeScript; for React, Flow.

TypeScript vs. Flow by Marius Schulz

This post claims that both TypeScript and Flow are equally good.

Flow vs. Typescript by Jan Varwig

This post outlines the author’s experience using Flow in a React codebase. It advocates switching from Flow to TypeScript because of Flow’s unhelpful error messages, bad tooling, and propensity to spread untyped code. It also claims that most type annotations can be shared between Flow and TypeScript with only minor changes.

Type Systems for JavaScript by Oliver Zeigermann

This slideshow covers many differences in the philosophies and goals of TypeScript and Flow, and gives detailed explanations of the differences between the two type systems. It also explains IDE support and how to get access to third-party type definitions.

Lack of Consensus

It seems like many people have differing opinions about which type system is better for a React codebase. Because there wasn’t a broad consensus across the community, I decided to get some first-hand experience with each of these tools to see which one would be most practical and helpful for use at Tumblr.

Project Setup

I worked with a sample application to vet Flow and TypeScript. The application I used was Microsoft’s TypeScript React Starter. It uses a custom fork of create-react-app to get TypeScript set up. When testing out Flow, I used the standard version of create-react-app and used the source code from this exercise.

For the most part, Flow and TypeScript are basically interchangeable. I was able to reuse most of the source code between both projects with only minor changes. Here are some examples of changes I needed to make to get my TypeScript code working with Flow:

  • Flow requires that types are imported using import type where TypeScript re-uses import.
  • Some generic type constraints are different in redux’s type declarations between Flow and TypeScript, so I dropped the generic constraint for Flow.
  • Types cannot have the same name as constants, so I had to rename a few small things (see below).

Testing

After I got the project prepared I set up the following situations to see which tool performed better. These are my assumptions of the most common situations in which a type checker will help when writing React code on a day-to-day basis.

Handling an Unnecessary Case in a Switch

TypeScript

TypeScript realizes that 'not_real' is not a possible case for the switch.

Flow

Flow does not detect any issue.
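
For context, the kind of check being exercised here looks roughly like this in TypeScript (a simplified reconstruction based on the starter project’s action names, not its exact code):

```typescript
// A discriminated union of the reducer's valid actions.
type EnthusiasmAction =
  | { type: "INCREMENT_ENTHUSIASM" }
  | { type: "DECREMENT_ENTHUSIASM" };

function enthusiasm(level: number, action: EnthusiasmAction): number {
  switch (action.type) {
    case "INCREMENT_ENTHUSIASM":
      return level + 1;
    case "DECREMENT_ENTHUSIASM":
      return level - 1;
    // case "not_real":  // TypeScript rejects this case: "not_real" is not
    //   return level;   // comparable to the action's type union.
    default:
      return level;
  }
}
```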

Declaring Variables with Same Name as Type

TypeScript

TypeScript allows types to have the same name as constants, and it allows Command-clicking on the types to see their declarations.

Flow

Flow requires types and constants to have different names. In this case, I needed to rename the type to INCREMENT_ENTHUSIASM_T to appease Flow’s type checker.

Returning Incorrect Type from Function

TypeScript

[ts]
    Type '{ enthusiasmLevel: string; languageName: string; }' is not assignable to type 'StoreState'.
      Types of property 'enthusiasmLevel' are incompatible.
        Type 'string' is not assignable to type 'number'.

Flow 0.52

[flow] object literal (This type is incompatible with the expected return type of object type Property `enthusiasmLevel` is incompatible:)

Flow 0.53

[flow] property `enthusiasmLevel` of StoreState (Property not found in number) [flow] property `languageName` of StoreState (Property not found in number)

Missing Required Props When Instantiating a Component

TypeScript

TypeScript shows the error at the site where the properties are missing with the error:

[ts] Type '{}' is not assignable to type 'IntrinsicAttributes & Props'. Type '{}' is not assignable to type 'Props'. Property 'name' is missing in type '{}'.

Flow

Flow shows the error within the component where the property will be used, with no way to discover which call site is missing a property. This can be very confusing in codebases that have lots of reusable components. Flow displays this error:

[flow] property `name` of Props (Property not found in props of React element `Hello`)
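
To make the difference concrete, here’s a stripped-down stand-in for the component in question (modeled on the Hello component referenced in the errors above, with JSX omitted so the type behavior stands alone):

```typescript
interface Props {
  name: string;
  enthusiasmLevel?: number;
}

// A plain function stand-in for the Hello component.
function Hello(props: Props): string {
  const level = props.enthusiasmLevel ?? 1;
  return `Hello ${props.name + "!".repeat(level)}`;
}

Hello({ name: "Tumblr" }); // OK
// Hello({}); // TypeScript reports the missing `name` right here, at the
//            // call site; Flow reports it inside the component instead.
```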

Code Safety

TypeScript

TypeScript allows enforcing full type coverage on .ts files with the noImplicitAny flag in the tsconfig.
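
As a reference point, enabling that flag is a one-line addition to tsconfig.json (surrounding structure shown for context; other options omitted):

```json
{
  "compilerOptions": {
    "noImplicitAny": true
  }
}
```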

Flow

Flow provides a code coverage plugin so that you can see which lines are implicitly not typed.

Other Considerations

Flow has the most React community support and tooling, so there is much more documentation about how to get Flow and React working together. TypeScript is more popular with Angular developers. Choosing TypeScript may mean breaking from community standards, so we may run into more issues that don’t have a simple answer on Google.

Conclusion

I concluded that we should use TypeScript because it seems easier to work with. My experience seems to line up with this blog post. It has better error messages to debug type issues and its integration with VSCode makes coding more pleasant and transparent. If this ends up being the wrong choice later on, our codebase will be portable to Flow with some minor changes.

Shortly after I arrived at this conclusion, Flow 0.53 was released, along with a blog post on Medium touting its “even better support for React”. However, after running through the test cases above, I only found one case where Flow had improved its error messaging. TypeScript still seems like the more reliable, easier-to-use solution.

Further Reading

To continue our journey with TypeScript, I will need to integrate it into our codebase and teach it to the rest of our frontend developers. Getting started with TypeScript and React and Setting up a new Typescript 1.9 and React project look like they will be helpful articles when integrating TypeScript into our codebase. TypeScript Deep Dive looks like a great book for JavaScript developers who aren’t familiar with TypeScript.

– Paul Rehkugler (@pr)

Sep 12, 2017 116 notes
#javascript #tumblr engineering

August 2017

Building the Tumblr Neue Post Format

We’ve been looking at improving the posting and reblogging experience in our mobile apps for a long time. As many of our power users and public API consumers are aware, posts on Tumblr are stored and served in a sanitized HTML format. This choice made the most sense when Tumblr was originally built, when using Tumblr meant visiting via a web browser on your computer on the information superhighway back in 2007.

Storing post content primarily as HTML has remained our standard for ten years; there are a significant number of assumptions in our codebase about posts being primarily HTML. To compound this, when we want to change something about how posts are made or stored, we have to think in terms of the 150 billion posts on Tumblr and the billion new posts made every month. We have to spend a lot of time thinking about that scale whenever we consider how to make posting on Tumblr a better experience.

Over a year ago, Tumblr Engineering came up with a very ambitious idea: ditch HTML entirely and move to a brand new format for how posts are created and stored. HTML is fine, but its scope is limited as it was intended for the browser, long before the concept of mobile apps existed. Conversely, the JSON standard has been heavily favored by APIs and mobile development for years, and feels much cleaner and more flexible than HTML. We can apply an extensible format and schema with JSON more easily than we can with HTML.

With this in mind, we’ve chosen to write a brand new JSON-based specification for post content. We’re calling it the Tumblr Neue Post Format, or simply NPF. The NPF specification breaks apart post content into discrete content blocks, each defined by a type field. All of our existing post content easily fits into this kind of specification, affording backwards-compatibility with the billions of posts on Tumblr.

For example, right now when you add text to a post, we store and serve:

<p>Some text in a post!</p>

With NPF, the same thing is created and served this way:

{
  "type": "text",
  "text": "Some text in a post!"
}

Those two representations are fully interchangeable, but we begin to gain advantages with JSON for things HTML cannot do well, providing flexibility and extensibility for future integrations. The power of NPF really becomes critical when we want to build content blocks for Tumblr that cannot be easily represented with HTML, such as a physical location:

{
  "type": "location",
  "latitude": 40.739485,
  "longitude": -73.988402,
  "map_style": "quirky"
}
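
On the client side, a type-discriminated format like this maps naturally onto a discriminated union. Here’s a minimal TypeScript sketch using only the block shapes shown above (the type names and the describe helper are illustrative, not part of the actual NPF spec):

```typescript
interface TextBlock {
  type: "text";
  text: string;
  subtype?: string;
}

interface LocationBlock {
  type: "location";
  latitude: number;
  longitude: number;
  map_style?: string;
}

type ContentBlock = TextBlock | LocationBlock;

// Each block type gets its own handler, keyed off the `type` field;
// the compiler guarantees every member of the union is covered.
function describe(block: ContentBlock): string {
  switch (block.type) {
    case "text":
      return block.text;
    case "location":
      return `(${block.latitude}, ${block.longitude})`;
  }
}
```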

This new JSON specification also gives us the benefit of not having to worry as much about potential security risks in malicious HTML payloads in post content. Moving from HTML to JSON allows us to have safer, more injection-proof defaults, and prevents us from having to do heavy DOM parsing at runtime, which means improved performance of our backend and mobile apps. With NPF, posting and viewing posts on Tumblr should be considerably faster and safer.

Our work so far with the NPF specification has been to reach feature parity with the rich text editor available to Tumblr users on the web, as well as extend those basic options with new ones, such as fun new text styles:

{
  "type": "text",
  "text": "Oh, worm?",
  "subtype": "quirky"
}

Our initial release includes support for text blocks (with inline formatting), GIF search blocks, and image upload blocks. All of these options are available in our mobile apps via the Text, Quote, and Chat post forms, as well as when you reblog a post. Yes, you can now upload images in a reblog on mobile.
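Because every block carries a type field, client rendering reduces to a per-type dispatch, and unknown future block types can degrade gracefully instead of breaking the post. A hypothetical sketch (the image block’s url field here is an assumption for illustration, not the spec):

```javascript
// Hypothetical sketch of dispatch-based NPF rendering; not the actual
// app code. Unknown block types are skipped rather than breaking the post.
const renderers = {
  text: (block) => '<p>' + block.text + '</p>',
  image: (block) => '<img src="' + block.url + '">', // "url" is assumed
};

function renderPost(blocks) {
  return blocks
    .map((block) => {
      const render = renderers[block.type];
      return render ? render(block) : ''; // gracefully skip unknown types
    })
    .filter(Boolean)
    .join('\n');
}

console.log(renderPost([
  { type: 'text', text: 'Some text in a post!' },
  { type: 'hologram' }, // a future block type this client doesn't know
]));
// → <p>Some text in a post!</p>
```

This is also how older clients can stay forward-compatible: a block type they don’t recognize simply doesn’t render.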

Future releases of the mobile apps will continue to close the gap with our other post options as we build NPF support for link blocks, video upload blocks, third-party video and audio blocks, and more. We also plan on allowing third-party API consumers to view and create posts using the new specification sometime in the future.

- me (@cyle) and @noisysocks (with love for @keithmcknight who started the original NPF spec)

Aug 31, 2017 331 notes
#engineering #neue post format #new post forms #fancy font #finally

June 2017

Jetpants Integration Testing

Tumblr is a big user of MySQL, and MySQL automation at Tumblr is centered around a tool we built called Jetpants. Jetpants does an incredible job making risky operations safe and reliable, even fairly complex tasks like replacing failed master servers, or splitting a shard.

While Jetpants is an incredibly effective and valuable tool for Tumblr’s day-to-day operation, implementing a meaningful testing framework for it has remained very difficult. Integration testing at this level is very challenging. In this article I’ll go through these challenges and how we’ve tackled them at Tumblr.

Requirements

Jetpants operates under the assumption you’re managing MySQL daemons on a fully functional host, and that it can:

  • ssh to the target system
  • manage processes via service or systemctl commands
  • copy data around between systems
  • allocate spare servers from the asset management system, Collins

Right away this means we have some challenges with respect to infrastructure testing:

  • We need a Collins deployment
  • We need an environment with spare servers running MySQL
  • We need these spare servers to actually be servers, not light-weight Docker containers

Problems

For most of the life of Jetpants, these requirements were fulfilled using actual hardware in a testing pool in our datacenter. This wasn’t ideal, however. Running a test which allocates more replicas, or tests shard splitting, means using an extensive amount of real hardware that takes hours to reprovision. Testing changes to the Collins code meant talking to a real Collins deployment. What if we messed up?

This test strategy has all the hallmarks of manual testing. It doesn’t prevent regressions. Test coverage of our feature set is spotty, based on what was interesting at the time. Public contributors can’t run the tests.

It can also be very difficult for a new user to pick up Jetpants and Collins and get started. Jetpants requires Collins to be configured in certain ways that aren’t publicly documented. When I first built the testing environment, I had to regularly compare what I had to our actual deployment to figure out why Jetpants wasn’t working correctly.

Solution

During a Tumblr hackathon earlier this year, I devoted my time to developing an isolated, automatic testing system. We have since integrated this system directly into Jetpants and are using it in our day-to-day development and testing.

Our test framework is based on the NixOS test framework, the same framework NixOS uses to verify it is safe to release a new version. These tests use QEMU to start an isolated environment of at least one VM, and NixOS configuration to build the VMs.

Our testing framework adds lots of tooling on top to let us create robust tests. By default, a test has a running Collins instance, a master database server, and one replica. Simple options allow provisioning additional spares or additional replicas on that initial master.

Below is an example test we’ve written for performing a dead master promotion. This is where the current master database is dead, and we replace it with one of the existing replicas.

Here you can see what a test looks like, and how easily we can express the components and phases of our tests:

import ../make-test.nix  ({ helpers, ... }:
{
  name = "shard-dead-master-promotion";
  starting-slave-dbs = 2;

  test-phases = with helpers; [
    (jetpants-phase "shutdown-master" ''
      Jetpants.pool("POSTS-1-INFINITY").master.stop_mysql
    '')
    (phase "jetpants-promotion" ''
      (
        echo "YES" # Approve for promotion
        echo "YES" # Approve after summary output. Confirmation.
      ) | jetpants promotion --demote=10.50.2.10 --promote=10.50.2.11
    '')
    (assert-shard-master "POSTS-1-INFINITY" "10.50.2.11")
    (assert-shard-slave "POSTS-1-INFINITY" "10.50.2.12")
  ];
})

Running this test first provisions the base environment by:

  1. starting Collins
  2. starting 3 Linux systems running MySQL
  3. creating a master-replica relationship between one MySQL server as a master, and two MySQL servers as replicas, then loading in a default schema, and naming it the POSTS-1-INFINITY shard

Once all this preparation is done, our test phases begin.

First we shut down the current master, to simulate a dead master situation. We then run the jetpants promotion command, which will replace the old master (10.50.2.10) with the new master we have selected, 10.50.2.11. jetpants promotion prompts for confirmations, so we echo approvals to its stdin.

We continue by validating that the jetpants command did what we expected, and verifying the master and slaves.

Initial Results

Through this testing, we have already identified and fixed several race conditions and very old interface bugs. Nix’s functional nature allows us to create and tear down test VMs in minutes, as it isn’t a convergence-based configuration management tool. The stability of the test framework, and consistency of its results have allowed us to more aggressively change the underlying code in Jetpants while remaining confident our tools will work correctly during our day-to-day production maintenance.

Jetpants has been under continuous and vigorous development at Tumblr for many years now, and I’m excited about where the future will be taking MySQL automation at Tumblr.

- @beta

Jun 15, 2017 43 notes
#engineering #mysql #database #jetpants
Introducing Graywater for Android

Introducing Graywater, Tumblr’s framework for decomposing complex items in a RecyclerView list in order to improve scroll performance, reduce memory usage, and lay a foundation for a more composition-oriented approach to building lists. With Graywater, the app now scrolls faster and crashes less often, and it also gives us a solid foundation for building new features faster and better than before.

On every screen that displays posts, such as the dashboard, the Tumblr Android app uses a single adapter, customized per screen. This approach results in a complex adapter, and over time, our previous solution became difficult to manage and hard to reason about since there was no consistent place for screen-specific behavior.

Furthermore, each post type had its own layout and viewholder, which meant that once a user encountered a post type they hadn’t seen on that screen before, the entire post had to go through the inflate, layout, and draw process. Once offscreen, the post would take up a large chunk of memory in the RecyclerView pool.

Graywater solves this by rendering only the parts of a post that are visible and reusing the parts of a post that appear in other posts, such as the header and footer. By breaking up a large post into smaller components, the UI thread has to do less on each scroll. Even though there are more view types, each individual view type is smaller, so memory usage is lower.

For example, a photoset post may be composed of ten photos, one after another. In the previous architecture, a photoset layout with headers and footers would be inflated and the photo views added in afterwards. If the viewholder is recycled and the next photoset post only has one photo, the extra photo views are discarded. With Graywater, each individual photo view is recycled separately, which allows RecyclerView to reuse the photo views that appeared earlier in the photoset.
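The decomposition idea itself is language-agnostic (Graywater is a Java library for Android’s RecyclerView), so the core transform can be sketched in a few lines of plain JavaScript. The field names below are invented for illustration:

```javascript
// Illustration only, not Graywater's API: decompose each post into a
// flat list of small, individually recyclable parts keyed by view type.
function decompose(post) {
  return [
    { viewType: 'header', blog: post.blog },
    ...post.photoUrls.map((url) => ({ viewType: 'photo', url: url })),
    { viewType: 'footer', notes: post.notes },
  ];
}

const feed = [
  { blog: 'staff', photoUrls: ['a.jpg', 'b.jpg'], notes: 12 },
  { blog: 'engineering', photoUrls: ['c.jpg'], notes: 3 },
];

// The list UI iterates over one flat list of parts; a photo view freed
// by the first post can be reused by the second, whatever its photoset size.
const items = feed.flatMap(decompose);
console.log(items.map((item) => item.viewType).join(','));
// → header,photo,photo,footer,header,photo,footer
```

The recycler then pools views by the small part types (header, photo, footer) instead of by whole-post layouts, which is where the reuse wins come from.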

This idea is based on Facebook’s post on a faster news feed and Components for Android, which have been open-sourced as Litho.

Graywater differs from other RecyclerView libraries by being small (a single file!) and flexible enough to work within your model architecture. For libraries like Epoxy and Groupie to accomplish sub-item recycling, complex items like posts need to be decomposed into smaller viewmodels beforehand. For Litho to flatten view hierarchies and perform fine-grained recycling, existing XML layouts need to be converted to layout specs.

By converting to Graywater, we’ve been able to reduce OutOfMemory errors by 90% and dramatically improve scroll performance. It is now much easier to add new item types that are composed of preexisting post components. We have also migrated screen-specific logic to the screen itself by injecting the customized component into the adapter. By open-sourcing Graywater, we’re hoping the Android community will achieve similar performance and architecture gains, and we’re excited to hear what the community builds next!

- @dreamynomad

Jun 7, 2017 120 notes
#android #engineering #graywater

April 2017

Tumblr Themes & React and Redux: Part 1 - Setup and the Initial State

As a platform that prides itself on being a home for artists and creatives alike, it only makes sense that we allow our users to fully customize their Tumblrs to express themselves. Here at Tumblr, the world is your oyster not only in terms of looks but also in how you create your theme. I wanted to demonstrate how you too can develop a theme using Redux and React. Since there are plenty of docs and tutorials on those libraries themselves, I will briefly describe how I got them to work with the Tumblr theme engine, and share some handy tips that made developing more efficient and more enjoyable.

If you follow the ever changing landscape of JavaScript, then you’ve at least heard of these two libraries. Prior to building the Post-It-Forward theme, I only knew of them by name but never got the chance to actually use them. Developers couldn’t get enough of how React made it easy to create and reuse components. Many also praise how elegantly React manages and renders views, especially when paired with Redux for state management. All of this sounded great. I wanted to turn this project into a learning experience. I thought, “why not?” and gave it a shot.

An Extremely Brief Introduction to Tumblr Themes

The way themes work on Tumblr is that we have a theme engine that provides special types of operators. These operators insert dynamic data, such as your Tumblr’s title or description, or are blocks that serve as conditionals for rendering a block of HTML, like the “Next Page” link.

My HTML started off a little something like this:

<!DOCTYPE html>
<html>
    <head>
        <title>{Title}</title>
        <style></style>
    </head>
    <body>
        <div id="post-it-forward-root"></div>
    </body>
</html>

As you can see, {Title} is a variable that will return the title of the Tumblr. The point of entry for this theme is the <div> element with the #post-it-forward-root ID. In your index.js file you’ll reference this DOM element in your ReactDOM.render() method. If you want to learn more about the theme engine, head over to our Theme Docs.

Creating the Initial State

To get things started, we need to create an initial state. How do we introduce this initial state if we have to rely on the theme engine to give us all our data? How do we get the data from HTML land to JS land? Well, here’s one way of doing it:

<script type="text/javascript">
    (function(root) {
        var ensureString = function(str) {
            return !str ? '' : str;
        };

        var basicVariables = {
            title: ensureString({JSTitle}),
            name: ensureString({JSName}),
            description: ensureString({JSDescription}),
            metaDescription: ensureString({JSMetaDescription}),
            blogUrl: ensureString({JSBlogURL}),
            rss: ensureString({JSRSS}),
            favicon: ensureString({JSFavicon}),
            customCss: ensureString({JSCustomCSS}),
            isPermalinkPage: !!ensureString(/*{block:PermalinkPage}*/true/*{/block:PermalinkPage}*/),
            isIndexPage: !!ensureString(/*{block:IndexPage}*/true/*{/block:IndexPage}*/),
            /*{block:PostTitle}*/
            postTitle: ensureString({JSPostTitle}),
            /*{/block:PostTitle}*/
            /*{block:PostSummary}*/
            postSummary: ensureString({JSPostSummary}),
            /*{/block:PostSummary}*/
            portraitUrl16: ensureString({JSPortraitURL-16}),
            portraitUrl24: ensureString({JSPortraitURL-24}),
            portraitUrl30: ensureString({JSPortraitURL-30}),
            portraitUrl40: ensureString({JSPortraitURL-40}),
            portraitUrl48: ensureString({JSPortraitURL-48}),
            portraitUrl64: ensureString({JSPortraitURL-64}),
            portraitUrl96: ensureString({JSPortraitURL-96}),
            portraitUrl128: ensureString({JSPortraitURL-128}),
            copyrightYears: ensureString({JSCopyrightYears}),
            isSearchPage: !!ensureString(/*{block:SearchPage}*/true/*{/block:SearchPage}*/),
            searchQuery: ensureString({JSSearchQuery}),
            safeSearchQuery: ensureString({JSURLSafeSearchQuery}),
            searchPlaceHolder: ensureString('{lang:Search Blog}'),
            noSearchResults: !!ensureString(/*{block:NoSearchResults}*/true/*{/block:NoSearchResults}*/),
        };

        root.tumblrData = {
            basicVariables: basicVariables,
        };
    })(this);
</script>

This creates a tumblrData attribute on the browser’s window object.

Sometimes the theme engine returns nothing for a particular variable if it’s not available. For example, if I made a post that does not have a title, the final root.tumblrData object will not have postTitle as a key. Sometimes the key will be available but the theme engine returned an empty value for it. For those cases, I created a helper method called ensureString() that turns those empty values into empty strings. Sometimes you might need a boolean value instead. In those cases, I pass the theme engine’s conditional block output through the helper method and coerce the result to a boolean with !!.

Once you’ve set up your initial state, make sure you place this script tag before the script tag that references the rest of your code (which should be compiled, minified, and uploaded through the asset uploader the Tumblr text editor provides). This ensures that tumblrData is accessible through the window object by the time the React app gets initiated.

tumblrData should look something like this:

const tumblrData = {
    basicVariables: {
        blogUrl: "https://mentalhealthquilt.tumblr.com/",
        copyrightYears: "2016–2017",
        customCss: "",
        description: "Mental Health Quilt",
        favicon: "https://68.media.tumblr.com/avatar_c402eedfb9d5_128.png",
        isIndexPage: true,
        isPermalinkPage: false,
        isSearchPage: false,
        metaDescription: "Mental Health Quilt",
        name: "mentalhealthquilt",
        noSearchResults: false,
        portraitUrl16: "https://68.media.tumblr.com/avatar_c402eedfb9d5_16.png",
        portraitUrl24: "https://68.media.tumblr.com/avatar_c402eedfb9d5_24.png",
        portraitUrl30: "https://68.media.tumblr.com/avatar_c402eedfb9d5_30.png",
        portraitUrl40: "https://68.media.tumblr.com/avatar_c402eedfb9d5_40.png",
        portraitUrl48: "https://68.media.tumblr.com/avatar_c402eedfb9d5_48.png",
        portraitUrl64: "https://68.media.tumblr.com/avatar_c402eedfb9d5_64.png",
        portraitUrl96: "https://68.media.tumblr.com/avatar_c402eedfb9d5_96.png",
        portraitUrl128: "https://68.media.tumblr.com/avatar_c402eedfb9d5_128.png",
        rss: "https://mentalhealthquilt.tumblr.com/rss",
        safeSearchQuery: "",
        searchPlaceHolder: "Search mentalhealthquilt",
        searchQuery: "",
        title: "Mental Health Quilt",
    },
}

Now we have the data that the theme engine gave us in a format that React and Redux can work with.
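From here, window.tumblrData can seed the Redux store’s initial state. Below is a minimal sketch; the SET_TITLE action is a made-up example, and in the browser you would pass this reducer to Redux’s createStore, which is omitted here so the snippet runs standalone:

```javascript
// Minimal sketch: seed the reducer's default state from window.tumblrData.
// The fallback object lets this snippet run outside the browser.
const initialState = (typeof window !== 'undefined' && window.tumblrData) || {
  basicVariables: { title: 'Mental Health Quilt', isIndexPage: true },
};

// "SET_TITLE" is an invented example action, not part of the theme engine.
function rootReducer(state = initialState, action = {}) {
  switch (action.type) {
    case 'SET_TITLE':
      return {
        ...state,
        basicVariables: { ...state.basicVariables, title: action.title },
      };
    default:
      return state;
  }
}

console.log(rootReducer(undefined, {}).basicVariables.title);
// → Mental Health Quilt
```

In the actual theme you’d then call createStore(rootReducer, window.tumblrData) and hand the store to your root component via react-redux’s Provider.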

If you are new to these libraries, I highly recommend following the simple Todo App Tutorial that is on the Redux website. They do a wonderful job of explaining the process as you build the app.

Helpful Tips

Setting up a local server will make developing much faster than re-uploading assets through the editor. If you’re using both the “webpack” and “webpack-dev-server” packages, you can place something like this under scripts in your package.json file:

In your package.json file

...
"scripts": {
    "local-server": "NODE_ENV=development webpack-dev-server --config path/to/webpack.config.js --port=3000 --inline --hot"
},
...

To run that script, in the terminal you will type this command:

> npm run local-server

In the Tumblr editor, be sure to replace your script and style tags so they reference these local files, like so:

<!DOCTYPE html>
<html>
        <head>
                <title>{Title}</title>
                <link rel="stylesheet" type="text/css" href="http://localhost:3000/path/to/prod/index.css">
        </head>
        <body>
                <div id="post-it-forward-root"></div>
                <script type="text/javascript">
                        // where the tumblrData gets created
                </script>
                <script src="http://localhost:3000/path/to/prod/index.js"></script>
        </body>
</html>

Once you run that script, it’ll enable live reload, so every time you save a .js/.css/.scss/etc. file, it’ll rebuild the assets and refresh your Tumblr blog for you. This is way faster than having to re-upload your assets every time you make a change, no matter how small. Just remember to return your script and style references to the uploaded assets when you’re done working. Localhost is only for development.

You could also add the Redux logger middleware to your project during development so that you can view how the state changes as you fire off different actions. For more information on how to set this up, the Redux Logger GitHub repo is a great resource.

Summary

Building a Tumblr theme using Redux and React is possible! Not only is there a workflow that makes development much faster, but it’s also a great way to flex your web development muscles. You can add more to the user experience of your Tumblr now that you have the world of JavaScript at your fingertips. Go forth and make some awesome themes!

Stay tuned for part 2, which will cover pagination.

- @0xmichelle

Apr 6, 2017 377 notes
#engineering #react #redux #theme engine #javascript #tumblr themes

March 2017

Mar 23, 2017 35 notes
#swift #prototyping

February 2017

Feb 23, 2017 114 notes
#swift #networking
How I Code Now

cyle:

I’ve learned a lot about how to be a better engineer after almost two years of writing code at Tumblr. The majority of Tumblr is built on a few massive shared codebases, so I’ve learned that the strength of the product is only as good as our collective ability to write code for each other. And we ship a lot of code all the time—we have engineers writing code, getting it reviewed, and deploying it to production within the first few days of being on the job.

I’ve found that coding at scale is more social than technical, and this is a very good thing. When writing code in a large scale environment with a codebase shared by more than a handful of people, I’m not writing code just for the computer to read anymore: I’m writing code for the dozens of other engineers who share the codebase with me. At some companies, and for some open source projects, a codebase can be shared by hundreds or thousands of people; your experience with another person may be solely through their code or code review. With this in mind, it’s extremely important to have good, humanist coding practices.

A humanist coding practice means my code is easy to read by anyone who shares the codebase with me. My code is explained not only by the way it’s written (the literal syntax, structure, and variable naming) and the unit tests I’ve written for it, but also by documenting it inline with comments. In my world, there can never be too many comments explaining how something works. Documenting the internals of my code is just as important as documenting its interface. Too often engineers focus their documentation effort into making clear the way to use their code without spending any time documenting how their code actually works.

It’s similar to the idea of “good taste” when coding. While it’s important to keep complexity low and efficiency high, it’s even more important to keep readability (by humans) high. If I write code in a shared codebase that’s highly performant, but nobody else can understand it, is it really all that useful at the end of the day? Almost never. While it’s true that I’m writing code to be performant on a machine, my first priority should be to make sure my code is maintainable by other people. Every engineer needs to be able to take a vacation and feel confident that someone else can fix a bug in their code.

Keep reading

Feb 7, 2017 233 notes
#programming #engineering

December 2016

Golang and The Tumblr API

You’ve been asking for an official Golang wrapper for the Tumblr API. The wait is over! We are thrilled to unveil two new repositories on our GitHub page that can be your gateway to the Tumblr API in your Go project.

  • API Endpoints Wrapper
  • API Client

Why Two Repos

We’ve tried to structure the wrapper to be as flexible as possible, so we’ve put the meat of the library in one repo: it contains the code for creating requests and parsing the responses, and it interacts with an interface that implements methods for making basic REST requests.

The second repo is an implementation of that interface with external dependencies used to sign requests using OAuth. If you do not wish to include these dependencies, you may write your own implementation of the ClientInterface and have the wrapper library use that client instead.

Handling Dynamic Response Types

Go is a strictly typed language, and that includes the data structures you unmarshal JSON responses into. This means the library could have surfaced response data as a generic map[string]interface{}, which would require the engineer to further cast values into an int, a string, another map[string]interface{}, and so on. The API Team decided to make it more convenient for you by providing typed response values from the various endpoints.

If you have used the Tumblr API, you’ll know that our Post object is highly variant in what properties and types are returned based on the post type. This proved to be a challenge in codifying the response data. In Go, you’d hope to simply be able to define a dashboard response as an array of posts:

type Dashboard struct {
  // ... other properties
  Posts []Post `json:"posts"`
}

However, this would mean we’d need a general Post struct type with the union of all possible properties on a Post across all post types. Further complicating this approach, we found that some properties with the same name have different types across post types. The highest-profile example: an Audio post’s player property is a string of HTML, while a Video post’s player property is an array of embed strings. Of course, we could type any property with such conflicts as interface{}, but then we’re back to the same problem as before: the engineer has to cast values to use them effectively.

Doing Work So You Don’t Have To

Instead, we decided any array of posts could in fact be represented as an array of PostInterfaces. When decoding a response, we scan through each post in the response and create a correspondingly typed instance in an array, and return the array of instances as an array of PostInterfaces. Then, when unmarshalling the JSON into the array, the data fills in the proper places with the proper types. The end user can then interact with the array of PostInterface instances by accessing universal properties (those that exist on any post type) with ease. If they wish to use a type-specific property, they can cast an instance to a specific post type once, and use all the typed properties afterward.

This can be especially convenient when paired with Go’s HTML templating system:

snippet.go

// previously, we have some `var response http.ResponseWriter`
client := tumblrclient.NewClientWithToken(
    // ... auth data
)

if t, err := template.New("posts").ParseFiles("post.tmpl"); err == nil {
    if dash, err := client.GetDashboard(); err == nil {
        for _, p := range dash.Posts {
            t.ExecuteTemplate(response, p.GetSelf().Type, p.GetSelf())
        }
    }
}

post.tmpl

{{define "text"}}
<div>
    {{.Body | html}}
</div>
{{end}}
{{define "photo"}}
<div>
    Post: {{.Type}}
</div>
{{end}}
{{define "video"}}
<div>
    Post: {{.Type}}
</div>
{{end}}
{{define "audio"}}
<div>
    Post: {{.Type}}
</div>
{{end}}
{{define "quote"}}
<div>
    Post: {{.Type}}
</div>
{{end}}
{{define "chat"}}
<div>
    Post: {{.Type}}
</div>
{{end}}
{{define "answer"}}
<div>
    Post: {{.Type}}
</div>
{{end}}
{{define "link"}}
<div>
    Post: {{.Type}}
</div>
{{end}}

This is a rudimentary example, but the convenience and utility are fairly evident. You can define blocks to be rendered, named by the post’s type value. Each block can then assume the object in its scope is a specific post struct and access the typed values directly.

Wrapping Up

This is a v1.0 release; our goal was to ship a limited-scope but flexible utility for developers to use. We plan on implementing plenty of new features and improvements in the future, and on making sure that improvements to the API are brought into the wrapper. Hope you enjoy using it!

Dec 20, 2016 114 notes
#golang #api

November 2016

Command Line Tumblr

A Totally New Interface for Tumblr?

Today, Tumblr is accessible via mobile, web, or API. But what if you’re a Linux enthusiast? Nerds like you can now access Tumblr entirely via the command line.

“What about images?” you ask. Displaying an image in the command line is nothing new. There are already a bunch of existing libraries doing this, namely aalib, libcaca, and the super low-level ncurses. The most interesting project built on top of those, a p2p video chat, came out of a hackathon.

I picked a much higher-level library called blessed, for the least effort to achieve the best-looking interface. As you may have seen, blessed is JavaScript-based and very fancy. It provides almost every widget you might need to build an awesome dashboard.

After figuring out the right library, most of the work is already done. To show Tumblr in the command line, we just need to:

  • Connect the API to fetch image URLs.
  • Do some front-end design to show a Tumblrish dashboard.

What? We still need code?…

var post = blessed.box({
    parent: dashboard,
    top: '15%',
    left: 'center',
    width: '40%',
    height: '80%',
    draggable: true,
    border: {
        type: 'line'
    },
    style: {
        fg: 'white',
        bg: 'white',
        border: {
            fg: '#f0f0f0'
        }
    },
});

var load_post = function() {
    if (index < 0 || index >= posts.length)
        return;

    post.free();
    var post_data = posts[index];
    /** avatar */
    blessed.ANSIImage({
        parent: post,
        top: 0,
        left: '-30%',
        width: '20%',
        height: '20%',
        file: post_data.avator,
    });

    /** posts */
    var count = post_data.count;
    // TODO: switch all sizes
    for (var i = 0; i < count; i++) {
        var offset = 100/count * i;
        var width = 100/count;
        blessed.ANSIImage({
            parent: post,
            left: offset + '%',
            width: width + '%',
            height: '98%',
            file: post_data.data[i]
        });
    }

    screen.render();
}

Blessed already provides lots of high-level APIs. For example, to display a post as an image, all the input you need is an image URL; just call:

blessed.ANSIImage({
    ...
    file: image_url/local_file
})

It supports PNG and GIF, and if you’d like to show a video, blessed even provides a video widget. Hypothetically speaking, we could use this library to build almost all the components in today’s Tumblr dashboard. Note that this demo doesn’t connect to the real API, but I suppose that would be pretty easy. There’s also a memory optimization issue that might need to be addressed if we really want to use this library for something.

Nov 17, 2016 272 notes
#command line
PHP 7 at Tumblr

At Tumblr, we’re always looking for new ways to improve the performance of the site. This means things like adding caching to heavily used codepaths, testing out new CDN configurations, or upgrading underlying software.

Recently, in a cross-team effort, we upgraded our full web server fleet from PHP 5 to PHP 7. The whole upgrade was a fun project with some very cool results, so we wanted to share it with you.

Timeline

It all started as a hackday project in the fall of 2015. @oli and @trav got Tumblr running on one of the PHP 7 release candidates. At this point in time, quite a few PHP extensions did not have support for version 7 yet, but there were unofficial forks floating around with (very) experimental support. Nevertheless, it actually ran!

This spring, things were starting to get more stable and we decided it was time to start looking into upgrading more closely. One of the first things we did was package the new version up so that installation would be easy and consistent. In parallel, we ported our in-house PHP extensions to the new version so everything would be ready and available from the get-go.

A small script was written that would upgrade (or downgrade) a developer’s server. Then, during the late spring and the summer, tests were run (more on this below), PHP package builds were iterated on, and performance was measured and evaluated. As things stabilized we started roping in more developers to do their day-to-day work on PHP 7-enabled machines.

Finally, at the end of August we felt confident in our testing and rolled PHP 7 out to a small percentage of our production servers. Two weeks later, after incrementally ramping up, every server responding to user requests was updated!

Testing

When doing upgrades like this it’s of course very important to test everything to make sure that the code behaves in the same way, and we had a couple of approaches to this.

Phan is a static analyzer for PHP. In this project, we used it to find code in our codebase that would be incompatible with PHP 7. It made it very easy to find the low-hanging fruit and fix those issues.

We also have a suite of unit and integration tests that helped a lot in identifying what wasn’t working the way it used to. And since normal development continued alongside this project, we needed to make sure no new code was added that wasn’t PHP 7-proof, so we set up our CI tasks to run all tests on both PHP 5 and PHP 7.

Results

So at the end of this rollout, what were the final results? Well, two things stand out as big improvements for us: performance and language features.

Performance

When we rolled PHP 7 out to the first batch of servers, we obviously kept a very close eye on the various graphs we have to make sure things were running smoothly. As we mentioned above, we were looking for performance improvements, but the real-world result was striking. Almost immediately, we saw latency drop by half and the CPU load on the servers decrease by at least 50%, often more. Not only were our servers serving pages twice as fast, they were doing it using half the amount of CPU resources.

These are graphs from one of the servers that handle our API. As you can see, the latency dropped to less than half, and the load average at peak is now lower than its previous lowest point!

Language features

PHP 7 also brings a lot of fun new features that can make the life of the developers at Tumblr a bit easier. Some highlights are:

  • Scalar type hints: PHP has historically been fairly loose about type safety; PHP 7 introduces scalar type hints, which ensure that values passed around conform to specific types (string, bool, int, float, etc.).
  • Return type declarations: With PHP 7, functions can have explicit return types that the language will enforce. This reduces the need for boilerplate code that manually checks the return values of functions.
  • Anonymous classes: Much like anonymous functions (closures), anonymous classes are constructed at runtime and can simulate a class, conforming to interfaces and even extending other classes. These are great for utility objects like logging classes, and useful in unit tests.
  • Various security & performance enhancements across the board.
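To make these features concrete, here is a small, self-contained sketch (not Tumblr code; all names are invented for illustration) exercising scalar type hints, a return type declaration, and an anonymous class:

```php
<?php
declare(strict_types=1);

// Scalar type hints and a return type declaration: PHP 7 enforces both,
// throwing a TypeError if the contract is violated under strict_types.
function addScores(int $a, int $b): int
{
    return $a + $b;
}

// A minimal logger interface for the anonymous class below.
interface Logger
{
    public function log(string $message): string;
}

// Anonymous classes are handy for one-off utility objects, e.g. in tests.
$logger = new class implements Logger {
    public function log(string $message): string
    {
        return '[log] ' . $message;
    }
};

echo addScores(2, 3) . "\n";                     // 5
echo $logger->log('upgraded to PHP 7') . "\n";   // [log] upgraded to PHP 7
```

Calling `addScores("2", 3)` under `strict_types=1` would throw a `TypeError` instead of silently coercing the string, which is exactly the kind of bug class these hints eliminate.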

Summary

PHP 7 is pretty rad!

Nov 10, 2016 498 notes
#tumblr engineering #php #php7

October 2016

The Art of Open-Sourcing (medium.com)

effectiveandroid:

An article by @vanillaburritos as a reflection of her experience open-sourcing PermissMe at Tumblr. Give it a read!

Oct 25, 2016 31 notes
#android #open source #permissme
Juggling Databases Between Datacenters

Recently we went through an exercise where we moved all of our database masters between data centers. We planned on doing this online with minimal user impact. Obviously when performing this sort of action there are a variety of considerations such as cache consistency and other pieces of shared state in stores like HBase, but the focus of this post will be primarily on MySQL.

During this move we had a number of constraints. As mentioned above, this was to be online while serving production traffic, with minimal user impact. In aggregate we service hundreds of thousands of database queries per second. Additionally, we needed to encrypt all data transferring between data centers. MySQL replication supports encryption, but connections to the servers themselves present several challenges. Specifically, from a performance standpoint, the handshake to establish a connection across a WAN can impact latency if there is significant connection churn. Additionally, servicing read queries across a backhaul link adds latency, which is never desirable.

We decided to tackle these issues in several ways. We were able to leverage a number of existing features of our applications and infrastructure, as well as develop new automation to fill gaps in functionality. Our configuration and applications, across various runtimes, were able to support a read/write split (which may seem obvious to some, but isn’t always easy to accomplish in every scenario). We used the read/write split, along with encrypted replication, to provide a local read replica. Some runtimes can set up a persistent encrypted connection to a remote master, which serviced read requests in those cases, as the per-connection latency was amortized over a large number of queries. For runtimes with a high connection churn rate, such as PHP, we used a MySQL proxy, ProxySQL, which provided persistent, encrypted connections while meeting our performance requirements. We built automation to deploy proxies for numerous database pools, servicing thousands of requests per second, per pool.

When performing the cutover, our workflow was as follows. In each data center, there was a config which pointed to a local read slave, a remote master, and a local proxy with the master (remote or local) as a backend. When moving masters between data centers, our database automation, Jetpants (new release coming soon!), reparented all replicas, and our automation updated the proxy backend to point to the new master. This resulted in only seconds of read-only state per database pool and minimal user impact.
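ProxySQL’s admin interface is itself MySQL-compatible, so repointing a proxy’s backend at a new master comes down to updating a table and loading the change into the running proxy. A sketch of the idea, not our actual automation; the hostname and hostgroup ID below are invented for illustration:

```sql
-- Connect to the ProxySQL admin interface (commonly port 6032), then:
UPDATE mysql_servers
SET hostname = 'db-master.new-dc.example.com'
WHERE hostgroup_id = 0;          -- writer hostgroup (assumed)

-- Apply the change to the running proxy, then persist it.
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

Because the change is applied to the runtime configuration atomically, clients connected through the proxy see only a brief read-only window rather than connection errors.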

More coming soon!

Oct 4, 2016 39 notes
#databases #mysql #proxysql #jetpants #datacenters

September 2016

Introducing Laphs

The Core Web team at Tumblr is proud to announce the release of Laphs (Live Anywhere Photos - LAPhs; get it?), an open source JavaScript library for implementing Apple’s Live Photos on the web.

We use Laphs to support Live Photos on the web at Tumblr and now you can too! Check it out on github and npm and let us know what you think.

Happy coding!

Sep 20, 2016 98 notes
#open source #javascript #live photos #apple

August 2016

Categorizing Posts on Tumblr

Millions of posts are published on Tumblr every day. Understanding the topical structure of this massive collection of data is a fundamental step in connecting users with the content they love, as well as in answering important philosophical questions, such as “cats vs. dogs: who rules on social networks?”

As a first step in this direction, we recently developed a post-categorization workflow that aims to associate posts with broad-interest categories, where the list of categories is defined by Tumblr’s on-boarding topics.

Methodology

Posts are heterogeneous in form (video, images, audio, text) and consist of semi-structured data (e.g. a textual post has a title and a body, but the actual textual content is unstructured). Luckily, our users do a great job of summarizing the content of their posts with tags. As the distribution below shows, more than 50% of the posts are published with at least one tag.

However, tags define micro-interest segments that are too fine-grained for our goal. Hence, we editorially aggregate tags into semantically coherent topics: our on-boarding categories.

We also compute a score that represents the strength of each (tag, topic) affiliation, based on approximate string matching and semantic relationships.

Given this input, we can compute a score for each pair (post, topic) as:

    score(p, t) = Σ_{f ∈ tag-features(p)} q(f, p) · w(f, t)

where

  • w(f,t) is the (tag, topic) affiliation score, or zero if the pair (f,t) is not in the dictionary W.
  • tag-features(p) contains features extracted from the tags associated with the post: raw tag, “normalized” tag, n-grams.
  • q(f,p) is a weight in [0,1] that takes into account the source of the feature (f) in the post (p).
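The score is just a weighted sum over a post’s tag features. A minimal, hypothetical sketch (the real pipeline is not shown here; the function name, dictionary entries, and weights are invented for illustration):

```php
<?php
declare(strict_types=1);

/**
 * score(p, t) = sum over f in tag-features(p) of q(f, p) * w(f, t)
 *
 * $features: tag-features(p), as feature => q(f, p) weight in [0, 1]
 * $w:        the dictionary W, as tag => [topic => affiliation score]
 */
function postTopicScore(array $features, array $w, string $topic): float
{
    $score = 0.0;
    foreach ($features as $feature => $q) {
        // w(f, t) is zero when the pair (f, t) is not in the dictionary W.
        $score += $q * ($w[$feature][$topic] ?? 0.0);
    }
    return $score;
}

// Invented example: a post tagged #kitten and #caturday, scored for "Cats".
$w = [
    'kitten'   => ['Cats' => 0.9],
    'caturday' => ['Cats' => 0.7, 'Humor' => 0.2],
];
$features = ['kitten' => 1.0, 'caturday' => 0.5];

echo postTopicScore($features, $w, 'Cats') . "\n"; // 1.0*0.9 + 0.5*0.7 = 1.25
```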

The drawback of this approach is that it relies heavily on the dictionary W, which is far from complete.

To address this issue we exploit another source of data: RelatedTags, an index that provides a list of similar tags by exploiting co-occurrence patterns. For each pair (tag, topic) in W, we propagate the affiliation with the topic to its top related tags, smoothing the affiliation score w to reflect the fact that these new (tag, topic) entries could be noisy.
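One way to sketch this propagation step (hypothetical code, not the production pipeline; the damping factor, similarity values, and all names below are invented) is to copy each (tag, topic) entry in W onto the tag’s related tags with a discounted score:

```php
<?php
declare(strict_types=1);

// Propagate (tag, topic) affiliations in W to each tag's related tags,
// discounting by the co-occurrence similarity and a damping factor to
// reflect that propagated entries may be noisy. Sketch only.
function expandDictionary(array $w, array $relatedTags, float $damping = 0.5): array
{
    foreach ($w as $tag => $topics) {           // iterates a copy of $w
        foreach ($relatedTags[$tag] ?? [] as $related => $similarity) {
            foreach ($topics as $topic => $score) {
                $propagated = $score * $similarity * $damping;
                // Keep the strongest affiliation seen for (related, topic).
                $w[$related][$topic] = max($w[$related][$topic] ?? 0.0, $propagated);
            }
        }
    }
    return $w;
}

$w = ['kitten' => ['Cats' => 0.9]];
$relatedTags = ['kitten' => ['cat' => 0.8, 'meow' => 0.6]];

$expanded = expandDictionary($w, $relatedTags);
// 'cat' now carries a smoothed Cats score: 0.9 * 0.8 * 0.5 = 0.36
```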

This computation is followed by a filtering phase to remove (post, topic) entries with a low confidence score. Finally, the category with the highest score is associated with the post.

Evaluation

This unsupervised approach to post categorization runs daily on posts created the day before. The next step is to assess the alignment between the predicted category and the most appropriate one.

The results of an editorial evaluation show that our framework is able to identify a relevant category in most cases, but they also highlight some limitations, such as limited robustness to polysemy.

We are currently looking into improving overall performance by exploiting NLP techniques for word embeddings and by integrating the extraction and analysis of visual features into the processing pipeline.

Some fun with data

What is the distribution of posts published on Tumblr? Which categories drive the most engagement? To answer these and other questions, we analyzed the categorized posts over a period of 30 days.

Almost 7% of categorized posts belong to Fashion, with Art as runner up.

The category that drives the most engagement is Television, which accounts for over 8% of the reblogs on categorized posts.

However, normalizing by the number of posts published, the category with the highest average engagement per post is Gif Art, followed by Astrology.

Last but not least, here are the stats you all have been waiting for!! Cats are winning on Tumblr… for now…

Aug 2, 2016 532 notes
#tags #cats vs dogs #post categorization #data science

July 2016

Flux and React in Data Lasso

javascript:

TL;DR

Flux helped bring the complexity of Data Lasso down, replacing messy event bus structure. React helped make the UI more manageable and reduce code duplication. More below on our experience.

Keep reading

Jul 26, 2016 37 notes
#javascript #data lasso #react #flux

cocoa:

WWDC 2016 has come and gone, but we wanted to take the time to call out the new ideas that Apple unveiled which we think are important as developers, as well as things to make our product teams aware of for future launches.

WWDC has slowly been returning to a software- and developer-focused event over these past few years, and this year was no exception. Many new technologies, tools, and ideas were introduced for developers to plug into to enrich both their applications and the Apple ecosystem in general. So let’s get into what we saw and enjoyed.

Keep reading

Jul 5, 2016 29 notes

June 2016

tumblr.js update

javascript:

We just published v1.1.0 of the tumblr.js API client. We didn’t make too much of a fuss when we released a bigger update in May, but here’s a quick run-down of the bigger updates you may have missed if you haven’t looked at the JS client in a while:

  • Method names on the API client are more consistent. For example, blogInfo, blogPosts, and blogFollowers rather than blogInfo, posts, and followers.
  • Customizable API baseUrl. We use this internally when we’re testing new API features during development, and it’s super convenient.
  • data64 support, which is handy for those times when you have a base64-encoded image just lying around and you want to post it to Tumblr.
  • Support for Promise objects. It’s way more convenient, if you ask me. Regular callbacks are still supported too.
  • Linting! We’ve been using eslint internally for a while, so we decided to go for it here too. We’re linting in addition to running mocha tests on pull requests.

Check it out on GitHub and/or npm and star it, if you feel so inclined.

tumblr.js REPL

When we were updating the API client, we were pleasantly surprised to discover a REPL in the codebase. If you don’t know, that’s basically a command-line console that you can use to make API requests and examine the responses. We dusted it off and decided to give it its own repository. It’s also on npm.

If you’re interested in exploring the Tumblr API, but don’t have a particular project in mind yet, it’s a great way to get your feet wet. Try it out!

Jun 30, 2016 56 notes
#javascript #tumblr api
Some Themed Posts Updates

cyle:

  • New feature: Your theme’s accent/link color changes your post’s like/reblog/reply/etc colors! Whoa!
  • Big bug fix: The colors used to theme your posts now make more sense. Your title color is your text color (used to be your link color, which doesn’t really make sense), and your background color is still your background color.
  • Bug fix: “Keep reading” links on reblogs are now the right color.
  • Bug fix: Ask/answer posts are now themed better.
  • Bug fix: Added a line under your “contributed content” for a reblog, so that the space there doesn’t look as weird. Maybe it still does, I dunno.
  • Bug fix: The “follow” text above reblogged content should now be the right color.

I’m still working on lots of bigger changes, too. More info on those when they’re ready for release. As always, feel free to message me if you have any questions or suggestions!

Jun 9, 2016 86 notes
#tumblr labs #themed posts
U2F with Yubikeys

During our recent hackday we wanted to explore new ways to login to Tumblr and play with some cool toys. The following is not an announcement of any kind, other than that U2F is awesome and everyone should buy a Yubikey (they aren’t paying us to say this, we swear).

Authenticating your online identity

If you’ve ever logged into any website on the internet, chances are you’ve been through an authentication flow. You provide the site with a username you use to identify yourself on that platform, followed by a password that (in theory) only you know to prove that you are you. If all that matches what the site has in their database, you’re authenticated! However, that particular flow only represents a single factor of authentication, the “knowledge factor” (because you know your password). But even if you have a highly complex password, unique to that one site, that probably won’t be enough to really secure your account from unauthorized access. That’s why we provide the ability (and highly encourage users) to enable Two-Factor Authentication (2FA).

Traditionally, 2FA is done either via SMS or through an authenticator app (e.g. Duo, Authy, Google Authenticator). But what happens if you don’t have reception: how will you receive a text message? What if there’s an issue with the authenticator service and you don’t have a fallback? Surely there has to be a realistic and practical option beyond what the industry has been relying on that can help mitigate some of these issues.

Keep reading

Jun 7, 2016 84 notes
#hackday #yubikey #2fa #u2f

May 2016

Improved GIF Tagging

Tumblr receives a massive daily volume of gifs. We can only associate gifs with metadata from the post, rather than the gif itself, which presents a tricky technical question: how should this gif be indexed for future searches? The post’s tags could be used, but users often use a post’s tags as an under-your-breath-style postscript to the content, not actual post metadata. Gif reactions are ubiquitous on social media platforms, and users expect relevant images quickly supplied to their fingertips to keep the banter going. So Tumblr has to address the need for an accurate and fast heuristic for returning gifs based on a text query.

As part of Tumblr’s recent Hack Day, I devised a potential improvement to our current method of matching gifs and tags. I built a standalone service, nicknamed Taggy, consisting of a simple API and an even simpler user frontend. The goal was to create a classification pipeline from Tumblr’s already extensive gif library to users, prompting them for additional metadata about a gif, and then storing that response in a way that provides easy future search and analysis.

Keep reading

May 31, 2016 83 notes
#hackday #taggy
Babby's First Hack Day: Fast Queue

Technically this was not my first hack day, but this was the first hackday where I attempted to work on a project on my own. I am a QA Engineer, not a mobile developer, so my experience in making iOS apps is pretty light. I have been working with Swift for automation purposes, and for practicing coding on my own, and this was the first hack day since I started coding in earnest where I could put what I learned to the test.

Keep reading

May 26, 2016 42 notes
#hackday #fast queue
Peeking into a Black Box - The Effect of Dependency Structure on iOS Launch Times

Abstract

We’ve experienced issues with iOS’s dynamic linker increasing our app’s launch time to unacceptable levels. To learn more about the linker’s inner workings and explore possible solutions, we created a Ruby script that generates dependency hierarchies of arbitrary complexity and tested their effect on launch times.

Keep reading

May 17, 2016 25 notes
#ios
Data Lasso 2

javascript:

Data Lasso, Tumblr’s three-dimensional visualization tool, just got a serious upgrade. Along with a version bump to 2.x, Data Lasso now has some handy new features (as well as completely reworked internals). A GIF is worth a thousand words:

Quick refresher: Data Lasso is a visualization tool that Tumblr built that allows us to look at large multi-dimensional data sets quickly. If you haven’t tried it yet, check out the hosted version here.

New stuff

  • Data Lasso is built on the premise of being able to quickly visualize data and select a subset of interest, using a lasso-like tool. That tool just became much more flexible. Now, you will be able to make complex selections by adding and subtracting from an existing selection - much like the tools that you are already used to, if you work with image editing programs. Hold your shift key to add, option/alt to subtract.
  • Now, you can also upload datasets using a URL, without needing to download them. Same rules apply - it can be any .csv, .tsv or .json, as long as it’s properly formatted. That will come in handy if you are using data lasso with public datasets that are available online, or if you are working with systems like Hive that provide a link to your query results.

Reworked Internals

A lot changed under the three-dimensional hood of Data Lasso.

  • Architecture now follows principles of Flux (a fitting approach for a complex front-end application like Data Lasso) and its interface is now powered by React. These two things help to reduce the complexity a lot. More on moving to Flux + React in a blog post to follow.
  • The build process was moved to Webpack and was simplified a lot. Webpack loaders also allowed us to have .hlsl files in the codebase for the first time - so we no longer had to rely on workarounds to include the vertex and fragment shaders that Data Lasso relies on for utilizing the GPU.

It wouldn’t be a major version bump, of course, if it didn’t contain backwards-incompatible changes. With the move to Flux, the event bus was deprecated. So if you are using Data Lasso inside your app and rely on events for interacting with it, you will have to switch to using the Store and Dispatcher instead. This is good in the long term, as it provides so much more clarity into what’s going on inside Data Lasso.

That should be it! Overall, 2.0 is a solid release that adds fundamental new functionality while allowing future work to go smoother. As usual, if you encounter a problem, open an issue on the repository.

May 12, 2016 66 notes
Swift Compilation Reporting at Tumblr

Doing a clean build of the Tumblr iOS app takes…a while. We have a bunch of developers on the team, and assuming each person does at least two clean builds a day[1], that adds up to, well, too much time. As we march forward with Swift for new features, we noticed our compilation times increasing at surprising rates. There have been a few blog posts outlining expressions that the Swift compiler has trouble with. For instance, type inference of nested literal expressions[2] and operator overloads are expensive to resolve.

To address this, we decided to automate monitoring of compilation performance. The goal was to create a weekly job that would compile our project with specific debug flags[3], process the results, and email out the slowest compilation paths.

The result is a Swift script, called SwiftCompilationPerformanceReporter (nicknamed SwiftCPR), that we use to generate our weekly compilation report. Below are the steps SwiftCPR takes:

  • Runs a clean build with the following command[4]

    xcodebuild -workspace workspacePath -scheme scheme clean build OTHER_SWIFT_FLAGS="-Xfrontend -debug-time-function-bodies" | grep '[1-9].[0-9]ms' | sort -nr > buildOutputDirectory

    where workspacePath, scheme, and buildOutputDirectory are the workspace file, scheme, and output directory for the raw compilation logs, respectively. These can be specified in the config.json file.

  • Processes the raw compilation logs and merges duplicate entries.

    Sample raw logs:

    5992.1ms  /Users/tumblr/workspace/SwiftCPR/orangina/Classes/PerformanceLoggingEvent.swift:267:37  final get {}
    5718.3ms  /Users/tumblr/workspace/SwiftCPR/orangina/Classes/PerformanceLoggingEvent.swift:267:37  final get {}
    4376.1ms  /Users/tumblr/workspace/SwiftCPR/orangina/Classes/UniversalLink.swift:127:25    private final class func dictionaryOfAppArgumentsFromQueryString(string: String) -> [NSObject : AnyObject]?
    ...
    
  • Outputs a final report with the total compilation time and the slowest limit compilation paths, where limit can be configured in the config.json file.

    Sample report:

    Total build time: 1115.24661797285
    11.7104   /Users/tumblr/workspace/SwiftCPR/orangina/Classes/PerformanceLoggingEvent.swift:267:37  final get {}
    8.5783    /Users/tumblr/workspace/SwiftCPR/orangina/Classes/UniversalLink.swift:127:25    private final class func dictionaryOfAppArgumentsFromQueryString(string: String) -> [NSObject : AnyObject]?
    ...
    

Once the above steps are complete, our job emails the report to the team! From these insights, we have been able to refactor functions that took over 10 seconds to compile to roughly a tenth of a second. Hope this script can help your team better profile Swift compilation times!
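The merge step above is language-agnostic; here is a minimal, hypothetical sketch in PHP (SwiftCPR itself is a Swift script, and this is not its code) that sums durations for duplicate source locations and sorts them slowest-first:

```php
<?php
declare(strict_types=1);

// Merge raw "-debug-time-function-bodies" log lines: durations for
// duplicate source locations are summed, then sorted slowest-first.
function mergeCompileTimes(array $lines): array
{
    $totals = [];
    foreach ($lines as $line) {
        // e.g. "5992.1ms  /path/File.swift:267:37  final get {}"
        if (preg_match('/^([\d.]+)ms\s+(.+)$/', trim($line), $m)) {
            $location = $m[2];
            $totals[$location] = ($totals[$location] ?? 0.0) + (float) $m[1];
        }
    }
    arsort($totals); // sort by total duration, descending, keeping keys
    return $totals;
}

$raw = [
    '5992.1ms  /Users/tumblr/workspace/SwiftCPR/orangina/Classes/PerformanceLoggingEvent.swift:267:37  final get {}',
    '5718.3ms  /Users/tumblr/workspace/SwiftCPR/orangina/Classes/PerformanceLoggingEvent.swift:267:37  final get {}',
    '4376.1ms  /Users/tumblr/workspace/SwiftCPR/orangina/Classes/UniversalLink.swift:127:25  private final class func dictionaryOfAppArgumentsFromQueryString(string: String) -> [NSObject : AnyObject]?',
];

foreach (mergeCompileTimes($raw) as $location => $ms) {
    printf("%.1fms\t%s\n", $ms, $location);
}
```

Running this over the sample raw logs above folds the two `PerformanceLoggingEvent.swift` entries into a single 11710.4ms line, mirroring how the duplicate getter shows up once in the final report.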

Jasdev Singh

1: The majority of our builds are incremental.

2: This has been resolved and the fix will ship with Swift 3!

3: Specifically, -debug-time-function-bodies

4: Thanks to Michael Skiba!

May 10, 2016 40 notes

April 2016

Dragging, Scaling, Rotating on April Fools Day

effectiveandroid:

The April Fools 2016 project was executed and finalized on a very tight deadline. The mobile platforms (Android, iOS) had only 2 weeks to ship a finalized, polished implementation from scratch. Originally, we planned on doing a full-blown custom election, where users would themselves be able to run as candidates in the election, create their own campaigns, and be featured on a top Candidates page. However, we had to scope down the project as much as we could. Development had to occur while the project was still being scoped and mocks were incomplete.

One of the biggest contributors to the success of the AF project was the new set of creator tools we provided: the campaign posters and campaign endorsements.

While building the campaign poster creator on Android, there were 4 main actions to implement for users:
1) Adding items to the poster
2) Dragging items
3) Scaling items
4) Rotating items
* “items” refers to text and stickers

I decided to structure the implementation as follows:
activity -> layoutView -> canvas, with ImageViews and TextViews being added to the canvas, and the color picker being a separate ViewGroup.


Keep reading

Apr 27, 2016 26 notes
Peeking into electronics prototyping with the Arduino Uno

With so many passionate engineers here at Tumblr, meetings can get a little, well, heated. When staff turns up the fire in the kitchen, it’s time to crack a window.

Open windows can have their disadvantages. Wasted heat, wasted cooling, and unexpected visitors.

Don’t let them in.

However, it’s easy to forget to close the windows when leaving the room. To that end, I decided to build a reminder system during our most recent Hack Day.

The system, as shown above, uses an Arduino Uno to which a flexiforce pressure sensor has been wired. When the window is opened, a “start up” tune is played and a timer begins. Once a certain amount of time has passed (say, 30 minutes), a warning sound plays. Then, a reminder sound will play at another customizable interval. There’s also a button that activates a snooze feature to mute the reminder sound, during which an LED will blink. The logic for the main loop is below. Since this is Hack Day code, keep in mind it may be a bit wonky.


void loop() {
  timer.run();

  int sensorValue = analogRead(sensorPin);
  int buttonState = digitalRead(buttonPin);

  // Snooze button mutes the reminder while the window is open
  if (started && buttonState == HIGH) snooze();

  // Ignore small fluctuations in the sensor reading (noise filter)
  int difference = sensorValue - lastVoltage;
  if (difference < 0) difference = -difference;
  if ((difference < 150) && (lastVoltage > -1)) return;

  if (sensorValue <= threshold) {
    // Window is open: play the start-up tune once and arm the timers
    if (!started) {
      playTune();
      started = true;
    }
    if (!triggered) {
      timer.enable(open_start_id);
      timer.enable(reminder_id);
    }
    lastVoltage = sensorValue;
  } else {
    // Window is closed: reset state and re-arm the warning timer
    if (started) {
      open_start_id = timer.setTimeout(warnAfter, play_warn);
      triggered = false;
      started = false;
    }

    timer.disable(open_start_id);
    timer.disable(reminder_id);
    lastVoltage = sensorValue;
  }
}

And since we’d be remiss in not showing you this baby in action:

https://aloria.tumblr.com/post/143120428399

Apologies to Lipps Inc.

A list of components can be found here: https://aloria.tumblr.com/post/143121827147/arduino-project-components. Also, https://www.tumblr.com/jobs

- @aloria

Apr 26, 2016 68 notes
#hackday

March 2016

Collins and Go

Historically, the SRE teams at Tumblr have often turned to Ruby to write the various scripts and tools that keep the site running. But lately we’ve been starting to introduce Go as an alternative for certain tasks, and it’s been working out great so far!

As you may know, we use our inventory management system, Collins, to keep track of all our servers and what they do. In order to do useful things with Go we need to be able to talk to Collins from it. So, naturally, one of the first things we wrote in Go was a client library for Collins.

Today, we’re open sourcing it so that anyone else who is using Collins and thinks Go is cool can use the two together. You can find the code (and links to documentation) here: https://github.com/tumblr/go-collins

Please let us know on GitHub if you have issues/questions/pull requests!

(The Go Gopher was created by Renee French, licensed under CC Attribution 3.0)

Mar 29, 2016 56 notes
gulp-css-hashes

javascript:

Believe it or not, since ancient times engineers at Tumblr have had to manually take care of busting caches for assets referenced in CSS. Do you have to change that png you used as a background in CSS? Bump that awkward …?57 to be …?58 so your asset will get properly busted out of the cache and replaced with a new version. How antiquated! How manual!

There are a number of options we could have used to automate this process, but we wouldn’t be able to call ourselves true Web Engineers if we didn’t write our own approach to solve the problem.  (But really, sassiness aside, we did our own thing because nothing existed that worked in quite the robust and simple way that we wanted.)

So, gulp-css-hashes was born! (We thought a lot about the name.)

It’s a simple gulp plugin that you can add into your gulp flow; it will go over all url() instances in your CSS and append hashes to those urls based on the md5 of each local file. It’s nicely documented, straightforward in how it works, and configurable enough to fit many use cases. Check it out here!

Mar 4, 2016 59 notes

February 2016

Moving things out of critical rendering path

javascript:

Tumblr’s web pages are heavy.

Naturally, the heaviest strain on the network comes from the content - filled with heavy gifs and other content. However, we also load embarrassingly large amounts of JavaScript. There is one particularly heavy JavaScript file that contains our vendor libraries (e.g., jQuery, Backbone, etc), which we call the vendor bundle.

It was loaded right at the top of the page, significantly adding to the critical rendering path. But we don’t need it to be one of the first things the user loads. Moving the vendor bundle down the page might result in performance gains and a more responsive page load - all great goals that Tumblr’s Core Web team is set on accomplishing.

After a few months of patching ancient, legacy inline scripts and running performance tests, the script is finally at the bottom of the page! What’s more, with this effort we decided to really dig into performance - we conducted A/B testing of the move (establishing a performance split-testing framework along the way) to see exactly what effect it would have on our site’s performance.

Results

Starry-eyed and excited to see improvements across the board (that file exists on nearly every page of Tumblr, after all), we jumped head-on into the data, only to find ourselves amused and slightly disappointed.

Turns out, the performance gains that we expected to see from moving a heavy file out of the critical rendering path were not there. The 75th percentile of page load times across the board remained the same. We suspected that analyzing at a more granular level would reveal performance gains on certain pages or even certain browsers - but the results were pretty uniform. See for yourself - below are boxplots for performance on the dashboard:

Was our approach to taking measurements incorrect? We revisited the numbers again and again. We looked in finer detail at various pages. We excluded IE9 after hypothesizing that it might be skewing our results. We measured across several metrics that we had at our disposal (we sample actual navigation timings from actual users). The results remained the same.

Outcome

As much as we were disappointed, we were also glad that we ended up knowing precisely what effect we had on performance. Oftentimes we take it on blind faith that best practices will lead to best results, and fail to realize that a myriad of factors weigh in on actual outcomes.

If anything - we learned a lot from this experience. We gained better insight into our codebase and established an approach to measuring performance in future experiments!

Feb 19, 2016 74 notes

November 2015

Data Lasso

javascript:

There I was. Stranded. Alone with my Hive query export of many thousands of records, earned by a tireless series of painfully refined select statements, needing to identify the outliers in this madness of data.

“I have data. Now what…Crap.” I mumbled to myself, realizing that I am limited to a few unpromising options.

Flexing my brain muscle, I tried to recollect the bits and pieces I knew of mighty R. “What was the name of the library? ggdraw? ggchart? Dammit, it’s ggplot.” The prospect of trying to remember cryptic R incantations was daunting, weighing heavily on a tired engineer who had had enough suffering for the day.

Then a shameful thought passed through my mind: “Just open it in MS Excel. No one has to know.” Countless minutes passed as I stared at a beach ball of death, spinning as the naive program tried to open my data set - and obviously failed.

Desperation fell upon a lonely engineer. There had to be a better way. A way to easily visualize the data set as a whole without needing to write code for it. Just plug in the data set and specify what to show from it. A scalable solution. Firmly deciding that no one should have to be stranded in that situation, I determined to write a tool that would fill that gap. So was born the Data Lasso.

Data Lasso

Data Lasso is a visualization tool that allows exploration of an arbitrary set of data in 3D. It is built to be agnostic to the structure and formatting of the data.

There is no setup. You don’t need to prepare your data; .csv, .tsv or .json will do. It can easily visualize half a million entries. All of that in a three-dimensional space, so you have complete freedom in looking at your data.

Data Lasso can help answer such fundamental questions as:

  • What are the outliers in this multi-dimensional data set?
  • How does one dimension of the data correlate to another? Another two?
  • How do you find signal in what is, otherwise, simply noise?

Under the hood

WebGL. The future is upon us, and 3D in a browser is a reality. Using three.js to help wrangle WebGL, Data Lasso benefits a lot from that extra dimension. Rotating, moving, and zooming in 3D space gives additional freedom to look at your data closely.

Data Lasso can visualize around half a million entries - all thanks to shaders, which allow us to take rendering off the CPU and pass it on to the GPU. Shaders alone might have been the single most important breakthrough for Data Lasso, enabling smooth 60fps even with large data sets.

Goes well with your stack

At its core, Data Lasso is built to be extensible by means of modules that can hook right into the Data Lasso event bus. That lets you set up a data flow out of Data Lasso customized to your needs, or add a UI on top of Data Lasso that is specific to your data.
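As a rough illustration of that module pattern (sketched here in Ruby for brevity; Data Lasso itself is JavaScript, and the bus API and event name below are hypothetical), a module simply subscribes to events on a shared bus:

```ruby
# Minimal event-bus sketch. A module never touches Data Lasso internals;
# it only registers handlers for events the core emits.
class EventBus
  def initialize
    @handlers = Hash.new { |h, k| h[k] = [] }
  end

  def on(event, &block)
    @handlers[event] << block
  end

  def emit(event, payload)
    @handlers[event].each { |handler| handler.call(payload) }
  end
end

bus = EventBus.new
selected = []
# A custom module hooks into the bus to receive the user's selection
bus.on('data:selected') { |entries| selected.concat(entries) }
bus.emit('data:selected', [{ id: 1 }, { id: 2 }])
```

The design choice here is decoupling: the core only needs to emit events, and any number of custom data-export or UI modules can listen without the core knowing they exist.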

Data Lasso can be used standalone, but it was made an npm module for a reason: add it to your stack and serve it up from your own systems.

Data Lasso has been used inside Tumblr for several months and has shown itself to be an extremely useful visualization tool, filling a big gap in our data workflow.

Now it’s open sourced. Go check it out.

Nov 24, 2015 160 notes
Resizing Gifs On-Demand

Why dynamically resize images in the first place?

Before the dawn of on-demand resizing at Tumblr, every posted image was resized into seven or eight different sizes, and each was saved into our backing media store (a massive S3 bucket). This made serving our images very fast—just grab the size you want right from the bucket! While this was great, it also meant that any changes to our image processing would not affect any images we had already saved (billions of images). If we were to upgrade image quality, add a new size crop, or change how we handle taking down media, the effect would only be marginal…what a bummer! The cost of storing all the resizes as separate files (petabytes of data!), along with a lack of agility moving forward, motivated us to adopt a dynamic resizing and serving strategy.

We began with resizing jpg and png images on-demand instead of persisting each different resize crop in our S3 bucket. This has been a great success; our “Dynamic Image Resizer” churns through over 6,000 images a second, at a roundtrip request latency of only 250ms per image. Not having to store the resizes saves us tens of thousands of dollars a month! So, the natural question was, can we also do this for gifs and make a “Dynamic Gif Resizer?”
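To make the trade-off concrete, here is a toy sketch of the on-demand idea (illustrative Ruby only; the real resizer is an nginx module written in C, and the URL scheme and helper below are hypothetical): parse the requested size out of the URL, look up the single stored original, and compute the resize at request time instead of storing every crop.

```ruby
# Toy dispatch for an on-demand resizer: one original per image in the
# store, every requested size computed at request time.
def handle(path, store)
  m = path.match(%r{\A/image/(\w+)_(\d+)\z}) or return nil
  original = store[m[1]]           # the single stored original
  target_w = m[2].to_i
  scale = target_w.to_f / original[:width]
  # Only the output dimensions are computed here; the pixel work would
  # happen in the real resizer.
  { width: target_w, height: (original[:height] * scale).round }
end

result = handle('/image/abc_500', 'abc' => { width: 1000, height: 800 })
```

Storing one original instead of seven or eight crops is what lets a pipeline upgrade (quality, new sizes, takedowns) apply to every image retroactively.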

The problem with resizing gifs on-demand

Gifs, as a medium, are a wonderful thing. They capture a special or hilarious moment and repeat it back to you, forever. However, the actual Graphics Interchange Format leaves much to be desired. Last touched in 1989, the format is woefully outdated, and this begets massive, low quality animated images. When compared to video format counterparts (H.264 and the like), the gif file size can be tens of times larger at similar visual quality. Many companies have punted on the gif file format entirely; imgur released their gifv format, which wraps an mp4 video. Instagram will loop your video clips, but will flatten gifs to a still image. However, as the true “home of the gif,” Tumblr isn’t ever giving up on your gif files!

Resize it faster

A while ago, one of my colleagues @dngrm posted about updates we made in our gif resizing technology. Essentially, we switched our gif resizing library from ImageMagick to gifsicle with great success: lower latency and higher-quality results. To resize a gif in a realistic timeframe for on-demand resizing and serving, we proposed changes to gifsicle that parallelize the resizing step. Since a gif is just a stack of image frames, we figured that resizing them using a thread pool could yield a real performance improvement. Luckily for us (and the world!), gifsicle author Eddie Kohler accepted and merged our changes. With this new threaded resize option, we gained about a 1.5-2x speed-up in resizing an average gif over vanilla gifsicle. That brought the average wall time of a gif resize down to about 100ms. The entire gif resize request (downloading the upstream image, resizing the gif, and serving the response) now averages just 400ms.
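The frame-level parallelism idea can be sketched like so (illustrative Ruby, not gifsicle's actual C code; the per-frame "resize" here is a stand-in):

```ruby
# Frames are independent, so they can be farmed out to a pool of worker
# threads: each worker pulls (frame, index) pairs off a shared queue.
def resize_frames(frames, workers: 4)
  queue = Queue.new
  frames.each_with_index { |frame, i| queue << [frame, i] }
  out = Array.new(frames.length)
  threads = Array.new(workers) do
    Thread.new do
      loop do
        # Non-blocking pop raises ThreadError when the queue is drained
        frame, i = queue.pop(true) rescue break
        out[i] = frame.map { |px| px / 2 }  # stand-in for real per-frame work
      end
    end
  end
  threads.each(&:join)
  out
end

resized = resize_frames([[2, 4], [6, 8], [10, 12]])
```

One caveat on the sketch: in MRI Ruby, the interpreter lock prevents CPU-bound Ruby code from running truly in parallel, whereas gifsicle's C threads do; the structure of the work division is the point here, not the speed-up.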

Cache is king

To make all this possible, Tumblr relies heavily on CDNs to cache massive amounts of static content and avoid repeated work. Thanks to an incredibly high cache hit ratio on our CDNs, the Dynamic Gif Resizer itself only sees a little over 1,000 resize requests per second.

On top of that, we rely on conditional GET requests and 304 Not Modified responses to cut down on the amount of real work we must do at the resizing layer. The share of 304s we serve fluctuates between 30% and 50% of all gif responses, which saves us a tremendous amount of compute time!
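The conditional-GET check itself is simple. A minimal sketch (a hypothetical helper, not the actual nginx module logic; using an MD5-based ETag as an assumption):

```ruby
require 'digest'

# If the client's If-None-Match header matches the current ETag, the
# resize and the response body can both be skipped entirely.
def respond(request_headers, image_bytes)
  etag = %("#{Digest::MD5.hexdigest(image_bytes)}")
  if request_headers['If-None-Match'] == etag
    { status: 304, body: nil, etag: etag }        # no resize, no body
  else
    { status: 200, body: image_bytes, etag: etag } # full resize + response
  end
end

first  = respond({}, 'gifdata')
cached = respond({ 'If-None-Match' => first[:etag] }, 'gifdata')
```

A 304 costs a hash comparison; a 200 costs a full download-resize-serve round trip, which is why a 30-50% 304 rate is such a large win.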

Putting it all together

The resizer itself is an nginx server with a custom module, written in C, that does the upstreaming and resizing. The jpg/png resizer uses OpenCV for image manipulation, while the gif resizer uses the aforementioned gifsicle library.
Our fleet of resizers and their surrounding architecture live in AWS. The main motivations were colocation with our image store (S3) and the ability to automatically scale our instance count up and down depending on time of day (our traffic pattern is heavily cyclic over a 24h window). The rest of Tumblr’s architecture is housed in our own data center. Below is a minimalistic diagram of our resizer setup.

Thanks

Both the Image Resizer and Gif Resizer were massive undertakings, and a lot of people deserve credit for fantastic work:
Massive thanks to co-developer @naklin, and to @neerajrajgure who helped with improvements.
To @dngrm, @michaelbenedict, @heinstrom, @yl3w, and @jeffreyweston for architecture help and sage advice.
To our AWS enterprise support team Frank Cincotta, Shaun Qualheim, Darrell DeCosta, and Dheeraj Achra.
And to Eddie Kohler who helped clean up my ugly gifsicle changes and let them be a part of his library.

Questions? Comments?

Talk to me on tumblr using our new messaging system! My tumblr is @hashtag-content

Originally posted by gameraboy

Nov 19, 2015 284 notes
#gif #gif resizer #resizer #engineering #tumblr #staff

October 2015

Oct 30, 2015 42 notes
#culture matters #culturematters #jamf #jnuc #jnuc2015 #junc 2015 #mac #MacIT #IT #apple #tumblr

September 2015

Sep 11, 2015 17 notes
#JNUC #JNUC2015 #JAMF #Mac #IT #Mac IT #OS X #Casper #Mac OS X #Apple #Macs #Casper Suite #JAMF Nation
Sep 4, 2015 82 notes
Sep 3, 2015 157 notes
#Android #animation #open source

July 2015

Adventures in making a CocoaPods Plugin

brianmichel:

So, there I was, trying to understand why I couldn’t fetch some dependencies that had been resolved by including a specific pod, only to realize the problem was that they couldn’t be cloned in a particular way. In my case, a specification had a source URL over SSH rather than HTTP. An hour or two later I found a workaround: replace the SSH source URL with an HTTPS URL. Meaning I would transform something like this:

git@cool-git-server.com:Organization/repo.git

into something like this:

https://access-token-here@cool-git-server.com/Organization/repo.git

Let’s talk about this transformed structure for a second. It’s almost a standard https URL, the kind you’ve seen dozens of times on GitHub and elsewhere. But what’s this access-token-here bit? It’s a GitHub personal access token!1
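The transform itself is mechanical. A minimal sketch (hypothetical helper name; this is not the plugin's actual code):

```ruby
# Turn an SSH-style git URL into a token-authenticated HTTPS URL:
#   git@host:Org/repo.git -> https://TOKEN@host/Org/repo.git
def rewrite_ssh_url(url, token)
  m = url.match(%r{\Agit@([^:]+):(.+)\z}) or return url  # pass through non-SSH URLs
  "https://#{token}@#{m[1]}/#{m[2]}"
end

rewritten = rewrite_ssh_url('git@cool-git-server.com:Organization/repo.git',
                            'access-token-here')
```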

We now know the structure of the transform we need to create; all that’s left is a way to tell CocoaPods to run code that performs it. We’re in luck: since CocoaPods 0.28, there has been a plugins interface! Check out their blog announcement here.

After reading the blog post, I was sure this was the right way forward. There were even linked examples and an informal interface, all good stuff. The following line made me very happy, since it showed they had thought about this:

Plugin support allows one to tweak the internals of CocoaPods and to insert additional commands.

Here’s where things get a bit murky. The listed example only covers a third of the extensibility options offered by the plugin interface. If you need to make a plugin that patches internals (dangerous), or one that uses a pre- or post-install hook (a sanctioned CocoaPods interface), you’re on your own. Thankfully, the informal naming convention says every plugin must begin with cocoapods-, so let’s take a look at RubyGems.org and find some other plugins.

I ended up checking out the cocoapods-src plugin, which checks out all of the sources for a pod after you’ve finished running pod install. I chose it because it uses a post-install hook as its trigger, and I figured that would closely mirror what a pre-install hook plugin would do. Here’s the bit of code in the gem that got me going in the right direction:

Pod::HooksManager.register(:post_install) do |installer_context|
  Pod::Src::Downloader.new(installer_context).download
end

Great, I’m off and running by changing that to this:

Pod::HooksManager.register(:pre_install) do |installer_context|
    configuration = SourceURLRewriter::Configuration.new(installer_context)
    SourceURLRewriter::Rewriter.new(installer_context, configuration).rewrite!
end

I installed my gem locally, ran pod install, and…nothing happened.

At this point the effort of this project hockeysticked. Without much documentation to go off of, I set up my machine as if I were going to contribute to the CocoaPods project itself (instructions here). My plan was to search the code for symbols I’d been using, such as :pre_install, and work my way through what was going on.

I found that, in Pod::HooksManager::Hook, the use of hooks without specifying a plugin name has been deprecated. Okay, so now I just need to specify a name for this plugin, no problem. In the same file there is the signature for the register function. I can use that to figure out how to call this function with a name:

Pod::HooksManager.register('url_rewriter_plugin', :pre_install) do |installer_context|
    configuration = SourceURLRewriter::Configuration.new(installer_context)
    SourceURLRewriter::Rewriter.new(installer_context, configuration).rewrite!
end

I rebuilt my gem, installed it locally, ran pod install, and…nothing happened, again.

At this point, I’m losing the desire to even try building a plugin, and I start entertaining the idea that I can find a different way to solve this problem.

After reading more of the source, and reaching out for help from the masterful @segiddins, I learned that the name of the plugin you register must exactly match the name of the gem you publish as your plugin. In retrospect this makes total sense, and I don’t know how else it could work.

So I renamed the pre-installation hook that I’m registering, built my gem, installed it locally, and ran pod install, and…stuff happened!

However, at this point I realized that my problem ran a bit deeper than simply grabbing the URLs of the dependencies specified in my Podfile and changing them. After dependency resolution there are likely many other dependencies that need to be built but are not directly specified, so I would need to rewrite these URLs at download time instead of just changing them in memory on the Podfile object.

WARNING: This gem was designed to be a stop gap solution and monkey patching is dangerous, YMMV!

I worked with Sam a bit more, and we decided that a solution, albeit a dangerous one, would be to patch the Git downloader object in CocoaPods. Since this was designed to be a temporary solution, I felt comfortable doing this and then working other channels to resolve the actual network issue. Ultimately, I ended up with a CocoaPods plugin that leveraged something like the following code in my cocoapods_plugin.rb file to deliver a solution:

module Pod
  module Downloader
    # Reopening the Git downloader class
    class Git
      alias_method :git_url_rewriter_url, :url
      def url
        source_url_rewriter.url_for(git_url_rewriter_url)
      end

      def source_url_rewriter
        @source_url_rewriter ||= SourceURLRewriter::Rewriter.new
      end
    end
  end
end

I can’t stress enough how dangerous this is, and it is ultimately not guaranteed to keep working. I am willingly taking advantage of the dynamic nature of Ruby to add behavior to the CocoaPods library that suits my needs in the short term. The decision to patch any library in this fashion should not be taken lightly, and you should expect it to break in the future.

Now that I had this working, I wanted to provide the rewrite options within the plugin declaration itself, so that what was going to happen would be highly visible in the Podfile. For this I looked at the cocoapods-keys plugin to see how they handled configuration. The goal is to support syntax like this:

plugin 'cocoapods-git_url_rewriter', {
    'git@cool-git-server.com:' => 'https://access-token-here@cool-git-server.com/'
}

Which leads us towards an internal method that looks like this:

def user_options
  @options ||= podfile.plugins['cocoapods-git_url_rewriter']
  Pod::UI.notice('No options have been specified for rewriting') unless @options
  @options
end

On the Podfile object there is a plugins hash, keyed by plugin name. The value is the hash you passed in when you declared the plugin in your Podfile. This lets us keep sensitive info out of the plugin itself while staying very descriptive about what will happen to our dependencies.
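Applying that options hash is then just a prefix match. A minimal sketch (hypothetical helper, not the plugin's actual code; the keys are SSH URL prefixes and the values their HTTPS replacements, as declared in the Podfile):

```ruby
# Rewrite a URL by the first matching prefix from the options hash,
# or return it unchanged if nothing matches.
def apply_rewrites(url, options)
  options.each do |prefix, replacement|
    return replacement + url[prefix.length..-1] if url.start_with?(prefix)
  end
  url
end

rewritten = apply_rewrites(
  'git@cool-git-server.com:Organization/repo.git',
  'git@cool-git-server.com:' => 'https://access-token-here@cool-git-server.com/'
)
```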

Once this was done, I could test my gem and see that the resources I was previously unable to reach were now resolved and downloaded correctly. I was also able to provide a simple solution that was transparent to my fellow developers and extensible for future use (though hopefully never needed). I ran into some problems that I suspect other plugin developers might hit, which is why I felt the need to document them here.

Please feel free to reach out if you have questions. I’m @brianmichel, and I couldn’t have done this without help from the awesome @segiddins. Seriously, he’s great.


1. Personal access tokens are a feature of GitHub and GitHub Enterprise that you can use in place of OAuth tokens or the basic username:password combination to grant the caller access to a specific resource. They are the kind of thing you create once, get one chance to look at, and from then on they are an opaque resource on your account. They can also be configured with different limitations (check out the options in the screenshot).

Jul 21, 2015 28 notes