SlideShare Ditches Flash for HTML5

Watch our HTML5 gallery here.

SlideShare today announced the biggest change since we started: we are now rendering presentations and documents using HTML5 instead of Flash. This is a milestone. Five years ago, it was impossible to build something like SlideShare or YouTube without Flash. But the web has finally caught up.

This was the biggest engineering project in SlideShare’s history. Much of the SlideShare engineering team has been working on it around the clock for the last six months. As we have learnt over the past five years, people are picky about how their presentations look. Getting the fonts and the text placement to look exactly right across all supported browsers was a real engineering challenge. So we’re happy to finally be able to see this on SlideShare.net.

Ditching Flash for HTML5 feels like the right choice for us for a number of engineering reasons.

  1. The exact same HTML5 documents work on the iPhone / iPad, Android phones / tablets, and modern desktop browsers. This is great from an operations perspective: it saves us extra storage costs and maximizes the cache hit ratio on our CDN (since a desktop request fills the cache for a mobile request, and vice versa). It’s also great from a software engineering perspective, because we can put all our energy into supporting one format and making it really great.
  2. Documents load 30% faster and are 40% smaller. ‘Nuff said on that front, faster is ALWAYS better.
  3. The documents are semantic and accessible. Google can parse and index them, and so can any other bot, scraper, spider, or screen reader. This means that you can write code that does interesting things with the text on SlideShare pages. You can even copy and paste text from a SlideShare document, something that was always a pain with Flash.
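To illustrate the third point, here is a minimal sketch of code that pulls the text out of a slide served as HTML. It uses only Python's standard `html.parser` module. The sample markup (`SAMPLE_SLIDE`, the `slide` and `t` class names) is invented for illustration and is an assumption, not the actual structure of SlideShare's pages.

```python
from html.parser import HTMLParser


class SlideTextExtractor(HTMLParser):
    """Collect the visible text of an HTML fragment, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text content of an HTML string, joined by single spaces."""
    parser = SlideTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical slide markup, for illustration only.
SAMPLE_SLIDE = """
<div class="slide">
  <style>.t { position: absolute; }</style>
  <span class="t">I can't believe</span>
  <span class="t">it's not Flash</span>
</div>
"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_SLIDE))
```

The same approach would not work against a Flash movie, which is the point: once the text lives in ordinary markup, any standard HTML parser can read it.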

What were the most challenging parts of this project? Glad you asked.

Font Conversion

Font handling was the biggest challenge. We had to build support for rendering, in your browser, arbitrary fonts that are not installed on the client. If you invent a new font and upload a PDF that uses it, it should still render perfectly on SlideShare. Whoa!

Text Placement

Placing the text is tricky because of differences between browsers, differences between fonts (ligature handling, for example), and several other complexities. To illustrate: the PDF coordinate system starts in the bottom left, while HTML starts in the top left. PDFs use points; in HTML you get your choice of unit, but no two browsers agree on how precise any particular unit is! The largest problem we face with placement is normalization. We spent a lot of time finding the magic combination of ems, percentages, and zoom that gives us correct placement across the web.
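The coordinate flip described above can be sketched in a few lines. This is only the basic geometry, assuming a text box given by its bottom-left corner and height in PDF points; the function name and parameters are hypothetical, and it does not capture the browser-specific rounding or the particular mix of ems, percentages, and zoom mentioned above.

```python
def pdf_to_css_position(x_pt, y_pt, height_pt, page_width_pt, page_height_pt):
    """Map a PDF box (origin bottom-left, points) to CSS left/top percentages.

    (x_pt, y_pt) is the bottom-left corner of the box in PDF space.
    Returns (left_pct, top_pct) relative to the page size.
    """
    if page_width_pt <= 0 or page_height_pt <= 0:
        raise ValueError("page dimensions must be positive")
    left_pct = 100.0 * x_pt / page_width_pt
    # Flip the y axis: CSS 'top' is measured from the top edge of the page
    # down to the top of the box, which sits at y_pt + height_pt in PDF space.
    top_pct = 100.0 * (page_height_pt - (y_pt + height_pt)) / page_height_pt
    return left_pct, top_pct


if __name__ == "__main__":
    # A 72 pt tall box, one inch in from the left and top of a
    # US Letter page (612 x 792 pt).
    left, top = pdf_to_css_position(72, 648, 72, 612, 792)
    print(f"left:{left:.4f}%; top:{top:.4f}%")
```

Expressing positions as percentages of the page keeps the layout independent of the rendered size, which is one plausible reason to prefer relative units over fixed pixels.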

Error Handling

We also built a system to detect variance between an image of the HTML output and an image generated directly from the document. If the variance exceeds a certain threshold, we consider that an error and won’t serve that page as HTML5. Instead we’ll serve a PNG image of the page when it is requested. There was some hard-core computer vision involved in the error-handling system. The way we look at it, we want to serve HTML5, but not at the expense of a document that looks bad and disappoints the author.
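The fallback decision can be illustrated with a deliberately naive sketch. It compares two grayscale images pixel by pixel and falls back to PNG when too many pixels disagree. This is a stand-in for the idea, not the computer-vision system described above; the function names, the `tolerance` value, and the `max_mismatch` threshold are all assumptions chosen for illustration.

```python
def mismatch_fraction(reference, candidate, tolerance=16):
    """Fraction of pixels whose grayscale values differ by more than tolerance.

    Both images are lists of rows of 0-255 integers with equal dimensions.
    """
    if len(reference) != len(candidate):
        raise ValueError("images must have the same height")
    total = 0
    mismatched = 0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("images must have the same width")
        for ref_px, cand_px in zip(ref_row, cand_row):
            total += 1
            if abs(ref_px - cand_px) > tolerance:
                mismatched += 1
    if total == 0:
        raise ValueError("images are empty")
    return mismatched / total


def choose_format(reference, candidate, max_mismatch=0.02):
    """Return 'html5' when the two renders agree closely enough, else 'png'."""
    if mismatch_fraction(reference, candidate) <= max_mismatch:
        return "html5"
    return "png"


if __name__ == "__main__":
    white_page = [[255] * 10 for _ in range(10)]
    # Same page with one full row rendered black: 10% of pixels differ.
    bad_render = [row[:] for row in white_page]
    bad_render[3] = [0] * 10
    print(choose_format(white_page, white_page))  # identical renders
    print(choose_format(white_page, bad_render))  # too much variance
```

A raw pixel diff like this is sensitive to anti-aliasing and sub-pixel shifts, which is presumably why a production system needs something more robust.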

Cloud Computing

Our conversion stack runs on Amazon EC2 and is configured and managed by Puppet. We’ve been using EC2 for our conversion stack for years, so we’re old hands at that stuff. For this new system, we started out with a number of different types of servers (a font extractor, a font generator, etc.). What we found is that the coordination time between different machines (using Amazon SQS) and the IO time (using S3) were huge bottlenecks. So our architecture for this new system is more reminiscent of the Netflix “Rambo” architecture. Each box is a self-contained system that can do the entire job of conversion, with no help from anyone.

As we speak, an army of hundreds of Amazon EC2 instances is crunching away at converting to HTML5 the *millions* and *millions* of presentations and documents that have been uploaded to SlideShare over the last five years. New documents will automatically be converted to HTML5 from now on. We hope to have the transition complete by the end of the year (maybe sooner, but no promises!). At that point all SlideShare content will be served as HTML5.

Next Steps

This is a work in progress … we are betting the company on HTML5, and are going to continue to invest in the HTML5 conversion stack and JavaScript player technologies that we’re releasing today. Some of the next things on our plate include:

  1. Handling z-indexes (objects occluding other objects) better
  2. Continuing development of our font extraction technology
  3. Adding some features that we just weren’t able to port to our HTML5 player in time for this launch, like embedded video and synchronized audio

Obligatory recruiting pitch

If you’re a developer and like working on this kind of stuff, SlideShare wants to talk to you! Check out our jobs page for details.

48 Responses to “SlideShare Ditches Flash for HTML5”

    • Adam

      Not really. pdf.js is actually a tech demo used to show off the speed of modern JavaScript engines. The rendering is done entirely in the browser so it could get slow. Slideshare and other hosting sites need a fast, complete implementation. Actually they just need to convert the files in pdf format to html and web font (and maybe svg?). Html5 capabilities has caught up with pdf

  1. Leo

    Not working for me on Android. Tested with 2.3.4 stock browser and Dolphin, both mobile and desktop versions, and could not navigate the slides.
    Works fine on my desktop in Chrome, but text seems to be missing anti-aliasing.

  2. neduma

    Very Cool. Especially for IPAD users.

    By the way, When you’re folks to fix your search engine?

    Better, Just provide link to Google with site params. That works much better IMHO.

  3. Ittay

    Well, copy & paste from the “I can’t believe it’s not flash” presentation did not work well. I copied the text in the first slide (the title). Pasting into a text editor, I got 1-3 characters per line for 14 lines.

  4. Nick Pettit

    This is an insane feat of engineering. You guys could have continued to serve Flash, or just bunch of PNGs, but no. You went way beyond the extra mile to make this happen! Other companies aspiring to support HTML5 should look to this as a prominent and non-trivial example of what’s possible.

    Well done!

  5. Benjamin Woodard

    @Leo we have tested presentations using the default Android web browser and it worked before for us. We will try with Dolphin also. What phone are you using, and could you send the link to ben at slideshare dot com?

    @Ittay we are still ironing out the kinks with text, but we are working on copying and pasting text.

  6. Dave Hoder

    Bummer. We dumped Scribd because of this (well, that and they’re now holding documents for ransom). PDFs with links didn’t seem to work too well with their HTML 5 implementation. Hopefully you guys can do it.

  7. Nelson

    Hey Guys,
    this are great news!!! and a big part for the HTML5-Movement. I love it.
    But one thing you can improve: to give users the opportunity to make the presentation-elements unselecteble, sometimes it’s a little bit annoying.

    Keep going
    Nelson

  8. Philippe Marschall

    What’s wrong with just providing a download link to a PDF? I’m on a desktop, I have a PDF reader that is way superior than anything you can ever come up in HTML5.

  9. Ali

    congratulations .. you now won some iPad users .. and lost a hell ton of users on other platforms that doesn’t support HTML5 yet

  10. Brent Gulanowski

    Congratulations, but I’m not sure you should be so proud of having worked your poor engineers “around the clock for six months”. That’s something to be ashamed of. Nice way to set an example and expectations of programmer exploitation.

  11. neb

    So great! I could never read/view SlideShare presentations, because I use Gnash instead of Adobes Flash, and it didn’t work with SlideShare.

    Hooooray!

  12. Rob

    Wow, what a step backwards. The fonts look terrible now. This is the future? Looks like you guys were more eager to have slideshare work on iPads than providing an optimal user experience with presentations.