Video Encoding, Still Slow in 2014

Over at Oasis Digital, some of us work together in our St. Louis office, while others are almost entirely remote. I’ve written before about tools for distributed teams, and we’ve added at least one new tool since then: talk while drawing on a whiteboard, record video, and upload it where the whole team can watch it. There are fancier ways to record a draw-and-talk session (for example, record audio of the speaker and video of drawing on a Wacom tablet), but it’s hard to beat the ease of pointing a video camera and pressing Record.

This is effective, but I am disappointed by the state of video encoding today.

Good: Recording with a Video Camera

We tried various video cameras and still cameras with a video feature. There are problems:

  • Some cameras have a short video length limit.
  • Some cameras emit a high-pitched noise from their autofocus system, picked up by the microphone. (The worst offender I’ve heard so far is an Olympus camera. Many other cameras manage to autofocus quietly.)
  • Some cameras have a hilariously poor microphone.
  • Many cameras lack the ability to accept an external microphone.
  • Some cameras have a hard time focusing on a whiteboard until there is enough written on it.
  • Many cameras lack the depth of field to accommodate both a whiteboard and a person standing in front of it.

For example, I recorded a whiteboard session yesterday, 38 minutes of video, using a GoPro camera at 1080 HD (so that the whiteboard writing is clear). The GoPro did a good job. I’d prefer it have a narrower field of view (a longer lens) to yield a flatter, less distorted image, but it is acceptable for this impromptu daily use.

The GoPro produces high quality, but poorly compressed, MP4 video. In this example, the file size was ~5 GB for 38 minutes.
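That works out to a bitrate of roughly 17 to 18 Mbit/s (about 40,000 megabits spread over about 2,280 seconds), far more than a careful encode needs for mostly-static whiteboard footage.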

Bad: Uploading and Downloading

The question is: how to provide this video to others? We have a good internet connection at headquarters, but 5 GB still takes a while to upload and download. Even if the transfer speed was greatly improved, as a person who remembers computers with kilobytes of memory, I find 5 GB for 38 minutes morally offensive.

Good: Re-Encoding to Reduce File Size

The answer, of course, is to re-encode the video. A better encoder can pack 38 minutes of HD video into much less than 5 GB. Keeping the resolution the same, with common encoding systems this 5 GB is easily reduced by 80% or more. If I’m willing to give up some resolution (1080 to 720), it can be reduced by 90%.
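For anyone who prefers the command line to a GUI tool, here is a minimal sketch of the kind of re-encode I mean, assuming ffmpeg built with libx264; the file names are placeholders:

ffmpeg -i gopro-original.mp4 -c:v libx264 -preset slow -crf 22 -c:a copy whiteboard-1080p.mp4
ffmpeg -i gopro-original.mp4 -c:v libx264 -preset slow -crf 22 -vf scale=-2:720 -c:a copy whiteboard-720p.mp4

The first keeps 1080 and simply compresses harder; the second also scales down to 720 for the larger savings mentioned above. The -crf value is the quality knob; values somewhere around 20 to 24 are a common starting point.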

By the way, I describe this as a re-encoding rather than transcoding because most often we use the same encoder (H.264) for both. Sometimes we use Google’s VP8/9/WEBM/whatever, as that sounds like it might be “the future”.

Bad: Re-Encoding in the Cloud

I love the idea of making re-encoding someone else’s problem. Various companies offer a video encoding service in the cloud; for example, Amazon has Elastic Transcoder. But that doesn’t solve the problem I actually have, which is that I want to never upload 5 GB in the first place.

Good: Ease of Re-Encoding / Transcoding

There are plenty of tools that will perform the re-encoding locally with a few clicks; I have used several of them recently, and many more in the past.

Worse: Encoding Speed

With the computers I most often use (a MacBook Air, a Mac Mini, various others), the encoding time for good compression is a multiple of real time. For example, in the experience yesterday that led to this blog post, 38 minutes of video took more than three hours to re-encode on my Air. (It would have been considerably faster on the Mac Mini, but still a long time.)
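For what it’s worth, with x264-based tools the main speed knob is the preset: the same nominal quality setting encodes much faster (and compresses somewhat less) with a faster preset. A sketch, again assuming ffmpeg with libx264 and placeholder file names:

ffmpeg -i input.mp4 -c:v libx264 -preset veryslow -crf 22 -c:a copy smaller-but-slow.mp4
ffmpeg -i input.mp4 -c:v libx264 -preset veryfast -crf 22 -c:a copy faster-but-larger.mp4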

Is There a Good Solution?

I’m hoping someone will point me to a piece of hardware that does high quality transcoding to H.264 or WEBM, at real-time speed or better, for a not-too-painful price. It seems like such a thing should exist. It is just computation, and there are ASICs to greatly speed some kinds of computations (like Bitcoin mining) at low cost. There are GPUs, which speed video rendering, but from the reviews I have seen, GPU-based video encoders tend to do a poor-quality job, trading off video quality and compression rate for speed. Is there a box that greatly speeds up transcoding while keeping high quality and high compression? Or is the only improvement a very fast (non-laptop) PC?

 

Code Reviews – Use a gate, not a drive-by

Does your team / project use code reviews? If not, I suggest starting today. There are countless resources online about how and why to do code reviews; there are numerous code review tools, open source and commercial.

Who should review code? A great question, perhaps for other blog posts.

When should code be reviewed? Hopefully also a great question, and the topic of this blog post.

Context

When making a recommendation, I’m trying to get in the habit of always mentioning the context, the situation in my work from which my recommendation arises. This is in response to seeing frustration when people try to follow advice, not realizing that their situation is completely unlike that of the writer.

Here at Oasis Digital, our context is mostly complex line-of-business applications, which are expected to live for a long time and are important to the business operations of the companies that use them. We are almost always replacing an existing system (which a company relies on) or expanding an existing system (which a company relies on).

Another critical part of our context: we have flexible and fast source control, and a group of developers who have learned how to use it competently. We have decoupled source control from build automation, so that we can perform builds of branches in progress without merging them to a mainline. We can achieve a sufficient degree of continuous integration without actually putting each change directly into the same code line.

Review Code Early

How long after code has been written should it be reviewed? Hours or days, not weeks. For this reason, we commit and push code frequently (to a disposable branch of course) where others can see it for early review. It can be a challenge to get past the ego-driven desire to make code perfect before exposing it to the scrutiny of others, but it is worthwhile. By exposing code to review and comment early in its life, a developer can get useful feedback early in a unit of work.
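If the source control in question is Git (the branch name and commit message below are invented for illustration), that early sharing is as simple as:

git checkout -b review/invoice-search
git commit -am "WIP: invoice search, first cut"
git push -u origin review/invoice-search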

Review Code Often

Another anti-pattern of code review is to look at a given proposed change only once or twice. That is sensible for a very small change, but for a complex change to a complex system it is best to perform code review repeatedly as work proceeds. For example, consider a feature that takes four weeks to implement. Our typical approach, which I recommend to others (if their context is reasonably similar to ours, see above) is something like this:

  • Review the code a few days in.
  • Review the code progress once a week or so.
  • Review the code when it is nearly ready to be called “done” and to go in the mainline.
  • If there are fixes needed, review those promptly so as not to cause delay.

Review Code Now

Code waiting for review is a drag on the speed of development; the right time to review code is now. A quick review now is probably more valuable than a deep review deferred.

Review Code as a Gate

In some organizations, code review happens asynchronously and sporadically, after code is already part of a project. This risks software quality collapse. Once code is in the mainline of a product, it is arguably too late to review. If a reviewer finds a problem with code already in a system, there will be pressure to sweep that problem under the rug and move on, or to convince yourself that it is not really a problem, or that quality doesn’t really matter, just this once.

The right answer is simple: use code review as a gate through which code must pass before it can become part of the mainline of the project. Use your source control system, and stack up proposed code changes that are nearly ready to go into the mainline. Then periodically perform your review process (whether it is in person, remote, synchronous, asynchronous, etc.). Once the code passes a final review process, then it goes in the mainline. If there are serious problems, the code sits in a branch, a developer improves it, and it tries again next time.
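Again in Git terms, a sketch (branch names invented): the gate is simply a merge that happens only after the review passes.

git checkout master
git merge --no-ff review/invoice-search

If the review does not pass, no merge happens; the branch accumulates fixes and comes back through the gate next time.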

We have had excellent results with this approach, maintaining software quality for years on end even through ongoing development team turnover.

 

Update Your Obsolete Packages

A Great Solution…

Maven, Leiningen, Nuget, Gradle, NPM, and numerous other package/dependency management tools are very helpful for modern (or perhaps post-modern) development, which typically involves numerous library dependencies.

These tools implement a fundamentally good and important idea:

  1. List the packages, and versions, your package/application depends on. In a text file. In the project. Where it can be diffed and merged.
  2. Run a command, and all the libraries are fetched and made available.

All of the above tools default to fetching from open source software repositories. Some or all of them can be easily configured to perform the same job with internal, closed-source repositories if needed.

All of the above tools are a large improvement over the bad old days, when adding a library meant a manual, recursive search of the internet for transitive dependencies.
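For concreteness, here is roughly what such a dependency list looks like in the NPM world; the package names and versions are purely illustrative:

{
  "name": "example-app",
  "version": "1.0.0",
  "dependencies": {
    "express": "~3.4.0",
    "lodash": "~2.4.1"
  }
}

Because it is plain text in the project, it diffs and merges like any other source file.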

… Leads to a New Problem

These tools make it so easy to “lock in” specific library versions, that projects can very easily fall far behind the current release versions of those libraries. To avoid this in our projects, a few times per year we upgrade all the libraries (timed to avoid doing it right before any important release dates).

I’ve seen this done by hand, looking up the current version of each library, and it is very tedious. Instead, a package/dependency manager ought to have an easy way to update versions. Sadly, as far as I know none of them have such a thing built in. Here are the add-on tools I’ve found so far:

NPM

Use npm-check-updates. The built-in “npm outdated” sounds like it might do the right thing, but it doesn’t (it only lists outdated packages; it does not update anything).

npm-check-updates -u
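That rewrites the version numbers in package.json; a follow-up install then fetches the new versions:

npm install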

Leiningen

Use lein-ancient.

lein ancient upgrade

Maven

The Versions Plugin does the job.

mvn versions:use-latest-releases
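If you would rather see a report of available updates before changing the pom, the same plugin offers a read-only goal:

mvn versions:display-dependency-updates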

Ruby

gem outdated

or

bundle outdated
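Both of those commands only report; to actually perform the upgrade (within the constraints declared in the Gemfile), Bundler uses:

bundle update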

Bower

There are numerous Stack Overflow questions asking for this functionality, but it is not present. To some extent, “bower list” will show you packages for which newer versions are available, then you can manually update them in your bower.json file.

Cocoapods

pod outdated

Others?

If anyone knows of similar tools for other dependency managers, I’ll be happy to add them to this list.

 

A Better Bash+Git Prompt

I enjoy a souped-up Bash prompt which radiates information about (for example) the branch I am on, in addition to the usual information (current directory). There are countless examples online, but the nicest I’ve found so far is this one from Martin Gondermann. I have it set to show just a little information:

[Screenshot: Bash prompt with Git info]

Martin shows a more sophisticated example, along with an explanation of the Git-related decorations, as shown:

[Screenshot: Bash prompt with the Git-related decorations explained]

Whether you prefer this one or any of the other similar tools, it’s worth the few minutes to upgrade your prompt from “$”.
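If you want a taste of this without adopting Martin’s full configuration, a minimal sketch for ~/.bashrc looks something like the following. It assumes the git-prompt.sh script that ships with Git is available; the path varies by system (the one shown is typical of a Homebrew install):

source /usr/local/etc/bash_completion.d/git-prompt.sh
export GIT_PS1_SHOWDIRTYSTATE=1
export PS1='\w$(__git_ps1 " (%s)")\$ '

The prompt then shows the current directory, the Git branch in parentheses when inside a repository, and a marker when the working tree has uncommitted changes.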

Clojure Conj 2013

[Photo: Clojure Conj 2013 venue]

Last week I attended Clojure Conj, the annual “main” event for the Clojure development community. Past events were held in Durham, NC at a typical conference hotel; this year’s event was held in Alexandria, VA, in the much more impressive venue shown here. I happened to look out my hotel room window at just the right moment, as you can see.

(I should mention of course, that Oasis Digital was a sponsor.)

As is often the case at software-related conferences, it is not so much about the learning (which you can achieve as well or better on the Internet) as about the community. Some of my current attention is on ClojureScript; I’ve been interacting with the group of people on its mailing list, but last week I had the opportunity to chat with several of them in person.

Here is an overall impression. Most of the content was very worthwhile. Many of the talks were at a relatively sophisticated technical level, which is very good for the audience at this event, mostly populated by people who are already in the Clojure world.

A few of the talks, which I will not identify out of politeness, were not so hot in terms of the value received versus the hour spent watching. My hope is that as the community grows and matures, there will be a greater supply of speakers who are more skilled at reliably delivering value in the time allotted. It would be great to raise that evaluation from “most of the talks were worth the time” to “all of the talks were worth the time”.

A few more specific notes:

data.fressian

Fressian is near to my heart because my previous company (which sold a Java Web Start SaaS application) used Hessian, after which Fressian is modeled and named. Hessian served us very well, and we were able to adjust its source code a bit to match a specific local need to traverse some object references and not others.

Prismatic Schema

Prismatic Schema is a very appealing piece of technology, which will quite likely make it into my projects.

[Photo: Clojure Conj 2013 venue]

Harmonikit

I get the impression that Rich Hickey can spend a relatively small number of hours and come up with yet another fascinating chunk of technology. Perhaps “on-demand”. Or even without any demand, just sitting in a hammock.

The most visually appealing part of the project was an off-the-shelf $50 product for building audio control GUIs on iPads. A talk about how to make things like that (in Clojure, if at Clojure Conj) might be even more interesting than this talk, which merely used it.

Programming with Hand Tools

Tim Ewald is an outstanding speaker. His talk about hand tools was a delight, partly because I spent many hours as a teen working on projects in wood, with a combination of hand tools and power tools. But I came away with a somewhat different impression than Tim did about the relative merits of power tools versus hand tools. This is probably an indication of my lack of skill, but I was always much happier with the results I could get with a power tool. A carefully used power tool could produce a piece of work perfectly straight, perfectly cut, etc., the first time. I remember in a required (!) shop class at school (do they even have those anymore?) cutting a dado with a hand saw and chisel. Neither my cut, nor any other, nor the teacher’s, was anywhere near the ideal easily obtained with a table saw and dado blade.

Still, that didn’t take away from the enjoyment of Tim’s talk at all. I think this is a talk people will mention at every future Clojure Conj to come.

 

Chuck Moore / GreenArrays at Strange Loop 2013

I think this was my favorite bit of Strange Loop 2013. Chuck Moore (inventor of FORTH and other interesting things) spoke about his work at GreenArrays on a novel CPU. The CPU consists of 144 separate “computers”; those computers are quite small in capability compared to what we typically think of as a computer or CPU. They are extremely RISC and startlingly efficient in terms of nanowatts per unit of computation. Chuck made a compelling case that a few years down the road this metric, energy expended relative to work done, will become a main driver in system design.

[Photo: Chuck Moore speaking at Strange Loop 2013]

Chuck is a low-level kind of guy, so he spent much of the talk discussing ways to program this array of tiny computers at the individual level. The techniques are quite radically different from any kind of programming I’ve ever heard of; for example, it is necessary to make two or three of these tiny computers collaborate over a wire/channel between them merely to communicate with main memory! Also, these tiny computers have an unusual six-bit “byte” and 18-bit “word”.

Now the obvious thing to want here is a compiler: a compiler that takes a program written in a high-level language and emits an array of programs to run on this array of computers to achieve the high-level program’s purpose. My understanding of the state of the art in compiler design is woefully incomplete, but I’m fairly confident that the thing needed here is far beyond what anyone is doing. I can’t find much about this concept on the GreenArrays website, surprisingly. I wonder if there are any research projects attempting to build such a compiler. Certainly this would be an extreme case of the quest for a “sufficiently smart compiler”.