ffmpeg – Kyle Cordes

SaaS: The Business Model – Video

On Feb. 27 at St. Louis Innovation Camp 2010, I gave a talk on the SaaS business model. I posted the slides, handout, audio, and transcript soon thereafter. Here is the 44-minute video of the talk, conveniently on YouTube:

But until I revisited this page in 2020, the video situation was much more complex. It took three months (back in 2010) to post.

Warning: Sausage-making Discussion Below

The following has nothing to do with the content of the video.

This is an H.264 video, shown here initially with a Flash-only player (FV WordPress Flowplayer). Later I’ll replace this Flash-only widget with one that offers HTML5 video (for iPad use, in particular), when I find one that works sufficiently well.

That’s the easy part, though. Getting this video to you here was an adventure, and not in a good way. Three recordings were made of the talk:

1. We hired a professional videographer to record the talk. When I say professional, I mean it only in the most literal way, i.e. the videographer charged money. They showed up with a nice camera and a wireless lapel mic… but somehow produced a broken video recording (the first 10-15 minutes were intermittent video noise). In addition, the mic gain was turned up way too high and thus the audio is awful.
2. Dave Blankenship recorded the talk on his consumer camcorder; he was not paid for this, yet he did a much better job. This video is usable all the way through, but arrived in an oddball format produced mostly by some models of JVC camcorders. The audio was not so hot, because he used the mic built in to the camcorder from the back of the room.
3. I recorded the audio using a $5 microphone plugged in to an iPod Nano, sitting on a table at the front of the room. It’s a bit noisy, but with a few minutes of work with Audacity (Noise Removal and Normalization), the results are much better than either video attempt.

Armed with this, I set about to somehow combine the video from #2 with the audio from #3. I sent emails describing this mess to several videographers I found on Craigslist. Most of them didn’t reply at all. I finally got a cost estimate from one, of many hundreds of dollars or more, and not much assurance of results.

Now I’m willing to spend some money to get good results, but spending it without confidence of results is less appealing; so I set about trying myself instead.

First, I cleaned the audio in Audacity as mentioned above.

Second, I watched the video and listened to the audio a few times, to find the approximate timestamp in each recording at which the talk actually started; each recording had a different amount of lead-in time.

Third, I grabbed ffmpeg, the Swiss Army knife of command-line video and audio processing. After reading a dozen web pages of ffmpeg advice, and a number of experiments (with short -t settings, to quickly see how well it works without waiting to transcode the whole thing), I ended up with this command to produce the encoded video:

ffmpeg -y -ss 40.0 -i Recording-3-audio-only-clean.wav -ss 95 -i Recording-2-video-ok-audio-bad.mod -shortest -t 18000 -vcodec libx264 -vpre normal -b 700k -threads 2 Cordes-2010-SaaS.m4v
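For anyone repeating this, the way the two start timestamps feed that command can be sketched in Python. This is only an illustration: build_sync_command and its parameter names are made up here, and the flags are copied as-is from the command above, as they worked with a 2010-era ffmpeg (newer builds may spell some of them differently).

```python
def build_sync_command(audio_path, audio_start, video_path, video_start,
                       out_path, preview_seconds=None):
    """Return an ffmpeg argument list pairing external audio with video.

    audio_start and video_start are the seconds into each recording at
    which the talk begins. Each -ss applies to the -i that follows it, so
    both inputs are trimmed to start at the same moment of the talk.
    preview_seconds (hypothetical name) gives a short -t for quick tests.
    """
    duration = 18000 if preview_seconds is None else preview_seconds
    return [
        "ffmpeg", "-y",
        "-ss", str(audio_start), "-i", audio_path,
        "-ss", str(video_start), "-i", video_path,
        "-shortest", "-t", str(duration),
        "-vcodec", "libx264", "-vpre", "normal", "-b", "700k",
        "-threads", "2",
        out_path,
    ]


# A 30-second trial encode, to check the sync before the long run.
print(" ".join(build_sync_command(
    "Recording-3-audio-only-clean.wav", 40.0,
    "Recording-2-video-ok-audio-bad.mod", 95,
    "Cordes-2010-SaaS.m4v", preview_seconds=30)))
```

If the sync looks slightly off in the trial, nudging one of the two start values by a fraction of a second and re-running the short encode is much quicker than transcoding the whole talk.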

I then noticed that the MacPorts installation of ffmpeg omits the important qt-faststart tool, and found this helpful version of qt-faststart and used it instead, on my Mac; later I switched to a Linux machine with an ffmpeg install including qt-faststart. Without the faststart step, the metadata in the m4v file is arranged in a way that prevents progressive/streaming play-while-downloading.
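To see whether a file still needs the faststart step, one can check the order of its top-level atoms: the metadata lives in the "moov" atom and the media data in "mdat", and progressive playback wants "moov" first. The following is a rough sketch, not a replacement for qt-faststart; the function names are invented for this example, and it only inspects the file without rewriting anything.

```python
import struct


def top_level_atoms(f):
    """Yield (type, offset, size) for each top-level atom of an MP4 file.

    f is a seekable binary file object. Each atom starts with a 4-byte
    big-endian size and a 4-byte type code.
    """
    f.seek(0, 2)
    end = f.tell()
    offset = 0
    while offset + 8 <= end:
        f.seek(offset)
        size, kind = struct.unpack(">I4s", f.read(8))
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > end:
                break
            size = struct.unpack(">Q", f.read(8))[0]
            header = 16
        elif size == 0:
            # A size of 0 means the atom runs to the end of the file.
            size = end - offset
        if size < header:
            break  # malformed atom; stop rather than loop forever
        yield kind.decode("latin-1"), offset, size
        offset += size


def is_faststart(f):
    """Return True if the moov atom comes before the mdat atom."""
    order = [kind for kind, _, _ in top_level_atoms(f)]
    return ("moov" in order and "mdat" in order
            and order.index("moov") < order.index("mdat"))
```

Usage would be something like opening the m4v with open(path, "rb") and passing the file object to is_faststart.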

The results are good but not great:

The video has some motion/interlace artifacts; these were present in the original recording, and I’m not aware offhand of what to do about them.
The video camera used rectangular (non-square) pixels; the stored frame has a 3:2 aspect ratio while it is intended for display at 16:9. I wasn’t able (at least in 20 minutes of learning and experimentation) to get the 16:9 output working correctly, so if you grab the underlying m4v file you can see the aspect ratio a bit off in the shape of the clock on the wall, for example.
The audio-video sync is adequate (and plenty good enough to follow along) but not perfect. Clearly using the audio track on a video recording is much better than putting them together in post-processing.
The audio is not as good as if I used a lav or headset mic, though I think it’s quite remarkably good for a $5 mic plugged in to an iPod.
I’ve no idea if ffmpeg complies with any of the relevant copyrights/patents/whatever in video production, though it seems hopefully safe to use for a one-off non-commercial video like this. (Normally I use Apple’s iMovie for my videos, and I assume Apple has taken care of such things.)
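On the aspect ratio item above, the mismatch between a 3:2 stored frame and a 16:9 display shape comes down to simple arithmetic: the shape each pixel must have is the display aspect divided by the storage aspect. Here is a small sketch of that calculation. The 720x480 frame size is an assumption (it is a common frame size with a 3:2 shape), and the calculation ignores the finer broadcast conventions about active picture width.

```python
from fractions import Fraction


def pixel_aspect_ratio(stored_width, stored_height, display_aspect):
    """Return the pixel shape needed to show a stored frame at display_aspect."""
    storage_aspect = Fraction(stored_width, stored_height)
    return display_aspect / storage_aspect


# Assumed 720x480 storage, intended for 16:9 display.
par = pixel_aspect_ratio(720, 480, Fraction(16, 9))
print(par)  # 32/27, i.e. each pixel should be drawn wider than it is tall
```

A player that ignores this and draws square pixels shows the picture too narrow, which is what makes a round clock look like an oval.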

A few morals of this story:

Get some powerful tools, and learn how to use them.
Be willing to pay for professional work, but be skeptical. Just because you pay, doesn’t mean it will be quality work.
Have a plan B. If I had assumed that at least one of the two videos would get decent audio, and skipped my own audio recording, I’d not have been able to deliver the acceptable audio here. If Dave had assumed that my professional videographer would produce results, and turned off his camera, we’d have no video here at all.

Pipe RGB data to ffmpeg

A while back I asked on the ffmpeg mailing list how to pipe RGB data in to ffmpeg. I described it as follows:

in my code I am building video frames, 720x480x24bit. I have in mind generating a large number of these, as long as a full DVD worth at 30fps, then using ffmpeg (followed by dvdauthor) to encode them in to MPEG2 for DVD usage.

There were a few replies, but no definitive answer. With considerable experimentation, I got it to work. It turns out that (as far as I can tell) ffmpeg does not have the ability to accept piped in RGB frames. It will however accept piped in data in its “yuv4mpegpipe” format. With some searching and reading I found that this is roughly akin to the format of raw DV video; the stream starts with a header line something like this:

YUV4MPEG2 W%d H%d F%d:%d Ip A0:0 C420mpeg2 XYSCSS=420MPEG2

… then an LF character. Each frame that follows is a short “FRAME” line (also ending in LF) and then the data for the Y, U, and V “planes”. The Y data is full resolution, while the U and V are half-resolution in each direction (this is called “420” in the video world). These planes are uncompressed, one byte per sample. All of my past work with computer video (going back to Commodore 64s and Apple IIs) has arranged all of the bits for each pixel within a few bytes of each other; this format (with all the Y data for the whole frame, then all the U data, then all the V data) is starkly different.
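To make the layout concrete, here is a minimal Python sketch of writing such a stream. In the yuv4mpegpipe format as documented in the mjpegtools man pages, the header line appears once at the start of the stream, and each frame is then introduced by a short FRAME line before its three planes. The function names are invented for this illustration, and even frame dimensions are assumed.

```python
import io


def y4m_stream_header(width, height, fps_num, fps_den):
    """Return the one-time stream header, using the fields shown above."""
    line = "YUV4MPEG2 W%d H%d F%d:%d Ip A0:0 C420mpeg2 XYSCSS=420MPEG2\n" % (
        width, height, fps_num, fps_den)
    return line.encode("ascii")


def write_y4m_frame(out, width, height, y_plane, u_plane, v_plane):
    """Write one frame: a FRAME line, then the Y, U, and V planes in order."""
    chroma_size = (width // 2) * (height // 2)
    if len(y_plane) != width * height:
        raise ValueError("Y plane must hold width * height bytes")
    if len(u_plane) != chroma_size or len(v_plane) != chroma_size:
        raise ValueError("U and V planes must each hold a quarter as many bytes")
    out.write(b"FRAME\n")
    out.write(y_plane)
    out.write(u_plane)
    out.write(v_plane)


# One mid-gray 720x480 frame at 30 fps, written to an in-memory buffer.
buf = io.BytesIO()
buf.write(y4m_stream_header(720, 480, 30, 1))
write_y4m_frame(buf, 720, 480,
                bytes([128]) * (720 * 480),
                bytes([128]) * (360 * 240),
                bytes([128]) * (360 * 240))
print(len(buf.getvalue()))
```

Any writable binary stream works in place of the buffer, including the stdin of another process.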

The essential problem remaining was how to convert RGB to YUV. Happily there are plenty of online references for this. Unhappily there are few fast implementations, and a naive implementation will be very slow. I solved this problem by finding and hiring an expert in low-level data processing with MMX, SSE2, etc. instructions. (I am not in a position to publish that code here.)
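For reference, the naive version looks something like the sketch below. It is not the optimized code mentioned above. It assumes the common BT.601 "studio range" integer approximation (the post does not say which coefficients were used), packed 8-bit RGB input, and even frame dimensions; rgb_to_yuv420 is a name invented for this example. In pure Python this is far too slow for real-time work, which is exactly the problem described.

```python
def rgb_to_yuv420(rgb, width, height):
    """Convert packed RGB bytes to separate Y, U, and V planes (4:2:0).

    rgb holds 3 bytes per pixel, row by row. Chroma is averaged over each
    2x2 block of pixels. Coefficients are an assumed BT.601 approximation.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    cw, ch = width // 2, height // 2
    y_plane = bytearray(width * height)
    u_sum = [0] * (cw * ch)
    v_sum = [0] * (cw * ch)
    for row in range(height):
        for col in range(width):
            i = (row * width + col) * 3
            r, g, b = rgb[i], rgb[i + 1], rgb[i + 2]
            y_plane[row * width + col] = (
                (66 * r + 129 * g + 25 * b + 128) >> 8) + 16
            c = (row // 2) * cw + (col // 2)
            u_sum[c] += ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
            v_sum[c] += ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    # Each chroma sample is the rounded average of its four pixels.
    u_plane = bytes((s + 2) // 4 for s in u_sum)
    v_plane = bytes((s + 2) // 4 for s in v_sum)
    return bytes(y_plane), u_plane, v_plane


# A 2x2 all-white frame becomes Y=235 with neutral chroma (128).
y, u, v = rgb_to_yuv420(bytes([255] * 12), 2, 2)
print(list(y), list(u), list(v))
```

The inner loop does a handful of multiplies and shifts per pixel, which is why it maps so well onto SIMD instructions when written in a lower-level language.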

In retrospect, though, there are routines included in Intel’s “Integrated Performance Primitives” library which perform this transformation in a highly optimized way. IPP is a bargain: for only a few hundred dollars you get a wealth of highly optimized ready-to-use library routines for signal processing.

The ffmpeg piping solution consists, therefore, of:

A module which generates frames in RGB format, to contain whatever contents your application requires.
A module to very quickly convert these to YUV in yuv4mpegpipe format (write your own, or use routines in IPP, for the RGB->YUV420 part).
Pipe this data stream to ffmpeg with stdin; ffmpeg is invoked something like this: ffmpeg -y -f yuv4mpegpipe -i - -i audio.mp3 -target ntsc-dvd -aspect 4:3 foo.mpg
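The last step can be driven from a program rather than a shell pipe. Below is a hedged sketch using Python's subprocess module; the function names are invented here, the ffmpeg arguments are the ones from the invocation above, and it assumes ffmpeg is on the PATH and that the caller supplies an iterable of byte chunks already in yuv4mpegpipe format.

```python
import subprocess


def dvd_encode_command(audio_path, out_path):
    """Return the ffmpeg argument list; '-i -' makes ffmpeg read video from stdin."""
    return ["ffmpeg", "-y", "-f", "yuv4mpegpipe", "-i", "-",
            "-i", audio_path,
            "-target", "ntsc-dvd", "-aspect", "4:3", out_path]


def encode_stream(chunks, audio_path, out_path):
    """Feed yuv4mpegpipe bytes to ffmpeg and return its exit code.

    chunks is any iterable of bytes objects (stream header first, then
    frames), so frames can be generated lazily instead of held in memory.
    """
    proc = subprocess.Popen(dvd_encode_command(audio_path, out_path),
                            stdin=subprocess.PIPE)
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)
    finally:
        # Closing stdin tells ffmpeg the video stream has ended.
        proc.stdin.close()
    return proc.wait()


print(" ".join(dvd_encode_command("audio.mp3", "foo.mpg")))
```

Because the pipe applies back-pressure, the frame generator naturally slows down to whatever rate ffmpeg can encode.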

By using a multicore CPU and threads, this whole process can be made to happen in real time or better (i.e., one second of “wall clock” processing time, for one second of finished MPEG2 video). The resulting MPEG2 file can be used with a DVD authoring application to produce a ready-to-burn DVD ISO image.

Update: the data format above is published here as part of the mjpegtools man pages.

Make a DVD with ffmpeg

For a project we have going at Oasis Digital, we have explored various libraries for creating video DVDs from computer-generated content under program/script control. There are quite a few ways to do this; one that is appealing for a command-line junkie is the combination of ffmpeg, dvdauthor, and mkisofs. It took considerable research to figure out what commands to string together for a simple scenario:

you have some video in AVI format (for example, an MJPEG AVI from a DV video camera)
you have some background music in mp3 format
you want a simple one-title one-chapter DVD with that video and audio

There are plenty of sites with long and complex sets of commands to accomplish these things. But for this simplest case, the essential commands are:

ffmpeg -y -i video.avi -i audio.mp3 -target ntsc-dvd -aspect 4:3 dvd.mpg

mkdir DVD

dvdauthor -x file.xml # there is a way to avoid the file by putting a few more options here

mkisofs -dvd-video -o dvd.iso DVD
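The file.xml passed to dvdauthor is not shown above. For the one-title, one-chapter case it can be very small; the sketch below generates a plausible minimal version from Python. Treat the exact contents as an assumption based on the dvdauthor XML format rather than the file actually used for this project; dvdauthor_xml is a name invented for this example.

```python
import xml.etree.ElementTree as ET


def dvdauthor_xml(mpeg_path, dest_dir):
    """Return a minimal dvdauthor control file: one titleset, one title."""
    root = ET.Element("dvdauthor", dest=dest_dir)
    ET.SubElement(root, "vmgm")  # empty top-level menu section
    titleset = ET.SubElement(root, "titleset")
    titles = ET.SubElement(titleset, "titles")
    pgc = ET.SubElement(titles, "pgc")
    ET.SubElement(pgc, "vob", file=mpeg_path)  # the MPEG2 file from ffmpeg
    return ET.tostring(root, encoding="unicode")


# Matches the names in the commands above: dvd.mpg in, DVD directory out.
print(dvdauthor_xml("dvd.mpg", "DVD"))
```

Saving that output as file.xml lets the dvdauthor step find both the input MPEG and the output directory that mkisofs then turns into an ISO.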

Of course, there is considerable other work involved in wiring up a full solution, but that is more project specific. I hope these example commands shorten the research time for the next fellow who needs to do this core processing.