Audio for Video Part I: The Basics

Updated: March 1, 2022. “I’m not a sound person.” This is the typical refrain from image-oriented creators who are afraid of, or confused by, the complexities of audio for video. Indeed, sound is its own department, its own skill set, and those who specialize in sound are focused on achieving more than simple sound capture — they are dedicated to the full sound design in a visual piece.

Sound is immersive. Without sound, we don’t know what is going on in a scene involving dialogue. Sound brings another dimension to the visuals: as a train passes by in a film, the sound mixer pans the audio from left to right to bring that spatial relationship to the scene. Filmmakers also use music to bring an emotional quality to the film, whether it is the creepy music that indicates something scary is about to happen, or a melancholic song that brings a tear to your eye in a tender, dramatic scene. Sound in the form of dialogue and music provides us information and emotion, respectively.

But, on the independent filmmaker scale, most often we do not have the budget for a sound professional, let alone a whole department and we must make do with what we have. Many filmmakers starting out have discovered that the microphone on their DSLR or mirrorless camera is complete garbage. Some of this has to do with the electronics and some of it has to do with the proximity of the mic to the sound source. I’ll never forget the first time I had a $1,800 professional shotgun microphone on the other end of my headphones. I could hear the insects buzzing around the lawn! The clarity and detail were phenomenal!

This series, “Audio for Video,” is aimed at helping beginning filmmakers make sense of sound and what they should look for in equipment. I’ll also cover a variety of common filmmaking situations that you may run into that require specific considerations for your sound tools.

This article, “Part One: The Basics,” is aimed at the very basics of understanding sound, what you are capturing, how it is captured, and the resulting electronic signal that is recorded. Subsequent articles will focus on the specific situations and gear considerations, but it’s imperative that you start with this set of knowledge first before looking at gear.

What is sound? How is it being recorded digitally?

Sound is the vibration of air molecules. These vibrations travel as waves; one complete start-to-finish wave is a cycle, and one cycle per second is one hertz (Hz).

The vibrations happen so fast that a typical audible sound measures in the hundreds or thousands of hertz. For example, middle C on a piano is about 262 of these cycles per second. That’s right, 262 per second! This is written as 262Hz. The number of cycles per second is called the frequency.

Pitch is the word we use to describe how “high” or “low” the sound is. Think of the sound of the highest C note on a typical piano. This pitch is around 4,000 cycles per second, typically written as 4kHz (4 kilohertz, or 4,000Hz).
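To put numbers on the link between pitch and frequency, here is a minimal sketch in Python. It assumes standard equal-temperament tuning with the A above middle C set to 440Hz, which is my assumption and not something specific to any one piano; the function name is just for illustration. With that tuning, middle C lands near 262Hz and the top C on an 88-key piano lands a little above 4kHz.

```python
# Assumption: equal-temperament tuning with A4 = 440 Hz (a common reference pitch).
A4_KEY = 49     # A above middle C is the 49th key on an 88-key piano
A4_HZ = 440.0


def piano_key_frequency(key_number: int) -> float:
    """Return the frequency in Hz of a key on a standard 88-key piano."""
    # Each key is one semitone; twelve semitones double the frequency.
    return A4_HZ * 2 ** ((key_number - A4_KEY) / 12)


for name, key in [("Middle C (C4)", 40), ("A4", 49), ("Highest C (C8)", 88)]:
    print(f"{name}: {piano_key_frequency(key):.1f} Hz")
```

Every step up of twelve keys (one octave) doubles the frequency, which is why the top of the keyboard climbs into the kilohertz range so quickly.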

Human hearing extends from around 20Hz to 20kHz. A dog whistle clocks in at about 22kHz, just outside of the range of our hearing. Inside our inner ear are tiny hair cells that pick up these vibrations. As we age, we gradually lose sensitivity across this range, but typically it is the higher frequencies (specifically 4kHz) that will go first! Hearing Instrument Specialist Alyssa Keith from Vibrant Hearing in Missoula, Montana, explains, “The hair cells in your inner ear that are responsible for high frequency sounds are positioned in a way that makes them more vulnerable to damage, whether it be from music/noise exposure or even exposure to certain chemicals. Therefore, regardless of its etiology, hearing loss begins in the higher frequencies and will gradually bleed into a wider bandwidth of sound.” So, wear your earplugs at those rock concerts!

Frequency, then, is the rate of these vibrations. It can be measured by instruments, and it maps to the pitches we hear.

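Ultrasonic devices operate above 20kHz, beyond the range of our hearing. You will also see much higher numbers, in the megahertz (MHz), on audio gear, but these describe something different: wireless microphone transmitters send their signal as radio waves, which are electromagnetic rather than sound, so there is nothing for us to hear. Something rated at 500MHz oscillates 500,000,000 times per second. Lectrosonics, one of the top-of-the-line makers of wireless microphone systems, builds transmitters that operate in these radio frequencies.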

When we use a microphone to capture a vibration, it captures this energy and converts it to an electrical signal, which is sent down a wire and then converted from analog into a digital signal made up of “samples”. You can think of these samples as miniature photographs of the wave at that moment. Just as a camera with more pixels resolves a more detailed photograph, more samples of a wave result in a higher fidelity recording of the sound. Therefore, our recording devices have a sample rate that you can set, which is how many times per second the sound is measured along that wave.

44.1kHz is typical of what you would hear on an mp3 or on the internet. (Yes, that is 44,100 little samples per second!)
48kHz is a standard setting for most digital recorders.
96kHz and higher is considered a high-fidelity recording — usually reserved for professional-level recordings to capture the intricate details.
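One way to picture these settings is to count how many “photographs” each sample rate takes of a single cycle of a tone. The sketch below does that arithmetic for a 4kHz tone; the function name is mine, purely for illustration.

```python
def samples_per_cycle(sample_rate_hz: float, tone_hz: float) -> float:
    """How many samples the recorder takes during one cycle of a tone."""
    return sample_rate_hz / tone_hz


TONE_HZ = 4_000  # roughly the highest C on a piano

for rate in (44_100, 48_000, 96_000):
    count = samples_per_cycle(rate, TONE_HZ)
    print(f"{rate / 1000:g}kHz sample rate: {count:g} samples per cycle of a 4kHz tone")
```

Doubling the sample rate doubles the number of snapshots of every cycle, which is the sense in which a higher rate captures more detail.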

Along with the sample rate, recorders will have a bit depth, which is the amount of detail with which each sample is captured.

The higher the bit-depth, the more-detailed the sample. To take the deep dive into how computers do this, you can visit this article from PreSonus.com. I like how the author of the article describes bit depth as: “Working with a higher bit depth is like measuring with a ruler that has finer increments: You get a more precise measurement.”
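The ruler analogy can be sketched in a few lines of Python. This is a simplified illustration, not how any particular recorder works: it assumes samples are plain signed integers and rounds one arbitrary value to the nearest step available at each bit depth.

```python
import math


def quantize(value: float, bit_depth: int) -> float:
    """Round a sample in the range -1.0..1.0 to the nearest step that a
    signed integer with bit_depth bits can store (simplified model)."""
    steps = 2 ** (bit_depth - 1) - 1  # largest positive value, e.g. 32767 for 16-bit
    return round(value * steps) / steps


sample = math.sin(1.0)  # an arbitrary point on a sine wave, about 0.84

for bits in (8, 16, 24):
    worst_case = 0.5 / (2 ** (bits - 1) - 1)  # half of one ruler increment
    error = abs(sample - quantize(sample, bits))
    print(f"{bits}-bit: {2 ** bits:,} levels, "
          f"this sample is off by {error:.1e} (worst case {worst_case:.1e})")
```

Each extra bit doubles the number of increments on the ruler, so the worst-case rounding error is cut in half.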

Typical settings are 48kHz sample rate at 24bit.
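It helps to know what those settings cost in storage. Here is a rough estimate, assuming uncompressed audio (as in a typical WAV file) and ignoring the small file header; the function name is my own.

```python
def pcm_bytes_per_minute(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Approximate size of one minute of uncompressed audio, in bytes."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


mono = pcm_bytes_per_minute(48_000, 24)
stereo = pcm_bytes_per_minute(48_000, 24, channels=2)
print(f"48kHz/24bit mono:   {mono / 1_000_000:.2f} MB per minute")
print(f"48kHz/24bit stereo: {stereo / 1_000_000:.2f} MB per minute")
```

Every additional track you record adds the same amount again, which is worth remembering when you size SD cards for a multi-track recorder.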

When we hit “Record”, it records this information in a waveform that gives us a visual representation of the amplitude and loudness peaks of the recorded audio.

To keep it simple, if your camera is also recording scratch audio along with an external recorder, you should set them to the same sample rate, as well as setting the recorder at the same frame rate that the camera is shooting — 48kHz at 24bit at 23.976fps, for example.

What is a microphone?

There are generally three types of microphones: Dynamic, Condenser and Ribbon. The dynamic microphone has a thin disk, or diaphragm, attached to a coil of wire that sits in the field of a magnet. When you speak into the mic, the diaphragm vibrates and moves the coil within that magnetic field, generating a voltage that travels down the wire. This process is known as electromagnetic induction. The voltage is very low and is considered a “mic level” signal.

A condenser microphone has a thin diaphragm that is actively charged by voltage; these types of microphones require a small amount of voltage to be supplied to them, up the wire, to operate. This is called “phantom power” and many mixers and recorders will have a button on them to send this extra bit of juice up to the mic to make it work. Dynamic mics do not need phantom power; condenser mics do.

Ribbon mics have a thin ribbon of metal suspended between the poles of a magnet, and when the ribbon vibrates it creates voltage through electromagnetic induction. Ribbon mics are very sensitive, not only to sound but to the elements, so they rarely leave indoor studios and are most often used for musical instruments. Because of their fragile nature, the usual advice is to never send phantom power to a traditional (passive) ribbon microphone, because it can damage the ribbon.

Lavalier mics and condenser boom mics are the most common in independent filmmaking — whereas ribbon mics, bi-directional mics and handheld dynamic mics are most common in music and podcasting.

Microphone Pick-Up Patterns a.k.a., “Polar Patterns”

Think of a microphone as something like an ear that “hears”. There is a particular “hearing zone” that each microphone operates best in. These zones are called Polar Patterns and they are charted visually as a part of a microphone’s specifications. Each pattern works best in certain scenarios, so keep in mind that there is not one mic that does it all. You have to pick the right one for the type of sound you are looking to capture.

Omni-directional

Omni-directional patterns pick up on all sides. They are usually ideal for lavalier “clip-on” style microphones as they can be positioned in any direction and still pick up sound. You will often see this sort of microphone clipped on lapels, but they can also be useful as what are called “plant mics.” Say you are filming a scene with a shirtless actor, it’s a wide shot so you can’t have a boom, and the actors are sitting on a couch with a coffee table in front of them. You may choose to put a prop on the table, like a houseplant, and “plant” the plant mic in the plant! This is one use of a plant mic. Another is a scene where actors are driving a car. Often the mic is planted in the visor just above their heads to capture clean sound in the interior of the car. These mics may also be lavalier in design, but uni-directional in polar pattern to reject ambient car noise.

Cardioid

The cardioid pick-up pattern is one of the most common among microphones. Named as such because the pattern resembles a heart, its pick-up is primarily in the front while rejecting sound from behind it and allowing a bit in from the sides. You see these mics most frequently used as vocal and instrument microphones for musicians, such as the Shure SM58 or the Shure SM57. This pattern is also typical of podcast microphones or radio broadcaster/DJ microphones such as the Shure SM7B.

Bi-Directional

The bi-directional pattern is just as you might imagine: it picks up at the front and rear (0 degrees and 180 degrees) but rejects from the sides. These mics are useful for podcast situations with the speakers facing each other, but are most often used as instrument microphones or when recording in what is called “Mid-Side” or “Blumlein”, which are techniques for capturing in stereo. The CAD M179, which supports all the various pick-up patterns, offers a bi-directional polar pattern. And the Cascade FAT HEAD II mic is a bi-directional ribbon microphone.

Super-Cardioid

This type of pick-up pattern is ideal for rejection from the sides as well as when working in indoor environments where there is a lot of echo/reverb off the walls. These mics do a better job at rejecting these reflections. The Sennheiser MKH-50 and Sennheiser MKH 60 are commonly considered the professional’s choice and have been in use in tons of famous motion pictures.

Hyper-Cardioid

The hyper-cardioid types of mics have an even tighter pick-up pattern and are a bit harder to handle or “aim”. These mics are also ideal for interior dialogue work, such as indoor interviews or narrative scenes filmed in tight spaces, or in musical situations where rejecting other sounds nearby is critical. The Audix SCX1HC is a good example of a hyper-cardioid. Audix mics are commonly used in music recording when needing to reject sounds from other sides of the mic. This specific mic works fine for small-budget video work, but it can be susceptible to radio frequency interference (RFI) and thus, I would not deem it reliable for critical video/film work as this RFI can ruin a take. If you use this mic for video work, it can record pleasing dialogue indoors — just make sure everyone turns their phones to OFF when it is in play. I have had the 3G network interference sneak into this mic on a shoot because someone on-set was texting.

Shotgun

The shotgun mic is typically what you see on the end of a boom pole in outdoor situations. These mics have a long “reach” but with this is the trade-off that they also pick up quite a bit at the rear of the microphone and are not a good choice for tight interior scenes as they will “hear” the room and its reflections more so than a hyper or super-cardioid microphone. They are ideal for outdoor interview or narrative capture and with this you will want to invest in wind protection for a mic like this. The Sennheiser MKH-416 is the professional’s choice, but the Rode NTG-3 is also a great choice for this sort of situation.

The pick-up patterns that a filmmaker will most often utilize are shotgun, omni-directional lavalier, and super-cardioid. Rarely does a video project need a bi-directional microphone, although I have seen this in conjunction with films about musicians where the musicians are using a bi-directional microphone and this audio is also used in the film. There is a great scene in the film Echo in the Canyon where Bob Dylan’s son, Jakob, sings a duet with Norah Jones and they are standing on opposite sides of a bi-directional ribbon microphone.

From this point, we need to get our microphone into the next device in our “signal chain” which would be either a Mixer or a Recorder — or a device that does both. A mixer is a multi-channel device that can take input from more than one microphone or sound source. From there the user can “mix” the relative volume levels of the sounds, as well as “pan” them either Left or Right or Center in the stereo field (left speaker, right speaker, both speakers evenly). Some mixers are only that, and then they carry that mixed sound to a video camera that can record the sound synched to the picture. Other devices are ONLY recorders that don’t sync the sound to the camera. And some devices are both a mixer and recorder allowing the most flexibility to have multi-channel sound mixed and then recorded immediately to an SD card, and in some cases also sent to the camera.

Line Level vs. Mic Level

There are two types of electronic signal being sent either to the mixer or from the mixer. One type is very “hot” called Line Level. And the other is a weaker signal called Mic Level. Think of it like an adjustable flow on a shower head. You have your hard-pressure setting when you want a massage. And then there is the slower rainfall setting. The hard-pressure setting is more like Line Level and the slower setting is more like Mic Level.

Most “microphones” will be a Mic Level signal; you use the mixer/recorder to amp up the signal with “Gain” to a specified, safe level for a clean recording.

A Line Level signal is much beefier and you have to set your mixer/recorder to accept this level of amplitude. Think of it this way: A singer sings into a microphone. This weak signal heads up the wire to the mixing console where it is then sent to an amplifier. The amplifier gains the signal up louder to project through large speakers at a concert. Line Level needs less of this amplification as it is already a hot signal coming into the mixing console.

The trick is to know what your source is: microphone? Mic Level. Some type of electronic? Possibly Line Level. Another decent way to consider this is if the source is already amplified — and you take a cable out of that source — it is Line Level. Sometimes sound engineers will take a cable out of the back of a guitar amp to get a clean feed of that sound. The amp sits on the stage and will only project so far beyond the musician, so they take a cable out of the amp and run that to the mixer to be able to send that guitar through the main speakers, too. This is a Line Level signal because it’s already coming from the guitar to the amplifier, then to the mixing console. There are also times that I have taken a patch off of a P.A. speaker at a wedding — this, too, is Line Level. I’ll discuss that further in a future article on wedding sound.

If you have your mixer set to Mic Level and you feed it a Line Level signal, you’ll know it! It will sound like crazy distorted noise. And if you feed a Mic Level signal to a mixer that is expecting Line Level, it will be extremely weak and you’ll have to gain it up way too much. Be sure to match the source and input levels. Whenever connecting devices, as they say, “Read the Manual.”
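To get a feel for how far apart these two signals are, here is a rough sketch that converts levels in dBu to volts. The specific figures are my assumptions, based on commonly quoted nominal values rather than on any particular device: professional line level at +4dBu and a microphone signal somewhere around -50dBu. Check the manual for the real numbers on your gear.

```python
DBU_REFERENCE_VOLTS = 0.7746  # 0 dBu is defined as roughly 0.7746 volts RMS

# Assumed nominal levels for illustration only; real gear varies.
LINE_LEVEL_DBU = 4.0
MIC_LEVEL_DBU = -50.0


def dbu_to_volts(dbu: float) -> float:
    """Convert a level in dBu to volts RMS."""
    return DBU_REFERENCE_VOLTS * 10 ** (dbu / 20)


def gain_needed_db(source_dbu: float, target_dbu: float) -> float:
    """Gain in dB required to bring a source up to a target level."""
    return target_dbu - source_dbu


print(f"Line level: {dbu_to_volts(LINE_LEVEL_DBU):.3f} V")
print(f"Mic level:  {dbu_to_volts(MIC_LEVEL_DBU) * 1000:.2f} mV")
print(f"Gain to bring mic up to line: {gain_needed_db(MIC_LEVEL_DBU, LINE_LEVEL_DBU):.0f} dB")
```

Under these assumptions the line signal is several hundred times the voltage of the mic signal, which is why a mismatch in either direction is so obvious in your headphones.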

In-Camera vs. External “Second System” Sound

One consideration you will be faced with is how to sync this sound to your video. The camera is capturing the visual and may have the option to capture sound, too. In most DSLR and mirrorless cameras, you have a built-in microphone which is subpar. You may also have a microphone input that is 1/8″ (3.5mm) in size that can take a microphone cable; you can attach a better microphone and record the sound directly to your device.

However, not all devices record sound the same. One benefit of recording directly to your camera is negating the need to sync the sound in software later. But, the drawbacks to this are many. One drawback is that most of the electronics in a typical DSLR/mirrorless system are devoted to the visuals, not the audio components. These components are referred to as “preamps” and the preamps in most cameras are not high-end and will produce an audible hiss that is difficult to remove. If this is not an issue and/or your need to have the mic on-board the camera and plugged into the camera outweighs your need for stellar audio, then this is a fine way to go.

The most important rule is to get the mic as close to the sound source as reasonably possible. We’ve all been to a wedding where the speeches start up and the speaker is holding the microphone by their belly button and everyone is straining to hear what they are saying. Get the microphone up to your lips!

Every time I’ve played music on stage, the sound engineer will tell me to “kiss the mic”, meaning when I’m doing the microphone check, to get my lips right up near the mic.

With sound-for-video, we have the consideration of what the camera is seeing. Perhaps it is a wide shot of two actors and there’s no way to get a boom mic in close enough without it being seen in the film. In that case, we’d hide lavalier microphones on the talent. But the closer you can get that microphone to their mouth, the clearer the sound will be — within reason. When using condenser mics, you don’t need it right up to their lips. Within 12-18″ from their lips is the “sweet spot”, and middle of the chest is generally ideal for most lavalier microphones.

Therefore, in times where we have a wide shot, and we are working in a system of wanting to plug the microphone into the camera to sync the sound, we either need to run a very long cable back to camera, or go wireless. The biggest drawback to this, besides the quality of the preamps, is that we don’t have a good way to capture any backups or safety tracks like we can get with external recording.

Second-system sound, or external recording, is when we use a separate recorder to record the sound and use several other methods to sync this sound back to the video in post-production. This method offers the opportunity for the highest fidelity recordings, the most backup and safety tracks, and the ability to mix a lot of sound at once. The only real issue with this method is it requires a separate device and, in some cases, another person to either run this device and the microphone, or two people: One to run the device and the other to handle the microphones. But it is the best method for quality, reliability, organization and flexibility.

A typical on-board microphone might be a Rode VideoMic Pro that sits on the hot-shoe of your DSLR/mirrorless, and plugs into the camera via the 1/8″ microphone port. You then get into the audio settings of your camera and adjust the audio level manually for the optimum gain of that sound source. Typically this means having the meter bounce around the -12 mark on the camera’s “decibels relative to full scale” (dBFS) meter. But if your talent decides to start screaming, or becomes really quiet, you will have an audio file that is either unrecoverable due to distortion, or so quiet that you have to gain it up in post-production, which brings up the hiss of the “Noise Floor” along with it.
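The numbers on a dBFS meter are measured down from the loudest value the recorder can store, so 0 is the ceiling and everything else is negative. A small sketch of the conversion, with function names of my own choosing:

```python
import math


def amplitude_to_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = digital full scale) to dBFS."""
    if amplitude == 0:
        return float("-inf")  # digital silence has no finite dBFS value
    return 20 * math.log10(abs(amplitude))


def dbfs_to_amplitude(dbfs: float) -> float:
    """Convert a dBFS reading back to a fraction of full scale."""
    return 10 ** (dbfs / 20)


for reading in (0, -6, -12, -20):
    fraction = dbfs_to_amplitude(reading)
    print(f"{reading:>4} dBFS is {fraction:.0%} of full scale")
```

A meter bouncing around -12 is sitting at roughly a quarter of the maximum amplitude, which leaves room for a sudden loud moment before the signal hits the ceiling and distorts.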

Pro tip: Safety tracks are a feature of some external recording devices. They capture the exact same recording as you are gathering on another track, but a quieter version, in case someone starts yelling or things get louder suddenly. It’s called an “Attenuated” track (ATT). This is an indispensable feature to look out for!
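Here is a toy illustration of why a safety track saves a take. It assumes the safety track is recorded 12dB quieter than the main track; that figure is my assumption for the example, and the actual amount depends on the recorder and how you set it.

```python
def attenuate(samples: list[float], attenuation_db: float) -> list[float]:
    """Turn a signal down by the given number of decibels."""
    gain = 10 ** (-attenuation_db / 20)
    return [s * gain for s in samples]


def record(samples: list[float]) -> list[float]:
    """Simulate a digital track: anything beyond full scale (1.0) is clipped."""
    return [max(-1.0, min(1.0, s)) for s in samples]


def is_clipped(samples: list[float]) -> bool:
    return any(abs(s) >= 1.0 for s in samples)


# Hypothetical incoming signal: normal speech, then a sudden shout.
incoming = [0.2, -0.3, 0.25, 1.6, -1.4, 0.3]

main_track = record(incoming)
safety_track = record(attenuate(incoming, 12))  # assumed 12 dB of attenuation

print("Main track clipped:  ", is_clipped(main_track))
print("Safety track clipped:", is_clipped(safety_track))
```

The shout is flattened on the main track, but the quieter copy still holds the full shape of the wave and can be swapped in during the edit.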

At the most basic level, the typical second-system involves a recorder with built-in microphones (such as the Zoom H1, Zoom H4n or Zoom H6, as well as some Tascam models). In these cases, you’d need to get the recorder as close to the subject as possible to pick up good sound. I find these systems clunky and difficult to rig in a way that provides premium sound, but they could be an okay option for documentary shooters.

The next step up would be to use this device as a recorder, but plug in external microphones. In this case you can get the microphone on a stand, or out on a boom over the action, and pick up the sound from that sweet spot of 12-18″ away and out of frame. When you run external sound like this, you need a reference for syncing the sound. At the most basic level this means having someone clap their hands in front of the camera while both camera and audio are rolling. This is a simple, dependable way to have a visual and audio reference that can be matched up in a non-linear editing program like Adobe Premiere, Final Cut or DaVinci Resolve.
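The idea behind clap sync can be sketched in code. This is a deliberately simple illustration, not how any editing program actually does it: it assumes both files use the same sample rate and that the clap is the loudest moment in each one, then measures how far apart the two claps are.

```python
def loudest_sample_index(samples: list[float]) -> int:
    """Position of the loudest sample, assumed here to be the clap."""
    return max(range(len(samples)), key=lambda i: abs(samples[i]))


def sync_offset_seconds(camera_audio: list[float],
                        recorder_audio: list[float],
                        sample_rate_hz: int) -> float:
    """Seconds by which the clap lands later in the recorder file
    than in the camera's scratch audio."""
    camera_clap = loudest_sample_index(camera_audio)
    recorder_clap = loudest_sample_index(recorder_audio)
    return (recorder_clap - camera_clap) / sample_rate_hz


SAMPLE_RATE = 48_000

# Made-up test data: silence with a single spike standing in for the clap.
camera_audio = [0.0] * 24_000 + [0.9] + [0.0] * 1_000    # clap at 0.5 s
recorder_audio = [0.0] * 72_000 + [0.8] + [0.0] * 1_000  # clap at 1.5 s

offset = sync_offset_seconds(camera_audio, recorder_audio, SAMPLE_RATE)
print(f"Slide the recorder audio {offset:.3f} s earlier to line it up")
```

In practice you do the same thing by eye in the editor: find the spike from the clap in both waveforms and slide one clip until the spikes line up with the frame where the hands meet.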

Another sync method is to use a clapper board, which is a plastic or wooden board with camera information written on it and a small slat at the top that you clap, much like what you see them use in the movies: “Scene 3, Take 11 … clap!”

Advanced second-system sound situations utilize what is called “timecode”, which is a running tally of time counting up in frames, based on the frame rate you are filming in. This timecode is synced to the recorder as well as any/all cameras rolling. The benefit is that a camera can stop rolling for a bit and then begin again, and the new file is stamped with the timecode at that exact moment, which matches the audio file’s timecode.
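Timecode is usually displayed as hours:minutes:seconds:frames. The sketch below shows the counting for simple non-drop-frame timecode at 24 frames per timecode second, which is my assumption for the example; drop-frame timecode, used with some other frame rates, follows extra rules that are not covered here.

```python
def frames_to_timecode(frame_count: int, fps: int = 24) -> str:
    """Format a running frame count as HH:MM:SS:FF (non-drop-frame)."""
    frames = frame_count % fps
    total_seconds = frame_count // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = total_seconds // 3600
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def timecode_to_frames(timecode: str, fps: int = 24) -> int:
    """Convert HH:MM:SS:FF back to a running frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


print(frames_to_timecode(87_893))         # one hour, one minute, two seconds, five frames
print(timecode_to_frames("01:01:02:05"))
```

Because the camera file and the audio file both carry this same running count, the editor can line them up by number instead of hunting for a clap.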

Your hands clapping are the cheapest sync method; regular clappers (a.k.a. “slates”) are the next step up; and timecode slates would be the priciest, most professional tool (generally used only for larger budget productions). The Denecke Timecode slate is an example of this type of clapper, which is synced to the camera and audio timecode, providing a visual read-out of the timecode-of-the-moment as well as the visual and sound of the clapper.

Really, the only true drawback to second-system sound is the added gear. To be sure, if your shooting style is run-and-gun, this will be problematic unless you have a sound person handling that aspect.

Most other video projects would benefit greatly from second-system sound and other articles in this Audio for Video series will focus on utilizing second-system, as on-camera microphone applications are inherently limited in their abilities and quality.

Imagine a typical second-system scenario …

There would be a sound person with a small bag holding their mixer/recorder. If they are going to shoot an outdoor interview, the sound person would get a shotgun mic inside a “blimp” which is a wind-cage, and outfit it with a fuzzy fur that the industry calls a “dead cat” or “wombat” — further minimizing wind noise.

They use an XLR 3-pin microphone cable plugged into the mic, running down the length of their boom pole and into the mixer/recorder set to Mic Level. The microphone needs Phantom Power, so they engage the Phantom button to send 48V up the wire to power the mic. They hover the mic over the speaking talent about 12-18″ away, as the camera angle will allow.

They have the talent speak a few lines so they can set the level on their recorder so that the loudest moments do not peak above -18dBFS, and most of the signal hovers around -20dBFS.

They engage Record and record the audio. If they are syncing with a clap, both audio and camera begin to roll, then someone calls out the scene and claps in front of camera, within earshot of the microphone. If timecode is being used, then both camera and recorder are set up with timecode based on the frame rate that the camera is shooting, making sure that they match.

What is also flexible about second system is that in some cases you can run a cable from the mixer back to the camera for what is called a “scratch track.” This is simply a better audio file coming from the microphone that the editor can use to get the first edit going while a sound editor finesses the second-system recorded audio. Then, after they mix and master that audio, they can re-sync it to the edited piece. Yes, even after it’s been edited!

However, using this sort of cable can also save your butt! Once I was on a shoot and my SD card died inside the recorder, emitting a very high-pitched squeal in my headphones while the talent was delivering their lines to the camera. I had a cable running out of my mixer to the camera and so while my files on the recorder for that take were trashed, the camera recorded what my mic was gathering because of that cable. Boom! Saved!

Second-system sound devices also give the recordist the ability to take notes on the metadata of the file. They can put in the location and scene number as well as any other comments pertinent to the editor, which helps them keep things organized. At the end of the day, the recordist can output a little spreadsheet that lists all the track names, specifications, and other editorial comments that can be printed by the editor. Pretty slick!

On this metadata sheet, you can see that I could name the “scene” ahead of time. I was using a boom microphone, as well as a wireless lavalier mic — so I can name my tracks according to the actual microphone as well. Additionally, Tracks 5 and 8 are my attenuated “safety” tracks. It notes the time code, the frame rate, the type of file, sample rate, bit depth and any notes I feel like putting in for the editor to help them. You’ll notice there are a couple takes of audio of just the sound the room was making called “Room Tone.” I ask everyone on-set to be still for 30 seconds — completely still (which is hard to get a group of people to do!) — and this gives the editor the ability to use this particular audio to filter out the room noise as needed. Technology is cool!

As you can see, second-system sound is the way to go for most professional audio applications. For sure, putting a good mic on your camera’s hot-shoe will be better than the built-in camera mic, but if that camera is too far away from the sound source, it won’t matter that much. You would have to frame all your scenes within a couple feet of the sound sources (the talent) to get the most out of a hot-shoe mounted microphone.

Second-system sound devices come in a variety of inputs with a range of bells and whistles, depending on the scale of production. Each unit has pros and cons for the sorts of scenarios you may run into as an independent filmmaker.

The next articles in this Audio for Video series will focus on these typical scenarios. We’ll cover sound for interviews, sound for documentary projects, sound for weddings, sound for live conferences and sound for music videos where the musicians are performing live. We’ll be looking at these situations from the independent lower-budget perspective.

Just like with photography, having the gear is just one part of the equation. You must know how to operate it to “capture the image,” whether it’s a sound or a visual. Having the gear without knowing how to use it is akin to having an expensive camera and keeping it on Auto all the time. And just as there is a difference between a low-end lens and the sharpness and clarity of an expensive lens, there is a big difference between the quality you can get from a $200 microphone versus a $1,200 microphone. But it is more important that you know how to use the tool than it is to have an expensive tool that you don’t know how to use.

Keep an eye out for the second installment in this series on audio: Sound for Interviews. Coming soon!
