Last updated February 2007.
Sox is a very useful program, but its command-line syntax is confusing, and it isn't always easy to figure out how to make it do what you want. Under most circumstances, sox copies its input to its output, possibly making changes along the way. It therefore needs an input file name and an output file name, possibly together with information about each. To do anything other than copy the input to the output (possibly with a change in format), you must also specify what to do.
The simplest use of sox therefore is with two filenames as arguments:
sox foo.aiff foo.wav
This command tells sox to copy the file foo.aiff to foo.wav, changing its format from aiff to wav. Sox infers the type of a file from its extension. Since the header of the aiff file contains sufficient information about the file to convert it to wav format, no other information is necessary.
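When many files need the same conversion, the two-filename command can be wrapped in a shell loop. The following is a minimal sketch rather than anything provided by sox: the function name convert_all_aiff and its directory argument are made up for illustration, and the only sox call is the form shown above.

```shell
# Hypothetical helper (not part of sox): convert every .aiff file
# in a directory to wav, leaving the originals in place.
# Usage: convert_all_aiff DIRECTORY
convert_all_aiff() {
    for f in "$1"/*.aiff; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        # ${f%.aiff} strips the extension so that .wav can be appended.
        sox "$f" "${f%.aiff}.wav"
    done
}
```

Calling convert_all_aiff . converts the .aiff files in the current directory.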
Sometimes you need to convert a file to pure PCM data so that it can be processed by programs that don't understand the various encodings and headers. Such pure PCM files are called raw files and are recognized by sox by the extension .raw. By using this extension you can use sox to convert a file to raw format. The command:
sox foo.wav foo.raw
converts the file foo.wav to raw format.
We can also use sox to convert a raw file to another format. In this case, we have to supply some information about the raw file:
sox -r 44100 -s -w foo.raw foo.wav
The three flags preceding the input file name tell sox that the input file has a sampling rate of 44,100 samples per second, that the data is signed, and that each sample consists of a two-byte word. With this information, sox can create a copy in wav format. The wav header must also record the number of channels, but this need not be specified for the input file because sox assumes a default of mono.
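Because a raw file has no header, the sampling rate, the sample size, and the number of channels are all that connect its size in bytes to its duration. The sketch below works this out with shell arithmetic. The function name raw_duration_ms is hypothetical, and sox is not involved.

```shell
# Hypothetical helper: duration in milliseconds of a headerless PCM file.
# Usage: raw_duration_ms FILE RATE BYTES_PER_SAMPLE CHANNELS
raw_duration_ms() {
    # Size of the file in bytes.
    bytes=$(wc -c < "$1")
    # One second of audio occupies RATE * BYTES_PER_SAMPLE * CHANNELS bytes.
    echo $(( bytes * 1000 / ($2 * $3 * $4) ))
}
```

For a mono file of 88,200 bytes containing two-byte samples at 44,100 samples per second, raw_duration_ms foo.raw 44100 2 1 prints 1000, that is, one second. A result far from the expected duration suggests that the parameters do not describe the file correctly.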
It is also possible to change the representation of the data. For example, we can change the sampling rate by specifying the sampling rate for the output file:
sox foo.wav -r 22050 foonew.wav
This command changes the sampling rate to 22,050 samples per second.
Thus far we have used sox only to copy a file, possibly with a change in format. Sox can also transform its input in various ways. Some of these, such as reverb, are for musical use, but a number of effects, such as filtering, may be useful for phonetics. The name of the effect follows the name of the output file. Any further parameters necessary to specify the effect follow its name. For example, the command:
sox foo.wav bar.wav lowp 1000.0
applies a low pass filter with cutoff at 1000 Hz to foo.wav and puts the result in bar.wav.
Sox can also change the number of channels. For example, some sound cards insist on stereo data, so it may be useful to convert monaural sound files to stereo. This command does the job:
sox foo.wav -c 2 foostereo.wav split
Of course, we can't create true stereo from mono data; the effect of the command is to duplicate the original single channel.
On the other hand, sometimes it is necessary to extract a single channel from a stereo recording. This may be because we want to process it using software that cannot deal with stereo input, or it may be because we are interested only in one channel. Sox can deal with monaural (1 channel), stereo (2 channel) and quadriphonic (4 channel) data.
There are two ways to reduce the number of channels. One is to select a particular channel. This is done by using the avg effect with an option indicating what channel to use. The options are -l for left, -r for right, -f for front, and -b for back. For example, to extract the left channel give a command like this:
sox foo.wav -c 1 foomono.wav avg -l
Another approach is to average the channels. To create a monaural file from a stereo file by averaging the two channels, give a command like this:
sox foo.wav -c 1 foomono.wav avg
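If both channels of a stereo recording are wanted as separate mono files, the channel-selecting form of avg can simply be run twice. This is a sketch using only the options described above. The function name split_stereo and the -left/-right naming of the output files are assumptions made for illustration.

```shell
# Hypothetical helper: write each channel of a stereo wav file
# to its own mono file.
# Usage: split_stereo foo.wav   (creates foo-left.wav and foo-right.wav)
split_stereo() {
    # Strip the .wav extension to build the output names.
    base=${1%.wav}
    # Extract the left channel, then the right one if that succeeded.
    sox "$1" -c 1 "${base}-left.wav" avg -l &&
    sox "$1" -c 1 "${base}-right.wav" avg -r
}
```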
The general option -v changes the volume. Its argument is used as a multiplier. The command:
sox -v 2.0 foo.wav bar.wav
places in bar.wav a copy of foo.wav with the volume doubled.
The "stat" effect produces statistical information about the audio data:
sox foo.wav -e stat
The -e flag tells sox not to generate any output other than the statistical information.
If the stat effect is followed by the flag -v, all that is printed is the multiplier that will maximize the volume without clipping. This value can be used as the argument to the -v general option.
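The two steps can be chained so that a file is raised to its maximum volume without clipping in one go. The sketch below assumes the command syntax described in this document and relies on the fact that stat writes its report to standard error. The function name normalize is hypothetical.

```shell
# Hypothetical helper: copy a file at the highest volume that does not clip.
# Usage: normalize INPUT OUTPUT
normalize() {
    # stat prints to standard error, so redirect it in order to capture
    # the multiplier reported by "stat -v".
    mult=$(sox "$1" -e stat -v 2>&1) || return 1
    # Feed the multiplier back in as the argument of the -v general option.
    sox -v "$mult" "$1" "$2"
}
```

For example, normalize foo.wav bar.wav leaves the adjusted copy in bar.wav.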
Sox can synthesize a number of standard waveforms (sine wave, square wave, etc.) and types of noise. These are specified by means of the synth effect. Even though sox creates the output from scratch, an input file name must still be specified. The input file should be /dev/null, with the -t flag to specify its special type:
sox -t nul /dev/null sine.wav synth 1.0 sine 1000.0
This command synthesizes a 1000 Hz sine wave 1.0 seconds long, leaving the result in sine.wav.
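To produce a set of test tones, the same command can be repeated for several frequencies. This is a minimal sketch built on the synth command above. The function name make_sines and the sineN.wav naming scheme are assumptions made for illustration.

```shell
# Hypothetical helper: synthesize a one-second sine wave for each
# frequency (in Hz) given as an argument.
# Usage: make_sines 100 250   (creates sine100.wav and sine250.wav)
make_sines() {
    for hz in "$@"; do
        # Same form as the command above, with the frequency substituted.
        sox -t nul /dev/null "sine${hz}.wav" synth 1.0 sine "$hz" || return 1
    done
}
```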
If called as soxmix, sox adds two input files together to produce its output. For example, the command:
soxmix sine100.wav sine250.wav sine100-250.wav
adds sine100.wav and sine250.wav, leaving the result in sine100-250.wav.
On our GNU/Linux systems, sox provides the usual means for playing and recording sound files. The play command is actually a shell script that calls sox. Playing a sound file is accomplished by copying the file to the device special file /dev/dsp. The following command plays the file foo.wav:
sox foo.wav -t ossdsp /dev/dsp
The -t flag specifies the type of the file /dev/dsp.