
Using SoX to automate audio creation for 3D Studio Max

February 11, 2012

Lately I’ve been getting into Maxscript, a way of generating 3D objects and animations by scripting. It allows 3D artists to create complex procedural animations without having to create each object and animation manually…

For example, I wrote a script that tells 3D Studio Max how to generate a labyrinth, and how to animate it growing onto the screen:

Like all graphics, though, it needs some sort of audio to match up to the picture. In the past I’ve had to lay audio out manually – for example, I created this simple physics simulation as part of a film for Hazelnut Films:

The ‘ding’ sounds when the coins collide with things were all added manually afterwards. They match up pretty well with the pictures, but it was a bit fiddly – and there are far fewer objects moving around than in the labyrinth animation above.

So I started looking for a way to have 3D Studio generate the necessary audio at the same time as it creates the animation. A couple of days ago I came across an audio utility called SoX that makes it all possible – so here’s a description of the way I got it working, in the hopes it may help others trying to do the same thing.

(This is a long post – scroll down to the last video if you just want to hear the results)


SoX is a cross-platform command-line audio utility that lets you perform simple operations on audio files. More info, downloads and documentation are available on the SoX project site. You can use it to convert audio files from one format to another, pad and trim them, change their pitch and add simple effects. Because you run it by typing commands at the DOS or Terminal prompt rather than operating it with a mouse, it’s eminently scriptable using simple batch files. For example, typing the following into a DOS prompt (assuming you have SoX installed):

sox -m mysound.wav myothersound.wav theResult.wav

will produce a new WAV audio file by mixing the two input files mysound.wav and myothersound.wav together. You can pad files to change the timing:

sox tick.wav theResult.wav pad 2000s

This’ll take the tick.wav file and add 2000 samples of silence to the beginning, creating a new audio file called theResult.wav. You can change the pitch of audio clips too:

sox tick.wav slowtick.wav speed 0.5

This creates a copy of tick.wav that plays back at half the speed, and hence sounds an octave lower.
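To put a number on that pitch change: every halving or doubling of the speed is one octave, or 12 semitones, so the shift works out as 12 × log2(speed). Here's a minimal Python sketch of the arithmetic; the function name is just something I made up for illustration, not part of SoX:

```python
import math

def speed_to_semitones(speed):
    """Pitch shift, in semitones, caused by playing a clip at `speed` times normal."""
    return 12 * math.log2(speed)

# Half speed is one octave down, double speed is one octave up.
for speed in (0.5, 1.0, 2.0):
    print(speed, speed_to_semitones(speed))
```

Speeds close to 1 give shifts of only a semitone or two, which is handy when you just want a little variety between otherwise identical sounds.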

With me so far? The basic deal is that SoX can only do simple operations on audio files, but because it can be run from a batch file, you can chain a long list of simple operations together to generate a complicated output file. So, adding a ‘ding’ sound 3 seconds into an existing audio file takes two operations:

sox ding.wav temp-ding.wav pad 144000s
sox -m temp-ding.wav existingaudiofile.wav mynewoutputfile.wav

The first command makes a temporary copy of the ding.wav file, but with 3 seconds of silence (144000 samples at a 48 kHz sample rate) added to the beginning. The second command takes this delayed ding sound and mixes it with the existing audio file to create a new file with the ding in the right place in it, laid over any audio that was there already.
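If you'd rather not work out the sample counts by hand, the conversion is just seconds × sample rate. Here's a rough Python sketch that does the sum and builds the two command lines as text; the function names and the 48 kHz default are my own assumptions for illustration, and nothing here actually runs SoX:

```python
SAMPLE_RATE = 48000  # samples per second (assumed; match your own audio files)

def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Convert a time in seconds to a whole number of samples."""
    return round(seconds * sample_rate)

def overlay_commands(sound, existing, output, seconds, temp="temp-ding.wav"):
    """Return the two SoX command lines that lay `sound` over `existing`."""
    samples = seconds_to_samples(seconds)
    return [
        # The trailing "s" tells SoX the pad length is in samples.
        f"sox {sound} {temp} pad {samples}s",
        f"sox -m {temp} {existing} {output}",
    ]

for line in overlay_commands("ding.wav", "existingaudiofile.wav",
                             "mynewoutputfile.wav", 3):
    print(line)
```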

That’s the basics of it.


I created a DOS batch file, addtingsound.bat, that runs a few SoX commands:


sox ting.wav temp-ting.wav pad %1s
sox -m -v 0.2 temp-ting.wav -v 1 output.wav newoutput.wav
del output.wav
ren newoutput.wav output.wav

So, assuming you have a file called output.wav with your existing audio in it, and a file called ting.wav that you want to mix into it at, say, 2 seconds in (which equates to 96000 samples at a 48 kHz sample rate), you can execute this at the command prompt:

addtingsound 96000

Line by line, the batch file does this:
– line 1 makes a copy of the ting.wav sound, adding silent padding at the beginning. The amount of padding is specified by %1, which represents the first parameter given in the command line; 96000 in the example above
– line 2 mixes this new delayed sound with the existing output.wav file to create a temporary new output file. The -v parameter before each input filename specifies the relative volume to mix that file at
– line 3 deletes the old output.wav file
– line 4 renames the temporary output file to output.wav

The result is a new version of output.wav with the ting mixed in at the right place.
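For anyone who'd rather drive SoX from a script than a batch file, the same four steps can be sketched in Python. This is only a sketch under a few assumptions: it expects a sox executable on the PATH, the function name add_ting is made up, and the run and replace parameters exist purely so the example can do a dry run without touching any files:

```python
import os
import subprocess

def add_ting(offset_samples, run=subprocess.run, replace=os.replace):
    """Mix ting.wav into output.wav, starting offset_samples into the file."""
    # Step 1: make a delayed copy of the ting, padded with leading silence.
    run(["sox", "ting.wav", "temp-ting.wav", "pad", f"{offset_samples}s"],
        check=True)
    # Step 2: mix the delayed ting (volume 0.2) with the existing audio (volume 1).
    run(["sox", "-m", "-v", "0.2", "temp-ting.wav",
         "-v", "1", "output.wav", "newoutput.wav"], check=True)
    # Steps 3 and 4: swap the new mix in as output.wav.
    replace("newoutput.wav", "output.wav")

# Dry run: record the commands instead of executing them.
recorded = []
add_ting(96000,
         run=lambda cmd, check=True: recorded.append(cmd),
         replace=lambda src, dst: None)
for cmd in recorded:
    print(" ".join(cmd))
```

Calling add_ting(96000) with the defaults would run the real commands, so output.wav and ting.wav need to exist in the current folder first.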

You can add multiple tings to the output file by running the batch file as many times as you need:

addtingsound 2000
addtingsound 96000
addtingsound 128000

… and this can, of course, be automated with another batch file.

Interfacing with Max

There are a number of ways of doing this, but I chose the simplest and kludgiest. I wrote a simple tile-flipper gadget for Max that lets you create and animate arrays of tiles in a similar way to the labyrinth generator above, so I added a few extra lines to the tile animation code. This gets run every time a tile animates:

-- the tile flips over at myFrameNumber
-- so add a sound at that point
-- (48000/25 = 1920 samples per frame: 48 kHz audio at 25 frames per second)
myString = "echo call addtingsound " + (((myFrameNumber) * (48000/25)) as string) + " >> tilesounds.bat"
DOSCommand myString

When Max has finished animating each tile, it adds a new command to my tilesounds.bat file. After the whole animation has run, I end up with the tiles animated in Max, and a tilesounds.bat file containing a list of commands:

call addtingsound 2310
call addtingsound 9600
call addtingsound 42000
call addtingsound 67500
call addtingsound 74320
call addtingsound 83250

All I have to do to create the final audio mix is run tilesounds.bat. It calls the addtingsound batch file repeatedly, adding tings to the right points in the file to match up with the animation:
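The frame-to-sample conversion is the only real arithmetic in that step: with 48 kHz audio and 25 frames per second, each frame is 1920 samples long. Here's a small Python sketch of the same calculation and of the lines that end up in tilesounds.bat; the function names and the list of frame numbers are invented for the example, and whole-number frames are assumed:

```python
SAMPLE_RATE = 48000  # audio samples per second
FRAME_RATE = 25      # animation frames per second

def frame_to_samples(frame):
    """Sample offset at which a given (whole-number) animation frame starts."""
    return frame * SAMPLE_RATE // FRAME_RATE

def batch_lines(frames):
    """One 'call addtingsound' line per tile, in the order the tiles animate."""
    return [f"call addtingsound {frame_to_samples(frame)}" for frame in frames]

# Hypothetical frames at which tiles flip over.
for line in batch_lines([5, 12, 50]):
    print(line)
```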

The sounds happen at the right times – and all automatically! The sounds are a bit crude, though, and they sound a little superimposed, detached from the action. To add realism, a bit of variety is needed.

Adding realism

The example above uses a single ‘ting’ sample, but each tile animation goes through three stages: it rotates, bounces, and then settles. So I added another sample, a ‘tick’ sound, and some randomisation to the pitch. Max knows where on screen each tile appears, so it’s relatively simple to add a panning effect too. I even added some subtle delays to the left and right channels to emulate the real-life effect whereby a sound on your left-hand side reaches your right ear later than your left ear. If a tile is in the distance, I reduce the volume a bit and add a bit more of a delay too.

Max does all the calculations as it animates each tile, ultimately passing my addtingsound batch file a number of parameters. The tilesounds.bat file ends up looking like this:


call addtingsound 1.1661 0 19200 34560 0.94 0.223093 94 117
call addtingsound 0.985767 1920 21120 53760 0.94 0.27824 94 115
call addtingsound 0.922504 3840 23040 36480 0.94 0.333387 94 114
call addtingsound 0.939196 5760 24960 42240 0.94 0.388533 94 112
call addtingsound 1.0006 7680 26880 46080 0.94 0.44368 94 110
call addtingsound 0.919269 9600 28800 38400 0.94 0.498827 94 108


The parameters for each tile are, in order:
– a random number between 0.9 and 1.2, used to vary the pitch of the sample
– the time (in samples) the tile starts moving
– the time the tile bounces
– the time the tile settles
– left-hand side volume
– right-hand side volume
– left-hand side delay
– right-hand side delay.

Max calculates the left- and right-hand volumes and delays based on the tile’s position.
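I haven't listed the exact formulas here, so treat the following as one plausible way to get from a tile position to those four numbers rather than the calculation my script uses. It's a Python sketch that assumes a horizontal position x from -1 (far left) to 1 (far right), a depth from 0 (near) to 1 (far), equal-power panning, and two made-up constants for the maximum delays:

```python
import math

MAX_EAR_DELAY = 30        # assumed: about 0.6 ms between the ears at 48 kHz
MAX_DISTANCE_DELAY = 100  # assumed: extra delay, in samples, for the farthest tiles

def stereo_params(x, depth):
    """Return (left_vol, right_vol, left_delay, right_delay) for one tile.

    x: -1 (far left) to 1 (far right); depth: 0 (near) to 1 (far).
    Delays are in samples.
    """
    angle = (x + 1) * math.pi / 4   # 0 at far left, pi/2 at far right
    gain = 1.0 - 0.5 * depth        # distant tiles are quieter
    left_vol = gain * math.cos(angle)
    right_vol = gain * math.sin(angle)
    base = round(depth * MAX_DISTANCE_DELAY)
    # The ear farther from the sound hears it slightly later.
    left_delay = base + round(max(0.0, x) * MAX_EAR_DELAY)
    right_delay = base + round(max(0.0, -x) * MAX_EAR_DELAY)
    return left_vol, right_vol, left_delay, right_delay

for x, depth in [(-1.0, 0.0), (0.0, 0.0), (1.0, 1.0)]:
    print(x, depth, stereo_params(x, depth))
```

The four values map onto the last four parameters of each addtingsound line; the constants would need tuning by ear.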

The new addtingsound batch file is somewhat more complicated; it now takes a number of SoX operations to add each sound, as each one needs to be pitch-changed, padded out, trimmed, balanced, delayed per channel, and finally mixed into the output file. Without further dissection, here’s what it looks like:

rem Create first ting:
..\sox ting.wav temptingb.wav speed %1
..\sox temptingb.wav tempting.wav pad %2s remix 1v%5 2v%6
..\sox tempting.wav firstting.wav trim 0 %3s

rem Create bounce ting:
..\sox ting.wav temptingb.wav speed %1
..\sox temptingb.wav tempting.wav pad %3s remix 1v%5 2v%6
..\sox tempting.wav secondting.wav trim 0 %4s

rem Create final tick:
..\sox tick.wav temptingb.wav speed %1
..\sox temptingb.wav thirdting.wav pad %4s remix 1v%5 2v%6

rem Create left and right composite versions with separate gains and delays:
..\sox -m -v 0.2 firstting.wav -v 0.4 secondting.wav -v 0.1 thirdting.wav leftcomp.wav remix 1v1 1v0 pad %7s
..\sox -m -v 0.2 firstting.wav -v 0.4 secondting.wav -v 0.1 thirdting.wav rightcomp.wav remix 2v0 2v1 pad %8s

rem Mix into the output file:
..\sox -m -v 1 output.wav -v 0.5 leftcomp.wav -v 0.5 rightcomp.wav tempoutput.wav

rem Replace the old output file with the new one ready for the next tile:
del output.wav
ren tempoutput.wav output.wav
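The remix arguments do most of the stereo work in that batch file. As I understand the remix effect, each argument describes one output channel as an input channel number followed by v and a gain, so 1v%5 2v%6 scales the left and right channels separately, while 1v1 1v0 keeps the left channel and silences the right. Here's a toy Python model of just that simple form, applied to a single stereo sample; it's an illustration of the idea, not how SoX is implemented:

```python
def remix(frame, specs):
    """Apply simple remix specs such as "1v0.94" to one frame of samples.

    frame: tuple of input channel values, e.g. (left, right).
    specs: one string per output channel, "<input channel>v<gain>".
    """
    out = []
    for spec in specs:
        channel, gain = spec.split("v")
        out.append(frame[int(channel) - 1] * float(gain))
    return tuple(out)

frame = (0.5, -0.25)  # one made-up stereo sample: (left, right)
print(remix(frame, ["1v0.94", "2v0.223093"]))  # per-side volumes, as in 1v%5 2v%6
print(remix(frame, ["1v1", "1v0"]))            # leftcomp: left kept, right silent
print(remix(frame, ["2v0", "2v1"]))            # rightcomp: left silent, right kept
```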

A bit more complicated, huh… but the result seems to be worth it. This example ought to be previewed with decent speakers, as the stereo imaging is pretty good now:

What a timesaver… realistic audio, generated automatically. Up until now it’s been useful to use batch files as an intermediary between Max and SoX, but there’s no reason Max couldn’t perform the SoX commands itself for more control. And, of course, this technique can be used with any 3D software that will allow you to write to a text file.

Hopefully this’ll help anyone else trying to do this sort of thing…

