Phonetics Lab Software

Last updated February 2007.

Nearly all of the software used in the phonetics lab is now available, for free and with source code, in versions that will run on commonplace Microsoft Windows and GNU/Linux PCs, and in some cases on Macintoshen.

Background: Hardware and Operating Systems

The Phonetics Lab in the Linguistics Department at the University of Pennsylvania (located in Williams Hall, room 623) contains Intel PCs, mostly running GNU/Linux (RH 9), with a few running Microsoft Windows. Machines of this kind are widely and easily available, outside the lab as well as inside. There is now a rapidly improving mix of free-software programs for dealing with various aspects of research in acoustic phonetics, from transcription and acoustic analysis of sound recordings to statistical analysis of speech databases.

There are no Macs at present, though there have been and may be again someday.

Programs

Except where otherwise noted, all of the programs cited below exist in more-or-less equivalent versions for GNU/Linux and MS Windows, and recent versions are installed on all the lab computers. On the MS Windows machines, there should be icons for these programs on the desktop. On the Linux machines, the standard user profile should allow you to access them.

Macintosh versions of these programs often exist as well, but you are on your own in finding and testing them.

Transcriber

This is a program for creating (typically orthographic) transcriptions of sound recordings, time linked (typically at the phrase level) to a digital audio file. It will conveniently deal with long recordings -- an hour or more.
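As a rough illustration of what "time linked at the phrase level" means, here is a minimal Python sketch of a time-aligned transcript. It is not Transcriber's own file format or code: the Segment class, the segment_at function, and the sample phrases are all made up for this example. It assumes the segments are sorted by start time and do not overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed phrase, linked to a span of the audio file (hypothetical)."""
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str     # orthographic transcription of the phrase


def segment_at(segments, t):
    """Return the segment whose span contains time t, or None.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and segments[i].start <= t < segments[i].end:
        return segments[i]
    return None


# Made-up sample data; the last phrase sits an hour into the recording.
segments = [
    Segment(0.0, 2.4, "so where did you grow up"),
    Segment(2.4, 5.1, "in west philadelphia"),
    Segment(3600.0, 3603.2, "thanks for talking with me"),
]

if __name__ == "__main__":
    hit = segment_at(segments, 3.0)
    print(hit.text if hit else "(no phrase at this time)")
```

Keeping only a start time, an end time, and the text for each phrase is what makes it cheap to jump between the transcript and the matching stretch of audio, even in a long recording.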

The handling of audio I/O and waveform displays is based on the Snack sound toolkit, the same foundation that wavesurfer uses (see below), and in the near future Transcriber should be integrated with wavesurfer to some extent. Versions will be available soon that can maintain multiple transcripts in parallel (for highly-interactive conversation, for instance), that are specialized for interlinear transcription on several levels (e.g. orthographic, morphemic, phonemic, phonetic), and so on.

Source code and binary distributions are available at http://www.ldc.upenn.edu/mirror/Transcriber/

A recent MS Windows binary is available here.

On the unix machines, the command is "trans".

Wavesurfer

This is a simple but powerful program for interactive display of waveforms, spectrograms, pitch tracks and transcriptions (phonetic, orthographic etc.). Source code and various binary distributions are available at http://www.speech.kth.se/wavesurfer/

The current MS Windows binary is here.

On the unix machines, the command is "wavesurfer".

Praat

Praat is a "research, publication, and productivity tool for phoneticians." It includes a comprehensive set of capabilities, usable both interactively and via a scripting language. Although it is not yet free software, it soon will be, according to its creator, Paul Boersma.

For now, you can download it from here.

On the unix machines, the command is "praat".

R

R is a free-software implementation of the improved version of the S statistics language; the proprietary implementation goes by the name of "Splus".

A page containing lots of useful information about R, especially useful as a local Penn reference, is: http://finzi.psych.upenn.edu/

The main page for R is at: http://www.r-project.org/

If you must use Microsoft Windows, a binary version of R can be downloaded from here.

A repository of code and datasets for S and Splus, most of which will also run under R, can be found at http://lib.stat.cmu.edu/S/.

A nice, short, and simple introduction to R can be found at: http://lib.stat.cmu.edu/R/CRAN/doc/contrib/kickstart/index.html.

Octave

Octave is a free-software clone of Matlab. It is largely compatible with Matlab 4.X but not with Matlab 5.X. Luckily, the examples and provided programs for Ling525/CIS558 are in Matlab 4.X.

An MS Windows binary for Octave is available here. Some interesting free Octave/Matlab toolboxes are available here.

SoX

SoX ("Sound eXchange") is a command line utility that can convert various formats of computer audio files to other formats, also changing sampling rate and performing some other modifications as instructed. The command line syntax is difficult. Here are instructions on how to perform the usual tasks.
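As a rough sketch of the kind of conversion described above, the following Python wrapper builds a basic SoX command that changes file format, sampling rate, and channel count. The helper names (build_sox_command, convert) and the file names are hypothetical. The sketch assumes a SoX version in which -r (rate) and -c (channels) placed just before the output file name apply to the output, and in which the output format is inferred from the file extension; check the manual page of the installed version before relying on it.

```python
import shutil
import subprocess


def build_sox_command(src, dst, rate=None, channels=None):
    """Return the argument list for a basic SoX conversion (hypothetical helper)."""
    cmd = ["sox", src]
    # Options placed just before the output file name describe the output.
    if rate is not None:
        cmd += ["-r", str(rate)]      # output sampling rate in Hz
    if channels is not None:
        cmd += ["-c", str(channels)]  # number of output channels
    cmd.append(dst)
    return cmd


def convert(src, dst, rate=None, channels=None):
    """Run the conversion, failing clearly if sox is not installed."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on the PATH")
    subprocess.run(build_sox_command(src, dst, rate, channels), check=True)


if __name__ == "__main__":
    # Only print the command here; call convert() to actually run it.
    # The file names are made up for illustration.
    cmd = build_sox_command("interview.wav", "interview.aiff",
                            rate=16000, channels=1)
    print(" ".join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the exact command before any audio file is touched.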

Awk

AWK is a text-processing language commonly used for massaging data. Details.

Other interesting things

(These are not necessarily installed on all lab machines.)

Emu: "a collection of software tools for the creation, manipulation and analysis of speech databases." It is designed to work with the R statistics package (see above).

The Festival speech synthesis package.

Speech software from ISIP at Mississippi State.

Intra is a transcription tool that incorporates synthesis for checking.

Slang, a C++-based software platform for speech processing (and especially speech recognition) research.

Sphinx, a free-software speech recognition system from CMU.

The CSLU speech toolkit.

The UCL SFS (Speech Filing System).

Pointers to some other systems can be found in the Linguistic Annotation Page.