File: README

package info (click to toggle)
flite 1.2-release-2
links: PTS
area: main
in suites: sarge
size: 53,484 kB
ctags: 171,380
sloc: ansic: 526,028; sh: 3,078; lisp: 1,515; cpp: 776; makefile: 560; perl: 37
file content (265 lines) | stat: -rw-r--r-- 11,346 bytes
parent folder | download | duplicates (3)

         Flite: a small run-time speech synthesis engine
                          version 1.2-release
          Copyright Carnegie Mellon University 1999-2003
                      All rights reserved
                      http://cmuflite.org


Flite is a small fast run-time speech synthesis engine.  It is the
latest addition to the suite of free software synthesis tools
including University of Edinburgh's Festival Speech Synthesis System
and Carnegie Mellon University's FestVox project, tools, scripts and
documentation for building synthetic voices.  However, flite itself
does not require either of these systems to compile and run.

The core Flite library was developed by Alan W Black <awb@cs.cmu.edu>
(mostly in his so-called spare time) while employed in the Language
Technologies Institute at Carnegie Mellon University.  The name
"flite", originally chosen to mean "festival-lite" is perhaps doubly
appropriate as a substantial part of design and coding was done over
30,000ft while awb was travelling.

The voices, lexicon and language components of flite, both their
compression techniques and their actual contents were developed by
Kevin A. Lenzo <lenzo@cs.cmu.edu> and Alan W Black <awb@cs.cmu.edu>.

Flite is the answer to the complaint that Festival is too big, too slow,
and not portable enough.

o Flite is designed for very small devices, such as PDAs, and also
  for large server machine with lots of ports.
o Flite is not a replacement for Festival but an alternative run time
  engine for voices developed in the FestVox framework where size and
  speed is crucial.
o Flite is all in ANSI C, it contains no C++ or Scheme, thus requires
  more care in programming, and is harder to customize at run time.
o It is thread safe
o Voices, lexicons and language descriptions can be compiled 
  (mostly automatically) into C representations from their FestVox formats
o All voices, lexicons and language model data are const and in the
  text segment (i.e. they may be put in ROM).  As they are linked in
  at compile time, there is virtually no startup delay.
o Although the synthesized output is not exactly the same as the same 
  voice in Festival they are effectively equivalent.  That is flite 
  doesn't sound better or worse than the equivalent voice in festival,
  just faster, smaller and scalable.
o For standard diphone voices, maximum run time memory
  requirements are approximately less than twice the memory requirement 
  for the waveform generated.  For 32bit archtectures
  this effectively means under 1M. (Later versions will include a 
  streaming option which will reduce this to less than one quarter).
o The flite program supports, synthesis of individual strings or files
  (utterance by utterance) to direct audio devices or to waveform files.
o The flite library offers simple functions suitable for use in specific
  applications.

Flite is distributed with a single 8K diphone voice (derived from the
cmu_us_kal voice), an pruned lexicon (derived from
cmulex) and a set of models for US English.  Here are comparisons
with Festival using basically the same 8KHz diphone voice
                Flite    Festival
   core code    100K     2.6M
   USEnglish    35K      ??
   lexicon      1.6M     5M
   diphone      2.1M     2.1M
   runtime      <1M      16-20M

On a 500Mhz PIII, a timing test of the first two chapters of
"Alice in Wonderland" (doc/alice) was done.  This produces about
1300 seconds of speech.  With flite it takes 19.128 seconds (about
70.6 times faster than real time) with Festival it takes 97 seconds
(13.4 times faster than real time).  On the ipaq (with the 16KHz diphones)
flite synthesizes 9.79 time faster than real time.

Requirements:  

o A good C compiler, some of these files are quite large and some C
  compilers might choke on these, gcc is fine.  Sun CC 3.01 has been
  tested too.  Visual C++ 6.0 is known to fail on the large diphone
  database files.  We recommend you use GCC under Cygwin or mingw32
  instead.
o GNU Make
o An audio device isn't required as flite can write its output to 
  a waveform file. 

Supported platforms:

We have successfully compiled and run on 

o Various Intel Linux systems (and iPaq Linux), under various versions
  of GCC (2.7.2 to 3.2.1)
o FreeBSD 3.x and 4.x
o Solaris 5.7, and Solaris 9
o Initial support for Mac OS X
o OSF1 V4.0 (gives an unimportant warning about sizes when compiled cst_val.c)
o Windows 2000 under Cygwin 1.3.5
o Some support for WinCE (2.11 and 3.0) is included but is not complete

Other similar platforms should just work, we have also cross compiled
on a Linux machine for StrongARM.  However note that new byte order
architectures may not work directly as there is some careful
byte order constraints in some structures.  These are portable but may
require reordering of some fields, contact us if you are moving to
a new archiecture.

News
----

New in 1.2-release
    o A build process for diphone and clunit/ldom voices
      FestVox voices can be converted (sometimes) automatically
    o Various bug fixes
    o Initial support for Mac OS X (not talking to audio device yet)
      but compiles and runs
    o Files can be converted to a single audio file
    o optional shared library support (Linux)

Compilation
-----------

In general

    tar zxvf flite-1.2-release.tar.gz
    cd flite-1.2-release
    ./configure
    make

Where tar is gnu tar (gtar), and make is gnu make (gmake).

Configuration should be automatic, but maybe doesn't work in all cases
especially if you have some new compiler.  You can explicitly set to
compiler in config/config and add any options you see fit.   Configure
tries to guess these but it might be able for cross compilation cases
Interesting options there are

-DWORDS_BIGENDIAN=1  for bigendian machines (e.g. Sparc, M68x)
-DNO_UNION_INITIALIZATION=1  For compilers without C 99 union inintialization
-DCST_AUDIO_NONE     if you don't need/want audio support

To compile for the ipaq Linux distribution (we've done this on Familiar 
and Intimate), no automatic cross compilation configuration is
set up yet.  Thus configure on a Linux machine and
edit config/config.  
   change gcc ar ranlib to their arm-linux equivalents
   change FL_VOX to cmu_us_kal16,
   make clean
   make
   arm-linux-strip bin/flite

The copy bin/flite to the ipaq.  This binary is also
available from 
   http://cmuflite.org/packed/flite-1.2/flite-1.2_bin16KHz_arm-linux.tar.gz
Because the Linux ipaq audio driver only supports 16KHz (and more) 
we include this larger voice.  

The Sharp Zaurus SL-5000D is a very similar machine, however it 
does support 8KHz sampling and a smaller binary is provided.  The
Zaurus typicall has less free memory so there is an advantage to
this
   http://cmuflite.org/packed/flite-1.2/flite-1.2_bin8KHz_arm-linux.tar.gz

This voice also used fixed point rather floating point as the
StrongARM doesn't have floating point instructions.  We are working on
a much smaller voice for the ipaq hopefully small enough to fit in the
flash.

Usage:
------

If it compiles properly a binary will be put in bin/, note by
default -g is one so it will be bigger that ios actually required

   ./bin/flite "Flite is a small fast run-time synthesis engine" flite.wav

Will produce an 8KHz riff headered waveform file (riff is Microsoft's
wave format often called .WAV).

   ./bin/flite doc/alice

Will play the text file doc/alice.  If the first argument contains
a space it is treated as text otherwise it is treated as a filename.
If a second argument is given a waveform file is written to it,
if no argument is given or "play" is given it will attempt to 
write directly to the audio device (if supported).  if "none"
is given the audio is simply thrown away (used for benchmarking).
Explicit options are also available.

   ./bin/flite -v doc/alice none

Will synthesize the file without playing the audio and give a summary
of the speed.

   ./bin/flite doc/alice alice.wav

will synthesize the whole of alice into a single file (previoous
versions would only give the last utterance in the file, but
that is fixed now).

An additional set of feature setting options are available, these are
*debug* options, Voices are represented as sets of feature values (see
lang/cmu_us_kal/cmu_us_kal.c) and you can override values on the
command line.  This can stop flite from working if malicious values
are set and therefor this facility is not intended to be made
available for standard users.  But these are useful for
debugging.  Some typical examples are

./bin/flite --sets join_type=simple_join doc/intro
     Use simple concatenation of diphones without prosodic modification
./bin/flite --seti verbosity=1 doc/alice
     Print sentences as ther are said 
./bin/flite --setf duration_stretch=1.5 doc/alice
     Make it speak slower
./bin/flite --setf int_f0_target_mean=145 doc/alice
     Make it speak higher

The talking clock is an example talking clode as discussed on
http://festvox.org/ldom it requires a single argument HH:MM
under Unix you can call it
    ./bin/flite_time `date +%H:%M`

Voice quality
-------------

So you've eagerly downloaded flite, compiled it and run it, now you
are disappointed that is doesn't sound wonderful, sure its fast and
small but what you really hoped for was the dulcit tones of a deep
baritone voice that would make you desperately hang on every phrase it
sang.  But instead you get an 8Khz diphone voice that sounds like it
came from the last millenium.

Well, first, you are right, it is an 8KHz diphone voice from the last
millenium, and that was actually deliberate.  As we developed flite we
wanted a voice that was stable and that we could directly compare with
that very same voice in Festival.  Flite is an *engine*.  We want to
be able take voices built with the FestVox process and compile them
for flite, the result should be exactly the same quality (though of
course trading the size for quality in flite is also an option).  The
included voice is just an sample voice that was used in the testing
process.  We have better voices in Festival and are working on the
coversion process to make it both more automatic and more robust and
tunable, but we haven't done that yet, so in this first beta release.
This old poor sounding voice is all we have, sorry, we'll provide you
with free, high-quality, scalable, configurable, natural sounding
voices for flite, in all languages and dialects, with the tools to
built new voices efficiently and robustly as soon as we can.  Though
in the mean time, a few higher quality voices will be released with
the next version.

Todo:
----

This release is just the beginning, there is much to do and this can
be a lot faster and smaller.  We have already seriously considered some
of the following but they didn't make this release.  In near
future versions we will add

o Streaming synthesis so no large buffers of waveforms need be held
  (we've got an initial pass at that in Cepstral but not read for this
  release)
o Better compression of the lexicon, and unit databases
o Better quality speech (we have better diphone databases which 
  haven't been released yet but do give better quality synthesis).
o Documentation to allow people to more easily integrate flite
  into applications.
o Some reasonable voices based on our newer voice building work