Once Yaafe is installed and the environment is correctly configured, you can start extracting audio features with yaafe.py. Yaafe uses the YAAFE_PATH environment variable to find audio feature libraries.
The -h option prints usage help:
> yaafe.py -h
With the -l option you can list available audio features:
> yaafe.py -l
Available features:
- AmplitudeModulation
- AutoCorrelation
- ComplexDomainOnsetDetection
- Energy
- Envelope
- EnvelopeShapeStatistics
- Frames
- LPC
- LSF
- Loudness
- MFCC
- MagnitudeSpectrum
- OBSI
- OBSIR
- PerceptualSharpness
- PerceptualSpread
- SpectralCrestFactorPerBand
- SpectralDecrease
- SpectralFlatness
- SpectralFlatnessPerBand
- SpectralFlux
- SpectralRolloff
- SpectralShapeStatistics
- SpectralSlope
- SpectralVariation
- TemporalShapeStatistics
- ZCR
Available feature transforms:
- AutoCorrelationPeaksIntegrator
- Cepstrum
- Derivate
- HistogramIntegrator
- SlopeIntegrator
- StatisticalIntegrator
Some of these entries (such as Frames or Envelope) are not features in the strict sense; they are intermediate representations used to compute other features.
Feature transforms are transformations that can be applied to the output of a feature.
You can view a description of each feature with the -d option:
> yaafe.py -d MFCC
Compute the Mel-frequencies cepstrum coefficients.
Parameters are :
- CepsIgnoreFirstCoeff (default=1): 0 means to keep the first cepstral coeffcient, 1 means to ignore it
- CepsNbCoeffs (default=13): Number of cepstral coefficient to keep.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
- MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
- MelNbFilters (default=40): Number of mel filters
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
The syntax for defining a feature to extract is:
name: feature [param=value] [param=value] ... [> feature-transform [param=value] ... [> ...] ]
For example:
mfcc: MFCC blockSize=1024 stepSize=1024 CepsNbCoeffs=11
mfcc_d1: MFCC blockSize=1024 stepSize=1024 CepsNbCoeffs=11 > Derivate DOrder=1
mfcc_d2: MFCC blockSize=1024 stepSize=1024 CepsNbCoeffs=11 > Derivate DOrder=2
The name identifies, in the output file, the table that holds the feature values. Parameters are optional; any parameter that is not specified takes its default value.
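The definition syntax above can be illustrated with a small parser. This is only a sketch for reading the format, not Yaafe's own parser: the function name parse_feature_definition is hypothetical, and it assumes that parameter values contain no spaces and that ">" appears only as the separator between steps.

```python
def parse_feature_definition(line):
    """Split 'name: Feature p=v ... > Transform p=v ...' into its parts.

    Returns (name, steps), where steps is a list of (component, params)
    tuples: the feature first, then each feature transform in order.
    Hypothetical helper, not part of Yaafe.
    """
    name, _, chain = line.partition(":")
    if not chain.strip():
        raise ValueError("expected 'name: feature ...', got %r" % line)
    steps = []
    for part in chain.split(">"):
        tokens = part.split()
        if not tokens:
            raise ValueError("empty step in %r" % line)
        component, params = tokens[0], {}
        for token in tokens[1:]:
            key, sep, value = token.partition("=")
            if not sep:
                raise ValueError("expected param=value, got %r" % token)
            params[key] = value  # values are kept as strings
        steps.append((component, params))
    return name.strip(), steps


name, steps = parse_feature_definition(
    "mfcc_d1: MFCC blockSize=1024 stepSize=1024 CepsNbCoeffs=11 > Derivate DOrder=1"
)
print(name)
for component, params in steps:
    print(component, params)
```

Keeping the values as strings avoids guessing parameter types; a caller that needs numbers can convert them explicitly.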
To extract one feature, you can use:
> yaafe.py -r 44100 -f "mfcc: MFCC blockSize=1024 stepSize=512" test.wav
The -f option defines a feature to compute, using the feature definition syntax described above. You may provide the -f option several times. The -r option sets the expected sample rate.
You may define a feature extraction plan, which is a text file with one feature defined per line. For example:
lx: Loudness
lx_sp: PerceptualSpread
ls_sh: PerceptualSharpness
ss: SpectralSlope
sv: SpectralVariation
sd: SpectralDecrease
sf: SpectralFlatness
sm: SpectralShapeStatistics
mfcc: MFCC blockSize=512 stepSize=256 CepsNbCoeffs=11
lpc: LPC LPCNbCoeffs=3
obsi: OBSI
obsir: OBSIR
am: AmplitudeModulation blockSize=30720 stepSize=15360
To extract all the features defined in a feature extraction plan, use the -c option:
> yaafe.py -c featureplan -r 32000 file1.wav
You may pass several audio files as arguments to the yaafe.py script, or use the -i option to specify a text file that contains one filename per line. All audio files must have the same sample rate.
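For batch runs driven from a script, the file list and the command line can be prepared in Python. The sketch below only writes the list file and builds the argument list; it does not run Yaafe. The helper names and the file names files.txt and file2.wav are hypothetical, and combining -c, -r and -i in a single call is an assumption based on the options described above.

```python
import os
import tempfile


def write_file_list(audio_files, list_path):
    """Write one audio filename per line, the layout described for -i."""
    with open(list_path, "w") as handle:
        for filename in audio_files:
            handle.write(filename + "\n")


def build_yaafe_command(plan_path, sample_rate, list_path):
    """Return the argument list for a batch run (the command is not executed)."""
    return ["yaafe.py", "-c", plan_path, "-r", str(sample_rate), "-i", list_path]


# Hypothetical inputs: two audio files sharing a 32000 Hz sample rate.
workdir = tempfile.mkdtemp()
list_path = os.path.join(workdir, "files.txt")
write_file_list(["file1.wav", "file2.wav"], list_path)

command = build_yaafe_command("featureplan", 32000, list_path)
print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) once yaafe.py is on the PATH.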
Yaafe can normalize the input signal. The normalized signal has a mean of 0 and a maximum absolute value equal to the value given by the --normalize-max option.
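The effect of this normalization can be reproduced in a few lines of Python. This is a sketch of the behaviour described above, not Yaafe's implementation: the function name normalize is hypothetical, and it assumes the mean is removed first and the peak is then measured on the centered signal.

```python
def normalize(samples, max_value):
    """Remove the mean, then rescale so the largest absolute value is max_value.

    Assumed order of operations; Yaafe's internal implementation may differ.
    """
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    centered = [x - mean for x in samples]
    peak = max(abs(x) for x in centered)
    if peak == 0:
        return centered  # constant signal: nothing to rescale
    return [x * max_value / peak for x in centered]


normalized = normalize([0.1, 0.5, 0.3, 0.7], max_value=0.9)
print(normalized)
```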
In this section, all indices start from 0. An array of size N contains elements 0 to N-1.
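We assume that a feature value computed over a signal frame is associated with the time of the frame’s center. When computing a feature with a frame size (blockSize) of b and a step between frames (stepSize) of s, frame iteration operates as follows: frame k is centered on sample k*s and covers samples k*s - b/2 to k*s + b/2 - 1, and any part of a frame that falls outside the signal is padded with zeros.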
For example, extracting MFCC with blockSize=1024 and stepSize=512 over a signal of 10000 samples results in the following frames:
frame 0 : [ -512 511] -> centered on sample 0, padded with 0 on the left
frame 1 : [ 0 1023] -> centered on sample 512
frame 2 : [ 512 1535] -> centered on sample 1024
frame 3 : [ 1024 2047] -> centered on sample 1536
...
frame 18 : [ 8704 9727] -> centered on sample 9216
frame 19 : [ 9216 10239] -> centered on sample 9728, padded with 0 on the right
This frame iteration ensures that all features with the same stepSize parameter are always aligned, even if their blockSize parameters differ.
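The frame layout in the example can be reproduced with a short function. This is a sketch inferred from the listing above, not Yaafe's code: the name frame_bounds is hypothetical, blockSize is assumed to be even, and the stopping rule (iteration ends once a frame center would fall past the last sample) is an assumption that matches the 10000-sample example.

```python
def frame_bounds(num_samples, block_size, step_size):
    """Return (first, last, center) sample indices for each frame.

    Frame k is centered on sample k * step_size. Indices outside
    [0, num_samples - 1] stand for zero padding.
    """
    half = block_size // 2
    frames = []
    center = 0
    while center < num_samples:  # assumed stopping rule
        frames.append((center - half, center + half - 1, center))
        center += step_size
    return frames


frames = frame_bounds(10000, block_size=1024, step_size=512)
print(len(frames))   # 20
print(frames[0])     # (-512, 511, 0)
print(frames[19])    # (9216, 10239, 9728)

# Same stepSize, different blockSize: the frame centers coincide.
short = frame_bounds(10000, block_size=512, step_size=512)
print([c for _, _, c in short] == [c for _, _, c in frames])  # True
```

Because the centers depend only on stepSize, features computed with different blockSize values produce the same number of frames at the same time positions.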
Yaafe writes feature values to an HDF5 file. HDF5 is a binary format designed for efficient storage of large amounts of scientific data. Yaafe creates one H5 file per input audio file and stores each extracted feature in a separate dataset. Some useful metadata are attached to each dataset; see the next section.
If you’re working with Matlab, you can use the Matlab scripts in the Yaafe package to load feature data from H5 files into the Matlab environment.
If you’re working with Python, you can use the h5py package to manipulate feature data.
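As a starting point with h5py, the sketch below lists the datasets in an output file together with their shapes and attached metadata. It assumes that the features are stored as top-level datasets and that the metadata are exposed as HDF5 attributes; the function names and the file name features.h5 are hypothetical.

```python
def summarize_features(h5file):
    """Map each dataset name to its shape and attached metadata.

    h5file is an open h5py.File; any mapping of name -> object with
    .shape and .attrs works as well.
    """
    summary = {}
    for name, item in h5file.items():
        if not hasattr(item, "shape"):
            continue  # skip anything that is not a dataset (e.g. groups)
        summary[name] = {"shape": tuple(item.shape), "attrs": dict(item.attrs)}
    return summary


def summarize_file(path):
    """Open an H5 file and summarize its top-level datasets."""
    import h5py  # third-party package, imported only when needed

    with h5py.File(path, "r") as h5file:
        return summarize_features(h5file)


# Usage, with a hypothetical output file name:
#   for name, info in summarize_file("features.h5").items():
#       print(name, info["shape"], info["attrs"])
```

To load the values themselves, index the dataset inside the with block, for example h5file[name][...], which returns a NumPy array.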
Each feature is stored in an HDF5 dataset. The following metadata are attached to each dataset:
The -b option specifies a base directory for output files.