Yaafe core features
Yaafe core audio features.
Available features
class yaafefeatures.AmplitudeModulation
Tremolo and Grain description, according to [SE2005] and [AE2001].
- AmplitudeModulation uses Envelope to describe tremolo and grain. Analyzed frequency ranges are:
- Tremolo : 4 - 8 Hz
- Grain : 10 - 40 Hz
- For each of these ranges, it computes:
- Frequency of maximum energy in range
- Difference of the energy of this frequency and the mean energy over all frequencies
- Difference of the energy of this frequency and the mean energy in range
- Product of the first two values.
[AE2001] A. Eronen, Automatic musical instrument recognition. Master’s Thesis, Tampere University of Technology, 2001.
- Parameters:
- EnDecim (default=200): Decimation factor to compute envelope
- blockSize (default=32768): output frames size
- stepSize (default=16384): step between consecutive frames
Declaration example:
AmplitudeModulation EnDecim=200 blockSize=32768 stepSize=16384
class yaafefeatures.AutoCorrelation
Compute the autocorrelation coefficients ac of each frame.
- Parameters:
- ACNbCoeffs (default=49): Number of autocorrelation coefficients to keep
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
AutoCorrelation ACNbCoeffs=49 blockSize=1024 stepSize=512
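As a rough illustration of what this feature computes, the sketch below evaluates the unnormalized autocorrelation of one frame in plain Python. The function name `autocorrelation`, the choice to start at lag 0, and the absence of any normalization are assumptions made for the example; the exact convention used by Yaafe is not stated in this section.

```python
def autocorrelation(frame, nb_coeffs):
    """Return ac[0..nb_coeffs-1] with ac[k] = sum_i frame[i] * frame[i + k].

    Assumption: lags start at 0 and no normalization is applied.
    """
    n = len(frame)
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k))
        for k in range(min(nb_coeffs, n))
    ]


ac = autocorrelation([1.0, 2.0, 3.0, 4.0], nb_coeffs=3)
```

Here `nb_coeffs` plays the role of ACNbCoeffs: only the first few lags are kept.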
class yaafefeatures.ComplexDomainOnsetDetection
Compute onset detection using a complex domain spectral flux method [CD2003].
[CD2003] C. Duxbury et al., Complex domain onset detection for musical signals, Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, September 8-11, 2003.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
ComplexDomainOnsetDetection FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
class yaafefeatures.Energy
Compute energy as the root mean square of an audio frame.
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
Energy blockSize=1024 stepSize=512
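The root mean square mentioned above can be sketched in a few lines of plain Python. The helper name `rms_energy` is hypothetical and only illustrates the definition; it is not part of Yaafe.

```python
import math


def rms_energy(frame):
    """Root mean square of the samples in one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


energy = rms_energy([3.0, -3.0, 3.0, -3.0])
```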
class yaafefeatures.Envelope
Extract the amplitude envelope using the Hilbert transform, low-pass filtering and decimation.
- Parameters:
- EnDecim (default=200): Decimation factor to compute envelope
- blockSize (default=32768): output frames size
- stepSize (default=16384): step between consecutive frames
Declaration example:
Envelope EnDecim=200 blockSize=32768 stepSize=16384
class yaafefeatures.EnvelopeShapeStatistics
Centroid, spread, skewness and kurtosis of each frame’s amplitude envelope. For more details about moments, see Shape Statistics.
- Parameters:
- EnDecim (default=200): Decimation factor to compute envelope
- blockSize (default=32768): output frames size
- stepSize (default=16384): step between consecutive frames
Declaration example:
EnvelopeShapeStatistics EnDecim=200 blockSize=32768 stepSize=16384
class yaafefeatures.Frames
Segment input signal into frames.
First frame has zeros on left half so that it is centered on time 0s, then consecutive frames are equally spaced.
Consequently, frame i (starting from 0) is centered on sample i * stepSize.
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
Frames blockSize=1024 stepSize=512
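The centering rule described above (frame i centered on sample i * stepSize, with the first frame zero-padded on its left half) can be sketched as follows. The function name `centered_frames` is hypothetical, and the handling of the end of the signal (zero-padding on the right, stopping once the frame center passes the last sample) is an assumption, since this section does not specify it.

```python
def centered_frames(signal, block_size, step_size):
    """Split `signal` into frames so that frame i is centered on sample i * step_size.

    Samples outside the signal are replaced by zeros. The stopping rule
    (center must fall inside the signal) is an assumption for this sketch.
    """
    half = block_size // 2
    result = []
    center = 0
    while center < len(signal):
        start = center - half
        frame = [
            signal[pos] if 0 <= pos < len(signal) else 0.0
            for pos in range(start, start + block_size)
        ]
        result.append(frame)
        center += step_size
    return result


frames = centered_frames([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], block_size=4, step_size=2)
```

With blockSize=4 and stepSize=2, the first frame holds two zeros followed by the first two samples, which places its center on sample 0.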
class yaafefeatures.LPC
Compute the Linear Predictor Coefficients (LPC) of a signal frame, using autocorrelation and the Levinson-Durbin algorithm. See [JM1975].
[JM1975] J. Makhoul, Linear Prediction: A Tutorial Review, Proc. IEEE, Vol. 63, pp. 561-580, 1975.
- Parameters:
- LPCNbCoeffs (default=2): Number of Linear Predictor Coefficients to compute
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
LPC LPCNbCoeffs=2 blockSize=1024 stepSize=512
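For readers unfamiliar with the Levinson-Durbin recursion, here is a minimal sketch in plain Python. It takes autocorrelation values as input and returns the prediction-error filter. The function name `levinson_durbin`, the sign convention (a[0] = 1, predictor x[n] ≈ -sum a[j] x[n-j]), and the returned tuple are assumptions for illustration; Yaafe's own output layout may differ.

```python
def levinson_durbin(ac, order):
    """Solve the normal equations for a prediction-error filter of given order.

    `ac` holds autocorrelation values ac[0..order].
    Returns (a, error) with a[0] == 1.0 and `error` the residual energy.
    """
    a = [1.0] + [0.0] * order
    error = ac[0]
    for m in range(1, order + 1):
        if error == 0.0:
            break  # degenerate (all-zero) input: nothing left to predict
        acc = ac[m] + sum(a[j] * ac[m - j] for j in range(1, m))
        k = -acc / error  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        error *= 1.0 - k * k
    return a, error


# Autocorrelation of a first-order process with correlation 0.5 between neighbors.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the second coefficient vanishes, as expected for a first-order process.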
class yaafefeatures.LSF
Compute the Line Spectral Frequency (LSF) coefficients of a signal frame. The algorithm is adapted from [TB2006] and [SH1976].
[TB2006] Tom Backstrom, Carlo Magi, Properties of line spectrum pair polynomials–A review, Signal Processing, Volume 86, Issue 11, Special Section: Distributed Source Coding, November 2006, Pages 3286-3298, ISSN 0165-1684, DOI: 10.1016/j.sigpro.2006.01.010.
[SH1976] H. Schussler, A stability theorem for discrete systems, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24, no. 1, pp. 87-89, Feb 1976.
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
LSF blockSize=1024 stepSize=512
class yaafefeatures.Loudness
The loudness coefficients are the energy in each Bark band, normalized by the overall sum. See [GP2004] and [MG1997] for more details.
[MG1997] Moore, Glasberg, et al., A Model for the Prediction of Thresholds, Loudness, and Partial Loudness, J. Audio Eng. Soc. 45: 224-240, 1997.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- LMode (default=Relative): “Specific” computes loudness without normalization, “Relative” normalizes the bands so that they sum to 1, “Total” returns only the sum of the loudness over all bands.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
Loudness FFTLength=0 FFTWindow=Hanning LMode=Relative blockSize=1024 stepSize=512
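The three LMode values can be illustrated on a list of per-band loudness values. This sketch assumes the per-band (specific) loudness has already been computed; the function name `apply_lmode` and the one-element list returned for “Total” are assumptions for the example, not Yaafe's documented output layout.

```python
def apply_lmode(specific, mode="Relative"):
    """Post-process per-band loudness values according to an LMode-like setting."""
    total = sum(specific)
    if mode == "Specific":
        return list(specific)  # no normalization
    if mode == "Relative":
        # Each band divided by the overall sum, so the result sums to 1.
        return [s / total for s in specific] if total > 0 else list(specific)
    if mode == "Total":
        return [total]  # single value: sum over all bands
    raise ValueError("unknown mode: %r" % mode)


relative = apply_lmode([1.0, 3.0], mode="Relative")
total = apply_lmode([1.0, 3.0], mode="Total")
```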
class yaafefeatures.MFCC
Compute the Mel-frequency cepstral coefficients [DM1980].
Mel filter bank is built as 40 log-spaced filters according to the following mel-scale:
Each filter is a triangular filter with height
Then MFCCs are computed as follows, using DCT II:
[DM1980] S.B. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28:357-366, 1980.
- Parameters:
- CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it
- CepsNbCoeffs (default=13): Number of cepstral coefficients to keep.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
- MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
- MelNbFilters (default=40): Number of mel filters
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
MFCC CepsIgnoreFirstCoeff=1 CepsNbCoeffs=13 FFTWindow=Hanning MelMaxFreq=6854.0 MelMinFreq=130.0 MelNbFilters=40 blockSize=1024 stepSize=512
class yaafefeatures.MagnitudeSpectrum
Compute the frame’s magnitude spectrum, optionally applying an analysis window (Hanning or Hamming).
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
MagnitudeSpectrum FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
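To make the interplay of FFTWindow and FFTLength concrete, the sketch below windows a frame, zero-pads or truncates it, and takes the magnitude of a naive DFT. Several details are assumptions for illustration: the symmetric window formulas, applying the window before padding, returning only bins 0 to N/2, and the helper names `make_window` and `magnitude_spectrum`. A real implementation would use an FFT rather than this O(N²) loop.

```python
import cmath
import math


def make_window(name, n):
    """Return a symmetric window of length n (symmetric form is an assumption)."""
    if name == "None" or n < 2:
        return [1.0] * n
    if name == "Hanning":
        a0 = 0.5
    elif name == "Hamming":
        a0 = 0.54
    else:
        raise ValueError("unknown window: %r" % name)
    return [a0 - (1.0 - a0) * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame, fft_length=0, window="Hanning"):
    """Magnitude of the DFT of a windowed frame, for bins 0..N/2."""
    weights = make_window(window, len(frame))
    x = [s * w for s, w in zip(frame, weights)]
    n = fft_length if fft_length > 0 else len(x)
    x = (x + [0.0] * n)[:n]  # zero-pad or truncate to n samples
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


spectrum = magnitude_spectrum([1.0, 1.0, 1.0, 1.0], fft_length=8, window="None")
```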
class yaafefeatures.MelSpectrum
Compute the Mel-frequency spectrum [DM1980].
Mel filter bank is built as 40 log-spaced filters according to the following mel-scale:
Each filter is a triangular filter with height
- Parameters:
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
- MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
- MelNbFilters (default=40): Number of mel filters
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
MelSpectrum FFTWindow=Hanning MelMaxFreq=6854.0 MelMinFreq=130.0 MelNbFilters=40 blockSize=1024 stepSize=512
class yaafefeatures.OBSI
Compute Octave Band Signal Intensity (OBSI) using a triangular octave filter bank ([SE2005]).
[SE2005] S. Essid, Classification automatique des signaux audio-frequences: reconnaissance des instruments de musique. PhD, UPMC, 2005.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
OBSI FFTLength=0 FFTWindow=Hanning OBSIMinFreq=27.5 blockSize=1024 stepSize=512
class yaafefeatures.OBSIR
Compute the log of the OBSI ratio between consecutive octaves.
- Parameters:
- DiffNbCoeffs (default=0): Maximum number of coefficients to keep. 0 keeps N-1 values (with N the input feature size)
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
OBSIR DiffNbCoeffs=0 FFTLength=0 FFTWindow=Hanning OBSIMinFreq=27.5 blockSize=1024 stepSize=512
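A small sketch may help show why N input values yield N-1 outputs. It assumes the OBSI values of one frame are already available; the function name `obsir`, the direction of the ratio (higher octave over lower octave), and the use of the natural logarithm are assumptions, since this section does not specify them.

```python
import math


def obsir(obsi, diff_nb_coeffs=0):
    """Log of the ratio between OBSI values of consecutive octaves.

    Assumption: ratio is obsi[i + 1] / obsi[i], natural log, strictly positive inputs.
    """
    ratios = [math.log(obsi[i + 1] / obsi[i]) for i in range(len(obsi) - 1)]
    # 0 keeps all N-1 values, otherwise keep at most diff_nb_coeffs of them.
    return ratios if diff_nb_coeffs == 0 else ratios[:diff_nb_coeffs]


values = obsir([1.0, 2.0, 4.0])
```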
class yaafefeatures.PerceptualSharpness
Compute the sharpness of Loudness coefficients, according to [GP2004].
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
PerceptualSharpness FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
class yaafefeatures.PerceptualSpread
Compute the spread of Loudness coefficients, according to [GP2004].
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
PerceptualSpread FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
class yaafefeatures.SpectralCrestFactorPerBand
Compute spectral crest factor per log-spaced band of 1/4 octave.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralCrestFactorPerBand FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
class yaafefeatures.SpectralDecrease
Compute spectral decrease according to [GP2004].
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralDecrease FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
class yaafefeatures.SpectralFlatness
Compute global spectral flatness using the ratio between geometric and arithmetic mean.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralFlatness FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
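The ratio between geometric and arithmetic mean can be sketched as below. Whether Yaafe applies it to the magnitude or the power spectrum is not stated here, so this example simply takes a list of non-negative spectral values; the function name `spectral_flatness` and the small `eps` guard against log(0) are assumptions for illustration.

```python
import math


def spectral_flatness(spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of non-negative spectral values."""
    n = len(spectrum)
    # Geometric mean computed in the log domain to avoid overflow/underflow.
    log_mean = sum(math.log(v + eps) for v in spectrum) / n
    arithmetic_mean = sum(spectrum) / n
    return math.exp(log_mean) / (arithmetic_mean + eps)


flat = spectral_flatness([2.0, 2.0, 2.0, 2.0])   # close to 1 for a flat spectrum
peaky = spectral_flatness([1.0, 0.0, 0.0, 0.0])  # close to 0 for a single peak
```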
class yaafefeatures.SpectralFlatnessPerBand
Compute spectral flatness per log-spaced band of 1/4 octave, as proposed in MPEG7 standard.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralFlatnessPerBand FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
class yaafefeatures.SpectralFlux
Compute the flux of the spectrum between consecutive frames.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- FluxSupport (default=All): Support of the flux computation. If ‘All’, all bins are used; if ‘Increase’, only bins whose value is increasing are used.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralFlux FFTLength=0 FFTWindow=Hanning FluxSupport=All blockSize=1024 stepSize=512
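The effect of FluxSupport can be seen in a minimal sketch that compares two consecutive magnitude spectra. The sum of squared differences used here, the lack of normalization, and the function name `spectral_flux` are assumptions for illustration; the exact formula used by Yaafe is not given in this section.

```python
def spectral_flux(previous, current, support="All"):
    """Sum of squared bin-wise differences between two consecutive spectra.

    With support="Increase", only bins whose value grew are counted.
    """
    total = 0.0
    for p, c in zip(previous, current):
        diff = c - p
        if support == "Increase" and diff <= 0.0:
            continue  # skip bins that are flat or decreasing
        total += diff * diff
    return total


flux_all = spectral_flux([1.0, 2.0, 3.0], [2.0, 1.0, 3.0], support="All")
flux_inc = spectral_flux([1.0, 2.0, 3.0], [2.0, 1.0, 3.0], support="Increase")
```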
class yaafefeatures.SpectralRolloff
Spectral roll-off is the frequency below which 99% of the energy is contained. See [SS1997].
[SS1997] E. Scheirer, M. Slaney, Construction and evaluation of a robust multifeature speech/music discriminator. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1331-1334, 1997.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralRolloff FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
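The 99% rule can be sketched by accumulating energy bin by bin until the threshold is reached. This example assumes that energy means squared magnitude and that the result is reported in Hz; both choices, along with the function name `spectral_rolloff` and its arguments, are assumptions for illustration.

```python
def spectral_rolloff(magnitudes, sample_rate, fft_length, ratio=0.99):
    """Frequency (Hz) of the first bin at which cumulative energy reaches `ratio`."""
    energies = [m * m for m in magnitudes]
    threshold = ratio * sum(energies)
    cumulative = 0.0
    for k, energy in enumerate(energies):
        cumulative += energy
        if cumulative >= threshold:
            return k * sample_rate / fft_length
    # Only reached through rounding effects: fall back to the last bin.
    return (len(magnitudes) - 1) * sample_rate / fft_length


rolloff = spectral_rolloff([1.0, 1.0, 0.0, 0.0, 0.0], sample_rate=8000, fft_length=8)
```

In this toy spectrum all the energy sits in bins 0 and 1, so the roll-off lands on bin 1, that is 1000 Hz.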
class yaafefeatures.SpectralShapeStatistics
Compute shape statistics of MagnitudeSpectrum (see [GR2004]).
Shape Statistics are centroid, spread, skewness and kurtosis, defined as follows:
[GR2004] O. Gillet, G. Richard, Automatic transcription of drum loops, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Montreal, Canada, 2004.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralShapeStatistics FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
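The four shape statistics can be sketched with the usual weighted-moment definitions, treating the values as weights over their positions (bin indices for a spectrum, sample indices for a frame or envelope). The function name `shape_statistics`, the use of bin indices rather than frequencies in Hz, and the kurtosis without any offset are assumptions for this example; they may not match Yaafe's exact definitions.

```python
import math


def shape_statistics(values):
    """Return (centroid, spread, skewness, kurtosis) of non-negative `values`.

    Positions are the indices 0..N-1 and `values` act as weights.
    Assumes a non-zero sum and a non-zero spread.
    """
    total = sum(values)
    centroid = sum(i * v for i, v in enumerate(values)) / total

    def central_moment(order):
        return sum((i - centroid) ** order * v for i, v in enumerate(values)) / total

    spread = math.sqrt(central_moment(2))
    skewness = central_moment(3) / spread ** 3
    kurtosis = central_moment(4) / spread ** 4
    return centroid, spread, skewness, kurtosis


centroid, spread, skewness, kurtosis = shape_statistics([1.0, 2.0, 1.0])
```

For this symmetric input the centroid is the middle index and the skewness is zero.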
class yaafefeatures.SpectralSlope
SpectralSlope is computed by linear regression of the spectral amplitude. (see [GP2004])
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralSlope FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
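The linear regression mentioned above amounts to an ordinary least-squares fit of amplitude against bin position. The sketch below uses bin indices as the x axis and applies no extra normalization; these choices and the function name `spectral_slope` are assumptions, and any normalization described in [GP2004] is not reproduced here.

```python
def spectral_slope(magnitudes):
    """Least-squares slope of magnitude as a function of bin index."""
    n = len(magnitudes)
    mean_k = (n - 1) / 2.0
    mean_a = sum(magnitudes) / n
    numerator = sum((k - mean_k) * (a - mean_a) for k, a in enumerate(magnitudes))
    denominator = sum((k - mean_k) ** 2 for k in range(n))
    return numerator / denominator


slope = spectral_slope([5.0, 4.0, 3.0, 2.0])
```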
class yaafefeatures.SpectralVariation
SpectralVariation is the normalized correlation of spectrum between consecutive frames (see [GP2004]).
[GP2004] Geoffroy Peeters, A large set of audio features for sound description (similarity and classification) in the CUIDADO project, 2004.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before the FFT. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralVariation FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
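The normalized correlation between two consecutive spectra can be sketched as follows. The function name `normalized_correlation` is hypothetical, and this example returns the correlation itself; some formulations report one minus this value, and which of the two Yaafe outputs is not settled by this section.

```python
import math


def normalized_correlation(previous, current):
    """Correlation of two spectra, normalized by the product of their norms."""
    numerator = sum(p * c for p, c in zip(previous, current))
    denominator = math.sqrt(sum(p * p for p in previous)) * math.sqrt(
        sum(c * c for c in current)
    )
    return numerator / denominator if denominator > 0.0 else 0.0


same = normalized_correlation([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
disjoint = normalized_correlation([1.0, 0.0], [0.0, 1.0])
```

Identical spectra give a value close to 1, while spectra with no overlapping energy give 0.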
class yaafefeatures.TemporalShapeStatistics
Compute shape statistics of signal frames.
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
TemporalShapeStatistics blockSize=1024 stepSize=512
class yaafefeatures.ZCR
Compute the zero-crossing rate in frames. See [SS1997].
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
ZCR blockSize=1024 stepSize=512
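A zero-crossing rate can be sketched by counting sign changes between neighboring samples. Treating zero as positive, dividing by the number of sample pairs, and the function name `zero_crossing_rate` are assumptions for illustration; Yaafe's normalization is not specified in this section.

```python
def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(frame) - 1)


zcr = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])
```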