Yaafe core features
Yaafe core audio features.
Available features
AmplitudeModulation
-
class yaafefeatures.AmplitudeModulation
Tremolo and grain description, according to [SE2005] and [AE2001].
- AmplitudeModulation uses Envelope to describe tremolo and grain. The analyzed frequency ranges are:
- Tremolo : 4 - 8 Hz
- Grain : 10 - 40 Hz
- For each of these ranges, it computes :
- Frequency of maximum energy in range
- Difference of the energy of this frequency and the mean energy over all frequencies
- Difference of the energy of this frequency and the mean energy in range
- Product of the first two values.
[AE2001] | A. Eronen, Automatic musical instrument recognition. Master’s Thesis, Tampere University of Technology, 2001. |
- Parameters:
- EnDecim (default=200): Decimation factor to compute envelope
- blockSize (default=32768): output frames size
- stepSize (default=16384): step between consecutive frames
Declaration example:
AmplitudeModulation EnDecim=200 blockSize=32768 stepSize=16384
AutoCorrelation
-
class yaafefeatures.AutoCorrelation
Compute the autocorrelation coefficients ac of each frame.
- Parameters:
- ACNbCoeffs (default=49): Number of autocorrelation coefficients to keep
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
AutoCorrelation ACNbCoeffs=49 blockSize=1024 stepSize=512
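As a rough illustration of this feature, the sketch below computes the first coefficients of one frame. It assumes the common unnormalized definition ac(k) = sum over i of x(i) * x(i + k); Yaafe's exact normalization and indexing may differ, and the function name autocorrelation is hypothetical.

```python
def autocorrelation(frame, nb_coeffs=49):
    """Return ac(0) .. ac(nb_coeffs - 1) for one frame.

    Assumed definition (not confirmed by the Yaafe documentation):
    ac(k) = sum_i frame[i] * frame[i + k]
    """
    n = len(frame)
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k))
        for k in range(min(nb_coeffs, n))
    ]
```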
ComplexDomainOnsetDetection
-
class yaafefeatures.ComplexDomainOnsetDetection
Compute onset detection using a complex domain spectral flux method [CD2003].
[CD2003] | C.Duxbury et al., Complex domain onset detection for musical signals, Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, September 8-11, 2003 |
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
ComplexDomainOnsetDetection FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
Energy
-
class yaafefeatures.Energy
Compute energy as the root mean square of an audio frame.
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
Energy blockSize=1024 stepSize=512
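A minimal sketch of the root-mean-square computation described above, applied to a single frame. The function name rms_energy is hypothetical and is not part of Yaafe.

```python
import math


def rms_energy(frame):
    """Root mean square of the samples in one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))
```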
Envelope
-
class yaafefeatures.Envelope
Extract the amplitude envelope using the Hilbert transform, low-pass filtering and decimation.
- Parameters:
- EnDecim (default=200): Decimation factor to compute envelope
- blockSize (default=32768): output frames size
- stepSize (default=16384): step between consecutive frames
Declaration example:
Envelope EnDecim=200 blockSize=32768 stepSize=16384
EnvelopeShapeStatistics
-
class yaafefeatures.EnvelopeShapeStatistics
Centroid, spread, skewness and kurtosis of each frame’s amplitude envelope. For more details about moments, see Shape Statistics.
- Parameters:
- EnDecim (default=200): Decimation factor to compute envelope
- blockSize (default=32768): output frames size
- stepSize (default=16384): step between consecutive frames
Declaration example:
EnvelopeShapeStatistics EnDecim=200 blockSize=32768 stepSize=16384
Frames
-
class yaafefeatures.Frames
Segment input signal into frames.
The first frame has zeros in its left half so that it is centered on time 0 s; consecutive frames are then equally spaced.
Consequently, frame i (starting from 0) is centered on sample i * stepSize.
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
Frames blockSize=1024 stepSize=512
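The framing rule above (frame i centered on sample i * stepSize, with zeros to the left of the signal start) can be sketched as follows. How Yaafe handles the end of the signal is not stated here; this sketch assumes zero-padding on the right and stops once the frame center passes the last sample. The function name make_frames is hypothetical.

```python
def make_frames(signal, block_size=1024, step_size=512):
    """Cut `signal` into frames so that frame i is centered on sample i * step_size."""
    half = block_size // 2
    frames = []
    center = 0
    while center < len(signal):
        start = center - half  # negative for the first frame(s)
        frames.append([
            signal[j] if 0 <= j < len(signal) else 0.0  # zero outside the signal
            for j in range(start, start + block_size)
        ])
        center += step_size
    return frames
```

With block_size=4 and step_size=2, the first frame of a signal starting 1, 2, ... is [0, 0, 1, 2], so sample 0 sits at the middle of the frame.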
LPC
-
class yaafefeatures.LPC
Compute the Linear Predictor Coefficients (LPC) of a signal frame, using autocorrelation and the Levinson-Durbin algorithm. See [JM1975].
[JM1975] | Makhoul J., Linear Prediction: A Tutorial Review, Proc. IEEE, Vol. 63, pp. 561-580, 1975. |
- Parameters:
- LPCNbCoeffs (default=2): Number of Linear Predictor Coefficients to compute
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
LPC LPCNbCoeffs=2 blockSize=1024 stepSize=512
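The autocorrelation plus Levinson-Durbin approach named above can be sketched as below. This is a textbook version of the recursion, not Yaafe's implementation: the sign convention (prediction-error filter with a[0] = 1), the unnormalized autocorrelation and the function names are assumptions.

```python
def levinson_durbin(r, order):
    """Solve the normal equations from autocorrelation values r[0..order].

    Returns (a, err): a is the prediction-error filter with a[0] == 1,
    err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient for step m
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def lpc(frame, nb_coeffs=2):
    """Prediction-error filter of one frame (len(frame) must exceed nb_coeffs)."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(nb_coeffs + 1)]
    if r[0] == 0.0:  # silent frame: nothing to predict
        return [1.0] + [0.0] * nb_coeffs
    return levinson_durbin(r, nb_coeffs)[0]
```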
LSF
-
class yaafefeatures.LSF
Compute the Line Spectral Frequency (LSF) coefficients of a signal frame. The algorithm is adapted from [TB2006] and [SH1976].
[TB2006] | Tom Backstrom, Carlo Magi, Properties of line spectrum pair polynomials–A review, Signal Processing, Volume 86, Issue 11, Special Section: Distributed Source Coding, November 2006, Pages 3286-3298, ISSN 0165-1684, DOI: 10.1016/j.sigpro.2006.01.010. |
[SH1976] | Schussler, H., A stability theorem for discrete systems, Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 24, no. 1, pp. 87-89, Feb 1976 |
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
LSF blockSize=1024 stepSize=512
Loudness
-
class yaafefeatures.Loudness
The loudness coefficients are the energy in each Bark band, normalized by the overall sum. See [GP2004] and [MG1997] for more details.
[MG1997] | Moore, Glasberg, et al., A Model for the Prediction of Thresholds, Loudness, and Partial Loudness, J. Audio Eng. Soc. 45: 224-240, 1997. |
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- LMode (default=Relative): “Specific” computes loudness without normalization, “Relative” normalizes the bands so that they sum to 1, “Total” returns only the sum of the loudness over all bands.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
Loudness FFTLength=0 FFTWindow=Hanning LMode=Relative blockSize=1024 stepSize=512
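To illustrate the three LMode values, the sketch below applies each mode to a list of per-band loudness values that is assumed to be already computed. The function name apply_lmode and the handling of an all-zero input are assumptions, not Yaafe behavior.

```python
def apply_lmode(specific, mode="Relative"):
    """Post-process per-band loudness values according to LMode."""
    total = sum(specific)
    if mode == "Specific":  # no normalization
        return list(specific)
    if mode == "Relative":  # bands sum to 1
        if total == 0:
            return [0.0] * len(specific)  # assumed convention for silence
        return [v / total for v in specific]
    if mode == "Total":  # single value: sum over all bands
        return [total]
    raise ValueError("unknown LMode: %r" % mode)
```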
MFCC
-
class yaafefeatures.MFCC
Compute the Mel-frequency cepstral coefficients (MFCC) [DM1980].
Mel filter bank is built as 40 log-spaced filters according to the following mel-scale:
Each filter is a triangular filter with height .
Then MFCCs are computed as follows, using DCT II:
[DM1980] | S.B. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, 28:357-366, 1980. |
- Parameters:
- CepsIgnoreFirstCoeff (default=1): 0 keeps the first cepstral coefficient, 1 ignores it
- CepsNbCoeffs (default=13): Number of cepstral coefficients to keep.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
- MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
- MelNbFilters (default=40): Number of mel filters
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
MFCC CepsIgnoreFirstCoeff=1 CepsNbCoeffs=13 FFTWindow=Hanning MelMaxFreq=6854.0 MelMinFreq=130.0 MelNbFilters=40 blockSize=1024 stepSize=512
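For orientation, the sketch below shows two building blocks of an MFCC computation: filter center frequencies equally spaced on a mel scale, and an unnormalized DCT II. The mel formula used here, mel = 1127 * ln(1 + f / 700), is a widely used variant and is an assumption about what Yaafe applies; the DCT scaling and all function names are likewise hypothetical.

```python
import math


def hz_to_mel(freq):
    # Assumed mel-scale variant; the formula Yaafe uses may differ.
    return 1127.0 * math.log(1.0 + freq / 700.0)


def mel_to_hz(mel):
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_center_frequencies(nb_filters=40, min_freq=130.0, max_freq=6854.0):
    """Center frequencies (Hz) equally spaced on the mel scale."""
    lo, hi = hz_to_mel(min_freq), hz_to_mel(max_freq)
    step = (hi - lo) / (nb_filters + 1)
    return [mel_to_hz(lo + (i + 1) * step) for i in range(nb_filters)]


def dct_ii(values):
    """Unnormalized DCT II: X[k] = sum_i x[i] * cos(pi * k * (i + 0.5) / N)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]
```

In a full pipeline, dct_ii would be applied to the logarithm of the mel-band energies, and the result would be trimmed according to CepsNbCoeffs and CepsIgnoreFirstCoeff.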
MagnitudeSpectrum
-
class yaafefeatures.MagnitudeSpectrum
Compute frame’s magnitude spectrum, using an analysis window (Hanning or Hamming), or not.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
MagnitudeSpectrum FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
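The interaction of FFTWindow and FFTLength can be sketched with a naive DFT. Several details are assumptions rather than documented Yaafe behavior: the window is applied to the original frame before padding or truncation, the windows use the symmetric definition, and only bins 0 to N/2 are returned. The function name magnitude_spectrum is hypothetical, and the O(N^2) transform is for illustration only.

```python
import cmath
import math


def magnitude_spectrum(frame, fft_length=0, window="Hanning"):
    """Magnitude of bins 0..N/2 of the windowed, padded or truncated frame."""
    n = len(frame)  # n >= 2 is required for the Hanning and Hamming windows
    if window == "Hanning":
        weights = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    elif window == "Hamming":
        weights = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    else:  # "None": rectangular window
        weights = [1.0] * n
    x = [s * w for s, w in zip(frame, weights)]
    size = fft_length if fft_length > 0 else n
    x = (x + [0.0] * size)[:size]  # zero-pad or truncate to `size`
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / size) for t in range(size)))
        for k in range(size // 2 + 1)
    ]
```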
MelSpectrum
-
class yaafefeatures.MelSpectrum
Compute the Mel-frequency spectrum [DM1980].
Mel filter bank is built as 40 log-spaced filters according to the following mel-scale:
Each filter is a triangular filter with height .
- Parameters:
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- MelMaxFreq (default=6854.0): Maximum frequency of the mel filter bank
- MelMinFreq (default=130.0): Minimum frequency of the mel filter bank
- MelNbFilters (default=40): Number of mel filters
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
MelSpectrum FFTWindow=Hanning MelMaxFreq=6854.0 MelMinFreq=130.0 MelNbFilters=40 blockSize=1024 stepSize=512
OBSI
-
class yaafefeatures.OBSI
Compute octave band signal intensity (OBSI) using a triangular octave filter bank ([SE2005]).
[SE2005] | S. Essid, Classification automatique des signaux audio-frequences: reconnaissance des instruments de musique. PhD, UPMC, 2005. |
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
OBSI FFTLength=0 FFTWindow=Hanning OBSIMinFreq=27.5 blockSize=1024 stepSize=512
OBSIR
-
class yaafefeatures.OBSIR
Compute the log of the OBSI ratio between consecutive octaves.
- Parameters:
- DiffNbCoeffs (default=0): Maximum number of coefficients to keep. 0 keeps N-1 values (with N the input feature size)
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- OBSIMinFreq (default=27.5): Minimum frequency for OBSI filter.
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
OBSIR DiffNbCoeffs=0 FFTLength=0 FFTWindow=Hanning OBSIMinFreq=27.5 blockSize=1024 stepSize=512
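Given OBSI values that are assumed to be already computed and strictly positive, the log-ratio between consecutive octaves could be sketched as below. The direction of the ratio (higher octave over lower octave) and the function name obsir are assumptions.

```python
import math


def obsir(obsi, nb_coeffs=0):
    """Log-ratio of consecutive OBSI values; nb_coeffs == 0 keeps all N - 1 values."""
    ratios = [math.log(obsi[i + 1] / obsi[i]) for i in range(len(obsi) - 1)]
    return ratios if nb_coeffs == 0 else ratios[:nb_coeffs]
```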
PerceptualSharpness
-
class yaafefeatures.PerceptualSharpness
Compute the sharpness of Loudness coefficients, according to [GP2004].
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
PerceptualSharpness FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
PerceptualSpread
-
class yaafefeatures.PerceptualSpread
Compute the spread of Loudness coefficients, according to [GP2004].
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
PerceptualSpread FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralCrestFactorPerBand
-
class yaafefeatures.SpectralCrestFactorPerBand
Compute spectral crest factor per log-spaced band of 1/4 octave.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralCrestFactorPerBand FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralDecrease
-
class yaafefeatures.SpectralDecrease
Compute spectral decrease according to [GP2004].
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralDecrease FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralFlatness
-
class yaafefeatures.SpectralFlatness
Compute global spectral flatness using the ratio between geometric and arithmetic mean.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralFlatness FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
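The ratio of geometric to arithmetic mean mentioned above can be sketched for one spectrum as follows. Whether Yaafe applies it to magnitudes or to squared magnitudes, and how zero-valued bins are treated, is not stated here; returning 0 when any bin is zero is an assumption, as is the function name.

```python
import math


def spectral_flatness(magnitudes):
    """Geometric mean divided by arithmetic mean of the spectrum values."""
    n = len(magnitudes)
    if min(magnitudes) <= 0.0:
        return 0.0  # geometric mean is zero as soon as one bin is zero
    geometric = math.exp(sum(math.log(m) for m in magnitudes) / n)
    arithmetic = sum(magnitudes) / n
    return geometric / arithmetic
```

A perfectly flat spectrum gives 1, while a spectrum dominated by a few peaks gives a value close to 0.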
SpectralFlatnessPerBand
-
class yaafefeatures.SpectralFlatnessPerBand
Compute spectral flatness per log-spaced band of 1/4 octave, as proposed in the MPEG-7 standard.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralFlatnessPerBand FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
SpectralFlux
-
class yaafefeatures.SpectralFlux
Compute the flux of the spectrum between consecutive frames.
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- FluxSupport (default=All): Support of the flux computation. If ‘All’, use all bins (default); if ‘Increase’, use only the bins whose value is increasing
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralFlux FFTLength=0 FFTWindow=Hanning FluxSupport=All blockSize=1024 stepSize=512
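The effect of FluxSupport can be illustrated with the sketch below, which compares two consecutive magnitude spectra. It assumes the flux is the unnormalized sum of squared bin differences; the exact formula and any normalization used by Yaafe are not given here, and the function name is hypothetical.

```python
def spectral_flux(previous, current, support="All"):
    """Sum of squared bin differences between two consecutive spectra."""
    diffs = [c - p for p, c in zip(previous, current)]
    if support == "Increase":
        diffs = [d for d in diffs if d > 0]  # keep only bins that are increasing
    return sum(d * d for d in diffs)
```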
SpectralRolloff
-
class yaafefeatures.SpectralRolloff
Spectral roll-off is the frequency below which 99% of the energy is contained. See [SS1997].
[SS1997] | E. Scheirer, M. Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. IEEE International Conference on Acoustics, Speech and Signal Processing, p. 1331-1334, 1997. |
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralRolloff FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
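A minimal sketch of the 99% roll-off rule, assuming the input holds the magnitudes of bins 0 to N/2 (at least two bins) and that energy means squared magnitude. The function name, the sample_rate argument and the rule of returning the first bin that reaches the threshold are assumptions.

```python
def spectral_rolloff(magnitudes, sample_rate, fraction=0.99):
    """Frequency (Hz) of the first bin at which cumulative energy reaches `fraction`."""
    energies = [m * m for m in magnitudes]
    threshold = fraction * sum(energies)
    fft_size = 2 * (len(magnitudes) - 1)  # bins 0..N/2 come from an N-point FFT
    cumulative = 0.0
    for k, energy in enumerate(energies):
        cumulative += energy
        if cumulative >= threshold:
            return k * sample_rate / fft_size
    return sample_rate / 2.0  # fallback: Nyquist frequency
```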
SpectralShapeStatistics
-
class yaafefeatures.SpectralShapeStatistics
Compute shape statistics of MagnitudeSpectrum (see [GR2004]).
Shape statistics are centroid, spread, skewness and kurtosis, defined as follows:
[GR2004] | O.Gillet, G.Richard, Automatic transcription of drum loops. in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Montreal, Canada, 2004. |
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralShapeStatistics FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
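As an illustration, the four shape statistics can be computed from amplitude-weighted moments as sketched below. These are the standard textbook definitions and are an assumption about Yaafe's exact formulas; in particular, the kurtosis here does not subtract 3, and bin indices stand in for frequencies when no positions are given. The function name is hypothetical.

```python
import math


def shape_statistics(values, positions=None):
    """Return (centroid, spread, skewness, kurtosis) of `values` over `positions`."""
    if positions is None:
        positions = range(len(values))  # bin indices used as positions
    pairs = list(zip(positions, values))
    total = sum(v for _, v in pairs)
    if total == 0:
        return 0.0, 0.0, 0.0, 0.0  # assumed convention for an all-zero input
    centroid = sum(p * v for p, v in pairs) / total

    def central_moment(order):
        return sum((p - centroid) ** order * v for p, v in pairs) / total

    spread = math.sqrt(central_moment(2))
    if spread == 0:
        return centroid, 0.0, 0.0, 0.0
    skewness = central_moment(3) / spread ** 3
    kurtosis = central_moment(4) / spread ** 4  # not the excess kurtosis
    return centroid, spread, skewness, kurtosis
```

The same function applies to a signal frame or an amplitude envelope by passing its samples as values.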
SpectralSlope
-
class yaafefeatures.SpectralSlope
SpectralSlope is computed by linear regression of the spectral amplitude (see [GP2004]).
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralSlope FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
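The linear regression mentioned above can be sketched as an ordinary least-squares fit of amplitude against frequency. Any additional normalization applied in [GP2004] or in Yaafe is not reproduced; the function name and the use of bin indices as default frequencies are assumptions. At least two distinct frequencies are required.

```python
def spectral_slope(magnitudes, frequencies=None):
    """Least-squares slope of amplitude as a function of frequency."""
    n = len(magnitudes)
    if frequencies is None:
        frequencies = list(range(n))  # bin indices used as frequencies
    mean_f = sum(frequencies) / n
    mean_a = sum(magnitudes) / n
    num = sum((f - mean_f) * (a - mean_a) for f, a in zip(frequencies, magnitudes))
    den = sum((f - mean_f) ** 2 for f in frequencies)
    return num / den
```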
SpectralVariation
-
class yaafefeatures.SpectralVariation
SpectralVariation is the normalized correlation of the spectrum between consecutive frames (see [GP2004]).
[GP2004] | Geoffroy Peeters, A large set of audio features for sound description (similarity and classification) in the CUIDADO project, 2004. |
- Parameters:
- FFTLength (default=0): Frame length on which to perform the FFT. The original frame is zero-padded or truncated to reach this size. If 0, the original frame length is used.
- FFTWindow (default=Hanning): Weighting window to apply before fft. Hanning|Hamming|None
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
SpectralVariation FFTLength=0 FFTWindow=Hanning blockSize=1024 stepSize=512
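A sketch of a normalized-correlation measure between two consecutive magnitude spectra. It assumes the convention of returning 1 minus the normalized cross-correlation, so that identical spectra give 0; whether Yaafe returns the correlation itself or its complement is not stated here. The function name and the value returned for a silent frame are also assumptions.

```python
import math


def spectral_variation(previous, current):
    """1 - normalized correlation between two consecutive spectra."""
    dot = sum(p * c for p, c in zip(previous, current))
    norm = math.sqrt(sum(p * p for p in previous)) * math.sqrt(sum(c * c for c in current))
    if norm == 0.0:
        return 0.0  # assumed convention when one spectrum is all zeros
    return 1.0 - dot / norm
```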
TemporalShapeStatistics
-
class yaafefeatures.TemporalShapeStatistics
Compute shape statistics of signal frames.
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
TemporalShapeStatistics blockSize=1024 stepSize=512
ZCR
-
class yaafefeatures.ZCR
Compute the zero-crossing rate in frames. See [SS1997].
- Parameters:
- blockSize (default=1024): output frames size
- stepSize (default=512): step between consecutive frames
Declaration example:
ZCR blockSize=1024 stepSize=512
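A minimal sketch of a zero-crossing rate for one frame of at least two samples. It counts sign changes between adjacent samples and divides by the number of adjacent pairs; the treatment of exact zeros (counted as non-negative) and the normalization are assumptions, as is the function name.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)
```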