Signal Processing in Telecommunication Systems

6. Conclusions and future work


This article has presented a new codebook structure for Split Vector Quantizers of Line Spectral Frequencies. We have also introduced a new fast codebook search algorithm. Our method ensures a large reduction in computational complexity with minimal deterioration of the processed speech signal quality.

Future work will include tests of a wider range of internal codebook splitting and class selection methods, which may lead to further performance enhancement. Codebook creation with a weighted distance measure may improve the spectral distortion of the speech signal and ensure transparent coding quality at less than 24 bits per frame with minimal computational complexity.

7. Acknowledgements

This work was supported by Bialystok Technical University under the grant W/WI/2/05.




New method for instantaneous amplitude and frequency estimation of pitch harmonics in speech based on harmonic transform


Zubrycki P., Petrovsky A.

Department of Real-Time Systems, Bialystok Technical University
Wiejska 45A street, 15-351 Bialystok, Poland
phone: + (48 85) 746–90–50, fax: + (48 85) 746–90–57, [email protected], [email protected]

Abstract. This paper presents a new method for the estimation of the instantaneous amplitudes and frequencies of pitch harmonics. The method is based on the Harmonic Transform, which acts as a set of time-varying filters whose centre frequencies are synchronized with the time-varying fundamental frequency of speech. The Hilbert transform is used as an estimator of the instantaneous amplitude of each harmonic. The method is then applied to the decomposition of the speech signal into voiced and unvoiced components.

Introduction


The speech signal can be viewed as a mixed-source signal with both periodic and aperiodic excitation. In the sinusoidal and noise speech models this mixed-source speech signal is generally modelled as [1]:

s(n) = \sum_{k=1}^{K} A_k(n) \cos(\varphi_k(n)) + r(n), (1)

where A_k is the instantaneous amplitude of the k-th harmonic, K is the number of harmonics present in the speech signal, r(n) is the noise component and \varphi_k is the instantaneous phase of the k-th harmonic, defined as [8]:

\varphi_k(n) = \frac{2\pi}{F_s} \sum_{m=0}^{n} f_k(m) + \varphi_k(0),

where f_k is the instantaneous frequency of the k-th harmonic, F_s is the sampling frequency and \varphi_k(0) is the initial phase of the k-th harmonic. Determination of the model parameters is a difficult task. Usually the analysis of voiced speech is performed on a frame basis and the speech signal is assumed to be stationary within a single frame [2-5], i.e. the pitch frequency and the amplitudes of its harmonics are assumed constant. The DFT is often used for determination of the amplitudes of the harmonics [2,5]. Amplitudes of pitch harmonics determined with STFT analysis can be interpreted as average values within the analysis window. In fact both the pitch frequency and the harmonic amplitudes are time-varying, and thus the methods based on the DFT are prone to artefacts [7].

In this paper we propose a new method for estimation of the model parameters defined in (1). We use the Harmonic Transform (HT) [6,8] as an analysis tool whose kernel is synchronized with changes of the pitch frequency and is thus able to transform the signal directly to the harmonic domain [8].
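As a concrete illustration of model (1), the following is a minimal Python sketch (NumPy assumed; all parameter values are illustrative and not taken from the paper) that synthesizes a harmonic-plus-noise frame with a linearly varying pitch:

```python
import numpy as np

def synthesize_harmonic_speech(f0_start, f0_end, amps, fs=8000, n_samples=256,
                               noise_level=0.01, seed=0):
    """Synthesize s(n) as in model (1): a sum of K pitch harmonics plus a noise residual r(n)."""
    rng = np.random.default_rng(seed)
    f0 = np.linspace(f0_start, f0_end, n_samples)      # linearly varying pitch track
    s = np.zeros(n_samples)
    for k, a_k in enumerate(amps, start=1):
        f_k = k * f0                                    # instantaneous frequency of the k-th harmonic
        phi_k = 2.0 * np.pi * np.cumsum(f_k) / fs       # instantaneous phase: running sum of f_k / Fs
        s += a_k * np.cos(phi_k)
    r = noise_level * rng.standard_normal(n_samples)    # aperiodic (noise) component
    return s + r

# Example frame: three harmonics, pitch gliding from 100 Hz to 120 Hz
frame = synthesize_harmonic_speech(100.0, 120.0, amps=[1.0, 0.6, 0.3])
```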

Harmonic Transform

The Harmonic Transform is given as:

X_H(f) = \int_{-\infty}^{\infty} x(t)\, \varphi'(t)\, e^{-j 2\pi f \varphi(t)}\, dt,

where \varphi(t) is the unit phase function, which is the phase of the fundamental divided by its instantaneous frequency [6], and \varphi'(t) is the first-order derivative of \varphi(t). The Inverse Harmonic Transform is defined as:

x(t) = \int_{-\infty}^{\infty} X_H(f)\, e^{j 2\pi f \varphi(t)}\, df.

We assume that the pitch frequency and the amplitudes of its harmonics change linearly within an analysis window. In the case of a linear change of the pitch frequency, the discrete version of the HT [8] is given as:

HT(a,k) = \sum_{n=0}^{N-1} x(n) \left(1 + \frac{an}{N}\right) e^{-j \frac{2\pi k}{N} \varphi(n)}, \qquad \varphi(n) = n \left(1 + \frac{an}{2N}\right). (2)

The inverse transform is defined as:

x(n) = \frac{1}{N} \sum_{k=0}^{N-1} HT(a,k)\, e^{j \frac{2\pi k}{N} \varphi(n)},

where a is the pitch frequency change rate and N is the frame length.
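The following NumPy sketch implements the discrete transform pair under the linear-pitch-change assumption, using the reconstruction of (2) given above; the kernel phase n(1 + an/(2N)) and weighting 1 + an/N are part of that reconstruction, so treat this as an assumed form rather than the authors' reference implementation:

```python
import numpy as np

def harmonic_transform(x, a):
    """Discrete Harmonic Transform of a frame x for pitch change rate a (sketch of (2))."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    k = np.arange(N)
    phi = n * (1.0 + a * n / (2.0 * N))           # unit phase function for a linear pitch change
    weight = 1.0 + a * n / N                      # phi'(n), first-order derivative of phi
    kernel = np.exp(-2j * np.pi * np.outer(k, phi) / N)
    return kernel @ (x * weight)

def inverse_harmonic_transform(X, a):
    """Inverse transform; exact for a = 0 (the DFT pair), approximate otherwise."""
    N = len(X)
    n = np.arange(N)
    k = np.arange(N)
    phi = n * (1.0 + a * n / (2.0 * N))
    kernel = np.exp(2j * np.pi * np.outer(phi, k) / N)
    return np.real(kernel @ X) / N
```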

Estimation of Harmonics Instantaneous Amplitudes

The proposed algorithm starts by searching for the fundamental frequency change by examining the Harmonic Transform spectrum for different values of the parameter a in (2). The optimal value of a is defined as the value which minimises the Spectral Flatness Measure:

SFM(a) = \frac{\left(\prod_{k=0}^{N-1} |HT(a,k)|\right)^{1/N}}{\frac{1}{N} \sum_{k=0}^{N-1} |HT(a,k)|},

where HT(a,k) is the harmonic spectrum of a given speech segment for a given a and |\cdot| denotes the absolute value. The minimal spectral flatness value indicates the highest spectral concentration, which means an optimal fit between the signal and the Harmonic Transform kernel. This also means that the optimal speech fundamental frequency change is found for the given speech segment [8].
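A possible sketch of this search step, reusing harmonic_transform from the sketch above; the candidate grid for a and the use of magnitude rather than power spectra in the SFM are assumptions:

```python
import numpy as np

def spectral_flatness(magnitudes, eps=1e-12):
    """Spectral Flatness Measure: geometric mean over arithmetic mean of the magnitudes."""
    m = np.asarray(magnitudes, dtype=float) + eps
    return np.exp(np.mean(np.log(m))) / np.mean(m)

def find_optimal_change_rate(frame, candidates=np.linspace(-0.3, 0.3, 61)):
    """Return the pitch change rate a whose HT spectrum is least flat (most concentrated)."""
    flatness = [spectral_flatness(np.abs(harmonic_transform(frame, a))) for a in candidates]
    return float(candidates[int(np.argmin(flatness))])
```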

Once the fundamental frequency change rate is found, the pitch frequency is estimated. The first step of this algorithm is the determination of the pitch frequency harmonic candidates f_i by peak picking of the Harmonic Transform spectra, based on the algorithm proposed in [7,8]. Pitch harmonic candidates with a central frequency located between 50 and 450 Hz are considered as pitch candidates. For each pitch candidate the algorithm tries to find its harmonics; if three of the first four harmonics cannot be found, the candidate is discarded. In order to prevent pitch doubling or halving, a confidence factor is computed for each candidate on the basis of the ratio of the energy carried by the harmonic signal for the particular pitch to the energy carried by the whole signal. The pitch candidate with the greatest confidence factor is selected as the pitch for the given frame. Finally, the pitch value is refined using the following formula:

F_0 = \frac{1}{M} \sum_{n=1}^{M} \frac{f_n}{n},

where f_n is the frequency of the n-th harmonic candidate and M is the number of detected harmonics. The described procedure estimates the central pitch frequency for one frame. Further prevention of pitch errors is provided by a tracking buffer which stores the fundamental frequency estimates from several consecutive frames.
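A small sketch of the final refinement step as reconstructed above, i.e. a simple average of each harmonic candidate frequency divided by its harmonic number; the exact weighting used by the authors is not given here, so this averaging is an assumption:

```python
import numpy as np

def refine_pitch(harmonic_freqs):
    """Refine F0 from the detected harmonic candidates.

    harmonic_freqs maps the harmonic number n (1, 2, ...) to its measured
    frequency f_n in Hz; harmonics that were not found are simply absent.
    """
    return float(np.mean([f_n / n for n, f_n in harmonic_freqs.items()]))

# Example: harmonics 1, 2 and 4 were detected around a 140 Hz pitch
f0 = refine_pitch({1: 140.3, 2: 281.1, 4: 561.9})
```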



Fig. 1. Block diagram of the proposed algorithm.

After the optimal kernel and the pitch frequency are found, the signal is divided into a number of subbands. The centre of each subband corresponds to an integer multiple of the pitch frequency, so that each subband signal contains only one harmonic. To obtain the signal in each band, all coefficients of the Harmonic Transform are set to zero except those belonging to the particular subband, and then the Inverse Harmonic Transform is performed. For each subband signal the instantaneous amplitude is estimated with the Hilbert Transform. Information about the amplitude track is passed to the generation block, where the individual harmonics are synthesized. The frequencies of the harmonics and the frequency change rates are taken from the previous stages of the algorithm. In order to estimate all parameters of the model defined in (1), the signals of all harmonics are subtracted from the original signal and the residual is the estimate of the noise component.
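A possible sketch of the per-harmonic isolation and envelope step, reusing the transform sketches above together with SciPy's Hilbert transform; the half-pitch subband width and the bin-to-frequency mapping are assumptions:

```python
import numpy as np
from scipy.signal import hilbert

def harmonic_envelope(frame, a, f0, k, fs):
    """Isolate the k-th pitch harmonic and estimate its instantaneous amplitude.

    Keeps only the HT coefficients within +/- f0/2 of k*f0 (and the mirrored
    negative-frequency band, so the inverse stays real), inverts the transform,
    and takes the magnitude of the analytic signal as the amplitude track.
    """
    N = len(frame)
    X = harmonic_transform(frame, a)
    bin_hz = fs / N                                   # coarse mapping of transform bins to Hz
    lo = max(int(round((k * f0 - f0 / 2) / bin_hz)), 1)
    hi = min(int(round((k * f0 + f0 / 2) / bin_hz)), N // 2)
    mask = np.zeros(N)
    mask[lo:hi] = 1.0                                 # the harmonic's subband
    mask[N - hi + 1:N - lo + 1] = 1.0                 # its mirror (negative frequencies)
    subband = inverse_harmonic_transform(X * mask, a)
    return np.abs(hilbert(subband))                   # instantaneous amplitude estimate
```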

Experimental results


We have tested the algorithm on real speech signals. Example spectrograms are shown in Figure 2. As can be seen in the figure, the algorithm is able to separate the input speech signal into two components, a harmonic one and a noise one. In the noise component no harmonic content is noticeable.

In order to evaluate the ability of the proposed method to estimate the instantaneous amplitudes and frequencies of polyharmonic signals, we have tested it on synthetic signals. The synthetic signals were set up as follows. We used different pitch frequencies in the range from 100 to 350 Hz and different pitch frequency change rates with average values from 0 up to 20% within a frame of 32 ms. We tested the algorithm for constant harmonic amplitudes and for time-varying amplitudes with average change rates of 0, 40% and 100%. We assume a linear change of both the fundamental frequency and the harmonic amplitudes within the analysis frame. After estimation of the harmonic component we measured an SNR coefficient defined as the energy ratio of the original signal to the difference between the original signal and the estimated one. We also compared our algorithm using the Harmonic Transform (HT) and the Fourier Transform (FT). The results of the experiments are given in dB in Table 1. For constant amplitudes and constant pitch frequency there is no difference between the algorithms using the Harmonic Transform and the Fourier Transform. As the pitch change rate increases, the SNR of the signal estimated with the Harmonic Transform is far better than that of the signal estimated with the Fourier Transform. For time-varying amplitudes of harmonics we tested two versions of the algorithm. In the first version we used the instantaneous amplitudes estimated as described in the previous section. In the second version the amplitude of each harmonic was taken directly from the signal spectrum, which is an average value of the harmonic amplitude within the analysis frame.
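A short sketch of the SNR figure defined above (assuming the original and estimated signals are aligned sample by sample):

```python
import numpy as np

def snr_db(original, estimate):
    """SNR in dB: energy of the original over energy of the estimation error."""
    original = np.asarray(original, dtype=float)
    error = original - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(error ** 2))
```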





Fig. 2. Example spectrograms of speech decomposition: original speech signal (top), estimated harmonic component (bottom left) and estimated noise component (bottom right).
Table 1 – Results of experiments: SNR of the estimated signal (dB). FT – Fourier Transform, HT – Harmonic Transform; the sub-columns 0 / 40% / 100% give the average amplitude change rate within the analysis frame.

Pitch freq.   Constant amplitudes   Time-varying amplitudes
change rate                         Estimated instantaneous amplitudes              Average amplitudes
              FT       HT           FT: 0    40%    100%    HT: 0    40%    100%    FT: 0    40%    100%    HT: 0    40%    100%
0             143      143.5        143.3    84.7   68.2    142.8    85.4   67.7    164.8    36.6   21.3    164.5    36.6   21
5%            43       95.9         40.6     37.9   40.5    89.9     80     66      41.6     29.9   19.8    89.4     36.6   21
10%           29.2     91.6         32       31.4   28      92.2     79.3   67.5    37.6     25.5   17.9    93.8     36.6   21
15%           24.5     92.1         23.9     24.1   22.9    91.4     80.2   66.9    25.1     20.4   14.6    91.1     36.6   21
20%           24.3     93.5         20       22.9   17.2    93.6     79.5   65.8    21.1     17.6   13.5    94.5     36.6   21
