In speech synthesis [28,39], digital filters are often used to simulate formant filtering by the vocal tract. It is well known [24] that the different vowel sounds of speech can be simulated by passing a "buzz source" through only two or three formant filters. Consistent with this, speech remains fully intelligible over the telephone bandwidth (nominally only 200-3200 Hz).
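As an illustration of a "buzz source", the sketch below builds a bandlimited impulse train: a sum of equal-amplitude cosine harmonics of a pitch f0, stopping below half the sampling rate, then normalized to a peak of 1. The pitch (200 Hz), sampling rate (8192 Hz), and length (256 samples) mirror the values in the matlab example of this section; the function name `buzz` is a hypothetical helper, and this is a minimal sketch rather than the only way to generate such a source.

```python
import math

def buzz(f0, fs, nsamps):
    """Bandlimited impulse train (hypothetical helper): equal-amplitude cosine
    harmonics of f0 below fs/2, scaled so the peak value (at n = 0) equals 1."""
    nharm = int((fs / 2) // f0)      # number of harmonics below Nyquist
    w0T = 2 * math.pi * f0 / fs      # fundamental frequency in radians per sample
    sig = [sum(math.cos(k * w0T * n) for k in range(1, nharm + 1))
           for n in range(nsamps)]
    peak = max(sig)                  # all harmonics add in phase at n = 0
    return [x / peak for x in sig]

sig = buzz(200, 8192, 256)
print(len(sig), sig[0])
```

Because 8192/200 = 40.96 is not an integer, the pulses after the first one land between samples and therefore peak slightly below 1.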
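A formant is a resonance in the voice spectrum. A single formant may thus be modeled using one biquad (second-order filter section) whose complex-conjugate pole pair has angle theta = 2*pi*F/fs and radius R = exp(-pi*BW/fs), where F is the formant center frequency, BW its bandwidth, and fs the sampling rate. For example, in the vowel [a] as in "father," the first three formant center frequencies have been measured near 700, 1220, and 2600 Hz, with half-power bandwidths of 130, 70, and 160 Hz [40].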
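A quick numerical sanity check of the pole-placement rule: the sketch below (Python, standard library only) forms the denominator 1 - 2R cos(theta) z^-1 + R^2 z^-2 for each formant of [a], then measures the peak frequency and half-power bandwidth of 1/A on a fine frequency grid. The helper names `formant_biquad`, `mag2`, and `measure` are hypothetical, and the rule R = exp(-pi*BW/fs) is an approximation, so the measured values are expected to be close to, not exactly equal to, the nominal ones.

```python
import cmath
import math

def formant_biquad(F, BW, fs):
    """Denominator [1, a1, a2] of a two-pole resonator (hypothetical helper)."""
    R = math.exp(-math.pi * BW / fs)   # pole radius from bandwidth
    theta = 2 * math.pi * F / fs       # pole angle from center frequency
    return [1.0, -2 * R * math.cos(theta), R * R]

def mag2(a, f, fs):
    """|1/A(e^{jw})|^2 at frequency f (Hz), with A(z) = a0 + a1 z^-1 + a2 z^-2."""
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1 evaluated on the unit circle
    return 1.0 / abs(a[0] + a[1] * z1 + a[2] * z1 * z1) ** 2

def measure(a, fs, df=0.25):
    """Peak frequency and half-power (-3 dB) bandwidth, measured on a grid."""
    freqs = [k * df for k in range(int(fs / 2 / df) + 1)]
    power = [mag2(a, f, fs) for f in freqs]
    ipk = max(range(len(power)), key=power.__getitem__)
    band = [f for f, p in zip(freqs, power) if p >= power[ipk] / 2]
    return freqs[ipk], band[-1] - band[0]

fs = 8192
results = []
for F, BW in zip([700, 1220, 2600], [130, 70, 160]):
    fpk, bw = measure(formant_biquad(F, BW, fs), fs)
    results.append((F, BW, fpk, bw))
    print(F, BW, round(fpk, 1), round(bw, 1))
```

The small peak offsets come mainly from the conjugate pole, whose response tilts each resonance slightly.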
In principle, the formant filter sections are in series, as can be shown by deriving the transfer function of an acoustic tube [48]. As a consequence, the vocal-tract transfer function is an all-pole filter (provided that the nasal tract is closed off or negligible). There is therefore no need to specify gains for the individual formant resonators: a center frequency and a bandwidth suffice to specify each formant, leaving only an overall scale factor unspecified in a cascade (series) formant filter bank.
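The cascade argument can be checked directly: multiplying the three all-pole biquad denominators gives one sixth-order denominator A(z), identical to the polynomial built from the six poles, so the cascade is g/A(z) with a single free gain g. The sketch below assumes the [a] formant data and fs = 8192 Hz used in this section; `polymul` and `formant_denominator` are hypothetical helpers, and normalizing g for unity gain at DC is just one possible choice, not something the source prescribes.

```python
import cmath
import math

def polymul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def formant_denominator(F, BW, fs):
    """All-pole biquad denominator [1, -2R cos(theta), R^2] (hypothetical helper)."""
    R = math.exp(-math.pi * BW / fs)
    theta = 2 * math.pi * F / fs
    return [1.0, -2 * R * math.cos(theta), R * R]

fs = 8192
F = [700, 1220, 2600]
BW = [130, 70, 160]

# Cascade: the overall denominator is the product of the section denominators.
A = [1.0]
for f, bw in zip(F, BW):
    A = polymul(A, formant_denominator(f, bw, fs))

# The same polynomial expanded directly from the six poles R*exp(+/- j*theta).
A_poles = [1.0 + 0j]
for f, bw in zip(F, BW):
    p = math.exp(-math.pi * bw / fs) * cmath.exp(2j * math.pi * f / fs)
    for q in (p, p.conjugate()):
        A_poles = polymul(A_poles, [1.0, -q])

# Only an overall gain g remains; here (one possible choice) unity gain at DC,
# since A evaluated at z = 1 is the sum of its coefficients.
g = sum(A)
print(len(A), g)
```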
Numerically, however, it makes more sense to implement disjoint resonances in parallel rather than in series. This is because when one formant filter is resonating, the others will be attenuating, so that to achieve a particular peak gain at resonance, the resonating filter must overcome all of the combined attenuations as well as apply its own gain. In fixed-point arithmetic, this can result in large quantization-noise gains, especially for the last resonator in the chain. For these reasons, our example will implement the formant sections in parallel. This means we must find biquad numerators such that, when the sections are added together, the overall transfer-function numerator is a constant. This will be accomplished using the partial fraction expansion (§6.8).
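Here is a sketch of the partial fraction expansion step, assuming distinct poles and a unit numerator (B = 1, as in this section's example). For H(z) = 1/prod_j(1 - p_j z^-1), the residue of pole p_k is r_k = 1/prod_{j!=k}(1 - p_j/p_k). Pairing each pole with its conjugate (whose residue is the conjugate of r_k) gives a real biquad with numerator [2 Re(r), -2 Re(r conj(p)), 0] and denominator [1, -2 Re(p), |p|^2]. The helpers `residues` and `direct_form_filter` are hypothetical stand-ins for library routines such as matlab's `residuez` and `filter`. The check compares the summed impulse responses of the parallel sections against the direct-form all-pole filter.

```python
import cmath
import math

def residues(poles):
    """Residues r_k of 1/prod_j(1 - p_j z^-1); assumes all poles are distinct."""
    r = []
    for k, pk in enumerate(poles):
        denom = 1.0 + 0j
        for j, pj in enumerate(poles):
            if j != k:
                denom *= 1 - pj / pk
        r.append(1 / denom)
    return r

def direct_form_filter(b, a, x):
    """y[n] = sum_i b[i] x[n-i] - sum_{i>=1} a[i] y[n-i]; assumes a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc)
    return y

fs = 8192
F = [700, 1220, 2600]
BW = [130, 70, 160]
upper = [math.exp(-math.pi * bw / fs) * cmath.exp(2j * math.pi * f / fs)
         for f, bw in zip(F, BW)]
poles = upper + [p.conjugate() for p in upper]
r = residues(poles)

# Combine each conjugate pole pair into one real biquad section.
sections = []
for k, p in enumerate(upper):
    rk = r[k]  # residue of p; the residue of conj(p) is conj(rk)
    b = [2 * rk.real, -2 * (rk * p.conjugate()).real, 0.0]
    a = [1.0, -2 * p.real, abs(p) ** 2]
    sections.append((b, a))

# Direct-form denominator: expand prod_j (1 - p_j z^-1), keep the real part.
A = [1.0 + 0j]
for p in poles:
    A = [c - p * prev for c, prev in zip(A + [0], [0] + A)]
A = [c.real for c in A]

x = [1.0] + [0.0] * 199                      # unit impulse
y_direct = direct_form_filter([1.0], A, x)
y_par = [sum(v) for v in zip(*(direct_form_filter(b, a, x) for b, a in sections))]
err = max(abs(u - v) for u, v in zip(y_direct, y_par))
print(err)
```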
The matlab code below illustrates the construction of a parallel formant filter bank for simulating the vowel [a]. For completeness, it is used to filter a bandlimited impulse train in order to synthesize the vowel sound.
```matlab
F = [700, 1220, 2600]; % Formant frequencies (Hz)
BW = [130, 70, 160];   % Formant bandwidths (Hz)
fs = 8192;             % Sampling rate (Hz)

nsecs = length(F);
R = exp(-pi*BW/fs);        % Pole radii
theta = 2*pi*F/fs;         % Pole angles
poles = R .* exp(j*theta); % Complex poles
B = 1;
A = real(poly([poles,conj(poles)]));
% freqz(B,A); % View frequency response:

% Convert to parallel complex one-poles (PFE):
[r,p,f] = residuez(B,A);
As = zeros(nsecs,3);
Bs = zeros(nsecs,3);
% complex-conjugate pairs are adjacent in r and p:
for i=1:2:2*nsecs
  k = 1+(i-1)/2;
  Bs(k,:) = [r(i)+r(i+1), -(r(i)*p(i+1)+r(i+1)*p(i)), 0];
  As(k,:) = [1, -(p(i)+p(i+1)), p(i)*p(i+1)];
end
sos = [Bs,As]; % standard second-order-section form
iperr = norm(imag(sos))/norm(sos); % make sure sos is ~real
disp(sprintf('||imag(sos)||/||sos|| = %g',iperr)); % 1.6e-16
sos = real(sos) % and make it exactly real
Bs = sos(:,1:3); % real section numerators
As = sos(:,4:6); % real section denominators

% Reconstruct original numerator and denominator as a check:
[Bh,Ah] = psos2tf(sos); % parallel sos to transfer function
% psos2tf appears in the matlab-utilities appendix
disp(sprintf('||A-Ah|| = %g',norm(A-Ah))); % 5.77423e-15
% Bh has trailing epsilons, so we'll zero-pad B:
disp(sprintf('||B-Bh|| = %g',...
     norm([B,zeros(1,length(Bh)-length(B))] - Bh))); % 1.25116e-15

% Plot overlay and sum of all three
% resonator amplitude responses:
nfft = 512;
H = zeros(nsecs+1,nfft);
for i=1:nsecs
  % nfft points from 0 to fs/2, with frequency axis fHz in Hz:
  [Hi,fHz] = freqz(Bs(i,:),As(i,:),nfft,fs);
  H(i,:) = Hi.';
end
H(nsecs+1,:) = sum(H(1:nsecs,:));
figure(1); clf;
plot(fHz,20*log10(abs(H.')));
title('Amplitude Response');
xlabel('Frequency (Hz)');
ylabel('Magnitude (dB)');
legend('sec 1','sec 2','sec 3','sec sum');
print('-depsc','../eps/lpcexovl.eps');

% Now synthesize the vowel [a]:
nsamps = 256;
f0 = 200;                 % Pitch in Hz
w0T = 2*pi*f0/fs;         % radians per sample
nharm = floor((fs/2)/f0); % number of harmonics
sig = zeros(1,nsamps);
n = 0:(nsamps-1);
% Synthesize bandlimited impulse train
for i=1:nharm
  sig = sig + cos(i*w0T*n);
end
sig = sig/max(sig);
% Filter through the parallel formant bank (sum of section outputs):
speech = zeros(1,nsamps);
for i=1:nsecs
  speech = speech + filter(Bs(i,:),As(i,:),sig);
end
soundsc([sig,speech],fs); % hear buzz, then 'ah'
```
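The check above relies on `psos2tf`, whose listing is not reproduced here. As a hedged sketch of what such a parallel-to-direct-form conversion has to compute, the Python function below (hypothetically named `parallel_sos_to_tf`) forms A as the product of all section denominators and B as the sum over k of each numerator times all the other denominators. It is not the appendix implementation. The usage example uses two real one-pole sections padded into biquad rows, 2/(1 - 0.5 z^-1) - 1/(1 - 0.25 z^-1), whose sum has the constant numerator 1; as in the matlab check, the numerator comes back zero-padded to the denominator's length.

```python
def polymul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def polyadd(a, b):
    """Add two coefficient lists, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def parallel_sos_to_tf(sos):
    """Rows of sos are [b0, b1, b2, a0, a1, a2]; returns (B, A) of the sum."""
    nums = [list(row[:3]) for row in sos]
    dens = [list(row[3:]) for row in sos]
    A = [1.0]
    for a in dens:
        A = polymul(A, a)            # common denominator
    B = [0.0]
    for k, b in enumerate(nums):
        term = b
        for j, a in enumerate(dens):
            if j != k:
                term = polymul(term, a)  # numerator k times the other denominators
        B = polyadd(B, term)
    return B, A

sos = [[2.0, 0.0, 0.0, 1.0, -0.5, 0.0],
       [-1.0, 0.0, 0.0, 1.0, -0.25, 0.0]]
B, A = parallel_sos_to_tf(sos)
print(B, A)
```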
Notes: