1.1 Short Term Time Domain Processing of Speech Signals
An engineering solution proposed for processing speech is to use existing signal processing tools in a modified fashion. More specifically, the tools can still assume that the signal being processed is stationary, because speech can be treated as approximately stationary when it is viewed in blocks of 10-30 msec. Hence, to process speech with these tools, the signal is handled in blocks of 10-30 msec. Such processing is termed Short Term Processing (STP).
Short Term Processing of speech can be performed either in the time domain or in the frequency domain. The choice of domain depends on the information we want to extract from the speech. For instance, parameters like short term energy, short term zero crossing rate and short term autocorrelation can be computed by time domain processing of speech.
1.2 Objectives:
- Load, display and manipulate speech signals.
- Modify them for the case of (a) Hamming window, (b) Hanning window. Illustrate your observations.
- Understand the need for short term processing of speech.
- Find the short term energy and study its significance.
- Compute the short term zero crossing rate and study its significance.
- Compute the short term autocorrelation and study its significance.
- Estimate the pitch of speech using short term autocorrelation. Develop the pitch estimation program in Matlab using frame sizes of 10, 50 & 100 msec, each with a shift of 10 msec.
- Perform voiced/unvoiced/silence classification of speech using short term time domain parameters.
1.3 Need for Short Term Processing of Speech
Speech is produced by a time-varying vocal tract system driven by a time-varying excitation. As a result, the speech signal is non-stationary. Most of the signal processing tools studied in signals and systems and in signal processing courses assume a time-invariant system and a time-invariant excitation, i.e. a stationary signal. Hence these tools are not directly applicable to speech, because using them directly on speech violates their underlying assumption. Even if you apply them blindly and compute an output, that output is of little practical significance. For instance, the relation for total energy is a fundamental one in signal processing. That is,

$$E_T = \sum_{n=-\infty}^{\infty} s^2(n)$$
This relation is useful for a stationary signal with finite energy. Suppose you use it to compute the total energy of a speech signal. No doubt, this gives the total energy present in the signal. However, that total is of little use because, from the nature of its production, we know that speech has time-varying amplitude and energy. What matters for speech is therefore a tool that tracks the energy as it varies over time. Hence the need for a different way of processing speech.
1.4 Short Term Energy Parameter
The energy associated with speech is time-varying. Hence, for any automatic processing of speech, the interest is in how the energy varies with time and, more specifically, in the energy associated with a short term region of speech. By the nature of its production, the speech signal consists of voiced, unvoiced and silence regions. The energy of a voiced region is large compared to that of an unvoiced region, and a silence region has the least, or negligible, energy. Thus short term energy can be used for voiced, unvoiced and silence classification of speech.
The relation for short term energy can be derived from the total energy relation defined in signal processing. The total energy of an energy signal s(n) is given by

$$E_T = \sum_{n=-\infty}^{\infty} s^2(n)$$

For short term energy computation, speech is considered in frames of 10-30 msec. Let the samples in a frame be indexed n = 0 to n = N-1, where N is the frame length in samples. For energy computation, the amplitude of the speech samples is taken to be zero outside the frame, so the sum reduces to

$$E = \sum_{n=0}^{N-1} s^2(n)$$
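The frame-wise energy relation can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this post: the function name `short_term_energy`, the default 20 msec frame with a 10 msec shift, and the synthetic silence-plus-tone test signal are all assumptions made for the example.

```python
import math

def short_term_energy(signal, fs, frame_ms=20, shift_ms=10):
    """Return one energy value per frame: the sum of squared samples in that frame."""
    frame_len = int(round(fs * frame_ms / 1000))  # N, samples per frame
    shift = int(round(fs * shift_ms / 1000))      # samples between frame starts
    energies = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frame = signal[start:start + frame_len]
        # Samples outside the frame are treated as zero, so only the frame is summed.
        energies.append(sum(x * x for x in frame))
    return energies

# Hypothetical test signal: 50 msec of silence followed by 50 msec of a 200 Hz tone.
fs = 8000
silence = [0.0] * 400
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
energies = short_term_energy(silence + tone, fs)
print([round(e, 2) for e in energies])
```

The energy stays at zero for frames that lie entirely in the silent part and rises once the frames start to overlap the tone, which is the time-varying behaviour the total energy relation cannot show.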
1.5 Short Term Zero Crossing Rate (ZCR)
Zero Crossing Rate gives information about the number of zero crossings present in a given signal. Intuitively, if a signal has many zero crossings, it is changing rapidly and may contain high frequency information. Similarly, if it has few zero crossings, it is changing slowly and may contain mostly low frequency information. Thus ZCR gives indirect information about the frequency content of the signal.
In speech, the nature of the signal changes over a few msec, for instance from voiced to unvoiced and back to voiced. To obtain useful information, ZCR needs to be computed using a typical frame size of 10-30 msec with half the frame size as the shift. A speech signal for the message "she had your suit in your greasy wash water all year" and its short term ZCR are shown in Figure_2. As can be observed, for unvoiced sounds like |s| the ZCR value is significantly higher than in voiced regions like |a|, and hence ZCR can be used for distinguishing voiced and unvoiced regions.
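As a rough illustration of the idea, the sketch below counts sign changes per frame in Python. The function name `short_term_zcr`, the normalisation by frame length, and the two synthetic tones (100 Hz standing in for a slowly varying sound, 2000 Hz for a rapidly varying one) are assumptions for this example, not part of the original experiment.

```python
import math

def short_term_zcr(signal, fs, frame_ms=20, shift_ms=10):
    """Return, per frame, the fraction of adjacent sample pairs that change sign."""
    frame_len = int(round(fs * frame_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))  # half the frame size by default
    rates = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frame = signal[start:start + frame_len]
        # A negative product of neighbouring samples marks one zero crossing.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        rates.append(crossings / frame_len)
    return rates

fs = 8000
# Hypothetical signals; the 0.3 rad phase offset avoids samples that are exactly zero.
low = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(800)]
high = [math.sin(2 * math.pi * 2000 * n / fs + 0.3) for n in range(800)]
zcr_low = short_term_zcr(low, fs)
zcr_high = short_term_zcr(high, fs)
print(max(zcr_low), min(zcr_high))
```

Every frame of the high frequency tone has a much larger ZCR than any frame of the low frequency tone. Other normalisations are possible; only the relative values matter for comparing regions.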
1.6 Short Term Autocorrelation:
The cross correlation tool from signal processing measures the similarity between two different sequences. Autocorrelation is the case where only one sequence is involved. In autocorrelation, the interest is in observing how similar the signal is to itself over time. This is achieved by applying different time lags to the sequence and comparing each lagged copy with the original sequence as reference. Autocorrelation is a very useful tool in speech processing. However, due to the non-stationary nature of speech, a short term version of the autocorrelation is needed:

$$r_{ss}(n,k) = \sum_{m=-\infty}^{\infty} s_w(m)\, s_w(k+m)$$

where s_w(m) = s(m) w(n-m) is the windowed version of s(n). Thus, for a given windowed segment of speech, the short term autocorrelation is a sequence indexed by the lag k. The nature of the short term autocorrelation sequence is markedly different for voiced and unvoiced segments of speech. Hence information from the autocorrelation sequence can be used for discriminating voiced and unvoiced segments.
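The short term autocorrelation of one frame can be sketched as follows. The function name `short_term_autocorrelation`, the optional `window` argument (rectangular when omitted) and the 200 Hz test tone are assumptions chosen for illustration.

```python
import math

def short_term_autocorrelation(frame, window=None):
    """Return r[k] for k = 0 .. len(frame)-1 for one windowed frame of speech."""
    n = len(frame)
    if window is None:
        window = [1.0] * n  # rectangular window
    sw = [s * w for s, w in zip(frame, window)]  # windowed segment s_w
    # The windowed segment is zero outside the frame, so the sum stops at n - k.
    return [sum(sw[m] * sw[m + k] for m in range(n - k)) for k in range(n)]

fs = 8000
# Hypothetical periodic frame: 30 msec of a 200 Hz tone, i.e. a period of 40 samples.
voiced_like = [math.sin(2 * math.pi * 200 * m / fs) for m in range(240)]
r = short_term_autocorrelation(voiced_like)
print(round(r[0], 2), round(r[20], 2), round(r[40], 2))
```

For a periodic frame like this one, the sequence peaks at lag 0 and again at the lag equal to the period (40 samples here), while it is negative at half a period. That second peak is what autocorrelation based pitch estimation looks for.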
1.7 Short Term Energy Computation
The speech signal and its sampling frequency, along with the frame size and frame shift, are the inputs needed for computing the short term energy. Using the sampling frequency, the number of samples for the given frame size and frame shift are computed. For instance, if the sampling frequency is 8 kHz and the frame size and frame shift are 20 msec and 10 msec, respectively, then a frame contains 160 samples and the frame shift is 80 samples. To compute the short term energy, the input speech signal is considered in frames of 160 samples with a shift of 80 samples, and the energy is computed for each frame. The short term energy (STE) values are then plotted as a function of time index. The STE contour follows the general shape of the amplitude envelope of the speech signal. The STE of unvoiced regions is relatively small compared to that of voiced regions, so STE can be used for voiced/unvoiced classification of speech.
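The arithmetic in this example is easy to check with a small helper. The function name `frame_parameters` and the one second signal length are assumptions made for the illustration.

```python
def frame_parameters(fs_hz, frame_ms, shift_ms, num_samples):
    """Convert frame size and shift from msec to samples and count the full frames."""
    frame_len = fs_hz * frame_ms // 1000
    shift = fs_hz * shift_ms // 1000
    if num_samples < frame_len:
        num_frames = 0
    else:
        num_frames = 1 + (num_samples - frame_len) // shift
    return frame_len, shift, num_frames

# 8 kHz sampling, 20 msec frames, 10 msec shift, one second of speech (8000 samples)
print(frame_parameters(8000, 20, 10, 8000))  # prints (160, 80, 99)
```

With these settings one second of speech gives 99 short term energy values, one every 10 msec.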
1.8 Short Term Zero Crossing Rate (ZCR)
The input speech signal can be viewed in blocks of 10-30 msec for computing ZCR. For each block of the speech signal, the ZCR is computed using the short term ZCR relation. The ZCR value is highest in unvoiced regions and lowest in voiced regions. In silence regions the value lies between the voiced and unvoiced cases.
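One simple way to combine the two parameters for voiced/unvoiced/silence classification is a pair of threshold rules, sketched below. The function name `classify_frame` and all three threshold values are hypothetical placeholders; in practice they would have to be tuned on real recordings.

```python
def classify_frame(energy, zcr, silence_energy=0.01, voiced_energy=1.0, unvoiced_zcr=0.25):
    """Label one frame from its short term energy and ZCR (thresholds are placeholders)."""
    if energy < silence_energy:
        return "silence"    # negligible energy
    if energy >= voiced_energy and zcr < unvoiced_zcr:
        return "voiced"     # high energy, few zero crossings
    return "unvoiced"       # lower energy or many zero crossings

# Hypothetical (energy, zcr) pairs for three frames
frames = [(0.0, 0.10), (40.0, 0.03), (0.5, 0.45)]
labels = [classify_frame(e, z) for e, z in frames]
print(labels)
```

Energy is checked first because it separates silence most directly, and ZCR is then used to split the remaining frames into voiced and unvoiced.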
1.9 Hanning Window
The Hann (Hanning) function is typically used as a window function in digital signal processing to select a subset of a series of samples in order to perform a Fourier transform or other calculations.
1.9.1 Hamming Window:
The Hamming window, like any window function, is a mathematical function that is zero-valued outside some chosen interval. For instance, a function that is constant inside the interval and 0 elsewhere is called a rectangular window.
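For reference, the three windows can be generated from their standard textbook definitions: the Hann window is 0.5 - 0.5 cos(2πn/(N-1)) and the Hamming window is 0.54 - 0.46 cos(2πn/(N-1)) for n = 0 to N-1. The sketch below is an illustration only; the function name `make_window` and the window names it accepts are assumptions.

```python
import math

def make_window(kind, length):
    """Return a window of the given length (length >= 2) as a list of floats."""
    if kind == "rectangular":
        return [1.0] * length
    if kind == "hanning":
        a, b = 0.5, 0.5
    elif kind == "hamming":
        a, b = 0.54, 0.46
    else:
        raise ValueError("unknown window type: " + kind)
    return [a - b * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

hann = make_window("hanning", 161)
hamming = make_window("hamming", 161)
rect = make_window("rectangular", 161)
print(hann[0], hann[80], round(hamming[0], 2), rect[0])
```

The Hann window tapers all the way to zero at the frame edges, the Hamming window stops at 0.08, and the rectangular window does not taper at all, which is why the three give different short term energy contours for the same frame size.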
Source code
MATLAB Code To Load a Speech Signal:
% Create a recorder: 44.1 kHz sampling rate, 16 bits per sample, 2 channels.
recObj = audiorecorder(44100, 16, 2);
% Display the recorder properties.
get(recObj)
% Record your voice for 10 seconds.
disp('Start speaking.')
recordblocking(recObj, 10);
disp('End of Recording.');
% Play back the recording.
play(recObj);
% Store data in double-precision array.
myRecording = getaudiodata(recObj);
Short-Time Speech Measurements, Short-Time energy calculation, window length, Rectangular
% speechSignal (the speech samples), t (the matching time axis) and
% STenergy (a short-time energy helper) are assumed to be defined elsewhere;
% they are not shown in this post.
% Energy is calculated every period samples.
period = 50;
% 4 different window lengths
winLens = [161 321 501 601];
nWindows = length(winLens);
k = 0;
for iWinLen = winLens
    k = k+1;
    wRect = rectwin(iWinLen);
    % Short-Time energy calculation
    ienergyST = STenergy(speechSignal, wRect, iWinLen-period);
    % Display results
    subplot(nWindows, 1, k);
    delay = (iWinLen - 1)/2;
    plot(t(delay+1:period:end - delay), ienergyST);
    if (k==1)
        title('Short-Time Energy for various Rectangular window lengths')
    end
    legend(['Window length: ',num2str(iWinLen),' Samples']);
end
Speech waveform for Frame size 100,300,500
//function to plot speech waveform and compute short term energy
function [c] = short_term_energy(Speech_signal, Fs, Frame_size, Frame_shift, window_type)
    y=Speech_signal;
    Frame_size=Frame_size/1000;   //msec to seconds
    Frame_shift=Frame_shift/1000; //msec to seconds
    t=1/Fs:1/Fs:(length(y)/Fs);
    subplot(4,1,1);plot(t,y);
    xtitle('Speech Waveform');
    window_length = round(Frame_size*Fs); //samples per frame
    sample_shift = round(Frame_shift*Fs); //samples per frame shift
    sum1=0;energy=0;
    w=window(window_type,window_length);
    for i=1:(floor((length(y))/sample_shift)-ceil(window_length/sample_shift))
        jj=1;
        for j=(((i-1)*sample_shift)+1):(((i-1)*sample_shift)+window_length)
            yy=y(j)*w(jj); //windowed sample; y itself is left unchanged so overlapping frames are not windowed twice
            sum1=sum1+yy*yy;
            jj=jj+1;
        end
        energy(i)=sum1;
        sum1=0;
    end
    c=energy;
endfunction

[y,Fs,bits]=wavread('/var/www/scilab/wavfile/exp7.wav'); //Input: Speech waveform
Frame_size=30;  //Input: Frame size in millisecond
Frame_shift=10; //Input: Frame shift in millisecond
max_value=max(abs(y));
y=y/max_value;  //normalise the amplitude to the range -1 to 1

window_type = 're'; //Input: 'hm' for hamming window, 'hn' for hanning window and 're' for rectangular window
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(4,1,2);plot(tt,energy);
xtitle('Short Term Energy using Rectangular window');

window_type = 'hm'; //hamming window
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(4,1,3);plot(tt,energy);
xtitle('Short Term Energy using Hamming window');

window_type = 'hn'; //hanning window
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(4,1,4);plot(tt,energy);
xtitle('Short Term Energy using Hanning window');
Short-Time Speech Measurements, Short-Time zero crossing rate calculation, Rectangular, Hamming and Hanning windows
//function to plot speech waveform and compute short term zero-crossing rate
//(the function keeps the name short_term_energy from the previous listing, but it returns the ZCR)
function [c] = short_term_energy(Speech_signal, Fs, Frame_size, Frame_shift, window_type)
    y=Speech_signal;
    Frame_size=Frame_size/1000;   //msec to seconds
    Frame_shift=Frame_shift/1000; //msec to seconds
    t=1/Fs:1/Fs:(length(y)/Fs);
    subplot(4,1,1);plot(t,y);
    xtitle('Speech Waveform');
    window_length = round(Frame_size*Fs); //samples per frame
    sample_shift = round(Frame_shift*Fs); //samples per frame shift
    sum1=0;zerocrossing=0;
    w=window(window_type,window_length);
    for i=1:(floor((length(y))/sample_shift)-ceil(window_length/sample_shift))
        first=((i-1)*sample_shift)+1;
        prev=y(first)*w(1); //first windowed sample of the frame
        jj=2;
        for j=(first+1):(first+window_length-1)
            cur=y(j)*w(jj); //windowed sample; y itself is left unchanged
            yy=cur*prev;
            if(yy < 0) //a sign change between neighbouring samples is one zero crossing
                sum1=sum1+1;
            end
            prev=cur;jj=jj+1;
        end
        zerocrossing(i)=sum1/(2*window_length);
        sum1=0;
    end
    c=zerocrossing;
endfunction

[y,Fs,bits]=wavread('/var/www/scilab/wavfile/exp7.wav'); //Input: Speech waveform
Frame_size=30;  //Input: Frame size in millisecond
Frame_shift=10; //Input: Frame shift in millisecond
max_value=max(abs(y));
y=y/max_value;

//the variable energy below holds the zero-crossing rate returned by the function
window_type = 're'; //Input: 'hm' for hamming window, 'hn' for hanning window and 're' for rectangular window
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(4,1,2);plot(tt,energy);
xtitle('Short Term Zero-crossing Rate using Rectangular window');

window_type = 'hm'; //hamming window
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(4,1,3);plot(tt,energy);
xtitle('Short Term Zero-crossing Rate using Hamming window');

window_type = 'hn'; //hanning window
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(4,1,4);plot(tt,energy);
xtitle('Short Term Zero-crossing Rate using Hanning window');
Short Term Zero Crossing Rate for different frame sizes
//function to plot speech waveform and compute short term zero-crossing rate
//(the function keeps the name short_term_energy, but it returns the ZCR)
function [c] = short_term_energy(Speech_signal, Fs, Frame_size, Frame_shift, window_type)
    y=Speech_signal;
    Frame_size=Frame_size/1000;   //msec to seconds
    Frame_shift=Frame_shift/1000; //msec to seconds
    t=1/Fs:1/Fs:(length(y)/Fs);
    subplot(5,1,1);plot(t,y);
    xtitle('Speech Waveform');
    window_length = round(Frame_size*Fs); //samples per frame
    sample_shift = round(Frame_shift*Fs); //samples per frame shift
    sum1=0;zerocrossing=0;
    w=window(window_type,window_length);
    for i=1:(floor((length(y))/sample_shift)-ceil(window_length/sample_shift))
        first=((i-1)*sample_shift)+1;
        prev=y(first)*w(1); //first windowed sample of the frame
        jj=2;
        for j=(first+1):(first+window_length-1)
            cur=y(j)*w(jj); //windowed sample; y itself is left unchanged
            yy=cur*prev;
            if(yy < 0) //a sign change between neighbouring samples is one zero crossing
                sum1=sum1+1;
            end
            prev=cur;jj=jj+1;
        end
        zerocrossing(i)=sum1/(2*window_length);
        sum1=0;
    end
    c=zerocrossing;
endfunction

[y,Fs,bits]=wavread('/var/www/scilab/wavfile/exp7.wav'); //Input: Speech waveform
Frame_shift=10; //Input: Frame shift in millisecond
window_type = 're'; //Input: 'hm' for hamming window, 'hn' for hanning window and 're' for rectangular window
max_value=max(abs(y));
y=y/max_value;

//the variable energy below holds the zero-crossing rate returned by the function
Frame_size=10; //Input: Frame size in millisecond
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(5,1,2);plot(tt,energy);xtitle('Short Term Zero-crossing Rate with 10 ms Framesize');

Frame_size=30; //Input: Frame size in millisecond
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(5,1,3);plot(tt,energy);xtitle('Short Term Zero-crossing Rate with 30 ms Framesize');

Frame_size=50; //Input: Frame size in millisecond
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(5,1,4);plot(tt,energy);xtitle('Short Term Zero-crossing Rate with 50 ms Framesize');

Frame_size=100; //Input: Frame size in millisecond
energy=short_term_energy(y, Fs, Frame_size, Frame_shift, window_type);
tt=1/Fs:(Frame_shift/1000):(length(energy)*(Frame_shift/1000));
subplot(5,1,5);plot(tt,energy);xtitle('Short Term Zero-crossing Rate with 100 ms Framesize','time in seconds');
Pitch estimation by autocorrelation method
//Program to compute and plot the pitch contour of a speech waveform
[y,Fs,bits]=wavread('/var/www/scilab/wavfile/exp7.wav');
Frame_size = 30;  //Input: Frame size in millisecond
Frame_shift = 10; //Input: Frame shift in millisecond
max_value=max(abs(y));
y=y/max_value;
window_period=Frame_size/1000;window_length = round(window_period*Fs);
shift_period=Frame_shift/1000;sample_shift = round(shift_period*Fs);
pitch_freq=0;
t=1/Fs:1/Fs:(length(y)/Fs);
subplot(2,1,1);
plot(t,y);
xtitle('Speech signal waveform','time in seconds');
sum1=0;energy=0;autocorrelation=0;
for i=1:(floor((length(y))/sample_shift)-ceil(window_length/sample_shift))
    k=1;yy=0;
    //copy the current frame into yy
    for j=(((i-1)*sample_shift)+1):(((i-1)*sample_shift)+window_length)
        yy(k)=y(j);
        k=k+1;
    end
    //short term autocorrelation of the frame; autocor(l+1) holds lag l
    for l=0:(length(yy)-1)
        sum1=0;
        for u=1:(length(yy)-l)
            s=yy(u)*yy(u+l);
            sum1=sum1+s;
        end
        autocor(l+1)=sum1;
        autocorrelation(l+1,i)=autocor(l+1);
    end
    //search lags 20 to 159 for the largest peak; this fixed range assumes
    //Fs = 8 kHz (pitch between about 50 and 400 Hz) and a frame of at least 160 samples
    auto=autocor(21:160);
    max1=0;sample_no=1;
    for uu=1:140
        if(auto(uu)>max1)
            max1=auto(uu);
            sample_no=uu;
        end
    end
    //auto(uu) corresponds to lag 19+uu, so the pitch period is (19+sample_no)/Fs
    pitch_freq(i)=1/((19+sample_no)*(1/Fs));
end
[rows,cols]=size(autocorrelation);
kkk=1/Fs:shift_period:(cols*shift_period);
subplot(2,1,2);plot(kkk,pitch_freq,'.');xtitle('Pitch Contour','time in seconds');
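The per-frame pitch search in the listing can also be written without fixed lag indices, so that it adapts to the sampling frequency. The Python sketch below is an illustration under assumptions: the function name `estimate_pitch`, the 50 to 400 Hz search range and the 200 Hz test tone are all choices made for this example.

```python
import math

def estimate_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Return the pitch in Hz from the largest autocorrelation peak, or None if there is none."""
    min_lag = int(fs / fmax)                        # shortest pitch period considered
    max_lag = min(int(fs / fmin), len(frame) - 1)   # longest period that fits in the frame
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[m] * frame[m + lag] for m in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag if best_lag else None

fs = 8000
# Hypothetical voiced-like frame: 30 msec of a 200 Hz tone
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(240)]
pitch = estimate_pitch(frame, fs)
print(pitch)
```

Returning `None` for a frame with no positive autocorrelation peak (for example pure silence) avoids reporting a pitch where there is none. Short frames limit the longest pitch period that can be detected, which is one reason the frame size matters for this method.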