AIM:
1) To load, display and manipulate the sample speech signal
2) To estimate pitch of speech signal by autocorrelation method
3) To estimate pitch of speech signal by cepstrum method
INTRODUCTION:
A speech signal can be classified into voiced, unvoiced and silence regions. The
near-periodic vibration of the vocal folds is the excitation for the production
of voiced speech. A random ...like excitation is present for unvoiced speech.
There is no excitation during silence regions. The majority of speech regions
are voiced in nature; these include vowels, semivowels and other voiced
components. Voiced regions look like a near-periodic signal in the time-domain
representation. Over a short interval, we may treat voiced speech segments as
periodic for all practical analysis and processing. The periodicity associated
with such segments is defined as the 'pitch period To' in the time domain and
the 'pitch frequency' or 'fundamental frequency Fo' in the frequency domain.
Unless specified otherwise, the term 'pitch' refers to the fundamental
frequency 'Fo'.
Pitch is an important attribute of voiced speech. It carries speaker-specific
information and is also needed for speech coding tasks. Estimation of pitch is
therefore one of the important issues in speech processing. A large set of
methods has been developed in the speech processing area for the estimation of
pitch. Among them, the three most widely used are autocorrelation of speech,
cepstrum pitch determination and simplified inverse filtering technique (SIFT)
pitch estimation. The success of these methods is due to the simple steps they
involve for the estimation of pitch. Even though the autocorrelation method is
mainly of theoretical interest, it provides a framework for the SIFT method.
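The relation between the pitch period To and the fundamental frequency Fo defined above can be written out explicitly. The sampling rate f_s and the period length k in samples are symbols introduced here for illustration, and the numbers in the comments are assumed example values, not measurements:

```latex
F_0 = \frac{1}{T_0}, \qquad T_0 = \frac{k}{f_s}
\quad\Longrightarrow\quad F_0 = \frac{f_s}{k}
% Example (assumed values): f_s = 8000 Hz, k = 64 samples
% T_0 = 64 / 8000 s = 8 ms, F_0 = 8000 / 64 = 125 Hz
```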
2. PROJECT DESCRIPTION:
2.1 Autocorrelation:
Autocorrelation can be described as the similarity between observations as a
function of the time lag between them. It is often used in signal processing for
analyzing functions or series of values, such as time-domain signals. It is a
mathematical tool for finding repeating patterns, such as the presence of a
periodic signal obscured by noise, or for identifying the missing fundamental
frequency in a signal implied by its harmonic frequencies. First, we need a
basic understanding of how to identify the voiced, unvoiced and silence regions
of speech from their time-domain and frequency-domain representations. For this
we plot the speech signal in the time and frequency domains. The time-domain
representation is termed the waveform and the frequency-domain representation
is termed the spectrum. We consider short segments of the speech signal when
plotting waveforms and spectra; typical lengths are 10-30 msec. The time-domain
and frequency-domain characteristics are distinct for the three cases. A voiced
segment shows periodicity in the time domain and harmonic structure in the
frequency domain. An unvoiced segment is random and noise-like in the time
domain, and its spectrum has no harmonic structure. A silence region has
essentially no energy in either the time or the frequency domain.
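As a rough illustration of how autocorrelation exposes the repeating pattern in a voiced frame, here is a minimal sketch in Python with NumPy. It is not the code used in this project. The function name autocorr_pitch, the 50-400 Hz search range and the synthetic test frame (8 kHz sampling, 125 Hz fundamental) are all assumptions made for the example.

```python
import numpy as np

def autocorr_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    f_min and f_max bound the pitch search range; they are assumed
    values chosen for this sketch, not taken from the text.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)          # remove any DC offset
    n = len(frame)
    # Full autocorrelation; keep only lags 0 .. n-1
    r = np.correlate(frame, frame, mode="full")[n - 1:]
    lag_min = int(fs / f_max)               # shortest period, in samples
    lag_max = min(int(fs / f_min), n - 1)   # longest period, in samples
    # Strongest autocorrelation value inside the allowed lag range
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag, lag

# Synthetic stand-in for a voiced frame (assumed test signal):
# 30 msec at 8 kHz, 125 Hz fundamental plus two harmonics.
fs = 8000
t = np.arange(int(0.030 * fs)) / fs
frame = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in (1, 2, 3))

f0, lag = autocorr_pitch(frame, fs)
print(f"pitch period = {lag} samples, F0 = {f0:.1f} Hz")
```

Restricting the search to a plausible lag range avoids picking the trivial peak at lag zero. For real speech the frame would be cut from a recording rather than synthesized.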
Analysis of voiced speech
We should be able to identify whether a given segment of speech, typically
20-50 msec long, is voiced or not. A voiced speech segment is characterized by
its periodic nature, relatively high energy, fewer zero crossings and more
correlation among successive samples. Voiced speech can be identified by
observing the waveform in the time domain, owing to its periodic nature. In the
frequency domain, the presence of harmonic structure is the evidence that the
segment is voiced. Further, the spectrum will typically have more energy in the
low-frequency region, with a downward trend from zero frequency towards higher
frequencies. The autocorrelation of a segment of voiced speech will have a
strong peak at the pitch period. The high energy can be observed as high
amplitude values in the voiced segment. However, energy alone cannot decide
voicing; periodicity is crucial, along with energy, to identify a voiced segment
unambiguously. Similarly, the relatively low number of zero crossings can be
observed indirectly as smooth variation among successive sample values. Figure 2
below shows the code to generate the waveform, spectrum and autocorrelation
sequence for a given segment of voiced speech.
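Separately from the code in Figure 2, the three cues discussed above (energy, zero crossings and the autocorrelation peak) can be computed per frame as in the following minimal Python/NumPy sketch. The function name frame_features, the 50-400 Hz lag range and the three synthetic frames are assumptions made for illustration. No decision thresholds are given, since the text does not specify any.

```python
import numpy as np

def frame_features(frame, fs, f_min=50.0, f_max=400.0):
    """Return (energy, zero_crossings, periodicity) for one frame.

    periodicity is the largest normalized autocorrelation value for
    lags between fs/f_max and fs/f_min (an assumed search range).
    """
    frame = np.asarray(frame, dtype=float)
    energy = float(np.sum(frame ** 2))
    # Count sign changes between successive samples
    signs = np.signbit(frame)
    zero_crossings = int(np.count_nonzero(signs[1:] != signs[:-1]))
    if energy == 0.0:
        return energy, zero_crossings, 0.0   # silent frame: nothing to correlate
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:]
    r = r / r[0]                             # normalize so that r[0] == 1
    lo, hi = int(fs / f_max), min(int(fs / f_min), n - 1)
    periodicity = float(np.max(r[lo:hi + 1]))
    return energy, zero_crossings, periodicity

# Synthetic stand-ins for the three region types (assumed test signals)
fs = 8000
t = np.arange(int(0.030 * fs)) / fs
voiced = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in (1, 2, 3))
unvoiced = 0.1 * np.random.default_rng(0).standard_normal(len(t))
silence = np.zeros(len(t))

for name, x in (("voiced", voiced), ("unvoiced", unvoiced), ("silence", silence)):
    energy, zc, per = frame_features(x, fs)
    print(f"{name:9s} energy={energy:8.2f} zero_crossings={zc:4d} periodicity={per:.2f}")
```

On these synthetic frames the voiced example combines high energy, few zero crossings and a strong autocorrelation peak, which is the combination the text describes. Real recordings are less clean, so any thresholds would have to be tuned on actual data.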