Objectives
(a) Load, display and manipulate speech signals.
(b) Compute and display the spectrum of speech signals.
(c) Determine and plot the Power Density Spectrum of speech signals.
Telephones are increasingly being used in noisy environments such as cars and airports. The aim of this project is to implement a system that reduces the background noise in a speech signal while leaving the speech itself intact; this process is called speech enhancement. The spectral subtraction technique is to be implemented for this purpose.
Algorithm:
Many different algorithms have been proposed for speech enhancement; the one that we will use is known as spectral subtraction. This technique operates in the frequency domain and assumes that the spectrum of the input signal can be expressed as the sum of the speech spectrum and the noise spectrum. The procedure is illustrated in the diagram and contains two tricky parts:
- Estimating the spectrum of the background noise
- Subtracting the noise spectrum from the speech
Step1: Generate a multi-tone signal with frequency components at 100 Hz, 500 Hz, 600 Hz, 800 Hz and 1000 Hz. Add AWGN for the various noise variances (20 dB, 30 dB and 40 dB). Display the signal and its spectrum.
Step2: Compute the magnitude and phase response of the signal using the FFT and plot them. Find the Power Density Spectrum and plot it.
Step3: Estimate the noise power.
Step4: Subtract the noise power estimate from the Power Density Spectrum of the signal. Determine the magnitude spectrum of the resulting signal.
Step5: Perform the inverse FFT (IFFT). Find the signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR).
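The five steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the sampling rate of 8000 Hz, the one-second duration, and the reading of "20 dB" as a target SNR are not given in the steps, and the noise power in Step 3 is taken from the median of the power spectrum, which is reasonable only because the noise is white and the tones occupy very few bins. The helper names are hypothetical.

```python
import numpy as np

def add_awgn(signal, target_snr_db, rng):
    """Add white Gaussian noise scaled to give the requested SNR in dB."""
    noise_power = np.mean(signal ** 2) / 10 ** (target_snr_db / 10)
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def snr_db(clean, estimate):
    """SNR in dB: clean-signal energy over the energy of the error."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - estimate) ** 2))

def psnr_db(clean, estimate):
    """PSNR in dB: squared peak of the clean signal over the mean squared error."""
    return 10 * np.log10(np.max(np.abs(clean)) ** 2 / np.mean((clean - estimate) ** 2))

# Step 1: multi-tone signal plus AWGN
fs = 8000                                   # assumed sampling rate, Hz
t = np.arange(fs) / fs                      # one second of samples
tones_hz = [100, 500, 600, 800, 1000]
clean = np.sum([np.sin(2 * np.pi * f * t) for f in tones_hz], axis=0)
noisy = add_awgn(clean, target_snr_db=20, rng=np.random.default_rng(0))

# Step 2: magnitude, phase and power spectrum
spectrum = np.fft.rfft(noisy)
freqs_hz = np.fft.rfftfreq(len(noisy), d=1 / fs)
magnitude, phase = np.abs(spectrum), np.angle(spectrum)
power = magnitude ** 2

# Step 3: noise power per bin (assumption: white noise, few tone bins).
# For white Gaussian noise the bin powers are exponentially distributed,
# so their mean equals the median divided by ln 2.
noise_power = np.median(power) / np.log(2)

# Step 4: subtract the noise power, clamp at zero, return to magnitude
enhanced_magnitude = np.sqrt(np.maximum(power - noise_power, 0.0))

# Step 5: inverse FFT with the noisy phase, then the quality measures
enhanced = np.fft.irfft(enhanced_magnitude * np.exp(1j * phase), n=len(noisy))
print(f"SNR  noisy {snr_db(clean, noisy):.1f} dB, enhanced {snr_db(clean, enhanced):.1f} dB")
print(f"PSNR noisy {psnr_db(clean, noisy):.1f} dB, enhanced {psnr_db(clean, enhanced):.1f} dB")
```

Both measures here need the clean signal as a reference, which is available only because the test signal is synthetic. Repeating the run with `target_snr_db` set to 30 and 40 covers the other two noise settings.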
Task2: Denoising of a male-voice speech signal.
Step1: Load and display a male-voice speech signal and its spectrum.
Step2: Add AWGN for the various noise variances (20 dB, 30 dB and 40 dB).
Step3: Divide the speech signal into 50 ms frames with a frame shift of 10 ms.
Step4: Compute the magnitude and phase response of the segmented speech signal using the FFT and plot them. Find the Power Density Spectrum and plot it.
Step5: Estimate the noise power, using the log energy and zero-crossing rate to detect non-speech activity.
Step6: Subtract the noise power estimate from the Power Density Spectrum of the segmented speech signal. Determine the magnitude spectrum of the resulting signal.
Step7: Perform the inverse FFT (IFFT), which yields the denoised speech signal. Find the signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR).
Step8: Repeat the above steps for the various segmented speech signals.
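Steps 3 and 5 are the parts of this task that differ from the multi-tone case, and they can be sketched as follows. The decision rule is an assumption, not something the steps specify: a frame is treated as non-speech when its log energy lies within a margin of the quietest frame and its zero-crossing rate is high, which suits a white-noise background. The thresholds, function names and the synthetic test signal (a tone standing in for speech) are all illustrative.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=50, shift_ms=10):
    """Split x into overlapping frames (one per row): frame_ms long, shift_ms apart."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * shift_ms / 1000)
    n_frames = 1 + (len(x) - frame_len) // hop
    index = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[index]

def log_energy_db(frames, eps=1e-12):
    """Log energy of each frame in dB (eps avoids the log of zero)."""
    return 10 * np.log10(np.sum(frames ** 2, axis=1) + eps)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    negative = np.signbit(frames)
    return np.mean(negative[:, 1:] != negative[:, :-1], axis=1)

def estimate_noise_power(frames, energy_margin_db=6.0, zcr_min=0.3):
    """Average power spectrum over the frames flagged as non-speech.

    Assumed rule: a frame is non-speech if its log energy is within
    energy_margin_db of the quietest frame and its zero-crossing rate is
    at least zcr_min (white noise crosses zero often).
    """
    energy = log_energy_db(frames)
    quiet = energy < energy.min() + energy_margin_db
    non_speech = quiet & (zero_crossing_rate(frames) >= zcr_min)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return power[non_speech].mean(axis=0), non_speech

# Synthetic check: 0.4 s of noise only, then 0.6 s of a 200 Hz tone in noise
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
tone = np.where(t >= 0.4, np.sin(2 * np.pi * 200 * t), 0.0)
x = 0.05 * rng.standard_normal(fs) + tone
frames = frame_signal(x, fs)
noise_power, non_speech = estimate_noise_power(frames)
print(frames.shape, int(non_speech.sum()), "frames flagged as non-speech")
```

The returned `noise_power` is what Step 6 would subtract from the power spectrum of each frame. Real speech contains low-energy unvoiced sounds, so the thresholds would need tuning on actual recordings.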
CHAPTER 1
Introduction:
A discrete signal, or discrete-time signal, is a time series consisting of a sequence of quantities. In other words, it is a time series that is a function over a domain of integers. Unlike a continuous-time signal, a discrete-time signal is not a function of a continuous argument; however, it may have been obtained by sampling a continuous-time signal, in which case each value in the sequence is called a sample. When a discrete-time signal is obtained by sampling at uniformly spaced times, it has an associated sampling rate; the sampling rate is not apparent in the data sequence, and so needs to be recorded separately as a characteristic of the system.
A digital signal is a discrete-time signal for which not only the time but also the amplitude has been made discrete; in other words, its samples take on only values from a discrete set (a countable set that can be mapped one-to-one to a subset of the integers). If that discrete set is finite, the discrete values can be represented with digital words of a finite width. Most commonly, these discrete values are represented as fixed-point words or floating-point words. After sampling, the process of converting a continuous-valued discrete-time signal to a digital signal is known as analogue-to-digital conversion. It usually proceeds by replacing each original sample value by an approximation selected from a given discrete set, a process known as quantization. This process loses information, and so discrete-valued signals are only an approximation of the continuous-valued discrete-time signal, itself only an approximation of the original continuous-valued continuous-time signal.
Amplitude modulation (AM) is a modulation technique used in electronic communication, most commonly for transmitting information via a radio carrier wave. AM works by varying the strength (amplitude) of the carrier in proportion to the waveform being sent; that waveform may, for instance, correspond to the sounds to be reproduced by a loudspeaker, or the light intensity of television pixels. This contrasts with frequency modulation, in which the frequency of the carrier signal is varied, and phase modulation, in which its phase is varied, by the modulating signal.
Discrete time views values of variables as occurring at distinct, separate "points in time", or equivalently as being unchanged throughout each non-zero region of time ("time period"). Thus a variable jumps from one value to another as time moves from one time period to the next. In this framework, each variable of interest is measured once at each time period. The number of measurements between any two time periods is finite. Measurements are typically made at sequential integer values of the variable "time".
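The sampling and quantization steps described in this introduction can be illustrated with a short sketch. The sampling rate, the 440 Hz test tone, the 8-bit word length and the function name are assumptions chosen for the example; the quantizer is a plain uniform one, which is only one of several possible designs.

```python
import numpy as np

def quantize(samples, n_bits, full_scale=1.0):
    """Uniform quantizer: map each sample to the nearest of 2**n_bits levels.

    Returns the integer codes and the quantized amplitudes they stand for.
    """
    step = 2 * full_scale / 2 ** n_bits
    lowest, highest = -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1
    codes = np.clip(np.round(samples / step), lowest, highest)
    return codes.astype(int), codes * step

fs = 8000                                   # assumed sampling rate, Hz
n = np.arange(16)                           # integer sample index
x = 0.9 * np.sin(2 * np.pi * 440 * n / fs)  # samples of a 440 Hz sine
codes, x_q = quantize(x, n_bits=8)
max_error = np.max(np.abs(x - x_q))         # at most half a step when nothing clips
print(codes[:4], max_error)
```

The sequence `x` is indexed by integers only; the sampling rate `fs` has to be carried alongside it, as noted above, because it cannot be recovered from the sample values alone.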
Project description:
SPECTRAL SUBTRACTION METHOD
The principle of spectral subtraction
Spectral subtraction is based on the principle that the enhanced speech can be obtained by subtracting the estimated spectral components of the noise from the spectrum of the input noisy signal. Assuming that the noise d(n) is additive to the speech signal x(n), the noisy speech y(n) can be written as
y(n) = x(n) + d(n), for 0 ≤ n ≤ N-1
where n is the time index and N is the number of samples. The objective of speech enhancement is to find an estimate of the speech x(n) from the given y(n), under the assumption that d(n) is uncorrelated with x(n). The input signal y(n) is segmented into K segments of the same length. The time-domain signals can be transformed to the frequency domain as
Yk(ω) = Xk(ω) + Dk(ω), for 0 ≤ k ≤ K-1
where k is the segment index and Yk(ω), Xk(ω) and Dk(ω) denote the short-time DFT magnitudes of y(n), x(n) and d(n), respectively, raised to a power a (a = 1 corresponds to magnitude spectral subtraction, a = 2 corresponds to power spectrum subtraction). If an estimate of the noise spectrum Dk(ω) can be obtained, then an approximation of the speech spectrum Xk(ω) can be obtained from Yk(ω).
Spectral subtraction is based on the
principle that one can obtain an
estimate of the clean signal spectrum by subtracting an estimate of the noise
spectrum from the noisy speech spectrum. The noise spectrum can be estimated,
and updated, during the periods when the signal is absent or when only noise is
present.
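As a minimal sketch of this principle, one frame can be enhanced as shown here for both a = 1 and a = 2. Several things are assumptions rather than part of the description above: negative differences are clamped to zero, the noisy phase is reused for the inverse transform, and the noise estimate is an average over separately generated noise-only frames that stand in for speech pauses. The function names and the test signal are hypothetical.

```python
import numpy as np

def spectral_subtract(noisy_frame, noise_level, a=2.0):
    """Estimate |X|^a as max(|Y|^a - D, 0), where D is the noise estimate
    already raised to the power a; the noisy phase is reused."""
    Y = np.fft.rfft(noisy_frame)
    X_a = np.maximum(np.abs(Y) ** a - noise_level, 0.0)
    X = X_a ** (1.0 / a) * np.exp(1j * np.angle(Y))
    return np.fft.irfft(X, n=len(noisy_frame))

def noise_level_from(noise_frames, a=2.0):
    """Average |D|^a over noise-only frames (one frame per row)."""
    return np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)) ** a, axis=0)

# Hypothetical test frame: a 500 Hz tone in white noise, fs = 8000 Hz, N = 400
rng = np.random.default_rng(1)
n = np.arange(400)
clean = np.sin(2 * np.pi * 500 * n / 8000)
noisy = clean + 0.3 * rng.standard_normal(400)
noise_frames = 0.3 * rng.standard_normal((20, 400))   # stands in for speech pauses

for a in (1.0, 2.0):
    enhanced = spectral_subtract(noisy, noise_level_from(noise_frames, a), a)
    print(f"a={a}: error energy noisy {np.sum((clean - noisy) ** 2):.2f}, "
          f"enhanced {np.sum((clean - enhanced) ** 2):.2f}")
```

In a full system the noise estimate would be refreshed whenever a pause is detected, as the paragraph above describes, rather than computed once.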
Methods Of Spectral Subtraction
The first method for spectral subtraction was introduced in the late 1970s. Over the more than 30 years since, this method has been modified and new methods have been developed. This section surveys some of these methods, from the earliest to the present.
1. In 1979, Berouti [2] proposed a spectral subtraction method for enhancing speech corrupted by broadband noise. The original method entails subtracting an estimate of the noise power spectrum from the speech power spectrum, setting negative differences to zero, recombining the new power spectrum with the original phase, and then reconstructing the time waveform. While this method reduces the broadband noise, it also usually introduces an annoying “musical noise” [11]. Berouti devised a method that eliminates this “musical noise” while further reducing the background noise. The method consists in subtracting an overestimate of the noise power spectrum and preventing the resultant spectral components from going below a preset minimum level (the spectral floor). The method can automatically adapt to a wide range of signal-to-noise ratios, as long as a reasonable estimate of the noise spectrum can be obtained. The technique can be described using the equation below.
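In the power-spectrum form in which this over-subtraction rule is usually stated, it can be written as follows. The symbol Y_j(ω) for the noisy-speech spectrum in frame j is an assumption, since the text does not name it; α and β are the over-subtraction factor and the spectral floor parameter.

```latex
|\hat{S}_j(\omega)|^2 =
\begin{cases}
|Y_j(\omega)|^2 - \alpha\,|D_e(\omega)|^2, & \text{if } |Y_j(\omega)|^2 > (\alpha + \beta)\,|D_e(\omega)|^2 \\
\beta\,|D_e(\omega)|^2, & \text{otherwise}
\end{cases}
```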
Here |Ŝj(ω)| denotes the enhanced spectrum estimated in frame j and |De(ω)| is the spectrum of the noise obtained during non-speech activity, with α ≥ 1 and 0 < β ≤ 1, where α is the over-subtraction factor and β is the spectral floor parameter. Parameter β controls the amount of residual noise and the amount of perceived musical noise. If β is too small, the musical noise becomes audible but the residual noise is reduced. If β is too large, the residual noise becomes audible but the musical noise is reduced.
Parameter α affects the amount of speech spectral distortion. If α is too large, the resulting signal will be severely distorted and intelligibility may suffer. If α is too small, noise remains in the enhanced speech signal. When α > 1, the subtraction can remove all of the broadband noise by eliminating most of the wide peaks, but the deep valleys surrounding the peaks still remain in the spectrum [1]. The valleys between peaks are no longer as deep when β > 0 as when β = 0 [4]. Berouti found that speech processed by equation (7) had less musical noise.
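The roles of α and β can be seen in a small numerical sketch of over-subtraction with a spectral floor. The function name, the default values and the four-bin example are illustrative assumptions; the inputs are taken to be power spectra, matching the description of the method as subtracting an overestimate of the noise power spectrum.

```python
import numpy as np

def oversubtract(noisy_power, noise_power, alpha=4.0, beta=0.01):
    """Subtract alpha times the noise power from the noisy power.

    Bins that would fall to or below the spectral floor beta * noise_power
    are set to that floor instead of being zeroed.
    """
    subtracted = noisy_power - alpha * noise_power
    floor = beta * noise_power
    return np.where(subtracted > floor, subtracted, floor)

noisy_power = np.array([10.0, 5.0, 1.0, 0.5])   # two strong bins, two weak ones
noise_power = np.ones(4)

with_floor = oversubtract(noisy_power, noise_power, alpha=2.0, beta=0.1)
without_floor = oversubtract(noisy_power, noise_power, alpha=2.0, beta=0.0)
print(with_floor)      # [8.  3.  0.1 0.1]
print(without_floor)   # [8. 3. 0. 0.]
```

With β = 0 the weak bins drop to zero, leaving isolated peaks with deep valleys between them; with β > 0 the valleys are filled to a fraction of the noise level, which is the effect described above.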
Experimental results showed that, for the best noise reduction with the least amount of musical noise, α should be smaller for high-SNR frames and larger for low-SNR frames. In this way the method can adapt to various signal-to-noise ratios by adjusting α and β, and reduce the musical noise. The parameter values have to be set optimally so that the best enhancement performance can be achieved. This can be done using the NSS algorithm [21].
2. In the same year, 1979, S. F. Boll [3] also proposed a method for the removal of acoustic noise in speech. In this method a spectral estimator is used to compute the spectral error, and then four methods are used to minimize the error. Speech, suitably low-pass filtered and digitized, is analyzed by windowing data from half-overlapped input data buffers. The magnitude spectra of the windowed data are calculated, and the spectral noise bias calculated during non-speech activity is subtracted off. Resulting negative amplitudes are then zeroed out. Secondary residual noise suppression is then applied. A time waveform is recalculated from the modified magnitude. This waveform is then overlap-added to the previous data to generate the output speech.
Consider that a windowed noise signal n(k) has been added to a windowed speech signal s(k), with their sum denoted by x(k).
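The processing chain described for Boll's method (half-overlapped windows, magnitude subtraction, zeroing of negative amplitudes, overlap-add) can be sketched as follows. This is a simplified illustration, not Boll's full algorithm: the secondary residual noise suppression stage is left out, the window is assumed to be a periodic Hann window, and the frame length, function names and test signal are hypothetical choices.

```python
import numpy as np

def periodic_hann(n):
    """Hann window whose half-overlapped copies sum to one."""
    return np.hanning(n + 1)[:-1]

def boll_subtract(x, noise_mag, frame_len=256):
    """Magnitude spectral subtraction with half-overlapped windows and overlap-add."""
    hop = frame_len // 2
    window = periodic_hann(frame_len)
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        spectrum = np.fft.rfft(x[start:start + frame_len] * window)
        # Subtract the noise bias and zero out negative amplitudes
        magnitude = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
        # Rebuild the frame from the modified magnitude and the original phase
        frame = np.fft.irfft(magnitude * np.exp(1j * np.angle(spectrum)), n=frame_len)
        out[start:start + frame_len] += frame      # overlap-add
    return out

def mean_noise_magnitude(noise, frame_len=256):
    """Average windowed magnitude spectrum of a noise-only recording."""
    usable = len(noise) // frame_len * frame_len
    frames = noise[:usable].reshape(-1, frame_len) * periodic_hann(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

fs = 8000                                          # assumed sampling rate, Hz
rng = np.random.default_rng(2)
n = np.arange(2048)
clean = np.sin(2 * np.pi * 500 * n / fs)           # tone standing in for speech
noisy = clean + 0.1 * rng.standard_normal(n.size)
noise_mag = mean_noise_magnitude(0.1 * rng.standard_normal(4096))
enhanced = boll_subtract(noisy, noise_mag)
```

Because the half-overlapped Hann windows sum to one, the chain returns the input unchanged (apart from the first and last half frame) when the noise estimate is zero, so any change in the output comes from the subtraction itself.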