DC Removal

DC removal eliminates the direct current (DC) component from a signal, leaving only the alternating current (AC) components. A high-pass filter (HPF) achieves this by allowing high frequencies to pass while blocking low frequencies, including DC. In a simple RC high-pass filter, the transfer function is H(s) = sRC / (1 + sRC), where R is the resistance and C is the capacitance. The cutoff frequency, which determines the transition point, is given by f_c = 1 / (2πRC).
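
As a rough illustration, the RC high-pass filter above can be approximated in discrete time. The sketch below assumes a backward-difference discretization, and the function name, sampling rate, cutoff, and test signal are all hypothetical values chosen for the example rather than taken from the project.

```python
import numpy as np

def rc_highpass(x, fs, fc):
    """First-order RC high-pass filter (backward-difference discretization, assumed)."""
    rc = 1.0 / (2.0 * np.pi * fc)   # RC from f_c = 1 / (2*pi*R*C)
    dt = 1.0 / fs                   # sampling period
    a = rc / (rc + dt)              # smoothing coefficient of the discrete filter
    y = np.zeros(len(x))
    for n in range(1, len(x)):
        # Only changes in the input pass through, so a constant (DC) input decays to zero
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])
    return y

# Hypothetical test signal: 440 Hz tone riding on a 0.5 DC offset
fs = 8000
t = np.arange(fs) / fs
x = 0.5 + np.sin(2 * np.pi * 440 * t)
y = rc_highpass(x, fs, fc=20.0)
```

With a 20 Hz cutoff the tone passes almost unchanged while the offset is removed; a higher cutoff would settle faster but attenuate more of the low-frequency content.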

PDM2PCM

Converting Pulse-Density Modulation (PDM) to Pulse-Code Modulation (PCM) involves low-pass filtering, decimation, high-pass filtering, and gain adjustment. PDM signals are high-frequency bitstreams, with pulse density representing signal amplitude. The process includes low-pass filtering to remove high-frequency noise, decimation to reduce the sample rate by keeping every M-th sample (y[n] = x[nM], where M is the decimation factor), high-pass filtering to eliminate DC offset and low-frequency noise, and gain adjustment to normalize signal amplitude. These steps ensure the PCM output accurately reflects the original analog signal.
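
The chain above can be sketched end to end on a synthetic bitstream. This is a minimal sketch under several assumptions: the PDM input is produced by a simple first-order sigma-delta modulator, the low-pass filter is a moving average, the high-pass stage is replaced by plain mean removal, and the names pdm_encode and pdm_to_pcm as well as all rates are hypothetical.

```python
import numpy as np

def pdm_encode(x):
    """First-order sigma-delta modulator (assumed source of the PDM bitstream)."""
    bits = np.zeros(len(x), dtype=np.int8)
    acc = 0.0
    for n, s in enumerate(x):
        fb = 1.0 if acc >= 0 else -1.0   # 1-bit quantizer output as +/-1
        bits[n] = 1 if fb > 0 else 0
        acc += s - fb                    # integrate the quantization error
    return bits

def pdm_to_pcm(bits, M, gain=1.0):
    x = 2.0 * bits - 1.0                                  # map {0, 1} to {-1, +1}
    lp = np.convolve(x, np.ones(M) / M, mode="valid")     # low-pass: moving average
    pcm = lp[::M]                                         # decimation: keep every M-th sample
    pcm = pcm - np.mean(pcm)                              # crude DC removal (stand-in for the HPF)
    return gain * pcm                                     # gain adjustment

fs_pdm, M = 256_000, 32          # hypothetical PDM rate and decimation factor
n = 12_800                       # 50 ms of PDM samples
t = np.arange(n) / fs_pdm
analog = 0.1 + 0.5 * np.sin(2 * np.pi * 500 * t)   # 500 Hz tone with a DC offset
pcm = pdm_to_pcm(pdm_encode(analog), M)            # PCM at fs_pdm / M = 8 kHz
```

A practical pipeline would typically use better filters in several stages; the moving average is used here only to keep the example short.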

Noise Reduction

Noise reduction minimizes unwanted noise in a signal to enhance its quality. Spectral subtraction, which operates in the time-frequency domain, aims to restore the power or magnitude spectrum of a signal corrupted by noise. The noise spectrum N_hat(f) is estimated by selecting K noise-only segments, computing their Fourier transforms N_k(f), and averaging the magnitude spectra to get N_hat(f) = (1/K) Σ_{k=1}^{K} |N_k(f)|.

The noisy signal Y(f) is the sum of the clean signal X(f) and the noise N(f): Y(f)=X(f)+N(f). The clean signal spectrum X_hat(f) is estimated as X_hat(f)=Y(f)−N_hat(f). The extent of the subtraction can be varied by applying a scaling factor α: X_hat(f)=Y(f)−αN_hat(f).
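
A minimal sketch of magnitude spectral subtraction is shown below. It makes several assumptions: frames are non-overlapping and unwindowed, negative magnitudes are floored at zero, and the phase of the noisy signal is reused. The function name, frame length, and test signals are hypothetical.

```python
import numpy as np

def spectral_subtract(noisy, noise_only, frame=256, alpha=1.0):
    """Magnitude spectral subtraction on non-overlapping frames (a simplification)."""
    # Estimate N_hat(f): average magnitude spectrum over the noise-only frames
    k = len(noise_only) // frame
    noise_frames = noise_only[:k * frame].reshape(k, frame)
    n_hat = np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)), axis=0)

    # Subtract alpha * N_hat(f) from the magnitude of each noisy frame
    m = len(noisy) // frame
    frames = noisy[:m * frame].reshape(m, frame)
    Y = np.fft.rfft(frames, axis=1)
    mag = np.maximum(np.abs(Y) - alpha * n_hat, 0.0)   # floor at zero (assumed)
    X_hat = mag * np.exp(1j * np.angle(Y))             # reuse the noisy phase
    return np.fft.irfft(X_hat, n=frame, axis=1).reshape(-1)

# Hypothetical test: 1 kHz tone in white noise, plus a separate noise-only recording
rng = np.random.default_rng(0)
fs, n = 8000, 8192
clean = np.sin(2 * np.pi * 1000 * np.arange(n) / fs)
noisy = clean + 0.3 * rng.standard_normal(n)
noise_only = 0.3 * rng.standard_normal(n)
denoised = spectral_subtract(noisy, noise_only)
```

Larger values of alpha remove more noise but also eat into weak signal components, which is the trade-off the scaling factor controls.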

Transmission and Reception

Transmission involves sending a signal using only half the bandwidth, achieved through Single Side Band (SSB) modulation. We start by reading the signal and determining its frequency, then create the SSB signal using a Hilbert transform to obtain the analytic signal xa(t), which is multiplied by a complex exponential to shift the frequency: SSB(t) = Re { xa(t) ⋅ e^(j2πft) }. The SSB signal is then demodulated by multiplying it with a cosine function, which halves its amplitude, and filtered using a Low Pass Filter (LPF) to remove high-frequency components: y(t) = LPF { SSB(t) ⋅ cos (2πft) }.
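
The modulation and demodulation steps can be sketched with NumPy alone. In this sketch the analytic signal is built in the frequency domain (an FFT-based Hilbert transform), the LPF is an idealized FFT mask, and the upper sideband is kept. The helper names, the carrier and message frequencies, and the cutoff are all assumptions made for the example.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC, double positive frequencies, zero negatives."""
    N = len(x)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def lowpass_fft(x, fs, cutoff):
    """Idealized low-pass filter: zero every FFT bin above the cutoff (assumed)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[f > cutoff] = 0.0
    return np.fft.irfft(X, n=len(x))

fs, fc, fm = 8000, 1000, 200          # hypothetical sampling, carrier, and message frequencies
t = np.arange(fs) / fs
msg = np.cos(2 * np.pi * fm * t)

xa = analytic_signal(msg)
ssb = np.real(xa * np.exp(2j * np.pi * fc * t))                 # upper sideband at fc + fm
y = lowpass_fft(ssb * np.cos(2 * np.pi * fc * t), fs, 500.0)    # demodulate and filter
```

In this example y equals the message at half its amplitude, so a gain of 2 after the LPF would restore the original level.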

Pitch Estimation

Pitch estimation is the process of identifying the fundamental frequency of a periodic signal. There are three main methods for pitch estimation: time-domain, frequency-domain, and cepstrum-domain. Our approach utilizes the time-domain method due to its relatively fast computation time, which is suitable for real-time applications. Specifically, we employ the autocorrelation function (ACF), given by:

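ACF(τ) = Σ_{n=0}^{N−1−τ} wav[n] ⋅ wav[n+τ], where wav is the input signal, N is the number of samples, and τ is the delay. The highest peak of the ACF at a nonzero delay indicates the fundamental period of the signal, from which the pitch follows as the sampling rate divided by that delay.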
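
A short sketch of autocorrelation-based pitch estimation follows. It assumes an unnormalized ACF and restricts the peak search to a plausible pitch range; the function name, the range limits fmin and fmax, and the test signal are hypothetical.

```python
import numpy as np

def estimate_pitch(wav, fs, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency from the peak of the autocorrelation."""
    wav = wav - np.mean(wav)                            # remove DC so it does not bias the ACF
    N = len(wav)
    acf = np.correlate(wav, wav, mode="full")[N - 1:]   # ACF for delays 0 .. N-1
    lo = int(fs / fmax)                                 # shortest period considered
    hi = int(fs / fmin)                                 # longest period considered
    tau = lo + np.argmax(acf[lo:hi + 1])                # delay of the highest peak in range
    return fs / tau

# Hypothetical test: 200 Hz fundamental with one harmonic
fs = 8000
t = np.arange(1024) / fs
wav = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
pitch = estimate_pitch(wav, fs)
```

Skipping delays below fs / fmax avoids the trivial maximum at zero delay, which would otherwise always win.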

Acoustic Gain Control

Acoustic gain control addresses the issue of significant audio level variations within a file, which can cause loss of detail or distortion due to excessively high volume. It manages dynamic range differences in audio files to ensure consistent volume levels and prevent distortion. The algorithm reads a WAV file, converts the audio data to a NumPy array, and calculates the RMS amplitude Arms = √((1/N) Σ_{i=1}^{N} xi²) and the peak amplitude Apeak=max (∣x1∣,∣x2∣,…,∣xN∣). It then adjusts the gain. With Vtarget and Vcurrent as the desired and current levels in dBFS, the gain in decibels is GdB = Vtarget − Vcurrent, which corresponds to a linear factor G = 10^(GdB/20), i.e. the ratio of the target amplitude to the current amplitude on a linear scale.

This ensures clear and consistent output before saving the processed data to a new WAV file.
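
The gain step can be sketched on an in-memory array, leaving out the WAV reading and writing. The sketch assumes samples are floats with full scale at 1.0, computes the gain as the ratio of target to current RMS amplitude on a linear scale (equivalent to the difference of the two dBFS levels), and caps it so the peak does not clip. The function names and the target level of −20 dBFS are hypothetical.

```python
import numpy as np

def rms_dbfs(x):
    """RMS level in dBFS, assuming full scale corresponds to an amplitude of 1.0."""
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)))

def normalize(x, target_dbfs=-20.0):
    a_rms = np.sqrt(np.mean(x ** 2))            # RMS amplitude
    a_peak = np.max(np.abs(x))                  # peak amplitude
    if a_peak == 0:
        return x                                # silent input: nothing to scale
    target_rms = 10.0 ** (target_dbfs / 20.0)   # dBFS level back to linear amplitude
    g = target_rms / a_rms                      # linear gain
    g = min(g, 1.0 / a_peak)                    # cap so the peak stays at or below full scale (assumed)
    return g * x

# Hypothetical quiet input: a tone at 0.05 amplitude
fs = 8000
x = 0.05 * np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
y = normalize(x)
```

A single gain per file keeps the example simple; handling level variations within a file would require computing the gain per segment.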

Voice Activity Detector

Voice Activity Detection (VAD) aims to distinguish speech segments from silence or background noise in an audio file. The algorithm divides the audio signal into frames and analyzes each frame to determine if it contains speech or silence using the formula d[i]=VAD(x[i:i+N−1]), where x[i : i+N−1] represents a frame of the audio signal and d[i] is the binary value for that frame. The result of this analysis is a binary vector where 1 indicates speech and 0 indicates silence or noise. This process enhances the efficiency of speech recognition and audio indexing by focusing on the relevant parts of the signal.
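
The per-frame decision can be sketched with a simple energy threshold. The source does not say how VAD(⋅) is computed, so the energy criterion, the non-overlapping frames, the threshold value, and the test signal are all assumptions made for this example.

```python
import numpy as np

def vad(x, frame_len, threshold):
    """Return a binary vector: 1 where the mean frame energy exceeds the threshold."""
    n_frames = len(x) // frame_len
    d = np.zeros(n_frames, dtype=int)
    for k in range(n_frames):
        frame = x[k * frame_len:(k + 1) * frame_len]
        d[k] = int(np.mean(frame ** 2) > threshold)   # energy-based decision (assumed)
    return d

# Hypothetical test: half a second of low-level noise, then a tone standing in for speech
rng = np.random.default_rng(0)
fs = 8000
x = 0.01 * rng.standard_normal(fs)
x[fs // 2:] += 0.5 * np.sin(2 * np.pi * 300 * np.arange(fs // 2) / fs)
d = vad(x, frame_len=200, threshold=0.01)
```

A fixed threshold works here because the noise level is known; real recordings usually need a threshold that adapts to the background level.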

Short Time Fourier Transform

STFT (Short-Time Fourier Transform) is a method used to analyze the frequency content of a signal over time. It achieves this by dividing the signal into overlapping segments and applying the Fourier Transform to each segment. Mathematically, this can be expressed as

X(τ, ω) = ∫ x(t) w(t − τ) e^(−jωt) dt,

where x(t) is the signal in time, w(t) is the window function, τ represents the time shift, and ω is the angular frequency. The result is a complex-valued matrix representing the magnitude and phase information of the signal at different frequencies and time intervals.
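
A discrete version of this transform can be sketched in a few lines. The sketch assumes a Hann window with 50% overlap and drops any incomplete frame at the end; the function name, window length, hop size, and test tone are hypothetical choices.

```python
import numpy as np

def stft(x, win_len=256, hop=128):
    """Discrete STFT: window overlapping segments and take the FFT of each."""
    w = np.hanning(win_len)                              # window function w
    n_frames = 1 + (len(x) - win_len) // hop             # complete frames only
    frames = np.stack([x[k * hop:k * hop + win_len] * w  # segment shifted by k * hop
                       for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T                 # rows: frequency bins, columns: time frames

# Hypothetical test: a steady 1 kHz tone sampled at 8 kHz
fs = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)
X = stft(x)
peak_bin = int(np.argmax(np.abs(X[:, 0])))   # strongest bin in the first frame
```

Each bin spans fs / win_len = 31.25 Hz here, so the 1 kHz tone lands in bin 32; a longer window would give finer frequency resolution at the cost of coarser time resolution.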

Decimation and Interpolation

Decimation and interpolation are two fundamental techniques in digital signal processing. Decimation involves reducing the sampling rate of a signal by removing some of its samples, effectively downsampling the data. Interpolation, on the other hand, increases the sampling rate of a signal by adding new samples between the existing ones, effectively upsampling the data. Both processes are essential for tasks such as resizing images, converting audio sampling rates, and improving the efficiency of data transmission and storage.
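
Both operations can be sketched briefly. The example assumes a moving-average filter before discarding samples (to limit aliasing) and linear interpolation for the new samples; practical resamplers generally use better filters, and the function names and factors are hypothetical.

```python
import numpy as np

def decimate(x, M):
    """Reduce the sampling rate by M: smooth, then keep every M-th sample."""
    kernel = np.ones(M) / M                              # moving-average anti-alias filter (assumed)
    return np.convolve(x, kernel, mode="same")[::M]

def interpolate(x, L):
    """Increase the sampling rate by L using linear interpolation between samples."""
    n_old = np.arange(len(x))
    n_new = np.arange((len(x) - 1) * L + 1) / L          # L times denser sample positions
    return np.interp(n_new, n_old, x)

x = np.sin(2 * np.pi * 5 * np.arange(100) / 100)         # hypothetical 100-sample input
down = decimate(x, 4)                                    # 25 samples
up = interpolate(np.array([0.0, 2.0, 4.0]), 2)           # values 0, 1, 2, 3, 4
```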

Speed Down/Up

Speeding up or slowing down a signal involves changing its time scale. For a given signal x(t), speeding up by a factor of α is done by x(αt), where α > 1. Conversely, slowing down by a factor of α is achieved by x(t/α), where α > 1. On its own, this time scaling also shifts every frequency, and therefore the pitch, by the same factor; in speech processing, adjusting the playback speed without altering the pitch requires an additional time-stretching step.
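
For a sampled signal, x(αt) can be approximated by reading the samples at positions α ⋅ n. The sketch below assumes linear interpolation between samples, and the function name and test tone are hypothetical. Note that this plain resampling also scales the pitch by α, so it illustrates the time-scale change only.

```python
import numpy as np

def change_speed(x, alpha):
    """Approximate x(alpha * t): alpha > 1 speeds up, alpha < 1 slows down."""
    n_out = int(len(x) / alpha)                      # output is shorter when alpha > 1
    src = np.arange(n_out) * alpha                   # positions alpha * n in the input
    return np.interp(src, np.arange(len(x)), x)      # linear interpolation (assumed)

# Hypothetical test: a 200 Hz tone played at double and at half speed
fs = 8000
x = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
fast = change_speed(x, 2.0)     # half as many samples
slow = change_speed(x, 0.5)     # twice as many samples
peak_bin = int(np.argmax(np.abs(np.fft.rfft(fast))))
```

Played back at the same sampling rate, the sped-up tone sounds at 400 Hz, which is why a separate time-stretching method is needed when the pitch must stay fixed.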

Research Final Presentation

'Noise Reduction Based on Modified Spectral Subtraction Method'

https://www.researchgate.net/publication/50264702_Noise_Reduction_Based_on_Modified_Spectral_Subtraction_Method

'SPEECH PITCH DETECTION'

https://hajim.rochester.edu/ece/sites/zduan/teaching/ece472/projects/2014/Lio_Chen_SpeechPitchDetection.pdf

'Design of a Multi-Stage PDM to PCM Decimation Pipeline'

https://tomverbeure.github.io/2020/12/20/Design-of-a-Multi-Stage-PDM-to-PCM-Decimation-Pipeline.html