How to calculate audio file size?

You have a 30-second audio file sampled at a rate of 44.1 kHz and quantized using 8 bits; calculate the bit rate and the size of the mono and stereo versions of this file.

The bitrate is the number of bits per second.
bitrate = bitsPerSample * samplesPerSecond * channels
So in this case, for stereo, the bitrate is 8 * 44100 * 2 = 705,600 bits per second (705.6 kbps).
To get the file size, multiply the bitrate by the duration (in seconds), and divide by 8 (to get from bits to bytes):
fileSize = (bitsPerSample * samplesPerSecond * channels * duration) / 8;
So in this case 30 seconds of stereo will take up (8 * 44100 * 2 * 30) / 8 = 2,646,000 bytes
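For quick checks, here is the same arithmetic as a small Python function (a minimal sketch; the function name is mine):
import numpy as np  # not required; plain integers suffice

def pcm_file_size_bytes(bits_per_sample, samples_per_second, channels, duration_seconds):
    # bitrate in bits per second
    bitrate = bits_per_sample * samples_per_second * channels
    # bits -> bytes
    return bitrate * duration_seconds // 8

print(pcm_file_size_bytes(8, 44100, 1, 30))  # mono:   1,323,000 bytes
print(pcm_file_size_bytes(8, 44100, 2, 30))  # stereo: 2,646,000 bytes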

Assuming uncompressed PCM audio...
time * sampleRate * bitsPerSample * channelCount
For 30 seconds of mono audio at 44.1 kHz with 8 bits per sample, that's 1,323,000 bytes. For stereo, that's two channels, so double it.

Formula: size in KB = (sample rate × bit depth × number of channels × time in seconds) / (8 × 1024)
CD-quality sample rate = 44.1 kHz
Size of mono = (44,100 × 8 × 1 × 30) / (8 × 1024)
= 1291.99 KB
= 1.26 MB
Size of stereo = (44,100 × 8 × 2 × 30) / (8 × 1024)
= 2583.98 KB
= 2.52 MB
≈ 2.5 MB

Related

How to multiply an audio signal from wav file by a fixed sine wave in Python3?

I have an audio stream in a WAV file and I need to multiply it by a fixed sine wave at a given frequency. Also, the volume of the input WAV should not differ from the volume of the sinusoidal signal. How can I do this?
I was able to multiply one sinusoid by another, but I did not understand how to multiply the data from a WAV file by a sinusoid.
import wave, os, bitstring
import numpy as np
from matplotlib import pyplot as plt

SAMPLE_RATE = 41600  # Hertz
DURATION = 578  # seconds

def generate_sine_wave(freq, sample_rate, duration):
    x = np.linspace(0, duration, sample_rate * duration, endpoint=False)
    frequencies = x * freq
    # 2*pi because np.sin takes radians
    y = np.sin((2 * np.pi) * frequencies)
    return x, y

_, nice_tone = generate_sine_wave(1, SAMPLE_RATE, DURATION)
_, noise_tone = generate_sine_wave(50, SAMPLE_RATE, DURATION)
noise_tone = noise_tone * 0.3
mixed_tone = nice_tone + noise_tone
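One way to do the multiplication against real WAV data (a minimal sketch, not a tested answer; 'input.wav' and 'output.wav' are hypothetical filenames, and a 16-bit mono PCM file is assumed):
import wave
import numpy as np

# Read the WAV samples as int16 (assumes a 16-bit mono PCM file).
with wave.open('input.wav', 'rb') as w:
    rate = w.getframerate()
    samples = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

# Build a sine wave of the same length and multiply sample by sample.
# Keeping the carrier in [-1, 1] means the product stays within int16 range.
t = np.arange(len(samples)) / rate
carrier = np.sin(2 * np.pi * 50 * t)  # 50 Hz, like noise_tone above
product = (samples * carrier).astype(np.int16)

with wave.open('output.wav', 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(rate)
    w.writeframes(product.tobytes())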

How to process output into signed 16-bit big-endian integers

I need to make this code output the data as signed 16-bit big-endian integers for a wave (WAV) file: what is the most efficient way to express that?
import numpy as np
import pyglet
import sox
# sample rate in Hz
sample_rate = 44100
# generate a 1-second sine tone at 440 Hz
y = np.sin(2 * np.pi * 440.0 * np.arange(sample_rate * 1.0) / sample_rate)
print(y) # prints first and last entries of array y (it's floating point)
# really need signed 16 bit big endian integers for a wave file
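With numpy this is a single cast (a sketch, assuming y stays in [-1, 1]): scale to the 16-bit range and use an explicitly big-endian dtype. Note that standard WAV PCM is little-endian ('<i2'); big-endian 16-bit samples are what AIFF uses, so double-check which byte order the target file really needs.
# '>i2' = big-endian signed 16-bit integer; '<i2' would be little-endian
samples_be = (y * 32767).astype('>i2')
raw_bytes = samples_be.tobytes()  # raw sample bytes, ready to write out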

How to generate a sine wave in python?

I want to generate a CSV file where each column is the data of a sine wave of frequency 1 Hz, 2 Hz, 3 Hz, 4 Hz, 5 Hz, 6 Hz, and 7 Hz. The amplitude is one volt. There should be 100 points in one cycle and thus 700 points in seven waves.
Here is how I will go about it:
import pandas as pd
import numpy as np

freqs = list(range(1, 8))  # 1 Hz through 7 Hz
time = np.linspace(0, 1, 100, endpoint=False)  # 100 points over one second: 100 points per cycle of the 1 Hz wave
data = {f"{freq}Hz": np.sin(2 * np.pi * freq * time) for freq in freqs}
df = pd.DataFrame(data)
df.head()
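To actually write the CSV and eyeball the waves, a short usage sketch ('sine_waves.csv' is a hypothetical filename):
import matplotlib.pyplot as plt

df.to_csv('sine_waves.csv', index=False)  # one column per frequency
df.plot(subplots=True, figsize=(8, 10))   # one panel per frequency column
plt.show()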

FFT plot of raw PCM comes out wrong for higher frequencies in Python

Here I am using numpy's fft function to plot the FFT of a PCM wave generated from a 10000 Hz sine wave, but the amplitude of the plot I am getting is wrong.
The frequency comes out correctly using the fftfreq function, which I print in the console. My Python code is below.
import numpy as np
import matplotlib.pyplot as plt

frate = 44100
filename = 'Sine_10000Hz.bin'  # signed 16-bit PCM of a 10000 Hz sine wave
f = open(filename, 'rb')
y = np.fromfile(f, dtype='int16')  # extract the signed 16-bit PCM values of the 10000 Hz sine wave
f.close()

####### Spectral Analysis #########
fft_value = np.fft.fft(y)
freqs = np.fft.fftfreq(len(fft_value))  # frequencies associated with the coefficients
print("freqs.min(), freqs.max()")
print(freqs.min(), freqs.max())
idx = np.argmax(np.abs(fft_value))  # find the peak in the coefficients
freq = freqs[idx]
freq_in_hertz = abs(freq * frate)
print("\n\n\n\n\n\nfreq_in_hertz")
print(freq_in_hertz)
for i in range(2):
    print("Value at index {}:\t{}".format(i, fft_value[i + 1]),
          "\nValue at index {}:\t{}".format(fft_value.size - 1 - i, fft_value[-1 - i]))
#####
n_sa = 8 * int(freq_in_hertz)
t_fft = np.linspace(0, 1, n_sa)
T = t_fft[1] - t_fft[0]  # sampling interval
N = n_sa  # number of samples
print("\nN value=")
print(N)
# 1/T = frequency
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.xlim(0, 15000)
# 2/N is a normalization factor; the second half of the FFT sequence mirrors the
# first half and carries no new information, so only the first half is plotted.
plt.bar(f[:N // 2], np.abs(fft_value)[:N // 2] * 2 / N, width=15, color="red")
The console output (only the minimal prints are pasted here):
freqs.min(), freqs.max()
-0.5 0.49997732426303854
freq_in_hertz
10000.0
Value at index 0: (19.949569768991054-17.456031216294324j)
Value at index 44099: (19.949569768991157+17.45603121629439j)
Value at index 1: (9.216783424692835-13.477631008179145j)
Value at index 44098: (9.216783424692792+13.477631008179262j)
N value=
80000
The frequency extraction comes out correctly, but something in the plot is wrong and I don't know what.
Updating the work:
When I change the multiplication factor in the line n_sa = 10 * int(freq_in_hertz) from 10 to 5, I get the correct plot, but I am not able to tell whether that is actually correct.
In the line plt.xlim(0,15000), if I increase the max value to 20000 it stops plotting; up to 15000 it plots correctly.
I generated Sine_10000Hz.bin using the Audacity tool: I generated a sine wave of frequency 10000 Hz, 1 second duration, and a sampling rate of 44100, then exported the audio as signed 16-bit headerless (i.e. raw PCM). I was able to regenerate this sine wave using this script, and I also want to calculate its FFT, so I expect a peak at 10000 Hz with amplitude 32767. You can see I changed the multiplication factor from 10 to 8 in the line n_sa = 8 * int(freq_in_hertz); with that it worked, but the amplitude is still showing incorrectly. I will attach my new figure here.
I'm not sure exactly what you are trying to do, but my suspicion is that the Sine_10000Hz.bin file isn't what you think it is.
Is it possible it contains more than one channel (left & right)?
Is it really signed 16-bit integers?
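Both suspicions are quick to check from the file itself (a sketch; one second of mono 16-bit PCM at 44.1 kHz should be exactly 88,200 bytes, and a stereo file twice that):
import os
import numpy as np

print(os.path.getsize('Sine_10000Hz.bin'))  # 88200 -> mono 16-bit, 1 s at 44.1 kHz; 176400 -> stereo
y = np.fromfile('Sine_10000Hz.bin', dtype=np.int16)
print(y.min(), y.max())  # a full-scale sine should span roughly -32767..32767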
It's not hard to create a 10 kHz sine wave in 16-bit integers in numpy.
import numpy as np
import matplotlib.pyplot as plt
n_samples = 2000
f_signal = 10000 # (Hz) Signal Frequency
f_sample = 44100 # (Hz) Sample Rate
amplitude = 2**3 # Arbitrary. Must be > 1. Should be > 2. Larger makes FFT results better
time = np.arange(n_samples) / f_sample # sample times
# The signal
y = (np.sin(time * f_signal * 2 * np.pi) * amplitude).astype('int16')
If you plot 30 points of the signal you can see there are about 5 points per cycle.
plt.plot(time[:30], y[:30], marker='o')
plt.xlabel('Time (s)')
plt.yticks([]); # Amplitude value is artificial. hide it
If you plot 30 samples of the data from Sine_10000Hz.bin does it have about 5 points per cycle?
This is my attempt to recreate the FFT work as I understand it.
fft_value = np.fft.fft(y) # compute the FFT
freqs = np.fft.fftfreq(len(fft_value)) * f_sample # frequencies for each FFT bin
N = len(y)
plt.plot(freqs[:N//2], np.abs(fft_value[:N//2]))
plt.yscale('log')
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
I get the following plot
The y-axis of this plot is on a log scale. Notice that the amplitude of the peak is in the thousands. The amplitude of most of the rest of the data points are around 100.
idx_max = np.argmax(np.abs(fft_value)) # Find the peak in the coefficients
idx_min = np.argmin(np.abs(fft_value)) # Find the minimum in the coefficients
print(f'idx_max = {idx_max}, idx_min = {idx_min}')
print(f'f_max = {freqs[idx_max]}, f_min = {freqs[idx_min]}')
print(f'fft_value[idx_max] {fft_value[idx_max]}')
print(f'fft_value[idx_min] {fft_value[idx_min]}')
produces:
idx_max = 1546, idx_min = 1738
f_max = -10010.7, f_min = -5777.1
fft_value[idx_max] (-4733.232076236707+219.11718299533203j)
fft_value[idx_min] (-0.17017443966211232+0.9557200531465061j)
I'm adding a link to a script I've built that outputs the FFT with the ACTUAL amplitude (for real signals, e.g. your signal). Have a go and see if it works:
dt = 1/frate in your setup....
https://stackoverflow.com/a/53925342/4879610
After a lot of homework I was able to find my issue. As I mentioned under "Updating the work:", the reason was that the number of samples I took was wrong.
I changed the two lines in the code
n_sa = 8 * int(freq_in_hertz)
t_fft = np.linspace(0, 1, n_sa)
to
n_sa = y.size  # number of samples taken directly from the raw 16-bit data
t_fft = np.arange(n_sa) / frate  # divide each sample index by the sampling rate
This solved my issue.
My spectral output is attached.
Special thanks to @meta4 and @YoniChechik for giving me some suggestions.
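For reference, the same check as a self-contained sketch (synthesizing the sine in numpy instead of reading the .bin file): with n_sa = y.size samples and the 2/N normalization from the plot above, the peak lands at 10000 Hz with amplitude near full scale, as expected.
import numpy as np

frate = 44100
t = np.arange(frate) / frate  # 1 second of sample times
y = (32767 * np.sin(2 * np.pi * 10000 * t)).astype(np.int16)

fft_value = np.fft.fft(y)
N = y.size  # n_sa = y.size, as in the fix above
freqs = np.fft.fftfreq(N) * frate
peak = np.argmax(np.abs(fft_value[:N // 2]))
print(freqs[peak], 2 / N * np.abs(fft_value[peak]))  # ~10000.0 Hz, amplitude ~32767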

Tensorflow: Input Pipeline very slow / does not scale

I'm trying to set up a Tensorflow input pipeline for feeding images into an AlexNet for feature extraction (not for training; this is a one-off thing). Since AlexNet is rather small, it is crucial to provide input data at a high rate to achieve acceptable performance (~1000 images/second).
My images are 400x300 JPEGs with 24KB per image on average.
Unfortunately, it seems that the Tensorflow input pipeline can't keep up with a GTX 1080 running the AlexNet.
My input pipeline is simple: load a file, decode the image, resize it and batch them.
I created a small benchmark to show the issue:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import time
import glob
import os

IMAGE_DIR = 'images'
EPOCHS = 1

def main():
    print('batch_size\tnum_threads\tms/image')
    for batch_size in [16, 32, 64, 128]:
        for num_threads in [1, 2, 4, 8]:
            run(batch_size, num_threads)

def run(batch_size, num_threads):
    filenames = glob.glob(os.path.join(IMAGE_DIR, '*.jpg'))
    (filename,) = tf.train.slice_input_producer(
        [filenames],
        capacity=2 * batch_size * num_threads,
        num_epochs=EPOCHS)
    raw = tf.read_file(filename)
    decoded = tf.image.decode_jpeg(raw, channels=3)
    resized = tf.image.resize_images(decoded, [227, 227])
    batch = tf.train.batch(
        [resized],
        batch_size,
        num_threads,
        2 * batch_size * num_threads,
        enqueue_many=True)
    init_op = tf.group(
        tf.global_variables_initializer(),
        tf.local_variables_initializer())
    with tf.Session() as sess:
        sess.run(init_op)
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        t = time.time()
        try:
            while not coord.should_stop():
                sess.run(batch)
        except tf.errors.OutOfRangeError:
            pass
        finally:
            coord.request_stop()
        tpe = (time.time() - t) / (len(filenames) * EPOCHS) * 1000
        print('{: <11}\t{: <10}\t{: <8}'
              .format(batch_size, num_threads, tpe))
        coord.join(threads)

if __name__ == "__main__":
    main()
Running this on a MacBook Pro (early 2015, 2.9 GHz Intel Core i5) yields the following results:
batch_size num_threads ms/image
16 1 4.81571793556
16 2 3.00584602356
16 4 2.94281005859
16 8 2.94555711746
32 1 3.51123785973
32 2 1.82255005836
32 4 1.85884213448
32 8 1.88741898537
64 1 2.9537730217
64 2 1.58108997345
64 4 1.57125210762
64 8 1.57615303993
128 1 2.71797513962
128 2 1.67120599747
128 4 1.6521999836
128 8 1.6885869503
It shows poor overall performance, far from 1 ms per image. Also, it does not scale beyond two threads, which in this case is to be expected since it is a dual-core processor.
Running the same benchmark on a 2.5 GHz AMD Opteron 6180 SE with 24 cores yields the following:
batch_size num_threads ms/image
16 1 13.983194828
16 2 6.80965399742
16 4 6.67097783089
16 8 6.63090395927
32 1 12.0395629406
32 2 5.72535085678
32 4 4.94155502319
32 8 4.99696803093
64 1 10.9073989391
64 2 4.96317911148
64 4 3.76832485199
64 8 3.82816386223
128 1 10.2617599964
128 2 5.20488095284
128 4 3.16122984886
128 8 3.51550602913
Here, too, single-threaded and overall performance is very bad, and it does not scale beyond 2-4 threads.
The systems are neither IO- nor CPU-bound in any of these cases. For both systems, loading and resizing the images with OpenCV gives far better numbers (~0.86 ms/image on the MacBook, which in that case is CPU-bound, and up to ~0.22 ms/image on the server, which in that case is IO-bound).
What's going on with Tensorflow here? How can I speed this up?
I already tried assembling a batch of images manually and using enqueue_many for batching; this made things even worse. I also tried adding a small sleep before running the loop, just to make sure the queues are filled, but no luck.
Any help is greatly appreciated.
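For what it's worth, one commonly suggested alternative (assuming a TensorFlow version that includes the tf.data API) is to replace the queue runners with a tf.data pipeline, which parallelizes the JPEG decode/resize and prefetches batches ahead of the consumer. A minimal sketch, reusing the filenames list from the benchmark above:
import tensorflow as tf

def parse(filename):
    raw = tf.read_file(filename)
    decoded = tf.image.decode_jpeg(raw, channels=3)
    return tf.image.resize_images(decoded, [227, 227])

dataset = (tf.data.Dataset.from_tensor_slices(filenames)
           .map(parse, num_parallel_calls=8)  # decode/resize on 8 threads
           .batch(128)
           .prefetch(2))                      # overlap input with compute
batch = dataset.make_one_shot_iterator().get_next()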
