Create 2D numpy array from buffer - python-3.x

Consider a system with n_channels transmitting n_samples at a given sampling rate. The 1D buffer containing the timestamps and the 2D buffer containing (n_channels, n_samples) are created as:
from ctypes import c_double, c_float
# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
data_buffer = (c_float * n_data_values)()
ts_buffer = (c_double * n_samples)()
I have a C++ binary library that fills the buffers. The call can be summarized as:
from ctypes import byref
fill_buffers(
    byref(data_buffer),
    byref(ts_buffer),
)
At this point, I have 2 filled buffers, one with 2048 elements (timestamps) and one with 3 * 2048 elements (data). I want to load those 2 buffers into numpy arrays as efficiently as possible.
np.frombuffer seems great for reading a 1D array, e.g. the timestamps, but I can't find a counterpart for N-dimensional arrays.
# read from buffer for the 1D array
timestamps = np.frombuffer(ts_buffer) # 192 ns ± 1.11 ns per loop
timestamps = np.array(ts_buffer) # 854 ns ± 2.99 ns per loop
For now, the data array is loaded with:
data = np.array(data_buffer).reshape(-1, n_channels, order="C").T
Any way to use the same efficient method as np.frombuffer while providing the output shape and the order?
This question is different from How can I initialize a NumPy array from a multidimensional buffer? and from How to restore a 2-dimensional numpy.array from a bytestring? since it does not focus on an alternative to np.frombuffer, but on an alternative that is as efficient.
EDIT: Why is np.frombuffer(data_buffer).reshape(-1, n_channels).T not working? With 3 channels and 1024 points (to speed-up my testing), I get len(data_buffer) = 3072, but:
np.array(data_buffer).reshape(-1, 3).T.size = 3072
np.frombuffer(data_buffer).reshape(-1, 3).T.size = 1536
The application is a LabStreamingLayer buffer. The buffer is filled here https://github.com/labstreaminglayer/liblsl-Python/blob/87276974a311bcf7ceb3383e9d04c6bdcf302771/pylsl/pylsl.py#L854-L861
using the C++ library https://github.com/sccn/liblsl with specifically this function https://github.com/sccn/liblsl/blob/08aa186326e9a339316b7d5677ef31b3651b4aad/src/lsl_inlet_c.cpp#L180-L185

Does np.frombuffer(data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T not work correctly? As you are doing it, np.array treats the buffer as a 1D array until you reshape it anyway. Regarding your EDIT: np.frombuffer defaults to dtype=float64, so without dtype=c_float it reads every 8 bytes of the float32 buffer as a single value, which is exactly why you get 1536 elements instead of 3072.
For me the following code produces the right shapes. (Hard to verify if it works correctly without a MWE for the data that should be in the buffers).
import numpy as np
from ctypes import c_double, c_float
# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
data_buffer = (c_float * n_data_values)() # Note that c_float is typically 32 bits while c_double and numpy's default float64 are 64 bits
ts_buffer = (c_double * n_samples)()
# Create a mock buffer
input_data = np.arange(0, n_data_values, dtype=c_float)
input_data_buffer = input_data.tobytes()
timestamps = np.frombuffer(ts_buffer)
# Note: you must specify the data type for the array of floats
data = np.frombuffer(input_data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T
# data has values 0,1,2 for first time point, 3,4,5 for second, and so on
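If you also want to avoid the copy entirely, np.ctypeslib.as_array can wrap the ctypes buffers as views (a minimal sketch under the question's setup; the reshape and transpose are views too, so no data is duplicated):
import numpy as np
from ctypes import c_double, c_float
n_channels, n_samples = 3, 2048
data_buffer = (c_float * (n_channels * n_samples))()
ts_buffer = (c_double * n_samples)()
# Zero-copy views over the ctypes memory
timestamps = np.ctypeslib.as_array(ts_buffer)  # shape (2048,), dtype float64
data = np.ctypeslib.as_array(data_buffer).reshape(-1, n_channels).T  # shape (3, 2048), dtype float32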

Related

What is the fastest way to generate / cast a float32 matrix from an uint8 or uint16 C-Pointer for pytorch?

I am currently trying to access multiple live network cameras for machine learning.
The problem is that each camera has a resolution of ~ 20 MPix and therefore produces large amounts of data which need preprocessing before the next frame arrives.
The SDK gives access to the raw C pointer of camera to speed up data acquisition.
The pointer is either uint8 or uint16 per Pixel, depending on the camera model and settings.
The goal is to extract the image from the data and provide it as float32 pytorch tensor.
My main problem is that my current pipeline, especially the conversion from uint8 to float32, is the main bottleneck limiting the framerate.
Currently I have to use numpy as a crutch to extract the buffer data, as pytorch is unable to work with uint16 buffers.
Afterwards, I typecast the uint8/16 numpy matrix into a preallocated float32 matrix, converting the data types while avoiding the malloc overhead of .astype.
Using torch.from_numpy() and an element-wise copy, I transfer the data to the shared tensor.
I attached a code snippet as an example where the camera data is replaced by the data of a numpy array.
Just the conversion alone takes from 24 to 25ms, with additional overhead, e.g. transferring data to gpu this results in massive reductions in framerate.
Is there a way to speed this whole process up?
import numpy as np
import torch
import time
# random image data from the camera SDK
pdata = np.random.randint(0, 65535, (5000, 4000, 3), dtype=np.uint16).data
# preallocate memory for typecast to float
float_frame = np.zeros((5000, 4000, 3), dtype=np.float32)
# preallocate memory for "shared tensor"
torch_frame = torch.zeros((5000, 4000, 3), dtype=torch.float32, requires_grad=False, device='cpu')
torch_frame.share_memory_()
# example loop for converting a frame
n = 100 # number of iterations
start = time.perf_counter()
for i in range(n):
    # get data from buffer
    frame = np.ctypeslib.as_array(pdata, (5000, 4000, 3))
    # convert uint8/uint16 to float32
    float_frame[:] = frame
    # move into shared tensor
    torch_frame[:] = torch.from_numpy(float_frame)
    # normalize float 0 to 1.0
    torch_frame = torch_frame / 65535
time_delta = time.perf_counter() - start
print(f"Time for N: {n} loops: {time_delta:.5}s. With {time_delta / n:.5} seconds per loop.")

FFT plot of raw PCM comes wrong for higher frequency in python

Here I am using fft function of numpy to plot the fft of PCM wave generated from a 10000Hz sine wave. But the amplitude of the plot I am getting is wrong.
The frequency comes out correct using the fftfreq function, which I print in the console. My Python code is here.
import numpy as np
import matplotlib.pyplot as plt
frate = 44100
filename = 'Sine_10000Hz.bin' #signed16 bit PCM of a 10000Hz sine wave
f = open('Sine_10000Hz.bin','rb')
y = np.fromfile(f,dtype='int16') #Extract the signed 16 bit PCM value of 10000Hz Sine wave
f.close()
####### Spectral Analysis #########
fft_value = np.fft.fft(y)
freqs = np.fft.fftfreq(len(fft_value)) # frequencies associated with the coefficients:
print("freqs.min(), freqs.max()")
idx = np.argmax(np.abs(fft_value)) # Find the peak in the coefficients
freq = freqs[idx]
freq_in_hertz = abs(freq * frate)
print("\n\n\n\n\n\nfreq_in_hertz")
print(freq_in_hertz)
for i in range(2):
    print("Value at index {}:\t{}".format(i, fft_value[i + 1]), "\nValue at index {}:\t{}".format(fft_value.size - 1 - i, fft_value[-1 - i]))
#####
n_sa = 8 * int(freq_in_hertz)
t_fft = np.linspace(0, 1, n_sa)
T = t_fft[1] - t_fft[0] # sampling interval
N = n_sa #Here it is n_sample
print("\nN value=")
print(N)
# 1/T = frequency
f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.xlim(0,15000)
# 2 / N is a normalization factor. For a real signal the second half of the FFT gives no new information, so the first half of the sequence is all we need.
plt.bar(f[:N // 2], np.abs(fft_value)[:N // 2] * 2 / N, width=15,color="red")
Output comes in the console (Only minimal prints I am pasting here)
freqs.min(), freqs.max()
-0.5 0.49997732426303854
freq_in_hertz
10000.0
Value at index 0: (19.949569768991054-17.456031216294324j)
Value at index 44099: (19.949569768991157+17.45603121629439j)
Value at index 1: (9.216783424692835-13.477631008179145j)
Value at index 44098: (9.216783424692792+13.477631008179262j)
N value=
80000
The frequency extraction comes out correctly, but something in the plot is wrong and I don't know what.
Updating the work:
When I change the multiplication factor in the line n_sa = 10 * int(freq_in_hertz) from 10 to 5, I get a correct-looking plot; whether it is actually correct I am not able to tell.
In the line plt.xlim(0,15000), if I increase the max value to 20000 it stops plotting; up to 15000 it plots correctly.
I generated this Sine_10000Hz.bin using the Audacity tool, where I generated a sine wave of frequency 10000 Hz, 1 sec duration, and a sampling rate of 44100. Then I exported this audio as signed 16-bit headerless (i.e. raw PCM). I was able to regenerate this sine wave using this script, and I also want to calculate its FFT. So I expect a peak at 10000 Hz with amplitude 32767. You can see I changed the multiplication factor to 8 instead of 10 in the line n_sa = 8 * int(freq_in_hertz); with that it worked, but the amplitude shown is incorrect. I will attach my new figure here.
I'm not sure exactly what you are trying to do, but my suspicion is that the Sine_10000Hz.bin file isn't what you think it is.
Is it possible it contains more than one channel (left & right)?
Is it really signed 16 bit integers?
It's not hard to create a 10kHz sine wave in 16 bit integers in numpy.
import numpy as np
import matplotlib.pyplot as plt
n_samples = 2000
f_signal = 10000 # (Hz) Signal Frequency
f_sample = 44100 # (Hz) Sample Rate
amplitude = 2**3 # Arbitrary. Must be > 1. Should be > 2. Larger makes FFT results better
time = np.arange(n_samples) / f_sample # sample times
# The signal
y = (np.sin(time * f_signal * 2 * np.pi) * amplitude).astype('int16')
If you plot 30 points of the signal you can see there are about 5 points per cycle.
plt.plot(time[:30], y[:30], marker='o')
plt.xlabel('Time (s)')
plt.yticks([]); # Amplitude value is artificial. hide it
If you plot 30 samples of the data from Sine_10000Hz.bin does it have about 5 points per cycle?
This is my attempt to recreate the FFT work as I understand it.
fft_value = np.fft.fft(y) # compute the FFT
freqs = np.fft.fftfreq(len(fft_value)) * f_sample # frequencies for each FFT bin
N = len(y)
plt.plot(freqs[:N//2], np.abs(fft_value[:N//2]))
plt.yscale('log')
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
I get the following plot
The y-axis of this plot is on a log scale. Notice that the amplitude of the peak is in the thousands. The amplitude of most of the rest of the data points are around 100.
idx_max = np.argmax(np.abs(fft_value)) # Find the peak in the coefficients
idx_min = np.argmin(np.abs(fft_value)) # Find the minimum in the coefficients
print(f'idx_max = {idx_max}, idx_min = {idx_min}')
print(f'f_max = {freqs[idx_max]}, f_min = {freqs[idx_min]}')
print(f'fft_value[idx_max] {fft_value[idx_max]}')
print(f'fft_value[idx_min] {fft_value[idx_min]}')
produces:
idx_max = 1546, idx_min = 1738
f_max = -10010.7, f_min = -5777.1
fft_value[idx_max] (-4733.232076236707+219.11718299533203j)
fft_value[idx_min] (-0.17017443966211232+0.9557200531465061j)
I'm adding a link to a script I've built that outputs the FFT with the ACTUAL amplitude (for real signals, e.g. your signal). Have a go and see if it works; use dt = 1/frate in your case.
https://stackoverflow.com/a/53925342/4879610
After a lot of homework I was able to find my issue. As I mentioned in "Updating the work", the reason was that the number of samples I took was wrong.
I changed the two lines in the code
n_sa = 8 * int(freq_in_hertz)
t_fft = np.linspace(0, 1, n_sa)
to
n_sa = y.size  # number of samples, taken directly from the raw 16-bit data
t_fft = np.arange(n_sa) / frate  # divide each sample index by the sampling rate
This solved my issue.
My spectral output now shows the expected peak.
Special thanks to @meta4 and @YoniChechik for giving me some suggestions.
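For reference, a minimal sketch of the corrected normalization (assuming a mono, headerless int16 file at 44100 Hz): for a pure sine of amplitude A, the FFT magnitude peaks at A * N / 2, so scaling by 2 / N recovers the time-domain amplitude (~32767 here).
import numpy as np
frate = 44100
y = np.fromfile('Sine_10000Hz.bin', dtype='int16')  # raw mono PCM
N = y.size
fft_value = np.fft.fft(y)
freqs = np.fft.fftfreq(N) * frate
amplitude = 2 / N * np.abs(fft_value)  # 2/N scaling recovers the sine amplitude
idx = np.argmax(amplitude[:N // 2])
print(freqs[idx], amplitude[idx])  # expect ~10000.0 and ~32767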

Tensorflow Endless eval [duplicate]

I am loading the CIFAR-10 data set. The method adds the data to a tensor array, so to access the data I used .eval() with a session. On a normal tf constant it returns the value, but on the labels and the train set, which are tf arrays, it won't.
1- I am using docker tensorflow-jupyter
2- it uses Python 3
3- the batch file must be added to the data folder
I am using the first batch [data_batch_1.bin] from this file:
http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
As notebook:
https://drive.google.com/open?id=0B_AFMME1kY1obkk1YmJHcjV0ODA
The code (as on the TensorFlow site but modified to read 1 batch; check the last 7 lines for the data loading):
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import urllib
import tensorflow as tf
from six.moves import xrange # pylint: disable=redefined-builtin
# Global constants describing the CIFAR-10 data set.
NUM_CLASSES = 10
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 5000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 1000
IMAGE_SIZE = 32
def _generate_image_and_label_batch(image, label, min_queue_examples,
                                    batch_size, shuffle):
    """Construct a queued batch of images and labels.
    Args:
        image: 3-D Tensor of [height, width, 3] of type.float32.
        label: 1-D Tensor of type.int32
        min_queue_examples: int32, minimum number of samples to retain
            in the queue that provides of batches of examples.
        batch_size: Number of images per batch.
        shuffle: boolean indicating whether to use a shuffling queue.
    Returns:
        images: Images. 4D tensor of [batch_size, height, width, 3] size.
        labels: Labels. 1D tensor of [batch_size] size.
    """
    # Create a queue that shuffles the examples, and then
    # read 'batch_size' images + labels from the example queue.
    num_preprocess_threads = 2
    if shuffle:
        images, label_batch = tf.train.shuffle_batch(
            [image, label],
            batch_size=batch_size,
            num_threads=num_preprocess_threads,
            capacity=min_queue_examples + 3 * batch_size,
            min_after_dequeue=min_queue_examples)
    else:
        images, label_batch = tf.train.batch(
            [image, label],
            batch_size=batch_size,
            num_threads=num_preprocess_threads,
            capacity=min_queue_examples + 3 * batch_size)
    # Display the training images in the visualizer.
    tf.image_summary('images', images)
    return images, tf.reshape(label_batch, [batch_size])
def read_cifar10(filename_queue):
    """Reads and parses examples from CIFAR10 data files.
    Recommendation: if you want N-way read parallelism, call this function
    N times. This will give you N independent Readers reading different
    files & positions within those files, which will give better mixing of
    examples.
    Args:
        filename_queue: A queue of strings with the filenames to read from.
    Returns:
        An object representing a single example, with the following fields:
            height: number of rows in the result (32)
            width: number of columns in the result (32)
            depth: number of color channels in the result (3)
            key: a scalar string Tensor describing the filename & record number
                for this example.
            label: an int32 Tensor with the label in the range 0..9.
            uint8image: a [height, width, depth] uint8 Tensor with the image data
    """
    class CIFAR10Record(object):
        pass
    result = CIFAR10Record()
    # Dimensions of the images in the CIFAR-10 dataset.
    # See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
    # input format.
    label_bytes = 1  # 2 for CIFAR-100
    result.height = 32
    result.width = 32
    result.depth = 3
    image_bytes = result.height * result.width * result.depth
    # Every record consists of a label followed by the image, with a
    # fixed number of bytes for each.
    record_bytes = label_bytes + image_bytes
    # Read a record, getting filenames from the filename_queue. No
    # header or footer in the CIFAR-10 format, so we leave header_bytes
    # and footer_bytes at their default of 0.
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
    result.key, value = reader.read(filename_queue)
    # Convert from a string to a vector of uint8 that is record_bytes long.
    record_bytes = tf.decode_raw(value, tf.uint8)
    # The first bytes represent the label, which we convert from uint8->int32.
    result.label = tf.cast(
        tf.slice(record_bytes, [0], [label_bytes]), tf.int32)
    # The remaining bytes after the label represent the image, which we reshape
    # from [depth * height * width] to [depth, height, width].
    depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
                             [result.depth, result.height, result.width])
    # Convert from [depth, height, width] to [height, width, depth].
    result.uint8image = tf.transpose(depth_major, [1, 2, 0])
    return result
def inputs(eval_data, data_dir, batch_size):
    """Construct input for CIFAR evaluation using the Reader ops.
    Args:
        eval_data: bool, indicating if one should use the train or eval data set.
        data_dir: Path to the CIFAR-10 data directory.
        batch_size: Number of images per batch.
    Returns:
        images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
        labels: Labels. 1D tensor of [batch_size] size.
    """
    filenames = []
    filenames.append(os.path.join(data_dir, 'data_batch_1.bin'))
    num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
    print(filenames)
    # Create a queue that produces the filenames to read.
    filename_queue = tf.train.string_input_producer(filenames)
    # Read examples from files in the filename queue.
    read_input = read_cifar10(filename_queue)
    reshaped_image = tf.cast(read_input.uint8image, tf.float32)
    height = IMAGE_SIZE
    width = IMAGE_SIZE
    # Image processing for evaluation.
    # Crop the central [height, width] of the image.
    resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
                                                           width, height)
    # Subtract off the mean and divide by the variance of the pixels.
    float_image = tf.image.per_image_whitening(resized_image)
    # Ensure that the random shuffling has good mixing properties.
    min_fraction_of_examples_in_queue = 0.4
    min_queue_examples = int(num_examples_per_epoch *
                             min_fraction_of_examples_in_queue)
    # Generate a batch of images and labels by building up a queue of examples.
    return _generate_image_and_label_batch(float_image, read_input.label,
                                           min_queue_examples, batch_size,
                                           shuffle=False)

sess = tf.InteractiveSession()
train_data, train_labels = inputs(False, "data", 6000)
print(train_data, train_labels)
train_data = train_data.eval()
train_labels = train_labels.eval()
print(train_data)
print(train_labels)
sess.close()
You must call tf.train.start_queue_runners(sess) before you call train_data.eval() or train_labels.eval().
This is a(n unfortunate) consequence of how TensorFlow input pipelines are implemented: the tf.train.string_input_producer(), tf.train.shuffle_batch(), and tf.train.batch() functions internally create queues that buffer records between different stages in the input pipeline. The tf.train.start_queue_runners() call tells TensorFlow to start fetching records into these buffers; without calling it the buffers remain empty and eval() hangs indefinitely.
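A minimal sketch of the fix (assuming the legacy TF1 queue-based pipeline from the question; the Coordinator is optional but gives a clean shutdown):
sess = tf.InteractiveSession()
train_data, train_labels = inputs(False, "data", 6000)
# Start the queue-runner threads so the input-pipeline buffers fill up;
# without this, eval() blocks forever on empty queues.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
train_data_np = train_data.eval()
train_labels_np = train_labels.eval()
coord.request_stop()
coord.join(threads)
sess.close()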

How does the shuffle = 'batch' argument of the .fit() method work in the background?

When I train the model using the .fit() method, the shuffle argument is preset to True.
Let's say that my dataset has 100 samples and that the batch size is 10. When I set shuffle = True, keras first randomly shuffles the samples (so the 100 samples now have a different order), and on the new order it starts creating the batches: batch 1: 1-10, batch 2: 11-20, etc.
If I set shuffle = 'batch' how is it supposed to work in the background? Intuitively and using the previous example of 100 samples dataset with batch size = 10 my guess would be that keras first allocates the samples to the batches (i.e. batch 1: samples 1-10 following the dataset original order, batch 2: 11-20 following the dataset original order as well, batch 3 ... so on so forth) and then shuffles the order of the batches. So the model now will be trained on the randomly ordered batches say for example: 3 (contains samples 21 - 30), 4 (contains samples 31 - 40), 7 (contains samples 61 - 70), 1 (contains samples 1 - 10), ... (I made up the order of the batches).
Is my thinking right or am I missing something?
Thanks!
Looking at the implementation at this link (line 349 of training.py), the answer seems to be yes.
Try this code for checking:
import numpy as np
def batch_shuffle(index_array, batch_size):
    """Shuffles an array in a batch-wise fashion.
    Useful for shuffling HDF5 arrays
    (where one cannot access arbitrary indices).
    # Arguments
        index_array: array of indices to be shuffled.
        batch_size: integer.
    # Returns
        The `index_array` array, shuffled in a batch-wise fashion.
    """
    batch_count = int(len(index_array) / batch_size)
    # to reshape we need to be cleanly divisible by batch size
    # we stash extra items and reappend them after shuffling
    last_batch = index_array[batch_count * batch_size:]
    index_array = index_array[:batch_count * batch_size]
    index_array = index_array.reshape((batch_count, batch_size))
    np.random.shuffle(index_array)
    index_array = index_array.flatten()
    return np.append(index_array, last_batch)
x = np.array(range(100))
x_s = batch_shuffle(x,10)
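Printing the first entries makes the batch-wise shuffle visible: each run of 10 consecutive indices stays intact while the runs themselves are reordered (the exact output depends on the random seed):
print(x_s[:20])
# e.g. [30 31 32 33 34 35 36 37 38 39 70 71 72 73 74 75 76 77 78 79]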

Python Librosa : What is the default frame size used to compute the MFCC features?

Using the Librosa library, I generated the MFCC features of a 1319-second audio file into a matrix of 20 x 56829. The 20 here represents the number of MFCC features (which I can adjust manually). But I don't know how it segmented the audio into 56829 frames. What frame size does it use to process the audio?
import numpy as np
import matplotlib.pyplot as plt
import librosa
def getPathToGroundtruth(episode):
    """Return path to groundtruth file for episode"""
    pathToGroundtruth = "../../../season01/Audio/" \
        + "Season01.Episode%02d.en.wav" % episode
    return pathToGroundtruth

def getduration(episode):
    pathToAudioFile = getPathToGroundtruth(episode)
    y, sr = librosa.load(pathToAudioFile)
    duration = librosa.get_duration(y=y, sr=sr)
    return duration

def getMFCC(episode):
    filename = getPathToGroundtruth(episode)
    y, sr = librosa.load(filename)  # Y gives
    data = librosa.feature.mfcc(y=y, sr=sr)
    return data
data = getMFCC(1)
Short Answer
You can change the length by adjusting the parameters used in the STFT calculation. The following code will double the size of your output (20 x 113658):
data = librosa.feature.mfcc(y=y, sr=sr, n_fft=1024, hop_length=256, n_mfcc=20)
Long Answer
Librosa's librosa.feature.mfcc() function really just acts as a wrapper to librosa's librosa.feature.melspectrogram() function (which is a wrapper to librosa.core.stft and librosa.filters.mel functions).
All of the parameters pertaining to segmentation of the audio signal - namely the frame and overlap values - are specified and utilized in the Mel-scaled power spectrogram function (with other tunable parameters specified for nested core functions). You specify these parameters as keyword arguments in the librosa.feature.mfcc() function.
All extra **kwargs parameters are fed to librosa.feature.melspectrogram() and subsequently to librosa.filters.mel()
By default, the Mel-scaled power spectrogram window and hop length are the following:
n_fft=2048
hop_length=512
So assuming you used the default sample rate (sr=22050), the output of your mfcc function makes sense:
output length = (seconds) * (sample rate) / (hop_length)
(1319) * (22050) / (512) ≈ 56804 frames
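As a quick sanity check of that formula (a sketch, assuming the defaults above; librosa's center-padding adds one extra frame):
import numpy as np
import librosa
sr = 22050
y = np.zeros(30 * sr, dtype=np.float32)  # 30 s of silence
mfcc = librosa.feature.mfcc(y=y, sr=sr)  # defaults: n_fft=2048, hop_length=512
print(mfcc.shape)  # (20, 1 + 30 * sr // 512) == (20, 1292)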
The parameters that you are able to tune are the following:
Melspectrogram Parameters
-------------------------
y : np.ndarray [shape=(n,)] or None
    audio time-series
sr : number > 0 [scalar]
    sampling rate of `y`
S : np.ndarray [shape=(d, t)]
    power spectrogram
n_fft : int > 0 [scalar]
    length of the FFT window
hop_length : int > 0 [scalar]
    number of samples between successive frames.
    See `librosa.core.stft`
kwargs : additional keyword arguments
    Mel filter bank parameters.
    See `librosa.filters.mel` for details.
If you want to further specify characteristics of the mel filterbank used to define the Mel-scaled power spectrogram, you can tune the following
Mel Frequency Parameters
------------------------
sr : number > 0 [scalar]
    sampling rate of the incoming signal
n_fft : int > 0 [scalar]
    number of FFT components
n_mels : int > 0 [scalar]
    number of Mel bands to generate
fmin : float >= 0 [scalar]
    lowest frequency (in Hz)
fmax : float >= 0 [scalar]
    highest frequency (in Hz).
    If `None`, use `fmax = sr / 2.0`
htk : bool [scalar]
    use HTK formula instead of Slaney
Documentation for Librosa:
librosa.feature.melspectrogram
librosa.filters.mel
librosa.core.stft
