Understanding audio file spectrogram values

I am currently struggling to understand how the power spectrum is stored in the Kaldi framework.
I seem to have successfully created some data files using:
$cmd JOB=1:$nj $logdir/spect_${name}.JOB.log \
compute-spectrogram-feats --verbose=2 \
scp,p:$logdir/wav_spect_${name}.JOB.scp ark:- \| \
copy-feats --compress=$compress $write_num_frames_opt ark:- \
ark,scp:$specto_dir/raw_spectogram_$name.JOB.ark,$specto_dir/raw_spectogram_$name.JOB.scp
This gives me a large file with data points for different audio files.
The problem is that I am not sure how I should interpret this data set. I know that an FFT is performed prior to this, which I guess is a good thing.
The output example given above is from a file which is 1 second long.
All the defaults were used for computing the spectrogram, so the sample frequency should be 16 kHz, the frame length 25 ms and the frame shift 10 ms.
The number of data points in the first set is 25186.
Given this information, can I interpret the output in some way?
Usually when one performs an FFT, the frequency bin size can be extracted as F_s/N = bin_size, where F_s is the sample frequency and N is the FFT length. So is this the same case? 16000/25186 = 0.6... Hz/bin?
Or am I interpreting it incorrectly?

Usually when one performs an FFT, the frequency bin size can be extracted as F_s/N = bin_size, where F_s is the sample frequency and N is the FFT length.
So is this the same case? 16000/25186 = 0.6... Hz/bin?
The formula F_s/N is indeed what you would use to compute the frequency bin size. However, as you mention, N is the FFT length, not the total number of samples. Based on the 25 ms frame length, the 10 ms hop size and the fact that your generated output data file has 98 frames of 257 values each (98 × 257 = 25186) for a real-valued input, it would seem that the FFT length used was 512. This gives a frequency bin size of 16000/512 = 31.25 Hz/bin.
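As a sanity check, the frame and bin counts implied by those defaults can be reproduced with a short sketch (the parameter values are the Kaldi defaults quoted above; the only-full-windows frame count and the round-up-to-power-of-two FFT length are assumptions about how the features were computed):

```python
fs = 16000                      # sample rate (Hz)
frame_len = int(0.025 * fs)     # 25 ms window -> 400 samples
frame_shift = int(0.010 * fs)   # 10 ms hop   -> 160 samples
num_samples = fs * 1            # a 1 second file

# Count only full windows (snip-edges behaviour)
num_frames = (num_samples - frame_len) // frame_shift + 1   # 98

# FFT length: the window length rounded up to the next power of two
fft_len = 1
while fft_len < frame_len:
    fft_len *= 2                # 512

num_bins = fft_len // 2 + 1     # 257 bins for a real-valued signal
bin_size = fs / fft_len         # 31.25 Hz per bin

print(num_frames, num_bins, num_frames * num_bins, bin_size)
```

So 98 frames × 257 bins = 25186 values, matching the size of the data set in the question.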
Based on this scaling, plotting your raw data with the following Matlab script (with the data previously loaded in the Z matrix):
fs = 16000; % 16 kHz sampling rate
hop_size = 0.010; % 10 millisecond
[X,Y]=meshgrid([0:size(Z,1)-1]*hop_size, [0:size(Z,2)-1]*fs/512);
surf(X,Y,transpose(Z),'EdgeColor','None','facecolor','interp');
view(2);
xlabel('Time (seconds)');
ylabel('Frequency (Hz)');
gives this graph (the dark red regions are the areas of highest intensity):

Related

Splitfolders without shuffling data - How to split a dataset among train, test and validation folders on disk without shuffling the data (python)?

I am working on a deep learning model that uses a large amount of time series related data. As the data is too big to be loaded in RAM at once, I will use keras train_on_batch to train the model reading data from disk.
I am looking for a simple and fast process to split the data among train, validation and test folders.
I've tried the "splitfolders" function, but could not deactivate the data shuffling (which is inappropriate for time-series data). The arguments in the function's documentation do not include an option to turn shuffling on/off.
Code I've tried:
import splitfolders
input_folder = r"E:\Doutorado\apagar"
splitfolders.ratio(input_folder, output = r'E:\Doutorado\apagardivididos', ratio=(0.7, 0.2, 0.1),
group_prefix=None)
The resulting split is shuffled, but this shuffling is a problem for my time-series analysis...
source: https://pypi.org/project/split-folders/
splitfolders.ratio("input_folder", output="output",
seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False) # default values
Usage:
splitfolders [--output] [--ratio] [--fixed] [--seed] [--oversample] [--group_prefix] [--move] folder_with_images
Options:
--output path to the output folder. defaults to output. Get created if non-existent.
--ratio the ratio to split. e.g. for train/val/test .8 .1 .1 -- or for train/val .8 .2 --.
--fixed set the absolute number of items per validation/test set. The remaining items constitute
the training set. e.g. for train/val/test 100 100 or for train/val 100.
Set 3 values, e.g. 300 100 100, to limit the number of training values.
--seed set seed value for shuffling the items. defaults to 1337.
--oversample enable oversampling of imbalanced datasets, works only with --fixed.
--group_prefix split files into equally-sized groups based on their prefix
--move move the files instead of copying
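Since the documented options above offer no way to disable shuffling, one workaround is a small manual split that preserves file order. This is only a sketch: `split_no_shuffle` is a hypothetical helper, not part of the splitfolders API, and it assumes each class sits in its own subfolder and that sorting filenames reproduces the time order.

```python
import shutil
from pathlib import Path

def split_no_shuffle(input_folder, output_folder, ratios=(0.7, 0.2, 0.1)):
    """Copy files into train/val/test folders in sorted (time) order."""
    input_folder, output_folder = Path(input_folder), Path(output_folder)
    for class_dir in sorted(p for p in input_folder.iterdir() if p.is_dir()):
        files = sorted(class_dir.iterdir())  # chronological if names sort by time
        n = len(files)
        # Cut points: first 70% -> train, next 20% -> val, rest -> test
        cuts = [round(n * ratios[0]), round(n * (ratios[0] + ratios[1])), n]
        start = 0
        for name, end in zip(("train", "val", "test"), cuts):
            dest = output_folder / name / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in files[start:end]:
                shutil.copy2(f, dest / f.name)
            start = end
```

Because the files are taken in sorted order, the test split always contains the most recent data, which is usually what a time-series evaluation needs.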

Strategy for data handling/custom colors in ShaderMaterial

I need strategic help to get going.
I need to animate a large network of THREE.LineSegments (about 150k segments, static) with custom (vertex) colors (RGBA). For each segment, I have 24 measured values over 7 days [so 5.04 × 10^7 measured values, or a 2.016 × 10^8-element vec4 color buffer, about 770 MB as a Float32Array].
The animation is going through each hour for each day in 2.5 second steps and needs to apply an interpolated color to each segment on a per-frame-basis (via time delta). To be able to apply an alpha value to the vertex colors I need to use a THREE.ShaderMaterial with a vec4 color attribute.
What I can't get my head around is how best to handle that amount of data per vertex. Some ideas are to:
calculate the RGBA values in render loop (between current color array and the one for the coming hour via interpolation with a time delta) and update the color buffer attribute[I expect a massive drop in framerate]
have a currentColor and a nextColor buffer attribute (current hour and next hour), upload both anew at every step (2.5 s) to the GPU and do the interpolation in the shader (with an additional time-delta uniform) [seems the best option to me]
upload all data to the GPU initially, either with multiple custom attributes or one very large buffer array, and do the iteration and interpolation in the shader[might be even better if possible; I know I can set the offset for the buffer array to read in for each vertex, but not sure if that works as I think it does...]
do something in between, like upload the data for one day in a chunk instead of either all data or hourly data
Do the scenario and the ideas make sense? If so:
What would be the most suitable way of doing this?
I appreciate any additional advice.

Creating a measure that combines a percentage with a low decimal number?

I'm working on a project in Tableau (which uses functions very similar to Excel, if that helps) where I need a single measurement derived from two different measurements, one of which is a low decimal number (2.95 on the high end, 0.00667 on the low) and the other is a percentage (ranging from 29.8 to 100 percent).
Put another way, I have two tables detailing bus punctuality -- one is for high frequency routes and measured in Excess Waiting Time (EWT, in minutes), the other for low frequency routes and measured in terms of percent on time. I have a map of all the routes, and want to colour the lines based on how punctual that route is (thinner lines for routes with a low EWT or a high percentage on time; thicker lines for routes with high EWT or low percentage on time). In preparation for this, I've combined both tables and zero'd out the non-existent value.
I thought I'd do something like log(EWT + PercentOnTime), but am realizing that might not give the value I'm wanting (especially because I ultimately need an inverse of one or the other, since low EWT is favourable and high % on time favourable).
Any idea how I'd do this? Thanks!
If you are combining/comparing the metrics in an even manner and the data is relatively linear then all you need to do is normalise them.
If you have the EWT expected ranges (eg. 0.00667 to 2.95). Then a 2 would be
(2 - 0.00667)/(2.95 - 0.00667) = 0.67723, but because EWT is semantically the inverse of punctuality we need to use 1 - 0.67723 = 0.32277.
If you do the same for the Punctuality percentage range:
Eg. 80%
(80 - 29.8)/(100 - 29.8) = 0.7151
You can compare these metrics because they are normalised (between 0 and 1; multiply by 100 to get percentages), assuming the underlying metrics (1 - EWT) and on-time percentage (OTP) are analogous.
Thus you can combine these into a single table. You will want to ignore all zero'd values as this is actually an indication you have no data at these points.
You'll have to use an if statement to say something like:
IF [OTP] > 0 THEN ([OTP] - 29.8)/(100 - 29.8) ELSE 1 - (([EWT] - 0.00667)/(2.95 - 0.00667)) END
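The same calculation can be sketched outside Tableau; this hypothetical `punctuality_score` helper just restates the normalisation above, with the ranges quoted in the question:

```python
def punctuality_score(ewt=None, otp=None):
    """Map either metric onto a shared 0-1 scale where higher = more punctual."""
    if otp is not None and otp > 0:
        return (otp - 29.8) / (100 - 29.8)
    # EWT is inversely related to punctuality, so flip the normalised value
    return 1 - (ewt - 0.00667) / (2.95 - 0.00667)

print(punctuality_score(ewt=2))   # about 0.3228
print(punctuality_score(otp=80))  # about 0.7151
```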
Hope this helps.

Packet profile from netflow

I have NetFlow data from the previous month in files covering 5 minutes each, and I would like to build a packet profile of all this traffic. I need a percentage representation of 1-packet flows, 2-packet flows, etc. It could be done in categories like 1 packet, 1-100 packets, 100 and more... it's not so important. But my question is how to do it. How do I compute a percentage representation of data which I can't add together? Something like computing a percentage representation for every file and then taking some type of average?
What do you mean by "I can't add together"? You can actually do that with nfdump if you look at the manual: -R expr /dir/file1:file2 reads all files from file1 to file2. For instance
nfdump -R /yournetflowfolder/nfcapd.201204051609:nfcapd.201204051639
will gather NetFlow information from 16:09 to 16:39. Then you can run whatever query you need on that data.
It sounds like you're describing a histogram: You create 'bins' of the size you describe with the raw counts. The sum of the counts for the bins is the total number of sessions. To get the percentages of the total traffic, you just normalize by dividing each bin by the total flow count.
So, if you do a two-bin histogram where the first bin is the count of all sessions with < 100 packet flows and the other 100+ packet flows (note that there can't be gaps or overlaps), and it works out to 30 flows in the former and 60 in the latter, then the total number of flows is 90, and you have 33% of the flows being fewer than 100 packets.
When working with multiple files, the trick is to always use the same bin delineations and to store and work with the raw counts as long as possible and only derive the %s as the very last step. You can add together histograms with no trouble as long as their bins mean the same thing, and then when you normalize the result, you have for each bin the total percent for all files. If you're going to need to add a file, just keep track of the raw counts so that you can re-normalize when there's new data.
You can do this in a tool like Matlab pretty easily, but be careful because many of these tools will very kindly auto-determine bin widths for you. So, the histogram for one file might have bins {x < 100, 100 <= x < 200, x >= 200} and another file, {x < 90, 90 <= x < 180, x >=180} and you won't be able to add the results together.
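The add-raw-counts-then-normalise workflow can be sketched with NumPy; the packet counts below are made-up stand-ins for what you would extract from each 5-minute nfdump file, and the three bins mirror the 1 / 2-100 / 100+ categories from the question:

```python
import numpy as np

bin_edges = [1, 2, 101, np.inf]        # 1 packet, 2-100 packets, >100 packets
flows_per_file = [
    [1, 1, 2, 50, 200],                # packets per flow, file 1 (made up)
    [1, 3, 3, 150],                    # file 2 (made up)
]

# Sum raw counts across files, always using identical bin edges
total = np.zeros(len(bin_edges) - 1, dtype=int)
for flows in flows_per_file:
    counts, _ = np.histogram(flows, bins=bin_edges)
    total += counts

# Normalise only at the very end
percentages = 100 * total / total.sum()
print(total, percentages)              # [3 4 2] -> ~[33.3 44.4 22.2] %
```

Adding a new file later only requires histogramming it with the same edges, adding its counts to `total`, and recomputing the percentages.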

splitting a flac image into tracks

This is a follow up question to Flac samples calculation.
Do I apply the offset generated by that formula from the beginning of the file, or from the point after the metadata where the stream starts?
My goal is to programmatically divide the file myself - largely as a learning exercise. My thought is that I would write down my flac header and metadata blocks based on values learned from the image and then the actual track I get from the master image using my cuesheet.
Currently in my code I can parse each metadata block and end up where the frames start.
Suppose you are trying to decode starting at M:S.F = 3:45.30. There are 75 frames (CDDA sectors) per second, and obviously there are 60 seconds per minute. To convert M:S.F from your cue sheet into a sample offset value, I would first calculate the number of CDDA sectors to the desired starting point: (((60 * 3) + 45) * 75) + 30 = 16,905. Since there are 75 sectors per second, assuming the audio is sampled at 44,100 Hz there are 44,100 / 75 = 588 audio samples per sector. So the desired audio sample offset where you will start decoding is 588 * 16,905 = 9,940,140.
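That M:S.F-to-sample conversion is easy to get wrong by a sector, so here it is as a small helper (the function name is mine; 44.1 kHz CDDA audio with 75 sectors per second is assumed, as above):

```python
def msf_to_sample_offset(minutes, seconds, frames,
                         sample_rate=44100, sectors_per_second=75):
    """Convert a cue-sheet M:S.F index to an absolute PCM sample offset."""
    sectors = (minutes * 60 + seconds) * sectors_per_second + frames
    samples_per_sector = sample_rate // sectors_per_second  # 588 at 44.1 kHz
    return sectors * samples_per_sector

print(msf_to_sample_offset(3, 45, 30))  # 9940140, matching the figure above
```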
The offset just calculated is an offset into the decompressed PCM samples, not into the compressed FLAC stream (nor in bytes). So for each FLAC frame, calculate the number of samples it contains and keep a running tally of your position. Skip FLAC frames until you find the one containing your starting audio sample. At this point you can start decoding the audio, throwing away any samples in the FLAC frame that you don't need.
FLAC also supports a SEEKTABLE metadata block, the use of which would greatly speed up (and alter) the process just described. If you haven't already, you can look at the implementation of the reference decoder.
