CNTK for waveform input? - nlp

I want to use neural networks to classify periodic signals coming from a sensor. I've only done image stuff before with CNTK. I suppose it's a bit like NLP in that a continuous waveform is the input -- but in my case it won't be audio, but something else. Can somebody point me to how I might get started on this? Thanks!

Could you check whether the following links, in this sequential order, help?
https://cntk.ai/pythondocs/Manual_How_to_feed_data.html#Comma-separated-values-(CSV)
https://github.com/Microsoft/CNTK/issues/2199
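
For illustration, here is a minimal sketch (not taken from the linked docs) of feeding fixed-length sensor windows from a CSV file into a small CNTK classifier. The file name sensor.csv, the window length of 128 samples, and the 3 classes are all assumptions.

```python
# Minimal sketch: classify fixed-length sensor windows with a small CNTK model.
# Assumed: each CSV row holds 128 samples followed by an integer class label.
import numpy as np
import cntk as C

WINDOW, N_CLASSES = 128, 3

data = np.loadtxt("sensor.csv", delimiter=",", dtype=np.float32)
windows, labels = data[:, :WINDOW], data[:, WINDOW].astype(int)
one_hot = np.eye(N_CLASSES, dtype=np.float32)[labels]

x = C.input_variable(WINDOW)
y = C.input_variable(N_CLASSES)

# A tiny fully connected classifier over each window; an LSTM over
# C.sequence.input_variable would be the natural next thing to try.
z = C.layers.Sequential([
    C.layers.Dense(64, activation=C.relu),
    C.layers.Dense(N_CLASSES)
])(x)

loss = C.cross_entropy_with_softmax(z, y)
metric = C.classification_error(z, y)
learner = C.sgd(z.parameters, C.learning_rate_schedule(0.01, C.UnitType.minibatch))
trainer = C.Trainer(z, (loss, metric), [learner])

for epoch in range(10):
    for i in range(0, len(windows), 32):
        trainer.train_minibatch({x: windows[i:i+32], y: one_hot[i:i+32]})
```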

Related

Image representation for bpm to be compared to a spectrogram

Disclaimer: Complete beginner with neural networks & audio representation. Please bear with me.
I have this idea for my bachelor's thesis (MIR) that involves applying a beat-like time-based pattern to constrain where a CNN-based acoustic model finds onsets/offsets. The problem is that I'm having a hard time figuring out how to implement this concept.
The initial plan was to just insert both the spectrogram and the pattern into the CNN and hope it processes it, but I don't know what format the pattern should be in. I know CNNs are best at processing images but the initial format of said pattern is "time-based" (beats per minute/second). Can this number be represented as an image to be compared to the spectrogram? If so, in what format? Or should I handle this problem in a different way? Thank you in advance!
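One possible (purely illustrative) encoding: sample the beat pattern at the spectrogram's frame rate so it becomes a second "image" channel the CNN can see alongside the spectrogram. The sample rate, hop length, and BPM value below are assumptions.

```python
# Sketch: render a BPM value as a pulse train aligned with spectrogram frames,
# then stack it with the spectrogram as an extra input channel.
import numpy as np

def beat_channel(bpm, n_frames, sr=22050, hop_length=512):
    """Return a (1, n_frames) array with 1.0 at expected beat positions."""
    frame_rate = sr / hop_length              # spectrogram frames per second
    beat_period = 60.0 / bpm * frame_rate     # frames between beats
    channel = np.zeros((1, n_frames), dtype=np.float32)
    beat_frames = np.arange(0, n_frames, beat_period).astype(int)
    channel[0, beat_frames] = 1.0
    return channel

# spec: (n_mels, n_frames) log-mel spectrogram; broadcast the beat row so both
# "images" share the same shape and can be stacked along a channel axis.
spec = np.random.rand(80, 400).astype(np.float32)   # placeholder spectrogram
beats = np.repeat(beat_channel(120, spec.shape[1]), spec.shape[0], axis=0)
cnn_input = np.stack([spec, beats], axis=0)          # shape (2, 80, 400)
```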

Image segmentation training set labeling

I am new to PyTorch and deep learning. I am trying to do image segmentation.
But I am stuck on how to label the training set images.
Can anyone please help me?
This is one of my training images.
I have two kinds of plants here - one is a weed and the other is a good crop. I need to label them.
Can anyone tell me how I can do this?
I am going to use deep neural network models (like ResNet) on the labelled data.
There are discussions here about segmentation tools for image labeling. You may find them useful.
Try https://oclavi.com, which is a web-based object annotation tool.
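For context, whatever tool you use, the usual output for segmentation is a per-pixel mask the same size as the image, with one class index per pixel. Here is a rough sketch of how such labels might be loaded in PyTorch; the class ids (0 = background, 1 = crop, 2 = weed) and the file paths are assumptions.

```python
# Sketch: pair each image with its per-pixel class-index mask for training.
import numpy as np
from PIL import Image
import torch
from torch.utils.data import Dataset

class FieldSegDataset(Dataset):
    def __init__(self, image_paths, mask_paths):
        self.image_paths, self.mask_paths = image_paths, mask_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = np.array(Image.open(self.image_paths[idx]).convert("RGB"))
        mask = np.array(Image.open(self.mask_paths[idx]))    # uint8 class ids
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
        return image, torch.from_numpy(mask).long()          # CHW float, HW long
```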

Image Augmentation of Siamese CNN

I have a task to compare two images and check whether they are of the same class (using a Siamese CNN). Because I have a really small dataset, I want to use Keras's ImageDataGenerator.
I have read through the documentation and have understood the basic idea. However, I am not quite sure how to apply it to my use case, i.e. how to generate two images and a label that they are in the same class or not.
Any help would be greatly appreciated.
P.S. I can think of a much more convoluted process using sklearn's extract_patches_2d but I feel there is an elegant solution to this.
Edit: It looks like creating my own data generator may be the way to go. I will try this approach.
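For anyone trying the same thing, here is a rough sketch of what such a custom generator could look like: a keras.utils.Sequence that draws random image pairs, labels them 1 if both come from the same class and 0 otherwise, and delegates augmentation to an ImageDataGenerator. The in-memory {class: images} layout and the parameter choices are assumptions.

```python
# Sketch: a pair generator for a Siamese network with on-the-fly augmentation.
import numpy as np
from tensorflow.keras.utils import Sequence
from tensorflow.keras.preprocessing.image import ImageDataGenerator

class PairGenerator(Sequence):
    def __init__(self, images_by_class, batch_size=32):
        # images_by_class: {class_id: np.ndarray of shape (n, h, w, c)}
        self.images_by_class = images_by_class
        self.classes = list(images_by_class.keys())
        self.batch_size = batch_size
        self.augmenter = ImageDataGenerator(rotation_range=15,
                                            horizontal_flip=True)

    def __len__(self):
        return 100  # arbitrary number of batches per epoch

    def __getitem__(self, idx):
        a, b, labels = [], [], []
        for _ in range(self.batch_size):
            same = np.random.rand() < 0.5
            c1 = np.random.choice(self.classes)
            c2 = c1 if same else np.random.choice(
                [c for c in self.classes if c != c1])
            img1 = self.images_by_class[c1][
                np.random.randint(len(self.images_by_class[c1]))]
            img2 = self.images_by_class[c2][
                np.random.randint(len(self.images_by_class[c2]))]
            a.append(self.augmenter.random_transform(img1))
            b.append(self.augmenter.random_transform(img2))
            labels.append(1.0 if same else 0.0)
        return [np.array(a), np.array(b)], np.array(labels)
```

A two-input Siamese model can then be trained directly with model.fit(PairGenerator(...)).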

Neural Networks for Audio/Sound Augmentation

What type of neural net architecture would one use to map sounds to other sounds? Neural nets are great at learning to go from sequences to other sequences, so sound augmentation/generation seems like it'd be a very popular application of them (but unfortunately, it's not - I could only find a fairly old Magenta project dealing with it, and maybe two other blog posts).
Assuming I have a sufficiently large dataset of input sounds / output sounds of the same length, how would I format the data? Perhaps train a CNN on spectrograms (something like CycleGAN or pix2pix), or maybe use the actual data from the WAV file and use an LSTM? Is there some other type of weird architecture no one has heard about that's good for sound? Help me out please!
To anyone else doing a similar thing - the answer is to use Fast Fourier Transforms to get the data into a manageable state, and then people usually use RNNs or LSTMs to work with the data - not CNNs.
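Here is a rough sketch of that recipe (magnitude STFT frames in, LSTM in the middle, target frames out), using scipy for the Fourier step. The frame size and the toy model are assumptions, and phase reconstruction is ignored for brevity.

```python
# Sketch: map magnitude STFT frames of one sound to those of another with an LSTM.
import numpy as np
from scipy.signal import stft
from tensorflow.keras import layers, models

sr, n_fft = 22050, 512
x_wave = np.random.randn(sr * 2).astype(np.float32)   # placeholder input sound
y_wave = np.random.randn(sr * 2).astype(np.float32)   # placeholder target sound

# Magnitude spectrogram frames: shape (time, freq_bins)
_, _, X = stft(x_wave, fs=sr, nperseg=n_fft)
_, _, Y = stft(y_wave, fs=sr, nperseg=n_fft)
X_mag, Y_mag = np.abs(X).T, np.abs(Y).T

model = models.Sequential([
    layers.Input(shape=(None, X_mag.shape[1])),
    layers.LSTM(128, return_sequences=True),
    layers.Dense(Y_mag.shape[1]),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_mag[None, ...], Y_mag[None, ...], epochs=5)
```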

Trying to come up with features to extract from sound waves to use for an AI song composer

I am planning on making an AI song composer that would take in a bunch of songs of one instrument, extract musical notes (like ABCDEFG) and certain features from the sound wave, perform machine learning (most likely through recurrent neural networks), and output a sequence of ABCDEFG notes (i.e. generate its own songs / music).
I think that this would be an unsupervised learning problem, but I am not really sure.
I figured that I would use recurrent neural networks, but I have a few questions on how to approach this:
- What features should I extract from the sound wave so that the output music is melodious?
I have a few other questions as well:
- Is it possible, with recurrent neural networks, to output a vector of sequenced musical notes (ABCDEFG)?
- Is there a smart way to feed in the features of the sound waves as well as the sequence of musical notes?
Well, I did something similar once (making a Shazam-like app in MATLAB). I think you can use the FFT (Fast Fourier Transform) to break the signal down into its constituent frequencies and their corresponding amplitudes. Then you can use the frequency ranges of different instruments to pick them out of the whole bunch and classify them.
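As a small illustration of the FFT idea, here is a sketch that takes one frame of a waveform, finds its dominant frequency, and maps it to the nearest note name (A4 = 440 Hz is assumed as the reference; the frame length and sample rate are also assumptions).

```python
# Sketch: dominant frequency of a frame -> nearest note name.
import numpy as np

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def dominant_note(frame, sr=44100):
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    peak = freqs[np.argmax(spectrum[1:]) + 1]        # skip the DC bin
    semitones = int(round(12 * np.log2(peak / 440.0)))
    return NOTE_NAMES[semitones % 12]

# Example: a pure 261.63 Hz tone (middle C) should come out as "C".
t = np.arange(0, 0.1, 1 / 44100)
print(dominant_note(np.sin(2 * np.pi * 261.63 * t)))
```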
I already tried something similar with an RNN (Recurrent Neural Network). Try using an LSTM network (Long Short-Term Memory); they are WAY better than plain RNNs for this type of data processing, from what I read afterward, because they do not suffer from the "vanishing gradient problem".
What Chris Thaliyath said is a good hint on how to train the feature detector.
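And a toy sketch of the LSTM suggestion above: notes encoded as integers 0-11, an embedding plus an LSTM trained to predict the next note, which can then be sampled to generate new sequences. The placeholder random data and all sizes are assumptions; real training data would be the note sequences extracted from the input songs.

```python
# Sketch: next-note prediction with an embedding + LSTM, then sampling.
import numpy as np
from tensorflow.keras import layers, models

N_NOTES, SEQ_LEN = 12, 16
# Placeholder training data: random note sequences and next-note targets.
X = np.random.randint(0, N_NOTES, size=(500, SEQ_LEN))
y = np.random.randint(0, N_NOTES, size=(500,))

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    layers.Embedding(N_NOTES, 16),
    layers.LSTM(64),
    layers.Dense(N_NOTES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=3, verbose=0)

# Generate: feed the last SEQ_LEN notes and sample the predicted distribution.
seed = X[0]
probs = model.predict(seed[None, :], verbose=0)[0]
probs = probs / probs.sum()                 # renormalize float32 rounding
next_note = np.random.choice(N_NOTES, p=probs)
```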
