Is there any way to convert a spectrogram into MFCCs directly? If so, kindly provide a Python script for that. Thanks.
I have finally got the answer!
I converted the power spectrum from dB units back to its original scale using librosa.db_to_power: https://librosa.org/doc/0.8.1/generated/librosa.db_to_power.html#librosa.db_to_power
Then I passed the power spectrum directly to librosa.feature.mfcc to get the MFCCs.
Related
I currently have a PNG of a spectrogram such as:
I don't have the original audio file, but I am wondering if there is a way to convert this into a SciPy spectrogram object. I was thinking I could try to convert the image to an audio file first, but there don't seem to be many packages that reconstruct audio from a spectrogram, since so much data has already been lost.
Any ideas and suggestions would be appreciated!
I want to cut the spectrogram of a WAV file horizontally into 24 pieces, measure the power of each piece, and finally rank the pieces by power. How should I do this?
Could you show some code that you have written to try out the same? It would be easier to help if we have something to build upon and rectify issues, if any.
Additionally, please try basic image manipulation to do the same. Instead of cutting, you could divide the image into N (here 24) regions and analyze them in parallel using multiprocessing.
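As a starting point, here is a rough sketch of the splitting and ranking step. It assumes the spectrogram is already a 2-D NumPy array of power values with frequency along axis 0 and time along axis 1, and that "horizontal cuts" means splitting into frequency bands. The function name and the toy data are hypothetical.

```python
import numpy as np


def rank_bands_by_power(power_spec, n_bands=24):
    """Split a (frequency x time) power spectrogram into n_bands horizontal
    strips and rank the strips by total power, strongest first."""
    # array_split tolerates a row count that is not a multiple of n_bands.
    bands = np.array_split(power_spec, n_bands, axis=0)
    powers = np.array([band.sum() for band in bands])
    order = np.argsort(powers)[::-1]  # band indices, highest power first
    return order, powers


# Toy spectrogram: 48 frequency bins x 10 frames, so each band spans 2 bins.
spec = np.zeros((48, 10))
spec[4:6, :] = 5.0    # band 2 carries the most power
spec[10:12, :] = 1.0  # band 5 comes second

order, powers = rank_bands_by_power(spec)
print(order[:2], powers[order[:2]])
```

If the spectrogram values are in dB, they would need to be converted back to linear power before summing.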
I have around 200,000 images that need to be rotated correctly.
I also have 30 images with their corresponding correctly rotated versions. How can I train OpenCV to achieve what I want? Some tips would be appreciated.
I'm using this library for opencv
Thanks!
Open each file using the readImage method, as shown in their examples. The Matrix that your callback receives has a rotate function you can use.
I have a 3D model as a mesh structure, or in .stl/.obj format, which I converted to voxels using the binvox voxelization tool. Using a Java program, I have done some processing on the resulting voxel grid. Now I wish to convert this voxelized model back into a "smooth" mesh structure (or any other format) that can later be exported to .stl or .obj format.
Can someone suggest how I can achieve the last part, i.e. converting the voxel grid into some format that recovers the "smooth" surfaces? Any help, including pointers to existing tools or relevant theory in this direction, will be appreciated.
Give the Marching Cubes algorithm a try. See http://paulbourke.net/geometry/polygonise/ for more details.
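If stepping outside Java is an option, a small sketch using the marching cubes implementation in scikit-image could look like this. It assumes the voxel grid can be loaded as a 3-D NumPy occupancy array. The helper name, the 0.5 iso-level, and the test sphere are illustrative choices, not part of binvox or of the linked page.

```python
import numpy as np
from skimage import measure


def voxels_to_obj(voxels, level=0.5):
    """Hypothetical helper: occupancy grid -> (vertices, faces, OBJ text)."""
    # Pad with empty voxels so the surface closes at the grid boundary.
    padded = np.pad(voxels.astype(np.float32), 1)
    verts, faces, _normals, _values = measure.marching_cubes(padded, level=level)
    # OBJ face indices are 1-based.
    lines = [f"v {x:.4f} {y:.4f} {z:.4f}" for x, y, z in verts]
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return verts, faces, "\n".join(lines)


# Stand-in for a real voxel model: a solid sphere of radius 5 in a 16^3 grid.
grid = np.zeros((16, 16, 16), dtype=bool)
zz, yy, xx = np.indices(grid.shape)
grid[(xx - 8) ** 2 + (yy - 8) ** 2 + (zz - 8) ** 2 <= 25] = True

verts, faces, obj_text = voxels_to_obj(grid)
print(len(verts), len(faces))
```

On a purely binary grid the extracted surface still looks faceted. Smoothing the occupancy values before extraction (for instance with a Gaussian filter) or smoothing the mesh afterwards are common ways to soften it.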
I'm working on an openGL project that involves a speaking cartoon face. My hope is to play the speech (encoded as mp3s) and animate its mouth using the audio data. I've never really worked with audio before so I'm not sure where to start, but some googling led me to believe my first step would be converting the mp3 to pcm.
I don't really anticipate the need for any Fourier transforms, though that could be nice. The mouth really just needs to move around when there's audio (I was thinking of basing it on volume).
Any tips on how to implement something like this, or pointers to resources, would be much appreciated. Thanks!
-S
Whatever you do, you're going to need to decode the MP3s into PCM data first. There are a number of third-party libraries that can do this for you. Then, you'll need to analyze the PCM data and do some signal processing on it.
Automatically generating realistic lipsync data from audio is a very hard problem, and you're wise to not try to tackle it. I like your idea of simply basing it on the volume. One way you could compute the current volume is to use a rolling window of some size (e.g. 1/16 second), and compute the average power in the sound wave over that window. That is, at frame T, you compute the average power over frames [T-N, T], where N is the number of frames in your window.
Thanks to Parseval's theorem, we can easily compute the power in a wave without having to take the Fourier transform or anything complicated -- the average power is just the sum of the squares of the PCM values in the window, divided by the number of frames in the window. Then, you can convert the power into a decibel rating by dividing it by some base power (which can be 1 for simplicity), taking the logarithm, and multiplying by 10.
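To make the rolling-window idea concrete, here is a small sketch, assuming the decoded PCM samples are available as a mono array scaled to [-1, 1]. The function name, the dB floor, and the trailing-window convention (the last `window` samples up to and including frame T) are assumptions made for illustration.

```python
import numpy as np


def rolling_power_db(pcm, window, ref_power=1.0, floor_db=-120.0):
    """Average power in dB over a trailing window ending at each sample."""
    samples = np.asarray(pcm, dtype=np.float64)
    squared = samples ** 2
    # A running sum lets every window total be read off in O(1).
    csum = np.concatenate(([0.0], np.cumsum(squared)))
    end = np.arange(1, len(samples) + 1)
    start = np.maximum(end - window, 0)  # shorter windows at the very start
    mean_power = (csum[end] - csum[start]) / (end - start)
    # log10(0) is -inf for silence; clamp it to a finite floor instead.
    with np.errstate(divide="ignore"):
        db = 10.0 * np.log10(mean_power / ref_power)
    return np.maximum(db, floor_db)


# Half a second of silence followed by a constant level of 0.1, at 16 kHz.
sr = 16000
pcm = np.concatenate((np.zeros(sr // 2), np.full(sr // 2, 0.1)))
levels = rolling_power_db(pcm, window=sr // 16)  # 1/16 second window
print(levels[0], levels[-1])
```

The mouth opening can then be driven by mapping a chosen dB range (say, from a silence threshold up to 0 dB) linearly onto the open/closed range of the animation.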