I have a long WAV file containing repetitions of the same syllable (/da/). I recorded two channels in the audio file:
Top - Speech signal
Bottom - Triggers occurring when the syllable was produced
I saved the timing of the triggers in another file, but now I also need to know when the syllables ended.
The trigger at the beginning is very accurate and is related to another file, so I want to keep that timing.
How can I extract the timing of the syllable endings? Can this be done in Praat, or do I need something else?
Thanks in advance.
I managed to extract the timing from the continuous file using the Speech Filing System (SFS - http://www.phon.ucl.ac.uk/resource/sfs/).
In the GUI it can be found under Tools > Speech > Annotate > Find multiple endpoints.
There is also a command-line tool which allows the analysis to be automated:
http://www.phon.ucl.ac.uk/resource/sfs/help/man/npoint.htm
Once the annotation is ready, the timings and labels can be exported to an external file (e.g. txt, csv, or xml).
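If you would rather script it yourself instead of using SFS, a simple alternative is an amplitude-envelope threshold: smooth the rectified speech channel and take the last point above a threshold after each trigger as the syllable offset. A rough Python sketch, assuming a two-channel WAV and trigger times already loaded (the file name, thresholds, and 0.5 s search window are placeholders you would tune):

```python
# Rough amplitude-threshold offset detection; not SFS, just an illustration.
# Assumes: stereo WAV with speech on channel 0, and trigger times (seconds)
# already extracted to a list. All thresholds/window lengths need tuning.
import numpy as np
from scipy.io import wavfile
from scipy.ndimage import uniform_filter1d

sr, data = wavfile.read("syllables.wav")       # placeholder file name
speech = data[:, 0].astype(float)              # top channel = speech

envelope = uniform_filter1d(np.abs(speech), size=int(0.01 * sr))  # ~10 ms smoothing
threshold = 0.05 * envelope.max()              # crude fixed threshold

trigger_times = [0.52, 1.48, 2.51]             # placeholder: load from your trigger file
window = 0.5                                   # search 500 ms after each trigger

for t in trigger_times:
    start = int(t * sr)
    stop = min(start + int(window * sr), len(envelope))
    above = np.nonzero(envelope[start:stop] > threshold)[0]
    end_time = (start + above[-1]) / sr if above.size else None
    print(f"trigger {t:.3f} s -> estimated offset {end_time}")
```

Since your trigger timings are already trusted, only the offsets come from the envelope, so the trigger file stays untouched.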
I have a flow where iOS app users will record a large video file and upload it to our server. After the fact, the user might want to extract certain portions of that larger video based on specific time stamps and generate a highlight reel that can be viewed and shared locally back on the iOS device.
As an FE developer, I don't really have much experience with where to even start here. Our BE will be built in NodeJS. It seems to me that this should be a relatively straightforward problem to solve, but I don't know.
Are there APIs that make movie manipulation easy? Can I easily extract a clip based on a start and stop time and save that as a separate file? Are those costly tasks? Or not too bad?
I'm guessing the response to this call would be a list of the file names generated for these clips, which the iOS app could then pull down and load.
It's not quite as straightforward as it might seem, as video files are quite structured, with header information and indexing into the individual video and audio tracks and frames. Any splitting or cropping needs to allow for this and also create new files with the correct headers, indexing, etc.
Fortunately, there are indeed libraries that you can use to do this type of thing, one of the most powerful being ffmpeg.
There are projects which allow the ffmpeg command-line tool to be used programmatically - the advantage of this approach is that you get to leverage the vast community knowledge base for the ffmpeg command line.
One of the popular ones for nodejs is:
https://github.com/damianociarla/node-ffmpeg
You can then look at the ffmpeg documentation or community answers to find the particular functionality you need - for example, to cut a clip between a start and end time, as you asked:
https://stackoverflow.com/a/42827058/334402
https://superuser.com/a/704118
The general idea is quite simple and will be of the format:
ffmpeg -i yourInputVideo.mp4 -ss 01:30:00 -to 02:30:00 -c copy yourNewOutputVideo.mp4
It's worth taking a look at the seeking info in the ffmpeg online documentation (https://ffmpeg.org/ffmpeg.html) to help understand the examples, especially the second one above:
-ss position (input/output)
When used as an input option (before -i), seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved.
When used as an output option (before an output url), decodes but discards input until the timestamps reach position.
position must be a time duration specification; see the Time duration section in the ffmpeg-utils(1) manual.
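To produce the "list of generated file names" you mentioned, the backend just needs to run that command once per requested clip. A minimal sketch (shown in Python for brevity; the Node equivalent would spawn the same command with child_process - the input path and timestamps are placeholders):

```python
# Cut one output file per (start, end) pair using stream copy (no re-encoding).
# Assumes ffmpeg is on the PATH; input path and timestamps are placeholders.
import subprocess

source = "uploaded_video.mp4"
clips = [("00:01:30", "00:01:45"), ("00:12:10", "00:12:40")]

generated = []
for i, (start, end) in enumerate(clips):
    out = f"clip_{i}.mp4"
    subprocess.run(
        ["ffmpeg", "-y", "-i", source, "-ss", start, "-to", end, "-c", "copy", out],
        check=True,
    )
    generated.append(out)

print(generated)   # e.g. returned to the iOS app so it can pull the clips down
```

Because `-c copy` avoids re-encoding, each clip is cheap to produce; the trade-off is that cuts snap to the nearest keyframe, as the seeking documentation above explains.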
I've been asked to find the actual runtime of a batch of files. Each of these files contains voice and silences (guided meditation type), and I need to find a way to measure the runtime of just the voice.
The manual way of doing this is opening a file, looking at the waveform, identifying the silences and removing them, so that the final duration of the file is the "just voice" runtime. This can take me 3-4 minutes per file, and that's just too much for a batch of 1800 files.
So my question is: is there a way to automatically delete the silent parts? And if so, can it be scripted or automated in any way?
In my studio we work with Sound Forge and ProTools.
Pro Tools has this built in: select the region and Edit -> Strip Silence.
SoX can do this if you want to set up some scripts without using Pro Tools (there's a nice blog post on it).
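If you go the SoX route, the usual recipe is its silence effect plus soxi -D to read back the duration, which makes a 1800-file batch easy to script. A rough sketch (the silence-effect parameters are a common starting point you would tune by ear on a few files):

```python
# Strip silence with SoX and report the "just voice" runtime per file.
# Assumes sox/soxi are installed; the silence-effect parameters (1% threshold,
# 0.1 s minimum) are a common starting recipe and will need tuning.
import glob
import subprocess

total = 0.0
for path in glob.glob("*.wav"):
    trimmed = path.replace(".wav", "_voice.wav")
    subprocess.run(
        ["sox", path, trimmed,
         "silence", "1", "0.1", "1%",      # trim leading silence
         "-1", "0.1", "1%"],               # and every silent stretch after that
        check=True,
    )
    out = subprocess.run(["soxi", "-D", trimmed], capture_output=True, text=True, check=True)
    seconds = float(out.stdout.strip())
    total += seconds
    print(f"{path}: {seconds:.1f} s of voice")

print(f"total voice runtime: {total / 3600:.2f} h")
```

If you only need the runtimes and not the trimmed files, you can write the intermediate output to a temp directory and delete it afterwards.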
I have a situation here. Suppose I have two short audio files which contain some sounds. Say the first file has the sound 'hello' (audio 1) and the second file has 'bye' (audio 2), spoken by someone. There is another audio file which has 'hello' (audio 3) spoken by the same person, but it is a different recording.
How can I detect that audio 3 is similar to audio 1 (irrespective of the speaker)? I'm dealing with sounds here, not only speech, so there could also be a whistle in place of the words.
You would have to program a statistical analysis of each file, then use pattern matching to determine the level of similarity between them.
The simplest solution for words would be to license an API version of a speech engine such as Dragon, then convert the audio files to text output and compare them.
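For the "statistical analysis plus pattern matching" idea, one common concrete choice is to compare MFCC feature sequences with dynamic time warping, which also works for non-speech sounds like whistles. A sketch assuming the librosa library (the path-length normalisation and any similarity threshold are choices you'd calibrate on your own data):

```python
# Compare two clips by DTW distance over MFCC features.
# Assumes librosa is installed; file names and the distance interpretation are illustrative.
import librosa

def mfcc(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

def dtw_distance(a, b):
    D, wp = librosa.sequence.dtw(X=a, Y=b, metric="cosine")
    return D[-1, -1] / len(wp)          # normalise total cost by warping-path length

hello_1 = mfcc("audio1_hello.wav")
bye_2   = mfcc("audio2_bye.wav")
hello_3 = mfcc("audio3_hello.wav")

print("audio3 vs audio1:", dtw_distance(hello_3, hello_1))   # expect smaller
print("audio3 vs audio2:", dtw_distance(hello_3, bye_2))     # expect larger
```

The distances are only meaningful relative to each other, so you would pick a cut-off by testing pairs you already know are the same or different.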
On TED.com they have transcripts, and clicking a part of the transcript jumps to the appropriate section of the video.
I want to do this for 80 hours of audios and transcriptions I have, on Linux with OSS.
This is the approach I'm thinking:
Start small with a 30-minute sample
Split the audio into 2-minute WAV chunks, even if that breaks words up
Run the phrase spotter from CMU Sphinx's long-audio-aligner on each chunk, with the transcript
Take the time index for the identified words/phrases found in each chunk and calculate the estimated time of the n-grams in the original audio file.
Does this seem like an efficient approach? Has anyone actually done this?
Are there alternate approaches that are worth trying like dumb word counting that may be accurate enough?
You can just feed all your audio and text into a long audio aligner and it will give you the timestamps of the words. Using these timestamps, you can jump to a specific word in the file.
I'm not sure why you would want to split your audio or do anything else.
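Once the aligner has produced word timestamps, the TED-style behaviour is just a lookup between clicked words and playback times. A small sketch assuming the alignment has already been dumped to a simple (word, start_seconds) list (the exact output format depends on which aligner you use, and the example data below is made up):

```python
# Map a clicked transcript word to a playback position, and vice versa.
# Assumes the long-audio aligner's output was converted to (word, start_seconds) pairs.
from bisect import bisect_right

alignment = [
    ("thank", 0.42), ("you", 0.71), ("so", 0.95), ("much", 1.10), ("chris", 1.60),
]

words = [w for w, _ in alignment]
starts = [t for _, t in alignment]

def seek_time_for_word(index):
    """Return where to seek the player when the index-th transcript word is clicked."""
    return starts[index]

def word_at_time(seconds):
    """Return which word should be highlighted while the audio plays."""
    i = bisect_right(starts, seconds) - 1
    return words[max(i, 0)]

print(seek_time_for_word(4))   # -> 1.6
print(word_at_time(1.2))       # -> 'much'
```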
I want to take a classical music piece as an .mp3 file (or another audio format if necessary) and the same piece as a *.midi file. Then I want to synchronize them so that only the MIDI file changes, and the timing of its beats becomes synchronized with the .mp3. So, if I played them both at the same time, they would play the same notes in sync.
How can I do so?
(I have Cubase if the answer might be there...)
It's a tough task because general beat tracking (following tempo changes) hasn't been fully solved yet.
There's at least one tool that does work, though, for matching an audio file to a MIDI file, assuming the audio is almost identical to the MIDI file in terms of the score. But I can't remember its name and have never used it. The place to ask is the Music Information Retrieval community of scientists:
http://listes.ircam.fr/wws/info/music-ir
For manual matching, you can use modern DAWs like Logic, Pro Tools, etc., which help with this by providing reasonably nice tools to build a detailed tempo map of the audio file; the MIDI file would then line right up with it, but it's a tedious task. You'll likely need tempo changes more often than every measure to get a nice alignment - it will be style-dependent.
You could use tools that already exist. For example, if you know the tempo of the mp3, then you could use this page to change the tempo of the MIDI file.
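If a single global tempo is good enough (the "if you know the tempo of the mp3" case), you can estimate it from the audio and rewrite the MIDI file's tempo events to match. A rough sketch assuming the librosa and mido libraries; it ignores tempo changes within the piece, which classical performances usually have:

```python
# Estimate the mp3's global tempo and force the MIDI file to the same BPM.
# Assumes librosa and mido are installed; only sensible for roughly constant tempo.
import librosa
import mido

y, sr = librosa.load("piece.mp3")
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)     # global BPM estimate
bpm = float(tempo)

mid = mido.MidiFile("piece.mid")
for track in mid.tracks:
    for msg in track:
        if msg.type == "set_tempo":
            msg.tempo = mido.bpm2tempo(bpm)        # microseconds per beat
mid.save("piece_matched.mid")
print(f"rewrote tempo events to {bpm:.1f} BPM")
```

For anything with rubato or gradual tempo changes you are back to the tempo-map approach described above, since a single BPM value can't follow the performance.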