About Tkinter python 2.76 on Linux Mint 17.2 - linux

I have 2 functions as below:
def select_audio():
os.chdir("/home/norman/songbook")
top1.lower(root)
name=tkFileDialog.askopenfilename()
doit="play " + name
top1.lift(root)
os.system(doit)
def select_video():
os.chdir("/home/norman/Videos")
top2.lower(root)
name=tkFileDialog.askopenfilename()
doit="mpv --fs " + name
top2.lift(root)
os.system(doit)
They are selected from buttons to allow choosing and playing audio files or video files.
They work to some extent.
Videos are in a different directory and at the same level as the audio files.
It doesn't matter which I choose first I see the correct directory so I can play say a video, if after it's finished I choose audio it still shows the video directory.
Similarly if I first choose audio it still shows the audio directory if I select videos.
I have no idea why it does this. I am not an experienced programmer as you can probably tell from the code.

Some suggestions:
Use a raw string to make sure that Python doesn't try to interpret anything following a \ as an escape sequence:
Change os.chdir("/home/norman/whatever") to os.chdir(r"/home/norman/whatever")
It won't solve this problem, but it will avoid you future problems.
For tkFileDialog use the initialdir option:
Change name=tkFileDialog.askopenfilename() to
name=tkFileDialog.askopenfilename(initialdir=r"home/norman/whatever", parent=root)

Related

ffpmeg - how to detect if video crop is completed?

Thanks in advance.
I'm trying to crop a .mp4 video using an ffmpeg binary (within the context of an electron-react-app).
(The binary is run in a child process using execFile() and outputs to a temp folder which is later deleted)
ffmpeg varies considerably in the time it takes to complete the creation of a cropped video file (1sec to 18sec) depending on the computer (mac vs Windows).
I need to read the cropped video file.
I've set up an event listener in the Main process of electron
if (!monitorCroppedFile) {
console.log(`${croppedFilePath} doesn't exist`);
} else {
console.log(`${croppedFilePath} exists !`)
...readFile...;
Once monitorCroppedFile = true I read it using fs.readfile().
The problem is that ffmpeg initally creates the cropped file path but it sometimes takes ages to complete the process of cropping.
This results in the read file often being blank (as the read is triggered on detecting the file path of the cropped file).
I've tried using -preset ultrafast in the ffmpeg arguments but this only improves things on Windows marginally.
The problem doesn't occur on Macs.
Can anybody suggest a possible solution ? Is there a way to detect when the crop is fully completed ?
Many thanks.
Add -progress FILE to your command where FILE should be a filename. ffmpeg will log processing status to that file. Search for the line progress=end in it. Once you find it, you can read the file.

How do I get started training a custom voice model with Mozilla TTS on Ubuntu 20.04?

I'd like to create a custom voice in Mozilla TTS using audio samples I have recorded but am not sure how to get started. The Mozilla TTS project has documentation and tutorials, but I'm having trouble putting the pieces together -- it seems like there's some basic information missing that someone starting out needs to know to get going.
Some questions I have:
I see that there is a Docker image for Mozilla TTS, but that the documentation for it covers creating speech and doesn't mention training. Can I use the Docker image for training?
If I can't use the Docker image for training, how do I get a functional copy of Mozilla TTS running on my system with Python 3? I've tried following the commands that the project provides, but I get dependency errors, version conflicts, or errors about not having sufficient permission to install packages.
What information do I need in order to train the model? What audio formats do I need? I see that I need a metadata.csv file -- what do I need to put in that file? What do I customize in the config file?
Most of the configs reference a scale_stats.npy file -- how do I generate this?
How do I run the training?
After a lot of research and experimentation, I can share my learnings to answer my own questions.
Can the Mozilla TTS Docker image be used for training (TL;DR: "No")
The Mozilla TTS docker image is really geared for playback and doesn't seem equipped to be used for training. At least, even when running a shell inside the container, I could not get training to work. But after figuring out what was causing PIP to be unhappy, the process of getting Mozilla TTS up and running in Ubuntu turns out to be pretty straightforward.
Installing Mozilla TTS using Python 3, PIP, and a Virtual Environment
The documentation for Mozilla TTS doesn't mention anything about virtual environments, but IMHO it really should. Virtual environments ensure that dependencies for different Python-based applications on your machine don't conflict.
I'm running Ubuntu 20.04 on WSL, so Python 3 is already installed. Given that, from within my home folder, here are the commands I used to get a working copy of Mozilla TTS:
sudo apt-get install espeak
git clone https://github.com/mozilla/TTS mozilla-tts
python3 -m venv mozilla-tts
cd mozilla-tts
./bin/pip install -e .
This created a folder called ~/mozilla-tts in my home folder that contains the Mozilla TTS code. The folder is setup as a virtual environment, which means that as long as I execute python commands via ~/mozilla-tts/bin/python and PIP via ~/mozilla-tts/bin/pip, Python will use only the packages that exist in that virtual environment. That eliminates the need to be root when running pip (since we're not affecting system-wide packages), and it ensures no package conflicts. Score!
Prerequisites for Training a Model
For the best results when training a model, you will need:
Short audio recordings (at least 100) that are:
In 16-bit, mono PCM WAV format.
Between 1 and 10 seconds each.
Have a sample rate of 22050 Hz.
Have a minimum of background noise and distortion.
Have no long pauses of silence at the beginning, throughout the middle, and at the end.
A metadata.csv file that references each WAV file and indicates what text is spoken in the WAV file.
A configuration file tailored to your data set and chosen vocoder (e.g. Tacotron, WavGrad, etc).
A machine with a fast CPU (ideally an nVidia GPU with CUDA support and at least 12 GB of GPU RAM; you cannot effectively use CUDA if you have less than 8 GB OF GPU RAM).
Lots of RAM (at least 16 GB of RAM is preferable).
Preparing the Audio Files
If your source of audio is in a different format than WAV, you will need to use a program like Audacity or SoX to convert the files into WAV format. You should also trim out portions of audio that are just noise, umms, ahs, and other sounds from the speaker that aren't really words you're training on.
If your source of audio isn't perfect (i.e. has some background noise), is in a different format, or happens to be a higher sample rate or different resolution (e.g. 24-bit, 32-bit, etc.), you can perform some clean-up and conversion. Here's a script that is based on an earlier script from the Mozilla TTS Discourse forums:
from pathlib import Path
import os
import subprocess
import soundfile as sf
import pyloudnorm as pyln
import sys
src = sys.argv[1]
rnn = "/PATH/TO/rnnoise_demo"
paths = Path(src).glob("**/*.wav")
for filepath in paths:
target_filepath=Path(str(filepath).replace("original", "converted"))
target_dir=os.path.dirname(target_filepath)
if (str(filepath) == str(target_filepath)):
raise ValueError("Source and target path are identical: " + str(target_filepath))
print("From: " + str(filepath))
print("To: " + str(target_filepath))
# Stereo to Mono; upsample to 48000Hz
subprocess.run(["sox", filepath, "48k.wav", "remix", "-", "rate", "48000"])
subprocess.run(["sox", "48k.wav", "-c", "1", "-r", "48000", "-b", "16", "-e", "signed-integer", "-t", "raw", "temp.raw"]) # convert wav to raw
subprocess.run([rnn, "temp.raw", "rnn.raw"]) # apply rnnoise
subprocess.run(["sox", "-r", "48k", "-b", "16", "-e", "signed-integer", "rnn.raw", "-t", "wav", "rnn.wav"]) # convert raw back to wav
subprocess.run(["mkdir", "-p", str(target_dir)])
subprocess.run(["sox", "rnn.wav", str(target_filepath), "remix", "-", "highpass", "100", "lowpass", "7000", "rate", "22050"]) # apply high/low pass filter and change sr to 22050Hz
data, rate = sf.read(target_filepath)
# peak normalize audio to -1 dB
peak_normalized_audio = pyln.normalize.peak(data, -1.0)
# measure the loudness first
meter = pyln.Meter(rate) # create BS.1770 meter
loudness = meter.integrated_loudness(data)
# loudness normalize audio to -25 dB LUFS
loudness_normalized_audio = pyln.normalize.loudness(data, loudness, -25.0)
sf.write(target_filepath, data=loudness_normalized_audio, samplerate=22050)
print("")
To use the script above, you will need to check out and build the RNNoise project:
sudo apt update
sudo apt-get install build-essential autoconf automake gdb git libffi-dev zlib1g-dev libssl-dev
git clone https://github.com/xiph/rnnoise.git
cd rnnoise
./autogen.sh
./configure
make
You will also need SoX installed:
sudo apt install sox
And, you will need to install pyloudnorm via ./bin/pip.
Then, just customize the script so that rnn points to the path of the rnnoise_demo command (after building RNNoise, you can find it in the examples folder). Then, run the script, passing the source path -- the folder where you have your WAV files -- as the first command line argument. Make sure that the word "original" appears somewhere in the path. The script will automatically place the converted files in a corresponding path, with original changed to converted; for example, if your source path is /path/to/files/original, the script will automatically place the converted results in /path/to/files/converted.
Preparing the Metadata
Mozilla TTS supports several different data loaders, but one of the most common is LJSpeech. To use it, we can organize our data set to follow LJSpeech conventions.
First, organize your files so that you have a structure like this:
- metadata.csv
- wavs/
- audio1.wav
- audio2.wav
...
- last_audio.wav
The naming of the audio files doesn't appear to be significant. But, the files must be in a folder called wavs. You can use sub-folders inside wavs though, if so desired.
The metadata.csv file should be in the following format:
audio1|line that's spoken in the first file
audio2|line that's spoken in the second file
last_audio|line that's spoken in the last file
Note that:
There is no header line.
The columns are joined together with a pipe symbol (|).
There should be one row per WAV file.
The WAV filename is in the first column, without the wavs/ folder prefix, and without the .wav suffix.
The textual description of what's spoken in the WAV is written out in the second column, with all numbers and abbreviations spelled-out.
(I did observe that steps in the documentation for Mozilla TTS have you then shuffle the metadata file and then split it into a "training" set (metadata_train.csv) and "validation" set (metadata_val.csv), but none of the sample configs provided in the repo are actually configured to use these files. I've filed an issue about that because it's confusing/counter-intuitive to a beginner.)
Preparing the config.json File
You need to prepare a configuration file that describes how your custom TTS will be configured. This file is used by multiple parts of Mozilla TTS when preparing for training, performing training, and generating audio from your custom TTS. Unfortunately, though this file is very important, the documentation for Mozilla TTS largely glosses over how to customize this file.
To start, create a copy of the default Tacotron config.json file from the Mozilla repo. Then, be sure to customize at least the audio.stats_path, output_path, phoneme_cache_path, and datasets.path file.
You can customize other parameters if you so choose, but the defaults are a good place to start. For example, you can change the run_name to control the naming of folders containing your datasets.
Do not change the datasets.name parameter (leave it set to "ljspeech"); otherwise you'll get strange errors related to an undefined dataset type. It appears that the dataset name refers to the type of data loader used, rather than what you call your data set. Similarly, I haven't risked changing the model setting, since I don't yet know how that value gets used by the system.
Preparing scale_stats.npy
Most of the training configurations rely on a statistics file called scale_stats.npy that's generated based on the training set. You can use the ./TTS/bin/compute_statistics.py script inside the Mozilla TTS repo to generate this file. This script requires your config.json file as an input, and is a good step to sanity check that everything looks good up to this point.
Here's an example of a command you can run if you are inside the Mozilla TTS folder you created at the start of this tutorial (adjust paths to fit your project):
./bin/python ./TTS/bin/compute_statistics.py --config_path /path/to/your/project/config.json --out_path /path/to/your/project/scale_stats.npy
If successful, this will generate a scale_stats.npy file under /path/to/your/project/scale_stats.npy. Be sure that the path in the audio.stats_path setting of your config.json file matches this path.
Training the Model
It's now time for the moment of truth -- it's time to start training your model!
Here's an example of a command you can run to train a Tacotron model if you are inside the Mozilla TTS folder you created at the start of this tutorial (adjust paths to fit your project):
./bin/python ./TTS/bin/train_tacotron.py --config_path /path/to/your/project/config.json
This process will take several hours, if not days. If your machine supports CUDA and has it properly configured, the process will run more quickly than if you are just relying on CPU alone.
If you get any errors related to a "signal error" or "signal received", this typically indicates that your machine does not have enough memory for the operation. You can run the training with less parallelism but it will run much more slowly.
Note, on windows, following GuyPaddock's advice from prior, I had to use pip install -e. instead of leading with ./bin/pip, and I had to use python instead of python3
Might be obvious to someone else but I am not so familiar with python or path shortcuts in shell being customized etc.

I can´t play one sound with winsound, but I can play an other one with same code

I created a Program that creates a GUI with different buttons. Two of them are supposed to play a sound/music file (mp3) in a loop. For the one file it works, but when I try the other one, it just give the windows sound if something can´t open correctly in a loop. There is no error in the console.
I tried to rename the second file, because I thought it could be because of a number in the title, but it didn´t worked. Then I looked if the file can even be opened and played, which it can. It also isn´t because of a coding mistake, because both sounds have the same code-structure.
bgmusic1() works, the other one not:
def bgmusic1():
PlaySound(None, SND_ASYNC)
PlaySound("sounds/test.mp3", SND_LOOP | SND_ASYNC)
print("hi")
def bgmusic2():
PlaySound(None, SND_ASYNC)
PlaySound("sounds/test2.mp3", SND_LOOP | SND_ASYNC)
print("hi")

Playing a sound in a ipython notebook

I would like to be able to play a sound file in a ipython notebook.
My aim is to be able to listen to the results of different treatments applied to a sound directly from within the notebook.
Is this possible? If yes, what is the best solution to do so?
The previous answer is pretty old. You can use IPython.display.Audio now. Like this:
import IPython
IPython.display.Audio("my_audio_file.mp3")
Note that you can also process any type of audio content, and pass it to this function as a numpy array.
If you want to display multiple audio files, use the following:
IPython.display.display(IPython.display.Audio("my_audio_file.mp3"))
IPython.display.display(IPython.display.Audio("my_audio_file.mp3"))
A small example that might be relevant : http://nbviewer.ipython.org/5507501/the%20sound%20of%20hydrogen.ipynb
it should be possible to avoid gooing through external files by base64 encoding as for PNG/jpg...
The code:
import IPython
IPython.display.Audio("my_audio_file.mp3")
may give an error of "Invalid Source" in IE11, try in other browsers it should work fine.
The other available answers added an HTML element which I disliked, so I created the ringbell, which gets you both play a custom sound as such:
from ringbell import RingBell
RingBell(
sample = "path/to/sample.wav",
minimum_execution_time = 0,
verbose = True
)
and it also gets you a one-lines to play a bell when a cell execution takes more than 1 minute (or a custom amount of time for that matter) or is fails with an exception:
import ringbell.auto
You can install this package from PyPI:
pip install ringbell
If the sound you are looking for could be also a "Text-to-Speech", I would like to mention that every time a start some long process in the background, I queue the execution of a cell like this too:
from IPython.display import clear_output, display, HTML, Javascript
display(Javascript("""
var msg = new SpeechSynthesisUtterance();
msg.text = "Process completed!";
window.speechSynthesis.speak(msg);
"""))
You can change the text you want to hear with msg.text.

Combining multiple audio files in Python (with delay)

I'm looking to combine a range of different audio files (mp3) in Python. One of the requirements is that I need to be able to specify a delay at the end of each file. To illustrate, something like:
[file1.mp3--------3 seconds----------][delay---------2 seconds--------][file2.mp3]-------------4 seconds][delay---------2 seconds][file3.mp3----------3 seconds---------]
Does anyone here know of any mp3 libraries that can accomplish this? Python isn't really a necessity here. If it'll be easier in another language, that'll be fine.
I think FFmpeg can do this, given the right arguments. No real need to use a library.
To combine wav or aiff files, you can do something like this: (inspiration from here)
import aifc
def concatenate(*items):
data = []
for item in items:
f = aifc.open(item, 'rb')
data.append([f.getparams(), f.readframes(f.getnframes())])
f.close()
output = aifc.open('output.aif', 'wb')
output.setparams(data[0][0])
for item in data:
output.writeframes(item[1])
output.close()
See the link for the wav format (it's pretty much the same, but with the wave library)
To add silence, I would just make a one second silent file using your favorite audio editor and then concatenate in the proper amount of silence.

Resources