Dynamic Time Warp with Speech Signal Processing Toolkit (SPTK) output - audio

I'm an IT student with an assignment on Dynamic Time Warping (DTW): using the Speech Signal Processing Toolkit (SPTK), I have to compare words spoken by two speakers and find the similarities.
I got SPTK working and collected recordings from 8 people (4 female, 4 male), each saying the same 8 words, saved as .wav files.
My .wav files are: RIFF (little-endian) data, WAVE audio, mono 16000 Hz.
I converted every .wav file into a .short data file.
I then converted every .short file to a .mcep file with this command:
x2x +sf < source_maleA.short | frame -l 400 -p 80 | window -l 400 -L 512 | mcep -l 512 -m 20 -a 0.42 > source_maleA.mcep
After that, I compared the .mcep files with this command:
dtw -m 24 target_maleB.mcep < source_maleA.mcep > source_maleA_target_maleB.dtw
The output of that command should be a numeric value (probably a float, double or int) or a few values. The problem is that I'm not sure how to open the resulting .dtw files, and the documentation I have says little about them. When I open one in an editor or cat it in the terminal, I get strange characters as output [picture 1].
The documentation does say, however, that the -s [Score] parameter gives the score of the DTW process. So I tried this command:
dtw -m 24 -s Scorefile target_maleB.mcep < source_maleA.mcep > source_maleA_target_maleB.dtw
I get a value, but in a strange format.
I searched online and in a lot of documentation for information about the .dtw file and couldn't find anything. I tried to convert the result into another format, but had no luck with that.
I tried to contact my mentor about it, but have had no answer so far, and it's been a while already.
Could anyone give me a suggestion on what to do?
The documentation can be found on this site: http://sp-tk.sourceforge.net/ (sorry it's not a proper link; I don't have enough reputation yet, and I'll remove it if I have to). I don't think it's really needed, though: I believe I understand the DTW process and have done it correctly, and it's only the output that is causing me problems.
Thanks in advance,
Marco.
picture 1

The score file contains binary float values, so you have to convert it to ASCII text with the x2x command from SPTK:
x2x +fa scorefile.bin > scorefile.txt
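If SPTK's x2x is not at hand, the same conversion can be sketched in Python. This is only an illustration under two assumptions: the file is a headerless stream of 4-byte floats (SPTK's usual default build), and it was written on a machine with the same byte order as the one reading it. The function name read_sptk_floats is made up for this example.

```python
import struct

def read_sptk_floats(path):
    """Read a headerless file of 4-byte native-endian floats into a list.

    Assumption: the file holds raw float32 values with no header,
    as written by an SPTK build that uses 4-byte floats.
    """
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 4  # ignore any trailing partial value
    return list(struct.unpack("=%df" % count, data[:count * 4]))
```

Calling read_sptk_floats("scorefile.bin") would then return the score as ordinary Python floats that can be printed or compared.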

SPARK encoding issue while reading a csv with multiline=true option

I am stuck on an issue while trying to read a csv file that contains characters like Ř and Á with the multiline=true option in Spark. The csv is read as UTF-8, but when we read the data with multiline=true, the characters we get back are not the ones in the file. We get something like ŘÃ�, so a word that should read ZŘÁKO is transformed into ZŘÃ�KO. I went through several other Stack Overflow questions about the same issue, but none of the solutions actually works!
I tried the following encodings for the read/write operations: US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16, SJIS and a couple more, but none of them gave me the expected result. With multiline=false, however, the output is somehow correct.
I cannot read/write the file as text, because the project's ingestion framework reads the file only once and is expected to do everything else in memory, and I must use multiline=true.
I would really appreciate any thoughts on this matter. Thank you!
Sample data:
sample data:
id|name
1|ZŘÁKO
df = (spark.read.format('csv')
      .option('header', True)
      .option('delimiter', '|')
      .option('multiline', True)
      .option('encoding', 'utf-8')
      .load())
df.show()
Output:
1|Z�KO
# Trying to force UTF-8 encoding as below:
df.withColumn("name", sql.functions.encode("name", 'utf-8'))
gives me this:
1|[22 5A c3..]
I tried the above steps with all the encodings supported by Spark.

Convert ANSYS MECHANICAL files to VTK using APDL

I have followed the script provided by DaveD here:
How to read Ansys data files in ParaView?
But I am unable to get a result that Paraview can import. I attach a few screenshots, because several warnings came out while the script was being run. I got a vtk as output (360 MB, so I guess it contains something...), but Paraview displays the following error:
ERROR: In C:\glr\builds\paraview\paraview-ci\source-paraview\VTK\IO\Legacy\vtkUnstructuredGridReader.cxx, line 320
vtkUnstructuredGridReader (000001CECD70BC00): Unrecognized keyword: 0.00000e+00
I have never used APDL, so I will be happy if the author of the script or someone experienced using it could tell me what I did wrong (I continued clicking "yes" through all the windows and I got the output.vtk as I mentioned)
Thanks a lot in advance

ffmpeg - how to detect if video crop is completed?

Thanks in advance.
I'm trying to crop a .mp4 video using an ffmpeg binary (within the context of an electron-react-app).
(The binary is run in a child process using execFile() and outputs to a temp folder which is later deleted)
ffmpeg varies considerably in the time it takes to complete the creation of a cropped video file (1sec to 18sec) depending on the computer (mac vs Windows).
I need to read the cropped video file.
I've set up an event listener in the main process of Electron:
if (!monitorCroppedFile) {
  console.log(`${croppedFilePath} doesn't exist`);
} else {
  console.log(`${croppedFilePath} exists !`);
  ...readFile...;
}
Once monitorCroppedFile is true, I read the file using fs.readFile().
The problem is that ffmpeg initially creates the cropped file path, but it sometimes takes ages to complete the cropping.
This results in the read file often being blank (as the read is triggered on detecting the file path of the cropped file).
I've tried using -preset ultrafast in the ffmpeg arguments but this only improves things on Windows marginally.
The problem doesn't occur on Macs.
Can anybody suggest a possible solution ? Is there a way to detect when the crop is fully completed ?
Many thanks.
Add -progress FILE to your command, where FILE is a filename. ffmpeg will log its processing status to that file. Watch it for the line progress=end; once that line appears, the crop is complete and you can read the cropped file.
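The check described in the answer above can be sketched as follows. The sketch is in Python purely for illustration (the same line-by-line test ports directly to Node), and the function name crop_finished is hypothetical. It relies on ffmpeg writing its progress log as key=value lines, where each block ends with progress=continue and the last one ends with progress=end.

```python
def crop_finished(progress_text):
    """Return True once the progress log contains a final progress=end line.

    progress_text is the current content of the file passed to -progress.
    """
    lines = [line.strip() for line in progress_text.splitlines()]
    return "progress=end" in lines
```

The caller would re-read the progress file periodically and only open the cropped video once this returns True.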

RE: Transferring Python2 to Python3 on This Specific Line

I am trying to make this line from a Python 2 code base acceptable to Python 3:
Here is the error:
TypeError: unicode strings are not supported, please encode to bytes:
'$PMTK251,9600*17\r\n'
Can anyone tell me why this happens, or how I can change the line to suit Python 3?
It is a GPS code base written in Python 2 that still works, but Python 2 support is ending (or has pretty much ended already).
So my idea was to update that line and others.
In Python 3, I receive errors relating to bytes, and I have read about passing (arg, newline='') when writing .csv files in Python 3.
I am still at a loss as to how to make this specific line work in Python 3.
I can offer more about the line or the rest of the source if necessary. I received this source from toptechboy.com. I do not think the author ever updated it to work with Python 3.
class GPS:
    def __init__(self):
        #This sets up variables for useful commands.
        #This set is used to set the rate the GPS reports
        UPDATE_10_sec = "$PMTK220,10000*2F\r\n" #Update Every 10 Seconds
        UPDATE_5_sec = "$PMTK220,5000*1B\r\n" #Update Every 5 Seconds
        UPDATE_1_sec = "$PMTK220,1000*1F\r\n" #Update Every One Second
        UPDATE_200_msec = "$PMTK220,200*2C\r\n" #Update Every 200 Milliseconds
        #This set is used to set the rate the GPS takes measurements
        MEAS_10_sec = "$PMTK300,10000,0,0,0,0*2C\r\n" #Measure every 10 seconds
        MEAS_5_sec = "$PMTK300,5000,0,0,0,0*18\r\n" #Measure every 5 seconds
        MEAS_1_sec = "$PMTK300,1000,0,0,0,0*1C\r\n" #Measure once a second
        MEAS_200_msec = "$PMTK300,200,0,0,0,0*2F\r\n" #Measure 5 times a second
        #Set the Baud Rate of GPS
        BAUD_57600 = "$PMTK251,57600*2C\r\n" #Set Baud Rate at 57600
        BAUD_9600 = "$PMTK251,9600*17\r\n" #Set 9600 Baud Rate
        #Commands for which NMEA Sentences are sent
        ser.write(BAUD_57600)
        sleep(1)
        ser.baudrate = 57600
        GPRMC_ONLY = "$PMTK314,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0*29\r\n" #Send only the GPRMC Sentence
        GPRMC_GPGGA = "$PMTK314,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0*28\r\n" #Send GPRMC AND GPGGA Sentences
        SEND_ALL = "$PMTK314,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0*28\r\n" #Send All Sentences
        SEND_NOTHING = "$PMTK314,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0*28\r\n" #Send Nothing
        ...
That is the GPS class Mr. McWhorter wrote for a GPS module in Python 2. I am trying to turn this Python 2 source into a workable Python 3 class.
I am receiving errors like "needs to be bytes" and/or "cannot use bytes here".
Anyway, if you are handy with Python 3 and know where I am going wrong in porting this source, please let me know. I have tried changing the source many times to accept bytes and to be read as a UTF-8 string.
"Best way to convert string to bytes in Python 3?" seems to be the most popular topic on this subject, but it does not answer my question so far (I think).
This line simply works when you add a b (for bytes) in front of the string, like so:
(b'$PMTK251,9600*17\r\n')
That should get rid of the error "TypeError: unicode strings are not supported, please encode to bytes".
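Where the command strings are built at run time and cannot simply be given a b prefix, encoding them just before the write has the same effect. The following is a minimal sketch; the helper name to_serial_bytes is hypothetical and not part of the original source, and it assumes the commands contain only ASCII characters, which holds for the $PMTK sentences shown in the question.

```python
def to_serial_bytes(command):
    """Return a command as bytes, accepting either str or bytes."""
    if isinstance(command, bytes):
        return command  # already bytes, nothing to do
    return command.encode("ascii")  # NMEA/PMTK sentences are plain ASCII

BAUD_9600 = "$PMTK251,9600*17\r\n"
payload = to_serial_bytes(BAUD_9600)
```

With the serial object from the question, the write would then look like ser.write(to_serial_bytes(BAUD_57600)).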

About Tkinter, Python 2.7.6 on Linux Mint 17.2

I have 2 functions as below:
def select_audio():
    os.chdir("/home/norman/songbook")
    top1.lower(root)
    name = tkFileDialog.askopenfilename()
    doit = "play " + name
    top1.lift(root)
    os.system(doit)
def select_video():
    os.chdir("/home/norman/Videos")
    top2.lower(root)
    name = tkFileDialog.askopenfilename()
    doit = "mpv --fs " + name
    top2.lift(root)
    os.system(doit)
They are selected from buttons to allow choosing and playing audio files or video files.
They work to some extent.
The videos are in a different directory, at the same level as the audio directory.
Whichever I choose first, I see the correct directory, so I can play, say, a video. But if I choose audio after it has finished, the dialog still shows the video directory.
Similarly, if I choose audio first and then select videos, it still shows the audio directory.
I have no idea why it does this. I am not an experienced programmer as you can probably tell from the code.
Some suggestions:
Use a raw string to make sure that Python doesn't try to interpret anything following a \ as an escape sequence:
Change os.chdir("/home/norman/whatever") to os.chdir(r"/home/norman/whatever")
It won't solve this problem, but it will help you avoid future problems.
For tkFileDialog, use the initialdir option:
Change name=tkFileDialog.askopenfilename() to
name=tkFileDialog.askopenfilename(initialdir=r"/home/norman/whatever", parent=root)
