How to define the condition of a corrupted audio file in Python - python-3.x

I am using Python 3.6 in a Jupyter notebook connected to a remote machine. I have a large dataset of mp3 files and use FFmpeg (version 2.8.14-0ubuntu0.16.04.1) to convert the mp3 files to wav format.
My code below goes over the file path list; if a file is an mp3, it converts it to wav and deletes the mp3. The code works, but for a few files it stops and gives an error. I opened those files and saw that they have no duration, and each of them shows a size of 600 in the terminal's folder size column, though that might be a coincidence. The error is "file not found" for 'temp_name.wav'.
I can see that these corrupted files cannot be converted to wav. When I delete them manually and run the code again, it works. But I have large datasets and cannot know beforehand which files are corrupted. Is there a way to make the code check, before converting a file to wav, whether it is corrupted, delete it if so, and continue to the next file? I just don't know how to define the condition of a corrupted file, or of a file that cannot be converted to wav.
# npaths is the list of full file paths
for fpath in npaths:
    if fpath.endswith(".mp3"):
        cdir = os.path.dirname(fpath)        # extract the directory of the file
        os.chdir(cdir)                       # change the working directory to cdir
        filename = os.path.basename(fpath)   # extract the filename from the path
        os.system("ffmpeg -i {0} temp_name.wav".format(filename))
        ofnamepath = os.path.splitext(fpath)[0]            # filename without extension
        temp_name = os.path.join(cdir, "temp_name.wav")
        new_name = ofnamepath + '.wav'
        os.rename(temp_name, new_name)       # use the original filename with a wav extension
        old_file = ofnamepath + '.mp3'       # find and delete the mp3
        os.remove(old_file)
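One way to handle this, as a sketch rather than a definitive fix: let FFmpeg itself tell you whether the file is corrupted. If the conversion is run through subprocess.run instead of os.system, a corrupted or undecodable mp3 shows up as a non-zero exit code (or as a missing output file), and the script can delete the mp3 and move on. The snippet below assumes the same npaths list as above and writes straight to the final .wav name, which also avoids the temp_name.wav rename step:
import os
import subprocess

# npaths is the same list of full file paths as above
for fpath in npaths:
    if not fpath.endswith(".mp3"):
        continue
    new_name = os.path.splitext(fpath)[0] + ".wav"

    # Run ffmpeg and capture its exit status; a corrupted mp3 makes it fail.
    result = subprocess.run(
        ["ffmpeg", "-y", "-i", fpath, new_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    if result.returncode != 0 or not os.path.isfile(new_name):
        # Conversion failed: treat the mp3 as corrupted, clean up, continue.
        print("Skipping corrupted file:", fpath)
        if os.path.isfile(new_name):
            os.remove(new_name)        # remove any partial output
        os.remove(fpath)               # delete the corrupted mp3
        continue

    os.remove(fpath)                   # conversion succeeded, delete the original mp3
subprocess.run has been available since Python 3.5, so it works in the Python 3.6 setup described above.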

Related

Using Python to copy contents of multiple files and paste in a main file

I'll start by mentioning that I have no knowledge of Python, but I read online that it could help me with my situation.
I'd like to do a few things using (I believe?) a Python script.
I have a bunch of .yml files whose contents I want to transfer into one main .yml file (let's call it Main.yml). However, I'd also like to take the name of each individual .yml file and add it before its contents in Main.yml as "##Name". If possible, the script would look at each file in a directory, instead of my having to list every .yml file I want it to process (the directory in question only contains .yml files). Not sure if I need to specify, but just in case: I want to append the contents of all files into Main.yml and keep the indentation (spacing). P.S. I'm on Windows.
Example of what I want:
File: Apes.yml
Contents:
Documentation:
  "Apes":
    year: 2009
    img: 'link'
After running the script, my Main.yml would look like:
##Apes.yml
Documentation:
  "Apes":
    year: 2009
    img: 'link'
I'm just starting out in Python too so this was a great opportunity to see if my newly learned skills work!
I think you want to use the os.walk function to go through all of the files and folders in the directory.
This code should work. It assumes your files are stored in a folder called "Folder", which is a subfolder of the directory where your Python script is stored.
# This ensures that you have the correct library available
import os

# Open a new file to write to
output_file = open('output.txt', 'w+')

# This starts the 'walk' through the directory
for folder, sub_folders, files in os.walk("Folder"):
    # For each file...
    for f in files:
        # create the current path using the folder variable plus the file variable
        current_path = folder + "\\" + f
        # write the filename & path to the currently open output file
        output_file.write(current_path + "\n")
        # Open the file to read its contents
        current_file = open(current_path, 'r')
        # read each line one at a time and then write them to your output file
        for line in current_file:
            output_file.write(line)
        # close the file
        current_file.close()

# close your output file
output_file.close()
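The code above writes everything to output.txt; to produce the exact format asked for (a Main.yml with a "##Name" header before each file's contents), a small variation of the same os.walk approach could look like this sketch, where the folder name "Folder" and the output name "Main.yml" are placeholders to adjust:
import os

# Collect every .yml file under "Folder" into Main.yml,
# preceding each one's contents with a "##<filename>" header.
with open('Main.yml', 'w') as output_file:
    for folder, sub_folders, files in os.walk("Folder"):
        for f in files:
            if not f.endswith(".yml"):
                continue
            current_path = os.path.join(folder, f)
            output_file.write("##" + f + "\n")          # e.g. ##Apes.yml
            with open(current_path, 'r') as current_file:
                output_file.write(current_file.read())  # copies the file verbatim, so indentation is kept
            output_file.write("\n")                     # blank line between files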

Extract tar.gz{some integer} in python

I am trying to extract a file whose name has this format --> filename.tar.gz10
I have tried multiple ways, but for all of them I get an error saying it is an unknown format. It works fine for files ending with tar.gz00. I tried changing the name, but it still does not work.
Here is what I have tried:
import tarfile
file = tarfile.open('filename.tar.gz10')
file.extractall('./extracted_path')
file.close()
Another way is,
shutil.unpack_archive('./filename.tar.gz10', './extracted_path', 'tar.gz17')
Thanks for your help in advance.
This could be because the archive was split into smaller chunks; on Linux you could do so using the split -b command, so one big file is actually multiple smaller ones now, and they are named like
file.tar.gz01
file.tar.gz02
file.tar.gz03
file.tar.gz04
etc...
You won't be able to decompress these files individually, so you have to concatenate them first into one file and then decompress.
To verify whether it was split or not, run file {filename}; if it does not recognize it as a gzip compressed archive, then it was probably split (this is why you get the unknown format error).
You can try to do the following:
from glob import glob
import os

path = '/path/to/'                                # location of your files
list_of_files = sorted(glob(path + '*.tar.gz*'))  # list all chunks in order
# build a bash command that concatenates the chunks into a single archive
bash_command = 'cat ' + ' '.join(list_of_files) + ' > ' + path + 'filename.tar.gz'
os.system(bash_command)
# the combined archive can then be extracted as usual
os.system('tar -xzf ' + path + 'filename.tar.gz')
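If you would rather stay in Python instead of shelling out, the same idea (concatenate the chunks, then extract) can be done with the standard library alone. This is a sketch that assumes the chunks sort into the right order by name, as split -b produces them, and uses combined.tar.gz as a placeholder name for the re-assembled archive:
import tarfile
from glob import glob

# Concatenate the split chunks back into one archive, in order.
chunks = sorted(glob('filename.tar.gz*'))
with open('combined.tar.gz', 'wb') as combined:
    for chunk in chunks:
        with open(chunk, 'rb') as part:
            combined.write(part.read())

# The re-assembled file is a normal gzipped tar, so tarfile can open it.
with tarfile.open('combined.tar.gz', 'r:gz') as archive:
    archive.extractall('./extracted_path')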

How do I write a python script to read through files in a Linux directory and perform certain actions?

I need to write a Python script that reads through the files in a directory and retrieves the header record (which contains a date) from each one. I then need to compare the date in each file's header record with the current date, and if the difference is greater than 30 days, delete the file.
I managed to come up with the code below, but I am not sure how to proceed since I am new to Python.
Example:
Sample file in the directory (/tmp/ah): abcdedfgh1234.123456
Header record : FILE-edidc40: 20200602-123539 46082 /tmp/ah/srcfile
I have the code below to list the files in the current directory. I need the Python equivalent of the following actions on the Unix files:
head -1 file | cut -c 15-22
Output: 20200206 (to compare with the current date, and if the file is older than 30 days, delete it, as rm would).
import os

def files(path):
    # yield only the plain files (not directories) in the given path
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

for file in files("."):   # prints the list of files
    print(file)
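To go from listing files to deleting the old ones, one possible sketch, assuming the date always sits in characters 15-22 of the first line (the same slice as head -1 file | cut -c 15-22), is to read just the header line of each file, parse that slice with datetime.strptime, and remove the file when it is more than 30 days old:
import os
from datetime import datetime, timedelta

path = "/tmp/ah"                       # directory to scan
cutoff = datetime.now() - timedelta(days=30)

for name in os.listdir(path):
    full_path = os.path.join(path, name)
    if not os.path.isfile(full_path):
        continue
    with open(full_path) as fh:
        header = fh.readline()         # header record, e.g. "FILE-edidc40: 20200602-123539 ..."
    date_str = header[14:22]           # characters 15-22, same as cut -c 15-22
    try:
        file_date = datetime.strptime(date_str, "%Y%m%d")
    except ValueError:
        continue                       # skip files whose header does not match the expected layout
    if file_date < cutoff:
        os.remove(full_path)           # Python equivalent of rm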

Change huge amount of data from NIST to RIFF wav file

So, I am writing a speech recognition program. To do that I downloaded 400MB of data from TIMIT. When I intended to read the wav files, I tried two libraries, as follows:
import scipy.io.wavfile as wavfile
import wave
(fs, x) = wavfile.read('../data/TIMIT/TRAIN/DR1/FCJF0/SA1.WAV')
w = wave.open('../data/TIMIT/TRAIN/DR1/FCJF0/SA1.WAV')
In both cases they have the same problem: the wav file format says 'NIST', and it must be in 'RIFF' format. (I also read something about sph, but the NIST files I downloaded are .wav, not .sph.)
I then downloaded SoX from http://sox.sourceforge.net/
I added the path correctly to my environment variables so that my cmd recognizes sox, but I can't really figure out how to use it correctly.
What I need now is a script or something to make sox change EVERY wav file from NIST to RIFF format under a certain folder and its subfolders.
EDIT:
In "reading a WAV file from TIMIT database in python" I found a response that worked for me:
Running sph2pipe -f wav input.wav output.wav
What I need is a script or something that searches a folder and all of its subfolders for .wav files and applies that command to each one.
Since forfiles is a Windows command, here is a solution for unix.
Just cd to the upper folder and type:
find . -name '*.WAV' | parallel -P20 sox {} '{.}.wav'
You need to have parallel and sox installed, though; on a Mac you can get both via brew install. Hope this helps.
Ok, I got it finally. Go to the upper folder and run this code:
forfiles /s /m *.wav /c "cmd /c sph2pipe -f wav #file #fnameRIFF.wav"
This code searches for every file and makes it readable for the Python libraries. Hope it helps!
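If you prefer to drive the conversion from Python itself (so the same script works outside cmd), a sketch along these lines walks the folder tree and calls sph2pipe on every .WAV file; it assumes sph2pipe is on your PATH, and the _RIFF.wav output suffix is just a placeholder naming choice so the originals are not overwritten:
import os
import subprocess

root = '../data/TIMIT'                 # top folder to search

for folder, sub_folders, files in os.walk(root):
    for name in files:
        if not name.upper().endswith('.WAV'):
            continue
        src = os.path.join(folder, name)
        dst = os.path.join(folder, os.path.splitext(name)[0] + '_RIFF.wav')
        # sph2pipe rewrites the NIST/SPHERE header as a standard RIFF header
        subprocess.run(['sph2pipe', '-f', 'wav', src, dst])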

Problems reading .bz2 or .tar.bz2 files as hdf5 in R

I downloaded some files with the extension .tar.bz2. I was able to untar these into folders containing .bz2 files. These should unzip as HDF5 files (the metadata said they were HDF5), but they unzip into files with no extensions. I have tried the following, but it didn't work:
untar("File.tar.bz2")
#Read lines of one of the files from the unzipped file
readLines(bzfile("File1.bz2"))
[1] "‰HDF" "\032"
library(rhdf5)
#Explore just as a bzip2 file
bzfile("File1.bz2")
description "File1.bz2"
class "bzfile"
mode "rb"
text "text"
opened "closed"
can read "yes"
can write "yes"
#Try to read as hdf5 using rhdf5 library
h5ls(bzfile("File1.bz2"))
Error in h5checktypeOrOpenLoc(). Argument neither of class H5IdComponent nor a character.
Is there some sort of encoding I need to do? What am I missing? What should I do?
