Can't solve TypeError: '>' not supported between instances of 'NoneType' and 'int' - linux

I have a long list of audio files, and some of them are longer than an hour. I am using a Jupyter notebook, Python 3.6, and the TinyTag library to get the duration of each audio file. My code below goes over the files; if a file is longer than an hour, it splits it into one-hour pieces plus a leftover piece shorter than an hour, and copies the pieces as fname_1, fname_2, etc. The code worked for the previous datasets I tried, but this time, after running for a while, I get the error below. I don't know where it is coming from or how to fix it, and I have already read the similarly titled questions, but their contents were different. Thanks in advance.
# fpaths is the list of filepaths
for i in range(0,len(fpaths)):
    fpath=fpaths[i]
    fname=os.path.basename(fpath)
    fname0=os.path.splitext(fname)[0] #name without extension
    tag = TinyTag.get(fname)
    if tag.duration > 3600:
        cmd2 = "ffmpeg -i %s -f segment -segment_time 3600 -c copy %s" %(fpath, fname0) + "_%d.wav"
        os.system(cmd2)
        os.remove(fpath)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-79d0ceebf75d> in <module>()
7 fname0=os.path.splitext(fname)[0]
8 tag = TinyTag.get(fname)
----> 9 if tag.duration > 3600:
10 cmd2 = "ffmpeg -i %s -f segment -segment_time 3600 -c copy %s" %(fpath, fname0) + "_%d.wav"
11 os.system(cmd2)
TypeError: '>' not supported between instances of 'NoneType' and 'int'

It seems that some of those files do not have a duration, so tag.duration is None.
Perhaps change the check to:
if tag.duration and tag.duration > 3600:
    .....
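Applied to the loop from the question, the guard might look like this (a sketch based on the posted code; it simply skips any file whose duration TinyTag cannot determine):
import os
from tinytag import TinyTag

for fpath in fpaths:  # fpaths is the list of filepaths
    fname = os.path.basename(fpath)
    fname0 = os.path.splitext(fname)[0]  # name without extension
    tag = TinyTag.get(fpath)  # passing the full path avoids depending on the working directory
    # Skip files where TinyTag could not determine a duration (tag.duration is None)
    if tag.duration and tag.duration > 3600:
        cmd2 = "ffmpeg -i %s -f segment -segment_time 3600 -c copy %s" % (fpath, fname0) + "_%d.wav"
        os.system(cmd2)
        os.remove(fpath)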

Related

SED style Multi address in Python?

I have an app that parses multiple Cisco show tech files. These files contain the output of multiple router commands in a structured way; let me show you a snippet of a show tech output:
`show clock`
20:20:50.771 UTC Wed Sep 07 2022
Time source is NTP
`show callhome`
callhome disabled
Callhome Information:
<SNIPPET>
`show module`
Mod Ports Module-Type Model Status
--- ----- ------------------------------------- --------------------- ---------
1 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
2 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
3 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
4 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
21 0 Fabric Module N9K-C9504-FM-R ok
22 0 Fabric Module N9K-C9504-FM-R ok
23 0 Fabric Module N9K-C9504-FM-R ok
<SNIPPET>
My app currently uses both SED and Python scripts to parse these files. I use SED to scan the show tech file for a specific command's output; once I find it, I stop SED. This way I don't need to read the whole file (these can be very big files). This is a snippet of my SED script:
sed -E -n '/`show running-config`|`show running`|`show running config`/{
p
:loop
n
p
/`show/q
b loop
}' $1/$file
As you can see, I am using a multi-address range in SED. My question, specifically, is: how can I achieve something similar in Python? I have tried multiple combinations of the DOTALL and MULTILINE flags, but I can't get the result I'm expecting. For example, I can get a match for the command I'm looking for, but the Python regex won't stop until the end of the file after the first match.
I am looking for something like this:
sed -n '/`show clock`/,/`show/p'
I would like the regex match to stop parsing the file and print the results immediately after seeing `show again. Hope that makes sense, and thank you all for reading and for your help.
You can use nested loops.
import re

def process_file(filename):
    with open(filename) as f:
        for line in f:
            if re.search(r'`show running-config`|`show running`|`show running config`', line):
                print(line)
                for line1 in f:
                    print(line1)
                    if re.search(r'`show', line1):
                        return
The inner for loop will start from the next line after the one processed by the outer loop.
You can also do it with a single loop using a flag variable.
import re

def process_file(filename):
    in_show = False
    with open(filename) as f:
        for line in f:
            if re.search(r'`show running-config`|`show running`|`show running config`', line):
                in_show = True
                print(line)
                continue  # don't test the start line against the generic `show stop pattern
            if in_show:
                print(line)
                if re.search(r'`show', line):
                    return
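Either version can then be called with the path to a show tech file; because the function returns as soon as it prints the line carrying the next `show marker, the rest of a large file is never read, which mirrors the early quit (q) in the sed script. A usage sketch (the filename is just a placeholder):
# 'show_tech_output.txt' is a placeholder path, not a file from the question.
process_file('show_tech_output.txt')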

Mimicking bash wc functionalities using python

I have written a very simple Python programme, called wc.py, which mimics the behaviour of bash wc by counting the number of words, lines and bytes in a file. My programme is as follows:
import sys

path = sys.argv[1]
file = open(path)  # open the input file (text mode)

w = 0
l = 0
b = 0
for currentLine in file:
    wordsInLine = currentLine.strip().split(' ')
    wordsInLine = [word for word in wordsInLine if word != '']
    w += len(wordsInLine)
    b += len(currentLine.encode('utf-8'))
    l += 1

#output
print(str(l) + ' ' + str(w) + ' ' + str(b))
In order to execute my programme, you should run the following command:
python3 wc.py [a file to read the data from]
As a result it shows:
[the number of lines in the file] [the number of words in the file] [the number of bytes in the file]
The files I used to test my code are as follows:
file.txt, which contains the following data:
1
2
3
4
Executing "wc file.txt" returns
4 4 8
Executing "python3 wc.py file.txt" returns 4 4 8
Download "Annual enterprise survey: 2020 financial year (provisional) – CSV" from CSV file download
Executing "wc [fileName].csv" returns
37081 500273 5881081
Executing "python3 wc.py [fileName].csv" returns
37081 500273 5844000
and a [something].pdf file
Executing "wc [something].pdf" works.
Executing "python3 code.py" throws the following errors:
Traceback (most recent call last):
File "code.py", line 10, in <module>
for currentLine in file:
File "/usr/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 10: invalid start byte
As you can see, the output of python3 code.py [something].pdf and python3 code.py [something].csv is not the same as what wc returns. Could you help me find the reason for this erroneous behaviour in my code?
Regarding the CSV file, look at the difference between your result and that of wc:
5881081 - 5844000 = 37081, which is exactly the number of lines.
That is, every line in the original file has one additional character: the carriage return \r of a Windows-style \r\n line ending. It gets lost because Python opens the file in text mode and, with the default universal-newline handling, translates \r\n to \n as you iterate over lines. If you want a byte-correct result, you first have to identify the type of line breaks used in the file (and watch out for inconsistencies throughout the document), or count the bytes without that translation.
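If the goal is to reproduce wc's numbers exactly, one option (a sketch, not the original wc.py) is to read the file in binary mode, which sidesteps both the newline translation and the UnicodeDecodeError on non-UTF-8 files such as PDFs:
import sys

path = sys.argv[1]
lines = words = byte_count = 0
with open(path, 'rb') as f:
    for raw_line in f:                  # splits on b'\n' and keeps any b'\r' bytes
        lines += 1
        words += len(raw_line.split())  # split on ASCII whitespace, like wc
        byte_count += len(raw_line)
print(lines, words, byte_count)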

"wave.Error: unknown format: 3" after using librosa.resample. Is there anything wrong with the output of librosa?

I have a .wav file with a sample rate of 44.1 kHz, and I want to resample it to 16 kHz using librosa.resample. The output .wav sounds fine and is 16 kHz, but I get an error when I try to read it with wave.open.
This problem is quite similar to mine:
Opening a wave file in python: unknown format: 49. What's going wrong?
This is my code:
import wave
import librosa

if __name__ == "__main__":
    input_wav = '1d13eeb2febdb5fc41d3aa7db311fa33.wav'
    output_wav = 'result.wav'
    y, sr = librosa.load(input_wav, sr=None)
    print(sr)
    y = librosa.resample(y, orig_sr=sr, target_sr=16000)
    librosa.output.write_wav(output_wav, y, sr=16000)
    wave.open(output_wav)
And I get an error at the last step, wave.open(output_wav).
The exception is the following:
Traceback (most recent call last):
File "/Users/range/Code/PycharmProjects/Speaker/test.py", line 204, in <module>
wave.open(output_wav)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/wave.py", line 499, in open
return Wave_read(f)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/wave.py", line 163, in __init__
self.initfp(f)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/wave.py", line 143, in initfp
self._read_fmt_chunk(chunk)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/wave.py", line 260, in _read_fmt_chunk
raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 3
I just don't know why wave.open can't read the wav file, and I need the resampled wav for my further work.
I wonder if librosa.output.write_wav changed the sample format of the wav.
So I had to write the resample function myself. Fortunately, it works.
This is my code:
import wave
import numpy as np

def resample(input_wav, output_wav, tar_fs=16000):
    audio_file = wave.open(input_wav, 'rb')
    audio_data = audio_file.readframes(audio_file.getnframes())
    audio_data_short = np.fromstring(audio_data, np.short)
    src_fs = audio_file.getframerate()
    dtype = audio_data_short.dtype
    audio_len = len(audio_data_short)
    audio_time_max = 1.0*(audio_len-1) / src_fs
    src_time = 1.0 * np.linspace(0, audio_len, audio_len) / src_fs
    tar_time = 1.0 * np.linspace(0, np.int(audio_time_max*tar_fs), np.int(audio_time_max*tar_fs)) / tar_fs
    output_signal = np.interp(tar_time, src_time, audio_data_short).astype(dtype)
    with wave.open(output_wav, 'wb') as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(tar_fs)
        f.writeframes(output_signal)
I hope you can help me understand what went wrong when resampling the wav with librosa, and I'm glad if my code can help other people who have the same problem. :)
I was working on a project and had the same error, so I dug in a bit and found that the issue is the default way in which librosa writes the wave file with write_wav() in the output module.
The problem is that the samples are stored as floating-point PCM (WAVE format tag 3, IEEE float), which the standard-library wave module does not understand; it only reads integer PCM.
You can change the sample encoding easily by using SoX, a cross-platform command-line utility which you can use to control specifics like the encoding format.
For example, you would do something like this to convert the float encoding to 16-bit signed-integer encoding:
sox audio.wav -b 16 -e signed-integer modified_audio.wav
(For Linux users): An alternative to SoX, since I couldn't use it. I successfully converted the file with ffmpeg in the terminal using the command:
ffmpeg -i input_wav.wav -ar 44100 -ac 1 -acodec pcm_s16le output_wav.wav
where -ar is the audio sample rate and -ac is the number of audio channels (pcm_s16le selects 16-bit signed-integer PCM).

Is it possible to delete the file if UnicodeEncodeError occur? [duplicate]

This question already has an answer here:
How to catch all exceptions in Try/Catch Block Python?
(1 answer)
Closed 3 years ago.
My code below goes through each .m4v file in the list and converts it to a .wav file using FFmpeg, and it works. I use a Python 3 Jupyter environment.
for fpath in list:
    if (fpath.endswith(".m4v")):
        cdir=os.path.dirname(fpath)
        os.chdir(cdir)
        filename=os.path.basename(fpath)
        os.system("ffmpeg -i {0} temp_name.wav".format(filename))
        ofnamepath=os.path.splitext(fpath)[0]
        temp_name=os.path.join(cdir, "temp_name.wav")
        new_name = os.path.join(ofnamepath+'.wav')
        os.rename(temp_name,new_name)
        old_name=os.path.join(ofnamepath+'.m4v')
        os.remove(old_name)
However, for this particular dataset I get the following error:
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-10-bd3b17e409fa> in <module>()
      7 os.chdir(cdir)
      8 filename=os.path.basename(fpath)
----> 9 os.system("ffmpeg -i {0} temp_name.wav".format(filename))
     10 ofnamepath=os.path.splitext(fpath)[0]
     11 temp_name=os.path.join(cdir, "temp_name.wav")

UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-16: ordinal not in range(128)
Is it possible to add a check to the code, something like:
if 'UnicodeEncodeError: 'ascii' codec can't encode' occurs,
delete that file and continue to the next file?
You can use a try and except block.
If an exception occurs inside a try block, execution jumps to the except block. What's better is that you can even specify which exception to catch.
Adding this to your code would look something like:
for fpath in list:
    if (fpath.endswith(".m4v")):
        cdir=os.path.dirname(fpath)
        os.chdir(cdir)
        filename=os.path.basename(fpath)
        try:
            os.system("ffmpeg -i {0} temp_name.wav".format(filename))
        except UnicodeEncodeError:
            print("Some failure message.. Continuing to next..")
            # os.remove(filename)
            continue  # This skips the rest of the current iteration and jumps to the top of the loop.
        ofnamepath=os.path.splitext(fpath)[0]
        temp_name=os.path.join(cdir, "temp_name.wav")
        new_name = os.path.join(ofnamepath+'.wav')
        os.rename(temp_name,new_name)
        old_name=os.path.join(ofnamepath+'.m4v')
        os.remove(old_name)
Uncomment the # os.remove(filename) to have your files deleted. Are you sure you want to permanently delete them?

Problems Reading Zip of Shapefiles without loading memory

I've been trying to adapt Andrew Gaidus' shapefile-reading routine to my needs. The Jupyter Notebook I'm using acts as if it has partitioned the disk of my MacBook Pro, so I can't read or write to disk. Gaidus has a good procedure for avoiding the disk, but it is written for a prior version of Python.
Here is the code:
dls = "https://github.com/ItsMeLarry/Coursera_Capstone/raw/master/tl_2010_25009_tract00%202.zip"
lynntracts = ZipFile(io.BytesIO(urllib.request.urlopen(dls).read()))
print("Done")
filenames = [y for y in sorted(lynntracts.namelist()) for ending in ['dbf', 'prj', 'shp', 'shx'] if y.endswith(ending)]
#For some reason, I get 8, instead of 4, filenames. The first 4 start with __MACOSX. I get rid of those. The problem I
#have with the 'TypeError' occurs no matter which set of 4 files I use.
print(filenames[0], 'Example of the 4 files that I remove in the for loop')
for i in range(0,4):
    del filenames[0]
print(filenames)
dbf, prj, shp, shx = [io.StringIO(ZipFile.read(filename)) for filename in filenames]
r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
print(r.numRecords)
Opening with io.BytesIO cured the prior problem of a bytes/str collision. Now I see the TypeError for ZipFile.read; I get the same error if I use io.BytesIO when calling it. Here is the error output followed by the error info:
Done
__MACOSX/tl_2010_25009_tract00/._tl_2010_25009_tract00.dbf Example of the 4 files that I remove in the for loop
['tl_2010_25009_tract00/tl_2010_25009_tract00.dbf', 'tl_2010_25009_tract00/tl_2010_25009_tract00.prj', 'tl_2010_25009_tract00/tl_2010_25009_tract00.shp', 'tl_2010_25009_tract00/tl_2010_25009_tract00.shx']
TypeError Traceback (most recent call last)
in ()
12 del filenames[0]
13 print(filenames)
---> 14 dbf, prj, shp, shx = [io.StringIO(ZipFile.read(filename)) for filename in filenames]
15 r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
16 print(r.numRecords)
in (.0)
12 del filenames[0]
13 print(filenames)
---> 14 dbf, prj, shp, shx = [io.StringIO(ZipFile.read(filename)) for filename in filenames]
15 r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
16 print(r.numRecords)
TypeError: read() missing 1 required positional argument: 'name'
Clearly, I am a beginner. I've come up empty handed trying to research this. Where do I go? What do I need to understand here? Thanks
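For what it is worth (this is my assumption, not an answer from the original thread): the TypeError suggests that read() is being called on the ZipFile class itself rather than on the opened archive, and the .shp/.shx/.dbf members are binary, so they belong in io.BytesIO rather than io.StringIO. A sketch of the likely fix:
import io
# Call read() on the opened archive (lynntracts), not on the ZipFile class,
# and keep the data as bytes: shapefile members are binary.
dbf, prj, shp, shx = [io.BytesIO(lynntracts.read(filename)) for filename in filenames]
r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
print(r.numRecords)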
