Is it possible to delete the file if UnicodeEncodeError occur? [duplicate] - python-3.x

This question already has an answer here:
How to catch all exceptions in Try/Catch Block Python?
(1 answer)
Closed 3 years ago.
My code below goes through each .m4v file in the list and converts them to a .wav file using FFmpeg, and it works. I use python 3 jupyter environment.
for fpath in list:
if (fpath.endswith(".m4v")):
cdir=os.path.dirname(fpath)
os.chdir(cdir)
filename=os.path.basename(fpath)
os.system("ffmpeg -i {0} temp_name.wav".format(filename))
ofnamepath=os.path.splitext(fpath)[0]
temp_name=os.path.join(cdir, "temp_name.wav")
new_name = os.path.join(ofnamepath+'.wav')
os.rename(temp_name,new_name)
old_name=os.path.join(ofnamepath+'.m4v')
os.remove(old_name)
However, for this particular dataset I get the following error;
> UnicodeEncodeError Traceback (most recent call
> last) <ipython-input-10-bd3b17e409fa> in <module>()
>
>
> > 7 os.chdir(cdir)
> > 8 filename=os.path.basename(fpath)
> > ----> 9 os.system("ffmpeg -i {0} temp_name.wav".format(filename))
> > 10 ofnamepath=os.path.splitext(fpath)[0]
> > 11 temp_name=os.path.join(cdir, "temp_name.wav")
>
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 10-16: ordinal not in range(128)
Is it possible to do add an if comment line in the code something like;
if 'UnicodeEncodeError: 'ascii' codec can't encode'
delete that file and continue to the next file?

You can use a try and except block.
If an exception occurs while inside a try block, it will jump to the exception block. What's better is that you can even specify the exception.
Adding this to your code would look something like:
for fpath in list:
if (fpath.endswith(".m4v")):
cdir=os.path.dirname(fpath)
os.chdir(cdir)
filename=os.path.basename(fpath)
try:
os.system("ffmpeg -i {0} temp_name.wav".format(filename))
except UnicodeEncodeError:
print("Some failure message.. Continuing to next..")
# os.remove(filename)
continue # This skips the rest of the current iteration and jumps to the top of the loop.
ofnamepath=os.path.splitext(fpath)[0]
temp_name=os.path.join(cdir, "temp_name.wav")
new_name = os.path.join(ofnamepath+'.wav')
os.rename(temp_name,new_name)
old_name=os.path.join(ofnamepath+'.m4v')
os.remove(old_name)
Uncomment the # os.remove(filename) to have your files deleted. Are you sure you want to permanently delete them?

Related

SED style Multi address in Python?

I have an app that parses multiple Cisco show tech files. These files contain the output of multiple router commands in a structured way, let me show you an snippet of a show tech output:
`show clock`
20:20:50.771 UTC Wed Sep 07 2022
Time source is NTP
`show callhome`
callhome disabled
Callhome Information:
<SNIPET>
`show module`
Mod Ports Module-Type Model Status
--- ----- ------------------------------------- --------------------- ---------
1 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
2 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
3 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
4 52 16x10G + 32x10/25G + 4x100G Module N9K-X96136YC-R ok
21 0 Fabric Module N9K-C9504-FM-R ok
22 0 Fabric Module N9K-C9504-FM-R ok
23 0 Fabric Module N9K-C9504-FM-R ok
<SNIPET>
My app currently uses both SED and Python scripts to parse these files. I use SED to parse the show tech file looking for a specific command output, once I find it, I stop SED. This way I don't need to read all the file (these can get to be very big files). This is a snipet of my SED script:
sed -E -n '/`show running-config`|`show running`|`show running config`/{
p
:loop
n
p
/`show/q
b loop
}' $1/$file
As you can see I am using a multi address range in SED. My question specifically is, how can I achieve something similar in python? I have tried multiple combinations of flags: DOTALL and MULTILINE but I can't get the result I'm expecting, for example, I can get a match for the command I'm looking for, but python regex wont stop until the end of the file after the first match.
I am looking for something like this
sed -n '/`show clock`/,/`show/p'
I would like the regex match to stop parsing the file and print the results, immediately after seeing `show again , hope that makes sense and thank you all for reading me and for your help
You can use nested loops.
import re
def process_file(filename):
with open(filename) as f:
for line in f:
if re.search(r'`show running-config`|`show running`|`show running config`', line):
print(line)
for line1 in f:
print(line1)
if re.search(r'`show', line1):
return
The inner for loop will start from the next line after the one processed by the outer loop.
You can also do it with a single loop using a flag variable.
import re
def process_file(filename):
in_show = False
with open(filename) as f:
for line in f:
if re.search(r'`show running-config`|`show running`|`show running config`', line):
in_show = True
if in_show
print(line)
if re.search(r'`show', line1):
return

Mimicking bash wc functionalities using python

I have written a very simple python programme, called wc.py, which mimics "bash wc" behaviour to count the number of words, lines and bytes in a file. My programme is as follow:
import sys
path = sys.argv[1]
w = 0
l = 0
b = 0
for currentLine in file:
wordsInLine = currentLine.strip().split(' ')
wordsInLine = [word for word in wordsInLine if word != '']
w += len(wordsInLine)
b += len(currentLine.encode('utf-8'))
l += 1
#output
print(str(l) + ' ' + str(w) + ' ' + str(b))
In order to execute my programme you should execute the following command:
python3 wc.py [a file to read the data from]
As the result it shows
[The number of lines in the file] [The number of words in the file] [The number of bytes in the file] [the file directory path]
The files I used to test my code is as follow:
file.txt which contains the following data:
1
2
3
4
Executing "wc file.txt" returns
4 4 8
Executing "python3 wc.py file.txt" returns 4 4 8
Download "Annual enterprise survey: 2020 financial year (provisional) – CSV" from CSV file download
Executing "wc [fileName].csv" returns
37081 500273 5881081
Executing "python3 wc.py [fileName].csv" returns
37081 500273 5844000
and a [something].pdf file
Executing "wc [something].pdf" works.
Executing "python3 code.py" throws the following errors:
Traceback (most recent call last):
File "code.py", line 10, in <module>
for currentLine in file:
File "/usr/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 10: invalid start byte
As you can see, the output of python3 code.py [something].pdf and python3 code.py [something].csv is not the same as what wc returns. Could you help me to find the reason of this erroneous behaviour in my code?
Regarding the CSV file, if you look at the difference between your result and that of wc:
5881081 - 5844000 = 37081 which is exactly the number of lines.
That is, every line has one additional character in the original file. That character is the carriage return \r which got lost in Python because you iterate over lines and don't specify the linebreaks. If you want a byte-correct result, you have to first identify the type of linebreaks used in the file (and watch out for inconsistencies throughout the document).

Problems Reading Zip of Shapefiles without loading memory

I've been trying to adapt Andrew Gaidus shapefile reading routine for my needs. The Jupyter Notebook I'm using acts like it partitioned the disk of my MacBook Pro so I can't read or write to disk. Gaidus has a good procedure for avoiding using disk, but is written for prior version of Python.
Here is the code:
dls = "https://github.com/ItsMeLarry/Coursera_Capstone/raw/master/tl_2010_25009_tract00%202.zip"
lynntracts = ZipFile(io.BytesIO(urllib.request.urlopen(dls).read()))
print("Done")
filenames = [y for y in sorted(lynntracts.namelist()) for ending in ['dbf', 'prj', 'shp', 'shx'] if y.endswith(ending)]
#For some reason, I get 8, instead of 4, filenames. The first 4 start with __MACOSX. I get rid of those. The problem I
#have with the 'TypeError' occurs no matter which set of 4 files I use.
print(filenames[0], 'Example of the 4 files that I remove in the for loop')
for i in range(0,4):
del filenames[0]
print(filenames)
dbf, prj, shp, shx = [io.StringIO(ZipFile.read(filename)) for filename in filenames]
r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
print(r.numRecords)
Opening with io.BytesIO cured the prior problem of byte/str collision. Now see the TypeError for the ZipFile.read. I get the same error if I use io.BytesIO when calling it. Here is error output followed by error info:
Done
__MACOSX/tl_2010_25009_tract00/._tl_2010_25009_tract00.dbf Example of the 4 files that I remove in the for loop
['tl_2010_25009_tract00/tl_2010_25009_tract00.dbf', 'tl_2010_25009_tract00/tl_2010_25009_tract00.prj', 'tl_2010_25009_tract00/tl_2010_25009_tract00.shp', 'tl_2010_25009_tract00/tl_2010_25009_tract00.shx']
TypeError Traceback (most recent call last)
in ()
12 del filenames[0]
13 print(filenames)
---> 14 dbf, prj, shp, shx = [io.StringIO(ZipFile.read(filename)) for filename in filenames]
15 r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
16 print(r.numRecords)
in (.0)
12 del filenames[0]
13 print(filenames)
---> 14 dbf, prj, shp, shx = [io.StringIO(ZipFile.read(filename)) for filename in filenames]
15 r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
16 print(r.numRecords)
TypeError: read() missing 1 required positional argument: 'name'
Clearly, I am a beginner. I've come up empty handed trying to research this. Where do I go? What do I need to understand here? Thanks

Can`t solve TypeError: '>' not supported between instances of 'NoneType' and 'int'

I have a long list of audio files, and some of them are longer than an hour. I am using Jupyter notebook, Python 3.6 and TinyTag library to get a duration of audio. My code below goes over the files and if a file is longer than an hour, it splits the file into one-hour long pieces, and a leftover piece less than an hour, and copies the pieces as fname_1,fname_2, etc. The code was working for the previous datasets I tried, but this time after running for a while, I get the error below. I don`t know where this is coming from and how to fix it, I have already read the similar titled questions but their contents were different. Thanks in advance.
# fpaths is the list of filepaths
for i in range(0,len(fpaths)):
fpath=fpaths[i]
fname=os.path.basename(fpath)
fname0=os.path.splitext(fname)[0] #name without extension
tag = TinyTag.get(fname)
if tag.duration > 3600:
cmd2 = "ffmpeg -i %s -f segment -segment_time 3600 -c copy %s" %(fpath, fname0) + "_%d.wav"
os.system(cmd2)
os.remove(fpath)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-79d0ceebf75d> in <module>()
7 fname0=os.path.splitext(fname)[0]
8 tag = TinyTag.get(fname)
----> 9 if tag.duration > 3600:
10 cmd2 = "ffmpeg -i %s -f segment -segment_time 3600 -c copy %s" %(fpath, fname0) + "_%d.wav"
11 os.system(cmd2)
TypeError: '>' not supported between instances of 'NoneType' and 'int'
Seems like some of those results do not have a duration
Perhaps change it to:
if tag.duration and tag.duration > 3600:
.....

Why can't I decode('utf-16') success in Python3 (even work in Py2)?

In Py2:
(chr(145) + chr(78)).decode('utf-16')
I got u'\u4e91':
But in Py3:
(chr(145) + chr(78)).encode('utf-8').decode('utf-16')
I got an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x4e in position 2: truncated data
Sometimes, they work in a same way, such as (chr(93) + chr(78)), but sometimes not.
Why? And how can I do this right in Py3?
You have to use latin1 if you want to encode any byte tranparently:
(chr(145) + chr(78)).encode('latin1').decode('utf-16')
#'云'
chr(145) gets encoded with 2 bytes in utf8 (as with all values above 127):
chr(145).encode('utf8')
# b'\xc2\x91'
while it is what you wanted with latin1:
chr(145).encode('latin1')
# b'\x91'

Resources