hebrew encoding issue python

hebrew encoding issue python - python-3.x

I scraped a website containing Hebrew characters and saved the data as a .txt file.
When I open it in PyCharm it presents it in Hebrew:
"name":"פראמול אף ושיעול LIFE"}
</script><base href="https://shop.super-pharm.co.il/pharmacy/cold/flu/PARAMOL-AF-AND-SHIUL20%2B30/p/254954"/>
<title>
‫LIFE - פראמול אף ושיעול | סופר-פארם‬</title>
but when I open it in Notepad it shows up as Chinese characters:
΀籘̉㰀瑨汭挠慬獳∽樠⁳獣瑳慲獮瑩潩獮•楤㵲爢汴•慬杮∽敨㸢格慥㹤㰠慢敳栠敲㵦栢瑴獰⼺猯潨⹰畳数⵲桰牡⹭潣椮⽬慮畴敲猯数楣污昭牯畭慬⽳楤瑥䰯䙉ⵅ佈䑏䅉匭䥌䵍剅⽓⽰〵〶㘷⼢ਾ琼瑩敬ਾ
When I read it back with open(), the content comes out as gibberish, even when I call encode() on it.
What's the problem?
file_name = str(x) + '_' + save_file_name + '_superpharm'
file_out = open(save_file_name + '/' + file_name, 'wb')
pickle.dump(strsoup, file_out)
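A likely cause is that pickle.dump() writes Python's binary serialization format rather than plain text, so a text editor has to guess at the encoding of the bytes it finds. A minimal sketch of writing the string as UTF-8 text instead is below; the sample string and the file name are stand-ins, not the asker's real data.

```python
import os
import tempfile

# Stand-in for the scraped HTML (assumption: strsoup is a str)
strsoup = '<title>LIFE - פראמול אף ושיעול | סופר-פארם</title>'

# Hypothetical output path inside a throwaway directory
file_path = os.path.join(tempfile.mkdtemp(), "1_cold_superpharm.txt")

# Text mode with an explicit encoding writes readable text, not a pickle
with open(file_path, "w", encoding="utf-8") as file_out:
    file_out.write(strsoup)

# Read it back with the same encoding
with open(file_path, "r", encoding="utf-8") as file_in:
    read_back = file_in.read()
```

If a Windows editor still misdetects the file, encoding="utf-8-sig" writes a byte order mark that some editors use to recognise UTF-8.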

Related

Escape Backslash with String Combination

I am working on a program that clicks a button when it changes color, and at the end of the program, it screenshots the webpage, saves the screenshot, and then moves it to a different directory. Everything seems to be working fine except for moving the file into a different folder. The code I am using to move the file is here:
os.replace("'\\'" + fileName, "'\\'" + saveName + "'\\'" + fileName)
I get the error:
FileNotFoundError: [WinError 3] The system cannot find the path specified: "'\'0.png" -> "'\'saves216'\'0.png"
I don't know how to get the backslash to escape without becoming a double backslash

Remove the extra quotes:
os.replace("\\" + fileName, "\\" + saveName + "\\" + fileName)
You directly escape a \ with another one:
>>> s = "\\" + "filename"
>>> print(s)
\filename

os.replace("\\" + fileName, "\\" + saveName + "\\" + fileName)
Your version has extra single quotes ('') inside the string literals.
Change the fileName variable too if it contains single quotes.
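As an alternative to concatenating backslashes by hand, os.path.join() can build the paths. The sketch below is an illustration only: the helper name move_into_folder and the demo directory are assumptions, and it creates the target folder first because os.replace() does not create missing directories.

```python
import os
import tempfile

def move_into_folder(file_name, folder_name, base_dir):
    """Move base_dir/file_name to base_dir/folder_name/file_name."""
    source = os.path.join(base_dir, file_name)
    target_dir = os.path.join(base_dir, folder_name)
    os.makedirs(target_dir, exist_ok=True)  # make sure the folder exists
    target = os.path.join(target_dir, file_name)
    os.replace(source, target)
    return target

# Demo in a throwaway directory with a fake screenshot
with tempfile.TemporaryDirectory() as base:
    with open(os.path.join(base, "0.png"), "wb") as f:
        f.write(b"fake image bytes")
    moved = move_into_folder("0.png", "saves216", base)
    moved_exists = os.path.exists(moved)
```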

Renaming files with python results in duplicate files

I have a folder with the following files:
[11111]Text.txt
[22222]Text.txt
[33333]Text.txt
[44444]Text.txt
I need to rename the files to remove the [11111] designation from the beginning of the file name, but that results in duplicate file names.
I wrote a basic script that strips the [11111] from the first file and, if a subsequent file would duplicate an existing name, names it [Duplicate]_[#]_text.txt, where [#] is a random number.
When I ran the code, it renamed the first file correctly and renamed the second file with the required string, but it did not continue with the other files and instead raised the following error:
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'Destination/[33333]Text.txt' -> 'Destination/[Duplicate]_[1]Text.txt'
The code below is what I have currently, though I have also tried several variations:
Location = (Destination_Folder)
Dupe_Counter = random.randint(0,255)
for filename in os.listdir(Location):
    try:
        if filename.startswith("["):
            os.rename(Location + filename, Location + filename[7:])
    except:
        os.rename(Location + filename, Location +'[Duplicate]_' + '[' + str(Dupe_Counter) +']' + filename[7:])
I'm assuming that it's not actually picking up the Dupe_Counter when creating new files, but I'm not 100% sure where I'm going wrong.
Any help appreciated.

Dupe_Counter is generated only once, before the loop, so every duplicate is renamed to the same target and the second one fails because that file already exists. Random numbers can also collide with each other by chance.
Try generating a new random number on each iteration:
Location = (Destination_Folder)
for filename in os.listdir(Location):
    Dupe_Counter = random.randint(0,255)
    try:
        if filename.startswith("["):
            os.rename(Location + filename, Location + filename[7:])
    except:
        os.rename(Location + filename, Location +'[Duplicate]_' + '[' + str(Dupe_Counter) +']' + filename[7:])
However, I would recommend an increasing sequence instead: it cannot collide with itself and is easier to follow.
Something like this:
Location = (Destination_Folder)
Dupe_Counter = 101  # start the sequence once, outside the loop
for filename in os.listdir(Location):
    try:
        if filename.startswith("["):
            os.rename(Location + filename, Location + filename[7:])
    except FileExistsError:
        # The stripped name is already taken: use the counter, then advance it
        os.rename(Location + filename, Location +'[Duplicate]_' + '[' + str(Dupe_Counter) +']' + filename[7:])
        Dupe_Counter += 1
Hope I've been of some help.
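For a variant that does not rely on catching an exception, the sketch below checks whether the target name exists and increments a counter until it finds a free one. The function name strip_prefix_safely and the demo folder are assumptions for illustration, and it splits on the first "]" instead of slicing a fixed seven characters.

```python
import os
import tempfile

def strip_prefix_safely(folder):
    """Remove a leading "[...]" tag from each file name, numbering any clashes."""
    renamed = []
    for filename in sorted(os.listdir(folder)):
        if not filename.startswith("[") or "]" not in filename:
            continue
        base = filename.split("]", 1)[1]  # text after the first "]"
        target = base
        counter = 1
        # Keep incrementing until the candidate name is unused
        while os.path.exists(os.path.join(folder, target)):
            target = "[Duplicate]_[{}]{}".format(counter, base)
            counter += 1
        os.rename(os.path.join(folder, filename), os.path.join(folder, target))
        renamed.append(target)
    return renamed

# Demo with the four files from the question, in a throwaway folder
with tempfile.TemporaryDirectory() as folder:
    for tag in ("11111", "22222", "33333", "44444"):
        open(os.path.join(folder, "[{}]Text.txt".format(tag)), "w").close()
    result = strip_prefix_safely(folder)
    remaining = sorted(os.listdir(folder))
```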

Python3 _io.TextIOWrapper error when opening a file with notepad

I have been stuck for a couple of days on an issue in my micro Address Book project. I have a function that writes all records from a SQLite3 database to a file so that I can open it via the os module, but as soon as I try to open the file, Python gives me the following error:
Error while opening tempfile. Error:startfile: filepath should be string, bytes or os.PathLike, not _io.TextIOWrapper
This is the code I use to write the records to the file and open it:
source_file_name = open("C:\\workdir\\temp.txt","w")
#Fetching results from database and storing in result variable
self.cur.execute("SELECT id, first_name, last_name, address1, address2, zipcode, city, country, nation, phone1, phone2, email FROM contacts")
result = self.cur.fetchall()
#Writing results into tempfile
source_file_name.write("Stampa Elenco Contatti\n")
for element in result:
    source_file_name.write(str(element[0]) + "|" + str(element[1]) + "|" + str(element[2]) + "|" + str(element[3]) + "|" + str(element[4]) + "|" + str(element[5]) + "|" + \
        str(element[6]) + "|" + str(element[7]) + "|" + str(element[8]) + "|" + str(element[9]) + "|" + str(element[10]) + "|" + str(element[11]) + "\n")
#TODO: Before exiting printing function you MUST:
# 1. filename.close()
# 2. exit to main() function
source_file_name.close()
try:
    os.startfile(source_file_name,"open")
except Exception as generic_error:
    print("Error while opening tempfile. Error:" + str(generic_error))
finally:
    main()
Frankly, I don't understand what this error means. In my previous code snippets I've always handled text files without issues, but I suspect this time is different because I am reading my data from a database. Any ideas how to fix it?
Thanks in advance, and sorry for my English...

Your problem ultimately stems from poor variable naming. Here
source_file_name = open("C:\\workdir\\temp.txt","w")
source_file_name does not contain the source file name. It contains the source file itself (i.e., a file handle). You can't give that to os.startfile(), which expects a file path (as the error also says).
What you meant to do is
source_file_name = "C:\\workdir\\temp.txt"
source_file = open(source_file_name,"w")
But in fact, it's much better to use a with block in Python, as this will handle closing the file for you.
It's also better to use a CSV writer instead of creating the CSV manually, and it's highly advisable to set the file encoding explicitly.
import csv
# ...
source_file_name = "C:\\workdir\\temp.txt"
with open(source_file_name, "w", encoding="utf8", newline="") as source_file:
    writer = csv.writer(source_file, delimiter='|')
    source_file.write("Stampa Elenco Contatti\n")
    for record in self.cur.fetchall():
        writer.writerow(record)
    # alternative to the above for loop on one line
    # writer.writerows(self.cur.fetchall())
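To tie the pieces together, here is a self-contained sketch that keeps the path and the file handle in separate variables. The in-memory database, the three-column table and the helper names export_contacts and open_in_default_app are assumptions made for the demo, not the asker's schema.

```python
import csv
import os
import sqlite3
import tempfile

def export_contacts(connection, file_path):
    """Write every contact row to file_path as pipe-delimited text."""
    cur = connection.execute("SELECT id, first_name, last_name FROM contacts")
    with open(file_path, "w", encoding="utf8", newline="") as source_file:
        source_file.write("Stampa Elenco Contatti\n")
        csv.writer(source_file, delimiter="|").writerows(cur)
    return file_path

def open_in_default_app(file_path):
    """Pass the path string (not the file object) to os.startfile."""
    if hasattr(os, "startfile"):  # os.startfile exists only on Windows
        os.startfile(file_path, "open")

# Demo data in an in-memory database (hypothetical reduced schema)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE contacts (id INTEGER, first_name TEXT, last_name TEXT)")
connection.execute("INSERT INTO contacts VALUES (1, 'Ada', 'Lovelace')")

file_path = os.path.join(tempfile.mkdtemp(), "temp.txt")
export_contacts(connection, file_path)

with open(file_path, encoding="utf8") as check:
    lines = check.read().splitlines()
```

Calling open_in_default_app(file_path) afterwards would hand the path string to os.startfile on Windows.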

download an image file and add it to Put Request

I need to download a file from a URL and use a PUT request to upload it somewhere else.
Download is done with
r=requests.get(image_url, auth=HTTPBasicAuth( user , password))
header_content_type = r.headers.get('content-type')
fileType = header_content_type.split('/')[-1]
content_type = header_content_type.split(';')[-1]
file_extension = fileType.split(';',1)[0]
file_name = file_id+'.' + file_extension
open('downloads/' + file_name , 'wb').write(r.content)
which works fine and stores the file locally in the downloads folder.
I can open the image with any image viewer and it works fine.
The PUT request body needs to look like this:
{ "data":"gsddfgdsfg...(base64) ", "filename":"example2.txt", "contentType":"plain/text" }
I have tried to build it as follows:
def build_step_attachment_json(path, filename, contentype):
    with open(path+filename) as f:
        encoded = base64.b64encode(f.read())
    return '{ "data":"'+ encoded + '", "filename":"' + filename +'", "contentType":" '+ contentype + '" }'
but it fails with:
"UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 44: character maps to <undefined>"

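The error suggests the image is being opened in text mode, where Python tries to decode the bytes as characters. A minimal sketch of one way around it is below: it opens the file with "rb", decodes the base64 bytes to an ASCII string, and lets json.dumps build the body. The sample bytes and file name are made up for the demo.

```python
import base64
import json
import os
import tempfile

def build_step_attachment_json(path, filename, content_type):
    """Return the JSON body with the file content base64-encoded."""
    # "rb" reads raw bytes, so no text decoding is attempted
    with open(os.path.join(path, filename), "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return json.dumps(
        {"data": encoded, "filename": filename, "contentType": content_type}
    )

# Demo with a few fake bytes, including the 0x9d from the error message
with tempfile.TemporaryDirectory() as folder:
    raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x9D])
    with open(os.path.join(folder, "example.png"), "wb") as f:
        f.write(raw)
    payload = build_step_attachment_json(folder, "example.png", "image/png")
    round_trip = base64.b64decode(json.loads(payload)["data"])
```
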
Proper placement of test within try/except loop (Python3)

I use a python script to webscrape for "Show Notes" and an mp3. When I encounter a page that has no show notes, this means the show was a Best Of, so I want to skip the download of the notes and mp3. I am not sure where the best place to insert the test would be. The snippet is as follows:
for show_html in showpage_htmls:
    try:
        p_html = s.get(show_html)
        p_soup = BeautifulSoup(p_html.content, 'html.parser')
        # set title for SHOW NOTES
        title = ''
        title = p_soup.title.contents[0]
        # get SHOW NOTES chunk and remove unwanted characters (original mp3notes not changed)
        mp3notes = ''
        mp3notes = p_soup.find('div', {'class': 'module-text'}).find('div')
        mp3notes = str(title) + str('\n') + str(mp3notes).replace('<div>','').replace('<h2>','').replace('</h2>','\n').replace('<p>','').replace('<br/>\n','\n').replace('<br/>','\n').replace('</p>','').replace('</div>','').replace('\u2032','')
        # FIXME need to skip d/l if no notes
        # set basename, mp3named and mp3showtxt
        mp3basename = '{0}{1}{2}'.format(show_html.split('/')[3],show_html.split('/')[4],show_html.split('/')[5])
        if (os.name == 'nt'):
            mp3showtxt = mp3dir + '\\' + mp3basename + '.txt'
            mp3named = mp3dir + '\\' + mp3basename + '.mp3'
        else:
            mp3showtxt = mp3dir + '/' + mp3basename + '.txt'
            mp3named = mp3dir + '/' + mp3basename + '.mp3'
        # save show notes to local
        with open(mp3showtxt, 'w') as f:
            try:
                f.write(mp3notes)
                print("Show notes " + mp3basename + " saved.")
            except UnicodeEncodeError:
                print("A charmap encoding ERROR occurred.")
                print("Show notes for " + mp3basename + ".mp3 FAILED, but continuing")
            finally:
                f.close()
        # FIXME need eyed3 to set mp3 tags since B&T are lazy
        # get Full Show mp3 link
        mp3url = p_soup.find('a', href = True, string = 'Full Show').get('href')
        # get and save mp3
        r = requests.get(mp3url)
        with open(mp3named, 'wb') as f:
            f.write(r.content)
            print("Downloaded " + mp3basename + ".mp3.")
    except AttributeError:
        print(show_html + " did not exist as named.")
I would think an
if not (len(mp3notes) >= 50)
would work; I'm just not sure where to put it, or whether there is a better (more Pythonic) way.
Ideally, if the mp3notes are less than expected, no notes or mp3 for that show_html would be saved, and the script would start at the next show_html page.
Since I am new to Python, feel free to offer suggestions to making this more Pythonic as well; I am here to learn! Thanks.
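One common pattern is a guard with continue placed right after the notes are built and before any file is opened, so nothing is written for a skipped page. The sketch below shows only that control flow on made-up data; the pages_to_download helper and the sample tuples are hypothetical, and the 50-character threshold is the one from the question.

```python
MIN_NOTES_LENGTH = 50  # threshold taken from the question

def pages_to_download(pages):
    """Yield (url, notes) only for pages whose notes are long enough."""
    for url, notes in pages:
        if notes is None or len(notes) < MIN_NOTES_LENGTH:
            print(url + " has no show notes, skipping.")
            continue  # jump to the next page before anything is saved
        yield url, notes

# Made-up stand-ins for scraped pages
sample_pages = [
    ("show/1", "x" * 80),
    ("show/2", None),
    ("show/3", "too short"),
]
kept = [url for url, _ in pages_to_download(sample_pages)]
```

In the loop from the question, the equivalent check would sit where the "FIXME need to skip d/l if no notes" comment is, keeping in mind that mp3notes there already includes the title.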
