python 3.6 ascii codec error in urllib request - python-3.x

I'm trying to download a picture from a website with my Python script, but every time I use the Georgian alphabet in the URL I get the error "UnicodeEncodeError: 'ascii' codec can't encode characters".
Here is my code:
import os
import urllib.request
def download_image(url):
    fullfilename = os.path.join('/images', 'image.jpg')
    urllib.request.urlretrieve(url, fullfilename)
download_image(u'https://example.com/media/სდასდსადადსაფა_8QXjrbi.jpg')

I think it's better to use the requests library in your example, since it handles non-ASCII (UTF-8) characters in the URL for you.
Here is the code:
import requests
def download_image(url):
    request = requests.get(url)
    local_path = 'images/images.jpg'
    with open(local_path, 'wb') as file:
        file.write(request.content)
my_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/81/ერეკლე_II_ბავშვობის_სურათი.jpgw/459px-ერეკლე_II_ბავშვობის_სურათი.jpg'
download_image(my_url)
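If you would rather stay with urllib.request, one option is to percent-encode the non-ASCII part of the URL before passing it to urlretrieve. The sketch below is only an illustration, not a complete URL normalizer: encode_url is a hypothetical helper name, and it assumes the host name is plain ASCII (an internationalized domain would need IDNA encoding, which is not handled here).

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper; assumes the host name is already ASCII.
    """
    parts = urlsplit(url)
    # Keep "/" and existing "%XX" escapes untouched, encode everything else as UTF-8.
    path = quote(parts.path, safe="/%")
    # Keep the usual query separators untouched as well.
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

if __name__ == "__main__":
    print(encode_url("https://example.com/media/სდასდსადადსაფა_8QXjrbi.jpg"))
```

The returned string contains only ASCII characters, so it should be safe to hand to urllib.request.urlretrieve in place of the original URL.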

Related

Python opening a link with a variable

I have a problem with my Python code.
I would like to open the Telegram API URL with a variable in it, so that the item scraped from the site is sent to the chat.
# Import libraries
import requests
import urllib.request
import time
import sys
from bs4 import BeautifulSoup
stdoutOrigin=sys.stdout
sys.stdout = open("log.txt", "w")
# Set the URL you want to webscrape from
url = 'https://31asdasdasdasdasd.com/'
# Connect to the URL
response = requests.get(url)
# Parse HTML and save to BeautifulSoup object
soup = BeautifulSoup(response.text, "html.parser")
zapisane = ''
row = soup.find('strong')
print(">> Ilosc opinii ktora przeszla:")
send = print(row.get_text()) # Print row as text
import urllib.request
u = urllib.request.urlopen("https://api.telegram.org/botid:ts/sendMessage?chat_id=-3channel1&text=")
You likely want to use string formatting with a variable in the last line of the code shown here. Here's a helpful resource on string formatting: https://www.geeksforgeeks.org/python-format-function/
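As a rough sketch of that idea, the URL can be built from variables and the text URL-encoded before the request is made. The function name build_send_message_url and the token and chat id values below are placeholders made up for illustration; only the general https://api.telegram.org/bot<token>/sendMessage shape comes from the question.

```python
from urllib.parse import urlencode

def build_send_message_url(token, chat_id, text):
    """Build a sendMessage URL with the text safely URL-encoded.

    Hypothetical helper; token and chat_id are supplied by the caller.
    """
    query = urlencode({"chat_id": chat_id, "text": text})
    return "https://api.telegram.org/bot{}/sendMessage?{}".format(token, query)

if __name__ == "__main__":
    # Placeholder values, not real credentials.
    print(build_send_message_url("123:abc", "-100", "hello world"))
```

In the question's script, the text would be the string returned by row.get_text() (note that print() itself returns None, so its result cannot be used as the message), and the resulting URL could then be passed to urllib.request.urlopen.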

Encoding error trying to write file with python

Here is the full script:
import requests
import bs4
res = requests.get('https://example.com')
soup = bs4.BeautifulSoup(res.text, 'lxml')
page_HTML_code = soup.prettify()
multiline_code = """{}""".format(page_HTML_code)
f = open("testfile.txt","w+")
f.write(multiline_code)
f.close()
So I'm trying to write the entire downloaded HTML to a file while keeping it neat and clean.
I understand that it has problems with some of the text and can't save certain characters, but I'm not sure how to encode the text correctly.
Can anyone help?
This is the error message I get:
"C:\Location", line 16, in <module>
f.write(multiline_code)
File "C:\\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0421' in position 209: character maps to <undefined>
I did some digging around and this worked:
import requests
import bs4
res = requests.get('https://example.com')
soup = bs4.BeautifulSoup(res.text, 'lxml')
page_HTML_code = soup.prettify()
multiline_code = """{}""".format(page_HTML_code)
# Adding the encoding argument when opening the file did the trick
with open('testfile.html', 'w+', encoding='utf-8') as fb:
    fb.write(multiline_code)
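The reason this helps is that open() in text mode falls back to a locale-dependent default encoding when no encoding is given, which on the machine in the traceback was cp1252. A small sketch of the difference follows; can_encode is a hypothetical helper name used only for this illustration.

```python
def can_encode(text, encoding):
    """Return True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    cyrillic_es = "\u0421"  # the character named in the traceback
    print(can_encode(cyrillic_es, "cp1252"))  # cp1252 has no slot for this character
    print(can_encode(cyrillic_es, "utf-8"))   # UTF-8 can represent any Unicode character
```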

outputting python script results into text

I want to save my Python script's output to a txt file.
My Python code:
from selenium import webdriver
bro = r"D:\Developer\Software\Python\chromedriver.exe"
driver=webdriver.Chrome(bro)
duo=driver.get('http://www.lolduo.com')
body=driver.find_elements_by_tag_name('tr')
for post in body:
    print(post.text)
driver.close()
Some code that I've tried:
import subprocess
with open("output.txt", "w") as output:
    subprocess.call(["python", "./file.py"], stdout=output)
I tried this code, and it only creates an output.txt file with nothing inside it.
D:\PythonFiles> file.py > result.txt
Exception:
UnicodeEncodeError: 'charmap' codec can't encode character '\u02c9' in position 0: character maps to <undefined>
This raises the exception above and writes only about 1/3 of the script's results to the text file.
You can try the code below to write the data to a text file:
from selenium import webdriver
bro = r"D:\Developer\Software\Python\chromedriver.exe"
driver = webdriver.Chrome(bro)
driver.get('http://www.lolduo.com')
body = driver.find_elements_by_tag_name('tr')
with open("output.txt", "w", encoding="utf8") as output:
    output.write("\n".join([post.text for post in body]))
driver.close()
You can try this. This is my Python code:
from selenium import webdriver
import time
bro = r"D:\Developer\Software\Python\chromedriver.exe"
driver = webdriver.Chrome(bro)
driver.get('http://www.lolduo.com')
# find_elements_by_tag_name returns a list, so read .text from each element, not from the list
body = driver.find_elements_by_tag_name('tr')
with open('output15.txt', mode='w', encoding='utf-8') as f:
    for post in body:
        print(post.text)
        f.write(post.text + '\n')
time.sleep(2)
driver.close()
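If you still want to capture the script's printed output through a redirect instead of writing the file yourself, one possible approach is to force the child process to encode its standard output as UTF-8 via the PYTHONIOENCODING environment variable. The sketch below assumes that approach; run_and_capture is a hypothetical helper name, and the script arguments are whatever you would normally pass to python.

```python
import os
import subprocess
import sys

def run_and_capture(script_args, output_path):
    """Run a Python script and write its stdout to output_path as UTF-8.

    Hypothetical helper; script_args is e.g. ["./file.py"].
    """
    # Copy the current environment and ask the child to use UTF-8 for stdout.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    # Binary mode: the child's bytes are written unchanged.
    with open(output_path, "wb") as output:
        return subprocess.call([sys.executable] + list(script_args),
                               stdout=output, env=env)

if __name__ == "__main__":
    # Prints the character from the traceback in a child process.
    code = run_and_capture(["-c", "print('\\u02c9')"], "output.txt")
    print("exit code:", code)
```

The resulting file should then be read back with encoding="utf-8".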

Can't send Cyrillic text as the parameter of the request

I am trying to send a GET request with parameters that contain some Cyrillic text. I have tried a lot of solutions and only got other encoding and decoding errors. Nothing helps.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import http.client
import json
import urllib.parse
data = {}
data["id"] = 123123
values = {}
values["someText"] = "йцколфыовал фыоварфылова фоывафлова"
data["values"] = values
conn = http.client.HTTPConnection("example.com")
params = {"data": json.dumps(data)}
conn.request("GET", "url_request?{}".format(urllib.parse.urlencode(params)))
res = conn.getresponse()
data = res.read()
I expected the request with the Cyrillic text to be sent successfully.
Instead I got the following error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 254: invalid continuation byte
Thanks.
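The question does not show where the UnicodeDecodeError is raised, so the following is only a guess at one plausible cause: the response body is not UTF-8 (for example, it could be in a Windows Cyrillic code page) and is being decoded as UTF-8 somewhere. Under that assumption, a sketch of decoding the body with the charset declared in the Content-Type header might look like this. Both function names are hypothetical, and with http.client the header value would come from res.getheader("Content-Type").

```python
from email.message import Message

def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default

def decode_body(raw, content_type):
    """Decode response bytes using the declared charset (assumed to be correct)."""
    charset = charset_from_content_type(content_type)
    # errors="replace" avoids an exception if the declared charset is wrong.
    return raw.decode(charset, errors="replace")

if __name__ == "__main__":
    raw = "Привет".encode("cp1251")
    print(decode_body(raw, "text/html; charset=windows-1251"))
```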

Save HTML Source Code to File

How can I copy the source code of a website into a text file in Python 3?
EDIT:
To clarify my issue, here's what I have:
import urllib.request
def extractHTML(url):
    f = open('temphtml.txt', 'w')
    page = urllib.request.urlopen(url)
    pagetext = page.read()
    f.write(pagetext)
    f.close()
extractHTML('http:www.google.com')
I get the following error for the f.write() function:
builtins.TypeError: must be str, not bytes
import urllib.request
site = urllib.request.urlopen('http://somesite.com')
data = site.read()  # read() returns bytes
file = open("file.txt", "wb")  # open the file in binary mode
file.write(data)  # write() accepts the bytes object directly
file.close()
Untested but should work.
EDIT: Updated for python3
Try this:
import urllib.request
def extractHTML(url):
    urllib.request.urlretrieve(url, 'temphtml.txt')
This is easier, but if you still want to do it your way, this is the solution:
import urllib.request
def extractHTML(url):
    f = open('temphtml.txt', 'w', encoding='utf-8')
    page = urllib.request.urlopen(url)
    # Decode the bytes to str (UTF-8 assumed); str(bytes) would only give the "b'...'" repr
    pagetext = page.read().decode('utf-8', errors='replace')
    f.write(pagetext)
    f.close()
extractHTML('https://www.google.com')
Your script gave an error saying the argument must be a string. Convert the bytes to a string by decoding them (UTF-8 is assumed here).
Next I got an error saying no host was given. Google is a secure site, so use https: rather than http:, and most importantly you forgot to include // after the scheme.
You probably wanted to create something like this:
import urllib.request
class ExtractHtml():
    def Page(self):
        print("enter the web page name starting with 'http://': ")
        url = input()
        site = urllib.request.urlopen(url)
        data = site.read()
        file = open("D://python_projects/output.txt", "wb")
        file.write(data)
        file.close()
w = ExtractHtml()
w.Page()
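The binary-mode approach used in several of these answers can be written a little more compactly with context managers, which close both the response and the file even if an error occurs. This is only a sketch; save_page is a hypothetical name, and the URL and output path in the usage lines are placeholders.

```python
import shutil
import urllib.request

def save_page(url, path):
    """Copy the raw bytes of the response at url into the file at path."""
    # Binary mode avoids any decoding, so the page's own encoding is preserved.
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)

if __name__ == "__main__":
    save_page("https://www.google.com", "temphtml.txt")
```

Because nothing is decoded, neither the "must be str, not bytes" error nor an encoding error can come from the write itself.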
