I am not able to download JavaScript-generated images and store them on my local machine. The code below does not raise any errors, but no images appear in my folder. I have already tried many approaches; here is one of them:
import os
import requests

path = "http://my.site.com/page/oo?_b=9V2FG34519CV2N56SLK567943N25J82V"
os.makedirs('C:/Images', exist_ok=True)
print('Download images %s...' % path)
res = requests.get(path)
res.raise_for_status()
# take the basename from the url string, not the Response object,
# and write into the folder created above
imageFile = open(os.path.join('C:/Images', os.path.basename(path)), 'wb')
for chunk in res.iter_content(100000):
    imageFile.write(chunk)
imageFile.close()
I have been trying to solve this problem for two days, so I would be grateful if somebody could help me!
Once you've got the image url, you can use requests and shutil to save the image to a given directory (this works for me with the url below):
import shutil
import requests

url = 'http://epub.hpo.hu/e-kutatas/aa?_p=A554F6BCDBCEA51EFF1E0E17E777F3AC'
out_file_path = 'image.jpg'  # wherever you want the image saved

response = requests.get(url, stream=True)
with open(out_file_path, 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
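If the saved file comes out empty or corrupted, it usually helps to check the status code and to force decoding of any compressed transfer before copying the raw stream. A minimal sketch of that variant (the output path is a placeholder):

import shutil
import requests

url = 'http://epub.hpo.hu/e-kutatas/aa?_p=A554F6BCDBCEA51EFF1E0E17E777F3AC'
out_file_path = 'image.jpg'  # placeholder output path

response = requests.get(url, stream=True)
response.raise_for_status()         # fail loudly on 4xx/5xx instead of saving an error page
response.raw.decode_content = True  # decompress gzip/deflate content encoding if present
with open(out_file_path, 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)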
I want to download this website's pdf file using Python 3: https://qingarchives.npm.edu.tw/index.php?act=Display/image/207469Zh18QEz#74l
This might accomplish what you're trying to achieve:
import requests

# URL to be downloaded
url = "https://cfm.ehu.es/ricardo/docs/python/Learning_Python.pdf"

def download_pdf(url, file_name):
    # Send GET request
    response = requests.get(url)
    # Save the PDF
    with open(file_name, "wb") as f:
        f.write(response.content)

download_pdf(url, 'myDownloadedFile.pdf')
from urllib import request
response = request.urlretrieve("https://cfm.ehu.es/ricardo/docs/python/Learning_Python.pdf", "learning_python.pdf")
or
import wget
URL = "https://cfm.ehu.es/ricardo/docs/python/Learning_Python.pdf"
response = wget.download(URL, "learning_python.pdf")
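For large PDFs it may be worth streaming the response in chunks instead of loading the whole body into memory. A sketch along the lines of the requests example above (the chunk size is an arbitrary choice):

import requests

def download_pdf_streamed(url, file_name, chunk_size=8192):
    # stream=True keeps the body out of memory; iter_content yields it piecewise
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open(file_name, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)

download_pdf_streamed("https://cfm.ehu.es/ricardo/docs/python/Learning_Python.pdf", "learning_python.pdf")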
I am trying to take a screenshot of a URL, but it captures the gateway page instead because entry is restricted. I tried adding an ID and password to open the link, but it does not work for some reason. Could you help?
import requests
import urllib.parse
import getpass

BASE = 'https://mini.s-shot.ru/1024x0/JPEG/1024/Z100/?' # we can modify size, format, zoom as needed
url = 'https://mail.google.com/mail/' # or whatever link you need
url = urllib.parse.quote_plus(url)
print(url)
Id = "XXXXXX"
key = getpass.getpass('Password :: ')
path = 'target1.jpg'
response = requests.get(BASE + url + Id + key, stream=True) # 'key' holds the password read above
if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)
Thanks!
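If the gateway accepts HTTP Basic Auth, requests can send the credentials directly when you fetch the protected page yourself. A minimal sketch, assuming Basic Auth; if the gateway shows an HTML login form instead, you would have to POST the credentials to the form's action URL with a requests.Session first, and the form's field names would be specific to your gateway:

import requests

protected_url = 'https://mail.google.com/mail/' # the page behind the gateway
user, password = 'XXXXXX', 'secret'             # placeholder credentials

response = requests.get(protected_url, auth=(user, password), stream=True)
if response.status_code == 200:
    with open('target1.jpg', 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)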
Based on the code from here, I'm able to crawl the url for each transaction and save them into an excel file, which can be downloaded here.
Now I would like to go further and click the url link:
For each url, I need to open it and save the file in pdf format:
How could I do that in Python? Any help would be greatly appreciated.
Code for references:
import shutil
from bs4 import BeautifulSoup
import requests
import os
from urllib.parse import urlparse

url = 'xxx'
for page in range(6):
    r = requests.get(url.format(page))
    soup = BeautifulSoup(r.content, "html.parser")
    for link in soup.select("h3[class='sv-card-title']>a"):
        r = requests.get(link.get("href"), stream=True)
        r.raw.decode_content = True
        with open('./files/' + link.text + '.pdf', 'wb') as f:
            shutil.copyfileobj(r.raw, f)
An example of downloading one pdf file from your uploaded excel file:
from bs4 import BeautifulSoup
import requests

# Let's assume there is only one page. If you need to download many files, save them in a list.
url = 'http://xinsanban.eastmoney.com/Article/NoticeContent?id=AN201909041348533085'
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

link = soup.select_one(".lookmore")
title = soup.select_one(".newsContent").select_one("h1").text
print(title.strip() + '.pdf')

data = requests.get(link.get("href")).content
with open(title.strip().replace(":", "-") + '.pdf', "wb+") as f: # file names can't contain ':', so replace it with '-'
    f.write(data)
And it downloads successfully.
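Extending the single-page example: if you collect the article urls from the excel file into a list, the same selectors can run in a loop. A sketch, assuming every article page uses the same .lookmore / .newsContent markup:

from bs4 import BeautifulSoup
import requests

article_urls = [
    'http://xinsanban.eastmoney.com/Article/NoticeContent?id=AN201909041348533085',
    # ... more article urls collected from the excel file
]

for url in article_urls:
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    link = soup.select_one(".lookmore")
    title = soup.select_one(".newsContent").select_one("h1").text
    data = requests.get(link.get("href")).content
    with open(title.strip().replace(":", "-") + '.pdf', "wb") as f:
        f.write(data)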
Here's a bit different approach. You don't have to open those urls from the excel file, because you can build the .pdf source urls yourself.
For example:
import requests

urls = [
    "http://data.eastmoney.com/notices/detail/871792/AN201909041348533085,JWU2JWEwJTk2JWU5JTljJTllJWU3JTg5JWE5JWU0JWI4JTlh.html",
    "http://data.eastmoney.com/notices/detail/872955/AN201912101371726768,JWU0JWI4JWFkJWU5JTgzJWJkJWU3JTg5JWE5JWU0JWI4JTlh.html",
    "http://data.eastmoney.com/notices/detail/832816/AN202008171399155565,JWU3JWI0JWEyJWU1JTg1JThiJWU3JTg5JWE5JWU0JWI4JTlh.html",
    "http://data.eastmoney.com/notices/detail/831971/AN201505220009713696,JWU1JWJjJTgwJWU1JTg1JTgzJWU3JTg5JWE5JWU0JWI4JTlh.html",
]

for url in urls:
    file_id, _ = url.split('/')[-1].split(',')
    pdf_file_url = f"http://pdf.dfcfw.com/pdf/H2_{file_id}_1.pdf"
    print(f"Fetching {pdf_file_url}...")
    with open(f"{file_id}.pdf", "wb") as f:
        f.write(requests.get(pdf_file_url).content)
I have a silly issue. I have a URL that generates a csv file immediately when I paste it into the browser, but I need to do it from Python code, so I tried something like this:
import urllib.request

url = 'https://www.quandl.com/api/v3/datasets/WSE/TSGAMES.csv?column_index=4&start_date=2018-01-01&end_date=2018-12-31&collapse=monthly&transform=rdiff&api_key=AZ964MpikzEYAyLGfJD2Y'
csv = urllib.request.urlopen(url).read()
with open('file.csv', 'wb') as fx: # bytes, hence mode 'wb'
    fx.write(csv)
But I got an error:
raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Bad Request
Do you know the reason, and could you help? Thanks for any help!
Edit: I should state that your link did not work for me, and my Quandl API key is different from yours.
This is pretty easy to do with the requests module:
import requests

filename = 'test_file.csv'
link = 'your link here'
data = requests.get(link) # request the link; response 200 = success
with open(filename, 'wb') as f:
    f.write(data.content) # write content of request to file; the with block closes it for you
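A Bad Request often comes from the query string rather than the code, so it can help to let requests build and encode the parameters itself and to call raise_for_status() to surface the server's error. A sketch against the same Quandl endpoint (the api_key is a placeholder):

import requests

base = 'https://www.quandl.com/api/v3/datasets/WSE/TSGAMES.csv'
params = {
    'column_index': 4,
    'start_date': '2018-01-01',
    'end_date': '2018-12-31',
    'collapse': 'monthly',
    'transform': 'rdiff',
    'api_key': 'YOUR_API_KEY', # placeholder
}
response = requests.get(base, params=params) # requests URL-encodes the query string
response.raise_for_status() # raises with the server's reason on 4xx/5xx
with open('file.csv', 'wb') as fx:
    fx.write(response.content)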
That link doesn't work for me. Try it like this (generic example).
from urllib.request import urlopen
from io import StringIO
import csv

data = urlopen("http://pythonscraping.com/files/MontyPythonAlbums.csv").read().decode('ascii', 'ignore')
dataFile = StringIO(data)
csvReader = csv.reader(dataFile)

with open('C:/Users/Excel/Desktop/example.csv', 'w', newline='') as myFile: # newline='' avoids blank rows on Windows
    writer = csv.writer(myFile)
    writer.writerows(csvReader)
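If pandas is already available, the same round trip is shorter; a sketch, assuming the CSV parses cleanly with the default settings:

import pandas as pd

# pandas fetches the url, parses the CSV, and writes it back out locally
df = pd.read_csv("http://pythonscraping.com/files/MontyPythonAlbums.csv")
df.to_csv('C:/Users/Excel/Desktop/example.csv', index=False)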
How can I copy the source code of a website into a text file in Python 3?
EDIT:
To clarify my issue, here's what I have:
import urllib.request

def extractHTML(url):
    f = open('temphtml.txt', 'w')
    page = urllib.request.urlopen(url)
    pagetext = page.read()
    f.write(pagetext)
    f.close()

extractHTML('http:www.google.com')
I get the following error for the f.write() function:
builtins.TypeError: must be str, not bytes
import urllib.request

site = urllib.request.urlopen('http://somesite.com')
data = site.read()
file = open("file.txt", "wb") # open file in binary mode
file.write(data) # write(), not writelines(): iterating bytes yields ints
file.close()
Untested but should work.
EDIT: Updated for python3
Try this.
import urllib.request

def extractHTML(url):
    urllib.request.urlretrieve(url, 'temphtml.txt')
That is easier (urlretrieve streams the raw bytes straight to disk, so the str-vs-bytes problem never arises), but if you still want to do it your way, this is the solution:
import urllib.request

def extractHTML(url):
    f = open('temphtml.txt', 'w')
    page = urllib.request.urlopen(url)
    pagetext = page.read().decode('utf-8') # decode the bytes; str() alone would write the b'...' repr
    f.write(pagetext)
    f.close()

extractHTML('https://www.google.com')
Your script gave an error saying write() needs a str, not bytes; just decode the bytes into a string (plain str() would give you the b'...' representation, escape sequences and all).
Next I got an error saying no host was given. Google is a secured site, so use https: rather than http:, and most importantly you forgot to include // after the scheme.
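Decoding with the charset the server actually declares is safer than hard-coding one; a small sketch that reads it from the response headers and falls back to utf-8:

import urllib.request

def extract_html(url):
    with urllib.request.urlopen(url) as page:
        # prefer the charset declared in the Content-Type header, if any
        charset = page.headers.get_content_charset() or 'utf-8'
        pagetext = page.read().decode(charset, errors='replace')
    with open('temphtml.txt', 'w', encoding=charset) as f:
        f.write(pagetext)

extract_html('https://www.google.com')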
Probably you wanted to create something like this:
import urllib.request

class ExtractHtml():
    def Page(self):
        print("enter the web page name starting with 'http://': ")
        url = input()
        site = urllib.request.urlopen(url)
        data = site.read()
        file = open("D:/python_projects/output.txt", "wb")
        file.write(data)
        file.close()

w = ExtractHtml()
w.Page()