Downloading files using Python requests - python-3.x

I am writing a script to download files from Slack using the Slack API and the requests library in Python. Every file I download comes out the same size (80 KB) and corrupted.
Here is my code:
import os
import shutil
import requests

def download_file(url, out):
    try:
        os.stat(out)
    except:
        os.mkdir(out)
    local_filename = out + '\\' + url.split('/')[-1]
    print('outputting to file: %s' % local_filename)
    response = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        response.raw.decode_content = True
        shutil.copyfileobj(response.raw, f)
    return local_filename
I have tried several of the methods posted around SO for downloading files, without success. I have also checked the URLs I am getting from the Slack API: they are correct, since pasting them into my browser downloads the file.
Any help would be appreciated!

I figured out my problem. Since I am using the image's private download URL from the Slack API file object, I needed to add an Authorization header carrying my token to the basic request. To do this with requests:
response = requests.get(url, stream=True, headers={'Authorization': 'Bearer ' + my_token})
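Putting the fix together with the original function, a minimal sketch (assuming my_token is a Slack token with permission to read files, and that url is the file's private URL from the file object):

```python
import os
import shutil
import requests

def download_file(url, out, token):
    # Create the output directory if it does not exist yet.
    os.makedirs(out, exist_ok=True)
    local_filename = os.path.join(out, url.split('/')[-1])
    # Without the Authorization header, Slack's private file URLs return
    # a small HTML login page -- probably why every "file" was ~80 KB.
    response = requests.get(
        url,
        stream=True,
        headers={'Authorization': 'Bearer ' + token},
    )
    response.raise_for_status()
    with open(local_filename, 'wb') as f:
        response.raw.decode_content = True
        shutil.copyfileobj(response.raw, f)
    return local_filename
```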

Related

Python Request: Image Downloaded via my code becomes corrupted image when opened in photoshop

Below is the script I use: I collect the links with Python Selenium, then pass each one to requests.get() to download the images.
# Download Images
for i, idv_link in zip(range(len(all_links)), all_links):
    r = requests.get(idv_link, stream=True)
    file_name = f"{str(sku_no) + str(i)}.jpg"
    # print(file_name)
    if r.status_code == 200:
        file_path = os.path.join(SKU_file, str(file_name))
        with open(file_path, 'wb') as f:
            for chunk in r:
                f.write(chunk)
    i += 1
print(f'Downloaded {len(all_links)} images')
The problem is that every image downloaded this way opens fine on any device, and can even be uploaded to Google Drive, but Photoshop reports them as corrupted image files.
I'd like to work out whether this is really a requests.get() issue. If so, I may have to re-code it to simulate a right-click on the link and download via Python Selenium instead.

Python Web Scraping - After Authentication - Traverse - Download from URL with no Extension

In Chrome, I will Log In to a website.
I will then inspect the site, go to the Network tab and clear the existing entries.
I will then click on a link and grab the request header, which stores the cookie.
In Python, I then store the header in a dictionary and use that for the rest of the code.
def site_session(url, header_dict):
    session = requests.Session()
    t = session.get(url, headers=header_dict)
    return soup(t.text, 'html.parser')
site = site_session('https://website.com', headers)
# Scrape the Site as usual until I reach a file I can't download..
This is a video file, but the link has no extension.
"https://website.sub.com/download"
Clicking on this link will open up the save dialog and I can save it. But not in Python..
Examining the Network tab, it appears the link redirects to another URL, which I was able to scrape:
"https://website.sub.com/download/file.mp4?jibberish-jibberish-jibberish"
Trying to shorten it to just "https://website.sub.com/download/file.mp4" does not open.
with open(shortened_url, 'wb') as file:
    response = requests.get(shortened_url)
    file.write(response.content)
I've tried both the full URL and the shortened URL and receive:
OSError: [Errno 22] Invalid argument.
Any help with this would be awesome!
Thanks!
I had to use the full URL with the query string and include the headers.
# Read the header of the first URL to get the file URL
fake_url = requests.head(url, allow_redirects=True, headers=headers)
# Get the real file URL
real_url = requests.get(fake_url.url, headers=headers)
# strip out the name from the url here since it's a loop
# Open a file on the PC and write the contents of real_url
with open(stripped_name, 'wb') as file:
    file.write(real_url.content)
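The same steps as one function, plus deriving the local filename from the URL path so the query string never ends up in the name (download_redirected and the 'download.bin' fallback are my own naming, not from the original code):

```python
import os
from urllib.parse import urlsplit

import requests

def download_redirected(url, headers):
    # Follow the redirect chain to discover the real file URL.
    head = requests.head(url, allow_redirects=True, headers=headers)
    response = requests.get(head.url, headers=headers)
    response.raise_for_status()
    # Keep only the path component, so 'file.mp4?query' becomes 'file.mp4'.
    name = os.path.basename(urlsplit(head.url).path) or 'download.bin'
    with open(name, 'wb') as f:
        f.write(response.content)
    return name
```

Passing the URL itself to open(), as in the question's snippet, is what raises OSError: Errno 22, since characters like ':' and '?' are not valid in a filename.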

Using Python to save a file that is sent to browser when visiting URL

When I visit the URL below in a browser, it automatically downloads a CSV. As the contents are updated daily, I want to write a Python command to get the latest file each time.
I've tried wget, requests and urllib.request - all without luck.
url = 'https://coronavirus.data.gov.uk/api/v1/data?filters=areaType=overview&structure=%7B%22areaType%22:%22areaType%22,%22areaName%22:%22areaName%22,%22areaCode%22:%22areaCode%22,%22date%22:%22date%22,%22newPeopleVaccinatedFirstDoseByPublishDate%22:%22newPeopleVaccinatedFirstDoseByPublishDate%22,%22cumPeopleVaccinatedFirstDoseByPublishDate%22:%22cumPeopleVaccinatedFirstDoseByPublishDate%22%7D&format=csv'
Anyone got any ideas? TIA
This works just fine for me:
import requests

url = 'https://coronavirus.data.gov.uk/api/v1/data?filters=areaType=overview&structure=%7B%22areaType%22:%22areaType%22,%22areaName%22:%22areaName%22,%22areaCode%22:%22areaCode%22,%22date%22:%22date%22,%22newPeopleVaccinatedFirstDoseByPublishDate%22:%22newPeopleVaccinatedFirstDoseByPublishDate%22,%22cumPeopleVaccinatedFirstDoseByPublishDate%22:%22cumPeopleVaccinatedFirstDoseByPublishDate%22%7D&format=csv'
r = requests.get(url)
with open("uk_data.csv", "wb") as f:
    f.write(r.content)
The content is a bytes object, so you need to open the file in binary mode.
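For larger downloads the same idea works in streaming form, so the whole file never has to sit in memory at once (a sketch; the chunk size is arbitrary):

```python
import requests

def save_csv(url, path="uk_data.csv"):
    # stream=True keeps requests from loading the whole body at once.
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return path
```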

Download video in Python using requests module results in black video

This is the URL I tried to download: https://www.instagram.com/p/B-jEqo9Bgk9/?utm_source=ig_web_copy_link
This is a minimal reproducible example:
import os
import requests

def main():
    filename = 'test.mp4'
    r = requests.get('https://www.instagram.com/p/B-jEqo9Bgk9/?utm_source=ig_web_copy_link', stream=True)
    with open(os.path.join('.', filename), 'wb') as f:
        print('Dumping "{0}"...'.format(filename))
        for chunk in r.iter_content(chunk_size=1024):
            print(chunk)
            if chunk:
                f.write(chunk)
                f.flush()

if __name__ == '__main__':
    main()
The code runs fine but the video does not play. What am I doing wrong?
Your code is running perfectly fine, but you did not provide the correct link for the video. The link you used is for the Instagram web page, not the video. So you should not save the content as 'test.mp4', but rather as 'test.html'. If you open the file in a text editor (for example Notepad++), you will see that it contains the HTML code of the web page.
You'll need to parse the HTML to acquire the actual video URL, and then you can use the same code to download the video using that URL.
Currently the tag that starts with <meta property="og:video" content= contains the actual video URL, but that may change in the future.
(Note that copyright may apply for videos on Instagram. I assume you have the rights to download and save this video.)
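A sketch of that parsing step using only the standard library. The inline HTML below is a made-up stand-in for the Instagram page; the real markup may differ (and the page may now require login), so treat this as illustration only:

```python
from html.parser import HTMLParser

class OGVideoFinder(HTMLParser):
    """Remember the content attribute of the <meta property="og:video"> tag."""

    def __init__(self):
        super().__init__()
        self.video_url = None

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == 'meta' and d.get('property') == 'og:video':
            self.video_url = d.get('content')

# Fabricated example markup, for illustration only.
page = ('<html><head>'
        '<meta property="og:video" content="https://example.com/video.mp4"/>'
        '</head></html>')
finder = OGVideoFinder()
finder.feed(page)
print(finder.video_url)  # prints https://example.com/video.mp4
```

The URL it finds is what you would then feed to the download code from the question.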

Download content-disposition from http response header (Python 3)

I'm looking for a little help here. I've been using requests in Python to gain access to a website. I'm able to access the website and get a response header, but I'm not exactly sure how to download the zip file named in the Content-Disposition. I'm guessing this isn't something requests can handle, or at least I can't seem to find any info on it. How do I gain access to the file and save it?
'Content-disposition': 'attachment;filename=MT0376_DealerPrice.zip'
Using urllib instead of requests because it's a lighter library:
import urllib.request

req = urllib.request.Request(url, method='HEAD')
r = urllib.request.urlopen(req)
print(r.info().get_filename())
Example :
In[1]: urllib.request.urlopen(urllib.request.Request('https://httpbin.org/response-headers?content-disposition=%20attachment%3Bfilename%3D%22example.csv%22', method='HEAD')).info().get_filename()
Out[1]: 'example.csv'
What you want to do is access response.content. You may want to add better checks that the response really contains the file.
Little example snippet
response = requests.post(url, files={'name': 'filename.bin'}, timeout=60, stream=True)
if response.status_code == 200:
    if response.headers.get('Content-Disposition'):
        print("Got file in response")
        print("Writing file to filename.bin")
        with open("filename.bin", 'wb') as f:
            f.write(response.content)
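If you also want the filename out of that header rather than hardcoding "filename.bin", the standard library's email.message can parse Content-Disposition values (shown here on the header string from the question; filename_from_disposition is my own helper name):

```python
from email.message import Message

def filename_from_disposition(header_value):
    # Message parses MIME-style headers such as Content-Disposition,
    # including quoted and encoded filename parameters.
    msg = Message()
    msg['Content-Disposition'] = header_value
    return msg.get_filename()

print(filename_from_disposition('attachment;filename=MT0376_DealerPrice.zip'))
# prints MT0376_DealerPrice.zip
```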
