Determine if url is a pdf or html file - python-3.x

I am requesting URLs using the requests package in Python (e.g. file = requests.get(url)). The URLs do not include a file extension, and sometimes an HTML file is returned and sometimes a PDF.
Is there a way to determine whether the returned file is a PDF or HTML, or more generally, what its format is? The browser is able to tell, so I assume the response must indicate it.

The format is reported in the Content-Type response header, typically text/html or application/pdf:
import requests
r = requests.get('http://example.com/file')
# Fall back to '' so the membership tests below still work if the header is missing
content_type = r.headers.get('content-type', '')
if 'application/pdf' in content_type:
    ext = '.pdf'
elif 'text/html' in content_type:
    ext = '.html'
else:
    ext = ''
    print('Unknown type: {}'.format(content_type))
with open('myfile'+ext, 'wb') as f:
    # r.content holds the downloaded body; r.raw is only usable with stream=True
    f.write(r.content)
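Servers sometimes send a missing or generic Content-Type, so it can help to fall back on the first bytes of the body. The following is only a sketch under that assumption: guess_extension is a hypothetical helper (not part of requests), and the byte checks are a rough heuristic, not a full format detector.

```python
def guess_extension(content_type, body):
    """Return '.pdf', '.html', or '' from a Content-Type value and the body bytes."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "application/pdf":
        return ".pdf"
    if media_type in ("text/html", "application/xhtml+xml"):
        return ".html"
    # Fallback heuristic: PDF files start with the bytes "%PDF-".
    head = body[:1024].lstrip().lower()
    if head.startswith(b"%pdf-"):
        return ".pdf"
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return ".html"
    return ""
```

With a requests response r, this would be called as guess_extension(r.headers.get('content-type'), r.content).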

Related

django download view downloading only .xls instead of file with extension on model

I have a Django view that serves a file uploaded through the admin, so that users can download it on the frontend. The download always arrives with an .xls extension: even when I upload a file with an .xlsx extension, it still downloads as .xls. The file should be downloaded with its own extension, whether that is xls or xlsx.
views.py
class myAPIView(APIView):
    def get(self, request):
        data = Model.objects.first()
        filename = data.file.name
        file_extention = filename.split('.')[-1]
        response = HttpResponse(
            data.file.path,
            content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
        response['Content-Disposition'] = \
            'attachment; filename="output_file"'+ file_extention
        return response
This is the standard pattern you can apply (adjust the content type to suit your files):
class myAPIView(APIView):
    def get(self, request):
        data = Model.objects.first()
        filename = data.file.name  # stored name of the uploaded file
        file_extention = filename.split('.')[-1]  # the text after the last dot
        response = HttpResponse(
            data.file.read(),  # send the file's bytes rather than its path
            content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
        # The dot and the extension must sit inside the quoted filename
        response['Content-Disposition'] = \
            'attachment; filename="output_file.{}"'.format(file_extention)
        return response
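Since .xls and .xlsx have different MIME types, the content type can be chosen from the extension as well. Below is a minimal, framework-free sketch; download_headers and the CONTENT_TYPES table are hypothetical names introduced here for illustration, and the fallback to application/octet-stream is an assumption about what you want for other extensions.

```python
import os

# Assumed mapping for the two spreadsheet formats discussed above.
CONTENT_TYPES = {
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

def download_headers(stored_name, base_name="output_file"):
    """Return (content_type, content_disposition) for a stored file name."""
    # splitext keeps the leading dot, e.g. ".xlsx".
    _, ext = os.path.splitext(stored_name)
    ext = ext.lower()
    content_type = CONTENT_TYPES.get(ext, "application/octet-stream")
    disposition = 'attachment; filename="{}{}"'.format(base_name, ext)
    return content_type, disposition
```

In the view, the two returned values would go into content_type= and response['Content-Disposition'] respectively.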

How to retrieve file object in python sent via Postman without any url

I am sending a file as an object via a Postman POST or PUT request, like below:
How can I, in Python:
- get this file object
- read and save it
If you have a working request in Postman, you can copy the autogenerated code snippet in Python - Requests format.
It might look like this:
import requests
url = "http://localhost:8080"  # requests needs an explicit scheme
payload = "<file contents here>"
headers = {
    'Content-Type': 'application/octet-stream'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
The answer to this question, which I finally implemented without a URL:
@app.route('/uploadFIle', methods=['PUT'])
def uploadFile():
    chunk_size = 4096
    with open("/Users/xyz/Documents/filename", 'wb') as f:
        # Read the raw request body chunk by chunk until it is exhausted
        while True:
            chunk = request.stream.read(chunk_size)
            if len(chunk) == 0:
                break
            f.write(chunk)
    return jsonify({"success":"File transfer initiated"})
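The chunked read loop above works on any file-like object, so it can be tried outside Flask. Here is a small sketch with in-memory streams standing in for request.stream and the output file; copy_stream is a hypothetical helper name, not a Flask API.

```python
import io

def copy_stream(source, destination, chunk_size=4096):
    """Copy source to destination in fixed-size chunks; return bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total

# In-memory stand-ins for the request body and the file on disk.
incoming = io.BytesIO(b"x" * 10000)
saved = io.BytesIO()
copied = copy_stream(incoming, saved)
```

Reading in chunks keeps memory use bounded by chunk_size, which matters for large uploads.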

PYTHON FLASK - request.files displays as file not found even though it exists

I am trying to call an external API from Postman by passing uploadFile in the body as form-data. The code below throws 'FileNotFoundError: [Errno 2] No such file or directory:'.
Note: In Postman, uploadFile takes a file from my local desktop as input. I have also modified the Postman settings to allow access to files outside the working directory.
Any help would be much appreciated.
Below is the code:
@app.route('/upload', methods=['POST'])
@auth.login_required
def post_upload():
    payload = {
        'table_name': 'incident', 'table_sys_id': request.form['table_sys_id']
    }
    files = {'file': (request.files['uploadFile'], open(request.files['uploadFile'].filename,
                      'rb'),'image/jpg', {'Expires': '0'})}
    response = requests.post(url, headers=headers, files=files, data=payload)
    return jsonify("Success- Attachment uploaded successfully ", 200)
Below code throws me an error as 'FileNotFoundError: [Errno 2] No such file or directory:
Have you defined UPLOAD_FOLDER ? Please see: https://flask.palletsprojects.com/en/latest/patterns/fileuploads/#a-gentle-introduction
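A likely cause of the FileNotFoundError is that request.files['uploadFile'].filename is only the name the client sent. The uploaded bytes live in the request, not in a file on the server's disk, so open() finds nothing under that name. One option, sketched below, is to forward the upload's own stream. forwardable_file is a hypothetical helper, and the SimpleNamespace object is a stand-in for Flask's uploaded file so the snippet runs on its own.

```python
import io
from types import SimpleNamespace

def forwardable_file(upload):
    """Build the (filename, fileobj, content_type) tuple accepted by requests' files=."""
    return (upload.filename, upload.stream, upload.mimetype)

# Stand-in for request.files['uploadFile'], which exposes the same three attributes.
fake_upload = SimpleNamespace(
    filename="photo.jpg",
    stream=io.BytesIO(b"\xff\xd8\xff"),
    mimetype="image/jpeg",
)
files = {"file": forwardable_file(fake_upload)}
```

Inside the view this would become files = {'file': forwardable_file(request.files['uploadFile'])}, with no call to open() at all.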
I am passing the attribute (uploadFile) in the body as form-data; can this be passed as raw JSON instead?
You cannot upload files with JSON. One hacky workaround is to base64-encode the file before sending it. That way you do not upload the file itself; you send its content encoded in base64 format.
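Server side:
import base64
# request.get_json() parses the JSON body; request.data is only the raw bytes
file_content = base64.b64decode(request.get_json()['file_buffer_b64'])
Client side:
-> Javascript:
const response = await axios.post(uri, {file_buffer_b64: window.btoa(file)})
-> Python:
import base64
# file_path is a placeholder for the local path of the file to send
with open(file_path, "rb") as myfile:
    # b64encode returns bytes; decode to str so the value is JSON-serializable
    encoded_string = base64.b64encode(myfile.read()).decode("ascii")
payload = {"file_buffer_b64": encoded_string}
response = requests.post(url, json=payload)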
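The encode and decode halves can be checked together without a server. This is a minimal round-trip sketch using only the standard library; the function names are hypothetical, and the field name file_buffer_b64 is taken from the answer above.

```python
import base64
import json

def encode_file_payload(file_bytes):
    """Client side: wrap raw bytes in a JSON document as base64 text."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return json.dumps({"file_buffer_b64": encoded})

def decode_file_payload(body):
    """Server side: recover the original bytes from the JSON document."""
    return base64.b64decode(json.loads(body)["file_buffer_b64"])

original = b"\x89PNG\r\n\x1a\n binary data"
restored = decode_file_payload(encode_file_payload(original))
```

Note that base64 inflates the payload by roughly a third, which is part of why this is a workaround and not a replacement for multipart uploads.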

Python Requests Post - Additional field is not recognized for file upload

I have to post a file using Multipart upload to a company-internal REST service. The endpoint needs the file as property "file" and it needs an additional property "DestinationPath". Here is what I do:
url = r"http://<Internal IP>/upload"
files = {
    "DestinationPath": "/some/where/foo.txt",
    "file": open("test.txt", "rb")
}
response = requests.post(url, files=files)
The server complains that it can't get the "DestinationPath". Full error message I receive is:
{'errors': {'DestinationPath': ['The DestinationPath field is required.']},
'status': 400,
'title': 'One or more validation errors occurred.',
'traceId': '00-1993fbc53ab2ee418b683915dd7a440a-2338bd9cf34d414a-00',
'type': 'https://tools.ietf.org/html/rfc7231#section-6.5.1'}
The file upload works in curl, thus it must be python specific.
You might want to try using the data argument instead of files.
response = requests.post(url, data=files)
Thanks to @etemple1 I found the solution to my question:
url = r"http://<Internal IP>/upload"
data = {
    "DestinationPath": "/some/where/foo.txt",
}
with open("test.txt", "rb") as content:
    files = {
        "file": content.read(),
    }
response = requests.post(url, data=data, files=files)
The data for the multipart upload needed to be divided between "data" (plain form fields) and "files" (file parts). The requests library later combines them in the body of the HTTP POST.
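To illustrate what "combined in the body" means, here is a hand-rolled sketch of a multipart/form-data body built with the standard library only. It is not how requests is implemented internally; build_multipart is a hypothetical function, and the fixed application/octet-stream part type is an assumption made for brevity.

```python
import uuid

def build_multipart(fields, files):
    """Encode text fields and file parts into one multipart/form-data body."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    # Plain form fields: one part each, with only a name.
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append('Content-Disposition: form-data; name="{}"'.format(name).encode("utf-8"))
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    # File parts: a name, a filename, and a content type.
    for name, (filename, content) in files.items():
        lines.append(delimiter)
        lines.append('Content-Disposition: form-data; name="{}"; filename="{}"'
                     .format(name, filename).encode("utf-8"))
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    body = b"\r\n".join(lines)
    return "multipart/form-data; boundary=" + boundary, body

content_type, body = build_multipart(
    {"DestinationPath": "/some/where/foo.txt"},
    {"file": ("test.txt", b"hello")},
)
```

The field part has no filename attribute, which is presumably what the server relies on to tell DestinationPath apart from an uploaded file.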

pyramid FileResponse encoding

I'm trying to serve base64 encoded image files and failing. Either I get UTF-8 encoded responses or the line return response errors in an interesting way. Mostly everything I've tried can be seen as commented out code in the excerpt below. Details of the traceback follow that.
My question is: How can I return base64 encoded files?
#import base64
#with open(sPath, "rb") as image_file:
#    encoded_string = base64.b64encode(image_file.read())
dContentTypes = {
    'bmp' : 'image/bmp',
    'cod' : 'image/cis-cod',
    'gif' : 'image/gif',
    'ief' : 'image/ief',
    'jpe' : 'image/jpeg',
    .....
}
sContentType = dContentTypes[sExt]
response = FileResponse(
    sPath,
    request=request,
    content_type= sContentType#+';base64',
    #content_encoding = 'base_64'
    #content_encoding = encoded_string
    )
return response
Uncommenting the line #content_encoding = encoded_string gives me the error:
AssertionError: Header value b'/9j/4AAQSkZJRgABAQAA' is not a string in ('Content-Encoding', b'/9j/4AAQSkZJRgABAQAA....')
FileResponse is meant specifically for serving a file from disk as a response (hence the path argument). In your case you want to base64-encode the file before sending it, which means no FileResponse.
Since you've already read the file into memory, you can just send the content in a Response.
response = Response(encoded_string,
                    request=request,
                    content_type=sContentType+';base64')
I'm not actually sure how content_encoding compares to the ;base64 on the type, but I think the encoding is used more commonly for gzipped content. YMMV.
The error you are seeing is telling you that the Content-Encoding header value is not a string: the commented-out line passes the base64-encoded bytes as content_encoding. Content-Encoding is an HTTP header, and as far as I know, HTTP header values must be strings.
I believe you want to pass the base64-encoded file as the body of the response. FileResponse is not appropriate here, since it expects a path, which it then reads to set the body, whereas you presumably want the encoded string as the body.
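To make the distinction concrete: base64.b64encode returns bytes, which is fine for a response body but not for a header value. The sketch below uses only the standard library and no Pyramid calls. to_data_uri is a hypothetical helper, included on the assumption that the client will embed the image (for example in an img src), which is the usual place a ";base64" marker appears.

```python
import base64

def encode_body(raw_bytes):
    """Return the base64 text to use as the response body."""
    # b64encode yields bytes; decode to str if a text body is wanted.
    return base64.b64encode(raw_bytes).decode("ascii")

def to_data_uri(raw_bytes, content_type):
    """Build a data: URI, the form in which ';base64' normally appears."""
    return "data:{};base64,{}".format(content_type, encode_body(raw_bytes))

jpeg_start = b"\xff\xd8\xff"  # first bytes of a JPEG file
body_text = encode_body(jpeg_start)
uri = to_data_uri(jpeg_start, "image/jpeg")
```

The encoded text starts with /9j/, the same prefix that shows up in the AssertionError above, which confirms the encoded image was being placed in the header.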

Resources