Python Requests: Use * as wildcard part of URL - python-3.x

Let's say I want to get a zip file (and then extract it) from a specific URL.
I want to be able to use a wildcard character in the URL like this:
https://www.urlgoeshere.domain/*-one.zip
instead of this:
https://www.urlgoeshere.domain/two-one.zip
Here's an example of the code I'm using (URL is contrived):
import requests, zipfile, io
year='2016'
monthnum='01'
month='Jan'
zip_file_url='https://www.urlgoeshere.domain/two-one.zip'
r = requests.get(zip_file_url, stream=True)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()
Thanks in advance!

HTTP does not work that way. The server only responds to the exact URL you request; there is no wildcard expansion on the client side, so you must know the full URL in advance.
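If the file names follow a predictable pattern, one workaround (a minimal sketch, with a naming scheme made up for illustration) is to build the exact URL from the parts you already know:

import io
import zipfile

import requests

# Hypothetical naming scheme: the archive name is derived from known parts.
year = '2016'
month = 'Jan'
zip_file_url = 'https://www.urlgoeshere.domain/{}{}-one.zip'.format(month, year)

r = requests.get(zip_file_url, stream=True)
r.raise_for_status()  # fail early if the guessed URL does not exist
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()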

I'm not sure if this helps you, but Flask has a feature that works similarly to what you require. Here's a working example:
@app.route('/categories/<int:category_id>')
def categoryDisplay(category_id):
    '''Display a category's items'''
    # Get the category and its items via queries on the session
    category = session.query(Category).filter_by(id=category_id).one()
    items = session.query(Item).filter_by(category_id=category.id)
    # Display the items using Flask HTML templates
    return render_template('category.html', category=category, items=items,
                           editItem=editItem, deleteItem=deleteItem,
                           logged_in=check_logged_in())
The route decorator tells the web server to call that method when a URL like /categories/1 (or 2, 3, 4, 232, ...) is accessed. I'm not certain, but I think you could do the same with the name of your zip as a string. See here (project.py) for more details.
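For the zip case, a minimal sketch of that idea on the server side (the route, directory, and file names here are hypothetical, not from the question):

from flask import Flask, send_from_directory
from werkzeug.utils import secure_filename

app = Flask(__name__)

@app.route('/archives/<string:zip_name>')
def serve_zip(zip_name):
    # secure_filename guards against path traversal in the user-supplied name;
    # send_from_directory returns a 404 on its own if the file is missing.
    return send_from_directory('archives', secure_filename(zip_name + '.zip'))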

Related

How to target a specific input for posting with requests or requests_html

I need to fill in a single input field on
https://outline.com/
When I do Ctrl+U in Firefox, the id is hidden by some JS effect, I think; my guess is that the id is 'source', but how can I see it and target it for posting?
I tried this:
import requests, requests_html
import json
url_out = "https://outline.com/"
url_target = "https://www.whatyouwanthere"
r = requests.post(url_out, data=json.dumps(url_target))
Is it possible to avoid Selenium for this task?
You were pretty far from the actual solution, but there are definitely more ways to do this.
This works:
import requests, json
url_out = "http://outline-server-788914346.us-west-2.elb.amazonaws.com/article"
url_target = "https://www.google.com"
r = requests.get(url = url_out, params = {'source_url': url_target})
data = r.json()
print(data)
I was able to find this by watching the network activity in devtools and taking a look at the app.js file in use on their site. The form on the main page just queries an API with the article URL and the API is what generates the short url on their service.
Honestly, I'm not sure if this breaks their ToS or not, you may want to check with them on that.

How to scrape image/file from web page in Python?

I'm trying to use Python 3.7.4 to back up the pictures in a blog site, e.g.
http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01&690
If I enter the above address in the Firefox address bar, the file is shown correctly.
If I use the following code to download the picture, the server always redirects to a default picture:
from requests import get  # just to try different methods
from urllib.request import urlopen
from urllib.parse import urlsplit, urlunsplit, quote

# hard-coded address is randomly selected for debug purposes
origPict = 'http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01&690'
p = urlsplit(origPict)
newP = quote(p.path)
origPict = urlunsplit([p.scheme, p.netloc, newP, p.query, p.fragment])
try:
    #url_file = urlopen(origPict)
    #u = url_file.geturl()
    url_file = get(origPict)
    u = url_file.url
    if u != origPict:
        raise Exception('Failed to get picture ' + origPict)
    ...
Any clue why requests.get or urllib's urlopen doesn't like the '&' in the URL?
Update: thanks to Artur's comments, I realize the question is not about the request itself but about the site's protection mechanism: JS, cookies, or something else in the page feeds information back to the server that lets it judge whether the request comes from a scraper. So the question now becomes how to scrape an image from a web page, which is more complex than simply downloading an image from a URL.
It's not about the & symbol, but about redirection. Try adding the parameter allow_redirects=False to get; it should be okay.
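A minimal sketch of that suggestion (the Referer value is an extra assumption on my part; image hosts like this commonly check it for hot-link protection):

from requests import get

origPict = 'http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01&690'
# Stop requests from following the redirect to the default picture,
# and send a Referer header in case the host checks for hot-linking.
r = get(origPict,
        allow_redirects=False,
        headers={'Referer': 'http://blog.sina.com.cn/'})
print(r.status_code, r.headers.get('Location'))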

How to make multiple API calls from multiple pages in single URL

So the title is a little confusing I guess..
I have a script that I've been writing that will display some random data and other non-essentials when I open my shell. I'm using grequests to make my API calls since I'm using more than one URL. For my weather data, I use WeatherUnderground's API since it will offer active alerts. The alerts and conditions data are on separate pages. What I can't figure out is how to insert the appropriate name in the grequests object when it is making requests. Here is the code that I have:
URLS = ['http://api.wunderground.com/api/' + api_id + '/conditions/q/autoip.json',
        'http://www.ourmanna.com/verses/api/get/?format=json',
        'http://quotes.rest/qod.json',
        'http://httpbin.org/ip']
requests = (grequests.get(url) for url in URLS)
responses = grequests.map(requests)
data = [response.json() for response in responses]
# json parsing from here
In the URL 'http://api.wunderground.com/api/'+api_id+'/conditions/q/autoip.json' I need to make an API request to conditions and alerts to retrieve the data I need. How do I do this without rewriting a fourth URLS string?
I've tried
pages = ['conditions', 'alerts']
URL = ['http://api.wunderground.com/api/'+api_id+([p for p in pages])/q/autoip.json']
but, as I'm sure some of you more seasoned programmers know, that threw an exception. So how can I iterate through these pages, or will I have to write out both complete URLs?
Thanks!
OK, I was actually able to figure out how to call each individual page within the grequests object by using a simple for loop. Here is the code that I used to produce the expected results:
import grequests

pages = ['conditions', 'alerts']
api_id = 'myapikeyhere'
for p in pages:
    URLS = ['http://api.wunderground.com/api/' + api_id + '/' + p + '/q/autoip.json',
            'http://www.ourmanna.com/verses/api/get/?format=json',
            'http://quotes.rest/qod.json',
            'http://httpbin.org/ip']
    # create the grequests objects and retrieve the results
    requests = (grequests.get(url) for url in URLS)
    responses = grequests.map(requests)
    data = [response.json() for response in responses]
    # json parsing from here
I'm still not sure why I couldn't figure this out before.
Documentation for the grequests library here
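As a side note, a minimal alternative sketch (not from the original answer) that builds the WeatherUnderground URLs up front, so the other three endpoints are only requested once per run:

import grequests

pages = ['conditions', 'alerts']
api_id = 'myapikeyhere'

# One WeatherUnderground URL per page, plus the other endpoints a single time.
URLS = ['http://api.wunderground.com/api/' + api_id + '/' + p + '/q/autoip.json'
        for p in pages]
URLS += ['http://www.ourmanna.com/verses/api/get/?format=json',
         'http://quotes.rest/qod.json',
         'http://httpbin.org/ip']

responses = grequests.map(grequests.get(url) for url in URLS)
data = [response.json() for response in responses if response is not None]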

Python Requests Refresh

I'm trying to use Python's requests library to log in to a website. It's pretty simple code, and you can really get the gist of requests just by reading its website. I, however, want to check via the URL whether I'm successfully logged in. The problem I've encountered is that when I make the POST request and assign it to the variable p, print(p.url) gives me the same URL whether or not the HTML has changed. Is there any way for me to refresh the browser or update the URL to whatever it's currently set to?
(I can add a line for checking the url against itself later, but for now I just want to get the correct url)
#!/usr/bin/env python3
import requests

payload = {'login': 'USERNAME',
           'password': 'PASSWORD'}

with requests.Session() as s:
    p = s.post('WEBSITE', data=payload)
    #print(p.text)
    print(p.url)
The usage of python-requests may not be as complex as you think. It will automatically handle the redirects of your post() (or session.get()).
Here, the session.post() method returns a response object:
r = s.post('website', data=payload)
which means r.url is the current URL you are looking for.
If you still want to refresh the current page, just use:
s.get(r.url)
To verify whether you have logged in successfully, one approach is to do the login in your browser first.
Based on the title or content of the webpage returned (i.e., the content in r.text), you can judge whether you have made it.
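A minimal sketch of that check (the login URL and the marker string are hypothetical; substitute whatever your site shows only to logged-in users):

import requests

payload = {'login': 'USERNAME', 'password': 'PASSWORD'}

with requests.Session() as s:
    r = s.post('https://example.com/login', data=payload)
    # r.url reflects the final URL after any redirects the server issued.
    print('Landed on:', r.url)
    # Look for text that only appears for logged-in users.
    if 'Log out' in r.text:
        print('Login appears to have succeeded.')
    else:
        print('Login appears to have failed.')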
BTW, python-requests is a great library, enjoy it.

Avoiding repetition with Flask - but is it too DRY?

Let us assume I serve data to colleagues in-office with a small Flask app, and let us also assume that it is a project I am not explicitly 'paid to do' so I don't have all the time in the world to write code.
It has occurred to me in my experimentation with pet projects at home that instead of decorating every last route with @app.route('/some/local/page') I can do the following:
from flask import Flask, render_template, url_for, redirect, abort
from collections import OrderedDict

goodURLS = OrderedDict([('/index', 'Home'),   ## can be passed to the template
                        ('/about', 'About'),  ## to create the navigation bar
                        ('/foo', 'Foo'),
                        ('/bar', 'Bar'),      ## hence the use of OrderedDict
                        ('/eggs', 'Eggs'),    ## to have a set order for that navbar
                        ('/spam', 'Spam')])

app = Flask(__name__)

@app.route('/<destination>')
def goThere(destination):
    availableRoutes = goodURLS.keys()
    if "/" + destination in availableRoutes:
        return render_template('/%s.html' % destination, goodURLS=goodURLS)
    else:
        abort(404)

@app.errorhandler(404)
def notFound(e):
    return render_template('/notFound.html'), 404
Now all I need to do is update that one list, and my navigation bar and route-handling function stay in lock-step.
Alternatively, I've written a method that determines the viable file locations by using os.walk in conjunction with file.endswith('.aGivenFileExtension') to locate every file I mean to make accessible. The user's request can then be compared against the list this function returns (which obviously changes the serveTheUser() function).
from os import path, walk

def fileFinder(directory, extension=".html"):
    """Returns a list of files with a given file extension at a given path.
    By default .html files are returned.
    """
    foundFilesList = []
    if path.exists(directory):
        for p, d, files in walk(directory):
            for file in files:
                if file.endswith(extension):
                    foundFilesList.append(file)
    return foundFilesList

goodRoutes = fileFinder('./templates/someFolderWithGoodRoutes/')
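A minimal sketch of how that comparison might plug into the route handler above (serveTheUser and the template path are just illustrative, continuing the question's code):

@app.route('/<destination>')
def serveTheUser(destination):
    page = '%s.html' % destination
    # Only render templates that fileFinder actually discovered on disk.
    if page in goodRoutes:
        return render_template('someFolderWithGoodRoutes/' + page, goodURLS=goodURLS)
    abort(404)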
The question is, Is This Bad?
There are many aspects of Flask I'm just not using (mainly because I haven't needed to know about them yet) - so maybe this is actually limiting, or redundant when compared against a built-in feature of Flask. Does my lack of explicitly decorating each route rob me of a great feature of Flask?
Additionally, is either of these methods more or less safe than the other? I really don't know much about web security - and like I said, right now this is all in-office stuff, the security of my data is assured by our IT professional and there are no incoming requests from outside the office - but in a real-world setting, would either of these be detrimental? In particular, if I am using the backend to os.walk a location on the server's local disk, I'm not asking to have it abused by some ne'er-do-well am I?
EDIT: I've offered this as a bounty, because if it is not a safe or constructive practice I'd like to avoid using it for things that I'd want to like push to Heroku or just in general publicly serve for family, etc. It just seems like decorating every viable route with app.route is a waste of time.
There isn't anything really wrong with your solution, in my opinion. The problem is that with this kind of setup the things you can do are pretty limited.
I'm not sure if you simplified your code to show here, but if all you are doing in your view function is to gather some data and then select one of a few templates to render it then you might as well render the whole thing in a single page and maybe use a Javascript tab control to divide it up in sections on the client.
If each template requires different data, then the logic that obtains and processes the data for each template will have to be in your view function, and that is going to look pretty messy because you'll have a long chain of if statements to handle each template. Between that and separate view functions per template I think the latter will be quicker, even more so if you also consider the maintenance effort.
Update: based on the conversation in the comments, I stand by my answer, with some minor reservations.
I think your solution works and has no major problems. I don't see a security risk because you are validating the input that comes from the client before you use it.
You are just using Flask to serve files that can be considered static if you ignore the navigation bar at the top. You should consider compiling the Flask app into a set of static files using an extension like Frozen-Flask, then you just host the compiled files with a regular web server. And when you need to add/remove routes you can modify the Flask app and compile it again.
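A minimal sketch of that workflow, assuming Frozen-Flask is installed (pip install Frozen-Flask) and importable access to the app object (the module name here is hypothetical):

from flask_frozen import Freezer

from yourapp import app  # hypothetical module containing the Flask app

freezer = Freezer(app)

if __name__ == '__main__':
    # Writes a static copy of every reachable route into the build/ directory.
    freezer.freeze()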
Another thought is that your Flask app structure will not scale well if you need to add server-side logic. Right now you don't have any logic in the server, everything is handled by jQuery in the browser, so having a single view function works just fine. If at some point you need to add server logic for these pages then you will find that this structure isn't convenient.
I hope this helps.
I assume based on your code that all the routes have a corresponding template file of the same name (destination to destination.html) and that the goodURL menu bar is changed manually. An easier method would be to try to render the template at request and return your 404 page if it doesn't exist.
from jinja2 import TemplateNotFound
from werkzeug import secure_filename
....

@app.route('/<destination>')
def goThere(destination):
    destTemplate = secure_filename("%s.html" % destination)
    try:
        return render_template(destTemplate, goodURLS=goodURLS)
    except TemplateNotFound:
        abort(404)

@app.errorhandler(404)
def notFound(e):
    return render_template('/notFound.html'), 404
This is adapted from the answer to the Stack Overflow question "How do I create a 404 page?".
Edit: Updated to make use of Werkzeug's secure_filename to clean user input.
