Getting redirected URL in python 3 - python-3.x

I want to get the address of a page after redirect. I have the following code
url = 'https://simple.wikipedia.org/wiki/Gcd'
print(urlopen(url).geturl())
But it doesn't work, it prints https://simple.wikipedia.org/wiki/Gcd, while it should print https://simple.wikipedia.org/wiki/Greatest_common_divisor.
So, what is the problem with it?

There is actually no problem. The URL you get when opening https://simple.wikipedia.org/wiki/Gcd is exactly that URL. The only way for the URL to change would be a redirect, and if you look at the response from that URL, you can see that it returns just a 200 status code. So there is no redirect.
However, when you open the URL in the browser, the URL does get changed to https://simple.wikipedia.org/wiki/Greatest_common_divisor. How does this happen when there is no redirect?
This is actually a new MediaWiki feature that rewrites the URL in the browser using the History API. It simply replaces the URL that is displayed in the browser—but without actually making a new request or being a true HTTP redirect.
It’s a functionality that only works in modern browsers with JavaScript enabled. Otherwise, you would stay on the Gcd URL which is also the behavior from older versions of MediaWiki.
You can learn more about this new MediaWiki feature in the Phabricator task T37045.
As for your “problem” with it, you should consider communicating with MediaWiki using the MediaWiki API which will also tell you when a page is a redirect.

Related

Image URL being redirected to other web site

I am trying to use an image from an existing website in my html page. The issue is that the image is not getting resolved correctly, Even when I hit the image URL directly on browser, its getting redirected to some other site. I have tried on all browsers but no luck.
I have heard about restricting resource using hot linking but in this case not sure whats happening. Any pointers/suggestions please.
Eg - I want to use image(http://www.acsisair.com.au/wp-content/themes/acsis-air/images/logo.png) but when hitting this URL after clearing browser cache, its being redirected to other website (http://www.nine.com.au)
It's very, very likely a form of hotlinking protection: if you change the URL subtly to make it clearly incorrect, you get a 404 page.
That means when you enter an URL to a resource that exists, you're given a header redirect. That's not an accident.
There's likely nothing you can (or should) do to circumvent this.
Use images you can host on your own site.

redirect https to http without postdata loss

On my website some pages are secure and some are not. I have a search form on each of my page (as a part of a template) and when I'm on https page and launch the search, the results page shows nothing. I know I must be loosing my posted data because of the redirection. Would excluding the results page from being redirected solve my problem? I want it to be http at all times. If that's the case - what exactly do I need to put inside of my .htaccess?
If you ensure the search form uses 'http:searchurl' in the action rather than just 'searchurl' then it should work

Is there any way to tell a browser that this is a bad URL to remember?

I'm sending emails to customers, and I'm providing a custom URL for each, which when they go to, will log them in.
This is fine, except if they are using a shared browser that will remember the URL.
Is there any way at all to suggest to the browser that it shouldn't remember a URL?
Edit: This question has nothing to do with caching of the page.
Have the link log them in once. Then make them create credentials that let them access the site in the future. Whats to stop a random person from typing in the url and gaining access to the content?
Yes. You can redirect them with a 301 or 302. Then the browser won't save the URL they went to. At least that work with the Mozilla based browsers and I would imagine others too.
Another way, it is uglier though is to reply with an error and include a body which does a refresh. Whether that works in most browsers, probably not. However, browsers do not cache pages that return an error (404 Page Not Found would work, you could also use 403 Forbidden.)
Other than that, there isn't much you can do. JavaScript does not allow you to temper with the history anymore...

HTTP Module causing problem with Google Cache

I have a live site ( I can't provide the URL ).
It is on sharepoint 2007. The pages were having a URL, later that was modified.
I wrote a http module and used response.redirect() to navigate user to the correct page.
But since the site was live previously; on searching on google.com, it shows the old URL only. Though the redirection works fine. I need to change the cached URL to new URL.
How can I do that ?
You need to understand the different redirect codes - by itself response.redirect() just redirects a browser (or bot) to another address.
You should have been issuing a 301 redirect then Google and other services (its been roumered that there are a few other games in town) would have eventually removed the old URL and replaced with the new URL and all your 'link juice' would be kept.
If you need to change the URL of a
page as it is shown in search engine
results, we recommended that you use a
server-side 301 redirect. This is the
best way to ensure that users and
search engines are directed to the
correct page. The 301 status code
means that a page has permanently
moved to a new location.
ASP.NET code for this
Response.Status = "301 Moved Permanently"
Response.addheader "Location", "http://www.newdomain.com/newurl/"
Response.end
Try to look here. Not sure, but it can help you.

Can I have 'friendly' url's without a URL rewriter in IIS?

Without having a url rewriter such as ISAPI_Rewrite available, is it possible to achieve the following:
I would like a user to browse to http://www.jjj.com/directory where /directory does not actually exist. IIS transfers the user to not-found.cfm.
At this point I can serve index.cfm i.e. http://www.jjj.com/directory/index.cfm.
The url will display just fine and the page loads even though the directory or index.cfm doesn't exist. However I'd like to be able to not have index.cfm in the url.
Ideal:
Page Request to http://www.jjj.com/directory
IIS loads not-found.cfm as the default 404 errorhandler.
Not found strips the CGI.query_string and uses cfswitches to funnel the user to the appropriate controller function. May use onMissingTemplate?
The page request never changes in the URL and the page loads transparently the user with 200 OK status
If a user requests http://www.jjj.com/directory/index.cfm I would 301 redirect to http://www.jjj.com/directory
Current:
Page Request to http://www.jjj.com/directory
IIS loads not-found.cfm as default 404 error handler.
Not found strips the CGI.query_string and uses cfswitches to funnel the user to the appropriate controller function.
The page request changes to http://www.jjj.com/directory/index.cfm with a 200 OK status
You're asking how to cut something but telling us you're not allowed to use a knife or anything resembling one.
Here's my only clever idea using onMissingTemplate().
GET /directory/
-> 404.cfm
-> <cfinclude template="#cgi.script_name#/special.cfm" />
-> fires onMissingTemplate() where you ignore the "special.cfm" bit and just use the rest of the requested path to figure out what controller to wire up to.
This is a kludgy hack, though, so I would try to avoid it myself. Maybe if you explain why ISAPI Rewriting isn't an option, then we might be able to help further.
You can tell IIS to have 404 and 403 errors execute a custom URL on your site (such as /urlhandler.cfm).
Then, you can parse the 'cgi.query_string' and route the application anyway you desire using cfinclude to simply include the correct 'template.cfm', or, you can reformat the input your framework is expecting, or, use a project like http://coldcourse.riaforge.org/.
Just one note, IIS will give you a URL that looks like this: '404;http://yoursite.com/the/url/you/wanted/to/route'.
Is IIS7 on the approved list of software? That can get you native url rewriting and side-step the whole issue.
Second option -- my CFM voodoo is rusty, but I think you can setup IIS6 to look for a CFM page (like you are doing) but then step in at the application level and do the url rewriting/repointing before it actually hits the 404 page.
Another way around it -- find an ISAPI url rewriter that is, say, under the MIT license. Build your own copy. Then have them install that as part of your software package.

Resources