I had some questions on db_redir_temp - nutch

I injected some URLs and ran one crawl round, and some of them ended up with the status db_redir_temp.
{"url":"http://www.universityhealth.org","pst":"temp_moved(13), lastModified=0: https://www.universityhealth.org/"}
{"url":"http://silvercappartners.com","pst":"temp_moved(13), lastModified=0: http://silvercappartners.com/index.html"}
Why is http://www.universityhealth.org shown as db_redir_temp when it points to the same URL?
The URL http://silvercappartners.com points to http://silvercappartners.com/index.html.
Can I assume that the pst column gives the URL of the redirect target?

The two URLs
http://www.universityhealth.org
https://www.universityhealth.org/
differ in one important point: the protocol (or scheme), http vs. https. These are not always equivalent, e.g., a web server may not support https. The other difference (the trailing /) is irrelevant: the HTTP request for both the empty path and the server root path is GET / HTTP/1.1 (possibly with a different protocol version).
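The comparison can be checked mechanically. This is a minimal sketch using Python's standard urllib.parse. The helper name compare is made up for illustration, and treating an empty path as / is the only normalization it assumes:

```python
from urllib.parse import urlsplit

def compare(url_a, url_b):
    """Report whether two URLs share scheme, host, and request path."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return {
        "same_scheme": a.scheme == b.scheme,
        "same_host": a.netloc == b.netloc,
        # An empty path and "/" both produce the request line "GET / HTTP/1.1".
        "same_path": (a.path or "/") == (b.path or "/"),
    }

result = compare("http://www.universityhealth.org",
                 "https://www.universityhealth.org/")
print(result)
```

Only same_scheme comes out false for the two URLs from the question.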
But the true reason is simply that the server responded with HTTP/1.1 302 Found, which is a redirect; see HTTP 302.
The "pst" or "protocol status" metadata field may include a message. For redirects it contains the redirect target.
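To pull the redirect target out of a dumped record, something like the following may work. It is only a sketch: the function name redirect_target is hypothetical, and the layout "status, lastModified=n: message" is assumed from the two records quoted in the question, so other records may look different or carry no message at all.

```python
import json

def redirect_target(line):
    """Return (source URL, redirect target or None) for one dumped record."""
    record = json.loads(line)
    # Assumed layout: "<status>, lastModified=<n>: <message>".
    # Split on the first ": " so the "://" inside the target URL is untouched.
    _, sep, message = record["pst"].partition(": ")
    target = message.strip() if sep else None
    return record["url"], target

line = ('{"url":"http://www.universityhealth.org",'
        '"pst":"temp_moved(13), lastModified=0: https://www.universityhealth.org/"}')
print(redirect_target(line))
```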

Related

Should I use HTTP 302 instead of HTTP 301 in this case?

I have a website example.com
People come to my site to calculate some stuff and get their results like
example.com/result/oiwajefoijh238fjiow
example.com/result/jifomoiemowajefji33
They would spread the links on social networks like Twitter and Facebook.
But I don't want visitors arriving from those links to keep the suffix in the URL. Should I redirect example.com/result/oiwajefoijh238fjiow to example.com with a 301 or a 302?
I add the suffix to save some state; it is meant to be recognized by other servers (rather than by browsers).
3xx status codes are mainly used for REDIRECTION.
301 means MOVED PERMANENTLY
The target resource has been assigned a new permanent URI and any future references to this resource ought to use one of the enclosed URIs.
302 means FOUND
The target resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client ought to continue to use the effective request URI for future requests.
You can read more here at HTTPSTATUSES
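On the wire, the two differ only in the status line. A small sketch with Python's standard http.HTTPStatus shows this. The helper name redirect and the https://example.com/ target are assumptions for illustration, not anything from the question's actual setup:

```python
from http import HTTPStatus

def redirect(location, permanent=False):
    """Build the status line and headers for a redirect response."""
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    status_line = f"HTTP/1.1 {status.value} {status.phrase}"
    return status_line, {"Location": location}

# Temporary: clients should keep using the original /result/... URL.
print(redirect("https://example.com/"))
# Permanent: clients may replace the original URL with the new one.
print(redirect("https://example.com/", permanent=True))
```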

How to preserve referrer (Referer HTTP header) across subdomains?

I have a website running on www.example.com that makes GET requests to api.example.com to process a form. When I examine web server logs for api.example.com I see that requests from Safari get the full referer (e.g., www.example.com/page-where-request-originated). But requests from Chrome only get a partial referer (www.example.com).
I need the ability to track the full referring page when the request hits api.example.com. Reviewing the documentation for Referrer-Policy it seems my only option is to set it to unsafe-url. But that seems overkill because I only want the referrer to be sent for subdomains of example.com. Is that possible?
The only option I can find is strict-origin: send the origin as the referrer, but only when the request is not a downgrade from https to http.
see: https://wiki.crashtest-security.com/enable-security-headers
Everything else will either omit the referrer completely or send the origin URL without any URL parameters.
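To see why a subdomain gets no special treatment, it may help to model the policies. The sketch below is a simplified reading of the Referrer-Policy values, not browser code. The function name referrer_for is hypothetical, default ports are not normalized, and it assumes Chrome applies strict-origin-when-cross-origin when no policy is set.

```python
from urllib.parse import urlsplit

def referrer_for(policy, source, dest):
    """Return the Referer value that would be sent, or None if omitted."""
    src, dst = urlsplit(source), urlsplit(dest)
    origin = f"{src.scheme}://{src.netloc}/"
    full = origin + src.path.lstrip("/")   # the fragment is never sent
    if src.query:
        full += "?" + src.query
    # Origins compare scheme and host, so www and api are cross-origin.
    same_origin = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    downgrade = src.scheme == "https" and dst.scheme == "http"

    if policy == "no-referrer":
        return None
    if policy == "unsafe-url":
        return full
    if policy == "origin":
        return origin
    if policy == "strict-origin":
        return None if downgrade else origin
    if policy == "same-origin":
        return full if same_origin else None
    if policy == "origin-when-cross-origin":
        return full if same_origin else origin
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return full
        return None if downgrade else origin
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full
    raise ValueError(f"unknown policy: {policy}")

page = "https://www.example.com/page-where-request-originated"
api = "https://api.example.com/form"
for name in ("strict-origin-when-cross-origin", "strict-origin", "unsafe-url"):
    print(name, "->", referrer_for(name, page, api))
```

In this model, every policy that sends the full page URL to api.example.com would also send it to some other cross-origin host, because the policies distinguish origins rather than sites.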

How to interpret HTTP Status Code 302 in an IIS web log

I am looking at my IIS web log and notice some log records with an sc-status of 302.
I did research and am only more confused.
At first, it looks simple, if a little vague.
"This is an example of industry practice contradicting the standard.
[...] Therefore, HTTP/1.1 added status codes 303 and 307 to
distinguish between the two behaviours.[25] However, some Web
applications and frameworks use the 302 status code as if it were the
303."
While I understand the concept, I am not sure which meaning to apply when viewing an IIS web log. Do I treat the 302 status code as a 303 ("See Other" -- a way to redirect to a new URL) or as a 307 ("Temporary Redirect")?
307 causes a redirect using the same "verb" that the original URL was requested with, which allows POST data to be preserved. By contrast, with 301/302 most browsers follow the redirect with a GET of the new URL, losing any POST data that may have been present.
Caching also differs: a 301 is cacheable by default, so the browser may go straight to the new URL and bypass the original one. A 302 or 307 is cached only when the response explicitly allows it (for example via Cache-Control or Expires), so the original URL is normally hit again, even if it does end up being another redirect.
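The verb handling can be written down as a small lookup. This is a sketch of common browser behaviour rather than of anything IIS does. The function name method_after_redirect is made up for the example:

```python
def method_after_redirect(status, method):
    """Method a typical browser uses for the follow-up request."""
    if status == 307:
        return method                      # verb and body are preserved
    if status == 303:
        # "See Other": everything except HEAD is retried as GET.
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Common browser behaviour: POST becomes GET, other verbs are kept.
        return "GET" if method == "POST" else method
    raise ValueError(f"status not handled in this sketch: {status}")

for code in (302, 303, 307):
    print(code, "POST ->", method_after_redirect(code, "POST"))
```

So when the log shows a 302 in response to a POST, the request that follows it in the log will usually be a GET, which is the 303-like behaviour.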

Difference between IIS Redirect and Rewrite (in relation to redirecting)

The question may sound odd, but given an article, it is definitely possible to use the rewrite module to perform redirects just as with the redirect module. Both are able to issue a permanent redirect (301).
There is a question asking for the difference, but it talks about the rewrite module being used to purely rewrite not redirect. Another post makes this clear, but doesn't seem to get an adequate answer.
Hence, my question: What's the difference between these modules? Which is preferred over the other when it comes to redirects?
NOTE: THIS ANSWER DOES NOT answer difference between IIS Redirect (httpRedirect) vs URL Rewrite Module's Redirect but rather difference between URL Rewrite Module's (redirect vs rewrite).
If you are trying to hide complex URLs (with query strings) behind friendlier URLs, then Rewrite is the way to go, as browsers and search engines will always see 200 OK and assume the content is coming from the originally requested URL.
If you are trying to tell search engines and users that a resource now lives at a new URL, then Redirect is the way to go, as you are sending a 301 status code saying that the resource has moved from the original location to the new one.
IIS Redirect:
Redirecting happens at Client Side
Browser sees a different URL In address bar.
Client aware of a redirect URL.
301/302 can be issued. Edit: (303/307 can be issued too)
Good for SEO: tells search engines about the new URL, e.g. mysite.com/abc to mysite.com/pqr
Can be redirected to same site or different site altogether.
IIS Rewrite:
Rewriting happens at Server Side
Browser does not see new URL in address bar.
Client unaware if content is served from a re-written URL.
No 301/302 are issued. This will have normal 200 OK assuming that rewritten URL Resource is available.
Good to hide unfriendly URL and also SEO. mysite.com/article/test-sub/ to mysite.com/article.aspx?id=test-sub
Generally for a resource within same site.
Request Handling (REDIRECT): www.mysite.com/abc to redirect to www.mysite.com/pqr
Client calls: www.mysite.com/abc
URL Rewrite Module sees a rule match for client URL and gives new redirect URL.
Server responds with 301 with new URL for client to call www.mysite.com/pqr
Client calls new URL www.mysite.com/pqr
Server responds with 200 OK for new URL. (address bar shows new URL)
Request Handling (REWRITE): www.mysite.com/abc which you want to point to www.mysite.com/pqr
Client calls: www.mysite.com/abc
URL Rewrite Module sees a rule match and provides new rewritten url to IIS i.e. www.mysite.com/pqr and Server makes request for that URL within IIS.
Server responds with 200 OK for original URL but with content from rewritten url. (address bar shows original URL and client does not know that you are serving content from different URL)
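The two request flows above can be mimicked with a toy model. This is not IIS or URL Rewrite configuration, just a sketch in Python. The PAGES and RULES tables and the handle function are hypothetical stand-ins for the site content and the rule set:

```python
PAGES = {"/pqr": "content of pqr"}   # hypothetical site content
RULES = {"/abc": "/pqr"}             # hypothetical rule: /abc maps to /pqr

def handle(path, mode):
    """Return (status, headers, body) for a request under the given mode."""
    target = RULES.get(path)
    if target and mode == "redirect":
        # The client is told to make a second request to the new URL.
        return 301, {"Location": target}, ""
    if target and mode == "rewrite":
        # The server swaps the path internally; the client never sees it.
        path = target
    if path in PAGES:
        return 200, {}, PAGES[path]
    return 404, {}, ""

print(handle("/abc", "redirect"))   # client must follow up with /pqr
print(handle("/pqr", "redirect"))   # the follow-up request
print(handle("/abc", "rewrite"))    # one request, content of /pqr
```

With redirect the client makes two requests and ends up on /pqr. With rewrite it makes one request and never learns that /pqr exists.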

htaccess - Fetch static content from different path if page is HTTPS

As per the guidelines to speed up site, I have off-loaded all my static content (JS|CSS|IMAGES) to a subdomain static.example.com.
The site is working fine, but a problem arises when I load the secure pages: the browser throws "InsecureContent" warnings for resources being loaded into my page.
We have an SSL certificate only for the root domain, not for the subdomain. I can't get another one, so I want to handle it as follows, which should be possible because the subdomain's folder can also be accessed inside the root folder.
I want to handle this with .htaccess this way ->
When the referer is https://www.example.com/anything.php
Rewrite http://static.example.com/folder/file.ex to https://www.example.com/static/folder/file.ex
Can somebody help me do this?
Not going to help at all. In order for anything inside an htaccess file to get processed, the request has to have been received by the webserver. When your browser pops up the "Insecure Content" warning, it hasn't sent the request yet. This warning pops up when negotiating the SSL connection, and in your case, your cert doesn't cover the domain the request is being made to. That means adding any sort of redirect on the server's end isn't going to help. You're still going to see the warning.
You need to ensure that your content points to http://static.example.com/ somehow, by either using a relative URI base or maybe absolute URLs that explicitly point to http://.
Another option may be to use some kind of javascript on the client side.
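Since the decision has to be made in the page itself rather than in .htaccess, one possibility is to choose the asset URL while the page is rendered. The sketch below assumes, as the question states, that the static folder is also reachable under the root domain at /static/. The function name asset_url is hypothetical, and the same logic would apply in whatever language generates the pages.

```python
def asset_url(page_is_https, path):
    """Pick the host for a static asset based on the scheme of the page."""
    path = path.lstrip("/")
    if page_is_https:
        # The certificate covers www.example.com only, so use its /static/ folder.
        return f"https://www.example.com/static/{path}"
    # Plain HTTP pages can keep using the offloaded subdomain.
    return f"http://static.example.com/{path}"

print(asset_url(True, "folder/file.ex"))
print(asset_url(False, "folder/file.ex"))
```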

Resources