How does the browser know the correct URL path? - browser

Sorry for the basic question, but I couldn't find a similar answer.
If I write this link on my HTML page: click me.
If I host that page at google.com, then when I navigate to it the link automagically points to google.com/about.
My question is, how does it do that? Does the browser just know the internal link from the page you are currently on? Is it the server calculating the links? How does it know to add the google.com?
I'm building a web crawler that finds links on a site (including these internal links), and I'm not sure whether I can just prepend google.com or whether browsers work out internal links a different way.
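As a rough sketch of what happens: a relative link is resolved by the browser against the URL of the page it appears on (the base URL), entirely on the client side; the server does not calculate anything. Python's `urllib.parse.urljoin` applies the same kind of resolution rules, so a crawler can use it instead of concatenating strings. The page URL and href values below are made-up examples, and the stripped link in the question is assumed to have had an href like `/about`.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical page URL and hrefs, for illustration only.
page_url = "https://google.com/blog/post.html"
hrefs = ["/about", "about", "../about", "//example.com/about", "https://example.org/"]

for href, absolute in zip(hrefs, resolve_links(page_url, hrefs)):
    print(f"{href!r:28} -> {absolute}")
```

Simply prepending the host only works for root-relative links like `/about`; links such as `about` or `../about` depend on the path of the current page. A page may also declare a `<base href="...">` element, in which case that value should be used as the base instead of the page URL.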

Related

How to set up a SharePoint link for a Google Chrome extension

So, I am trying to implement a SharePoint intranet site for an organization. However, there is one application in particular that they would like a link to on the homepage. Unfortunately, this application can only be used via the IE Tab Google Chrome extension (I know, dumb), because the app devs have yet to add Chromium compatibility.
Anyway, the link looks like this:
chrome-extension://hehijbfgiekmjfkfjpbkbammjbdenadd/nhc.htm#url=https://website.com/sub/sub.Hub.aspx
But SharePoint requires https:// at the beginning of a link.
If you paste that destination into Chrome directly it navigates fine, but if you add, say, https://google.com/ or https://*/ to the front, it doesn't work.
Is there a syntax that will allow me to put https:// on the front of this without getting a 404 error?
Never mind, I ended up re-directing this through IIS internally

Cannot open my github pages on WeChat App (Android, iOS both)

I built my website with GitHub Pages (Jekyll).
It opens fine in web browsers like Chrome (both PC and mobile), Internet Explorer, and others.
But the problem is that it cannot be opened in the WeChat app (both Android and iOS). My access region is South Korea (not China, in case Chinese regulation is relevant).
(I am new to WeChat and I don't know anything about Chinese online regulation. But I am sure that my blog is not blocked, because I can access it in QQ Browser.)
Detail information
Imgur Image - Send URL to someone
I sent the URL to someone, as in the image above.
Imgur Image - No access
At first, it opened fine. But when I clicked the URL a second time, it didn't open, and showed the odd message in the image above.
Tips. https://aceshipping.github.io
For account security, do not enter any info related to WeChat password in the Internet.
Continue (button)
But the Continue button doesn't work, and my GitHub page doesn't ask for any private information anyway. (As you know, GitHub Pages has no login feature.)
Please help me. I need to open this in WeChat, without using other browsers.
TL;DR: use a .com/net/org/cn domain.
As of 2020/04, WeChat blocks links to public hosts like GitHub and Netlify by default, redirecting the page to https://weixin110.qq.com/cgi-bin/mmspamsupport-bin/newredirectconfirmcgi?main_type=1&evil_type=100&source=2&url=<your link here>&scene=1&devicetype=android-28&exportkey=...&pass_ticket=...&wechat_real_lang=(en or zh_CN or something). When the language of WeChat is Chinese (whether simplified or traditional), this is a normal redirection page saying "非微信官方网页,请确认是否继续访问。" ("not a WeChat official web page, please confirm to continue browsing."), but in other languages it redirects to a link like https://qbview.url.cn/getResourceInfo?appid=62&url=<your link here>&openid=<your WeChat account identifier>&version=10000&doview=1&platformtype=700. This seems to be a discontinued Tencent service that served the page with no JavaScript and reduced CSS ('safe browsing' / 'reader mode', as on redirection pages). It was removed at some point in 2019, and the endpoint now presents a bad SSL certificate and returns only 400 (Bad Request). It seems that the Chinese version of the weixin110 page was updated while the link used for other languages was left unchanged, which results in a broken link.
The HTML markup of those pages is listed here, if anyone is interested: en, zh_CN (the words in the brackets are translations added by me)

Website A 'redirect' to subdomain of website B, with content of website A

I was recently asked to do the following:
We have a website with Drupal running in IIS.
On that site is a URL redirect to a website hosted externally, obviously with a name completely unrelated to the name of our company.
The request is now the following:
They want to change the URL to a subdomain of our website. Example: from "www.external-site.com" to "www.sub.internal.com" (while still showing the content of the external website).
They want the current page of that website to be reflected in the URL bar. So it wouldn't say "www.sub.internal.com", but it would say "www.sub.internal.com/solutions/page1.html" (instead of "www.external-site.com/solutions/page1.html").
It's possible that I have forgotten another 'condition' that was mentioned to me earlier.
So, if someone visits the external website through our URL redirect, the URL needs to show our subdomain instead of their domain, AND it needs to show the current page as people browse, while still using our subdomain in the URL.
Now, I checked the external website, and it seems that most of its links are relative links (in case this is useful information).
Currently, the external website is hosted externally, and will remain so for the next few years. (I believe we bought the company.)
I have been asking around and looking things up, and the best option seems to be domain forwarding, but even that doesn't seem to satisfy everything they asked of me.
I am but a 'simple' .NET programmer, held responsible to do support for anything involving the websites, and I can't say I have extended knowledge about infrastructure. (But I can ask people to do this for me)
Is there anything that could solve this?
Thanks so much!
IIS's URL Rewrite and Application Request Routing (ARR) combination can help you achieve what you want. Here are a few links that may guide you in configuring ARR. Please note that these links don't show an exact solution to your problem, but you can take cues from them and build your solution accordingly.
http://www.iis.net/learn/extensions/url-rewrite-module/reverse-proxy-with-url-rewrite-v2-and-application-request-routing
http://www.iis.net/learn/extensions/url-rewrite-module/reverse-proxy-rule-template
It sounds like you'll want to use a full-page iframe: do not redirect but show a page with an "inner page" instead: that inner page is the external web site. That way, users do not see the external site in their URL bar.
http://webdesign.about.com/od/iframes/a/aaiframe.htm
You need to configure the equivalent of Apache Virtual Host with Reverse Proxy on IIS.
See these answers:
https://serverfault.com/a/271030
and
https://stackoverflow.com/a/10003306/2131693
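To illustrate the reverse-proxy idea suggested in the answers above, here is a minimal sketch of the URL mapping only, written in Python rather than as IIS configuration. It is not an ARR or URL Rewrite rule; the two hostnames are the examples from the question, and the function names are hypothetical. A real proxy also has to forward the request and relay the response, which the IIS modules handle.

```python
from urllib.parse import urlsplit, urlunsplit

PUBLIC_HOST = "www.sub.internal.com"     # hostname visitors see (example from the question)
UPSTREAM_HOST = "www.external-site.com"  # hostname that actually serves the content


def to_upstream(public_url):
    """Map the URL the visitor requested to the URL the proxy fetches."""
    parts = urlsplit(public_url)
    return urlunsplit((parts.scheme, UPSTREAM_HOST, parts.path, parts.query, parts.fragment))


def to_public(upstream_url):
    """Map an absolute upstream URL (e.g. a redirect target) back to the public host."""
    parts = urlsplit(upstream_url)
    if parts.netloc.lower() != UPSTREAM_HOST:
        return upstream_url  # leave URLs on other hosts untouched
    return urlunsplit((parts.scheme, PUBLIC_HOST, parts.path, parts.query, parts.fragment))


print(to_upstream("http://www.sub.internal.com/solutions/page1.html"))
print(to_public("http://www.external-site.com/solutions/page2.html"))
```

Because the path is carried over unchanged, the visitor keeps seeing the current page under the subdomain, and the relative links mentioned in the question resolve against the subdomain without any rewriting. Only absolute links and redirects that name the external host need the reverse mapping.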

If a page is not linked to the main website, can search engines find it?

I want to put a secret page in my website (www.mywebsite.com). The page URL is "www.mywebsite.com/mysecretpage".
If there is no clickable link to this secret page in the home page (www.mywebsite.com), can search engines still find it?
If you want to hide from a web crawler: http://www.robotstxt.org/robotstxt.html
A web crawler collects links and follows them. So if you're not linking to the page, and no one else is, the page won't be found by any search engine.
But you can't be sure that someone looking for your page won't find it. If you want to keep data secret, you should use a script of some kind to grant access only to those who are supposed to have it.
Here is a more useful link : http://www.seomoz.org/blog/12-ways-to-keep-your-content-hidden-from-the-search-engines
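As a small illustration of the robots.txt approach linked above, the sketch below uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would interpret a rule for the page in the question. The robots.txt content is a hypothetical example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mysecretpage
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.mywebsite.com/mysecretpage"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.mywebsite.com/"))
```

Note that robots.txt is only a request to crawlers and is itself publicly readable, so listing a path there also advertises it. It is not a substitute for access control.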
No. A web spider crawls based on links from previous pages. If no page links to it, a search engine won't be able to find it.

404 handler and dynamic pages that really don't exist... bad for SEO?

We have an IIS 404 ASP.NET handler that renders pages when an HTML page is not found. It uses the page's URL to query our databases and builds rich, relevant content on the fly. From what I can tell from the IIS logs and from analyzing the pages with web browser tools, there is NO indication that the page does not actually exist and was dynamically generated.
In these cases, is IIS actually sending a 404 to the client? Is there a redirect of any kind actually happening? Will search engines punish me for this?
It's been 2 months and Google has indexed everything, but Bing and Yahoo have not indexed anything dynamic, despite my submitting various directory pages, sitemaps and feeds with all my links. My home page is indexed on all search engines and has all my links. When I search for very unique keywords in those links, I can see that Bing and Yahoo do see them in my home page links, but only there.
Is there anything I can run or check to make sure my dynamic pages are not somehow viewed as bad by Search engines? Any way to check if a 404 (whatever a 404 actually is to a client besides just another page) is returned to crawlers?
Many Thanks.
Is there anything I can run or check to make sure my dynamic pages are
not somehow viewed as bad by Search engines?
Dynamic pages are just fine. Most of the content on the Internet is dynamically produced. The search engines don't care if content is dynamic and, in fact, they usually do not know content is dynamic, as all they see is the URL and the HTML that is produced by that URL.
Any way to check if a 404 (whatever a 404 actually is to a client
besides just another page) is returned to crawlers?
Use a tool like Firebug or the built-in developer tools in Chrome to view your HTTP headers. Crawlers see the same headers a browser would see, so that is an easy way to tell what headers and status codes your pages are sending out.
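The same check can also be scripted. The following is a minimal sketch using only Python's standard library; the function name `fetch_status` is hypothetical. It returns the HTTP status code the server sends, so a page that renders content but is served with a 404 status would show up as 404 here.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code the server sends for url."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still available.
        return err.code


# Example usage (replace with one of your dynamically generated URLs):
# print(fetch_status("http://www.example.com/some-dynamic-page.html"))
```

Keep in mind that `urlopen` follows redirects automatically, so this reports the status of the final response rather than of any intermediate redirect.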

Resources