Web: is the convention "//mywebsite.com/mypage" with "//" a standard? - protocols

Explanation for people who don't know: this syntax tells Firefox "use the same protocol as the one the current page uses". So if the page is served over https and the link (or image) is //mywebsite.com/myimage.png, Firefox will try to download it as https://mywebsite.com/myimage.png
(Edit my question if I'm wrong; I don't want to spread falsehoods.)
I'm wondering: is this a standard that all other web browsers understand, or is it something recent?
I'm sorry, but I can't find the right words when googling for it ("convention", "https", "//" and so on don't give good results)
Thank you!

RFC 3986, which defines the URI (Uniform Resource Identifier), the superset of URLs and URNs, is ambiguous on this point. Appendix A, which defines the syntax, does not show the scheme as optional, but section 5.3, which covers reconstructing a URI from its components, does show the scheme as optional.
That said, it's better for security purposes if you are explicit as to which scheme is used, to prevent the possibility of sensitive information being accidentally sent in the clear.
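The resolution rule itself is simple to state. Here is a minimal sketch in PHP (the helper function is mine, purely illustrative, not something from the RFC):

// Resolve a scheme-relative ("network-path") reference against the scheme
// of the current page, as described in RFC 3986 section 5.3.
function resolveSchemeRelative($reference, $currentScheme) {
    if (substr($reference, 0, 2) === '//') {
        return $currentScheme . ':' . $reference;
    }
    return $reference; // absolute or path-relative: leave unchanged
}

echo resolveSchemeRelative('//mywebsite.com/myimage.png', 'https');
// prints: https://mywebsite.com/myimage.png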

Tim Berners-Lee, the inventor of the web, invented these slashes as well, although they serve no practical purpose; he has even publicly apologized for them.

Related

Removing ids from url [duplicate]

Hey guys! Working on a new Cake app and wondering if there is any way for me to remove the ID-in-URL routing from Cake. Perhaps by passing the ID in POST somehow? Having the ID passed in as a URL param just seems really shoddy and unsafe. Thanks!
"Shoddy"? It's standard practice and a perfectly fine solution to have ids in the URL. Look at the URL of your question:
http://stackoverflow.com/questions/4638262/removing-id-from-cakephp-url
                                   ^^^^^^^
                                   id
Also, there's absolutely nothing unsafe about showing an id in a URL. It's just a number that doesn't mean anything. If a user can do something "bad" only by knowing this id, your app is broken and insecure, not the id-passing mechanism.
Trying to work around this scheme means working against a fundamental principle of the HTTP protocol and opens up a whole new can of worms.
Some people prefer using slugs instead of primary key ids. This is the removing-id-from-cakephp-url part of the URL from this page. Take a look at the SluggableBehavior.
However, slugs can change. Hence, having the primary key in your URL is useful if you want to have a permalink. StackOverflow does both so that it can support both permalinking from other sites, as well as for SEO reasons. :)
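To illustrate the permalink point: the id, not the slug, is what gets looked up, so old links keep working after a title change. A minimal PHP sketch (the regex is an assumption modeled on the StackOverflow URL above, not actual CakePHP routing code):

// Extract the numeric id from a StackOverflow-style URL; the slug after it
// can change freely without breaking the permalink.
$url = 'http://stackoverflow.com/questions/4638262/removing-id-from-cakephp-url';
if (preg_match('#/questions/(\d+)#', $url, $matches)) {
    $id = (int) $matches[1]; // 4638262 -- the only part the router needs
}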
Regarding security issues, I guess the other answers have already pointed out that there are other ways to make your application secure.
Why do you care? URLs are optimized for SEO reasons, and an ID won't matter as long as it isn't too long. If it is, consider using a shorter one made of numbers and letters instead; it will be as difficult to guess as a longer one with just numbers.
If you are not using GET and you do not supply the params in the URL, your users won't be able to copy-paste the location.

Google docs viewer url parameters

Is there any sort of documentation on exactly what parameters you can put in the url of Google viewer?
Originally, I thought it was just url, embedded, and chrome, but I've recently come across other funny ones like a, pagenumber, and a few others for authentication etc.
Any clues?
One I know is "chrome":
If you've got https://docs.google.com/viewer?........;chrome=true
then you see a fairly heavy UI version of that doc; with "chrome=false" you get a compact version.
But indeed, I'd like a complete list myself!
I know this question is very old and perhaps you already solved your issue, but for anyone on the internet who might be looking for an answer...
I have been looking for this recently, following a guide I found on GitHub Gist
https://gist.github.com/tzmartin/1cf85dc3d975f94cfddc04bc0dd399be
More specifically, the option to embed a certain page of a PDF using
<iframe src="https://docs.google.com/viewer?srcid=[put your file id here]&pid=explorer&efh=false&a=v&chrome=false&embedded=true" width="580px" height="480px"></iframe>
The best I could find was this article (from quite a while ago, I suppose):
https://weekly-geekly.github.io/articles/111647/index.html
HOWEVER, I tried modifying the attributes and the result was simply a redirect to
https://drive.google.com/file/d/[ID]/edit
https://drive.google.com/file/d/[ID]/preview or
https://drive.google.com/file/d/[ID]/view
AS OF MAY 2020, THIS SOLUTION PROBABLY DOESN'T WORK
I'm also on a quest to discover some of the parameters of the viewer. The "chrome" parameter doesn't seem to do anything, though. Is this supposed to be the same as embedded=true?
Parameters I know of:
url= (obviously)
embedded= (obviously)
hl= set language of UI (tooltips)
#:0.page.1 = jump to page 2 (page 1 is numbered 0) - this is unreliable and often requires a refresh after the first load, defeating the purpose.
That said, when I use the Google Docs viewer on my site, "fit page to screen" is the default view without any parameters. So maybe I'm misunderstanding your question.
Source: For convenience, this is a full quote of the sole answer (from user k3david) to the crosspost of this question @Doc posted to the Google support forum in 2011.
You can pass q=whatever to pass a search query to the viewer.
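Putting the parameters above together, a minimal PHP sketch for building a viewer URL (the values are examples; only url= and embedded= are clearly documented, the rest are observed behavior):

// Build a Google Docs viewer URL from the parameters listed above.
$query = http_build_query([
    'url'      => 'http://example.com/some.pdf', // document to display
    'embedded' => 'true',                        // compact, embeddable UI
    'hl'       => 'en',                          // UI/tooltip language
]);
echo 'https://docs.google.com/viewer?' . $query;
// https://docs.google.com/viewer?url=http%3A%2F%2Fexample.com%2Fsome.pdf&embedded=true&hl=en

http_build_query also takes care of percent-encoding the url= value, which is easy to get wrong by hand.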

Is my site safe from XSS if I replace all '<' with '&lt;'?

I'm wondering what the bare minimum to make a site safe from XSS is.
If I simply replace < with &lt; in all user submitted content, will my site be safe from XSS?
Depends hugely on context.
Also, encoding only the less-than sign isn't that flash an idea. You should encode all characters which have special meaning and could be used for XSS...
<
>
"
'
&
A trivial example of where encoding only the less-than sign won't matter is something like this...
Welcome to Dodgy Site. Please link to your homepage.
Malicious user enters...
http://www.example.com" onclick="window.location = 'http://nasty.com'; return false;
Which obviously becomes...
<a href="http://www.example.com" onclick="window.location = 'http://nasty.com'; return false;">View user's website</a>
Had you encoded double quotes, that attack would not be valid.
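In PHP terms, encoding all five characters is one call: htmlspecialchars with ENT_QUOTES. A minimal sketch, assuming the submitted URL is echoed into an href attribute as above:

// The malicious "homepage" from above, exactly as submitted.
$userUrl = 'http://www.example.com" onclick="window.location = \'http://nasty.com\'; return false;';

// ENT_QUOTES encodes <, >, &, and both double and single quotes.
$safe = htmlspecialchars($userUrl, ENT_QUOTES, 'UTF-8');
echo '<a href="' . $safe . '">View user\'s website</a>';
// The injected onclick is now inert text inside the href attribute.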
There are also cases where the encoding of the page counts. I.e., if your page character set is not correct or does not match in all applicable spots, then there are potential vulnerabilities. See http://openmya.hacker.jp/hasegawa/security/utf7cs.html for details.
No. You have to escape all user input, regardless of what it contains.
Depending on the framework you are using, many now have an input validation module. A key point I tell software students when I lecture is: USE THE INPUT VALIDATION MODULES WHICH ALREADY EXIST!
Reinventing the wheel is often less effective than using the tried and tested modules which already exist. .NET has most of what you might need built in: it's really easy to whitelist (where you know the only input allowed) or blacklist (a bit less effective, as known 'bad' things always change, but still valuable).
If you escape all user input you should be safe.
That means EVERYTHING, EVERYWHERE it shows up. Even a username on a profile.

Safe or impractical to use UTF-8 page names or other text? - User submitted text!

I am working on a site that has an international aim; in other words, logged-in users can add text in their own language. I am hoping for international page names and content.
A URL example, like the Japanese Wikipedia: http://ja.wikipedia.org/wiki/メインページ (both page name and content text).
I know that by using UTF-8 I can do this, but how should I control it?
UTF-8 contains way too many languages/letters to control in a script, I guess, so how safe/unsafe is it to allow people to add UTF-8 text?
I can see that someone could add harmful code this way, but how do I prevent it?
All information regarding safety/control when using UTF-8 is appreciated!
EDIT: PS! I use PHP and MySQL.
Warning: perhaps a slightly rusty response:
Note: not discussing host name (IDN) issues.
The only completely safe thing here is to use %-escaped UTF-8. Some browsers will display this as what you want, and some will display the %-escapes. (e.g. http://foo.bar/%ee%cc%cf.html)
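Since the question mentions PHP: rawurlencode produces exactly this %-escaped form for a path segment. A minimal sketch:

// Percent-escape a UTF-8 page name for use in a URL path segment.
$pageName = 'メインページ'; // UTF-8 page title, as on ja.wikipedia.org
echo '/wiki/' . rawurlencode($pageName);
// prints: /wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8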
If you put 'real UTF-8' in the URLs, many things will work, but there may be unpleasant surprises lurking for some people in some browsers. I'm reading your question as dealing with 100% static content. If you are trying to do this with code behind the site, you have additional issues to work on.
The 'unpleasant surprises' would be (a) people finding the %xx's in the URL unreadable, (b) a browser that melts, (c) some data scraping or aggregating application that melts.
I wish I were more up to date on this, but I'm not, so my recommendation is to deploy a test site and then try to access it with everything you can put your hands on, including mobile phones. Persuade Google to index it, and see what happens there.
For domain names, this is called IDN. For page names, you may want to think of the possibility of IDN spoofs.
It's safe as long as you don't interpret it literally as SQL (SQL injection) or HTML (XSS) or any other language. Just escape any user-controlled input (request URL, request headers, request parameters, request body, etc.) at the point where it's going to be used in SQL or HTML.
It's unclear what server side programming language you're using, so I can't go further in detail.
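The question's edit does say PHP and MySQL, so for the SQL side here is a minimal sketch with PDO prepared statements (the DSN and credentials are placeholders, not from the question):

// Parameterized query: user-supplied UTF-8 text is bound as data, so it
// can never be interpreted as SQL.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'secret');
$stmt = $pdo->prepare('SELECT body FROM pages WHERE title = ?');
$stmt->execute([$pageName]); // $pageName may contain any UTF-8 text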

How to get a description of a URL

I have a list of URLs and am trying to collect their "descriptions." By description I mean what comes up if you Google the link. For example, Googling http://stackoverflow.com shows the description as
A language-independent collaboratively edited question and answer site for programmers. Questions and answers displayed by user votes and tags.
This is the data I'm trying to accumulate for the URLs I have.
I tried parsing the URL's meta-descriptions, however most of them are lacking a meta-description (yet Google and other search engines manage to get a description somehow).
Any ideas? Should I just "google" each link and scrape the data? I have a feeling Google wouldn't like this...
Thanks guys.
Different search engines have different algorithms to extract the description from the page if/when it lacks the description meta tag. Some ignore the tag even if it's there.
If you want the description Google has, the most accurate way to get it would be to scrape it. Otherwise, you could write your own or look around on the web for code that does it.
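If you do write your own, the meta-description part is straightforward. A minimal PHP sketch (no error handling; real code should check the HTTP response and encoding):

// Fetch a page and pull out its meta description, if present.
$html = file_get_contents('http://stackoverflow.com');
$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings from imperfect real-world HTML
foreach ($doc->getElementsByTagName('meta') as $meta) {
    if (strtolower($meta->getAttribute('name')) === 'description') {
        echo $meta->getAttribute('content');
        break;
    }
}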
These are called snippets.
Google uses proprietary (and possibly patented) methods to garner this information, so there is no simple answer.
As you suggest, they will use meta-description information if it is there. (How to set the meta-information to help Google.)
They will also honour requests from the page authors to NOT include snippets. (How to prevent Google from displaying snippets) You should probably respect this too (as well as robots.txt, of course.)
You may have some luck with existing auto-summary packages, such as OTS.
You may want to check AboutUs.org (i.e. http://www.aboutus.org/StackOverflow.com).
But, there's little chance that the site will have an aboutus page and not have a meta description.
Some info that might explain how Google does this:
Webmasters/Site owners Help
Adding a URL to google
I am not familiar with Google APIs, but perhaps there is an official way to get such information.
Interesting. Some sources are better than others. For "audiotuts.com", Google has a worse description than AboutUs.com.
Google:
Nov 18th in General by Joel Falconer · 1. Recently, an AUDIOTUTS reader asked me about creative process. While this is a topic that can’t be made into a ...
AboutUs.com:
AUDIOTUTS is a blog/tutorial site for musicians, producers and audio junkies! It is the sister site of the popular PSDTUTS, VECTORTUTS and NETTUTS.
I hate problems like these... they should be trivial but they aren't!
If you can assume English content, you can first look for Meta Description, and if that doesn't work, you can look for the first two or three sentence-like word sequences.
A product I worked on looked for the first P or DIV that contained more than one sequence of > n "words" delimited by periods. It would use the two or three sentence-like sequences, up to x total words, as a summary paragraph. It wasn't 100% accurate, but good enough for the average case. The number of words was adjusted a few times to eliminate things like navigation elements.
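A rough PHP sketch of that heuristic (the thresholds are illustrative, and the product described above was not necessarily PHP):

// Return the text of the first <p> or <div> that contains at least two
// period-delimited "sentences" of more than $minWords words each.
function firstSummaryParagraph(DOMDocument $doc, $minWords = 4) {
    foreach (['p', 'div'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $sentences = array_filter(
                explode('.', $node->textContent),
                function ($s) use ($minWords) {
                    return str_word_count($s) > $minWords;
                }
            );
            if (count($sentences) >= 2) {
                return trim($node->textContent);
            }
        }
    }
    return null; // no suitable paragraph found
}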
