First, I don't know whether this is the right place to ask this question. When we open certain sites, submit a form, or log in to a site, some seemingly encrypted text is appended to the URL in the address bar as a query string, but I have no idea whether it is a session ID or something else.
And if it is a session ID, is it a good approach to disclose the session ID?
For example: https://www.google.co.in/?gfe_rd=cr&ei=q1HiWI2kLO3s8Ae3raXwCQ
https://my.naukri.com/Inbox/viewRecruiterMails?id=d786bc1c09837cc9ca692d042c01186294584fccc83209d4fe409a9be01b6ec61edd7a843282321a
I mean the string after ei= in the first example and after id= in the second one.
What you are seeing is just encoded binary values (byte values). The first instance seems to be using Base64 for the encoding (probably the URL-safe variant of it) and the second one uses hexadecimal encoding of the bytes.
What the data means (possibly after decoding) depends on the protocol defined by the site. There aren't any specific rules. IDs generally contain about 128 bits of randomness, though.
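As a rough illustration of the two encodings, here is a minimal Python sketch that decodes the two example values from the question. It assumes the first value really is URL-safe Base64 with its padding stripped (the site does not document this, so treat it as a guess); the helper name `decode_b64url` is made up for this example.

```python
import base64

def decode_b64url(text: str) -> bytes:
    """Decode URL-safe Base64, restoring any '=' padding that was stripped."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

# The two values copied from the URLs in the question.
ei = "q1HiWI2kLO3s8Ae3raXwCQ"
msg_id = ("d786bc1c09837cc9ca692d042c01186294584fccc83209d4"
          "fe409a9be01b6ec61edd7a843282321a")

ei_bytes = decode_b64url(ei)        # Base64 -> raw bytes
id_bytes = bytes.fromhex(msg_id)    # hexadecimal -> raw bytes

print(len(ei_bytes), ei_bytes.hex())
print(len(id_bytes))
```

The first value comes out as 16 bytes (128 bits) and the second as 40 bytes. The bytes themselves stay opaque: decoding only removes the transport encoding, it does not tell you what the site uses them for.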
I have a screen sharing application in which all sensitive data is masked. There are some scenarios in which the customer types sensitive data, such as the SSN, into a non-masked field, thereby directly compromising our solution.
Is there a way I can detect that a person is typing an SSN or any other sensitive data, without accessing the DB -- or any other server-side information -- just on the client side?
For instance, consider a form with SSN and address fields. How can I avoid displaying an SSN mis-entered in the Address field?
SSN:
Address: 888-9999-0988
EDIT
My approach as of now is to store all patterns in a property file on the client side. For example, a typical password in my application is at least 8 characters long and follows this rule:
^(?=.*[A-Za-z])(?=.*\d)(?=.*[$#$!%*#?&])[A-Za-z\d$#$!%*#?&]{8,}$
I will have this regex stored on client side. Once the user starts typing, I will check whether he is typing a password or a username.
Similarly, I can break a phone number into country - area code format and can have a regex for that also.
But since an SSN is a purely random 9-digit number, I am stuck.
By extracting field names from the DOM, I can get the exact location of the cursor and the name of the field in which the user is typing. And from the data format each field supports, I can get a rough idea of whether the user is typing the required data or something else. Please correct me if I am wrong.
September 14th, 2016
Since my screen sharing solution runs in a browser, it will be used as a pluggable component by the websites. So it won't be easy to ask them to restructure their pages.
Below is the approach I have finalized based on your inputs.
I will have one file with all patterns stored in it.
1. Credit-related info all has definite patterns, so it won't be any problem.
2. Passwords also have some rules or patterns, so these can also be checked.
To avoid data exposure, I will show a generic message while the user is typing, the way WhatsApp shows "User is typing". Only once the user tabs out or moves to the next field will the data be shown, after inline pattern validation.
The only thing left is how to detect fields like SSN, phone number, etc., which do not have any distinctive pattern. I hope we can find some solution for this also.
Regards
Harry
In short, you can't.
How do you differentiate the sensitive data by value and format?
For instance, how do you differentiate SSN 212-73-6500 from NYC telephone number 212-736-5000 in time to blank out sensitive information? Similarly, how do you detect that the user is typing a password in the address field, before displaying any characters of the password?
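The ambiguity can be made concrete with a small Python sketch. The two regular expressions are simplified stand-ins I made up for this illustration (real SSN and phone validation rules are stricter); the point is only that the keystrokes are identical until the tenth digit arrives.

```python
import re

# Simplified, hypothetical patterns: 9 digits for an SSN, 10 for a US phone number.
SSN = re.compile(r"\d{3}-?\d{2}-?\d{4}")
PHONE = re.compile(r"\d{3}-?\d{3}-?\d{4}")

def digits_only(text: str) -> str:
    """Strip everything except digits, as a keystroke filter might."""
    return re.sub(r"\D", "", text)

ssn_digits = digits_only("212-73-6500")
phone_digits = digits_only("212-736-5000")

# After nine keystrokes, the phone number looks exactly like a complete SSN.
typed_so_far = phone_digits[:9]
print(typed_so_far == ssn_digits)            # same nine digits
print(bool(SSN.fullmatch(typed_so_far)))     # already a "valid" SSN
print(bool(PHONE.fullmatch(typed_so_far)))   # not yet a complete phone number
```

A client-side checker that reacts per keystroke would therefore have to either flag every phone number as a possible SSN or let every SSN through for its first nine digits.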
You say that you cannot access the database to match entered data against already known values. Unless you can dictate some differentiation to the user before sensitive data gets entered, you have no way under information theory to detect the mistake. You can't differentiate a password from a name or other alphanumeric input.
I see one possibility: treat all data as sensitive until cleared for display. Each input field gets masked until you verify that it cannot contain sensitive information.
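A minimal sketch of that "masked until cleared" idea, in Python for brevity (a browser component would do the same thing in its own language). The function name, the `field_committed` flag, and the single SSN-like pattern are all assumptions for illustration, not part of any existing product.

```python
import re

# Hypothetical list of patterns considered sensitive; only an SSN-like one here.
SENSITIVE_PATTERNS = [re.compile(r"\d{3}-?\d{2}-?\d{4}")]

def text_to_share(value: str, field_committed: bool) -> str:
    """Return what the remote viewer may see for one input field."""
    masked = "*" * len(value)
    if not field_committed:
        # Still being typed: nothing has been cleared yet, so show nothing.
        return masked
    if any(p.search(value) for p in SENSITIVE_PATTERNS):
        # Committed, but it looks sensitive: keep it masked.
        return masked
    return value

print(text_to_share("12 Main St", field_committed=False))
print(text_to_share("12 Main St", field_committed=True))
print(text_to_share("212-73-6500", field_committed=True))
```

Note that this only narrows the exposure window; it still cannot tell a password from an ordinary word once the field is committed.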
Unfortunately, I don't know that you can do this strictly from the client side. Again, how do you differentiate a mis-typed SSN (left out or doubled a digit) from a phone number? How can you tell an alphabetic address from a password?
I believe that, to solve this, you will need to impose some restrictions on the sensitive input you request. Passwords cannot look anything like any other field. You separate confusable fields by as much as possible on the page ... and some of this may be problematic with the smooth flow of entering information.
EDIT RESPONSE
That's correct: you can set up a family of recognition rules and differentiate various types of input. However, you still have some innate problems in that you cannot differentiate some types of input until it's too late, such as the phone-or-SSN ambiguity above, or telling a password from a name or address (some addresses start with pure alphabetic characters).
Can you fulfill security requirements with severe partitioning? For instance, put all of the sensitive information on one page, and all of the displayed information on another. I'm looking for a way to force the user to know that certain information is sensitive.
Also, do you have to configure this for international use? Security requirements differ across cultures and nations. Keep that in mind when you design this application.
Okay, so today I was thinking about Minecraft, a game I'm sure many of you are familiar with. While my question isn't directly related to the game, I find it much simpler to describe my question using the game as an example.
My question is, is there any way a type of "seed" or string of characters can be used to recreate an instance of a program (not in the literal programming sense) by storing a code which when re-entered into this program as a string at run-time, could recreate the data it once held again, in fields, text boxes, canvases, for example, exactly as it was.
As I understand it, Minecraft takes the string of ASCII characters you enter, all of which are really numbers, and performs a series of operations on it that evaluate to some kind of finite hash or number. This number (again, as I understand it) is the representation of the string you entered. That makes sense, because a string parsed by this algorithm will always evaluate to the same hash: 1 + 1 always equals 2, so a seed's value must always come out the same in the end. In doing so you can replicate worlds exactly by entering this sort of key, which is evaluated the same way on every machine.
Now, if we can exactly replicate worlds like this, is it possible to bring it into a more abstract concept like the following?
Say you have an application like Microsoft Word. Word saves the data you have entered as a file on your hard drive; it holds the formatting data, the strings you've entered, the format of the file... all of that in a physical file. Now imagine that when you have entered your essay into Word, instead of saving it and bringing your laptop to school, you click on "parse" and, instead of a file, you are given a hash code. Now you go to school knowing you have to print it, so you log onto the computer and open Word. Instead of "open" there is now an option called "evaluate"; you click it, enter the hash your other computer formulated, and it recreates the exact essay you wrote.
Is this possible, and if so, are there obvious implementations of this that I'm simply not thinking of, or that are so much a part of everyday life that I don't recognize them? And finally, if it is possible, what methods and algorithms would go into such a thing?
[EDIT]
I had to do some research on the anatomy of a seed and I think this explains it well
"The limit is 32 characters or, for a numeric seed, 19 digits plus the minus sign. Numeric seeds can range from -9223372036854775808 to 9223372036854775807, which is a total of 18446744073709551616. Text strings entered will be "hashed" to one of the numeric seeds in the above range. The "Seed for the World Generator" window only allows 32 characters to be entered and will not show or use any more than that."
BUT looking back on it, lossless compression IS EXACTLY what I was describing. After re-reading the wiki page I remembered that (you are very correct) the seed only takes part in the generation; the final data is stored as a "physical" file on the HDD, which (again, you are correct) is raw uncompressed data in a file.
So in retrospect, I believe I was describing lossless compression, trying in my mind to figure out how the seed was able to replicate the exact same world, forgetting the seed was only responsible for generating the code, not the saving or compression of it.
So thank you for your help guys! It's really appreciated I believe we can call this one solved!
There are several possibilities to achieve this "string" that recovers your data. However they're not all applicable depending on the context.
An actual seed, which initializes for example a pseudo-random number generator and then allows you to recreate the same sequence of pseudo-random numbers (see this question).
This is possibly similar to what Minecraft relies on, because the whole process of how to create a world based on some choices (possibly pseudo-random choices) is known in advance. Even if we pretend that we have random numbers, computers are actually deterministic, which makes this possible.
If your document were generated randomly then this would be applicable: with the same seed, the same gibberish comes out.
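To illustrate the seed idea, here is a small Python sketch using the standard `random` module. The `generate_world` function is a toy I made up (it just produces a list of digits) and has nothing to do with how Minecraft actually builds terrain; it only shows that a deterministic generator fed the same seed reproduces the same output.

```python
import random

def generate_world(seed: int, size: int = 8) -> list:
    """Toy 'world': a list of pseudo-random digits derived only from the seed."""
    rng = random.Random(seed)          # private generator, initialized by the seed
    return [rng.randint(0, 9) for _ in range(size)]

first = generate_world(12345)
second = generate_world(12345)
print(first)
print(first == second)   # same seed, same "world"
```

The seed works here only because the generation procedure is fixed and known to both sides; it cannot reproduce content that a person typed in.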
Some key-value dictionary, or hash map. Then the values have to be accessible by both sides, and the string is the key that allows you to retrieve the value.
Think for example of storing your word file on an online server, then your key is the URL linking to your file.
Compressing all the information that is in your data into the string. This is much harder, and there are strong limits due to the entropy of the data. See Shannon's source coding theorem for example.
You would be better off (as in, it would be easier) to just compress your file with a usual algorithm (zip or 7z or something else), rather than reimplementing it yourself, especially as soon as your document starts having fancy things (different styles, tables, pictures, unusual characters...)
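As a sketch of the compression route, the following Python snippet packs a text into a printable string with the standard `zlib` and `base64` modules and restores it exactly. The function names `to_token` and `from_token` are invented for this example, and the sample essay is obviously artificial.

```python
import base64
import zlib

def to_token(text: str) -> str:
    """Compress the text losslessly and return it as a printable string."""
    packed = zlib.compress(text.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(packed).decode("ascii")

def from_token(token: str) -> str:
    """Invert to_token: decode the string and decompress the original text."""
    packed = base64.urlsafe_b64decode(token.encode("ascii"))
    return zlib.decompress(packed).decode("utf-8")

essay = "This is my essay. " * 50
token = to_token(essay)
print(len(essay), len(token))
print(from_token(token) == essay)
```

The token shrinks a lot here only because the sample text is highly repetitive; for ordinary prose the string stays a sizeable fraction of the original length, which is the limit the next paragraphs describe.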
With the simple hypothesis of 27 possible characters (26 letters and the space), Shannon himself shows in Prediction and Entropy of Printed English (Bell System Technical Journal, 30(1), January 1951, pp. 50-64, online version) that there is about 2.14 bits of entropy per letter in English. At 8 bits per character, your 32-character string holds 256 bits, so that's only about 120 characters of English encoded in it.
While this is significantly better than the 8 bits we use for each ASCII character, it also shows it is very likely to be impossible to encode a document in English in less than a fourth of its size. Then you'd still have to add punctuation, and all the rest of the fuss.
I have a client with a website that looks as if it has been hacked. Random pages throughout the site will (seemingly at random) automatically forward to a youtube video. This happens for a while (not sure how long yet... still trying to figure that out) and then the redirect disappears. May have something to do with our site caching though. Regardless, the client isn't happy about it.
I'm searching the code base (this is a Wordpress site, but this question was generic enough that I put it here instead of in the Wordpress groups...) for "base64_decode" but not having any luck.
So, since I know the specific url that the site is getting forwarded to every time, I thought I'd search for the video id that is in the youtube url. This method could also be pertinent when the hack-inserted base64'd string is defined to a variable and then that variable is decoded (so a grep for "base64_decode" wouldn't necessarily come up with any answers that looked suspicious).
So, what I'm wondering is if there's a way to search for a substring of a string that has been base64'd and then inserted into the code. Like, take the substring I'm searching for, base64 it, and then search the code base for the resultant string. (Maybe after manipulating it slightly?)
Is there a way to do that? Is that method even valid? I don't really have any idea how the whole base64 algorithm works, or if this is possible, so I thought I'd quickly throw the question out here to see if anyone else did.
Nothing to it (for somebody with the chutzpah to call himself "Programmer Dan").
Well, maybe a little. You have to know the encoding for the values 0 to 63.
In general, encoding to Base64 is done by taking three 8-bit characters of plain text at a time, breaking those bits into four sets of 6-bit numbers, and creating four characters of encoded text by converting the numbers (0 to 63) to arbitrary characters. Actually, the encoded characters aren't completely arbitrary, as they must be acceptable to pretty much ANY method of transmission, since that's the original reason for using Base64 encoding. The system I usually work with uses {A..Z,a..z,0..9,+,/} in that order.
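The three-bytes-to-four-characters step can be written out by hand. This is a minimal Python sketch for one full 3-byte group only (no padding handling), using the standard alphabet in the order given above; `encode_group` is a name chosen for this example.

```python
import base64

# Standard alphabet: value 0 maps to 'A', value 63 maps to '/'.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three bytes (24 bits) as four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles complete 3-byte groups")
    bits = int.from_bytes(three_bytes, "big")          # 24-bit integer
    # Slice the 24 bits into four 6-bit numbers, most significant first.
    return "".join(ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))

print(encode_group(b"Man"))
print(encode_group(b"Man") == base64.b64encode(b"Man").decode("ascii"))
```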
If one wanted to be nasty (which one might expect in the case you're dealing with), one might change the order, or even the characters, during the process. Of course, if you have examples of the encoded Base64, you can see what the character set is (unless the encoding uses more than 64 characters). But you still have the possibility of things like changing the order as you encode or decode (simple rotation, for example). But, I digress. The question is about searching for encoded text, not deciphering deliberate obfuscation. I could tell you a lot about that, too.
Simple methodology:
1. Encode the plain text you're looking for. If the encoding results in one or two equal signs (padding) at the end, eliminate them and the last encoded character that precedes them. Search for the result.
2. Same as (1) except stick a blank on the front of your plain text. Eliminate the first two encoded characters. Search for the result.
3. Same as (2) except with two blanks on the front. This time, eliminate the first three encoded characters, since the third one still mixes in bits from the second blank. Search for the result.
These three searches will find all files containing the encoding of the plain text you're looking for.
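Here is one way the three searches might be generated, as a hedged Python sketch rather than a tested tool. It assumes the standard Base64 alphabet and no deliberate obfuscation. Note that with two filler bytes in front, this sketch drops the first three encoded characters, because the third still carries bits of the filler. The function name and the sample needle are made up.

```python
import base64

def base64_search_terms(plain: bytes) -> list:
    """Return three strings; one of them appears in any standard Base64
    encoding of data that contains `plain`, whatever its byte alignment."""
    leading_chars_to_drop = (0, 2, 3)   # per 0, 1 or 2 filler bytes in front
    terms = []
    for offset in range(3):
        encoded = base64.b64encode(b" " * offset + plain).decode("ascii")
        if encoded.endswith("="):
            # Drop the padding and the last character, which mixes in zero bits.
            encoded = encoded.rstrip("=")[:-1]
        terms.append(encoded[leading_chars_to_drop[offset]:])
    return terms

needle = b"watch?v=EXAMPLE_ID"          # hypothetical fragment of the redirect URL
for term in base64_search_terms(needle):
    print(term)                          # feed each of these to grep
```

Each printed string can then be searched for with an ordinary fixed-string grep across the code base. Short needles give short, less selective search terms, so a longer fragment of the URL works better.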
This is all “air code”, meaning off the top of my head, at best. Some might suggest I pulled it out of somewhere else. I can think of three possible problems with this algorithm, excluding any issues of efficiency. But, that’s what you get at this price.
Let me know if you want the working version. Or send me yours. Good luck.
Cplusman
I have a Node app that collects the user's address and sends it to the Yahoo PlaceFinder api (which returns the user's geolocation).
The user should fill out the "address" field and click submit.
Many different address formats are acceptable, such as:
90210 (just zip)
123 fake street, beverly hills, CA 90210
123 fake street, 90210
... etc
I'm not concerned if the user enters a valid address or not. I don't even want to think about what RegEx would be needed for that.
I am concerned about security.
What steps (if any) should I take to sanitize the user's input before my Node app passes it in an http.get() request to the Yahoo PlaceFinder API?
Let me give a fairly pragmatic answer beyond the usual "sanitise your untrusted data against a white list of acceptable values" response. When you say "collects the user's address", are you actually either:
Passing it to a location such as a database where you might be worried about injection attacks
Rendering it to a page where you might be worried about an XSS attack
If you're simply taking the input and passing it off to Yahoo, then the problem is more theirs than yours as you're not exposing yourself to either of the above attack vectors. You might find that if this is the case, there's not a lot of point in going down the sanitisation path which will inevitably be very difficult against something as variable as a free-form address.
I dare say the simplest method would be to apply a regex to their input, allowing only alphanumeric characters and spaces, along with perhaps a comma and a period. I'm not sure if that would allow all valid addresses through, but if it fails you could display an error message to the effect of "only use [A-Za-z0-9 ,.]" (note that the shorter range A-z would also let through the punctuation characters that sit between Z and a in ASCII).
That should, in theory, mitigate most types of exploits you would see, as they would most likely need some form of control character to break your code. Barring an overflow of some sort, I'm pretty sure commas and periods are relatively harmless given your situation.
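A minimal sketch of that whitelist check, written in Python for brevity (the same regex works in Node). The endpoint `https://geocoder.example/lookup`, the parameter name `q`, and the 200-character limit are placeholders I invented; they are not the PlaceFinder API. The sketch also URL-encodes the value, which is worth doing regardless of the whitelist.

```python
import re
from urllib.parse import urlencode

# Letters, digits, spaces, commas and periods only; the length cap is arbitrary.
ALLOWED = re.compile(r"[A-Za-z0-9 ,.]{1,200}")

def build_lookup_url(address: str,
                     base_url: str = "https://geocoder.example/lookup") -> str:
    """Validate the address against the whitelist and build the request URL."""
    address = address.strip()
    if not ALLOWED.fullmatch(address):
        raise ValueError("only use letters, digits, spaces, commas and periods")
    # urlencode percent-encodes the value so it cannot alter the query string.
    return base_url + "?" + urlencode({"q": address})

print(build_lookup_url("123 fake street, beverly hills, CA 90210"))
```

The length cap addresses the overflow concern in a crude way; a legitimate address with an apostrophe, hyphen, or non-ASCII letter would be rejected by this pattern, which is the trade-off mentioned above.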
I don't get Base64 encryption.
If anyone can decrypt a Base64 string, what is its purpose?
Why is it being used for HTTP Basic auth?
It's like telling someone my password is reversed into OLLEH.
People seeing OLLEH will know the original password was HELLO.
Base64 is not encryption -- it's an encoding. It's a way of representing binary data using only printable (text) characters.
See this paragraph from the Wikipedia page for HTTP Basic Authentication:
While encoding the user name and password with the Base64 algorithm typically makes them unreadable by the naked eye, they are as easily decoded as they are encoded. Security is not the intent of the encoding step. Rather, the intent of the encoding is to encode non-HTTP-compatible characters that may be in the user name or password into those that are HTTP-compatible.
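To make the point concrete, here is a short Python sketch that builds a Basic auth header value and then reverses it without any key. Encoding the credentials as UTF-8 is an assumption on my part (servers may expect a different charset), and the function names are invented for the example.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an 'Authorization: Basic ...' header."""
    credentials = f"{username}:{password}".encode("utf-8")   # charset is an assumption
    return "Basic " + base64.b64encode(credentials).decode("ascii")

def decode_basic_auth(header_value: str) -> tuple:
    """Recover (username, password) from the header value; no secret needed."""
    _scheme, _, token = header_value.partition(" ")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic_auth(header))
```

Anyone who can see the header can run the second function, which is why Basic auth is only reasonable over an encrypted connection such as HTTPS.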
It's normally called base64 encoding, not encryption! The nice thing about base64 encoding is it allows you to represent (binary) data using only a limited, common-subset of the available characters, far more efficiently than just writing a string of 1s and 0s as ASCII for example.
Encryption requires a key (string or algorithm) in order to decrypt; hence the "crypt" (root:cryptography)
Encoding modifies/shifts/changes a character code into another. In this case, usual bytes of data can now be easily represented and transported using HTTP.
Base-64 encoding is part of the MIME specifications. It provides a transport-safe encoding for data that won't get chewed on if/when it gets relayed through a host that uses a different encoding scheme than that used by the original client.
There are lots of different hosts out on the intertubes and you can't really assume support for anything other than 7-bit ASCII, without risking data loss/confusion.
IBM mainframes, for instance, use an encoding called EBCDIC (which comes in lots of different flavors). Its code points are completely different from the code points used by ASCII-based 'puters -- in ASCII, the letters A-Z are 0x41 - 0x5A; in EBCDIC the letters A - Z aren't even a contiguous range: the letters A-I live at 0xC1 - 0xC9, the letters J-R live at 0xD1 - 0xD9 and the letters S-Z live at 0xE2 - 0xE9.
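The difference is easy to see from Python, which ships a codec for one EBCDIC flavor, `cp037`. This small sketch just prints the byte value of a few letters under both encodings; choosing `cp037` as the representative flavor is my assumption, and other EBCDIC code pages can differ for some characters.

```python
def code_points(letter: str) -> tuple:
    """Return (ASCII byte, EBCDIC cp037 byte) for a single letter."""
    return letter.encode("ascii")[0], letter.encode("cp037")[0]

for letter in "AIJRSZ":
    ascii_byte, ebcdic_byte = code_points(letter)
    print(letter, hex(ascii_byte), hex(ebcdic_byte))
```

Because Base64 output uses only letters, digits, and a couple of punctuation marks, a relay that translates text between such character sets can convert it character by character without corrupting the underlying binary data.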
You might mean "Base 64 Encoding". Encryption is not the same as encoding.
Wikipedia: Encryption
In everyday language, a “code” is something secret. In science and engineering, a code is simply an agreement, a set of rules, of how to write something.
That code may be secret. In that case, it’s called an encryption. But in general, a code is not secret. Take the genetic code. It simply states that our DNA is built from four different bases – A, C, G and T – and that three bases taken together code for one amino acid. There’s also a table of which three letters code for which amino acid.
There’s nothing secret about this code.
Likewise, Base64 is not a secret code. Rather, it’s a code that allows storing data in six bits per character (thus there are 64 different entities, i.e. the “base” of the system is 64, just as the base of our decimal system is 10, since there are 10 different entities called “digits”).
By default, message header field parameters in Hypertext Transfer Protocol (HTTP) messages cannot carry characters outside the ISO-8859-1 character set.
If the user name and password contain characters from an incompatible charset, then HTTP would not be able to carry that text. To prevent this, we encode the user name and password with Base64 to make sure we are sending HTTP-compatible characters over HTTP. For more information see this Basic_access_authentication