BitTorrent tracker reply for scrape when the infohash contains '00'

My question concerns the tracker's HTTP scrape reply when the infohash contains a '00' byte: the scrape request contains %00, so there would be a '\0' in the HTTP reply.
I found some torrents that have zero bytes in their infohash, and I wonder how the tracker should reply to scrape requests for those torrents. Would a '\0' normally work? I tested these torrents with TorrentSpy and it always says:
"Torrent not found on Tracker; may be rotten"
I wrote a basic tracker to test this and verified that the complete scrape reply was written to the socket, but TorrentSpy still gives the same output. When I look at the scrape reply in a web browser, it ends at the '\0'.
Can anyone shed some light on this?

It's not entirely clear what your question is, but I'll try to fill some stuff in.
'\0' is no different from any other non-printable character. All characters that are not URL-safe must always be escaped as %xx in the GET request to an HTTP tracker. Some trackers have bugs where, for instance, they assume that characters that don't need to be escaped are not escaped (i.e. they may store the escaped version of the info-hash as their internal representation). Some web servers may even assume that ' (single quote) is always escaped (allowing SQL injection attacks). All special characters must be escaped, especially & and % (obviously).
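A minimal Python sketch of building a scrape request from raw info-hash bytes, assuming a hypothetical tracker at http://tracker.example.com/scrape. `urllib.parse.quote_from_bytes` escapes every byte outside the unreserved set, so a 0x00 byte becomes %00 exactly as 0x01 becomes %01, and % and & are escaped too:

```python
from urllib.parse import quote_from_bytes


def scrape_url(scrape_base: str, info_hashes: list[bytes]) -> str:
    """Build a scrape URL from raw 20-byte info-hashes (not hex strings)."""
    params = []
    for info_hash in info_hashes:
        if len(info_hash) != 20:
            raise ValueError("an info-hash is exactly 20 raw bytes")
        # safe="" also escapes '/'; only A-Z a-z 0-9 - . _ ~ are left as-is,
        # so NUL, '%' and '&' all come out as %XX.
        params.append("info_hash=" + quote_from_bytes(info_hash, safe=""))
    return scrape_base + "?" + "&".join(params)


# Hypothetical info-hash containing a zero byte, a '%' and a '&'.
example_hash = b"\x00\x01A%&" + b"z" * 15
url = scrape_url("http://tracker.example.com/scrape", [example_hash])
print(url)
```

The tracker is then expected to percent-decode the value back to the same 20 raw bytes before looking it up.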
The response is much simpler. Since the response from a tracker is bencoded, all info-hashes are sent in binary form, i.e. there's no encoding going on at all. When you decode it, you might want to hex-encode the info-hash, since that's a more user-friendly representation and also the conventional way of printing them (on websites, in magnet links, in applications).
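To illustrate that the reply carries raw bytes, here is a minimal bencoder in Python (a sketch, not any particular tracker's code) encoding a scrape reply in the usual layout: a `files` dictionary keyed by the raw 20-byte info-hash. Bencoded strings are length-prefixed (`20:`), so a zero byte is just data. Anything that treats the reply as a NUL-terminated C string will appear to stop at it, which may explain the truncated browser view in the question:

```python
def bencode(value) -> bytes:
    """Minimal bencoder for ints, byte strings, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Length prefix: the decoder never looks for a terminator byte.
        return b"%d:" % len(value) + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])  # keys are sorted as raw bytes
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


# Hypothetical info-hash whose first byte is 0x00, with made-up counts.
info_hash = bytes.fromhex("00" + "ab" * 19)
reply = bencode({"files": {info_hash: {"complete": 5, "downloaded": 50, "incomplete": 10}}})

print(info_hash.hex())          # the human-friendly hex form
print(reply.split(b"\x00")[0])  # what a NUL-terminated reader would "see"
```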
So, to your question. When you say "torrents which have zero bytes in infohash", I'm assuming you mean their info-hash has one or more bytes that are 0. This is perfectly normal and not uncommon. Would it normally work? Yes, you just need to make sure to escape it as %00, just like you would have to escape a 0x01 byte as %01.
Your torrent is probably dead, i.e. the tracker stopped tracking it. That's what that error message means. It probably doesn't have anything to do with having a zero-byte in the info-hash.

Related

What is the encrypted text shown in address bar?

First, I don't know whether this is the right place to ask this question. When we open certain sites, submit a form, or log in to a site, some encrypted-looking text is appended to the address bar as a query string, and I have no idea whether it is a session ID or something else.
And if it is a session ID, is it a good approach to disclose it?
For example: https://www.google.co.in/?gfe_rd=cr&ei=q1HiWI2kLO3s8Ae3raXwCQ
https://my.naukri.com/Inbox/viewRecruiterMails?id=d786bc1c09837cc9ca692d042c01186294584fccc83209d4fe409a9be01b6ec61edd7a843282321a
I mean the ei= value in the first example and the id= value in the second.
What you are seeing is just encoded binary values (byte values). The first instance seems to be using base 64 for the encoding (probably the URL-safe variant of it) and the second one uses hexadecimal encoding of the bytes.
What the data means (possibly after decoding) depends on the protocol defined by the site. There aren't any specific rules. IDs generally contain about 128 bits of randomness, though.
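A short Python sketch of decoding both example values back to bytes, assuming the first is URL-safe Base64 with its `=` padding stripped and the second is plain hex. It only recovers the raw bytes; what Google or Naukri store in them can't be known from outside. The `ei` value happens to decode to 16 bytes, i.e. 128 bits:

```python
import base64


def decode_urlsafe_b64(value: str) -> bytes:
    # URL-safe Base64 uses '-' and '_' instead of '+' and '/', and the
    # trailing '=' padding is often dropped, so restore it before decoding.
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded)


ei_bytes = decode_urlsafe_b64("q1HiWI2kLO3s8Ae3raXwCQ")
id_hex = "d786bc1c09837cc9ca692d042c01186294584fccc83209d4fe409a9be01b6ec61edd7a843282321a"
id_bytes = bytes.fromhex(id_hex)

print(len(ei_bytes) * 8, "bits:", ei_bytes.hex())
print(len(id_bytes) * 8, "bits")
```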

How to filesearch for a substring of a base64'd string

I have a client with a website that looks as if it has been hacked. Random pages throughout the site (seemingly at random) automatically forward to a YouTube video. This happens for a while (not sure how long yet... still trying to figure that out) and then the redirect disappears. It may have something to do with our site caching, though. Regardless, the client isn't happy about it.
I'm searching the code base (this is a WordPress site, but this question was generic enough that I put it here instead of in the WordPress groups...) for "base64_decode" but not having any luck.
So, since I know the specific URL that the site is getting forwarded to every time, I thought I'd search for the video ID that is in the YouTube URL. This method could also be pertinent when the hack-inserted base64'd string is assigned to a variable and that variable is decoded later (so a grep for "base64_decode" wouldn't necessarily turn up anything that looked suspicious).
So, what I'm wondering is if there's a way to search for a substring of a string that has been base64'd and then inserted into the code. Like, take the substring I'm searching for, base64 it, and then search the code base for the resultant string. (Maybe after manipulating it slightly?)
Is there a way to do that? Is that method even valid? I don't really have any idea how the whole base64 algorithm works, or if this is possible, so I thought I'd quickly throw the question out here to see if anyone else did.
Nothing to it (for somebody with the chutzpah to call himself "Programmer Dan").
Well, maybe a little. You have to know the encoding for the values 0 to 63.
In general, encoding to Base64 is done by taking three 8-bit characters of plain text at a time, breaking those bits into four sets of 6-bit numbers, and creating four characters of encoded text by converting the numbers (0 to 63) to arbitrary characters. Actually, the encoded characters aren't completely arbitrary, as they must be acceptable to pretty much ANY method of transmission, since that's the original reason for using Base64 encoding. The system I usually work with uses {A..Z,a..z,0..9,+,/} in that order.
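The three-bytes-to-four-characters step can be sketched in Python with the standard {A..Z,a..z,0..9,+,/} alphabet described above. It is meant to show the bit shuffling and is checked against the standard library's `base64.b64encode`, not to replace it:

```python
import base64

# Standard alphabet: the index of each character is its 6-bit value.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64encode_manual(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three 8-bit bytes into one 24-bit number (missing bytes = 0).
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit values, most significant first.
        sextets = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # A short final chunk yields len(chunk) + 1 real characters, then '='.
        keep = len(chunk) + 1
        out.extend(ALPHABET[s] for s in sextets[:keep])
        out.append("=" * (4 - keep))
    return "".join(out)


print(b64encode_manual(b"Man"))  # TWFu
```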
If one wanted to be nasty (which one might expect in the case you're dealing with), one might change the order, or even the characters, during the process. Of course, if you have examples of the encoded Base64, you can see what the character set is (unless the encoding uses more than 64 characters). But you still have the possibility of things like changing the order as you encode or decode (simple rotation, for example). But, I digress. The question is about searching for encoded text, not deciphering deliberate obfuscation. I could tell you a lot about that, too.
Simple methodology:
1. Encode the plain text you're looking for. If the encoding ends in one or two equals signs (padding), remove them and the encoded character immediately preceding them. Search for the result.
2. Same as (1), except put a blank in front of your plain text. Remove the first two encoded characters. Search for the result.
3. Same as (2), except with two blanks in front. This time remove the first three encoded characters, because the third one mixes bits of the second blank with bits of your text. Search for the result.
These three searches will find all files containing the encoding of the plain text you're looking for.
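A Python sketch of the three searches, assuming the standard alphabet and an unbroken encoded string (a custom or shuffled alphabet, as discussed above, defeats it). Any byte can stand in for the unknown preceding data, because the characters that depend on it are discarded; a space is used here, as in the steps above:

```python
import base64


def base64_search_patterns(plain: bytes) -> list[str]:
    """One search string per alignment (0, 1 or 2 bytes before `plain` in its
    3-byte group). Standard-Base64 text of any data containing `plain`
    contains at least one of these strings."""
    # Leading characters that mix filler bits with `plain` bits, per alignment.
    unreliable_lead = {0: 0, 1: 2, 2: 3}
    patterns = []
    for offset in range(3):
        encoded = base64.b64encode(b" " * offset + plain).decode("ascii")
        stripped = encoded.rstrip("=")
        if stripped != encoded:
            # Padding means the last real character would also hold bits of
            # whatever follows `plain`, so drop it as well.
            stripped = stripped[:-1]
        patterns.append(stripped[unreliable_lead[offset]:])
    return patterns


for p in base64_search_patterns(b"youtube.com/watch?v="):
    print(p)
```

Each printed string can then be searched for literally, e.g. with `grep -rF`. Very short search texts give very short patterns, which will produce false positives.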
This is all “air code”, meaning off the top of my head, at best. Some might suggest I pulled it out of somewhere else. I can think of three possible problems with this algorithm, excluding any issues of efficiency. But, that’s what you get at this price.
Let me know if you want the working version. Or send me yours. Good luck.
Cplusman

How long is a 255-byte string in the context of HTTP requests?

I have a problem with long URLs in the context of video streaming via the Amazon CloudFront CDN and signed URLs on a media box.
RFC 2068 - Hypertext Transfer Protocol -- HTTP/1.1 states:
Note: Servers should be cautious about depending on URI lengths
above 255 bytes, because some older client or proxy implementations
may not properly support these lengths.
This seems to be exactly the limitation I am running into: 255-character URLs work, 256-character ones don't.
However, I am a little confused, because I thought that in ASCII a character is encoded in 7 bits.
I also know that the valid characters in a URL are entirely covered by the ASCII alphabet.
I do know that it is common practice to extend to an 8-bit alphabet to support more characters, or to use the spare eighth bit of each byte for error detection (parity).
My question now is:
In the context of HTTP requests, what exactly is meant by a length of 255 bytes? How many characters is that, and how is this number reproducible?
ASCII uses only 7 bits of each byte, but every character still takes up one byte.
So 255 bytes would be 255 characters here. The note is a rule of thumb. Unless you're going through a proxy, most URIs will be split up in the request (the host generally goes in the Host header, not on the request line).
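A quick Python check of the byte/character relationship, using a made-up URL (not a real CloudFront signature). For an all-ASCII URL the byte count equals the character count; anything non-ASCII has to be percent-encoded first, and each of its UTF-8 bytes then costs three characters:

```python
from urllib.parse import quote

# Hypothetical signed URL; every character is ASCII, so bytes == characters.
url = "https://media.example.com/video.mp4?Expires=1&Signature=abc&Key-Pair-Id=XYZ"
print(len(url), len(url.encode("ascii")))

# 'é' is two UTF-8 bytes (C3 A9), which become six ASCII characters in a URL.
print(quote("é"))  # %C3%A9
```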
I think this 255-byte limit is generally obsolete with current software, although limits still exist.
Internet Explorer has a 2083-character limit on URLs. I think Firefox can handle more, but that's barely relevant if you want your site to work with IE anyway.
Signatures in URLs are tricky. There are workarounds where you strip the signature down to the bare minimum: no certificate, no signer's name. In this case, the recipient is expected to know where the signature came from and which public key to verify it with. This is more or less what the SAML HTTP Redirect binding does (around line 594). If you can't use such tricks, you may just have to use POST.
There is no 255-character limit in HTTP (the protocol). There may be a limit in certain software components, though.
Also note that RFC 2068 is really ancient; the latest and greatest on this can be found here: http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-19.html#rfc.section.3.1.1.p.10:
"Various ad-hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of up to 8000 octets."

How to sanitize a physical address (in Node... to prevent hacking)?

I have a Node app that collects the user's address and sends it to the Yahoo PlaceFinder API (which returns the user's geolocation).
The user should fill out the "address" field and click submit.
Many different address formats are acceptable, such as:
90210 (just zip)
123 fake street, beverly hills, CA 90210
123 fake street, 90210
... etc
I'm not concerned if the user enters a valid address or not. I don't even want to think about what RegEx would be needed for that.
I am concerned about security.
What steps (if any) should I take to sanitize the user's input before my Node app passes it to the Yahoo PlaceFinder API in an http.get() request?
Let me give a fairly pragmatic answer beyond the usual "sanitise your untrusted data against a white list of acceptable values" response. When you say "collects the user's address", are you actually either:
Passing it to a location such as a database where you might be worried about injection attacks
Rendering it to a page where you might be worried about an XSS attack
If you're simply taking the input and passing it off to Yahoo, then the problem is more theirs than yours as you're not exposing yourself to either of the above attack vectors. You might find that if this is the case, there's not a lot of point in going down the sanitisation path which will inevitably be very difficult against something as variable as a free-form address.
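Even when the address only goes to Yahoo, one step is worth taking when building the request: percent-encode the value instead of concatenating it into the query string. In Node that is `encodeURIComponent` or `URLSearchParams`; the Python sketch below shows the same idea with `urllib.parse.urlencode`. The endpoint and the `q` parameter name are hypothetical, not PlaceFinder's real ones:

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint and parameter name.
GEOCODE_ENDPOINT = "https://geocoder.example.com/geocode"


def build_geocode_url(address: str) -> str:
    # urlencode percent-encodes the value ('&' -> %26, '=' -> %3D, ...), so
    # text typed by the user cannot add or override query parameters.
    return GEOCODE_ENDPOINT + "?" + urlencode({"q": address})


address = "123 fake street, 90210&appid=someone-elses-key"
url = build_geocode_url(address)
print(url)
# The API sees exactly one parameter, holding the user's text unchanged.
print(parse_qs(urlparse(url).query))
```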
I dare say the simplest method would be to apply a regex to their input, allowing only alphanumeric characters and spaces, along with perhaps commas and periods. I'm not sure that would let all valid addresses through, but if it fails you could display an error message to the effect of "only use [A-Za-z0-9 ,.]" (note that [A-z] would also admit [, \, ], ^, _ and `, which sit between Z and a in ASCII).
That should, in theory, mitigate most types of exploits you would see, as they would most likely need some form of control character to break your code. Barring an overflow of some sort, I'm pretty sure commas and periods are relatively harmless given your situation.
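A Python sketch of that whitelist check. The 200-character cap is an arbitrary, hypothetical limit, and the pattern spells out A-Z and a-z separately:

```python
import re

# Letters, digits, spaces, commas and periods only; 1 to 200 characters.
ADDRESS_RE = re.compile(r"[A-Za-z0-9 ,.]{1,200}")


def is_acceptable_address(text: str) -> bool:
    # fullmatch: the whole input must consist of allowed characters.
    return ADDRESS_RE.fullmatch(text) is not None


for sample in ["90210", "123 fake street, beverly hills, CA 90210", "<script>alert(1)</script>"]:
    print(sample, is_acceptable_address(sample))
```

As the answer warns, this rejects some real addresses (for example "Apt #4", "O'Brien St" or accented street names), so the error message matters.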

What is the purpose of Base64 encoding, and why is it used in HTTP Basic Authentication?

I don't get the Base64 encryption.
If one can decrypt a Base64 string, what is its purpose?
Why is it being used for HTTP Basic auth?
It's like telling someone that my password, reversed, is OLLEH.
People seeing OLLEH will know the original password was HELLO.
Base64 is not encryption -- it's an encoding. It's a way of representing binary data using only printable (text) characters.
See this paragraph from the wikipedia page for HTTP Basic Authentication:
While encoding the user name and password with the Base64 algorithm typically makes them unreadable by the naked eye, they are as easily decoded as they are encoded. Security is not the intent of the encoding step. Rather, the intent of the encoding is to encode non-HTTP-compatible characters that may be in the user name or password into those that are HTTP-compatible.
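A short Python illustration, using the "Aladdin" / "open sesame" example credentials from the HTTP authentication RFCs, of how the header is built and how trivially it is reversed without any key:

```python
import base64

user, password = "Aladdin", "open sesame"
token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
print("Authorization: Basic " + token)  # Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Anyone who sees the header can reverse it; no secret is involved.
print(base64.b64decode(token).decode("utf-8"))  # Aladdin:open sesame
```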
It's normally called base64 encoding, not encryption! The nice thing about base64 encoding is it allows you to represent (binary) data using only a limited, common-subset of the available characters, far more efficiently than just writing a string of 1s and 0s as ASCII for example.
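The efficiency claim is easy to check in Python: Base64 spends 4 characters per 3 bytes, while spelling out bits as ASCII '0'/'1' spends 8 characters per byte:

```python
import base64

data = bytes(range(256)) * 3                    # 768 arbitrary bytes
as_base64 = base64.b64encode(data)              # 4 characters per 3 bytes
as_bits = "".join(f"{b:08b}" for b in data)     # 8 characters per byte

print(len(data), len(as_base64), len(as_bits))  # 768 1024 6144
```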
Encryption requires a secret key in order to decrypt; hence the "crypt" (root: cryptography).
Encoding converts data from one representation into another according to public rules. In this case, arbitrary bytes of data can be represented with characters that are easy to transport over HTTP.
Base-64 encoding is part of the MIME specifications. It provides a transport-safe encoding for data that won't get chewed on if/when it gets relayed through a host that uses a different encoding scheme than that used by the original client.
There are lots of different hosts out on the intertubes and you can't really assume support for anything other than 7-bit ASCII, without risking data loss/confusion.
IBM mainframes, for instance, use an encoding called EBCDIC (which comes in lots of different flavors). Its code points are completely different from the code points used by ASCII-based 'puters: in ASCII, the letters A-Z are 0x41 - 0x5A; in EBCDIC the letters A - Z aren't even a contiguous range: the letters A-I live at 0xC1 - 0xC9, the letters J-R live at 0xD1 - 0xD9 and the letters S-Z live at 0xE2 - 0xE9.
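Python ships an EBCDIC codec, `cp037` (the US/Canada flavor), so the difference is easy to see. The same letters map to entirely different byte values, which is why a byte-for-byte relay between such systems mangles text unless both ends agree on a restricted, well-known character set:

```python
letters = "AIJRSZ"

# ASCII: A-Z is one contiguous run, 0x41-0x5A.
print(letters.encode("ascii").hex(" "))  # 41 49 4a 52 53 5a
# EBCDIC (cp037): three separate runs.
print(letters.encode("cp037").hex(" "))  # c1 c9 d1 d9 e2 e9
```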
You might mean "Base 64 Encoding". Encryption is not the same as encoding.
Wikipedia: Encryption
In everyday language, a “code” is something secret. In science and engineering, a code is simply an agreement, a set of rules, of how to write something.
That code may be secret. In that case, it's called an encryption. But in general, a code is not secret. Take the genetic code. It simply states that our DNA is built from four different bases (A, C, G and T) and that three bases taken together code for one amino acid. There's also a table of which three letters code for which amino acid.
There’s nothing secret about this code.
Likewise, Base64 is not a secret code. Rather, it’s a code that allows storing data in six bits per character (thus there are 64 different entities, i.e. the “base” of the system is 64, just as the base of our decimal system is 10, since there are 10 different entities called “digits”).
By default, message header field parameters in Hypertext Transfer Protocol (HTTP) messages cannot carry characters outside the ISO-8859-1 character set.
If the user name or password contains characters outside that character set, HTTP cannot carry the text as-is. To prevent this, we encode the user name and password with Base64, which makes sure only HTTP-compatible characters are sent. For more information, see Basic_access_authentication.
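A Python sketch of that situation, with hypothetical credentials whose user name contains 'Ł' (not in ISO-8859-1). It assumes both sides agree on UTF-8 for the raw text (RFC 7617 lets a server advertise charset="UTF-8"); after Base64, the header value is plain ASCII either way:

```python
import base64

user, password = "Łukasz", "secret"  # hypothetical credentials
raw = f"{user}:{password}"

try:
    raw.encode("iso-8859-1")
except UnicodeEncodeError:
    print("'Ł' has no ISO-8859-1 code, so it can't go into the header as-is")

# Encode the text as UTF-8 bytes first, then Base64: only ASCII remains.
token = base64.b64encode(raw.encode("utf-8")).decode("ascii")
print("Authorization: Basic " + token)
```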
