Tracker GET request parameters in BitTorrent

When using BitTorrent, I noticed the parameters "numwant", "corrupt" and "key" in the tracker URL.
However, these parameters are not defined in BEP 3 (http://www.bittorrent.org/beps/bep_0003.html), so could someone tell me what they mean and where the three parameters are defined?
Also, before asking this question, I searched for the keyword "numwant" on the site www.bittorrent.org and found only that "numwant" appears in BEP 8; no definition or explanation of the keyword can be found there.

While BEP 3 is official, it's a terse and dense document. I would instead recommend the unofficial specification: https://wiki.theory.org/index.php/BitTorrentSpecification
It's a lot easier to read and understand. It also documents some early extensions to the protocol that you can't find elsewhere.
There you will find:
numwant: Optional. Number of peers that the client would like to receive from the tracker. This value is permitted to be zero. If omitted, typically defaults to 50 peers.
key: Optional. An additional identification that is not shared with any other peers. It is intended to allow a client to prove their identity should their IP address change.
Regarding corrupt, there is, as far as I know, no written documentation of how it is defined, but it's rather simple: when a piece fails the hash check, that amount of data is accounted on the corrupt counter instead of the downloaded counter.
There is also a similar redundant counter, where data that is discarded because it's redundant is accounted. This happens, for example, in end-game mode, when the same chunk is requested from more than one peer.
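To make these parameters concrete, here is a minimal sketch of an announce request that includes them. The tracker URL, hashes and values are made up, and the exact parameter set varies by client:

from urllib.parse import urlencode

announce = "http://tracker.example.com/announce"   # hypothetical tracker
params = {
    "info_hash": bytes.fromhex("aa" * 20),  # 20-byte SHA-1 of the info dict
    "peer_id": b"-XX0001-123456789012",     # 20-byte client-generated id
    "port": 6881,
    "uploaded": 0,
    "downloaded": 1048576,  # bytes that passed the piece hash check
    "corrupt": 16384,       # bytes that failed the hash check (see above)
    "left": 1048576,
    "numwant": 50,          # how many peers we'd like back
    "key": "9f3c2a1b",      # random id kept even if our IP changes
}
print(announce + "?" + urlencode(params))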
Also, there is some additional info in my answer here: Understanding Bittorrent Tracker Request

Related

Is a DNS query with the authoritative bit set (or other bits used for responses) considered valid?

From RFC 1035:
Authoritative Answer - this bit is valid in responses,
and specifies that the responding name server is an
authority for the domain name in question section.
So, what happens if this bit is set in a DNS query (QR=0)? Do most DNS implementations treat the packet as invalid, or is the bit simply ignored?
The same question applies to other bits that are specific to either queries or responses, such as setting the RD bit in a response.
My guess is that these bits are simply ignored if they aren't applicable to the packet in question, but I don't know for sure or how I would find out.
I'm asking because I'm writing my own DNS packet handler and want to know whether such packets should still be parsed or treated as invalid.
You either apply Postel's law ("Be conservative in what you do, be liberal in what you accept from others") - which is often touted as one reason for the success and interoperability of so many different things on top of the Internet - or you strictly apply the RFC, deem the packet invalid, and reply immediately with FORMERR, for example.
In the second case, since you will get deviating clients (not necessarily for your specific case; in the DNS world there are a lot of non-conforming implementations on various points), you will need to decide whether to create specific rules (like ACLs) to accept some of them nevertheless, because you deem them "important".
Note that at this stage your question is not really programming related (no code), so it is kind of off-topic here. But the answer also depends on what kind of "packet handler" you are building. If it is for some kind of IDS/monitoring system, you need to parse "as much as possible" of the DNS traffic in order to report on it. If it is to mimic a real-world DNS resolver and just make sure it behaves like a resolver, then you probably do not need to deal with every strange deviating case.
Also remember that all of this can be changed in transit, so if you receive something erroneous it is not necessarily an error coming from the sender; it could be caused by some intermediary, willingly or not.
Finally, it is impossible to predict everything you will get, and in any wide enough experiment you will be surprised by the amount of traffic you cannot understand how it came to exist. So instead of trying to define everything before starting, you should iterate over versions, keeping a clear view of your target (parsing as much as possible for some kind of monitoring system, OR being as lean/simple/secure/close to real-world features for DNS resolution as possible).
And as for "how I would find out": you can study the source of various existing resolvers (BIND, NSD, Unbound, etc.) and see how they react. Or just launch them, throw at them some erroneous packets like the ones you envision, and observe their replies. Some cases probably exist as unit/regression tests, and some tools like ZoneMaster could probably be extended (if they are not doing those specific tests already) to cover your cases.
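For the parsing side, here is a sketch of what the liberal option looks like in practice: read the 12-byte header, notice response-only bits in a query, and decide to ignore or reject. The field layout is from RFC 1035 section 4.1.1; the function name is mine:

import struct

def classify_query(packet: bytes) -> str:
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", packet[:12])
    qr = (flags >> 15) & 1   # 0 = query, 1 = response
    aa = (flags >> 10) & 1   # AA: only meaningful in responses
    ra = (flags >> 7) & 1    # RA: only meaningful in responses
    if qr == 0 and (aa or ra):
        # Postel's option: note the oddity but process the query anyway.
        return "query-with-stray-response-bits"
        # Strict option instead: drop it, or answer with RCODE 1 (FORMERR).
    return "query" if qr == 0 else "response"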

Command Query Separation: commands must return void?

If CQS prevents commands from returning status variables, how does one code for commands that may not succeed? Let's say you can't rely on exceptions.
It seems like anything that is request/response is a violation of CQS.
So it would seem like you would have a set of "mother may I" methods giving the statuses that would have been returned by the command. What happens in a multithreaded / multi-computer application?
Say I have three clients requesting that a server's object increase by one (and the object has limits of 0-100). All check to see if they can, but only one gets it - the other two can't, because the value just hit the limit. It would seem that a returned status would solve the problem here.
It seems like anything that is request/response is a violation of CQS.
Pretty much yes, hence Command-Query-Separation. As Martin Fowler nicely puts it:
The fundamental idea is that we should divide an object's methods into two sharply separated categories:
Queries: Return a result and do not change the observable state of the system (are free of side effects).
Commands: Change the state of a system but do not return a value [my emphasis].
Requesting that a server's object increase by one is a Command, so it should not return a value - processing a response to that request means that you are doing a Command and Query action at the same time which breaks the fundamental tenet of CQS.
So if you want to know what the server's value is, you issue a separate Query.
If you really need a request-response pattern, you either need to have something like a convoluted callback event process to issue queries for the status of a specific request, or pure CQS isn't appropriate for this part of your system - note the word pure.
Multithreading is a main drawback of CQS and can make it hard to do. Wikipedia has a basic example and discussion of this, and also links to the same Martin Fowler article where he suggests it is OK to break the pattern to get something done without driving yourself crazy:
[Bertrand] Meyer [the inventor of CQS] likes to use command-query separation absolutely, but there are exceptions. Popping a stack is a good example of a query that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I'm prepared to break it to get my pop.
TL;DR - I would probably just look at returning a response, even though it isn't correct CQS.
Article "Race Conditions Don’t Exist" may help you to look at the problem with CQS/CQRS mindset.
You may want to step back and ask why counter value is absolutely necessary to know before sending a command? Apparently, you want to make decision on the client side whether you can increase counter more or not.
The approach is to let the server make such decision. Let all the clients send commands (some will succeed and some will fail). Eventually clients will get consistent view of the server object state (where limit has reached) and may finally stop sending such commands.
This time window of inconsistency leads to wrong decisions by the clients, but it never breaks consistency of the object (or domain model) on the server side as long as commands are handled adequately.
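As a sketch of both styles (the names are mine, not from any particular framework): a strict-CQS counter exposes a void command plus a separate query, while the pragmatic variant adds a status-returning method - and the limit is enforced on the server, not by the clients:

import threading

class BoundedCounter:
    """Server-side object limited to 0-100, as in the question."""
    def __init__(self, limit: int = 100):
        self._value = 0
        self._limit = limit
        self._lock = threading.Lock()   # the decision is made on the server

    def value(self) -> int:
        # Query: returns a result, no side effects.
        with self._lock:
            return self._value

    def increment(self) -> None:
        # Strict CQS command: changes state, returns nothing.
        # (Failure surfaces as an exception, if you can rely on those.)
        with self._lock:
            if self._value >= self._limit:
                raise ValueError("limit reached")
            self._value += 1

    def try_increment(self) -> bool:
        # Pragmatic variant: a command that reports success, breaking pure CQS.
        with self._lock:
            if self._value >= self._limit:
                return False
            self._value += 1
            return True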

Viewing Linux TCP TCB

I need to find out information that should be kept in the TCP transmission control block (TCB), specifically I need to find out what sequence numbers are used for any particular session.
I have posted to other forums, looked through the procfs, searched Google, sent myself links from lmgtfy (dot) com :) No luck.
If there is no tool or hints in the procfs, would it be possible to somehow find out where this sort of information exists in memory and gather it from there, such as using dd to copy /dev/mem?
Thanks for any help on this in advance!!!!!
Well, I guess you first need to know what a sequence number is and why it's used; then you can look at a particular implementation of sequence number generation.
The sequence number is a 32-bit field used to mark the data in every packet uniquely so that it can be acknowledged, and acknowledgement
is an important feature of TCP for maintaining connection reliability. Full details can be found in the TCP RFC (http://www.ietf.org/rfc/rfc793.txt, section 3.3).
Now if you need to find out how Linux does it, look at net/ipv4/tcp_ipv4.c::tcp_v4_init_sequence(); this is used for generating the ISN (Initial Sequence Number) whenever a new connection is to be established, and how later sequence numbers are generated is explained in the RFC. So, look at the implementation of tcp_v4_init_sequence() and the RFC; that will help you understand the use and implementation of sequence numbers. Hope this helps!
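As an aside, if you just want to see the sequence numbers of a live session, you can observe them on the wire instead of digging into kernel memory. A sketch of my own (requires root; Linux, IPv4 only; tcpdump or Wireshark do the same job more comfortably):

import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
while True:
    packet, _ = s.recvfrom(65535)
    ihl = (packet[0] & 0x0F) * 4                    # IP header length in bytes
    src, dst = packet[12:16], packet[16:20]
    sport, dport, seq, ack = struct.unpack("!HHII", packet[ihl:ihl + 12])
    print("%s:%d -> %s:%d seq=%u ack=%u" % (
        socket.inet_ntoa(src), sport,
        socket.inet_ntoa(dst), dport, seq, ack))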

Are USENET or NNTP articles loaded in chronological or reverse-chronological order?

sample newsrc:
thufir@dur:~$ cat .newsrc
gwene.com.androidcentral: 1-99999999
gwene.com.blogspot.emacsworld: 1-99999999
gwene.com.blogspot.googlecode: 1-99999999
gwene.com.blogspot.googlereader: 1-99999999
gwene.com.economist: 1-99999999
gwene.com.googlereader: 1-99999999
thufir@dur:~$
Now, I have leafnode configured to save messages for 999 days. When GNU NNTP connects to leafnode on localhost there can be a bit of lag while things load. Currently, what I've done is configure leafnode to store just 35 days for particular groups, so that things load more quickly.
However, when looking over the .newsrc again, it seems that I could probably let leafnode store a very large number of articles and then configure the .newsrc which GNU NNTP uses so that only a small portion of those messages is fetched. If so, this would allow for long retention in leafnode, 999 days, while alleviating the delay in connecting to leafnode and loading the articles from GNU NNTP.
However, will that work? Will only the newer articles get loaded, or will it load the old articles?
Unfortunately, RFC 977 doesn't actually mention .newsrc files. Of course, whether GNU NNTP follows the RFC would be a separate question, but, at least according to the RFC, presumably newer articles are loaded and older articles left?
It looks like RFC 3977 clarifies RFC 977 a bit, so I'm reviewing that now.
You are actually asking about the behaviour of a particular NNTP client, not of the NNTP protocol per se. According to the spec, the only NNTP verb for retrieving articles is ARTICLE, and it fetches exactly one article. There is no constraint on the order in which a client makes ARTICLE "calls".
So to answer your question you'd need to look at the GNU NNTP library's documentation ... or the source code. And I suspect it also depends on how your code uses the library methods.
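As an illustration that the order really is the client's choice, here is a sketch using Python's standard nntplib module (deprecated since Python 3.11, and not GNU NNTP, so treat it only as an analogy; the group name is taken from your .newsrc) that deliberately fetches the newest article first:

from nntplib import NNTP

with NNTP("localhost") as server:
    resp, count, first, last, name = server.group("gwene.com.economist")
    # The protocol doesn't impose an order; walk from newest to oldest.
    for number in range(last, first - 1, -1):
        resp, info = server.article(number)
        print(number, info.message_id)
        break   # newest article only, for the demo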
However, will that work? Will only the newer articles get loaded, or will it load the old articles?
If I were in your shoes, I'd try it out and see whether it works.
From the 'net:
Each line sets properties for the newsgroup named in the first field.
The name is immediately followed by a character that indicates whether
the owning user is currently subscribed to the group or not; a colon
indicates subscription, and an exclamation mark indicates
nonsubscription. The remainder of the line is a sequence of
comma-separated article numbers or ranges of article numbers,
indicating which articles the user has seen.
For my purposes, loading groups from a .newsrc file with GNU NNTP, it looks like the numbers are ignored -- although I'm not positive.
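For reference, a sketch of parsing one such line according to the format quoted above (the function name is mine):

import re

def parse_newsrc_line(line: str):
    """Return (group, subscribed, [(lo, hi), ...]) for one .newsrc line."""
    m = re.match(r"([^:!]+)([:!])\s*(.*)$", line.strip())
    if m is None:
        raise ValueError("not a .newsrc line: %r" % line)
    group, marker, rest = m.groups()
    subscribed = (marker == ":")            # ':' subscribed, '!' not
    ranges = []
    for part in filter(None, (p.strip() for p in rest.split(","))):
        lo, _, hi = part.partition("-")
        ranges.append((int(lo), int(hi or lo)))
    return group, subscribed, ranges

print(parse_newsrc_line("gwene.com.economist: 1-99999999"))
# ('gwene.com.economist', True, [(1, 99999999)])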

How could one design a secure and "self-destructing" email?

As most of you know, email is very insecure. Even with an SSL-secured connection between the client and the server that sends an email, the message itself will be in plaintext while it hops between nodes across the Internet, leaving it vulnerable to eavesdropping.
Another consideration is the sender might not want the message to be readable - even by the intended recipient - after some time or after it's been read once. There are a number of reasons for this; for example, the message might contain sensitive information that can be requested through a subpoena.
A solution (the most common one, I believe) is to send the message to a trusted third party and a link to that message to the recipient, who then reads the message from the third party. Or the sender can send an encrypted message (using symmetric encryption) to the recipient and send the key to the third party.
Either way, there is a fundamental problem with this approach: if the third party is compromised, all your efforts are rendered useless. For a real example of an incident like this, refer to the debacles involving Crypto AG colluding with the NSA.
Another solution I've seen was Vanish, which encrypts the message, splits the key into pieces and "stores" the pieces in a DHT (namely the Vuze DHT). These values can be easily and somewhat reliably accessed by simply looking the hashes up (the hashes are sent with the message). After 8 hours, these values are lost, and even the intended recipient won't be able to read the message. With millions of nodes, there is no single point of failure. But this was also broken by mounting a Sybil attack on the DHT (refer to the Vanish webpage for more information).
So does anyone have ideas on how to accomplish this?
EDIT: I guess I didn't make myself clear. The main concern is not the recipient intentionally keeping the message (I know this one is impossible to control), but the message being available somewhere.
For example, in the Enron debacle, the courts subpoenaed them for all the email on their servers. Had the messages been encrypted and the keys lost forever, the encrypted messages would have been of no use without the keys.
(Disclaimer: I didn't read the details on Vanish or the Sybil attack, which may be similar to what comes below.)
First of all: email messages are generally quite small, especially compared to a 50 MB YouTube video you can download 10 times a day or more. On this I base the assumption that storage and bandwidth are not a real concern here.
Encryption, in the common sense of the word, introduces parts into your system that are hard to understand, and therefore hard to verify (think of the typical openssl magic everybody just performs but 99% of people don't really understand; if some step X in a HOWTO said "now go to site X and upload *.cer, *.pem and *.csr" to verify steps 1 to X-1, I guess 1 in 10 people would just do it).
Combining the two observations, my suggestion for a safe(*) and understandable system:
Say you have a message M of 10 kb. Take N times 10 kb from /dev/(u)random, possibly from hardware-based random sources, and call them K(0) to K(N-1). Use a simple XOR operation to calculate
K(N) = M^K(0)^K(1)^...^K(N-1)
now, by definition
M = K(0)^K(1)^...^K(N)
i.e. to understand the message you need all K's. Store the K's with N different (more or less trusted) parties, using whatever protocol you fancy, under random 256 bit names.
To send a message, send the N links to the K's.
To destroy a message, make sure at least one K is deleted.
(*) As regards safety, the system will be as safe as the safest party hosting a K.
Don't use a fixed N, and don't have a fixed number of K's on a single node per message (e.g. put 0-10 K's of one message on the same node), to make a brute-force attack hard, even for those who have access to all nodes storing keys.
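A sketch of the scheme in code (the helper names are mine). Note that all N+1 shares are needed to recover M; with even one share deleted, the rest XOR to uniformly random bytes:

import os
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(message: bytes, n: int) -> list:
    pads = [os.urandom(len(message)) for _ in range(n)]   # K(0)..K(N-1)
    k_n = reduce(xor_bytes, pads, message)                # K(N) = M^K(0)^...^K(N-1)
    return pads + [k_n]

def combine(shares: list) -> bytes:
    return reduce(xor_bytes, shares)                      # M = K(0)^...^K(N)

shares = split(b"meet at dawn", 4)
assert combine(shares) == b"meet at dawn"
assert combine(shares[1:]) != b"meet at dawn"   # one share missing -> noise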
NB: this of course would require some additional software, as would any solution, but the complexity of the plugins/tools required is minimal.
The self-destructing part is really hard, because the user can take a screenshot and store the screenshot unencrypted on disk, etc. So I think you have no chance of enforcing that (there will always be a way, even if you link to an external page). You can, however, simply ask the recipient to delete it afterwards.
Encryption, on the other hand, is not a problem at all. I wouldn't rely on TLS, because even when the sender and the client are using it, there might be other mail relays that don't, and they might store the message as plain text. So the best way is simply to encrypt the message explicitly.
For example, I am using GnuPG for (nearly) all mail I write, which is based on asymmetric encryption methods. Here I know that only those I have explicitly given permission can read the mail, and since there are plug-ins available for nearly all popular MUAs, it's also quite easy for the recipient to read it. (So nobody has to encrypt the mail manually and risk forgetting to delete the unencrypted message from the disk...) It's also possible to revoke the keys, if someone has stolen your private key for example (which is normally encrypted anyway).
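A minimal sketch of that workflow using the third-party python-gnupg wrapper (my assumption; gpg itself must be installed, and the recipient address and mail-sending helper are hypothetical):

import gnupg

gpg = gnupg.GPG()
# Encrypt to the recipient's public key; only their private key can decrypt.
encrypted = gpg.encrypt("the sensitive message", "alice@example.com")
if encrypted.ok:
    send_encrypted_mail(str(encrypted))   # hypothetical helper; sends the ASCII-armored text
else:
    raise RuntimeError(encrypted.status)  # e.g. no public key for the recipient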
In my opinion, GnuPG (or alternatively S/MIME) should be used all the time, because that would also help make spamming more difficult. But that's probably just one of my silly dreams ;)
There are so many different ways of going about it, all with good and bad points; you just need to choose the right one for your scenario. I think the best way is the same as your 'most common' solution, except that the trusted third party should really be you - you create a website of your own, with your own authentication. Then you don't have to give your hypothetical keys to anyone.
You could use a two-way certification method by creating your own client software which can read the emails, with each user having their own certificate. Better safe than sorry!
If the recipient knows that the message might become unreadable later, and they find the message valuable, their intention will be to preserve it, so they will try to subvert the protection.
Once someone has seen the message unencrypted - which means in any perceivable form, either as text or as a screen image - they can store it somehow and do whatever they want with it. All the measures with keys and so on only make dealing with the message inconvenient, but don't prevent extracting the text.
One of the ways could be to use self-destructing hardware, as in Mission Impossible - the hardware would display the message and then destroy it. But as you can see, that is inconvenient as well: the recipient would need to understand the message from viewing it only once, which is not always possible.
So, given that the recipient might be interested in subverting the protection, and that the protection can be subverted, the whole idea will likely not work as intended, but it will surely make dealing with messages less convenient.
If HTML format is used, you can have the message reference assets that you can remove at a later date. If the message is opened after that, the user will see broken links.
If your environment allows for it, you could use the trusted boot environment to ensure that a trusted boot loader has been used to boot a trusted kernel, which could verify that a trusted email client is being used to receive the email before sending it. See remote attestation.
It would be the responsibility of the email client to responsibly delete the email in a timely fashion -- perhaps relying on an in-memory store only and requesting memory that cannot be swapped to disk.
Of course, bugs can happen in programs, but this mechanism could ensure there is no intentional pathway towards storing the email.
The problem, as you describe it, does sound very close to the problem addressed by Vanish, and discussed at length in their paper. As you note, their first implementation was found to have a weakness, but it appears to be an implementation weakness rather than a fundamental one, and is therefore probably fixable.
Vanish is also sufficiently well-known that it's an obvious target for attack, which means that weaknesses in it have a good chance of being found, publicised, and fixed.
Your best option, therefore, is probably to wait for Vanish version 2. With security software, rolling your own is almost never a good idea, and getting something from an established academic security group is a lot safer.
IMO, the most practical solution for this situation is using the Pidgin IM client with Off-the-Record (no logging) and pidgin-encrypt (end-to-end asymmetric encryption) together. The message is destroyed as soon as the chat window is closed, and in an emergency you can just unplug the computer to close the chat window.
