General features or behaviours common to all or most application protocols defined in RFCs

Is there an RFC that recommends or defines some basic features an application protocol should or must contain? I'm thinking of (but not limited to):
The application protocol specification (APS) must have a method to denote the end of a message (say by length values or end-of-message characters); a minimal framing sketch follows after this question.
The APS must specify the commands that initiate an action and all (and only) the acceptable responses.
The APS must specify the format of each acceptable command and its related data, if any.
The APS must define how to handle errors, such as data that does not match any known command in the APS.
(I'm sure there are others.)
Or can an RFC-proposed APS contain whatever information it likes? Is it the case that the RFC approval process has these basic APS behaviours/features/properties "internalised", since any reliable APS will need them, and so there is no need to explicitly define them anywhere?
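To make the first point concrete, here is a minimal sketch (in TypeScript; nothing in the question requires that language) of one common way to mark the end of a message, a 4-byte big-endian length prefix. It is purely illustrative; no RFC mandates this particular framing.

function frame(payload: Buffer): Buffer {
  // The 4-byte big-endian length prefix tells the peer where the message ends.
  const header = Buffer.alloc(4);
  header.writeUInt32BE(payload.length, 0);
  return Buffer.concat([header, payload]);
}

// The receiver buffers incoming bytes and only extracts a message once the
// prefix plus the announced number of bytes have arrived.
function tryExtract(buf: Buffer): { msg: Buffer; rest: Buffer } | null {
  if (buf.length < 4) return null;
  const len = buf.readUInt32BE(0);
  if (buf.length < 4 + len) return null;
  return { msg: buf.subarray(4, 4 + len), rest: buf.subarray(4 + len) };
}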

Related

What is the purpose of non-printable control characters in this email validation regular expression?

Background Information
We use SonarQube to obtain quality metrics for the codebase. SonarQube has flagged over a dozen bugs in our Node.js codebase under rule S6324, related to an email validation regular expression advocated by a top-ranking website on Google, emailregex.com. The website claims the regex is an "RFC 5322 Official Standard". However, the control characters in the regex are flagged by SonarQube for removal because they're non-printable characters. Here is the regex:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
And here is the full list of control characters SonarQube complains about:
\x0e, \x0e, \x0c, \x0c, \x0b, \x0c, \x1f, \x01, \x1f, \x01, \x01, \x09, \x08, \x0b, \x0b, \x0e, \x0b, \x08, \x0c, \x0e, \x09, \x01
Regular-Expressions.info's Email page does address a variation of the above regular expression as follows:
The reason you shouldn’t use this regex is that it is overly broad. Your application may not be able to handle all email addresses this regex allows. Domain-specific routing addresses can contain non-printable ASCII control characters, which can cause trouble if your application needs to display addresses...
However, I can't seem to find any information that explains why some sites are adding these non-printable control characters or what they mean by "domain-specific routing addresses". I have looked at some Stack Overflow regex questions and the Stack Overflow Regex Wiki. Control characters don't seem to be addressed.
The Question
Can someone please explain the purpose of these control-characters in the regular expression and possibly supply some examples of when this regular expression is useful?
(Note: Please avoid debates/discussion about what the best/worst regular expression is for validating emails. There doesn't seem to be agreement on that issue, which has been discussed and debated in many places on Stack Overflow and the broader Internet. This question is focused on understanding the purpose of control characters in the regular expression).
Update
I also reached out to the SonarQube community, and no one seems to have any answers.
Update
Still looking for authoritative answers which explain why the email regular expression above is specifically checking for non-printable control characters in email addresses.
There is this in RFC 5322 Section 5, but it's about the message body, not the address:
Security Considerations
Care needs to be taken when displaying messages on a terminal or
terminal emulator. Powerful terminals may act on escape sequences
and other combinations of US-ASCII control characters with a variety
of consequences. They can remap the keyboard or permit other
modifications to the terminal that could lead to denial of service or
even damaged data. They can trigger (sometimes programmable) answerback messages that can allow a message to cause commands to be issued on the recipient's behalf.
The Purpose
Can someone please explain the purpose of these control-characters in the regular expression [...]?
The purpose of those non-printable control characters would be to create a regex that conforms closely to the RFCs defining the email address format.
Just in case anyone is wondering: yes, the control characters in this email regex really do conform to the RFC specs. I think validating this is outside the scope of this question, so I won't quote the spec in detail, but here are links to the relevant sections: 3.2.3 (atoms), 3.2.4 (quoted strings), 3.4 (address specification), 3.4.1 (addr-spec specification), 4.1 (miscellaneous obsolete tokens). In summary, the local part and domain part of the address are allowed to contain quoted strings, which are allowed to contain certain non-printable control characters.
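For a concrete illustration, this sketch isolates just the quoted-string branch of the local part from the regex above; the variable name and test values are mine, not from any spec.

// Only the RFC 5322 quoted-string branch of the local part, pulled out of the
// full regex above, showing that control characters such as \x01 are accepted
// when (and only when) they appear inside double quotes.
const quotedLocalPart = /^"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*"$/;

console.log(quotedLocalPart.test('"ab\x01cd"')); // true:  control char inside a quoted string
console.log(quotedLocalPart.test('ab\x01cd'));   // false: bare control char, no quotes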
Quoting from SonarQube rule S6324 (emphasis added):
Entries in the ASCII table below code 32 are known as control characters or non-printing characters. As they are not common in JavaScript strings, using these invisible characters in regular expressions is most likely a mistake.
Following a spec is not a mistake. When a lint rule that is usually helpful hits a case in people's code where it is not helpful, people usually just use the lint tool's case-by-case ignore mechanism. I think this addresses the second clause of your bounty, which states:
What is a better alternative that will avoid breaking our site while also passing SonarQube's quality gate?
I.e., use one of the provided mechanisms to make SonarQube ignore those rule violations. You could also opt out of checking that rule entirely, but that's probably overkill.
For SonarQube, use NOSONAR comments to disable warnings on a case-by-case basis.
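For example (the variable name and the comment wording below are illustrative, not prescribed by SonarQube):

// The NOSONAR marker suppresses issues raised on this one line only,
// so the rest of the project is still checked against S6324.
const controlChars = /[\x01-\x08\x0b\x0c\x0e-\x1f]/; // NOSONAR: control characters are intentional (RFC 5322 quoted strings)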
Examples of Usefulness
This comes down to context.
If your end goal is purely to validate whether any given email address is a valid email address as defined by the RFCs, then a regex that closely follows the RFC specs is very useful.
That's not everyone's end goal. Quoting from wikipedia:
Despite the wide range of special characters which are technically valid, organisations, mail services, mail servers and mail clients in practice often do not accept all of them. For example, Windows Live Hotmail only allows creation of email addresses using alphanumerics, dot (.), underscore (_) and hyphen (-). Common advice is to avoid using some special characters to avoid the risk of rejected emails.
There's nothing there that explains why most applications do not fully adhere to the spec, but you could speculate, or you could go and ask their maintainers. For example, considerations such as simplicity could, in someone's context, be seen as more important than full RFC compliance.
If your goal were to check whether a given email address is a valid Hotmail address, and to reject addresses that are allowed by the RFCs but not by the subset Hotmail accepts, then full RFC compliance would not be useful.
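As a sketch of that narrower goal (the exact Hotmail rules may differ; this just follows the subset quoted from Wikipedia above):

// Deliberately narrow: alphanumerics, dot, underscore and hyphen in the local
// part only. It rejects many addresses the RFCs allow, on purpose.
const narrowEmail = /^[a-z0-9._-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+$/i;
console.log(narrowEmail.test("john.doe@example.com"));   // true
console.log(narrowEmail.test('"ab\x01cd"@example.com')); // false, even though it is RFC-valid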

Is a DNS query with the authoritative bit set (or other bits used for responses) considered valid?

From RFC 1035:
Authoritative Answer - this bit is valid in responses,
and specifies that the responding name server is an
authority for the domain name in question section.
So, what happens if this bit is set in a DNS query (QR=0)? Do most DNS implementations treat the packet as invalid, or would the bit simply be ignored?
The same question applies to other bits that are specific to either queries or responses, such as setting the RD bit in a response.
My guess is that these bits are simply ignored if they aren't applicable to the packet in question, but I don't know for sure or how I would find out.
I'm asking because I'm writing my own DNS packet handler and want to know whether such packets should still be parsed or treated as invalid.
You either apply Postel's law ("Be conservative in what you do, be liberal in what you accept from others"), which is often cited as one reason for the interoperability success of so many different things built on top of the Internet, or you strictly apply the RFC, deem the packet invalid, and reply immediately with FORMERR, for example.
In the second case, since you will get deviating clients (not necessarily for your specific case; in the DNS world there are a lot of non-conforming implementations on various points), you will need to decide whether to create specific rules (like ACLs) to accept some of them anyway because you deem them "important".
Note that at this stage your question is not really programming related (no code), so it is somewhat off-topic here. But the answer also depends on what kind of "packet handler" you are building. If it is for some kind of IDS/monitoring tool, you need to parse as much of the DNS traffic as possible in order to report on it. If it is meant to mimic a real-world DNS resolver and just behave like one, then you probably do not need to deal with every strange deviating case.
Also remember that all of this can be changed in transit, so if you receive something erroneous it is not necessarily an error introduced by the sender; it could come from some intermediary, willingly or not.
Finally, it is impossible to predict everything you will receive, and in any wide enough experiment you will be surprised by the amount of traffic you cannot explain. So instead of trying to define everything up front, iterate over versions, keeping a clear view of your target (parsing as much as possible for some kind of monitoring system, or staying as lean/simple/secure/close to real-world DNS resolution as possible).
As for "how I would find out": you can study the source of various existing resolvers (BIND, NSD, Unbound, etc.) and see how they react, or just launch them, throw the kind of erroneous packets you envision at them, and observe their replies. Some cases probably already exist as unit/regression tests, and tools like Zonemaster could probably be extended (if they don't cover these specific tests already) to cover your cases.

Tracker GET request parameters in BitTorrent

When using BitTorrent, I saw that the parameters "numwant", "corrupt" and "key" appear in the tracker URL.
However, these parameters aren't defined in BEP 3 (http://www.bittorrent.org/beps/bep_0003.html), so could someone tell me what the parameters mean and where the three parameters are defined?
Also, before asking the question, I searched for the keyword "numwant" on www.bittorrent.org and only found that "numwant" appears in BEP 8; I couldn't find a definition or explanation of the keyword.
While BEP 3 is official, it's a terse and dense document. I would instead recommend the unofficial https://wiki.theory.org/index.php/BitTorrentSpecification
It's a lot easier to read and understand. It also documents some early extensions to the protocol that you can't find elsewhere.
There you will find:
numwant: Optional. Number of peers that the client would like to receive from the tracker. This value is permitted to be zero. If omitted, typically defaults to 50 peers.
key: Optional. An additional identification that is not shared with any other peers. It is intended to allow a client to prove their identity should their IP address change.
Regarding corrupt, there is AFAIK no written documentation of how it is defined, but it's rather simple: when a piece fails the hash check, that amount of data is counted towards the corrupt counter instead of the downloaded counter.
There is also a similar redundant counter, which accounts for data that is discarded because it's redundant. This happens, for example, in end-game mode, when the same chunk is requested from more than one peer.
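To show where these parameters sit in practice, here is a sketch of an announce URL; the tracker host and all values are placeholders, and in a real client info_hash and peer_id are raw 20-byte values that must be percent-encoded byte by byte:

const announce =
  "http://tracker.example.com/announce" +
  "?info_hash=AAAAAAAAAAAAAAAAAAAA" + // SHA-1 of the info dict (placeholder)
  "&peer_id=-AB0001-AAAAAAAAAAAA" +   // 20-byte client id (placeholder)
  "&port=6881&uploaded=0&downloaded=0&left=1048576" +
  "&numwant=50" +    // optional: how many peers the client would like back
  "&key=a8f3c2d1" +  // optional: private token so the tracker can recognise the client across IP changes
  "&corrupt=0";      // optional: bytes that failed the hash check so far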
Also, there is some additional info in my answer here: Understanding Bittorrent Tracker Request

Relationship between Parameter set context and model mode?

Origen has modes for the top level DUT and IP. However, the mode API doesn't allow the flexibility to define attributes at will. There are pre-defined attributes, some of which (e.g. typ_voltage) look specific to a particular company or device.
In contrast, the Parameters module does allow flexible parameter/attribute definitions to be created within a 'context'. What is really the conceptual difference between a chip 'mode' and a parameter 'context'? They both require the user to set them.
add_mode :mymode do |m|
  m.typ_voltage = 1.0.V
  # I believe I am limited to what I can define here
end

define_params :mycontext do |params|
  params.i.can.put.whatever.i.want = 'bdedkje'
end
They both contain methods with_modes and with_params that look similar in function. Why not make the mode attributes work exactly like the more flexible params API?
thx
Being able to arbitrarily add named attributes to a mode seems like a good idea to me, but you are right that it is not supported today.
No particular reason for that other than nobody has seen a need for it until now, but there would be no problems accepting a PR to add it.
Ideally, when implementing that, it would be good to try and do it via a module which can then be included into other classes to provide the same functionality e.g. to give pins, bits, etc. the same ability.

Variable Number of Characteristics in a Custom GATT Service

I am defining a custom GATT profile and have some questions to which I could not find definitive answers in the Bluetooth specifications.
Can there be multiple characteristics of same type (UUID) defined in a single service?
Can there be variable number of characteristics of same type (UUID) in a service?
For example, depending upon system operation, a peripheral can accumulate a variable number of copies of some data.
Can these copies be sent as characteristics to the central when asked for?
Suppose we have a table of data and we want to give access to it in two forms – row wise and column wise.
Can such a requirement be handled in terms of characteristics?
I imagine it like this: if you request a read of the characteristic with UUID A, the table is read row-wise, and with UUID B column-wise; is this possible, and is it the right way to do it?
I've just found this unanswered question. Not sure if it's still needed, but here's my answer:
Yes. Page 2224 (Vol. 3, Part G: Generic Attribute Profile, 3.3.1 Characteristic Declaration) of Core_v4.2.pdf says: "A service may have multiple characteristic definitions with the same Characteristic UUID".
Yes, it's possible. But in this case you must implement the Service Changed characteristic. See Vol. 3, Part G: Generic Attribute Profile, 2.5.2 Attribute Caching and 7.1 Service Changed.
Yes. It is up to your implementation to define what data is hidden behind custom characteristics.
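To illustrate the row-wise/column-wise idea from the question without tying it to any particular BLE stack, here is a plain TypeScript sketch; the UUIDs and the shape of the read callbacks are invented, and how you actually register characteristics depends on your stack:

const table = [
  [1, 2, 3],
  [4, 5, 6],
];

// Two characteristics exposing the same table in different orders.
const rowWiseChar = {
  uuid: "0000aaaa-0000-1000-8000-00805f9b34fb", // "UUID A"
  read: () => Buffer.from(table.flat()),        // 1,2,3,4,5,6
};
const columnWiseChar = {
  uuid: "0000bbbb-0000-1000-8000-00805f9b34fb", // "UUID B"
  read: () => Buffer.from(table[0].map((_, c) => table.map(row => row[c])).flat()), // 1,4,2,5,3,6
};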
