How can we securely handle liveness-check messages in IKEv2 with the INVALID_IKE_SPI notify payload?

This is a question that has been on my mind, but I cannot come up with a solution.
Suppose there is an IKE tunnel between two peers (peer_1, peer_2), and an attacker wants to break this tunnel. For every keep-alive INFORMATIONAL request from peer_1 to peer_2, the attacker replies with an INVALID_IKE_SPI notify payload, which is necessarily sent in plaintext. This leads peer_1 to believe the IKE_SA has a problem, and after some implementation-specific number of retries peer_1 closes the tunnel (RFC 7296 does specify that a peer receiving such a reply should not change its state, but the keep-alive retries have to stop at some point to avoid flooding the network). As a result, the attacker wins.
Does the IKEv2 protocol itself say anything about preventing this type of situation?
If anyone knows about this, please reply; any possible solution would also be helpful.

Citing RFC 7296, section 2.4, paragraph 3:
Since IKE is designed to operate in spite of DoS attacks from the
network, an endpoint MUST NOT conclude that the other endpoint has
failed based on any routing information (e.g., ICMP messages) or IKE
messages that arrive without cryptographic protection (e.g., Notify
messages complaining about unknown SPIs). An endpoint MUST conclude
that the other endpoint has failed only when repeated attempts to
contact it have gone unanswered for a timeout period or when a
cryptographically protected INITIAL_CONTACT notification is received
on a different IKE SA to the same authenticated identity. An
endpoint should suspect that the other endpoint has failed based on
routing information and initiate a request to see whether the other
endpoint is alive. To check whether the other side is alive, IKE
specifies an empty INFORMATIONAL request that (like all IKE requests)
requires an acknowledgement (note that within the context of an IKE
SA, an "empty" message consists of an IKE header followed by an
Encrypted payload that contains no payloads). If a cryptographically
protected (fresh, i.e., not retransmitted) message has been received
from the other side recently, unprotected Notify messages MAY be
ignored. Implementations MUST limit the rate at which they take
actions based on unprotected messages.
I think that (for the sake of clarity) the relevant types of attacker should be considered:
1/ An attacker able to drop arbitrary packets (i.e., an active MitM)
This one is able to perform a DoS just by dropping packets, and AFAIK there is nothing that can prevent that. No sophistication is needed to break the communication.
2/ An attacker unable to drop packets
This one cannot prevent peer_2's legitimate responses (to peer_1's INFORMATIONAL requests) from reaching peer_1,
so peer_1 receives the response (before all retries time out) and knows that peer_2 is alive.
3/ An attacker able to drop some packets
Then it is a race, and the outcome depends on the configuration of the peers and on the percentage of packets the attacker is able to drop.
EDIT>
I would understand the questioned "case 2 attacker" scenario this way:
By receiving the attacker's unprotected INVALID_IKE_SPI notify (spoofed from peer_2's address), peer_1 can (at most) only suspect that peer_2 has failed (as it MUST NOT conclude that the other endpoint has failed based on IKE messages that lack cryptographic protection).
It may decide (see note below) to issue a liveness check by sending an empty INFORMATIONAL request to peer_2 (which is cryptographically protected).
The "case 2 attacker" is unable to tamper with this request, so it should reach peer_2 (possibly after some implementation-specific retransmits, as specified).
peer_2 (as it is alive) responds with an acknowledgement (which is cryptographically protected).
The "case 2 attacker" is unable to tamper with this response either, so it should reach peer_1.
Upon receiving this response (a fresh, cryptographically protected message from peer_2), peer_1 knows that peer_2 is alive and keeps the SAs (as if nothing had happened).
Note: The "Implementations MUST limit the rate at which they take actions based on unprotected messages" part means that peer_1 should not perform this liveness check on every unprotected Notify message received; some implementation-specific rate-limiting mechanism must be in place (probably to prevent traffic amplification).
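To make that concrete, here is a minimal, purely illustrative sketch (not taken from any real IKEv2 implementation; all class and method names are invented) of how such a guard could combine the rate limit with the liveness-check flow above: an unprotected INVALID_IKE_SPI never deletes the SA by itself, it only triggers a rate-limited, cryptographically protected liveness check, and the SA is torn down only if that protected exchange times out.

    // Hypothetical sketch only; names are invented for illustration.
    final class IkeSaLivenessGuard {
        // Implementation-specific rate limit for acting on unprotected messages.
        private static final long MIN_CHECK_INTERVAL_MS = 60_000;
        private long lastCheckStartedMs = 0;

        /** Called when an unprotected INVALID_IKE_SPI notify arrives for this IKE SA. */
        synchronized void onUnprotectedInvalidIkeSpi(long nowMs) {
            if (nowMs - lastCheckStartedMs < MIN_CHECK_INTERVAL_MS) {
                return; // MUST limit the rate of actions taken on unprotected messages
            }
            lastCheckStartedMs = nowMs;
            startLivenessCheck(); // send an empty, cryptographically protected INFORMATIONAL request
        }

        /** Called when a fresh, protected response arrives: peer_2 is alive, keep the SAs. */
        void onProtectedResponse() { /* nothing to do */ }

        /** Called only after all retransmits of the protected request have timed out. */
        void onLivenessCheckTimeout() {
            teardownIkeSa(); // the only path on which the SA may actually be deleted
        }

        private void startLivenessCheck() { /* enqueue the empty INFORMATIONAL request */ }
        private void teardownIkeSa()      { /* delete the IKE SA and its Child SAs */ }
    }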
Disclaimer: I am no crypto expert, so please do validate my thoughts.

Related

Clarification on "Credit-Control-Failure-Handling" AVP

I need clarification on the "Credit-Control-Failure-Handling" AVP, and I'd appreciate it if someone could explain its enumerated values.
So, as I understand it, if CCFH is in TERMINATE mode (0) and the client doesn't receive a CCA within the Tx timer period, then the request is regarded as failed and the session will basically be terminated.
However, if CCFH is in CONTINUE mode (1), quoting from RFC 4006:
When the Credit-Control-Failure-Handling AVP is set to CONTINUE,
the credit-control client SHOULD re-send the request to an
alternative server in the case of transport or temporary failures,
provided that a failover procedure is supported in the credit-
control server and the credit-control client, and that an
alternative server is available. Otherwise, the service SHOULD be
granted, even if credit-control messages can't be delivered.
So, my understanding is that, unlike TERMINATE mode, if the CCA does not arrive within the Tx timer, the client would still provide the service to the end user.
My question is: what if the server actually sends a CCA error message (like DIAMETER_TOO_BUSY or other error messages) to the client within the Tx timer? Does the client still provide the service to the end user or not?
TERMINATE means exactly that: terminate.
Later on, the same document (or actually its newer version, RFC 8506) says:
When the Tx timer expires, the Diameter Credit-Control client always
terminates the service if the CCFH is set to the value TERMINATE.
The credit-control session may be moved to an alternative server only
if a protocol error DIAMETER_TOO_BUSY or DIAMETER_UNABLE_TO_DELIVER
is received before the Tx timer expires. Therefore, the value
TERMINATE is not appropriate if proper failover behavior is desired.
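A hypothetical, simplified sketch (all names invented; not taken from any Diameter stack) of how a credit-control client might apply the two quoted rules, distinguishing the Tx-timer-expired case from an error CCA such as DIAMETER_TOO_BUSY (3004) or DIAMETER_UNABLE_TO_DELIVER (3002) that arrives before the timer expires:

    // Illustrative only; a real client must also handle RETRY_AND_TERMINATE, Tcc, etc.
    enum Ccfh { TERMINATE, CONTINUE }

    final class CreditControlClient {
        static final long DIAMETER_TOO_BUSY = 3004;
        static final long DIAMETER_UNABLE_TO_DELIVER = 3002;

        /** Called on an error CCA, or with resultCode == 0 when the Tx timer expires. */
        void onFailure(long resultCode, Ccfh ccfh, boolean txTimerExpired) {
            boolean transientProtocolError =
                    resultCode == DIAMETER_TOO_BUSY || resultCode == DIAMETER_UNABLE_TO_DELIVER;

            if (ccfh == Ccfh.TERMINATE) {
                if (transientProtocolError && !txTimerExpired && alternativeServerAvailable()) {
                    failover();         // the only case in which TERMINATE may move the session
                } else {
                    terminateService(); // once Tx has expired, TERMINATE always ends the service
                }
            } else { // CONTINUE
                if (transientProtocolError && alternativeServerAvailable()) {
                    failover();         // SHOULD re-send the request to an alternative server
                } else {
                    grantService();     // otherwise the service SHOULD be granted anyway
                }
            }
        }

        private boolean alternativeServerAvailable() { return false; } // placeholder
        private void failover()         { /* re-send the CCR to an alternative server */ }
        private void terminateService() { /* terminate the end user's service */ }
        private void grantService()     { /* grant service without credit control */ }
    }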

Verify routing via PDUR

In order to verify whether a message is received in the COM layer, we can add an I-PDU callout for the PDU/signal and wait for the breakpoint to be hit while debugging.
This is not the case for PDU routing.
If a message is routed via the PduR, it never goes to the Com layer.
Hence there is no way to verify whether the message is received by the device (i.e., PduR has no callout functionality).
Is there a way to verify that the message is received by PduR and is successfully copied to a Tx PDU to be sent out (i.e., to verify successful gatewaying)?
Keep in mind that PduR can sometimes have multiple destinations; we have ECUs that route messages e.g. locally to Com and, at the same time, route them for transmission on a different network.
The PduR is triggered by RxIndications and TxConfirmations (and their TP-interface counterparts).
So, for a normal routing relationship, you should hook onto the RxIndication for the RxPdu and could e.g. wait for the TxConfirmation of the TxPdu, which tells you that the TxPdu was transmitted (see the sketch after the list below).
Keep in mind that:
An RxPdu could be queued, which means it may not be transmitted immediately. This can be handy for streaming PDUs like XCP, in order to keep the ordering of the PDUs when they currently cannot be transmitted.
Routing paths might be enabled/disabled at runtime, e.g. due to system conditions handled by BswM rules and action lists calling PduR_[Enable|Disable]Routing(<routingpathgroupId>).
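Expressed abstractly (the real hooks are C functions such as PduR_CanIfRxIndication and the corresponding TxConfirmation callbacks, so the Java below is purely an illustration of the correlation logic, not AUTOSAR code): record when the RxIndication for the routed RxPdu is observed and check that the TxConfirmation for the mapped TxPdu follows within an acceptable latency.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Purely illustrative; IDs, threshold and method names are assumptions.
    final class GatewayRoutingVerifier {
        private static final long MAX_GATEWAY_LATENCY_MS = 100; // assumed acceptance threshold
        private final ConcurrentMap<Integer, Long> rxTimestamps = new ConcurrentHashMap<>();

        /** Record the moment the RxIndication for the routed RxPdu was observed. */
        void onRxIndication(int routingPathId, long nowMs) {
            rxTimestamps.put(routingPathId, nowMs);
        }

        /** Check that the TxConfirmation of the mapped TxPdu followed in time. */
        boolean onTxConfirmation(int routingPathId, long nowMs) {
            Long rx = rxTimestamps.remove(routingPathId);
            return rx != null && (nowMs - rx) <= MAX_GATEWAY_LATENCY_MS;
        }
    }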

MQTT: what is the purpose or usage of Last Will and Testament?

I'm surely missing something about how the whole MQTT protocol works, as I can't grasp the usage pattern of Last Will Testament messages: what's their purpose?
One example I often see is about informing that a device has gone offline. It doesn't make very much sense to me, since it's obvious that if a device isn't publishing any data it may be offline or there could be some network problems.
So, what are some practical usages of the LWT? What was it invented for?
LWT messages are not really concerned about detecting whether a client has gone offline or not (that task is handled by keepAlive messages).
LWT messages are about what happens after the client has gone offline.
The analogy is that of a real last will:
Before a person dies, she can formulate a testament in which she declares what actions should be taken after she has passed away. An executor will heed those wishes and carry them out on her behalf.
The analogy in the MQTT world is that a client can formulate a testament, in which it declares what message should be sent on its behalf by the broker after it has gone offline.
A fictitious example:
I have a sensor, which sends crucial data, but very infrequently.
It has formulated a last will statement in the form of [topic: '/node/gone-offline', message: ':id'], with :id being a unique id for the sensor. I also have an emergency-subscriber for the topic 'node/gone-offline', which will send an SMS to my phone every time a message is published on that channel.
During normal operation, the sensor will keep the connection to the MQTT-broker open by sending periodic keepAlive messages interspersed with the actual sensor readings. If the sensor goes offline, the connection to the broker will time out, due to the lack of keepAlives.
This is where LWT comes in: if no LWT is specified, the broker doesn't care and just closes the connection. In our case, however, the broker will execute the sensor's last will and publish the LWT message '/node/gone-offline: :id'. The message will then be consumed by my emergency-subscriber, and I will be notified of the sensor's ID via SMS so that I can check up on what's going on.
In short:
Instead of just closing the connection after a client has gone offline, LWT messages can be leveraged to define a message to be published by the broker on behalf of the client, since the client is offline and cannot publish anymore.
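As a concrete illustration of the fictitious sensor example above, here is a minimal sketch using the Eclipse Paho Java client (the broker URL, client id and topics are made-up assumptions): the sensor registers its last will at connect time, and the broker publishes that will only if the connection is lost without a clean DISCONNECT.

    import org.eclipse.paho.client.mqttv3.MqttClient;
    import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
    import org.eclipse.paho.client.mqttv3.MqttException;

    public class SensorNode {
        public static void main(String[] args) throws MqttException {
            MqttClient client = new MqttClient("tcp://broker.example.com:1883", "sensor-42");

            MqttConnectOptions options = new MqttConnectOptions();
            // The last will: published by the broker on our behalf if we go offline ungracefully.
            options.setWill("node/gone-offline", "sensor-42".getBytes(), 1, false);
            options.setKeepAliveInterval(30); // seconds; the broker detects a dead client via this

            client.connect(options);

            // Normal operation: publish readings (retained, so new subscribers get the last value).
            client.publish("node/sensor-42/reading", "23.5".getBytes(), 1, true);
        }
    }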
Just because a device is not publishing does not mean it is not online or there is a network problem.
Take, for example, a sensor that monitors a value that changes only very infrequently. Good design says the sensor should publish only the changes, to help reduce bandwidth usage, since periodically publishing the same value is wasteful. If the value is published as a retained value, then any new subscriber will always get the current value without having to wait for the sensor value to change and be published again.
In this case the LWT is used to publish when the sensor fails (or there is a network problem), so we know of the problem as soon as the client keep-alive times out.
An in-depth article about Last-Will-and-Testament messages is available in the MQTT Essentials blog post series: http://www.hivemq.com/mqtt-essentials-part-9-last-will-and-testament/.
To summarize the blog post:
The Last Will and Testament feature is used in MQTT to notify other clients about an ungracefully disconnected client.
MQTT is often used in scenarios where unreliable networks are very common. It is therefore assumed that some clients will disconnect ungracefully from time to time, because they lost the connection, their battery is empty, or for any other imaginable reason. It would be good to know whether a connected client disconnected gracefully (that is, with an MQTT DISCONNECT message) or not, in order to take appropriate action.

Using SSL with Netty at the beginning of a connection, then disabling it

I'm writing a server application and its client counterpart that both use Netty for the network layer. I find myself facing typical safety concerns about sending a password from a client to the server so I decided SSL was the safest way of doing this.
I know of the securechat example and will use this to modify my pipelines accordingly. However, I would also like to disable SSL after the password transmission and acknowledgement, to save a few precious CPU cycles on the server side, which may be busy with many other clients. The ChannelPipeline documentation states that:
"Once attached, the coupling between the channel and the pipeline is permanent; the channel cannot attach another pipeline to it nor detach the current pipeline from it."
The idea is then to not change the pipeline on-the-fly, which is prohibited, but to somehow tell the SslHandler in the pipeline that it should stop encrypting messages at some point. I was thinking of creating a class inheriting from SslHandler, overriding its handleDownstream function to call context.sendDownstream(evt) after some point in the communication.
Question 1: Is this a bad idea, that is, disabling SSL at some point ?
To allow a block in the pipeline (say a Decoder) telling another block (say SslHandler) that it should change its behaviour from now on, I thought I could create, say, an AtomicBoolean in my ChannelPipelineFactory's getPipeline() and pass it to the constructor of both the Decoder and the SslHandler.
Question 2: Is this a bad idea, that is, sharing state between pipeline blocks ? I'm worried I might screw up the multithreading of Netty here: are the blocks of a pipeline working on a single message, one at a time ? i.e.: does the first block wait for the completion of the last block before pulling the next message ?
EDIT:
Oh my bad, this is from the ChannelPipeline page I had been visiting many times and quoting in this very question:
"A ChannelHandler can be added or removed at any time because a ChannelPipeline is thread safe. For example, you can insert a SslHandler when sensitive information is about to be exchanged, and remove it after the exchange."
So this answers question 2 about modifying the pipeline's content on-the-fly, and not the pipeline reference itself.
I'm not sure about the efficacy of turning off SSL once established, but I think you have misinterpreted the mutability of the pipeline. Once a given channel is associated with a pipeline, that association is immutable. However, the handlers in the pipeline can be safely modified. That is to say, you can add and remove handlers as your protocol requires. Accordingly, you should be able to remove the SSL handler once it has served its purpose.
You can remove the SslHandler from the pipeline with ChannelPipeline.remove(..); then it should turn your connection into plaintext. Please file a bug if it does not work - we actually have not tried that scenario in production :-)
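For illustration, a minimal sketch of that approach against the Netty 4 API (the question itself targets Netty 3, where ChannelPipeline.remove exists as well); the handler class, the isAuthAcknowledged check and its position in the pipeline are assumptions, not part of the original answers.

    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;
    import io.netty.handler.ssl.SslHandler;

    public class AuthHandler extends ChannelInboundHandlerAdapter {
        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) {
            // Hypothetical condition: the password exchange has completed and been acknowledged.
            if (isAuthAcknowledged(msg)) {
                // Removing the SslHandler switches the rest of the connection to plaintext.
                if (ctx.pipeline().get(SslHandler.class) != null) {
                    ctx.pipeline().remove(SslHandler.class);
                }
            }
            ctx.fireChannelRead(msg);
        }

        private boolean isAuthAcknowledged(Object msg) {
            return false; // protocol-specific; placeholder only
        }
    }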
I'm not sure about Netty, but in principle, you could indeed carry on with plain traffic on the same TCP connection. There are a few downsides:
Only the authentication would be secured. A MITM could perform actions other than those intended by the user. (This is similar to using HTTP Digest to some extent: the credentials are protected, but the request/response entities aren't.)
From an implementation point of view, this is tricky to get right. The TLS specification says:
If the application protocol using TLS provides that any data may be
carried over the underlying transport after the TLS connection is
closed, the TLS implementation must receive the responding
close_notify alert before indicating to the application layer that
the TLS connection has ended.
This implies that you're going to have to synchronise your stream somehow, to wait for the close_notify response before carrying on with your plain traffic.
The SSLEngine programming model is rather complex, and you may find that the Netty API isn't necessarily handling this situation.
While it may make sense to want to save a few CPU cycles, most of the SSL/TLS overhead is in the handshake, which you'll be doing anyway. The symmetric cryptographic operations used for the actual encryption of the data are much less expensive. (You should try to measure this overhead to see if it really is a problem.)

How to avoid flooding a message queue?

I'm working on an application that is divided in a thin client and a server part, communicating over TCP. We frequently let the server make asynchronous calls (notifications) to the client to report state changes. This avoids that the server loses too much time waiting for an acknowledgement of the client. More importantly, it avoids deadlocks.
Such deadlocks can happen as follows. Suppose the server sends the state-changed notification synchronously (please note that this is a somewhat constructed example). When the client handles the notification, the client needs to synchronously ask the server for information. However, the server cannot respond, because it is still waiting for an answer to its own request.
Now, this deadlock is avoided by sending the notification asynchronously, but this introduces another problem. When asynchronous calls are made more rapidly than they can be processed, the call queue keeps growing. If this situation is maintained long enough, the call queue will get totally full (flooded with messages). My question is: what can be done when that happens?
My problem can be summarized as follows. Do I really have to choose between sending notifications without blocking at the risk of flooding the message queue, or blocking when sending notifications at the risk of introducing a deadlock? Is there some trick to avoid flooding the message queue?
Note: To repeat, the server does not stall when sending notifications. They are sent asynchronously.
Note: In my example I used two communicating processes, but the same problem exists with two communicating threads.
If the server is sending informational messages to the client, which you yourself say are asynchronous, it should not have to wait for a reply from the client. If they are not informational, in other words they require an answer, I would say a server should never send such messages to a client, and their presence indicates a poor design.
If you have a constant congestion problem, there is little you can do other than gracefully fail and notify the client that no new messages can be posted; then it is up to the client to maintain a backlog of messages to be posted.
Introducing a priority queue and using message expiration/filtering could allow you to free up space in the queue, but that really just postpones the problem. If possible, you could also aggregate messages or ignore duplicate messages, but again the problem does not seem to be the queue itself. (Not to mention that the more complex queue logic could eat up valuable resources that would be better used actually processing messages.)
Depending on what the server side does, you could introduce result hashing for long computations, offload some types of messages to a dedicated device, check if the server waits unreasonably long for I/O operations, and a myriad of other techniques. Profile if possible, at least try to find out which message(s) causes congestion.
Oh, and the business solution: Compare cost of estimated development time to the cost of better hardware and conclude that you should just buy a more powerful server (or an additional one).
Depending on how important these messages are, you might want to look into Message Expiration, or perhaps a Message Filter, though it sounds like your architecture may be incorrect.
I would rather fix the logic on the server side. The message queue should not stall waiting for the answer. Rather, have a state machine that can also receive those info queries while it is waiting for the answer from the client.
Of course you can still flood your message queue, but with TCP you can handle it pretty easily.
The best way, I believe, would be to add another state to your client. I borrowed this from the SMPP protocol specs.
Add a congestion state to the client, whereby it always checks the queue length (assuming this is possible). Once a certain threshold is reached, say 1000 unprocessed messages, the client sends the server a message indicating that it is congested, and the server is required to cease all messaging until it receives a notification indicating that the client is no longer congested.
Alternatively, on the server side, if there is a certain number of pending replies, the server could simply cease sending messages until the client has replied to a certain number of them.
These thresholds can be dynamically calculated or fixed, depending on your requirements.
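A minimal, illustrative sketch (all names hypothetical) of that client-side congestion state: notifications are queued in a bounded queue, the client asks the server to stop once a high-water mark is crossed, and tells it to resume after the backlog drains below a low-water mark.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    final class NotificationReceiver {
        private static final int HIGH_WATER_MARK = 1000; // e.g. 1000 unprocessed messages
        private static final int LOW_WATER_MARK = 200;

        private final BlockingQueue<String> pending = new ArrayBlockingQueue<>(10_000);
        private volatile boolean congested = false;

        /** Called by the network layer for every asynchronous notification. */
        void onNotification(String notification) throws InterruptedException {
            pending.put(notification); // blocks only if the hard cap (10 000) is ever reached
            if (!congested && pending.size() >= HIGH_WATER_MARK) {
                congested = true;
                sendToServer("CONGESTED"); // server must cease all messaging
            }
        }

        /** Worker loop that processes notifications at its own pace. */
        void processLoop() throws InterruptedException {
            while (true) {
                handle(pending.take());
                if (congested && pending.size() <= LOW_WATER_MARK) {
                    congested = false;
                    sendToServer("RESUME"); // server may start sending again
                }
            }
        }

        private void handle(String notification) { /* application-specific processing */ }
        private void sendToServer(String control) { /* application-specific control message */ }
    }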
