CCFH - Does a retransmitted CCR message set the T bit during CCFH failover? - diameter-protocol

CCFH - Does a CCR message set the T bit during failover?
Scenario:
I have a client and two OCS servers.
I didn't get the CCA response (CCA-I) from the first OCS.
After my Tx timer expired, I am retransmitting the CCR-I to the second OCS.
The question is:
Does the retransmitted CCR need the T bit to be set?
I cannot see this specifically mentioned anywhere in the RFC.

You can find the answer in RFC 4006, Diameter Credit-Control Application:
If the timer Tx expires, the credit-control client MUST continue the
service and wait for a possible late answer. If the request times
out, the credit-control client re-transmits the request (marked with
T-flag) to a backup credit-control server, if possible. If the re-
transmitted request also times out, or if a temporary error is
received in answer, the credit-control client buffers the request if
the value of the Direct-Debiting-Failure-Handling AVP is set to
TERMINATE_OR_BUFFER.
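As a rough illustration of what "marked with T-flag" means on the wire, here is a minimal Python sketch that sets the T bit in the command flags byte of a Diameter header. The flag values and header layout follow the base protocol (RFC 6733); the helper names `build_header` and `mark_retransmitted` and the identifier values are made up for this example, and the message carries no AVPs.

```python
import struct

# Command flag bits of the Diameter header (RFC 6733, section 3).
FLAG_R = 0x80  # Request
FLAG_P = 0x40  # Proxiable
FLAG_E = 0x20  # Error
FLAG_T = 0x10  # Potentially retransmitted message

CCR_COMMAND_CODE = 272   # Credit-Control
CC_APPLICATION_ID = 4    # Diameter Credit-Control Application


def build_header(flags, hop_by_hop, end_to_end, length=20):
    """Pack a 20-byte Diameter header (hypothetical helper, no AVPs)."""
    version_and_length = (1 << 24) | length
    flags_and_code = (flags << 24) | CCR_COMMAND_CODE
    return struct.pack("!IIIII", version_and_length, flags_and_code,
                       CC_APPLICATION_ID, hop_by_hop, end_to_end)


def mark_retransmitted(message):
    """Return a copy of the message with the T bit set in the flags byte.

    Everything else, including the End-to-End Identifier, is left
    unchanged so that the receiver can recognise a possible duplicate.
    """
    flags = message[4] | FLAG_T
    return message[:4] + bytes([flags]) + message[5:]


# Identifier values below are arbitrary illustration values.
original = build_header(FLAG_R | FLAG_P, hop_by_hop=0x1001, end_to_end=0xABCD)
retransmitted = mark_retransmitted(original)
```

A real client would normally let its Diameter stack set this flag when it resends the pending request to the backup server; the sketch only shows which bit changes.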

Related

Clarification on the "Credit-Control-Failure-Handling" AVP

I need clarification on the "Credit-Control-Failure-Handling" AVP, and I'd appreciate it if someone could explain its enumerated values.
As I understand it, if CCFH is in TERMINATE mode (0) and the client doesn't receive a CCA within the Tx timer period, the request is regarded as failed and the session is terminated.
However, if CCFH is in CONTINUE mode (1), quoting RFC 4006:
When the Credit-Control-Failure-Handling AVP is set to CONTINUE,
the credit-control client SHOULD re-send the request to an
alternative server in the case of transport or temporary failures,
provided that a failover procedure is supported in the credit-
control server and the credit-control client, and that an
alternative server is available. Otherwise, the service SHOULD be
granted, even if credit-control messages can't be delivered.
So my understanding is that, unlike in TERMINATE mode, if no CCA arrives within the Tx timer, the client provides the service to the end user.
My question is: what if the server actually sends a CCA with an error (DIAMETER_TOO_BUSY or another error) to the client within the Tx timer? Does the client still provide the service to the end user or not?
TERMINATE means exactly that: terminate.
Later on, the same document (or actually its newer version, RFC 8506) says:
When the Tx timer expires, the Diameter Credit-Control client always
terminates the service if the CCFH is set to the value TERMINATE.
The credit-control session may be moved to an alternative server only
if a protocol error DIAMETER_TOO_BUSY or DIAMETER_UNABLE_TO_DELIVER
is received before the Tx timer expires. Therefore, the value
TERMINATE is not appropriate if proper failover behavior is desired.
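The rules quoted above can be sketched as a small decision function. This is a simplified reading of the quoted text, not the full RFC state machine: it covers only Tx expiry and the two protocol errors named above, and the function name `next_action`, the event strings, and the returned action names are assumptions made for this example.

```python
from enum import Enum


class CCFH(Enum):
    TERMINATE = 0
    CONTINUE = 1
    RETRY_AND_TERMINATE = 2


# Protocol errors the quoted text names as allowing a move to another server.
FAILOVER_ERRORS = {"DIAMETER_TOO_BUSY", "DIAMETER_UNABLE_TO_DELIVER"}
KNOWN_EVENTS = FAILOVER_ERRORS | {"TX_EXPIRED"}


def next_action(ccfh, event, backup_available):
    """Return 'failover', 'grant' or 'terminate' for a failed CCR.

    event is 'TX_EXPIRED' or the name of a protocol error that was
    received before the Tx timer expired (hypothetical event names).
    """
    if event not in KNOWN_EVENTS:
        raise ValueError(f"event not covered by this sketch: {event}")
    if ccfh is CCFH.TERMINATE:
        # Tx expiry always terminates; only the listed errors allow failover.
        if event in FAILOVER_ERRORS and backup_available:
            return "failover"
        return "terminate"
    if backup_available:
        return "failover"
    # No alternative server: CONTINUE grants service, RETRY_AND_TERMINATE does not.
    return "grant" if ccfh is CCFH.CONTINUE else "terminate"
```

Under these assumptions, `next_action(CCFH.CONTINUE, "DIAMETER_TOO_BUSY", backup_available=False)` returns `"grant"`, which matches the "Otherwise, the service SHOULD be granted" sentence in the quote.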

How to handle validity time expiry in the Credit Control Server?

I want to know what happens when the validity time expires. What I know is that the state changes to Idle and the session is removed, but what kind of request do we send to the client?
The Tcc timer supervises an ongoing credit-control session in the
credit-control server. It is RECOMMENDED to use the Validity-Time
as input to set the Tcc timer value. In case of transient failures in
the network, the Diameter credit-control server might change to Idle
state. To avoid this, the Tcc timer MAY be set so that Tcc equals to
2 x Validity-Time.
When the timer expires, Diameter needs to notify the connected devices that this session is no longer active, so that they delete their information about it. Diameter sends a Session-Termination-Request and deletes all data related to this session, including from the database.
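To make the "Tcc equals 2 x Validity-Time" recommendation concrete, here is a minimal sketch of how a server might derive the timer value and check for expiry. The function names and the `tolerate_transient_failures` switch are hypothetical; times are plain seconds.

```python
def tcc_timeout(validity_time_s, tolerate_transient_failures=True):
    """Derive a Tcc value in seconds from a Validity-Time in seconds.

    With tolerate_transient_failures the value is doubled, following the
    "Tcc equals 2 x Validity-Time" suggestion in the quoted text.
    """
    factor = 2 if tolerate_transient_failures else 1
    return factor * validity_time_s


def session_expired(last_request_at_s, now_s, validity_time_s):
    """True once no request has arrived for a full Tcc period."""
    return now_s - last_request_at_s >= tcc_timeout(validity_time_s)
```

For a Validity-Time of 300 seconds this gives a Tcc of 600 seconds, so a client that misses one update because of a short network problem does not immediately lose its session on the server.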

When does a client timeout start?

Okay, here is the scenario:
A client sends a request at 10:00:00 (H:M:S). The request is stored in the IIS app pool queue until a thread is available for it. A thread is released and the app pool receives the request to process. The time is now 10:00:15.
When did the client start waiting for its response: at 10:00:00 or 10:00:15?
The client timeout period started at 10:00:00. The client has no idea what's going on with the internals of the server, or even with network latency. All it knows is when the request was sent and when a response was received (if at all).
While there may be more granular timeouts at the platform-specific message handler level (SendTimeout, ReceiveHeadersTimeout, ReceiveDataTimeout), the Timeout defined on .NET Standard-compliant implementations of HttpClient is end-to-end. Per Microsoft:
The HttpClient.Timeout property is intended to be exactly what you are
referring to as the 99% case: an end-to-end timeout after which the
request expires. The WinHttpHandler API is intended to provide a
deeper control to developers for more advanced scenarios. In keeping
with this intention, we have more granular timeouts on that type since
we have gotten developer requests in the past who asked for control
over a specific stage of the request.
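The arithmetic behind an end-to-end timeout can be sketched in a few lines. The example is in Python rather than .NET, the helper name `remaining_budget` is made up, and the 100-second timeout is just an assumed value for illustration.

```python
def remaining_budget(timeout_s, sent_at_s, now_s):
    """Seconds left before an end-to-end client timeout fires.

    The clock starts when the request is sent, so time spent in a
    server-side queue counts against the budget.
    """
    return max(0.0, timeout_s - (now_s - sent_at_s))


sent_at = 0.0        # 10:00:00, the request leaves the client
dequeued_at = 15.0   # 10:00:15, the app pool starts processing it
left = remaining_budget(100.0, sent_at, dequeued_at)  # assumed 100 s timeout
```

With these numbers the server has 85 seconds left to respond, not 100, because the 15 seconds in the queue have already been spent from the client's point of view.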

How can we securely handle liveness checking messages in IKEv2 with notify payload INVALID_IKE_SPI?

This question has been on my mind, but I cannot come up with a solution.
Suppose there is an IKE tunnel between two peers (peer_1, peer_2), and an attacker who wants to break this tunnel. For every keep-alive INFORMATIONAL request from peer_1 to peer_2, the attacker replies with an INVALID_IKE_SPI notify payload, and this message is obviously in plain text. As a result, peer_1 believes the IKE_SA has a problem and, after an implementation-specific number of retries, closes the tunnel. (RFC 7296 specifies that a peer receiving such a reply should not change its state, but the keep-alive retries have to end at some point to avoid flooding the network.) So the attacker wins.
Does the IKEv2 protocol itself say anything to prevent this type of situation?
If anyone knows about this, please reply; a solution would also be helpful.
Citing RFC 7296, section 2.4, paragraph 3:
Since IKE is designed to operate in spite of DoS attacks from the
network, an endpoint MUST NOT conclude that the other endpoint has
failed based on any routing information (e.g., ICMP messages) or IKE
messages that arrive without cryptographic protection (e.g., Notify
messages complaining about unknown SPIs). An endpoint MUST conclude
that the other endpoint has failed only when repeated attempts to
contact it have gone unanswered for a timeout period or when a
cryptographically protected INITIAL_CONTACT notification is received
on a different IKE SA to the same authenticated identity. An
endpoint should suspect that the other endpoint has failed based on
routing information and initiate a request to see whether the other
endpoint is alive. To check whether the other side is alive, IKE
specifies an empty INFORMATIONAL request that (like all IKE requests)
requires an acknowledgement (note that within the context of an IKE
SA, an "empty" message consists of an IKE header followed by an
Encrypted payload that contains no payloads). If a cryptographically
protected (fresh, i.e., not retransmitted) message has been received
from the other side recently, unprotected Notify messages MAY be
ignored. Implementations MUST limit the rate at which they take
actions based on unprotected messages.
I think that (for the sake of clarity) the relevant types of an attacker should be considered:
1/ An attacker able to drop arbitrary packets (i.e. an active MitM)
This one can perform a DoS just by dropping packets, and AFAIK nothing can prevent that. No sophistication is needed to break the communication.
2/ An attacker unable to drop packets
This one cannot prevent peer_2's legitimate responses (to peer_1's INFORMATIONAL requests) from reaching peer_1.
Thus peer_1 receives the response (before all retries time out) and knows that peer_2 is alive.
3/ An attacker able to drop some packets
Then it is a race, and the outcome depends on the configuration of the peers and the percentage of packets the attacker is able to drop.
EDIT>
I would understand the questioned "case 2 attacker" scenario this way:
by receiving the attacker's unprotected INVALID_IKE_SPI notify (spoofed by the attacker from peer_2's address) peer_1 can (at most) only suspect that peer_2 has failed (as it MUST NOT conclude that the other endpoint has failed based on IKE messages without cryptographic protection)
it may decide (see note below) to issue a liveness check by sending an empty INFORMATIONAL request to peer_2 (which is cryptographically protected)
the "case 2 attacker" is unable to tamper with this request, so it should reach peer_2 (it might involve some implementation-specific retransmits, as specified)
peer_2 (as it is alive) responds with an acknowledgement (which is cryptographically protected)
the "case 2 attacker" is unable to tamper with this response, so it should reach peer_1
upon receiving this response (which is a fresh, cryptographically protected message from peer_2), peer_1 knows that peer_2 is alive and keeps the SAs (as nothing has happened)
Note: The "Implementations MUST limit the rate at which they take actions based on unprotected messages" part means that peer_1 should not perform this liveness check on every unprotected Notify message received, and some implementation-specific rate limiting mechanism must be in place (probably to prevent traffic amplification).
Disclaimer: I am no crypto expert, so please do validate my thoughts.
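The handling described above can be sketched as a small policy object: an unprotected Notify never tears down the SA, it is ignored if a protected message was seen recently, and otherwise it triggers at most one liveness check per interval. The class name, method names, and both default intervals are assumptions for this example; RFC 7296 leaves the actual values to the implementation.

```python
class LivenessPolicy:
    """Decide how to react to unprotected Notify messages (illustrative)."""

    def __init__(self, fresh_window_s=30.0, min_check_interval_s=10.0):
        self.fresh_window_s = fresh_window_s              # assumed value
        self.min_check_interval_s = min_check_interval_s  # assumed value
        self.last_protected_at = None
        self.last_check_at = None

    def on_protected_message(self, now):
        """Record a fresh, cryptographically protected message from the peer."""
        self.last_protected_at = now

    def on_unprotected_notify(self, now):
        """Return 'ignore' or 'liveness_check'; never closes the SA."""
        recently_protected = (
            self.last_protected_at is not None
            and now - self.last_protected_at <= self.fresh_window_s
        )
        if recently_protected:
            return "ignore"
        rate_limited = (
            self.last_check_at is not None
            and now - self.last_check_at < self.min_check_interval_s
        )
        if rate_limited:
            return "ignore"
        self.last_check_at = now
        return "liveness_check"
```

Whether the SA is eventually closed then depends only on the protected liveness check timing out, which a spoofed plain-text INVALID_IKE_SPI cannot cause by itself.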

MQTT what is the purpose or usage of Last Will Testament?

I'm surely missing something about how the whole MQTT protocol works, as I can't grasp the usage pattern of Last Will Testament messages: what's their purpose?
One example I often see is about informing that a device has gone offline. It doesn't make very much sense to me, since it's obvious that if a device isn't publishing any data it may be offline or there could be some network problems.
So, what are some practical usages of the LWT? What was it invented for?
LWT messages are not really concerned about detecting whether a client has gone offline or not (that task is handled by keepAlive messages).
LWT messages are about what happens after the client has gone offline.
The analogy is that of a real last will:
Before a person dies, she can formulate a testament, in which she declares what actions should be taken after she has passed away. An executor will heed those wishes and execute them on her behalf.
The analogy in the MQTT world is that a client can formulate a testament, in which it declares what message should be sent on its behalf by the broker after it has gone offline.
A fictitious example:
I have a sensor, which sends crucial data, but very infrequently.
It has formulated a last will statement in the form of [topic: '/node/gone-offline', message: ':id'], with :id being a unique id for the sensor. I also have an emergency-subscriber for the topic '/node/gone-offline', which will send an SMS to my phone every time a message is published on that channel.
During normal operation, the sensor will keep the connection to the MQTT-broker open by sending periodic keepAlive messages interspersed with the actual sensor readings. If the sensor goes offline, the connection to the broker will time out, due to the lack of keepAlives.
This is where LWT comes in: If no LWT is specified, the broker doesn't care and just closes the connection. In our case however, the broker will execute the sensor's last will and publish the LWT-message '/node/gone-offline: :id'. The message will then be consumed by my emergency-subscriber and I will be notified of the sensor's ID via SMS so that I can check up on what's going on.
In short:
Instead of just closing the connection after a client has gone offline, LWT messages can be leveraged to define a message to be published by the broker on behalf of the client, since the client is offline and cannot publish anymore.
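The broker-side behaviour can be illustrated with a toy in-memory broker. This is not a real MQTT implementation or any library's API; the class `ToyBroker`, its methods, and the client ids are invented for the example. It only shows the rule: a stored will is published on a keep-alive timeout and discarded on a graceful disconnect.

```python
class ToyBroker:
    """In-memory stand-in for a broker, just enough to show LWT handling."""

    def __init__(self):
        self.wills = {}       # client_id -> (topic, payload)
        self.published = []   # (topic, payload) pairs the broker has sent out

    def connect(self, client_id, will=None):
        """The client hands over its will when it connects."""
        if will is not None:
            self.wills[client_id] = will

    def disconnect(self, client_id):
        """Graceful DISCONNECT: the will is discarded, not published."""
        self.wills.pop(client_id, None)

    def keepalive_timeout(self, client_id):
        """Ungraceful loss: publish the stored will, if there is one."""
        will = self.wills.pop(client_id, None)
        if will is not None:
            self.published.append(will)


broker = ToyBroker()

# A sensor that silently drops off the network.
broker.connect("sensor-42", will=("/node/gone-offline", "sensor-42"))
broker.keepalive_timeout("sensor-42")

# A sensor that shuts down cleanly.
broker.connect("sensor-7", will=("/node/gone-offline", "sensor-7"))
broker.disconnect("sensor-7")
```

After this run only the will of "sensor-42" is in `broker.published`, which is the message the emergency-subscriber in the example would receive.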
Just because a device is not publishing does not mean it is not online or there is a network problem.
Take for example a sensor that monitors a value that changes only very infrequently. Good design says that the sensor should publish only the changes, to help reduce bandwidth usage, as periodically publishing the same value is wasteful. If the value is published as a retained value, then any new subscriber will always get the current value without having to wait for the sensor value to change and be published again.
In this case the LWT is used to publish a message when the sensor fails (or there is a network problem), so we know of the problem as soon as the client keep alive times out.
An in-depth article about Last-Will-and-Testament messages is available in the MQTT Essentials Blog Post series: http://www.hivemq.com/mqtt-essentials-part-9-last-will-and-testament/.
To summarize the blog post:
The Last Will and Testament feature is used in MQTT to notify other clients about an ungracefully disconnected client.
MQTT is often used in scenarios where unreliable networks are very common. Therefore it is assumed that some clients will disconnect ungracefully from time to time, because they lost the connection, the battery is empty, or for any other imaginable reason. It would be good to know whether a connected client has disconnected gracefully (that is, with an MQTT DISCONNECT message) or not, in order to take appropriate action.
