Clarification on "Credit-Control-Failure-Handling " AVP - diameter-protocol

I need clarification on the "Credit-Control-Failure-Handling" (CCFH) AVP, and I'd appreciate it if someone could explain its enumerated values.
As I understand it, if CCFH is set to TERMINATE (0) and the client doesn't receive a CCA within the Tx timer period, the request is regarded as failed and the session is terminated.
However, if CCFH is set to CONTINUE (1), quoting RFC 4006:
When the Credit-Control-Failure-Handling AVP is set to CONTINUE,
the credit-control client SHOULD re-send the request to an
alternative server in the case of transport or temporary failures,
provided that a failover procedure is supported in the credit-
control server and the credit-control client, and that an
alternative server is available. Otherwise, the service SHOULD be
granted, even if credit-control messages can't be delivered.
So my understanding is that, unlike in TERMINATE mode, if no CCA arrives within the Tx timer, the client still provides the service to the end user.
My question is: what if the server actually sends a CCA error message (such as DIAMETER_TOO_BUSY or another error) within the Tx timer? Does the client still provide the service to the end user or not?

TERMINATE means exactly that: terminate.
Later, the same document (or rather its newer version, RFC 8506) says:
When the Tx timer expires, the Diameter Credit-Control client always
terminates the service if the CCFH is set to the value TERMINATE.
The credit-control session may be moved to an alternative server only
if a protocol error DIAMETER_TOO_BUSY or DIAMETER_UNABLE_TO_DELIVER
is received before the Tx timer expires. Therefore, the value
TERMINATE is not appropriate if proper failover behavior is desired.
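Putting the two quoted passages together as decision logic, a rough client-side sketch might look like this (the enum and the return values are hypothetical names, not from any Diameter stack):

```python
from enum import Enum

class CCFH(Enum):
    TERMINATE = 0
    CONTINUE = 1

# Hypothetical helper: what the client does when a protocol error answer
# arrives before the Tx timer expires, or when the timer expires unanswered.
def on_error_or_tx_expiry(ccfh, error=None, failover_possible=False):
    if error in ("DIAMETER_TOO_BUSY", "DIAMETER_UNABLE_TO_DELIVER"):
        # A protocol error received before Tx expiry allows moving the
        # session to an alternative server, in either CCFH mode.
        if failover_possible:
            return "failover_to_alternative_server"
    if ccfh is CCFH.TERMINATE:
        return "terminate_service"      # no failover once Tx has expired
    # CONTINUE: prefer an alternative server; otherwise grant the service
    # even though credit-control messages can't be delivered.
    if failover_possible:
        return "failover_to_alternative_server"
    return "grant_service"
```

Read together, the quoted passages say that on DIAMETER_TOO_BUSY a TERMINATE-mode client may fail over if it can, and terminates otherwise, while a CONTINUE-mode client fails over if it can, and grants the service otherwise.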

When does a client timeout start?

Okay, here is the scenario:
A client sends a request at 10:00:00 (H:M:S). That request is stored in the IIS app pool queue until there is a thread available to handle it. A thread is released, and the app pool picks up the request for processing; the time is now 10:00:15.
When did the client start waiting for its response - at 10:00:00 or at 10:00:15?
The client's timeout period started at 10:00:00. The client has no idea what's going on inside the server, nor about network latency. All it knows is when the request was sent and when a response was received (if at all).
While there may be more granular timeouts at the platform-specific message handler level (SendTimeout, ReceiveHeadersTimeout, ReceiveDataTimeout), the Timeout defined on .NET Standard-compliant implementations of HttpClient is end-to-end. Per Microsoft:
The HttpClient.Timeout property is intended to be exactly what you are
referring to as the 99% case: an end-to-end timeout after which the
request expires. The WinHttpHandler API is intended to provide a
deeper control to developers for more advanced scenarios. In keeping
with this intention, we have more granular timeouts on that type since
we have gotten developer requests in the past who asked for control
over a specific stage of the request.
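The same reasoning holds outside .NET. Here is a small Python illustration of the client's view (the URL is a placeholder; note that urllib's timeout is a socket-level inactivity timeout rather than a strict total deadline):

```python
import time
import urllib.error
import urllib.request

URL = "http://example.com/slow-endpoint"   # placeholder

start = time.monotonic()                   # the clock starts at *send* time
try:
    # The timeout fires if the server stays silent, which is exactly what
    # happens while the request sits in a server-side queue: from the
    # client's point of view, queueing and processing are indistinguishable.
    with urllib.request.urlopen(URL, timeout=30) as resp:
        resp.read()
    print(f"answered after {time.monotonic() - start:.1f}s")
except (TimeoutError, urllib.error.URLError):
    print(f"gave up after {time.monotonic() - start:.1f}s")
```

In the scenario above, the 15 seconds spent in the IIS queue count fully against the client's timeout budget.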

How can we securely handle liveness-check messages in IKEv2 given the INVALID_IKE_SPI notify payload?

This question has been on my mind, but I cannot come up with a solution.
Suppose there is an IKE tunnel between two peers (peer_1, peer_2), and an attacker wants to break this tunnel. For every keep-alive INFORMATIONAL request from peer_1 to peer_2, the attacker replies with an INVALID_IKE_SPI notify payload, which is obviously sent in plain text. This leads peer_1 to believe the IKE_SA has a problem, and after some implementation-specific number of retries peer_1 closes the tunnel (RFC 7296 specifies that a peer receiving such a reply should not change its state, but the keep-alive retries have to end at some point to avoid flooding the network). As a result, the attacker wins.
Does the IKEv2 protocol itself say anything to prevent this type of situation?
If anyone knows about this, please reply; any partial solution would also be helpful.
Citing RFC 7296, section 2.4, paragraph 3:
Since IKE is designed to operate in spite of DoS attacks from the
network, an endpoint MUST NOT conclude that the other endpoint has
failed based on any routing information (e.g., ICMP messages) or IKE
messages that arrive without cryptographic protection (e.g., Notify
messages complaining about unknown SPIs). An endpoint MUST conclude
that the other endpoint has failed only when repeated attempts to
contact it have gone unanswered for a timeout period or when a
cryptographically protected INITIAL_CONTACT notification is received
on a different IKE SA to the same authenticated identity. An
endpoint should suspect that the other endpoint has failed based on
routing information and initiate a request to see whether the other
endpoint is alive. To check whether the other side is alive, IKE
specifies an empty INFORMATIONAL request that (like all IKE requests)
requires an acknowledgement (note that within the context of an IKE
SA, an "empty" message consists of an IKE header followed by an
Encrypted payload that contains no payloads). If a cryptographically
protected (fresh, i.e., not retransmitted) message has been received
from the other side recently, unprotected Notify messages MAY be
ignored. Implementations MUST limit the rate at which they take
actions based on unprotected messages.
I think that (for the sake of clarity) the relevant types of attacker should be considered:
1/ An attacker able to drop arbitrary packets (i.e., an active MitM)
This one can perform a DoS just by dropping packets, and AFAIK there is nothing that can prevent him from doing so. He does not need any sophistication to break the communication.
2/ An attacker unable to drop packets
This one cannot prevent peer_2's legitimate responses (to peer_1's INFORMATIONAL requests) from reaching peer_1.
Thus peer_1 receives the response (before all retries time out) and knows that peer_2 is alive.
3/ An attacker able to drop some packets
Then it is a race, and the outcome depends on the configuration of the peers and on the percentage of packets the attacker is able to drop.
EDIT>
I would understand the questioned "case 2 attacker" scenario this way:
By receiving the attacker's unprotected INVALID_IKE_SPI notify (spoofed by the attacker from peer_2's address), peer_1 can (at most) only suspect that peer_2 has failed (as it MUST NOT conclude that the other endpoint has failed based on IKE messages without cryptographic protection).
It may decide (see the note below) to issue a liveness check by sending an empty INFORMATIONAL request to peer_2 (which is cryptographically protected).
The "case 2 attacker" is unable to tamper with this request, so it should reach peer_2 (possibly after some implementation-specific retransmits, as specified).
peer_2 (as it is alive) responds with an acknowledgement (which is cryptographically protected).
The "case 2 attacker" is unable to tamper with this response, so it should reach peer_1.
Upon receiving this response (a fresh, cryptographically protected message from peer_2), peer_1 knows that peer_2 is alive and keeps the SAs (as if nothing had happened).
Note: The "Implementations MUST limit the rate at which they take actions based on unprotected messages" part means that peer_1 should not perform this liveness check on every unprotected Notify message received; some implementation-specific rate-limiting mechanism must be in place (probably to prevent traffic amplification).
Disclaimer: I am no crypto expert, so please do validate my thoughts.

CCFH - Does a retransmitted CCR message set the T bit during failover?

Scenario:
I have a client and two OCS servers.
I didn't get the CCA response (CCA-I) from the first OCS.
After my Tx timer expires, I retransmit the CCR-I to the second OCS.
My question is:
Does the retransmitted CCR need the T bit to be set?
I cannot see this specifically mentioned anywhere in the RFC.
You can find the answer in RFC 4006, the Diameter Credit-Control Application:
If the timer Tx expires, the credit-control client MUST continue the
service and wait for a possible late answer. If the request times
out, the credit-control client re-transmits the request (marked with
T-flag) to a backup credit-control server, if possible. If the re-
transmitted request also times out, or if a temporary error is
received in answer, the credit-control client buffers the request if
the value of the Direct-Debiting-Failure-Handling AVP is set to
TERMINATE_OR_BUFFER.
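So yes: the CCR you retransmit to the backup OCS should be marked with the T flag. In the Diameter header (RFC 6733) the Command Flags octet sits at byte offset 4, with R = 0x80, P = 0x40, E = 0x20 and T = 0x10, so the failover step might look like this sketch (the helper name is made up):

```python
T_FLAG = 0x10   # 'T' (potentially re-transmitted) bit of the Command Flags

def mark_as_retransmitted(ccr: bytes) -> bytes:
    """Set the T flag on a buffered CCR before re-sending it to the
    backup credit-control server. Byte 4 of the Diameter header holds
    the Command Flags octet (R=0x80, P=0x40, E=0x20, T=0x10)."""
    msg = bytearray(ccr)
    msg[4] |= T_FLAG
    return bytes(msg)
```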

MQTT: what is the purpose or usage of Last Will and Testament?

I'm surely missing something about how the whole MQTT protocol works, as I can't grasp the usage pattern of Last Will and Testament messages: what's their purpose?
One example I often see is informing subscribers that a device has gone offline. That doesn't make much sense to me, since it's obvious that if a device isn't publishing any data, it may be offline or there may be some network problem.
So, what are some practical usages of the LWT? What was it invented for?
LWT messages are not really concerned with detecting whether a client has gone offline (that task is handled by the keepalive mechanism).
LWT messages are about what happens after the client has gone offline.
The analogy is that of a real last will:
A person can write a testament in which she declares what actions should be taken after she has passed away. An executor will heed those wishes and carry them out on her behalf.
The analogy in the MQTT world is that a client can formulate a testament, in which it declares what message should be sent on its behalf by the broker after it has gone offline.
A fictitious example:
I have a sensor, which sends crucial data, but very infrequently.
It has formulated a last will in the form of [topic: '/node/gone-offline', message: ':id'], with :id being a unique id for the sensor. I also have an emergency subscriber for the topic '/node/gone-offline', which sends an SMS to my phone every time a message is published on that channel.
During normal operation, the sensor keeps the connection to the MQTT broker open by sending periodic keepalive messages interspersed with the actual sensor readings. If the sensor goes offline, the connection to the broker times out due to the lack of keepalives.
This is where LWT comes in: if no LWT is specified, the broker doesn't care and just closes the connection. In our case, however, the broker executes the sensor's last will and publishes the LWT message '/node/gone-offline: :id'. The message is then consumed by my emergency subscriber, and I am notified of the sensor's ID via SMS so that I can check up on what's going on.
In short:
Instead of just closing the connection after a client has gone offline, LWT messages can be leveraged to define a message to be published by the broker on behalf of the client, since the client is offline and cannot publish anymore.
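For concreteness, here is roughly how the fictitious sensor above would register its will using the Python paho-mqtt client (broker address and topic names are made up; the API shown is paho-mqtt 1.x):

```python
import paho.mqtt.client as mqtt

SENSOR_ID = "sensor-42"                       # made-up unique id

client = mqtt.Client(client_id=SENSOR_ID)
# The will must be set *before* connecting: the broker stores it and
# publishes it on the client's behalf only on an ungraceful disconnect.
client.will_set("/node/gone-offline", payload=SENSOR_ID, qos=1)
client.connect("broker.example.com", 1883, keepalive=60)

# Normal operation: as long as readings and keepalives flow, the
# broker never publishes the will.
client.publish(f"/node/{SENSOR_ID}/reading", "23.5")

# A clean client.disconnect() discards the will; losing the network,
# a dead battery, or missed keepalives triggers it instead.
```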
Just because a device is not publishing does not mean it is offline or that there is a network problem.
Take, for example, a sensor that monitors a value that changes only very infrequently. Good design says the sensor should publish only the changes, to help reduce bandwidth usage, since periodically publishing the same value is wasteful. If the value is published as a retained value, then any new subscriber always gets the current value without having to wait for the sensor value to change and be published again.
In this case the LWT is used to publish a message when the sensor fails (or there is a network problem), so we know of the problem as soon as the client keepalive times out.
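In paho-mqtt terms, the retained-value pattern is just a flag on publish, combined with the will from the previous sketch (again with made-up broker and topic names):

```python
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="sensor-42")
client.will_set("/node/gone-offline", payload="sensor-42", qos=1)
client.connect("broker.example.com", 1883, keepalive=60)

# retain=True makes the broker keep the last value, so a new subscriber
# immediately receives the current reading instead of waiting for the
# next (possibly very infrequent) change.
client.publish("/node/sensor-42/value", "23.5", qos=1, retain=True)
```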
An in-depth article about Last Will and Testament messages is available in the MQTT Essentials blog post series: http://www.hivemq.com/mqtt-essentials-part-9-last-will-and-testament/.
To summarize the blog post:
The Last Will and Testament feature is used in MQTT to notify other clients about an ungracefully disconnected client.
MQTT is often used in scenarios where unreliable networks are very common. It is therefore assumed that some clients will disconnect ungracefully from time to time, because they lost the connection, the battery is empty, or any other imaginable case. It would be good to know whether a connected client disconnected gracefully (that is, with an MQTT DISCONNECT message) or not, in order to take appropriate action.

How to avoid flooding a message queue?

I'm working on an application that is divided into a thin client and a server part, communicating over TCP. We frequently let the server make asynchronous calls (notifications) to the client to report state changes. This avoids the server losing too much time waiting for an acknowledgement from the client. More importantly, it avoids deadlocks.
Such a deadlock can happen as follows. Suppose the server sent the state-changed notification synchronously (note that this is a somewhat contrived example). While handling the notification, the client needs to synchronously ask the server for information. However, the server cannot respond, because it is still waiting for the answer to its own question.
Now, this deadlock is avoided by sending the notification asynchronously, but this introduces another problem. When asynchronous calls are made faster than they can be processed, the call queue keeps growing. If this situation lasts long enough, the call queue becomes completely full (flooded with messages). My question is: what can be done when that happens?
My problem can be summarized as follows: do I really have to choose between sending notifications without blocking, at the risk of flooding the message queue, and blocking when sending notifications, at the risk of introducing a deadlock? Is there some trick to avoid flooding the message queue?
Note: To repeat, the server does not stall when sending notifications. They are sent asynchronously.
Note: In my example I used two communicating processes, but the same problem exists with two communicating threads.
If the server is sending informational messages to the client, which you yourself say are asynchronous, it should not have to wait for a reply from the client. If they are not informational, in other words they require an answer, I would say a server should never send such messages to a client, and their presence indicates a poor design.
If you have a constant congestion problem, there is little you can do other than gracefully fail and notify the client that no new messages can be posted; then it is up to the client to maintain a backlog of messages to be posted.
Introducing a priority queue and using message expiration/filtering could allow you to free up space in the queue, but that really just postpones the problem. If possible, you could also aggregate messages or ignore duplicate messages, but again the problem does not seem to be the queue itself. (Not to mention that the more complex queue logic could eat up valuable resources that would be better used actually processing messages.)
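As a generic illustration of expiration plus duplicate filtering (not tied to any particular queue product):

```python
import time
from collections import deque

class ExpiringQueue:
    """Notification queue in which a newer state change for the same key
    supersedes the older one, and stale entries are silently dropped."""

    def __init__(self, ttl=5.0, maxlen=1000):
        self.ttl = ttl
        self.items = deque(maxlen=maxlen)

    def put(self, key, payload):
        # Duplicate filter: remove any older message with the same key.
        self.items = deque(
            ((k, t, p) for k, t, p in self.items if k != key),
            maxlen=self.items.maxlen)
        self.items.append((key, time.monotonic(), payload))

    def get(self):
        # Expiration filter: discard messages older than the ttl.
        while self.items:
            key, t, payload = self.items.popleft()
            if time.monotonic() - t <= self.ttl:
                return key, payload
        return None
```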
Depending on what the server side does, you could introduce result hashing for long computations, offload some types of messages to a dedicated device, check whether the server waits unreasonably long for I/O operations, and apply a myriad of other techniques. Profile if possible; at least try to find out which message(s) cause the congestion.
Oh, and the business solution: Compare cost of estimated development time to the cost of better hardware and conclude that you should just buy a more powerful server (or an additional one).
Depending on how important these messages are, you might want to look into Message Expiration, or perhaps a Message Filter, though it sounds like your architecture may be incorrect.
I would rather fix the logic on the server side. The message queue should not stall waiting for an answer. Rather, have a state machine that can also receive those info queries while it is waiting for the answer from the client.
Of course you can still flood your message queue, but with TCP you can handle it pretty easily.
The best way, I believe, would be to add another state to your client. I borrowed this from the SMPP protocol specs.
Add a congestion state to the client, whereby it always checks the queue length (assuming this is possible). Once a certain threshold is reached, say 1000 unprocessed messages, the client sends the server a message indicating that it is congested, and the server is required to cease all messaging until it receives a notification indicating that the client is no longer congested; see the sketch below.
Alternatively, on the server side, if a certain number of replies are pending, the server could simply stop sending messages until the client has replied to a certain number of them.
These thresholds can be dynamically calculated or fixed, depending on your requirements.
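A skeletal version of that scheme, with separate high and low watermarks so the client doesn't oscillate around a single threshold (all names and numbers are made up for illustration):

```python
import queue

HIGH_WATER = 1000   # enter the congested state above this backlog
LOW_WATER = 200     # leave it again only once drained below this

class FlowControlledClient:
    def __init__(self, send_control, handle):
        self.inbox = queue.Queue()
        self.congested = False
        self.send_control = send_control   # sends CONGESTED / RESUME upstream
        self.handle = handle               # application message handler

    def on_notification(self, msg):
        """Called for every asynchronous notification from the server."""
        self.inbox.put(msg)
        if not self.congested and self.inbox.qsize() > HIGH_WATER:
            self.congested = True
            self.send_control("CONGESTED")   # server must stop sending

    def process_one(self):
        """Called by the worker loop that drains the queue."""
        self.handle(self.inbox.get())
        if self.congested and self.inbox.qsize() < LOW_WATER:
            self.congested = False
            self.send_control("RESUME")      # server may send again
```

The gap between the two watermarks provides hysteresis: without it, a queue hovering around a single threshold would spam CONGESTED/RESUME control messages.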
