Handle corrupted SPI data using STM32F1 MCU - linux

I'm developing an application for a custom board with an STM32F1 MCU which needs to be able to recover from unexpected data corruption.
The data flow is as follows:
The master device (a Linux machine) sends a request to the slave, which parses the message and gets ready to send a reply. Then the master reads the reply. The exchange is fast (clocked at 18 MHz) and implemented like this:
if (::ioctl(_fd, SPI_IOC_MESSAGE(2), &transaction) < 0) {
    warn("message not sent");
    return false;
}
The delay between these two messages is ~50us. The message length is fixed.
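For reference, a two-transfer exchange like this is typically set up with an array of spi_ioc_transfer structures. The following is only a sketch of that setup; the buffer names, the fixed length, the 18 MHz speed and the cs_change/delay handling are assumptions, not the original code:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

#define MSG_LEN 32u  /* assumed fixed message length */

static bool do_exchange(int fd, const uint8_t *request, uint8_t *reply)
{
    struct spi_ioc_transfer transaction[2];
    memset(transaction, 0, sizeof(transaction));

    /* First transfer: clock the request out to the slave. */
    transaction[0].tx_buf      = (unsigned long)request;
    transaction[0].len         = MSG_LEN;
    transaction[0].speed_hz    = 18000000;
    transaction[0].delay_usecs = 50;   /* assumed source of the ~50 us gap */
    transaction[0].cs_change   = 1;    /* assumed: release CS between the two transfers */

    /* Second transfer: clock the reply back in. */
    transaction[1].rx_buf   = (unsigned long)reply;
    transaction[1].len      = MSG_LEN;
    transaction[1].speed_hz = 18000000;

    return ioctl(fd, SPI_IOC_MESSAGE(2), transaction) >= 0;
}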
On the STM side I use a DMA-driven SPI driver implemented as described below.
I'm using SPI2, which is clocked off APB1 at 36 MHz (HSE = 24 MHz; AHB = 72 MHz; APB1 = 36 MHz).
First, the SPI is configured to read the message (fixed length!) by letting RXNE issue DMA requests (CR2->RXDMAEN). After the message is processed, the answer is transmitted via DMA1 (CR2->TXDMAEN).
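For illustration, arming the fixed-length reception looks roughly like this (CMSIS register names from the STM32Cube device header; the buffer name, the length and the use of DMA1 channel 4 for SPI2_RX are assumptions about this particular setup):

#include "stm32f1xx.h"   /* CMSIS device header */

#define MSG_LEN 32u      /* assumed fixed message length */

static uint8_t rx_buf[MSG_LEN];

static void spi2_arm_rx_dma(void)
{
    DMA1_Channel4->CCR  &= ~DMA_CCR_EN;          /* disable before reconfiguring */
    DMA1_Channel4->CPAR  = (uint32_t)&SPI2->DR;  /* peripheral: SPI2 data register */
    DMA1_Channel4->CMAR  = (uint32_t)rx_buf;     /* memory: receive buffer */
    DMA1_Channel4->CNDTR = MSG_LEN;              /* fixed message length */
    DMA1_Channel4->CCR   = DMA_CCR_MINC          /* increment the memory address */
                         | DMA_CCR_TCIE          /* interrupt on transfer complete */
                         | DMA_CCR_EN;           /* DIR = 0: peripheral-to-memory */

    SPI2->CR2 |= SPI_CR2_RXDMAEN;                /* RXNE now triggers DMA requests */
}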
Everything works like a charm until I interfere somehow. The scenario I'm trying to recover from is the SCLK line being unplugged in the middle of a transfer.
I'm struggling to recover from this. I'm going to lay out my thoughts because I'm not sure where the bug is.
The DMA is configured to handle fixed-length messages. That's why, when I interfere somehow, the DMA controller waits until the whole message has been received and the buffer gets shifted. Suppose I got one third of a message when SCLK suddenly vanished; the DMA will be waiting for the remaining two thirds. The master continues to send requests, so after SCLK is back, two thirds of the next message will be placed in the buffer. The DMA interrupt is issued, but the remaining trail of that last message is lost. It's lost for sure, but I can detect it by using the ERRIE bit to issue an interrupt when the OVR flag gets set.
I've tried to handle that interrupt but to no avail.
The interrupt handler I have now checks whether the BSY flag is set (the trail is still being processed by the SPI controller). If it is set, I kill the DMA (which has already started to handle the next message) and leave the OVR flag alone. Once BSY is cleared, I clear OVR and reset the DMA for reception.
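As a sketch, the handler described above (which, as noted below, does not fully solve the problem) would look something like this, assuming ERRIE is set in CR2 and spi2_arm_rx_dma() is a hypothetical helper that re-arms the fixed-length reception:

void SPI2_IRQHandler(void)
{
    if (SPI2->SR & SPI_SR_OVR)
    {
        /* Stop the DMA channel that has already started collecting the
         * misaligned next message. */
        SPI2->CR2 &= ~SPI_CR2_RXDMAEN;
        DMA1_Channel4->CCR &= ~DMA_CCR_EN;

        /* Wait for the shifter to finish clocking in the stale trailing bits. */
        while (SPI2->SR & SPI_SR_BSY)
            ;

        /* OVR is cleared by a read of DR followed by a read of SR. */
        (void)SPI2->DR;
        (void)SPI2->SR;

        spi2_arm_rx_dma();   /* start a fresh fixed-length reception */
    }
}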
This doesn't help much.
Another option I might use is a dedicated timer that gets reset on every rising edge of SCLK (a solution inspired by the AN3109 application note). This way I could implement a DMA timeout: if I received only part of a message, a timer overflow interrupt fires once SCLK has been idle for too long. This solution has issues, though.
I know the description is vague but I've tried my best and hope somebody with a greater insight might help.

Install an interrupt handler on the CS line. On the rising edge, abort everything and restart the DMA if the transfer is not yet complete. Use the SSI bit in SPI_CR1: set it on the rising edge, clear it on the falling edge.
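A sketch of that idea, assuming NSS sits on PB12 (SPI2_NSS), is mapped to EXTI line 12 with both edges enabled, and that software NSS management (SSM) is used so the SSI bit is meaningful; spi2_arm_rx_dma() is again a hypothetical re-arm helper:

void EXTI15_10_IRQHandler(void)
{
    if (EXTI->PR & (1u << 12))              /* pending on line 12 (CS) */
    {
        EXTI->PR = (1u << 12);              /* clear by writing 1 */

        if (GPIOB->IDR & (1u << 12))        /* rising edge: master deselected us */
        {
            SPI2->CR1 |= SPI_CR1_SSI;       /* internally deselect the slave */

            if (DMA1_Channel4->CNDTR != 0)  /* message was cut short: start over */
            {
                SPI2->CR2 &= ~SPI_CR2_RXDMAEN;
                DMA1_Channel4->CCR &= ~DMA_CCR_EN;
                spi2_arm_rx_dma();
            }
        }
        else                                /* falling edge: a new transfer begins */
        {
            SPI2->CR1 &= ~SPI_CR1_SSI;      /* internally select again */
        }
    }
}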

TCP close() vs shutdown() in Linux OS

I know there are already a lot of similar questions on Stack Overflow, but nothing seems convincing. Basically I am trying to understand under what circumstances I need to use one over the other, or use both.
I would also like to understand whether close() and shutdown() with SHUT_RDWR are the same.
Closing TCP connections has gathered so much confusion that we can rightfully say either this aspect of TCP has been poorly designed, or the documentation is lacking somewhere.
Short answer
To do it the proper way, you should use all 3: shutdown(SHUT_WR), shutdown(SHUT_RD) and close(), in this order. No, shutdown(SHUT_RDWR) and close() are not the same. Read their documentation carefully, along with the related SO questions and articles; you need several of them to get an overview.
Longer answer
The first thing to clarify is what you aim for when closing a connection. Presumably you use TCP for a higher-level protocol (request-response, a steady stream of data, etc.). Once you decide to "close" (terminate) the connection, you have already sent and received everything you had to (otherwise you would not be terminating), so what more do you want? I'm trying to outline what you may want at the time of termination:
1. to know that all data sent in either direction reached the peer
2. if there are any errors (in transmitting the data still in flight when you decided to terminate, as well as after that, and in performing the termination itself, which also requires data to be sent/received), the application is informed
3. optionally, some applications want to be non-blocking up to and including the termination
Unfortunately TCP doesn't make these features easily available, and the user needs to understand what's under the hood and how the system calls interact with what's under the hood. A key sentence is in the recv manpage:
When a stream socket peer has performed an orderly shutdown, the
return value will be 0 (the traditional "end-of-file" return).
What the manpage means here is: orderly shutdown is done by one end (A) choosing to call shutdown(SHUT_WR), which causes a FIN packet to be sent to the peer (B), and this packet takes the form of a 0 return code from recv inside B. (Note: the FIN packet, being an implementation aspect, is not mentioned by the manpage.) The "EOF", as the manpage calls it, means there will be no more transmission from A to B, but application B can, and should, continue to send what it was in the process of sending, and potentially even send some more (A is still receiving). When that sending is done (shortly), B should itself call shutdown(SHUT_WR) to close the other half of the duplex. Now app A receives EOF and all transmission has ceased. The two apps are OK to call shutdown(SHUT_RD) to close their sockets for reading and then close() to free the system resources associated with the socket (TODO: I haven't found clear documentation that says the 2 calls to shutdown(SHUT_RD) are sending the ACKs in the termination sequence FIN --> ACK, FIN --> ACK, but this seems logical).
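A minimal sketch of that sequence from the point of view of the terminating peer A, with error handling trimmed:

#include <sys/socket.h>
#include <unistd.h>

static void terminate_connection(int fd)
{
    char buf[4096];

    shutdown(fd, SHUT_WR);                  /* our FIN: "no more data from me" */

    /* Keep reading what the peer still has to say; recv() returning 0 is the
     * peer's own FIN, i.e. its shutdown(SHUT_WR). */
    while (recv(fd, buf, sizeof(buf), 0) > 0)
        ;                                   /* consume or discard remaining data */

    shutdown(fd, SHUT_RD);                  /* we will not read any more */
    close(fd);                              /* release the descriptor */
}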
Onwards to our aims: for (1) and (2), basically the application must somehow wait for the shutdown sequence to happen and observe its outcome. Notice how, if we follow the small protocol above, it is clear to both apps that the termination initiator (A) has sent everything to B. This is because B received EOF (and EOF is received only after everything else). A also received EOF, which is issued in reply to its own EOF, so A knows B received everything (there is a caveat here: the termination protocol must have a convention of who initiates the termination, so that both peers do not do so at once). However, the reverse is not true. After B calls shutdown(SHUT_WR), there is nothing coming back at app level to tell B that A received all data sent, plus the FIN (the A->B transmission had ceased!). Correct me if I'm wrong, but I believe at this stage B is in state "LAST_ACK", and when the final ACK arrives (step #4 of the 4-way handshake) it concludes the close, but the application is not informed unless it had set SO_LINGER with a long-enough timeout. SO_LINGER "ON" instructs the shutdown call to block (be performed in the foreground), hence the shutdown call itself will do the waiting.
In conclusion what I recommend is to configure SO_LINGER ON with a long timeout, which causes it to block and hence return any errors. What is not entirely clear is whether it is shutdown(SHUT_WR) or shutdown(SHUT_RD) which blocks in expectation of the LAST_ACK, but that is of less importance as we need to call both.
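Setting the option itself is a one-liner; a sketch (the 30-second value is an arbitrary example):

#include <sys/socket.h>

static int enable_linger(int fd)
{
    struct linger lg;
    lg.l_onoff  = 1;    /* linger "ON": close()/shutdown() may block ...  */
    lg.l_linger = 30;   /* ... for up to 30 seconds while data is flushed */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}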
Blocking on shutdown is problematic for requirement #3 above where e.g. you have a single-threaded design that serves all connections. Using SO_LINGER may block all connections on the termination of one of them. I see 3 routes to address the problem:
1. shutdown with LINGER, from a different thread. This will of course complicate the design
2. linger in the background and either
2A. "Promote" FIN and FIN2 to app-level messages which you can read and hence wait for. This basically moves the problem that TCP was meant to solve one level higher, which I consider hack-ish, also because the ensuing shutdown calls may still end up in limbo.
2B. Try to find a lower-level facility such as the SIOCOUTQ ioctl described here, which queries the number of unACKed bytes in the network stack (a minimal sketch follows below). The caveats are many: this is Linux-specific, and we are not sure whether it applies to FIN ACKs (to know whether closing is fully done), plus you'd need to poll it periodically, which is complicated. Overall I'm leaning towards option 1.
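A sketch of the SIOCOUTQ route, for completeness (Linux-specific):

#include <sys/ioctl.h>
#include <linux/sockios.h>   /* SIOCOUTQ */

static int unacked_bytes(int fd)
{
    int outq = -1;
    if (ioctl(fd, SIOCOUTQ, &outq) < 0)
        return -1;
    return outq;   /* 0 means the peer has ACKed everything we queued */
}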
I tried to write a comprehensive summary of the issue, corrections/additions welcome.
TCP sockets are bidirectional - you send and receive over the one socket. close() stops communication in both directions. shutdown() provides another parameter that allows you to specify which direction you might want to stop using.
Another difference (between close() and shutdown(rw)) is that close() will keep the socket open if another process is using it, while shutdown() shuts down the socket irrespective of other processes.
shutdown() is often used by clients to provide framing - to indicate the end of their request, e.g. an echo service might buffer up what it receives until the client shutdown()s their send side, which tells the server that the client has finished, and the server then replies; the client can receive the reply because it has only shutdown() writing, not reading, through its socket.
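A sketch of that framing pattern on the client side, assuming fd is an already-connected socket (names are illustrative):

#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void echo_request(int fd, const char *request)
{
    char reply[4096];
    ssize_t n;

    send(fd, request, strlen(request), 0);
    shutdown(fd, SHUT_WR);                       /* tells the server: that's all */

    while ((n = recv(fd, reply, sizeof(reply), 0)) > 0)
        write(STDOUT_FILENO, reply, (size_t)n);  /* the reply still arrives fine */

    close(fd);
}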
close() will close both the sending and receiving ends of the socket. If you want to close only the sending part of the socket and not the receiving part (or vice versa), you can use shutdown().
close()------->closes both the sending and receiving ends.
shutdown()------->closes only the sending or only the receiving end, depending on the argument:
SHUT_RD (shut down the reading end, i.e. receiving)
SHUT_WR (shut down the writing end, i.e. sending)
SHUT_RDWR (shut down both)

Android BLE callback OnWriteCallback stops after few seconds

I am trying to write the next packet synchronously, based on the onCharacteristicWrite callback condition, to achieve maximum throughput. But for some reason it stops triggering the onCharacteristicWrite callback after the very first 1-2 seconds, and it never gets called again even if I resend the packets. It works well if I add a delay per packet, but I do not want to add any delay, in order to achieve maximum throughput.
Is there any way I could achieve the maximum throughput without adding any delay?
Also, what exactly does sending multiple packets per connection interval mean (and is there any way I could achieve it through the peripheral)?
If you use Write Without Response (see https://developer.android.com/reference/android/bluetooth/BluetoothGattCharacteristic.html#setWriteType(int)), you will be able to send multiple packets per connection interval.
Android KitKat unfortunately has broken flow control when you send multiple packets with "Write Without Response". If you try on a newer Android device, it should work properly.
If the writeCharacteristic method returns true, it just means it has passed your packet to the Bluetooth process. You can see the exact logic in the source code at https://android.googlesource.com/platform/frameworks/base/+/fe2bf16a2b287c3c748cd6fa7c14026becfe83ff/core/java/android/bluetooth/BluetoothGatt.java#1081. Basically it returns true if the characteristic has the write property, the gatt object is valid and there is currently no other pending GATT operation going on.
The onCharacteristicWrite callback will send status=0 when the Write Response has arrived (for Write With Response) or the Bluetooth stack is ready and has buffer space to accept a new packet (for Write Without Response).
I recently wrote a post about that here that you could read: onCharacteristicWrite and onNotificationSent are being called too fast - how to acquire real outgoing data rates?
If you want a simple workaround for KitKat you could write 10 packets as Write Without Response and then the 11th packet as Write With Response and then start over with Write Without Responses. That should give you decent performance.

Restoring Keyboard IRQ

I'm very new to Linux kernel module development and am trying to write a simple kernel module which can later be extended into a keyboard driver.
I tried following two approaches:
Interrupt Based Approach
I started writing the code after following the guide given here. But the only problem is that the machine freezes when I run rmmod because it is not able to restore the IRQ to the original keyboard driver.
Is there any way to save the device name & device id of the original keyboard driver before requesting the IRQ in init() and then restore everything back to normal once the exit() i.e. cleanup_module() is fired?
void cleanup_module() {
    /* Something to restore everything back to normal */
    free_irq(1, NULL);
}
Polling Approach
In this approach, I am continuously polling for the Key Pressed & Released by using a while loop and then copying the input back to the user.
while (!(inb(0x64) & 0x1) ||          /* wait until the controller's output buffer is full */
       (input = inb(0x60)) & 0x80);   /* then read the scancode, skipping release (break) codes */
The problem I'm facing here is that it never comes out of the while loop. I'm assuming that is because the original keyboard driver serves the request.
Is there any way to get the request forwarded from the original keyboard driver?
I appreciate any help/pointers on this.
Thanks!
I'm afraid that I do not see how this can work as long as the normal kernel keyboard driver is also controlling the keyboard, since both drivers will be attempting to control the device. The kernel i8042 driver (I assume that is the relevant one for you) registers its interrupt as shared, and if your driver managed to register for the same interrupt then it will also have registered its handler as shared, so that both get notified on interrupts and race to access the device.
If you registered a shared handler, this might also explain the crash when you unload it: unregistering a shared interrupt handler only works when the second parameter contains a valid dev_id, so unregistering fails when called with NULL as you did, but the handler code is still unloaded from memory. This would lead to a crash on a future interrupt.
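For illustration, a shared handler on IRQ 1 would normally be registered and removed like this; the same non-NULL dev_id cookie must go to both request_irq() and free_irq(). (This only fixes the unload crash; you would still be racing the i8042 driver for the hardware.)

#include <linux/module.h>
#include <linux/interrupt.h>

static int kbd_cookie;   /* any unique non-NULL address works as dev_id */

static irqreturn_t my_kbd_handler(int irq, void *dev_id)
{
    /* peek at the keyboard controller here ... */
    return IRQ_NONE;     /* let the real driver handle the interrupt */
}

static int __init my_kbd_init(void)
{
    return request_irq(1, my_kbd_handler, IRQF_SHARED, "my_kbd", &kbd_cookie);
}

static void __exit my_kbd_exit(void)
{
    free_irq(1, &kbd_cookie);   /* matching dev_id, so only our handler is removed */
}

module_init(my_kbd_init);
module_exit(my_kbd_exit);
MODULE_LICENSE("GPL");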
Regarding your polling approach, yes, since the normal driver is notified on interrupts it is rather likely to beat you to reading the keyboard.

Linux CAN bus transmission timeout

Scenario
There is a Linux-powered device connected to a CAN bus. The device periodically transmits the CAN message. The nature of the data carried by this message is like measurement rather than command, i.e. only the most recent one is actually valid, and if some messages are lost that is not an issue as long as the latest one was received successfully.
Then the device in question is disconnected from the CAN bus for some amount of time that is much longer than the interval between subsequent message transmissions. The device logic is still trying to transmit the messages, but since the bus is disconnected the CAN controller is unable to transmit any of them, so the messages accumulate in the TX queue.
Some time later the CAN bus connection is restored, and all the accumulated messages are being kicked on the bus one by one.
Problem
When the CAN bus connection is restored, an undefined number of outdated messages will be transmitted from the TX queue.
While the CAN bus connection is still unavailable but the TX queue is already full, the most recent messages (i.e. the only valid ones) will be discarded.
Once the CAN bus connection is restored, there will be a short-term traffic burst while the TX queue is being flushed. This can disturb the Time-Triggered Bus Scheduling if one is used (it is in my case).
Question
My application uses SocketCAN driver, so basically the question should be applied to SocketCAN, but other options are considered too if there are any.
I see two possible solutions: define a message transmission timeout (if a message has not been transmitted within some predefined amount of time, it is discarded automatically), or abort the transmission of outdated messages manually (though I doubt this is possible at all with the socket API).
Since the first option seems to be most real to me, the question is:
How does one define TX timeout for CAN interface under Linux?
Are there any other options to solve the problems described above, aside from TX timeouts?
My solution for this problem was shutting down and bringing the device up again:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static bool queue_cleared = false;

void clear_device_queue(void)
{
    if (!queue_cleared)
    {
        const char *dev = getenv("MOTOR_CAN_DEVICE");
        char cmd[1024];

        /* Bounce the interface; taking it down drops the stale TX queue. */
        sprintf(cmd, "sudo ip link set down %s", dev);
        system(cmd);
        usleep(500000);
        sprintf(cmd, "sudo ip link set up %s", dev);
        system(cmd);

        queue_cleared = true;
    }
}
I don't know the internals of SocketCAN, but I think the larger part of the problem should be solved on a more general, logical level.
Before that, there is one aspect to clarify:
The question includes the tag safety-critical...
If the CAN communication is not relevant to implementing a safety function, you can pick any solution you find useful. There may be parts of the second alternative which are useful for you in this case too, but those are not mandatory.
If the communication is, however, used in a safety-relevant context, there must be a concept that takes into account the requirements imposed by IEC 61508 (safety of programmable electronic systems in general) and IEC 61784-x/62280 (safe communication protocols).
Those standards usually lead to some protocol measures that come in handy with any embedded communication, but especially for the present problem:
Add a sequence counter to the protocol frames.
The receiver shall monitor that the counter values it sees don't make larger "jumps" than allowed (e.g., if you allow 2 frames to be missed along the way, the maximum counter increment may be +3). The CAN bus may duplicate a frame, so a counter increment of +0 must be tolerated, too.
The receiver must monitor that every received frame is followed by another one within a timeout period. If your CAN connection is lost and recovered in the meantime, it depends on whether the interruption was longer than the timeout or within it (see the sketch after this list).
Additionally, the receiver may monitor that a frame doesn't follow the preceding one too early, but if the frames include the right data, this usually isn't necessary.
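A minimal receiver-side sketch of the counter and timeout checks above; the names, the allowed jump window and the timeout value are illustrative assumptions, and a real implementation would also supervise the timeout from a periodic task so that a completely silent bus is detected, too:

#include <stdbool.h>
#include <stdint.h>

#define MAX_COUNTER_JUMP 3u     /* up to 2 lost frames tolerated */
#define RX_TIMEOUT_MS    100u   /* a new frame must arrive within 100 ms */

static uint8_t  last_counter;
static uint32_t last_rx_ms;

/* Returns true if the frame passes both checks; now_ms is a monotonic clock. */
static bool check_frame(uint8_t counter, uint32_t now_ms)
{
    uint8_t jump   = (uint8_t)(counter - last_counter);   /* wraps naturally */
    bool timed_out = (now_ms - last_rx_ms) > RX_TIMEOUT_MS;

    last_counter = counter;
    last_rx_ms   = now_ms;

    /* jump == 0 is accepted: CAN may deliver a duplicate of the last frame. */
    if (timed_out || jump > MAX_COUNTER_JUMP)
        return false;   /* treat the connection as interrupted / the data as stale */

    return true;
}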
[...] The nature of the data carried by this message is like measurement rather than command, i.e. only the most recent one is actually valid, and if some messages are lost that is not an issue as long as the latest one was received successfully.
Through CAN, you shall never communicate "commands" in the sense that every single one of them can trigger a change, like "toggle output state" or "increment set value by one unit", because you never know whether frame duplication hits you or not.
Besides, you shall never communicate "anything safety-relevant" through a single frame because any frame may be lost or broken by an error. Instead, "commands" shall be transferred (like measurements) as a stream of periodical frames with measurement or set value updates.
Now, in order to get the required availability out of the protocol design, the TX queue shouldn't be long. If you actually feel you need that queue, it could be that the bus is overloaded compared to the timing requirements it faces. From my point of view, the TX "queue" shouldn't be longer than one or two frames. Then the problem of recovering the CAN connection is nearly fixed...

realtime midi input and synchronisation with audio

I have built a standalone app version of a project that until now was just a VST/audiounit. I am providing audio support via rtaudio.
I would like to add MIDI support using rtmidi but it's not clear to me how to synchronise the audio and MIDI parts.
In VST/audiounit land, I am used to MIDI events that have a timestamp indicating their offset in samples from the start of the audio block.
rtmidi provides a delta time in seconds since the previous event, but I am not sure how I should grab those events and how I can work out their time in relation to the current sample in the audio thread.
How do plugin hosts do this?
I can understand how events can be sample accurate on playback, but it's not clear how they could be sample accurate when using realtime input.
rtaudio gives me a callback function. I will run at a low block size (32 samples). I guess I will pass a pointer to an rtmidi instance as the userdata part of the callback and then call midiin->getMessage( &message ); inside the audio callback, but I am not sure whether this is sensible from a threading point of view.
Many thanks for any tips you can give me
In your case, you don't need to worry about it. Your program should send the MIDI events to the plugin with a timestamp of zero as soon as they arrive. I think you have perhaps misunderstood the idea behind what it means to be "sample accurate".
As @Brad noted in his comment to your question, MIDI is indeed very slow. But that's only part of the problem... when you are working in a block-based environment, incoming MIDI events cannot be processed by the plugin until the start of a block. When computers were slower and block sizes of 512 (or god forbid, >1024) were common, this introduced a non-trivial amount of latency which results in the arrangement not sounding as "tight". Therefore sequencers came up with a clever way to get around this problem. Since the MIDI events are already known ahead of time, these events can be sent to the instrument one block early with an offset in sample frames. The plugin then receives these events at the start of the block, and knows not to start actually processing them until N samples have passed. This is what "sample accurate" means in sequencers.
However, if you are dealing with live input from a keyboard or some sort of other MIDI device, there is no way to "schedule" these events. In fact, by the time you receive them, the clock is already ticking! Therefore these events should just be sent to the plugin at the start of the very next block with an offset of 0. Sequencers such as Ableton Live, which allow a plugin to simultaneously receive both pre-sequenced and live events, simply send any live events with an offset of 0 frames.
Since you are using a very small block size, the worst-case scenario is a latency of .7ms, which isn't too bad at all. In the case of rtmidi, the timestamp does not represent an offset which you need to schedule around, but rather the time which the event was captured. But since you only intend to receive live events (you aren't writing a sequencer, are you?), you can simply pass any incoming MIDI to the plugin right away.
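If you would rather not touch the MIDI API from inside the audio callback at all, a common pattern (not specific to rtmidi/rtaudio) is to let the MIDI thread push events into a lock-free single-producer/single-consumer queue and drain it at the top of each audio block, forwarding everything with an offset of 0 as described above. A minimal C11 sketch; all names and sizes are illustrative assumptions:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 256u                /* power of two, so masking works */

typedef struct {
    uint8_t bytes[3];                  /* status byte + up to two data bytes */
    uint8_t length;
} MidiEvent;

typedef struct {
    MidiEvent buffer[QUEUE_SIZE];
    atomic_uint head;                  /* written only by the MIDI thread  */
    atomic_uint tail;                  /* written only by the audio thread */
} MidiQueue;

/* Called from the MIDI input thread/callback. */
static bool midi_queue_push(MidiQueue *q, MidiEvent ev)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail >= QUEUE_SIZE)
        return false;                  /* queue full: drop the event */
    q->buffer[head & (QUEUE_SIZE - 1)] = ev;
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Called at the start of each audio block, inside the audio callback. */
static bool midi_queue_pop(MidiQueue *q, MidiEvent *out)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail == head)
        return false;                  /* nothing pending this block */
    *out = q->buffer[tail & (QUEUE_SIZE - 1)];
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}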
