Thank you for reading this and for all of your help. Anyway...I am trying to implement a crc16 with polynomial x^16 + x^12 + x^5 + 1 in verilog. The problem I have encountered is that I don't get the entire packet of data at one point in time. I get a 32 bit word at a time and the number of words is dynamic but is at least 4 words and can be as high as 16384 words or higher. The time is not much of an issue because I am running on a 150 MHz clk and the input is coming in at most a 33 MHz clk but may be a 10 MHz. This does not really affect me because I am first accepting the data via a FIFO.
I have been trying to develop an FSM but have really hit a roadblock. One idea is for me to wait for all the data and then just input the entire thing as one big data packet; however, this seems really inefficient and I just don't think I need to do this. Plus it could take up valuable resources. Another way I was playing with was to input the first word and do the XOR operation. Then when the input data only has 1 to 2 bits left that are not xored (not sure if that is worded correctly) I would input the next word. Upon the input I would continue to compute the CRC followed by another input until the last word is imputed into the module.
With this method I would need to implement a counter or a shift register in some fashion. Anyway, any help would be nice. This goes into a command parser/packet parser. Thank you so much for your help.
A CRC calculation doesn't need to be done serially 1-bit at a time. You can essentially "unroll" the calculation to come up with the individual equations for each bit of a parallel CRC generator. With that, you can create a CRC generator that processes 32-bits of input data at a time, matching your datapath width. This should simplify your design as well as make it higher performance (processing each bit serially wouldn't meet your throughput requirements anyway, unless you don't mind holding off incoming data while the hw generates the CRC).
Related
I'm having trouble understanding this error I'm getting in Yosys.
I copied the relevant (I think) code below.
reg signed [15:0] wb1 [0:131071];
reg signed [27:0] currentAttrWB [0:4094];
always #(posedge clk)
currentAttrWB <= wb1[attrWBoffset +: 4094];
ERROR (on last line): "currentAttrWB does map to an unexpanded memory!"
What I'm trying to do is select a sub-range of 4095 16bit words in a long array of 131072 16bit words, using indexed part select (+:). This range would be offset within the long array using attrWBoffset. But obviously there is something I am not understanding. At first I thought this was because currentAttrWB was not large enough to contain 4095 16bit words. But I still get the error when bumping its register up to 28bits.
I guess I need to understand what is meant by expanded and unexpanded.
Thanks for your help.
Yosys does not support memories, i.e. anything defined as reg [x:0] mem[0:y]; being on the left hand side of an assignment. I am not sure if this is a Yosys limitation or a Verilog one, but such a pattern makes little sense for an FPGA application where memories are accessed one element at a time.
If Yosys could implement such a pattern, it would have to map the memory to LUTs and flipflops rather than use dedicated RAM resources, as such a simultaneous transfer isn't possible with block RAM. There are very few FPGAs with over 2 million flipflops available, and if you had one you probably wouldn't want to fill it up with something like this.
What you probably want is a counter that counts from 0 to 4094 and copies one entry every clock cycle until it completes.
I have 2 different clocks, one for reading and one for writing. I am using gray-code to synchronize the pointers with an additional 2 flip-flops for synchronization on the differnt clock of the input signal.
The articles that I have read indicate how to determine the full and empty signal using gray code by comparing the 2MSB for full state and equality for empty state.
However, I need to get the number of elements in the buffer and not just the full or empty signals. Is this possible to do with gray code?
In a comment you ask about the common clock and mentioned that your depth is not a power of two.
First : Edit your original post and add that question and the information.
Second: In an a-synchronous FIFO there is no common clock. The write operations are all run from the write clock. The read operations are all run from the read clock. The critical part is to exchange information between the clock domains. That is where the gray code comes in.
Third: An a-synchronous FIFO uses gray code because only one bit changes at a time. Important there is that the process is circular. Thus the difference between your last and your first value also only differs by one bit:
Counter Gray-code
000 000
001 001
010 011
011 010
100 110
101 111
110 101
111 100 <-- Last
000 000 <-- First again
This works if and only if the depth (and thus the counters) are a power of two. Therefore an a-synchronous FIFO always has a depth which is a power of two.
If you must have a different depth you can add a synchronous FIFO to the beginning or the end. However if you think about it: a FIFO is just an elastic buffer. The behavior if it is e.g. 16 entries deep or 12 entries is not different, other then that you have the potential to store more values.
Last: As supercat said: You convert from binary to Gray code, cross to the other clock domain, then convert Gray code to binary again.
In the end clock domain you can safely compare read and write counters to determine the fill-level of the FIFO.
If the level is needed on both read and write side you have to implement this process twice, once in each clock domain.
The most understandable way to compute the difference between two gray-code values is to synchronize them with a common clock, convert them to binary, and then do an ordinary binary subtraction on them. While it may be possible to design a fully-combinatorial circuit that would compute the difference between two gray-code values in such a way that if all bits of one particular value are stable, and one bit in the other value changes, only one bit in the output would change and all others would remain stable, such a design would be much more complicated than one which simply synchronizes both counters, converts to binary, and subtracts.
I'm working with a serial protocol. Messages are of variable length that is known in advance. On both transmission and reception sides, I have the message saved to a shift register that is as long as the longest possible message.
I need to calculate CRC32 of these registers, the same as for Ethernet, as fast as possible. Since messages are variable length (anything from 12 to 64 bits), I chose serial implementation that should run already in parallel with reception/transmission of the message.
I ran into a problem with organization of data before calculation. As specified here , the data needs to be bit-reversed, padded with 32 zeros and complemented before calculation.
Even if I forget the part about running in parallel with receiving or transmitting data, how can I effectively get only my relevant message from max-length register so that I can pad it before calculation? I know that ideas like
newregister[31:0] <= oldregister[X:0] // X is my variable length
don't work. It's also impossible to have the generate for loop clause that I use to bit-reverse the old vector run variable number of times. I could use a counter to serially move data to desired length, but I cannot afford to lose this much time.
Alternatively, is there an operation that would directly give me the padded and complemented result? I do not even have an idea how to start developing such an idea.
Thanks in advance for any insight.
You've misunderstood how to do a serial CRC; the Python question you quote isn't relevant. You only need a 32-bit shift register, with appropriate feedback taps. You'll get a million hits if you do a Google search for "serial crc" or "ethernet crc". There's at least one Xilinx app note that does the whole thing for you. You'll need to be careful to preload the 32-bit register with the correct value, and whether or not you invert the 32-bit data on completion.
EDIT
The first hit on 'xilinx serial crc' is xapp209, which has the basic answer in fig 1. On top of this, you need the taps, the preload value, whether or not to invert the answer, and the value to check against on reception. I'm sure they used to do all this in another app note, but I can't find it at the moment. The basic references are the Ethernet 802.3 spec (3.2.8 Frame check Sequence field, which was p27 in the original book), and the V42 spec (8.1.1.6.2 32-bit frame check sequence, page 311 in the old CCITT Blue Book). Both give the taps. V42 requires a preload to all 1's, invert of completion, and gives the test value on reception. Warren has a (new) chapter in Hacker's Delight, which shows the taps graphically; see his website.
You only need the online generators to check your solution. Be careful, though: they will generally have different preload values, and may or may not invert the result, and may or may not be bit-reversed.
Since X is a viarable, you will need to bit assignments with a for-loop. The for-loop needs to be inside an always block and the for-loop must static unroll (ie the starting index, ending index, and step value must be constants).
for(i=0; i<32; i=i+1) begin
if (i<X)
newregister[i] <= oldregister[i];
else
newregister[i] <= 1'b0; // pad zeros
end
The Problem
I have a textfile which contains one string per line (linebreak \r\n). This file is secured using CRC16 in two different ways.
CRC16 of blocks of 4096 bytes
CRC16 of blocks of 32768 bytes
Now I have to modify any of these 4096 byte blocks, so it (the block)
contains a specific string
does not change the size of the textfile
has the same CRC value as the original block (same for the 32k block, that contains this 4k block)
Depart of that limitations I may do any modifications to the block that are required to fullfill it as long as the file itself does not break its format. I think it is the best to use any of the completly filled 4k blocks, not the last block, that could be really short.
The Question
How should I start to solve that problem? The first thing I would come up is some kind of bruteforce but wouldn't it take extremly long to find the changes that will result in both CRC values stay the same? Is there probably a mathematical way to solve that?
It should be done in seconds or max. few minutes.
There are math ways to solve this but I don't know them. I'm proposing a brute-force solution:
A block looks like this:
SSSSSSSMMMMEEEEEEE
Each character represents a byte. S = start bytes, M = bytes you can modify, E = end bytes.
After every byte added to the CRC it has a new internal state. You can reuse the checksum state up to that position that you modify. You only need to recalculate the checksum for the modified bytes and all following bytes. So calculate the CRC for the S-part only once.
You don't need to recompute the following bytes either. You just need to check whether the CRC state is the same or different after the modification you made. If it is the same, the entire block will also be the same. If it is different, the entire block is likely to be different (not guaranteed, but you should abort the trial). So you compute the CRC of just the S+M' part (M' being the modified bytes). If it equals the state of CRC(S+M) you won.
That way you have much less data to go through and a recent desktop or server can do the 2^32 trials required in a few minutes. Use parallelism.
Take a look at spoof.c. That will directly solve your problem for the CRC of the 4K block. However you will need to modify the code to solve the problem simultaneously for both the CRC of the 4K block and the CRC of the enclosing 32K block. It is simply a matter of adding more equations to solve. The code is extremely fast, running in O(log(n)) time, where n is the length of the message.
The basic idea is that you will need to solve 32 linear equations over GF(2) in 32 or more unknowns, where each unknown is a bit location that you are permitting to be changed. It is important to provide more than 32 unknowns with which to solve the problem, since if you pick exactly 32, it is not at all unlikely that you will end up with a singular matrix and no solution. The spoof code will automatically find non-singular choices of 32 unknown bit locations out of the > 32 that you provide.
I'm fairly new to hardware design and I'm not sure how to approach this problem. I'm working with a 64 bit wide stream that also has End Of Packet and Start of Packet signals. I need to find a particular byte sequence at an offset from SOP. The goal is to pass the stream to another module, and every time SOP is asserted, a match signal will tell the next module whether or not the byte sequence will be found in the incoming packet.
I think I need to shift the signal into a large shift register (16x64 to fit the search space) and do the comparison on those slices. But then it seems I would also need shift registers for SOP and EOP to keep those signals in sync with the data (match would be asserted along with SOP). Am I on the right track, or is there a better approach?
In that case I think you're onto the right idea. If the downstream module must know if the match exists before receiving the SOP, then I would just make a 16 or 17 stage pipeline of all the data and the two control signals.
If that's too many registers for some kind of area constraint, you might consider using a small ram to hold the packets while you're waiting to do the check.