Optimize this comparator for better synthesis - verilog

I have a module which is basically a LUT whose input is 64 bits. The LUT always block consists of a case statement which compares the input to over 200 different integers. The default case in the case statement checks if the input is > 100 or not before assigning the output a default value.
My problem is that synthesis produces a 65-bit comparator, and I was wondering if there are better ways of writing this so that a large comparator isn't synthesized.
Here's my code snippet:
```verilog
always @(in)
  case (in)
    -100: out <= 495050;
    -99:  out <= 500000;
    // ... (remaining table entries)
    99:   out <= 99500000;
    100:  out <= 99504950;
    default:
      if (in > 100)
        out <= 99504950;
      else
        out <= 495050;
  endcase
```

Assuming that `in` is a 64-bit signed number, you can chop it so that you only have to compare the lowest few bits, and then do quick checks to see whether the number is outside the range you need.
For example, truncate `in` to 8 bits and assign it to an 8-bit signed register. That lets you represent values between -128 and 127.
You can test whether the full number is larger than 127 with `!in[63] && (|in[62:7])` (the MSB is not set and at least one of the upper bits is 1).
You can test whether the full number is less than -128 with `in[63] && !(&in[62:7])` (the MSB is set and at least one of the upper bits is 0).
Bit 7 has to be part of both checks, because a value fits in 8 signed bits only if bits 63 down to 7 are all equal. For example, 200 has `in[62:8]` all zero but does not fit, since bit 7 is set.
Now you know three things:
- whether the number is larger than 127,
- whether it is between -128 and 127, and
- whether it is less than -128.

You can then use a small 8-bit LUT for the in-between case, or fall back to your default values if the number is in either of the out-of-range cases.
Note: I would expect a good synthesizer to do this automatically, but if you look at the generated netlist and it's too large, you can try this to see if it gives you a better result.
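A minimal sketch of the range-split idea follows. It assumes `in` is a signed 64-bit input and `out` is 32 bits wide (neither width is given in the question); the module name `lut_split` is hypothetical, and only the table entries shown in the question are filled in. Two cheap checks on the upper bits replace the wide comparator, and the `case` only ever looks at the low byte.

```verilog
// Sketch only: `in` assumed signed 64-bit, `out` assumed 32-bit.
// Table entries between -98 and 98 are omitted.
module lut_split (
  input  wire signed [63:0] in,
  output reg         [31:0] out
);
  wire signed [7:0] in8 = in[7:0];            // low byte, valid when -128..127
  wire too_big   = !in[63] &&  (|in[62:7]);   // in > 127
  wire too_small =  in[63] && !(&in[62:7]);   // in < -128

  always @* begin
    if (too_big)
      out = 32'd99504950;                     // default for large inputs
    else if (too_small)
      out = 32'd495050;                       // default for small inputs
    else
      case (in8)                              // small 8-bit LUT
        -8'sd100: out = 32'd495050;
        -8'sd99:  out = 32'd500000;
        // ... remaining table entries ...
        8'sd99:   out = 32'd99500000;
        8'sd100:  out = 32'd99504950;
        // 101..127 and -128..-101 only need an 8-bit compare here
        default:  out = (in8 > 8'sd100) ? 32'd99504950 : 32'd495050;
      endcase
  end
endmodule
```

Inputs such as 200 or -129, which the low byte alone would misclassify, are caught by `too_big` and `too_small` before the `case` is consulted.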

It seems like you have precalculated a table of function values for inputs x in [-100, 100]. If so, it would be better to store them in memory one after another, starting from some base address. To read a value, put base + x + 100 on the address bus and read back the value you need.
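A rough sketch of that memory-based approach, assuming the table has 201 entries for x = -100..100, that x has already been range-checked and narrowed to 8 signed bits, and that a synchronous ROM read (one cycle of latency) is acceptable. `lut_rom` and `BASE` are hypothetical names, and only the four entries from the question are filled in; in practice you would load the whole table, for example with `$readmemh`.

```verilog
// Sketch: table stored consecutively from address BASE, read at BASE + x + 100.
module lut_rom #(
  parameter BASE = 0                  // hypothetical base address of the table
) (
  input  wire              clk,
  input  wire signed [7:0] x,         // assumed already in -100..100
  output reg        [31:0] out
);
  reg [31:0] mem [0:BASE+200];

  initial begin : init
    integer i;
    for (i = 0; i <= BASE + 200; i = i + 1)
      mem[i] = 32'd0;
    mem[BASE + 0]   = 32'd495050;     // x = -100
    mem[BASE + 1]   = 32'd500000;     // x = -99
    // ... remaining entries ...
    mem[BASE + 199] = 32'd99500000;   // x = 99
    mem[BASE + 200] = 32'd99504950;   // x = 100
  end

  // x is sign-extended, so x = -100 maps to address BASE + 0
  always @(posedge clk)
    out <= mem[BASE + x + 100];
endmodule
```

The address arithmetic replaces the 200-way comparison in the `case` with a single adder feeding the ROM address.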
If you do need a gigantic multiplexer, you may want to try the `parallel_case` synthesis directive.
As for the comparator in the default branch, I have the same problem, so I am waiting for an answer too.

Is there a method in verilog to start reading ROM data from a specific address?

I've designed a ROM for coefficients and an up-down counter to read these coefficients one by one. There are two possible starting points: one set of coefficients for type 1 and another set for type 2. For example, for type 1 I want to start from address 0 and for type 2 from address 30. I remember that someone told me this is possible using some # or something, but I don't remember the actual way to do it.
This is my counter code:
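```verilog
module UDcounter(input clk, rst, up, GItype,
                 output reg [5:0] addr);
  always @(posedge clk, posedge rst)
    if (rst)
      addr <= 6'b000000;        // reset value assumed; not shown in the post
    else if (GItype) begin      // assume 1 is a long GI type
      // addr=6'b000000;
      if (up) addr <= addr + 1;
      else    addr <= addr - 1;
    end
    else begin                  // for short GI
      if (up) addr <= addr + 1;
      else    addr <= addr - 1;
    end
endmodule
```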
The problem is that every clock cycle it starts addressing from addr=0 (for example), so the output address is always 1 (from the +1 line).

So what I understood from your question is that you want to design a ROM which will store coefficients.
Going by your question, I assume that you have two types of coefficients, type a and type b, stored in the ROM; say the starting address for type a is 0 and for type b is 30. To access the ROM you would want two counters, addr_ptr_a and addr_ptr_b, which act as address pointers. Assuming the ROM has 60 address locations, addr_ptr_a counts from 0 to 29 and addr_ptr_b counts from 30 to 59.
The GItype signal can be used to determine which counter to enable.
I am assuming a sequential read operation; for a random read operation you would need separate logic to generate the read address.
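A minimal sketch of the two-pointer scheme, assuming GItype = 1 selects type a (start address 0) as in the question's comment, a 60-entry ROM, and that each pointer wraps around inside its own region when it runs off either end (the wrap behaviour is an assumption; the answer does not specify it). `coef_addr_gen`, `A_START` and the other constants are hypothetical names. Because each pointer is loaded only on `rst`, it keeps its position between clocks instead of restarting from its start address every cycle.

```verilog
// Sketch: two address pointers, one per coefficient region; GItype picks
// which one is enabled and which one drives the ROM address.
module coef_addr_gen (
  input  wire       clk,
  input  wire       rst,
  input  wire       up,       // 1 = count up, 0 = count down
  input  wire       GItype,   // assumed: 1 = type a (long GI), 0 = type b
  output wire [5:0] addr
);
  localparam [5:0] A_START = 6'd0,  A_END = 6'd29;   // type a region
  localparam [5:0] B_START = 6'd30, B_END = 6'd59;   // type b region

  reg [5:0] addr_ptr_a, addr_ptr_b;

  always @(posedge clk or posedge rst)
    if (rst) begin
      addr_ptr_a <= A_START;
      addr_ptr_b <= B_START;
    end else if (GItype) begin
      // only the type a pointer moves; wrap inside 0..29
      if (up) addr_ptr_a <= (addr_ptr_a == A_END)   ? A_START : addr_ptr_a + 6'd1;
      else    addr_ptr_a <= (addr_ptr_a == A_START) ? A_END   : addr_ptr_a - 6'd1;
    end else begin
      // only the type b pointer moves; wrap inside 30..59
      if (up) addr_ptr_b <= (addr_ptr_b == B_END)   ? B_START : addr_ptr_b + 6'd1;
      else    addr_ptr_b <= (addr_ptr_b == B_START) ? B_END   : addr_ptr_b - 6'd1;
    end

  assign addr = GItype ? addr_ptr_a : addr_ptr_b;
endmodule
```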

Loop Convergence - Verilog Synthesis

I am trying to successively subtract a particular number to get the last digit of the number (without division). For example, when q=54, we get q=4 after the loop; likewise, for q=205 the output is q=5.
The iteration should converge logically. However, I am getting an error:
"[Synth 8-3380] loop condition does not converge after 2000 iterations"
I checked the post - Use of For loop in always block. It says that the number of iterations in a loop must be fixed.
Then I tried to implement this loop with a fixed number of iterations as well, like below (just to check whether this at least synthesizes):
But the above does not work either. I get the same error "[Synth 8-3380] loop condition does not converge after 2000 iterations". Logically, it should be 10 iterations as I had declared the value of loopco=8.
Any suggestions on how to implement the above functionality in verilog will be helpful.
That code cannot be synthesized. For synthesis, the loop must have a number of iterations known at compile time, so the tool has to know how many subtractions to make. In this case it can't.
Never forget that for synthesis you are converting a language to hardware. In this case the tool would need to generate hardware for N subtractions, but the value of N is not known.
You are already stating that you are trying to avoid division. That suggests to me you know the generic division operator cannot be synthesized. Trying to work around that using repeated subtraction will not work. You should have been suspicious: if it were that easy, it would have been done by now.
You could build it yourself if you know the upper limit of q (which you do from the number of bits):
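```verilog
wire [5:0] q;
reg  [3:0] rem;
always @( * )
  if (q < 6'd10)
    rem = q;
  else if (q < 6'd20)
    rem = q - 6'd10;
  else if (q < 6'd30)
    rem = q - 6'd20;
  else if (q < 6'd40)
    rem = q - 6'd30;
  else if (q < 6'd50)
    rem = q - 6'd40;
  else if (q < 6'd60)
    rem = q - 6'd50;
  else
    rem = q - 6'd60;
```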
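The same chain can also be written as a `for` loop whose bound is a constant, which is what the synthesis tool needs: at most 6 subtractions of 10 bring any 6-bit `q` (0..63) below 10, so the loop can be unrolled into 6 compare-and-subtract stages. A minimal sketch, with the module name `last_digit` as a hypothetical:

```verilog
// Sketch: last decimal digit of a 6-bit value with a fixed-bound loop.
module last_digit (
  input  wire [5:0] q,
  output reg  [3:0] rem
);
  integer k;
  reg [5:0] r;
  always @* begin
    r = q;
    // constant bound: 63 needs at most 6 subtractions of 10
    for (k = 0; k < 6; k = k + 1)
      if (r >= 6'd10)
        r = r - 6'd10;
    rem = r[3:0];
  end
endmodule
```

The loop only describes 6 identical stages; unlike the `while`-style loop in the question, the number of stages does not depend on the data.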
I just noticed this link, which pops up next to your question and shows this has been asked before:
How to NOT use while() loops in verilog (for synthesis)?

Use of For loop in always block

I am writing Verilog code to calculate the number of digits in a decimal number. In the code below I have initialised the value of c to be equal to a.
I was able to get the simulation results correctly but was unable to synthesise it, and the error is due to 'c=a'. How can I get rid of the error? Is there any other logic to calculate the number of digits?
Error: [Synth 8-3380] loop condition does not converge after 2000 iterations
Code:
```verilog
module numdigits(a,b);
parameter n=100;
input [0:n-1] a;
reg [0:n-1] d,c;
always @(*)
// ... (the rest of the code is not shown)
```
In order for a for loop to be synthesisable, it must be static: that is, the maximum number of iterations round the loop must be fixed. It might seem that there is a maximum number of iterations of your loop, given that a has a fixed number of bits, but remember that your synthesiser doesn't simulate your code, so it cannot tell that.
You need to refactor your code so that the maximum number of loop iterations is fixed. In other words, the loop bound must be fixed, but you can jump out early if you wish (using the disable statement).
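A minimal sketch of a loop with a fixed bound and an early exit through `disable`, assuming a 16-bit input rather than the question's 100 bits so that the constants stay readable. The module name `numdigits_fixed` and the parameters `N` and `MAXD` are hypothetical; `MAXD` must be the number of digits of the largest N-bit value (5 for 65535), and a = 0 is counted as one digit. Instead of dividing `a` down, it compares `a` against 10, 100, 1000, and so on, so no data-dependent loop is needed.

```verilog
// Sketch: decimal digit count with a constant loop bound and early exit.
module numdigits_fixed #(
  parameter N    = 16,   // input width (assumption; the post used n = 100)
  parameter MAXD = 5     // digits in the largest N-bit value (65535 -> 5)
) (
  input  wire [N-1:0] a,
  output reg  [3:0]   b  // number of decimal digits; a = 0 counts as 1
);
  integer k;
  reg [N:0] pow10;       // current threshold 10^k
  always @* begin
    b     = 4'd1;
    pow10 = 10;
    begin : scan
      for (k = 1; k < MAXD; k = k + 1) begin   // at most MAXD-1 steps
        if (a < pow10)
          disable scan;                        // leave early; bound stays fixed
        b     = b + 4'd1;
        pow10 = pow10 * 10;
      end
    end
  end
endmodule
```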

How do you move non-zero elements in an array to the top in a single cycle?

I have the following 8-bit array:
How do I turn it into the following in a single cycle (without iterating over the elements one by one)?
I know how to do it in software (MATLAB), but I'm not sure how to do it with combinational logic.
```matlab
% initialise temporary vectors
TempType  = zeros(maxType,1);
TempStart = zeros(maxType,1);
TempStop  = zeros(maxType,1);
index = 1;
% remove zero elements from the middle
for j = 1:maxType
    if (PreType(j) > 0 && PreStart(j) > 0 && PreStop(j) > 0)
        TempType(index)  = PreType(j);
        TempStart(index) = PreStart(j);
        TempStop(index)  = PreStop(j);
        index = index + 1;
    end
end
```
I think any simplified sorting algorithm can do the job. For example, here is a modified bubble sort solution implemented in a single cycle:
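```systemverilog
module MoveZeros;
  parameter W1 = 8;    // element width
  parameter W2 = 10;   // number of elements
  logic [W1-1:0] in_array [W2-1:0] = '{0,4,0,0,5,0,2,0,0,1};
  logic [W1-1:0] array    [W2-1:0];
  logic [W1-1:0] temp;

  // Unrolled bubble passes: each swap moves a non-zero element one slot
  // toward the high index, so W2 passes are enough to compact all of them.
  always_comb begin
    array = in_array;
    for (int i = W2-1; i >= 0; i--)
      for (int j = W2-1; j > 0; j--)   // j > 0 keeps array[j-1] in range
        if (array[j] == 0 && array[j-1] != 0) begin
          temp       = array[j];
          array[j]   = array[j-1];
          array[j-1] = temp;
        end
  end

  initial #1 $display("array = %p", array);
endmodule
```

The resulting array is '{4, 5, 2, 1, 0, 0, 0, 0, 0, 0}.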
Working example on edaplayground. Depending on your cycle time and the width of your input array (W2), you may want to break this algorithm into multiple cycles.
Synthesis tools unroll loops; therefore, the synthesized circuit will have O(W2^2) comparators and multiplexers, which can explode in size. Hence, for bigger arrays, a multi-cycle solution is the way to go.
This is not an answer, which would take several hours of work, but SO's comments are not up to this sort of question. You should ask on comp.arch.fpga, if it's still alive.
Start by finding a datasheet for one of the old asynchronous fall-through FIFOs; these will include a circuit diagram. You don't really want to do anything like this, because the stage-to-stage handshaking is hairy, and you can't apply all 8 values simultaneously, but it'll give you ideas for a more synchronous implementation. Adapting a fall-through FIFO to do what you want is trivial - just ignore zero inputs.
If you can go up to 8 clock cycles, a more synchronous implementation is easy, with relatively limited hardware.
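As a rough illustration of the multi-cycle end of the trade-off, here is a straightforward scan that examines one input per clock and finishes in 8 cycles. It is a simple sequential sketch, not the FIFO-derived design described above; the flattened port layout (I0 in bits 7:0 up to I7 in bits 63:56) and the names `compact_seq`, `start` and `done` are assumptions.

```verilog
// Sketch: pack non-zero bytes toward Q0, one input examined per clock.
module compact_seq (
  input  wire        clk,
  input  wire        start,     // load in_flat and begin scanning
  input  wire [63:0] in_flat,   // I0 = in_flat[7:0], ..., I7 = in_flat[63:56]
  output reg  [63:0] q_flat,    // Q0 = q_flat[7:0], ..., Q7 = q_flat[63:56]
  output reg         done
);
  reg [63:0] src;
  reg [3:0]  i;     // input currently examined
  reg [3:0]  cnt;   // next free output slot

  always @(posedge clk)
    if (start) begin
      src <= in_flat; q_flat <= 64'd0;
      i   <= 4'd0;    cnt    <= 4'd0;   done <= 1'b0;
    end else if (!done) begin
      if (src[8*i +: 8] != 8'd0) begin
        q_flat[8*cnt +: 8] <= src[8*i +: 8];   // copy into the next free slot
        cnt <= cnt + 4'd1;
      end
      if (i == 4'd7) done <= 1'b1;             // last input examined
      i <= i + 4'd1;
    end
endmodule
```

The hardware is one 8-to-1 read mux, one 1-to-8 write decoder and two small counters, independent of how many inputs are zero.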
One cycle doesn't look too difficult, but will use more hardware. How sure are you that you must do it in one cycle? How much hardware can you use? If you've got a free PLL/DLL I'd be inclined to use that to get an 8x clock.
Actually, with the benefit of more than 2 minutes thought, this seems pretty easy, even in one cycle.
Say you've got 8 registers with your 8 inputs (I0-I7), and 8 output registers (Q0-Q7). Each output register has associated logic which selects an input register for source data. The Q0 selector finds the lowest-numbered I register which contains non-zero data. The Q1 selector finds the next highest I register which contains non-zero data, and so on. Each selector drives a mux which loads the corresponding output register. Q0 requires an 8-1 mux (eight 8-bit inputs from I0-I7, one 8-bit output which goes to the input of Q0). Q1 requires a 7-1 mux (the inputs can only be I1-I7), and so on, until Q7, which doesn't require a mux at all (it can only be driven by I7).
The only smarts are in the selectors which find the source data for each output register. The Q7 selector is trivial; Q7 can only select I7, and only if all of I0-I7 contain non-zero data. Q6 is a bit more complicated, and so on.
If you can't see how to code a selector, ask specifically about that one in a new question, to avoid all the comments.
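The selector scheme can be sketched compactly by giving each input a destination slot equal to the number of non-zero inputs below it; once the loop is unrolled, the synthesizer builds the per-output muxes from that. This is a prefix-count formulation rather than a literal coding of eight separate selectors, and the module name `compact8` and the flattened 64-bit ports (I0 in bits 7:0) are assumptions. As in the description, Q0 receives the lowest-numbered non-zero input and Q7 can only ever receive I7.

```verilog
// Sketch: single-cycle compaction of eight 8-bit inputs toward Q0.
module compact8 (
  input  wire [63:0] in_flat,   // I0 = in_flat[7:0], ..., I7 = in_flat[63:56]
  output reg  [63:0] q_flat     // Q0 = q_flat[7:0], ..., Q7 = q_flat[63:56]
);
  integer i;
  reg [3:0] cnt;                // non-zero inputs seen so far = next Q slot

  always @* begin
    q_flat = 64'd0;             // slots with no non-zero source stay 0
    cnt    = 4'd0;
    for (i = 0; i < 8; i = i + 1)
      if (in_flat[8*i +: 8] != 8'd0) begin
        q_flat[8*cnt +: 8] = in_flat[8*i +: 8];   // Q[cnt] selects I[i]
        cnt = cnt + 4'd1;
      end
  end
endmodule
```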

Rounding floating point numbers in Verilog?

So I am working with 64-bit floating point numbers in Verilog for synthesis, all in the range (-1, 1). Now I am trying to create something like a histogram, which I guess I could do by creating a RAM that works like a 2D array. But I am facing issues with the rounding.
For example, I have a value 0.94394 (floating point). I would like to convert values like this into just a bin index, e.g.
0.82394 = 8 and 0.8862 = 9 (all data are 64-bit floating point),
so that I can access that specific address in the RAM.
What would be the best way to round this? Using another multiplier is too much overhead. Is there some trick I could do by truncating some of the bits? Should I convert them to fixed point?
Two options I can think of:
The simplest is to change your bins so the boundaries are powers of 2. Then you can use some of the bits of the input directly to address your histogram. I would have to look at the floating point format to know which bits to use.
The other possibility is to do a bunch of comparisons to see which bin to put it in.
You would have to do this for both of the coordinates.
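```verilog
reg [4:0] ram_addr;
always @* begin
  // data must compare as a real number here (e.g. a `real` in simulation);
  // in hardware this needs a floating-point or fixed-point comparison
  if (data < -.95)
    ram_addr = 5'd0;
  else if (data < -.85)
    ram_addr = 5'd1;
  // ... one branch per 0.1 step ...
  else if (data < .95)
    ram_addr = 5'd19;
  else
    ram_addr = 5'd20;
end
```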
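For the first option, here is a rough sketch assuming 16 bins of width 1/16 over [0, 1) per sign (instead of the question's 0.1-wide bins) and inputs in IEEE 754 binary64 format with |x| < 1. Because of the exponent, the bin bits are not at a fixed position in the raw word, so the mantissa (with its implicit leading 1) has to be shifted right by an amount derived from the exponent; the result is floor(|x| * 16) without a multiplier. The module name `hist_bin` and the port names are hypothetical, and inputs with |x| >= 1, NaN or infinity are not handled.

```verilog
// Sketch: power-of-two histogram bin from a binary64 value, |x| < 1 assumed.
module hist_bin (
  input  wire [63:0] x,      // IEEE 754 double
  output wire        sign,   // 1 for negative values
  output wire [3:0]  bin     // floor(|x| * 16)
);
  wire [10:0] e   = x[62:52];                // biased exponent
  wire [52:0] mag = {1'b1, x[51:0]};         // 1.mantissa, scaled by 2^52
  // |x| * 16 = mag * 2^(e - 1023 - 52 + 4), so shift right by 1071 - e
  wire [11:0] sh  = 12'd1071 - {1'b0, e};
  // zero/denormal (e == 0) and anything below 1/16 land in bin 0
  wire [52:0] shifted = (e == 11'd0 || sh > 12'd52) ? 53'd0 : (mag >> sh);

  assign sign = x[63];
  assign bin  = shifted[3:0];
endmodule
```

The shifter is much cheaper than a floating-point multiplier, and with finer power-of-two bins only the constant 1071 and the output width change.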
