FIFO almost full and empty conditions in Verilog

Suppose I have a FIFO with depth 32 and width 8 bits. There is a valid bit A in each of the 32 locations. If this bit is 1 in all locations we have the full condition, and if it is 0 in all locations we have the empty condition. My requirement is that when bit A is 0 in only one location and 1 in all the others (i.e. around the 30th location being filled), an Almost_full condition should be generated.
Help me out please.
Thanks in Advance.

So you have a 32-bit vector and you want to check that only one of the bits is 0. If speed is not much of a concern, I would use a for loop to do this.
If speed is a concern, it can be done in 5 iterations using a divide-and-check method: check the two 16-bit halves in parallel, then split the half of interest into two 8-bit parts and check them in parallel, then, depending on where the zero is, split that particular 8-bit part into 4-bit parts, and so on.
If at any point you have zeros in both parts, you can stop checking and conclude that almost_full = 0.
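A quick software model of that divide-and-check idea in C (my own sketch; the RTL would unroll the same comparisons into combinational logic). It reports almost_full only when exactly one of the 32 valid bits is 0, and it takes exactly 5 halving steps:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Returns true when exactly one of the 32 valid bits is 0.  Each of the
 * 5 rounds halves the window; if both halves contain a zero we can stop
 * early, because more than one location is still empty. */
static bool almost_full(uint32_t valid)
{
    uint32_t window = ~valid;            /* set bits now mark empty locations */
    if (window == 0)
        return false;                    /* no zero at all: the FIFO is full  */

    for (int width = 16; width >= 1; width >>= 1) {
        uint32_t lo = window & ((1u << width) - 1);
        uint32_t hi = window >> width;
        if (lo && hi)
            return false;                /* zeros in both halves              */
        window = lo ? lo : hi;           /* descend into the half with a zero */
    }
    return true;                         /* exactly one zero bit found        */
}

int main(void)
{
    printf("%d\n", almost_full(~(1u << 30)));          /* one zero  -> 1 */
    printf("%d\n", almost_full(~((1u << 30) | 1u)));   /* two zeros -> 0 */
    printf("%d\n", almost_full(0xFFFFFFFFu));          /* full      -> 0 */
    return 0;
}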

Related

Bounded Buffer problem using semaphores in Linux

I'm trying to understand the bounded buffer problem (producer/consumer) more clearly. As I understand it, one of the solutions to this problem uses 3 semaphores:
1. FULL, which counts the occupied slots in the array
2. EMPTY, which counts the available slots in the array
3. MUTEX, which holds 1 or 0
Could someone explain further what it means, for example, if FULL's count is negative, or EMPTY's count is negative?
Could it be that MUTEX ends up neither 1 nor 0? If so, what does that mean?
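For reference, here is a minimal producer/consumer sketch in C using POSIX semaphores for the three semaphores described above (my own illustration, not part of the original thread). In the classic textbook formulation a negative semaphore value is just bookkeeping for how many threads are blocked on it, so FULL or EMPTY "going negative" means consumers or producers are waiting, and MUTEX only ever moves between 1 and 0 as long as every wait is paired with exactly one post.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N     8        /* buffer capacity (arbitrary for this sketch) */
#define ITEMS 32       /* items to pass through before exiting        */

static int buffer[N];
static int in_pos, out_pos;

static sem_t empty_slots;   /* EMPTY: starts at N */
static sem_t full_slots;    /* FULL:  starts at 0 */
static sem_t mutex;         /* MUTEX: starts at 1 */

static void *producer(void *arg)
{
    (void)arg;
    for (int item = 0; item < ITEMS; ++item) {
        sem_wait(&empty_slots);        /* blocks while the buffer is full  */
        sem_wait(&mutex);              /* enter the critical section       */
        buffer[in_pos] = item;
        in_pos = (in_pos + 1) % N;
        sem_post(&mutex);
        sem_post(&full_slots);         /* one more item for the consumer   */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITEMS; ++i) {
        sem_wait(&full_slots);         /* blocks while the buffer is empty */
        sem_wait(&mutex);
        int item = buffer[out_pos];
        out_pos = (out_pos + 1) % N;
        sem_post(&mutex);
        sem_post(&empty_slots);        /* one more free slot               */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void)
{
    sem_init(&empty_slots, 0, N);
    sem_init(&full_slots, 0, 0);
    sem_init(&mutex, 0, 1);

    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}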

Minimal time to compute the minimal value

I was asked the following question: what is the minimal time needed to compute the minimum value of an unsorted array of 32 integers, given that you have 8 cores and each comparison takes 1 minute? My solution is 6 minutes, assuming that each core operates independently. Divide the array into 8 portions of 4 integers each; the 8 cores concurrently compute the local min of each portion, which takes 3 minutes (3 comparisons per portion). Then 4 cores reduce those 8 local mins to 4: 1 minute. Then 2 cores reduce the 4 to 2: 1 minute. Then 1 core computes the global min of the remaining 2: 1 minute. Therefore the total is 6 minutes. However, it didn't seem to be the answer the interviewer was looking for. So what do you guys think about it? Thank you
If you assume that the program is CPU-bound, which is fairly ridiculous, but seems to be where you were going with your analysis, then you need to decide how to divide the work to gain something by multithreading.
8 pieces of 4 integers each seems arbitrary. Interviewers usually like to see a thought process. Being mathematically general, let us compute total orderings over subsets of the problem. How hard is it to compute a total ordering, and what is the payoff?
Total ordering of N items, picking arbitrarily when two items are equal, requires N*(N-1)/2 comparisons and eliminates (N-1) items. Let's make a table.
N = 2: 1 comparison, 1 elimination.
N = 3: 3 comparisons, 2 eliminations.
N = 4: 6 comparisons, 3 eliminations.
Clearly it's most efficient to work with pairs (N = 2), but the other operations are useful if resources would otherwise be idle.
Minutes 1-3: Eliminate 24 candidates using operations with N = 2, 8 at a time.
Minute 4: Now there are 8 candidates. Keeping N = 2 would leave 4 cores idle. Setting N = 3 uses 2 more cores per operation and yields 1 more elimination. So do two operations with N = 3 and one with N = 2, eliminating 2+2+1 = 5 candidates. Or use 6 cores on a single N = 4 operation and the remaining 2 cores on two N = 2 operations, eliminating 3+1+1 = 5. The result is the same.
Minute 5: Only 3 candidates remain, so set N = 3 for the last round.
If you keep the CPUs busy, it takes 5 minutes, using a mix of pair comparisons and larger total orderings. More energy is spent because this isn't the most comparison-efficient way to solve the problem, but it is faster.
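Here is a rough round-by-round model of that schedule in C (my own sketch, with made-up function names, not from the original answer). Each minute it packs pair, triple, and quad total orderings into the 8 available comparisons and takes the largest count of eliminations; starting from 32 candidates it reaches a single survivor after 5 minutes.

#include <stdio.h>

/* Maximum candidates we can eliminate in one round, given `pool` surviving
 * candidates and a budget of `cores` one-minute comparisons.  We partition
 * candidates into disjoint groups and totally order each group: a group of
 * size n costs n*(n-1)/2 comparisons and eliminates n-1 candidates.  Group
 * sizes 2..4 are enough here, since a size-5 group already costs 10 > 8. */
static int best_round(int pool, int cores)
{
    int best = 0;
    for (int quads = 0; quads * 4 <= pool; ++quads)
        for (int triples = 0; quads * 4 + triples * 3 <= pool; ++triples)
            for (int pairs = 0; quads * 4 + triples * 3 + pairs * 2 <= pool; ++pairs) {
                int cost = quads * 6 + triples * 3 + pairs;
                int elim = quads * 3 + triples * 2 + pairs;
                if (cost <= cores && elim > best)
                    best = elim;
            }
    return best;
}

int main(void)
{
    int candidates = 32, cores = 8, minutes = 0;
    while (candidates > 1) {
        candidates -= best_round(candidates, cores);
        printf("after minute %d: %d candidates remain\n", ++minutes, candidates);
    }
    return 0;
}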
I'm going to assume that comparing two "integers" is a black box that takes 1 minute to complete, but we can cache those comparisons and only do any particular comparison once.
There's not much you can do until you're down to 8 candidates (3 minutes). But you don't want to leave cores sitting idle if you can help it. Let's say that the candidates are numbered 1 through 8. Then in minute 4 you can compare:
1v2 3v4 5v6 7v8 AND 1v5 2v6 3v7 4v8
If we're lucky, this eliminates 6 candidates, and we can use minute 5 to pick the winner.
If we're not lucky, this leaves 4 candidates (for example, 1, 3, 6, and 8), and that step didn't gain us anything over the original approach. In minute 5, we need to throw everything at it (to beat the original approach). But there are 8 cores, and C(4,2)=6 possible pairings. So we can make every possible comparison (and leave 2 cores idle), and get our winner in 5 minutes.
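To convince yourself that minute 4 can never leave more than 4 candidates alive, a quick brute force over every ordering of the 8 values against that comparison schedule is enough. This is my own sketch in C, not part of the original answer:

#include <stdio.h>

/* The eight minute-4 comparisons from above, on candidates 0..7
 * (i.e. 1v2 3v4 5v6 7v8 and 1v5 2v6 3v7 4v8). */
static const int cmp[8][2] = {
    {0,1},{2,3},{4,5},{6,7},
    {0,4},{1,5},{2,6},{3,7}
};

static int worst;

/* Count how many candidates lose none of their comparisons (smaller wins). */
static int survivors(const int *rank)
{
    int alive = 0;
    for (int c = 0; c < 8; ++c) {
        int wins_all = 1;
        for (int k = 0; k < 8; ++k) {
            int a = cmp[k][0], b = cmp[k][1];
            if ((c == a && rank[a] > rank[b]) || (c == b && rank[b] > rank[a]))
                wins_all = 0;            /* candidate c lost this comparison */
        }
        alive += wins_all;
    }
    return alive;
}

/* Try every assignment of the distinct values 0..7 to the 8 candidates. */
static void search(int *rank, int used, int depth)
{
    if (depth == 8) {
        int alive = survivors(rank);
        if (alive > worst)
            worst = alive;
        return;
    }
    for (int v = 0; v < 8; ++v) {
        if (used & (1 << v))
            continue;
        rank[depth] = v;
        search(rank, used | (1 << v), depth + 1);
    }
}

int main(void)
{
    int rank[8];
    search(rank, 0, 0);
    printf("worst-case survivors after minute 4: %d\n", worst);   /* prints 4 */
    return 0;
}

Since no two survivors were compared with each other in minute 4, the C(4,2)=6 minute-5 comparisons are all new, and they fit comfortably on 8 cores.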
Those are really big integers, too big to fit into CPU cache, so multithreading doesn't really help you — this problem is I/O bound. (I suppose it depends on the specifics of the I/O bottleneck, but let's not pick nits.)
Since you need exactly N-1 comparisons, the answer is 31.

CRC16 collision (2 CRC values of blocks of different size)

The Problem
I have a textfile which contains one string per line (linebreak \r\n). This file is secured using CRC16 in two different ways.
CRC16 of blocks of 4096 bytes
CRC16 of blocks of 32768 bytes
Now I have to modify one of these 4096 byte blocks so that it (the block)
contains a specific string
does not change the size of the text file
has the same CRC value as the original block (the same goes for the 32k block that contains this 4k block)
Apart from those limitations I may make any modifications to the block that are needed to satisfy them, as long as the file itself does not break its format. I think it is best to use one of the completely filled 4k blocks, not the last block, which could be very short.
The Question
How should I start to solve this problem? The first thing I would come up with is some kind of brute force, but wouldn't it take extremely long to find changes that leave both CRC values the same? Is there perhaps a mathematical way to solve it?
It should be done in seconds, or at most a few minutes.
There are math ways to solve this but I don't know them. I'm proposing a brute-force solution:
A block looks like this:
SSSSSSSMMMMEEEEEEE
Each character represents a byte. S = start bytes, M = bytes you can modify, E = end bytes.
After every byte added to the CRC it has a new internal state. You can reuse the checksum state up to the position that you modify: you only need to recalculate the checksum over the modified bytes and everything after them. So calculate the CRC over the S part only once.
You don't need to recompute the following bytes either. You just need to check whether the CRC's internal state at that position is the same or different after the modification you made. The bytes after M are unchanged, and each CRC update step is invertible, so if the state after S+M' equals the state after S+M the CRC of the entire block is the same, and if it differs the CRC of the entire block differs too, so you can abort that trial. In other words, compute the CRC of just the S+M' part (M' being the modified bytes); if it equals the state of CRC(S+M), you won.
That way you have much less data to go through and a recent desktop or server can do the 2^32 trials required in a few minutes. Use parallelism.
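A rough C sketch of that state-reuse search (my own illustration: I'm assuming the common CRC-16/ARC polynomial 0xA001 with a zero initial value, and the block layout and names are made up). It plants a short string in the modifiable region and then brute-forces 2 filler bytes so that the CRC state after that region, and therefore the CRC of the whole 4K block, is unchanged. Satisfying the enclosing 32K block's CRC at the same time needs more free bytes and is where the 2^32 trials mentioned above come from, because both 16-bit states have to match.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise CRC-16/ARC (poly 0xA001, reflected, init 0x0000); swap in
 * whatever CRC-16 variant the file actually uses. */
static uint16_t crc16_update(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int b = 0; b < 8; ++b)
            crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : (crc >> 1);
    }
    return crc;
}

int main(void)
{
    uint8_t block[32];
    memset(block, 'x', sizeof block);       /* toy original block            */

    /* Treat bytes 16..21 as the modifiable M part.  We want to plant "TEXT"
     * there and then pick the last 2 bytes of M so that the CRC state after
     * byte 21 is restored; everything after byte 21 stays as it was, so the
     * whole block's CRC is unchanged. */
    uint16_t after_s = crc16_update(0, block, 16);              /* S only */
    uint16_t target  = crc16_update(after_s, block + 16, 6);    /* S + M  */

    uint8_t m[6] = { 'T', 'E', 'X', 'T', 0, 0 };
    for (uint32_t filler = 0; filler <= 0xFFFF; ++filler) {
        m[4] = (uint8_t)filler;
        m[5] = (uint8_t)(filler >> 8);
        if (crc16_update(after_s, m, 6) == target) {
            printf("filler bytes %02X %02X keep the 4K CRC intact\n",
                   m[4], m[5]);
            break;
        }
    }
    return 0;
}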
Take a look at spoof.c. That will directly solve your problem for the CRC of the 4K block. However you will need to modify the code to solve the problem simultaneously for both the CRC of the 4K block and the CRC of the enclosing 32K block. It is simply a matter of adding more equations to solve. The code is extremely fast, running in O(log(n)) time, where n is the length of the message.
The basic idea is that you will need to solve 32 linear equations over GF(2) in 32 or more unknowns, where each unknown is a bit location that you are permitting to be changed. It is important to provide more than 32 unknowns with which to solve the problem, since if you pick exactly 32, it is not at all unlikely that you will end up with a singular matrix and no solution. The spoof code will automatically find non-singular choices of 32 unknown bit locations out of the > 32 that you provide.

bitshift large strings for encoding QR Codes

As an example, suppose a QR Code data stream contains 55 data words (each one byte in length) and 15 error correction words (again one byte). The data stream begins with a 12 bit header and ends with four 0 bits. So, 12 + 4 bits of header/footer and 15 bytes of error correction, leaves me 53 bytes to hold 53 alphanumeric characters. The 53 bytes of data and 15 bytes of ec are supplied in a string of length 68 (str68). The problem seems simple enough - concatenate 2 bytes of (right-shifted) header data with str68 and then left shift the entire 70 bytes by 4 bits.
This is the first time in many years of programming that I have ever needed to do something like this. I am a C and bit-shifting noob, so please be gentle... I have done a little investigating and so far have not been able to figure out how to bit-shift 70 bytes of data; any help would be greatly appreciated.
Larger QR codes can hold 2000 bytes of data...
You need to look at this 4 bits at a time.
The first 4 bits you need to worry about are the lower bits of the first byte. Fortunately this is an easy case because they need to end up in the upper bits of the first byte.
The next 4 bits you need to worry about are the upper bits of the second byte. These need to end up as the lower bits of the first byte.
The next 4 bits you need to worry about are the lower bits of the second byte. But fortunately you already know how to do this because you already did it for the first byte.
You continue in this vein until you have dealt with the lower 4 bits of the 70th byte.
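In C that nibble walk collapses to a short loop. Here is a sketch (my own function name; it treats the buffer as one big-endian bit string, discards the 4 bits that fall off the front, and shifts a zero nibble in at the end):

#include <stdint.h>
#include <stdio.h>

/* Shift an array of `len` bytes left by 4 bits. */
static void shift_left_4(uint8_t *buf, size_t len)
{
    if (len == 0)
        return;
    for (size_t i = 0; i + 1 < len; ++i)
        buf[i] = (uint8_t)((buf[i] << 4) | (buf[i + 1] >> 4));
    buf[len - 1] = (uint8_t)(buf[len - 1] << 4);
}

int main(void)
{
    uint8_t qr[4] = { 0x01, 0x23, 0x45, 0x67 };   /* stand-in for the 70 bytes */
    shift_left_4(qr, sizeof qr);
    for (size_t i = 0; i < sizeof qr; ++i)
        printf("%02X ", qr[i]);                   /* prints 12 34 56 70 */
    printf("\n");
    return 0;
}

For the QR case you would build the 70-byte buffer (2 header bytes plus str68) first and then call the shift once on the whole buffer.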

Variable substitution faster than in-line integer in Vic-20 basic?

The following two (functionally equivalent) programs are taken from an old issue of Compute's Gazette. The primary difference is that program 1 puts the target base memory locations (7680 and 38400) in-line, whereas program 2 assigns them to a variable first.
Program 1 runs about 50% slower than program 2. Why? I would think that the extra variable retrieval would add time, not subtract it!
10 PRINT"[CLR]":A=0:TI$="000000"
20 POKE 7680+A,81:POKE 38400+A,6:IF A=505 THEN GOTO 40
30 A=A+1:GOTO 20
40 PRINT TI/60:END
Program 1
10 PRINT "[CLR]":A=0:B=7600:C=38400:TI$="000000"
20 POKE B+A,81:POKE C+A,6:IF A=505 THEN GOTO 40
30 A=A+1:GOTO 20
40 PRINT TI/60:END
Program 2
The reason is that BASIC is fully interpreted here, so the strings "7680" and "38400" need to be converted to binary integers EVERY TIME line 20 is reached (506 times in this program). In program 2, they're converted once and stored in B and C. So as long as the search-for-and-fetch of B is faster than convert-string-to-binary, program 2 will be faster.
If you were to use a BASIC compiler (not sure if one exists for VIC-20, but it would be a cool retro-programming project), then the programs would likely be the same speed, or perhaps 1 might be slightly faster, depending on what optimizations the compiler did.
It's from page 76 of this issue: http://www.scribd.com/doc/33728028/Compute-Gazette-Issue-01-1983-Jul
I used to love this magazine. It actually says a 30% improvement. Look at what's happening in program 2 and it becomes clear: because you are looping a lot, using variables means the work of setting up those memory addresses is done up front. With the slower approach, each iteration has to re-evaluate the constants highlighted below as part of calculating the memory address:
POKE 7680+A,81:POKE 38400+A
This is just the nature of the BASIC Interpreter on the VIC.
Accessing the first defined variable will be fast; the second will be a little slower, etc. Parsing multi-digit constants requires the interpreter to perform repeated multiplication by ten. I don't know what the exact tradeoffs are between variables and constants, but short variable names use less space than multi-digit constants. Incidentally, the constant zero may be parsed more quickly if written as a single decimal point (with no digits) than written as a digit zero.
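As a rough illustration of those two paths in C (my own sketch, not the actual VIC-20 ROM routines): an in-line constant has to be re-converted digit by digit on every pass over the line, while a variable reference is a short name lookup whose value is already in numeric form.

#include <stdio.h>
#include <string.h>

/* In-line constant: one multiply-by-ten and add per digit, every time the
 * statement is interpreted. */
static double parse_constant(const char *s)
{
    double value = 0.0;
    while (*s >= '0' && *s <= '9')
        value = value * 10.0 + (*s++ - '0');
    return value;
}

/* Variable reference: a linear scan of the variable table by name, after
 * which the already-converted value is used directly. */
struct variable { const char *name; double value; };

static double fetch_variable(const struct variable *table, int count,
                             const char *name)
{
    for (int i = 0; i < count; ++i)
        if (strcmp(table[i].name, name) == 0)
            return table[i].value;
    return 0.0;                          /* undefined variables read as 0 */
}

int main(void)
{
    struct variable vars[] = { { "B", 7680.0 }, { "C", 38400.0 } };
    printf("constant: %.0f  variable: %.0f\n",
           parse_constant("38400"), fetch_variable(vars, 2, "C"));
    return 0;
}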
