I am taking a class in microprocessing, and having some trouble writing a program that will hold a value in a port for two seconds before moving on to the next port.
Can any one help this make more sense?
I have thought of using NOP but realized thats a bit unrealistic, I have tried ACALL DELAY but for some reason its pulling up as an unknown command.
I am stumped at this point and would appreciate any help I could get.
I am using the DS89C450 With a clock of 11 MHz, i've tried asking the professor and he tells me its a peice of cake you should have this no problem, but reading and writing code is breand new to me ive only been doing it for two weeks. when i look at the book its almost like it written in chinese its hard to make sense of it, my fellow class mates are just as stummped as i am, i figured my final resort would be to ask someone online that might of had a similar problem or someone who has a little more insight that might be able to pont me in the right direction.
I know i need to load each port with the specified value my problems lies in the switching of the ports giving them the 2 second delay.
My program look likes this MOV P0, #33H MOV P1, #7FH MOV P2, B7H MOV P3, EFH so with these four ports being loaded with these values i need P0 to go to P1, P1-P2 and so on when getting to P3 its value needs to go to P0 and loop it all. i was going to use SJMP to loop it back to the start so the program is always running
While doing this there is the two second delay where each value only stays in each port for only two seconds thats what still fuzzy, does the rest sound right ?
i have done something similar in PIC 16f84 micro-controller
to make a delay you have 2 ways either use interrupts or loops
since you know the Instructions_per_second you can use a loop to generate the required number of instructions that takes the required time
this link illustrates how to determine the loop indexes (as you might need nested loops if the number of instructions required is required .. in PIC i had to make 1 million instruction to make a delay of 1 second)
I've never done this with that particular chip (and I don't know the assembly syntax it supports), but a pseudocode approach would be something like this:
Load initial values into ports
Initialize counter with (delay in seconds * clock ticks per second) / (clock ticks in loop)
While counter != 0
Decrement counter
Swap port values:
P3 -> temp, P2 -> P3, P1 -> P2, P0 -> P1, temp -> P0
Loop (4 times?)
I think this is all you really need for structure. Based on my 10 minute reading on 8051 assembly, the delay loop would look like:
MOV A, b6h ; ~91 ticks/sec # 11 ms/tick
DELAY: DEC A
JNZ DELAY ; NOP-type delay loop
Related
I have 2 different clocks, one for reading and one for writing. I am using gray-code to synchronize the pointers with an additional 2 flip-flops for synchronization on the differnt clock of the input signal.
The articles that I have read indicate how to determine the full and empty signal using gray code by comparing the 2MSB for full state and equality for empty state.
However, I need to get the number of elements in the buffer and not just the full or empty signals. Is this possible to do with gray code?
In a comment you ask about the common clock and mentioned that your depth is not a power of two.
First : Edit your original post and add that question and the information.
Second: In an a-synchronous FIFO there is no common clock. The write operations are all run from the write clock. The read operations are all run from the read clock. The critical part is to exchange information between the clock domains. That is where the gray code comes in.
Third: An a-synchronous FIFO uses gray code because only one bit changes at a time. Important there is that the process is circular. Thus the difference between your last and your first value also only differs by one bit:
Counter Gray-code
000 000
001 001
010 011
011 010
100 110
101 111
110 101
111 100 <-- Last
000 000 <-- First again
This works if and only if the depth (and thus the counters) are a power of two. Therefore an a-synchronous FIFO always has a depth which is a power of two.
If you must have a different depth you can add a synchronous FIFO to the beginning or the end. However if you think about it: a FIFO is just an elastic buffer. The behavior if it is e.g. 16 entries deep or 12 entries is not different, other then that you have the potential to store more values.
Last: As supercat said: You convert from binary to Gray code, cross to the other clock domain, then convert Gray code to binary again.
In the end clock domain you can safely compare read and write counters to determine the fill-level of the FIFO.
If the level is needed on both read and write side you have to implement this process twice, once in each clock domain.
The most understandable way to compute the difference between two gray-code values is to synchronize them with a common clock, convert them to binary, and then do an ordinary binary subtraction on them. While it may be possible to design a fully-combinatorial circuit that would compute the difference between two gray-code values in such a way that if all bits of one particular value are stable, and one bit in the other value changes, only one bit in the output would change and all others would remain stable, such a design would be much more complicated than one which simply synchronizes both counters, converts to binary, and subtracts.
I'm working with a serial protocol. Messages are of variable length that is known in advance. On both transmission and reception sides, I have the message saved to a shift register that is as long as the longest possible message.
I need to calculate CRC32 of these registers, the same as for Ethernet, as fast as possible. Since messages are variable length (anything from 12 to 64 bits), I chose serial implementation that should run already in parallel with reception/transmission of the message.
I ran into a problem with organization of data before calculation. As specified here , the data needs to be bit-reversed, padded with 32 zeros and complemented before calculation.
Even if I forget the part about running in parallel with receiving or transmitting data, how can I effectively get only my relevant message from max-length register so that I can pad it before calculation? I know that ideas like
newregister[31:0] <= oldregister[X:0] // X is my variable length
don't work. It's also impossible to have the generate for loop clause that I use to bit-reverse the old vector run variable number of times. I could use a counter to serially move data to desired length, but I cannot afford to lose this much time.
Alternatively, is there an operation that would directly give me the padded and complemented result? I do not even have an idea how to start developing such an idea.
Thanks in advance for any insight.
You've misunderstood how to do a serial CRC; the Python question you quote isn't relevant. You only need a 32-bit shift register, with appropriate feedback taps. You'll get a million hits if you do a Google search for "serial crc" or "ethernet crc". There's at least one Xilinx app note that does the whole thing for you. You'll need to be careful to preload the 32-bit register with the correct value, and whether or not you invert the 32-bit data on completion.
EDIT
The first hit on 'xilinx serial crc' is xapp209, which has the basic answer in fig 1. On top of this, you need the taps, the preload value, whether or not to invert the answer, and the value to check against on reception. I'm sure they used to do all this in another app note, but I can't find it at the moment. The basic references are the Ethernet 802.3 spec (3.2.8 Frame check Sequence field, which was p27 in the original book), and the V42 spec (8.1.1.6.2 32-bit frame check sequence, page 311 in the old CCITT Blue Book). Both give the taps. V42 requires a preload to all 1's, invert of completion, and gives the test value on reception. Warren has a (new) chapter in Hacker's Delight, which shows the taps graphically; see his website.
You only need the online generators to check your solution. Be careful, though: they will generally have different preload values, and may or may not invert the result, and may or may not be bit-reversed.
Since X is a viarable, you will need to bit assignments with a for-loop. The for-loop needs to be inside an always block and the for-loop must static unroll (ie the starting index, ending index, and step value must be constants).
for(i=0; i<32; i=i+1) begin
if (i<X)
newregister[i] <= oldregister[i];
else
newregister[i] <= 1'b0; // pad zeros
end
Thank you for reading this and for all of your help. Anyway...I am trying to implement a crc16 with polynomial x^16 + x^12 + x^5 + 1 in verilog. The problem I have encountered is that I don't get the entire packet of data at one point in time. I get a 32 bit word at a time and the number of words is dynamic but is at least 4 words and can be as high as 16384 words or higher. The time is not much of an issue because I am running on a 150 MHz clk and the input is coming in at most a 33 MHz clk but may be a 10 MHz. This does not really affect me because I am first accepting the data via a FIFO.
I have been trying to develop an FSM but have really hit a roadblock. One idea is for me to wait for all the data and then just input the entire thing as one big data packet; however, this seems really inefficient and I just don't think I need to do this. Plus it could take up valuable resources. Another way I was playing with was to input the first word and do the XOR operation. Then when the input data only has 1 to 2 bits left that are not xored (not sure if that is worded correctly) I would input the next word. Upon the input I would continue to compute the CRC followed by another input until the last word is imputed into the module.
With this method I would need to implement a counter or a shift register in some fashion. Anyway, any help would be nice. This goes into a command parser/packet parser. Thank you so much for your help.
A CRC calculation doesn't need to be done serially 1-bit at a time. You can essentially "unroll" the calculation to come up with the individual equations for each bit of a parallel CRC generator. With that, you can create a CRC generator that processes 32-bits of input data at a time, matching your datapath width. This should simplify your design as well as make it higher performance (processing each bit serially wouldn't meet your throughput requirements anyway, unless you don't mind holding off incoming data while the hw generates the CRC).
I'm writing a simple GB emulator (wow, now that's something new, isn't it), since I'm really taking my first steps in emu.
What i don't seem to understand is how to correctly implement the CPU cycle and unconditional jumps.
Consider the command JP nn (unconditional jump to memory adress pointed out), like JP 1000h, if I have a basic loop of:
increment PC
read opcode
execute command
Then after the JP opcode has been read and the command executed (read 1000h from memory and set PC = 1000h), the PC gets incremented and becomes 1001h, thus resulting in bad emulation.
tl;dr How do you emulate jumps in emulators, so that PC value stays correct, when having cpu loops that increment PC?
The PC should be incremented as an 'atomic' operation every time it is used to return a byte. This means immediate operands as well as opcodes.
In your example, the PC would be used three times, once for the opcode and twice for the two operand bytes. By the time the CPU has fetched the three bytes and is in a position to load the PC, the PC is already pointing to the next instruction opcode after the second operand but, since actually implementing the instruction reloads the PC, it doesn't matter.
Move increment PC to the end of the loop, and have it performed conditionally depending on the opcode?
I know next to nothing about emulation, but two obvious approaches spring to mind.
Instead of hardcoding PC += 1 into the main loop, let the evaluation if each opcode return the next PC value (or the offset, or a flag saying whether to increment it, or etc). Then the difference between jumps and other opcodes (their effect on the program counter) is definable along with everything else about them.
Knowing that the main loop will always increment the PC by 1, just have the implementation of jumps set the PC to target - 1 rather than target.
The following two (functionally equivalent) programs are taken from an old issue of Compute's Gazette. The primary difference is that program 1 puts the target base memory locations (7680 and 38400) in-line, whereas program 2 assigns them to a variable first.
Program 1 runs about 50% slower than program 2. Why? I would think that the extra variable retrieval would add time, not subtract it!
10 PRINT"[CLR]":A=0:TI$="000000"
20 POKE 7680+A,81:POKE 38400+A,6:IF A=505 THEN GOTO 40
30 A=A+1:GOTO 20
40 PRINT TI/60:END
Program 1
10 PRINT "[CLR]":A=0:B=7600:C=38400:TI$="000000"
20 POKE B+A,81:POKE C+A,6:IF A=505 THEN GOTO 40
30 A=A+1:GOTO 20
40 PRINT TI/60:END
Program 2
The reason is that BASIC is fully interpreted here, so the strings "7680" and "38400" need to be converted to binary integers EVERY TIME line 20 is reached (506 times in this program). In program 2, they're converted once and stored in B. So as long as the search-for-and-fetch of B is faster than convert-string-to-binary, program 2 will be faster.
If you were to use a BASIC compiler (not sure if one exists for VIC-20, but it would be a cool retro-programming project), then the programs would likely be the same speed, or perhaps 1 might be slightly faster, depending on what optimizations the compiler did.
It's from page 76 of this issue: http://www.scribd.com/doc/33728028/Compute-Gazette-Issue-01-1983-Jul
I used to love this magazine. It actually says a 30% improvement. Look at what's happening in program 2 and it becomes clear, because you are looping a lot using variables the program is doing all the memory allocation upfront to calculate memory addresses. When you do the slower approach each iteration has to allocate memory for the highlighted below as part of calculating out the memory address:
POKE 7680+A,81:POKE 38400+A
This is just the nature of the BASIC Interpreter on the VIC.
Accessing the first defined variable will be fast; the second will be a little slower, etc. Parsing multi-digit constants requires the interpreter to perform repeated multiplication by ten. I don't know what the exact tradeoffs are between variables and constants, but short variable names use less space than multi-digit constants. Incidentally, the constant zero may be parsed more quickly if written as a single decimal point (with no digits) than written as a digit zero.