Converting a 12 MHz system clock signal on an FPGA to a 1 MHz output signal at a 50% duty cycle.
I understand that I need to divide by 2 (keeping a 50/50 duty cycle) to get 6 MHz, then divide by 2 again to get 3 MHz, and then divide by 3 to get 1 MHz. Is this the correct method?
Also, how would I implement this in RTL Verilog code?
Is this the correct method?
No. First, operating on clocks in logic is often difficult to route appropriately, especially in multiple stages. Second, it is especially difficult to divide a clock by 3 and get a 50% duty cycle without either negative-edge or DDR flip-flops, both of which are often unavailable in FPGA fabric.
The correct method is to use your FPGA's clocking resources. Most modern FPGAs will have one or more onboard DLLs or PLLs which can be used to manage clock signals.
On Xilinx parts, these resources are known as the DCM, PLL, and/or MMCM, and can be configured using the Clocking Wizard IP core.
On Altera/Intel parts, these resources can be configured through the PLL and other megafunctions.
On Lattice parts, these resources are known as the sysCLOCK PLL, and can be configured using IPexpress.
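For reference, if the 1 MHz signal is only needed on an output pin or as a clock enable rather than as an internal logic clock, a plain synchronous divider also works here, because 12 divides evenly: toggling the output every 6 input cycles gives 1 MHz at exactly 50% duty cycle. The following is a minimal Verilog sketch of that approach (module and signal names are illustrative), not a substitute for the PLL/MMCM route recommended above.

// Minimal sketch: divide a 12 MHz input by 12 for a 1 MHz, 50% duty output.
// Works because 12/2 = 6 is an integer: toggle clk_1mhz every 6 input cycles.
// Intended for an output pin or clock enable; for an internal clock, prefer
// the dedicated clocking resources described above.
module div12 (
    input  wire clk_12mhz,
    input  wire rst,
    output reg  clk_1mhz
);
    reg [2:0] count;   // counts 0..5

    always @(posedge clk_12mhz) begin
        if (rst) begin
            count    <= 3'd0;
            clk_1mhz <= 1'b0;
        end else if (count == 3'd5) begin
            count    <= 3'd0;
            clk_1mhz <= ~clk_1mhz;   // toggle every 6 cycles -> 1 MHz, 50% duty
        end else begin
            count    <= count + 3'd1;
        end
    end
endmodule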
I have a signal that checks whether data is available in a memory block and then does some computation/logic (the details are irrelevant here).
I want a signal called "START_SIG" to go high X nanoseconds before the first rising edge of a 10 MHz clock cycle. It should only go high if data is available, and further computation follows as needed.
Now, how can this be done? Also, I cannot simply insert a delay, since this must be RTL Verilog and therefore synthesizable on an FPGA (Artix-7 series).
Any suggestions?
I suspect an XY problem: if START_SIG is produced by logic in the same clock domain as your processing, then timing will likely be met without any work on your part (10 MHz is dead slow in FPGA terms). But if you really need to do something like this, there are a few ways (though seriously, you are doing it wrong!).
FPGA logic is usually synchronous to one or more clocks; generally, needing vernier control within a clock period is a sign of doing it wrong.
Use a PLL/MMCM/whatever to generate two clocks, one dead slow at 10 MHz and something much faster, then count the fast one from the previous edge of the 10 MHz clock to get your timing (a rough sketch of this option follows after this list).
Use an MMCM/PLL or such (platform dependent) to generate two 10 MHz clocks with a small phase shift, then gate one of them.
Use a long line of inverter pairs (the KEEP attribute will be your friend; that is the VHDL name, but Verilog will have something similar) and calibrate it against your known clock periodically (it will drift with temperature, day of the week, and sign of the zodiac). This is neat for things like time-to-digital converters, possibly combined with option two for fine trimming. There are shades of ring oscillators about this one, but whatever works.
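As a rough sketch of the first option, assuming both clocks come from the same PLL so their edges are aligned, with a hypothetical 100 MHz fast clock against the 10 MHz slow clock (10 fast cycles per slow period); the module name, ports, and lead time below are made up for illustration:

// Sketch of option 1: count fast-clock cycles between rising edges of the
// slow clock and assert START_SIG a few fast cycles before the next edge.
// Assumes clk_fast (100 MHz) and clk_slow (10 MHz) come from the same PLL,
// so RATIO = 10 fast cycles per slow period. Names and LEAD_TICKS are
// illustrative only; clock-domain details are glossed over.
module early_start #(
    parameter RATIO      = 10,  // fast-clock cycles per slow-clock period
    parameter LEAD_TICKS = 2    // assert this many fast cycles (~20 ns) early
)(
    input  wire clk_fast,
    input  wire clk_slow,
    input  wire data_available,
    output wire START_SIG
);
    reg                     clk_slow_q;
    reg [$clog2(RATIO)-1:0] count;

    always @(posedge clk_fast) begin
        clk_slow_q <= clk_slow;
        if (clk_slow && !clk_slow_q)        // rising edge of the slow clock
            count <= 0;
        else if (count != RATIO - 1)
            count <= count + 1'b1;
    end

    // High for one fast-clock cycle, roughly LEAD_TICKS fast cycles before
    // the next slow rising edge, and only when data is available.
    assign START_SIG = data_available && (count == RATIO - 1 - LEAD_TICKS);
endmodule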
I am using this ALU block diagram as learning material: http://www.righto.com/2013/09/the-z-80-has-4-bit-alu-heres-how-it.html
I am not familiar with electronics. My current understanding is that a clock cycle is needed to move data from one register or latch to another register or latch, possibly through a network of logic gates.
So here is my understanding of what happens for an ADD:
Cycle 1: move the registers to the internal latches
Cycle 2: move the low nibbles of the internal latches to the internal result latch (through the ALU)
Cycle 3, in parallel:
move the high nibbles of the internal latches to the destination register (through the ALU)
move the internal result latch to the register
I think the cycle 3 operations are done in parallel because there are two 4-bit buses (for the high and low nibbles) and the register bus seems to be 8 bits wide.
Per the Z80 datasheet:
The PC is placed on the address bus at the beginning of the M1 cycle. One half clock cycle later the MREQ signal goes active. At this time the address to the memory has had time to stabilize so that the falling edge of MREQ can be used directly as a chip enable clock to dynamic memories. The RD line also goes active to indicate that the memory read data should be enabled onto the CPU data bus. The CPU samples the data from the memory on the data bus with the rising edge of the clock of state T3 and this same edge is used by the CPU to turn off the RD and MREQ signals. Thus, the data has already been sampled by the CPU before the RD signal becomes inactive. Clock state T3 and T4 of a fetch cycle are used to refresh dynamic memories. The CPU uses this time to decode and execute the fetched instruction so that no other operation could be performed at this time.
So it appears mostly to be about memory interfacing to read the opcode rather than actually doing the addition; decode and execution occur entirely within clock states T3 and T4. Given that the Z80 has a 4-bit ALU, it would take two ALU operations to perform an 8-bit addition, which likely explains the use of two cycles.
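To make the two-pass idea concrete, here is a small behavioural Verilog sketch (not the Z80's actual internal circuit) of an 8-bit addition performed as two 4-bit ALU operations, with the carry from the low-nibble pass fed into the high-nibble pass:

// Behavioural illustration only, not the real Z80 datapath: an 8-bit add
// done as two passes through a 4-bit adder, low nibble first, then the
// high nibble together with the carry produced by the first pass.
module add8_via_nibbles (
    input  wire [7:0] a,
    input  wire [7:0] b,
    output wire [7:0] sum,
    output wire       carry_out
);
    wire [4:0] low;    // 4-bit low result plus carry out of bit 3
    wire [4:0] high;   // 4-bit high result plus carry out of bit 7

    assign low  = a[3:0] + b[3:0];            // first ALU pass (low nibbles)
    assign high = a[7:4] + b[7:4] + low[4];   // second pass uses the carry

    assign sum       = {high[3:0], low[3:0]};
    assign carry_out = high[4];
endmodule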
I have to design an I/O module for an industrial control system on a CAN bus network.
The I/O pins (10-40 pins) all have to be multi-purpose: digital and analog input and output. Furthermore, the pins have to serve as a communication port when needed: Modbus RTU, Modbus TCP, DALI, etc. (analog input: max 7 channels).
I understand that all of these options need different hardware, such as galvanic isolation or different voltage levels.
Costs have to be as low as possible.
I was thinking of providing this additional hardware as a plug-in module or as an optional sandwich PCB.
My question is: is an FPGA the right choice for this, given the reconfigurable purpose of the I/O pins? (Xilinx, Altera/Intel, and Microsemi have FPGAs with ADCs.)
You didn't specify whether the I/Os have to be reconfigurable at compile time or at runtime. In most cases, you cannot change I/O properties (type, voltage, terminations, etc.) once the HDL code is compiled into an FPGA bitstream.
What is the maximum clock frequency that can be generated with the Altera PLLs on the DE1-SoC board?
I can't find a reference to a maximum PLL frequency in any of the Cyclone V documentation. However, it appears (from my own experimentation) that the Altera PLL megafunction/IP core won't produce a generated clock with a frequency faster than 1.6 GHz (1600 MHz).
That said, I doubt you'll be able to clock any Cyclone V circuitry (even fully pipelined) that quickly.
I am doing some performance profiling for part of my program, and I am trying to measure the execution with the following four methods. Interestingly, they show different results, and I don't fully understand their differences. My CPU is an Intel(R) Core(TM) i7-4770, and the system is Ubuntu 14.04. Thanks in advance for any explanation.
Method 1:
Use the gettimeofday() function; the result is in seconds.
Method 2:
Use the rdtsc instruction similar to https://stackoverflow.com/a/14019158/3721062
Methods 3 and 4 use Intel's Performance Counter Monitor (PCM) API.
Method 3:
Use PCM's
uint64 getCycles(const CounterStateType & before, const CounterStateType & after)
Its description (which I don't quite understand):
Computes the number of core clock cycles when the signal on a specific core is running (not halted).
Returns the number of used cycles (halted cycles are not counted). The counter does not advance under the following conditions:
an ACPI C-state is other than C0 for normal operation
HLT
STPCLK+ pin is asserted
being throttled by TM1
during the frequency switching phase of a performance state transition
The performance counter for this event counts across performance state transitions using different core clock frequencies
Method 4:
Use PCM's
uint64 getInvariantTSC (const CounterStateType & before, const CounterStateType & after)
Its description:
Computes number of invariant time stamp counter ticks.
This counter counts irrespectively of C-, P- or T-states
Two sample runs generate results as follows:
(Method 1 is in seconds. Methods 2-4 are divided by the same number to show a per-item cost.)
0.016489 0.533603 0.588103 4.15136
0.020374 0.659265 0.730308 5.15672
Some observations:
The ratio of Method 1 to Method 2 is very consistent, while the others are not; i.e., 0.016489/0.533603 ≈ 0.020374/0.659265. Assuming gettimeofday() is sufficiently accurate, the rdtsc method exhibits the "invariant" property. (Yes, I read on the Internet that the current generation of Intel CPUs has this feature for rdtsc.)
Method 3 reports higher numbers than Method 2. I guess it is somehow different from the TSC, but what is it?
Method 4 is the most confusing one. It reports numbers an order of magnitude larger than Methods 2 and 3. Shouldn't it also be some kind of cycle count, especially given that it carries the "invariant" name?
gettimeofday() is not designed for measuring time intervals. Don't use it for that purpose.
If you need wall time intervals, use the POSIX monotonic clock. If you need CPU time spent by a particular process or thread, use the POSIX process time or thread time clocks. See man clock_gettime.
The PCM API is great for fine-tuned performance measurement when you know exactly what you are doing, which is generally obtaining a variety of separate memory, core, cache, low-power, etc. performance figures. Don't start messing with it if you are not sure what exact services you need from it that you can't get from clock_gettime.