RISC-V ISA: the most important ~40 instructions to implement - riscv

I have to implement the RISC-V architecture (the ISA for a pipelined processor) in C++. Since the whole ISA cannot be implemented, can someone tell me the most important set of roughly 40 instructions that I should include?
Please help.

The most important subset is RV32I. It is about 40 instructions in size.
https://riscv.org/specifications/
Chapter 2
RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 47 unique instructions, though a simple implementation might cover the eight SCALL/SBREAK/CSRR* instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE and FENCE.I instructions as NOPs, reducing the hardware instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity).

Related

Are RISC-V instruction execution durations standardized for the sake of cryptographic security?

Some cryptographic functions require a consistent execution duration to avoid timing attacks. I read that such functions targeting x86 are hard to write, for reasons potentially including the emulated nature of the ISA and out-of-order processing. Preventing timing attacks on x86 is therefore not easy, because it depends on complex and/or unknown factors at any given moment.
In a standard RISC-V core, are instruction timings predictably consistent relative to each other? What about in the case of a standard core with out-of-order processing, or proprietary implementations of the base ISA?
RISC-V could be implemented in a machine with deterministic latencies; this has more to do with the implementation than with the ISA.
See this project for a RISC-V implementation that supports predictable-latency execution: https://github.com/pretis/flexpret. It was developed for the embedded space, but would seem to be suitable for your proposed application as well.
It is important to differentiate an ISA from an implementation of it. Nothing in the RISC-V spec mandates instruction execution latencies. Most implementations will do whatever gives them the highest performance. A security-paranoid processor could be designed to have consistent latencies for all instructions and yet still conform to the RISC-V spec.
A nice feature of RISC-V is that plenty of opcode space was intentionally left unused to make room for ISA extensions. There appear to be no publicly announced plans for a crypto extension, so this feature could be incorporated into such an extension when it is defined, if needed.
I'm not sure about the core itself, but the RISC-V Cryptography Extensions Volume I (riscv-crypto-spec-scalar-v1.0.1.pdf) places this requirement on cryptographic instructions:
This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
So in the context of cryptographic-specific instructions, yes.
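To see why that guarantee matters, here is a hedged sketch of the usual constant-time comparison idiom in C++: no early exit, no data-dependent branches. Whether it actually runs in constant time still depends on the compiler not reintroducing branches and on the hardware providing data-independent latency in the spirit of the requirement quoted above.

    #include <cstddef>
    #include <cstdint>

    // Compares two equal-length buffers without an early exit, so the number of
    // executed instructions does not depend on where (or whether) they differ.
    // This only translates into constant time if the instructions involved have
    // data-independent latency, which is what the quoted requirement is about.
    bool equal_constant_time(const uint8_t* a, const uint8_t* b, size_t len) {
        uint8_t diff = 0;
        for (size_t i = 0; i < len; ++i) {
            diff |= static_cast<uint8_t>(a[i] ^ b[i]);  // accumulate any differences
        }
        return diff == 0;
    }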
"is there a standard for how long each instruction should take to complete relative to other operations?"
No.
Such behavior will be consistent with all other major ISAs as far as I am aware of.
An out-of-order processor will execute instructions as their dependencies resolve. Cache misses and the potentially random nature of issue selection mean that successive loop iterations will behave differently with regard to when instructions execute relative to one another. Any number of other microarchitecture issues get in the way, including instruction fetch misses, dcache misses, resource stalls causing replays, etc. Even a typical in-order core faces such issues.
"How does the RISC-V team plan to address potential standard or non-standard complexity that a cryptographic library developer must find some way to address?"
I can't speak for the RISC-V team, but if I may hazard a guess, I suspect that these and similar areas will be taken up by the wider community to discuss and address.

What can make a program not capable to take advantages of 64 bit system?

I am looking into the Google V8 JavaScript Engine. It is said that they are having problems porting it to 64-bit systems.
What kinds of programming constructs or constraints can make a program 32-bit or 64-bit specific, apart from building and testing it on a 64-bit machine with 64-bit settings?
You may check this Wikipedia article, which says:
The main disadvantage of 64-bit architectures is that, relative to 32-bit architectures, the same data occupies more space in memory (due to longer pointers and possibly other types, and alignment padding). This increases the memory requirements of a given process and can have implications for efficient processor cache utilization. Maintaining a partial 32-bit model is one way to handle this, and is in general reasonably effective. For example, the z/OS operating system takes this approach, requiring program code to reside in 31-bit address spaces (the high order bit is not used in address calculation on the underlying hardware platform) while data objects can optionally reside in 64-bit regions.
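As a concrete (and simplified) illustration of the kind of construct that ties code to a pointer width, consider pointer-to-integer casts and type sizes that silently change between builds. The snippet below is a generic C++ sketch, not anything specific to V8.

    #include <cstdint>
    #include <cstdio>

    int main() {
        int value = 42;
        int* p = &value;

        // Broken assumption: "a pointer fits in 32 bits". On a 64-bit build this
        // truncates the address (and most compilers warn about it):
        //   uint32_t bad = (uint32_t)(uintptr_t)p;

        // Portable: uintptr_t is defined to be wide enough to hold a pointer.
        uintptr_t ok = reinterpret_cast<uintptr_t>(p);
        std::printf("address as integer: %ju\n", static_cast<uintmax_t>(ok));

        // Sizes that silently change between 32-bit and 64-bit builds (and with
        // them, struct layouts, padding and any file format that serializes them):
        std::printf("sizeof(long)=%zu sizeof(void*)=%zu sizeof(uintptr_t)=%zu\n",
                    sizeof(long), sizeof(void*), sizeof(uintptr_t));
        return 0;
    }

Code that hides pointers in int fields, hashes addresses into 32-bit values, or hard-codes struct sizes and offsets will break, or quietly misbehave, when the pointer width changes.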

What is the difference between x64 and IA-64?

I was on Microsoft's website and noticed two different installers, one for x64 and one for IA-64. Reference: Installing the .NET Framework 4.5, 4.5.1
My understanding is that IA-64 is a subclass of x64, so I'm curious why it would have a separate installer.
x64 is used as shorthand for the 64-bit extensions of the "classical" x86 architecture; almost any "normal" PC produced in recent years has a processor based on this architecture.
AMD invented the AMD64 extensions; Intel was more or less forced to implement them, and called them first IA-32e, then EM64T and finally Intel 64 (actually, the AMD and Intel extensions aren't exactly the same, but they are almost identical).
Many people also call this stuff x86-64, to have a vendor-independent name and to stress the fact that it's the 64 bit evolution of the x86 architecture. All the "regular" PCs that are sold with "64 bit processors" run on x86-64 architecture.
IA-64 (Intel Architecture 64) is an almost completely unrelated 64 bit architecture (also known as Itanium), developed by Intel initially for high-end servers. It was said that Itanium could have been a replacement for the x86 architecture, but this architecture didn't have much success (for various reasons), so it's unlikely that you'll ever need the IA-64 installers.
For more information, you may have a look at the Wikipedia articles on x86-64 and Itanium.
IA-64 is the Intel Itanium Architecture. This is a Very Long Instruction Word (VLIW) processor instruction set.
x86_64 is the normal 64-bit architecture used by the processors inside essentially every laptop and desktop today. It is a dynamic (hardware-scheduled) processor design.
The main difference between these two is that
In VLIW, the compiler resolves the dependencies between instructions and schedules them appropriately. The processor merely executes them.
With a dynamic processor, the compiler just schedules the instructions without worrying about dependencies. The processor takes care of the dependencies, reorders the instructions and executes them appropriately.
VLIW code is dependent on each chip's internal architecture, and the compiler needs to know that information. The advantage is that the compiler can extract much more parallelism than a dynamic processor can find on its own.
For dynamic processors, the code is independent of each chip's internal architecture; it just needs to follow the instruction set, so code compiled on one machine runs on other machines very easily. The disadvantage is that only limited parallelism can be exploited, and the internal logic and design are far more complex and intricate than in a VLIW.
Nevertheless, dynamic processors are what consumers (individuals) mostly use today, since they can run code compiled/generated on any machine, while VLIW processors are used by servers and enterprises because of the parallelism they can extract.
They are different:
IA-64 is Itanium, an architecture for servers.
x64 is what 64-bit Intel Core and AMD CPUs implement.
x64 is short for x86-64, which is an extension of the x86 instruction set.
IA-64 stands for the Itanium 64-bit architecture (by Intel).
IA-64 is for computers running Intel Itanium 64-bit processors. They do not support running 32-bit applications the way x64 processors do. A special version of Windows is needed to run on these processors, hence the two different installers.
They have different instruction sets; that is the key point.
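Since the two architectures need separately compiled binaries, code that must care about the target usually selects it at build time. Here is a small sketch using the predefined compiler macros I'm aware of (GCC/Clang's __x86_64__ / __ia64__ and MSVC's _M_X64 / _M_IA64):

    #include <cstdio>

    int main() {
        // GCC/Clang define __x86_64__ / __ia64__; MSVC defines _M_X64 / _M_IA64.
    #if defined(__x86_64__) || defined(_M_X64)
        std::puts("built for x64 (x86-64)");
    #elif defined(__ia64__) || defined(_M_IA64)
        std::puts("built for IA-64 (Itanium)");
    #elif defined(__i386__) || defined(_M_IX86)
        std::puts("built for 32-bit x86");
    #else
        std::puts("built for some other architecture");
    #endif
    }

This is also why one installer cannot serve both: the native binaries inside are compiled for one instruction set or the other.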

What is Intel microcode?

From what I've read it's used to fix bugs in the CPU without modifying the BIOS.
From my basic knowledge of assembly I know that assembly instructions are split into micro-operations internally by the CPU and executed accordingly. But Intel somehow provides a way to apply some updates while the system is up and running.
Does anyone have more info on this? Is there any documentation about what can be done with microcode and how it can be used?
EDIT:
I've read the Wikipedia article; it didn't help me figure out how I could write some of my own, or what uses that would have.
In older times, microcode was heavily used in CPUs: every single instruction was split into microcode. This enabled relatively complex instruction sets in a modest CPU (consider that a Motorola 68000, with its many operand modes and eight 32-bit registers, fits in 40,000 transistors, whereas a modern single-core x86 has more than a hundred million). This is not true anymore. For performance reasons, most instructions are now "hardwired": their interpretation is performed by inflexible circuitry, outside of any microcode.
In a recent x86, it is plausible that some complex instructions such as fsin (which computes the sine function on a floating point value) are implemented with microcode, but simple instructions (including integer multiplication with imul) are not. This limits what can be achieved with custom microcode.
That being said, the microcode format is not only very specific to the processor model (e.g. microcode for a Pentium III and a Pentium IV cannot be freely exchanged with each other -- and, of course, using Intel microcode on an AMD processor is out of the question), it is also a closely guarded secret. Intel has published the method by which an operating system or a motherboard BIOS may update the microcode (it must be done after each hard reset; the update is kept in volatile RAM), but the microcode contents are undocumented. The Intel® 64 and IA-32 Architectures Software Developer's Manual (volume 3a) describes the update procedure (section 9.11, "Microcode update facilities") but states that the actual microcode is "encrypted" and chock-full of checksums. The wording is vague enough that just about any kind of cryptographic protection may be hidden, but the bottom line is that it is not currently possible, for people other than Intel, to write and try custom microcode.
If the "encryption" does not include a digital (asymmetric) signature and/or if the people at Intel botched the protection system somehow, then it may be conceivable that some remarkable reverse-engineering effort could potentially enable one to produce such microcode, but, given the probably limited applicability (since most instructions are hardwired), chances are that this would not buy much, as far as programming power is concerned.
Think loosely about a virtual machine or simulator, where, say, qemu-arm can simulate an ARM processor on an x86 host; ideally the software running on the simulated ARM has no idea that it isn't a real ARM. Take this idea to the level where the whole chip is designed so that it always looks like an x86: the software never knows there are programmable items inside the chip, and some other processor inside is essentially designed for the purpose of implementing/simulating an x86. Supposedly the popular AMD 29000 product line just went away because the hardware team, and perhaps the processor/core, became the guts of an early x86 clone. Transmeta, where Linus worked, had a VLIW processor that was made to be a low-power x86; in that case the translation layer was not (as much of) a secret. VLIW, very long instruction word, RISC taken to the extreme, is the kind of thing you build for this kind of task.
No, it is not as much of an emulation layer as I am implying; there isn't some Linux running a qemu program inside each chip. It is somewhere between hardwired, where there is no software/microcode in the middle, and a full-blown emulation. The programmable bits may be like an FPGA (programmable gates), or they may be software or programmable state machines, meaning the gates are not programmable, only what runs on them is.
Your non-x86, non-big-iron processors, take ARM for example, are hardwired: no microcode. Microcontrollers (PIC, MSP430, AVR) can also be assumed not to be microcoded. Basically, do not assume all processors are microcoded; few if any processor families are. It is just that the ones we deal with in PCs have been, and may still be, so it may feel like they all are.
As fun as it may sound to play with this microcode, it is likely very specific to the processor family, and you will likely never gain access to how it works unless you work for Intel or AMD, each of which likely has its own internals. So you would need to get a job at one of the two, then work your way through the trenches to become part of what is likely an elite team that does this work. And once you get that far your career is trapped: your skills may be limited to one job at one company. You might have more fun programming individual GPUs on a video card, something that is documented or at least has tools, something you can do today without spending 10 years at AMD or Intel to possibly get nowhere.

Why doesn't Linux use the hardware context switch via the TSS?

I read the following statement:
The x86 architecture includes a specific segment type called the Task State Segment (TSS), to store hardware contexts. Although Linux doesn't use hardware context switches, it is nonetheless forced to set up a TSS for each distinct CPU in the system.
I am wondering:
Why doesn't Linux use the hardware support for context switch?
Isn't the hardware approach much faster than the software approach?
Is there any OS which does take advantage of the hardware context switch? Does windows use it?
Lastly, and as always, thanks for your patience and replies.
-----------Added--------------
http://wiki.osdev.org/Context_Switching has some explanation.
People as confused as I was can take a look at it. 8^)
The x86 TSS is very slow for hardware multitasking and offers almost no benefit compared to software task switching. (In fact, I think doing it manually beats the TSS a lot of the time.)
The TSS is also known for being annoying and tedious to work with, and it is not portable, not even to x86-64. Linux aims to work on multiple architectures, so they probably opted for software task switching because it can be written in a machine-independent way. Also, software task switching provides a lot more control over what can be done and is generally easier to set up than the TSS.
I believe Windows 3.1 used the TSS, but at least the NT >5 kernel does not. I do not know of any Unix-like OS that uses the TSS.
Do note that the TSS is mandatory. What OSes do, though, is create a single TSS entry (per processor), and every time they need to switch tasks, they just reuse this single TSS. Also, the only fields used in the TSS by software task switching are ESP0 and SS0. These are used to get to ring 0 from ring 3 code on interrupts. Without a TSS, there would be no known ring 0 stack, which would of course lead to a GPF and eventually a triple fault.
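For illustration, here is a rough sketch of the 32-bit TSS layout and the "one TSS per CPU, only change ESP0" pattern described above. The field names follow the Intel SDM, but the Tss32 struct and switch_kernel_stack helper are my own names, and this is not drop-in kernel code.

    #include <cstdint>
    #include <cstdio>

    // Rough 32-bit TSS layout (Intel SDM, vol. 3). A software-switching kernel
    // fills it once per CPU and afterwards only touches ss0/esp0.
    struct Tss32 {
        uint32_t prev_task_link;
        uint32_t esp0;                 // kernel stack pointer used on a ring 3 -> ring 0 transition
        uint32_t ss0;                  // kernel stack segment for that transition
        uint32_t esp1, ss1, esp2, ss2; // unused when only ring 0 and ring 3 exist
        uint32_t cr3, eip, eflags;
        uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
        uint32_t es, cs, ss, ds, fs, gs;
        uint32_t ldt_selector;
        uint32_t io_map_base;          // high 16 bits: I/O permission bitmap offset
    };

    // On a context switch, the kernel just repoints the per-CPU TSS at the
    // incoming thread's kernel stack; nothing else in the TSS changes.
    inline void switch_kernel_stack(Tss32& per_cpu_tss, uint32_t new_esp0) {
        per_cpu_tss.esp0 = new_esp0;
    }

    int main() {
        Tss32 tss{};
        switch_kernel_stack(tss, 0xC0100000u);                      // hypothetical stack top
        std::printf("sizeof(Tss32) = %zu bytes\n", sizeof(Tss32));  // 104 for the 32-bit TSS
    }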
Linux used to use hardware-based switching, in the pre-1.3 timeframe IIRC. I believe software-based context switching turned out to be faster, and it is more flexible.
Another reason may have been minimizing arch-specific code. The first port of Linux to a non-x86 architecture was Alpha. Alpha didn't have a TSS, so more code could be shared if all arches used software switching. (Just a guess.) Unfortunately the kernel changelogs for the 1.2-1.3 kernel period are not well preserved, so I can't be more specific.
Linux doesn't use a segmented memory model, so this segmentation specific feature isn't used.
x86 CPUs have many different kinds of hardware support for context switching, so the distinction isn't hardware vs software, but more how does an OS use the various hardware features available. It isn't necessary to use them all.
Linux is so efficiency-focused that you can bet someone has profiled every option possible, and that the options currently used are the best available compromise.
