Where does the term firmware come from? - history

I've heard that the term firmware comes from it being between hardware and software. I have also heard that it refers to software that comes from the firm (company) that builds the hardware.
When was the term first used and what is the origin of the term?

From Wikipedia:
The term "firmware" was coined by Ascher Opler in a 1967 Datamation article. Originally, it meant the microcode – contents of a writable control store (a specialized small area of RAM memory), which defined and implemented the computer's instruction set. ... Firmware has evolved to mean almost any programmable content of a hardware device, not only machine code for a processor, but also configurations and data for application-specific integrated circuits (ASICs), programmable logic devices, etc.

http://en.wikipedia.org/wiki/Firmware#Origin_of_the_term
Origin of the term
The term "firmware" was coined by Ascher Opler in a 1967 Datamation article. Originally, it meant the microcode – contents of a writable control store (a specialized small area of RAM memory), which defined and implemented the computer's instruction set. The firmware could be reloaded if needed to specialize or modify the instructions that the central processing unit (CPU) could execute. As originally used, firmware was contrasted with hardware (the CPU itself) and software (normal instructions executing on a CPU). It was not composed of CPU machine instructions, but of lower-level microcode involved in the implementation of machine instructions. It existed on the boundary of hardware and software; thus the term firmware.
Later the term was broadened to include any type of microcode, whether in RAM or ROM.
Still later, the term was again broadened in popular usage to denote anything ROM-resident, including processor machine instructions for BIOS, bootstrap loaders, or specialized applications.
When it comes to updating the firmware to a new version, the typical procedure until the mid-1990s was to replace the storage medium containing the firmware, usually a socketed ROM. Nowadays this approach has largely been abandoned, since firmware can overwrite itself in a convenient, purely electronic operation.

A simple query to Wikipedia will answer this, really.


I/O Memory Mapping

I am reviewing the essentials of I/O, and while I think I understand most of what's going on, I'm still confused as to how either physical addresses or separate ports are mapped to individual devices. Does the computer poll the bus on system boot, assigning addresses to devices one by one, or are there fixed addresses that are loaded into memory somewhere? If this is done via the BIOS, how is this memory layout information relayed to the operating system?
Thanks for your help!
(this question has been asked and answered before, you should search first)
It depends on the platform; you were not specific.
On some systems, some peripherals are hardcoded by the chip/system designers.
For PCI(e), as defined by that specification, you enumerate the bus(es) looking for attached peripherals, and each peripheral's configuration space (which is defined by the peripheral vendor per their needs) indicates how many address windows it needs and how big they are. On an x86 PC the BIOS does this enumeration, not the operating system. On other platforms it depends: it may be the bootloader or the operating system. But someone has to take the available address space (essentially hardcoded for that platform, knowing what is already used) and divide it up. On x86 it used to be about one gigabyte that was divided up in the 32-bit days, and that still happens on some systems; on 64-bit systems the BIOSes open that up to two gigabytes for everything, and can place it in high address space to avoid RAM (ever wonder why your 32-bit system with 4 GB of DRAM only had 3 GB usable?). Naturally a flat memory space is only an illusion: the windows asked for by the PCI peripherals can be small windows into a much larger space on the device, video cards with lots of RAM for example. You use the CSRs to move the window about, kind of like standing in your house looking out a small window and physically moving side to side to see more, but only a window's worth at any one time.
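Not from the answer above, just a rough C sketch of what that enumeration looks like, assuming x86 Linux, root privileges, and the legacy 0xCF8/0xCFC configuration mechanism (modern firmware typically uses memory-mapped ECAM instead): walk the device slots on bus 0, read each device's vendor/device ID from configuration space, and look at BAR0. Real firmware would additionally write all-ones to each BAR to discover its size before assigning an address; this sketch only reads.

    /* Minimal sketch: enumerate PCI bus 0 via the legacy 0xCF8/0xCFC
     * configuration mechanism, the same basic dance the BIOS performs
     * at boot. Assumes x86 Linux, root, and iopl() port access. */
    #include <stdio.h>
    #include <stdint.h>
    #include <sys/io.h>

    #define PCI_CONFIG_ADDR 0xCF8
    #define PCI_CONFIG_DATA 0xCFC

    static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
    {
        uint32_t addr = (1u << 31) | (bus << 16) | (dev << 11) | (fn << 8) | (off & 0xFC);
        outl(addr, PCI_CONFIG_ADDR);
        return inl(PCI_CONFIG_DATA);
    }

    int main(void)
    {
        if (iopl(3) < 0) { perror("iopl (need root)"); return 1; }

        for (uint8_t dev = 0; dev < 32; dev++) {
            uint32_t id = pci_cfg_read32(0, dev, 0, 0x00);
            if (id == 0xFFFFFFFF)      /* no device in this slot */
                continue;

            /* Read BAR0 as assigned by firmware; firmware itself would also
             * write ~0 here first to learn the BAR's size before assigning it. */
            uint32_t bar = pci_cfg_read32(0, dev, 0, 0x10);
            printf("00:%02x.0 vendor=%04x device=%04x BAR0=%08x\n",
                   dev, id & 0xFFFF, id >> 16, bar);
        }
        return 0;
    }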
The same goes for USB: it is enumerated, the buses are searched and the peripherals answer. With USB, though, the devices do not map into the address space of the host.
How the operating system finds this information is heavily dependent on the type of system. With a BIOS on x86 there is a known way to get that info; I think you can also get at the same info in DOS (yes, DOS is still heavily used). For devices that are not PCIe or USB, the operating system drivers have to find the peripherals, or simply know where they are if the platform is consistent (the addresses of the serial ports in a PC, for example), or have a way of finding them without harming other devices or crashing. There are also cases where the operating system itself did the enumeration, or the bootloader, if that is where enumeration happened. Each combination of bootloader and operating system on top of a given platform may have its own solution; there is no reason to expect them to be the same.
Okay, you did say BIOS and used a bios tag, implying x86 systems. The BIOS does PCI/PCIe enumeration at boot time; if you don't set up your BIOS to know that your operating system is 64-bit, it may take a gigabyte out of your lower 4 GB space for the PCIe devices (and if you set it for 64-bit but install a 32-bit operating system, you are in trouble for other reasons). I don't remember the details, but I would assume there are BIOS calls the operating system can use to find out what the BIOS has done; it should not be hard at all to find this information. Anything not discoverable this way is likely legacy and hardcoded, or uses legacy techniques for being discovered (the ISA-bus-style search for a BIOS ROM across a range of addresses, etc.). The PCIe/USB vendor and product ID information tells the drivers what is there, and from that they have hardcoded offsets into those spaces to complete the addresses needed to communicate with the peripherals.

Are RISC-V instruction execution durations standardized for the sake of cryptographic security?

Some cryptographic functions require a consistent execution duration to avoid timing attacks. I have read that such functions are hard to write for x86, for reasons potentially including the emulated nature of the ISA and out-of-order processing. Preventing timing attacks on x86 is therefore not easy, because it depends on complex and/or unknown factors at any given moment.
In a standard RISC-V core, are instruction timings predictably consistent relative to each other? What about in the case of a standard core with out-of-order processing, or proprietary implementations of the base ISA?
RISC-V could be implemented in a machine with deterministic latencies; this has more to do with the implementation than with the ISA.
See this project for a RISC-V implementation that supports predictable-latency execution: https://github.com/pretis/flexpret. It was developed for the embedded space, but would seem to be suitable for your proposed application as well.
It is important to differentiate an ISA from an implementation of it. Nothing in the RISC-V spec mandates instruction execution latencies; most implementations will do whatever gives them the highest performance. A security-paranoid processor could be designed to have consistent latencies for all instructions and still conform to the RISC-V spec.
A nice feature of RISC-V is that plenty of opcode space was intentionally left unused to make room for ISA extensions. There appear to be no publicly announced plans for a crypto extension, so if needed, this feature could be incorporated into a crypto extension when one is defined.
I'm not sure about the core itself, but I've read that the RISC-V Cryptography Extensions Volume I (riscv-crypto-spec-scalar-v1.0.1.pdf) requires this of cryptographic instructions:
This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
So in the context of cryptographic-specific instructions, yes.
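As a concrete illustration (not from the spec) of why that data-independence requirement matters to a library author, compare a naive byte comparison with a constant-time one in C: the naive version's running time reveals how many leading bytes matched, while the constant-time version does the same work regardless of the data (provided the underlying instructions themselves have data-independent latency, which is exactly what the quoted requirement is about).

    #include <stddef.h>
    #include <stdint.h>

    /* Leaky: returns as soon as a byte differs, so the run time reveals
     * how long the matching prefix is -- a classic timing side channel. */
    int leaky_compare(const uint8_t *a, const uint8_t *b, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i])
                return 0;
        return 1;
    }

    /* Constant-time at the C level: always touches all n bytes and
     * accumulates differences with data-independent operations. Whether it
     * stays constant-time also depends on the compiler and on the hardware
     * executing these instructions with data-independent latency. */
    int ct_compare(const uint8_t *a, const uint8_t *b, size_t n)
    {
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];
        return diff == 0;
    }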
"is there a standard for how long each instruction should take to complete relative to other operations?"
No.
Such behavior is consistent with all other major ISAs, as far as I am aware.
An out-of-order processor will execute instructions as their dependencies resolve. Cache misses and the potentially random nature of issue selection mean that successive loop iterations will behave differently with regard to when instructions execute relative to one another. Any number of other microarchitectural issues get in the way as well, including instruction fetch misses, dcache misses, resource stalls causing replays, etc. Even a typical in-order core faces such issues.
How does the RISC-V team plan to address the potential standard or non-standard complexity that a cryptographic library developer must find some way to deal with?
I can't speak for the RISC-V team, but if I may hazard a guess, I suspect that this and similar areas will be discussed and addressed by the wider community.
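As a rough way to observe the variability described a couple of answers up, here is a sketch that reads the RISC-V cycle CSR around an operation. It assumes an RV64 target where user-mode access to the counter is enabled (on many systems it is not, and rdcycle will trap); the measured operation, a division, is just an arbitrary example of an instruction whose latency is often data-dependent.

    #include <stdint.h>
    #include <stdio.h>

    /* Read the RISC-V cycle CSR. Assumes RV64 and that user-mode access to
     * the counter is enabled (mcounteren/scounteren); otherwise this traps. */
    static inline uint64_t read_cycles(void)
    {
        uint64_t c;
        __asm__ volatile ("rdcycle %0" : "=r"(c));
        return c;
    }

    int main(void)
    {
        volatile uint64_t x = 123456789, y = 3;
        for (int i = 0; i < 5; i++) {
            uint64_t t0 = read_cycles();
            volatile uint64_t z = x / y;   /* division latency is often data-dependent */
            uint64_t t1 = read_cycles();
            (void)z;
            /* Expect the counts to vary from iteration to iteration due to
             * caches, branch prediction, and other microarchitectural state. */
            printf("iteration %d: %llu cycles\n", i,
                   (unsigned long long)(t1 - t0));
        }
        return 0;
    }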

What is the corresponding register in the SPARC architecture for x86 CR3?

I know that on the x86 architecture I can read the CR3 register in kernel context
to follow the kernel's page directory.
Now I am trying to do the same from Linux on the SPARC architecture.
How can I access the kernel's page directory on SPARC?
What is the corresponding register in SPARC to x86's CR3?
Is their paging mechanism the same?
P.S. What about ARM? I have some documents about these, but I need more...
Thank you in advance.
On SPARC, TLB faults are handled in software, so there is nothing quite like CR3. You'll have to check the current process's data structures to find the required information.
ARM, on the other hand, uses hardware translation; the MMU is handled as a coprocessor, using MRC/MCR to access the Translation Table Base Register. See ARM's website for more information: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0056d/BABIDADG.html
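For illustration, a minimal sketch (assuming an ARMv7 target and privileged execution, e.g. inside a kernel module; in user space these CP15 accesses are undefined) of reading the translation table base registers via MRC:

    #include <stdint.h>

    /* Sketch for ARMv7 (VMSA): read the Translation Table Base Register and
     * the Translation Table Base Control Register via coprocessor 15.
     * Must run in a privileged mode. */
    static inline uint32_t read_ttbr0(void)
    {
        uint32_t v;
        __asm__ volatile ("mrc p15, 0, %0, c2, c0, 0" : "=r"(v));  /* TTBR0 */
        return v;
    }

    static inline uint32_t read_ttbcr(void)
    {
        uint32_t v;
        __asm__ volatile ("mrc p15, 0, %0, c2, c0, 2" : "=r"(v));  /* TTBCR */
        return v;
    }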
The SPARC specification as such does not mandate the use of an MMU, nor does it require a specific MMU implementation; it merely defines the interface between CPU and MMU (largely concerned with the traps that need to be generated by MMU activity).
That said - and I have to state here that I only know about Sun's / Fujitsu's SPARC CPUs, not about the embedded stuff (LEON and predecessors) - SPARC CPUs as far back as sun4 workstation CPUs of 1990 have had MMUs.
As with many non-mandated SPARC CPU features, control of the MMU happens through what's called Address Space Identifiers (ASIs).
These ASIs are a feature of the SPARC architecture that can best be described as a mix between x86 segmentation and memory-mapped registers. Their use changes what an "address" means to a SPARC CPU (just as using a segment register in x86 changes the meaning of an "address"), except that there is no configurable "descriptor table" behind ASI address ranges, but usually hardware-specific control registers. That is, they are not "mapped" into the ordinary physical address space, but into alternate address ranges.
First, in the sun4, sun4c, sun4d and sun4m architectures (32-bit SPARCv7), the MMU was called srmmu (SPARC Reference MMU) and implemented a two-level hardware table walk. This is deprecated, and I don't remember off the top of my head what the control registers for it were.
When Sun created the sun4u architecture, a hardware-implemented translation table walk was considered both too high an overhead and too memory-intensive; therefore the table-walking MMU implementation was completely yanked in favour of implementing most (but not all) of the MMU functionality in software. In particular, the only thing programmable in the hardware is the contents of the TLB (translation lookaside buffer), meaning that if a mapping isn't readily cached in the TLB, an MMU miss trap occurs and the trap handler performs a table lookup, reprogramming the TLB at the end (so that re-issuing the instruction afterwards will succeed). That's where the name sfmmu (software MMU) comes from. The MMU is controlled through ASI_MMU, while the actual context register is CPU-specific ...
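To make "controlled through ASIs" a bit more concrete, here is a very rough sketch of reading the D-MMU primary context register on a sun4u (UltraSPARC) CPU. The ASI number (0x58, commonly named ASI_DMMU) and the register's virtual address (0x08) are implementation-specific values taken from UltraSPARC manuals rather than anything the SPARC architecture mandates, and the alternate-space load is privileged, so this only works in kernel context.

    #include <stdint.h>

    /* Sketch only: on sun4u, MMU registers live in alternate address
     * spaces and are accessed with lda/ldxa using an ASI. The values
     * below (ASI 0x58, VA 0x08 = primary context) are UltraSPARC-specific
     * assumptions; other SPARC CPUs use different ASIs/offsets. */
    static inline uint64_t read_dmmu_primary_ctx(void)
    {
        uint64_t ctx;
        __asm__ volatile ("ldxa [%1] 0x58, %0"   /* load from ASI_DMMU */
                          : "=r"(ctx)
                          : "r"(0x08UL));        /* primary context register */
        return ctx;
    }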
See, for reference:
pg. 351ff of the 2007 UltraSPARC architecture reference manual - this lists the ASI numbers involved with MMU control.
OpenSolaris source code, use of ASI_MMU_CTX.
OpenSolaris source code, sfmmu_asm.s (beware of eyes bleeding and brain becoming mushy)
OpenSolaris source code, hat_sfmmu.c
That's the software-side hash chain / translation table walk; it might have the dubious honor of being the largest source file in the Solaris kernel ...
Re ARM: I suggest you ask that again as a separate question. Over the existence of ARM, multiple implementations (both MMU-less and with an MMU) have evolved (there are, for example, the ARMv7-instruction-set-based Cortex-A8 with an MMU and Cortex-M3 without one). The MMU specifications from ARM themselves are usually called VMSA (Virtual Memory System Architecture), and there are several revisions of it. This post is already too long to contain more details ;-)

What is Intel microcode?

From what I've read it's used to fix bugs in the CPU without modifying the BIOS.
From my basic knowledge of assembly, I know that instructions are broken down into micro-operations internally by the CPU and executed accordingly. But Intel somehow provides a way to apply some updates while the system is up and running.
Does anyone have more info on this? Is there any documentation on what can be done with microcode and how it can be used?
EDIT:
I've read the Wikipedia article; it didn't explain how I could write some of my own, or what uses it would have.
In older times, microcode was heavily used in CPUs: every single instruction was split into microcode. This enabled relatively complex instruction sets in a modest CPU (consider that a Motorola 68000, with its many operand modes and eight 32-bit registers, fits in 40000 transistors, whereas a single-core modern x86 has more than a hundred million). This is not true anymore. For performance reasons, most instructions are now "hardwired": their interpretation is performed by inflexible circuitry, outside of any microcode.
In a recent x86, it is plausible that some complex instructions such as fsin (which computes the sine function on a floating point value) are implemented with microcode, but simple instructions (including integer multiplication with imul) are not. This limits what can be achieved with custom microcode.
That being said, the microcode format is not only specific to the particular processor model (e.g. microcode for a Pentium III and a Pentium IV cannot be freely exchanged with each other -- and, of course, using Intel microcode for an AMD processor is out of the question), but it is also a severely protected secret. Intel has published the method by which an operating system or a motherboard BIOS may update the microcode (it must be done after each hard reset; the update is kept in volatile RAM), but the microcode contents are undocumented. The Intel® 64 and IA-32 Architectures Software Developer's Manual (volume 3a) describes the update procedure (section 9.11 "microcode update facilities") but states that the actual microcode is "encrypted" and chock-full of checksums. The wording is vague enough that just about any kind of cryptographic protection may be hidden, but the bottom line is that it is not currently possible, for people other than Intel, to write and try some custom microcode.
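While writing your own microcode is off the table, you can at least see which update revision is currently loaded. Here is a small sketch, assuming Linux with the microcode driver and the usual sysfs path; the SDM's own method is to read the IA32_BIOS_SIGN_ID MSR (0x8B) after executing CPUID(1), which requires ring 0 or the msr driver.

    #include <stdio.h>

    /* Sketch: report the microcode update revision the kernel/BIOS has
     * loaded, as exposed by Linux in sysfs (path assumed; requires the
     * microcode driver). The value is what the SDM calls the update
     * signature, normally read from MSR 0x8B in ring 0. */
    int main(void)
    {
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/microcode/version", "r");
        if (!f) { perror("open microcode version"); return 1; }

        unsigned int rev = 0;
        if (fscanf(f, "%x", &rev) == 1)
            printf("loaded microcode revision: 0x%x\n", rev);
        fclose(f);
        return 0;
    }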
If the "encryption" does not include a digital (asymmetric) signature and/or if the people at Intel botched the protection system somehow, then it may be conceivable that some remarkable reverse-engineering effort could potentially enable one to produce such microcode, but, given the probably limited applicability (since most instructions are hardwired), chances are that this would not buy much, as far as programming power is concerned.
Think loosely of a virtual machine or simulator: qemu-arm, for example, can simulate an ARM processor on an x86 host, and ideally the software running on the simulated ARM has no idea that it isn't a real ARM. Take this idea to the level where the whole chip is designed such that it always looks like an x86, the software never knows there is something programmable inside the chip, and some other processor inside is more or less designed for the purpose of implementing/simulating an x86. Supposedly the popular AMD 29000 product line went away because its hardware team, and perhaps the processor/core itself, became the guts of an early x86 clone. Transmeta, where Linus Torvalds worked, had a VLIW processor that was made to be a low-power x86; in that case the translation layer was not (as much of) a secret. VLIW (very long instruction word), RISC taken to the extreme, is the kind of thing you build for this kind of task.
No, it is not as much of an emulation layer as I am implying; there isn't some Linux running with a qemu program inside each chip. It is somewhere between hardwired, where there is no software/microcode in the middle, and a full-blown emulation. The programmable bits may be like an FPGA (programmable gates), or they may be software or programmable state machines, meaning the gates themselves are fixed and only what runs on them is programmable.
Your non-x86, non-big-iron processors, take ARM for example, are hardwired: no microcode. Microcontrollers such as PIC, MSP430 and AVR, assume these are not microcoded either. Basically, do not assume all processors are microcoded; few if any processor families are. It is just that the ones we deal with in PCs have been, and may still be, so it may feel like they all are.
As fun as it may sound to play with this microcode, it is likely very specific to the processor family, and you will likely never gain access to how it works unless you work for Intel or AMD, each of which likely has its own internals. So you would need to get a job at one of the two, then work your way through the trenches to become one of what is likely an elite team that does this work. And once you get that far, your career is trapped: your skills may be limited to one job at one company. You might have more fun programming the individual GPUs on a video card, something that is documented or at least has tools, something you can do today without spending 10 years at AMD or Intel to possibly get nowhere.

Why doesn't Linux use the hardware context switch via the TSS?

I read the following statement:
The x86 architecture includes a specific segment type called the Task State Segment (TSS), to store hardware contexts. Although Linux doesn't use hardware context switches, it is nonetheless forced to set up a TSS for each distinct CPU in the system.
I am wondering:
Why doesn't Linux use the hardware support for context switch?
Isn't the hardware approach much faster than the software approach?
Is there any OS which does take advantage of the hardware context switch? Does windows use it?
Lastly, and as always, thanks for your patience and replies.
-----------Added--------------
http://wiki.osdev.org/Context_Switching has some explanation.
People as confused as me could take a look at it. 8^)
The x86 TSS is very slow for hardware multitasking and offers almost no benefits when compared to software task switching. (In fact, I think doing it manually beats the TSS a lot of times)
The TSS is also known for being annoying and tedious to work with, and it is not portable, even to x86-64. Linux aims to work on multiple architectures, so it probably opted for software task switching because it can be written in a machine-independent way. Also, software task switching provides a lot more control over what can be done and is generally easier to set up than the TSS.
I believe Windows 3.1 used the TSS, but at least the NT >5 kernel does not. I do not know of any Unix-like OS that uses the TSS.
Do note that the TSS is mandatory. What OSs do, though, is create a single TSS entry (per processor) and, every time they need to switch tasks, just update this single TSS. Also, the only fields in the TSS used by software task switching are ESP0 and SS0, which are used to get to ring 0 from ring 3 code on interrupts. Without a TSS there would be no known ring 0 stack, which would of course lead to a GPF and eventually a triple fault.
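A minimal osdev-style sketch of that point, assuming a 32-bit kernel; the GDT descriptor for the TSS and the ltr load are taken as already set up elsewhere, and KERNEL_DATA_SELECTOR is a hypothetical constant for this example.

    #include <stdint.h>
    #include <string.h>

    /* Minimal 32-bit TSS layout (per the IA-32 manuals). With software task
     * switching, the only fields the CPU actually reads are ss0/esp0: the
     * ring-0 stack to switch to when an interrupt arrives from ring 3. */
    struct tss32 {
        uint32_t prev_task_link;
        uint32_t esp0;           /* kernel stack pointer used on ring3 -> ring0 */
        uint32_t ss0;            /* kernel stack segment */
        uint32_t esp1, ss1, esp2, ss2;
        uint32_t cr3, eip, eflags;
        uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
        uint32_t es, cs, ss, ds, fs, gs;
        uint32_t ldt_selector;
        uint16_t trap, iomap_base;
    } __attribute__((packed));

    static struct tss32 cpu_tss;             /* one per CPU */

    /* Hypothetical constant; the matching GDT entry and the `ltr` load are
     * assumed to be set up elsewhere in the kernel. */
    #define KERNEL_DATA_SELECTOR 0x10

    void tss_init(uint32_t kernel_stack_top)
    {
        memset(&cpu_tss, 0, sizeof cpu_tss);
        cpu_tss.ss0  = KERNEL_DATA_SELECTOR;
        cpu_tss.esp0 = kernel_stack_top;
        cpu_tss.iomap_base = sizeof cpu_tss; /* no I/O permission bitmap */
    }

    /* On every context switch the kernel just points esp0 at the new task's
     * kernel stack instead of switching whole TSS segments. */
    void tss_set_kernel_stack(uint32_t kernel_stack_top)
    {
        cpu_tss.esp0 = kernel_stack_top;
    }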
Linux used to use hardware-based switching, in the pre-1.3 timeframe IIRC. I believe software-based context switching turned out to be faster, and it is more flexible.
Another reason may have been minimizing arch-specific code. The first port of Linux to a non-x86 architecture was Alpha. Alpha didn't have TSS, so more code could be shared if all archs used SW switching. (Just a guess.) Unfortunately the kernel changelogs for the 1.2-1.3 kernel period are not well-preserved, so I can't be more specific.
Linux doesn't use a segmented memory model, so this segmentation specific feature isn't used.
x86 CPUs have many different kinds of hardware support for context switching, so the distinction isn't hardware vs software, but more how does an OS use the various hardware features available. It isn't necessary to use them all.
Linux is so efficiency-focused that you can bet someone has profiled every possible option, and that the options currently used are the best available compromise.
