How to test if your Linux system supports SSE2 - linux

Actually I have 2 questions:
Is SSE2 compatibility a CPU issue or a compiler issue?
How do I check whether my CPU or compiler supports SSE2?
I am using GCC Version:
gcc (GCC) 4.5.1
When I tried to compile some code, it gave me this error:
$ gcc -O3 -msse2 -fno-strict-aliasing -DHAVE_SSE2=1 -DMEXP=19937 -o test-sse2-M19937 test.c
cc1: error: unrecognized command line option "-msse2"
And cpuinfo showed this:
processor : 0
vendor : GenuineIntel
arch : IA-64
family : 32
model : 1
model name : Dual-Core Intel(R) Itanium(R) Processor 9140M
revision : 1
archrev : 0
features : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs : 4
cpu MHz : 1669.000503
itc MHz : 416.875000
BogoMIPS : 3325.95
siblings : 2
physical id: 0
core id : 0
thread id : 0

The CPU needs to be able to execute SSE2 instructions, and the compiler needs to be able to generate them.
To check whether your CPU supports SSE2:
# cat /proc/cpuinfo
It will be somewhere under "flags" if it is supported.
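As a scriptable version of that check (a small sketch; note the flags line only exists in x86 cpuinfo — the IA-64 output above has a features line instead, so the check correctly fails there):

```shell
# has_sse2: succeed if the given cpuinfo file lists the sse2 flag
# (defaults to the live /proc/cpuinfo)
has_sse2() {
    grep -qw sse2 "${1:-/proc/cpuinfo}"
}

if has_sse2; then echo "CPU supports SSE2"; else echo "no SSE2"; fi
```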
Update: So your CPU doesn't support it.
For the compiler:
# gcc -dumpmachine
# gcc --version
The target of your compiler needs to be some kind of x86*, since only those CPUs support SSE2, which is part of the x86 instruction set,
AND
the gcc version needs to be >= 3.1 (almost certainly the case, since that release is about 10 years old) to support SSE2.
Update: So your compiler doesn't support it for this target; it would if you were using it as a cross-compiler for x86.
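Putting the two compiler checks together (a sketch; the triple patterns only cover the common GNU spellings, and the gcc call is guarded in case no compiler is installed):

```shell
# is_x86_target: does a GNU target triple describe an x86 CPU?
is_x86_target() {
    case "$1" in
        i?86-*|x86_64-*) return 0 ;;  # -msse2 can work here
        *)               return 1 ;;  # e.g. ia64-*, arm-*: no SSE2
    esac
}

if command -v gcc >/dev/null; then
    t=$(gcc -dumpmachine)
    if is_x86_target "$t"; then
        echo "$t: x86 target, SSE2 possible"
    else
        echo "$t: non-x86 target, no SSE2"
    fi
fi
```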

It's both. The compiler/assembler need to be able to emit/handle SSE2 instructions, and then the CPU needs to support them. If your binary has SSE2 instructions with no conditions attached and you try to run it on a Pentium II you are out of luck.
The best way is to check your GCC manual. For example, my GCC manpage refers to the -msse2 option, which lets you explicitly enable SSE2 instructions in the binaries. Any relatively recent GCC or ICC should support it. As for your CPU, check the flags line in /proc/cpuinfo.
It would be best, though, to have checks in your code using cpuid etc., so that SSE2 sections can be disabled on CPUs that do not support it and your code can fall back on a more common instruction set.
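The same fall-back idea can also be applied one level up, at launch time: ship two builds and pick one based on the flags line. This is a deployment-level sketch, not the in-code cpuid approach described above; myprog-sse2 and myprog-generic are hypothetical binary names:

```shell
#!/bin/sh
# launcher sketch: pick which build to run based on a cpuinfo file
pick_variant() {   # pick_variant CPUINFO-FILE
    if grep -qw sse2 "$1" 2>/dev/null; then
        echo ./myprog-sse2       # hypothetical SSE2-enabled build
    else
        echo ./myprog-generic    # hypothetical baseline build
    fi
}

variant=$(pick_variant /proc/cpuinfo)
echo "would run: $variant"
# exec "$variant" "$@"     # uncomment in a real launcher
```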
EDIT:
Note that your compiler needs to be either a native compiler running on an x86 system, or a cross-compiler for x86. Otherwise it will not have the necessary options to compile binaries for x86 processors, which includes anything with SSE2.
In your case the CPU does not support x86 at all. Depending on your Linux distribution, there might be packages with the Intel IA32EL emulation layer for x86-software-on-IA64, which may allow you to run x86 software.
Therefore you have the following options:
Use a cross-compiler that runs on IA-64 and produces binaries for x86. Cross-compiler toolchains are not an easy thing to set up, though, because you need much more than just the compiler (binutils, libraries, etc.).
Use Intel IA32EL to run a native x86 compiler. I don't know how you would go about installing a native x86 toolchain and all the libraries your project needs if your distribution does not support it directly. Perhaps a full-blown chroot'ed installation of an x86 distribution?
Then if you want to test your build on this system you have to install Intel's IA32EL for Linux.
EDIT2:
I suppose you could also run a full x86 Linux distribution in an emulator like Bochs or QEMU (with no virtualization, of course). You are definitely not going to be dazzled by the resulting speed, though.

Another trick not yet mentioned:
gcc -march=native -dM -E - </dev/null | grep SSE2
and get:
#define __SSE2_MATH__ 1
#define __SSE2__ 1
With -march=native you are checking both your compiler and your CPU. If you give a different -march for a specific CPU, like -march=bonnell, you can check for that CPU.
Consult the docs for your version of gcc for the valid -march values:
https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Submodel-Options.html

Use cpuid to check for the existence of SSE2 at run time. (The original MSVC-style __asm version had a stray return false; before the asm block, so it always returned false; on Linux/GCC the idiomatic equivalent uses the <cpuid.h> helper.)
#include <cpuid.h>
#include <stdbool.h>

static bool HaveSSE2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* CPUID leaf 1 not available */
    return (edx & (1u << 26)) != 0;   /* CPUID.1:EDX bit 26 = SSE2 */
}

Try running:
lshw -class processor | grep -w sse2
and look under the processor section.

Related

Regarding QT creator in Lubuntu 16.04 (i386) ICOP board

I have installed qtcreator in Lubuntu 16.04, and when trying to open it, I am getting an error:
This program requires an x86 processor that supports SSE2 extension, at least a Pentium 4 or newer
Aborted (core dumped)
Can someone help me solve this problem?
I'm using an ICOP board with Lubuntu 16.04.
You'll probably need to compile from source with -mno-sse (or just -mno-sse2 if your CPU has SSE1 but not SSE2). If you're not cross-compiling from a faster machine, use -march=native to enable all the instruction sets your CPU supports, and none that it doesn't.
The 32-bit qtcreator package probably enables SSE2 on purpose, because it detected the missing support and printed an error instead of just dying with a SIGILL. It can likely be built from source (or from the Ubuntu source package) with a different config.
Apparently 32-bit Ubuntu is intended to run on CPUs without SSE2, according to this guide posted in the Ubuntu forums. (It's talking about old desktops with old mainstream CPUs, not modern embedded, but same difference.) So this might be considered a bug.
gcc's 32-bit code-gen does default to assuming cmov support and other P6 (Pentium Pro / Pentium II) instructions, but I guess Ubuntu configures their 32-bit gcc to not enable -msse2 by default. So you couldn't even boot the kernel on a P5 Pentium or older. (Makes sense, if you have SSE2 you probably have an x86-64 capable CPU; running on 32-bit-only CPUs is one of the few reasons for not just using x86-64 Ubuntu. But some people do use 32-bit systems for some reason on modern HW, and gimping it too much by disabling cmov and other P6 new instructions might be undesirable.)
A few years ago (2013, maybe?) I booted an Ubuntu live CD on an Athlon XP (SSE1 but not SSE2). It mostly booted to the desktop, but there was a popup from one program saying it had died with SIGILL, i.e. it tried to run an SSE2 instruction and got an illegal-instruction exception. I guess this would be considered a bug if 32-bit Ubuntu really does aim to support CPUs without SSE2.

nodejs on ALIX / AMD Geode running voyage linux leads to "invalid machine instruction"

The result of the investigation below: recent Node.js is not portable to AMD Geode (or other non-SSE x86) processors!!!
I dived deeper into the code and got stuck in the ia32 assembler implementation, which deeply integrates SSE/SSE2 instructions into its code (macros, macros, macros, ...). The main consequence is that you cannot run a recent version of node.js on AMD Geode processors, due to the lack of newer instruction-set extensions. The fallback to 387 arithmetic only works for the node.js code, not for the V8 JavaScript compiler it depends on. Adjusting V8 to support non-SSE x86 processors is a pain and a lot of effort.
If someone produces proof of the contrary, I would be really happy to hear about ;-)
Investigation History
I have a running ALIX.2D13 (https://www.pcengines.ch), which has an AMD Geode LX as the main processor. It runs Voyage Linux, a Debian Jessie-based distribution for resource-restricted embedded devices.
root#voyage:~# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 10
model name : Geode(TM) Integrated Processor by AMD PCS
stepping : 2
cpu MHz : 498.004
cache size : 128 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow 3dnowprefetch vmmcall
bugs : sysret_ss_attrs
bogomips : 996.00
clflush size : 32
cache_alignment : 32
address sizes : 32 bits physical, 32 bits virtual
When I install nodejs 8.x following the instructions on https://nodejs.org/en/download/package-manager/, I get an "invalid machine instruction" error (not sure if that is the exact wording; translated from the German output). This also happens when I download the binary for 32-bit x86, and also when I compile it manually.
After the answers below, I changed the compiler flags in deps/v8/gypfiles/toolchain.gypi by removing -msse2 and adding -march=geode -mtune=geode. And now I get the same error but with a stack trace:
root#voyage:~/GIT/node# ./node
#
# Fatal error in ../deps/v8/src/ia32/assembler-ia32.cc, line 109
# Check failed: cpu.has_sse2().
#
==== C stack trace ===============================
./node(v8::base::debug::StackTrace::StackTrace()+0x12) [0x908df36]
./node() [0x8f2b0c3]
./node(V8_Fatal+0x58) [0x908b559]
./node(v8::internal::CpuFeatures::ProbeImpl(bool)+0x19a) [0x8de6d08]
./node(v8::internal::V8::InitializeOncePerProcessImpl()+0x96) [0x8d8daf0]
./node(v8::base::CallOnceImpl(int*, void (*)(void*), void*)+0x35) [0x908bdf5]
./node(v8::internal::V8::Initialize()+0x21) [0x8d8db6d]
./node(v8::V8::Initialize()+0xb) [0x86700a1]
./node(node::Start(int, char**)+0xd3) [0x8e89f27]
./node(main+0x67) [0x846845c]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb74fc723]
./node() [0x846a09c]
Ungültiger Maschinenbefehl
root#voyage:~/GIT/node#
If you now look into this file, you will find the following
... [line 107-110]
void CpuFeatures::ProbeImpl(bool cross_compile) {
base::CPU cpu;
CHECK(cpu.has_sse2()); // SSE2 support is mandatory.
CHECK(cpu.has_cmov()); // CMOV support is mandatory.
...
I commented out the line, but I still get "Ungültiger Maschinenbefehl" (invalid machine instruction).
This is what gdb ./node shows (executed run):
root#voyage:~/GIT/node# gdb ./node
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
[...]
This GDB was configured as "i586-linux-gnu".
[...]
Reading symbols from ./node...done.
(gdb) run
Starting program: /root/GIT/node/node
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xb7ce2b40 (LWP 29876)]
[New Thread 0xb74e2b40 (LWP 29877)]
[New Thread 0xb6ce2b40 (LWP 29878)]
[New Thread 0xb64e2b40 (LWP 29879)]
Program received signal SIGILL, Illegal instruction.
0x287a23c0 in ?? ()
(gdb)
I think it is necessary to compile with debug symbols...
make clean
make CFLAGS="-g"
No chance to resolve all the SSE/SSE2 problems... Giving up! See the topmost section.
Conclusion: node.js + V8 normally requires SSE2 when running on x86.
On the V8 ports page: x87 (not officially supported)
Contact/CC the x87 team in the CL if needed. Use the mailing list v8-x87-ports.at.googlegroups.com for that purpose.
Javascript generally requires floating point (every numeric variable is floating point, and using integer math is only an optimization), so it's probably hard to avoid having V8 actually emit FP math instructions.
V8 is currently designed to always JIT, not interpret. It starts off / falls-back to JITing un-optimized machine code when it's still profiling, or when it hits something that makes it "de-optimize".
There is an effort to add an interpreter to V8, but it might not help because the interpreter itself will be written using the TurboFan JIT backend. It's not intended to make V8 portable to architectures it doesn't currently know how to JIT for.
Crazy idea: run node.js on top of a software emulation layer (like Intel's SDE or maybe qemu-user) that could emulate x86 with SSE/SSE2 on an x86 CPU supporting only x87. They use dynamic translation, so would probably run at near-native speed for code that didn't use any SSE instructions.
This may be crazy because node.js + V8 probably uses some virtual-memory tricks that might confuse an emulation layer. I'd guess that qemu should be robust enough, though.
Original answer left below as a generic guide to investigating this kind of issue for other programs. (tip: grep the Makefiles and so on for -msse or -msse2, or check compiler command lines for that with pgrep -a gcc while it's building).
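The grep step in that tip can be wrapped up per source tree (a sketch; the --include patterns are just the file types node.js happens to use, extend them for other build systems):

```shell
# find_sse_flags: list places where a build tree hard-codes SSE options
find_sse_flags() {   # find_sse_flags DIR
    grep -rn -e '-msse' \
        --include='Makefile*' --include='*.gypi' --include='*.mk' \
        "$1" 2>/dev/null
}

# e.g. for the tree in this question: find_sse_flags deps/v8
```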
Your cpuinfo says it has CMOV, which is a 686 (ppro / p6) feature. This says that Geode supports i686. What's missing compared to a "normal" CPU is SSE2, which is enabled by default for -m32 (32-bit mode) in some recent compiler versions.
Anyway, what you should do is compile with -march=geode -O3, so gcc or clang will use everything your CPU supports, but no more.
-O3 -msse2 -march=geode would tell gcc that it can use everything Geode supports as well as SSE2, so you need to remove any -msse and -msse2 options, or add -mno-sse after them. In node.js, deps/v8/gypfiles/toolchain.gypi was setting -msse2.
Using -march=geode implies -mtune=geode, which affects code-gen choices that don't involve using new instructions, so with luck your binary will run faster than if you'd simply used -mno-sse to control instruction-set stuff without overriding -mtune=generic. (If you're building on the geode, you could use -march=native, which should be identical to using -march=geode.)
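You can see exactly which SIMD instruction sets -march=geode enables by reusing the -dM -E trick from earlier in this thread (a sketch; it needs an x86-targeted gcc, so both failure modes are guarded):

```shell
# which SIMD macros does gcc define for -march=geode? (x86 gcc assumed)
if command -v gcc >/dev/null; then
    gcc -march=geode -dM -E - </dev/null 2>/dev/null |
        grep -E '__(MMX|SSE|3dNOW)' ||
        echo "(-march=geode rejected, or no SIMD macros defined)"
fi
```

On a Geode-capable gcc you should see MMX and 3DNow! macros but no __SSE__ or __SSE2__, confirming the CPU's feature set.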
The other possibility is the problem instructions are in Javascript functions that were JIT-compiled.
node.js uses V8. I did a quick google search, but didn't find anything about telling V8 to not assume SSE/SSE2. If it doesn't have a fall-back code-gen strategy (x87 instructions) for floating point, then you might have to disable JIT altogether and make it run in interpreter mode. (Which is slower, so that may be a problem.)
But hopefully V8 is well-behaved, and checks what instruction sets are supported before JITing.
You should check by running gdb /usr/bin/node, and see where it faults. Type run my_program.js on the GDB command line to start the program. (You can't pass args to node.js when you first start gdb. You have to specify args from inside gdb when you run.)
If the address of the instruction that raised SIGILL is in a region of memory that's mapped to a file (look in /proc/pid/maps if gdb doesn't tell you), that tells you which ahead-of-time compiled executable or library is responsible. Recompile it with -march=geode.
If it's in anonymous memory, it's most likely JIT-compiler output.
GDB will print the instruction address when it stops when the program receives SIGILL. You can also print $ip to see the current value of EIP (the 32-bit mode instruction pointer).
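The maps lookup can be automated in plain sh (a sketch; pid 12345 is hypothetical, and 0x287a23c0 is the fault address from the gdb session above):

```shell
# find_mapping: print the /proc/<pid>/maps line that contains an address
find_mapping() {   # find_mapping ADDR MAPSFILE
    addr=$(( $1 ))
    while read -r range rest; do
        lo=$(( 0x${range%-*} ))          # start of the mapping
        hi=$(( 0x${range#*-} ))          # end of the mapping
        if [ "$addr" -ge "$lo" ] && [ "$addr" -lt "$hi" ]; then
            printf '%s %s\n' "$range" "$rest"
        fi
    done < "$2"
}

# usage (hypothetical pid): find_mapping 0x287a23c0 /proc/12345/maps
```

If the printed line ends in a file path, recompile that file's package; if it has no path, the code was JIT-generated.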

difference between i386:x64-32 vs i386 vs i386:x86_64

Can someone explain the difference between the three architectures?
Actually when I built a 64-bit application in Linux, I got a link error saying:
skipping incompatible library.a when searching for library.a
Then I used objdump -f on that library and I got the below output:
a.o: file format elf32-x86-64
architecture: i386:x64-32, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000
Does it mean the library is 32-bit? Is that the reason I am getting the linker error?
There are 3 common ABIs usable on standard Intel-compatible machines (not Itanium).
The classic 32-bit architecture, often called "x86" for short, which has triples like i[3-6]86-linux-gnu. Registers and pointers are both 32 bits.
The 64-bit extension originally from AMD, often called "amd64" for short, which has GNU triple of x86_64-linux-gnu. Registers and pointers are both 64 bits.
The new "x32" ABI, with a triple of x86_64-linux-gnux32. Registers are 64 bits, but pointers are only 32 bits, saving a lot of memory in pointer-heavy workflows. It also ensures all the other 64-bit only processor features are available.
Each of the above has its own system-call interface, its own ld.so, its own complete set of libraries, etc. But it is possible to run all three on the same kernel.
On Linux, their loaders are:
% objdump -f /lib/ld-linux.so.2 /lib64/ld-linux-x86-64.so.2 /libx32/ld-linux-x32.so.2
/lib/ld-linux.so.2: file format elf32-i386
architecture: i386, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x00000a90
/lib64/ld-linux-x86-64.so.2: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000000c90
/libx32/ld-linux-x32.so.2: file format elf32-x86-64
architecture: i386:x64-32, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x00000960
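A small helper can turn objdump's architecture line into one of the three ABI names (a sketch; the match order matters, because plain "i386" is a substring of the other two architecture strings):

```shell
# elf_abi: classify an ELF file by objdump's architecture line
elf_abi() {   # elf_abi FILE
    case "$(objdump -f "$1" 2>/dev/null)" in
        *i386:x86-64*) echo amd64 ;;   # 64-bit registers and pointers
        *i386:x64-32*) echo x32 ;;     # 64-bit registers, 32-bit pointers
        *i386*)        echo ia32 ;;    # classic 32-bit
        *)             echo unknown ;;
    esac
}

# e.g.: elf_abi a.o  prints "x32" for the object in the question
```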
Now, if you're getting the message about "skipping incompatible library", that means something is messed up with your configuration. Make sure you don't have bad variables in the environment or passed on the command line, or files installed outside of your package manager's control.
Beyond the usual full 64-bit ABI and the good old 32-bit ABI, there is a special ABI (inspired by SGI's n32 environment) where pointers are 32-bit (so these are 32-bit apps), but which is designed to run on a 64-bit host with full access to all the x64 goodies:
native x64 registers and math
more registers
SSE2/3/4, AVX1/2/...
full 4 GB address space on a 64-bit host
It is called x32 ABI, link: https://en.wikipedia.org/wiki/X32_ABI
UPDATE
On an Ubuntu system I had to install two packages (with deps) to get x32 working:
> sudo apt install gcc-multilib
> sudo apt install libx32stdc++-5-dev
Then compiling simple C++ code with g++ -mx32 hellow.cpp works, producing an x32 executable:
> file a.out
./a.out: ELF 32-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /libx32/ld-linux-x32.so.2, for GNU/Linux 3.4.0

How can I compile a 32-bit .o file with gcc in my 64-bit machine?

Trying to learn NASM Assembly. I have a 64-bit machine, with Ubuntu. Recently I decided to test the push and pop instructions. I do this:
nasm -felf64 Test.asm
Apparently they are not supported in 64-bit mode. Alright, no problem, I'll just do it for 32 then:
nasm -felf Test.asm
And now, as always,
gcc Test.o
But it now tells me
i386 architecture of input file 'Test.o' is incompatible with i386:x86-64 output
I don't quite grasp the error here. How can I test push and pop in my 64-bit machine, if apparently I can't compile 32-bit programs?
How about "-m32"?
And I think you need to take care of dependent libraries (e.g. libc); see: Use 32bit shared library from 64bit application?
First, you can use push and pop in 64-bit code, just not with 32-bit registers. If you push and pop 64-bit registers, it'll work fine. In most cases, you can use 32-bit registers in 64-bit code, just not push and pop. There may be other exceptions, but I'm not aware of 'em.
64-bit code uses different system call numbers, puts the parameters in different registers, and uses syscall instead of int 0x80. However, the old int 0x80 interface with the old system call numbers and parameters in the old registers still works. This gives you kind of "mixed" code and may not be a Good Idea, but it works. How long it will continue to work in future kernels is anybody's guess. You may be better off to learn "proper" 64-bit code.
But there are (still!) a lot more 32-bit examples out there. You can tell NASM -f elf32 (-f elf is just an alias, but I'd use the full name for clarity). If you're using gcc, tell it -m32. If you're using ld directly, tell it -m elf_i386. You do have choices, but they have to be compatible with each other.
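A complete, mutually consistent 32-bit pipeline looks like this (a sketch; it assumes nasm and a multilib gcc are installed, which is exactly what the error in the question was missing — the filename Test32.asm is just an example):

```shell
# write a minimal 32-bit program that exercises push/pop, then build it
cat > Test32.asm <<'EOF'
global main
main:
    push ebx            ; 32-bit push/pop works fine in 32-bit mode
    mov  ebx, 7
    pop  ebx            ; restore the callee-saved ebx
    mov  eax, 0         ; main returns 0
    ret
EOF

if command -v nasm >/dev/null && nasm -f elf32 Test32.asm -o Test32.o \
   && gcc -m32 Test32.o -o test32 2>/dev/null; then
    ./test32 && echo "32-bit binary ran OK"
else
    echo "toolchain incomplete (need nasm + gcc-multilib)"
fi
```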
how about the "-march=i386" ? see:
http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html

Cross compile from linux to ARM-ELF (ARM926EJ-S/MT7108)

I have installed all the cross-compile packages on my Ubuntu system so far, but I am having a problem and need some help.
Processor : ARM926EJ-S rev 5 (v5l)
BogoMIPS : 184.72
Features : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant : 0x0
CPU part : 0x926
CPU revision : 5
Cache type : write-back
Cache clean : cp15 c7 ops
Cache lockdown : format C
Cache format : Harvard
I size : 32768
I assoc : 4
I line length : 32
I sets : 256
D size : 32768
D assoc : 4
D line length : 32
D sets : 256
Hardware : MT7108
Revision : 0000
Serial : 0000000000000000
This is the target machine I need to cross compile for. What flags should I
use when compiling?
You have an ARMv5 with no floating-point unit. The -march=armv5 and -mfloat-abi=soft flags should be enough.
However, if those flags don't work for you, I would suggest writing the smallest possible C application to test the toolchain.
/* no includes */
int main(void) {
return 42;
}
and compiling it with the most complete/strict flags:
$arm-linux-gnueabi-gcc -Wall --static -O2 -marm -march=armv5 simple.c -o simple
After this, push simple to the target, run it, then issue echo $? to verify that you get 42. If it works, try to see if you can get printf working. If that also works, you are pretty much set for everything. If printf fails, the easiest solution would be to find the right toolchain for your target.
apt-cache search arm | grep ^gcc- gives the following list,
gcc-4.7-aarch64-linux-gnu - GNU C compiler
gcc-4.7-arm-linux-gnueabi - GNU C compiler
gcc-4.7-arm-linux-gnueabi-base - GCC, the GNU Compiler Collection (base package)
gcc-4.7-arm-linux-gnueabihf - GNU C compiler
gcc-4.7-arm-linux-gnueabihf-base - GCC, the GNU Compiler Collection (base package)
gcc-4.7-multilib-arm-linux-gnueabi - GNU C compiler (multilib files)
gcc-4.7-multilib-arm-linux-gnueabihf - GNU C compiler (multilib files)
gcc-aarch64-linux-gnu - The GNU C compiler for arm64 architecture
gcc-arm-linux-gnueabi - The GNU C compiler for armel architecture
gcc-arm-linux-gnueabihf - The GNU C compiler for armhf architecture
You should install gcc-arm-linux-gnueabi, which is an alias for gcc-4.7-arm-linux-gnueabi. gcc-4.7-multilib-arm-linux-gnueabi is also possible, but more complicated. Use the flags -march=armv5te -mtune=arm926ej-s -msoft-float -mfloat-abi=soft. You can do more tuning by passing the --param NAME=VALUE option to gcc with parameters tuned to your system's memory-subsystem timings.
You may not be able to use these gcc versions, as your Linux may be compiled with OABI and/or be quite ancient compared to the one the compiler was built for. In some cases, libc will call a newer Linux API which may not be present. If the compiler/libc was not configured to be backwards compatible, it may not work with your system. You can use crosstool-ng to create a custom compiler built to suit your system, but this is much more complex.
