Why is Python running on Intel rather than Apple? - pytorch

I am running a Python program that uses Torch. I'm using the Torch nightly build (which supports MPS/M1) and putting my tensors on mps, indicating that the M1 processor is being used as a GPU. However, Activity Monitor lists the "Kind" of my Python process as "Intel" rather than "Apple" (see image).
I am assuming this slows down my program. Any clue why this is happening, and how I can make sure Python runs as "Apple" rather than "Intel"?
I confirmed that the GPU is being used (via Activity Monitor), and that my tensors are on MPS (rather than CPU or CUDA). I expected the program kind to be "Apple", but it is "Intel" instead.
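For reference, a minimal sketch of the setup the question describes (the tensor shape is arbitrary and only illustrative): select the MPS backend when the build supports it and fall back to the CPU otherwise.

# Minimal sketch of the setup described in the question; tensor shape is arbitrary.
import torch

# Use the M1 GPU via Metal Performance Shaders when the build supports it.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(1024, 1024, device=device)
print(x.device)  # expected: mps:0 on an M1 with a recent nightly build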

Try launching the interpreter through the arch command if you want it to run natively on the M1 rather than through Rosetta 2 translation:
arch -arm64 python3 my_program.py
This only works if the installed Python is a universal or native arm64 build; an x86_64-only interpreter is always translated by Rosetta 2, and that translation is what Activity Monitor reports as "Intel". You can also check sys.maxsize to see whether the interpreter is a 64-bit build: if the value is less than 2^63-1, the interpreter is running as a 32-bit process, which is not what you want on an M1. In that case (or if the build is x86_64-only), switch to a 64-bit, native arm64 Python interpreter.
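A quick way to check this from inside Python, using only the standard library (a hedged sketch; the expected outputs are the usual values on macOS):

# Check whether the running interpreter is a native arm64 build or an
# x86_64 build being translated by Rosetta 2.
import platform
import sys

print(platform.machine())        # 'arm64' for a native build, 'x86_64' under Rosetta 2
print(sys.maxsize == 2**63 - 1)  # True for a 64-bit interpreter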

Related

CPU/Threads usage on M1 Pro (Apple Silicon) using OpenMP

hope someone knows the answer to this...
I have code that compiles perfectly well with OpenMP (it uses libsharp). However, I am finding it impossible to make the M1 Pro chip use all of the 8 or 10 cores I have.
I am setting the threads variable as export OMP_NUM_THREADS=10, and the code correctly identifies that it's supposed to be running with 10 threads (see the Activity Monitor screenshot below):
Activity Monitor screenshot
The screenshot shows that the code is compiled for Apple Silicon and uses 10 threads, but not much of the available CPU.
Does anyone know how to properly compile/set the number of threads such that all the cores will be used?
This is trivial in x86 architectures.
Not really an answer, but too long for a comment...
If both LLVM and GCC behave the same, then it's not an OpenMP runtime issue. (And your monitor output shows that the correct number of threads has been created.) I'm also not certain that it's really an Arm issue.
Are you comparing with an Apple x86 machine (so running the same operating system), or with a Linux x86 system?
The scheduling decisions of the two OSes are likely different, and (for instance) macOS has no interface to bind threads to logical CPUs.
As well as that, there's the issue of having some fast and some slow cores. That could mean that statically scheduled loops are inefficient.
I'm also confused by the fact that you seem to show multiple instances of your code running at the same time, so you are explicitly causing over-subscription of the logical CPUs...

Theano Installation on Windows 64

I'm new to Python and the Theano library. I want to install Theano on Windows 7 64-bit. I have a display adapter:
Intel(R) HD Graphics 3000, which is not compatible with NVIDIA.
My QUESTIONS:
1. Is it obligatory to install CUDA so that I can use Theano?
2. Even if I have an Ubuntu operating system with the same display adapter, is CUDA still mandatory?
Any help!
Thanks
You do not need CUDA to run Theano.
Theano can run on either CPU or GPU. If you want to run on GPU you must (currently) use CUDA which means you must be using a NVIDIA display adapter. Without CUDA/NVIDIA you must run on CPU.
There is no disadvantage to running on CPU other than speed -- Theano can be much faster on GPU but everything that runs on a GPU will also run on a CPU as long as it has been coded generically (the default and standard method for Theano code).
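As a hedged illustration of that point, here is a tiny Theano program that is compiled and run entirely on the CPU, with no CUDA involved (the expression itself is arbitrary):

# Minimal sketch: a Theano function compiled and run on the CPU backend.
# No CUDA/NVIDIA hardware is needed for this.
import numpy as np
import theano
import theano.tensor as T

print(theano.config.device)  # 'cpu' unless a GPU device has been configured

x = T.dmatrix('x')
y = T.dmatrix('y')
add = theano.function([x, y], x + y)  # compiled for the CPU

print(add(np.ones((2, 2)), np.ones((2, 2))))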

Monitoring the instructions of a running program in Ubuntu?

I'm a little stuck here.
The idea is that I'd like to get a file of every instruction run by a program during its execution. I'd like to do it with just the executable in hand (no source) and be able to determine what operation is occurring on what address, and when.
For example, I'd like to be able to run it on Google Chrome, Firefox, etc.
I want to use this for a performance prediction system I'm working on. I figure that if I'm able to obtain each instruction in the order it is executed on system 1, I can attempt to simulate/model the run time of an identical program being run on system 2, because I'll be able to predict (although I know not with 100% accuracy) L1/L2 cache misses, L1/L2 cache hits, TLB hits/misses, page faults, time taken on floating point multiplication operations, etc.
I'd like to try to do this on two different systems:
System 1: Ubuntu 10.10 on Intel Core 2 Duo CPU
System 2: Ubuntu 12.04 on system with 2x AMD Sixteen Core Opteron model 6274
(I can definitely change the OSes as necessary, but would prefer to stay with Ubuntu, if possible)
Is this possible / how could I go about doing it? I know with debuggers, you can use them to step through everything, but I don't have the source available.
I think you can use QEMU (or even Bochs) or Valgrind to monitor every executed instruction. QEMU and Valgrind are x86 binary translation tools (Bochs, by contrast, is an interpreter of x86 code). There is a Valgrind tool called cachegrind (plus the kcachegrind GUI) which is ready to emulate the cache by instrumenting every memory access and simulating an L1/L2 cache model (the sizes may be configured via command-line options).
To get deeper (into the pipeline) you may want to look at the free PTLsim (http://www.ptlsim.org/)

Executing Binary Files

I downloaded a binary file that was compiled (a C program) using GCC 4.4.4 for x86-64 Red Hat Linux.
Is it normal that when I try to run it on Mac OS X (running Lion, so also x86-64) with GCC 4.2.1, it says: cannot execute binary file? It doesn't recognize it as an executable binary.
Why would it do that? I believe the GCC version has nothing to do with it, since the file has already been compiled. It has been compiled for x86-64, which both machines run. Can someone please explain?
There are different binary formats. Linux systems use ELF for executables and libraries, but Mac OS X uses the Mach-O format. Windows uses another still: PE format.
It's highly unlikely that a binary compiled for a particular OS will run on another. Either get a binary for Mac, or get the source and compile it yourself.
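To make the distinction concrete, here is a hedged sketch that inspects a file's leading magic bytes to guess its container format; the magic values for ELF, Mach-O, and PE are standard, but the script itself is only illustrative:

# Minimal sketch: identify an executable's container format from its magic bytes.
# Covers only the formats mentioned above (ELF, Mach-O, PE); not exhaustive.
import sys

MAGICS = [
    (b"\x7fELF",             "ELF (Linux)"),
    (b"\xcf\xfa\xed\xfe",    "Mach-O 64-bit (macOS)"),
    (b"\xce\xfa\xed\xfe",    "Mach-O 32-bit (macOS)"),
    (b"\xca\xfe\xba\xbe",    "Mach-O universal/fat binary (note: also the Java class-file magic)"),
    (b"MZ",                  "PE (Windows)"),
]

def binary_format(path):
    with open(path, "rb") as f:
        head = f.read(4)
    for magic, name in MAGICS:
        if head.startswith(magic):
            return name
    return "unknown format"

if __name__ == "__main__":
    print(binary_format(sys.argv[1]))  # e.g. python identify.py ./a.out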
There are many things that will cause issues: the version of libc and libstdc++, differences in the versions of .so libraries, a different API interface to the OS itself, or even a different binary format (e.g. VMS binaries do not run on AIX).
Although Red Hat Linux and Mac OS X are both 'Unix based', they cannot run each other's binaries. Just like you can't run Windows binaries on Macs and vice versa.
Binaries in this sense are compiled to make operating system calls: when your program has a printf(), that boils down to operating system calls. If the operating system it was compiled for is, say, 64-bit Red Hat Linux, then the binary is going to look for Red Hat Linux names for shared libraries in Red Hat Linux paths. That has absolutely nothing in any way, shape or form to do with a completely different operating system, Mac OS X, and its system calls and shared or static libraries, etc. It's like taking a wheel off a Mini Cooper and trying to bolt it onto a bicycle. Yes, at one point in time it was raw metal and rubber, and could have been formed into a bike tire; but once you make that binary, the car tire or the bike tire, that is what it is. Sometimes you find a compatibility layer like Wine that runs Windows programs on top of a POSIX system, or a virtual machine like VMware that lets you run a whole different operating system on top of another by virtualizing a whole computer.
It is also true that you cannot generally expect to take any old C program and have it compile and run on any operating system that, say, has a gcc compiler. Yes, you can learn to write C programs that are portable, but you have to carefully stick to libraries that are supported on all the target platforms. So even taking the source code for your program to the Mac and compiling it is not necessarily going to just work.

Is QEMU good for learning programming in assembler for ARM and PowerPC?

I want to learn programming in assembler for PowerPC and ARM, but I'm unable to buy real hardware for this purpose. I'm thinking about using QEMU for that. However, I'm not sure whether it emulates both architectures well enough that I'll be able to compile and run my programs in native assembler on it.
QEMU works well for testing program correctness (i.e. whether the code would run properly on an actual ARM or PowerPC) but it is not good for testing program efficiency: the emulation is not cycle accurate, and speed measured with QEMU cannot be reliably (or even unreliably) correlated with speed on true hardware.
Also, QEMU will not trap unaligned memory accesses, which is not a problem for PowerPC emulation (the PowerPC tolerates unaligned accesses) but may be for ARM (an unaligned access, e.g. reading a 32-bit word in RAM from an address which is not a multiple of 4, will work fine with QEMU but would trigger an exception on a true ARM processor).
Apart from these points, QEMU is fine for assembly development on ARM or MIPS (I haven't tried PowerPC, because I found an old iBook on eBay for that; but I have done ARM and MIPS assembly with QEMU and then run the resulting code on true hardware, and this worked). You can either emulate a whole system and run Debian in it (in which case the compiler, linker, text editor... will also run in emulation), or use the "user-mode emulation" where the ARM/MIPS executable is run directly, with a wrapper that converts system calls into those for the host PC (this assumes that the host is a PC running Linux). The latter is more convenient (you have access to your normal home directory, programming tools are native...) but requires installing cross-development tools. See buildroot for that (and link with -static; this will avoid many headaches).
Since I have found signs that Debian for PowerPC and for ARM can run on QEMU, I suppose this won't be a problem.
