Inline assembly in Haskell - haskell

Can I somehow use inline assembly in Haskell (similar to what GCC does for C)?
I want to compare my Haskell code to the reference implementation (ASM) and this seems the most straightforward way. I guess I could just call Haskell from C and use GCC inline assembly, but I'm still interested if I can do it the other way around.
(I'm on Linux/x86)

There are two ways:
Call C via the FFI, and use inline assembly on the C side.
Write a Cmm fragment that calls C (without the FFI) and uses inline assembly.
Both solutions use inline assembly on the C side. The former is the more idiomatic. Here's an example from the rdtsc package (the cycles.h excerpt shown is the PowerPC variant, which reads the time base registers with mftb/mftbu):
cycles.h:
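/* Assumed: the excerpt omits the definition of ticks; a 64-bit unsigned
   type matches the return expression below. */
typedef unsigned long long ticks;
static __inline__ ticks getticks(void)
{
    unsigned int tbl, tbu0, tbu1;
    /* Re-read the upper half until it is stable across the lower-half read. */
    do {
        __asm__ __volatile__ ("mftbu %0" : "=r"(tbu0));
        __asm__ __volatile__ ("mftb %0" : "=r"(tbl));
        __asm__ __volatile__ ("mftbu %0" : "=r"(tbu1));
    } while (tbu0 != tbu1);
    return (((unsigned long long)tbu0) << 32) | tbl;
}
rdtsc.c:
#include "cycles.h"
unsigned long long rdtsc(void)
{
    return getticks();
}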
rdtsc.h:
unsigned long long rdtsc(void);
rdtsc.hs:
foreign import ccall unsafe "rdtsc.h" rdtsc :: IO Word64
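The cycles.h excerpt above reads the PowerPC time base. Since the question is about Linux/x86, here is a minimal sketch of what an x86 counterpart could look like, using the RDTSC instruction. The names getticks_x86 and rdtsc_x86 are made up for this illustration and are not taken from the rdtsc package; on a non-x86 target the sketch simply returns 0.

```c
#include <stdint.h>

/* Hypothetical x86 counterpart of getticks(); not copied from the rdtsc package. */
static inline uint64_t getticks_x86(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    /* RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX. */
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0; /* not x86: there is no RDTSC instruction to call */
#endif
}

/* Plain C entry point that the Haskell FFI can import. */
uint64_t rdtsc_x86(void)
{
    return getticks_x86();
}
```

On the Haskell side this would be imported the same way as above, for example with foreign import ccall unsafe "rdtsc_x86" rdtsc :: IO Word64.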
Finally, a slightly non-obvious solution is to use the LLVM or Harpy packages to generate assembly and call it.

Related

Multiple assembly instructions using asm volatile in C code

I need to modify a random number generator that uses rdrand (only).
It is implemented in C code as follows.
uint64_t _rdrand(void)
{
    uint64_t r;
    __asm__ volatile("rdrand %0\n\t" : "=r"(r));
    return r;
}
Now I need to modify it so that it returns only if the carry flag is set (according to the rdrand documentation). I think it can be implemented with the jc instruction, but I don't know how to use it inside __asm__ volatile. Please help me.
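One possible approach, sketched here under assumptions rather than taken from an accepted answer: instead of branching with jc inside the asm statement, copy the carry flag into a C variable with setc and do the retry loop in C. The function name rdrand64_retry and the max_tries parameter are hypothetical. The sketch also checks CPUID first (leaf 1, ECX bit 30) so that it returns 0 instead of faulting on CPUs without RDRAND, and it compiles to a stub on non-x86-64 targets.

```c
#include <stdint.h>
#if defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* Hypothetical helper: tries RDRAND up to max_tries times.
   Returns 1 and stores the value in *out when the carry flag reported
   success, 0 if every attempt failed or the CPU has no RDRAND. */
int rdrand64_retry(uint64_t *out, unsigned int max_tries)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    /* CPUID leaf 1, ECX bit 30 advertises RDRAND support. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 30)))
        return 0;
    while (max_tries-- > 0) {
        uint64_t r;
        unsigned char ok;
        /* setc copies the carry flag (1 = valid random data) into ok. */
        __asm__ volatile("rdrand %0\n\t"
                         "setc %1"
                         : "=r"(r), "=qm"(ok)
                         :
                         : "cc");
        if (ok) {
            *out = r;
            return 1;
        }
    }
#else
    (void)out;
    (void)max_tries;
#endif
    return 0;
}
```

A caller would pass a small retry bound and treat a return value of 0 as "no random data available".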

How can I select a static library to be linked while ARM cross compiling?

I have an ARM cross compiler in Ubuntu (arm-linux-gnueabi-gcc) and the default architecture is ARMv7. However, I want to compile an ARMv5 binary. I do this by giving the compiler the -march=armv5te option.
So far, so good. Since my ARM system uses BusyBox, I have to compile my binary statically linked. So I give gcc the -static option.
However, I have a problem with libc.a which the linker links to my ARMv5 binary. This file is compiled with the ARMv7 architecture option. So, even if I cross-compile my ARM binary with ARMv5, I can't run it on my BusyBox based ARMv5 box.
How can I solve this problem?
Where can I get the ARMv5 libc.a static library, and how can I link it?
Thank you in advance.
You have two choices:
Get the right compiler.
Write your own 'C' Library.
Get the right compiler.
You are always safest with a compiler that matches your system. This applies to x86 Linux and its various distributions too; you are lucky if different compilers work. It is harder when you cross-compile, because the compiler is often not kept in sync with the target automatically. For example, try running a program compiled on your 2014 Ubuntu system on a 1999 x86 Mandrake Linux.
As well as instruction compatibility (which you have identified), there are ABI and OS dependencies. Specifically, the ARMv7 toolchain is most likely hardfloat (it assumes an FPU and passes floating-point arguments in registers), whereas you need softfloat (emulated floating point). The specific glibc (or uClibc) also has specific calls and expectations of the Linux OS. For instance, the way threads work has changed over time.
Write your own
You can always use -fno-builtin and -ffreestanding as well as -static. Then you cannot use any libc functions, but you can program them yourself.
There are external sources, like Mark Martinec's snprintf, and building blocks like write() and exit(), which are easy to implement:
#define _SYS_IOCTL_H 1
#include <linux/unistd.h>
#include <linux/ioctl.h>
/* write(2) as a raw system call: arguments in r0-r2, syscall number in r7. */
static inline int write(int fd, void *buf, int len)
{
    int rval;
    asm volatile ("mov r0, %1\n\t"
                  "mov r1, %2\n\t"
                  "mov r2, %3\n\t"
                  "mov r7, %4\n\t"
                  "swi #0\n\t"
                  "mov %0, r0\n\t"
                  : "=r" (rval)
                  : "r" (fd),
                    "r" (buf),
                    "r" (len),
                    "Ir" (__NR_write)
                  /* "memory": the kernel reads the buffer behind buf. */
                  : "r0", "r1", "r2", "r7", "memory");
    return rval;
}
/* exit(2) as a raw system call: status in r0, syscall number in r7. */
static inline void exit(int status)
{
    asm volatile ("mov r0, %0\n\t"
                  "mov r7, %1\n\t"
                  "swi #0\n\t"
                  : : "r" (status),
                      "Ir" (__NR_exit)
                  : "r0", "r7");
}
You also have to provide the start-up machinery that the 'C' library normally takes care of:
#define STDOUT 1 /* File descriptor of standard output. */
/* Called from assembler startup. */
int main (int argc, char*argv[])
{
    /* sizeof - 1: do not write the string's terminating NUL byte. */
    write(STDOUT, "Hello world\n", sizeof("Hello world\n") - 1);
    return 0;
}
/* Wrapper for main return code. */
void __attribute__ ((unused)) estart (int argc, char*argv[])
{
    int rval = main(argc,argv);
    exit(rval);
}
/* Setup arguments for estart [like main()]. */
void __attribute__ ((naked)) _start (void)
{
    asm(" sub lr, lr, lr\n"   /* Clear the link register. */
        " ldr r0, [sp]\n"     /* Get argc... */
        " add r1, sp, #4\n"   /* ... and argv ... */
        " b estart\n"         /* Let's go! */
        );
}
If this is too daunting because you would need to implement a lot of functionality, you can instead get the source of various libraries and rebuild them with -fno-builtin, making sure that they do not get linked with the Ubuntu libraries, which are incompatible.
Projects like crosstool-ng can allow you to build a correct compiler (maybe with more advanced code generation) that suits the armv5 system exactly. This may seem like a pain, but the alternatives above aren't easy either.

Compiler Support of members of SSE vector types like m128_f32[x]

This might sound stupid, but is there a way to activate support for the inner members of an SSE vector type?
I know this works fine on MSVC, and I've found some comments on forums and SO like this.
The question is: can I activate this on Clang, at least, without creating my own unions?
Thank you
[edit, workaround]
For now I decided to create a vec4 type to help me.
Here is the code:
#include <emmintrin.h>
#include <cstdint>
#ifdef _WIN32
typedef __m128 vec4;
typedef __m128i vec4i;
typedef __m128d vec4d;
#else
/* Non-Windows branch: __declspec is an MSVC extension, so use the
   GCC/Clang alignment attribute here instead. */
typedef union __attribute__((aligned(16))) vec4{
    float m128_f32[4];
    uint64_t m128_u64[2];
    int8_t m128_i8[16];
    int16_t m128_i16[8];
    int32_t m128_i32[4];
    int64_t m128_i64[2];
    uint8_t m128_u8[16];
    uint16_t m128_u16[8];
    uint32_t m128_u32[4];
} vec4;
typedef union __attribute__((aligned(16))) vec4i{
    uint64_t m128i_u64[2];
    int8_t m128i_i8[16];
    int16_t m128i_i16[8];
    int32_t m128i_i32[4];
    int64_t m128i_i64[2];
    uint8_t m128i_u8[16];
    uint16_t m128i_u16[8];
    uint32_t m128i_u32[4];
} vec4i;
typedef union __attribute__((aligned(16))) vec4d{
    double m128d_f64[2];
} vec4d;
#endif
On recent clangs, this Just Works without you needing to do anything at all:
#include <immintrin.h>
float foo(__m128 x) {
    return x[1];
}
AFAIK it Just Works in recent GCC builds as well.
However, I should note the following:
Consider carefully whether or not you really need to do element-wise access in your vector code. If you can keep your operations in-lane, they will almost certainly be significantly more efficient.
If you really do need to do a significant number of lanewise or horizontal operations, and you don't need portability, consider using Clang extended vectors (or "OpenCL vectors") instead of the basic SSE intrinsic types. You can pass them to intrinsics just like __m128 and friends, but they also have much nicer syntax for vector-scalar operations, lanewise operations, vector literals, etc.
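To give a feel for that syntax, here is a small sketch. Note the substitution: it uses the GCC-style vector_size attribute, which Clang also accepts, rather than Clang's own ext_vector_type, so that the same code builds with either compiler. The names v4f, v4f_hsum and v4f_scale are invented for this example.

```c
/* GCC-style generic vector of four floats (16 bytes); also accepted by Clang. */
typedef float v4f __attribute__((vector_size(16)));

/* Horizontal sum using plain subscripting on the vector. */
static float v4f_hsum(v4f v)
{
    return v[0] + v[1] + v[2] + v[3];
}

/* Multiply every lane by s: build a vector literal, then use the
   ordinary * operator, which works lane by lane. */
static v4f v4f_scale(v4f v, float s)
{
    v4f factor = { s, s, s, s };
    return v * factor;
}
```

No union is needed for the element access here; the subscripts work directly on the vector type.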

what's the difference between gcc __sync_bool_compare_and_swap and cmpxchg?

To use CAS, GCC provides some useful functions such as
__sync_bool_compare_and_swap
but we can also use asm code like cmpxchg:
bool ret;
__asm__ __volatile__(
"lock cmpxchg16b %1;\n"
"sete %0;\n"
:"=m"(ret),"+m" (*(volatile pointer_t *) (addr))
:"a" (old_value.ptr), "d" (old_value.tag), "b" (new_value.ptr), "c" (new_value.tag));
return ret;
I grepped the source code of GCC 4.6.3 and found that __sync_bool_compare_and_swap is implemented using
typedef int (__kernel_cmpxchg_t) (int oldval, int newval, int *ptr);
#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *) 0xffff0fc0)
It seems that 0xffff0fc0 is the address of some kernel helper function,
but in GCC 4.1.2 there is no code like __kernel_cmpxchg_t, and I can't find the implementation of __sync_bool_compare_and_swap.
So what's the difference between __sync_bool_compare_and_swap and cmpxchg?
Is __sync_bool_compare_and_swap implemented with cmpxchg?
And with the kernel helper function __kernel_cmpxchg_t, is it implemented with cmpxchg?
Thanks!
I think __kernel_cmpxchg is a fallback which Linux makes available on some architectures that don't have native hardware support for CAS, e.g. ARMv5 or something like that.
Usually, GCC expands the __sync_* builtins inline. Unless you're really interested in GCC internals, an easier way to find out what it does is to write a simple C example and look at the asm the compiler generates.
Consider
#include <stdbool.h>
bool my_cmpchg(int *ptr, int oldval, int newval)
{
return __sync_bool_compare_and_swap(ptr, oldval, newval);
}
Compiling this on an x86_64 Linux machine with GCC 4.4 generates the following asm:
my_cmpchg:
.LFB0:
.cfi_startproc
movl %esi, %eax
lock cmpxchgl %edx, (%rdi)
sete %al
ret
.cfi_endproc
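As a usage sketch of the builtin discussed above, the typical pattern is a retry loop: read the current value, compute the new one, and attempt the swap until no other thread has changed the value in between. The function name cas_add is made up for this illustration.

```c
/* Atomically add delta to *ptr with a compare-and-swap retry loop.
   Returns the value that was stored. */
int cas_add(volatile int *ptr, int delta)
{
    int old;
    do {
        old = *ptr;   /* snapshot the current value */
        /* The swap succeeds only if *ptr still equals the snapshot. */
    } while (!__sync_bool_compare_and_swap(ptr, old, old + delta));
    return old + delta;
}
```

Written this way the code stays portable across the targets GCC supports, and the compiler picks lock cmpxchg or whatever fallback the target needs.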

How do I use Haskell's FFI on structs?

I have created the following C library for reading an image:
typedef struct {
    unsigned int height;
    unsigned int width;
    unsigned char* red;   // length = height * width
    unsigned char* green;
    unsigned char* blue;
} Contents;
Contents readJPEGFile(const char* inFilename);
I can't really find any info using arrays and structs with the Foreign Function Interface.
How would I proceed to be able to use my library in Haskell?
I tried to use the following example as a base: http://therning.org/magnus/archives/315 but then the hsc file was compiled down to a hs file that only contained the above c-code and nothing more (and of course it can't be compiled).
The basic FFI support includes only scalar types. Everything else you wind up doing with address arithmetic. The section on foreign types in the FFI documentation gives the basics, and you can find an example in the FFI Cookbook.
At one time you could use tools like Green Card and H/Direct to generate marshalling and unmarshalling code for you. For reasons I don't understand, these tools have not been updated in a long time. As far as I can tell the current tool of choice is hsc2hs.
Edit: As noted in comment (thanks ephemient), c2hs is also popular, and since c2hs is from Manuel Chakravarty it is likely to be good.
It sounds as if you have a build issue; I do seem to recall that I used the very page you reference as an example when I was writing an FFI interface into the Windows Win32 DDEML library. For example, one of the structures we use is
typedef struct tagHSZPAIR {
HSZ hszSvc;
HSZ hszTopic;
} HSZPAIR, *PHSZPAIR;
#include "ddeml.h" brings this in to the DDEML.hsc file. We access it with:
data HSZPair = HSZPair HSZ HSZ
instance Storable HSZPair where
    sizeOf _    = (#size HSZPAIR)
    alignment   = sizeOf
    peek ptr    = do svc   <- (#peek HSZPAIR, hszSvc) ptr
                     topic <- (#peek HSZPAIR, hszTopic) ptr
                     return $ HSZPair svc topic
    poke ptr (HSZPair svc topic) = do (#poke HSZPAIR, hszSvc) ptr svc
                                      (#poke HSZPAIR, hszTopic) ptr topic
Unfortunately, I can't show you what this compiles to at the moment because I don't have a Windows box handy, but the generated code was just as above, except with #size HSZPAIR replaced with (64) or whatever and so on.
(If you really want to see what was generated, or need help doing your build, e-mail me and I'll help you out.)
Hackage has several packages which use FFI which you could look at for examples.
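One detail specific to the library in the question: readJPEGFile returns the struct by value, and the Haskell FFI can only pass and return basic types and pointers. A common workaround, sketched here as an assumption about how one might proceed, is a small C wrapper that writes the result through a pointer. The wrapper name readJPEGFileInto is hypothetical, and the readJPEGFile body below is only a stand-in returning a fixed 1x2 image so that the sketch is self-contained.

```c
typedef struct {
    unsigned int height;
    unsigned int width;
    unsigned char* red;   /* length = height * width */
    unsigned char* green;
    unsigned char* blue;
} Contents;

/* Stand-in for the real library function (assumption for this sketch):
   returns a fixed 1x2 image instead of decoding a file. */
Contents readJPEGFile(const char* inFilename)
{
    static unsigned char r[2] = { 255, 0 };
    static unsigned char g[2] = { 0, 255 };
    static unsigned char b[2] = { 0, 0 };
    Contents c = { 1, 2, r, g, b };
    (void)inFilename;
    return c;
}

/* Hypothetical wrapper: stores the result in caller-provided memory,
   so Haskell can allocate the struct and peek its fields. */
void readJPEGFileInto(const char* inFilename, Contents* out)
{
    *out = readJPEGFile(inFilename);
}
```

On the Haskell side this could then be imported with a type along the lines of CString -> Ptr Contents -> IO (), with a Storable instance for Contents written in the same style as the HSZPair one above.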
