Where the LSB is at index 0 and the MSB is at index 63. It should similarly extend to u32 and other types.
let my_num: u64 = 100; // 0b1100100
let msb = get_msb(my_num); // 0
let lsb = get_lsb(my_num); // 0
(Correction: the MSB should be 0 at bit index 63, not 1 at index 6.)
As explained in the comments, you get the LSB and MSB of a u64 with n & 1 and (n >> 63) & 1, respectively.
Doing it completely generically is somewhat of a hassle in Rust, though, because generics require operations like shifting, masking, and even the construction of 1 to be fully specified upfront. However, this is where the num-traits crate comes to the rescue. Along with its cousin num, it is the de facto standard for generic Rust in the field of numerics, providing (among others) the PrimInt trait that makes get_msb() and get_lsb() straightforward:
use num_traits::PrimInt;
pub fn get_lsb<N: PrimInt>(n: N) -> N {
    // Mask off everything except the lowest bit.
    n & N::one()
}

pub fn get_msb<N: PrimInt>(n: N) -> N {
    // Shift the highest bit down to position 0, then mask it.
    let shift = std::mem::size_of::<N>() * 8 - 1;
    (n >> shift) & N::one()
}
fn main() {
assert_eq!(get_lsb(100u32), 0);
assert_eq!(get_lsb(101u32), 1);
assert_eq!(get_msb(100u32), 0);
assert_eq!(get_msb(u32::MAX), 1);
}
I'm implementing xoshiro256++ and I'm able to generate pseudo-random 64-bit unsigned integers. The next challenge is generating uniform doubles in the unit interval from the PRNG's u64s.
a 64-bit unsigned integer x should be converted to a 64-bit double using the expression
(x >> 11) * 0x1.0p-53
How would one evaluate this in Rust? On trying this I get the compile error:
error: hexadecimal float literal is not supported
--> src/main.rs:30:56
|
30 | f64::from_bits((x >> 11).wrapping_mul(0x1.0p-53))
| ^^^^^
You can use the hexf crate:
use hexf::hexf64;
fn main() {
let v = hexf64!("0x1.0p-53");
println!("{}", v);
}
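Alternatively, since 0x1.0p-53 is just 2^-53, you can avoid the dependency and spell the constant out. A minimal sketch (the function name is mine; note that the conversion is a floating-point multiply, not the integer wrapping_mul from the error message):

fn u64_to_unit_f64(x: u64) -> f64 {
    // Keep the top 53 bits, which fit exactly in an f64 mantissa,
    // then scale by 2^-53 to land in [0, 1).
    (x >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
}

fn main() {
    assert_eq!(u64_to_unit_f64(0), 0.0);
    assert!(u64_to_unit_f64(u64::MAX) < 1.0);
}

Because 2^53 is a power of two, 1.0 / (1u64 << 53) as f64 is computed exactly.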
See also:
How to create a static string at compile time
So I'm trying to get a random number, but I'd rather have it come back as int instead of uint... Not sure if this match is right, either, but the compiler doesn't get that far because it's never heard of this from_uint thing I'm trying to do:
fn get_random(max: &int) -> int {
// Here we use * to dereference max
// ...that is, we access the value at
// the pointer location rather than
// trying to do math using the actual
// pointer itself
match int::from_uint(rand::random::<uint>() % *max + 1) {
Some(n) => n,
None => 0,
}
}
from_uint is not in the std::int namespace, but in std::num: http://doc.rust-lang.org/std/num/fn.from_uint.html
Original answer:
Cast a u32 to int with as. If you cast uint or u64 to int, you risk overflowing into the negatives (assuming you are on 64 bit). From the docs:
The size of a uint is equivalent to the size of a pointer on the particular architecture in question.
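To see the hazard concretely, here is a quick sketch in today's syntax (with i64 standing in for the old int):

fn main() {
    // A u64 with the top bit set wraps into the negatives:
    println!("{}", u64::MAX as i64); // -1
    // A u32 always fits in a 64-bit signed int:
    println!("{}", u32::MAX as i64); // 4294967295
}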
This works:
use std::rand;
fn main() {
let max = 42i;
println!("{}" , get_random(&max));
}
fn get_random(max: &int) -> int {
(rand::random::<u32>() as int) % (*max + 1)
}
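For reference, in post-1.0 Rust the same thing is usually written with the rand crate's range API, which also avoids the modulo bias of % (*max + 1). A sketch assuming rand 0.8:

use rand::Rng;

fn get_random(max: i64) -> i64 {
    // gen_range samples uniformly from the inclusive range.
    rand::thread_rng().gen_range(0..=max)
}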
In trying to understand how stack memory works, I wrote the following code to display addresses of where data gets stored:
fn main() {
let a = "0123456789abcdef0";
let b = "123456789abcdef01";
let c = "23456789abcdef012";
println!("{:p} {}", &a, a.len());
println!("{:p} {}", &b, b.len());
println!("{:p} {}", &c, c.len());
}
The output is:
0x7fff288a5448 17
0x7fff288a5438 17
0x7fff288a5428 17
It implies that all 17 bytes are stored in a space of 16 bytes, which can't be right. My one guess is that there's some optimization happening, but I get the same results even when I build with --opt-level 0.
The equivalent C seems to do the right thing:
#include <stdio.h>
#include <string.h>
int main() {
char a[] = "0123456789abcdef";
char b[] = "123456789abcdef0";
char c[] = "23456789abcdef01";
printf("%p %zu\n", &a, strlen(a) + 1);
printf("%p %zu\n", &b, strlen(b) + 1);
printf("%p %zu\n", &c, strlen(c) + 1);
return 0;
}
Output:
0x7fff5837b440 17
0x7fff5837b420 17
0x7fff5837b400 17
String literals "..." are stored in static memory, and the variables a, b, c are just (fat) pointers to them. They have type &str, which has the following layout:
struct StrSlice {
data: *const u8,
length: uint
}
where the data field points at the sequence of bytes that form the text, and the length field says how many bytes there are.
On a 64-bit platform this is 16 bytes (and on a 32-bit platform, 8 bytes). The real equivalent in C (ignoring null termination vs. stored length) would be storing a const char* instead of a char[], e.g. const char *a = "0123456789abcdef";. With that change, the C program prints:
0x7fff21254508 17
0x7fff21254500 17
0x7fff212544f8 17
i.e. the pointers are 8 bytes apart.
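You can also confirm the 16-byte figure from Rust itself; a minimal check (the exact numbers assume a 64-bit target):

fn main() {
    // &str is a data pointer plus a length: 8 + 8 = 16 bytes on 64-bit.
    println!("{}", std::mem::size_of::<&str>());      // 16
    println!("{}", std::mem::size_of::<*const u8>()); // 8
    println!("{}", std::mem::size_of::<usize>());     // 8
}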
You can check these low-level details using --emit=asm or --emit=llvm-ir, or clicking the corresponding button on the playpen (possibly adjusting the optimisation level too). E.g.
fn main() {
let a = "0123456789abcdef0";
}
compiled with --emit=llvm-ir and no optimisations gives (with my trimming and annotations):
%str_slice = type { i8*, i64 }
;; global constant with the string's text
@str1042 = internal constant [17 x i8] c"0123456789abcdef0"
; Function Attrs: uwtable
define internal void @_ZN4main20h55efe3c71b4bb8f4eaaE() unnamed_addr #0 {
entry-block:
;; create stack space for the `a` variable
%a = alloca %str_slice
;; get a pointer to the first element of the `a` struct (`data`)...
%0 = getelementptr inbounds %str_slice* %a, i32 0, i32 0
;; ... and store the pointer to the string data in it
store i8* getelementptr inbounds ([17 x i8]* @str1042, i32 0, i32 0), i8** %0
;; get a pointer to the second element of the `a` struct (`length`)...
%1 = getelementptr inbounds %str_slice* %a, i32 0, i32 1
;; ... and store the length of the string (17) in it.
store i64 17, i64* %1
ret void
}
I have been looking at LLVM lately, and I find it to be quite an interesting architecture. However, looking through the tutorial and the reference material, I can't see any examples of how I might implement a string data type.
There is a lot of documentation about integers, reals, and other number types, and even arrays, functions and structures, but AFAIK nothing about strings. Would I have to add a new data type to the backend? Is there a way to use built-in data types? Any insight would be appreciated.
What is a string? An array of characters.
What is a character? An integer.
So while I'm no LLVM expert by any means, I would guess that if, e.g., you wanted to represent some 8-bit character set, you'd use an array of i8 (8-bit integers) or a pointer to i8. And indeed, if we have a simple hello-world C program:
#include <stdio.h>
int main() {
puts("Hello, world!");
return 0;
}
And we compile it using llvm-gcc and dump the generated LLVM assembly:
$ llvm-gcc -S -emit-llvm hello.c
$ cat hello.s
; ModuleID = 'hello.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
target triple = "x86_64-linux-gnu"
@.str = internal constant [14 x i8] c"Hello, world!\00" ; <[14 x i8]*> [#uses=1]
define i32 @main() {
entry:
%retval = alloca i32 ; <i32*> [#uses=2]
%tmp = alloca i32 ; <i32*> [#uses=2]
%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]
%tmp1 = getelementptr [14 x i8]* @.str, i32 0, i64 0 ; <i8*> [#uses=1]
%tmp2 = call i32 @puts( i8* %tmp1 ) nounwind ; <i32> [#uses=0]
store i32 0, i32* %tmp, align 4
%tmp3 = load i32* %tmp, align 4 ; <i32> [#uses=1]
store i32 %tmp3, i32* %retval, align 4
br label %return
return: ; preds = %entry
%retval4 = load i32* %retval ; <i32> [#uses=1]
ret i32 %retval4
}
declare i32 @puts(i8*)
Notice the reference to the puts function declared at the end of the file. In C, puts is
int puts(const char *s)
In LLVM, it is
i32 @puts(i8*)
The correspondence should be clear.
As an aside, the generated LLVM is very verbose here because I compiled without optimizations. If you turn those on, the unnecessary instructions disappear:
$ llvm-gcc -O2 -S -emit-llvm hello.c
$ cat hello.s
; ModuleID = 'hello.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
target triple = "x86_64-linux-gnu"
@.str = internal constant [14 x i8] c"Hello, world!\00" ; <[14 x i8]*> [#uses=1]
define i32 @main() nounwind {
entry:
%tmp2 = tail call i32 @puts( i8* getelementptr ([14 x i8]* @.str, i32 0, i64 0) ) nounwind ; <i32> [#uses=0]
ret i32 0
}
declare i32 @puts(i8*)
[To follow up on other answers which explain what strings are, here is some implementation help]
Using the C interface, the calls you'll want are something like:
LLVMValueRef llvmGenLocalStringVar(const char* data, int len)
{
    // `mod` is assumed to be your LLVMModuleRef.
    LLVMValueRef glob = LLVMAddGlobal(mod, LLVMArrayType(LLVMInt8Type(), len), "string");

    // Set as internal linkage and constant
    LLVMSetLinkage(glob, LLVMInternalLinkage);
    LLVMSetGlobalConstant(glob, TRUE);

    // Initialize with string; the final TRUE means "don't NUL-terminate",
    // matching the exact-length array type above.
    LLVMSetInitializer(glob, LLVMConstString(data, len, TRUE));

    return glob;
}
Using the C API, instead of using LLVMConstString, you could use LLVMBuildGlobalString. Here is my implementation of
int main() {
    printf("Hello World, %s!\n", "there");
    return 0;
}
using C API:
// main is declared as returning void here to keep the sketch simple.
LLVMTypeRef main_type = LLVMFunctionType(LLVMVoidType(), NULL, 0, false);
LLVMValueRef main = LLVMAddFunction(mod, "main", main_type);

// Passing a parameter count of 0 declares printf as the fully variadic
// `i32 (...)` seen in the IR below; passing 1 with param_types would
// give the conventional `i32 (i8*, ...)`.
LLVMTypeRef param_types[] = { LLVMPointerType(LLVMInt8Type(), 0) };
LLVMTypeRef llvm_printf_type = LLVMFunctionType(LLVMInt32Type(), param_types, 0, true);
LLVMValueRef llvm_printf = LLVMAddFunction(mod, "printf", llvm_printf_type);

LLVMBasicBlockRef entry = LLVMAppendBasicBlock(main, "entry");
LLVMPositionBuilderAtEnd(builder, entry);

LLVMValueRef format = LLVMBuildGlobalStringPtr(builder, "Hello World, %s!\n", "format");
LLVMValueRef value = LLVMBuildGlobalStringPtr(builder, "there", "value");

LLVMValueRef args[] = { format, value };
// Newer LLVM versions deprecate LLVMBuildCall in favor of LLVMBuildCall2.
LLVMBuildCall(builder, llvm_printf, args, 2, "printf");
LLVMBuildRetVoid(builder);
I created strings like so:
LLVMValueRef format = LLVMBuildGlobalStringPtr(builder, "Hello World, %s!\n", "format");
LLVMValueRef value = LLVMBuildGlobalStringPtr(builder, "there", "value");
The generated IR is:
; ModuleID = 'printf.bc'
source_filename = "my_module"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
@format = private unnamed_addr constant [18 x i8] c"Hello World, %s!\0A\00"
@value = private unnamed_addr constant [6 x i8] c"there\00"
define void @main() {
entry:
%printf = call i32 (...) @printf(i8* getelementptr inbounds ([18 x i8], [18 x i8]* @format, i32 0, i32 0), i8* getelementptr inbounds ([6 x i8], [6 x i8]* @value, i32 0, i32 0))
ret void
}
declare i32 @printf(...)
For those using the C++ API of LLVM, you can rely on IRBuilder's CreateGlobalStringPtr:
Builder.CreateGlobalStringPtr(StringRef("Hello, world!"));
This will be represented as i8* in the final LLVM IR.
Think about how a string is represented in common languages:
C: a pointer to a character. You don't have to do anything special.
C++: string is a complex object with a constructor, destructor, and copy constructor. On the inside, it usually holds essentially a C string.
Java/C#/...: a string is a complex object holding an array of characters.
LLVM's name is very self-explanatory. It really is "low level". You have to implement strings however you want them to be; it would be silly for LLVM to force anyone into a specific implementation.
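For instance, mirroring the Rust &str layout shown earlier, a front end might lower strings to a plain {pointer, length} pair. A hypothetical sketch of such a representation in Rust (the names are mine):

// A "fat pointer" string representation a compiler front end might
// choose: a data pointer and an explicit length, rather than
// C-style NUL termination.
#[repr(C)]
struct MyStr {
    data: *const u8,
    len: usize,
}

fn main() {
    let bytes = b"Hello, world!";
    let s = MyStr { data: bytes.as_ptr(), len: bytes.len() };
    println!("{} bytes at {:p}", s.len, s.data);
}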