Reducing instructions in while loop - riscv

So I have this piece of code:
i=0; while(arr[i] != value) { i = i+1; }
and I want to write it in Assembly.
Suppose the register x20 holds the variable i, register x21 the variable value and register x22 the address of the array. Also for simplicity there is no need to check whether the value is inside the array.
The code is:
add x20, x0, x0 # i = 0
loop: Condition code # Retrieving arr[i] and storing it into register x10
beq x10, x21, exit # Comparing arr[i] to value
addi x20, x20, 1 # i = i+1
j loop
exit: ...
Is it possible for this code to be reduced?

Yes. We rearrange the loop so that it exits at the bottom, with a conditional branch:
add x20, x0, x0 # i = 0
j loopStart
loop:
addi x20, x20, 1 # i = i+1
loopStart:
slli x11, x20, 2 # x11 = i * 4, the byte offset of arr[i]
add x11, x22, x11 # x11 = address of arr[i] (x22 holds the array address)
lw x10, 0(x11) # x10 = arr[i]
bne x10, x21, loop # Comparing arr[i] to value
exit: ...
Though this is the same number of instructions, one of them, the unconditional branch, is no longer inside the loop.
Next, we can transform the loop to using pointers as follows:
p=arr;
while (*p != value) p++;
i=p-arr;
This will remove the indexing computation inside the loop.
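As a rough illustration, the pointer form could be written in C like this. The function name find_index is made up for the example, and, as in the question, the sketch assumes that value really is in the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the first element equal to value.
 * Assumes value is present; otherwise the scan runs past the end. */
size_t find_index(const int *arr, int value)
{
    assert(arr != NULL);
    const int *p = arr;        /* p = arr */
    while (*p != value)        /* one load and one compare per iteration */
        p++;                   /* advances by sizeof(int) bytes */
    return (size_t)(p - arr);  /* i = p - arr, computed once after the loop */
}
```

The shift and add that formed the address of arr[i] are gone from the loop body; the only arithmetic left inside it is the pointer increment.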

Related

Auto-vectorization of a loop shuffling 4 int elements and taking absolute differences vs. the previous arrangement

I'm trying, without succeeding, to make the Rust compiler (rustc 1.66.0) auto-vectorize the following function (without using any unsafe intrinsics):
pub fn get_transforms(s: &mut [i32]) -> u32 {
    assert_eq!(s.len(), 4);
    let mut transforms = 1;
    while s[0] > 0 || s[1] > 0 || s[2] > 0 || s[3] > 0 {
        // Calculate deltas
        let d0 = s[0] - s[1];
        let d1 = s[1] - s[2];
        let d2 = s[2] - s[3];
        let d3 = s[3] - s[0];
        // Assign absolute values
        s[0] = d0.abs();
        s[1] = d1.abs();
        s[2] = d2.abs();
        s[3] = d3.abs();
        transforms += 1;
    }
    transforms
}
My idea is to make it perform the subtractions and the abs() once (using 16 byte registers) instead of four times.
I've read that iterators might help, but I haven't found an easy way to use them here.
Compiler flags don't seem to be able to help either.
Here's the link to the Compiler Explorer output I'm using as reference: https://godbolt.org/z/7sqq6bszT
As @ChayimFriedman noted in the comments, you need to add -Copt-level=3 to the command line. But additionally, you need to make sure that the boolean expression is not evaluated lazily, i.e., use the | operator instead of || (this requires parentheses around the comparisons):
while (s[0] > 0) | (s[1] > 0) | (s[2] > 0) | (s[3] > 0) {
// rest of code unmodified
Compiling this produces the following loop:
.LBB0_13:
vpshufd xmm2, xmm0, 57
vpsubd xmm0, xmm0, xmm2
vpabsd xmm0, xmm0
inc eax
vpcmpgtd xmm2, xmm0, xmm1
vmovmskps ecx, xmm2
test ecx, ecx
jne .LBB0_13
The reason why | instead of || helps the compiler is that for expressions like a || b the b expression shall not be evaluated if a is true already -- most of the time this requires a branch depending on the value of a. On the other hand, for a | b both a and b are evaluated every time. Of course, the compiler should be allowed to do any optimization under the "as-if" rule, i.e., if evaluating b does not have any side effects (and the compiler can prove that), a || b and a | b could be compiled to the same code.
N.B.: If you are sure that s[i] are never negative (which is obviously true, if the inputs are non-negative), you can replace s[i] > 0 by s[i] != 0, which will reduce the vpcmpgtd+vmovmskps+test to a single vptest.
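The same distinction exists in C, so here is a small C analogue of the loop, written with the non-short-circuit | operator. The name count_transforms is invented for this sketch, and whether a given C compiler vectorizes it is not something the sketch demonstrates; it only shows the eager form of the condition.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical C counterpart of get_transforms. */
unsigned count_transforms(int s[4])
{
    assert(s != NULL);
    unsigned transforms = 1;
    /* Bitwise | evaluates all four comparisons on every iteration;
     * with || the later ones are skipped as soon as one is true. */
    while ((s[0] > 0) | (s[1] > 0) | (s[2] > 0) | (s[3] > 0)) {
        /* Calculate deltas */
        int d0 = s[0] - s[1];
        int d1 = s[1] - s[2];
        int d2 = s[2] - s[3];
        int d3 = s[3] - s[0];
        /* Assign absolute values */
        s[0] = abs(d0);
        s[1] = abs(d1);
        s[2] = abs(d2);
        s[3] = abs(d3);
        transforms += 1;
    }
    return transforms;
}
```

Starting from {1, 0, 0, 0} the array goes through {1, 0, 0, 1}, {1, 0, 1, 0}, {1, 1, 1, 1} and {0, 0, 0, 0}, so the function returns 5.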

ARM Assembly accepting '-'

So my code currently converts a string into an integer. It works, but I need it to accept a "-" at the beginning of the input so that negative numbers are handled. I have no idea how to do this and I cannot find any sources. I am currently passing (test3: .asciz "-48") in register r0, and when I run it in the debugger I get 45. Here is my code for reference.
.global stoi
.text
stoi:
push {r4,r5,r6,r8,r9,lr}
@ r0 = buffer
mov r1,#0 @ r1 = n = 0
mov r9,#0 @ buffer counter
mov r4,#48 @ 0 checker
mov r5,#57 @ 9 checker
b 5f
5:
ldrb r3,[r0,r9] @ r3 = c
b 1f
1:
cmp r3,r4 @ cmp c to 0 (48 in ASCII)
bge 2f
b 4f
2:
cmp r3,r5 @ cmp c to 9 (57 in ASCII)
ble 3f
b 4f
3:
sub r6,r3,#'0' @ r6 = (c - '0')
@ strb r6,[r0,r9]
add r1,r1,r1,lsl#2 @ r1 = n + n * 4 = n * 5
add r1,r1,r1,lsl#0 @ r1 = n * 5 + n * 5 = n * 10
add r1,r1,r6 @ n = n * 10 + (c - '0')
add r9,r9,#1 @ add to buffer
b 5b
4:
mov r0,r1
pop {r4,r5,r6,r8,r9,pc}
Use the same code you're using now, and add these changes:
1) Skip the '-' if it's the first char.
Right now you stop as soon as a non-digit char is read. The 45 you see (in r3) is the ASCII code of '-'. As far as I can see, r1 should still be 0, though.
2) At the end, check whether the first char was a '-', and if it was, subtract r1 from 0 (since 0 - x is -x).
( 3) Remove the b 1f, it's not needed ;) )
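To make the two changes concrete, here is the same logic sketched in C rather than ARM assembly. The name stoi_sketch is made up; like the original routine it stops at the first non-digit and does no overflow checking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C rendering of the routine with '-' support. */
int stoi_sketch(const char *buf)
{
    assert(buf != NULL);
    size_t i = 0;
    int negative = 0;
    if (buf[0] == '-') {          /* change 1: skip a leading '-' */
        negative = 1;
        i = 1;
    }
    int n = 0;
    while (buf[i] >= '0' && buf[i] <= '9') {
        n = n * 10 + (buf[i] - '0');
        i++;
    }
    return negative ? 0 - n : n;  /* change 2: 0 - x is -x */
}
```

In the assembly version, the flag would live in a spare register (r8 is already saved by the push), and the final negation corresponds to an rsb r0, r1, #0.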

MIPS, Number of occurrences in a string located in the stack

I have an exercise to solve in MIPS assembly (some things are clear, but I have some doubts) and I am having trouble writing its code. The exercise asks:
Write a program that, given a string read from the keyboard, counts the occurrences of the character with the highest number of occurrences and shows it.
How can I check all 26 characters and find which one has the most occurrences?
Example:
Give me a string: Hello world!
The character with the highest number of occurrences is: l
Thanks a lot in advance for any answer.
P.S.
This is the first part of my program:
#First message
li $v0, 4
la $a0, mess
syscall
#Stack space allocated
addi $sp, $sp, -257
#Read the string
move $a0, $sp
li $a1, 257
li $v0, 8
syscall
Since this is your assignment I'll leave the MIPS assembly implementation to you. I'll just show you the logic for the code in a higher-level language:
// You'd keep these variables in some MIPS registers of your choice
int c, i, count, max_count = 0;
char max_char = 0;
// Iterate over all ASCII character codes
for (c = 0; c < 128; c += 1) {
    count = 0;
    // Count the number of occurrences of this character in the string
    for (i = 0; string[i] != 0; i += 1) {
        if (string[i] == c) count++;
    }
    // Was it greater than the current max?
    if (count > max_count) {
        max_count = count;
        max_char = c;
    }
}
// max_char now holds the ASCII code of the character with the highest number
// of occurrences, and max_count holds the number of times that character was
// found in the string.
@Michael, I saw you answered before I posted, I just want to repeat that with a more detailed answer. If you edit your own to add some more explanations, then I will delete mine. I did not edit yours directly, because I was already half-way there when you posted. Anyway:
@Marco:
You can create a temporary array of 26 counters (initialized to 0).
Each counter holds the number of times its letter occurs. For example counter[0] holds the number of occurrences of letter 'a', counter[1] of letter 'b', etc...
Then iterate over each character in the input character-sequence and for each character do:
a) Obtain the index of the character in the counter array.
b) Increase counter["obtained index"] by 1.
To obtain the index of the character you can do the following:
a) First make sure the character is not a capital, i.e. only 'a' to 'z' are allowed and not 'A' to 'Z'. If it is a capital, convert it to lowercase.
b) Subtract the letter 'a' from the character. This way 'a'-'a' gives 0, 'b'-'a' gives 1, 'c'-'a' gives 2, etc...
I will demonstrate in C, because it's your exercise on MIPS (I mean the goal is to learn MIPS assembly language):
#include <stdio.h>
int main(void)
{
    //Maximum length of string:
    enum { stringMaxLength = 100 };
    //Create string in stack. Size of string is length+1 to
    //allow the '\0' character to mark the end of the string.
    char str[stringMaxLength + 1];
    //Read a line of at most stringMaxLength characters (spaces included):
    puts("Enter string:");
    if (fgets(str, sizeof str, stdin) == NULL)
        return 1; //Nothing could be read.
    //Create array of counters in stack:
    int counter[26];
    //Initialize the counters to 0:
    int i;
    for (i = 0; i < 26; ++i)
        counter[i] = 0;
    //Main counting loop:
    for (i = 0; str[i] != '\0'; ++i)
    {
        char tmp = str[i]; //Storing of str[i] in tmp, to write tmp if needed,
        //instead of writing str[i] itself. Optional operation in this particular case.
        if (tmp >= 'A' && tmp <= 'Z') //If the current character is upper:
            tmp = tmp + 32; //Convert the character to lower.
        if (tmp >= 'a' && tmp <= 'z') //If the character is a lower letter:
        {
            //Obtain the index of the letter in the array:
            int index = tmp - 'a';
            //Increment its counter by 1:
            counter[index] = counter[index] + 1;
        }
        //Else if the character is not a lower letter by now, we ignore it,
        //or we could inform the user, for example, or we could ignore the
        //whole string itself as invalid..
    }
    //Now find the maximum occurrences of a letter:
    int indexOfMaxCount = 0;
    int maxCount = counter[0];
    for (i = 1; i < 26; ++i)
        if (counter[i] > maxCount)
        {
            maxCount = counter[i];
            indexOfMaxCount = i;
        }
    //Convert the indexOfMaxCount back to the character it corresponds to:
    char maxChar = 'a' + indexOfMaxCount;
    //Inform the user of the letter with maximum occurrences:
    printf("Maximum %d occurrences for letter '%c'.\n", maxCount, maxChar);
    return 0;
}
If you don't understand why I convert the upper letter to lower by adding 32, then read on:
Each character is stored in memory as an integer value, so arithmetic on characters is really arithmetic on their corresponding numbers in the encoding table.
An encoding is just a table which matches those letters with numbers.
For example 'a' corresponds to number 97 in the ASCII table.
For example 'b' corresponds to number 98 in the ASCII table.
So 'a'+1 gives 97+1=98 which is the character 'b'. They are all numbers in memory, and the difference is how you represent (decode) them. The same table is used for both encoding and decoding, of course. In ASCII, 'A' is 65 and 'a' is 97, so every lowercase letter sits exactly 32 above its uppercase counterpart, which is why adding 32 converts upper to lower.
Examples:
printf("%c", 'a'); //Prints 'a'.
printf("%d", (int) 'a'); //Prints '97'.
printf("%c", (char) 97); //Prints 'a'.
printf("%d", 97); //Prints '97'.
printf("%d", (int) 'b'); //Prints '98'.
printf("%c", (char) (97 + 1)); //Prints 'b'.
printf("%c", (char) ( ((int) 'a') + 1 ) ); //Prints 'b'.
//Etc...
//All the casting in the above examples is just for demonstration,
//it would work without them also, in this case.

.double arrays to sse vectors and operations on them

This is my first contact with SSE. I'm trying to create two SSE vectors from .double arrays, multiply them together, and store the result back in one of the arrays. Here is the important part of the code:
.data
counts: .double 1.0,2.0,3.0,4.0
twos: .double 2.0,2.0,2.0,2.0
.text
movupd counts, %xmm7
movupd twos, %xmm6
mulpd %xmm6, %xmm7
movupd %xmm7, counts
However, those instructions seem to affect only the first two elements of a vector.
This is the result: 2.0, 4.0, 3.0, 4.0
While it should be: 2.0, 4.0, 6.0, 8.0
Could someone point out what I am doing wrong?
It would of course be much easier if I could multiply a vector by a scalar but I haven't found a proper instruction to do so anywhere.
I'm checking those values by summing them up (which is my next task once this one is solved) with this code:
mov $0, %rsi
addsd counts(%rsi), %xmm0
mov $8, %rsi
addsd counts(%rsi), %xmm0
mov $16, %rsi
addsd counts(%rsi), %xmm0
mov $24, %rsi
addsd counts(%rsi), %xmm0
Thank you for any help in advance!
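An xmm register is 128 bits wide, so it can only hold two doubles and you can only do two at a time. If you need to do all four in one go, use single-precision floats (four of which fit in an xmm register) or migrate to AVX, whose 256-bit ymm registers hold four doubles.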
Before you go too far down this road let me strongly advise you to consider using intrinsics rather than raw assembly, otherwise you're likely to be spending a lot of unnecessary additional time and effort in coding and debugging. For example, to multiply two vectors as per your example:
#include <emmintrin.h> // SSE2 intrinsics
// a, b and c are arrays of N doubles, with N even. _mm_load_pd and
// _mm_store_pd require 16-byte aligned addresses; use _mm_loadu_pd and
// _mm_storeu_pd if the arrays may be unaligned.
for (int i = 0; i < N; i += 2)
{
    __m128d v0 = _mm_load_pd(&a[i]); // v0 = { a[i], a[i+1] }
    __m128d v1 = _mm_load_pd(&b[i]); // v1 = { b[i], b[i+1] }
    __m128d v = _mm_mul_pd(v0, v1); // v = { a[i]*b[i], a[i+1]*b[i+1] }
    _mm_store_pd(&c[i], v); // store result at c[i], c[i+1]
}
and to multiply a vector by a scalar:
const __m128d vk = _mm_set1_pd(k); // init scalar = { k, k }
for (int i = 0; i < N; i += 2)
{
    __m128d v0 = _mm_load_pd(&a[i]); // v0 = { a[i], a[i+1] }
    __m128d v = _mm_mul_pd(v0, vk); // v = { a[i]*k, a[i+1]*k }
    _mm_store_pd(&c[i], v); // store result at c[i], c[i+1]
}
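Applied to the data in the question, a self-contained version might look like the following sketch. The function name scale_doubles is invented here, and it uses the unaligned load and store intrinsics so that it makes no assumption about how the array is aligned; it does assume an x86 target with SSE2 and an even element count.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: multiplies n doubles in place by k, two at a time. */
void scale_doubles(double *data, size_t n, double k)
{
    assert(data != NULL && n % 2 == 0);
    const __m128d vk = _mm_set1_pd(k);          /* { k, k } */
    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_loadu_pd(&data[i]);     /* { data[i], data[i+1] } */
        v = _mm_mul_pd(v, vk);                  /* both lanes times k */
        _mm_storeu_pd(&data[i], v);             /* write both lanes back */
    }
}
```

Calling scale_doubles(counts, 4, 2.0) on { 1.0, 2.0, 3.0, 4.0 } runs the loop twice, once per pair, which is the step missing from the assembly in the question: it handles only the first 16 bytes.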

Finding mean of array of ints

Say you have an array of int (in any language with fixed size ints). How would you calculate the int closest to their mean?
Edit: to be clear, the result does not have to be present in the array. That is, for the input array [3, 6, 7] the expected result is 5. Also I guess we need to specify a particular rounding direction, so say round down if you are equally close to two numbers.
Edit: This is not homework. I haven't had homework in five years. And this is my first time on stackoverflow, so please be nice!
Edit: The obvious approach of summing up and dividing may overflow, so I'm trying to think of an approach that is overflow safe, for both large arrays and large ints. I think handling overflow correctly (without cheating and using a different type) is by far the hardest part of this problem.
Here's a way that's fast, reasonably overflow-safe and can work when the number of elements isn't known in advance.
// The length of someListOfNumbers doesn't need to be known in advance.
int mean(SomeType someListOfNumbers) {
    double mean = 0, count = 0;
    foreach(element; someListOfNumbers) {
        count++;
        mean += (element - mean) / count;
    }
    if(count == 0) {
        throw new UserIsAnIdiotException(
            "Problem exists between keyboard and chair.");
    }
    return cast(int) floor(mean);
}
Calculate the sum by adding the numbers up, then divide by the number of them, with rounding:
mean = (int)((sum + length/2) / length);
If you are worried about overflow, you can do something like:
int mean = 0, remainder = 0
foreach n in number
    mean += n / length
    remainder += n % length
    if remainder >= length
        mean += 1
        remainder -= length
if remainder > length/2
    mean += 1
print "mean is: " mean
note that this isn't very fast.
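A C rendering of this quotient-and-remainder idea might look as follows. It is only a sketch: the name mean_no_overflow is invented, and it assumes all elements are non-negative and that the length is at most INT_MAX / 2, so that the running remainder itself cannot overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: mean of non-negative ints, rounded to nearest,
 * with exact ties rounded down, using only int arithmetic. */
int mean_no_overflow(const int *a, size_t n)
{
    assert(a != NULL && n > 0 && n <= (size_t)(INT_MAX / 2));
    int length = (int)n;
    int mean = 0, remainder = 0;
    for (int i = 0; i < length; ++i) {
        assert(a[i] >= 0);
        mean += a[i] / length;        /* whole part of this element's share */
        remainder += a[i] % length;   /* stays below 2 * length */
        if (remainder >= length) {    /* carry one into the mean */
            mean += 1;
            remainder -= length;
        }
    }
    if (remainder > length / 2)       /* more than half: round up */
        mean += 1;
    return mean;
}
```

For [3, 6, 7] the loop ends with mean 5 and remainder 1, so the result is 5; for two copies of INT_MAX it returns INT_MAX without any intermediate value exceeding the int range.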
um... how about just calculating the mean and then rounding to an integer? round(mean(thearray)) Most languages have facilities that allow you to specify the rounding method.
EDIT: So it turns out that this question is really about avoiding overflow, not about rounding. Let me be clear that I agree with those that have said (in the comments) that it's not something to worry about in practice, since it so rarely happens, and when it does you can always get away with using a larger data type.
I see that several other people have given answers that basically consist of dividing each number in the array by the count of the array, then adding them up. That is also a good approach. But just for kicks, here's an alternative (in C-ish pseudocode):
int sum_offset = 0;
for (int i = 1; i < length(array); i++)
    sum_offset += array[i] - array[0]; // offset of each element from the first
// round by your method of choice
int mean_offset = round((float)sum_offset / length(array));
int mean = mean_offset + array[0];
Or another way to do the same thing, measuring the offsets from the smallest element so that none of them is negative:
int min = INT_MAX;
for (int i = 0; i < length(array); i++) {
    if (array[i] < min) min = array[i];
}
int sum_offset = 0;
for (int i = 0; i < length(array); i++)
    sum_offset += array[i] - min;
// round by your method of choice
int mean_offset = round((float)sum_offset / length(array));
int mean = mean_offset + min;
Of course, you need to make sure sum_offset does not overflow, which can happen if the elements are spread over a range larger than INT_MAX or if many large offsets add up. In that case, divide each term before accumulating, with something like this in place of the summation and rounding lines:
float mean_offset = 0;
for (int i = 0; i < length(array); i++)
    mean_offset += (float)array[i] / length(array) - (float)min / length(array);
// round by your method of choice
int mean = round(mean_offset) + min;
Trivia: this method, or something like it, also works quite well for mentally computing the mean of an array whose elements are clustered close together.
Guaranteed not to overflow:
length ← length of list
average ← 0
for each result in the list do:
average ← average + ( result / length )
end for
This has significant problems with accuracy if you're using ints due to truncation (the average of six 4's comes out as 0)
Welcome, fish, hope your stay is a pleasant one.
The following pseudo-code shows how to do this in the case where the sum will fit within an integer type, and round rounds to the nearest integer.
In your sample, the numbers sum to 16; dividing by 3 gives you 5 1/3, which rounds to 5.
sum = 0
for i = 1 to array.size
sum = sum + array[i]
sum = sum / array.size
sum = round (sum)
This pseudocode finds the average and covers the problem of overflow:
double avg = 0
int count = 0
for x in array:
    count += 1
    avg = avg * (count - 1) / count // readjust old average
    avg += (double) x / count // add in new number
After that, you can apply your rounding code. If there is no easy way to round in your language, then something like this will work (rounds up when over .5):
double temp = avg - int(avg) // finds decimal portion
if temp <= 0.5
    avg = int(avg) // round down
else
    avg = int(avg) + 1 // round up
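The running-average update and the hand-written rounding can be put together in C roughly as below. The name running_mean_rounded is made up for the sketch. Because the average is kept in a double, the result is subject to floating-point rounding error, so a mean that is mathematically a whole number or an exact half may land slightly to either side of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: running average in a double, then rounded to the
 * nearest int, with a fractional part of exactly 0.5 rounded down. */
int running_mean_rounded(const int *a, size_t n)
{
    assert(a != NULL && n > 0);
    double avg = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double count = (double)(i + 1);
        avg = avg * (count - 1.0) / count;  /* readjust old average */
        avg += a[i] / count;                /* add in new number */
    }
    long long whole = (long long)avg;       /* truncates toward zero */
    if ((double)whole > avg)
        whole -= 1;                         /* turn truncation into floor */
    double frac = avg - (double)whole;      /* decimal portion, in [0, 1) */
    return (int)(frac <= 0.5 ? whole : whole + 1);
}
```

No intermediate value ever exceeds the largest element in magnitude, which is what keeps the sum from overflowing.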
Pseudocode for getting the average:
double mean = 0
int count = 0
foreach int number in numbers
    count++
    mean += (number - mean) / count
round(mean) // rounds .5 up
floor(mean + 0.5) // rounds .5 up
ceil(mean - 0.5) // rounds .5 down
Rounding generally involves adding 0.5, then truncating (floor), which is why 3.5 rounds up to 4. If you want 3.5 to round down to 3, do the rounding code yourself, but in reverse: subtract 0.5, then find the ceiling.
Edit: Updated requirements (no overflow)
ARM assembly. =] Untested. Won't overflow. Ever. (I hope.)
Can probably be optimized a bit. (Maybe use FP/LR?) =S Maybe THUMB will work better here.
.arm
@ r0 = pointer to array of integers
@ r1 = number of integers in array
@ returns mean in r0
mean:
stmfd sp!, {r4,r5}
mov r5, r1
mov r2, #0 @ sum_lo
mov r3, #0 @ sum_hi
cmp r1, #0 @ Check for empty array
beq .end
.loop:
ldr r4, [r0], #4
adds r2, r2, r4 @ Add to the low word and set the carry flag
adc r3, r3, r4, asr #31 @ Add the sign extension and the carry to the high word
subs r1, r1, #1 @ Next
bne .loop
.end:
div r0, r2, r3, r5 @ Your own 64-bit/32-bit divide: r0 = (r3r2) / r5
ldmfd sp!, {r4,r5} @ Restore the saved registers
bx lr
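For comparison, the same wide-accumulator idea in C is sketched below. The name mean_wide_sum is invented, and the sketch does use a larger type for the sum, just as the assembly keeps the sum in a register pair. The division truncates toward zero instead of rounding to nearest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums 32-bit values in a 64-bit accumulator, then
 * divides once. The sum cannot overflow for any array of fewer than
 * 2^32 elements. */
int32_t mean_wide_sum(const int32_t *a, size_t n)
{
    assert(a != NULL && n > 0);
    int64_t sum = 0;                    /* plays the role of r3:r2 */
    for (size_t i = 0; i < n; ++i)
        sum += a[i];                    /* sign-extended 64-bit add */
    return (int32_t)(sum / (int64_t)n); /* truncates toward zero */
}
```

The quotient always fits back into 32 bits, because the mean of a set of values cannot lie outside the range of those values.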

Resources