Finding mean of array of ints - statistics

Say you have an array of int (in any language with fixed size ints). How would you calculate the int closest to their mean?
Edit: to be clear, the result does not have to be present in the array. That is, for the input array [3, 6, 7] the expected result is 5. Also I guess we need to specify a particular rounding direction, so say round down if you are equally close to two numbers.
Edit: This is not homework. I haven't had homework in five years. And this is my first time on stackoverflow, so please be nice!
Edit: The obvious approach of summing up and dividing may overflow, so I'm trying to think of an approach that is overflow safe, for both large arrays and large ints. I think handling overflow correctly (without cheating and using a different type) is by far the hardest part of this problem.

Here's a way that's fast, reasonably overflow-safe and can work when the number of elements isn't known in advance.
// The length of someListOfNumbers doesn't need to be known in advance.
int mean(SomeType someListOfNumbers) {
double mean = 0, count = 0;
foreach(element; someListOfNumbers) {
count++;
mean += (element - mean) / count;
}
if(count == 0) {
throw new UserIsAnIdiotException(
"Problem exists between keyboard and chair.");
}
return cast(int) floor(mean);
}

Calculate the mean by adding the numbers up and dividing by the number of them, with rounding:
mean = (int)((sum + length/2) / length);
If you are worried about overflow, you can do something like:
int mean = 0, remainder = 0
foreach n in numbers
    mean += n / length
    remainder += n % length
    if remainder >= length
        mean += 1
        remainder -= length
if remainder > length/2
    mean += 1
print "mean is: " mean
note that this isn't very fast.
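The quotient/remainder bookkeeping above can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `int_mean` is made up here, and Python integers never overflow, so the sketch shows the arithmetic rather than the overflow behaviour. It relies on Python's floor division (`//` and `%` always give a non-negative remainder for a positive divisor); in C, `/` and `%` truncate toward zero, so negative inputs would need extra care there.

```python
def int_mean(numbers):
    """Integer closest to the mean; exact ties round down (toward -infinity)."""
    length = len(numbers)
    if length == 0:
        raise ValueError("mean of an empty sequence is undefined")
    quotient, remainder = 0, 0
    for n in numbers:
        quotient += n // length      # whole part contributed by n
        remainder += n % length      # leftover, always in [0, length)
        if remainder >= length:      # carry one whole unit into the quotient
            quotient += 1
            remainder -= length
    # Here the exact mean equals quotient + remainder/length, 0 <= remainder < length.
    # Round up only when the fraction is strictly greater than one half.
    if remainder > length - remainder:
        quotient += 1
    return quotient


print(int_mean([3, 6, 7]))  # the example from the question
```

Every intermediate value stays below twice the array length in magnitude (apart from the quotient, which is bounded by the largest element), which is the property that makes the approach attractive with fixed-size ints.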

um... how about just calculating the mean and then rounding to an integer? round(mean(thearray)) Most languages have facilities that allow you to specify the rounding method.
EDIT: So it turns out that this question is really about avoiding overflow, not about rounding. Let me be clear that I agree with those that have said (in the comments) that it's not something to worry about in practice, since it so rarely happens, and when it does you can always get away with using a larger data type.
I see that several other people have given answers that basically consist of dividing each number in the array by the count of the array, then adding them up. That is also a good approach. But just for kicks, here's an alternative (in C-ish pseudocode):
int sum_offset = 0;
for (int i = 1; i < length(array); i++)
sum_offset += array[i] - array[i-1];
// round by your method of choice
int mean_offset = round((float)sum_offset / length(array));
int mean = mean_offset + array[0];
Or another way to do the same thing:
int min = INT_MAX, max = INT_MIN;
for (int i = 0; i < length(array); i++) {
if (array[i] < min) min = array[i];
if (array[i] > max) max = array[i];
}
int sum_offset = max - min;
// round by your method of choice
int mean_offset = round((float)sum_offset / length(array));
int mean = mean_offset + min;
Of course, you need to make sure sum_offset does not overflow, which can happen if the difference between the largest and smallest array elements is larger than INT_MAX. In that case, replace the last four lines with something like this:
// round by your method of choice
int mean_offset = round((float)max / length(array) - (float)min / length(array));
int mean = mean_offset + min;
Trivia: this method, or something like it, also works quite well for mentally computing the mean of an array whose elements are clustered close together.

Guaranteed not to overflow:
length ← length of list
average ← 0
for each result in the list do:
average ← average + ( result / length )
end for
This has significant problems with accuracy if you're using ints due to truncation (the average of six 4's comes out as 0)

Welcome, fish, hope your stay is a pleasant one.
The following pseudo-code shows how to do this in the case where the sum will fit within an integer type, and round rounds to the nearest integer.
In your sample, the numbers sum to 16; dividing by 3 gives you 5 1/3, which rounds to 5.
sum = 0
for i = 1 to array.size
    sum = sum + array[i]
sum = sum / array.size
sum = round (sum)

This pseudocode finds the average and covers the problem of overflow:
double avg = 0
int count = 0
for x in array:
count += 1
avg = avg * (count - 1) / count // readjust old average
avg += x / count // add in new number
After that, you can apply your rounding code. If there is no easy way to round in your language, then something like this will work (rounds up when over .5):
double temp = avg - int(avg) // finds decimal portion
if temp <= 0.5
    avg = int(avg) // round down
else
    avg = int(avg) + 1 // round up

Pseudocode for getting the average:
double mean = 0
int count = 0
foreach int number in numbers
    count++
    mean += (number - mean) / count
round(mean) // rounds halves up
floor(mean + 0.5) // rounds halves up
ceil(mean - 0.5) // rounds halves down
Rounding generally involves adding 0.5, then truncating (floor), which is why 3.5 rounds up to 4. If you want 3.5 to round down to 3, do the rounding code yourself, but in reverse: subtract 0.5, then find the ceiling.
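As a small illustration of the two rounding directions described above, here is a Python sketch. The function names are hypothetical. Note that Python's built-in `round` rounds halves to the nearest even integer, so it is not used here.

```python
import math


def round_half_up(x: float) -> int:
    """Add 0.5, then take the floor: 3.5 becomes 4."""
    return math.floor(x + 0.5)


def round_half_down(x: float) -> int:
    """Subtract 0.5, then take the ceiling: 3.5 becomes 3."""
    return math.ceil(x - 0.5)


print(round_half_up(3.5), round_half_down(3.5))
```

Values that are not exactly halfway between two integers round the same way under both functions.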
Edit: Updated requirements (no overflow)

ARM assembly. =] Untested. Won't overflow. Ever. (I hope.)
Can probably be optimized a bit. (Maybe use FP/LR?) =S Maybe THUMB will work better here.
.arm
; r0 = pointer to array of integers
; r1 = number of integers in array
; returns mean in r0
mean:
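stmfd sp!, {r4,r5}
mov r5, r1
mov r2, #0 ; sum_lo
mov r3, #0 ; sum_hi
cmp r1, #0 ; Check for empty array
beq .end
.loop:
ldr r4, [r0], #4
adds r2, r2, r4 ; Add to the low word and set the carry flag
adc r3, r3, r4, asr #31 ; Add sign extension plus carry to the high word
subs r1, r1, #1 ; Next (sets flags for the branch)
bne .loop
.end:
div r0, r2, r3, r5 ; Your own 64-bit/32-bit divide: r0 = (r3r2) / r5
ldmfd sp!, {r4,r5} ; Restore saved registers
bx lr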

Related

find the number of ways you can form a string on size N, given an unlimited number of 0s and 1s

The question below was asked in an Atlassian online test. I don't have test cases; I took the question from this link:
Find the number of ways you can form a string of size N, given an unlimited number of 0s and 1s. But
you cannot have D consecutive 0s or T consecutive 1s. N, D and T are given as inputs.
Please help me with this problem; any approach on how to proceed is welcome.
My approach was simply to apply recursion, try all possibilities, and memoize the results using a hash map.
But it seems to me there must be a combinatorial approach that solves this in less time and space. For debugging purposes I am also printing the strings generated during the recursion. If there is a flaw in my approach, please tell me.
#include <bits/stdc++.h>
using namespace std;
unordered_map<string,int>dp;
int recurse(int d,int t,int n,int oldd,int oldt,string s)
{
if(d<=0)
return 0;
if(t<=0)
return 0;
cout<<s<<"\n";
if(n==0&&d>0&&t>0)
return 1;
string h=to_string(d)+" "+to_string(t)+" "+to_string(n);
if(dp.find(h)!=dp.end())
return dp[h];
int ans=0;
ans+=recurse(d-1,oldt,n-1,oldd,oldt,s+'0')+recurse(oldd,t-1,n-1,oldd,oldt,s+'1');
return dp[h]=ans;
}
int main()
{
int n,d,t;
cin>>n>>d>>t;
dp.clear();
cout<<recurse(d,t,n,d,t,"")<<"\n";
return 0;
}
You are right: instead of generating strings, it is worth considering a combinatorial approach using (a kind of) dynamic programming.
A "good" sequence of length K might end with 1..D-1 zeros or with 1..T-1 ones.
To make a good sequence of length K+1, you can add a zero to all sequences except those ending in D-1 zeros, and get 2..D-1 trailing zeros for the first kind of precursors and 1 trailing zero for the second kind.
Similarly, you can add a one to all sequences of the first kind, and to all sequences of the second kind except those ending in T-1 ones, and get 1 trailing one for the first kind of precursors and 2..T-1 trailing ones for the second kind.
Make two tables
Zeros[N][D] and Ones[N][T]
Fill the first row with zero counts, except for Zeros[1][1] = 1, Ones[1][1] = 1
Fill row by row using the rules above.
Zeros[K][1] = Sum(Ones[K-1][C=1..T-1])
for C in 2..D-1:
    Zeros[K][C] = Zeros[K-1][C-1]
Ones[K][1] = Sum(Zeros[K-1][C=1..D-1])
for C in 2..T-1:
    Ones[K][C] = Ones[K-1][C-1]
Result is sum of the last row in both tables.
Also note that you really need only two active rows of the table, so you can optimize size to Zeros[2][D] after debugging.
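A minimal Python sketch of the two-table recurrence above, keeping only the current row of each table as suggested. The function name `count_strings` is an assumption for illustration, and the empty string is counted as one valid string for N = 0.

```python
def count_strings(n: int, d: int, t: int) -> int:
    """Count binary strings of length n with no run of d zeros or t ones."""
    if n == 0:
        return 1
    # zeros[c]: strings of the current length ending in exactly c zeros (1 <= c <= d-1)
    # ones[c]:  strings of the current length ending in exactly c ones  (1 <= c <= t-1)
    zeros = [0] * (d + 1)
    ones = [0] * (t + 1)
    if d > 1:
        zeros[1] = 1  # the string "0"
    if t > 1:
        ones[1] = 1   # the string "1"
    for _ in range(n - 1):
        new_zeros = [0] * (d + 1)
        new_ones = [0] * (t + 1)
        if d > 1:
            new_zeros[1] = sum(ones[1:t])    # append 0 after any run of ones
        for c in range(2, d):
            new_zeros[c] = zeros[c - 1]      # extend a run of zeros
        if t > 1:
            new_ones[1] = sum(zeros[1:d])    # append 1 after any run of zeros
        for c in range(2, t):
            new_ones[c] = ones[c - 1]        # extend a run of ones
        zeros, ones = new_zeros, new_ones
    return sum(zeros[1:d]) + sum(ones[1:t])


print(count_strings(4, 2, 3))
```

For N = 4, D = 2, T = 3 the valid strings are 0101, 0110, 1010, 1011 and 1101. The running time is O(N x (D + T)) and the memory use is O(D + T).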
This can be solved using dynamic programming. I'll give a recursive (memoized) solution. It is similar to generating a binary string.
The states are:
i: the index of the character that we need to insert into the string.
cnt: the number of consecutive identical characters immediately before i.
bit: the character that was repeated cnt times before i. The value of bit is either 0 or 1.
Base case: return 1 when i reaches n, since we start from 0 and end at n-1.
Define the size of the dp array accordingly. The time complexity is O(2 x N x max(D,T)).
#include<bits/stdc++.h>
using namespace std;
int dp[1000][1000][2];
int n, d, t;
int count(int i, int cnt, int bit) {
if (i == n) {
return 1;
}
int &ans = dp[i][cnt][bit];
if (ans != -1) return ans;
ans = 0;
if (bit == 0) {
ans += count(i+1, 1, 1);
if (cnt != d - 1) {
ans += count(i+1, cnt + 1, 0);
}
} else {
// bit == 1
ans += count(i+1, 1, 0);
if (cnt != t-1) {
ans += count(i+1, cnt + 1, 1);
}
}
return ans;
}
signed main() {
ios_base::sync_with_stdio(false), cin.tie(nullptr);
cin >> n >> d >> t;
memset(dp, -1, sizeof dp);
cout << count(0, 0, 0);
return 0;
}

How do I return the smallest value using a for loop?

I am given a limit, and I have to return the smallest value for n to make it true: 1+2+3+4+...+n >= limit. I feel like there's one thing missing, but I can't tell.
public int whenToReachLimit(int limit) {
int sum = 0;
for (int i = 1; sum < limit; i++) {
sum = sum + i;
}
return sum;
}
The expected output (limit : n) would be:
1 : 1
4 : 3
10 : 4
You can avoid the loop by using the closed form for the sum of the first n integers:
$$1 + 2 + \dots + n = \frac{n(n+1)}{2}$$
Thus the inequality becomes:
$$\frac{n(n+1)}{2} \ge \text{limit}$$
Notice that the left-hand side is non-negative (if n is negative, the sum is empty) and strictly increasing in n. Notice also that you are looking for the first integer satisfying the inequality. The idea is first to replace the inequality by an equality, which allows us to solve the equation for n. In a second step, the possibly non-integer solution is rounded up to the nearest integer.
Solving the equation $n^2 + n - 2\,\text{limit} = 0$ for n gives two solutions. The negative one can be discarded (remember n is positive). That is:
$$n = \frac{-1 + \sqrt{1 + 8\,\text{limit}}}{2}$$
Finally, round this solution up to the nearest integer, which is the smallest integer that satisfies the inequality:
$$n = \left\lceil \frac{-1 + \sqrt{1 + 8\,\text{limit}}}{2} \right\rceil$$
NB: this may be overkill for small inputs.
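A hedged Python sketch of the closed-form idea: the function name mirrors the question's `whenToReachLimit`, and `math.isqrt` (integer square root) is used in place of a floating-point square root so that large limits do not suffer from rounding error. Returning 0 for a non-positive limit is an assumption; the question does not say what should happen there.

```python
import math


def when_to_reach_limit(limit: int) -> int:
    """Smallest n >= 0 such that 1 + 2 + ... + n >= limit."""
    if limit <= 0:
        return 0  # assumption: the empty sum already reaches the limit
    # Largest n with n(n+1)/2 <= limit, from n <= (sqrt(8*limit + 1) - 1) / 2.
    n = (math.isqrt(8 * limit + 1) - 1) // 2
    # If that sum falls short of the limit, one more term is needed.
    if n * (n + 1) // 2 < limit:
        n += 1
    return n


for limit in (1, 4, 10):
    print(limit, ":", when_to_reach_limit(limit))
```

The loop at the end reproduces the three limit/n pairs listed in the question.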
I'm not sure I know exactly what you want to do, but I would recommend doing a "practice run" of your function:
If limit = 0 the function returns 0
If limit = 1 the function returns 1
If limit = 2 the function returns 3
If limit = 3 the function returns 3
If limit = 4 the function returns 6
If limit = 5 the function returns 6
Now you can decide for yourself whether the function does what you expect.
I've found the answer. Turns out it doesn't work with a for loop which I find odd. But this is the answer to my own question.
public int whenToReachLimit(int limit) {
int n = 0;
int sum = 0;
while (sum < limit) {
sum += n;
n++;
}
return n-1;
}
You don't want to return sum, you want to return n (smallest possible value satisfying the given requirement).
return i-1 instead of sum.

Find the number of subsequences of a n-digit number, that are divisible by 8

Given a number n with 1 to 10^5 digits, stored as a string in decimal format.
Example: If n = 968, then out of all subsequences i.e 9, 6, 8, 96, 68, 98, 968 there are 3 sub-sequences of it, i.e 968, 96 and 8, that are divisible by 8. So, the answer is 3.
Since the answer can be very large, print the answer modulo (10^9 + 7).
You can use dynamic programming. Let f(len, sum) be the number of subsequences of the prefix of length len such that their sum is sum modulo 8 (sum ranges from 0 to 7).
The value of f for len = 1 is obvious. The transitions go as follows:
We can start a new subsequence in the new position: f(len, a[i] % 8) += 1.
We can continue any subsequence from the shorter prefix:
for old_sum = 0..7
f(len, (old_sum * 10 + a[i]) % 8) += f(len - 1, old_sum) // take the new element
f(len, old_sum) += f(len - 1, old_sum) // ignore the new element
Of course, you can perform all computations modulo 10^9 + 7 and use a standard integer type.
The answer is f(n, 0) (all elements are taken into account and the sum modulo 8 is 0).
The time complexity of this solution is O(n) (as there are O(n) states and 2 transitions from each of them).
Note: if the numbers can't have leading zeros, you can just add one more parameter to the state: a flag that indicates whether the first element of the subsequence is zero (such sequences should never be extended). The rest of the solution stays the same.
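The transitions above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the function name is hypothetical, only the current row of f is kept, and leading zeros are not treated specially (the extra flag from the note is left out).

```python
MOD = 10**9 + 7


def count_div8_subsequences(digits: str) -> int:
    """Number of non-empty subsequences whose value is divisible by 8, mod 1e9+7."""
    f = [0] * 8  # f[r]: subsequences of the prefix seen so far with value r modulo 8
    for ch in digits:
        d = int(ch)
        g = f[:]  # ignore the new digit: every old subsequence survives unchanged
        for r in range(8):
            nr = (r * 10 + d) % 8
            g[nr] = (g[nr] + f[r]) % MOD  # append the new digit to an old subsequence
        g[d % 8] = (g[d % 8] + 1) % MOD   # start a new subsequence at this digit
        f = g
    return f[0]


print(count_div8_subsequences("968"))  # the example from the question
```

For "968" the subsequences counted are 8, 96 and 968.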
Note: This answer assumes you mean contiguous subsequences.
The divisibility rule for a number to be divisible by 8 is if the last three digits of the number are divisible by 8. Using this, a simple O(n) algorithm can be obtained where n is the number of digits in the number.
Let N=a_0a_1...a_(n-1) be the decimal representation of N with n digits.
Let the number of sequences so far be s = 0
For each set of three consecutive digits a_i a_(i+1) a_(i+2), check if the three-digit number is divisible by 8. If so, add i + 1 to the number of sequences, i.e., s = s + i + 1. This is because all strings a_k..a_(i+2) will be divisible by 8 for k ranging from 0..i.
Loop i from 0 to n-3 and continue.
So, if you have 1424968, the subsequences divisible are at:
i=1 (424 yielding i+1 = 2 numbers: 424 and 1424)
i=3 (496 yielding i+1 = 4 numbers: 496, 2496, 42496, 142496)
i=4 (968 yielding i+1 = 5 numbers: 968, 4968, 24968, 424968, 1424968)
Note that some small modifications will be needed to consider numbers lesser than three digits in length.
Hence the total number of sequences = 2 + 4 + 5 = 11. Total complexity = O(n) where n is the number of digits.
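A Python sketch of the last-three-digits rule for contiguous substrings, including the "small modification" for substrings of one or two digits. The function name is hypothetical, and substrings with leading zeros are counted like any other, which is an assumption.

```python
def count_div8_substrings(digits: str) -> int:
    """Number of contiguous substrings whose value is divisible by 8."""
    total = 0
    for end in range(len(digits)):
        # One-digit substring ending here.
        if int(digits[end]) % 8 == 0:
            total += 1
        # Two-digit substring ending here.
        if end >= 1 and int(digits[end - 1:end + 1]) % 8 == 0:
            total += 1
        # Every substring of length >= 3 ending here shares the same last three
        # digits; there are end - 1 of them (start positions 0 .. end - 2).
        if end >= 2 and int(digits[end - 2:end + 1]) % 8 == 0:
            total += end - 1
    return total


print(count_div8_substrings("1424968"))
```

For 1424968 this counts the 11 substrings of three or more digits listed above plus the shorter ones 8, 24 and 96.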
One can use the fact that for any three-digit number abc the following holds:
abc % 8 = ((ab % 8) * 10 + c) % 8
Or in other words: the test for a number with a fixed start-index can be cascaded:
int div8(String s){
int total = 0, mod = 0;
for(int i = 0; i < s.length(); i++)
{
mod = (mod * 10 + s.charAt(i) - '0') % 8;
if(mod == 0)
total++;
}
return total;
}
But we don't have fixed start-indices!
Well, that's pretty easy to fix:
Suppose two sequences a and b, such that int(a) % 8 = int(b) % 8 and b is a suffix of a. No matter how the sequence continues, the remainders of a and b modulo 8 will always stay equal. Thus it's sufficient to keep track of the number of sequences that share the same value modulo 8.
final int RESULTMOD = 1000000000 + 7;
int div8(String s){
int total = 0;
//modtable[i] is the number of subsequences with int(sequence) % 8 = i
int[] modTable = new int[8];
for(int i = 0; i < s.length(); i++){
int[] nextTable = new int[8];
//transform table from last loop-run (shared modulo)
for(int j = 0; j < 8; j++){
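// several values of j can map to the same bucket, so accumulate instead of overwriting
int k = (j * 10 + s.charAt(i) - '0') % 8;
nextTable[k] = (nextTable[k] + modTable[j]) % RESULTMOD;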
}
//add the sequence that starts at this index to the appropriate bucket
nextTable[(s.charAt(i) - '0') % 8]++;
//add the count of all sequences with int(sequence) % 8 = 0 to the result
total += nextTable[0];
total %= RESULTMOD;
//table for next run
modTable = nextTable;
}
return total;
}
Runtime is O(n).
There are 10 possible states a subsequence can be in. The first is empty. The second is that there was a leading 0. And the other 8 are an ongoing number that is 0-7 mod 8. You start at the beginning of the string with 1 way of being empty, and no way to be anything else. At the end of the string your answer is the number of ways to have a leading 0 plus the number of ways to have an ongoing number that is 0 mod 8.
The transition table should be obvious. The rest is just normal dynamic programming.

Convert binary ( integer and fraction) from VHDL to decimal, negative value in C code

I have 14-bit data that is fed from an FPGA (written in VHDL). The Nios II processor reads the 14-bit data from the FPGA and does some processing on it; the Nios II system is programmed in C.
The 14-bit data can be positive, zero or negative. In the Altera compiler, I can only define the data to be 8, 16 or 32 bits wide, so I define it as 16-bit data.
First, I need to check whether the data is negative. If it is, I need to pad the two MSBs with '1' so that the system treats it as a negative value instead of a positive one.
Second, I need to convert this binary representation into a decimal value with BOTH integer and fractional parts.
I learned from this link (Correct algorithm to convert binary floating point "1101.11" into decimal (13.75)?) that I can convert a binary number (with both integer and fractional parts) to a decimal value.
Specifically, I am able to use the following code, quoted from that link:
#include <stdio.h>
#include <math.h>
double convert(const char binary[]){
int bi,i;
int len = 0;
int dot = -1;
double result = 0;
for(bi = 0; binary[bi] != '\0'; bi++){
if(binary[bi] == '.'){
dot = bi;
}
len++;
}
if(dot == -1)
dot=len;
for(i = dot; i >= 0 ; i--){
if (binary[i] == '1'){
result += (double) pow(2,(dot-i-1));
}
}
for(i=dot; binary[i] != '\0'; i++){
if (binary[i] == '1'){
result += 1.0/(double) pow(2.0,(double)(i-dot));
}
}
return result;
}
int main()
{
char bin[] = "1101.11";
char bin1[] = "1101";
char bin2[] = "1101.";
char bin3[] = ".11";
printf("%s -> %f\n",bin, convert(bin));
printf("%s -> %f\n",bin1, convert(bin1));
printf("%s -> %f\n",bin2, convert(bin2));
printf("%s -> %f\n",bin3, convert(bin3));
return 0;
}
I am wondering if this code can be used to check for negative value? I did try with a binary string of 11111101.11 and it gives the output of 253.75...
I have two questions:
What are the modifications I need to do in order to read a negative value?
I know that I can do the bit shift (as below) to check if the msb is 1, if it is 1, I know it is negative value...
if (14bit_data & 0x2000) //if true, it is negative value
The issue is, since it involves fraction part (but not only integer), it confused me a bit if the method still works...
If the binary number is originally not in string format, is there any way I could convert it to string? The binary number is originally fed from a fpga block written in VHDL say, 14 bits, with msb as the sign bit, the following 6 bits are the magnitude for integer and the last 6 bits are the magnitude for fractional part. I need the decimal value in C code for Altera Nios II processor.
OK, so I'm focusing on the fact that you want to reuse the algorithm you mention at the beginning of your question, and I assume that the binary representation of your signed number is two's complement. Going by your comments, though, I'm not sure that your input has the same form as the one that algorithm expects.
First, pad the 2 MSBs to get a 16-bit representation:
16bit_data = (14_bit_data & 0x2000) ? ( 14_bit_data | 0xC000) : 14_bit_data ;
If the value is positive it remains unchanged; if it is negative, this gives the correct two's complement representation on 16 bits.
For the fractional part, everything is the same as in the algorithm you mentioned in your question.
For the integer part, everything is the same except the treatment of the MSB.
For an unsigned number the MSB (i.e. bit[15]) represents pow(2,15-6) (6 is the width of the fractional part), whereas for a signed number in two's complement representation it represents -pow(2,15-6), meaning that the algorithm becomes
/* integer part operation */
while(p >= 1)
{
rem = (int)fmod(p, 10);
p = (int)(p / 10);
dec = dec + rem * pow(2, t) * (9 != t ? 1 : -1);
++t;
}
or, said differently, if you don't want the * operator:
/* integer part operation */
while(p >= 1)
{
rem = (int)fmod(p, 10);
p = (int)(p / 10);
if( 9 != t)
{
dec = dec + rem * pow(2, t);
}
else
{
dec = dec - rem * pow(2, t);
}
++t;
}
For the second algorithm that you mention, considering your format, if dot == 11 and i == 0 we are at the MSB (10 integer bits followed by the dot), so the code becomes
for(i = dot - 1; i >= 0 ; i--)
{
if (binary[i] == '1')
{
if(11 != dot || i)
{
result += (double) pow(2,(dot-i-1));
}
else
{
// result -= (double) pow(2,(dot-i-1));
// Due to your number format i == 0 and dot == 11 so
result -= 512;
}
}
}
WARNING: in brice's algorithm the input is a character string like "11011.101", whereas according to your description you have an integer input, so I'm not sure that this algorithm is suited to your case.
I think this should work:
float convert14BitsToFloat(int16_t in)
{
/* Sign-extend in, since it is 14 bits */
if (in & 0x2000) in |= 0xC000;
/* convert to float with 6 fractional bits (64 = 2^6) */
return (float)in / 64.0f;
}
To convert any number to a string, I would use sprintf. Be aware that it may significantly increase the size of your application. If you don't need the float and want to keep the application small, you should write your own conversion function.
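To make the sign extension and scaling above easy to check on a desktop machine, here is a small Python sketch of the same conversion. It assumes, as the C function does, that the 14-bit word is two's complement with 6 fractional bits; the function name is hypothetical.

```python
def convert_14bit_fixed(raw: int) -> float:
    """Interpret the low 14 bits of raw as two's complement with 6 fractional bits."""
    raw &= 0x3FFF            # keep only the 14 data bits
    if raw & 0x2000:         # sign bit (bit 13) set: the value is negative
        raw -= 0x4000        # subtract 2^14 to get the signed integer
    return raw / 64.0        # 64 = 2^6, one unit per fractional step


# 1101.110000 in binary is 0x370 as a 14-bit word.
print(convert_14bit_fixed(0x370))
print(convert_14bit_fixed(0x3FFF))
```

The first call reproduces the 13.75 example from the linked question, and the second shows the smallest negative step, -1/64.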

Minimum no. of comparisons to find median of 3 numbers

I was implementing quicksort and I wished to set the pivot to be the median of three numbers. The three numbers being the first element, the middle element, and the last element.
Could I possibly find the median in fewer comparisons?
int median(int a[], int p, int r)
{
int m = (p+r)/2;
if(a[p] < a[m])
{
if(a[p] >= a[r])
return a[p];
else if(a[m] < a[r])
return a[m];
}
else
{
if(a[p] < a[r])
return a[p];
else if(a[m] >= a[r])
return a[m];
}
return a[r];
}
If the concern is only comparisons, then this should be used.
int getMedian(int a, int b , int c) {
int x = a-b;
int y = b-c;
int z = a-c;
if(x*y > 0) return b;
if(x*z > 0) return c;
return a;
}
int32_t FindMedian(const int n1, const int n2, const int n3) {
auto _min = min(n1, min(n2, n3));
auto _max = max(n1, max(n2, n3));
return (n1 + n2 + n3) - _min - _max;
}
You can't do it in one, and you're only using two or three, so I'd say you've got the minimum number of comparisons already.
Rather than just computing the median, you might as well put them in place. Then you can get away with just 3 comparisons all the time, and you've got your pivot closer to being in place.
T median(T a[], int low, int high)
{
int middle = ( low + high ) / 2;
if( a[ middle ].compareTo( a[ low ] ) < 0 )
swap( a, low, middle );
if( a[ high ].compareTo( a[ low ] ) < 0 )
swap( a, low, high );
if( a[ high ].compareTo( a[ middle ] ) < 0 )
swap( a, middle, high );
return a[middle];
}
I know that this is an old thread, but I had to solve exactly this problem on a microcontroller that has very little RAM and does not have a h/w multiplication unit (:)). In the end I found the following works well:
static char medianIndex[] = { 1, 1, 2, 0, 0, 2, 1, 1 };
signed short getMedian(const signed short num[])
{
return num[medianIndex[(num[0] > num[1]) << 2 | (num[1] > num[2]) << 1 | (num[0] > num[2])]];
}
If you're not afraid to get your hands a little dirty with compiler intrinsics you can do it with exactly 0 branches.
The same question was discussed before on:
Fastest way of finding the middle value of a triple?
Though, I have to add that in the context of a naive implementation of quicksort, with a lot of elements, reducing the number of branches when finding the median is not so important, because the branch predictor will choke either way when you start tossing elements around the pivot. More sophisticated implementations (which don't branch on the partition operation, and avoid WAW hazards) will benefit from this greatly.
Remove the max and min values from the total sum:
int med3(int a, int b, int c)
{
int tot_v = a + b + c ;
int max_v = max(a, max(b, c));
int min_v = min(a, min(b, c));
return tot_v - max_v - min_v;
}
There is actually a clever way to isolate the median element from three using a careful analysis of the 6 possible permutations (of low, median, high). In python:
def med(a, start, mid, last):
    # put the median of a[start], a[mid], a[last] in the a[start] position
    SM = a[start] < a[mid]
    SL = a[start] < a[last]
    if SM != SL:
        return
    ML = a[mid] < a[last]
    m = mid if SM == ML else last
    a[start], a[m] = a[m], a[start]
A third of the time (when a[start] is already the median) you need two comparisons; otherwise you need 3 (avg about 2.67, assuming all orderings are equally likely). And you only swap the median element once, when needed (2/3 of the time).
Full python quicksort using this at:
https://github.com/mckoss/labs/blob/master/qs.py
You can write up all the permutations:
1 0 2
1 2 0
0 1 2
2 1 0
0 2 1
2 0 1
Then we want to find the position of the 1. We could do this with two comparisons, if our first comparison could split out a group of equal positions, such as the first two lines.
The issue seems to be that the first two lines are different on any comparison we have available: a<b, a<c, b<c. Hence we have to fully identify the permutation, which requires 3 comparisons in the worst case.
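The permutation argument can be checked with a small Python sketch. The function below is a hypothetical illustration of a decision tree that never uses more than three comparisons and gets away with two for two of the six orderings.

```python
def median3(a, b, c):
    """Median of three values using at most three comparisons."""
    if a < b:
        if b < c:
            return b                  # a < b < c: two comparisons
        return c if a < c else a      # c <= b: the median is the larger of a and c
    if a < c:
        return a                      # b <= a < c: two comparisons
    return c if b < c else b          # a is the largest: the median is max(b, c)


print(median3(1, 0, 2), median3(2, 0, 1), median3(0, 2, 1))
```

Each call above passes one of the listed permutations, and each returns the middle value 1.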
Using a Bitwise XOR operator, the median of three numbers can be found.
def median(a,b,c):
    m = max(a,b,c)
    n = min(a,b,c)
    ans = m^n^a^b^c
    return ans
