How to count a binary sequence in a binary number in Python? - python-3.x

I would like to count occurrences of the '01' sequence in 5760 binary bits.
First, I would like to combine several binary numbers, then count the number of '01' occurrences.
For example, I have a 64-bit integer, say 6291456. I convert it into binary; the most significant 4 bits are not used, so I get the 60-bit binary 000...000011000000000000000000000
Then I need to combine (just put the bits together, since I only need to count '01') the first 60 bits + the second 60 bits + ..., so 96 of these 60-bit chunks are stitched together.
Finally, I want to count how many times '01' appears.
s = binToString(5760 binary bits)
cnt = s.count('01');

num = 6291226
binary = format(num, 'b')
print(binary)
print(binary.count('01'))
If I use the number given by you, i.e. 6291456, its binary representation is 11000000000000000000000, which gives 0 occurrences of '01'.
If you always want your number to be 60 bits in length, you can use
binary = format(num,'060b')
It will add leading zeros to pad it to the given length.
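For instance, padding the 6291456 example to 60 bits (a quick sketch):
num = 6291456
padded = format(num, '060b')
print(len(padded))         # 60
print(padded.count('01'))  # 1 -- the zero padding in front of the leading 1 creates one '01'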

Say that nums is your list of 96 numbers, each of which can be stored in 64 bits. Since you want to throw away the 4 most significant bits, you are really taking each number modulo 2**60. Thus, to count the number of '01' occurrences in the resulting string, using @ShrikantShete's idea of using the format function, you can do it all in one line:
''.join(format(n%2**60,'060b') for n in nums).count('01')
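As a quick sanity check with a hypothetical short list (three small numbers instead of the real 96):
nums = [1, 5, 3]   # binary 1, 101 and 11, each padded to 60 bits
bits = ''.join(format(n % 2**60, '060b') for n in nums)
print(len(bits))         # 180
print(bits.count('01'))  # 4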

Related

Why does this hash calculating bit hack work?

For practice I've implemented the QOI specification in Rust. In it there is a small hash function to store recently used pixels:
index_position = (r * 3 + g * 5 + b * 7 + a * 11) % 64
where r, g, b, and a are the red, green, blue and alpha channels respectively.
I assume this works as a hash because it creates a unique prime factorization for the numbers with the mod to limit the number of bytes. Anyways I implemented it naively in my code.
While looking at other implementations I came across this bit hack to optimize the hash calculation:
fn hash(rgba: [u8; 4]) -> u8 {
    let v = u32::from_ne_bytes(rgba);
    let s = (((v as u64) << 32) | (v as u64)) & 0xFF00FF0000FF00FF;
    s.wrapping_mul(0x030007000005000Bu64.to_le()).swap_bytes() as u8 & 63
}
I think I understand most of what's going on but I'm confused about the magic number (the multiplicand). To my understanding it should be flipped. As a step by step example:
let rgba = [0x12, 0x34, 0x56, 0x78].
On my machine (little endian) this gives v the value 0x78563412.
The bit shifting spreads the values, giving s = 0x7800340000560012.
Now here's where I get confused. The magic number has the values that should be multiplied aligned in a 64 bit field (3, 5, 7, 11), spaced the same way that the original values are. However they seem to be in reverse order from the values:
0x7800340000560012
0x030007000005000B
When multiplying it would seem that the highest value, the alpha channel (0x78), is being multiplied by 3, while the lowest value, the red channel (0x12), is being multiplied by 11. I'm also not entirely sure why this multiplication works anyway, after multiplying the values by various powers of 2.
I understand that the bytes are then swapped to big endian and trimmed, but that's not until after the multiplication step which loses me.
I know that the code produces the correct hash, but I don't understand why that's the case. Can anyone explain to me what I'm missing?
If you think about the way the math works, you want this flipped order, because it means all the results from each of the "logical" multiplications cluster in the same byte. The highest byte in the first value multiplied by the lowest byte in the second produces a result in the highest byte. The lowest byte in the first value's product with the highest byte in the second value produces a result in the same highest byte, and the same goes for the intermediate bytes.
Yes, the 0x78... and 0x03... are also multiplied by each other, but they overflow way past the top of the value and are lost. Having the order "backwards" means the results of the multiplications we care about all end up summed in the uppermost byte (the total shift of the results we want is always 56 bits, because the byte at bit offset 56 is multiplied by the one at offset 0, the one at 40 by the one at 16, the one at 16 by the one at 40, and the one at 0 by the one at 56), with the rest of the multiplications we don't want having their results either overflow (and be lost) or land in lower bytes (which we ignore). If you flipped the bytes in the second value, the 0x78 * 0x0B (alpha value & multiplier) component would be lost to overflow, while the 0x12 * 0x03 (red value & multiplier) component wouldn't reach the target byte (every component we cared about would end up somewhere other than the uppermost byte).
For a possibly more intuitive example, imagine doing the same work, but where all the bytes of one input except a single component are zero. If you multiply:
0x7800000000000000 * 0x030007000005000B
the logical result is:
0x1680348000258052800000000000000
but removing the overflow reduces that to:
0x2800000000000000
//^^ result we care about (actual product of 0x78 and 0x0B is 0x528, but only keeping low byte)
Similarly,
0x0000340000000000 * 0x030007000005000B
produces:
0x9c016c000104023c0000000000
overflowing to:
0x04023c0000000000
//^^ result we care about (actual product of 0x34 and 0x5 was 0x104, but only 04 kept)
In that case, the other multiplications did leave data in the result (not all of them overflowed), but since we only look at the high byte, the rest gets ignored.
If you keep doing this math step by step and adding the results, you'll find that the high byte ends up holding the sum of the four individual multiplications you expected (mod 256); flip the order, and it won't work out that way.
The advantage to putting all the results in that high byte is that it allows you to use swap_bytes to move it cheaply to the low byte, and read the value directly (no need to even mask it on many architectures).
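If it helps, here is a small sketch (in Python rather than Rust, purely for illustration; the helper names scalar_hash and swar_hash are made up) that redoes the same multiplication with plain integers and checks that the top byte of the 64-bit product matches the scalar hash:
def scalar_hash(r, g, b, a):
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64

def swar_hash(r, g, b, a):
    # u32::from_ne_bytes([r, g, b, a]) on a little-endian machine
    v = r | (g << 8) | (b << 16) | (a << 24)
    # spread the channels out: a lands in byte 7, g in byte 5, b in byte 2, r in byte 0
    s = ((v << 32) | v) & 0xFF00FF0000FF00FF
    # wrapping_mul: keep only the low 64 bits of the product
    prod = (s * 0x030007000005000B) & 0xFFFFFFFFFFFFFFFF
    # swap_bytes() as u8 & 63 reads the top byte and keeps the low 6 bits
    return (prod >> 56) & 63

for rgba in [(0x12, 0x34, 0x56, 0x78), (255, 255, 255, 255), (1, 2, 3, 4)]:
    assert scalar_hash(*rgba) == swar_hash(*rgba)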

How do Base64 encode and decode figure out that the last few zeros are mere padding?

https://learn.microsoft.com/en-us/dotnet/api/system.convert.tobase64string?view=net-5.0
It says
If an integral number of 3-byte groups does not exist, the remaining bytes are effectively padded with zeros to form a complete group. In this example, the value of the last byte is hexadecimal FF. The first 6 bits are equal to decimal 63, which corresponds to the base-64 digit "/" at the end of the output, and the next 2 bits are padded with zeros to yield decimal 48, which corresponds to the base-64 digit, "w". The last two 6-bit values are padding and correspond to the valueless padding character, "=".
Now,
Imagine that the byte array I send is
0
So, only one byte, namely 0
That one byte will be padded into 000, right?
So now, we will have something like 0=== as the encoding because it takes 4 characters in base 64 encoding to encode 3 bytes.
Now, we're going to decode that.
How do we know that the original byte isn't 00, or 000, but just 0?
I must be missing something here.
So now, we will have something like 0=== as the encoding
Three padding characters would be illegal; that would mean only 6 bits plus padding.
And 0 as a byte value is A in Base64, so it would be AA==.
So the first A holds the first 6 bits of the 0 byte, the second A contributes the 2 remaining 0 bits of your byte, and then there are just 4 zero bits plus the padding left, which is not enough for a second byte.
How do we know that the original byte isn't 00, or 000, but just 0?
AA== has only 12 bits (6 bits per character) so it can only encode 1 Byte => 0
AAA= has 18 bits, enough for 2 bytes => 00
AAAA has 24 bits = 3 bytes => 000
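You can see the same thing quickly (a sketch in Python rather than .NET, but the Base64 rules are identical):
import base64
print(base64.b64encode(b'\x00'))          # b'AA=='
print(base64.b64encode(b'\x00\x00'))      # b'AAA='
print(base64.b64encode(b'\x00\x00\x00'))  # b'AAAA'
print(base64.b64decode('AA=='))           # b'\x00' -- the padding tells the decoder only one byte was encoded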

Understanding the maths

I am trying to understand the maths in this code that converts binary to decimal. I was wondering if anyone could break it down so that I can see the working of a conversion. Sorry if this is too newb, but I've been searching for an explanation for hours and can't find one that explains it sufficiently.
I know the conversion is decimal*2 + int(digit), but I still can't break it down to understand exactly how it converts to decimal.
binary = input('enter a number: ')
decimal = 0
for digit in binary:
    decimal = decimal*2 + int(digit)
print(decimal)
Here's an example with the small binary number 10 (which is 2 in decimal):
binary = '10'
decimal = 0
for digit in binary:
    decimal = decimal*2 + int(digit)
The for loop will take the 1 from the binary number, which is in the first place.
digit = 1 for the 1st iteration.
It will overwrite the value of decimal which is initially 0.
decimal = 0*2 + 1 = 1
For the 2nd iteration digit= 0.
It will again calculate the value of decimal like below:
decimal = 1*2 + 0 = 2
So your decimal number is 2.
You can refer to this for binary-to-decimal conversion.
The for loop and syntax are hiding a larger pattern. First, consider the same base-10 numbers we use in everyday life. One way of representing the number 237 is 200 + 30 + 7. Breaking it down further, we get 2*10^2 + 3*10^1 + 7*10^0 (note that ** is the exponent operator in Python, but ^ is used nearly everywhere else in the world).
There's this pattern of exponents and coefficients with respect to the base 10. The exponents are 2, 1, and 0 for our example, and we can represent fractions with negative exponents. The coefficients 2, 3, and 7 are the same as from the number 237 that we started with.
It winds up being the case that you can do this uniquely for any base. I.e., every real number has a unique representation in base 10, base 2, and any other base you want to work in. In base 2, the exact same pattern emerges, but all the 10s are replaced with 2s. E.g., in binary consider 101. This is the same as 1*2^2 + 0*2^1 + 1*2^0, or just 5 in base-10.
What the algorithm you have does is make that a little more efficient. It's pretty wasteful to compute 2^20, 2^19, 2^18, and so on when you're basically doing the same operations in each of those cases. With our same binary example of 101, they've re-written it as (1*2+0)*2+1. Notice that if you distribute the second 2 into the parentheses, you get the same representation we started with.
What if we had a larger binary number, say 11001? Well, the same trick still works: (((1*2+1)*2+0)*2+0)*2+1.
With that last example, what is your algorithm doing? It's first computing (1*2+1). On the next loop, it takes that number, multiplies it by 2 and adds the next digit to get ((1*2+1)*2+0), and so on. After just two more iterations your entire decimal number has been computed.
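Here is a tiny trace of the loop from the question on that same example, just to make the pattern visible:
binary = "11001"
decimal = 0
for digit in binary:
    decimal = decimal * 2 + int(digit)
    print(digit, decimal)   # prints: 1 1, then 1 3, then 0 6, then 0 12, then 1 25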
Effectively, what this is doing is taking each binary digit and multiplying it by 2^n, where n is the place of that digit, and then summing them up. The confusion comes from this being done almost in reverse. Let's step through an example:
binary = "11100"
So first it takes the digit '1' and adds it on to 0 * 2 = 0, so we have digit = '1'.
Next take the second digit '1' and add it to 1*2 = 2, giving digit = '1' + '1'*2.
Same again, with digit = '1' + '1'*2 + '1'*2^2.
Then the 2 zeros add nothing, but double the result twice, so finally digit = '0' + '0'*2 + '1'*2^2 + '1'*2^3 + '1'*2^4 = 28.
(I've left quotes around digits to show where they are.)
As you can see, the end result in this format is a pretty simple binary to decimal conversion.
I hope this helped you understand a bit :)
I will try to explain the logic:
Consider a binary number 11001010. When looping in Python, the first digit 1 comes in first, and so on.
To convert it to decimal, we multiply the first digit by 2^7, and so on, down to the last digit multiplied by 2^0.
And then we add (sum) them.
Here we are adding each digit as it is taken, and then multiplying by 2 until the end of the loop. For example, 1*(2^7) is performed here as decimal = 0(decimal) + 1, which is then multiplied by 2 seven times. When the next digit (1) comes in the second iteration, it is added as decimal = 1(decimal)*2 + 1(digit). During the third iteration of the loop, decimal = 3(decimal)*2 + 0(digit), and
3*2 = (2+1)*2 = (first_digit) 1*2*2 + (second_digit) 1*2.
It continues like this for all the digits.

Maximum bit-width to store a summation of M n-bit binary numbers

I am trying to find the formula to calculate the maximum bit-width required to contain a sum of M n-bit unsigned binary numbers. Thanks!
The maximum bit-width needed should be ceil(log_2(M * (2^n - 1))).
Edit: Thanks to @MBurnham I realize now that it should be floor(log_2(M * (2^n - 1))) + 1 instead.
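A quick numeric check of that formula (a sketch; bits_needed is just a throwaway helper, and int.bit_length() gives exactly floor(log2(x)) + 1 for positive x):
def bits_needed(M, n):
    # width needed to hold the largest possible sum, M * (2**n - 1)
    return (M * (2**n - 1)).bit_length()

print(bits_needed(2, 8))    # 9  -- two 8-bit numbers need at most 9 bits
print(bits_needed(5, 8))    # 11
print(bits_needed(96, 60))  # 67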
Assuming positive integers, you need floor(log2(x)) + 1 bits to store x, and the largest value the sum of m n-bit numbers can produce would be m * 2^n.
So I believe the formula should be
floor(log2(m * 2^n)) + 1
bits.
If I add 2 numbers, then I need 1 bit more than the wider of the 2 numbers to store the result. So, if I add 2 n-bit numbers, I need n+1 bits to store the result.
if I add another n-bit number, I need (n+1)+1 bits to store the result (that's 3 n-bit numbers added so far)
if I add another n-bit number, I need ((n+1)+1)+1 bits to store the result (that's 4 n-bit numbers added so far)
if I add another n-bit number, I need (((n+1)+1)+1)+1 bits to store the result (that's 5 n-bit numbers added so far)
So, I think your formula is
n + M - 1

Conversion of numeric to string in MATLAB

Suppose I want to convert the number 0.011124325465476454 to a string in MATLAB.
If I hit
mat2str(0.011124325465476454,100)
I get 0.011124325465476453 which differs in the last digit.
If I hit num2str(0.011124325465476454,'%5.25f')
I get 0.0111243254654764530000000
which is padded with undesirable zeros and differs in the last digit (3 should be 4).
I need a way to convert numerics with an arbitrary number of decimals to their EXACT string matches (no zeros padded, no final digit modification).
Is there such as way?
EDIT: Since I didn't have in mind the info about precision that Amro and nrz provided, I am adding some additional info about the problem. The numbers I actually need to convert come from a C++ program that outputs them to a txt file, and they are all of the C++ double type. [NOTE: The part that inputs the numbers from the txt file to MATLAB is not coded by me, and I'm actually not allowed to modify it to keep the numbers as strings without converting them to numerics. I only have access to this code's "output", which is the numerics I'd like to convert]. So far I haven't gotten numbers with more than 17 decimals (NOTE: consequently the example provided above, with 18 decimals, is not very indicative).
Now, if the number has 15 digits, e.g. 0.280783055069002,
then num2str(0.280783055069002,'%5.17f') or mat2str(0.280783055069002,17) returns
0.28078305506900197
which is not the exact number (see last digits).
But if I hit mat2str(0.280783055069002,15) I get
0.280783055069002 which is correct!!!
Probably there are a million ways to "code around" the problem (e.g. create a routine that does the conversion), but isn't there some way, using standard built-in MATLAB functions, to get the desired result when I input a number with an arbitrary number of decimals (but no more than 17)?
My HPF toolbox also allows you to work with arbitrary-precision numbers in MATLAB.
In MATLAB, try this:
>> format long g
>> x = 0.280783054
x =
0.280783054
As you can see, MATLAB writes it out with the digits you have posed. But how does MATLAB really "feel" about that number? What does it store internally? See what sprintf says:
>> sprintf('%.60f',x)
ans =
0.280783053999999976380053112734458409249782562255859375000000
And this is what HPF sees, when it tries to extract that number from the double:
>> hpf(x,60)
ans =
0.280783053999999976380053112734458409249782562255859375000000
The fact is, almost all decimal numbers are NOT representable exactly in floating point arithmetic as a double. (0.5 or 0.375 are exceptions to that rule, for obvious reasons.)
However, when stored in a decimal form with 18 digits, we see that HPF did not need to store the number as a binary approximation to the decimal form.
x = hpf('0.280783054',[18 0])
x =
0.280783054
>> x.mantissa
ans =
2 8 0 7 8 3 0 5 4 0 0 0 0 0 0 0 0 0
What niels does not appreciate is that decimal numbers are not stored in decimal form as a double. For example what does 0.1 look like internally?
>> sprintf('%.60f',0.1)
ans =
0.100000000000000005551115123125782702118158340454101562500000
As you see, MATLAB does not store it as 0.1. In fact, MATLAB stores 0.1 as a binary number, here in effect...
1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + 1/65536 + ...
or if you prefer
2^-4 + 2^-5 + 2^-8 + 2^-9 + 2^-12 + 2^-13 + 2^-16 + ...
To represent 0.1 exactly, this would take infinitely many such terms, since 0.1 is a repeating number in binary. MATLAB stops at 52 bits. Just like 2/3 = 0.6666666666... as a decimal, 0.1 is stored only as an approximation as a double.
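The same experiment outside MATLAB (a quick cross-check in Python, where Decimal exposes the exact double behind the literal) shows the identical stored value:
from decimal import Decimal
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625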
This is why your problem really is completely about precision and the binary form that a double comprises.
As a final edit after chat...
The point is that MATLAB uses a double to represent a number. So it will take in a number with up to 15 decimal digits and be able to spew them out with the proper format setting.
>> format long g
>> eps
ans =
2.22044604925031e-16
So for example...
>> x = 1.23456789012345
x =
1.23456789012345
And we see that MATLAB has gotten it right. But now add one more digit to the end.
>> x = 1.234567890123456
x =
1.23456789012346
In its full glory, look at x, as MATLAB sees it:
>> sprintf('%.60f',x)
ans =
1.234567890123456024298320699017494916915893554687500000000000
So always beware the last digit of any floating point number. MATLAB will try to round things intelligently, but 15 digits is just on the edge of where you are safe.
Is it necessary to use a tool like HPF or MP to solve such a problem? No, as long as you recognize the limitations of a double. However tools that offer arbitrary precision give you the ability to be more flexible when you need it. For example, HPF offers the use and control of guard digits down in that basement area. If you need them, they are there to save the digits you need from corruption.
You can use the Multiple Precision Toolkit from the MATLAB File Exchange for arbitrary precision numbers. Floating point numbers do not usually have a precise base-10 representation.
That's because your number is beyond the precision of the double numeric type (it gives you between 15 and 17 significant decimal digits). In your case, it is rounded to the nearest representable number as soon as the literal is evaluated.
If you need more precision than what the double-precision floating-points provides, store the numbers in strings, or use arbitrary-precision libraries. For example use the Symbolic Toolbox:
sym('0.0111243254654764549999999')
You cannot get an EXACT string, since the number is stored in the double type (or even long double type).
The number stored will be subtly more or less than the number you give.
A computer only knows the binary digits 0 and 1. You must know that numbers in one radix may not be expressed the same way in another radix. For example, take the number 1/3: radix 10 yields 0.33333333... (the ellipsis indicates that there would still be more digits to come, here the digit 3), and it will be truncated to 0.333333; radix 3 yields 0.1000000, exactly the amount, no more and no less; radix 2 yields 0.01010101..., so it will likely be truncated to 0.01010101 in the computer, which is 85/256, less than 1/3 due to rounding, and the next time you fetch the number it won't be the same one you wanted.
So from the beginning, you should store the number as a string instead of a float type; otherwise it will lose precision.
Considering the precision problem, MATLAB provides symbolic computation with arbitrary precision.
