Normalized values, when summed, are more than 1 - linux

I have two files:
File 1:
TOPIC:topic_0 1294
aa 234
bb 123
TOPIC:topic_1 2348
aa 833
cc 239
bb 233
File 2:
0.1 0.2 0.3 0.4
This is just the format of my files. Basically, when the second column (omitting the "TOPIC" lines) is summed for each topic, it comes to 1, since the values are normalized. Similarly, the values in file 2 are normalized, so they also sum to 1.
I multiply the values from file 1 and file 2. The resulting output file looks like:
aa 231
bb 379
cc 773
The second column of the output file, when summed, should give 1. But a few files have values a little over 1, like 1.1 or 1.00038. How can I get exactly 1 for the output file? Is there some rounding off that I should do?
PS: The formats are just examples; the actual values and words are different. This is just for illustration. Please help me sort this out.

Python stores floating point decimals in base-2.
https://docs.python.org/2/tutorial/floatingpoint.html
This means that some decimals terminate in base-10 but repeat in base-2, hence the floating-point error when you add them up.
This gets into some math, but imagine in base-10 trying to express the value 2/6. When you eliminate the common factors from the numerator and denominator it's 1/3.
It's 0.333333333..... repeating forever. I'll explain why in a moment, but for now, understand that if you only store the first 16 digits of the decimal, for example, then when you multiply the number by 3 you won't get 1, you'll get .9999999999999999, which is a little off.
This rounding error occurs whenever there's a repeating decimal.
Here's why your numbers don't repeat in base-10, but they do repeat in base-2.
Decimals are in base-10, which prime factors out to 2^1 * 5^1. Therefore for any ratio to terminate in base-10, its denominator must prime factor to a combination of 2's and 5's, and nothing else.
Now let's get back to Python. Every decimal is stored as binary. This means that in order for a ratio's "decimal" to terminate, the denominator must prime factor to only 2's and nothing else.
Your numbers repeat in base-2.
1/10 has (2*5) in the denominator.
2/10 reduces to 1/5 which still has five in the denominator.
3/10... well you get the idea.
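Here is a minimal sketch of the effect and two common fixes, assuming the products in your output file end up as ordinary double-precision Python floats (the variable names are just for illustration):

import math

# Ten normalized weights that should sum to exactly 1
weights = [0.1] * 10
total = sum(weights)
print(total)                     # 0.9999999999999999 -- binary round-off
print(total == 1.0)              # False

# Fix 1: compare with a tolerance instead of demanding exact equality
print(math.isclose(total, 1.0))  # True

# Fix 2: round the final result to a sensible number of decimal places
print(round(total, 6))           # 1.0

So for your output file, rounding the summed column to a fixed number of decimal places, or just checking that it is within a small tolerance of 1, is the usual approach.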

Related

Python returns wrong result when I multiply a float with an int

I have a multiplication in Python 3.7.3.
When I run 0.58 * 100 it returns 57.99999999999999.
Then I found that Java has the same result, but C can return the right number. I don't know what is happening with them. Sorry if this looks basic.

It's actually not the wrong answer, just an unexpected one.
If we think a bit about the problem, there is an infinite amount of numbers between 0 and 1. Then we can see that you cannot represent all the numbers between 0 and 1 with a finite number of bytes, as infinitely many numbers are more than any finite amount of numbers. So some numbers just can't be represented (in fact, most of the infinitely many numbers between 0 and 1 cannot be represented).
Following the floating point standard (IEEE-754), 0.58 is really 0.57999999999999996003197111349436454474925994873046875, which is the closest number to 0.58 that can be represented with 64-bit floating point.
Check it with Python:
>>> from decimal import Decimal
>>> Decimal(0.58)
Decimal('0.57999999999999996003197111349436454474925994873046875')
If you want 58.0 you can quantize it to two decimals with the Decimal class.
>>> Decimal(100 * 0.58).quantize(Decimal('.01'))
Decimal('58.00')
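If you just want a float rather than a Decimal, plain round() gives the same result here (assuming two decimal places is the precision you want):
>>> round(0.58 * 100, 2)
58.0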

How to compress an integer to a smaller string of text?

Given a random integer, for example, 19357982357627685397198. How can I compress these numbers into a string of text that has fewer characters?
The string of text must only contain numbers or alphabetical characters, both uppercase and lowercase.
I've tried Base64 and Huffman coding, which claim to compress, but neither of them makes the string shorter to type on a keyboard.
I also tried to make an algorithm that divides the integer by the numbers 2, 3, ..., 10 and checks whether the last digit of the result is the number it was divided by (looking for 0 in the case of division by 10), so that when decrypting you would just multiply the number by its last digit. But that does not work, because in some cases you can't divide by anything and the number stays the same; when it is then decrypted, it gets multiplied into a larger number than you started with.
I also tried to divide the integer into blocks of 2 digits, starting from the left, and assign a letter to each block (a=1, b=2, o=15), rolling back to a after z. This did not work because, when decrypting, there is no way to know how many times a block rolled over z, so the result would be a much smaller number than the original.
I also tried some other common encoding schemes, for example Base32, Ascii85, the Bifid cipher, Baudot code, and some others I cannot remember.
It seems like an unsolvable problem, but consider where it starts: each digit of an integer can take 10 different values, while an alphabetical character can take 26 (or 52 counting case). That means you can store more data in 5 alphabetical characters than in a 5-digit integer. So, mathematically, it is possible to store more data in a string of characters than in an integer of the same length; I just can't find anyone who has ever done it.
You switch from base 10 to e.g. base 62 by repeatedly dividing by 62 and recording the remainder from each step, like this:
Converting 6846532136 to base62:
Operation Result Remainder
6846532136 / 62 110427937 42
110427937 / 62 1781095 47
1781095 / 62 28727 21
28727 / 62 463 21
463 / 62 7 29
7 / 62 0 7
Then you use each remainder as an index into a base62 alphabet of your choice, e.g.:
0 1 2 3 4 5 6
01234567890123456789012345678901234567890123456789012345678901
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Reading the remainders from the last division back to the first gives: H (7) d (29) V (21) V (21) v (47) q (42) = HdVVvq
------
It's called base10 to base62 conversion; there are a bunch of solutions and code samples on the internet.
Here is my favorite version: Base 62 conversion
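For example, here is a minimal Python sketch of that divide-and-collect-remainders method (the alphabet is the one above; the function names are just illustrative):

ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'

def to_base62(n):
    # Repeatedly divide by 62, collecting remainders (least significant first)
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    # The last remainder is the most significant digit, so reverse
    return ''.join(reversed(digits))

def from_base62(s):
    # Undo the encoding: multiply by 62 and add each character's index
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(to_base62(6846532136))    # HdVVvq
print(from_base62('HdVVvq'))    # 6846532136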

How many operations can we do with an 8-digit (plus decimal) calculator?

I have this model: a simple 8-digit display calculator (no memory buttons, no square root, etc.) has these buttons (the decimal point does not count as a 'digit'):
10 buttons for integers 0 to 9,
1 button for dot (decimal point, so it can hold decimals, like from 0.0000001 to 9999999.9),
4 buttons for operations (+, -, /, *), and
1 button for equality (=). (the on/off button doesn't count for this question)
The question is two-fold: how many numbers can be represented on the calculator's screen? (a math-explained solution would be appreciated)
AND
if we have to perform all 4 basic operations between every pair of 2 numbers from those calculated above, how many operations would that be?
Thank you for your insight and help!
For part one of this answer, we want to know how many numbers can be represented on the calculator's screen.
Start with a simplified example and work up from there. Let's start with a 1-digit display. With this calculator, you can display the numbers from 0 to 9, and you can display each of those numbers with a decimal point either before the digit (making it a decimal), or after the digit (making it an integer). How many unique numbers can be made?
.0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.
That's 20 possibilities, with 1 repeated number (.0 and 0. are both zero), which makes 19 unique numbers. Let's find this result again, but using a mathematical approach that we can scale up to a larger number of digits.
Start by finding all the numbers 0 <= n < 1 that can be made. For the numbers to fit in that range, the decimal point must be before the first digit. We're still dealing with 1 digit, so there are 10^1 = 10 different ways to fill the calculator with numbers that are greater than or equal to 0, but less than 1.
Next, find all the numbers 1 <= n < 10 that can be made. To do this, you move the decimal point one place to the right, so now it's after the first digit, and you also can't allow the first digit to be zero (or the number will be less than 1). That leaves you 9 unique numbers.
[0<=n<1] + [1<=n<10] = 10 + 9 = 19
Now we have a scalable system. Let's do it with 2 digits so you see how it works with multiple digits before we go to 8 digits. With 2 digits, we can represent 0-99, and the decimal point can go in three different places, which means we have three ranges to check: 0<=n<1, 1<=n<10, 10<=n<100. The first set can have zero in its first place, since zero is in the set, but every other set can't have zero in the first place or else the number would be in the set below it. So the first set has 10^2 possibilities, but each of the other sets has 9 * 10^1 possibilities. We can generalize this by saying that for any number d of digits that our calculator can hold, the set 0<=n<1 will have 10^d possibilities, and each other set will have 9 * 10^(d-1) possibilities.
So for 2 digits:
[0<=n<1] + [1<=n<10] + [10<=n<100] = 100 + 90 + 90 = 280
Now you can see a pattern emerging, which can be generalized to give us the total number of unique numbers that can be displayed on a calculator with d digits:
Unique displayable numbers = 10^d + d * 9 * 10^(d-1)
You can confirm this math with a simple Python script that manually finds all the unique numbers that can be displayed, prints the quantity it found, then also prints the result of the formula above. It gets bogged down when it gets to higher numbers of digits, but digits 1 through 5 should be enough to show the formula works.
for digits in range(1, 6):
    print('---%d Digits----' % digits)
    numbers = set()
    for d in range(digits + 1):
        numbers.update(i / 10**d for i in range(10**digits))
    print(len(numbers))
    print(10**digits + digits * 9 * 10**(digits - 1))
And the result:
---1 Digits----
19
19
---2 Digits----
280
280
---3 Digits----
3700
3700
---4 Digits----
46000
46000
---5 Digits----
550000
550000
Which means that a calculator with an 8-digit display can show 820,000,000 unique numbers.
For part two of this answer, we want to know: if we have to perform all 4 basic operations between every pair of 2 numbers from those calculated above, how many operations would that be?
How many pairs of numbers can we make between 820 million unique numbers? 820 million squared. That's 672,400,000,000,000,000 = 672.4 quadrillion. Four different operations can be used on these number pairs, so multiply that by 4 and you get 2,689,600,000,000,000,000 = 2.6896 quintillion different possible operations on a simple 8 digit calculator.
EDIT:
If the intention of the original question was for a decimal point to not be allowed to come before the first digit (a decimal 0<=n<1 would have to start with 0.) then the formula for displayable numbers changes to 10^d + (d - 1) * 9 * 10^(d-1), which means the amount of unique displayable numbers is 730 million and the total number of operations is 2.1316 quintillion.
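That variant can be checked with the same brute-force approach, just dropping the point-before-the-first-digit position (a sketch reusing the earlier script):

for digits in range(1, 6):
    print('---%d Digits----' % digits)
    numbers = set()
    # The point may only sit after a digit, so d stops at digits - 1
    for d in range(digits):
        numbers.update(i / 10**d for i in range(10**digits))
    print(len(numbers))
    print(10**digits + (digits - 1) * 9 * 10**(digits - 1))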

Counting binary digits in a list of excel cells

I'm trying to make a formula that transforms a list of decimal numbers to binary, then counts the number of appearances of ones at a certain position. I was trying to build an array formula that went something like this:
{=SUM(MID(DEC2BIN(A1:A10;10);9;1))}
This will return #VALUE. Is there a way to do this?
EDIT: examples added
Input (Binary Equivalent)
2 0000000010
3 0000000011
7 0000000111
7 0000000111
5 0000000101
9 0000001001
Output (digit to sum, counting from the right) Result
1 5
2 4
3 3
4 1
Here is another way, e.g. for the second digit from the right:
=SUMPRODUCT(--ISODD(A1:A10/2))
Divide by 2^(n-1), where n is the digit number counted from the right; the ISODD function ignores any fraction that results from the division.
=SUM(0+MID(DEC2BIN(--A1:A10,10),9,1))
array-entered.
If you are trying to count how many of the second digits are set in a range of numbers, you can do this:
{=SUM((MOD(A1:A10,4)>=2)+0)}
To understand this, let's look at some example data.
Here I have some decimal numbers with their binary equivalents. In column C I have just extracted the 2nd digit (i.e. your MID(A1,9,1)). Then in column D I just take the modulo by 4. You can see that when the remainder is 2 or greater, the second digit is set.
MOD(A1,4) basically divides the number by 4 and gives us the remainder. With binary numbers, division by a power of two is just a right shift; division by 4 is a right shift by 2, and the bits that 'fall off' are the remainder. In this case that's the first two digits. They can be
00 | 0
01 | 1
10 | 2
11 | 3
so we see that the second digit is set only when the remainder is 2 or greater.
Note the +0 in the original formula is to cast the boolean result of the comparison to an integer so we can use SUM, i.e. SUM({TRUE,FALSE}) doesn't work but SUM({TRUE,FALSE}+0) computes to SUM({1,0}), which does work.
To make this generic, let's assume you want to do it for the $E$1th digit:
=SUM((MOD(A1:A10,2^$E$1)>=2^($E$1-1))+0)
With bit operations it's not necessary to treat the number as a string.
{=SUM(BITAND(A1:A10;2^(C1-1))/2^(C1-1))}
Assuming the position you are looking for is stored in C1.
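As a quick cross-check outside Excel, the same counts can be computed with integer bit operations; here is a sketch in Python using the example data from the question (the function name is just illustrative):

values = [2, 3, 7, 7, 5, 9]

def count_bit(nums, position):
    # Count how many numbers have the bit at `position` (1 = rightmost) set
    return sum((n >> (position - 1)) & 1 for n in nums)

for position in range(1, 5):
    print(position, count_bit(values, position))
# Prints: 1 5, 2 4, 3 3, 4 1 -- matching the expected outputs above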

Swiss and Argentinian currency fourth decimal digit rounding

When rounding amounts of currency using the algorithm for Swiss Francs, the second and third decimal digits are considered. If less than 26, they are rounded down to 0; else if less than 76, rounded down to 5; else the whole value is rounded up.
20.125 => 20.10
20.143 => 20.15
20.179 => 20.20
What happens when the amount to be rounded has greater decimal precision? Are all decimal digits after the third simply ignored (the value is truncated), or is the value rounded to three decimal digits in some other way first? As an example, consider truncation versus a "Math.round()" approach (less than 0.5 rounds down, else round up):
Truncation | "Math.round"
=================================================================
Start 3 d.p. Rounded | Start 3 d.p. Rounded
=================================================================
20.1259 -> 20.125 => 20.10 | 20.1259 -> 20.126 => 20.15
20.1759 -> 20.175 => 20.15 | 20.1759 -> 20.176 => 20.20
As the above shows, these edge cases vary a great deal in the final result.
Argentinian currency rounding follows a similar model which just concerns itself with the third decimal digit. Although the rounded result may have two or three decimal places, the same principle applies; if the value to be rounded has four or more decimal digits, should the algorithm just truncate anything after the third digit or should it apply some other kind of intermediate rounding to get a three decimal place result first?
Thanks!
If less than 26, they are rounded down to 0; else if less than 76, rounded down to 5; else the whole value is rounded up.
Going by this, I would assume the "Truncation" method is appropriate, since 0.0259XXXXX is less than 0.026.
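For what it's worth, here is a minimal Python sketch of that truncate-then-round reading of the rule, using Decimal to avoid binary floating-point surprises (the function name and the truncate-first policy are assumptions based on the reasoning above, not an official specification):

from decimal import Decimal, ROUND_DOWN

def swiss_round(amount):
    # Truncate (not round) to three decimal places first
    d = Decimal(amount).quantize(Decimal('0.001'), rounding=ROUND_DOWN)
    # Isolate the second and third decimal digits as a 0.0xx fraction
    fraction = d % Decimal('0.1')
    base = d - fraction
    if fraction < Decimal('0.026'):
        return base                      # round down to 0
    elif fraction < Decimal('0.076'):
        return base + Decimal('0.05')    # round to 5
    else:
        return base + Decimal('0.10')    # round the whole value up

for amount in ['20.125', '20.143', '20.179', '20.1259', '20.1759']:
    print(amount, '=>', swiss_round(amount))
# 20.125 => 20.100, 20.143 => 20.150, 20.179 => 20.200
# 20.1259 => 20.100, 20.1759 => 20.150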
