How to compress an integer to a smaller string of text? - string

Given a random integer, for example, 19357982357627685397198. How can I compress these numbers into a string of text that has fewer characters?
The string of text must only contain numbers or alphabetical characters, both uppercase and lowercase.
I've tried Base64 and Huffman-coding that claim to compress, but none of them makes the string shorter when writing on a keyboard.
I also tried to make some kind of algorithm that tries to divide the integer by the numbers "2,3,...,10" and check if the last number in the result is the number it was divided by (looks for 0 in case of division by 10). So, when decrypting, you would just multiply the number by the last number in the integer. But that does not work because in some cases you can't divide by anything and the number would stay the same, and when it would be decrypted, it would just multiply it into a larger number than you started with.
I also tried to divide the integer into blocks of 2 numbers starting from left and giving a letter to them (a=1, b=2, o=15), and when it would get to z it would just roll back to a. This did not work because when it was decrypted, it would not know how many times the number rolled over z and therefore be a much smaller number than in the start.
I also tried some other common encryption strategies. For example Base32, Ascii85, Bifid Cipher, Baudot Code, and some others I can not remember.
It seems like an unsolvable problem. But because it starts with an integer, each digit can take only 10 different values, while a letter of the alphabet can take 26. This means you can store more data in 5 alphabetical letters than in a 5-digit integer. So mathematically it is possible to store more data in a string of characters than in an integer of the same length, but I just can't find anyone who has ever done it.

You switch from base 10 to e.g. base 62 by repeatedly dividing by 62 and recording the remainder from each step, like this:
Converting 6846532136 to base62:
Operation          Result      Remainder
6846532136 / 62    110427937   42
110427937 / 62     1781095     47
1781095 / 62       28727       21
28727 / 62         463         21
463 / 62           7           29
7 / 62             0           7
Then you use each remainder as an index into a base62 alphabet of your choice, e.g.:
0         1         2         3         4         5         6
01234567890123456789012345678901234567890123456789012345678901
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Reading the remainders from last to first gives: H (7) d (29) V (21) V (21) v (47) q (42) = HdVVvq
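The repeated-division steps above can be sketched in Python roughly as follows. This is only a minimal illustration, assuming the same A-Z, a-z, 0-9 alphabet shown above; the names ALPHABET, encode_base62 and decode_base62 are made up for this example and are not from any particular library.

```python
# Hypothetical helper names; the alphabet order matches the one shown above.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
)


def encode_base62(number):
    """Convert a non-negative integer to a base62 string."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        # divmod gives the quotient for the next step and the remainder
        # that indexes into the alphabet.
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(digits))


def decode_base62(text):
    """Convert a base62 string back to the original integer."""
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)
    return number


print(encode_base62(6846532136))  # HdVVvq
```

Decoding reverses the process by multiplying by 62 and adding each character's index, so the round trip returns the original integer.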
------

It's called base10 to base62 conversion; there are a bunch of solutions and code for it on the internet.
Here is my favorite version: Base 62 conversion

Related

Optical differences between characters within a string of equal length

I have a data set with strings of different lengths, which get concatenated into a separate column and padded to equal length via LEN(), TRIM() and REPT().
The formulas I used can be seen in the last row for each column (B:E).
Although the length of the final string is equal, one can see that the strings within the "Name with equal length" column are not optically identical, i.e. not of the "same" visual length.
As I want to use this column for making new file names via VBA, I wanted to explicitly have file names with "optically smooth names". (I hope you get what I mean.)
How can I achieve this? Do I have to calculate the pixel differences within (case-sensitive) letters? If so, how can I do this?
Text        | Place | Length of String | Needed Spaces | Name with equal length | Length of Name
SaMPLE_TEXT | P 1   | 12               | 2             | SaMPLE_TEXT--P 1_.pdf  | 22
SaMPLE_TexT | P 2   | 13               | 1             | SaMPLE_TexT-P 2_.pdf   | 22
SaMPLE_text | P 3   | 13               | 1             | SaMPLE_text-P 3_.pdf   | 22
sample_TEXT | P 4   | 12               | 2             | sample_TEXT--P 4_.pdf  | 22
SaMPLE_TEXT | P 5   | 12               | 2             | SaMPLE_TEXT--P 5_.pdf  | 22
=LEN(TRIM(B1))
=MAX($D$1:$D$6)-LEN(TRIM(B2))+1
=TRIM(A2)&REPT("-";D2)&TRIM(B2)&"_.pdf"
=LEN(E2)

Divide floating point with integer

So, I'm trying to divide values across two columns of a .csv file, one of which comprises integers ('counts'), and the other is made up of floats ('Surface').
df = pd.read_csv(r'G:\file_path\file1.csv')
df['f'] = df['counts']/df['Surface']
Doing so returns the 'TypeError: string indices must be integers' error message.
An example of the file is:
I have tried to find information online on how to divide floats but can only find endless resources on how to use the one-slash (/) or two-slash (//) methods to output floats or integers, as opposed to anything about actually dividing floats themselves.
Any ideas on how I resolve this?? Surely it can't be all that complicated.
Cheers,
R
I suspect one of the columns is dtype object.
Please try
Data
df=pd.DataFrame({'counts':[49, 47,44,43],'Surface':[1.878914,1.854631,1.854631,1.660323]})
print(df)
counts Surface
0 49 1.878914
1 47 1.854631
2 44 1.854631
3 43 1.660323
df['f'] = df['counts'].astype(int)/df['Surface'].astype(float)
counts Surface f
0 49 1.878914 26.078895
1 47 1.854631 25.341968
2 44 1.854631 23.724396
3 43 1.660323 25.898575

How many operations can we do with an 8-digit (plus decimal) calculator?

I have this model: a simple 8-digit display calculator (no memory buttons, no square root etc etc) has buttons (the decimal point does not count as a 'digit'):
10 buttons for integers 0 to 9,
1 button for dot (decimal point, so it can hold decimals, like from 0.0000001 to 9999999.9),
4 buttons for operations (+, -, /, *), and
1 button for equality (=). (the on/off button doesn't count for this question)
The question is two-fold: how many numbers can be represented on the calculator's screen? (A math-explained solution would be appreciated.)
AND
if we have to perform all 4 basic operations between any pair of the numbers counted above, how many operations would that be?
Thank you for your insight and help!
For part one of this answer, we want to know how many numbers can be represented on the calculator's screen.
Start with a simplified example and work up from there. Let's start with a 1-digit display. With this calculator, you can display the numbers from 0 to 9, and you can display each of those numbers with a decimal point either before the digit (making it a decimal), or after the digit (making it an integer). How many unique numbers can be made?
.0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.
That's 20 possibilities, but .0 and 0. are the same number, so there are 19 unique numbers. Let's find this result again, but using a mathematical approach that we can scale up to a larger number of digits.
Start by finding all the numbers 0 <= n < 1 that can be made. For the numbers to fit in that range, the decimal point must be before the first digit. We're still dealing with 1 digit, so there are 10^1 = 10 different ways to fill the calculator with numbers that are greater than or equal to 0, but less than 1.
Next, find all the numbers 1 <= n < 10 that can be made. To do this, you move the decimal point one place to the right, so now it's after the first digit, and you also can't allow the first digit to be zero (or the number will be less than 1). That leaves you 9 unique numbers.
[0<=n<1] + [1<=n<10] = 10 + 9 = 19
Now we have a scalable system. Let's do it with 2 digits so you see how it works with multiple digits before we go to 8 digits. With 2 digits, we can represent 0-99, and the decimal point can go in three different places, which means we have three ranges to check: 0<=n<1, 1<=n<10, 10<=n<100. The first set can have zero in its first place, since zero is in the set, but every other set can't have zero in the first place or else the number would be in the set below it. So the first set has 10^2 possibilities, but each of the other sets has 9 * 10^1 possibilities. We can generalize this by saying that for any number d of digits that our calculator can hold, the set 0<=n<1 will have 10^d possibilities, and each other set will have 9 * 10^(d-1) possibilities.
So for 2 digits:
[0<=n<1] + [1<=n<10] + [10<=n<100] = 100 + 90 + 90 = 280
Now you can see a pattern emerging, which can be generalized to give us the total number of unique numbers that can be displayed on a calculator with d digits:
Unique displayable numbers = 10^d + d * 9 * 10^(d-1)
You can confirm this math with a simple Python script that manually finds all the unique numbers that can be displayed, prints the quantity it found, then also prints the result of the formula above. It gets bogged down when it gets to higher numbers of digits, but digits 1 through 5 should be enough to show the formula works.
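for digits in range(1, 6):
    print('---%d Digits----' % digits)
    # Collect every value the display can show for this digit count.
    numbers = set()
    # d is the number of digits shown after the decimal point.
    for d in range(digits + 1):
        numbers.update(i / 10**d for i in range(10**digits))
    # Brute-force count, followed by the closed-form formula.
    print(len(numbers))
    print(10**digits + digits * 9 * 10**(digits - 1))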
And the result:
---1 Digits----
19
19
---2 Digits----
280
280
---3 Digits----
3700
3700
---4 Digits----
46000
46000
---5 Digits----
550000
550000
Which means that a calculator with an 8 digit display can show 820,000,000 unique numbers.
For part two of this answer, we want to know how many operations there are if we apply all 4 basic operations to every pair of the numbers counted above.
How many pairs of numbers can we make between 820 million unique numbers? 820 million squared. That's 672,400,000,000,000,000 = 672.4 quadrillion. Four different operations can be used on these number pairs, so multiply that by 4 and you get 2,689,600,000,000,000,000 = 2.6896 quintillion different possible operations on a simple 8 digit calculator.
EDIT:
If the intention of the original question was for a decimal point to not be allowed to come before the first digit (a decimal 0<=n<1 would have to start with 0.) then the formula for displayable numbers changes to 10^d + (d - 1) * 9 * 10^(d-1), which means the number of unique displayable numbers is 730 million and the total number of operations is 2.1316 quintillion.
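As a quick check of the 8-digit figures, both variants of the formula and the operation count can be evaluated directly. This is just a sketch; the function names and the leading_point_allowed flag are made up for this example.

```python
def displayable_numbers(digits, leading_point_allowed=True):
    """Count unique numbers on a display with the given number of digits.

    leading_point_allowed=True  -> 10^d + d * 9 * 10^(d-1)
    leading_point_allowed=False -> 10^d + (d - 1) * 9 * 10^(d-1)
    """
    ranges = digits if leading_point_allowed else digits - 1
    return 10**digits + ranges * 9 * 10**(digits - 1)


def total_operations(count, operators=4):
    """Every ordered pair of numbers, combined with each operator."""
    return operators * count**2


for allowed in (True, False):
    count = displayable_numbers(8, allowed)
    print(allowed, count, total_operations(count))
```

Python integers have arbitrary precision, so the quintillion-sized results are exact rather than rounded.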

Counting binary digits in a list of excel cells

I'm trying to make a formula that transforms a list of decimal numbers to binary, then counts the number of ones that appear at a certain position. I was trying to build an array formula that went something like this:
{=SUM(MID(DEC2BIN(A1:A10;10);9;1)}
This will return #VALUE. Is there a way to do this?
EDIT: examples added
Input   (Binary Equivalent)
2       0000000010
3       0000000101
7       0000000111
7       0000000111
5       0000000101
9       0000001001

Outputs (digit to sum from the right)   Result
1                                       5
2                                       3
3                                       3
4                                       1
This was another way e.g. for the second digit from the right
=SUMPRODUCT(--ISODD(A1:A10/2))
Divide by 2^(n-1) where n is digit numbered from the right: the ISODD function ignores any fraction that results from the division.
=SUM(0+MID(DEC2BIN(--A1:A10,10),9,1))
array-entered.
Regards
If you are trying to count how many of the second digits are set in a range of numbers you can do this:
={SUM((MOD(A1:A10,4)>=2)+0)}
To understand this, let's look at some example data
Here I have some decimal numbers with their binary equivalents. In column C I have just extracted the 2nd digit (i.e. your MID(A1,9,1)). Then in column D I just take the modulo by 4. You can see that when the remainder is greater than or equal to 2, the second digit is set.
MOD(A1,4) basically divides the number by 4 and gives us the remainder (the numerator of the remainder if it was represented as a fraction over 4). With binary numbers, division by a power of two is just a right shift. Division by 4 is a right shift by 2 and the digits that 'fall off' are the remainder. In this case it's the two lowest-order digits. They can be
00 | 0
01 | 1
10 | 2
11 | 3
so we see that the second digit is set only when the remainder is greater than or equal to 2.
Note the +0 in the original formula is to cast the boolean result of >= to an integer so we can use SUM i.e. SUM({TRUE,FALSE}) doesn't work but SUM({TRUE,FALSE}+0) computes to SUM({1,0}) which does work.
To make this generic, let's assume you want to do it for the $E$1th digit:
=SUM((MOD(A1:A12,2^$E$1)>=2^($E$1-1))+0)
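For readers who want to check the modulo logic outside Excel, here is a small Python sketch of the same test. The function name and the sample values are made up for illustration and are not the data from the question; the bitwise check is included only as a cross-check.

```python
def count_bit_set(values, position):
    """Count values whose binary digit at `position` (1 = rightmost) is 1.

    Mirrors MOD(x, 2^n) >= 2^(n-1) from the formula above.
    """
    return sum(1 for v in values if v % 2**position >= 2**(position - 1))


def count_bit_set_bitwise(values, position):
    """Same count using a shift and mask, as a cross-check."""
    return sum((v >> (position - 1)) & 1 for v in values)


# Hypothetical sample data: 1, 10, 11, 110, 1100 in binary.
sample = [1, 2, 3, 6, 12]
for n in range(1, 5):
    print(n, count_bit_set(sample, n), count_bit_set_bitwise(sample, n))
```

Both functions should agree for any non-negative integers, since the remainder modulo 2^n keeps exactly the n lowest binary digits.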
With bit operations it's not necessary to treat the number as a string.
{=SUM(BITAND(A1:A10;2^(C1-1))/2^(C1-1))}
Assuming the position you are looking for is stored in C1.

How do the numbers and letters differ in hexadecimal colours?

I had a look at how hexadecimal colour codes work, and for the most part it seems pretty simple. But there is one thing I don't understand. If I have the code #37136F, how do the 6 and the F work together? Does this mean that the two values are added together, so the blue value is 21? Or are they joined together, like 615? If they are added together (which I feel is the most logical way) then the maximum value you can get is 30, which gives me a range of 0-30... I feel like this isn't right, please enlighten me.
First you split the hex code into pairs of digits (so #37136F becomes 37, 13, and 6F), and those are the values for red, green, and blue respectively. Let's focus on the blue component, 6F.
6F is a two digit hexadecimal number (base 16). Just as 25 in base 10 is actually 2*10 + 5, 6F in hexadecimal is actually 6*16 + 15 = 111 in base 10. In general, if X and Y are hexadecimal digits (0 through F), then XY in base 16 is X*16 + Y.
Note that the minimum and maximum two-digit hex numbers are 00 and FF respectively, which equal 0*16 + 0 = 0 and 15*16 + 15 = 255 respectively. This is why RGB values range from 0 to 255 inclusive, when written in base 10.
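The pair-splitting and X*16 + Y arithmetic described above can be sketched in Python like this. The function name hex_to_rgb is made up for this example; int(text, 16) is the built-in base-16 parser.

```python
def hex_to_rgb(code):
    """Split a colour like '#37136F' into (red, green, blue) integers."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected six hexadecimal digits")
    # Each pair XY is a base-16 number: X*16 + Y.
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#37136F"))  # (55, 19, 111)
```

Here the blue component 6F comes out as 6*16 + 15 = 111, matching the calculation in the answer.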
