Is it possible to get the decimal at a given place? - 64-bit

A few years ago one of my friends made a pi program which retrieved the decimals of pi, so a call like bufferPi(5) would return the 5th decimal place of pi. Now, obviously due to the limits of computing, and for reasons I don't understand much, it's not possible to get a decimal place past a certain point.
So, let's say I had a greyscale image which was 2x2 (meaning 4 pixels in area). There would be 256 possible shades of grey ranging from pure black to pure white. Each pixel has 256 possible shades, meaning there are 256^4 possible combinations of shades across the image.
However, as the image gets bigger, so does the number of combinations, eventually surpassing the limit of a 64-bit double (just below 2^1024). Is there any way, especially in programming/math, to get the digits afterwards, or is this simply impossible to represent?
An example: take a number which is bigger than 2^1024 - in traditional programming this would be an unrepresentable number, which usually just comes out as "inf" or "infinity". Of course, I could shorten the number to below this range and get those digits, but is there any way to get the digits beyond that range? See below for more detail.
I know it is possible to some extent. See the Great Internet Mersenne Prime Search for an example. It gives you a Mersenne number (2 raised to a prime exponent, minus one) for your computer to test for factors.
This works on 64-bit computers, but these candidates are often 2 to the power of tens of millions, which is vastly greater than 2^1024. How do they do this? These numbers are much, much greater than the limit, so how can a computer even represent them without running out of memory or falling back to the traditional "inf"?
Sorry for the vague question but hopefully someone knows the answer, not so much for the first one but an answer to the last would be deeply appreciated. Thank you! c:

This is a good question. Essentially, when we talk about 64-bit computers, each elementary unit of data is 64 bits in size (e.g. addresses, native integer sizes, etc.). This makes it easy to add numbers that fit within this size, but to go over the limit we need to split numbers into several parts.
For example, let's consider a 4-bit computer to make things easier:
We want to add the numbers u = 01001010 and v = 00101011, but they are too big to store in a single word, so let's split each number into two 4-bit halves:
a = 0100 b = 1010 (splitting u through the middle)
x = 0010 y = 1011 (splitting v through the middle)
from which it follows that:
a + x = 0110
b + y = 10101 <- this is too big!
Now, technically a + x should have an extra 4 zeros appended (it holds the high halves), so when we combine a + x with b + y we actually need to add:
01100000 + 10101
But we can't tell the computer to do this because it can only add numbers up to 4 bits in length.
Therefore we add 0110 and 1 (the carry from b + y) to get 0111, which becomes our first value. Our second value is then 0101.
So the final result is stored as 0111 and 0101 (together equal to 01110101, which the computer can't store in one memory location).
This is how computers store and do arithmetic on numbers larger than their word size allows: clever tricks add and multiply the parts of a number, and each part is stored in memory at a separate address. This is also how computers store massive numbers like the decimal expansion of pi.
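For instance, here is a minimal sketch of that idea in C: a big number becomes an array of 32-bit "limbs", and addition propagates a carry from one limb to the next. (The names and fixed limb count are illustrative, not how any particular bignum library lays things out.)

#include <stdint.h>
#include <stdio.h>

#define LIMBS 4  /* a 128-bit number stored as four 32-bit limbs, least significant first */

void big_add(const uint32_t a[LIMBS], const uint32_t b[LIMBS], uint32_t sum[LIMBS]) {
    uint64_t carry = 0;
    for (int i = 0; i < LIMBS; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;  /* widen so the carry bit is visible */
        sum[i] = (uint32_t)t;                        /* low 32 bits stay in this limb */
        carry = t >> 32;                             /* overflow carries into the next limb */
    }
}

int main(void) {
    uint32_t a[LIMBS] = {0xFFFFFFFF, 0xFFFFFFFF, 0, 0};  /* 2^64 - 1, too big for one word */
    uint32_t b[LIMBS] = {1, 0, 0, 0};
    uint32_t s[LIMBS];
    big_add(a, b, s);
    printf("%08x %08x %08x %08x\n", s[3], s[2], s[1], s[0]);  /* 00000000 00000001 00000000 00000000, i.e. 2^64 */
    return 0;
}

Libraries like GMP do essentially this, with variable-length limb arrays and far more optimised carry handling.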

To how many decimal places is bc accurate?

It is possible to print a square root to several hundred decimal places in bc, as it is in C. However, in C it is only accurate to about 15. I have checked the square root of 2 to 50 decimal places in bc and it is accurate, but what is the limit in bc? I can't find any reference to this.
To how many decimal places is bc accurate?
bc is an arbitrary precision calculator. Arbitrary precision just tells us how many digits it can represent (as many as will fit in memory), but doesn't tell us anything about accuracy.
However in C it is only accurate to 15
C uses your processor's built-in floating point hardware. This is fast, but has a fixed number of bits to represent each number, so is obviously fixed rather than arbitrary precision.
Any arbitrary precision system will have more ... precision than this, but could of course still be inaccurate. Knowing how many digits can be stored doesn't tell us whether they're correct.
However, the GNU implementation of bc is open source, so we can just see what it does.
The bc_sqrt function uses an iterative approximation (Newton's method, although the same technique was apparently known to the Babylonians at least as early as 1000 BC).
This approximation is just run, improving each time, until two consecutive guesses differ by less than the precision requested. That is, if you ask for 1,000 digits, it'll keep going until the difference is at most in the 1,001st digit.
The only exception is when you ask for an N-digit result and the original number has more than N digits. It'll use the larger of the two as its target precision.
Since the convergence rate of this algorithm is faster than one digit per iteration, there seems little risk of two consecutive iterations agreeing to some N digits without also being correct to N digits.
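For illustration, here is the same iteration sketched in C with ordinary doubles (bc of course runs it on arbitrary-precision numbers instead; the function name and tolerance here are made up):

#include <math.h>
#include <stdio.h>

/* Babylonian / Newton iteration: repeatedly average the guess with n/guess,
   stopping when two consecutive guesses agree to within eps. */
double my_sqrt(double n, double eps) {
    double guess = n > 1 ? n / 2 : 1;          /* crude starting point */
    for (;;) {
        double next = (guess + n / guess) / 2;
        if (fabs(next - guess) < eps)          /* consecutive guesses agree: stop */
            return next;
        guess = next;
    }
}

int main(void) {
    printf("%.15f\n", my_sqrt(2.0, 1e-15));  /* 1.414213562373095 */
    return 0;
}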

Can you retrieve the original decimal number from the least significant bits of another operation?

I am performing an operation where a function F(k,x) takes two 64-bit values and returns their product. For example:
F(123,231) = 123 x 231 = 28413
The number is then converted into binary and the least significant bits are extracted. i.e. if 28413 = 0110111011111101 then we take 11111101, which is 253 in decimal.
This function is part of a Feistel network used in security. When performing a chosen-plaintext attack, we get to the point where we have 253 and 231, but need to figure out 123.
Is there any way that is possible?
Your function is doing F(k,x) = k*x mod 256.
Your question is given F(k,x) and x, can you find k?
When x is odd, there are 2^56 solutions, all of which satisfy k = x^-1 * F(k,x) mod 256. That is, you compute the inverse of x mod 256, multiply it by F(k,x), and each possible solution is derived by adding a multiple of 256 to that product.
When x is even, you can't compute the inverse, but you can still determine the solutions using a similar trick. First compute the largest power of two dividing x, say 2^t, then divide 2^t out of x, out of F(k,x), and out of the modulus, and solve the problem from there. I.e. k = (x/2^t)^-1 * (F(k,x)/2^t) mod (256/2^t).
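A small C sketch of the odd-x case, plugging in the numbers from the question (brute-forcing the inverse is fine for such a tiny modulus):

#include <stdio.h>

int main(void) {
    unsigned x = 231, f = 253;             /* known input and output from the question */
    unsigned inv = 0;
    for (unsigned i = 1; i < 256; i += 2)  /* search for x^-1 mod 256; only odd candidates can work */
        if ((x * i) % 256 == 1) { inv = i; break; }
    printf("k mod 256 = %u\n", (inv * f) % 256);  /* prints 123; the full k is 123 plus any multiple of 256 */
    return 0;
}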
Generally, using multiplication in cipher designs is dangerous, especially against chosen-plaintext attacks, because an attacker can make things disappear to simplify his attack. You can find examples of breaking ciphers like that on my blog (see attacks on chaotic hash function and multiprime).
No.
By dropping the most significant bits, the operation is rendered one-way. In order to recover the 123 you would have to brute-force the function with every possibility until the result was the value you want.
I.e. run F(x,231) for values of x until the result of F is 253.
That said, knowing one of the two inputs and the output makes it relatively easy to brute force. It would depend on the number of valid values for x (e.g. is it always a 3-digit number? Always prime? Always odd?).
There may be some other shortcuts, depending on the patterns that multiplying by 231 produces, but any given multiplier will have different patterns. E.g. if it were 9 instead of 231, you would know that the digits of the product always sum to a multiple of 9.
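For what it's worth, a minimal sketch of that brute force in C, assuming (as an example) that the unknown value is at most three decimal digits:

#include <stdio.h>

int main(void) {
    for (unsigned k = 0; k < 1000; k++)
        if ((k * 231) % 256 == 253)           /* F keeps only the low 8 bits of the product */
            printf("candidate k = %u\n", k);  /* prints 123, 379, 635 and 891 */
    return 0;
}

Note that it finds several candidates, which is exactly the multiple-solutions point made in the answer above.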

directx local space coordinates float accuracy

I'm a bit confused about the local space coordinate system. Suppose I have a complex object in local space. I know that when I want to put it in world space I have to multiply it by the Scale, Rotate and Translate matrices. But the problem is that local coordinates only range from -1.0f to 1.0f; when I want to have a vertex like (1/500, 1/100, 1/100), things will not work - everything will become 0 due to the float accuracy problem.
The only solution I can see now is to separate the object into lots of local space systems and ProjectView each individually, then put them together. That doesn't seem like the correct way of solving the problem. I've checked lots of books but none of them mention this issue. I really want to know how to solve it.
when I want to have vertex like (1/500,1/100,1/100) things will not work
What makes you think that? The float accuracy problem does not mean something will coerce to 0 if it can't be accurately represented. It just means it will be rounded to the floating point number closest to the intended figure.
It's the very same as writing down, e.g., 3/9 with at most 6 significant decimal digits: 0.333333 – it didn't coerce to 0. And the very same goes for floating point.
Now you may be familiar with scientific notation: x·10^y – this is essentially decimal floating point, a mantissa x and an exponent y which essentially specifies the order of magnitude. In binary floating point it becomes x·2^y. In either case the significant digits are in the mantissa. Your typical single-precision floating point number has a 23-bit mantissa plus one implicit leading bit, giving 24 significant binary digits (about 7 decimal digits).
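A quick check in C makes the point (the exact digits printed depend on the platform's float format, but on IEEE 754 hardware the value is close to 0.002, not 0):

#include <stdio.h>

int main(void) {
    float f = 1.0f / 500.0f;  /* 0.002 has no exact binary representation */
    printf("%.12f\n", f);     /* prints roughly 0.002000000095 - close, and certainly not 0 */
    return 0;
}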
I really want to know how to solve it.
The real trouble with floating point numbers arises when you have to mix and merge numbers across a large range of orders of magnitude. As long as the numbers are of a similar order of magnitude, everything happens within the mantissa. And that one last change of order of magnitude into the [-1, 1] range will not hurt you; heck, this could be done by "normalizing" the floating point value and then simply dropping the exponent.
Recommended read: http://floating-point-gui.de/
Update
One further thing: If you're writing 1/500 in a language like C, then you're performing an integer division and that will of course round down to 0. If you want this to be a floating point operation you either have to write floating point literals or cast to float, i.e.
1./500.
or
(float)1/(float)500
Note that casting one of the operands to float suffices to make this a floating point division.
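A quick demonstration of the difference:

#include <stdio.h>

int main(void) {
    float a = 1 / 500;        /* integer division happens first, so a == 0.0f */
    float b = 1.0f / 500;     /* one floating point operand is enough: b == 0.002f */
    printf("%f %f\n", a, b);  /* prints 0.000000 0.002000 */
    return 0;
}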

Implementing a dynamic programming algorithm with threads to speed it up

Say I have this very common DP (dynamic programming) problem:
Given a cost matrix cost[][] and a position (m, n) in cost[][], write a function that returns cost of minimum cost path to reach (m, n) from (0, 0). Each cell of the matrix represents a cost to traverse through that cell. Total cost of a path to reach (m, n) is sum of all the costs on that path (including both source and destination). You can only traverse down, right and diagonally lower cells from a given cell, i.e., from a given cell (i, j), cells (i+1, j), (i, j+1) and (i+1, j+1) can be traversed. You may assume that all costs are positive integers.
PS: the answer to this is 8.
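For reference, a minimal sequential O(m*n) version of this DP in C (the cost matrix below is just an illustrative example, which happens to also give 8):

#include <stdio.h>

#define R 3
#define C 3

static int min3(int a, int b, int c) { return a < b ? (a < c ? a : c) : (b < c ? b : c); }

int min_cost(int cost[R][C]) {
    int dp[R][C];
    dp[0][0] = cost[0][0];
    for (int j = 1; j < C; j++) dp[0][j] = dp[0][j-1] + cost[0][j];  /* top row: only right moves */
    for (int i = 1; i < R; i++) dp[i][0] = dp[i-1][0] + cost[i][0];  /* left column: only down moves */
    for (int i = 1; i < R; i++)
        for (int j = 1; j < C; j++)  /* best of arriving from the left, above, or the diagonal */
            dp[i][j] = cost[i][j] + min3(dp[i][j-1], dp[i-1][j], dp[i-1][j-1]);
    return dp[R-1][C-1];
}

int main(void) {
    int cost[R][C] = {{1, 2, 3}, {4, 8, 2}, {1, 5, 3}};
    printf("%d\n", min_cost(cost));  /* prints 8 */
    return 0;
}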
Now, after solving this question, the following question ran through my mind.
Say I have a 1000x1000 matrix. The O(n^2) solution will take some time (under 1 second on an Intel i5, for sure),
but can I minimize it further, say by starting 6-8 threads running this algorithm and then synchronizing them at the end to get the answer? Would that be faster, and is it even logically possible to get the answer that way, or should I throw this thought away?
Generally speaking, on such small problems (as you say, < 1 sec), parallel computing is less efficient than sequential due to protocol overhead (thread starting and synchronizing). Another problem might be that you increase the cache miss rate because you're choosing the data you want to operate on "randomly" (not linearly) from the input. However, when it comes to larger problems, say matrices with 10 times as many entries, it sure is worth a thought (or two).
Here is a possible solution. Given an 8x8 matrix, we cut it into 4 equal squares, with one thread responsible for each of those squares. The number in each little square indicates after how many time units the result in that square can be calculated.
So the total time is 33 units (whatever a unit is). Compared to the sequential solution with 64 units, it is just over half. You can convince yourself that the runtime for any 2^k x 2^k matrix is 2^(2k - 1) + 1.
However, this is only the first idea that came to my mind. I hope that there is a (much) faster parallel solution out there.
What's more, for the reasons I mentioned at the beginning of my answer, for all practical purposes you would not achieve a speedup of 2 with my solution.
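For comparison, here is a different decomposition sketched with OpenMP rather than raw threads: all cells on one anti-diagonal (i + j constant) are mutually independent, so each anti-diagonal can be filled in parallel. (The array names and fixed size are illustrative.)

#include <omp.h>

#define N 1000

void fill_dp(int cost[N][N], int dp[N][N]) {
    for (int d = 0; d < 2 * N - 1; d++) {  /* sweep anti-diagonals front to back */
        int lo = d < N ? 0 : d - N + 1;
        int hi = d < N ? d : N - 1;
        #pragma omp parallel for
        for (int i = lo; i <= hi; i++) {   /* cells on diagonal d only read diagonals d-1 and d-2 */
            int j = d - i;
            int best;
            if (i == 0 && j == 0)      best = 0;
            else if (i == 0)           best = dp[0][j-1];
            else if (j == 0)           best = dp[i-1][0];
            else {
                best = dp[i-1][j-1];
                if (dp[i-1][j] < best) best = dp[i-1][j];
                if (dp[i][j-1] < best) best = dp[i][j-1];
            }
            dp[i][j] = cost[i][j] + best;
        }
    }
}

The same caveat applies: the parallel regions are short near the corners, so the overhead noted above eats into the speedup on small matrices.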
I'd start with algorithmic improvements. There's no need to test N^2 solutions.
One key is the direction from which you entered a square. If you entered it by moving downward, there's no need to check the square to the right. Likewise, if you entered it by moving right, there's no need to check the path downward from there. The destination of a right-angle turn can always be reached via a diagonal move, leaving out one square and its positive weight/cost.
As far as threading goes, I can see (at least) a couple of ways of splitting things up. One would be to simply queue up requests as you enter a square. I.e., instead of (for example) testing another square, it queues up requests to test its two or three exits. N threads process those requests, which generate more requests, continuing until all of them reach the end point.
This has the obvious disadvantage that you're likely to continue traversing some routes after serial code could abandon them, because they're already longer than the shortest route you've found so far.
Another possibility would be to start two threads, one traversing forward, the other backward. In each, you find the shortest route to any given point along the diagonal, then you're left with a purely linear scan through those candidates to find the shortest sum.

Reducing sample bit-depth by truncating

I have to reduce the bit-depth of a digital audio signal from 24 to 16 bit.
Is taking only the 16 most significant bits of each sample (i.e. truncating) equivalent to doing a proportional calculation (out = in * 0xFFFF / 0xFFFFFF)?
You'll get better sounding results by adding a carefully crafted noise signal to the original signal, just below the truncating threshold, before truncating (a.k.a. dithering).
x * 0xffff / 0xffffff is overly pedantic, but not in a good way if your samples are signed -- and probably not in a good way in general.
Yes, you want the maximum value in your source range to match the maximum value in your destination range, but the values used there are only for unsigned ranges, and the distribution of quantisation steps means that it'll be very rare that you use the largest possible output value.
If the samples are signed then the peak positive values would be 0x7fff and 0x7fffff, while the peak negative values would be -0x8000 and -0x800000. Your first problem is deciding whether +1 is equal to 0x7fff, or -1 is equal to -0x8000. If you choose the latter then it's a simple shift operation. If you try to have both then zero stops being zero.
After that, you have the problem that division rounds towards zero. This means that too many values get rounded to zero compared with other values, which causes distortion.
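In C, for instance (relying on the common arithmetic-shift behaviour for negative values, the same assumption the shift-based code below makes):

#include <stdio.h>

int main(void) {
    printf("%d %d\n", -200 / 256, -200 >> 8);  /* prints 0 -1: division truncates toward zero,
                                                  right shift rounds toward negative infinity */
    return 0;
}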
If you want to scale according to the peak positive values, the correct form would be:
out = rint((float)in * 0x7fff / 0x7fffff);
If you fish around a bit you can probably find an efficient way to do that with integer arithmetic and no division.
This form should correctly round to the nearest available output value for any given input, and it should map the largest possible input value to the largest possible output value, but it's going to have an ugly distribution of quantisation steps scattered throughout the range.
Most people prefer:
out = (in + 128) >> 8;
if (out > 0x7fff) out = 0x7fff;
This form makes things the tiniest bit louder, to the point that positive values may clip slightly, but the quantisation steps are distributed evenly.
You add 128 because right-shift rounds towards negative infinity. The average quantisation error is -128 and you add 128 to correct this to keep 0 at precisely 0. The test for overflow is necessary because an input value of 0x7fffff would otherwise give a result of 0x8000, and when you store this in a 16-bit word it would wrap around giving a peak negative value.
C pedants can poke holes in the assumptions about right-shift and division behaviour, but I'm overlooking those for clarity.
However, as others have pointed out you generally shouldn't reduce the bit depth of audio without dithering, and ideally noise shaping. TPDF dither is as follows:
out = (in + (rand() & 255) - (rand() & 255)) >> 8;
if (out < -0x8000) out = -0x8000;
if (out > 0x7fff) out = 0x7fff;
Again, big issues with the usage of rand() which I'm going to overlook for clarity.
I assume you mean (in * 0xFFFF) / 0xFFFFFF, in which case, yes.
Dithering by adding noise will in general give you better results. The key to this is the shape of the noise. The popular pow-r dithering algorithms have a specific shape that is used in a lot of digital audio workstation applications (Cakewalk's SONAR, Logic, etc).
If you don't need the full-on fidelity of pow-r, you can simply generate some noise at fairly low amplitude and mix it into your signal. You'll find this masks some of the quantization effects.
