printing a very large number in python takes forever - python-3.x

I wanted to calculate a very large number, and since Python supports arbitrarily large integers I thought that would be perfect.
So, here it is:
import math
x=2**24
y=3840*2160
z=x**y
print("z is calculated")
print(z)
Well, the last thing I see is "z is calculated", so it is not the calculation itself that is the problem.
But even after an hour I do not see any other output.
So can someone explain what is going on here?
PS: z has about 60 million digits...

So, this is expected behaviour and "won't fix".
The algorithm CPython uses to convert an integer to a decimal string has quadratic complexity, and an improvement was considered too much effort since the case comes up so rarely.
As a workaround it was recommended that I use a package based on GMP, which is better suited to this.
I just hope this does not open the door to DoS attacks on servers where an attacker can supply such a number as input; it is enough for the server to try to log that number for it to freeze up.
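For what it's worth, here is a rough sketch of that workaround, assuming the gmpy2 package (one common Python binding to GMP) is installed. As a side note on the DoS worry: newer CPython releases cap int-to-str conversion at a few thousand digits by default and raise ValueError beyond that (see sys.set_int_max_str_digits), precisely to blunt this kind of attack.
import math
import gmpy2  # third-party GMP bindings; assumed installed via "pip install gmpy2"

x = 2**24
y = 3840 * 2160

# Same value as before, but held as a GMP integer, whose decimal conversion
# uses a sub-quadratic algorithm.
z = gmpy2.mpz(x) ** y

# Cheap sanity check that never converts the whole number:
# z = 2**(24*y), so log10(z) = 24*y*log10(2).
digits = int(24 * y * math.log10(2)) + 1
print("z has about", digits, "decimal digits")

# Converting via GMP finishes in seconds rather than hours; print a prefix.
print(str(z)[:60], "...")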

Related

Did I test the ArrayFire performance incorrectly?

I cannot figure out what's wrong. I mean, the speed is way too fast: 1 million items and 10 million items both take basically the same 0.0005 seconds to compute on my machine. It's so fast it looks like it isn't doing anything, yet the resulting data is actually correct.
It is mind-boggling, because if I do a similar computation in a sequential loop without storing the result in an array, it is not just slower by the number of cores, but something like 1000 times slower than ArrayFire.
So, maybe I wasn't using the timer correctly?
Do you think it doesn't actually compute the data right away? Maybe it just records some kind of placeholder, and when I call myArray.host() it starts doing all the actual computations?
Their website says there is some kind of JIT that bundles the computations:
ArrayFire uses Just In Time compilation to combine many light weight functions into a single kernel launch. This along with our easy-to-use API allows users to not only quickly prototype their algorithms, but also get the best out of the underlying hardware.
I start/stop my timer right before/after a few ArrayFire computations, and it is just insanely fast. Maybe I'm testing it wrong? What's the proper way to test ArrayFire performance?
Never mind, I found out what to do.
Based on the examples, I should be using af::timeit(function) instead of af::timer. af::timeit reports much slower times, but the results scale far more reasonably when I increase the size 10x. ArrayFire doesn't actually compute right away, which is why using af::timer myself didn't work.
thank you
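To make the lazy-evaluation point concrete, here is a small standalone sketch in plain Python (not ArrayFire; the LazyArray class is purely illustrative) of why a stopwatch around the "compute" calls measures almost nothing:
import time

class LazyArray:
    def __init__(self, data):
        self.data = data
        self.pending = []            # operations recorded, but not yet executed

    def multiply(self, k):
        self.pending.append(lambda xs: [x * k for x in xs])
        return self                  # returns immediately; nothing computed yet

    def host(self):                  # analogous to myArray.host()
        result = self.data
        for op in self.pending:
            result = op(result)      # all the deferred work happens here
        return result

a = LazyArray(list(range(1_000_000)))

t0 = time.perf_counter()
a.multiply(3).multiply(5)            # looks "insanely fast"
print("enqueue took", time.perf_counter() - t0, "s")

t0 = time.perf_counter()
a.host()                             # the actual computation runs now
print("host() took", time.perf_counter() - t0, "s")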

Just a couple of really basic syntax questions (Basic)

I'm in the process of updating an old BS2 (BASIC Stamp) rev to an Arduino for a piece of hardware for the company I work for, and I've just encountered a couple of pieces of code that I'm not too sure about and I'd like some clarification. I've looked around for a bit, but a couple of these are just not documented anywhere. I can't post the full code here for obvious reasons, so I'll clarify other information as needed.
CONVERT_AD:
CONFIG_AD = CONFIG_AD |%1011 'Set all bits except channel.
LOW CHIP_SELECT 'Activate the ADC.
SHIFTOUT DATA_IO,CLOCK,LSBFIRST,[CONFIG_AD\4] 'Send config bits.
SHIFTIN DATA_IO,CLOCK,MSBPOST,[AD_RESULT\12] 'Get data bits.
HIGH CHIP_SELECT 'Deactivate the ADC.
RETURN
The line that's got me here is CONFIG_AD = CONFIG_AD |%1011. It obviously appears to involve a binary value, but I don't know what the operators are doing in this case. It looks like some kind of assignment.
The value of CONFIG_AD is a word, if that's of any importance. It is driving a pin for half-duplex communication with an LTC1298 CN8 A/D converter. I've also read the datasheet, but it doesn't provide a lot of information about this. I think it's a 12-bit device (though I'm not sure).
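(For reference, here is a quick Python rendering of what that line appears to do, on the assumption that PBASIC's | is a bitwise OR and the % prefix marks a binary literal; the starting value below is made up purely for illustration:)
# Hypothetical starting value, just to show the effect of the operation.
CONFIG_AD = 0b0100

# Python equivalent of: CONFIG_AD = CONFIG_AD | %1011
# The bitwise OR forces bits 0, 1 and 3 on while leaving every other bit alone.
CONFIG_AD = CONFIG_AD | 0b1011

print(bin(CONFIG_AD))  # 0b1111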
Just a lot of this information is really outdated and not maintained, so finding good information is really proving to be a bitch.
Also, with the SHIFTIN/SHIFTOUT lines, I'm curious why they have the two division-like operations in the brackets. It seems to be converting to another format; can anyone explain why this is?
And on a slightly related note, more of a save-me-time question than a need-to-know: for the BASIC Stamp, does anyone know its clock rate in hertz?

Unknown events in nodejs/v8 flamegraph using perf_events

I'm trying to do some Node.js profiling using Linux perf_events, as described by Brendan Gregg here.
The workflow is as follows:
1. Run node >0.11.13 with --perf-basic-prof, which creates a /tmp/perf-(PID).map file where the JavaScript symbol mappings are written.
2. Capture stacks using perf record -F 99 -p `pgrep -n node` -g -- sleep 30
3. Fold stacks using the stackcollapse-perf.pl script from this repository
4. Generate the SVG flame graph using the flamegraph.pl script
I get the following result (which looks really nice at first glance):
The problem is that there are a lot of [unknown] elements, which I suppose should be my Node.js function calls. I assume the whole process fails somewhere at step 3, where the perf data should be folded using the mappings generated by node/v8 run with --perf-basic-prof. The /tmp/perf-PID.map file is created and some mappings are written to it during node execution.
How can I solve this problem?
I am using CentOS 6.5 x64, and I have already tried this with node 0.11.13 and 0.11.14 (both prebuilt and compiled from source) with no success.
First of all, what "[unknown]" means is that the sampler couldn't figure out the name of the function, because it's a system or library function.
If so, that's OK - you don't care, because you're looking for things responsible for time in your code, not system code.
Actually, I'm suggesting this is one of those XY questions.
Even if you get a direct answer to what you asked, it is likely to be of little use.
Here are the reasons why:
1. CPU Profiling is of little use in an I/O bound program
The two towers on the left in your flame graph are doing I/O, so they probably take a lot more wall-time than the big pile on the right.
If this flame graph were derived from wall-time samples, rather than CPU-time samples, it could look more like the second graph below, which tells you where time actually goes:
What was a big juicy-looking pile on the right has shrunk, so it is nowhere near as significant.
On the other hand, the I/O towers are very wide.
Any one of those wide orange stripes, if it's in your code, represents a chance to save a lot of time, if some of the I/O could be avoided.
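Here is a tiny standalone sketch (plain Python, my own illustration, not taken from the question's setup) of why a CPU-time measurement simply cannot see blocking I/O, while wall-time can:
import time

cpu0, wall0 = time.process_time(), time.perf_counter()

sum(i * i for i in range(2_000_000))  # CPU-bound work: visible to both clocks
time.sleep(1.0)                       # stands in for blocking I/O: invisible to the CPU clock

print("CPU time :", round(time.process_time() - cpu0, 3))
print("wall time:", round(time.perf_counter() - wall0, 3))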
2. Whether the program is CPU- or I/O-bound, speedup opportunities can easily hide from flame graphs
Suppose there is some function Foo that really is doing something wasteful, that if you knew about it, you could fix.
Suppose in the flame graph, it is a dark red color.
Suppose it is called from numerous places in the code, so it's not all collected in one spot in the flame graph.
Rather it appears in multiple small places shown here by black outlines:
Notice, if all those rectangles were collected, you could see that it accounts for 11% of time, meaning it is worth looking at.
If you could cut its time in half, you could save 5.5% overall.
If what it's doing could actually be avoided entirely, you could save 11% overall.
Each of those little rectangles would shrink down to nothing, and pull the rest of the graph, to its right, with it.
Now I'll show you the method I use. I take a moderate number of random stack samples and examine each one for routines that might be speeded up.
That corresponds to taking samples in the flame graph like so:
The slender vertical lines represent twenty random-time stack samples.
As you can see, three of them are marked with an X.
Those are the ones that go through Foo.
That's about the right number, because 11% times 20 is 2.2.
(Confused? OK, here's a little probability for you. If you flip a coin 20 times, and it has an 11% chance of coming up heads, how many heads would you get? Technically it's a binomial distribution. The most likely number you would get is 2; the next most likely numbers are 1 and 3. (If you only get 1, you keep going until you get 2.) Here's the distribution:)
(The average number of samples you have to take to see Foo twice is 2/0.11 = 18.2 samples.)
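(As a quick sanity check of those numbers, here are a few lines of standard-library Python, my own addition, that print the distribution just described:)
from math import comb

n, p = 20, 0.11
for k in range(6):
    prob = comb(n, k) * p**k * (1 - p)**(n - k)
    print(f"P({k} of {n} samples hit Foo) = {prob:.3f}")

print("expected hits:", n * p)                 # 2.2
print("mean samples to see Foo twice:", 2 / p) # about 18.2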
Looking at those 20 samples might seem a bit daunting, because they run between 20 and 50 levels deep.
However, you can basically ignore all the code that isn't yours.
Just examine them for your code.
You'll see precisely how you are spending time,
and you'll have a very rough measurement of how much.
Deep stacks are both bad news and good news -
they mean the code may well have lots of room for speedups, and they show you what those are.
Anything you see that you could speed up, if you see it on more than one sample, will give you a healthy speedup, guaranteed.
The reason you need to see it on more than one sample is, if you only see it on one sample, you only know its time isn't zero. If you see it on more than one sample, you still don't know how much time it takes, but you do know it's not small.
Here are the statistics.
Generally speaking it is a bad idea to disagree with a subject matter expert but (with the greatest respect) here we go!
SO urges answerers to do the following:
"Please be sure to answer the question. Provide details and share your research!"
So the question, at least as I interpret it, was: why are there [unknown] frames in the perf script output (and how do I turn those [unknown] frames into meaningful names)?
This question could be about "how do I improve the performance of my system?", but I don't see it that way in this particular case. There is a genuine problem here with how the perf record data has been post-processed.
The answer to the question is that although the prerequisite setup is correct (the correct node version, and the correct argument, --perf-basic-prof, present to generate the function names), the generated perf map file must be owned by root for perf script to produce the expected output.
That's it!
While writing some new scripts today I hit upon this, which directed me to this SO question.
Here's a couple of additional references:
https://yunong.io/2015/11/23/generating-node-js-flame-graphs/
https://github.com/jrudolph/perf-map-agent/blob/d8bb58676d3d15eeaaf3ab3f201067e321c77560/bin/create-java-perf-map.sh#L22
[ non-root files can sometimes be forced ] http://www.spinics.net/lists/linux-perf-users/msg02588.html

What (else) is wrong with using time as a seed for random number generation?

I understand that time is an insecure seed for random number generation because it effectively reduces the size of the seed space.
But say I don't care about security. For example, say I'm doing a Monte Carlo simulation for a card game. I DO however, care about getting as close to true randomness as possible. Will time as a seed affect the randomness of my output? I would think the choice of PRNG matters more than the seed in this case.
For security purposes you obviously need a high entropy seed. And time alone cannot provide that.
For simulation purposes the quality of the seed doesn't matter much, as long as it's unique. As you noted the quality of the PRNG is more important here.
Even a PRNG in a game may need to be secure. For example, in multiplayer games a player might find out the internal state of the PRNG and use that to predict future random events, guess the opponent's cards, get better loot, ...
One common pitfall when using time to seed a PRNG is that the time doesn't change very often. For example, on Windows most time-related functions only change their return value every few milliseconds, so all PRNGs created within that interval will return the same sequence.
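Here is a quick standard-library Python demonstration of that pitfall (the seconds-resolution seed is chosen deliberately to make the collision obvious):
import random
import time

# Three generators "independently" seeded within the same clock tick...
gens = [random.Random(int(time.time())) for _ in range(3)]

# ...produce exactly the same first value (and the same whole stream).
print([g.random() for g in gens])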
Just for the sake of completeness, this paper by Matsumoto et al. nicely illustrates how important the initialization scheme (ie. the way of choosing your seed(s)) is for simulation. Turns out a bad initialization scheme may strongly bias the results, even though the RNG algorithm as such is rather good in principle.
If you are just running a single instance of your program, then there should not be too many problems.
However, I have seen people start multiple programs at the same time, with each program seeding from the time. In that case all the programs get the same sequence of random numbers. In particular, I have seen people seed an Apache process on each call to use a random number as a session ID, only to find that different people hitting the web server at the same time got exactly the same IDs.
Hence if you expect to run multiple simultaneous instances of the program, using time is a very bad idea.
Suppose your program runs very fast and asks for the system time to use as a seed many times in quick succession. You could get the same time back each call, so it would end up generating the same random numbers. So even in a simulation, low entropy can be a problem.
Considering that it is not that hard to tap some other sources of entropy on your system, or that your operating system can itself provide you with some almost-random numbers, you could use them to increase the entropy of your time-based seed.
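A rough sketch of that idea in Python (standard library only; the names here are just for illustration, and in practice os.urandom or random.SystemRandom already does the heavy lifting):
import hashlib
import os
import random
import time

# Mix the clock with the process id and some OS-provided randomness,
# then hash the lot down to a fixed-size seed.
material = f"{time.time_ns()}-{os.getpid()}".encode() + os.urandom(16)
seed = int.from_bytes(hashlib.sha256(material).digest()[:8], "big")

rng = random.Random(seed)
print(rng.random())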

Do you use "kibibyte" as a unit of measurement in your programs? [closed]

For decades, in the field of computing (except disk manufacturers), a KB (kilobyte) was understood to mean 1024 bytes. In the past few years, there has been a movement to use KiB ("kibibyte") to mean 1024 bytes, and change the meaning of kilobyte to be 1000 bytes, dooming us to many more years of confusion. On the other hand, the movement seems to be confined to Gnome, and some overzealous wikipedia editing.
Will you be converting your programs to use KiB? If you have ever displayed a filesize in KB, did you divide by 1000 or 1024?
KB is 1024 bytes, damnit.
I did this once before in an app. While internally it used kibis and mebis (KiB, MiB, etc.), it would still display in what users (in this case IT folks) were used to. The underlying field was just a long holding bytes, IIRC.
It was forward compatible, and would at least allow you to enter 4 GB as well as 4 GiB. It also understood shorthand entry like 4.5G and properly rounded back to the real number of bytes, rather than forcing the poor user to enter it that way, and it prevented their mistakes. Updating it to use IEC notation is one line of code.
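(A minimal sketch of that approach, with hypothetical helper names since the original app's code isn't shown: store plain bytes internally, accept shorthand like "4.5G", and make the display convention a single switch:)
UNITS = {"K": 1, "M": 2, "G": 3, "T": 4}

def parse_size(text, base=1024):
    """Turn '4.5G', '4 GiB', '500' etc. into a number of bytes."""
    text = text.strip().upper().rstrip("IB")
    if text and text[-1] in UNITS:
        return round(float(text[:-1]) * base ** UNITS[text[-1]])
    return round(float(text))

def format_size(nbytes, base=1024):
    """Render a byte count in either the 1024 (KiB) or 1000 (KB) convention."""
    suffixes = ["B", "KiB", "MiB", "GiB", "TiB"] if base == 1024 else ["B", "KB", "MB", "GB", "TB"]
    value, idx = float(nbytes), 0
    while value >= base and idx < len(suffixes) - 1:
        value /= base
        idx += 1
    return f"{value:.1f} {suffixes[idx]}"

print(parse_size("4.5G"))                   # 4831838208
print(format_size(4831838208))              # 4.5 GiB
print(format_size(4831838208, base=1000))   # 4.8 GB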
Kilos are 1000, and 98% of the world uses metric. We need to get over it already.
I see a lot of anger in many of these responses, which baffles me. SI prefixes are SI prefixes, and programmers have no right to alter them for no better reason than convenience and custom. It's odd that those in computer science, a highly technical field, are the ones clamoring to go back to the days of cubits, furlongs, and rods. wtf?
We all know what we mean, but sticking to custom alienates and confuses users. Just because in the early pioneer days some guys, when talking about computer memory, decided to reuse SI notation doesn't mean they were correct to do so.
In my opinion, the idea that 1 kilobyte equals 1000 bytes is something drive makers want you to believe, so that your drive looks more spacious than it really is. ;)
Since I spent a few years learning to be a mechanical engineer before switching majors, I have to admit that "kilo" always means 10^3 to me. From that standpoint, KiB makes sense. However, try saying "kibibyte" out loud a few times, and think about how dumb you sound.
Therefore, kilogram is 1000 grams, kilobyte is 1024 bytes.
Addendum: In addition, I agree with those who have been saying that we shouldn't change what is already established if it works. 1024 is simply a nicer number in binary. Also, "kibibyte" still sounds like something a dog eats.
It's not changing the meaning of "kilobyte". Kilo means 1000. Some people were using it incorrectly to refer to units of 1024 bytes.
I never display file sizes in kibibytes, because users don't care about 1000 vs 1024. Instead, I always use "XXX KB/MB/GB", where XXX is the number of bytes divided by 1 thousand / 1 million / etc.
There are 2 ways to think about this:
Use what the operating system you're running on uses. That way users have a consistent experience.
Use what is correct.
If you always use KiB, though, there will be no confusion. If you use KB there will be confusion. So if you choose option #2 then you're better off actually using 1024 and using the KiB suffix. Working with powers of 2 is more efficient anyway.
It's up to you but my rule of thumb would be that if you have a technical audience, then use KiB and avoid all confusion. If you have a large user base of non technical users, then use what your operating system uses. By the way Windows uses KB to mean 1024 bytes.
Areas of speciality have always used terms in ways that are understood by that specialisation. For example, a mechanical engineer building a bridge uses the term "stress" to mean something completely different from, say, a lawyer who finds out his star witness has been lying on the first day in court. Should we mandate that the engineer use the same definition for "stress" as the lawyer just because that definition is more widely used? If we do, I'm not driving across that bridge!
Kilobytes = 1024 bytes. It's an industry-accepted specialisation of the term.
I use KiB.
Do you really want to hurt everyone by refusing to use well-established standards just like IE?
I've always displayed file sizes in 1000-byte kilobytes. It hardly ever matters to the people who can't tell the difference, and it often relieves confusion when they see the actual number. 65,323 bytes = 65 KB when rounded, and the "normal" people like that.
I probably won't ever display "KiB", since that's never what my customers want.
The arrogance of deciding not to follow the standard created by more than just the computer community (see... it isn't "new" that Kilo actually means 1000) is staggering.
Only if the situation called for it. In almost all cases, 1,000-based units are more appropriate.
The only exceptions I know of are memory, since it naturally comes in multiples of a power of two, and CD size, since it's measured in multiples of 2^20 bytes by the manufacturers. Everything else, including hard drives, DVDs, flash drives, bandwidths, processor speeds, memory buses, etc. is currently measured in 1000s, and file sizes should be, too. (Or, at least, me and Steve Jobs think so. Windows will probably continue measuring file sizes in 1024s for years...)
To avoid confusing the user, use k- = 1,000, and Ki- = 1,024.
The sloppy usage of "k" to mean 1024 is an unholy abomination that should be killed with fire.
Mac OS X doesn't use KiB, MiB, GiB. On the other hand, when it uses the metric ones, it at least does the maths correctly:
Personally I prefer to get this stuff right so that users who are currently in the dark would learn from it. Waiting for users to change first is just foolish. Users didn't suddenly wake up some day and think that a kilobyte is 1024 bytes - it was software which made them think that, so shouldn't it be software's job to correct the mistake?
I've worked in the storage industry for a decade. Arguments over the size of a TB can vary the size of a solution by 10%. In short: programmers and the storage industry use different measurements. Neither are right all the time.
The Storage Networking Industry Association (SNIA) dictionary defines kilobyte as:
Kilobyte (KB)
[General] 1,000 (10^3) bytes.
The SNIA uses the 10^3 convention commonly found in storage and data transfer-related literature rather than the 1,024 (2^10) convention common in computer system random access memory and software literature.
My rule of thumb is:
Measure memory, files, file systems, and data on a network in 1024^n byte blocks.
Measure raw disk space — and only raw disk space — in 1000^n byte blocks.
Tell the customer which unit you're using. Repeat yourself often.
By and large, that keeps me out of trouble.
One program I'm working on uses "KiB" by default, but has a user preference as to which unit of measurement to use (1024 B in a KiB, 1024 B in a KB, or 1000 B in a KB).
No. 1024 bytes is a kilobyte, regardless of whether that makes sense.
The usage of the "kilo-" prefix for units of 1024 bytes back in the day was probably a mistake. But it's now the standard. Trying to change it now only adds to the confusion.
We don't deal with the world as it should be; we deal with the world as it is.
Technically KiB is correct, but I have seen it in only a few applications (mainly Linux console apps). Users are either used to working with 1024 for both KB and KiB (IT people), or they don't really care and will think that "KiB" is misspelled (non-IT people).
However: I have been used to working with "kilobyte = 1024 bytes" for over 20 years now, and even though I know it is scientifically wrong I will go on using it.
If you need to provide KiB to let your soul rest, make it available as an option, but don't confuse poor users with yet another definition, especially if they work with an OS that uses the non-scientific approach and defines KB as 1024.
(BTW: Kibibytes always reminds me of Tinky Winky and his friends... ;) )
I tried to start using these terms when teaching my students, but I've sort of given up now.
I've taught an introductory computer course ("and this is a disk drive") a few times, and it can be confusing for the students that the prefixes mean different things in different contexts. Kilo means 1024 when you have a kilobyte or a kilobit of data, except if you store it on disk when it is 1000, and if you send a kilobit per second over a network then it is 1000, and a kilohertz is of course 1000 too. And one kilometer of fiber cable is 1000 meters! But it turns out that it really isn't that much of a problem. The engineering and computer science students need to know the difference, and they will get used to it anyway. When I meet them again in database courses or in the compiler course, there is never any confusion about the different kinds of kilos, megas and teras. And students from other areas (media design and so on) don't really care.
And after I did an informal poll among the other computer science people in my corridor at the university, and found out that most of them had never heard of these new prefixes, I definitely gave up.
A KB is 1024 bytes
A kB is 1000 bytes
Unfortunately, spelled out, it is ambiguous. I always use 1024.
Knuth refers to MB as KKBytes or kkBytes to differentiate between 1024*1024 and 1000*1000
I have honestly never heard of this & I doubt it's going to gain much traction in mainstream usage. I can't imagine why I would want to start doing this. The current definition of kilobyte is accurate & sufficient. I would much rather see hard drive manufacturers start using accurate terminology rather than further dumb-down technical terminology. Why can't manufacturers either build drives that are exactly xGB in size or simply say what they really are?
Other than rants about how the terminology needs to change, I have never heard those expressions used. It is not going to catch on.
I'm still going by measurements of 2^(10*n) until computers are based on decimal...
Kilo means 10^3 when you're working in the decimal number system.
Kilo means 2^10 when you're working in the binary number system.
I mean, just look at it... they're both quite arbitrary. It seems to me that anything else is equally arbitrary - so we have 40-year entrenched arbitrary versus brand-new arbitrary. Which should win? For now, I vote for the entrenched method, simply because it will cause less total confusion.
At some point our technology is bound to change - think quantum/genetic computers - that point will be a good opportunity to sanitize our measuring system.
Also, some users will always be confused - should we remove confusion for them at the risk of confusing the community that makes it all happen (us and the hardware guys)? I think not.
For me, this is a bit like the 'hacker' arguments we had, back in the day.
Depending on how old and stubborn you are, 'hacker' may mean a different thing to you. For a while in the media (and probably still today, partly) people consider hacking to be the act of breaking into machines illegally. However, in the industry now, the feeling people get is that it is someone who enjoys tinkering with things.
For a while the security community wasn't sure if this would take off, and we actually tried to use 'cracker' to refer to the bad guys. I don't think cracker has really taken off like we'd like, but we have reclaimed 'hacker' as a legitimate term, to quite a reasonable degree of success.
So to me this is the same: just because the media has tried to consider a KB as 1,000, I will never back down, and always stand up for the rights of the remaining 24 bits.
24bFL
Drivemaker/denary Kilobytes can burn in hell. Binary units for binary machines.
