Would a compressor work if there are no equal items? - zip

I am starting to learn about compressors, and the basic idea of generic compressors is to collect repeated items in a dictionary to reduce the size of the whole thing. An example with words would be:
"I am in stack overflow.I am in stack overflow. I am in stack
overflow. I am in stack overflow. Hello. I am in stack overflow. I am
in stack overflow. I am in stack overflow. I am in stack overflow.
Bye."
So in the Dictionary we'd have:
A:"I am in stack overflow."
AAAAHello.AAAABye.
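That substitution can be sketched in a few lines of Python (a toy illustration of the dictionary idea, not a real compressor):

text = "I am in stack overflow." * 4 + "Hello." + "I am in stack overflow." * 4 + "Bye."

# Keep the repeated phrase in a dictionary and replace each occurrence
# with the one-byte token "A", exactly as in the example above.
dictionary = {"A": "I am in stack overflow."}
compressed = text.replace(dictionary["A"], "A")

print(len(text), len(compressed))  # 194 vs 18: the repeats collapse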
Would a compressor reduce size if there are no similar items? Or is it even possible for there to not be similar items?

Yes, text can be losslessly compressed even if there are no repeating strings, so long as the symbols appear with uneven frequency. For example, if only 36 of the possible 256 byte values are used in a message, then it can be compressed to about 65% of its size, since each symbol carries only log2(36) ≈ 5.17 bits of information instead of 8.
Yes, of course it's possible to have no repeating strings.
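A minimal sketch of that first point, using Python's standard zlib module (DEFLATE, which includes Huffman coding): random bytes drawn from a 36-symbol alphabet contain no meaningful repeated strings, yet still compress to roughly the predicted 65%:

import math, random, zlib

# 100 kB of bytes drawn uniformly from a 36-symbol alphabet. The sequence
# is random, so there are no exploitable repeats, but each byte carries
# only log2(36) ~= 5.17 bits of information rather than 8.
alphabet = b"abcdefghijklmnopqrstuvwxyz0123456789"
data = bytes(random.choice(alphabet) for _ in range(100_000))

packed = zlib.compress(data, level=9)
print(len(packed) / len(data))  # roughly 0.65
print(math.log2(36) / 8)        # 0.6459..., the entropy bound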

Related

How do I avoid stack overflow at compile time? [closed]

General performance advice for Rust is to try to avoid placing things on the heap if possible.
An issue I am having is that I do not know where/when the size limit of a function stack will be reached, until my program panics unpredictably at runtime.
Two examples are:
Parsing deeply nested structs from JSON using Serde.
Creating many futures inside a function.
Questions:
Can I avoid this by detecting it at compile time?
How can I know what the limit of the stack is whilst I am writing code? Do others just know the exact size of their variables?
Why do people advise to try to avoid the heap?

Image conversion and Inode usage [closed]

This question has been flagged as irrelevant, so I guess it has no real worth to anyone. I tried removing the question, but the system won't let me, so I am now truncating the content of this post ;)
I think you need to run the actual numbers for both scenarios:
On the fly
how long does one image take to generate, and do you want the client to wait that long?
do you need to pay by CPU utilization, number of CPUs, etc., and what will this cost for X images thumbnailed Y times over 1 year?
Stored
how much space will this use and what will it cost?
how many files are there? Is the number bigger than the number of inodes in the destination file system, or is the total estimated size bigger than the file system?
It's mostly an economics question; there is no general yes/no answer. When in doubt, I'd probably go with storing them, since thumbnailing is a computation-intensive task and it's not very efficient to do it over and over again. You could also do a hybrid solution: generate a thumbnail on the fly when it is first requested, then cache it until it hasn't been used for a certain number of days.
TL;DR: number of inodes is probably your least concern.
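A minimal sketch of that hybrid approach in Python, assuming the Pillow imaging library; the cache directory and the 30-day expiry are made-up placeholders:

import os, time
from PIL import Image  # Pillow, assumed to be installed

CACHE_DIR = "/var/cache/thumbs"  # hypothetical cache location
MAX_AGE = 30 * 24 * 3600         # evict thumbnails unused for 30 days

def get_thumbnail(source_path, size=(128, 128)):
    # Return a cached thumbnail path, generating it on first request.
    thumb_path = os.path.join(CACHE_DIR, os.path.basename(source_path))
    if not os.path.exists(thumb_path):
        image = Image.open(source_path)
        image.thumbnail(size)  # resize in place, preserving aspect ratio
        image.save(thumb_path)
    os.utime(thumb_path)  # touch the file so eviction sees recent use
    return thumb_path

def evict_stale():
    # Delete thumbnails that have not been requested recently.
    cutoff = time.time() - MAX_AGE
    for name in os.listdir(CACHE_DIR):
        path = os.path.join(CACHE_DIR, name)
        if os.path.getmtime(path) < cutoff:
            os.remove(path)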

Why does cos(2^27) fail? [closed]

It seems Excel knows how to calculate =cos(2^27-1) but fails to calculate =cos(2^27), which returns #NUM!. Does anyone know why?
I have no idea what sort of arithmetic Excel uses internally, but at some point, with a large enough argument, the error after a mod 2*pi reduction becomes too substantial to produce a reliable answer. Presumably they picked 2^27 as their cutoff.
This behavior is not well documented. The Sin function documentation indicates that the argument is a Double, and the specified limits in the documentation indicate that the double type is stored as a 64-bit number ranging from 4.94E-324 to 1.797E308 (for positive numbers).
I suspect it is not coincidental that 2^27 (134,217,728) bytes is precisely 128 megabytes, and it seems likely that there is an internal limitation for some trig functions (e.g. COS, SIN and TAN, but interestingly, NOT for TANH, etc.). This is not to say that this amount of memory consumption would be required - it's just that a programmer's implementation could have some (potentially unnecessary) limits on these types of inputs internally.
To get around this silly limit, simply use the following:
=COS(MOD(2^27, 2*PI()))
This works because the limitation does not exist for other operations, and is nowhere to be seen in the Excel Specifications and Limits. :-)
It would be good if the documentation linked above provided a description of these limits, but unfortunately, it does not.
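For comparison, a standard IEEE-754 double handles this fine. A quick check in Python (whose math.cos does full argument reduction) suggests the limit is Excel's, not the number format's, and also shows why the MOD workaround is close but not bit-exact:

import math

x = 2.0 ** 27  # 134217728, exactly representable as a double

direct = math.cos(x)                           # libm reduces the argument precisely
reduced = math.cos(math.fmod(x, 2 * math.pi))  # the MOD workaround from above

# The two agree to roughly 8 significant digits. The small difference comes
# from reducing modulo the rounded double value of 2*pi, accumulated over
# some 2e7 periods; at Excel's display precision it is irrelevant.
print(direct, reduced)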

Linux free command meaning [closed]

I've got some research to do about Linux. One of the questions is the following:
'Is it possible that on a running system the output of the command free in the free column is the same for the first 2 rows? How can you force this?'
I googled around, and I believe I've found the meaning of these values.
If I add buffers and cache together and add that to free, it gives me the free value of the buffers/cache row - that is, the memory that could be used by applications if required.
So I suppose the values could be the same for those 2 rows, depending on usage. But I've no idea how I could force this.
The difference between the first and the second row is that the first row does not count buffers and caches in the free column, while the second one does. Here is an example:
                          total     used     free   shared  buffers   cached
[1] Mem:                4028712  2972388  1056324        0   315056   835360
[2] -/+ buffers/cache:           1821972  2206740
used[2] = used[1] - buffers - cached
free[2] = free[1] + buffers + cached
So the answer to your question is: it is possible for these two rows to be identical (or at least very close to each other), but it is not likely on a real system, as it requires you to free or exhaust all of the cache. If you are willing to experiment, try dropping the page cache (for example, as root: sync; echo 3 > /proc/sys/vm/drop_caches) or write a program that eats up all available RAM.
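Plugging the sample numbers above into those two formulas confirms the relationship; a quick check in Python:

total, used, free, shared, buffers, cached = (
    4028712, 2972388, 1056324, 0, 315056, 835360)

# Row [2] is row [1] with buffers and cache moved to the free side.
print(used - buffers - cached)  # 1821972, the used column of row [2]
print(free + buffers + cached)  # 2206740, the free column of row [2]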

Destroy a large amount of data as quickly as possible? [closed]

How would you go about securely destroying several hundred gigabytes of arbitrary data as quickly as possible?
Incinerating hard drives is a slow, manual (and therefore insecure) process.
Physically destroying the drives does not (necessarily) take a significant amount of time. Consider, for example, http://www.redferret.net/?p=14528 .
I know the answer but this seems like one of those questions best left unanswered unless you know why it's being asked.
