It seems Excel knows how to calculate =COS(2^27-1) but fails to calculate =COS(2^27), which returns #NUM!. Does anyone know why?
I have no idea what sort of arithmetic Excel uses internally, but at some point, with a large argument, the error left after you reduce it mod 2*pi is too substantial to produce a reliable answer. Presumably they picked 2^27 as their cutoff.
This behavior is not well documented. The SIN function documentation indicates that the argument is a Double, and the documented limits of that type indicate that a Double is stored as a 64-bit number ranging from 4.94E-324 to 1.797E308 (for positive numbers).
I suspect it is not coincidental that 2^27 (134,217,728) bytes is precisely 128 megabytes, and it seems likely that there is an internal limitation for some trig functions (e.g. COS, SIN and TAN, but interestingly, NOT for TANH, etc.). This is not to say that this amount of memory consumption would be required; it's just that a programmer's implementation could have some (potentially unnecessary) limits on these kinds of inputs internally.
To get around this silly limit, simply use the following:
=COS(MOD(2^27, 2*PI()))
This works because the limitation does not exist for other operations, and is nowhere to be seen in the Excel Specifications and Limits. :-)
It would be good if the linked documentation provided a description of these limits, but unfortunately, it does not.
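For what it's worth, here is a rough Python sketch (nothing to do with Excel's internals, just standard double-precision floats; math.ulp needs Python 3.9+) that shows how much angular precision is even available at this magnitude, and how the MOD workaround compares:

    import math

    x = float(2 ** 27)

    # Spacing between adjacent representable doubles near 2^27: this is the
    # best-case uncertainty in the angle before any reduction mod 2*pi.
    print(math.ulp(x))                          # about 3.0e-08

    # Python's cos (the platform C library) typically reduces the argument
    # with extra internal precision...
    print(math.cos(x))

    # ...while reducing with a double-precision 2*pi first (the equivalent of
    # =COS(MOD(2^27, 2*PI())) above) loses a comparable amount of accuracy:
    # the two printed values agree only to roughly 8 decimal places instead
    # of the usual 15-16.
    print(math.cos(math.fmod(x, 2 * math.pi)))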
I have a financial application that deals with "percentages" quite frequently. We are using "decimal" types to avoid rounding errors. My question is:
If I have a quantity representing 76%, is it better to store it as 0.76 or 76?
Furthermore, what are the advantages and/or disadvantages from a code maintainability point of view? Are there conventions for this?
If percentage is only a small part of the problem domain, I would probably stick with a primitive number.
Per cent literally means "per hundred", so a percentage is really just a decimal number; e.g. 76 % is equal to 0.76. Thus, in the absence of units-of-measure support in the programming language itself, I would represent a percentage as a decimal number.
This also means that you don't need to perform special arithmetic in order to calculate with percentages, but you will need to multiply by 100 if you need to display the number in percent.
If percentage is a very central part of the problem domain, you should consider discarding Primitive Obsession and instead introduce a proper Value Object.
Still, even if you introduce a Percent Value Object that contains conversions to and from primitive numbers, you're still stuck with potential programmer errors, because, in itself, the number 0.9 could mean either 90 % or 0.9 % depending on how you choose to interpret it.
In the end, my best advice is to cover your code base with appropriate unit tests, so that you lock the conversion code down.
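As a rough illustration of such a Value Object (a Python sketch using decimal; the class name, method names and assertions are made up for this example), storing the fraction internally and being explicit about the two constructors removes most of the 0.9-versus-90 % ambiguity:

    from dataclasses import dataclass
    from decimal import Decimal

    @dataclass(frozen=True)
    class Percent:
        """Hypothetical value object: stores the fraction (0.76), not 76."""
        fraction: Decimal

        @classmethod
        def from_percent(cls, value):      # Percent.from_percent("76") -> 0.76
            return cls(Decimal(value) / 100)

        @classmethod
        def from_fraction(cls, value):     # Percent.from_fraction("0.76") -> 0.76
            return cls(Decimal(value))

        def of(self, amount):              # 76 % of 200 -> 152
            return self.fraction * Decimal(amount)

        def __str__(self):                 # multiply by 100 only for display
            return f"{self.fraction * 100}%"

    # Conversion behaviour locked down by simple assertions (stand-ins for unit tests):
    assert Percent.from_percent("76").fraction == Decimal("0.76")
    assert Percent.from_fraction("0.76").of(200) == Decimal("152")
    assert str(Percent.from_percent("76")) == "76.00%"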
Given a set of 10 symbols and a set of strings (at most 100, each of length at most 20) consisting of these symbols, find the maximum-length string that can be made from these symbols and that doesn't have any of the given strings as a substring. If an infinitely long string satisfying this property exists, print -1.
Apart from a brute-force algorithm, which is exponential in time, I am not able to find any solution for this.
Any hint on how to approach this problem would be appreciated.
Given a set of strings that need to be matched, my immediate reaction is to use the Aho-Corasick algorithm (http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm) to create a matcher. This matcher is a finite state machine that accepts one character at a time and tells you which state you end up in next, given that character.
So I think you can reduce the problem to taking this directed graph and a starting point and finding the longest route through the graph that does not go through the nodes corresponding to pattern matches, which we can simply delete from the graph. If a cycle of the remaining nodes is reachable from the start, the string can be extended forever and the answer is -1; otherwise what remains is a DAG, where the longest path (http://en.wikipedia.org/wiki/Longest_path) can be found in linear time. Constructing the graph is also linear, so the whole thing is linear in the total size of the input.
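Here is a rough Python sketch of that idea (the function name longest_safe_string and the small example at the end are mine, and the code assumes the forbidden strings are non-empty): it builds the Aho-Corasick automaton, skips the matching states, returns -1 if a cycle of safe states is reachable, and otherwise runs a longest-path DFS over the remaining DAG.

    import sys
    from collections import deque

    def longest_safe_string(symbols, forbidden):
        # Build the Aho-Corasick trie of the forbidden strings.
        goto = [{}]            # goto[state][symbol] -> child state
        fail = [0]             # failure links
        bad = [False]          # state completes (or contains) a forbidden string
        for pat in forbidden:
            s = 0
            for ch in pat:
                if ch not in goto[s]:
                    goto.append({})
                    fail.append(0)
                    bad.append(False)
                    goto[s][ch] = len(goto) - 1
                s = goto[s][ch]
            bad[s] = True

        # BFS: fill failure links and the complete transition table.
        trans = [{} for _ in goto]
        queue = deque()
        for ch in symbols:
            nxt = goto[0].get(ch, 0)
            trans[0][ch] = nxt
            if nxt:
                queue.append(nxt)
        while queue:
            s = queue.popleft()
            bad[s] = bad[s] or bad[fail[s]]
            for ch in symbols:
                if ch in goto[s]:
                    nxt = goto[s][ch]
                    fail[nxt] = trans[fail[s]][ch]
                    trans[s][ch] = nxt
                    queue.append(nxt)
                else:
                    trans[s][ch] = trans[fail[s]][ch]

        # Longest path over "safe" states; any reachable cycle means -1.
        sys.setrecursionlimit(10000)        # at most ~100 * 20 states here
        WHITE, GRAY, BLACK = 0, 1, 2
        color = [WHITE] * len(goto)
        best = [0] * len(goto)              # longest safe extension from each state

        def dfs(s):
            color[s] = GRAY
            length = 0
            for ch in symbols:
                t = trans[s][ch]
                if bad[t]:
                    continue
                if color[t] == GRAY:        # cycle of safe states -> infinite
                    return None
                if color[t] == WHITE and dfs(t) is None:
                    return None
                length = max(length, 1 + best[t])
            color[s] = BLACK
            best[s] = length
            return length

        result = dfs(0)
        return -1 if result is None else result

    # Example: with symbols "ab" and forbidden {"aa", "bb", "aba"},
    # the longest safe string is "bab", so this prints 3.
    print(longest_safe_string("ab", ["aa", "bb", "aba"]))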
We are building a text search solution and want a way to measure the precision and recall of the system every time we add new document types. From reading some of the posts here it sounds like a machine-learning-based solution is the way to go. Can an expert comment on this? We will then look to add machine learning folks to our team.
The only way to get an F1-score requires knowing the correct class and rank of all the samples returned by evaluation queries, and you also need those evaluation queries themselves.
Any machine-learning approach will need a large amount of manual work to provide those samples and/or queries; so large, in fact, that it won't save you any time.
Another bad aspect of this kind of evaluation is the intrinsic error introduced by the learning itself, which grows with the size of the search engine's index and with the number of examples required. You will never get a good evaluation that way.
Forget machine learning for evaluating a search engine.
Build your test queries and samples by hand; over time the set will become large and reliable.
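Once you have those hand-built queries and relevance judgements, the metrics themselves are trivial to compute; a small Python sketch (the helper name and the document IDs are made up):

    def precision_recall_f1(retrieved, relevant):
        """Hypothetical helper: hand-labelled relevance judgements in, P/R/F1 out."""
        retrieved, relevant = set(retrieved), set(relevant)
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # One hand-built test query: the IDs you judged relevant vs. what the engine returned.
    judged_relevant = {"doc1", "doc4", "doc7"}
    engine_results = ["doc1", "doc2", "doc4"]
    print(precision_recall_f1(engine_results, judged_relevant))
    # (0.666..., 0.666..., 0.666...)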
If you really want machine learning in your system, you should look at query pre-processing. Getting some meta-information about the query by other means (you say SVN, why not?) is generally good for performance, and since it doesn't change the results, you can reuse the same samples for an end-to-end evaluation.
That is what I did a few years ago, though with a naive Bayes classifier for natural language analysis.
This question has been flagged as irrelevant, so I guess it has no real worth to anyone. I tried removing the question but the system won't let me, so I am now truncating the content of this post ;)
I think you need to run the actual numbers for both scenarios:
On the fly
how long does one image take to generate and do you want the client to wait that long
do you need to pay by CPU utilization, number of CPUs, etc., and what will this cost for X images thumbnailed Y times over 1 year
Stored
how much space will this use and what will it cost
how many files are there? Is the number bigger than the number of inodes in the destination file system, or is the total estimated size bigger than the file system?
It's mostly an economics question; there is no general yes/no answer. When in doubt, I'd probably go with storing them, since thumbnailing is a computation-intensive task and it's not very efficient to do it over and over again. You could also do a hybrid solution: generate a thumbnail on the fly when it is first requested, then cache it until it hasn't been used for a certain number of days.
TL;DR: number of inodes is probably your least concern.
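If you do go for that hybrid approach, it can be as simple as the following Python sketch (assuming the Pillow library; the cache directory, thumbnail size and file naming are placeholders):

    import os
    from PIL import Image  # assumes the Pillow package is installed

    CACHE_DIR = "thumb_cache"          # hypothetical local cache directory
    THUMB_SIZE = (200, 200)

    def get_thumbnail(image_path):
        """Generate a thumbnail on first request, then serve the cached copy."""
        os.makedirs(CACHE_DIR, exist_ok=True)
        cache_path = os.path.join(CACHE_DIR, os.path.basename(image_path) + ".thumb.jpg")
        if not os.path.exists(cache_path):      # first request: pay the CPU cost once
            with Image.open(image_path) as img:
                img.thumbnail(THUMB_SIZE)
                img.convert("RGB").save(cache_path, "JPEG")
        return cache_path                       # later requests: pay only the storage cost

    # A separate scheduled job could delete cache files whose access time
    # (os.path.getatime) is older than N days, as suggested above.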
Question regarding taking the logarithm of a variable (statistics question).
Say you have a bar graph displaying data, for example "Cost of Computer Orders by the Population", and you are trying to analyze the data and find a distribution. The raw data does not suggest anything, so you take the logarithm of the variable and the graph then resembles a normal distribution. I know that the normal distribution is basically centered on the mean, but what does taking the logarithm of the data indicate?
It seems that you are describing the lognormal distribution: a random variable is said to come from a lognormal distribution if its logarithm is distributed normally.
In practice, this can describe processes where the value cannot go below zero, and most of the population is close to the left (right skewness). For example: salaries, home prices, bone fractures, number of girlfriends all could be reasonably modeled with a log normal distribution.
For example: say that on average young adults have had 2.5 girlfriends. A few have never had one; you cannot have a "negative number" of girlfriends, and a few bastards have had 25. However, most young adults will have had between, say, one and three.
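A quick numerical check of that description (a Python/NumPy sketch; the parameters are arbitrary): draw lognormal samples, confirm they are positive and right-skewed, and confirm their logarithm is normal with the chosen mean and sigma.

    import numpy as np

    rng = np.random.default_rng(0)

    # Right-skewed, strictly positive samples (e.g. "number of girlfriends"-style data).
    samples = rng.lognormal(mean=1.0, sigma=0.5, size=100_000)

    print(samples.min() > 0)                    # True: a lognormal variable never goes below zero
    print(np.median(samples) < samples.mean())  # True: right skew pulls the mean above the median

    # Taking the logarithm recovers a normal distribution with the chosen parameters.
    logs = np.log(samples)
    print(round(logs.mean(), 2), round(logs.std(), 2))   # roughly 1.0 and 0.5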
If you display the values of x as log(x), then the line in the diagram becomes a straight line when the values grow exponentially. This is a statistical trick for a quick check of whether values grow exponentially.
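A minimal sketch of that check in Python/NumPy (made-up data that doubles each step): after the log transform the successive differences are constant, which is exactly what a straight line on the plot means.

    import numpy as np

    t = np.arange(10)
    values = 3.0 * 2.0 ** t            # exponential growth: doubles each step

    # Constant increments after the log transform, i.e. a straight line when plotted.
    logs = np.log(values)
    print(np.round(np.diff(logs), 6))  # all equal to log(2) ~ 0.693147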