Implement a data type to represent "large numbers" and operate on them. A "large number" is an integer that can have up to 200 digits. Use strings for the internal representation of the numbers. Arithmetic operators (+, -, *) and relational operators (>, <, ==, etc.) are defined on these numbers. Build a program that demonstrates the use of such numbers.
Basically they are asking you to create your own implementation of BigInteger
https://docs.oracle.com/javase/7/docs/api/java/math/BigInteger.html
Which represents VERY large numbers, avoiding the size limits of the fixed-width binary integer types; in your case the digits are kept in Strings. Math is performed on the digits directly, similar to how you might do it with pencil and paper.
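For a sense of what that digit-by-digit approach looks like, here is a minimal sketch in Python (the exercise doesn't name a language; BigInteger itself is Java) of addition on non-negative digit strings:

    # A minimal sketch of "pencil and paper" addition on non-negative digit strings.
    def add_strings(a, b):
        result = []
        carry = 0
        i, j = len(a) - 1, len(b) - 1
        while i >= 0 or j >= 0 or carry:
            da = int(a[i]) if i >= 0 else 0        # current digit of a, or 0 past the end
            db = int(b[j]) if j >= 0 else 0
            carry, digit = divmod(da + db + carry, 10)
            result.append(str(digit))
            i -= 1
            j -= 1
        return "".join(reversed(result))

    print(add_strings("99999999999999999999", "1"))   # 100000000000000000000

Subtraction, multiplication and the relational operators can be built the same way, by walking the digit strings (with a little extra care for signs and borrows).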
I'm working on a TXT to SPC converter, and certain values have to be stored as the hex of a double, but Python only works with float, and struct.unpack('<d', struct.pack('<f', value)) (or any other unpack-and-pack matryoshka doll I can conceive of) doesn't work because of the difference in byte size.
The SPC library unpacks said values from SPC as <d and converts them to float through float().
What do I do?
I think you may be getting confused by different programming languages' naming strategies.
There's a class of data types known as "floating point numbers". Two floating-point number types defined by IEEE-754 are "binary32" and "binary64". In C and C++, those two types are exposed as the types float and double, respectively. In Python, only "binary64" is natively supported as a built-in type; it's known as float.
Python's struct module supports both binary32 and binary64, and uses C/C++'s nomenclature to refer to them. f specifies binary32 and d specifies binary64. Regardless of which you're using, the module packs from and unpacks to Python's native float type (which, remember, is binary64). In the case of d that's exact; in the case of f it converts the type under the hood. You don't need to fool Python into doing the conversion.
Now, I'm just going to assume you're wrong about "stored as hex of double". What I think you probably mean is "stored as double" -- namely, 64 bits in a file -- as opposed to stored as "hex of double", namely sixteen human-readable ASCII characters. That latter one just doesn't happen.
All of which is to say, if you want to store things as binary64, it's just a matter of struct.pack('d', value).
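A quick sketch of that, with value standing in for one of the numbers from your converter:

    import struct

    value = 3.14159                        # a Python float, i.e. IEEE-754 binary64

    packed64 = struct.pack('<d', value)    # 8 bytes, stores the float exactly
    packed32 = struct.pack('<f', value)    # 4 bytes, rounded to binary32 under the hood

    print(len(packed64), len(packed32))       # 8 4
    print(struct.unpack('<d', packed64)[0])   # round-trips exactly: 3.14159
    print(struct.unpack('<f', packed32)[0])   # back as a Python float, slightly rounded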
Recently I had a case where integer fields in an XML message contained leading zeros. Unfortunately these zeros had relevance. One could argue about why integer was chosen in the schema definition, but that is not my question. I was a little surprised that leading zeros were allowed at all, so I looked up the spec, which of course told me the supertype is decimal. But, as expected, specifications don't really tell you why certain choices were made. So my question is really: what is the rationale for allowing leading zeros at all? I mean, numbers generally don't have leading zeros.
On a side note I guess the only way to add a restriction on leading zeros is by a pattern.
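For illustration, such a pattern might look like the following (shown here as a Python regex; the actual restriction would go in an xsd:pattern facet, which uses XML Schema's regex dialect, but the expression would be essentially the same):

    import re

    # One possible restriction in the spirit of the side note above: an optional sign,
    # then either a single 0 or a digit string that does not start with 0.
    no_leading_zeros = re.compile(r"[+-]?(0|[1-9][0-9]*)")

    print(bool(no_leading_zeros.fullmatch("007")))  # False
    print(bool(no_leading_zeros.fullmatch("7")))    # True
    print(bool(no_leading_zeros.fullmatch("-70")))  # True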
My recollection is that the XML Schema working group allowed leading zeroes in XSD decimals because they are allowed in normal decimal notation: 1, 01, 001, 0001, etc. all denote the same number in normal numerical notation. (But I don't actually remember that it was discussed at any length, so perhaps this is just my reason for believing it was the right thing to do and other WG members had other reasons for being satisfied with it.)
You are correct to suggest that the root of the problem is the use of xsd:integer as a type for a notation using strings of digits in which leading zeroes are significant (as for example in U.S. zip codes); I think you may be over-generous to say that one could argue about that decision. What possible arguments could one bring forward in favor of such an obviously erroneous choice?
Although numbers often don't have leading zeroes, parsing numbers almost always allows leading zeroes.
You don't want to disallow leading zeroes for numbers completely, because you want the option to write a number like 0.12 and not only like .12. As you want to allow at least one leading zero for floating point numbers, it would feel a bit restrictive to only allow one leading zero, and only for floating point numbers.
Sometimes numbers do have leading zeroes, for example the components of a date in ISO 8601 format: 2014-05-02. If you want to parse a component, it's convenient if the leading zero is allowed, so that you don't have to write extra code to remove it before parsing.
The XML specification just uses the same set of rules for parsing numbers that is generally used in most formats and most programming languages.
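A tiny illustration of that, using Python's everyday parsers as a stand-in:

    from datetime import datetime

    # Ordinary number and date parsers happily accept the leading zeroes.
    print(int("05"))                                        # 5
    print(datetime.strptime("2014-05-02", "%Y-%m-%d").day)  # 2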
I've been working with System.Numerics.Complex recently, and I've started to notice the typical floating-point "drift" where the value stored gets calculated a tenth of a millionth off or something like that, which is well-known and common with the float type and even the double type. I looked into the Complex struct, and sure enough, it used double variables. Why does it use double values to store its data and not decimal values, which are designed to prevent this? How do I work around this?
To answer your question:
doubles are several orders of magnitude faster, as operations are done at the hardware level
base-2 floats can actually be more accurate for large computations, as there is less "wobble" when shifting up and down exponents: 1 bit of precision is less than 1 decimal digit. Moreover, base-2 can use an implicit leading bit, which means they can represent more numbers than other bases.
complex numbers are typically used for scientific/engineering applications, where small relative errors of approximately 10^-16 are outweighed by other sources of error (e.g. due to measurement or the model).
decimals on the other hand are typically used for "accounting" type operations, where round-off error is typically negligible (i.e. addition of small numbers, multiplication by integers, etc.)
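To see the scale of the drift in question, here is a rough illustration using Python's complex type, which (like System.Numerics.Complex) stores its real and imaginary parts as binary64 doubles:

    # The same kind of drift the question describes, at roughly 1e-16 relative error.
    z = complex(0.1, 0.2)
    w = z * 3 - complex(0.3, 0.6)

    print(w)               # something like (5.55e-17+1.11e-16j) rather than exactly 0j
    print(abs(w) < 1e-12)  # True: the error is tiny relative to the operands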
It just seemed to me, studying GEP and especially analyzing Karva expressions, that non-terminals are most suitable for functions whose type is a -> a for some type a, in Haskell notation.
Like, in the classic examples, Q, +, -, *, / are all functions from 'some' Doubles to 'a' Double; they just differ in arity.
Now, how can a coder use functions with heterogeneous signatures in one Karva-expressed gene?
Brief Introduction to GEP/Karva
Gene Expression Programming uses dense representations of a population of expressions and applies evolutionary pressure to make better ones to solve a given problem.
Karva notation represents an expression tree as a string, represented in a non-traditional traversal of level-at-a-time, left-to-right - read more here. Using Karva notation, it is simple and quick to combine (or mutate) expressions to create the next generation.
You can parse Karva notation in Haskell as per this answer, which includes an explanation of why it's linear time, or this answer, which is the same code but with more diagrams and no proof.
Terminals are the constants or variables in a Karva expression, so /+a*-3cb2 (meaning ((3*c)+(b-2))/a) has terminals [a,b,2,3,c]. A Karva expression with no terminals is thus a function of some arity.
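A minimal sketch of that level-order decoding, in Python rather than the linked Haskell (the arity table below is an assumption made just for this example):

    ARITY = {'Q': 1, '+': 2, '-': 2, '*': 2, '/': 2}   # everything else is a terminal

    def karva_to_tree(gene):
        """Decode a Karva string into (symbol, children) nodes, one level at a time."""
        nodes = [(sym, []) for sym in gene]
        nxt = 1                                   # index of the next unused symbol
        for sym, children in nodes:
            for _ in range(ARITY.get(sym, 0)):    # terminals take no children
                if nxt < len(nodes):
                    children.append(nodes[nxt])
                    nxt += 1
        return nodes[0]

    def to_infix(node):
        sym, children = node
        if not children:
            return sym
        if len(children) == 1:
            return sym + "(" + to_infix(children[0]) + ")"
        return "(" + to_infix(children[0]) + sym + to_infix(children[1]) + ")"

    print(to_infix(karva_to_tree("/+a*-3cb2")))   # (((3*c)+(b-2))/a)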
My Question is then more related to how one would use different types of functions without breaking the gene.
What if one wants to use a non-terminal like the > function? One can count on the fact that, for example, it can compare Doubles. But the result, in a strongly typed language, would be a Bool. Now, assuming that the non-terminal encoding for > is interspersed in the gene, parsing the k-expression would result in invalid code, because anything calling it would expect a Double.
One can then think of manually and silently sneaking in a cast, as is done by Ms. Ferreira in her book, where she converts Bools into Ints, 0 and 1 for False and True.
So it seems to me that k-expressed genes are for non-terminals of any arity that share the property of taking values of one type a and returning a type a.
In the end, has anyone any idea about how to overcome this?
I already know that one can use homeotic genes to provide some glue between different sub-expression trees, but that, IMHO, is somewhat rigid, because, again, you need to know the returned types in advance.
Overview
I'm looking to analyse the difference between two characters as part of a password strength checking process.
I'll explain what I'm trying to achieve and why and would like to know if what I'm looking to do is formally defined and whether there are any recommended algorithms for achieving this.
What I'm looking to do
Across a whole string, I'm looking to compare the current character with the previous character and determine how different they are.
As this relates to password strength checking, the difference between one character and its predecessor in a string might be defined as how predictable character N is from knowing character N - 1. There might be a formal definition for this of which I'm not aware.
Example
A password of abc123 could arguably be less secure than azu590. Both contain three letters followed by three numbers; however, in the case of the former the sequence is more predictable.
I'm assuming that a password guesser might try some obvious sequences such that abc123 would be tried much before azu590.
Considering the decimal ASCII values for the characters in these strings, and given that b is 1 different from a and c is 1 different again from b, we could derive a simplistic difference calculation.
Ignoring cases where two consecutive characters are not in the same character class, we could say that abc123 has an overall character to character difference of 4 whereas azu590 has a similar difference of 25 + 5 + 4 + 9 = 43.
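As a concrete sketch of that calculation (in Python, with hypothetical helper names; the character classes and the decision to ignore cross-class pairs are taken from the example above):

    def char_class(ch):
        if ch.isdigit():
            return "digit"
        if ch.isalpha():
            return "letter"
        return "other"

    def simplistic_difference(password):
        """Sum of absolute code-point differences between consecutive same-class characters."""
        total = 0
        for prev, cur in zip(password, password[1:]):
            if char_class(prev) == char_class(cur):
                total += abs(ord(cur) - ord(prev))
        return total

    print(simplistic_difference("abc123"))  # 4
    print(simplistic_difference("azu590"))  # 43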
Does this exist?
This notion of character to character difference across a string might be defined, similar to the Levenshtein distance between two strings. I don't know if this concept is defined or what it might be called. Is it defined and if so what is it called?
My example approach to calculating the character to character difference across a string is a simple and obvious approach. It may be flawed, it may be ineffective. Are there any known algorithms for calculating this character to character difference effectively?
It sounds like you want a Markov Chain model for passwords. A Markov Chain has a number of states and a probability of transitioning between the states. In your case the states are the characters in the allowed character set and the probability of a transition is proportional to the frequency that those two letters appear consecutively. You can construct the Markov Chain by looking at the frequency of the transitions in an existing text, for example a freely available word list or password database.
It is also possible to use variations on this technique (Markov chain of order m) where you for example consider the previous two characters instead of just one.
Once you have created the model you can use the probability of generating the password from the model as a measure of its strength. This is the product of the probabilities of each state transition.
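A minimal sketch of the idea in Python, with a tiny hypothetical corpus standing in for a real word list or password database (the floor used for unseen transitions is an assumption, and log-probabilities are used to avoid underflow on long passwords):

    import math
    from collections import defaultdict

    def train_markov_chain(corpus):
        """Estimate transition probabilities from character-pair frequencies."""
        counts = defaultdict(lambda: defaultdict(int))
        for word in corpus:
            for prev, cur in zip(word, word[1:]):
                counts[prev][cur] += 1
        model = {}
        for prev, nexts in counts.items():
            total = sum(nexts.values())
            model[prev] = {cur: n / total for cur, n in nexts.items()}
        return model

    def log_probability(model, password, floor=1e-6):
        """Higher (closer to 0) means the password is more predictable under the model."""
        logp = 0.0
        for prev, cur in zip(password, password[1:]):
            logp += math.log(model.get(prev, {}).get(cur, floor))
        return logp

    corpus = ["password", "abc123", "qwerty", "letmein"]   # hypothetical training data
    model = train_markov_chain(corpus)
    print(log_probability(model, "abc123"), log_probability(model, "azu590"))

With this toy corpus, abc123 comes out far more probable under the model (and hence weaker) than azu590.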
For general signals/time-series data, this is known as Autocorrelation.
You could try adapting the Durbin–Watson statistic and test for positive auto-correlation between the characters. A naïve way may be to use the unicode code-points of each character, but I'm sure that will not be good enough.
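For example, a naive lag-1 autocorrelation over the code points might look like this (illustrative only; as noted above, raw code points are probably not good enough on their own):

    def lag1_autocorrelation(s):
        """Correlation between each character's code point and its successor's."""
        xs = [ord(ch) for ch in s]
        mean = sum(xs) / len(xs)
        num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(len(xs) - 1))
        den = sum((x - mean) ** 2 for x in xs)
        return num / den if den else 0.0

    print(lag1_autocorrelation("abc123"))
    print(lag1_autocorrelation("azu590"))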