I would like to represent a 10-bit unsigned integer in C#. I need to read and write it to a binary stream and use the ++ unary operator. Should I use int as the internal representation, or is there a better way?
I would use unsigned short (ushort) as my base type. Writing to a binary stream is going to be fun no matter what, because you'll need to pack four of these numbers to get a whole number of bytes (five) into the stream (assuming that you want packing).
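As a sketch of the packing (the method name and byte order are my own choice), four 10-bit values make 40 bits, i.e. exactly 5 bytes:

static byte[] PackFour(ushort a, ushort b, ushort c, ushort d)
{
    // Concatenate four 10-bit values into one 40-bit quantity.
    ulong bits = ((ulong)(a & 0x3FF) << 30)
               | ((ulong)(b & 0x3FF) << 20)
               | ((ulong)(c & 0x3FF) << 10)
               |  (ulong)(d & 0x3FF);

    var bytes = new byte[5];
    for (int i = 0; i < 5; i++)
        bytes[i] = (byte)(bits >> (8 * (4 - i)));   // most significant byte first
    return bytes;
}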
Depending on what you want to do, using a UInt16 capped to 10 bits is a good solution. You will need to overload some operators, but that should be it.
The other alternative would be to use a BitArray and to redefine the ++ unary operator.
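A minimal sketch of the capped-UInt16 idea (the type name UInt10 is made up): store the value in a ushort, mask it to 10 bits, and overload ++ so it wraps around:

public readonly struct UInt10
{
    private const ushort Mask = 0x3FF;   // low 10 bits

    private readonly ushort _value;

    public UInt10(ushort value) => _value = (ushort)(value & Mask);

    // ++ produces a new value; it wraps from 1023 back to 0 because of the mask.
    public static UInt10 operator ++(UInt10 x) => new UInt10((ushort)(x._value + 1));

    public static implicit operator ushort(UInt10 x) => x._value;

    public override string ToString() => _value.ToString();
}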
The Haskell base documentation says that "A Word is an unsigned integral type, with the same size as Int."
How can I take an Int and cast its bit representation to a Word, so I get a Word value with the same bit representation as the original Int (even though the number values they represent will be different)?
I can't use fromIntegral because that will change the bit representation.
I could loop through the bits with the Bits class, but I suspect that will be very slow - and I don't need to do any kind of bit manipulation. I want some kind of function that will be compiled down to a no-op (or close to it), because no conversion is done.
Motivation
I want to use IntSet as a fast integer set implementation - however, what I really want to store in it are Words. I feel that I could create a WordSet which is backed by an IntSet, by converting between them quickly. The trouble is, I don't want to convert by value, because I don't want to truncate the top half of Word values: I just want to keep the bit representation the same.
int2Word# and word2Int# in GHC.Prim perform the bit cast. You can easily implement wrapper functions that cast between the boxed Int and Word types using them.
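A minimal sketch of those wrappers (the function names are mine; MagicHash is needed for the # identifiers):

{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Word (W#), int2Word#, word2Int#)

-- Unbox, cast the raw machine word, re-box. No value conversion is performed,
-- so the bit pattern is preserved exactly.
intToWordBits :: Int -> Word
intToWordBits (I# i) = W# (int2Word# i)

wordToIntBits :: Word -> Int
wordToIntBits (W# w) = I# (word2Int# w)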
I'm working on a TXT to SPC converter, and certain values have to be stored as the hex of a double, but Python only works with float, and struct.unpack('<d', struct.pack('<f', value)) (or any other unpack/pack matryoshka doll I can conceive of) doesn't work because of the difference in byte size.
The SPC library unpacks said values from SPC as <d and converts them to float through float()
What do I do?
I think you may be getting confused by different programming languages' naming strategies.
There's a class of data types known as "floating point numbers". Two floating-point number types defined by IEEE-754 are "binary32" and "binary64". In C and C++, those two types are exposed as the types float and double, respectively. In Python, only "binary64" is natively supported as a built-in type; it's known as float.
Python's struct module supports both binary32 and binary64, and uses C/C++'s nomenclature to refer to them. f specifies binary32 and d specifies binary64. Regardless of which you're using, the module packs from and unpacks to Python's native float type (which, remember, is binary64). In the case of d that's exact; in the case of f it converts the type under the hood. You don't need to fool Python into doing the conversion.
Now, I'm just going to assume you're wrong about "stored as hex of double". What I think you probably mean is "stored as double" -- namely, 64 bits in a file -- as opposed to stored as "hex of double", namely sixteen human-readable ASCII characters. That latter one just doesn't happen.
All of which is to say, if you want to store things as binary64, it's just a matter of struct.pack('d', value).
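For example (the value here is arbitrary), packing and unpacking binary64 round-trips a Python float exactly:

import struct

value = 1234.5678
packed = struct.pack('<d', value)        # 8 bytes: IEEE-754 binary64, little-endian
print(packed.hex())                      # 16 hex characters describing those 8 bytes

(round_trip,) = struct.unpack('<d', packed)
assert round_trip == value               # binary64 holds a Python float exactly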
Suppose I have a script which is executed by a 64-bit Perl and which is taking one parameter which actually is a number, but of course is a string in the first place (because all command line parameters are strings).
Now, if that parameter's value fits into a 64 bit unsigned int, the script should do something with the parameter; otherwise, it should abort with an appropriate error message.
What would be the most efficient way to check if that parameter (as a string, i.e. before using it in mathematical operations) fits into a 64-bit unsigned integer?
What I have already thought of:
I could do a string comparison
I don't want to do that because in that case I would have to cope with collations, and the documentation for Unicode::Collate looks a bit oversized for my small problem.
But this is just a feeling, so I'd be grateful for comments or other opinions.
Side note: I have tried this, and it worked as expected. But this was just a quick test; I did not play around with locales, so on other systems it might not work (although I doubt that there is a collation which puts "2" before "1", but you never know).
Converting to numbers before comparing won't work:
root#spock:/root/test# perl -e '$i="18446744073709551615"+0; $j="18446744073709551616"+0; print "$i $j\n"; print(($i < $j) ? "less\n" : "greater or equal\n")'
18446744073709551615 1.84467440737096e+19
greater or equal
Note how Perl prints the second number. This is the smallest unsigned integer which does not fit into 64 bits, so Perl converts it to a double. When it then compares $i and $j numerically, it has to convert $i to a double as well; due to the loss of precision involved herein, $i is converted to the same value as $j, so the comparison goes wrong.
I could do use bigint;. I have tried this, and it behaved as expected.
But that probably would lead to a dramatic loss of performance. As far as I have understood, use bigint; implies the use of various heavy libraries.
But this is just a feeling as well, so if this is the way to go, please let me know.
Another idea (not tried yet): Could I use pack() to generate a byte sequence from the stringified number somehow? Then I could check the length of that byte sequence: if it is less than or equal to 8 bytes, the stringified number fits into a 64-bit unsigned integer.
How would you solve this problem?
use constant MAX_UINT64 => '18446744073709551615';
my $larger_than_max =
length($s) > length(MAX_UINT64)
|| length($s) == length(MAX_UINT64) && $s gt MAX_UINT64;
Assumes input matches /^(?:0|[1-9][0-9]*)\z/. Adjust to liking (e.g. to handle leading zeros or signs).
You can use a simple shortcut that should eliminate most numbers. Any number that has 19 or fewer digits in the decimal representation can fit in a 64 bit integer, so if the length of the string containing the integer is less than 20, it is good.
Any string with length greater than or equal to 21 is bad.
UINT64_MAX is 18446744073709551615, so some numbers with 20 decimal digits fit into a 64-bit unsigned integer and some don't.
At this point, simple string comparison using ge will be enough because the ordering of Arabic digits is the same regardless of locale.
$ perl -E 'say "yes" if $ARGV[1] ge $ARGV[0]' 18446744073709551615 18446744073709551616
yes
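Putting the two observations together, a minimal sketch of the whole check might look like this (the function name is mine; it rejects anything that is not a plain string of digits):

sub fits_in_uint64 {
    my ($s) = @_;
    return 0 unless $s =~ /\A(?:0|[1-9][0-9]*)\z/;   # digits only, no leading zeros
    my $len = length $s;
    return 1 if $len < 20;                           # 19 digits or fewer always fit
    return 0 if $len > 20;                           # 21 digits or more never fit
    return $s le '18446744073709551615';             # 20 digits: compare as strings
}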
I'll assume the input is a string of digits for clarity.
You ask for the most efficient way. This can't be determined without understanding the distribution of inputs. For example if the inputs are uniform in 128 bit integers, the most efficient is to start with something like:
if (length($ARGV[0]) > 20) {die "Number too large.\n"}
This deals with over 99.9999999999 % of cases. In fact if the inputs were uniform in 256 bit integers you might be forgiven for simply writing:
warn "Number too large.\n";
As to repeatedly and consistently testing in a reasonable amount of time, you could consider something like this regex from Damian Conway's Regexp::Number (it is for signed 64-bit numbers, but the principle is valid). Notice that, being real code, it deals with leading zeros.
'0*(?:(?:9(?:[0-1][0-9]{17}' .
'|2(?:[0-1][0-9]{16}' .
'|2(?:[0-2][0-9]{15}' .
'|3(?:[0-2][0-9]{14}' .
'|3(?:[0-6][0-9]{13}' .
'|7(?:[0-1][0-9]{12}' .
'|20(?:[0-2][0-9]{10}' .
'|3(?:[0-5][0-9]{9}' .
'|6(?:[0-7][0-9]{8}' .
'|8(?:[0-4][0-9]{7}' .
'|5(?:[0-3][0-9]{6}' .
'|4(?:[0-6][0-9]{5}' .
'|7(?:[0-6][0-9]{4}' .
'|7(?:[0-4][0-9]{3}' .
'|5(?:[0-7][0-9]{2}' .
'|80(?:[0-6])))))))))))))))))' .
'|[1-8]?[0-9]{0,18})'
This should be blindingly fast compared with Perl's start-up time, for example, or even a keystroke.
As to bigint, it executes very quickly and includes some cool optimization features, but unless you are testing many numbers in code the above should suffice.
If you really want to burn rubber, though, take a look at the Perl guts and use something that exposes the SvIOK(SV*) macro. (See https://metacpan.org/pod/release/KRISHPL/pod2texi-0.1/perlguts.pod#What-is-an-%22IV%22? for more details.)
In all of my time programming I have squeaked by without ever learning this stuff. Would love to know more about what these are and how they are used:
UInt8
UInt16LE
UInt16BE
UInt32LE
UInt32BE
Int8
Int16LE
Int16BE
Int32LE
Int32BE
FloatLE
FloatBE
DoubleLE
DoubleBE
See https://nodejs.org/api/buffer.html#buffer_buf_readuint8_offset_noassert for where Node uses these.
These data types describe how a number is represented in a particular byte order. That is typically essential for:
Network protocols
Binary file formats
It is essential because one system must write integers/floats in a way that yields the same value on the reader's side, so which format to use is simply a convention between the two sides (writer and reader).
What the abbreviations mean:
The BE suffix stands for big-endian (most significant byte first)
The LE suffix stands for little-endian (least significant byte first)
Int is a signed integer
UInt is an unsigned integer
The number is the width of the value in bits
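For illustration, a small Node.js sketch of the same 32-bit value written in the two byte orders (the hex constant is just an example):

const buf = Buffer.alloc(4);

buf.writeUInt32BE(0x11223344, 0);   // big-endian: most significant byte first
console.log(buf);                   // <Buffer 11 22 33 44>

buf.writeUInt32LE(0x11223344, 0);   // little-endian: least significant byte first
console.log(buf);                   // <Buffer 44 33 22 11>

console.log(buf.readUInt32LE(0).toString(16));  // '11223344' again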
I need to make a function that receives a string such as:
int *ptr[20], *p, p2, p3[3];
and the function need to print:
ptr requires 80 bytes.
p requires 4 bytes.
p2 requires 4 bytes.
p3 requires 12 bytes.
To simplify the task, I would like to treat the "fake" code in the string as "real" code, and then just print sizeof(variable) for each variable to answer the question. I think that is the simplest way.
But how to do it?
What you describe is the ability to "evaluate" dynamically generated code.
Some languages -- usually they are evaluated (non-compiled) ones -- have such features, but C++ does not.
Even if it did, it wouldn't be a good solution here. You need a parser. For a formal approach, you may research lexers and context-free parsers. For an ad hoc approach...well...do whatever string manipulation you would like.
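For what it's worth, here is a rough sketch of the ad hoc string-manipulation route for the exact declaration in the question. The sizes are hard-coded assumptions (sizeof(int) and pointers both 4 bytes, matching the expected output), and it handles only this simple "int ..." declarator list, not general C++:

#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::string decl = "int *ptr[20], *p, p2, p3[3];";

    // Split off the base type ("int") from the declarator list and drop the ';'.
    std::string rest = decl.substr(decl.find(' ') + 1);
    if (!rest.empty() && rest.back() == ';') rest.pop_back();

    const std::size_t base_size = 4;   // assumed sizeof(int)
    const std::size_t ptr_size  = 4;   // assumed pointer size (32-bit target)

    std::istringstream items(rest);
    std::string item;
    while (std::getline(items, item, ',')) {
        item.erase(0, item.find_first_not_of(' '));      // trim leading spaces

        bool is_pointer = !item.empty() && item[0] == '*';
        if (is_pointer) item.erase(0, 1);

        std::size_t count = 1;
        auto open = item.find('[');
        if (open != std::string::npos) {
            count = std::stoul(item.substr(open + 1));    // array length
            item.erase(open);                             // keep only the name
        }

        std::size_t bytes = count * (is_pointer ? ptr_size : base_size);
        std::cout << item << " requires " << bytes << " bytes.\n";
    }
}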