I'm setting up a function in my .vimrc (using MacVim in particular, but this should be universal to vim in general) to display file sizes (in Bytes, Kilobytes, and Megabytes) in my statusline. While the function works quite perfectly without errors, it's giving me unexpected output! In hindsight, it's certainly producing the output it should, but not the output I want.
Here's the function:
" I modified the FileSize() function shown here to suit my own preferences:
" http://got-ravings.blogspot.com/2008/08/vim-pr0n-making-statuslines-that-own.htm
function! StatuslineFileSize()
  let bytes = getfsize(expand("%:p"))
  if bytes < 1024
    return bytes . "B"
  elseif (bytes >= 1024) && (bytes < 10240)
    return string(bytes / 1024.0) . "K"
  elseif (bytes >= 10240) && (bytes < 1048576)
    return string(bytes / 1024) . "K"
  elseif (bytes >= 1048576) && (bytes < 10485760)
    return string(bytes / 1048576.0) . "M"
  elseif bytes >= 10485760
    return string(bytes / 1048576) . "M"
  endif
endfunction
Here's the way it basically works:
If filesize is less than 1KB, output size in Bytes as an integer
If filesize is between 1KB and 10KB, output size in Kilobytes as a decimal
If filesize is between 10KB and 1MB, output size in Kilobytes as an integer
If filesize is between 1MB and 10MB, output size in Megabytes as a decimal
If filesize is greater than 10MB, output size in Megabytes as an integer
The output produced for steps 2 and 4 is a decimal with six (6) places of precision. The desired output should be a decimal with just one (1) place of precision.
I've already searched the help documentation for the round() and trunc() functions, but they will only round or truncate floats to the nearest whole number, which is not what I want. I've also searched Google and Stack Overflow for solutions, but most of what I can find involves altering text in the edit buffer or completely unrelated problems such as rounding floats in Java (!!!)
I'm preferably looking for a vim built-in function that can do this, a la round({expr},{prec}) or trunc({expr},{prec}), but if a user defined function can provide a sufficiently elegant solution then I'm all for that as well. I don't mind if the output is a string, since I'm obviously returning a string from StatuslineFileSize() anyways!
Use printf() with a precision specifier to convert the results to strings instead of string(). Note the float divisor: Vim's %f item requires a Float argument, and dividing two Numbers would truncate to an integer:
return printf('%.1fM', bytes / 1048576.0)
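For comparison, here is the same threshold-and-formatting logic sketched in Python (illustration only; human_size is a hypothetical helper, the %-style format strings play the role of Vim's printf()):

```python
def human_size(nbytes):
    """Format a byte count with one decimal place below 10K/10M,
    whole numbers elsewhere, mirroring the statusline thresholds."""
    if nbytes < 1024:
        return "%dB" % nbytes
    elif nbytes < 10240:
        return "%.1fK" % (nbytes / 1024.0)
    elif nbytes < 1048576:
        return "%dK" % (nbytes // 1024)
    elif nbytes < 10485760:
        return "%.1fM" % (nbytes / 1048576.0)
    else:
        return "%dM" % (nbytes // 1048576)

print(human_size(5500))     # 5.4K
print(human_size(3145728))  # 3.0M
```

The %.1f conversion rounds to one decimal place rather than truncating, which matches what printf('%.1fM', ...) does in Vim.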
What determines the size of galois field when using reed-solomon algorithm to encode an arbitrary message of any size? Is it the symbol size, or the size of the message?
For example, if I am to encode ASCII characters, and I use GF(2^8) because ASCII's are 8 bits, I would end up with a maximum codeword length of 2^8 - 1 = 255 ASCII characters. Then I would have to split the message into sub-messages of length 255.
Or, if I use GF(2^s) such that 2^s - 1 >= the length of the message, then there's no need to split the message, but in this case even though I am encoding ASCII characters which are 8 bits, each symbol in the codeword would be considered 2^s bits.
Which is preferred? Or are there other factors that determine the selection of the Galois field?
The fixed or maximum size of the message determines the symbol size: GF(2^4) for up to 15 nibbles (7.5 bytes), GF(2^8) for up to 255 bytes, GF(2^10) for up to 1023 10-bit symbols or 1278.75 bytes (often used for HDD sectors with 512 data bytes), GF(2^12) for up to 4095 12-bit symbols or 6142.5 bytes (often used for HDD sectors with 4096 data bytes).
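The capacities above all follow from n = 2^s - 1 symbols of s bits each. A quick check (max_codeword_bytes is a hypothetical helper; no parity symbols are subtracted here):

```python
def max_codeword_bytes(s):
    """Maximum codeword size, in bytes, for Reed-Solomon over GF(2^s)."""
    n = 2**s - 1      # maximum number of s-bit symbols per codeword
    return n * s / 8  # total data-plus-parity capacity in bytes

for s in (4, 8, 10, 12):
    print(s, max_codeword_bytes(s))  # 7.5, 255.0, 1278.75, 6142.5
```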
I exported an XML file from QlikView and the dates are in a 16-character hexadecimal form (e.g. 40E5A40D641FDB97). I have tried multiple ways to convert them to floating-point numbers and then to dates, but all methods have failed (incl. Excel's HEX2DEC).
Has anyone dealt with this issue before? I would greatly appreciate any help!
Here is a Power Query routine that will convert that Hex number into its Date Equivalent:
I generate the binary equivalent of the Hex number using a lookup table and concatenating the results.
The algorithm should be clear in the coding, and it follows the rules set out in IEEE-754.
For the dates you mention in your question, it provides the same results.
Note that this routine assumes a valid value encoded as you describe your date representations from Qlikview. It is not a general purpose routine.
let
    //sample input - replace with a reference to your own hex string
    hexNum = "40E5A40D641FDB97",
    //don't really need the Decimal column
    hexConvTable = Table.FromRecords({
        [Hex="0", Dec=0, Bin = "0000"],
        [Hex="1", Dec=1, Bin = "0001"],
        [Hex="2", Dec=2, Bin = "0010"],
        [Hex="3", Dec=3, Bin = "0011"],
        [Hex="4", Dec=4, Bin = "0100"],
        [Hex="5", Dec=5, Bin = "0101"],
        [Hex="6", Dec=6, Bin = "0110"],
        [Hex="7", Dec=7, Bin = "0111"],
        [Hex="8", Dec=8, Bin = "1000"],
        [Hex="9", Dec=9, Bin = "1001"],
        [Hex="A", Dec=10, Bin = "1010"],
        [Hex="B", Dec=11, Bin = "1011"],
        [Hex="C", Dec=12, Bin = "1100"],
        [Hex="D", Dec=13, Bin = "1101"],
        [Hex="E", Dec=14, Bin = "1110"],
        [Hex="F", Dec=15, Bin = "1111"]},
        type table[Hex = Text.Type, Dec = Int64.Type, Bin = Text.Type]),
    hexUp = Text.Upper(hexNum),
    hexSplit = Table.FromList(Text.ToList(hexUp),Splitter.SplitByNothing(),{"hexNum"}),
    //to sort back to original order
    addIndex = Table.AddIndexColumn(hexSplit,"Index",0,1,Int64.Type),
    //combine with conversion table
    binConv = Table.Sort(
        Table.Join(
            addIndex,"hexNum",hexConvTable,"Hex",JoinKind.LeftOuter),
        {"Index", Order.Ascending}),
    //equivalent binary
    binText = Text.Combine(binConv[Bin]),
    sign = Text.Start(binText,1),
    //exponent: the 11 bits after the sign bit, converted to numbers
    expBin = List.Transform(Text.ToList(Text.Middle(binText,1,11)),Number.FromText),
    //exponent bias will vary depending on the precision being used
    expBias = 1023, //Number.Power(2,10-List.PositionOf(expBin,1))-1,
    expPwr = List.Reverse({0..10}),
    exp = List.Accumulate({0..10},0,(state, current) =>
        state + expBin{current} * Number.Power(2,expPwr{current})) - expBias,
    //mantissa: the 52 bits after the sign and exponent (offset 12, not 11)
    mantBin = List.Transform(Text.ToList(Text.Middle(binText,12,52)),Number.FromText),
    //the first mantissa bit has weight 2^-1
    mantPwr = {1..52},
    mant = List.Accumulate({0..51},0,(state, current) =>
        state + mantBin{current} / Number.Power(2,mantPwr{current})) + 1,
    dt = mant * Number.Power(2,exp)
in
    DateTime.From(dt)
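As a cross-check on the routine above: the 16 hex digits are just a big-endian IEEE-754 double, so in Python the whole conversion collapses to one struct.unpack call. This sketch assumes (as the routine does) that QlikView serial dates count days from the Excel-style epoch of 1899-12-30:

```python
import struct
from datetime import datetime, timedelta

def qlik_hex_to_datetime(hexstr):
    # Reinterpret the 16 hex digits as a big-endian IEEE-754 double...
    serial = struct.unpack(">d", bytes.fromhex(hexstr))[0]
    # ...then treat it as a serial date: days since 1899-12-30,
    # with the fractional part carrying the time of day
    return datetime(1899, 12, 30) + timedelta(days=serial)

print(qlik_hex_to_datetime("40E5A40D641FDB97"))  # a date in 2021
```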
You can use standard number formatting in QlikView with Num# (to interpret text as a number) and Num (to format it) to convert from hex to bin:
// example data from an inline table in the load script
[our_hex_numbers]:
LOAD
Num(Num#(hex,'(HEX)'),'(BIN)') as bin
Inline
[hex,
'A',
'B',
'C'];
Here is the result:
This reference shows how floating point numbers are represented. In double precision (using a total of 64 bits) there is a sign bit, 11-bit exponent and 53-bit significand or mantissa. Observant readers will notice that gives a total of 65 bits: this is because the most significant bit in the mantissa is a hidden bit which by convention is always set to 1 and does not have to be stored.
Taking the first example (40E5A40D641FDB97), we have:
Exponent
The exponent occupies the first three hexadecimal digits (the sign bit plus 11 exponent bits; the sign bit will always be zero for dates, since they are positive numbers). It can be converted using any suitable standard method, e.g. in Excel 365:
=LET(L,LEN(A2),seq,SEQUENCE(L),SUM((FIND(MID(A2,seq,1),"0123456789ABCDEF")-1)*16^(L-seq)))
The correct result is obtained by subtracting 1023 (the offset) from the converted value e.g.
40E -> 1038
1038 - 1023 -> 15
So the multiplier is 2^15.
Significand
We need to take the right-hand 13 hexadecimal digits (52 bits) of the string and convert it to a fraction using whatever is your favourite conversion method e.g. in Excel 365:
=LET(L,LEN(A2),seq,SEQUENCE(L),SUM((FIND(MID(A2,seq,1),"0123456789ABCDEF")-1)*16^(-seq)))
Then you need to add 1 (this is the hidden bit which is always set to 1).
Putting this together:
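The exponent and significand steps can be combined with plain bit arithmetic; a Python sketch using the same example value:

```python
hexstr = "40E5A40D641FDB97"
bits = int(hexstr, 16)

# Exponent: the 11 bits below the sign bit, minus the 1023 offset
exponent = ((bits >> 52) & 0x7FF) - 1023       # 0x40E -> 1038 -> 15

# Significand: the low 52 bits as a fraction, plus the hidden bit
fraction = (bits & ((1 << 52) - 1)) / 2**52 + 1

serial = fraction * 2**exponent                # the date serial number
print(serial)
```

The integer part of serial is the day count and the fractional part is the time of day.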
I made a report on QlikView licenses for myself using the file CalData.pgo.xml, and I ran into a non-critical problem of converting hex to dates (LastUsed, ToBeDeleted); without this conversion the report would not be complete.
I searched, but didn't find anything immediately useful, except for converting 13-digit hex values in Excel.
But in the file CalData.pgo.xml the date is stored as 16 hex digits, not 13. I did not work out how to adapt the Excel formula for 16 digits, but I realized that a 16-digit hex value can be trimmed to its last 13 digits (the mantissa), and it seems nothing significant is lost: the hard-coded Pow(2,15) below stands in for the exponent, which is 15 for serial dates between 32768 and 65535 (roughly the years 1989 to 2079).
It works fine for me:
=date((num(Num#(right([PerDocumentCalData/NamedCalsAllocated/CalAllocEntry.LastUsed],13),'(HEX)') )*pow(16,-13)+1)*Pow(2,15),'DD.MM.YYYY hh:mm')
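The same trick in Python, to see why trimming to 13 hex digits works: the last 13 digits are exactly the 52 mantissa bits, and the hard-coded 2^15 stands in for the exponent (a sketch, valid only while the exponent really is 15):

```python
hexstr = "40E5A40D641FDB97"
mantissa = int(hexstr[-13:], 16)   # last 13 hex digits = 52 mantissa bits
# mantissa / 16^13 rebuilds the binary fraction; +1 restores the hidden bit
serial = (mantissa / 16.0**13 + 1) * 2**15
print(serial)  # days since 1899-12-30; fractional part is the time of day
```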
I converted a CSV file to an .npy file. The CSV file is 5 GB, and the .npy file is 13 GB.
I thought an .npy file would be more efficient than CSV.
Am I misunderstanding this? Why is the .npy file bigger than the CSV?
I just used this code:
full = pd.read_csv('data/RGB.csv', header=None).values
np.save('data/RGB.npy', full, allow_pickle=False, fix_imports=False)
and the data structure looks like this:
R, G, B, is_skin
2, 5, 1, 0
10, 52, 242, 1
52, 240, 42, 0
...(there are 420,711,257 rows)
In your case an element is an integer between 0 and 255, inclusive. That means, saved as ASCII, it will need at most
3 chars for the number
1 char for ,
1 char for the whitespace
which results in at most 5 bytes (somewhat less on average) per element on disk.
Pandas reads/interprets this as an int64 array by default (see full.dtype), which means it needs 8 bytes per element, and that leads to a bigger npy-file (most of whose bytes are zeros!).
To save an integer between 0 and 255 we need only one byte, so the size of the npy-file could be reduced by a factor of 8 without losing any information - just tell pandas to interpret the data as unsigned 8-bit integers:
full = pd.read_csv(r'e:\data.csv', dtype=np.uint8).values
# or to get rid of pandas-dependency:
# full = np.genfromtxt(r'e:\data.csv', delimiter=',', dtype=np.uint8, skip_header=1)
np.save(r'e:/RGB.npy', full, allow_pickle=False, fix_imports=False)
# an 8 times smaller npy-file
Most of the time the npy-format needs less space, but there are situations in which the ASCII format results in smaller files.
For example, if the data consists mostly of very small one-digit numbers and a few very big numbers that really do need 8 bytes:
in ASCII format you pay on average about 2 bytes per element (there is no need for whitespace; , alone as a delimiter is good enough).
in npy-format you pay 8 bytes per element.
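A small experiment makes the trade-off concrete (a sketch with made-up data: mostly one-digit values plus a few 8-byte-sized outliers, stored as int64):

```python
import io
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=100_000, dtype=np.int64)  # one-digit values
data[::10_000] = 2**62            # sprinkle in a few genuinely big numbers

# npy size: a fixed 8 bytes per int64 element (plus a small header)
buf = io.BytesIO()
np.save(buf, data)
npy_size = buf.tell()

# ASCII size: ~2 bytes per small element (one digit plus the comma)
csv_size = len(",".join(map(str, data)).encode())

print(npy_size, csv_size)         # npy ~800 KB, csv ~200 KB
```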
Based on the code I found in linux/include/linux/jiffies.h:
#define time_after(a,b) \
    (typecheck(unsigned long, a) && \
     typecheck(unsigned long, b) && \
     ((long)(b) - (long)(a) < 0))
It seems to me that there is no kind of wrap around monitoring involved. So, if the jiffies(a) were to wrap around and get back fairly close to the timeout(b), then the result would be "false" when it is actually "true".
Let's use some fairly small numbers for this example. Say, time_after(110,150) where 110 is jiffies and 150 is the timeout. The result will clearly be false - whether jiffies wrapped around or not: 150-110 is always > 0.
So, I just wanted to confirm that I didn't miss something and that is indeed how it is.
Just to be clear, in your example, because 110 is not after 150, time_after(110,150) should (and does) return false. From the comment:
time_after(a,b) returns true if the time a is after time b.
Also, note that the code does indeed handle wrapping around to 0. To make the following a bit easier to understand, I'll use unsigned and signed one-byte values, i.e. 8-bit 2's complement. But the argument is general.
Suppose b is 253, and five ticks later jiffies has wrapped around to 2. We would therefore expect time_after(2,253) to return true. And it does (using int8_t to denote a signed 8-bit value):
(int8_t) 253 - (int8_t) 2 == -3 - 2 == -5 < 0
You can try other values, too. This one is trickier, for time_after(128, 127), which should be true as well:
(int8_t) 127 - (int8_t) 128 == 127 - (-128) == 255 == -1 (for 8-bit 2's complement) < 0
In reality the type of the expression (int8_t) 127 - (int8_t) 128 would be an int, and the value really would be 255. But using longs the expression type would be long and the equivalent example would be, for time_after( 2147483648, 2147483647):
(long) 2147483647 - (long) 2147483648 == 2147483647 - (-2147483648) == 4294967295 == -1 < 0
Eventually, after wrapping around, the "after" jiffies value a will begin to catch up with the before value b, and time_after(a,b) will report false. For N-bit 2's complement, the first false comes when a is 2^(N-1)+1 ticks later than b. For N=8, that happens when a is 129 ticks after b. For N=32, that's 2147483649 ticks, or (with 1 ms ticks) about 25 days.
For the mathematically inclined, I believe that in general time_after(a,b) returns true iff the least residue (modulo 2^N) of (a-b) is > 0 and <= 2^(N-1).
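The argument is easy to check by emulating the macro's signed subtraction at a chosen width (a sketch; the masking mimics C's fixed-width two's-complement arithmetic):

```python
def time_after(a, b, bits=8):
    """Emulate (long)(b) - (long)(a) < 0 in N-bit two's complement."""
    mask = (1 << bits) - 1
    diff = (b - a) & mask          # wrap the difference to N bits
    if diff >= 1 << (bits - 1):    # reinterpret the top bit as the sign
        diff -= 1 << bits
    return diff < 0

print(time_after(2, 253))    # True: jiffies wrapped past 0
print(time_after(110, 150))  # False: 110 is not after 150
print(time_after(128, 127))  # True
print(time_after(129, 0))    # False: a has "caught up" with b
```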
From nearby in the same file:
/*
* Have the 32 bit jiffies value wrap 5 minutes after boot
* so jiffies wrap bugs show up earlier.
*/
#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
One would hope this means it's pretty well tested.
If a server received a base64 string and wanted to check its length before converting (say it wanted to always permit the final byte array to be 16KB), how big could a 16KB byte array possibly become when converted to a Base64 string (assuming one byte per character)?
Base64 encodes each set of three bytes into four bytes. In addition the output is padded to always be a multiple of four.
This means that the size of the base-64 representation of a string of size n is:
ceil(n / 3) * 4
So, for a 16kB array, the base-64 representation will be ceil(16*1024/3)*4 = 21848 bytes long ~= 21.8kB.
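That formula is easy to verify against an actual encoder (standard base64, no line breaks; Python here):

```python
import base64
import math

data = bytes(16 * 1024)           # a 16 kB payload
encoded = base64.b64encode(data)  # standard base64, no line breaks

predicted = math.ceil(len(data) / 3) * 4
print(len(encoded), predicted)    # 21848 21848
```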
A rough approximation would be that the size of the data is increased to 4/3 of the original.
From Wikipedia:
Note that given an input of n bytes, the output will be (n + 2 - ((n + 2) % 3)) / 3 * 4 bytes long, so that the number of output bytes per input byte converges to 4 / 3 or 1.33333 for large n.
So 16kB * 4/3 comes to a little over 21.3kB, or 21848 bytes, to be exact.
Hope this helps
16kb is 131,072 bits. Base64 packs 24-bit buffers into four 6-bit characters apiece, so you would have 5,462 * 4 = 21,848 bytes.
Since the question was about the worst possible increase, I must add that there are usually line breaks at around every 76 characters. This means that if you are saving base64 encoded data into a text file, each line break adds 2 bytes on Windows and 1 byte on Linux.
The increase from the actual encoding has been described above.
This is a future reference for myself. Since the question is about the worst case, we should take line breaks into account. While RFC 1421 defines the maximum line length to be 64 chars, RFC 2045 (MIME) states there'd be at most 76 chars in one line.
The latter is what the C# library implements. So in a Windows environment, where a line break is 2 chars (\r\n), we get this: Length = Floor(Ceiling(N/3) * 4 * 78 / 76)
Note: the flooring is because, during my tests with C#, if the last line ends at exactly 76 chars, no line break follows.
I can prove it by running the following code:
byte[] bytes = new byte[16 * 1024];
Console.WriteLine(Convert.ToBase64String(bytes, Base64FormattingOptions.InsertLineBreaks).Length);
The answer for 16 kBytes encoded to base64 with 76-char lines: 22422 chars
I assume in Linux it'd be Length = Floor(Ceiling(N/3) * 4 * 77 / 76), but I haven't gotten around to testing it on .NET Core yet.
Also, it would depend on the actual character encoding: if we encode to a UTF-32 string, each base64 character would consume 3 additional bytes (4 bytes per char).
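The counting can also be done directly: one line break after every full 76-char line, but none after a final line that ends exactly at 76 chars. A quick cross-check against the C# result above (a sketch; newline_len=2 for Windows \r\n, 1 for Linux \n):

```python
import math

def b64_len_with_breaks(n, newline_len=2, line_len=76):
    enc = math.ceil(n / 3) * 4    # plain base64 length for n input bytes
    # one break per completed line, but none after a final exact line
    return enc + newline_len * ((enc - 1) // line_len)

print(b64_len_with_breaks(16 * 1024, newline_len=2))  # 22422 (Windows)
print(b64_len_with_breaks(16 * 1024, newline_len=1))  # 22135 (Linux)
```

For 16 kB this agrees with the C# measurement of 22422 chars, and the Linux value matches Floor(Ceiling(N/3) * 4 * 77 / 76).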