Paraview "possible mismatch of datasize with declaration" error - vtk

Paraview (v4.1.0 64-bit, OSX 10.9.2) is giving me the following error:
Generic Warning: In /Users/kitware/Dashboards/MyTests/NightlyMaster/ParaViewSuperbuild-Release/paraview/src/paraview/VTK/IO/Legacy/vtkDataReader.cxx, line 1388
Error reading ascii data. Possible mismatch of datasize with declaration.
I'm not sure why. I've double-checked that the fields are all of the expected lengths, and none of the values are NaN, inf, or otherwise extremely large. The issue starts with the output from timestep 16 (timesteps 0-15 produce no error). Graphically, steps 0-15 plot my data as expected; step 16 shows the "Y/Yc" series with an unexpectedly large point at (0.5625, 2.86616e+36).
This file is fine:
http://www.filedropper.com/ring0000015
This file produces the error:
http://www.filedropper.com/ring0000016

I had been facing the same problem for six months and struggling to find a solution. I was given the following reasons to explain the error (http://www.cfd-online.com/Forums/paraview/139451-error-while-reading-vtk-files-paraview.html#post503315):
1. It could be a problem with the character used for the line ending (http://en.wikipedia.org/wiki/Newline). In a nutshell:
a) On Windows, lines end with CR+LF.
b) On Linux, lines end with LF only.
c) On Mac, some older versions used CR only; nowadays it should use LF as well.
CR = "Carriage Return" byte
LF = "Line Feed" byte
2. There might be one or more values of type NaN or Inf, or some other special numeric representation of non-real numbers. They might be readable on Linux but not on Mac, perhaps because of the next possibility.
3. Location-based numeric conventions, aka locale, might be causing values to be stored with decimal commas or with an unusual scientific notation. For example, the value "1.0002" might be stored as "1,0002" or even "1.0002ES+000".
I have viewed other forums, and they have generally stated reasons 2 and 3 along with possible solutions, which have generally worked. However, none of the above solved my problem.
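If you want to rule out reasons 1 and 3 mechanically, a quick pass over the file helps. A minimal sketch in Python (the file name is a hypothetical stand-in for your output file):
raw = open("ring_0000016.vtk", "rb").read()
# reason 1: normalise CR+LF and bare CR line endings to LF
raw = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
# reason 3: decimal commas hint that the writer ran under a decimal-comma locale
if b"," in raw:
    print("warning: commas found - possible locale problem in the writer")
open("ring_0000016_lf.vtk", "wb").write(raw)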
I noticed that some of the stored solution values in the ASCII files were as small as 10.e-34. I had a feeling that underflow conditions might be triggering the problem, so I put a check in my code for underflow and rounded such values off to 0. This fixed the issue; the solution is now displayed at all times without error messages.
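The same clamping can be done as a post-processing step before the values are written out. A minimal sketch in Python, assuming the data sits in a NumPy array (the threshold is an assumption; pick one comfortably above the single-precision minimum of about 1.2e-38):
import numpy as np
values = np.array([1.0, 3.2e-36, -4.7e-35, 0.25])
tiny = 1e-30
# round anything below the underflow threshold off to zero
values[np.abs(values) < tiny] = 0.0
print(values)   # [1.   0.   0.   0.25]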

This may not fix the Inf/NaN problems, but if the numbers in the vtk file are too large or too small (e.g. 1e-50 or 1e45), this can cause the same error.
One solution in this case is to change the datatype specification. When I had this problem, I had specified the datatype as "float", which uses a 32-bit floating-point representation (the same as "float32"). Changing it to "float64" selects a 64-bit double-precision representation, which is consistent with my C++ code that generated the vtk file, which uses doubles. This eliminated the problem for me.
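In the legacy ASCII format, the corresponding change is in the SCALARS declaration. A minimal sketch in Python of writing the data section (the field name "pressure" and the writer itself are hypothetical stand-ins for whatever generates your file):
values = [1.0, 3.2e-36, 0.25]
with open("out.vtk", "a") as f:
    f.write("SCALARS pressure double 1\n")   # was: SCALARS pressure float 1
    f.write("LOOKUP_TABLE default\n")
    for v in values:
        f.write("%.16e\n" % v)               # write full double precision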

If you are using Fortran, this problem can also occur when you write to a file but do not close it in your code.
For example:
do i=1,10
  write(numb,'(i3)') i
  open(unit=1, file='test'//numb//'.vtk')
  write(1,*).......
enddo
Adding close(1) before the enddo, so that each file is flushed and closed, avoids the problem.

How is this error possible and what can be done about it? "ValueError: invalid literal for int() with base 10: '1.0'"

I'm using Python 3 with the pandas library and some other data science libraries. I ran into a variety of subtle type errors while trying to compare values across two columns of a single pandas DataFrame that should both contain like integer values (although the interpreter variously treats them as float, string, or Series, seemingly almost at random). Now I'm running into this inexplicable, nonsensical-seeming error while attempting to cast back to integer, after converting the values to string to strip out blank spaces that were introduced much further upstream in the program flow (presumably by pandas's internal processing, because my code tries to keep the type int throughout).
ValueError: invalid literal for int() with base 10: '1.0'
The main problem I have with this error message is that there should be no reason a conversion to int would ever blow up on the value '1.0'. Taken at face value, the message makes no sense to me and seems like a deeper problem or bug in pandas.
But ignoring more fundamental problems or bugs in Python or pandas, any help resolving this in a generalizable way that behaves consistently in every reasonable scenario (more like strongly-typed, type-safe code, basically) would be appreciated.
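For reference, the failure is reproducible in plain Python, without pandas: int() only accepts strings that look like integer literals.
int('1')            # 1
int(float('1.0'))   # 1 - going through float first works
int('1.0')          # ValueError: invalid literal for int() with base 10: '1.0'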
Here's the bit of code where I'm trying to deal with all the various type-conversion and blank-value issues at once. I've gone around on this a few times in subtly different scenarios, and every time I thought I'd finally bullet-proofed this bit of code and gotten it working as intended in every case, some new unexpected type-conversion issue like this crops up.
df[getVariableLabel(outvar)] = df[getVariableLabel(outvar)].astype(str).str.strip()
df['prediction'] = df['prediction'].astype(str).str.strip()
actual = np.array(df[getVariableLabel(outvar)].fillna(-1).astype(int))
# this is the specific line that throws the error
predicted = np.array(df['prediction'].fillna(-1).astype(int))
For further context on the code above: the "df" object is a pandas DataFrame passed in as a parameter, and "getVariableLabel" is a helper function used to format a dynamic field name. Both columns contain simple "1" and "0" values, except where there may be NaN/blanks (which I'm attempting to fill with dummy values).
It doesn't really have to be a conversion to int for my needs. String values would be fine too, if it were possible to keep pandas/Python from arbitrarily treating one series as ints and the other as floats before the conversion to string, which makes value comparisons between the two sets of values fail.
Here's the bit of the call stack dump where pandas is throwing the error, for further context:
File "C:\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py",
line 874, in astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas_libs\lib.pyx", line 560, in
pandas._libs.lib.astype_intsafe
Solved it for myself with the following substitution, in case anyone else runs into this. It may also have helped that I updated pandas from 1.0.1 to 1.0.2, since that update includes some type-conversion bug fixes, but more likely it was this workaround (where pd is, of course, the alias for the pandas library):
df[getVariableLabel(outvar)] = pd.to_numeric(df[getVariableLabel(outvar)])
df['prediction'] = pd.to_numeric(df['prediction'])
The original ValueError message is still confusing and seems like a bug, but this worked in my particular case.
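If integers (with a dummy value for blanks) are still needed at the end, the same idea extends to the original pipeline. A sketch assuming the same DataFrame and helper as above; errors='coerce' turns unparseable strings into NaN rather than raising:
import numpy as np
actual = np.array(pd.to_numeric(df[getVariableLabel(outvar)], errors='coerce').fillna(-1).astype(int))
predicted = np.array(pd.to_numeric(df['prediction'], errors='coerce').fillna(-1).astype(int))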

DEFLATE (RFC1951) dynamic huffman "incomplete length"

I've been studying RFC1951 and 'puff.c', and have a question about the issue of "incomplete length".
As near as I can tell, defining a "dynamic" Huffman code table that allows for more codes than the declared HLIT+257 will produce an error, at least from puff.c. For example, puff.c produces an error if, as a simple debugging test, I use a Huffman table of all 9-bit codes to define only 257 lit/len codes. Is this outcome purposeful or a bug? And can I assume that any "inflator" based on the zlib library will produce the same error?
I can't find anything in RFC 1951 that should REQUIRE the use of a sufficiently tight Huffman code. Certainly, I can see that using an "under-subscribed" Huffman table might be inefficient in terms of compression, but I'm not sure why such a table should be prohibited.
My interest isn't simply hypothetical. I really want to use an under-subscribed, literal-only Huffman code (but NOT the example cited above) to compress some application-specific images into PNG files, and I want to make sure it will work with any PNG image viewer.
The RFC specifies that the codes are Huffman codes, which by definition are complete codes. (Complete means that all bit patterns are used.)
zlib will reject incomplete or oversubscribed codes, except in the special case noted in the RFC:
If only one distance code is used, it is encoded using one bit, not
zero bits; in this case there is a single code length of one, with one
unused code.
There, the incomplete code 0 for the single symbol, with code 1 unused, is permitted.
(That, by the way, is unnecessary. If there is only one distance symbol, then you don't need any bits to specify it. You know that that distance symbol must be used with any length. If that symbol needs extra bits, then those extra bits immediately follow the length. But, oh well -- for that case Phil Katz put an extraneous zero bit in every match, and now we're stuck with it.)
The fact that the RFC even had to note this special case is another clue that incomplete codes are not accepted otherwise.
There is sort of another exception in deflate, in that the fixed literal/length code is incomplete, with two unused codes at the end.
The bottom line is, no, you will not be able to use an incomplete code in a dynamic header (except the special case) and expect zlib or any compliant deflate decoder to be able to decode it.
As for why this strictness is useful: constraints on dynamic headers permit rapid detection of non-deflate streams or corrupted deflate streams. Similarly, zlib does not permit a dynamic header with no end-of-block code, so that a bogus dynamic header cannot make any following random bits decodable forever without ever detecting an error. The unused fixed codes also help in this regard, since they eventually trigger an error in random input.
By the way, if you want to define a fixed, complete Huffman code for your case, it's super simple, and would reduce the size of almost all of your codes by one bit. Just encode eight bits for the symbols 0..253, using that symbol number directly as the code (reversing the bits of course), and nine bits for symbols 254..257, using the codes 508..511 (bits reversed).
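A quick way to check that this suggested code is complete is the Kraft equality: the code lengths must satisfy sum(2^-length) = 1 over all symbols. A sketch in Python that verifies the length assignment above and reproduces the code values with the canonical-code algorithm from RFC 1951, section 3.2.2 (the bit-reversal for the deflate bit stream is omitted here):
lengths = [8] * 254 + [9] * 4                    # symbols 0..253 and 254..257
assert sum(2.0 ** -l for l in lengths) == 1.0    # complete: Kraft sum is exactly 1

def canonical_codes(lengths):
    # the algorithm from RFC 1951, section 3.2.2
    max_len = max(lengths)
    bl_count = [0] * (max_len + 1)
    for l in lengths:
        bl_count[l] += 1
    code, next_code = 0, [0] * (max_len + 1)
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes = []
    for l in lengths:
        codes.append(next_code[l])
        next_code[l] += 1
    return codes

codes = canonical_codes(lengths)
assert codes[:254] == list(range(254))           # 8-bit codes 0..253
assert codes[254:] == [508, 509, 510, 511]       # 9-bit codes for symbols 254..257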

When will _ATL_ALLOW_CHAR_UNSIGNED work?

I'm migrating a Visual C++ project which uses ATL/MFC from VS2010 to VS2013. The project compiles with /J ("assume char is unsigned"), and there is too much code that may or may not rely on that fact to easily remove the compiler flag.
Under VS2013, /J causes a compiler error in atldef.h: "ATL doesn't support compilation with /J or _CHAR_UNSIGNED flag enabled". This can be suppressed by defining _ATL_ALLOW_CHAR_UNSIGNED. Microsoft mentions this in the MSDN documentation for /J, along with the vague statement: "If you use this compiler option with ATL/MFC, an error might be generated. Although you could disable this error by defining _ATL_ALLOW_CHAR_UNSIGNED, this workaround is not supported and may not always work."
Does anyone know under what circumstances it is safe or unsafe to use _ATL_ALLOW_CHAR_UNSIGNED?
Microsoft struggles to keep ancient codebases like ATL compatible with changes in the compiler. The principal troublemaker here is the AtlGetHexValue() function. It had a design mistake:
The numeric value of the input character interpreted as a hexadecimal digit. For example, an input of '0' returns a value of 0 and an input of 'A' returns a value of 10. If the input character is not a hexadecimal digit, this function returns -1.
That -1 is the rub; nine years ago, it broke with /J in effect. And the function won't actually return -1 today: it now returns CHAR_MAX ((char)255) if you compile with /J. That change was required because comparing an unsigned char to -1 is always false, so the compiler omits the entire if() statement. This broke ATL itself, and it will also break your code in a very nasty way if you use this function, given that this code is on an error path that is unlikely to be tested.
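To see why the check goes dead, here is the arithmetic outside C++. A sketch in Python, where masking with 0xFF plays the role of storing a value in an unsigned char (which is what /J makes char):
def as_unsigned_char(x):
    return x & 0xFF           # what storing a value in an unsigned char does

ret = as_unsigned_char(-1)    # the function "returns -1"; the caller sees 255
print(ret == -1)              # False: the error check is never taken
print(ret)                    # 255, i.e. CHAR_MAX under /J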
Shooting from the hip, there were three basic ways they could have solved this problem: change the return type to int, risking breaking everybody; document the special behavior in the MSDN article, making everybody's eyes roll; or invoke the "time to move on" option. They picked the last one, and it was about time, with MSVC++ being the laughingstock of the programming world back then.
That's about all you need to fear from ATL itself: the odds that you use this function are low, and it is easy to find. Otherwise, it is an excellent hint of the kind of trouble you might get from your own code.

Fortran `write (*, '(3G24.16)')` error

I have a Fortran file that must write these complicated numbers; basically, I can't change these numbers:
File name: complicatedNumbers.f
      implicit none
      write (*,'(3G24.16)') 0.4940656458412465-323, 8.651144521298990, 495.6336980600139
      end
It's then compiled and run with gfortran -o outa complicatedNumbers.f on my Ubuntu machine, but this error comes up:
Error: Expected expression in WRITE statement at (1)
I'm sure it has something to do with the complicated numbers, because there are no errors if I change the three complicated numbers to simple numbers such as 11.11, 22.2, 33.3.
This is actually a stripped-down version of a complex Fortran file that contains many variables and links to other files. So ideally, the 3G24.16 should not be changed.
What does the 3G24.16 mean?
How can I fix it so that I can ultimately print out these numbers with ./outa?
There is nothing syntactically wrong in the snippet you've shown us. However, your use of a file name with the suffix .f makes me think the compiler is assuming your code is written in fixed form; that is the usual default behaviour of gfortran. If that is the case, it probably truncates that line at column 72, at about the last comma, which means that the compiler sees
write (*,'(3G24.16)') 0.4940656458412465-323, 8.651144521298990,
and raises the complaint you have shared with us. Either join us in the 21st century and switch to free-form source files (change .f to .f90 and see what fun ensues), or continue the line correctly with some character in column 6 of the next line.
As to what 3G24.16 means, refer to your favourite Fortran reference material under the heading of data edit descriptors, in particular the g data edit descriptor. (In short: three values, each written with the general edit descriptor in a field 24 characters wide with 16 significant digits.)
Oh, and if my bandying about of the terms fixed form source and free form source bamboozles you, read about them in your favourite Fortran reference material too.
There are three errors in your program:
1. Since you clearly use Fortran fixed format, instructions are limited to columns 1-72 (132 in free format).
2. The number 0.4940656458412465-323 is probably not written as intended: the exponent character is missing. Try 0.4940656458412465D-323 instead. As written, Fortran computes a subtraction, so 0.4940656458412465-323 evaluates to -322.505934354159. Note that I propose the exponent letter D (double precision): writing 0.4940656458412465E-323 would not work, because a value around 1e-323 is far below the smallest single-precision magnitude (about 1.2e-38) and is representable only in double precision (and even then only as a denormal).
3. The other numbers should also carry a double-precision exponent (d0), because in single precision the number of significant digits does not exceed 6 or 7, while the format requests 16.
Possible correction, still in fixed format:
      implicit none
      write (*,'(3G24.16)') 0.4940656458412465D-323,
     &   8.651144521298990d0,
     &   495.6336980600139d0
      end

linux libiconv transcode from ISO8859 or IBM850 to UTF8 error

I don't know what the original encoding is, so I assume it is either IBM850 or ISO8859-1. My process is:
1. Try IBM850 -> UTF8. If this succeeds, I consider the original encoding to be IBM850; if it fails, go to the next step.
2. Try ISO8859-1 -> UTF8. If this succeeds, I consider the original encoding to be ISO8859-1.
But there is a problem:
if the original encoding is ISO8859-1, it is recognised as IBM850;
if the original encoding is IBM850, it is recognised as ISO8859-1.
It seems that IBM850 and ISO8859-1 have a lot of byte values in common.
Can anyone help? Thanks.
Yes, only the most trivial kind of autodetection is possible by testing whether a conversion fails or succeeds, and it's not going to work for input encodings in which (almost) any byte sequence is valid. Both IBM850 and ISO8859-1 assign a character to every byte value, so a conversion from either will essentially never fail.
You need to know something more about your likely input, to test whether it makes more sense after conversion from IBM850 or from ISO8859-1. That's what enca and libenca do. You can start with some simple expectations to check, as sketched after this list:
Does your source happen to fall within the ASCII subset common to both encodings? Then any conversion makes you happy (but you have no way of knowing the original encoding at all).
Does your text use box-drawing characters? If it should not, it is easy to reject some candidates for IBM850.
Does your text use the C1 control characters of ISO8859-1? If it should not, any byte in the range 0x80-0x9F lets you reject ISO8859-1 as a candidate.
Do the non-ASCII fragments always represent text in a natural language? Then you can use frequency tables for characters and character pairs, selecting the source encoding that brings the result closer to your natural language(s) by these criteria. (If both variants are almost equally acceptable, it is probably better to report an error and leave the final decision to a human.)
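A minimal sketch of the first three checks in Python, where the codec names cp850 and latin-1 correspond to IBM850 and ISO8859-1 (the box-drawing set and the decision order are assumptions to adapt to your data):
def guess_encoding(raw: bytes, expect_box_art: bool = False) -> str:
    # pure ASCII: both decodings agree, so the original encoding is unknowable
    if all(b < 0x80 for b in raw):
        return "ascii"
    # bytes 0x80-0x9F are C1 control codes in ISO 8859-1 - rare in real text
    if any(0x80 <= b <= 0x9F for b in raw):
        return "cp850"
    # bytes that decode to box-drawing characters suggest IBM850 screen art
    text = raw.decode("cp850")
    if not expect_box_art and any(ch in "─│┌┐└┘├┤┬┴┼░▒▓█" for ch in text):
        return "latin-1"
    # still ambiguous: a real detector would score character frequencies here
    return "latin-1"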
