Background
I have an application that generates files that should be in Zip format, PKZIP version 6.3.3 to be exact. (For the curious: SIARD 2.0)
Sample File
I have uploaded a sample file to Google Drive:
sample.siard
Problem
When I point Infozip's unzip under Linux at the file, it complains:
testing: content/ OK
testing: content/schema0/ OK
testing: content/schema0/table0/ OK
testing: content/schema0/table0/table0.xml
error: invalid compressed data to inflate
...
The same error is given for all actual files (not directories).
Verbose file listing (unzip -v file) gives:
...
6064 Defl:F 1868 69% 2018-01-30 10:41 055f9f61 content/schema0/table0/table0.xml
...
(no errors here)
Infozip version
I have a reasonably new version of Infozip. unzip -v gives:
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
Compiled with gcc 4.9.2 for Unix (Linux ELF) on Jan 28 2017.
UnZip special compilation options:
ACORN_FTYPE_NFS
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
SET_DIR_ATTRIB
SYMLINKS (symbolic links supported, if RTL and file system permit)
TIMESTAMP
UNIXBACKUP
USE_EF_UT_TIME
USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
LARGE_FILE_SUPPORT (large files over 2 GiB supported)
ZIP64_SUPPORT (archives using Zip64 for large files supported)
USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.6, 6-Sept-2010)
VMS_TEXT_CONV
WILD_STOP_AT_DIR
[decryption, version 2.11 of 05 Jan 2007]
The only thing listed as NOT supported is unreducing, but that shouldn't be relevant.
When I try Python's zipfile module, it both tests and extracts the archive with no problem. I have also heard that PKZIP itself has no problem with these files, but I don't have it installed myself.
So, I have no problem using these files myself, but they are intended for long-term archiving and I really need to know:
The question
Is there a way for me to find out if there is a bug in the generation of these files or is there a bug in unzip's handling of them?
ZIP64?
I have searched the web and found a lot of people having problems with large files and the Zip64 format. However, my files are not large (up to 20 MB uncompressed).
Also, this version of unzip should support Zip64. (See version info above)
Tools
My preferred tools are Python, hex editors and the bash command line.
At face value, the message "invalid compressed data to inflate" suggests your zip file is corrupt. Are you certain that the exact same file can be read successfully with PKZIP, but not with Infozip?
After a (very) quick glance at the SIARD standard, it looks like it just uses bog-standard zip files with deflate/store compression. That means that the zip file won't have used a feature that only PKZIP can handle.
One possibility is that the archive has been created with Zip64 extensions, but your version of Infozip doesn't support them.
If you run unzip -v it should print a line containing the string ZIP64_SUPPORT if it does.
For reference, this is what I get:
$ unzip -v
UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
Compiled with gcc 4.8.3 20140911 (Red Hat 4.8.3-7) for Unix (Linux ELF) on Feb 25 2015.
UnZip special compilation options:
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
SET_DIR_ATTRIB
SYMLINKS (symbolic links supported, if RTL and file system permit)
TIMESTAMP
UNIXBACKUP
USE_EF_UT_TIME
USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
MBCS-support (multibyte character support, MB_CUR_MAX = 6)
LARGE_FILE_SUPPORT (large files over 2 GiB supported)
ZIP64_SUPPORT (archives using Zip64 for large files supported)
USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.6, 6-Sept-2010)
VMS_TEXT_CONV
[decryption, version 2.11 of 05 Jan 2007]
UnZip and ZipInfo environment options:
UNZIP: [none]
UNZIPOPT: [none]
ZIPINFO: [none]
ZIPINFOOPT: [none]
To check whether your zip file uses Zip64, check the final 6 bytes of the zip file. If the first 4 are all 0xFF (this is the Offset to Central Dir field), it is very likely you have a Zip64 archive. Note that this will not work if your zip file has a comment.
For reference, below is a dump from a zip file that uses Zip64. Note that the value of the Offset to Central Dir field is FFFFFFFF:
10000020C 000000004 50 4B 05 06 END CENTRAL HEADER 06054B50
100000210 000000002 00 00 Number of this disk 0000
100000212 000000002 00 00 Central Dir Disk no 0000
100000214 000000002 04 00 Entries in this disk 0004
100000216 000000002 04 00 Total Entries 0004
100000218 000000004 DA 00 00 00 Size of Central Dir 000000DA
10000021C 000000004 FF FF FF FF Offset to Central Dir FFFFFFFF
100000220 000000002 00 00 Comment Length 0000
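If a hex editor feels too manual, here is a minimal Python sketch of the same check (my own helper, assuming the archive has no trailing comment, as noted above); it reads the End of Central Directory record and looks at the Offset to Central Dir field:

import struct
import sys

def looks_like_zip64(path):
    """Heuristic: True if the Offset to Central Dir field in the End of
    Central Directory record is 0xFFFFFFFF (the Zip64 marker).
    Only valid when the archive has no trailing comment."""
    with open(path, "rb") as f:
        f.seek(-22, 2)               # the EOCD record is 22 bytes when the comment is empty
        eocd = f.read(22)
    if eocd[:4] != b"PK\x05\x06":    # EOCD signature
        raise ValueError("no EOCD record at the expected offset "
                         "(the archive probably has a comment)")
    cd_offset = struct.unpack("<I", eocd[16:20])[0]
    return cd_offset == 0xFFFFFFFF

if __name__ == "__main__":
    print(looks_like_zip64(sys.argv[1]))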
Self-answer.
My subject line was
Finding out what Infozip's unzip is complaining about
Getting that answer turned out to require downloading the unzip source code, adding a lot of debug messages, and reading them.
In this particular case, unzip was complaining because the zip file used post-file data descriptors without setting the header flag that indicates it (general purpose bit flag 3).
Normally one should set this flag and set the header CRC/length fields to all zeros.
This file did not have the flag set, yet the fields were still set to zero. unzip then thinks, "Oh, the length really must be zero!"
Then the actual non-zero data appears and unzip gets all grumpy. The post-file data descriptor did not help.
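For anyone who wants to see this without building a debug unzip: below is a minimal Python sketch (my own, not part of any tool) that walks the entries via the zipfile module and reads each local file header directly, flagging entries whose CRC/size fields are zero even though general purpose bit flag 3 is not set. The 30-byte local header layout comes from the ZIP appnote; treat the output as a heuristic, since genuinely empty files also have all-zero fields.

import struct
import sys
import zipfile

def check_local_headers(path):
    """Report entries whose local header has zeroed CRC/size fields
    while general purpose bit 3 (data descriptor follows) is NOT set."""
    with zipfile.ZipFile(path) as zf, open(path, "rb") as raw:
        for info in zf.infolist():
            raw.seek(info.header_offset)
            hdr = raw.read(30)                  # fixed part of the local file header
            if hdr[:4] != b"PK\x03\x04":
                continue
            (flags,) = struct.unpack_from("<H", hdr, 6)
            crc, csize, usize = struct.unpack_from("<III", hdr, 14)
            bit3 = bool(flags & 0x0008)
            if not bit3 and crc == csize == usize == 0 and not info.is_dir():
                print("suspect entry:", info.filename)

if __name__ == "__main__":
    check_local_headers(sys.argv[1])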
In the body of the question I asked:
Is there a way for me to find out if there is a bug in the generation of these files or is there a bug in unzip's handling of them?
I personally think these files are broken. I haven't talked to the people responsible for generating them, but I think I have a good case that they have done something wrong.
On a more philosophical note:
There are two schools on how unzippers should work.
One is the "best effort" school, which says that the program should do whatever it can to recover the files inside regardless of how wrong the formatting is. (There are obviously limits to this)
The other school is the "Not my problem" school of thought that says that if the zip file is in a wrong format, then the unzipper shouldn't touch it. Let the makers of the zipfile fix their problem instead.
PKWARE itself is firmly in the first school of thought, while Infozip is in the second.
Related
terminfo-0.4.1.4 breaks a build of HLS because it cannot find a specific version of some symbols from libtinfo.so, one of them being tigetnum.
There is also this warning, which I see when building other programs:
/lib64/libtinfo.so.5: no version information available (required by terminfo-0.4.1.4/libHSterminfo-0.4.1.4-ghc8.10.7.so)
The following is an error that breaks building of haskell-language-server:
terminfo-0.4.1.4/libHSterminfo-0.4.1.4-ghc8.10.7.so: error: undefined reference to 'tigetnum', version 'NCURSES_TINFO_5.0.19991023'
This uses the versioned-symbols feature of the GNU toolchain, where you can have symbols with different versions in the same ELF file. However, my copy of libtinfo.so does not have versioned symbols, and I don't see anything in the terminfo code on GitHub that would indicate it requires versioned symbols. That said, I am not sure what I should be looking for; I just grepped the sources for the version.
objdump -x invoked on libHSterminfo-0.4.1.4-ghc8.10.7.so:
Version definitions:
1 0x01 0x057dc17f libHSterminfo-0.4.1.4-ghc8.10.7.so
Version References:
required from libtinfo.so.5:
0x02a6c513 0x00 02 NCURSES_TINFO_5.0.19991023
required from libc.so.6:
0x09691a75 0x00 03 GLIBC_2.2.5
Unfortunately, I believe this may be a Stack issue where it downloads the wrong versions of GHC on CentOS/RHEL7 systems (i.e., downloading the Debian 9 version instead of the CentOS 7 version).
In order to work around this problem, you will first need to manually clean out your .stack directory to remove the offending binary packages. The bad GHC version (tarfile and directory) will need to be removed from .stack/programs/x86_64-linux. I also found I needed to clean out .stack/snapshots to avoid additional linkage errors, and I can't point you to exactly which files under there were the offenders. It's honestly safest to wipe your entire .stack directory and start from scratch.
Anyway, I was able to build haskell-language-server in a CentOS 7 container as follows:
Ensure ncurses-devel and zlib-devel packages are installed.
Remove the entire .stack directory.
Run stack update to recreate a skeleton .stack directory.
Edit .stack/config.yaml to add the following clause:
setup-info:
  ghc:
    linux64:
      8.10.7:
        url: "https://downloads.haskell.org/~ghc/8.10.7/ghc-8.10.7-x86_64-centos7-linux.tar.xz"
Obviously, this is per-version configuration, so if you build with other GHC versions, you'll need to add additional subclauses.
Download and check out tag 1.7.0.0 of the HLS source:
git clone -b 1.7.0.0 https://github.com/haskell/haskell-language-server
Build.
$ cd haskell-language-server
$ stack build
[ . . . lots of output . . .]
haskell-language-server > Registering library for haskell-language-server-1.7.0.0..
Completed 247 action(s).
Again, it's crucial that no stale binaries from .stack or the project-local .stack-work sneak into your build. So removing .stack entirely and also building from a fresh Git clone of HLS will be safest.
Until this Stack bug is fixed, you'll have to remain vigilant in adding additional clauses to config.yaml to avoid accidentally downloading a Debian build of some older or newer GHC version and wrecking your .stack cache.
Windows provides resource files for version information in applications and DLLs. A resource file includes information like the version, copyright, and manufacturer.
We have a shared library and would like to add version information.
How can we do it on Linux with a shared library?
The short version is that you do this via the soname of the library. Read chapter 3 at http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html as well as chapter 3.3 ABI Versioning at http://www.akkadia.org/drepper/dsohowto.pdf
The best way to handle this is using libtool, which does the versioning for you.
Essentially, version information is not (or not primarily; I don't know offhand) encoded in the library itself, but rather in its filename. Version numbers are normally given in three-dot format, with the major number increasing for each break in downward ABI compatibility, the middle for breaks in upward ABI compatibility, and the minor for patches that did not change the ABI.
Like qdot noted, symlinks in the lib directory provide the essential versioning. There is a symlink without a version number (libfoo.so) for the currently installed development headers, a symlink with a major number for each installed major version (libfoo.so.1) and a real file with the full version number. Normally, programs are linked to use libfoo.so.1 at runtime so that multiple major versions may coexist.
Linux uses the following strategy: you (the system maintainer) provide symlinks pointing to a 'specific' shared library file, like this:
lrwxrwxrwx 1 root root 16 2011-09-22 14:36 libieee1284.so -> libieee1284.so.3
lrwxrwxrwx 1 root root 20 2011-09-22 14:36 libieee1284.so.3 -> libieee1284.so.3.2.2
-rw-r--r-- 1 root root 46576 2011-07-27 13:08 libieee1284.so.3.2.2
This way, developers can link against -lieee1284 (any ABI version), against libieee1284.so.3, or even against the specific release and patch version (3.2.2).
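If you want to double-check which version name an existing library actually advertises (the soname that linking programs will record), a small Python wrapper around readelf works. This is just a sketch; the soname() helper is my own, and it assumes binutils' readelf is installed:

import subprocess
import sys

def soname(path):
    """Return the DT_SONAME entry of a shared library (or None),
    parsed from the dynamic section printed by readelf -d."""
    out = subprocess.run(["readelf", "-d", path],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if "(SONAME)" in line:
            # e.g.:  0x...  (SONAME)  Library soname: [libieee1284.so.3]
            return line.split("[", 1)[1].split("]", 1)[0]
    return None

if __name__ == "__main__":
    print(soname(sys.argv[1]))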
I'm following this video tutorial for using the Intel compiler, and the first thing to do is to source the compiler's environment script. In the video, this is the command:
source /opt/intel/composer_xe_2015.0.019/bin/iccvars.sh intel64
However, I'm using the 2017 version and the dir tree is different. I found the same file in:
source /opt/intel/compilers_and_libraries_2017.1.132/linux/bin/iccvars.sh intel64
Is this the analogous command, or do I need to do something else?
The linux/ subdirectory was added with ICC 17; 16 and below didn't have it.
Yes, they're basically the same.
In my VC2005 solution, when I build it, some warnings are displayed, such as "warning LNK4099: PDB 'libbmt.pdb' was not found...", but I don't know how to disable them.
It cannot be disabled, as it is on Microsoft's list of unignorable warnings.
If you have the source for the libraries you are using, you can rebuild them in Debug mode and copy the generated *.pdb files to the same directory as the libs you are linking.
If you do not have the source, there is a workaround, but it involves hex-editing the linker: https://connect.microsoft.com/VisualStudio/feedback/details/176188/can-not-disable-warning-lnk4099
Essentially, hex edit your link.exe (after backing it up!) to zap the
occurrence of 4099 in the list of non-ignorable warnings. I did it and
the hundred or so 4099 warnings disappeared! [L]ook
for the hex bytes 03 10 00 00 (which is 4099 as a 32-bit little-endian
hex value). Change it to (say) FF FF 00 00, save the file and you're
done.
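If you do go that route, the byte patch can be scripted instead of done by hand. This Python sketch (my own) implements the edit described in the workaround quoted above; it writes a separate .patched copy and refuses to do anything unless the byte pattern occurs exactly once, since 03 10 00 00 could in principle appear elsewhere in the executable:

import struct
import sys

def patch_lnk4099(path):
    """Replace the little-endian 32-bit value 4099 (03 10 00 00) with
    0xFFFF (FF FF 00 00), writing the result to <path>.patched."""
    data = bytearray(open(path, "rb").read())
    needle = struct.pack("<I", 4099)               # 03 10 00 00
    count = data.count(needle)
    print("occurrences of 03 10 00 00:", count)
    if count != 1:
        print("refusing to patch: expected exactly one occurrence")
        return
    idx = data.find(needle)
    data[idx:idx + 4] = struct.pack("<I", 0xFFFF)  # FF FF 00 00
    with open(path + ".patched", "wb") as out:
        out.write(data)
    print("wrote", path + ".patched")

if __name__ == "__main__":
    patch_lnk4099(sys.argv[1])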
I don't know about VS2005, but in newer versions you can ignore specific linker warnings by adding /ignore:4099 to the linker options.
We have an issue related to a Java application running under a (rather old) FC3 on an Advantech POS board with a Via C3 processor. The Java application has several compiled shared libs that are accessed via JNI.
The Via C3 processor is supposed to be i686-compatible. Some time ago, after installing Ubuntu 6.10 on a Mini-ITX board with the same processor, I found out that this statement is not 100% true. The Ubuntu kernel hung on startup due to the C3 lacking some specific, optional instructions of the i686 set. The instructions missing from the C3's implementation of i686 are emitted by default by GCC when i686 optimizations are enabled. The solution in this case was to go with an i386-compiled version of the Ubuntu distribution.
The base problem with the Java application is that the FC3 distribution was installed on the HD by cloning it from an image of the HD of another PC, in this case an Intel P4. Afterwards, the distribution needed some hacking to get it running, such as replacing some packages (the kernel, for example) with their i386-compiled versions.
The problem is that after working for a while the system completely hangs without a trace. I am afraid that some i686 code is left somewhere in the system and could be executed randomly at any time (for example after recovering from suspend mode or something like that).
My question is:
Is there any tool or way to find out what specific architecture extensions a binary file (executable or library) requires? file does not give enough information.
The Unix/Linux file command is great for this. It can generally detect the target architecture and operating system for a given binary (and has been maintained on and off since 1973. Wow!)
Of course, if you're not running under Unix/Linux, you're a bit stuck. I'm currently trying to find a Java-based port that I can call at runtime, but no such luck.
The Unix file command gives information like this:
hex: ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.4.17, not stripped
More detail about the architecture is hinted at by the (Unix) objdump -f <fileName> command, which returns:
architecture: arm, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0000876c
This executable was compiled by a GCC cross-compiler (compiled on an x86 machine, targeting the ARM processor).
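If you'd rather not shell out to file or objdump, the same basic facts (ELF class, endianness, machine type) can be read straight from the ELF header in a few lines of Python. Note that this only reproduces what file and objdump already report; it says nothing about optional instruction-set extensions, which is the harder part of the question. The MACHINES table below is my own, deliberately non-exhaustive sample:

import struct
import sys

# A few common e_machine values from the ELF spec (not exhaustive).
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}

def elf_summary(path):
    """Read the ELF identification bytes plus e_type/e_machine."""
    with open(path, "rb") as f:
        ident = f.read(16)                       # e_ident
        if ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        bits = 32 if ident[4] == 1 else 64       # EI_CLASS
        endian = "<" if ident[5] == 1 else ">"   # EI_DATA (1 = little endian)
        e_type, e_machine = struct.unpack(endian + "HH", f.read(4))
    return "%d-bit, machine = %s (e_machine = %d)" % (
        bits, MACHINES.get(e_machine, "unknown"), e_machine)

if __name__ == "__main__":
    print(elf_summary(sys.argv[1]))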
I decided to add one more solution for anyone who gets here: in my case the information provided by file and objdump wasn't enough, and grep isn't much help. I resolved my case with readelf -a -W.
Note that this gives you quite a lot of info. The architecture-related information resides at the very beginning and the very end. Here's an example:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x83f8
Start of program headers: 52 (bytes into file)
Start of section headers: 2388 (bytes into file)
Flags: 0x5000202, has entry point, Version5 EABI, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 8
Size of section headers: 40 (bytes)
Number of section headers: 31
Section header string table index: 28
...
Displaying notes found at file offset 0x00000148 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.16
Attribute Section: aeabi
File Attributes
Tag_CPU_name: "7-A"
Tag_CPU_arch: v7
Tag_CPU_arch_profile: Application
Tag_ARM_ISA_use: Yes
Tag_THUMB_ISA_use: Thumb-2
Tag_FP_arch: VFPv3
Tag_Advanced_SIMD_arch: NEONv1
Tag_ABI_PCS_wchar_t: 4
Tag_ABI_FP_rounding: Needed
Tag_ABI_FP_denormal: Needed
Tag_ABI_FP_exceptions: Needed
Tag_ABI_FP_number_model: IEEE 754
Tag_ABI_align_needed: 8-byte
Tag_ABI_align_preserved: 8-byte, except leaf SP
Tag_ABI_enum_size: int
Tag_ABI_HardFP_use: SP and DP
Tag_CPU_unaligned_access: v6
I think you need a tool that checks every instruction to determine exactly which set it belongs to. Is there even an official name for the specific set of instructions implemented by the C3 processor? If not, it's even hairier.
A quick-and-dirty variant might be to do a raw search in the file, if you can determine the bit patterns of the disallowed instructions. Just test for them directly; this could be done with a simple objdump | grep chain, for instance.
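Here is a rough Python sketch of that raw-search idea (my assumption: scanning for cmov mnemonics, which as far as I recall the early C3 cores do not implement; adjust the prefix list to whatever instructions your CPU is actually missing). It assumes binutils' objdump can disassemble the file:

import subprocess
import sys

def find_mnemonics(path, prefixes=("cmov",)):
    """Disassemble with objdump -d and return lines whose mnemonic starts
    with one of the given prefixes. Crude, but enough to spot whether a
    binary relies on instructions the target CPU may not implement."""
    out = subprocess.run(["objdump", "-d", path],
                         capture_output=True, text=True, check=True).stdout
    hits = []
    for line in out.splitlines():
        parts = line.split("\t")          # address \t opcode bytes \t mnemonic + operands
        if len(parts) >= 3 and parts[2].lstrip().startswith(tuple(prefixes)):
            hits.append(line.strip())
    return hits

if __name__ == "__main__":
    for hit in find_mnemonics(sys.argv[1]):
        print(hit)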
To answer the ambiguity of whether a Via C3 is an i686-class processor: it's not, it's an i586-class processor.
Cyrix never produced a true 686 class processor, despite their marketing claims with the 6x86MX and MII parts. Among other missing instructions, two important ones they didn't have were CMPXCHG8b and CPUID, which were required to run Windows XP and beyond.
National Semiconductor, AMD and VIA have all produced CPU designs based on the Cyrix 5x86/6x86 core (NxP MediaGX, AMD Geode, VIA C3/C7, VIA Corefusion, etc.) which have resulted in oddball designs where you have a 586 class processor with SSE1/2/3 instruction sets.
My recommendation: if you come across any of the CPUs listed above and it's not for a vintage computer project (i.e., Windows 98SE and prior), run screaming away from it. You'll be stuck on slow i386/486 Linux or have to recompile all of your software with Cyrix-specific optimizations.
Expanding upon Hi-Angel's answer, I found an easy way to check the bit width of a static library:
readelf -a -W libsomefile.a | grep Class: | sort | uniq
where libsomefile.a is my static library. This should work for other ELF files as well.
The quickest way to find the architecture would be to execute:
objdump -f testFile | grep architecture
This works even for binary files.