Invalid compressed data--format violated? - Linux

I want to extract data from an xxx.tar.gz file using the tar -zxvf command, but something goes wrong. Here are the details:
suse11-configserver:/home/webapp/wiki # tar -zxvf dokuwiki.20151010.tar.gz
./dokuwiki/
./dokuwiki/._.htaccess.dist
./dokuwiki/.htaccess.dist
./dokuwiki/bin/
./dokuwiki/conf/
./dokuwiki/._COPYING
./dokuwiki/COPYING
tar: Jump to the next head
gzip: stdin: invalid compressed data--format violated
tar: Child returned status 1
tar: Error is not recoverable: exiting now
But the same command, tar -zxvf dokuwiki.20151010.tar.gz, runs fine on Mac OS X, and I cannot figure out why.

Your command is correct. But it seems the file is corrupted.
It's easy to tell when some files are correctly extracted (for example ./dokuwiki/.htaccess.dist) but the rest are not.
Recreate the dokuwiki.20151010.tar.gz file, and make sure it doesn't report errors while doing so.
If you downloaded the file from somewhere, verify the checksum, or at least the file size.
The bottom line is that the file was either created or downloaded incorrectly.
The command you have should work fine with a .tar.gz file.
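If you want to check the archive itself before re-creating or re-downloading it, gzip -t dokuwiki.20151010.tar.gz will report whether the compressed stream is intact. The same check can be done programmatically with zlib; here is a minimal sketch (assuming zlib is installed and the program is linked with -lz; the program itself is just an illustration, not part of gzip):
#include <stdio.h>
#include <zlib.h>

int main(int argc, char *argv[])
{
    char buf[8192];
    int n;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file.gz\n", argv[0]);
        return 1;
    }
    gzFile gz = gzopen(argv[1], "rb");
    if (gz == NULL) {
        fprintf(stderr, "cannot open %s\n", argv[1]);
        return 1;
    }
    /* Decompress the whole stream, discarding the data; we only care
     * whether zlib reports an error along the way. */
    while ((n = gzread(gz, buf, sizeof(buf))) > 0)
        ;
    if (n < 0) {
        int errnum = 0;
        fprintf(stderr, "decompression error: %s\n", gzerror(gz, &errnum));
        gzclose(gz);
        return 1;
    }
    gzclose(gz);
    printf("%s decompresses cleanly\n", argv[1]);
    return 0;
}
If it reports an error on the Linux box but not on the Mac, compare checksums of the two copies; a transfer in ASCII mode (see the fixgz answer below) is a common cause of exactly this kind of corruption.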

Alternative Location of Gzip's fixgz Utility
In case you can no longer find fixgz on gzip.org's website, here is a link to a version available on archive.org: https://web.archive.org/web/20180624175352/http://www.gzip.org/fixgz.zip.
Source Code for fixgz Utility
Also, in case that disappears as well, below is the source code for the fixgz utility:
/* fixgz attempts to fix a binary file transferred in ascii mode by
 * removing each extra CR when it followed by LF.
 * usage: fixgz bad.gz fixed.gz
 * Copyright 1998 Jean-loup Gailly <jloup#gzip.org>
 * This software is provided 'as-is', without any express or implied
 * warranty. In no event will the author be held liable for any damages
 * arising from the use of this software.
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute it
 * freely.
 */
#include <stdio.h>
#include <stdlib.h> /* for exit() */

int main(argc, argv)
    int argc;
    char **argv;
{
    int c1, c2;  /* input bytes */
    FILE *in;    /* corrupted input file */
    FILE *out;   /* fixed output file */

    if (argc <= 2) {
        fprintf(stderr, "usage: fixgz bad.gz fixed.gz\n");
        exit(1);
    }
    in = fopen(argv[1], "rb");
    if (in == NULL) {
        fprintf(stderr, "fixgz: cannot open %s\n", argv[1]);
        exit(1);
    }
    out = fopen(argv[2], "wb");
    if (out == NULL) {
        fprintf(stderr, "fixgz: cannot create %s\n", argv[2]);
        exit(1);
    }
    c1 = fgetc(in);
    while ((c2 = fgetc(in)) != EOF) {
        if (c1 != '\r' || c2 != '\n') {
            fputc(c1, out);
        }
        c1 = c2;
    }
    if (c1 != EOF) {
        fputc(c1, out);
    }
    exit(0);
    return 0; /* avoid warning */
}

Gzip has a prospective fix for this error in their FAQ. The provided utility didn't help in my case, but it's possible it would fix your archive. According to gzip:
If you have transferred a file in ASCII mode and you no longer have access to the original, you can try the program fixgz to remove the extra CR (carriage return) bytes inserted by the transfer. A Windows 9x/NT/2000/ME/XP binary is here. But there is absolutely no guarantee that this will actually fix your file. Conclusion: never transfer binary files in ASCII mode.

Related

How to access an input device driver from userspace

I'm currently developing an input subsystem driver for a touchscreen. What I don't know is how to access the device from userspace, e.g. how to open a file that should be created in the filesystem.
What I've done so far is this:
After I insmod the driver, I get the following message in dmesg:
input: driver_name as /devices/platform/soc/3f804000.i2c/i2c-1/1-0038/input/input0
Now when I go to this location, I find input0, which is a directory. In this directory I can find files such as name, properties and uevent, but none of them contains touch data.
My question is: where does the input subsystem put the touch data after I call
input_report_abs(data.input, ABS_X, coord_x);
input_report_abs(data.input, ABS_Y, coord_y);
input_sync(data.input);
SOLVED:
Once you do insmod, a new file is created under /dev/input; in my case it was event0. To test the functionality, you can run evtest /dev/input/event0. This file can be used from a userspace program in the following way:
#include <fcntl.h>
#include <unistd.h>
#include <linux/input.h>

struct input_event ev;
int fd = open("/dev/input/event0", O_RDWR); /* open() returns an int fd, not a FILE* */
while (1)
{
    /* read one event at a time; read() returns the number of bytes read */
    ssize_t count = read(fd, &ev, sizeof(struct input_event));
    if (count < (ssize_t)sizeof(struct input_event))
        continue;
    if (EV_KEY == ev.type) { /* printf ... */ }
    if (EV_ABS == ev.type) { /* printf ... */ }
}
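To see the actual coordinate values rather than just the event types, match ev.code against ABS_X and ABS_Y; ev.value then carries the number passed to input_report_abs(). Here is a minimal standalone sketch along those lines (assuming the same /dev/input/event0 node as above):
#include <fcntl.h>
#include <linux/input.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/input/event0", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    struct input_event ev;
    /* Each read() returns one input_event; a touch report is a group of
     * ABS_X/ABS_Y events terminated by an EV_SYN event from input_sync(). */
    while (read(fd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev)) {
        if (ev.type == EV_ABS && ev.code == ABS_X)
            printf("X = %d\n", ev.value);
        else if (ev.type == EV_ABS && ev.code == ABS_Y)
            printf("Y = %d\n", ev.value);
        else if (ev.type == EV_SYN)
            printf("---- end of report ----\n");
    }
    close(fd);
    return 0;
}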
Hope this helps somebody, because I feel this isn't covered enough in the documentation.

Use a select-like function on a regular disk file

I have a computer which logs some sensor data into 8 different files.
I developed a program that copies this data to another computer when you connect the two machines with an RJ45 cable.
After retrieving the data on my computer, I need to send each file line by line over a pseudo serial port (using socat).
I created a program that uses nested for loops to check whether data is ready in each of the 8 files, and then extracts a line and sends it to puttySX.
The problem is CPU usage. One way to reduce it is to use a blocking function to know whether data is ready to be read, but is there any function like select, which works on sockets and serial ports, for such files?
If not, what should I do? Thanks.
You can take a look at inotify, which lets you monitor file system events.
Here is a sample code to get you started (this is not production code):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/inotify.h>

#define BUF_LEN (sizeof(struct inotify_event) * 1)

int main(int argc, char *argv[])
{
    char *filepath;
    int fd, wd;
    struct inotify_event *event;
    char buf[BUF_LEN];
    ssize_t ret;

    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s <filepath>\n", argv[0]);
        return (EXIT_FAILURE);
    }
    filepath = argv[1];

    /* Initialization */
    fd = inotify_init();
    if (fd == -1)
    {
        perror("inotify_init()");
        return (EXIT_FAILURE);
    }

    /* Specify which file to monitor */
    wd = inotify_add_watch(fd, filepath, IN_MODIFY);
    if (wd == -1)
    {
        perror("inotify_add_watch");
        close(fd);
        return (EXIT_FAILURE);
    }

    /* Wait for that file to be modified, */
    /* and print a notification each time it is */
    for (;;)
    {
        ret = read(fd, buf, BUF_LEN);
        if (ret < 1)
        {
            perror("read()");
            close(fd);
            return (EXIT_FAILURE);
        }
        event = (struct inotify_event *)buf;
        if (event->mask & IN_MODIFY)
            printf("File modified!\n");
    }
    close(fd);
    return (EXIT_SUCCESS);
}
So, I'm posting to answer my own question. Thanks to @yoones I found a trick to do this.
When a log file is created, I set a boolean to true in an ini file that looks like this:
[CreatedFiles]
cli1=false
cli2=false
cli3=false
cli4=false
cli5=false
cli6=false
cli7=false
cli8=false
Another program uses inotify to detect creation and modification of the corresponding files. Once there is a change, it reads the ini file and processes the data; when it has finished reading the data, it deletes the log file and writes false on the corresponding line of the ini file.
Since I have to process several log files at the same time, each time I read a line I check my ini file to see whether I have to start processing another log file as well, so several files can be processed at the same time.
I use an infinite while loop, so when all processing is done the program goes back to a select call, waiting for a change without consuming all the CPU's resources.
I'm sorry if I'm not very clear, English is not my native language.
Thanks all for your replies and comments.
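For reference, here is a minimal sketch of how select can be combined with an inotify descriptor set up as in the sample above (the wait_for_change helper name is just for illustration); select() blocks until the descriptor becomes readable, which is what keeps the loop from spinning:
#include <limits.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until the watched path changes, then drain the pending event data. */
static int wait_for_change(int inotify_fd)
{
    fd_set readfds;
    char buf[sizeof(struct inotify_event) + NAME_MAX + 1];

    FD_ZERO(&readfds);
    FD_SET(inotify_fd, &readfds);

    /* Sleep inside the kernel until an inotify event is queued */
    if (select(inotify_fd + 1, &readfds, NULL, NULL, NULL) == -1) {
        perror("select()");
        return -1;
    }
    /* Consume the queued event(s) so the descriptor does not stay readable */
    if (read(inotify_fd, buf, sizeof(buf)) == -1) {
        perror("read()");
        return -1;
    }
    return 0;
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file-or-directory>\n", argv[0]);
        return 1;
    }
    int fd = inotify_init();
    /* IN_CREATE only fires when the watched path is a directory */
    if (fd == -1 || inotify_add_watch(fd, argv[1], IN_MODIFY | IN_CREATE) == -1) {
        perror("inotify");
        return 1;
    }
    for (;;) {
        if (wait_for_change(fd) == -1)
            return 1;
        printf("Something changed, go read the ini file and the logs\n");
    }
}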

Rapidly writing to a temp file and renaming it... is that a good idea?

I have a daemon/service on a Linux box (Debian 6) that reads from a hardware device, does some calculations and then updates a file with some relevant values. This happens about 5 times per second.
Any process reading the file always sees well-structured, recent values.
Here is the relevant daemon code:
while (1)
{
    int rename_ret;
    char tmpname[] = "/var/something/readout.tmp";
    char txtname[] = "/var/something/readout.txt";

    FILE *f = fopen(tmpname, "w");
    if (f == NULL)
    {
        printf("Error opening file!\n");
        exit(1);
    }

    /* ... reading from hardware, some calculation ... */
    /* then print to the tmp file: */
    fprintf(f, "%12.4f\n", CntVal1);
    fprintf(f, "%12.4f\n", CntVal2);
    fclose(f);

    rename_ret = rename(tmpname, txtname);
    if (rename_ret != 0)
    {
        printf("Error: unable to rename the file\n");
        exit(1);
    }

    nanosleep((struct timespec[]){{0, 200000000}}, NULL); /* 0.2 sec */
}
This works fine, but it feels kind of... wronggg?
Note that this is not the device driver, but instead it reads from the driver and processes the values for other processes to read.
So my question is:
is this a bad idea?
what's the proper way to go about it? I like the idea of being able to "just read a file" and get fairly recent values...

Tools to reduce risk regarding password security and HDD slack space

Down at the bottom of this essay is a comment about a spooky way to beat passwords: scan the entire HDD of a user, including dead space, swap space, etc., and just try everything that looks like it might be a password.
The question, part 1: are there any tools around (a live CD, for instance) that will scan an unmounted file system and zero everything that can be zeroed? (Note: I'm not trying to find passwords.)
This would include:
Slack space that is not part of any file
Unused parts of the last block used by a file
Swap space
Hibernation files
Dead space inside of some types of binary files (like .DOC)
The tool (aside from the last case) would not modify anything that can be detected via the file system API. I'm not looking for a block device find/replace but rather something that just scrubs everything that isn't part of a file.
Part 2: how practical would such a program be? How hard would it be to write? How common is it for file formats to contain uninitialized data?
One (risky and costly) way to do this would be to use a file system aware backup tool (one that only copies the actual data) to back up the whole disk, wipe it clean and then restore it.
I don't understand your first question (do you want to modify the file system? Why? Isn't this dead space exactly where you want to look?)
Anyway, here's an example of such a tool:
#include <stdio.h>
#include <alloca.h>
#include <string.h>
#include <ctype.h>

/* Number of bytes we read at once, >2*maxlen */
#define BUFSIZE (1024*1024)

/* Replace this with a function that tests the password consisting of the first len bytes of pw */
int testPassword(const char* pw, int len) {
    /*char* buf = alloca(len+1);
    memcpy(buf, pw, len);
    buf[len] = '\0';
    printf("Testing %s\n", buf);*/
    int rightLen = strlen("secret");
    return len == rightLen && memcmp(pw, "secret", len) == 0;
}

int main(int argc, char* argv[]) {
    int minlen = 5; /* We know the password is at least 5 characters long */
    int maxlen = 7; /* ... and at most 7. Modify to find longer ones */
    int avlen = 0;  /* available length - The number of bytes we already tested and think could belong to a password */
    int i;
    char* curstart;
    char* curp;
    FILE* f;
    size_t bytes_read;
    char* buf = alloca(BUFSIZE + maxlen);

    if (argc != 2) {
        printf("Usage: %s disk-file\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "rb");
    if (f == NULL) {
        printf("Couldn't open %s\n", argv[1]);
        return 2;
    }
    for (;;) {
        /* Copy the rest of the buffer to the front */
        memcpy(buf, buf + BUFSIZE, maxlen);
        bytes_read = fread(buf + maxlen, 1, BUFSIZE, f);
        if (bytes_read == 0) {
            /* Read the whole file */
            break;
        }
        for (curstart = buf; curstart < buf + bytes_read;) {
            for (curp = curstart + avlen; curp < curstart + maxlen; curp++) {
                /* Let's assume the password just contains letters and digits. Use isprint() otherwise. */
                if (!isalnum(*curp)) {
                    curstart = curp + 1;
                    break;
                }
            }
            avlen = curp - curstart;
            if (avlen < minlen) {
                /* Nothing to test here, move along */
                curstart = curp + 1;
                avlen = 0;
                continue;
            }
            for (i = minlen; i <= avlen; i++) {
                if (testPassword(curstart, i)) {
                    char* found = alloca(i + 1);
                    memcpy(found, curstart, i);
                    found[i] = '\0';
                    printf("Found password: %s\n", found);
                }
            }
            avlen--;
            curstart++;
        }
    }
    fclose(f);
    return 0;
}
Installation:
Start a Linux Live CD
Copy the program to the file hddpass.c in your home directory
Open a terminal and type the following
su || sudo -s # Makes you root so that you can access the HDD
apt-get install -y gcc # Install gcc
This works only on Debian/Ubuntu and the like; check your system documentation for others
gcc -o hddpass hddpass.c # Compile.
./hddpass /dev/YOURDISK # The disk is usually sda, hda on older systems
Look at the output
Test (copy to console, as root):
gcc -o hddpass hddpass.c
</dev/zero head -c 10000000 >testdisk # Create an empty 10MB file
mkfs.ext2 -F testdisk # Create a file system
rm -rf mountpoint; mkdir -p mountpoint
mount -o loop testdisk mountpoint # needs root rights
</dev/urandom head -c 5000000 >mountpoint/f # Write stuff to the disk
echo asddsasecretads >> mountpoint/f # Write password in our pagefile
# On some file systems, you could even remove the file.
umount mountpoint
./hddpass testdisk # prints secret
Test it yourself on an Ubuntu Live CD:
# Start a console and type:
wget http://phihag.de/2009/so/hddpass-testscript.sh
sh hddpass-testscript.sh
Therefore, it's relatively easy. As I found out myself, ext2 (the file system I used) overwrites deleted files. However, I'm pretty sure some file systems don't. Same goes for the pagefile.
How common is it for file formats to contain uninitialized data?
Less and less common, I would've thought. The classic "offenders" are older versions of MS Office applications that (essentially) did a memory dump to disk as their "quicksave" format: no serialisation, no selection of what to dump, and a memory allocator that doesn't zero newly allocated memory pages. That led not only to juicy things from previous versions of the document (so the user could use undo), but also to juicy snippets from other applications.
How hard would it be to write?
Something that clears out unallocated disk blocks shouldn't be that hard. It'd need to run either off-line or as a kernel module, so as not to interfere with normal file-system operations, but most file systems have an "allocated"/"not allocated" structure that is fairly straightforward to parse. Swap is harder, but as long as you're OK with having it cleared on boot (or shutdown), it's not too tricky. Clearing out the tail block is trickier, definitely not something I'd want to try to do on-line, but it shouldn't be TOO hard to make it work for off-line cleaning.
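As a much cruder userspace alternative to parsing the allocation structures (and covering only free blocks, not tail slack or swap), you can fill the mounted file system with a zero-filled file and delete it afterwards; the blocks it occupied have then been overwritten with zeros. A minimal sketch, where the fill-file path is whatever you choose on the target file system:
#include <stdio.h>

int main(int argc, char *argv[])
{
    /* argv[1] is where the fill file goes, e.g. somewhere on the target mount */
    if (argc != 2) {
        fprintf(stderr, "usage: %s fill-file-path\n", argv[0]);
        return 1;
    }
    static char chunk[1024 * 1024];   /* one megabyte; static, so already zero-filled */

    FILE *f = fopen(argv[1], "wb");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    /* Keep writing zeros until the file system runs out of space */
    while (fwrite(chunk, 1, sizeof(chunk), f) == sizeof(chunk))
        ;
    fclose(f);        /* flush whatever the last short write left buffered */
    remove(argv[1]);  /* free the blocks again; their contents stay zeroed */
    return 0;
}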
How practical would such a program be?
Depends on your threat model, really. I'd say that on one end it wouldn't give you much at all, but on the other end it's a definite help in keeping information out of the wrong hands. But I can't give a hard and fast answer.
Well, if I was going to code it for a boot CD, I'd do something like this:
File is 101 bytes but takes up a 4096-byte cluster.
Copy file "A" to "B", which has nulls added to the end.
Delete "A" and overwrite its (now unused) cluster.
Create "A" again and use the contents of "B" without the tail (remember the length).
Delete "B" and overwrite it.
Not very efficient, and it would need a tweak to make sure you don't try to copy the first (and therefore full) clusters in a file. Otherwise, you'll run into slowness and failure if there's not enough free space.
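A rough sketch of just the tail-slack part of that idea, skipping the copy-to-"B" step: pad the file's last cluster with zeros, flush, and truncate back to the original length. The 4096-byte cluster size is an assumption, and whether the zeros really land in the old slack bytes depends on the file system rewriting that block in place, so treat it purely as an illustration:
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

#define CLUSTER 4096   /* assumed cluster/block size */

/* Overwrite the slack after the end of the file with zeros, then restore
 * the original length. Effectiveness depends on the file system. */
int zero_tail_slack(const char *path)
{
    struct stat st;
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }

    off_t len = st.st_size;
    size_t pad = (CLUSTER - (len % CLUSTER)) % CLUSTER;
    if (pad > 0) {
        char zeros[CLUSTER];
        memset(zeros, 0, sizeof(zeros));
        if (write(fd, zeros, pad) != (ssize_t)pad) {  /* fill out the last block */
            close(fd);
            return -1;
        }
        fsync(fd);                        /* force the padded block to disk */
        if (ftruncate(fd, len) == -1) {   /* restore the original length */
            close(fd);
            return -1;
        }
    }
    close(fd);
    return 0;
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    return zero_tail_slack(argv[1]) == 0 ? 0 : 1;
}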
Are there open-source tools that do this efficiently?

How to get the size of a gunzipped file in vim

When viewing (or editing) a .gz file, vim knows to locate gunzip and display the file properly.
In such cases, getfsize(expand("%")) would be the size of the gzipped file.
Is there a way to get the size of the expanded file?
[EDIT]
Another way to solve this might be to get the size of the current buffer, but there seems to be no such function in vim. Am I missing something?
There's no easy way to get the uncompressed size of a gzipped file, short of uncompressing it and using the getfsize() function. That might not be what you want. I took a look at RFC 1952 - GZIP File Format Specification, and the only thing that might be useful is the ISIZE field, which contains "...the size of the original (uncompressed) input data modulo 2^32".
EDIT:
I don't know if this helps, but here's some proof-of-concept C code I threw together that retrieves the value of the ISIZE field in a gzip'd file. It works for me using Linux and gcc, but your mileage may vary. If you compile the code, and then pass in a gzip'd filename as a parameter, it will tell you the uncompressed size of the original file.
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

int main(int argc, char *argv[])
{
    FILE *fp = NULL;
    int i = 0;

    if (argc != 2) {
        fprintf(stderr, "Must specify file to process.\n");
        return -1;
    }

    // Open the file for reading
    if ((fp = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "Unable to open %s for reading: %s\n", argv[1], strerror(errno));
        return -1;
    }

    // Look at the first two bytes and make sure it's a gzip file
    int c1 = fgetc(fp);
    int c2 = fgetc(fp);
    if (c1 != 0x1f || c2 != 0x8b) {
        fprintf(stderr, "File is not a gzipped file.\n");
        return -1;
    }

    // Seek to four bytes from the end of the file
    fseek(fp, -4L, SEEK_END);

    // Array containing the last four bytes
    unsigned char read[4];

    for (i = 0; i < 4; ++i) {
        int charRead = 0;
        if ((charRead = fgetc(fp)) == EOF) {
            // This shouldn't happen
            fprintf(stderr, "Read end-of-file");
            exit(1);
        }
        else
            read[i] = (unsigned char)charRead;
    }

    // Copy the last four bytes into an int. This could also be done
    // using a union.
    int intval = 0;
    memcpy(&intval, &read, 4);

    printf("The uncompressed filesize was %d bytes (0x%02x hex)\n", intval, intval);

    fclose(fp);
    return 0;
}
This appears to work for getting the byte count of a buffer:
(line2byte(line("$")+1)-1)
If you're on Unix/Linux, try
:%!wc -c
That's in bytes. (It works on Windows too, if you have e.g. Cygwin installed.) Then hit u to get your content back.
HTH
From within the vim editor, try this:
<Esc>:!wc -c my_zip_file.gz
That will display the number of bytes the file occupies on disk.
