I want to extract data from an xxx.tar.gz file using the tar -zxvf command, but something goes wrong. Here are the details:
suse11-configserver:/home/webapp/wiki # tar -zxvf dokuwiki.20151010.tar.gz
./dokuwiki/
./dokuwiki/._.htaccess.dist
./dokuwiki/.htaccess.dist
./dokuwiki/bin/
./dokuwiki/conf/
./dokuwiki/._COPYING
./dokuwiki/COPYING
tar: Jump to the next head
gzip: stdin: invalid compressed data--format violated
tar: Child returned status 1
tar: Error is not recoverable: exiting now
But the same command tar -zxvf dokuwiki.20151010.tar.gz works fine on Mac OS X; I cannot figure out the reason.
Your command is correct, but it seems the file is corrupted.
It's easy to tell when some files are extracted correctly (for example ./dokuwiki/.htaccess.dist) but the rest are not.
Recreate the dokuwiki.20151010.tar.gz file, and make sure it doesn't report errors while doing so.
If you downloaded the file from somewhere, verify the checksum, or at least the file size.
The bottom line is: either the file was created incorrectly or it was downloaded incorrectly.
The command you have should work fine with a .tar.gz file.
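If you want to see roughly where the stream breaks, you can test-read the archive (gzip -t does this for you, or you can use zlib directly). Below is a minimal sketch using zlib; the file name is just an example, and you need to link with -lz:
/* gztest.c - test-read a gzip stream and report where it fails.
 * Minimal sketch; the file name is an example, link with -lz. */
#include <stdio.h>
#include <zlib.h>

int main(void)
{
    const char *name = "dokuwiki.20151010.tar.gz";   /* example file name */
    gzFile gz = gzopen(name, "rb");
    if (gz == NULL) {
        fprintf(stderr, "cannot open %s\n", name);
        return 1;
    }

    char buf[64 * 1024];
    long long total = 0;
    int n;
    while ((n = gzread(gz, buf, sizeof(buf))) > 0)
        total += n;                      /* count successfully decompressed bytes */

    if (n < 0) {
        int err = 0;
        fprintf(stderr, "error after %lld uncompressed bytes: %s\n",
                total, gzerror(gz, &err));
        gzclose(gz);
        return 1;
    }
    printf("OK: %lld uncompressed bytes\n", total);
    gzclose(gz);
    return 0;
}
If this stops with an error long before the expected size, the archive was created or downloaded incorrectly.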
Alternative Location of Gzip's fixgz Utility
In case you can no longer find fixgz on gzip.org's website, here is a link to a version available on archive.org: https://web.archive.org/web/20180624175352/http://www.gzip.org/fixgz.zip.
Source Code for fixgz Utility
Also, in case that disappears as well, below is the source code for the fixgz utility:
/* fixgz attempts to fix a binary file transferred in ascii mode by
* removing each extra CR when it is followed by LF.
* usage: fixgz bad.gz fixed.gz
* Copyright 1998 Jean-loup Gailly <jloup#gzip.org>
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages
* arising from the use of this software.
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely.
*/
#include <stdio.h>
#include <stdlib.h>     /* for exit() */

int main(int argc, char **argv)
{
    int c1, c2;         /* input bytes */
    FILE *in;           /* corrupted input file */
    FILE *out;          /* fixed output file */

    if (argc <= 2) {
        fprintf(stderr, "usage: fixgz bad.gz fixed.gz\n");
        exit(1);
    }
    in = fopen(argv[1], "rb");
    if (in == NULL) {
        fprintf(stderr, "fixgz: cannot open %s\n", argv[1]);
        exit(1);
    }
    out = fopen(argv[2], "wb");
    if (out == NULL) {  /* the original source checked "in" here by mistake */
        fprintf(stderr, "fixgz: cannot create %s\n", argv[2]);
        exit(1);
    }

    c1 = fgetc(in);
    while ((c2 = fgetc(in)) != EOF) {
        if (c1 != '\r' || c2 != '\n') {
            fputc(c1, out);
        }
        c1 = c2;
    }
    if (c1 != EOF) {
        fputc(c1, out);
    }
    exit(0);
    return 0; /* avoid warning */
}
Gzip has a prospective fix for this error in their FAQ. The provided utility didn't help in my case, but it's possible it would fix your archive. According to gzip:
If you have transferred a file in ASCII mode and you no longer have access to the original, you can try the program fixgz to remove the extra CR (carriage return) bytes inserted by the transfer. A Windows 9x/NT/2000/ME/XP binary is here. But there is absolutely no guarantee that this will actually fix your file. Conclusion: never transfer binary files in ASCII mode.
I have this command that writes over 100GB of data to a file.
zfs send snap1 > file
Something appears to go wrong several hours into the process. E.g., if I run the job twice, the output is slightly different. If I try to process the file with
zfs receive snap2 < file
an error is reported after several hours.
For debugging purposes, I'm guessing that there's some low-probability failure in the shell redirection. Has anyone else seen problems with redirecting massive amounts of data? Any suggestions about how to proceed?
Debugging this is tedious because small examples work, and running the large case takes over 3 hours each time.
Earlier I had tried pipes:
zfs send snap1 | zfs receive snap2
However, this always failed with much smaller examples, for which
zfs send snap1 > file; zfs receive snap2 < file
worked. (I posted a question about that, but got no useful responses.) This is another reason that I suspect the shell.
Thanks.
The probability that the failure is in the shell (or the OS) is negligible compared to the probability of a bug in zfs or of a problem in how you are using it.
It only takes a few minutes to test your hypothesis: compile this stupid program:
#include <unistd.h>
#include <string.h>

#define BUF   (1<<20)   /* 1 MiB buffer */
#define INPUT 56        /* the byte value used to fill the stream */

int main(int argc, char* argv[])
{
    char buf[BUF], rbuf[BUF], *a, *b;
    int len, i;

    memset(buf, INPUT, sizeof(buf));
    if (argc == 1)
    {
        /* reader: verify that every byte arriving on stdin equals INPUT */
        while ((len = read(0, rbuf, sizeof(rbuf))) > 0)
        {
            a = buf; b = rbuf;
            for (i = 0; i < len; ++i)
            {
                if (*a != *b)
                    return 1;   /* a corrupted byte was found */
                ++a; ++b;
            }
        }
    }
    else
    {
        /* writer: pump an endless stream of INPUT bytes to stdout */
        while (write(1, buf, sizeof(buf)) > 0);
    }
    return 0;
}
then run mkfifo a; ./a.out w > a in one shell and pv < a | ./a.out in another, and see how long it takes to get any bit flip.
It should get into the TiB region relatively quickly...
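If you also want to compare two multi-hour zfs send runs without keeping both 100 GB dumps around, you can checksum the stream in chunks as it passes through. A minimal sketch (the chunk size and the simple hash are arbitrary choices, not anything zfs-specific):
/* chunksum.c - minimal sketch: pass stdin through to stdout unchanged,
 * printing a simple checksum to stderr for every 1 GiB chunk, so two
 * long runs can be compared without keeping both dumps.
 * (A weak rolling hash is used for brevity; it is not cryptographic.) */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    static unsigned char buf[1 << 20];           /* 1 MiB read buffer */
    unsigned long long sum = 0, chunk = 0, in_chunk = 0;
    const unsigned long long CHUNK = 1ULL << 30; /* 1 GiB per checksum line */
    ssize_t n;

    while ((n = read(0, buf, sizeof(buf))) > 0) {
        for (ssize_t i = 0; i < n; i++)
            sum = sum * 131 + buf[i];            /* simple rolling hash */

        /* pass the data through unchanged */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(1, buf + off, n - off);
            if (w <= 0) { perror("write"); return 1; }
            off += w;
        }

        in_chunk += n;
        if (in_chunk >= CHUNK) {
            fprintf(stderr, "chunk %llu: %016llx\n", chunk++, sum);
            in_chunk = 0;
            sum = 0;
        }
    }
    if (in_chunk > 0)
        fprintf(stderr, "chunk %llu: %016llx\n", chunk, sum);
    return n < 0 ? 1 : 0;
}
Used as zfs send snap1 | ./chunksum > file 2> run1.sums, two runs can then be compared by diffing the .sums files to see roughly where they start to diverge.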
I've got an old HDD that I planned to fiddle around with a little. The first thing I'm trying to do is spin the motor at different speeds.
Questions are:
Is there a general way to do this or does it depend on the HDD model?
Where do I find a list of commands that I can send to the HDD controller to control the speed of the motor?
I actually found a function that apparently spins down the motor; here it is:
/* spin-down a disk */
static void spindown_disk(const char *name)
{
    struct sg_io_hdr io_hdr;
    unsigned char sense_buf[255];
    char dev_name[100];
    int fd;

    dprintf("spindown: %s\n", name);

    /* fabricate SCSI IO request */
    memset(&io_hdr, 0x00, sizeof(io_hdr));
    io_hdr.interface_id = 'S';
    io_hdr.dxfer_direction = SG_DXFER_NONE;

    /* SCSI stop unit command */
    io_hdr.cmdp = (unsigned char *) "\x1b\x00\x00\x00\x00\x00";

    io_hdr.cmd_len = 6;
    io_hdr.sbp = sense_buf;
    io_hdr.mx_sb_len = (unsigned char) sizeof(sense_buf);

    /* open disk device (kernel 2.4 will probably need "sg" names here) */
    snprintf(dev_name, sizeof(dev_name), "/dev/%s", name);
    if ((fd = open(dev_name, O_RDONLY)) < 0) {
        perror(dev_name);
        return;
    }

    /* execute SCSI request */
    if (ioctl(fd, SG_IO, &io_hdr) < 0) {
        char buf[100];
        snprintf(buf, sizeof(buf), "ioctl on %s:", name);
        perror(buf);
    } else if (io_hdr.masked_status != 0) {
        fprintf(stderr, "error: SCSI command failed with status 0x%02x\n",
                io_hdr.masked_status);
        if (io_hdr.masked_status == CHECK_CONDITION) {
            phex(sense_buf, io_hdr.sb_len_wr, "sense buffer:\n");
        }
    }

    close(fd);
}
Though I don't really understand where the actual command is sent to the controller, nor how I could control the speed; I don't see any RPM specifications.
You cannot control a hard disk's rotational speed, and that is a good thing. If you could, you would inevitably destroy data.
The heads float on what is commonly called an "air bearing".
In simple terms, a spring mechanism presses the head toward the disk's surface with a well-defined force, while an air cushion produced by the airflow from the disk's rotation pushes back; the two are in equilibrium at the disk's operational speed. When the disk is shut down, another spring mechanism quickly pulls the heads out of the way into a kind of "parking position".
If you could run the drive at arbitrary speeds, the heads would scratch the surface. Not good!
As to where the actual command is sent in the above snippet: it is the ioctl call in the line following /* execute SCSI request */.
If you are interested in playing with your old hard disk (be aware that you'll quite likely break it!), have a look at the hdparm tool and its source code. hdparm lets you tweak dozens of parameters such as power-save modes, caching, or acoustic management... pretty much everything that disk drives support.
In the tool's source code, you'll find a quite complete list of device commands, too.
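If you just want to see a command going to the drive from your own code, here is a minimal sketch that spins the drive down, roughly what hdparm -y does, by sending the ATA STANDBY IMMEDIATE command (opcode 0xE0) through the HDIO_DRIVE_CMD ioctl. The device path is only an example, you need root, and you do this at your own risk:
/* standby.c - minimal sketch: spin a drive down (roughly "hdparm -y")
 * by issuing ATA STANDBY IMMEDIATE through the HDIO_DRIVE_CMD ioctl.
 * Linux-only; device path is an example. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/sdb";   /* example device */
    /* args[0] = ATA command opcode, remaining register bytes unused here */
    unsigned char args[4] = { 0xE0, 0, 0, 0 };             /* STANDBY IMMEDIATE */

    int fd = open(dev, O_RDONLY | O_NONBLOCK);
    if (fd < 0) { perror(dev); return 1; }

    if (ioctl(fd, HDIO_DRIVE_CMD, args) < 0) {
        perror("HDIO_DRIVE_CMD");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}
Note that this only stops and restarts the spindle; there is no command to run the motor at an arbitrary RPM.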
I'm totally new to the Linux Kernel, so I probably mix things up. But any advice will help me ;)
I have a SATA HDD connected via a PCIe SATA card, and I am trying to use read and write on it like on a block device. I also want the data to be safe on the HDD in case of a power blackout - not cached. And in the end I have to analyse how much time I lose in each Linux stack layer. But one step at a time.
At the moment I am trying to open the device with O_DIRECT. But I don't really understand where I can find the device. It shows up as /dev/sdd and I created one partition, /dev/sdd1.
open and read on the partition /dev/sdd1 work. write fails with O_DIRECT (but I'm sure I have the right block size).
open, read and write called on /dev/sdd fail completely.
Is there maybe another file in /dev/ which represents my device on the block layer?
What are my mistakes and wrong assumptions?
This is my current test code:
#define _GNU_SOURCE     /* needed for O_DIRECT */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int w, r, s;
    char buffer[512] = "test string with 512 byte";
    printf("test\n");

    // OPEN
    int fd = open("/dev/sdd", O_DIRECT | O_RDWR | O_SYNC);
    printf("fd = %d\n", fd);

    // WRITE
    printf("try to write %zu byte : %s\n", sizeof(buffer), buffer);
    w = write(fd, buffer, sizeof(buffer));
    if (w == -1) printf("write failed\n");
    else printf("write ok\n");

    // RESET BUFFER
    memset(buffer, 0, sizeof(buffer));

    // SEEK
    s = lseek(fd, 0, SEEK_SET);
    if (s == -1) printf("seek failed\n");
    else printf("seek ok\n");

    // READ
    r = read(fd, buffer, sizeof(buffer));
    if (r == -1) printf("read failed\n");
    else printf("read ok\n");

    // PRINT BUFFER
    printf("buffer = %s\n", buffer);
    return 0;
}
Edit:
I work with the 3.2 kernel on a Power architecture, if that is important.
Thank you very much for your time,
Fabian
Depending on your disk's block size (it could be 512 bytes or 4K), you can only read/write in multiples of that size.
Also, when using the O_DIRECT flag, you need to make sure the buffer is properly aligned to block boundaries. You can't ensure that with an ordinary char array; use memalign (or posix_memalign) to allocate aligned memory instead.
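For illustration, here is a minimal sketch of an O_DIRECT write with a properly aligned buffer. The 512-byte block size and the device path are assumptions; check the real value in /sys/block/sdd/queue/logical_block_size and adjust both:
/* Minimal sketch: O_DIRECT write with an aligned buffer.
 * Device path and 512-byte block size are assumptions. */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const size_t blksz = 512;                   /* assumed logical block size */
    void *buf = NULL;

    /* allocate a buffer whose address is a multiple of the block size */
    int rc = posix_memalign(&buf, blksz, blksz);
    if (rc != 0) {
        fprintf(stderr, "posix_memalign: %s\n", strerror(rc));
        return 1;
    }
    memset(buf, 0, blksz);
    strcpy(buf, "test string, one full 512-byte block");

    int fd = open("/dev/sdd1", O_RDWR | O_DIRECT | O_SYNC);   /* example device */
    if (fd < 0) { perror("open"); free(buf); return 1; }

    /* length and file offset must also be multiples of the block size */
    ssize_t w = pwrite(fd, buf, blksz, 0);
    if (w < 0)
        perror("pwrite");
    else
        printf("wrote %zd bytes\n", w);

    close(fd);
    free(buf);
    return 0;
}
The same alignment rules (buffer address, transfer length, and file offset) apply to reads on the raw device.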
I have a relatively complex Perl script which is walking over a filesystem and storing a list of updated ownership, then going over that list and applying the changes. I'm doing this in order to update changed UIDs. Because I have several situations where I'm swapping user a's and user b's UIDs, I can't just say "everything which is now 1 should be 2 and everything which is 2 should be 1". It's also possible that this script could be interrupted, which would leave the system in a completely busted, pretty much unrecoverable state outside of "restore from backup and start over". Which would pretty much suck.
To avoid that problem, I do the two-pass approach above, creating a structure like $changes->{path}->\%c, where c has attributes like newuid, olduid, newgid, and oldgid. I then freeze the hash, and once it's written to disk, I read the hash back in and start making changes. This way, if I'm interrupted, I can check to see if the frozen hash exists or not, and just start applying changes again if it does.
The drawback is that sometimes a changing user has literally millions of files, often with very long paths. This means I'm storing a lot of really long strings as hash keys, and I'm running out of memory sometimes. So, I've come up with two options. The one relevant to this question is to instead store the elements as device:inode pairs. That'd be way more space-efficient, and would uniquely identify filesystem elements. The drawback is that I haven't figured out a particularly efficient way to either get a device-relative path from the inode, or to just apply the stat() changes I want to the inode. Yes, I could do another find, and for each file do a lookup against my stored list of devices and inodes to see if a change is needed or not. But if there's a perl-accessible system call - which is portable across HP-UX, AIX, and Linux - from which I can directly just say "on this device make these changes to this inode", it'd be notably better from a performance perspective.
I'm running this across several thousand systems, some of which have filesystems in the petabyte range, holding trillions of files. So, while performance may not make much of a difference on my home PC, it's actually somewhat significant in this scenario. :) That performance need, BTW, is why I really don't want to do the other option - which would be to bypass the memory problem by just tie-ing a hash to a disk-based file. And it's why I'd rather do more work to avoid having to traverse the whole filesystem a second time.
Alternate suggestions which could reduce memory consumption are, of course, also welcome. :) My requirement is just that I need to record both the old and new UID/GID values, so I can back the changes out / validate changes / update files restored from backups taken prior to the cleanup date. I've considered making /path/to/file look like ${changes}->{root}->{path}->{to}->{file}, but that's a lot more work to traverse, and I don't know that it'll really save me enough memory to resolve my problem. Collapsing the whole thing to ->{device}->{inode} makes it basically just the size of two integers rather than N characters, which is substantial for any path longer than, say, 2 chars. :)
Simplified idea
When I mentioned streaming, I didn't mean uncontrolled. A database journal, for example, is also written in streaming mode, for comparison.
Also note that the statement that you 'cannot afford to sort even a single subdirectory' directly contradicts the use of a Perl hash to store the same info (I won't blame you if you don't have the CS background).
So here is a really simple illustration of what you could do. Note that every step on the way is streaming, repeatable and logged.
# export SOME_FIND_OPTIONS=...?
find $SOME_FIND_OPTIONS -print0 | ./generate_script.pl > chownscript.sh
# and then
sh -e ./chownscript.sh
An example of generate_script.pl (obviously, adapt it to your needs):
#!/usr/bin/perl
use strict;
use warnings;
$/="\0";
while (<>)
{
    chomp;    # strip the trailing NUL record separator
    my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat;
    # demo purpose, silly translation:
    my ($newuid, $newgid) = ($uid+1000, $gid+1000);
    print "./chmod.pl $uid:$gid $newuid:$newgid '$_'\n";
}
You could have a system-dependent implementation of chmod.pl (this helps to reduce complexity and therefore risk):
#!/usr/bin/perl
use strict;
use warnings;
my $oldown = shift;
my $newown = shift;
my $path = shift;
($oldown and $newown and $path) or die "usage: $0 <uid:gid> <newuid:newgid> <path>";
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) = stat $path;
die "file not found: $path" unless $ino;
die "precondition failed" unless ($oldown eq "$uid:$gid");
($uid, $gid) = split /:/, $newown;
chown $uid, $gid, $path or die "unable to chown: $path"
This will allow you to restart when things bork midway, and it will even allow you to hand-pick exceptions if necessary. You can save the scripts so you'll have accountability. I've made a reasonable stab at making the scripts operate safely. However, this is obviously just a starting point. Most importantly, I do not deal with filesystem crossings, symbolic links, sockets, or device nodes, which you might want to pay attention to.
original response follows:
Ideas
Yeah, if performance is the issue, do it in C
Do not do persistent logging for the whole filesystem (by the way, why the need to keep it all in a single hash? streaming output is your friend there)
Instead, log completed runs per directory. You could easily break the mapping up into steps:
user A: 1 -> 99
user B: 2 -> 1
user A: 99 -> 2
Ownify - what I use (code)
As long as you can reserve a range for temporary uids/gids (like the 99 above), there won't be any risk in having to restart (not any more than doing this renumbering on a live filesystem carries anyway).
You could start from this nice tidbit of C code (which admittedly is not very highly optimized):
// vim: se ts=4 sw=4 et ar aw
//
// make: g++ -D_FILE_OFFSET_BITS=64 ownify.cpp -o ownify
//
// Ownify: ownify -h
//
#include <stdio.h>
#include <stdlib.h>     /* for exit() */
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
/* old habits die hard. can't stick to pure C ... */
#include <string>
#include <iostream>
#define do_stat(a,b) lstat(a,b)
#define do_chown(a,b,c) lchown(a,b,c)
//////////////////////////////////////////////////////////
// logic declarations
//
void ownify(struct stat& file)
{
// if (S_ISLNK(file.st_mode))
// return;
switch (file.st_uid)
{
#if defined(PASS1)
case 1: file.st_uid = 99; break;
case 99: fputs("Unexpected existing owned file!\n", stderr); exit(255);
#elif defined(PASS2)
case 2: file.st_uid = 1; break;
#elif defined(PASS3)
case 99: file.st_uid = 1; break;
#endif
}
switch (file.st_gid) // optionally map groups as well
{
#if defined(PASS1)
#elif defined(PASS2)
#elif defined(PASS3)
#endif
}
}
/////////////////////////////////////////////////////////
// driver
//
static unsigned int changed = 0, skipped = 0, failed = 0;
static bool dryrun = false;
void process(const char* const fname)
{
struct stat s;
if (0==do_stat(fname, &s))
{
struct stat n = s;
ownify(n);
if ((n.st_uid!=s.st_uid) || (n.st_gid!=s.st_gid))
{
if (dryrun || 0==do_chown(fname, n.st_uid, n.st_gid))
printf("%u\tchanging owner %i:%i '%s'\t(was %i:%i)\n",
++changed,
n.st_uid, n.st_gid,
fname,
s.st_uid, s.st_gid);
else
{
failed++;
int e = errno;
fprintf(stderr, "'%s': cannot change owner %i:%i (%s)\n",
fname,
n.st_uid, n.st_gid,
strerror(e));
}
}
else
skipped++;
} else
{
int e = errno;
fprintf(stderr, "'%s': cannot stat (%s)\n", fname, strerror(e));
failed++;
}
}
int main(int argc, char* argv[])
{
switch(argc)
{
case 0: //huh?
case 1: break;
case 2:
dryrun = 0==strcmp(argv[1],"-n") ||
0==strcmp(argv[1],"--dry-run");
if (dryrun)
break;
default:
std::cerr << "Illegal arguments" << std::endl;
std::cout <<
argv[0] << " (Ownify): efficient bulk adjust of owner user:group for many files\n\n"
"Goal: be flexible and a tiny bit fast\n\n"
"Synopsis:\n"
" find / -print0 | ./ownify -n 2>&1 | tee ownify.log\n\n"
"Input:\n"
" reads a null-delimited stream of filespecifications from the\n"
" standard input; links are _not_ dereferenced.\n\n"
"Options:\n"
" -n/--dry-run - test run (no changes)\n\n"
"Exit code:\n"
" number of failed items" << std::endl;
return 255;
}
std::string fname("/dev/null");
while (std::getline(std::cin, fname, '\0'))
process(fname.c_str());
fprintf(stderr, "%s: completed with %u skipped, %u changed and %u failed%s\n",
argv[0], skipped, changed, failed, dryrun?" (DRYRUN)":"");
return failed;
}
Note that this comes with quite a few safety measures:
a paranoia check in the first pass (verifying that no files with the reserved temporary uid already exist)
the ability to change the behaviour of do_stat and do_chown with regard to links
a -n/--dry-run option (to observe what would be done)
The program will gladly tell you how to use it with ownify -h:
./ownify (Ownify): efficient bulk adjust of owner user:group for many files
Goal: be flexible and a tiny bit fast
Synopsis:
find / -print0 | ./ownify -n 2>&1 | tee ownify.log
Input:
reads a null-delimited stream of file specifications from the
standard input;
Options:
-n/--dry-run - test run (no changes)
Exit code:
number of failed items
A few possible solutions that come to mind:
1) Do not store a hash in the file, just a sorted list in any format that can be reasonably parsed serially. By sorting the list by filename, you should get the equivalent of running find again, without actually doing it:
# UID, GID, MODE, Filename
0,0,600,/a/b/c/d/e
1,1,777,/a/b/c/f/g
...
Since the list is sorted by filename, the contents of each directory should be bunched together in the file. You do not have to use Perl to sort the file; the sort utility will do nicely in most cases.
You can then just read in the file line-by-line - or with any delimiter that will not mangle your filenames - and just perform any changes. Assuming that you can tell which changes are needed for each file at once, it does not sound as if you actually need the random-access capabilities of a hash, so this should do.
So the process would happen in three steps:
Create the change file
Sort the change file
Perform changes per the change file
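For the third step, here is a minimal sketch in C (to match the C helper earlier in the thread) that reads such a sorted change file from standard input and applies the ownership changes. It assumes the comma-separated format shown above with filenames that contain no commas or newlines, and it ignores the mode field:
/* applychanges.c - minimal sketch: read "uid,gid,mode,filename" lines
 * from stdin (the sorted change file above) and apply ownership changes.
 * Assumes well-formed input; the mode field is parsed but not applied. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
{
    char line[8192];
    unsigned long lineno = 0;

    while (fgets(line, sizeof(line), stdin) != NULL) {
        lineno++;
        if (line[0] == '#' || line[0] == '\n')      /* skip header/blank lines */
            continue;

        unsigned int uid, gid, mode;
        int consumed = 0;
        if (sscanf(line, "%u,%u,%o,%n", &uid, &gid, &mode, &consumed) < 3 || consumed == 0) {
            fprintf(stderr, "line %lu: cannot parse\n", lineno);
            continue;
        }

        char *path = line + consumed;               /* rest of the line is the filename */
        path[strcspn(path, "\n")] = '\0';           /* strip the trailing newline */

        if (lchown(path, (uid_t)uid, (gid_t)gid) != 0)
            perror(path);
    }
    return 0;
}
Because the input is a plain sorted text file, a failed run can simply be resumed by feeding the remainder of the file back in.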
2) If you cannot tell which changes each file needs at once, you could have multiple lines for each file, each detailing a part of the changes. Each line would be produced the moment you determine a needed change at the first step. You can then merge them after sorting.
3) If you do need random access capabilities, consider using a proper embedded database, such as BerkeleyDB or SQLite. There are Perl modules for most embedded databases around. This will not be quite as fast, though.