I have this node.js code:
const k = cp.spawn('bash');
k.stdin.end(`
set -e;
echo; echo; echo 'du results:';
( cd "${extractDir}" && tar -xzvf "${createTarball.pack.value}" ) > /dev/null;
find "${extractDir}/package" -type f | xargs du --threshold=500KB;
`);
k.stdout.pipe(pt(chalk.redBright('this is a big file (>500KB) according to du: '))).pipe(process.stdout);
k.stderr.pipe(pt(chalk.magenta('du stderr: '))).pipe(process.stderr);
The problem is that multiple grandchildren all write their stdout/stderr to the parent.
I want to do something like this instead:
const pid = process.pid;
const k = cp.spawn('bash');
k.stdin.end(`
set -e;
exec 3<>/proc/${pid}/fd/1
echo >&3; echo >&3; echo 'du results:' >&3;
( cd "${extractDir}" && tar -xzvf "${createTarball.pack.value}" ) > /dev/null;
find "${extractDir}/package" -type f | xargs du --threshold=500KB ;
`);
k.stdout.pipe(pt(chalk.redBright('this is a big file (>500KB) according to du: '))).pipe(process.stdout);
k.stderr.pipe(pt(chalk.magenta('du stderr: '))).pipe(process.stderr);
But that technique won't work on macOS, because macOS doesn't have the /proc/<pid> filesystem. Does anyone know how to do what I am trying to do, or a good workaround?
Install libsys, or use some other means of making raw syscalls from Node. Then make the dup(2) syscall with 1 as its argument and store the result in a variable fd. Once you do that, fd holds the number of a file descriptor that is a duplicate of Node's stdout, and it will be inherited by child processes. At that point, just remove exec 3<>/proc/${pid}/fd/1 from the bash script and replace every >&3 with >&${fd}.
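As a rough illustration of the same idea in Python rather than Node (an assumption made only to keep the sketch short; run_with_inherited_fd is a hypothetical helper, not part of any library): os.dup duplicates a descriptor, and pass_fds lets the bash child inherit it, so the script can write to >&${fd} without /proc.

```python
import os
import subprocess


def run_with_inherited_fd(source_fd, message):
    """Duplicate source_fd and let a bash child write to the duplicate."""
    fd = os.dup(source_fd)  # same open file as source_fd, under a new number
    try:
        # pass_fds keeps fd open and inheritable in the child process.
        subprocess.run(
            ["bash", "-c", f'echo "$1" >&{fd}', "bash", message],
            pass_fds=[fd],
            check=True,
        )
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Duplicate our own stdout (descriptor 1) and let bash write to it.
    run_with_inherited_fd(1, "du results:")
```

Anything the child writes to its own stdout can still be piped and prefixed separately, while writes to the duplicated descriptor go straight to the parent's original stdout.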
So I solved this problem a different way; the original post was a bit of an XY question. Since I am on macOS and don't have /proc/<pid>, I basically use mkfifo.
First we call this:
mkfifo "$fifoDir/fifo"
And then we have:
const fifo = cp.spawn('bash');
fifo.stdout.setEncoding('utf8');
fifo.stdout
.pipe(pt(chalk.yellow.bold('warning: this is a big file (>5KB) according to du: ')))
.pipe(process.stdout);
fifo.stdin.end(`
while true; do
cat "${fifoDir}/fifo"
done
`);
const k = cp.spawn('bash');
k.stdin.end(`
set -e;
echo; echo 'du results:';
tar -xzvf "${createTarball.pack.value}" -C "${extractDir}" > /dev/null;
${cmd} > "${fifoDir}/fifo";
kill -INT ${fifo.pid};
`);
k.stdout.pipe(process.stdout);
k.stderr.pipe(pt(chalk.magenta('du stderr: '))).pipe(process.stderr);
k.once('exit', code => {
if (code > 0) {
log.error('Could not run cmd:', cmd);
}
cb(code);
});
},
The "fifo" child process is a sibling of process k, as the code block above shows. The "fifo" process ends up handling the stdout of one of k's subprocesses; the two communicate by way of the named pipe created with mkfifo.
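Here is a minimal Python sketch of the named-pipe handoff, under some simplifying assumptions: the parent reads the fifo itself instead of spawning a sibling reader, and read_via_fifo is a hypothetical helper name. It assumes a POSIX system with bash available.

```python
import os
import subprocess
import tempfile


def read_via_fifo(command):
    """Run command in bash with stdout redirected to a fifo; return the bytes read."""
    with tempfile.TemporaryDirectory() as fifo_dir:
        fifo_path = os.path.join(fifo_dir, "fifo")
        os.mkfifo(fifo_path)
        # The writer blocks when opening the fifo until a reader opens it too.
        writer = subprocess.Popen(
            ["bash", "-c", f'{command} > "$1"', "bash", fifo_path]
        )
        with open(fifo_path, "rb") as reader:
            data = reader.read()  # returns at EOF, once the writer closes
        writer.wait()
        return data


if __name__ == "__main__":
    print(read_via_fifo("echo big-file.bin"))
```

Only the command whose output is redirected to the fifo reaches the reader; everything else the writer prints stays on its ordinary stdout and stderr.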
My script
#!/bin/bash
cp *.ats /home/milenko/procmt
mycd() {
cd /home/milenko/procmt
}
mycd
EXT=ats
for i in *; do
if [ "${i}" != "${i%.${EXT}}" ];then
./tsmp -ascii i
fi
done
But
milenko@milenko-HP-Compaq-6830s:~/Serra do Mel/MT06/meas_2016-07-13_20-22-00$ bash k1.sh
./tsmp: handling 1 files ************************************** total input channels: 1
the name of your file does not end with ats ... might crash soon
main (no rda) -> can not open i for input, exit
./tsmp: handling 1 files ************************************** total input channels: 1
the name of your file does not end with ats ... might crash soon
main (no rda) -> can not open i for input, exit
When I go to procmt directory and list files
milenko@milenko-HP-Compaq-6830s:~/procmt$ ls *.ats
262_V01_C00_R000_TEx_BL_2048H.ats 262_V01_C00_R086_TEx_BL_4096H.ats 262_V01_C02_R000_THx_BL_2048H.ats
262_V01_C00_R000_TEx_BL_4096H.ats 262_V01_C01_R000_TEy_BL_2048H.ats 262_V01_C03_R000_THy_BL_2048H.ats
What is wrong with my script?
If I understand correctly this should work for you:
dest='/home/milenko/procmt'
cp *.ats "$dest"
cd "$dest"
for i in *.ats; do
./tsmp -ascii "$i"
done
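There is no need to loop through all files when you're only interested in .ats files, and your mycd function only calls cd, so you can drop it as well. Note also that the original loop passes the literal string i to tsmp instead of "$i", which is why tsmp reports that it cannot open i.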
I ran a find command whose output goes through tee (to write a log) and then on to xargs. By accident I forgot to add xargs in the second pipe, which led to this question.
The example:
% tree
.
├── a.sh
└── home
    └── localdir
        ├── abc_3
        ├── abc_6
        ├── mydir_1
        ├── mydir_2
        └── mydir_3
7 directories, 1 file
and the content of a.sh is:
% cat a.sh
#!/bin/bash
LOG="/tmp/abc.log"
find home/localdir -name "mydir*" -type d -print | tee $LOG | echo
If the second pipe goes to a command such as echo or ls, writing the log occasionally fails.
These are some examples when I ran ./a.sh many times:
% bash -x ./a.sh; cat /tmp/abc.log // this tee failed
+ LOG=/tmp/abc.log
+ find home/localdir -name 'mydir*' -type d -print
+ tee /tmp/abc.log
+ echo
% bash -x ./a.sh; cat /tmp/abc.log // this tee ok
+ LOG=/tmp/abc.log
+ find home/localdir -name 'mydir*' -type d -print
+ tee /tmp/abc.log
+ echo
home/localdir/mydir_2 // this is cat /tmp/abc.log output
home/localdir/mydir_3
home/localdir/mydir_1
Why is it that if I add a second pipe with some command (and forget xargs), the tee command will fail occasionally?
The problem is that, by default, tee exits when a write to a pipe fails. So, consider:
find home/localdir -name "mydir*" -type d -print | tee $LOG | echo
If echo completes first, the pipe will fail and tee will exit. The timing, though, is imprecise. Every command in the pipeline is in a separate subshell. Also, there are the vagaries of buffering. So, sometimes the log file is written before tee exits and sometimes it isn't.
For clarity, let's consider a simpler pipeline:
$ seq 10 | tee abc.log | true; declare -p PIPESTATUS; cat abc.log
declare -a PIPESTATUS='([0]="0" [1]="0" [2]="0")'
1
2
3
4
5
6
7
8
9
10
$ seq 10 | tee abc.log | true; declare -p PIPESTATUS; cat abc.log
declare -a PIPESTATUS='([0]="0" [1]="141" [2]="0")'
$
In the first execution, each process in the pipeline exits with a success status and the log file is written. In the second execution of the same command, tee fails with exit code 141 and the log file is not written.
I used true in place of echo to illustrate the point that there is nothing special here about echo. The problem exists for any command that follows tee that might reject input.
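The failure itself can be reproduced deterministically in a small Python sketch (write_after_reader_closed is a hypothetical helper). One difference to keep in mind: Python ignores SIGPIPE by default, so the failed write shows up as an EPIPE error instead of killing the process. A process with the default SIGPIPE disposition, like tee here, is killed by the signal, and the shell reports that as 128 + 13 = 141.

```python
import errno
import os
import signal


def write_after_reader_closed(payload):
    """Write to a pipe whose read end is already closed; return the errno."""
    read_end, write_end = os.pipe()
    os.close(read_end)  # the "reader" (echo, true, ...) has already gone away
    try:
        os.write(write_end, payload)
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_end)
    return 0


if __name__ == "__main__":
    print(write_after_reader_closed(b"1\n2\n3\n") == errno.EPIPE)
    print(128 + signal.SIGPIPE)
```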
Documentation
Very recent versions of tee have an option to control the pipe-fail-exit behavior. From man tee from coreutils-8.25:
--output-error[=MODE]
set behavior on write error. See MODE below
The possibilities for MODE are:
MODE determines behavior with write errors on the outputs:
'warn' diagnose errors writing to any output
'warn-nopipe'
diagnose errors writing to any output not a pipe
'exit' exit on error writing to any output
'exit-nopipe'
exit on error writing to any output not a pipe
The default MODE for the -p option is 'warn-nopipe'. The default
operation when --output-error is not specified, is to exit immediately
on error writing to a pipe, and diagnose errors writing to non pipe
outputs.
As you can see, the default behavior is "to exit immediately on error writing to a pipe". Thus, if the attempt to write to the process that follows tee fails before tee wrote the log file, then tee will exit without writing the log file.
Right, piping from tee to something that exits early (not dependent on reading the input from tee in your case) will cause intermittent errors.
For a summary of this gotcha see:
http://www.pixelbeat.org/docs/coreutils-gotchas.html#tee
I debugged the tee source code, but I'm not familiar with C on Linux, so there may be mistakes here.
tee belongs to the coreutils package; its source is src/tee.c.
First, it sets the buffering with:
setvbuf (stdout, NULL, _IONBF, 0); // for standard output
setvbuf (descriptors[i], NULL, _IONBF, 0); // for file descriptor
So it is unbuffered?
Second, tee puts stdout first in its descriptor array and writes to each descriptor in a for loop:
/* In the array of NFILES + 1 descriptors, make
   the first one correspond to standard output. */
descriptors[0] = stdout;
files[0] = _("standard output");
setvbuf (stdout, NULL, _IONBF, 0);
...
for (i = 0; i <= nfiles; i++) {
    if (descriptors[i]
        && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1) // failed!!!
    {
        error (0, errno, "%s", files[i]);
        descriptors[i] = NULL;
        ok = false;
    }
}
For example, with tee a.log, descriptors[0] is stdout and descriptors[1] is a.log.
As @John1024 said, the commands in a pipeline run in parallel (which I misunderstood before). The second command in the pipeline, such as echo, ls, or true, does not read its input, so it does not wait for it. If it finishes first, it closes the read end of the pipe before tee writes to the write end. The write on the commented line above then fails and, as the strace output below shows, tee is killed by SIGPIPE before it goes on to write to the log file's descriptor.
Supplement:
The strace result, showing tee killed by SIGPIPE:
write(1, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n", 21) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=22649, si_uid=1000} ---
+++ killed by SIGPIPE +++
If I try to redirect the output of grep to the same file that it is reading from, like so:
$ grep stuff file.txt > file.txt
I get the error message grep: input file 'file.txt' is also the output. How does grep determine this?
According to the GNU grep source code, grep compares the inodes of the input and the output:
if (!out_quiet && list_files == 0 && 1 < max_count
&& S_ISREG (out_stat.st_mode) && out_stat.st_ino
&& SAME_INODE (st, out_stat)) /* <------------------ */
{
if (! suppress_errors)
error (0, 0, _("input file %s is also the output"), quote (filename));
errseen = 1;
goto closeout;
}
The out_stat is filled by calling fstat against STDOUT_FILENO.
if (fstat (STDOUT_FILENO, &tmp_stat) == 0 && S_ISREG (tmp_stat.st_mode))
out_stat = tmp_stat;
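The core of that check can be sketched in Python. This is only an illustration of the inode comparison, not grep's full condition (it leaves out the regular-file test and the option checks), and same_inode is a hypothetical helper name.

```python
import os


def same_inode(path, fd):
    """Return True if path and the open descriptor fd refer to the same file."""
    in_stat = os.stat(path)
    out_stat = os.fstat(fd)
    # A file is identified by its device number and inode number together.
    return (in_stat.st_dev, in_stat.st_ino) == (out_stat.st_dev, out_stat.st_ino)


if __name__ == "__main__":
    # With stdout redirected to file.txt, same_inode("file.txt", 1) is True.
    print(same_inode(__file__, 1))
```

Because the shell opens (and truncates) file.txt for output before grep starts, descriptor 1 already refers to the same inode by the time grep opens the file for input.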
Looking at the source code, you can see that it checks for this case (the file is already open for reading by grep) and reports it. See the SAME_INODE check below:
/* If there is a regular file on stdout and the current file refers
to the same i-node, we have to report the problem and skip it.
Otherwise when matching lines from some other input reach the
disk before we open this file, we can end up reading and matching
those lines and appending them to the file from which we're reading.
Then we'd have what appears to be an infinite loop that'd terminate
only upon filling the output file system or reaching a quota.
However, there is no risk of an infinite loop if grep is generating
no output, i.e., with --silent, --quiet, -q.
Similarly, with any of these:
--max-count=N (-m) (for N >= 2)
--files-with-matches (-l)
--files-without-match (-L)
there is no risk of trouble.
For --max-count=1, grep stops after printing the first match,
so there is no risk of malfunction. But even --max-count=2, with
input==output, while there is no risk of infloop, there is a race
condition that could result in "alternate" output. */
if (!out_quiet && list_files == 0 && 1 < max_count
&& S_ISREG (out_stat.st_mode) && out_stat.st_ino
&& SAME_INODE (st, out_stat))
{
if (! suppress_errors)
error (0, 0, _("input file %s is also the output"), quote (filename));
errseen = true;
goto closeout;
}
Here is how to write back to some file:
grep stuff file.txt > tmp && mv tmp file.txt
Try a pipeline with cat or tac, writing to a new file:
cat file | grep 'searchpattern' > newfile
This is short and easy to write.
Linux environment. So, we have this program 't_show', when executed with an ID will write price data for that ID on the console. There is no other way to get this data.
I need to copy the price data for IDs 1-10,000 between two servers, using minimum bandwidth, minimum number of connections. On the destination server the data will be a separate file for each id with the format:
<id>.dat
Something like this would be the long-winded solution:
dest:
files=`seq 1 10000`
for id in `echo $files`;
do
./t_show $id > $id
done
tar cf - $files | nice gzip -c > dat.tar.gz
source:
scp user@source:dat.tar.gz ./
gunzip dat.tar.gz
tar xvf dat.tar
That is, write each output to its own file, compress & tar, send over network, extract.
It has the problem that I need to create a new file for each id. This takes up tonnes of space and doesn't scale well.
Is it possible to write the console output directly to a (compressed) tar archive without creating the intermediate files? Any better ideas (maybe writing compressed data directly across network, skipping tar)?
The tar archive would need to extract as I said on the destination server as a separate file for each ID.
Thanks to anyone who takes the time to help.
You could just send the data formatted in some way and parse it on the receiver.
foo.sh on the sender:
#!/bin/bash
for (( id = 1; id <= 10000; id++ ))
do
    data="$(./t_show $id)"
    size=$(wc -c <<< "$data")
    # Header line: the id and the payload size in bytes, then the payload.
    echo $id $size
    cat <<< "$data"
done
On the receiver:
ssh -C user@server 'foo.sh' | while read file size; do
    dd of="$file" bs=1 count="$size"
done
ssh -C compresses the data during transfer
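The framing used here (a header line with name and size, followed by exactly that many bytes) can be sketched in Python as follows. The function names write_records and read_records are made up for this illustration.

```python
import io


def write_records(stream, records):
    """Write each (name, payload) pair as a 'name size' header line plus payload."""
    for name, payload in records:
        stream.write(f"{name} {len(payload)}\n".encode())
        stream.write(payload)


def read_records(stream):
    """Yield (name, payload) pairs written by write_records."""
    while True:
        header = stream.readline()
        if not header:
            break  # end of stream
        name, size = header.decode().split()
        yield name, stream.read(int(size))


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_records(buffer, [("1", b"10.5\n11.0\n"), ("2", b"12.0\n")])
    buffer.seek(0)
    for name, payload in read_records(buffer):
        print(name, len(payload))
```

Because the size is sent up front, the payload may contain anything, including lines that look like headers.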
You can at least tar stuff over a ssh connection:
tar -czf - inputfiles | ssh remotecomputer "tar -xzf -"
How to populate the archive without intermediary files however, I don't know.
EDIT: Ok, I suppose you could do it by writing the tar file manually. The header is specified here and doesn't seem too complicated, but that isn't exactly my idea of convenient...
I don't think this will work with a plain bash script, but you could have a look at the Archive::Tar module for Perl, or at similar modules for other scripting languages.
The Perl module has an add_data function that creates a "file" on the fly and adds it to the archive, which can then be streamed across the network.
The Documentation is found here:
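The same approach can be sketched with Python's standard tarfile module instead of Perl. This is a swapped-in illustration, not the Perl API; the helper names add_data and build_archive and the example payloads are made up here.

```python
import io
import tarfile


def add_data(archive, name, payload):
    """Add payload (bytes) to an open tar archive under name, with no file on disk."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))


def build_archive(stream, outputs):
    """Write a gzip-compressed tar of {name: bytes} to a writable binary stream."""
    with tarfile.open(fileobj=stream, mode="w|gz") as archive:
        for name, payload in outputs.items():
            add_data(archive, name, payload)


if __name__ == "__main__":
    buffer = io.BytesIO()
    build_archive(buffer, {"1.dat": b"10.5\n", "2.dat": b"11.0\n"})
    print(len(buffer.getvalue()) > 0)
```

The stream could just as well be sys.stdout.buffer piped into ssh, so that nothing but the compressed archive is ever written anywhere on the sending side.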
You can do better without tar:
#!/bin/bash
for id in `seq 1 10000`
do
./t_show $id
done | gzip
The only difference is that you will not get the boundaries between different IDs.
Now put that in a script, say show_me_the_ids, and run this from the client:
ssh user@source ./show_me_the_ids | gunzip
And there they are!
Alternatively, you can pass the -C flag to compress the SSH connection and remove the gzip / gunzip calls altogether.
If you are really into it, you may try ssh -C, gzip -9, and other compression programs.
Personally, I'd bet on lzma -9.
I would try this:
(for ID in $(seq 1 10000); do echo $ID: $(./t_show $ID); done) | ssh user@destination "ImportscriptOrProgram"
This prints lines such as "1: ValueOfID1" to standard output, which is transferred via ssh to the destination host, where you can start your import script or program that reads the lines from standard input.
HTH
Thanks all
I've taken the advice 'just send the data formatted in some way and parse it on the receiver', which seems to be the consensus. I'm skipping tar and using ssh -C for simplicity.
Perl script. Breaks the ids into groups of 1000. IDs are source_id in hash table. All data is sent via single ssh, delimited by 'HEADER', so it writes to the appropriate file. This is a lot more efficient:
sub copy_tickserver_files {
my $self = shift;
my $cmd = 'cd tickserver/ ; ';
my $i = 1;
while ( my ($source_id, $dest_id) = each ( %{ $self->{id_translations} } ) ) {
$cmd .= qq{ echo HEADER $source_id ; ./t_show $source_id ; };
$i++;
if ( $i % 1000 == 0 ) {
$cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
$self->copy_tickserver_files_subset( $cmd );
$cmd = 'cd tickserver/ ; ';
}
}
$cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
$self->copy_tickserver_files_subset( $cmd );
}
sub copy_tickserver_files_subset {
my $self = shift;
my $cmd = shift;
my $output = '';
open TICKS, $cmd;
while(<TICKS>) {
if ( m{HEADER [ ] ([0-9]+) }mxs ) {
my $id = $1;
$output = "$self->{tmp_dir}/$id.ts";
close TICKSOP;
open TICKSOP, '>', $output;
next;
}
next unless $output;
print TICKSOP "$_";
}
close TICKS;
close TICKSOP;
}
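For comparison, the receiving side's logic (start a new output whenever a HEADER line appears) looks roughly like this in Python. It is a simplified sketch that collects lines in memory instead of writing files, and split_by_header is a hypothetical name.

```python
import re

# A marker line such as "HEADER 42" starts the data for id 42.
HEADER = re.compile(r"^HEADER (\d+)\s*$")


def split_by_header(lines):
    """Group lines into {id: [lines]} using 'HEADER <id>' marker lines."""
    files = {}
    current = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            current = files.setdefault(match.group(1), [])
            continue
        if current is not None:  # skip anything before the first header
            current.append(line)
    return files


if __name__ == "__main__":
    stream = ["HEADER 7\n", "10.5\n", "11.0\n", "HEADER 8\n", "12.0\n"]
    print(split_by_header(stream))
```

Unlike a size-prefixed format, this relies on the data never containing a line that looks like a header.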
I've got a large collection of gzipped archives on my Ubuntu webserver, and I need them converted to zips. I figure this would be done with a script, but what language should I use, and how would I go about unzipping and rezipping files?
I'd do it with a bash(1) one-liner:
for f in *.tar.gz;\
do d="${f%.tar.gz}" ;\
rm -rf "$d" ;\
mkdir "$d" ;\
tar -C "$d" -zxvf "$f" ;\
( cd "$d" && zip -r "../$d.zip" . ) ;\
rm -rf "$d" ;\
done
It isn't very pretty because I'm not great at bash(1). Note that this destroys a lot of directories so be sure you know what this does before doing it.
See the bash(1) reference card for more details on the ${foo%bar} syntax.
A simple bash script would be easiest, surely? That way you can just invoke the tar and zip commands.
The easiest solution on Unix platforms may well be to use FUSE and something like archivemount (libarchive): http://en.wikipedia.org/wiki/Archivemount
/iaw
You can use node.js and tar-to-zip for this purpose. All you need to do is:
Install node.js with nvm if you do not have it.
And then install tar-to-zip with:
npm i tar-to-zip -g
And use it with:
tar-to-zip *.tar.gz
Also you can convert .tar.gz files to .zip programmatically.
You should install async and tar-to-zip locally:
npm i async tar-to-zip
And then create converter.js with contents:
#!/usr/bin/env node
'use strict';

const fs = require('fs');
const tarToZip = require('tar-to-zip');
const eachSeries = require('async/eachSeries');

const names = process.argv.slice(2);

// Convert the archives one at a time; stop at the first error.
eachSeries(names, convert, exitIfError);

function convert(name, done) {
    const {stdout} = process;

    const onProgress = (n) => {
        stdout.write(`\r${n}%: ${name}`);
    };

    const onFinish = () => {
        stdout.write('\n');
        done();
    };

    const nameZip = name.replace(/\.tar\.gz$/, '.zip');
    const zip = fs.createWriteStream(nameZip)
        .on('error', (error) => {
            // Remove the partial zip first: exitIfError does not return on error.
            fs.rmSync(nameZip, {force: true});
            exitIfError(error);
        });

    const progress = true;
    tarToZip(name, {progress})
        .on('progress', onProgress)
        .on('error', exitIfError)
        .getStream()
        .pipe(zip)
        .on('finish', onFinish);
}

function exitIfError(error) {
    if (!error)
        return;

    console.error(error.message);
    process.exit(1);
}
Zip files are handy because they offer random access to files. Tar files offer only sequential access.
My solution for this conversion is this shell script, which calls itself via the tar(1) --to-command option (I prefer that to having two scripts). I admit that "untar and zip -r" is faster, though, because zipnote(1) unfortunately cannot work in place.
#!/bin/zsh -feu
## Convert a tar file into zip:
usage() {
setopt POSIX_ARGZERO
cat <<EOF
usage: ${0##*/} [+-h] [-v] [--] {tarfile} {zipfile}
-v verbose
-h print this message
converts the TAR archive into ZIP archive.
EOF
unsetopt POSIX_ARGZERO
}
while getopts :hv OPT; do
case $OPT in
h|+h)
usage
exit
;;
v)
# todo: ignore TAR_VERBOSE from env?
# Pass to the grand-child process:
export TAR_VERBOSE=y
;;
*)
usage >&2
exit 2
esac
done
shift OPTIND-1
OPTIND=1
# when invoked w/o parameters:
if [ $# = 0 ] # todo: or stdin is not terminal
then
# we are invoked by tar(1)
if [ -n "${TAR_VERBOSE-}" ]; then echo $TAR_REALNAME >&2;fi
zip --grow --quiet $ZIPFILE -
# And rename it:
# fixme: this still makes a full copy, so slow.
printf "# -\n#=$TAR_REALNAME\n" | zipnote -w $ZIPFILE
else
if [ $# != 2 ]; then usage >&2; exit 1;fi
# possibly: rm -f $ZIPFILE
ZIPFILE=$2 tar -xaf $1 --to-command=$0
fi
Here is a python solution based on this answer here:
import sys, tarfile, zipfile, glob

def convert_one_archive(file_name):
    out_file = file_name.replace('.tar.gz', '.zip')
    with tarfile.open(file_name, mode='r:gz') as tf:
        with zipfile.ZipFile(out_file, mode='a', compression=zipfile.ZIP_DEFLATED) as zf:
            for m in tf.getmembers():
                f = tf.extractfile( m )
                fl = f.read()
                fn = m.name
                zf.writestr(fn, fl)

for f in glob.glob('*.tar.gz'):
    convert_one_archive(f)
Here is a script based on @Brad Campbell's answer that works on files passed as command arguments, works with other tar file types (uncompressed or the other compression types supported by tarfile), and handles directories in the source tar file. It also prints a warning if the source file contains a symlink or hardlink, since these are converted to regular files. For symlinks, the link is resolved during conversion. This can lead to an error if the link target is not in the tar; it is also potentially dangerous from a security standpoint, so user beware.
#!/usr/bin/python
import sys, tarfile, zipfile, glob, re

def convert_one_archive(in_file, out_file):
    with tarfile.open(in_file, mode='r:*') as tf:
        with zipfile.ZipFile(out_file, mode='a', compression=zipfile.ZIP_DEFLATED) as zf:
            for m in [m for m in tf.getmembers() if not m.isdir()]:
                if m.issym() or m.islnk():
                    print('warning: symlink or hardlink converted to file')
                f = tf.extractfile(m)
                fl = f.read()
                fn = m.name
                zf.writestr(fn, fl)

for in_file in sys.argv[1:]:
    out_file = re.sub(r'\.((tar(\.(gz|bz2|xz))?)|tgz|tbz|tbz2|txz)$', '.zip', in_file)
    if out_file == in_file:
        print(in_file, '---> [skipped]')
    else:
        print(in_file, '--->', out_file)
        convert_one_archive(in_file, out_file)