Pass each file obtained from a command to another command as a parameter - linux

I am using the following line to take a pdf and split it:
pdfseparate -f 14 -l 23 ALF.SS.0.pdf "${FILE}"-%d.pdf
Now, for each file produced, I want to run several commands like this:
pdfcrop --margins '-30 0 -385 0' outputOfpdfSeparate outputOfpdfSeparate-1stCol.pdf
I am trying to figure out the best way to do this:
With a single loop: for each file created by pdfseparate, if I could "know" the name of the file, I could pass it to pdfcrop and be done. But since the output name uses %d, I do not know how to handle this "new name" in which each file gets a new number. I know how to do this in Java, but here it is not so clear to me.
Using pipes. I think I have the same issue, since if I do
pdfseparate [options] | pdfcrop inputfile outputfile
I do not know how to "use" the name of inputfile. I am sure it is easy, but I don't see it.
Using xargs. I am studying this command, since it is new to me.
Using exec. I am under the impression this is not necessary, but maybe I am wrong; it has been a long while since I last used exec.
Thanks in advance.

You can use xargs. It also offers an easy speed-up, thanks to its parallel execution option.
I usually use it for converting a lot of .mp4 files to .mp3.
Doing this conversion one by one is not only tedious but also takes a long time, so you can use the automatic parallelism that the -P 0 option of xargs provides.
For example, if I had 10 .mp4 files, I would do this:
ls *.mp4 | xargs -I xxx -P 0 ffmpeg -i xxx xxx.mp3
After running this line, 10 ffmpeg commands run simultaneously.
The other way to do this is to store the list of .mp4 files in a text file, like this:
ls *.mp4 > list-mp4
then:
xargs -I xxx -P 0 ffmpeg -i xxx xxx.mp3 < list-mp4
Or you may have access to GNU parallel, in which case you can do:
parallel ffmpeg -i {} {}.mp3 ::: *.mp4
Now for your case: if you want to use these (xargs or parallel) with your own commands, note that the first command must send its output to stdout, because the second command is going to read its stdin from the stdout of the first; bash wires this up for you.
Thus a pipe (|) helps with pdfseparate only if pdfseparate actually sends its output to stdout. If it does not, the right-hand side of the pipe (the second command) receives nothing; and correspondingly, the second command must be one that reads its stdin from the incoming stdout.
For example
ls *.txt | echo {}
Here echo does not read any incoming output from the ls command at all; it just prints {}.
In the end, your pdfseparate would have to send the file names to stdout. xargs then stores each line in the placeholder you name with -I and calls your second command with it.
Therefore:
pdfseparate options... | xargs -I ABC -P 0 your-second-command+its-options ABC
NOTE 1: xargs reads the incoming stdout line by line into ABC, and you pass ABC to your second command as its input.
NOTE 2: you do not have to use -P 0 at all. It only speeds up execution; if you omit it, your second command runs sequentially, once per incoming line.
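As it happens, pdfseparate prints nothing to stdout, so for this specific case you could generate the expected names yourself and feed those to xargs instead. A minimal sketch, assuming GNU seq, the page range 14-23, and the naming from the question (the cropped files would come out with names like ALF.SS.0-14.pdf-1stCol.pdf):
seq -f 'ALF.SS.0-%g.pdf' 14 23 | xargs -I ABC -P 0 pdfcrop --margins '-30 0 -385 0' ABC ABC-1stCol.pdf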

pdfseparate does not print the names of the files it creates, so you have to glob (or list the directory) to get the file list you want to operate on.
# separate the pdfs
pdfseparate -f 14 -l 23 ALF.SS.0.pdf "${FILE}"-%d.pdf
# operate on the just-created files; this assumes a "FILE" variable is set, which might not be the case
for i in "${FILE}"-*.pdf; do pdfcrop --margins '-30 0 -385 0' "$i"; done
# assuming that FILE in your case expands so the results match ALF.SS.0-[0-9]*.pdf, you'd use this:
for i in ALF.SS.0-[0-9]*.pdf; do pdfcrop --margins '-30 0 -385 0' "$i"; done
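Alternatively, since -f 14 -l 23 already tells you which numbers pdfseparate will substitute for %d, you can reconstruct the names directly instead of listing the directory. A sketch under that assumption:
for n in $(seq 14 23); do
    pdfcrop --margins '-30 0 -385 0' "ALF.SS.0-$n.pdf" "ALF.SS.0-$n-1stCol.pdf"
done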

Related

How to create a dynamic command in bash?

I want to have a command in a variable that runs a program and specifies the output filename for it depending on the number of files that exist (so it works on a new file each time).
Here is what I have:
export MY_COMMAND="myprogram -o ./dir/outfile-0.txt"
However, I would like this outfile number to increase each time MY_COMMAND is executed. You may suppose myprogram creates the file soon enough before the next call, so the number can be derived from the number of files that exist in the directory ./dir/. I cannot change myprogram itself or the way MY_COMMAND is used.
Thanks in advance.
Given that you can't change myprogram (its -o option will always write to the file given on the command line), and assuming that something else out of your control is running MY_COMMAND so you can't change the way MY_COMMAND gets called, you still have control of MY_COMMAND itself.
For the rest of this answer I'm going to change the name MY_COMMAND to callprog mostly because it's easier to type.
You can define callprog as a variable as in your example export callprog="myprogram -o ./dir/outfile-0.txt", but you could instead write a shell script and name that callprog, and a shell script can do pretty much anything you want.
So, you have a directory full of outfile-<num>.txt files and you want to output to the next non-colliding outfile-<num+1>.txt.
Your shell script can get the numbers by listing the files, cutting out only the numbers, sorting them, then taking the highest number.
If we have these files in dir:
outfile-0.txt
outfile-1.txt
outfile-5.txt
outfile-10.txt
ls -1 ./dir/outfile*.txt produces the list
./dir/outfile-0.txt
./dir/outfile-1.txt
./dir/outfile-10.txt
./dir/outfile-5.txt
(using outfile and .txt means this will work even if there are other files not named outfile)
Scrape out the number by piping the list through the stream editor sed: capture the number and keep only that part:
ls -1 ./dir/outfile*.txt | sed -e 's:^.*dir/outfile-\([0-9][0-9]*\)\.txt$:\1:'
(I'm using colon : instead of the standard slash / so I don't have to escape the directory separator in dir/outfile)
Now you just need to pick the highest number. Sort the numbers and take the top:
| sort -rn | head -1
Sorting with -n is numeric, not lexicographic; -r reverses the order, so the highest number comes first, not last.
Putting it all together, this will list the files, edit the names keeping only the numeric part, sort, and get just the first entry. You want to assign that to a variable to work with it, so it is:
high=$(ls -1 ./dir/outfile*.txt | sed -e 's:^.*dir/outfile-\([0-9][0-9]*\)\.txt$:\1:' | sort -rn | head -1)
In the shell (I'm using bash) you can do math on that with $((high + 1)): if high is 10, the expression produces 11.
You would use that as the numeric part of your filename.
The whole shell script then just needs to use that number in the filename. Here it is, with lines broken for better readability:
#!/bin/sh
high=$(ls -1 ./dir/outfile*.txt \
    | sed -e 's:^.*dir/outfile-\([0-9][0-9]*\)\.txt$:\1:' \
    | sort -rn | head -1)
echo "myprogram -o ./dir/outfile-$((high + 1)).txt"
Of course you wouldn't echo myprogram, you'd just run it.
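For what it's worth, the same "highest number" scan can be done without parsing ls at all, by iterating a glob and comparing the numeric suffixes. A sketch of that variant:
#!/bin/bash
high=-1
for f in ./dir/outfile-*.txt; do
    [ -e "$f" ] || continue            # the glob matched nothing
    n=${f##*outfile-}                  # strip the path and prefix
    n=${n%.txt}                        # strip the extension
    [ "$n" -gt "$high" ] 2>/dev/null && high=$n
done
echo "myprogram -o ./dir/outfile-$((high + 1)).txt"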
You could do this in a bash function in your .bashrc, using wc to get the number of files in the directory and then adding 1 to the result:
yourfunction () {
    dir=/path/to/dir
    filenum=$(expr $(ls "$dir" | wc -w) + 1)
    myprogram -o "$dir/outfile-${filenum}.txt"
}
This gets the number of files in $dir and adds 1 to get the number you need for the filename (note it assumes the existing numbers have no gaps: with files numbered 0, 1, 5 and 10, the count plus one would collide with outfile-5.txt). If you place it in your .bashrc or .bash_aliases and source .bashrc, it should work like any other shell command.
You can try exporting a function for MY_COMMAND to run.
next_outfile () {
    myprogram -o ./dir/outfile-${_next_number}.txt
    (( _next_number++ ))
}
export -f next_outfile
export MY_COMMAND="next_outfile" _next_number=0
This relies on a "private" global variable _next_number being initialized to 0 and not otherwise modified.
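For illustration, calling it twice in the same shell session might look like this (note the counter only advances within a single shell process; a child shell re-imports _next_number=0 from the environment, so repeated bash -c "$MY_COMMAND" invocations would all write outfile-0.txt):
next_outfile    # writes ./dir/outfile-0.txt
next_outfile    # writes ./dir/outfile-1.txt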

Redirecting linux cout to a variable and the screen in a script

I am currently trying to make a script that runs multiple other scripts on a server. I would like to display the output of these scripts on the screen IN ADDITION to passing it into grep so I can do error testing. Currently I have written this:
status=$(SOMEPROCESS | grep -i "SOMEPROCESS started completed correctly")
I do further error handling below this using the status variable, so I would like to display SOMEPROCESS's output on the screen for error reference. This is a read-only server and I cannot save the output to a log file.
You need to use the tee command. It will be slightly fiddly, since tee writes to a file handle. However, you could create a file descriptor backed by a pipe.
Or, simpler for your use case:
Start the script without grep and pipe it through tee, as in SOMEPROCESS | tee /my/safely/generated/filename. Then run tail -f /my/safely/generated/filename | grep -i "my grep pattern" separately.
You can use process substitution together with tee:
SOMEPROCESS | tee >(grep ...)
This will use an anonymous pipe and pass /dev/fd/... as file name to tee (or a named pipe on platforms that don't support /dev/fd/...).
Because SOMEPROCESS is likely to buffer its output when not talking to a terminal, you might see significant lag in screen output.
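To also keep the variable capture from the question, one option (a sketch; /dev/tty is the controlling terminal, so this assumes an interactive session) is to tee to the terminal and grep afterwards:
status=$(SOMEPROCESS | tee /dev/tty | grep -i "SOMEPROCESS started completed correctly")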
I'm not sure whether I understood your question exactly.
I think you want to get the output of SOMEPROCESS, test it, and print it out when there is an error. If so, the code below may help you:
s=$(SOMEPROCESS)
grep -q 'SOMEPROCESS started completed correctly' <<< "$s"
if [[ $? -ne 0 ]]; then
    # the expected string was not found in the output, so SOMEPROCESS failed
    echo "$s"
fi
But this code stores all of the output in memory; if the output is big enough, there is an OOM risk.

tail-like continuous ls (file list)

I am monitoring the new files created in a folder in Linux. Every now and then I issue an "ls -ltr" in it. But I wish there were a program/script that would automatically print it, and only the latest entries. I did a short while loop to list it, but it would repeat the entries that were not new, and it would keep my screen rolling up when there were no new files. I've learned about "watch", which does show what I want and refreshes every N seconds, but I don't want an ncurses interface; I'm looking for something like tail:
continuous
shows only the new stuff
prints in my terminal, so I can run it in the background and do other things and see the output every now and then getting mixed with whatever I'm doing :D
Summarizing: get the input, compare to a previous input, output only what is new.
Something that does that doesn't sound like such an odd tool, and I can see it being used in other situations too, so I would expect it to already exist, but I couldn't find anything. Suggestions?
You can use the very handy command watch:
watch -n 10 "ls -ltr"
And you will get an ls every 10 seconds.
And if you add a tail -10 you will only get the 10 newest.
watch -n 10 "ls -ltr|tail -10"
If you have access to inotifywait (available from the inotify-tools package if you are on Debian/Ubuntu) you could write a script like this:
#!/bin/bash
WATCH=/tmp
inotifywait -q -m -e create --format %f "$WATCH" | while read -r event
do
    ls -ltr "$WATCH/$event"
done
This is a one-liner that won't give you the same information that ls does, but it will print out the filename:
inotifywait -q -m -e create --format %w%f /some/directory
This works in Cygwin and Linux. Some of the previous solutions, which write a file, will cause the disk to thrash.
This script does not have that problem:
SIG=1
SIG0=$SIG
while [ "$SIG" != 0 ]; do
    while [ "$SIG" = "$SIG0" ]; do
        SIG=$(ls -1 | md5sum | cut -c1-32)
        sleep 10
    done
    SIG0=$SIG
    ls -lrt | tail -n 1
done

Grep filtering output from a process after it has already started?

Normally when one wants to look at specific output lines from running something, one can do something like:
./a.out | grep IHaveThisString
but what if IHaveThisString is something which changes every time, so you need to first run it, watch the output to catch what IHaveThisString is on that particular run, and then grep for it? I could just dump to a file and grep it later, but is it possible to do something like backgrounding it and then bringing it back to the foreground, now piped into some grep? Something akin to:
./a.out
Ctrl-Z
fg | grep NowIKnowThisString
just wondering..
No, it is only in your screen buffer if you didn't save it in some other way.
Short form: You can do this, but you need to know that you need to do it ahead-of-time; it's not something that can be put into place interactively after-the-fact.
Write your script to determine what the string is. We'd need a more detailed example of the output format to give a better example of usage, but here's one for the trivial case where the entire first line is the filter target:
run_my_command | { read -r string_to_filter_for; fgrep -e "$string_to_filter_for"; }
Replace the read string_to_filter_for with as many commands as necessary to read enough input to determine what the target string is; this could be a loop if necessary.
For instance, let's say that the output contains the following:
Session id: foobar
and thereafter, you want to grep for lines containing foobar.
...then you can pipe through the following script:
re='Session id: (.*)'
while read -r; do
    if [[ $REPLY =~ $re ]]; then
        target=${BASH_REMATCH[1]}
        break
    else
        # if you want to print the preamble; leave this out otherwise
        printf '%s\n' "$REPLY"
    fi
done
[[ $target ]] && grep -F -e "$target"
If you want to manually specify the filter target, this can be done by having the loop check for a file being created with filter contents, and using that when starting up grep afterwards.
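A hypothetical sketch of that idea, extending the loop above; filter_override is an assumed file name that an operator could create while the preamble is still streaming:
re='Session id: (.*)'
while read -r; do
    if [[ -s filter_override ]]; then
        target=$(< filter_override)    # an operator-supplied target wins
        break
    elif [[ $REPLY =~ $re ]]; then
        target=${BASH_REMATCH[1]}
        break
    fi
    printf '%s\n' "$REPLY"
done
[[ $target ]] && grep -F -e "$target"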
What you need is a little bit strange, but you can do it this way:
you must start a script session first;
then you use the shell as usual;
then you start and interrupt your program;
then you run grep over the typescript file.
Example:
$ script
$ ./a.out
Ctrl-Z
$ fg
$ grep NowIKnowThisString typescript
You could use a stream editor such as sed instead of grep. Here's an example of what I mean:
$ cat list
Name to look for: Mike
Dora 1
John 2
Mike 3
Helen 4
Here we find the name to look for in the first line and want to grep for it. Now, piping the command to sed:
$ cat list | sed -ne '1{s/Name to look for: //;h}' \
> -e ':r;n;G;/^.*\(.\+\).*\n\1$/P;s/\n.*//;br'
Mike 3
Note: sed itself can take a file as a parameter, but you're not working with a text file here, so this is how you'd use it.
Of course, you'd need to modify the command for your case.

Accessing each line using a $ sign in linux

Whenever I execute a Linux command that outputs multiple lines, I want to perform some operation on each line of the output. Generally I do
command something | while read a
do
some operation on $a;
done
This works fine. But my question is: is there somehow a way to access each line by a predefined symbol (I don't know what to call it), something like $? or $! or $_?
Is it possible to do
cat to_be_removed.txt | rm -f $LINE
Is there a predefined $LINE in bash, or is the previous one the shortest way? I.e.:
cat to_be_removed.txt | while read line; do rm -f $line; done;
xargs is what you're looking for:
cat to_be_removed.txt | xargs rm -f
Watch out for spaces in your filenames if you use that one, though. Check out the xargs man page for more information.
You might be looking for the xargs command.
It takes control arguments, plus a command and optionally some arguments for the command. It then reads its standard input, normally splitting at white space, and then arranges to repeatedly execute the command with the given arguments and as many 'file names' read from the standard input as will fit on the command line.
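To see that batching concretely, here is a toy illustration with echo standing in for the real command:
printf '%s\n' a b c d | xargs echo start
# runs: start a b c d   (a single invocation, with all four input words appended)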
rm -f $(<to_be_removed.txt)
This works because rm can take multiple files as arguments. It is also much more efficient, because rm is called only once and you don't need to create a pipe to cat or xargs.
On a separate note, rather than using pipes in a while loop, you can avoid a subshell by using process substitution:
while read -r line; do
    some operation on "$line";
done < <(command something)
The additional benefit you get by avoiding a subshell is that variables you change inside the loop maintain their altered values outside the loop as well. This is not the case when using the pipe form and it is a common gotcha.
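A quick demonstration of that gotcha, using a hypothetical counter variable:
count=0
printf 'a\nb\n' | while read -r line; do (( count++ )); done
echo "$count"    # prints 0: the loop ran in a subshell

count=0
while read -r line; do (( count++ )); done < <(printf 'a\nb\n')
echo "$count"    # prints 2: no subshell, so the change persists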
