linux pattern match program input - linux

I have multiple files with the same pattern: file1.txt, file2.txt, file3.txt, etc. I want to run a java program for each file, something like this:
java Main file[0-9]*.txt
but it doesn't work. Is it possible to do what I want from the terminal? If not, I could change the program to read multiple args, but then again, I'd need to type all 100+ files manually.

As written, the command
java Main file[0-9]*.txt
would pass all of the matching filenames file1.txt, file2.txt, etc., in one command. The OP requested "run a java program for each file", which implies that a series of commands is intended. To do this (in bash or POSIX shell), one could do this:
for file in file[0-9]*.txt; do [ -f "$file" ] && java Main "$file"; done
Breaking it down:
this makes a loop with for file in file[0-9]*.txt using the suggested pattern,
it checks to ensure that the loop variable file has found a file rather than a wildcard expression which found none,
runs the Java class Main for each corresponding file.
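The loop above can be tried out safely with a stand-in for the Java program. This is a minimal sketch: `run_main`, `process_all`, and the scratch directory are hypothetical names introduced for the demonstration, and `run_main` only reports the argument it would hand to `java Main`.

```shell
#!/bin/bash
# Hypothetical stand-in for `java Main "$1"`; it only reports its argument.
run_main() {
    printf 'processing %s\n' "$1"
}

# Scratch directory with two matching files and one that must be ignored.
workdir=$(mktemp -d)
touch "$workdir/file1.txt" "$workdir/file2.txt" "$workdir/notes.txt"

process_all() {
    local dir=$1 file
    for file in "$dir"/file[0-9]*.txt; do
        # If nothing matched, the unexpanded pattern is left in $file; skip it.
        [ -f "$file" ] || continue
        run_main "${file##*/}"   # one invocation per file
    done
}

process_all "$workdir"
```

Once the printed list looks right, the body of `run_main` can be swapped for the real `java Main "$1"` call.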

Related

Is it possible to display a file's contents and delete that file in the same command?

I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
If you can control the name of the temporary text file, and the file is not used by other code, you can pass "/dev/stdout" as the name of the output file. The data is then printed directly and there is nothing left to delete.
Regarding portability: see the Stack Exchange question "how portable ... /dev/stdout".
POSIX 7 says these device files are extensions.
Base Definitions,
Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example,  /dev/stdin, /dev/stdout,  and  /dev/stderr)
Using /dev/tty, which POSIX does require, forces the output to the current terminal. That makes it impossible to pipe the output of the whole command into a different program (or a log file), or to run the command when there is no connected terminal (a cron job or other automation tools).
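A minimal sketch of the /dev/stdout idea, assuming a producer that insists on writing to a named file. The function `produce` is a hypothetical stand-in for the real command, and the JSON text is made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical producer that writes its result to the file named by $1.
produce() {
    printf '{"status": "ok"}\n' > "$1"
}

# Naming /dev/stdout as the "file" prints the data and leaves nothing to remove.
produce /dev/stdout
```

Because the data goes to standard output rather than to the terminal device, the result can still be piped or redirected by the caller.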
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient as it would require removing characters from the beginning of a file each time you read a line. Current filesystems are pretty good at truncating lines at the end of a file, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
head -1 output.json
sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d it will copy the whole content of the file but the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
In theory you could avoid this by doing something like this:
while [ -s output.json ]
do
line=$(head -1 output.json)
printf -- '%s\n' "$line"
fallocate -c -o 0 -l $((${#line}+1)) output.json
done
But this does not account for variable newline characters (namely DOS-formatted newlines) and fallocate does not always work on xfs, among other issues.
Since you are trying to consume a file alongside its creation without leaving a trace of its existence on disk, you are essentially asking for a pipe functionality. In my opinion you should look into how your output.json file is produced and hopefully you can pipe it to a script of your own.
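When the producer cannot be changed and the temporary file is unavoidable, one option (not taken from the answers above) is to wrap the two steps in a small shell function so the call site stays short. This is only a sketch; the name `show_and_remove` is hypothetical.

```shell
#!/bin/bash
# Print a file and delete it, but only if printing succeeded.
show_and_remove() {
    cat -- "$1" && rm -- "$1"
}

# Demonstration with a scratch file standing in for output.json.
tmpfile=$(mktemp)
printf 'example payload\n' > "$tmpfile"
show_and_remove "$tmpfile"
```

The `&&` keeps the file in place if `cat` fails, which matches the behavior of the original `cat output.json && rm output.json`.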

balancing the bash calculations

We have a tool for cutting adaptors https://github.com/vsbuffalo/scythe/blob/master/README.md and we wanted it to be used on all the files in the raw folder and make an output of each file separately as OUT+File Name.
Something is wrong with the script I wrote: it doesn't process each file separately, and the whole thing doesn't work properly. It only generates an empty file named OUT+files.
The expected operation looks like this:
take file1, use scythe on it, write output as OUTfile1
take file2 etc.
#!/bin/bash
FILES=/home/dave/raw/*
for f in $FILES
do
echo "Processing the $f file..."
/home/deve/scythe/scythe -a /home/dev/scythe/illumina_adapters.fa -o "OUT"+$f $f
done
Additionally, I noticed (testing for a single file) that the script uses only one core out of 130 available. Is there any way to improve it?
There is no string concatenation operator in shell. Use juxtaposition instead; it's "OUT$f", not "OUT"+$f.
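A sketch of the loop with the name built by juxtaposition. Note that `$f` holds a full path, so this version strips the directory part before prefixing OUT, which is an assumption about the intended naming. The helper `out_name` is hypothetical, and the scythe call is only echoed as a dry run, with the adapter path left as a placeholder variable to be set to the real location.

```shell
#!/bin/bash
# Build "OUT" + base name of the input file, e.g. /x/y/a.fastq -> OUTa.fastq.
out_name() {
    printf 'OUT%s\n' "${1##*/}"
}

adapters=illumina_adapters.fa   # placeholder: point this at the real adapter file

for f in /home/dave/raw/*; do
    [ -f "$f" ] || continue     # skip the literal pattern if nothing matched
    echo "Processing the $f file..."
    # Dry run: drop the leading `echo` once the printed commands look right.
    echo scythe -a "$adapters" -o "$(out_name "$f")" "$f"
done
```

Writing the loop over the glob directly, instead of over an unquoted `$FILES`, also keeps file names with spaces intact.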

Obtaining file names from directory in Bash

I am trying to create a zsh script to test my project. The teacher supplied us with some input files and expected output files. I need to diff the output files from myExecutable with the expected output files.
Question: Does $iF contain a string in the following code or some kind of bash reference to the file?
#!/bin/bash
inputFiles=~/project/tests/input/*
outputFiles=~/project/tests/output
for iF in $inputFiles
do
./myExecutable $iF > $outputFiles/$iF.out
done
Note:
Any tips in fulfilling my objectives would be nice. I am new to shell scripting and I am using the following websites to quickly write the script (since I have to focus on the project development and not wasting time on extra stuff):
Grammar for bash language
Beginner guide for bash
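As your code is written, $iF contains the full path of each file as a plain string; it is not a reference to the file. Because it is a full path, $outputFiles/$iF.out does not point inside the output directory the way you probably intend.
N.B.: Don't use for iF in $inputFiles;
use for iF in ~/project/tests/input/* instead. Otherwise your code will fail if the path contains spaces or newlines.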
If you need to diff the files you can do another for loop on your output files. Grab just the file name with the basename command and then put that all together in a diff and output to a ".diff" file using the ">" operator to redirect standard out.
Then diff each one with the expected file, something like:
expectedOutput=~/<some path here>
diffFiles=~/<some path>
for oF in ~/project/tests/output/*; do
file=$(basename "$oF")
diff "$oF" "${expectedOutput}/${file}" > "${diffFiles}/${file}.diff"
done
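Putting the two loops together, a test runner could look roughly like the sketch below. The function name `run_tests`, its argument order, and the convention that the expected file for input NAME is called NAME.out are all assumptions made for the example, not something stated in the question.

```shell
#!/bin/bash
# run_tests EXE INPUT_DIR OUTPUT_DIR EXPECTED_DIR
# Runs EXE on every input file, stores the result as NAME.out in OUTPUT_DIR,
# and compares it with EXPECTED_DIR/NAME.out (naming convention assumed).
run_tests() {
    local exe=$1 in_dir=$2 out_dir=$3 exp_dir=$4
    local input name failed=0
    for input in "$in_dir"/*; do
        [ -f "$input" ] || continue
        name=$(basename "$input")
        "$exe" "$input" > "$out_dir/$name.out"
        # diff exits non-zero when the files differ; keep its report in NAME.diff.
        if diff "$out_dir/$name.out" "$exp_dir/$name.out" > "$out_dir/$name.diff"; then
            echo "PASS $name"
        else
            echo "FAIL $name"
            failed=1
        fi
    done
    return "$failed"
}
```

A call might look like `run_tests ./myExecutable ~/project/tests/input ~/project/tests/output ~/project/tests/expected`, where the last directory is a hypothetical location for the teacher's files.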

What does this bash script command mean (sed - e)?

I'm totally new to bash scripting, but I want to solve this problem.
The command is:
objfil=`echo ${srcfil} | sed -e "s,c$,o,"`
The idea of the script is to check each source file and see whether there is a corresponding object file in the OBJ directory. If there is, the rest of the program runs smoothly; if not, the iteration terminates, skips the current source file, and moves on to the next one. It works with .c files but not with headers, since the object file names depend on the .c files. I want to rewrite this command so that it handles .h files as well as .c files, without skipping them. I know I have to change something else too, but I need to understand exactly what this line does before I can move on. Thanks. (Sorry for my English.)
UPDATE:
if test -r ${curOBJdir}/${objfil}
then
cp -v ${srcfil} ./SAVEDSRC/${srcfil}
fdone="NO"
linenums=ALL
else
fdone="YES"
err="${curOBJdir}/${objfil} is missing - ${srcfil} skipped)"
echo ${err}
echo ${err} >>${log}
fi
while test ${fdone} == "NO"
do
#rest of code ...
That is the rest of the program. I tried commenting out the "test" part to skip the comparison, because I only want my script to work on .h files without checking whether, e.g., abc.h has an abc.o file. (The object file generation is needed because at the end of the script there is a comparison between the hexdumps of the original and modified object files.) The whole script is for replacing basic types with typedefs, for example int with sint32_t.
This particular command replaces a c at the very end of the line with an o (only that one final character, not every c):
srcfil=abcd.c
objfil=`echo ${srcfil} | sed -e "s,c$,o,"`
echo $objfil
Output:
abcd.o
P.S. It uses , as the delimiter of the s command instead of the more common /.
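To see what the substitution does to different names, it can be wrapped in a function and tried on a few inputs. In this sketch `to_obj` reproduces the command from the question, while `to_obj_ch` is a hypothetical variant that also maps a .h name to a .o name; whether that mapping is what the script needs is an assumption.

```shell
#!/bin/bash
# Same substitution as in the question: a trailing "c" becomes "o".
to_obj() {
    printf '%s\n' "$1" | sed -e 's,c$,o,'
}

# Hypothetical variant: a trailing ".c" or ".h" becomes ".o".
to_obj_ch() {
    printf '%s\n' "$1" | sed -e 's,\.[ch]$,.o,'
}

to_obj abcd.c      # abcd.o
to_obj abcd.h      # abcd.h (no trailing c, so nothing changes)
to_obj_ch abcd.h   # abcd.o
```

The unchanged result for abcd.h explains why headers never find a matching object file in the original script.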

All files in one dir, linux

Today I tried a script in linux to get all files in one dir. It was pretty straightforward, but I found something interesting.
#!/bin/bash
InputDir=/home/XXX/
for file in $InputDir'*'
do
echo $file
done
The output is:
/home/XXX/fileA /home/XXX/fileB
But when I just input the dir directly, like:
#!/bin/bash
InputDir=/home/XXX/
for file in /home/XXX/*
do
echo $file
done
The output is:
/home/XXX/fileA
/home/XXX/fileB
It seems, in the first script, there was only one loop and all the file names were stored in the variable $file in the FIRST loop, separated by space. But in the second script, one file name was stored in $file just in one loop, and there were more than one loop. What is exactly the difference between these two scripts?
Thanks very much, maybe my question is a little bit naive..
The behavior is correct and "as expected".
for file in $InputDir'*' assigns the literal string "/home/XXX/*" to $file (note the quotes). Since you quoted the asterisk, the pattern is not expanded at this point. When the shell sees echo $file, it first expands the variable and then performs glob expansion on the unquoted result. So after the first step, it sees
echo /home/XXX/*
and after glob expansion, it sees:
echo /home/XXX/fileA /home/XXX/fileB
Only now, it will execute the command.
In the second case, the pattern /home/XXX/* is expanded before the for is executed and thus, each file in the directory is assigned to file and then the body of the loop is executed.
This will work:
for file in "$InputDir"*
but it's brittle; it will fail, for example, when you forget to add a / to the end of the variable $InputDir.
for file in "$InputDir"/*
is a little bit better (Unix will ignore double slashes in a path) but it can cause trouble when $InputDir is not set or empty: You'll suddenly list files in the / (root) folder. This can happen, for example, because of a typo:
inputDir=...
for file in "$InputDir"/*
Case matters on Unix :-)
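One way to guard against the empty-variable case is the `${parameter:?message}` expansion, which aborts with an error when the parameter is unset or empty. The sketch below uses a hypothetical function `list_dir` to show the idea.

```shell
#!/bin/bash
# List the entries of a directory, refusing to run with an empty argument.
list_dir() {
    # ${1:?...} stops here if $1 is unset or empty, so "/*" is never globbed.
    local dir=${1:?directory must not be empty} file
    for file in "$dir"/*; do
        [ -e "$file" ] || continue   # skip the literal pattern if nothing matched
        printf '%s\n' "$file"
    done
}

demo_dir=$(mktemp -d)
touch "$demo_dir/fileA"
list_dir "$demo_dir"
```

In a script, the failed expansion terminates the non-interactive shell, so a typo in the variable name shows up immediately instead of silently listing the root folder.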
To help you understand code like this, use set -x ("enable tracing") in a line before the code you want to debug.
The difference is the quoting of '*'. In the first case the loop only executes once, with $file equal to /home/XXX/* which then expands to all the files in the directory when passed to echo. In the second case it executes once per file, with $file equal to each file name in turn.
Bottom line - change:
for file in $InputDir'*'
to:
for file in $InputDir*
or, better, and to make it more readable - change:
InputDir=/home/XXX/
for file in $InputDir'*'
to:
InputDir=/home/XXX
for file in $InputDir/*
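The difference between the two loops can be made visible by counting iterations. This is a small sketch using a scratch directory in place of /home/XXX; the function names are hypothetical.

```shell
#!/bin/bash
demo=$(mktemp -d)                 # scratch directory, stands in for /home/XXX
touch "$demo/fileA" "$demo/fileB"
InputDir="$demo/"

# Quoted asterisk: the loop sees one word, the literal pattern.
count_quoted() {
    local n=0 file
    for file in "$1"'*'; do n=$((n + 1)); done
    echo "$n"
}

# Unquoted asterisk: the shell expands the pattern before the loop starts.
count_glob() {
    local n=0 file
    for file in "$1"*; do n=$((n + 1)); done
    echo "$n"
}

echo "quoted:   $(count_quoted "$InputDir") iteration(s)"
echo "unquoted: $(count_glob "$InputDir") iteration(s)"
```

With two files in the directory, the quoted version reports 1 iteration and the unquoted version reports 2.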
