balancing the bash calculations - linux

We have a tool for cutting adaptors, https://github.com/vsbuffalo/scythe/blob/master/README.md, and we want it to run on all the files in the raw folder, writing the output of each file separately as OUT+filename.
Something is wrong with this script I wrote: it doesn't take each file separately, and the whole thing doesn't work properly. It just generates an empty file named OUT+files.
The expected operation looks like this:
take file1, run scythe on it, write the output as OUTfile1
take file2, etc.
#!/bin/bash
FILES=/home/dave/raw/*
for f in $FILES
do
echo "Processing the $f file..."
/home/deve/scythe/scythe -a /home/dev/scythe/illumina_adapters.fa -o "OUT"+$f $f
done
Additionally, I noticed (testing on a single file) that the script uses only one core out of the 130 available. Is there any way to improve it?

There is no string concatenation operator in shell. Use juxtaposition instead; it's "OUT$f", not "OUT"+$f.
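Also note that $f expands to the full path (e.g. /home/dave/raw/file1), so "OUT$f" alone would name an invalid path like OUT/home/dave/raw/file1; strip the directory part with ${f##*/}. As for the cores: the loop runs a single scythe process at a time, which is why only one core is busy. You can run one job per file in parallel instead. A minimal sketch with GNU xargs -P and nproc (assuming the scythe paths really live under /home/dave; the question mixes dave, deve, and dev):
#!/bin/bash
# One scythe job per input file, run in parallel across the available cores.
printf '%s\0' /home/dave/raw/* |
  xargs -0 -n1 -P "$(nproc)" sh -c '
    f=$1
    echo "Processing the $f file..."
    /home/dave/scythe/scythe \
      -a /home/dave/scythe/illumina_adapters.fa \
      -o "/home/dave/raw/OUT${f##*/}" "$f"
  ' _
With 130 cores you may want to cap -P below $(nproc) if the disk can't keep up, since this workload is I/O-heavy as well.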

Related

Is it possible to display a file's contents and delete that file in the same command?

I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
For cases where:
it is possible to control the name of the temporary text file, and
the file is not used by other code,
it is possible to pass /dev/stdout as the name of the output file.
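For the AWS CLI case from the question, a minimal sketch (my-fn is a placeholder; be aware that aws lambda invoke prints its own response metadata to stdout as well, so the payload and the metadata may interleave):
# payload goes straight to stdout; no temp file to clean up
aws lambda invoke --function-name my-fn /dev/stdout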
Regarding portability, see the Stack Exchange question on how portable /dev/stdin, /dev/stdout and /dev/stderr are.
POSIX 7 says they are extensions. From the Base Definitions, Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example, /dev/stdin, /dev/stdout, and /dev/stderr)
Using the mandatorily supported /dev/tty instead would force output to the current terminal, making it impossible to pipe the output of the whole command into a different program (or a log file), or to use the program when no terminal is connected (cron jobs or other automation tools).
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient, as it would require removing characters from the beginning of the file each time you read a line. Current filesystems are pretty good at truncating data at the end of a file, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
head -1 output.json
sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d, it copies the whole content of the file except the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
In theory you could avoid this by doing something like this:
while [ -s output.json ]
do
line=$(head -1 output.json)
printf -- '%s\n' "$line"
# collapse the first line (its length plus the newline) from the start of the file
fallocate -c -o 0 -l $((${#line}+1)) output.json
done
But this does not account for varying newline conventions (namely DOS-formatted, CRLF line endings), and fallocate does not always work on xfs, among other issues.
Since you are trying to consume a file as it is being created, without leaving a trace of its existence on disk, you are essentially asking for pipe functionality. In my opinion, you should look into how your output.json file is produced; hopefully you can pipe it into a script of your own.
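If you cannot pass /dev/stdout, a named pipe (FIFO) gives the same effect: the data flows through without ever being stored. A sketch, where producer --output is a hypothetical stand-in for whatever currently writes output.json:
mkfifo /tmp/out.fifo                 # create the pipe
producer --output /tmp/out.fifo &    # hypothetical: point the writer at the FIFO
cat /tmp/out.fifo                    # displays data as it arrives; nothing persists
rm /tmp/out.fifo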

Using for in a Script, Ubuntu command line

How can I loop over each of my repository's files and do something with them?
For instance, I want to make a script:
#!/bin/bash
cd /myself
#for-loop that will select one by one all the files in /myself
#for each X file I will do this:
tar -czvf X.tar.gz /myself2
So a for loop in bash is similar to python's model (or maybe the other way around?).
The model goes "for instance in list":
for some_instance in "${MY_ARRAY[@]}"; do
echo "doing something with $some_instance"
done
To get a list of files in a directory, the quick and dirty way is to parse the output of ls and slurp it into an array, a-la array=($(ls))
To quickly explain what's going on here, to the best of my knowledge: assigning a space-delimited string surrounded with parens to a variable splits the string and turns it into a list (an array).
The downside of parsing ls is that it doesn't take into account files with spaces in their names. For that, I'll leave you with a link to turning a directory's contents into an array, the same place I lovingly :) ripped off the original array=($(ls -d */)) command.
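If you can rely on bash, a safer sketch that skips ls entirely: glob straight into the array, which keeps names with spaces intact (assuming the files live in /myself, as in the question):
# glob expansion preserves file names containing spaces
files=(/myself/*)
for f in "${files[@]}"; do
echo "doing something with $f"
done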
You can use a while loop instead, as it will take care of whole lines, including those with spaces:
#!/bin/bash
cd /myself
ls | while IFS= read -r f
do
tar -czvf "$f.tar.gz" "$f"
done
You can try this way also:
for i in /myself/*
do
tar -czvf "$i.tar.gz" /myself2
done

"read" command not executing in "while read line" loop [duplicate]

First post here! I really need help on this one; I looked the issue up on Google, but couldn't manage to find an answer useful to me. So here's the problem.
I'm having fun coding something like a framework in bash. Everyone can create their own module and add it to the framework. BUT. To know what arguments the script requires, I created an "args.conf" file that must be in every module, and that kinda looks like this:
LHOST;true;The IP the remote payload will connect to.
LPORT;true;The port the remote payload will connect to.
The first column is the argument name, the second defines if it's required or not, the third is the description. Anyway, long story short, the framework is supposed to read the args.conf file line by line to ask the user a value for every argument. Here's the piece of code:
info "Reading module $name argument list..."
while read line; do
echo $line > line.tmp
arg=`cut -d ";" -f 1 line.tmp`
requ=`cut -d ";" -f 2 line.tmp`
if [ $requ = "true" ]; then
echo "[This argument is required]"
else
echo "[This argument isn't required, leave a blank space if you don't wan't to use it]"
fi
read -p " $arg=" answer
echo $answer >> arglist.tmp
done < modules/$name/args.conf
tr '\n' ' ' < arglist.tmp > argline.tmp
argline=`cat argline.tmp`
info "Launching module $name..."
cd modules/$name
$interpreter $file $argline
cd ../..
rm arglist.tmp
rm argline.tmp
rm line.tmp
succes "Module $name execution completed."
As you can see, it's supposed to ask the user for a value for every argument... But:
1) The read command seems not to execute. It just gets skipped, and the argument gets no value.
2) Despite the fact that the args.conf file contains 3 lines, the loop seems to execute just a single time. All I see on the screen is "[This argument is required]" just once, and then the module launches (and crashes, because it doesn't have the required arguments...).
I really don't know what to do here... I hope someone has an answer ^^'.
Thanks in advance!
(and sorry for any mistakes, I'm French)
Alpha.
As @that other guy pointed out in a comment, the problem is that all of the read commands in the loop are reading from the args.conf file, not from the user. The way I'd handle this is by redirecting the conf file over a different file descriptor than stdin (fd #0); I like to use fd #3 for this:
while read -u3 line; do
...
done 3< modules/$name/args.conf
(Note: if your shell's read command doesn't understand the -u option, use read line <&3 instead.)
There are a number of other things in this script I'd recommend against:
Variable references without double-quotes around them, e.g. echo $line instead of echo "$line", and < modules/$name/args.conf instead of < "modules/$name/args.conf". Unquoted variable references get split into words (if they contain whitespace) and any wildcards that happen to match filenames will get replaced by a list of matching files. This can cause really weird and intermittent bugs. Unfortunately, your use of $argline depends on word splitting to separate multiple arguments; if you're using bash (not a generic POSIX shell) you can use arrays instead; I'll get to that.
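A quick sketch of that hazard, using a hypothetical file name with a space in it:
f='my file.txt'
ls $f      # unquoted: ls gets two arguments, "my" and "file.txt", and fails
ls "$f"    # quoted: one argument, works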
You're using relative file paths everywhere, and cding in the script. This tends to be fragile and confusing, since file paths are different at different places in the script, and any relative paths passed in by the user will become invalid the first time the script cds somewhere else. Worse, you aren't checking for errors when you cd, so if any cd fails for any reason, the entire rest of the script will run in the wrong place and fail bizarrely. You'd be far better off figuring out where your system's root directory is (as an absolute path), then referencing everything from it (e.g. < "$module_root/modules/$name/args.conf").
Actually, you're not checking for errors anywhere. It's generally a good idea, when writing any sort of program, to try to think of what can go wrong and how your program should respond (and also to expect that things you didn't think of will also go wrong). Some people like to use set -e to make their scripts exit if any simple command fails, but this doesn't always do what you'd expect. I prefer to explicitly test the exit status of the commands in my script, with something like:
command1 || {
echo 'command1 failed!' >&2
exit 1
}
if command2; then
echo 'command2 succeeded!' >&2
else
echo 'command2 failed!' >&2
exit 1
fi
You're creating temp files in the current directory, which risks random conflicts (with other runs of the script at the same time, any files that happen to have names you're using, etc). It's better to create a temp directory at the beginning, then store everything in it (again, by absolute path):
module_tmp="$(mktemp -d -t module-system.XXXXXX)" || {
echo "Error creating temp directory" >&2
exit 1
}
...
echo "$answer" >> "$module_tmp/arglist.tmp"
(BTW, note that I'm using $() instead of backticks. They're easier to read, and don't have some subtle syntactic oddities that backticks have. I recommend switching.)
Speaking of which, you're overusing temp files; a lot of what you're doing with them can be done just fine with shell variables and built-in shell features. For example, rather than reading lines from the config file, storing them in a temp file, and using cut to split them into fields, you can simply echo to cut:
arg="$(echo "$line" | cut -d ";" -f 1)"
...or better yet, use read's built-in ability to split fields based on whatever IFS is set to:
while IFS=";" read -u3 arg requ description; do
(Note that since the assignment to IFS is a prefix to the read command, it only affects that one command; changing IFS globally can have weird effects, and should be avoided whenever possible.)
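A quick illustration of that scoping, sketched with one of the lines from your args.conf:
line='LHOST;true;The IP the remote payload will connect to.'
IFS=";" read -r arg requ desc <<< "$line"
echo "$arg / $requ"    # prints: LHOST / true
# outside that read, IFS still has its previous value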
Similarly, storing the argument list in a file, converting newlines to spaces into another file, then reading that file... you can skip any or all of these steps. If you're using bash, store the arg list in an array:
arglist=()
while ...
arglist+=("$answer") # or ("$arg=$answer")? Not sure of your syntax.
done ...
"$module_root/modules/$name/$interpreter" "$file" "${arglist[#]}"
(That messy syntax, with the double-quotes, curly braces, square brackets, and at-sign, is the generally correct way to expand an array in bash).
If you can't count on bash extensions like arrays, you can at least do it the old messy way with a plain variable:
arglist=""
while ...
arglist="$arglist $answer" # or "$arglist $arg=$answer"? Not sure of your syntax.
done ...
"$module_root/modules/$name/$interpreter" "$file" $arglist
... but this runs the risk of arguments being word-split and/or expanded to lists of files.

storing output of ls command consisting of files with spaces in their names

I want to store the output of the ls command in a variable in my bash script and use each file name in a loop, but, for example, one file in the directory is named "Hello world". When I do variable=$(ls), "Hello" and "world" end up as two separate entries, and when I try to do
for i in $variable
do
mv $i ~
done
it shows an error that the files "Hello" and "world" don't exist.
Is there any way I can access all the files in the current directory and run some command on them, even if the files have space(s) in their names?
If you must, dirfiles=(/path/of/interest/*).
And accept the admonition against parsing the output of ls!
I understand you are new to this and I'd like to help. But it isn't easy for me (us?) to provide an answer that would be of much help, given the way you've stated your question.
Based on what I hear so far, you don't seem to have a basic understanding of how parameter expansion works in the shell. The following two links will be useful to you:
Matching Pathnames, Parameters
Now, if your task at hand is to operate on files meeting certain criteria, then find(1) will likely do the job.
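For instance, a minimal sketch matching the question's goal (move every regular file in the current directory to your home directory), which handles spaces safely:
# -exec passes each file name as a single argument; -- guards against leading dashes
find . -maxdepth 1 -type f -exec mv -- {} ~ \;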
Say it with me: don't parse the output of ls! For more information, see this post on Unix.SE.
A better way of doing this is:
for i in *
do
mv -- "$i" ~
done
or simply
mv -- * ~

unix - get substring of a file name

If I have a folder called myfiles/ which has a bunch of python files in it, in a shell script like the following:
for k in myfiles/*.py
do
# code here?
done
How do I print for each k a string that's just --name-of-file--.py ?
If I do
echo $k
as is, it prints myfiles/--name-of-file--.py
I'm very new to shell scripting, but it seems like the cut command attempts to cut the contents of the file and not just the file name (and I don't really know how to use cut).
To be clear, I'd like to know how to get rid of the folder name when printing.
basename "$k"
Or if you want to avoid spawning so many processes, this is more efficient:
echo "${k##*/}"
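Putting that together with the loop from the question, a minimal sketch (${k##*/} strips the longest prefix ending in /, i.e. the directory part):
for k in myfiles/*.py
do
echo "${k##*/}"    # prints just the file name, e.g. --name-of-file--.py
done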
