Alternative to ls in shell-script compatible with nohup - linux

I have a shell-script which lists all the file names in a directory and store them in a new file.
The problem is that when I execute this script with the nohup command, it lists the first name four times instead of listing the correct names.
Commenting the problem with other programmers they think that the problem may be the ls command.
Part of my code is the following:
for i in $( ls -1 ./Datasets/); do
awk '{print $1}' ./genes.txt | head -$num_lineas | tail -1 >> ./aux
let num_lineas=$num_lineas-1
done
Do you know an alternative to ls that works well with nohup?
Thanks.

Don't use ls to feed the loop, use:
for i in ./Datasets/*; do
or if subdirectories are of interest
for i in ./Datasets/*/*; do
Lastly, and more correctly, use find if you need the entire tree below Datasets:
find ./Datasets -type f | while IFS= read -r file; do
(do stuff with $file)
done
Others frown, but there is nothing wrong with also using find as:
for file in $(find ./Datasets -type f); do
(do stuff with $file)
done
Just choose the syntax that most closely meets your needs.

First of all, don't parse ls! A simple glob will suffice. Secondly, your awk | head | tail chain can be simplified by only printing the first column of the line that you're interested in using awk. Thirdly, you can redirect the output of your loop to a file, rather than using >>.
Incorporating all of those changes into your script:
for i in Datasets/*; do
awk -v n="$(( num_lineas-- ))" 'NR==n{print $1}' genes.txt
done > aux
Every time the loop goes round, the value of $num_lineas will decrease by 1.
In terms of your problem with nohup, I would recommend looking into using something like screen, which is known to be a better solution for maintaining a session between logins.

Related

How to move files using the result as condition after grep command

I have 2 files that I needed to grep in a separate file.
The two files are in this directory /var/list
TB.1234.txt
TB.135325.txt
I have to grep them in another file in another directory which is in /var/sup/. I used the command below:
for i in TB.*; do grep "$i" /var/sup/logs.txt; done
what I want to do is, if the result of the grep command contains the word "ERROR" the files which is found in /var/list will be moved to another directory /var/last.
for example I grep this file TB.1234.txt to /var/sup/logs.txt then the result is like this:
ERROR: TB.1234.txt
TB.1234.txt will be move to /var/last.
please help. I don't know how to construct the logic on how to move the files, I'm stuck in that I provided, I am also trying to use two greps in a for loop but I am encountering an error.
I am new in coding and really appreciates any help and suggestions. Thank you so much.
If you are asking how to move files which contain "ERROR", this should be extremely straightforward.
for file in TB.*; do
grep -q 'ERROR' "$file" &&
mv "$file" /var/last/
done
The notation this && that is a convenient shorthand for
if this; then
that
fi
The -q option to grep says to not print the matches, and quit as soon as you find one. Like all well-defined commands, grep sets its exit code to reflect whether it succeeded (the status is visible in $?, but usually you would not examine it directly; perhaps see also Why is testing ”$?” to see if a command succeeded or not, an anti-pattern?)
Your question is rather unclear, but if you want to find either of the matching files in a third file, perhaps something like
awk 'FNR==1 && (++n < ARGC-1) { a[n] = FILENAME; nextfile }
/ERROR/ { for(j=1; j<=n; ++j) if ($0 ~ a[j]) b[a[j]]++ }
END { for(f in b) print f }' TB*.txt /var/sup/logs.txt |
xargs -r mv -t /var/last/
This is somewhat inefficient in that it will read all the lines in the log file, and brittle in that it will only handle file names which do not contain newlines. (The latter restriction is probably unimportant here, as you are looking for file names which occur on the same line as the string "ERROR" in the first place.)
In some more detail, the Awk script collects the wildcard matches into the array a, then processes all lines in the last file, looking for ones with "ERROR" in them. On these lines, it checks if any of the file names in a are also found, and if so, also adds them to b. When all lines have been processed, print the entries in b, which are then piped to a simple shell command to move them.
xargs is a neat command to read some arguments from standard input, and run another command with those arguments added to its command line. The -r option says to not run the other command if there are no arguments.
(mv -t is a GNU extension; it's convenient, but not crucial to have here. If you need portable code, you could replace xargs with a simple while read -r loop.)
The FNR==1 condition requires that the input files are non-empty.
If the text file is small, or you expect a match near its beginning most of the time, perhaps just live with grepping it multiple times:
for file in TB.*; do
grep -Eq "ERROR.*$file|$file.*ERROR" /var/sup/logs.txt &&
mv "$file" /var/last/
done
Notice how we now need double quotes, not single, around the regular expression so that the variable $file gets substituted in the string.
grep has an -l switch, showing only the filename of the file which contains a pattern. It should not be too difficult to write something like (this is pseudocode, it won't work, it's just for giving you an idea):
if $(grep -l "ERROR" <directory> | wc -l) > 0
then foreach (f in $(grep -l "ERROR")
do cp f <destination>
end if
The wc -l is to check if there are any files which contain the word "ERROR". If not, nothing needs to be done.
Edit after Tripleee's comment:
My proposal can be simplified as:
if grep -lq "ERROR" TB.*;
then foreach (f in $(grep -l "ERROR")
do cp f <destination>
end if
Edit after Tripleee's second comment:
This is even shorter:
for f in $(grep -l "ERROR" TB.*);
do cp "$f" destination;
done

Get last line from grep search on multiple files, and write them in a output file

I have multiple files located in multiple directories. From them I search a keyword 'ENERGY' by grep. In each file I get multiple match cases. I want to take the last line from each file and save the results in the output.txt file. I wrote the following code:
labl=SubDir
ENERGY=`grep 'ENERGY' MyDir*${labl}*/*.txt`
cat > output.txt << EOF
${ENERGY}
EOF
This code saves all match cases from each file. But as mentioned, I need the last match case from each file. For that I modified the grep command as:
ENERGY=`grep 'ENERGY' MyDir*${labl}*/*.txt|taile -l`
Unfortunately this doesn't do the job either. Instead, it saves all the match cases from the last file only.
How to solve it?
Please don't run multiple processes/pipes to achieve this.
gawk '/ENERGY/{last=$0} ENDFILE{if(last!="") print last; last=""}' MyDir*"$labl"*/*.txt
/ENERGY/{last=$0}: On lines which match the regex ENERGY, set variable last to the contents of the entire line $0
ENDFILE{...} Run this {action} at the end of every input file supplied by the glob.
if(last!="") print last: print last if it's not null
last="": reset this variable to null, avoiding duplication
MyDir*"${labl}"*/*.txt: Quoted variable in glob will match directory names that include spaces
Use a for loop:
for f in MyDir*"$lab1"*/*.txt; do
grep ENERGY "$f" | tail -1 >> output.txt
done
Yet one but probably not last possible approach is to use parallel like this. Probably you can achieve the same with xargs, but I personally prefer parallel as simpler and giving the possibility to scale your process.
ls -1 file* | parallel -j1 "grep ENERGY {} | tail -n 1" > output.txt

listing file in unix and saving the output in a variable(Oldest File fetching for a particular extension)

This might be a very simple thing for a shell scripting programmer but am pretty new to it. I was trying to execute the below command in a shell script and save the output into a variable
inputfile=$(ls -ltr *.{PDF,pdf} | head -1 | awk '{print $9}')
The command works fine when I fire it from terminal but fails when executed through a shell script (sh). Why is that the command fails, does it mean that shell script doesn't support the command or am I doing it wrong? Also how do I know if a command will work in shell or not?
Just to give you a glimpse of my requirement, I was trying to get the oldest file from a particular directory (I also want to make sure upper case and lower case extensions are handled). Is there any other way to do this ?
The above command will work correctly only if BOTH *.pdf and *.PDF files are in the directory you are currently.
If you would like to execute it in a directory with only one of those you should consider using e.g.:
inputfiles=$(find . -maxdepth 1 -type f \( -name "*.pdf" -or -name "*.PDF" \) | xargs ls -1tr | head -1 )
NOTE: The above command doesn't work with files with new lines, or with long list of found files.
Parsing ls is always a bad idea. You need another strategy.
How about you make a function that gives you the oldest file among the ones given as argument? the following works in Bash (adapt to your needs):
get_oldest_file() {
# get oldest file among files given as parameters
# return is in variable get_oldest_file_ret
local oldest f
for f do
[[ -e $f ]] && [[ ! $oldest || $f -ot $oldest ]] && oldest=$f
done
get_oldest_file_ret=$oldest
}
Then just call as:
get_oldest_file *.{PDF,pdf}
echo "oldest file is: $get_oldest_file_ret"
Now, you probably don't want to use brace expansions like this at all. In fact, you very likely want to use the shell options nocaseglob and nullglob:
shopt -s nocaseglob nullglob
get_oldest_file *.pdf
echo "oldest file is: $get_oldest_file_ret"
If you're using a POSIX shell, it's going to be a bit trickier to have the equivalent of nullglob and nocaseglob.
Is perl an option? It's ubiquitous on Unix.
I would suggest:
perl -e 'print ((sort { -M $b <=> -M $a } glob ( "*.{pdf,PDF}" ))[0]);';
Which:
uses glob to fetch all files matching the pattern.
sort, using -M which is relative modification time. (in days).
fetches the first element ([0]) off the sort.
Prints that.
As #gniourf_gniourf says, parsing ls is a bad idea. Such as leaving unquoted globs, and generally not counting for funny characters in file names.
find is your friend:
#!/bin/sh
get_oldest_pdf() {
#
# echo path of oldest *.pdf (case-insensitive) file in current directory
#
find . -maxdepth 1 -mindepth 1 -iname "*.pdf" -printf '%T# %p\n' \
| sort -n \
| tail -1 \
| cut -d\ -f1-
}
whatever=$(get_oldest_pdf)
Notes:
find has numerous ways of formatting the output, including
things like access time and/or write time. I used '%T# %p\n',
where %T# is last write time in UNIX time format incl.fractal part.
This will never containt space so it's safe to use as separator.
Numeric sort and tail get the last item, sorting by the time,
cut removes the time from the output.
I used IMO much easier to read/maintain pipe notation, with help of \.
the code should run on any POSIX shell,
You could easily adjust the function to parametrize the pattern,
time used (access/write), control the search depth or starting dir.

General solution for bypassing file headers in shell commands

I make extensive use of piping multiple linux shell commands, for example:
grep BLAH file1 | sed 's/old/new/' | sort -k 1,1 > file3
My files often have a header line, and often I have to preserve it throughout the pipeline. So, for example, I would want to grep, sed and sort from line 2 and on, while keeping the 1st line unchanged.
I am looking for some general solution that given some command(s) would preserve the header. I usually write the header to a file before the pipe and then cat it back after the pipe ends. I have started using zshell, so I was wondering if that might help to get a more streamlined solution.
Perhaps something like this:
(arrows are pipes in the image)
but I am not sure how to get that to work in zshell or if it is even possible. One problem is that I need to follow up the first pipe split with a command on both pipes.
Any creative solutions?
Vaughn and devnull have already directed you towards the solution. They both contain typos though and I have some remarks to add and would advise to use this instead:
{ head -n 1 file1; tail -n +2 file1 | grep BLAH | sed 's/old/new/' | sort -k 1,1; } >file3
What it does is take the first line of file1 in one command (your header) and does your grep/sed/whatever magic in a second command on the rest of the file (sans the header, tail -n +2) and redirects the combined output to file3.
Notes:
If your shell supports { } it is preferred over the ( ) construct in this case as it does not spawn a subshell (sometimes it is desirable to have the subshell, though).
head -2 is deprecated, you should use the -n parameter like head -n 2
You can skip the tail -n +2 file1 part if you absolutely know that what you are grepping for cannot be found in your header, but it is certainly cleaner this way.
This should work in most recent shells, btw (bash, ksh, zsh).

How to tell how many files match description with * in unix

Pretty simple question: say I have a set of files:
a1.txt
a2.txt
a3.txt
b1.txt
And I use the following command:
ls a*.txt
It will return:
a1.txt a2.txt a3.txt
Is there a way in a bash script to tell how many results will be returned when using the * pattern. In the above example if I were to use a*.txt the answer should be 3 and if I used *1.txt the answer should be 2.
Comment on using ls:
I see all the other answers attempt this by parsing the output of
ls. This is very unpredictable because this breaks when you have
file names with "unusual characters" (e.g. spaces).
Another pitfall would be, it is ls implementation dependent. A
particular implementation might format output differently.
There is a very nice discussion on the pitfalls of parsing ls output on the bash wiki maintained by Greg Wooledge.
Solution using bash arrays
For the above reasons, using bash syntax would be the more reliable option. You can use a glob to populate a bash array with all the matching file names. Then you can ask bash the length of the array to get the number of matches. The following snippet should work.
files=(a*.txt) && echo "${#files[#]}"
To save the number of matches in a variable, you can do:
files=(a*.txt)
count="${#files[#]}"
One more advantage of this method is you now also have the matching files in an array which you can iterate over.
Note: Although I keep repeating bash syntax above, I believe the above solution applies to all sh-family of shells.
You can't know ahead of time, but you can count how many results are returned. I.e.
ls -l *.txt | wc -l
ls -l will display the directory entries matching the specified wildcard, wc -l will give you the count.
You can save the value of this command in a shell variable with either
num=$(ls * | wc -l)
or
num=`ls -l *.txt | wc -l`
and then use $num to access it. The first form is preferred.
You can use ls in combination with wc:
ls a*.txt | wc -l
The ls command lists the matching files one per line, and wc -l counts the number of lines.
I like suvayu's answer, but there's no need to use an array:
count() { echo $#; }
count *
In order to count files that might have unpredictable names, e.g. containing new-lines, non-printable characters etc., I would use the -print0 option of find and awk with RS='\0':
num=$(find . -maxdepth 1 -print0 | awk -v RS='\0' 'END { print NR }')
Adjust the options to find to refine the count, e.g. if the criteria is files starting with a lower-case a with .txt extension in the current directory, use:
find . -type f -name 'a*.txt' -maxdepth 1 -print0

Resources