Count/Enumerate files in folder filtered by content - linux

I have a folder with lots of files with some data. Not every file has a complete data set.
The complete data sets all have a common string of the form 'yyyy-mm-dd' on the last line, so I thought I might filter with something like tail -n 1, but I have no idea how to do that.
Any idea how to do something like that in a simple script or bash command?

for f in *
do
    tail -n 1 "$f" |
        grep -qE '^[0-9]{4}-[01][0-9]-[0-3][0-9]$' &&
        echo "$f"
done
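If you want a count of the complete files rather than (or as well as) their names, the same loop can simply be piped into wc -l (a small sketch; it assumes the directory entries are regular files):

for f in *
do
    tail -n 1 "$f" | grep -qE '^[0-9]{4}-[01][0-9]-[0-3][0-9]$' && echo "$f"
done | wc -l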

Related

How to move files using the result as condition after grep command

I have 2 files whose names I need to grep for in a separate file.
The two files are in this directory /var/list
TB.1234.txt
TB.135325.txt
I have to grep for them in another file in another directory, /var/sup/. I used the command below:
for i in TB.*; do grep "$i" /var/sup/logs.txt; done
What I want to do is: if the result of the grep command contains the word "ERROR", the file that was found in /var/list should be moved to another directory, /var/last.
For example, when I grep the file name TB.1234.txt in /var/sup/logs.txt, the result looks like this:
ERROR: TB.1234.txt
TB.1234.txt should then be moved to /var/last.
Please help. I don't know how to construct the logic for moving the files; I'm stuck with what I provided above. I also tried using two greps in a for loop, but I ran into an error.
I am new to coding and really appreciate any help and suggestions. Thank you so much.
If you are asking how to move files which contain "ERROR", this should be extremely straightforward.
for file in TB.*; do
    grep -q 'ERROR' "$file" &&
        mv "$file" /var/last/
done
The notation this && that is a convenient shorthand for
if this; then
that
fi
The -q option to grep says to not print the matches, and quit as soon as you find one. Like all well-defined commands, grep sets its exit code to reflect whether it succeeded (the status is visible in $?, but usually you would not examine it directly; perhaps see also Why is testing "$?" to see if a command succeeded or not, an anti-pattern?)
Your question is rather unclear, but if you want to find either of the matching files in a third file, perhaps something like
awk 'FNR==1 && (++n < ARGC-1) { a[n] = FILENAME; nextfile }
/ERROR/ { for(j=1; j<=n; ++j) if ($0 ~ a[j]) b[a[j]]++ }
END { for(f in b) print f }' TB*.txt /var/sup/logs.txt |
xargs -r mv -t /var/last/
This is somewhat inefficient in that it will read all the lines in the log file, and brittle in that it will only handle file names which do not contain newlines. (The latter restriction is probably unimportant here, as you are looking for file names which occur on the same line as the string "ERROR" in the first place.)
In some more detail, the Awk script collects the wildcard matches into the array a, then processes all lines in the last file, looking for ones with "ERROR" in them. On these lines, it checks if any of the file names in a are also found, and if so, also adds them to b. When all lines have been processed, print the entries in b, which are then piped to a simple shell command to move them.
xargs is a neat command to read some arguments from standard input, and run another command with those arguments added to its command line. The -r option says to not run the other command if there are no arguments.
(mv -t is a GNU extension; it's convenient, but not crucial to have here. If you need portable code, you could replace xargs with a simple while read -r loop.)
The FNR==1 condition requires that the input files are non-empty.
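If you do need the portable variant, the final xargs stage could be rewritten as a plain while read loop, roughly like this (a sketch; it still assumes file names without embedded newlines):

awk '...same script as above...' TB*.txt /var/sup/logs.txt |
while IFS= read -r f; do
    mv "$f" /var/last/
done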
If the text file is small, or you expect a match near its beginning most of the time, perhaps just live with grepping it multiple times:
for file in TB.*; do
    grep -Eq "ERROR.*$file|$file.*ERROR" /var/sup/logs.txt &&
        mv "$file" /var/last/
done
Notice how we now need double quotes, not single, around the regular expression so that the variable $file gets substituted in the string.
grep has an -l switch, showing only the filename of the file which contains a pattern. It should not be too difficult to write something like (this is pseudocode, it won't work, it's just for giving you an idea):
if $(grep -l "ERROR" <directory> | wc -l) > 0
then foreach (f in $(grep -l "ERROR")
do cp f <destination>
end if
The wc -l is to check if there are any files which contain the word "ERROR". If not, nothing needs to be done.
Edit after Tripleee's comment:
My proposal can be simplified as:
if grep -lq "ERROR" TB.*;
then foreach (f in $(grep -l "ERROR")
do cp f <destination>
end if
Edit after Tripleee's second comment:
This is even shorter:
for f in $(grep -l "ERROR" TB.*);
do cp "$f" destination;
done
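Note that $(grep -l ...) word-splits the output, so file names containing spaces would break; a slightly more robust sketch pipes grep -l into a while read loop instead (it still assumes no newlines in file names):

grep -l "ERROR" TB.* |
while IFS= read -r f; do
    cp "$f" destination
done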

script to move files based on extension criteria

I have a number of files that always share the same base name but have different extensions, for example sample.dat, sample.txt, etc.
I would like to create a script that looks for where sample.dat is present and then moves all files named sample*.* into another directory.
I know how to identify them with ls *.dat | sed 's/\(.*\)\..*/\1/'; however, I would like to combine that with something like || mv (the result of the first part) *.* /otherdirectory/
You can use this bash one-liner:
for f in `ls | grep YOUR_PATTERN`; do mv ${f} NEW_DESTINATION_DIRECTORY/${f}; done
It iterates through the result of the operation ls | grep, which is the list of your files you wish to move, and then it moves each file to the new destination.
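If the pattern can be expressed as a shell glob, a variant that avoids parsing ls output (and so copes better with unusual file names) could look like this; YOUR_GLOB and NEW_DESTINATION_DIRECTORY are placeholders:

for f in YOUR_GLOB; do
    mv -- "$f" NEW_DESTINATION_DIRECTORY/
done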
Something simple like this?
dat_roots=$(ls *.dat | sed 's/\.dat$//')
for i in $dat_roots; do
echo mv ${i}*.* other-directory
done
This will break for file names containing spaces, so be careful.
Or if spaces are an issue, this will do the job, but is less readable.
ls *.dat | sed 's/\.dat$//' | while read root; do
mv "${root}"*.* other-directory
done
Not tested, but this should do the job:
shopt -s nullglob
for f in *.dat
do
    mv "${f%.dat}".* other-directory
done
Setting the nullglob option ensures that the loop is not executed if no .dat file exists. If you use this code as part of a larger script, you might want to unset it afterwards (shopt -u nullglob).
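To see what nullglob changes, compare how an unmatched pattern expands with and without it (a quick illustration, assuming no .dat files exist in the current directory):

shopt -s nullglob
echo *.dat    # expands to nothing, so the loop body never runs
shopt -u nullglob
echo *.dat    # prints the literal string *.dat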

Searching a string using grep in a range of multiple files

Hope title is self-explanatory but I'll still try to be more clear on what I'm trying to do. I am looking for a string "Live message" within my log files. By using simple grep command, I can get this information from all the files inside a folder. The command I'm using is as follows,
grep "Live message" *
However, since I have log files going back to the middle of last year, is there a way to define a range for grep to search for this particular string? My log files appear as follows,
commLogs.log.2015-11-01
commLogs.log.2015-11-02
commLogs.log.2015-11-03
...
commLogs.log.2016-01-01
commLogs.log.2016-01-02
...
commLogs.log.2016-06-01
commLogs.log.2016-06-02
I would like to search for "Live message" within the 2016-01-01 - 2016-06-02 range; writing out each file name would be very hard and tedious, like this:
grep "Live message" commLogs.log.2016-01-01 commLogs.log.2016-01-02 commLogs.log.2016-01-03 ...
Is there a better way than this?
Thank you in advance for any help
ls | sed -n "/2016-01-01/,/2016-06-02/p" | xargs grep "Live message"
ls lists all the log files (their names sort by date because of the YYYY-MM-DD suffix); it could be replaced with find -type f -name ...
sed -n "/<BEGIN_REGEX>/,/<END_REGEX>/p" prints only the lines between (and including) the first line matching BEGIN_REGEX and the next line matching END_REGEX
xargs grep "Live message" passes all those file names to grep
You are fortunate that your dates are stored in YYYY-MM-DD fashion; it allows you to compare dates by comparing strings lexicographically:
for f in *; do
    d=${f#commLogs.log.}
    if [[ $d > 2016-01-00 && $d < 2016-06-03 ]]; then
        cat "$f"
    fi
done | grep "Live message"
This isn't ideal; it's a bit verbose, and requires running cat multiple times. It can be improved by storing file names in an array, which will work as long as the number of matches doesn't grow too big:
for f in *; do
    d=${f#commLogs.log.}
    if [[ $d > 2016-01-00 && $d < 2016-06-03 ]]; then
        files+=("$f")
    fi
done
grep "Live message" "${files[@]}"
Depending on the range, you may be able to write a suitable pattern to match the range, but it gets tricky since you can only pattern match strings, not numeric ranges.
grep "Live message" commLogs.log.2016-0[1-5]-* commLogs.log.2016-06-0[1-2]

Using linux sort on multiple files

Is there a way I can run the following command with Linux for many files at once?
$ sort -nr -k 2 file1 > file2
I assume you have many input files, and you want to create a sorted version of each of them. I would do this using something like
for f in file*
do
    sort "$f" > "$f.sort"
done
Now, this has the small problem that if you run it again, it will not only sort all the files again, it will also create file1.sort.sort to go with file1.sort. There are various ways to fix that. We can fix the second problem by creating sorted files that don't have names beginning with "file":
for f in file*
do
    sort "$f" > "sorted.$f"
done
But that's kind of weird, and I wouldn't want files named like that. Alternatively, we could use a slightly more clever script that checks whether the file needs sorting, and avoids both problems:
for f in file*
do
    if expr "$f" : '.*\.sort' > /dev/null
    then
        : no need to sort
    elif test -e "$f.sort"
    then
        : already sorted
    else
        sort -nr -k 2 "$f" > "$f.sort"
    fi
done
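A roughly equivalent version using case instead of expr, if you find that more readable (a sketch, not tested):

for f in file*; do
    case $f in
        *.sort) continue ;;          # skip the .sort output files themselves
    esac
    [ -e "$f.sort" ] && continue     # a sorted copy already exists
    sort -nr -k 2 "$f" > "$f.sort"
done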

Paste files from list of paths into single output file

I have a file containing a list of filenames and their paths, as in the example below:
$ cat ./filelist.txt
/trunk/data/9.20.txt
/trunk/data/9.30.txt
/trunk/data/50.3.txt
/trunk/data/55.100.txt
...
All of these files, named as X.Y.txt, contain a list of double values. For example:
$ cat ./9.20.txt
1.23
1.0e-6
...
I'm trying to paste all of these X.Y.txt files into a single file, but I'm not sure about how to do it. Here's what I've been able to do so far:
cat ./filelist.txt | xargs paste output.txt >> output.txt
Any ideas on how to do it properly?
You could simply cat-append each file into your output file, as in:
$ cat <list_of_paths> | xargs -I {} cat {} >> output.txt
In the above command, each line from your input file will be taken by xargs, and will be used to replace {}, so that each actual command being run is:
$ cat <X.Y.txt> >> output.txt
If all you're looking to do is to read each line from filelist.txt and append the contents of the file that the line refers to to a single output file, use this:
while read -r file; do
[[ -f "$file" ]] && cat "$file"
done < "filelist.txt" > "output.txt"
Edit: If you know your input file to only contain lines that are file paths (and optionally empty lines) - and no comments, etc. - @Rubens' xargs-based solution is the simplest.
The advantage of the while loop is that you can pre-process each line from the input file, as demonstrated by the -f test above, which ensures that the input line refers to an existing file.
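For example, a hypothetical extension that also skips blank lines and # comments in filelist.txt could look like this:

while IFS= read -r file; do
    [[ -z "$file" || "$file" == \#* ]] && continue   # skip blank lines and # comments (hypothetical input format)
    [[ -f "$file" ]] && cat "$file"
done < "filelist.txt" > "output.txt"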
More complex but without argument length limit
Well, the limit here is the available computer memory.
The file buffer.txt must not already exist (or must be empty) when you start.
touch buffer.txt
cat filelist.txt | xargs -iXX bash -c 'paste buffer.txt XX > output.txt; mv output.txt buffer.txt';
mv buffer.txt output.txt
What this does, by line:
Create a buffer.txt file which must be initially empty. (paste does not seem to like non-existent files. There does not seem to be a way to make it treat such files as empty.)
Run paste buffer.txt XX > output.txt; mv output.txt buffer.txt. XX is replaced by each file in the filelist.txt file. You can't just do paste buffer.txt XX > buffer.txt because buffer.txt will be truncated before paste processes it. Hence the mv rigmarole.
Move buffer.txt to output.txt so that you get your output with the file name you wanted. Also makes it safe to rerun the whole process.
The previous version forced xargs to issue exactly one paste per file you want to paste but for even better performance, you can do this:
touch buffer.txt;
cat filelist.txt | xargs bash -c 'paste buffer.txt "$@" > output.txt; mv output.txt buffer.txt' FILLER;
mv buffer.txt output.txt
Note the presence of "$@" in the command that bash executes. So paste gets the list of arguments from the list of arguments given to bash. The FILLER parameter passed to bash is to give it a value for $0. If it were not there, then the first file that xargs gives to bash would be used for $0 and thus paste would skip some files.
This way, xargs can pass hundreds of parameters to paste with each invocation and thus reduce dramatically the number of times paste is invoked.
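You can see the role of the FILLER argument with a stand-alone bash -c call (a quick illustration):

bash -c 'echo "$0 received: $@"' FILLER a.txt b.txt
# prints: FILLER received: a.txt b.txt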
Simpler but limited way
This method suffers from limitations on the number of arguments that a shell can pass to a command it executes. However, in many cases it is good enough. I can't count the number of times I've done spur-of-the-moment operations where using xargs would have been superfluous. (As part of a long-term solution, that's another matter.)
The simpler way is:
paste `cat filelist.txt` > output.txt
It seems you were thinking that xargs would execute paste output.txt >> output.txt multiple times but that's not how it works. The redirection applies to the entire cat ./filelist.txt | xargs paste output.txt (as you initially had it). If you want to have redirection apply to the individual commands launched by xargs you have it launch a shell, like I do above.
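To make the difference concrete (an illustration, not part of the original commands):

# The >> here applies to xargs as a whole, so every paste writes to one file:
cat filelist.txt | xargs paste >> output.txt
# To give each command launched by xargs its own redirection, launch a shell:
cat filelist.txt | xargs bash -c 'paste "$@" >> output.txt' FILLER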
#!/usr/bin/env bash
set -x
while read -r
do
    cat "${REPLY}" >> output.txt
done < filelist.txt
Or, to get the files directly:
#!/usr/bin/env bash
set -x
find . -type f -name '*.txt' | while read -r file
do
    cat "${file}" >> output.txt
done
A simple while loop should do the trick:
while read -r line; do
    cat "${line}" >> output.txt
done < filelist.txt
