Searching a string using grep in a range of multiple files - linux

I hope the title is self-explanatory, but I'll still try to be clearer about what I'm trying to do. I am looking for the string "Live message" within my log files. Using a simple grep command, I can get this information from all the files inside a folder. The command I'm using is as follows:
grep "Live message" *
However, since I have log files going back to the middle of last year, is there a way to define a range with grep when searching for this particular string? My log files appear as follows:
commLogs.log.2015-11-01
commLogs.log.2015-11-02
commLogs.log.2015-11-03
...
commLogs.log.2016-01-01
commLogs.log.2016-01-02
...
commLogs.log.2016-06-01
commLogs.log.2016-06-02
I would like to search for "Live message" within the 2016-01-01 to 2016-06-02 range. Writing out each file name would be very hard and tedious, like this:
grep "Live message" commLogs.log.2016-01-01 commLogs.log.2016-01-02 commLogs.log.2016-01-03 ...
Is there a better way than this?
Thank you in advance for any help

ls * | sed -n "/2016-01-01/,/2016-06-02/p" | xargs grep "Live message"
ls * lists all the log files (ideally already sorted by date); it could be replaced with find -type f -name ..., as sketched below.
sed -n "/<BEGIN_REGEX>/,/<END_REGEX>/p" prints only the lines between BEGIN_REGEX and END_REGEX.
xargs grep "Live message" passes all of those files on to grep.

You are fortunate that your dates are stored in YYYY-MM-DD fashion; it allows you to compare dates by comparing strings lexicographically:
for f in *; do
    d=${f#commLogs.log.}
    # the bounds are padded by one day so that 2016-01-01 and 2016-06-02 are both included
    if [[ $d > 2016-01-00 && $d < 2016-06-03 ]]; then
        cat "$f"
    fi
done | grep "Live message"
This isn't ideal; it's a bit verbose, and requires running cat multiple times. It can be improved by storing file names in an array, which will work as long as the number of matches doesn't grow too big:
for f in *; do
    d=${f#commLogs.log.}
    if [[ $d > 2016-01-00 && $d < 2016-06-03 ]]; then
        files+=("$f")
    fi
done
grep "Live message" "${files[@]}"
Depending on the range, you may be able to write a suitable pattern to match the range, but it gets tricky since you can only pattern match strings, not numeric ranges.
grep "Live message" commLogs.log.2016-0[1-5]-* commLogs.log.2016-06-0[1-2]

Related

Bash script that counts and prints out the files that start with a specific letter

How do I print out all the files in the current directory that start with the letter "k"? I also need to count these files.
I tried some methods but only got errors or wrong output. I'm really stuck on this as a newbie in bash.
Try this Shellcheck-clean pure POSIX shell code:
count=0
for file in k*; do
    if [ -f "$file" ]; then
        printf '%s\n' "$file"
        count=$((count+1))
    fi
done
printf 'count=%d\n' "$count"
It works correctly (just prints count=0) when run in a directory that contains nothing starting with 'k'.
It doesn't count directories or other non-files (e.g. fifos).
It counts symlinks to files, but not broken symlinks or symlinks to non-files.
It works with 'bash' and 'dash', and should work with any POSIX-compliant shell.
Here is a pure Bash solution.
files=(k*)
printf "%s\n" "${files[@]}"
echo "${#files[@]} files total"
The shell expands the wildcard k* into the array, thus populating it with a list of matching files. We then print out the array's elements, and their count.
The use of an array avoids the various problems with metacharacters in file names (see e.g. https://mywiki.wooledge.org/BashFAQ/020), though the syntax is slightly hard on the eyes.
As remarked by pjh, this will include any matching directories in the count, and fail in odd ways if there are no matches (unless you enable the nullglob option). If avoiding directories is important, you basically have to exclude them while building the array; a sketch follows.
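A sketch of that approach (assumes bash; it filters inline rather than keeping a separate array of directories):
shopt -s nullglob        # an empty match yields an empty array instead of a literal k*
files=()
for f in k*; do
    [[ -f $f ]] && files+=("$f")    # keep regular files only
done
(( ${#files[@]} )) && printf '%s\n' "${files[@]}"
echo "${#files[@]} files total"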
To repeat what Dominique also said, avoid parsing ls output.
Demo of this and various other candidate solutions:
https://ideone.com/XxwTxB
To start with: never parse the output of the ls command, but use find instead.
Since find by default descends into all subdirectories, you may want to limit that with the -maxdepth switch, using the value 1.
To count the results, you just count the number of lines in the output, which works because find prints one result per line. Counting lines is done with the wc -l command.
So, this comes down to the following command:
find ./ -maxdepth 1 -type f -name "k*" | wc -l
Have fun!
This should work as well:
VAR="k"
COUNT=$(ls -p ${VAR}* | grep -v ":" | wc -w)
echo -e "Total number of files: ${COUNT}\n" 1>&2
echo -e "Files,that begin with ${VAR} are:\n$(ls -p ${VAR}* | grep -v ":" )" 1>&2

using grep to find and write specific lines with conditions

I'm new to shell commands.
I need to write an advanced search script and I have no idea how to do it.
I need to look for lines in my computer's log files that contain "str1" and do not contain "str2", ordered by a date that is greater than or equal to some given date, and write the result to a file, with the process running in the background.
In a more formal way, I want to do this:
new_process(write(from *log* where log_line contain "Str1" && log_line not contain "str2" order by Date having Date >= %date%) to read.txt)
thank you!
You may be making it harder than it needs to be. grep 'str1' "$logfile" | grep -v 'str2' will find all lines containing str1 and not str2. For the given date, you will want to touch a file whose modification time is set to the date you want to measure against. You then feed your list into a while read -r fname; do loop, test [ "$fname" -nt "testfilename" ], and write the files that satisfy that condition to a new file.
Your final script flow will look something like:
newname="${1:-newfile.txt}"
touch -d "date string" testfilename
grep "$str1" "$logfile" | grep -v "str2" | while read -r fname; do
[ "$fname" -nt "testfilename" ] && printf "%s\n" "$fname" > "$newname"
done
rm testfilename ## clean up
There are other ways to approach this, but this is a fairly standard approach. Let me know if you have further questions.
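One such alternative, as a sketch (assumes GNU find for -newermt; the *.log glob and the $date variable holding the cutoff are placeholders): select only log files modified after the start of $date, keep lines containing str1 but not str2, write them to read.txt, and run the whole thing in the background.
( find . -name '*.log' -newermt "$date" -exec grep -h "str1" {} + |
    grep -v "str2" > read.txt ) &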

Extract part of a file name in bash

I have a folder with lots of files having a pattern, which is some string followed by a date and time:
BOS_CRM_SUS_20130101_10-00-10.csv (3 strings before date)
SEL_DMD_20141224_10-00-11.csv (2 strings before date)
SEL_DMD_SOUS_20141224_10-00-10.csv (3 strings before date)
I want to loop through the folder and extract only the part before the date and output it to a file.
Output
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
This is my script, but it is not working:
#!/bin/bash
# script variables
FOLDER=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/
LOG_FILE=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/log
echo "Starting the programme at: $(date)" >> $LOG_FILE
# Getting part of the file name from FOLDER
for file in `ls $FOLDER/*.csv`
do
    mv "${file}" "${file/date +%Y%m%d HH:MM:SS}" 2>&1 | tee -a $LOG_FILE
done #> $LOG_FILE
Use sed with extended-regex and groups to achieve this.
cat filelist | sed -r 's/(.*)[0-9]{8}_[0-9][0-9]-[0-9][0-9].[0-9][0-9].csv/\1/'
where filelist is a file with all the names you care about. Of course, this is just a placeholder because I don't know how you are going to list all eligible files. If a glob will do, for example, you can do
ls mydir/*.csv | sed -r 's/(.*)[0-9]{8}_[0-9][0-9]-[0-9][0-9].[0-9][0-9].csv/\1/'
Assuming you won't have numbers in the first part, you could use:
$ for i in *csv;do str=$(echo $i|sed -r 's/[0-9]+.*//'); echo $str; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
Or with parameter substitution:
$ for i in *csv;do echo ${i%_*_*}_; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
When you use ${var/pattern/replace}, the pattern must be a filename glob, not a command to execute.
Instead of using the substitution operator, use the pattern removal operator
mv "${file}" "${file%_*-*-*.csv}.csv"
% removes the shortest match of the pattern from the end of the variable, so this pattern matches just the _HH-MM-SS.csv time suffix of the filename (the date is left in place); see the illustration below.
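A concrete illustration (a sketch) on one of the sample names, with the longest-match variant %% shown for contrast:
file=BOS_CRM_SUS_20130101_10-00-10.csv
echo "${file%_*-*-*.csv}.csv"    # BOS_CRM_SUS_20130101.csv  (shortest suffix removed)
echo "${file%%_*-*-*.csv}"       # BOS                       (longest suffix removed)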
The substitution:
"${file/date +%Y%m%d HH:MM:SS}"
is unlikely to do anything, because it doesn't execute date +%Y%m%d HH:MM:SS. It just treats it as a pattern to search for, and it's not going to be found.
If you did execute the command, though, you would get the current date and time, which is also (apparently) not what you find in the filename.
If that pattern is precise, then you can do the following:
echo "${file%????????_??-??-??.csv}" >> "$LOG_FILE"
using grep:
ls *.csv | grep -Po "^([A-Za-z]+_)+"
output:
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_

Operating on multiple results from find command in bash

Hi, I'm a novice Linux user. I'm trying to use the find command in bash to search through a given directory tree, which contains multiple files of the same name but with varying content, to find the maximum value within those files.
Initially I wasn't taking the directory as input and knew the files wouldn't be less than two directories deep, so I was using nested loops as follows:
prev_value=0
for i in <directory_name> ; do
    if [ -d "$i" ]; then
        cd $i
        for j in "$i"/* ; do
            if [ -d "$j" ]; then
                cd $j
                curr_value=`grep "<keyword>" <filename>.txt | cut -c32-33` #gets value I'm comparing
                if [ $curr_value -lt $prev_value ]; then
                    curr_value=$prev_value
                else
                    prev_value=$curr_value
                fi
            fi
        done
    fi
done
echo $prev_value
Obviously that's not going to cut it now. I've looked into the -exec option of find but since find is producing a vast amount of results I'm just not sure how to handle the variable assignment and comparisons. Any help would be appreciated, thanks.
find "${DIRECTORY}" -name "${FILENAME}.txt" -print0 | xargs -0 -L 1 grep "${KEYWORD}" | cut -c32-33 | sort -nr | head -n1
We find the filenames that are named FILENAME.txt (FILENAME is a bash variable) that exist under DIRECTORY.
We print them all out, separated by nulls (this avoids any problems with certain characters in directory or file names).
Then we read them all back in using xargs, and pass the null-separated (-0) values as arguments to grep, launching one grep per file name (-n 1). (I do that to avoid grep printing the filenames, which would screw up cut.)
Then we sort all the results, numerically (-n), in descending order (-r).
Finally, we take the first line (head -n1) of the sorted numbers - which will be the maximum.
P.S. If you have 4 CPU cores you can try adding the -P 4 option to xargs to try to make the grep part of it run faster.
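A hypothetical end-to-end invocation (the variable values below are made up; -P requires an xargs that supports it, such as GNU or BSD xargs):
DIRECTORY=./results FILENAME=summary KEYWORD=Energy   # hypothetical values
find "${DIRECTORY}" -name "${FILENAME}.txt" -print0 |
    xargs -0 -n 1 -P 4 grep "${KEYWORD}" | cut -c32-33 | sort -nr | head -n1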

Count/Enumerate files in folder filtered by content

I have a folder with lots of files with some data. Not every file has a complete data set.
The complete data sets all have a common string of the form 'yyyy-mm-dd' on the last line, so I thought I might filter with something like tail -n 1, but I have no idea how to do that.
Any idea how to do something like that in a simple script or bash command?
for f in *
do
    tail -n 1 "$f" |
    grep -qE '^[0-9]{4}-[01][0-9]-[0-3][0-9]$' &&
    echo "$f"
done
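To get the count rather than the listing, the same loop can feed wc -l (a sketch):
for f in *
do
    tail -n 1 "$f" |
    grep -qE '^[0-9]{4}-[01][0-9]-[0-3][0-9]$' &&
    echo "$f"
done | wc -l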