using grep to find and write specific lines with conditions - linux

I'm new to shell commands.
I need to write a fairly advanced search script and I have no idea how to do it.
I need to look for lines in my log files that contain "str1" and do not contain "str2", ordered by date, where the date is greater than or equal to some given date, and write the result to a file, with the process running in the background.
More formally, I want to do this:
new_process(write(from *log* where log_line contain "Str1" && log_line not contain "str2" order by Date having Date >= %date%) to read.txt)
thank you!

You may be making it harder than it needs to be. grep 'str1' "$logfile" | grep -v 'str2' will find all lines containing str1 and not str2. For the date comparison, touch a file whose modification time is set to the date you want to measure against. Then feed your list into a while read -r fname; do loop, test [ "$fname" -nt "testfilename" ], and write the names that satisfy that condition to a new file.
Your final script flow will look something like:
newname="${1:-newfile.txt}"          ## output file (default: newfile.txt)
touch -d "date string" testfilename  ## reference file carrying the cutoff date
grep "$str1" "$logfile" | grep -v "str2" | while read -r fname; do
    [ "$fname" -nt "testfilename" ] && printf "%s\n" "$fname" >> "$newname"
done
rm testfilename  ## clean up
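The question also asks for the process to run in the background; that part is just the shell's &. Assuming the script above is saved as, say, filter_logs.sh (a hypothetical name), you could run it detached from the terminal like this:
nohup ./filter_logs.sh > /dev/null 2>&1 &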
There are other ways to approach this, but this is a fairly standard approach. Let me know if you have further questions.

Related

bash/awk/unix detect changes in lines of csv files

I have a timestamp in this format:
(normal_file.csv)
timestamp
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
The dates are usually uniform; however, there are files with an irregular date pattern, such as this example:
(abnormal_file.csv)
timestamp
19/02/2002
19/02/2003
19/02/2005
19/02/2006
In my directory, there are hundreds of files like normal_file.csv and abnormal_file.csv.
I want to write a bash or awk script that detects the date pattern in all files of a directory. Abnormal files should be moved automatically to a new, separate directory (say, dir_different/).
Currently, I have tried the following:
#!/bin/bash
mkdir dir_different
for FILE in *.csv;
do
# pipe 1: detect the changes in the line
# pipe 2: print the timestamp column (first column, columns are comma-separated)
awk '$1 != prev {print ; prev = $1}' < $FILE | awk -F , '{print $1}'
done
If the timestamps in a given file are normal, then only a single timestamp will be printed; but for abnormal files, multiple dates will be printed.
I am not sure how to separate the abnormal files from the normal files, and I have tried the following:
for FILE in *.csv; do
output=$(awk 'FNR==3{print $0}' $FILE)
echo ${output}
if [[ ${output} =~ ([[:space:]]) ]]
then
mv $FILE dir_different/
fi
done
Or is there an easier method to detect changes in lines and separate files that have different lines? Thank you for any suggestions :)
Assuming that none of your "normal" CSV files end with extra blank lines, this should do the separation just fine:
#!/bin/bash
mkdir -p dir_different
for FILE in *.csv; do
    if awk '{a[$1]++}END{if(length(a)<=2){exit 1}}' "$FILE" ; then
        echo mv "$FILE" dir_different
    fi
done
After a dry-run just get rid of the echo :)
Edit:
{a[$1]++} This bit creates an array a indexed by the first field of each line; the entry for a value is incremented every time that same value is seen again.
END{if(length(a)<=2){exit 1}} This checks how many elements are in the array. If there are fewer than 3 (which should be the case when the date is always the same: 1 header line plus 1 distinct date), exit with status 1.
"$FILE" is part of the bash script, not awk; I quoted your variable out of habit. Should you ever have files with spaces in their names, you'll see why :)
So, a "normal" file contains only two different lines:
timestamp
dd/mm/yyyy
Testing if a file is normal is thus as simple as:
[ $(sort -u file.csv | wc -l) -eq 2 ]
This leads to the following possible solution:
#!/usr/bin/env bash
mkdir -p dir_different
for FILE in *.csv; do
    if [ $(sort -u "$FILE" | wc -l) -ne 2 ] ; then
        echo mv "$FILE" dir_different
    fi
done
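As a quick sanity check of the test itself, applied to the sample data from the question (assuming the two files exist under those names):
sort -u normal_file.csv | wc -l      # prints 2 (header + one distinct date)
sort -u abnormal_file.csv | wc -l    # prints 5 (header + four distinct dates)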

Rename files into numbers, starting with a specific number

I want to rename all files in a directory to be sequential numbers:
1.txt
2.txt
3.txt
and so on...
Here's the code I'm currently using:
ls | cat -n | while read n f; do mv "$f" "$n.txt"; done
The code does work, but I need to start with a specific number. For example, I may want to start with the number 49 instead of the number 1.
Is there any way to do this in terminal (on a Mac)?
You could use something like nl with the -v option to set a starting line number other than 1, but instead, you can just use Bash features:
i=1
for f in *; do
    [[ -f $f ]] && mv "$f" $((i++)).txt
done
where i is set to the initial value you want (49, in your example).
This also avoids parsing the output of ls, which is best avoided. Instead, I use a glob (*) and a test (-f) to make sure that I'm actually manipulating files and not directories.
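For completeness, the nl approach mentioned at the top of this answer would look something like the line below; the usual caveats about parsing ls apply (file names containing spaces would break the read):
ls | nl -v 49 | while read -r n f; do mv "$f" "$n.txt"; done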

How can we increment a string variable within a for loop

#! /bin/bash
for i in $(ls); do
    j=1
    echo "$i"
done
Current (not the expected) output:
autodeploy
bin
config
console-ext
edit.lok
I need the output to be a numbered directory list, like below:
1.)autodeploy
2.)bin
3.)config
4.)console-ext
5.)edit.lok
and if I give 2 as input, it should print "bin".
Per BashFAQ #1, a while read loop is the correct way to read content line-by-line:
#!/usr/bin/env bash
enumerate() {
    local line i
    i=0
    while IFS= read -r line; do
        ((++i))
        printf '%d.) %s\n' "$i" "$line"
    done
}
ls | enumerate
However, ls is not an appropriate tool for programmatic use; the above is acceptable if the results of ls are only for human consumption, but not if they're going to be parsed by a machine -- see Why you shouldn't parse the output of ls(1).
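If you do want the numbered listing without parsing ls, you can feed the enumerate function above the expansion of a glob instead, one name per line (this still assumes no file names contain newlines):
printf '%s\n' * | enumerate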
If you want to list files and let the user choose among them by number, pass the results of a glob expression to select:
select filename in *; do
echo "$filename" && break
done
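select prints a numbered menu (to standard error) and reads the user's choice, so entering 2 prints bin, matching your example. The prompt comes from the PS3 variable if you want to customize it:
PS3='Enter a number: '
select filename in *; do
    echo "$filename" && break
done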
I don't understand what you mean in your question by "Directory list", but following your example, you do not need to write a loop:
ls|nl -s '.)' -w 1
If you want to avoid ls, you can do the following (but be careful: this only works if the directory entries do not contain whitespace, because whitespace would make fmt break a name across two lines):
echo *|fmt -w 1 |nl -s '.)' -w 1
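A variant that sidesteps the fmt caveat, since printf already prints each name on its own line regardless of spaces (names containing newlines would still break it):
printf '%s\n' * | nl -s '.)' -w 1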

Searching a string using grep in a range of multiple files

Hope the title is self-explanatory, but I'll still try to be clearer about what I'm trying to do. I am looking for the string "Live message" within my log files. With a simple grep command, I can get this information from all the files inside a folder. The command I'm using is as follows:
grep "Live message" *
However, since I have log files going back to the middle of last year, is there a way to define a range for grep to search for this particular string? My log files are named as follows:
commLogs.log.2015-11-01
commLogs.log.2015-11-02
commLogs.log.2015-11-03
...
commLogs.log.2016-01-01
commLogs.log.2016-01-02
...
commLogs.log.2016-06-01
commLogs.log.2016-06-02
I would like to search for "Live message" within the 2016-01-01 to 2016-06-02 range. Writing out each file name would be very tedious, like this:
grep "Live message" commLogs.log.2016-01-01 commLogs.log.2016-01-02 commLogs.log.2016-01-03 ...
Is there a better way than this?
Thank you in advance for any help
ls * | sed -n "/2015-06-01/,/2016-06-03/p" | xargs grep "Live message"
ls * lists all the log files (better if sorted by date); it may be replaced with find -type f -name ...
sed -n "/<BEGIN_REGEX>/,/<END_REGEX>/p" keeps only the lines between BEGIN_REGEX and END_REGEX
xargs grep "Live message" passes all those files to grep
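Note that with a sed range of this form, both endpoint regexes should actually match a file name: if the begin pattern never matches, nothing is selected, and if the end pattern never matches, the range runs to the end of the input. Applied to the exact range from the question, it would look something like this (a sketch, assuming files exist for both endpoint dates):
ls commLogs.log.* | sed -n "/2016-01-01/,/2016-06-02/p" | xargs grep "Live message"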
You are fortunate that your dates are stored in YYYY-MM-DD fashion; it allows you to compare dates by comparing strings lexicographically:
for f in *; do
    d=${f#commLogs.log.}
    if [[ $d > 2016-01-00 && $d < 2016-06-03 ]]; then
        cat "$f"
    fi
done | grep "Live message"
This isn't ideal; it's a bit verbose and requires running cat multiple times. It can be improved by storing the file names in an array, which will work as long as the number of matches doesn't grow too big:
for f in *; do
    d=${f#commLogs.log.}
    if [[ $d > 2016-01-00 && $d < 2016-06-03 ]]; then
        files+=("$f")
    fi
done
grep "Live message" "${files[@]}"
Depending on the range, you may also be able to write a suitable glob pattern to match it directly, though this gets tricky: you can only pattern-match strings, not numeric ranges. For the range in question:
grep "Live message" commLogs.log.2016-0[1-5]-* commLogs.log.2016-06-0[1-2]

Extract part of a file name in bash

I have a folder with lots of files whose names follow a pattern: some strings followed by a date and time:
BOS_CRM_SUS_20130101_10-00-10.csv (3 strings before date)
SEL_DMD_20141224_10-00-11.csv (2 strings before date)
SEL_DMD_SOUS_20141224_10-00-10.csv (3 strings before date)
I want to loop through the folder and extract only the part before the date and output into a file.
Output
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
This is my script, but it is not working:
#!/bin/bash
# script variables
FOLDER=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/
LOG_FILE=/app/list/l088app5304d1/socles/Data/LEMREC/infa_shared/Shell/Check_Header_T24/log
echo "Starting the programme at: $(date)" >> $LOG_FILE
# Getting part of the file name from FOLDER
for file in `ls $FOLDER/*.csv`
do
mv "${file}" "${file/date +%Y%m%d HH:MM:SS}" 2>&1 | tee -a $LOG_FILE
done #> $LOG_FILE
Use sed with extended regex (-r) and groups to achieve this.
cat filelist | sed -r 's/(.*)[0-9]{8}_[0-9][0-9]-[0-9][0-9].[0-9][0-9].csv/\1/'
where filelist is a file with all the names you care about. Of course, this is just a placeholder because I don't know how you are going to list all eligible files. If a glob will do, for example, you can do
ls mydir/*.csv | sed -r 's/(.*)[0-9]{8}_[0-9][0-9]-[0-9][0-9].[0-9][0-9].csv/\1/'
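For example, fed the sample names from the question, it yields the wanted prefixes:
printf '%s\n' BOS_CRM_SUS_20130101_10-00-10.csv SEL_DMD_20141224_10-00-11.csv | sed -r 's/(.*)[0-9]{8}_[0-9][0-9]-[0-9][0-9].[0-9][0-9].csv/\1/'
BOS_CRM_SUS_
SEL_DMD_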
Assuming you won't have numbers in the first part, you could use:
$ for i in *csv;do str=$(echo $i|sed -r 's/[0-9]+.*//'); echo $str; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
Or with parameter substitution:
$ for i in *csv;do echo ${i%_*_*}_; done
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
When you use ${var/pattern/replace}, the pattern must be a filename-style glob, not a command to execute.
Instead of using the substitution operator, use the pattern removal operator
mv "${file}" "${file%_*-*-*.csv}.csv"
% finds the shortest match of the pattern at the end of the variable, so this pattern will just match the date and time part of the filename.
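For example, with one of the sample names from the question:
file=SEL_DMD_20141224_10-00-11.csv
echo "${file%_*-*-*.csv}.csv"    # prints SEL_DMD_20141224.csv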
The substitution:
"${file/date +%Y%m%d HH:MM:SS}"
is unlikely to do anything, because it doesn't execute date +%Y%m%d HH:MM:SS. It just treats it as a pattern to search for, and it's not going to be found.
If you did execute the command, though, you would get the current date and time, which is also (apparently) not what you find in the filename.
If that pattern is precise, then you can do the following:
echo "${file%????????_??-??-??.csv}" >> "$LOG_FILE"
using grep:
ls *.csv | grep -Po "^([A-Za-z]+_)+"
output:
BOS_CRM_SUS_
SEL_DMD_
SEL_DMD_SOUS_
