Linux: filtering file names using the grep command

I have a series of files, for example:
ABC_DDS_20150212_CD.csv
ABC_DDS_20150210_20150212_CD.csv
ABC_DFG_20150212_20150217_CD.csv
I want to apply a grep command in Linux so I can extract the first 2 files, but not the 3rd file, from the output of ls.
I tried the following:
grep -l "" *20150212* -- exclude *20150212_201502*

You can pipe into grep -v:
grep -l "" *20150212* | grep -v "20150212_201502"
I haven't seen people use grep -l like that though. Using ls seems like a cleaner solution:
ls -l *20150212* | grep -v "20150212_201502"
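As an aside, the --exclude option the question was reaching for does exist in GNU grep; it takes a glob, so (assuming GNU grep) this also works:
grep -l "" --exclude="*20150212_201502*" *20150212*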

Since you mentioned the first two files of the ls output, we can use head as well, like below:
ls -l *your_search_name* | head -2

You can extract your matches using the command below:
grep -l "" *20150212*
Output:
ABC_DDS_20150210_20150212_CD.csv
ABC_DDS_20150212_CD.csv
ABC_DFG_20150212_20150217_CD.csv
and then to get the first 2 lines, you can use the head command:
grep -l "" *20150212* | head -2
Output:
ABC_DDS_20150210_20150212_CD.csv
ABC_DDS_20150212_CD.csv
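Given the sample names, you could even skip grep entirely: both wanted files end in 20150212_CD.csv and the third does not, so an end-anchored glob picks out exactly the first two (this relies on that naming pattern holding for your real files):
ls *20150212_CD.csv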

Related

Loop to filter out lines from apache log files

I have several Apache access log files that I would like to clean up a bit before I analyze them. I am trying to use grep in the following way:
grep -v term_to_grep apache_access_log
I have several terms that I want to grep, so I am piping every grep action as follows:
grep -v term_to_grep_1 apache_access_log | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > apache_access_log_cleaned
Up to this point my rudimentary script works as expected! But I have many Apache access logs, and I don't want to do that for every file. I have started to write a Bash script, but so far I couldn't make it work. This is my attempt:
for logs in ./access_logs/*;
do
cat $logs | grep -v term_to_grep | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > $logs_clean
done;
Could anyone point me out what I am doing wrong?
If you have a variable and you append _clean to its name, that's a new variable, and not the value of the old one with _clean appended. To fix that, use curly braces:
$ var=file.log
$ echo "<$var>"
<file.log>
$ echo "<$var_clean>"
<>
$ echo "<${var}_clean>"
<file.log_clean>
Without it, your pipeline tries to redirect to the empty string, which results in an error. Note that "$logs"_clean would also work.
As for your pipeline, you could combine that into a single grep command:
grep -Ev 'term_to_grep|term_to_grep_2|term_to_grep_3|term_to_grep_n' "$logs" > "${logs}_clean"
No cat needed, only a single invocation of grep.
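Putting that back into the loop from the question, a minimal sketch (the term_to_grep_* placeholders are kept from the question):
for logs in ./access_logs/*; do
    grep -Ev 'term_to_grep_1|term_to_grep_2|term_to_grep_3|term_to_grep_n' "$logs" > "${logs}_clean"
done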
Or you could stick all your terms into a file:
$ cat excludes
term_to_grep_1
term_to_grep_2
term_to_grep_3
term_to_grep_n
and then use the -f option:
grep -vf excludes "$logs" > "${logs}_clean"
If your terms are strings and not regular expressions, you might be able to speed this up by using -F ("fixed strings"):
grep -vFf excludes "$logs" > "${logs}_clean"
I think GNU grep checks that for you on its own, though.
You are looping over several files, but in your loop you constantly overwrite your result file, so it will only contain the last result from the last file.
You don't need a loop; use this instead (-h suppresses the file-name prefixes grep adds when given multiple files):
egrep -vh 'term_to_grep|term_to_grep_2|term_to_grep_3' ./access_logs/* > all_logs_clean
Note, it is always helpful to start a Bash script with set -eEuCo pipefail. This catches most common errors -- it would have stopped with an error when you tried to clobber the $logs_clean file.
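For example, a header like this at the top of the script would have caught the bug immediately (a sketch; -u makes expanding the unset $logs_clean a fatal error, and -C refuses to clobber existing files):
#!/bin/bash
set -eEuCo pipefail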

How to return substring from a linux command

I'm connecting to an Exadata machine and want to get information about the ORACLE_HOME variable inside it. So I'm using this command:
ls -l /proc/<pid>/cwd
This is the output:
2 oracle oinstall 0 Jan 23 21:20 /proc/<pid>/cwd -> /u01/app/database/11.2.0/dbs/
I need to get the last part:
/u01/app/database/11.2.0 (I don't want the "/dbs/" there)
I will be using this command several times on different machines. So how can I get this substring from the whole output?
Awk and grep are good for these types of issues.
Updated answer:
ls -l /proc/<pid>/cwd | awk '{print $NF}' | sed 's#/dbs/##'
Original answer:
ls -l /proc/<pid>/cwd | awk '{print $NF}' | egrep -o '^.+[.0-9]'
Awk prints the last column of the input (your ls output), and then grep grabs the beginning of that string up to the last occurrence of numbers and dots. This is a situational solution and perhaps not the best.
Parsing the output of ls is generally considered sub-optimal. I would use something more like this instead:
dirname "$(readlink -f /proc/<pid>/cwd)"
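Since you will be running this on several machines, you could wrap it in a small function (a sketch; oracle_home is a made-up name here, and the pid is passed as the first argument):
oracle_home() {
    # resolve the cwd symlink, then strip the trailing /dbs component
    dirname "$(readlink -f "/proc/$1/cwd")"
}
oracle_home "$pid"  # $pid holds the Oracle process id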

first two results from ls command

I am using ls -l -t to get a list of files in a directory ordered by time.
I would like to limit the search result to the top 2 files in the list.
Is this possible?
I've tried with grep and I struggled.
You can pipe it into head:
ls -l -t | head -3
This will give you the top 3 lines (2 files plus the total line).
This will give you just the first two file lines, skipping the total line:
ls -l -t | tail -n +2 | head -2
tail strips the first line, then head outputs the next 2 lines.
To avoid dealing with the top output line, you can reverse the sort and take the last two lines:
ls -ltr | tail -2
This is pretty safe, but depending what you'll do with those two file entries after you find them, you should read Parsing ls on the problems with using ls to get files and file information.
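If you do need the two newest files in a script-safe way, one alternative (assuming GNU find and filenames without embedded newlines) is to let find print a sortable timestamp instead of parsing ls:
find . -maxdepth 1 -type f -printf '%T@ %p\n' | sort -rn | head -2 | cut -d' ' -f2-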
Or you could try just this:
ls -1 -t | head -2
With -1 (one entry per line) and no -l, ls prints no total line, so there is nothing to skip.
You can use the head command to grab only the first two lines of output; dropping -l avoids the total line:
ls -t | head -2
You have to pipe through head.
ls -l -t | head -n 3
will output the first two results after the total line.
Try this:
ls -td -- * | head -n 2

Omitting the first line from any Linux command output

I have a requirement where I'd like to omit the first line from the output of ls -latr "some path", since I need to remove the total 136 line from the output below.
So I wrote ls -latr /home/kjatin1/DT_901_linux//autoInclude/system | tail -q, which excluded the first line, but when the folder is empty it does not omit it. Please tell me how to omit the first line of any Linux command's output.
The tail program can do this:
ls -lart | tail -n +2
The -n +2 means “start passing through on the second line of output”.
Pipe it to awk:
awk '{if(NR>1)print}'
or sed
sed -n '1!p'
ls -lart | tail -n +2  # the +2 argument means: start with line 2
This is a quick hacky way: ls -lart | grep -v '^total'.
Basically, remove any lines that start with "total", which in ls output should only be the first line.
A more general way (for anything):
ls -lart | sed "1 d"
sed "1 d" means only print everything but first line.
You can use the awk command:
For command output, pipe into it: | awk 'NR>1'
For a file: awk 'NR>1' file.csv

Linux shell scripting kiddie's question

A Unix shell script whose only purpose is to count the number of running qmail processes (could be anything else). An easy thing, but there must be some bug in the code:
#!/bin/bash
rows=`ps aux | grep qmail | wc -l`
echo $rows
Because
echo $rows
always shows a greater number of rows (11) than if I just count the rows of
ps aux | grep qmail
by hand, where there are just 8 rows. Does it work this way on your system too?
Nowadays on Linux there is pgrep. If you have it on your system, you can skip the grep -v grep step:
$ var=$(pgrep bash) # or `pgrep bash | wc -l`
$ echo $var
2110 2127 2144 2161 2178 2195 2212 2229
$ set -- $var; echo ${#}
8
Also, if your ps command has the -C option, here is another way:
$ ps -C bash -o pid= | wc -l
If not, you can use a character class in your grep pattern (quoting it so the shell cannot glob-expand the brackets):
$ ps aux | grep '[q]mail' | wc -l
It appears that you're counting the grep process itself, since its own command line contains the string qmail.
I'd suggest something more like:
qprocs=$(ps auxwww | grep -c "[q]mail")
... note that GNU grep has a "-c" switch to have it print a "count" of matches rather than the lines themselves. The trick with the regular expression here is to match qmail without matching the literal string that's on the grep command line. So we take any single character in the string and wrap it in square brackets such that it is a single character "class." The regexp: [q]mail matches the string qmail without matching the string [q]mail.
Note that even with this regex you may still find some false positive matches. If you really want to be more precise then you should supply a custom output format string to your ps command (see the man pages) or you should feed it through a pipemill or you should parse the output of the ps command based on fields (using awk or cut or a while read loop). (The -o option to ps is by far the easiest among these).
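For example, matching on the exact command-name field rather than the whole ps line (a sketch; note that comm is truncated to 15 characters on Linux):
ps -eo comm= | grep -cx qmail
Here comm= prints each process's command name with no header line, and grep -cx counts only lines that are exactly qmail.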
No, since I'm not running qmail. However, you will want to, at a bare minimum, exclude the process running your grep:
ps aux | grep qmail | grep -v grep
For debugging, you may want to do:
rows=`ps aux | grep qmail`
echo "$rows" > debug.input
od -xcb debug.input
(to see your input to the script in great detail; note that $rows must be quoted to preserve its newlines) and then rewrite your script temporarily as:
#!/bin/bash
rows=`cat debug.input | wc -l`
echo $rows
That way, you can see the input and figure out what effect it's having on your code, even as you debug it.
A good debugger eventually learns to change only one variable at a time. If you're changing your code to get it working, that's the variable; don't let the input to your code change as well.
Use
$ /sbin/pidof qmail
A few ways...
ps -e | grep ' [q]mail' | wc -l
ps -C qmail -opid= | wc -l
pidof qmail | tr ' ' '\n' | wc -l
pgrep is on many Linux distributions, and I imagine available for other Unices.
[dan@khorium ~]$ whatis pgrep
pgrep (1) - look up or signal processes based on name and other attributes
[dan@khorium ~]$ pgrep mingetty
1920
1921
1922
1923
1924
In your case, pgrep qmail | wc -l should do the trick.
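If your pgrep supports the -c option (procps-ng on Linux and the BSDs do), you can drop the wc as well:
pgrep -c qmail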
