Validating file records shell script - linux

I have a file with content as follows and want to validate it:
1. The file has entries of the form rec$NUM, and each rec$NUM should appear exactly 7 times.
For example, for rec1.any_attribute, rec1 should occur only 7 times in the whole file.
2. I need a script that performs this validation.
If the number of records for rec$NUM is less than or greater than 7, the script should report that record.
Please help.
Thanks in advance... :)

Simple awk:
awk -F: '/^rec/{a[$1]++}END{for(t in a){if(a[t]!=7){print "Some error for record: " t}}}' test.rc

grep -c '^rec1[^0-9]' file.txt
grep -c '^rec2[^0-9]' file.txt
grep -c '^rec3[^0-9]' file.txt
Each of the above should print 7. (The [^0-9] keeps rec1 from also matching rec10, rec11 and so on.)

The command:
grep '^rec' file2.txt | cut -d':' -f1 | uniq -c | egrep -v '^ *7 '
prints nothing if the file follows your rules, and prints the offending records with their counts if it doesn't.
(Replace "uniq -c" with "sort | uniq -c" if the record numbers can be mixed.)
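The checks above can be wrapped in a small reusable function. This is a minimal sketch rather than a tested production script: the name check_record_counts is made up for illustration, and it assumes every record line starts with rec followed by digits (as in rec1.any_attribute), whatever separator comes after.

```shell
#!/bin/sh
# check_record_counts FILE [EXPECTED]
# Hypothetical helper: prints "<record> <count>" for every recN prefix
# whose number of lines differs from EXPECTED (default 7).
# Returns 1 if at least one record is off, 0 otherwise.
check_record_counts() {
    awk -v expected="${2:-7}" '
        # Take the leading "rec<digits>" part of the line as the record key.
        match($0, /^rec[0-9]+/) { count[substr($0, RSTART, RLENGTH)]++ }
        END {
            for (r in count)
                if (count[r] != expected) { print r, count[r]; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}
```

Because the function returns a non-zero status when a record is off, it can be used directly in a condition, for example: if ! check_record_counts test.rc; then echo "validation failed"; fi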


Output of wc -l without file-extension

I've got the following line:
wc -l ./*.txt | sort -rn
I want to cut off the file extension. With this code I get the output:
number filename.txt
for all my .txt files in the current directory. But I want the output without the file extension, like this:
number filename
I tried piping to cut with different parameters, but all I managed was to cut off the whole filename with this command:
wc -l ./*.txt | sort -rn | cut -f 1 -d '.'
Assuming you don't have newlines in your filename you can use sed to strip out ending .txt:
wc -l ./*.txt | sort -rn | sed 's/\.txt$//'
Unfortunately, cut doesn't have a syntax for extracting columns by an index counted from the end. One (somewhat clunky) trick is to use rev to reverse each line, apply cut to it and then rev it back:
wc -l ./*.txt | sort -rn | rev | cut -d'.' -f2- | rev
Using sed in a more generic way to cut off whatever extension the files have:
$ wc -l *.txt | sort -rn | sed 's/\.[^\.]*$//'
14 total
8 woc
3 456_base
3 123_base
0 empty_base
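An alternative to post-processing the wc output is to count each file separately and strip the suffix with shell parameter expansion. The following is a minimal sketch, assuming only .txt files are of interest; the function name count_lines_no_ext is hypothetical. Unlike wc -l ./*.txt, it prints no "total" line.

```shell
#!/bin/sh
# count_lines_no_ext DIR
# Hypothetical helper: prints "<line count> <file name without .txt>"
# for every .txt file in DIR, largest count first.
count_lines_no_ext() {
    for f in "$1"/*.txt; do
        [ -f "$f" ] || continue          # skip the unexpanded glob if nothing matches
        name=${f##*/}                    # drop the directory part
        printf '%d %s\n' "$(wc -l < "$f")" "${name%.txt}"   # drop the .txt suffix
    done | sort -rn
}
```

It would be called as count_lines_no_ext . for the current directory.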
A better approach is to use the file type reported by the file command rather than the extension (what is the extension of tar.gz or other multi-part extensions?):
for file; do
    # "for file" with no list iterates over the script's arguments
    case $(file -b "$file") in
        *ASCII*) echo "this is ascii" ;;
        *PDF*) echo "this is pdf" ;;
        *) echo "other cases" ;;
    esac
done
This is a POC, not tested; feel free to adapt/improve/modify.

Find duplicate entries in a text file using shell

I am trying to find duplicate *.sh entries mentioned in a text file (test.log) and delete them using a shell program. Since the paths differ, uniq -u always prints the duplicate entry even though there are two entries in the text file.
cat test.log
I tried a couple of ways using a few commands but have no idea how to get the above output.
rev test.log | cut -f1 -d/ | rev | sort | uniq -d
Any clue on this?
You can use awk for this by splitting fields on / and using $NF (last field) in an associative array:
awk -F/ '!seen[$NF]++' test.log
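To make the idiom above concrete: !seen[$NF]++ is true only the first time a given last field is seen, so it keeps the first path for each script name, while the same expression without the negation prints only the later repeats. Below is a minimal sketch with two hypothetical wrapper functions; their names are not from the original answer.

```shell
#!/bin/sh
# Keep only the first line for each file name (last "/"-separated field).
dedupe_by_basename() {
    awk -F/ '!seen[$NF]++' "$@"
}

# Print only the lines whose file name has already appeared earlier,
# i.e. the entries that dedupe_by_basename would drop.
list_duplicate_entries() {
    awk -F/ 'seen[$NF]++' "$@"
}
```

Running list_duplicate_entries test.log first is a cheap way to review what would be removed before rewriting the file with the output of dedupe_by_basename test.log.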
awk shines for these kinds of tasks, but here is a non-awk solution:
$ sed 's|.*/|& |' file | sort -k2 -u | sed 's|/ |/|'
or, if your paths are balanced (the same number of parent directories for all files):
$ sort -t/ -k5 -u file
awk '!/my_shellprog\/test\/first/' file

grepping using the result of previous grep

Is there a way to perform a grep based on the results of a previous grep, rather than just piping multiple greps into each other. For example, say I have the log file output below:
ID 1000 xyz occured
ID 1001 misc content
ID 1000 misc content
ID 1000 status code: 26348931276572174
ID 1000 misc content
ID 1001 misc content
To begin with, I'd like to grep the whole log file to see if "xyz occured" is present. If it is, I'd like to get the ID number of that event and grep through all the lines in the file with that ID number looking for the status code.
I'd imagined that I could use xargs or something like that but I can't seem to get it to work.
grep "xyz occured" file.log | awk '{ print $2 }' | xargs grep "status code" | awk '{print $NF}'
Any ideas on how to actually do this?
A general answer for grep-ing the grep-ed output:
grep 'pattern1' *.txt | grep 'pattern2'
Notice that the second grep is not pointing at a file.
You're almost there. But while xargs can sometimes be used to do what you want (depending on how the next command takes its arguments), you aren't actually using it to grep for the ID you just extracted. What you need to do is take the output of the first grep (containing the ID code) and use that in the next grep's expression. Something like:
grep "^ID `grep 'xyz occured' file.log | awk '{print $2}'` status code" file.log
Obviously another option would be to write a script to do this in one pass, a-la Ed's suggestion.
Yet another way:
for x in `grep "xyz occured" file.log | cut -d' ' -f2`
do
    grep "$x" file.log
done
The thing I like about this method is that, if you wanted to, you could write the output to a separate file for each ID:
grep "$x" file.log >> /var/tmp/$x.out
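The loop idea extends naturally to several matching IDs. The following is a minimal sketch, assuming the log layout from the question ("ID <number> <message>"); the function name status_codes_for_event is hypothetical.

```shell
#!/bin/sh
# status_codes_for_event EVENT LOGFILE
# Hypothetical helper: for every ID that logged EVENT, print
# "<id> <status code>" taken from that ID's "status code" lines.
status_codes_for_event() {
    # First pass: collect the distinct IDs of lines containing the event text.
    grep -F -- "$1" "$2" | awk '{ print $2 }' | sort -u |
    while read -r id; do
        # Second pass: look only at this ID's status code lines.
        grep "^ID $id status code" "$2" |
            awk -v id="$id" '{ print id, $NF }'
    done
}
```

For the sample log in the question, status_codes_for_event 'xyz occured' file.log would report the status code logged under ID 1000.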
This is all about retrieving files within a narrowed search scope. In your case the search scope is determined by file content.
I have run into this problem most often when reducing the search scope through several successive searches (applying filters to the previous grep results).
Trying to give a general answer:
Generate a list of files from the result of the first grep:
grep pattern *.txt | awk -F':' '{print $1}'
Then run the second grep on that list of files:
xargs grep -i pattern
Apply this cascading filter as many times as you need, adding awk to keep only the filenames and xargs to pass the filenames to grep -i.
For example:
grep 'pattern1' *.txt | awk -F':' '{print $1}' | xargs grep -i 'pattern2'
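A variant of the same cascade lets grep print the file names itself with -l, so no awk step is needed to cut them out. This is a minimal sketch, assuming the file names contain no whitespace (xargs splits on it); the function name files_matching_both is hypothetical.

```shell
#!/bin/sh
# files_matching_both PATTERN1 PATTERN2 FILE...
# Hypothetical helper: prints the names of the files that contain
# PATTERN1 and also contain PATTERN2 (the second match ignores case).
files_matching_both() {
    first=$1
    second=$2
    shift 2
    # Stage 1 narrows the scope to files matching the first pattern;
    # stage 2 searches only inside those files.
    grep -l -- "$first" "$@" | xargs grep -il -- "$second"
}
```

Further stages can be chained the same way by appending another | xargs grep -l 'pattern3'.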
Just use awk:
awk '{info[$2] = info[$2] $0 ORS} /xyz occured/{ids[$2]} END{ for (id in ids) printf "%s",info[id]}' file.log
awk '/status code/{code[$2]=$NF} /xyz occured/{ids[$2]} END{ for (id in ids) print code[id]}' file.log
depending on what you really want to output. Some expected output in your question would help.
Grep the result of a previous Grep:
Given this file contents:
ID 1000 xyz occured
ID 1001 misc content
ID 1000 misc content
ID 1000 status code: 26348931276572174
ID 1000 misc content
ID 1001 misc content
This command:
grep "xyz" file.log | awk '{ print $2 }' > f.log; grep `cat f.log` file.log;
returns this:
ID 1000 xyz occured
ID 1000 misc content
ID 1000 status code: 26348931276572174
ID 1000 misc content
It looks for "xyz" in file.log and places the matching ID in f.log, then greps for that ID in file.log. If the first grep returns multiple ID numbers, the second grep searches only for the first ID and treats the others as file names, so it errors out on them.

Grep - returning both the line number and the name of the file

I have a number of log files in a directory. I am trying to write a script that searches all the log files for a string and echoes the name of each file and the line number on which the string is found.
I figure I will probably have to use two greps, piping the output of one into the other, since the -l option only returns the name of the file and nothing about the line numbers. Any insight into how I can achieve this would be much appreciated.
Many thanks,
$ grep -Hn root /etc/passwd
Combining -H and -n does what you expect.
If you want to echo the required information without the matching text:
$ grep -Hn root /etc/passwd | cut -d: -f1,2
or with awk :
$ awk -F: '/root/{print "file=" ARGV[1] "\nline=" NR}' /etc/passwd
If you want to create shell variables (piping the output to bash would set them only in that child shell, so eval is used here):
$ eval "$(awk -F: '/root/{print "file=" ARGV[1] "\nline=" NR}' /etc/passwd)"
$ echo $line
$ echo $file
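The awk variant above handles a single file. For several log files, awk's built-in FILENAME and FNR variables give the current file and the line number within it. This is a minimal sketch, assuming the search string is a plain regular expression without backslashes; the function name locate_matches is hypothetical.

```shell
#!/bin/sh
# locate_matches PATTERN FILE...
# Hypothetical helper: prints "<file>:<line number>" for every line
# matching PATTERN, without the matching text itself.
locate_matches() {
    pat=$1
    shift
    # FNR restarts at 1 for each input file, unlike NR.
    awk -v pat="$pat" '$0 ~ pat { print FILENAME ":" FNR }' "$@"
}
```

For example, locate_matches root /etc/passwd /etc/group lists every file and line that mentions root.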
Use -H. If you are using a grep that does not have -H, specify two filenames. For example:
grep -n pattern file /dev/null
My version of grep kept returning text from the matching line, which I wasn't sure you were after. You can also pipe the output to an awk command to have it print ONLY the file name and line number (-r makes grep search the files under the current directory):
grep -rHn "text" . | awk -F: '{print $1 ":" $2}'

Sorting in bash

I have been trying to get the unique values in each column of a tab delimited file in bash. So, I used the following command.
cut -f <column_number> <filename> | sort | uniq -c
It works fine and I can get the unique values in a column and its count like
105 Linux
55 MacOS
500 Windows
What I want to do is, instead of sorting by the column values (which in this example are OS names), sort them by count and possibly have the count in the second column of the output. So it would have to look like:
Windows 500
Linux 105
MacOS 55
How do I do this?
cut -f <col_num> <filename> |
    sort |
    uniq -c |
    sort -r -k1 -n |
    awk '{print $2" "$1}'
The sort -r -k1 -n sorts in reverse order, using the first field as a numeric value. The awk simply reverses the order of the columns. You can test the added pipeline commands thus (with nicer formatting):
pax> echo '105 Linux
55 MacOS
500 Windows' | sort -r -k1 -n | awk '{printf "%-10s %5d\n",$2,$1}'
Windows      500
Linux        105
MacOS         55
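The whole pipeline can be collected into one function. The following is a minimal sketch, assuming a tab-delimited input file; the function name count_column is hypothetical. It differs from the one-liner above in one respect: the count is stripped with sub() instead of swapping $1 and $2, so values that contain spaces stay intact.

```shell
#!/bin/sh
# count_column COLUMN FILE
# Hypothetical helper: prints "<value><TAB><count>" for each distinct
# value in the given column of a tab-delimited file, highest count first.
count_column() {
    cut -f "$1" "$2" | sort | uniq -c | sort -k1,1nr |
    awk '{
        n = $1                      # the count that uniq -c put in front
        sub(/^ *[0-9]+ /, "")       # remove padding and count, keep the value
        print $0 "\t" n
    }'
}
```

It would be called as count_column 3 data.tsv, where data.tsv stands for the tab-delimited file.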
cut -f <column_number> <filename> | sort | uniq -c | awk '{ print $2" "$1}' | sort -k2 -nr
This swaps the column order (awk) and then sorts the output numerically by the count, which is now the second column, in descending order.
Hope this helps.
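Using sed with tagged (grouped) regular expressions:
cut -f <column_number> <filename> | sort | uniq -c | sort -r -k1 -n | sed 's/^ *\([0-9][0-9]*\) \(.*\)$/\2 \1/'
The leading ' *' skips the padding that uniq -c puts in front of the count. This doesn't produce output in a neat format, though.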
