Search multiple strings from a file in a specific column of multiple files and output the count (Unix shell scripting)

I have searched extensively on the internet about this but haven't found many details.
Problem description:
I am using an AIX server.
I have a pattern.txt file that contains customer_id for 100 customers in the following sample format:
160471231
765082023
75635713
797649756
8011688321
803056646
I have a directory (/home/aswin/temp) with several files (1.txt, 2.txt, 3.txt and so on) which are pipe(|) delimited. Sample format:
797649756|1001|123270361|797649756|O|2017-09-04 23:59:59|10|123769473
803056646|1001|123345418|1237330|O|1999-02-13 00:00:00|4|1235092
64600123|1001|123885297|1239127|O|2001-08-19 00:00:00|10|1233872
75635713|1001|123644701|75635713|C|2006-11-30 00:00:00|11|12355753
424346821|1001|123471924|12329388|O|1988-05-04 00:00:00|15|123351096
427253285|1001|123179704|12358099|C|2012-05-10 18:00:00|7|12352893
What I need to do is search for every string from pattern.txt in the first column of each file in the directory, and list each filename with its number of matching rows. If the same row matches more than once, it should be counted as 1.
So the output should be something like (only the matches in first column should count):
1.txt:4
2.txt:3
3.txt:2
4.txt:5
What I have done till now:
cd /home/aswin/temp
grep -srcFf ./pattern.txt * /dev/null >> logfile.txt
This gives output in the desired format, but it searches the strings in all columns, not just the first, so the counts are much higher than expected.
Please help.

If you want to do this with grep, you must change the patterns.
With your command, grep also searches /dev/null and outputs /dev/null:0.
Perhaps you meant 2>/dev/null, but that is not needed because you already pass -s to grep.
Your pattern file is in the same directory, so grep searches it too and outputs pattern.txt:6.
All your files are in a single directory, so -r is not needed.
You put the logfile in the same directory as well, so the second time you run the command grep searches it and outputs logfile.txt:0.
If you can modify the pattern file, write each line anchored like ^765082023|
and rename the file so it no longer ends in .txt.
Then this command gives you what you are looking for:
grep -scf pattern *.txt >>logfile
If you can't modify the pattern file, you can use awk.
awk -F'|' '
NR==FNR{a[$0];next}                  # first pass: load the ids
FILENAME=="pattern.txt"{next}        # skip pattern.txt when the glob matches it again
$1 in a {b[FILENAME]++}              # count rows whose first column is a known id
END{for(i in b){print i":"b[i]}}
' pattern.txt *.txt >>logfile.txt
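Another sketch, not from the thread: restrict the search to column 1 with cut, then let grep -Fx count exact whole-field matches. The helper name and sample data are mine:

```shell
# count_col1_matches PATFILE FILE...: print "file:count", counting rows whose
# first |-delimited field appears verbatim in PATFILE.
count_col1_matches() {
    patfile=$1; shift
    for f in "$@"; do
        # -F fixed strings, -x whole-line match, -c count matching lines
        n=$(cut -d'|' -f1 "$f" | grep -cFxf "$patfile")
        printf '%s:%s\n' "$f" "$n"
    done
}

# Illustrative run: a row containing an id twice still counts once,
# since only the first field of each line is inspected.
cd "$(mktemp -d)"
printf '797649756\n75635713\n' > pattern.txt
printf '797649756|1001|797649756\n75635713|1001|b\n999|1001|c\n' > 1.txt
count_col1_matches pattern.txt 1.txt
# 1.txt:2
```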

Related

List each file that doesn't match a pattern recursively

I tried the following command; it lists all lines, with file names,
that do not match the given pattern.
grep -nrv "^type.* = .*"
"But what we need is list of file names in a folder with content
which does not have even a single occurrence of above pattern."
Your help will be really appreciated.
You need the -L option:
grep -rL '^type.* = .*' directory_name
From the GNU grep manual:
-L, --files-without-match
    Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.
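A quick sketch of the behaviour, using two throwaway files in a temp directory (names are illustrative):

```shell
# -L prints only the names of files with no matching line at all.
d=$(mktemp -d)
printf 'type.a = 1\n' > "$d/has.txt"
printf 'nothing here\n' > "$d/lacks.txt"
grep -rL '^type.* = .*' "$d"   # prints only the path of lacks.txt
```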

Merge Files and Prepend Filename and Directory

I need to merge files in a directory and include the directory, filename, and line number in each line of the output. I've found many helpful posts about including the filename and line number, but not the directory name. grep -n gets line numbers, and I've seen some find commands that get some of the other parts, but I can't seem to pull them all together. (I'm using Ubuntu for all of the data processing.)
Imagine two files in directory named "8". (Each directory in the data I have is a number. The data were provided that way.)
file1.txt
John
Paul
George
Ringo
file2.txt
Mick
Keef
Bill
Brian
Charlie
The output should look like this:
8:file1.txt:1:John
8:file1.txt:2:Paul
8:file1.txt:3:George
8:file1.txt:4:Ringo
8:file2.txt:1:Mick
8:file2.txt:2:Keef
8:file2.txt:3:Bill
8:file2.txt:4:Brian
8:file2.txt:5:Charlie
The separators don't have to be colons. Tabs would work just fine.
Thanks much!
If it's just one directory level deep you could try something like this. We go into each directory, print each line with its number, and then prepend the directory name with sed:
$ for x in */; do
>   ( cd "$x" && grep -n . * ) | sed "s|^|${x%/}:|"
> done
1:c.txt:2:B
1:c.txt:3:C
2:a.txt:1:A
2:a.txt:2:B
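An equivalent sketch that lets grep -r do the directory walking (one directory level, as above; the directory and file names are illustrative):

```shell
# Illustrative layout: one directory named "8" with one file in it.
cd "$(mktemp -d)"
mkdir 8
printf 'John\nPaul\n' > 8/file1.txt

# grep -rn prints "./dir/file:line:text"; sed strips the leading "./"
# and swaps the first "/" for ":".
grep -rn . . | sed 's|^\./||; s|/|:|'
# 8:file1.txt:1:John
# 8:file1.txt:2:Paul
```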

grep from a input file, multiple lines while the input file has ^name

I would really appreciate some help with this:
I have a huge file, I will give you an example of how it is formatted:
name:lastname:email
I have a input file with lots of names set out like this example:
edward
michael
jenny
I want to match the name column of the huge file against the names in the input file, but only exact matches (case-insensitive).
Once it finds a match, I want it to output all of the matches to a .txt file.
I think I can use a pattern like ^Michael: for it.
Can anyone help me with this grep problem?
Sorry if I am not too clear; it's very late and I have been on this problem for ages.
CentOS 5: grep -i -E -f file.txt /root/dir2search > out.txt
file.txt containing
^michael:
^bobert:
^billy:
Doesn't find anything.
grep -i -E -f inputfile namesfile > outputfile will do what you want, if your input file consists of one input name per line, in the pattern you already suggested:
^Michael:
^Jane:
^Tom:
-i: case-insensitive matching
-E: use extended regexp pattern matching
-f: read patterns from a file, one pattern per line
>: redirect the output to a file
To get the input file you described (space-separated names) into this pattern format, you could use:
sed -r 's/([^ ]+) */^\1:\n/g; s/\n$//' inputfile > newinputfile
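If rewriting the name list into regex patterns is the sticking point, an awk sketch can match the first field exactly and case-insensitively without any escaping. The file names follow the question; the sample data here is mine:

```shell
cd "$(mktemp -d)"
# Illustrative data shaped like the question's name:lastname:email file.
printf 'michael\njenny\n' > inputfile
printf 'Michael:Smith:m@example.com\nJENNY:Lee:j@example.com\nbob:B:b@example.com\n' > hugefile

# First pass loads the wanted names; second pass prints rows whose first
# ':' field matches one of them, ignoring case.
awk -F: 'NR==FNR { want[tolower($0)]; next }
         tolower($1) in want' inputfile hugefile > out.txt
cat out.txt
# Michael:Smith:m@example.com
# JENNY:Lee:j@example.com
```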

grep -f on files in a zipped folder

I am performing a recursive fgrep/grep -f search on a zipped up folder using the following command in one of my programs:
The command I am using:
grep -r -i -z -I -f /path/to/pattern/file /home/folder/TestZipFolder.zip
Inside the pattern file is the string "Dog" that I am trying to search for.
In the zipped up folder there are a number of text files containing the string "Dog".
The grep -f command successfully finds the text files containing the string "Dog" in 3 files inside the zipped folder, but it prints the output all on one line, and some strange characters appear at the end, e.g. PK (as shown below). When I print the output to a file in my program, other characters such as ^B^T^# appear at the end as well.
Output from the grep -f command:
TestZipFolder/test.txtThis is a file containing the string DogPKtest1.txtDog, is found again in this file.PKTestZipFolder/another.txtDog is written in this file.PK
How would I get each of the files where the string "Dog" has been found to print on a new line so they are not all grouped together on one line like they are now?
Also where are the "PK" and other strange characters appearing from in the output and how do i prevent them from appearing?
Desired output
TestZipFolder/test.txt:This is a file containing the string Dog
TestZipFolder/test1.txt:Dog, is found again in this file
TestZipFolder/another.txt:Dog is written in this file
Something along these lines, whereby the user is able to see where the string can be found in the file (you actually get the output in this format if you run the grep command on a file that is not a zip file).
If you need multi-line output, it is better to use zipgrep:
zipgrep -s "pattern" TestZipFolder.zip
The -s suppresses error messages (optional). This command prints every matched line along with the file name. (The stray PK sequences in your grep output are ZIP archive headers, the format's magic bytes; they leak through because grep is reading the raw archive rather than the extracted files.) If you want to remove duplicate names when a file contains more than one match, some further processing with loops, grep, awk or sed is needed.
Actually, zipgrep is a combination of egrep and unzip, and its usage is as follows:
zipgrep [egrep_options] pattern file[.zip] [file(s) ...] [-x xfile(s) ...]
so you can pass any egrep options to it.
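The deduplication step can be sketched with cut and sort -u on zipgrep's "member:matched line" output. Since building an archive inline would need extra tooling, a small function stands in for the zipgrep output here (member names taken from the question):

```shell
# Stand-in for zipgrep output, one "member:matched line" per line.
zipgrep_out() {
  printf 'TestZipFolder/test.txt:This is a file containing the string Dog\n'
  printf 'TestZipFolder/test1.txt:Dog, is found again in this file.\n'
  printf 'TestZipFolder/test1.txt:Dog appears twice here.\n'
}

# Dedup: keep each matching member name once.
zipgrep_out | cut -d: -f1 | sort -u
# TestZipFolder/test.txt
# TestZipFolder/test1.txt
```

In real use, `zipgrep -s Dog TestZipFolder.zip | cut -d: -f1 | sort -u` would replace the stand-in function.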

Comparing part of a filename from a text file to filenames from a directory (grep + awk)

This is not exactly the easiest one to explain in a title.
I have a file inputfile.txt that contains parts of filenames:
file1.abc
filed.def
fileq.lmn
This file is an input file that I need to use to find the full filenames in an actual directory. The ends of the filenames differ from case to case, but part of them is always the same.
I figured that I could grep text from the input file against the output of ls in that directory (or ls output saved to a simple text file), and then use awk to produce my desired result, but I'm having some trouble doing that.
file1.abc is read from the input file inputfile.txt
It's checked against the directory contents.
If the file exists, specific directories based on the filename are created.
(I'm also in a Busybox environment.. I don't have a lot at my disposal)
Something like this...
cat lscommandoutput.txt \
| awk -F: '{print("mkdir" system("grep $0"); inputfile.txt}' \
| /bin/sh
Thank you.
Edit: My apologies for not being clear on this.
The output should be the full filename from lscommandoutput.txt for each line matched via inputfile.txt.
If inputfile.txt contains:
file1.abc
filed.def
fileq.lmn
and lscommandoutput.txt contains:
file0.oba.ca-1.fil
file1.abc.de-1.fil
filed.def.com-2.fil
fileh.jkl.open-1.fil
fileq.lmn.he-2.fil
The extra lines that aren't contained in inputfile.txt are ignored. The ones that are in inputfile.txt get a directory created for them, named after the full filename grepped from lscommandoutput.txt:
/dir/dir2/file1.abc.de-1.fil/ <-- directories in which files can be placed
/dir/dir2/filed.def.com-2.fil/
/dir/dir2/fileq.lmn.he-2.fil/
Hopefully that is a little bit clearer.
First, you win a useless-use-of-cat award.
Secondly, you've explained this really badly. If you can't describe the problem clearly in plain English, it's not surprising you are having trouble turning it into a script or set of commands.
grep -f is a good way to get the directory names, but I don't understand what you want to do with them afterwards.
My problem now is using the outputted file with the one file I want to put the folders
Wut? What does "the one file I want to put the folders" mean? Where does the file come from? Is it the file named in inputlist.txt? Does it go in the directory that it matched?
If you just want to create the directories you can do:
fgrep -f ./inputfile.txt ./lscommandoutput.txt | xargs mkdir
N.B. you probably want fgrep so that the input strings aren't treated as regular expressions and metacharacters such as . are matched literally.
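If the directories should land under a base path rather than the current directory, a BusyBox-friendly sketch is a plain while-read loop (the base path stands in for the question's /dir/dir2; sample data is from the question):

```shell
cd "$(mktemp -d)"
printf 'file1.abc\nfiled.def\n' > inputfile.txt
printf 'file0.oba.ca-1.fil\nfile1.abc.de-1.fil\nfiled.def.com-2.fil\n' > lscommandoutput.txt

# base stands in for the question's /dir/dir2 target.
base=$PWD/out
grep -F -f inputfile.txt lscommandoutput.txt | while IFS= read -r name; do
    mkdir -p "$base/$name"
done
ls "$base"
# file1.abc.de-1.fil
# filed.def.com-2.fil
```

Unlike xargs, the read loop also copes with filenames containing spaces, which matters on minimal BusyBox installs where xargs options vary.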
