grep -f on files in a zipped folder - linux

I am running a recursive fgrep/grep -f search on a zipped folder from one of my programs, using the following command:
grep -r -i -z -I -f /path/to/pattern/file /home/folder/TestZipFolder.zip
Inside the pattern file is the string "Dog" that I am trying to search for.
In the zipped up folder there are a number of text files containing the string "Dog".
The grep -f command finds the string "Dog" in 3 text files inside the zipped folder, but it prints all of the output on one line, and strange characters such as PK appear after each match (as shown below). When I write the output to a file from my program, other characters appear at the end, such as ^B^T^#
Output from the grep -f command:
TestZipFolder/test.txtThis is a file containing the string DogPKtest1.txtDog, is found again in this file.PKTestZipFolder/another.txtDog is written in this file.PK
How would I get each of the files where the string "Dog" has been found to print on a new line so they are not all grouped together on one line like they are now?
Also, where do the "PK" and other strange characters in the output come from, and how do I prevent them from appearing?
Desired output
TestZipFolder/test.txt:This is a file containing the string Dog
TestZipFolder/test1.txt:Dog, is found again in this file
TestZipFolder/another.txt:Dog is written in this file
Something along these lines, whereby the user is able to see where the string can be found in the file (you actually get the output in this format if you run the grep command on a file that is not a zip file).

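grep is reading the raw archive here, so the "PK" and the control characters are bytes from the zip format's own record headers rather than content of your text files. If you need multi-line output, it is better to use zipgrep: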
zipgrep -s "pattern" TestZipFolder.zip
The -s option suppresses error messages (optional). This command prints every matching line along with the file name. If you want to remove duplicate names when a file contains more than one match, some further processing is needed using loops, grep, awk, or sed.
zipgrep is a combination of egrep and unzip. Its usage is as follows:
zipgrep [egrep_options] pattern file[.zip] [file(s) ...] [-x xfile(s) ...]
so you can pass any egrep option to it.
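The de-duplication mentioned above can be sketched with awk. This is only an illustration, not part of the original answer: it assumes zipgrep prints each match as name:line and that file names contain no colon, and it uses a hard-coded sample (the second test.txt line is made up) in place of a real zipgrep run.

```shell
# Simulated zipgrep output ("name:matched line"). In practice it would come from:
#   zipgrep -s "Dog" TestZipFolder.zip
zipgrep_output='TestZipFolder/test.txt:This is a file containing the string Dog
TestZipFolder/test.txt:Another Dog line (made-up sample)
TestZipFolder/another.txt:Dog is written in this file'

# Take the text before the first colon and print each name only once,
# in the order in which it was first seen.
unique_names=$(printf '%s\n' "$zipgrep_output" | awk -F: '!seen[$1]++ {print $1}')
printf '%s\n' "$unique_names"
```

If only the names are ever needed, passing -l through to egrep may be simpler, but that depends on how the installed zipgrep handles the option.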

Related

List each file that doesn't match a pattern recursively

I tried the following command; it lists every line that does not match the given pattern, along with the file name.
grep -nrv "^type.* = .*"
What we need instead is a list of the files in a folder whose content does not contain even a single occurrence of the above pattern.
Your help will be really appreciated.
You need the -L option:
grep -rL '^type.* = .*' directory_name
From the GNU grep manual:
-L, --files-without-match
    Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.
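A small self-contained check of -L, as a sketch only: the directory and the two file names below are hypothetical and are created in a temporary location.

```shell
# Throwaway directory with one file that matches the pattern and one that does not.
demo_dir=$(mktemp -d)
printf 'type_a = 1\n' > "$demo_dir/has_match.conf"
printf 'name = x\n'   > "$demo_dir/no_match.conf"

# -r: recurse into the directory, -L: list only files with no matching line.
files_without_match=$(grep -rL '^type.* = .*' "$demo_dir" | sort)
printf '%s\n' "$files_without_match"

rm -rf "$demo_dir"
```

Only the path ending in no_match.conf should be printed, because has_match.conf contains a line that matches.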

Search multiple strings from file in multiple files in specific column and output the count in unix shell scripting

I have searched extensively on the internet about this but haven't found many details.
Problem Description:
I am using an AIX server.
I have a pattern.txt file that contains customer_id for 100 customers in the following sample format:
160471231
765082023
75635713
797649756
8011688321
803056646
I have a directory (/home/aswin/temp) with several files (1.txt, 2.txt, 3.txt and so on) which are pipe(|) delimited. Sample format:
797649756|1001|123270361|797649756|O|2017-09-04 23:59:59|10|123769473
803056646|1001|123345418|1237330|O|1999-02-13 00:00:00|4|1235092
64600123|1001|123885297|1239127|O|2001-08-19 00:00:00|10|1233872
75635713|1001|123644701|75635713|C|2006-11-30 00:00:00|11|12355753
424346821|1001|123471924|12329388|O|1988-05-04 00:00:00|15|123351096
427253285|1001|123179704|12358099|C|2012-05-10 18:00:00|7|12352893
What I need to do is search for all the strings from pattern.txt in the first column of every file in the directory, and list each filename with its number of matches. If the same row has more than one match, it should be counted as 1.
So the output should be something like (only the matches in first column should count):
1.txt:4
2.txt:3
3.txt:2
4.txt:5
What I have done till now:
cd /home/aswin/temp
grep -srcFf ./pattern.txt * /dev/null >> logfile.txt
This gives the output in the desired format, but it searches for the strings in all columns and not just the first one, so the count is much higher than expected.
Please help.
If you want to do this with grep, you must change the patterns.
With your command, grep also searches /dev/null, so the output includes /dev/null:0
I think you meant 2>/dev/null, but that is not needed because you already pass -s to grep.
Your pattern file is in the same directory, so grep searches it too and outputs pattern.txt:6
All your files are in the same directory, so -r is not needed.
You put the log file in the same directory, so the second time you run the command grep searches it as well and outputs logfile.txt:0
If you can modify the pattern file, write each line like ^765082023|
and rename the file so that it no longer ends in .txt
Then this command gives you what you are looking for:
grep -scf pattern *.txt >>logfile
If you can't modify the pattern file, you can use awk.
awk -F'|' '
NR==FNR{a[$0];next}
FILENAME=="pattern.txt"{next}
$1 in a {b[FILENAME]++}
END{for(i in b){print i ":" b[i]}}
' pattern.txt *.txt >>logfile.txt
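If the pattern file must stay untouched, the anchored patterns described above can also be generated into a separate file. The following is a sketch under assumptions: the temporary directory, the file name anchored_patterns, and the split of the question's sample rows into 1.txt and 2.txt are made up for illustration, and it relies on | being a literal character in grep's basic regular expressions.

```shell
# Throwaway working directory so the sketch does not touch real data.
work_dir=$(mktemp -d)

# Customer IDs from the question's pattern.txt.
printf '%s\n' 160471231 765082023 75635713 797649756 8011688321 803056646 \
    > "$work_dir/pattern.txt"

# Sample rows from the question, split across two data files.
cat > "$work_dir/1.txt" <<'EOF'
797649756|1001|123270361|797649756|O|2017-09-04 23:59:59|10|123769473
803056646|1001|123345418|1237330|O|1999-02-13 00:00:00|4|1235092
64600123|1001|123885297|1239127|O|2001-08-19 00:00:00|10|1233872
75635713|1001|123644701|75635713|C|2006-11-30 00:00:00|11|12355753
EOF
cat > "$work_dir/2.txt" <<'EOF'
424346821|1001|123471924|12329388|O|1988-05-04 00:00:00|15|123351096
427253285|1001|123179704|12358099|C|2012-05-10 18:00:00|7|12352893
EOF

# Turn each ID into an anchored pattern such as ^765082023| so that it can
# only match the whole first column.
sed 's/.*/^&|/' "$work_dir/pattern.txt" > "$work_dir/anchored_patterns"

# Count matching rows per file; the pattern file has no .txt suffix,
# so a *.txt glob would not pick it up either.
counts=$(cd "$work_dir" && grep -c -f anchored_patterns 1.txt 2.txt)
printf '%s\n' "$counts"
# prints:
# 1.txt:3
# 2.txt:0

rm -rf "$work_dir"
```

Unlike the awk version, grep -c also reports files with zero matches.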

Dynamic searching and string copying in bash

I use mailget for a home-made "backup" system, which backs pre-specified files up when receiving a mail containing the string "backup" by using the following search command:
$ grep -rnw '/path/to/mailbox/' -e "backup"
I want to extract the mail address into a variable $var. The string "Return-Path: " (13 chars) is always present at the beginning of each mail file, as follows:
Return-Path: <someone#domain.com>
In conclusion: when a file containing the string "backup" is detected under a given path, the script should extract the mail address from that file into $var.
I can't get my head around this one; grateful for any help.
The natural mechanism for capturing the output of a command in a variable is "command substitution". The syntax for a command substitution is $( <the command> ); it expands to the standard output of the specified command.
The standard lightweight general tools appropriate for extracting text from a file such as yours are sed and awk. You can also use grep's -l option to make it emit the name of the file wherein it found a match, rather than the match itself. You might put those together something like this:
var=$(sed -n -e '/^Return-Path:/ {s/.*<\(.*\)>.*/\1/;p;q}' $(grep -rlw '/path/to/mailbox/' -e "backup"))
The nested command substitution obtains the names of the files containing the target string; the sed command processes those files and extracts (only) the text between the < and > on the first line starting with "Return-Path:". It makes some assumptions that render it shorter but less robust; my objective is merely to demonstrate, not to write production-quality code for you.
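A slightly more defensive variant of the same idea, as a sketch only: the mailbox directory and the two mail files are hypothetical test fixtures, and the script only looks at the first matching file. Quoting the file name avoids problems with spaces in paths, which the one-liner does not handle.

```shell
# Hypothetical mailbox with one mail that mentions "backup" and one that does not.
mailbox=$(mktemp -d)
printf 'Return-Path: <someone#domain.com>\nSubject: request\n\nplease run backup now\n' \
    > "$mailbox/mail1"
printf 'Return-Path: <other#domain.com>\nSubject: hello\n\nnothing to do\n' \
    > "$mailbox/mail2"

# Name of the first file containing the whole word "backup".
first_match=$(grep -rlw -e "backup" "$mailbox" | head -n 1)

var=""
if [ -n "$first_match" ]; then
    # Print only the text between < and > on Return-Path lines; keep the first one.
    var=$(sed -n 's/^Return-Path: *<\(.*\)>.*/\1/p' "$first_match" | head -n 1)
fi
printf '%s\n' "$var"

rm -rf "$mailbox"
```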

linux script shell : grep a part of path in a list of path

In my shell script, I have 2 files. The first one contains only file names with part of their path:
list1:
aaa/bbb/file1.ext
ccc/ddd/file2.ext
eee/fff/file3.ext
The second one is a list of every file with the extension ".ext", preceded by its absolute path:
list2:
/home/.../aaa/bbb/file1.ext
...
...
...
/home/...ccc/ddd/file2.ext
...
I am trying to use grep to extract the lines of the second file, list2, that contain the lines of the first one.
So far I have tried:
while read line
do
grep "$line" "list1"
done < list2
But this command doesn't output anything, whereas the command
grep "aaa/bbb/file1.ext" "list1"
produces the output I am expecting
/home/.../aaa/bbb/file1.ext
Does anyone see what I am missing in this script? Thanks
This is one of the cases where grep's -f option comes in very handy:
grep -f list1 list2
For your given input, it returns:
/home/.../aaa/bbb/file1.ext
/home/...ccc/ddd/file2.ext
From man grep:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero
patterns, and therefore matches nothing. (-f is specified by
POSIX.)
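The loop in the question reads the long paths from list2 and looks for them in list1, which is the wrong way round; -f takes the short names as patterns instead. A minimal sketch using the question's sample lines follows. The temporary directory and the non-matching other.ext line are made up for illustration, the elided "..." parts are kept literally, and -F is added on the assumption that the names should be matched as fixed strings rather than regular expressions.

```shell
work_dir=$(mktemp -d)

# Partial paths to look for (one pattern per line).
printf '%s\n' 'aaa/bbb/file1.ext' 'ccc/ddd/file2.ext' 'eee/fff/file3.ext' \
    > "$work_dir/list1"

# Full paths to search in; the second line is a made-up non-matching entry.
printf '%s\n' '/home/.../aaa/bbb/file1.ext' '/home/.../ggg/hhh/other.ext' \
    '/home/...ccc/ddd/file2.ext' > "$work_dir/list2"

# -F: treat every pattern as a fixed string, so "." matches only a literal dot.
matches=$(grep -F -f "$work_dir/list1" "$work_dir/list2")
printf '%s\n' "$matches"

rm -rf "$work_dir"
```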

Comparing part of a filename from a text file to filenames from a directory (grep + awk)

This is not exactly the easiest one to explain in a title.
I have a file inputfile.txt that contains parts of filenames:
file1.abc
filed.def
fileq.lmn
This file is an input file that I need to use to find the full filenames of an actual directory. The ends of the filenames are different from case to case, but part of them is always the same.
I figured that I could grep text from the input file to the ls command in said directory (or the ls command to a simple text file), and then use awk to output my full desired result, but I'm having some trouble doing that.
file1.abc is read from the input file inputfile.txt
It's checked against the directory contents.
If the file exists, specific directories based on the filename are created.
(I'm also in a Busybox environment.. I don't have a lot at my disposal)
Something like this...
cat lscommandoutput.txt \
| awk -F: '{print("mkdir" system("grep $0"); inputfile.txt}' \
| /bin/sh
Thank you.
Edit: My apologies for not being clear on this.
The output should be the full filename of each line found in lscommandoutput.txt using the inputfile.txt to grep those specific lines.
If inputfile.txt contains:
file1.abc
filed.def
fileq.lmn
and lscommandoutput.txt contains:
file0.oba.ca-1.fil
file1.abc.de-1.fil
filed.def.com-2.fil
fileh.jkl.open-1.fil
fileq.lmn.he-2.fil
The extra lines that aren't contained in the inputfile.txt are ignored. The ones that are in the inputfile.txt have a directory created for them with the name that got grepped from lscommandoutput.txt.
/dir/dir2/file1.abc.de-1.fil/ <-- directory in which files can be placed in
/dir/dir2/filed.def.com-2.fil/
/dir/dir2/fileq.lmn.he-2.fil/
Hopefully that is a little bit clearer.
First, you win a "useless use of cat" award.
Secondly, you've explained this really badly. If you can't describe the problem clearly in plain English it's not surprising you are having trouble turning it into a script or set of commands.
grep -f is a good way to get the directory names, but I don't understand what you want to do with them afterwards.
My problem now is using the outputted file with the one file I want to put the folders
Wut? What does "the one file I want to put the folders" mean? Where does the file come from? Is it the file named in inputfile.txt? Does it go in the directory that it matched?
If you just want to create the directories you can do:
fgrep -f ./inputfile.txt ./lscommandoutput.txt | xargs mkdir
N.B. you probably want fgrep (or grep -F) so that the input strings are matched literally rather than as regular expressions; otherwise regex metacharacters such as . keep their special meaning.
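A variant that creates the directories under a target path and copes with names containing spaces, as a sketch only: the temporary work_dir and its dir2 subdirectory stand in for /dir/dir2, and the two input files are rebuilt from the question's sample data. It avoids xargs and uses only grep, read and mkdir, which should also be available in a Busybox shell.

```shell
work_dir=$(mktemp -d)
target_dir="$work_dir/dir2"   # stand-in for /dir/dir2 from the question
mkdir -p "$target_dir"

printf '%s\n' file1.abc filed.def fileq.lmn > "$work_dir/inputfile.txt"
printf '%s\n' file0.oba.ca-1.fil file1.abc.de-1.fil filed.def.com-2.fil \
    fileh.jkl.open-1.fil fileq.lmn.he-2.fil > "$work_dir/lscommandoutput.txt"

# Select the full names that contain one of the partial names (as fixed strings),
# then create one directory per selected name.
grep -F -f "$work_dir/inputfile.txt" "$work_dir/lscommandoutput.txt" |
while IFS= read -r name; do
    mkdir -p "$target_dir/$name"
done

created=$(ls "$target_dir")
printf '%s\n' "$created"

rm -rf "$work_dir"
```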
