How to search for a particular string in a .gz file? - linux

I want to search for a particular string in a .gz file that contains a text file, without extracting it, in the Linux terminal. I know how to search for a string in a text file using grep "text to search" ./myfile.txt. But how do I make it work for .gz files?

You can use zgrep. Usage is similar to grep.
zgrep "pattern" file.gz
From the man page's description:
Zgrep invokes grep on compressed or gzipped files. All options
specified are passed directly to grep. If no file is specified, then
the standard input is decompressed if necessary and fed to grep.
Otherwise the given files are uncompressed if necessary and fed to
grep.
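Since all options are passed directly to grep, you can use zgrep exactly as you would grep. A small sketch (the pattern and file name are placeholders):
zgrep -i -n "pattern" file.gz
This runs a case-insensitive search and prints line numbers, just as grep -i -n would on the uncompressed text.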

gunzip -c mygzfile.gz | grep "string to be searched"
But this will only work if the .gz file contains a text file, which is true in your case.
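On Linux, zcat is equivalent to gunzip -c, so this sketch does the same thing:
zcat mygzfile.gz | grep "string to be searched"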

This has already been answered, but it can be really helpful if you want to search in multiple .gz files.
To search all the .gz files in a specific folder you can use:
zgrep "yourString" *
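If the .gz files are spread across subfolders, a sketch combining find with zgrep (using the standard -exec ... {} + form):
find . -name "*.gz" -exec zgrep "yourString" {} +
Add -H to the zgrep options if you want the file name printed in front of every match.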

Related

How to replace a string in multiple files in multiple subfolders with different file extensions in linux using command line

I have already followed this question (How to replace a string in multiple files in linux command line).
My question is rather an extension of the same.
I want to check only specific file extensions in the subfolders, not every file extension.
What I have already tried:
grep -rli 'old-word' * | xargs -i# sed -i 's/old-word/new-word/g' #
My problem: It is changing in every other file format as well. I want to search and replace only in one file extension.
Please also add an answer showing how I can change an entire line of a file, not just one word.
Thanks in advance.
The simplest solution is to use a more complex grep command:
grep -rli --include="*.html" --include="*.json" 'old-word' *
The disadvantage of this solution is that you do not have clear control over which files are scanned.
It is better to tune a find command to locate your desired files, using the regular-expression filtering option -regex to filter the file names.
That way you can verify that the correct files are scanned, and then feed the find command's result to grep as the list of files to scan.
Example:
Assuming you are looking for the file extensions txt, pdf and html.
Assuming your search path begins in /home/user/data
find /home/user/data -regex ".*\.\(html\|txt\|pdf\)$"
Once you have located your files, it is possible to grep each file matched by the above find command:
grep -rli 'old-word' $( find /home/user/data -regex ".*\.\(html\|txt\|pdf\)$" )
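To actually perform the replacement on the located files, and to replace an entire line rather than just one word (as you also asked), a sketch using find with -exec; 'old-word' and the replacement text are placeholders, and GNU sed is assumed:
find /home/user/data -regex ".*\.\(html\|txt\|pdf\)$" -exec sed -i 's/.*old-word.*/entire replacement line/' {} +
The .* on both sides of old-word makes sed rewrite the whole matching line; drop them to replace only the word itself, as in the original question.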

Using sed in Red Hat Linux to replace text

I have some XML files in a directory and they all include the tag: <difficult>0</difficult>. I just want to change that to <difficult>1</difficult>.
I'm using the following command:
sed 's/difficult>0/difficult>1/g' *.xml
All that happens is that the full XML text of all the files gets displayed, with the difficult tag showing a value of 1, but nothing happens to the actual files. When I open them, they still all contain <difficult>0</difficult>.
sed -i 's/difficult>0/difficult>1/g' *.xml
Change a string in a file with sed?
Yes, sed usually puts its result on stdout. To change the files in-place, use the -i flag:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
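For example, to edit the files in place while keeping a backup of each original (using the SUFFIX form quoted above):
sed -i.bak 's/difficult>0/difficult>1/g' *.xml
Each file.xml is modified in place and its original content is saved as file.xml.bak.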
One more time :)
Don't use regex to parse XML. Use a proper parser and XPath:
xml ed -L -u '//difficult/text()' -v "1" file
xml here is xmlstarlet.
Check: RegEx match open tags except XHTML self-contained tags
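Since the question is about a whole directory of XML files, a sketch looping over them (assuming xmlstarlet is installed and available as xml):
for f in *.xml; do
    xml ed -L -u '//difficult/text()' -v "1" "$f"
done
The -L flag makes xmlstarlet edit each file in place.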

How to extract first few lines from a csv file inside a tar file without extracting it in linux?

I have a tar file which has a lot of csv files in it.
How to get the first few lines of each csv file without extracting it?
I tried:
$(tar -Oxf $tarfile $file | head -n "$NL") >> cdn.log
But got error saying:
time(http:index: command not found
This is some line in one of the csv files. Similar errors are reported for all csv files...
Any idea??
Using -O you can tell tar to extract a file to standard output instead of to a file. So you should be able to first use tar tf <YOUR_FILE> to list the files in the archive and filter it with grep to find the CSV files, and then for each file use tar xf <YOUR_FILE> <NAME_OF_CSV> -O | head to get the file's beginning on stdout. This may be a bit inefficient since you unpack the archive as many times as there are CSV files, but it should work.
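A sketch of that approach as a loop, reusing $tarfile and $NL from the question. (Incidentally, the $( ... ) around your original command is what produced the error: it substituted the CSV output back into the shell, which then tried to execute it as a command.)
tar tf "$tarfile" | grep '\.csv$' | while read -r f; do
    echo "== $f =="
    tar -xOf "$tarfile" "$f" | head -n "$NL"
done >> cdn.log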
You can use perl and its Archive::Tar module. Here is a one-liner that extracts the first two lines of each file:
perl -MArchive::Tar -E '
    for (Archive::Tar->new(shift)->get_files) {
        say join qq|\n|, (split /\n/, $_->get_content, 3)[0..1];
    }
' file.tar
It assumes that the tar file contains only text files and that they are CSV. Otherwise you will have to grep the list to filter the ones you want.
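If the archive does contain other file types, a sketch that filters on the member name inside the same one-liner (Archive::Tar's get_files returns objects with a name method):
perl -MArchive::Tar -E '
    for (grep { $_->name =~ /\.csv$/ } Archive::Tar->new(shift)->get_files) {
        say $_->name;
        say join qq|\n|, (split /\n/, $_->get_content, 3)[0..1];
    }
' file.tar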

Comparing part of a filename from a text file to filenames from a directory (grep + awk)

This is not exactly the easiest one to explain in a title.
I have a file inputfile.txt that contains parts of filenames:
file1.abc
filed.def
fileq.lmn
This file is an input file that I need to use to find the full filenames of an actual directory. The ends of the filenames are different from case to case, but part of them is always the same.
I figured that I could grep text from the input file to the ls command in said directory (or the ls command to a simple text file), and then use awk to output my full desired result, but I'm having some trouble doing that.
file1.abc is read from the input file inputfile.txt
It's checked against the directory contents.
If the file exists, specific directories based on the filename are created.
(I'm also in a Busybox environment.. I don't have a lot at my disposal)
Something like this...
cat lscommandoutput.txt \
| awk -F: '{print("mkdir" system("grep $0"); inputfile.txt}' \
| /bin/sh
Thank you.
Edit: My apologies for not being clear on this.
The output should be the full filename of each line in lscommandoutput.txt that matches an entry from inputfile.txt.
If inputfile.txt contains:
file1.abc
filed.def
fileq.lmn
and lscommandoutput.txt contains:
file0.oba.ca-1.fil
file1.abc.de-1.fil
filed.def.com-2.fil
fileh.jkl.open-1.fil
fileq.lmn.he-2.fil
The extra lines that aren't contained in the inputfile.txt are ignored. The ones that are in the inputfile.txt have a directory created for them with the name that got grepped from lscommandoutput.txt.
/dir/dir2/file1.abc.de-1.fil/ <-- directories in which files can be placed
/dir/dir2/filed.def.com-2.fil/
/dir/dir2/fileq.lmn.he-2.fil/
Hopefully that is a little bit clearer.
First, you win a Useless Use of Cat award.
Secondly, you've explained this really badly. If you can't describe the problem clearly in plain English it's not surprising you are having trouble turning it into a script or set of commands.
grep -f is a good way to get the directory names, but I don't understand what you want to do with them afterwards.
My problem now is using the outputted file with the one file I want to put the folders
Wut? What does "the one file I want to put the folders" mean? Where does the file come from? Is it the file named in inputlist.txt? Does it go in the directory that it matched?
If you just want to create the directories you can do:
fgrep -f ./inputfile.txt ./lscommandoutput.txt | xargs mkdir
N.B. you probably want fgrep so that the input strings aren't treated as regular expressions and regex metacharacters such as . are matched literally.
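If the directories should be created under the /dir/dir2 path from your example rather than in the current directory, a sketch (assuming your BusyBox xargs was built with -I support):
fgrep -f ./inputfile.txt ./lscommandoutput.txt | xargs -I{} mkdir -p /dir/dir2/{}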

Combine files into one

Currently I am in this directory-
/data/real/test
When I do ls -lt at the command prompt, I get something like the below:
REALTIME_235000.dat.gz
REALTIME_234800.dat.gz
REALTIME_234600.dat.gz
REALTIME_234400.dat.gz
REALTIME_234200.dat.gz
How can I consolidate the above five .dat.gz files into one .dat.gz file in Unix without any data loss? I am new to Unix and I am not sure about this. Can anyone help me?
Update:-
I am not sure which is the better way: should I unzip each of the five files and then combine them into one, or combine all five .dat.gz files into one .dat.gz?
If it's OK to concatenate the files' content in arbitrary order, then the following command will do the trick:
zcat REALTIME*.dat.gz | gzip > out.dat.gz
Update
This should solve the ordering problem:
zcat $(ls -t REALTIME*.dat.gz) | gzip > out.dat.gz
What do you want to happen when you gunzip the result? If you want the five files to reappear, then you need to use something other than the gzip (.gz) format. You would need to either use tar (.tar.gz) or zip (.zip).
If you want the result of the gunzip to be the concatenation of the gunzip of the original files, then you can simply cat (not zcat or gzcat) the files together. gunzip will then decompress them to a single file.
cat [files in whatever order you like] > combined.gz
Then:
gunzip combined.gz
will produce an output that is the concatenation of the gunzip of the original files.
The suggestion to decompress them all and then recompress them as one stream is completely unnecessary.
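For the first case, where you want the five files back individually after unpacking, a sketch using tar with the filenames from the question (no recompression is needed, since the members are already gzipped):
tar cf combined.tar REALTIME_*.dat.gz
tar xf combined.tar
The second command restores the original .dat.gz files exactly as they were.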
