How to extract the first few lines from a CSV file inside a tar file without extracting it in Linux?

I have a tar file that contains many CSV files.
How can I get the first few lines of each CSV file without extracting the archive?
I tried:
$(tar -Oxf $tarfile $file | head -n "$NL") >> cdn.log
But I got an error saying:
time(http:index: command not found
This is a line from one of the CSV files. Similar errors are reported for all the CSV files.
Any ideas?

With -O you can tell tar to extract a file to standard output instead of to a file. So you can first run tar tf <YOUR_FILE> to list the files in the archive and filter the list with grep to find the CSV files, and then for each file run tar xf <YOUR_FILE> <NAME_OF_CSV> -O | head to print the beginning of the file to stdout. This is somewhat inefficient, since the archive is read as many times as there are CSV files, but it should work.
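A minimal sketch of that loop, assuming a tar that supports -O (GNU and BSD tar both do) and member names without newlines. The function name head_csv_in_tar and its arguments are made up for illustration. Note that the attempt in the question wraps the pipeline in $( ... ), which makes the shell run the output of head as a command; that is where "command not found" comes from, so the sketch runs the pipeline directly.

```shell
# head_csv_in_tar ARCHIVE [LINES]
# Print the first LINES lines (default 5) of every .csv member of ARCHIVE
# without writing the members to disk. Hypothetical helper for illustration.
head_csv_in_tar() {
    archive=$1
    lines=${2:-5}
    # List the members, keep only names ending in .csv.
    tar -tf "$archive" | grep '\.csv$' | while IFS= read -r member; do
        printf '==> %s <==\n' "$member"
        # -O sends the member to stdout; head stops after $lines lines.
        tar -xOf "$archive" "$member" | head -n "$lines"
    done
}
```

With the variables from the question, it could be called as head_csv_in_tar "$tarfile" "$NL" >> cdn.log.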

You can use Perl and its Archive::Tar module. Here is a one-liner that extracts the first two lines of each file:
perl -MArchive::Tar -E '
for (Archive::Tar->new(shift)->get_files) {
say (join qq|\n|, (split /\n/, $_->get_content, 3)[0..1])
}
' file.tar
It assumes that the tar file contains only text files and that all of them are CSV files. Otherwise you will have to grep the list to select the ones you want.

Related

Bash loop to gunzip file and remove file extension and file prefixes

I have several .vcf.gz files:
subset_file1.vcf.vcf.gz
subset_file2.vcf.vcf.gz
subset_file3.vcf.vcf.gz
I want to gunzip these files and rename them (remove the subset_ prefix and the redundant .vcf extension) in one go, to get these files:
file1.vcf
file2.vcf
file3.vcf
This is the script I have tried:
iFILES=/file/path/*.gz
for i in $iFILES;
do gunzip -k $i > /get/in/this/dir/"${i##*/}"
done
You have to perform three operations on the output path name:
1. remove the directory part
2. remove the prefix subset_
3. remove the redundant extension .vcf
That is hard to accomplish with only one command.
The following is a modified version. Be CAREFUL when you try it; I didn't test it thoroughly on my computer.
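for i in /file/path/*.gz;
do
# get the output file name: strip the directory, the subset_ prefix and one .vcf
o=$(echo "${i##*/}" | sed 's/.*_\(.*\)\(\.[a-z]\{3\}\)\{2\}.*/\1\2/g')
# -c writes the decompressed data to stdout and leaves the original .gz in place
gunzip -c "$i" > /get/in/this/dir/"$o"
done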
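The three steps can also be done with POSIX parameter expansion instead of sed. This is only a sketch, assuming every input is named subset_<name>.vcf.vcf.gz; the function name gunzip_renamed and its two directory arguments are hypothetical.

```shell
# gunzip_renamed SRC_DIR OUT_DIR
# Decompress SRC_DIR/subset_<name>.vcf.vcf.gz to OUT_DIR/<name>.vcf,
# leaving the original .gz files untouched. Hypothetical helper.
gunzip_renamed() {
    src_dir=$1
    out_dir=$2
    for f in "$src_dir"/*.vcf.vcf.gz; do
        [ -e "$f" ] || continue    # glob matched nothing
        name=${f##*/}              # 1. remove the directory part
        name=${name#subset_}       # 2. remove the subset_ prefix
        name=${name%.vcf.gz}       # 3. drop one .vcf together with .gz
        gunzip -c "$f" > "$out_dir/$name"
    done
}
```

For the paths in the question this would be called as gunzip_renamed /file/path /get/in/this/dir.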

Extracting a specific file name from multiple .tar.gz files to build one .csv file

Another newbie question, please bear with me.
I have multiple .tar.gz files that each contain the same XX.log file (it has the same name in each .tar.gz file).
I need to extract only that XX.log file from each .tar.gz file and then append them all to one file named DataByDate.csv.
I've tried multiple ways to accomplish this in one line:
zcat /tmp/jhoney/DATA.2015-10-09* | tar --extract --file=XX.log | perl -lne '/.{0,0}2015-10-09.{0,30}/ $$ print $&' >/tmp/jhoney/DataByDate.csv
This returns the error:
tar: XX.log: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
Any ideas?
You need to read man tar. I think you need something more like this:
for t in /tmp/jhoney/DATA.2015-10-09*;do tar -zxOf $t XX.log |
perl -lne '/.{0,0}2015-10-09.{0,30}/ && print $&';done >/tmp/jhoney/DataByDate.csv
Also, {0,0} doesn't seem to make sense: .{0,0} matches only the empty string, so it has no effect. And if you really meant "append", the redirect should probably be >> instead of just >.
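For the "append" reading, a sketch that leaves out the perl filter and simply appends the raw member from each archive. The function name collect_member is invented for illustration, and it assumes the member is stored under exactly the name given (no leading directory).

```shell
# collect_member MEMBER OUTFILE ARCHIVE...
# Append MEMBER from each gzip-compressed tar ARCHIVE to OUTFILE.
# Hypothetical helper; nothing is extracted to disk.
collect_member() {
    member=$1
    out=$2
    shift 2
    for archive in "$@"; do
        # -z: gunzip, -x: extract, -O: to stdout, -f: archive file
        tar -xzOf "$archive" "$member" >> "$out"
    done
}
```

With the paths from the question: collect_member XX.log /tmp/jhoney/DataByDate.csv /tmp/jhoney/DATA.2015-10-09*. A filter such as the perl command above can be inserted between tar and the redirect.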

Keep the newest x files and delete all others from a directory

I found a similar post on Stack Overflow, but it does not filter files by extension, so I am asking again.
I am writing a shell script to keep the last (most recent) 3 .txt files in a directory and remove all other .txt files.
For example, in the directory "Home" I have the following files:
test.txt
my.txt
image.jpg
test.avi
sample.txt
country.txt
study.txt
When I run the script, the result should be as below:
Keep File (keep only the last 3 .txt files)
test.txt
my.txt
image.jpg
test.avi
sample.txt
Delete File
country.txt
study.txt
Thanks
List entries by ctime (newest first), skip the first three items, delete the rest:
ls -c *.txt | tail -n +4 | xargs rm
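A slightly more defensive sketch of the same idea. It assumes "latest" means modification time, so it swaps ls -c for ls -t, and it assumes file names contain no newlines. The function name keep_newest_txt is hypothetical. Unlike the xargs version, it does not call rm when there is nothing to delete.

```shell
# keep_newest_txt DIR [KEEP]
# Keep the KEEP (default 3) most recently modified .txt files in DIR
# and delete the other .txt files. Files with other extensions are untouched.
keep_newest_txt() {
    dir=$1
    keep=${2:-3}
    (
        cd "$dir" || exit 1
        # ls -t lists newest first; tail skips the first $keep names.
        ls -t -- *.txt 2>/dev/null | tail -n "+$((keep + 1))" |
        while IFS= read -r f; do
            rm -- "$f"
        done
    )
}
```

For the example in the question, this would be run as keep_newest_txt Home 3.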

print content of more than one file in a zip archive

I have some zip files that are really large and I want to print their contents without extracting them first. I am using zcat and zless to do that, and then I redirect the output to a different application. When a zip file contains more than one text file, I receive the following error:
zcat tweets.zip >a
gzip: tweets.zip has more than one entry--rest ignored
How can I do what I want with zip files that contain more than one text file?
You can do this to output a file without extracting:
$ unzip -p <zip_file> <file_to_print>
For example:
$ unzip -p MyEar.ear META-INF/MANIFEST.MF
As cur4so mentioned you can also list all files using:
$ unzip -l <zip_file>
Use the -p option of unzip to pipe the output. Multiple files are concatenated. The -c option does the same thing, but includes the file name in front of each file.
If you just want to see a list of the files in your zip archive, use:
unzip -l tweets.zip
If you want to extract just one file:
unzip tweets.zip file-of-interest-as-it-is-pointed-in-the-archive
If you want something else, could you clarify your question?

Combine files in one

Currently I am in this directory:
/data/real/test
When I run ls -lt at the command prompt, I get something like this:
REALTIME_235000.dat.gz
REALTIME_234800.dat.gz
REALTIME_234600.dat.gz
REALTIME_234400.dat.gz
REALTIME_234200.dat.gz
How can I consolidate the above five dat.gz files into one dat.gz file in Unix without any data loss? I am new to Unix and not sure how to do this. Can anyone help me?
Update:
I am not sure which is the better way: should I unzip each of the five files and then combine them into one, or
combine the five dat.gz files into one dat.gz directly?
If the order in which the files' contents are concatenated doesn't matter (the shell expands the glob in alphabetical order), then the following command will do the trick:
zcat REALTIME*.dat.gz | gzip > out.dat.gz
Update
This should solve the order problem:
zcat $(ls -t REALTIME*.dat.gz) | gzip > out.dat.gz
What do you want to happen when you gunzip the result? If you want the five files to reappear, then you need to use something other than the gzip (.gz) format. You would need to either use tar (.tar.gz) or zip (.zip).
If you want the result of the gunzip to be the concatenation of the gunzip of the original files, then you can simply cat (not zcat or gzcat) the files together. gunzip will then decompress them to a single file.
cat [files in whatever order you like] > combined.gz
Then:
gunzip combined.gz
will produce an output that is the concatenation of the gunzip of the original files.
The suggestion to decompress them all and then recompress them as one stream is completely unnecessary.
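A small sketch of the plain-cat approach. It relies on the fact that a file made of several gzip members back to back is itself valid gzip input. The function name combine_gz is made up for illustration, and the output file must not be one of the inputs.

```shell
# combine_gz OUTFILE INPUT.gz...
# Concatenate gzip files byte-for-byte, in the order given, into OUTFILE.
# Nothing is decompressed or recompressed. Hypothetical helper.
combine_gz() {
    out=$1
    shift
    cat -- "$@" > "$out"
}
```

For the files in the question: combine_gz out.dat.gz REALTIME_*.dat.gz, after which gunzip -c out.dat.gz prints the five decompressed files one after another.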
