Extracting a specific file name from multiple .tar.gz files to build one .csv file - linux

Another newbie question, please bear with me.
I have multiple .tar.gz files that each contain a file named XX.log (the same name in every archive).
I need to extract only that XX.log file from each .tar.gz file and append them all into one file named DataByDate.csv.
I've tried multiple ways to accomplish this in one line, for example:
zcat /tmp/jhoney/DATA.2015-10-09* | tar --extract --file=XX.log | perl -lne '/.{0,0}2015-10-09.{0,30}/ $$ print $&' >/tmp/jhoney/DataByDate.csv
This returns the error:
tar: XX.log: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
Any ideas?

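You need to read man tar: --file names the archive to read, not the member to extract, so your command asked tar to open an archive called XX.log. I think you need something more like this: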
for t in /tmp/jhoney/DATA.2015-10-09*; do tar -zxOf "$t" XX.log |
perl -lne '/.{0,0}2015-10-09.{0,30}/ && print $&'; done >/tmp/jhoney/DataByDate.csv
Also, .{0,0} matches zero characters, so it has no effect and can be dropped. And if you really meant "append", the redirect should probably be >> instead of just >.
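A minimal sketch of the same loop wrapped in a reusable function, assuming GNU tar and that XX.log sits at the top level of each archive (if it is stored under a directory, pass the exact path that tar -tzf prints). The helper name collect_member and its argument order are hypothetical, and grep -oE is swapped in for the perl filter: like print $&, it prints only the matched text, although it prints every match on a line rather than just the first.

```shell
#!/usr/bin/env bash
# collect_member is a hypothetical helper, not part of the original answer.
# Usage: collect_member MEMBER DATE OUTFILE ARCHIVE...
collect_member() {
    local member=$1 date=$2 out=$3
    shift 3
    local t
    for t in "$@"; do
        # -z gunzip, -x extract, -O write the member to stdout, -f archive path.
        tar -zxOf "$t" "$member" |
            # Print the date plus up to 30 following characters.
            grep -oE -- "${date}.{0,30}"
    done >> "$out"   # '>>' appends to OUTFILE; use '>' to overwrite it instead.
}
```

For the question's files this would be called as collect_member XX.log 2015-10-09 /tmp/jhoney/DataByDate.csv /tmp/jhoney/DATA.2015-10-09*.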

Related

Sort files according to their filetype

After an HD problem and some work, I have a bunch of files with names like "f1234", "f1235", etc.
My goal is to sort these files according to their file type. For example, I want to move all the PDF files into the "pdfs" directory.
For one file, I can run "file f1234", and if it's a PDF, I can run "mv f1234 pdfs/". But I have thousands of files... Can you help me with a bash or zsh command to sort all the PDFs in one pass? Thanks
The hard part here is reliably turning the output of file into a directory name. I think the best candidate for that is probably the MIME type of the file rather than the human-readable output of file. I'd use something like:
mkdir sorted
for f in f*
do
    # MIME type such as application/pdf; tr turns the slash into a dash
    d=$(file -b --mime-type "$f" | tr / -)
    mkdir -p "sorted/$d"
    mv "$f" "sorted/$d/"
done
Obviously, I'd test it a bit before running it on your real files, but something close to that should work.
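Since the question only asks about PDFs, here is a narrower sketch of the same idea that moves just the files whose MIME type is application/pdf into pdfs/. It assumes the recovered files match f* in the current directory; sort_pdfs is a hypothetical name.

```shell
#!/usr/bin/env bash
# sort_pdfs is a hypothetical helper; it assumes the files match f* in the cwd.
sort_pdfs() {
    mkdir -p pdfs
    local f
    for f in f*; do
        [ -f "$f" ] || continue   # skip directories and an unmatched glob
        # -b omits the file name; --mime-type prints e.g. "application/pdf".
        if [ "$(file -b --mime-type -- "$f")" = application/pdf ]; then
            mv -- "$f" pdfs/
        fi
    done
}
```

Comparing against an exact MIME string avoids guessing at the wording of file's human-readable description.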

Bash loop to gunzip file and remove file extension and file prefixes

I have several .vcf.gz files:
subset_file1.vcf.vcf.gz
subset_file2.vcf.vcf.gz
subset_file3.vcf.vcf.gz
I want to gunzip these files and rename them in one go (remove subset_ and the redundant .vcf extension) to get these files:
file1.vcf
file2.vcf
file3.vcf
This is the script I have tried:
iFILES=/file/path/*.gz
for i in $iFILES;
do gunzip -k $i > /get/in/this/dir/"${i##*/}"
done
Since you have to perform three operations on the output path name:
1. remove the directory part
2. remove the prefix subset_
3. remove the redundant extension .vcf
it's hard to accomplish with only one command.
Following is a modified version. Be CAREFUL when trying it; I didn't test it thoroughly on my computer.
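for i in /file/path/*.gz
do
    # build the output file name, e.g. subset_file1.vcf.vcf.gz -> file1.vcf
    o=$(echo "${i##*/}" | sed 's/.*_\(.*\)\(\.[a-z]\{3\}\)\{2\}.*/\1\2/g')
    # -c writes the decompressed data to stdout so the redirect captures it
    gunzip -c "$i" > "/get/in/this/dir/$o"
done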
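The three renaming steps can also be done with bash parameter expansion alone, without sed. This is a sketch, assuming every input follows the subset_NAME.vcf.vcf.gz pattern; unpack_vcfs is a hypothetical name and the directories are the question's placeholders.

```shell
#!/usr/bin/env bash
# unpack_vcfs is hypothetical. Usage: unpack_vcfs OUTDIR FILE.gz...
unpack_vcfs() {
    local outdir=$1
    shift
    local i name
    for i in "$@"; do
        name=${i##*/}          # 1. strip the directory part
        name=${name#subset_}   # 2. strip the subset_ prefix
        name=${name%.vcf.gz}   # 3. drop the redundant .vcf together with .gz
        # -c writes to stdout, so the original .gz file is left in place.
        gunzip -c -- "$i" > "$outdir/$name"
    done
}
```

For the question's layout the call would be unpack_vcfs /get/in/this/dir /file/path/*.gz.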

bash: using unzip to change filename within archive

Given the following archive file20150101.zip, which contains only one file, file1115.txt, how would I go about renaming the file inside to match the name of the archive (file1115.txt to file20150101.txt)?
I've tried, unsuccessfully, to use unzip -u (most likely due to a misunderstanding of how it should work).
I doubt you can do this directly. You probably need to recreate the zip archive.
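syncname() {
    local zip=$1
    local bare=${1%.zip}
    local new=$bare.txt
    unzip "$zip"            # extract the single member (e.g. file1115.txt)
    mv file*.txt "$new"     # rename it to match the archive name
    rm "$zip"               # drop the old archive
    zip "$zip" "$new"       # recreate it with the renamed file
}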
syncname file20150101.zip
If you want it to be more robust (for example, to handle zips with multiple files), you could parse the date stamp in the original name (e.g. with date) and find the matching .txt file for that stamp.
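The function above extracts into the current directory, so mv file*.txt can pick up unrelated files that happen to be there. A variant sketch that works in a scratch directory instead, assuming Info-ZIP zip/unzip and exactly one .txt member as in the question; syncname_tmp is a hypothetical name.

```shell
#!/usr/bin/env bash
# syncname_tmp is hypothetical; assumes exactly one .txt member in the zip.
syncname_tmp() {
    local zip=$1
    local base=${zip##*/}
    local new=${base%.zip}.txt
    local tmp
    tmp=$(mktemp -d) || return 1
    # -d extracts into the scratch directory, so nothing in the cwd is touched.
    unzip -q "$zip" -d "$tmp" || return 1
    mv "$tmp"/*.txt "$tmp/$new"
    rm -- "$zip"
    # -j stores the file without the scratch directory path.
    zip -q -j "$zip" "$tmp/$new"
    rm -rf -- "$tmp"
}
```

Calling syncname_tmp file20150101.zip should leave an archive whose only member is file20150101.txt.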

How to extract first few lines from a csv file inside a tar file without extracting it in linux?

I have a tar file which has lot of csv files in it.
How can I get the first few lines of each csv file without extracting it?
I tried:
$(tar -Oxf $tarfile $file | head -n "$NL") >> cdn.log
But I got this error:
time(http:index: command not found
This is some line in one of the csv files. Similar errors are reported for all csv files...
Any ideas?
Using -O, you can tell tar to extract a file to standard output instead of to a file. So you should be able to first use tar tf <YOUR_FILE> to list the files in the archive, filter the list with grep to find the CSV files, and then for each file use tar xf <YOUR_FILE> <NAME_OF_CSV> -O | head to print the file's beginning to stdout. This may be a bit inefficient, since you unpack the archive as many times as there are CSV files, but it should work.
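The list-then-extract approach can be sketched as a small function, assuming the archive is an uncompressed tar and member names contain no newlines; head_csvs is a hypothetical name. As for the error in the question: wrapping the pipeline in $( ) makes the shell try to run the pipeline's output (the first CSV line) as a command, which is why it reported time(http:index: command not found. Dropping the $( ) is enough.

```shell
#!/usr/bin/env bash
# head_csvs is hypothetical. Usage: head_csvs ARCHIVE.tar [N]
head_csvs() {
    local tarfile=$1 n=${2:-5}
    local f
    # -t lists member names; keep only those ending in .csv.
    tar -tf "$tarfile" | grep '\.csv$' |
    while IFS= read -r f; do
        printf '==> %s <==\n' "$f"
        # -x -O extracts this one member to stdout; head stops after n lines.
        tar -xOf "$tarfile" "$f" | head -n "$n"
    done
}
```

Redirecting the whole call, as in head_csvs "$tarfile" "$NL" >> cdn.log, collects everything into one log.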
You can use perl and its Archive::Tar module. Here is a one-liner that extracts the first two lines of each one:
perl -MArchive::Tar -E '
for (Archive::Tar->new(shift)->get_files) {
say (join qq|\n|, (split /\n/, $_->get_content, 3)[0..1])
}
' file.tar
It assumes that the tar file contains only text files and that they are all CSV. Otherwise you will have to filter the list down to the files you want.
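One way to do that filtering inside perl is sketched below, using the is_file and full_path methods of Archive::Tar::File; it assumes Archive::Tar is installed, and head_csvs_perl plus its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# head_csvs_perl is hypothetical. Usage: head_csvs_perl ARCHIVE.tar [N]
head_csvs_perl() {
    perl -MArchive::Tar -E '
        my ($file, $n) = @ARGV;
        my $tar = Archive::Tar->new($file) or die "cannot read $file\n";
        for my $f ($tar->get_files) {
            # Skip directories, links, etc., and anything not named *.csv.
            next unless $f->is_file && $f->full_path =~ /\.csv\z/;
            my @lines = split /\n/, $f->get_content;
            my $max = $n < @lines ? $n : scalar @lines;
            say for @lines[0 .. $max - 1];
        }
    ' "$1" "${2:-2}"
}
```

Note that get_content still reads each whole member into memory, so this suits archives of modest size.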

Combine files in one

Currently I am in this directory:
/data/real/test
When I run ls -lt at the command prompt, I get something like this:
REALTIME_235000.dat.gz
REALTIME_234800.dat.gz
REALTIME_234600.dat.gz
REALTIME_234400.dat.gz
REALTIME_234200.dat.gz
How can I consolidate the above five .dat.gz files into one .dat.gz file in Unix without any data loss? I am new to Unix and not sure about this. Can anyone help me?
Update:
I am not sure which is the best way: should I unzip each of the five files and then combine them into one, or
combine all five .dat.gz files into one .dat.gz directly?
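The shell expands REALTIME*.dat.gz in sorted (alphabetical) order, which for these names is also timestamp order, so the following command concatenates the decompressed contents in that order and recompresses them: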
zcat REALTIME*.dat.gz | gzip > out.dat.gz
Update
To order by modification time instead (ls -t lists the newest file first), use:
zcat $(ls -t REALTIME*.dat.gz) | gzip > out.dat.gz
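To confirm that the decompress-and-recompress route loses nothing, the merged stream can be compared byte for byte with the inputs. A sketch, assuming bash (for process substitution) and GNU zcat/gzip; merge_recompress is a hypothetical name.

```shell
#!/usr/bin/env bash
# merge_recompress is hypothetical. Usage: merge_recompress OUT.gz IN.gz...
merge_recompress() {
    local out=$1
    shift
    # Inputs are decompressed in the order given, then recompressed as one stream.
    zcat -- "$@" | gzip > "$out" || return 1
    # Exit status 0 only if the merged data matches the inputs exactly.
    cmp -s <(zcat -- "$out") <(zcat -- "$@")
}
```

Called as merge_recompress out.dat.gz REALTIME_*.dat.gz, the output name does not match the input glob, so the result is never read back into itself.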
What do you want to happen when you gunzip the result? If you want the five files to reappear, then you need to use something other than the gzip (.gz) format. You would need to use either tar (.tar.gz) or zip (.zip).
If you want the result of the gunzip to be the concatenation of the gunzip of the original files, then you can simply cat (not zcat or gzcat) the files together. gunzip will then decompress them to a single file.
cat [files in whatever order you like] > combined.gz
Then:
gunzip combined.gz
will produce an output that is the concatenation of the gunzip of the original files.
The suggestion to decompress them all and then recompress them as one stream is completely unnecessary.
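A small demonstration of the plain cat approach, run in a scratch directory with made-up file names. It relies on gzip allowing several compressed members back to back in one file, which gunzip decompresses as one continuous stream.

```shell
#!/usr/bin/env bash
# Demo only: the directory and file names are made up for illustration.
cd "$(mktemp -d)" || exit 1
printf 'first\n'  | gzip > REALTIME_234200.dat.gz
printf 'second\n' | gzip > REALTIME_234400.dat.gz
# Concatenate the compressed files as they are; nothing is decompressed here.
cat REALTIME_234200.dat.gz REALTIME_234400.dat.gz > combined.gz
# gunzip reads both gzip members and writes their contents one after another.
gunzip combined.gz
cat combined        # prints "first" then "second"
```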
