Print the first line of each file inside a tar.gz without extracting - linux

I'm looking for a command in order to print the first line of every file contained in a tar.gz archive, without extracting it.
Example:
tar -ztvf MyArchive.tar.gz
-rw-r--r-- root/root 3732541752 2020-04-04 03:24 FILE1.TXT
-rw-r--r-- root/root 90493394 2020-04-04 03:16 FILE2.TXT
-rw-r--r-- root/root 103294570 2020-04-03 21:06 FILE3.TXT
-rw-r--r-- root/root 16865694 2020-04-03 21:07 FILE4.TXT
-rw-r--r-- root/root 13176227988 2020-04-03 23:36 FILE5.TXT
I need to print the first line of each FILE*.TXT inside the tar.gz
How can I achieve this?

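You can do this with tar and a for loop:
for i in $(tar -ztf MyArchive.tar.gz | grep -i file)
do
tar -xzOf MyArchive.tar.gz "$i" | head -n 1
done
"tar -xzOf MyArchive.tar.gz filename" writes the content of one file inside the tar.gz to standard output instead of to disk. Note that the loop rereads the archive once per file, which is slow for large archives, and that it breaks on file names containing whitespace.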

Try this:
tar zxf MyArchive.tar.gz --to-command="head -n 1"
This command takes files in the tar individually and feeds them into the command "head -n 1".
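If the file name is wanted next to each first line, GNU tar exports the name of the current member to the command as the TAR_FILENAME environment variable. A minimal sketch, assuming GNU tar (the helper name first_lines is made up for illustration):

```shell
# first_lines ARCHIVE
# Print "member name: first line" for every regular file in a gzip-compressed
# tar archive, reading the archive once and writing nothing to disk.
# Assumes GNU tar: --to-command and TAR_FILENAME are GNU extensions.
first_lines() {
    tar -xzf "$1" --to-command='printf "%s: " "$TAR_FILENAME"; head -n 1'
}

# Usage:
#   first_lines MyArchive.tar.gz
```

The command string is single-quoted so that $TAR_FILENAME is expanded by the shell tar starts for each member, not by the calling shell.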

Related

How to tar a branch of file tree?

I currently have some files and directories at this path:
/var/tmp/mydir/
I want to tar the whole path, excluding any other content in 'var' and 'tmp'.
Example:
$ ls /var
tmp
dir1 *(exclude)*
file1 *(exclude)*
$ ls /var/tmp
mydir
dir2 *(exclude)*
file2 *(exclude)*
$ ls /var/tmp/mydir
tarme1
tarme2
tarme3
In this case, I want to tar the directory tree /var/tmp/mydir and the content of 'mydir'.
Use tar -cf <archive_name>.tar /var/tmp/mydir which will give you what you need.
Use man tar to get more help (should be quite easy to understand).
If you want to modify your path some other way consider using -C switch. From man:
-C, --directory DIR
change to directory DIR
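As a sketch of how -C helps here: it makes tar change into /var before reading tmp/mydir, so member names start at tmp/ and the siblings of tmp and mydir are never visited. The helper name tar_branch is hypothetical:

```shell
# tar_branch ARCHIVE BASE_DIR SUBPATH
# Archive only BASE_DIR/SUBPATH, storing member names relative to BASE_DIR.
tar_branch() {
    tar -cf "$1" -C "$2" "$3"
}

# Usage for the layout in the question:
#   tar_branch backup.tar /var tmp/mydir
#   tar -tf backup.tar        # list the stored member names
```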
From /var, run
tar -c --recursion --file backup.tar tmp/mydir
Then
tar -tvf backup.tar
gives me:
drwxrwxr-x ssam/ssam 0 2016-05-02 12:02 tmp/mydir/
-rw-rw-r-- ssam/ssam 0 2016-05-02 12:02 tmp/mydir/tarme3
-rw-rw-r-- ssam/ssam 0 2016-05-02 12:02 tmp/mydir/tarme1
-rw-rw-r-- ssam/ssam 0 2016-05-02 12:02 tmp/mydir/tarme2
which is what you need. You can extract/restore it using
tar -xf backup.tar -C /var
Remember that this will overwrite any existing files in /var/tmp/mydir.

How to delete HDFS folder having windows special characters (^M) in the name

I wrote a shell script on Windows 7 to create HDFS folders and ran it on a Linux server. The HDFS folders were created, but with the special character ^M (probably a carriage return) at the end of each name. It doesn't show up on the Linux terminal, but I can see it when the 'ls' output is redirected to a file.
I should have run dos2unix before running this script. However, now I am not able to delete the folders with ^M. Could someone assist with how to delete these folders?
Just a supplementary answer to @SachinJ's.
TL;DR
$ hdfs dfs -rm -r -f $(hdfs dfs -ls /path/to/dir | sed '<LINE_NUMBER>q;d' | awk '{print $<FILE_NAME_COLUMN_NUMBER>}')
Replace <LINE_NUMBER> with the line number of the file you want to delete in the output of hdfs dfs -ls /path/to/dir, and <FILE_NAME_COLUMN_NUMBER> with the number of the column that holds the file name.
Here is an example.
Details
Suppose your HDFS directory looks like this:
$ hdfs dfs -ls /path/to/dir
Found 5 items
drwxr-xr-x - test supergroup 0 2019-08-22 10:41 /path/to/dir/dir1
drwxr-xr-x - test supergroup 0 2019-07-11 15:35 /path/to/dir/dir2
drwxr-xr-x - test supergroup 0 2019-07-05 17:53 /path/to/dir/dir3
drwxr-xr-x - test supergroup 0 2019-08-22 11:28 /path/to/dir/dirtodelete
drwxr-xr-x - test supergroup 0 2019-07-26 11:07 /path/to/dir/dir4
When you list it, the screen output looks fine.
But you can't select the directory:
$ hdfs dfs -ls /path/to/dir/dirtodelete
ls: `/path/to/dir/dirtodelete': No such file or directory
$ hdfs dfs -ls /path/to/dir/dirtodelete*
ls: `/path/to/dir/dirtodelete*': No such file or directory
What's more, when you redirect the ls output to a file and open it in vim, it looks like this:
$ hdfs dfs -ls /path/to/dir > tmp
$ vim tmp
Found 5 items
drwxr-xr-x - test supergroup 0 2019-08-22 10:41 /path/to/dir/dir1
drwxr-xr-x - test supergroup 0 2019-07-11 15:35 /path/to/dir/dir2
drwxr-xr-x - test supergroup 0 2019-07-05 17:53 /path/to/dir/dir3
drwxr-xr-x - test supergroup 0 2019-08-22 11:28 /path/to/dir/dirtodelete^M^M
drwxr-xr-x - test supergroup 0 2019-07-26 11:07 /path/to/dir/dir4
What is "^M"? It is a carriage return (CR).
Linux ends lines with \n (LF), while Windows uses \r\n (CRLF).
This problem occurs when the same file is edited on both Windows and Linux.
So we just need to use the correct file name, and then we can delete it. But the name can't be copied from the screen, because the CR is invisible there.
This is where the sed command helps.
The ls output is as follows:
$ hdfs dfs -ls /path/to/dir
Found 5 items
drwxr-xr-x - test supergroup 0 2019-08-22 10:41 /path/to/dir/dir1
drwxr-xr-x - test supergroup 0 2019-07-11 15:35 /path/to/dir/dir2
drwxr-xr-x - test supergroup 0 2019-07-05 17:53 /path/to/dir/dir3
drwxr-xr-x - test supergroup 0 2019-08-22 11:28 /path/to/dir/dirtodelete
drwxr-xr-x - test supergroup 0 2019-07-26 11:07 /path/to/dir/dir4
The file name is on line 5 (the "Found 5 items" header counts as line 1),
so hdfs dfs -ls /path/to/dir | sed '5q;d' extracts the line we need.
sed '5q;d' deletes every line it reads until it reaches line 5, prints that line, and quits, so it selects only the 5th line.
Then we can use awk to select the file name column. Columns are indexed from 1, so the column number is 8.
So just write the command:
$ hdfs dfs -ls /path/to/dir/ | sed '5q;d' | awk '{print $8}'
/path/to/dir/dirtodelete
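The same pipeline can be tried on canned text, without a cluster. In this sketch the function name nth_line_field and the shortened sample listing are illustrative only; the last name in the listing ends with a carriage return, like the one in the vim view above:

```shell
# nth_line_field LINE COLUMN
# Read text on stdin and print whitespace-separated field COLUMN of line LINE.
nth_line_field() {
    sed "${1}q;d" | awk -v col="$2" '{ print $col }'
}

cr=$(printf '\r')   # a literal carriage return
printf '%s\n' \
    'Found 2 items' \
    'drwxr-xr-x - test supergroup 0 2019-08-22 10:41 /path/to/dir/dir1' \
    "drwxr-xr-x - test supergroup 0 2019-08-22 11:28 /path/to/dir/dirtodelete$cr" |
    nth_line_field 3 8 | od -c   # od -c makes the trailing \r visible
```

Neither sed nor awk strips the carriage return, which is why the extracted name matches the real one.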
Then we can delete it.
$ hdfs dfs -rm -r -f $(hdfs dfs -ls /path/to/dir/ | sed '5q;d' | awk '{print $8}')
Sometimes a wildcard (rm filename*) may not work; in that case, use the option below.
rm -r "$(ls | sed '<LINE_NUMBER>q;d')"
Replace <LINE_NUMBER> with the line number of the entry in the output of the ls command.
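For a local file system, the line-number approach can be wrapped in a small function. This is only a sketch: remove_nth_entry is a hypothetical name, and it assumes that no entry name contains a newline.

```shell
# remove_nth_entry DIR N
# Remove the N-th entry of DIR, counted in the order ls prints them.
# The name is taken from the piped ls output as is, so an invisible trailing
# carriage return is preserved. Quoting the substitution keeps names with
# spaces intact.
remove_nth_entry() {
    ( cd "$1" && rm -r -- "$(ls | sed "${2}q;d")" )
}

# Usage:
#   ls | cat -v               # cat -v renders a carriage return as ^M
#   remove_nth_entry . 4
```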

how to get the number records along with file size in unix

I am trying to get the file details and the number of records for each file, along with its size.
I tried ls -lhtr 234*201406*.log.gz, which gives all the details except the record count. If I try ls -lhtr 234*201406*.log.gz | wc -l, it shows the number of files.
Present output:
-rw-r--r-- 1 jenkins tomcat 120M Jun 30 18:25 234_1404165601_20140630220001.log.gz
-rw-r--r-- 1 jenkins tomcat 144M Jun 30 19:24 234_1404169201_20140630230001.log.gz
I need output like:
-rw-r--r-- 1 jenkins tomcat 120M Jun 30 18:25 234_1404165601_20140630220001.log.gz 20000
Can you please help me get this? Thanks in advance.
You can use zcat (or gunzip -c) for printing # of lines from .gz files:
find . -name '*.gz' -exec bash -c 'f="$1"; du -h "$f"; zcat "$f" | wc -l' - '{}' \;
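To get the count on the same line as the ls details, as in the desired output, one option is a small loop. A sketch (ls_with_count is a made-up helper name):

```shell
# ls_with_count FILE...
# For each gzip file, print its `ls -lh` line followed by the number of
# lines in the decompressed data.
ls_with_count() {
    for f in "$@"; do
        printf '%s %s\n' "$(ls -lh -- "$f")" "$(gunzip -c -- "$f" | wc -l)"
    done
}

# Usage:
#   ls_with_count 234*201406*.log.gz
```

Each file is decompressed in full just to count its lines, so this takes a while on large logs.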

tar creates an empty folder, how do i get rid of it?

I have the following line
tar -c -v -z -f "$ARCHIVE_PATH/$3_$fileYear$fileMonth.tar.gz" -C "$ARCHIVE_PATH/tmp" .
where
$ARCHIVE_PATH = /opt/colorado/archive/
$3 = IMPORT
$fileYear = 2014
$fileMonth = 06
So the line creates a .tar.gz file called IMPORT_201406.tar.gz in /opt/colorado/archive/ from the files located in /opt/colorado/archive/tmp/.
However, when I use tar -ztvf "opt/colorado/archive/IMPORT_201406.tar.gz" I see this:
-rwxr-xr-x root/root 27 2014-06-04 14:20 ./afile.txt
drwxr-xr-x root/root 0 2014-06-04 14:08 ./opt/
drwxr-xr-x root/root 0 2014-06-04 14:08 ./opt/colorado/
drwxr-xr-x root/root 0 2014-06-04 14:08 ./opt/colorado/archive/
drwxrwxr-x [USER]/[USER] 0 2014-06-04 14:09 ./opt/colorado/archive/tmp/
-rwxr-xr-x root/root 712 2014-06-04 14:20 ./twofile.txt
-rwxr-xr-x root/root 383 2014-06-04 14:20 ./random.cvs
-rwxr-xr-x root/root 27 2014-06-04 14:20 ./helloworld.sh
-rwxr-xr-x root/root 7938 2014-06-04 14:20 ./helloworld.py
From my understanding, if I didn't have -C, /opt/colorado/archive/tmp/ would have been prepended to every file name, so adding -C tells tar to change to that directory first. I can see that in the list of files. However, why is the /opt/colorado/archive/tmp/ folder added, and is there a way to remove it?
It's adding the directory to the archive because you asked it to. Specifically, you told tar to archive the directory ., so that's what it does. Computers tend to be literal.
If you don't want the directory archived, you'll have to pass the filenames in the directory to tar. Here's one way to do that:
(cd "$ARCHIVE_PATH/tmp"; ls) |
tar -cvzf "$ARCHIVE_PATH/$3_$fileYear$fileMonth.tar.gz" -C "$ARCHIVE_PATH/tmp" -T-
Alternatively, you can execute tar from the directory with the files:
cd "$ARCHIVE_PATH/tmp" && tar -cvzf "$ARCHIVE_PATH/$3_$fileYear$fileMonth.tar.gz" *
I don't know why you get the directory included with its full path, rather than just ., and why it also includes the parent directories in the archive. The version of tar on my system (tar (GNU tar) 1.26) doesn't seem to do that.
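A variation on passing the file names explicitly is to let find select only the regular files, so that neither "." nor any subdirectory is archived. This is a sketch under assumptions, not the original approach: archive_files_only is a hypothetical name, and it relies on GNU tar's --null option and on a find that supports -maxdepth and -print0.

```shell
# archive_files_only SRC_DIR ARCHIVE
# Create a gzip-compressed tar of the regular files directly inside SRC_DIR,
# leaving out "." itself and any subdirectories.
archive_files_only() {
    ( cd "$1" && find . -maxdepth 1 -type f -print0 ) |
        tar -czf "$2" -C "$1" --null -T -
}

# Usage (write the archive outside SRC_DIR):
#   archive_files_only "$ARCHIVE_PATH/tmp" "$ARCHIVE_PATH/IMPORT_201406.tar.gz"
```

NUL-separated names keep files with spaces or newlines in their names intact.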
This command will generate a compressed tar file:
tar -czvf new.tar.gz my_dir_to_tar/
where new.tar.gz is the tar file name
and my_dir_to_tar/ is the directory to archive. Without -z, tar writes an uncompressed archive even if the name ends in .gz.

cp command to overwrite the destination file which is a symbolic link

Does cp command have any option to overwrite the destination file which is a symbolic link?
The problem is as follows:
[dthnguyen@dthnguyen test]$ ls -l
total 8
-rw-rw-r--. 1 dthnguyen dthnguyen 5 Feb 21 09:07 a.txt
lrwxrwxrwx. 1 dthnguyen dthnguyen 7 Feb 21 08:55 b.txt -> ./a.txt
-rw-rw-r--. 1 dthnguyen dthnguyen 5 Feb 21 08:55 c.txt
[dthnguyen@dthnguyen test]$ cp c.txt b.txt
After the copy, a.txt has the content of c.txt, and b.txt still links to a.txt. The expected result is that a.txt keeps its old content and b.txt becomes a new regular file with the same content as c.txt.
Tell cp to remove it first.
cp --remove-destination c.txt b.txt
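The effect can be checked in a throwaway directory. A small sketch, assuming GNU cp (--remove-destination is a GNU coreutils option); the variable name demo and the file contents are made up:

```shell
# Reproduce the layout from the question in a temporary directory.
demo=$(mktemp -d)
printf 'old\n' > "$demo/a.txt"
printf 'new\n' > "$demo/c.txt"
ln -s ./a.txt "$demo/b.txt"

# Unlink b.txt first, then copy: the link is replaced and its target is untouched.
cp --remove-destination "$demo/c.txt" "$demo/b.txt"

cat "$demo/a.txt"                                   # still the old content
[ -L "$demo/b.txt" ] || echo "b.txt is now a regular file"
```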
