Confusion while extracting required part of a file using awk

I have a script that uses awk, sed, grep and other shell features.
I am stuck at one point and need your help.
This is the input file for my problem:
udit@udit-Dabba ~/ah $ cat decrypt.txt
60 00 00 00 00 17 3a 20 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 02 *00 00 e0 f9 6a 61 61 6e
65 6b 61 68 61 6e 67 61 79 65 77 6f 64 69 6e* 00
00 00 03 29
My goal is to extract 00 00 e0 f9 6a 61 61 6e
65 6b 61 68 61 6e 67 61 79 65 77 6f 64 69 6e from the file above
(the part marked between the *'s).
The *'s are shown here only to mark the region; they are not actually present in the file.
The last five units of the file are:
00 00 00 03 29
The 00s are pad bytes and 03 specifies the pad length.
Here is the part of the script that extracts the required part:
size=`wc -w decrypt.txt`
padlen=3 // calculated by some other mechanism
awk -v size=$size -v padlen=$padlen 'BEGIN {RS=" ";ORS=" ";} {if (NR > 40
&& NR <=size-padlen-2) print $0}' decrypt.txt | sed '1,1s/ //'
output :
00 00 e0 f9 6a 61 61 6e
65 6b 61 68 61 6e 67 61 79 65 77 6f 64 69
My problem:
the last unit, 6e, is missing.
I also tried this in the terminal.
With size=68 and padlen=3, the loop should cover NR>40 to NR<=63.
udit@udit-Dabba ~/ah $ awk 'BEGIN {RS=" ";ORS=" ";} {if (NR > 40 && NR <= 65)
print $0}' decrypt.txt | sed '1,1s/ //'
00 00 e0 f9 6a 61 61 6e
65 6b 61 68 61 6e 67 61 79 65 77 6f 64 69 6e 00
00
It works when the loop goes up to 65, so it should also work up to 63.
udit@udit-Dabba ~/ah $ awk 'BEGIN {RS=" ";ORS=" ";} {if (NR > 40 && NR <= 64)
print $0}' decrypt.txt | sed '1,1s/ //'
00 00 e0 f9 6a 61 61 6e
65 6b 61 68 61 6e 67 61 79 65 77 6f 64 69 6e
But what is this? When I decrease 65 to 64, two 00 units are lost. Why is this happening?
I also tried this one, but could not find a reason for the weird output:
udit@udit-Dabba ~/ah $ awk 'BEGIN {RS="[ \n]";ORS=" ";} {if (NR > 40
&& NR <=65)print $0}' decrypt.txt | sed '1,1s/ //'
0002 00 00 e0 f9 6a 61 61 6e 65 6b 61 68 61 6e 67 61 79 65 77 6f 64
Please help me out.
I may have explained the problem in more detail than required, but I really need this.
I am new to shell and awk, so there may be a silly mistake that I could not find.
Thanks in advance.
EDIT:
60 00 00 00 00 17 3a 20 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 02
These are the fixed 40 units of the IPv6 header; they will always remain the same.
The portion between the *'s is of variable length, which is why I need to work this way; otherwise it would have been a simple task.

_padlen=3 _length=23
awk '{
  # The payload occupies fields NF-l-p-1 .. NF-p-2
  # (the trailer is p pad bytes plus 2 more units).
  for (i = NF - l - p - 1; i <= NF - p - 2; i++)
    printf "%s", ($i (i < NF - p - 2 ? OFS : ORS))
}' l="$_length" p="$_padlen" RS= ORS='\n' decrypt.txt

I made some small changes to the code and was able to get up to 6e:
size=68; padlen=3 ;awk -v size=$size -v padlen=$padlen 'BEGIN {RS=" ";ORS=" ";} {if (NR > 40 && NR <=size-padlen-1) print $0}' decrypt.txt | sed '1,1s/ //'
I set size to 68 by hand because wc -w prints both the count and the file name, and the file name has to be removed before the value is passed to the awk script.
Note: I haven't fully understood your requirement.
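The stray file name from wc can be avoided by letting wc read the file on standard input. A minimal sketch, assuming a POSIX shell; sample.txt is a made-up stand-in for decrypt.txt:

```shell
# Hypothetical sample file standing in for decrypt.txt.
tmp=$(mktemp -d)
printf '60 00 00\n03 29\n' > "$tmp/sample.txt"

# With a file argument, wc prints the count followed by the file name.
with_name=$(wc -w "$tmp/sample.txt")

# Reading from stdin leaves only the count; the arithmetic expansion
# also strips the leading blanks that some wc implementations emit.
size=$(( $(wc -w < "$tmp/sample.txt") ))

echo "with file name: $with_name"
echo "count only:     $size"
rm -rf "$tmp"
```

The value in size can then be passed to awk with -v size="$size" without any further trimming.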

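If I understand the problem correctly (discard the first 40 values and the last n values, where n is the padding + 2, in this case 3 + 2 = 5), this might work:
header=40; padding=5
tr '\n' ' ' <decrypt.txt |
sed -r 's/\s+/ /g;s/^(\S+\s+){'"$header"'}//;s/(\S+\s*){'"$padding"'}$//'
The trick is to unroll the data onto one line and then pick the bits you want. Newlines are turned into spaces rather than deleted so that the last value of one line does not fuse with the first value of the next. The -r option and the \s and \S escapes assume GNU sed.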
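The unrolling idea also works with tr and awk. With RS=" " the newline is not a record separator, so a line-final unit and the first unit of the next line can end up in the same record, and NR no longer matches the unit index. Putting one unit per line avoids that. A sketch under the layout described in the question (40 header units, then the payload, then padlen pad bytes plus 2 trailing units); the temporary file is only there to make the example self-contained:

```shell
tmp=$(mktemp -d)
cat > "$tmp/decrypt.txt" <<'EOF'
60 00 00 00 00 17 3a 20 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 02 00 00 e0 f9 6a 61 61 6e
65 6b 61 68 61 6e 67 61 79 65 77 6f 64 69 6e 00
00 00 03 29

EOF

header=40
padlen=3

# tr puts one unit per line (and squeezes blank lines), so the awk
# line number equals the unit index. awk buffers the units and prints
# those between the header and the trailer (padlen + 2 units).
payload=$(tr -s ' ' '\n' < "$tmp/decrypt.txt" |
  awk -v header="$header" -v padlen="$padlen" '
    { unit[NR] = $0 }
    END {
      last = NR - padlen - 2
      for (i = header + 1; i <= last; i++)
        printf "%s%s", unit[i], (i < last ? " " : "\n")
    }')

echo "$payload"
rm -rf "$tmp"
```

Because the end of the payload is computed from the total unit count, the payload length does not have to be known in advance.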

Related

Vim: calling xxd with system command in substitution results in conversion error

Background is that I have a log file that contains hex dumps that I want to convert with xxd to get that nice ASCII column that shows possible strings in the binary data.
The log file format looks like this:
My interesting hex dump:
00 53 00 6f 00 6d 00 65 00 20 00 74 00 65 00 78
00 74 00 20 00 65 00 78 00 61 00 6d 00 70 00 6c
00 65 00 20 00 75 00 73 00 69 00 6e 00 67 00 20
00 55 00 54 00 46 00 2d 00 31 00 36 00 20 00 69
00 6e 00 20 00 6f 00 72 00 64 00 65 00 72 00 20
00 74 00 6f 00 20 00 67 00 65 00 74 00 20 00 30
00 78 00 30 00 30 00 20 00 62 00 79 00 74 00 65
00 73 00 2e
Visually selecting the hex dump and running xxd -r -p followed by xxd -g1 on the result does exactly what I'm aiming for.
However, since there are quite a few dumps to convert, I would rather automate the process.
So I'm using the following substitute command to do the conversion:
:%s/\(\x\{2\} \?\)\{16\}\_.*/\=system('xxd -g1',system('xxd -r -p',submatch(0)))
The expression matches the entire hex dump in the log file. The match is sent to xxd -r -p as stdin and its output is used as stdin for xxd -g1.
Well, that's the idea at least.
The thing is that the above almost works. It produces the following result:
My interesting hex dump:
00000000: 01 53 01 6f 01 6d 01 65 01 20 01 74 01 65 01 78 .S.o.m.e. .t.e.x
00000010: 01 74 01 20 01 65 01 78 01 61 01 6d 01 70 01 6c .t. .e.x.a.m.p.l
00000020: 01 65 01 20 01 75 01 73 01 69 01 6e 01 67 01 20 .e. .u.s.i.n.g.
00000030: 01 55 01 54 01 46 01 2d 01 31 01 36 01 20 01 69 .U.T.F.-.1.6. .i
00000040: 01 6e 01 20 01 6f 01 72 01 64 01 65 01 72 01 20 .n. .o.r.d.e.r.
00000050: 01 74 01 6f 01 20 01 67 01 65 01 74 01 20 01 30 .t.o. .g.e.t. .0
00000060: 01 78 01 30 01 30 01 20 01 62 01 79 01 74 01 65 .x.0.0. .b.y.t.e
00000070: 01 73 01 2e .s..
All 00 bytes have mysteriously transformed into 01.
It should have produced the following:
My interesting hex dump:
00000000: 00 53 00 6f 00 6d 00 65 00 20 00 74 00 65 00 78 .S.o.m.e. .t.e.x
00000010: 00 74 00 20 00 65 00 78 00 61 00 6d 00 70 00 6c .t. .e.x.a.m.p.l
00000020: 00 65 00 20 00 75 00 73 00 69 00 6e 00 67 00 20 .e. .u.s.i.n.g.
00000030: 00 55 00 54 00 46 00 2d 00 31 00 36 00 20 00 69 .U.T.F.-.1.6. .i
00000040: 00 6e 00 20 00 6f 00 72 00 64 00 65 00 72 00 20 .n. .o.r.d.e.r.
00000050: 00 74 00 6f 00 20 00 67 00 65 00 74 00 20 00 30 .t.o. .g.e.t. .0
00000060: 00 78 00 30 00 30 00 20 00 62 00 79 00 74 00 65 .x.0.0. .b.y.t.e
00000070: 00 73 00 2e .s..
What am I not getting here?
Of course I can use macros and other ways of doing this, but I want to understand why my substitution command doesn't do what I expect.
Edit:
For anyone who wants to achieve the same thing, here is the substitution expression that works on an entire file. The expression above was only for testing purposes, using the log file example shown earlier.
The one below performs a correct conversion; it is modified based on the information Kent provided in his answer.
:%s/\(\(\x\{2\} \)\{16\}\_.\)\+/\=system('xxd -p -r | xxd -g1',submatch(0))

Very likely, the problem is the string conversion in system(). Vim converts the input into a string, and does the same with the output of your first xxd command. A Vim string cannot hold a NUL byte, and the documentation for system() states that NUL characters in the result are replaced with SOH (0x01), which matches the 00 to 01 change you are seeing.
You can try extracting the hex part into a file, then:
xxd -r -p theFile|vim -
If you then call system('xxd -g1', alltext), you will also get something other than 00.
Chaining two system() calls does not work the same way as a pipe (xxd ...|xxd ...), because the intermediate data passes through a Vim string.
If you want to fix your :s command, you need to call systemlist() for the first xxd call to get the data in binary format, then pass it to the second xxd:
:%s/\(\x\{2\} \?\)\{16\}\_.*/\=system('xxd -g1',systemlist('xxd -r -p',submatch(0)))
The command above will produce the 00s, since there is no string conversion.
However, when working with data other than plain strings, it is often a lot easier to use filters instead of calling system(). For your example:
2,$!xxd -r -p|xxd -g1

How to create a gzip file w/ FEXTRA & FCOMMENT fields

I have a need to test if a program that I'm writing is parsing the gzip header correctly, and that includes reading the FEXTRA, FNAME, and FCOMMENT fields. Yet it seems that gzip doesn't support creating archives with the FEXTRA and FCOMMENT fields -- only FNAME. Are there any existing tools which can do all three of these?

The Perl module IO::Compress::Gzip optionally lets you set the three fields you are interested in. (Fair disclosure: I am the author of the module.)
Here is some sample code that sets FNAME to "filename", FCOMMENT to "This is a comment" and creates an FEXTRA field with a single subfield with ID "ab" and value "cde".
use IO::Compress::Gzip qw(gzip $GzipError);
gzip \"payload" => "/tmp/test.gz",
    Name       => "filename",
    Comment    => "This is a comment",
    ExtraField => [ "ab" => "cde" ]
  or die "Cannot create gzip file: $GzipError";
And here is a hexdump of the file it created.
00000000 1f 8b 08 1c cb 3b 3a 5a 00 03 07 00 61 62 03 00 |.....;:Z....ab..|
00000010 63 64 65 66 69 6c 65 6e 61 6d 65 00 54 68 69 73 |cdefilename.This|
00000020 20 69 73 20 61 20 63 6f 6d 6d 65 6e 74 00 2b 48 | is a comment.+H|
00000030 ac cc c9 4f 4c 01 00 15 6a 2c 42 07 00 00 00 |...OL...j,B....|
0000003f
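The fourth byte of the dump (1c) is the FLG field. Per RFC 1952, bit 2 (0x04) is FEXTRA, bit 3 (0x08) is FNAME and bit 4 (0x10) is FCOMMENT, so 0x1c means all three are set. A small sketch that decodes a flag byte in a POSIX shell; the function name gzip_flags is made up for illustration:

```shell
# Print the names of the RFC 1952 header flags set in a FLG byte.
gzip_flags() {
  flg=$(( $1 ))
  out=
  if [ $(( flg & 0x01 )) -ne 0 ]; then out="$out FTEXT"; fi
  if [ $(( flg & 0x02 )) -ne 0 ]; then out="$out FHCRC"; fi
  if [ $(( flg & 0x04 )) -ne 0 ]; then out="$out FEXTRA"; fi
  if [ $(( flg & 0x08 )) -ne 0 ]; then out="$out FNAME"; fi
  if [ $(( flg & 0x10 )) -ne 0 ]; then out="$out FCOMMENT"; fi
  # Drop the leading space left by the first appended name.
  echo "${out# }"
}

flags=$(gzip_flags 0x1c)
echo "$flags"
```

A parser under test can be checked the same way: a file produced by plain gzip with only a stored name would carry FLG 0x08.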

unable to unzip zip file in linux centos

I am unable to unzip a file on CentOS Linux. I get the following error:
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.

As you are mentioning jar in your comments we can consider this a programming question ;-)
First of all you should try to validate your file. If available you can even compare the checksum provided for this file and / or the filesize with the location you downloaded it from.
To verify the zip file on a low level you can use this command:
hexdump -C -n 100 file.zip
This will show you the first 100 bytes of the zip's structure, which will look similar to this:
00000000 50 4b 03 04 0a 00 00 00 00 00 88 43 65 47 11 7a |PK.........CeG.z|
00000010 39 1e 15 00 00 00 15 00 00 00 0e 00 1c 00 66 69 |9.............fi|
00000020 6c 65 31 69 6e 7a 69 70 2e 74 78 74 55 54 09 00 |le1inzip.txtUT..|
00000030 03 0f 05 3b 56 2f 05 3b 56 75 78 0b 00 01 04 e8 |...;V/.;Vux.....|
00000040 03 00 00 04 e8 03 00 00 54 68 69 73 20 69 73 20 |........This is |
00000050 61 20 66 69 6c 65 0a 1b 5b 31 37 7e 0a 50 4b 03 |a file..[17~.PK.|
00000060 04 0a 00 00 |....|
The first two bytes of the file have to be PK; if not, the file is invalid. Some bytes later you will find the name of the first file stored. In this example it is file1inzip.txt.
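The same check can be scripted. A minimal sketch, assuming a POSIX shell with od available; the sample file is generated on the spot and carries only the four-byte local file header signature (PK\003\004) followed by filler, so it is not a complete archive:

```shell
tmp=$(mktemp -d)
printf 'PK\003\004rest-of-file' > "$tmp/sample.zip"

# Read the first four bytes as hex, without offsets or spaces.
magic=$(od -An -tx1 -N4 "$tmp/sample.zip" | tr -d ' \n')

if [ "$magic" = "504b0304" ]; then
  verdict="starts with a zip local file header"
else
  verdict="does not start like a zip file"
fi

echo "$magic: $verdict"
rm -rf "$tmp"
```

A file can pass this check and still be truncated: the error in the question concerns the end-of-central-directory record, which sits at the end of the archive, so comparing the file size or checksum with the download source remains worthwhile.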

DNS txt record invalid packet, FORMERR

I'm having trouble with my homemade "for fun" nameserver. It's been a couple of months since I updated it, so I'm a bit rusty and thought I'd ask here to see if someone else can spot what's wrong. I'm getting a FORMERR when asking for a TXT record, and the same problem occurs on different domains, so there's probably something wrong in the packet formatting. Anyone?
dig txt ffffff.com @ns1.ffffff.com
;; Got bad packet: FORMERR
1024 bytes
ce bf 84 00 00 01 00 01 00 02 00 00 06 66 66 66 .............fff
66 66 66 03 63 6f 6d 00 00 10 00 01 c0 0c 00 10 fff.com.........
00 01 00 00 02 58 00 13 12 57 65 6c 63 6f 6d 65 .....X...Welcome
20 74 6f 20 66 66 66 66 66 66 66 00 c0 0c 00 02 .to.fffffff.....
00 01 00 00 02 58 00 10 03 6e 73 31 06 66 66 66 .....X...ns1.fff
66 66 66 03 63 6f 6d 00 c0 0c 00 02 00 01 00 00 fff.com.........
02 58 00 10 03 6e 73 32 06 66 66 66 66 66 66 03 .X...ns2.ffffff.
63 6f 6d 00 00 00 00 00 00 00 00 00 00 00 00 00 com.............

In the example supplied above, I had added an incorrect 00 (null terminator) at the end of the TXT string. After removing the null terminator from the TXT records, they now work on my nameserver.
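In DNS wire format, the RDATA of a TXT record is a sequence of character-strings, each one a single length byte followed by that many bytes, with no terminator. In the packet above RDLENGTH is 00 13 (19), which is the length byte 12 (18) plus the 18 characters of the text; the extra 00 therefore falls outside the RDATA and shifts every record after it. A sketch of the size arithmetic in shell, using the text from the dump:

```shell
txt='Welcome to fffffff'

# Length byte: number of bytes in the string (ASCII assumed here).
strlen=${#txt}

# RDLENGTH covers the length byte plus the string, nothing else.
rdlength=$(( strlen + 1 ))

summary=$(printf 'length byte 0x%02x, RDLENGTH 0x%04x' "$strlen" "$rdlength")
echo "$summary"
```

Both values agree with the bytes 00 13 12 in the dump, which is why the record parses once the trailing 00 is dropped.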

Extract last and second last strings of a file in shell variables

Although this looks similar to my previous post, the purpose here is different.
udit@udit-Dabba ~/ah $ cat decrypt.txt
60 00 00 00 00 17 3a 20 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 02 *00 00 e0 f9 6a 61 61 6e
65 6b 61 68 61 6e 67 61 79 65 77 6f 64 69 6e* 00
00 00 03 29
I want to extract the last string of the file (here it is 29) into a shell variable.
I tried this:
size=`wc -w encrypt.txt`
awk -v size=$size 'BEGIN {RS=" ";ORS=" ";}' {if (NR>size-1 &&
NR < size+1)print $0}' decrypt.txt
Output :
29
But when I changed the file slightly:
udit@udit-Dabba ~/ah $ cat decrypt.txt
60 00 00 00 00 17 3a 20 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 02 *00 00 e0 f9 6a 61 61 6e
65 6b 61 68 61 6e 67 61 79 65 77 6f 64 69 6e* 00
65 6b 61 68 61 6e 67 61 00 00 03 29
Output :
03
Why is there a discrepancy between the results?
I am new to awk and shell features, so I am not sure whether this is the right way to do it.
I think there should be some variation of grep, sed, awk or another Linux command that solves my problem, but I am not aware of it.
Please guide me on this.
Thanks in advance.
Purpose:
Create two variables in a shell script that store the last and second-last strings of an input file.
Limitation:
Every input file contains a blank line at the end of the file.
(In the file shown above, the contents are followed by one more blank line, as if the ENTER key had been pressed. This cannot be changed because the file is generated by a C program at run time.)

grep -v "^$" file | tr " " "\n" | tail -n 2
The grep part may not be perfect and might need changing.
Edit
tr -s " " "\n" < file | tail -n 2
is a better solution; see Gordon Davisson's comment.
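To land the two values in shell variables, as the question asks, the pipeline output can be captured with command substitution. A sketch using the tr approach, with a shortened made-up sample file that ends in a blank line:

```shell
tmp=$(mktemp -d)
printf '6e 00\n00 00 03 29\n\n' > "$tmp/sample.txt"

# One string per line; -s squeezes the repeated newlines left by the
# trailing blank line, so the last two lines are the last two strings.
last_two=$(tr -s ' ' '\n' < "$tmp/sample.txt" | tail -n 2)

second_last=$(printf '%s\n' "$last_two" | head -n 1)
last=$(printf '%s\n' "$last_two" | tail -n 1)

echo "second last = $second_last, last = $last"
rm -rf "$tmp"
```

Since the strings are split on both spaces and newlines, the result does not depend on how many values happen to sit on the final line.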

To get the last field:
awk '{ if (NF > 0) { last = $NF } } END { print last }' "$@"
The second-last field is trickier when there is just one field on the last line (you then need the last field from the line before).
awk '{ if (NF > 0)
       {
           if (NF == 1) { lastbut1 = last; last = $1; }
           else         { lastbut1 = $(NF-1); last = $NF; }
       }
     }
     END { print lastbut1 " " last; }' "$@"
This produces a blank and the last value if the file contains only one value. It produces just a blank if there are no values at all.

If you consider the record separator to be space or newline, then you just need to keep the last 2 records.
awk -v 'RS=[ \n]+' '{prev2 = prev1; prev1 = $0} END {print prev2, prev1}' filename

FIRST="$(head -n 1 file)"
LAST="$(tail -n 1 file)"
LASTBUTONE="$(tail -n 2 file | head -n 1)"
Naturally, you can then extract the last field in a variety of ways:
echo "$ONEOFTHOSE" | gawk '{print $(NF)}'
echo "$ONEOFTHOSE" | sed -e 's/^.*[[:space:]]//'

Here's a tr/sed solution:
answers=$(tr -d '\n' <input_file | sed -r 's/.*(\S\S)\s*(\S\S)\s*$/\1 \2/')
echo "Last = ${answers#???} Penultimate = ${answers%???}"
Sed only:
answers=$(sed -r '1{h;d};H;${x;y/\n/ /;s/.*(\S\S)\s*(\S\S)\s*$/\1 \2/p};d' input_file)
echo "Last = ${answers#???} Penultimate = ${answers%???}"

If you have the 'rev' utility installed, the one-liner below is handy, assuming a space is the delimiter. It prints the last two fields of each line.
rev file | cut -d' ' -f1,2 | rev
