How to insert an offset to hexdump with xxd?

Is there an easy way to add an offset to the hex dump generated by xxd ?
i.e instead of
0000: <data>
0004: <data>
0008: <data>
I should get
Offset+0000: <data>
Offset+0004: <data>
Offset+0008: <data>

xxd now appears to come with offset support, using -o [offset]
for example: xxd -o 0x07d20000 file.bin
My version of xxd on Gentoo Linux has it, but I dug deeper to help folks on other distros:
xxd V1.10 27oct98 by Juergen Weigert -- Do not rely on this xxd version string -- I have found source code carrying it that lacks offset support! So I tracked down where my binary comes from:
app-editors/vim-core-7.4.769 -- So apparently, as long as you have a modern VIM installed, you can reap the benefits of the added offset support; at least on Gentoo, but I'm steering you in the right direction.
If you find that your distro still ships an older xxd, consider manually compiling a newer VIM that you have confirmed has offset support.

This is what I am doing now. It works perfectly, but it's kind of a lame approach for just adding an offset :)
xxd file.bin | xxd -r -s 0x2e00000 | xxd -s 0x2e00000 > file.hex

Reading your comment below:
I want the first byte of binary file to be present at the offset. i.e Just add an offset without seeking.
makes me believe the only way to do this is parsing the output and modifying it in order to add the desired offset.
I didn't find anything in the docs that would allow this to be done easily, sorry. :(

If you can live with AWK here's a proof of concept:
$ xxd random.bin | gawk --non-decimal-data ' # <-- treat 0x123 like numbers
> {
> offset = 0x1000 # <-- your offset, may be hex or dec
>
> colon = index($0, ":") - 1
> x = "0x" substr($0, 1, colon) # <-- add 0x prefix to original offset ...
> sub(/^[^:]*/, "") # <-- ... and remove it from line
>
> new = x + offset # <-- possible thanks to --non-decimal-data
> printf("%0"colon"x", new) # <-- print updated offset ...
> print # <-- ... and the rest of line
> }'
0001000: ac48 df8c 2dbe a80c cd03 06c9 7c9d fe06 .H..-.......|...
0001010: bd9b 02a1 cf00 a5ae ba0c 8942 0c9e 580d ...........B..X.
0001020: 6f4b 25a6 6c72 1010 8d5e ffe0 17b5 8f39 oK%.lr...^.....9
0001030: 34a3 6aef b5c9 5be0 ef44 aa41 ae98 44b1 4.j...[..D.A..D.
^^^^
updated offsets (+0x1000)
I bet it would be shorter in Perl or Python, but AWK just feels more "script-ish" :-)
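For what it's worth, here is a rough Python sketch of the same idea that skips xxd altogether and formats the dump itself. It assumes xxd's default layout (16 bytes per line, bytes grouped in pairs); the function name hexdump_with_offset and its parameters are made up for this example, not part of any tool.

```python
def hexdump_with_offset(data, base=0, width=16):
    """Format data roughly like xxd, adding `base` to every displayed address."""
    groups = (width + 1) // 2            # bytes are shown in pairs
    hex_width = 2 * width + groups - 1   # hex digits plus separating spaces
    lines = []
    for pos in range(0, len(data), width):
        chunk = data[pos:pos + width]
        hex_part = " ".join(chunk[i:i + 2].hex() for i in range(0, len(chunk), 2))
        # Non-printable bytes are shown as '.', as xxd does
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + pos:08x}: {hex_part:<{hex_width}}  {ascii_part}")
    return lines


for line in hexdump_with_offset(b"ABCDEFGHIJKLMNOPQR", base=0x1000):
    print(line)
```

The first byte of the data is shown at the address given by base, without any seeking or padding of the file.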

Related

How to launch an app in memory on linux system

I read the encrypted file and decrypt it into a buffer. How can I run the decrypted code?
Where should I jump to? In DOS, I know to jump to buffer offset 0x100, which is the code entry point. What about Linux?
thank you
Xian
Try using tail -c (output last K bytes).
Full answer:
First convert from hex to dec (remove the "0x" before converting)
Then, find your input file size. Deduct 0x100
hex="100"
# convert hex to dec
dec=$(echo "obase=10; ibase=16; ${hex}" | bc)
# input_file size in bytes
file_size=$(stat --printf="%s" input_file)
truncated_file_size=$(($file_size - $dec))
tail -c $truncated_file_size input_file > new_file
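The steps above (convert the hex offset, then keep everything after it) could also be sketched in Python. This assumes the goal is simply to drop the first 0x100 bytes; drop_prefix is a made-up helper name, and the in-memory buffers stand in for real files.

```python
import io
import shutil


def drop_prefix(src, dst, skip):
    """Copy everything after the first `skip` bytes of src into dst."""
    src.seek(skip)
    shutil.copyfileobj(src, dst)


skip = int("100", 16)  # same hex-to-decimal step as the bc call

# In-memory demo; with real files, open them in "rb" and "wb" mode instead.
src = io.BytesIO(bytes(range(256)) * 2)  # 512 bytes of sample data
dst = io.BytesIO()
drop_prefix(src, dst, skip)
print(len(dst.getvalue()))
```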

Find HEX value in file and grep the following value

I have a 2GB file in raw format. I want to search for all appearance of a specific HEX value "355A3C2F74696D653E" AND collect the following 28 characters.
Example: 355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135
In this case I want the output: "323031312D30342D32365431343A34373A30322D31343A34373A3135" or better: 2011-04-26T14:47:02-14:47:15
I have tried with
xxd -u InputFile | grep '355A3C2F74696D653E' | cut -c 1-28 > OutputFile.txt
and
xxd -u -ps -c 4000000 InputFile | grep '355A3C2F74696D653E' | cut -b 1-28 > OutputFile.txt
But I can't get it working.
Can anybody give me a hint?
As you are using xxd it seems to me that you want to search the file as if it were binary data. I'd recommend using a more powerful programming language for this; the Unix shell tools assume there are line endings and that the text is mostly 7-bit ASCII. Consider using Python:
#!/usr/bin/python
import mmap
# Bytes of "5Z</time>"; Python 3 needs a bytes literal for mmap.find
needle = b"\x35\x5A\x3C\x2F\x74\x69\x6D\x65\x3E"
with open("file_to_search", "rb") as fd:
    haystack = mmap.mmap(fd.fileno(), length=0, access=mmap.ACCESS_READ)
    i = haystack.find(needle)
    while i >= 0:
        i += len(needle)
        # Print the 28 bytes that follow the match
        print(haystack[i : i + 28])
        i = haystack.find(needle, i)
If your grep supports -P parameter then you could simply use the below command.
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{28}'
323031312D30342D32365431343A
For 56 chars,
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{56}'
323031312D30342D32365431343A34373A30322D31343A34373A3135
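If you do go through a hex dump, turning the result back into readable text is a small extra step. Here is a minimal Python sketch, assuming the dump is one continuous hex string (as produced by xxd -ps without line breaks); text_after_marker is a hypothetical helper name.

```python
hex_dump = "355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135"
marker = "355A3C2F74696D653E"


def text_after_marker(hex_dump, marker, n_bytes=28):
    """Return the ASCII text of the n_bytes that follow marker in a hex string."""
    start = hex_dump.find(marker)
    # Skip matches that start in the middle of a byte (odd hex-digit position)
    while start >= 0 and start % 2:
        start = hex_dump.find(marker, start + 1)
    if start < 0:
        return None
    start += len(marker)
    payload = hex_dump[start:start + 2 * n_bytes]  # two hex digits per byte
    return bytes.fromhex(payload).decode("ascii")


print(text_after_marker(hex_dump, marker))
```

Note the factor of two: 28 bytes of data are 56 hex characters, which is why the first grep above returns only half of the timestamp.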
Why convert to hex first? See if this awk script works for you. It looks for the string you want to match on, then prints the next 28 characters. Special characters are escaped with a backslash in the pattern.
Adapted from this post: Grep characters before and after match?
I added some blank lines for readability.
VirtualBox:~$ cat data.dat
Thisis a test of somerandom characters before thestringI want5Z</time>2011-04-26T14:47:02-14:47:15plus somemoredata
VirtualBox:~$ cat test.sh
awk '/5Z<\/time>/ {
match($0, /5Z<\/time>/); print substr($0, RSTART + 9, 28);
}' data.dat
VirtualBox:~$ ./test.sh
2011-04-26T14:47:02-14:47:15
VirtualBox:~$
EDIT: I just realized something. The regular expression will need to be tweaked to be non-greedy, etc and between that and awk need to be tweaked to handle multiple occurrences as you need them. Perhaps some of the folks more up on awk can chime in with improvements as I am real rusty. An approach to consider anyway.

add text after keyword in bash / shell

I am in the middle of a migration for PTR records from MSoft and I am adjusting the zonefiles for my needs. I have already prepared the zone files so they look like the following:
snapo@jump:~/mike/10$ cat 21.128
102 [AGE:3630582] 1200 PTR host1.domain.company.local.
69 [AGE:3630774] 1200 PTR host2.domain.compan2.local.
[AGE:3630762] 1200 PTR host2.domain.company.local.
80 [AGE:3630774] 1200 PTR hostXX.domain.company.local.
so I have the filename as variable x and I want to achieve the output of the text file to be like this with awk (because I don't think that there is another way in bash). Please no php/python/perl answers, because the script will need to run on different systems and the only language that is supposed to be installed is bash.
Because this is a merge from multiple PTR zones to one, I would have to edit the zone file to look like this:
102.21.128 [AGE:3630582] 1200 PTR host1.domain.company.local.
69.21.128 [AGE:3630774] 1200 PTR host2.domain.compan2.local.
21.128 [AGE:3630762] 1200 PTR host2.domain.company.local.
80.21.128 [AGE:3630774] 1200 PTR hostXX.domain.company.local.
It is also possible that the first column is empty (no number); in that case the suffix should be added without a dot in front. Do you have an awk sample or any other sample (cut, grep, head, tail, sed)?
Command should replace the strings in the existing file or with a pipe in the output file > editedtextfile.txt or similar.
With sed:
sed 's/^[^[:space:]]\+/&.21.128/' filename
Treating the input as plain text has the advantage of keeping the formatting intact.
For the edited question, this can be expanded to
sed 's/^[^[:space:]]\+/&.21.128/; s/^[[:space:]]/21.128&/' filename
Addendum: If you don't want to repeat the inserted data in the code, then
sed 's/^[^[:space:]]*/&\n21.128/; s/^\n//; s/\n/./' filename
is another approach that uses a little more trickery: It inserts a marker before the new data, removes the marker if there is nothing before it and otherwise replaces it with a dot.
Addendum 2: Using shell variables with sed code is a little tricky and potentially dangerous (because of code injection). If the variable comes from a trustworthy source and is known to not contain any metacharacters, then it is possible to write
sed "s/^[^[:space:]]*/&\n$variable/; s/^\n//; s/\n/./" filename
as @triplee points out in the comments. If $variable contains slashes but no other metacharacters, and some character is known not to occur in it, then it is possible to use a different delimiter for the s command:
sed "s#^[^[:space:]]*#&\n$variable#; s/^\n//; s/\n/./" filename
(if it is known that $variable does not contain the character #).
If none of this is the case, deeper magic is required. For example, if $variable is known to be a single line (I suspect that this is the case because otherwise the transformation makes little sense), then it is possible to write
(echo "$variable"; cat filename) | sed '1 { h; d; }; s/^[^[:space:]]*/&\n/; G; s/\(.*\n\)\(.*\)\n\(.*\)/\1\3\2/; s/^\n//; s/\n/./'
This feeds the variable to sed as first line of the input, and then works as follows:
1 { h; d; } # first line: hold, don't print
s/^[^[:space:]]*/&\n/ # after that: Insert marker as before
G # fetch variable from the hold buffer
s/\(.*\n\)\(.*\)\n\(.*\)/\1\3\2/ # move it to the right place
s/^\n// # rest as before.
s/\n/./
However, at this point you may want to consider using awk instead, which has better facilities to deal with shell variables (that is to say, you can use them without treating them as code):
awk -v var="$variable" '{ n = match($0, /[ \t]/); print substr($0, 1, n - 1) (n <= 1 ? "" : ".") var substr($0, n) }' filename
The -v var="$variable" makes a variable var known to the awk code that has the value of $variable, and the awk code then works as follows:
{
# find the first space or tab in the line (0 if none)
# (I would use [[:space:]] here, but there are commonly shipped versions
# of mawk that don't understand POSIX character classes, so for portability
# I resort to [ \t])
n = match($0, /[ \t]/)
# assemble output line accordingly and print it.
print substr($0, 1, n - 1) (n <= 1 ? "" : ".") var substr($0, n)
}
awk -F" " '{print $1".21.128\t" $2"\t"$3"\t"$4"\t"$5}' $1

Add line feed every 2391 byte

I am using Redhat Linux 6.
I have a file that is supposed to come from mainframe MVS with EBCDIC-to-ASCII conversion.
(But I suspect some conversion may be wrong)
Anyway, I know that the record length is 2391 byte. There are 10 records and the file size is 23910 byte.
For each 2391 byte record, there are many 0a or 0d char (not CRLF). I want to replace them with, say, # and #.
Also, I want to add a LF (i.e.0a) every 2391 byte so as to make the file become a normal unix text file for further processing.
I have tried to use
dd ibs=2391 obs=2391 if=emyfile of=myfile.new
But this does not work: both files are the same.
I also tried
dd ibs=2391 obs=2391 if=myfile | awk '{print $0}'
But this does not work either.
Can anyone help with this?
Something like this:
#!/bin/bash
for i in {0..9}; do
dd if=emyfile bs=2391 count=1 skip=$i | LC_CTYPE=C tr '\r\n' '##'
echo
done > newfile
If your files are longer, you will need more than 10 iterations. I would handle that by running an infinite loop and exiting the loop on error, like this:
#!/bin/bash
i=0
while :; do
dd if=emyfile bs=2391 count=1 skip=$i | LC_CTYPE=C tr '\r\n' '##'
[ ${PIPESTATUS[0]} -ne 0 ] && break
echo
((i++))
done > newfile
However, on my iMac under OSX, dd doesn't seem to exit with an error when you go past end of file - maybe try your luck on your OS.
You could try
$ dd bs=2391 cbs=2391 conv=ascii,unblock if=emyfile of=myfile.new
conv=ascii converts from EBCDIC to ASCII. conv=unblock inserts a newline at the end of each cbs-sized block (after removing trailing spaces).
If you already have a file in ASCII and just want to replace some characters in it before splitting the blocks, you could use tr(1). For example, the following will replace each carriage return with '#' and each newline (linefeed) with '#':
$ tr '\r\n' '##' < emyfile | dd bs=2391 cbs=2391 conv=unblock of=myfile.new
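If dd is not cooperating, the same record splitting can be sketched in a few lines of Python. This assumes the file is already ASCII and fits in memory; records_to_lines is a made-up name, and unlike conv=unblock it does not strip trailing spaces.

```python
def records_to_lines(data, record_len=2391):
    """Replace CR and LF inside each fixed-length record with '#',
    then terminate every record with a single LF."""
    table = bytes.maketrans(b"\r\n", b"##")
    out = []
    for pos in range(0, len(data), record_len):
        record = data[pos:pos + record_len].translate(table)
        out.append(record + b"\n")
    return b"".join(out)


# Tiny demo with 3-byte records instead of 2391-byte ones
print(records_to_lines(b"ab\ncd\r", record_len=3))
```

For the real file you would read emyfile in "rb" mode, pass the bytes through the function with the default record length, and write the result to myfile.new in "wb" mode.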

Binary grep on Linux?

Say I have generated the following binary file:
# generate file:
python -c 'import sys;[sys.stdout.write(chr(i)) for i in (0,0,0,0,2,4,6,8,0,1,3,0,5,20)]' > mydata.bin
# get file size in bytes
stat -c '%s' mydata.bin
# 14
And say, I want to find the locations of all zeroes (0x00), using a grep-like syntax.
The best I can do so far is:
$ hexdump -v -e "1/1 \" %02x\n\"" mydata.bin | grep -n '00'
1: 00
2: 00
3: 00
4: 00
9: 00
12: 00
However, this implicitly converts each byte in the original binary file into a multi-byte ASCII representation, on which grep operates; not exactly the prime example of optimization :)
Is there something like a binary grep for Linux? Possibly, also, something that would support a regular expression-like syntax, but also for byte "characters" - that is, I could write something like 'a(\x00*)b' and match 'zero or more' occurrences of byte 0 between bytes 'a' (97) and 'b' (98)?
EDIT: The context is that I'm working on a driver, where I capture 8-bit data; something goes wrong in the data, which can be kilobytes up to megabytes, and I'd like to check for particular signatures and where they occur. (so far, I'm working with kilobyte snippets, so optimization is not that important - but if I start getting some errors in megabyte long captures, and I need to analyze those, my guess is I would like something more optimized :) . And especially, I'd like something where I can "grep" for a byte as a character - hexdump forces me to search strings per byte)
EDIT2: same question, different forum :) grepping through a binary file for a sequence of bytes
EDIT3: Thanks to the answer by @tchrist, here is also an example with 'grepping' and matching, and displaying results (although not quite the same question as OP):
$ perl -ln0777e 'print unpack("H*",$1), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin
ca000000cb000000cc000000cd000000ce # Matched data (hex)
66357 # Offset (dec)
To have the matched data grouped one byte (two hex characters) at a time, "H2 H2 H2 ..." needs to be specified once for each byte in the matched string; as my match '.....\0\0\0\xCC\0\0\0.....' covers 17 bytes, I can write '"H2"x17' in Perl. Each of these "H2" returns a separate value (as in a list), so join also needs to be used to add spaces between them - eventually:
$ perl -ln0777e 'print join(" ", unpack("H2 "x17,$1)), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin
ca 00 00 00 cb 00 00 00 cc 00 00 00 cd 00 00 00 ce
66357
Well.. indeed Perl is a very nice 'binary grepping' facility, I must admit :) As long as one learns the syntax properly :)
This seems to work for me:
grep --only-matching --byte-offset --binary --text --perl-regexp "<\x-hex pattern>" <file>
Short form:
grep -obUaP "<\x-hex pattern>" <file>
Example:
grep -obUaP "\x01\x02" /bin/grep
Output (Cygwin binary):
153: <\x01\x02>
33210: <\x01\x02>
53453: <\x01\x02>
So you can grep this again to extract offsets. But don't forget to use binary mode again.
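For the regular-expression part of the question (something like 'a(\x00*)b' on raw bytes), Python's re module also accepts bytes patterns. A minimal sketch follows; find_all is a hypothetical helper, and the sample data is the 14-byte file from the question.

```python
import re


def find_all(pattern, data):
    """Yield (offset, matched_bytes) for every non-overlapping regex match."""
    # re.DOTALL makes '.' match every byte, including 0x0a
    for m in re.finditer(pattern, data, re.DOTALL):
        yield m.start(), m.group(0)


data = bytes([0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20])

# Offsets (0-based) of every zero byte
zero_offsets = [offset for offset, _ in find_all(rb"\x00", data)]
print(zero_offsets)

# Zero or more NUL bytes between 'a' and 'b'
print(list(find_all(rb"a\x00*b", b"xa\x00\x00b ab")))
```

The offsets are 0-based byte positions, the same convention grep -b uses.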
Someone else appears to have been similarly frustrated and wrote their own tool to do it (or at least something similar): bgrep.
One-Liner Input
Here’s the shorter one-liner version:
% perl -ln0e 'print tell' < inputfile
And here's a slightly longer one-liner:
% perl -e '($/,$\) = ("\0","\n"); print tell while <STDIN>' < inputfile
The way to connect those two one-liners is by uncompiling the first one’s program:
% perl -MO=Deparse,-p -ln0e 'print tell'
BEGIN { $/ = "\000"; $\ = "\n"; }
LINE: while (defined(($_ = <ARGV>))) {
chomp($_);
print(tell);
}
Programmed Input
If you want to put that in a file instead of calling it from the command line, here’s a somewhat more explicit version:
#!/usr/bin/env perl
use English qw[ -no_match_vars ];
$RS = "\0"; # input separator for readline, chomp
$ORS = "\n"; # output separator for print
while (<STDIN>) {
print tell();
}
And here’s the really long version:
#!/usr/bin/env perl
use strict;
use autodie; # for perl5.10 or better
use warnings qw[ FATAL all ];
use IO::Handle;
IO::Handle->input_record_separator("\0");
IO::Handle->output_record_separator("\n");
binmode(STDIN); # just in case
while (my $null_terminated = readline(STDIN)) {
# this is just *past* the null we just read:
my $seek_offset = tell(STDIN);
print STDOUT $seek_offset;
}
close(STDIN);
close(STDOUT);
One-Liner Output
BTW, to create the test input file, I didn’t use your big, long Python script; I just used this simple Perl one-liner:
% perl -e 'print 0.0.0.0.2.4.6.8.0.1.3.0.5.20' > inputfile
You’ll find that Perl often winds up being 2-3 times shorter than Python to do the same job. And you don’t have to compromise on clarity; what could be simpler than the one-liner above?
Programmed Output
I know, I know. If you don’t already know the language, this might be clearer:
#!/usr/bin/env perl
@values = (
0, 0, 0, 0, 2,
4, 6, 8, 0, 1,
3, 0, 5, 20,
);
print pack("C*", @values);
although this works, too:
print chr for @values;
as does
print map { chr } @values;
Although for those who like everything all rigorous and careful and all, this might be more what you would see:
#!/usr/bin/env perl
use strict;
use warnings qw[ FATAL all ];
use autodie;
binmode(STDOUT);
my @octet_list = (
0, 0, 0, 0, 2,
4, 6, 8, 0, 1,
3, 0, 5, 20,
);
my $binary = pack("C*", @octet_list);
print STDOUT $binary;
close(STDOUT);
TMTOWTDI
Perl supports more than one way to do things so that you can pick the one that you’re most comfortable with. If this were something I planned to check in as a school or work project, I would certainly select the longer, more careful versions — or at least put a comment in the shell script if I were using the one-liners.
You can find documentation for Perl on your own system. Just type
% man perl
% man perlrun
% man perlvar
% man perlfunc
etc at your shell prompt. If you want pretty-ish versions on the web instead, get the manpages for perl, perlrun, perlvar, and perlfunc from http://perldoc.perl.org.
The bbe program is a sed-like editor for binary files. See documentation.
Example with bbe:
bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin
11:x00 x00 xcc x00 x00 x00 xcd x00 x00 x00 xce
Explanation
-b search pattern between //; each byte is written as \x followed by two hex digits.
-b works like this /pattern/:length (in byte) after matched pattern
-s similar to 'grep -o' suppress unmatched output
-e similar to 'sed -e' give commands
-e 'F d' display offsets before each result here: '11:'
-e 'p h' print results in hexadecimal notation
-e 'A \n' append end-of-line to each result
You can also pipe it to sed to have a cleaner output:
bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin | sed -e 's/x//g'
11:00 00 cc 00 00 00 cd 00 00 00 ce
Your solution with Perl from your EDIT3 gives me an 'Out of memory' error with large files.
The same problem occurs with bgrep.
The only downside to bbe is that I don't know how to print context that precedes a matched pattern.
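If memory is the issue and you also want the bytes that precede a match, one possible approach is to memory-map the file so that it is never read into memory as a whole. This is only a sketch: find_with_context and its before/after parameters are invented for the example, and the temporary file stands in for a real capture. Note that mmap refuses to map an empty file.

```python
import mmap
import os
import tempfile


def find_with_context(path, needle, before=5, after=5):
    """Yield (offset, context_bytes) for each occurrence of needle in the file."""
    with open(path, "rb") as f:
        # The OS pages data in on demand; the file is not loaded up front
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos >= 0:
                start = max(0, pos - before)
                end = min(len(mm), pos + len(needle) + after)
                yield pos, mm[start:end]
                pos = mm.find(needle, pos + 1)


# Demo on a small temporary file (a stand-in for a large capture)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0xCA, 0, 0, 0, 0xCB, 0, 0, 0, 0xCC, 0, 0, 0, 0xCD, 0, 0, 0, 0xCE]))
hits = list(find_with_context(tmp.name, b"\x00\x00\x00\xcc\x00\x00\x00"))
os.unlink(tmp.name)

for offset, context in hits:
    print(offset, context.hex())
```

Here the offset is the position of the first byte of the pattern itself, and the context slice includes up to five bytes on either side of it.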
One way to solve your immediate problem using only grep is to create a file containing a single null byte. After that, grep -abo -f null_byte_file target_file will produce the following output.
0:
1:
2:
3:
8:
11:
That is of course each byte offset as requested by "-b" followed by a null byte as requested by "-o"
I'd be the first to advocate perl, but in this case there's no need to bring in the extended family.
What about grep -a? Not sure how it works on truly binary files, but it works well on text files that the OS thinks are binary.
