Hex dump to binary data conversion - linux

I need to convert a hex dump file to binary data using the xxd command (or any other suitable method that works). The raw hex dump was not produced with xxd.
I tried two variants with different options:
xxd -r input.log binout.bin
xxd -r -p input.log binout.bin
Both methods produce wrong results: the first command creates a 2.2 GB binary file and the second a file of 82382 bytes. Neither matches the expected binary size of 65536 bytes.
Part of the hex file:
807e0000: 4562 537f e0b1 6477 84bb 6bae 1cfe 81a0 | EbS...dw..k.....
807e0010: 94f9 082b 5870 4868 198f 45fd 8794 de6c | ...+XpHh..E....l
807e0020: b752 7bf8 23ab 73d3 e272 4b02 57e3 1f8f | .R{.#.s..rK.W...
807e0030: 2a66 55ab 07b2 eb28 032f b5c2 9a86 c57b | *fU....(./.....{
807e0040: a5d3 3708 f230 2887 b223 bfa5 ba02 036a | ..7..0(..#.....j
807e0050: 5ced 1682 2b8a cf1c 92a7 79b4 f0f3 07f2 | \...+.....y.....
807e0060: a14e 69e2 cd65 daf4 d506 05be 1fd1 3462 | .Ni..e........4b
What could be the issue here, and how can I convert the data correctly?

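Before converting with xxd, you need to remove the first and last parts of each line: the address (807e0000:) and the | separator followed by the ASCII column.
$ sed -i 's/^\(.\)\{9\}//g' binary.txt
$ sed -i 's/ |.*$//' binary.txt
binary.txt is the name of your hex dump file. sed -i edits it in place, so work on a copy.
After that, only the hex digits are left. You can convert them back to binary with xxd in plain (-p) mode, which ignores the spaces between the 4-digit groups:
$ xxd -r -p binary.txt mybinary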
If you have the original .bin file, you can then compare the md5sums of the two files. If they are the same, the conversion succeeded.
$ md5sum originbinary
$ md5sum mybinary
You can find more details in the first part of this link: https://acassis.wordpress.com/2012/10/21/how-to-transfer-files-to-a-linux-embedded-system-over-serial/
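As an alternative to the sed/xxd pipeline, here is a minimal Python sketch. The script name and usage are hypothetical. It reads lines in the dump format shown in the question, keeps only the hex groups between the address and the "|" separator, and writes the bytes out in order. It deliberately ignores the addresses. xxd -r honours them, and writing at offset 0x807e0000 (2,155,741,184 bytes) is consistent with the roughly 2.2 GB file from the first attempt. The sketch assumes the dump is contiguous: there are no "*" lines for squeezed repeats, such as hexdump prints when run without -v.

```python
import binascii
import re
import sys

# Matches "807e0000: 4562 537f ... 81a0 | EbS...dw..k....."
# group 1 = address, group 2 = the hex groups before the "|" separator.
DUMP_LINE = re.compile(r"^\s*([0-9a-fA-F]+):\s*([0-9a-fA-F ]*?)\s*\|")


def dump_to_bytes(lines):
    """Convert 'address: hex groups | ascii' lines back into raw bytes.

    Assumes contiguous lines, so the addresses are only used to recognise
    dump lines, never to seek in the output.
    """
    data = bytearray()
    for line in lines:
        m = DUMP_LINE.match(line)
        if m is None:
            continue  # not a dump line (blank line, header, ...)
        data += binascii.unhexlify(m.group(2).replace(" ", ""))
    return bytes(data)


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical usage: python3 dump2bin.py input.log binout.bin
    with open(sys.argv[1]) as src, open(sys.argv[2], "wb") as dst:
        dst.write(dump_to_bytes(src))
```

For 4096 full lines of 16 bytes each, the output should be exactly 65536 bytes.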

Related

efficient Unix script to base64 decode a file with encrypted data

In the sample file below, I need to decode the third column and output the result as a fourth column:
883122374206 883122002074206 UzJocGUUh4c=
883122406445 883122002106445 U5dkVjlVWIc=
883122533096 883122002233096 U0dwcGORRxA=
883122624312 883122002324312 U5OJkFQ1NIc=
883122759484 883122002459484 U4NmgHUwV4c=
883122763589 883122002463589 U4WTAYBQmYc=
883122981968 883122002478427 UyY3QDAAFoI=
883122936510 883122002636510 U4kggBFncxA=
883122326985 883122002666363 U1lwcHcyBBA=
883122330017 883122002668313 U3JlEVRBiIc=
883122339137 883122002673700 UwUiESBIAYc=
883122438696 883122002733023 U1MJgGJgg4c=
883122242176 883122002875188 U4Q3IBFBB0U=
883122230176 883122002883257 U2GUAXdZaIc=
883122532560 883122002232560 U4kVkFBzVhA=
Expected output:
883122374206 883122002074206 UzJocGUUh4c=
883122406445 883122002106445 U5dkVjlVWIc=
883122533096 883122002233096 U0dwcGORRxA=
883122624312 883122002324312 U5OJkFQ1NIc=
883122759484 883122002459484 U4NmgHUwV4c=
883122763589 883122002463589 U4WTAYBQmYc=
883122981968 883122002478427 UyY3QDAAFoI=
883122936510 883122002636510 U4kggBFncxA=
883122326985 883122002666363 U1lwcHcyBBA=
883122330017 883122002668313 U3JlEVRBiIc=
883122339137 883122002673700 UwUiESBIAYc=
883122438696 883122002733023 U1MJgGJgg4c=
883122242176 883122002875188 U4Q3IBFBB0U=
883122230176 883122002883257 U2GUAXdZaIc=
883122532560 883122002232560 U4kVkFBzVhA=
When decoding a single item, I use this script:
echo "U2GUAXdZaIc=" | base64 --decode | hexdump -v -e '/1 "%x"' | dd conv=swab status=none;echo
When I decode column 3 with base64 -d as below, I get a single-line output file. How can I get line-by-line output?
cat DS10_export41.ldif | base64 --decode | hexdump -v -e '/1 "%x"' | dd conv=swab status=none oflag=sync of=DS10_export42.ldif
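The output ends up on one line because the whole file is decoded as a single stream, and the hexdump format '/1 "%x"' never emits a newline. Here is a minimal Python sketch that decodes the third column line by line and appends the result as a fourth column. It assumes each byte should be written as two hex digits. The question's '/1 "%x"' format does not zero-pad, so a byte 0x01 becomes "1". With two digits per byte, dd conv=swab on the hex text amounts to swapping the two digits of every byte, and the sketch does that. The script name is hypothetical.

```python
import base64
import sys


def decode_field(b64_text):
    """Base64-decode one field and return its hex digits with the two digits
    of every byte swapped (what dd conv=swab does to zero-padded hex text)."""
    hex_text = base64.b64decode(b64_text).hex()  # two hex digits per byte
    return "".join(hex_text[i + 1] + hex_text[i] for i in range(0, len(hex_text), 2))


def add_decoded_column(line):
    """Append the decoded third column as a fourth column."""
    fields = line.split()
    if len(fields) < 3:
        return line.rstrip("\n")  # leave blank or short lines unchanged
    return " ".join(fields + [decode_field(fields[2])])


if __name__ == "__main__" and len(sys.argv) == 2:
    # hypothetical usage: python3 add_column.py input.txt > output.txt
    with open(sys.argv[1]) as src:
        for line in src:
            print(add_decoded_column(line))
```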

How to make version-sort command work in a sh file?

I'm trying to use the "sort -V" command (version sort) in a sh file.
Specifically, I have the following line of code in a sh file:
SOME_PATH="$(ls dir_1/dir_2/v*/filename.txt | sort -V | tail -n1)"
What I'm trying to accomplish with this command: given a list of file paths with different version numbers, I want to get the file path with the greatest version number.
For example, let's assume that I have the following list of file paths:
dir_1/dir_2/v1/filename.txt,
dir_1/dir_2/v2/filename.txt,
dir_1/dir_2/v11/filename.txt
Then, I want the command to return dir_1/dir_2/v11/filename.txt instead of dir_1/dir_2/v2/filename.txt since the former has the greatest version value, "11".
As I understand it, the command above does exactly this.
I confirmed that it works in a bash terminal on Linux.
However, when I run a sh file containing this command, I get the error message
"ERROR: Unknown command line flag 'V'".
Is there a way to make version-sort work in a sh file?
If not, is there a way to implement it not using -V flag?
Thank you.
Using the shell's printf and GNU awk (the three-argument form of match() is a gawk extension):
SOME_PATH=$(printf %s\\0 dir_1/dir_2/v*/filename.txt |
awk 'BEGIN{FS="/";RS="\0";v=0}{match($3,/v([[:digit:]]+)/,m);if(m[1]+0>v){v=m[1]+0;l=$0}}END{print l}')
Using GNU awk only:
SOME_PATH=$(awk 'BEGIN{delete ARGV[0];v=0;for(i in ARGV){split(ARGV[i],s,"/");match(s[3],/v([[:digit:]]+)/,m);if(m[1]+0>v){v=m[1]+0;l=ARGV[i]}}}END{print l}' dir_1/dir_2/v*/filename.txt)
Formatted awk script:
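#!/usr/bin/env -S awk -f
# requires GNU awk: match() with an array argument is a gawk extension
BEGIN {
    delete ARGV[0]
    v = 0
    for (i in ARGV) {
        split(ARGV[i], s, "/")
        match(s[3], /v([[:digit:]]+)/, m)
        if (m[1] + 0 > v) {  # + 0 forces a numeric comparison
            v = m[1] + 0
            l = ARGV[i]
        }
    }
}
END {
    print l
}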
Using a null-delimited list stream, and not parsing the output of ls [1]:
SOME_PATH=$(
printf '%s\0' dir_1/dir_2/v*/filename.txt |
sort -z -t'/' -k3V |
tail -zn1 |
tr -d '\0'
)
How it works:
printf '%s\0' dir_1/dir_2/v*/filename.txt: Expands the paths into a null-delimited output stream.
sort -z -t'/' -k3V: Sorts the null-delimited input stream by the version number in the 3rd field (-k3V), using / as the field delimiter (-t'/').
tail -zn1: Outputs the last null-delimited entry of the input stream.
tr -d '\0': Removes the remaining null byte, which would otherwise make bash warn: command substitution: ignored null byte in input.
[1] StackExchange: Why not parse ls (and what to do instead)?
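For comparison, the awk scripts above select the greatest number in the third path component. The same rule can be sketched in Python. The key extracts every run of digits from that component as integers, so v2 < v11 numerically and dotted versions such as v1.9 < v1.10 also compare correctly. It is not a full reimplementation of sort -V, and it assumes the version is always the third "/"-separated component, as in the question's paths.

```python
import glob
import re


def version_key(path):
    """Numbers found in the third '/'-separated component, e.g. 'v11' -> (11,)."""
    component = path.split("/")[2]
    return tuple(int(n) for n in re.findall(r"\d+", component))


def latest(paths):
    """Path with the greatest version, or None if there are no paths."""
    return max(paths, key=version_key, default=None)


if __name__ == "__main__":
    print(latest(glob.glob("dir_1/dir_2/v*/filename.txt")))
```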

sort fasta by sequence size

I want to sort a huge FASTA file (more than 10^8 lines and sequences) by sequence length. FASTA is a well-defined format used in biology to store sequences (nucleotide or protein):
>id1
sequence 1 # could span several lines
>id2
sequence 2
...
I have run a tool that gives me, in TSV format:
the identifier, the length, and the byte position of the identifier.
For now, I sort this file by the length column, then parse it, use seek to retrieve each corresponding sequence, and append that sequence to a new file.
# this function gets the sequence using seek
def get_seq(file, offset):
    with open(file) as f_:
        f_.seek(offset, 0)  # go to the line of interest
        line = f_.readline()  # this line is the beginning of the sequence
        to_return = ""  # init the string which will contain the sequence
        # read until the next identifier or the end of the file ('' at EOF)
        while line and not line.startswith('>'):
            to_return += line.strip()
            line = f_.readline()
    return to_return
# simply append the id and the sequence to a file
def write_seq(out_file, id_, sequence):
    with open(out_file, 'a') as out:
        out.write('>{}\n{}\n'.format(id_.strip(), sequence))
# main loop: parse the index file and call the functions defined above
# (args comes from argparse, not shown)
with open(args.fai) as ref:
    for line in ref:
        spt = line.split()
        id_ = spt[0]
        seq = get_seq(args.i, int(spt[2]))
        write_seq(out_file=args.out, id_=id_, sequence=seq)
My problem is that this is really slow (it takes several days). Is that normal? Is there another way to do it? I am not a computer scientist, so I may be missing something, but I believed that indexing the file and using seek was the fastest way to achieve this. Am I wrong?
It seems that opening two files for each sequence is contributing a lot to the run time. You could pass file handles to your get/write functions rather than file names, but I would suggest using an established FASTA parser/indexer such as Biopython or samtools. Here's an (untested) solution with samtools:
import subprocess
subprocess.call(["samtools", "faidx", args.i])  # builds the index args.i + ".fai"
with open(args.fai) as ref, open(args.out, "a") as out:
    for line in ref:
        id_ = line.split()[0]
        # send samtools' output straight into the already-open output file
        subprocess.call(["samtools", "faidx", args.i, id_], stdout=out)
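The "pass file handles instead of file names" suggestion can be sketched like this. It is a minimal version, not the original poster's code. It opens the FASTA and the output file once each. It uses binary mode so that seek() offsets are plain byte offsets. It accepts an offset pointing either at the header line or at the first sequence line, because the question describes the offset as the identifier's position while the code treats it as the sequence start. It assumes the index lines are already sorted by length, as the question does, for example with sort -k2,2n. The script name is hypothetical.

```python
import io
import sys


def read_sequence(fasta: io.BufferedIOBase, offset: int) -> bytes:
    """Read one (possibly multi-line) sequence from an open binary FASTA file."""
    fasta.seek(offset)
    line = fasta.readline()
    if line.startswith(b">"):  # offset points at the header line: skip it
        line = fasta.readline()
    chunks = []
    while line and not line.startswith(b">"):  # stop at the next header or at EOF
        chunks.append(line.strip())
        line = fasta.readline()
    return b"".join(chunks)


def write_sorted(fasta, sorted_index_lines, out):
    """Write records in index order; index lines are 'id<TAB>length<TAB>offset'."""
    for index_line in sorted_index_lines:
        fields = index_line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed index lines
        seq = read_sequence(fasta, int(fields[2]))
        out.write(b">" + fields[0].encode() + b"\n" + seq + b"\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    # hypothetical usage: python3 write_sorted.py in.fasta sorted_index.tsv out.fasta
    with open(sys.argv[1], "rb") as fasta, open(sys.argv[2]) as index, \
            open(sys.argv[3], "wb") as out:
        write_sorted(fasta, index, out)
```

The remaining cost is one random seek per record, so the speed will still depend heavily on the storage.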
What about bash and some basic Unix commands (csplit is the key)? I wrote this simple script, which you can customize or improve. It's not highly optimized and doesn't use the index file, but it may nevertheless run faster.
csplit -z -f tmp_fasta_file_ "$1" '/^>/' '{*}'
for file in tmp_fasta_file_*
do
    TMP_FASTA_WC=$(wc -l < "$file" | tr -d ' ')
    FASTA_WC+=$(echo "$file $TMP_FASTA_WC\n")
done
for filename in $(echo -e "$FASTA_WC" | sort -k2 -r -n | awk -F" " '{print $1}')
do
    cat "$filename" >> "$2"
done
rm tmp_fasta_file_*
The first positional argument is the path to your FASTA file and the second is the path for the output, e.g. ./script.sh input.fasta output.fasta. Note that records are ordered by their number of lines (longest first), which only approximates sequence length when sequences are wrapped at a fixed width.
Using a modified version of fastq-sort (currently available at https://github.com/blaiseli/fastq-tools), we can convert the file to fastq format using bioawk, sort with the -L option I added, and convert back to fasta:
cat test.fasta \
| tee >(wc -l > nb_lines_fasta.txt) \
| bioawk -c fastx '{l = length($seq); printf "#"$name"\n"$seq"\n+\n%.*s\n", l, "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII"}' \
| tee >(wc -l > nb_lines_fastq.txt) \
| fastq-sort -L \
| tee >(wc -l > nb_lines_fastq_sorted.txt) \
| bioawk -c fastx '{print ">"$name"\n"$seq}' \
| tee >(wc -l > nb_lines_fasta_sorted.txt) \
> test_sorted.fasta
The fasta -> fastq conversion step is quite ugly. We need to generate dummy fastq qualities with the same length as the sequence. I found no better way to do it with (bio)awk than this hack based on the "dynamic width" thing mentioned at the end of https://www.gnu.org/software/gawk/manual/html_node/Format-Modifiers.html#Format-Modifiers.
The IIIII... string should be at least as long as the longest input sequence. Otherwise the fastq will be invalid, and when converting back to fasta, bioawk seems to silently skip such invalid reads.
In the above example, I added steps to count the lines. If the line counts are inconsistent, the IIIII... string may have been too short.
The resulting fasta file will have the shorter sequences first.
To get the longest sequences at the top of the file, add the -r option to fastq-sort.
Note that fastq-sort writes intermediate files in /tmp. If for some reason it is interrupted before erasing them, you may want to clean your /tmp manually and not wait for the next reboot.
Edit
I actually found a better way to generate dummy qualities of the same length as the sequence: simply using the sequence itself:
cat test.fasta \
| bioawk -c fastx '{print "#"$name"\n"$seq"\n+\n"$seq}' \
| fastq-sort -L \
| bioawk -c fastx '{print ">"$name"\n"$seq}' \
> test_sorted.fasta
This solution is cleaner (and slightly faster), but I keep my original version above because the "dynamic width" feature of printf and the usage of tee to check intermediate data length may be interesting to know about.
You can also do it very conveniently with awk; see the code below:
awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' input.fasta |\
awk -F '\t' '{printf("%d\t%s\n",length($2),$0);}' |\
sort -k1,1n | cut -f 2- | tr "\t" "\n"
This and other methods have been posted on Biostars (e.g. using BBMap's sortbyname.sh script), and I strongly recommend that community for questions such as this one.
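The awk pipeline above works in three steps. It first linearizes each record onto one line, then prefixes the sequence length, and finally sorts numerically and restores the line breaks. The same idea can be sketched in Python. This version keeps all records in memory, so it suits files that fit in RAM rather than the 10^8-sequence case directly. Python's sort is stable, so records of equal length keep their input order. That is not guaranteed to match the tie order of sort -k1,1n.

```python
def linearize(lines):
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def sort_by_length(lines, reverse=False):
    """Return FASTA text sorted by sequence length (shortest first by default)."""
    records = sorted(linearize(lines), key=lambda r: len(r[1]), reverse=reverse)
    return "".join(f"{h}\n{s}\n" for h, s in records)
```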

grep is unable to match contents of 1 file in another file

grep is unable to find the contents of one file in another file, and I don't know what is wrong.
I have one file called mine with contents like:
sadiadas
HTTP:STC:ACTIVEX:MCAFEE-FREESCN 
HTTP:STC:IMG:ANI-BLOCK-STR2 
HTTP:STC:ADOBE:PDF-LIBTIFF 
HTTP:STC:ADOBE:PS-PNG-BO 
HTTP:STC:DL:EOT-IO 
HTTP:STC:IE:CLIP-MEM 
HTTP:STC:DL:XLS-DATA-INIT 
HTTP:STC:ADOBE:FLASH-RUNTIME 
HTTP:STC:ADOBE:FLASH-ARGREST 
HTTP:STC:DL:MS-NET-CLILOADER-MC 
HTTP:ORACLE:COREL-DRAW-BO 
HTTP:STC:MS-FOREFRONT-RCE 
HTTP:STC:DL:VISIO-UMLSTRING 
HTTP:ORACLE:OUTSIDEIN-CORELDRAW 
HTTP:STC:DL:MAL-M3U 
HTTP:STC:JAVA:MIXERSEQ-OF 
HTTP:STC:DL:MAL-WEBEX-WRF 
HTTP:STC:DL:XLS-FORMULA-BIFF 
HTTP:STC:JAVA:TYPE1-FONT 
HTTP:STC:DL:XLS-FIELD-MC 
HTTP:STC:IE:AUTH-REFLECTION 
HTTP:STC:DL:MOZILLA-WAV-BOF 
HTTP:XSS:PHPNUKE-BOOKMARKS1 
HTTP:STC:DL:MAL-WIN-BRIEFCASE-2 
HTTP:STC:ADOBE:FLASH-INT-OV 
HTTP:STC:IE:MAL-GIF-DOS 
APP:NOVELL:GWMGR-INFODISC 
APP:SYMC:MESSAGING-SAVE.DO-CSRF 
HTTP:STC:ADOBE:READER-MC-RCE 
HTTP:STC:DL:SOPHOS-RAR-VMSF-RGB 
HTTP:ORACLE:OUTSIDE-IN-PRDOX-BO 
HTTP:STC:JAVA:IBM-RMI-PROXY-RCE  
HTTP:STC:IE:REMOVECHILD-UAF 
HTTP:STC:COREL-WP-BOF 
SHELLCODE:MSF:PROPSPRAY 
HTTP:VLC-ABC-FILE-BOF 
HTTP:MISC:MS-XML-SIG-VAL-DOS 
HTTP:STC:ADOBE:FLASH-PLAYER-BOF 
HTTP:STC:ADOBE:FLASHPLR-FILE-MC 
HTTP:STC:ADOBE:FLASH-AS3-INT-OV 
HTTP:ORACLE:OUTSIDE-IN-MSACCESS 
HTTP:STC:SCRIPT:APACHE-XML-DOS 
HTTP:STC:JAVA:METHODHANDLE 
HTTP:STC:ADOBE:CVE-2014-0506-UF 
HTTP:STC:IE:CVE-2014-1789-MC 
HTTP:STC:ACTIVEX:KVIEW-KCHARTXY 
SHELLCODE:X86:LIN-SHELL-REV-80S 
HTTP:STC:JAVA:JRE-PTR-CTRL-EXEC 
HTTP:STC:ADOBE:CVE-2015-0091-CE 
HTTP:DOS:MUL-PRODUCTS 
HTTP:MISC:WAPP-SUSP-FILEUL1 
SHELLCODE:X86:BASE64-NOOP-80C 
SHELLCODE:X86:BASE64-NOOP-80S 
SHELLCODE:X86:REVERS-CONECT-80C 
SHELLCODE:X86:REVERS-CONECT-80S 
SHELLCODE:X86:FLDZ-GET-EIP-80C 
SHELLCODE:X86:FLDZ-GET-EIP-80S 
SHELLCODE:X86:WIN32-ENUM-80C 
SHELLCODE:X86:WIN32-ENUM-80S 
and another file, called 2537_2550, that contains some of the lines of the first file:
HTTP:STC:OUTLOOK:MAILTO-QUOT-CE
HTTP:STC:HSC:HCP-QUOTE-SCRIPT
HTTP:STC:HSC:MS-HSC-URL-VLN
HTTP:STC:TELNET-URL-OPTS
HTTP:STC:NOTES-INI
HTTP:STC:MOZILLA:SHELL
HTTP:STC:RESIZE-DOS
HTTP:STC:IE:SHELL-WEB-FOLDER
HTTP:STC:IE:IE-MHT-REDIRECT
HTTP:IIS:ASP-DOT-NET-BACKSLASH
APP:SECURECRT-CONF
HTTP:STC:IE:IE-FTP-CMD
HTTP:STC:IE:URL-HIDING-ENC
HTTP:STC:MOZILLA:IFRAME-SRC
HTTP:STC:JAVA:MAL-JNLP-FILE
HTTP:STC:MOZILLA:WRAPPED-JAVA
HTTP:STC:MOZILLA:ICONURL-JS
APP:REAL:PLAYER-FORMAT-STRING
HTTP:STC:IE:FULLMEM-RELOAD
HTTP:STC:DL:PPT-SCRIPT
HTTP:STC:MOZILLA:FIREUNICODE
HTTP:STC:IE:MULTI-ACTION
HTTP:STC:IE:CREATETEXTRANGE
HTTP:STC:IE:HTML-TAG-MC
HTTP:STC:IE:NESTED-OBJECT-TAG
SHELLCODE:JS:UNICODE-ENC
HTTP:STC:IE:UTF8-DECODE-OF
HTTP:STC:IE:VML-FILL-BOF
HTTP:STC:MOZILLA:FF-DEL-OBJ-REF
HTTP:STC:ADOBE:ACROBAT-URL-DF
HTTP:STC:CLSID:ACTIVEX:TREND-AX
HTTP:XSS:IE7-XSS
HTTP:STC:NAV-REDIR
HTTP:STC:ACTIVEX:AOL-AMPX
HTTP:STC:ACTIVEX:IENIPP
HTTP:STC:ACTIVEX:REAL-PLAYER
HTTP:STC:ACTIVEX:ORBIT-DWNLDR
HTTP:STC:SEARCH-LINK
HTTP:STC:ITUNES-HANDLER-OF
HTTP:STC:OPERA:FILE-URL-OF
HTTP:STC:ACTIVEX:EASYMAIL
HTTP:STC:ACTIVEX:IETAB-AX
HTTP:STC:ADOBE:PDF-LIBTIFF
HTTP:STC:IE:TOSTATIC-DISC
HTTP:STC:WHSC-RCE
HTTP:STC:IE:CROSS-DOMAIN-INFO
HTTP:STC:IE:UNISCRIBE-FNPS-MC
HTTP:STC:IE:CSS-OF
HTTP:STC:OBJ-FILE-BASE64
HTTP:STC:IE:ANIMATEMOTION
HTTP:STC:CHROME:GURL-XO-BYPASS
HTTP:STC:SAFARI:WEBKIT-1ST-LTR
HTTP:STC:IE:BOUNDELEMENTS
HTTP:STC:IE:IFRAME-MEM-CORR
HTTP:STC:STREAM:QT-HREFTRACK
HTTP:STC:MOZILLA:CONSTRUCTFRAME
HTTP:STC:MOZILLA:ARGMNT-FUNC-CE
HTTP:STC:ADOBE:PS-PNG-BO
HTTP:STC:IE:HTML-RELOAD-CORRUPT
HTTP:STC:IE:TABLE-SPAN-CORRUPT
HTTP:STC:IE:TABLE-LAYOUT
HTTP:STC:DL:MSHTML-DBLFREE
HTTP:STC:IE:EVENT-INVOKE
HTTP:STC:IE:DEREF-OBJ-ACCESS
HTTP:STC:IE:TOSTATIC-XSS
HTTP:STC:ON-BEFORE-UNLOAD
HTTP:STC:DL:MAL-WOFF
HTTP:STC:DL:EOT-IO
HTTP:STC:MOZILLA:FF-REMOTE-MC
HTTP:STC:DL:DIRECTX-SAMI
HTTP:STC:IE:ONREADYSTATE
HTTP:STC:DL:VML-GRADIENT
HTTP:STC:IE:TABLES-MEMCORRUPT
HTTP:STC:JAVA:DOCBASE-BOF
HTTP:STC:IE:CLIP-MEM
HTTP:STC:ACTIVEX:WMI-ADMIN
HTTP:STC:MOZILLA:DOC-WRITE-MC
HTTP:STC:IE:SELECT-ELEMENT
HTTP:STC:IE:XML-ELEMENT-RCE
SHELLCODE:X86:FNSTENV-80C
HTTP:STC:IE:OBJ-MGMT-MC
HTTP:STC:DL:XLS-DATA-INIT
HTTP:STC:ADOBE:FLASH-RUNTIME
HTTP:STC:ACTIVEX:ISSYMBOL
HTTP:STC:ADOBE:FLASH-ARGREST
HTTP:STC:IE:VML-RCE
HTTP:STC:IE:HTML-TIME
HTTP:STC:IE:LAYOUT-GRID
HTTP:STC:IE:CELEMENT-RCE
HTTP:STC:IE:SELECT-EMPTY
HTTP:XSS:MS-IE-TOSTATICHTML
HTTP:STC:SAFARI:WEBKIT-FREE-CE
HTTP:IIS:ASP-PAGE-BOF
HTTP:STC:MOZILLA:FIREFOX-MC
HTTP:STC:MOZILLA:FF-XSL-TRANS
HTTP:STC:DL:MS-NET-CLILOADER-MC
HTTP:STC:MOZILLA:CLEARTEXTRUN
HTTP:STC:MOZILLA:FIREFOX-ENG-MC
HTTP:STC:MOZILLA:PARAM-OF
HTTP:ORACLE:COREL-DRAW-BO
HTTP:STC:MOZILLA:JIT-ESCAPE-MC
HTTP:STC:SAFARI:WEBKIT-SVG-MC
HTTP:STC:SAFARI:INNERHTML-MC
HTTP:STC:MOZILLA:NSCSSVALUE-OF
HTTP:NOVELL:GROUPWISE-IMG-BOF
I tried
grep -Ff mine 2537_2550
but grep did not find any matches.
Using exactly your input and your command I'm able to find the matching lines:
$ grep -Ff file1 file2
HTTP:STC:ADOBE:PDF-LIBTIFF
HTTP:STC:ADOBE:PS-PNG-BO
HTTP:STC:DL:EOT-IO
HTTP:STC:IE:CLIP-MEM
HTTP:STC:DL:XLS-DATA-INIT
HTTP:STC:ADOBE:FLASH-RUNTIME
HTTP:STC:ADOBE:FLASH-ARGREST
HTTP:STC:DL:MS-NET-CLILOADER-MC
HTTP:ORACLE:COREL-DRAW-BO
You probably have some non-printable characters that prevent grep from finding the matches.
Try removing non-printable characters, including carriage returns, from both files with the following command (it keeps only tabs, newlines and printable ASCII):
tr -cd '\11\12\40-\176' < infile > outfile
I used the input data you posted, and it works.
This is the output:
$ grep -Ff pattern searchFile
HTTP:STC:ADOBE:PDF-LIBTIFF
HTTP:STC:ADOBE:PS-PNG-BO
HTTP:STC:DL:EOT-IO
HTTP:STC:IE:CLIP-MEM
HTTP:STC:DL:XLS-DATA-INIT
HTTP:STC:ADOBE:FLASH-RUNTIME
HTTP:STC:ADOBE:FLASH-ARGREST
HTTP:STC:DL:MS-NET-CLILOADER-MC
HTTP:ORACLE:COREL-DRAW-BO
There are probably some non-printable characters in your file.
Use cat -vte filename to look for them.
If your file was transferred by FTP from a server running a different OS, such as Windows, use dos2unix filename to convert it to Unix format.

Find HEX value in file and grep the following value

I have a 2 GB file in raw format. I want to search for all occurrences of a specific hex value, "355A3C2F74696D653E", and collect the following 28 characters.
Example: 355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135
In this case I want the output: "323031312D30342D32365431343A34373A30322D31343A34373A3135" or better: 2011-04-26T14:47:02-14:47:15
I have tried with
xxd -u InputFile | grep '355A3C2F74696D653E' | cut -c 1-28 > OutputFile.txt
and
xxd -u -ps -c 4000000 InputFile | grep '355A3C2F74696D653E' | cut -b 1-28 > OutputFile.txt
But I can't get it working.
Can anybody give me a hint?
As you are using xxd, it seems to me that you want to search the file as binary data. I'd recommend using a more powerful programming language for this; the Unix shell tools assume there are line endings and that the text is mostly 7-bit ASCII. Consider using Python:
#!/usr/bin/env python3
import mmap
needle = b"\x35\x5A\x3C\x2F\x74\x69\x6D\x65\x3E"  # the bytes of 5Z</time>
with open("file_to_search", "rb") as fd:
    haystack = mmap.mmap(fd.fileno(), length=0, access=mmap.ACCESS_READ)
    i = haystack.find(needle)
    while i >= 0:
        i += len(needle)
        # the 28 bytes after the match, e.g. 2011-04-26T14:47:02-14:47:15
        print(haystack[i:i + 28].decode("ascii", errors="replace"))
        i = haystack.find(needle, i)
If your grep supports the -P option, you can simply use the command below. It works on the hex text, where each byte is two characters:
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{28}'
323031312D30342D32365431343A
For 56 hex characters (the 28 bytes you want):
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{56}'
323031312D30342D32365431343A34373A30322D31343A34373A3135
Why convert to hex first? See if this awk script works for you. It looks for the string you want to match, then prints the next 28 characters. The / characters in the pattern are escaped with a backslash. < and > are ordinary characters in awk regular expressions and should not be escaped, because GNU awk treats \< and \> as word-boundary operators.
Adapted from this post: Grep characters before and after match?
I added some blank lines for readability.
VirtualBox:~$ cat data.dat
Thisis a test of somerandom characters before thestringI want5Z</time>2011-04-26T14:47:02-14:47:15plus somemoredata
VirtualBox:~$ cat test.sh
awk '/5Z<\/time>/ {
    match($0, /5Z<\/time>/); print substr($0, RSTART + 9, 28);
}' data.dat
VirtualBox:~$ ./test.sh
2011-04-26T14:47:02-14:47:15
VirtualBox:~$
EDIT: I just realized something: match() only finds the first occurrence in each line, so the script would need to be tweaked to handle multiple occurrences if you need them. Perhaps someone more familiar with awk can chime in with improvements, as I am quite rusty. It's an approach to consider anyway.
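To handle several occurrences without converting to hex, the same search can be sketched with Python's re module on the raw bytes. re can scan a memory-mapped file directly, so the 2 GB file does not need to be read into memory. The marker is the question's hex value decoded to bytes (5Z</time>). re.DOTALL lets "." also match newline bytes in the 28 bytes that follow. The script name is hypothetical.

```python
import mmap
import re
import sys

MARKER = b"5Z</time>"  # the bytes 35 5A 3C 2F 74 69 6D 65 3E


def extract_after(data, marker=MARKER, count=28):
    """Return the `count` bytes that follow every occurrence of `marker`, as text."""
    pattern = re.compile(re.escape(marker) + b"(.{%d})" % count, re.DOTALL)
    return [m.group(1).decode("ascii", errors="replace") for m in pattern.finditer(data)]


if __name__ == "__main__" and len(sys.argv) == 2:
    # hypothetical usage: python3 extract_after.py InputFile > OutputFile.txt
    with open(sys.argv[1], "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for value in extract_after(mm):
            print(value)
```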
