Read output of dd into a shell script variable - linux

Being very new to shell scripts, I have pieced together the following to search /dev/sdd1, sector by sector, for a string. How do I get the sector data into the $HAYSTACK variable?
#!/bin/bash
HAYSTACK=""
START_SEARCH=$1
NEEDLE=$2
START_SECTOR=2048
END_SECTOR=$((226512895 + 1))
SECTOR_NUMBER=$((START_SEARCH + START_SECTOR))
while [ $SECTOR_NUMBER -lt $END_SECTOR ]; do
    $HAYSTACK=`dd if=/dev/sdd1 skip=$SECTOR_NUMBER count=1 bs=512`
    if [[ "$HAYSTACK" =~ "$NEEDLE" ]]; then
        echo "Match found at sector $SECTOR_NUMBER"
        break
    fi
    let SECTOR_NUMBER=SECTOR_NUMBER+1
done
Update
The intention is not to make a perfect script to handle fragmented file scenarios (I doubt that is possible at all).
In my case, not being able to distinguish strings containing NUL bytes is also a non-issue.
If you could expand the pipe suggestions into an answer it would be more than enough. Thanks!
Background
I have managed to wipe my www folder and have been trying to recover as much of my source files as possible. I have used Scalpel to recover my php and html files, but the version I could get working on my Ubuntu 16.04 is version 1.60, which does not support regex in header/footer, so I cannot build a good pattern for css, js, and json files.
I remember fairly rare strings to search for to find my files, but I have no idea where within a block a string could be. The solution I came up with is this shell script, which reads blocks from the partition, looks for the substring, and, if a match is found, prints the LSB number and exits.
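For the capture itself, command substitution is enough. A minimal sketch (note that bash variables cannot hold NUL bytes, which the Update above says is acceptable here, and 2>/dev/null hides dd's "records in/out" status lines):

```shell
# Read one 512-byte sector into a variable via command substitution.
SECTOR_NUMBER=2048
HAYSTACK=$(dd if=/dev/sdd1 skip="$SECTOR_NUMBER" count=1 bs=512 2>/dev/null)
```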

If the searched for item is a text string, consider using the -t
option of the strings command to print the offset of where the
string is found. Since strings doesn't care where the data is
from, it works on files, block devices, and piped input from dd.
Example from the start of a hard disk:
sudo strings -t d /dev/sda | head -5
Output:
165 ZRr=
286 `|f
295 \|f1
392 GRUB
398 Geom
Instead of head, that could be piped to grep -m 1 GRUB, which
would output only the first line containing "GRUB":
sudo strings -t d /dev/sda | grep -m 1 GRUB
Output:
392 GRUB
From there, bash can do quite a lot. This code finds the first 5
instances of "GRUB" on my boot partition /dev/sda7:
s=GRUB ; sudo strings -t d /dev/sda7 | grep "$s" |
while read a b ; do
    n=${b%%${s}*}
    printf "String %-10.10s found %3i bytes into sector %i\n" \
           "\"${b#${n}}\"" $(( (a % 512) + ${#n} )) $((a/512 + 1))
done | head -5
Output (the sector numbers here are relative to the start of the
partition):
String "GRUB Boot found 7 bytes into sector 17074
String "GRUB." found 548 bytes into sector 25702
String "GRUB." found 317 bytes into sector 25873
String "GRUBLAYO" found 269 bytes into sector 25972
String "GRUB" found 392 bytes into sector 26457
Things to watch out for:
Don't do dd-based single-block searches with strings, as they would fail if the string spanned two blocks. Use strings to get the offset first, then convert that offset to blocks (or sectors).
strings -t d can return long strings, and the "needle" might be several bytes into one, in which case the reported offset would be the start of the long string rather than of the grep match (the "needle"). The bash code above allows for that, using $n to calculate a corrected offset.
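The offset-to-sector arithmetic can also be checked in isolation with pure bash integer arithmetic (512-byte sectors, counted from 1 as in the output above; 0x856207 is the offset of the first "GRUB" hit from the rafind2 example further down, used here just as a worked example):

```shell
a=$((0x856207))                 # byte offset reported for the string
echo "$((a % 512)) bytes into sector $((a / 512 + 1))"
# 7 bytes into sector 17074
```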
A lazy all-in-one method uses the rafind2 utility. Example: search for the
first instance of "GRUB" on /dev/sda7, as before:
sudo rafind2 -Xs GRUB /dev/sda7 | head -7
Output:
0x856207
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x00856207 4752 5542 2042 6f6f 7420 4d65 6e75 006e GRUB Boot Menu.n
0x00856217 6f20 666f 6e74 206c 6f61 6465 6400 6963 o font loaded.ic
0x00856227 6f6e 732f 0069 636f 6e64 6972 0025 733a ons/.icondir.%s:
0x00856237 2564 3a25 6420 6578 7072 6573 7369 6f6e %d:%d expression
0x00856247 2065 7870 6563 7465 6420 696e 2074 expected in t
With some bash and sed that output can be reworked into the same
format as the strings output:
s=GRUB ; sudo rafind2 -Xs "$s" /dev/sda7 |
sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g" |
sed -r -n 'h;n;n;s/.{52}//;H;n;n;n;n;g;s/\n//p' |
while read a b ; do
    printf "String %-10.10s\" found %3i bytes into sector %i\n" \
           "\"${b}" $((a%512)) $((a/512 + 1))
done | head -5
The first sed command is borrowed from jfs' answer to "Program
that passes STDIN to STDOUT with color codes stripped?", since
rafind2 emits color escape codes along with the text.
Output:
String "GRUB Boot" found 7 bytes into sector 17074
String "GRUB....L" found 36 bytes into sector 25703
String "GRUB...LI" found 317 bytes into sector 25873
String "GRUBLAYO." found 269 bytes into sector 25972
String "GRUB .Geo" found 392 bytes into sector 26457

Have you thought about something like this?
cat /dev/sdd1 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ F l/'g > v1
cat /dev/sdd1 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/x F l/'g > v2
cmp -lb v1 v2
for example applying this to a .pdf file
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ F l/'g > v1
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ x l/'g > v2
cmp -l v1 v2
gives the output
228 106 F 170 x
23525 106 F 170 x
37737 106 F 170 x
48787 106 F 170 x
52577 106 F 170 x
56833 106 F 170 x
57869 106 F 170 x
118322 106 F 170 x
119342 106 F 170 x
where the numbers in the first column are the offsets, within the od output, at which the sought pattern starts. These offsets are four times the real byte offsets, since od's -c output uses four characters for every input byte.
A single line form (in a bash shell), without writing large temporary files, would be
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ x l/'g | cmp -lb - <(od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F l/ F l/'g )
this avoids needing to write the contents of /dev/sdd1 to temporary files somewhere.
Here is an example looking for PDF on a USB drive device, dividing by 4 and by 512 to get 512-byte block numbers:
dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | cmp -lb - <(dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/P D F/x D F/'g ) | awk '{print int($1/512/4)}' | head -10
testing this gives
100000+0 records in
100000+0 records out
51200000 bytes transferred in 18.784280 secs (2725683 bytes/sec)
100000+0 records in
100000+0 records out
51200000 bytes transferred in 40.915697 secs (1251353 bytes/sec)
cmp: EOF on -
28913
32370
32425
33885
35097
35224
37177
38522
39981
41570
where numbers are 512 byte block numbers. Checking gives
dd if=/dev/disk5s1 bs=512 skip=35224 count=1 | od -vc | grep P
0000340 \0 \0 \0 001 P D F C A R O \0 \0 \0 \0
Here is what a full example looks like with an actual disk, searching for the character sequence live where the characters are separated by NULs:
dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/l \\0 i \\0 v \\0 e/x \\0 i \\0 v \\0 e/'g | cmp -lb - <(dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/l \\0 i \\0 v \\0 e/l \\0 i \\0 v \\0 e/'g )
Note
This would not handle fragmentation into non-consecutive blocks when the split falls inside the pattern. The second sed, which does the pattern match and substitution, could be replaced by a custom program that does a partial pattern match and substitutes whenever the number of matching characters is above some threshold. That might return false positives, but it is probably the only way to deal with fragmentation.

Related

How to count number of occurrence consecutive pattern spanning over lines in Bash?

For example, I have a file like this. How can I count the number of occurrences of consecutive N's spanning over lines?
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
CACTGCTGTCACCCTCCATGCACCTGCCCACCCTCCAAGGATCNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNGgtgtgtatatatcatgtgtgatgtgtggtgtgtg
gggttagggttagggttaNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNAGaggcatattgatctgttgttttattttcttacag
ttgtggtgtgtggtgNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
The expected result is 4, because there are 4 groups of N's.
I tried grep -Eozc 'N+', but the result is 1.
If possible, I would also like the line number and the length of each group of N's to be shown.
awk '$1=$1' FS='' OFS='\n' file | uniq -c | grep -c N
or
tr -d '\r\n' < file | grep -o 'N*' | grep -c .
Output:
4
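The question also asks for the length of each run. A small extension of the same tr/grep idea prints run index and length (line numbers are lost once the newlines are deleted, so only lengths are shown; the awk step is an addition, not part of the original answer):

```shell
tr -d '\r\n' < file | grep -o 'NN*' | awk '{print NR, length($0)}'
```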
In plain bash, without using any external command:
v=$(<file)X               # slurp the file; the X ensures a trailing N-run is followed by a non-N
v=${v//[[:space:]]}       # remove all whitespace, joining the lines
v=${v//N[^N]/ }           # mark the end of each N-run with a space
v=${v//[^ ]}              # delete everything that is not a space
echo ${#v}                # one space per N-run remains: this is the count
Output:
4
A little long, but straightforward:
< tmp.txt \
tr -d '\n' | # Strip newlines
tr -s N | # Collapse strings of Ns to a single N
tr -dC N | # Strip anything that *isn't* an N
wc -c # Count the resulting Ns
As a one-liner:
< tmp.txt tr -d '\n' | tr -s N | tr -dC N | wc -c
Invoke a Ruby One-Liner from Bash
You can do this as a Ruby one-liner from Bash, whether reading from a file or standard input. For example:
$ ruby -e 'puts ARGF.read.delete("\n").scan(/N+/).count' example.txt
4
$ ruby -e 'puts ARGF.read.delete("\n").scan(/N+/).count' <<< "$str"
4
The notion is to slurp the whole file, remove all the newlines, and then count the groups of consecutive N characters.
Note: If you want to ignore isolated N's, then just scan for /N{2,}/ instead. That will only count runs of two or more N characters.
Assuming that your data is in a file called test.txt:
We read all the data from it.
Show the lines that match our pattern (lines consisting only of N's).
Count the number of lines.
So here is the code that does this:
cat test.txt | egrep -oe "^N*$" | wc -l

How do I extract everything in a file after the first null character from shell

I have a file that looks like this: some ascii stuff\0some more ascii stuff\0and a little more ascii stuff\0.
I want to extract everything after the first \0. So my output after this process would be some more ascii stuff\0and a little more ascii stuff\0
How would I go about doing this? This is done within initramfs so my access to commands is somewhat limited. I do have cut, grep, and awk which I've been trying to get work, but I'm just not having any luck.
These utilities are mostly from busybox, with sh as the shell.
Easily done, with nothing but shell builtins (well, cat isn't a builtin, but you can substitute it with the actual intended consumer of your stream):
{ IFS= read -r -d '' _; cat; } <yourfile
read -d '' reads everything, one byte at a time, up to the first NUL on stdin. What's left on that stream, thus, is all the content after that NUL.
You can test it as follows:
printf '%s\0' one two three | { IFS= read -r -d '' _; hexdump -C; }
...which properly emits:
00000000 74 77 6f 00 74 68 72 65 65 00 |two.three.|
0000000a
If you have grep, you most likely also have sed.
This works for me:
echo -e "one\000two\000three" | sed 's/[^\o000]*\o000//'
Using gnu awk you can do this:
awk -F '\\0' 'NR == 1{sub($1 FS, "")} 1' file
some more ascii stuffand a little more ascii stuff
Verify output with od -c:
awk -F '\\0' 'NR == 1{sub($1 FS, "")} 1' file | od -c
0000000 s o m e m o r e a s c i i
0000020 s t u f f \0 a n d a l i t t
0000040 l e m o r e a s c i i s t
0000060 u f f \0 \n
0000065
I would use perl
perl -n0e 'print unless $.==1'
The -0 sets the record separator to the null byte, and the print prints everything except the first record.
Whether this works for you or not will depend on the version of awk available; this works for me with GNU awk 4.1.3:
echo -e 'some ascii stuff\0some more ascii stuff\0and a little more ascii stuff\0'| awk 'BEGIN{RS="\0";ORS="\t"} NR>1{print $0}'
some more ascii stuff and a little more ascii stuff

How to translate and remove non-printable characters? [duplicate]

I want to delete all the control characters from my file using linux bash commands.
There are some control characters, EOF (0x1A) in particular, that cause problems when I load my file into other software. I want to delete them.
Here is what I have tried so far:
this will list all the control characters:
cat -v -e -t file.txt | head -n 10
^A+^X$
^A1^X$
^D ^_$
^E-^D$
^E-^S$
^E1^V$
^F%^_$
^F-^D$
^F.^_$
^F/^_$
^F4EZ$
^G%$
This will list all the control characters using grep:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]'
+
1
-
-
1
%
-
.
/
matches the above output of cat command.
Now, I ran the following command intending to show all lines not containing control characters, but it still shows the same output as above (lines with control characters):
$ cat file.txt | head -n 10 | grep '[^[:cntrl:]]'
+
1
-
-
1
%
-
.
/
here is the output in hex format:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]' | od -t x2
0000000 2b01 0a18 3101 0a18 2004 0a1f 2d05 0a04
0000020 2d05 0a13 3105 0a16 2506 0a1f 2d06 0a04
0000040 2e06 0a1f 2f06 0a1f
0000050
as you can see, the hex values, 0x01, 0x18 are control characters.
I tried using the tr command to delete the control characters but got an error:
$ cat file.txt | tr -d "\r\n" "[:cntrl:]" >> test.txt
tr: extra operand `[:cntrl:]'
Only one string may be given when deleting without squeezing repeats.
Try `tr --help' for more information.
If I delete all control characters, I will also delete the newline and carriage return, which are used as the line ending on Windows. How do I delete all the control characters while keeping only the required ones, like "\r\n"?
Thanks.
Instead of using the predefined [:cntrl:] set, which as you observed includes \n and \r, just list (in octal) the control characters you want to get rid of:
$ tr -d '\000-\011\013\014\016-\037' < file.txt > newfile.txt
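A quick sanity check that this range keeps \r (octal 015) and \n (octal 012) while stripping other control characters (the sample bytes, including the 0x1A from the question, are arbitrary):

```shell
# The \001 and \032 bytes are deleted; \r and \n survive.
printf 'a\001b\032c\r\nd' | tr -d '\000-\011\013\014\016-\037' | od -c
```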
Based on this answer on unix.stackexchange, this should do the trick:
$ cat scriptfile.raw | col -b > scriptfile.clean
Try grep, like:
grep -o "[[:print:][:space:]]*" in.txt > out.txt
which will print only printable characters (letters, digits, punctuation) and space characters such as tab, newline, vertical tab, form feed, carriage return, and space.
To be less restrictive, and remove only control characters ([:cntrl:]), delete them by:
tr -d "[:cntrl:]"
If you want to keep \n (which is part of [:cntrl:]), then replace it temporarily to something else, e.g.
cat file.txt | tr '\r\n' '\275\276' | tr -d "[:cntrl:]" | tr "\275\276" "\r\n"
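And a check of the swap trick itself (assuming bytes \275 and \276 do not occur in the data):

```shell
# \r and \n are parked as \275 and \276, so tr -d "[:cntrl:]" cannot touch them;
# \001 and \032 are deleted; the \r\n pair survives the round trip.
printf 'a\001b\032c\r\nd' | tr '\r\n' '\275\276' | tr -d "[:cntrl:]" | tr '\275\276' '\r\n' | od -c
```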
A little late to the party: cat -v <file>
which I think is the easiest to remember of the lot!

How to remove the last CR char with `cut`

I would like to get a portion of a string using cut. Here is a dummy example:
$ echo "foobar" | cut -c1-3 | hexdump -C
00000000 66 6f 6f 0a |foo.|
00000004
Notice the \n char added at the end.
In that case there is no point in using cut to remove the last char, as in:
echo "foobar" | cut -c1-3 | rev | cut -c 1- | rev
I still get this extra, unwanted char, and I would like to avoid resorting to an extra command such as:
shasum file | cut -c1-16 | perl -pe chomp
The \n is added by echo. Instead, use printf:
$ echo "foobar" | od -c
0000000 f o o b a r \n
0000007
$ printf "foobar" | od -c
0000000 f o o b a r
0000006
It is funny that cut itself also adds a newline:
$ printf "foobar" | cut -b1-3 | od -c
0000000 f o o \n
0000004
So the solution seems to be applying printf to its output:
$ printf "%s" $(cut -b1-3 <<< "foobar") | od -c
0000000 f o o
0000003
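Alternatively, since $( ) command substitution strips trailing newlines, capturing into a variable first works too (a minor variation, not from the original answer):

```shell
foo=$(echo "foobar" | cut -c1-3)   # the trailing \n is stripped by $( )
printf '%s' "$foo" | od -c         # 3 bytes: f o o, no newline
```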

Convert binary data to hexadecimal in a shell script

I want to convert binary data to hexadecimal, just that, no fancy formatting and all. hexdump seems too clever, and it "overformats" for me. I want to take x bytes from /dev/random and pass them on as hexadecimal.
Preferably I'd like to use only standard Linux tools, so that I don't need to install it on every machine (there are many).
Perhaps use xxd:
% xxd -l 16 -p /dev/random
193f6c54814f0576bc27d51ab39081dc
Watch out!
hexdump and xxd give the results in a different endianness!
$ echo -n $'\x12\x34' | xxd -p
1234
$ echo -n $'\x12\x34' | hexdump -e '"%x"'
3412
Simply explained. Big-endian vs. little-endian :D
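The mismatch disappears if hexdump is told to format one byte at a time, as with the /1 "%02x" format used elsewhere in this thread:

```shell
printf '\x12\x34' | hexdump -ve '/1 "%02x"' ; echo
# 1234
```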
With od (GNU systems):
$ echo abc | od -A n -v -t x1 | tr -d ' \n'
6162630a
With hexdump (BSD systems):
$ echo abc | hexdump -ve '/1 "%02x"'
6162630a
From Hex dump, od and hexdump:
"Depending on your system type, either or both of these two utilities will be available--BSD systems deprecate od for hexdump, GNU systems the reverse."
Perhaps you could write your own small tool in C, and compile it on-the-fly:
#include <stdio.h>
#include <unistd.h>

int main(void) {
    unsigned char data[1024];
    ssize_t numread, i;   /* ssize_t, so a read() error (-1) ends the loop */
    while ((numread = read(0, data, sizeof data)) > 0) {
        for (i = 0; i < numread; i++) {
            printf("%02x ", data[i]);
        }
    }
    return 0;
}
And then feed it from the standard input:
cat /bin/ls | ./a.out
You can even embed this small C program in a shell script using the heredoc syntax.
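The heredoc variant might look like this (a sketch assuming a C compiler is available as cc; the program is the same one shown above):

```shell
# Compile the hex-dumper on the fly from a heredoc, then run it on some input.
bin=$(mktemp)
cc -x c -o "$bin" - <<'EOF'
#include <stdio.h>
#include <unistd.h>

int main(void) {
    unsigned char data[1024];
    ssize_t numread, i;
    while ((numread = read(0, data, sizeof data)) > 0)
        for (i = 0; i < numread; i++)
            printf("%02x ", data[i]);
    return 0;
}
EOF
printf 'abc' | "$bin"   # prints: 61 62 63
rm -f "$bin"
```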
All the solutions seem to be hard to remember or too complex. I find using printf the shortest:
$ printf '%x\n' 256
100
But as noted in the comments, this is not what the author wants, so to be fair, the full answer follows.
... to use the above to output an actual binary data stream:
printf '%x\n' $(cat /dev/urandom | head -c 5 | od -An -vtu1)
What it does:
printf '%x\n' ... - prints a sequence of integers in hex; i.e. printf '%x,' 1 2 3 will print 1,2,3,
$(...) - command substitution: captures the output of a shell command for use in another
cat /dev/urandom - outputs random binary data
head -c 5 - limits the binary data to 5 bytes
od -An -vtu1 - octal dump command, here converting the bytes to unsigned decimal integers
As a testcase ('a' is 61 hex, 'p' is 70 hex, ...):
$ printf '%x\n' $(echo "apple" | head -c 5 | od -An -vtu1)
61
70
70
6c
65
Or to test an individual binary byte: on input, give it decimal 61 (the '=' char) to produce the binary data (the '\\x%x' format does that). The command then correctly outputs 3d (hex for decimal 61):
$ printf '%x\n' $(echo -ne "$(printf '\\x%x' 61)" | head -c 5 | od -An -vtu1)
3d
If you need a large stream (no newlines) you can use tr and xxd (part of Vim) for byte-by-byte conversion.
head -c1024 /dev/urandom | xxd -p | tr -d $'\n'
Or you can use hexdump (POSIX) for word-by-word conversion.
head -c1024 /dev/urandom | hexdump '-e"%x"'
Note that the difference is endianness.
dd + hexdump will also work:
dd bs=1 count=1 if=/dev/urandom 2>/dev/null | hexdump -e '"%x"'
Sometimes perl5 works better for portability if you target more than one platform. It comes with every Linux distribution and Unix OS. You can often find it in container images where other tools like xxd or hexdump are not available. Here's how to do the same thing in Perl:
$ head -c8 /dev/urandom | perl -0777 -ne 'print unpack "H*"'
5c9ed169dabf33ab
$ echo -n $'\x01\x23\xff' | perl -0777 -ne 'print unpack "H*"'
0123ff
$ echo abc | perl -0777 -ne 'print unpack "H*"'
6162630a
Note that this uses slurp mode (-0777), which causes Perl to read the entire input into memory; that may be suboptimal when the input is large.
These three commands will print the same (0102030405060708090a0b0c):
n=12
echo "$a" | xxd -l "$n" -p
echo "$a" | od -N "$n" -An -tx1 | tr -d " \n" ; echo
echo "$a" | hexdump -n "$n" -e '/1 "%02x"'; echo
Given that n=12 and $a is the byte values from 1 to 26:
a="$(printf '%b' "$(printf '\\0%o' {1..26})")"
That could be used to get $n random byte values in each program:
xxd -l "$n" -p /dev/urandom
od -vN "$n" -An -tx1 /dev/urandom | tr -d " \n" ; echo
hexdump -vn "$n" -e '/1 "%02x"' /dev/urandom ; echo
