How can I change the number of columns printed by `hexdump`?

How can I make hexdump print 21 columns instead of the default 16?
Alternatively, where can I find the default format string that hexdump uses, so that I can change the number in it?

The default format seems to be equivalent to:
hexdump -e '"%07.7_Ax\n"' -e '"%07.7_ax " 8/2 "%04x " "\n"'
From man hexdump:
Implement the -x option:
"%07.7_Ax\n"
"%07.7_ax " 8/2 "%04x " "\n"
To fully understand hexdump's format strings you'll have to read the manual, but here's a short walkthrough of the format above:
The first part, %07.7_Ax\n, displays the final line, which contains only the offset. Per the manual:
_a[dox] Display the input offset, cumulative across input files, of the
next byte to be displayed. The appended characters d, o, and x
specify the display base as decimal, octal or hexadecimal
respectively.
_A[dox] Identical to the _a conversion string except that it is only
performed once, when all of the input data has been processed.
For the second format string: the "%07.7_ax " part prints the offset at the start of each line. 8/2 means 8 iterations of 2 bytes each for the format that follows, namely "%04x ", so each line covers 16 bytes. Finally, "\n" ends the line.
I'm not sure how you want your 21 bytes grouped. Printing them one byte at a time (21/1) as two-digit hex values may do:
hexdump -e '"%07.7_Ax\n"' -e '"%07.7_ax " 21/1 "%02x " "\n"'
And if you don't need the offsets, drop them:
hexdump -e '21/1 "%02x " "\n"'
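To make the column count a parameter, the two format strings above can be wrapped in a small shell function. This is only a sketch: the name hexcols is hypothetical, and it assumes a hexdump that accepts the man-page format strings shown above.

```shell
# hexcols N: hex-dump stdin with N bytes per line, offsets on the left.
# Hypothetical helper; N defaults to 16 if omitted.
hexcols() {
  n=${1:-16}
  # Same two -e formats as above, with the iteration count taken from $n.
  hexdump -e '"%07.7_Ax\n"' -e '"%07.7_ax " '"$n"'/1 "%02x " "\n"'
}

printf 'abcdefghijklmnopqrstuvwxyz' | hexcols 21
```

With 21 columns, the first line holds the offset plus 21 byte values, and the second line starts at offset 0000015 (21 in hex).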

Related

Replace double quotes that come in pairs

I'd like to replace double-quote (") characters that come in pairs. Let me explain what I mean.
"Some sentence"
Here the double quotes should be replaced because they come in a pair.
"Some sentence
Here nothing should be replaced: the first quote character has no matching partner.
I'd like to replace the opening quote character with „:
❯ echo „ |hexdump -C
00000000 e2 80 9e 0a
And the closing quote character with ”:
❯ echo ” |hexdump -C
00000000 e2 80 9d 0a
To sum up, the following:
Hi, "how
are you"
should become the following after the replacement:
Hi, „how
are you”
I've come up with the following code, but it doesn't work:
'sed -r s/(\")(.+)(\")/\1\xe2\x80\x9e\3\xe2\x80\x9d/g'
For example, " hi " becomes "„"”.
EDIT
As requested in the comments, here is a sample from a file to be modified. Important note: the file is structured, which may help. It is always an SRT file, i.e. in the movie subtitle format.
104
00:10:25,332 --> 00:10:27,876
Kobieta mówi do drugiej:
"Widzisz to, co ja?"
105
00:10:28,001 --> 00:10:30,904
A tamta: "No to co?
Każdy wygląda tak samo."
Your expression doesn't work because you have three capturing groups (the three sets of ()). You are putting the 1st (the first quote) and the 3rd (the last quote) in the output and ignoring the 2nd, which is the part you want to keep.
There's no reason to capture the quotes, since you don't want to inject them into the output. Only the bit in the middle needs to be captured.
There is also a second flaw: .+ (or .*) will itself match a string containing a quote. So /"(.*)"/ would match the entire sequence "one"two", with the capture (.*) matching one"two. Use [^"]* to match a sequence of non-quote characters.
Fixing this, and treating the entire file as one line with GNU sed's -z (which only works if the file contains no NUL characters), the following appears to work:
sed -zE 's/"([^"]+)"/„\1”/g'
sed -rn ':a;s/"([^"]*)"/„\1”/g;/"/!{p;b;};$p;N;ba'
It substitutes every "xx" with „xx”. If the result contains no remaining ", it is printed and processing restarts with the next line. Otherwise the next line is appended (N) and the substitution is retried. The $p is only there to print the last lines if they contain an unmatched ".
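To check both answers against the SRT sample from the question, they can be run side by side. This is a sketch assuming GNU sed and UTF-8 text; whole_file and line_loop are just illustrative names for the two one-liners.

```shell
# The SRT sample from the question (subtitle blocks 104 and 105).
sample='104
00:10:25,332 --> 00:10:27,876
Kobieta mówi do drugiej:
"Widzisz to, co ja?"
105
00:10:28,001 --> 00:10:30,904
A tamta: "No to co?
Każdy wygląda tak samo."'

# Answer 1: read the whole input as one record (-z), so a pair may span lines.
whole_file() { sed -zE 's/"([^"]+)"/„\1”/g'; }

# Answer 2: keep appending lines (N) until no unmatched " is left.
line_loop() { sed -rn ':a;s/"([^"]*)"/„\1”/g;/"/!{p;b;};$p;N;ba'; }

printf '%s\n' "$sample" | whole_file
```

Both give the same result here: „Widzisz to, co ja?” on one line, and a pair that spans the two text lines of block 105.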

How to "decdump" a string in bash?

I need to convert a string into a sequence of decimal ASCII codes using a bash command.
For example:
for the string 'abc' the desired output would be 979899, where a=97, b=98 and c=99 in decimal ASCII.
I was able to achieve this with hex ASCII codes using xxd:
printf '%s' 'abc' | xxd -p
which gives me the result: 616263
where a=61, b=62 and c=63 in hexadecimal ASCII.
Is there an equivalent to xxd that gives the result in decimal instead of hex?
If you don't mind the results being merged into a single line, try the following:
echo -n "abc" | xxd -p -c 1 |
while read -r line; do
echo -n "$(( 16#$line ))"
done
Result:
979899
str=abc
printf '%s' "$str" | od -An -tu1
The -An suppresses the address (offset) column that od normally prints, and -tu1 prints each input byte as an unsigned decimal integer. Note that it works byte by byte, so it assumes one character is one byte; with multibyte encodings such as UTF-8 or JIS you get the individual byte values rather than one number per character.
If you really don't want spaces in the result, pipe it further into tr -d ' '.
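Wrapped as a function, the od approach becomes a decimal counterpart of xxd -p. This is a sketch (decdump is a hypothetical name). It adds -v, because by default od collapses repeated 16-byte lines into a single *, which would silently drop digits from long, repetitive inputs.

```shell
# decdump STRING: print each byte of STRING as an unsigned decimal number,
# with no separators, like xxd -p does in hex. Hypothetical helper name.
decdump() {
  # -An: no offset column; -v: don't replace repeated lines with '*';
  # -tu1: one unsigned decimal per byte. tr then strips od's padding.
  printf '%s' "$1" | od -An -v -tu1 | tr -d ' \n'
  echo    # od's newlines were removed too, so end the line here
}

decdump abc
```

For 'abc' this prints 979899.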
Unicode Solution
What makes this problem annoying with multibyte text is that you have to split the input into characters before converting to decimal. A simple char-to-hex-to-decimal conversion doesn't work, because some characters' hex representations are longer than others.
Both of these solutions are Unicode-aware and print each character's code point (the Bash one needs a UTF-8 locale). In both, a newline is used as the separator for clarity; change it to '' for no separator.
Bash
sep='\n'
charAry=($(printf 'abc🎶' | grep -o .))
for i in "${charAry[@]}"; do
printf "%d$sep" "'$i"
done && echo
97
98
99
127926
Python (in Bash)
Here, a list comprehension converts every character to its decimal code point (ord), and the results are joined into one string and printed. sys.stdin.read() lets the inline Python read its input from the pipe. If you replace input with your intended string, the solution is cross-platform.
printf '%s' 'abc🎶' | python3 -c "
import sys
input = sys.stdin.read()
sep = '\n'
print(sep.join([str(ord(i)) for i in input]))"
97
98
99
127926
Edit: If all you need is the per-byte values regardless of encoding, use @user1934428's answer.

sed command to replace char at a specific byte position

I have two files, say test1 and test2, that differ at some arbitrary position. I want to find the position of the character in test1 that differs from test2 and replace that character in test1 with *. The constraint is that I don't know the character itself, only its position.
I tried cmp -b to get the byte position of the difference, but I can't find anything in sed (or elsewhere) that replaces the character at a given byte position, nor a comparison tool that reports both the line number and the character position within the line. I can't substitute by character value, because I only want the change at that one position, not elsewhere in the file. Replacing the first occurrence with sed won't work either, since the first occurrence may come before the differing position.
So you just want to overwrite one byte at a given offset without truncating the file? Maybe dd will help:
Setup:
$ cat f1
aaaaaaaaaaaaaaaaaaaaaaaaa
$ cat f2
aaaaaaaaaaaaaaabaaaaaaaaa
Script:
if ! CMPOUT=$(cmp -b f1 f2); then
  POS=$(printf '%s\n' "$CMPOUT" | sed -r 's/^.*: byte ([0-9]+),.*$/\1/')
  printf '*' | dd of=f2 seek="$((POS-1))" bs=1 count=1 conv=notrunc
fi
Result:
$ cat f2
aaaaaaaaaaaaaaa*aaaaaaaaa
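The same steps can be packaged as a reusable function that takes the two file names. This is a sketch under stated assumptions: mark_first_diff is a hypothetical name, the replacement must be a single byte (dd writes exactly one), and the sed pattern accepts both "byte N" and "char N", since the wording of cmp's message differs between implementations.

```shell
# mark_first_diff A B [CHAR]: overwrite, in file B, the first byte that
# differs from file A with CHAR (default '*', must be one byte).
mark_first_diff() {
  out=$(cmp "$1" "$2") && return 0    # files identical: nothing to do
  # Pull N out of "A B differ: byte N, line M" (or "char N").
  pos=$(printf '%s\n' "$out" | sed -E 's/^.*: (byte|char) ([0-9]+),.*$/\2/')
  case $pos in
    ''|*[!0-9]*) return 1 ;;          # e.g. EOF on one file: no byte to replace
  esac
  # cmp counts from 1, dd's seek from 0; conv=notrunc keeps the rest of B.
  printf '%s' "${3:-*}" | dd of="$2" bs=1 seek=$((pos - 1)) count=1 conv=notrunc 2>/dev/null
}

mark_first_diff f1 f2
```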

Replace 2nd occurrence of a special character after nth occurrence of a delimiter in a string, in unix/linux

My question is:
Replace the 2nd or all occurrences of a special character after the nth occurrence of a delimiter in a string, in unix/linux
or
Replace a "text qualifier" character inside a data field in unix.
In the string below, the stray '"' (double quote) characters inside a field should be replaced with spaces.
String:
"123"~"23"~"abc"~24.50~"descr :- nut size 12" & bolt size 12"1/2, Quantity=20"~"2013-03-13"
From the string above, I want this output:
"123"~"23"~"abc"~24.50~"descr :- nut size 12 & bolt size 12 1/2, Quantity=20"~"2013-03-13"
In other words, the stray double-quote characters inside the field are replaced with spaces:
"descr :- nut size 12" & bolt size 12"1/2, Quantity=20"
becomes
"descr :- nut size 12 & bolt size 12 1/2, Quantity=20"
I want to identify such rows in a file and replace these stray text-qualifier characters in the data, in Unix/Linux.
Any input is appreciated; thanks in advance.
I would use plain read to get the fields and then modify the ones you want using sed or shell variable substitution:
echo '"123"~"23"~"abc"~24.50~"descr :- nut size 12" & bolt size 12"1/2, Quantity=20"~"2013-03-13"' | {
IFS='~' read -r a b c d e f
e=${e#\"}; e=${e%\"}   # strip the field's enclosing quotes so they survive
printf '%s~%s~%s~%s~"%s"~%s\n' "$a" "$b" "$c" "$d" "$(sed 's/"/ /g' <<<"$e")" "$f"
# or:
printf '%s~%s~%s~%s~"%s"~%s\n' "$a" "$b" "$c" "$d" "${e//\"/ }" "$f"
}
IFS ("Internal Field Separator") is a shell variable that tells the shell how to split fields, e.g. when using read; here it makes ~ the separator. Prefixing the assignment to the read command makes it apply only for the duration of that command.
I'm going to assume you work in the bash shell.
awk(1) can help you split your input string at the right positions, using its -F option:
echo xyzabcdef | awk -Fb '{print $1}'
gives you "xyza", the first string before the separator.
Then, the tr(1) utility can help you replace characters:
tr '"' ' '
replaces every '"' with ' '. I hope this helps to get you in the right direction.
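Combining the two ideas, awk can split on ~ while gsub() plays the role of tr, but only inside fields that are wrapped in double quotes, so the enclosing qualifiers are kept. This is a sketch assuming ~ never appears inside a field; fix_qualifiers is a hypothetical name.

```shell
# fix_qualifiers [FILE...]: in every "~"-separated field that starts and
# ends with a double quote, turn any inner double quotes into spaces.
fix_qualifiers() {
  awk 'BEGIN { FS = OFS = "~" }
  {
    for (i = 1; i <= NF; i++) {
      if (length($i) >= 2 && $i ~ /^".*"$/) {
        inner = substr($i, 2, length($i) - 2)  # text between the qualifiers
        gsub(/"/, " ", inner)                  # stray quotes become spaces
        $i = "\"" inner "\""                   # put the qualifiers back
      }
    }
    print
  }' "$@"
}

echo '"123"~"23"~"abc"~24.50~"descr :- nut size 12" & bolt size 12"1/2, Quantity=20"~"2013-03-13"' | fix_qualifiers
```

Note that 12" & becomes 12 followed by two spaces, whereas the question's expected output shows one; an extra gsub(/  +/, " ", inner) would squeeze them if that matters.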

How to use grep to extract a line with an inconsistent number of space characters

Say I have a text file with:
+
Code here
+
Code here +
Code+ here
+
Code here
I want to grep this file and show only the lines containing nothing but a + character. What regex should I use?
Please advise.
Many thanks.
If I understand correctly, you want the lines with only a +, no whitespace. To do that, use ^ and $ to match the beginning and end of the line, respectively:
grep '^+$' filename
If you want a line with nothing but the + character, use:
grep '^+$' inputFile
This uses the start and end markers to ensure it only has that one character.
However, if you want lines with only a + character, possibly surrounded by spaces (as seems to be indicated by your title), you would use something like:
grep '^ *+ *$' inputFile
The sections are:
"^", start of line marker.
" *", zero or more spaces.
"+", your plus sign.
" *", zero or more spaces.
"$", end of line marker.
The following transcript shows this in action:
pax> echo ' +
Code here
 +
Code here +
Code+ here
 + 
Code here' | grep '^ *+ *$' | sed -e 's/^/OUTPUT:>/' -e 's/$/</'
OUTPUT:> +<
OUTPUT:> +<
OUTPUT:> + <
The input data has been slightly modified, and a sed filter has been added, to show how it handles spacing on either side.
And if you want general whitespace rather than just spaces, change the space terms from " *" to "[[:space:]]*" (or "\s*" in GNU grep).
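Putting the portable whitespace class into the regex above gives a filter that also accepts tabs. This is a sketch; only_plus_lines is a hypothetical name. In a basic regex an unescaped + is a literal plus sign, so no escaping is needed.

```shell
# only_plus_lines [FILE...]: keep lines made of a single '+', optionally
# surrounded by spaces or tabs ([[:space:]] is the POSIX form of \s).
only_plus_lines() {
  grep '^[[:space:]]*+[[:space:]]*$' "$@"
}

printf ' +\nCode here\n+\nCode here +\nCode+ here\n\t+ \nCode here\n' | only_plus_lines
```

Of the seven sample lines, this keeps the three that hold nothing but a +.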
