I have a text file containing lots of RGB colour codes in decimal. For example
000,000,000
000,003,025
000,007,048
000,010,069
000,014,089
000,017,108
000,020,125
000,024,140
000,027,155
I would like to convert each line to hex format (desired output):
00,00,00
00,03,15
00,07,30
00,08,45
I know I can use printf "%.2x,%.2x,%.2x\n" 000 010 69; however, printf "%.2x,%.2x,%.2x\n" 000 010 069 does not work, as the leading zero makes printf treat 069 as octal, and 069 is not a valid octal number.
I thought awk would be a reasonable tool for the job, but I guess I would face the same problems converting decimals such as 069 etc.
perl -le '$hex = sprintf("%.2x,%.2x,%.2x",005,69,255); print $hex' also has the same issue with 069
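One workaround in plain bash is to force base 10 with the 10# prefix in arithmetic expansion, so leading zeros are no longer read as octal (a minimal sketch using the sample values above):

```shell
# bash's 10# prefix forces decimal interpretation of the operands
line='000,010,069'
IFS=, read -r r g b <<<"$line"
printf '%.2x,%.2x,%.2x\n' "$((10#$r))" "$((10#$g))" "$((10#$b))"
# prints 00,0a,45
```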
It works fine in awk:
$ echo 000,062,102 | awk '{printf( "%x,%x,%x\n", $1,$2,$3)}' FS=,
0,3e,66
You're likely just missing the commas between the printf arguments in your awk attempt:
echo "000,010,069" | awk -F ',' '{ printf "%02X,%02X,%02X\n", $1, $2, $3 }'
produces:
00,0A,45
Verified both on Mac OS X (BSD awk) and Linux (GNU awk).
Perl solution:
perl -pe 's/([0-9]+)/sprintf "%02x", $1/ge' INPUT
You do not have to care about octal interpretation. It applies to literals only, not to values of variables.
Related
I have a file with a Chinese word on each line, like this:
王大明
新型传染病
電子雷射
I want to append the number of Chinese characters to the end of each line:
王大明 3
新型传染病 5
電子雷射 4
How can I do this?
I know the sed and wc commands, but I cannot make this work. I tried many things, but clearly I need help here.
sed -i s/$/{length $0}/ myfile
sed -i s/$/{wc -m}/ myfile
awk '{$2=system(awk 'length') OFS $2} 1' myfile
What exactly will work will depend entirely on what exactly your input looks like. If you are dealing with Unicode glyphs, use a Unicode-aware tool such as e.g. Python.
bash$ cat uniline
#!/usr/bin/env python3
import sys
for line in sys.stdin:
    line = line.rstrip('\n')
    print(line, len(line))
bash$ chmod +x uniline
bash$ uniline <<\:
> 王大明
> 新型传染病
> 電子雷射
> :
王大明 3
新型传染病 5
電子雷射 4
(I had to trim some whitespace from the ends of the lines in the example you posted.)
For the record, my system encoding is UTF-8, meaning the first line's representation as bytes is
bash$ echo '王大明' | xxd
00000000: e78e 8be5 a4a7 e698 8e0a ..........
Perhaps see also Problematic questions about decoding errors for some relevant background.
If you are lucky, even Awk and wc might be locale-aware on your platform. Your sed attempts really have no chance of working (though if you have GNU sed you could try with the /e option; but really, probably don't). If you have GNU Awk and the en_US.UTF-8 locale defined, this works, too:
bash$ echo $'\xe7\x8e\x8b\xe5\xa4\xa7\xe6\x98\x8e' |
> LC_ALL=en_US.UTF-8 awk '{ print $0, length }'
王大明 3
If you're VERY certain the only multi-byte characters there are Chinese, then do

gawk/mawk/mawk2 '{ print $0, gsub(/\342|\343|\344|\345|\346|\347|\350|\351|\357|\360/, "&") }'
This list of leading bytes correctly accounts for the 3- and 4-byte code points used for Chinese characters, both simplified and traditional, plus all special compatibility variants.
Run that in either byte mode or Unicode mode and it will give you the same result. Your locale settings DO NOT matter here (as long as your input is already UTF-8 compliant text).
If you're definitely in byte-mode or LC_ALL=C, then
awk '{ print $0, gsub(/[\342-\351\357\360]/,"&") }'
One of the less-mentioned-but-excellent use case for gsub() is to use it for purposes of counting occurrences without having to do split() or substr().
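A tiny illustration of that trick: since gsub() returns the number of substitutions it made, replacing each match with itself ("&") counts occurrences while leaving the line unchanged.

```shell
# gsub() returns the substitution count; "&" puts the match back unchanged
echo 'banana' | awk '{ print gsub(/an/, "&"), $0 }'
# prints: 2 banana
```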
If you're REALLY pedantic about exactness, the hideous regex I use myself is
function isChinese(str6) { return (str6 ~
/\344|\345|\346|\347|\350|\351|
(\343|\360|\357)(\244|\245|\246|\247|
\250|\251|\252|\253)|(\357\271|
\343(\204|\207))(\200|\201|\202|\203|\204|
\205|\206|\207|\210|
\211|\212|\213|\214|\215|\216|\217)|(\343\206|
\357\270)(\260|\261|\262|\263|\264|\265|\266|\267|
\270|\271|\272|\273|\274|\275|\276|\277)|
(\343|\360)(\240|\241|\242|\243|\254|\255|\256|\257|\260|
\261)|\342(\272|\273|\274|\275|\276|\277(\200|
\210|\211|\212|\213|\214|\215|\216|\217))|
(\342\277|\343(\204|\206|\207))(\220|\221|\222|
\223|\224|\225|\226|\227|\230|\231|\232|\233|
\234|\235|\236|\237)|\343(\200|\210|\211|\212|
\213|\214|\215|\216|\217|\220|\221|\222|\223|
\224|\225|\226|\227|\230|\231|\232|\233|\234|
\235|\236|\237|\262|\263|\264|\265|\266|(\204|
\206|\207)(\240|\241|\242|\243|\244|
\245|\246|\247|\250|\251|\252|\253|\254|\255|\256|\257))/) };
I need to convert a string into a sequence of decimal ascii code using bash command.
example:
for the string 'abc' the desired output would be 979899 where a=97, b=98 and c=99 in ascii decimal code.
I was able to achieve this with ascii hex code using xxd.
printf '%s' 'abc' | xxd -p
which gives me the result: 616263
where a=61, b=62 and c=63 in ascii hexadecimal code.
Is there an equivalent to xxd that gives the result in ascii decimal code instead of ascii hex code?
If you don't mind the results being merged into one line, please try the following:
echo -n "abc" | xxd -p -c 1 |
  while read -r line; do
    echo -n "$(( 16#$line ))"
  done
Result:
979899
str=abc
printf '%s' $str | od -An -tu1
The -An gets rid of the address line, which od normally outputs, and the -tu1 treats each input byte as unsigned integer. Note that it assumes that one character is one byte, so it won't work with Unicode, JIS or the like.
If you really don't want spaces in the result, pipe it further into tr -d ' '.
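Combining the two, the full pipeline looks like this:

```shell
# od prints each byte as an unsigned decimal; tr strips the spaces
# (and the trailing newline) so the numbers run together
printf '%s' abc | od -An -tu1 | tr -d ' \n'; echo
# prints 979899
```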
Unicode Solution
What makes this problem annoying is that you have to process characters one at a time when converting. You can't do a simple conversion of the whole stream from char to hex to dec, as some characters' hex representations are longer than others.
Both of these solutions are compatible with unicode and use a character's code point. In both solutions, a newline is chosen as separator for clarity; change this to '' for no separator.
Bash
sep='\n'
charAry=($(printf 'abc🎶' | grep -o .))
for i in "${charAry[@]}"; do
printf "%d$sep" "'$i"
done && echo
97
98
99
127926
Python (in Bash)
Here, we use a list comprehension to convert every character to a decimal number (ord), join it as a string and print it. sys.stdin.read() allows us to use Python inline to get input from a pipe. If you replace input with your intended string, this solution is then cross-platform.
printf '%s' 'abc🎶' | python -c "
import sys
input = sys.stdin.read()
sep = '\n'
print(sep.join([str(ord(i)) for i in input]))"
97
98
99
127926
Edit: If all you care about is using hex regardless of encoding, use @user1934428's answer
How to replace the last character in column 2 with value 0
input
1232;1001;1
2231;2007;1
2234;2009;2
2003;1114;1
output desired
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
Modifying Input with gensub()
You can use any number of GNU awk string functions to do this, but the gensub() function is particularly useful. It has the signature:
gensub(regexp, replacement, how [, target])
which makes it extremely flexible for these sorts of transformations.
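A minimal illustration of the how argument (this requires GNU awk, which provides gensub(); unlike sub()/gsub(), it returns the modified string rather than editing in place):

```shell
# how = 2 replaces only the 2nd occurrence; "g" would replace all
echo 'a-b-c-d' | gawk '{ print gensub(/-/, "+", 2) }'
# prints a-b+c-d
```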
Converting Your Example
# Store your input in a shell variable for MCVE convenience, although
# you can have this data in a file or pass it on standard input if you
# prefer.
example_input='1232;1001;1
2231;2007;1
2234;2009;2
2003;1114;1'
# Use awk's gensub() string function.
echo "$example_input" | awk '{print gensub(/.;/, "0;", 2, $1)}'
This results in the following output:
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
awk approach:
awk -F';' '{ sub(/.$/,0,$2) }1' OFS=';' file
The output:
1232;1000;1
2231;2000;1
2234;2000;2
2003;1110;1
Or the same with the substr() function (note that awk strings are 1-indexed):
awk -F';' '{ $2=substr($2,1,length($2)-1)"0" }1' OFS=';' file
not necessarily better, but a mathematical approach for numerical data...
$ awk 'BEGIN{FS=OFS=";"} {$2=int($2/10)*10}1'
This rounds down the last digit (the ones); to round down two digits (ones and tens), replace 10 with 100.
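For instance, with 100 the tens digit is zeroed as well (a hypothetical input line, not from the question's data):

```shell
# int($2/100)*100 truncates the last two digits of field 2
echo '1232;1147;1' | awk 'BEGIN{FS=OFS=";"} {$2=int($2/100)*100}1'
# prints 1232;1100;1
```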
Or, simple replacement is easier with GNU sed
$ sed 's/.;/0;/2'
I would do that with sed:
sed -e 's/^\([^;]*;[^;]*\).;/\10;/' filename
I have strings like following which should be parsed with only unix command (bash)
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
I want to trim the strings like above upto 4th underscore from end/right side. So output should be
49_sftp_mac_myfile_simul_test
Number of underscores can vary in overall string. For example, The string could be
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
Output should be (after trimming up to the 4th occurrence of underscore from the right):
49_sftp_simul_test
Easily done using awk by decrementing NF (the number of fields) by 4, after setting both the input and output field separators to underscore:
s='49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
awk 'BEGIN{FS=OFS="_"} {NF -= 4; $1=$1} 1' <<< "$s"
49_sftp_mac_myfile_simul_test
You can use bash's parameter expansion for that:
string="..."
echo "${string%_*_*_*_*}"
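Because % removes the shortest matching suffix, the four _* pairs trim exactly the last four underscore-separated fields; compare with %%, which takes the longest match:

```shell
string='49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed'
echo "${string%_*_*_*_*}"   # shortest match: trims the last four _-fields
# 49_sftp_simul_test
echo "${string%%_*}"        # longest match: trims everything after the first _
# 49
```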
With GNU sed:
$ sed -E 's/(_[^_]*){4}$//' <<< "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"
49_sftp_mac_myfile_simul_test
From the end of line, removes 4 occurrences of _ followed by non _ characters.
Perl one-liner
echo "$your_string" | perl -lne '$n++ while /_/g; print join "_",((split/_/)[-$n-1..-5])'
input
49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_mac_myfile_simul_test
input
49_sftp_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed
the output
49_sftp_simul_test
Not the fastest, but maybe the easiest to remember and the funniest:
echo "49_sftp_mac_myfile_simul_test_9999_4000000000000001_2017-02-06_15-15-26.49.csv.failed"|
rev | cut -d"_" -f5- | rev
I have a 2GB file in raw format. I want to search for all appearance of a specific HEX value "355A3C2F74696D653E" AND collect the following 28 characters.
Example: 355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135
In this case I want the output: "323031312D30342D32365431343A34373A30322D31343A34373A3135" or better: 2011-04-26T14:47:02-14:47:15
I have tried with
xxd -u InputFile | grep '355A3C2F74696D653E' | cut -c 1-28 > OutputFile.txt
and
xxd -u -ps -c 4000000 InputFile | grep '355A3C2F74696D653E' | cut -b 1-28 > OutputFile.txt
But I can't get it working.
Can anybody give me a hint?
As you are using xxd it seems to me that you want to search the file as if it were binary data. I'd recommend using a more powerful programming language for this; the Unix shell tools assume there are line endings and that the text is mostly 7-bit ASCII. Consider using Python:
#!/usr/bin/env python3
import mmap

needle = b"\x35\x5A\x3C\x2F\x74\x69\x6D\x65\x3E"  # "5Z</time>" as bytes
with open("file_to_search", "rb") as fd:
    # mmap lets us search the 2 GB file without reading it all into memory
    haystack = mmap.mmap(fd.fileno(), length=0, access=mmap.ACCESS_READ)
    i = haystack.find(needle)
    while i >= 0:
        i += len(needle)
        print(haystack[i:i + 28].decode("ascii", errors="replace"))
        i = haystack.find(needle, i)
If your grep supports -P parameter then you could simply use the below command.
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{28}'
323031312D30342D32365431343A
For 56 chars,
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{56}'
323031312D30342D32365431343A34373A30322D31343A34373A3135
Why convert to hex first? See if this awk script works for you. It looks for the string you want to match on, then prints the next 28 characters. Special characters are escaped with a backslash in the pattern.
Adapted from this post: Grep characters before and after match?
I added some blank lines for readability.
VirtualBox:~$ cat data.dat
Thisis a test of somerandom characters before thestringI want5Z</time>2011-04-26T14:47:02-14:47:15plus somemoredata
VirtualBox:~$ cat test.sh
awk '/5Z\<\/time\>/ {
match($0, /5Z\<\/time\>/); print substr($0, RSTART + 9, 28);
}' data.dat
VirtualBox:~$ ./test.sh
2011-04-26T14:47:02-14:47:15
VirtualBox:~$
EDIT: I just realized something. The regular expression will need to be tweaked to be non-greedy, and the awk script needs tweaking to handle multiple occurrences as you need them. Perhaps some of the folks more up on awk can chime in with improvements, as I am really rusty. An approach to consider, anyway.
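One way to handle multiple occurrences per line is to loop with match(), advancing past each hit; a sketch using the same 5Z</time> marker and the data.dat file from above:

```shell
awk '{
    s = $0
    # match() sets RSTART/RLENGTH; after each hit, print the 28
    # characters that follow the marker, then continue past it
    while (match(s, /5Z<\/time>/)) {
        print substr(s, RSTART + RLENGTH, 28)
        s = substr(s, RSTART + RLENGTH)
    }
}' data.dat
```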