Trim the output on the command line (Unix/Linux)

I have output from a command, and I need to trim it to get the desired output shown below:
Input:
['0x66']
['0x66', '0x137', '0xa9']
[]
['0x148', '0x11a', '0x167', '0x151', '0xe6']
[]
['0x171', '0xe2', '0x174']
Output:
0x66
0x66
0x137
0xa9
0x148
0x11a
0x151
0xe6
I used tr -d "[]'," but after removing those characters, does Linux have a command like Python's .split()?
[EDIT] After looking at the man page for tr, I see that it can also translate characters, so I piped the whole output to:
output | tr -d "[]'," | tr " " "\n"

I suggest using simple grep -o (show only matching text):
grep -o "0x[^']*" file
0x66
0x66
0x137
0xa9
0x148
0x11a
0x167
0x151
0xe6
0x171
0xe2
0x174
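If the input might contain other quoted text besides hex values, a stricter pattern is one option. This is only a sketch: the helper name extract_hex is made up, and it assumes every wanted value is 0x followed by hex digits. Note that -o is a GNU/BSD grep extension rather than POSIX.

```shell
# Hypothetical helper: print every 0x-prefixed hex literal on its own line.
extract_hex() {
  grep -oE '0x[0-9a-fA-F]+'
}

# Feed it a few of the sample lines from the question.
printf '%s\n' "['0x66']" "['0x66', '0x137', '0xa9']" "[]" | extract_hex
```

Empty lists simply produce no matches, so no extra step is needed to drop blank lines.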

How about piping to this:
| tr -d "[]' " | tr ',' '\n' | sed -n '/^.\+$/p'
It first deletes useless chars, then "splits" the fields to their own lines and then removes any empty ones.
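The three steps can be sketched as a small function. The name split_list is made up for illustration, and grep . stands in for the sed call because \+ in a basic regular expression is a GNU extension.

```shell
# Hypothetical helper: turn Python-style list lines into one value per line.
split_list() {
  tr -d "[]' " |   # step 1: delete brackets, single quotes and spaces
    tr ',' '\n' |  # step 2: "split" on commas, one field per line
    grep .         # step 3: keep only lines with at least one character
}

# Sample lines taken from the question.
printf '%s\n' "['0x66']" "['0x66', '0x137', '0xa9']" "[]" | split_list
```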

With tr and a little help from sed:
tr -d "[]' " <file.txt | sed -n '/./ p' | tr ',' '\n'
tr -d "[]' " removes all [ and ] characters, spaces, and single quotes from the data.
The sed part matches lines with at least one character, which leaves out the empty lines.
Then tr ',' '\n' converts all commas to newlines.
% cat file.txt
['0x66']
['0x66', '0x137', '0xa9']
[]
['0x148', '0x11a', '0x167', '0x151', '0xe6']
[]
['0x171', '0xe2', '0x174']
% tr -d "[]' " <file.txt | sed -n '/./ p' | tr ',' '\n'
0x66
0x66
0x137
0xa9
0x148
0x11a
0x167
0x151
0xe6
0x171
0xe2
0x174

How to join lines, each wrapped in single or double quotes

How can I wrap each line of a file in single or double quotes and join the lines, separated by commas?
Example:
I have the names below.
$ cat file
James kurt
Suji sane
Bhujji La
Loki Hapa
Desired:
"James kurt", "Suji sane", "Bhujji La", "Loki Hapa"
EDIT:
My attempt:
This is what I have so far, but it takes two steps. I'm curious whether it can be combined into one.
$ awk '{print "\x22" $1" "$2 "\x22"}' file | tr '\n' ','
First print all lines with the " and then join the lines with a comma:
< file xargs -d '\n' printf '"%s"\n' | paste -sd,
Instead of newline you could just remove trailing (or leading comma):
< file xargs -d '\n' printf '"%s",' | sed 's/,$//'
< file xargs -d '\n' printf ',"%s"' | cut -c2-
< file xargs -d '\n' printf ', "%s"' | cut -c3- # with space after comma
With sed, add the " to each line and append it to the hold space; then, on the last line, replace the newlines with commas, remove the leading comma, and print:
sed -n 's/^/"/;s/$/"/;H;${x;s/\n/, /g;s/^, //;p}' file
You were close! The " " in your attempt adds a space between the line and ". You could:
awk '{print "\x22" $0 "\x22"}' | tr '\n' ',' |
# and then remove trailing comma:
sed 's/,$//'
But joining the lines with paste is simpler than replacing the newlines with commas and then removing the last one:
awk '{print "\x22" $0 "\x22"}' | paste -sd,
Please try the following:
awk -v lines=$(wc -l < Input_file) -v s1="\"" '
BEGIN{
OFS=", "
}
{
printf("%s%s",s1 $0 s1,lines==FNR?ORS:OFS)
}
' Input_file
Explanation: a commented version of the above.
awk -v lines=$(wc -l < Input_file) -v s1="\"" ' ##Start the awk program; lines holds the total number of lines in Input_file and s1 holds a double quote.
BEGIN{ ##Start of the BEGIN section.
OFS=", " ##Set OFS to a comma followed by a space.
}
{
printf("%s%s",s1 $0 s1,lines==FNR?ORS:OFS) ##Print the current line wrapped in s1, followed by ORS on the last line and OFS otherwise.
}
' Input_file ##Name of the input file.
awk '{printf "%s",(NR==1?"":",")"\042"$0"\042"}END{print ""}'
Note that the END block is only there to add the final newline to the output. This makes it POSIX compliant.
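The same one-pass idea can be wrapped in a function that uses the comma-plus-space separator from the desired output. This is a sketch, not a drop-in answer: quote_join is a made-up name, and the separator is hard-coded.

```shell
# Hypothetical helper: wrap each input line in double quotes and join with ", ".
quote_join() {
  awk '{ printf "%s\"%s\"", (NR > 1 ? ", " : ""), $0 }  # separator before every line but the first
       END { if (NR) print "" }'                        # final newline only if there was input
}

printf '%s\n' 'James kurt' 'Suji sane' 'Bhujji La' 'Loki Hapa' | quote_join
```

Using $0 rather than $1 and $2 keeps lines with any number of words intact.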
This might work for you (GNU sed):
sed ':a;N;$!ba;s/.*/"&"/mg;s/\n/, /g' file
Slurp file into the pattern space, surround lines by double quotes and replace newlines by a comma and a space.
Alternative:
sed -z 's/\n$//;s/.*/"&"/mg;s/\n/, /g;s/$/\n/' file

how to replace a specific char occurrences in string after a given substring

I have a string containing key=value pairs separated by #.
I am trying to replace the '=' characters with ':' in the value of TITLE using a bash script.
"ID=21566#OS=Linux#TARGET_END=Synchronica#DEPENDENCY=Independent#AUTOMATION_OS=Linux#AUTOMATION_TOOL=JSystem#TITLE=Session tracking. "DL Started" Status Reported.Level=none"
Later on, I parse this string and run it through eval:
eval $(echo $test_line | sed 's/"//g' | tr '#' '\n' | tr ' ' '_' | sed 's/=/="/g' | sed 's/$/"/g')
The sed 's/=/="/g' step also changes ..Level=none to
Level="none
This leads to
eval: line 52: unexpected EOF while looking for matching `"'
What is the right bash command to do this replacement in my string?
As an alternative, consider a pure-bash solution that brings the variables into bash and avoids the (risky) eval.
IFS=# read -a kv <<<"ID=21566#OS=Linux#TARGET_END=Synchronica#..."
for kvp in "${kv[@]}" ; do
    declare "$kvp"
done
I found a way to solve it.
I added sed 's/=/:/8g' to my eval command.
With GNU sed, it replaces the 8th and all later occurrences of '='.
This only affects the value of TITLE, as expected.
eval $(echo $test_line | sed 's/=/:/8g' | sed 's/"//g' | tr '#' '\n' | tr ' ' '_' | sed 's/=/="/g' | sed 's/$/"/g')
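Counting occurrences breaks as soon as a field is added or removed. A variant that anchors on the TITLE= substring instead might look like the sketch below. The name fix_title is made up, the sample line is shortened from the question, and the code assumes TITLE is the last field, as it is in the question's string.

```shell
# Hypothetical helper: replace '=' with ':' only in the text after "TITLE=".
# Assumes TITLE is the last field of the line.
fix_title() {
  line=$1
  case $line in
    *TITLE=*) ;;                               # proceed only if the key is present
    *) printf '%s\n' "$line"; return ;;        # otherwise print the line unchanged
  esac
  before=${line%%TITLE=*}                      # everything before TITLE=
  value=${line#*TITLE=}                        # everything after TITLE=
  value=$(printf '%s' "$value" | tr '=' ':')   # '=' becomes ':' in the value only
  printf '%sTITLE=%s\n' "$before" "$value"
}

fix_title 'ID=21566#OS=Linux#TITLE=Session tracking. Status Reported.Level=none'
```

Only POSIX parameter expansion and tr are used, so this does not depend on GNU sed.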
I did it like this :
echo '"ID=21566#OS:Linux#TARGET_END:Synchronica#DEPENDENCY:Independent#AUTOMATION_OS:Linux#AUTOMATION_TOOL:JSystem#TITLE:Session tracking. "DL Started" Status Reported.Level=none"' \
|
sed -E 's/(#)?([A-Z_]+)(=)/\1\2:/g'
Let me know if it works for you.

How to translate and remove non-printable characters?

I want to delete all the control characters from my file using Linux bash commands.
Some control characters, especially EOF (0x1A), cause problems when I load my file in other software. I want to delete them.
Here is what I have tried so far.
This lists all the control characters:
cat -v -e -t file.txt | head -n 10
^A+^X$
^A1^X$
^D ^_$
^E-^D$
^E-^S$
^E1^V$
^F%^_$
^F-^D$
^F.^_$
^F/^_$
^F4EZ$
^G%$
This will list all the control characters using grep:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]'
+
1
-
-
1
%
-
.
/
This matches the above output of the cat command.
Next, I ran the following command to show all lines not containing control characters, but it still shows the same output as above (lines with control characters):
$ cat file.txt | head -n 10 | grep '[^[:cntrl:]]'
+
1
-
-
1
%
-
.
/
Here is the output in hex format:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]' | od -t x2
0000000 2b01 0a18 3101 0a18 2004 0a1f 2d05 0a04
0000020 2d05 0a13 3105 0a16 2506 0a1f 2d06 0a04
0000040 2e06 0a1f 2f06 0a1f
0000050
As you can see, the hex values 0x01 and 0x18 are control characters.
I tried using the tr command to delete the control characters but got an error:
$ cat file.txt | tr -d "\r\n" "[:cntrl:]" >> test.txt
tr: extra operand `[:cntrl:]'
Only one string may be given when deleting without squeezing repeats.
Try `tr --help' for more information.
If I delete all control characters, I will also delete the newline and carriage return, which are used as line endings on Windows. How do I delete all the control characters while keeping the ones I need, like "\r\n"?
Thanks.
Instead of using the predefined [:cntrl:] set, which as you observed includes \n and \r, just list (in octal) the control characters you want to get rid of:
$ tr -d '\000-\011\013\014\016-\037' < file.txt > newfile.txt
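To see what that set does to actual bytes, here is a small sketch. The wrapper name strip_ctrl and the sample bytes are made up; the tr set is the one from the command above.

```shell
# Hypothetical wrapper around the tr call above.
strip_ctrl() {
  # Deletes \000-\011, \013, \014 and \016-\037; keeps LF (\012) and CR (\015).
  tr -d '\000-\011\013\014\016-\037'
}

# Sample bytes: SOH (\001) and SUB/EOF (\032, i.e. 0x1A) inside "abc", then CRLF.
printf 'a\001b\032c\r\n' | strip_ctrl | od -An -c
```

Note that this range also removes tab (\011) and leaves DEL (\177) in place, so adjust the set if that is not what you want.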
Based on this answer on unix.stackexchange, this should do the trick:
$ cat scriptfile.raw | col -b > scriptfile.clean
Try grep, like:
grep -o "[[:print:][:space:]]*" in.txt > out.txt
which will print only printable characters (alphanumerics, punctuation, and space) and whitespace characters such as tab, newline, vertical tab, form feed, and carriage return.
To be less restrictive, and remove only control characters ([:cntrl:]), delete them with:
tr -d "[:cntrl:]"
If you want to keep \n (which is part of [:cntrl:]), then temporarily replace it with something else, e.g.
cat file.txt | tr '\r\n' '\275\276' | tr -d "[:cntrl:]" | tr "\275\276" "\r\n"
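Another way to avoid the temporary substitution is to complement the set: delete everything that is not printable or one of the whitespace characters you want to keep. This is a sketch under assumptions: keep_printable is a made-up name, and LC_ALL=C means non-ASCII bytes are deleted as well, which may or may not suit your data.

```shell
# Hypothetical helper: keep printable ASCII plus tab, CR and LF; delete all other bytes.
keep_printable() {
  LC_ALL=C tr -cd '[:print:]\t\r\n'   # -c complements the set, -d deletes what is left
}

# Sample bytes: a tab to keep, then SOH (\001) and SUB (\032) to drop, then CRLF.
printf 'col1\tcol2\001\032\r\n' | keep_printable | od -An -c
```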
A little late to the party: cat -v <file>
which I think is the easiest to remember of the lot!
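On the question's second grep: [^[:cntrl:]] matches any line that contains at least one non-control character, which is why the same lines came back. To list only the lines that contain no control characters at all, invert the match instead. A small sketch with made-up sample lines:

```shell
# Two sample lines: one clean, one containing SOH (\001).
# -v prints the lines that do NOT match, i.e. lines without any control character.
printf 'clean line\nbad\001line\n' | grep -v '[[:cntrl:]]'
```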

Count number of patterns with a single command

I'd like to count the number of occurrences in a string. For example, in this string :
'apache2|ntpd'
there are 2 different strings separated by | character.
Another example :
'apache2|ntpd|authd|freeradius'
In this case there are 4 different strings separated by | character.
Would you know a shell or perl command that could simply count this for me?
You can use awk as below:
echo "apache2|ntpd" | awk -F'|' '{print NF}'
-F'|' sets the field separator;
NF is the number of fields.
Example:
user@host:/tmp$ echo 'apache2|ntpd|authd|freeradius' | awk -F'|' '{print NF}'
4
You can also use this:
user@host:/tmp$ echo "apache2|ntpd" | tr '|' ' ' | wc -w
2
user@host:/tmp$ echo 'apache2|ntpd|authd|freeradius' | tr '|' ' ' | wc -w
4
tr '|' ' ' : translates | to a space
wc -w : prints the word count
If the string contains spaces, wc -w does not give the correct result, so use this instead:
echo 'apac he2|ntpd' | tr '|' '\n' | wc -l
user@host:/tmp$ echo 'apac he2|ntpd' | tr '|' ' ' | wc -w
3 --> not correct
user@host:/tmp$ echo 'apac he2|ntpd' | tr '|' '\n' | wc -l
2
tr '|' '\n' : translate | to newline
wc -l : number of lines
You can do this within bash alone, without calling external languages like awk or external programs like grep and tr.
data='apache2|ntpd|authd|freeradius'
res=${data//[!|]/}
num_strings=$(( ${#res} + 1 ))
echo $num_strings
Let me explain.
res=${data//[!|]/} removes all characters that are not (that's the !) pipes (|).
${#res} gives the length of the resulting string.
num_strings=$(( ${#res} + 1 )) adds one to the number of pipes to get the number of fields.
It's that simple.
Another pure bash technique using positional-parameters
$ userString="apache2|ntpd|authd|freeradius"
$ printf "%s\n" $(IFS=\|; set -- $userString; printf "%s\n" "$#")
4
Thanks to cdarke's suggestion in the comments, the count can be stored directly in a variable:
$ printf -v count "%d" $(IFS=\|; set -- $userString; printf "%s\n" "$#")
$ printf "%d\n" "$count"
4
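The positional-parameter trick can be packaged as a function. This is a sketch: count_fields and its optional second argument are made up for illustration. It runs in a subshell so the IFS and globbing changes do not leak into the caller.

```shell
# Hypothetical helper: count delimiter-separated fields using word splitting.
# $1 is the string, $2 an optional single-character delimiter (default: |).
count_fields() (
  IFS=${2:-"|"}   # split on the delimiter only, so spaces stay inside fields
  set -f          # disable globbing so a field like * is not expanded
  set -- $1       # the unquoted expansion is split into positional parameters
  echo "$#"       # number of positional parameters = number of fields
)

count_fields 'apache2|ntpd|authd|freeradius'
count_fields 'apac he2|ntpd'
```

In bash, a trailing delimiter does not add an empty final field, so 'a|b|' counts as two fields here.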
With wc and parameter expansion:
$ data='apache2|ntpd|authd|freeradius'
$ wc -w <<< ${data//|/ }
4
Using parameter expansion, all pipes are replaced with spaces. The result string is passed to wc -w for word count.
As @gniourf_gniourf mentioned, this works for strings that look like process names but will fail if the strings contain spaces.
You can do this with grep as well:
echo "apache2|ntpd|authd|freeradius" | grep -o "|" | wc -l
Output:
3
That output is the number of pipes.
To get the number of commands:
var=$(echo "apache2|ntpd|authd|freeradius" | grep -o "|" | wc -l)
echo $((var + 1))
Output:
4
You could use awk to count the occurrences of the delimiter and add 1:
$ awk '{print gsub(/\|/,"")+1}' <(echo "apache2|ntpd|authd|freeradius")
4
Maybe this will help you:
IN="apache2|ntpd"
mails=$(echo $IN | tr "|" "\n")
for addr in $mails
do
echo "> [$addr]"
done
