Bash string split


I have a log file that I'm reading line by line.
Possible input:
" 0:00 InitAuth: \auth\0\auth_status\init\auth_cheaters\1\auth_tags\1\auth_notoriety\1\auth_groups\ \auth_owners"
Wanted Output:
$TIME = 0:00
$TYPE = InitAuth:
$DATA = \auth\0\auth_status\init\auth_cheaters\1\auth_tags\1\auth_notoriety\1\auth_groups\ \auth_owners
Or $Output[0], $Output[1], $Output[2]
I don't care if it will be 1 array or 3 vars.
At first I was thinking about splitting the line into 3 variables using a space as the delimiter, so I searched for a shell replacement for the PHP call explode(' ', $input, 3); but then I found a line like this:
"1728:32ClientUserinfoChanged: 0 n\ThunderBird\t\3\r\2\tl\0\f0\ \f1\ \f2\ \a0\0\a1\0\a2\0"
$TIME = 1728:32
$TYPE = ClientUserinfoChanged:
$DATA = 0 n\ThunderBird\t\3\r\2\tl\0\f0\ \f1\ \f2\ \a0\0\a1
Here there is no space between the time and the type.
So how should I split that text now?
I'm also a novice in shell scripting and am googling every command I come across.

Something along these lines might help you get the desired output:
sed -r 's/([0-9]+:[0-9]{2})([a-zA-Z ]+:)(.*)/$TIME:\1\n$TYPE:\2\n$DATA:\3/'
Test:
[jaypal:~] echo "1728:32ClientUserinfoChanged: 0 n\ThunderBird\t\3\r\2\tl\0\f0\ \f1\ \f2\ \a0\0\a1\0\a2\0" | gsed -r 's/([0-9 ]+:[0-9]{2})([a-zA-Z ]+:)(.*)/$TIME:\1\n$TYPE:\2\n$DATA:\3/'
$TIME:1728:32
$TYPE:ClientUserinfoChanged:
$DATA: 0 n\ThunderBird\t\3\r\2\tl\0\f0\ \f1\ \f2\ \a0\0\a1\0\a2\0
[jaypal:~] echo "0:00 InitAuth: \auth\0\auth_status\init\auth_cheaters\1\auth_tags\1\auth_notoriety\1\auth_groups\ \auth_owners" | gsed -r 's/([0-9 ]+:[0-9]{2})([a-zA-Z ]+:)(.*)/$TIME:\1\n$TYPE:\2\n$DATA:\3/'
$TIME:0:00
$TYPE: InitAuth:
$DATA: \auth\0\auth_status\init\auth_cheaters\1\auth_tags\1\auth_notoriety\1\auth_groups\ \auth_owners
[jaypal:~]

So here is what I wanted, and it works as well as I need. :)
Thanks Jaypal.
TIME=$(echo "$LINE" | sed -r 's/([0-9]+:[0-9]{2})(.*)/\1/')
TYPE=$(echo "$LINE" | sed -r 's/([0-9]+:[0-9]{2})([a-zA-Z ]+:)(.*)/\2/')
DATA=$(echo "$LINE" | sed -r 's/([0-9]+:[0-9]{2})([a-zA-Z ]+:)(.*)/\3/')
I guess someone could shorten that :D
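One possible way to shorten the three sed calls is bash's own regex operator, which can split the line in a single pass. This is only a sketch: the function name split_log_line is made up for illustration, the pattern is adapted from the sed expression above (with optional spaces around the time), and it sets TIME, TYPE and DATA as global variables.

```bash
#!/bin/bash
# Hypothetical helper: split one log line into TIME, TYPE and DATA
# using bash's built-in regex matching instead of three sed calls.
split_log_line() {
    local line=$1
    # Optional spaces, time (digits:two digits), optional spaces,
    # type (letters ending in ':'), one optional space, then the rest.
    local re='^ *([0-9]+:[0-9]{2}) *([a-zA-Z]+:) ?(.*)$'
    if [[ $line =~ $re ]]; then
        TIME=${BASH_REMATCH[1]}
        TYPE=${BASH_REMATCH[2]}
        DATA=${BASH_REMATCH[3]}
    else
        return 1   # line does not look like "time type: data"
    fi
}

# Works with or without a space between the time and the type.
split_log_line '1728:32ClientUserinfoChanged: 0 n\ThunderBird\t\3' &&
    printf '%s | %s | %s\n' "$TIME" "$TYPE" "$DATA"
```

Inside a `while IFS= read -r LINE` loop over the log file, calling split_log_line "$LINE" would refresh the three variables for every line; lines that do not match make the function return 1.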

Related

How can I fix my bash script to find a random word from a dictionary?

I'm studying bash scripting and I'm stuck fixing an exercise of this site: https://ryanstutorials.net/bash-scripting-tutorial/bash-variables.php#activities
The task is to write a bash script to output a random word from a dictionary whose length is equal to the number supplied as the first command line argument.
My idea was to create a sub-dictionary, give each word a line number, select a random number from those line numbers and filter the output. That worked for a similar, simpler script, but not for this one.
This is the code I used:
6 DIC='/usr/share/dict/words'
7 SUBDIC=$( egrep '^.{'$1'}$' $DIC )
8
9 MAX=$( $SUBDIC | wc -l )
10 RANDRANGE=$((1 + RANDOM % $MAX))
11
12 RWORD=$(nl "$SUBDIC" | grep "\b$RANDRANGE\b" | awk '{print $2}')
13
14 echo "Random generated word from $DIC which is $1 characters long:"
15 echo $RWORD
and this is the error I get with "21" as input:
bash script.sh 21
script.sh: line 9: counterintelligence's: command not found
script.sh: line 10: 1 + RANDOM % 0: division by 0 (error token is "0")
nl: 'counterintelligence'\''s'$'\n''electroencephalograms'$'\n''electroencephalograph': No such file or directory
Random generated word from /usr/share/dict/words which is 21 characters long:
I tried running the code in smaller pieces directly in bash and got no error (input=21):
egrep '^.{'21'}$' /usr/share/dict/words | wc -l
3
but once in the script, lines 9 and 10 give errors.
Where do you think the error is?

problems
SUBDIC=$( egrep '^.{'$1'}$' $DIC ) will store all words of the given length in the SUBDIC variable, so its content is now something like foo bar baz.
MAX=$( $SUBDIC | ... ) will try to run the command foo bar baz, which is obviously bogus; it should be more like MAX=$(echo $SUBDIC | ... )
MAX=$( ... | wc -l ) will count the lines; with the echo $SUBDIC mentioned above you will have multiple words, but all on one line...
RWORD=$(nl "$SUBDIC" | ...) has the same problem as above: there's only one line (also note @armali's answer that nl requires a file or stdin)
RWORD=$(... | grep "\b$RANDRANGE\b" | ...) might match the dictionary entry catch 22
likely RWORD=$(... | awk '{print $2}') won't handle lines containing spaces
a simple solution
Doing a "random sort" over all the possible words and taking the first line should be sufficient:
egrep "^.{$1}$" "${DIC}" | sort -R | head -1

MAX=$( $SUBDIC | wc -l ) - A pipe connects one command's output to another command's input, but $SUBDIC isn't a command; an appropriate syntax is MAX=$( <<<"$SUBDIC" wc -l ).
nl "$SUBDIC" - The argument to nl has to be a filename, which "$SUBDIC" isn't; an appropriate syntax is nl <<<"$SUBDIC".

This code will do it. My test dictionary of words is in a file named file. It's a good idea to get all words of a given length first, but put them in an array rather than in a plain variable. Then pick a random index and echo that element.
dic=( $(sed -n "/^.\{$1\}$/p" file) )
ind=$((RANDOM % ${#dic[@]}))
echo "${dic[$ind]}"
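Putting the points from the answers above together, the whole script could take roughly the following shape. This is a sketch that assumes bash 4 or later (for mapfile); the function name random_word and the optional second argument for the dictionary path are illustrative additions, not part of the exercise.

```bash
#!/bin/bash
# Hypothetical helper: print one random word of exactly $1 characters
# from the dictionary file $2 (defaults to /usr/share/dict/words).
random_word() {
    local len=$1 dict=${2:-/usr/share/dict/words}
    local -a words
    # One array element per matching line, so entries stay intact
    # even if they contain spaces or glob characters.
    mapfile -t words < <(grep -E "^.{$len}\$" "$dict")
    # Guard against an empty list to avoid the "division by 0" error.
    if (( ${#words[@]} == 0 )); then
        echo "no words of length $len in $dict" >&2
        return 1
    fi
    # The array subscript is evaluated arithmetically.
    printf '%s\n' "${words[RANDOM % ${#words[@]}]}"
}
```

Saved in a script, the last line could simply be random_word "$1". Since RANDOM only ranges from 0 to 32767, a list with more matches than that would never have its tail selected; for the word lengths in a typical dictionary this is unlikely to matter.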

I am also doing this activity and came up with a simple solution.
I created this script, which prints the word on line $1 of the dictionary:
#!/bin/bash
awk "NR==$1 {print}" /usr/share/dict/words
If you want a random word, run the script from the terminal like this:
./script.sh $RANDOM
If you want the word on a specific line number, run it like this:
./script.sh 465

cat /usr/share/dict/american-english | head -n $RANDOM | tail -n 1
$RANDOM - Returns a different random number each time it is referenced.
This simple line outputs a random word from the dictionary mentioned.
Otherwise, as umläute mentioned, you can do:
cat /usr/share/dict/american-english | sort -R | head -1

Count total number of pattern between two pattern (using sed if possible) in Linux

I have to count all '=' characters between two patterns, i.e. '{' and '}'
Sample:
{
100="1";
101="2";
102="3";
};
{
104="1,2,3";
};
{
105="1,2,3";
};
Expected Output:
3
1
1

A very cryptic perl answer:
perl -nE 's/\{(.*?)\}/ say ($1 =~ tr{=}{=}) /ge'
The tr function returns the number of characters transliterated.
With the new requirements, we can make a couple of small changes:
perl -0777 -nE 's/\{(.*?)\}/ say ($1 =~ tr{=}{=}) /ges'
-0777 reads the entire file/stream into a single string
the s flag to the s/// function allows . to handle newlines like a plain character.

Perl to the rescue:
perl -lne '$c = 0; $c += ("$1" =~ tr/=//) while /\{(.*?)\}/g; print $c' < input
-n reads the input line by line
-l adds a newline to each print
/\{(.*?)\}/g is a regular expression. The ? makes the asterisk frugal, i.e. matching the shortest possible string.
The (...) parentheses create a capture group, referred to as $1.
tr is normally used to transliterate (i.e. replace one character by another), but here it just counts the number of equal signs.
+= adds the number to $c.

Awk is here too
grep -o '{[^}]\+}'|awk -v FS='=' '{print NF-1}'
example
echo '{100="1";101="2";102="3";};
{104="1,2,3";};
{105="1,2,3";};'|grep -o '{[^}]\+}'|awk -v FS='=' '{print NF-1}'
output
3
1
1

First some test input (a line with a = outside the curly brackets and inside the content, one without brackets and one with only 2 brackets)
echo '== {100="1";101="2";102="3=3=3=3";} =;
a=b
{c=d}
{}'
Handle line without brackets (put a dummy char so you will not end up with an empty string)
sed -e 's/^[^{]*$/x/'
Handle line without equal sign (put a dummy char so you will not end up with an empty string)
sed -e 's/{[^=]*}/x/'
Remove stuff outside the brackets
sed -e 's/.*{\(.*\)}/\1/'
Remove stuff inside the double quotes (do not count fields there)
sed -e 's/"[^"]*"//g'
Use @repzero's method to count equal signs
awk -F "=" '{print NF-1}'
Combine stuff
echo -e '{100="1";101="2";102="3";};\na=b\n{c=d}\n{}' |
sed -e 's/^[^{]*$/x/' -e 's/{[^=]*}/x/' -e 's/.*{\(.*\)}/\1/' -e 's/"[^"]*"//g' |
awk -F "=" '{print NF-1}'
The ugly temp fields x and replacing {} can be solved inside awk:
echo -e '= {100="1";101="2=2=2=2";102="3";};\na=b\n{c=d}\n{}' |
sed -e 's/^[^{]*$//' -e 's/.*{\(.*\)}/\1/' -e 's/"[^"]*"//g' |
awk -F "=" '{if (NF>0) c=NF-1; else c=0; print c}'
or shorter
echo -e '= {100="1";101="2=2=2=2";102="3";};\na=b\n{c=d}\n{}' |
sed -e 's/^[^{]*$//' -e 's/.*{\(.*\)}/\1/' -e 's/"[^"]*"//g' |
awk -F "=" '{print (NF>0) ? NF-1 : 0; }'

No harder sed than done ... in.
Restricting this answer to the environment as tagged, namely:
linux shell unix sed wc
will actually not require the use of wc (or awk, perl, or any other app.).
Though echo is used, a file source can easily exclude its use.
As for bash, it is the shell.
The actual environment used is documented at the end.
NB. Exploitation of GNU specific extensions has been used for brevity
but appropriately annotated to make a more generic implementation.
Also brace bracketed { text } will not include braces in the text.
It is implicit that such braces should be present as {} pairs but
the text src. dangling brace does not directly violate this tenet.
This is a foray into the world of `sed`'ng to gain some fluency in its use for other purposes.
The ideas expounded upon here are used to cross pollinate another SO problem solution in order
to acquire more familiarity with vetting vagaries of vernacular version variances. Consequently
this pedantic exercise hopefully helps with the pedagogy of others beyond personal edification.
To test easily, at least in the environment noted below, judiciously highlight the appropriate
code section, carefully excluding a dangling pipe |, and then, to a CLI command line interface
drag & drop, copy & paste or use middle click to enter the code.
The other SO problem. linux - Is it possible to do simple arithmetic in sed addresses?
# _______________________________ always needed ________________________________
echo -e '\n
\n = = = {\n } = = = each = is outside the braces
\na\nb\n { } so therefore are not counted
\nc\n { = = = = = = = } while the ones here do count
{\n100="1";\n101="2";\n102="3";\n};
\n {\n104="1,2,3";\n};
a\nb\nc\n {\n105="1,2,3";\n};
{ dangling brace ignored junk = = = \n' |
# ____________ preparatory conditioning needed for final solutions _____________
sed ' s/{/\n{\n/g;
s/}/\n}\n/g; ' | # guarantee but one brace to a line
sed -n '/{/ h; # so sed addressing can "work" here
/{/,/}/ H; # use hHold buffer for only { ... }
/}/ { x; s/[^=]*//g; p } ' | # then make each {} set a line of =
# ____ stop code hi-lite selection in ^--^ here include quote not pipe ____
# ____ outputs the following exclusive of the shell " # " comment quotes _____
#
#
# =======
# ===
# =
# =
# _________________________________________________________________________
# ____________________________ "simple" GNU solution ____________________________
sed -e '/^$/ { s//0/;b }; # handle null data as 0 case: next!
s/=/\n/g; # to easily count an = make it a nl
s/\n$//g; # echo adds an extra nl - delete it
s/.*/echo "&" | sed -n $=/; # sed = command w/ $ counts last nl
e ' # who knew only GNU say you ah phoo
# 0
# 0
# 7
# 3
# 1
# 1
# _________________________________________________________________________
# ________________________ generic incomplete "solution" ________________________
sed -e '/^$/ { s//echo 0/;b }; # handle null data as 0 case: next!
s/=$//g; # echo adds an extra nl - delete it
s/=/\\\\n/g; # to easily count an = make it a nl
s/.*/echo -e & | sed -n $=/; '
# _______________________________________________________________________________
The paradigm used for the algorithm is instigated by the prolegomena study below.
The idea is to isolate groups of = signs between { } braces for counting.
These are found and each group is put on a separate line with ALL other adorning characters removed.
It is noted that sed can easily "count", actually enumerate, nl or \n line ends via =.
The first "solution" uses these sed commands:
print
branch w/o label starts a new cycle
h/Hold for filling this sed buffer
exchange to swap the hold and pattern buffers
= to enumerate the current sed input line
substitute s/.../.../; with global flag s/.../.../g;
and most particularly the GNU specific
execute (e), which runs the pattern space as a shell command and replaces it with the output
The GNU specific execute command is avoided in the generic code. It does not print the answer but
instead produces code that will print the answer. Run it to observe. To fully automate this, many
mechanisms can be used not the least of which is the sed write command to put these lines in a
shell file to be executed or even embed the output in bash evaluation parentheses $( ) etc.
Note also that various sed example scripts can "count" and these too can be used efficaciously.
The interested reader can entertain these other pursuits.
prolegomena:
concept from counting # of lines between braces
sed -n '/{/=;/}/=;'
to
sed -n '/}/=;/{/=;' |
sed -n 'h;n;G;s/\n/ - /;
2s/^/ Between sets of {} \n the nl # count is\n /;
2!s/^/ /;
p'
testing "done in":
linuxuser@ubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
linuxuser@ubuntu:~$ sed --version -----> sed (GNU sed) 4.4

And for giggles an awk-only alternative:
echo '{
> 100="1";
> 101="2";
> 102="3";
> };
> {
> 104="1,2,3";
> };
> {
> 105="1,2,3";
> };' | awk 'BEGIN{RS="\n};";FS="\n"}{c=gsub(/=/,""); if(NF>2){print c}}'
3
1
1
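The same counting idea can also be written as a small awk state machine that follows blocks across lines. This is a sketch under two stated assumptions: at most one { ... } block opens or closes per line, and any = on the brace lines themselves is meant to count toward the block. The function name count_equals_per_block is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: for every { ... } block on standard input,
# print how many '=' characters it contains (blocks may span lines).
count_equals_per_block() {
    awk '
        /\{/   { inside = 1; count = 0 }              # a block opens on this line
        inside { count += gsub(/=/, "=") }            # gsub returns the number of matches
        /\}/   { if (inside) print count; inside = 0 } # a block closes: report it
    '
}

count_equals_per_block <<'EOF'
{
100="1";
101="2";
102="3";
};
{
104="1,2,3";
};
{
105="1,2,3";
};
EOF
```

With the sample from the question this prints 3, 1 and 1, one number per line. Unlike the RS-based variant, it does not depend on the exact text that ends a block.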

How to get the last character of a string in a shell?

I have written the following lines to get the last character of a string:
str=$1
i=$((${#str}-1))
echo ${str:$i:1}
It works for abcd/:
$ bash last_ch.sh abcd/
/
It does not work for abcd*:
$ bash last_ch.sh abcd*
array.sh assign.sh date.sh dict.sh full_path.sh last_ch.sh
It lists the files in the current folder.

Per @perreal, quoting variables is important, but because I read this post like five times before finding a simpler approach to the question at hand in the comments...
str='abcd/'
echo "${str: -1}"
=> /
Alternatively use ${str:0-1} as pointed out in the comments.
str='abcd*'
echo "${str:0-1}"
=> *
Note: The extra space in ${str: -1} is necessary, otherwise ${str:-1} would result in 1 being taken as the default value if str is null or empty.
${parameter:-word}
Use Default Values. If parameter is unset or null, the
expansion of word is substituted. Otherwise, the value of
parameter is substituted.
Thanks to everyone who participated in the above; I've appropriately added +1's throughout the thread!

That's one of the reasons why you need to quote your variables:
echo "${str:$i:1}"
Otherwise, bash expands the variable and in this case does globbing before printing out. It is also better to quote the parameter to the script (in case you have a matching filename):
bash last_ch.sh 'abcd*'
Also see the order of expansions in the bash reference manual. Variables are expanded before the filename expansion.
To get the last character you should just use -1 as the index since the negative indices count from the end of the string:
echo "${str: -1}"
The space after the colon (:) is REQUIRED.
This approach will not work without the space.
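The two points above (quote the expansion, keep the space before -1) can be wrapped in a small function. This is a minimal sketch for bash; the name last_char is made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: print the last character of its first argument.
last_char() {
    local str=$1
    # The space before -1 keeps bash from reading ":-" as the
    # "use default value" operator; the quotes prevent globbing.
    printf '%s\n' "${str: -1}"
}

last_char 'abcd/'   # argument quoted, so the shell passes it unchanged
last_char 'abcd*'   # quoted, so * is not expanded to file names
```

Called from a script as last_char "$1", the caller still has to quote abcd* on the command line, because the glob is expanded before the script ever sees its arguments.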

I know this is a very old thread, but no one has mentioned what is, to me, the cleanest answer:
echo -n "$str" | tail -c 1
Note that the -n is just so that echo doesn't append a newline at the end.

Every answer so far implies the word "shell" in the question equates to Bash.
This is how one could do that in a standard Bourne shell:
printf "%s" "$str" | tail -c 1

Another solution, using awk:
last 1 char:
echo "$str" | awk '{print substr($0,length,1)}'
last 5 chars:
echo "$str" | awk '{print substr($0,length-4,5)}'

Single line:
${str:${#str}-1:1}
Now:
echo "${str:${#str}-1:1}"

Try:
"${str:$((${#str}-1)):1}"
For e.g.:
someone@mypc:~$ str="A random string*"; echo "$str"
A random string*
someone@mypc:~$ echo "${str:$((${#str}-1)):1}"
*
someone@mypc:~$ echo "${str:$((${#str}-2)):1}"
g

For portability
you can say "${s#"${s%?}"}":
#!/bin/sh
m=bzzzM n=bzzzN
for s in \
'vv' 'w' '' 'uu ' ' uu ' ' uu' / \
'ab?' 'a?b' '?ab' 'ab??' 'a??b' '??ab' / \
'cd#' 'c#d' '#cd' 'cd##' 'c##d' '##cd' / \
'ef%' 'e%f' '%ef' 'ef%%' 'e%%f' '%%ef' / \
'gh*' 'g*h' '*gh' 'gh**' 'g**h' '**gh' / \
'ij"' 'i"j' '"ij' "ij'" "i'j" "'ij" / \
'kl{' 'k{l' '{kl' 'kl{}' 'k{}l' '{}kl' / \
'mn$' 'm$n' '$mn' 'mn$$' 'm$$n' '$$mn' /
do case $s in
(/) printf '\n' ;;
(*) printf '.%s. ' "${s#"${s%?}"}" ;;
esac
done
Output:
.v. .w. .. . . . . .u.
.?. .b. .b. .?. .b. .b.
.#. .d. .d. .#. .d. .d.
.%. .f. .f. .%. .f. .f.
.*. .h. .h. .*. .h. .h.
.". .j. .j. .'. .j. .j.
.{. .l. .l. .}. .l. .l.
.$. .n. .n. .$. .n. .n.

expr "$str" : '.*\(.\)'
Or
echo "${str: -1}"

For anyone interested in a pure POSIX method:
https://github.com/spiralofhope/shell-random/blob/master/live/sh/scripts/string-fetch-last-character.sh
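#!/usr/bin/env sh
# Print the last character of $1 by stripping leading characters
# until only one is left.
string_fetch_last_character() {
  last_character=$1
  length_of_string=${#last_character}
  i=1
  # -lt instead of -eq, so an empty string does not loop forever.
  while [ "$i" -lt "$length_of_string" ]; do
    last_character="${last_character#?}"
    i=$(( i + 1 ))
  done
  printf '%s' "$last_character"
}
string_fetch_last_character "$string"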

echo "$str" | cut -c "${#str}"
is another approach that works for a non-empty string

Searching for text

I'm trying to write a shell script that searches for text within a file and prints out the text and associated information to a separate file.
From this file containing list of gene IDs:
DDIT3 ENSG00000175197
DNMT1 ENSG00000129757
DYRK1B ENSG00000105204
I want to search for these gene IDs (ENSG*), their RPKM1 and RPKM2 values in a gtf file:
chr16 gencodeV7 gene 88772891 88781784 0.126744 + . gene_id "ENSG00000174177.7"; transcript_ids "ENST00000453996.1,ENST00000312060.4,ENST00000378384.3,"; RPKM1 "1.40735"; RPKM2 "1.61345"; iIDR "0.003";
chr11 gencodeV7 gene 55850277 55851215 0.000000 + . gene_id "ENSG00000225538.1"; transcript_ids "ENST00000425977.1,"; RPKM1 "0"; RPKM2 "0"; iIDR "NA";
and print/ write it to a separate output file
Gene_ID RPKM1 RPKM2
ENSG00000108270 7.81399 8.149
ENSG00000101126 12.0082 8.55263
I've done it on the command line for each ID using:
grep -w "ENSGno" rnaseq.gtf| awk '{print $10,$13,$14,$15,$16}' > output.file
but when it comes to writing the shell script, I've tried various combinations of for, while, read, do and changing the variables but without success. Any ideas would be great!

You can do something like:
while read -r line
do
var=$(echo "$line" | awk '{print $2}')
grep -w "$var" rnaseq.gtf | awk '{print $10,$13,$14,$15,$16}' >> output.file
done < geneIDs.file
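The loop above starts one grep per gene, which rescans the GTF file each time. As an alternative sketch, a single awk pass can load the IDs first and then filter the GTF, producing the header and the three columns from the question. The function name extract_rpkm is hypothetical, and the field positions ($10, $14, $16) are an assumption based on the two sample GTF lines shown in the question.

```bash
#!/bin/bash
# Hypothetical helper: $1 = gene list ("NAME ENSG..."), $2 = GTF file.
# Prints a header plus one "ID RPKM1 RPKM2" row per listed gene found.
extract_rpkm() {
    awk '
        NR == FNR { wanted[$2] = 1; next }   # first file: remember the IDs
        FNR == 1  { print "Gene_ID", "RPKM1", "RPKM2" }
        {
            id = $10
            gsub(/[";]/, "", id)             # "ENSG00000174177.7"; -> ENSG00000174177.7
            sub(/\.[0-9]+$/, "", id)         # drop the version suffix
            if (id in wanted) {
                r1 = $14; r2 = $16
                gsub(/[";]/, "", r1)
                gsub(/[";]/, "", r2)
                print id, r1, r2
            }
        }
    ' "$1" "$2"
}
```

It could be run as extract_rpkm geneIDs.file rnaseq.gtf > output.file. The NR == FNR test assumes the gene list is not empty; with an empty list awk would treat the GTF as the first file.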

Retrieving information from a text file. Linux

Basically, I am trying to read information from three text files, each of which contains unique information.
The text files are set up like this:
textA.txt
----------------
something.awesome.com
something2.awesome.com
something3.awesome.com
...
textB.txt
----------------
123
456
789
...
textC.txt
----------------
12.345.678.909
87.65.432.1
102.254.326.12
....
The output is supposed to look something like this:
something.awesome.com : 123 : 12.345.678.909
something2.awesome.com : 456 : 87.65.432.1
something3.awesome.com : 789 : 102.254.326.12
The code I am trying now is this:
for each in `cat site.txt` ; do
site=`echo $each | cut -f1`
for line in `cat port.txt` ; do
port=`echo $line | cut -f1`
for this in `cat ip.txt` ; do
connect=`echo $this | cut -f1`
echo "$site : $port : $connect"
done
done
done
The result I am getting is completely wrong and not what I want. I don't know how to fix this.
I want to be able to access the information through variables.

paste testA.txt testB.txt testC.txt | sed -e 's/\t/ : /g'
Output is:
something.awesome.com : 123 : 12.345.678.909
something2.awesome.com : 456 : 87.65.432.1
something3.awesome.com : 789 : 102.254.326.12
Edit: Here is a solution using pure bash:
#!/bin/bash
exec 7<testA.txt
exec 8<testB.txt
exec 9<testC.txt
# Stop as soon as any of the three files runs out of lines.
while read -r site <&7 && read -r port <&8 && read -r connect <&9
do
echo "$site : $port : $connect"
done
exec 7>&-
exec 8>&-
exec 9>&-

Have you looked at using paste ?
e.g.
$ paste testA.txt testB.txt
etc. The -d option specifies the separator character.
A related utility is the SQL-like join, which you can use in scenarios where you have to join using fields common to your input files.
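Since the question asks for the values in variables, paste can also feed a read loop. The following is a sketch, assuming the three files have the same number of lines and contain no tab characters; the function name print_rows is made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: join three files line by line and expose each
# row as the variables site, port and connect inside the loop.
print_rows() {
    local site port connect
    # paste separates the columns with a tab; read splits on it again.
    while IFS=$'\t' read -r site port connect; do
        echo "$site : $port : $connect"
    done < <(paste "$1" "$2" "$3")
}
```

It would be called as print_rows textA.txt textB.txt textC.txt, and the echo line can be replaced by whatever needs $site, $port and $connect.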

head -2 /etc/hosts | tail -1 | awk '{print $2}'
where /etc/hosts is the name of a file.
head -2 retrieves the first 2 lines of the file.
tail -1 keeps only the last line output by head -2.
awk '{print $2}' prints the 2nd column of the line output by tail -1.
