Obtaining specific data from a very very long string unix

Obtaining specific data from a very very long string unix - string

I'm doing a script in unix for obtaining specific data, after running a program it gives as an output a very huge string, for example: (is just a random example)
In this example, the null scorex: 34;hypothesis of "marginal homogeneity" would mean there was no effect of the treatment. From the above data, the McNemar scorex: 687;test statistic with Yates's continuity correction is scorex: 9;
and I like that whenever it finds the string "scorex: " it gives me the actual score: 34, 687 or 9, for this example.
Thank you
I forgot, my string is inside a variable called RESULTADO

You can use grep:
grep -oP 'scorex:\s?\K\d*' input
or
<command> | grep -oP 'scorex:\s?\K\d*'
For your example:
$ echo "In this example, the null scorex: 34;hypothesis of "marginal homogeneity" would mean there was no effect of the treatment. From the above data, the McNemar scorex: 687;test statistic with Yates's continuity correction is scorex: 9;" | grep -oP 'scorex:\s?\K\d*'
34
687
9

This can be solved via a regex. Considering the following pattern:
scorex: (\d+)
Using this pattern with grep would look like this:
grep -Eo "scorex: (\d+)" file_containing_string | cut -d: -f2
The output of this is for every capture the result:
34
687
9

Related

How can I fix my bash script to find a random word from a dictionary?

I'm studying bash scripting and I'm stuck fixing an exercise of this site: https://ryanstutorials.net/bash-scripting-tutorial/bash-variables.php#activities
The task is to write a bash script to output a random word from a dictionary whose length is equal to the number supplied as the first command line argument.
My idea was to create a sub-dictionary, assign each word a number line, select a random number from those lines and filter the output, which worked for a similar simpler script, but not for this.
This is the code I used:
6 DIC='/usr/share/dict/words'
7 SUBDIC=$( egrep '^.{'$1'}$' $DIC )
8
9 MAX=$( $SUBDIC | wc -l )
10 RANDRANGE=$((1 + RANDOM % $MAX))
11
12 RWORD=$(nl "$SUBDIC" | grep "\b$RANDRANGE\b" | awk '{print $2}')
13
14 echo "Random generated word from $DIC which is $1 characters long:"
15 echo $RWORD
and this is the error I get using as input "21":
bash script.sh 21
script.sh: line 9: counterintelligence's: command not found
script.sh: line 10: 1 + RANDOM % 0: division by 0 (error token is "0")
nl: 'counterintelligence'\''s'$'\n''electroencephalograms'$'\n''electroencephalograph': No such file or directory
Random generated word from /usr/share/dict/words which is 21 characters long:
I tried in bash to split the code in smaller pieces obtaining no error (input=21):
egrep '^.{'21'}$' /usr/share/dict/words | wc -l
3
but once in the script line 9 and 10 give error.
Where do you think is the error?

problems
SUBDIC=$( egrep '^.{'$1'}$' $DIC ) will store all words of the given length in the SUBDIC variable, so it's content is now something like foo bar baz.
MAX=$( $SUBDIC | ... ) will try to run the command foo bar baz which is obviously bogus; it should be more like MAX=$(echo $SUBDIC | ... )
MAX=$( ... | wc -l ) will count the lines; when using the above mentioned echo $SUBDIC you will have multiple words, but all in one line...
RWORD=$(nl "$SUBDIC" | ...) same problem as above: there's only one line (also note #armali's answer that nl requires a file or stdin)
RWORD=$(... | grep "\b$RANDRANGE\b" | ...) might match the dictionary entry catch 22
likely RWORD=$(... | awk '{print $2}') won't handle lines containing spaces
a simple solution
doing a "random sort" over the all the possible words and taking the first line, should be sufficient:
egrep "^.{$1}$" "${DIC}" | sort -R | head -1

MAX=$( $SUBDIC | wc -l ) - A pipe is used for connecting a command's output, while $SUBDIC isn't a command; an appropriate syntax is MAX=$( <<<$SUBDIC wc -l ).
nl "$SUBDIC" - The argument to nl has to be a filename, which "$SUBDIC" isn't; an appropriate syntax is nl <<<"$SUBDIC".

This code will do it. My test dictionary of words is in file file. It's a good idea to get all words of a given length first but put them in an array not in var. And then get a random index and echo it.
dic=( $(sed -n "/^.\{$1\}$/p" file) )
ind=$((0 + RANDOM % ${#dic[#]}))
echo ${dic[$ind]}

I am also doing this activity and I create one simple solution.
I create the script.
#!/bin/bash
awk "NR==$1 {print}" /usr/share/dict/words
Here if you want a random string then you have to run the script as per the below command from the terminal.
./script.sh $RANDOM
If you want the print any specific number string then you can run as per the below command from the terminal.
./script.sh 465

cat /usr/share/dict/american-english | head -n $RANDOM | tail -n 1
$RANDOM - Returns a different random number each time is it referred to.
this simple line outputs random word from the mentioned dictionary.
Otherwise as umläute mentined you can do:
cat /usr/share/dict/american-english | sort -R | head -1

Using printf to display an extracted string through grep and to be use as user input in a script

Good day,
This is kinda lenghty, Im hoping for the kind help of anybody who can support me on this simple problem (to others) but taking me almost forever to figure out.
I have this file (EOL.txt) which consists of the following sample lists:
35 - 5976
36 - 5976C0
53 - 5976C2
64 - 5976D0
69 - 43593
72 - 43593C0
Im using the following commands to extract the leftmost figure since this correspond to a routine number of another script:
grep 5976C2 EOL.txt | head -n1 | cut -d- -f1
After I acquired that number, I will input that along with the other data on another script (N.csh-syntax as follows) that will execute another one (Test.csh):
$./N.csh 53 XXXX.XX "01 02 03"
N.csh --> printf "$1\n$2\n$3\nYYYY\n1\nN\n" | /export/home/Script/Test.csh
What I want to do now is to incorporate the grep command to N.csh so that I wont have to do that separately. It should look like this:
$./N.csh 5976C2 XXXX.XX "01 02 03"
I tried the following commands but its not working.
grep $1 EOL.txt | head -n1 | cut -d- -f1 >> A ; set B=`cat A` ; printf %s "$B\n$2\n$3\n82869\n1\nN\n"
Im new to this stuff, any help will be highly appreciated.
Thanks a lot in advance.
Mike

You can use the following in the file N.csh:
set mynumber = `grep $1 EOL.txt | head -n1 | cut -d- -f1`
printf "$mynumber\n$2\n$3\n...
and then invoke N.csh like
./N.csh 5976C2 XXXX.XX "01 02 03"
Note that after set mynumber =, in the first line, there is a "backtick" - a reversed single quote. The shell executes the commands delimited by two backticks, takes the output, and puts it back in place of the original contents, so the first line turns into set mynumber = 53.

Find HEX value in file and grep the following value

I have a 2GB file in raw format. I want to search for all appearance of a specific HEX value "355A3C2F74696D653E" AND collect the following 28 characters.
Example: 355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135
In this case I want the output: "323031312D30342D32365431343A34373A30322D31343A34373A3135" or better: 2011-04-26T14:47:02-14:47:15
I have tried with
xxd -u InputFile | grep '355A3C2F74696D653E' | cut -c 1-28 > OutputFile.txt
and
xxd -u -ps -c 4000000 InputFile | grep '355A3C2F74696D653E' | cut -b 1-28 > OutputFile.txt
But I can't get it working.
Can anybody give me a hint?

As you are using xxd it seems to me that you want to search the file as if it were binary data. I'd recommend using a more powerful programming language for this; the Unix shell tools assume there are line endings and that the text is mostly 7-bit ASCII. Consider using Python:
#!/usr/bin/python
import mmap
fd = open("file_to_search", "rb")
needle = "\x35\x5A\x3C\x2F\x74\x69\x6D\x65\x3E"
haystack = mmap.mmap(fd.fileno(), length = 0, access = mmap.ACCESS_READ)
i = haystack.find(needle)
while i >= 0:
i += len(needle)
print (haystack[i : i + 28])
i = haystack.find(needle, i)

If your grep supports -P parameter then you could simply use the below command.
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{28}'
323031312D30342D32365431343A
For 56 chars,
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{56}'
323031312D30342D32365431343A34373A30322D31343A34373A3135

Why convert to hex first? See if this awk script works for you. It looks for the string you want to match on, then prints the next 28 characters. Special characters are escaped with a backslash in the pattern.
Adapted from this post: Grep characters before and after match?
I added some blank lines for readability.
VirtualBox:~$ cat data.dat
Thisis a test of somerandom characters before thestringI want5Z</time>2011-04-26T14:47:02-14:47:15plus somemoredata
VirtualBox:~$ cat test.sh
awk '/5Z\<\/time\>/ {
match($0, /5Z\<\/time\>/); print substr($0, RSTART + 9, 28);
}' data.dat
VirtualBox:~$ ./test.sh
2011-04-26T14:47:02-14:47:15
VirtualBox:~$
EDIT: I just realized something. The regular expression will need to be tweaked to be non-greedy, etc and between that and awk need to be tweaked to handle multiple occurrences as you need them. Perhaps some of the folks more up on awk can chime in with improvements as I am real rusty. An approach to consider anyway.

Filtering Linux command output

I need to get a row based on column value just like querying a database. I have a command output like this,
Name ID Mem VCPUs State
Time(s)
Domain-0 0 15485 16 r-----
1779042.1
prime95-01 512 1
-b---- 61.9
Here I need to list only those rows where state is "r". Something like this,
Domain-0 0 15485 16
r----- 1779042.1
I have tried using "grep" and "awk" but still I am not able to succeed.
Any help me is much appreciated
Regards,
Raaj

There is a variaty of tools available for filtering.
If you only want lines with "r-----" grep is more than enough:
command | grep "r-----"
Or
cat filename | grep "r-----"

grep can handle this for you:
yourcommand | grep -- 'r-----'
It's often useful to save the (full) output to a file to analyse later. For this I use tee.
yourcommand | tee somefile | grep 'r-----'
If you want to find the line containing "-b----" a little later on without re-running yourcommand, you can just use:
grep -- '-b----' somefile
No need for cat here!
I recommend putting -- after your call to grep since your patterns contain minus-signs and if the minus-sign is at the beginning of the pattern, this would look like an option argument to grep rather than a part of the pattern.

try:
awk '$5 ~ /^r.*/ { print }'
Like this:
cat file | awk '$5 ~ /^r.*/ { print }'

grep solution:
command | grep -E "^([^ ]+ ){4}r"
What this does (-E switches on extended regexp):
The first caret (^) matches the beginning of the line.
[^ ] matches exactly one occurence of a non-space character, the following modifier (+) allows it to also match more occurences.
Grouped together with the trailing space in ([^ ]+ ), it matches any sequence of non-space characters followed by a single space. The modifyer {4} requires this construct to be matched exactly four times.
The single "r" is then the literal character you are searching for.
In plain words this could be written like "If the line starts <^> with four strings that are followed by a space <([^ ]+ ){4}> and the next character is , then the line matches."
A very good introduction into regular expressions has been written by Jan Goyvaerts (http://www.regular-expressions.info/quickstart.html).

Filtering by awk cmd in linux:-
Firstly find the column for this cmd and store file2 :-
awk '/Domain-0 0 15485 /' file1 >file2
Output:-
Domain-0 0 15485 16
r----- 1779042.1
after that awk cmd in file2:-
awk '{print $1,$2,$3,$4,"\n",$5,$6}' file2
Final Output:-
Domain-0 0 15485 16
r----- 1779042.1

Getting n-th line of text output

I have a script that generates two lines as output each time. I'm really just interested in the second line. Moreover I'm only interested in the text that appears between a pair of #'s on the second line. Additionally, between the hashes, another delimiter is used: ^A. It would be great if I can also break apart each part of text that is ^A-delimited (Note that ^A is SOH special character and can be typed by using Ctrl-A)

output | sed -n '1p' #prints the 1st line of output
output | sed -n '1,3p' #prints the 1st, 2nd and 3rd line of output

your.program | tail +2 | cut -d# -f2
should get you 2/3 of the way.

Improving Grumdrig's answer:
your.program | head -n 2| tail -1 | cut -d# -f2

I'd probably use awk for that.
your_script | awk -F# 'NR == 2 && NF == 3 {
num_tokens=split($2, tokens, "^A")
for (i = 1; i <= num_tokens; ++i) {
print tokens[i]
}
}'
This says
1. Set the field separator to #
2. On lines that are the 2nd line, and also have 3 fields (text#text#text)
3. Split the middle (2nd) field using "^A" as the delimiter into the array named tokens
4. Print each token
Obviously this makes a lot of assumptions. You might need to tweak it if, for example, # or ^A can appear legitimately in the data, without being separators. But something like that should get you started. You might need to use nawk or gawk or something, I'm not entirely sure if plain awk can handle splitting on a control character.

bash:
read
read line
result="${line#*#}"
result="${result%#*}"
IFS=$'\001' read result -a <<< "$result"
$result is now an array that contains the elements you're interested in. Just pipe the output of the script to this one.

here's a possible awk solution
awk -F"#" 'NR==2{
for(i=2;i<=NF;i+=2){
split($i,a,"\001") # split on SOH
for(o in a ) print o # print the splitted hash
}
}' file

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Obtaining specific data from a very very long string unix - string

This can be solved via a regex. Considering the following pattern: scorex: (\d+) Using this pattern with grep would look like this: grep -Eo "scorex: (\d+)" file_containing_string | cut -d: -f2 The output of this is for every capture the result: 34 687 9

Related

How can I fix my bash script to find a random word from a dictionary?

Using printf to display an extracted string through grep and to be use as user input in a script

Find HEX value in file and grep the following value

Filtering Linux command output

Getting n-th line of text output

Categories

Resources