Retrieving information from a text file in Linux

Basically, I am trying to read information from three text files, each of which contains unique information.
The text files are set up like this:
textA.txt
----------------
something.awesome.com
something2.awesome.com
something3.awesome.com
...
textB.txt
----------------
123
456
789
...
textC.txt
----------------
12.345.678.909
87.65.432.1
102.254.326.12
....
Now, what the output is supposed to look like is something like this:
something.awesome.com : 123 : 12.345.678.909
something2.awesome.com : 456 : 87.65.432.1
something3.awesome.com : 789 : 102.254.326.12
The code I am trying now is this:
for each in `cat site.txt` ; do
site=`echo $each | cut -f1`
for line in `cat port.txt` ; do
port=`echo $line | cut -f1`
for this in `cat ip.txt` ; do
connect=`echo $this | cut -f1`
echo "$site : $port : $connect"
done
done
done
The result I am getting is just crazy wrong and not what I want. I don't know how to fix this.
I want to be able to access the information through variables.

paste testA.txt testB.txt testC.txt | sed -e 's/\t/ : /g'
Output is:
something.awesome.com : 123 : 12.345.678.909
something2.awesome.com : 456 : 87.65.432.1
something3.awesome.com : 789 : 102.254.326.12
Edit: Here is a solution using pure bash:
#!/bin/bash
exec 7<testA.txt
exec 8<testB.txt
exec 9<testC.txt
while true
do
    read site <&7
    read port <&8
    read connect <&9
    [ -z "$site" ] && break
    echo "$site : $port : $connect"
done
exec 7>&-
exec 8>&-
exec 9>&-
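Since the question asks for the values as variables, here is a minimal sketch of the same lockstep read wrapped in a function. The function name merge_lines and the descriptor numbers 3, 4 and 5 are assumptions chosen for illustration; the loop ends as soon as any one file runs out of lines.

```shell
#!/bin/bash
# merge_lines FILE_A FILE_B FILE_C   (hypothetical helper name)
# Reads the three files in lockstep and prints "site : port : ip" per row.
merge_lines() {
    local site port connect
    # Each read pulls one line from its own file descriptor.
    while IFS= read -r site <&3 &&
          IFS= read -r port <&4 &&
          IFS= read -r connect <&5
    do
        # site, port and connect are ordinary shell variables here.
        printf '%s : %s : %s\n' "$site" "$port" "$connect"
    done 3<"$1" 4<"$2" 5<"$3"
}
```

It could be called as merge_lines testA.txt testB.txt testC.txt. Attaching the redirections to the loop means the descriptors are closed automatically when the loop finishes.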

Have you looked at using paste?
e.g.
$ paste testA.txt testB.txt
etc. The -d option specifies the separator character(s) to use instead of the default tab.
A related utility is the SQL-like join, which you can use when you have to join on a field common to your input files.
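To illustrate the join case, here is a small sketch that assumes both files carry a shared key in their first column. The function name join_by_key and the temporary file are made up for the example. join expects its inputs sorted on the join field, so the sketch sorts both first.

```shell
# join_by_key FILE_A FILE_B   (hypothetical helper name)
# Prints one line per key found in both files:
# the key, then the remaining fields of FILE_A, then those of FILE_B.
join_by_key() {
    sorted_b=$(mktemp)
    sort "$2" > "$sorted_b"
    # "-" tells join to read its first input from standard input.
    sort "$1" | join - "$sorted_b"
    rm -f "$sorted_b"
}
```

Unlike paste, which pairs lines purely by position, this pairs them by key, so the two files may list their rows in different orders.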

head -2 /etc/hosts | tail -1 | awk '{print$2}'
where /etc/hosts is the name of a file.
(head -2) retrieves the first 2 lines of the file.
(tail -1) keeps only the last line output by (head -2).
(awk '{print$2}') prints the 2nd column of the line output by (tail -1).
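The same selection can probably be done by awk alone. This is only a sketch; the function name nth_field and its argument order are assumptions.

```shell
# nth_field FILE LINE_NO FIELD_NO   (hypothetical helper name)
# Prints one whitespace-separated field from one line of a file.
nth_field() {
    # NR is the current line number; exit stops reading once the line is found.
    awk -v line="$2" -v field="$3" 'NR == line { print $field; exit }' "$1"
}
```

For instance, nth_field /etc/hosts 2 2 should print the same value as the head, tail and awk pipeline above.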

Related

Shell (Linux): is it possible to use variable assignments contained in a string?

I wonder whether it's possible in shell to use variables declared inside a string.
I don't know how to put it more explicitly, so here is my problem:
I use raspi-gpio to drive GPIOs, and I want to know the state of one GPIO. For example:
raspi-gpio get 21
returns a string:
GPIO 21: level=0 fsel=0 func=INPUT
This string contains level=0, fsel=0 and func=INPUT, which look like shell variable assignments.
My question:
Is it possible, in as few lines as possible, to treat these 3 declarations as variables for direct use:
if $level == 0 then...
Assuming the command raspi-gpio get 21 returns GPIO 21: level=0 fsel=0 func=INPUT,
we can drop the unneeded GPIO 21: prefix by splitting the string at the colon (:) character,
then pipe the result to the next command to replace each space with a newline.
So we have these commands:
GPIO_OUTPUT=`raspi-gpio get 21`
MYCMD=`echo $GPIO_OUTPUT | cut -d':' -f2 | tr ' ' '\n'`
The output of echo -e "$MYCMD" would be:
level=0
fsel=0
func=INPUT
These lines already look like valid bash variable assignments, so we can just source them:
source <(echo -e "$MYCMD")
Complete code:
GPIO_OUTPUT=`raspi-gpio get 21`
MYCMD=`echo $GPIO_OUTPUT | cut -d':' -f2 | tr ' ' '\n'`
source <(echo -e "$MYCMD")
if [ "$level" -eq 0 ] ; then
    echo "level is zero"
fi
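Sourcing a command's output runs whatever that output contains. As a more cautious alternative, the sketch below assigns only tokens that look like plain name=value pairs. It uses a hard-coded sample string in place of the real raspi-gpio call so that it runs without the hardware. The variable names sample, pairs and pair are assumptions.

```shell
#!/bin/bash
# Stand-in for: sample=$(raspi-gpio get 21)
sample='GPIO 21: level=0 fsel=0 func=INPUT'

# Drop everything up to the first ": " and split the rest on whitespace.
read -r -a pairs <<< "${sample#*: }"

# Assign only well-formed name=value tokens; anything else is ignored.
for pair in "${pairs[@]}"; do
    if [[ $pair =~ ^[A-Za-z_][A-Za-z0-9_]*=[A-Za-z0-9_]*$ ]]; then
        declare "$pair"
    fi
done

if [ "$level" -eq 0 ]; then
    echo "level is zero"
fi
```

Because declare receives each pair as a single quoted argument, the text is assigned rather than executed.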

How can I fix my bash script to find a random word from a dictionary?

I'm studying bash scripting and I'm stuck fixing an exercise of this site: https://ryanstutorials.net/bash-scripting-tutorial/bash-variables.php#activities
The task is to write a bash script to output a random word from a dictionary whose length is equal to the number supplied as the first command line argument.
My idea was to create a sub-dictionary, number each of its lines, pick a random line number and filter the output on it. This worked for a similar, simpler script, but not for this one.
This is the code I used:
6 DIC='/usr/share/dict/words'
7 SUBDIC=$( egrep '^.{'$1'}$' $DIC )
8
9 MAX=$( $SUBDIC | wc -l )
10 RANDRANGE=$((1 + RANDOM % $MAX))
11
12 RWORD=$(nl "$SUBDIC" | grep "\b$RANDRANGE\b" | awk '{print $2}')
13
14 echo "Random generated word from $DIC which is $1 characters long:"
15 echo $RWORD
and this is the error I get using as input "21":
bash script.sh 21
script.sh: line 9: counterintelligence's: command not found
script.sh: line 10: 1 + RANDOM % 0: division by 0 (error token is "0")
nl: 'counterintelligence'\''s'$'\n''electroencephalograms'$'\n''electroencephalograph': No such file or directory
Random generated word from /usr/share/dict/words which is 21 characters long:
I tried in bash to split the code in smaller pieces obtaining no error (input=21):
egrep '^.{'21'}$' /usr/share/dict/words | wc -l
3
but once in the script, lines 9 and 10 give errors.
Where do you think the error is?
problems
SUBDIC=$( egrep '^.{'$1'}$' $DIC ) will store all words of the given length in the SUBDIC variable, so its content is now something like foo bar baz.
MAX=$( $SUBDIC | ... ) will try to run the command foo bar baz, which is obviously bogus; it should be more like MAX=$(echo $SUBDIC | ... )
MAX=$( ... | wc -l ) will count the lines; with the above-mentioned echo $SUBDIC you will have multiple words, but all on one line...
RWORD=$(nl "$SUBDIC" | ...) same problem as above: there's only one line (also note @armali's answer that nl requires a file or stdin)
RWORD=$(... | grep "\b$RANDRANGE\b" | ...) might also match a dictionary entry that itself contains the number, such as catch 22
RWORD=$(... | awk '{print $2}') likely won't handle lines containing spaces
a simple solution
Doing a "random sort" over all the possible words and taking the first line should be sufficient:
egrep "^.{$1}$" "${DIC}" | sort -R | head -1
MAX=$( $SUBDIC | wc -l ) - A pipe connects a command's output to another command, but $SUBDIC isn't a command; an appropriate syntax is MAX=$( <<<"$SUBDIC" wc -l ).
nl "$SUBDIC" - The argument to nl has to be a filename, which "$SUBDIC" isn't; an appropriate syntax is nl <<<"$SUBDIC".
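Putting those two corrections together, the original approach might look like the sketch below. The function name random_word and the optional second argument are assumptions made so it can be tried on a small word list. It also assumes fewer than 32768 matching words, because $RANDOM never exceeds 32767.

```shell
#!/bin/bash
# random_word LENGTH [DICTIONARY]   (hypothetical helper name)
# Prints one random word of exactly LENGTH characters.
random_word() {
    local len=$1 dic=${2:-/usr/share/dict/words}
    local subdic max pick
    # Keep the matching words, one per line, in a variable.
    subdic=$(grep -E "^.{$len}\$" "$dic")
    # Bail out early so the modulo below never divides by zero.
    [ -n "$subdic" ] || return 1
    # The quoted here-string feeds the variable to wc as standard input.
    max=$(wc -l <<< "$subdic")
    pick=$((1 + RANDOM % max))
    # Print only the chosen line.
    sed -n "${pick}p" <<< "$subdic"
}
```

Selecting the line with sed by number avoids the grep "\b$RANDRANGE\b" step, so a word that happens to contain the number cannot match by accident.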
This code will do it. My test dictionary is in a file named file. It's a good idea to get all words of the given length first, but put them in an array rather than in a plain variable. Then pick a random index and echo that element.
dic=( $(sed -n "/^.\{$1\}$/p" file) )
ind=$((RANDOM % ${#dic[@]}))
echo "${dic[$ind]}"
I am also doing this activity and came up with a simple solution.
I created this script:
#!/bin/bash
awk "NR==$1 {print}" /usr/share/dict/words
If you want a random word, run the script from the terminal like this:
./script.sh $RANDOM
If you want to print the word on a specific line number, run it like this:
./script.sh 465
cat /usr/share/dict/american-english | head -n $RANDOM | tail -n 1
$RANDOM returns a different random integer between 0 and 32767 each time it is referenced, so this only reaches the first 32767 lines of the file.
This simple line outputs a random word from the mentioned dictionary.
Otherwise, as umläute mentioned, you can do:
cat /usr/share/dict/american-english | sort -R | head -1

Linux grep and sort log files

I looked almost everywhere (there, there, there, there and there) with no luck.
What I have here is a bunch of log files in a directory, where I need to look for a specific ID (myID) and sort the output by date. Here is an example :
in file1.log :
2015-09-26 15:39:50,788 - DEBUG - blabla : {'id' : myID}
in file2.log:
2015-09-26 15:39:51,788 - ERROR - foo : {'id' : myID}
in file3.log:
2015-09-26 15:39:48,788 - ERROR - bar : {'id' : myID}
Expected output:
2015-09-26 15:39:48,788 - ERROR - bar : {'id' : myID}
2015-09-26 15:39:50,788 - DEBUG - blabla : {'id' : myID}
2015-09-26 15:39:51,788 - ERROR - foo : {'id' : myID}
What I am doing now (and it works pretty well), is :
grep -hri --color=always "myID" | sort -n
The only problem is that with the -h option of grep, the file names are hidden. I'd like to keep the file names AND keep the sorting.
I tried :
grep -ri --color=always "myID" | sort -n -t ":" -k1,1 -k2,2
But it doesn't work. Basically, the grep command outputs the name of the file followed by ":", I'd like to sort the results from this character.
Thanks a lot
Try this:
grep --color=always "myID" file*.log | sort -t : -k2,2 -k3,3n -k4,4n
Output:
file3.log:2015-09-26 15:39:48,788 - ERROR - bar : {'id' : myID}
file1.log:2015-09-26 15:39:50,788 - DEBUG - blabla : {'id' : myID}
file2.log:2015-09-26 15:39:51,788 - ERROR - foo : {'id' : myID}
Another solution, a little bit longer but I think it should work:
grep -l "myID" file* > /tmp/file_names && grep -hri "myID" file* | sort -n > /tmp/grep_result && paste /tmp/file_names /tmp/grep_result | column -s $'\t' -t
What it basically does is first store the file names:
grep -l "myID" file* > /tmp/file_names
Store grep sorted results:
grep -hri "myID" file* | sort -n > /tmp/grep_result
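Paste the results column-wise (using a tab separator). Note that the names and lines only pair up correctly when each file contains exactly one match and the file order happens to equal the sorted order:
paste /tmp/file_names /tmp/grep_result | column -s $'\t' -t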
The column ordering for sort is 1-based, so k1 will be your filename part. That means that in your attempt, you are sorting by filename, then by date and hour of your log line. Also, the -n means that you are using numeric ordering, which won't be playing nicely with yyyy-mm-dd hh:mm:ss format (it will read yyyy-mm-dd hh as only the first number, i.e. the year).
You can use:
sort -t ":" -k2
Note that I specified column 2 as the start, and left the end blank. The end defaults to the end-of-line.
If you want to sort specific columns, you need to explicitly set the start and end, for example: -k2,2. You can use this to sort out-of-sequence columns, for example -k4,4 -k2,2 will sort by column 4 and use column 2 for tie-breaking.
You could also use -k2,4, which would stop sorting at the colon just before your log details (i.e. it would use 2015-09-26 15:39:48,788 - ERROR - bar)
Finally, perhaps you want to have your log files in a consistent order if the time is the same:
sort -t ":" -k2,4 -k1,1
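As a concrete sketch of the key syntax described above, the function below sorts grep hits by everything after the file name and falls back to the file name for ties. The name sort_hits is an assumption. The -H option of grep, available in GNU and BSD grep, forces the file name prefix even when only one file is searched.

```shell
# sort_hits PATTERN FILE...   (hypothetical helper name)
# Prints "file:line" hits ordered by the text after the first colon.
sort_hits() {
    pattern=$1
    shift
    # -k2 starts the key at field 2 and runs to the end of the line,
    # so the whole timestamp is compared as text.
    # -k1,1 breaks ties using the file name only.
    grep -H -- "$pattern" "$@" | sort -t : -k2 -k1,1
}
```

No -n is needed here: timestamps in yyyy-mm-dd hh:mm:ss form already sort correctly as plain text.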
Try the Rust-based tool Super Speedy Syslog Searcher
(assuming you have Rust installed)
cargo install super_speedy_syslog_searcher
then
s4 file1.log file2.log file3.log | grep "myID"
The only problem is that with the -h option of grep, the file names are hidden. I'd like to keep the file names AND keep the sorting.
You could try
$ s4 --color=never -nw file1.log file2.log file3.log | grep "myID"
file3.log:2015-09-26 15:39:48,788 - ERROR - bar : {'id' : myID}
file1.log:2015-09-26 15:39:50,788 - DEBUG - blabla : {'id' : myID}
file2.log:2015-09-26 15:39:51,788 - ERROR - foo : {'id' : myID}

Bash string split

I have a log file that I'm reading line by line.
Possible input:
" 0:00 InitAuth: \auth\0\auth_status\init\auth_cheaters\1\auth_tags\1\auth_notoriety\1\auth_groups\ \auth_owners"
Wanted Output:
$TIME = 0:00
$TYPE = InitAuth:
$DATA = \auth\0\auth_status\init\auth_cheaters\1\auth_tags\1\auth_notoriety\1\auth_groups\ \auth_owners
Or $Output[0], $Output[1], $Output[2]
I don't care if it will be 1 array or 3 vars.
At first I was thinking about splitting that line into 3 vars using space as the delimiter, so I was searching for a shell replacement for the PHP call explode(' ', $input, 3); but then I found a line like this:
"1728:32ClientUserinfoChanged: 0 n\ThunderBird\t\3\r\2\tl\0\f0\ \f1\ \f2\ \a0\0\a1\0\a2\0"
$TIME = 1728:32
$TYPE = ClientUserinfoChanged:
$DATA = 0 n\ThunderBird\t\3\r\2\tl\0\f0\ \f1\ \f2\ \a0\0\a1
And there is no space between the Time and Type info.
So I want to know how I should split that text now.
Also, I'm a novice in shell and I'm googling every possible command.
Something along these lines might help you get the desired output:
sed -r 's/([0-9]+:[0-9]{2})([a-zA-Z ]+:)(.*)/$TIME:\1\n$TYPE:\2\n$DATA:\3/'
Test:
[jaypal:~] echo "1728:32ClientUserinfoChanged: 0 n\ThunderBird\t\3\r\2\tl\0\f0\ \f1\ \f2\ \a0\0\a1\0\a2\0" | gsed -r 's/([0-9 ]+:[0-9]{2})([a-zA-Z ]+:)(.*)/$TIME:\1\n$TYPE:\2\n$DATA:\3/'
$TIME:1728:32
$TYPE:ClientUserinfoChanged:
$DATA: 0 n\ThunderBird\t\3\r\2\tl\0\f0\ \f1\ \f2\ \a0\0\a1\0\a2\0
[jaypal:~] echo "0:00 InitAuth: \auth\0\auth_status\init\auth_cheaters\1\auth_tags\1\auth_notoriety\1\auth_groups\ \auth_owners" | gsed -r 's/([0-9 ]+:[0-9]{2})([a-zA-Z ]+:)(.*)/$TIME:\1\n$TYPE:\2\n$DATA:\3/'
$TIME:0:00
$TYPE: InitAuth:
$DATA: \auth\0\auth_status\init\auth_cheaters\1\auth_tags\1\auth_notoriety\1\auth_groups\ \auth_owners
[jaypal:~]
So here is what I wanted; it works as well as I need. :)
Thanks Jaypal.
TIME=`echo $LINE | sed -r 's/([0-9]+:[0-9]{2})(.*)/\1/'`
TYPE=`echo $LINE | sed -r 's/([0-9]+:[0-9]{2})([a-zA-Z ]+:)(.*)/\2/'`
DATA=`echo $LINE | sed -r 's/([0-9]+:[0-9]{2})([a-zA-Z ]+:)(.*)/\3/'`
I guess someone could shorten that :D
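One possible shortening, offered as a sketch rather than the accepted method: bash can do the split itself with a single regular-expression match and no sed call. The function name split_line is an assumption, and the pattern assumes the type consists of letters followed by a colon, as in the two sample lines.

```shell
#!/bin/bash
# split_line LINE   (hypothetical helper name)
# Sets TIME, TYPE and DATA from one log line; returns 1 if it does not match.
split_line() {
    # time = digits:two digits, type = letters plus ':', data = the rest.
    # Optional whitespace is allowed before and between the three parts.
    local re='^[[:space:]]*([0-9]+:[0-9]{2})[[:space:]]*([A-Za-z]+:)[[:space:]]*(.*)$'
    [[ $1 =~ $re ]] || return 1
    TIME=${BASH_REMATCH[1]}
    TYPE=${BASH_REMATCH[2]}
    DATA=${BASH_REMATCH[3]}
}
```

Inside a loop such as while IFS= read -r LINE, calling split_line "$LINE" fills all three variables at once, and the backslashes in the data are left untouched.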

Searching for text

I'm trying to write a shell script that searches for text within a file and prints out the text and associated information to a separate file.
From this file containing list of gene IDs:
DDIT3 ENSG00000175197
DNMT1 ENSG00000129757
DYRK1B ENSG00000105204
I want to search for these gene IDs (ENSG*), their RPKM1 and RPKM2 values in a gtf file:
chr16 gencodeV7 gene 88772891 88781784 0.126744 + . gene_id "ENSG00000174177.7"; transcript_ids "ENST00000453996.1,ENST00000312060.4,ENST00000378384.3,"; RPKM1 "1.40735"; RPKM2 "1.61345"; iIDR "0.003";
chr11 gencodeV7 gene 55850277 55851215 0.000000 + . gene_id "ENSG00000225538.1"; transcript_ids "ENST00000425977.1,"; RPKM1 "0"; RPKM2 "0"; iIDR "NA";
and print/write them to a separate output file:
Gene_ID RPKM1 RPKM2
ENSG00000108270 7.81399 8.149
ENSG00000101126 12.0082 8.55263
I've done it on the command line for each ID using:
grep -w "ENSGno" rnaseq.gtf| awk '{print $10,$13,$14,$15,$16}' > output.file
but when it comes to writing the shell script, I've tried various combinations of for, while, read, do and changing the variables but without success. Any ideas would be great!
You can do something like:
while read -r line
do
    # The gene ID is the second column of each line in geneIDs.file
    var=$(echo "$line" | awk '{print $2}')
    grep -w "$var" rnaseq.gtf | awk '{print $10,$13,$14,$15,$16}' >> output.file
done < geneIDs.file
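The loop above runs grep once per ID. A single awk pass can do the lookup and also strip the quotes and semicolons, which gets closer to the output table in the question. This is a sketch under assumptions: the function name extract_rpkm is made up, the field positions ($10, $14, $16) are taken from the sample gtf lines, and the version suffix after the dot in gene_id is assumed to be safe to drop.

```shell
# extract_rpkm ID_FILE GTF_FILE   (hypothetical helper name)
# Prints a "Gene_ID RPKM1 RPKM2" table for the IDs listed in ID_FILE.
extract_rpkm() {
    awk '
    BEGIN { print "Gene_ID", "RPKM1", "RPKM2" }
    # First file: remember every gene ID (second column).
    NR == FNR { wanted[$2] = 1; next }
    # Second file: clean up the fields and print rows for wanted IDs.
    {
        id = $10
        gsub(/[";]/, "", id)        # remove quotes and the semicolon
        sub(/\.[0-9]+$/, "", id)    # drop the ".7"-style version suffix
        if (id in wanted) {
            r1 = $14; r2 = $16
            gsub(/[";]/, "", r1)
            gsub(/[";]/, "", r2)
            print id, r1, r2
        }
    }' "$1" "$2"
}
```

It could be run as extract_rpkm geneIDs.file rnaseq.gtf > output.file, which overwrites the output file instead of appending to it.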
