Linux Bash: extracting text from file into variable - linux

I haven't found anything that clearly answers my question. Although very close, I think...
I have a file with a line:
# Skipsdata for serienummer 1158
I want to extract the 4 digit number at the end and put it into a variable, this number changes from file to file so I can't just search for "1158". But the "# Skipsdata for serienummer" always remains the same.
I believe that either grep, sed or awk may be the answer but I'm not 100 % clear on their usage.

Using Awk as
numberRequired=$(awk '/# Skipsdata for serienummer/{print $NF}' file)
printf "%s\n" "$numberRequired"
1158

You can use grep with the -o switch, which prints only the matched part instead of the whole line.
Print all numbers at the end of lines from file yourFile
grep -Po '\d+$' yourFile
Print all four digit numbers at the end of lines like described in your question:
grep -Po '^# Skipsdata for serienummer \K\d{4}$' yourFile
-P enables perl style regexes which support \d and especially \K.
\d matches any digit (0-9).
\d{4} matches exactly four digits.
\K lets grep forget the previously matched part, such that only the part afterwards is printed.

There are multiple ways to find your number. Assuming the input data is in a file called inputfile:
mynumber=$(sed -n 's/# Skipsdata for serienummer //p' <inputfile) will print only the number and ignore all the other lines;
mynumber=$(grep '^# Skipsdata for serienummer' inputfile | cut -d ' ' -f 5) will filter the relevant lines first, then only output the 5th field (the number)
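A minimal sketch that combines the extraction with a sanity check, so that a missing or malformed line is caught before the variable is used. The temporary sample file and the variable name serial are assumptions made for illustration:

```shell
# Hypothetical sample file standing in for the real input.
inputfile=$(mktemp)
printf '%s\n' 'some header' '# Skipsdata for serienummer 1158' 'other data' > "$inputfile"

# Capture only the digits that follow the fixed prefix.
serial=$(sed -n 's/^# Skipsdata for serienummer \([0-9][0-9]*\)$/\1/p' "$inputfile")

# Reject an empty result or anything that is not purely digits
# (for example, two matching lines joined by a newline).
case $serial in
    ''|*[!0-9]*) echo "no serial number found" >&2 ;;
    *) echo "serial: $serial" ;;
esac
rm -f "$inputfile"
```

The pattern uses only basic regular expressions, so it should not depend on grep -P being available.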

Related

Fetching the value of variable stored in a file

I am trying to fetch the output of a variable stored in a file in another shell script.
Example:
cat abc.log
var1=2
var2=2
var3=25
I am writing a script to fetch the value of var3.
Thank you in advance.
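awk -F= '$1 ~ /^[[:space:]]*var3$/ { print $2 }' abc.log
Set the field delimiter to = and then, where the first field is exactly "var3" (optionally preceded by whitespace), print the second field. The trailing $ keeps names such as var30 from matching.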
Alternatively, you could:
source abc.log
and then:
echo "$var3"
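Note that source executes the file as shell code, which is only appropriate when its contents are trusted. A minimal sketch that reads the assignment without executing anything, assuming plain name=value lines with no quoting (the temporary file stands in for abc.log):

```shell
# Hypothetical stand-in for abc.log.
logfile=$(mktemp)
printf '%s\n' 'var1=2' 'var2=2' 'var3=25' > "$logfile"

var3_value=
# Split each line at the first '='; the rest of the line goes into value.
while IFS='=' read -r key value; do
    if [ "$key" = "var3" ]; then
        var3_value=$value      # a later assignment overrides an earlier one
    fi
done < "$logfile"

echo "$var3_value"
rm -f "$logfile"
```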
Using sed, you can isolate the 25 with:
sed -n '/^[[:space:]]*var3=/s/^[^=]*=//p' file
Explanation
This is the general substitution form s/find/replace/ with a matching expression preceding it. The total form is /match/s/find/replace/. The option -n suppresses the normal printing of pattern-space and the p at the end tells sed to print the line where the match and substitution took place. Specifically,
/match/ locates a line that starts with any amount of whitespace followed by var3=. The POSIX [:space:] character class matches any whitespace,
the /find/ matches everything from the beginning of the line ('^') that is not an '=' ([^=]*), followed by the literal '=' character, and finally
the /replace/ is the empty string, which leaves only the 25 to be printed.
Example Use/Output
$ sed -n '/^[[:space:]]*var3=/s/^[^=]*=//p' file
25
A grep one-liner, if your grep has support for Perl-compatible regular expressions (the -P option; not all greps support that)
grep -Po '^\s*var3=\K.*' abc.log
or,
grep -Po '^\s*var3=\K.*' abc.log | tail -n1
in order to get the last value of var3, if var3 may be assigned more than once.

How can I find the number of 8 letter words that do not contain the letter "e", using the grep command?

I want to find the number of 8 letter words that do not contain the letter "e" in a number of text files (*.txt). In the process I ran into two issues: my lack of understanding in quantifiers and how to exclude characters.
I'm quite new to the Unix terminal, but this is what I have tried:
cat *.txt | grep -Eo "\w+" | grep -i ".*[^e].*"
I need to include the cat command because it otherwise includes the names of the text files in the pipe. The second command puts all the words in a list, and it works. The last command was meant to find all the words that do not contain the letter "e", but it doesn't seem to work. (I thought ".*" for no or any number of any character, followed by a character that is not an "e", and followed by another ".*" for no or any number of any character.)
cat *.txt | grep -Eo "\w+" | grep -wi "[a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]"
This command works to find the words that contain 8 characters, but it is quite ineffective, because I have to repeat "[a-z]" 8 times. I thought it could also be "[a-z]{8}", but that doesn't seem to work.
cat *.txt | grep -Eo "\w+" | grep -wi "[a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]" | grep -i ".*[^e].*"
So finally, this would be my best guess, however, the third pipe is ineffective and the last pipe doesn't work.
You may use this grep:
grep -hEiwo '[a-df-z]{8}' *.txt
Here:
[a-df-z]{8}: Matches exactly 8 letters, none of which is e
-h: Don't print filename in output
-i: Ignore case search
-o: Print matches only
-w: Match complete words
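Since the question asks for the number of such words, the matches can be counted by piping them to wc -l. A minimal sketch, with two made-up sample files standing in for the real *.txt files:

```shell
# Hypothetical sample files; the words are invented for illustration.
dir=$(mktemp -d)
printf '%s\n' 'Thursday morning sunlight was blinding' > "$dir/a.txt"
printf '%s\n' 'elephant notebook standard' > "$dir/b.txt"

# -o prints one match per line, so counting lines counts words.
count=$(grep -hEiwo '[a-df-z]{8}' "$dir"/*.txt | wc -l)
count=$((count))   # strip any padding that wc adds on some systems
echo "$count"
rm -rf "$dir"
```

Every occurrence is counted here; to count distinct words instead, one could insert sort -u before wc -l.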
If you are OK with GNU awk, and assuming that you want to print only the exact words (a line may contain several matches), you could try the following:
awk -v IGNORECASE="1" '{for(i=1;i<=NF;i++){if($i~/^[a-df-z]{8}$/){print $i}}}' *.txt
Or, without the use of IGNORECASE, you could try:
awk '{for(i=1;i<=NF;i++){if(tolower($i)~/^[a-df-z]{8}$/){print $i}}}' *.txt
NOTE: This matches only fields that consist of exactly 8 letters, so an 8-letter word followed by a punctuation mark is excluded.
Here is a crazy thought with GNU awk:
awk 'BEGIN{FPAT="\\<\\w{8}\\>"}{c+=NF}END{print c}' file
Or if you want to make it work only on a select set of characters:
awk 'BEGIN{FPAT="\\<[a-df-z]{8}\\>"}{c+=NF}END{print c}' file
What this does is define the fields as sets of 8 characters (\w as a word constituent, or [a-df-z] as a selected set) enclosed by word boundaries (\< and \>). This is done with FPAT (note the Gory details about escaping).
Sometimes words contain diacritics, so the character set has to be wider. Then this might be the best solution:
awk 'BEGIN{FPAT="\\<\\w{8}\\>"}{for(i=1;i<=NF;++i) if($i !~ /e/) c++}END{print c}' file

Change some field separators in awk

I have a input file
1.txt
joshwin_xc8#yahoo.com:1802752:2222:
ihearttofurkey#yahoo.com:1802756:111113
www.rothmany#mail.com:xxmyaduh:13#;:3A
and I want an output file:
out.txt
joshwin_xc8#yahoo.com||o||1802752||o||2222:
ihearttofurkey#yahoo.com||o||1802756||o||111113
www.rothmany#mail.com||o||xxmyaduh||o||13#;:3A
I want to replace the first two ':' in 1.txt with '||o||'. This is the script I am using:
awk -F: '{print $1,$2,$3}' OFS="||o||" 3.txt
But it is not giving the expected output.
Any help would be highly appreciated.
Perl solution:
perl -pe 's/:/||o||/ for $_, $_' 1.txt
-p reads the input line by line and prints each line after processing it
s/// is similar to substitution you might know from sed
for in postposition runs the previous command for every element in the following list
$_ holds the line being processed
For higher numbers, you can use for ($_) x N where N is the number. For example, to substitute the first 7 occurrences:
perl -pe 's/:/||o||/ for ($_) x 7' 1.txt
The following sed command may also help:
sed 's/:/||o||/;s/:/||o||/' Input_file
Explanation: The first s command substitutes the first colon with ||o||. What was the second colon is now the first remaining one, so the second s command substitutes it as well, as the OP requires.
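The awk attempt in the question fails because print $1,$2,$3 outputs only the first three fields and drops the rest of the line. A minimal awk sketch of the same repeated-substitution idea, where the variable n (an assumption, not part of the original answers) sets how many colons to replace:

```shell
# Two lines from the question's sample data, written to a temporary file.
input=$(mktemp)
printf '%s\n' 'joshwin_xc8#yahoo.com:1802752:2222:' \
              'www.rothmany#mail.com:xxmyaduh:13#;:3A' > "$input"

# sub() replaces only the first remaining colon, so calling it n times
# replaces the first n colons and leaves later ones untouched.
result=$(awk -v n=2 '{ for (i = 0; i < n; i++) sub(/:/, "||o||"); print }' "$input")
printf '%s\n' "$result"
rm -f "$input"
```

Because the line is never split into fields, any colons after the first n survive unchanged.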
Another Perl solution, though I think the idea applies to other languages too: use the limit parameter of split:
perl -nE 'print join q(||o||), split q(:), $_, 3' file
(q quotes because I'm on Windows)
To replace the first 2 occurrences of ':', use the loop below.
To replace a different number of occurrences, adjust the range; for example, change {1..2} to {1..7} for the first 7.
The output is saved to the original file (sed -i edits in place); nothing is displayed.
for i in {1..2}
do
    sed -i "s/:/||o||/1" p.txt
done

How to delete 5 lines before and 6 lines after pattern match using Sed?

I want to search for a pattern "xxxx" in a file and delete 5 lines before this pattern and 6 lines after this match. How can I do this using sed?
This might work for you (GNU sed):
sed ':a;N;s/\n/&/5;Ta;/xxxx/!{P;D};:b;N;s/\n/&/11;Tb;d' file
Keep a rolling window of 5 lines and on encountering the specified string add 6 more (11 in total) and delete.
N.B. This is a barebones solution and will most probably need tailoring to your specific needs. Open questions include: what if the string occurs several times in the file? What if it appears within the first five lines, or several occurrences are within five lines of each other?
Here's one way you could do it using awk. I assume that you also want to delete the line itself and that the file is small enough to fit into memory:
awk '{a[NR]=$0}/xxxx/{f=NR}END{for(i=1;i<=NR;++i)if(i<f-5||i>f+6)print a[i]}' file
Store every line into the array a. When the pattern /xxxx/ is matched, save the line number. After the whole file has been processed, loop through the array, only printing the lines you want to keep.
Alternatively, you can use grep to obtain the line number first:
grep -n 'xxxx' file | awk -F: 'NR==FNR{f=$1;next}FNR<f-5||FNR>f+6' - file
In both cases, the lines deleted will be surrounding the last line where the pattern is matched.
A third option would be to use grep to obtain the line number then use sed to delete the lines:
line=$(grep -nm1 'xxxx' file | cut -d: -f1)
sed "$((line-5)),$((line+6))d" file
In this case I've also added the -m switch so grep exits after finding the first match.
If you know the line number (which is not difficult to obtain), you can use something like this:
filename="test"
start=$((curr_line - 5))
end=$((curr_line + 6))
sed "${start},${end}d" "$filename"   # optionally sed -i to edit in place
Of course, you have to remember the additional conditions: start should not be less than 1, and end should not be greater than the number of lines in the file.
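The boundary handling can be sketched as a small function. The name delete_around and its argument order are hypothetical. Only start is clamped here, on the assumption that sed simply stops at the end of the file when end is too large, whereas a start below 1 is not a valid address:

```shell
# delete_around PATTERN BEFORE AFTER FILE  (hypothetical helper)
# Prints FILE without the first matching line and its surrounding lines.
delete_around() {
    pattern=$1; before=$2; after=$3; file=$4
    line=$(grep -n -m1 -- "$pattern" "$file" | cut -d: -f1)
    if [ -z "$line" ]; then
        cat -- "$file"          # no match: output the content unchanged
        return 0
    fi
    start=$((line - before))
    if [ "$start" -lt 1 ]; then
        start=1                 # clamp so sed gets a valid first address
    fi
    end=$((line + after))
    sed "${start},${end}d" "$file"
}

# Sample data with the match on line 3, i.e. fewer than 5 lines before it.
sample=$(mktemp)
printf '%s\n' line1 line2 xxxx line4 line5 line6 line7 line8 line9 line10 line11 line12 > "$sample"
kept=$(delete_around 'xxxx' 5 6 "$sample")
printf '%s\n' "$kept"
rm -f "$sample"
```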
Another solution, perhaps easier to follow, would be to use grep to find the keyword and the corresponding line:
grep -n 'KEYWORD' <file>
then use sed to get the line number only like this:
grep -n 'KEYWORD' <file> | sed 's/:.*//'
Now that you have the line number simply use sed like this:
sed -i "${LINE_START},${LINE_END}d" <file>
to remove lines before and/or after. With -i alone, <file> is overwritten in place (no backup).
A script example could be:
#!/bin/bash
KEYWORD=$1
LINES_BEFORE=$2
LINES_AFTER=$3
FILE=$4
# Quote the variables so keywords and file names with spaces work.
LINE_NO=$(grep -n -- "$KEYWORD" "$FILE" | sed 's/:.*//')
echo "Keyword found in line: $LINE_NO"
LINE_START=$((LINE_NO - LINES_BEFORE))
LINE_END=$((LINE_NO + LINES_AFTER))
echo "Deleting lines $LINE_START to $LINE_END!"
sed -i "${LINE_START},${LINE_END}d" "$FILE"
Please note that this will work only if the keyword is found once! Adapt the script to your needs!
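For the case where the keyword appears more than once, a two-pass awk sketch can mark the lines around every match and then print the rest. The variable names pat, before and after are assumptions, and the small values 1 and 2 are used only to keep the sample short; the question would use 5 and 6:

```shell
# Sample data with matches on lines 3 and 8.
sample=$(mktemp)
printf '%s\n' a b xxxx c d e f xxxx g h i > "$sample"

# Pass 1 (NR == FNR) marks every line number within range of a match;
# pass 2 prints only the unmarked lines. The file is read twice.
kept=$(awk -v pat='xxxx' -v before=1 -v after=2 '
    NR == FNR {
        if ($0 ~ pat)
            for (i = FNR - before; i <= FNR + after; i++) del[i] = 1
        next
    }
    !(FNR in del)
' "$sample" "$sample")
printf '%s\n' "$kept"
rm -f "$sample"
```

Overlapping ranges need no special handling, because a line number is simply marked more than once.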

Pick a specific value in a program output (Bash)

I'm running LIBSVM in a Linux terminal, called by a C program. I need to pick a value out of the output, but the format is the following:
Accuracy = 80% (24/30) (classification)
I need to pick only the "80" value as an integer. I tried with sed and came to this command:
sed 's/[^0-9^'%']//g' 'f' >> f
This is filtering all integers in the output and, thus, isn't working yet, so I need help. Thanks in advance
Try grep in PCRE mode (-P), printing only the matched parts (-o), with a lookahead assertion:
$ echo "Accuracy = 80% (24/30) (classification)" | grep -Po '[0-9]+(?=%)'
80
The regexp:
[0-9] # match a digit
+ # one or more times
(?=%) # assert that the digits are followed by a %
This is simple with awk. Identify the column you need and strip the '%' sign from it. The /^Accuracy/ regex ensures that the action is only performed on lines starting with Accuracy. You don't need it if your file contains only one line.
awk '/^Accuracy/{sub(/%/,"");print $3}' inputFile
Alternatively, you can set space and % as field separators and do
awk -F'[ %]' '/^Accuracy/{print $3}' inputFile
If you want to do it with sed then you can try something like:
sed '/^Accuracy/s/.* \(.*\)%.*/\1/' inputFile
This might work for you (GNU sed):
sed -nr '/^Accuracy = ([^%]*)%.*/s//\1/p' file
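If the line is already in a shell variable, parameter expansion can do the same job without an external tool. A minimal sketch, assuming the line always starts with "Accuracy = " (the variable names are illustrative):

```shell
output='Accuracy = 80% (24/30) (classification)'

value=${output#Accuracy = }     # drop the fixed prefix
value=${value%%"%"*}            # drop everything from the first % onward

# Accept the result only if it consists purely of digits.
case $value in
    ''|*[!0-9]*) echo "unexpected format" >&2 ;;
    *) accuracy=$value; echo "$accuracy" ;;
esac
```

The validated value can then be used in numeric tests such as [ "$accuracy" -ge 75 ].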

Resources