For example, say I have a text file example.txt that reads:
I like dogs.
My favorite dog is George because he is my dog.
George is a nice dog.
Now how do I extract "George" given that it is the first word that follows "My favorite dog is"?
What if there is more than one space, e.g.
My favorite dog is George .....
Is there a way to reliably extract the word "George" regardless of the number of spaces between "My favorite dog is" and "George"?
If you do not have perl installed you can use sed:
sed -n 's/My favorite dog is *\([a-zA-Z]*\).*/\1/p' example.txt
Pure Bash:
string='blah blah ! HEAT OF FORMATION 105.14088 93.45997 46.89387 blah blah'
pattern='HEAT OF FORMATION ([^[:blank:]]*)'
[[ $string =~ $pattern ]]
match=${BASH_REMATCH[1]}
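The same BASH_REMATCH technique applied to the dog question (a sketch; the sentence and pattern here are adapted for illustration) also copes with repeated spaces, because the ERE `+` matches one or more of them:

```shell
# Extract the word after "My favorite dog is", however many spaces follow it.
sentence='My favorite dog is    George because he is my dog.'
pattern='My favorite dog is +([^ ]+)'
if [[ $sentence =~ $pattern ]]; then
    echo "${BASH_REMATCH[1]}"
fi
```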
You can do:
perl -ne 'print "$1\n" if /My favorite dog is\s+(\w+)/' example.txt
It outputs George.
If you are searching a file, especially a big one, external tools like sed, awk or perl are faster than pure bash loops and bash string manipulation.
sed 's/.*HEAT OF FORMATION[ \t]*\([^ \t]*\).*/\1/' file
Pure bash string manipulation is only worthwhile when you are processing a few simple strings inside your script, like manipulating a variable.
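Since awk splits fields on any run of whitespace, it also sidesteps the multiple-space problem from the question. A sketch (the field positions are specific to this phrasing):

```shell
# Print the field that follows "dog is", no matter how many spaces separate them.
printf 'I like dogs.\nMy favorite dog is    George because he is my dog.\n' |
awk '{ for (i = 3; i <= NF; i++)
         if ($(i-2) == "dog" && $(i-1) == "is") { print $i; exit } }'
```

This prints George regardless of the spacing between the words.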
Related
I have read the following and tried to rework the command logic for what I want, but I just haven't been able to get it right.
Delete the word whose length is less than 2 in bash
Tried: echo $example | sed -e 's/ [a-zA-Z0-9]\{4\} / /g'
Remove all words bigger than 6 characters using sed
Tried: sed -e s'/[A-Za-z]\{,4\}//g'
Please help me with a simple awk or sed command for the following. Given:
Here is an example line of fantastic data
I want to get:
Here example line fantastic data
$ echo Here is an example line of fantastic data | sed -E 's/\b\(\w\)\{,3\}\b\s*//g'
Here is an example line of fantastic data
If you store the sentence in a variable, you can iterate over it in a for loop and check whether each word is longer than 2 characters:
sentence="Here is an example line of fantastic data"
for word in $sentence; do
    if [ ${#word} -gt 2 ]; then
        printf '%s ' "$word"
    fi
done
echo
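The loop above can also be written as a single awk invocation; a sketch:

```shell
# Keep only fields longer than 2 characters and rebuild the line.
echo "Here is an example line of fantastic data" |
awk '{ out = ""
       for (i = 1; i <= NF; i++)
           if (length($i) > 2) out = out (out == "" ? "" : " ") $i
       print out }'
```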
Here is a Bash example for what is probably the most common case: a file containing many sentences.
SCRIPT (Remove words with two letters or shorter)
#!/bin/bash
while IFS= read -r line
do
    echo "$line" | sed -E 's/\b\w{1,2}\b//g'
done < sentences.txt
INPUT
$ cat sentences.txt
Edgar Allan Poe (January 19, 1809 to October 7, 1849) was an
American writer, poet, critic and editor best known for evocative
short stories and poems that captured the imagination and interest
of readers around the world. His imaginative storytelling and tales
of mystery and horror gave birth to the modern detective story.
Many of Poe’s works, including “The Tell-Tale Heart” and
“The Fall of the House of Usher,” became literary classics. Some
aspects of Poe’s life, like his literature, is shrouded in mystery,
and the lines between fact and fiction have been blurred substantially
since his death.
OUTPUT
$ ./grep_tests.sh
Edgar Allan Poe (January , 1809 October , 1849) was
American writer, poet, critic and editor best known for evocative
short stories and poems that captured the imagination and interest
readers around the world. His imaginative storytelling and tales
mystery and horror gave birth the modern detective story.
Many Poe’ works, including “The Tell-Tale Heart” and
“The Fall the House Usher,” became literary classics. Some
aspects Poe’ life, like his literature, shrouded mystery,
and the lines between fact and fiction have been blurred substantially
since his death.
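The loop starts a new sed process for every line; since sed is itself line-oriented, a single invocation over the whole file produces the same output (assuming GNU sed for the \b and \w extensions):

```shell
# Remove words of one or two characters from every line in one pass.
sed -E 's/\b\w{1,2}\b//g' sentences.txt
```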
I am looking for a solution that would allow me to search text files on a Linux server, looking in each file for a pattern such as:
Text 123
Blue Green
And then replaces it with one line, every time it finds it in a file...
Order Blue Green
I am not sure what would be the easiest way to solve this. I have seen many guides using sed, but only for finding and replacing a single line.
You ask about sed, here is an answer in sed.
Let me mention however, that while sed is fun for this kind of exercise, you probably should choose something else, more flexible and easier to learn; perl for example.
- look for the first line matching /Text 123/
- when found, start a loop: :a
- append the next line: N
- collapse two consecutive copies of the searched text into one and print: s/Text 123\nText 123/Text 123/p
- loop while that substitution succeeded: ta
- try the final substitution: s/Text 123\nBlue Green/Order Blue Green/
- rely on the concatenated text being printed unchanged if that substitution does not trigger
Code:
sed "/Text 123/{:a;N;s/Text 123\nText 123/Text 123/p;ta;s/Text 123\nBlue Green/Order Blue Green/}"
Test input:
Text 123
Do not replace
Lala
Text 123
Blue Green
lulu
Text 123
Do not replace either
Text 123
Text 123
Blue Green
preceding should be replaced
Output:
Text 123
Do not replace
Lala
Order Blue Green
lulu
Text 123
Do not replace either
Text 123
Order Blue Green
preceding should be replaced
Platform: Windows and GNU sed version 4.2.1
Note:
On that platform the sed line allows you to use environment variables for the two text fragments, which you probably want to do:
sed "/%EnvVar2%/{:a;N;s/%EnvVar2%\n%EnvVar2%/%EnvVar2%/p;ta;s/%EnvVar2%\n%EnvVar%/Order %EnvVar%/}"
Platform2:
still Windows
using bash GNU bash, version 3.1.17(1)-release (i686-pc-msys)
GNU sed version 4.2.1 (same)
On this platform, variables can e.g. be used like:
sed "/${EnvVar2}/{:a;N;s/${EnvVar2}\n${EnvVar2}/${EnvVar2}/p;ta;s/${EnvVar2}\n${EnvVar}/Order ${EnvVar}/}"
On this platform it is important to use "..." rather than '...', so that the variables are expanded.
As @EdMorton has hinted, on all platforms be careful when the text you are replacing itself looks like a variable reference, e.g. "Text $123" in bash. In that case, if you are not using variables, quoting with '...' instead of "..." is the way to go.
sed is for simple substitutions on individual lines, that is all. If you find yourself trying to use constructs other than s, g, and p (with -n) then you are on the wrong track as all other sed constructs became obsolete in the mid-1970s when awk was invented.
Your problem is not doing substitutions on individual lines, it's on a multi-line record and to do that with GNU awk for multi-char RS is:
$ awk -v RS='^$' -v ORS= '{gsub(/Text 123\nBlue Green/,"Order Blue Green")}1' file
Order Blue Green
but there are several other approaches depending on your real needs.
There are lots of questions showing how to reverse each word in a sentence, and I could readily do this in Python or JavaScript for example, but how can I do it with Linux commands? It looks like tac might be an option, but it seems like that would reverse the lines as well as the words, rather than just the words. What other tools can do this? I literally have no idea. I know rev and tac and awk all seem like contenders...
So I'd like to go from:
cat dog sleep
pillow green blue
to:
tac god peels
wollip neerg eulb
**slight followup
From this reference it looks like I could use awk to break each field up into an array of single characters and then write a for loop to manually reverse each word. This is quite awkward. Surely there's a better/more succinct way to do this?
Try this on for size:
sed -E -e 's/\s+/ /g' -e 's/ /\n/g' < file.txt | rev | tr '\n' ' ' ; echo
It collapses all the space and counts punctuation as part of "words", but it looks like it (at least mostly) works. Hooray for sh!
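If you need to keep the original line structure (the tr join above flattens everything onto one line), an awk sketch that reverses each field in place works line by line:

```shell
# Reverse the characters of each field; reassigning $i rebuilds the line.
printf 'cat dog sleep\npillow green blue\n' |
awk '{ for (i = 1; i <= NF; i++) {
           r = ""
           for (j = length($i); j >= 1; j--)
               r = r substr($i, j, 1)
           $i = r
       }
       print }'
```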
I have a text file with 10 k lines. How do I extract all the lines where a certain keyword appears? It's fundamental that I am able to select the entire line where a certain text pattern shows up. How can I do this in bash?
Use grep to search for text and print matching lines:
grep yourKeyword yourFile.txt
If the pattern consists of several words, you must quote the pattern:
grep "your key string" yourFile.txt
Besides using grep you can also use awk. Plus, awk has the advantage of doing processing as it searches the lines:
awk '/pattern/{ do stuff }' file
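For instance, a hypothetical one-liner that prints each matching line with its line number and a count at the end, something plain grep would need extra flags or a second pass for:

```shell
# Print matching lines with line numbers, then report how many matched.
printf 'I like dogs.\ncats too\nGeorge is a nice dog.\n' |
awk '/dog/ { print NR ": " $0; n++ } END { print n " matching line(s)" }'
```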
I don't know the best method to extract a list of words from a text file, find their definitions and paste them into an output text file. I've been thinking of using WordNet, but don't know how to automate the process.
Does anyone have any ideas (perhaps Google/APIs/Linux applications) that one could use to find the definitions of words, and then paste them into a text file?
File1:
hello
world
File2:
an expression of greeting; "every morning they exchanged polite hellos"
universe: everything that exists anywhere; "they study the evolution of the universe"; "the biggest tree in existence"
Although an API or library is probably the way to go (here's some Perl stuff), the Bash script below, which is very rough, might give you some ideas:
saveIFS="$IFS"
for w in hello goodbye bicycle world
do
echo
echo "------- $w -------"
def=$(wn $w -over)
IFS=$'\n'
for line in $def
do
echo -e "\t${line}"
IFS="$saveIFS"
if [[ $line =~ ^[[:digit:]]*\. ]]
then
for word in $line
do
echo -e "\t\t${word%*[,;]}"
done
fi
done
IFS="$saveIFS"
done
If you have a list of words in a file, one word to a line, change the first for and last done lines of the script above to:
while read -r w
# . . .
done < wordlist
See Dictionary API or Library for several solutions.