Remove a repetitive string in a list with bash - string

How would you remove a string that repeats in most lines of a list with bash?
For example, my list looks like this:
Rex Rocket Steam Game
Magdalena Steam Game
FLASHOUT 2 Steam Game
Falcon
Girls Like Robots Steam Game
The Land Of Lamia Steam Game
Aeon Command
I want to remove the "Steam Game" string from every line that ends with it.
I'm super rusty, this looks so easy but I can't figure it out.

There are many options; sed is probably the simplest.
$ sed 's/Steam Game$//' foo.txt
Rex Rocket
Magdalena
FLASHOUT 2
Falcon
Girls Like Robots
The Land Of Lamia
Aeon Command
You asked for "Steam Game" to be removed, not " Steam Game". If you want the space removed as well, add a space to the regex: 's/ Steam Game$//'. Or use 's/ *Steam Game$//' if there may be more than one space.
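If the list is already being read in a shell loop, the same trimming can probably be done without sed, using the POSIX ${var%pattern} suffix-removal expansion. This is only a sketch: the function name strip_suffix and the sample input are made up for illustration, and it removes exactly one space plus "Steam Game" at the end of a line.

```shell
#!/bin/sh
# strip_suffix is a hypothetical helper: it reads lines on stdin and drops a
# trailing " Steam Game" from each line that ends with it.
strip_suffix() {
    while IFS= read -r line || [ -n "$line" ]; do
        # ${line% Steam Game} removes the suffix if present, else leaves the line alone.
        printf '%s\n' "${line% Steam Game}"
    done
}

# Demo with a few lines from the question.
printf '%s\n' 'Rex Rocket Steam Game' 'Falcon' 'Aeon Command' | strip_suffix
```

For a whole file you would call it as strip_suffix < foo.txt. On large files sed is likely to be faster than a shell loop.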

Related

Script to extract strings between two strings in linux

I am trying to write a little script that will let me "org-capture" articles from my rss-reader (newsboat). So my scenario is this: I will pipe the article to a script; however, the article gets piped in one line, like this:
Title: ABC boss quits over Australian political interference claims Author: Date: Thu, 27 Sep 2018 09:39:16 +0200 Link: https://www.bbc.co.uk/news/world-australia-45661871 The broadcaster's chair quits amid allegations the government leaned on him to dismiss two journalists.
So what I need to do is to consistently store the link and the title in a variable and then call a command with these variables (emacsclient org-protocol:/ ...)
So basically I need this:
TITLE="ABC boss quits over Australian political interference claims"
URL="https://www.bbc.co.uk/news/world-australia-45661871"
I considered using awk or sed, but they work best for separate lines. So, I thought maybe split the single line at 'Title:', 'Author:', 'Date:' and 'Link:' and then extract with awk/sed.
I found similar use cases and questions here, but not quite the same. I want a pretty minimal script without necessarily using python.
Am I on the right track?
Thanks for helping out.
With GNU awk for the 3rd arg to match():
$ cat tst.awk
match($0,/^Title:\s*(.*)\s+Author:\s*(.*)\s+Date:\s*(.*)\s+Link:\s*(\S+)\s+(.*)/,a) {
    printf "TITLE=\"%s\"\n", a[1]
    printf "URL=\"%s\"\n", a[4]
}
$ awk -f tst.awk file
TITLE="ABC boss quits over Australian political interference claims"
URL="https://www.bbc.co.uk/news/world-australia-45661871"
The regex captures the other fields as well (a[2], a[3] and a[5]), so you can do anything else you need with your input.
This might work for you (GNU sed):
sed -r 's/^Title: (.*) Author:.* Link: (\S+).*/TITLE="\1"\nURL="\2"/' file
Use pattern matching to extract the fields required. The first may contain spaces so match on the key Author:. The second is a string of non-space characters following the key Link:.
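Since the goal was to end up with shell variables, here is a rough sketch of how the extraction could be wrapped in a script. The function name parse_article is hypothetical, and it assumes the input always has the "Title: ... Author: ... Link: ..." layout shown in the question. It uses only basic regular expressions, so it should not need GNU sed.

```shell
#!/bin/sh
# parse_article is a hypothetical helper: given the one-line article text as
# its first argument, it sets the shell variables TITLE and URL.
parse_article() {
    # Title: everything between "Title: " and " Author:".
    TITLE=$(printf '%s\n' "$1" | sed -n 's/^Title: \(.*\) Author:.*/\1/p')
    # URL: the run of non-space characters after " Link: ".
    URL=$(printf '%s\n' "$1" | sed -n 's/.* Link: \([^ ]*\).*/\1/p')
}

# Sample line copied from the question.
article="Title: ABC boss quits over Australian political interference claims Author: Date: Thu, 27 Sep 2018 09:39:16 +0200 Link: https://www.bbc.co.uk/news/world-australia-45661871 The broadcaster's chair quits amid allegations the government leaned on him to dismiss two journalists."

parse_article "$article"
printf 'TITLE="%s"\nURL="%s"\n' "$TITLE" "$URL"
```

When the article arrives on a pipe, something like parse_article "$(cat)" should feed it in, after which $TITLE and $URL can be passed on to emacsclient.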

Find two lines and replace with one

I am looking for a solution that would let me search text files on a Linux server: it would look through a file and find a pattern such as:
Text 123
Blue Green
And then replaces it with one line, every time it finds it in a file...
Order Blue Green
I am not sure what would be the easiest way to solve this. I have seen many guides using SED but only for finding one line and replacing it.
You ask about sed, here is an answer in sed.
Let me mention however, that while sed is fun for this kind of exercise, you probably should choose something else, more flexible and easier to learn; perl for example.
Look for the first line: /Text 123/
When found, start a loop: :a
Append the next line: N
Replace twins of the searched text with a single copy and print it:
s/Text 123\nText 123/Text 123/p;
Loop while that replaces: ta;
Try the actual replacement: s/Text 123\nBlue Green/Order Blue Green/
Rely on the joined lines being printed unchanged if the replacement does not trigger.
Code:
sed "/Text 123/{:a;N;s/Text 123\nText 123/Text 123/p;ta;s/Text 123\nBlue Green/Order Blue Green/}"
Test input:
Text 123
Do not replace
Lala
Text 123
Blue Green
lulu
Text 123
Do not replace either
Text 123
Text 123
Blue Green
preceding should be replaced
Output:
Text 123
Do not replace
Lala
Order Blue Green
lulu
Text 123
Do not replace either
Text 123
Order Blue Green
preceding should be replaced
Platform: Windows and GNU sed version 4.2.1
Note:
On that platform, the sed command line can use environment variables for the two text fragments, which you probably want to do:
sed "/%EnvVar2%/{:a;N;s/%EnvVar2%\n%EnvVar2%/%EnvVar2%/p;ta;s/%EnvVar2%\n%EnvVar%/Order %EnvVar%/}"
Platform2:
still Windows
using GNU bash, version 3.1.17(1)-release (i686-pc-msys)
GNU sed version 4.2.1 (same)
On this platform, variables can e.g. be used like:
sed "/${EnvVar2}/{:a;N;s/${EnvVar2}\n${EnvVar2}/${EnvVar2}/p;ta;s/${EnvVar2}\n${EnvVar}/Order ${EnvVar}/}"
On this platform it is important to use "..." in order to be able to use variables; it does not work with '...'.
As @EdMorton has hinted, be careful on all platforms when the text to replace itself looks like a variable, e.g. "Text $123" in bash. In that case, where you are not using variables but the literal text looks like one, using '...' instead of "..." is the way to go.
sed is for simple substitutions on individual lines, that is all. If you find yourself trying to use constructs other than s, g, and p (with -n) then you are on the wrong track as all other sed constructs became obsolete in the mid-1970s when awk was invented.
Your problem is not a substitution on individual lines but on a multi-line record. With GNU awk (for multi-char RS) that is:
$ awk -v RS='^$' -v ORS= '{gsub(/Text 123\nBlue Green/,"Order Blue Green")}1' file
Order Blue Green
but there are several other approaches depending on your real needs.
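One of those other approaches, sketched here for awks without multi-char RS support, is to hold each line back until the following line has been seen. The helper name merge_pair is made up for illustration, and the sketch assumes the two lines should match exactly (whole-line string comparison, not a regex).

```shell
#!/bin/sh
# merge_pair is a hypothetical helper: it reads lines on stdin and replaces
# each "Text 123" line that is immediately followed by "Blue Green" with a
# single "Order Blue Green" line. Lines are compared exactly, not as regexes.
merge_pair() {
    awk '
        # Current line completes the pair: emit the replacement, drop both lines.
        held && prev == "Text 123" && $0 == "Blue Green" {
            print "Order Blue Green"
            held = 0
            next
        }
        # Otherwise the held line was not part of a pair, so print it unchanged.
        held { print prev }
        # Hold the current line until the next one has been seen.
        { prev = $0; held = 1 }
        END { if (held) print prev }
    '
}

# Demo: the doubled "Text 123" keeps its first copy, the pair is merged.
printf '%s\n' 'Text 123' 'Text 123' 'Blue Green' 'lulu' | merge_pair
```

Unlike the RS='^$' version, this does not read the whole file into memory, which may matter for large inputs.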

How can I search for two different patterns in two consecutive lines in a file using SED and print next 4 lines after pattern match?

I am using sed and want to print the lines matched by the patterns plus the next 4 lines after the match.
Below is a summary of my issue.
"myfile.txt" contains:
As specified in doc.
risk involved in astra.
I am not a schizophrenic;and neither am I.;
Be polite to every idiot you meet.;He could be your boss tomorrow.;
I called the hospital;but the line was dead.;
Yes, I’ve lost to my computer at chess.;But it turned out to be no match for me at kickboxing.;
The urologist is about to leave his office and says:; "Ok, let's piss off now.";
What's the best place to hide a body?;Page two of Google.;
You know you’re old;when your friends start having kids on purpose.;
You won’t find anything more poisonous;than a harmonious;and friendly group of females.;
Two state clerks meet in the corridor.;One asks the other,;"Couldn't sleep either?";
Why do women put on make-up and perfume?;Because they are ugly and they smell.;
Bruce Lee’s all-time favorite drink?;Wataaaaaaaah!;
Daddy what is a transvestite?;-Ask Mommy, he knows.;
That moment when you have eye contact while eating a banana.;
I'm using the command below.
sed -n -e '/You/h' -e '/Two/{x;G;p}' myfile.txt
Output by my command:
You won’t find anything more poisonous;than a harmonious;and friendly group of females.;
Two state clerks meet in the corridor.;One asks the other,;"Couldn't sleep either?";
Desired output:
You won’t find anything more poisonous;than a harmonious;and friendly group of females.;
Two state clerks meet in the corridor.;One asks the other,;"Couldn't sleep either?";
Why do women put on make-up and perfume?;Because they are ugly and they smell.;
Bruce Lee’s all-time favorite drink?;Wataaaaaaaah!;
Daddy what is a transvestite?;-Ask Mommy, he knows.;
That moment when you have eye contact while eating a banana.;
With GNU sed:
sed -n '/You/h;{/Two/{x;G;};//,+4p}' myfile.txt
Output:
You won’t find anything more poisonous;than a harmonious;and friendly group of females.;
Two state clerks meet in the corridor.;One asks the other,;"Couldn't sleep either?";
Why do women put on make-up and perfume?;Because they are ugly and they smell.;
Bruce Lee’s all-time favorite drink?;Wataaaaaaaah!;
Daddy what is a transvestite?;-Ask Mommy, he knows.;
That moment when you have eye contact while eating a banana.;
Explanation:
/You/h: copy matching line into the hold space. As there is only one hold space, h will store the last line matching You (ie You won’t...)
/Two/{x: when Two is found, x exchange the pattern space with the hold space. At this point:
into pattern space: You won’t find anything more poisonous;than a harmonious;and friendly group of females.;
into hold space: Two state clerks meet in the corridor.;One asks the other,;"Couldn't sleep either?";
G: appends a new line to the pattern space and copies the hold space after the new line
//,+4p is an address range starting at // (an empty regex reuses the last regular expression applied, here /Two/, which now matches the two joined lines) and extending over the next 4 lines (+4). The lines in that range are printed with p.
Maybe this will help you:
sed -n -e '/You/h' -e '/Two/{N;N;N;N;x;G;p}' myfile.txt
Example:
user@host:/tmp$ sed -n -e '/You/h' -e '/Two/{N;N;N;N;x;G;p}' myfile.txt
You won’t find anything more poisonous;than a harmonious;and friendly group of females.;
Two state clerks meet in the corridor.;One asks the other,;"Couldn't sleep either?";
Why do women put on make-up and perfume?;Because they are ugly and they smell.;
Bruce Lee’s all-time favorite drink?;Wataaaaaaaah!;
Daddy what is a transvestite?;-Ask Mommy, he knows.;
That moment when you have eye contact while eating a banana.;
This might work for you (GNU sed):
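sed -r 'N;/You.*\n.*Two/{:a;$!{N;s/\n/&/5;Ta};p;d};D' file
Read two lines into the pattern space and test them against the pattern. On a match, keep appending lines until the pattern space holds six lines (five newlines, which s/\n/&/5 detects) or the input ends, then print them. Otherwise, delete the first line and repeat.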
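For comparison, the same idea can be sketched in awk by remembering the previous line and counting down after a match. The helper name print_after_pair and the demo input are made up for illustration; the patterns /You/ and /Two/ are the ones from the question, and overlapping matches are not handled.

```shell
#!/bin/sh
# print_after_pair is a hypothetical helper: when a line matching /Two/
# directly follows a line matching /You/, it prints both lines plus the next
# four lines. Overlapping matches are not handled.
print_after_pair() {
    awk '
        # Second line of the pair found: print the pair and arm the counter.
        /Two/ && prev ~ /You/ { print prev; print; n = 4; prev = $0; next }
        # Still inside the four-line window after a match.
        n > 0 { print; n-- }
        # Remember the current line for the next comparison.
        { prev = $0 }
    '
}

# Demo: "skip" and "e" fall outside the window and are not printed.
printf '%s\n' 'skip' 'You first' 'Two second' 'a' 'b' 'c' 'd' 'e' | print_after_pair
```

Run against the file from the question it would be print_after_pair < myfile.txt.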

How to reverse each word in a text file with linux commands without changing order of words

There are lots of questions indicating how to reverse each word in a sentence, and I could readily do this in Python or JavaScript for example, but how can I do it with Linux commands? It looks like tac might be an option, but it seems like this would reverse lines as well as words, rather than just words. What other tools can do this? I literally have no idea. I know rev and tac and awk all seem like contenders...
So I'd like to go from:
cat dog sleep
pillow green blue
to:
tac god peels
wollip neerg eulb
Slight follow-up:
From this reference it looks like I could use awk to break each field up into an array of single characters and then write a for loop to reverse manually each word in this way. This is quite awkward. Surely there's a better/more succinct way to do this?
Try this on for size:
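sed -E -e 's/\s+/ /g' -e 's/ /\n/g' < file.txt | rev | tr '\n' ' ' ; echo
It collapses all the whitespace, including the line breaks, so everything ends up on one line, and it counts punctuation as part of "words", but it looks like it (at least mostly) works. Note the -E: without it, GNU sed treats the + in \s+ as a literal plus sign. Hooray for sh!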
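If the line breaks need to survive, one possible variation is to let rev flip each whole line and then have awk put the words back in their original order. This is a sketch only: the helper name reverse_words is made up, it assumes rev (from util-linux) is installed, and runs of whitespace are collapsed to single spaces.

```shell
#!/bin/sh
# reverse_words is a hypothetical helper: it reverses the letters of every
# word on stdin while keeping word order and line breaks.
reverse_words() {
    # rev flips each whole line, which reverses the letters but also the
    # word order; awk then walks the fields backwards to restore the order.
    rev | awk '{
        out = ""
        for (i = NF; i > 0; i--) {
            sep = (i < NF ? " " : "")
            out = out sep $i
        }
        print out
    }'
}

# Demo with the two lines from the question.
printf '%s\n' 'cat dog sleep' 'pillow green blue' | reverse_words
```

For the sample input this should print "tac god peels" and "wollip neerg eulb" on separate lines.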

How can I remove lines that contain more than N words

Is there a good one-liner in bash to remove lines containing more than N words from a file?
example input:
I want this, not that, but thank you it is very nice of you to offer.
The very long sentence finding form ordering system always and redundantly requires an initial, albeit annoying and sometimes nonsensical use of commas, completion of the form A-1 followed, after this has been processed by the finance department and is legal, by a positive approval that allows for the form B-1 to be completed after the affirmative response to the form A-1 is received.
example output:
I want this, not that, but thank you it is very nice of you to offer.
In Python I would code something like this:
if len(line.split()) < 40:
    print(line)
To only show lines containing fewer than 40 words, you can use awk:
awk 'NF < 40' file
Using the default field separator, each word is treated as a field. Lines with fewer than 40 fields are printed.
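The limit can also be passed in as a variable. The sketch below reads "more than N words" literally, so lines with exactly N words are kept (NF <= max rather than NF < 40); the helper name keep_short_lines and the demo input are made up for illustration.

```shell
#!/bin/sh
# keep_short_lines is a hypothetical helper: it prints only the stdin lines
# that have at most $1 whitespace-separated words.
keep_short_lines() {
    awk -v max="$1" 'NF <= max'
}

# Demo: with a limit of 3, the five-word line is dropped.
printf '%s\n' 'one two three' 'one two three four five' | keep_short_lines 3
```

For the file in the question that would be keep_short_lines 40 < file.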
Note: this answer addresses the first version of the question, which asked how to print the lines shorter than a given number of characters.
Use awk with length():
awk 'length($0)<40' file
You can even give the length as a parameter:
awk -v maxsize=40 'length($0) < maxsize' file
A test with 10 characters:
$ cat a
hello
how are you
i am fine but
i would like
to do other
things
$ awk 'length($0)<10' a
hello
things
If you feel like using sed for this, you can say:
sed -rn '/^.{,39}$/p' file
This checks whether the line contains fewer than 40 characters. If so, it prints it.
