How to use sed to delete a string with wildcards


File1:
<a>hello</b> <c>foo</d>
<a>world</b> <c>bar</d>
This is an example of the file the command should work on. How can I remove every string of the form <c>*</d> using sed?

The following command removes all text from <c> to </d>, inclusive:
sed -e 's/<c>.*<\/d>//' File1
The part inside s/...// is a regular expression, not a shell-style wildcard, so anything you can put in a regular expression you can put in there. In a regular expression, .* ("any run of characters") plays the role of the shell's *.
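One caveat: .* is greedy, so if a line held two <c>...</d> pairs, the command above would delete everything from the first <c> to the last </d>, including the text in between. Here is a minimal sketch of a workaround, assuming the text inside <c>...</d> never contains a < itself (the sample line is made up for illustration). It uses the negated bracket expression [^<]* plus the g flag, and also eats the spaces in front of each <c>:

```shell
# Made-up sample line with two <c>...</d> pairs
line='<a>hello</b> <c>foo</d> <a>world</b> <c>bar</d>'

# Greedy: .* runs to the LAST </d>, so <a>world</b> is deleted as well
printf '%s\n' "$line" | sed -e 's/<c>.*<\/d>//'

# [^<]* cannot run past the next tag, and g removes every pair on the line
printf '%s\n' "$line" | sed -e 's/ *<c>[^<]*<\/d>//g'
```

The first command prints "<a>hello</b> " (with a trailing space); the second prints "<a>hello</b> <a>world</b>".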

If all your data looks like the example:
$ gawk 'BEGIN{FS=" <c>"}{print $1}' file
<a>hello</b>
<a>world</b>

Great Swiss Army knife!
I modified it to pull header info out of emails for an archiving script. The script renames IMAP emails with both the date and the sender (otherwise IMAP just numbers them 1, 2, 3, etc.). Here are the two modifications:
for i in $mailarray; do date -d "$(grep -im 1 "Date: " "$i" | sed -e 's_^.*\(ate: \)__')" +%F_%T%Z; done
for i in $mailarray; do grep -iEm 1 "From: " "$i" | sed -e 's_^.*\(rom\).*<\|^.*\(rom:\).__' | sed -e 's_#.*$__'; done
They saved a great deal of extraneous coding. Thank you.

Related

How to use sed to replace multiple chars in a string?

I want to replace some chars of a string with sed.
I tried the following two approaches, but I'd like to know whether there is a more elegant way to get the same result without using pipes or the -e option:
sed 's#a#A#g' test.txt | sed 's#l#23#g' > test2.txt
sed -e 's#a#A#g' -e 's#l#23#g' test.txt > test2.txt

Instead of multiple -e options, you can separate commands with ; in a single argument.
sed 's/a/A/g; s/l/23/g' test.txt > test2.txt
If you're looking for a way to do multiple substitutions in a single s command, I don't think there is one. If they were all single-character replacements, you could use a command like y/abc/123/, which replaces a with 1, b with 2, and c with 3. But there's no multi-character version of this.
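A quick sketch of both points, using a made-up test.txt in place of the question's file (the sample words are assumptions):

```shell
# Made-up sample input standing in for the question's test.txt
printf 'alpha\nlola\n' > test.txt

# Two substitutions in one sed argument, separated by ';'
sed 's/a/A/g; s/l/23/g' test.txt > test2.txt
cat test2.txt

# y/// transliterates character by character: a->1, b->2, c->3
echo 'abcabc' | sed 'y/abc/123/'
```

test2.txt ends up containing A23phA and 23o23A, and the last command prints 123123.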

In addition to Barmar's answer, you may want to use regular-expression character classes to substitute one specific character for any of several characters.
Here's an example to clarify; try running it with and without the sed to see the effect:
echo -e 'abc\ndef\nghi\nklm' | sed 's/[adgk]/1/g; s/[behl]/2/g; s/[cfim]/3/g'
P.S. Never run example code from strangers outside a safe sandbox.

When you have a lot of strings to replace, you can collect the commands in a variable (the += append syntax needs bash):
seds="s/a/A/;"
seds+="s/1/23/;"
echo "That was 1 big party" |
sed "$seds"
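For a really long list, another option is to put one command per line in a file and load it with sed -f. A minimal sketch; the file name replacements.sed and its contents are just examples:

```shell
# One command per line in a sed script file (example contents only)
cat > replacements.sed <<'EOF'
s/a/A/g
s/1/23/g
s/big/huge/
EOF

# sed runs the commands in order on every input line
echo "That was 1 big party" | sed -f replacements.sed
```

This prints "ThAt wAs 23 huge pArty". Because the commands run in order, a later pattern sees the output of earlier ones: a command like s/big party/bash/ would no longer match here, since s/a/A/g has already turned "party" into "pArty".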

Replace multiple commas with a single one - linux command

This is a line of output from my Google CSV contacts export (which contains more than 1000 contacts):
A-Tech Computers Hardware,A-Tech Computers,,Hardware,,,,,,,,,,,,,,,,,,,,Low,,,* My Contacts,,,,,,,,,Home,+38733236313,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
I need a Linux CLI command that replaces the duplicate commas with single commas, so I get this:
A-Tech Computers Hardware,A-Tech Computers,Hardware,Low,* My Contacts,Home,+38733236313,
In Notepad++ I usually replace ",," with "," six times.
I tried with:
cat googlecontacts.txt | sed -e 's/,,/,/g' -e 's/,,/,/g' -e 's/,,/,/g' -e 's/,,/,/g' -e 's/,,/,/g' -e 's/,,/,/g' > google.txt
But it doesn't work.
However, when I try it on smaller files (two lines), it works.
Help, please!

Assuming your lines remain valid after the modification (not the concern of the question):
sed 's/,\{2,\}/,/g' googlecontacts.txt > google.txt
This replaces every run of two or more commas with a single comma, anywhere on the line.
A space between commas counts as a real field, so ", ," is not modified.
In your command, running the same substitution a fixed number of times is not enough (there can always be a longer run of commas); instead, repeat the change until nothing more changes, like this:
cat googlecontacts.txt | sed ':a
# make your change
s/,,/,/g
# if change occur, retry once again by returning to line :a
t a' > google.txt
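To check that the single-pass interval version, the t loop above, and the tr -s approach from another answer agree, here is a small sketch that runs all three on the first line of the question's data and compares each result with the expected output from the question:

```shell
# First line of the question's sample data, and the output the question wants
line='A-Tech Computers Hardware,A-Tech Computers,,Hardware,,,,,,,,,,,,,,,,,,,,Low,,,* My Contacts,,,,,,,,,Home,+38733236313,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'
expected='A-Tech Computers Hardware,A-Tech Computers,Hardware,Low,* My Contacts,Home,+38733236313,'

# 1. POSIX interval: two or more commas become one, in a single pass
a=$(printf '%s\n' "$line" | sed 's/,\{2,\}/,/g')

# 2. Loop: repeat s/,,/,/g while the t command sees a successful substitution
b=$(printf '%s\n' "$line" | sed -e ':a' -e 's/,,/,/g' -e 't a')

# 3. tr squeezes each run of repeated commas into one
c=$(printf '%s\n' "$line" | tr -s ',')

[ "$a" = "$expected" ] && [ "$b" = "$expected" ] && [ "$c" = "$expected" ] && echo "all three agree"
```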

You need the squeeze option of tr:
tr -s ',' < yourFile
You can see it in action like this:
echo hello,,there,,,,I,have,,too,many,,,commas | tr -s ,
hello,there,I,have,too,many,commas

This might work for you (GNU sed):
sed 's/,,*/,/g' file
or
sed 's/,\+/,/g' file

Thanks @potong, your solution worked for one of my requirements. I had to remove the | symbol in the first line of my file, and I used this solution with a small change:
sed -i "1s/|'*//g" "${filename}"
I was unable to add a comment, so I posted this as an answer instead.

Modification of file names

I have a list of more than 1000 files in the following format:
0521865417_roman_pottery_in_the_archaeological_record_2007.pdf
0521865476_power_politics_and_religion_in_timurid_iran_2007.pdf
0521865514_toward_a_theory_of_human_rights_religion_law_courts_2006.pdf
0521865522_i_was_wrong_the_meanings_of_apologies_2008.pdf
I am on Linux and want to rename them as follows:
2007_roman_pottery_in_the_archaeological_record.pdf
2007_power_politics_and_religion_in_timurid_iran.pdf
2006_toward_a_theory_of_human_rights_religion_law_courts.pdf
2008_i_was_wrong_the_meanings_of_apologies.pdf
Using rename and awk, I managed to get:
2007_roman_pottery_in_the_archaeological_record_2007.pdf
2007_power_politics_and_religion_in_timurid_iran_2007.pdf
2006_toward_a_theory_of_human_rights_religion_law_courts_2006.pdf
2008_i_was_wrong_the_meanings_of_apologies_2008.pdf
The remaining task is now to remove the last field that holds the year.

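A solution that uses sed to generate the mv commands (old name, new name) and then pipes them to bash:
ls -1 | sed -nr 's/^[0-9]*_([A-Za-z_]*)_([0-9]{4})\.pdf$/mv & \2_\1.pdf/p' | bash
The -n and p flags make sed print only the lines it rewrote, so other files in the directory are not passed to bash. Run it without the final | bash first to check the generated commands.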
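If you'd rather not parse ls or pipe generated commands into bash, here is a sketch of a plain shell loop doing the same rename. It assumes the prefix is all digits (an ISBN-10 ending in X would not match) and that the year is the four digits right before .pdf; newname is a hypothetical helper name. As written it only prints the mv commands; drop the echo once the output looks right:

```shell
# Map one file name to its new name; a non-matching name is printed unchanged
newname() {
  printf '%s\n' "$1" | sed -E 's/^[0-9]+_(.*)_([0-9]{4})\.pdf$/\2_\1.pdf/'
}

# Dry run: print the mv commands instead of executing them
for f in *.pdf; do
  [ -e "$f" ] || continue              # no .pdf files: the glob stays literal
  new=$(newname "$f")
  [ "$new" = "$f" ] || echo mv -- "$f" "$new"
done
```

For example, newname 0521865417_roman_pottery_in_the_archaeological_record_2007.pdf prints 2007_roman_pottery_in_the_archaeological_record.pdf.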

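A workaround, starting from where you left off:
echo 2007_roman_pottery_in_the_archaeological_record_2007.pdf | awk -F '_' -v OFS='_' '{$NF=""; print substr($0, 1, length($0)-1) ".pdf"}'
Emptying the last field leaves a trailing _, which substr() cuts off (awk strings are indexed from 1, so the substring must start at 1, not 0).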

How to delete 5 lines before and 6 lines after pattern match using Sed?

I want to search for a pattern "xxxx" in a file and delete the 5 lines before this match and the 6 lines after it. How can I do this using sed?

This might work for you (GNU sed):
sed ':a;N;s/\n/&/5;Ta;/xxxx/!{P;D};:b;N;s/\n/&/11;Tb;d' file
Keep a rolling window of six lines (the current line plus the five before it); on encountering the specified string, append six more lines (12 lines, i.e. 11 newlines, in total) and delete the whole window.
N.B. This is a bare-bones solution and will most probably need tailoring to your specific needs. What if there are multiple matches throughout the file? What if the string is within the first five lines, or several matches are within five lines of each other? And so on.

Here's one way you could do it using awk. I assume that you also want to delete the line itself and that the file is small enough to fit into memory:
awk '{a[NR]=$0}/xxxx/{f=NR}END{for(i=1;i<=NR;++i)if(!f||i<f-5||i>f+6)print a[i]}' file
Store every line in the array a. When the pattern /xxxx/ is matched, save the line number in f. After the whole file has been processed, loop through the array, printing only the lines you want to keep (or every line, if the pattern never matched and f is still 0).
Alternatively, you can use grep to obtain the line number first:
grep -n 'xxxx' file | awk -F: 'NR==FNR{f=$1; next} FNR<f-5 || FNR>f+6' - file
In both cases, the lines deleted will be surrounding the last line where the pattern is matched.
A third option would be to use grep to obtain the line number then use sed to delete the lines:
line=$(grep -nm1 'xxxx' file | cut -d: -f1)
sed "$((line-5)),$((line+6))d" file
In this case I've also added the -m switch so grep exits after finding the first match.
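Putting the grep + sed approach together with the edge cases handled, here is a sketch of a helper; delete_around is a made-up name. It assumes a grep that supports -m (GNU or BSD), acts on the first match only, clamps the start line to 1 because sed rejects line address 0, and prints the file unchanged when nothing matches:

```shell
# Usage: delete_around PATTERN BEFORE AFTER FILE
# Prints FILE without the BEFORE lines before, the matching line itself,
# and the AFTER lines after the first line matching PATTERN.
delete_around() {
  line=$(grep -n -m1 -- "$1" "$4" | cut -d: -f1)
  if [ -z "$line" ]; then
    cat -- "$4"                        # no match: print the file unchanged
    return
  fi
  start=$((line - $2))
  [ "$start" -lt 1 ] && start=1        # sed rejects a line address of 0
  end=$((line + $3))                   # an end past the last line is harmless
  sed "${start},${end}d" "$4"
}
```

For example, on a file holding the numbers 1 to 20, delete_around '^10$' 5 6 file prints 1 to 4 and 17 to 20. Redirect the output to a new file, or add -i to the sed call, to keep the result.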

If you know the line number (which is not difficult to obtain), you can use something like this, where curr_line holds that line number:
filename="test"
start=$((curr_line - 5))
end=$((curr_line + 6))
sed "${start},${end}d" "$filename"   # add -i to edit the file in place
Of course, you have to handle edge cases: start must not be less than 1 (sed rejects a line address of 0), while an end beyond the last line of the file is harmless.

Another, perhaps easier to follow, solution is to use grep to find the keyword and the corresponding line:
grep -n 'KEYWORD' <file>
Then use sed to keep only the line number:
grep -n 'KEYWORD' <file> | sed 's/:.*//'
Now that you have the line number, simply use sed like this:
sed -i "${LINE_START},${LINE_END} d" <file>
to remove the lines before and/or after. With a bare -i, sed overwrites <file> in place and keeps no backup (with e.g. -i.bak, GNU sed keeps a copy of the original with a .bak suffix).
A script example could be:
#!/bin/bash
KEYWORD=$1
LINES_BEFORE=$2
LINES_AFTER=$3
FILE=$4
LINE_NO=$(grep -n -- "$KEYWORD" "$FILE" | sed 's/:.*//')
echo "Keyword found in line: $LINE_NO"
LINE_START=$((LINE_NO - LINES_BEFORE))
LINE_END=$((LINE_NO + LINES_AFTER))
echo "Deleting lines $LINE_START to $LINE_END!"
sed -i "${LINE_START},${LINE_END} d" "$FILE"
Please note that this works only if the keyword is found exactly once. Adapt the script to your needs.

How do I remove newlines from a text file?

I have the following data, and I need to put it all into one line.
I have this:
22791
;
14336
;
22821
;
34653
;
21491
;
25522
;
33238
;
I need this:
22791;14336;22821;34653;21491;25522;33238;
EDIT
None of these commands is working perfectly.
Most of them let the data look like this:
22791
;14336
;22821
;34653
;21491
;25522

tr --delete '\n' < yourfile.txt
tr -d '\n' < yourfile.txt
Edit:
If none of the commands posted here are working, then you have something other than a newline separating your fields. Possibly you have DOS/Windows line endings in the file (although I would expect the Perl solutions to work even in that case)?
Try:
tr -d "\n\r" < yourfile.txt
If that doesn't work then you're going to have to inspect your file more closely (e.g. in a hex editor) to find out what characters are actually in there that you want to remove.

tr -d '\n' < file.txt
Or
awk '{ printf "%s", $0 }' file.txt
Or
sed ':a;N;$!ba;s/\n//g' file.txt
This page here has a bunch of other methods to remove newlines.
edited to remove feline abuse :)
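To check that these produce exactly what the question asks for, here is a small sketch that rebuilds the question's sample data in file.txt (overwriting any file of that name) and compares each command's output; the sed variant relies on GNU sed:

```shell
# Rebuild the question's sample data, one value per line
printf '%s\n' 22791 ';' 14336 ';' 22821 ';' 34653 ';' 21491 ';' 25522 ';' 33238 ';' > file.txt
expected='22791;14336;22821;34653;21491;25522;33238;'

# Command substitution drops a trailing newline, so sed's final newline is ignored
[ "$(tr -d '\n' < file.txt)" = "$expected" ] && echo "tr: ok"
[ "$(awk '{ printf "%s", $0 }' file.txt)" = "$expected" ] && echo "awk: ok"
[ "$(sed ':a;N;$!ba;s/\n//g' file.txt)" = "$expected" ] && echo "sed: ok"
```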

perl -p -i -e 's/\R//g;' filename
should do the job (\R matches any kind of line break, including \r\n; it needs Perl 5.10 or later).

paste -sd "" file.txt

Expanding on a previous answer, this removes all newlines and saves the result to a new file (thanks to @tripleee):
tr -d '\n' < yourfile.txt > yourfile2.txt
This is better than a "useless cat" (see comments):
cat file.txt | tr -d '\n' > file2.txt
It is also useful for getting rid of the newline at the end of a file, e.g. one created by echo blah > file.txt.
Note that the destination filename must differ from the source: the shell truncates the output file before tr runs, so redirecting back to the same file wipes out the original content!
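If you do want to update the original file, one common pattern is to write to a temporary file and move it over the original only when tr succeeded. A sketch; strip_newlines_in_place is a hypothetical name, and note that the replaced file ends up with mktemp's permissions (usually 0600) rather than the original's:

```shell
# Remove every newline from the file named in $1, replacing it only on success
strip_newlines_in_place() {
  tmp=$(mktemp "$1.XXXXXX") || return 1   # temp file next to the original
  if tr -d '\n' < "$1" > "$tmp"; then
    mv -- "$tmp" "$1"
  else
    rm -f -- "$tmp"
    return 1
  fi
}
```

Usage: strip_newlines_in_place yourfile.txt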

You can edit the file in vim:
$ vim inputfile
:%s/\n//g

Use
head -n 1 filename | od -c
to figure out WHAT the offending character is.
Then use
tr -d '\n' <filename
for LF line endings, or
tr -d '\r\n' <filename
for CRLF line endings.
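To see what the CRLF case looks like, here is a sketch with a made-up CRLF sample (the file name crlf.txt is an assumption). od -c shows the \r bytes that tr -d '\n' leaves behind; a terminal treats each \r as "return to the start of the line", which can make the joined output look scrambled:

```shell
# Made-up sample with Windows (CRLF) line endings
printf '22791\r\n;\r\n14336\r\n;\r\n' > crlf.txt

# Deleting only \n leaves the \r characters in place
tr -d '\n' < crlf.txt | od -c

# Deleting both characters gives a clean single line
tr -d '\r\n' < crlf.txt
```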

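Use sed with POSIX classes
This removes all lines that are empty or contain only whitespace ([[:space:]] covers spaces and tabs, and also carriage returns). Note that it deletes blank lines rather than joining the remaining lines.
sed '/^[[:space:]]*$/d'
Just take whatever you are working with and pipe it to that.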
Example
cat filename | sed '/^[[:space:]]*$/d'

Using ed (see man 1 ed):
# cf. http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
ed -s file <<< $'1,$j\n,p' # print to stdout
ed -s file <<< $'1,$j\nwq' # in-place edit

xargs consumes newlines as well (but adds a final trailing newline):
xargs < file.txt | tr -d ' '

Nerd fact: you can also give the newline by its octal ASCII code, \012:
tr -d '\012' < filename.extension
(Edited because I hadn't seen the answer with the same solution; the only difference is that mine uses the ASCII code.)

Using the gedit text editor (3.18.3)
Click Search
Click Find and Replace...
Enter \n\s into Find field
Leave Replace with blank (nothing)
Check Regular expression box
Click the Find button
Note: this doesn't exactly address the OP's original, seven-year-old problem, but it should help Linux newcomers (like me) who find their way here from search engines with similar "how do I get all my text on one line" questions.

I had the same case today. It's very easy in Vim or Neovim, where gJ joins lines without inserting spaces. For your use case, just run (in Normal mode, on the first line)
99gJ
This joins 99 lines; adjust the count to the number of lines you need to join. To join just the current line and the next one, gJ alone is enough.

$ perl -0777 -pe 's/\n+//g' input >output
$ perl -0777 -pe 'tr/\n//d' input >output

If the data is in file.txt, then:
echo $(<file.txt) | tr -d ' '
The '$(<file.txt)' reads the file and gives the contents as a series of words which 'echo' then echoes with a space between them. The 'tr' command then deletes any spaces:
22791;14336;22821;34653;21491;25522;33238;

Assuming you only want to keep the digits and the semicolons, the following should do the trick (barring major encoding issues), though it will also remove the very last newline:
$ tr -cd ";0-9" < file.txt
You can easily modify the above to include other characters, e.g. if you want to retain decimal points, commas, etc.

I usually run into this when I copy a code snippet from a file and want to paste it into a console without extra newlines, so I made a bash alias (I called it oneline, if you are curious):
xsel -b -o | tr -d '\n' | tr -s ' ' | xsel -b -i
xsel -b -o reads my clipboard
tr -d '\n' removes newlines
tr -s ' ' squeezes repeated spaces into one
xsel -b -i writes the result back to my clipboard
After running oneline, I paste the new clipboard contents into a console or wherever I need them.

I would do it with awk, e.g.
awk '/[0-9]+/ { a = a $0 ";" } END { print a }' file.txt
(a disadvantage is that a is "accumulated" in memory).
EDIT
Forgot about printf! So also
awk '/[0-9]+/ { printf "%s;", $0 }' file.txt
or, likely better, the awk solution already given in another answer.

You are missing the most obvious and fastest answer, especially when you need to do this in a GUI to fix some weird word wrap:
Open gedit.
Press Ctrl + H, put \n in the Find textbox, put a single space in Replace with (or leave it empty to join the lines with nothing in between), tick the Regular expression checkbox, click Replace All, and voila.

To also remove the trailing newline at the end of the file
python -c "s=open('filename','r').read();open('filename', 'w').write(s.replace('\n',''))"

The fastest way I found:
Open vim from your command line:
vim inputfile
Press ":" and enter the following command to remove all newlines:
:%s/\n//g
To also remove spaces, in case some of the characters were spaces, enter :%s/ //g
Make sure to save by writing the file with
:w
The same approach can be used to remove any other character. You can use a website like
https://apps.timwhitlock.info/unicode/inspect
to figure out which character you're missing; it also helps identify other characters you can't see.
