Related
I have a HTML PAGE which I have extracted in unix using wget command, in that after the word "Check list" I need to remove all of the text and with the remaining I am trying to grep some data. I am unable to think on a way which can be helpful for removing the text after a keyword. if I do
s/Check list.*//g
It just removes the line , I want everything below that to be gone. How do I perform this?
The other solutions you have so far require non-POSIX-mandatory tools (GNU sed, GNU awk, or perl) so YMMV with their availability and will read the whole file into memory at once.
These will work in any awk in any shell on every Unix box and only read 1 line at a time into memory:
awk -F 'Check list' '{print $1} NF>1{exit}' file
or:
awk 'sub(/Check list.*/,""){f=1} {print} f{exit}' file
With GNU awk for multi-char RS you could do:
awk -v RS='Check list' '{print; exit}' file
but that would still read all of the text before Check list into memory at once.
Depending on which sed version you have, maybe
sed -z 's/Check list.*//'
The /g flag is useless as you only want to replace everything once.
If your sed does not have the -z option (which says to use the ASCII null character as line terminator instead of newline; this hinges on your file not containing any actual nulls, but that should trivially be true for any text file), try Perl:
perl -0777 -pe 's/Check list.*//s'
Unlike sed -z, this explicitly says to slurp the entire file into memory (the argument to -0 is the octal character code of a terminator character, but 777 is not a valid terminator character at all, so it always reads the entire file as a single "line") so this works even if there are spurious nulls in your file. The final s flag says to include newline in what . matches (otherwise s/.*// would still only substitute on the matching physical line).
I assume you are aware that removing everything will violate the integrity of the HTML file; it needs there to be a closing tag for every start tag near the beginning of the document (so if it starts with <html><body> you should keep </body></html> just before the end of the file, for example).
With awk you could make use of RS variable and then set field separator to regex with word boundaries and then print the very first field as per need.
awk -v RS="^$" -v FS='\\<check_list\\>' '{print $1}' Input_file
You might use q to instruct GNU sed to quit, thus ending processing, consider following simple example, let file.txt content be
123
456
789
and say you want to jettison everything beyond 5, then you could do
sed '/5/{s/5.*//;q}' file.txt
which gives output
123
4
Explanation: for line having 5, substitute 5 and everything beyond it with empty string (i.e. delete it), then q. Observe that lowercase q is used to provide printing of altered line before quiting.
(tested in GNU sed 4.7)
I'm trying to view the content of a file including its delimiters in terminal.
For example:
hello\t\tworld\n
hello\t\tworld\t\tagain\n
instead of just:
hello world
hello world again
I did this once awhile ago using either "sed" or "awk"....I think...but, I can't seem to remember any of it now.
Thanks for the help.
VI can show you this if open the file in it and type :set list.
e.g.
$ cat test.txt
hello world
hello world again
In VI the ^I are tabs and $ are Line Feeds.
Also like the comment states - cat -A will get you the same output:
$ cat -A test.txt
hello^I^Iworld$
hello^I^Iworld^I^Iagain$
you can use od command,
od -t a input_file | awk '{$1=""}1' |
awk 'BEGIN{RS="[ \t\n]+";ORS="";
d["sp"]=" "; d["nl"]="\\n\n"; d["ht"]="\\t"; d["cr"] = "\\r";
}length($0)>1{$0=d[$0]}1'
with input_file
hello world
hello world again
you get,
hello\t\tworld\n
hello\t\tworld again\n
This might work for you (GNU sed):
sed -n l0 file
This will show any tabs and newlines will be replaced by $.
If you wish to see newlines as \n then slurp the whole file:
sed -n '1h;1!H;$!d;x;l0' file
This sed should probably work, you can just modify it to match and display whatever it is you want to see (e.g. whitespace). The tricky part is matching newlines since sed utilizes newlines to delimit the stream, it is already consumed prior to you getting to it. So matching along the lines of sed 's/\n/\\n/' will fail, you can presume the newline by matching the end character and tack on the \\n.
sed 's/$/\\n/;s/\t/\\t/g;s/ \{4\}/\\t/g' file
I have a text file which has a particular line something like
sometext sometext sometext TEXT_TO_BE_REPLACED sometext sometext sometext
I need to replace the whole line above with
This line is removed by the admin.
The search keyword is TEXT_TO_BE_REPLACED
I need to write a shell script for this. How can I achieve this using sed?
You can use the change command to replace the entire line, and the -i flag to make the changes in-place. For example, using GNU sed:
sed -i '/TEXT_TO_BE_REPLACED/c\This line is removed by the admin.' /tmp/foo
You need to use wildcards (.*) before and after to replace the whole line:
sed 's/.*TEXT_TO_BE_REPLACED.*/This line is removed by the admin./'
The Answer above:
sed -i '/TEXT_TO_BE_REPLACED/c\This line is removed by the admin.' /tmp/foo
Works fine if the replacement string/line is not a variable.
The issue is that on Redhat 5 the \ after the c escapes the $. A double \\ did not work either (at least on Redhat 5).
Through hit and trial, I discovered that the \ after the c is redundant if your replacement string/line is only a single line. So I did not use \ after the c, used a variable as a single replacement line and it was joy.
The code would look something like:
sed -i "/TEXT_TO_BE_REPLACED/c $REPLACEMENT_TEXT_STRING" /tmp/foo
Note the use of double quotes instead of single quotes.
The accepted answer did not work for me for several reasons:
my version of sed does not like -i with a zero length extension
the syntax of the c\ command is weird and I couldn't get it to work
I didn't realize some of my issues are coming from unescaped slashes
So here is the solution I came up with which I think should work for most cases:
function escape_slashes {
sed 's/\//\\\//g'
}
function change_line {
local OLD_LINE_PATTERN=$1; shift
local NEW_LINE=$1; shift
local FILE=$1
local NEW=$(echo "${NEW_LINE}" | escape_slashes)
# FIX: No space after the option i.
sed -i.bak '/'"${OLD_LINE_PATTERN}"'/s/.*/'"${NEW}"'/' "${FILE}"
mv "${FILE}.bak" /tmp/
}
So the sample usage to fix the problem posed:
change_line "TEXT_TO_BE_REPLACED" "This line is removed by the admin." yourFile
All of the answers provided so far assume that you know something about the text to be replaced which makes sense, since that's what the OP asked. I'm providing an answer that assumes you know nothing about the text to be replaced and that there may be a separate line in the file with the same or similar content that you do not want to be replaced. Furthermore, I'm assuming you know the line number of the line to be replaced.
The following examples demonstrate the removing or changing of text by specific line numbers:
# replace line 17 with some replacement text and make changes in file (-i switch)
# the "-i" switch indicates that we want to change the file. Leave it out if you'd
# just like to see the potential changes output to the terminal window.
# "17s" indicates that we're searching line 17
# ".*" indicates that we want to change the text of the entire line
# "REPLACEMENT-TEXT" is the new text to put on that line
# "PATH-TO-FILE" tells us what file to operate on
sed -i '17s/.*/REPLACEMENT-TEXT/' PATH-TO-FILE
# replace specific text on line 3
sed -i '3s/TEXT-TO-REPLACE/REPLACEMENT-TEXT/'
for manipulation of config files
i came up with this solution inspired by skensell answer
configLine [searchPattern] [replaceLine] [filePath]
it will:
create the file if not exists
replace the whole line (all lines) where searchPattern matched
add replaceLine on the end of the file if pattern was not found
Function:
function configLine {
local OLD_LINE_PATTERN=$1; shift
local NEW_LINE=$1; shift
local FILE=$1
local NEW=$(echo "${NEW_LINE}" | sed 's/\//\\\//g')
touch "${FILE}"
sed -i '/'"${OLD_LINE_PATTERN}"'/{s/.*/'"${NEW}"'/;h};${x;/./{x;q100};x}' "${FILE}"
if [[ $? -ne 100 ]] && [[ ${NEW_LINE} != '' ]]
then
echo "${NEW_LINE}" >> "${FILE}"
fi
}
the crazy exit status magic comes from https://stackoverflow.com/a/12145797/1262663
In my makefile I use this:
#sed -i '/.*Revision:.*/c\'"`svn info -R main.cpp | awk '/^Rev/'`"'' README.md
PS: DO NOT forget that the -i changes actually the text in the file... so if the pattern you defined as "Revision" will change, you will also change the pattern to replace.
Example output:
Abc-Project written by John Doe
Revision: 1190
So if you set the pattern "Revision: 1190" it's obviously not the same as you defined them as "Revision:" only...
bash-4.1$ new_db_host="DB_HOSTNAME=good replaced with 122.334.567.90"
bash-4.1$
bash-4.1$ sed -i "/DB_HOST/c $new_db_host" test4sed
vim test4sed
'
'
'
DB_HOSTNAME=good replaced with 122.334.567.90
'
it works fine
To do this without relying on any GNUisms such as -i without a parameter or c without a linebreak:
sed '/TEXT_TO_BE_REPLACED/c\
This line is removed by the admin.
' infile > tmpfile && mv tmpfile infile
In this (POSIX compliant) form of the command
c\
text
text can consist of one or multiple lines, and linebreaks that should become part of the replacement have to be escaped:
c\
line1\
line2
s/x/y/
where s/x/y/ is a new sed command after the pattern space has been replaced by the two lines
line1
line2
cat find_replace | while read pattern replacement ; do
sed -i "/${pattern}/c ${replacement}" file
done
find_replace file contains 2 columns, c1 with pattern to match, c2 with replacement, the sed loop replaces each line conatining one of the pattern of variable 1
To replace whole line containing a specified string with the content of that line
Text file:
Row: 0 last_time_contacted=0, display_name=Mozart, _id=100, phonebook_bucket_alt=2
Row: 1 last_time_contacted=0, display_name=Bach, _id=101, phonebook_bucket_alt=2
Single string:
$ sed 's/.* display_name=\([[:alpha:]]\+\).*/\1/'
output:
100
101
Multiple strings delimited by white-space:
$ sed 's/.* display_name=\([[:alpha:]]\+\).* _id=\([[:digit:]]\+\).*/\1 \2/'
output:
Mozart 100
Bach 101
Adjust regex to meet your needs
[:alpha] and [:digit:]
are Character Classes and Bracket Expressions
This worked for me:
sed -i <extension> 's/.*<Line to be replaced>.*/<New line to be added>/'
An example is:
sed -i .bak -e '7s/.*version.*/ version = "4.33.0"/'
-i: The extension for the backup file after the replacement. In this case, it is .bak.
-e: The sed script. In this case, it is '7s/.*version.*/ version = "4.33.0"/'. If you want to use a sed file use the -f flag
s: The line number in the file to be replaced. In this case, it is 7s which means line 7.
Note:
If you want to do a recursive find and replace with sed then you can grep to the beginning of the command:
grep -rl --exclude-dir=<directory-to-exclude> --include=\*<Files to include> "<Line to be replaced>" ./ | sed -i <extension> 's/.*<Line to be replaced>.*/<New line to be added>/'
The question asks for solutions using sed, but if that's not a hard requirement then there is another option which might be a wiser choice.
The accepted answer suggests sed -i and describes it as replacing the file in-place, but -i doesn't really do that and instead does the equivalent of sed pattern file > tmp; mv tmp file, preserving ownership and modes. This is not ideal in many circumstances. In general I do not recommend running sed -i non-interactively as part of an automatic process--it's like setting a bomb with a fuse of an unknown length. Sooner or later it will blow up on someone.
To actually edit a file "in place" and replace a line matching a pattern with some other content you would be well served to use an actual text editor. This is how it's done with ed, the standard text editor.
printf '%s\n' '/TEXT_TO_BE_REPLACED/' d i 'This line is removed by the admin' . w q | \
ed -s /tmp/foo > /dev/null
Note that this only replaces the first matching line, which is what the question implied was wanted. This is a material difference from most of the other answers.
That disadvantage aside, there are some advantages to using ed over sed:
You can replace the match with one or multiple lines without any extra effort.
The replacement text can be arbitrarily complex without needing any escaping to protect it.
Most importantly, the original file is opened, modified, and saved. A copy is not made.
How it works
How it works:
printf will use its first argument as a format string and print each of its other arguments using that format, effectively meaning that each argument to printf becomes a line of output, which is all sent to ed on stdin.
The first line is a regex pattern match which causes ed to move its notion of "the current line" forward to the first line that matches (if there is no match the current line is set to the last line of the file).
The next is the d command which instructs ed to delete the entire current line.
After that is the i command which puts ed into insert mode;
after that all subsequent lines entered are written to the current line (or additional lines if there are any embedded newlines). This means you can expand a variable (e.g. "$foo") containing multiple lines here and it will insert all of them.
Insert mode ends when ed sees a line consisting of .
The w command writes the content of the file to disk, and
the q command quits.
The ed command is given the -s switch, putting it into silent mode so it doesn't echo any information as it runs,
the file to be edited is given as an argument to ed,
and, finally, stdout is thrown away to prevent the line matching the regex from being printed.
Some Unix-like systems may (inappropriately) ship without an ed installed, but may still ship with an ex; if so you can simply use it instead. If have vim but no ex or ed you can use vim -e instead. If you have only standard vi but no ex or ed, complain to your sysadmin.
It is as similar to above one..
sed 's/[A-Za-z0-9]*TEXT_TO_BE_REPLACED.[A-Za-z0-9]*/This line is removed by the admin./'
Below command is working for me. Which is working with variables
sed -i "/\<$E\>/c $D" "$B"
I very often use regex to extract data from files I just used that to replace the literal quote \" with // nothing :-)
cat file.csv | egrep '^\"([0-9]{1,3}\.[0-9]{1,3}\.)' | sed s/\"//g | cut -d, -f1 > list.txt
I need to edit a few text files (an output from sar) and convert them into CSV files.
I need to change every whitespace (maybe it's a tab between the numbers in the output) using sed or awk functions (an easy shell script in Linux).
Can anyone help me? Every command I used didn't change the file at all; I tried gsub.
tr ' ' ',' <input >output
Substitutes each space with a comma, if you need you can make a pass with the -s flag (squeeze repeats), that replaces each input sequence of a repeated character that is listed in SET1 (the blank space) with a single occurrence of that character.
Use of squeeze repeats used to after substitute tabs:
tr -s '\t' <input | tr '\t' ',' >output
Try something like:
sed 's/[:space:]+/,/g' orig.txt > modified.txt
The character class [:space:] will match all whitespace (spaces, tabs, etc.). If you just want to replace a single character, eg. just space, use that only.
EDIT: Actually [:space:] includes carriage return, so this may not do what you want. The following will replace tabs and spaces.
sed 's/[:blank:]+/,/g' orig.txt > modified.txt
as will
sed 's/[\t ]+/,/g' orig.txt > modified.txt
In all of this, you need to be careful that the items in your file that are separated by whitespace don't contain their own whitespace that you want to keep, eg. two words.
without looking at your input file, only a guess
awk '{$1=$1}1' OFS=","
redirect to another file and rename as needed
What about something like this :
cat texte.txt | sed -e 's/\s/,/g' > texte-new.txt
(Yes, with some useless catting and piping ; could also use < to read from the file directly, I suppose -- used cat first to output the content of the file, and only after, I added sed to my command-line)
EDIT : as #ghostdog74 pointed out in a comment, there's definitly no need for thet cat/pipe ; you can give the name of the file to sed :
sed -e 's/\s/,/g' texte.txt > texte-new.txt
If "texte.txt" is this way :
$ cat texte.txt
this is a text
in which I want to replace
spaces by commas
You'll get a "texte-new.txt" that'll look like this :
$ cat texte-new.txt
this,is,a,text
in,which,I,want,to,replace
spaces,by,commas
I wouldn't go just replacing the old file by the new one (could be done with sed -i, if I remember correctly ; and as #ghostdog74 said, this one would accept creating the backup on the fly) : keeping might be wise, as a security measure (even if it means having to rename it to something like "texte-backup.txt")
This command should work:
sed "s/\s/,/g" < infile.txt > outfile.txt
Note that you have to redirect the output to a new file. The input file is not changed in place.
sed can do this:
sed 's/[\t ]/,/g' input.file
That will send to the console,
sed -i 's/[\t ]/,/g' input.file
will edit the file in-place
Here's a Perl script which will edit the files in-place:
perl -i.bak -lpe 's/\s+/,/g' files*
Consecutive whitespace is converted to a single comma.
Each input file is moved to .bak
These command-line options are used:
-i.bak edit in-place and make .bak copies
-p loop around every line of the input file, automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
If you want to replace an arbitrary sequence of blank characters (tab, space) with one comma, use the following:
sed 's/[\t ]+/,/g' input_file > output_file
or
sed -r 's/[[:blank:]]+/,/g' input_file > output_file
If some of your input lines include leading space characters which are redundant and don't need to be converted to commas, then first you need to get rid of them, and then convert the remaining blank characters to commas. For such case, use the following:
sed 's/ +//' input_file | sed 's/[\t ]+/,/g' > output_file
This worked for me.
sed -e 's/\s\+/,/g' input.txt >> output.csv
I have a set of 10 CSV files, which normally have a an entry of this kind
a,b,c,d
d,e,f,g
Now due to some error entries in this file have become of this kind
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Any command that you recommend that can replaces the erroneous lines in all the files.
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on #wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about trying to keep only lines which are matching the desired format instead of handling one exception ?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
Most simply:
$ grep -v ,,,, oldfile > newfile
$ mv newfile oldfile
yes, awk or grep are very good option if you are working in linux platform. However you can use perl regex for other platform. using join & split options.