I am quite new to shell scripting.
I am scraping a website and the scraped text contains a lot of repetitions. Usually they are the menus on a forum, for example. Mostly, I do this in Python, but I thought that the sed command would save me reading and printing the input, loops etc. I want to delete thousands of repeated lines from the same single file. I do not want to copy it to another file, because I will end up with 100 new files. The following is a shadow script which I run from the bash shell.
#!/bin/sed -f
sed -i '/^how$/d' input_file.txt
sed -i '/^is test$/d' input_file.txt
sed -i '/^repeated text/d' input_file.txt
This is the content of the input file:
how to do this task
why it is not working
this is test
Stackoverflow is a very helpful community of programmers
that is test
this is text
repeated text is common
this is repeated text of the above line
Then I run in the shell the following command:
sed -f scriptFile input_file.txt
I get the following error
sed: scriptFile line 2: unterminated `s' command
How can I correct the script, and what is the correct syntax of the command I should use to get it work?
Any help is highly appreciated.
assuming you know what your script is doing, it's very easy to put them into a script. in your case, the script should be:
/^how$/d
/^is test$/d
/^repeated text/d
that's good enough.
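as a rough sketch, here is that script run end to end on the sample input from the question. the file name delete_lines.sed and the temporary directory are just assumptions for the demo. note that on this particular input only the ^repeated text pattern deletes anything, because ^how$ and ^is test$ only match lines that consist of exactly those words.

```shell
# Throwaway working directory; all file names here are hypothetical.
workdir=$(mktemp -d)

# The sed script: one command per line, no "sed -i" prefix and no quotes.
cat > "$workdir/delete_lines.sed" <<'EOF'
/^how$/d
/^is test$/d
/^repeated text/d
EOF

# The sample input from the question.
cat > "$workdir/input_file.txt" <<'EOF'
how to do this task
why it is not working
this is test
Stackoverflow is a very helpful community of programmers
that is test
this is text
repeated text is common
this is repeated text of the above line
EOF

# Apply the script; output goes to stdout and the input file is not modified.
result=$(sed -f "$workdir/delete_lines.sed" "$workdir/input_file.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```

to drop a line such as "this is test" as well, the pattern would have to be loosened, for example /is test$/d.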
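to make the script itself executable is easy too. put the real path to sed in the first line (/bin/sed here; it may be /usr/bin/sed on your system), because on Linux a "#!/usr/bin/env sed -f" line hands "sed -f" to env as one single argument and fails:
#!/bin/sed -f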
/^how$/d
/^is test$/d
/^repeated text/d
then
chmod +x your_sed_script
./your_sed_script <old >new
here is a very good and compact tutorial. you can learn a lot from it.
following is an example from the site, just in case the link is dead:
If you have a large number of sed commands, you can put them into a file and use
sed -f sedscript <old >new
where sedscript could look like this:
# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
Wouldn't it be easier to do it with egrep followed by a mv, for example
egrep -v 'pattern1|pattern2|pattern3|...' <input_file.txt >tmpfile.txt
mv tmpfile.txt input_file.txt
Each pattern would describe the lines being deleted, much like in sed. You would not end up with additional files, because the mv removes them.
If you have so many patterns that you don't want to specify them directly on the command line, you can store them in a file and use the -f option of egrep.
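A minimal sketch of the pattern-file variant, assuming a hypothetical patterns.txt with one extended regular expression per line (no blank lines, since an empty pattern matches every line). It uses grep -E, which is the standard spelling of egrep, on the sample input from the question.

```shell
# Throwaway working directory; file names are assumptions for the demo.
workdir=$(mktemp -d)

# One pattern per line; a line is deleted if it matches any of them.
cat > "$workdir/patterns.txt" <<'EOF'
^how
is test$
^repeated text
EOF

cat > "$workdir/input_file.txt" <<'EOF'
how to do this task
why it is not working
this is test
Stackoverflow is a very helpful community of programmers
that is test
this is text
repeated text is common
this is repeated text of the above line
EOF

# -v keeps only the lines that match none of the patterns read via -f.
grep -v -E -f "$workdir/patterns.txt" "$workdir/input_file.txt" > "$workdir/tmpfile.txt"
mv "$workdir/tmpfile.txt" "$workdir/input_file.txt"

remaining=$(cat "$workdir/input_file.txt")
printf '%s\n' "$remaining"

rm -rf "$workdir"
```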
I'm trying to use the sed command in terminal to replace a specific line in all my text files with a certain extension by a specific string:
sed -i.bak '35s/^.*$/5\) 1\-4/' fitting_file*.feedme
So I am trying to replace line 35 in each of these files with the string "5) 1-4". When I run an ls fitting_file*.feedme | wc -l command in this directory, I get 221 files. However, when I run the above sed command, it only edits the FIRST file in the order of ls fitting_file*.feedme. I know this because grep '5) 1-4' fitting_file*.feedme continually only returns the first file on the list after I run the replacement command. I also tried replacing fitting_file*.feedme with a space-separated list of a couple of these files in my sed command as a test, but it still only operated on the one I chose to list first. Why is this happening?
sed operates on a single stream. It essentially concats all the files together and treats that as a single stream. So it replaces the 35th line of the big concatenated stream.
To see this, make a 20 line file called A and a 20 line file called B. Apply your sed command as
sed -i.bak '35s/^.*$/5\) 1\-4/' A B
and you will see the 15th line of B replaced.
I think this should answer your direct question. As for how to get done what you'd like, I assume you've already figured out that wrapping your sed command in a for loop is one way to do it. :)
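For reference, a sketch of that for loop. The two 40-line test files are made up for the demo; the point is only that sed is started once per file, so line 35 is counted within each file.

```shell
workdir=$(mktemp -d)

# Two hypothetical 40-line files standing in for the real .feedme files.
for name in fitting_file1.feedme fitting_file2.feedme; do
    awk 'BEGIN { for (i = 1; i <= 40; i++) print "line " i }' > "$workdir/$name"
done

# One sed invocation per file; each run keeps a .bak copy of its input.
for f in "$workdir"/fitting_file*.feedme; do
    sed -i.bak '35s/^.*$/5) 1-4/' "$f"
done

line35_first=$(sed -n '35p' "$workdir/fitting_file1.feedme")
line35_second=$(sed -n '35p' "$workdir/fitting_file2.feedme")
echo "$line35_first / $line35_second"

rm -rf "$workdir"
```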
Try
Create a file containing your sed instruction like this
#!/bin/bash
sed -i.bak '35s/^.*$/5\) 1\-4/' "$1"
exit 0
and call it prog.sh. Next make it executable :
chmod u+x prog.sh
now you can solve your problem using
find . -name fitting_file\*.feedme -exec ./prog.sh {} \;
You could do all this on one line but frankly the number of escapes required is a bit much. Good luck.
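For what it's worth, a one-line variant seems manageable if find calls sed directly instead of going through a wrapper script. This is only a sketch on made-up test files (including one in a hypothetical subdirectory, since find recurses); the quoting shown is the only escaping it needs.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sub"

# Three hypothetical 40-line files, one of them in a subdirectory.
for f in "$workdir/fitting_file1.feedme" \
         "$workdir/fitting_file2.feedme" \
         "$workdir/sub/fitting_file3.feedme"; do
    awk 'BEGIN { for (i = 1; i <= 40; i++) print "line " i }' > "$f"
done

# "\;" makes find start one sed per file, so 35 means line 35 of that file.
find "$workdir" -name 'fitting_file*.feedme' -exec sed -i.bak '35s/^.*$/5) 1-4/' {} \;

# Count the files that now contain the new line, and the backups sed left behind.
edited=$(find "$workdir" -name 'fitting_file*.feedme' -exec grep -l '^5) 1-4$' {} \; | grep -c '')
backups=$(find "$workdir" -name '*.bak' | grep -c '')
echo "edited: $edited, backups: $backups"

rm -rf "$workdir"
```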
To do what you're trying to do without using a shell loop is:
awk -i inplace -v inplace::suffix=.bak 'FNR==35{$0="5) 1-4"}1' fitting_file*.feedme
Note that unlike sed which can just count lines across all input files, awk has NR to track the number of records (lines by default) across all files and FNR for the same but just within the current file.
The above uses GNU awk for inplace editing just like GNU sed has -i for that. The default awk on MacOS is BSD awk, not GNU awk, but you should install GNU awk as it doesn't have all the bugs/quirks that BSD awk does and it has a ton of extremely useful extensions.
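To make the NR/FNR difference concrete, here is a small sketch with two hypothetical two-line files named A and B. It uses nothing GNU-specific, so it should behave the same in any POSIX awk.

```shell
workdir=$(mktemp -d)
printf 'a1\na2\n' > "$workdir/A"
printf 'b1\nb2\n' > "$workdir/B"

# NR keeps counting across all input files; FNR restarts at 1 for each file.
counters=$(awk '{ print $0, "NR=" NR, "FNR=" FNR }' "$workdir/A" "$workdir/B")
printf '%s\n' "$counters"
# a1 NR=1 FNR=1
# a2 NR=2 FNR=2
# b1 NR=3 FNR=1
# b2 NR=4 FNR=2

rm -rf "$workdir"
```

That is why a condition such as FNR==35 fires once in every file, while NR==35 would fire only once in total.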
If you just want to use MacOS's awk then it'd be something like:
find . -name 'fitting_file*.feedme' -exec sh -c "\
awk 'FNR==35{\$0=\"5) 1-4\"}1' \"\$1\" > \"\$1.bak\" &&
mv -- \"\$1.bak\" \"\$1\"
" sh {} \;
which is obviously getting kinda complicated - I'd probably put the awk+mv script in a file to execute from sh -c or just resort to a shell loop myself if faced with that alternative (or a similar quoting nightmare with xargs)!
I have to implement an application in shell programming (Unix/Linux).
I have to search a word from a text file and replace that word with my given word. I have a knowledge on shell and still learning.
I am not expecting source code. Can anybody help me or suggest me or give me some similar solution....
cat abc.txt | sed 's/pattern/"new pattern"/g'
The above command should work
Thanks,
Regards,
Dheeraj Rampally
Say you are looking for pattern in a file (input.txt) and want to replace it with "new pattern" in another (output.txt)
Here is the main idea, without UUOC:
<input.txt sed 's/pattern/"new pattern"/g' >output.txt
todo
Now you need to embed this line in your program. You may want to make it interactive, or a command that you could use with 3 parameters.
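One possible shape for the three-parameter version, as a sketch only. The function name replace_word is made up, the pattern is treated as a sed basic regular expression, and the helper assumes that neither argument contains "/", "&" or a backslash, since those would need escaping inside the s command.

```shell
# replace_word PATTERN REPLACEMENT FILE  (hypothetical helper)
# Prints FILE to stdout with every match of PATTERN replaced by REPLACEMENT.
replace_word() {
    if [ "$#" -ne 3 ]; then
        echo "usage: replace_word PATTERN REPLACEMENT FILE" >&2
        return 2
    fi
    # Double quotes so the shell expands $1 and $2 before sed sees the command.
    sed "s/$1/$2/g" "$3"
}

workdir=$(mktemp -d)
printf 'old text\nmore old words\nuntouched\n' > "$workdir/input.txt"

replace_word old new "$workdir/input.txt" > "$workdir/output.txt"
output=$(cat "$workdir/output.txt")
printf '%s\n' "$output"

rm -rf "$workdir"
```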
edit
I tried to avoid the use of output.txt as a temporary file with this:
<input.txt sed 's/pattern/"new pattern"/g' >input.txt
but it empties input.txt for a reason I can't understand. So I tried with a subshell, so:
echo $(<input.txt sed 's/pattern/"new pattern"/g')>input.txt
... but the echo command removes line breaks... still looking.
edit2
From https://unix.stackexchange.com/questions/11067/is-there-a-way-to-modify-a-file-in-place , it looks like writing to the very same file at once is not easy at all. However, on Linux I could do what I wanted with sed -i:
sed -i 's/pattern/"new pattern"/g' input.txt
From sed -i + what the same option in SOLARIS , it looks like there's no alternative, and you must use a temporary file:
sed 's/pattern/"new pattern"/g' input.txt > input.tmp && mv input.tmp input.txt
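A short sketch of why the direct redirection empties the file, next to the temporary-file approach. The shell opens the output file and truncates it before sed starts, so sed finds nothing left to read. File names are assumptions, and the replacement is written without the extra quotes to keep the output easy to read.

```shell
workdir=$(mktemp -d)
printf 'a pattern here\n' > "$workdir/clobbered.txt"
printf 'a pattern here\n' > "$workdir/input.txt"

# Redirecting onto the input: the file is truncated before sed reads a byte.
sed 's/pattern/new pattern/g' < "$workdir/clobbered.txt" > "$workdir/clobbered.txt"
if [ -s "$workdir/clobbered.txt" ]; then clobbered=no; else clobbered=yes; fi

# Temporary file in the same directory, renamed only if sed succeeded.
tmp=$(mktemp "$workdir/input.XXXXXX")
sed 's/pattern/new pattern/g' "$workdir/input.txt" > "$tmp" && mv "$tmp" "$workdir/input.txt"
fixed=$(cat "$workdir/input.txt")

echo "clobbered: $clobbered"
echo "fixed: $fixed"

rm -rf "$workdir"
```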
I am trying to execute a command using Vi or ex to edit a file by deleting the first five lines, replacing x with y, removing extra spaces at the end of each line but retaining the carriage returns, and removing the last eight lines of the file, then rename the file into a shell script and run the new script from the current script.
This will be something that is scheduled in cron. I have been looking for a simple way to do it using the command line or a Vim script or something.
Any ideas? The format of the input file does not change, just the amount of lines, so I can't specify the line numbers for the last eight lines.
You actually have about half a dozen questions here. Here's an answer for the first five which are probably the ones you'll have the most difficulty solving:
sed -e ':label' -n -e '1d' -e 's/x/y/g' -e 's/[ \t]*$//g' -e '1,9!{P;N;D};N;b label' file.txt > script.sh
Vi is an interactive editor. You probably don't want to use it for something that'll be run by cron. Also, I agree with the comments saying this is probably a bad idea. Be that as it may:
printf 'one\ntwo\nthree\nfour\nfive\necho x \n1\n2\n3\n4\n5\n6\n7\n8\n' \
| sed '1,5d;s/ *$//;s/x/y/' \
| tail -r | sed 1,8d | tail -r \
| sh
Our first sed script does most of the work. We reverse the lines with tail -r, then delete the first 8 lines, then reverse again. That trims off the last 8 lines.
Note that on Linux systems (or any with GNU coreutils), you have a tac command which reverses lines; tail -r is the BSD and macOS equivalent and is not available in GNU tail.
Also, the final | sh simply runs the output. If you REALLY want to save this as a script, you can do that by redirecting the output to a file ... but I'll leave at least that to your imagination. Can't do all your scripting for you, can we?! :-)
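If neither tail -r nor tac is available, a sketch that avoids reversing altogether is to let awk hold back the most recent 8 lines and only print a line once 8 newer ones have arrived. The array name held and the variable n are arbitrary choices; the input is the same test data as above, and the result is captured rather than piped to sh.

```shell
# sed does the first three edits; awk then drops the last n lines by
# printing each line only after n newer lines have been read.
result=$(
    printf 'one\ntwo\nthree\nfour\nfive\necho x \n1\n2\n3\n4\n5\n6\n7\n8\n' \
    | sed '1,5d;s/ *$//;s/x/y/' \
    | awk -v n=8 'NR > n { print held[NR % n] } { held[NR % n] = $0 }'
)
printf '%s\n' "$result"
```

With this input the only surviving line is "echo y", which is what the tail -r version hands to sh.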
To edit a file by a script, you could use ed (even if it is hard to learn or remember).
You could also use some scripting language (Python, Perl, AWK, Ruby) to achieve your goal.
Assuming an ini-style file like this,
[Group]
Icon=xxx.ico
Title=An Image Editor
Description=Manipulates .ico, .png and .jpeg images
I want to replace/delete ".ico" ONLY in the line that starts with (or matches) "Icon="
I was trying this:
oldline="`cat "$file" | grep "Icon="`"
newline="`echo "$oldline" | tr ".ico" ".png"`"
cat "$oldfile" | tr "$oldline" "$newline" > $file
Then I realized that tr works completely differently than I thought. It's NOT a traditional "replace this for that" function. So I guess the correct way is using sed. But:
I've never used sed before. No idea how it works. Is it overkill?
If the most indicated way is really using sed, given it is so powerful, is there any elegant way to accomplish this rather than this "fetch line -> modify line -> replace oldline for newline in file" approach?
Notes:
I can't replace ".ico" globally. I know that would be a LOT easier, but I must restrict the replacement to the Icon line, otherwise the Description line would be changed too.
I'm new to shell scripting in Linux, so I'm looking not only for the solution itself, but also for the "proper" way to do it. Elegant, easy to read, conventional, etc
Thanks in advance!
Edit:
Thank you guys! Here is the final script, as a reference:
#! /bin/bash
# Fix the following WARNING in ~/.xsession-errors
# gnome-session[2035]: EggSMClient-WARNING: Desktop file '/home/xxx/.config/autostart/skype.desktop' has malformed Icon key 'skype.png'(should not include extension)
file="$HOME/.config/autostart/skype.desktop"
if [ -f "$file" ] ; then
if grep "Icon=" "$file" | grep -q ".png" ; then
sed -i.bak '/^Icon=/s/\.png$//' "$file"
cp "$file" "$PWD"
cp "${file}.bak" "$PWD"
else
echo "Nothing to fix! (maybe fixed already?)"
fi
else
echo "Skype not installed (yet...)"
fi
MUCH sleeker than my original! The only thing i regret is that sed backup does not preserve original file timestamp. But i can live with that.
And, for the record, yes, ive created this script to fix an actual "bug" in Skype packaging.
Something like the following in sed should do what you need. First we check if the line starts with Icon= and if it does then we run the s command (i.e. substitute).
sed -i '/^Icon=/s/\.ico$/.png/' file
Edit: The sed script above can also be written like this:
/^Icon=/ { # Only run the following block when this matches
s/\.ico$/.png/ # Substitute '.ico' at the end of the line with '.png'
}
See this page for more details on how to restrict when commands are run.
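A quick sketch on the sample file from the question (saved under the made-up name app.desktop), comparing the address-restricted command with an unrestricted global replace to show why the /^Icon=/ address matters:

```shell
workdir=$(mktemp -d)
cat > "$workdir/app.desktop" <<'EOF'
[Group]
Icon=xxx.ico
Title=An Image Editor
Description=Manipulates .ico, .png and .jpeg images
EOF

# Restricted: the s command only runs on lines that start with "Icon=".
converted=$(sed '/^Icon=/s/\.ico$/.png/' "$workdir/app.desktop")

# Unrestricted, for comparison: this also rewrites the Description line.
unrestricted=$(sed 's/\.ico/.png/g' "$workdir/app.desktop")

printf '%s\n' "$converted"

rm -rf "$workdir"
```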
sed is pretty easy to deal with. Here's one way:
sed 's/^\(Icon=.*\)\.ico$/\1.png/'
By default, sed works on every line in the file one at a time. The 's/.../.../' will do a regular expression match on the first argument and replace it with the second argument. The \1 stands for everything that matched the first group, which is demarcated by the parentheses. You have to escape the parens with \.
The above works as part of a pipeline, but you can add an '-i' flag, like this
sed -i 's/^\(Icon=.*\)\.ico$/\1.png/' input.txt
to have it replace the file input.txt in place. Don't add that until you have tested your sed script a little.
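One way to do that testing, sketched here with a made-up three-line input.txt, is to pipe the sed output into diff and look at what would change before committing to -i:

```shell
workdir=$(mktemp -d)
printf '[Group]\nIcon=xxx.ico\nTitle=An Image Editor\n' > "$workdir/input.txt"

# Dry run: compare the original with what sed would write.
# diff exits with status 1 when the inputs differ, which is expected here.
preview=$(sed 's/^\(Icon=.*\)\.ico$/\1.png/' "$workdir/input.txt" \
    | diff "$workdir/input.txt" - || true)
printf '%s\n' "$preview"

# The file itself is still untouched at this point.
unchanged=$(sed -n '2p' "$workdir/input.txt")

rm -rf "$workdir"
```

Lines starting with "<" are the current content and lines starting with ">" are what sed would produce.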
I have a huge SQL file that gets executed on the server. The dump is from my machine and in it there are a few settings relating to my machine. So basically, I want every occurrence of "c://temp" to be replaced by "//home//some//blah"
How can this be done from the command line?
sed is a good choice for large files.
sed -i.bak -e 's%C://temp%//home//some//blah%' large_file.sql
It is a good choice because it doesn't read the whole file at once to change it. Quoting the manual:
A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.
The relevant manual section is here. A small explanation follows
-i.bak enables in place editing leaving a backup copy with .bak extension
s%foo%bar% uses s, the substitution command, which substitutes matches of the first string between the % signs, 'foo', with the second string, 'bar'. It's usually written as s// but because your strings have plenty of slashes, it's more convenient to change them for something else so you avoid having to escape them.
Example
vinko@mithril:~$ sed -i.bak -e 's%C://temp%//home//some//blah%' a.txt
vinko@mithril:~$ more a.txt
//home//some//blah
D://temp
//home//some//blah
D://temp
vinko@mithril:~$ more a.txt.bak
C://temp
D://temp
C://temp
D://temp
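One caveat, sketched below on a made-up line: without the g flag, s replaces only the first match on each line. That is enough for the sample file above, where every line holds at most one match, but a dump with two paths on the same line would need the g flag to cover every occurrence.

```shell
# A hypothetical line with two occurrences of the path.
line='C://temp/a.sql and C://temp/b.sql'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's%C://temp%//home//some//blah%')

# With g: every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed 's%C://temp%//home//some//blah%g')

printf '%s\n' "$first_only" "$every_match"
```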
Just for completeness. In place replacement using perl.
perl -i -p -e 's{c://temp}{//home//some//blah}g' mysql.dmp
No backslash escapes required either. ;)
Try sed? Something like:
sed 's/c:\/\/temp/\/\/home\/\/some\/\/blah/' mydump.sql > fixeddump.sql
Escaping all those slashes makes this look horrible though, here's a simpler example which changes foo to bar.
sed 's/foo/bar/' mydump.sql > fixeddump.sql
As others have noted, you can choose your own delimiter, which would prevent the leaning toothpick syndrome in this case:
sed 's|c://temp|//home//some//blah|' mydump.sql > fixeddump.sql
The clever thing about sed is that it operates on a stream rather than a file all at once, so you can process huge files using only a modest amount of memory.
There's also a non-standard UNIX utility, rpl, which does the exact same thing that the sed examples do; however, I'm not sure whether rpl operates streamwise, so sed may be the better option here.
The sed command can do that.
Rather than escaping the slashes, you can choose a different delimiter (_ in this case):
sed -e 's_c://temp/_//home//some//blah/_' file1.txt > file2.txt
perl -pi -e 's#c://temp#//home//some//blah#g' yourfilename
The -p will treat this script as a loop, it will read the specified file line by line running the regex search and replace.
-i This flag should be used in conjunction with the -p flag. This commands Perl to edit the file in place.
-e Just means execute this perl code.
Good luck
gawk
awk '{gsub("c://temp","//home//some//blah")}1' file