Linux command to replace string in LARGE file with another string

Linux command to replace string in LARGE file with another string - linux

I have a huge SQL file that gets executed on the server. The dump is from my machine and in it there are a few settings relating to my machine. So basically, I want every occurance of "c://temp" to be replace by "//home//some//blah"
How can this be done from the command line?

sed is a good choice for large files.
sed -i.bak -e 's%C://temp%//home//some//blah%' large_file.sql
It is a good choice because doesn't read the whole file at once to change it. Quoting the manual:
A stream editor is used to perform
basic text transformations on an input
stream (a file or input from a
pipeline). While in some ways similar
to an editor which permits scripted
edits (such as ed), sed works by
making only one pass over the
input(s), and is consequently more
efficient. But it is sed's ability to
filter text in a pipeline which
particularly distinguishes it from
other types of editors.
The relevant manual section is here. A small explanation follows
-i.bak enables in place editing leaving a backup copy with .bak extension
s%foo%bar% uses s, the substitution command, which
substitutes matches of first string
in between the % sign, 'foo', for the second
string, 'bar'. It's usually written as s//
but because your strings have plenty
of slashes, it's more convenient to
change them for something else so you
avoid having to escape them.
Example
vinko#mithril:~$ sed -i.bak -e 's%C://temp%//home//some//blah%' a.txt
vinko#mithril:~$ more a.txt
//home//some//blah
D://temp
//home//some//blah
D://temp
vinko#mithril:~$ more a.txt.bak
C://temp
D://temp
C://temp
D://temp

Just for completeness. In place replacement using perl.
perl -i -p -e 's{c://temp}{//home//some//blah}g' mysql.dmp
No backslash escapes required either. ;)

Try sed? Something like:
sed 's/c:\/\/temp/\/\/home\/\/some\/\/blah/' mydump.sql > fixeddump.sql
Escaping all those slashes makes this look horrible though, here's a simpler example which changes foo to bar.
sed 's/foo/bar/' mydump.sql > fixeddump.sql
As others have noted, you can choose your own delimiter, which would prevent the leaning toothpick syndrome in this case:
sed 's|c://temp\\|home//some//blah|' mydump.sql > fixeddump.sql
The clever thing about sed is that it operating on a stream rather than a file all at once, so you can process huge files using only a modest amount of memory.

There's also a non-standard UNIX utility, rpl, which does the exact same thing that the sed examples do; however, I'm not sure whether rpl operates streamwise, so sed may be the better option here.

The sed command can do that.
Rather than escaping the slashes, you can choose a different delimiter (_ in this case):
sed -e 's_c://temp/_/home//some//blah/_' file1.txt > file2.txt

perl -pi -e 's#c://temp#//home//some//blah#g' yourfilename
The -p will treat this script as a loop, it will read the specified file line by line running the regex search and replace.
-i This flag should be used in conjunction with the -p flag. This commands Perl to edit the file in place.
-e Just means execute this perl code.
Good luck

gawk
awk '{gsub("c://temp","//home//some//blah")}1' file

Related

Find line starts with and replace in linux using sed [duplicate]

This question already has answers here:
Replace whole line when match found with sed
(4 answers)
Closed 4 years ago.
How do I find line starts with and replace complete line?
File output:
xyz
abc
/dev/linux-test1/
Code:
output=/dev/sda/windows
sed 's/^/dev/linux*/$output/g' file.txt
I am getting below Error:
sed: -e expression #1, char 9: unknown option to `s'
File Output expected after replacement:
xyz
abc
/dev/sda/windows

Let's take this in small steps.
First we try changing "dev" to "other":
sed 's/dev/other/' file.txt
/other/linux-test1/
(Omitting the other lines.) So far, so good. Now "/dev/" => "/other/":
sed 's//dev///other//' file.txt
sed: 1: "s//dev///other//": bad flag in substitute command: '/'
Ah, it's confused, we're using '/' as both a command delimiter and literal text. So we use a different delimiter, like '|':
sed 's|/dev/|/other/|' file.txt
/other/linux-test1/
Good. Now we try to replace the whole line:
sed 's|^/dev/linux*|/other/|' file.txt
/other/-test1/
It didn't replace the whole line... Ah, in sed, '*' means the previous character repeated any number of times. So we precede it with '.', which means any character:
sed 's|^/dev/linux.*|/other/|' file.txt
/other/
Now to introduce the variable:
sed 's|^/dev/linux.*|$output|' file.txt
$output
The shell didn't expand the variable, because of the single quotes. We change to double quotes:
sed "s|^/dev/linux.*|$output|" file.txt
/dev/sda/windows

This might work for you (GNU sed):
output="/dev/sda/windows"; sed -i '\#/dev/linux.*/#c'"$output" file
Set the shell variable and change the line addressed by /dev/linux.*/ to it.
N.B. The shell variable needs to interpolated hence the ; i.e. the variable may be set on a line on its own. Also the the delimiter for the sed address must be changed so as not to interfere with the address, hence \#...#, and finally the shell variable should be enclosed in double quotes to allow full interpolation.

I'd recommend not doing it this way. Here's why.
Sed is not a programming language. It's a stream editor with some constructs that look and behave like a language, but it offers very little in the way of arbitrary string manipulation, format control, etc.
Sed only takes data from a file or stdin (also a file). Embedding strings within your sed script is asking for errors -- constructs like s/re/$output/ are destined to fail at some point, almost regardless of what workarounds you build into your sed script. The best solutions for making sed commands like this work is to do your input sanitization OUTSIDE of sed.
Which brings me to ... this may be the wrong tool for this job, or might be only one component of the toolset for the job.
The error you're getting is obviously because the sed command you're using is horribly busted. The substitute command is:
s/pattern/replacement/flags
but the command you're running is:
s/^/dev/linux*/$output/g
The pattern you're searching for is ^, the null at the beginning of the line. Your replacement pattern is dev, then you have a bunch of text that might be interpreted as flags. This plainly doesn't work, when your search string contains the same character that you're using as a delimiter to the options for the substitute command.
In regular expressions and in sed, you can escape things. You while you might get some traction with s/^\/dev\/linux.*/$output/, you'd still run into difficulty if $output contained slashes. If you're feeding this script to sed from bash, you could use ${output//\//\\\/}, but you can't handle those escapes within sed itself. Sed has no variables.
In a proper programming language, you'd have better separation of variable content and the commands used for the substitution.
output="/dev/sda/windows"
awk -v output="$output" '$1~/\/dev\/linux/ { $0=output } 1' file.txt
Note that I've used $1 here because in your question, your input lines (and output) appear to have a space at the beginning of each line. Awk automatically trims leading and trailing space when assigning field (positional) variables.
Or you could even do this in pure bash, using no external tools:
output="/dev/sda/windows"
while read -r line; do
[[ "$line" =~ ^/dev/linux ]] && line="$output"
printf '%s\n' "$line"
done < file.txt
This one isn't resilient in the face of leading whitespace. Salt to taste.
So .. yes, you can do this with sed. But the way commands get put together in sed makes something like this risky, and despite the available workarounds like switching your substitution command delimiter to another character, you'd almost certainly be better off using other tools.

SED or other editor - remove strings from file on Windows

I need to find a string in a textfile, delete the line containing it, and save the file. The string is found (read from) another textfile, containing hundreds of different strings, one per row. The process is to go on from the first to the last string in the file.
Any (hopefully easy to use) text editors (on Windows OS) recommended ? To achive the task.
I am not into serious day-to-day editing. So I'd be ever so happy if the task could be accomplished with a easy-to-use but still reliable editor.
Thanks a bunch,
Frank

You can try notepad++ since it has a lot of plugins, also a great search algorithm. I did a similar task where I had to do a lot of search/replace stuff, and used a plugin I dug up from the internet, can't remember the name exactly (try google-ing I think it's replaacc for notepad++ or something similar).

On unix/linux/cygwin:
grep -v -f pattern_file unmodified_file > new_file
Remove all lines containing the patterns in pattern_file from unmodified_file, write to new_file.
grep -v outputs lines not matching any pattern. -f reads patterns from a file.
On windows this appears equivalent to running this at the command prompt:
FINDSTR /V /G:pattern_file unmodified_file > new_file
That's it. If you already have the two source files, it's a one-liner.
pattern_file is going to be whitespace and case sensitive unless you delve into other options, which are described with FINDSTR /?

Using sed:
sed -n '/PATTERN/n;p' FILE > FILE.new # then copy FILE.new to FILE
Tells sed to not output anything by default (-n), find the pattern (/PATTERN/) and skip this line if found (n), otherwise print the line (;p). If you have GNU sed you do can just call
sed -i -n '/PATTERN/n;p' FILE`
which automatically updates the file due to (-i /--in-place).

sed help: matching and replacing a literal "\n" (not the newline)

i have a file which contains several instances of \n.
i would like to replace them with actual newlines, but sed doesn't recognize the \n.
i tried
sed -r -e 's/\n/\n/'
sed -r -e 's/\\n/\n/'
sed -r -e 's/[\n]/\n/'
and many other ways of escaping it.
is sed able to recognize a literal \n? if so, how?
is there another program that can read the file interpreting the \n's as real newlines?

Can you please try this
sed -i 's/\\n/\n/g' input_filename

What exactly works depends on your sed implementation. This is poorly specified in POSIX so you see all kinds of behaviors.
The -r option is also not part of the POSIX standard; but your script doesn't use any of the -r features, so let's just take it out. (For what it's worth, it changes the regex dialect supported in the match expression from POSIX "basic" to "extended" regular expressions; some sed variants have an -E option which does the same thing. In brief, things like capturing parentheses and repeating braces are "extended" features.)
On BSD platforms (including MacOS), you will generally want to backslash the literal newline, like this:
sed 's/\\n/\
/g' file
On some other systems, like Linux (also depending on the precise sed version installed -- some distros use GNU sed, others favor something more traditional, still others let you choose) you might be able to use a literal \n in the replacement string to represent an actual newline character; but again, this is nonstandard and thus not portable.
If you need a properly portable solution, probably go with Awk or (gasp) Perl.
perl -pe 's/\\n/\n/g' file
In case you don't have access to the manuals, the /g flag says to replace every occurrence on a line; the default behavior of the s/// command is to only replace the first match on every line.

awk seems to handle this fine:
echo "test \n more data" | awk '{sub(/\\n/,"**")}1'
test ** more data
Here you need to escape the \ using \\

$ echo "\n" | sed -e 's/[\\][n]/hello/'

sed works one line at a time, so no \n on 1 line only (it's removed by sed at read time into buffer). You should use N, n or H,h to fill the buffer with more than one line, and then \n appears inside. Be careful, ^ and $ are no more end of line but end of string/buffer because of the \n inside.
\n is recognized in the search pattern, not in the replace pattern. Two ways for using it (sample):
sed s/\(\n\)bla/\1blabla\1/
sed s/\nbla/\
blabla\
/
The first uses a \n already inside as back reference (shorter code in replace pattern);
the second use a real newline.
So basically
sed "N
$ s/\(\n\)/\1/g
"
works (but is a bit useless). I imagine that s/\(\n\)\n/\1/g is more like what you want.

Linux Prompt Change Content Within File based on File Name

I know how to do a search and replace amongst group of files:
perl -pi -w -e 's/search/replace/g;' *.php
So I can use that to search for a keyword or phrase and change it. But I have a more complicated task I dont know how to do.
I want to do a search and replace among all my php files to search for a specific Keyword and replace it with the File Name minus the extension.
Example: Search the file Mountains.php for the keyword Trees and everywhere you see Trees, replace it with Mountains
Of course I want to be able to do that in batch, for a few hundred php files all with different names, however, all containing the search term Trees.
If someone is looking for an extra challenge, haha, it would be even better if I could do a more complex scenario such as....
Example: Search the file MountainTowns.php for the keyword Trees and everywhere you see Trees, replace it with "Mountain Towns" (note the extra space, Capital Letters could would indicate where spaces go)
Thanks for your time and considering my question.

Well, the filename is in $ARGV, so there is not much more work needed.
perl -i -pe '($x=$ARGV)=~s{.php$}{};s{Trees}{$x}g' BlueMountains.php RedMountains.php
Add in
$x=~s{(.)([A-Z])}{$1 $2}g;
to add the space before upcased letters, for a complete line of
perl -i -pe '($x=$ARGV)=~s{.php$}{};$x=~s{(.)([A-Z])}{$1 $2}g;s{Trees}{$x}g' BlueRedMountains.php

This might work for you:
printf "%s\n" *.php |perl -pwe 's|(.*).php|perl -pi -we "s/Trees/$1/g;" $&|' | bash
This uses perl to write a script to do you bidding.
Other little languages could be employed, like awk or:
printf "%s\n" *.php |sed 'h;s/\.php//;s/\B[A-Z]/ &/;G;s|\(.*\)\n\(.*\)|sed -i "s/Trees/\1/g" \2|' | bash
This uses sed to provide a solution for the second request.

You want a separate replacement for each file, so run a separate search and replace for each:
for file in *.php; do sed -i "s/foo/${file%.*}/g" "$file"; done
And your second request is a bit harder, it at least requires a subshell.
for file in *; do sed -i "s/bar/$(echo ${file%.*} | sed 's/\(.\)\([A-Z]\)/\1 \2/')/g" "$file"; done
It's a bit more readable if you put it in a script:
#!/bin/bash
for file in "$#"; do
replacement=$(echo ${file%.*} | sed 's/\(.\)\([A-Z]\)/\1 \2/')
sed -i "s/bar/$replacement/g" "$file";
done
This will work over all the arguments passed it, so call with ./script.sh *.php.

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have a an entry of this kind
a,b,c,d
d,e,f,g
Now due to some error entries in this file have become of this kind
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Any command that you recommend that can replaces the erroneous lines in all the files.

It depends on what you mean by replace. If you mean 'remove', then a trivial variant on #wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.

sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv

Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv

What about trying to keep only lines which are matching the desired format instead of handling one exception ?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.

Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).

Most simply:
$ grep -v ,,,, oldfile > newfile
$ mv newfile oldfile

yes, awk or grep are very good option if you are working in linux platform. However you can use perl regex for other platform. using join & split options.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string