What's the fastest / most efficient find/replace app on *nix - linux

I've got a large SQL dump (250MB+) and I need to replace www.mysite with dev.mysite. I have tried nano and vi for the find/replace, but both choke: nano can't even open the file, and vi has been running the find/replace for an hour now.
Anyone know of a tool on *nix or windows systems that does fast Find/Replace on large files?

sed -i 's/www\.mysite/dev.mysite/g' dump.sql
(requires temporary storage space equal to the size of the input)
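If you would rather leave the original dump untouched, the same substitution can be written to a new file instead of editing in place; this is just a sketch and assumes you have room for a second copy (the output name dump_dev.sql is made up):
sed 's/www\.mysite/dev.mysite/g' dump.sql > dump_dev.sql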

Search/replace on a SQL dump is not a good idea:
They aren't necessarily plain text files.
SQL syntax errors are easily introduced.
They sometimes contain very long lines.
What you should do is load the dump into a non-production database server, run the appropriate UPDATE statements, then dump it again. You can use the REPLACE function in MySQL for this.
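For instance, after loading the dump into a scratch server, something like the following would rewrite the hostnames and re-export the result. The database, table, and column names here (scratch_db, site_pages, page_html) are made up for the sketch; substitute whichever tables actually contain the URLs:
mysql scratch_db -e "UPDATE site_pages SET page_html = REPLACE(page_html, 'www.mysite', 'dev.mysite');"
mysqldump scratch_db > dump_dev.sql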

You need sed. Example:
sed -e "s/www\.mysite/dev.mysite/g" your_large_sql > your_large_sql.new
Alternatively, import the SQL into a database, then use REPLACE to rewrite the matched strings.

Related

^# character wreaking havoc in Windows Postgres backup file on Linux

I got some Postgres table dumps from somebody using pgAdmin3 on Windows. (Blech.) First of all, it has a whole bunch of extra crap at the top of the file that I've had to get rid of-- things like "toc.dat" without comments, etc.
I've resorted to editing them by hand to get them into a workable format for import, because as they stand they are somewhat garbled; for the most part I've succeeded, but when I open them in emacs, for example, they tend to be littered with the following character:
^#
and sometimes just a lot of:
###
I haven't figured out how to remove them using sed or awk, mainly because I have no idea what they are (I don't think they are null characters) or even how to search for them in emacs. They show up in red as 'unprintable' characters. They also don't seem to be printed to the terminal when I cat the file or when I open it in my OS X text editor, but they certainly cause errors when I try to import the file into Postgres using
psql mydatabase < table.backup
unless I edit them all out.
Anybody have any idea of a good way to get rid of these, short of editing them out by hand? I've tried in-place sed and also tried using tr, but to no effect; perhaps I'm looking for the wrong thing. (As I'm sure you are aware, trying to google for '^#' is futile!)
Just was wondering if anybody had come across this at all because it's going to eat at me unless I figure it out...
Thanks!
Those are null characters. You can remove them with:
tr -d '\000' < file1 > file2
where the -d parameter is telling tr to remove characters with the octal value 000.
I found the tr command on this forum post, so some credit goes to them.
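If you want to confirm that the offending bytes really are NULs before stripping them, dumping the start of the file with od shows them as \0 (a quick check, assuming the file is the table.backup from above):
od -c table.backup | head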
I might suggest acquiring access to a Windows machine (never thought I'd say that), loading the original dumps they gave you, and exporting in some other format to see if you can avoid the problem altogether. That seems safer to me than running any form of sed or tr on a database dump before importing it. Good luck!

Cygwin replace all instances of a character in a text file

I have a text file that has numerous instances of the character '^'. I need all those instances replaced with a '['.
I prefer to use cygwin but would use the Windows command prompt if there is a direct way to do this. My initial instinct was to use Access (no other DB installed) and its 'replace' function, but as I'm connecting using Jet, apparently this is not possible, as per Exception when trying to execute "REPLACE" against MS Access.
What's the cleanest way to achieve this?
Try this:
sed -i 's/\^/\[/g' myfile
I've tested this on the command line: sed 's/\^/[/g' myfile
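Since this is a single-character swap, tr is another option that avoids regex escaping entirely; a sketch that writes to a new file rather than editing in place:
tr '^' '[' < myfile > myfile.new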

Sed doesn't read CSV when saved in Excel

I'm trying to write a bash script that imports a CSV file and sends it off to somewhere on the web. If I use a handwritten CSV, i.e.:
summary,description
CommaTicket1,"Description, with a comma"
QuoteTicket2,"Description ""with quotes"""
CommaAndQuoteTicke3,"Description, with a commas, ""and quotes"""
DoubleCommaTicket4,"Description, with, another comma"
DoubleQuoteTicket5,"Description ""with"" double ""quoty quotes"""
the read command is able to read the file fine. However, if I create "the same file" (i.e. with the same fields) in Excel, read doesn't work as it should and usually just reads the first value and that's all.
I'm relatively new to Bash scripting, so if someone thinks it's a problem with my code, I'll upload it, but it seems to be a problem with the way Excel for Mac saves files, and I thought someone might have some thoughts on that.
Anything you guys can contribute will be much appreciated. Cheers!
By default, Excel on Mac ends each record with the carriage-return character, but bash is looking for records ending in the newline character. When saving a file in Excel for Mac, be sure to change the file format (an option available when saving the file) to DOS or Windows, or the like, which should write a carriage-return plus a newline at the end of each record and should be "readable".
Alternatively, you could just process the file with tr, and convert all the CRs to LFs, i.e.,
tr '\r' '\n' < myfile.csv > newfile.csv
One way you can verify if this actually is the problem is by using od to inspect the file. Use something like:
od -c myfile.csv
And look for the end-of-line character.
Finally, you could also investigate bash's internal IFS variable, and set it to include "\r" in it. See: http://tldp.org/LDP/abs/html/internalvariables.html
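As a rough sketch of the tr-plus-read approach: the loop below strips carriage returns on the fly and splits each record at the first comma. The field names (summary, description) just mirror the sample header above, and it makes no attempt to strip the CSV quoting:
# Convert CR line endings to LF, then read each record field by field.
tr '\r' '\n' < myfile.csv | while IFS=, read -r summary description; do
    echo "summary=$summary description=$description"
done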

text and file utility in windows

I need to do things like: taking the first x lines of text file and save it into another text file, what kind of text utilities can I use in windows?
Use a decent text editor like Notepad++ or Vim.
If you aren't afraid of using the command line, I'd suggest taking a look at Gnuwin32, which is a port of many useful *nix utilities for Windows.
It contains heavyweights such as sed, awk, grep, etc., which are more than suited for any kind of text surgery.
If you want to write a batch file that extracts the first 10 lines of myInputFile.txt to myOutputFile.txt, use
head.exe --lines=10 myInputFile.txt > myOutputFile.txt
head.exe is one of several GNU utilities for MS Windows.
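If head isn't handy, sed from the same Gnuwin32 collection can do it too; a sketch using the same file names:
sed -n "1,10p" myInputFile.txt > myOutputFile.txt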

Edit very large sql dump/text file (on linux)

I have to import a large MySQL dump (up to 10G). However, the SQL dump comes predefined with a database structure including index definitions. I want to speed up the DB insert by removing the index and table definitions.
That means I have to remove/edit the first few lines of a 10G text file. What is the most efficient way to do this on Linux?
Programs that require loading the entire file into RAM are overkill for me.
Rather than removing the first few lines, try editing them to be whitespace.
The hexedit program can do this; it reads files in chunks, so to it opening a 10GB file is no different from opening a 100KB file.
$ hexedit largefile.sql.dump
tab (switch to ASCII side)
space (repeat as needed until your header is gone)
F2 (save)/Ctrl-X (save and exit)/Ctrl-C (exit without saving)
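If you already know how many header lines have to go, a stream-based approach also works without opening an editor, at the cost of writing a new copy of the file. The line count here (40) is just an assumption for the sketch; adjust it to match your dump:
# Print everything from line 41 onward, skipping a 40-line header.
tail -n +41 largefile.sql.dump > trimmed.sql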
joe is an editor that works well with large files. I just used it to edit a ~5G SQL dump file. It took about a minute to open the file and a few minutes to save it, with very little use of swap (on a system with 4G RAM).
sed 's/OLD_TEXT/NEW_TEXT/g' < oldfile > newfile
or
cat file | sed 's/OLD_TEXT/NEW_TEXT/g' > newfile
Perl can edit the file line by line, commenting out the index definitions (the i flag makes the match case-insensitive, and -i.bak keeps a backup of the original):
perl -pi.bak -e 's/^create index/-- create index/i' largefile.sql.dump
