Linux command to replace string in HUGE file with another string - vim

I have a huge file (8GB) and want to replace the string LATIN1 with UTF-8 on the first 30 lines only. What is the most efficient method? That is, is there a way to use something like sed but have it quit after it has parsed the first 30 lines?
VIM was not able to save the file in 3 hours.

The problem is that in the event of a replacement, all programs will make a copy of the file with the substitution in place in order to replace the original file ultimately -- they don't want to risk losing the original for obvious reasons.
With perl, you can do this in a one-liner, but that doesn't make it take any less time (well, it probably does compared to vim, since vim preserves history in yet another file, which perl doesn't):
perl -pi -e 's,\bLATIN1\b,UTF-8,g if $. <= 30' thefile

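With sed, you can quit using q. Note that sed then prints only the first 30 lines, so the remainder of the file still has to be appended to the output separately:
sed -e 's/LATIN1/UTF-8/g' -e 30q thefile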
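As an alternative to quitting early, the substitution can be restricted to an address range so that sed still copies the remaining lines through unchanged. The following is only a sketch on a small throwaway file; the variable `demo` and the sample contents are made up for illustration, and sed still reads and writes the whole file, so this does not avoid the copy.

```shell
# Build a 40-line sample; "demo" is a hypothetical stand-in for the 8GB file.
demo=$(mktemp)
for i in $(seq 1 40); do echo "line $i encoding LATIN1"; done > "$demo"

# Substitute only on lines 1-30; lines 31 and later pass through untouched.
sed -e '1,30s/LATIN1/UTF-8/g' "$demo" > "$demo.new"

grep -c 'UTF-8' "$demo.new"    # number of rewritten lines
grep -c 'LATIN1' "$demo.new"   # number of lines left alone
wc -l < "$demo.new"            # total line count is preserved
```

On this sample the three counts should come out as 30, 10 and 40.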

Untested, but I think ed will edit the file in-place without writing to a temp file (it still has to read the whole file into its buffer first).
ed yourBigFile << END
1,30s/LATIN1/UTF-8/g
w
q
END


How to fzf recent files of vim/nvim, not inside vim but from terminal

I know how to use fzf.vim, but I'd like to open recent files from the terminal.
Grepping the history or viminfo might achieve that, but I wonder if there is a smarter way.
This is how you can save the list of recent files from vim to a file:
vim -c "call append(0, v:oldfiles)" -c "write vim-oldfiles.tmp" -c exit
Put v:oldfiles (the list of recent files saved in ~/.viminfo) into the first (new and empty at the start) buffer, write the buffer to a file, exit.
Now you can pass the content of file to fzf.
Not an exact solution, but you could open a terminal buffer in the lower part of your vim window, like in an IDE, and use fzf from that terminal.
However, I'm not sure whether this will let you open a file in a new vim tab.
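I have a zsh autoloaded function called old:
function old(){
    local fname
    # Dump the numbered :oldfiles listing to a temporary file
    vim -c 'redir >> /tmp/oldfiles.txt | silent oldfiles | redir end | q'
    # Drop NvimTree buffers from the listing
    sed -i '/NvimTree$/d' /tmp/oldfiles.txt
    # Keep paths under /home, skip man pages, print the path column, pick one with fzf
    fname=$(awk '/home/ && !/man:/ {print $2}' /tmp/oldfiles.txt | fzf)
    # Remove the temporary file even if fzf was cancelled (\rm bypasses any rm alias)
    \rm /tmp/oldfiles.txt
    [[ -n $fname ]] && vim "$fname"
}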
If you're having trouble executing vim on files that have ~ in their path (vim opens a new blank file instead of the desired file) because fzf and vim don't expand the tilde (~), here's how I do it:
export FZF_DEFAULT_OPTS=$FZF_DEFAULT_OPTS"
--bind 'ctrl-e:execute(vim -c \"execute \\\"edit\\\" expand({})\" >/dev/tty)'
"
It's trial and error, based on this.
Combining some of the other answers, here's a version that does not need a temporary file and writes to stdout (so you can pipe this into another command, or capture the output using $(...)).
vim -e -c "redir >> /dev/fd/100 | for f in v:oldfiles | silent echo substitute(f, \"^\\\\~\", \$HOME, \"g\") | endfor | redir end | q" 100>&1 &>/dev/null
This solution combines elements from other solutions, but with some improvements:
It uses some shell redirection to duplicate stdout to some free fd (100>&1) and then uses /dev/fd/100 to force writing output there. This ensures that vim actually writes to stdout rather than the terminal. Note that this can also be made to work using /dev/fd/1 (but only when omitting redir end for some reason), but then we cannot apply the next point.
It redirects stdout (and, for good measure, stderr) to /dev/null to prevent vim from writing terminal escape codes to stdout on startup; using a different fd for the real output keeps it clean.
It uses vim in "ex" mode (vim -e) to suppress the "Vim: Warning: Output is not to a terminal" output and accompanying delay.
It uses a for-loop to iterate over v:oldfiles to output just the filenames (the oldfiles command used by https://stackoverflow.com/a/70749181/740048 adds line numbers).
It uses a substitute to expand ~ in the filenames returned by vim, making the returned filenames easier to process. Normally, shells like bash expand ~ in arguments passed to commands, but this happens only for tildes in the command as typed, not for tildes that result from variables or command substitution. To avoid having to rely on unsafe eval'ing later, it is better to expand (just) the tildes beforehand.
I also tried using the append / write combo from https://stackoverflow.com/a/60018642/740048, which worked with the /dev/fd/100 trick, but then ended up putting /dev/fd/100 in the list of oldfiles, so I did not use that approach.
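The point about tildes can be seen in a few lines of shell. This is a minimal sketch, assuming a path that arrives in a variable; the file name `~/notes.txt` and the variable names are invented for the example, and the `case` statement is just one way to expand a leading `~/` without eval.

```shell
# A tilde that arrives via a variable is NOT expanded by the shell.
f='~/notes.txt'          # hypothetical entry, spelled the way vim reports it
ls "$f" 2>/dev/null || echo "not found: $f"

# Expand a leading "~/" by hand instead of resorting to eval.
case $f in
    "~/"*) expanded="$HOME/${f#"~/"}" ;;
    "~")   expanded=$HOME ;;
    *)     expanded=$f ;;
esac
echo "$expanded"
```

Only a leading tilde is handled here; forms such as `~user/` are left as they are.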

Understanding sed

I am trying to understand how
sed 's/\^\[/\o33/g;s/\[1G\[/\[27G\[/' /var/log/boot
worked and what the pieces mean. The man page I read just confused me more and I tried the info sai Id but had no idea how to work it! I'm pretty new to Linux. Debian is my first distro but seemed like a rather logical place to start as it is a root of many others and has been around a while so probably is doing stuff well and fairly standardized. I am running Wheezy 64 bit as fyi if needed.
The sed command is a stream editor, reading its file (or STDIN) for input, applying commands to the input, and presenting the results (if any) to the output (STDOUT).
The general syntax for sed is
sed [OPTIONS] COMMAND FILE
In the shell command you gave:
sed 's/\^\[/\o33/g;s/\[1G\[/\[27G\[/' /var/log/boot
the sed command is 's/\^\[/\o33/g;s/\[1G\[/\[27G\[/' and /var/log/boot is the file.
The given sed command is actually two separate commands:
1. s/\^\[/\o33/g
2. s/\[1G\[/\[27G\[/
The intent of #1, the s (substitute) command, is to replace all occurrences of the two-character sequence '^[' (a caret followed by an opening bracket, the usual printable rendering of ESC) with the actual ESC character, octal 033. The \o33 in the replacement is GNU sed's escape for a character given by its octal value (\oNNN), so the command is correct as written for GNU sed, which is what Debian ships. The \033 spelling known from printf and bash's $'...' quoting is not a sed octal escape, so it should not be substituted here.
Notice the trailing g after the replacement string? It means to perform a global replacement; without it, only the first occurrence on each line would be changed.
The purpose of #2 is to replace the literal string [1G[ with [27G[; the backslashes are there only to stop sed from treating [ as the start of a bracket expression. Because this command has no trailing g, it changes only the first occurrence on each line. If every occurrence should be changed, it needs to be written like this:
s/\[1G\[/\[27G\[/g
Finally, putting all this together, the two sed commands are applied to each line of the /var/log/boot file, and the output has every ^[ converted into the ESC character and the string [1G[ converted to [27G[.
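To see both substitutions at work without touching /var/log/boot, the script can be run on a single made-up line. This sketch assumes GNU sed (for the \oNNN escape); the sample text is invented, and `cat -v` is used only to make the ESC character visible.

```shell
# Sample line in which ESC is spelled as the two characters "^[" (hypothetical input).
sample='^[[1G[  ok  ] Starting service'

# Same script as in the question; \o33 is GNU sed's octal escape for ESC.
out=$(printf '%s\n' "$sample" | sed 's/\^\[/\o33/g;s/\[1G\[/\[27G\[/')

# Print the result with non-printing characters rendered visibly.
printf '%s\n' "$out" | cat -v
```

After the first command the line starts with a real ESC byte, and the second command turns the following [1G[ into [27G[.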

Linux replace ^M$ with $ in csv

I have received a csv file from a ftp server which I am ingesting into a table.
While ingesting the file I am receiving the error "File was a truncated file"
The actual reason is that the lines in the file end with $ and ^M$.
e.g.:
ACT_RUN_TM, PROG_RUN_TM, US_HE_DT^M$
"CONFIRMED","","3600"$
How can I remove these $ and ^M$ from the end of each line using a Linux command?
The ultimately correct solution is to transfer the file from the FTP server in text mode rather than binary mode, which does the appropriate end-of-line conversion for you. Change your download scripts or FTP application configuration to enable text transfers to fix this in future.
Assuming this is a one-shot transfer and you have already downloaded the file and just want to fix it, you can use tr(1) to translate characters. So to remove all control-M characters from a file, you can pipe through tr -d '\r'. Or if you want to replace them with control-J instead – for example you would do this if the file came from a pre-OSX Mac system — do tr '\r' '\n'.
It's odd to see ^M as not-the-last character, but:
sed -e 's/^M*\$$//g' <badfile >goodfile
Or use "sed -i" to update in-place.
(Note that "^M" is entered on the command line by pressing CTRL-V CTRL-M.)
Update: It's been established that the question is wrong as the "^M$" are not in the file but displayed with VI. He actually wants to change CRLF pairs to just LF.
sed -e 's/^M$//g' <badfile >goodfile
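For the CRLF-to-LF case, the tr approach mentioned above avoids having to type a literal control-M. A small sketch on a throwaway file follows; the variable `bad` and the sample rows are made up for the demonstration.

```shell
# Create a two-line sample with CRLF line endings ("bad" is a hypothetical name).
bad=$(mktemp)
printf 'ACT_RUN_TM,PROG_RUN_TM\r\n"CONFIRMED","3600"\r\n' > "$bad"

# Delete every carriage return; the line feeds stay in place.
tr -d '\r' < "$bad" > "$bad.fixed"

# Inspect the bytes: no \r should be left before each \n.
od -c "$bad.fixed"
```

tr deletes every carriage return, not only those at the end of a line, which is normally what is wanted for a file that was transferred in binary mode.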

vim | remove first few lines from a 700MB file

How do I quickly scrape off first few lines from a large file, without opening the whole file in main memory?
UPDATE
I do not want to pipe the starting x lines into another file and then cut the first few lines, I want to update the original file.
Not exactly vim, but to cut off the first 10 lines you could use
tail --lines=+11 somefile.txt > newfile.txt
tail -n+11 somefile.txt | vim -
This chops off the first 10* lines and opens the result for editing, without creating a temporary file. Note that the file will have no name in vim when you open it this way. That's the only drawback.
* Note that although I used 11 in the command, this starts from line 11. So it will chop off the first 10 lines.
The original question was never actually answered here. I believed this was a solution:
sed -i 's/`head -n 500 foo.txt`//' foo.txt
It does not work as written: the backticks sit inside single quotes, so the shell never runs head, and sed's s command matches within one line at a time, so it could not match 500 lines of text anyway. Deleting by line address does what was intended, which is quite useful as a one-liner for, say, cleaning up log files without erasing the entire log (sed -i still writes a temporary copy behind the scenes):
$ seq 1 502 > foo.txt
$ sed -i 1,500d foo.txt
$ cat foo.txt
501
502
vim will always want/need to read in the whole file, so there's no way to do it using (only) vim. Darcara's suggestion looks good.
This process will always involve copying all but the first part of the file to another, so I don't see any way of doing it quickly.
Depending on what you can do with the file, you may be better off using sed or awk for editing such a big file.
How about ..
split the original file into 2 parts. (p1: lines 0 - x) (p2: lines x+1 - n)
edit p1 since you want to edit the first x lines. We'll call it p1'
combine p1' and p2
In short
file -> p1 and p2
p1 -> p1'
p1' + p2 -> new_file
Commands
use split or cut
use vim or editor of your choice.
use cat to combine
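The split, edit and combine steps above could be sketched as follows, using head and tail for the split. The file names, the value of `x`, and the sed call standing in for the manual edit are all assumptions made for the demonstration.

```shell
# Sample file with 20 numbered lines ("orig" is a hypothetical stand-in).
orig=$(mktemp)
seq 1 20 > "$orig"
x=5                                            # number of leading lines to edit

head -n "$x" "$orig" > "$orig.p1"              # p1: lines 1..x
tail -n "+$((x + 1))" "$orig" > "$orig.p2"     # p2: lines x+1..n

# Stand-in for editing p1 in vim: drop its first 3 lines to get p1'.
sed '1,3d' "$orig.p1" > "$orig.p1.edited"

# p1' + p2 -> new_file
cat "$orig.p1.edited" "$orig.p2" > "$orig.new"
head -n 3 "$orig.new"
```

The second part is still copied in full twice (once by tail, once by cat), so this is convenient rather than fast.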

How can I replace a specific line by line number in a text file?

I have a 2GB text file on my linux box that I'm trying to import into my database.
The problem I'm having is that the script that is processing this rdf file is choking on one line:
mismatched tag at line 25462599, column 2, byte 1455502679:
<link r:resource="http://www.epuron.de/"/>
<link r:resource="http://www.oekoworld.com/"/>
</Topic>
=^
I want to replace the </Topic> with </Line>. I can't do a search/replace on all lines, but I do have the line number, so I'm hoping there's some easy way to just replace that one line with the new text.
Any ideas/suggestions?
sed -i yourfile.xml -e '25462599s!</Topic>!</Line>!'
sed -i '25462599 s|</Topic>|</Line>|' nameoffile.txt
The tool for editing text files in Unix is called ed (as opposed to sed, which, as the name implies, is a stream editor).
ed was once intended as an interactive editor, but it can also easily be scripted. The way ed works is that all commands take an address parameter. The way to address a specific line is just the line number, and the way to change the addressed line(s) is the s command, which takes the same regexp that sed would. So, to change the 42nd line, you would write something like 42s/old/new/.
Here's the entire command:
FILENAME=/path/to/whereever
LINENUMBER=25462599
ed -- "${FILENAME}" <<-HERE
${LINENUMBER}s!</Topic>!</Line>!
w
q
HERE
The advantage of this is that ed is standardized by POSIX, while the -i flag to sed is a non-standard extension: GNU and BSD sed implement it with slightly different syntax, and some other systems do not have it at all.
Use "head" to get the first 25462598 lines and use "tail" to get the remaining lines (starting at 25462600). Though... for a 2GB file this will likely take a while.
Also are you sure the problem is just with that line and not somewhere previous (ie. the error looks like an XML parse error which might mean the actual problem is someplace else).
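The head/tail approach might look like the sketch below, shown on a four-line sample instead of the 2GB file. The variables `f` and `n` and the sample contents are invented for illustration; for the real file, `n` would be 25462599.

```shell
# Small stand-in for the RDF file; line 3 is the one to replace.
f=$(mktemp)
printf '<a/>\n<b/>\n</Topic>\n<c/>\n' > "$f"
n=3

{
    head -n "$((n - 1))" "$f"     # everything before line n
    echo '</Line>'                # the replacement line
    tail -n "+$((n + 1))" "$f"    # everything after line n
} > "$f.new"

cat "$f.new"
```

Once the result has been checked, the new file can be moved over the original.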
My shell script:
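#!/bin/bash
# Usage: script.sh LINE_NUMBER NEW_CONTENT FILE
# Prints FILE to stdout with line LINE_NUMBER replaced by NEW_CONTENT.
awk -v line="$1" -v new_content="$2" '{
    if (NR == line) {
        print new_content;
    } else {
        print $0;
    }
}' "$3"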
Arguments:
first: line number you want change
second: text you want instead original line contents
third: file name
This script prints its output to stdout, so you need to redirect it to a new file. Example:
./script.sh 5 "New fifth line text!" file.txt
You can improve it, for example, by checking that all your arguments have the expected values.
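Since the output goes to stdout, updating the file means writing to a temporary file and then moving it into place. Here is a sketch of that pattern with the awk logic inlined; the file contents and the name `f` are made up, and the condensed awk program is an equivalent rewrite rather than the script above verbatim.

```shell
# Three-line sample file ("f" is a hypothetical name).
f=$(mktemp)
printf 'one\ntwo\nthree\n' > "$f"

# Replace line 2, write to a temporary file, and only then overwrite the original.
awk -v line=2 -v new_content='TWO' \
    'NR == line { print new_content; next } { print }' "$f" > "$f.tmp" \
    && mv "$f.tmp" "$f"

cat "$f"
```

Redirecting straight back into the input file would truncate it before awk reads it, which is why the temporary file is needed.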
