vim | remove first few lines from a 700MB file - vim

How do I quickly scrape off first few lines from a large file, without opening the whole file in main memory?
UPDATE
I do not want to pipe the starting x lines into another file and then cut the first few lines, I want to update the original file.

Not exactly vim, but to cut of the first 10 lines you could use
tail --lines=+10 somefile.txt > newfile.txt

tail -n+11 somefile.txt | vim -
To chop off the first 10* lines and open the file for edit, without creating a temporary file. Note that the file will have no name in vim when you open it this way. That's the only drawback.
* Note that although I used 11 in the command, this starts from line 11. So it will chop off the first 10 lines.

The original question was never actually answered here. I believe this is a solution:
sed -i 's/`head -n 500 foo.txt`//' foo.txt
This would eliminate the first 500 lines of a file without having to create a temporary file. (Actually, you might have to do head -n 499) I think it's actually quite useful as a one-liner for say, cleaning up log files, without just erasing the entire log.

$ seq 1 502 > foo.txt
$ sed -i 1,500d foo.txt
$ cat foo.txt
501
502

vim will always want/need to read in the whole file, so there's no way to do it using (only) vim. Darcara's suggestion looks good.
This process will always involve copying all but the first part of the file to another, so I don't see any way of doing it quickly.

Depending on what you can do with the file you may be better of using sed or awk for editing such a big file.

How about ..
split the original file into 2 parts. (p1: lines 0 - x) (p2: lines x+1 - n)
edit p1 since you want to edit the first x lines. We'll call it p1'
combine p1' and p2
In short
file -> p1 and p2
p1 -> p1'
p1' + p2 -> new_file
Commands
use split or cut
use vim or editor of your choice.
use cat to combine

Related

How to use m with the ed function in a Bash Script [duplicate]

I just need to move a line up in sed. I can select the line with
sed -i '7s///'
I need to move line 7 up 2 lines so it will be line 5.
I can't find anything on the internet to do this without complicated scripts, I can't find a simple solution of moving a specific line a specific number of times.
seq 10|sed '5{N;h;d};7G'
when up to line 5 append next line(line 6) into pattern space then save them into hold space and delete them from pattern space; up to line 7 then append the hold space content("5\n6") behind the line 7; now, pattern space is "7\n5\n6";finally,sed will print the pattern space at the end of current cycle by default(if no "-n" parameter)
ed is better at this, since it has a "move" command that does exactly what you want. To move line 7 to be the line after line 4, just do 7m4. ed doesn't write the data back by default, so you need to explicitly issue a w command to write the data:
printf '7m4\nw\n' | ed input
Although it is perhaps better to use a more modern tool:
ex -s -c 7m4 -c w -c q input

How to use sed command to delete lines without backup file?

I have large file with size of 130GB.
# ls -lrth
-rw-------. 1 root root 129G Apr 20 04:25 syslog.log
So I need to reduce file size by deleting line which starts with "Nov 2" , So I have given the following command,
sed -i '/Nov 2/d' syslog.log
So I can't edit file using VIM editor also.
When I trigger SED command , its creating backup file also. But I don't have much space in root. Please try to give alternate solution to delete particular line from this file without increasing space in server.
It does not create a real backup file. sed is a stream editor. When applied to a file with option -i it will stream that file through the sed process, write the output to a new file (a temporary one), when everything is done, it will rename the new file to the original name.
(There are options to create backup files also, but you didn't give them, so I won't mention that further.)
In your case you have a very large file and don't want to create any copy, however temporary. For this you need to open the file for reading and writing at the same time, then your sed process can overwrite the original. After this, you will have to truncate the file at the end of the writing.
To demonstrate how this can be done, we first perform a test case.
Create a test file, containing lots of lines:
seq 0 999999 > x
Now, lets say we want to remove all lines containing the digit 4:
grep -v 4 1<>x <x
This will open the file for reading and writing as STDOUT (1), and for reading as STDIN. The grep command will read all lines and will output only the lines not containing a 4 (option -v).
This will effectively overwrite the beginning of the original file.
You will not know how long the output is, so after the output the original contents of the file will appear:
…
999991
999992
999993
999995
999996
999997
999998
999999
537824
537825
537826
537827
537828
537829
…
You can use the Unix tool truncate to shorten your file manually afterwards. In a real scenario you will have trouble finding the right spot for this, so it makes sense to count the number of bytes written (using wc):
(Don't forget to recreate the original x for this test.)
(grep -v 4 <x | tee /dev/stderr 1<>x) |& wc -c
This will preform the step above and additionally print out the number of bytes written to the terminal, in this example case the output will be 3653658. Now use truncate:
truncate -s 3653658 x
Now you have the result you want.
If you want to do this in a script, i. e. without interaction, you can use this:
length=$((grep -v 4 <x | tee /dev/stderr 1<>x) |& wc -c)
truncate -s "$length" x
I cannot guarantee that this will work for files >2GB or >4GB on your machine; depending on your operating system (32bit?) and the versions of the installed tools you might run into largefile issues. I'd perform tests with large files first (>4GB as this is typically a limit for many things) and then cross your fingers and give it a try :)
Some caveats you have to keep in mind:
Of course, nobody is supposed to append log entries to that log file while the procedure is running.
Also, any abort during the running of the process (power failure, signal caught, etc.) will leave the file in an undefined state. But re-running the command again after such a mishap will in most cases produce the correct output; some lines might be doubled, but not more than a single line should be corrupted then.
The output must be smaller than the input, of course, otherwise the writing will overtake the reading, corrupting the whole result so that lines which should be there will be missing (or truncated at the start).

Linux command to replace string in HUGE file with another string

I have a huge file (8GB), I want replace on the first 30 lines the String LATIN1 with UTF-8 what is the most efficient method? Means exist there a way to use probably sed but to quit after parsed first 30 lines.
VIM was not able to save the file in 3 hours.
The problem is that in the event of a replacement, all programs will make a copy of the file with the substitution in place in order to replace the original file ultimately -- they don't want to risk losing the original for obvious reasons.
With perl, you can do this in a one-liner, but that doesn't make it any shorter (well, it probably does compared to vim, since vim preserves history in yet another file, which perl doesn't):
perl -pi -e 's,\bLATIN1\b,UTF-8,g if $. <= 30' thefile
With sed, you can quit using q:
sed -e 's/LATIN1/UTF-8/g' -e 30q
untested, but I think ed will edit the file in-place without writing to a temp file.
ed yourBigFile << END
1,30s/LATIN1/UTF-8/g
w
q
END

Resolving patch conflicts manually [duplicate]

I'm having trouble applying a patch to my source tree, and it's not the usual -p stripping problem. patch is able to find the file to patch.
Specifically, my question is how to read / interpret the .rej files patch creates when it fails on a few hunks. Most discussions of patch/diff I have seen don't include this.
A simple example:
$ echo -e "line 1\nline 2\nline 3" > a
$ sed -e 's/2/b/' <a >b
$ sed -e 's/2/c/' <a >c
$ diff a b > ab.diff
$ patch c < ab.diff
$ cat c.rej
***************
*** 2
- line 2
--- 2 -----
+ line b
As you can see: The old file contains line 2 and the new file should contain line b. However, it actually contains line c (that's not visible in the reject file).
In fact, the easiest way to resolve such problems is to take the diff fragment from the .diff/.patch file, insert it at the appropriate place in the file to be patched and then compare the code by hand to figure out, what lines actually cause the conflict.
Or - alternatively: Get the original file (unmodified), patch it and run a three way merge on the file.
Wiggle is a great tool for applying .rej files when patch does not succeed.
I'm not an expert on dealing with patch files, but I'd like to add some clarity on how to read them based on my understanding of the information they contain.
Your .rej files will tell you:
the difference between the original and the .rej file;
where the problem code starts in the original file, how many lines it goes on
for in that file;
and where the code starts in the new file, and how many lines it goes on for in that file.
So given this message, noted in the beginning of my .rej file:
diff a/www/js/app.js b/www/js/app.js (rejected hunks)
## -4,12 +4,24 ##
I see that for my problem file (www/js/app), the difference between the original (noted as a/www/js/app.js on the first line) and the .rej file (noted as b/www/js/) starts on line 4 of the original and goes on for 12 lines (the part before the comma in ## -4,12, +4,24 ## on line two), and starts on line 4 of the new version of the file and goes on for 24 lines (the part after the comma in ## -4,12, +4,24 ##.
For further information, see the excellent overview of patch files (containing the information I note above, as well as details on lines added and/or between file versions) at http://blog.humphd.org/vocamus-906/.
Any corrections or clarifications welcome of course.

How can I replace a specific line by line number in a text file?

I have a 2GB text file on my linux box that I'm trying to import into my database.
The problem I'm having is that the script that is processing this rdf file is choking on one line:
mismatched tag at line 25462599, column 2, byte 1455502679:
<link r:resource="http://www.epuron.de/"/>
<link r:resource="http://www.oekoworld.com/"/>
</Topic>
=^
I want to replace the </Topic> with </Line>. I can't do a search/replace on all lines but I do have the line number so I'm hoping theres some easy way to just replace that one line with the new text.
Any ideas/suggestions?
sed -i yourfile.xml -e '25462599s!</Topic>!</Line>!'
sed -i '25462599 s|</Topic>|</Line>|' nameoffile.txt
The tool for editing text files in Unix, is called ed (as opposed to sed, which as the name implies is a stream editor).
ed was once intended as an interactive editor, but it can also easily scripted. The way ed works, is that all commands take an address parameter. The way to address a specific line is just the line number, and the way to change the addressed line(s) is the s command, which takes the same regexp that sed would. So, to change the 42nd line, you would write something like 42s/old/new/.
Here's the entire command:
FILENAME=/path/to/whereever
LINENUMBER=25462599
ed -- "${FILENAME}" <<-HERE
${LINENUMBER}s!</Topic>!</Line>!
w
q
HERE
The advantage of this is that ed is standardized, while the -i flag to sed is a proprietary GNU extension that is not available on a lot of systems.
Use "head" to get the first 25462598 lines and use "tail" to get the remaining lines (starting at 25462601). Though... for a 2GB file this will likely take a while.
Also are you sure the problem is just with that line and not somewhere previous (ie. the error looks like an XML parse error which might mean the actual problem is someplace else).
My shell script:
#!/bin/bash
awk -v line=$1 -v new_content="$2" '{
if (NR == line) {
print new_content;
} else {
print $0;
}
}' $3
Arguments:
first: line number you want change
second: text you want instead original line contents
third: file name
This script prints output to stdout then you need to redirect. Example:
./script.sh 5 "New fifth line text!" file.txt
You can improve it, for example, by taking care that all your arguments has expected values.

Resources