I have a 300 GB text file that contains genomics data with over 250k records. There are some records with bad data and our genomics program 'Popoolation' allows us to comment out the "bad" records with an asterisk. Our problem is that we cannot find a text editor that will load the data so that we can comment out the bad records. Any suggestions? We have both Windows and Linux boxes.
UPDATE: More information
The program Popoolation (https://code.google.com/p/popoolation/) crashes when it reaches a "bad" record, giving us the line number that we can then comment out. Specifically, we get a message from Perl that says "F#€%& Scaffolding". The manual suggests we can just use an asterisk to comment out the bad line. Sadly, we will have to repeat this process many times...
One more thought... Is there an approach that would allow us to add the asterisk to the line without opening the entire text file at once? This could be very useful given that we will have to repeat the process an unknown number of times.
Based on your update:
One more thought... Is there an approach that would allow us to add
the asterisk to the line without opening the entire text file at once?
This could be very useful given that we will have to repeat the
process an unknown number of times.
Here is one approach: if you know the line number, you can add an asterisk at the beginning of that line with:
sed 'LINE_NUMBER s/^/*/' file
See an example:
$ cat file
aa
bb
cc
dd
ee
$ sed '3 s/^/*/' file
aa
bb
*cc
dd
ee
If you add -i, the file will be updated in place:
$ sed -i '3 s/^/*/' file
$ cat file
aa
bb
*cc
dd
ee
Even so, I always think it's better to redirect the output to another file
sed '3 s/^/*/' file > new_file
so that you keep your original file intact and save the updated version in new_file.
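If you do want the in-place edit but with a safety net, GNU sed also accepts a backup suffix after -i, keeping the untouched original alongside the edited file:
sed -i.bak '3 s/^/*/' file
This rewrites file and leaves the original copy as file.bak.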
If you are required to have a person mark these records manually with a text editor, for whatever reason, you should probably use split to break the file into manageable pieces.
split -a4 -d -l100000 hugefile.txt part.
This will split the file up into pieces with 100000 lines each. The names of the files will be part.0000, part.0001, etc. Then, after all the files have been edited, you can combine them back together with cat:
cat part.* > new_hugefile.txt
The simplest solution is to use a stream-oriented editor such as sed. All you need is to be able to write one or more regular expression(s) that will identify all (and only) the bad records. Since you haven't provided any details on how to identify the bad records, this is the only possible answer.
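For example, if the bad records shared a recognizable pattern, one sed pass could comment them all out. The pattern below is purely hypothetical; substitute whatever actually identifies your bad records:
sed '/hypothetical_bad_pattern/ s/^/*/' hugefile.txt > fixed_file.txt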
A basic pattern in R is to read the data in chunks, edit, and write out
fin = file("fin.txt", "r")
fout = file("fout.txt", "w")
while (length(txt <- readLines(fin, n=1000000))) {
    ## txt now holds up to 1000000 lines; add an asterisk to problem lines
    ## bad = <create logical vector indicating bad lines here>
    ## txt[bad] = paste0("*", txt[bad])
    writeLines(txt, fout)
}
close(fin); close(fout)
While not ideal, this works on Windows (implied by the mention of Notepad++) and in a language that you are presumably familiar with (R). Using sed (definitely the appropriate tool in the long run) would require installing additional software and coming up to speed with sed.
I know there are ways to automatically set the width of text in vim using set textwidth (like Vim 80 column layout concerns). What I am looking for is something similar to = (the indent line command) but to wrap to 80. The use case is sometimes you edit text with textwidth and after joining lines or deleting/adding text it comes out poorly wrapped.
Ideally, this command would completely reorganize the lines I select and chop off long lines while adding to short ones. An example:
long line is long!
short
After running the command (assuming the wrap was 13 cols):
long line is
long! short
If this isn't possible with a true vim command, perhaps there is a command-line program which does this that I can pipe the input to?
After searching I found this reference which has some more options: http://www.cs.swarthmore.edu/help/vim/reformatting.html
Set textwidth to 80 (:set textwidth=80), move to the start of the file (can be done with Ctrl-Home or gg), and type gqG.
gqG formats the text starting from the current position and to the end of the file. It will automatically join consecutive lines when possible. You can place a blank line between two lines if you don't want those two to be joined together.
Michael's solution is the key, but I most often find I want to reformat the rest of the current paragraph; for this behavior, use gq}.
You can use gq with any movement operators. For example, if you only want to reformat to the end of the current line (i.e. to wrap the line that your cursor is on) you can use gq$
You can also reformat by selecting text in visual mode (using v and moving) and then typing gq.
There are other options for forcing lines to wrap too.
If you want vim to wrap your lines while you're inserting text in them instead of having to wait till the end to restructure the text, you will find these options useful:
:set textwidth=80
:set wrapmargin=2
(Don't get side-tracked by wrap and linebreak, which only reformat the text displayed on screen, and don't change the text in the buffer)
Thanks to a comment from DonaldSmith I found this, as the textwidth option didn't reformat my long line of text (I was playing with hex-to-byte conversions):
:%!fold -w 60
That reformatted the whole file (which was one line for me) into lines of length 60.
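One caveat: by default fold breaks lines in the middle of words; adding the -s flag makes it break at spaces instead:
:%!fold -s -w 60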
If you're looking for a non-Vim way, there's always the UNIX commands fmt and par.
Notes:
I can't comment on Unicode; it may or may not behave differently.
@nelstrom has already mentioned using par in his webcast.
Here's how we would use both for your example.
$ echo -e 'long line is long!\nshort' > 3033423.txt
$ cat 3033423.txt
long line is long!
short
$ fmt -w 13 3033423.txt
long line is
long! short
$ par 13gr 3033423.txt
long line is
long! short
To use from inside Vim:
:%! fmt -w 13
:%! par 13gr
You can also set :formatprg to par or fmt and override gq. For more info, call :help formatprg inside Vim.
Almost always I use gq in visual mode. I tell my students it stands for "Gentlemen's Quarterly," a magazine for fastidious people.
I need to read through some gigantic log files on a Linux system. There's a lot of clutter in the logs. At the moment I'm doing something like this:
cat logfile.txt | grep -v "IgnoreThis\|IgnoreThat" | less
But it's cumbersome -- every time I want to add another filter, I need to quit less and edit the command line. Some of the filters are relatively complicated and may be multi-line.
I'd like some way to apply filters as I am reading through the log, and a way to save these filters somewhere.
Is there a tool that can do this for me? I can't install new software so hopefully it's something that would already be installed -- e.g., less, vi, something in a Python or Perl lib, etc.
Changing the code that generates the log to generate less is not an option.
Use the &pattern command within less.
From the man page for less
&pattern
Display only lines which match the pattern; lines which do not
match the pattern are not displayed. If pattern is empty (if
you type & immediately followed by ENTER), any filtering is
turned off, and all lines are displayed. While filtering is in
effect, an ampersand is displayed at the beginning of the
prompt, as a reminder that some lines in the file may be hidden.
Certain characters are special as in the / command:
^N or !
Display only lines which do NOT match the pattern.
^R Don't interpret regular expression metacharacters; that
is, do a simple textual comparison.
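Applied to the example from the question, you can open the log directly with less logfile.txt and then type something like the following at the & prompt to hide the noise (treat this as a sketch; the exact regex syntax accepted depends on how your copy of less was built):
&!IgnoreThis|IgnoreThat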
Try the multitail tool - as well as letting you view multiple logs at once, I'm pretty sure it lets you apply regex filters interactively.
Based on ghostdog74's answer and the less manpage, I came up with this:
~/.bashrc:
export LESSOPEN='|~/less-filter.sh %s'
export LESS=-R # to allow ANSI colors
~/less-filter.sh:
#!/bin/sh
case "$1" in
*logfile*.log*) sed -f ~/less-filter.sed < $1
;;
esac
~/less-filter.sed:
/deleteLinesLikeThis/d # to filter out lines
s/this/that/ # to change text on lines (useful to colorize using ANSI escapes)
Then:
less logfileFooBar.log.1 -- the filter is applied automatically.
cat logfileFooBar.log.1 | less -- to see the log without filtering
This is adequate for now but I would still like to be able to edit the filters on the fly.
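A lighter-weight variation on the same idea (just a sketch; the pattern file name is made up, and it still means restarting less when the patterns change): keep the exclusion patterns, one per line, in a plain file and hand them to grep with -f:
grep -v -f ~/log-filters.txt logfileFooBar.log.1 | less
Editing ~/log-filters.txt is then the only step needed to adjust the filtering, and complicated pattern lists no longer clutter the command line.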
See the man page of less. There are some options you can use to search for words, for example. It has a line-editing mode as well.
There's an application by Casstor Software Solutions called LogFilter (www.casstor.com) that can edit Windows/Mac/Linux text files and can easily perform file filtering. It supports multiple filters as well as regular expressions. I think it might be what you're looking for.
I have the requirement of separating an ASCII document into pages of at most 58 lines each. At the bottom of each page there is a 3 line footer. I'm not aware of any pagination abilities within Vim that would accomplish this.
Is there a good way to do this with Vim? Perhaps highlighting every 58th line or something of the sort.
N.B. I'm seeing answers involving using a separate tool to do this; which I have thought of. What I'm interested in is a Vim solution.
Thanks.
The proper tool you're looking for is very likely a2ps.
a2ps --lines-per-page 58 --footer=footer_text document.txt
It's possible in vim as a script. Put the following in a file and :source it while the file to change is open. The s:footer list contains the lines to insert after each run of 58 lines.
let s:footer = ["","Footer",""]
let s:line = 0
while s:line <= line("$") - 58
let s:line = s:line + 58
call append(s:line, s:footer)
let s:line = s:line + len(s:footer)
endwhile
Why is it important to use vim? You could do this more efficiently with split and cat.
Assuming your original file is called file and you have created a file, footer, that contains your footer text.
$ split -l 58 file file_parts
$ for i in file_parts*; do cat $i footer > $i.footered; done
$ cat file_parts*.footered > file.footered
file.footered would have your original file with the contents of footer inserted at every 58th line.
This is assuming you want it all back in the original file. If you don't, then the resulting file_parts*.footered files would be the already separated pages so you could skip the last step.
The two most effective ways of doing that in Vim are a script (like @Geoff has already suggested) and a substitution command, like
:%s#\%(.*\n\)\{58}#\0---\rfooter\r---\r#
A macro (as suggested in a comment to the question) is the slowest and a script is the fastest. A substitution command is slower than a script, but much faster than a macro.
So probably substitution is the best Vim-only solution unless its
performance is unacceptable. Only in that case, I think, it is worth
writing a script.
You're probably trying to use the wrong tool for this. You could do it much more easily programmatically, for example with this simple Perl oneliner:
perl -pe'$_ .= "your\nfooter\nhere\n" unless $. % 58' inputfilename > outputfilename
A recursive macro might work. Experiment with the following (position the cursor on the first character of the first line and switch to normal mode):
qqq
qq
57j
:read footer.txt
3j
@q
q
Note that the register to which you record the macro must be cleared (qqq) and that you must not use tab-completion when reading the footer-file (:read footer.txt).
You can then use the macro (normal mode):
@q
How can I add line numbers to a range of lines in a file opened in Vim? Not as in :set nu—this just displays line numbers—but actually have them be prepended to each line in the file?
With
:%s/^/\=line('.')/
EDIT: to sum up the comments.
This command can be tweaked as much as you want.
Let's say you want to add numbers in front of lines from a visual selection (V + move), and you want the numbering to start at 42.
:'<,'>s/^/\=(line('.')-line("'<")+42)/
If you want to add a string between the number and the old text from the line, just concatenate (with . in VimL) it to the number-expression:
:'<,'>s/^/\=(line('.')-line("'<")+42).' --> '/
If you need this to sort as text, you may want to zero pad the results, which can be done using printf for 0001, 0002 ... instead of 1, 2... eg:
:%s/^/\=printf('%04d', line('.'))/
Anyway, if you want more information, just open vim help: :h :s and follow the links (|subreplace-special|, ..., |submatch()|)
cat -n adds line numbers to its input. You can pipe the current file to cat -n and replace the current buffer with what it prints to stdout. Fortunately this convoluted solution is less than 10 characters in vim:
:%!cat -n
Or, if you want just a subselection, visually select the area, and type this:
:!cat -n
That will automatically put the visual selection markers in, and will look like this after you've typed it:
:'<,'>!cat -n
In order to erase the line numbers, I recommend using control-v, which will allow you to visually select a rectangle; you can then delete that rectangle with x.
On a GNU system: with the external nl binary:
:%!nl
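Note that nl skips empty lines by default; to number every line the way cat -n does, use:
:%!nl -ba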
In a Unix-like environment, you can use cat or awk to generate line numbers easily, because vim has a friendly interface with the shell, so everything works in vim as well as it does in the shell.
From Vim Tip28:
:%!cat -n
or
:%!awk '{print NR,$0}'
But if you use vim on MS-DOS, Win9x, or Win2000, you lose this toolkit.
Here is a very simple way to achieve this with vim alone:
fu! LineIt()
exe ":s/^/".line(".")."/"
endf
Or, a sequence of letters instead of numbers is just as easy:
exe "s/^/".nr2char(line("."))."/"
You can also use a subst:
:g/^/exe ":s/^/".line(".")."^I/"
You may also want only to print the line numbers without adding them to the file:
"Sometimes it could be useful especially be editing large source files to print the line numbers out on paper.
To do so you can use the option :set printoptions=number:y to activate and :set printoptions=number:n to deactivate this feature.
If the line number should be printed always, place the line set printoptions=number:y in the vimrc."
First, you can remove the existing line numbers if you need to:
:%s/^[0-9]*//
Then, you can add line numbers. NR refers to the current line number starting at one, so you can do some math on it to get the numbering you want. The following command gives you four digit line numbers:
:%!awk '{print 1000+NR*10,$0}'
The "VisIncr" plugin is good for inserting columns of incrementing numbers in general (or letters, dates, roman numerals etc.). You can control the number format, padding, and so on. So insert a "1" in front of every line (via :s or :g or visual-block insert), highlight that column in visual-block mode, and run one of the commands from the plugin.
If someone wants to put a tab (or some spaces) after inserting the line numbers using this excellent answer, here's a way. After pressing Escape, do:
:%s/^/\=line('.').' '/
^ means the beginning of a line, and %s is the directive for substitution. So we put the line number at the beginning of each line, followed by four spaces, and then whatever the line contained before the substitution, for every line in the file.
This makes all the substitutions without asking. Alternatively, if you want the command to ask for confirmation before each change, do:
:%s/^/\=line('.').' '/igc
P.S: power of vim :)
The best answer is given in a duplicate question.
In summary:
With CTRL-V, then G, then I, 0, Esc, you can insert a column of zeros.
Then select the whole column and increment:
CTRL-V g CTRL-A
See also: https://vim.fandom.com/wiki/Making_a_list_of_numbers#Incrementing_selected_numbers