Here is the output of "cat tcl.log":
Discovered serial numbers for slot 1 olt 1:
sernoID Vendor Serial Number sernoID Vendor Serial Number
5 ZNTS 032902A6
And this is how it looks in Vim:
^MDiscovered serial numbers for slot 1 olt 1:
^MsernoID Vendor Serial Number sernoID Vendor Serial Number
^M<SPACE> for next page, <CR> for next line, A for all, Q to quit^H ^H^H ^H^...
5 ZNTS 032902A6
I don't mind the ^M and ^H characters; I know how to get rid of them. The problem is that, for some reason, my C++ program (unlike cat) sees the line starting with "<SPACE>". What can I do about it? I'm using the fstream library to read the log file, and I want it to ignore the line I mentioned. I tried something like this:
std::ofstream logFinal("logFinal");
std::ifstream log("tcl.log");
std::string temp;
while (std::getline(log, temp)) {
    if (temp.find("SPACE") != std::string::npos) {
        temp = "";
    }
    logFinal << temp << std::endl;
}
But for some reason it doesn't find any "SPACE" in the temp variable. It looks like the "<SPACE>" is some kind of special character I've never heard of.
You're obtaining that log file from, or via, some sort of program that does paging. (It might be buried inside something else; these things happen.) That paging program prints a message like this at the end of a page:
<SPACE> for next page, <CR> for next line, A for all, Q to quit
The <SPACE> is just part of a human-readable message; it's seven perfectly ordinary characters. However, the ^H characters that follow it are more interesting: they're backspace characters, used to erase the prompt again to make way for the next line of real output.
The easiest way (assuming you're on — or have easy access to — a Unix/Linux system) is to feed that log file through col -b (the col program with the -b option, to do backspace elimination). Check out this little cut-n-paste from a shell session:
bash$ echo -e 'abc\b\b\bdef'
def
bash$ echo -e 'abc\b\b\bdef' | od -c
0000000 a b c \b \b \b d e f \n
0000012
bash$ echo -e 'abc\b\b\bdef' | col -b | od -c
0000000 d e f \n
0000004
(The \b should be the same as ^H in your log file.)
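Applied to your case, a minimal sketch would be to clean the log once with col -b and point your program at the cleaned copy (tcl.clean.log is just a name chosen here):
bash$ col -b < tcl.log > tcl.clean.log
After that, the pager prompt has been erased by its own backspaces, so the "SPACE" line never reaches your std::getline loop.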
I have a file that contains some PCL sequences. I have this sequence at the end of the file (hex):
461b 2670 3158 0a F.&p1X.
I want to remove the sequence: <Esc>&p1X including the character that follows. In 99% of cases, LF follows the sequence.
I tried this command:
sed -b 's/\o33&p[0-9]X$//Mg' ~/test.txt >test2.txt
However, it appends an LF at the end of test2.txt. Also, if I specify . instead of $, it no longer matches the line.
If you want to play with this, generate the input file using this command:
echo -e "SomeString\033&p1X" > ~/test.txt
The echo appends an LF character at the end.
Thanks
If I have understood correctly, you know for sure that your file contains that sequence of characters at the end. If that is the case, I would simply truncate the last six bytes. It works regardless of whether the very last character is a newline or anything else...
Example:
$ echo -e "SomeString\033&p1X" > test.txt
$ od -c test.txt
0000000 S o m e S t r i n g 033 & p 1 X \n
0000020
$ truncate -s -6 test.txt
$ od -c test.txt
0000000 S o m e S t r i n g
0000012
This is also very efficient, as it uses the truncate() system call.
This seems to do the trick based on this thread:
perl -pi -e 's/\x1b&p[0-9]X\n//g' ~/test.txt
(I am a perl beginner as well - any comments would be appreciated).
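Testing it against the sample file from the question should give something like this (the expected output is sketched by hand here, not captured from a real session):
$ echo -e "SomeString\033&p1X" > ~/test.txt
$ perl -pi -e 's/\x1b&p[0-9]X\n//g' ~/test.txt
$ od -c ~/test.txt
0000000 S o m e S t r i n g
0000012
Like the truncate approach, this removes the trailing LF as well, since the \n is part of the substitution.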
I'm trying to make a script to remove the characters (\r\n) that Windows inserts, but ONLY if they are between quotes ("). Why? Because the dump file inserts these characters (I don't know why), and why between quotes? Because they only affect me when they chop up my results.
For Example. "this","is","a","result","from","database"
The problem:
"this","is","a","result","from","da
tabase"
[EDIT]
Thanks to the answer from @Cyrus I got something like this, but it fails with "bad flag in substitute command: '}'". I'm on Mac OS X.
Can you help me?
Thanks
OS X uses a different sed than the one typically installed on Linux.
The big differences are that sequences like \r and \n aren't expanded or used in expressions as you might expect, and you tend to need to separate commands with semicolons a little more.
If you can get by with a sed one-liner that implements a rule like "Remove any \r\n on lines containing quotes", it will certainly simplify your task...
For my experiments, I used what I infer is your sample input data:
$ od -c input.txt
0000000 F o r E x a m p l e . " t h
0000020 i s " , " i s " , " a " , " r e
0000040 s u l t " , " f r o m " , " d a
0000060 t a \r \n b a s e " \n
0000072
First off, a shell-only solution might be to use smaller tools that are built in to the operating system. For example, here's a one-liner:
od -A n -t o1 -v input.txt | rs 0 1 | while read n; do [ $n -eq 015 ] && read n && continue; printf "\\$n"; done
Broken out for easier reading, here's what this looks like:
od -A n -t o1 -v input.txt | rs 0 1 - convert the file into a stream of octal numbers
| while read n; do - step through the numbers...
[ $n -eq 015 ] && - if the current number is 15 (i.e. octal for a Carriage Return)
read n - read a line (thus skipping it),
&& continue - and continue to the next octal number (thus skipping the newline after a CR)
printf "\\$n"; done - print the current octal number.
This kind of data conversion and stream logic works nicely in a pipeline, but is a bit harder to implement in sed, which only knows how to deal with the original input rather than its converted form.
Another bash option might be to use conditional expressions matching the original lines of input:
while read line; do
if [[ $line =~ .*\".*$'\r'$ ]]; then
echo -n "${line:0:$((${#line}-1))}"
else
echo "$line"
fi
done < input.txt
This walks through text, and if it sees a CR, it prints everything up to and not including it, with no trailing newline. For all other lines, it just prints them as usual. The result is that lines that had a carriage return are joined, other lines are not.
From sed's perspective, we're dealing with two input lines, the first of which ends in a carriage return. The strategy for this would be to search for carriage returns, remove them and join the lines. I struggled for a while trying to come up with something that would do this, then gave up. Not to say it's impossible, but I suspect a generally useful script will be lengthy (by sed standards).
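If sed is not a hard requirement, here is a minimal awk sketch of that same join-on-CR strategy (using the input.txt sample above; the rule implemented is: while a line contains a quote and ends in a carriage return, strip the CR and join the next line onto it):
awk '{
    # while the line has a quote and ends in a CR, drop the CR
    # and splice the next input line onto this one
    while (/"/ && sub(/\r$/, "")) {
        if ((getline nxt) <= 0) break   # no more input
        $0 = $0 nxt
    }
    print
}' input.txt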
I would like to diff two very large files (multi-GB) using Linux command-line tools, and see the line numbers of the differences. The order of the data matters.
I am running on a Linux machine and the standard diff tool gives me the "memory exhausted" error. -H had no effect.
In my application, I only need to stream the diff results. That is, I just want to visually look at the first few differences, I don't need to inspect the entire file. If there are differences, a quick glance will tell me what is wrong.
'comm' seems well suited to this, but it does not display line numbers of the differences.
In general, my multi-GB files only have a few hundred lines that are different, the rest of the file is the same.
Is there a way to get comm to dump the line number? Or a way to make diff run without loading the entire file into memory? (like cutting the input files into 1k blocks, without actually creating a million 1k-files in my filesystem and cluttering everything up)?
I won't use comm, but since you said WHAT you need, in addition to HOW you thought you should do it, I'll focus on the "WHAT you need" instead:
An interesting way would be to use paste and awk: paste can show two files "side by side" using a separator. If you use \n as the separator, it displays the two files interleaved: line 1 of each, followed by line 2 of each, and so on.
So (once you know that the two files have the same number of lines) the script could simply be:
paste -d '\n' /tmp/file1 /tmp/file2 | awk '
    NR%2    { linefirstfile=$0 }
    !(NR%2) { if ($0 != linefirstfile) {
                  print "line",NR/2,": "
                  print linefirstfile
                  print $0
              } }'
(Interestingly, this solution can easily be extended to diff N files in a single read, whatever the sizes of the N files are... just add a check that all of them have the same number of lines before the comparison step; otherwise paste will, toward the end, show only lines from the bigger files.)
Here is a (short) example, to show how it works:
$ cat > /tmp/file1
A
C %FORGOT% fmsdflmdflskdf dfldksdlfkdlfkdlkf
E
$ cat > /tmp/file2
A
C sdflmsdflmsdfsklmdfksdmfksd fmsdflmdflskdf dfldksdlfkdlfkdlkf
E
$ paste -d '\n' /tmp/file1 /tmp/file2
A
A
C %FORGOT% fmsdflmdflskdf dfldksdlfkdlfkdlkf
C sdflmsdflmsdfsklmdfksdmfksd fmsdflmdflskdf dfldksdlfkdlfkdlkf
E
E
$ paste -d '\n' /tmp/file1 /tmp/file2 | awk '
    NR%2    { linefirstfile=$0 }
    !(NR%2) { if ($0 != linefirstfile) {
                  print "line",NR/2,": "
                  print linefirstfile
                  print $0
              } }'
line 2 :
C %FORGOT% fmsdflmdflskdf dfldksdlfkdlfkdlkf
C sdflmsdflmsdfsklmdfksdmfksd fmsdflmdflskdf dfldksdlfkdlfkdlkf
If it happens that the files don't have the same number of lines, you can first compare $(wc -l < /tmp/file1) and $(wc -l < /tmp/file2) (the < keeps the file name out of wc's output), and only run the paste ... | awk when they match, to ensure paste always pairs one line from each file. (Of course, in that case there will be one extra (fast!) full read of each file...) A minimal guard is sketched below.
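Assuming the same file names as above (the '...' stands for the awk program already shown):
lines1=$(wc -l < /tmp/file1)
lines2=$(wc -l < /tmp/file2)
if [ "$lines1" -eq "$lines2" ]; then
    paste -d '\n' /tmp/file1 /tmp/file2 | awk '...'
else
    echo "line counts differ: $lines1 vs $lines2" >&2
fi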
You can easily adjust it to display exactly what you need. And you could quit after the Nth difference, either automatically, with a counter in the awk script (sketched below), or by pressing Ctrl-C when you have seen enough.
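For instance, a counter-based variant might look like this (max is an illustrative limit, not part of the original one-liner):
paste -d '\n' /tmp/file1 /tmp/file2 | awk -v max=5 '
    NR%2    { linefirstfile=$0 }
    !(NR%2) && $0 != linefirstfile {
        print "line",NR/2,": "
        print linefirstfile
        print $0
        if (++count >= max) exit   # stop after the max-th difference
    }'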
Which versions of diff have you tried? GNU diff has a "--speed-large-files" option which may help.
The comm tool assumes the lines are sorted.
Does anyone know how to replace line a with line b and line b with line a in a text file using the sed editor?
I can see how to replace a line in the pattern space with a line that is in the hold space (i.e., /^Paco/x or /^Paco/g), but what if I want to take the line starting with Paco and replace it with the line starting with Vinh, and also take the line starting with Vinh and replace it with the line starting with Paco?
Let's assume for starters that there is one line with Paco and one line with Vinh, and that the line Paco occurs before the line Vinh. Then we can move to the general case.
#!/bin/sed -f
/^Paco/ {
  :notdone
  # append the next input line to the pattern space
  N
  # swap the leading Paco line (\1) with the trailing Vinh line (\4),
  # keeping everything in between (\2) where it is
  s/^\(Paco[^\n]*\)\(\n\([^\n]*\n\)*\)\(Vinh[^\n]*\)$/\4\2\1/
  # if the substitution succeeded, jump to the end of the script
  t
  bnotdone
}
After matching /^Paco/ we read into the pattern buffer until s// succeeds (or EOF: the pattern buffer will be printed unchanged). Then we start over searching for /^Paco/.
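A quick check, assuming the script above is saved as swap.sed:
$ printf 'Paco one\nmiddle\nVinh two\n' | sed -f swap.sed
Vinh two
middle
Paco one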
cat input | tr '\n' 'ç' | sed 's/\(ç__firstline__\)\(ç__secondline__\)/\2\1/g' | tr 'ç' '\n' > output
Replace __firstline__ and __secondline__ with your desired regexps. Be sure to substitute any instances of . in your regexps with [^ç]. If your text actually contains ç, use some other character that your text doesn't have.
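As a concrete sketch of the same idea, here is an adjacent-line swap using # as the placeholder character (chosen here on the assumption that the text contains no # of its own):
$ printf 'one\ntwo\nthree\n' | tr '\n' '#' | sed 's/\(#two[^#]*\)\(#three[^#]*\)/\2\1/' | tr '#' '\n'
one
three
two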
Try this awk script (save it as, e.g., test.sh; it reads from a file literally named file):
s1="$1"
s2="$2"
awk -vs1="$s1" -vs2="$s2" '
{ a[++d]=$0 }
$0~s1{ h=$0;ind=d}
$0~s2{
a[ind]=$0
for(i=1;i<d;i++ ){ print a[i]}
print h
delete a;d=0;
}
END{ for(i=1;i<=d;i++ ){ print a[i] } }' file
Output:
$ cat file
1
2
3
4
5
$ bash test.sh 2 3
1
3
2
4
5
$ bash test.sh 1 4
4
2
3
1
5
Use sed only for simple substitutions; for anything more complicated, use a programming language.
A simple example from the GNU sed texinfo doc:
Note that on implementations other than GNU `sed' this script might
easily overflow internal buffers.
#!/usr/bin/sed -nf
# reverse all lines of input, i.e. first line became last, ...
# from the second line, the buffer (which contains all previous lines)
# is *appended* to current line, so, the order will be reversed
1! G
# on the last line we're done -- print everything
$ p
# store everything on the buffer again
h
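Assuming the script is saved as reverse.sed, a quick test run:
$ printf '1\n2\n3\n' | sed -nf reverse.sed
3
2
1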
Is there a command to determine the length of the longest line in Vim? And to append that length at the beginning of the file?
GNU's wc command has a -L / --max-line-length option which prints the maximum line length of the file; see the GNU man page for wc. The FreeBSD wc also has -L, but not --max-line-length; see the FreeBSD man page for wc.
How do you use these from Vim? The command:
:%!wc -L
will filter the open file through wc -L and replace the file's contents with the maximum line length.
To retain the file contents and put the maximum line length on the first line, do:
:%yank
:%!wc -L
:put
Instead of using wc, the question "Find length of longest line - awk bash" describes how to use awk to find the length of the longest line.
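The gist of that awk approach is a one-liner along these lines (a sketch, not quoted from that thread):
awk '{ if (length($0) > max) max = length($0) } END { print max }' file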
Ok, now for a pure Vim solution. I'm somewhat new to scripting, but here goes. What follows is based on the FilterLongestLineLength function from textfilter.
function! PrependLongestLineLength ( )
    let maxlength = 0
    let linenumber = 1
    while linenumber <= line("$")
        exe ":".linenumber
        " virtcol('$') is one column past the end of the line, hence the -1
        let linelength = virtcol("$") - 1
        if maxlength < linelength
            let maxlength = linelength
        endif
        let linenumber = linenumber + 1
    endwhile
    " open a new first line and write the length into it
    exe ':0'
    exe 'normal O'
    exe 'normal 0C'.maxlength
endfunction
command PrependLongestLineLength call PrependLongestLineLength()
Put this code in a .vim file (or your .vimrc) and :source the file. Then use the new command:
:PrependLongestLineLength
Thanks, figuring this out was fun.
If you work with tabulations expanded, a simple
:0put=max(map(getline(1,'$'), 'len(v:val)'))
is enough.
Otherwise, I guess we will need the following (which you can find as the last example under :h virtcol(), except for the -1):
:0put=max(map(range(1, line('$')), "virtcol([v:val, '$'])-1"))
Use
:!wc -L %
rather than
:%!wc -L
so that the buffer contents are not replaced by the wc output.
To append that length at the beginning of the file:
:0r !wc -L % | cut -d' ' -f1
Here is a simple, hence easily-remembered approach:
select all text: ggVG
substitute each character (.) with "a": :'<,'>s/./a/g
sort, unique: :'<,'>sort u
count the characters in the longest line (if too many characters to easily count, just look at the column position in the Vim status bar)
I applied this to examine Enzyme Commission (EC) numbers, prior to making a PostgreSQL table:
I copied the ec_numbers data to Calc, then took each column in Neovim, replaced each character with "a",
:'<,'>s/./a/g
and then sorted for unique lines
:'<,'>sort u
aaaaaaa
aaaaaaaa
aaaaaaaaa
aaaaaaaaaa
aaaaaaaaaaa
... so the longest EC number entry [x.x.x.x] is 11 char, VARCHAR(11).
Similarly applied to the Accepted Names, we get
aaaaa
aaaaaa
...
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
i.e. the longest name is 147 char: VARCHAR(200) should cover it!
For neovim users:
-- 0 refers to the current buffer
local bufnr = 0
-- fetch all lines and track the longest length (in bytes)
local lines = vim.api.nvim_buf_get_lines(bufnr, 0, -1, false)
local width = #(lines[1])
for _, line in ipairs(lines) do
  if #line > width then
    width = #line
  end
end
-- prepend the result as the new first line of the buffer
vim.api.nvim_buf_set_lines(bufnr, 0, 0, false, { tostring(width) })