I need to remove the unique lines and keep the duplicates in my text file(read the articles written to remove duplicate lines but I want to do the opposite). Is there any way I could do that using expressions or textfx?
E.g:
file1.txt
hello
world
hello
After operation, output should be
hello
hello
Thanks in advance
In the Replace dialogue:
Find:
^(.+)\r?\n(?!(.|\r?\n)*\1)
Replace:
*leave empty!*
Options:
Select radio button "Regular Expression"
Leave checkbox ". matches newline" unselected
Pros:
Duplicate line doesn't need to be immediately after the 1st occurrence
Cons:
If a line appears x times in your data, after the regex x-1 occurrences will be left and not x as asked in OP.
This finds all lines followed by a line repetition (it does NOT find the last line, though):
.+\r\n(?=(.+\r\n)\1)
000000 111111 22
This matches a non-empty line 0, but only if it is followed by (a non-empty line \1, which is followed by \1).
Note that this assumes \r\n (Windows) line separations. On a Unix text file, just \n, on a Mac text file, just \r.
In the search box, mark Regular expression, unmark . matches newline, Replace with = "".
Example:
"Zulu
Alpha
Alpha
Bravo
Charlie
Charlie
Delta
Echo
Echo
Foxtrott
"
(file ends with empty line)
-->
"Alpha
Alpha
Charlie
Charlie
Echo
Echo
Foxtrott
"
Related
I'm trying to insert a string in multiple text files at a random line number. before adding the string in the text files i want to add a newline.
For example, a text file has 4 paragraphs.
paragraph 1
paragraph 2
paragraph 3
paragraph 4
I want the output to be
paragraph 1
STRING
paragraph 2
paragraph 3
paragraph 4
My code is working fine, but its not adding the empty newline before the string.
$ for i in *.txt; do sed -i "$(shuf -n 1 -e 2 4 6)i \n\rSTRING \n\r" $i ; done
The i command is actually i\, from the GNU manual:
'i\'
'TEXT'
insert TEXT before a line.
So the backslash before the n is "eaten" by the i command. Add an extra backslash and it should work.
I want to remove certain lines from a tab-delimited file and write output to a new file.
a b c 2017-09-20
a b c 2017-09-19
es fda d 2017-09-20
es fda d 2017-09-19
The 4th column is Date, basically I want to keep only lines that has 4th column as "2017-09-19" (keep line 2&4) and write to a new file. The new file should have same format as the raw file.
How to write the linux command for this example?
Note: The search criteria should be on the 4th field as I have other fields in the real data and possibly have same value as 4th field.
With awk:
awk 'BEGIN{OFS="\t"} $4=="2017-09-19"' file
OFS: output field separator, a space by default
Use grep to filter:
cat file.txt | grep '2017-09-19' > filtered_file.txt
This is not perfect, since the string 2017-09-19 is not required to appear in the 4th column, but if your file looks like the example, it'll work.
Sed solution:
sed -nr "/^([^\t]*\t){3}2017-09-19/p" input.txt >output.txt
this is:
-n - don't output every line
-r - extended regular expresion
/regexp/p - print line that contains regular expression regexp
^ - begin of line
(regexp){3} - repeat regexp 3 times
[^\t] - any character except tab
\t - tab character
* - repeat characters multiple times
2017-09-19 - search text
That is, skip 3 columns separated by a tab from the beginning of the line, and then check that the value of column 4 coincides with the required value.
awk '/2017-09-19/' file >newfile
cat newfile
a b c 2017-09-19
es fda d 2017-09-19
In vim, in a Windows machine (with no access to "unix"-like commands such command column) I want to reformat this code to make it more readable:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FORMAT 99
COLUMN VALUE_2 FORMAT 99
COLUMN VALUE_3 FORMAT 999
COLUMN VALUE_4 FORMAT 999
And I want to have this using as less commands as possible:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FORMAT 99
COLUMN VALUE_2 FORMAT 99
COLUMN VALUE_3 FORMAT 999
COLUMN VALUE_4 FORMAT 999
Note this is just an excerpt, as there many more lines in which I must do the same.
You could use the following command:
:%s/\w\zs\s*\zeFORMAT/^I
The pattern will match the whitespaces between FORMAT and the end of the previous word and replace it by a tab:
\w Any 'word' character
\zs Start the matching
\s* Any number of whitespace
\ze End the matching
FORMAT The actual word format
\zs and \ze allow to apply the substitution only on the whitespaces see: :h /\zs and :h /\ze
Note that ^I should be inserted with ctrl+vtab
The tabular plugin recommended by #SatoKatsura would be a good way to do it too.
You can also generalize that. Let's say you have the following file:
COLUMN KEY_ID FORMAT 9999999999
COLUMN VALUE_1 FOO 99
COLUMN VALUE_2 BAR 99
You could use this command:
:%s/^\(\w*\s\)\{1}\w*\zs\s*\ze/
Were the pattern can be detailed like that:
^ Match the beginning of the line
\(\w*\s\)\{1} One occurrence of the pattern \w*\s i.e. one column
\w* Another column
\zs\s*\ze The whitespaces after the previous column
You could change the value of \{1} to apply the command on the next columns.
EDIT to answer #aturegano comment, here is a way to align the column to another one:
%s/^\(\w*\s\)\{1}\w*\zs\s*\ze/\=repeat(' ', 30-matchstrpos(getline('.'), submatch(0))[1])
The idea is still to match the whitespaces which must be aligned, on the second part of the substitution command we use a sub-replace-expression (See :h sub-replace-expression).
This allows us to use a command from the substitution part, which can be explained like this:
\= Interpret the next characters as a command
repeat(' ', XX) Replace the match with XX whitespaces
XX is decomposed like this:
30- 30 less the next expression
matchstrpos()[1] Returns the columns where the second argument appears in the first one
getline('.') The current line (i.e. the one containing the match
submatch(0) The matched string
[1] Necessary since matchstrpos() returns a list:
[matchedString, StartPosition, EndPosition]
and we are looking for the second value.
You then simply have to replace 30 by the column where you want to move your next column.
See :h matchstrpos(), :h getline() and :h submatch()
For alignment, there are three well-known plugins:
the venerable Align - Help folks to align text, eqns, declarations, tables, etc
the modern tabular
the contender vim-easy-align
Posting an answer as requested:
:g/^COLUMN / s/.*/\=call('printf', ['%s %-30s %s %s'] + split(submatch(0)))/
Explanation:
g/^COLUMN / - apply the following command to lines matching /^COLUMN / (cf. :h :global)
\= - replace with the result of evaluating an expression, rather than with a fixed string (cf. :h s/\=)
submatch(0) - the line being matched
split(...) - split line into words
printf(...) - format the line
call(...) - we'd like to have printf('%s %-30s %s %s', list), but printf() doesn't take "real" lists as arguments, so we have to unfold the list with a call(...) (cf. :h call()).
Yet another solution:
:%s/ \{2,}/ /g
This solution is not perfect because the result will have an extra single space on the first line. To fix this problem:
:%s/\%>15c \{2,}/ /g
Explanation of pattern:
%>15c\s\{2,}
%>15c Matches only after column 15
\s\{2,} Matches two or more white spaces
I have a 3 column file. I would like to append a third column which is just one word repeated many times. I tried the following
paste file.tsv <(echo 'new_text') > new_file.tsv
But the text 'new_text' only appears on the first line, not every line.
How can I get 'new_text' to appear on every line.
Thanks
sed '1,$ s/$/;ABC/' infile > outfile
This replaces the line end ("$") with ";ABC".
How to match a tab only when it is between two numbers?
Sample script
209.65834 27.23204908
119.37987 15.03317082
74.240635 8.30561924
29.1014 0
931.8861 -100.00000
-16.03784 -8.30562
;
_mirror
l
;
29.1014 0
1028.10 0.00
n
_spline
935.4875 250
924.2026913 269.8820375
912.9178825 277.4506484
890.348265 287.3181854
(in the above script, the tabs are between the numbers, not the spaces) (blank lines are significant; there is nothing in them, but I can't lose them)
I wish to get a "," between the numbers. Tried with :%s/\t/\,/ but that will touch the empty lines too, and the end of lines.
Try this:
:%s/\(\d\)\t\(-\?\d\)/\1,\2/
\d matches any digit. -? means "an optional -. The pair of (escaped) parenthesis capture the match, and \1 refers to the first captured match, \2 refers to the second.
google://vim+regex -> http://vimregex.com/ ->
:%s/\([0-9]\)\t\([0-9]\)/\1,\2/gc
You have 2 groups of numbers here ([0-9]) and tab-symbols \t between them. Add some escape symbols and you have the answer.
g for multichange in single line, c for some asking.
\1 and \2 are matching groups (numbers in your case).
It's not really hard to find answer for questions like that by yourself.
try
:%s/\([0-9]\)\t\([0-9]\)/\1,\2/g
explanation - search the patten <digit>\t<digit> and remember the part that matches <digit> .
\( ... \) captures and remembers the part that matches.
\1 recalls the first captured digit, \2 the second captured digit.
so if the match was on 123\t789, <digit>,<digit> matches 3\t7
the 3 and 7 are rememberd as \1 and \2
or
:g/[0-9]/ s/\t/,/g
explanation - filter all lines with a digit, then substitute tabs with a comma on those lines