For instance, let's say I have a text file:
worker1, 0001, company1
worker2, 0002, company2
worker3, 0003, company3
How would I use sed to keep only the first 2 characters of the first column ("wo"), drop the rest of that column, and attach those characters to the second column, so that the output looks like this:
wo0001,company1
wo0002,company2
wo0003,company3
$ sed -E 's/^(..)[^,]*, ([^,]*,) /\1\2/' file
wo0001,company1
wo0002,company2
wo0003,company3
s/ begin substitution
^(..) match the first two characters at the beginning of the line, captured in a group
[^,]* match any amount of non-comma characters of the first column
", " (comma, space) match the comma that ends the first column and the space after it
([^,]*,) match the second field and its comma, captured in a group (any number of non-comma characters followed by a comma)
" " (space) match the space character that follows
/\1\2/ replace with the first and second capturing group
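As a cross-check, the same pattern can be tried outside sed. The following is a minimal Python sketch, not part of the original answer; the function name merge_prefix is hypothetical, and the sample lines are the ones from the question.

```python
import re

# Same pattern as the sed command above, written for Python's re module.
PATTERN = re.compile(r"^(..)[^,]*, ([^,]*,) ")

def merge_prefix(line: str) -> str:
    """Keep the first two characters of column 1 and attach column 2 to them."""
    # Lines that do not match the pattern are returned unchanged, as with sed.
    return PATTERN.sub(r"\1\2", line)

lines = [
    "worker1, 0001, company1",
    "worker2, 0002, company2",
    "worker3, 0003, company3",
]
for line in lines:
    print(merge_prefix(line))
```

Like the sed command, this leaves any line that lacks the "comma, space" layout untouched instead of failing.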
I want to remove certain lines from a tab-delimited file and write output to a new file.
a b c 2017-09-20
a b c 2017-09-19
es fda d 2017-09-20
es fda d 2017-09-19
The 4th column is the date. I want to keep only the lines whose 4th column is "2017-09-19" (lines 2 and 4) and write them to a new file. The new file should have the same format as the raw file.
How do I write the Linux command for this example?
Note: The search criterion must apply to the 4th field only, because the real data has other fields that may contain the same value.
With awk:
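awk 'BEGIN{FS=OFS="\t"} $4=="2017-09-19"' file > newfile
FS: input field separator. By default awk splits on any run of spaces or tabs, so setting it to a tab keeps fields that are empty or contain spaces aligned with their columns.
OFS: output field separator, a space by default. Matching lines are printed unmodified here, so OFS only matters if you assign to a field.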
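The field test can also be sketched in Python for comparison. This is an illustration under assumptions, not the awk answer itself: the helper name keep_rows and its parameters are hypothetical, and the sample rows are the question's data with real tab characters.

```python
def keep_rows(lines, wanted="2017-09-19", column=4):
    """Yield lines whose tab-separated field number `column` equals `wanted`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip rows that are too short to have the requested column.
        if len(fields) >= column and fields[column - 1] == wanted:
            yield line

sample = [
    "a\tb\tc\t2017-09-20\n",
    "a\tb\tc\t2017-09-19\n",
    "es\tfda\td\t2017-09-20\n",
    "es\tfda\td\t2017-09-19\n",
]
print("".join(keep_rows(sample)), end="")
```

Because matching lines are yielded unchanged, the output keeps the tab-delimited format of the input.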
Use grep to filter:
grep '2017-09-19' file.txt > filtered_file.txt
This is not exact, since grep matches the string 2017-09-19 anywhere on the line and not only in the 4th column, but if your file looks like the example, it'll work.
Sed solution:
sed -nr "/^([^\t]*\t){3}2017-09-19/p" input.txt >output.txt
where:
-n - don't output every line
-r - extended regular expressions (-E also works in GNU and BSD sed)
/regexp/p - print lines that match the regular expression regexp
^ - beginning of line
(regexp){3} - repeat regexp 3 times
[^\t] - any character except tab
\t - tab character
* - repeat the preceding item zero or more times
2017-09-19 - search text
That is, skip 3 columns separated by a tab from the beginning of the line, and then check that the value of column 4 coincides with the required value.
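To see why anchoring on three tab-terminated columns matters, here is a small Python sketch of the same regular expression. It is an illustration only: the names FOURTH_COLUMN and matches_fourth_column are hypothetical, and the trailing (?:\t|$) is an addition not present in the sed command, included so that a longer value such as 2017-09-190 is not accepted.

```python
import re

# Three "run of non-tabs followed by a tab" groups, then the date.
# The final (?:\t|$) is an assumption added here: the date must fill the field.
FOURTH_COLUMN = re.compile(r"^(?:[^\t]*\t){3}2017-09-19(?:\t|$)")

def matches_fourth_column(line: str) -> bool:
    """Return True if the 4th tab-separated column is exactly 2017-09-19."""
    return FOURTH_COLUMN.search(line.rstrip("\n")) is not None

row_ok = "a\tb\tc\t2017-09-19\n"
row_wrong_column = "2017-09-19\tb\tc\t2017-09-20\n"

print(matches_fourth_column(row_ok))
print(matches_fourth_column(row_wrong_column))
# A plain substring test, like an unanchored grep, accepts both rows:
print("2017-09-19" in row_wrong_column)
```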
With awk, matching the date anywhere on the line (not only in the 4th column):
awk '/2017-09-19/' file >newfile
cat newfile
a b c 2017-09-19
es fda d 2017-09-19
I have a comma delimited file (CSV file) test.csv as shown below.
FHEAD,1,2,3,,,,,,
FDEP,2,3,,,,,,,,
FCLS,3,,,4-5,,,,,,,
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,
I want to remove the empty columns only from the 2nd and 3rd rows of the file, i.e. only in rows where the record starts with FDEP or FCLS do I want to remove the empty columns (,,).
After removing the empty columns, the same file test.csv should look like this:
FHEAD,1,2,3,,,,,,
FDEP,2,3
FCLS,3,4-5
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,
How can I do this in Unix?
Here's one way to do it, using sed:
sed '/^F\(DEP\|CLS\),/ { s/,\{2,\}/,/g; s/,$// }'
We use an address of /^F\(DEP\|CLS\),/, i.e. the following command block will only process lines matching ^F\(DEP\|CLS\),. This regex matches beginning-of-line, followed by F, followed by either DEP or CLS, followed by ,. In other words, we look for lines starting with FDEP, or FCLS,. (The \| alternation is a GNU extension to basic regular expressions.)
Having found such a line, we first substitute (s command) all runs (g flag, match as many times as possible) of 2 or more (\{2,\}) commas (,) in a row by a single ,. This squeezes ,,, down to a single ,.
Second, we substitute , at end-of-string by nothing. This gets rid of any trailing comma.
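The two substitutions can be checked step by step with a short Python sketch. This is an illustration of the same logic rather than the sed answer itself; the function name squeeze_row is hypothetical.

```python
import re

def squeeze_row(line: str) -> str:
    """Collapse comma runs and drop a trailing comma, but only on FDEP/FCLS rows."""
    # Equivalent of the sed address: act only on lines starting with FDEP, or FCLS,
    if not re.match(r"F(DEP|CLS),", line):
        return line
    # First substitution: every run of 2 or more commas becomes a single comma.
    line = re.sub(r",{2,}", ",", line)
    # Second substitution: remove a comma at end of line.
    return re.sub(r",$", "", line)

rows = [
    "FHEAD,1,2,3,,,,,,",
    "FDEP,2,3,,,,,,,,",
    "FCLS,3,,,4-5,,,,,,,",
    "FDETL,4,5,6,7,8,",
    "FTAIL,5,67,,,,,,",
]
for row in rows:
    print(squeeze_row(row))
```

Note that squeezing comma runs shifts later values into earlier columns (4-5 moves from the 5th column to the 3rd), which is what the question asks for.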
I need to remove the unique lines and keep the duplicates in my text file (I have read the articles on removing duplicate lines, but I want to do the opposite). Is there any way I could do that using regular expressions or TextFX?
E.g:
file1.txt
hello
world
hello
After operation, output should be
hello
hello
Thanks in advance
In the Replace dialogue:
Find:
^(.+)\r?\n(?!(.|\r?\n)*\1)
Replace:
*leave empty!*
Options:
Select radio button "Regular Expression"
Leave checkbox ". matches newline" unselected
Pros:
Duplicate line doesn't need to be immediately after the 1st occurrence
Cons:
If a line appears x times in your data, after the regex x-1 occurrences will be left and not x as asked in OP.
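If running a script on the file is an option, all x occurrences can be kept by counting lines first. The following Python sketch uses a different technique from the regex above (a two-pass count instead of a lookahead); the function name keep_duplicates is hypothetical.

```python
from collections import Counter

def keep_duplicates(lines):
    """Return every line that occurs more than once, in the original order."""
    counts = Counter(lines)          # first pass: count each distinct line
    return [line for line in lines if counts[line] > 1]  # second pass: filter

print(keep_duplicates(["hello", "world", "hello"]))
```

As with the regex, the duplicates do not need to be adjacent.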
This finds all lines followed by a line repetition (it does NOT find the last line, though):
.+\r\n(?=(.+\r\n)\1)
000000    111111 22
This matches a non-empty line (part 0), but only if it is followed by a non-empty line (part 1, captured as \1) which is itself immediately followed by a copy of \1 (part 2).
Note that this assumes \r\n (Windows) line separators. For a Unix text file use just \n; for a classic Mac OS text file use just \r.
In the search box, mark Regular expression, unmark . matches newline, Replace with = "".
Example:
"Zulu
Alpha
Alpha
Bravo
Charlie
Charlie
Delta
Echo
Echo
Foxtrott
"
(file ends with empty line)
-->
"Alpha
Alpha
Charlie
Charlie
Echo
Echo
Foxtrott
"
I'm working with a large text file and need to be able to delete lines based on the value of the 25th character on the line, i.e. if it is equal to H, K or Z. Is this possible, either by matching one of the letters at a time and running 3 commands or (even better) by matching all 3 in one command? Any help greatly appreciated!
You can use :global to find lines matching a regex and then execute a command on each line where it matched.
In this case it matches any 24 characters from the beginning of the line and, if the character after them is H, K, or Z, deletes that line (the d at the end of the command stands for delete).
:g/^.\{24\}[HKZ]/d
Edit: as Peter Ricker points out \%25c would also work.
:g/\%25c[HKZ]/d
\%25c matches at the 25th column (counted in bytes), then performs the rest of the match from there.
You could also use \%v if you wanted to match virtual columns instead.
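For files too large to open comfortably in an editor, the same test can be applied in a script. This is a minimal Python sketch of the idea, not a Vim feature; the function name drop_flagged and its parameters are hypothetical, and positions are counted in characters, starting at 1.

```python
def drop_flagged(lines, position=25, flags="HKZ"):
    """Return the lines whose character at `position` (1-based) is not in `flags`."""
    index = position - 1
    kept = []
    for line in lines:
        # Lines shorter than `position` have no such character and are kept.
        if len(line) > index and line[index] in flags:
            continue
        kept.append(line)
    return kept

sample = [
    "x" * 24 + "H tail",   # 25th character is H: dropped
    "x" * 24 + "A tail",   # 25th character is A: kept
    "short line",          # fewer than 25 characters: kept
]
for line in drop_flagged(sample):
    print(line)
```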
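You can try the following ex command, which tests the current line only (prefix it with :g/^/ to apply it to every line). The match is anchored so that lines shorter than 25 characters, where strpart() returns an empty string, are not deleted:
:if strpart( getline("."), 24, 1) =~# '^[HKZ]$' | delete | endif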