Vim: how to move a CSV column

Suppose a classic CSV like this:
1,2,3,4,5,6,7
1,2,3,4,5,6,7
1,2,3,4,5,6,7
I was trying to move the 6th column of each line to the beginning of the line using a simple and elegant one-liner.
Is there a way to achieve this using something like :
g/,/norm 5n<?>d0P
I don't know what to put in place of <?> to select the word right after the 5th comma.

A little modification of the global command does it:
:g/./exec 'normal 5f,lvld0P'
:g .............. globally
/./ ............. on each line that has something
exec ............ execute the following normal-mode command
5f, ............. jump to the fifth ,
l ............... move onto the number
vl .............. select the number and its comma
d ............... cut to the unnamed register
0 ............... jump to the beginning of the line
P ............... paste the content of the default register (before the cursor)
Update
Instead of selecting just the number and the next comma, we select up to the next comma. This is a more generic solution and avoids issues with columns that contain numbers of two or more digits.
g/./exec 'normal 5f,ldf,0P'
Using GNU awk
awk -i inplace -F, -v OFS="," '{print $6,$1,$2,$3,$4,$5,$7}' target-file
-i inplace ............. edit the file in place, no temp file needed
-F ..................... input field separator
-v OFS ................. output field separator
Calling awk from vim
:%! awk -F, -v OFS="," '{print $6,$1,$2,$3,$4,$5,$7}'
% ............... every line of the current buffer
! ............... filter through an external command
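Outside of Vim, the same filter can be checked directly in the shell (sample data from the question):

```shell
# -F sets the input field separator, OFS the output one;
# printing $6 first moves the 6th column to the front
printf '1,2,3,4,5,6,7\n1,2,3,4,5,6,7\n1,2,3,4,5,6,7\n' |
awk -F, -v OFS=',' '{print $6,$1,$2,$3,$4,$5,$7}'
# each line becomes: 6,1,2,3,4,5,7
```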

Related

find lines existing in one file and not in another, based on a portion of the line

I have two files A.dat and B.dat.
A.dat
112381550RSAP002839002C00000000020200600000110102020-05-26
112539961RSAP002839002C00000000020200700000140102020-05-26
140823748RSAP002839002C00000000020210200000050102020-05-26
110604754RSAP002839002C00000000020200600000110102020-05-26
B.dat
112381550RSAP002839002C00000000020200600000000102020-05-26
112539961RSAP002839002C00000000020200700000000102020-05-26
119A06559RSAP002839002C00000000020210100000000102020-05-26
119231672RSAP002839002C00000000020200900000000102020-05-26
118372226RSAP002839002C00000000020200800000000102020-05-26
I want to find records in B.dat that do not exist in A.dat, based on the first 22 characters of each line.
the output should be below
119A06559RSAP002839002C00000000020210100000000102020-05-26
119231672RSAP002839002C00000000020200900000000102020-05-26
118372226RSAP002839002C00000000020200800000000102020-05-26
I tried using grep like below:
grep -Fvxf B.dat A.dat > c.dat
But I didn't find a way to compare only that portion of the data.
Could you please try the following.
awk 'FNR==NR{array[substr($0,1,22)];next} !(substr($0,1,22) in array)' A.dat B.dat
Explanation: a detailed breakdown of the above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition if FNR==NR then do following.
array[substr($0,1,22)] ##Creating an array whose index is the first 22 characters of the current line.
next ##next will skip all further statements from here.
}
!(substr($0,1,22) in array) ##If the current line's first 22 characters are NOT in array, then print the current line.
' A.dat B.dat ##Mentioning Input_file names here.
I would use the following method based on awk:
awk '{s=substr($0,1,22)}(FNR==NR){a[s];next}!(s in a)' A.dat B.dat
This ensures that you will always match the first 22 characters.
It essentially does the following: every time a line is read (regardless of which file it comes from), it creates a little string s containing the first 22 characters of the line. While processing the first file (FNR==NR) it stores the string in an array a; while processing the second file, it checks if that string is a member of a and, if not, prints the line.
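The logic is easy to verify on a trimmed-down version of the two files (a quick shell check, using a couple of records from the question):

```shell
cat > A.dat <<'EOF'
112381550RSAP002839002C00000000020200600000110102020-05-26
112539961RSAP002839002C00000000020200700000140102020-05-26
EOF
cat > B.dat <<'EOF'
112381550RSAP002839002C00000000020200600000000102020-05-26
119A06559RSAP002839002C00000000020210100000000102020-05-26
EOF

# Only the 119A06559... record survives: its first 22 characters
# never occur in A.dat
awk '{s=substr($0,1,22)}(FNR==NR){a[s];next}!(s in a)' A.dat B.dat
```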
You could also attempt a grep-based solution, but this could lead to false positives, depending on what your input looks like:
cut -c1-22 A.dat | grep -vFf - B.dat
This, however, could match the first 22 characters of the lines of A.dat anywhere in the lines of B.dat (not necessarily at the first 22 characters).
You can do this with just grep and colrm as follows (a filename of "-" is understood as stdin and you can use that with "-f"):
colrm 23 < A.dat | grep -F -v -f - B.dat
If you're not 100% sure those 22-character patterns are going to match only at the starts of lines, you need to add a '^' to each line of output from colrm and elide the "-F" flag from grep's flags, like so:
colrm 23 < A.dat | sed -e 's/^/\^/;' | grep -v -f - B.dat
If the order of the output is unimportant, here's a grep-free method using bash, sort, and GNU uniq. Listing A.dat twice guarantees that every key from A.dat occurs at least twice, so uniq -u, which keeps only records whose first 22 characters are unique, can only emit records found solely in B.dat:
sort {A,A,B}.dat | uniq -uw 22
...or in POSIX shell:
sort A.dat A.dat B.dat | uniq -uw 22
Output of either method:
118372226RSAP002839002C00000000020200800000000102020-05-26
119231672RSAP002839002C00000000020200900000000102020-05-26
119A06559RSAP002839002C00000000020210100000000102020-05-26

I want to remove multiple lines of text on Linux

Just like this.
Before:
1
19:22
abcde
2
19:23
3
19:24
abbff
4
19:25
abbc
After:
1
19:22
abcde
3
19:24
abbff
4
19:25
abbc
I want to remove any section having no alphabetic line, like section 2.
I think I should use perl or sed, but I don't know how to do it.
I tried the following, but it didn't work:
sed 's/[0-9]\n[0-9]\n%s\n//'
sed is for doing s/old/new/ on individual lines, that is all. For anything else you should be using awk:
$ awk -v RS= -v ORS='\n\n' '/[[:alpha:]]/' file
1
19:22
abcde
3
19:24
abbff
4
19:25
abbc
The above is simply this:
RS= tells awk the input records are separated by blank lines.
ORS='\n\n' tells awk the output records must also be separated by blank lines.
/[[:alpha:]]/ searches for and prints records that contain alphabetic characters.
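A quick way to see paragraph mode in action (a reduced version of the sample input):

```shell
printf '1\n19:22\nabcde\n\n2\n19:23\n\n3\n19:24\nabbff\n' |
awk -v RS= -v ORS='\n\n' '/[[:alpha:]]/'
# keeps the records containing letters; the "2 / 19:23" record is dropped
```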
Simple enough in Perl. The secret is to put Perl in "paragraph mode" by setting the input record separator ($/) to an empty string. Then we only print records if they contain a letter.
#!/usr/bin/perl
use strict;
use warnings;
# Paragraph mode
local $/ = '';
# Read from STDIN a record (i.e. paragraph) at a time
while (<>) {
# Only print records that include a letter
print if /[a-z]/i;
}
This is written as a Unix filter, i.e. it reads from STDIN and writes to STDOUT. So if it's in a file called filter, you can call it like this:
$ filter < your_input_file > your_output_file
Alternatively this is a simple command line script in Perl (-00 is the command line option to put Perl into paragraph mode):
$ perl -00 -ne'print if /[a-z]/' < your_input_file > your_output_file
If there's exactly one blank line after each paragraph you can use a long awk one-liner (three patterns, so probably not a one-liner actually):
$ echo '1
19:22
abcde
2
19:23
3
19:24
abbff
4
19:25
abbc
' | awk '/[^[:space:]]/ { accum = accum $0 "\n" } /^[[:space:]]*$/ { if(on) print accum $0; on = 0; accum = "" } /[[:alpha:]]/ { on = 1 }'
1
19:22
abcde
3
19:24
abbff
4
19:25
abbc
The idea is to accumulate non-blank lines, setting a flag once an alphabetical character is found; on a blank input line, flush the whole accumulated paragraph if that flag is set, then reset accum to the empty string and the flag to zero.
(Note that if the last line of input is not necessarily empty you might need to add an END block that checks if currently there's a paragraph unflushed and flush it as needed.)
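Such an END block might look like this (a sketch, assuming the file may end without a trailing blank line; the printf calls reproduce the original's output spacing):

```shell
# Note: the input ends right after "abbc", with no final blank line
printf '1\n19:22\nabcde\n\n2\n19:23\n\n4\n19:25\nabbc' |
awk '
  /[^[:space:]]/   { accum = accum $0 "\n" }          # accumulate non-blank lines
  /^[[:space:]]*$/ { if (on) printf "%s\n", accum; on = 0; accum = "" }
  /[[:alpha:]]/    { on = 1 }                         # paragraph contains a letter
  END              { if (on) printf "%s", accum }     # flush the last, unterminated paragraph
'
```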
This might work for you (GNU sed):
sed ':a;$!{N;/^$/M!ba};/[[:alpha:]]/!d' file
Gather up lines delimited by an empty line or end-of-file and delete the latest collection if it does not contain an alpha character.
This presupposes that the file format is fixed as in the example. To be more accurate use:
sed -r ':a;$!{N;/^$/M!ba};/^[1-9][0-9]*\n[0-9]{2}:[0-9]{2}\n[[:alpha:]]+\n?$/!d' file
Similar to the solution of Ed Morton but with the following assumptions:
The text blocks consist of 2 or 3 lines.
If there is a third line, it contains characters from any alphabet.
In essence, under these conditions we only need to check for the presence of a third field:
awk 'BEGIN{RS="";ORS="\n\n";FS="\n"}(NF>2)' file
or similar without BEGIN:
awk -v RS= -v ORS='\n\n' -F '\n' '(NF>2)' file
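The condition has to select the records that do have a third field (NF > 2); a quick check on a reduced sample:

```shell
printf '1\n19:22\nabcde\n\n2\n19:23\n\n3\n19:24\nabbff\n' |
awk -v RS= -v ORS='\n\n' -F '\n' 'NF > 2'
# only the three-line records (the ones with an alphabetic third line) survive
```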

sed to print two different words alternately

I have a requirement to print two different words alternately around each line in the file.
For example,
ABCD
EFGH
IGKL
MNOP
In the above scenario, I want to print ab and /ab alternately, like below:
ab
ABCD
/ab
ab
EFGH
/ab
ab
IGKL
/ab
ab
MNOP
/ab
I want this line by line, not in a horizontal format. I know sed 's|^[[:blank:]]*$|</ab>|' is almost what I need, but I don't know how to apply it. Please, someone, help me.
With GNU sed
sed -e 'i\ab' -e 'a\/ab' infile
How does this work?
On each line:
first, insert ab before it with 'i\ab'
then, append /ab after it with 'a\/ab'
You must use two separate commands with '-e' to do that.
You can't use sed 'i\ab;a\/ab' because the first command, i (insert), can't tell where its text argument ends, so it takes the whole rest of the line.
So the inserted text would be ab;a/ab before each line.
Another way to do it, portable to any sed, is
sed -e 'i\
ab
a\
/ab' infile
If you are ok with awk, then the following may help you:
awk -v start="ab" -v end="/ab" '{print start ORS $0 ORS end}' Input_file
In case you need to save the output into Input_file itself, append > temp_file && mv temp_file Input_file to the above command.
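For the sample input, the awk version produces exactly the requested wrapping:

```shell
printf 'ABCD\nEFGH\n' |
awk -v start="ab" -v end="/ab" '{print start ORS $0 ORS end}'
# ab
# ABCD
# /ab
# ab
# EFGH
# /ab
```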

vi, sed or awk: on every line in a text file, replace 9 characters starting at position 75

I have a huge file. From line 3 to the end of the file minus one line, starting at character position 75 on each line, I need to change the nine characters there to 123456789.
Any thoughts or suggestions? I can't do a search-and-replace: the existing characters per line are not duplicates, so there's nothing to search on.
The joys of hiding PII data.
In vim, you can do this:
%s/\(^.\{74\}\)\@<=........./123456789/
which does a lookbehind of 74 characters anchored at the beginning of the line, and replaces the following nine characters with your string.
Let's consider this test file:
$ cat testfile
.........-.........-.........-.........-.........-.........-.........-....ReplaceMeKeep
.........-.........-.........-.........-.........-.........-.........-....OldData..Keep
Using sed
This replaces the nine characters starting at column 75 with 123456789:
$ sed -E 's/(.{74}).{0,9}/\1123456789/' testfile
.........-.........-.........-.........-.........-.........-.........-....123456789Keep
.........-.........-.........-.........-.........-.........-.........-....123456789Keep
Using awk
This puts the new string in place of the nine characters starting at position 75:
$ awk '{print substr($0,1,74) "123456789" substr($0,75+9)}' testfile
.........-.........-.........-.........-.........-.........-.........-....123456789Keep
.........-.........-.........-.........-.........-.........-.........-....123456789Keep
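The question also restricts the change to line 3 through the next-to-last line, which none of the commands above handle. One hedged way to add that range (a sketch: the `last` variable is mine, computed up front with wc so sed can address `3,last-1`; it assumes the file has enough lines for the range to make sense):

```shell
last=$(wc -l < testfile)   # total number of lines in the file
# apply the substitution only on lines 3 through (last - 1)
sed -E "3,$((last - 1)) s/(.{74}).{0,9}/\1123456789/" testfile
```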

grep to search data in first column

I have a text file with two columns.
Product Cost
Abc....def 10
Abc.def 20
ajsk,,lll 04
I want to search for products that start with "Abc" and end with "def", and then sum the Cost for those entries.
I have used :
grep "^Abc|def$" myfile
but it is not working
Use awk. cat myfile | awk '{print $1}' | grep query
If you can use awk, try this:
text.txt
--------
Product Cost
Abc....def 10
Abc.def 20
ajsk,,lll 04
With only awk:
awk '$1 ~ /^Abc.*def$/ { SUM += $2 } END { print SUM } ' test.txt
Result: 30
With grep and awk:
grep "^Abc.*def.*[0-9]*$" test.txt | awk '{SUM += $2} END {print SUM}'
Result: 30
Explanation:
awk reads each line and matches the first column with a regular expression (regex)
The first column has to start with Abc, followed by anything (zero or more characters), and end with def
If such a match is found, the 2nd column is added to the SUM variable
After reading all lines, the variable is printed
Grep extracts each line that starts with Abc, followed by anything, followed by def, followed by anything, followed by digits (zero or more) at the end of the line. Those lines are piped to awk, which adds the 2nd column to SUM for each line it receives. After reading all the lines received, it prints the SUM variable.
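Both versions are easy to check against the sample file:

```shell
printf 'Product Cost\nAbc....def 10\nAbc.def 20\najsk,,lll 04\n' > test.txt
# sum the 2nd field of lines whose 1st field starts with Abc and ends with def
awk '$1 ~ /^Abc.*def$/ { SUM += $2 } END { print SUM }' test.txt
# 30
```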
Thanks, edited. Do you want the command like this?
grep "^Abc.*def *.*$"
If you don't want to use cat, and also show the line numbers:
awk '{print $1}' filename | grep -n keyword
If applicable, you may consider the caret ^: grep -E '^foo|^bar' will match text at the beginning of the string, and column one is always located at the beginning of the string.
Regular expression > POSIX basic and extended
^ Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
