Add new column based on value of an existing column - linux

I am trying to transform a delimited file into a table of data in Linux. The meaning of the values in certain columns depends on the value in a separate column. How can I create additional columns based on that column's value?
Depending on the value of column 2 (00 or 01), the interpretation of columns 3 and 4 differs. So if I had the following values:
A1,00,N1,T1
A1,01,N2,T2
A2,00,N3,T3
A2,01,N4,T4
The expected results should be as follows. Notice how I now have two new columns.
A1,00,N1,T1,N2,T2
A2,00,N3,T3,N4,T4

$ awk -F, ' #1
{A[$1] = A[$1] FS $3 FS $4} #2
END {for(i in A) print i FS "00" A[i]} #3
' file
A1,00,N1,T1,N2,T2
A2,00,N3,T3,N4,T4
Set Field Separator to comma.
On every line, set Array[first-column] to its current value followed by the third and fourth columns.
At the end, for every index, print the index name, a comma, the string "00", and the value of that index.
The final value of A["A1"] is ,N1,T1,N2,T2; the leading comma is why no extra separator is needed between "00" and A[i] in the print.
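If column 2 should not be hardcoded, here is a minimal variant (an untested sketch) that remembers the first column-2 value seen for each key instead of printing the literal "00":
$ awk -F, '
!($1 in B) {B[$1] = $2}              # remember the first column-2 value per key
{A[$1] = A[$1] FS $3 FS $4}          # append columns 3 and 4, as before
END {for(i in A) print i FS B[i] A[i]}
' file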

Related

awk : merge multiple rows into one row per first column record value

I have a text file which contains some records like the ones below:
100,a
b
c
101,d,e
f
102,g
103,h
104,i
j
k
So some rows start with a number and some start with a string, and I want to merge each run of string rows into the numbered row that precedes them, like below:
100,a,b,c
101,d,e,f
102,g
103,h
104,i,j,k
How can I use awk to do this?
Thanks
You can do something like:
awk '/^[0-9]/{if(buf){print buf};buf=$0}/^[a-zA-Z]/{buf=buf","$0}END{print buf}' yourfile.txt
This will:
Check if the current line starts with a number /^[0-9]/
If so, it will print out whatever is stored in the variable buf, provided that variable has a value: if(buf){print buf}
It will then reset the variable buf to the value of the current line buf=$0
If the current line starts with a letter /^[a-zA-Z]/
Then it will add the current line to the value in the variable buf with a comma separator buf=buf","$0
Finally when it reaches the end of the file, it prints out whatever is left in the buf variable. END{print buf}
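The same one-liner, spread over multiple lines so the two patterns and the END block are easier to see (identical logic):
awk '
/^[0-9]/    { if (buf) print buf; buf = $0 }   # line starts a new record: flush the previous one
/^[a-zA-Z]/ { buf = buf "," $0 }               # continuation line: append with a comma
END         { print buf }                      # flush the final record
' yourfile.txt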

Find if the first 10 digits of two columns on csv file are matched in bash

I have a file (names.csv) which contains two columns; the values are separated by a comma:
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
e123456777-anything,e123456999-anything
These columns hold values with a 10-digit unique identifier followed by some extra junk (-anything).
I want to see if the two columns have matching prefixes!
To verify the values on first and second column I use:
cat /home/names.csv | parallel --colsep ',' echo column 1 = {1} column 2 = {2}
This prints the values. Because the values are hex digits, it is cumbersome to verify them one by one just by reading. Is there any way to check whether the first 10 digits of each column pair are exact matches? They might contain special characters!
Expected output (example, but anything that says the columns are matched or not can work):
Matches (including first line):
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
Non-matches
e123456777-anything,e123456999-anything
Here's one way using awk. It prints every line where the first 10 characters of the first two fields match.
% cat /tmp/names.csv
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
e123456777-anything,e123456999-anything
% awk -F, 'substr($1,1,10)==substr($2,1,10)' /tmp/names.csv
,
a123456789-anything,a123456789-anything
b123456789-anything,b123456789-anything
c123456789-anything,c123456789-anything
d123456789-anything,d123456789-anything
e123456789-anything,e123456789-anything
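If you want every line labeled instead of only seeing the matches, a small variation on the same idea (again assuming the file is /tmp/names.csv):
% awk -F, '{ print (substr($1,1,10)==substr($2,1,10) ? "match:    " : "no match: ") $0 }' /tmp/names.csv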

Edit values in one column in 4,000,000 row CSV file

I have a CSV file I am trying to edit, to add a numeric ID-type column with unique integers from 1 to approximately 4,000,000. Some of the fields already have an ID value, so I was hoping I could just sort those and then fill in starting at the largest value + 1. However, I cannot open this file for editing in Excel because of its size (I can only see the maximum of 1,048,000 or whatever rows). Is there an easy way to do this? I am not familiar with coding, so I was hoping there was a way to do it manually, similar to Excel's fill-series feature.
Thanks!
Also: I know there are threads on how to edit a large CSV file, but I was hoping for help with this specific task. Thanks!
I basically want to sort the rows by idnumber and then add unique IDs to the rows without an ID value.
Screenshot of file
One way, using Notepad++ and a plugin named SQL:
Load the CSV in Notepad++
Enter the query SELECT a+1,b,c FROM data
Hit 'start'
When starting with a file like this:
a,b,c
1,2,3
4,5,6
7,8,9
The results after look like this:
SQL Plugin 1.0.1025
Query : select a+1,b,c from data
Sourcefile : abc.csv
Delimiter : ,
Number of hits: 3
===================================================================================
Query result:
2,2,3
5,5,6
8,8,9
Or, in words, the first column is incremented by 1.
2nd solution, using gawk, downloaded from https://www.klabaster.com/freeware.htm#mawk:
D:\TEMP>type abc.csv
a,b,c
1,2,3
4,5,6
7,8,9
D:\TEMP>gawk "BEGIN{ FS=OFS=\",\"; getline; print $0 }{ print $1+1,$2,$3 }" abc.csv
a,b,c
2,2,3
5,5,6
8,8,9
(g)awk is a tool which reads a file line by line. The line is then accessible via $0, and the parts of the line via $1,$2,$3,... according to a separator.
This separator is set in my example (FS=OFS=\",\";) in the BEGIN section, which is executed only once per input file. Do not get confused by the \": the script is placed between double quotes, and a value (like the one assigned to OFS) is quoted with double quotes too, so those inner quotes need to be escaped as \".
The getline; print $0 takes care of the first line of the CSV, which typically holds the column names.
Then, for every line, this piece of code print $1+1,$2,$3 prints the first column incremented by 1, followed by the second and third columns.
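On a Linux shell the same script can be written between single quotes, which avoids the \" escaping entirely:
gawk 'BEGIN{ FS=OFS=","; getline; print $0 }{ print $1+1,$2,$3 }' abc.csv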
To extend this second example:
gawk "BEGIN{ FS=OFS=\",\"; getline; print $0 }{ print ($1<5?$1+1:$1),$2,$3 }" abc.csv
The ($1<5?$1+1:$1) checks whether the value of $1 is less than 5 ($1<5); if true it returns $1+1, otherwise $1. Or, in words, it will only add 1 if the current value is less than 5.
With your data you end up with something like this (untested!):
gawk "BEGIN{ FS=OFS=\",\"; getline; a=42; print $0 }{ if($4+0==0){ a++ }; print ($4<=0?$a:$1),$2,$3 }" input.csv
a=42 to set the initial value for the column values which needs to be update (you need to change this to the correct value )
The if($4+0==0){ a++ } will increment the value of a when the fourth column is empty or 0 (the $4+0 converts empty values like "" to the numeric value 0), and ($4+0==0?a:$4) then prints a in place of the missing ID.
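To find the correct starting value for a, you could first scan the ID column for its maximum (a sketch, assuming the idnumber sits in the fourth column as above):
D:\TEMP>gawk "BEGIN{ FS=\",\" } NR>1 && $4+0>max { max=$4+0 } END{ print max }" input.csv
Setting a to that maximum means the first filled-in ID becomes max+1, because a is incremented before it is used.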

How to edit the left hand column and replace with values in Linux?

I have the following text file:
.txt file
In the left-hand column all the values are '0'. Is there a way to change only the left-hand column, replacing all the zeros with the value 15? I can't use find-and-replace, as other columns contain '0' values which must not be altered, and it also can't be done manually because the file contains 10,000 lines. I'm wondering if this is possible from the command line or with a script.
Thanks
Using awk:
awk '$1 == 0 { $1 = 15 } 1' file.txt
Replaces the first column with 15 on each line only if the original value is 0.
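awk prints the result to standard output rather than editing the file in place, so to keep the change you would redirect to a new file and then replace the original (a sketch; the file names are placeholders):
awk '$1 == 0 { $1 = 15 } 1' file.txt > file_fixed.txt && mv file_fixed.txt file.txt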

match 2 files using awk when first file as key has multiple values

I have 2 files A & B.
And I'm using this command for matching:
awk -F"\t" 'NR==FNR {a[$1]=$3","$2; next}; $1 in a {print $0a[$1]}' A B
The problem is that file A has repeated entries in column 1, so there will be many values for a single key in array 'a'. Will this command print all the values, or only the first value present in array a?
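With the command as written, each repeated key in A overwrites the previous value of a[$1], so only the value from the last matching line of A is printed for each key. If all values are needed, one option (a sketch along the same lines, still assuming tab-separated files) is to concatenate them per key, separated here by a semicolon:
awk -F"\t" '
NR==FNR { a[$1] = ($1 in a) ? a[$1] ";" $3 "," $2 : $3 "," $2; next }  # collect every value for each key
$1 in a { print $0 a[$1] }                                             # append all collected values
' A B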
