Edit values in one column in 4,000,000 row CSV file - excel

I have a CSV file I am trying to edit to add a numeric ID-type column with unique integers from 1 to approximately 4,000,000. Some of the rows already have an ID value, so I was hoping I could sort on that column and then fill in new IDs starting at the largest existing value + 1. However, I cannot open this file to edit in Excel because of its size (Excel only shows the first 1,048,576 rows). Is there an easy way to do this? I am not familiar with coding, so I was hoping there was a way to do it manually, similar to Excel's fill-series feature.
Thanks!
Also: I know there are threads on how to edit a large CSV file in general, but I was hoping for help with this specific task. Thanks!
Edit: I basically want to sort the rows based on idnumber and then add unique IDs to the rows that are missing that value.
Screenshot of file

One way, using Notepad++ and a plugin named SQL:
Load the CSV in Notepad++
Open the SQL plugin and enter the query: SELECT a+1,b,c FROM data
Hit 'Start'
When starting with a file like this:
a,b,c
1,2,3
4,5,6
7,8,9
The results after look like this:
SQL Plugin 1.0.1025
Query : select a+1,b,c from data
Sourcefile : abc.csv
Delimiter : ,
Number of hits: 3
===================================================================================
Query result:
2,2,3
5,5,6
8,8,9
Or, in words, the first column is incremented by 1.
A second solution uses gawk, downloaded from https://www.klabaster.com/freeware.htm#mawk:
D:\TEMP>type abc.csv
a,b,c
1,2,3
4,5,6
7,8,9
D:\TEMP>gawk "BEGIN{ FS=OFS=\",\"; getline; print $0 }{ print $1+1,$2,$3 }" abc.csv
a,b,c
2,2,3
5,5,6
8,8,9
(g)awk is a tool which reads a file line by line. Each line is then accessible via $0, and the parts of the line via $1, $2, $3, ..., split on a separator.
This separator is set in my example (FS=OFS=\",\";) in the BEGIN section, which is executed only once, before the input file is read. Do not be confused by the \": the whole script is enclosed in double quotes (Windows cmd syntax), and the separator value is a double-quoted string too, so its quotes need to be escaped as \".
The getline; print $0 takes care of the first line of the CSV, which typically holds the column names: it is read and printed unchanged.
Then, for every remaining line, print $1+1,$2,$3 increments the first column and prints the second and third columns unchanged.
To extend this second example:
gawk "BEGIN{ FS=OFS=\",\"; getline; print $0 }{ print ($1<5?$1+1:$1),$2,$3 }" abc.csv
The ($1<5?$1+1:$1) checks whether the value of $1 is less than 5 ($1<5); if true, it returns $1+1, otherwise $1. Or, in words, it only adds 1 if the current value is less than 5.
With your data you end up with something like this (untested!):
gawk "BEGIN{ FS=OFS=\",\"; getline; a=42; print $0 }{ if($4+0==0){ a++ }; print ($4<=0?$a:$1),$2,$3 }" input.csv
a=42 sets the initial value for the ID counter; you need to change this to the largest ID already present in your file, so that the first generated ID is largest + 1. This also assumes the ID sits in the fourth column; adjust $4 if it is elsewhere.
The if($4+0==0){ a++; $4=a } increments a and writes it into the fourth column whenever that column is empty (the $4+0 converts an empty value like "" to the numeric value 0). Rows that already have an ID are printed unchanged.
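For illustration, here is how that behaves on a tiny made-up sample (the column names, the position of the ID column, and the starting value a=100 are invented for the example):
D:\TEMP>type input.csv
a,b,c,id
1,2,3,
4,5,6,100
7,8,9,
D:\TEMP>gawk "BEGIN{ FS=OFS=\",\"; getline; a=100; print $0 }{ if($4+0==0){ a++; $4=a }; print }" input.csv
a,b,c,id
1,2,3,101
4,5,6,100
7,8,9,102
Rows that already have an ID (here 100) are left alone; empty ID fields receive 101, 102, ... continuing from the starting value.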

Related

awk : merge multiple rows into one row per first column record value

I have a text file which contains some records like the ones below:
100,a
b
c
101,d,e
f
102,g
103,h
104,i
j
k
So some rows start with a number and some start with a string, and I want to merge each run of string rows into the preceding numbered row, ordered by number, like below:
100,a,b,c
101,d,e,f
102,g
103,h
104,i,j,k
How can I use awk to do this?
Thanks
You can do something like:
awk '/^[0-9]/{if(buf){print buf};buf=$0}/^[a-zA-Z]/{buf=buf","$0}END{print buf}' yourfile.txt
This will:
Check if the current line starts with a number: /^[0-9]/
If so, it first prints out what is stored in the variable buf, provided that variable already has a value: if(buf){print buf}
It then resets buf to the current line: buf=$0
If the current line starts with a letter (/^[a-zA-Z]/),
it appends the current line to buf with a comma separator: buf=buf","$0
Finally, when it reaches the end of the file, it prints whatever is left in buf: END{print buf}
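For reference, running this against the sample above should produce exactly the expected output:
$ awk '/^[0-9]/{if(buf){print buf};buf=$0}/^[a-zA-Z]/{buf=buf","$0}END{print buf}' yourfile.txt
100,a,b,c
101,d,e,f
102,g
103,h
104,i,j,k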

How to edit the left hand column and replace with values in Linux?

I have the following text file:
Screenshot of the .txt file
In the left-hand column all the values are '0'. Is there a way to change only the left-hand column so that all of its zeros are replaced with the value 15? I can't use find-and-replace because other columns also contain '0' values which must not be altered, and it can't be done manually because the file contains 10,000 lines. I'm wondering if this is possible from the command line or with a script.
Thanks
Using awk:
awk '$1 == 0 { $1 = 15 } 1' file.txt
This replaces the first column with 15 on each line, but only when the original value is 0; the trailing 1 is an always-true pattern that makes awk print every (possibly modified) line.
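For example, on a small made-up sample (whitespace-separated, since the command relies on awk's default field separator):
$ cat file.txt
0 3 0 7
0 5 2 0
$ awk '$1 == 0 { $1 = 15 } 1' file.txt
15 3 0 7
15 5 2 0
The zeros in the other columns are untouched. For a comma-separated file you would also need -F, and OFS="," as in the other answers here.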

Add new column based on value of an existing column

I am trying to transform a delimited file into a table of data in Linux. The meaning of the values in certain columns depends on the value in a separate column. How can I create additional columns based on the value of that column?
Depending on the value of column 2 (00 or 01), the interpretation of columns 3 and 4 differs. So if I had the following values:
A1,00,N1,T1
A1,01,N2,T2
A2,00,N3,T3
A2,01,N4,T4
The expected results should be as follows. Notice how I now have two new columns.
A1,00,N1,T1,N2,T2
A2,00,N3,T3,N4,T4
$ awk -F, ' #1
{A[$1] = A[$1] FS $3 FS $4} #2
END {for(i in A) print i FS "00" A[i]} #3
' file
A1,00,N1,T1,N2,T2
A2,00,N3,T3,N4,T4
#1 sets the field separator to comma.
#2, on every line, appends the third and fourth columns (each preceded by the separator) to Array[first-column].
#3, at the end, for every index, prints the index name, a comma, the string "00", and the accumulated value for that index.
The end value of A["A1"] is ,N1,T1,N2,T2 (note the leading comma, which is why no separator is needed between "00" and A[i]).
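One caveat: for (i in A) does not guarantee any particular output order. If the input order matters, a variant of the same idea that remembers the order in which keys first appear could look like this (a sketch, untested):
awk -F, '!($1 in A){order[++n]=$1} {A[$1]=A[$1] FS $3 FS $4}
END {for(k=1;k<=n;k++) print order[k] FS "00" A[order[k]]}' file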

Awk matching values of first two columns and printing in blank field

I have a csv file which looks like below:
2212,A1,
2212,A1,128
2307,B1,
2307,B1,107
How can I copy the value of the 3rd column into the place of the missing values in the 3rd column when the values of the first two columns are the same? E.g. the first two columns of the first two rows are the same, so the value of the 3rd column of the second row should automatically be printed in the missing place in the third column of the first row.
expected output:
2212,A1,128
2212,A1,128
2307,B1,107
2307,B1,107
Please help, as I couldn't even think of a solution, and there are millions of values like this in my file.
If you first sort the file in reverse order, the rows with data precede the empty rows:
$ sort -r file
2307,B1,107
2307,B1,
2212,A1,128
2212,A1,
Then use the following awk to process the output of sort:
$ sort -r file | awk 'NR>1 && match(prev,$0) {$0=prev} {prev=$0} 1'
2307,B1,107
2307,B1,107
2212,A1,128
2212,A1,128
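Note that match(prev,$0) treats the current line as a regular expression to look for inside the previous line. That is fine for plain alphanumeric data like this, but it could misbehave if fields ever contain regex metacharacters. A stricter literal prefix test, a sketch of the same idea, would be:
$ sort -r file | awk 'NR>1 && index(prev,$0)==1 {$0=prev} {prev=$0} 1'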
awk -F, '{a[$1FS$2]++;b[$1FS$2]=$NF}END{for (i in b) {for(j=1;j<=a[i];j++) print i FS b[i]}}' file
This alternative counts how many rows share each first-two-columns pair (a[...]) and remembers the last-seen value of the final column for that pair (b[...]); at the end it reprints each pair that many times with the remembered value. Note that it relies on the row carrying the value appearing last for its pair, and that for (i in b) does not preserve the input order.

How can I use awk to modify this field

I am using awk to create a .cue sheet for a long mp3 from a list of track start times, so the input may look like this:
01:01:00-Title-Artist
01:02:00:00-Title2-Artist2
Currently, I am using "-" as the field separator so that I can capture the start time, Artist and Title for manipulation.
The first time can be used as is in a cue sheet. The second time needs to be converted to 62:00:00 (the cue sheet cannot handle hours). What is the best way to do this? If necessary, I can force all of the times in the input file to have "00:" in the hours section, but I'd rather not do this if I don't have to.
Ultimately, I would like to have time, title and artist fields with the time field having a number of minutes greater than 60 rather than an hour field.
fedorqui's solution is valid: just pipe the output into another instance of awk. However, if you want to do it inside one awk process, you can do something like:
awk 'split($1,a,":")==4 { $1 = a[1] * 60 + a[2] ":" a[3] ":" a[4]}
1' FS=- OFS=- input
The split() works on the first field only. If it yields four elements, the action rewrites the first field in the desired format, folding the hours into the minutes.
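Run against the sample input, this should produce the same output as the solution below:
$ awk 'split($1,a,":")==4 { $1 = a[1] * 60 + a[2] ":" a[3] ":" a[4]}
1' FS=- OFS=- input
01:01:00-Title-Artist
62:00:00-Title2-Artist2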
Like this, for example:
$ awk -F: '{if (NF>3) $0=($1*60+$2)FS$3FS$4}1' file
01:01:00-Title-Artist
62:00:00-Title2-Artist2
In case a line contains 4 or more fields when split on :, it joins the 1st and 2nd fields with the rule 60*1st + 2nd. FS is the field separator and is set to : at the start (-F:).
