Add a variable to a column in a CSV file - linux

I have a large file (~10GB) and I want to duplicate that file 10 times but each time add a variable to the first column:
for i in (1, 10):
var = (i-1) * 1000
# add var to the first column of the file and save the file as file(i).csv
So far I have tried:
#!/bin/bash
for i in {1..10}
do
t=1
j=$(( $i - t ))
s=1000
person_id=$(( j * add ))
awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1+$person_id} {print $0}' file.csv > file$i.csv
done
but the column values do not change.

Awk variables are different from shell variables.
Replace:
awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1+$person_id} {print $0}' file.csv > file$i.csv
With:
awk -F"," -v id="$person_id" 'BEGIN{OFS=","} NR>1{$1=$1+id} {print $0}' file.csv > "file$i.csv"
This uses the -v option to define an awk variable id whose value is the value of the shell variable person_id. (Note that there is a second bug in the shell arithmetic: person_id=$(( j * add )) references an undefined variable add where s was presumably intended, so person_id is always 0.)
Because , is not a shell-active character, the code can be simplified. Also, changing the location of the definition of OFS can further shorten the code:
awk -F, -v id="$person_id" 'NR>1{$1+=id} 1' OFS=, file.csv > "file$i.csv"
Lastly, we replaced {print $0} with the cryptic shorthand 1. (This works because awk interprets 1 as a logical condition which it evaluates to true and, since no action was supplied, awk will perform the default action which is to print the line.)
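Putting the pieces together, the whole loop might look like the following minimal sketch (the arithmetic is folded into one expression, and a tiny throwaway demo input stands in for the real 10GB file.csv):

```shell
#!/bin/bash
# Throwaway demo input standing in for the real 10GB file.csv
printf 'id,name\n1,alice\n2,bob\n' > file.csv

for i in {1..10}
do
    # (i-1)*1000, computed in one step
    person_id=$(( (i - 1) * 1000 ))
    awk -F, -v id="$person_id" 'NR>1{$1+=id} 1' OFS=, file.csv > "file$i.csv"
done
```

With this input, file1.csv is an unchanged copy (0 is added to each id) and file3.csv, for example, has 2000 added to every id.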

Related

awk with bash variable along with condition to be checked

I need to search and replace a pattern in a file, changing
[ec2_server]
server_host=something
[list_server]
server_host=old_name
to
[ec2_server]
server_host=something
[list_server]
server_host=new_name
I'm able to get it working with
awk '/\[list_server]/ { print; getline; $0 = "server_host=new_name" } 1'
But I'm trying to parameterize the search pattern, the parameter name to change and the parameter value to change.
PATTERN_TO_SEARCH=[list_server]
PARAM_NAME=server_host
PARAM_NEW_VALUE=new_name
But it is not working when I parameterize and pass the variables to awk
awk -v patt=$PATTERN_TO_SEARCH -v parm=$PARAM_NAME -v parmval=$PARAM_NEW_VALUE '/\patt/ { print; getline; $0 = "parm=parmval" } 1' file.txt
You have two instances of the same problem: you're trying to use a variable name inside a string value. Awk can't read your mind: it can't intuit that sometimes when you write "HOME" you mean "print the value of the variable HOME" and other times you mean "print the word HOME".
We need to make two separate changes:
First, to use a variable in your search pattern, you can use
syntax like this:
awk -v patt='some text' '$0 == patt {print}'
(Note that here we're using an equality match, ==; you can also use a regular expression match, ~, but in this particular case that would only complicate things).
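To see why: in a regex, [list_server] is a bracket expression that matches any single one of the characters l, i, s, t, _, e, r, v. A quick illustration with a throwaway demo file:

```shell
printf '[list_server]\nl\n' > demo.txt
# Equality: only the literal "[list_server]" line matches
awk -v patt='[list_server]' '$0 == patt' demo.txt
# Regex: the pattern matches any line containing one of the
# bracketed characters anywhere, so BOTH lines match here
awk -v patt='[list_server]' '$0 ~ patt' demo.txt
```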
With your example file content, running:
awk -v patt='[list_server]' '$0 == patt {print}' file.txt
Produces:
[list_server]
Next, when you write $0 = "parm=parmval", you're setting $0 to the literal string parm=parmval. If you want to perform variable substitution, consider using sprintf():
awk \
-v patt="$PATTERN_TO_SEARCH" \
-v parm="$PARAM_NAME" \
-v parmval="$PARAM_NEW_VALUE" \
'
$0 == patt { print; getline; $0 = sprintf("%s=%s", parm, parmval) } 1
' file.txt
Which gives us:
[ec2_server]
server_host=something
[list_server]
server_host=new_name
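The sprintf() substitution can also be checked in isolation:

```shell
# sprintf() interpolates the awk variables into the format string
awk -v parm='server_host' -v parmval='new_name' \
    'BEGIN { print sprintf("%s=%s", parm, parmval) }'
```

(awk's printf would do the same without the intermediate string.)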
Here is an approach that avoids getline (which is generally discouraged because of its edge cases). Instead, we match the section header, set a flag (a custom variable), and then print the following line accordingly, using a regex built from the passed-in value.
Along with matching and printing the new value, we also set the field separator to = for the whole Input_file, so we can fetch the current value and replace it with the new one. With this approach you don't need to pass a variable holding server_host at all: it is already present in Input_file, so we take it from there.
First, an awk solution with the values set in awk variables directly, checked with a regex in the main program:
awk -v var="list_server" -v newVal="NEW_VALUE" '
BEGIN{ FS=OFS="=" }
$0 ~ "^\\[" var "\\]$"{
found=1
print
next
}
found{
print $1 OFS newVal
found=""
next
}
1
' Input_file
Or, an awk solution that takes the values from shell variables and then uses the same regex inside awk:
varS="list_server" ##Shell variable
newvalue="NEW_VALUE" ##Shell variable
awk -v var="$varS" -v newVal="$newvalue" '
BEGIN{ FS=OFS="=" }
$0 ~ "^\\[" var "\\]$"{
found=1
print
next
}
found{
print $1 OFS newVal
found=""
next
}
1
' Input_file
$ awk -v pat="$PATTERN_TO_SEARCH" -v parm="$PARAM_NAME" -v parmval="$PARAM_NEW_VALUE" '
f{$0=parm"="parmval; f=0} $0==pat{f=1} 1
' file
[ec2_server]
server_host=something
[list_server]
server_host=new_name
This assumes the "${PARAM_NAME}" row immediately follows the search-pattern row:
_P2S_='[list_server]'
_PNM_='server_host'
_PNV_='new_name'
echo "${...input...}" | gtee >( gpaste - | gcat -b >&2; echo ) | gcat - |
{m,n,g}awk -v __="${_P2S_}=${_PNM_}=${_PNV_}" -F= 'BEGIN {
$(_-=_)=__;___= $(_ = NF); FS ="^"(OFS = $--_ FS)
__= $-(_+=-_--) } (NR-_)< NF ? ($NF =___)^(_-=_) :_=NR*(-!!_)^(__!=$!_)' |
gcat -b | gcat -n | ecp
1 [ec2_server]
2 server_host=something
3 [list_server]
4 server_host=old_name
1 1 [ec2_server]
2 2 server_host=something
3
4 3 [list_server]
5 4 server_host=new_name

Count number of ';' in column

I use the following command to count the number of ; characters in the first line of a file:
awk -F';' '(NR==1){print NF;}' $filename
I would like to do the same for every line in the file, that is, count the number of ; on each line.
What I have :
$ awk -F';' '(NR==1){print NF;}' $filename
11
What I would like to have :
11
11
11
11
11
11
A straightforward way to count ; per line is:
awk '{print gsub(/;/,"&")}' Input_file
To skip empty lines, try:
awk 'NF{print gsub(/;/,"&")}' Input_file
To follow the OP's approach instead, subtract 1 from the value of NF:
awk -F';' '{print (NF-1)}' Input_file
OR
awk -F';' 'NF{print (NF-1)}' Input_file
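A quick check of the gsub() counter against a throwaway file (three lines with 2, 2 and 0 semicolons):

```shell
printf 'a;b;c\n;;\nno semicolons here\n' > Input_file
# gsub() returns the number of replacements made; replacing each ";"
# with itself ("&") leaves the line intact and yields the count
awk '{print gsub(/;/,"&")}' Input_file
```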
I'd say you can solve your problem with the following:
awk -F';' '{if (NF) {a += NF-1;}} END {print a}' test.txt
You want to keep a running count of all the occurrences made (variable a).
As NF will return the number of fields, which is one more than the number of separators, you'll need to subtract 1 for each line. This is the NF-1 part.
However, you don't want to add -1 for empty lines, where NF is 0. The if (NF) guard skips those. (A non-empty line without any separator has NF == 1, so it harmlessly contributes 0.)
Here's a (perhaps contrived) example:
$ cat test.txt
;;
; ; ; ;;
; asd ;;a
a ; ;
$ awk -F';' '{if (NF) {a += NF-1;}} END {print a}' test.txt
12
Notice the empty line at the end (to test against the "no separator" case).
A different approach using tr and wc:
$ tr -cd ';' < file | wc -c
42
Your code returns a number one more than the number of semicolons; NF is the number of fields you get from splitting on a semicolon (so for example, if there is one semicolon, the line is split in two).
If you want to sum this number over all lines, that's easy:
awk -F ';' '{ sum += NF-1 } END { print sum }' "$filename"
If the number of fields is consistent, you could also just count the number of lines and multiply;
awk -F ';' 'END { print NR * (NF-1) }' "$filename"
But that's obviously wrong if you can't guarantee that all lines contain exactly the same number of fields.

Search array index with double quotes string using awk

File1:
"1"|data|er
"2"|text|rq
""|test2|req
"3"|test4|teq
File2:
1
2
3
Expected Output should be (file3.txt)
"1"|data|er
"2"|text|rq
"3"|test4|teq
awk -F''$Delimeter'' '{print $1}' file1.txt | awk '{gsub(/"/, "", $1); print $1}' | awk 'NF && !seen[$1]++' | sort -n > file2.txt
I am able to extract the ids 1, 2, 3 from file1, remove the double quotes, and write them to file2, but now I need to search for these ids ("1", "2", "3") in file1.txt. The problem is that the search does not match them because of the double quotes in the file. My attempt:
awk 'BEGIN {FS=OFS="|"} NR==FNR{a[$1]; next} \"$1\" in a' file2.txt file1.txt > file3.txt
Could you please try the following:
awk -v s1='"' '
FNR==NR{
val=s1 $0 s1
a[val]
next
}
($1 in a)
' Input_file2 FS='|' Input_file1
Explanation: Adding detailed explanation for above code.
awk -v s1='"' ' ##Starting awk program from here and creating variable s1 whose value is ".
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when first Input_file named Input_file2 is being read.
val=s1 $0 s1 ##Creating variable val: the current line wrapped in s1 (a double quote) on each side.
a[val] ##Creating an array named a whose index is variable val.
next ##next will skip all further statements from here.
} ##Closing FNR==NR BLOCK of this code here.
($1 in a) ##Checking condition if $1 of current line is present in array a then print that line of Input_file1.
' Input_file2 FS='|' Input_file1 ##Mentioning Input_file2 then setting FS as pipe and mentioning Input_file1 name here.
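A self-contained run of this approach, with throwaway copies of the two input files:

```shell
printf '1\n2\n3\n' > Input_file2
printf '%s\n' '"1"|data|er' '"2"|text|rq' '""|test2|req' '"3"|test4|teq' > Input_file1
awk -v s1='"' '
FNR==NR{ a[s1 $0 s1]; next }
($1 in a)
' Input_file2 FS='|' Input_file1 > file3.txt
```

file3.txt then contains the three quoted-id lines from Input_file1.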
Let's say that your input is
"1"|data|er
"2"|text|rq
""|test2|req
"3"|test4|teq
And from this input you want two kinds of data:
The ids
The lines containing an id
The easiest way to achieve this is, I think, to first get the lines that have an id, and then retrieve the ids from those.
To do so :
$ awk -F'|' '$0 ~ /"[0-9]+"/' input1 >input3; cat input3
"1"|data|er
"2"|text|rq
"3"|test4|teq
$ sed 's/^"//; s/".*$//' input3 >input2; cat input2
1
2
3
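Both files can also be produced in a single awk pass, writing the matching lines and the stripped ids at the same time (a sketch reusing the same throwaway file names):

```shell
printf '%s\n' '"1"|data|er' '"2"|text|rq' '""|test2|req' '"3"|test4|teq' > input1
# Keep lines whose first field is a quoted number; also emit the
# bare id (quotes stripped from a copy) to the second output file
awk -F'|' '$1 ~ /^"[0-9]+"$/ {
    print > "input3"
    id = $1; gsub(/"/, "", id)
    print id > "input2"
}' input1
```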

splitting content by ID (1st column) and generate new data file based on format

I want to split the content into multiple files, naming each with a date format as shown below:
Test_<ID name><ddmmyyyy>.CSV
How can I split according to the format?
Previously I used:
awk -F"," 'NR>1 {print > "Test_<ID name><ddmmyyyy>.CSV_"$1".csv"}' Original.CSV
Edit
I got there with
awk -v DATE="$(date +"%d%m%Y")" -F"," 'BEGIN{OFS=","}NR>1 { gsub(/"/,"",$1); print > "Assignment_"$1"_"DATE".csv"}' Test_01012020.CSV
but then I want to include my column name too. How?
You could try using shell variables in your awk command:
_DATE=` date '+%d%m%Y' `
_ID=my_value
F_EXT=${_ID}${_DATE}
# here "var" is set to the value defined from the shell "F_EXT"
awk -v var=${F_EXT} -F"," 'NR>1 {print > "Test_" var ".CSV_"$1".csv"}' Original.CSV
(It wasn't clear where your "ID name" comes from, so here it's my_value.)
Edit
If you want to include your column name, read it in the NR==1 case:
awk -v DATE="$(date +"%d%m%Y")" -F"," 'BEGIN{OFS="," } NR==1 {COLUMN_NAME=$1} NR>1 { gsub(/"/,"",$1); print > "Assignment_"$1"_"COLUMN_NAME"_"DATE".csv"}' a.txt
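A minimal end-to-end check of that edited one-liner, assuming a tiny two-row sample file and that the header's first column supplies the column name:

```shell
printf '%s\n' 'person,score' '"7",95' '"8",88' > a.txt
awk -v DATE="01012020" -F"," 'BEGIN{OFS=","}
NR==1 {COLUMN_NAME=$1}
NR>1  { gsub(/"/,"",$1); print > ("Assignment_" $1 "_" COLUMN_NAME "_" DATE ".csv") }' a.txt
```

This creates Assignment_7_person_01012020.csv and Assignment_8_person_01012020.csv; the redirection target is parenthesized so the string concatenation is unambiguous across awk implementations.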

Want to use a variable inside if outside of it in awk command

awk < MigStat.stat -F: '{ if ($a == "load")b=8; } $1 == "D0001" && FNR== '$c' {print $b}'
Now I want to print the value in column $b, but instead it prints the whole line.
What I want is to use the value of b set inside the if statement to print the value in that column.
To pass in values use the -v option:
awk -F: -v a=2 -v c=10 '$a=="load"{b=8}$1=="D0001"&&FNR==c{print $b}' MigStat
Notes:
To pass in variables use the -v option of awk.
awk reads file you don't need to use redirection.
The structure of awk is condition{block}.
awk initialises variable to 0 so if b hasn't been set in the block $a=="load"{b=8} then {print $b} will be {print $0} where $0 is the whole line.
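That last point is easy to verify: with b unset, $b evaluates to $0:

```shell
# b is uninitialized, so it is treated as 0 and $b means $0 (the whole line)
echo 'a:b:c' | awk -F: '{print $b}'
```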
You don't need to redundantly assign b=8; if the script is really as you posted it, you don't need $b at all.
if $a, $c are shell variables:
awk -F: -v a="$a" -v c="$c" 'a=="load"&&$1=="D0001"&&NR==c{print $8;exit}' MigStat.stat
and it is better to call exit after print $8, to stop awk from processing further lines.
