rearranging column based on condition - linux

I have a *.csv file. with value as below
"ASDP02","8801942183589"
"ASDP06","8801939151023"
"CSDP04","8801963981740"
"ASDP09","8801946305047"
"ASDP12","8801941195677"
"ASDP05","8801922826186"
"CSDP08","8801983008938"
"ASDP04","8801944346555"
"CSDP11","8801910831518"
or sometimes the value is as below
"8801989353984","KSDP05"
"8801957608165","ASDP11"
"8801991455848","CSDP10"
"8801981363116","CSDP07"
"8801921247870","KSDP07"
"8801965386240","CSDP06"
"8801956293036","KSDP10"
"8801984383904","KSDP11"
"8801944211742","ASDP09"
I just want to put the numeric value (e.g. 8801989353984) always in 1st column. Is it possible using BASH script?

Sed is also your friend here
Input
cat 41189347
"ASDP02","8801942183589"
"ASDP06","8801939151023"
"CSDP04","8801963981740"
"ASDP09","8801946305047"
"ASDP12","8801941195677"
"ASDP05","8801922826186"
"CSDP08","8801983008938"
"ASDP04","8801944346555"
"CSDP11","8801910831518"
Script
sed -E 's/^("[[:alpha:]]+.*"),("[[:digit:]]+")$/\2,\1/' 41189347
Output
"8801942183589","ASDP02"
"8801939151023","ASDP06"
"8801963981740","CSDP04"
"8801946305047","ASDP09"
"8801941195677","ASDP12"
"8801922826186","ASDP05"
"8801983008938","CSDP08"
"8801944346555","ASDP04"
"8801910831518","CSDP11"

awk to the rescue!
$ awk -F, -v OFS=, '$1~/[A-Z]/{t=$2;$2=$1;$1=t}1' file
if first field has alpha chars, swap first and second columns and print.

Bash can do the work but awk might be a better choice for rearrange your file:
sample.csv:
"ASDP02","8801942183589"
"8801944211742","ASDP09"
command:
awk -F, 'BEGIN{OFS=","}{$1=$1;if(substr($1, 2, length($1) - 2) + 0 == substr($1, 2, length($1) - 2)){print $1,$2}else{print $2,$1}}' sample.csv
substr($1, 2, length($1) - 2) + 0 == substr($1, 2, length($1) - 2) checks the column is numeric or not. If it is, print the original line otherwise switch column1 and column2
Output:
"8801942183589","ASDP02"
"8801944211742","ASDP09"

You can create a pure bash script to generate other file which has the structure you need:
#!/bin/bash
csv_file="/path/to/your/csvfile"
output_file="/path/to/output_file"
#Optional
rm -rf "${output_file}"
readarray -t LINES < <(cat < "${csv_file}" 2> /dev/null)
for item in "${LINES[#]}"; do
if [[ $item =~ ^\"([0-9A-Z]+)\"\,\"([0-9]+)\" ]]; then
echo "\"${BASH_REMATCH[2]}\",\"${BASH_REMATCH[1]}\"" >> "${output_file}"
else
echo "$item" >> "${output_file}"
fi
done
This works even if your file is "mixed" I mean with some lines in the right format and other lines in the bad format.

The following commands assume that the cells in the CSV files do not contain newlines and commas. Otherwise, you should write a more complicated script in Perl, PHP, or other programming language capable of parsing CSV files properly. But Bash, definitely, is not appropriate for this task.
Perl
perl -F, -nle '#F = reverse #F if $F[0] =~ /^"\d+"$/;
print join(",", #F)' file
Beware, If the cells contain newlines, or commas, use Perl's Text::CSV module, for instance. Although it is a simple task in Perl, it goes beyond the scope of the current question.
The command splits the input lines by commas (-F,) and stores the result into #F array, for each line. The items in the array are reversed, if the first field $F[0] matches the regular expression. You can also swap the items this way: ($F[0], $F[1]) = ($F[1], $F[0]).
Finally, the joins the array items with commas, and prints to the standard output.
If you want to edit the file in-place, use -i option: perl -i.backup -F, ....
AWK
awk -F, -vOFS=, '/^"[0-9]+",/ {print; next}
{ t = $1; $1 = $2; $2 = t; print }' file
The input and output field separators are set to , with -F, and -vOFS=,.
If the line matches the pattern /^"[0-9]+",/ (the line begins with a "numeric" CSV column), the script prints the record and advances to the next record. Otherwise the next block is executed.
In the next block, it swaps the first two columns and prints the result to the standard output.
If you want to edit the file in-place, see answers to this question.

Related

Add an index column to a csv using awk

How can I add an index to a csv file using awk? For example lets assume I have a file
data.txt
col1,col2,col3
a1,b1,c1
a2,b2,c2
a3,b3,c3
I would like to add another column, which is the index. Basically I would like an output of
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
I was trying to use awk '{for (i=1; i<=NF; i++) print $i}' but it does not seem to be working right. And what is the best way to just add a comma for the first line but add incrementing number and a comma to the rest of the lines?
You may use this awk solution:
awk '{print (NR == 1 ? "" : NR-2) "," $0}' file
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
Use this Perl one-liner:
perl -pe '$_ = ( $. > 1 ? ($. - 2) : "" ) . ",$_";' data.txt > out.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
$. : Current input line number.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlvar: Perl predefined variables
I would use GNU AWK for this task following way, let file.txt content be
col1,col2,col3
a1,b1,c1
a2,b2,c2
a3,b3,c3
then
awk 'BEGIN{OFS=","}{print NR==1?"":i++,$0}' file.txt
gives output
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
Explanation: firstly I inform GNU AWK that output field separator (OFS) is ,, so arguments to print will be concatenated using that character. Then for each line I use so-called ternary operator i.e. condition?valueiftrue:valueiffalse to decide what will be 1st argument, for 1st line (NR==1) it is empty string for all else it is counter which will be first returned then increased by 1, 2nd argument to print is always whole original line ($0).
(tested in gawk 4.2.1)
gawk 'sub("^",substr(++_",",3^(NF~NR)))' FS='^$' \_=-2
mawk 'sub("^",++_+NF ? _",":",")' FS='^$' \_=-2
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3

How to parse a specific column or data without losing content from other columns/rows after parsing?

I have the following output to grep the value in this case "225". This value is actually a variable $pd so it could change depending on users input" It could be integer numbers or an alphanumeric character case-insensitive exact match. Example if value of variable is "225" then a "0225" or "11225" its not a valid output from the file Im reading it.
Input File:
10.20.223.10|2000-H1|1/1/2|DeviceX_4021|LG
10.20.223.10|2000-H1|1/1/3|Undiscoverable|Unkwn
10.20.225.10|2000-H1|1/1/5|DeviceZ_2050|LG
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
10.20.223.10|2000-H1|1/1/8|DeviceY_01225_|Kenmore
10.20.225.10|2000-H1|1/1/8|DeviceY_2250_|Kenmore
Desired Output File:
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
If user input is "lg"; then it should output the line without not ignoring it because the input file has "lg" in uppercase. (This part is already fixed on the script).
Desired Output:
10.20.223.10|2000-H1|1/1/2|DeviceX_4021|LG
10.20.225.10|2000-H1|1/1/5|DeviceZ_2050|LG
$ awk -F'|' -v n='225' '$4 ~ n' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
or if you don't want a partial match (e.g. against 1225) then one way is:
$ awk -F'|' -v n='225' '$4 ~ ("(^|[^0-9])" n "([^0-9]|$)")' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
or:
$ awk -F'|' -v n='225' '$4 ~ ("(^|_)" n "(_|$)")' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
There are other possibilities too. The right solution depends on the requirements you haven't told us about and will pass or fail when using input other then you've shown us yet.
awk
awk -F"|" -v var="[A-Za-z].225_" '$4 ~ var{print}'
sed
sed -n '/[A-Za-z].225./p'
grep
grep '[A-Za-z].225.'
Output
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
Using sed:
sed -n '/^\([^|]*\|\)\{3\}[^|]*225/p' < input
Explanation:
the -n option disables automatic output at the end of each sed cycle
the pattern matches arbitrary contents of the first three (\{3\}) columns of data via the \(parenthesized\) pattern [^|]*\| -- any number of non-delmiter characters followed by the column delimiter
it matches additional input at the beginning of the fourth column, but not spanning columns, with a similar subexpression: [^|]*
then comes the literal text you want to match
the p command after the pattern causes the line to be printed to sed's output in the event that it matches the pattern
There's almost certainly an awk solution too, but in Perl it's this:
$ perl -aF'\|' -ne '$F[3] =~ 225 and print' < input
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
-a: Autosplit the input into array #F
-F'\|: Set the autosplit delimiter to |
-n: Run code for each line in the input file
-e: Here's the code to run
$F[3]: The 4th element of the autosplit array #F
=~: Regex match
and print: Print the input line if the regex matches
Update: You can get the string you're interested in from a command line parameter by assigning it in a BEGIN block.
$ perl -aF'\|' -ne 'BEGIN { $x = shift } $F[3] =~ $x and print' 225 < input

how to modify a text file that every line has same number of columns?

I've got a text file which includes several lines. Every line has words which are separated with a comma. The number of words in lines are not the same. I would like with the help of the awk command to make every line have same number of column. For example, if the text file is as follows:
word1, text, help, test
number, begin
last, line, line
I would like the output be as the following which every line has same size in column with an extra null word:
word1, text, help, test
number, begin, null, null
last, line, line, null
I tried the following code:
awk '{print $0,Null}' file.txt
$ awk 'BEGIN {OFS=FS=", "}
NR==FNR {max=max<NF?NF:max; next}
{for(i=NF+1;i<=max;i++) $i="null"}1' file{,}
first scan to find the max number of columns and fill the missing entries in the second round. If the first line contains all the columns (header perhaps), you can change to
$ awk 'BEGIN {OFS=FS=", "}
NR==1 {max=NF}
{for(i=NF+1;i<=max;i++) $i="null"}1' file
file{,} is expanded by bash to file file, a neat trick not to repeat the filename (and eliminates possible typos).
Passing twice through the input file, using getline on first pass:
awk '
BEGIN {
OFS=FS=", "
while(getline < ARGV[1]) {
if (NF > max) {max = NF}
}
close(ARGV[1])
}
{ for(i=NF+1; i<=max; i++) $i="null" } 1
' file.txt
Alternatively, keeping it simple by running awk twice...
#!/bin/bash
infile="file.txt"
maxfields=$(awk 'BEGIN {FS=", "} {if (NF > max) {max = NF}} END{print max}' "$infile" )
awk -v max="$maxfields" 'BEGIN {OFS=FS=", "} {for(i=NF+1;i<=max;i++) $i="null"} 1' "$infile"
Use these Perl one-liners. The first one goes through the file and finds the max number of fields to use. The second one goes through the file and prints the input fields, padded at the end by the null strings:
export num_fields=$( perl -F'/,\s+/' -lane 'print scalar #F;' in_file | sort -nr | head -n1 )
perl -F'/,\s+/' -lane 'print join ", ", map { defined $F[$_] ? $F[$_] : "null" } 0..( $ENV{num_fields} - 1 );' in_file > out_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array #F on whitespace or on the regex specified in -F option.
-F'/,\s+/' : Split into #F on comma with whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Using awk to delete multiple lines using argument passed on via function

My input.csv file is semicolon separated, with the first line being a header for attributes. The first column contains customer numbers. The function is being called through a script that I activate from the terminal.
I want to delete all lines containing the customer numbers that are entered as arguments for the script. EDIT: And then export the file as a different file, while keeping the original intact.
bash deleteCustomers.sh 1 3 5
Currently only the last argument is filtered from the csv file. I understand that this is happening because the output file gets overwritten each time the loop runs, restoring all previously deleted arguments.
How can I match all the lines to be deleted, and then delete them (or print everything BUT those lines), and then output it to one file containing ALL edits?
delete_customers () {
echo "These customers will be deleted: "$#""
for i in "$#";
do
awk -F ";" -v customerNR=$i -v input="$inputFile" '($1 != customerNR) NR > 1 { print }' "input.csv" > output.csv
done
}
delete_customers "$#"
Here's some sample input (first piece of code is the first line in the csv file). In the output CSV file I want the same formatting, with the lines for some customers completely deleted.
Klantnummer;Nationaliteit;Geslacht;Title;Voornaam;MiddleInitial;Achternaam;Adres;Stad;Provincie;Provincie-voluit;Postcode;Land;Land-voluit;email;gebruikersnaam;wachtwoord;Collectief ;label;ingangsdatum;pakket;aanvullende verzekering;status;saldo;geboortedatum
1;Dutch;female;Ms.;Josanne;S;van der Rijst;Bliek 189;Hellevoetsluis;ZH;Zuid-Holland;3225 XC;NL;Netherlands;JosannevanderRijst#dayrep.com;Sourawaspen;Lae0phaxee;Klant;CZ;11-7-2010;best;tand1;verleden;-137;30-12-1995
2;Dutch;female;Mrs.;Inci;K;du Bois;Castorweg 173;Hengelo;OV;Overijssel;7557 KL;NL;Netherlands;InciduBois#gustr.com;Hisfireeness;jee0zeiChoh;Klant;CZ;30-8-2015;goed ;geen;verleden;188;1-8-1960
3;Dutch;female;Mrs.;Lusanne;G;Hijlkema;Plutostraat 198;Den Haag;ZH;Zuid-Holland;2516 AL;NL;Netherlands;LusanneHijlkema#dayrep.com;Digum1969;eiTeThun6th;Klant;Achmea;12-2-2010;best;mix;huidig;-335;9-3-1973
4;Dutch;female;Dr.;Husna;M;Hoegee;Tiendweg 89;Ameide;ZH;Zuid-Holland;4233 VW;NL;Netherlands;HusnaHoegee#fleckens.hu;Hatimon;goe5OhS4t;Klant;VGZ;9-8-2015;goed ;gezin;huidig;144;12-8-1962
5;Dutch;male;Mr.;Sieds;D;Verspeek;Willem Albert Scholtenstraat 38;Groningen;GR;Groningen;9711 XA;NL;Netherlands;SiedsVerspeek#armyspy.com;Thade1947;Taexiet9zo;Intern;CZ;17-2-2004;beter;geen;verleden;-49;12-10-1961
6;Dutch;female;Ms.;Nazmiye;R;van Spronsen;Noorderbreedte 180;Amsterdam;NH;Noord-Holland;1034 PK;NL;Netherlands;NazmiyevanSpronsen#jourrapide.com;Whinsed;Oz9ailei;Intern;VGZ;17-6-2003;beter;mix;huidig;178;8-3-1974
7;Dutch;female;Ms.;Livia;X;Breukers;Everlaan 182;Veenendaal;UT;Utrecht;3903
Try this in loop..
awk -v variable=$var '$1 != variable' input.csv
awk - to make decision based on columns
-v - to use a variable into a awk command
variable - store the value for awk to process
$var - to search for a specific string in run-time
!= - to check if not exist
input.csv - your input file
It's awk's behavior, when you use -v it can will work with variable on run-time and provide an output that doesn't contain the value you passed. This way, you get all the values that are not matching to your variable. Hope this is helpful. :)
Thanks
This bash script should work:
!/bin/bash
FILTER="!/(^"$(echo "$#" | sed -e "s/ /\|^/g")")/ {print}"
awk "$FILTER" input.csv > output.csv
The idea is to build an awk relevant FILTER and then use it.
Assuming the call parameters are: 1 2 3, the filter will be: !/(^1|^2|^3)/ {print}
!: to invert matching
^: Beginning of the line
The input data are in the input.csv file and output result will be in the output.csv file.

Use sed or awk to replace line after match

I'm trying to create a little script that basically uses dig +short to find the IP of a website, and then pipe that to sed/awk/grep to replace a line. This is what the current file looks like:
#Server
123.455.1.456
246.523.56.235
So, basically, I want to search for the '#Server' line in a text file, and then replace the two lines underneath it with an IP address acquired from dig.
I understand some of the syntax of sed, but I'm really having trouble figuring out how to replace two lines underneath a match. Any help is much appreciated.
Based on the OP, it's not 100% clear exactly what needs to replaced where, but here's a a one-liner for the general case, using GNU sed and bash. Replace the two lines after "3" with standard input:
echo Hoot Gibson | sed -e '/3/{r /dev/stdin' -e ';p;N;N;d;}' <(seq 7)
Outputs:
1
2
3
Hoot Gibson
6
7
Note: sed's r command is opaquely documented (in Linux anyway). For more about r, see:
"5.9. The 'r' command isn't inserting the file into the text" in this sed FAQ.
here's how in awk:
newip=12.34.56.78
awk -v newip=$newip '{
if($1 == "#Server"){
l = NR;
print $0
}
else if(l>0 && NR == l+1){
print newip
}
else if(l==0 || NR != l+2){
print $0
}
}' file > file.tmp
mv -f file.tmp file
explanation:
pass $newip to awk
if the first field of the current line is #Server, let l = current line number.
else if the current line is one past #Server, print the new ip.
else if the current row is not two past #Server, print the line.
overwrite original file with modified version.

Resources