As a new Perl user without much experience, I find it hard to understand everything. Below is a portion of a Perl script that is supposed to build a spreadsheet by parsing a text file generated by custom software. After execution, the .xlsx is built by Perl, the product group is populated in each row of the designated column, and the product is inserted in the next column. When I have too many products, they end up being inserted into the next column, and the next, and so on, within the given group.
Ideally I would like the multiple products inserted into the next row of the same product column.
Currently:
===============================
Group | Product_1 | Product_2 |
===============================
Ideally:
====================
Group   | Product1 |
====================
(blank) | Product2 |
====================
(blank) | Product3 |
====================
How would this be accomplished?
Below is a snippet of the current code:
my $product_regex = '^Newly listed product: (.*)$';
if ($line =~ /$product_regex/) {
    $change = $1;
    if ($some_flag == 1) {
        $some_flag = 0;
        push(@{ $product_changes{$group} }, $change);
    }
}
my $format = $workbook->add_format();
$format->set_text_wrap();
my $worksheet = $workbook->add_worksheet( 'Sheet1' );
# Writing 2 column headers
$worksheet->write( 0, 0, 'Group' );
$worksheet->write( 0, 1, 'Product' );
my $row = 1;
for my $key (sort keys %product_changes) {
    # To avoid evaluation from Excel, $key must be placed in quotes
    $worksheet->write( $row, 0, $key, $format );
    $worksheet->write( $row, 1, $product_changes{"$key"}, $format );
    $row++;
}
Not sure if you are looking for help with the parsing or with writing the output.
Here is a way to write the output, assuming you have it parsed into a hash of arrayrefs, group => [ product_change, product_change, ... ]:
my $row = 1;
for my $key (sort keys %product_changes) {
    $worksheet->write($row, 0, $key, $format);
    for my $pc (@{ $product_changes{$key} }) {
        $worksheet->write($row++, 1, $pc, $format);
    }
}
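As an aside, the column-spreading in the original code comes from passing an array reference to write(): Excel::Writer::XLSX treats an array ref as a row of data and calls write_row() internally. A minimal illustration (hypothetical cell positions, for demonstration only):
# write() with an array reference spreads values across one row (write_row behavior)
$worksheet->write( 1, 1, [ 'Product1', 'Product2', 'Product3' ] );      # fills B2:D2
# write_col() fills one column going down instead, another way to get the desired layout
$worksheet->write_col( 1, 1, [ 'Product1', 'Product2', 'Product3' ] );  # fills B2:B4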
So I have an Excel file like this:
Document Number, Qty, Price
1111-01,1,3.00
1112-00A,2,4.00
What I am doing is importing it into PowerShell, then going line by line. If the quantity is ever greater than 1, I have to duplicate that line that many times while changing the quantity to 1 each time and updating the document number so it's unique on each line. I then add to an array so I can export it as an Excel file at the very end.
$report = Import-Excel "pathToFile.xlsx"
$report2 = @()
foreach($line in $report){
    $report2 += $line.PSObject.Copy()
}
$template = @()
foreach($line in $report2){
    ...
    some irrelevant code
    ...
    if($line.Qty -gt 1){
        $line2 = $line.PSObject.Copy()
        $ogInvoice = $line2.'Document Number'.Split("-")[0]
        $invoiceAfter = $line2.'Document Number'.Split("-")[1]
        if($invoiceAfter -match "^.*[A-Z]$"){
            $letter = $invoiceAfter.Substring($invoiceAfter.Length-1,1)
        }else{
            $letter = ""
        }
        $qty = $line2.Qty
        $line2.Qty = 1
        $counterQty = 0
        while($counterQty -lt $qty){
            $invoiceLastTwoNumber = [int]('{0:d2}' -f [int]$invoiceAfter.Substring(0,2)) + $counter
            $line2.'Document Number' = (-join($ogInvoice,"-",$invoiceLastTwoNumber.ToString(),$letter))
            $counter = $counter + 1
            $template += $line2
            $counterQty = $counterQty + 1
        }
    }
}
The problem is that, after checking the progress, the first time I add the line the document number is 1112-50A like it should be; then the next time I add the line into $template, the document number is 1112-51A, but it updates the previously added line.
So I get:
1111-01,1,3.00
1112-51A,1,4.00
1112-51A,1,4.00
Instead of what I want, which is:
1111-01,1,3.00
1112-50A,1,4.00
1112-51A,1,4.00
NOTE: the extra coding like PSObject.Copy() is other stuff I found online, because apparently iterating over $report is more like working with a pointer.
If I understand correctly, you're looking to repeat the current object as many times as .Qty, only if .Qty is greater than 1, and in addition set the Qty property to 1. It also seems like you're looking to increment the last digits of the Document Number property. (The duplicated lines in your current output happen because $line2 is copied once, before the while loop, so you keep appending the same object reference to $template; the copy has to happen inside the loop.)
Leaving aside the extra code you are currently showing us and focusing only on the question being asked, this is how you could accomplish it, using $csv as an example of your source data.
$csv = @'
Document Number,Qty,Price
1111-01,1,3.00
1112-00A,2,4.00
1113-15A,4,5.00
'@ | ConvertFrom-Csv

$re = [regex] '(\d+)(?=[A-Z]$)'
$output = foreach($line in $csv) {
    if($line.Qty -gt 1) {
        $loopCount = $line.Qty
        $line.Qty = 1
        for($i = 0; $i -lt $loopCount; $i++) {
            $newLine = $line.PSObject.Copy()
            $docNumber = $newLine.'Document Number'
            $newLine.'Document Number' = $re.Replace($docNumber, {
                param($s)
                ($i + $s.Groups[1].Value).ToString('D2')
            })
            $newLine
        }
        continue
    }
    $line
}
The expected output from the example $csv would be:
Document Number Qty Price
--------------- --- -----
1111-01 1 3.00
1112-00A 1 4.00
1112-01A 1 4.00
1113-15A 1 5.00
1113-16A 1 5.00
1113-17A 1 5.00
1113-18A 1 5.00
I have 2 text files. file1 contains IDs:
0 ABCD
3 ABDF
4 ACGFR
6 ABCD
7 GFHTRSFS
And file2:
ID001 AB ACGFR DF FD GF TYFJ ANH
ID002 DFR AG ABDF HGT MNJ POI YUI
ID003 DGT JHY ABCD YTRE NHYT PPOOI IUYNB
ID004 GFHTRSFS MJU UHY IUJ POL KUH KOOL
If the second column of file1 matches any entry in file2, then the first column of file2 should be the answer for it.
The output should be like:
0 ID003
3 ID002
4 ID001
6 ID003
7 ID004
(The 2nd column of file1 (ABCD) matches the 3rd row of file2, which has ID003, so ID003 should be the answer for it.)
I have tried examples from other posts too, but somehow they don't match this case.
Any help will be appreciated.
Kind Regards
When trying to match up records from one file with records in another, the idea is to use a hash (also known as an associative array, a set of key-value pairs, or a dictionary) to store the relationship between the first column and the rest of the columns. In effect, create the following relationships:
file1: ABCD     -> 0
       ABDF     -> 3
       ACGFR    -> 4
       ABCD     -> 6
       GFHTRSFS -> 7
file2: AB       -> ID001
       ACGFR    -> ID001
       DF       -> ID001
       ...
       ANH      -> ID001
       DFR      -> ID002
       AG       -> ID002
       ...
       KUH      -> ID004
       KOOL     -> ID004
The actual matching up of records between the files amounts to determining whether both hashes, here file1's and file2's, have a key defined for each file1 record. Here we can see that ACGFR is a key for both, therefore we can match up 4 and ID001, and so on for the rest of the keys.
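As a minimal sketch of that idea (small hypothetical hashes standing in for the parsed files):
my %file1 = ( ABCD => 0, ABDF => 3, ACGFR => 4 );    # name -> number (from file1)
my %file2 = ( ABCD => 'ID003', ACGFR => 'ID001' );   # name -> ID (from file2)
for my $name ( sort keys %file1 ) {
    # A name that is a key in both hashes links a number to an ID
    print "$file1{$name} $file2{$name}\n" if exists $file2{$name};
}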
In perl, we can create a hash by assigning pairs of values:
my %hash = ( foo => 1, bar => 2 );
A hash can also be created using references:
my $hash_ref = { foo => 1, bar => 2 };
Keys can be found using the keys function, and individual values can be extracted:
my $val1 = $hash{ foo }; # regular hash
my $val2 = $hash_ref->{ foo }; # hash reference
Whether a particular key is a member of a hash can be tested using the exists function.
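For example, putting keys and exists together:
my %hash = ( foo => 1, bar => 2 );
my @keys = keys %hash;                          # ('foo', 'bar'), in no particular order
print "foo is present\n" if exists $hash{foo};  # prints: foo is present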
With that background out of the way, here is one way to do this in perl:
matchup_files.pl
#!/usr/bin/env perl
use warnings;
use strict;

my $usage = "usage: $0 file1 file2\n";
my ($file1, $file2) = @ARGV;
for my $file ($file1, $file2) {
    die $usage unless defined $file && -f $file;   # -f checks whether $file is an actual file
}

# Create mappings col2 -> col1
#                 col3 -> col1
#                 col4 -> col1
my $h1 = inverted_hash_file_on_first_column( $file1 );
my $h2 = hash_file_on_first_column( $file2 );

# Try to find matching pairs
my $matches = {};
for my $h1_key ( keys %$h1 ) {
    my $h1_val = $h1->{$h1_key};
    if ( exists $h2->{ $h1_val } ) {
        # We have a match!
        my $num = $h1_key;
        my $id  = $h2->{ $h1_val };
        $matches->{ $num } = $id;
    }
}

# Print them out in numerical order
for my $num ( sort { $a <=> $b } keys %$matches ) {
    my $id = $matches->{$num};
    print join(" ", $num, $id) . "\n";
}

exit 0;   # Success

sub inverted_hash_file_on_first_column {
    my ($file) = @_;
    return _hash_file($file, 1);
}

sub hash_file_on_first_column {
    my ($file) = @_;
    return _hash_file($file, 0);
}

sub _hash_file {
    my ($file, $inverted) = @_;
    my $fhash = {};
    open my $fh, "<", $file or die "Unable to open $file : $!";
    while ( my $line = <$fh> ) {
        my @fields = split /\s+/, $line;   # Split line on whitespace
        my $key = shift @fields;           # First column
        for my $field ( @fields ) {
            if ( $inverted ) {
                die "Duplicated field '$field'" if exists $fhash->{ $key };
                $fhash->{ $key } = $field;
            } else {
                die "Duplicated field '$field'" if exists $fhash->{ $field };
                $fhash->{ $field } = $key;
            }
        }
    }
    return $fhash;
}
output
matchup_files.pl input1 input2
0 ID003
3 ID002
4 ID001
6 ID003
7 ID004
Sorry I wasn't quite sure how to word the question.
This is a follow-on from a previous question here: Take Excel cell content and place it into a formatted .txt file
The Import-XLS function I'm using is from here:
https://gallery.technet.microsoft.com/office/17bcabe7-322a-43d3-9a27-f3f96618c74b
My current code looks like this:
. .\Import-XLS.ps1
$OutFile = ".\OutTest$(get-date -Format dd-MM).txt"
$Content = Import-XLS '.\DummyData.xlsx'
$Content | foreach-object{
    $field1 = $("{0},{1},{2},{3},{4},{5},{6}" -f "Field1", $_."Value1", $_."Value2", $_."Value3", $_."Value4", $_."Value5", $_."Value6")
    $field1.Split(",",[System.StringSplitOptions]::RemoveEmptyEntries) -join ","
    $field2 = $("{0},{1}" -f "Field2", $_."Value1")
    $field2.Split(",",[System.StringSplitOptions]::RemoveEmptyEntries) -join ","
} | Out-File $OutFile
My DummyData is essentially this (I've inserted $null to point out the blank values):
Entries  Value1  Value2  Value3  Value4  Value5  Value6  Value7  Value8
Entry 1  1       2       $null   4       5       6       7       8
Entry 2  $null   B       A       B       A       B       A       B
So I've managed to have the code 'ignore/skip' a null value within a set.
My output looks like this
Field1,1,2,4,5,6
Field2,1
Field1,B,A,B,A,B
Field2
What I would like help with now is how to either remove "Field2" because it has no value, or comment it out using ;.
So my output would look like
Field1,1,2,4,5,6
Field2,1
Field1,B,A,B,A,B
or
Field1,1,2,4,5,6
Field2,1
Field1,B,A,B,A,B
;Field2
Essentially, if a row has no data in any of its fields that are being written for that line, it should be ignored.
Thanks SO MUCH for your help.
EDIT:
I've discovered I need to remove the comma "," between the {0},{1} and use a space instead. So I'm using
$field2 = $("{0} {1}" -f "Field 2", $_."Value1")
$field2 = $field2.Split(" ",[System.StringSplitOptions]::RemoveEmptyEntries)
if ( $field2.Count -le 1) { ";$field2" } else { $field2 -join "`t`t" }
Which works for 'most' of my fields.
However there are 'some' Fields and Values that have spaces in them.
Additionally there are some values like "TEST TEXT".
So now I'm getting
Field1 3,B,A,B,A,B
Field 2 TEST TEXT
Instead of (quotes for clarity)
"Field1" 3,B,A,B,A,B
"Field 2" "TEST TEXT"
I'm happy to just use some kind of exception only for these few fields.
I've tried a few other things, but I end up breaking the IF statement, and it ;comments out fields with values, or doesn't ;comment out fields with no values.
The next code snippet could help:
### … … … ###
$Content | foreach-object{
    $field1 = $("{0},{1},{2},{3},{4},{5},{6}" -f "Field1", $_."Value1", $_."Value2", $_."Value3", $_."Value4", $_."Value5", $_."Value6")
    $auxarr = $field1.Split(",",[System.StringSplitOptions]::RemoveEmptyEntries)
    if ( $auxarr.Count -le 1) { ";$auxarr" } else { $auxarr -join "," }
    $field2 = $("{0},{1}" -f "Field2", $_."Value1")
    $auxarr = $field2.Split(",",[System.StringSplitOptions]::RemoveEmptyEntries)
    if ( $auxarr.Count -le 1) { ";$auxarr" } else { $auxarr -join "," }
} | Out-File $OutFile
Edit to answer the additional (extending) subquestion "I need to remove the comma ',' between the {0},{1} and use a space instead":
$field2 = $("{0},{1}" -f "Field 2", $_."Value1")
### keep comma ↑ ↓ or any other character not contained in data
$field2 = $field2.Split(",",[System.StringSplitOptions]::RemoveEmptyEntries)
if ( $field2.Count -le 1) { ";$field2" } else { $field2 -join "`t`t" }
If your data contains a significant comma, then use the Split(String[], StringSplitOptions) overload of the String.Split method, e.g. as follows:
$burstr = [string[]]'#-#-#'
$field2 = $("{0}$burstr{1}" -f "Field 2", $_."Value1")
$field2 = $field2.Split($burstr,[System.StringSplitOptions]::RemoveEmptyEntries)
if ( $field2.Count -le 1) { ";$field2" } else { $field2 -join "`t`t" }
Trying to iterate through two files. Everything works, although once I get to the negation of my if statement it messes everything up; the only thing that will print is the else statement.
Please disregard any unused variables where defined; I will clean them up afterwards.
#!/usr/bin/perl
#
# Packages and modules
#
use strict;
use warnings;
use version; our $VERSION = qv('5.16.0'); # This is the version of Perl to be used
use Text::CSV 1.32; # We will be using the CSV module (version 1.32 or higher)
# to parse each line
#
# readFile.pl
# Authors: schow04@mail.uoguelph + anilam@mail.uoguelph.ca
# Project: Lab Assignment 1 Script (Iteration 0)
# Date of Last Update: Monday, November 16, 2015.
#
# Functional Summary
# readFile.pl takes in a CSV (comma separated version) file
# and prints out the fields.
# There are three fields:
# 1. name
# 2. gender (F or M)
# 3. number of people with this name
#
# This code will also count the number of female and male
# names in this file and print this out at the end.
#
# The file represents the names of people in the population
# for a particular year of birth in the United States of America.
# Officially it is the "National Data on the relative frequency
# of given names in the population of U.S. births where the individual
# has a Social Security Number".
#
# Commandline Parameters: 1
# $ARGV[0] = name of the input file containing the names
#
# References
# Name files from http://www.ssa.gov/OACT/babynames/limits.html
#
#
# Variables to be used
#
my $EMPTY = q{};
my $SPACE = q{ };
my $COMMA = q{,};
my $femalecount = 0;
my $malecount = 0;
my $lines = 0;
my $filename = $EMPTY;
my $filename2 = $EMPTY;
my @records;
my @records2;
my $record_count = -1;
my $top_number = 0;
my $male_total = 0;
my $male_count = 0;
my @first_name;
my @gender;
my @first_name2;
my @number;
my $count = 0;
my $count2 = 0;
my $csv = Text::CSV->new({ sep_char => $COMMA });
#
# Check that you have the right number of parameters
#
if ($#ARGV != 1) {
    print "Usage: readTopNames.pl <names file> <course names file>\n" or
        die "Print failure\n";
    exit;
}
$filename = $ARGV[0];
$filename2 = $ARGV[1];
#
# Open the input file and load the contents into records array
#
open my $names_fh, '<', $filename
    or die "Unable to open names file: $filename\n";
@records = <$names_fh>;
close $names_fh or
    die "Unable to close: $ARGV[0]\n";   # Close the input file

open my $names_fh2, '<', $filename2
    or die "Unable to open names file: $filename2\n";
@records2 = <$names_fh2>;
close $names_fh2 or
    die "Unable to close: $ARGV[1]\n";   # Close the input file
#
# Parse each line and store the information in arrays
# representing each field
#
# Extract each field from each name record as delimited by a comma
#
foreach my $class_record (@records)
{
    chomp $class_record;
    $record_count = 0;
    $count = 0;
    foreach my $name_record ( @records2 )
    {
        if ($csv->parse($name_record))
        {
            my @master_fields = $csv->fields();
            $record_count++;
            $first_name[$record_count] = $master_fields[0];
            $gender[$record_count] = $master_fields[1];
            $number[$record_count] = $master_fields[2];
            if ($class_record eq $first_name[$record_count])
            {
                if ($gender[$record_count] eq 'F')
                {
                    print("$first_name[$record_count] ($record_count)\n");
                }
                if ($gender[$record_count] eq 'M')
                {
                    my $offset = $count - 2224;
                    print("$first_name[$record_count] ($offset)\n");
                }
            }
        } else {
            warn "Line/record could not be parsed: $records[$record_count]\n";
        }
        $count++;
    }
}
#
# End of Script
#
Adam (187)
Alan (431)
Alejandro (1166)
Alex (120)
Alicia (887)
Ambrose (305)
Caleb (794)
That is sample output from running the code above.
It is correct so far, although if a name is not found in the second file it is supposed to say:
Adam (187)
Alan (431)
Name (0)
Alejandro (1166)
Alex (120)
Alicia (887)
Ambrose (305)
Caleb (794)
That is what the else is supposed to handle: the case where the if statement matched nothing.
else {
    print("$first_name[$record_count] (0)\n");
}
The output that I get when I add that else, to account for the negation, is literally:
Elzie (0)
Emer (0)
Enna (0)
Enriqueta (0)
Eola (0)
Eppie (0)
Ercell (0)
Estellar (0)
It's really tough to help you properly without better information, so I've written this, which looks for each name from the names file in the master data file and displays the associated values.
There's never a reason to write a long list of declarations like that at the top of a program, and you've written way too much code before you started debugging. You should write no more than three or four lines of code before you test that they work and carry on adding to them. You've ended up with 140 lines, most of them comments, that don't do what you want, and you're now lost as to what you should fix first.
I haven't been able to fathom what all your different counters are for, or why you're subtracting a magic 2224 for male records, so I've just printed the data directly from the master file.
I hope you'll agree that it's far clearer with the variables declared when they're required instead of making a huge list at the top of your program. I've dropped the arrays @first_name, @gender and @number because you were only ever using the latest value, so they had no purpose.
#!/usr/bin/perl

use strict;
use warnings;
use v5.16.0;
use autodie;

use Text::CSV;

STDOUT->autoflush;

if ( @ARGV != 2 ) {
    die "Usage: readTopNames.pl <names file> <master names file>\n";
}

my ( $names_file, $master_file ) = @ARGV;

my @names = do {
    open my $fh, '<', $names_file;
    <$fh>;
};
chomp @names;

my @master_data = do {
    open my $fh, '<', $master_file;
    <$fh>;
};
chomp @master_data;

my $csv = Text::CSV->new;

for my $i ( 0 .. $#names ) {

    my $target_name = $names[$i];
    my $found;

    for my $j ( 0 .. $#master_data ) {

        my $master_rec = $master_data[$j];
        my $status = $csv->parse($master_rec);

        unless ( $status ) {
            warn qq{Line/record "$master_rec" could not be parsed\n};
            next;
        }

        my ( $name, $gender, $count ) = $csv->fields;

        if ( $name eq $target_name ) {
            $found = 1;
            printf "%s %s (%d)\n", $name, $gender, $count;
        }
    }

    unless ( $found ) {
        printf "%s (%d)\n", $target_name, 0;
    }
}
output
Adam F (7)
Adam M (5293)
Alan F (9)
Alan M (2490)
Name (0)
Alejandro F (6)
Alejandro M (2593)
Alex F (157)
Alex M (3159)
Alicia F (967)
Ambrose M (87)
Caleb F (14)
Caleb M (9143)
4 changes proposed:
foreach my $class_record (@records)
{
    chomp $class_record;
    $record_count = 0;
    $count = 0;
    # add found - modification A
    my $found = 0;
    foreach my $name_record ( @records2 )
    {
        # should not be here
        #$record_count++;
        if ($csv->parse($name_record))
        {
            my @master_fields = $csv->fields();
            $record_count++;
            $first_name[$record_count] = $master_fields[0];
            $gender[$record_count] = $master_fields[1];
            $number[$record_count] = $master_fields[2];
            if ($class_record eq $first_name[$record_count])
            {
                if ($gender[$record_count] eq 'F')
                {
                    print("$first_name[$record_count] ($record_count)\n");
                }
                if ($gender[$record_count] eq 'M')
                {
                    my $offset = $count - 2224;
                    print("$first_name[$record_count] ($offset)\n");
                }
                # modification B - set found = 1
                $found = 1;
                #last;   # no need to keep looping
                next;    # find the next one if trying to find more than 1
            }
        } else {
            warn "Line/record could not be parsed: $records[$record_count]\n";
        }
        $count++;
    }
    # modification C - report names that were never found
    if ($found) {
    } else {
        print "${class_record} (0)\n";
    }
}
Using any combination of Linux tools (without going into any full-featured programming language), how can I sort this list
A,C 1
C,B 2
B,A 3
into
A,B 3
A,C 1
B,C 2
Not applying for any beauty contest, this seems to come close:
#!/bin/bash
while read one two; do
    one=`echo $one | sed -e 's/,/\n/g' | sort | sed -e '
        1 {h; d}
        $! {H; d}
        H; g; s/\n/,/g;
    '`
    echo $one $two
done | sort
Change the internal field separator, then compare the first two letters with ">":
(
    IFS=" ,";
    while read a b n; do
        if [ "$a" \> "$b" ]; then
            echo "$b,$a $n";
        else
            echo "$a,$b $n";
        fi;
    done;
) <<EOF | sort
A,C 1
C,B 2
B,A 3
EOF
In case somebody is interested: I was not really satisfied with any of the suggestions, probably because I hoped for a solution of just a few lines, and as far as I know no such solution exists.
Anyway, I wrote a utility called ljoin (for left join, as in databases) which does exactly what I was asking for (of course :D).
#!/usr/bin/perl

=head1 NAME

ljoin.pl - Utility to left join files by specified key column(s)

=head1 SYNOPSIS

ljoin.pl [OPTIONS] <INFILE1>..<INFILEN> <OUTFILE>

To successfully join rows one must supply at least one input file and exactly one output file. Input files can be real file names or a pattern, like [ABC].txt or *.in etc.

=head1 DESCRIPTION

This utility merges multiple files into one, using the specified column(s) as a key.

=head2 OPTIONS

=over 4

=item --field-separator=<separator>, -fs <separator>

Specifies the string used to separate columns in a plain file. The default value for this option is the tab symbol.

=item --no-sort-fields, -no-sf

Do not sort columns when creating a key for merging files.

=item --complex-key-separator=<separator>, -ks <separator>

Specifies the string used to separate multiple values in a multikey column. For example, "A B" in one file can be presented as "B A", meaning that this application should understand that these are the same key. The default value for this option is the space symbol.

=item --no-sort-complex-keys, -no-sk

Do not sort complex column values when creating a key for merging files.

=item --include-primary-field, -i

Specifies whether the key used to find matching lines in multiple files should be included in the output file. The first column of the output file will be the key in any case, but in the case of a complex column the value of the first column will be sorted. The default value for this option is false.

=item --primary-field-index=<index>, -f <index>

Specifies the index of the column used for matching lines. You can use multiple instances of this option to specify a multi-column key made of more than one column, like this: "-f 0 -f 1".

=item --help, -?

Get help and documentation.

=back

=cut
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;

my $fieldSeparator      = "\t";
my $complexKeySeparator = " ";
my $includePrimaryField = 0;
my $containsTitles      = 0;
my $sortFields          = 1;
my $sortComplexKeys     = 1;
my @primaryFieldIndexes;

GetOptions(
    "field-separator|fs=s"       => \$fieldSeparator,
    "sort-fields|sf!"            => \$sortFields,
    "complex-key-separator|ks=s" => \$complexKeySeparator,
    "sort-complex-keys|sk!"      => \$sortComplexKeys,
    "contains-titles|t!"         => \$containsTitles,
    "include-primary-field|i!"   => \$includePrimaryField,
    "primary-field-index|f=i@"   => \@primaryFieldIndexes,
    "help|?!"                    => sub { pod2usage(0) }
) or pod2usage(2);

pod2usage(0) if $#ARGV < 1;
push @primaryFieldIndexes, 0 if $#primaryFieldIndexes < 0;

my %primaryFieldIndexesHash;
for (my $i = 0; $i <= $#primaryFieldIndexes; $i++)
{
    # Key on the actual column index so later lookups by column number work
    $primaryFieldIndexesHash{$primaryFieldIndexes[$i]} = 1;
}

print "fieldSeparator      = $fieldSeparator\n";
print "complexKeySeparator = $complexKeySeparator\n";
print "includePrimaryField = $includePrimaryField\n";
print "containsTitles      = $containsTitles\n";
print "primaryFieldIndexes = @primaryFieldIndexes\n";
print "sortFields          = $sortFields\n";
print "sortComplexKeys     = $sortComplexKeys\n";
my $fieldsCount = 0;
my %keys_hash = ();
my %files = ();
my %titles = ();

# Read columns into memory
foreach my $argnum (0 .. ($#ARGV - 1))
{
    # Find files with the specified pattern
    my $filePattern = $ARGV[$argnum];
    my @matchedFiles = glob($filePattern);
    foreach my $inputPath (@matchedFiles)
    {
        open INPUT_FILE, $inputPath or die $!;
        my %lines;
        my $lineNumber = -1;
        while (my $line = <INPUT_FILE>)
        {
            $lineNumber++;
            next if $containsTitles && $lineNumber == 0;

            # Don't use chomp. It doesn't handle unix input files on windows and vice versa
            $line =~ s/[\r\n]+$//g;

            # Skip lines that don't have columns
            next if $line !~ m/($fieldSeparator)/;

            # Split fields and count them (store maximum number of columns in files for later use)
            my @fields = split($fieldSeparator, $line);
            $fieldsCount = $#fields + 1 if $#fields + 1 > $fieldsCount;

            # Sort complex key
            my @multipleKey;
            for (my $i = 0; $i <= $#primaryFieldIndexes; $i++)
            {
                my @complexKey = split($complexKeySeparator, $fields[$primaryFieldIndexes[$i]]);
                @complexKey = sort(@complexKey) if $sortComplexKeys;
                push @multipleKey, join($complexKeySeparator, @complexKey);
            }

            # Sort multiple keys and create the key string
            @multipleKey = sort(@multipleKey) if $sortFields;
            my $fullKey = join $fieldSeparator, @multipleKey;
            $lines{$fullKey} = \@fields;
            $keys_hash{$fullKey} = 1;
        }
        close INPUT_FILE;
        $files{$inputPath} = \%lines;
    }
}
# Open output file
my $outputPath = $ARGV[$#ARGV];
open OUTPUT_FILE, ">" . $outputPath or die $!;
my @keys = sort keys(%keys_hash);

# Leave blank places for key columns
for (my $pf = 0; $pf <= $#primaryFieldIndexes; $pf++)
{
    print OUTPUT_FILE $fieldSeparator;
}

# Print column headers
foreach my $argnum (0 .. ($#ARGV - 1))
{
    my $filePattern = $ARGV[$argnum];
    my @matchedFiles = glob($filePattern);
    foreach my $inputPath (@matchedFiles)
    {
        print OUTPUT_FILE $inputPath;
        for (my $f = 0; $f < $fieldsCount - $#primaryFieldIndexes - 1; $f++)
        {
            print OUTPUT_FILE $fieldSeparator;
        }
    }
}
print OUTPUT_FILE "\n";

# Print merged columns
foreach my $key ( @keys )
{
    print OUTPUT_FILE $key;
    foreach my $argnum (0 .. ($#ARGV - 1))
    {
        my $filePattern = $ARGV[$argnum];
        my @matchedFiles = glob($filePattern);
        foreach my $inputPath (@matchedFiles)
        {
            my $lines = $files{$inputPath};
            for (my $i = 0; $i < $fieldsCount; $i++)
            {
                next if exists $primaryFieldIndexesHash{$i} && !$includePrimaryField;
                print OUTPUT_FILE $fieldSeparator;
                print OUTPUT_FILE $lines->{$key}->[$i] if exists $lines->{$key}->[$i];
            }
        }
    }
    print OUTPUT_FILE "\n";
}
close OUTPUT_FILE;