Reading a tuple from a file with awk - Linux

Hi, I need a script to read the number of eth interrupts from the /proc/interrupts file with awk and find the total number of interrupts per CPU core. Then I want to use them in bash. The content of the file is:
CPU0 CPU1 CPU2 CPU3
47: 33568 45958 46028 49191 PCI-MSI-edge eth0-rx-0
48: 0 0 0 0 PCI-MSI-edge eth0-tx-0
49: 1 0 1 0 PCI-MSI-edge eth0
50: 28217 42237 65203 39086 PCI-MSI-edge eth1-rx-0
51: 0 0 0 0 PCI-MSI-edge eth1-tx-0
52: 0 1 0 1 PCI-MSI-edge eth1
59: 114991 338765 77952 134850 PCI-MSI-edge eth4-rx-0
60: 429029 315813 710091 26714 PCI-MSI-edge eth4-tx-0
61: 5 2 1 5 PCI-MSI-edge eth4
62: 1647083 208840 1164288 933967 PCI-MSI-edge eth5-rx-0
63: 673787 1542662 195326 1329903 PCI-MSI-edge eth5-tx-0
64: 5 6 7 4 PCI-MSI-edge eth5
I am reading this file with awk in this code:
#!/bin/bash
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
print core_count
next
}
/eth/ {
for (i = 2; i <= 2+core_count; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("%d\n", totals[i])
}
' $FILE)
core_count=$(echo $output | cut -d' ' -f1)
output=$(echo $output | sed 's/^[0-9]*//')
totals=(${output// / })
In this approach, I get the total core count and then the total interrupts per core, in order to sort them in my script. But I can only handle the numbers in the totals array like this:
totals[0]=22222
totals[1]=33333
But I need to handle them as tuples together with the names of the CPU cores:
totals[0]=(CPU0,2222)
totals[1]=(CPU1,3333)
I think I must assign the names to an array and read them into bash as tuples in my sed. How can I achieve this?

First of all, there's no such thing as a 'tuple' in bash, and arrays are completely flat: you either have a 'scalar' variable or a one-level array of scalars.
There are a number of approaches to the task you're facing. Either:
If you're using a new enough bash (4.0 or later), you can use an associative array (hash, map, or whatever you like to call it). Then the CPU names will be keys and the numbers will be values;
Create a plain array (a perl-like flattened hash) where even indexes (0, 2, …) hold the keys (CPU names) and odd ones the values;
Create two separate arrays, one with the CPU names and the other with the values;
Create just a single array, with CPU names separated from values by some symbol (e.g. = or :).
Let's cover approach 2 first:
#!/bin/bash
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
for (i = 1; i <= core_count; i++)
names[i-1] = $i
next
}
/eth/ {
for (i = 2; i < 2+core_count; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("%s %d\n", names[i], totals[i])
}
' ${FILE})
core_count=$(echo "${output}" | wc -l)
totals=(${output})
Note a few things I've changed to make the script simpler:
awk now outputs 'cpu-name number', one pair per line, separated by a single space;
the core count is no longer output by awk (to avoid post-processing the output) but instead deduced from the number of lines in the output;
the totals array is created by flattening the output — both spaces and newlines are treated as whitespace and used to separate the values.
The resulting array looks like:
totals=( CPU0 12345 CPU1 23456 ) # ...
To iterate over it, you could use something like (the simple way):
set -- "${totals[@]}"
while [[ $# -gt 0 ]]; do
cpuname=${1}
value=${2}
# ...
shift;shift
done
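Or, without consuming the array, a simple index loop works too (a sketch):
for (( i = 0; i < ${#totals[@]}; i += 2 )); do
    cpuname=${totals[i]}
    value=${totals[i+1]}
    # ...
done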
Now let's modify it for approach 1:
#!/bin/bash
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
for (i = 1; i <= core_count; i++)
names[i-1] = $i
next
}
/eth/ {
for (i = 2; i < 2+core_count; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("[%s]=%d\n", names[i], totals[i])
}
' ${FILE})
core_count=$(echo "${output}" | wc -l)
declare -A totals
eval totals=( ${output} )
Note that:
the awk output format has been changed to suit the associative array semantics,
totals is declared as an associative array (declare -A),
sadly, eval must be used to let bash directly handle the output.
The resulting array looks like:
declare -A totals=( [CPU0]=12345 [CPU1]=23456 )
And now you can use:
echo ${totals[CPU0]}
for cpu in "${!totals[@]}"; do
echo "For CPU ${cpu}: ${totals[${cpu}]}"
done
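If you'd rather avoid eval entirely, a sketch of an alternative is to keep the plain 'name value' output from approach 2 and fill the associative array in a read loop:
declare -A totals
while read -r name value; do
    totals[${name}]=${value}
done <<< "${output}"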
The third approach can be done a number of different ways. Assuming you can allow two reads of /proc/interrupts, you could even do:
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
next
}
/eth/ {
for (i = 2; i < 2+core_count; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("%d\n", totals[i])
}
' ${FILE})
core_count=$(echo "${output}" | wc -l)
names=( $(head -n 1 "${FILE}") )
totals=( ${output} )
So now awk once again outputs only the counts, and the names are obtained by bash directly from the first line of /proc/interrupts. Alternatively, you could create the split arrays from the single array obtained in approach (2), or parse the awk output some other way.
The result would be in two arrays:
names=( CPU0 CPU1 )
totals=( 12345 23456 )
And output:
for (( i = 0; i < core_count; i++ )); do
echo "${names[$i]} -> ${totals[$i]}"
done
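(Incidentally, that second read can be done with a bash builtin alone; read -a loads the first line of the file into an array:)
read -r -a names < "${FILE}"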
And the last approach:
#!/bin/bash
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
for (i = 1; i <= core_count; i++)
names[i-1] = $i
next
}
/eth/ {
for (i = 2; i < 2+core_count; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("%s=%d\n", names[i], totals[i])
}
' ${FILE})
core_count=$(echo "${output}" | wc -l)
totals=( ${output} )
Now the (regular) array looks like:
totals=( CPU0=12345 CPU1=23456 )
And you can parse it like:
for x in "${totals[@]}"; do
name=${x%=*}
value=${x#*=}
echo "${name} -> ${value}"
done
(Note that splitting the CPU name from the value now happens in the loop.)
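Such name=value strings can also be dropped into an associative array (bash 4.0+) without any eval — a short sketch, where by_cpu is just an illustrative name:
declare -A by_cpu
for x in "${totals[@]}"; do
    by_cpu[${x%=*}]=${x#*=}
done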

Related

What to do in order to create continuous .txt files without replacing the already existing .txt files using bash

I am trying to write a bash script to create multiple .txt files.
With the below code I created the files, but when I run the script again I get the same output instead of getting more files with increasing numbers.
#! /bin/bash
for z in $(seq -w 1 10);
do
[[ ! -f "${z}_name.txt" ]] && { touch "${z}_name.txt"; }
done
Based in part on work by Raman Sailopal in a now-deleted answer (and on comments I made about that answer, as well as comments I made about the question), you could use:
shopt -s nullglob
touch $(seq -f '%.0f_name.txt' \
$(printf '%s\n' [0-9]*_name.txt |
awk 'BEGIN { max = 0 }
{ val = $0 + 0; if (val > max) max = val; }
END { print max + 1, max + 10 }'
)
)
The shopt -s nullglob command means that if there are no names that match the glob expression [0-9]*_name.txt, nothing will be generated in the arguments to the printf command.
The touch command is given a list of file names. The seq command formats a range of numbers using zero decimal places (so it formats them as integers) plus the rest of the name (_name.txt). The range is given by the output of printf … | awk …. The printf command lists file names that start with a digit and end with _name.txt, one per line. The awk command keeps track of the current maximum number; it coerces each name into a number (awk ignores the material after the last digit) and checks whether the number is larger than the maximum so far. At the end, it prints two values, the largest value plus 1 and the largest value plus 10 (defaulting to 1 and 10 if there were no files). Adding the -w option to seq is irrelevant when you specify -f and a format; the file names won't be generated with leading zeros. There are ways to deal with this if it's crucial — probably simplest is to drop the -f option to seq, add the -w option, and pipe the output through sed 's/$/_name.txt/'.
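That -w variant might look like this (a sketch, untested):
touch $(seq -w $(printf '%s\n' [0-9]*_name.txt |
                 awk 'BEGIN { max = 0 }
                      { val = $0 + 0; if (val > max) max = val; }
                      END { print max + 1, max + 10 }') |
        sed 's/$/_name.txt/')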
You can squish the awk script onto a single line; you can squish the whole command onto a single line. However, it is arguably easier to see the organization of the command when it is spread over multiple lines.
Note that (apart from a possible TOCTOU — Time of Check, Time of Use — issue), there is no need to check whether the files exist. They don't; they'd have been listed by the glob [0-9]*_name.txt if they did, and the number would have been accounted for. If you want to ensure no damage to existing files, you'd need to use set -C or set -o noclobber and then create the files one by one using shell I/O redirection.
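A sketch of that noclobber variant (the range 11 to 20 is only illustrative):
set -C    # or: set -o noclobber
for i in $(seq 11 20); do
    # with noclobber set, the redirection fails instead of truncating an existing file
    : > "${i}_name.txt" || echo "kept existing ${i}_name.txt" >&2
done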
[…time passes…]
Actually, you can have awk do the file name generation instead of using seq at all:
touch $(printf '%s\n' [0-9]*_name.txt |
awk 'BEGIN { max = 0 }
{ val = $0 + 0; if (val > max) max = val; }
END { for (i = max + 1; i <= max + 10; i++)
printf "%d_name.txt\n", i
}'
)
And, if you try a bit harder, you can get rid of the printf command too:
touch $(awk 'BEGIN { max = 0
for (i = 1; i < ARGC; i++)
{
val = ARGV[i] + 0;
if (val > max)
max = val
}
for (i = max + 1; i <= max + 10; i++)
printf "%d_name.txt\n", i
}' [0-9]*_name.txt
)
Don't forget the shopt -s nullglob — that's still needed for maximum resiliency.
You might even choose to get rid of the separate touch command by having awk write to the files:
awk 'BEGIN { max = 0
for (i = 0; i < ARGC; i++)
{
val = ARGV[i] + 0;
if (val > max)
max = val
}
for (i = max + 1; i <= max + 10; i++)
{
name = sprintf("%d_name.txt", i)
printf "" > name
}
exit
}' [0-9]*_name.txt
Note the use of exit. Note that the POSIX specification for awk says that ARGC is the number of arguments in ARGV and that the elements in ARGV are indexed from 0 to ARGC - 1 — as in C programs.
There are few shell scripts that cannot be improved. The first version shown runs 4 commands; the last runs just one. That difference could be quite significant if there were many files to be processed.
Beware: eventually, the argument list generated by the glob will get too big; then you have to do more work. You might be obliged to filter the output from ls (with its attendant risks and dangers) and feed the output (the list of file names) into the awk script and process the lines of input once more. While your lists remain a few thousand files long, it probably won't be a problem.
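For example, a sketch of the same logic fed by find instead of a glob (-maxdepth is a widespread extension rather than strict POSIX):
find . -maxdepth 1 -name '[0-9]*_name.txt' |
awk '{ sub(/^\.\//, ""); val = $0 + 0; if (val > max) max = val }
     END { for (i = max + 1; i <= max + 10; i++) printf "%d_name.txt\n", i }' |
xargs touch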

Split string to fixed length chunks and write in separate line in Raku

I have a file test.txt:
Stringsplittingskills
I want to read this file and write to another file out.txt with three characters in each line like
Str
ing
spl
itt
ing
ski
lls
What I did
my $string = "test.txt".IO.slurp;
my $start = 0;
my $elements = $string.chars;
# open file in writing mode
my $file_handle = "out.txt".IO.open: :w;
while $start < $elements {
my $line = $string.substr($start,3);
if $line.chars == 3 {
$file_handle.print("$line\n")
} elsif $line.chars < 3 {
$file_handle.print("$line")
}
$start = $start + 3;
}
# close file handle
$file_handle.close
This runs fine when the length of the string is not a multiple of 3. When the string length is a multiple of 3, it inserts an extra newline at the end of the output file. How can I avoid the trailing newline when the string length is a multiple of 3?
I tried another, shorter approach:
my $string = "test.txt".IO.slurp;
my $file_handle = "out.txt".IO.open: :w;
for $string.comb(3) -> $line {
$file_handle.print("$line\n")
}
It still suffers from the same issue.
I looked here and here but am still unable to solve it.
spurt "out.txt", "test.txt".IO.comb(3).join("\n")
Another approach using substr-rw.
subset PositiveInt of Int where * > 0;
sub break( Str $str is copy, PositiveInt $length )
{
my $i = $length;
while $i < $str.chars
{
$str.substr-rw( $i, 0 ) = "\n";
$i += $length + 1;
}
$str;
}
say break("12345678", 3);
Output
123
456
78
The correct answer is of course to use .comb and .join.
That said, this is how you might fix your code.
You could change the if line to check if it is at the end, and use else.
if $start+3 < $elements {
$file_handle.print("$line\n")
} else {
$file_handle.print($line)
}
Personally I would change it so that only the addition of \n is conditional.
while $start < $elements {
my $line = $string.substr($start,3);
$file_handle.print( $line ~ ( "\n" x ($start+3 < $elements) ));
$start += 3;
}
This works because < returns either True or False.
Since True == 1 and False == 0, the x operator repeats the \n at most once.
'abc' x 1; # 'abc'
'abc' x True; # 'abc'
'abc' x 0; # ''
'abc' x False; # ''
If you were very cautious you could use x+?.
(Which is actually 3 separate operators.)
'abc' x 3; # 'abcabcabc'
'abc' x+? 3; # 'abc'
infix:« x »( 'abc', prefix:« + »( prefix:« ? »( 3 ) ) );
I would probably use loop if I were going to structure it like this.
loop ( my $start = 0; $start < $elements ; $start += 3 ) {
my $line = $string.substr($start,3);
$file_handle.print( $line ~ ( "\n" x ($start+3 < $elements) ));
}
Or instead of adding a newline to the end of each line, you could add it to the beginning of every line except the first.
while $start < $elements {
my $line = $string.substr($start,3);
my $nl = "\n";
# clear $nl the first time through
once $nl = "";
$file_handle.print($nl ~ $line);
$start = $start + 3;
}
At the command-line prompt, three one-liner solutions are shown below.
Using comb and batch (retains incomplete set of 3 letters at end):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.join.put for .comb.batch(3);'
Str
ing
spl
itt
ing
ski
lls
X
Simplifying (no batch, only comb):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.put for .comb(3);'
Str
ing
spl
itt
ing
ski
lls
X
Alternatively, using comb and rotor (discards incomplete set of 3 letters at end):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.join.put for .comb.rotor(3);'
Str
ing
spl
itt
ing
ski
lls
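(If you want rotor to keep the incomplete set as well, it accepts a :partial named argument:)
~$ echo 'StringsplittingskillsX' | perl6 -ne '.join.put for .comb.rotor(3, :partial);'
which restores the trailing X.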

How can I reverse print the characters of a string in each cell using AWK?

Beth 45 4.00 0 0 .072
Danny 33 3.75 ^0 0 .089
The above is the file I want to operate.
I want to write an AWK script that can reverse print the characters of a string in every cell.
Here is the code:
BEGIN { OFS = "\t\t" }
function reverse_print(str)
{
s = "";
N = length(str);
for (i = 1; i <= N; i++)
a[i] = substr(str, i, 1);
for (i = N; i >= 1; i--)
s = s a[i];
return s;
}
{
for (i = 1; i <= NF; i++)
$i = reverse_print($i) ;
print;
}
END {}
However, it does not work; the program just hangs.
I have found if I don't use the loop and handle each field one by one like the following,
BEGIN { OFS = "\t\t" }
function reverse_print(str)
{
s = "";
N = length(str);
for (i = 1; i <= N; i++)
a[i] = substr(str, i, 1);
for (i = N; i >= 1; i--)
s = s a[i];
return s;
}
{
$1 = reverse_print($1) ;
$2 = reverse_print($2) ;
$3 = reverse_print($3) ;
$4 = reverse_print($4) ;
$5 = reverse_print($5) ;
$6 = reverse_print($6) ;
print;
}
END {}
it works fine.
Here is my desired output:
hteB 54 00.4 0 0 270.
ynnaD 33 57.3 0^ 0 980.
I have thought hard but still cannot figure out what I did wrong in the loop.
Can anyone tell me why?
You're using the same variable i inside and outside of the function. Use a different variable in either location, or change the function definition to reverse_print(str, i) to make the i used within the function local to that function rather than the same global variable being used in the calling code. (That clash is also why the program hangs: every call to reverse_print() leaves the global i at 0, so the calling loop's i never advances past the first field and the loop runs forever.)
You should also make s and N function local:
function reverse_print(str, i, s, N)
but in fact the code should be written as:
$ cat tst.awk
BEGIN { OFS = "\t\t" }
function reverse_print(fwd, rev, i, n)
{
n = length(fwd)
for (i = n; i >= 1; i--)
rev = rev substr(fwd, i, 1);
return rev
}
{
for (i = 1; i <= NF; i++)
$i = reverse_print($i)
print
}
$ awk -f tst.awk file
hteB 54 00.4 0 0 270.
ynnaD 33 57.3 0^ 0 980.
Could you please try the following. (This program is tested on GNU awk only; as per Ed sir's comment, split() with a null field separator is undefined behavior in POSIX awk.)
awk '
BEGIN{
OFS="\t\t"
}
{
for(i=1;i<=NF;i++){
num=split($i,array,"")
for(j=num;j>0;j--){
val=(j<num?val:"") array[j]
}
printf "%s%s",val,(i<NF?OFS:ORS)}
val=""
}' Input_file
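A POSIX-portable variant of the same idea, using substr() instead of the null-FS split() (a sketch):
awk '
BEGIN{ OFS="\t\t" }
{
  for (i = 1; i <= NF; i++) {
    val = ""
    n = length($i)
    # take characters from the end, one at a time
    for (j = n; j >= 1; j--)
      val = val substr($i, j, 1)
    printf "%s%s", val, (i < NF ? OFS : ORS)
  }
}' Input_file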
There is a rev command in Linux: rev - reverse lines characterwise.
You can reverse a string by piping it through rev from awk and reading the result back with getline, like:
#reverse-fields.awk
{
for (i = 1; i <= NF; i = i + 1) {
# command line
cmd = "echo '" $i "' | rev"
# read output into revfield
cmd | getline revfield
# remove leading new line
a = gensub(/^[\n\r]+/, "", "1", revfield)
# print reversed field
printf("%s", a)
# print tab
if (i != NF) printf("\t")
# close command
close(cmd)
}
# print new line
print ""
}
$ awk -f reverse-fields.awk emp.data
hteB 00.4 0
naD 57.3 0
yhtaK 00.4 01
kraM 00.5 02
yraM 05.5 22
eisuS 52.4 81

how to iterate over two sets of data?

I'm trying to create my own program to do a recursive listing: each line corresponds to the full path of a single file. The tricky part I'm working on now is: I don't want bind mounts to trick my program into listing files twice.
So I already have a program that produces the right output except that if /foo is bind mounted to /bar then my program incorrectly lists
/foo/file
/bar/file
I need the program to list just what's below (EDIT: even if it was asked to list the contents of /foo)
/bar/file
One approach I thought of is to mount | grep bind | awk '{print $1 " " $3}' and then iterate over this to sed every line of the output, then sort -u.
My question is how do I iterate over the original output (a bunch of lines) and the output from mount (another bunch of lines)? (or is there a better approach) This needs to be POSIX (EDIT: and work with /bin/sh)
Place the 'mount | grep bind' command inside awk, in a BEGIN block, and store the data.
Something like:
PROG | awk 'BEGIN{
# Define the data you want to store
# Assign to global arrays
command = "mount | grep bind";
while ((command | getline) > 0) {
count++;
mount[count] = $1;
mountPt[count] = $3
}
}
# Assuming input is line-by-line and that mountPt is the value
# that is undesired
{
replaceLine=0
for (i=1; i<=count; i++) {
idx = index($1, mountPt[i]);
if (idx == 1) {
replaceLine = 1;
break;
}
}
if (replaceLine == 1) {
sub(mountPt[i], mount[i], $1);
}
if (printed[$1] != 1) {
print $1;
}
printed[$1] = 1;
} '
Where I assume your current program, PROG, outputs to stdout.
find YourPath -print > YourFiles.txt
mount > Bind.txt
awk 'FNR == NR && $0 ~ /bind/ {
Bind[ $1] = $3
if( ( ThisLevel = split( $3, Unused, "/") - 1 ) > Level) Level = ThisLevel
}
FNR != NR && $0 !~ /^ *$/ {
RealName = $0
for( ThisLevel = Level; ThisLevel > 0; ThisLevel--){
match( $0, "(/[^/]*){" ThisLevel "}" )
UnBind = Bind[ substr( $0, 1, RLENGTH) ]
if( UnBind !~ /^$/) {
RealName = UnBind substr( $0, RLENGTH + 1)
ThisLevel = 0
}
}
if( ! File[ RealName]++) print RealName
}
' Bind.txt YourFiles.txt
The search is based on an exact path comparison against a bind array loaded first.
Bind.txt and YourFiles.txt could be replaced by direct redirections to make this a single instruction with no temporary files.
The first part of the awk script would have to be adapted if the bind paths contain spaces (assumed not to, here).
File paths are rewritten on the fly while reading, compared against the known bind relations.
A file is printed only if it is not yet known.
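For completeness, a minimal POSIX sh sketch of the sed-based idea from the question (your_prog is a stand-in for the recursive-listing program; this assumes mount prints 'src on dst ...' and keeps the bind-target names — swap $1 and $3 to keep the sources instead):
script=$(mount | awk '/bind/ { printf "s|^%s/|%s/|\n", $1, $3 }')
your_prog | sed "$script" | sort -u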

Sorting files with unordered multi-part key

Using any combination of Linux tools (without going into any full-featured programming language), how can I sort this list
A,C 1
C,B 2
B,A 3
into
A,B 3
A,C 1
B,C 2
Not applying for any beauty contest, but this seems to come close:
#!/bin/bash
while read one two; do
one=`echo $one | sed -e 's/,/\n/g' | sort | sed -e '
1 {h; d}
$! {H; d}
H; g; s/\n/,/g;
'`
echo $one $two
done | sort
Change the internal field separator, then compare the first two letters with ">":
(
IFS=" ,";
while read a b n; do
if [ "$a" \> "$b" ]; then
echo "$b,$a $n";
else
echo "$a,$b $n";
fi;
done;
) <<EOF | sort
A,C 1
C,B 2
B,A 3
EOF
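A shorter awk take on the same idea (a sketch; assumes the single-space layout of the sample, with file as the input):
awk -F'[, ]' '{ if ($1 > $2) { t = $1; $1 = $2; $2 = t } print $1 "," $2, $3 }' file | sort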
In case somebody is interested: I was not really satisfied with any of the suggestions, probably because I hoped for a few-line solution and such a thing doesn't exist as far as I know.
Anyway, I wrote a utility called ljoin (for left join, as in databases) which does exactly what I was asking for (of course :D)
#!/usr/bin/perl
=head1 NAME
ljoin.pl - Utility to left join files by specified key column(s)
=head1 SYNOPSIS
ljoin.pl [OPTIONS] <INFILE1>..<INFILEN> <OUTFILE>
To successfully join rows one must supply at least one input file and exactly one output file. Input files can be real file names or a pattern, like [ABC].txt or *.in etc.
=head1 DESCRIPTION
This utility merges multiple files into one using the specified column(s) as a key
=head2 OPTIONS
=item --field-separator=<separator>, -fs <separator>
Specifies what string should be used to separate columns in plain file. Default value for this option is tab symbol.
=item --no-sort-fields, -no-sf
Do not sort columns when creating a key for merging files
=item --complex-key-separator=<separator>, -ks <separator>
Specifies what string should be used to separate multiple values in multikey column. For example "A B" in one file can be presented as "B A" meaning that this application should somehow understand that this is the same key. Default value for this option is space symbol.
=item --no-sort-complex-keys, -no-sk
Do not sort complex column values when creating a key for merging files
=item --include-primary-field, -i
Specifies whether key which is used to find matching lines in multiple files should be included in the output file. First column in output file will be the key in any case, but in case of complex column the value of first column will be sorted. Default value for this option is false.
=item --primary-field-index=<index>, -f <index>
Specifies index of the column which should be used for matching lines. You can use multiple instances of this option to specify a multi-column key made of more than one column like this "-f 0 -f 1"
=item --help, -?
Get help and documentation
=cut
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
my $fieldSeparator = "\t";
my $complexKeySeparator = " ";
my $includePrimaryField = 0;
my $containsTitles = 0;
my $sortFields = 1;
my $sortComplexKeys = 1;
my @primaryFieldIndexes;
GetOptions(
"field-separator|fs=s" => \$fieldSeparator,
"sort-fields|sf!" => \$sortFields,
"complex-key-separator|ks=s" => \$complexKeySeparator,
"sort-complex-keys|sk!" => \$sortComplexKeys,
"contains-titles|t!" => \$containsTitles,
"include-primary-field|i!" => \$includePrimaryField,
"primary-field-index|f=i#" => \#primaryFieldIndexes,
"help|?!" => sub { pod2usage(0) }
) or pod2usage(2);
pod2usage(0) if $#ARGV < 1;
push @primaryFieldIndexes, 0 if $#primaryFieldIndexes < 0;
my %primaryFieldIndexesHash;
for(my $i = 0; $i <= $#primaryFieldIndexes; $i++)
{
$primaryFieldIndexesHash{$primaryFieldIndexes[$i]} = 1;
}
print "fieldSeparator = $fieldSeparator\n";
print "complexKeySeparator = $complexKeySeparator \n";
print "includePrimaryField = $includePrimaryField\n";
print "containsTitles = $containsTitles\n";
print "primaryFieldIndexes = #primaryFieldIndexes\n";
print "sortFields = $sortFields\n";
print "sortComplexKeys = $sortComplexKeys\n";
my $fieldsCount = 0;
my %keys_hash = ();
my %files = ();
my %titles = ();
# Read columns into memory
foreach my $argnum (0 .. ($#ARGV - 1))
{
# Find files with specified pattern
my $filePattern = $ARGV[$argnum];
my @matchedFiles = < $filePattern >;
foreach my $inputPath (@matchedFiles)
{
open INPUT_FILE, $inputPath or die $!;
my %lines;
my $lineNumber = -1;
while (my $line = <INPUT_FILE>)
{
$lineNumber++;
next if $containsTitles && $lineNumber == 0;
# Don't use chomp line. It doesn't handle unix input files on windows and vice versa
$line =~ s/[\r\n]+$//g;
# Skip lines that don't have columns
next if $line !~ m/($fieldSeparator)/;
# Split fields and count them (store maximum number of columns in files for later use)
my @fields = split($fieldSeparator, $line);
$fieldsCount = $#fields+1 if $#fields+1 > $fieldsCount;
# Sort complex key
my @multipleKey;
for(my $i = 0; $i <= $#primaryFieldIndexes; $i++)
{
my @complexKey = split ($complexKeySeparator, $fields[$primaryFieldIndexes[$i]]);
@complexKey = sort(@complexKey) if $sortComplexKeys;
push @multipleKey, join($complexKeySeparator, @complexKey)
}
# sort multiple keys and create key string
@multipleKey = sort(@multipleKey) if $sortFields;
my $fullKey = join $fieldSeparator, @multipleKey;
$lines{$fullKey} = \@fields;
$keys_hash{$fullKey} = 1;
}
close INPUT_FILE;
$files{$inputPath} = \%lines;
}
}
# Open output file
my $outputPath = $ARGV[$#ARGV];
open OUTPUT_FILE, ">" . $outputPath or die $!;
my @keys = sort keys(%keys_hash);
# Leave blank places for key columns
for(my $pf = 0; $pf <= $#primaryFieldIndexes; $pf++)
{
print OUTPUT_FILE $fieldSeparator;
}
# Print column headers
foreach my $argnum (0 .. ($#ARGV - 1))
{
my $filePattern = $ARGV[$argnum];
my @matchedFiles = < $filePattern >;
foreach my $inputPath (@matchedFiles)
{
print OUTPUT_FILE $inputPath;
for(my $f = 0; $f < $fieldsCount - $#primaryFieldIndexes - 1; $f++)
{
print OUTPUT_FILE $fieldSeparator;
}
}
}
# Print merged columns
print OUTPUT_FILE "\n";
foreach my $key ( @keys )
{
print OUTPUT_FILE $key;
foreach my $argnum (0 .. ($#ARGV - 1))
{
my $filePattern = $ARGV[$argnum];
my @matchedFiles = < $filePattern >;
foreach my $inputPath (@matchedFiles)
{
my $lines = $files{$inputPath};
for(my $i = 0; $i < $fieldsCount; $i++)
{
next if exists $primaryFieldIndexesHash{$i} && !$includePrimaryField;
print OUTPUT_FILE $fieldSeparator;
print OUTPUT_FILE $lines->{$key}->[$i] if exists $lines->{$key}->[$i];
}
}
}
print OUTPUT_FILE "\n";
}
close OUTPUT_FILE;
