I would like to know if there is a way to create awk objects inside an awk call. I need to build a key/value map and use it in an awk call. In more detail, I have a map linking some labels to a unique id (e.g. "ID1002", "External compartment"). I would like to use this map to look up a set of unique ids from another table. Here is what I was thinking about:
awk 'BEGIN{map=system(awk '{m[$1]=$2}' first.csv)}{print map[$1]}' second.csv
Obviously this doesn't work, and I was wondering how I can do something like that without writing a separate awk script.
The common way this is done in awk is:
$ awk 'NR==FNR{m[$1]=$2;next}{print m[$1]}' first.csv second.csv
Explanation:
NR is a special variable that gets incremented on each record read
FNR is similar to NR however it is reset for each new file read
next instructs awk to stop executing for the current record and get the next record.
With the definitions set you can read the script as:
NR==FNR # Conditional that is only true when reading the first file
{m[$1]=$2;next} # Create a map and move on to the next line
{print m[$1]} # Using next in the first block means this only runs on the second file
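As a rough end-to-end illustration of the idiom above, here is a minimal sketch. The file contents are made up for the demonstration, and it assumes whitespace-separated fields with single-word labels; a label containing spaces, such as "External compartment", would need a different field separator set with -F.

```shell
# Hypothetical sample data; only the file names first.csv and second.csv come from the question.
tmp=$(mktemp -d)
printf '%s\n' 'ID1001 cytosol' 'ID1002 external' > "$tmp/first.csv"
printf '%s\n' 'ID1002 x' 'ID1001 y' > "$tmp/second.csv"

# First file: fill the map m. Second file: look up each first field in m.
result=$(awk 'NR == FNR { m[$1] = $2; next } { print m[$1] }' "$tmp/first.csv" "$tmp/second.csv")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The lookups come out in the order of the second file, here external followed by cytosol.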
I have a problem with my bash script, I would like to retrieve information contained in several files and gather them in one.
I have a file in this form which contains about 15000 lines: (file1)
1;1;A0200101C
2;2;A0200101C
3;3;A1160101A
4;4;A1160101A
5;5;A1130304G
6;6;A1110110U
7;7;A1110110U
8;8;A1030002V
9;9;A1030002V
10;10;A2120100C
11;11;A2120100C
12;12;A3410071A
13;13;A3400001A
14;14;A3385000G1
15;15;A3365070G1
I need to retrieve the first field (the id) of each row.
My second file is this; I just need to retrieve the 3rd line: (file2)
count
-------
131
(1 row)
I would therefore like to be able to assemble the id of (file1) and the 3rd line of (file2) in order to achieve this result:
1;131
2;131
3;131
4;131
5;131
6;131
7;131
8;131
9;131
11;131
12;131
13;131
14;131
15;131
Thank you.
One possible way:
#!/usr/bin/env bash
count=$(awk 'NR == 3 { print $1 }' file2)
while IFS=';' read -r id _; do
    printf "%s;%s\n" "$id" "$count"
done < file1
First, read just the third line of file2 and save that in a variable.
Then read each line of file1 in a loop, extracting the first semicolon-separated field, and print it along with that saved value.
Using the same basic approach in a purely awk script instead of shell will be much faster and more efficient. Such a rewrite is left as an exercise for the reader (Hint: In awk, FNR == NR is true when reading the first file given, and false on any later ones. Alternatively, look up how to pass a shell variable to an awk script; there are Q&As here on SO about it.)
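For readers who want to check their attempt at that exercise, one possible awk-only version is sketched below. It is only an illustration: the sample files are shortened copies of the ones in the question, and the whitespace stripping assumes the count line may be padded with spaces or tabs.

```shell
# Shortened sample data modelled on the question's file1 and file2.
tmp=$(mktemp -d)
printf '%s\n' ' count' '-------' '   131' '(1 row)' > "$tmp/file2"
printf '%s\n' '1;1;A0200101C' '2;2;A0200101C' '3;3;A1160101A' > "$tmp/file1"

# file2 is read first: remember its third line with padding removed.
# file1 is read second: print the id followed by the saved count.
result=$(awk -F';' '
    FNR == NR { if (FNR == 3) { count = $0; gsub(/[ \t]/, "", count) } next }
    { print $1 ";" count }
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Note that file2 is passed before file1, so that the count is already known when the ids are printed.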
This is my txt file
type=0
vcpu_count=10
maste=0
h=0
p=0
memory=23.59
num=2
I want to get the vcpu_count and memory values and store them in an array from Perl (an automation script).
awk -F'=' '/vcpu_count/{printf "\n",$1}' .vmConfig.txt
I am using this command just to test on the terminal, but I am getting a blank line. How do I do it? I need to get these two values and check them against a condition.
If you are using Perl anyway, just use Perl for this too.
my %array;
open (my $config, "<", ".vmConfig.txt") or die "$0: Could not open .vmConfig.txt: $!\n";
while (<$config>) {
    # Keep only the two keys of interest; $1 is the key, $2 the value
    next unless /^\s*(vcpu_count|memory)\s*=\s*(.*?)\s*$/;
    $array{$1} = $2;
}
close($config);
If you don't want the result to be an associative array (aka hash), refactoring should be relatively easy.
The following awk commands may help.
Solution 1:
awk '/vcpu_count/{print;next} /memory/{print}' Input_file
Output will be as follows:
vcpu_count=10
memory=23.59
Solution 2:
If you want to print the values on a single line using printf, the following may help:
awk '/vcpu_count/{val=$0;next} /memory/{printf("%s AND %s\n",val,$0)}' Input_file
Output will be as follows:
vcpu_count=10 AND memory=23.59
When you use awk -F'=' '/vcpu_count/{printf "\n",$1}' .vmConfig.txt there are a couple of mistakes. Firstly, printf "\n" will only ever print a newline, as you have found. You need to add a format specifier: something like printf "%s\n", $2 will treat field 2 as a string and insert it into the printed output. Checking out man printf at the command line will explain a bit more.
Secondly, as the change to $2 above shows, $1 is the first field, which is the key in this case, whereas the value you want is in $2 ($0 is the whole line).
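Putting both corrections together, a minimal sketch of the fixed command might look like this. The temporary file stands in for .vmConfig.txt, and matching on $1 exactly rather than with a regex is an extra precaution added here, not part of the original command.

```shell
# Stand-in for .vmConfig.txt, using a subset of the lines from the question.
tmp=$(mktemp -d)
printf '%s\n' 'type=0' 'vcpu_count=10' 'maste=0' 'memory=23.59' 'num=2' > "$tmp/vmConfig.txt"

# Split on "=", select the two keys by exact match, print the value field.
result=$(awk -F'=' '$1 == "vcpu_count" || $1 == "memory" { printf "%s\n", $2 }' "$tmp/vmConfig.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

This prints 10 and 23.59 on separate lines, which is easy to read back into Perl.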
Triplee's solution is probably the most appropriate, but if there is a particular reason to run awk for this before Perl, the following may help.
As in your attempt, it splits on =, but it then outputs CSV, which you can change as appropriate. Even if the input lines are not always in the same order, it prints the values in a predictable order on a single line.
awk 'BEGIN {
    FS="=";
    OFS="," # tabs, etc if wanted, delete for spaces.
}
/vcpu_count/ {cpu=$2}
/memory/ {mem=$2}
END { print cpu, mem }' .vmConfig.txt
This gives
10,23.59
I'm learning some awk. I found an example online of taking a fixed width file and converting it to a csv file. There is just one part I do not understand, even after going through many man pages and online tutorials:
1: awk -v FIELDWIDTHS='1 10 4 2 2' -v OFS=',' '
2: { $1=$1 ""; print }
3: ' data.txt
That is verbatim from the sample online (found here).
What I don't understand is line 2. I get there is no condition, so the 'program' (contained in brackets) will always execute per record (line). I don't understand why it is doing the $1=$1 as well as the empty string statement "";. However, removing these causes incorrect behavior.
$1=$1 assigns a value to $1 (it just happens to be the same value it already had). Assigning any value to a field causes awk to rebuild the current record using the OFS value between fields (effectively replacing every FS match or FIELDWIDTHS boundary with OFS).
$ echo 'a,b,c' | awk -F, -v OFS="-" '{print; $1=$1; print}'
a,b,c
a-b-c
The "" is because whoever wrote the script doesn't fully understand awk and thinks that's necessary to ensure numbers retain their precision by converting them to a string before the assignment.
Given an input list like the following:
405:alice#level1
405:bob#level2
405:chuck#level1
405:don#level3
405:eric#level1
405:francis#level1
004:ac#jjj
004:la#jjj
004:za#zzz
101:amy#floor1
101:brian#floor3
101:christian#floor1
101:devon#floor1
101:eunuch#floor2
101:frank#floor3
005:artie#le2
005:bono#nuk1
005:bozo#nor2
(As you can see, the first field is in random order. The original input had the first field in numerical order, with 004 coming first, then 005, 101, 405, et al. The second field is in alphabetical order by first character.)
What I want is a randomized sort on two levels. Lines sharing the same first field (the part before the colon ':') must stay grouped together, with the groups in random order throughout the file, and the lines within each group must also be in random order. I have been unable to get this result, as I am not too familiar with sort keys and whatnot.
The desired output would look similar to this:
405:francis#level1
405:don#level3
405:eric#level1
405:bob#level2
405:alice#level1
405:chuck#level1
004:za#zzz
004:ac#jjj
004:la#jjj
101:christian#floor1
101:amy#floor1
101:frank#floor3
101:eunuch#floor2
101:brian#floor3
101:devon#floor1
005:bono#nuk1
005:artie#le2
005:bozo#nor2
Does anyone know how to achieve this type of sort?
Thank you!
You can do this with awk pretty easily.
As a one-liner:
awk -F: 'BEGIN{cmd="sort -R"} $1 != key {close(cmd)} {key=$1; print | cmd}' input.txt
Or, broken apart for easier explanation:
-F: - Set awk's field separator to colon.
BEGIN{cmd="sort -R"} - before we start, set a variable that is a command to do the "randomized sort". This one works for me on FreeBSD. Should work with GNU sort as well.
$1 != key {close(cmd)} - If the current line has a different first field than the last one processed, close the output pipe...
{key=$1; print | cmd} - And finally, set the "key" var, and print the current line, piping output through the command stored in the cmd variable.
This usage takes advantage of a bit of awk awesomeness. When you pipe through a string (be it stored in a variable or not), that pipe is automatically created upon use. You can close it any time, and a subsequent use will reopen a new command.
The impact of this is that each time you close(cmd), you print the current set of randomly sorted lines. And awk closes cmd automatically once you come to the end of the file.
Of course, for this solution to work, it's vital that all lines with a shared first field are grouped together.
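To watch the close(cmd) mechanism with reproducible output, the sketch below swaps sort -R for plain sort. That substitution is made only for the demonstration and is not the original answer's command; the input is a shortened, made-up variant of the question's data, and LC_ALL=C is set so the ordering is predictable.

```shell
# Each group is piped to its own sort process; close(cmd) flushes one group
# before the next begins, so the group order (405, then 004) is preserved.
result=$(printf '%s\n' '405:chuck' '405:alice' '405:bob' '004:za' '004:ac' |
    LC_ALL=C awk -F: 'BEGIN { cmd = "sort" } $1 != key { close(cmd) } { key = $1; print | cmd }')
printf '%s\n' "$result"
```

The 405 lines come out sorted among themselves, followed by the sorted 004 lines. With sort -R in place of sort, each group would be shuffled instead.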
Not as elegant, but a different method:
$ awk -F: '!($1 in a){a[$1]=c++} {print a[$1] "\t" $0}' file |
sort -R -k2 |
sort -nk1,1 -s |
cut -f2-
Or this alternative, which doesn't assume initial grouping:
$ sort -R file |
awk -F: '!($1 in a){a[$1]=c++} {print a[$1] "\t" $0}' |
sort -nk1,1 -s |
cut -f2-
I have some data files from a legacy system that I would like to process using Awk. Each file consists of a list of records. There are several different record types and each record type has a different set of fixed-width fields (there is no field separator character). The first two characters of the record indicate the type, from this you then know which fields should follow. A file might look something like this:
AAField1Field2LongerField3
BBField4Field5Field6VeryVeryLongField7Field8
CCField99
Using Gawk I can set the FIELDWIDTHS, but that applies to the whole file (unless I am missing some way of setting this on a record-by-record basis), or I can set FS to "" and process the file one character at a time, but that's a bit cumbersome.
Is there a good way to extract the fields from such a file using Awk?
Edit: Yes, I could use Perl (or something else). I'm still keen to know whether there is a sensible way of doing it with Awk though.
Hopefully this will lead you in the right direction. Assuming your multi-line records are guaranteed to be terminated by a 'CC' type row, you can pre-process your text file using simple if-then logic. I have presumed you need fields 1, 5 and 7 on one row; a sample awk script would be:
BEGIN {
    field1=""
    field5=""
    field7=""
}
{
    record_type = substr($0,1,2)
    if (record_type == "AA")
    {
        field1=substr($0,3,6)
    }
    else if (record_type == "BB")
    {
        field5=substr($0,9,6)
        field7=substr($0,21,18)
    }
    else if (record_type == "CC")
    {
        print field1"|"field5"|"field7
    }
}
Create an awk script file called program.awk and put that code into it. Execute the script using:
awk -f program.awk < my_multi_line_file.txt
You may be able to use two passes:
1step.awk
/^AA/{printf "2 6 6 12" }
/^BB/{printf "2 6 6 6 18 6"}
/^CC/{printf "2 7" }
{printf "\n%s\n", $0}
2step.awk
NR%2 == 1 {FIELDWIDTHS=$0}
NR%2 == 0 {print $2}
And then
awk -f 1step.awk sample | awk -f 2step.awk
You probably need to suppress (or at least ignore) awk's built-in field separation code, and use a program along the lines of:
awk '/^AA/ { manually process record AA out of $0 }
/^BB/ { manually process record BB out of $0 }
/^CC/ { manually process record CC out of $0 }' file ...
The manual processing will be a bit fiddly - I suppose you'll need to use the substr function to extract each field by position, so what I've got as one line per record type will be more like one line per field in each record type, plus the follow-on printing.
I do think you might be better off with Perl and its unpack feature, but awk can handle it too, albeit verbosely.
Could you use Perl and then select an unpack template based on the first two chars of the line?
Better use some fully featured scripting language like perl or ruby.
What about 2 scripts? E.g. 1st script inserts field separators based on the first characters, then the 2nd should process it?
Or first of all define some function in your AWK script, which splits the lines into variables based on the input - I would go this way, for the possible re-usage.
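That last suggestion could be sketched roughly as follows. The function name split_fixed and the layout table are hypothetical, introduced only for illustration; the widths are read off the three sample records in the question. It uses only substr and split, so it should not need gawk's FIELDWIDTHS.

```shell
result=$(printf '%s\n' 'AAField1Field2LongerField3' \
                       'BBField4Field5Field6VeryVeryLongField7Field8' \
                       'CCField99' | awk '
# split_fixed (hypothetical helper): cut rec into out[1..n] using the
# space-separated list of widths; returns the number of fields.
function split_fixed(rec, widths, out,    n, w, i, pos) {
    n = split(widths, w, " ")
    pos = 1
    for (i = 1; i <= n; i++) {
        out[i] = substr(rec, pos, w[i])
        pos += w[i]
    }
    return n
}
BEGIN {
    # Assumed layouts: record type followed by the field widths.
    layout["AA"] = "2 6 6 12"
    layout["BB"] = "2 6 6 6 18 6"
    layout["CC"] = "2 7"
}
{
    type = substr($0, 1, 2)
    if (!(type in layout)) next        # skip unknown record types
    n = split_fixed($0, layout[type], f)
    line = f[1]
    for (i = 2; i <= n; i++) line = line "|" f[i]
    print line
}')
printf '%s\n' "$result"
```

Keeping the widths in one table means a new record type only needs a new layout entry, which is the reuse argument made above.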