I found this command, {print NF}, to show the total number of columns:
$ nova list | awk '{print NF}' | sort -n | uniq
1
9
10
But I wish to print each column's number under it.
See this example, where the field separator is |:
$ nova list | head
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
1 2 3 4
Let's first understand what awk is doing here:
nova list | awk '{print NF}' | sort -n | uniq
In awk '{print NF}', NF is the number of fields, where the field separator defaults to whitespace. So for the output line below NF=9 (the pipe '|' characters count as fields too):
| ID | Name | Status | Networks |
and the same goes for this data line:
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
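You can verify the count on the header line alone with a quick sketch:

```shell
# the five '|' characters plus the four column names ID/Name/Status/Networks
# make 9 whitespace-separated tokens
printf '| ID | Name | Status | Networks |\n' | awk '{print NF}'
# prints 9
```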
Since your output also contains 1 and 10, some lines in the nova list output must have a single field or 10 fields.
Now, coming to your problem: you wish to print each field together with its field number.
nova list | awk '{for(I=1;I<=NF;I++){printf I"-"$I" "}printf "\n"}'
Note that it will not print the field numbers at the end of the output but inline with the data.
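For example, on a tiny hypothetical table row:

```shell
# number each whitespace-separated field inline
printf '| a | b |\n' | awk '{for(I=1;I<=NF;I++){printf I"-"$I" "} printf "\n"}'
# prints: 1-| 2-a 3-| 4-b 5-|
```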
Perl to the rescue:
nova list | \
perl -ne 'print;
$s = $_ if /\|/;
}{
$s =~ s/[^|]/ /g;
$s =~ s/\|/++$i/ge;
print " $s\n"
'
-n reads the input line by line
each line is printed and remembered in $s if it contains | (to skip the final border)
when the input ends }{, everything that's not a | is replaced by a space
all | are replaced by numbers
the result is printed
More tweaking is needed if the number of columns is > 10 (the numbers get wider than 1 character):
$s =~ s/(??{" {".((length(0+$i))-1)."}"})\|/++$i/ge;
nova list | {
read line; echo "$line" # read and print the first line
read header; echo "$header" # read, remember and print the 2nd line
cat # all the rest of the nova list output
# then, use the header, and transform the words into numbers
echo "$header" | perl -pe 's/(\w+)/ sprintf "%-*d", length($1), ++$n /ge';
}
Output:
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
| 1 | 2 | 3 | 4 |
Actually, I can put all that in a quick perl script:
echo "$nova_list" | perl -ne '
$header = $_ if $. == 2;
print;
} {
$header =~ s/(\w+)/ sprintf "%-*d", length($1), ++$n /ge;
print $header;
'
You can get the last line of the file and "clean" it so that it becomes the footer with the field numbers:
nova list | awk -F"|" '1;
END {gsub (/[^|]/," ")
for(i=1;i<=NF;i++)
sub(/\| /, " " i)
gsub(/\|/," ")
print
}'
This:
replaces everything but | with a space.
replaces all strings "| " with an autoincremented number.
replaces the trailing |.
prints the result.
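A minimal run of the same END logic on a hypothetical two-column row shows the footer being built:

```shell
# '1' prints every line as-is; END rebuilds the last line as a numbered footer
printf '| a | b |\n' | awk -F'|' '1;
  END {gsub(/[^|]/," ")
       for(i=1;i<=NF;i++) sub(/\| /, " " i)
       gsub(/\|/," ")
       print}'
# the second output line carries 1 and 2 under the pipe positions
```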
Test
$ cat a
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
See output:
$ awk -F"|" '1; END {line=$0; fields=NF; gsub (/[^\|]/," "); for(i=1;i<=fields;i++) sub(/\| /, " " i); gsub(/\|/," "); print}' a
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
1 2 3 4
Or keeping the field separators:
$ awk -F"|" '1; END {line=$0; fields=NF; gsub (/[^\|]/," "); for(i=1;i<=fields;i++) sub(/\| /, "| " i); print}' a
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
| 1 | 2 | 3 | 4 |
Good day.
I have two files, vmList and flavorList, the vmList containing the following:
$ cat /tmp/vmList
cf0012vm001| OS-SRV-USG:terminated_at | -
cf0012vm001| accessIPv4 |
cf0012vm001| accessIPv6 |
cf0012vm001| cf0012v_internal_network network | 192.168.210.10
cf0012vm001| created | 2021-09-17T17:21:39Z
cf0012vm001| flavor | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm001| hostId | fcf39100bcc6ae57a8212f97d3251ac43913719f2aebcaa72006956e
cf0012vm001| key_name | -
cf0012vm002| OS-SRV-USG:terminated_at | -
cf0012vm002| accessIPv4 |
cf0012vm002| accessIPv6 |
cf0012vm002| cf0012v_internal_network network | 192.168.210.11
cf0012vm002| created | 2021-09-17T17:21:37Z
cf0012vm002| flavor | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm002| hostId | e1590af8ddd57f1e2e74617d6c3631195e410bdd188a0b59813ffbef
cf0012vm002| id | 0e292900-6b50-4055-9842-d95e54fa1490
and the flavorList containing the following information:
$ cat /tmp/flavorList
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
| ID | Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
| 711f0ff2f01d403689819b6cbab36e42 | nd.c4r8d21s8e21 | 8192 | 21 | 21 | 8192 | 4 | | N/A |
| 78a70b62efae4fbcb35994aeb0f87678 | nd.c8r16d31s8e31 | 16384 | 31 | 31 | 8192 | 8 | | N/A |
| 78f4fe71cc3340a59c62fc0b32d81e3f | nd.c4r16d100 | 16384 | 100 | 0 | | 4 | | N/A |
| 7a7e6ae4bfe34ac4ab3983b8f764a8ce | nd.c2r8d40 | 8192 | 40 | 0 | | 2 | | N/A |
| 832169fed2244bb6b1739ab3db0f232e | nd.c1r4d100 | 4096 | 100 | 0 | | 1 | | N/A |
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
| 8e968623e5c44674b33e1cc1f892e32d | nd.c9r40d50 | 40960 | 50 | 0 | | 9 | | N/A |
| 8e96a7044566406f9ef7eba48c2a8c55 | nd.c5r4d81 | 4096 | 81 | 0 | | 5 | | N/A |
| 8fd07e2004f84658a76af1cd8b9cea43 | nd.c2r8d50 | 8192 | 50 | 0 | | 2 | | N/A |
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
My goal is to find the 'flavor' in the vmList, then grep the flavor value (nd.c8r16d50e60) from the flavorList, which in itself works:
$ for f in `grep flavor /tmp/vmList|awk '{print $4}'`;do grep ${f} /tmp/flavorList;done
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
However, I would like to add the first parameter from the vmList (cf0012vm001 and cf0012vm002) to precede the output, either in a line above the output or in front of the line:
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
or even:
cf0012vm001
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
Please advise.
Bjoern
Assumptions:
a flavor does not contain spaces
a specific ordering of the output has not been stated
vmList: column/field #1 could be associated with different flavors [NOTE: not supported by sample data set; OP would need to refute/confirm]
One GNU awk idea that uses an array of arrays:
awk -F'|' ' # input field delimiter = "|" for both files
FNR==NR { # for 1st file ...
name=gensub(/ /,"","g",$2) # remove all spaces from field #2 and save in awk variable "name"
if (name == "flavor") { # if field #2 == "flavor" ...
split($3,arr,"(") # split field #3 using "(" as delimiter, storing results in array arr[]
gsub(" ","",arr[1]) # remove all spaces from first array entry
flavors[arr[1]] # keep track of unique flavors
col1[arr[1]][$1] # keep track of associated values from column/field #1
}
next
}
FNR>3 { # for 2nd file, after ignoring first 3 lines ...
if (NF == 1) next # skip line if it only has 1 "|" delimited field
name=gensub(/ /,"","g",$3) # remove all spaces from field #3 and save in awk variable "name"
if (name in flavors) # if name is in our list of flavors ...
for (i in col1[name]) # loop through list of columns (from 1st file)
print i,$0 # print column (from 1st file) plus current line
}
' vmList flavorList
This generates:
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
NOTE: while this output appears to be sorted by the first column, that is merely a coincidence; if a specific order needs to be guaranteed, it can likely be done by adding an appropriate PROCINFO["sorted_in"] entry; the OP just needs to state the desired ordering.
Would you please try the following:
echo "VM Name | ID | Flavor Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |"
echo "------------+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+"
awk -F '[[:blank:]]*\\|[[:blank:]]*' '
NR==FNR && $2=="flavor" {sub(/[[:blank:]].+/, "", $3); a[$1]=$3; next}
{
for (i in a) {
if (a[i] == $3) print i " " $0
}
}
' /tmp/vmList /tmp/flavorList | sort -k1.9,1.11n
Output:
VM Name | ID | Flavor Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
------------+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
The field separator [[:blank:]]*\\|[[:blank:]]* splits the record
on the pipe character with preceding / following blank characters if any.
The condition NR==FNR && $2=="flavor" matches the flavor line
in vmList.
The statement sub(/[[:blank:]].+/, "", $3) extracts the nd.xxx
field by removing the substring after the blank character.
a[$1]=$3 stores the nd.xxx field keyed by the 1st cfxxx field.
The final for (i in a) loop prints the matched lines in flavorList, prepending the cfxxx field.
sort -k1.9,1.11n sorts the output by the substring of the 1st field running from its 9th through 11th characters; the trailing n specifies a numerical sort.
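A quick sketch of that character-position sort key on made-up VM names:

```shell
# sort numerically on characters 9 through 11 of the first field
printf 'cf0012vm010 x\ncf0012vm002 y\ncf0012vm001 z\n' | sort -k1.9,1.11n
# prints the cf0012vm001, cf0012vm002, cf0012vm010 lines in that order
```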
I am trying to sort the results of sklearn.ensemble.RandomForestRegressor's feature_importances_
I have the following function:
def get_feature_importances(cols, importances):
feats = {}
for feature, importance in zip(cols, importances):
feats[feature] = importance
importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
importances.sort_values(by='Gini-importance')
return importances
I use it like so:
importances = get_feature_importances(X_test.columns, rf.feature_importances_)
print()
print(importances)
And I get the following results:
| PART | 0.035034 |
| MONTH1 | 0.02507 |
| YEAR1 | 0.020075 |
| MONTH2 | 0.02321 |
| YEAR2 | 0.017861 |
| MONTH3 | 0.042606 |
| YEAR3 | 0.028508 |
| DAYS | 0.047603 |
| MEDIANDIFF | 0.037696 |
| F2 | 0.008783 |
| F1 | 0.015764 |
| F6 | 0.017933 |
| F4 | 0.017511 |
| F5 | 0.017799 |
| SS22 | 0.010521 |
| SS21 | 0.003896 |
| SS19 | 0.003894 |
| SS23 | 0.005249 |
| SS20 | 0.005127 |
| RR | 0.021626 |
| HI_HOURS | 0.067584 |
| OI_HOURS | 0.054369 |
| MI_HOURS | 0.062121 |
| PERFORMANCE_FACTOR | 0.033572 |
| PERFORMANCE_INDEX | 0.073884 |
| NUMPA | 0.022445 |
| BUMPA | 0.024192 |
| ELOH | 0.04386 |
| FFX1 | 0.128367 |
| FFX2 | 0.083839 |
I thought the line importances.sort_values(by='Gini-importance') would sort them, but it does not. Why is this not working?
importances.sort_values(by='Gini-importance') returns the sorted DataFrame, which your function then discards.
You want return importances.sort_values(by='Gini-importance').
Or you could make sort_values inplace:
importances.sort_values(by='Gini-importance', inplace=True)
return importances
I need to analyze weekly order frequencies over the last 1-year period to find the min/max/average order frequency for each product.
Whether a product is new or old, the system should take the first occurrence of an order in the year as that product's starting week. The minimum order frequency is the difference between successive ordering weeks: if the first order is in week 3 and the second order is in week 6, the order frequency is 3 weeks (6-3). Orders can be placed in any week of the past 52 weeks. Average order frequency = (52 - first order week) / number of weeks that have orders.
Attaching the excel for better understanding the issue.
Original image
+---------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+----------------+-------------------------+-----+-----------------------------------+--+
| Product | wk1 | wk2 | wk3 | wk4 | wk5 | wk6 | wk7 | wk8 | wk9 | wk10 | wk11 | wk12 | wk13 | wk14 | wk15 | wk16 | wk17 | wk18 | wk19 | wk20 | wk21 | wk22 | wk23 | wk24 | wk25 | wk26 | wk27 | wk28 | wk29 | wk30 | wk31 | wk32 | wk33 | wk34 | wk35 | wk36 | wk37 | wk38 | wk39 | wk40 | wk41 | wk42 | wk43 | wk44 | wk45 | wk46 | wk47 | wk48 | wk49 | wk50 | wk51 | wk52 | Order start wk | Order frequency (Weeks) | | | |
+---------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+----------------+-------------------------+-----+-----------------------------------+--+
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Min | Max | Average | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (End wk - Start week)/No of times | |
| SKU 1 | | | | | | | | | y | | y | | y | | y | | y | | y | | y | | y | y | | | y | | y | | y | | y | | | | | | y | | y | | y | | y | | y | | y | | y | | 9 | 1 | 6 | 2.15 | |
| SKU 2 | | | | | | | y | | | | | | y | | | | | | y | | | | | | y | | | | | | y | | | | | | y | | | | | | y | | | | | | y | | | | 1 | 0 | 0 | 7.29 | |
| SKU 3 | | | | | | | | | | | | | | | y | | | | | | | | | | | | | | | | y | | | | | | | | y | | | | | | | | y | | | | | | 15 | 8 | 15 | 9.25 | |
+---------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+----------------+-------------------------+-----+-----------------------------------+--+
As mentioned, @Barry Houdini elegantly solves the problem of finding the longest sequence of zeroes separated by ones here.
You only have to change it slightly to check for repeated blank cells separated by 'y'. The only thing is that you don't want to include cells before the first 'y', and (although this isn't clear) may not want to include blank cells after the last 'y'.
The formula for MIN becomes
=MIN(IF((ROW(A$1:INDEX(A:A,COUNTA(B4:BA4)+1))>1)*(ROW(A$1:INDEX(A:A,COUNTA(B4:BA4)+1))<COUNTA(B4:BA4)+1),FREQUENCY(IF(B4:BA4="",COLUMN(B4:BA4)),IF(B4:BA4="y",COLUMN(B4:BA4)))))+1
and the formula for MAX is the same, but with MAX in place of MIN:
=MAX(IF((ROW(A$1:INDEX(A:A,COUNTA(B4:BA4)+1))>1)*(ROW(A$1:INDEX(A:A,COUNTA(B4:BA4)+1))<COUNTA(B4:BA4)+1),FREQUENCY(IF(B4:BA4="",COLUMN(B4:BA4)),IF(B4:BA4="y",COLUMN(B4:BA4)))))+1
where you need to add 1 to make the results agree with the question, because @Barry's formula counts the number of blanks while the OP wants the interval between two successive y's. An array of ny+1 elements is generated, where ny is the number of y's. This is because the FREQUENCY function returns an array with n+1 elements, where n is the number of cut points (bins_array in the documentation), and the column numbers of the cells containing y are used as cut points, so there are ny of them.
These are both array formulas and need to be entered with Ctrl+Shift+Enter.
The formula for the average is just
=(COLUMNS(B4:BA4)-MATCH("y",B4:BA4,0))/COUNTA(B4:BA4)
I am trying to parse the output below:
+--------------------------------------+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| ID | Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+--------------------------------------+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| 1 | m1.tiny | 512 | 1 | 0 | | 1 | 1.0 | True |
| 2 | m1.small | 2048 | 20 | 0 | | 1 | 1.0 | True |
| 214b272c-e6a4-4bb5-96a4-c74c64984e5a | MC | 2048 | 100 | 0 | | 1 | 1.0 | True |
| 3 | m1.medium | 4096 | 40 | 0 | | 2 | 1.0 | True |
| 4 | m1.large | 8192 | 80 | 0 | | 4 | 1.0 | True |
| 5 | m1.xlarge | 16384 | 160 | 0 | | 8 | 1.0 | True |
| 71aa57d1-52e3-4499-abd2-23985949aeb4 | slmc | 4096 | 32 | 0 | | 2 | 1.0 | True |
| 7cf1d926-c904-47b8-af70-499196a1f65f | new test flavor | 1 | 1 | 0 | | 1 | 1.0 | True |
| 97b3dc38-f752-437b-881d-c3415c8a682c | slstore | 10240 | 32 | 0 | | 4 | 1.0 | True |
+--------------------------------------+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
It is the list of flavours in OpenStack. I am expecting output as below:
m1.tiny;m1.small;MC;m1.medium;m1.large;m1.xlarge;slmc;new test flavor;slstore;
What I tried:
I came up with the command below for parsing:
nova flavor-list | grep '|' | awk 'NR>1 {print $4}' | tr '\n' ';'
but the issue is that the command returns output as follows:
m1.tiny;m1.small;MC;m1.medium;m1.large;m1.xlarge;slmc;new;slstore;
There is a problem with the space in new test flavor.
The command below will give the expected output:
nova flavor-list | grep '|' | awk -F "|" 'NR>1 {print $3}' | tr '\n' ';'
The above command gives output with white space, i.e.:
$ nova flavor-list | grep '|' | awk -F "|" 'NR>1 {print $3}' | tr '\n' ';'
m1.tiny ; m1.small ; MC ; m1.medium ; m1.large ; m1.xlarge ; slmc ; new test flavor ; slstore ;
To get output without white space, use the command below:
$ nova flavor-list | grep '|' | awk -F "|" 'NR>1 {print $3}' | awk -F "\n" '{gsub(/^[ \t]+|[ \t]+$/, "", $1)}1' | tr '\n' ';'
m1.tiny;m1.small;MC;m1.medium;m1.large;m1.xlarge;slmc;new test flavor;slstore;
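The trimming step can also be seen on its own; this sketch strips the leading/trailing blanks that awk's | split leaves behind:

```shell
# delete leading and trailing blanks from each line
printf '  m1.tiny  \n' | awk '{gsub(/^[ \t]+|[ \t]+$/, "")}1'
# prints m1.tiny
```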
Use the following command
sed -e '1,3d' < flavor-list | sed -e 's/-//g' | sed -e 's/+//g' | cut -f 3 -d "|" | sed -e 's/^ //g' | sed -e 's/\s\+$//g' | tr '\n' ';'
Software tools (no sed, no awk):
tail -n +4 flavor-list | grep -v '\-\-' | cut -d'|' -f3 | cut -d' ' -f2- | \
tr -s ' ' | rev | cut -d' ' -f2- | rev | tr '\n' ';' ; echo
Pure bash (uses no utils at all):
while read a b c d ; do \
d="${d/ [ |]*/}" ; \
[ -z "$d" -o "$d" = Name ] && continue ; \
echo -n "$d;" ; \
done < flavor-list ; echo
Output (either one):
m1.tiny;m1.small;MC;m1.medium;m1.large;m1.xlarge;slmc;new test flavor;slstore;
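The key line in the pure-bash version is the parameter expansion d="${d/ [ |]*/}", which deletes everything from the first space that is followed by another space or a |. A quick sketch on a made-up field:

```shell
# everything from " |" onward is removed, keeping the multi-word name intact
d='new test flavor | 1 | 1.0 | True'
echo "${d/ [ |]*/}"
# prints new test flavor
```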
How to join two files with awk/sed/grep/bash, similar to an SQL JOIN?
I have a file that looks like this:
and another one that looks like this:
I also have a text version of the image above:
+----------+------------------+------+------------+----+---------------------------------------------------+---------------------------------------------------+-----+-----+-----+------+-------+-------+--------------+------------+--+--+---+---+----+--+---+---+----+------------+------------+------------+------------+
| 21548598 | DSND001906102.2 | 0107 | 001906102 | 02 | FROZEN / O.S.T. | FROZEN / O.S.T. | 001 | 024 | | | 11.49 | 13.95 | 050087295745 | 11/25/2013 | | | N | N | 30 | | 1 | E | 1 | 10/07/2013 | 02/27/2014 | 10/07/2013 | 10/07/2013 |
| 25584998 | WD1194190DVD | 0819 | 1194190 | 18 | FROZEN / (WS DOL DTS) | FROZEN / (WS DOL DTS) | 050 | 110 | | G | 21.25 | 29.99 | 786936838961 | 03/18/2014 | | | N | N | 0 | | 1 | A | 2 | 12/20/2013 | 03/13/2014 | 12/20/2013 | 12/20/2013 |
| 25812794 | WHV1000292717BR | 0526 | 1000292717 | BR | GRAVITY / (UVDC) | GRAVITY / (UVDC) | 050 | 093 | | PG13 | 29.49 | 35.99 | 883929244577 | 02/25/2014 | | | N | N | 30 | | 1 | E | 3 | 01/16/2014 | 02/11/2014 | 01/16/2014 | 01/16/2014 |
| 24475594 | SNY303251.2 | 0085 | 303251 | 02 | BEYONCE | BEYONCE | 001 | 004 | | | 14.99 | 17.97 | 888430325128 | 12/20/2013 | | | N | N | 30 | | 1 | A | 4 | 12/19/2013 | 01/02/2014 | 12/19/2013 | 12/19/2013 |
| 25812787 | WHV1000284958DVD | 0526 | 1000284958 | 18 | GRAVITY (2PC) / (UVDC SPEC 2PK) | GRAVITY (2PC) / (UVDC SPEC 2PK) | 050 | 093 | | PG13 | 21.25 | 28.98 | 883929242528 | 02/25/2014 | | | N | N | 30 | | 1 | E | 5 | 01/16/2014 | 02/11/2014 | 01/16/2014 | 01/16/2014 |
| 21425462 | PBSDMST64400DVD | E349 | 64400 | 18 | MASTERPIECE CLASSIC: DOWNTON ABBEY SEASON 4 (3PC) | MASTERPIECE CLASSIC: DOWNTON ABBEY SEASON 4 (3PC) | 050 | 095 | 094 | | 30.49 | 49.99 | 841887019705 | 01/28/2014 | | | N | N | 30 | | 1 | A | 6 | 09/06/2013 | 01/15/2014 | 09/06/2013 | 09/06/2013 |
| 25584974 | WD1194170BR | 0819 | 1194170 | BR | FROZEN (2PC) (W/DVD) / (WS AC3 DTS 2PK DIGC) | FROZEN (2PC) (W/DVD) / (WS AC3 DTS 2PK DIGC) | 050 | 110 | | G | 27.75 | 39.99 | 786936838923 | 03/18/2014 | | | N | N | 0 | | 2 | A | 7 | 12/20/2013 | 03/13/2014 | 01/15/2014 | 01/15/2014 |
| 21388262 | HBO1000394029DVD | 0203 | 1000394029 | 18 | GAME OF THRONES: SEASON 3 | GAME OF THRONES: SEASON 3 | 050 | 095 | 093 | | 47.99 | 59.98 | 883929330713 | 02/18/2014 | | | N | N | 30 | | 1 | E | 8 | 08/29/2013 | 02/28/2014 | 08/29/2013 | 08/29/2013 |
| 25688450 | WD11955700DVD | 0819 | 11955700 | 18 | THOR: THE DARK WORLD / (AC3 DOL) | THOR: THE DARK WORLD / (AC3 DOL) | 050 | 093 | | PG13 | 21.25 | 29.99 | 786936839500 | 02/25/2014 | | | N | N | 30 | | 1 | A | 9 | 12/24/2013 | 02/20/2014 | 12/24/2013 | 12/24/2013 |
| 23061316 | PRT359054DVD | 0818 | 359054 | 18 | JACKASS PRESENTS: BAD GRANDPA / (WS DUB SUB AC3) | JACKASS PRESENTS: BAD GRANDPA / (WS DUB SUB AC3) | 050 | 110 | | R | 21.75 | 29.98 | 097363590545 | 01/28/2014 | | | N | N | 30 | | 1 | E | 10 | 12/06/2013 | 03/12/2014 | 12/06/2013 | 12/06/2013 |
| 21548611 | DSND001942202.2 | 0107 | 001942202 | 02 | FROZEN / O.S.T. (BONUS CD) (DLX) | FROZEN / O.S.T. (BONUS CD) (DLX) | 001 | 024 | | | 14.09 | 19.99 | 050087299439 | 11/25/2013 | | | N | N | 30 | | 1 | E | 11 | 10/07/2013 | 02/06/2014 | 10/07/2013 | 10/07/2013 |
+----------+------------------+------+------------+----+---------------------------------------------------+---------------------------------------------------+-----+-----+-----+------+-------+-------+--------------+------------+--+--+---+---+----+--+---+---+----+------------+------------+------------+------------+
The 2nd column from the first file can be joined to the 14th column of the second file!
Here's what I've been trying to do:
join <(sort awk -F"\t" '{print $14,$12}' aecprda12.tab) <(sort awk -F"\t" '{print $2,$1}' output1.csv)
but i am getting these errors:
$ join <(sort awk -F"\t" '{print $14,$12}' aecprda12.tab) <(sort awk -F"\t" '{print $2,$1}' output1.csv)
sort: unknown option -- F
Try `sort --help' for more information.
sort: unknown option -- F
Try `sort --help' for more information.
-700476409 [waitproc] -bash 10336 sig_send: error sending signal 20 to pid 10336, pipe handle 0x84, Win32 error 109
the output i would like would be something like this:
+-------+-------+---------------+
| 12.99 | 14.77 | 3383510002151 |
| 13.97 | 17.96 | 3383510002175 |
| 13.2 | 13 | 3383510002267 |
| 13.74 | 14.19 | 3399240165349 |
| 9.43 | 9.52 | 3399240165363 |
| 12.99 | 4.97 | 3399240165479 |
| 7.16 | 7.48 | 3399240165677 |
| 11.24 | 9.43 | 4011550620286 |
| 13.86 | 13.43 | 4260182980316 |
| 13.98 | 12.99 | 4260182980507 |
| 10.97 | 13.97 | 4260182980514 |
| 11.96 | 13.2 | 4260182980545 |
| 15.88 | 13.74 | 4260182980552 |
+-------+-------+---------------+
What am I doing wrong?
You can do all the work in join and sort:
join -1 2 -2 14 -t $'\t' -o 2.12,1.1,0 \
<( sort -t $'\t' -k 2,2 output1.csv ) \
<( sort -t $'\t' -k 14,14 aecprda12.tab )
Notes:
$'\t' is a bash ANSI-C quoted string which is a tab character: neither join nor sort seem to recognize the 2-character string "\t" as a tab
-k col,col sorts the file on the specified column
join has several options to control how it works; see the join(1) man page.
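As a minimal sketch of -1/-2/-o on toy data (hypothetical file names and contents; both inputs must be pre-sorted on their join fields):

```shell
dir=$(mktemp -d)
printf 'k1\t9.99\nk2\t4.50\n' > "$dir/prices"   # key <TAB> price
printf 'foo\tk1\nbar\tk2\n'   > "$dir/names"    # name <TAB> key
# join prices field 1 with names field 2; output price, name, join key
join -t $'\t' -1 1 -2 2 -o 1.2,2.1,0 "$dir/prices" "$dir/names"
# prints 9.99<TAB>foo<TAB>k1 and 4.50<TAB>bar<TAB>k2
rm -r "$dir"
```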
sort awk -F...
is not a valid command: it means "sort a file named awk", and, as the error message says, sort has no -F option. The syntax you are looking for is
awk -F ... | sort
However, you might be better off doing the joining in Awk directly.
awk -F"\t" 'NR==FNR{k[$14]=$12; next}
k[$2] { print $2, $1, k[$2] }' aecprda12.tab output1.csv
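Assuming tab-separated input, the same NR==FNR lookup pattern can be exercised on toy data (hypothetical keys and values; field numbers shrunk from $14/$12/$2 to keep the sketch small):

```shell
dir=$(mktemp -d)
printf 'k1\t10.99\nk2\t5.49\n' > "$dir/a.tab"   # stands in for aecprda12.tab (key, price)
printf 'id1\tk1\nid2\tk2\n'    > "$dir/b.csv"   # stands in for output1.csv  (id, key)
# first pass fills k[] from a.tab; second pass prints matches from b.csv
awk -F"\t" 'NR==FNR{k[$1]=$2; next}
            k[$2] { print $2, $1, k[$2] }' "$dir/a.tab" "$dir/b.csv"
# prints: k1 id1 10.99  and  k2 id2 5.49
rm -r "$dir"
```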
I am assuming that you don't know whether every item in the first file has a corresponding item in the second file, and that you want only "matching" items. There is indeed a good way to do this in awk. Create the following script (as a text file; call it myJoin.txt):
BEGIN {
FS="\t"
}
# loop around as long as the total number of records read
# is equal to the number of records read in this file
# in other words - loop around the first file only
NR==FNR {
a[$2]=$1 # create one array element for each $1/$2 pair
next
}
# loop around all the elements of the second file:
# since we're done processing the first file
{
# see if the associative array element exists:
gsub(/ /,"",$14) # remove all spaces from $14
if (a[$14]) { # see if the value in $14 was seen in the first file
# print out the three values you care about:
print $12 " " a[$14] " " $14
}
}
Now execute this with
awk -f myJoin.txt file1 file2
Seems to work for me...