Parsing in Linux

Parsing in Linux - linux

I want to parse the compute zones in open-stack command output as below
+-----------------------+----------------------------------------+
| Name | Status |
+-----------------------+----------------------------------------+
| internal | available |
| |- controller | |
| | |- nova-conductor | enabled :-) 2016-07-07T08:09:57.000000 |
| | |- nova-consoleauth | enabled :-) 2016-07-07T08:10:01.000000 |
| | |- nova-scheduler | enabled :-) 2016-07-07T08:10:00.000000 |
| | |- nova-cert | enabled :-) 2016-07-07T08:10:00.000000 |
| Compute01 | available |
| |- compute01 | |
| | |- nova-compute | enabled :-) 2016-07-07T08:09:53.000000 |
| Compute02 | available |
| |- compute02 | |
| | |- nova-compute | enabled :-) 2016-07-07T08:10:00.000000 |
| nova | not available |
+-----------------------+----------------------------------------+
i want to parse the result as below, taking only nodes having nova-compute
Compute01;Compute02
I used below command:
nova availability-zone-list | awk 'NR>2 {print $2}' | grep -v '|' | tr '\n' ';'
but it returns output like this
;internal;Compute01;Compute02;nova;;

In Perl (and written rather more verbosely than is really necessary):
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $node; # Store current node name
my #compute_nodes; # Store known nova-compute nodes
while (<>) { # Read from STDIN
# If we find the start of line, followed by a pipe, a space and
# a series of word characters...
if (/^\| (\w+)/) {
# Store the series of word characters (i.e. the node name) in $node
$node = $1;
}
# If we find a line that contains "nova-compute", add the current
# node name in #compute_nodes
push #compute_nodes, $node if /nova-compute/;
}
# Print out all of the values in #compute_nodes
say join ';', #compute_nodes;

I detest one-line programs except for the most simple of applications. They are unnecessarily cryptic, they have none of the usual programming support, and they are stored only in the terminal buffer. Want to do the same thing tomorrow? You must start coding again
Here's a Perl solution. Run it as
$ perl nova-compute.pl command-output.txt
use strict;
use warnings 'all';
my ($node, #nodes);
while ( <> ) {
$node = $1 if /^ \| \s* (\w+) /x;
push #nodes, $node if /nova-compute/;
}
print join(';', #nodes), "\n";
output
Compute01;Compute02
Now all of that is saved on disk. It may be run again at any time, modified for similar results, or fixed if you got it wrong. It is also readable. No contest

$ nova availability-zone-list | awk '/^[|] [^|]/{node=$2} node && /nova-compute/ {s=s ";" node} END{print substr(s,2)}'
Compute01;Compute02
How it works:
/^[|] [^|]/{node=$2}
Any time a line begins with | followed by space followed by a character not |, then save the second field as a node name.
node && /nova-compute/ {s=s ";" node}
If node is non-empty and the current line contains nova-compute, then append node to the string s.
END{print substr(s,2)}
After we have read all the lines, print out string s minus its first character which is a superfluous ;.

Related

Bash: For Loop & save each output as a new column in a csv

I have a folder with a mixture of files types (.bam, .bam.bai, and .log). I created a for loop to perform two commands on each of the .bam files. My current code direct the output of each command into a separate csv files, because I could not figure out how to direct the outputs to separate columns.
TYIA!
Question 1
I want to export the output from the commands into the same csv. How can I alter my code so that the output from my first command is saved as the first column of a csv, and the output from my second command is saved as the second column of the same csv.
Question 2
What is the name of the syntax used to select files in a for loop? For instance, the * in *.bam represents a wildcard. Is this regex? I had a tough time trying to alter this so that only *.bam files were selected for the for loop (and .bam.bai were excluded). I ended up with *[.bam] by guessing and empirically testing my outputs. Are there any websites that do a good job of explaining this syntax and provide lots of examples (coder level: newbie)
Current Code
> ~/Desktop/Sample_Names.csv
> ~/Desktop/Read_Counts.csv
echo "Sample" | cat - > ~/Desktop/Sample_Names.csv
echo "Total_Reads" | cat - > ~/Desktop/Read_Counts.csv
for file in *[.bam]
do
samtools view -c $file >> ~/Desktop/Read_Counts.csv
gawk -v RS="^$" '{print FILENAME}' $file >> ~/Desktop/Sample_Names.csv
done
Current Outputs (truncated)
>Sample_Names.csv
| Sample |
|--------------|
| B40-JV01.bam |
| B40-JV02.bam |
| B40-JV03.bam |
>Read_Counts.csv
| Total_Reads |
|-------------|
| 3835555 |
| 4110463 |
| 144558 |
Desired Output
>Combined_Outputs.csv
| Sample | Total_Reads |
|--------------|-------------|
| B40-JV01.bam | 3835555 |
| B40-JV02.bam | 4110463 |
| B40-JV03.bam | 144558 |

Something like
echo "Sample,Total_Reads" > Combined_Outputs.csv
for file in *.bam; do
printf "%s,%s\n" "$file" "$(samtools view -c "$file")"
done >> Combined_Outputs.csv
Print one line for each file, and move the output redirection outside of the loop for efficiency.

Convert number to gigabytes (sed, awk, sh, bash)

I have the following multi-line output:
Volume Name: GATE02-SYSTEM
Size: 151934468096
Volume Name: GATE02-S
Size: 2000112582656
Is it possible to convert strings like:
Volume Name: GATE02-SYSTEM
Size: 141.5 Gb
Volume Name: GATE02-S
Size: 1862.75 Gb
with sed/awk? I looked for answers for the only number output like:
echo "151934468096" | awk '{ byte =$1 /1024/1024**2 ; print byte " GB" }'
but can't figure out how to apply it to my case.
EDIT: I'd like to clarify my case. I have the following one-liner:
esxcfg-info | grep -A 17 "\\==+Vm FileSystem" | grep -B 15 "[I]s Accessible.*true" | grep -B 4 -A 11 "Head Extent.*ATA.*$" | sed -r 's/[.]{2,}/: /g;s/^ {12}//g;s/\|----//g' | egrep -i "([V]olume Name|^[S]ize:)"
which generates output above. I want to know what to add more after the last command to get the desired output.

Using (GNU) sed and bc:
... | sed 's#Size: \([0-9]*\)#printf "Size: %s" $(echo "scale = 2; \1 / 1024 ^ 3" | bc)GiB#e'
^^^^ ^^^^^^^^^^ ^^^^^^ ^^^^^^^^^ ^^ ^^^^^^^^^^ ^^ ^
1 2 3 4 5 6 7 8
Matching Size: lines.
Capturing the number of bytes (potentially vulnerable if empty or zero, strengthen for production use)
Replacing with printf output
scale = sets the number of decimals for bc
Using the sed capture group (number of bytes)
The mathematical operation on the byte number
Piping all that to bc for evaluation
Telling sed to execute the "replace" part of the s statement in a subshell.
Another option is numfmt (where available, since GNU coreutils 8.21):
... | sed 's#Size: \([0-9]*\)#printf "%s" $(echo \1 | numfmt --to=iec-i --suffix=B)#e'
This doesn't give you control over the decimals, but "works" for all sizes (i.e. gives TiB where the number is big enough, and does not truncate to "0.00 GiB" for too-small numbers).

printf "Volume Name: GATE02-SYSTEM\nSize: 151934468096\nVolume Name: GATE02-S\nSize: 2000112582656" |
awk '/^Size: / { $2 = $2/1024/1024**2 " Gb" }1'

Use the same examples in multiple Cucumber scenario outlines

How do I structure tests for the following program:
I'm writing a unit test framework for simulated combinatorial circuits. This framework will support multiple digital logic simulators (JLS, Logisim, TKGate, etc.) Thus, each test should be run once for each supported simulator.
My first idea is to do something like this:
Scenario Outline: Test of valid circuit
when I run DLUnit with "testCircuit1.<type> testFile"
Then I should see "All tests (4) passed." on stdout
Examples:
| type |
| jls | # extension for JLS files
| circ | # extension for Logisim files
| v | # extension for tkgate files
Scenario Outline: Test of invalid circuit
when I run DLUnit with "brokenCircuit1.<type> testFile"
Then I should see "There were failures" on stdout
Examples:
| type |
| jls |
| circ |
| v |
# Many more tests to follow
Although this will technically work, it results in feature code that may be difficult to maintain: Each feature is followed by a list of supported simulators. Adding support for an additional simulator would require adding the same line to each test.
I could also create jls.feature, then use sed to automatically create logisim.feature and tkgate.feature; but, I'd like to avoid that type of complexity if Cucumber offers a simpler built-in solution.

Perhaps you could do something like this in RSpec:
describe DLUnit do
[
'jls', 'testCircuit1', true,
'jls', 'brokenCircuit1', false,
# ...
].each do |simulator, circuit, expected_validity|
it "with the #{simulator} simulator finds the #{circuit} circuit #{expected_validity ? 'valid' : 'invalid' }" do
actual_output = DLUnit.run "#{circuit.simulator}" # obviously I'm making this part up
expect(actual_output).to include(expected_validity ? 'passed' : 'failures')
end
end
end
The test code itself is a little involved, but you only have to write it once, and the RSpec output should be clear.

Not much of an upgrade to what you already have but how about merging into one scenario outline.
Addition of new simulator you will need to make two changes in one examples table. Also you can make it more configurable in terms of changes in a different valid test file for two simulators or different result messages. But same can be done for your existing solution all be it you will have to do change the steps and examples.
Scenario Outline: Testing circuit
when I run <kind> DLUnit with "<circuit>.<type> testFile"
Then I should see <result> on stdout
Examples:
| type | kind | circuit | result |
| jls | valid | testCircuit1 | All tests (4) passed |
| jls | invalid | brokenCircuit1 | There were failures |
| circ | valid | testCircuit1 | All tests (4) passed |
| circ | invalid | brokenCircuit1 | There were failures |
| v | valid | testCircuit1 | All tests (4) passed |
| v | invalid | brokenCircuit1 | There were failures |

Array placeholder in Gherkin syntax

Hi I am trying to write express a set of requirements in gherkin syntax, but it requires a good deal of repetition. I saw here that I can use placeholders which would be perfect for my task, however some of the data in my Given and in my then are collections. How would I go about representing collections in the examples?
Given a collection of spaces <spaces>
And a <request> to allocate space
When I allocate the request
Then I should have <allocated_spaces>
Examples:
| spaces | request | allocated_spaces |
| ? | ? | ? |

A bit hacky, but you can delimit a string:
Given a collection of spaces <spaces>
And a <request> to allocate space
When I allocate the request
Then I should have <allocated_spaces>
Examples:
| spaces | request | allocated_spaces |
| a,b,c | ? | ? |
Given(/^a collection of spaces (.*?)$/) do |arg1|
collection = arg1.split(",") #=> ["a","b","c"]
end

You can use Data Tables. I never try to have param in data table before, but in theory it should work.
Given a collection of spaces:
| space1 |
| space2 |
| <space_param> |
And a <request> to allocate space
When I allocate the request
Then I should have <allocated_spaces>
Examples:
| space_param | request | allocated_spaces |
| ? | ? | ? |
The given data table would be an instance of Cucumber::Ast::Table, checkout the rubydoc for its API.

Here's an example, again using split, but without the regex:
Scenario Outline: To ensure proper allocation
Given a collection of spaces <spaces>
And a <request> to allocate space
When I allocate the request
Then I should have <allocated_spaces>
Examples:
| spaces | request | allocated_spaces |
| "s1, s2, s3" | 2 | 2 |
| "s1, s2, s3" | 3 | 3 |
| "s1, s2" | 3 | 2 |
I use cucumber-js so this is what the code may look like:
Given('a collection of spaces {stringInDoubleQuotes}', function (spaces, callback) {
// Write code here that turns the phrase above into concrete actions
this.availableSpaces = spaces.split(", ");
callback();
});
Given('a {int} to allocate space', function (numToAllocate, callback) {
this.numToAllocate = numToAllocate;
// Write code here that turns the phrase above into concrete actions
callback();
});
When('I allocate the request', function (callback) {
console.log("availableSpaces:", this.availableSpaces);
console.log("numToAllocate:", this.numToAllocate);
// Write code here that turns the phrase above into concrete actions
callback();
});
Then('I should have {int}', function (int, callback) {
// Write code here that turns the phrase above into concrete actions
callback(null, 'pending');
});

Understanding dequeue_rt_stack() for RT scheduling class linux

enqueue_task_rt function in ./kernel/sched/rt.c is responsible for queuing the task to the run queue. enqueue_task_rt contains call to enqueue_rt_entity which calls dequeue_rt_stack. Most part of the code seems logical but I am a bit lost because of the function dequeue_rt_stack unable to understand what it does. Can somebody tell what is the logic that I am missing or suggest some good read.
Edit: The following is the code for dequeue_rt_stack function
struct sched_rt_entity *back = NULL;
/* macro for_each_sched_rt_entity defined as
for(; rt_se; rt_se = rt_se->parent)*/
for_each_sched_rt_entity(rt_se) {
rt_se->back = back;
back = rt_se;
}
for (rt_se = back; rt_se; rt_se = rt_se->back) {
if (on_rt_rq(rt_se))
__dequeue_rt_entity(rt_se);
}
More specifically, I do not understand why there is a need for this code:
for_each_sched_rt_entity(rt_se) {
rt_se->back = back;
back = rt_se;
}
What is its relevance.

When a task is to be added to some queue, it must first be removed from the queue that it currently is on, if any.
With the group scheduler, a task is always at the lowest level of the tree, and might have multiple ancestors:
NULL
^
|
+-----parent------+
| |
| top-level group |
| |
+-----------------+
^ ^_____________
| \
+-----parent------+ +-----parent------+
| | | |
| mid-level group | | other group | ...
| | | |
+-----------------+ +-----------------+
^ ^_____________
| \
+-----parent------+ +-----------------+
| | | |
| task | | other task | ...
| | | |
+-----------------+ +-----------------+
To remove the task from the tree, it must be removed from all groups' queues, and this must be done first at the top-level group (otherwise, the scheduler might try to run an already partially-removed task). Therefore, dequeue_rt_stack uses the back pointers to constructs a list in the opposite direction:
NULL back
^ |
| V
+-parent----------+
| |
| top-level group |
| |
+----------back---+
^ | ^_____________
| V \
+-parent----------+ +-----parent------+
| | | |
| mid-level group | | other group | ...
| | | |
+----------back---+ +-----------------+
^ | ^_____________
| V \
+-parent----------+ +-----------------+
| | | |
| task | | other task | ...
| | | |
+----------back---+ +-----------------+
|
V
NULL
That back list can then be used to walk down the tree to remove the entities in the correct order.

I am a fresh man in kernel hacking. This is my first time to answer linux kernel question.
Maybe this help to you.
I read the source code. I think it maybe relates to group scheduling.
When kernel have these codes:
#ifdef CONFIG_RT_GROUP_SCHED
It represents that we can collect some schedule entities in to one schduling group.
static void enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
{
dequeue_rt_stack(rt_se);
for_each_sched_rt_entity(rt_se)
__enqueue_rt_entity(rt_se, head);
}
Function dequeue_rt_stack(rt_se) extracts all the scheduling entities belong to the group, then add them to run queue.
Hierarchical group I/O scheduling
CFS group scheduling

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Parsing in Linux - linux

Related

Bash: For Loop & save each output as a new column in a csv

Convert number to gigabytes (sed, awk, sh, bash)

Use the same examples in multiple Cucumber scenario outlines

Array placeholder in Gherkin syntax

Understanding dequeue_rt_stack() for RT scheduling class linux

Categories

Resources