Signature smartmatching misunderstanding

While reading about and trying out signature smartmatching, I ran into something strange.
Executing the following smartmatching signature pairs:
my @sigs = :($a, $b), :($a, @b), :($a, %b);
my @signatures_to_check = :($, $), :($, @), :($, %);
my $c = 0;
for @sigs -> $sig {
    for @signatures_to_check -> $s {
        $c++;
        if $sig ~~ $s {
            say " [ $c ] " ~ $sig.gist ~ ' match ' ~ $s.gist;
            next;
        }
        say " [ $c ] " ~ $sig.gist ~ ' do NOT match ' ~ $s.gist;
    }
    say "\n" ~ '#' x 40 ~ "\n";
}
I've got the following results:
[ 1 ] ($a, $b) match ($, $)
[ 2 ] ($a, $b) do NOT match ($, @)
[ 3 ] ($a, $b) do NOT match ($, %)
########################################
[ 4 ] ($a, @b) match ($, $)
[ 5 ] ($a, @b) match ($, @)
[ 6 ] ($a, @b) do NOT match ($, %)
########################################
[ 7 ] ($a, %b) match ($, $)
[ 8 ] ($a, %b) do NOT match ($, @)
[ 9 ] ($a, %b) match ($, %)
I've tried to explain cases [ 4 ] and [ 7 ] to myself, but I've failed!
Can somebody explain it to me?

How many things is a value that does the Positional role? Or one that does the Associative role?
The hint is in "a value that does..." and "one that does...". It's a single thing.
So, yes, a given Array or Hash has zero, one, two, or more elements. But it is, in itself, a single thing.
$ indicates a scalar symbol or value. What is the constraint on a scalar symbol or value? It is that it binds to a single thing at a time (even if that thing itself can contain multiple elements).

Related

Compare two strings and find mismatch and match and count them both in perl

Compare two strings, find the matches and mismatches, and count them both.
string1 = "SEQUENCE"
string2 = "SEKUEAEE"
I want output like this, with the match and mismatch counts:
'SS' match 1
'EE' match 3
'UU' match 1
'QK' mismatch 1
'NA' mismatch 1
'CE' mismatch 1
Here's a solution in old Perl. Also works with however many strings you want
use warnings;
use strict;
use List::AllUtils qw( mesh part count_by pairs );
my @strings = ("SEQUENCES", "SEKUEAEES", "SEKUEAEES");
my $i = 0;
print join "",
map { $_->[0] . " " . ($_->[1] > 1 ? 'match' : 'mismatch') . " " . $_->[1] ."\n" }
pairs
count_by { $_ }
map { join "", @$_ }
part { int($i++/scalar @strings) }
&mesh( @{[ map { [ split // ] } @strings ]} )
;
And here for comparison, analogous code in Perl 6.
my @strings = "SEQUENCES", "SEKUEAEES", "SEKUEAEES";
([Z] @strings>>.comb)
.map({ .join })
.Bag
.map({ "{.key} { .value > 1 ?? 'match' !! 'mismatch' } {.value}\n" })
.join
.say;
Isn't that just pretty?
A solution that works for any number of strings.
use List::Util qw(max);
use Perl6::Junction qw(all);
my @strings = qw(SEQUENCE SEKUEAEE);
my (%matches, %mismatches);
for my $i (0 .. -1 + max map { length } @strings) {
    my @c = map { substr $_, $i, 1 } @strings;
    if ($c[0] eq all @c) {
        $matches{join '', @c}++;
    } else {
        $mismatches{join '', @c}++;
    }
}
for my $k (keys %matches) {
    printf "'%s' match %d\n", $k, $matches{$k};
}
for my $k (keys %mismatches) {
    printf "'%s' mismatch %d\n", $k, $mismatches{$k};
}
__END__
'SS' match 1
'UU' match 1
'EE' match 3
'QK' mismatch 1
'NA' mismatch 1
'CE' mismatch 1
Using the non-core but very handy List::MoreUtils module:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use List::MoreUtils qw/each_array/;
sub count_matches {
die "strings must be equal length!" unless length $_[0] == length $_[1];
my @letters1 = split //, $_[0];
my @letters2 = split //, $_[1];
my (%matches, %mismatches);
my $iter = each_array @letters1, @letters2;
while (my ($c1, $c2) = $iter->()) {
if ($c1 eq $c2) {
$matches{"$c1$c2"} += 1;
} else {
$mismatches{"$c1$c2"} += 1;
}
}
say "'$_' match $matches{$_}" for sort keys %matches;
say "'$_' mismatch $mismatches{$_}" for sort keys %mismatches;
}
count_matches qw/SEQUENCE SEKUEAEE/;

Cascade in Rebol

In the Logo language, cascade is a procedure that composes a function with itself several times (it is almost like fold in functional languages).
Example:
add 4 add 4 add 4 5 --> cascade 3 [add 4 ?1] 5 == 17
2^8 --> cascade 8 [?1 * 2] 1
fibonacci 5 --> (cascade 5 [?1 + ?2] 1 [?1] 0)
factorial 5 --> (cascade 5 [?1 * ?2] 1 [?2 + 1] 1)
General notation for multi-input cascade, in Logo:
(cascade how many function1 start1 function2 start2 ...) with:
function1 -> ?1 ,
function2 -> ?2 ...
Cascade returns the final value of ?1.
In Rebol:
cascade1: func [howmany function1 start1] [....]
cascade2: func [howmany function1 start1 function2 start2] [....]
How can I write cascade1 and cascade2 in Rebol?
My answer uses REBOL 3, but could be backported to 2 without too much trouble. (I'd have done it in REBOL 2, but I don't have REBOL 2 on my system and haven't used it in a long time.) This implements cascade fully (i.e., with any number of "functions") and does it in an idiomatically REBOL kind of way: It uses a simple DSL.
cascade: funct [
count [integer!]
template [block!]
/only "Don't reduce TEMPLATE"
/local arg fun-block
][
param-list: copy []
param-number: 1
arg-list: copy []
fun-list: copy []
template-rules: [
some [
copy fun-block block! (
append param-list to word! rejoin ["?" ++ param-number]
append fun-list fun-block
)
copy arg any-type! (
append arg-list :arg
)
]
end
]
unless only [template: reduce template]
unless parse template template-rules [
do make error! rejoin ["The template " mold/flat template " contained invalid syntax."]
]
while [! tail? fun-list] [
fun-list: change fun-list func param-list first fun-list
]
fun-list: head fun-list
loop count [
temp-args: copy []
for f 1 length? fun-list 1 [
append/only temp-args apply pick fun-list f arg-list
]
arg-list: copy temp-args
]
first arg-list
]
Using it is simple:
print cascade 23 [[?1 + ?2] 1 [?1] 0]
This correctly gives the value 46368 from one of the cascade examples given in the Logo cascade documentation linked by the questioner. The syntax of the DSL should be brutally obvious. It's a series of blocks followed by starting arguments. The outer block is reduced unless the /only refinement is used. A block itself will work just fine as an argument, e.g.,
cascade 5 [[?1] [1 2 3]]
This is because the first block is interpreted as a "function", the second as a starting argument, the third as a "function" and so on until the template block is exhausted.
As far as I can tell, this is a complete (and rather elegant) implementation of cascade. Man, I love REBOL. What a shame this language didn't take off.
Using bind, which binds words to a specified context (in this case the function's local context), and the compose function, I get:
cascade: func [
times
template
start
] [
use [?1] [
?1: start
template: compose [?1: (template)]
loop times bind template '?1
?1
]
]
cascade 8 [?1 * 2] 1
== 256
cascade 3 [add 4 ?1] 5
== 17
val: 4
cascade 3 [add val ?1] 5
== 17
cascade2: func [
times
template1 start1
template2 start2
/local temp
] [
use [?1 ?2] [ ; to bind only ?1 and ?2 and to avoid variable capture
?1: start1
?2: start2
loop
times
bind
compose [temp: (template1) ?2: (template2) ?1: temp]
'?1
?1
]
]
cascade2 5 [?1 * ?2] 1 [?2 + 1] 1
== 120
cascade2 5 [?1 + ?2] 1 [?1] 0
== 8
Here is a somewhat working cascade in Rebol. It won't work with op! datatype--i.e. +, *--but it will work with add and multiply. You may want to check out the higher order functions script to see some other examples. I haven't had time to write cascade2 yet
cascade: func [
times [integer!]
f [any-function!]
partial-args [series!]
last-arg
][
expression: copy reduce [last-arg]
repeat n times [
insert head expression partial-args
insert head expression get 'f
]
expression
]
With your examples:
probe cascade 3 :add [4] 5
print cascade 3 :add [4] 5
will result in:
[make action! [[
"Returns the addition of two values."
value1 [scalar! date!]
value2
]] 4 make action! [[
"Returns the addition of two values."
value1 [scalar! date!]
value2
]] 4 make action! [[
"Returns the addition of two values."
value1 [scalar! date!]
value2
]] 4 5]
17
and
probe cascade 8 :multiply [2] 1
print cascade 8 :multiply [2] 1
Will result in:
[make action! [[
"Returns the first value multiplied by the second."
value1 [scalar!]
value2 [scalar!]
]] 2 make action! [[
"Returns the first value multiplied by the second."
value1 [scalar!]
value2 [scalar!]
]] 2 make action! [[
"Returns the first value multiplied by the second."
value1 [scalar!]
value2 [scalar!]
]] 2 make action! [[
"Returns the first value multiplied by the second."
value1 [scalar!]
value2 [scalar!]
]] 2 make action! [[
"Returns the first value multiplied by the second."
value1 [scalar!]
value2 [scalar!]
]] 2 make action! [[
"Returns the first value multiplied by the second."
value1 [scalar!]
value2 [scalar!]
]] 2 make action! [[
"Returns the first value multiplied by the second."
value1 [scalar!]
value2 [scalar!]
]] 2 make action! [[
"Returns the first value multiplied by the second."
value1 [scalar!]
value2 [scalar!]
]] 2 1]
256

If...else if...else in REBOL

I've noticed that REBOL doesn't have a built in if...elsif...else syntax, like this one:
theVar: 60
{This won't work}
if theVar > 60 [
print "Greater than 60!"
]
elsif theVar == 3 [
print "It's 3!"
]
elsif theVar < 3 [
print "It's less than 3!"
]
else [
print "It's something else!"
]
I have found a workaround, but it's extremely verbose:
theVar: 60
either theVar > 60 [
print "Greater than 60!"
][
either theVar == 3 [
print "It's 3!"
][
either theVar < 3 [
print "It's less than 3!"
][
print "It's something else!"
]
]
]
Is there a more concise way to implement an if...else if...else chain in REBOL?
The construct you would be looking for would be CASE. It takes a series of conditions and code blocks to evaluate, evaluating the blocks only if the condition is true and stopping after the first true condition is met.
theVar: 60
case [
theVar > 60 [
print "Greater than 60!"
]
theVar == 3 [
print "It's 3!"
]
theVar < 3 [
print "It's less than 3!"
]
true [
print "It's something else!"
]
]
As you see, getting a default is as simple as tacking on a TRUE condition.
Also: if you wish, you can have all of the cases run and not short circuit with CASE/ALL. That prevents case from stopping at the first true condition; it will run them all in sequence, evaluating any blocks for any true conditions.
And a further option is to use all
all [
expression1
expression2
expression3
]
and as long as each expression returns a true value, they will continue to be evaluated.
so,
if all [ .. ][
... do this if all of the above evaluate to true.
... even if not all true, we got some work done :)
]
and we also have any
if any [
expression1
expression2
expression3
][ this evaluates if any of the expressions is true ]
You can use the case construct for this, or the switch construct.
case [
condition1 [ .. ]
condition2 [ ... ]
true [ catches everything , and is optional ]
]
The case construct is used if you're testing for different conditions. If you're looking at a particular value, you can use switch
switch val [
val1 [ .. ]
val2 [ .. ]
val3 val4 [ either or matching ]
]

Checking the number of param in bash shell script

I check the number of params in a bash shell script as follows:
#! /bin/bash
usage() {
echo "Usage: $0 <you need to specify at least 1 param>"
exit -1
}
[ x = x$1 ] && usage
Here, usage is executed when the [ x = x$1 ] condition is true, that is, when $1 is empty.
My question is about the expression [ x = x$1 ], which looks a lot like a conditional expression but which I never really thought about. Is x treated as a literal? And how come we can use = for comparison? Shouldn't it typically be something like ==?
Could anybody please fill the void here?
The [ x = y ] condition compares strings (with a single =).
x$1 concatenates the two strings x and $1.
So, if $1 is empty, x$1 equals x, and [ x = x ] will be true as a result.
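To make the concatenation concrete, here is a minimal sketch. The function name first_arg_is_empty is hypothetical and exists only for illustration, and the expansion is quoted so that arguments containing spaces do not break the test.

```shell
#!/bin/bash
# first_arg_is_empty is a hypothetical helper, not part of the original script.
# It succeeds (returns 0) when its first argument is missing or empty.
first_arg_is_empty() {
    # With no argument, "x$1" expands to just "x", so the test becomes [ x = x ].
    [ x = "x$1" ]
}

first_arg_is_empty       && echo "no argument: x\$1 is 'x'"
first_arg_is_empty ""    && echo "empty argument: x\$1 is 'x'"
first_arg_is_empty hello || echo "argument given: x\$1 is 'xhello'"
```

The function returns the status of the comparison itself, so it can be combined with && or || just like the original one-liner.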
[ x = x$1 ] is error prone, don't use it. Do this instead [ "$1" ]
The difference is that if $1 contains a space or other special characters, your script will crash, for example:
$ a='hello world'
$ [ x = x$a ] && echo works
-bash: [: too many arguments
To fix this you could do [ x = x"$1" ], but [ "$1" ] is shorter, so what's the point.
[] expressions are used extensively in shell scripts, so I recommend reading help test. [ is a synonym for the "test" builtin, except that its last argument must be a literal ] to match the opening [. There you will find an explanation of the differences between the = and == operators.
Finally, literal text in conditions is evaluated to true if not empty. Some more examples:
[ x ] # true
[ abc ] # true
[ a = a ] # true
[ a = x ] # false
[ '' ] # false
[ b = '' ] # false
Also common gotchas are these:
[ 0 ] # true
[ -n ] # true
[ -blah ] # true
[ false ] # true
These are true because when 0, -n, false, or anything else is the only argument, it is treated as a literal string, and any non-empty string makes the condition true.
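The one-argument rule can be checked directly. This sketch uses a hypothetical helper, truthy, that reports how [ evaluates a single word; the name and the output format are assumptions made for the example.

```shell
#!/bin/bash
# truthy is a hypothetical helper for illustration.
# It prints "true" or "false" depending on how [ evaluates its one argument.
truthy() {
    if [ "$1" ]; then
        echo true
    else
        echo false
    fi
}

# Only the empty string is reported as false; the rest are non-empty strings.
for word in 0 -n -blah false ''; do
    printf '[ %s ] is %s\n' "'$word'" "$(truthy "$word")"
done
```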
Use $# to count the number of parameters and use the OR operator instead of the AND so that usage() only gets executed if the first condition fails.
#! /bin/bash
usage() {
echo "Usage: $0 <you need to specify at least 1 param>"
exit -1
}
[ "$#" -ge 1 ] || usage
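As a sketch of counting arguments with $#, the hypothetical function require_args below takes a minimum count followed by the parameters to check. It returns a non-zero status instead of exiting, so it can be tried safely in an interactive shell; the name and message text are assumptions.

```shell
#!/bin/bash
# require_args is a hypothetical helper: the first argument is the minimum
# count, the remaining arguments are the parameters being checked.
require_args() {
    local min=$1
    shift
    # After the shift, $# is the number of remaining parameters.
    if [ "$#" -lt "$min" ]; then
        echo "Usage: need at least $min param(s), got $#" >&2
        return 1
    fi
}

require_args 1 foo && echo "ok: one param"
require_args 1     || echo "failed: no params"
```

In a real script the call would typically be require_args 1 "$@" || exit 1, which keeps arguments containing spaces intact.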

Determining the ratio of matches to non-matches of 2 primary strands? [duplicate]

I'm trying to write a Perl script that compares two aligned DNA sequences (say, 60 characters each) and then shows the ratio of matches to non-matches between them. But I'm not having much luck. If it helps I can upload my code, but it's no use. Here's an example of what I'm trying to achieve.
e.g.
A T C G T A C
| | | | | | |
T A C G A A C
So the above example has 4 matches and 3 non-matches, giving it a ratio of 4.3.
Any help would be much appreciated. thanks.
In general, please do post your code. It does help. In any case, something like this should do what you are asking:
#!/usr/bin/perl -w
use strict;
my $d1='ATCGTAC';
my $d2='TACGAAC';
my @dna1=split(//,$d1);
my @dna2=split(//,$d2);
my $matches=0;
for (my $i=0; $i<=$#dna1; $i++) {
$matches++ if $dna1[$i] eq $dna2[$i];
}
my $mis=scalar(@dna1)-$matches;
print "Matches/Mismatches: $matches/$mis\n";
Bear in mind though that the ratio of 4 to 3 is most certainly not 4.3 but ~1.3. If you post some information on your input file format I will update my answer to include lines for parsing the sequence from your file.
Normally I'd say "What have you tried" and "upload your code first" because it doesn't seem to be a very difficult problem. But let's give this a shot:
Create two arrays, one to hold each sequence:
@sequenceOne = ("A", "T", "C", "G", "T", "A", "C");
@sequenceTwo = ("T", "A", "C", "G", "A", "A", "C");
$myMatch = 0;
$myMissMatch = 0;
for ($i = 0; $i < @sequenceOne; $i++) {
my $output = "Comparing " . $sequenceOne[$i] . " <=> " . $sequenceTwo[$i];
if ($sequenceOne[$i] eq $sequenceTwo[$i]) {
$output .= " MATCH\n";
$myMatch++;
} else {
$myMissMatch++;
$output .= "\n";
}
print $output;
}
print "You have " . $myMatch . " matches.\n";
print "You have " . $myMissMatch . " mismatches\n";
print "The ratio of hits to misses is " . $myMatch . ":" . $myMissMatch . ".\n";
Of course, you'd probably want to read the sequence from something else on the fly instead of hard-coding the array. But you get the idea. With the above code your output will be:
torgis-MacBook-Pro:platform-tools torgis$ ./dna.pl
Comparing A <=> T
Comparing T <=> A
Comparing C <=> C MATCH
Comparing G <=> G MATCH
Comparing T <=> A
Comparing A <=> A MATCH
Comparing C <=> C MATCH
You have 4 matches.
You have 3 mismatches
The ratio of hits to misses is 4:3.
So many ways to do this. Here's one.
use strict;
use warnings;
my $seq1 = "ATCGTAC";
my $seq2 = "TACGAAC";
my $len = length $seq1;
my $matches = 0;
for my $i (0..$len-1) {
$matches++ if substr($seq1, $i, 1) eq substr($seq2, $i, 1);
}
printf "Length: %d Matches: %d Ratio: %5.3f\n", $len, $matches, $matches/$len;
exit 0;
Just grab the length of one of the strings (we're assuming string lengths are equal, right?), and then iterate using substr.
my #strings = ( 'ATCGTAC', 'TACGAAC' );
my $matched = 0;
foreach my $ix ( 0 .. length( $strings[0] ) - 1 ) {
$matched++
if substr( $strings[0], $ix, 1 ) eq substr( $strings[1], $ix, 1 );
}
print "Matches: $matched\n";
print "Mismatches: ", length( $strings[0] ) - $matched, "\n";
I think substr is the way to go, rather than splitting the strings into arrays.
This is probably most convenient if presented as a subroutine:
use strict;
use warnings;
print ratio(qw/ ATCGTAC TACGAAC /);
sub ratio {
my ($aa, $bb) = @_;
my $total = length $aa;
my $matches = 0;
for (0 .. $total-1) {
$matches++ if substr($aa, $_, 1) eq substr($bb, $_, 1);
}
$matches / ($total - $matches);
}
output
1.33333333333333
Bill Ruppert's right that there are many ways to do this. Here's another:
use Modern::Perl;
say compDNAseq( 'ATCGTAC', 'TACGAAC' );
sub compDNAseq {
my $total = my $i = 0;
$total += substr( $_[1], $i++, 1 ) eq $1 while $_[0] =~ /(.)/g;
sprintf '%.2f', $total / ( $i - $total );
}
Output:
1.33
Here is an approach which gives a NULL, \0, for each match in an xor comparison.
#!/usr/bin/perl
use strict;
use warnings;
my $d1='ATCGTAC';
my $d2='TACGAAC';
my $len = length $d1; # assumes $d1 and $d2 are the same length
my $matches = () = ($d1 ^ $d2) =~ /\0/g;
printf "ratio of %f", $matches / ($len - $matches);
Output: ratio of 1.333333
