Illegal expression when pattern matching with maps in Erlang

I'm trying to run sample code from Armstrong's Erlang book in the interactive shell. This is what the book shows:
1> Henry8 = #{ class => king, born => 1491, died => 1547 }.
#{ born => 1491, class => king, died => 1547 }
2> #{ born => B } = Henry8.
#{ born => 1491, class => king, died => 1547 }
However, this is what I get in the shell; the pattern matching seems to fail:
1> Henry8 = #{ class => king, born => 1491, died => 1547 }.
#{born => 1491,class => king,died => 1547}
2> #{ born => B } = Henry8.
* 1: illegal pattern

=> is for constructing a map. To pattern match a map, you need to use := instead.
1> Henry8 = #{ class => king, born => 1491, died => 1547 }.
#{born => 1491,class => king,died => 1547}
2> #{ born := B } = Henry8.
#{born => 1491,class => king,died => 1547}
3> B.
1491
This is documented in the "Maps in Patterns" section of the Erlang documentation.

The code example was preceded by the text:
Pattern Matching the Fields of a Map
The := syntax we used in a map literal can also be used as a map pattern.
And that text was preceded by a whole section explaining the differences between => and := when constructing a map, so you should have been aware of the two different syntaxes.
In the book, line 2 of the example says:
2> #{born := B} = Henry8.
yet in the shell you typed:
2> #{ born => B } = Henry8.
I suggest you reread section 5.3 a little more carefully, and also read the pertinent section of LYSE, which includes this example:
1> Pets = #{"dog" => "winston", "fish" => "mrs.blub"}.
#{"dog" => "winston","fish" => "mrs.blub"}
2> #{"fish" := CatName, "dog" := DogName} = Pets.
#{"dog" => "winston","fish" => "mrs.blub"}
7> CatName.
"mrs.blub"
8> DogName.
"winston"
Here it's possible to grab the contents of any number of items at a
time, regardless of order of keys. You'll note that elements are set
with => and matched with :=. The := operator can also be used to update an existing key in a map.
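For example (my own illustration, not from the original answer), continuing the shell session above: := in a map update only changes keys that already exist, while => would also insert new ones.
4> Henry8#{ died := 1548 }.
#{born => 1491,class => king,died => 1548}
Updating a key that is not present with := raises a badkey error, which makes it the safer choice when you only intend to modify existing entries.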

Related

Perl - How to set key based on column header when converting from xlsx to perl hash

I have an xlsx file that I'm converting into a Perl hash:
Name    Type   Symbol   Colour
JOHN    SUV    X        R
ROB     MPV    Y        B
JAMES   4X4    Y        G
Currently, I can only set the hash super-key by hard-coding the column index. I can't figure out how to choose the column based on its header.
use Data::Dumper;
use Text::Iconv;
my $converter = Text::Iconv->new("utf-8", "windows-1251");
use Spreadsheet::XLSX;
my $excel = Spreadsheet::XLSX->new('file.xlsx', $converter);

foreach my $sheet (@{$excel->{Worksheet}}) {
    if ($sheet->{Name} eq "sheet1") {
        my %data;
        for my $row (0 .. $sheet->{MaxRow}) {
            if ($sheet->{Cells}[0][$col]->{Val} eq "Symbol") {
                my $super_key = $sheet->{Cells}[$row][$col]{Val};
            }
            my $key    = $sheet->{Cells}[$row][0]{Val};
            my $value  = $sheet->{Cells}[$row][2]{Val};
            my $value2 = $sheet->{Cells}[$row][3]{Val};
            $data{$super_key}->{$key}->{$value} = ${value2};
        }
        print Dumper \%data;
    }
}
The outcome I get is:
$VAR1 = {
    '' => {
        'JOHN' => {
            'SUV' => R
I would like to have:
$VAR1 = {
    'X' => {
        'JOHN' => {
            'SUV' => R
You are missing use strict; in your Perl script. If you had it, you would have seen the error yourself.
Declaring $super_key with my inside your if clause makes the variable go out of scope as soon as you leave that block.
And using the variable $col without defining it doesn't work either.
Better (and probably working) is:
for my $row (0 .. $sheet->{MaxRow}) {
    my $super_key;
    foreach my $col (0 .. 3) {
        if ($sheet->{Cells}[0][$col]->{Val} eq "Symbol") {
            $super_key = $sheet->{Cells}[$row][$col]{Val};
        }
    }
    my $key    = $sheet->{Cells}[$row][0]{Val};
    my $value  = $sheet->{Cells}[$row][2]{Val};
    my $value2 = $sheet->{Cells}[$row][3]{Val};
    $data{$super_key}->{$key}->{$value} = ${value2};
}

Regex OR to match one or more patterns

I am currently using Python to count the number of PHP array values in a PHP script. The arrays can be multidimensional, contain key => value pairs, or simply be a list.
$arr = ['test','test',$test, $test->test,$arr[0][1][1],['test','test'=> 'another test'], array('test','test')];
$arr2 = array('test' => '','test' => '','test' => '','test' => '','test' => '','test' => '','test' => '','test' => '');
$arr3 = [ 'test' => array('test','test','test','test',
'test' => array('test') )];
Notice that I could have an array declared with square brackets or the array keyword.
Currently, I am using the following Python code:
R1 = re.findall(r"\[.*\]", String)
for L in R1:
    print(len(L.split(',')))
return None
R1 = re.findall(r"array\(.*\)", String)
for L in R1:
    print(len(L.split(',')))
return None
It seems redundant to use two loops for this. How can I combine the regular expressions to count all of the array values in the three arrays?
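One way to avoid the duplication (just a sketch of the idea, not a complete answer; it inherits the same limitations as the original patterns, e.g. with nested or multi-line arrays) is to combine the two patterns with an alternation:
R1 = re.findall(r"\[.*\]|array\(.*\)", String)
for L in R1:
    print(len(L.split(',')))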

ANTLR4 String and Comments Lexer

I'm new to ANTLR, so I hope you can explain this to me clearly.
I have a /* comment */ (BC) lexer rule in ANTLR, and I want it to behave like this:
/* sample */ => BC
/* s
a
m
p
l
e */ => BC
"" => STRING
" " => STRING
"a" => STRING
"hello world \1" => STRING
but I got this:
/* sample */
/* s
a
m
p
l
e */ => BC
""
" "
"a"
"hello world \1" => STRING
It only takes the first /* and the last */, and the same happens with my string token. Here's the code of the comment rule:
BC: '/*'.*'*/';
And the String:
STRING: '"'(~('"')|(' '|'\b'|'\f'|'r'|'\n'|'\t'|'\"'|'\\'))*'"';
Lexer rules are greedy by default, meaning they try to consume the longest matching sequence. So they stop at the last closing delimiter.
To make a rule non-greedy, use, well, nongreedy rules:
BC: '/*' .*? '*/';
This will stop at the first closing */ which is exactly what you need.
Same with your STRING. Read about it in The Definitive ANTLR4 Reference, page 285.
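For instance, a minimal non-greedy string rule (my own sketch; it drops the explicit escape alternatives from the original rule) could look like:
STRING: '"' .*? '"';
or, if you want to keep escape handling without relying on non-greedy matching:
STRING: '"' ('\\' . | ~["\\])* '"';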
Alternatively, you can use the following code fragment without the non-greedy syntax (a more general solution):
MultilineCommentStart: '/*' -> more, mode(COMMENTS);
mode COMMENTS;
MultilineComment: '*/' -> mode(DEFAULT_MODE);
MultilineCommentNotAsterisk: ~'*'+ -> more;
MultilineCommentAsterisk: '*' -> more;

Find the number of matching two characters in a string in Perl

Is there a method in Perl (not BioPerl) to find the number of occurrences of each pair of consecutive letters?
I.e., number of AA, AC, AG, AT, CC, CA, ... in a sequence like this:
$sequence = 'AACGTACTGACGTACTGGTTGGTACGA';
PS: We can do it manually using a regular expression, i.e., $GC = ($sequence =~ s/GC/GC/g), which returns the number of GC occurrences in the sequence.
I need an automated and generic way.
You had me confused for a while, but I take it you want to count the dinucleotides in a given string.
Code:
my @dinucs = qw(AA AC AG CC CA CG);
my %count;
my $sequence = 'AACGTACTGACGTACTGGTTGGTACGA';

for my $dinuc (@dinucs) {
    $count{$dinuc} = ($sequence =~ s/\Q$dinuc\E/$dinuc/g);
}
Output from Data::Dumper:
$VAR1 = {
    "AC" => 5,
    "CC" => "",
    "AG" => "",
    "AA" => 1,
    "CG" => 3,
    "CA" => ""
};
Close to TLP's answer, but without substitution:
my $sequence = 'AACGTACTGACGTACTGGTTGGTACGA';
my @dinucs = qw(AA AC AG AT CC CG);
my %count = map { $_ => 0 } @dinucs;

for my $dinuc (@dinucs) {
    while ($sequence =~ /$dinuc/g) {
        $count{$dinuc}++;
    }
}
Benchmark:
use Benchmark qw(cmpthese);   # cmpthese() comes from the Benchmark module

my $sequence = 'AACGTACTGACGTACTGGTTGGTACGA';
my @dinucs = qw(AA AC AG AT CC CG);
my %count = map { $_ => 0 } @dinucs;

my $count = -3;
my $r = cmpthese($count, {
    'match' => sub {
        for my $dinuc (@dinucs) {
            while ($sequence =~ /$dinuc/g) {
                $count{$dinuc}++;
            }
        }
    },
    'substitute' => sub {
        for my $dinuc (@dinucs) {
            $count{$dinuc} = ($sequence =~ s/\Q$dinuc\E/$dinuc/g);
        }
    }
});
Output:
                Rate substitute      match
substitute   13897/s         --       -11%
match        15622/s        12%         --
Regex works if you're careful, but there's a simple solution using substr that will be faster and more flexible.
(As of this posting, the regex solution marked as accepted will fail to correctly count dinucleotides in repeated regions like 'AAAA...', of which there are many in naturally occurring sequences.
Once you match 'AA', the regex search resumes on the third character, skipping the middle 'AA' dinucleotide. This doesn't affect the other dinucleotides since if you have 'AC' at one position, you're guaranteed not to have it in the next base, naturally. The particular sequence given in the question will not suffer from this problem since no base appears three times in a row.)
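A quick way to see the problem (my own illustration, not part of the original answer):
my $run = 'AAAA';
print scalar($run =~ s/AA/AA/g), "\n";      # 2 -- non-overlapping matches miss the middle AA
print scalar($run =~ s/A(?=A)/A/g), "\n";   # 3 -- the look-ahead counts every overlapping AA pair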
The method I suggest is more flexible in that it can count words of any length; extending the regex method to longer words is complicated since you have to do even more gymnastics with your regex to get an accurate count.
sub substrWise {
    my ($seq, $wordLength) = @_;
    my $cnt = {};
    my $w;
    for my $i (0 .. length($seq) - $wordLength) {
        $w = substr($seq, $i, $wordLength);
        $cnt->{$w}++;
    }
    return $cnt;
}

sub regexWise {
    my ($seq, $dinucs) = @_;
    my $cnt = {};
    for my $d (@$dinucs) {
        if (substr($d, 0, 1) eq substr($d, 1, 1)) {
            my $n = substr($d, 0, 1);
            $cnt->{$d} = ($seq =~ s/$n(?=$n)/$n/g); # use look-ahead
        } else {
            $cnt->{$d} = ($seq =~ s/$d/$d/g);
        }
    }
    return $cnt;
}
my @dinucs = qw(AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT);
my $sequence = 'AACGTACTGACGTACTGGTTGGTACGA';

use Test::More tests => 1;
my $rWise = regexWise($sequence, \@dinucs);
my $sWise = substrWise($sequence, 2);
$sWise->{$_} //= '' for @dinucs; # substrWise will not create keys for words not found;
                                 # this seems like desirable behavior IMO,
                                 # but I'm adding '' to show that the counts match
is_deeply($rWise, $sWise, 'verify equivalence');

use Benchmark qw(:all);
cmpthese(100000, {
    'regex' => sub {
        regexWise($sequence, \@dinucs);
    },
    'substr' => sub {
        substrWise($sequence, 2);
    }
});
Output:
1..1
ok 1 - verify equivalence
            Rate  regex substr
regex    11834/s     --   -85%
substr   76923/s   550%     --
For longer sequences (10-100 kbase), the advantage is not as pronounced, but it still wins by about 70%.

Reading Multiple Inputs from the Same Line the Scala Way

I tried to use readInt() to read two integers from the same line but that is not how it works.
val x = readInt()
val y = readInt()
With an input of 1 727 I get the following exception at runtime:
Exception in thread "main" java.lang.NumberFormatException: For input string: "1 727"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:231)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at scala.Console$.readInt(Console.scala:356)
at scala.Predef$.readInt(Predef.scala:201)
at Main$$anonfun$main$1.apply$mcVI$sp(Main.scala:11)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:75)
at Main$.main(Main.scala:10)
at Main.main(Main.scala)
I got the program to work by using readf but it seems pretty awkward and ugly to me:
val (x,y) = readf2("{0,number} {1,number}")
val a = x.asInstanceOf[Int]
val b = y.asInstanceOf[Int]
println(function(a,b))
Someone suggested that I just use Java's Scanner class (Scanner.nextInt()), but is there a nice, idiomatic way to do it in Scala?
Edit:
My solution following paradigmatic's example:
val Array(a,b) = readLine().split(" ").map(_.toInt)
Followup Question: If there were a mix of types in the String how would you extract it? (Say a word, an int and a percentage as a Double)
If you mean how would you convert val s = "Hello 69 13.5%" into a (String, Int, Double) then the most obvious way is
val tokens = s.split(" ")
(tokens(0).toString,
tokens(1).toInt,
tokens(2).init.toDouble / 100)
// (java.lang.String, Int, Double) = (Hello,69,0.135)
Or as mentioned you could match using a regex:
val R = """(.*) (\d+) (\d*\.?\d*)%""".r
s match {
case R(str, int, dbl) => (str, int.toInt, dbl.toDouble / 100)
}
If you don't actually know what data is going to be in the String, then there probably isn't much reason to convert it from a String to the type it represents, since how can you use something that might be a String and might be an Int? Still, you could do something like this:
val int = """(\d+)""".r
val pct = """(\d*\.?\d*)%""".r
val res = s.split(" ").map {
case int(x) => x.toInt
case pct(x) => x.toDouble / 100
case str => str
} // Array[Any] = Array(Hello, 69, 0.135)
Now, to do anything useful, you'll need to match on your values by type:
res.map {
case x: Int => println("It's an Int!")
case x: Double => println("It's a Double!")
case x: String => println("It's a String!")
case _ => println("It's a Fail!")
}
Or if you wanted to take things a bit further, you could define some extractors which will do the conversion for you:
abstract class StringExtractor[A] {
def conversion(s: String): A
def unapply(s: String): Option[A] = try { Some(conversion(s)) }
catch { case _ => None }
}
val intEx = new StringExtractor[Int] {
def conversion(s: String) = s.toInt
}
val pctEx = new StringExtractor[Double] {
val pct = """(\d*\.?\d*)%""".r
def conversion(s: String) = s match { case pct(x) => x.toDouble / 100 }
}
and use:
"Hello 69 13.5%".split(" ").map {
case intEx(x) => println(x + " is Int: " + x.isInstanceOf[Int])
case pctEx(x) => println(x + " is Double: " + x.isInstanceOf[Double])
case str => println(str)
}
prints
Hello
69 is Int: true
0.135 is Double: true
Of course, you can make the extractors match on anything you want (a currency mnemonic, a name beginning with 'J', a URL) and return whatever type you want. You're not limited to matching Strings either, if instead of StringExtractor[A] you make it Extractor[A, B].
You can read the line as a whole, split it using spaces and then convert each element (or the one you want) to ints:
scala> "1 727".split(" ").map( _.toInt )
res1: Array[Int] = Array(1, 727)
For more complex inputs, you can have a look at parser combinators.
The input you are describing is not two Ints but a String which just happens to contain two Ints. Hence you need to read the String, split it by the space, and convert the individual Strings into Ints, as suggested by @paradigmatic.
One way would be splitting and mapping:
// Assuming whatever is being read is assigned to "input"
val input = "1 727"
val Array(x, y) = input split " " map (_.toInt)
Or, if you have things a bit more complicated than that, a regular expression is usually good enough.
val twoInts = """^\s*(\d+)\s*(\d+)""".r
val Some((x, y)) = for (twoInts(a, b) <- twoInts findFirstIn input) yield (a, b)
There are other ways to use regex. See the Scala API docs about them.
Anyway, if regex patterns are becoming too complicated, then you should appeal to Scala Parser Combinators. Since you can combine both, you don't lose any of regex's power.
import scala.util.parsing.combinator._
object MyParser extends JavaTokenParsers {
def twoInts = wholeNumber ~ wholeNumber ^^ { case a ~ b => (a.toInt, b.toInt) }
}
val MyParser.Success((x, y), _) = MyParser.parse(MyParser.twoInts, input)
The first example was more simple, but harder to adapt to more complex patterns, and more vulnerable to invalid input.
I find that extractors provide some machinery that makes this type of processing nicer. And I think it works up to a certain point nicely.
object Tokens {
def unapplySeq(line: String): Option[Seq[String]] =
Some(line.split("\\s+").toSeq)
}
class RegexToken[T](pattern: String, convert: (String) => T) {
val pat = pattern.r
def unapply(token: String): Option[T] = token match {
case pat(s) => Some(convert(s))
case _ => None
}
}
object IntInput extends RegexToken[Int]("^([0-9]+)$", _.toInt)
object Word extends RegexToken[String]("^([A-Za-z]+)$", identity)
object Percent extends RegexToken[Double](
"""^([0-9]+\.?[0-9]*)%$""", _.toDouble / 100)
Now how to use:
List("1 727", "uptime 365 99.999%") collect {
case Tokens(IntInput(x), IntInput(y)) => "sum " + (x + y)
case Tokens(Word(w), IntInput(i), Percent(p)) => w + " " + (i * p)
}
// List[java.lang.String] = List(sum 728, uptime 364.99634999999995)
To use for reading lines at the console:
Iterator.continually(readLine("prompt> ")).collect{
case Tokens(IntInput(x), IntInput(y)) => "sum " + (x + y)
case Tokens(Word(w), IntInput(i), Percent(p)) => w + " " + (i * p)
case Tokens(Word("done")) => "done"
}.takeWhile(_ != "done").foreach(println)
// type any input and enter, type "done" and enter to finish
The nice thing about extractors and pattern matching is that you can add case clauses as necessary, you can use Tokens(a, b, _*) to ignore some tokens. I think they combine together nicely (for instance with literals as I did with done).
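For instance (my own addition, reusing the Tokens, Word, and IntInput extractors defined above), a trailing _* lets you match only the leading tokens and ignore the rest:
"sum 3 4 and some trailing text" match {
  case Tokens(Word("sum"), IntInput(x), IntInput(y), _*) => println(x + y) // prints 7
  case _ => println("no match")
}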
