Splitting string into 2 dimensional table in Lua - string

Let's say I have this string:
map_data = " *-* ; /|x|\ ; *-*-*-*; /|x|x|x|;-*-*-*-*-*; \|x|x|x|; *-*-*-*; \|x|/ ; *-* ;"
I would like to split the string into an ordered table at the semicolons. Once I have done that I would like to take each element of the table and split each character into an ordered table (nested within the first table). The idea is to create a 2 dimensional table for an ascii map.
I have tried this (but it's not working and I also suspect there is an easier way):
map_data = " *-* ; /|x|\ ; *-*-*-*; /|x|x|x|;-*-*-*-*-*; \|x|x|x|; *-*-*-*; \|x|/ ; *-* ;"
map = {}
p = 1
pp = 1
for i in string.gmatch(map_data, "(.*);") do
map[p] = {}
for ii in string.gmatch(i, ".") do
map[p][pp] = ii
pp = pp + 1
end
p = p + 1
end

To start with, the string map_data is invalid, because \ needs to be escaped. Or you could use the long string syntax [[ ... ]]:
map_data = [[ *-* ; /|x|\ ; *-*-*-*; /|x|x|x|;-*-*-*-*-*; \|x|x|x|; *-*-*-*; \|x|/ ; *-* ;]]
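The problem with the pattern (.*); is that the * modifier is greedy, so the first match runs all the way to the last semicolon. (Separately, pp is never reset, so set pp = 1 at the start of each row.) Use the lazy - modifier instead: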
for i in string.gmatch(map_data, "(.-);") do

It's been years since I've touched Lua but assuming you fix the escape character issue can't you then just do something along the lines of...
map = {{}} -- map initially contains one empty line
for i = 1, #map_data do
local c = map_data:sub(i,i)
if c == ';' then
map[#map+1] = {} -- add another line to the end of map
else
map[#map][ #map[#map] + 1] = c -- add c to last line in map
end
end

Related

convert 0 into string

I'm working on a script in Perl.
This script reads a DB and generates config files for other devices.
I have a problem with "0".
From my database, I get a 0 (int) and I want this 0 to become "0" in the config file. When I get any other value (1, 2, 3, etc.), the script generates "1", "2", "3", etc. But the 0 becomes an empty string "".
I know that in Perl:
- undef
- 0
- ""
- "0"
are false.
How can I convert a 0 to "0"? I have tried qw, qq, sprintf, $x = $x || 0, and many other solutions.
I just want to make an explicit conversion instead of an implicit one.
Thank you for your help.
If you think you have zero, but the program thinks you have an empty string, you are probably dealing with a dualvar. A dualvar is a scalar that contains both a string and a number. Perl usually returns a dualvar when it needs to return false.
For example,
$ perl -we'my $x = 0; my $y = $x + 1; CORE::say "x=$x"'
x=0
$ perl -we'my $x = ""; my $y = $x + 1; CORE::say "x=$x"'
Argument "" isn't numeric in addition (+) at -e line 1.
x=
$ perl -we'my $x = !1; my $y = $x + 1; CORE::say "x=$x"'
x=
As you can see, the value returned by !1 acts as zero when used as a number, and acts as an empty string when used as a string.
To convert this dualvar into a number (leaving other numbers unchanged), you can use the following:
$x ||= 0;
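As a quick check of the idea above, here is a minimal sketch. The helper name to_zero_if_false is made up for illustration, and note that ||= also turns undef and the empty string into 0, which may or may not be what you want for values coming from a database.

```perl
use strict;
use warnings;

# Hypothetical helper: replace any false value (undef, "", "0", 0, or the
# special false value returned by !1) with a plain numeric 0.
sub to_zero_if_false {
    my ($x) = @_;
    $x ||= 0;
    return $x;
}

my $false = !1;                                     # stringifies as ""
print "before: [$false]\n";                         # before: []
print "after:  [", to_zero_if_false($false), "]\n"; # after:  [0]
print "other:  [", to_zero_if_false(42), "]\n";     # other:  [42]
```

True values such as 42 pass through unchanged, so the helper can be applied to every column value before it is written out.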

find a substring with character index [duplicate]

How could I find a substring of a string array which corresponds to user entered index of character?
For example: if there is a string $string = 'MFHYRAKCLAMSCTLPHCAKNDHGCTH'; and it gets broken into the array @string = ( "MFHYRA", "KCLAM", "SCTLP", "HCAKNDHGCTH" ); and the user then enters the positions of A as 6 and 10, how could the two corresponding substrings be found and joined?
Essentially, what you ask is this: we have a position in the initial string. The string gets split up in substrings. In which of the substrings is the position?
Assume that
@pos = ( 5, 7, 9 ) ;
is the list of (0-based) positions for which you would like to find the substrings.
my $n = 0 ; # offset just past the substrings seen so far
my %results ;
foreach my $ss ( @substrings ) {
    $n += length( $ss ) ;
    foreach my $p ( @pos ) {
        if( ! $results{$p} and $p < $n ) { $results{$p} = $ss ; }
    }
}
foreach my $p ( @pos ) {
    print "Position $p, substring $results{$p}\n" ;
}
Clearly, this code could use some optimization. For example, there is no need to loop over the elements of @pos that already have a substring, and we should stop as soon as the last element of @pos has its substring. But for a few positions entered by the user this doesn't really matter.
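The optimizations mentioned above could look roughly like the following sketch. The function name substrings_at is hypothetical, positions are assumed to be 0-based offsets into the original string, and positions past the end of the string are simply left out of the result.

```perl
use strict;
use warnings;

# Map each 0-based position in the original string to the substring
# that contains it. Sorting the positions lets us stop early.
sub substrings_at {
    my ($substrings, $positions) = @_;
    my %result;
    my @pending = sort { $a <=> $b } @$positions;
    my $end = 0;    # offset just past the current substring
    for my $ss (@$substrings) {
        $end += length $ss;
        while (@pending && $pending[0] < $end) {
            my $p = shift @pending;
            $result{$p} = $ss;
        }
        last unless @pending;    # every position has been resolved
    }
    return \%result;
}

my @parts = ("MFHYRA", "KCLAM", "SCTLP", "HCAKNDHGCTH");
my $found = substrings_at(\@parts, [5, 9]);
print join('', map { $found->{$_} } 5, 9), "\n";    # MFHYRAKCLAM
```

The positions 6 and 10 from the question are 1-based, so they correspond to 5 and 9 here.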

How to tell apart numeric scalars and string scalars in Perl?

Perl usually converts numeric to string values and vice versa transparently. Yet there must be something which allows e.g. Data::Dumper to discriminate between both, as in this example:
use Data::Dumper;
print Dumper('1', 1);
# output:
$VAR1 = '1';
$VAR2 = 1;
Is there a Perl function which allows me to discriminate in a similar way whether a scalar's value is stored as number or as string?
A scalar has a number of different fields. When using Perl 5.8 or higher, Data::Dumper inspects if there's anything in the IV (integer value) field. Specifically, it uses something similar to the following:
use B qw( svref_2object SVf_IOK );
sub create_data_dumper_literal {
my ($x) = @_; # This copying is important as it "resolves" magic.
return "undef" if !defined($x);
my $sv = svref_2object(\$x);
my $iok = $sv->FLAGS & SVf_IOK;
return "$x" if $iok;
$x =~ s/(['\\])/\\$1/g;
return "'$x'";
}
Checks:
Signed integer (IV): ($sv->FLAGS & SVf_IOK) && !($sv->FLAGS & SVf_IVisUV)
Unsigned integer (IV): ($sv->FLAGS & SVf_IOK) && ($sv->FLAGS & SVf_IVisUV)
Floating-point number (NV): $sv->FLAGS & SVf_NOK
Downgraded string (PV): ($sv->FLAGS & SVf_POK) && !($sv->FLAGS & SVf_UTF8)
Upgraded string (PV): ($sv->FLAGS & SVf_POK) && ($sv->FLAGS & SVf_UTF8)
You could use similar tricks. But keep in mind,
It'll be very hard to stringify floating point numbers without loss.
You need to properly escape certain bytes (e.g. NUL) in string literals.
A scalar can have more than one value stored in it. For example, !!0 contains a string (the empty string), a floating point number (0) and a signed integer (0). As you can see, the different values aren't even always equivalent. For a more dramatic example, check out the following:
$ perl -E'open($fh, "non-existent"); say for 0+$!, "".$!;'
2
No such file or directory
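The flag checks listed above can be combined into a small classifier. This is only a sketch: the function name storage_kind and the order in which the flags are tested are my own choices, and a scalar that holds several values at once is reported by whichever flag is tested first.

```perl
use strict;
use warnings;
use B ();

# Report which slot of the scalar is currently marked as valid.
# The order of the checks is arbitrary: a scalar with both IOK and POK
# set is reported as 'integer' here.
sub storage_kind {
    my ($x) = @_;    # copy, as in the code above
    return 'undef' unless defined $x;
    my $flags = B::svref_2object(\$x)->FLAGS;
    return 'integer' if $flags & B::SVf_IOK();
    return 'float'   if $flags & B::SVf_NOK();
    return 'string'  if $flags & B::SVf_POK();
    return 'other';
}

print storage_kind(1),   "\n";    # integer
print storage_kind('1'), "\n";    # string
print storage_kind(1.5), "\n";    # float
```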
It is more complicated. Perl changes the internal representation of a variable depending on the context the variable is used in:
perl -MDevel::Peek -e '
$x = 1; print Dump $x;
$x eq "a"; print Dump $x;
$x .= q(); print Dump $x;
'
SV = IV(0x794c68) at 0x794c78
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 1
SV = PVIV(0x7800b8) at 0x794c78
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 1
PV = 0x785320 "1"\0
CUR = 1
LEN = 16
SV = PVIV(0x7800b8) at 0x794c78
REFCNT = 1
FLAGS = (POK,pPOK)
IV = 1
PV = 0x785320 "1"\0
CUR = 1
LEN = 16
There's no way to find this out using pure Perl. Data::Dumper uses its XS (C) implementation to achieve it. If forced to use its pure-Perl implementation, it doesn't distinguish strings from numbers if they look like decimal numbers.
use Data::Dumper;
$Data::Dumper::Useperl = 1;
print Dumper(['1',1])."\n";
#output
$VAR1 = [
1,
1
];
Based on your comment that this is to determine whether quoting is needed for an SQL statement, I would say that the correct solution is to use placeholders, which are described in the DBI documentation.
As a rule, you should not interpolate variables directly in your query string.
One simple solution that wasn't mentioned was Scalar::Util's looks_like_number. Scalar::Util is a core module since 5.7.3 and looks_like_number uses the perlapi to determine if the scalar is numeric.
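A short sketch of looks_like_number in use follows. Keep in mind that it answers "does this value look like a number?", not "how is it stored?", so the string '42' and the number 42 both pass.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Classify a few sample values by whether Perl would accept them as numbers.
for my $value (42, '42', '4.2e1', 'abc', '') {
    my $verdict = looks_like_number($value) ? 'numeric' : 'not numeric';
    printf "%-8s %s\n", "[$value]", $verdict;
}
```

For the quoting use case mentioned in the question, this is usually closer to what is wanted than an inspection of the internal flags.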
The autobox::universal module, which comes with autobox, provides a type function which can be used for this purpose:
use autobox::universal qw(type);
say type("42"); # STRING
say type(42); # INTEGER
say type(42.0); # FLOAT
say type(undef); # UNDEF
When a variable is used as a number, that causes the variable to be presumed numeric in subsequent contexts. However, the reverse isn't exactly true, as this example shows:
use Data::Dumper;
my $foo = '1';
print Dumper $foo; #character
my $bar = $foo + 0;
print Dumper $foo; #numeric
$bar = $foo . ' ';
print Dumper $foo; #still numeric!
$foo = $foo . '';
print Dumper $foo; #character
One might expect the third operation to put $foo back in a string context (reversing $foo + 0), but it does not.
If you want to check whether something is a number, the standard way is to use a regex. What you check for varies based on what kind of number you want:
if ($foo =~ /^\d+$/) { print "non-negative integer" }
if ($foo =~ /^-?\d+$/) { print "integer" }
if ($foo =~ /^\d+\.\d+$/) { print "Decimal" }
And so on.
It is not generally useful to check how something is stored internally--you typically don't need to worry about this. However, if you want to duplicate what Dumper is doing here, that's no problem:
if ((Dumper $foo) =~ /'/) {print "character";}
If the output of Dumper contains a single quote, that means it is showing a variable that is represented in string form.
You might want to try Params::Util::_NUMBER:
use Params::Util qw<_NUMBER>;
unless ( _NUMBER( $scalar ) or $scalar =~ /^'.*'$/ ) {
$scalar =~ s/'/''/g;
$scalar = "'$scalar'";
}
The following function returns true (1) if the input is numeric and false ("") if it is a string. The function also returns true (-1) if the input is a numeric Inf or NaN. Similar code can be found in the JSON::PP module.
sub is_numeric {
my $value = shift;
no warnings 'numeric';
# string & "" -> ""
# number & "" -> 0 (with warning)
# nan and inf can detect as numbers, so check with * 0
return unless length((my $dummy = "") & $value);
return unless 0 + $value eq $value;
return 1 if $value * 0 == 0; # finite number
return -1; # inf or nan
}
I don't think there is a Perl function to find the type of a value. You can find the type of a data structure (scalar, array, hash), and you can use a regex to classify a value.

Why is my word frequency counter example written in Perl failing to produce useful output?

I am very new to Perl, and I am trying to write a word frequency counter as a learning exercise.
However, I am not able to figure out the error in my code below, after working on it. This is my code:
$wa = "A word frequency counter.";
@wordArray = split("",$wa);
$num = length($wa);
$word = "";
$flag = 1; # 0 if previous character was an alphabet and 1 if it was a blank.
%wordCount = ("null" => 0);
if ($num == -1) {
print "There are no words.\n";
} else {
print "$length";
for $i (0 .. $num) {
if(($wordArray[$i]!=' ') && ($flag==1)) { # start of a new word.
print "here";
$word = $wordArray[$i];
$flag = 0;
} elsif ($wordArray[$i]!=' ' && $flag==0) { # continuation of a word.
$word = $word . $wordArray[$i];
} elsif ($wordArray[$i]==' '&& $flag==0) { # end of a word.
$word = $word . $wordArray[$i];
$flag = 1;
$wordCount{$word}++;
print "\nword: $word";
} elsif ($wordArray[$i]==" " && $flag==1) { # series of blanks.
# do nothing.
}
}
for $i (keys %wordCount) {
print " \nword: $i - count: $wordCount{$i} ";
}
}
It's neither printing "here", nor the words. I am not worried about optimization at this point, though any input in that direction would also be much appreciated.
This is a good example of a problem where Perl will help you work out what's wrong if you just ask it for help. Get used to always adding the lines:
use strict;
use warnings;
to the top of your Perl programs.
First off,
$wordArray[$i]!=' '
should be
$wordArray[$i] ne ' '
according to the Perl documentation for comparing strings and characters. Basically use numeric operators (==, >=, …) for numbers, and string operators for text (eq, ne, lt, …).
Also, you could do
@wordArray = split(" ",$wa);
instead of
@wordArray = split("",$wa);
and then you wouldn't need the wonky character checking and would never have had the problem. @wordArray will already hold the individual words, and you'll just have to count the occurrences.
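Counting the occurrences after splitting on whitespace could look like this minimal sketch. The function name word_counts and the sample sentence are made up for illustration, and punctuation is left attached to the words.

```perl
use strict;
use warnings;

# Split on runs of whitespace and tally each word in a hash.
sub word_counts {
    my ($text) = @_;
    my %count;
    $count{$_}++ for split ' ', $text;
    return \%count;
}

my $counts = word_counts("a word counter counts a word");
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

Using ' ' as the separator is the special case of split that skips leading whitespace and treats any run of whitespace as one delimiter.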
You seem to be writing C in Perl. The difference is not just one of style. By exploding a string into an array of individual characters, you cause the memory footprint of your script to explode as well.
Also, you need to think about what constitutes a word. Below, I am not suggesting that any \w+ is a word, rather pointing out the difference between \S+ and \w+.
#!/usr/bin/env perl
use strict; use warnings;
use YAML;
my $src = '$wa = "A word frequency counter.";';
print Dump count_words(\$src, 'w');
print Dump count_words(\$src, 'S');
sub count_words {
my $src = shift;
my $class = sprintf '\%s+', shift;
my %counts;
while ($$src =~ /(?<sequence> $class)/gx) {
$counts{ $+{sequence} } += 1;
}
return \%counts;
}
Output:
---
A: 1
counter: 1
frequency: 1
wa: 1
word: 1
---
'"A': 1
$wa: 1
=: 1
counter.";: 1
frequency: 1
word: 1

Fast Way to Find Difference between Two Strings of Equal Length in Perl

Given pairs of string like this.
my $s1 = "ACTGGA";
my $s2 = "AGTG-A";
# Note the string can be longer than this.
I would like to find the position and character in $s1 where it differs from $s2.
In this case the answer would be:
#String Position 0-based
# First col = Base in S1
# Second col = Base in S2
# Third col = Position in S1 where they differ
C G 1
G - 4
I can achieve that easily with substr(). But it is horribly slow.
Typically I need to compare millions of such pairs.
Is there a fast way to achieve that?
Stringwise ^ is your friend:
use strict;
use warnings;
my $s1 = "ACTGGA";
my $s2 = "AGTG-A";
my $mask = $s1 ^ $s2;
while ($mask =~ /[^\0]/g) {
print substr($s1,$-[0],1), ' ', substr($s2,$-[0],1), ' ', $-[0], "\n";
}
EXPLANATION:
The ^ (exclusive or) operator, when used on strings, returns a string composed of the result of an exclusive or on each bit of the numeric value of each character. Breaking down an example into equivalent code:
"AB" ^ "ab"
( "A" ^ "a" ) . ( "B" ^ "b" )
chr( ord("A") ^ ord("a") ) . chr( ord("B") ^ ord("b") )
chr( 65 ^ 97 ) . chr( 66 ^ 98 )
chr(32) . chr(32)
" " . " "
"  "
The useful feature of this here is that a nul character ("\0") occurs when and only when the two strings have the same character at a given position. So ^ can be used to efficiently compare every character of the two strings in one quick operation, and the result can be searched for non-nul characters (indicating a difference). The search can be repeated using the /g regex flag in scalar context, and the position of each character difference found using $-[0], which gives the offset of the beginning of the last successful match.
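Wrapped up as a reusable function, the approach described above might look like the sketch below. The name string_diffs is hypothetical, and the code assumes both strings have the same length. It also assumes the bitwise feature is not enabled: under that feature (which use v5.28 or later turns on) ^ always treats its operands as numbers and the string form is spelled ^. instead.

```perl
use strict;
use warnings;

# Return a list of [char_in_s1, char_in_s2, position] for every position
# where the two equal-length strings differ.
sub string_diffs {
    my ($s1, $s2) = @_;
    my $mask = $s1 ^ $s2;    # "\0" wherever the characters are equal
    my @diffs;
    while ($mask =~ /[^\0]/g) {
        my $pos = $-[0];     # start offset of the current match
        push @diffs, [ substr($s1, $pos, 1), substr($s2, $pos, 1), $pos ];
    }
    return @diffs;
}

print "@$_\n" for string_diffs("ACTGGA", "AGTG-A");
# C G 1
# G - 4
```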
Use binary bit ops on the complete strings.
Things like $s1 & $s2 or $s1 ^ $s2 run incredibly fast, and work with strings of arbitrary length.
I was bored on Thanksgiving break 2012 and answered the question and more. It will work on strings of equal length. It will work if they are not. I added a help, opt handling just for fun. I thought someone might find it useful.
If you are new to Perl and don't know: don't add any code to the script below __DATA__.
Have fun.
./diftxt -h
usage: diftxt [-v ] string1 string2
-v = Verbose
diftxt [-V|--version]
diftxt [-h|--help] "This help!"
Examples: diftxt test text
diftxt "This is a test" "this is real"
Place Holders: space = "·" , no charater = "ζ"
cat ./diftxt
----------- cut ✂----------
#!/usr/bin/perl -w
use strict;
use warnings;
use Getopt::Std;
my %options=();
getopts("Vhv", \%options);
my $helptxt='
usage: diftxt [-v ] string1 string2
-v = Verbose
diftxt [-V|--version]
diftxt [-h|--help] "This help!"
Examples: diftxt test text
diftxt "This is a test" "this is real"
Place Holders: space = "·" , no charater = "ζ"';
my $Version = "inital-release 1.0 - Quincey Craig 11/21/2012";
print "$helptxt\n\n" if defined $options{h};
print "$Version\n" if defined $options{V};
if (@ARGV == 0 ) {
if (not defined $options{h}) {usage()};
exit;
}
my $s1 = "$ARGV[0]";
my $s2 = "$ARGV[1]";
my $mask = $s1 ^ $s2;
# setup unicode output to STDOUT
binmode DATA, ":utf8";
my $ustring = <DATA>;
binmode STDOUT, ":utf8";
my $_DIFF = '';
my $_CHAR1 = '';
my $_CHAR2 = '';
sub usage
{
print "\n";
print "usage: diftxt [-v ] string1 string2\n";
print " -v = Verbose \n";
print " diftxt [-V|--version]\n";
print " diftxt [-h|--help]\n\n";
exit;
}
sub main
{
print "\nOrig\tDiff\tPos\n----\t----\t----\n" if defined $options{v};
while ($mask =~ /[^\0]/g) {
### redirect stderr to allow for test of empty variable with error message from substr
open STDERR, '>/dev/null';
if (substr($s2,$-[0],1) eq "") {$_CHAR2 = "\x{03B6}";close STDERR;} else {$_CHAR2 = substr($s2,$-[0],1)};
if (substr($s2,$-[0],1) eq " ") {$_CHAR2 = "\x{00B7}"};
$_CHAR1 = substr($s1,$-[0],1);
if ($_CHAR1 eq "") {$_CHAR1 = "\x{03B6}"} else {$_CHAR1 = substr($s1,$-[0],1)};
if ($_CHAR1 eq " ") {$_CHAR1 = "\x{00B7}"};
### Print verbose Data
print $_CHAR1, "\t", $_CHAR2, "\t", $+[0], "\n" if defined $options{v};
### Build difference list
$_DIFF = "$_DIFF$_CHAR2";
### Build mask
substr($s1,"$-[0]",1) = "\x{00B7}";
} ### end loop
print "\n" if defined $options{v};
print "$_DIFF, ";
print "Mask: \"$s1\"\n";
} ### end main
if ($#ARGV == 1) {main()};
__DATA__
This is the easiest form you can get
my $s1 = "ACTGGA";
my $s2 = "AGTG-A";
my @s1 = split //,$s1;
my @s2 = split //,$s2;
my $i = 0;
foreach (@s1) {
if ($_ ne $s2[$i]) {
print "$_, $s2[$i] $i\n";
}
$i++;
}
