Unpack multiple variables from sequence - nim-lang

I am expecting the code below to print chr7.
import strutils
var splitLine = "chr7 127471196 127472363 Pos1 0 +".split()
var chrom, startPos, endPos = splitLine[0..2]
echo chrom
Instead it prints @[chr7, 127471196, 127472363].
Is there a way to unpack multiple values from sequences at the same time?
And what would the tersest way to do the above be if the elements weren't contiguous? For example:
var chrom, startPos, strand = splitLine[0..1, 5]
Gives the error:
read_bed.nim(8, 40) Error: type mismatch: got (seq[string], Slice[system.int], int literal(5))
but expected one of:
system.[](a: array[Idx, T], x: Slice[system.int])
system.[](s: string, x: Slice[system.int])
system.[](a: array[Idx, T], x: Slice[[].Idx])
system.[](s: seq[T], x: Slice[system.int])
var chrom, startPos, strand = splitLine[0..1, 5]
^

This can be accomplished using macros.
import macros
macro `..=`*(lhs: untyped, rhs: tuple|seq|array): auto =
  # Check that the lhs is a tuple of identifiers.
  expectKind(lhs, nnkPar)
  for i in 0..len(lhs)-1:
    expectKind(lhs[i], nnkIdent)
  # Result is a statement list starting with an
  # assignment to a tmp variable of rhs.
  let t = genSym()
  result = newStmtList(quote do:
    let `t` = `rhs`)
  # assign each component to the corresponding
  # variable.
  for i in 0..len(lhs)-1:
    let v = lhs[i]
    # skip assignments to _.
    if $v.toStrLit != "_":
      result.add(quote do:
        `v` = `t`[`i`])
macro headAux(count: int, rhs: seq|array|tuple): auto =
  let t = genSym()
  result = quote do:
    let `t` = `rhs`
    ()
  for i in 0..count.intVal-1:
    result[1].add(quote do:
      `t`[`i`])
template head*(count: static[int], rhs: untyped): auto =
  # We need to redirect this through a template because
  # of a bug in the current Nim compiler when using
  # static[int] with macros.
  headAux(count, rhs)
var x, y: int
(x, y) ..= (1, 2)
echo x, y
(x, _) ..= (3, 4)
echo x, y
(x, y) ..= @[4, 5, 6]
echo x, y
let z = head(2, @[4, 5, 6])
echo z
(x, y) ..= head(2, @[7, 8, 9])
echo x, y
The ..= macro unpacks tuple or sequence assignments. You can accomplish the same with var (x, y) = (1, 2), for example, but ..= works for seqs and arrays, too, and allows you to reuse variables.
The head template/macro extracts the first count elements from a tuple, array, or seq and returns them as a tuple (which can then be used like any other tuple, e.g. for destructuring with let or var).

For anyone that's looking for a quick solution, here's a nimble package I wrote called unpack.
You can do sequence and object destructuring/unpacking with syntax like this:
someSeqOrTupleOrArray.lunpack(a, b, c)
[a2, b2, c2] <- someSeqOrTupleOrArray
{name, job} <- tim
tom.lunpack(job, otherName = name)
{job, name: yetAnotherName} <- john

Currently pattern matching in Nim only works with tuples. This also makes sense, because pattern matching requires a statically known arity. For instance, what should happen in your example, if the seq does not have a length of three? Note that in your example the length of the sequence can only be determined at runtime, so the compiler does not know if it is actually possible to extract three variables.
Therefore I think the solution which was linked by @def- was going in the right direction. This example uses arrays, which do have a statically known size. In this case the compiler knows the tuple arity, i.e., the extraction is well defined.
If you want an alternative (maybe convenient but unsafe) approach you could do something like this:
import macros
macro extract(args: varargs[untyped]): typed =
  ## assumes that the first expression is an expression
  ## which can take a bracket expression. Let's call it
  ## `arr`. The generated AST will then correspond to:
  ##
  ## let <second_arg> = arr[0]
  ## let <third_arg> = arr[1]
  ## ...
  result = newStmtList()
  # the first vararg is the "array"
  let arr = args[0]
  var i = 0
  # all other varargs are now used as "injected" let bindings
  for arg in args.children:
    if i > 0:
      var rhs = newNimNode(nnkBracketExpr)
      rhs.add(arr)
      rhs.add(newIntLitNode(i-1))
      let assign = newLetStmt(arg, rhs) # could be replaced by newVarStmt
      result.add(assign)
    i += 1
  #echo result.treerepr
let s = @["X", "Y", "Z"]
s.extract(a, b, c)
# this essentially produces:
# let a = s[0]
# let b = s[1]
# let c = s[2]
# check if it works:
echo a, b, c
I have not included a check for the seq length yet, so you would simply get an out-of-bounds error if the seq does not have the required length. Another warning: if the first expression is not a literal, the expression would be evaluated several times.
Note that the _ literal is allowed in let bindings as a placeholder, which means that you could do things like this:
s.extract(a, b, _, _, _, x)
This would address your splitLine[0..1, 5] example, which btw is simply not a valid indexing syntax.

Yet another option is the definesugar package:
import strutils, definesugar
# need to use splitWhitespace instead of split to prevent empty string elements in sequence
var splitLine = "chr7 127471196 127472363 Pos1 0 +".splitWhitespace()
echo splitLine
block:
  (chrom, startPos, endPos) := splitLine[0..2]
  echo chrom # chr7
  echo startPos # 127471196
  echo endPos # 127472363
block:
  (chrom, startPos, strand) := splitLine[0..1] & splitLine[5] # splitLine[0..1, 5] not supported
  echo chrom
  echo startPos
  echo strand # +
# alternative syntax
block:
  (chrom, startPos, *_, strand) := splitLine
  echo chrom
  echo startPos
  echo strand
See https://forum.nim-lang.org/t/7072 for a recent discussion.

Related

Can't evaluate at compile time - NIM

Hi I'm starting to play around with NIM
I get a "can't evaluate at compile time" error on this code:
import strutils
type
  Matrix[x, y: static[int], T] = object
    data: array[x * y, T]
var n,m: int = 0
proc readFile() =
  let f = open("matrix.txt")
  defer: f.close()
  var graph_size = parseInt(f.readline)
  var whole_graph: Matrix[graph_size, graph_size, int]
  for line in f.lines:
    for field in line.splitWhitespace:
      var cell = parseInt(field)
      whole_graph[n][m] = cell
      m = m + 1
    n = n + 1
readFile()
Any help appreciated.
Unless you absolutely positively need array in this scenario while not knowing its size at compile-time, you may want to rather swap to the seq type, whose size does not need to be known at compile-time.
Together with std/enumerate you can even save yourself the hassle of tracking the index with n and m:
import std/[strutils, enumerate]
type Matrix[T] = seq[seq[T]]
proc newZeroIntMatrix(x: int, y: int): Matrix[int] =
  result = newSeqOfCap[seq[int]](x)
  for i in 0..x-1:
    result.add(newSeqOfCap[int](y))
    for j in 0..y-1:
      result[i].add(0)
proc readFile(): Matrix[int] =
  let f = open("matrix.txt")
  defer: f.close()
  let graph_size = parseInt(f.readline)
  var whole_graph = newZeroIntMatrix(graph_size, graph_size)
  for rowIndex, line in enumerate(f.lines):
    for columnIndex, field in enumerate(line.split):
      let cell = parseInt(field)
      whole_graph[rowIndex][columnIndex] = cell
  result = whole_graph
let myMatrix = readFile()
echo myMatrix.repr
Further things I'd like to point out though are:
array[x * y, T] will not give you a 2D array, but a single array of length x*y. If you want a 2D array, you would most likely want to store this as array[x, array[y, T]]. That is assuming that you know x and y at compile-time, so your variable declaration would look roughly like this: var myMatrix: array[4, array[5, int]]
Your Matrix type has the array in its data field, so accessing the array through that Matrix type needs to be done accordingly (myMatrix.data[n][m]). That is, unless you define proper [] and []= procs for the Matrix type that do exactly that under the hood.

Is it possible to destruct sequence in Nim?

Is it possible to get first N elements in Nim? Something like:
let [a, b, ...rest] = "a/b/c".split("/")
P.S.
Use case I'm trying to parse "NYSE:MSFT" string
proc parse_esymbol*(esymbol: string): tuple[string, string] =
  let parts = esymbol.split(":")
  assert parts.len == 2, fmt"invalid esymbol '{esymbol}'"
  (parts[0], parts[1])
echo parse_esymbol("NYSE:MSFT")
You can assign variables from a tuple like this:
let (a,b) = ("a","b")
There isn't a built-in seq to tuple conversion, but you can do it with a little macro like this:
import macros, strutils
macro first[T](s:openArray[T],l:static[int]):untyped =
  # build a tuple expression (s[0], s[1], ..., s[l-1])
  result = newNimNode(nnkPar)
  for i in 0..<l:
    result.add nnkBracketExpr.newTree(s,newLit(i))
let (a,b) = "a/b/c".split('/').first(2)
There are currently at least two libraries implementing a macro like the one in this answer: unpack and definesugar.
import strutils
import unpack
block:
  [a, b, *rest] <- "a/b/c/d/e/f".split("/")
  echo a,b
  echo rest
import definesugar
block:
  (a, b, *rest) := "a/b/c/d/e/f".split("/")
  echo a,b
  echo rest
# output for both
# ab
# @["c", "d", "e", "f"]
recent discussion: https://forum.nim-lang.org/t/7072
For your specific use case though, I would implement something with https://nim-lang.github.io/Nim/strscans.html

Why does Python 3 print statement appear to alter a variable, declared later in the code, but works fine without it?

I am running Python 3.6.2 on Windows 10 and was learning about the zip() function.
I wanted to print part of the object returned by the zip() function.
Here is my code, without the troublesome print statement:
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica", "Vicky")
x = zip(a, b)
tup = tuple(x)
print(tup)
print(type(tup))
print(len(tup))
print(tup[1])
Here is my code with the troublesome print statement:
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica", "Vicky")
x = zip(a, b)
print(tuple(x)[1])
tup = tuple(x)
print(tup)
print(type(tup))
print(len(tup))
print(tup[1])
The print(tuple(x)[1]) statement appears to change the tuple 'tup' into a zero-length one and causes the print(tup[1]) to fail later in the code!
In this line, you create an iterator:
x = zip(a, b)
Within the print statement, you convert the iterator to a tuple. This tuple has 3 elements. This exhausts the iterator and anytime you call it afterwards, it will return no further elements.
Therefore, upon your creation of tup, your iterator does not return an element. Hence, you have a tuple with length 0. And of course, this will raise an exception when you try to access the element with index 1.
For testing, consider this:
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica", "Vicky")
x = zip(a, b)
tup1 = tuple(x)
tup2 = tuple(x)
print(tup1)
print(tup2)
It will give you the following result:
(('John', 'Jenny'), ('Charles', 'Christy'), ('Mike', 'Monica'))
()
This is basically what you do when creating a tuple out of an iterator twice.
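One way around this, sketched here with the same data, is to consume the iterator once and keep the resulting tuple. The names pairs, second and again are only illustrative and do not come from the question.

```python
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica", "Vicky")

# Consume the zip iterator exactly once and keep the result.
pairs = tuple(zip(a, b))

# Indexing the stored tuple does not consume anything, so it can be repeated.
second = pairs[1]
print(second)
print(len(pairs))

# Calling zip(a, b) again yields a fresh, unexhausted iterator if one is needed.
again = tuple(zip(a, b))
```

With the tuple stored in pairs, printing one element first no longer affects anything that comes later.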

multiply a list of numbers with 10

I have a list of numbers t. I want to multiply the numbers in the list with 10. Why does this not work?:
for i in t
    i = i*10
Why do I have to do this?:
for i in range(len(t)):
    t[i] = t[i]*10
Well, it doesn't work because that's not the correct syntax. You could, however, clean things up a bit with a list comprehension:
t = [x * 10 for x in t]
It doesn't work that way in Python.
The index variable (your i) of the for .. in construct is just a variable. At the start of each pass through the loop, Python assigns the corresponding value from the in sequence to the loop's index variable. It's as if you had a lot of copies of the loop body, and before each copy you put an assignment statement i = t[0], i = t[1], and so on. The index variable does not remember that it was assigned from the in sequence (your t). The index variable is not an equivalent name (an "alias") for the corresponding value of the in sequence. Changing the index variable does not affect the in sequence.
python -c 't = [1, 2, 3]
for i in t:
    i = 1
print(t)'
[1, 2, 3]
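A small sketch of the difference, assuming t is an ordinary list: rebinding the loop variable leaves the list alone, while assigning through an index changes it. The names unchanged, index and value are only illustrative.

```python
t = [1, 2, 3]

# Rebinding the loop variable only changes what the name i refers to.
for i in t:
    i = i * 10
unchanged = list(t)  # snapshot: the list still holds 1, 2, 3

# Assigning through the index stores the new value in the list itself.
for index, value in enumerate(t):
    t[index] = value * 10

print(unchanged)
print(t)
```

enumerate gives the index and the value together, which avoids the range(len(t)) form from the question.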
But you're not wrong to wonder whether it's an alias! Another language that is frequently compared with Python does work that way (quoting manual page "perlsyn"):
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
So:
perl -e '@t = (1, 2, 3); foreach $i (@t) { $i = 1; } print @t'
111

Are there any programming languages with functions with variable arguments not at the end?

Python, C++, Scheme, and others all let you define functions that take a variable number of arguments at the end of the argument list...
def function(a, b, *args):
    #etc...
...that can be called as follows:
function(1, 2)
function(1, 2, 5, 6, 7, 8)
etc... Are there any languages that allow you to do variadic functions with the argument list somewhere else? Something like this:
def function(int a, string... args, int theend) {...}
With all of these valid:
function(1, 2)
function(1, "a", 3)
function(1, "b", "c", 4)
Also, what about optional arguments anywhere in the argument list?
def function(int a, int? b, int c, int... d) {}
function(1, 2) //a=1, c=2, b=undefined/null/something, d=[]
function(1,2,3) //a=1, b=2, c=3,d=[]
function(1,2,3,4,5) //a=1, b=2, c=3, d=[4,5]
The next C++ standard (C++11) can do that with this syntax:
#include <initializer_list>
void f(int a, std::initializer_list<int> b, int c) {
    // b.begin(), b.end(), b.size() allow to access them
}
void g() {
    f(1, { 2, 3, 4, 5 }, 2);
}
BASIC has had this for ages.
For instance:
LOCATE [row%] [,[column%] [,[cursor%] [,start% [,stop%]]]]
This command sets the position (row%, column%) of the cursor, as well as specifying the cursor size (start%, stop%) and whether it is actually visible (cursor%). Here, everything in square brackets can be omitted, and if it is, that property is not changed.
A usage example:
LOCATE , 5
to change to column 5, or
LOCATE 1, , 0
to move to the first line and make the cursor invisible.
Another command where this is seen is the PUT command for writing to files. If the middle argument (the file seek position) is omitted then writing occurs just after the previous write.
Importantly, argument omission is only seen in built-in statements, and not user-defined procedures and functions.
In terms of implementation, this is what the Microsoft Basic Compiler (BC) seems to do for a call to LOCATE:
For each argument:
if an argument is omitted, push 0
if an argument is supplied, push 1, and then push the actual value
Push the argument count
Call the library function
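The push sequence above can be modelled in a few lines of Python. This is only a sketch of the steps as described, not of the actual compiler output: the function name encode_call is hypothetical, None stands in for an omitted argument, a plain list stands in for the stack, and the "argument count" is assumed to mean the number of argument slots written in the call.

```python
def encode_call(arguments):
    """Return the values pushed for one call, in push order.

    None marks an omitted argument (an assumption of this sketch).
    """
    stack = []
    for argument in arguments:
        if argument is None:
            # Omitted argument: push only the flag 0.
            stack.append(0)
        else:
            # Supplied argument: push the flag 1, then the value.
            stack.append(1)
            stack.append(argument)
    # Finally push the number of argument slots (assumed meaning of "count").
    stack.append(len(arguments))
    return stack


locate_column = encode_call([None, 5])     # models: LOCATE , 5
locate_hidden = encode_call([1, None, 0])  # models: LOCATE 1, , 0
print(locate_column)
print(locate_hidden)
```

The flag in front of each value is what lets the library routine tell an omitted argument apart from a supplied 0.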
Future versions of Ruby (1.9 and up; Ruby 1.9 is scheduled to be released at the end of January 2009) can do this.
It is however not always obvious which value gets bound to which parameter.
This is what Ruby 1.9 accepts:
0 or more mandatory arguments followed by 0 or more optional arguments followed by 0 or more mandatory arguments followed by rest arguments followed by 0 or more mandatory arguments.
Example:
def meth mand1, opt1 = :def1, o2 = :d2, *args, m2, m3
  puts %w[mand1 opt1 o2 m2 args m3].inject('') { |s, arg|
    s << "#{arg} = #{(eval arg).inspect}, "
  }.gsub /, $/, ''
end
meth :arg1, :a2, :a3
# => mand1 = :arg1, opt1 = :def1, o2 = :d2, m2 = :a2, args = [], m3 = :a3
meth :arg1, :a2, :a3, :a4
# => mand1 = :arg1, opt1 = :a2, o2 = :d2, m2 = :a3, args = [], m3 = :a4
meth :arg1, :a2, :a3, :a4, :a5
# => mand1 = :arg1, opt1 = :a2, o2 = :a3, m2 = :a4, args = [], m3 = :a5
meth :arg1, :a2, :a3, :a4, :a5, :a6
# => mand1 = :arg1, opt1 = :a2, o2 = :a3, m2 = :a5, args = [:a4], m3 = :a6
meth :arg1, :a2, :a3, :a4, :a5, :a6, :a7
# => mand1 = :arg1, opt1 = :a2, o2 = :a3, m2 = :a6, args = [:a4, :a5], m3 = :a7
As you can see, mandatory arguments are bound first, from both the left and the right. Then optional arguments get bound and if any arguments are left over, they get bundled up in an array and bound to the rest argument.
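The binding order described here can be imitated in Python to make it concrete. This is a sketch of the rule as stated in this answer, not Ruby's actual implementation; the function bind and its parameters lead, optional, trail and rest_name are hypothetical names, and strings stand in for Ruby symbols.

```python
def bind(args, lead, optional, trail, rest_name="args"):
    """Bind positional arguments in the order the answer describes.

    lead / trail: names of the mandatory parameters at each end.
    optional: list of (name, default) pairs that sit between them.
    """
    if len(args) < len(lead) + len(trail):
        raise TypeError("not enough arguments")
    split = len(args) - len(trail)
    # 1. Mandatory parameters are bound first, from the left and from the right.
    bound = dict(zip(lead, args[:len(lead)]))
    bound.update(zip(trail, args[split:]))
    # 2. Optional parameters take what is left in the middle, left to right.
    middle = list(args[len(lead):split])
    for name, default in optional:
        bound[name] = middle.pop(0) if middle else default
    # 3. Anything still left over goes to the rest parameter.
    bound[rest_name] = middle
    return bound


params = dict(lead=["mand1"],
              optional=[("opt1", "def1"), ("o2", "d2")],
              trail=["m2", "m3"])
three = bind(["arg1", "a2", "a3"], **params)
six = bind(["arg1", "a2", "a3", "a4", "a5", "a6"], **params)
print(three)
print(six)
```

The two calls correspond to the three-argument and six-argument meth calls in the Ruby listing.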
Several languages (perl, python, many others) can do named arguments, which are akin to doing optional arguments anywhere in the parameter list... (The named parameters can appear in any order, and any of them can be made optional...) They're not strictly the same, but they're close...
Not sure about varargs, though they can usually be replaced with an array/hash/list object...
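As a rough illustration of both points in Python: a named optional parameter can be skipped regardless of its position, and the variable part can be passed as an ordinary sequence. The parameter names follow the question's second example, but the signature itself is just one possible arrangement.

```python
def function(a, c, b=None, d=()):
    """a and c are required, b is optional by name, d replaces the varargs."""
    return {"a": a, "b": b, "c": c, "d": list(d)}


two = function(1, c=2)                   # b is left out entirely
three = function(1, b=2, c=3)            # b is supplied by name
five = function(1, b=2, c=3, d=[4, 5])   # extra values passed as a list
print(two)
print(three)
print(five)
```

The cost is that callers must name the arguments, so this is close to, but not the same as, purely positional optional parameters.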
Lisp's keyword parameters may be what you are looking for. I think there is a similar arrangement in Ruby. See also Lisp's function parameters overview.
I suppose PHP counts. You can do this to simulate what you are looking for. Personally, I think that would be confusing though.
function foo() {
    $args = func_get_args(); // returns an array of args
}
R (the statistical language) has it as well, and it can be in the middle of the list, but there are subtle semantics.
http://cran.r-project.org/doc/manuals/R-intro.html#The-three-dots-argument
> f1 <- function(x,...,y) { return(x+y) }
> f1(1,2)
Error in f1(1, 2) : argument "y" is missing, with no default
> f1(1,y=2)
[1] 3
These are called rest arguments, and at least C++ and Java support them. Searching for "rest arguments" turns up plenty of material, with examples such as functions that take any number of values and return their average, minimum, or maximum. Such features have many uses: I used one in code that inserts data into MySQL, so that to add a row I pass the table name as the first string, followed by the column names and then their values, without having to write it all out manually each time. Good luck!
