Clear a sequence in Nim - seq

What is the Nim equivalent of List.Clear in languages like Java or C# for sequences? I see the proc setLen listed in system, but I'm not sure it does what I want. From the description:
If the current length is greater than the new length, s will be truncated.
Does it mean that every time I set a seq's length to 0 it will create a new instance of the seq?

setLen resizes the seq without allocating a new one, so usually x.setLen(0) is fine. If you want to allocate a new seq and let the garbage collector clean up the old one, you can do x = @[] instead.


How to input a string value of unknown length from console in Fortran?

I would like to use deferred-length character strings in a "simple" manner to read user input. The reason that I want to do this is that I do not want to have to declare the size of a character string before knowing how large the user input will be. I know that there are "complicated" ways to do this. For example, the iso_varying_string module can be used: https://www.fortran.com/iso_varying_string.f95. Also, there is a solution here: Fortran Character Input at Undefined Length. However, I was hoping for something as simple, or almost as simple, as the following:
program main
character(len = :), allocatable :: my_string
read(*, '(a)') my_string
write(*,'(a)') my_string
print *, allocated(my_string), len(my_string)
end program
When I run this program, the output is:
./a.out
here is the user input
F 32765
Notice that there is no output from write(*,'(a)') my_string. Why?
Also, my_string has not been allocated. Why?
Why isn't this a simple feature of Fortran? Do other languages have this simple feature? Am I lacking some basic understanding about this issue in general?
vincentjs's answer isn't quite right.
Modern (2003+) Fortran does allow automatic allocation and re-allocation of strings on assignment, so a sequence of statements such as this
character(len=:), allocatable :: string
...
string = 'Hello'
write(*,*) string
string = 'my friend'
write(*,*) string
string = 'Hello '//string
write(*,*) string
is correct and will work as expected, writing out 3 strings of different lengths. At least one compiler in widespread use, the Intel Fortran compiler, does not enable Fortran 2003 semantics by default, so it may raise an error when trying to compile this. Refer to its documentation for the setting that enables Fortran 2003 semantics.
However, this feature is not available when reading a string so you have to resort to the tried and tested (aka old-fashioned if you prefer) approach of declaring a buffer of sufficient size for any input and of then assigning the allocatable variable. Like this:
character(len=long) :: buffer
character(len=:), allocatable :: string
...
read(*,'(a)') buffer
string = trim(buffer)
No, I don't know why the language standard forbids automatic allocation on read, just that it does.
Deferred length character is a Fortran 2003 feature. Note that many of the complicated methods linked to are written against earlier language versions.
With Fortran 2003 support, reading a complete record into a character variable is relatively straightforward. Below is a simple example with very minimal error handling. Such a procedure only needs to be written once, and can be customized to suit a user's particular requirements.
PROGRAM main
  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: INPUT_UNIT
  IMPLICIT NONE
  CHARACTER(:), ALLOCATABLE :: my_string
  CALL read_line(input_unit, my_string)
  WRITE (*, "(A)") my_string
  PRINT *, ALLOCATED(my_string), LEN(my_string)
CONTAINS
  SUBROUTINE read_line(unit, line)
    ! The unit, connected for formatted input, to read the record from.
    INTEGER, INTENT(IN) :: unit
    ! The contents of the record.
    CHARACTER(:), INTENT(OUT), ALLOCATABLE :: line
    INTEGER :: stat            ! IO statement IOSTAT result.
    CHARACTER(256) :: buffer   ! Buffer to read a piece of the record.
    INTEGER :: size            ! Number of characters read from the file.
    !***
    line = ''
    DO
      READ (unit, "(A)", ADVANCE='NO', IOSTAT=stat, SIZE=size) buffer
      IF (stat > 0) STOP 'Error reading file.'
      line = line // buffer(:size)
      ! An end of record condition or end of file condition stops the loop.
      IF (stat < 0) RETURN
    END DO
  END SUBROUTINE read_line
END PROGRAM main
Deferred length arrays are just that: deferred length. You still need to allocate the size of the array using the allocate statement before you can assign values to it. Once you allocate it, you can't change the size of the array unless you deallocate and then reallocate with a new size. That's why you're getting a debug error.
Fortran does not provide a way to dynamically resize character arrays like the std::string class does in C++, for example. In C++, you could initialize std::string var = "temp", then redefine it to var = "temporary" without any extra work, and this would be valid. This is only possible because the resizing is done behind the scenes by the std::string class (typically it grows its buffer by a constant factor, such as doubling it, when the limit is exceeded, which is functionally equivalent to reallocating with a bigger array).
Practically speaking, the easiest way I've found when dealing with strings in Fortran is to allocate a reasonably large character array that will fit most expected inputs. If the size of the input exceeds the buffer, simply increase the size of your array by reallocating it with a larger size. Trailing whitespace can be removed using trim.
You know that there are "complicated" ways of doing what you want. Rather than address those, I'll answer your first two "why?"s.
Unlike intrinsic assignment, a read statement does not first allocate the target variable with the correct size and type parameters for the data coming in (if it isn't already like that). Indeed, it is a requirement that the items in an input list be allocated. Fortran 2008, 9.6.3, clearly states:
If an input item or an output item is allocatable, it shall be allocated.
This is the case whether the allocatable variable is a character with deferred length, a variable with other deferred length-type parameters, or an array.
There is another way to declare a character with deferred length: giving it the pointer attribute. This doesn't help you, though, as we also see
If an input item is a pointer, it shall be associated with a definable target ...
Why you have no output from your write statement is related to why you see that the character variable isn't allocated: you haven't followed the requirements of Fortran and so you can't expect the behaviour that isn't specified.
I'll speculate as to why this restriction is here. I see two obvious ways to relax the restriction:
allow automatic allocation generally;
allow allocation of a deferred length character.
The second case would be easy:
If an input item or an output item is allocatable, it shall be allocated unless it is a scalar character variable with deferred length.
This, though, is clumsy, and such special cases seem against the ethos of the standard as a whole. We'd also need a carefully thought-out rule about allocation for this special case.
If we go for the general case for allocation, we'd presumably require that the unallocated effective item is the final effective item in the list:
integer, allocatable :: a(:), b(:)
character(7) :: ifile = '1 2 3 4'
read(ifile,*) a, b
and then we have to worry about
type aaargh(len)
integer, len :: len
integer, dimension(len) :: a, b
end type
type(aaargh), allocatable :: a(:)
character(9) :: ifile = '1 2 3 4 5'
read(ifile,*) a
It gets quite messy very quickly. That seems like a lot of problems to resolve when there are already ways, of varying difficulty, of solving the read problem.
Finally, I'll also note that allocation is possible during a data transfer statement. Although a variable must be allocated (as the rules are now) when it appears in an input list, components of an allocated variable of derived type needn't be, if that effective item is processed by defined input.

Heap profiling in Haskell. Am I having space leaks?

I am writing a small snake game in Haskell as a sort of guided tutorial for beginners. The "rendering" just takes a Board and produces a Data.ByteString.Builder, which is printed in the terminal. (The HTML profiles are pushed to the repo, so you can inspect them without compiling the program.)
The problem
The problem I have is that the heap profile looks weird: there are many spikes, and suddenly Builder, PAP and BuildStep take as much memory as the rest of the program. Considering that rendering happens 10 times a second (i.e. every second we produce 10 builders), it seems inconsistent that every once in a while the builder just takes that much memory. I don't know if this counts as a space leak, since there are no thunks in the profile, but the PAP doesn't look right (I don't know...)
Implementation
The board is represented as an immutable array of builders indexed by coordinates (tuples), type Board = Array (Int, Int) Builder (essentially, what should be printed at each coordinate). The function that converts the board into a builder is the expected strict fold, which handles newlines using the height and width of the board.
toBuilder :: RenderState -> Builder
toBuilder (RenderState b binf@(BoardInfo h w) gOver s) =
  -- b is the Array (Int, Int) Builder; h and w are the board height and width
  if gOver
    then ppScore s <> fst (boardToString $ emptyGrid binf) -- Not interesting: on game over, print an empty grid
    else ppScore s <> fst (boardToString b)                -- print the current board
  where
    -- Concatenate builders and count them, so that after every w builders a newline is added.
    boardToString = foldl' fprint (mempty, 0)
    fprint (!s, !i) cell =
      if ((i + 1) `mod` w) == 0
        then (s <> cell <> B.charUtf8 '\n', i + 1)
        else (s <> cell, i + 1)
According to the .prof file, this function takes most of the time and space (92%, which is expected). Moreover, this is the only part of the code that produces a big builder, so the problem should be here.
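For readers who want to experiment with the fold in isolation, here is a minimal self-contained sketch of the same newline-after-every-w-cells idea. It assumes a simplified board of single characters with no score or game-over state; `Board`, `renderBoard` and `renderRows` are hypothetical names, not the poster's code. It uses `Data.Array` from the array package and `Data.ByteString.Builder` from bytestring, both of which ship with GHC.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Array (Array, listArray)
import Data.ByteString.Builder (Builder, charUtf8, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.List (foldl')

-- Simplified board (assumption): only the cells, no score or game-over flag.
type Board = Array (Int, Int) Builder

-- Concatenate the cells in index order (row by row for (row, col) indices),
-- appending a newline after every w cells. foldl' forces the accumulator
-- pair at each step, and the bang patterns force both of its components.
renderBoard :: Int -> Board -> Builder
renderBoard w board = fst (foldl' step (mempty, 0 :: Int) board)
  where
    step (!acc, !i) cell
      | (i + 1) `mod` w == 0 = (acc <> cell <> charUtf8 '\n', i + 1)
      | otherwise            = (acc <> cell, i + 1)

-- Build a board from rows of characters and render it to a String,
-- which is convenient for checking the output.
renderRows :: [String] -> String
renderRows rows = BL.unpack (toLazyByteString (renderBoard w board))
  where
    h = length rows
    w = case rows of
      (r : _) -> length r
      []      -> 0
    board = listArray ((0, 0), (h - 1, w - 1)) (map charUtf8 (concat rows))
```

Loaded in GHCi, `renderRows ["abc", "def"]` should evaluate to `"abc\ndef\n"`. The sketch only reproduces the rendering logic; it says nothing about the memory spikes themselves.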
The buffering mode
The above profile happens when BufferMode is set to LineBuffering (the default), but interestingly, if I change it to NoBuffering, the profile looks the same except that a thunk appears and the builder disappears...
The questions
I have reached a point where I don't know what's going on, hence my questions are a little bit vague:
Is my code with line buffering (first profile) actually leaking? No thunk appears, but the PAP eating so much memory looks like a warning sign.
The second profile clearly(?) leaks. Is there a standard way to inspect which part of the code is producing the thunk?
Am I completely missing something, and actually the profile looks fine?
In case anyone is interested, I think I've found the problem: it is the terminal speed. If I use a smaller board size or a slower rendering rate (the picture is for a 50x70 board with 10 renders a second), then the memory usage is completely normal.
What I think is happening is that the board is printed to the console using B.hPutBuilder stdout. This action finishes sooner than the console actually prints the output, so the Haskell thread continues and creates another board, which has to wait to be printed because the console is busy. I guess this somehow leads to two boards living in memory for a short time.
Other guesses are welcome!

Is there inherent "cost of carry" of garbage thunks in Haskell?

I often see high number of cycles spent in GC when running GHC-compiled programs.
These numbers tend to be an order of magnitude higher than my JVM experience suggests they should be. In particular, the number of bytes "copied" by GC seems to be vastly larger than the amount of data I'm computing.
Is such a difference between non-strict and strict languages fundamental?
tl;dr: Most of the stuff that the JVM does in stack frames, GHC does on the heap. If you wanted to compare GHC heap/GC stats with the JVM equivalent, you'd really need to account for some portion of the bytes/cycles the JVM spends pushing arguments on the stack or copying return values between stack frames.
Long version:
Languages targeting the JVM typically make use of its call stack. Each invoked method has an active stack frame that includes storage for the parameters passed to it, additional local variables, and temporary results, plus room for an "operand stack" used for passing arguments to and receiving results from other methods it calls.
As a simple example, if the Haskell code:
bar :: Int -> Int -> Int
bar a b = a * b
foo :: Int -> Int -> Int -> Int
foo x y z = let u = bar y z in x + u
were compiled to JVM, the byte code would probably look something like:
public static int bar(int, int);
Code:
stack=2, locals=2, args_size=2
0: iload_0 // push a
1: iload_1 // push b
2: imul // multiply and push result
3: ireturn // pop result and return it
public static int foo(int, int, int);
Code:
stack=2, locals=4, args_size=3
0: iload_1 // push y
1: iload_2 // push z
2: invokestatic bar // call bar, pushing result
5: istore_3 // pop and save to "u"
6: iload_0 // push x
7: iload_3 // push u
8: iadd // add and push result
9: ireturn // pop result and return it
Note that calls to built-in primitives like imul and user-defined methods like bar involve copying/pushing the parameter values from local storage to the operand stack (using iload instructions) and then invoking the primitive or method. Return values then need to be saved/popped to local storage (with istore) or returned to the caller with ireturn; occasionally, a return value can be left on the stack to serve as an operand for another method invocation. Also, while it's not explicit in the byte code, the ireturn instruction involves a copy, from the callee's operand stack to the caller's operand stack. Of course, in actual JVM implementations, various optimizations are presumably possible to reduce copying.
When something else eventually calls foo to produce a computation, for example:
some_caller t = foo (1+3) (2+4) t + 1
the (unoptimized) code might look like:
iconst_1
iconst_3
iadd // put 1+3 on the stack
iconst_2
iconst_4
iadd // put 2+4 on the stack
iload_0 // put t on the stack
invokestatic foo
iconst_1
iadd
ireturn
Again, subexpressions are evaluated with a lot of pushing and popping on the operand stack. Eventually, foo is invoked with its arguments pushed on the stack and its result popped off for further processing.
All of this allocation and copying takes place on the stack, so there's no heap allocation involved in this example.
Now, what happens if that same code is compiled with GHC 8.6.4 (without optimization and on an x86_64 architecture for the sake of concreteness)? Well, the pseudocode for the generated assembly is something like:
foo [x, y, z] =
    u = new THUNK(sat_u)              // thunk, 32 bytes on heap
    jump: (+) x u
sat_u [] =                            // saturated closure for "bar y z"
    push UPDATE(sat_u)                // update frame, 16 bytes on stack
    jump: bar y z
bar [a, b] =
    jump: (*) a b
The calls/jumps to the (+) and (*) "primitives" are actually more complicated than I've made them out to be because of the typeclass that's involved. For example, the jump to (+) looks more like:
push CONTINUATION(\f -> f x u) // continuation, 24 bytes on stack
jump: (+) dNumInt // get the right (+) from typeclass instance
If you turn on -O2, GHC optimizes away this more complicated call, but it also optimizes away everything else that's interesting about this example, so for the sake of argument, let's pretend the pseudocode above is accurate.
Again, foo isn't of much use until someone calls it. For the some_caller example above, the portion of code that calls foo will look something like:
some_caller [t] =
    ...
    foocall = new THUNK(sat_foocall)  // thunk, 24 bytes on heap
    ...
sat_foocall [] =                      // saturated closure for "foo (1+3) (2+4) t"
    ...
    v = new THUNK(sat_v)              // thunk "1+3", 16 bytes on heap
    w = new THUNK(sat_w)              // thunk "2+4", 16 bytes on heap
    push UPDATE(sat_foocall)          // update frame, 16 bytes on stack
    jump: foo sat_v sat_w t
sat_v [] = ...
sat_w [] = ...
Note that nearly all of this allocation and copying takes place on the heap, rather than the stack.
Now, let's compare these two approaches. At first blush, it looks like the culprit really is lazy evaluation. We're creating these thunks all over the place that wouldn't be necessary if evaluation was strict, right? But let's look at one of these thunks more carefully. Consider the thunk for sat_u in the definition of foo. It's 32 bytes / 4 words with the following contents:
// THUNK(sat_u)
word 0: ptr to sat_u info table/code
1: space for return value
// variables we closed over:
2: ptr to "y"
3: ptr to "z"
The creation of this thunk isn't fundamentally different than the JVM code:
0: iload_1 // push y
1: iload_2 // push z
2: invokestatic bar // call bar, pushing result
5: istore_3 // pop and save to "u"
Instead of pushing y and z onto the operand stack, we loaded them into a heap-allocated thunk. Instead of popping the result off the operand stack into our stack frame's local storage and managing stack frames and return addresses, we left space for the result in the thunk and pushed a 16-byte update frame onto the stack before transferring control to bar.
Similarly, in the call to foo in some_caller, instead of evaluating the argument subexpressions by pushing constants on the stack and invoking primitives to push results on the stack, we created thunks on the heap, each of which included a pointer to info table / code for invoking primitives on those arguments and space for the return value; an update frame replaced the stack bookkeeping and result copying implicit in the JVM version.
Ultimately, thunks and update frames are GHC's replacement for stack-based parameter and result passing, local variables, and temporary workspace. A lot of activity that takes place in JVM stack frames takes place in the GHC heap.
Now, most of the stuff in JVM stack frames and on the GHC heap quickly becomes garbage. The main difference is that in the JVM, stack frames are automatically tossed out when a function returns, after the runtime has copied the important stuff out (e.g., return values). In GHC, the heap needs to be garbage collected. As others have noted, the GHC runtime is built around the idea that the vast majority of heap objects will immediately become garbage: a fast bump allocator is used for initial heap object allocation, and instead of copying out the important stuff every time a function returns (as for the JVM), the garbage collector copies it out when the bump heap gets kind of full.
Obviously, the above toy example is ridiculous. In particular, things are going to get much more complicated when we start talking about code that operates on Java objects and Haskell ADTs, rather than Ints. However, it serves to illustrate the point that a direct comparison of heap usage and GC cycles between GHC and JVM doesn't make a whole lot of sense. Certainly, an exact accounting doesn't really seem possible as the JVM and GHC approaches are too fundamentally different, and the proof would be in real-world performance. At the very least, an apples-to-apples comparison of GHC heap usage and GC stats needs to account for some portion of the cycles the JVM spends pushing, popping, and copying values between operand stacks. In particular, at least some fraction of JVM return instructions should count towards GHC's "bytes copied".
As for the contribution of "laziness" to heap usage (and heap "garbage" in particular), it seems hard to isolate. Thunks really play a dual role as a replacement for stack-based operand passing and as a mechanism for deferred evaluation. Certainly a switch from laziness to strictness can reduce garbage -- instead of first creating a thunk and then eventually evaluating it to another closure (e.g., a constructor), you can just create the evaluated closure directly -- but that just means that instead of your simple program allocating a mind-blowing 172 gigabytes on the heap, maybe the strict version "only" allocates a modest 84 gigabytes.
As far as I can see, the specific contribution of lazy evaluation to "bytes copied" should be minimal -- if a closure is important at GC time, it will need to be copied. If it's still an unevaluated thunk, the thunk will be copied. If it's been evaluated, just the final closure will need to be copied. If anything, since thunks for complicated structures are much smaller than their evaluated versions, laziness should typically reduce bytes copied. Instead, the usual big win with strictness is that it allows certain heap objects (or stack objects) to become garbage faster so we don't end up with space leaks.
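As a small illustration of the lazy-versus-strict accumulator point, the classic example is a left fold over a list. This is a sketch with hypothetical names `sumLazy` and `sumStrict`. Note that with optimization turned on, GHC's strictness analysis will often make the lazy version strict as well, so the difference in allocated thunks is most visible in unoptimized builds.

```haskell
import Data.List (foldl')

-- Lazy left fold: the accumulator is never forced during the traversal,
-- so (without optimization) it grows into a chain of (+) thunks,
-- ((((0 + 1) + 2) + 3) + ...), that is only collapsed at the end.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: foldl' evaluates the accumulator to weak head normal
-- form at every step, so each partial sum is an evaluated Int, not a thunk.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```

Both `sumLazy [1 .. 100]` and `sumStrict [1 .. 100]` return 5050; they differ only in what gets allocated along the way.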
No, laziness does not inherently lead to a large amount of copying in GC. The programmer's failure to manage laziness properly, however, can certainly do so. For example, if a persistent data structure ends up full of chains of thunks due to lazy modification, then it will end up badly bloated.
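One common way this happens is repeatedly updating values in a persistent map. The sketch below uses `Data.Map` from the containers package purely as an example of such a structure (my choice, not something the answer names); `countLazy` and `countStrict` are hypothetical names. With `Data.Map.Lazy.insertWith`, the combined value for a repeated key is stored unevaluated, so a key seen many times accumulates a chain of `(+)` applications. `Data.Map.Strict.insertWith` evaluates the value to weak head normal form before storing it.

```haskell
import Data.List (foldl')
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- Word counts with the lazy API: insertWith stores (1 + old) unevaluated,
-- so a frequently repeated key accumulates a chain of (+) thunks in the map.
countLazy :: [String] -> ML.Map String Int
countLazy = foldl' (\m w -> ML.insertWith (+) w 1 m) ML.empty

-- Same counts with the strict API: insertWith evaluates the combined value
-- to weak head normal form before storing it, so no thunk chain builds up.
countStrict :: [String] -> MS.Map String Int
countStrict = foldl' (\m w -> MS.insertWith (+) w 1 m) MS.empty
```

Both functions produce the same map, for example `[("a",3),("b",1),("c",1)]` for `words "a b a c a"`. Only the representation of the stored values differs until they are demanded.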
Another major issue you may be encountering, as Daniel Wagner mentioned, is the cost of immutability. While it is certainly possible to program with mutable structures in Haskell, it is much more idiomatic to work with immutable ones when possible. Immutable structure designs have various trade-offs. For example, ones designed for high performance when used persistently tend to have low branching factors to increase sharing, which leads to some bloat when they're used ephemerally.

Haskell 'count occurrences' function

I implemented a count function in Haskell and I am wondering whether it will behave badly on large lists:
count :: Eq a => a -> [a] -> Int
count x = length . filter (==x)
I believe the length function runs in linear time, is this correct?
Edit: Refactor suggested by @Higemaru
Length runs in linear time to the size of the list, yes.
Normally, you would be worried that your code had to take two passes through the list: first one to filter and then one to count the length of the resulting list. However, I believe this does not happen here because filter produces its result list lazily. Instead, length demands the filtered list one cell at a time, so filtering and counting are interleaved in a single pass.
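To make the single-pass behaviour explicit, you can also write the count as one strict left fold with no intermediate list. This is a sketch of an alternative formulation (`countFold` is a hypothetical name), not a claim about the exact code GHC generates for `length . filter (==x)`.

```haskell
import Data.List (foldl')

-- The original point-free version: filter yields matching elements lazily,
-- and length consumes them as they are produced.
count :: Eq a => a -> [a] -> Int
count x = length . filter (== x)

-- Hand-written single-pass alternative: a strict left fold that bumps the
-- counter for each element equal to x, without building a filtered list.
countFold :: Eq a => a -> [a] -> Int
countFold x = foldl' (\n y -> if y == x then n + 1 else n) 0
```

Both versions run in time linear in the length of the list; for example, `countFold 'a' "banana"` is 3.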
I think you can make it slightly shorter
count :: Eq a => a -> [a] -> Int
count x = length . filter (x==)
(I would have written a (lowly) comment if I could)
That really depends on the list. For a normal, lazily evaluated list of Ints on my computer, I see this function running in about 2 seconds for 10^9 elements, 0.2 seconds for 10^8, and 0.3 seconds for 10^7, so it appears to run in linear time. You can check this yourself by passing the flags +RTS -s -RTS to your executable when running it from the command line.
I also tried running it with more cores, but it doesn't seem to do anything but increase the memory usage a bit.
An added bonus of the lazy computation is that you only make a single pass over the list. With optimizations turned on, filter and length get turned into a single loop by the compiler, saving both memory and time.

No error message in Haskell

Just out of curiosity, I made a simple script to check speed and memory efficiency of constructing a list in Haskell:
wasteMem :: Int -> [Int]
wasteMem 0 = [199]
wasteMem x = (12432483483467856487256348746328761:wasteMem (x-1))
main = do
  putStrLn("hello")
  putStrLn(show (wasteMem 10000000000000000000000000000000000))
The strange thing is, when I tried this, it didn't run out of memory or stack space, it only prints [199], the same as running wasteMem 0. It doesn't even print an error message... why? Entering this large number in ghci just prints the number, so I don't think it's a rounding or reading error.
Your program is using a number greater than maxBound :: Int32. This means it will behave differently on different platforms. With GHC on x86_64, Int is 64 bits (32 bits on 32-bit platforms, and the Haskell Report only guarantees the range [-2^29, 2^29 - 1]). This means your absurdly large value (1x10^34) is represented as 4003012203950112768 for me and as zero for you 32-bit folks:
GHCI> 10000000000000000000000000000000000 :: Int
4003012203950112768
GHCI> 10000000000000000000000000000000000 :: Data.Int.Int32
0
This could be made platform independent by either using a fixed-size type (ex: from Data.Word or Data.Int) or using Integer.
All that said, this is a poorly conceived test to begin with. Haskell is lazy, so the amount of memory consumed by wasteMem n for any value n is minimal: it's just a thunk. Once you try to show this result, it will grab elements off the list one at a time, first generating "[12432483483467856487256348746328761," and leaving the rest of the list as a thunk. The first value can be garbage collected before the second value is even considered (a constant-space program).
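The following sketch makes both points concrete. It assumes you switch the question's code to `Integer`, as suggested above. `hugeN`, `hugeAsInt32` and `wasteMemI` are hypothetical names. Converting 10^34 to `Int32` yields 0 because 10^34 = 2^34 * 5^34 is a multiple of 2^32. Because the list is lazy, taking a short prefix of `wasteMemI hugeN` returns immediately, even though the full list would have 10^34 + 1 elements.

```haskell
import Data.Int (Int32)

-- The question's "absurdly large" literal, 10^34, kept exact as an Integer.
hugeN :: Integer
hugeN = 10 ^ (34 :: Int)

-- The same value converted to a 32-bit Int. fromInteger wraps modulo 2^32,
-- and 10^34 = 2^34 * 5^34 is a multiple of 2^32, so the result is 0.
hugeAsInt32 :: Int32
hugeAsInt32 = fromInteger hugeN

-- Hypothetical platform-independent variant of wasteMem that uses Integer
-- for both the counter and the elements, so nothing overflows.
wasteMemI :: Integer -> [Integer]
wasteMemI 0 = [199]
wasteMemI x = 12432483483467856487256348746328761 : wasteMemI (x - 1)
```

In GHCi, `hugeAsInt32` evaluates to 0. `take 2 (wasteMemI hugeN)` returns two copies of the large element right away, because the rest of the list is never built.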
Adding to Thomas' answer, if you really want to waste space, you have to perform an operation on the list, which needs the whole list in memory at once. One such operation is sorting:
import Data.List (sort)
print . sort . wasteMem $ (2^16)
Also note that it's almost impossible to estimate the run-time memory usage of your list. If you want a more predictable memory benchmark, create an unboxed array instead of a list. This also doesn't require any complicated operation to ensure that everything stays in memory. Indexing a single element in an array already makes sure that the array is in memory at least once.
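Here is a minimal sketch of the unboxed-array suggestion, using `Data.Array.Unboxed` from the array package. The helper name `mkArray` is hypothetical. An unboxed array stores its elements directly, rather than as pointers to possibly unevaluated values. Evaluating it therefore materialises all n elements at once, which makes its memory footprint much easier to predict than a lazy list's.

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))

-- Hypothetical helper: an unboxed array holding 0 .. n-1. The elements are
-- stored directly (no per-element thunks or pointers), so evaluating the
-- array to weak head normal form fills in all n elements at once.
mkArray :: Int -> UArray Int Int
mkArray n = listArray (0, n - 1) [0 .. n - 1]
```

For example, `mkArray 1000 ! 999` is 999. Looking up that single element forces the whole array to be built, which matches the answer's point about indexing.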
