Erlang error with binary data - dns

I just started my path on Erlang and I'm facing a problem I can't sort out a solution about:
I wrote a metod to take a domain espressed as a binary string, i.e. <<"www.404pagenotfound.com">> and convert it in the domain format as required for DNS protocol, so: <<3,"www",15,"pagenotfound",3,"com">>.
In the following the code (I rewroted it many times in different ways):
domainbyte(Bin) ->
if byte_size(Bin) > 0 ->
Res = binary:split(Bin, <<".">>),
[Chunk|[RestList]] = Res,
ChunkSize = byte_size(Chunk),
if length(RestList) > 0 ->
Rest = domainbyte(RestList), %% <- Got "bad argument" here!
<<ChunkSize/binary,Chunk,Rest>>;
true ->
<<ChunkSize/binary,Chunk>>
end
end.
Thx in advance for any clues.
PS.
Thx to comments I've found the error in the code abobe:
if length(RestList) > 0 -> %% here RestList is binary data so length throw "bad argument" error.
I've rewroted the method this way, but still with no luck:
**NOTE: I was able to fix the code in the following, problem is that if you have a binary chunk and you want to use it in another binary string, you must specify /binary on it: something not obvious to me.
I.e.: consider this small code snip:
**
TT = <<"com">>,
SS = <<3, TT, 0>> %% <- you get error: bad argument
**
must be fixed this way:
**
TT = <<"com">>,
SS = <<3, TT/binary, 0>>
domainbyte(Bin) ->
if byte_size(Bin) > 0 ->
Res = binary:split(Bin, <<".">>),
if length(Res) > 1 ->
[Chunk|[RestList]] = Res,
ChunkSize = byte_size(Chunk),
Rest = domainbyte(RestList),
<<ChunkSize,Chunk,Rest>>;
true ->
[Chunk] = Res,
ChunkSize = byte_size(Chunk),
<<ChunkSize,Chunk>>
end
end.
MdP

I think the easiest solution is to define the function using a binary comprehension:
domainbyte(Bin) ->
Chunks = binary:split(Bin, <<".">>, [global]), %A list of all the chunks
<< <<byte_size(C),C/binary>> || C <- Chunks >>. %Build output binary
It might be slightly faster to build the output binary as a list of segments in a separate function then put them together in an iolist_to_binary/1. Note that if a '.' occurs outermost in the binary then this code will take that as an empty segment of length 0. If these should be discarded then you nee to add the option trim to binary:split/3. Note also that the size will occupy only one byte.
#Alnitak has the separate function but builds the binary one segment at time so it is not more efficient than the binary comprehension which does the same thing.
N.B. that if you have a binary segment Chunk/binary when constructing a binary this means that Chunk IS a binary, not that it should become one. Binaries are flat structures, think byte arrays, so everything becomes a binary. Or rather the binary.
EDIT: Though I see I missed the 0 which should be at the end. That is left as an exercise to the reader.
EDIT: Being in teaching mode, apart from the constructing binaries, a key to writing good Erlang code is understanding pattern matching. You use a bit but could do it more:
domainbyte(Bin) ->
case binary:split(Bin, <<".">>) of
[Chunk,Rest] -> %There was a '.'
RestBin = domainbyte(Rest),
Size = byte_size(Chunk),
<<Size,Chunk/binary,RestBin/binary>>;
[Chunk] -> %This was the last chunk
Size = byte_size(Chunk),
<<Size,Chunk/binary,0>> %Add terminating 0
end.
This is basically doing the same as your code but we are using pattern matching to select a clause, not only to pull apart a known structure. Pattern matching is the basic method for control, not just in case as here but also in functions and receive. This results in if being used quite sparingly.
Enough from me for now.

I'm a bit rusty on Erlang, but I think your problem is that RestList is an array of chunks from the output of binary:split.
So when you chuck it recursively back into domainbyte it's in the wrong format.
Also - don't forget the terminating NUL byte to represent the root label!
FWIW, here's my working version:
label([]) ->
<< 0 >>;
label([H|T]) ->
D = label(H),
P = label(T),
<< D/binary, P/binary>>;
label(A) ->
L = byte_size(A),
<< <<L>>/binary, A/binary>>.
domainbyte(A) ->
Res = binary:split(A, <<".">>, [global, trim]),
label(Res).
It correctly adds a trailing NUL byte, and trims any extra trailing dots.

The concat of binaries should be written like this:
<< <<ChunkSize>>/binary, Chunk/binary, Rest/binary>>
% and
<< <<ChunkSize>>/binary, Chunk/binary>>

Related

How to handle simple inputs in Kattis with F#?

I am very new to F# and Kattis. I have tried this simple problem "Which is greater" at Kattis. Link is here: https://open.kattis.com/problems/whichisgreater
I have tried with this code:
open System
let a = Console.Read()
let b = Console.Read()
if a > b then Console.WriteLine "1" else Console.WriteLine "0"
But I still get wrong answer. Anybody who can help on how to handle inputs and outputs in Kattis for F#? Maybe some simple examples can be made available?
The following is accepted by Kattis:
open System
let line = Console.ReadLine().Split ' '
let a = int64 line.[0]
let b = int64 line.[1]
Console.WriteLine(if a > b then 1 else 0)
Here, we read the line, split it on a space character into two numbers, compare them and print the required result.
It looks like they're using an old version of the F# compiler, so you have to specify an explicit entry point. Here's their sample F# solution for a different problem:
open System
[<EntryPoint>]
let main argv =
(fun _ -> Console.ReadLine()) |>
Seq.initInfinite |>
Seq.takeWhile ((<>) null) |>
Seq.iter
(fun (s : string) ->
let arr = s.Split([|' '|])
let a = int64 arr.[0]
let b = int64 arr.[1]
/// solve test case and output answer
printfn "%d" (abs (a - b))
)
0
I think that should give you enough info to solve the "which is greater" problem you're looking at. (Note that Console.Read only reads a single character, so it's not what you want for this problem. Instead, you probably want to read in the entire line, then split it into two strings at the blank space, then convert each of those strings into an integer. Coincidentally, the sample code I pasted above does something similar.)

Sequence of legal pairs of parentheses using recursion - Python

I have some problem to solve using recursion in Python.
I'm simply bad in recursion and don't know how to start so please guide me.
We will say that a string contains 'n' legal pairs of parentheses if the string contains only the chars '(',')' and if this sequence of parentheses can be written so in a manner of mathematical formula (That is, every opening of parentheses is closed and parentheses are not closed before they are opened). More precise way to describe it is at the beginning of the string the number of '(' is greater or equal to ')' - and the number of any kind of char in the whole string is equal. Implement a function that recieves a positive integer n and returns a list which contains every legal string of n-combination of the parentheses.
I have tried to start at least, think of a base case, but what's my base case at all?
I tried to think of a base case when I am given the minimal n which is 1 and then I think I have to return a list ['(', ')']. But to do that I have also a difficulty...
def parentheses(n):
if n == 1:
return combine_parent(n)
def combine_parent(n):
parenth_lst = []
for i in range(n):
parenth_lst +=
Please explain me the way to solve problems recursively.
Thank you!
Maybe it's helpful to look in a simple case of the problem:
n = 2
(())
()()
So we start by n=2 and we produce a sequence of ( n times followed by a sequence of ) n times and we return a list of that. Then we recursively do that with n-1. When we reach n=1 it looks like we reached the base case which is that we need to return a string with () n times (not n=1 but n=2).
n = 3
((()))
(())()
()()()
Same pattern for n=3.
The above examples are helpful to understand how the problem can be solved recursively.
def legal_parentheses(n, nn=None):
if nn == 1:
return ["()" * n]
else:
if not nn:
nn = n
# This will produce n ( followed by n ) ( i.e n=2 -> (()) )
string = "".join(["(" * nn, ")" * nn])
if nn < n:
# Then here we want to produce () n-nn times.
string += "()" * (n-nn)
return [string] + legal_parentheses(n, nn-1)
print(legal_parentheses(3))
print(legal_parentheses(4))
print(legal_parentheses(5))
For n = 3:
['((()))', '(())()', '()()()']
For n = 4:
['(((())))', '((()))()', '(())()()', '()()()()']
For n = 5:
['((((()))))', '(((())))()', '((()))()()', '(())()()()', '()()()()()']
This is one way of solving the problem.
The way to think about solving a problem recursively, in my opinion, is to first pick the simplest example of your problem in your case, n=2 and then write down what do you expect as a result. In this case, you are expecting the following output:
"(())", "()()"
Now, you are trying to find a strategy to break down the problem such that you can produce each of the strings. I start by thinking of the base case. I say the trivial case is when the result is ()(), I know that an element of that result is just () n times. If n=2, I should expect ()() and when n=3 I should expect ()()() and there should be only one such element in the sequence (so it should be done only once) hence it becomes the base case. The question is how do we calculate the (()) part of the result. The patterns shows that we just have to put n ( followed by n ) -> (()) for n=2. This looks like a good strategy. Now you need to start thinking for a slightly harder problem and see if our strategy still holds.
So let's think of n=3. What do we expect as a result?
'((()))', '(())()', '()()()'
Ok, we see that the base case should still produce the ()()() part, all good, and should also produce the ((())) part. What about the (())() part? It looks like we need a slightly different approach. In this case, we need to somehow generate n ( followed by n ) and then produce n-1 ( followed by n-1 ) and then n-2 ( followed by n-2 ) and so on, until we reach the base case n=1 where we just going to produce () part n times. But, if we were to call the function each time with n-1 then when we reach the base case we have no clue what the original n value was and hence we cannot produce the () part as we don't know how many we want (if original n was 3 then we need ()()() but if we change n by calling the function with n-1 then by the time we reach the base case, we won't know the original n value). Hence, because of that, in my solution, I introduce a second variable called nn that is the one that is reduced each time, but still, we leave the n unmodified so we know what the original value was.

Why is chaining iterables this complicated? Simplify this code

I want to chain multiple iterables, everything with lazy evaluation (speed is crucial), to do the following:
read many integers from a single huge line of stdin
split() that line
convert the resulting strings to int
compute the diff between successive ints
... and some further things not shown here
The real example is more complex, here's a simplified example:
Here's a sample line of stdin:
2 13 4 16 16 15 22 17 8 8 7 6
(For debugging purposes, instream below might point to sys.stdin, or an opened filehandle)
You can't simply chain generators since map() returns a (lazily-evaluated) list:
import itertools
gen1 = map(int, (map(str.split, instream))) # CAN'T CHAIN DIRECTLY
The least complicated working solution I found is this, can it surely not be simplified?
gen1 = map(int, itertools.chain.from_iterable(itertools.chain(map(str.split, instream))))
Why the hell do I need to chain itertools.chain.from_iterable(itertools.chain just to process the result from map(str.split, instream) - it sort of defeats the purpose?
Is manually defining my generators faster?
An explicit ("manual") generator expression should be preferred over using map and filter. It is more readable to most people, and more flexible.
If I understand your question, this generator expression does what you need:
gen1 = ( int(x) for line in instream for x in line.split() )
You could build your generator by hand:
import string
def gen1(stream):
# presuming that stream is of type io.TextIOBase
s = ""
c = stream.read(1)
while len(c)>0:
if (c not in string.digits):
if len(s) > 0:
i = int(s)
yield i
s = ""
else:
s += c
c = stream.read(1)
if len(s) > 0:
i = int(s)
yield i
import io
g = gen1(io.StringIO("12 45 6 7 88"))
for x in g: # dangerous if stream is unlimited
print(x)
Which is certainly not the most beautiful code, but it does what you want.
Explanations:
If your input is indefinitely long you have to read it in chunks (or character wise).
Whenever you encounter a non-digit (whitespace), you convert the characters you have read until that point into an integer and yield it.
You also have to consider what happens when you reach the EOF.
My implementation is probably not very well performed, due to the fact that I'm reading char-wise. Using chunks one could speed it up significantly.
EDIT as to why your approach will never work:
map(str.split, instream)
does simply not do what you appear to think it does. map applies the given function str.split to each element of the iterator given as the second parameter. In your case that is a stream, i.e. a file object, in the case of sys.stdin specifically a io.TextIOBase object. Which indeed can be iterated over. Line by line, which emphatically is NOT what you want! In effect you iterate over your input line by line and split each line into words. The map generator iterates over (many) lists of words NOT over A list of words. Which is why you have to chain them together to get a single list to iterate on.
Also, the itertools.chain() in itertools.chain.from_iterable(itertools.chain(map(...))) is redundant. itertools.chain chains its arguments (each an inalterable object) together into one iterator. You only give it one argument so there is nothing to chain together, it basically returns the map object unchanged.
itertools.chain.from_iterable() on the other hand takes one argument, which is expected to be an iterator of iterators (e.g. a list of lists) and flattens it into one iterator (list).
EDIT2
import io, itertools
instream = io.StringIO("12 45 \n 66 7 88")
gen1 = itertools.chain.from_iterable(map(str.split, instream))
gen2 = map(int, gen1)
list(gen2)
returns
[12, 45, 66, 7, 88]

Best way to write a large array to file in fortran? Text vs Other

I wanted to know what the best way to write a large fortran array ( 5000 x 5000 real single precision numbers) to a file. I am trying to save the results of a numerical calculation for later use so they do not need to be repeated. From calculation 5000 x 5000 x 4bytes per number number is 100 Mb, is it possible to save this in a form that is only 100Mb? Is there a way to save fortran arrays as a binary file and read it back in for later use?
I've noticed that saving numbers to a text file produces a file much larger than the size of the data type being saved. Is this because the numbers are being saved as characters?
The only way I am familiar with to write to file is
open (unit=41, file='outfile.txt')
do i=1,len
do j=1,len
write(41,*) Array(i,j)
end do
end do
Although I'd imagine there is a better way to do it. If anyone could point me to some resources or examples to approve my ability to write and read larger files efficiently (in terms of memory) that would be great.
Thanks!
Write data files in binary, unless you're going to actually be reading the output - and you're not going to be reading a 2.5 million-element array.
The reasons for using binary are threefold, in decreasing importance:
Accuracy
Performance
Data size
Accuracy concerns may be the most obvious. When you are converting a (binary) floating point number to a string representation of the decimal number, you are inevitably going to truncate at some point. That's ok if you are sure that when you read the text value back into a floating point value, you are certainly going to get the same value; but that is actually a subtle question and requires choosing your format carefully. Using default formatting, various compilers perform this task with varying degrees of quality. This blog post, written from the point of view of a games programmer, does a good job of covering the issues.
Let's consider a little program which, for a variety of formats, writes a single-precision real number out to a string, and then reads it back in again, keeping track of the maximum error it encounters. We'll just go from 0 to 1, in units of machine epsilon. The code follows:
program testaccuracy
character(len=128) :: teststring
integer, parameter :: nformats=4
character(len=20), parameter :: formats(nformats) = &
[ '( E11.4)', '( E13.6)', '( E15.8)', '(E17.10)' ]
real, dimension(nformats) :: errors
real :: output, back
real, parameter :: delta=epsilon(output)
integer :: i
errors = 0
output = 0
do while (output < 1)
do i=1,nformats
write(teststring,FMT=formats(i)) output
read(teststring,*) back
if (abs(back-output) > errors(i)) errors(i) = abs(back-output)
enddo
output = output + delta
end do
print *, 'Maximum errors: '
print *, formats
print *, errors
print *, 'Trying with default format: '
errors = 0
output = 0
do while (output < 1)
write(teststring,*) output
read(teststring,*) back
if (abs(back-output) > errors(1)) errors(1) = abs(back-output)
output = output + delta
end do
print *, 'Error = ', errors(1)
end program testaccuracy
and when we run it, we get:
$ ./accuracy
Maximum errors:
( E11.4) ( E13.6) ( E15.8) (E17.10)
5.00082970E-05 5.06639481E-07 7.45058060E-09 0.0000000
Trying with default format:
Error = 7.45058060E-09
Note that even using a format with 8 digits after the decimal place - which we might think would be plenty, given that single precision reals are only accurate to 6-7 decimal places - we don't get exact copies back, off by approximately 1e-8. And this compiler's default format does not give us accurate round-trip floating point values; some error is introduced! If you're a video-game programmer, that level of accuracy may well be enough. If you're doing time-dependant simulations of turbulent fluids, however, that might absolutely not be ok, particularly if there's some bias to where the error is introduced, or if the error occurs in what is supposed to be a conserved quantity.
Note that if you try running this code, you'll notice that it takes a surprisingly long time to finish. That's because, maybe surprisingly, performance is another real issue with text output of floating point numbers. Consider the following simple program, which just writes out your example of a 5000 × 5000 real array as text and as unformatted binary:
program testarray
implicit none
integer, parameter :: asize=5000
real, dimension(asize,asize) :: array
integer :: i, j
integer :: time, u
forall (i=1:asize, j=1:asize) array(i,j)=i*asize+j
call tick(time)
open(newunit=u,file='test.txt')
do i=1,asize
write(u,*) (array(i,j), j=1,asize)
enddo
close(u)
print *, 'ASCII: time = ', tock(time)
call tick(time)
open(newunit=u,file='test.dat',form='unformatted')
write(u) array
close(u)
print *, 'Binary: time = ', tock(time)
contains
subroutine tick(t)
integer, intent(OUT) :: t
call system_clock(t)
end subroutine tick
! returns time in seconds from now to time described by t
real function tock(t)
integer, intent(in) :: t
integer :: now, clock_rate
call system_clock(now,clock_rate)
tock = real(now - t)/real(clock_rate)
end function tock
end program testarray
Here are the timing outputs, for writing to disk or to ramdisk:
Disk:
ASCII: time = 41.193001
Binary: time = 0.11700000
Ramdisk
ASCII: time = 40.789001
Binary: time = 5.70000000E-02
Note that when writing to disk, the binary output is 352 times as fast as ASCII, and to ramdisk it's closer to 700 times. There are two reasons for this - one is that you can write out data all at once, rather than having to loop; the other is that generating the string decimal representation of a floating point number is a surprisingly subtle operation which requires a significant amount of computing for each value.
Finally, is data size; the text file in the above example comes out (on my system) to about 4 times the size of the binary file.
Now, there are real problems with binary output. In particular, raw Fortran (or, for that matter, C) binary output is very brittle. If you change platforms, or your data size changes, your output may no longer be any good. Adding new variables to the output will break the file format unless you always add new data at the end of the file, and you have no way of knowing ahead of time what variables are in a binary blob you get from your collaborator (who might be you, three months ago). Most of the downsides of binary output are avoided by using libraries like NetCDF, which write self-describing binary files that are much more "future proof" than raw binary. Better still, since it's a standard, many tools read NetCDF files.
There are many NetCDF tutorials on the internet; ours is here. A simple example using NetCDF gives similar times to raw binary:
$ ./array
ASCII: time = 40.676998
Binary: time = 4.30000015E-02
NetCDF: time = 0.16000000
but gives you a nice self-describing file:
$ ncdump -h test.nc
netcdf test {
dimensions:
X = 5000 ;
Y = 5000 ;
variables:
float Array(Y, X) ;
Array:units = "ergs" ;
}
and file sizes about the same as raw binary:
$ du -sh test.*
96M test.dat
96M test.nc
382M test.txt
the code follows:
program testarray
implicit none
integer, parameter :: asize=5000
real, dimension(asize,asize) :: array
integer :: i, j
integer :: time, u
forall (i=1:asize, j=1:asize) array(i,j)=i*asize+j
call tick(time)
open(newunit=u,file='test.txt')
do i=1,asize
write(u,*) (array(i,j), j=1,asize)
enddo
close(u)
print *, 'ASCII: time = ', tock(time)
call tick(time)
open(newunit=u,file='test.dat',form='unformatted')
write(u) array
close(u)
print *, 'Binary: time = ', tock(time)
call tick(time)
call writenetcdffile(array)
print *, 'NetCDF: time = ', tock(time)
contains
subroutine tick(t)
integer, intent(OUT) :: t
call system_clock(t)
end subroutine tick
! returns time in seconds from now to time described by t
real function tock(t)
integer, intent(in) :: t
integer :: now, clock_rate
call system_clock(now,clock_rate)
tock = real(now - t)/real(clock_rate)
end function tock
subroutine writenetcdffile(array)
use netcdf
implicit none
real, intent(IN), dimension(:,:) :: array
integer :: file_id, xdim_id, ydim_id
integer :: array_id
integer, dimension(2) :: arrdims
character(len=*), parameter :: arrunit = 'ergs'
integer :: i, j
integer :: ierr
i = size(array,1)
j = size(array,2)
! create the file
ierr = nf90_create(path='test.nc', cmode=NF90_CLOBBER, ncid=file_id)
! define the dimensions
ierr = nf90_def_dim(file_id, 'X', i, xdim_id)
ierr = nf90_def_dim(file_id, 'Y', j, ydim_id)
! now that the dimensions are defined, we can define variables on them,...
arrdims = (/ xdim_id, ydim_id /)
ierr = nf90_def_var(file_id, 'Array', NF90_REAL, arrdims, array_id)
! ...and assign units to them as an attribute
ierr = nf90_put_att(file_id, array_id, "units", arrunit)
! done defining
ierr = nf90_enddef(file_id)
! Write out the values
ierr = nf90_put_var(file_id, array_id, array)
! close; done
ierr = nf90_close(file_id)
return
end subroutine writenetcdffile
end program testarray
OPEN the file for reading and writing as "unformatted", and READ and WRITE the data without supplying a format, as demonstrated in the program below.
program xunformatted
integer, parameter :: n = 5000, inu = 20, outu = 21
real :: x(n,n)
integer :: i
character (len=*), parameter :: out_file = "temp_num"
call random_seed()
call random_number(x)
open (unit=outu,form="unformatted",file=out_file,action="write")
do i=1,n
write (outu) x(i,:) ! write one row at a time
end do
print*,"sum(x) =",sum(x)
close (outu)
open (unit=inu,form="unformatted",file=out_file,action="read")
x = 0.0
do i=1,n
read (inu) x(i,:) ! read one row at a time
end do
print*,"sum(x) =",sum(x)
end program xunformatted

OCaml : need help for List.map function

I need to create a function that basically works like this :
insert_char("string" 'x') outputs "sxtxrxixnxg".
So here is my reasoning :
Create a list with every single character in the string :
let inserer_car(s, c) =
let l = ref [] in
for i = 0 to string.length(s) - 1 do
l := s.[i] :: !l
done;
Then, I want to use List.map to turn it into a list like ['s', 'x', 't', 'x' etc.].
However, I don't really know how to create my function to use with map. Any help would be appreciated!
I'm a beginner in programming and especially in ocaml! so feel free to assume I'm absolutely ignorant.
If you were using Core, you could write it like this:
open Core.Std
let insert_char s c =
String.to_list s
|> (fun l -> List.intersperse l c)
|> String.of_char_list
Or, equivalently:
let insert_char s c =
let chars = String.to_list s in
let interspersed_chars = List.intersperse chars c in
String.of_char_list interspersed_chars
This is just straightforward use of existing librariies. If you want the implementation of List.intersperse, you can find it here. It's quite simple.
A map function creates a copy of a structure with different contents. For lists, this means that List.map f list has the same length as list. So, this won't work for you. Your problem requires the full power of a fold.
(You could also solve the problem imperatively, but in my opinion the reason to study OCaml is to learn about functional programming.)
Let's say you're going to use List.fold_left. Then the call looks like this:
let result = List.fold_left myfun [] !l
Your function myfun has the type char list -> char -> char list. In essence, its first parameter is the result you've built so far and its second parameter is the next character of the input list !l. The result should be what you get when you add the new character to the list you have so far.
At the end you'll need to convert a list of characters back to a string.

Resources