I thought about two different ways, but both seem pretty ugly.
Transform the string s into an array a by splitting it, then use sample(a, length(s), replace=false) and join the array again into a string
Get a RandomPermutation r of length length(s) and join the single s[i] for i in r.
What's the right way? Unfortunately there is no method matching sample(::String, ::Int64; replace=false).
Perhaps defining a shuffle method for String constitutes type piracy, but anyway, here's a suggested implementation:
Base.shuffle(s::String) = isascii(s) ? s[randperm(end)] : join(shuffle!(collect(s)))
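To sidestep the type-piracy concern, the same idea can live in a function of your own. This is a minimal sketch assuming Julia 1.x, where shuffle! comes from the Random standard library; the name shuffle_string is hypothetical, not part of any library.

```julia
using Random

# Hypothetical helper name; it leaves Random.shuffle untouched.
# collect(s) yields a Vector{Char}, so multi-byte characters stay intact.
shuffle_string(s::AbstractString) = join(shuffle!(collect(s)))

shuffled = shuffle_string("ďaľšý")
println(shuffled)  # some permutation of the five characters
```

Because it works on characters rather than bytes, it needs no ASCII special case, at the cost of the extra allocation that the faster versions try to avoid.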
If you wanted to squeeze out performance from shuffle then you can consider:
function shufflefast(s::String)
ss = sizeof(s)
l = length(s)
ss == l && return String(shuffle!(copy(Vector{UInt8}(s))))
v = Vector{Int}(l)
i = start(s)
for j in 1:l
v[j] = i
i = nextind(s, i)
end
p = pointer(s)
u = Vector{UInt8}(ss)
k = 1
for i in randperm(l)
for j in v[i]:(i == l ? ss : v[i+1]-1)
u[k] = unsafe_load(p, j)
k += 1
end
end
String(u)
end
For large strings it is over 4x faster for ASCII and 3x faster for UTF-8.
Unfortunately it is messy - so I would rather treat it as an exercise. However, it uses only exported functions so it is not a hack.
Inspired by the optimization tricks in Bogumil Kaminski's answer, the following is a version with almost the same performance, but a bit clearer (in my opinion) and using a second utility function which may be of value in itself:
function strranges(s) # returns the ranges of bytes spanned by chars
u = Vector{UnitRange{Int64}}()
sizehint!(u,sizeof(s))
i = 1
while i<=sizeof(s)
ii = nextind(s,i)
push!(u,i:ii-1)
i = ii
end
return u
end
function shufflefast(s)
ss = convert(Vector{UInt8},s)
uu = Vector{UInt8}(length(ss))
i = 1
@inbounds for r in shuffle!(strranges(s))
for j in r
uu[i] = ss[j]
i += 1
end
end
return String(uu)
end
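The code above targets an older Julia (constructors such as Vector{UInt8}(n) no longer exist). For reference, here is a sketch of the same byte-range idea assuming Julia 1.x; the names char_byte_ranges and shuffle_bytes are hypothetical, and I have not benchmarked this variant.

```julia
using Random

# Byte range occupied by each character of `s` (hypothetical helper).
function char_byte_ranges(s::String)
    ranges = UnitRange{Int}[]
    sizehint!(ranges, ncodeunits(s))
    i = firstindex(s)
    while i <= ncodeunits(s)
        j = nextind(s, i)        # start of the next character
        push!(ranges, i:j-1)
        i = j
    end
    return ranges
end

# Shuffle characters by copying their byte ranges in random order.
function shuffle_bytes(s::String)
    src = codeunits(s)
    out = Vector{UInt8}(undef, ncodeunits(s))
    k = 1
    for r in shuffle!(char_byte_ranges(s))
        for j in r
            out[k] = src[j]
            k += 1
        end
    end
    return String(out)
end

println(shuffle_bytes("ďaľšý"))
```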
Example timing:
julia> using BenchmarkTools
julia> s = "ďaľšý"
julia> @btime shuffle($s) # shuffle from DNF's answer
831.200 ns (9 allocations: 416 bytes)
"ýľďša"
julia> @btime shufflefast($s) # shuffle from this answer
252.224 ns (5 allocations: 432 bytes)
"ľýďaš"
julia> @btime kaminskishufflefast($s) # shuffle from Kaminski's answer
197.345 ns (4 allocations: 384 bytes)
"ýašďľ"
EDIT: a little better performance - see code comments
This is from Bogumil Kaminski's answer where I am trying to avoid calculating length (*) if it is not necessary:
function shufflefast2(s::String)
ss = sizeof(s)
local l
for l in 1:ss
#if ((codeunit(s,l) & 0xc0) == 0x80)
if codeunit(s,l)>= 0x80 # edit (see comments below why)
break
end
end
ss == l && return String(shuffle!(copy(Vector{UInt8}(s))))
v = Vector{Int}(ss)
i = 1
l = 0
while i<ss
l += 1
v[l] = i
i = nextind(s, i)
end
v[l+1] = ss+1 # edit - we could do this because ss>l
p = pointer(s)
u = Vector{UInt8}(ss)
k = 1
for i in randperm(l)
# for j in v[i]:(i == l ? ss : v[i+1]-1)
for j in v[i]:v[i+1]-1 # edit we could do this because v[l+1] is defined (see above)
u[k] = unsafe_load(p, j)
k += 1
end
end
String(u)
end
Example timing for ascii string:
julia> srand(1234);@btime for i in 1:100 danshufflefast("test") end
19.783 μs (500 allocations: 34.38 KiB)
julia> srand(1234);@btime for i in 1:100 bkshufflefast("test") end
10.408 μs (300 allocations: 18.75 KiB)
julia> srand(1234);@btime for i in 1:100 shufflefast2("test") end
10.280 μs (300 allocations: 18.75 KiB)
The difference between bkshufflefast and shufflefast2 is within noise; sometimes bkshufflefast is faster. The performance should be equal here, because for an ASCII string the whole length has to be scanned anyway and the allocations are the same.
Example timing for unicode string:
julia> srand(1234);@btime for i in 1:100 danshufflefast(s) end
24.964 μs (500 allocations: 42.19 KiB)
julia> srand(1234);@btime for i in 1:100 bkshufflefast(s) end
20.882 μs (400 allocations: 37.50 KiB)
julia> srand(1234);@btime for i in 1:100 shufflefast2(s) end
19.038 μs (400 allocations: 40.63 KiB)
shufflefast2 is slightly but consistently faster here. It allocates a little more memory than Bogumil's function and a little less than Dan's solution.
(*) - I have some hope that the String implementation in Julia will get faster in the future, so that length becomes much quicker than it is now.
I am working with big matrices (size of 30k rows and ~100 columns). I am doing some matrix multiplication and the process would take around 20 seconds. This is my code:
@time begin
result = -1
data = -1
for i=1:size
first_matrix = @view data[i * split,:]
for j=1:size
second_matrix = @view Qg[j * split,:]
matrix_multiplication = first_matrix * second_matrix'
current_sum = sum(matrix_multiplication)
global result
if current_sum > result
result = current_sum
data = matrix_multiplication[1,1]
end
end
end
end
Trying to optimize this a little more, I tried to use multi-threading (julia --threads 4) to get better performance.
@time begin
global result = -1
global data = -1
lock = ReentrantLock()
for i=1:size
first_matrix = @view data[i * split,:]
Threads.@threads for j=1:size
second_matrix = @view Qg[j * split,:]
matrix_multiplication = first_matrix * second_matrix'
current_sum = sum(matrix_multiplication)
global result
if current_sum > result
lock(lock)
result = current_sum
data = matrix_multiplication[1,1]
unlock(lock)
end
end
end
end
By adding multi-threading I thought I would get an increase in performance, but the performance got worse (~40 seconds). I removed the lock to see if that was the issue, but still got the same performance. I am running this on a Dual-Core Intel Core i5 (MacBook pro). Does anyone know why my multi-threading code doesn't work?
So I'm trying to wrap my head around Julia's parallelization options. I'm modelling stochastic processes as Markov chains. Since the chains are independent replicates, the outer loops are independent - making the problem embarrassingly parallel.
I tried to implement both a @distributed and a @threads solution, both of which seem to run fine, but neither is any faster than the sequential version.
Here's a simplified version of my code (sequential):
function dummy(steps = 10000, width = 100, chains = 4)
out_N = zeros(steps, width, chains)
initial = zeros(width)
for c = 1:chains
# print("c=$c\n")
N = zeros(steps, width)
state = copy(initial)
N[1,:] = state
for i = 1:steps
state = state + rand(width)
N[i,:] = state
end
out_N[:,:,c] = N
end
return out_N
end
What would be the correct way of parallelizing this problem to increase performance?
Here is the correct way to do it (at the time of writing this answer, the other answer does not work; see my comment).
I will use a slightly less complex example than the one in the question (though very similar).
1. Not parallelized version (baseline scenario)
using Random
const m = MersenneTwister(0);
function dothestuff!(out_N, N, ic, m)
out_N[:, ic] .= rand(m, N)
end
function dummy_base(m=m, N=100_000,c=256)
out_N = Array{Float64}(undef,N,c)
for ic in 1:c
dothestuff!(out_N, N, ic, m)
end
out_N
end
Testing:
julia> using BenchmarkTools; @btime dummy_base();
106.512 ms (514 allocations: 390.64 MiB)
2. Parallelize with threads
#remember to run before starting Julia:
# set JULIA_NUM_THREADS=4
# OR (Linux)
# export JULIA_NUM_THREADS=4
using Random
const mt = MersenneTwister.(1:Threads.nthreads());
# required for older Julia versions, look still good in later versions :-)
function dothestuff!(out_N, N, ic, m)
out_N[:, ic] .= rand(m, N)
end
function dummy_threads(mt=mt, N=100_000,c=256)
out_N = Array{Float64}(undef,N,c)
Threads.@threads for ic in 1:c
dothestuff!(out_N, N, ic, mt[Threads.threadid()])
end
out_N
end
Let us test the performance:
julia> using BenchmarkTools; @btime dummy_threads();
46.775 ms (535 allocations: 390.65 MiB)
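Applied to the chain loop from the question, the threaded pattern might look like the sketch below. It assumes Julia 1.x started with several threads; the name dummy_threaded and the choice of one MersenneTwister per chain (seeded with the chain number) are my assumptions, not something taken from the question.

```julia
using Random

function dummy_threaded(steps = 10_000, width = 100, chains = 4)
    out_N = zeros(steps, width, chains)
    Threads.@threads for c in 1:chains
        rng = MersenneTwister(c)      # assumption: one independent RNG per chain
        state = zeros(width)
        for i in 1:steps
            state .+= rand(rng, width)
            out_N[i, :, c] .= state   # each chain writes only to its own slice
        end
    end
    return out_N
end

out = dummy_threaded(1_000, 10, 4)
println(size(out))
```

Seeding per chain rather than per thread keeps the result reproducible regardless of how iterations are scheduled across threads.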
3. Parallelize with processes (on a single machine)
using Distributed
addprocs(4)
using Random, SharedArrays
@everywhere using Random, SharedArrays, Distributed
@everywhere Random.seed!(myid())
@everywhere function dothestuff!(out_N, N, ic)
out_N[:, ic] .= rand(N)
end
function dummy_distr(N=100_000,c=256)
out_N = SharedArray{Float64}(N,c)
@sync @distributed for ic in 1:c
dothestuff!(out_N, N, ic)
end
out_N
end
Performance (note that inter-process communication takes some time and hence for small computations threads will be usually better):
julia> using BenchmarkTools; @btime dummy_distr();
62.584 ms (1073 allocations: 45.48 KiB)
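You can use the Distributed standard library (here via the @spawnat macro) to run the chains in parallel processes:
using Distributed
addprocs(4) # add the workers first, so that @everywhere reaches them
@everywhere using Distributed, SharedArrays
@everywhere function inner_loop!(out_N, chain_number,steps,width)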
N = zeros(steps, width)
state = zeros(width)
for i = 1:steps
state .+= rand(width)
N[i,:] .= state
end
out_N[:,:,chain_number] .= N
nothing
end
function dummy(steps = 10000, width = 100, chains = 4)
out_N = SharedArray{Float64}((steps, width, chains); pids = procs()) # map the array on every process that may run a chain
@sync for c = 1:chains
# print("c=$c\n")
@spawnat :any inner_loop!(out_N, c, steps,width)
end
sdata(out_N)
end
I have a file with strings of a known length, but no separator.
% What should be the result
vals = arrayfun(@(x) ['Foobar ', num2str(x)], 1:100000, 'UniformOutput', false);
% what the file looks like when read in
strs = cell2mat(vals);
strlens = cellfun(@length, vals);
The most straightforward approach is quite slow:
out = cell(1, length(strlens));
for i=1:length(strlens)
out{i} = fread(f, strlens(i), '*char');
end % 5.7s
Reading everything in and splitting it up afterwards is a lot faster:
strs = fread(f, sum(strlens), '*char');
out = cell(1, length(strlens));
slices = [0, cumsum(strlens)];
for i=1:length(strlens)
out{i} = strs(slices(i)+1:slices(i+1));
end % 1.6s
With a mex function I can get down to 0.6s, so there's still a lot of room for improvement. Can I get comparable performance with pure Matlab (R2016a)?
Edit: the seemingly perfect mat2cell function doesn't help:
out = mat2cell(strs, 1, strlens); % 2.49s
Your last approach – reading everything at once and splitting it up afterwards – looks pretty optimal to me, and is how I do stuff like this.
For me, it's running in about 80 ms when the file is on a local SSD in both R2016b and R2019a, on Mac.
function out = scratch_split_strings(strlens)
%
% Example:
% in_strs = arrayfun(@(x) ['Foobar ', num2str(x)], 1:100000, 'UniformOutput', false);
% strlens = cellfun(@length, in_strs);
% big_str = cat(2, in_strs{:});
% fid = fopen('text.txt', 'w'); fprintf(fid, '%s', big_str); fclose(fid);
% scratch_split_strings(strlens);
t0 = tic;
fid = fopen('text.txt');
txt = fread(fid, sum(strlens), '*char');
fclose(fid);
fprintf('Read time: %0.3f s\n', toc(t0));
str = txt;
t0 = tic;
out = cell(1, length(strlens));
slices = [0, cumsum(strlens)];
for i = 1:length(strlens)
out{i} = str(slices(i)+1:slices(i+1))';
end
fprintf('Munge time: %0.3f s\n', toc(t0));
end
>> scratch_split_strings(strlens);
Read time: 0.002 s
Munge time: 0.075 s
Have you stuck it in the profiler to see what's taking up your time here?
As far as I know, there is no faster way to split up a single primitive array into variable-length subarrays with native M-code. You're doing it right.
Is there a convenience function for truncating strings to a certain length?
It would be equivalent to something like this:
test_str = "test"
if length(test_str) > 8
out_str = test_str[1:8]
else
out_str = test_str
end
In the naive ASCII world:
truncate_ascii(s,n) = s[1:min(sizeof(s),n)]
would do. If it's preferable to share memory with the original string and avoid copying, SubString can be used:
truncate_ascii(s,n) = SubString(s,1,min(sizeof(s),n))
But in a Unicode world (and it is a Unicode world) this is better:
truncate_utf8(s,n) = SubString(s,1, (eo=endof(s) ; neo=0 ;
for i=1:n
if neo<eo neo=nextind(s,neo) ; else break ; end ;
end ; neo) )
Finally, @IsmaelVenegasCastelló reminded us of grapheme complexity (arrrgh), and then this is what's needed:
function truncate_grapheme(s,n)
eo = endof(s) ; tt = 0 ; neo=0
for i=1:n
if (neo<eo)
tt = nextind(s,neo)
while neo>0 && tt<eo && !Base.UTF8proc.isgraphemebreak(s[neo],s[tt])
(neo,tt) = (tt,nextind(s,tt))
end
neo = tt
else
break
end
end
return SubString(s,1,neo)
end
These last two implementations try to avoid calculating the length (which can be slow) or allocating/copying, or even just looping n times when the length is shorter.
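The snippets above use pre-1.0 names such as endof. As a sketch of the same no-copy, no-length idea assuming Julia 1.x (the name truncate_view is hypothetical), one can walk at most n characters with nextind and return a SubString:

```julia
# Returns a view of the first `n` characters of `s` without copying
# and without computing length(s).
function truncate_view(s::AbstractString, n::Integer)
    i = 0
    for _ in 1:n
        i >= lastindex(s) && break   # already at the last character
        i = nextind(s, i)            # start index of the next character
    end
    return SubString(s, 1, i)
end

println(truncate_view("αβγδ", 3))
println(truncate_view("test", 8))
```

This counts code points, not graphemes, so it corresponds to truncate_utf8 rather than truncate_grapheme.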
This answer draws on contributions of @MichaelOhlrogge, @FengyangWang, @Oxinabox and @IsmaelVenegasCastelló
I would do strtruncate(str, n) = join(take(str, n)).
Example:
julia> strtruncate("αβγδ", 3)
"αβγ"
julia> strtruncate("αβγδ", 5)
"αβγδ"
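In Julia 1.x, take has to be written as Iterators.take, and Base also provides first(s, n) for strings, which does the same character-wise truncation. A short sketch (the wrapper names are hypothetical):

```julia
# Same idea as above, spelled for Julia 1.x.
strtruncate_take(str, n) = join(Iterators.take(str, n))

# Built-in alternative: the first n characters, or the whole string if shorter.
strtruncate_first(str, n) = first(str, n)

println(strtruncate_take("αβγδ", 3))
println(strtruncate_first("αβγδ", 5))
```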
Note that your code is not fully valid for Unicode strings.
If the string is ASCII, this is pretty efficient:
String(resize!(str.data, n))
Or in-place:
resize!(str.data, n)
For unicode, @Fengyang Wang's method is very fast, but converting to a Char array can be slightly faster if you only truncate the very end of the string:
trunc1(str::String, n) = String(collect(take(str, n)))
trunc2(str::String, n) = String(Vector{Char}(str)[1:n])
trunc3(str::String, n) = String(resize!(Vector{Char}(str), n))
trunc4(str::String, n::Int)::String = join(collect(graphemes(str))[1:n])
function trunc5(str::String, n)
if isascii(str)
return String(resize!(str.data, n))
else
trunc1(str, n)
end
end
Timing:
julia> time_trunc(100, 100000, 25)
0.112851 seconds (700.00 k allocations: 42.725 MB, 7.75% gc time)
0.165806 seconds (700.00 k allocations: 91.553 MB, 11.84% gc time)
0.160116 seconds (600.00 k allocations: 73.242 MB, 11.58% gc time)
1.167706 seconds (31.60 M allocations: 1.049 GB, 11.12% gc time)
0.017833 seconds (100.00 k allocations: 1.526 MB)
true
julia> time_trunc(100, 100000, 98)
0.367191 seconds (700.00 k allocations: 83.923 MB, 5.23% gc time)
0.318507 seconds (700.00 k allocations: 132.751 MB, 9.08% gc time)
0.301685 seconds (600.00 k allocations: 80.872 MB, 6.19% gc time)
1.561337 seconds (31.80 M allocations: 1.122 GB, 9.86% gc time)
0.061827 seconds (100.00 k allocations: 1.526 MB)
true
Edit: Whoops.. I just realized that I'm actually destroying the original string in trunc5. This should be correct, but with somewhat worse performance:
function trunc5(str::String, n)
if isascii(str)
return String(str.data[1:n])
else
trunc1(str, n)
end
end
New timings:
julia> time_trunc(100, 100000, 25)
0.123629 seconds (700.00 k allocations: 42.725 MB, 7.70% gc time)
0.162332 seconds (700.00 k allocations: 91.553 MB, 11.41% gc time)
0.152473 seconds (600.00 k allocations: 73.242 MB, 9.19% gc time)
1.152640 seconds (31.60 M allocations: 1.049 GB, 11.54% gc time)
0.066662 seconds (200.00 k allocations: 12.207 MB)
true
julia> time_trunc(100, 100000, 98)
0.369576 seconds (700.00 k allocations: 83.923 MB, 5.10% gc time)
0.312237 seconds (700.00 k allocations: 132.751 MB, 9.42% gc time)
0.297736 seconds (600.00 k allocations: 80.872 MB, 5.95% gc time)
1.545329 seconds (31.80 M allocations: 1.122 GB, 10.02% gc time)
0.080399 seconds (200.00 k allocations: 19.836 MB, 5.07% gc time)
true
Aaand new edit: Aargh, forgot the timing function. I'm inputting an ascii string:
function time_trunc(m, n, m_)
str = randstring(m)
@time for _ in 1:n trunc1(str, m_) end
@time for _ in 1:n trunc2(str, m_) end
@time for _ in 1:n trunc3(str, m_) end
@time for _ in 1:n trunc4(str, m_) end
@time for _ in 1:n trunc5(str, m_) end
trunc1(str, m_) == trunc2(str, m_) == trunc3(str, m_) == trunc4(str, m_) == trunc5(str, m_)
end
Final edit (I hope):
Trying out @Dan Getz's truncate_grapheme and using unicode strings:
function time_trunc(m, n, m_)
# str = randstring(m)
str = join(["αβγπϕ1t_Ω₃!" for i in 1:100])
@time for _ in 1:n trunc1(str, m_) end
@time for _ in 1:n trunc2(str, m_) end
@time for _ in 1:n trunc3(str, m_) end
# @time for _ in 1:n trunc4(str, m_) end # too slow
@time for _ in 1:n trunc5(str, m_) end
@time for _ in 1:n truncate_grapheme(str, m_) end
trunc1(str, m_) == trunc2(str, m_) == trunc3(str, m_) == trunc5(str, m_) == truncate_grapheme(str, m_)
end
Timing:
julia> time_trunc(100, 100000, 98)
0.690399 seconds (800.00 k allocations: 103.760 MB, 3.69% gc time)
1.828437 seconds (800.00 k allocations: 534.058 MB, 3.66% gc time)
1.795005 seconds (700.00 k allocations: 482.178 MB, 3.19% gc time)
0.667831 seconds (800.00 k allocations: 103.760 MB, 3.17% gc time)
0.347953 seconds (100.00 k allocations: 3.052 MB)
true
julia> time_trunc(100, 100000, 25)
0.282922 seconds (800.00 k allocations: 48.828 MB, 4.01% gc time)
1.576374 seconds (800.00 k allocations: 479.126 MB, 3.98% gc time)
1.643700 seconds (700.00 k allocations: 460.815 MB, 3.70% gc time)
0.276586 seconds (800.00 k allocations: 48.828 MB, 4.59% gc time)
0.091773 seconds (100.00 k allocations: 3.052 MB)
true
So the last one seems clearly the best (and this post is now way too long.)
You could use the graphemes function:
C:\Users\Ismael
λ julia5
_
_ _ _(_)_ | By greedy hackers for greedy hackers.
(_) | (_) (_) | Documentation: http://docs.julialang.org
_ _ _| |_ __ _ | Type "?help" for help.
| | | | | | |/ _' | |
| | |_| | | | (_| | | Version 0.5.0-rc3+0 (2016-08-22 23:43 UTC)
_/ |\__'_|_|_|\__'_| | Official http://julialang.org/ release
|__/ | x86_64-w64-mingw32
help?> graphemes
search: graphemes
graphemes(s) -> iterator over substrings of s
Returns an iterator over substrings of s that correspond to the extended
graphemes in the string, as defined by Unicode UAX #29.
(Roughly, these are what users would perceive as single characters, even
though they may contain more than one codepoint; for example a letter
combined with an accent mark is a single grapheme.)
Example:
julia> s = "αβγπϕ1t_Ω₃!"; n = 8;
julia> length(s)
11
julia> graphemes(s)
length-11 GraphemeIterator{String} for "αβγπϕ1t_Ω₃!"
julia> collect(ans)[1:n]
8-element Array{SubString{String},1}:
"α"
"β"
"γ"
"π"
"ϕ"
"1"
"t"
"_"
julia> join(ans)
"αβγπϕ1t_"
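The session above is from Julia 0.5. In Julia 1.x, graphemes lives in the Unicode standard library, and taking the first n items lazily avoids collecting the whole iterator. A minimal sketch, with truncate_graphemes as a hypothetical name:

```julia
using Unicode

# Keep at most `n` extended grapheme clusters of `s`.
truncate_graphemes(s::AbstractString, n::Integer) =
    join(Iterators.take(graphemes(s), n))

println(truncate_graphemes("αβγπϕ1t_Ω₃!", 8))
# A letter plus a combining accent counts as one grapheme:
println(length(truncate_graphemes("e\u0301a", 1)))  # 2 code points, 1 grapheme
```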
Check out the truncate function:
julia> methods(truncate)
# 2 methods for generic function "truncate":
truncate(s::IOStream, n::Integer) at iostream.jl:43
truncate(io::Base.AbstractIOBuffer, n::Integer) at iobuffer.jl:140
help?> truncate
search: truncate
truncate(file,n)
Resize the file or buffer given by the first argument to exactly n bytes,
filling previously unallocated space with '\0' if the file or buffer is
grown.
So the solution could look like this:
julia> @doc """
truncate(s::String, n::Int)::String
truncate a `String`; `s` up to `n` graphemes.
# Example
```julia
julia> truncate("αβγπϕ1t_Ω₃!", 8)
"αβγπϕ1t_"
julia> truncate("test", 8)
"test"
```
""" ->
function Base.truncate(s::String, n::Int)::String
if length(s) > n
join(collect(graphemes(s))[1:n])
else
s
end
end
Base.truncate
Test it:
julia> methods(truncate)
# 3 methods for generic function "truncate":
truncate(s::String, n::Int64)
truncate(s::IOStream, n::Integer) at iostream.jl:43
truncate(io::Base.AbstractIOBuffer, n::Integer) at iobuffer.jl:140
help?> truncate
truncate(file,n)
Resize the file or buffer given by the first argument to exactly n bytes,
filling previously unallocated space with '\0' if the file or buffer is
grown.
truncate(s::String, n::Int)::String
truncate a String; s up to n graphemes.
Example
≡≡≡≡≡≡≡≡≡
julia> truncate("αβγπϕ1t_Ω₃!", 8)
"αβγπϕ1t_"
julia> truncate("test", 8)
"test"
julia> truncate("αβγπϕ1t_Ω₃!", n)
"αβγπϕ1t_"
julia> truncate("test", n)
"test"
Profile it:
julia> Pkg.add("BenchmarkTools")
INFO: Nothing to be done
INFO: METADATA is out-of-date — you may not have the latest version of BenchmarkTools
INFO: Use `Pkg.update()` to get the latest versions of your packages
julia> using BenchmarkTools
julia> @benchmark truncate("αβγπϕ1t_Ω₃!", 8)
BenchmarkTools.Trial:
samples: 10000
evals/sample: 9
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 1.72 kb
allocs estimate: 48
minimum time: 1.96 μs (0.00% GC)
median time: 2.10 μs (0.00% GC)
mean time: 2.45 μs (7.80% GC)
maximum time: 353.75 μs (98.40% GC)
julia> Sys.cpu_info()[]
Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz:
speed user nice sys idle irq ticks
2494 MHz 937640 0 762890 11104468 144671 ticks
You could use:
"test"[1:min(end,8)]
Also
SubString("test", 1, 8)
Here's one that can handle any UTF-8 string:
function trim_str(str, max_length)
edge = nextind(str, 0, max_length)
if edge >= ncodeunits(str)
str
else
str[1:edge]
end
end
I'm trying to run an ocean temperature model for 25 years using the explicit method (parabolic differential equation).
If I run for a year a = 3600 or five years a = 18000 it works fine.
However, when I run it for 25 years a = 90000 it crashes.
a is the amount of time steps used. And a year is considered to be 360 days. The time step is 4320 seconds, delta_t = 4320..
Here is my code:
program task
!declare the variables
implicit none
! initial conditions
real,parameter :: initial_temp = 4.
! vertical resolution (delta_z) [m], vertical diffusion coefficient (av) [m^2/s], time step delta_t [s]
real,parameter :: delta_z = 2., av = 2.0E-04, delta_t = 4320.
! gamma
real,parameter :: y = (av * delta_t) / (delta_z**2)
! horizontal resolution (time) total points
integer,parameter :: a = 18000
!declaring vertical resolution
integer,parameter :: k = 101
! declaring pi
real, parameter :: pi = 4.0*atan(1.0)
! t = time [s], temp_a = temperature at upper boundary [°C]
real,dimension(0:a) :: t
real,dimension(0:a) :: temp_a
real,dimension(0:a,0:k) :: temp
integer :: i
integer :: n
integer :: j
t(0) = 0
do i = 1,a
t(i) = t(i-1) + delta_t
end do
! temperature of upper boundary
temp_a = 12. + 6. * sin((2. * t * pi) / 31104000.)
temp(:,0) = temp_a(:)
temp(0,1:k) = 4.
! Vertical resolution
do j = 1,a
do n = 1,k
temp(j,n) = temp(j-1,n) + (y * (temp(j-1,n+1) - (2. * temp(j-1,n)) + temp(j-1,n-1)))
end do
temp(:,101) = temp(:,100)
end do
print *, temp(:,:)
end program task
The variable a is on line 11 (integer,parameter :: a = 18000)
As said, a = 18000 works, a = 90000 doesn't.
At 90000 I get:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
RUN FAILED (exit value 1, total time: 15s)
I'm using Fortran on Windows 8.1 with NetBeans and Cygwin (which has gfortran built in).
I'm not sure if this problem is caused by a bad compiler or by something else.
Does anybody have any ideas about this? It would help me a lot!
Regards
Take a look at the following lines from your code:
integer,parameter :: k = 101
real,dimension(0:a,0:k) :: temp
integer :: n
do n = 1,k
temp(j,n) = temp(j-1,n) + (y * (temp(j-1,n+1) - (2. * temp(j-1,n)) + temp(j-1,n-1)))
end do
Your array temp has bounds of 0:101, you loop n from 1 to 101 where in iteration n=101 you access temp(j-1,102), which is out of bounds.
This means you are reading whatever memory lies beyond temp. That makes your program always incorrect, but it only crashes sometimes, depending on various other things. Increasing a triggers the crash because Fortran arrays are column-major: the first index is contiguous in memory and the second index is strided by the extent of the first dimension (a+1). As a increases, the out-of-bounds access in the second dimension lands further beyond the end of temp, which changes what memory the invalid access touches.
After your loop you set temp(:,101) = temp(:,100) meaning there is no need to calculate temp(:,101) in the above loop, so you can change its loop bounds from
do n = 1,k
to
do n = 1, k-1
which will fix the out of bounds access on temp.
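To make the corrected bounds concrete, here is the same update sketched in Julia (chosen only for consistency with the other examples here; it is not a drop-in replacement for the Fortran program). The function name diffuse and its keyword defaults are assumptions: gamma = 0.216 is av * delta_t / delta_z^2 with the values from the question. Julia arrays are 1-based, so Fortran level n maps to column n + 1.

```julia
function diffuse(steps::Int, k::Int; gamma = 0.216, initial_temp = 4.0, delta_t = 4320.0)
    # Row j + 1 holds time step j; column n + 1 holds vertical level n (0..k).
    temp = fill(initial_temp, steps + 1, k + 1)
    for j in 0:steps
        # Upper boundary: seasonal cycle with a period of 360 days.
        temp[j + 1, 1] = 12.0 + 6.0 * sin(2π * j * delta_t / 31_104_000)
    end
    for j in 2:steps + 1
        for n in 2:k   # interior levels only, so n + 1 never exceeds k + 1
            temp[j, n] = temp[j-1, n] +
                gamma * (temp[j-1, n+1] - 2 * temp[j-1, n] + temp[j-1, n-1])
        end
        temp[j, k + 1] = temp[j, k]   # bottom boundary copies its neighbour
    end
    return temp
end

t = diffuse(3600, 101)   # one model year with the question's time step
println(size(t))
```

Julia checks array bounds by default, so looping n up to k + 1 here would raise a BoundsError immediately instead of silently reading past the array.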