In Julia, how to create a macro consisting of several optional macros? - multithreading

In Julia, I am trying out different parallelization libraries, to make my program more performant, and to check if memory consumption is the same as with no parallelization. The unfortunate effect of this is a lot of duplication.
Is there a way to organize my code so that I write the algorithm only once and then some macro with a parameter decides how the code is parallelized? My question is similar to this one. For example, my MWE
using ThreadsX, Folds, FLoops, Polyester
create_data = (n,s) -> [rand(1:n,r) for j=1:n for r∈[rand(1:s)]]
function F!(method ::Int, L ::Vector{Vector{Int}}) ::Nothing
n = length(L)
if method==0 for j=1:n sort!(L[j]) end end
if method==1 Threads.@threads for j=1:n sort!(L[j]) end end
if method==2 ThreadsX.foreach(1:n) do j sort!(L[j]) end end
if method==3 Folds.foreach(1:n) do j sort!(L[j]) end end
if method==4 FLoops.@floop for j=1:n sort!(L[j]) end end
if method==5 Polyester.@batch for j=1:n sort!(L[j]) end end
return nothing end
for mtd=0:5
L = create_data(10^6,10^3);
@time F!(mtd,L) end
returns
17.967120 seconds
4.537954 seconds (38 allocations: 3.219 KiB)
4.418978 seconds (353 allocations: 27.875 KiB)
5.583201 seconds (54 allocations: 3.875 KiB)
5.542852 seconds (53 allocations: 3.844 KiB)
4.263488 seconds (3 allocations: 80 bytes)
so there are different performances already for a very simple problem.
In my actual case, instead of sort!(L[j]) I have lots of intensive code with several Arrays, Vector{Vector}s, Dicts, ..., where different threads occasionally read from the same place but write to different places, allocate memory, mutate the input, etc. Is there a way to create a new macro @Parallel so that my code would be just
function F!(method ::Int, L ::Vector{Vector{Int}}) ::Nothing
n = length(L)
@Parallel(method) for j=1:n sort!(L[j]) end
return nothing end
Note that I have never created a macro, I only used them thus far, so some explanation would be welcome.

A macro-based solution is possible, but seems unnecessary to me. I'd rather organize code like this:
function testfun!(x::Vector{Int})
# here goes the repetitive part
return sort!(x)
end
# entry point for dispatch, and set up data
function F(kernel!, M, N, strategy::Symbol)
L = create_data(M, N)
return F!(kernel!, L, Val{strategy}())
end
# and a function to run the loop for every strategy
function F!(kernel!, L, ::Val{:baseline})
n = length(L)
for j=1:n kernel!(L[j]) end
end
function F!(kernel!, L, ::Val{:Threads})
n = length(L)
Threads.@threads for j=1:n kernel!(L[j]) end
end
function F!(kernel!, L, ::Val{:ThreadsX})
n = length(L)
ThreadsX.foreach(1:n) do j kernel!(L[j]) end
end
# ...
# + rest of the loop functions for Floops, Folds, etc.
function dotests()
for strategy = (:baseline, :Threads, :ThreadsX, ...)
# requires `using BenchmarkTools`; `$` interpolates the local loop variable into the benchmark
@benchmark F(testfun!, 10^6, 10^3, $strategy)
end
end
This is showing a dispatch-based approach. You could equally well use dictionaries or conditions. The important point is separating the "runner" F! from the "kernel" function.
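The dictionary variant mentioned above can be sketched as follows. This is only an illustration in Python rather than Julia, and every name in it (`kernel`, `run_baseline`, `run_threads`, `RUNNERS`, `run`) is hypothetical; the point is just that the kernel is written once and the strategy is looked up by key.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(x):
    """The repetitive part, written once: sort one inner list in place."""
    x.sort()


def run_baseline(kernel, data):
    """Plain sequential loop over the inner lists."""
    for item in data:
        kernel(item)


def run_threads(kernel, data):
    """Hypothetical threaded runner; each task touches a different inner list."""
    with ThreadPoolExecutor() as pool:
        # list(...) forces all tasks to finish and re-raises any exception
        list(pool.map(kernel, data))


# The "dictionary instead of dispatch" part: strategy name -> runner function
RUNNERS = {"baseline": run_baseline, "threads": run_threads}


def run(strategy, kernel, data):
    RUNNERS[strategy](kernel, data)


data = [[3, 1, 2], [9, 7, 8]]
run("threads", kernel, data)
```

Adding another strategy then means adding one runner function and one dictionary entry, without touching the kernel.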

Related

Reducing memory allocations when building Vector{UInt8} from parts

I am looking to build a Vector{UInt8} from different parts like so:
using BenchmarkTools
using Random
const a = Vector{UInt8}("Number 1: ")
const b = Vector{UInt8}(", Number 2: ")
const c = Vector{UInt8}(", Number 3: ")
const d = Vector{UInt8}(", Number 4: ")
function main(num2, num4)::Vector{UInt8}
return vcat(
a,
Vector{UInt8}(string(rand(1:100))),
b,
Vector{UInt8}(string(num2)),
c,
Vector{UInt8}(string(rand(1:100))),
d,
Vector{UInt8}(string(num4)),
)
end
@btime main(70.45, 12) # 486.224 ns (13 allocations: 1.22 KiB)
#Example output: "Number 1: 50, Number 2: 70.45, Number 3: 10, Number 4: 12"
It seems wrong to convert to a string and then to a Vector{UInt8}. I don't mind the one allocation that occurs when joining the Vectors.
Converting an integer to a vector of digits in UInt8 format can be done very efficiently. Converting a float is a bit more tricky.
All in all, I think your code is already quite efficient. Here's a suggestion for speeding up the integer code. The floating point code, I haven't been able to improve:
function tobytes(x::Integer)
N = ndigits(x)
out = Vector{UInt8}(undef, N)
for i in N:-1:1
(x, r) = divrem(x, 0x0a)
out[i] = UInt8(r) + 0x30
end
return out
end
tobytes(x) = Vector{UInt8}(string(x))
# notice that we generate random UInt8 values instead of rand(1:100), as this is faster. They still have to be converted according to the character interpretation, though.
function main2(num2, num4)
[a; tobytes(rand(0x01:0x64)); b; tobytes(num2); c; tobytes(rand(0x01:0x64)); d; tobytes(num4)]
end
tobytes for integers is now close to optimal; the runtime is dominated by the time to pre-allocate the Vector{UInt8}.

Out of memory error copying cells from one sheet to another

There are a lot of questions regarding this issue. I read many of them and tried a few things but they don't fix my case.
I am trying to compare lines from two different (very long) sheets. If specific indices match then specific cells (always the same columns with the current line) need to be copied from one sheet into the other.
It looks like this just bigger (enlarged example):
Dim ArrayOne() as string
Dim ArrayTwo() as string
Redim ArrayOne (1 to AmountOfRowsSheet1)
Redim ArrayTwo (1 to AmountOfRowsSheet2)
For i = 1 to AmountOfRowsSheet1
ArrayOne(i) = Sheet1.Cells(i, ThisColumn)
next i
For i = 1 to AmountOfRowsSheet2
ArrayTwo(i) = Sheet2.Cells(i, ThatColumn)
next i
for i = 1 to 4600
for j = 1 to 69000
if ArrayOne(i) Like "*" & ArrayTwo(j) then
Sheet1.Cells(i, 5).value = Sheet3.Cells(i,10).value
'the line above is repeated about 20 times just with different columns
'so it gets potentially executed 4600*69000*20 times (6348000000)
end if
next j
next i
For-loop and everything is working, it also copies correctly but after an amount of lines I run out of memory. In the TaskManager I can see my used RAM tick up every few seconds. At one point Excel displays an error that it can't handle the next copying because of a lack of resources.
I tried:
Application.CutCopyMode = False '( at restart of loop)
Creating an empty data object and putting it into the clipboard.
and a few user32.dll fixes I found.
I turned your example into how you would work with arrays:
Option Explicit
Sub Example()
Dim ArrayOne() As Variant
Dim ArrayTwo() As Variant
ArrayOne = Sheet1.Columns(1).Value 'read column 1 into array
ArrayTwo = Sheet2.Columns(2).Value 'read column 2 into array
Dim start
start = Timer
Dim i As Long
For i = 1 To 4600
Dim j As Long
For j = 1 To 69000
If ArrayOne(i, 1) Like "*" & ArrayTwo(j, 1) Then
Sheet1.Cells(i, 5).Value = Sheet3.Cells(i, 10).Value + 1 'Sheet1/Sheet3 as in the question
End If
Next j
Debug.Print i, start, Timer, "Runtime=" & Timer-start
Stop 'we want to test time of one iteration = 23 seconds
Next i
End Sub
This example runs for 23 seconds (on my computer) for one iteration of the i loop, i.e. one full pass of the inner j loop. So the total runtime will be 23*4600 seconds, which is about 30 hours.
So either you strip down the data that needs to be processed or you use something else than Excel VBA to get it faster. Or you change your entire approach.
Note that VBA is limited to single threading. So no matter how many cores your CPU has VBA will only use one. That makes it actually a pretty bad tool for processing big data.
Actually what you need to get rid of is the read/write actions to the cells
Sheet1.Cells(i, 5).Value = Sheet3.Cells(i, 10).Value
Whenever you access a cell value it slows down a lot. Without that line the loop runs in 2 instead of 23 seconds (still a total runtime of 2.5 hours). So there is potential to get this faster, but probably not much faster than 2.5 hours.
If you cannot get rid of multiple read/write actions then even turning off calculation Application.Calculation = xlCalculationManual before going into the loop brings an immense boost. Just don't forget to turn it on Application.Calculation = xlCalculationAutomatic in the end. Note that turning off calculation only works if you have no formulas that need to be calculated while your loop runs (otherwise you get faulty results).
I recommend improving your real code as shown above and checking the runtime for one full run of the inner j loop, as I did with the Stop command. This way you can easily calculate the entire runtime by multiplying by 4600.
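The multiplication behind the estimates above, written out as a small Python sketch. The function name `total_runtime_hours` is made up for illustration; the inputs are the timings measured in this answer (23 seconds with the cell access, 2 seconds without) and the 4600 outer iterations.

```python
OUTER_ITERATIONS = 4600  # number of i iterations in the example


def total_runtime_hours(seconds_per_outer_iteration, outer_iterations=OUTER_ITERATIONS):
    """Extrapolate the full runtime from the time of one pass of the inner j loop."""
    return seconds_per_outer_iteration * outer_iterations / 3600


print(round(total_runtime_hours(23), 1))  # with the cell read/write: 29.4
print(round(total_runtime_hours(2), 1))   # without the cell read/write: 2.6
```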
Not an answer to the question but instead of:
For i = 1 to AmountOfRowsSheet1
ArrayOne(i) = Sheet1.Cells(i, ThisColumn)
next i
try:
ArrayOne= Range(Cells(1, ThisColumn), Cells(AmountOfRowsSheet1, ThisColumn))
ArrayOne will be a 2D array, with data starting in (1,1) and incrementing (n,1)...
Quicker to get data and similar can be used for putting an array back into a worksheet - also miles quicker than a for loop.
Edit: Again, not direct answer to the question, but this:
import random, string
# ---------------------------------------------------------
# This part just generates random data to compare against each other
# (lists 5 and 6 stand in for the data in Sheet1.Cells(i, 5) and Sheet3.Cells(i, 10))
N1 = 6
list2 = []
list5 = []  # This would correspond to the existing values in Sheet1.Cells(i, 5)
list6 = []  # and this to Sheet3.Cells(i, 10)
for i1 in range(0, 4600):
    list2.append(''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N1)))
    list5.append(5)
    list6.append(10)
N2 = 12
list3 = []
for i1 in range(0, 69000):
    list3.append(''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N2)))
list2[0] = "$$$$$$"  # Just setting two values so we can check the method works
list3[10] = "$$$$$$££££££"  # 12 characters (N2), containing list2[0]
# ---------------------------------------------------------
# This part is actually doing what you're trying to do in VBA
list4 = []
ij1 = 0
for j1 in list2:
    found = False
    for j2 in list3:
        if j1 in j2:
            found = True
            break
    if found:
        list4.append(list6[ij1])  # index by row position, not by the string j1
    else:
        list4.append(list5[ij1])
    ij1 += 1
The bit you're interested in runs in around 25 seconds. Absolutely no fancy code-work needed. Go look at downloading Anaconda. You'd probably be quicker reading your two Excel files into Python, doing your ops, then writing back out again than trying to do it purely in VBA.
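If changing the approach is an option, the inner loop can be avoided altogether. The VBA test `ArrayOne(i) Like "*" & ArrayTwo(j)` asks whether a value from the first sheet ends with a value from the second, so a set lookup over the suffixes of each value can replace the 69000 comparisons per row. This is only a sketch under assumptions: the names `ends_with_any` and `match_rows` are hypothetical, the values are taken to contain no Like wildcard characters, and empty cells are ignored.

```python
def ends_with_any(value, suffixes):
    """True if value ends with any string in the set `suffixes`."""
    # value[k:] enumerates every non-empty suffix, from the whole string to the last character
    return any(value[k:] in suffixes for k in range(len(value)))


def match_rows(column_one, column_two):
    """For each entry of column_one, report whether it ends with some entry of column_two."""
    suffixes = {s for s in column_two if s}  # assumption: skip empty cells
    return [ends_with_any(v, suffixes) for v in column_one]


print(match_rows(["abc123", "xyz", "q"], ["123", "yz9", ""]))  # [True, False, False]
```

The resulting list of booleans can then drive a single bulk write of the copied columns instead of one cell write per match.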

VBA: How to edit code to eliminate the out of stack space error?

This code used to work, but recently I am getting an error (out of stack space).
I think the code is failing because I am calling a function too many times without exiting/ending.
If that is the case, how many times can you call a function and is there something I can do to fix this?
I am not the original author of this code.
I included the sub where the error occurs.
Sub CalculatePct(e As Variant)
Dim G As Integer
Dim pct As Double
Dim Owned100Pct As Boolean
If entities(e) < 0 Then
pct = 0
Owned100Pct = True ' Keeps track if the entity exists in the table other than as a parent
For G = 1 To UBound(MainArray, 1)
If MainArray(G, colEntity) = e Then
Owned100Pct = False
If entities(MainArray(G, colParent)) = -1 Then
'If we don't know the parent's ownership percentage, go and calculate it
CalculatePct MainArray(G, colParent)
End If
pct = pct + CDbl(MainArray(G, colPct)) / 100 * entities(MainArray(G, colParent))
End If
Next
If Owned100Pct Then
'Assume 100% owned if we don't know the parentage
'("Outside" entities won't go through here as they are already set to 0%)
entities(e) = 1
Else
'Store the entity's percentage
entities(e) = pct
End If
End If
End Sub
As @TimWilliams noted in the comments, you have an endless recursion loop.
Highlighting the problem area:
Sub CalculatePct(e As Variant)
[...]
If entities(MainArray(G, colParent)) = -1 Then
CalculatePct MainArray(G, colParent)
End If
[...]
End Sub
e is the parameter, and entities(e) is checked. In the recursive call, MainArray(G, colParent) is passed in place of e, so the next call to the routine has e = MainArray(G, colParent).
Up to that point in the code, you do not change the value of G, colParent, entities or MainArray. So if entities(MainArray(G, colParent)) = -1, it will be forever calling itself.
Without knowing anything else about the code (including if recursion is necessary) I cannot suggest any definitive solutions. However, some things to consider:
Rewriting to be a loop instead of a recursive call
Making the recursive call to a subset of MainArray
Doing any amendments to G or colParent prior to the recursive call
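One situation in which such a recursion cannot terminate is a cycle in the parent links, for example two entities listed as each other's parent. The following Python sketch shows the same calculation with results cached and a guard that turns a cycle into an error instead of a stack overflow. It is not the original code: the function `ownership`, the row layout `(entity, parent, pct)` and the `known` dictionary are assumptions made for illustration.

```python
def ownership(entity, rows, known, in_progress=None):
    """Fraction owned for `entity`.

    rows:  list of (entity, parent, pct) tuples, pct given in percent
    known: dict entity -> fraction, used as a cache of finished results
    """
    if in_progress is None:
        in_progress = set()
    if entity in known:
        return known[entity]
    if entity in in_progress:
        # We came back to an entity whose calculation has not finished: a cycle
        raise ValueError(f"circular ownership involving {entity!r}")
    in_progress.add(entity)
    parents = [(p, pct) for e, p, pct in rows if e == entity]
    if not parents:
        result = 1.0  # mirrors the "assume 100% owned" branch of the VBA code
    else:
        result = sum(pct / 100 * ownership(p, rows, known, in_progress)
                     for p, pct in parents)
    in_progress.discard(entity)
    known[entity] = result
    return result


rows = [("B", "A", 50), ("C", "B", 50)]
print(ownership("C", rows, {}))  # 0.25
```

A very deep but acyclic parent chain can still exhaust the stack in this form; rewriting it as a loop with an explicit stack would address that case as well.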
You've offered no indication of what line the error occurs on nor what MainArray represents but I'm guessing that MainArray has grown to a size greater than what can be accessed with a signed short integer.
Change the declaration of your iteration variable to a Signed Long Integer. This raises the functional limit of the variable from 32,767 iterations to 2,147,483,647.
Dim G As Long
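The two limits quoted above follow from the bit widths: VBA's Integer is a 16-bit signed type and Long is a 32-bit signed type. A quick check in Python (the helper name `signed_max` is made up for this sketch):

```python
def signed_max(bits):
    """Largest value of a two's-complement signed integer with the given width."""
    return 2 ** (bits - 1) - 1


print(signed_max(16))  # VBA Integer: 32767
print(signed_max(32))  # VBA Long: 2147483647
```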

contains with separate result for each of multiple patterns

Matlab's documentation for the function TF = contains(str,pattern) states:
If pattern is an array containing multiple patterns, then contains returns 1 if it finds any element of pattern in str.
I want a result for each pattern individually however.
That is:
I have string A='a very long string' and two patterns B='very' and C='long'. I want to check if B is contained in A and if C is contained in A. I could do it like this:
result = false(2,1);
result(1) = contains(A,B);
result(2) = contains(A,C);
but for many patterns this takes quite a while. What is the fast way to do this?
I don't know or have access to that function; it must be "new", so I don't know its particular idiosyncrasies.
How I would do that is:
result = ~cellfun('isempty', regexp(A, {B C}));
EDIT
Judging from the documentation, you can do the exact same thing with contains:
result = contains(A, {B C});
except that seems to return contains(A,B) || contains(A,C) rather than the array [contains(A,B) contains(A,C)]. So I don't know, I can't test it here. But if all else fails, you can use the regex solution above.
The new text processing functions in R2016b are fastest with the string type. If you convert A to a string you may see much better performance.
function profFunc
n = 1E6;
A = 'a very long string';
B = 'very';
C = 'long';
tic;
for i = 1:n
result(1) = contains(A,B);
result(2) = contains(A,C);
end
toc;
tic;
for i = 1:n
x = regexp(A, {B,C});
end
toc;
A = string(A);
tic;
for i = 1:n
result(1) = contains(A,B);
result(2) = contains(A,C);
end
toc;
end
>> profFunc
Elapsed time is 7.035145 seconds.
Elapsed time is 9.494433 seconds.
Elapsed time is 0.930393 seconds.
Questions: Where do B and C come from? Do you have a lot of hard coded variables? Can you loop? Looping would probably be the fastest. Otherwise something like
cellfun(@(x)contains(A,x),{B C})
is an option.

Reading a file of lists of integers in Fortran

I would like to read a data file with a Fortran program, where each line is a list of integers.
Each line has a variable number of integers, separated by a given character (space, comma...).
Sample input:
1,7,3,2
2,8
12,44,13,11
I have a solution to split lines, which I find rather convoluted:
module split
implicit none
contains
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
integer :: i, j, k, n, m, p, r
character(*) :: str
character :: sep, c
character(:), allocatable :: tmp
!First pass: find number of items (m), and maximum length of an item (r)
n = len_trim(str)
m = 1
j = 0
r = 0
do i = 1, n
if(str(i:i) == sep) then
m = m + 1
r = max(r, j)
j = 0
else
j = j + 1
end if
end do
r = max(r, j)
allocate(a(m))
allocate(character(r) :: tmp)
!Second pass: copy each item into temporary string (tmp),
!read an integer from tmp, and write this integer in the output array (a)
tmp(1:r) = " "
j = 0
k = 0
do i = 1, n
c = str(i:i)
if(c == sep) then
k = k + 1
read(tmp, *) p
a(k) = p
tmp(1:r) = " "
j = 0
else
j = j + 1
tmp(j:j) = c
end if
end do
k = k + 1
read(tmp, *) p
a(k) = p
deallocate(tmp)
end function
end module
My question:
Is there a simpler way to do this in Fortran? I mean, reading a list of values where the number of values to read is unknown. The above code looks awkward, and file I/O does not look easy in Fortran.
Also, the main program has to read lines with unknown and unbounded length. I am able to read lines if I assume they are all the same length (see below), but I don't know how to read unbounded lines. I suppose it would need the stream features of Fortran 2003, but I don't know how to write this.
Here is the current program:
program read_data
use split
implicit none
integer :: q
integer, allocatable :: a(:)
character(80) :: line
open(unit=10, file="input.txt", action="read", status="old", form="formatted")
do
read(10, "(A80)", iostat=q) line
if(q /= 0) exit
if(line(1:1) /= "#") then
a = string_to_integers(line, ",")
print *, ubound(a), a
end if
end do
close(10)
end program
A comment about the question: usually I would do this in Python, for example converting a line would be as simple as a = [int(x) for x in line.split(",")], and reading a file is likewise almost a trivial task. And I would do the "real" computing stuff with a Fortran DLL. However, I'd like to improve my Fortran skills on file I/O.
I don't claim it is the shortest possible, but it is much shorter than yours. And once you have it, you can reuse it. I don't completely agree with the claims that Fortran is bad at string processing; I do tokenization, recursive descent parsing and similar stuff just fine in Fortran, although it is easier in some other languages with richer libraries. Sometimes you can use libraries written in other languages (especially C and C++) from Fortran too.
If you always use the comma as the separator, you can remove the replacement by comma and thus shorten it even more.
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
character(*) :: str
character :: sep
integer :: i, n_sep
n_sep = 0
do i = 1, len_trim(str)
if (str(i:i)==sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
allocate(a(n_sep+1))
read(str,*) a
end function
Potential for shortening: view the str as a character array using equivalence or transfer and use count() inside of allocate to get the size of a.
The code assumes that there is just one separator between each pair of numbers and no separator before the first one. If multiple separators are allowed between two numbers, you have to check whether the preceding character is a separator or not:
do i = 2, len_trim(str)
if (str(i:i)==sep .and. str(i-1:i-1)/=sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
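The counting rule in that loop (a separator only counts if the character before it is not a separator) can be checked quickly outside Fortran. Below is a Python sketch of the same logic; `count_items` is a hypothetical name, and `rstrip` stands in for `len_trim`.

```python
def count_items(line, sep=","):
    """Number of items in `line`, counting a run of separators as one."""
    line = line.rstrip()  # same role as len_trim in the Fortran version
    n_sep = 0
    # start at the second character, as the Fortran loop starts at i = 2
    for i in range(1, len(line)):
        if line[i] == sep and line[i - 1] != sep:
            n_sep += 1
    return n_sep + 1


print(count_items("1,7,3,2"))      # 4
print(count_items("12  44 13", " "))  # 3
```

As in the Fortran version, a separator before the first number or after the last one is not handled.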
My answer is probably too simplistic for your goals but I have spent a lot of time recently reading in strange text files of numbers. My biggest problem is finding where they start (not hard in your case) then my best friend is the list-directed read.
read(unit=10,fmt=*) a
will read in all of the data into vector 'a', done deal. With this method you will not know which line any piece of data came from. If you want to allocate it then you can read the file once and figure out some algorithm to make the array larger than it needs to be, like maybe count the number of lines and you know a max data amount per line (say 21).
line_counter = 0
status = 0
do while (status == 0)
line_counter = line_counter + 1
read(unit=10, iostat=status, fmt=*)
end do
allocate(a(line_counter*21))
If you want to then eliminate zero values you can remove them or pre-seed the 'a' vector with a negative number if you don't expect any then remove all of those.
Another approach stemming from the other suggestion is to first count the commas then do a read where the loop is controlled by
do j = 1, line_counter ! You determined this on your first read
read(unit=11,fmt=*) a(j,:) ! a is now a 2 dimensional array (line_counter, maxNumberPerLine)
! You have a separate vector numberOfCommas(j) from before
end do
And now you can do whatever you want with these two arrays because you know all the data, which line it came from, and how many data were on each line.
