Not possible to increase number of threads in Julia - multithreading

I changed the number of threads to 4 following the documentation. I know that the maximum number of threads is limited by Sys.CPU_THREADS, which is 8 in my case, but I can't change it to 8. Why is this?
What I am doing:
set JULIA_NUM_THREADS=8 in cmd
No error is shown, but in Julia Threads.nthreads() still returns 4.

If you use Atom, you can set the number of threads in the settings of the julia-client package (see screenshot). The default is set to the number of cores, which is probably the four you are seeing.

Related

Which 3GPP spec identifies the maximum number of PDP contexts a UE can activate?

So I know that the network limit on the max number of contexts a UE can activate is 11. However, I can't find in the 3GPP specs where this is stated explicitly. I've been searching for hours now and can't find anything. Can anyone point me in the right direction?
Okay, I think I've figured it out; it turns out I just needed a little more time to keep digging. The limit is specified indirectly through the number of available NSAPIs that can be assigned. The number of NSAPIs is limited by the definition of the NSAPI IE in TS 24.008, section 10.5.6.2 (I'm using Rel-4 for this; the situation may have changed in later releases). The definition allocates 4 bits to the NSAPI, which gives 16 code points; 5 of them are reserved, leaving 11 usable NSAPIs. Since each PDP context must be allocated its own NSAPI, there can only be a maximum of 11 PDP contexts, because only 11 NSAPIs are available.

Python3 multiprocessing: Memory Allocation Error

I know that this question has been asked a lot of times, but the answers are not applicable.
This is one of the answers for parallelizing a loop using multiprocessing on Stack Overflow:
import multiprocessing as mp

def processInput(i):
    return i * i

if __name__ == '__main__':
    inputs = range(1000000)
    pool = mp.Pool(processes=4)
    results = pool.map(processInput, inputs)
    print(results)
This code works fine. But if I increase the range to 1000000000, my 16 GB of RAM fill up completely and I get [Errno 12] Cannot allocate memory. It seems as if the map function starts as many processes as possible. How do I limit the number of parallel processes?
The pool.map function starts 4 processes, as you instructed it to (with processes=4 you tell the pool how many worker processes it may use to run your logic).
There is, however, a different issue underlying this implementation.
The pool.map function will return a list of objects, in this case numbers.
Python numbers do not behave like ints in ANSI C: they are objects with overhead, and they will not overflow (e.g. wrap to -2^31 when exceeding 2^31 - 1 on a 32-bit system).
Python lists are also not plain arrays and incur an overhead of their own.
To be more specific, on Python 3.6, running the following code reveals some of that overhead:
>>> import sys
>>> t = [1, 2, 3, 4]
>>> sys.getsizeof(t)
96
>>> t = [x for x in range(1000)]
>>> sys.getsizeof(t)
9024
So this means roughly 24 bytes per element on small lists and ~9 bytes per element on large lists.
So for a list of 10^9 elements we get about 8.5 GB.
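As a quick sanity check of that estimate, using the getsizeof figures above:

# Back-of-the-envelope check of the figure above: list slot overhead only,
# not the int objects the slots point to.
per_slot = 9024 / 1000             # ~9 bytes of list overhead per element
n = 10 ** 9
print(per_slot * n / 1024 ** 3)    # ~8.4 GiB for the list structure alone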
EDIT: 1. As tfb mentioned, this is not even the size of the underlying number objects, just pointers and list overhead, meaning there is much more memory overhead that I did not account for in the original answer. 2. The default Python installation on Windows is 32-bit (you can get a 64-bit installation, but you need to look in the list of all available downloads on the Python website), so I assumed you are using the 32-bit installation.
In Python 3, range(1000000000) is a lazy sequence (it is the equivalent of Python 2's xrange), so the range itself costs almost nothing. The problem is that pool.map builds and returns a full list of 10^9 result ints: that is around 8 GB just for the list's pointers on a 64-bit system, before counting the int objects themselves. A really, really smart implementation might be able to do this on a 16 GB machine, but materializing the whole result list is basically a lost cause.
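Not part of the original answers, but a minimal sketch of the usual workaround under these constraints: keep the 4 worker processes and consume the results lazily with imap instead of letting map build one giant results list (the chunksize of 10000 is just an illustrative choice):

import multiprocessing as mp

def processInput(i):
    return i * i

if __name__ == '__main__':
    inputs = range(1000000000)      # lazy in Python 3, so cheap by itself
    total = 0
    with mp.Pool(processes=4) as pool:
        # imap yields results as they arrive instead of building a 10^9-element list;
        # chunksize batches the work to keep inter-process overhead manageable.
        for squared in pool.imap(processInput, inputs, chunksize=10000):
            total += squared        # aggregate (or write to disk) instead of storing everything
    print(total)

If the individual results really are needed, stream them to disk or into a memory-mapped array as they arrive rather than keeping them all in a Python list.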

SharePoint Search Component has multiple process and use lots memory?

I'm using SharePoint 2013 + Windows Server 2012. I noticed that the SP search component has 5 processes in Task Manager, each using about 400-500 MB of memory. Is this normal? I also tried
Set-SPEnterpriseSearchService -PerformanceLevel Reduced
but it did not change anything. Should I restart the server?
I never noticed this on other SP servers I worked on before. Just curious: is it because of some default settings in SP 2013?
thanks
user3211586's link worked for me. Basically, the article says:
Quick and Dirty
Kill the noderunner.exe (Microsoft SharePoint Search Component) process via Task Manager
This will obviously break everything related to Search on the site
Production
Change the Search Service performance level with PowerShell:
Get-SPEnterpriseSearchService | Set-SPEnterpriseSearchService -PerformanceLevel "PartlyReduced"
Performance Level Explained:
Reduced: Total number of threads = number of processors, Max Threads/host = number of processors
PartlyReduced: Total number of threads = 4 times the number of processors, Max Threads/host = 16 times the number of processors
Maximum: Total number of threads = 4 times the number of processors, Max Threads/host = 16 times the number of processors (threads are created at HIGH priority)
For the setting to take effect, do an IISReset or restart the Search Service in Central Admin.
I had the same issue as the OP, and running Set-SPEnterpriseSearchService -PerformanceLevel "PartlyReduced" followed by IISRESET /noforce resolved it for me.
Please check the article below:
http://social.technet.microsoft.com/wiki/contents/articles/20413.sharepoint-2013-performance-issues-with-search-service-application.aspx
When I tried this method and changed the config setting from 0 to any value between 1 and 500, it did reduce memory usage, but search stopped working. After I reverted the config setting to 0, memory usage went back up, but search started working again.

FIO Flexible IO tester for repetitive data access patterns

I am currently working on a project and I need to test my prototype with repetitive data access patterns. I came across fio which is a flexible I/O tester for Linux (1).
Fio has many options, and I want it to produce a workload that accesses the same blocks of a file, the same number of times, over and over again. I also need those accesses not to be equal among the blocks. For instance, if fio creates a file named "test.txt"
and this file is divided into 10 blocks, I need the workload to read a specific subset of these blocks, with a different number of IOs each, over and over again. Let's say it chooses to access blocks 3, 7 and 9. Then I want to access these in a specific order and a specific number of times each, over and over again. If this workload can be described by N passes, then I want it to be something like this:
1st pass: read block 3 10 times, read block 7 5 times, read block 9 2 times.
2nd pass: read block 3 10 times, read block 7 5 times, read block 9 2 times.
...
Nth pass: read block 3 10 times, read block 7 5 times, read block 9 2 times.
Question 1: Can the above workload be produced with Fio? If yes, How?
Question 2: Is there a mailing list, forum, website, community for Fio users?
Thank you,
Nick
http://www.spinics.net/lists/fio/index.html is the website where you can follow the mailing list.
The HOWTO at http://www.bluestop.org/fio/HOWTO.txt will also help you.
This is actually quite a tricky thing to do. The closest you'll get with parameters is using one of the non-uniform distributions (see random_distribution in the HOWTO) but you'll be saying re-read blocks A, B, C more than blocks X, Y, Z and you won't be able to control the exact counts.
An alternative is to write an iolog with the exact sequence you're looking for and replay it (see "Trace file format v2" in the HOWTO).
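To make the iolog route concrete, here is a rough Python sketch (not from the answer) that generates a "Trace file format v2" log for the pattern in the question. The 4 KiB block size, the path /tmp/fio_test.bin, and N = 3 passes are illustrative assumptions, and the target file must already exist and be at least 10 blocks long:

# Sketch: write a fio v2 iolog that reads block 3 ten times, block 7 five
# times and block 9 twice, repeated for N passes.
BLOCK_SIZE = 4096                       # assumed block size in bytes
FILENAME = "/tmp/fio_test.bin"          # assumed pre-created test file (absolute path required)
PATTERN = [(3, 10), (7, 5), (9, 2)]     # (block index, number of reads)
N_PASSES = 3                            # assumed number of passes

with open("pattern.log", "w") as log:
    log.write("fio version 2 iolog\n")
    log.write(f"{FILENAME} add\n")
    log.write(f"{FILENAME} open\n")
    for _ in range(N_PASSES):
        for block, count in PATTERN:
            for _ in range(count):
                log.write(f"{FILENAME} read {block * BLOCK_SIZE} {BLOCK_SIZE}\n")
    log.write(f"{FILENAME} close\n")

The resulting log can then be replayed with something like fio --name=replay --read_iolog=pattern.log.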

Joblib parallel increases time by n jobs

While trying to get multiprocessing to work (and to understand it) in Python 3.3, I quickly switched to joblib to make my life easier. But I am experiencing something very strange (from my point of view). When running this code (just to test that it works):
Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(200000))
It takes about 9 seconds, but by increasing n_jobs it actually takes longer: for n_jobs=2 it takes 25 seconds and for n_jobs=4 it takes 27 seconds.
Correct me if I'm wrong... but shouldn't it be much faster as n_jobs increases? I have an Intel i7 3770K, so I guess the problem is not my CPU.
Perhaps giving my original problem will increase the chance of an answer or solution.
I have a list of 30k+ strings, data, and I need to do something with each string (independently of the other strings); it takes about 14 seconds. This is only a test case to see whether my code works. In real applications it will probably be 100k+ entries, so multiprocessing is needed, since this is only a small part of the entire calculation.
This is what needs to be done in this part of the calculation:
from nltk.corpus import wordnet  # assumed import; the thread appears to use NLTK's WordNet

data_syno = []
for entry in data:
    w = wordnet.synsets(entry)
    if len(w) > 0:
        data_syno.append(w[0].lemma_names[0])  # first lemma of the first synset
    else:
        data_syno.append(entry)
The n_jobs parameter is counter-intuitive: the maximum number of cores is used at -1, at 1 it uses only one core, at -2 it uses max-1 cores, at -3 it uses max-2 cores, and so on. That's how I read it, from the docs:
n_jobs: int :
The number of jobs to use for the computation. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
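Given that quote, n_jobs=1 uses no multiprocessing at all, so the 9-second run pays no dispatch or pickling cost, while with n_jobs=2 or 4 every tiny sqrt(i**2) call has to be shipped to a worker process and back, and that overhead most likely dwarfs the work itself. A minimal sketch (not from the thread) of the usual fix is to batch the items so each dispatched task does a meaningful amount of work; the chunk count of 16 is just an illustrative choice:

from math import sqrt
from joblib import Parallel, delayed

def process_chunk(chunk):
    # Do the per-item work on a whole slice at once, so pickling and dispatch
    # costs are paid once per chunk instead of once per item.
    return [sqrt(i ** 2) for i in chunk]

if __name__ == '__main__':
    data = list(range(200000))
    n_chunks = 16                                     # illustrative: a few chunks per core
    step = (len(data) + n_chunks - 1) // n_chunks
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    nested = Parallel(n_jobs=4)(delayed(process_chunk)(c) for c in chunks)
    results = [r for chunk in nested for r in chunk]  # flatten, order preserved
    print(len(results))

The same idea applies to the wordnet loop above: hand each worker a slice of data rather than a single string. Recent versions of joblib also expose a batch_size argument on Parallel that does similar batching automatically.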

Resources