MPI4PY strange OS error - python-3.x

I have a complex MPI4PY script that gives a seemingly impossible error.
The important part of the script:
for rnd in range(50):
    if rnd > 0:
        WEIGHT_FILE = '{}/weights_{}.wts'.format(WORK_DIR, rnd - 1)
    WORK_DIR = '{}'.format(rnd)
    if PROCESS_NUM == 0:
        if not os.path.isdir(WORK_DIR):
            os.mkdir(WORK_DIR)
    ....
So after the second iteration I get an OS error: cannot create directory, directory exists. How is this possible? If the directory exists, it should not try to create it. PROCESS_NUM is the MPI rank, so only one process should try to create it. Is there some kind of race condition or locking error? Any ideas?

You need to build the full path name before checking:
if not os.path.isdir(os.path.join(full_path, WORK_DIR)):
Alternatively, just use:
os.makedirs(WORK_DIR, exist_ok=True)

I seem to have found the answer, and it was deep in the architecture, not related to Python.
I was using the SLURM workload manager with MPICH, and on one of the nodes there was an installation of Open MPI alongside MPICH, causing some trouble. The numbering of the cores on that node was 0/1 for all allocations, causing a race condition in the script because multiple cores got the same PROCESS_NUM.
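For reference, a pattern that sidesteps both the exists-check race and the duplicate-rank issue is to create the directory idempotently and have the other ranks wait behind a barrier before using it. A minimal sketch, reusing the names from the snippet above (everything else is illustrative, not the original script):
import os
from mpi4py import MPI

comm = MPI.COMM_WORLD
PROCESS_NUM = comm.Get_rank()  # worth printing per host to confirm ranks really are unique

for rnd in range(50):
    WORK_DIR = '{}'.format(rnd)
    if PROCESS_NUM == 0:
        # exist_ok=True makes the call idempotent, so a leftover directory
        # (or a second process that also believes it is rank 0) no longer raises
        os.makedirs(WORK_DIR, exist_ok=True)
    comm.Barrier()  # every rank waits here until the directory exists
    # ... rest of the round ...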

Related

pandarallel package on windows infinite loop bug

So this is not really a question but rather a bug report for the pandarallel package.
This is the end of my code:
...
print('Calculate costs NEG...')
for i, group in tqdm(df_mol_neg.groupby('DELIVERY_DATE')):
    srl_slice = df_srl.loc[df_srl['DATE'] == i]
    srl_slice['srl_soll'] = srl_slice['srl_soll'].copy() * -1
    df_aep_neg.loc[df_aep_neg['DATE'] == i, 'SRL_cost'] = srl_slice['srl_soll'].parallel_apply(lambda x: get_cost_of_nearest_mol(group, x)).sum()
What happens here is that instead of running the parallel_apply call, it loops back to the start of my code and repeats it all again. The exact same code works fine on my remote Linux machine, so I see two possible error sources:
since pandarallel itself already has some difficulties with Windows, it might just be a Windows problem
the other thing is that I currently use the early access version of PyCharm (223.7401.13) with the debugger, which might also be a source of the problem
Other than this bug I can highly recommend the pandarallel package (at least for Linux users). It's super easy to use, and if you have some cores it can really shave off some time; in my case it shaved off a cool 90%.
(Also, if there is a better way to report bugs, please let me know.)
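As a side note, a script that appears to "loop back to its own start" on Windows is also the classic symptom of multiprocessing's spawn start method re-importing an unguarded main module. Whether or not that is what pandarallel runs into here, guarding the entry point costs nothing. A minimal sketch with placeholder data (not the code from the report above):
import pandas as pd
from pandarallel import pandarallel

def main():
    pandarallel.initialize()  # starts the worker processes
    df = pd.DataFrame({'x': range(1000)})
    # parallel_apply ships the lambda to the workers instead of running it in-process
    print(df['x'].parallel_apply(lambda v: v * 2).sum())

if __name__ == '__main__':
    # on Windows, worker processes are spawned by re-importing this module,
    # so anything that must run only once has to live behind this guard
    main()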

Tune Hyperparameter in sklearn with ray

I wonder why this appears every time I try to tune hyperparameters from sklearn with TuneSearchCV, but I could not find any information about it:
Note that the important part is the log sync warning; as a result, logging in combination with tensorflow and a search_optimization backend such as optuna does not work:
Backend is sklearn
Concatenating h5 datasets of the following files:
('output/example_train_1.h5', 'output/example_train_2.h5')
based on the following keys:
('x', 'y')
Concatenation successful, resulting shapes for the given dsets:
Key: x, shape: (20000, 25)
Key: y, shape: (20000,)
Log sync requires rsync to be installed.
Process finished with exit code 0
The tuning process seems to work, as long as I do not use a search_optimization backend such as optuna.
I use it within a Docker container. I went through the Ray documentation and could find the place in the source where I think the warning is raised. However, I could not find any settings or additional options to prevent it.
Furthermore, it seems that rsync is only necessary if I use a cluster. But I don't actually do that right now.
The warning (Log sync requires rsync to be installed.) does not stop the script from executing. If rsync is not installed, it will just not synchronize logs between nodes, which seems to be unnecessary in your case anyway. You shouldn't run into any problem there.
It's hard to say what the problem here is, as we're missing crucial information: which version of Ray are you running, which version of tune-sklearn, and what does your training script look like?
If you're running into problems and you suspect it is a bug, please consider opening an issue in the tune-sklearn repository, and make sure to include the above information and preferably a minimal reproducible script so the maintainers can look into this.
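For illustration, a minimal reproducible script for this kind of report might look roughly like the sketch below; the estimator, parameter ranges, and data are placeholders, and search_optimization='optuna' is the setting the question says triggers the behaviour:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from tune_sklearn import TuneSearchCV

# synthetic stand-in for the h5 data mentioned in the log output
X, y = make_classification(n_samples=20000, n_features=25, random_state=0)

search = TuneSearchCV(
    RandomForestClassifier(),
    param_distributions={'n_estimators': (50, 200), 'max_depth': (2, 10)},
    n_trials=10,
    search_optimization='optuna',
)
search.fit(X, y)
print(search.best_params_)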

Python keeps running for 10+ mins (after the last statement in the program) when there is a huge (33GB) data structure in memory (nothing in swap)

I need to parse a huge gz file (~10GB compressed, ~100GB uncompressed). The code creates a data structure ('data_struct') in memory. I am running on a machine with an Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz, 16 CPUs and plenty of RAM (i.e. 200+ GB), running CentOS-6.9. I have implemented this using a class in Python 3.6.3 (CPython), as shown below:
import subprocess

class my_class():
    def __init__(self):
        cmd = f'gunzip -c huge-file.gz'  # -c keeps the decompressed output on stdout
        self.process = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
        self.data_struct = dict()
    def populate_struct(self):
        for line in self.process.stdout:
            <populate the self.data_struct dictionary>
    def __del__(self):
        self.process.wait()
        #del self.data_struct # presence/absence of this statement decreases/increases runtime respectively
#================End of my_class===================

def main():
    my_object = my_class()
    my_object.populate_struct()
    print(f'~~~~ Finished populate_struct() ~~~~') # last statement in my program.
    ## Python keeps running at 100% past the previous statement for 10+mins

if __name__ == '__main__':
    main()
#================End of Main=======================
The resident memory consumption of my data_struct (RAM only, no swap) is about ~33GB. I ran $ top to find the PID of the Python process and traced it using $ strace -p <PID> -o <out_file> (to see what Python is doing). While it is executing populate_struct(), I can see in the strace output that Python is using calls like mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0684160000 to create data_struct. While Python was running past the last print() statement, I found that it was issuing only munmap() operations, as shown below:
munmap(0x2b3c75375000, 41947136) = 0
munmap(0x2b3c73374000, 33558528) = 0
munmap(0x2b4015d2a000, 262144) = 0
munmap(0x2b4015cea000, 262144) = 0
munmap(0x2b4015caa000, 262144) = 0
munmap(0x2b4015c6a000, 262144) = 0
munmap(0x2b4015c2a000, 262144) = 0
munmap(0x2b4015bea000, 262144) = 0
munmap(0x2b4015baa000, 262144) = 0
...
...
Python keeps running anywhere between 10 and 12 minutes after the last print() statement. One observation is that if I have the del self.data_struct statement in the __del__() method, then it takes only 2 minutes. I have run these experiments multiple times, and the runtime decreases/increases with the presence/absence of del self.data_struct in __del__().
My questions:
My understanding is that Python is doing cleanup work using munmap(), but unlike Python, other languages like Perl release memory and exit the program immediately. Am I doing it right by implementing it as shown above? Is there a way to tell Python to avoid this munmap()?
Why does it take 10+ minutes to clean up when there is no del self.data_struct statement in __del__(), and only 2 minutes when there is?
Is there a way to speed up the cleanup work, i.e. the munmap() calls?
Is there a way to exit the program immediately without the cleanup work?
Other thoughts/suggestions about tackling this problem are appreciated.
Please try a more recent version of Python (at least 3.8). This shows several signs of being a mild(!) form of a worst-case quadratic-time algorithm in CPython's object deallocator, which was rewritten in the issue linked below (note that the issue in turn contains a link to an older StackOverflow post with more details):
https://bugs.python.org/issue37029
Some glosses
If my guess is right, the amount of memory isn't particularly important - it's instead the sheer number of distinct Python objects being managed by CPython's "small object allocator" (obmalloc.c), combined with "bad luck" in the order in which their memory is released.
When that code was first written, RAM wasn't big enough to hold millions of Python objects, so nobody noticed that one particular part of deallocation logic could take time quadratic in the number of allocated "arenas" (details aren't really helpful, but "arenas" are the granularity at which system mmap() and munmap() calls are made - 256 KiB chunks).
It's not those mapping calls that are consuming mounds of time, and any decent implementation of any language using OS memory mapping facilities will eventually call munmap() enough times to release the OS resources consumed by its mmap() calls.
So that's a red herring. munmap() is being called many times simply because you allocated many objects, which required many mmap() calls.
There isn't any crisp or easy way to explain exactly when the problem shows up. See "bad luck" above ;-) The relevant code was rewritten for CPython 3.8 to be worst-case linear time instead, which gave a factor of ~250 speedup for the specific program that triggered the issue report (see the link already given).
As a comment noted, you can exit your program immediately at any time by invoking os._exit(), but the leading underscore is meant to scare you off: "immediately" means "immediately". No cleanups of any kind are performed. For example, the __del__ method in your class? Skipped. __del__ is run as a side effect of deallocation, but if you actually "immediately release memory and exit the program" then no destructors of any kind are run, nor any handlers registered with the atexit module, etc etc. It's as drastic as a program dying, e.g., with a segfault.
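To make that last point concrete, here is a minimal sketch of what skipping the shutdown pass looks like for the program in the question; note that os._exit() also skips flushing Python-level I/O buffers, hence the explicit flush:
import os
import sys

def main():
    my_object = my_class()           # the class defined in the question
    my_object.populate_struct()
    print('~~~~ Finished populate_struct() ~~~~')
    sys.stdout.flush()               # os._exit() will not flush Python's buffers
    os._exit(0)                      # terminate now: no deallocation pass, no __del__, no atexit

if __name__ == '__main__':
    main()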

Python too many subprocesses?

I'm trying to start a lot of Python processes on a single machine.
Here is a code snippet:
fout = open(path, 'w')
p = subprocess.Popen((python_path,module_name),stdout=fout,bufsize=-1)
After about 100 processes I get the error below:
Running on Windows 10 64-bit, Python 3.5. Any idea what that might be? I already tried splitting the start across two scripts as well as adding a sleep command, but after a certain number of processes the error still shows up. Thanks a lot for any hint!
PS:
Some background: each process opens database connections and makes some requests using the requests package. Then some calculations are done using numpy, scipy, etc.
PPS: Just discovered this error message:
dll load failed the paging file is too small for this operation to complete python (when calling scipy)
Issue solved by reinstalling numpy and scipy and installing MKL.
What was strange about this error is that it only appeared after a certain number of processes. I would love to hear if anybody knows why this happened!
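Independently of the DLL/paging-file issue, when launching on the order of a hundred children it helps to keep the Popen objects and their output files around so they can be waited on and closed; otherwise open handles accumulate. A minimal sketch with placeholder paths (not a fix for the error above):
import subprocess

python_path = 'python'      # placeholder interpreter path
module_name = 'worker.py'   # placeholder script to run
procs = []

for i in range(100):
    fout = open('out_{}.txt'.format(i), 'w')
    p = subprocess.Popen((python_path, module_name), stdout=fout, bufsize=-1)
    procs.append((p, fout))

for p, fout in procs:
    p.wait()      # reap the child process
    fout.close()  # release the handle that was passed as stdout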

"Out of Memory Error (Java)" when using R and XLConnect package

I tried to load a ~30MB Excel spreadsheet into R using the XLConnect package.
This is what I wrote:
wb <- loadWorkbook("largespreadsheet.xlsx")
And after about 15 seconds, I got the following error:
Error: OutOfMemoryError (Java): GC overhead limit exceeded.
Is this a limitation of the XLConnect package or is there a way to tweak my memory settings to allow for larger files?
I appreciate any solutions/tips/advice.
Follow the advice from their website:
options(java.parameters = "-Xmx1024m")
library(XLConnect)
If you still have problems importing XLSX files, you can use this option. The answer with "-Xmx1024m" didn't work for me, so I changed it to "-Xmx4g":
options(java.parameters = "-Xmx4g" )
library(XLConnect)
This link was useful.
Use read.xlsx() in the openxlsx package. It has no dependency on rJava and thus only has the memory limitations of R itself. I have not explored it in much depth for writing and formatting XLSX files, but it has some promising-looking vignettes. For reading large spreadsheets, it works well.
Hat tip to @Brad-Horn. I've just turned his comment into an answer because I also found this to be the best solution!
In case someone encounters this error when reading not one huge file but many files, I managed to solve it by freeing Java Virtual Machine memory with xlcFreeMemory(), thus:
files <- list.files(path, pattern = "*.xlsx")
for (i in seq_along(files)) {
  wb <- loadWorkbook(...)
  ...
  rm(wb)
  xlcFreeMemory() # <= free Java Virtual Machine memory !
}
This appears to be the case when you keep using the same R session over and over again without restarting RStudio. Restarting RStudio can help to allocate a fresh memory heap to the program. It worked for me right away.
Whenever you are using a library that relies on rJava (such as RWeka in my case), you are bound to hit the default heap space limit (512 MB) some day. When using Java directly, we all know the JVM argument to use (-Xmx2048m if you want 2 gigabytes of RAM); here it's just a matter of how to specify it in the R environment.
options(java.parameters = "-Xmx2048m")
library(rJava)
As suggested here, make sure to run the options() call on the first line of your code. In my case, it worked only when I restarted the R session and ran it on the first line.
options(java.parameters = "-Xmx4g" )
library(XLConnect)
