Can't load python NMSLIB index? - python-3.x

I'm trying to load an index of about 8.6 GB using nmslib with the following commands:
search_index = nmslib.init(method='hnsw', space='cosinesimil')
search_index.loadIndex('./data/search/search_index.nmslib')
But the following error occurs:
Check failed: data_level0_memory_
Traceback (most recent call last):
search_index.loadIndex("./data/search/search_index.nmslib")
RuntimeError: Check failed: it's either a bug or inconsistent data!
My computer has only 8 GB of main memory. So did this error happen because the index is larger than 8 GB and could not be loaded into memory?
Thank you for any help.
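A quick way to test the memory hypothesis is to compare the index file size with the RAM that is actually free before loading. This is only a sketch (psutil is an extra dependency, not part of the original code):
import os
import nmslib
import psutil  # assumed extra dependency: pip install psutil

index_path = './data/search/search_index.nmslib'
index_bytes = os.path.getsize(index_path)
free_bytes = psutil.virtual_memory().available
print('index: %.1f GB, free RAM: %.1f GB' % (index_bytes / 1e9, free_bytes / 1e9))

# HNSW keeps the whole graph resident in RAM, so an 8.6 GB index (plus
# overhead) cannot fit into 8 GB of main memory; the failed allocation then
# surfaces as the "Check failed: data_level0_memory_" error.
if free_bytes > index_bytes * 1.1:  # ~10% headroom, a rough assumption
    search_index = nmslib.init(method='hnsw', space='cosinesimil')
    search_index.loadIndex(index_path)
else:
    raise MemoryError('not enough free RAM to load the index')
If that check fails on your machine, the realistic options are likely more RAM, a larger machine, or rebuilding a smaller index.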

Related

lx-symbols: Python Exception <class 'gdb.MemoryError'> Cannot access memory at address in module

I am trying to debug a module in the kernel.
When I use lx-symbols, it shows the following error message:
(gdb) lx-symbols
loading vmlinux
Traceback (most recent call last):
File "/home/hlleng/linux/imx6ull/100ask_imx6ull-qemu/linux-4.9.88/sc
ripts/gdb/linux/symbols.py",line 163,in invoke
self.load_all_symbols()
File "/home/hlleng/linux/imx6ull/100ask_imx6ull-qemu/linux-4.9.88/sc
ripts/gdb/linux/symbols.py",line 150,in load_all_symbols
[self.load module symbols(module)for module in module list]
File "/home/hlleng/linux/imx6ull/100ask_imx6ull-qemu/linux-4.9.88/sc
ripts/gdb/linux/symbols.py",line 110,in load_module_symbols
module_name module['name'].string()
gdb.MemoryError:Cannot access memory at address 0x7f00040c
Error occurred in Python:Cannot access memory at address 0x7f00040c
(gdb) c
Continuing.
[root@qemu_imx6ul:/mnt]# cat /proc/modules
hello_drv_cdev 2631 0 - Live 0x7f000000 (O)
It seems it cannot access the module address.
I have already added nokaslr:
[root@qemu_imx6ul:/mnt]# cat /proc/cmdline
nokaslr console=ttymxc0,115200 rootfstype=ext4 root=/dev/mmcblk1 rw rootwait init=/sbin/init loglevel=8
Also, breakpoints on module functions, such as 'open', do not work: GDB doesn't stop on them.
The kernel version is 4.9.88.
Environment is Linux qemu_imx6ul.
I already tried
add-auto-load-safe-path vmlinux-gdb.py
and apropos lx now lists the lx commands.
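A possible workaround while lx-symbols misbehaves is to load the module's symbols manually; the .ko path and the 'open' handler name below are hypothetical, and the .text address comes from sysfs rather than from /proc/modules:
[root@qemu_imx6ul:/mnt]# cat /sys/module/hello_drv_cdev/sections/.text
(gdb) add-symbol-file /path/to/hello_drv_cdev.ko <address printed by the cat command>
(gdb) break hello_drv_open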

zarr.consolidate_metadata yields error: 'memoryview' object has no attribute 'decode'

I have an existing LMDB zarr archive (~6GB) saved at path. Now I want to consolidate the metadata to improve read performance.
Here is my script:
store = zarr.LMDBStore(path)
root = zarr.open(store)
zarr.consolidate_metadata(store)
store.close()
I get the following error:
Traceback (most recent call last):
File "zarr_consolidate.py", line 12, in <module>
zarr.consolidate_metadata(store)
File "/local/home/marcel/.virtualenvs/noisegan/local/lib/python3.5/site-packages/zarr/convenience.py", line 1128, in consolidate_metadata
return open_consolidated(store, metadata_key=metadata_key)
File "/local/home/marcel/.virtualenvs/noisegan/local/lib/python3.5/site-packages/zarr/convenience.py", line 1182, in open_consolidated
meta_store = ConsolidatedMetadataStore(store, metadata_key=metadata_key)
File "/local/home/marcel/.virtualenvs/noisegan/local/lib/python3.5/site-packages/zarr/storage.py", line 2455, in __init__
d = store[metadata_key].decode() # pragma: no cover
AttributeError: 'memoryview' object has no attribute 'decode'
I am using zarr 2.3.2 and python 3.5.2. I have another machine running python 3.6.2 where this works. Could it have to do with the python version?
Thanks for the report. Should be fixed with gh-452. Please test it out (if you are able).
If you are able to share a bit more information on why read performance suffers in your case, that would be interesting to learn about. :)
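Until that fix is released, one possible interim workaround (an assumption, based on LMDBStore handing back memoryview objects when its buffers option is left at the default) might be to open the store with buffers=False so values come back as bytes and have a .decode() method:
import zarr

path = './archive.lmdb'  # placeholder for the existing LMDB zarr archive

store = zarr.LMDBStore(path, buffers=False)  # return bytes instead of memoryview
root = zarr.open(store)
zarr.consolidate_metadata(store)
store.close()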

'Cannot allocate memory' when submit Spark job

I got an error when trying to submit a Spark job to YARN.
But I can't tell which JVM threw this error.
How can I avoid it?
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006bff80000, 3579314176, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 3579314176 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /opt/meituan/warehouse/ETL/sqlweaver/hs_err_pid6272.log
[DBUG]: AppID NOT FOUND
Traceback (most recent call last):
File "/opt/meituan/warehouse/ETL/sqlweaver/../sqlweaver/step.py", line 119, in act
self.actionclass.run(step = self)
File "/opt/meituan/warehouse/ETL/sqlweaver/../sqlweaver/action/base.py", line 60, in run
cls.__run__(step)
File "/opt/meituan/warehouse/ETL/sqlweaver/../sqlweaver/action/sparkjobsubmit.py", line 43, in __run__
handler.submit_job(step.template, step.args['submit_user'])
File "/opt/meituan/warehouse/ETL/sqlweaver/../sqlweaver/handler/SparkBaseHandler.py", line 115, in submit_job
raise RuntimeError(msg_f)
RuntimeError: <StringIO.StringIO instance at 0x5a929e0>
Traceback (most recent call last):
File "/opt/meituan/warehouse/ETL/sqlweaver/../sqlweaver/engine/koala.py", line 34, in process
step.act()
File "/opt/meituan/warehouse/ETL/sqlweaver/../sqlweaver/step.py", line 128, in act
raise e
RuntimeError: <StringIO.StringIO instance at 0x5a929e0>
Fail in Action Step(SparkJobSubmit)
"

Gensim multicore LDA overflow error

I'm having an issue running multicore LDA in gensim (generating 2000 topics, 1 pass, using 15 workers). I get the error below. I initially thought it might not have to do with saving the model; judging by the traceback it happens while data is being serialized between processes, and the code still keeps running (at least the process hasn't quit).
Anyone know what I can do to prevent this error from occurring?
python3 run.py --method MultiLDA --ldaparams 2000 1 --workers 15 --path $DATA/gender_spectrum/
Traceback (most recent call last):
File "/usr/lib64/python3.5/multiprocessing/queues.py", line 241, in _feed
obj = ForkingPickler.dumps(obj)
File "/usr/lib64/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
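One thing that may help, as a sketch under the assumption that the oversized pickle is the data gensim ships between processes (the topic state grows roughly with num_topics times vocabulary size, plus the document batches): shrinking the vocabulary, the topic count, or the chunksize keeps each serialized message under the 4 GiB limit of older pickle protocols. The corpus below is a tiny placeholder just to make the snippet runnable:
from gensim.corpora import Dictionary
from gensim.models import LdaMulticore

texts = [['memory', 'error'], ['gensim', 'lda', 'topics']]  # placeholder documents
dictionary = Dictionary(texts)
dictionary.filter_extremes(keep_n=100000)  # cap vocabulary size; the pickled state scales with num_topics * vocab
corpus = [dictionary.doc2bow(t) for t in texts]

lda = LdaMulticore(corpus=corpus, id2word=dictionary,
                   num_topics=10,   # 2000 in the question
                   passes=1,
                   workers=2,       # 15 in the question
                   chunksize=500)   # smaller batches also shrink each pickled queue message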

apache spark "Py4JError: Answer from Java side is empty"

I get this error every time.
I use Sparkling Water.
My conf file:
spark.driver.memory 65g
spark.python.worker.memory 65g
spark.master local[*]
The amount of data is about 5 GB.
There is no other information about this error.
Does anybody know why it happens? Thank you!
***"ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
raise Py4JError("Answer from Java side is empty")
Py4JError: Answer from Java side is empty
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Have you tried setting spark.executor.memory and spark.driver.memory in your Spark configuration file?
See https://stackoverflow.com/a/22742982/5453184 for more info.
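For reference, in spark-defaults.conf those properties are plain key/value lines; the values below are placeholders only, and as the next answer explains, raising them is not always the right move:
spark.driver.memory 4g
spark.executor.memory 4g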
Usually, you'll see this error when the Java process gets silently killed by the OOM Killer.
The OOM Killer (Out of Memory Killer) is a Linux process that kicks in when the system becomes critically low on memory. It selects a process based on its "badness" score and kills it to reclaim memory.
Read more on OOM Killer here.
Increasing spark.executor.memory and/or spark.driver.memory values will only make things worse in this case, i.e. you may want to do the opposite!
Other options would be to:
increase the number of partitions if you're working with very big data sources;
increase the number of worker nodes;
add more physical memory to worker/driver nodes;
Or, if you're running your driver/workers using docker:
increase docker memory limit;
set --oom-kill-disable on your containers (see the example after this list), but make sure you understand possible consequences!
Read more on --oom-kill-disable and other docker memory settings here.
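As an example of those docker options (image name and sizes are placeholders; --oom-kill-disable should only be used together with a memory limit):
docker run -m 8g --memory-swap 8g --oom-kill-disable my-spark-image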
Another point to note if you are on WSL 2 using PySpark: ensure that your WSL 2 config file allows enough memory.
# Settings apply across all Linux distros running on WSL 2
[wsl2]
# Limits VM memory; this can be set as whole numbers using GB or MB
memory=12GB # This was originally set to 3GB, which caused failures since spark.executor.memory and spark.driver.memory could use at most 3GB regardless of how high I set them.
# Sets the VM to use eight virtual processors
processors=8
For reference, your .wslconfig config file should be located in C:\Users\USERNAME.
