Arangoimp exception - arangodb

I have a CSV file with the following structure:
h1_h2,hashtag1,hashtag2,coccurrence
39108234088393,9230981401776738405,11889764071793228909,2
48887306406636,2844752706633868157,14936885980370043276,2
...
There are 1028112 lines in the file.
I tried to import it into a collection via arangoimp:
arangoimp --file E:\current_crawler\Data\edges\edges_for_graph_ep_83.csv --collection edges_temp4 --create-collection true
--create-collection-type edge --type csv --translate "hashtag1=_from" --translate "hashtag2=_to" --from-collection-prefix hashtags
--to-collection-prefix hashtags --translate "h1_h2=_key" --server.database "newDB" --server.username username --server.password password
and got the error:
Connected to ArangoDB 'http+tcp://127.0.0.1:8529', version 3.3.4, database: 'newDB', username: 'svm'
----------------------------------------
database: newDB
collection: edges_temp4
from collection prefix: hashtags
to collection prefix: hashtags
create: yes
source filename: E:\current_crawler\Data\edges\edges_for_graph_ep_83.csv
file type: csv
quote: "
separator:
threads: 2
connect timeout: 5
request timeout: 1200
----------------------------------------
Starting CSV import...
2018-08-22T16:49:21Z [3012] INFO processed 1998848 bytes (3%) of input file
2018-08-22T16:49:21Z [3012] INFO processed 3964928 bytes (6%) of input file
2018-08-22T16:49:21Z [3012] INFO processed 5963776 bytes (9%) of input file
2018-08-22T16:49:22Z [3012] INFO processed 7929856 bytes (12%) of input file
2018-08-22T16:49:22Z [3012] INFO processed 9928704 bytes (15%) of input file
2018-08-22T16:49:22Z [3012] INFO processed 11894784 bytes (18%) of input file
2018-08-22T16:49:22Z [3012] INFO processed 13893632 bytes (21%) of input file
2018-08-22T17:09:23Z [3012] ERROR Caught exception Expecting item during import
What does that error mean? The file is OK: there are no empty lines and no duplicated _keys in it. Moreover, when I rebooted the system and tried again, there was no such error and the import succeeded.
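For reference, here is the kind of quick check that verifies this, a minimal Python sketch assuming the four-column layout above:

import csv

keys = set()
with open(r"E:\current_crawler\Data\edges\edges_for_graph_ep_83.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # skip the h1_h2,hashtag1,hashtag2,coccurrence header
    for line_no, row in enumerate(reader, start=2):
        if not row:
            print(f"empty line at {line_no}")
        elif row[0] in keys:
            print(f"duplicate _key {row[0]} at line {line_no}")
        else:
            keys.add(row[0])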
I'd appreciate all the help I can get.
Environment:
Storage Engine: RocksDB
Deployment Mode: Single Server
Configuration: Intel Xeon X5650 x2, 32GB RAM
Operating System: Windows 10

Related

extract values of specific area from command output

I need some advice on how to extract values from the output of a command execution; a snippet of the output is below. The command generates a lot of info, but I just need to extract the values of machine, state and address, as in the snippet below.
I would like to have the output that have list of machine, state and address
machine state address
0 started 1.9.10.34
0/kvm/0 started 1.9.10.21
xxxxx xxxxxxx xxxxxxx
This is the code I used.
for line in stdout:
    line = line.strip()
    if not line:
        continue
    # if line.startswith("0"):
    machine_id, state, address = line.split()[0:3]
    print(f"Machine ID: {machine_id}, State: {state}, Address: {address}")
    f.write(f"Machine ID: {machine_id}, State: {state}, Address: {address}\n")
Please advise how I can extract only the info related to machine, state and address. Thank you.
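One possible approach (a sketch, reusing the stdout and f objects from the snippet above, and assuming machine rows are the only lines whose first token starts with a digit, as in the sample) is to filter each line before unpacking it:

import re

machine_row = re.compile(r"\d")  # machine ids like "0" or "0/kvm/0" start with a digit

for line in stdout:
    line = line.strip()
    # skip blank lines, headers, and any unrelated sections of the output
    if not line or not machine_row.match(line):
        continue
    machine_id, state, address = line.split()[0:3]
    f.write(f"Machine ID: {machine_id}, State: {state}, Address: {address}\n")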

Is there a string size limit when feeding .fromstring() method as input?

I'm working on multiple well-formed XML files, whose sizes range from 100 MB to 4 GB. My goal is to read them as strings and then import them as ElementTree objects using the .fromstring() method (from the xml.etree.ElementTree module).
However, as the process went through the files and the string sizes increased, two exceptions occurred, both related to memory restrictions:
xml.etree.ElementTree.ParseError: out of memory: line 1, column 0
OverflowError: size does not fit in an int
It looks like the .fromstring() method enforces a string size limit on the input, around 1 GB...?
To debug this, I wrote a short script using a for loop:
import xml.etree.ElementTree as cElementTree  # import added; the original alias is kept

xmlFiles_list = [path1, path2, ...]
for fp in xmlFiles_list:
    with open(fp, mode='r', encoding="utf-8") as xml_fo:
        xml_asStr = xml_fo.read()
    print(len(xml_asStr.encode("utf-8")) / 10**9)  # display string size in GB
    try:
        etree = cElementTree.fromstring(xml_asStr)
        print(".fromstring() success!\n")
    except Exception as e:
        print(f"Error :{type(e)} {str(e)}\n")
        continue
The output is as follows:
0.895206753
.fromstring() success!
1.220224531
Error :<class 'xml.etree.ElementTree.ParseError'> out of memory: line 1, column 0
1.328233473
Error :<class 'xml.etree.ElementTree.ParseError'> out of memory: line 1, column 0
2.567867904
Error :<class 'OverflowError'> size does not fit in an int
4.080672538
Error :<class 'OverflowError'> size does not fit in an int
I found multiple workarounds to avoid this issue: the .parse() method, or the lxml module for better performance. I just hope someone could shed some light on this:
Is there a specific string size limit in the xml.etree.ElementTree module and its .fromstring() method?
Why do I end up with two different exceptions as the string size increases? Are they related to the same memory-allocation restriction?
Python version/system: 3.9 (64-bit)
RAM: 32 GB
I hope my question is clear enough; I'm new on Stack Overflow.
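For reference, a minimal sketch of the .parse() workaround I mentioned: ET.parse() feeds the parser from the file object in smaller internal reads, so no multi-GB string is handed over in a single call (the resulting tree must still fit in RAM, though):

import xml.etree.ElementTree as ET

xmlFiles_list = [path1, path2, ...]
for fp in xmlFiles_list:
    # parse straight from the file instead of building a giant string first
    tree = ET.parse(fp)
    print(fp, tree.getroot().tag)

If even the fully-built tree is too large, ET.iterparse() lets you process and discard elements as they complete.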

PXSSH Connection fails sometimes randomly after upgrading to Python3

I am trying to create a ssh session using pexpect.pxssh as follows:
from pexpect import pxssh
connection = pxssh.pxssh()
connection.login('localhost', username, password, port=port, check_local_ip=False)
"""
Fails with the following error
pexpect.pxssh.ExceptionPxssh: Could not establish connection to host
"""
Also, I create two sessions one after the other: the first session connects without a problem, but the second fails to connect using the same code. Sometimes the code works properly and is able to connect both times. I have added retries just to be sure that it's not a random event.
Another thing to note is that this code runs without a problem on Python 2, but on Python 3 this happens. I couldn't find any difference in the connection mechanism between Python 2 and Python 3. Any help will be appreciated!
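The retry logic is essentially the following (a simplified sketch with placeholder arguments, not the exact code; the sync_multiplier=5 matches the login call in the traceback below):

import time
import pexpect
from pexpect import pxssh

def connect_with_retries(host, username, password, port, attempts=3):
    last_error = None
    for attempt in range(attempts):
        # fresh pxssh object per attempt; a session that already hit EOF cannot be reused
        session = pxssh.pxssh()
        try:
            session.login(host, username, password, port=port,
                          sync_multiplier=5, check_local_ip=False)
            return session
        except (pxssh.ExceptionPxssh, pexpect.EOF) as err:
            last_error = err
            session.close(force=True)
            time.sleep(1)  # give tcprelay time to tear down the previous relay thread
    raise last_error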
EDIT: After adding logging as per comment:
2021-06-25 10:49:37 INFO - Attempting to connect to device on port 10022.
Connecting to USB device...
Jun 25 10:49:37 tcprelay[203] : Created thread to connect [::1]:10022->[::1]:58316<12> to unix:0<15>
user@localhost's password: xxxxx
Jun 25 10:49:37 tcprelay[203] : Exiting thread to connect [::1]:10022->[::1]:58316 to unix:0
Connecting to USB device...
Jun 25 10:49:38 tcprelay[203] : Created thread to connect [::1]:10022->[::1]:58317<12> to unix:0<15>
user@localhost's password: xxxxx
Jun 25 10:49:39 tcprelay[203] : Exiting thread to connect [::1]:10022->[::1]:58316 to unix:0
The code retries 2 times and then fails.
Note: I am adding a port offset of 10000 using tcprelay
EDIT:
Sorry I was not logging the error properly.
2021-06-25 15:45:34 - ERROR - Failed to connect. Retrying...
2021-06-25 15:45:34 - ERROR - End Of File (EOF). Empty string style platform.
<pexpect.pxssh.pxssh object at 0x127feb0a0>
command: /usr/bin/ssh
args: ['/usr/bin/ssh', '-q', '-oNoHostAuthenticationForLocalhost=yes', '-p', 'xxxxx', '-l', 'xxxxx', 'localhost']
buffer (last 100 chars): b''
before (last 100 chars): b' \r\n'
after: <class 'pexpect.exceptions.EOF'>
match: None
match_index: None
exitstatus: None
flag_eof: True
pid: 30020
child_fd: 26
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: <_io.BufferedWriter name='<stdout>'>
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
0: re.compile(b'(?i)are you sure you want to continue connecting')
1: re.compile(b'[#$]')
2: re.compile(b'(?i)(?:password:)|(?:passphrase for key)')
3: re.compile(b'(?i)permission denied')
4: re.compile(b'(?i)terminal type')
5: TIMEOUT
Traceback (most recent call last):
File "/src/helpers/utilities.py", line 590, in try_connect_ssh
connection.make_connection(ipaddress=ipaddress, user=user,
File "/src/transport/myssh.py", line 26, in make_connection
self.ssh_process.login(ipaddress, user, password, port=port, sync_multiplier=5, check_local_ip=False)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/pxssh.py", line 418, in login
i = self.expect(session_regex_array)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 343, in expect
return self.expect_list(compiled_pattern_list,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 372, in expect_list
return exp.expect_loop(timeout)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/expect.py", line 179, in expect_loop
return self.eof(e)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pexpect/expect.py", line 122, in eof
raise exc
pexpect.exceptions.EOF: End Of File (EOF). Empty string style platform.
<pexpect.pxssh.pxssh object at 0x127feb0a0>
EDIT:
Using macOS Mojave
pexpect 3.8.0
The device asks for the password, but after the password is sent the connection returns EOF.

TensorFlow lite and keras model integration

I have been trying to put my Keras model's .tflite file into Google's TFLite camera demo, but I am getting an allocation error: "Cannot convert between a TensorFlowLite buffer with 12288 bytes and a ByteBuffer with 1072812 bytes."
I assume it is because of a wrong ByteBuffer allocation.
ByteBuffer.allocate(
    DIM_BATCH_SIZE
        * getImageSizeX()
        * getImageSizeY()
        * DIM_PIXEL_SIZE
        * getNumBytesPerChannel());
Could anyone shed some light on this? I am a newbie to TensorFlow.
Following is the log:
08-10 11:56:28.905 28066-28066/android.example.com.tflitecamerademo E/MultiWindowProxy: getServiceInstance failed!
08-10 11:56:35.675 28066-28092/android.example.com.tflitecamerademo E/AndroidRuntime: FATAL EXCEPTION: CameraBackground
Process: android.example.com.tflitecamerademo, PID: 28066
java.lang.IllegalArgumentException: Cannot convert between a TensorFlowLite buffer with 12288 bytes and a ByteBuffer with 1072812 bytes.
at org.tensorflow.lite.Tensor.throwExceptionIfTypeIsIncompatible(Tensor.java:175)
at org.tensorflow.lite.Tensor.setTo(Tensor.java:65)
at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:126)
at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:168)
at org.tensorflow.lite.Interpreter.run(Interpreter.java:145)
at com.example.android.tflitecamerademo.ImageClassifierFloatInception.runInference(ImageClassifierFloatInception.java:103)
at com.example.android.tflitecamerademo.ImageClassifier.classifyFrame(ImageClassifier.java:136)
at com.example.android.tflitecamerademo.Camera2BasicFragment.classifyFrame(Camera2BasicFragment.java:702)
at com.example.android.tflitecamerademo.Camera2BasicFragment.-wrap0(Camera2BasicFragment.java)
at com.example.android.tflitecamerademo.Camera2BasicFragment$4.run(Camera2BasicFragment.java:597)
at android.os.Handler.handleCallback(Handler.java:822)
at android.os.Handler.dispatchMessage(Handler.java:104)
at android.os.Looper.loop(Looper.java:207)
at android.os.HandlerThread.run(HandlerThread.java:61)
You can find the reason for this error in the following link:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/java/src/main/java/org/tensorflow/lite/Tensor.java
As you can see in lines 170 to 181, if the capacity of the buffer is not equal to the number of bytes the tensor requires, this error is thrown.
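To make those numbers concrete (assuming float32 inputs, i.e. 4 bytes per value): 12288 bytes is exactly 1 x 32 x 32 x 3 x 4, while 1072812 bytes is exactly 1 x 299 x 299 x 3 x 4, the 299x299 Inception-sized buffer the demo allocates. A quick sanity check:

def buffer_bytes(batch, height, width, channels, bytes_per_value=4):
    # size of a float32 image input buffer in bytes
    return batch * height * width * channels * bytes_per_value

print(buffer_bytes(1, 32, 32, 3))    # 12288   -> what the .tflite model expects
print(buffer_bytes(1, 299, 299, 3))  # 1072812 -> what the demo's ByteBuffer holds

So it seems the demo's getImageSizeX()/getImageSizeY() need to be changed to match the model's actual input shape (apparently 32x32 here).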

Python 3: Requests response.iter_content: ChunkedEncodingError

I am using a requests stream to perform a 'GET' download of very large remote CSVs, then chunking the response using response.iter_content(). This has been working for multiple data providers.
However, for one remote data provider, when using response.iter_content(), I occasionally get a ChunkedEncodingError, specifically:
ChunkedEncodingError: (
ProtocolError(
'Connection broken: IncompleteRead(921 bytes read, 103 more expected)',
IncompleteRead(921 bytes read, 103 more expected)),
)
Here is the Python 3 code. I would like to know of an alternative that resolves this chunking exception:
tmp_csv_chunk_sum = 0
with open(
    file=tmp_csv_file_path,
    mode='wb'  # binary mode takes no encoding argument, so encoding_write is dropped
) as csv_file_wb:
    try:
        for chunk in response.iter_content(chunk_size=8192):
            if not chunk:
                break
            tmp_csv_chunk_sum += len(chunk)  # the final chunk may be shorter than 8192
            csv_file_wb.write(chunk)
            csv_file_wb.flush()
            os.fsync(csv_file_wb.fileno())
    except Exception as ex:
        self.logger.error(
            "Request CSV Download: Exception",
            extra={
                'error_details': str(ex),
                'chunk_total_sum': tmp_csv_chunk_sum
            }
        )
        raise
I truly appreciate any assistance. Thank you.
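One alternative is to catch the error and resume with an HTTP Range request. A sketch (assuming the provider's server supports Range requests, which is worth verifying; download_with_resume is a hypothetical helper, not part of the code above):

import os
import requests
from requests.exceptions import ChunkedEncodingError

def download_with_resume(url, path, chunk_size=8192, max_retries=5):
    for _ in range(max_retries):
        offset = os.path.getsize(path) if os.path.exists(path) else 0
        headers = {'Range': f'bytes={offset}-'} if offset else {}
        try:
            with requests.get(url, stream=True, headers=headers, timeout=60) as response:
                response.raise_for_status()
                # 206 means the server honored the Range header; on a plain 200 it
                # re-sent the whole file, so truncate and start over
                mode = 'ab' if response.status_code == 206 else 'wb'
                with open(path, mode) as fh:
                    for chunk in response.iter_content(chunk_size=chunk_size):
                        fh.write(chunk)
            return  # download completed without a mid-stream disconnect
        except ChunkedEncodingError:
            continue  # connection broke mid-body; retry and resume from the new offset
    raise RuntimeError(f'download failed after {max_retries} retries: {url}')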
