Base64 encoding of image links - MySQL fatal errors

Recently I've been working on migrating a very large client database from a shared host to a VPS (the main databases alone are around 5 GB), and while I've managed to migrate most of the tables successfully via the shell (after increasing the packet cap to 2 GB), one table in particular has been giving me problems.
The issue is that about 20k rows in the middle of that table will not import, either in phpMyAdmin or via the shell, because they contain base64-encoded images which keep causing MySQL to throw "max packet reached" errors even though the cap is at 2 GB.
I reached out to my server admin and his support team, but they are also stumped about how to process this data, so any assistance is greatly appreciated.
Also, the table in question is only about 160 MB uncompressed, and the problem segment on its own (split from the others) is only ~60 MB, so I don't think the packet cap is the issue.
Update: I just ran the query to show packet sizes and MySQL reported 1048576 (i.e. 1 MB) under the max_allowed_packet value.
Update 2: When I show warnings in MySQL I keep getting errors about rows not having data in all columns; however, the errors persist even after I replaced the empty strings with NULL, so I am still confused.
ionCube seems to be enabled on the original client server, so I'm not sure if that is also causing any errors.
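For reference, 1048576 bytes is just 1 MB (a common default on older MySQL servers), which a multi-row INSERT full of base64 images can easily exceed, so the 2 GB cap evidently never reached the server. A minimal sketch of checking and raising it, assuming you have privileges to change server settings (256 MB is only an example value):
SHOW VARIABLES LIKE 'max_allowed_packet';   -- what the server is actually using
SET GLOBAL max_allowed_packet = 268435456;  -- 256 MB; applies to new connections
To make the change permanent, set max_allowed_packet = 256M under [mysqld] in my.cnf and restart. The mysql and mysqldump clients also have their own --max_allowed_packet option, so the limit may need raising on the client side as well.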

Related

Limiting Kismet log files to a size or duration

Looking for a solid way to limit the size of Kismet's database files (*.kismet) through the conf files located in /etc/kismet/. The version of Kismet I'm currently using is 2021-08-R1.
The end state would be to cap each database at a given size (10 MB, for example) or close it after X minutes of logging. Then a new database is created, connected, and written to, and the process repeats until Kismet is killed. This way, rather than having one large database, there will be multiple smaller ones.
In the kismet_logging.conf file there are some timeout options, but that's for expunging old entries in the logs. I want to preserve everything that's being captured, but break the logs into segments as the capture process is being performed.
I'd appreciate anyone's input on how to do this either through configuration settings (some that perhaps don't exist natively in the conf files by default?) or through plugins, or anything else. Thanks in advance!
Two interesting ways:
One could let the old entries be expired as configured, but first reach into the log with SQL and extract what you want as a time-bound query (the .kismet files are SQLite databases).
A second way would be to automate the restarting of Kismet, which is a little less elegant but seems to work (a rough sketch follows after the link below).
https://magazine.odroid.com/article/home-assistant-tracking-people-with-wi-fi-using-kismet/
If you read that article carefully, there are lots of bits of interesting information in it.
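A minimal sketch of that restart approach, assuming Kismet is run headless from a shell and closes its current .kismet log cleanly on TERM (the 10-minute interval is arbitrary):
#!/bin/sh
# Run Kismet for 10 minutes at a time; timeout(1) sends TERM, the current log
# is closed, and the next iteration starts a fresh timestamped .kismet database.
while true; do
    timeout 600 kismet
done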

ArangoDB - arangoimp on csv files is very slow on large datasets

I am new to arango. I'm trying to import some of my data from Neo4j into arango.
I am trying to add millions of nodes and edges to store playlist data for various people. I have the CSV files from Neo4j. I ran a script to change the format of the node CSV files to include a _key attribute, and the edge files to include _to and _from attributes.
When I tried this on a very small dataset, things worked perfectly and I could see the graph on the UI and perform queries. Bingo!
Now I am trying to add millions of rows of data (each arangoimp batch imports a CSV with about 100,000 rows). Each batch covers 5 collections (a different CSV file for each).
After about 7-8 batches of such data, the system all of a sudden gets very slow, unresponsive and throws the following errors:
ERROR error message: failed with error: corrupted collection
This just randomly comes up for any batch, though the format of the data is exactly the same as the previous batches
ERROR Could not connect to endpoint 'tcp://127.0.0.1:8529', database: '_system', username: 'root'
FATAL got error from server: HTTP 401 (Unauthorized)
Otherwise it just keeps processing for hours with barely any progress
I'm guessing all of this has to do with the large number of imports. Some post said that maybe I have too many file descriptors, but I'm not sure how to handle it.
Another thing I notice is that the biggest of the 5 collections is the one that mostly gets the errors (although the others do too). Do the file descriptors remain specific to a certain collection, even across different import statements?
Could someone please help point me in the right direction? I'm not sure how to begin debugging the problem.
Thank you in advance
The problem here is that the server must not be overrun in terms of available disk I/O. The situation may also benefit from more available RAM.
The system also has to maintain indices while importing, which becomes more expensive as the number of documents in the collections grows.
With ArangoDB 3.4 we have improved arangoimp to maximize throughput without maxing out the server, which should resolve this situation and remove the need to split the import data into chunks.
As before, the CSV data has to be prepared in the expected format; JSONL is also supported.
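For completeness, a typical arangoimp invocation for one of the node files might look roughly like this (endpoint, credentials, collection and file names are placeholders; exact flags can vary slightly between versions):
arangoimp --server.endpoint tcp://127.0.0.1:8529 \
          --server.username root \
          --collection nodes \
          --create-collection true \
          --type csv \
          --file nodes.csv
Recent versions also expose a --batch-size option for tuning how much data is sent per request, which may be worth experimenting with if the server is I/O-bound.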

Streamsets throws exception (MANUAL_FLUSH buffer) while using Kudu client

I'm a newbie with StreamSets and Kudu and I'm trying several solutions to reach my goal:
I've got a folder containing some Avro files, and these files need to be processed and afterwards sent to a Kudu schema.
Screenshot: https://i.stack.imgur.com/l5Yf9.jpg
When using an Avro file containing a couple of hundred records everything goes right, but when the number of records increases to 16k this error is shown:
Caused by: org.apache.kudu.client.NonRecoverableException: MANUAL_FLUSH is enabled but the buffer is too big
I've searched through all the available configuration both in StreamSets and Kudu, and the only solution I was able to apply consists of editing the Java source code and deleting the single line that switches from the default flush mode to the manual one. This works, but it's not optimal because it requires editing and recompiling that file every time I want to use it on a new machine.
Does anyone know how to avoid this?
Thanks in advance!
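For context, the message comes from the Kudu client's session configuration. A hedged Java sketch of the flush-related settings on a standalone Kudu client (not the StreamSets destination code; the master address, table and column names are made up):
import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.SessionConfiguration;

public class KuduFlushSketch {
    public static void main(String[] args) throws Exception {
        KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build();
        KuduSession session = client.newSession();

        // MANUAL_FLUSH accumulates operations in a client-side buffer until flush() is
        // called; the error in the question means that buffer overflowed. Letting the
        // client flush in the background (or enlarging the buffer) avoids it.
        session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);
        session.setMutationBufferSpace(10000); // max buffered operations

        KuduTable table = client.openTable("my_table");   // hypothetical table
        Insert insert = table.newInsert();
        insert.getRow().addString("id", "example-row");   // hypothetical column
        session.apply(insert);

        session.flush();   // push anything still buffered to the tablet servers
        session.close();
        client.close();
    }
}
In StreamSets itself the flush mode is chosen by the Kudu destination, which is presumably why the only lever the poster found was the source edit described above.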

Gridgain: Write timed out (socket was concurrently closed)

While trying to upload data to Gridgain using GridDataLoader, I'm getting
'Write timed out (socket was concurrently closed).'
I'm trying to load 10 million lines of data from a .csv file on a cluster of 13 nodes (16-core CPUs).
My loader is declared as GridDataLoader<Key, Value>, where Key is a composite key. While using a primitive data type as the key there was no issue, but when I changed it to a composite key this error started appearing.
I guess this is because parsing your CSV and creating the cache entries takes up too much space on the heap. If you don't configure your heap size large enough, you are likely suffering from GC pauses: when a full GC kicks in, everything has to pause, and that is why you get the write timeout. It may help to break that large CSV into smaller files and load them one by one (a rough example follows below).
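A rough way to do that splitting with GNU coreutils, assuming a plain CSV with no header row (the chunk size is arbitrary):
split -l 500000 data.csv chunk_    # writes chunk_aa, chunk_ab, ... with 500k lines each
Each chunk can then be fed to the loader in turn, giving the GC room to keep up between files.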

SAS and Oracle CLOB unwanted conversion

We have some SAS programs and they used to run fine on our old Oracle database.
We've migrated to a new Oracle DB on Amazon RDS (I don't know if it's relevant) and a new SAS Server instance (from 9.1 to 9.3, don't know if relevant).
When running our programs, we constantly face an issue where string columns, when uploaded to Oracle with a PROC SQL or a data step, are seemingly at random converted to CLOB or LOB.
Our strings do not exceed the maximum length authorized by Oracle for the VARCHAR type; they are actually pretty short, but the data still gets uploaded as CLOB. That affects our whole process for reading the data.
We have found this workaround but I'm not a fan:
data oracledb.new_data;
   length REGION $ 50;      /* force an explicit 50-character length */
   format REGION $char50.;  /* with a matching character format */
   set old_data;
run;
The fact is, many, many string columns get randomly converted to CLOB
Do you know how to solve that issue? Does it come from the Oracle side (I doubt it) or from the SAS side (but what has changed?)
Thanks for your help,
I hope I have provided you with enough information
Ok found the error.
While transforming the data, if you don't specify the output lengths, the SAS datasets can automatically inherit very long character lengths and formats.
Here is the fix:
http://support.sas.com/kb/24/804.html
This reduces each character variable's length to the minimum needed without losing data. When uploading to Oracle, the column types are then mapped accordingly.
NB: I just reduced the size of my dataset by 80%. I recommend this macro to everyone.
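An alternative, if you would rather pin the Oracle type per column than shrink the SAS lengths, is the SAS/ACCESS DBTYPE= data set option; a hedged sketch reusing the REGION example from above:
data oracledb.new_data (dbtype=(REGION='VARCHAR2(50)'));  /* ask Oracle for VARCHAR2, not CLOB */
   set old_data;
run;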
Follow up question (please answer in comments):
This changes only the format, not the informat. Is that an issue, can it affect performance?
Thanks!
