When I dump my database, 75% completes fine, but then it fails: document is corrupted

When I dump my database, 75% completes without errors, but then it fails with "the document is corrupted". What should I do?

Related

MS Azure Data Factory ADF Copy Activity from BLOB to Azure Postgres Gen5 8 cores fails with connection closed by host error

I am using an ADF copy activity to copy files from Azure Blob storage to Azure Postgres. It is a recursive copy, i.e. there are multiple files within the folder; that part is fine. The 5 files I have to copy total around 6 GB, and the activity fails after 30-60 minutes of running. I have used write batch sizes from 100 to 500, but it still fails.
I used 4, 8, or auto DIUs, and similarly tried 1, 2, 4, 8, or auto parallel connections to Postgres; normally it seems to use 1 per source file. The Azure Postgres server has 8 cores and the temp buffer size is 8192 kB (the maximum allowed is around 16000 kB, which I also tried). There are two errors I keep getting constantly. The MS support team suggested using the retry option; I am still awaiting a response from their PG team, but below are the errors.
{
'errorCode': '2200',
'message': ''Type=Npgsql.NpgsqlException,Message=Exception while reading from stream,Source=Npgsql,''Type=System.IO.IOException,Message=Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.,Source=System,''Type=System.Net.Sockets.SocketException,Message=An existing connection was forcibly closed by the remote host,Source=System,'',
'failureType': 'UserError',
'target': 'csv to pg staging data migration',
'details': []
}
or
Operation on target csv to pg staging data migration failed: 'Type=Npgsql.NpgsqlException,Message=Exception while flushing stream,Source=Npgsql,''Type=System.IO.IOException,Message=Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.,Source=System,''Type=System.Net.Sockets.SocketException,Message=An existing connection was forcibly closed by the remote host,Source=System
I was also facing this issue recently and contacted our Microsoft rep, who got back to me with the following update on 2020-01-16:
“This is another issue we found in the driver, we just finished our
deployment yesterday to fix this issue by upgrading driver version.
Now customer can have up to 32767 columns data in one batch size(which
is the limitation in PostgreSQL, we can’t exceed that).
Please let customer make sure that (Write batch size* column size)<
32767 as I mentioned, otherwise they will face the limitation. “
"Column size" refers to the count of columns in the table. The "area" (row write batch size * column count) cannot be greater than 32,767.
I was able to change my ADF write batch size on copy activity to a dynamic formula to ensure optimum batch sizes per table with the following:
#div(32766,length(pipeline().parameters.config)
pipeline().parameters.config refers to an array containing information about columns for the table. the length of the array = number of columns for table.
hope this helps! I was able to populate the database (albeit slowly) via ADF... would much prefer a COPY based method for better performance.
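To make the arithmetic concrete, here is the same rule in plain Python (a sketch that is not part of the original answer; the 44-column table is just a hypothetical example):
# The answer above quotes a limit of 32767 for (write batch size * column count)
# per batch, so pick the largest batch size that stays under it.
MAX_PARAMS = 32767

def safe_write_batch_size(column_count):
    # mirrors the ADF expression @div(32766, length(pipeline().parameters.config))
    return (MAX_PARAMS - 1) // column_count

print(safe_write_batch_size(44))  # hypothetical 44-column table -> 744 rows per batch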

Couchdb views crashing for large documents

CouchDB keeps crashing whenever I try to build the index of the views of a design document that emits values for large documents. The total size of the database is 40 MB, and I guess the documents are about 5 MB each. We're talking about large JSON documents without any attachments.
What concerns me is that I have 2.5 GB of free RAM before trying to access the views, but as soon as I try to access them, the CPU usage rises to 99% and all the free RAM gets eaten by erl.exe before the indexing fails with exit code 1.
Here is the log:
[info] 2016-11-22T22:07:52.263000Z couchdb#localhost <0.212.0> -------- couch_proc_manager <0.15603.334> died normal
[error] 2016-11-22T22:07:52.264000Z couchdb#localhost <0.15409.334> b9855eea74 rexi_server throw:{os_process_error,{exit_status,1}} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
Views that skip these documents can be accessed without issue. What general guidelines could you give me to help with this kind of situation? I am using CouchDB 2.0 on Windows.
Many thanks
Update: I tried limiting the number of view server instances to 1 and varying the max RAM allowed for couchjs, but it keeps crashing. I also noticed that even though CouchDB is supposed to pass only one document at a time to the view server, erl.exe keeps eating all the available RAM (3 GB used to update three 5 MB docs...). Initially I thought this could be because of the multiple couchjs instances, but apparently that isn't the case.
Update: Made some progress; now it looks like the indexing progresses well for just under 10 minutes, then erl.exe crashes. I have posted the dump here (just to clarify, "well" means 99% CPU usage and a completely frozen screen).
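Not an answer, but one way to watch the indexer while reproducing the crash is to poll the design document's _info endpoint; a minimal Python sketch, assuming a local CouchDB on the default port, with placeholder database and design document names:
import requests

# Placeholders: adjust the URL, database and design document names to your setup,
# and add credentials if your instance requires them.
COUCH = "http://localhost:5984"
DB = "mydb"
DDOC = "mydesign"

# /{db}/_design/{ddoc}/_info reports the view index state (e.g. updater_running,
# update_seq and size information), which helps correlate the erl.exe memory
# spike with how far indexing got before the crash.
info = requests.get("{}/{}/_design/{}/_info".format(COUCH, DB, DDOC)).json()
print(info["view_index"])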

MemSQL code generation has failed: Failed to codegen

I have a workstation with 250 GB of RAM and a 4 TB SSD. MemSQL has a table that contains 1 billion records, each with 44 columns, about 500 GB of data in total. When I run the following query on that table
SELECT count(*) ct,name,age FROM research.all_data group by name having count(*) >100 order by ct desc
I got the following error:
MemSQL code generation has failed
I restarted the server, and after that I got another error:
Not enough memory available to complete the current request. The request was not processed
I set the server's maximum memory to 220 GB and max_table_memory to 190 GB.
Why could that error happen?
Why is MemSQL consuming 140 GB of memory when I am using a columnstore?
For "MemSQL code generation has failed", check the tracelog (http://docs.memsql.com/docs/trace-log) on the MemSQL node where the error was hit for more details - this can mean a lot of different things.
MemSQL needs memory to process query results, hold some metadata, etc. even though columnstore data lives on disk. Check memsql status info to see what is using memory - https://knowledgebase.memsql.com/hc/en-us/articles/208759276-What-is-using-memory-on-my-leaves-.
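To illustrate the second suggestion, the status counters can be pulled over the MySQL protocol; a sketch in Python using the pymysql package, with placeholder connection details (the exact counter names vary between MemSQL versions):
import pymysql

# Placeholder connection details; MemSQL speaks the MySQL wire protocol.
conn = pymysql.connect(host="127.0.0.1", port=3306, user="root", password="")

with conn.cursor() as cur:
    # SHOW STATUS EXTENDED lists per-node counters, including memory allocators,
    # which show what is actually holding memory on the node.
    cur.execute("SHOW STATUS EXTENDED")
    for name, value in cur.fetchall():
        if "memory" in name.lower() or name.lower().startswith("alloc"):
            print(name, value)

conn.close()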

Recurring SQLite Error - Unable to open database file

I have a SQLite database that I am cleaning and reshaping for analysis. It is approximately 120GB in size.
We bought a much better machine, more cores, 20x the RAM, etc. It also has a set of four 1TB drives (I have the root on one physical drive, and I have /home on a RAID0 array of the other three physical drives).
I was working on a machine with far fewer resources to start with, and my script ran perfectly (albeit slowly). I never had these errors and corruptions. But when I ran the exact same script on the new machine, I started getting this error and am unable to open the database. I have backups, but this is happening so frequently that I haven't made any progress on the analysis since moving to the new machine.
Error in sqliteNewConnection(drv, ...) :
RS-DBI driver: (could not connect to dbname:
unable to open database file
)
Calls: insertLinks ... .valueClassTest -> is -> is -> sqliteNewConnection -> .Call
In addition: Warning message:
In mclapply(idx, function(idx1) { :
all scheduled cores encountered errors in user code
Execution halted
Every insert I perform (1,000 - 10,000 records at a time) is encapsulated in a transaction. Here is the code that causes the specific error above.
insertLinks <- function(b) {
con <- dbConnect(SQLite(), dbpro)
sql <- 'insert into link (emailid, md5, field, email, domain, company) values (?, ?, ?, ?, ?, ?)'
dbBeginTransaction(con)
dbGetPreparedQuery(con, sql, bind.data = b)
dbCommit(con)
dbDisconnect(con)
}
This is the error I get when I run .log stdout and try to open the database from inside the SQLite prompt:
(14) cannot open file at line 29016 of [27392118af]
(14) os_unix.c:29016: (2) open((unreachable)/databaseFile.db)

Uploading a huge file from ec2 to s3 fails

I'm trying to upload a 160 GB file from EC2 to S3 using
s3cmd put --continue-put FILE s3://bucket/FILE
but every time the upload is interrupted with the message:
FILE -> s3://bucket/FILE [part 10001 of 10538, 15MB] 8192 of 15728640 0% in 1s 6.01 kB/s failed
ERROR: Upload of 'FILE' part 10001 failed. Aborting multipart upload.
ERROR: Upload of 'FILE' failed too many times. Skipping that file.
The target bucket does exist.
What is the reason for this issue?
Are there any other ways to upload the file?
Thanks.
You can have up to 10000 upload parts per object, so it fails on part 10001. Using larger parts may solve the issue.
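For a sense of scale, here is the part count at the 15 MB chunk size visible in the log versus a larger chunk (a sketch; 64 MB is an arbitrary example, and in s3cmd the chunk size is controlled by --multipart-chunk-size-mb):
import math

FILE_SIZE_MB = 160 * 1024  # the 160 GB file from the question

def parts(chunk_mb):
    # number of multipart chunks needed at a given chunk size
    return math.ceil(FILE_SIZE_MB / chunk_mb)

print(parts(15))  # more than 10000 parts, which is why the upload dies at part 10001
print(parts(64))  # about 2560 parts, comfortably within the limit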
"huge"---is it 10s or 100s of GBs? s3 limits the object size to 5GB and uploading may fail if it exceeds the size limitation.

Resources