"server has gone away" during ODBC SQL INSERT on Azure MariaDB server - azure

I have code that’s been running on several thousand customer PCs for about 6 years that uses ODBC to SQL INSERT a blob onto MariaDB databases. All well and good.
Since moving the MariaDB servers to Azure (managed "Azure Database for MariaDB" server) a few months ago, I sometimes see the SQL INSERT fail with the ODBC error "Server has gone away" or "Lost connection", depending on the MySQL ODBC connector version (5.3 and 8.0 respectively).
Customers see this more rarely, but I see it here often, because I tend to upload larger blobs and, possibly, because I'm on the other side of the Atlantic and hitting some network-level timeout. But I need to get to the bottom of it, as it does occasionally happen to customers - about once a month, and with blobs as small as 1 MB.
As per other posts hereabouts, I have increased the following parameters, but it still fails (see the verification sketch after the list):
explicit_defaults_for_timestamp TRUE
connect_timeout 240
net_write_timeout 2400
wait_timeout 240
max_allowed_packet 1024M
interactive_timeout 1800
net_read_timeout 3000
delayed_insert_timeout 3000
MariaDB v10.3.23
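Since the portal settings don't always match what a session actually gets, a quick sanity check is to read the variables back over the same ODBC connection and retry the INSERT once on a dropped connection. A minimal pyodbc sketch - the DSN name "mariadb_azure" and the "blobs" table are hypothetical:

    # Minimal sketch: confirm the session's effective limits, then INSERT the
    # blob with reconnect-and-retry on "server has gone away" / "lost
    # connection". DSN "mariadb_azure" and table "blobs" are hypothetical.
    import time
    import pyodbc

    RETRYABLE = ("server has gone away", "lost connection")

    def insert_blob(payload: bytes, attempts: int = 3) -> None:
        for attempt in range(attempts):
            try:
                cn = pyodbc.connect("DSN=mariadb_azure", timeout=240)
                cur = cn.cursor()
                # What the server actually granted this session.
                for row in cur.execute(
                        "SHOW VARIABLES WHERE Variable_name IN "
                        "('max_allowed_packet','wait_timeout','net_write_timeout')"):
                    print(row.Variable_name, row.Value)
                cur.execute("INSERT INTO blobs (data) VALUES (?)", (payload,))
                cn.commit()
                cn.close()
                return
            except pyodbc.Error as exc:
                if attempt + 1 < attempts and any(
                        s in str(exc).lower() for s in RETRYABLE):
                    time.sleep(2 ** attempt)  # back off, then reconnect
                    continue
                raise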
One reason I think this is an Azure/network issue rather than an ODBC/MariaDB one is that, since the switch to Azure, I am experiencing a similar issue copying large files from the clipboard to any of our VMs: copying a 35 MB file fails 4 out of every 5 attempts, at varying points in the copy, and the RDP connection errors and reconnects. I have other ways of uploading the files in question, so that's not the problem... but for the time being I'm assuming it's the same underlying Azure/network issue as the ODBC error.
I’ve checked with my ISP and they can see no network issues my end.

Related

Azure - never-ending Full Backup Uploading in Database Migration Service

I have been migrating some databases from a SQL Server to a SQL Managed Instance. 13 of 14 DBs have been restored successfully. Only one remains, the biggest one at almost 600 GB. It has been uploading the initial full backup continuously for more than a week and is still running.
It is a big database, but I thought it should have finished by now. For this reason I have been trying some cmd/az commands, but I don't get anything more than a running status.
The strange thing is that I can't see the DB (in recovery mode) in SQL Server Management Studio, and the file has not been created yet in the container of the Storage Account. All the other databases appear in SSMS and in the storage account.
I had around 75 GB more than the total size of the databases in the Storage Account, so I guess that was not the issue. In any case, I added 500 GB more, but still no results.
Is it possible to stop the task and restart it to see if this helps? Obviously I would prefer not to upload all the databases again.
Could you please help?
Thank you!
As explained in the comments above, the best options for migrating old SQL Servers in my case were:
Check the CPU and network performance of the server regularly.
When you configure your SQL MI, use at least double the storage size of the full DB backups.
Finally, if you have big DBs (in my case more than 400 GB), create separate activities* for the small ones and the big ones. This also helps if anything goes wrong with the big DBs: you won't need to upload all of them again.
*NOTE: I had some issues when I had more than 2 activities: some of them stayed in "Queued" status and after a day still had not run, even when the other activities were already completed. To fix this, I had to delete all the activities and create the remaining one again.
Have a good day.
I would recommend opening a case with Support to make sure there is no patching or failover happening on the SQL MI during the migration.
I have seen this happen before, where the restore of a VLDB is going through and then patching on the SQL MI causes it to restart the restore from the beginning.
Hopefully this helps.

Azure App Service "Local Written Bytes"

I have an app service running that has 8 instances running in the service plan.
The app is written in asp dotnet core, it's an older version than is currently available.
Occasionally I have an issue where the servers start returning a high number of 5xx errors after a period of sustained load.
It appears that only one instance is having an issue - which is causing the failed request rate to climb.
I've noticed that there is a corresponding increase in the "locally written bytes" on the instance that is having problems - I am not writing any data locally, so I am confused as to what this metric is actually measuring. In addition, the number of open connections goes high and then stays high, and rebooting the problematic instance doesn't seem to achieve anything.
The only thing I suspect is that we are copying data from a user's request straight into Azure Blob Storage using UploadFromStreamAsync on the HttpRequest.Body, with the data coming from a mobile phone app.
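For what it's worth, here is the shape of that pattern, sketched in Python with the azure-storage-blob SDK rather than the .NET UploadFromStreamAsync call the app actually uses (container and blob names are hypothetical); the point is that the request body is streamed straight through, so in principle nothing should need to be written to the instance's local disk:

    # Rough Python sketch of the pattern described above; the real app uses
    # .NET's UploadFromStreamAsync. Container/blob names are hypothetical.
    from azure.storage.blob import BlobClient

    def save_upload(request_body_stream, conn_str: str, blob_name: str) -> None:
        blob = BlobClient.from_connection_string(
            conn_str, container_name="uploads", blob_name=blob_name)
        # upload_blob reads the stream in chunks and pushes them to Blob
        # Storage without an intermediate local file.
        blob.upload_blob(request_body_stream, overwrite=True)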
Microsoft support suggested we switch to using local cache as an option to reduce issues with storage; however, this has not resolved the issue.
Can anyone tell me what "locally written bytes" is actually measuring? There is little documentation on this metric that I can find on Google.

SQL Azure Premium tier is unavailable for more than a minute at a time and we're around 10-20% utilization, if that

We run a web service that gets 6k+ requests per minute during peak hours and about 3k requests per minute during off hours. Lots of data feeds compiled from 3rd party web services and custom generated images. Our service and code is mature, we've been running this for years. A lot of work by good developers has gone into our service's code base.
We're migrating to Azure, and we're seeing some serious problems. For one, we are seeing our Premium P1 SQL Azure database routinely become unavailable for 1-2 full minutes. I'm sorry, but this seems absurd. How are we supposed to run a web service when requests wait 2 minutes for access to our database? This is occurring several times a day. It occurs less often after switching from the Standard tier to Premium, but we're nowhere near our DB's DTU capacity and we're getting throttled hard far too often.
Our SQL Azure DB is Premium P1, and our load according to the new Azure portal is usually under 20%, with a couple of spikes each hour reaching 50-75%. Of course, we can't even trust Azure's portal metrics: the old portal gives us no data for our SQL, and the new portal is very obviously wrong at times - our DB was not down for half an hour, as its graph suggests, but it was down for more than 2 full minutes.
Azure reports the size of our DB at a little over 12 GB (in our own SQL Server installation, the DB is under 1 GB - that's another of many questions: why is it reported as 12 GB on Azure?). We've done plenty of tuning over the years and have good indices.
Our service runs on two D4 cloud service instances. Our DB libraries all implement retry logic, waiting 2, 4, 8, 16, 32, and then 48 seconds before failing completely. Controllers are all async, and most of our various external service calls are async. DB access is still largely synchronous, but our heaviest queries are async. We make heavy use of in-memory and Redis caching. The most frequent use of our DB is the 1-3 records inserted for each request (those tables are queried only once every 10 minutes to check error levels).
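For clarity, that retry schedule amounts to something like the following sketch; execute_query is a hypothetical stand-in for the real data-access call, and the real code would catch the provider's transient exceptions rather than a bare ConnectionError:

    # Sketch of the retry schedule described above: wait 2, 4, 8, 16, 32 and
    # then 48 seconds between attempts before failing completely.
    import time

    BACKOFF_SECONDS = [2, 4, 8, 16, 32, 48]

    def with_retries(execute_query, *args, **kwargs):
        for delay in BACKOFF_SECONDS + [None]:
            try:
                return execute_query(*args, **kwargs)
            except ConnectionError:  # stand-in for the driver's transient errors
                if delay is None:
                    raise  # all retries exhausted
                time.sleep(delay)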
Aside from batching up those request-logging inserts, there's really not much more give in our application's DB access code. We're nowhere near our DTU allocation on this database, and the server our DB is on still has something like 2000 DTUs available to be allocated. If we have to live with 1+ minute periods of unavailability every day, we're going to abandon Azure.
Is this the best we get?
Querying stats in the database seems to show we are nowhere near our resource limits. Also, on the Premium tier we are supposed to be guaranteed our DTU level second by second. But, again, we go more than an entire minute without being able to get a database connection. What is going on?
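For anyone checking the same thing: Azure SQL keeps roughly an hour of 15-second resource snapshots in the sys.dm_db_resource_stats DMV, which is easy to pull (a pyodbc sketch; the DSN is a placeholder):

    # Sketch: read the recent 15-second resource snapshots from Azure SQL's
    # sys.dm_db_resource_stats. The DSN name is a placeholder.
    import pyodbc

    cn = pyodbc.connect("DSN=azure_sql")
    for row in cn.execute("""
            SELECT end_time, avg_cpu_percent, avg_data_io_percent,
                   avg_log_write_percent
            FROM sys.dm_db_resource_stats
            ORDER BY end_time DESC"""):
        print(row.end_time, row.avg_cpu_percent,
              row.avg_data_io_percent, row.avg_log_write_percent)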
I can also say that after we experience one of these longer delays, our stats seem to reset: a snapshot taken a couple of minutes before a 1-minute-plus delay looks completely different from one taken a couple of minutes after.
We have been in contact with Azure's technical staff and they confirm this is a bug in their platform that is causing our database to go through failover multiple times a day. They stated they will be deploying fixes starting this week and continuing over the next month.
Frankly, we're having trouble understanding how anyone can reliably run a web service on Azure. Our pool of Websites randomly goes down for a few minutes a few times a month, taking our public sites down. If our cloud service returns too many 500 responses, something in front of it cuts off all traffic and returns 502s (totally undocumented behavior as far as we can tell). SQL Azure has very limited performance and obviously isn't ready for prime time.

Performance of Web Database against new standard service tier

I have a SQL Azure database. Currently I'm using the "Web" SQL database edition, since my DB was small - about 300 MB, against a maximum size of 5 GB. Having learned that the Web service tier will be retired in September 2015, I restored my live DB as a "Standard" S0, which has a maximum size of 2 GB. What I've noticed is that performance with the new Standard-tier database is poor compared to the retired Web edition. For instance, it used to take about 40 seconds to delete 60 thousand records in the Web edition, and it now takes two to three minutes with the new Standard tier. Has anyone experienced this kind of thing, or is it just me?
Please give me your suggestions.
I had a similar issue; I migrated SQL 2008 to Azure Web and got a performance hit, then switched from Web to S0 and got another hit. I think I'm now at S1.
I figured it was probably missing indexes, but with the ability to trace and tune gone in Azure, I had to do things a bit more manually.
First, look at this: http://msdn.microsoft.com/en-us/library/azure/ff394114.aspx - you want to get to the part where you can find the long-running queries.
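A query along the lines that article describes, joining sys.dm_exec_query_stats to the statement text, will surface the worst offenders (a pyodbc sketch; the DSN is a placeholder):

    # Sketch: top queries by average elapsed time, via the DMVs the linked
    # article covers. The DSN name is a placeholder.
    import pyodbc

    cn = pyodbc.connect("DSN=azure_sql")
    for row in cn.execute("""
            SELECT TOP 10
                   qs.total_elapsed_time / qs.execution_count AS avg_elapsed_us,
                   qs.execution_count,
                   SUBSTRING(st.text, 1, 200) AS statement_start
            FROM sys.dm_exec_query_stats AS qs
            CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
            ORDER BY avg_elapsed_us DESC"""):
        print(row.avg_elapsed_us, row.execution_count, row.statement_start)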
Then, for each long-running query, you will want to examine the execution plan. To view a query's execution plan, we need to explicitly include it before executing the query: right-click the query window and select Include Actual Execution Plan.
If this does not help, you need to do more work: export the database (it comes out as a .bacpac file) to SQL 2012 on a local server somewhere (I used an Azure VM) - right-click on the Connection > Databases node and select "Import Data-tier application...". Then hook up an application/website to it, enable the query analyzer, and tune it the old way. This will reveal all the non-clustered indexes that magically disappeared... once you add those to your SQL Azure DB, you will get the performance back.
Sure, you could just increase your Standard tier, but this can get expensive; it's better to tune and find out where things went wrong...
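As a possibly quicker first pass than exporting, the missing-index DMVs can also be queried directly on SQL Azure and often point at the same absent non-clustered indexes (a sketch, same placeholder DSN):

    # Sketch: ask SQL Azure itself which indexes the optimizer wishes existed.
    # The DSN name is a placeholder.
    import pyodbc

    cn = pyodbc.connect("DSN=azure_sql")
    for row in cn.execute("""
            SELECT d.[statement] AS table_name,
                   d.equality_columns, d.inequality_columns,
                   d.included_columns, s.user_seeks, s.avg_user_impact
            FROM sys.dm_db_missing_index_details AS d
            JOIN sys.dm_db_missing_index_groups AS g
                 ON g.index_handle = d.index_handle
            JOIN sys.dm_db_missing_index_group_stats AS s
                 ON s.group_handle = g.index_group_handle
            ORDER BY s.avg_user_impact DESC"""):
        print(row.table_name, row.equality_columns, row.avg_user_impact)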

sybase tempdb log segment filling

I have a Sybase ASE server that hangs every week or so, indicating that the tempdb log segment is full.
I have tried everything. trunc log on chkpt is enabled and works correctly, resetting used_pages every 60 seconds or so.
The problem is that not all of the freed pages are returned to free_pages, so over time free_pages eventually ends up at 0 while used_pages stays minimal. The values I'm referring to come from running sp_spaceused syslogs in tempdb. It's like a memory leak!
Currently when I run this command I get:
total_pages: 64000
free_pages: 29719
used_pages: 251
reserved_pages: 0
Every time I run the command, used_pages increases, which is also odd.
The problem server is running on 64-bit Windows Server 2003. I have another similarly configured ASE server, with similar database contents, that does not have these issues; it runs on 32-bit Windows Server 2003. There should be no need to move tempdb to a different device or expand it any further, because that other server operates perfectly and is configured the same as the one with the odd behavior.
It depends on the application that is running on this ASE.
Try monitoring the application with the ASE monitoring (MDA) tables.
Have a look at this very detailed presentation: http://download.sybase.com/presentation/TW2005/ASE115.pdf.
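Beyond the MDA tables, one classic cause of log space being freed but never returned is a long-running open transaction pinning the log. ASE records the oldest active transaction per database in master..syslogshold, which is easy to poll (a sketch assuming ODBC connectivity; the DSN is a placeholder, and dbid 2 is tempdb):

    # Sketch: check for an old open transaction holding the tempdb log.
    # The DSN name is a placeholder; dbid 2 is tempdb.
    import pyodbc

    cn = pyodbc.connect("DSN=sybase_ase")
    for row in cn.execute(
            "SELECT spid, starttime, name "
            "FROM master..syslogshold WHERE dbid = 2"):
        print(row.spid, row.starttime, row.name)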
