SqlConnection with Parallel programming - c#-4.0

This is my existing code, which saves some data to several tables:
using (SqlConnection conn = new SqlConnection("myConnString"))
{
    DoWork1(conn);
    DoWork2(conn);
    DoWork3(conn);
}
To speed my code up, I tried using the .NET TPL and changed my code as below:
using (SqlConnection conn = new SqlConnection("myConnString"))
{
    ParallelOptions pw = new ParallelOptions();
    pw.MaxDegreeOfParallelism = Environment.ProcessorCount;
    Parallel.Invoke(pw, () => DoWork1(conn), () => DoWork2(conn), () => DoWork3(conn));
}
But this throws an "Internal connection fatal error" exception from the ExecuteNonQuery() method in my data access layer. Is my parallel approach wrong?

Well, there are ways it could potentially be made to work using MARS - but I would suggest a different approach. (I don't know whether MARS supports using the same connection across multiple threads, even though it allows multiple concurrent operations.)
Instead of trying to reuse one connection in all the parallel tasks, make each task open (and close) a connection for itself, and let connection pooling handle the efficiency side of that. That's general best practice in .NET whether you're using parallelism or not: open the connection, do some work, close the connection.
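In C#, this simply means each lambda passed to Parallel.Invoke creates, opens and disposes its own SqlConnection instead of capturing the shared conn. The same pattern is sketched below in Java (which also appears on this page). FakeConnection and ToyPool are hypothetical stand-ins for a real pooled connection and for ADO.NET's connection pool; the point is only that each parallel task borrows its own connection, and "closing" hands it back so later tasks reuse it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a pooled database connection (no real I/O).
final class FakeConnection {
    final int id;
    FakeConnection(int id) { this.id = id; }
}

// Hypothetical minimal pool: reuses an idle connection when one exists and only
// creates a new "physical" connection when none is idle.
final class ToyPool {
    private final ConcurrentLinkedQueue<FakeConnection> idle = new ConcurrentLinkedQueue<>();
    private final AtomicInteger created = new AtomicInteger();

    FakeConnection open() {
        FakeConnection c = idle.poll();
        return (c != null) ? c : new FakeConnection(created.incrementAndGet());
    }

    // "Closing" returns the connection to the pool instead of tearing it down.
    void close(FakeConnection c) { idle.offer(c); }

    int physicalConnectionsCreated() { return created.get(); }
}

final class PerTaskConnectionDemo {
    // One unit of work: open its own connection, use it, close it.
    static String doWork(ToyPool pool, String name) {
        FakeConnection conn = pool.open();
        try {
            return name + " used connection " + conn.id;
        } finally {
            pool.close(conn);
        }
    }

    // Runs every task in parallel; no connection object is shared between tasks.
    static List<String> runInParallel(ToyPool pool, List<String> names) {
        ExecutorService executor = Executors.newFixedThreadPool(names.size());
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String name : names) {
                tasks.add(() -> doWork(pool, name));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : executor.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool();
        List<String> work = List.of("DoWork1", "DoWork2", "DoWork3");
        runInParallel(pool, work).forEach(System.out::println);
        runInParallel(pool, work).forEach(System.out::println); // reuses pooled connections
        System.out.println("physical connections created: " + pool.physicalConnectionsCreated());
    }
}
```

However many batches run, the toy pool never creates more physical connections than there are tasks running at once, which is the efficiency that pooling buys you.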

I don't have the reputation to comment, but it seems I can answer. That is a bit strange.
Anyway, my comment was that local temporary tables are per connection, so opening a new connection means you cannot see the temporary tables created by the other task.
Global temporary tables might be the answer, but you would either have to:
a) use a single global temp table and partition the data using some key, or
b) use uniquely named tables, which means dynamic SQL.
All a bit of a mess, really.
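Option b) can be sketched as follows, again in Java. The table prefix and column layout are hypothetical; the sketch only shows why unique names force you into building SQL text at runtime. Every task would then have to be handed the generated name. Keep in mind that SQL Server drops a global temp table once the session that created it ends and no other session is still using it.

```java
import java.util.UUID;

// Sketch of option b): give each run its own global temp table name and build the
// SQL text dynamically. Table layout and prefix are hypothetical.
final class GlobalTempTableNames {
    // Only allow identifier characters in the prefix, since it is concatenated into SQL.
    static String uniqueName(String prefix) {
        if (!prefix.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unsafe prefix: " + prefix);
        }
        // "##" marks a global temp table in SQL Server; the UUID keeps runs from colliding.
        return "##" + prefix + "_" + UUID.randomUUID().toString().replace("-", "");
    }

    static String createTableSql(String tableName) {
        return "CREATE TABLE " + tableName + " (Id INT NOT NULL, Payload NVARCHAR(200) NULL)";
    }

    static String selectSql(String tableName) {
        return "SELECT Id, Payload FROM " + tableName;
    }

    public static void main(String[] args) {
        String table = uniqueName("stage");
        System.out.println(createTableSql(table));
        System.out.println(selectSql(table));
    }
}
```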

Related

Caching relational data using redis

I'm building a small social network (users have posts and posts have comments; very basic), using a clustered Node.js server and Redis as a distributed cache.
My approach to caching users' posts is to have a sorted set that contains all of the user's post IDs ordered by rate (which should be updated every time someone adds a like or comment), and the actual objects stored as hashes.
So the flow for getting a user's posts should look like this:
1. Use ZRANGE to get a range of IDs from the sorted set.
2. Use MULTI/EXEC and HGETALL to fetch all the objects at once.
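The data layout behind these two steps can be modelled in memory as below. This is a Java sketch, not a Redis client: PostCacheModel and its methods are hypothetical names that mimic the semantics of ZADD, HSET, ZREVRANGE and HGETALL for a single user's cache. Note that ZRANGE returns members lowest score first, so for "top posts by rate" the ZREVRANGE call used in the code below is the one that matches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of one user's cache (NOT a Redis client): a "sorted set" of post IDs
// scored by rate, plus one "hash" of fields per post ID.
final class PostCacheModel {
    private final Map<String, Double> scores = new HashMap<>();
    private final Map<String, Map<String, String>> hashes = new HashMap<>();

    // Like ZADD: insert the member or update its score (e.g. after a new like or comment).
    void zadd(String postId, double rate) { scores.put(postId, rate); }

    // Like HSET with several fields: store the post object as a hash.
    void hset(String postId, Map<String, String> fields) {
        hashes.computeIfAbsent(postId, k -> new HashMap<>()).putAll(fields);
    }

    // Like ZREVRANGE key start stop: highest score first, 0-based, stop inclusive.
    // Ties are broken by member in reverse lexicographic order, as Redis does.
    // This sketch only handles non-negative indexes.
    List<String> zrevrange(int start, int stop) {
        List<String> ids = new ArrayList<>(scores.keySet());
        Comparator<String> byScore = Comparator.comparing(scores::get);
        ids.sort(byScore.reversed().thenComparing(Comparator.<String>reverseOrder()));
        int last = Math.min(stop, ids.size() - 1);
        if (start < 0 || start > last) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ids.subList(start, last + 1));
    }

    // Steps 1 and 2 combined: a page of IDs, then one HGETALL-equivalent per ID.
    List<Map<String, String>> getTopObjects(int start, int stop) {
        List<Map<String, String>> result = new ArrayList<>();
        for (String id : zrevrange(start, stop)) {
            // HGETALL on a missing key returns an empty hash, so mirror that here.
            result.add(hashes.getOrDefault(id, Map.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        PostCacheModel cache = new PostCacheModel();
        cache.zadd("post:1", 5);
        cache.zadd("post:2", 9);
        cache.hset("post:2", Map.of("text", "hello"));
        System.out.println(cache.zrevrange(0, 19));
        System.out.println(cache.getTopObjects(0, 19));
    }
}
```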
I have a couple of questions:
1. In terms of performance, will my approach scale as the cache grows bigger, or should I use Lua or something similar?
2. If I want to continue with the current approach, where should I save the sorted set in case Redis crashes? Using Redis persistence will affect overall performance, so I thought about using a dedicated Redis server for the sets. (I searched for whether it is possible to back up only part of the Redis data but didn't find anything about it.)
My approach, called as getTopObjects({userID}, 0, 20):
self.zrange = function(setID, start, stop, multi)
{
    return execute(this, "zrange", [setID, start, stop], multi);
};
// getTopObjects calls zrevrange (highest rate first), so it needs its own wrapper
self.zrevrange = function(setID, start, stop, multi)
{
    return execute(this, "zrevrange", [setID, start, stop], multi);
};
self.getObject = function(key, multi)
{
    return execute(this, "hgetall", [key], multi);
};
self.getObjects = function(keys)
{
    // queue one HGETALL per key on a single MULTI, then run them all with EXEC
    const multi = this.client.multi();
    const promiseArray = [];
    for (let i = 0, len = keys.length; i < len; i++)
    {
        promiseArray.push(this.getObject(keys[i], multi));
    }
    return execute(this, "exec", [], multi).then(function(results)
    {
        //TODO: do something with the result.
        return Promise.all(promiseArray);
    });
};
self.getTopObjects = function(setID, start, stop)
{
    //TODO: validate the range
    const thisArg = this;
    return this.zrevrange(setID, start, stop).then(function(keys)
    {
        return thisArg.getObjects(keys);
    });
};
It's an interesting intellectual exercise, but in my opinion this is classic premature optimization.
1) It's probably way too early to have even introduced Redis, let alone to be thinking about whether Redis is fast enough. Your social network is almost certainly just fine up to about 1,000 users running off raw SQL queries against MySQL / Postgres / some other RDS. If it starts to slow down, get data on slow-running queries and fix them with query optimizations and appropriate indexes. That'll get you past 10,000 users.
2) Now you can start introducing Redis. In general, I'd encourage you to think of Redis purely as a cache and not as permanent storage; it shouldn't matter if it gets blown away. It just means your site is slower for the next few seconds because your users are getting their page loads from SQL queries instead of Redis hits (each query re-populating that user's sorted list of posts in Redis, of course).
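Treating Redis "purely as a cache" usually means the cache-aside read path: try the cache, fall back to SQL on a miss, and write the result back. Below is a minimal single-threaded Java sketch of that path; the HashMap standing in for Redis and the LongFunction standing in for the SQL query are hypothetical placeholders.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

// Cache-aside sketch: a HashMap stands in for Redis and a LongFunction stands in for
// the SQL query. Both are hypothetical; only the read path is shown.
final class CacheAsidePosts {
    private final Map<Long, List<Long>> cache = new HashMap<>();
    private final LongFunction<List<Long>> loadPostIdsFromSql;
    private int sqlQueries = 0;

    CacheAsidePosts(LongFunction<List<Long>> loadPostIdsFromSql) {
        this.loadPostIdsFromSql = loadPostIdsFromSql;
    }

    List<Long> postIdsFor(long userId) {
        List<Long> cached = cache.get(userId);
        if (cached != null) {
            return cached; // cache hit: no database work
        }
        sqlQueries++;
        List<Long> fresh = loadPostIdsFromSql.apply(userId); // miss: fall back to SQL
        cache.put(userId, fresh); // re-populate so the next read is a hit
        return fresh;
    }

    // Simulates the cache being "blown away" (e.g. Redis restarted without persistence).
    void flush() { cache.clear(); }

    int sqlQueries() { return sqlQueries; }

    public static void main(String[] args) {
        CacheAsidePosts posts = new CacheAsidePosts(id -> List.of(id * 10, id * 10 + 1));
        posts.postIdsFor(7);
        posts.postIdsFor(7);
        posts.flush();
        posts.postIdsFor(7);
        System.out.println("SQL queries: " + posts.sqlQueries());
    }
}
```

Losing the cache costs only extra SQL queries until it refills, which is why persistence for the cache layer matters less than it might seem.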
Your strategy and example code for using Redis seem fine to me, but until you have actual data on how users use your site (which may be drastically different from your current expectations), it's simply impossible to know what SQL indexes you will need, which keys and lists are ideal for caching in Redis, etc.
I faced similar issues; I needed a way to query the data more efficiently. I can't say for sure, but I've heard that because Redis is single-threaded, running Lua scripts blocks it from serving other commands, and I'm sure that's not good for a social networking site. I heard about Tarantool and it looks promising; I'm currently trying to wrap my head around it.
If you are concerned about your cache size growing, I think most social networks keep about two weeks' worth of data in each user's cache, and anything older than two weeks gets deleted. You then implement a scrolling feature that works with pagination: once the user scrolls down, fetch the next two weeks' worth of data and add it back to memory only for that specific user (don't forget to set a new TTL on the newly added data). This helps keep your cache lean.
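The window arithmetic for that two-week paging can be sketched in Java as below. The class name and the choice of half-open windows (older bound excluded, newer bound included, so no post lands on two pages) are assumptions for illustration. When an older window is loaded back into Redis, you would set a TTL on those keys, for example with EXPIRE.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of two-week paging: page 0 is the most recent 14 days, page 1 the 14 days
// before that, and so on. Only the window arithmetic is shown; storage is out of scope.
final class TimelineWindows {
    static final Duration WINDOW = Duration.ofDays(14);

    // Bounds of page n as {exclusive lower bound, inclusive upper bound}.
    static Instant[] windowBounds(Instant now, int page) {
        Instant upper = now.minus(WINDOW.multipliedBy(page));
        Instant lower = upper.minus(WINDOW);
        return new Instant[] { lower, upper };
    }

    // Post timestamps that fall in page n's window, newest first.
    static List<Instant> postsInWindow(List<Instant> postTimes, Instant now, int page) {
        Instant[] bounds = windowBounds(now, page);
        List<Instant> out = new ArrayList<>();
        for (Instant t : postTimes) {
            if (t.isAfter(bounds[0]) && !t.isAfter(bounds[1])) {
                out.add(t);
            }
        }
        out.sort(Comparator.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-31T00:00:00Z");
        List<Instant> posts = List.of(now.minus(Duration.ofDays(1)), now.minus(Duration.ofDays(20)));
        System.out.println("page 0: " + postsInWindow(posts, now, 0)); // initial load
        System.out.println("page 1: " + postsInWindow(posts, now, 1)); // user scrolled down
    }
}
```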
What happens when Redis, or whatever in-memory data tool you are using, crashes? You simply reload the data back into memory. They all have features for saving data to files as a backup. I'm thinking of implementing another database layer, let's say Cassandra or MongoDB, that holds each user's timeline since inception. Sure, this creates additional overhead, because you have to keep three data layers (e.g. MySQL, Redis and MongoDB) in sync!
If this looks like a lot of work, feel free to use a third-party service to host your in-memory data. At least you can sleep easy, but it's going to cost you.
That said, this is highly opinionated. I got tired of people telling me to wait until my site explodes with users, or giving the so-called premature optimization reply you got :)

Azure function slow executing a stored procedure

I'm using an Azure function like a scheduled job, using a timer trigger with a cron expression. At a specific time each morning it calls a stored procedure.
The function is now taking 4 minutes to run a stored procedure that takes a few seconds to run in SSMS. This time keeps increasing despite successful efforts to improve the speed of the stored procedure.
The function is not doing anything intensive.
using (SqlConnection conn = new SqlConnection(str))
{
    conn.Open();
    using (var cmd = new SqlCommand("Stored Proc Here", conn) { CommandType = CommandType.StoredProcedure, CommandTimeout = 600 })
    {
        // SQL Server parameter names are prefixed with @
        cmd.Parameters.Add("@Param1", SqlDbType.DateTime2).Value = DateTime.Today.AddDays(-30);
        cmd.Parameters.Add("@Param2", SqlDbType.DateTime2).Value = DateTime.Today;
        var result = cmd.ExecuteNonQuery();
    }
}
I've checked and the database is not under load with another process when the stored procedure is running.
Is there anything I can do to speed up the Azure function? Or any approaches to finding out why it's so slow?
UPDATE:
I don't believe Azure Functions is at fault; the issue seems to be with SQL Server.
I eventually ran the production SP and looked at the execution plan. I noticed that the statistics were way off: for example, a join expected 20 returned rows, but the actual figure was closer to 800k.
The solution for my issue was to update the statistics on a specific table each week.
As for why the stats were so far off: the client does a batch update each night that inserts several hundred thousand rows. I can only assume this affects the stats cumulatively, so it gets worse over time.
Please be careful adding WITH RECOMPILE hints. Compilation is often far more expensive than execution for a simple query, meaning you may not get decent performance for all apps with this approach.
There are different possible reasons for what you're seeing. One common reason for this kind of scenario is that you got different query plans in the app vs. SSMS paths. This can happen for various reasons (summarized below). You can determine whether you are getting different plans by using the Query Store (which records summary data about queries, plans, and runtime stats). Please review a summary of it here:
https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-2017
You need a recent SSMS to get the UI, though you can query it directly from any TDS client.
Now for a summary of some possible reasons:
One possible reason for plan differences is SET options. These are environment settings for a query, such as turning ANSI_NULLS on or off. Each different setting could change the plan choice and thus performance. Unfortunately, the defaults for different language drivers differ (historical artifacts from when each was built; hard to change now without breaking apps). You can review the Query Store to see whether there are different "context settings" (each unique combination of SET options is a unique context setting in Query Store). Each different set implies different possible plans and thus potential performance changes.
The second major reason for plan changes like the ones you describe is parameter sniffing. Depending on the scope of compilation (for example, inside a sproc vs. ad hoc query text), SQL Server will sometimes look at the current parameter value during compilation to infer the frequency of common values in future executions. Instead of ignoring the value and just using a default frequency, using a specific value can generate a plan that is optimal for a single value (or set of values) but potentially slower for values outside that set. You can see this in the query plan choice in the Query Store as well, by the way.
There are other possible reasons for performance differences beyond what I mentioned. Sometimes there are performance differences when running in MARS mode vs. not in the client. There may also be differences in how you call the client drivers that impact performance.
I hope this gives you a few tools to debug possible reasons for the difference. Good luck!
For a project I worked on, we ran into the same thing. It's not a function issue but a SQL Server issue. We were updating sprocs during development, and it turns out that SQL Server caches an execution plan (certain routes/indexes, in layman's terms), and that plan gets out of sync with the new sproc.
We resolved it by adding WITH RECOMPILE to the sproc, after which the API call and SSMS had the same timings.
Once the system is settled, that hint can and should be removed.
Search for "slow sproc, fast SSMS" etc. to find others who have run into this situation.

Spark Cassandra Connector proper usage

I'm looking to use Spark for some ETL, which will mostly consist of "update" statements (a column is a set that will be appended to, so a simple insert is likely not going to work). As such, it seems like issuing CQL queries to import the data is the best option. Using the Spark Cassandra Connector, I see I can do this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/1_connecting.md#connecting-manually-to-cassandra
Now, I don't want to open and close a session for every row in the source (am I right in not wanting this? Usually I have one session for the entire process, and keep using it in "normal" apps). However, the docs say that the connector is serializable, but the session obviously is not. So wrapping the whole import inside a single "withSessionDo" seems like it'll cause problems. I was thinking of using something like this:
class CassandraStorage(conf: SparkConf) {
  val session = CassandraConnector(conf).openSession()
  def store(t: Thingy): Unit = {
    // session.execute cql goes here
  }
}
Is this a good approach? Do I need to worry about closing the session? Where / how best would I do that? Any pointers are appreciated.
You actually do want to use withSessionDo because it won't actually open and close a session on every access. Under the hood, withSessionDo accesses a JVM level session. This means you will only have one session object PER cluster configuration PER node.
This means code like
val connector = CassandraConnector(sc.getConf)
sc.parallelize(1L to 10000000L).map(x => connector.withSessionDo(session => stuff))
will only ever create one Cluster and one Session object on each executor JVM, regardless of how many cores each machine has.
For efficiency, I would still recommend using mapPartitions to minimize cache checks:
sc.parallelize(1L to 10000000L)
  .mapPartitions(it => connector.withSessionDo(session =>
    it.map(row => do stuff here)))
In addition, the session object also uses a prepare cache, which lets you cache a prepared statement in your serialized code, and it will only ever be prepared once per JVM (all other calls will return the cached reference).
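The caching idea described in this answer can be illustrated with a toy Java model: one session per configuration per JVM, created lazily on first use, and one prepared statement per query text per session. This is NOT the connector's implementation (its real caching also decides when idle sessions get closed, which the sketch omits); SessionCache, Session and the "prepared:" handles are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Toy model of the idea only (NOT the connector's implementation): sessions are cached
// per configuration at JVM level, and each session caches prepared statements by query text.
final class SessionCache {
    static final AtomicInteger sessionsOpened = new AtomicInteger();

    static final class Session {
        final String config;
        final AtomicInteger prepareCalls = new AtomicInteger();
        private final Map<String, String> prepared = new ConcurrentHashMap<>();

        Session(String config) {
            this.config = config;
            sessionsOpened.incrementAndGet(); // stands in for an expensive connect
        }

        // Prepares each distinct query once; later calls get the cached handle.
        String prepare(String cql) {
            return prepared.computeIfAbsent(cql, q -> {
                prepareCalls.incrementAndGet(); // stands in for a round trip to the cluster
                return "prepared:" + q;
            });
        }
    }

    // One shared session per configuration for the whole JVM.
    private static final Map<String, Session> SESSIONS = new ConcurrentHashMap<>();

    // Loosely analogous to withSessionDo: borrow the shared session, never open one per call.
    static <T> T withSessionDo(String config, Function<Session, T> body) {
        return body.apply(SESSIONS.computeIfAbsent(config, Session::new));
    }

    public static void main(String[] args) {
        // Many "rows" processed in parallel still share one session and one prepared statement.
        java.util.stream.IntStream.range(0, 10_000).parallel().forEach(i ->
                withSessionDo("cluster-A", s -> s.prepare("UPDATE ks.t SET tags = tags + ? WHERE id = ?")));
        System.out.println("sessions opened: " + sessionsOpened.get());
    }
}
```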

JDBC program running for a long time: performance issue

My program has an issue with Oracle query performance. I believe the SQL itself performs well, because it returns quickly in SQL*Plus.
But when my program has been running for a long time, around 1 week, the SQL query (using JDBC) becomes slower (in my logs, the query time is much longer than when I originally started the program). When I restart my program, the query performance goes back to normal.
I think it could be something wrong with the way I use the PreparedStatement, because the SQL I'm using does not use "?" placeholders at all. It's just a complex select query.
The querying is done by a util class. Here is the pertinent code that runs the query:
public List<String[]> query(String sql, String[] args) {
    Connection conn = null;
    conn = openConnection();
    conn.setAutoCommit(true);
    ....
    PreparedStatement preStatm = null;
    ResultSet rs = null;
    ....//set preparedstatement arg code
    rs = preStatm.executeQuery();
    ....
    finally{
        //close rs
        //close prestatm
        //close connection
    }
}
In my case, args is always null, so it just passes the SQL string to this query method. Is it possible that this slows down the DB query after the program has been running for a long time? Or should I use Statement instead, or pass args with "?" placeholders in the SQL? How can I find the root cause of my issue? Thanks.
Maybe the problem is in the JDBC cache (Oracle-specific).
Try turning it off,
or try reinitializing the driver periodically (once per day).
You first need to look at data that will show you where you are spending most of your time; guessing is not an option when performance tuning.
So I would recommend getting solid data that pinpoints the layer presenting the issue (Java or DB).
For this, I would suggest looking at AWR and ASH reports when the problem is most noticeable. Also collect data on the JVM (you can use JConsole and/or JVisualVM).
When first diagnosing bad performance, I always use the "USE" method: Utilization, Saturation and Errors.
So first, look for Errors in the logs.
Then look for any resource becoming Saturated (CPUs, memory, etc.).
Finally, look at the Utilization of each resource. Having a client/server layout will make this easier; if that is not the case, you will need to drill down to the process level to know whether it's Java or the DB.
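For the Java side of this, one option is to log a snapshot of JVM figures next to each slow query's timing, so a week-long drift (heap creeping up, GC time growing, threads leaking) becomes visible in your own logs. Below is a minimal sketch using the standard java.lang.management beans; the JvmSnapshot class and its "saturation" rule of thumb are assumptions for illustration, not a standard tool, and it complements JConsole/JVisualVM rather than replacing them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

// Snapshot of JVM resource figures taken from inside the program, so they can be logged
// next to each slow query's timing over days. Complements JConsole/JVisualVM.
final class JvmSnapshot {
    final long heapUsedBytes;
    final long heapMaxBytes;   // -1 if the JVM reports no defined maximum
    final long gcTimeMillis;   // cumulative GC time across all collectors
    final int liveThreads;
    final double loadAverage;  // negative if the platform does not provide it
    final int processors;

    private JvmSnapshot(long heapUsedBytes, long heapMaxBytes, long gcTimeMillis,
                        int liveThreads, double loadAverage, int processors) {
        this.heapUsedBytes = heapUsedBytes;
        this.heapMaxBytes = heapMaxBytes;
        this.gcTimeMillis = gcTimeMillis;
        this.liveThreads = liveThreads;
        this.loadAverage = loadAverage;
        this.processors = processors;
    }

    static JvmSnapshot take() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        long gc = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime();
            if (t > 0) gc += t; // -1 means "undefined" for this collector
        }
        return new JvmSnapshot(heap.getUsed(), heap.getMax(), gc,
                ManagementFactory.getThreadMXBean().getThreadCount(),
                os.getSystemLoadAverage(), os.getAvailableProcessors());
    }

    // Utilization: share of the maximum heap currently in use (NaN if there is no maximum).
    double heapUtilization() {
        return heapMaxBytes > 0 ? (double) heapUsedBytes / heapMaxBytes : Double.NaN;
    }

    // Saturation heuristic (an assumption, not a hard rule): a load average above the
    // core count suggests runnable work is queueing for CPU.
    boolean cpuLooksSaturated() {
        return loadAverage >= 0 && loadAverage > processors;
    }

    @Override
    public String toString() {
        return String.format("heapUsed=%dMB heapUtil=%.2f gcTime=%dms threads=%d load=%.2f cpus=%d",
                heapUsedBytes / (1024 * 1024), heapUtilization(), gcTimeMillis,
                liveThreads, loadAverage, processors);
    }

    public static void main(String[] args) {
        System.out.println(JvmSnapshot.take());
    }
}
```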
Once you have collected this data, you can direct your tuning efforts accordingly. Skipping this step and guessing will only waste your time, and sometimes even mask problems or introduce new ones.
You can come back later with this data and we can take a look!

Transactional operation with SaveChanges and ExecuteStoreCommand

I have a problem that I would like to share. The context is a bit messy, so I will try to do my best in the explanation.
I need to create a transactional operation over a number of entities. I'm working with EF Code First but with a legacy database that I can't change. In order to create a more consistent model than the database provides, I'm projecting the database information into more refined entities I created on my own.
As I need to use different contexts, my initial idea was to use TransactionScope, which gave me good results in the past. Why do I need different contexts? Due to various problems with the DB, I can't make the updates in only one operation (UnitOfWork). I need to retrieve different IDs, which only appear after SaveChanges().
using (var scope = new TransactionScope())
{
    Operation1();
    Operation2();
    Operation3(); // uses ExecuteStoreCommand
    SaveChanges();
    Operation4();
    SaveChanges();
    scope.Complete(); // without this, the transaction is rolled back when the scope is disposed
}
I know that, in order to use TransactionScope, I need to share the same connection among all the operations (and I'm doing that, by passing the context to the objects). However, when I execute one of the operations (the one that uses ExecuteStoreCommand), or when I try to do some update after the first SaveChanges, I always receive the MSDTC error (support for distributed transactions is disabled) or, more rarely, errors such as unloaded domains.
I don't know if someone can help me, or at least tell me which direction is best for this scenario.
Have a look at this answer:
Entity Framework - Using Transactions or SaveChanges(false) and AcceptAllChanges()?
The answer does exactly what you require having a transaction, over multiple data contexts.
I also found this post on Transactions and Connections in Entity Framework 4.0 really helpful.
For people who may need a simpler solution, here's what I use when I need to mix ExecuteStoreCommand and SaveChanges in a transaction.
using (var dataContext = new ContextEntities())
{
    dataContext.Connection.Open();
    // dispose the transaction so it rolls back if anything below throws before Commit()
    using (var trx = dataContext.Connection.BeginTransaction())
    {
        var sql = "DELETE TestTable WHERE SomeCondition";
        dataContext.ExecuteStoreCommand(sql);
        var list = CreateMyListOfObjects(); // this could throw an exception
        foreach (var obj in list)
            dataContext.TestTable.AddObject(obj);
        dataContext.SaveChanges(); // this could throw an exception
        trx.Commit();
    }
}
