I have a console/desktop application that crawls a lot of data (think millions of calls) from various web services. At any given time I have about 10 threads performing these calls and aggregating the data into a MySQL database. All seeds are also stored in a database.
What would be the best way to report its progress? By progress I mean:
How many calls already executed
How many failed
What's the average call duration
How much is left
I thought about logging all of these somehow and tailing the log to get the data. Another idea was to expose some kind of output on an always-open TCP endpoint where some form of UI could read the data and display an aggregation. Both ways seem too crude and too complicated.
Any other ideas?
The "best way" depends on your requirements. If you use a logging framework like NLog, you can plug in a variety of logging targets like files, databases, the console or TCP endpoints.
You can also use a viewer like Harvester as a logging target.
When logging multi-threaded applications I sometimes have an additional thread that writes a summary of progress to the logger every so often (e.g. every 15 seconds).
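The pattern itself is independent of the logging framework. Here is a minimal sketch of it (shown in Python only to illustrate the idea; CrawlStats, its counters and the 15-second interval are placeholders, not part of NLog or any other library):

    import logging
    import threading
    import time

    # Hypothetical shared counters: in the real application the worker threads
    # would call record() after every web-service call.
    class CrawlStats:
        def __init__(self, total_calls):
            self.lock = threading.Lock()
            self.total_calls = total_calls
            self.done = 0
            self.failed = 0
            self.total_duration = 0.0  # seconds spent in finished calls

        def record(self, duration, ok=True):
            with self.lock:
                self.done += 1
                self.total_duration += duration
                if not ok:
                    self.failed += 1

    def report_progress(stats, interval=15.0):
        # Runs on its own thread and logs a one-line summary periodically.
        log = logging.getLogger("progress")
        while True:
            time.sleep(interval)
            with stats.lock:
                done, failed = stats.done, stats.failed
                avg = stats.total_duration / done if done else 0.0
                left = stats.total_calls - done
            log.info("calls=%d failed=%d avg=%.2fs remaining=%d",
                     done, failed, avg, left)

    # Usage: start the reporter as a daemon thread next to the worker threads.
    # stats = CrawlStats(total_calls=1_000_000)
    # threading.Thread(target=report_progress, args=(stats,), daemon=True).start()

The worker threads only update counters, so the reporting thread never slows the crawl down; it just snapshots the numbers and hands them to whatever logging target you configured.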
Since it is a console application, just use Console.WriteLine and have the application write the important information out to the console.
I did something similar in an application that I created to export PDFs from a SQL Server database back into PDF format.
You can do it many different ways. If you are counting records and their size, you can run a tally of sorts and have it show the total every so many records.
I also wrote out to a text file, so that I could keep track of all the PDFs and which case numbers they went to and things like that. That information is in the answer that I gave to the above linked question.
You could also write things out to a text file every so often with the statistics.
The logger that Eric J. mentions is probably going to be a little bit easier to implement, and would be a nice tool for your toolbox.
These options are just as valid, depending on your specific needs.
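For instance, a simple count-based tally along those lines could look like this (a Python sketch only; the interval of 500 and the stats.txt file name are placeholders):

    import datetime

    def report_every(n, record_count, stats_path="stats.txt"):
        # Print a running total to the console every n records and append the
        # same line to a text file so there is a persistent history.
        if record_count % n != 0:
            return
        line = "%s processed=%d" % (datetime.datetime.now().isoformat(), record_count)
        print(line)                       # quick glance on the console
        with open(stats_path, "a") as f:  # statistics file for later review
            f.write(line + "\n")

    # Usage inside the processing loop:
    # for count, record in enumerate(records, start=1):
    #     process(record)
    #     report_every(500, count)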
My company is currently using Excel for reporting, where we have to collect data from various business units on a monthly basis. Each unit sends an Excel file with 50 columns and 10-1000 row items. After receiving each file, we use VBA to consolidate all these files. This consolidated master file is then split into various sections and sent to various personnel, and any changes have to be updated in the master file.
Is there any way that this process can be improved and automated using a different system?
Is there any way that this process can be improved and automated using a different system?
Well, you already have the "low cost, low tech" solution.
The proper solution (until something better is invented) is a web application that collects the data from the various users, processes it, and then generates the necessary reports.
This endeavor is not something to be treated lightly, even if it sounds like a small task. Your company needs to understand what they want, and then contact some supplier companies to get an estimation of the costs.
The costs cover at least:
the development of the application;
the server on which the application will run after it is finished; can be a virtual server;
the costs of training the employees to use the application properly;
the costs of actually getting the employees to use the application properly (the most resistant usually being the managers themselves);
Of course, I assume that you already have some network infrastructure and that backups of all important data are done by IT according to best practices...
I have different external APIs doing basically the same thing but in different ways: adding product information (ext_api).
I would like to make an adapter API that would call, behind the scenes, the different external APIs (adapter_api).
My problem is the following: the external APIs are optimised for calls with a batch of product attributes, whereas my API would be called on a product-by-product basis.
I would like to somehow make a buffer of product attributes that would grow as I call my adapter_api. When the number of product attributes reaches a certain limit, the ext_api would be called and the buffer would be reset, ready to receive more product attributes.
I'm wondering how to achieve that. I was thinking of making a REST API in Python that would store the buffer of product attributes. I would like this REST API to be able to scale on a Kubernetes cluster: it would need low latency, and several instances of this API would write to the buffer of products until one of them reaches the limit and makes the call to the external API.
Are there any best practices concerning the buffer for this use case? To add some extra information: my main purpose here is to hide from the internal business APIs the complexity of calling many different external APIs, each of which has its own rules and credentials.
Thank you very much for your help.
You didn't tell us your performance evaluation criteria. You did tell us this: "I don't know how to store the buffer: I would like to avoid databases or files." That makes little sense, since there's a simple answer to the question "Are there any best practices for this use case?"
Yes. The best practice is to append requests to buffer.txt and send the batch when that file exceeds some threshold. A convenient way to implement the threshold would be to send when getsize() reports a large enough value.
If requests are of quite different sizes and the batch size really matters to you, then append a single byte to a second file, and use the size of that file to indicate how many entries are enqueued.
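A minimal sketch of that approach (Python; send_batch_to_ext_api, the file names and MAX_ITEMS are assumptions for the example, not part of any real API):

    import json
    import os

    BUFFER_PATH = "buffer.txt"    # one JSON-encoded product per line
    COUNT_PATH = "buffer.count"   # one byte appended per enqueued product
    MAX_ITEMS = 100               # flush threshold, tuned to ext_api's batch size

    def enqueue(product_attributes, send_batch_to_ext_api):
        # Append the request to the buffer file.
        with open(BUFFER_PATH, "a") as buf:
            buf.write(json.dumps(product_attributes) + "\n")
        # Track how many entries are enqueued via the size of a sidecar file,
        # so requests of very different sizes don't distort the batch size.
        with open(COUNT_PATH, "ab") as counter:
            counter.write(b"\0")
        if os.path.getsize(COUNT_PATH) >= MAX_ITEMS:
            with open(BUFFER_PATH) as buf:
                batch = [json.loads(line) for line in buf]
            send_batch_to_ext_api(batch)  # hypothetical call into ext_api
            os.remove(BUFFER_PATH)
            os.remove(COUNT_PATH)

With several adapter instances appending concurrently you would still need some form of file locking (e.g. fcntl.flock) around the append-and-flush step, plus a shared volume that all replicas can reach.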
Requirements
The heart of your question seems to revolve around what was left unsaid:
What is the cost function for sending too many "small" batches to ext_api?
What is the cost function for the consumer of the adapter_api? What does it care about? Low-latency returns, perhaps?
If ext_api permanently fails (say, a day of downtime), do we have some responsibility for quickly notifying the consumer that its updates are going into a black hole?
And why would using the filesystem be inappropriate? It seems a perfect match for your needs.
Consider using a global in-memory object, such as a list or queue, for the batch you're accumulating. You might want to protect accesses with a lock.
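A sketch of that in-memory variant (again, the function names and BATCH_SIZE are placeholders):

    import threading

    _buffer = []                  # global in-memory batch
    _lock = threading.Lock()
    BATCH_SIZE = 100              # assumed flush threshold

    def add_product(product_attributes, send_batch_to_ext_api):
        to_send = None
        with _lock:
            _buffer.append(product_attributes)
            if len(_buffer) >= BATCH_SIZE:
                to_send = list(_buffer)
                _buffer.clear()
        if to_send is not None:
            # Call ext_api outside the lock so other requests are not blocked.
            send_batch_to_ext_api(to_send)

Keep in mind that an in-memory buffer is per process: with several Kubernetes replicas, each instance accumulates its own batch and anything unsent is lost on restart, which is the trade-off against the file-based approach above.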
Maybe your client doesn't really want a one-product-at-a-time API. Maybe you'd prefer to have your client accumulate items, sending only when its batch size is big enough.
I have inherited a website built on ExpressionEngine which is having a lot of trouble under load. Looking at the server console for the database, I am seeing a lot of database writes (300-800/second).
Trying to track down why we are getting so much write activity compared to read activity, I am seeing things like:
UPDATE `exp_snippets` SET `snippet_contents` = 'some content in here' WHERE `snippet_name` = 'member_login_form'
Why would EE be writing these to the database when no administrative changes are happening and how can I turn this behavior off?
Are there any other bottlenecks which could be avoided? The site is using an EE ad module, so I cannot easily run it through Varnish, since the ads need to change on each page load. I am looking to integrate DFP instead so the ads can be loaded asynchronously.
There are a lot of front-end operations that trigger INSERT and UPDATE operations (having to do with tracking users, hits and sessions, as well as generating hashes for forms, etc.).
The snippets one, though, seems very strange indeed; I wouldn't think that snippets would trigger an UPDATE under normal circumstances. Perhaps the previous developer did something where the member_login_form (which has a dynamic hash in it) is written to a snippet each time it is called? Not sure why you would do that, but there's a guess.
For general speed optimization see:
Optimizing ExpressionEngine
There are a number of configs in the "Extreme Traffic" section that will reduce the number of writes (though not the snippet one, which doesn't seem to be normal behavior).
I am looking for ways to optimize bulk download of CRM 2011 data. Here are the two main scenarios:
a) Full synchronization: download of all data - first all accounts, then all contacts, etc.
b) Incremental synchronization: download of all entities modified since a given date
We use a multithreaded downloader with 3 threads. Each thread performs FetchXml for one entity type, which is downloaded page by page. Parsed objects are stored in the downloader cache and the downloader goes on to the next page. Another thread pulls the downloaded data from the cache and processes it. This organization increases the download speed more than 2x.
The problems I see:
a) The FetchXml protocol is very inefficient. For example, it contains lots of unneeded data: FormattedValues take 10-15% of the bandwidth (my data show ~15% of the source XML stream, or ~10% of the zipped stream), although all we do with them is a) parse the XML and b) throw them away. (Note that the parsing is not negligible either; the iOS/Android Mono parsers are surprisingly slow.)
b) In the case of incremental synchronization, most of the FetchXml requests return zero items. Here it would be highly desirable to combine several FetchXml requests into one (AFAIK this is impossible), or maybe use another trick, such as asking for the counts of modified objects; I have not yet investigated what is possible.
Does anybody have any advice how to optimize FetchXml traffic?
Your fastest method would be to use SQL Server directly for something like this (unless you are using CRM Online).
To make the incremental synchronization faster, your best bet is to use the aggregate functionality FetchXML provides, which is both extremely quick and less verbose.
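For example, an incremental pass could first ask only for a count of the records modified since the last synchronization, and skip the full query when the count comes back as zero (a sketch; the entity, alias and date are placeholders):

    <fetch aggregate="true">
      <entity name="account">
        <attribute name="accountid" alias="modified_count" aggregate="count" />
        <filter>
          <condition attribute="modifiedon" operator="on-or-after" value="2013-01-01" />
        </filter>
      </entity>
    </fetch>

A count query like this returns a single row instead of pages of FormattedValues, so the cost of the "nothing changed" case becomes negligible.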
Why parse on iOS/Android Mono at all? If you are sharing this data with a large number of devices, you'd be better off having a central caching server that could send it back to the devices in a zipped JSON format (or possibly BSON). The caching server would request an update of the changes, process them, and then send the incremental changes back to the clients in whatever format you choose. That would be considerably faster on the clients and use far less bandwidth.
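A rough sketch of the device-facing side of such a caching server (Flask chosen arbitrarily here; load_changes_since is a placeholder for whatever pulls the delta out of the server-side cache):

    import gzip
    import json

    from flask import Flask, Response, request

    app = Flask(__name__)

    def load_changes_since(timestamp):
        # Placeholder: return the entities modified since `timestamp`, already
        # fetched from CRM and held in the server-side cache.
        return []

    @app.route("/changes")
    def changes():
        since = request.args.get("since", "1970-01-01T00:00:00Z")
        payload = json.dumps({"since": since, "items": load_changes_since(since)})
        body = gzip.compress(payload.encode("utf-8"))
        return Response(body, mimetype="application/json",
                        headers={"Content-Encoding": "gzip"})

    # The mobile clients then parse compact, zipped JSON instead of FetchXml responses.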
I'm not sure of a way to further optimize FetchXML. I would question why you're not using the OData endpoints and REST, especially if you're primarily concerned with the amount of data being sent over the wire.
I have talked to some brilliant CRM MVPs, and I know they have used REST to migrate data to CRM. I'm not sure if they did it because it was faster, but I assumed that was why.
I do know that you would minimize the amount of data being sent to the client, since XML is extremely bloated.
Have a look at ExecuteMultipleRequest, which allows you to perform multiple requests/queries at once: http://msdn.microsoft.com/en-us/library/microsoft.xrm.sdk.messages.executemultiplerequest.aspx
My iPhone app uses Core Data and things are fine for the most part. But here is a problem:
after a certain amount of data, it stalls on first execution (when the Core Data entities must be loaded).
Some experimenting showed that things are OK up to a certain amount of data loaded in Core Data at start.
If I go over a critical amount the installation starts failing. The bigger the amount of data for start, the higher the probability that it fails.
By making separate tests I made sure the data themselves are not faulty.
I also can say this problem does not appear in the simulator.
It also does not happen when I connect the debugger to the device.
It looks like too much data loaded into Core Data in a short amount of time creates some kind of overload.
Is that true? Any idea on a possible solution?
At this point I made up a partial solution using a UIActionSheet object to kill some time (asking the user to push a button). But this is not very satisfactory, though for the time being it works.
Any comment or advice for a better way would be appreciated.
It is not quite clear what you mean by "it fails".
However, if you are using SQLite and by "loading into Core Data" you mean creating and saving entities at startup to populate the store, then remember not to call [managedObjectContext save:] only once at the end, especially with a large amount of data; instead, create and save a reasonable batch of NSManagedObjects at a time.
Otherwise, if you mean you have a large amount of data retrieved as NSManagedObjects, probably loaded into a UITableView, consider using some kind of NSOperation for asynchronous loading.
If those two cases don't apply to you, just tell us the error you are getting, or what you mean by "fails" or "stalls".