How to keep an object unique

How to keep an object unique - c#-4.0

I have a static DataTable (with 80k records) in Common.DLL and that Common.DLL is referred by 10 windows services. So, instead of having 10 copies of that DataTable in the memory I need to have it as 1 copy and all the services pointing to that data source. Is this approach possible?

Given that the services will at least be using different AppDomains, and quite possibly different processes, sharing the same data between all of them would be tricky.
I would personally suggest that you just don't worry about it - unless each record is actually pretty large, 80K records is still going to be fairly small.
You could potentially have an 11th service which is the only one to have the data, and then talk to that service from the other ones. But that's introducing a lot of complexity for very little benefit.
One way of potentially saving memory would be to use a List<T> for a custom type, instead of a DataTable - that may well be more efficient, and would almost certainly be more pleasant to use within the code. It doesn't help if you really need DataTable for whatever you're doing with it, but personally I try to avoid that...

You could create a WCF service that is hosted locally and reads the 80k records into memory.
You would then define an API on the WCF service that contains methods appropriate to whatever calls your 10 windows services need to make.
Doing this would add a level of complexity to your solution that may well not be needed though.

Use the Singleton pattern. Here's a detailed tutorial with C#.

Related

full database table update

I currently have a REST endpoint with basic CRUD operations for a sqlite database.
But my application updates whole tables at a time (with a "save" button)
My current idea/solution is to query the data first, compare the data, and update only the "rows" that changed.
The solution is a bit complex because there are several different types of changes that can be done:
Add row
Remove row
Row content changed (similar to content moving up or down)
Is there a simpler solution?
The most simplest solution is a bit dirty. (Delete table, create table and add each row back)

The solution is a bit complex because there are several different types of changes that can be done:
Add row
Remove row
Row content changed (similar to content moving up or down)
Is there a simpler solution?
The simple answer is
Yes, you are correct.
That is exactly how you do it.
There is literally no easy way to do this.
Be aware that, for example, Firebase entirely exists to do this.
Firebase is worth billions, is far from perfect, and was created by the smartest minds around. It literally exists to do exactly what you ask.
Again there is literally no easy solution to what you ask!
Some general reading:
One of the handful of decent articles on this:
https://www.objc.io/issues/10-syncing-data/data-synchronization/
Secondly you will want to familiarize yourself with Firebase, since, a normal part of computing now is either using baas sync solutions (eg Firebase, usually some noSql solution), or indeed doing it by hand.
http://firebase.google.com/docs/ios/setup/
(I don't especially recommend Firebase, but you have to know how to use it in as much as you have to know how to do regex and you have to know how to write sql calls.)
Finally you can't make realistic iOS apps without Core Data,
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/CoreData/index.html
By no means does core data solve the problem you describe, but, realistically you will use it while you solve the problem conceptually.
You may enjoy realm,
https://realm.io
which again is - precisely - a solution to the problem you describe. (Which is basically the basic problem in typical modern app development.) As with FBase, even if you don't like it or decide not to go with it on a particular project, one needs to be familiar with it.

Support for multiple byte ranges on Azure blob read/write

We need random read (and later write) access to thousands of discrete ranges (each in the order of a few KBs) within very large binary blobs (in the order of 100s of GB). The current APIs force us to submit a single request for each such range. One negative aspect is billing, of course, but the main problem is the client-side and network loads for handling all these requests!
Are there any known ways of avoiding the massive overhead for access patterns like this?
Assume that reformatting the data is not viable, since the access patterns vary. Replicating the data in a multitude of versions optimized for each access pattern variation is also highly undesirable, for several reasons (optimization lead time, storage costs, data management, plus not all access patterns can be predicted - the known ones might not even be used).
Extending the "Range" REST API header to support multiple ranges would be ideal solution, but obviously that's not ours to control.

Unfortunately, there are no other nice ways to do that. The current api(I think you're using get blob api) only supports a single range not multi-ranges and detail is here.
As of now, there is no good workaround for this issue. I see the user voice you submitted, it's a good feedback and already upvoted for it. Hope the MS team can implement it in the future release.

Are there reasons why FTSearch would not be a suitable alternative to DBColumn in a Type-Ahead on an XPage, when trying to improve performance?

I have a general requirement in my current project to make an existing XPage application faster. One thing we looked at was how to speed up some slower type-ahead fields, and one solution to this which seems to be fast, is implementing it using FTSearch rather than the DBColumn we originally had. I want to get advice on whether this would be an OK approach, or if there are any suggestions to do what we need in a different way.
Background:
While there are a number of factors affecting the speed (like network latency, server OS, available server memory etc.), as we are using 8.5.3, we have optimized the application in general as far as we can, making use of the IBM Toolkit to find problem areas, and also using the features IBM added to help with this in 8.5.3 (e.g. Partial Execution, using the optimized JS and CSS option, etc.). Unfortunately we are stuck with the server running on a 32bit Windows OS with 3.5Gb Ram for another few months.
One of the slowest elements to respond are in certain type-aheads which reference a large number of documents. The worst one averages around 5 or 6 seconds before the suggested list appears for a type-ahead enabled field.
It uses SSJS to call a java class to perform a dbcolumn call (using Ferry Kranenburg's XPages Snippet) to get a unique list from a view, then back in SSJS it loops though the array to check if each entry contains the search key value, and if found it adds a highlight (bold) html tag around the search text in the word, then returns the formatted list back to the browser.
I added a print statement to output the elapsed time it takes to run the code, and on average today on our dev server it is around 3250 ms.
I tried a few things to see how we could make this process faster:
Added a Java class to do all processing (so not using SSJS). This only saved an average of 100ms.
Using a view-scoped Managed Bean, I loaded the unique Lookup list into memory when the page is loaded. This produces a really fast type-ahead response (16ms), but I suspect this is a very bad way to do this with a large data set - and could really impact the general server if multiple users were accessing the application. I tried to find information on what would be considered a large object, but couldn't find any guidance or recommendation on how much is too much to store in memory (I searched JSF and XPage sites). Does anyone have any suggestions on this?
Still in a Java class - instead of performing a dblookup to get the 'list' of all values to search through, I have the code run a FT Search to get the doc collection, then loop each doc to extract the field value I want and add those to a 'SortedSet' (which automatically doesn't allow duplicates), then loop the sorted set to insert the bold tags around the search term, and return that to the browser. This takes on average 100ms - which is great and barely noticeable. Are there an drawbacks to this approach - or reasons I should not do it this way?
Thanks for any feedback or advice on this.
Pam.
Update Aug, 14. 2013: I tried another approach (inspired by the IBM/Tony McGuckin Insights application on OpenNtf) as the Company Search type-ahead in that is using managed beans and is fast across a lot of data.
4 . Although the Insights application deals with data split across multiple databases, the principle for the type-ahead is similar. I couldn't use a view with getAllEntriesByKey though as I needed to search for a string within the text too, not just at the start of the entry. I tried creating a ViewEntryCollection based on a view FTSearch, but as we have a lot of duplicate names in the column, this didn't give the unique list I wanted. I then tried using a NotesViewNavigator on a categorized view, and looping through that. This produced the unique list I needed, but it turned out to be slower than any of the other methods above. (I did implement these ViewNavigator performance tips).

From my standpoint, performance may be affected by any of many layers every Domino application (not only XPages) consists of.
From top - browser (DOM, JS, CSS, HTML...), network (latencies, DNS, SSO...) to application layer (effective algorithms, caches), database/API (amount of data, indexes, reader names...) and OS/hardware (disks, memory...)
According to things you tested:
That is interresting, but could be expected: SSJS is cached and may use lower level API to get data (NAPI).
For your environment (32bit/3.5G RAM - I expect your statement about 3.5M is typo) I DO NOT recommend to cache big lists, especially if you apply it as a pattern to many fields/forms/applications. Cache in WeakHashMap could be more stable, though.
Use of FT search is perfectly fine, unless you need data that update frequently. FT index need some time and resources to update.
My suggestion is: go for FT, if it solves your problem. Definitely, troubleshoot FT performance in some heavy performance test on your server first.

(I cannot comment because of my low reputation)
I have recently been tackling with a similar problem. Here are some additional points to consider:
Are there many duplicate keywords in the view? Consider making a categorized view for #DbColumn.
FTSearching a view is often slower than a database, I believe. See Andre Guirard's article. Consider using db.FTSearch() and refining your FT query to include view's selection #Formula, if possible.
The FT index can be updated programmatically with db.updateFTIndex(). If keywords are added rarely, but need to be instantly available, you can perform index update in keyword document's QuerySave event (or similar). We used this approach when the keywords were stored in different (much smaller) database and the update was very fast.
The memory consumption can be checked this way:
Install XPages Toolbox from OpenNTF.
Open your application.
Create a JVM memory dump (Session dumps - Generate Heap Dump).
Install Eclipse Memory Analyzer Tool
Install IBM Diagnostic Tool Framework into Memory Analyzer.
Load your memory dump into MAT. You will see every Java object and their sizes.
In the end, I believe that there is no single general answer to your question. You need to test different approaches to find the fastest solution in your environment.

One problem with FT search is this error:
The full text index for this database is in use
Based on my experience this will occur for a while (maybe a few seconds) when the indexer task starts to index the database. If your users are not very demanding they can just try again and it will probably work.
But in many cases you want to minimize the errors the users get and will have to handle this error nicely. I've built my own FTSearch method which waits a bit and tries again until the error is not received. This will show as slowness to the user instead of error.

Is Monotouch.Dialog a suitable replacement for all UITableviews?

This question is mainly targeted towards Miguel as the creator of MT.Dialog but I would like to hear opinions of others as well.
I'm currently refactoring a project that has many table views. I'm wondering if I should replace All of them with MT.Dialog.
My pros are:
easy to use
simple code
hope that Xamarin will offer it cross platform one day
Cons:
my cells are complete custom made. Does it make sense in that case?
performance? Is that an issue?
breaking the MVC paradigms (source no longer separated from view and controller)
Is it in general better to just use MT.Dialog or inherit from it for specific use cases? What are your experiences?

To address some of your questions.
The major difference between MonoTouch.Dialog and UITableView is that with the former you "load" all the data that you want to render upfront, and then forget about it. You let MonoTouch.Dialog take care of rendering it, pushing views and taking care of sections/elements. With UITableView you need so provide callback methods for returning the number of sections, the titles for the sections and the data itself.
UITableView has the advantage that to render say a million rows with the same size and the same cells, you dont really have to load all the data upfront, you can just wait to be called back. That being said, this breaks quickly if you use cells with different heights, as UITableView will have to query for the sizes of all of your rows.
So in short:
(1) yes, even if you use custom cells, you will benefit from shorter code and a simpler programming model. Whether you use the other features of it or not, is up to you.
(2) For performance, the issue boils down to how many rows you will have. Like I mentioned before, if you are browsing a potentially large data set, you would have to load all of those cells in memory up front, or like TweetStation, add features to load "on-demand".
The reality is that it will consume more memory, because you need to load your data in MonoTouch.Dialog. Your best optimization technique is to keep your Elements very lightweight. Tweetstation for example uses a "TweetElement" that merely holds the ID to the tweet, and loads the actual contents on demand, to keep the size of the TweetElement in memory very small.
With UITableView, you do not pay for that price. But if you are not using a database of some sort, the data will still be in memory.
If your application calls for the data to be in memory, then you might as well move the data to be elements and use that as your model.
(3) This is a little bit of a straw man. Your data "source" is never really independent of UIKit. I know that people like to talk about these models as being reusable, but in practice, you wont ever be able to reuse a UITableViewSource as a source for anything but a UITableView. It's main use is to support scalable controls that do not require data to be loaded in memory up-front, it is not really about separating the Model from the View.
So what you really have is an adaptor class that bridges the world of the UITableView with your actual data model (a database, an XML list, an in-memory array, a Redis connection).
With UITableView, your adaptor code lives in the constructor and the UITableViewSource. With MonoTouch.Dialog your adatpro code lives in the code that populates the initial RootElement to DialogViewController.
So there are reasons to use UITableView over MonoTouch.Dialog, but it is none of those three Cons.

I use MonoTouch.Dialog (and it's brother QuickDialog for objc) pretty much every time I use a tableview. It does help a lot to simplify the code, and gives you a better abstraction of a table.
There's one exception, though, which is when the table will have thousands and thousands of rows, and the data is in a database. MT.D/QD requires you to load all the data upfront, so you can create the sections, and that's simply too slow if you don't already have the objects in memory.
Regarding "breaking MVC", I kind of agree with you. I never really use the reflection bindings in MT.D because of that fact. Usually I end up creating the root from scratch in code, or use something like JSON (in my fork https://github.com/escoz/MonoMobile.Forms), so that my domain objects don't need to know about MT.D.

Strategies for search across disparate data sources

I am building a tool that searches people based on a number of attributes. The values for these attributes are scattered across several systems.
As an example, dateOfBirth is stored in a SQL Server database as part of system ABC. That person's sales region assignment is stored in some horrible legacy database. Other attributes are stored in a system only accessible over an XML web service.
To make matters worse, the the legacy database and the web service can be really slow.
What strategies and tips should I consider for implementing a search across all these systems?
Note: Although I posted an answer, I'm not confident its a great answer. I don't intend to accept my own answer unless no one else gives better insight.

You could consider using an indexing mechanism to retrieve and locally index the data across all the systems, and then perform your searches against the index. Searches would be an awful lot faster and more reliable.
Of course, this just shifts the problem from one part of your system to another - now your indexing mechanism has to handle failures and heterogeneous systems, but that may be an easier problem to solve.
Another factor is how often the data changes. If you have to query data in real-time that goes stale very quickly, then indexing may not be practical.

If you can get away with a restrictive search, start by returning a list based on the search criteria corresponding to the fastest data source. Then join up those records with the other systems and remove records which don't match the search criteria.
If you have to implement OR logic, this approach is not going to work.

While not an actual answer, this might at least get you partway to a workable solution. We had a similar situation at a previous employer - lots of data sources, different ways of accessing those data sources, different access permissions, military/government/civilian sources, etc. We used Mule, which is built around the Enterprise Service Bus concept, to connect these data sources to our application. My details are a bit sketchy, as I wasn't the actual implementor, just an integrator, but what we did was define a channel in Mule. Then you write a simple integration piece to go between the channel and the data source, and the application and the channel. The integration piece does the work of making the actual query, and formatting the results, so we had a generic SQL integration piece for accessing a database, and for things like web services, we had some base classes that implemented common functionality, so the actual customization of the integration piecess was a lot less work than it sounds like. The application could then query the channel, which would handle accessing the various data sources, transforming them into a normalized bit of XML, and return the results to the application.
This had a lot of advantages for our situation. We could include new data sources for existing queries by simply connecting them to the channel - the application didn't have to know or care what data sources where there, as it only looked at the data from the channel. Since data can be pushed or pulled from the channel, we could have a data source update the application when, for example, it was updated.
It took a while to get it configured and working, but once we got it going, we were pretty successful with it. In our demo setup, we ended up with 4 or 5 applications acting as both producers and consumers of data, and connecting to maybe 10 data sources.

Have you thought of moving the data into a separate structure?
For example, Lucene stores data to be searched in a schema-less inverted indexed. You could have a separate program that retrieves data from all your different sources and puts them in a Lucene index. Your search could work against this index and the search results could contain a unique identifier and the system it came from.
http://lucene.apache.org/java/docs/
(There are implementations in other languages as well)

Have you taken a look at YQL? It may not be the perfect solution but I might give you starting point to work from.

Well, for starters I'd parallelize the queries to the different systems. That way we can minimize the query time.
You might also want to think about caching and aggregating the search attributes for subsequent queries in order to speed things up.
You have the option of creating an aggregation service or middleware that aggregates all the different systems so that you can provide a single interface for querying. If you do that, this is where I'd do the previously mentioned cache and parallize optimizations.
However, with all of that it you will need weighing up the development time/deployment time /long term benefits of the effort against migrating the old legacy database to a faster more modern one. You haven't said how tied into other systems those databases are so it may not be a very viable option in the short term.
EDIT: in response to data going out of date. You can consider caching if your data if you don't need the data to always match the database in real time. Also, if some data doesn't change very often (e.g. dates of birth) then you should cache them. If you employ caching then you could make your system configurable as to what tables/columns to include or exclude from the cache and you could give each table/column a personalizable cache timeout with an overall default.

Use Pentaho/Kettle to copy all of the data fields that you can search on and display into a local MySQL database
http://www.pentaho.com/products/data_integration/
Create a batch script to run nightly and update your local copy. Maybe even every hour. Then, write your query against your local MySQL database and display the results.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string