XPages: Unite views from 'X' databases in one page - xpages

I am facing the following challenge in an XPage: There are three databases with exactly the same views in it. The goal is to unite these three views from the three databases in one XPage and one view component!
AFAIK, one can usually provide just one view per view component. Currently, I have a Java back end where the documents are fetched. They are then processed to HTML markup and made more beautiful / functional by using jQuery data tables.
I see (at least) three disadvantages:
It is quite some code and if you want to display another view from the databases you quickly run into boiler plate code...
It is not too fast as it takes up to 30 sec. to fetch and display all records.
I can hardly image that my way is best practice.
Has anyone ever faced this challenge? I would like to reduce Java code, make it faster and use some standard component if possible.

Tim has good questions in his comment. With your current approach make sure you use ViewNavigator cache which is the fastest way to retrieve view entries:
Notes/Domino Release 8.52 or greater
View.setAutoUpdate must be False
ViewNavigator cache must be enabled
ViewNavigator.getNext() (or getPrev) must be used
http://www-10.lotus.com/ldd/ddwiki.nsf/dx/Fast_Retrieval_of_View_Data_Using_the_ViewNavigator_Cache

Related

Best way to build a custom table linked to a NotesView with sorting and paging

I have a view in my Xpage application that contains a lot of elements. From this view I need to build a table with custom rows (I can't just display the view, I need to build the rows to display myself because I need to compute data from other database, things that you can't do directly in a view).
In order to do so I know that I can use Dataview, Datatable or repeat control (other ideas maybe?). For sure I can't bring all the data on the client, it's way too much.
I am looking for a solution that will allow me to do paging (easy to do with the pager component) but more important sorting on header click. To be clear, I need sorting for all the entries of the view and not only for the current displayed page on the client.
What can be the more efficient way to do so ? I really have a lot of data to compute so I need the fastest way to do it.
(I can create several views with different sorting criteria if needed).
Any repeating control can have pagers applied to it. Also View Panels can include data not in the current view - just set the columnName property to blank and compute the value property. bear in mind you will not be able to sort on those column though - they're not columns, they're values computed at display time.
Any computed data is only computed for the entries currently shown. So if you have 5000 entries in the view but are only displaying 30 at a time, the computed data will only be computed for the current 30.
If your users need to be able to sort on all columns and you have a lot of data, basically they have to accept that they're requirements mean all that data needs computing when they enter the view...and whenever it's updated, by themselves or any other users. That's never going to be quick, and the requirements are the issue there, not the architecture. RDBMS may be better as a back-end, if that's a requirement, as long as the data doesn't go across multiple tables. Otherwise graph database structure may be a better alternative.
The bigger question is why the users need to sort on any column. Do the users really want to sort on the fifth column and then scroll down to entries beginning with a "H"? Do they want to sort on the fourth column and scroll down to entries for May 2014? On a Notes Client, that's a traditional approach, because it's easier than filtering. But usually users know what they're looking for - they don't want entries beginning "H", they want entries where the department is HR. If that's the case, sorting on all columns and paging is not the most efficient method either from a database design or a usability point of view.
To keep the processing faster and lightweight, I use JSON with JQuery DataTables.
Depending on the Data-size and usage, JSON could be generated on the fly or scheduled basis and saved in Lotus Notes Documents or ApplicationScope variables.
$.each(data, function(i, item) {
dataTable.row.add( [data[i].something1,data[i].something2,data[i].something3])
});
You can compute a viewColumn but if you have a lot going on I wouldn't go that route.
This is where Java in XPages SHINE!
Build a Java object to represent your row. So in java use backend logic to get all the data you need. Let's say you have a report of Sales Orders for a a company. And sales orders is pulling data from different places. Your company object would have a method like:
List<salesOrder> getOrders() {}
so in the repeat you call company.getOrders() and it returns all the rows that you worked out in java and populated. So your "rowData" collection name in the repeat can access all the data you want. Just build it into a table.
But now the sorting... We've been using jQuery DataTables to do just this.. It's all client side... your repeat comes down and then the DataTables kicks in and can make everything sortable... no need to rely on views.. works great...
Now it's all client side but supports paging and works pretty decent. If you're just pumping out LOTS of records - 6,000+ then you might want to look at outputting the data as json and taking advatange of some server cacheing... We're starting to use it with some really big output.. LOTS of rows and it's working well so far. Hopefully I'll have some examples on NotesIn9.com in the near future.

Replicating pre-calculated views in CouchDB/Couchbase

When i first query a CouchDB/Couchbase view it needs to be calculated. This can take a good while if there are large number of docs and that for each single view..
Is there any way of replicate an already calculated view from one Couch to another?
Not directly through CouchDB replication, no, there's all sorts of practical complexities in how that would have to be implemented that make it impractical I'm afraid.
For starters it means that CouchDBs have to carefully manage replication of view calculation of changes simultaneously somehow exactly in sync with the actual data (so you don't ever get newer view calculations than data), and that then gets further complicated by the fact that views only get updated when requested, so view data on either end could be out of date (and if users are querying with stale=ok, it might even be required to stay out of date).
I believe you can do it by directly copying the view index files (in /var/lib/couchdb/.DBNAME_design/SOMEHASH.view by default I think), if you just need a once-off view sync. I'd recommend against doing that frequently as a general solution though, since it's not officially supported AFAIK and is likely to be pretty fragile.
This isn't directly the answer to your question, although as PimTerry pointed out, replicating the view index is not supported, especially between different implementation.
What you can do instead is follow the procedure described here:
http://wiki.apache.org/couchdb/How_to_deploy_view_changes_in_a_live_environment
This way you can have your couchdb calculate the new index "in background" without blocking the usage of your application.
Hope this helps.

Are there reasons why FTSearch would not be a suitable alternative to DBColumn in a Type-Ahead on an XPage, when trying to improve performance?

I have a general requirement in my current project to make an existing XPage application faster. One thing we looked at was how to speed up some slower type-ahead fields, and one solution to this which seems to be fast, is implementing it using FTSearch rather than the DBColumn we originally had. I want to get advice on whether this would be an OK approach, or if there are any suggestions to do what we need in a different way.
Background:
While there are a number of factors affecting the speed (like network latency, server OS, available server memory etc.), as we are using 8.5.3, we have optimized the application in general as far as we can, making use of the IBM Toolkit to find problem areas, and also using the features IBM added to help with this in 8.5.3 (e.g. Partial Execution, using the optimized JS and CSS option, etc.). Unfortunately we are stuck with the server running on a 32bit Windows OS with 3.5Gb Ram for another few months.
One of the slowest elements to respond are in certain type-aheads which reference a large number of documents. The worst one averages around 5 or 6 seconds before the suggested list appears for a type-ahead enabled field.
It uses SSJS to call a java class to perform a dbcolumn call (using Ferry Kranenburg's XPages Snippet) to get a unique list from a view, then back in SSJS it loops though the array to check if each entry contains the search key value, and if found it adds a highlight (bold) html tag around the search text in the word, then returns the formatted list back to the browser.
I added a print statement to output the elapsed time it takes to run the code, and on average today on our dev server it is around 3250 ms.
I tried a few things to see how we could make this process faster:
Added a Java class to do all processing (so not using SSJS). This only saved an average of 100ms.
Using a view-scoped Managed Bean, I loaded the unique Lookup list into memory when the page is loaded. This produces a really fast type-ahead response (16ms), but I suspect this is a very bad way to do this with a large data set - and could really impact the general server if multiple users were accessing the application. I tried to find information on what would be considered a large object, but couldn't find any guidance or recommendation on how much is too much to store in memory (I searched JSF and XPage sites). Does anyone have any suggestions on this?
Still in a Java class - instead of performing a dblookup to get the 'list' of all values to search through, I have the code run a FT Search to get the doc collection, then loop each doc to extract the field value I want and add those to a 'SortedSet' (which automatically doesn't allow duplicates), then loop the sorted set to insert the bold tags around the search term, and return that to the browser. This takes on average 100ms - which is great and barely noticeable. Are there an drawbacks to this approach - or reasons I should not do it this way?
Thanks for any feedback or advice on this.
Pam.
Update Aug, 14. 2013: I tried another approach (inspired by the IBM/Tony McGuckin Insights application on OpenNtf) as the Company Search type-ahead in that is using managed beans and is fast across a lot of data.
4 . Although the Insights application deals with data split across multiple databases, the principle for the type-ahead is similar. I couldn't use a view with getAllEntriesByKey though as I needed to search for a string within the text too, not just at the start of the entry. I tried creating a ViewEntryCollection based on a view FTSearch, but as we have a lot of duplicate names in the column, this didn't give the unique list I wanted. I then tried using a NotesViewNavigator on a categorized view, and looping through that. This produced the unique list I needed, but it turned out to be slower than any of the other methods above. (I did implement these ViewNavigator performance tips).
From my standpoint, performance may be affected by any of many layers every Domino application (not only XPages) consists of.
From top - browser (DOM, JS, CSS, HTML...), network (latencies, DNS, SSO...) to application layer (effective algorithms, caches), database/API (amount of data, indexes, reader names...) and OS/hardware (disks, memory...)
According to things you tested:
That is interresting, but could be expected: SSJS is cached and may use lower level API to get data (NAPI).
For your environment (32bit/3.5G RAM - I expect your statement about 3.5M is typo) I DO NOT recommend to cache big lists, especially if you apply it as a pattern to many fields/forms/applications. Cache in WeakHashMap could be more stable, though.
Use of FT search is perfectly fine, unless you need data that update frequently. FT index need some time and resources to update.
My suggestion is: go for FT, if it solves your problem. Definitely, troubleshoot FT performance in some heavy performance test on your server first.
(I cannot comment because of my low reputation)
I have recently been tackling with a similar problem. Here are some additional points to consider:
Are there many duplicate keywords in the view? Consider making a categorized view for #DbColumn.
FTSearching a view is often slower than a database, I believe. See Andre Guirard's article. Consider using db.FTSearch() and refining your FT query to include view's selection #Formula, if possible.
The FT index can be updated programmatically with db.updateFTIndex(). If keywords are added rarely, but need to be instantly available, you can perform index update in keyword document's QuerySave event (or similar). We used this approach when the keywords were stored in different (much smaller) database and the update was very fast.
The memory consumption can be checked this way:
Install XPages Toolbox from OpenNTF.
Open your application.
Create a JVM memory dump (Session dumps - Generate Heap Dump).
Install Eclipse Memory Analyzer Tool
Install IBM Diagnostic Tool Framework into Memory Analyzer.
Load your memory dump into MAT. You will see every Java object and their sizes.
In the end, I believe that there is no single general answer to your question. You need to test different approaches to find the fastest solution in your environment.
One problem with FT search is this error:
The full text index for this database is in use
Based on my experience this will occur for a while (maybe a few seconds) when the indexer task starts to index the database. If your users are not very demanding they can just try again and it will probably work.
But in many cases you want to minimize the errors the users get and will have to handle this error nicely. I've built my own FTSearch method which waits a bit and tries again until the error is not received. This will show as slowness to the user instead of error.

Does a B Tree work well for auto suggest/auto complete web forms?

Auto suggest/complete fields are used all over the web. Google has appeared to master it given that as soon as one types in a search query, suggestions are returned almost instantaneously.
I'm assuming the framework for achieving this involves a fast, in-memory data store on the web tier. We're building a Grails app based around retail products, so a user may search for Can which should suggest things like Canon, Cancun, etc, and wondering if a Java B-tree cached in memory would suffice for quick auto completes returned as JSON over AJAX. Outside of the jQuery AutoComplete field, do any frameworks and/or libraries exist to facilitate the development of this solution?
Autocomplete is a text matching, information retrieval problem. Implementing your own B-tree and writing your own logic to match words to other words is something you could do. But then you would have to implement Porter Stemming, a Vector Space Model, and a String-edit distance calculation.
...or you could use Lucene and its derivatives, which do a lot of this stuff already. If you really care about the data structures used to store this stuff, you could dive into its source. But I highly doubt writing your own and doing it all yourself would be more maintainable and efficient in the long run.
One of the more popular Grails ecosystem plugins for this is Searchable, which was mentioned in Ledbrook & Smith's Grails in Action. It uses Lucene under the covers, and make sit pretty easy to add full-text search to your domain classes. (For example, check out chapter 8 in GinA or the searchable docs).
The Grails Richui plugin has an autocomplete that I've used in the past. We had it hooked up to hit the database every keystroke (which I would not suggest but our data changed often enough that real-time data was required). If your list of things is pretty static though then it could probably work well for you.
http://grails.org/plugin/richui#AutoComplete

SharePoint InfoPath best practices for persisting large forms

I am currently architecting a large SharePoint deployment.
This deployment has the potential to grow to petabytes in size over the course of several years.
One of the current issues we are discussing is the option of storing our data in SharePoint using InfoPath Forms. Some of these forms contain hundres of fields and require a lot of mapping to content types for persistence and search. Our search requirement is primarily a singular identifier and NOT the contents of the forms, although I am told I should preempt the "want" to search in the future.
We require our information to be utilised for secondary purposes (such as reporting etc). The information MUST be accessible instantly after persisting to the system.
My core questions therefore are:
What are the benefit/risks of this approach compared to storing
our data in a singular relational store using web-service
persistence?
If we decided on this approach what would be the
impact of changing the forms, content-types over time?
What happens when our farm grows beyond a single web-application / site collection how accessible will the information be?
Will I know where it is and how portable will the information be overtime?
1.)
Benefit:
Form templates can be created & deployed (relatively) easy
You can easily configure Field Validation
Probably no code involved
Risks:
Hitting SharePoint 2010 Limits (not so uncommon as you might
think)
Needs careful form design/planning (correct XML structure)
Information only accessible via SharePoint Object model or
WebService's (very slow)
2.) Well this is a tough one. Changing the form template and re-deploying is easy and only takes a few minutes. However changing the structure (underlying XML) of the template can get you in trouble very easily, because older (filled out) forms will be invalid - there is an option to "upgrade" older forms out-of-the-box, but in my experience it never worked as it supposed to.
Content Types behave very similar, say you want to delete a column from a content type because it's no longer needed - you'll have to remove all references to it, which means removing all items so you can delete the column.
3.) Well portability is definitly an issue with InfoPath, because it heavily relies on the corresponding URL structure. You absolutely can add more site collections, but this means you have to deploy your form template to each site collection. Information (filled out forms) can't easily be shared between site collection's because each form contains the SourceURL (where it came from) and the Namespace of the template (which changes constantly once you deploy).
Considering your requirements, i would strongly recommend a relational store instead of InfoPath - simply because it is not designed to be a data storage.
I would use a SQL database to store the data and a custom UI (WebPart or Application Page) to perform CRUD operations. This means that the information is not actually stored in SharePoint - just displayed (which also means that it can't be searched with the builtin SharePoint Search). There is also the possibilty to use the Business Connectivity Services (which basically does all of the above without you needing to create a custom UI - however very slow with large amount of data).
If you do need the information just in SharePoint, why not just make all this happen with Lists only?
This is going to be a long one and may not have an answer just because there's no silver bullet for what you're looking for. It's mostly insight and ultimately the choice is up to you.
the option of storing our data in SharePoint using InfoPath Forms
This statement throws me a little. SharePoint data is stored in SharePoint (well, SQL technically) but InfoPath is just a UI layer for accessing any part of that data.
Some of these forms contain 100s of fields and require alot of mapping to content types for persistence and search
From this I assume there are multiple forms which would mean different types of data being accessed (and probably different purposes). Hundreds of fields is no problem and it really boils down to managing the form and view design.
From the form side you should check out cxpartners form design crib sheet. This gives you a nice standard to follow to manage all those fields. Another thing would be to look at breaking the form up in tabs or views itself (in InfoPath) based on what the user needs to fill out. Basically it breaks down to not creating a form with 100s of fields on one massively scrolling screen the user will just freak out over.
Same with the views on the form or document library you're storing the form data in. InfoPath forms are just xml stored in a library (so regardless of how many fields you have, the footprint is pretty minimal). You don't want to map and surface every field in the form nor do you want to have a view with 100 columns on it. You should look at breaking down the views as they're fit for purpose, with only a few hundred items in each view with a few columns. It's a balancing act too as you don't want to create 100s of views either so you need to find out what's right. A good B.A. or Information Architect will help with this with the SharePoint/InfoPath guru and business user helping out.
We require our information to be utilised for secondary purposes (such as reporting etc). The information MUST be accessible instantly
This is another requirement that's going to be a little difficult to meet exactly. If the library has thousands of items (or 10s of thousands) and a view has dozens of fields then expect the view to come to crawl (especially if the user is insistent on "seeing everything" and wants the limits of each view to be set to 1000 items, like anyone could process that much information at once). Instant access is difficult if you're keeping everything online for a long time (like for reporting). There's the operational side where users are filling out forms, finding forms, editing them, etc. and for that you only want a few hundred items to be live at any given moment (up to a few thousand but you need to be careful on the views). If you have a list with 100,000 items in it and users are using this for daily activities and trying to run reports for trending or long term operations against it, you're going to lose the performance battle. Look at doing reporting offline, potentially shipping the data that's reportable to a second source like SQL and using SSRS against it. Performance Point is an option but adds a layer of complexity to the architecture. The question will ultimately fall to what reporting looks like and how important is it in relation to daily operations.
To try to answer your questions directly:
The benefits to using SharePoint over a database are that the data can be easily viewed and sliced and diced up. Creating a view is child's play and can quickly show you useful information like # of sales in a month or customer feedback grouped by call centre person. SharePoint makes it easy to view this information and even setup dashboards, hook in KPIs, etc. without having to get some developer to craft custom web pages. As far as risks go, you need to be careful with letting things grow organically and out of control. Don't let the users design views of data, they generally want something but not sure and will ask for all columns to be available which they just export to Excel to slice and dice. Make sure there's a good design around the views and lists and they're fit for purpose and meet what needs the user is trying to get out of the data. Ask the question of what they're looking for and why, that will help shape what to expose.
Any change needs to be thought out and planned and tested. It's no different in SharePoint if you add a column to a list as you would by adding a column to a SQL database. Form updates should be considered and while you won't get it 100% right the first time, you should try to get as much as possible without going overboard and putting in crazy things like 100 "blank" fields that are players to be named later. Strike a balance by understanding the needs of the users and company and where things are going. Hopefully someone will have a vision of what this thing might be when it grows up and that'll go a long way to understanding the impact of change.
Data is just xml and as long as you're not doing stupid stuff in the form like hard coding absolute paths to services (use data connection libraries) the impact of growth will be minimal. Growing beyond a web application into multiple ones is a pretty big change and not something to be taken lightly. Even splitting site collections out is big and there needs to be a really good reason for this. Site collections can handle thousands of sites and millions of documents without issue. Web applications are really there for dividing up areas of interest or separation of purpose (like team sites on one web app and a publishing portal on another) and not really meant for splitting data due to growth concerns.
Like I said, there's no silver bullet here and what you're asking for is an architecture for a solution that nobody here has all the requirements for. Hope this helps.

Resources