I have been working with Guidewire application version 6.0. How would you paginate an extremely large dataset in the app server?
Example: consider the Payment entity. Currently the PCF brings back all the payments present in the claim to the screen, and the number of results displayed in the UI is reduced to 3 by specifying pageSize=3. Now I would like to implement the same concept through pagination in the database, via a chunking query, in order to increase system stability.
List views have a built-in row iterator which even allows you to specify the number of rows displayed on each page.
When you configure your row iterator, there's a parameter called "pageSize":
when you set 0, paging is disabled
when you set a non-zero number, that will be the number of elements on a single page
If you mean pagination in the UI, @SebastianJ's answer is correct; if you are talking about the query level, you need something like this:
var partitionSize = 1000
var rows = Query.make(InvoiceItem).select() // lazy result set over all invoice items
// partition the result set into chunks of partitionSize entities
var rowPartitions = com.google.common.collect.Iterables.partition(rows, partitionSize).iterator()
while (rowPartitions.hasNext()) {
  var invoiceItems = rowPartitions.next().toTypedArray() // one chunk of up to partitionSize InvoiceItems
  ...
}
I'm evaluating the feasibility of replacing parts of our SQL database with Firestore, and so far it's been a pleasure!
I'm wondering what's the best way to provide bookmarkable links to a given page in a list? I've successfully implemented paging, and found out that I need to maintain
lastVisible: firebase.firestore.DocumentSnapshot[] = []
to be able to traverse back and forth the paginated list like this:
if (this._paginator.pageIndex - 1 in this.lastVisible) {
  return ref.orderBy('date', 'desc').startAfter(this.lastVisible[this._paginator.pageIndex - 1]).limit(this._paginator.pageSize)
} else {
  return ref.orderBy('date', 'desc').limit(this._paginator.pageSize)
}
If I understand correctly, I can feed either the whole doc or the value used by the index to the startAfter and startAt methods. Either way, to provide a deep link for the list opened at page 245, I need to pass 245 values in the URL to pull this one off?
Or else I need to re-query all the items from page 0 to page xxx and record all the last items?
Any thoughts on how best to tackle this one?
Is there any way to just use numeric indexes that can be calculated from the page and page size?
I have a question about the right way to use pagination with Alfresco.
I know the documentation (https://wiki.alfresco.com/wiki/4.0_JavaScript_API#Search_API)
and I use the query part with success.
By that I mean that I use the parameters maxItems and skipCount, and they work the way I want.
This is an example of a query that I am doing:
var paging =
{
    maxItems: 100,
    skipCount: 0
};
var def =
{
    query: "cm:name:test*",
    page: paging
};
var results = search.query(def);
The problem is that, if I get the number of results I asked for (100, for example), I don't know how to get the maxResults of my query (I mean the total number of results that Alfresco could give me for this query).
And I need this to:
know if there are more results
know how many pages of results remain
I'm using a workaround for the first need: I query for maxItems+1 items and show only maxItems. If I get maxItems+1 results, I know there are more. But this doesn't give me the total number of results.
Do you have any ideas?
With the JavaScript search object you can't know if there are more items. This JavaScript object is backed by the class org.alfresco.repo.jscript.Search.java. As you can see, the query method only returns the query results without any extra information. Compare it with org.alfresco.repo.links.LinkServiceImpl, which gives you results wrapped in PagingResults.
So, as the JavaScript search object doesn't provide hasMoreItems info, you need a workaround: for instance, first query without limits to get the total, then apply pagination as desired.
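If dropping down to the Java foundation API is an option for you, that layer does expose the total. Here's a minimal sketch under that assumption - the PagedSearch class name and the bean wiring are mine, but SearchParameters, maxItems/skipCount and ResultSet.getNumberFound() are the standard foundation API:
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchParameters;
import org.alfresco.service.cmr.search.SearchService;

public class PagedSearch
{
    private SearchService searchService; // injected Alfresco foundation service

    public void showPage(String query, int pageSize, int pageNumber)
    {
        SearchParameters sp = new SearchParameters();
        sp.addStore(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
        sp.setLanguage(SearchService.LANGUAGE_LUCENE);
        sp.setQuery(query);                      // e.g. "@cm\\:name:test*"
        sp.setMaxItems(pageSize);                // same role as maxItems in the JS API
        sp.setSkipCount(pageNumber * pageSize);  // same role as skipCount in the JS API

        ResultSet results = searchService.query(sp);
        try
        {
            long total = results.getNumberFound(); // total hits, not just this page
            int totalPages = (int) Math.ceil((double) total / pageSize);
            // render results.getNodeRefs() and a pager based on totalPages here
        }
        finally
        {
            results.close(); // ResultSets hold index resources and must be closed
        }
    }
}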
You can find how many objects your query matched simply by calling
results.length
paying attention to the fact that queries usually have a configured maximum result set of 1000 entries to save resources.
You can change this value by editing the <alfresco>/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/repository.properties file.
So, as an alternative to your solution, you can launch a query with no constraints and obtain the real total (or the configured maximum).
Then you can use this value to work out how many pages are available, basing the calculation on the number of results per page.
Then dynamically pass the number of the current page to the builder of your query def, and the results variable will contain the corresponding chunk of data.
In this SO post you can find more information about pagination.
I have a domino database with the following view:
Project_no Realization_date Author
1/2005 2015-01-02 Alex/Acme
3/2015 2015-02-20 John/Acme
33/2015 2016-06-20 Henry/Acme
44/2015 2015-02-13 John/Acme
...
Now I want to get all projects from this view whose Project_no starts with e.g. "3" (partial match), sort them by Realization_date descending, and display the first 1000 of them on an XPage.
The view is large - some selections can give me 500,000 documents.
The FT search view option is not acceptable because it returns only 5,000 docs.
Creating an ArrayList or ListMap resulted in a Java out-of-memory exception (the Java Domino objects are recycled). Increasing the memory may help, of course, but we have 30k users, so it may be insufficient.
Do you have any ideas how I can achieve this?
I think the key is going to be what the users want to do with the output, as Frantisek says.
If it's for an export, I'd export the data without sorting, then sort in the spreadsheet.
If it's for display, I would hope there's some paging involved, otherwise it will take a very long time to push the HTML from the server to the browser, so I'd recommend doing an FT Search on Project_no and Realization_date between certain ranges and "chunking" your requests. You'll need a manual pager to load the next set of results, but if you're expecting that many, you won't get a pager that calculates the total number of pages anyway.
Also, if it's an XAgent or displaying everything in one go, set viewState="nostate" on the relevant XPage. Otherwise, every request will get serialized to disk. So the results of your search get serialized to disk / memory, which is probably what's causing the Java memory issues you're seeing.
Remember the FT_MAX_SEARCH_RESULTS notes.ini variable can be amended on the server to increase the default maximum of 5000.
500,000 is a very high set of results and is probably not going to be very user-friendly for any subsequent actions on them. I'd probably also recommend restricting the search, e.g. forcing a separate entry of the "2015" portion or preventing entry of just one number, so it has to be e.g. "30" instead of just "3". That may also mean amending your view so the Project_no format displays as @Right("0000" + @Left(Project_no,"/"), 4), so users don't get 3, 30, 31, 32....300, 301, 302...., but can search for "003" and find just 30, 31, 32..., 39. It really depends on what the users want to do, and it may require a bit of thinking outside the box to quickly give them access to the targeted set of documents they want to action.
I would optimize the data structure for your view. For example, make an ArrayList of "view entry" objects that represent the minimum information from your view - it mimics the index. The "view entry" is NOT a Notes object (Document, ViewEntry), but a simplified POJO that holds just enough information to sort it (via a comparator) and to show or look up the real data - for example the Subject column to be shown, and the UNID to make a link that opens the document.
This structure should fit into a few hundred bytes per document. The troublesome part is populating the structure - even with a ViewNavigator it may take minutes to build such a list.
Proper recycling should be ok but...
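To make the snapshot idea concrete, here's a minimal sketch in plain Domino Java - the EntrySnapshot name, the buildIndex method and the column positions are mine; ViewNavigator, getColumnValues(), getUniversalID() and recycle() are the standard API:
import lotus.domino.*;
import java.util.*;

// Minimal snapshot of one view row: just enough to sort and to open
// the real document later via its UNID.
class EntrySnapshot {
    String unid;
    String projectNo;
    Date realizationDate;
}

public class ViewIndexBuilder {
    static List<EntrySnapshot> buildIndex(View view, String prefix) throws NotesException {
        List<EntrySnapshot> index = new ArrayList<EntrySnapshot>();
        ViewNavigator nav = view.createViewNav();
        ViewEntry entry = nav.getFirst();
        while (entry != null) {
            Vector<?> cols = entry.getColumnValues(); // assumed order: Project_no, Realization_date
            String projectNo = (String) cols.get(0);
            if (projectNo.startsWith(prefix)) { // partial match, e.g. prefix = "3"
                EntrySnapshot s = new EntrySnapshot();
                s.unid = entry.getUniversalID();
                s.projectNo = projectNo;
                s.realizationDate = ((DateTime) cols.get(1)).toJavaDate();
                index.add(s);
            }
            ViewEntry next = nav.getNext(entry);
            entry.recycle(); // recycle as you go so memory stays flat
            entry = next;
        }
        // Sort by Realization_date descending; the UI then shows index.subList(0, 1000).
        Collections.sort(index, new Comparator<EntrySnapshot>() {
            public int compare(EntrySnapshot a, EntrySnapshot b) {
                return b.realizationDate.compareTo(a.realizationDate);
            }
        });
        return index;
    }
}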
You could also "revert" to classic Domino URLS for ex ?yourviewname?ReadViewEntries&startkey=3&outputformat=JSON and render that JSON via Javascript UI component of some kind
If the filtering is based on a partial match on the first sorted column, there's a pure Domino-based solution. It requires that the Domino server be 8.5.3 or newer (View.resortView was introduced in 8.5.3) and that the realization_date column has click-to-sort enabled.
Create a filtered collection with getAllEntriesByKey( key, false ) <-- partial match
Call view.resortView( "name_of_realization_date_column" )
Create a collection of all entries, now sorted by realization_date
Intersect the sorted collection with the filtered collection. This gives you the entries you want sorted by realization_date. E.g. sortedCollection.intersect( filteredCollection )
Pseudocode:
..
View view = currentDb.getView( "projectsView" );
view.setAutoUpdate( false );
ViewEntryCollection filteredCollection = view.getAllEntriesByKey( userFilter, false ); // false = partial match
// Use index where view is sorted by realization_date
view.resortView( "realization_date" );
// All entries sorted by realization_date
ViewEntryCollection resortedCollection = view.getAllEntries();
resortedCollection.intersect( filteredCollection );
// resortedCollection now contains only the entries in filteredCollection, sorted by realization_date
..
I'm not certain whether this would be faster than building a custom data structure, but I think it's worth testing :)
Background (i.e. what the heck is a relative complement?)
Relative Complement
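In set terms, the relative complement of B in A (written A \ B) is everything in A that is not in B. A tiny, CRM-agnostic Java illustration with made-up names:
import java.util.*;

public class RelativeComplementDemo {
    public static void main(String[] args) {
        // A = contacts that own at least one car
        Set<String> withCars = new HashSet<String>(Arrays.asList("alice", "bob", "carol"));
        // B = contacts that own at least one vehicle that isn't a car
        Set<String> withNonCars = new HashSet<String>(Arrays.asList("bob"));

        // A \ B = contacts that own cars and nothing but cars
        Set<String> onlyCars = new HashSet<String>(withCars);
        onlyCars.removeAll(withNonCars);

        System.out.println(onlyCars); // prints alice and carol (in some order)
    }
}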
What I'm trying to do
Let's say I've got a custom Vehicle entity that has a VehicleType option set that is either "Car" or "Truck". There is a one-to-many relationship between Contact and Vehicle (i.e. ContactId is on the Vehicle entity). How do I write an XRM query (LINQ to CRM, QueryExpression, FetchXML, whatever) that returns the contacts with only cars?
Option 1:
I'd prefer a modification of the proposal that AdamV makes above. I can't think of a way that you'd get this particular query answered using LINQ to CRM, QueryExpressions, or FetchXML alone. Daryl doesn't say what the client is, but I would suppose that if LINQ and QueryExpressions were acceptable offerings, .NET is on the table. Creating aggregate fields on the parent entity (Contact in this case) containing the count of the related entity offers more than the Boolean option: if the query requirements ever changed to a threshold (more than X cars, fewer than Y trucks, between X and Y total vehicles), the Boolean option fails to deliver. The client in this question isn't known, but I can't think of many (any?) cases where pulling all the records to the client on a set of 500K+ rows is more efficient than the SQL query that CRM would make on your behalf against several integer fields with range clauses.
Upside:
Maintains client purity in Query approach
Simple client query
Probably as performant as possible
Downside:
Setup of the aggregate fields
Workflow or plugin to manage the increment and decrement of the aggregate fields
SQL script for the initial load of the aggregates
Risk that aggregate fields get out of sync (workflow or plugin fails)
Option 2:
If purity within the client isn't essential and .NET is on the table - skip the aggregate fields and the setup, and just run SQL against the views. If you don't want to work with ADO.NET, a thin ORM like Dapper, Massive, or PetaPoco can still give you an object model. As Andreas offers in his comment on the OP's first answer, it seems like something fairly trivial to do in SQL.
Sketching something from top of mind:
SELECT c.*
FROM Contact c
WHERE c.ContactId IN (
    -- contacts that own at least one car
    SELECT v.ContactId
    FROM Vehicle v
    GROUP BY v.ContactId, v.Type
    HAVING v.Type = 'Car' AND COUNT(v.ContactId) > 0
)
AND c.ContactId NOT IN (
    -- contacts that own at least one vehicle that isn't a car
    SELECT v.ContactId
    FROM Vehicle v
    GROUP BY v.ContactId, v.Type
    HAVING v.Type <> 'Car' AND COUNT(v.ContactId) > 0
)
Upside:
Much less work
CRM Entities get left alone
Downside:
Depending on the client and/or the application mixing DataAccess methods is a bit kludgy.
Likely less performant than Option 1
Option 3:
Mix and match: take the aggregate fields from Option 1, but update them using a scheduled SQL job (or something similar) with a query similar to the initial-load job you'd need to write in Option 1.
Upside:
Takes most of the work and risk out of Option 1
Keeps all of the performance of Option 1
Downside:
Some will see this as an unsupported feature.
In order to perform a true relative complement query you need to be able to perform a subquery.
Your query would basically say: give me all the contacts with cars, and then, within those results, remove any contacts that have a vehicle that isn't a car. This is what the SQL in @JasonKoopmans' answer does. Unfortunately, CRM does not support subqueries.
Therefore, the only way to achieve this is either to perform the subquery on the client side, as I resorted to doing, or to store the results of what would be the subquery in a manner that can be accessed through the main query (i.e. storing counts on the Contact entity).
You could theoretically do this "on the fly" by making a SubQueryResult entity that stores a ContactId and a SubQueryId. You'd first pull back the contacts that have at least 1 car, and create a SubQueryResult record for each, with its ContactId and a single SubQueryId that is generated client-side to tie them all together.
Then you'd do another query that says give me all the contacts that are in this SubQueryResult with this SubQueryId, that do not have any vehicles that aren't cars.
I could only assume that this wouldn't be any more efficient than performing the two separate queries and filtering client-side. Although with the new ExecuteMultipleRequest in the new CRM release, it may be close.
I have resorted to pulling back all of my records in CRM, and performing the check on the client side since CRM 2011 doesn't support this via Query Expressions.
You could write two Fetch XML statements, one to return all contacts and the count of their vehicles, and another to return all contacts and the count of their cars, then compare the list on the client side. But once again, you're having to return every contact and filter it client side.
It's not tested, but how about this query expression? I'm linking in the Vehicle entity as an inner join, requiring that it's a Car. I'm assuming that the field VehicleType is a string because I'm a bit lazy and don't want to test it (I'm typing this hardcore style, no compilation - pure brain work).
Optionally, you might want to add a Criteria section as well to control which of the Contact instances that actually get retrieved. Do tell how it went!
Sorry for the verbosity. I know you like it short. My brains work better when circumlocutory.
new QueryExpression
{
EntityName = "contact",
ColumnSet = new ColumnSet("fullname"),
LinkEntities =
{
new LinkEntity
{
JoinOperator = JoinOperator.Inner,
LinkFromEntityName = "contact",
LinkFromAttributeName = "contactid",
LinkToEntityName = "vehicle",
LinkToAttributeName = "contactid",
Columns = new ColumnSet("vehicletype"),
EntityAlias = "Vroom",
//LinkCriteria = { Conditions =
//{
// new ConditionExpression(
// "vehicletype", ConditionOperator.Equal, "car")
//} }
LinkCriteria = { Conditions =
{
new ConditionExpression(
"vehicletype", ConditionOperator.NotEqual, "truck")
} }
}
}
};
EDIT:
I've talked to my MVP Gustaf Westerlund and he's suggested the following work-around. Let me stress that it's not an answer to your original question. It's just a way to solve it. And it's cumbersome. :)
So, the hint is to add a flag on the Contact or Person entity. Then, every time you create a new instance of Vehicle, you need to fire a message and, using a plugin, update the information on the former about the creation of the latter.
This has several drawbacks.
It requires us to do stuff.
It's not the straightforward do-this-and-that type of approach.
Maintenance is higher for every new type of Vehicle one adds.
Buggibility is elevated since there are many cases to regard (what happens to the flagification when a Vehicle instance is reassigned, deleted, etc.).
So, my answer to your question is changed to: "can't be done". This remains effective until (gladly) proven wrong by presented alternative solution. Duck!
Personally, I'd fetch (almost) everything and unleash the hounds of LINQ onto it. But I'd do that neither smiling nor proud. :)
I'm developing a service which consumes CRM 2011 data via dynamic entities (as in, Microsoft.Xrm.Sdk.Entity, the late-binding method). I'm deliberately not using the Xrm.cs method (early binding) in an attempt to keep my solution generic.
Also, I want to avoid connecting to a CRM database directly (e.g. EDMX) as this would stop my solution being usable for a hosted CRM (e.g. with no direct DB access).
I have the following (simplified) requirement, I'm really struggling with the selection criteria:
A random 7% of records needs to be selected (and updated).
In SQL, the selection criteria would be relatively easy - I know how to select a random percentage of records. Something like:
SELECT TOP 7 PERCENT *
FROM
(
    SELECT TOP 1000 NEWID() AS Foo, [someColumns]
    FROM [someTable]
) AS Bar
ORDER BY Bar.Foo ASC
This works perfectly. I gather the LINQ equivalent is something like:
from e in someEntities
orderby Guid.NewGuid()
select e;
There's a problem though, I don't know of a way to use LINQ with CRM 2011 dynamic entities - instead they insist on using either some restrictive QueryExpression classes/syntax, or fetchXML, as seen on this page (MSDN).
I've identified the following options for fulfilling this requirement:
Using dynamic entities, return the whole record set into a List, then simply choose a random selection by index (see the sketch after this list). This however involves returning up to 10,000 records over an internet data service, which may be slow, insecure, etc.
Use a fetchXML statement. Unfortunately I don't know fetchXML, so I don't know if it's possible to do things like COUNT, TOP, PERCENT or NEWID().
Use Xrm.cs and LINQ, or use a Stored Procedure, or a SQL view. All of these options mean tying the solution down to either direct database connectivity and/or early binding, which is not desirable.
Say no to the customer.
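For option 1, once the rows are on the client, the random selection step itself is straightforward. A minimal sketch of just that step, in Java terms (the CRM retrieval is omitted, and samplePercent is a made-up helper):
import java.util.*;

public class RandomSample {
    // Return a random sample of roughly pct (e.g. 0.07 for 7%) of the given records.
    static <T> List<T> samplePercent(List<T> records, double pct) {
        List<T> copy = new ArrayList<T>(records);
        Collections.shuffle(copy); // uniform random order
        int n = (int) Math.ceil(copy.size() * pct);
        return copy.subList(0, n);
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<Integer>();
        for (int i = 0; i < 10000; i++) ids.add(i);
        System.out.println(samplePercent(ids, 0.07).size()); // 700
    }
}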
Any advice would be greatly appreciated! Can fetchXML perform this query? Is there a better way to do this?
FetchXML does not support this, so you are down to either 1 or 3. And you are right, 3 would only work in the On Premise version, as you can't connect directly to SQL with the CRM Online product. However, that's the one I would go with unless you are absolutely sure the customer will be moving to CRM Online. If you must go with 1, you can at least limit the returned columns to only be the GUID of the record to decrease the payload size. Then when you select your random records, just go get their additional columns if needed (of course this could end up being slower due to "chattiness" depending on how many random records you are dealing with).
Dynamics CRM 2011, at this point, can't give you the degree of querying power that SQL and other LINQ providers can give, so I really believe you'll want to say no to the customer and move to the on-premise version if he/she wants that kind of flexibility.
With that said, a variant of method #1 is to, rather than fetch all rows at once and then choose your random set, fetch a random set from the entity one row at a time until you have the number of rows you want. The downside of this method is that instead of one call to the DB, there are many, which slows down the overall retrieve speed. A POC is below.
As for #2, I believe it's possible to handle your request, with some degree of success, using fetchXml. In fact, the only way to get aggregated data is by using fetchXml, and it also supports paging.
As for #3, native SQL is your best bet to get everything you want out of your data at this point, but that notwithstanding, while the LINQ provider is limited, it's a lot easier to transition SQL statements to LINQ than to fetchXML, and it does support late-binding/dynamic entities.
//create a list of random numbers
List<int> randomNumbers = new List<int>();
//declare a percentage of records you'd like to retrieve
double pctg = 0.07;
//use FetchXML to count the # of rows in the table
string fetchXml = @"<fetch aggregate='true'>
                      <entity name='salesorder'>
                        <attribute name='salesorderid' aggregate='count' alias='countIds' distinct='false' />
                      </entity>
                    </fetch>";
EntityCollection result = _service.RetrieveMultiple(new FetchExpression(fetchXml));
int rowCount = int.Parse(result.Entities[0].FormattedValues["countIds"].Replace(",", ""));
//initialize the random number list for paging; a single shared Random avoids repeated seeds
//(note: duplicates are still possible; dedupe if the exact sample size matters)
Random random = new Random();
for (int i = 0; i < Math.Ceiling(pctg * rowCount); i++)
{
    randomNumbers.Add(random.Next(rowCount - 1));
}
randomNumbers.Sort();
//page through the rows one at a time until you have the number of rows you want
using (OrganizationServiceContext osc = new OrganizationServiceContext(_service))
{
foreach (int r in randomNumbers)
{
foreach (var er in (from c in osc.CreateQuery("salesorder")
//not especially useful to use the orderby option as you can only order by entity attributes
//orderby c.GetAttributeValue<string>("name")
select new
{
name = c.GetAttributeValue<string>("name")
}).Skip(r).Take(1))
{
Console.WriteLine(er.name);
}
}
}