I am observing a rather strange issue with the SharePoint REST API (SharePoint Online as well as on-premises).
We have a fairly large library (~100,000 documents) with a multi-level folder structure. I have two additional fields on the folders, let's call them CountryA and CountryB; both fields are indexed.
What I observe is the following:
/Items?$filter=(substringof('Ukraine',CountryA))&$select=ID (150 records)
/Items?$filter=(substringof('Spain',CountryB))&$select=ID (250 records)
/Items?$filter=(substringof('Ukraine',CountryA)) and (substringof('Spain',CountryB))&$select=ID (100 records)
and now the very strange thing:
/Items?$filter=(substringof('Ukraine',CountryA)) or (substringof('Spain',CountryB))&$select=ID
throws a Microsoft.SharePoint.SPQueryThrottledException.
To be honest, this doesn't make any sense to me; it almost looks like a bug.
As I don't have much time, I "sorted" the issue by running both single-field REST queries in parallel and joining the results on the client side.
Anyhow, any feedback is really appreciated. Since I am now actually increasing the load on SharePoint by firing two REST calls in parallel from the server, I just can't believe this is the "right" way to go.
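For reference, a minimal sketch of that client-side workaround (assuming a standard _api list endpoint; the site URL and list title are placeholders, and CountryA/CountryB are the fields from above):

```typescript
// Run both single-field queries in parallel and union the result IDs on the client.
const base =
  "https://tenant.sharepoint.com/sites/demo/_api/web/lists/getbytitle('Documents')/Items";

async function getIds(filter: string): Promise<number[]> {
  const url = `${base}?$filter=${encodeURIComponent(filter)}&$select=ID&$top=500`;
  const res = await fetch(url, {
    headers: { Accept: "application/json;odata=nometadata" },
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const body = await res.json();
  return body.value.map((item: { ID: number }) => item.ID);
}

async function getUkraineOrSpainIds(): Promise<number[]> {
  // Two independent single-field queries avoid the throttled OR query.
  const [a, b] = await Promise.all([
    getIds("substringof('Ukraine',CountryA)"),
    getIds("substringof('Spain',CountryB)"),
  ]);
  return [...new Set([...a, ...b])]; // the union the OR query would have returned
}
```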
If the CountryA and CountryB columns are indexed, you can use the REST URL below:
/Items?$top=500&$filter=(substringof('Ukraine',CountryA) or substringof('Spain',CountryB))&$select=ID
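If the combined result spans more than one page, one option (a hedged sketch only; the odata.nextLink property name assumes the nometadata JSON format) is to follow the server-side paging links:

```typescript
// Follow the paging links until all matching IDs have been collected.
async function getAllIds(firstPageUrl: string): Promise<number[]> {
  const ids: number[] = [];
  let next: string | undefined = firstPageUrl;
  while (next) {
    const res = await fetch(next, {
      headers: { Accept: "application/json;odata=nometadata" },
    });
    if (!res.ok) throw new Error(`Request failed: ${res.status}`);
    const body = await res.json();
    ids.push(...body.value.map((item: { ID: number }) => item.ID));
    next = body["odata.nextLink"]; // absent on the last page
  }
  return ids;
}
```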
TLDR of the question:
Is it possible to use Graph to query a SharePoint list, which contains lookups that would need to be fetched from a different SharePoint list?
The "old" SharePoint API can do that in one request.
Follow up question as a result of my attempts to work around that limitation:
Why does Graph not allow me to ask for multiple list entries by ID?
This makes literally no sense to me.
Background for the question:
I've been given the task to move a small SharePoint app from the normal SharePoint API over to the Graph API, so the features could be expanded into incorporating Exchange too. I've never worked with either prior to this, so I didn't really have any idea what I was getting into.
And while I have succeeded in finding equivalent Graph queries for everything that was needed so far, I am also starting to doubt that Graph is seriously intended to be used for SharePoint access.
Lists are the best example. The SharePoint API offers resolving LookupId values when requesting multiple items.
Graph doesn't even offer that when requesting an item directly, let alone multiple.
To make things worse, after I wrote my own lookup routine that picks out the lookup columns, and after having to manually tell it where to find the values for them, I discovered that Graph won't even let me request multiple items by ID...
At first I tried to chain id eq '<id>' filter clauses, because even $batch requests are limited to 20 individual requests, which caps the number of items I could look up that way.
But filtering 'id' is apparently unintended.
https://graph.microsoft.com/v1.0/sites/{site}/lists/{list}/items?$filter=id+eq+'67'
results in "General exception while processing", which I've never even seen as a response until this.
I then tried the in keyword:
https://graph.microsoft.com/v1.0/sites/{site}/lists/{list}/items?$filter=id+in+('67')
which results in "Invalid request".
After that I thought I could be smart and add a calculated column which copies the item's ID and put an index on that, but guess what: you can't set an index on that column in the first place, AND it also refuses filtering on it on top.
It doesn't even offer the header workaround for querying unindexed columns; it outright complains that the field isn't usable.
With all this, I feel like I will have to settle for a hybrid approach, unless I'm seriously missing something here.
I thought that having to write my own LookupId resolver was bad, but being unable to even optimize the requests so that all matching items from a list are returned in one request, and instead having to request every single item INDIVIDUALLY, because filtering by ID is forbidden and the ONLY access by ID is singular, just gives me the feeling that Graph was never meant to be used for SharePoint lists at all.
Is there an actual question here?
Microsoft has been recommending Graph for working with SharePoint Online for a while now. Though it is true there are shortcomings at present, Microsoft is constantly investing in improving Graph whereas older methods like CSOM are deprecated and are no longer updated.
If there is a sufficient reason why you have to filter by ID instead of calling GET .../items/{id} as many times as needed, you may try to filter by fields/ID, but don't use the $ prefix on the filter parameter in that case - it won't work - so try
GET https://graph.microsoft.com/v1.0/sites/{site}/lists/{list}/items?filter=fields/ID eq 67
with the header Prefer: HonorNonIndexedQueriesWarningMayFailRandomly set.
Also, getting multiple items from large lists may cause issues, so for that case I'd add &top=5000
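A hedged sketch of that call (the header name and the filter-without-$ spelling are taken from the suggestion above; the helper name, token handling and site/list IDs are illustrative assumptions):

```typescript
// Fetch list items whose fields/ID matches the given ID via Microsoft Graph.
async function getItemsByFieldId(
  token: string,
  site: string,
  list: string,
  id: number,
): Promise<unknown[]> {
  const filter = encodeURIComponent(`fields/ID eq ${id}`);
  const url =
    `https://graph.microsoft.com/v1.0/sites/${site}/lists/${list}/items` +
    `?filter=${filter}&top=5000`;
  const res = await fetch(url, {
    headers: {
      Authorization: `Bearer ${token}`,
      // Allows filtering on a non-indexed column, as suggested above.
      Prefer: "HonorNonIndexedQueriesWarningMayFailRandomly",
    },
  });
  if (!res.ok) throw new Error(`Graph request failed: ${res.status}`);
  return (await res.json()).value;
}
```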
I have Express running in a Node.js server, which serves as a backend for my React frontend application.
The frontend application fetches data from the backend (which is stored in Mongo) through a REST call and displays this data in a table.
The amount of data is growing by the day, so I thought I should look into reducing the amount of data transferred to the frontend application, to avoid unnecessary strain on the backend.
I'm not sure if this is the right way to approach this, but I've been thinking I would look into having the backend fetch a limited number of entries, so that only these data will be displayed in the frontend table.
The problem arises with searching - when the user wants to search the data in the table, I'll need to be able to search through all entries, not just the data loaded into the table.
I guess one option would be to have the search function actually query the REST API, instead of searching the table itself.
If I'm on the right track, I guess I could implement REST API pagination, along the lines of the example found in https://refactoringfactory.wordpress.com/2012/09/08/pagination-in-node-js-and-express/. Other suggestions on how to implement pagination are welcome.
I'd very much like some input on the approach I described, and suggestions for smarter ways to implement this.
EDIT: I changed the title somewhat to include "Infinite scroll pagination". This is what I'm looking to implement. At the moment I have a click-through page pagination setup, but I would like to replace it with infinite scroll pagination.
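As a rough illustration of what I mean by the infinite-scroll side (a sketch only; the endpoint /api/items?p=<page>&l=<pageSize> and the field names are made up), something like the following React component could load the next page whenever a sentinel element below the table scrolls into view:

```tsx
import { useEffect, useRef, useState } from "react";

interface Item {
  id: string;
  name: string;
}

export function ItemTable() {
  const [items, setItems] = useState<Item[]>([]);
  const [page, setPage] = useState(1);
  const sentinel = useRef<HTMLDivElement>(null);

  // Fetch the next page whenever the page number advances.
  useEffect(() => {
    fetch(`/api/items?p=${page}&l=50`)
      .then((res) => res.json())
      .then((next: Item[]) => setItems((prev) => [...prev, ...next]));
  }, [page]);

  // Advance the page number when the sentinel below the table becomes visible.
  useEffect(() => {
    const node = sentinel.current;
    if (!node) return;
    const observer = new IntersectionObserver((entries) => {
      if (entries[0].isIntersecting) setPage((p) => p + 1);
    });
    observer.observe(node);
    return () => observer.disconnect();
  }, []);

  return (
    <div>
      <table>
        <tbody>
          {items.map((item) => (
            <tr key={item.id}>
              <td>{item.name}</td>
            </tr>
          ))}
        </tbody>
      </table>
      <div ref={sentinel} />
    </div>
  );
}
```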
I've been thinking I would look into having the backend fetch a limited number of entries, so that only these data will be displayed in the frontend table.
This is common practice in my experience. The term for it is "pagination." Have a look at this SO question regarding best practices for pagination in REST APIs: API pagination best practices.
The problem arises with searching - when the user wants to search the data in the table, I'll need to be able to search through all entries, not just the data loaded into the table.
I guess one option would be to have the search function actually query the REST API, instead of searching the table itself.
Again, you got it. Doing small filters/searches on the client is fine for a limited number of entries, but if you need to only retrieve items matching search criteria in the first place, then adding that functionality to your REST API is the right choice.
Right, you should do
pagination: you might implement it by exposing two arguments on the listing endpoint of your REST API
?p=<number>: page number, defaults to 1
?l=<number>: number of items per page / page length, defaults to a number maybe from 10 to 100
search: implement it by exposing one argument on the listing endpoint
/?q=<string>: you can define this to be whatever you want, maybe a string that is matched against one or multiple fields of the data
If you want to minimize the network traffic, you might also add one more parameter to explicitly select the fields you want to be returned, like this
/?f=<string>: the string could be something like id,name,age, so the API returns only those three fields per record.
All these parameters should be accepted by the list endpoint of your RESTful API.
Example:
http://example.com/api/cars/?p=2&l=15&q=toyota&f=id,brand,model,color
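As a rough sketch only (Express plus the official MongoDB driver; the collection and field names are made up for illustration, not a definitive implementation), such a listing endpoint could look like this:

```typescript
import express from "express";
import { MongoClient } from "mongodb";

const app = express();
const client = new MongoClient("mongodb://localhost:27017");
const cars = client.db("shop").collection("cars");

app.get("/api/cars", async (req, res) => {
  // p: page number (1-based), l: page length, both clamped to sane bounds.
  const page = Math.max(parseInt(String(req.query.p ?? "1"), 10) || 1, 1);
  const limit = Math.min(
    Math.max(parseInt(String(req.query.l ?? "25"), 10) || 25, 1),
    100,
  );

  // q: simple case-insensitive search against a couple of fields.
  const q = typeof req.query.q === "string" ? req.query.q : undefined;
  const filter = q
    ? { $or: [{ brand: new RegExp(q, "i") }, { model: new RegExp(q, "i") }] }
    : {};

  // f: comma-separated list of fields to return, e.g. f=id,brand,model.
  const projection: Record<string, 1> = {};
  if (typeof req.query.f === "string") {
    for (const field of req.query.f.split(",")) projection[field.trim()] = 1;
  }

  const items = await cars
    .find(filter, { projection })
    .skip((page - 1) * limit)
    .limit(limit)
    .toArray();

  res.json(items);
});

async function main() {
  await client.connect();
  app.listen(3000);
}

main();
```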
I have a lot of documents I want to store in a document library in SharePoint 2010. We're talking about 50k+ documents. I've worked with document libraries many times, but not of this size and I find myself getting confused about some definitions when it comes to how these should be stored and the number of elements allowed.
By looking here: http://technet.microsoft.com/en-us/library/cc262787%28v=office.14%29.aspx#ListLibrary it says that a document library can hold up to 30 million documents. Nice! 50k is nowhere near 30 million. However, can I just dump all of the documents into a library without grouping them in views or subfolders? Because a view can only hold 5k elements, and then I would have to create views and distribute the documents across many views in order not to exceed this limit.
Now, the documents, and the library, will most likely never be browsed by going to the library. Each document will be linked from another place, and this will also not happen that often. Therefore I am kind of hoping I can just dump all the documents into one big library. I have read that if the number of elements in a list exceeds 5k, SharePoint will not run the query as given, but will instead replace it with some default query. In my case this is fine, but are there other concerns about dumping this many files into one library in SharePoint 2010? And is there anything else I may not have thought of?
Also, a quick question at the end: I am planning on scripting the upload using PowerShell, but I have heard from others that uploading documents this way to SharePoint can take a lot of time because it uploads one document at a time. Is it possible to "bulk upload" documents through PowerShell or another approach?
The key here is to understand that SharePoint can STORE up to 30 million documents, but can only display 5,000 at a time. The easiest way to stay within that limit would be to dump the documents into separate folders with no more than 5,000 documents in each folder. It's easy to do that, but I'm not a big fan of folders since they impose a single organizational structure on a set of documents. Applying metadata and then filtering views is more efficient in the long run, but much harder to do when dumping documents into a library. I would suggest looking at some of the third-party migration software that can do this kind of bulk upload and still maintain appropriate metadata. One I've used (there are others) is Metalogix Content Matrix.
Take, for instance, an e-commerce store with catalog and price data in different web services. Now, we know that Solr does not allow partial updates to a document field (JIRA bug), so how do you index these two services?
I had three possibilities, but I'm not sure which one is correct:
Partial update - not possible
Solr join - have price and catalog in separate indexes and join them in Solr. You can't join them in your client-side code without screwing up pagination and facet counts. I don't know if this is possible pre-Solr 4.0.
Have some sort of intermediate indexing service, which composes an entire document based on the results from both these services and sends it for indexing. However, there are two problems with this approach:
3.1 You can still compose documents partially, and then, when the document is complete, set a flag indicating that it is a complete document. However, to do this, each time a document has to be indexed, the service first has to check whether the document exists in the index, edit it, and push it back. So, a big performance hit.
3.2 Your intermediate service checks whether a particular ID is available from all services - if not, it silently drops it and hopes that when it appears in the other service, the first service will already be populated. This is OK, but it means that an item is not available in search until all fields are available (not always desirable - if you don't have a price, you can simply mark it out-of-stock and still have it available).
Of all these methods, only #3.2 looks viable to me - does anyone know how you do this kind of thing with DIH? Because now you have two different entry points (two different web services) into indexing, and each has to check the other.
The usual way to solve this is close to your 3.2: write code that creates the document you want to index from the different available services. The usual flow would be to fetch all the items from the catalog, then fetch the prices while indexing. Whether you want items from the catalog that don't have prices available to show up in search depends on your business rules for the service. If you want to speed up the process (fetch product, fetch price, repeat), expand the API to fetch 1,000 products and then the prices for all of those products at the same time.
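A hedged sketch of that flow, assuming hypothetical catalog and price services and a Solr core with the JSON update handler (all URLs and field names here are illustrative, not from the original post):

```typescript
interface CatalogItem {
  id: string;
  name: string;
  category: string;
}

interface Price {
  id: string;
  price?: number;
}

// Fetch catalog entries and their prices in bulk, compose one complete Solr
// document per item, and post the batch to Solr's JSON update handler.
async function indexBatch(ids: string[]): Promise<void> {
  const catalog: CatalogItem[] = await (
    await fetch(`https://catalog.example.com/items?ids=${ids.join(",")}`)
  ).json();
  const prices: Price[] = await (
    await fetch(`https://prices.example.com/prices?ids=${ids.join(",")}`)
  ).json();

  const priceById = new Map<string, number | undefined>();
  for (const p of prices) priceById.set(p.id, p.price);

  // A missing price marks the item out-of-stock instead of dropping it.
  const docs = catalog.map((item) => ({
    id: item.id,
    name: item.name,
    category: item.category,
    price: priceById.get(item.id) ?? null,
    in_stock: priceById.get(item.id) !== undefined,
  }));

  await fetch("http://localhost:8983/solr/products/update?commit=true", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(docs),
  });
}
```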
There is no reason why you should drop an item from the index if it doesn't have price, unless you don't want items without prices in your index. It's up to you and your particular need what kind of information you need to have available before indexing the document.
As far as I remember 4.0 will probably support partial updates as it moves to the new abstraction layer for the index files, although I'm not sure it'll make your situation that much more flexible.
Approach 3.2 is the most common, though I think about it slightly differently. First, think about what you want in your search results, then create one Solr document for each potential result, with as much information as you can get. If it is OK to have a missing price, then add the document that way.
You may also want to match the documents in Solr, but get the latest data for display from the web services. That gives fresh results and avoids skew between the batch updates to Solr and the live data.
Don't hold your breath for fine-grained updates to be added to Solr and Lucene. It gets a lot of its speed from not having record-level locking and update.
I have only been working with SharePoint for three months, but right from the start I was told that the SharePoint content DB was off limits, as MS could change the schema at any time. The recommended route is to use the object model, and in most cases I kind of understand that.
Now I need to join some lists in order to present the content grouped by some specific fields. Rather than iterating through each and every list, I would prefer to link our own DB, which resides on the same DB server, to the WSS content DB and just create a view on the tables. This view would live in our DB, to make sure that we don't change ANYTHING in the WSS content DB.
Am I on the route to eternal damnation or not?
Yes, you are. Microsoft is very clear that any modification of the SharePoint tables renders your environment unsupported.
Direct modification of the SharePoint database or its data is not recommended because it puts the environment in an unsupported state.
Now, creating a link on your own DB which queries the SharePoint DB is shaky ground. Personally I'd do one of two things:
If this is a mission-critical application, run it past MSFT support.
If it is anything else, just make sure that your view is not locking the DB during querying.
A better strategy might be to iterate the lists and sync it to your own table so you can do whatever kind of data-mining you'd like - if you don't mind whatever lag time your sync routine would need.
SharePoint pretty much relies on total "ownership" of the underlying database.
Small things like another process reading from the SharePoint database could potentially slow down SharePoint's operations in unexpected ways.
As SharePoint does not usually update in a "real time" manner, it should be good enough to create a process that queries the SharePoint lists and adds the data to a table in your own application.
Schedule the crawl for a low-activity period, and voilà: a solution that is not going to risk unexpected slowdowns to SharePoint.
Start your search on querying SharePoint at SPQuery.
Check out SLAM, the SharePoint List Association Manager. It allows you to easily push your SharePoint data to SQL, including complex joins (one to one, one to many, many to many). And it keeps the data synchronized in real time.
http://slam.codeplex.com
Well, if the joins you need to do are pretty simple, defining a linked data source in SharePoint Designer may work for you.