SharePoint 2007: List theory Question

I'm writing a solution around MOSS 2007 and storing fairly large quantities of data in a list.
My first question is: can lists handle large quantities of data, around 200 000 items? I've already read up about it, and it seems the limitation of lists is on the number of items a view can display (2000). So the question is: is this a recommendation or a real limitation? No documentation actually confirms this.
Second question: if it's a physical limitation on how many items a view can display, does this mean that it's impossible to check for duplicates in a SharePoint list that contains vast quantities of data?
To perform a wsList.getListItems you have to pass a view. If the list contains 100 000 records and the view can only contain 2000 records, how is it possible to check for duplicates?

Huge list performance
You may want to read "Scaling to Extremely Large Lists and Performant Access Methods" and "Best Practices for LARGE SharePoint Lists and Documents Libraries".
Another thing these articles do not mention is that you should avoid adding list items with SPList.Items.Add, because on a large list it carries a huge performance penalty. Instead, build an efficient query that returns no items and then add the item to that collection (I read somewhere that the web services perform well when adding items, but I can't find that article any more).
You can also see some tests (or other tests) on how huge lists perform.
As for duplicates
You may want to create a scheduled job (SPJobDefinition) that runs sometime at night and checks for duplicates.
Rather than looping over all SPListItems and querying the list once per item to check for duplicates, it is probably better to get a DataTable (SPListItemCollection.GetDataTable()) for all items and determine the duplicates from that.
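Once the rows are in memory, the duplicate check comes down to grouping them by the fields that define a duplicate. A minimal sketch in Python, assuming the items have already been exported to plain dictionaries (for example from the DataTable). The field names Title and Email and the find_duplicates helper are hypothetical, not part of any SharePoint API:

```python
from collections import defaultdict


def find_duplicates(rows, key_fields):
    """Group rows by the values of key_fields; return only groups with 2+ rows.

    rows: iterable of dicts, one per list item (assumed shape).
    key_fields: column names that together define a duplicate (hypothetical).
    """
    groups = defaultdict(list)
    for row in rows:
        # Normalise text so "Foo " and "foo" count as the same value.
        key = tuple(str(row.get(f, "")).strip().lower() for f in key_fields)
        groups[key].append(row["ID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Illustrative data only.
rows = [
    {"ID": 1, "Title": "Invoice 17", "Email": "a@example.com"},
    {"ID": 2, "Title": "invoice 17 ", "Email": "A@example.com"},
    {"ID": 3, "Title": "Invoice 18", "Email": "a@example.com"},
]
duplicates = find_duplicates(rows, ["Title", "Email"])
```

This makes a single pass over the data instead of issuing one query per item, which is the point of pulling everything into a table first.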
As for views
Filter the items, order them so the relevant ones come first, and define your RowLimit. That's the key for views - you just need the most relevant items, don't you?
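A filtered, ordered, row-limited view can be written as a CAML View fragment. The sketch below only builds the XML string in Python. The Status and Modified field names and the build_view_xml helper are assumptions for illustration, and how the fragment is passed to SharePoint depends on the API you use:

```python
import xml.etree.ElementTree as ET


def build_view_xml(field, value, order_by, row_limit):
    """Build a CAML <View> with an equality filter, a sort order and a RowLimit."""
    view = ET.Element("View")
    query = ET.SubElement(view, "Query")

    # <Where><Eq><FieldRef Name="..."/><Value Type="Text">...</Value></Eq></Where>
    where = ET.SubElement(query, "Where")
    eq = ET.SubElement(where, "Eq")
    ET.SubElement(eq, "FieldRef", Name=field)
    val = ET.SubElement(eq, "Value", Type="Text")
    val.text = value

    # Newest first, so the most relevant rows fit inside the limit.
    order = ET.SubElement(query, "OrderBy")
    ET.SubElement(order, "FieldRef", Name=order_by, Ascending="FALSE")

    limit = ET.SubElement(view, "RowLimit")
    limit.text = str(row_limit)
    return ET.tostring(view, encoding="unicode")


view_xml = build_view_xml("Status", "Open", "Modified", 100)
```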

You can have very large lists, but the performance is going to SUCK.
We had lists with 50,000+ items in a project and we found the best way we could query and process the contents was using SPSiteDataQuery and CrossListQueryCache and formatting the queries in the obscure, annoying SharePoint CAML dialect.

If possible, breaking up the items into containers like folders will help with performance. If one of the list item fields is some type of classification lookup, it could be replaced by putting items in folders named after that classification.


Does Powerapps return the delegatable filtered results, prior to performing the non-delegatable filtering on the app?

I am setting up a large (2000+ records) "task tracking register" using a SharePoint List, and intend to use Powerapps as the UI.
As you would imagine, there are numerous drop-down fields in the list which I would like to use as filters within the Powerapp, but because these are "Complex" fields, they are non-delegatable.
I'm led to believe that I can avoid this by creating additional columns in the SharePoint list and using a Flow to populate them with plain text based on the drop-down value selected.
This is a bit of a pain, so I'd like to limit the number of these helper columns as much as possible.
Can anyone advise whether a Powerapps Gallery will first filter the results being returned using the delegatable functions and then perform the non-delegatable search functions on those items, or whether the inclusion of a non-delegatable search criterion means that the whole query is performed in a non-delegatable manner?
For example: filter 3000 records down to 800 using a delegatable search, then perform the additional filtering of those 800 in the app for the non-delegatable search criteria.
I understand that it may be possible to do this by loading the initial filtered results into a collection within the app and then filtering that collection, but I have read some conflicting information as to the efficacy of this method, so I'm not sure if this is the route I should take.
Delegation can be a challenge. Here are some methods for handling it:
Users rarely need more than a few dozen records at any time in a mobile app. Try to use delegable queries to create a Collection locally. From there, it's lightning fast.
If you MUST pull in all 3k+ of your records, here's my favorite hack. Collect chunks of your data source then combine into a single collection.
If you want the function to scale (along with the user's wait time), you can determine the first and last ID and build the function dynamically.
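The chunking idea can be illustrated outside Power Apps. This is a minimal Python sketch, not Power Apps formula code. It assumes each request returns at most `limit` rows (2,000 is used here as an assumed cap) and that filtering on the numeric ID column can be done by the data source. fetch_range is a hypothetical stand-in for the data-source call:

```python
def id_ranges(first_id, last_id, limit):
    """Yield inclusive (low, high) ID windows that each span at most `limit` IDs."""
    low = first_id
    while low <= last_id:
        high = min(low + limit - 1, last_id)
        yield low, high
        low = high + 1


def collect_all(fetch_range, first_id, last_id, limit=2000):
    """Fetch every window in turn and combine the results into one collection."""
    collection = []
    for low, high in id_ranges(first_id, last_id, limit):
        collection.extend(fetch_range(low, high))
    return collection


# Fake data source with IDs 1..3200 (illustrative only).
records = [{"ID": i} for i in range(1, 3201)]


def fetch_range(low, high):
    # Stand-in for a server-side filter on ID between low and high.
    return [r for r in records if low <= r["ID"] <= high]


windows = list(id_ranges(1, 3200, 2000))
everything = collect_all(fetch_range, 1, 3200)
```

Because the windows are derived from the first and last ID, the number of requests grows with the list, and so does the time the user waits.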
Good luck!

SharePoint - document libraries, lists, views and number of elements

I have a lot of documents I want to store in a document library in SharePoint 2010. We're talking about 50k+ documents. I've worked with document libraries many times, but not of this size and I find myself getting confused about some definitions when it comes to how these should be stored and the number of elements allowed.
The documentation I found says that a document library can hold up to 30 million documents. Nice! 50k is nowhere near 30 million. However, can I just dump all of the documents into a library without grouping them into views or subfolders? A view can only have 5k elements, so would I have to create views and spread the documents across many of them in order not to exceed this limit?
Now, the documents, and the library, will most likely never be browsed by going to the library. Each document will be linked from another place, and even that will not happen often. Therefore I am kind of hoping I can just dump all the documents into one big library. I have read that if the number of elements in a list exceeds 5k, SharePoint will not run the query to return everything, but will instead substitute some default query. In my case this is fine, but are there other concerns about dumping this many files into one library in SharePoint 2010? And is there anything else I may not have thought about?
Also, a quick question at the end: I am planning on scripting the upload using PowerShell, but I have heard from others that uploading documents to SharePoint this way could take a lot of time because it does it one document at a time. Is it possible to "bulk upload" documents through PowerShell or another approach?
The key here is to understand that SharePoint can STORE up to 30 million documents, but can only display 5,000 at a time. The easiest way to stay within that would be to dump the documents into separate folders with no more than 5,000 documents in each folder. It's easy to do that, but I'm not a big fan of folders since they impose a single organizational structure on a set of documents. Applying metadata and then filtering views is more efficient in the long run, but much harder to do when dumping documents into a library. I would suggest looking at some of the third-party migration software that can do this kind of bulk upload and still maintain appropriate metadata. One I've used (there are others) is Metalogix Content Matrix.
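If you go the folder route, the split can be planned before anything is uploaded. A minimal Python sketch, assuming a flat list of file names. The batch-NNN folder naming and the assign_folders helper are made up for illustration, and the upload itself is not shown:

```python
def assign_folders(file_names, max_per_folder=5000, prefix="batch"):
    """Map folder name -> files so that no folder holds more than max_per_folder."""
    plan = {}
    for index, name in enumerate(sorted(file_names)):
        # Files 0..4999 go to batch-001, 5000..9999 to batch-002, and so on.
        folder = f"{prefix}-{index // max_per_folder + 1:03d}"
        plan.setdefault(folder, []).append(name)
    return plan


# 52,000 fake file names (illustrative only).
files = [f"doc{n:05d}.pdf" for n in range(52000)]
plan = assign_folders(files)
```

Note that this groups purely by count, which is exactly the single imposed structure mentioned above. Grouping by a metadata value instead would need a check that no group exceeds the limit.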

showing search results more efficiently?

I want to implement the auto-complete feature provided by various e-commerce stores. The functionality is pretty simple: when you type some characters, it starts showing relevant suggestions.
I implemented it using Solr (django-haystack), with the autocomplete method provided in haystack.query.SearchQuerySet. Basically, I get a list of results sorted by score and show the top n results as suggestions.
The Solr document contains $product_name, $category_name and other fields, so the results I generate look like a list of "$product_name in $category_name".
The problem arises when I change a category name. If I change the category name, I have to update all the products belonging to that category to reflect the change in the auto-complete (that is, update all documents in Solr for products of this category).
Another way to do this is to put just the id of the category with the product in the Solr document. In that case, I have to do a look-up for the category name each time, which is not efficient.
Is there any other efficient way to do this?
Since you are changing the underlying data, the same change has to be propagated to Solr.
There are different approaches to do this:
Update the database, and reindex - Pros: Simple enough, Cons: Indexing time can be large.
Update database and Solr in tandem - Pros: Quick updates, almost instantaneous, Cons: Can lead to data inconsistency (if one update fails)
Update database, and schedule a delta-import in Solr. This is like a middle ground between the two above.
I would recommend the 3rd approach, but this would require some upfront schema design. Read more about delta import here, in context of DataImportHandler.
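The idea behind a delta import is to re-index only the documents affected by changes since the last run. The Python sketch below shows that selection logic only. It is not DataImportHandler configuration, and the record shapes, the modified timestamps and the docs_to_reindex helper are assumptions for illustration:

```python
from datetime import datetime


def docs_to_reindex(products, categories, last_indexed):
    """Return Solr-style docs for products touched since last_indexed.

    A product is affected if it changed itself or if its category changed
    (for example, the category was renamed).
    """
    changed_categories = {c["id"] for c in categories if c["modified"] > last_indexed}
    category_names = {c["id"]: c["name"] for c in categories}
    docs = []
    for p in products:
        if p["category_id"] in changed_categories or p["modified"] > last_indexed:
            docs.append({
                "id": p["id"],
                "product_name": p["name"],
                "category_name": category_names[p["category_id"]],
            })
    return docs


# Illustrative data only.
last_indexed = datetime(2024, 1, 1)
categories = [
    {"id": 1, "name": "Footwear", "modified": datetime(2024, 1, 5)},  # renamed
    {"id": 2, "name": "Hats", "modified": datetime(2023, 12, 1)},
]
products = [
    {"id": 10, "name": "Trail shoe", "category_id": 1, "modified": datetime(2023, 11, 1)},
    {"id": 11, "name": "Sun hat", "category_id": 2, "modified": datetime(2023, 11, 2)},
    {"id": 12, "name": "Beanie", "category_id": 2, "modified": datetime(2024, 1, 3)},
]
docs = docs_to_reindex(products, categories, last_indexed)
```

This is where the upfront schema design comes in: both tables need a reliable last-modified column for the selection to work.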

Suggest me best solutions to edit /update one billion records in SharePoint list

I have a question: we have to handle a huge volume of data, around one billion records, that should be loaded into a SharePoint list. After loading, users need to be able to edit and update the records.
Suggest me best solutions to edit /update one billion records in SharePoint list
Ramesh Reddy
1,000,000,000 records? Good luck with that! You may want to revisit the decision to use SharePoint lists as the underlying data store and look at a database instead with some very well constructed indexes.
In case you're a masochist, though:
Working with Large Lists in Office SharePoint Server 2007
If it's 2010, BCS could be used, and it's available in all versions of SharePoint. I completely agree with all the others, though: you need to rethink your approach, and it sounds like you need to leverage a proper DB. If you do need to surface the data in SharePoint, BCS and External Content Types in 2010 are a godsend.
A SharePoint list is the wrong storage choice here. A quick quote from the Working with Large Lists in Office SharePoint Server 2007 (bottom of the first paragraph under "Test results and findings"):
"The maximum number of items supported in a list with recursive folders is 5 million items."
Perhaps this list is more logically several lists dumped into one?
If you need to expose data through SharePoint, use a regular DB and maybe read about SharePoint Business Data Catalog?

Best Practice for Accessing SharePoint List Item through Object Model

I would like to know the best practices you follow when it comes to accessing SharePoint list items / document libraries using the Object Model. To start, let me share a few things I have found.
Limit the number of Items Per container to 2K items.
Use the ProcessBatchData method of SPWeb to update/insert large numbers of items
To completely answer your question would require a full blog post. There are several of these out there on the IntraWebs already.
Here are a few of the major points:
Avoid iterating through the entire list unless you need to see every item
If you do iterate through the list, use a foreach loop instead of a for loop
In all other cases use an SPQuery or an SPSiteDataQuery
Access columns by the internal name or the field ID
You should also take a look at Common Coding Issues When Using the SharePoint Object Model as it has some examples on how to avoid serious performance problems.
