SharePoint List Scalability - sharepoint

I am particularly interested in Document Libraries, but in terms of general SharePoint lists, can anyone answer the following...?
1. What is the maximum number of items that a SharePoint list can contain?
2. What is the maximum number of lists that a single SharePoint server can host?
3. When the number of items in the list approaches the maximum, does filtering slow down, and if so, what can be done to improve it?

In SharePoint v.2:
Max # list items : 2000 (per folder level)
Max lists per site : 2000 is a "reasonable" number
Effect when we reach the limit : Exponential degradation of performance.
More info: http://technet.microsoft.com/en-us/library/cc287743.aspx
In SharePoint v.3:
Max # list items : 2000 (per view; you can have millions of items as long as you don't display more than 2000 items in a single view)
Max lists per site : 2000 is a "reasonable" number
Effect when we reach the limit : Exponential degradation of performance when we enumerate more than 2000 items using the object model (OM). An alternative is to use the Search API or CAML queries.
More info: http://technet.microsoft.com/en-us/library/cc287790.aspx

Link To Resource

The whitepaper I found most useful was linked from the resource posted by user mbowles above.
The direct link is...
http://go.microsoft.com/fwlink/?LinkId=95450&clcid=0x409

http://blah.winsmarts.com/2008-4-SharePoint_limits.aspx
For #3: you can index specific columns in a list, but you should still keep list sizes down.

The exact answers have already been given, however I do feel I should add this warning:
This might be one of those situations where if you have to ask, you can't afford it.
So if you find yourself approaching the limits posted earlier, think long and hard about what you are trying to do, and make sure you're not doing it wrong.

The "2000 limit" expressed above doesn't hold.
Please take a look at this document for more exact answers to your questions.
I have personally verified what the document describes.

SharePoint 2013:
What is the maximum number of items that a SharePoint list can contain? 30,000,000.
What is the maximum number of lists that a single SharePoint server can host? I could not find the answer to this question.
When the number of items in the list approaches the maximum, does filtering slow down, and if so, what can be done to improve it? The maximum of 30,000,000 is not a hard boundary, but the limit was established by testing the product, so a performance penalty is to be expected if it is exceeded.
Answers found in Software boundaries and limits for SharePoint 2013.

Beware: the performance of SiteDataQuery degrades heavily the more subsites you have. A hundred subsites can take 20 seconds to query.

There is documented guidance for Microsoft® Office SharePoint® Server 2007 regarding the maximum size of lists and list containers. For typical customer scenarios in which the standard Office SharePoint Server 2007 browser-based user interface is used, the recommendation is that a single list should not have more than 2,000 items per list container. A container in this case means the root of the list, as well as any folders in the list; a folder is a container because other list items are stored within it. A folder can contain items from the list as well as other folders, and each subfolder can contain more of each, and so on. For example, that means you could have a list with 1,990 items in the root of the list, 10 folders that each contain 2,000 items, and so on. The maximum number of items supported in a list with recursive folders is 5 million items.

Related

Sharepoint List View Threshold and Item Limit setting

I have read a lot about the list view threshold. I have indexed the appropriate metadata columns to help. I have placed mandatory web part filters on web pages. I think I am going to be able to control the view pretty well.
Should a user try to get an "All Items" view, will the "Item Limit" in the view settings keep the view from exceeding the threshold? I could not find a straight or understandable answer.
Actually, I only know the answer for SharePoint 2010; it is not clear from your question which version you are using.
But the item limit in the view will not protect you from exceeding the threshold. The List View Threshold is a very, very complicated topic.
You should read this
https://support.office.com/en-us/article/manage-large-lists-and-libraries-in-sharepoint-b8588dae-9387-48c2-9248-c24122f07c59
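To illustrate why the indexed columns matter more than the view's item limit: the query itself has to select fewer rows than the threshold, and you can only guarantee that by filtering on an indexed column (optionally capping the page size as well). A minimal sketch, assuming SharePoint 2013 or later, an already-authenticated requests session, and a hypothetical list "Requests" with a hypothetical indexed column "Department":

# Minimal sketch: query a large list through the SharePoint 2013+ REST API while filtering
# on an indexed column, so the query never touches more rows than the List View Threshold.
# Site URL, list name ("Requests") and column name ("Department") are hypothetical;
# authentication on the requests.Session is assumed to be configured already.
import requests

SITE = "https://intranet.example.com/sites/team"   # hypothetical site
session = requests.Session()                        # assume NTLM/cookie auth is set up

def get_department_items(department, page_size=500):
    r = session.get(
        f"{SITE}/_api/web/lists/getbytitle('Requests')/items",
        params={"$filter": f"Department eq '{department}'", "$top": page_size},
        headers={"Accept": "application/json;odata=verbose"},
    )
    r.raise_for_status()
    return r.json()["d"]["results"]    # only the filtered page comes back, not the whole list

items = get_department_items("Finance")
print(len(items))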

Leave Management System in SharePoint 2013

I am developing a Leave Management System in SharePoint 2013. Employees can apply for leave and their manager can either approve or reject it.
I have accomplished this by creating a new list - "Leaves" - and starting a workflow when a new item gets added. The workflow sends an email to the manager and creates a task item for him to approve or reject the request.
However, I would like to know if this approach is preferable in a real-world scenario. For an organization of 500 employees, can a single list hold that many records for all employees? What are possible ways to utilize the features in SharePoint and still create a scalable application?
I am also planning to develop a new add-in in SharePoint 2013, since applying for new leave requires displaying additional information such as available leaves and doing some custom validations which are not provided by the default SharePoint list form. I will add the new item to the SharePoint list from the custom-developed page, so that the workflow stays intact and I am still utilizing out-of-the-box SharePoint features. Is this the way to go for an enterprise-level application, or are there other alternatives? Please suggest.
SharePoint Lists are capable of holding that much data. I don't see a problem if you use a single list to hold leave request of 500 employees.
Assume a worst-case scenario where all 500 employees apply for 25 leaves each in a year; the item count would be 500 * 25 = 12,500, which is not bad.
You will need to take care of the List View Threshold error, because the data will grow beyond 5,000 items. For this you can create views that always return fewer than 5,000 results.
Now let's say you plan for 5 years: each year you will add 12,500 items, which at the end of 5 years will be 12,500 * 5 = 62,500 items.
Here you can think of 2 options:
You can create a list for each year, i.e. Leaves2016, Leaves2017, etc.
In a single list, create a folder per year, and add all leave records inside it.
Note: the only major thing you need to take care of is the List View Threshold, which can be tackled by designing your views intelligently.
For your second question:
I agree that the OOB SharePoint list form will not cater to your requirements, so creating a custom page, an add-in, or something else is the way to go. As long as your data gets inserted into the list and eventually activates the workflow, there is no harm in it.
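For what it's worth, here is a minimal sketch of the "custom page writes to the list so the OOB workflow still fires" idea, assuming the SharePoint 2013 REST API, an already-authenticated requests session, and hypothetical StartDate/EndDate columns on the Leaves list; the entity type name SP.Data.LeavesListItem is the usual auto-generated one, so verify it for your list:

# Minimal sketch: create an item in the "Leaves" list via the SharePoint 2013 REST API,
# so any workflow attached to the list still fires on item creation.
# Site URL and the StartDate/EndDate columns are hypothetical; auth is assumed configured.
import requests

SITE = "https://intranet.example.com/sites/hr"   # hypothetical site
session = requests.Session()                      # assume auth is already set up

def get_form_digest():
    # POST operations against the REST API require a form digest from /_api/contextinfo.
    r = session.post(f"{SITE}/_api/contextinfo",
                     headers={"Accept": "application/json;odata=verbose"})
    r.raise_for_status()
    return r.json()["d"]["GetContextWebInformation"]["FormDigestValue"]

def add_leave_request(title, start, end):
    payload = {
        "__metadata": {"type": "SP.Data.LeavesListItem"},   # check your list's entity type name
        "Title": title,
        "StartDate": start,   # hypothetical date columns, ISO 8601 strings
        "EndDate": end,
    }
    r = session.post(f"{SITE}/_api/web/lists/getbytitle('Leaves')/items",
                     json=payload,
                     headers={"Accept": "application/json;odata=verbose",
                              "Content-Type": "application/json;odata=verbose",
                              "X-RequestDigest": get_form_digest()})
    r.raise_for_status()
    return r.json()["d"]["Id"]

new_id = add_leave_request("Annual leave - J. Doe", "2016-07-01T00:00:00Z", "2016-07-05T00:00:00Z")
print(new_id)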

Sharepoint multiple list update

I'm creating three approved software lists for my company with SharePoint. One is the general list for all associates; the next is the restricted list, which will contain software like Wireshark that only certain people should have access to; and the last is the master list, which will be a combination of the other two lists.
What would be ideal is being able to add the software to the master list and have it update the other two lists automagically. The unique key will of course be the software title. The field that will determine which list the row will be added to is the [group] field. (This is where the uncertainty comes in.) There will be 4 values that can go into this [group] field: restricted, general, engineering, media.
I would like to have the rows with "restricted" go to the restricted list, obviously, and everything else go to the general list.
I'm very new to SharePoint (~1 week) and I'm trying to simplify this process as much as possible. I'm continuing to read and watch the videos to learn more; however, I understand this is a complex application. I thought I'd pose this question to people with more experience than myself to find out if it's even possible. If not, I'll be able to change my train of thought sooner.
Thank you for your time
This is probably a question for https://sharepoint.stackexchange.com/
But -- what I would do in your situation is use only one list and make multiple views.
Each view can be filtered by different criteria (like your group column in this case); then, instead of having 3 distinct lists, you can display or link to each view (they all get their own URI in SharePoint) separately.
This way you only ever have to update one list, and you avoid the overhead/complexity of trying to copy into other lists with event receivers, workflows, or something else.
If someone reading this needs instructions on views:
You can create/switch views from the 'List' or 'Library' tab when you're viewing the list. Then when you add the list to a web part page, you can select which view to use in the web part properties window.
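If it helps to see the single-list-with-views idea programmatically: each "view" is essentially just a filter on the group column. A tiny sketch, assuming SharePoint 2013+, an authenticated requests session, and hypothetical list/column names "Software" and "SoftwareGroup":

# Tiny sketch: the "restricted" and "general" views are just filters over one master list.
# Site URL, list name ("Software") and column name ("SoftwareGroup") are hypothetical.
import requests

SITE = "https://intranet.example.com/sites/it"
session = requests.Session()   # assume authentication is already configured

def software_by_group(filter_expr):
    r = session.get(f"{SITE}/_api/web/lists/getbytitle('Software')/items",
                    params={"$filter": filter_expr, "$select": "Title,SoftwareGroup"},
                    headers={"Accept": "application/json;odata=verbose"})
    r.raise_for_status()
    return r.json()["d"]["results"]

restricted = software_by_group("SoftwareGroup eq 'restricted'")   # the restricted "view"
general    = software_by_group("SoftwareGroup ne 'restricted'")   # everything else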

How do search engines conduct 'AND' operation?

Consider the following search results:
Google for 'David' - 591 million hits in 0.28 sec
Google for 'John' - 785 million hits in 0.18 sec
OK. Pages are indexed, it only needs to look up the count and the first few items in the index table, so speed is understandable.
Now consider the following search with AND operation:
Google for 'David John' ('David' AND 'John') - 173 million hits in 0.25 sec
This makes me ticked ;) How on earth can search engines get the result of AND operations on gigantic datasets so fast? I see the following two ways to conduct the task and both are terrible:
You conduct the search of 'David'. Take the gigantic temp table and conduct a search of 'John' on it. HOWEVER, the temp table is not indexed by 'John', so brute force search is needed. That just won't compute within 0.25 sec no matter what HW you have.
Indexing by all possible word combinations like 'David John'. Then we face a combinatorial explosion on the number of keys, and not even Google has the storage capacity to handle that.
And you can AND together as many search phrases as you want and you still get answers in under 0.5 sec! How?
What Markus wrote about Google processing the query on many machines in parallel is correct.
In addition, there are information retrieval algorithms that make this job a little bit easier. The classic way to do it is to build an inverted index which consists of postings lists - a list for each term of all the documents that contain that term, in order.
When a query with two terms is searched, conceptually, you would take the postings lists for each of the two terms ('david' and 'john'), and walk along them, looking for documents that are in both lists. If both lists are ordered the same way, this can be done in O(N). Granted, N is still huge, which is why this will be done on hundreds of machines in parallel.
Also, there may be additional tricks. For example, if the highest-ranked documents were placed higher on the lists, then maybe the algorithm could decide that it found the 10 best results without walking the entire lists. It would then guess at the remaining number of results (based on the size of the two lists).
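The lockstep walk over two sorted postings lists can be sketched in a few lines of plain Python (hypothetical document IDs, just to make the O(N) merge concrete):

# Toy sketch: intersect two sorted postings lists (document IDs) in a single O(N) pass.
def intersect(postings_a, postings_b):
    result = []
    i = j = 0
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            result.append(postings_a[i])   # document contains both terms
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1
        else:
            j += 1
    return result

# e.g. documents containing 'david' and 'john' (hypothetical IDs)
print(intersect([2, 5, 9, 14, 21], [1, 5, 8, 14, 30]))   # -> [5, 14]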
I think you're approaching the problem from the wrong angle.
Google doesn't have tables/indices on a single machine. Instead they partition their dataset heavily across their servers. Reports indicate that as many as 1000 physical machines are involved in every single query!
With that amount of computing power it's "simply" (used highly ironically) a matter of ensuring that every machine completes their work in fractions of a second.
Reading about Google technology and infrastructure is very inspiring and highly educational. I'd recommend reading up on BigTable, MapReduce and the Google File System.
Google has an archive of their publications available with lots of juicy information about their technologies. This thread on MetaFilter also provides some insight into the enormous amount of hardware needed to run a search engine.
I don't know how google does it, but I can tell you how I did it when a client needed something similar:
It starts with an inverted index, as described by Avi. That's just a table listing, for every word in every document, the document id, the word, and a score for the word's relevance in that document. (Another approach is to index each appearance of the word individually along with its position, but that wasn't required in this case.)
From there, it's even simpler than Avi's description - there's no need to do a separate search for each term. Standard database summary operations can easily do that in a single pass:
SELECT document_id, sum(score) total_score, count(score) matches
FROM rev_index
WHERE word IN ('david', 'john')
GROUP BY document_id
HAVING matches = 2
ORDER BY total_score DESC
This will return the IDs of all documents which have scores for both 'David' and 'John' (i.e., both words appear), ordered by some approximation of relevance and will take about the same time to execute regardless of how many or how few terms you're looking for, since IN performance is not affected much by the size of the target set and it's using a simple count to determine whether all terms were matched or not.
Note that this simplistic method just adds the 'David' score and the 'John' score together to determine overall relevance; it doesn't take the order/proximity/etc. of the names into account. Once again, I'm sure that google does factor that into their scores, but my client didn't need it.
I did something similar to this years ago on a 16-bit machine. The dataset had an upper limit of around 110,000 records (it was a cemetery, so a finite limit on burials), so I set up a series of bitmaps each containing 128K bits.
The search for "david" resulted in setting the relevant bit in one of the bitmaps to signify that the record had the word "david" in it. I did the same for 'john' in a second bitmap.
Then all you need to do is a binary 'and' of the two bitmaps, and the resulting bitmap tells you which record numbers had both 'david' and 'john' in them. Quick scan of the resulting bitmap gives you back the list of records that match both terms.
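For illustration, a small sketch of the same bitmap trick in Python (hypothetical record IDs; Python integers stand in for the fixed 128K-bit maps):

# Sketch of the bitmap trick: one bit per record, one bitmap per term, ANDed together.
NUM_RECORDS = 110_000

def build_bitmap(matching_record_ids):
    bm = 0
    for rec_id in matching_record_ids:
        bm |= 1 << rec_id          # set the bit for each record containing the term
    return bm

david = build_bitmap([3, 17, 42, 99_000])   # hypothetical record IDs
john  = build_bitmap([5, 17, 99_000])

both = david & john                          # binary AND of the two bitmaps
matches = [rec for rec in range(NUM_RECORDS) if (both >> rec) & 1]
print(matches)                               # -> [17, 99000]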
This technique wouldn't work for google though, so consider this my $0.02 worth.

SharePoint 2007 Banner Hit Counter

In the SharePoint publishing site I will have some banners that are web parts and can have any HTML content inside them. I have a requirement to count clicks on those banners. The banners will have links to external sites.
I am not sure where to store the counters for individual banners. A custom list is the first thing that came to my mind, but I am not sure how it will behave under concurrent access. Can I lock the list (list item) and do the counter increment? What will happen to other list access while it is locked? Will it fail or just wait?
Are there any alternatives, such as storing the counters somewhere else?
There are lots of places; here are the two most popular:
A property bag (most likely on the Web), holding a number that you increment
Inside a list
Of these, I have successfully done it with a list on our blogging solution; you can see it here: http://community.zevenseas.com/blogs, where I'm tracking views for each post. I took this approach because I like to see more than a number, e.g. referrer, IP, etc.
Things to keep in mind:
You need to keep a close eye on the number of items you are storing. SharePoint doesn't like lots of items in a list. To manage them, put them in folders: a folder for each banner, and then subfolders for each month.
I would keep a list with each of the banners (just their name or more) in it, then create a second list to store the views. In the list where you store the views, have a lookup back to the list storing the banners. On the original banner list you can then create a new column which "counts" the number of views related to each banner item.
Again, be very careful about the number of items you are expecting, but this works pretty nicely for us.
Don't forget that a small database will allow you to store page hits against whatever you want. You can then call a stored proc and the database "just takes care of it". You don't have to worry about access and concurrency (because you used a transaction, riiiight!).
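A minimal sketch of that approach, using SQLite purely as a stand-in for SQL Server and a stored procedure (table and banner names are hypothetical); the point is that the database's own transaction handles concurrent clicks for you:

# Minimal sketch of the "small database + transaction" approach; SQLite used for illustration.
import sqlite3

conn = sqlite3.connect("banner_hits.db")
conn.execute("""CREATE TABLE IF NOT EXISTS banner_hits (
                    banner_id TEXT PRIMARY KEY,
                    hits      INTEGER NOT NULL DEFAULT 0)""")

def record_click(banner_id):
    with conn:   # one transaction per click; concurrent writers simply wait their turn
        conn.execute("INSERT OR IGNORE INTO banner_hits (banner_id, hits) VALUES (?, 0)",
                     (banner_id,))
        conn.execute("UPDATE banner_hits SET hits = hits + 1 WHERE banner_id = ?",
                     (banner_id,))

record_click("summer-campaign")   # hypothetical banner id
print(conn.execute("SELECT hits FROM banner_hits WHERE banner_id = ?",
                   ("summer-campaign",)).fetchone()[0])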
A SharePoint list is easy because it is there out of the box, but consider that lists have a lot of overhead for adding values and even reading from them. They are also editable by a site administrator, which may be fine, depending on the number of administrators you have. A list is easier to provision than a new database, so in the end you do need to consider the two options carefully.
Just because SharePoint has a hammer does not mean everything is a nail :)

Resources