groovyldap, search only returning 5000 results - groovy

I am doing an LDAP search with groovyldap; the search finds the group I am looking for, but returns only 5000 members of the group:
def getGroupMembers() {
    // Connect and search on the same connection we just created
    def ldap = LDAP.newInstance(connectionInformation.hostname, connectionInformation.user, connectionInformation.password)
    def result = ldap.search("CN=mygroup", "OU=foo,DC=bar,DC=blech", SearchScope.SUB)
    // AD returns the members under a ranged attribute key, capped at 5000 values
    def members = result["member;range=0-4999"]
    members = members[0]
}
Yes, there is actually a field returned with the key "member;range=0-4999", and the "members" array has 5000 elements in it. I couldn't find any setting in the LDAP code to enable returning all members, but it seems logical to think that I should be able to fetch all of the results.

In Microsoft Active Directory, LDAP policies are implemented using objects of the queryPolicy class.
The one you are encountering appears to be MaxValRange, which controls the number of values that can be returned for a multi-valued attribute of an entry.
In Microsoft Active Directory 2008 (and, I assume, later versions) this value is hardcoded; although it can be modified, the change is not effective.
If an attribute has more values than MaxValRange allows, you may use the LDAP_SERVER_RANGE_OPTION_OID control to retrieve the values that exceed the limit.
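To make that concrete, here is a minimal sketch of range retrieval. groovyldap does not expose this directly, so the sketch uses .NET's System.DirectoryServices.Protocols; the host, the DN, and the loop bookkeeping are assumptions, but the same pattern works in any LDAP client that lets you request attributes by name:

using System;
using System.Collections.Generic;
using System.DirectoryServices.Protocols;

// Sketch: fetch all values of "member" chunk by chunk. We request
// "member;range=low-*" and the server answers with the range it actually
// served, e.g. "member;range=0-4999"; a trailing "-*" marks the last chunk.
// Host and group DN are assumptions; error handling is omitted.
var connection = new LdapConnection("ad.example.com");
var members = new List<string>();
int low = 0;
bool done = false;
while (!done)
{
    var request = new SearchRequest(
        "CN=mygroup,OU=foo,DC=bar,DC=blech",
        "(objectClass=*)",
        SearchScope.Base,
        $"member;range={low}-*");
    var response = (SearchResponse)connection.SendRequest(request);
    SearchResultEntry entry = response.Entries[0];
    foreach (string attrName in entry.Attributes.AttributeNames)
    {
        foreach (string value in entry.Attributes[attrName].GetValues(typeof(string)))
            members.Add(value);
        if (attrName.EndsWith("-*"))
            done = true;                 // server signalled the final chunk
        else
            low = int.Parse(attrName.Substring(attrName.LastIndexOf('-') + 1)) + 1;
    }
}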

I have two possible answers for you. (Sorry, it's been a while since I've played with LDAP / Active Directory.)
You may be hitting AD page size limits. The easiest way around this is to implement paging in your LDAP queries; a sketch follows below.
How exactly to do that in groovyldap is, unfortunately, an exercise for the reader, but I've done it with nodeldap; I think it emits a page event (it's been a while since I've done this).
You may have a range query. I don't know much about these, but there are some thoughts on that in another StackOverflow answer.
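For the paging option, a minimal sketch using .NET's System.DirectoryServices.Protocols; the host, base DN, and page size of 500 are assumptions. Note that paging bounds the number of entries per response, which is a separate limit from the per-attribute value range discussed above:

using System;
using System.DirectoryServices.Protocols;

// Sketch of LDAP paged results. Connection details are assumptions.
var connection = new LdapConnection("ad.example.com");
var pageControl = new PageResultRequestControl(500);
var request = new SearchRequest(
    "OU=foo,DC=bar,DC=blech",
    "(cn=mygroup)",
    SearchScope.Subtree);
request.Controls.Add(pageControl);

while (true)
{
    var response = (SearchResponse)connection.SendRequest(request);
    foreach (SearchResultEntry entry in response.Entries)
        Console.WriteLine(entry.DistinguishedName);

    // In production, scan response.Controls for the
    // PageResultResponseControl rather than assuming index 0.
    var pageResponse = (PageResultResponseControl)response.Controls[0];
    if (pageResponse.Cookie.Length == 0)
        break;                          // empty cookie means last page
    pageControl.Cookie = pageResponse.Cookie;
}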

Related

How do you construct an Azure Search query to return a wildcard search based solely on a specific field?

If I have missed this in some other area of SO, please redirect me, but I don't think this is a duplicate question.
I am using Azure Search with an index field called Title, which is searchable and filterable using the standard Lucene analyzer.
When using the built-in explorer, if I want to return all Job Titles that are explicitly named Full Stack Developer, I can achieve it this way:
$filter=Title eq 'Full Stack Developer'&$count=true
But when I want to use a wildcard to return all records having Full Stack in the Title, this way:
$filter=Title eq 'Full Stack*'&$count=true
The first 20 or so records returned are spot on, but after that point I get a mix of records that have absolutely nothing in common with the Title I specified in the query. My initial assumption was that perhaps Azure was including my specified Title and also performing an inclusive keyword search on the text.
Though I found a few instances where that hypothesis seemed to hold, many more of the returned records invalidated it altogether.
Maybe I don't fully understand the mechanics under the hood of Azure Search, and so although my query appears to be valid, my expectation of the result is way off.
So how should my query look to perform a wildcard search that guarantees the specified words are included in the Titles returned, if this is possible? And what would be the correct syntax to allow OR operators in the condition?
Azure Cognitive Search allows you to perform wildcard searches limited to specific fields only. To do so, you need to specify the names of the fields you want to search in the searchFields parameter.
Your search URL in this case would be something like (URL-encode the space in the search term):
https://accountname.search.windows.net/indexes/indexname/docs?api-version=2020-06-30&searchFields=Title&search=Full Stack*
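For reference, a minimal sketch of issuing the same query from code; the service name, index name, and api-key value are placeholders. The important part is that the wildcard goes in the search parameter rather than in an OData $filter, since eq performs an exact comparison, not wildcard matching:

using System;
using System.Net.Http;

// Sketch: full-text wildcard search scoped to the Title field.
var client = new HttpClient();
client.DefaultRequestHeaders.Add("api-key", "<your-query-key>");
string url = "https://accountname.search.windows.net/indexes/indexname/docs"
           + "?api-version=2020-06-30"
           + "&searchFields=Title"
           + "&$count=true"
           + "&search=" + Uri.EscapeDataString("Full Stack*");
string json = await client.GetStringAsync(url);
Console.WriteLine(json);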

CouchDB and Couchbase Document Keys

In reference material for CouchDB and Couchbase it's common guidance to store the type of a document as a parameter within the actual document.
I've got a database, where I have different documents that record certain behaviour by URL. So naturally, I use the URL as the id of the document.
The problem I find is that by using just the key as the document id, I now get clashes between documents of different types. So I have started using the type as the first part of the key like this:
{ doc._id: "rss_entry|http://www.spiegel.de/1234", [...] }
{ doc._id: "page_text|http://www.spiegel.de/1234", [...] }
Now I start to wonder why I've never seen this approach to model type in any of the documentation.
Prefixes are commonly used. In addition to supporting scenarios such as yours, prefixing allows one to perform logical range queries against views.
There is use of this technique in the modeling examples, though perhaps the concept is not described in as much detail as you are expecting. In the section http://docs.couchbase.com/couchbase-devguide-2.5/#modeling-documents, the documents are keyed as beer_NNNN and brewery_NNNN. The section http://docs.couchbase.com/couchbase-devguide-2.5/#using-reference-documents-for-lookups goes a bit deeper into this technique: there is a counter document named user::count, and each user is keyed as user::NNNN. Additionally, there are documents in the example keyed as fb::NNNN for a Facebook ID, email::XXX#YYYY.com for a user's email address, and so on.
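As an illustration of the range-query point, here is a minimal sketch against CouchDB's _all_docs endpoint; the host and database name are assumptions, and \ufff0 is the conventional high-sorting sentinel used to close a prefix range:

using System;
using System.Net.Http;

// Sketch: fetch every document whose key starts with "rss_entry|"
// via a key-range query. Host and database name are assumptions.
var client = new HttpClient();
string prefix = "rss_entry|";
string url = "http://localhost:5984/tracking/_all_docs"
           + "?include_docs=true"
           + "&startkey=" + Uri.EscapeDataString("\"" + prefix + "\"")
           + "&endkey=" + Uri.EscapeDataString("\"" + prefix + "\ufff0\"");
Console.WriteLine(await client.GetStringAsync(url));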

Solr Merging Results of 2 Cores Into Only Those Results That Have A Matching Field

I am trying to figure out how I can accomplish the following, and none of the answers I have found so far seem to fit:
I have a fairly static and large set of resources I need to have indexed and searchable. Solr seems to be a perfect fit for that. In addition I need to have the ability for my users to add resources from the main data set to a 'Favourites' folder (which can include a few more tags added by them). The Favourites needs to be searchable in the same manner as the main data set, across all the same fields plus the additional ones.
My first thought was to have two separate schemas
- the first for the main data set and its metadata
- the second for the Favourites folder with all of the metadata from the main set copied over and then adding the additional fields.
Then I thought that would probably waste quite a bit of space (the number of users is much larger than the number of main resources).
So then I thought I could have the main data set with its metadata (Core0), same as above, with the resourceId as the unique identifier. Then there would be a second one (Core1) for the Favourites folder, with a unique id made of the resourceId, userId, grade, and folder all concatenated. The resourceId would also be a separate field. In addition, I would create another schema/core (Core3) with all the fields from the other two and have a request handler defined on it that searches across the other 2 cores and returns the results through this core.
This third core would have searches run against it where the results would be expected to be returned for a single user only. For example, a user searches their Favourites folder for all the items with Foo. The result is only those items the user has added to their Favourites with Foo somewhere in their main data set metadata. I guess the result handler from Core3 would break the search up into a search for all documents with Foo in Core0, a search across Core1 for userId and folder, and then match up the resourceIds from both and eliminate those not in both. Or run a search on Core1 with the userId and folder and then, having gotten that result set back, extract all the resourceIds and append an AND onto the search query to Core0, like: AND (resourceId = 1232232312 OR resourceId = 838388383 OR resourceId = 8637626491).
Could this be made to work? Or is there some simpler mechanism in Solr to resolve the merging of 2 searches across 2 cores, returning only the results that match on a (not necessarily unique) field in both?
Thanks.
The problem looks like a database join of 2 tables with resourceId as the foreign key.
Ignore this post if I have understood it wrong.
First, I would probably do it with a single core, with a field userId (indexed, but not stored), and reindex a document every time a new user favourites it by appending their user id (delimited by something the analyzer ignores).
Searching then gets easier (userId:"kaka's id" will fetch all my favourites).
It takes some work to do this, though, and if the number of users who can like a document grows, the userId field gets really long.
So in that case I would move on to my next idea, which is similar to yours: have a second core with (userId, resourceId). Write a wrapper which first searches this core for all the favourites, then searches the other core for those resources in a where condition (see the sketch below). But again, if a user favourites more resources, the query might exceed the GET method's size limit.
If neither seems workable, it's time to think about something more scalable, which leaves us with the same space-wasting option.
Am I missing something?
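A minimal sketch of that two-core wrapper idea; the core names, field names, host, and the JSON-parsing helper are all assumptions, and a real version would POST the second query to dodge the GET size limit mentioned above:

using System;
using System.Linq;
using System.Net.Http;

var client = new HttpClient();

// 1) Favourites core: all resourceIds for this user and folder.
string favUrl = "http://localhost:8983/solr/core1/select"
              + "?q=" + Uri.EscapeDataString("userId:1234 AND folder:default")
              + "&fl=resourceId&rows=1000&wt=json";
string favJson = await client.GetStringAsync(favUrl);
string[] resourceIds = ParseResourceIds(favJson);

// 2) Main core: the user's search term, restricted to those resources.
string idClause = string.Join(" OR ", resourceIds.Select(id => "resourceId:" + id));
string mainUrl = "http://localhost:8983/solr/core0/select"
               + "?q=" + Uri.EscapeDataString("Foo AND (" + idClause + ")")
               + "&wt=json";
Console.WriteLine(await client.GetStringAsync(mainUrl));

// Hypothetical helper: pull response.docs[*].resourceId out of the JSON.
static string[] ParseResourceIds(string json) => Array.Empty<string>();

Depending on your Solr version, the cross-core join query parser may also do this server side in one request, e.g. q={!join from=resourceId to=resourceId fromIndex=core1}userId:1234, when both cores live on the same node.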

Query and/or Search for SharePoint Document ID

We have a SharePoint 2010 environment with Document IDs enabled.
Given (part of) a Doc ID, we want to programmatically retrieve the document(s) matching that ID. The problem seems to be that this column is rather special, in that it might need special handling.
Using an SPSiteDataQuery, fetching the _dlc_DocId field as part of the ViewFields works fine. However, including it as part of the Where query never results in any documents being fetched.
Using the Search API has gotten us nowhere at all.
Has anyone pulled this off, or any suggestions on how to tackle this problem?
[Update] Turns out we were fooled by subtle errors in the XML and bad debugging misinterpretations. This stuff just works fine.
I don't normally contribute to these sorts of things because cleverer people than I always get there before me, but as this is an old one with no proper answer I think I'll add my thoughts for those who find this page.
I was struggling with this, but after a little digging around and learning a bit of CAML I got it working.
I am using the SharePoint Client Object Model against SharePoint 2010 and Office365 beta.
Start off your query by looking at the all list items query:
Microsoft.SharePoint.Client.CamlQuery.CreateAllItemsQuery().ViewXml
"<View Scope=\"RecursiveAll\">\r\n <Query>\r\n </Query>\r\n</View>"
Stick a Where child inside the Query element, then add in
<Eq><FieldRef Name="_dlc_DocId" /><Value Type="Text">MDXC2KE55ASN-3-80</Value></Eq>
replacing MDXC2KE55ASN-3-80 with the doc ID you are looking for inside the where.
Also don't forget you might want to make use of these too:
<ViewFields><FieldRef Name="_dlc_DocId" /></ViewFields>
<RowLimit>1</RowLimit>
Then use the List.GetItems() method to bring back the ListItemCollection.
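Assembled, that looks something like the following sketch; the site URL and list title are assumptions:

using System;
using Microsoft.SharePoint.Client;

// Sketch: the pieces above combined into one Client OM query.
var ctx = new ClientContext("https://yourserver/sites/yoursite");
var list = ctx.Web.Lists.GetByTitle("Documents");

var query = new CamlQuery();
query.ViewXml =
    "<View Scope=\"RecursiveAll\">" +
      "<Query>" +
        "<Where>" +
          "<Eq><FieldRef Name=\"_dlc_DocId\" />" +
              "<Value Type=\"Text\">MDXC2KE55ASN-3-80</Value></Eq>" +
        "</Where>" +
      "</Query>" +
      "<ViewFields><FieldRef Name=\"_dlc_DocId\" /></ViewFields>" +
      "<RowLimit>1</RowLimit>" +
    "</View>";

ListItemCollection items = list.GetItems(query);
ctx.Load(items);
ctx.ExecuteQuery();
foreach (ListItem item in items)
    Console.WriteLine(item["_dlc_DocId"]);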
Just in case nobody comes up with a slick solution from the depths of the SharePoint infrastructure:
What would Google do?
Slice it, dice it, and dump it in a reverse index.
Solr and Lucene offer supreme tools for this. The idea is to cut the DocIds into small pieces and add the location of the document to the bucket for each piece.
Say we have "A real nice document" with ID ABCD123. You would add it to the buckets
ABCD, BCD1, CD12, D123
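A minimal sketch of that bucket generation (4-character grams, as in the example; the gram size is a tuning choice):

using System;
using System.Collections.Generic;

foreach (var bucket in Buckets("ABCD123"))
    Console.WriteLine(bucket);          // ABCD, BCD1, CD12, D123

// Cut a doc ID into overlapping fixed-size buckets.
static IEnumerable<string> Buckets(string docId, int size = 4)
{
    for (int i = 0; i + size <= docId.Length; i++)
        yield return docId.Substring(i, size);
}

In practice Solr can generate these grams at index time with its NGram tokenizer, so you would likely not write this loop yourself.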
When searching for a partial ID (plus other data like dates, types, ...), you (well, the search engine) create the union of the buckets and apply the additional constraints.
To make this happen you need to write a spider for the SharePoint server and a routine which turns each document into a record of data elements to be indexed.
Put a nice REST interface in front of it (Solr already has that), integrate it with the main SharePoint server, and nobody needs to know there is something else running behind it.
These products can also incrementally update the indexes, so they can be kept up to date.
You could use the following to get the Document ID:
SPFile file = MethodToUploadFileToServer(web, filepath); // your own upload routine
SPListItem item = file.Item;
string DocID = item.Properties["_dlc_DocId"].ToString(); // the Document ID column

Sharepoint: SQL to find all documents created/edited by a user

I am looking for a query that will work on SharePoint 2003 to show me all the documents created/touched by a given user ID.
I have found tables with the documents (Docs) and tables for users (UserInfo, UserData)
but the relationship between them seems a bit odd - there are 99,000 records in our UserData table and 12,000 records in UserInfo - and we have 400 users!
I suppose I was expecting a simple one-to-many relationship with a user table having 400 records and joining that to the documents table, but I see that's not the case.
Any help would be appreciated.
Edit:
Thanks Bjorn,
I have translated that query back to the Sharepoint 2003 structure:
select d.*
from userinfo u
join userdata d
  on u.tp_siteid = d.tp_siteid
 and u.tp_id = d.tp_author
where u.tp_login = 'userid'
  and d.tp_iscurrent = 1
This gets me a list of siteid/listid/tp_ids; I'll have to see if I can trace those back to a filename/path.
All: any additional help is still appreciated!
I've never looked at the database in SharePoint 2003, but in 2007 UserInfo is connected to Sites, which means that every user has a row in UserInfo for each site collection (or the equivalent 2003 concept). So to identify what a user does you need both the site id and the user's id within that site. In 2007, I would begin with something like this:
select d.* from userinfo u
join alluserdata d on u.tp_siteid = d.tp_siteid
and u.tp_id = d.tp_author
where u.tp_login = '[username]'
and d.tp_iscurrentversion = 1
Update: As others write here, it is not recommended to go directly into the SharePoint database, but I would say use your head and be careful. Updates are an all-caps no-no, but selects depend on the context.
DO NOT QUERY THE SHAREPOINT DATABASE DIRECTLY!
I wonder if I made that clear enough? :)
You really need to look at the object model available in C#: get an SPSite instance for the site collection, then iterate over the SPWeb objects and the SPList instances that belong to them.
Once you have the SPList object, you will need to call GetListItems using a query that filters for the user you want.
That is the supported way of doing what you want.
You should never go to the database directly, as SharePoint isn't designed for that at all, and there is no guarantee (actually, there's a specific warning) that the structure of the database will be the same between versions and upgrades. Additionally, when content is spread over several content databases in a farm, there is no guarantee that a query that runs on one content database will do what you expect on another.
When you look at the object model for iteration, also note that you will need to dispose() the SPSite and SPWeb objects that you create.
Oh, and yes you may have 400 users, but I would bet that you have 30 sites. The information is repeated in the database per site... 30 x 400 = 12,000 entries in the database.
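To make the object-model route concrete, a minimal sketch using the server OM (so it runs on the farm itself); the site URL and the author filter are assumptions:

using System;
using Microsoft.SharePoint;

// Sketch: iterate webs and document libraries, filtering items by author.
using (SPSite site = new SPSite("http://yourserver/sites/yoursite"))
{
    foreach (SPWeb web in site.AllWebs)
    {
        try
        {
            foreach (SPList list in web.Lists)
            {
                if (list.BaseType != SPBaseType.DocumentLibrary)
                    continue;
                SPQuery query = new SPQuery();
                // Match by the user's display name; use LookupId='TRUE'
                // with the user's integer id for an exact match instead.
                query.Query =
                    "<Where><Eq><FieldRef Name='Author' />" +
                    "<Value Type='User'>Jane Doe</Value></Eq></Where>";
                foreach (SPListItem item in list.GetItems(query))
                    Console.WriteLine(item.Url);
            }
        }
        finally
        {
            web.Dispose();   // SPWeb objects from AllWebs must be disposed
        }
    }
}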
If you are going to use that query in SharePoint, you should know that creating views on the content database or querying directly against the database seems to be a big no-no. A workaround could be some custom code that iterates through the object model and writes the results to your own database. This could be either timer based or based on an event trigger.
You really shouldn't be doing SELECTs that take locks either, i.e. you should add WITH (NOLOCK) to your queries. Some parts of the system are very timeout sensitive, and if you start introducing locks that the system wasn't expecting, you can see the system freak out.
But really, you should be doing this via the object model. Mess around with something like IronPython and experimenting with the OM is almost downright pleasant.
