Query and/or Search for SharePoint Document ID - sharepoint

We have a SharePoint 2010 environment with Document IDs enabled.
Given (part of) a Doc ID, we want to programmatically retrieve the document(s) matching that ID. The problem seems to be that this column is rather special, in that it might need special handling.
Using an SPSiteDataQuery, fetching the _dlc_DocId field as part of the ViewFields works fine. However, including it in the Where clause never returns any documents.
Using the Search API has gotten us nowhere at all.
Has anyone pulled this off, or any suggestions on how to tackle this problem?
[Update] It turns out we were fooled by subtle errors in our XML and by misreading what we saw while debugging. Filtering on _dlc_DocId works fine.

I don't normally contribute to these sorts of things because cleverer people than I always get there before me, but as this is an old one with no proper answer I think I'll add my thoughts for those who find this page.
I was struggling with this, but after a little digging around and learning a bit of CAML I got it working.
I am using the SharePoint Client Object Model against SharePoint 2010 and Office365 beta.
Start by looking at the view XML of the "all items" query:
Microsoft.SharePoint.Client.CamlQuery.CreateAllItemsQuery().ViewXml
"<View Scope=\"RecursiveAll\">\r\n <Query>\r\n </Query>\r\n</View>"
Add a <Where> child element inside the <Query> element, then put the following inside the <Where>, replacing MDXC2KE55ASN-3-80 with the Doc ID you are looking for:
<Eq><FieldRef Name="_dlc_DocId" /><Value Type="Text">MDXC2KE55ASN-3-80</Value></Eq>
Also don't forget you might want to make use of these too:
<ViewFields><FieldRef Name="_dlc_DocId" /></ViewFields>
<RowLimit>1</RowLimit>
Then pass the query to the List.GetItems() method to bring back the ListItemCollection.
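To make the nesting of the elements explicit, here is a minimal sketch that assembles the same view XML. It is written in Python with the standard library only, purely to show the structure; the function name build_doc_id_view_xml and the row_limit parameter are my own, not part of any SharePoint API. The resulting string is what you would assign to CamlQuery.ViewXml.

```python
import xml.etree.ElementTree as ET


def build_doc_id_view_xml(doc_id, row_limit=1):
    """Build CAML view XML that matches one Document ID exactly.

    Hypothetical helper for illustration; only the element and
    attribute names come from the answer above.
    """
    view = ET.Element("View", {"Scope": "RecursiveAll"})

    # <Query><Where><Eq>...</Eq></Where></Query>
    query = ET.SubElement(view, "Query")
    where = ET.SubElement(query, "Where")
    eq = ET.SubElement(where, "Eq")
    ET.SubElement(eq, "FieldRef", {"Name": "_dlc_DocId"})
    value = ET.SubElement(eq, "Value", {"Type": "Text"})
    value.text = doc_id  # ElementTree escapes &, < and > when serializing

    # Optional extras: which fields to return and how many rows.
    view_fields = ET.SubElement(view, "ViewFields")
    ET.SubElement(view_fields, "FieldRef", {"Name": "_dlc_DocId"})
    ET.SubElement(view, "RowLimit").text = str(row_limit)

    return ET.tostring(view, encoding="unicode")


view_xml = build_doc_id_view_xml("MDXC2KE55ASN-3-80")
print(view_xml)
```

Building the XML with a library rather than by string concatenation avoids the kind of subtle XML errors mentioned in the question's update.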

Just in case nobody comes up with a slick solution from the depths of the SharePoint infrastructure:
What would Google Do?
Slice it, dice it and dump it in a reverse (inverted) index.
Solr and Lucene offer superb tools for this. The idea is to cut the Doc IDs into small overlapping pieces and, for each piece, add the location of the document to the bucket for that piece.
Say we have "A real nice document" with ID ABCD123. You would add it to the buckets
ABCD, BCD1, CD12, D123
When searching for a partial ID (plus other data like dates, types, ...), you (well, the search engine) create the union of the matching buckets and apply the additional constraints.
To make this happen you need to write a spider for the SharePoint server and a routine that builds a record of the data elements to be indexed.
Put a nice REST interface in front of it (actually, Solr already has that), integrate it into the main SharePoint server, and nobody needs to know there is something else running behind it.
These products can also incrementally update the indexes, so they can be kept up to date.
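The bucket idea can be sketched in a few lines of Python. This is a toy in-memory version, not Solr or Lucene: the class DocIdIndex, its methods and the sample paths are all made up for illustration. Note one deliberate difference: instead of taking the union of the buckets, this sketch intersects them and then confirms each candidate with a substring check, which gives exact partial-ID matches.

```python
from collections import defaultdict


def ngrams(text, n=4):
    """Return every n-character slice, e.g. ABCD123 -> ABCD, BCD1, CD12, D123."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class DocIdIndex:
    """Toy inverted index from n-grams of a Doc ID to the documents containing them."""

    def __init__(self, n=4):
        self.n = n
        self.buckets = defaultdict(set)  # n-gram -> set of doc IDs
        self.locations = {}              # doc ID -> document location

    def add(self, doc_id, location):
        self.locations[doc_id] = location
        for gram in ngrams(doc_id, self.n):
            self.buckets[gram].add(doc_id)

    def search(self, partial):
        grams = ngrams(partial, self.n)
        if grams:
            # Only IDs present in every bucket can contain the partial ID.
            candidates = set.intersection(
                *(self.buckets.get(g, set()) for g in grams)
            )
        else:
            # Query shorter than n characters: fall back to scanning all IDs.
            candidates = set(self.locations)
        # Final substring check removes false positives.
        return sorted((d, self.locations[d]) for d in candidates if partial in d)


index = DocIdIndex()
index.add("ABCD123", "/docs/a-real-nice-document.docx")
index.add("ABCD999", "/docs/another.docx")
index.add("XBCD12Z", "/docs/third.docx")
print(index.search("BCD12"))
```

A real deployment would let the search engine do the slicing (for example with an n-gram tokenizer) and keep the index on disk; the sketch only shows why a partial ID can be answered without scanning every document.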

You could use the following to get the Document ID.
SPFile file = MethodToUploadFileToServer(web, filepath);
SPListItem item = file.Item;
string DocID = item.Properties["_dlc_DocId"].ToString();

Related

Azure Cognitive Search - return full json as SearchDocument?

I'm using Azure.Search.Documents in C# to index JSON documents in Azure blob storage. About half of the fields of each json doc are meant to be searchable or fielded. The JSON also includes some fields that I don't want evaluated by my search.
My goal is to return the entire JSON document in my search results.
It seems like my choices are to (a) add SearchField records to my SearchIndex for every aspect of the document (in which the SearchDocument results are ready for me to use) or (b) leverage metadata_storage_path / metadata_storage_name and do a separate fetch for the document itself.
Option (b) feels less efficient, considering that the SearchDocument returned is already so close to the full JSON; it seems a shame to have to make a separate fetch for each document. But for option (a) to work, I'd need to tell the SearchIndex about the extra fields without them triggering false positive search results.
For (a) is there a way to add SearchFields (or the equivalent) and have them not trigger false positives? (IsSearchable seems to affect how, but not whether, they are evaluated). Also, if (b) is the better approach, is there a way to do this using "new SearchField" as opposed to declared via attributes? Thanks.
Thank you Vince. Adding your comment as answer to help other community users.
Set IsSearchable to false on the fields that should be returned in results but not evaluated by the search.

Honoring previous searches in the next search result in solr

I am using Solr for searching. I want to improve the quality of my search results based on previously searched terms. Suppose I have two products in my index with the names 'Jewelry Crystal' (say it belongs to Group 1) and 'Compound Crystal' (say it belongs to Group 2). Now, if we query for 'Crystal', both products will be returned.
Let's say I had previously searched for 'Jewelry Ornament' and then I search for 'Crystal'. I expect that only one result ('Jewelry Crystal') should come back. There is no point in showing the 'Compound Crystal' product to a person looking for jewelry.
Is there any way in Solr to honour this kind of behavior, or is there any other method to achieve this?
First of all, there's nothing built into Solr to achieve this. What you need is some kind of user session, which Solr does not support, or client-side storage such as a cookie that holds the preceding query.
But to boost results that match the earlier search, you can use a runtime boost query.
Assuming you're using the edismax QueryParser, you can add the following to your Solr query:
q=Crystal&bq=(Jewelry Ornament)
See http://wiki.apache.org/solr/ExtendedDisMax#bq_.28Boost_Query.29
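As a rough sketch of the client side, the snippet below builds the query string for such a request, using the bq parameter described at the link above. The function name build_boosted_query is hypothetical, and where the previous search terms come from (a cookie, a session) is left to the application. Note that bq only raises the score of matching documents; it does not remove 'Compound Crystal' from the results.

```python
from urllib.parse import urlencode


def build_boosted_query(current_terms, previous_terms=None):
    """Return a URL-encoded Solr query string.

    The previous search, if any, is sent as an edismax boost query (bq),
    so documents matching it rank higher without filtering out the rest.
    """
    params = {"q": current_terms, "defType": "edismax"}
    if previous_terms:
        # Assumption: previous_terms is plain text; real user input would
        # need Solr's special characters escaped first.
        params["bq"] = "(" + previous_terms + ")"
    return urlencode(params)


query_string = build_boosted_query("Crystal", "Jewelry Ornament")
print(query_string)
```

If only jewelry results should be shown at all, a filter query on the group field would be the stricter alternative to boosting.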

Misconceptions about search indexing? (Haystack/Whoosh)

I'm using haystack with whoosh for development purposes.
I want search results based on django models to be filtered by the user that created them.
Please see my other post Filter haystack result with SearchQuerySet for details.
Basically I had to add User to my search index. But I noticed, when I manually change the user_id of a record, search is broken. After thinking about it this even makes sense. But, this means I have to rebuild the index after each field update in each model? Surely that doesn't scale at all?
I thought the engine would find the object by id, then look it up in the database, and return a current instance for further processing like filtering. It seems like everything is cached in the index so must be synchronized in realtime for search results to show up? Am I missing something here?
This documentation helped shed some light:
http://docs.haystacksearch.org/dev/searchindex_api.html
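The behaviour described in the question can be reproduced with a toy model. This is not Haystack's code; ToyIndex and its methods are invented for illustration, under the assumption that the index stores its own copy of each indexed field. It shows why a filter on an indexed field goes stale after a database change, and that re-indexing the one changed object (rather than rebuilding everything) is enough to fix it.

```python
class ToyIndex:
    """Keeps a copy of selected fields per object, as a search index does."""

    def __init__(self):
        self.docs = {}  # primary key -> copied field values

    def update_object(self, obj):
        # Copy the values; later changes to obj are NOT seen by the index.
        self.docs[obj["id"]] = {"text": obj["text"], "user_id": obj["user_id"]}

    def search(self, term, user_id):
        return sorted(
            pk for pk, doc in self.docs.items()
            if term in doc["text"] and doc["user_id"] == user_id
        )


# The "database" row and its initial indexing.
database = {1: {"id": 1, "text": "crystal notes", "user_id": 10}}
index = ToyIndex()
index.update_object(database[1])

# Change the owner directly in the database, bypassing the index.
database[1]["user_id"] = 20
stale = index.search("crystal", user_id=20)  # index still says user 10

# Re-index just the changed object.
index.update_object(database[1])
fresh = index.search("crystal", user_id=20)

print(stale, fresh)
```

In practice the per-object update would be triggered whenever the model is saved, so edits made outside that path (such as a manual database change) still require an explicit re-index.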

SubSonic: navigate among records, view, query

I'm something of a newbie in .NET and I'm trying to understand this new paradigm. I began with LINQ to SQL, but then I found this library, a kind of framework built on T4; more specifically, the SubSonic T4 templates. I think it could be very useful, but the documentation available is very scarce.
My first intention is to use it in a very simple form: a catalog of, let's say, Users.
So, how can I use the model generated by SubSonic (using IActiveRecord) to implement the record-navigation part? I mean, I want a simple form to create, delete or modify records, and that is fairly easy. But what about moving among records? I found how to get the first and the last record, but how can I advance or go back among them? How can I order the records? It seems that every time I must query the table. Is that so? How can I move among the records I have already fetched?
All of the examples I found are very simple, don't touch on this matter, and/or are repeated everywhere.
So please, if anybody can help me or give more references, I'd thank you a lot.
gmo camilo
SubSonic can return a List of your objects if you call ExecuteTypedList or ToList on a query, e.g.
List<Product> products = Products.All().ToList();
Once you've got a List then you can move around it in memory. Have a look at the following references to learn more about collections in .net:
System.Collections.Generic Namespace
IEnumerable<T> Interface
List<T> Class
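The idea of moving around an in-memory list can be sketched as a small cursor object. The sketch is in Python rather than C# to keep it short and runnable on its own; RecordNavigator and the sample user records are hypothetical, and the same approach applies to a List<T> with an integer index. The records are fetched and sorted once, and navigation only moves a position within the list, so no further queries are needed.

```python
class RecordNavigator:
    """Cursor over records that were already fetched into memory."""

    def __init__(self, records, sort_key=None):
        # Sort once up front if an ordering is requested.
        self.records = sorted(records, key=sort_key) if sort_key else list(records)
        self.position = 0

    @property
    def current(self):
        return self.records[self.position] if self.records else None

    def first(self):
        self.position = 0
        return self.current

    def last(self):
        self.position = max(len(self.records) - 1, 0)
        return self.current

    def next(self):
        # Stay on the last record instead of running off the end.
        self.position = min(self.position + 1, max(len(self.records) - 1, 0))
        return self.current

    def previous(self):
        # Stay on the first record instead of going negative.
        self.position = max(self.position - 1, 0)
        return self.current


users = [
    {"id": 3, "name": "Carla"},
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Beto"},
]
nav = RecordNavigator(users, sort_key=lambda u: u["name"])
visited = [nav.current["name"], nav.next()["name"], nav.next()["name"],
           nav.next()["name"], nav.previous()["name"]]
print(visited)
```

The trade-off is that the list is a snapshot: changes made by other users will not appear until the records are queried again.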

Sharepoint: SQL to find all documents created/edited by a user

I am looking for a query that will work on SharePoint 2003 to show me all the documents created or touched by a given user ID.
I have found tables with the documents (Docs) and tables for users (UserInfo, UserData),
but the relationship between them seems a bit odd: there are 99,000 records in our UserData table and 12,000 records in UserInfo, yet we have only 400 users!
I suppose I was expecting a simple one-to-many relationship, with a user table of 400 records joined to the documents table, but I see that's not the case.
Any help would be appreciated.
Edit:
Thanks Bjorn,
I have translated that query back to the Sharepoint 2003 structure:
select d.*
from userinfo u
join userdata d
  on u.tp_siteid = d.tp_siteid
 and u.tp_id = d.tp_author
where u.tp_login = 'userid'
  and d.tp_iscurrent = 1
This gets me a list of siteid/listid/tp_id values. I'll have to see if I can trace those back to a filename and path.
All: any additional help is still appreciated!
I've never looked at the database in SharePoint 2003, but in 2007 UserInfo is connected to Sites, which means that every user has a row in UserInfo for each site collection (or the equivalent 2003 concept). So to identify what a user does you need both the site id and the user's id within that site. In 2007, I would begin with something like this:
select d.* from userinfo u
join alluserdata d on u.tp_siteid = d.tp_siteid
and u.tp_id = d.tp_author
where u.tp_login = '[username]'
and d.tp_iscurrentversion = 1
Update: As others write here, it is not recommended to go directly into the SharePoint database, but I would say use your head and be careful. Updates are an all-caps no-no, but selects depend on the context.
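To see why the join needs both the site ID and the per-site user ID, here is a toy reproduction in an in-memory SQLite database. It is not the real SharePoint schema: only the column names used in the query above are kept, the title column and all of the data are invented, and nothing here touches a content database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userinfo (tp_siteid TEXT, tp_id INTEGER, tp_login TEXT);
    CREATE TABLE alluserdata (tp_siteid TEXT, tp_author INTEGER,
                              tp_iscurrentversion INTEGER, title TEXT);

    -- The same login gets a different tp_id in each site collection.
    INSERT INTO userinfo VALUES ('siteA', 5, 'alice');
    INSERT INTO userinfo VALUES ('siteB', 7, 'alice');
    INSERT INTO userinfo VALUES ('siteA', 7, 'bob');

    INSERT INTO alluserdata VALUES ('siteA', 5, 1, 'a.doc');
    INSERT INTO alluserdata VALUES ('siteA', 5, 0, 'old.doc');
    INSERT INTO alluserdata VALUES ('siteA', 7, 1, 'bob.doc');
    INSERT INTO alluserdata VALUES ('siteB', 7, 1, 'b.doc');
""")


def titles_for(login, join_on_site):
    """Return current document titles for a login, with or without the site join."""
    site_clause = "u.tp_siteid = d.tp_siteid AND " if join_on_site else ""
    sql = (
        "SELECT d.title FROM userinfo u "
        "JOIN alluserdata d ON " + site_clause + "u.tp_id = d.tp_author "
        "WHERE u.tp_login = ? AND d.tp_iscurrentversion = 1 "
        "ORDER BY d.title"
    )
    return [row[0] for row in conn.execute(sql, (login,))]


correct = titles_for("alice", join_on_site=True)
too_many = titles_for("alice", join_on_site=False)
print(correct, too_many)
```

Without the site ID in the join, alice's ID 7 from siteB also matches bob's document in siteA, which is exactly the mix-up the two-column join prevents.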
DO NOT QUERY THE SHAREPOINT DATABASE DIRECTLY!
I wonder if I made that clear enough? :)
You really need to look at the object model available in C#. You will need to get an SPSite instance for the site collection, then iterate over its SPWeb objects and the SPList instances that belong to each SPWeb.
Once you have the SPList object, you will need to call GetItems with an SPQuery that filters for the user you want.
That is the supported way of doing what you want.
You should never go to the database directly as SharePoint isn't designed for that at all and there is no guarantee (actually, there's a specific warning) that the structure of the database will be the same between versions and upgrades, and additionally when content is spread over several content databases in a farm there is no guarantee that a query that runs on one content database will do what you expect on another content database.
When you look at the object model for iteration, also note that you will need to call Dispose() on the SPSite and SPWeb objects that you create.
Oh, and yes you may have 400 users, but I would bet that you have 30 sites. The information is repeated in the database per site... 30 x 400 = 12,000 entries in the database.
If you are going to use that query in SharePoint, you should know that creating views on the content database or querying directly against the database is a big no-no. A workaround could be some custom code that iterates through the object model and writes the results to your own database. This could be either timer based or triggered by an event.
You really shouldn't be doing SELECTs that take locks either, i.e. add WITH (NOLOCK) to your queries. Some parts of the system are very timeout sensitive, and if you start introducing locks that the system wasn't expecting you can see the system freak out.
But really, you should be doing this via the object model. Mess around with something like IronPython and experimenting with the OM is almost downright pleasant.
