SharePoint: SQL to find all documents created/edited by a user

I am looking for a query that will work on SharePoint 2003 to show me all the documents created/touched by a given userID.
I have found the table with the documents (Docs) and the tables for users (UserInfo, UserData),
but the relationship between them seems a bit odd: there are 99,000 records in our UserData table and 12,000 records in UserInfo, yet we have 400 users!
I suppose I was expecting a simple one-to-many relationship, with a user table of 400 records joined to the documents table, but I see that's not the case.
Any help would be appreciated.
Edit:
Thanks Bjorn,
I have translated that query back to the SharePoint 2003 structure:
select d.*
from userinfo u
join userdata d
    on u.tp_siteid = d.tp_siteid
    and u.tp_id = d.tp_author
where u.tp_login = 'userid'
    and d.tp_iscurrent = 1
This gets me a list of siteid/listid/tp_id values. I'll have to see if I can trace those back to a filename/path.
All: any additional help is still appreciated!

I've never looked at the database in SharePoint 2003, but in 2007 UserInfo is connected to Sites, which means that every user has a row in UserInfo for each site collection (or the equivalent 2003 concept). So to identify what a user does you need both the site id and the user's id within that site. In 2007, I would begin with something like this:
select d.* from userinfo u
join alluserdata d on u.tp_siteid = d.tp_siteid
and u.tp_id = d.tp_author
where u.tp_login = '[username]'
and d.tp_iscurrentversion = 1
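To see why the join needs both the site id and the per-site user id, here is a small in-memory sketch in JavaScript. The rows are made up and model only the columns used in the query above; it is not a SharePoint API, just the join logic.

```javascript
// Made-up rows: one UserInfo row per user *per site collection*.
const userinfo = [
  { tp_siteid: "site-A", tp_id: 7, tp_login: "alice" },
  { tp_siteid: "site-B", tp_id: 3, tp_login: "alice" }, // same person, other site collection
  { tp_siteid: "site-A", tp_id: 3, tp_login: "bob" },
];

// Made-up document rows; tp_author holds the *per-site* user id.
const alluserdata = [
  { tp_siteid: "site-A", tp_author: 7, title: "alice-a.docx" },
  { tp_siteid: "site-A", tp_author: 3, title: "bob-a.docx" },
  { tp_siteid: "site-B", tp_author: 3, title: "alice-b.docx" },
];

// Equivalent of the SQL join: match on the pair (site id, user id), never on user id alone.
function documentsByLogin(login) {
  const keys = new Set(
    userinfo
      .filter((u) => u.tp_login === login)
      .map((u) => `${u.tp_siteid}|${u.tp_id}`)
  );
  return alluserdata
    .filter((d) => keys.has(`${d.tp_siteid}|${d.tp_author}`))
    .map((d) => d.title);
}

console.log(documentsByLogin("alice")); // [ 'alice-a.docx', 'alice-b.docx' ]
console.log(documentsByLogin("bob"));   // [ 'bob-a.docx' ]
```

Joining on tp_author = tp_id alone would wrongly attribute alice-b.docx to Bob, because id 3 means Bob in site-A but Alice in site-B.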
Update: As others write here, it is not recommended to go directly into the SharePoint database, but I would say use your head and be careful. UPDATEs are an all-caps no-no, but whether SELECTs are acceptable depends on the context.

DO NOT QUERY THE SHAREPOINT DATABASE DIRECTLY!
I wonder if I made that clear enough? :)
You really need to look at the object model available in C#: get an SPSite instance for the site collection, then iterate over its SPWeb objects and the SPList instances that belong to each SPWeb.
Once you have an SPList object, call its GetItems method with an SPQuery that filters for the user you want.
That is the supported way of doing what you want.
You should never go to the database directly: SharePoint isn't designed for that at all, and there is no guarantee (in fact, there's a specific warning) that the database structure will stay the same across versions and upgrades. Additionally, when content is spread over several content databases in a farm, there is no guarantee that a query that runs on one content database will do what you expect on another.
When you use the object model for iteration, also note that you will need to call Dispose() on the SPSite and SPWeb objects that you create.
Oh, and yes, you may have 400 users, but I would bet that you have 30 sites. The information is repeated in the database per site: 30 x 400 = 12,000 entries.

If you are going to use that query in SharePoint, you should know that creating views on the content database or querying the database directly is considered a big no-no. A workaround could be custom code that iterates through the object model and writes the results to your own database. This could be either timer-based or triggered by an event.

You also shouldn't run SELECTs that take locks, i.e. you should add WITH (NOLOCK) to your queries. Some parts of the system are very timeout-sensitive, and if you start introducing locks the system wasn't expecting, you can see it freak out.
But really, you should be doing this via the object model. With something like IronPython, experimenting with the OM becomes almost downright pleasant.

Related

CouchDB replication strategy with dynamic groups of users

This is the situation:
We have a series of users who share some documents. The documents they can share might change throughout the day, so can the documents themselves (changes and deletions). The users can change some information on the documents.
E.g.
Users | Documents
A | X
A | Y
A | Z
B | X
B | Z
C | Y
Possible groups: A+C, A+B
The CouchDB server holds a replica of a SQL Server database with this data; an ETL process takes care of propagating changes to CouchDB. The CouchDB database is in turn replicated to each user's phone via PouchDB.
The goal:
To replicate changes and deletions accordingly.
What we've tried:
1) We figured we'd structure our documents with a list of the users that can access each one. Each document would have a "Users" array, and a filter in the design document would take care of replication to the clients. Unfortunately, document deletions and document changes that no longer pass the filter (e.g. a user is removed from the array) are not present in the filtered _changes feed, so they cannot be replicated to the clients accordingly.
2) Database per user. This is not possible, because users need to see each other's work on the documents (they share them).
3) Database per group of users. Pretty much the same problem as the first solution, but worse. In fact:
- groups of users can change and no longer exist: how do we reflect that client-side?
- a document can shift to a new group: it will have to be redownloaded from scratch, which greatly increases the download size
- the same document can be in more than one group! (see the example above)
- each client would have to know which groups she is in every time she logs in, and replicate multiple databases. Then, on the return trip, you'd have to know in which databases the document was present
Is there a recipe for this situation? Am I missing an obvious solution?
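To make the failure mode of approach 1 concrete, here is a minimal sketch of the kind of filter function the design document would contain. The field name Users comes from the question; the query parameter name `user` is an assumption. CouchDB calls filter functions as function(doc, req) and exposes query_params as req.query. The sketch evaluates the filter against three successive revisions of document X.

```javascript
// Filter function as it might appear in the design document (the name "by_user" is hypothetical).
// Returns true when the requesting user is listed in the document's Users array.
function byUser(doc, req) {
  return Array.isArray(doc.Users) && doc.Users.indexOf(req.query.user) >= 0;
}

// Three successive revisions of document X.
const revisions = [
  { _id: "X", _rev: "1-a", Users: ["A", "B"] },
  { _id: "X", _rev: "2-b", Users: ["A"] },   // B removed from the document
  { _id: "X", _rev: "3-c", _deleted: true }, // plain DELETE: the tombstone has no Users
];

const reqFor = (user) => ({ query: { user: user } });

const seenByB = revisions.filter((r) => byUser(r, reqFor("B"))).map((r) => r._rev);
const seenByA = revisions.filter((r) => byUser(r, reqFor("A"))).map((r) => r._rev);

console.log(seenByB); // [ '1-a' ]
console.log(seenByA); // [ '1-a', '2-b' ]
```

B's replica never learns that X left its scope, and neither replica learns about the deletion, because the tombstone no longer carries the field the filter depends on.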
EDIT
Partial solution for case 1:
localDB.sync(remoteDB, {
    live: true,
    retry: true,
    filter: 'app/by_user',
    query_params: { "agente": agent }
})
.on('paused', function(info){
    console.log("paused");
    localDB.allDocs().then(function(docs){
        console.log("allDocs");
        docs.rows.forEach(function(row){
            console.log(row);
            remoteDB.get(row.id)
            .then(function(doc){
                if(doc.Agents.indexOf(agent) < 0){
                    // doc comes from remoteDB, so its _rev can differ from the
                    // local revision, which is one likely cause of the 409
                    localDB.remove(doc);
                }
            });
        });
    });
})
.on('change', function(result){
    console.log("change!");
    result.change.docs.forEach(function(change) {
        // replicated docs mark deletions with _deleted, not deleted
        if(!change._deleted){
            $rootScope.$apply(function(){
                $rootScope.$broadcast('upsert', change);
            });
        }
    });
});
Each remove() is giving me a 409 (conflict), and rightfully so. Is there a way to tell Pouch "no longer consider this as replicable and just remove it from my DB?"
(3) seems like the simplest solution to me, i.e. the "database per role" solution.
I think your difficulty stems from trying to manage permissions inside the documents themselves (and then using filtering replication). When you do that, you are basically trying to mirror CouchDB's permission system inside your documents, which is going to cause headaches.
Why not create a database per role, and assign roles to users using the normal _users database? If roles change, then users will lose or gain access to a set of documents. You would need to have server endpoints to handle the role-shuffling, or you would need to set up separate "admin" databases with special privileges, where users can change the roles.
Then on the client side, you can either replicate from multiple CouchDB databases into multiple PouchDBs (and then collate the results together yourself), or into a single PouchDB (probably a bad idea if you need to sync bidirectionally). Obviously you would need an initial step where you determine which databases the user has access to, but that's a small downside in my opinion.
Then if the user loses access to a document, they will simply get normal 401 errors during replication (which will show up in the 'denied' event during live replication). No need for ddocs or filtered replication - much simpler!
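A minimal sketch of that initial step and of the collation, in plain JavaScript. The docs_<role> naming scheme is hypothetical, and userCtx is assumed to be the object CouchDB returns from GET /_session. Keeping the highest revision when the same document turns up in several databases is a heuristic of this sketch, not CouchDB behaviour: copies in different databases are independent documents.

```javascript
// Hypothetical naming scheme: one CouchDB database per role, named "docs_<role>".
// System roles such as "_admin" start with an underscore, so skip them.
function databaseUrlsFor(userCtx, baseUrl) {
  return userCtx.roles
    .filter((role) => !role.startsWith("_"))
    .map((role) => `${baseUrl}/docs_${encodeURIComponent(role)}`);
}

// Collate documents replicated from several role databases into one list.
// The same document can live in more than one database, so keep one copy per _id,
// preferring the higher revision generation (the number before "-" in _rev).
function collate(docLists) {
  const byId = new Map();
  for (const docs of docLists) {
    for (const doc of docs) {
      const current = byId.get(doc._id);
      if (!current || parseInt(doc._rev, 10) > parseInt(current._rev, 10)) {
        byId.set(doc._id, doc);
      }
    }
  }
  return [...byId.values()];
}

const urls = databaseUrlsFor({ name: "A", roles: ["group_ab", "group_ac", "_admin"] }, "http://localhost:5984");
const merged = collate([
  [{ _id: "X", _rev: "1-a" }, { _id: "Z", _rev: "1-z" }],
  [{ _id: "X", _rev: "2-b" }, { _id: "Y", _rev: "1-y" }],
]);
console.log(urls.length, merged.length); // 2 3
```

Each URL would then be replicated into its own PouchDB, and the collated list is what the UI displays.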
We arrived at the conclusion that:
1) our use case might not be what CouchDB is good for
2) we value our mental health: after almost a month struggling with this problem, we'd rather try and fail
3) documents are relatively inexpensive, so even if they stay on the user's phone that won't cause any major distress. If the data builds up too much, users can simply clear it and start fresh
Solution:
1) Keep the architecture as in point 1
2) After each 'paused' event fires, compare local docs with remote docs; if the remote doc doesn't pass the filter, remove it from the UI. Should there be a way to remove only the local document, we'd be very interested in upgrading to that logic.
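A minimal sketch of the comparison in step 2, written as a pure function so it can be tested apart from PouchDB. It assumes the filter criterion is the Agents array used in the sync code in the question, and that the caller has already fetched the remote documents; nothing is removed from the local database, the ids are only hidden in the UI.

```javascript
// Decide which locally stored documents the UI should hide after a 'paused' event.
// localIds: ids from localDB.allDocs(); remoteById: the remote versions fetched for them
// (an id is missing when remoteDB.get() failed, e.g. a 404 because it was deleted remotely).
function idsToHide(localIds, remoteById, agent) {
  return localIds.filter((id) => {
    const remote = remoteById[id];
    if (!remote || remote._deleted) return true;     // gone on the server
    return (remote.Agents || []).indexOf(agent) < 0; // agent no longer listed
  });
}

const hidden = idsToHide(
  ["X", "Y", "Z"],
  { X: { _id: "X", Agents: ["A", "B"] }, Y: { _id: "Y", Agents: ["C"] } },
  "A"
);
console.log(hidden); // [ 'Y', 'Z' ]
```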
Option 1) still sounds like the simplest approach to me.
I don't know PouchDB very well, but in plain CouchDB, the problem with changes on deleted documents can be worked around by keeping extra attributes on the deleted document, using your own custom delete function.
I mean, a delete is like an update that sets the _deleted attribute to true.
So, instead of deleting documents directly with the normal CouchDB CRUD DELETE, you can create an update function like this:
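function(doc, req) {
    // optional ACLs for deleting the doc, e.g. check that it is owned by req.userCtx.name
    // doc.users are the users already granted access to this doc
    if (!doc) {
        // no existing document with this id: nothing to delete
        return [null, "Document not found"];
    }
    return [{
        "_id": doc._id,
        "_rev": doc._rev,
        "_deleted": true,
        "users": doc.users
    }, "Ok doc deleted"];
}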
Furthermore, using URL rewrite rules, this update function can also be called when an HTTP DELETE request is submitted (not only on PUT or POST). In this way your delete behaviour becomes totally transparent to the client, and you delete in a way that is more useful for your use case.
The Smileupps Chatty couchapp tutorial uses this approach: extended deletes for different document types are performed in the user/drop.js, profile/drop.js and chat/drop.js files.
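To check that an extended delete really solves the filter problem, the sketch below runs a copy of the update function (with a guard for a missing document) outside CouchDB and feeds the resulting tombstone to a filter keyed on the same users field. Both functions are plain JavaScript stand-ins; the filter and its `user` query parameter are assumptions modelled on approach 1 in the question.

```javascript
// Copy of the update function, with a guard for a missing document
// (CouchDB passes doc = null when no existing document matches the request).
function extendedDelete(doc, req) {
  if (!doc) return [null, "Document not found"];
  return [{ _id: doc._id, _rev: doc._rev, _deleted: true, users: doc.users }, "Ok doc deleted"];
}

// A filter keyed on the same "users" field, as in approach 1 of the question.
function passesUserFilter(doc, req) {
  return Array.isArray(doc.users) && doc.users.indexOf(req.query.user) >= 0;
}

const stored = { _id: "X", _rev: "2-b", title: "Work order", users: ["A", "B"] };
const [tombstone, message] = extendedDelete(stored, {});

console.log(message); // Ok doc deleted
// The extended tombstone still carries "users", so the deletion passes the filter...
console.log(passesUserFilter(tombstone, { query: { user: "B" } })); // true
// ...whereas a plain tombstone does not.
console.log(passesUserFilter({ _id: "X", _rev: "3-c", _deleted: true }, { query: { user: "B" } })); // false
```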

How to perform a relative complement query in CRM?

Background (i.e. what the heck is a relative complement?)
Relative Complement
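In set notation (my gloss on the term, applied to this question): with A the set of contacts that own at least one Car and B the set of contacts that own at least one vehicle that is not a Car, the contacts with only cars are the relative complement of B in A:

```latex
A \setminus B = \{\, x \in A \mid x \notin B \,\}
```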
What I'm trying to do
Let's say I've got a custom Vehicle entity with a VehicleType option set that is either "Car" or "Truck". There is a one-to-many relationship between Contact and Vehicle (i.e. ContactId is on the Vehicle entity). How do I write an XRM query (LINQ to CRM, QueryExpression, FetchXML, whatever) that returns the contacts with only cars?
Option 1:
I’d prefer a modification of the proposal that AdamV makes above. I can’t think of a way to get this particular query answered using LINQ to CRM, QueryExpressions or FetchXML alone. Daryl doesn’t say what the client is, but I would suppose that if LINQ and QueryExpressions are acceptable, .NET is on the table. Creating aggregate fields on the parent entity (Contact in this case) that contain the count of the related entity offers more than the Boolean option: if the query requirements ever changed to a threshold (more than X cars, fewer than Y trucks, between X and Y total vehicles), the Boolean option fails to deliver. The client in this question isn’t known, but I can’t think of many (any?) cases where pulling all the records to the client on a set of 500K+ rows is more efficient than the SQL query that CRM would make on your behalf against several integer fields with range clauses.
Upside:
Maintains client purity in Query approach
Simple client query
Probably as performant as possible
Downside:
Setup of the aggregate fields
Workflow or plugin to manage the increment and decrement of the aggregate fields
SQL Script for initial load of the aggregates.
Risk that aggregate fields get out of sync (workflow or plugin fails)
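A minimal sketch of what the plugin in Option 1 has to do, simulated in plain JavaScript rather than with the CRM SDK. The per-type counter fields and the event shape ({ kind, vehicleType }) are assumptions. A reassignment or a type change would be handled as a delete on the old side plus a create on the new one, which is exactly where the out-of-sync risk comes from.

```javascript
// Hypothetical aggregate fields on a contact: one counter per vehicle type,
// maintained by a plugin or workflow as vehicles are created and deleted.
function applyVehicleEvent(counts, event) {
  const delta = event.kind === "create" ? 1 : event.kind === "delete" ? -1 : 0;
  const next = { ...counts };
  next[event.vehicleType] = (next[event.vehicleType] || 0) + delta;
  return next;
}

// "Only cars" becomes a simple condition on integer fields:
// at least one car, and no vehicles of any other type.
function hasOnlyCars(counts) {
  const cars = counts.Car || 0;
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  return cars >= 1 && cars === total;
}

let contact = {};
contact = applyVehicleEvent(contact, { kind: "create", vehicleType: "Car" });
contact = applyVehicleEvent(contact, { kind: "create", vehicleType: "Truck" });
console.log(hasOnlyCars(contact)); // false
contact = applyVehicleEvent(contact, { kind: "delete", vehicleType: "Truck" });
console.log(hasOnlyCars(contact)); // true
```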
Option 2:
If purity within the client isn’t essential and .NET is on the table, skip the aggregate fields and the setup and just run SQL against the Views. If you don’t want to work with raw ADO.NET, a thin ORM like Dapper, Massive, or PetaPoco can still give you an object model. As Andreas offers in his comment on the OP’s first answer, it seems like something fairly trivial to do in SQL.
Sketching something off the top of my head:
SELECT c.*
FROM Contact c
WHERE c.ContactId IN (
    SELECT v.ContactId
    FROM Vehicle v
    WHERE v.Type = 'Car'
)
AND c.ContactId NOT IN (
    SELECT v.ContactId
    FROM Vehicle v
    WHERE v.Type <> 'Car'
      AND v.ContactId IS NOT NULL -- a NULL here would make NOT IN match nothing
)
Upside:
Much less work
CRM Entities get left alone
Downside:
Depending on the client and/or the application, mixing data access methods is a bit kludgy.
Likely less performant than Option 1
Option 3:
Mix and Match: Take the aggregate fields from Option 1. But update them using a scheduled SQL job (or something similar) with a query similar to the initial load job you’d need to write in Option 1
Upside:
Takes most of the work and risk out of Option 1
Keeps all of the performance of Option 1
Downside:
Some will see this as an unsupported feature.
In order to perform a true relative complement query, you need to be able to perform a subquery.
Your query would basically say: give me all the contacts with cars, and then, within those results, remove any contacts that have a vehicle that isn't a car. This is what the SQL in Jason Koopmans' answer does. Unfortunately, CRM does not support subqueries.
Therefore, the only way to achieve this is either to perform the subquery on the client side, as I resorted to doing, or to store the results of what would be the subquery in a form that the main query can access (i.e. storing counts on the Contact entity).
You could theoretically do this "on the fly" by creating a SubQueryResult entity that stores a ContactId and a SubQueryId. You'd first pull back the contacts that have at least one car and create a SubQueryResult record for each, with its ContactId and a single SubQueryId, generated client side, that ties them all together.
Then you'd do another query that says give me all the contacts that are in this SubQueryResult with this SubQueryId, that do not have any vehicles that aren't cars.
I can only assume that this wouldn't be any more efficient than performing the two separate queries and filtering client side, although with the new ExecuteMultipleRequest in the new CRM release, it may be close.
I have resorted to pulling back all of my records in CRM, and performing the check on the client side since CRM 2011 doesn't support this via Query Expressions.
You could write two Fetch XML statements, one to return all contacts and the count of their vehicles, and another to return all contacts and the count of their cars, then compare the list on the client side. But once again, you're having to return every contact and filter it client side.
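Once the records are on the client, the filter itself is cheap: one pass over the vehicles builds two sets, and one pass over the contacts takes their difference. The property names contactId and vehicleType are assumptions about how the retrieved records have been mapped; the sketch uses plain JavaScript objects rather than any CRM SDK types.

```javascript
// Client-side relative complement: contacts that have at least one car,
// minus contacts that have at least one vehicle that is not a car.
function contactsWithOnlyCars(contacts, vehicles) {
  const withCar = new Set();
  const withOther = new Set();
  for (const v of vehicles) {
    if (v.vehicleType === "Car") withCar.add(v.contactId);
    else withOther.add(v.contactId);
  }
  return contacts.filter((c) => withCar.has(c.contactId) && !withOther.has(c.contactId));
}

const contacts = [1, 2, 3, 4].map((id) => ({ contactId: id }));
const vehicles = [
  { contactId: 1, vehicleType: "Car" },
  { contactId: 1, vehicleType: "Car" },
  { contactId: 2, vehicleType: "Car" },
  { contactId: 2, vehicleType: "Truck" },
  { contactId: 3, vehicleType: "Truck" },
];
console.log(contactsWithOnlyCars(contacts, vehicles).map((c) => c.contactId)); // [ 1 ]
```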
It's not tested but how about this query expression? I'm linking in the Vehicle entity as an inner join, requiring that it's a Car. I'm assuming that the field VehicleType is a String because I'm a bit lazy and don't want to test it (I'm typing this hardcore style, no compilation - pure brain work).
Optionally, you might want to add a Criteria section as well to control which of the Contact instances that actually get retrieved. Do tell how it went!
Sorry for the verbosity. I know you like it short. My brains work better when circumlocutory.
new QueryExpression
{
    EntityName = "contact",
    ColumnSet = new ColumnSet("fullname"),
    LinkEntities =
    {
        new LinkEntity
        {
            JoinOperator = JoinOperator.Inner,
            LinkFromEntityName = "contact",
            LinkFromAttributeName = "contactid",
            LinkToEntityName = "vehicle",
            LinkToAttributeName = "contactid",
            Columns = new ColumnSet("vehicletype"),
            EntityAlias = "Vroom",
            //LinkCriteria = { Conditions =
            //{
            //    new ConditionExpression(
            //        "vehicletype", ConditionOperator.Equal, "car")
            //} }
            LinkCriteria = { Conditions =
            {
                new ConditionExpression(
                    "vehicletype", ConditionOperator.NotEqual, "truck")
            } }
        }
    }
};
EDIT:
I've talked to MVP Gustaf Westerlund, and he suggested the following workaround. Let me stress that it's not an answer to your original question. It's just a way to solve it. And it's cumbersome. :)
So, the hint is to add a flag to the Contact (or Person) entity. Then, every time a new instance of Vehicle is created, a plugin registered on that message updates the flag on the owning Contact to record the new Vehicle.
This has several drawbacks.
It requires us to do stuff.
It's not the straight-forward do-this-and-that type of approach.
Maintenance is higher for every new type of Vehicle one adds.
Buggibility is elevated since there are many cases to consider (what happens to the flagification when a Vehicle instance is reassigned, deleted etc.).
So, my answer to your question is changed to: "can't be done". This remains effective until (gladly) proven wrong by presented alternative solution. Duck!
Personally, I'd fetch (almost) everything and unleash the hounds of LINQ onto it. But I'd do that without smiling nor proud. :)

CouchDB views - Multiple join... Can it be done?

I have three document types MainCategory, Category, SubCategory... each have a parentid which relates to the id of their parent document.
So I want to set up a view so that I can get a list of SubCategories which sit under the MainCategory (preferably just using a map function)... I haven't found a way to arrange the view so this is possible.
I currently have set up a view which gets the following output -
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which query string parameters would I use to get, say, the rows whose key starts with ["22945", ... when all I have at query time is the id "11810" (at query time I don't know the id "22945")?
If any of that makes sense.
Thanks
The way you store your categories seems to be suboptimal for the query you try to perform on it.
MongoDB.org has a page on various strategies for implementing tree structures (they should apply to CouchDB and other document databases as well). You should consider the Array of Ancestors pattern, where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.
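A minimal sketch of the Array of Ancestors idea as a CouchDB view. The documents, the ancestors field and the [ancestorId, type] key layout are assumptions (ids borrowed from the sample output); the map function uses ES5 syntax so it also suits CouchDB's JavaScript engine, and a tiny stand-in for emit lets the view be tried locally.

```javascript
// Made-up documents: each stores the ids of all its ancestors, root first.
const docs = [
  { _id: "11810", type: "MainCategory", ancestors: [] },
  { _id: "22945", type: "Category", ancestors: ["11810"] },
  { _id: "33123", type: "SubCategory", ancestors: ["11810", "22945"] },
  { _id: "33453", type: "SubCategory", ancestors: ["11810", "22945"] },
  { _id: "22056", type: "Category", ancestors: ["11098"] },
  { _id: "33610", type: "SubCategory", ancestors: ["11098", "22056"] },
];

// View map function: one row per ancestor, keyed by [ancestorId, type].
function subcategoriesByAncestor(doc) {
  var ancestors = doc.ancestors || [];
  for (var i = 0; i < ancestors.length; i++) {
    emit([ancestors[i], doc.type], null);
  }
}

// Minimal stand-in for CouchDB's view engine, for trying the map function locally.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}
for (const doc of docs) {
  currentId = doc._id;
  subcategoriesByAncestor(doc);
}

// Equivalent of querying the view with key=["11810","SubCategory"].
const wanted = JSON.stringify(["11810", "SubCategory"]);
const result = rows.filter((r) => JSON.stringify(r.key) === wanted).map((r) => r.id);
console.log(result); // [ '33123', '33453' ]
```

Against a real CouchDB, the same lookup would pass key=["11810","SubCategory"] (URL-encoded) to the view, optionally with include_docs=true to get the documents themselves.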

Query and/or Search for SharePoint Document ID

We have a SharePoint 2010 environment with Document IDs enabled.
Given (part of) a Doc ID, we want to programmatically retrieve the document(s) matching that ID. The problem seems to be that this column is rather special, in that it might need special handling.
Using an SPSiteDataQuery, fetching the _dlc_DocId field as part of the ViewFields works fine. However, including it in the Where clause of the query never results in any documents being fetched.
Using the Search API has gotten us nowhere at all.
Has anyone pulled this off, or any suggestions on how to tackle this problem?
[Update] Turns out we were fooled by subtle errors in the XML and bad debugging misinterpretations. This stuff just works fine.
I don't normally contribute to these sorts of things because cleverer people than I always get there before me, but as this is an old one with no proper answer I think I'll add my thoughts for those who find this page.
I was struggling with this, but after a little digging around and learning a bit of CAML I got this working.
I am using the SharePoint Client Object Model against SharePoint 2010 and Office365 beta.
Start off your query by looking at the all list items query:
Microsoft.SharePoint.Client.CamlQuery.CreateAllItemsQuery().ViewXml
"<View Scope=\"RecursiveAll\">\r\n <Query>\r\n </Query>\r\n</View>"
Add a Where child inside the Query element.
Then add in
<Eq><FieldRef Name="_dlc_DocId" /><Value Type="Text">MDXC2KE55ASN-3-80</Value></Eq>
replacing MDXC2KE55ASN-3-80 with the doc ID you are looking for inside the where.
Also don't forget you might want to make use of these too:
<ViewFields><FieldRef Name="_dlc_DocId" /></ViewFields>
<RowLimit>1</RowLimit>
Then use the List.GetItems() method to bring back the ListItemCollection.
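The steps above can be collected into one helper that produces the ViewXml string. This is a JavaScript sketch and the function names are made up; in the JavaScript client object model the string would be handed to an SP.CamlQuery via set_viewXml before calling getItems on the list, and from C# it goes into CamlQuery.ViewXml.

```javascript
// Escape the characters that are special in XML element content.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build the ViewXml described above for a given Document ID.
function docIdViewXml(docId) {
  return (
    '<View Scope="RecursiveAll">' +
    "<Query><Where><Eq>" +
    '<FieldRef Name="_dlc_DocId" />' +
    '<Value Type="Text">' + escapeXml(docId) + "</Value>" +
    "</Eq></Where></Query>" +
    '<ViewFields><FieldRef Name="_dlc_DocId" /></ViewFields>' +
    "<RowLimit>1</RowLimit>" +
    "</View>"
  );
}

console.log(docIdViewXml("MDXC2KE55ASN-3-80"));
```

For a partial ID, as in the original question, CAML's BeginsWith or Contains operators could take the place of Eq; whether they behave well on _dlc_DocId is worth verifying on your own farm.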
Just in case nobody comes with a slick solutions from the depths of the Sharepoint infrastructure:
What would Google Do?
Slice it, dice it and dump it in a reverse index.
Solr and Lucene offer supreme tools for this. The idea is to cut the DocId's in small pieces and add the location of the document to the bucket for that piece.
Say we have "A real nice document" with Id ABCD123. You would add it to the buckets
ABCD, BCD1, CD12, D123
When searching for a partial ID (plus other data like dates, types, ...), you (well, the search engine) take the intersection of the buckets for the pieces of that ID and apply the additional constraints.
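A toy, in-memory version of the slicing idea, to show the mechanics rather than a Solr or Lucene configuration. It uses 4-character pieces to match the ABCD123 example; the class and method names are made up. A query's pieces are intersected, and candidates are then confirmed with a plain substring test, because having every piece somewhere in an ID does not guarantee they appear contiguously.

```javascript
const PIECE = 4;

// All substrings of length PIECE, e.g. "ABCD123" -> ABCD, BCD1, CD12, D123.
function pieces(id) {
  const out = [];
  for (let i = 0; i + PIECE <= id.length; i++) out.push(id.slice(i, i + PIECE));
  return out;
}

class PieceIndex {
  constructor() {
    this.buckets = new Map(); // piece -> Set of document locations
    this.docIds = new Map();  // location -> full document id
  }

  add(docId, location) {
    this.docIds.set(location, docId);
    for (const p of pieces(docId)) {
      if (!this.buckets.has(p)) this.buckets.set(p, new Set());
      this.buckets.get(p).add(location);
    }
  }

  // Locations whose document id contains `partial`.
  search(partial) {
    const wanted = pieces(partial);
    if (wanted.length === 0) return []; // shorter than PIECE: the index cannot help
    let candidates = null;
    for (const p of wanted) {
      const bucket = this.buckets.get(p) || new Set();
      candidates = candidates === null
        ? new Set(bucket)
        : new Set([...candidates].filter((loc) => bucket.has(loc)));
    }
    // Every piece being present does not guarantee a contiguous match, so confirm it.
    return [...candidates].filter((loc) => this.docIds.get(loc).includes(partial)).sort();
  }
}

const index = new PieceIndex();
index.add("ABCD123", "/docs/a-real-nice-document.docx");
index.add("XBCD129", "/docs/another.docx");
index.add("BCD1QCD12", "/docs/decoy.docx"); // has BCD1 and CD12, but not BCD12

console.log(pieces("ABCD123"));    // [ 'ABCD', 'BCD1', 'CD12', 'D123' ]
console.log(index.search("D123")); // [ '/docs/a-real-nice-document.docx' ]
console.log(index.search("BCD12")); // [ '/docs/a-real-nice-document.docx', '/docs/another.docx' ]
```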
To make this happen, you need to write a spider for the SharePoint server and a routine that makes a record of the data elements to be indexed.
Put a nice REST interface in front of it (actually, Solr already has one), integrate it into the main SharePoint server, and nobody needs to know there is something else running behind it.
These products can also update their indexes incrementally, so they can be kept up to date.
You could use the following to get the Document ID:
SPFile file = MethodToUploadFileToServer(web, filepath);
SPListItem item = file.Item;
string DocID = item.Properties["_dlc_DocId"].ToString();

Link data in custom SQL db with document library

Environment:
I have a Windows network-shared desktop application written in C# that uses an MS SQL database. Windows SharePoint Services 3.0 is installed (default installation, single processor, default SQL Express content database and so on) on the same Windows Server 2003 machine.
Scenario:
The application generates MS Word documents during processing (creating work orders) that need to be saved on sharepoint, and the result of the process must be linked to the corresponding document.
So, for each insert in dbo.WorkOrders (one work order), there is one MS Word document. I would need to save the document ID from the SharePoint library to my database so that, later on, manual corrections can be made to the related document. When a work order is deleted, the SharePoint document would also have to be deleted.
Also, there is a dbo.Jobs table which is parent to dbo.WorkOrders and can have several work orders.
I was thinking about making a custom list on SharePoint that would have two ID fields: one for the document's ID and the other for the AutoID of the document. I don't think this would be a good way performance-wise, and it requires too much upkeep, so it's more error-prone.
Another path I was contemplating is metadata. I could have an Identity field in dbo.WorkOrders that would be unique and auto incremented, and I could save that value as a file name (1.docx, 2.docx 3.docx ... n.docx where n would be the value in dbo.WorkOrder's identity field). In the metadata field of the Word document, I could save the job ID from dbo.Jobs.
I could also just increment the identity field in the WorkOrder (it would be a bigint), but then the file names would get ugly and maybe I'd overflow the ID range (since there could be a lot of documents).
There are other options I have considered and dismissed, since none of them satisfied the requirements (linked data sources, subfolder structures etc.). I'm not sure how to proceed. I'm new to SharePoint and it's still a bit of a mystery to me, as I don't understand all the inner workings of the system.
What do you suggest?
Edit:
I think I'll be using guid as file names and save those guids in my database after sending documents to sharepoint. What do you think of that?
All the documents in SharePoint under the same content database (SQL database) are stored in the same table; that said, you have a unique ID for files no matter where they are in the SharePoint structure.
When retrieving files by their UniqueId, the API only lets you get them if you also know their SPWeb. So, for each record in your external database (or your custom list), you could store the SPFile GUID and the SPWeb GUID, and retrieve the file with:
using (SPWeb subweb = SPContext.Current.Site.OpenWeb(new Guid("{000...}")))
{
    SPFile file = subweb.GetFile(new Guid("{111...}"));
    // file logic
}
P.S.: As Colin pointed out, URL retrieval is possible but messy. I also changed the SPSite to the context, since you are always under the same site collection in my example.
Like F.Aquino said, all items in SharePoint already have a UniqueId field (i.e. SPListItem.UniqueId and SPFile.UniqueId), which is a GUID. Save that to your database, along with your web's GUID. Then you can use the code provided by F.Aquino to get the file, or even the byte[] of its stream.
P.S. for F.Aquino, your code leaves the SPSite in memory, use this instead:
P.P.S this is just clarification, mark F.Aquino as the answer.
using (SPSite site = new SPSite("http://url"))
{
    using (SPWeb subweb = site.OpenWeb(new Guid("{000...}")))
    {
        SPFile file = subweb.GetFile(new Guid("{111...}"));
        // file logic
    }
}
