Retrieve objects by container in LDAP - ldap-query

Does anyone know if it is possible in an LDAP query to filter objects according to the container they are in?
For example, I would like to return all users in the OU called staff, maybe something like this:
(& (objectCategory=user) (containerOU=Staff))
Obviously I just made up the containerOU bit, I'm just trying to illustrate what I mean.
At the moment, the only way I can do this is to bring back the entire Staff OU and iterate through it. I'm a SQL man, so I'm used to being able to specify exactly what I want.
Thanks
David

You could search by the object's distinguishedName.
(distinguishedName=*OU=Staff, DC = blabla, DC = com)
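As far as I know, Active Directory does not support wildcard matching on DN-valued attributes such as distinguishedName, so a common alternative is to leave the container out of the filter and use the OU as the search base instead. The sketch below is plain Python with no LDAP library. `build_search` is a hypothetical helper that only assembles the parameters you would hand to whatever client you use, `is_in_container` is a naive client-side check that ignores escaped commas in DNs, and the DC=example,DC=com names are placeholders.

```python
def normalize_dn(dn):
    """Lower-case a DN and strip spaces around components (naive: ignores escaped commas)."""
    cleaned = []
    for part in dn.split(","):
        key, _, value = part.partition("=")
        # Normalise "DC = com" and "dc=com" to the same text.
        cleaned.append(f"{key.strip().lower()}={value.strip().lower()}")
    return ",".join(cleaned)


def is_in_container(dn, container_dn):
    """Return True if dn sits anywhere below container_dn."""
    return normalize_dn(dn).endswith("," + normalize_dn(container_dn))


def build_search(container_dn, one_level=True):
    """Assemble search parameters: the container is the base, not part of the filter."""
    return {
        "base": container_dn,
        "scope": "onelevel" if one_level else "subtree",
        "filter": "(&(objectCategory=person)(objectClass=user))",
    }


params = build_search("OU=Staff,DC=example,DC=com")
```

With the OU as the base, the server only returns entries from that container, so nothing has to be filtered after the fact.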

Related

finding organization and industry/sector from string in dbpedia

I am generating a short list of 10 to 20 strings which I want to lookup on dbpedia to see if they have an organization tag and if so return the industry/sector tag. I have been looking at the SPARQLwrapper queries on their website but am having trouble constructing one that returns organization and sector/industry for my string. Is there a way to do this?
If I use the code below I get a list of industry types I think rather than the industry of the company.
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
SELECT ?industry WHERE
{ <http://dbpedia.org/resource/IBM> a ?industry}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
Instead of looking at queries which are meant to help you understand the querying tool, you should start by looking at the data which is being queried. For instance, just click http://dbpedia.org/resource/IBM, and look at the properties (the left hand column) to see its rdf:type values (of which there are MANY)!
Note that IBM is not described as a ?industry. IBM is described as a <http://dbpedia.org/resource/Public_company> (among other things). On the other hand, IBM is also described as having three values for <http://dbpedia.org/ontology/industry> --
<http://dbpedia.org/resource/Cloud_computing>
<http://dbpedia.org/resource/Information_technology>
<http://dbpedia.org/resource/Cognitive_computing>
I don't know whether these are what you're actually looking for or not, but hopefully what I've done above will start you down the right path to whatever you do want to get out of DBpedia.
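To make that concrete, here is a minimal sketch of a query that asks for the industry property rather than the rdf:type. `build_industry_query` and `extract_values` are hypothetical helper names, and the `sample` dictionary is a hand-written stand-in for what `sparql.query().convert()` returns in JSON format, filled with the three values listed above rather than a live result.

```python
def build_industry_query(resource_uri):
    """Return a SPARQL query asking for the dbo:industry values of one resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?industry WHERE {\n"
        f"  <{resource_uri}> dbo:industry ?industry .\n"
        "}"
    )


def extract_values(results, variable):
    """Pull the plain values for one variable out of SPARQL JSON results."""
    return [
        binding[variable]["value"]
        for binding in results["results"]["bindings"]
        if variable in binding
    ]


query = build_industry_query("http://dbpedia.org/resource/IBM")

# Stand-in for the converted JSON response; not fetched from the endpoint.
sample = {
    "results": {
        "bindings": [
            {"industry": {"type": "uri", "value": "http://dbpedia.org/resource/Cloud_computing"}},
            {"industry": {"type": "uri", "value": "http://dbpedia.org/resource/Information_technology"}},
            {"industry": {"type": "uri", "value": "http://dbpedia.org/resource/Cognitive_computing"}},
        ]
    }
}
industries = extract_values(sample, "industry")
```

The query string can be passed to `sparql.setQuery(...)` exactly as in the question. To require the organization tag as well, you could probably add a pattern such as `<uri> a dbo:Organisation .` (I believe the DBpedia ontology uses the British spelling, but check the resource page).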

groovyldap, search only returning 5000 results

I am doing an LDAP search with groovyldap, the search returns the group that I am looking for but only returns 5000 members of the group:
def getGroupMembers() {
    def ldap = LDAP.newInstance(connectionInformation.hostname, connectionInformation.user, connectionInformation.password)
    def result = ldap.search("CN=mygroup", "OU=foo,DC=bar,DC=blech", SearchScope.SUB)
    def members = result["member;range=0-4999"]
    members = members[0]
}
Yes, there is actually a field returned with the key "member;range=0-4999", and the "members" array has 5000 elements in it. I couldn't find any setting in the LDAP code to enable returning all members, but it seems logical to think that I should be able to fetch all of the results.
Microsoft Active Directory implements LDAP policies using objects of the queryPolicy class.
It appears that the one you are encountering is MaxValRange, which controls the number of values returned when retrieving a multi-valued attribute of an entry.
In Microsoft Active Directory 2008 (and, I assume, later) this value is hardcoded: although it can be modified, the change has no effect.
If an attribute has more values than MaxValRange allows, you may use the LDAP_SERVER_RANGE_OPTION_OID "control" from LDAP to retrieve the values beyond the MaxValRange limit.
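The range option works by asking for the attribute in slices, for example "member;range=5000-*" after a first reply keyed "member;range=0-4999". The server marks the final slice with "*" as its upper bound. A minimal sketch of that loop follows, in plain Python rather than groovyldap: `fetch_attribute` is a hypothetical callable standing in for your LDAP client, and `fake_directory` is a test double, not real directory data.

```python
import re

RANGE_KEY = re.compile(
    r"^(?P<attr>[^;]+);range=(?P<start>\d+)-(?P<end>\d+|\*)$", re.IGNORECASE
)


def fetch_all_values(fetch_attribute, attr="member"):
    """Collect every value of a multi-valued attribute by walking its ranges.

    fetch_attribute is a hypothetical callable: given a name such as
    "member;range=0-*", it returns (returned_key, values) for the entry.
    """
    values = []
    start = 0
    while True:
        returned_key, chunk = fetch_attribute(f"{attr};range={start}-*")
        values.extend(chunk)
        match = RANGE_KEY.match(returned_key)
        # The last slice comes back with "*" as its upper bound.
        if match is None or match.group("end") == "*":
            return values
        start = int(match.group("end")) + 1


def fake_directory(requested):
    """Test double that serves 12,000 members in slices of at most 5,000."""
    members = [f"CN=user{i},OU=foo,DC=bar,DC=blech" for i in range(12000)]
    start = int(RANGE_KEY.match(requested).group("start"))
    chunk = members[start:start + 5000]
    last = start + len(chunk) - 1
    end = "*" if last == len(members) - 1 else str(last)
    return f"member;range={start}-{end}", chunk


all_members = fetch_all_values(fake_directory)
```

In a real client, each call would be a base-scope search on the group entry that requests only the ranged attribute name.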
I have two possible answers for you. (Sorry, it's been a while since I've played with LDAP / ActiveDirectory).
You may be hitting AD page size limits. It is worth reading up on page size limits; the easiest fix here is to implement paging in your LDAP queries.
How exactly to do that in groovyldap is, unfortunately, an exercise for the reader, but I've done it with nodeldap. I think it emits a page event(?); it's been a while since I've done this.
You may have a range query. I don't know much about these, but there are some thoughts on that in another StackOverflow answer.

global identifiers? - iCloud + Core Data + Ensembles - duplicates when deleting objects

I am trying to implement iCloud sync in my Core Data app. I am not that pro in programming and this is really an advanced topic I learned... I found that Core Data sync Framework "Ensembles" by Drew McCormack. It seems to make iCloud Sync much easier.
I integrated it in my App and syncing does work quite well as long as I add new objects to my Core Data model. But when I delete an object, it creates duplicates. And then duplicates from duplicates. I ended up having the same Entry (object) like 3-4 times...
Why is that? What am I doing wrong? I did some research and my guess is that global identifiers could solve this?
What are global identifiers? My guess is that they help to avoid duplicates!? But how do I set this? I really have no idea, did a lot of research but couldn't find an answer to that.
Thanks for help!
Update:
Thanks for help! I read the readme and the book, but since I am a beginner not everything is clear to me.
I think I understand the use of global identifiers in Ensembles now, but I don't know if I'm doing it correctly.
If I understand it right, I have to assign an identifier to each object. I can do this by storing it in an attribute. This identifier can be anything as long as it is unique and a NSString?
In my app the user can store different things, let's say name, text, title, date and so on. The app is based on the Master-Detail-View template in Xcode and uses Core Data. My Core Data model has only a single entity with some attributes; most are strings, plus an NSDate. No relationships or anything. If the user hits "+" a new object is created and I store the things the user enters in the attributes.
What I did to add global identifiers is to add a new attribute that stores it.
So when a new object is created I do
/// I did find that to use as identifier !?
NSString *taskUniqueStringKey = newManagedObject.objectID.URIRepresentation.absoluteString;
/// and store it in the attribute.
[newManagedObject setValue:taskUniqueStringKey forKey:@"coreDataObjectID"];
Then I use this:
- (NSArray *)persistentStoreEnsemble:(CDEPersistentStoreEnsemble *)ensemble globalIdentifiersForManagedObjects:(NSArray *)objects
{
    return [objects valueForKeyPath:@"coreDataObjectID"];
}
This seems to work for me. But am I doing it right? Is this the right place to assign a global identifier? I have no awakeFromInsert !?
If this is working, I have the next problem. My app is already live, and older entries that the user saved before the update will be missing the global identifier. What can I do about that? I thought about what I already have that is unique, and the only thing I can think of is an attribute that saves [NSDate date] when the object is created.
I was trying to use this but I failed because Ensembles will only accept NSString and not NSDate!? Can I use this date attribute? Is it unique enough to work as a global identifier? And if yes, could you please give me a code example of how to convert it from a date to a string?
Syncing with Ensembles works quite well. No duplicates anymore; you can just switch off iCloud and the entries stay, then switch it on again and it syncs like it should without losing locally stored objects. Ensembles is really cool! I am seeing some minor strange behaviors: sometimes sync takes long, sometimes it's really quick, and if I edit things in a short time period on two different devices it gets a bit messed up, like an object that I just deleted reappearing. But I guess that's normal? If I take some time between using the app on the different devices everything works fine.
Do I understand it right, there is only that one method to call for sync:
- (void)syncWithCompletion:(void(^)(void))completion
{
    if (self.ensemble.isMerging) return;
    if (!self.ensemble.isLeeched) {
        [self.ensemble leechPersistentStoreWithCompletion:^(NSError *error) {
            if (error) NSLog(@"Error in leech: %@", error);
            if (completion) completion();
        }];
    }
    else {
        [self.ensemble mergeWithCompletion:^(NSError *error) {
            if (completion) completion();
        }];
    }
}
and you just call it if needed? There is nothing else like doing merge without leeching before, or a method like "this is the actual status - save it like it is now" ?
There are different points in the app where you want to sync. On app start and when terminating will be a good point. In my app there are two points where I should sync I guess: when adding an object and save it to Core Data and when I save changes to the object. I could also provide a button like "sync now". Is this a good approach and do I always just call
[self syncWithCompletion:NULL];
Another question that came up: can I exclude objects from sync with Ensembles? My app loads tutorial entries as objects once on first app start. I don't want to sync them, if that's possible somehow.
Thanks a lot for your help! If I could help you with anything, like localizing in German, let me know! ;)
Yes, this is almost certainly due to not setting up global identifiers for your objects, or at least not doing it properly.
When you leech your ensemble, the local persistent store is imported into the sync data. Without global identifiers, Ensembles will assign random ids to your objects, so it can track them across devices.
Duplicates arise when you leech a second device that has the same data. Ensembles has no way to know that the data represents the same logical objects as on the other device, so it again assigns random ids. Effectively, it treats the objects on each device as being completely independent, so that all end up in your data set after syncing.
The solution is global identifiers. By implementing a CDEPersistentStoreEnsemble delegate method, you can provide Ensembles with global ids, which it can use to identify which objects on different devices belong together.
What should you use for global ids? Often, just a UUID, though for singleton like objects you will just want to pick an id.
You can initialize them in awakeFromInsert. You can store the global ids in attributes on your entities. (Note that if you are migrating an existing app, you will want to check with a fetch if the global ids have been generated BEFORE you try to leech the store for syncing.)
More details are in the README on GitHub and in the book at leanpub.
Update
To answer your update questions:
Yes, an identifier just has to be a string, and immutable. It should not change once assigned.
The NSManagedObjectID is not a very good global identifier, in that it will be different on different devices. We really want something that is global across devices.
If you are starting from scratch, using NSUUID is a good approach. Just create a unique id, and store it in the object.
If you have an existing app, and it has been syncing via another mechanism, you need to come up with a way to provide the same global identifiers on each device. One way to do that is mash up the object properties in some way. Usually that will give you a pretty-close-to-unique value, and it will be good enough for the transition.
As an example, you do a quick fetch, and discover that your objects don't yet have global ids. You go through the objects, and set the global ids to a string comprised of creationDate + text. (You could even shorten this by taking a hash, but it probably isn't that important.) After this initial 'migration' to global identifiers, you would just use UUIDs for any newly created objects.
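The idea of deriving a repeatable identifier from existing properties can be sketched as follows. This is Python rather than Objective-C, purely to illustrate the logic; `legacy_global_id` and `new_global_id` are hypothetical names, and the sketch assumes the creation date is stored with a time zone so every device produces the same timestamp text.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def legacy_global_id(creation_date, text):
    """Derive a repeatable id from properties every device already has."""
    # Fixed-precision epoch seconds, so every device formats the date identically.
    stamp = f"{creation_date.timestamp():.3f}"
    return hashlib.sha256(f"{stamp}|{text}".encode("utf-8")).hexdigest()


def new_global_id():
    """Objects created after the migration simply get a random UUID string."""
    return str(uuid.uuid4())


created = datetime(2014, 3, 1, 12, 30, tzinfo=timezone.utc)
id_on_phone = legacy_global_id(created, "Buy milk")
id_on_tablet = legacy_global_id(created, "Buy milk")
```

Because both devices hash the same inputs, they arrive at the same string without talking to each other, which is what lets the existing objects line up on the first sync.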
Note that you don't have to use awakeFromInsert. That is simply a convenient place to put it. As long as you assign the global identifier before saving the object you should be fine.
The easiest way to get a string from an NSDate is to call the description method, but another way would be to get a double using timeIntervalSince1970, and turning that into a string. (Be careful with dates as unique identifiers on their own: often objects created together will have the same creation date.)
You are correct about how you should do a sync: you can simply call syncWithCompletion:.
To answer the question about excluding objects: You can't exclude individual objects, mainly because it could become tricky when those objects have relationships to synced objects. You can handle these objects in one of two ways:
Put them in a separate persistent store, and add that store to the same persistent store coordinator.
Sync the objects, but give them global ids manually, so that the objects are treated the same on each device. Eg. You could just give global ids as 'Sample1', 'Sample2', etc.
To integrate Drew's answer, I guess the two steps are the following.
1. Implement the CDEPersistentStoreEnsemble delegate method (see README)
- (NSArray *)persistentStoreEnsemble:(CDEPersistentStoreEnsemble *)ensemble
    globalIdentifiersForManagedObjects:(NSArray *)objects {
    return [objects valueForKeyPath:@"yourUniqueIdentifier"];
}
2. Generate the unique identifier in the NSManagedObject subclass
- (void)awakeFromInsert {
    [super awakeFromInsert];
    if (!self.yourUniqueIdentifier) {
        self.yourUniqueIdentifier = [[NSUUID UUID] UUIDString];
    }
}
In awakeFromInsert you can initialize special default property values, like for example an identifier.
The check is necessary, for example, when you have parent-child contexts. Otherwise you are overwriting the identifier previously set. See Why is awakeFromInsert called twice?.

Query and/or Search for SharePoint Document ID

We have a SharePoint 2010 environment with Document IDs enabled.
Given (part of) a Doc ID, we want to programmatically retrieve the document(s) matching that ID. The problem seems to be that this column is rather special, in that it might need special handling.
Using an SPSiteDataQuery, fetching the _dlc_DocId field as part of the viewfields works fine. However, including it as part of the where query never results in any documents being fetched.
Using the Search API has gotten us nowhere at all.
Has anyone pulled this off, or any suggestions on how to tackle this problem?
[Update] Turns out we were fooled by subtle errors in the XML and bad debugging misinterpretations. This stuff just works fine.
I don't normally contribute to these sorts of things because cleverer people than I always get there before me, but as this is an old one with no proper answer I think I'll add my thoughts for those who find this page.
I was struggling with this, but after a little digging around and learning a bit of CAML I got this working.
I am using the SharePoint Client Object Model against SharePoint 2010 and Office365 beta.
Start off your query by looking at the all list items query:
Microsoft.SharePoint.Client.CamlQuery.CreateAllItemsQuery().ViewXml
"<View Scope=\"RecursiveAll\">\r\n <Query>\r\n </Query>\r\n</View>"
Put a <Where> element inside the <Query> element, then add this inside the <Where>:
<Eq><FieldRef Name="_dlc_DocId" /><Value Type="Text">MDXC2KE55ASN-3-80</Value></Eq>
replacing MDXC2KE55ASN-3-80 with the doc ID you are looking for.
Also don't forget you might want to make use of these too:
<ViewFields><FieldRef Name="_dlc_DocId" /></ViewFields>
<RowLimit>1</RowLimit>
Then use List.GetItems() method to bring back the ListItemCollection.
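Putting those pieces together, the finished view XML might be assembled as in the sketch below. It uses Python's standard XML library only to show where each element nests; `build_doc_id_view_xml` is a hypothetical helper, and the resulting string is what you would assign to the CamlQuery's ViewXml in your own client code.

```python
import xml.etree.ElementTree as ET


def build_doc_id_view_xml(doc_id, row_limit=1):
    """Build a CAML view that matches one Document ID exactly."""
    view = ET.Element("View", {"Scope": "RecursiveAll"})

    # <Query><Where><Eq>...</Eq></Where></Query>
    query = ET.SubElement(view, "Query")
    where = ET.SubElement(query, "Where")
    eq = ET.SubElement(where, "Eq")
    ET.SubElement(eq, "FieldRef", {"Name": "_dlc_DocId"})
    value = ET.SubElement(eq, "Value", {"Type": "Text"})
    value.text = doc_id

    # ViewFields and RowLimit are siblings of Query inside View.
    view_fields = ET.SubElement(view, "ViewFields")
    ET.SubElement(view_fields, "FieldRef", {"Name": "_dlc_DocId"})
    limit = ET.SubElement(view, "RowLimit")
    limit.text = str(row_limit)

    return ET.tostring(view, encoding="unicode")


view_xml = build_doc_id_view_xml("MDXC2KE55ASN-3-80")
```

Building the XML with a library rather than string concatenation also escapes any special characters in the ID, which guards against the kind of subtle XML errors mentioned in the question's update.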
Just in case nobody comes up with a slick solution from the depths of the SharePoint infrastructure:
What would Google do?
Slice it, dice it and dump it in a reverse index.
Solr and Lucene offer supreme tools for this. The idea is to cut the DocIds into small pieces and add the location of the document to the bucket for each piece.
Say we have "A real nice document" with Id ABCD123. You would add it to the buckets
ABCD, BCD1, CD12, D123
When searching for a partial ID (+ other data like dates, types, ...) you (well, the search engine) create the union of the buckets and apply additional constraints.
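A toy version of that bucket idea, without Solr or Lucene, might look like the sketch below. `PartialIdIndex` is a hypothetical class and the document path is made up. Note one deliberate difference: this sketch intersects the buckets of the query's pieces and then confirms the substring, which is one way to narrow candidates, whereas a search engine would typically score a union of buckets instead.

```python
from collections import defaultdict


def ngrams(doc_id, size=4):
    """Cut an ID into overlapping pieces of the given size."""
    return [doc_id[i:i + size] for i in range(len(doc_id) - size + 1)]


class PartialIdIndex:
    """Reverse index from ID pieces to (doc_id, location) pairs."""

    def __init__(self, size=4):
        self.size = size
        self.buckets = defaultdict(set)

    def add(self, doc_id, location):
        for piece in ngrams(doc_id, self.size):
            self.buckets[piece].add((doc_id, location))

    def search(self, partial_id):
        pieces = ngrams(partial_id, self.size)
        if not pieces:
            raise ValueError(f"query must have at least {self.size} characters")
        # Keep only documents that appear in every bucket the query touches.
        candidates = set.intersection(
            *(self.buckets.get(piece, set()) for piece in pieces)
        )
        # Shared pieces do not prove adjacency, so confirm the substring.
        return sorted(loc for doc_id, loc in candidates if partial_id in doc_id)


index = PartialIdIndex()
index.add("ABCD123", "/sites/docs/a-real-nice-document.docx")
index.add("ABCE999", "/sites/docs/another-document.docx")
hits = index.search("BCD12")
```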
To make this happen you need to write a spider for the sharepoint server and a routine which makes a record of data elements to be indexed.
Put a nice REST interface in front of it (actually Solr already has that), integrate it into the main SharePoint server, and nobody needs to know there is something else running behind it.
These products can also incrementally update the indexes, so they can be kept up to date.
You could use the following to get the Document ID.
SPFile file = MethodToUploadFileToServer(web, filepath);
SPListItem item = file.Item;
string DocID = item.Properties["_dlc_DocId"].ToString();

Sharepoint: SQL to find all documents created/edited by a user

I am looking for a query that will work on Sharepoint 2003 to show me all the documents created/touched by a given userID.
I have found tables with the documents (Docs) and tables for users (UserInfo, UserData)
but the relationship between seems a bit odd - there are 99,000 records in our userdata table, and 12,000 records in userinfo - we have 400 users!
I suppose I was expecting a simple one-to-many relationship, with a user table having 400 records joined to the documents table, but I see that's not the case.
Any help would be appreciated.
Edit:
Thanks Bjorn,
I have translated that query back to the Sharepoint 2003 structure:
select d.*
from userinfo u
join userdata d
  on u.tp_siteid = d.tp_siteid
  and u.tp_id = d.tp_author
where u.tp_login = 'userid'
  and d.tp_iscurrent = 1
This gets me a list of siteid/listid/tp_id values. I'll have to see if I can trace those back to a filename / path.
All: any additional help is still appreciated!
I've never looked at the database in SharePoint 2003, but in 2007 UserInfo is connected to Sites, which means that every user has a row in UserInfo for each site collection (or the equivalent 2003 concept). So to identify what a user does you need both the site id and the user's id within that site. In 2007, I would begin with something like this:
select d.* from userinfo u
join alluserdata d on u.tp_siteid = d.tp_siteid
and u.tp_id = d.tp_author
where u.tp_login = '[username]'
and d.tp_iscurrentversion = 1
Update: As others write here, it is not recommended to go directly into the SharePoint database, but I would say use your head and be careful. Updates are an all-caps no-no, but whether selects are acceptable depends on the context.
DO NOT QUERY THE SHAREPOINT DATABASE DIRECTLY!
I wonder if I made that clear enough? :)
You really need to look at the object model available in C#, you will need to get an SPSite instance for a SiteCollection, and then iterate over the SPList instances that belong to the SPSite and the SPWeb objects.
Once you have the SPList object, you will need to call GetListItems using a query that filters for the user you want.
That is the supported way of doing what you want.
You should never go to the database directly as SharePoint isn't designed for that at all and there is no guarantee (actually, there's a specific warning) that the structure of the database will be the same between versions and upgrades, and additionally when content is spread over several content databases in a farm there is no guarantee that a query that runs on one content database will do what you expect on another content database.
When you look at the object model for iteration, also note that you will need to dispose() the SPSite and SPWeb objects that you create.
Oh, and yes you may have 400 users, but I would bet that you have 30 sites. The information is repeated in the database per site... 30 x 400 = 12,000 entries in the database.
If you are going to use that query in SharePoint, you should know that creating views on the content database or querying directly against the database seems to be a big no-no. A workaround could be some custom code that iterates through the object model and writes the results to your own database. This could be either timer based or triggered by an event.
You really shouldn't be doing SELECTs that take locks either; i.e., add WITH (NOLOCK) to your queries. Some parts of the system are very timeout sensitive, and if you start introducing locks that the system wasn't expecting you can see the system freak out.
But really, you should be doing this via the object model. Mess around with something like IronPython and experimentations with the OM are almost downright pleasant.
