How to get a case-insensitive cmis:folder search query - cmis

stmt = session.createQueryStatement("SELECT * FROM cmis:folder WHERE IN_TREE(?) and cmis:name=?");
stmt.setString(1, "sites/test/documentLibrary");
stmt.setString(2, "Test");
I get a result when I pass the exact, case-sensitive folder name (Test), but if I pass test or TEST, no result is found.
Could you please help me with a case-insensitive folder search?

CMIS QL does not support case-insensitive queries because many repositories can't provide them.
Depending on the repository and the repository setup, a LIKE query might be case insensitive.
By the way, IN_TREE takes a folder ID, not a folder path.
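If the repository's LIKE turns out to be case sensitive as well, one workaround (not part of the answer above, just a sketch) is to query all folders below the root folder ID and compare the names on the client. The Python below only builds the query string and filters rows. The helper names and the shape of the rows (dicts keyed by property ID) are assumptions for illustration, not the API of any particular CMIS client library.

```python
def cmis_literal(value):
    """Quote a value as a CMIS QL string literal.

    CMIS QL escapes backslash and single quote with a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def folders_in_tree_query(root_folder_id):
    """Build a query for every folder below the given folder ID (not a path)."""
    return (
        "SELECT cmis:objectId, cmis:name FROM cmis:folder "
        "WHERE IN_TREE(" + cmis_literal(root_folder_id) + ")"
    )


def filter_by_name_ignore_case(rows, wanted):
    """Keep the rows whose cmis:name equals `wanted`, ignoring case."""
    target = wanted.casefold()
    return [row for row in rows if row["cmis:name"].casefold() == target]


# Hypothetical result rows, as a client library might hand them back.
rows = [{"cmis:name": "Test"}, {"cmis:name": "TEST"}, {"cmis:name": "Other"}]
print(folders_in_tree_query("hypothetical-folder-id"))
print(filter_by_name_ignore_case(rows, "test"))
```

This trades a larger result set for predictable matching, so it is only reasonable when the folder tree is not huge.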

Related

Graph API DriveItem: How can I only get query results from the root directory (PREVENT recursive searching)?

https://graph.microsoft.com/v1.0/sites/MyDomain.sharepoint.com,00000000-1111-2222-3333-444444444444/drive/search(q='Matrix')
The above correctly returns all drive files with the word "Matrix" in them within the Shared%20Documents directory for the site's provided site ID (00000000-1111-2222-3333-444444444444).
However, it's recursive: it returns files with the word "Matrix" in them within subfolders too. I only want to query files in the root directory.
How do I search for file names, only within the root directory? I tried changing /drive to /drive/root like below, but it did not make a difference:
https://graph.microsoft.com/v1.0/sites/MyDomain.sharepoint.com,00000000-1111-2222-3333-444444444444/drive/root/search(q='Matrix')
ChatGPT recommended adding the filter $filter=parentReference/path eq '/drive/root':
https://graph.microsoft.com/v1.0/sites/MySite.sharepoint.com,00000000-1111-2222-3333-444444444444/drive/search(q='Matrix')?$filter=parentReference/path eq '/drive/root'
...but I got the error "Only createdDateTime,remoteItem.shared.sharedBy.group.id,remoteItem.shared.sharedBy.user.id is supported for filtering", which ChatGPT didn't know how to get past.
I solved this by obtaining the folder ID of the root folder and using the /drive/items/{folderId}/children?$filter URI instead of /drive/search. I obtained the folder ID of the root folder by copying the id within the parentReference of an item that lies in my root directory, taken from the output of my first command.
Then I queried the files in the root directory with the following format:
https://graph.microsoft.com/v1.0/sites/MyDomain.sharepoint.com,{siteId}/drive/items/{folderId}/children?$filter=startswith(name,'MyWord')
So in my case, the URI ended up looking like below:
https://graph.microsoft.com/v1.0/sites/MyDomain.sharepoint.com,00000000-1111-2222-3333-444444444444/drive/items/01NCSFADN6Y2GOVW7725BZO354PWSELRRZ/children?$filter=startswith(name,'Install')
Unfortunately, I couldn't use the contains function (which functions similarly to /search) and had to use startswith because contains isn't supported on $filter for text fields.
Finally, you can optionally append whichever field(s) you're interested in retrieving with the select parameter:
&select=name,@microsoft.graph.downloadUrl
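For anyone scripting this, here is a minimal Python sketch that only assembles the URL described above. The function name and the placeholder IDs are made up for illustration, and the select parameter is spelled with the standard OData `$` prefix. It does not perform the request or handle authentication.

```python
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def children_by_prefix_url(site_id, folder_id, prefix, select=()):
    """Build a 'children' URL that filters item names by prefix.

    Only the direct children of folder_id are listed, so there is no
    recursion into subfolders.
    """
    # OData escapes a single quote inside a string literal by doubling it.
    literal = prefix.replace("'", "''")
    filter_expr = "startswith(name,'" + literal + "')"
    url = (
        GRAPH_ROOT + "/sites/" + site_id
        + "/drive/items/" + folder_id
        + "/children?$filter=" + quote(filter_expr, safe="(),'")
    )
    if select:
        url += "&$select=" + ",".join(select)
    return url


# Placeholder IDs, not real ones.
print(children_by_prefix_url(
    "MyDomain.sharepoint.com,00000000-1111-2222-3333-444444444444",
    "FOLDER_ID",
    "Install",
    select=["name", "@microsoft.graph.downloadUrl"],
))
```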

How to search for regular expression match on s3 folder, and parse the files

Below is the S3 folder:
s3://bucket-name/20210802-123429/DM/US/2021/08/02/12/test.json
20210802-123429 is the folder created by the archive job that puts the files there.
What I could achieve:
cred_obj = cred_conn.list_objects_v2(Bucket=cfg.Bucket_Details['extractjson'], Prefix="DM"+'/'+"US"+'/'+self.yr+'/'+self.mth+'/'+self.day+'/'+self.hr+'/')
Problem statement:
But in the line above, I'm not sure how to match on 20210802 and parse "test.json".
list_objects_v2 does not support RegEx match. The only way to search is using the prefix. Therefore, you must know the prefix or part of the prefix in order to search.
timestr_arc = todays_dt.strftime("%Y%m%d")
cred_obj = cred_conn.list_objects_v2(Bucket=cfg.Bucket_Details['extractjson'], Prefix="DM/US/" + timestr_arc)
This lists only the keys that start with that date-based prefix.
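Since the regex has to run on the client, a common pattern is to narrow the listing with the longest prefix you know and then filter the returned keys with `re`. The sketch below assumes the key layout from the question (date-stamped archive folder first) and takes the client as an argument, so it works with a `boto3.client("s3")` object. The function names are hypothetical.

```python
import json
import re


def keys_matching(client, bucket, prefix, pattern):
    """List keys under `prefix` and keep those whose key matches `pattern`."""
    regex = re.compile(pattern)
    # The paginator follows continuation tokens past the 1000-key page limit.
    paginator = client.get_paginator("list_objects_v2")
    matches = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        for obj in page.get("Contents", []):
            if regex.search(obj["Key"]):
                matches.append(obj["Key"])
    return matches


def read_json(client, bucket, key):
    """Download one object and parse its body as JSON."""
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)


# Usage (requires boto3 and credentials, so it is left commented out):
# import boto3
# s3 = boto3.client("s3")
# for key in keys_matching(s3, "bucket-name", "20210802",
#                          r"^20210802-\d{6}/DM/US/.*/test\.json$"):
#     data = read_json(s3, "bucket-name", key)
```

The prefix does the coarse server-side narrowing (here, the date) and the regex handles the part that S3 cannot express, such as the unknown time suffix.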

ADF Azure Data-Factory loop over folder syntax - wildcard?

I'm trying to loop over different country folders that each have a fixed subfolder named survey (i.e. Spain/survey, USA/survey).
Where and how do I need to define a wildcard or parameter for the countries so that I can loop over all the files in the survey folders?
What is the right wildcard syntax (the equivalent of like 'survey%' in SQL)?
I tried several ways to define it with no success, and I would be happy to get some help on this. Thanks!
If the list of paths is static, you can create a parameter, or store the paths in a SQL database and retrieve them with a Lookup activity.
Pass the output to a ForEach activity, and use a Copy activity inside the ForEach.
You can parameterize the input dataset with the file paths, so you don't need any wildcard characters and can use the actual paths instead.
Hope this is helpful.

Spark 2.3 - How to read subdirectories without asterisks?

String folder = "/Users/test/data/*/*";
sparkContext.textFile(folder, 1).toJavaRDD()
Are asterisks mandatory to read a folder? Yes, otherwise it does not read the files in the subdirectories.
What if I get a folder that has more subdirectory levels than the number of asterisks given? How do I handle this scenario?
For example:
1) /Users/test/data/*/*
This would work ONLY if I get data as /Users/test/data/folder1/file.txt
2) How can I make this expression generic? It should still work if I get a folder such as: /Users/test/data/folder1/folder2/folder3/folder4
My input folder structure is not the same all the time.
Does anything exist in Spark to handle this kind of scenario?
On Hadoop you could try sparkContext.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive","true")
But TBH I don't think this will work in your case.
I would write a small function that returns the nested file structure as a list of paths and pass them to Spark.
For example:
val filePaths = List("rec1/subrec1.1/", "rec1/subrec1.2/", "rec1/subrec1.1/subsubrec1.1.1/", "rec2/subrec2.1/")
val files = spark.read.text(filePaths: _*)
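A minimal sketch of such a helper, written in Python and assuming the data sits on a local filesystem that `os.walk` can reach (for HDFS or S3 you would need that filesystem's own listing API instead). The function name is made up. It returns individual file paths rather than directories so that no file is picked up twice.

```python
import os


def list_files_recursively(root):
    """Return every file path below root, however deep the nesting goes."""
    paths = []
    for current_dir, _subdirs, files in os.walk(root):
        for name in files:
            paths.append(os.path.join(current_dir, name))
    # Sort so the result is stable across runs.
    return sorted(paths)


# With PySpark the list can be handed to the reader directly, e.g.:
# df = spark.read.text(list_files_recursively("/Users/test/data"))
```

Because the walk does not care about depth, the same call covers /Users/test/data/folder1/file.txt and /Users/test/data/folder1/folder2/folder3/folder4 alike.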

Exclude a certain path from all user searches

Unfortunately we have a special folder named "_archive" in our repository everywhere.
This folder has its purpose. But: When searching for content/documents we want to exclude it and every content beneath "_archive".
So, what I want is to exclude the path and its members from all user searches. The syntax is easy with FTS:
your_query AND -PATH:"//cm:_archive//*"
to test:
https://www.docdroid.net/RmKj9gB/search-test.pdf.html
take the pdf, put it into your repo twice:
/some_random_path/search-test.pdf
/some_random_path/_archive/search-test.pdf
In node-browser everything works as expected:
TEXT:"HODOR" AND -PATH:"//cm:_archive//*"
= 1 result
TEXT:"HODOR"
= 2 results
So, my idea was to edit search.get.config.xml and add the exclusion to the list of properties:
<search>
<default-operator>AND</default-operator>
<default-query-template>%(cm:name cm:title cm:description ia:whatEvent
ia:descriptionEvent lnk:title lnk:description TEXT TAG) AND -PATH:"//cm:_archive//*"
</default-query-template>
</search>
But it does not work as intended! As soon as I use 'text:' or 'name:' in the search field, the exclusion seems to be ignored.
What other options do I have? Basically I just want to add the exclusion to the base query after the default query template is applied.
The version is Alfresco Community 5.0.d.
Thanks!
I think you've misunderstood what query templates are meant for. Take a look at the Wiki.
What a template does is say: I've got a keyword, and I want to match it against the following metadata fields.
By default it matches cm:name, cm:title, cm:description, etc. This can be changed to a custom field or, in other cases, to ALL.
So adding an extra AND or anything else there won't work, because the template isn't the actual query that gets built. I could go on about query templates, but that won't do you any good.
In your case you'll need to modify Alfresco's search.get webscript, specifically the function getSearchResults(params) in search.lib.js (which gets imported).
Somewhere near the end of that function it does the following:
ftsQuery = '(' + ftsQuery + ') AND -TYPE:"cm:thumbnail" AND -TYPE:"cm:failedThumbnail" AND -TYPE:"cm:rating" AND -TYPE:"st:site"' + ' AND -ASPECT:"st:siteContainer" AND -ASPECT:"sys:hidden" AND -cm:creator:system AND -QNAME:comment\\-*';
Just add your path exclusion to that query and that will do.
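To make the resulting string concrete, here is the same concatenation mirrored in Python purely as an illustration. In search.lib.js the clause would be appended in JavaScript, and the helper name below is made up.

```python
def exclude_path(fts_query, path_pattern="//cm:_archive//*"):
    """Wrap an FTS query and append a -PATH exclusion clause."""
    # Parentheses keep the exclusion applied to the whole user query,
    # not just to its last term.
    return "(" + fts_query + ') AND -PATH:"' + path_pattern + '"'


print(exclude_path('TEXT:"HODOR"'))
```

Because the clause is attached after the template has been expanded, it still applies when the user types 'text:' or 'name:' in the search field.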
