Querying for new and changed documents in SharePoint


We need to query SharePoint for new and changed documents. The 'Write' property gives the modified timestamp from inside the document metadata (i.e. inside the document properties), not the time the document was added to SharePoint.
I've tried this query:
and(IsDocument:true, write:range(2018-02-16, max) )
but the write time is based on the date embedded inside the document (not the date the document was added to SharePoint).
Does anyone have any guidance they can share?
Update with more test results:
Digging into the properties, I think that the DiscoveredTime property might be what I'm looking for. This documentation (https://technet.microsoft.com/en-us/library/jj219630.aspx) says that the DiscoveredTime property isn't searchable, but I am able to search using this:
and(IsDocument:true, DiscoveredTime:range(2018-02-16, max) )
I can't find anywhere that explains what DiscoveredTime actually is, but it seems to correlate with when the document was added to SP.
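As a small sketch of how the query above could be generated for an arbitrary cut-off date: the helper below only assembles the FQL string shown in the post. The function name and its parameters are made up for illustration, and whether DiscoveredTime is the right property for "added to SharePoint" is still the open question here, so it is left configurable.

```python
from datetime import date
from typing import Optional


def changed_documents_query(since: date,
                            until: Optional[date] = None,
                            time_property: str = "DiscoveredTime") -> str:
    """Build the FQL filter from the post for a given date range.

    `time_property` defaults to DiscoveredTime (the property that seemed to
    work in testing); pass "write" to reproduce the first query instead.
    """
    # FQL's range() accepts the keyword `max` as an open upper bound.
    upper = until.isoformat() if until is not None else "max"
    return (
        "and(IsDocument:true, "
        f"{time_property}:range({since.isoformat()}, {upper}))"
    )


print(changed_documents_query(date(2018, 2, 16)))
```

Passing the string to the search endpoint is not shown, since that depends on how the search API is being called.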

Related

Azure Cognitive Search - MergeOrUpdate to a only retrievable field

What happens when "updating" a document but only changing a retrievable field? Will the document be reindexed?
I'm trying to measure the "cost" of updating a retrievable field on a lot of documents, to decide whether I should put this field in the index or fetch it from SQL Server after querying the index.
Thanks in advance

MergeOrUpdate behaves the same as Merge if the document (specifically, the document key) already exists. The attributes of the field (retrievable, filterable, etc.) don't matter. In other words, the whole document is not re-indexed; only the field values specified in the call are merged. For more information on the REST API behavior, see: Add, Update or Delete Documents (Azure Cognitive Search REST API).
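To make the merge behavior concrete, here is a rough sketch of the request body such a call would carry. In the REST API the action is spelled `mergeOrUpload`, and each action lists only the key plus the fields to change. The key field `id` and the field `displayNote` are hypothetical names, not taken from the question.

```python
import json
from typing import Any, Dict, Iterable, List


def merge_batch(key_field: str,
                updates: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Wrap partial documents in mergeOrUpload actions for the docs/index call."""
    actions = []
    for doc in updates:
        if key_field not in doc:
            # Every action must identify its target document by key.
            raise ValueError(f"update is missing the key field {key_field!r}")
        # Only the key and the fields being changed are sent.
        actions.append({"@search.action": "mergeOrUpload", **doc})
    return {"value": actions}


# Hypothetical index with key field "id" and a retrievable-only "displayNote".
batch = merge_batch("id", [{"id": "42", "displayNote": "reviewed"}])
print(json.dumps(batch, indent=2))
```

The resulting JSON would be POSTed to the index's docs/index endpoint. Authentication and batching limits are left out of this sketch.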

How to Add Indexed Property to a Site So I can build a result source to search all sites for content with this property

I want to build a SharePoint Online search result source that includes only some site collections and subsites. I have over 5000 site collections, so I can't use named URLs or site IDs in the result source; that would not be maintainable. Instead, I hope to add an indexed property to a site's property bag after the site is provisioned, map a managed property to it, and use it in a result source to filter search results so they match only content found in sites that have that property value. Phil Harding's article describes the approach: https://platinumdogs.me/2015/02/06/set-a-propertybag-property-as-indexed-queryable-via-search-using-csom-powershell/ and Mike Morawski adds some code for the indexed property encoding: http://www.migee.com/2015/09/14/allowing-property-bag-values-to-be-searched-via-sharepoint-search/ I used bits of both to implement this approach.
Approach:
1. Add Key = 'SiteType', Value = 'MySiteType' to the web's AllProperties.
2. Add indexed property 'SiteType' with value 'MySiteType' to the web's IndexedProperties (vti_indexedpropertykeys), with the key encoded to base64.
3. Manually add managed property 'propSiteType', mapped to the 'SiteType' crawled property, in the Search Schema.
I've done 1 and 2 via PowerShell + CSOM, and verified that the site property was added and is crawled. The managed property is there, but it is not available in the result source builder dialog, and searches such as {searchterms} propSiteType:MySiteType or (contentclass:STS_Web OR contentclass:STS_Site) propSiteType:MySiteType do not return results.
Ideas or alternative approaches? Thanks in advance
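For reference, the key encoding in step 2 can be sketched as below. This assumes, as the linked articles describe it, that vti_indexedpropertykeys holds a list of base64-encoded UTF-16LE key names, each followed by a pipe. It is worth comparing against a value SharePoint itself wrote before relying on it. The function names are made up for illustration.

```python
import base64
from typing import List


def encode_indexed_key(key: str) -> str:
    """Base64-encode a property bag key (assumed UTF-16LE, per the articles)."""
    return base64.b64encode(key.encode("utf-16-le")).decode("ascii")


def add_indexed_key(existing: str, key: str) -> str:
    """Return a vti_indexedpropertykeys value that includes `key` exactly once."""
    encoded = encode_indexed_key(key)
    entries = [entry for entry in existing.split("|") if entry]
    if encoded not in entries:
        entries.append(encoded)
    # Assumed format: every entry, including the last, ends with a pipe.
    return "".join(entry + "|" for entry in entries)


def decode_indexed_keys(value: str) -> List[str]:
    """Decode an existing value back to key names, e.g. to check what was set."""
    return [base64.b64decode(entry).decode("utf-16-le")
            for entry in value.split("|") if entry]


value = add_indexed_key("", "SiteType")
print(value, decode_indexed_keys(value))
```

Decoding the value currently stored on the web is a quick way to confirm that step 2 wrote what was intended before suspecting the crawl or the managed property mapping.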

My only thought is an indexing delay. I have seen O365 take days to index new content, and that's even when manually requesting a crawl. If that was the issue, it's probably resolved by now. Are you seeing relevant search results?
https://www.sharepointnutsandbolts.com/2013/10/waiting-for-search-crawl-in-office-365.html

Sitecore 7, Content Search, "Content" property of the document is not showing actual content

I have managed to get search results the LINQ way, and I can access different fields of the searched document, including Title, Url, Path, etc.
I can also access the Content property of the document, but it does not show the actual content of the document. It shows the Title of the document separated by -. For example, if the searched document's Title is Video news items, the Content property contains Video-news-items|Video news items.
How can I get actual content of the searched document?
Code I am using to search document is explained in another post.

Based on your other question, I assume you are using the base SearchResultItem class. You can inherit from this class and add properties that map to specific fields in your items. You can then use those properties as normal. This article explains the process:
Sitecore 7 POCO's explained
If you haven't done any research into Sitecore 7 search yet, I would suggest that you do some. The concepts may not be familiar.

I think this is showing the meta title for this page.

How to index documents with elastic.js client?

So far I haven't found any samples of HOW the elastic.js client api (https://github.com/fullscale/elastic.js) can be used for indexing documents. There are some clues here & there but nothing concrete yet.
http://docs.fullscale.co/elasticjs/ejs.Document.html
Document ( index, type, id ): Object used to create, replace, update, and delete documents
Document > doIndex(fnCallBack): Stores a document in the given index and type. If no id is set, one is created during indexing.
Document > source (doc): Sets the source document.
Can anyone provide a sample snippet of code to show how a Document object can be instantiated and used to index data?
Thanks!
Update # 1 (Sun Apr 21st, 2013 on 12:58pm CDT)
https://gist.github.com/pulkitsinghal/5430444

Your gist is correct.
You create ejs.Document objects specifying the index, type, and optionally the id of the document you want indexed. If you don't specify an id, elasticsearch will generate one for you.
You set the source to the json object you want indexed then call the doIndex method specifying a callback if needed. The node example does not index docs, but the angular and jquery examples show a basic example and can easily be used with the node client.
https://github.com/fullscale/elastic.js/blob/master/examples/angular/js/controllers.js#L30
Also have a peek at the tests:
https://github.com/fullscale/elastic.js/blob/master/tests/index_test.js#L265

elastic.js nowadays only implements the Query DSL, so it can't be used for this scenario anymore. See this commit.

Link data in custom SQL db with document library

Environment:
I have a Windows network-shared desktop application written in C# that relies on an MSSQL database. Windows SharePoint Services 3.0 is installed (default installation, single processor, default SQL Express content database and so on) on the same Windows Server 2003 machine.
Scenario:
The application generates MS Word documents during processing (creating work orders) that need to be saved on sharepoint, and the result of the process must be linked to the corresponding document.
So, for each insert in dbo.WorkOrders (one work order), there is one MS Word document. I would need to save the document ID from the sharepoint library to my database so that later on, possible manual corrections can be made to the document related. When a work order is deleted, the sharepoint document would also have to be deleted.
Also, there is a dbo.Jobs table which is parent to dbo.WorkOrders and can have several work orders.
I was thinking about making a custom list on sharepoint, that would have two ID fields - one is the documents ID and the other AutoID of the document. I don't think this would be a good way performance-wise and it requires too much upkeep, therefore it's more error prone.
Another path I was contemplating is metadata. I could have an Identity field in dbo.WorkOrders that would be unique and auto incremented, and I could save that value as a file name (1.docx, 2.docx 3.docx ... n.docx where n would be the value in dbo.WorkOrder's identity field). In the metadata field of the Word document, I could save the job ID from dbo.Jobs.
I could also just increment the identity field in the WorkOrder (it would be a bigint), but then the file names would get ugly and maybe I'd overflow the ID range (since there could be a lot of documents).
There are other options also that I have considered and dismissed, since none of them satisfied the requirements (linked data sources, subfolder structures etc.). I'm not sure how to proceed. I'm new to sharepoint and it's still a bit of a mystery to me, as I don't understand all the inner workings of the system.
What do you suggest?
Edit:
I think I'll be using guid as file names and save those guids in my database after sending documents to sharepoint. What do you think of that?

All the documents in SharePoint under the same content database (SQL database) are stored in the same table. That said, you have a unique ID for files no matter where they are in the SharePoint structure.
When retrieving files by their UniqueId, the API only gives you the option to get them if you also know their SPWeb. So for each record in your external database (or your custom list), you could store the SPFile GUID and the SPWeb GUID, then retrieve the file with:
using (SPWeb subweb = SPContext.Current.Site.OpenWeb(new Guid("{000...}")))
{
    // Look the file up by its UniqueId within the web that contains it.
    SPFile file = subweb.GetFile(new Guid("{111...}"));
    // file logic
}
ps.: As Colin pointed out, url retrieval is possible but messy. I also changed the SPSite to the context since you are always under the same Site Collection in my example.

Like F.Aquino said, all items in SharePoint already have a UniqueId field (i.e. SPListItem.UniqueId and SPFile.UniqueId), which is a GUID. Save that to your database, along with your web's GUID. Then you can use the code provided by F.Aquino to get the file, or even the byte[] of the stream.
P.S. for F.Aquino: your code leaves the SPSite in memory; use this instead:
using (SPSite site = new SPSite("http://url"))
{
    // Dispose both the site and the web once the file work is done.
    using (SPWeb subweb = site.OpenWeb(new Guid("{000...}")))
    {
        SPFile file = subweb.GetFile(new Guid("{111...}"));
        // file logic
    }
}
P.P.S. This is just clarification; mark F.Aquino as the answer.
