Slash in cmis:name and cmis:contentStreamFilename - cmis

The CMIS 1.1 specification says:
2.1.5.3 Paths
A folder hierarchy MAY be represented in a canonical notation such as path.
[...]
A pathSegment token MUST not include a ’/’ character.
It is repository specific how a repository chooses the value for pathSegment. Repositories might choose to use cmis:name or content stream filename for pathSegment token.
But, I was able to create the document below in a Documentum 7.1.0000.0146 server:
As you can see, both cmis:name and cmis:contentStreamFilename contain a slash. Actually, it seems that cmis:contentStreamFilename becomes whatever cmis:name is (plus the extension).
Is this a bug in Documentum's CMIS implementation?
While this server replies correctly to "normal" getObjectByPath requests with cmis:name path elements, is there any way to use getObjectByPath for the object in the screenshot above? Or is getObjectByPath unusable with such a server?

I do believe it is a bug in Documentum's CMIS implementation (meaning, it's a bug to allow the slash to be part of cmis:name and cmis:contentStreamFileName). I found this bug report from the Apache Chemistry Jira project, where it seems a similar bug was fixed. Some comments that shed some light on the issue:
Given the following path to an object:
/a/b/c/d.pdf
The path segment tokens are "a", "b", "c", and "d.pdf".
The getObjectByPath method does assume that you will pass it a path comprised of path segment tokens separated by forward slashes. This is how CMIS defines a "path" per the spec.
...and...
Repository might use cmis:name for pathSegment token, however, if repository doesn't use cmis:name as pathSegment token, this case will fail obviously.
There're some possible scenarios that cmis:name is not used as pathSegment token:
1) Content stream file name is used rather than cmis:name as it is described in the spec.
2) Repository supports to create document with same cmis:name in a folder, which means it is inevitable to use other value rather than cmis:name as its pathSegment since "The pathSegment token for each item MUST uniquely identify the item in the folder" according to the spec.
I don't know Documentum specifically, so whether this bug would manifest when fetching the object is a coin toss. Have you considered using CMIS Workbench to run a simple CMIS query to find it using one of those properties? If it worked, I'd be pretty confident (though not 100%) it would work with getObjectByPath.
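To make the ambiguity concrete, here is a minimal Python sketch of how a CMIS-style path breaks down into pathSegment tokens. It is not Documentum or Apache Chemistry code; `split_path` and `join_path` are hypothetical helper names, and the assumption is simply that the server splits the path on forward slashes as the spec describes.

```python
def split_path(path: str) -> list[str]:
    """Split an absolute CMIS-style path into its pathSegment tokens."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return [segment for segment in path.split("/") if segment]


def join_path(segments: list[str]) -> str:
    """Build a path from segments, rejecting any segment that contains '/'."""
    for segment in segments:
        if "/" in segment:
            raise ValueError(f"pathSegment must not contain '/': {segment!r}")
    return "/" + "/".join(segments)


print(split_path("/a/b/c/d.pdf"))     # ['a', 'b', 'c', 'd.pdf']

# A document named "x/y.pdf" stored in folder /a yields the same tokens
# as a document named "y.pdf" stored in folder /a/x.
print(split_path("/a/" + "x/y.pdf"))  # ['a', 'x', 'y.pdf']
```

If the repository really uses such a cmis:name as the pathSegment, the path alone cannot tell the two cases apart, which is why looking the object up by query or by object ID is the safer route.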

I do not think there are many restrictions on object names in a Documentum repository.
If you want to apply a business rule to ensure no object names are created with the '/' character then have a look at the BOF framework.

Related

What should I use as an endpoint to serve downloadable documents using REST API?

Right now I have an endpoint that serves a file to the user (json, csv, excel or pdf).
My question here is, which type of route should I use to serve it, path variables or query parameters (considering best practices and for developer comprehension):
baseURl/api/v1/resource/xlsx
or
baseURl/api/v1/resource?format=xlsx
Thank you in advance.
So long as you are consistent with the production rules of the http URI scheme, any spelling conventions you choose are fine.
Choosing spellings that match the capabilities of URI templates will make it easier to construct/deconstruct resource identifiers in a "common URI space", which is often convenient both for clients and servers.
Choosing between path segments and the query part is purely a matter of trade-offs. Using application/x-www-form-urlencoded key-value pairs in the query part means that you can implement your URI template as an HTML form. Using path segments means that you can use dot segments to describe other identifiers in the common URI space.
If you don't care about either of those, it just comes down to which spellings you like best in an access log, or in your documents, or in a browser history, or when you paste them into an email message, or ....
It is best practice in HTTP to use headers to indicate which formats the client can understand. You should use a GET route and send the desired format in the Accept header.
Header key: Accept
Header Value: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept
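As a rough illustration of the server side, the following Python sketch picks an output format from an Accept header. The `SUPPORTED` mapping and the `choose_format` function are hypothetical, and the parsing is deliberately simplified: it honours q-values but ignores wildcards such as `*/*`, which fall through to the default.

```python
# Hypothetical mapping from media types to the formats the endpoint can produce.
SUPPORTED = {
    "application/json": "json",
    "text/csv": "csv",
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
}


def choose_format(accept_header: str, default: str = "json") -> str:
    """Return the supported format with the highest q-value in the header."""
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        media_type, *params = [part.strip() for part in item.split(";")]
        quality = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if media_type in SUPPORTED and quality > 0:
            # Sort key: highest quality first, then order of appearance.
            candidates.append((-quality, position, SUPPORTED[media_type]))
    return min(candidates)[2] if candidates else default


print(choose_format("text/csv;q=0.5, application/pdf"))  # pdf
```

With this approach the route stays the same (`baseURl/api/v1/resource`) and only the header changes per format.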
HTML Input="file" Accept Attribute File Type (CSV)

the remote server returned an error 414 request uri too long

I am using the SaveBinaryDirect method to upload a file to a SharePoint library. I am getting the error below:
the remote server returned an error 414 request uri too long
Can anybody help me, please?
I wouldn't call this a SharePoint problem necessarily, more like a problem that happens a lot in SharePoint... Essentially, you have around a 2,000 character limit for the URL. In most scenarios this is fine, however in SharePoint it occasionally becomes an issue.
Users tend to create a lot of nested libraries, and the name of each library becomes part of the URL, separated by '/'. Then the file name is added at the end of the URL. To make matters worse, any spaces or other URL-unfriendly characters are encoded and become three characters each (a space becomes %20). This all adds up.
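The growth is easy to measure before uploading. The Python sketch below is only an illustration: the host name, the folder names, the `encoded_url` helper, and the 2,000-character threshold are assumptions taken from the description above, not values read from SharePoint.

```python
from urllib.parse import quote

# Approximate limit mentioned above; the real limit depends on server settings.
URL_LIMIT = 2000


def encoded_url(base: str, folders: list[str], filename: str) -> str:
    """Percent-encode each path part and append it to the base URL."""
    path = "/".join(quote(part, safe="") for part in [*folders, filename])
    return f"{base.rstrip('/')}/{path}"


url = encoded_url(
    "https://sharepoint.example.com/sites/team",   # hypothetical site
    ["Shared Documents", "Q1 Reports"],            # hypothetical nested libraries
    "Final Budget v2.xlsx",
)
print(url)
print(len(url), "characters; over the limit:", len(url) > URL_LIMIT)
```

Running a check like this against the deepest folder a user can reach shows how much room is left for file names.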
In my experience the solution is a combination of user education and proper architecture. Instead of creating nested libraries, store the documents in a single library and differentiate the items by assigning meta-data attributes, then create views to display items of a particular type.
This error can also be caused by having "invalid" characters in the filename or path. See this answer for what makes a character invalid in a URI:
Which characters make a URL invalid?

GWT SafeHTML, XSS & Best Practices

The good people of OWASP emphasize that you MUST use the escape syntax for the part of the HTML document you’re putting untrusted data into (body, attribute, JavaScript, CSS, or URL). See OWASP - XSS. Their API (developed by the ESAPI team) subsequently caters for this having encoders for each context:
ESAPI.encoder().encodeForHTML("input");
ESAPI.encoder().encodeForHTMLAttribute("input");
ESAPI.encoder().encodeForJavaScript("input");
ESAPI.encoder().encodeForCSS("input");
ESAPI.encoder().encodeForURL("input");
Subsequently, this allows the developer to cater for DOM-based XSS.
So my question is how does GWT's safehtml package cater for this or does it merely focus on HTML encoding?
SafeHtmlTemplates will do it (client-side only though, as it relies on a GWT generator). It'll parse the HTML fragment using a "tag soup" parser, that will infer the context and either log a warning or throw if the argument cannot be used in this context (for instance, it prevents all use of placeholders in script context). This is still in flux though (SafeUri is still in review and SafeStyles is still severely limited) but it'll be there in due time (should be in GWT 2.4 I think).
Otherwise:
SafeHtmlUtils will escape all of <, >, &, ' and " so the result is safe for "HTML" and "HTML attribute" contexts
SafeHtmlBuilder's various append methods will just call SafeHtmlUtils under the hood
UriUtils provides tools to scrub unsafe URIs (you'll still need a SafeHtmlUtils pass or equivalent afterwards if you're building an HTML string –vs. using the value directly for an image's source or anchor's href–).
SafeStyles doesn't provide anything specific in itself, but SafeHtmlTemplates will only allow it at the beginning of a CSS context, and will log a warning if you try to put anything else in a CSS context. SafeStylesBuilder is expected to be extended with type-safe methods, to help build well-formed CSS.
I've been working on a SafeUri interface, similar to SafeStyles but in a URL context. In due time, SafeHtmlTemplates will only allow a SafeUri or a String as the full value of a URL attribute, passing the String through UriUtils to make sure it's safe.
In brief, I think the answer to your question is: yes, GWT's safehtml package caters for this, but you'll probably have to always use the latest version of GWT (at least for the coming year) to be safe.
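The underlying idea, that each output context needs its own encoder, can be illustrated outside GWT. The sketch below uses only the Python standard library and is an analogy, not GWT or ESAPI code; `for_html` and `for_url_component` are hypothetical names.

```python
from html import escape
from urllib.parse import quote


def for_html(untrusted: str) -> str:
    """Escape <, >, &, ' and " for HTML body and quoted-attribute contexts."""
    return escape(untrusted, quote=True)


def for_url_component(untrusted: str) -> str:
    """Percent-encode a value that will be placed inside a URL component."""
    return quote(untrusted, safe="")


payload = "\"><script>alert('x')</script>"
print(for_html(payload))
print(for_url_component(payload))
```

Neither function is safe for the other's context, and neither covers JavaScript or CSS contexts, which is the gap the OWASP guidance quoted in the question is about.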

Good approach to XSD schema versioning?

The company I am working for at the moment codify the schema or contract version into the root node. For example,
<PurchaseOrder_v1_2 xmlns="http://someNamespace">
...
</PurchaseOrder_v1_2>
I am looking for people's opinions on this design approach, as I am not convinced it is sound. For example, it requires that all services using this schema as a messaging contract are able to publish multiple versions to satisfy client requirements for different versions.
I would probably disagree with @hacktick's suggestion that versioning the namespace is conventional. I've never seen the W3C recommend that the namespace change with the version, and W3C namespaces certainly don't do this: both versions of XSLT have the namespace http://www.w3.org/1999/XSL/Transform, for example.
Encoding the version into either the root element name or the namespace changes the name of the element. In the case of the root, you are effectively stating that it is an entirely different element with no defined relationship to the PurchaseOrder element in the previous version. In the case of the namespace change, you are stating the same thing about all the elements in the language.
Version attributes are more normal. May I suggest you read this thread on the XML-dev mailing list for some very well-informed discussion?
Normally you version the URL for the schema.
So you would have a schema called "schema" and you would then refer to it like this:
"http://www.example.com/2011/01/schema", where 2011 and 01 are versions in the form of year and month.
Example:
<PurchaseOrder xmlns="http://www.example.com/2011/01/schema">
</PurchaseOrder>
Another approach is to specify the version in the root element.
If your root element is called "PurchaseOrder", for example, you would add a required version attribute (""). The version attribute would contain a simple number that increments with each version of your XSD. You must keep a history of all your public XSDs. This could lead to simpler XSD URLs, but extracting and validating these XML files is a little harder.
Example:
<PurchaseOrder version="1" xmlns="http://www.example.com/schema">
</PurchaseOrder>
If you version the root element name ("PurchaseOrder_v1_2"), you would have conversion problems in your XML files when you move to another version.
Personally, I would go for solution 1 (versioned namespaces). This is also recommended by the W3C, although I can't find a link for this statement.
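Whichever convention is chosen, a consumer has to read the version back out of the document. As a hedged sketch using Python's standard library (the `describe_version` helper is hypothetical), both the namespace and the version attribute can be recovered from the root element like this:

```python
import xml.etree.ElementTree as ET


def describe_version(xml_text: str) -> dict:
    """Report the namespace, local name and version attribute of the root."""
    root = ET.fromstring(xml_text)
    # ElementTree renders a namespaced tag as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = None, root.tag
    return {
        "namespace": namespace,
        "element": local_name,
        "version": root.get("version"),  # None when the attribute is absent
    }


doc = '<PurchaseOrder version="1" xmlns="http://www.example.com/schema"></PurchaseOrder>'
print(describe_version(doc))
```

With a versioned namespace the dispatch key is `namespace`; with a version attribute the namespace stays stable and the key is `version`.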

specification/implementation behaviour for empty href?

I once read a page a few years ago about the various browsers' differing implementations of behaviour when a link with an empty href is clicked.
some of them linked to the directory (/path/to/file?query → /path/to/)
some of them linked to the exact same URI (/path/to/file?query → /path/to/file?query)
some of them linked to the same page (/path/to/file?query → /path/to/file)
...and various other behaviours.
Is the behaviour defined in a specification?
If so, what is the correct behaviour?
If so, have the latest versions of the big five browsers today fixed their implementations?
Since there's no "specification" for contents of HREF (at least in HTML 4), the browsers can do whatever they damn well please.
UPDATE: However, aside from HTML, there is RFC 3986, Uniform Resource Identifier (URI): Generic Syntax. Its section 4.4, Same-Document Reference, says:
When a URI reference refers to a URI that is, aside from its fragment
component (if any), identical to the base URI (Section 5.1), that
reference is called a "same-document" reference. The most frequent
examples of same-document references are relative references that are empty ...
I do not necessarily read the above as "an empty URI MUST cause the client to reload the same document's URI", but it does sound like a "best practice" type of wording; so if I was implementing my own browser I'd almost certainly follow such a behavior.
On a related note, here's a good recent (3/2010) roundup of how browsers treat an empty src attribute of the <img> tag: http://www.nczonline.net/blog/2010/03/16/empty-string-urls-in-html-a-followup/ and http://www.nczonline.net/blog/2010/07/13/empty-string-urls-browser-update/ . Please note that it is a big deal, since having an empty img src would cause the page to endlessly re-load itself in the worst case scenario.
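For comparison, here is how one generic URI resolver treats these references. The sketch uses Python's `urllib.parse.urljoin`; the base URL is a made-up example, and this shows the library's behaviour, not what any particular browser does.

```python
from urllib.parse import urljoin

BASE = "http://example.com/path/to/file?query"  # hypothetical page URL

# An empty reference resolves to the base URI itself, query included.
print(urljoin(BASE, ""))      # http://example.com/path/to/file?query

# "." resolves to the containing directory.
print(urljoin(BASE, "."))     # http://example.com/path/to/

# Naming the file explicitly drops the query.
print(urljoin(BASE, "file"))  # http://example.com/path/to/file
```

The three results match the three behaviours listed in the question; only the first is what an empty reference resolves to here.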
