Solr WhitespaceTokenizerFactory makes URL parameter not work - search

I created a new field type as seen below:
<fieldType name="text_whitespace" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" rule="unicode" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory" rule="unicode" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I need WhitespaceTokenizerFactory so that special characters are indexed and searchable, and that part works now.
But I have another question.
When I use WhitespaceTokenizerFactory, the URL parameter stops working,
e.g. http://localhost:8983/solr/Test1/select?defType=dismax&hl.fl=content&hl=on&indent=on&q=%22C#"&qf=content^100&rows=1&wt=json
When I use these parameters in the Solr web UI,
it works and I get the result.
But when I use the URL with the same parameters, I get no result.
This is my data:
[
{
"id" : "test1",
"title" : "test1# title C*?#",
"content" : "test1# title C*?#",
"dynamic_s": 5
},
{
"id" : "test2",
"title" : "test2 title C#",
"content" : "test2 title C#",
"dynamic_s": 10
},
{
"id" : "test3",
"title" : "test3 title",
"content" : "test3 title",
"dynamic_s": 0
}
]
If I use WhitespaceTokenizerFactory, how do I make the parameter work in the URL?

This is not related to Solr, but is how HTTP works.
As explained in your original post, this is because # has a special meaning in URLs. A # starts the fragment (a local anchor), which is never transmitted to the server: the browser keeps it as a reference to a single point in the page (these days the value after # refers to the id of the element the page should scroll to when displayed; earlier it referenced an empty a tag with a name). In your URL, everything from the # onward, including &qf=content^100&rows=1&wt=json, is therefore dropped, and Solr only receives q=%22C.
To use characters with special meaning in URLs (& would also mean that there's a new parameter coming instead of being interpreted as a value to an argument), you have to escape them. In Javascript you can use encodeURIComponent to do this:
encodeURIComponent("foo#&bar")
-> "foo%23%26bar"
So to send the value foo#&bar as the argument, and not introduce a new parameter or a local anchor hash, the value would be sent as foo%23%26bar instead. Your HTTP server will decode this for you automagically.
?q=field%3Afoo%23%26bar
.. will be interpreted as field:foo#&bar serverside. Since ':' can usually be used safely in URLs, you don't have to escape it - but it doesn't hurt to do it properly. Look up URL escaping in your language of choice if you're going to do this in an application.
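As a rough illustration of the same escaping outside JavaScript, here is a minimal Python sketch using the standard library's urllib.parse. The host, core name (Test1) and parameters are simply copied from the URL in the question; nothing here is specific to Solr.

```python
from urllib.parse import quote, urlencode

# quote() percent-encodes reserved characters; safe="" makes it encode "/" as well.
print(quote("foo#&bar", safe=""))  # foo%23%26bar

# urlencode() builds a whole query string and escapes every value for us.
# These parameters mirror the URL from the question.
params = {
    "defType": "dismax",
    "q": '"C#"',
    "qf": "content^100",
    "hl": "on",
    "hl.fl": "content",
    "rows": 1,
    "wt": "json",
}
url = "http://localhost:8983/solr/Test1/select?" + urlencode(params)
print(url)
```

The resulting URL contains q=%22C%23%22 and no literal #, so the whole query string reaches the server.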

Related

Visual Paradigm: Invalid Syntax Highlighting without actual syntax mistake (as far as I can see)

I think I have encountered an error message that is not necessarily valid or helpful. If it is valid, please tell me what the mistake I encounter is triggered by.
You do not follow UML notation: you swapped the parameter name and its type. Your operations must be
create(entity : E) : Result<E>
create(entities : iterable<E>) : ResultCollection<E>
Your create(E : entity) : Result<E> was accepted 'syntactically' because the var can be E and its type entity, but in the second case the var name iterable<E> is illegal and the tool refuses that.
From formal/2017-12-05 §9.6.4 page 117 and 118 :
If shown in a diagram, an Operation is shown as a text string of the form:
[<visibility>] <name> ‘(‘ [<parameter-list>] ‘)’
[‘:’ [<return-type>] [‘[‘ <multiplicity-range> ‘]’]
[‘{‘ <oper-property> [‘,’ <oper-property>]* ‘}’]]
and <parameter-list> is a list of Parameters of the Operation in the following format:
<parameter-list> ::= <parameter> [‘,’<parameter>]*
and § 9.4.4 page 110 :
<parameter> ::= [<direction>] <parameter-name> ’:’ <type-expression>
[’[’<multiplicity-range>’]’] [’=’ <default>]
[’{’ <parm-property> [’,’ <parm-property>]* ’}’]
So it must be <parameter-name> ’:’ <type-expression> rather than <type-expression> ’:’ <parameter-name>, as you wrote.

JMeter: @ notation (for xml)

On groovy templates for jmeter page there is an example I wanted to follow:
String xml = “
<actions>
<action type=”error” info=”itsErrors”/>
<action type="warning" info=”warnWarn”/>
<action type=”info” info=”justLogInfo”/>
</actions>"
XmlParser parser = new XmlParser()
def actions= parser.parseText (xml)
actions.action.each { action ->
println "${action.'@type'}: ${action.'@info'}";
}
At least in my JMeter 5.1 it did not work as posted, but when I fixed quotation marks it did:
String xml = """
<actions>
<action type="error" info="itsErrors"/>
<action type="warning" info="warnWarn"/>
<action type="info" info="justLogInfo"/>
</actions>"""
XmlParser parser = new XmlParser()
def actions= parser.parseText (xml)
actions.action.each { action ->
println "${action.'@type'}: ${action.'@info'}";
}
My question is mainly about the use of @, and also the dot and quotes (.'@type'). I tried a web search for Groovy @ and found nothing; for JMeter notations I found https://jmeter.apache.org/usermanual/functions.html with only one instance of usage:
Example: ${__XPath(/path/to/build.xml, //target/@name)} This will
match all targets in build.xml and return the contents of the next
name attribute
And about variables same link:
Referencing a variable in a test element is done by bracketing the
variable name with '${' and '}'.
Groovy docs page for xml gives other notations:
https://groovy-lang.org/processing-xml.html
def text = '''
<list>
<technology>
<name>Groovy</name>
</technology>
</list>
'''
def list = new XmlParser().parseText(text)
assert list instanceof groovy.util.Node
assert list.technology.name.text() == 'Groovy'
What does each notation in "${action.'@type'}: ${action.'@info'}" mean?
It isn't a JMeter variable even with ${}, is it?
I managed to keep it working only without the ', the other parts seem necessary: ", ., @, {}, $. I may have listed too many in that last sentence and I can explain some of them, but I want to be sure I understand it correctly.
It's GPath syntax, used in Groovy.
The most common way of querying XML in Groovy is using GPath
For XML, you can also specify attributes, e.g.:
a["@href"] → the href attribute of all the a elements
a.'@href' → an alternative way of expressing this
a.@href → an alternative way of expressing this when using XmlSlurper

How can I search the special characters in Solr

I'm using Solr 6.6.2.
I need to search for special characters and highlight them in Solr,
but it does not work.
My data:
[
{
"id" : "test1",
"title" : "test1# title C# ",
"dynamic_s": 5
},
{
"id" : "test2",
"title" : "test2 title C#",
"dynamic_s": 10
},
{
"id" : "test3",
"title" : "test3 title",
"dynamic_s": 0
}
]
When I search for "C#",
the response is just "test1# title C# ".
Only the "C" is highlighted; the "#" is neither searched nor highlighted.
How can I make search and highlighting work for special characters?
The StandardTokenizer splits tokens on special characters, meaning that # will split the content into separate tokens - the first token will be C - and that's what's being highlighted. You'll probably get the exact same result if you just search for C.
The tokenization process will make your tokens end up being test2 title C .
Using a field type with a WhitespaceTokenizer that only splits on whitespace will probably be a better choice for this exact use case, but it's impossible to say if that'll be a good match for your regular search behavior (i.e. if you actually want to match 'C' to 'C-99' etc., splitting on those characters may be needed). But you can use a specific field for highlighting, and that field's analysis chain will be used to determine what to highlight. You can also ask for both the original and the more specific field to be highlighted, and then use the best result in your frontend application.
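To make the difference concrete, here is a small Python sketch that only approximates the two tokenizers. It is not Solr's implementation: the "standard-like" function is a crude stand-in that keeps runs of word characters, and both function names are made up for this example.

```python
import re

def standard_like_tokens(text):
    # Crude approximation of StandardTokenizer: keep runs of word characters,
    # so punctuation such as "#" is dropped.
    return re.findall(r"\w+", text)

def whitespace_tokens(text):
    # Approximation of WhitespaceTokenizer: split on whitespace only.
    return text.split()

title = "test2 title C#"
print(standard_like_tokens(title))  # ['test2', 'title', 'C']
print(whitespace_tokens(title))     # ['test2', 'title', 'C#']
```

With the first tokenization a query for C# can only ever match (and highlight) C; with the second, the token C# survives intact.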

How can I use the XPath function 'contains()' to return nothing if its search param is blank or missing/false?

I'm trying to write an Xpath Statement (1.0) that can read info from a 'search' node and perform a search using it.
I was making some nice progress, but stumbled across an issue where if an attribute (used for a value in the search) is empty or doesn't exist, it fails.
Code Edited to simplify Example:
So Here is my sample XML:
<xml>
<files>
<file name="foo" description="" rating="4"/>
<file name="food" description="" rating="4"/>
<file name="foobar" description="" rating="3"/>
<file name="bar" description="" rating="3"/>
<file name="barter" description="" rating="3"/>
<file name="barterer" description="" rating="2"/>
</files>
<searches>
<search id="1">
<exclude>
<file term="foo"/>
</exclude>
</search>
</searches>
</xml>
And working XPATH:
//files/file[
not(contains(@name, //search[@id='1']/exclude/file/@term))
]
It works as expected...
However, if an expected attribute is missing or empty, it fails. I think this is because contains(@attrib, "") matches everything for some reason, so the not() matches nothing if the attribute is "" or not present.
For Example, if I alter the exclude fragment of XML to this it fails:
<exclude>
<file term=""/>
</exclude>
with this too:
<exclude></exclude>
Is there a way to check for an empty value and not perform the select? Or is there perhaps a better way of structuring the logic? Bear in mind I cannot use conditionals or the other functions in XPath 2.0.
Why does the XPath function contains() return everything if its search
param is blank or missing?
Because that is what the XPath specification says the contains() function should do:
If the value of $arg2 is the zero-length string, then the function
returns true.
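Python's substring test behaves the same way ("" in "foo" is True), which makes the effect easy to reproduce. The sketch below is only an illustration in Python of the guard the question asks for, not an XPath solution; make_xml and kept_files are hypothetical helpers, and the sample data is the question's file list with the unused attributes left out.

```python
import xml.etree.ElementTree as ET

FILE_NAMES = ["foo", "food", "foobar", "bar", "barter", "barterer"]

def make_xml(exclude_fragment):
    # Hypothetical helper: rebuilds the question's sample with a given <exclude> body.
    files = "".join(f'<file name="{n}"/>' for n in FILE_NAMES)
    return (f'<xml><files>{files}</files><searches><search id="1">'
            f'<exclude>{exclude_fragment}</exclude></search></searches></xml>')

def kept_files(xml_text, search_id="1"):
    root = ET.fromstring(xml_text)
    node = root.find(f"./searches/search[@id='{search_id}']/exclude/file")
    term = node.get("term", "") if node is not None else ""
    names = [f.get("name") for f in root.findall("./files/file")]
    if not term:
        # Guard: an empty or missing term would "match" every name, so skip the filter.
        return names
    return [n for n in names if term not in n]

print("" in "foo")                                 # True
print(kept_files(make_xml('<file term="foo"/>')))  # ['bar', 'barter', 'barterer']
print(kept_files(make_xml('<file term=""/>')) == FILE_NAMES)  # True
```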
You could adjust your XPath and simplify some of the conditions with the following:
//files/file[
(
(
not(//search[@id='1']/include/file/@term)
or
(
contains(@name, //search[@id='1']/include/file/@term)
or
contains(@description, //search[@id='1']/include/file/@term)
)
)
or
contains(@rating, //search[@id='1']/include/file/@rating)
)
and
(
(
not(//search[@id='1']/exclude/file/@term)
or
(
not(contains(@name, //search[@id='1']/exclude/file/@term))
and
not(contains(@description, //search[@id='1']/exclude/file/@term))
)
)
and
(
not(//search[@id='1']/exclude/file/@rating)
or
not(contains(@rating, //search[@id='1']/exclude/file/@rating))
)
)
]
Perhaps you want to say something like
//files/file[
not(contains(@name,
//search[@id='1']
/exclude/file/@term))
or not(
normalize-space(//search[@id='1']
/exclude/file/@term)
)
]
So I stumbled across the answer in a different post. Below is an example of it working.
Thanks for everyone's advice.
//files/file[
not(
contains(
@name,
concat( //search[@id='1']/exclude/file/@term,
substring('??', 1 + 2*
boolean( substring( //search[@id='1']/exclude/file/@term, 1 ) )
)
)
)
)
]
The bit where I placed "??" probably wants replacing with an invalid char or something. For each additional character used the 1+ needs to increment. For me I'm checking filenames so a questionmark seems a good idea. To be honest I probably won't be using this route, I was just eager to solve the problem after all this time.
Here is where I got the idea from:
How to give back constant if node does not exist in XPATH?
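The arithmetic behind the concat/substring trick above can be traced with a small Python emulation. This is a sketch only: xpath_substring is a simplified stand-in for XPath 1.0's two-argument substring() (1-based start, no rounding rules), and search_string is a made-up name for the value that ends up being passed to contains().

```python
def xpath_substring(s, start):
    # Simplified XPath 1.0 substring(s, start): positions are 1-based.
    return s[max(start - 1, 0):]

def search_string(term, filler="??"):
    # boolean(term) is 1 for a non-empty term, 0 for an empty or missing one.
    start = 1 + len(filler) * int(bool(term))
    # Non-empty term: start points past the filler, so nothing is appended.
    # Empty term: start is 1, so the whole filler becomes the search string.
    return term + xpath_substring(filler, start)

print(repr(search_string("foo")))  # 'foo'
print(repr(search_string("")))     # '??'

names = ["foo", "food", "foobar", "bar", "barter", "barterer"]
print([n for n in names if search_string("") not in n] == names)  # True
```

Because no file name contains the filler, an empty term excludes nothing, which is why the filler has to be something that cannot appear in the data.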

Deserialize XMLDocument with encoded characters in attribute names

I'm trying to deserialize XML data into an object with C#. I have always done this using the .NET deserialize method, and that has worked well for most of what I have needed.
Now, though, I have XML that is created by SharePoint, and the attribute names of the data I need to deserialize have encoded characters, namely:
space, º, ç, ã, : and a hyphen as
x0020, x00ba, x00e7, x00e3, x003a and x002d respectively
I'm trying to figure out what I have to put in the attributeName parameter in the properties XmlAttribute
x0020 converts to a space well, so, for instance, I can use
[XmlAttribute(AttributeName = "ows_Nome Completo")]
to read
ows_Nome_x0020_Completo="MARIA..."
On the other hand, neither
[XmlAttribute(AttributeName = "ows_Motiva_x00e7__x00e3_o_x003a_")]
nor
[XmlAttribute(AttributeName = "ows_Motivação_x003a_")]
nor
[XmlAttribute(AttributeName = "ows_Motivação:")]
allow me to read
ows_Motiva_x00e7__x00e3_o_x003a_="text to read..."
With the first two I get no value returned, and the third gives me a runtime error for invalid characters (the colon).
Anyway to get this working with .NET Deserialize, or do I have to build a specific deserializer for this?
Thanks!
What you are looking at (the "cryptic" data) is escaped field names: SharePoint writes special characters in field names as _xHHHH_ sequences (the character's hexadecimal code) to keep attribute names and similar elements safe.
There are a few ways of dealing with this. The most elegant is to extract the List schema and match the elements against the schema. The schema contains all the metadata about your list data. A polished example of a schema can be seen below or here http://www.bendsoft.com/documentation/camelot-php-tools/1_5/packets/schema-and-content-packets/schemas/example-list-view-schema/
If you don't want to walk that path you could start here http://msdn.microsoft.com/en-us/library/35577sxd.aspx
<Field Name="ContentType">
<ID>c042a256-787d-4a6f-8a8a-cf6ab767f12d</ID>
<DisplayName>Content Type</DisplayName>
<Type>Text</Type>
<Required>False</Required>
<ReadOnly>True</ReadOnly>
<PrimaryKey>False</PrimaryKey>
<Percentage>False</Percentage>
<RichText>False</RichText>
<VisibleInView>True</VisibleInView>
<AppendOnly>False</AppendOnly>
<FillInChoice>False</FillInChoice>
<HTMLEncode>False</HTMLEncode>
<Mult>False</Mult>
<Filterable>True</Filterable>
<Sortable>True</Sortable>
<Group>_Hidden</Group>
</Field>
<Field Name="Title">
<ID>fa564e0f-0c70-4ab9-b863-0177e6ddd247</ID>
<DisplayName>Title</DisplayName>
<Type>Text</Type>
<Required>True</Required>
<ReadOnly>False</ReadOnly>
<PrimaryKey>False</PrimaryKey>
<Percentage>False</Percentage>
<RichText>False</RichText>
<VisibleInView>True</VisibleInView>
<AppendOnly>False</AppendOnly>
<FillInChoice>False</FillInChoice>
<HTMLEncode>False</HTMLEncode>
<Mult>False</Mult>
<Filterable>True</Filterable>
<Sortable>True</Sortable>
</Field>
<Field>
...
</Field>
Well... I guess I kind of hacked a way around it, which works for now. I just replaced the _x***_ sequences with nothing and corrected the XmlAttributes accordingly. The replacement is done by first loading the XML as a string, then replacing, then loading the "clean" text as XML.
But I would still like to know if it is possible to use some XmlAttribute name for a more direct approach...
Try using System.Xml; XmlConvert.EncodeName and XmlConvert.DecodeName
I use a simple function to get the NameCol:
private string getNameCol(string colName) {
if (colName.Length > 20) colName = colName.Substring(0, 20);
return System.Xml.XmlConvert.EncodeName(colName);
}
I was already looking for a way to replace characters like á, é, í, ó, ú. EncodeName doesn't convert these characters.
You can use Replace:
.Replace("ó","_x00f3_").Replace("á","_x00e1_")
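For illustration, the _xHHHH_ sequences in the question can be decoded generically rather than one Replace call per character. The following is a minimal Python sketch of that idea, assuming every escape is exactly four hex digits as in the examples in this thread; decode_name is a hypothetical helper, not part of any SharePoint or .NET API.

```python
import re

# Matches escapes such as _x0020_ or _x00e7_ (exactly four hex digits assumed).
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")

def decode_name(name):
    # Replace each _xHHHH_ escape with the character it stands for.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)

print(decode_name("ows_Nome_x0020_Completo"))           # ows_Nome Completo
print(decode_name("ows_Motiva_x00e7__x00e3_o_x003a_"))  # ows_Motivação:
```

The second result shows why the decoded name cannot be used directly as an attribute name: it ends in a colon, which matches the runtime error described in the question.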
