I have this content in my file:
{
"performanceHighToLow" : {
tabs : {
bySales : "by sales",
byOrder : "by order"
},
category : "performanceHighToLow",
onTabClick
},
performanceLowToHigh : {
tabs : {
bySales : "by sales",
byOrder : "by order"
},
category : "performanceLowToHigh",
onTabClick
}
}
I was wondering if I could write a regex to quote all of the unquoted words. On the same subject, is there a way to select the full word (word boundary) before each colon (:) occurrence?
To match words before a colon you can match word characters plus optional whitespace plus a colon, but stop the match after the word itself with \ze:
/\w\+\ze\s*:
To also match a possible last word on a line (like onTabClick) you can modify the previous pattern with an alternation between the colon and end-of-line:
/\w\+\ze\s*\(:\|$\)
In that case it may be easier to enable very magic mode to simplify escaping:
/\v\w+\ze\s*(:|$)
To then "quote" these results:
:%s/\v\w+\ze\s*(:|$)/"&"/g
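Outside Vim, the same idea translates almost directly to JavaScript, where a lookahead plays the role of \ze. A minimal sketch (quoteBareWords is a hypothetical helper; the sample lines are trimmed from the question):

```javascript
// Quote every bare word that is followed by a colon or the end of a line.
// The lookahead keeps the colon/EOL out of the match, like \ze in Vim.
const quoteBareWords = (text) =>
  text.replace(/\w+(?=\s*(?::|$))/gm, '"$&"');

const input = [
  "tabs : {",
  'bySales : "by sales"',
  "onTabClick"
].join("\n");

console.log(quoteBareWords(input));
// "tabs" : {
// "bySales" : "by sales"
// "onTabClick"
```

Words that sit inside quotes already (like by sales) are skipped, because the character after them is a quote rather than a colon or end-of-line.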
I'm using Solr 6.6.2.
I need to search for special characters and highlight them in Solr, but it does not work.
My data:
[
{
"id" : "test1",
"title" : "test1# title C# ",
"dynamic_s": 5
},
{
"id" : "test2",
"title" : "test2 title C#",
"dynamic_s": 10
},
{
"id" : "test3",
"title" : "test3 title",
"dynamic_s": 0
}
]
When I search for "C#", the response only contains "test1# title C# ", and only the word "C" gets highlighted; the "#" is neither matched nor highlighted.
How can I make search and highlighting work for special characters?
The StandardTokenizer splits tokens on special characters, meaning that # will split the content into separate tokens - the first token will be C - and that's what's being highlighted. You'll probably get the exact same result if you just search for C.
The tokenization process will make your tokens end up being test2, title and C.
Using a field type with a WhitespaceTokenizer that only splits on whitespace will probably be a better choice for this exact use case, but it's impossible to say whether that will be a good match for your regular search behavior (i.e. if you actually want 'C' to match 'C-99' etc., splitting on those characters may be needed). But you can use a dedicated field for highlighting, and that field's analysis chain will be used to determine what to highlight. You can ask for both the original field and the more specific field to be highlighted, and then use the best result in your frontend application.
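The difference between the two tokenizers can be sketched with rough stand-ins (these regexes are simplifications for illustration, not the actual Lucene tokenizers):

```javascript
// Very rough approximations of the two analysis behaviors:
// a "standard-like" tokenizer splits on anything non-alphanumeric,
// a "whitespace-like" tokenizer splits only on whitespace.
const standardLike   = (s) => s.split(/[^A-Za-z0-9]+/).filter(Boolean);
const whitespaceLike = (s) => s.trim().split(/\s+/);

console.log(standardLike("test2 title C#"));   // [ 'test2', 'title', 'C' ]
console.log(whitespaceLike("test2 title C#")); // [ 'test2', 'title', 'C#' ]
```

The first output shows why only "C" is highlighted: the # never survives analysis, so it can never match.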
I have some hotel names that contain characters which are not valid in file names (file naming doesn't allow /, * or ?), and I want to know what this error means when I try to use those names as file names.
text?text
text?text
text**text**
text*text (text)
text *text*
text?
I am trying to use an if/else statement so that if a hotel name contains any of these characters, they are replaced with -. However, I am receiving an error stating "dangling ?". I just want to check whether I am using the replacement correctly for these characters.
def hotelNameTrim = hotelName.toString()
if (hotelNameTrim.contains("/"))
{
hotelNameTrim.replaceAll("/", "-")
}
else if (hotelNameTrim.contains("*"))
{
hotelNameTrim.replaceAll("*", "-")
}
else if (hotelNameTrim.contains("?"))
{
hotelNameTrim.replaceAll("?", "-")
}
replaceAll accepts a regex as its search pattern. * and ? are special characters in regex and need to be escaped with a backslash, which itself needs to be escaped in a Java string :)
Try this:
hotelNameTrim.replaceAll("[\\*\\?/]","-")
That will replace all of your characters with a dash.
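The same character-class approach works in other languages too; here is the equivalent in JavaScript (inside [...], * and ? don't need escaping, though escaping them is harmless):

```javascript
// Replace every /, * or ? in a name with a dash.
const sanitize = (name) => name.replace(/[/*?]/g, "-");

console.log(sanitize("text?text"));    // text-text
console.log(sanitize("text**text**")); // text--text--
console.log(sanitize("text *text*"));  // text -text-
```

Note that in both Groovy and JavaScript strings are immutable, so replaceAll/replace returns a new string that must be assigned; the question's code discards the return value, which is a second bug besides the unescaped metacharacters.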
I am using the NPM module json-csv to generate a CSV file from an array of objects. However, some fields might contain a semicolon (;), and apparently the CSV gets split at the semicolon, despite the fact that this field is quoted. Can anyone make any suggestions as to how to fix this issue?
The code I use for the options is the following:
var options = {
fields: [
{
name : 'paragraphId',
label : 'ParagraphID'
},
{
name : 'paragraph',
label : 'Paragraph',
quoted : true
}
]
};
According to the CSV specification, you can have delimiters in values as long as you surround those values with double quotes. From the CSV specification:
Fields with embedded commas must be delimited with double-quote characters.
And:
Fields may always be delimited with double quotes.
The delimiters will always be discarded.
The option that triggers this behavior when exporting data with the json-csv library is quoted: true in the options for a given field - I see you've already included it, so you're good.
Also, it's worth noting that this library uses a comma (,) as the delimiter by default, not a semicolon (;). To use a different delimiter, adjust your options accordingly:
var options = {
fields: [
{
name: 'paragraphId',
label: 'ParagraphID'
},
{
name: 'paragraph',
label: 'Paragraph',
quoted: true
}],
fieldSeparator: ';' // the important part!
};
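To see why quoting is enough, here is a minimal sketch of RFC-4180-style field quoting (quoteField is a hypothetical helper for illustration, not part of json-csv):

```javascript
// Quote a field when it contains the separator, a quote, or a newline;
// embedded quotes are doubled, per the CSV convention.
function quoteField(value, sep = ";") {
  const s = String(value);
  return s.includes(sep) || s.includes('"') || s.includes("\n")
    ? '"' + s.replace(/"/g, '""') + '"'
    : s;
}

console.log(quoteField("plain"));         // plain
console.log(quoteField("has;semicolon")); // "has;semicolon"
```

A conforming CSV reader un-quotes such fields before splitting on the separator, so an embedded ; inside quotes never starts a new column.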
In a certain ID field we are indexing on a document looks like this:
1234 45676
We want to be able to do full-text searches on each of the two groups of numbers, just as if they were strings. I wrap the number groups in quotes, which the Mongo documentation says will ensure that the entire string is searched for.
For example, if an indexed field has the word "blue" in it, only the word "blue" will be searched for. Searching on "b" will not yield a hit. (we are using non-stemmatic searching for the time being).
But that is not the result with the number groups. Even though we wrap our number groups in quotes ("45676"), the number groups are subject to partial matching. In our example, searching on "4" will hit on "45676".
How can we ensure that "45676" is treated as a string that will yield a hit only if "45676" is searched for?
All suggestions or perspectives are welcome! Thanks in advance.
There are two solutions for searching for a group of numbers as a unique single word.
1) Use the $text operator and text index
2) Use the $regex operator with a regular expression.
Setup:
db = connect("test"); // same as `use test;`
db.a.drop();
db.a.insert([
{ _id: 1, txt : "Log 1: Page 23 1234 45676" },
{ _id: 2, txt : "Log 2: Page 45 0000 00000" },
{ _id: 3, txt : "Log 3: Page 59 1337 11111" }
]);
1. Example using the $text operator
Index the searchable field
db.a.ensureIndex({ txt : "text" });
Query using the $text operator
db.a.find({ $text : { $search : "45" } });
Output
{ _id: 2, txt : "Log 2: Page 45 0000 00000" }
Notice that the output doesn't include the doc with _id 1, even though it contains 45676.
2. Example using a regular expression
For the regular expression, you need to wrap the numbers in a word boundary, \b, to avoid them being matched within a string.
Example:
Searching for 4 without word boundary.
/4/.test("4") == true
/4/.test("1234") == true
Searching for 4 with word boundary.
/\b4\b/.test("4") == true
/\b4\b/.test("1234") == false
Search for 45 using the regular expression
db.a.find({ txt : /\b45\b/ });
Output
{ _id: 2, txt : "Log 2: Page 45 0000 00000" }
You can form a regular expression from a user's input with the following functions. (Note the character class also escapes the backslash itself, so arbitrary input can't break the pattern.)
function escapeRegExp(str) {
return String(str).replace(/[[\]\\/{}()*+?.^$|-]/g, "\\$&");
}
function wordToRegExp( query ){
return new RegExp( "\\b" + escapeRegExp( query ) + "\\b" );
}
var queryForWord = wordToRegExp( 45 );
// queryForWord would be sent from your server side, not created in mongo shell.
db.a.find({ txt : queryForWord });
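A quick sanity check of those helpers, runnable anywhere JavaScript runs rather than only in the mongo shell (the helpers are repeated here so the snippet is self-contained):

```javascript
// Same helpers as above: escape regex metacharacters, then anchor the
// query between word boundaries so it only matches as a whole word.
function escapeRegExp(str) {
  return String(str).replace(/[[\]\\/{}()*+?.^$|-]/g, "\\$&");
}
function wordToRegExp(query) {
  return new RegExp("\\b" + escapeRegExp(query) + "\\b");
}

const q = wordToRegExp(45);
console.log(q.test("Log 2: Page 45 0000 00000")); // true
console.log(q.test("Log 1: Page 23 1234 45676")); // false
```

The second check is the key property: 45 does not match inside 45676, because "5" and "6" are both word characters, so no \b boundary exists between them.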
More info:
$text doc
$regex doc
Regular Expressions Basics
Fulltext search setup
Exact matching for text is supported in ElasticSearch if the field mapping contains "index" : "not_analyzed". That way, the field won't be tokenized and ES will use the whole string for exact matching. See the documentation.
Is there a way to support both full text searching and exact matching without having to create two fields: one for full-text, and one with not_analyzed mapping for exact matching?
An example use case:
We want to search by book titles.
I like trees should return results of full text search
exact="I like trees" should return only books that have the exact title I like trees and nothing else. Case insensitive is fine.
You can use a term filter to do exact-match searches.
The filter looks like this:
{
"term" {
"key" : "value"
}
}
a query would look like this:
{
"query" : {
"filtered" : {
"filter" : {
"term" : {
"key" : "value"
}
}
}
}
}
You don't need to store the data in two different fields; what you want is an ES multi-field.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields
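A mapping sketch in the style of that multi-fields documentation (the field names here are assumptions for illustration, not taken from the question):

```javascript
// "title" is analyzed for full-text search, while the "title.raw"
// sub-field keeps the whole string (not_analyzed) for exact matching
// with a term filter/query.
const mapping = {
  properties: {
    title: {
      type: "string",
      fields: {
        raw: { type: "string", index: "not_analyzed" }
      }
    }
  }
};

console.log(JSON.stringify(mapping, null, 2));
```

Full-text queries then go against title, while an exact match like exact="I like trees" targets title.raw with a term filter, all from a single stored field.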