Number of results not as expected on boolean Solr query

I have a Solr instance running with about 200 entries in its database. I want to search for strings combined with OR, but I fail to get a working query.
When running a simple query like q=fieldname:"string", I get 13 results. When running another query like q=fieldname:"otherstring", I get 18 results. In the end I would expect 27 results, because together there are 31 results and 4 of them are the same ones, as they contain both strings.
The problem comes when I want to search for both of these strings at once: it returns all kinds of results, but not the expected 27. I found a site describing how it should work and tried a couple of different things:
q=fieldname:"string otherstring" gives me 10
q=fieldname:"otherstring string" gives me 0
q=fieldname:"string otherstring"~1 gives me 10
q=fieldname:"otherstring string"~1 gives me 1
q=fieldname:"(string otherstring)" gives me 37 but some are not related at all
q=(+fieldname:"string" +fieldname:"otherstring") gives the same as above
I could go on, as I tried more of these combinations. Can anyone help me get a query with the correct number of results, or explain what I am doing wrong?

If you want to perform an OR query, use OR explicitly:
q=fieldname:"string" OR fieldname:"otherstring"
The quoted forms you tried, such as q=fieldname:"string otherstring" (optionally with a ~ slop factor), are phrase queries: they match the two terms next to each other in one field value rather than combining two independent clauses. The other versions will give varying results depending on the value of q.op and the query parser in use.
q=fieldname:("string" OR "otherstring")
should be semantically identical.
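As a quick sanity check, the OR query can also be issued programmatically. Here is a minimal sketch using Python's requests library, assuming a core reachable at localhost:8983; the core name mycore and the field name fieldname are placeholders:

import requests

# Core name, field name, and terms are placeholders; adjust to your setup.
SOLR_URL = "http://localhost:8983/solr/mycore/select"
params = {
    "q": 'fieldname:"string" OR fieldname:"otherstring"',
    "rows": 0,      # only the hit count is needed here
    "wt": "json",
}
resp = requests.get(SOLR_URL, params=params)
print(resp.json()["response"]["numFound"])  # expect 27 for the counts above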

Cognos Analytics 11 - Age groups

I have a data element expression that I want to use as a category in a crosstab.
This gives me the errors "QE-DEF-0459 CCLException" and "QE-DEF-0261 QFWP", although I have followed the syntax properly. Any ideas what is causing this? It seems to be related to the [BIRTHDATE] column inside the when-clauses.
The error message goes like this: qe-def-0260 parsing error before or near position: 40 in: "case when (_years_between(current_date,"
The source database is Oracle.
Usually there are messages appended after the error number. The message should be helpful in solving your problem, so read it; and when you ask for assistance, including the full text rather than just quoting the error number makes it easier for others to help.
I'm not familiar with any case function in Cognos where a query item is required directly after the case keyword.
Also, case requires an end operator.
Rewrite your expression to something like this, where I've removed birthdate after case and added the end:
case
when (_years_between(current_date, [BIRTHDATE])>=0 and _years_between(current_date, [BIRTHDATE])<=49) then '0-49'
when (_years_between(current_date, [BIRTHDATE])>=50 and _years_between(current_date, [BIRTHDATE])<=100) then '50-100'
else 'null'
end

How to increase limit of 'decimal' field in expressionengine

I've got a Text Input field with 'Allowed Content' set to 'Decimal'. It won't let me set it to anything over a million on an entry, giving the error number_exceeds_limit.
I've thought about saving it as a string rather than using the decimal content type, but I need to display the entries in order using orderby on the field, and if it is a string it will treat 9 as greater than 100, since 9 is greater than 1.
Is there a way to either increase or get around the million limit?
ExpressionEngine version 3.5.2, in case it's relevant.
You would need to edit the logic in cp/ee/EllisLab/Addons/text/ft.text.php (line 62 in 5.1.2) and also edit the column type in the DB structure (I just tried this locally, setting the field length to 20,4 instead of 10,4).
It is probably worth raising an issue about this on the ExpressionEngine GitHub, as that limit seems restrictive to me (MySQL supports up to 65 digits): https://github.com/ExpressionEngine/ExpressionEngine/issues
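If you go the database route, the column change amounts to a single ALTER statement. Here is a minimal sketch using pymysql, under the assumption that the custom field is stored as a field_id_N column in the exp_channel_data table; field_id_7 and the connection details are placeholders, so check your own schema first:

import pymysql

# Connection details are placeholders; use your EE database credentials.
conn = pymysql.connect(host="localhost", user="ee_user",
                       password="secret", database="ee_db")
try:
    with conn.cursor() as cur:
        # field_id_7 is a hypothetical custom field column; this widens
        # DECIMAL(10,4) to DECIMAL(20,4), mirroring the ft.text.php change.
        cur.execute("ALTER TABLE exp_channel_data MODIFY field_id_7 DECIMAL(20,4)")
    conn.commit()
finally:
    conn.close()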

elasticsearch stops adding documents after some point

I'm new to elasticsearch and want to index many sentences to search them efficiently.
At first I tried bulk adding to an index, but that didn't work for me, so now I'm adding sentences one by one using the following piece of (Python) code:
import json
import pycurl

def add_document(c, index_name, doc, _id):
    # POST one document to /<index>/sentence/<id>; doc is a dict,
    # serialized once here (dumping a pre-encoded string would double-encode).
    c.setopt(c.POST, 1)
    c.setopt(c.URL, 'localhost:9200/%s/sentence/%i' % (index_name, _id))
    c.setopt(c.POSTFIELDS, json.dumps(doc))
    c.perform()

c = pycurl.Curl()
add_document(c, 'myIndexName', {'sentence_id': 2}, 99)
Where I'm incrementing the id, and an example of a json input string would be:
{"sentence_id": 2, "article_name": "Kegelschnitt", "paragraph_id": 1, "plaintext": "Ein Kegelschnitt ist der zweidimensionale Sonderfall einer Quadrik .", "postags": "Ein/ART Kegelschnitt/NN ist/VAFIN der/ART zweidimensionale/ADJA Sonderfall/NN einer/ART Quadrik/NE ./$."}
So far so good, seems to work. I suspect that getting this to work in a bulk import way is a lot more efficient, but since this is a one-time only process, efficiency is not my primary concern.
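For reference, the bulk route mentioned above need not be much more code. Here is a minimal sketch using the official elasticsearch-py client's bulk helper; the client library is an assumption (the question uses pycurl), the index name and the typed sentence endpoint follow the examples in this question, and note that mapping types were removed in later Elasticsearch versions:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(['localhost:9200'])

def actions(sentences):
    # One action per document; _id must be unique across ALL documents,
    # or later inserts silently overwrite earlier ones (see the update below).
    for i, doc in enumerate(sentences):
        yield {
            '_index': 'wiki_dump_jan2019',
            '_type': 'sentence',
            '_id': i,
            '_source': doc,
        }

sentences = [{'sentence_id': 2, 'article_name': 'Kegelschnitt'}]
bulk(es, actions(sentences))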
I'm using this query (on the command line) to get an overview of my indices:
curl 'localhost:9200/_cat/indices?v'
Which gives me (for the relevant index):
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open wiki_dump_jan2019 5 1 795502 276551 528.1mb 528.1mb
Similarly, the query:
curl -XGET 'localhost:9200/wiki_dump_jan2019/sentence/_count?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_all": {}}}'
returns
{
"count" : 795502,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
Telling me that I have 795,502 sentences in my index.
My problem here is that in total I do over 23 million inserts. I realise that there may well be some duplicate sentences, but I checked and found over 21 million unique sentences. My Python code executed fine, with no errors, and I checked the elasticsearch logs and did not find anything alarming there. I'm a bit unsure about the number of docs.deleted in the index (276,551, see above), but I understood that this may have to do with re-indexing and duplicates and should not necessarily be a problem (and in any case, docs.count plus docs.deleted is still way below my number of sentences).
The only thing I could find that comes close to my problem was this post: elasticsearch stops indexing new documents after a while, using Tire. But the following query:
curl -XGET 'localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors'
returns:
{"nodes":{"hoOAMZoCTkOgirg6_aIkUQ":{"process":{"max_file_descr^Ctors":65536}}}}
So from what I understand, it defaulted to the max value upon installation, and this should not be the issue.
Anyone who can shed some light on this?
UPDATE: Ok, I guess I'm officially stupid. My issue was that I used the sentence_id as the index id in the adding/inserting process. This sentence_id comes from one particular document, so the maximum number of docs (sentences) in my index would be the highest sentence_id (the longest document in my data set apparently had 795502 sentences). Each new document just kept overwriting the entries of the previous ones... Sorry for having wasted your time if you read this. NOT an elasticsearch issue; a bug in my Python code (outside of the displayed function above).
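For anyone who lands here with the same symptom: the fix is to make the _id unique across all documents instead of reusing a per-article counter. One possible scheme, sketched with the field names from the example document above (the id format itself is just an illustration):

def make_doc_id(doc):
    # Combine the article name with the per-article sentence id so
    # ids no longer collide between articles.
    return '%s-%d' % (doc['article_name'], doc['sentence_id'])

doc = {'sentence_id': 2, 'article_name': 'Kegelschnitt'}
print(make_doc_id(doc))  # Kegelschnitt-2

Alternatively, POSTing each document to localhost:9200/<index>/sentence/ without an id lets Elasticsearch generate a unique id for every document.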

Azure Search alphanumeric order

Is there a way to order results in Azure Search by an order other than the default one?
In my case, I have entries whose numbers look something like this: L1-1, L1-2, L2-1, L10-1, D1-1, etc.
I would like to do two things :
Most of all, order in alphanumeric order, not alphabetical order, by which I mean that I want the order:
L1-1, L1-2, L2-1, L10-1
and not
L1-1, L1-2, L10-1, L2-1
Azure Search's default order gives me the second one.
If possible, I would also like to add some salt by specifying a custom order for the first char (let's say L > D > Q).
For now, I retrieve all results and then order them by that custom order client-side, but that prevents me from building a paging or "infinite loading" system (I can't retrieve my results 10 by 10 if they are in the wrong order).
Cheers!
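For reference, the client-side ordering described above can be expressed as a sort key along these lines. This is a minimal sketch: the entry format (L1-1, D1-1, ...) and the custom letter ranking L > D > Q come from the question, everything else is an assumption:

import re

# Custom rank for the leading letter: L before D before Q.
LETTER_RANK = {'L': 0, 'D': 1, 'Q': 2}

def sort_key(entry):
    # Split "L10-1" into ('L', 10, 1) so the numeric parts compare
    # as numbers, not strings.
    m = re.match(r'([A-Z]+)(\d+)-(\d+)', entry)
    return (LETTER_RANK.get(m.group(1), len(LETTER_RANK)),
            int(m.group(2)), int(m.group(3)))

entries = ['L1-1', 'L10-1', 'D1-1', 'L2-1', 'L1-2']
print(sorted(entries, key=sort_key))
# ['L1-1', 'L1-2', 'L2-1', 'L10-1', 'D1-1']

One way to get server-side paging would be to precompute such a key into its own sortable field at indexing time and order by that field instead.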

Using indexed types for ElasticSearch in Titan

I currently have a VM running Titan over a local Cassandra backend and would like the ability to use ElasticSearch to index strings using CONTAINS matches and regular expressions. Here's what I have so far:
After titan.sh is run, a Groovy script is used to load in the data from separate vertex and edge files. The first stage of this script loads the graph from Titan and sets up the ES properties:
config.setProperty("storage.backend","cassandra")
config.setProperty("storage.hostname","127.0.0.1")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","db/es")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
The second part of the script sets up the indexed types:
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make();
The third part loads in the data from the CSV files; this has been tested and works fine.
My problem is, I don't seem to be able to use the ElasticSearch functions when I do a Gremlin query. For example:
g.E.has("property",CONTAINS,"test")
returns 0 results, even though I know this field contains the string "test" for that property at least once. Weirder still, when I change CONTAINS to something that isn't recognised by ElasticSearch I get a "no such property" error. I can also perform exact string matches and any numerical comparisons, including greater or less than; however, I expect the default indexing method is being used instead of ElasticSearch in those instances.
Due to the lack of errors when I try to run a more advanced ES query, I am at a loss on what is causing the problem here. Is there anything I may have missed?
Thanks,
Adam
I'm not quite sure what's going wrong in your code. From your description everything looks fine. Can you try the following script (just paste it into your Gremlin REPL):
config = new BaseConfiguration()
config.setProperty("storage.backend","inmemory")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","/tmp/es-so")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
g = TitanFactory.open(config)
g.makeKey("name").dataType(String.class).make()
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make()
g.makeLabel("knows").make()
g.commit()
alice = g.addVertex(["name":"alice"])
bob = g.addVertex(["name":"bob"])
alice.addEdge("knows", bob, ["property":"foo test bar"])
g.commit()
// test queries
g.E.has("property",CONTAINS,"test")
g.query().has("property",CONTAINS,"test").edges()
The last 2 lines should return something like e[1t-4-1w][4-knows-8]. If that works and you still can't figure out what's wrong in your code, it would be good if you can share your full code (e.g. in Github or in a Gist).
Cheers,
Daniel
