I'm trying to get ElasticSearch working, specifically with the River plugin, but I just can't get it to work. Here's the procedure I'm using:
curl -XDELETE 'http://localhost:9200/_all/'
Response:
{
"ok": true,
"acknowledged": true
}
This is so I know I'm starting from an empty set of Elasticsearch indices.
I have an existing database, called test, and the River plugin has already been installed. Is there any way to confirm that the River plugin is installed and running?
I issue the following command:
curl -XPUT 'http://localhost:9200/_river/my_index/_meta' -d '{
"type" : "couchdb",
"couchdb" : {
"host" : "localhost",
"port" : 5984,
"db" : "my_couch_db",
"filter" : null
}
}'
my_couch_db is a real database; I can see it in Futon, and there is a document in it.
Response:
{
"ok": true,
"_index": "_river",
"_type": "my_index",
"_id": "_meta",
"_version": 1
}
Now at this point, my understanding is that Elasticsearch should be working, as I saw in the tutorial.
I try to query, just to find anything. I go to
http://localhost:9200/my_couch_db/my_couch_db.
Response:
No handler found for uri [/my_couch_db/my_couch_db] and method [GET]
What's weird is when I go to
localhost:5984/my_couch_db/__changes
I get
{
"error": "not_found",
"reason": "missing"
}
Anyone have any idea what part of this I'm screwing up?
I try to query, just to find anything.
I go to
http://localhost:9200/my_couch_db/my_couch_db.
Try adding /_search (with an optional ?pretty=true) at the end of your curl -XGET, like so:
C:\>curl -XGET "http://localhost:9200/my_couch_db/my_couch_db/_search?pretty=true"
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.0,
"hits": [
{
"_index": "my_couch_db",
"_type": "my_couch_db",
"_id": "a2b52647416f2fc27684dacf52001b7b",
"_score": 1.0,
"_source": {
"_rev": "1-5e4efe372810958ed636d2385bf8a36d",
"_id": "a2b52647416f2fc27684dacf52001b7b",
"test": "hello"
}
}
]
}
}
What's weird is when I go to
localhost:5984/my_couch_db/__changes
I get {"error":"not_found","reason":"missing"}
Try removing one of the underscores from your __changes and that should work, like so:
C:\>curl -XGET "http://localhost:5984/my_couch_db/_changes"
{
"results": [
{
"seq": 1,
"id": "a2b52647416f2fc27684dacf52001b7b",
"changes": [
{
"rev": "1-5e4efe372810958ed636d2385bf8a36d"
}
]
}
],
"last_seq": 1
}
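The fix above boils down to hitting the _search endpoint instead of the bare index/type path, which has no GET handler. A minimal sketch of building that URL in Python (hostname and index names are the ones from the question; no live cluster is needed):

```python
# Build an Elasticsearch search URL for an index/type pair.
# The bare "/index/type" path has no GET handler; "/index/type/_search" does.
def search_url(host, index, doc_type, pretty=True):
    url = "http://%s/%s/%s/_search" % (host, index, doc_type)
    if pretty:
        url += "?pretty=true"
    return url

print(search_url("localhost:9200", "my_couch_db", "my_couch_db"))
# -> http://localhost:9200/my_couch_db/my_couch_db/_search?pretty=true
```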
I'm trying to build an index that is searchable for a possible case-insensitive exact match. The Elasticsearch version is 8.6.2, with Lucene version 9.4.2. The code is run in Python with Python's elasticsearch library.
settings = {"settings": {
"analysis": {
"analyzer": {"lower_analizer": {"tokenizer": "whitespace", "filter": [ "lowercase" ]} }
}
}
}
mappings = {"properties": {
"title": {"type": "text", "analyzer": "standard"},
"article": {"type": "text", "analyzer": "lower_analizer"},
"sentence_id": {"type": "integer"},
}
}
I copied the settings from Elasticsearch's tutorial. However, it returned the following error:
BadRequestError: BadRequestError(400, 'illegal_argument_exception',
'unknown setting [index.settings.analysis.analyzer.lower_analizer.filter]
please check that any required plugins are installed, or check the
breaking changes documentation for removed settings')
I'm not sure how to proceed, as the error seems to imply the lowercase filter does not exist?
The standard analyzer includes a lowercase filter by default, and text field types use the standard analyzer unless another one is specified:
PUT test_stackoverflow/_doc/1
{
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
GET test_stackoverflow/_search
{
"query": {
"match": {
"text": "quick"
}
}
}
Response:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.2876821,
"hits": [
{
"_index": "test_stackoverflow",
"_id": "1",
"_score": 0.2876821,
"_source": {
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
}
]
}
}
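To see why "quick" matches "QUICK" here, the analysis step can be approximated in plain Python. This is only an illustration of a whitespace tokenizer followed by a lowercase filter, not the actual Lucene standard tokenizer (which also splits on hyphens and strips punctuation):

```python
# Rough approximation of a whitespace tokenizer plus lowercase filter:
# both the indexed text and the query term are lowercased, so "QUICK"
# and "quick" end up as the same token.
def analyze(text):
    return [token.lower() for token in text.split()]

doc_tokens = analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
query_tokens = analyze("quick")
print(query_tokens[0] in doc_tokens)  # True
```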
I am trying to get "search by example" functionality out of ElasticSearch.
I have a number of objects which have fields, e.g. name, description, objectID, etc.
I want to perform a search where, for example, "name=123" and "description=ABC"
Mapping:
{
"settings": {
"number_of_replicas": 1,
"number_of_shards": 3,
"refresh_interval": "5s",
"index.mapping.total_fields.limit": "500"
},
"mappings": {
"CFS": {
"_routing": {
"required": true
},
"properties": {
"objectId": {
"store": true,
"type": "keyword",
"index": "not_analyzed"
},
"name": {
"type": "text",
"analyzer": "standard"
},
"numberOfUpdates": {
"type": "long"
},
"dateCreated": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"lastModified": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis",
"index": "not_analyzed"
}
}
}
}
}
Trying a very simple search, without a field name, gives the correct result:
Request: GET http://localhost:9200/repository/CFS/_search?routing=CFS&q=CFS3
Returns:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.7831944,
"hits": [
{
"_index": "repository",
"_type": "CFS",
"_id": "589a9a62-1e4d-4545-baf9-9cc7bf4d582a",
"_score": 0.7831944,
"_routing": "CFS",
"_source": {
"doc": {
"name": "CFS3",
"description": "CFS3Desc",
"objectId": "589a9a62-1e4d-4545-baf9-9cc7bf4d582a",
"lastModified": 1480524291530,
"dateCreated": 1480524291530
}
}
}
]
}
}
But trying to prefix with a field name fails (and this happens on all fields, e.g. objectId):
Request: GET http://localhost:9200/repository/CFS/_search?routing=CFS&q=name:CFS3
Returns:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Eventually I want to do something like:
{
"bool" : {
"must" : [
{
"wildcard" : {
"name" : {
"wildcard" : "*CFS3*",
"boost" : 1.0
}
}
},
{
"wildcard" : {
"description" : {
"wildcard" : "*CFS3Desc*",
"boost" : 1.0
}
}
}
]
}
}
Maybe related? When I try to use a "multi_match" to do this, I have to prefix my field name with a wildcard, e.g.
POST http://localhost:9200/repository/CFS/_search?routing=CFS
{
"query": {
"multi_match" : {
"query" : "CFS3",
"fields" : ["*name"]
}
}
}
If I don't prefix it, it doesn't find anything. I've spent two days searching Stack Overflow and the Elasticsearch documentation, but these issues don't seem to be mentioned.
There's lots about wildcards for search terms, and even mention of wildcards AFTER the field name, but nothing about BEFORE the field name.
What piece of information am I missing from the field name, that I need to deal with by specifying a wildcard?
I think the types of my fields in the mapping are correct. I'm specifying an analyzer.
I found out the answer to this :(
I had been keen to utilise "upserts", to avoid having to check if the object already existed, and to therefore keep performance high.
As you can see at https://www.elastic.co/guide/en/elasticsearch/guide/current/partial-updates.html and https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html, when calling the update REST API you specify your payload as:
{
"doc" : {
"tags" : [ "testing" ],
"views": 0
}
}
When implementing the equivalent using the Java client, I didn't follow the examples exactly. Instead of what was suggested:
UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
.startObject()
.field("gender", "male")
.endObject());
client.update(updateRequest).get();
I had implemented:
JsonObject state = extrapolateStateFromEvent( event );
JsonObject doc = new JsonObject();
doc.add( "doc", state );
UpdateRequest updateRequest = new UpdateRequest( indexName, event.getEntity().getType(), event.getEntity().getObjectId() );
updateRequest.routing( event.getEntity().getType() );
updateRequest.doc( doc.toString() );
updateRequest.upsert( doc.toString() );
UpdateResponse response = client.update( updateRequest ).get();
I wrapped my payload/"state" with a "doc" object, thinking it was needed.
But this had a large impact on how I interacted with my data, and at no point was I warned about it.
I guess I had accidentally created a nested object, although I wonder why it affects the search APIs so much.
How could this be improved? Maybe the mapping could default to disallowing nested objects? Or there could be some kind of validation that a programmer could perform?
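The effect of that extra wrapper can be reproduced with plain dictionaries: every field in the indexed _source ends up nested under doc, so a query on name finds nothing while doc.name (or the *name wildcard) matches. A sketch, using the field names from the question:

```python
# The payload that was actually indexed: fields nested under "doc".
wrapped = {"doc": {"name": "CFS3", "description": "CFS3Desc"}}
# The payload that was intended: fields at the top level.
intended = {"name": "CFS3", "description": "CFS3Desc"}

def field_paths(source, prefix=""):
    """Flatten a _source dict into dotted field paths, as the mapper sees them."""
    paths = []
    for key, value in source.items():
        path = prefix + key
        if isinstance(value, dict):
            paths.extend(field_paths(value, path + "."))
        else:
            paths.append(path)
    return paths

print(field_paths(wrapped))   # ['doc.name', 'doc.description']
print(field_paths(intended))  # ['name', 'description']
```

This is why the *name field pattern worked: it matches doc.name, which is where the data actually lived.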
I need help. I have these documents on Elasticsearch 1.6:
{
"name":"Sam",
"age":25,
"description":"Something"
},
{
"name":"Michael",
"age":23,
"description":"Something else"
}
with this query:
GET /MyIndex/MyType/_search?q=Michael
Elasticsearch returns this object:
{
"name":"Michael",
"age":23,
"description":"Something else"
}
... That's right, but I want to get the exact key where the text "Michael" was found. Is that possible? Thanks a lot.
I assume that by key you mean the document ID.
When indexing the following documents:
PUT my_index/my_type/1
{
"name":"Sam",
"age":25,
"description":"Something"
}
PUT my_index/my_type/2
{
"name":"Michael",
"age":23,
"description":"Something else"
}
And searching for:
GET /my_index/my_type/_search?q=Michael
You'll get the following response:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 0.15342641,
"_source": {
"name": "Michael",
"age": 23,
"description": "Something else"
}
}
]
}
}
As you can see, the hits array contains an object for each search hit.
The key for Michael in this case is "_id": "2", which is its document ID.
Hope it helps.
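Pulling those IDs out of a response body is just a walk over the hits array. A minimal sketch against a hard-coded copy of the response above (no client library or live cluster assumed):

```python
# Extract the document IDs from an Elasticsearch search response body.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {"_index": "my_index", "_type": "my_type", "_id": "2",
             "_source": {"name": "Michael", "age": 23, "description": "Something else"}}
        ]
    }
}

ids = [hit["_id"] for hit in response["hits"]["hits"]]
print(ids)  # ['2']
```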
I have a document in the form of:
curl -XPOST localhost:9200/books/book/1 -d '{
"user_id": 1,
"pages": [ {"page_id": 1, "count": 1}, {"page_id": 2, "count": 3}]
}'
Now let's say the user reads page 1 again, so I want to increment the count. The document should become:
{
"user_id": 1,
"pages": [ {"page_id": 1, "count": 2}, {"page_id": 2, "count": 3}]
}
But how do you update a single element of the list like this, conditional on its page_id?
An example of a simple update in Elasticsearch is as follows:
curl -XPOST localhost:9200/books/book/2 -d '{
"user_id": 1,
"pages": {
"page_1": 1,
"page_2": 2
}
}'
curl -XPOST localhost:9200/books/book/2/_update -d '
{
"script": "ctx._source.pages.page_1+=1"
}'
The document now becomes:
{
"user_id": 1,
"pages": {
"page_1": 2,
"page_2": 2
}
}
However, this simpler document format loses page_id as a field; the ID itself acts as the field name, and the value associated with it has no real definition. So this isn't a great solution.
Anyway, it would be great to get ideas on how to update the array accordingly, or on how to structure the data differently.
Note: this uses ES 1.4.4. You also need to add script.disable_dynamic: false to your elasticsearch.yml file.
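For reference, the transformation the update script would have to perform against the original nested-array structure can be sketched in plain Python. The actual script on ES 1.4 would be written in Groovy; this only shows the intended logic:

```python
# Increment the count of the page whose page_id matches, appending a new
# entry if the page has not been seen before.
def increment_page(doc, page_id):
    for page in doc["pages"]:
        if page["page_id"] == page_id:
            page["count"] += 1
            return doc
    doc["pages"].append({"page_id": page_id, "count": 1})
    return doc

doc = {"user_id": 1,
       "pages": [{"page_id": 1, "count": 1}, {"page_id": 2, "count": 3}]}
increment_page(doc, 1)
print(doc["pages"][0])  # {'page_id': 1, 'count': 2}
```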
Assuming I'm understanding your problem correctly, I would probably use a parent/child relationship.
To test it, I set up an index with a "user" parent and "page" child, as follows:
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"user": {
"_id": {
"path": "user_id"
},
"properties": {
"user_id": {
"type": "integer"
}
}
},
"page": {
"_parent": {
"type": "user"
},
"_id": {
"path": "page_id"
},
"properties": {
"page_id": {
"type": "integer"
},
"count": {
"type": "integer"
}
}
}
}
}
(I used the "path" parameter in the "_id"s because it makes the indexing less redundant; the ES docs say that path is deprecated in ES 1.5, but they don't say what it's being replaced with.)
Then I indexed a few docs:
POST /test_index/_bulk
{"index":{"_type":"user"}}
{"user_id":1}
{"index":{"_type":"page","_parent":1}}
{"page_id":1,"count":1}
{"index":{"_type":"page","_parent":1}}
{"page_id":2,"count":1}
Now I can use a scripted partial update to increment the "count" field of a page. Because of the parent/child relationship, I have to use the parent parameter to tell ES how to route the request.
POST /test_index/page/2/_update?parent=1
{
"script": "ctx._source.count+=1"
}
Now if I search for that document, I will see that it was updated as expected:
POST /test_index/page/_search
{
"query": {
"term": {
"page_id": {
"value": "2"
}
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "page",
"_id": "2",
"_score": 1,
"_source": {
"page_id": 2,
"count": 2
}
}
]
}
}
Here is the code all in one place:
http://sense.qbox.io/gist/9c977f15b514ec251aef8e84e9510d3de43aef8a
Hi, I have installed Elasticsearch version 0.18.7 and configured CouchDB according to these instructions. I am trying to create the index in the following way:
curl -XPUT '10.50.10.86:9200/_river/tasks/_meta' -d '{
"type": "couchdb",
"couchdb": {
"host": "10.50.10.86",
"port": 5984,
"db": "tasks",
"filter": null
},
"index": {
"index": "tasks",
"type": "tasks",
"bulk_size": "100",
"bulk_timeout": "10ms"
}
}'
and got this response:
{
"ok": true,
"_index": "_river",
"_type": "tasks",
"_id": "_meta",
"_version": 2
}
When I try to access the URL
curl -GET 'http://10.50.10.86:9200/tasks/tasks?q=*&pretty=true'
I get:
{
"error": "IndexMissingException[[tasks] missing]",
"status": 404
}
Please guide me on how to index CouchDB using Elasticsearch.
I'm not sure where es_test_db2 is coming from. What's the output of this?
curl 10.50.10.86:9200/_river/tasks/_status\?pretty=1