Problems with elasticsearch where '*' is the field - python-3.x

So, I should preface this by saying that I understand * is a special character that should be escaped in Elasticsearch queries. Here's the setup and the trouble I'm facing. The basic problem is that I'm unable to search for fields containing only '*'.
curl -XPUT 'http://localhost:9200/test_index/test_item/1' -d '{
"some_text" : "*"
}'
curl -XPUT 'http://localhost:9200/test_index/test_item/2' -d '{
"some_text" : "1+*"
}'
curl -XPUT 'http://localhost:9200/test_index/test_item/3' -d '{
"some_text" : "asterisk"
}'
curl -XGET 'http://localhost:9200/test_index/_search?q=some_text:*'
Results:
"hits":{"total":2,"max_score":1.0,"hits":[
"_source":{"some_text" : "1+*"},
"_source":{"some_text" : "asterisk"}
]
curl -XGET 'http://localhost:9200/test_index/_search?q=some_text:\*'
Results:
"hits":{"total":0,"max_score":null,"hits":[]}
Using python elasticsearch:
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch()
>>> es.search(index='test_index', doc_type='test_item', body={"query":{"match":{"some_text":"*"}}})
No hits
>>> es.search(index='test_index', doc_type='test_item', body={"query":{"match":{"some_text":"asterisk"}}})
One hit ('asterisk')
>>> es.search(index='test_index', doc_type='test_item', body={"query":{"match":{"some_text":"\*"}}})
No hits
Using pyelasticsearch:
>>> es.search('some_text:*', index='test_index')
2 hits, '1+*' and 'asterisk'
>>> es.search('some_text:\*', index='test_index')
No hits
How can I get the first item to show up in a search? Despite the inconsistencies between the various search methods, all of them seem to agree that I'm not allowed to get '*' back, but why? Also, escaping * seems to make the problem worse, which is kind of unusual. (I assume there is some autoescaping in the libraries perhaps, but that doesn't really explain the direct ES query).
Edit: I should mention that it is definitely indexed.
>>> es.get('test_index', 'test_item', 1)
{'_index': 'test_index', '_version': 1, '_id': '1', 'found': True, '_type': 'test_item', '_source': {'some_text': '*'}}
It may be that it's stored but not indexed as a searchable term, though, which is a distinct thing in Elasticsearch as far as I know?
Edit2:
ElasticSearch docs that talk about escaping some

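I ended up solving this by changing the analyzer to a whitespace analyzer. The default standard analyzer discards punctuation, so a value of '*' produces no indexed tokens and nothing can match it. (It was a Lucene issue, not an Elasticsearch one, which was why it was tough to find!)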
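One way to see why the whitespace analyzer helps is to compare what each kind of analysis would index for the three test documents. The sketch below is only a rough stand-in: standard_like_tokens approximates the standard analyzer with a regex (an assumption, not Lucene's actual tokenizer), and both function names are made up for illustration.

```python
import re


def standard_like_tokens(text):
    # Rough approximation (assumption): keep runs of word characters,
    # lowercase them, and drop all punctuation such as '*' and '+'.
    return [token.lower() for token in re.findall(r"\w+", text)]


def whitespace_tokens(text):
    # A whitespace analyzer only splits on whitespace, so punctuation survives.
    return text.split()


# The three documents indexed in the question.
docs = {1: "*", 2: "1+*", 3: "asterisk"}

for doc_id, text in docs.items():
    print(doc_id, standard_like_tokens(text), whitespace_tokens(text))
```

Under this approximation document 1 ends up with no tokens at all, which would be consistent with some_text:* returning only documents 2 and 3, while whitespace splitting keeps '*' as a searchable term.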


cURL command works on linux but not windows

Good Day All,
I've got a cURL command here
curl -X POST https://*.com/graph/api/v1/graphql -H 'content-type:application/json' -H 'Authorization:bearer *token*' -d '{"query": "{ assets(query: { searchTerm: \"hello-world-sys-app-v1\" ,type: \"app\"}) { groupId, assetId, version, type } }"}'
This runs as expected on linux and returns like this
{"data":{"assets":[{"groupId":"*","assetId":"hello-world-sys-app-v1","version":"1.0.15-D160-B167-PERF","type":"app"},{"groupId":"*","assetId":"hello-world-sys-app-v1","version":"1.0.15-D157-B115-DEV1","type":"app"}]}}
However when running on windows I get an error like this
The request's Content-Type is not supported. Expected:
application/jsoncurl: (6) Could not resolve host: application
curl: (6) Could not resolve host: bearer
curl: (6) Could not resolve host: dc411cee-c007-4955-ab45-87522b2713ad'
curl: (3) [globbing] nested brace in column 17
I've tried a few solutions I've read on here, without luck:
https://superuser.com/questions/1291352/curl-command-runs-in-linux-but-not-windows-2008
I've tried a few different combinations of single quotes and double quotes but just keep getting errors. I've tried using a data file too, but I get similar errors. I have a feeling it is a simple issue, but I can't seem to figure out the syntax.
Any help or ideas would be greatly appreciated!
Thank you
I suffered the same pain a few days ago. On Windows the -d argument must be enclosed in double quotes, so you have to escape the double quotes inside it. Something like this (I didn't test it):
-d "{\"query\": \"{ assets(query: { searchTerm: \"hello-world-sys-app-v1\" ,type: \"app\"}) { groupId, assetId, version, type } }\"}"
On Windows you also have Invoke-RestMethod in PowerShell, which seems (and only seems) to be more reliable. More info here: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-restmethod?view=powershell-7.1
If you are using something like Git Bash on Windows you may be hitting the problem with forward slash translations.
See https://github.com/bmatzelle/gow/issues/196
I've got it working now on Windows, although I still don't know exactly why it works. I first replaced all the single quotes with double quotes and was still getting errors, but after adding three backslashes before each inner quote I'm able to get a proper response.
curl -X POST "https://*.com/graph/api/v1/graphql" -H "content-type:application/json" -H "Authorization:bearer *" -d "{\"query\": \"{ assets(query: { searchTerm: \\\"hello-world-sys-app-v1\\\" ,type: \\\"app\\\"}) { groupId, assetId, version, type } }\"}"
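A plausible explanation for the three backslashes, sketched in Python: the inner quotes must first be escaped for JSON (\"), and then both the backslash and the quote must be escaped again for the Windows command line (\\ followed by \"). subprocess.list2cmdline is a standard-library helper (not part of the documented public API) that applies the Microsoft C runtime quoting rules; that a given curl build on Windows parses its arguments exactly this way is an assumption here.

```python
import json
import subprocess

# GraphQL query taken from the question; the inner double quotes are literal.
graphql = (
    '{ assets(query: { searchTerm: "hello-world-sys-app-v1" ,type: "app"}) '
    '{ groupId, assetId, version, type } }'
)

# Level 1: JSON encoding turns each inner " into \"
body = json.dumps({"query": graphql})

# Level 2: Windows command-line quoting turns " into \" and \" into \\\"
windows_arg = subprocess.list2cmdline([body])

print(body)
print(windows_arg)
```

Generating the argument this way, or writing the JSON to a file, avoids counting backslashes by hand.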

How to use quotes in environment variables expansion

In How can I use environment variables in body of a curl PUT request?, I was given the great advice to always use " when expanding environment variables.
Say I want to do the following query:
curl -XPUT http://"${HOST}"/create/"${USER}" -d'{"user":"'"${USER}"'"}'
I enclosed ${USER} between " to ensure that spaces in the user name are possible. I did the same for ${HOST}, although that was strictly not required, since hostnames cannot contain spaces as far as I know.
I am wondering if the following request is equal to the previous request:
curl -XPUT "http://${HOST}/create/${USER}" -d'{"user":"'"${USER}"'"}'
Are they equal? Which one is preferred/most standard?
Yes, they are equal.
I'd prefer
curl -XPUT "http://${HOST}/create/${USER}" -d"{\"user\":\"${USER}\"}"
for these reasons:
it is shorter, as @Ryan said in a comment
the second literal is more readable as one chunk than as a concatenation of two styles of quotes
some editors (Vim, for example) will highlight it in a more readable way
As you've seen, dealing with quoting conventions in Bash when you have arbitrary data is difficult. However, there's a third way of quoting in cases like this that can make life a lot easier: "here documents".
Using <<TOKEN in a shell command indicates that the lines after the command will be read as the standard input to the command, terminated with TOKEN. Within the here document the usual quoting characters lose their special meaning and are interpreted literally, but variable substitution still happens normally.
To demonstrate, start a netcat "server" to display requests in one terminal with
nc -kl localhost 8888
Now, in another terminal, run this shell script:
name="Alice's Restaurant"
password="quote is ' and doublequote is \\\"."
curl -XPUT http://localhost:8888/create/user --data-binary @- <<EOF
{
"name": "$name",
"password": "$password",
"created": "$(date --iso-8601)"
}
EOF
When a --data argument begins with @, curl reads the data from the filename given immediately after the @; using - as the filename reads from stdin.
Note that here I use --data-binary to make the server's output easier to understand; in production use you'd want to use --data-urlencode or, if the server accepts data in another format, ensure you're setting the Content-type header to that format rather than leaving it at the default application/x-www-form-urlencoded.
When you run the above, you'll see the following in your netcat terminal:
PUT /create/user HTTP/1.1
Host: localhost:8888
User-Agent: curl/7.52.1
Accept: */*
Content-Length: 112
Content-Type: application/x-www-form-urlencoded
{
"name": "Alice's Restaurant",
"password": "quote is ' and doublequote is \".",
"created": "2018-02-20"
}
As you can see, normal quoting characters are not treated specially, you need not do any special quoting on individual shell variables that get expanded within the here document, and you can even use $() to run shell commands whose output will be substituted within the document.
(By the way, I specified the double quote within the password variable as \\\", setting it to \" in the variable after shell interpolation of a double-quoted string, because that's necessary to produce valid JSON. Oh, you can never escape the quoting issues.)
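If hand-escaping the JSON becomes unwieldy, one alternative is to let a JSON encoder produce the body and pipe it into curl's stdin. A minimal sketch in Python, reusing the two values from the example; the script name in the usage note is hypothetical.

```python
import json

# Raw values, written without any JSON-level escaping.
name = "Alice's Restaurant"
password = "quote is ' and doublequote is \"."

# json.dumps adds the backslash before the embedded double quote itself.
payload = json.dumps({"name": name, "password": password})
print(payload)
```

Saved as make_payload.py, this could be used as: python3 make_payload.py | curl -XPUT http://localhost:8888/create/user --data-binary @-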

youtube api v3 search through bash and curl

I'm having a problem with the YouTube API. I am trying to make a bash application that will make watching YouTube videos easy on the command line in Linux. I'm trying to fetch some video search results through cURL, but it returns an error: curl: (16) HTTP/2 stream 1 was not closed cleanly: error_code = 1
The cURL command that I use is:
curl "https://ww.googleapis.com/youtube/v3/search" -d part="snippet" -d q="kde" -d key="~~~~~~~~~~~~~~~~"
And of course I add my YouTube data API key where the ~~~~~~~~ are.
What am I doing wrong?
How can I make it work and return the search attributes?
I can see two things that are incorrect in your request:
First, you mistyped "www" as "ww", so the host name is wrong.
Second, curl's -d option sends the data as a POST body, not as GET query parameters, at least not by default. You have two options:
Add the -G switch to curl, which makes it append the -d data to the URL as query parameters:
curl -G https://www.googleapis.com/youtube/v3/search -d part="snippet" -d q="kde" -d key="xxxx"
Rework your url to a typical GET request:
curl "https://www.googleapis.com/youtube/v3/search?part=snippet&q=kde&key=XX"
As a tip, using bash to interpret the resulting json might not be the best way to go. You might want to look into using python, javascript, etc. to run your query and interpret the resulting json.
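Following that tip, a minimal Python sketch of how the query string from the second option can be assembled; build_search_url is a hypothetical helper and the key is a placeholder. Actually sending the request and parsing the JSON response is left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key):
    # urlencode percent-encodes the values, so search terms with spaces
    # or other special characters are safe to pass in.
    params = {"part": "snippet", "q": query, "key": api_key}
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("kde", "XX"))
```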

How do I check for duplicate data on ElasticSearch?

When storing some documents, I want to store only the ones that do not exist yet and ignore the rest. Should this be done at the application level, for example by checking whether the document's id already exists?
Here is what is stated in documentation:
Operation Type
The index operation also accepts an op_type that can be used to force a create operation, allowing for “put-if-absent” behavior. When create is used, the index operation will fail if a document by that id already exists in the index.
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'
Another option to specify create is to use the following uri:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'
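The same put-if-absent call can be sketched in Python with only the standard library. The helper names are hypothetical, the URL layout is copied from the curl example above, and treating HTTP 409 as "document already exists" assumes the server reports the failed create as a conflict; the Content-Type header is included because newer Elasticsearch versions expect it.

```python
import json
import urllib.error
import urllib.request


def build_create_request(base_url, index, doc_type, doc_id, document):
    # Same URL shape as the curl example: PUT .../index/type/id?op_type=create
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?op_type=create"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_if_absent(request):
    """Return True if the document was created, False if the id was taken."""
    try:
        with urllib.request.urlopen(request):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:  # assumed: conflict means the id already exists
            return False
        raise


# Example (needs a running server):
# req = build_create_request("http://localhost:9200", "twitter", "tweet", 1,
#                            {"user": "kimchy"})
# created = create_if_absent(req)
```

This leaves the duplicate check to the server instead of doing a separate existence lookup in the application.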

How do I POST LF with curl command line tool?

I'm trying to POST to the HTTP gateway of an SMS provider (Sybase 365) using CURL from a Linux shell script.
I need to pass the following data (note the [ ] and LF characters)
[MSISDN]
List=+12345678
[MESSAGE]
Text=Hello
[END]
If I submit a file using the -F parameter, CURL removes the LF e.g.
curl -F @myfile "http://www.sybase.com/..."
results in this at the server (which is rejected)
[MSISDN]List=+12345678[MESSAGE]Text=Hello[END]
Is there anything I can do to avoid this or do I need an alternative tool?
I'm using a file containing my data for testing but I'd like to avoid that in practice and POST directly from the script.
Try using --data-binary instead of -d(ata-ascii).
From the manual:
--data-binary (HTTP) This posts data in a similar manner as --data-ascii does, although when using this option the entire context of the posted data is kept as-is.
If you want to post a binary file without the strip-newlines feature of the --data-ascii option, this is for you. If this option is used several times, the ones following the first will append data.
ETA: oops, I should read the question more closely. You're using -F, not -d. But --data-binary may still be worth a shot.
Probably a silly thought, but I don't suppose it actually requires CRLF instead of just LF?
Alternatively, have you tried using the --data-binary option instead of -F?
I've got this working using -d
request=$(printf '[MSISDN]\nList=%s\n[MESSAGE]\nText=%s\n[END]\n' "$number" "$message")
response=$(curl -s -u "$username:$password" -d "$request" http://www.sybase.com/...)
Curiously, if I use -d @myfile (where myfile contains LF-separated text), it doesn't work.
I also tried --data-binary without success.
curl "url" --data-binary @myfile
posts new lines in the data [tested on curl 7.12.1]
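For posting directly from a script without a data file, here is a minimal Python sketch that builds the same LF-separated body in memory. The function names and the example URL in the comment are hypothetical; the real gateway address is not given in the question.

```python
import urllib.request


def build_sms_body(number, message):
    # One field per line, joined with bare LF characters.
    lines = ["[MSISDN]", f"List={number}", "[MESSAGE]", f"Text={message}", "[END]"]
    return "\n".join(lines)


def build_request(url, body):
    # The bytes are sent exactly as given, so the line feeds are preserved.
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


body = build_sms_body("+12345678", "Hello")
print(repr(body))

# Example (placeholder URL, not the real gateway):
# urllib.request.urlopen(build_request("http://gateway.example/send", body))
```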
