I have a json file which looks as follows:
{
"pk": 1,
"model": "model.Model",
"fields": {
"data1": "example",
"data2": "example"
}
},
{
"pk": 1,
"model": "model.Model",
"fields": {
"data1": "example",
"data2": "example"
}
},
{
"pk": 1,
"model": "model.Model",
"fields": {
"data1": "example",
"data2": "example"
}
},
etc....
I would like to search for each "pk": 1 and replace it with an incrementing value, so that my file would look like:
{
"pk": 1,
"model": "model.Model",
"fields": {
"data1": "example",
"data2": "example"
}
},
{
"pk": 2,
"model": "model.Model",
"fields": {
"data1": "example",
"data2": "example"
}
},
{
"pk": 3,
"model": "model.Model",
"fields": {
"data1": "example",
"data2": "example"
}
},
So far I have tried:
:let i=1 | g/"pk": 1/s//="pk": .i./ | let i=i+1
to search for the "pk": 1 pattern and replace it using a counter, but I have a syntax error somewhere and am hitting a brick wall.
Any help / suggestions would be very much appreciated thanks.
You could try:
:let i=1 | g/"pk": \zs\d\+/ s//\=i/ | let i+=1
\zs marks where the match starts, so only the digits after "pk": are replaced by the counter.
To fix your attempt do:
:let i=1 | g/"pk": 1/s//\='"pk": ' .i/ | let i=i+1
As you can see, the first mistake was a missing backslash in the substitution part: it must be \=. The second was the trailing concatenation operator (.) at the end, which would make Vim try to concatenate the variable with nothing. The third was forgetting to put "pk": into 'single quotes'; otherwise you would have ended up with pk unquoted. Remember that here we are concatenating a string with a number.
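If you would rather do the renumbering outside Vim, the same idea (find each "pk" value and substitute a running counter) can be sketched in Python. This is only a minimal sketch and not part of the answer above: the name renumber_pk is made up for illustration, and it assumes every "pk": <digits> occurrence should be renumbered in the order it appears.

```python
import itertools
import re


def renumber_pk(text, start=1):
    """Replace every "pk": <number> with an increasing counter (hypothetical helper)."""
    counter = itertools.count(start)
    # Group 1 keeps the '"pk": ' prefix; only the digits after it are replaced.
    return re.sub(
        r'("pk":\s*)\d+',
        lambda match: match.group(1) + str(next(counter)),
        text,
    )


sample = '{"pk": 1}, {"pk": 1}, {"pk": 1}'
print(renumber_pk(sample))
```

The replacement function plays the role of \=i in the Vim command, and itertools.count plays the role of let i+=1.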
Currently I have a many-to-many relationship between two entities: a Transaction contains many Persons and a Person contains many Transactions. I'm writing a query that filters transactions so that only those containing a person with a given name are returned. The problem I'm running into is that when I filter by name, the other Persons are removed from the matching transaction.
Example:
{
"id": 1,
"txnDate": "2021-03-09T12:40:26.000Z",
"persons": [
{
"id": 1,
"name": "Bill"
},
{
"id": 2,
"name": "Jen"
}
]
},
{
"id": 2,
"txnDate": "2021-03-09T12:40:26.000Z",
"persons": [
{
"id": 2,
"name": "Jen"
},
{
"id": 3,
"name": "Bob"
}
]
},
My current query is:
query.leftJoinAndSelect('transaction.persons', 'persons')
.andWhere('persons.name = :name', { name });
where name is a string. If I set name = "Bill", it results in:
{
"id": 1,
"txnDate": "2021-03-09T12:40:26.000Z",
"persons": [
{
"id": 1,
"name": "Bill"
}
]
},
However, I want it so that when I query by name it would result in:
{
"id": 1,
"txnDate": "2021-03-09T12:40:26.000Z",
"persons": [
{
"id": 1,
"name": "Bill"
},
{
"id": 2,
"name": "Jen"
}
]
},
I'm really stuck and would love any ideas on how to solve this.
Current stack: Node.js, TypeORM, TypeScript
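To make the difference between the two behaviours concrete, here is a small plain-Python sketch. It does not use TypeORM and is not a fix for the query; it only models the two filtering semantics on in-memory data. The function names are made up for illustration, and the txnDate field is omitted for brevity.

```python
transactions = [
    {"id": 1, "persons": [{"id": 1, "name": "Bill"}, {"id": 2, "name": "Jen"}]},
    {"id": 2, "persons": [{"id": 2, "name": "Jen"}, {"id": 3, "name": "Bob"}]},
]


def filter_joined_rows(txns, name):
    """Models the current result: non-matching persons are dropped from each transaction."""
    result = []
    for txn in txns:
        persons = [p for p in txn["persons"] if p["name"] == name]
        if persons:
            result.append({**txn, "persons": persons})
    return result


def filter_transactions(txns, name):
    """Models the desired result: keep the whole transaction if any person matches."""
    return [txn for txn in txns if any(p["name"] == name for p in txn["persons"])]


print(filter_joined_rows(transactions, "Bill"))
print(filter_transactions(transactions, "Bill"))
```

The first function filters the joined person rows themselves; the second uses the name only to decide which transactions to keep and leaves their persons untouched.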
{
"_id": "5e28b029a0c8263a8a56980a",
"name": "Recruiter",
"data": [
{
"_id": "5e28b0980f89ba3c0782828f",
"targetLink": "https://www.linkedin.com/in/dan-kelsall-7aa0926b/",
"name": "Dan Kelsall",
"headline": "Content Marketing & Copywriting",
"actions": [
{
"result": 1,
"name": "VISIT"
},
{
"result": 1,
"name": "FOLLOW"
}
]
},
{
"_id": "5e28b0980f89ba3c078283426f",
"targetLink": "https://www.linkedin.com/in/56wergwer/",
"name": "56wergwer",
"headline": "asdgawehethre",
"actions": [
{
"result": 1,
"name": "VISIT"
}
]
}
]
}
Here is one of my MongoDB documents. I'd like to update data -> actions -> result.
This is what I've tried:
Campaign.updateOne({
'data.targetLink': "https://www.linkedin.com/in/dan-kelsall-7aa0926b/",
'data.actions.name': "Follow"
}, {$set: {'data.$.actions.result': 0}})
But it doesn't seem to update anything, and it can't even find the document by 'data.actions.name'.
You need the filtered positional operator ($[<identifier>]) together with arrayFilters, since the regular positional operator ($) can only be used for one level of nested arrays. Also note that the stored action name is "FOLLOW" and string matching is case-sensitive, so the filter has to use the same casing:
Campaign.updateOne(
{ "_id": "5e28b029a0c8263a8a56980a", "data.targetLink": "https://www.linkedin.com/in/dan-kelsall-7aa0926b/" },
{ $set: { "data.$.actions.$[action].result": 0 } },
{ arrayFilters: [ { "action.name": "FOLLOW" } ] }
)
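As a way to reason about what this update is meant to do, here is a plain-Python model of it on an in-memory copy of the document. It does not talk to MongoDB; set_action_result is a hypothetical helper, and the document is trimmed to the fields that matter. Note that the stored action name is "FOLLOW" and the comparison is exact, so the casing has to match.

```python
import copy


def set_action_result(doc, target_link, action_name, result):
    """Model of the update: first matching `data` entry, every matching action in it."""
    updated = copy.deepcopy(doc)
    for entry in updated["data"]:
        if entry["targetLink"] == target_link:
            for action in entry["actions"]:
                # Exact, case-sensitive comparison, like the arrayFilters condition.
                if action["name"] == action_name:
                    action["result"] = result
            break  # `$` refers only to the first matching element of `data`
    return updated


campaign = {
    "name": "Recruiter",
    "data": [
        {
            "targetLink": "https://www.linkedin.com/in/dan-kelsall-7aa0926b/",
            "actions": [{"result": 1, "name": "VISIT"}, {"result": 1, "name": "FOLLOW"}],
        },
        {
            "targetLink": "https://www.linkedin.com/in/56wergwer/",
            "actions": [{"result": 1, "name": "VISIT"}],
        },
    ],
}

updated = set_action_result(
    campaign, "https://www.linkedin.com/in/dan-kelsall-7aa0926b/", "FOLLOW", 0
)
print(updated["data"][0]["actions"])
```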
My document in DocumentDb looks like this:
{
"id": 123,
"timers":
{
"projectTimer":
{
"id": 234,
"name": "My Project",
"startTime": "10:35 AM"
},
"taskTimer":
{
"id": 789,
"name": "My Task",
"startTime": "10:45 AM"
}
}
}
The key points here are:
"timers" is an object -- NOT an array
The sub-objects are also set i.e. "projectTimer" and "taskTimer"
If I set my SELECT statement to the following, it works by giving me both projectTimer and taskTimer sub-objects
SELECT c.timers
FROM Collection c
WHERE c.id = 123
But the following returns nothing. I don't understand why because it seems like a really simple JOIN:
SELECT t.projectTimer
FROM Collection c
JOIN t IN c.timers
WHERE c.id = 123
Any idea where I'm making a mistake?
The issue is that you're trying to do a JOIN on something that's not an array.
If, instead, you reworked your document slightly:
{
"id": "123",
"timers": [
{
"projectTimer": {
"id": 234,
"name": "My Project",
"startTime": "10:35 AM"
}
},
{
"taskTimer": {
"id": 789,
"name": "My Task",
"startTime": "10:45 AM"
}
}
]
}
You'd then be able to do a JOIN like:
select value t
from collection c
join t in c.timers
where c.id = "123"
Which would return each of the timers in the array:
[
{
"projectTimer": {
"id": 234,
"name": "My Project",
"startTime": "10:35 AM"
}
},
{
"taskTimer": {
"id": 789,
"name": "My Task",
"startTime": "10:45 AM"
}
}
]
Note the use of VALUE in the query, to strip away the containing t variable.
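The difference between the two document shapes can be modelled in a few lines of Python. This is only a sketch of the behaviour described above (a JOIN iterates over array elements, so a non-array yields no rows); join_in is a made-up helper, not a DocumentDB API.

```python
def join_in(doc, field):
    """Model of `JOIN t IN c.<field>`: one row per array element, nothing for non-arrays."""
    value = doc.get(field)
    return list(value) if isinstance(value, list) else []


object_shaped = {
    "id": 123,
    "timers": {
        "projectTimer": {"id": 234, "name": "My Project", "startTime": "10:35 AM"},
        "taskTimer": {"id": 789, "name": "My Task", "startTime": "10:45 AM"},
    },
}

array_shaped = {
    "id": "123",
    "timers": [
        {"projectTimer": {"id": 234, "name": "My Project", "startTime": "10:35 AM"}},
        {"taskTimer": {"id": 789, "name": "My Task", "startTime": "10:45 AM"}},
    ],
}

print(join_in(object_shaped, "timers"))  # timers is an object, so there is nothing to iterate
print(join_in(array_shaped, "timers"))   # one row per timer
```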
Background: I've implemented a partial search on a name field by indexing the tokenized name (name field) as well as a trigram analyzed name (ngram field).
I've boosted the name field to have exact token matches bubble up to the top of the results.
Problem: I am trying to implement a query that limits the nGram matches to ones that only match some threshold (say 80%) of the query string. I understand that minimum_should_match seems to be what I am looking for, but my problem is forming the query to actually produce those results.
My exact token matches are boosted to the top but I still get every document that has a single matching trigram in the ngram field.
GIST: Index settings and mapping
Index Settings
{
"my_index": {
"settings": {
"index": {
"number_of_shards": "5",
"max_result_window": "30000",
"creation_date": "1475853851937",
"analysis": {
"filter": {
"ngram_filter": {
"type": "ngram",
"min_gram": "3",
"max_gram": "3"
}
},
"analyzer": {
"ngram_analyzer": {
"filter": [
"lowercase",
"ngram_filter"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "AuCjcP5sSb-m59bYrprFcw",
"version": {
"created": "2030599"
}
}
}
}
}
Index Mappings
{
"my_index": {
"mappings": {
"my_type": {
"properties": {
"acw": {
"type": "integer"
},
"pcg": {
"type": "integer"
},
"date": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"dob": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"id": {
"type": "string"
},
"name": {
"type": "string",
"boost": 10
},
"ngram": {
"type": "string",
"analyzer": "ngram_analyzer"
},
"bdk": {
"type": "integer"
},
"mmw": {
"type": "integer"
},
"mpi": {
"type": "integer"
},
"sex": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Solution Attempts
GIST: Query Attempts (https://gist.github.com/jordancardwell/2e690013666e7e1da6ef1acee314b4e6)
I tried a multi-match query, which gives me correct search results, but I haven't had luck omitting results for names that only match a single trigram (say "odo" trigram inside "theodophilus")
//this matches 'frodo' and sends results to the top, since `name` field is boosted
// but also matches 'theodore' and 'rodolpho'
{
"size":100,
"from":0,
"query":{
"multi_match":{
"query":"frodo",
"fields":[
"name",
"ngram"
],
"type":"best_fields"
}
}
}
// I then tried adding the `minimum_should_match` option,
// hoping it would filter out long strings that share only one trigram with the query
{
"size":100,
"from":0,
"query":{
"multi_match":{
"query":"frodo",
"fields":[
"name",
"ngram"
],
"type":"best_fields",
"minimum_should_match": "90%"
}
}
}
I've tried playing around in Sense to manually produce the match queries that this generates, so that I can apply minimum_should_match to the ngram field only, but I can't seem to get the syntax right.
// I then tried to construct a custom query to return only the `minimum_should_match`ed results on the ngram field
// I started with a query produced by using bodybuilder to `and` and `or` my other search criteria together
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
//each separate field's criteria `must`/`and`ed together
{
"query": {
"bool": {
"filter": {
"bool": {
"should": [
//each criterion for a specific field `should`/`or`ed together
{
//my attempt at getting `ngram` field results..
// should theoretically only return when field
// contains nothing but matching ngrams
// (i.e. exact matches and other fluke matches)
"query": {
"match": {
"ngram": {
"query": "frodo",
"minimum_should_match": "100%"
}
}
}
}
//... other criteria to be `should`/`or`ed together
]
}
}
}
}
}
//... other criteria to be `must`/`and`ed together
]
}
}
}
}
}
Can anyone see what I'm doing wrong?
It seems like this should be fairly straightforward to accomplish, but I must be missing something obvious.
UPDATE
I ran a query with _explain=true (using sense UI) to try to understand my results.
I queried for a match on the ngram field for "frod" with minimum_should_match = 100%, yet I still get every record that matches at least one ngram.
(e.g. rodolpho even though it doesn't contain fro)
GIST: test query and results
Note: cross-posted from discuss.elastic.co (https://discuss.elastic.co/t/ngram-partial-match-limiting-ngram-results-in-multiple-field-query/62526)
I used your settings and mappings to create an index, and your queries seem to be working fine for me. I would suggest running an explain on one of the "unexpected" documents being returned to see why it is matched and returned with the other results.
Here is what I did:
Run the analyze api on your analyzer to see how the query will be split into tokens.
curl -XGET 'localhost:9200/my_index/_analyze' -d '
{
"analyzer" : "ngram_analyzer",
"text" : "frodo"
}'
frodo will be split into 3 tokens with your analyzer.
{
"tokens": [
{
"token": "fro",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "rod",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "odo",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
}
]
}
I indexed 3 documents for testing (using only the ngram field). Here are the docs:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 1,
"_source": {
"ngram": "theodore"
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 1,
"_source": {
"ngram": "frodo"
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "3",
"_score": 1,
"_source": {
"ngram": "rudolpho"
}
}
]
}
}
Your first query matches frodo and theodore but, contrary to what you described, not rudolpho. That makes sense, since rudolpho does not produce any trigram that also occurs in frodo:
frodo -> fro, rod, odo
rudolpho -> rud, udo, dol, olp, lph, pho
Using your second query, I get back only frodo (neither of the other two).
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.53148466,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.53148466,
"_source": {
"ngram": "frodo"
}
}
]
}
}
I then ran an explain (localhost:9200/my_index/my_type/2/_explain) on the other two docs (theodore and rudolpho), and I see this (I have clipped the response):
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"matched": false,
"explanation": {
"value": 0,
"description": "Failure to meet condition(s) of required/prohibited clause(s)",
"details": [
{
"value": 0,
"description": "no match on required clause ((ngram:fro ngram:rod ngram:odo)~2)",
"details": [
The above is expected: a minimum_should_match of 90% over the three trigrams of frodo rounds down to 2 (the ~2 in the explanation), so at least two of the three tokens must match.
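The arithmetic behind that explanation can be checked with a short Python sketch. It is a simplified model, not Elasticsearch itself: it assumes single-word input, trigram tokens as produced by the analyzer above, a percentage that is rounded down, and a minimum of one required match. All function names are made up for illustration.

```python
def trigrams(word):
    """Lowercased character trigrams of a single word, in order."""
    word = word.lower()
    return [word[i:i + 3] for i in range(len(word) - 2)]


def required_matches(num_clauses, percent):
    """Number of clauses that must match; the percentage is rounded down (assumed floor of 1)."""
    return max(1, num_clauses * percent // 100)


def matches(query, indexed, percent):
    """True if enough query trigrams also occur in the indexed value."""
    query_grams = trigrams(query)
    indexed_grams = set(trigrams(indexed))
    hits = sum(1 for gram in query_grams if gram in indexed_grams)
    return hits >= required_matches(len(query_grams), percent)


print(trigrams("frodo"))
print(required_matches(3, 90))
for name in ("frodo", "theodore", "rudolpho"):
    print(name, matches("frodo", name, 90))
```

In this model theodore shares only the trigram odo with frodo, which is one match where two are required, so it is rejected just like in the explain output.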
I have a document in the form of:
curl -XPOST localhost:9200/books/book/1 -d '{
"user_id": 1,
"pages": [ {"page_id": 1, "count": 1}, {"page_id": 2, "count": 3}]
}'
Now let's say the user reads page 1 again, so I want to increment the count. The document should become:
{
"user_id": 1,
"pages": [ {"page_id": 1, "count": 2}, {"page_id": 2, "count": 3}]
}
But how do you update a single element of a list, selected by a condition (here, its page_id)?
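Independent of how it is expressed as an Elasticsearch script, the update itself is a loop over pages with a condition on page_id. Here is a minimal plain-Python sketch of that logic; it does not talk to Elasticsearch, and increment_page_count is a hypothetical helper.

```python
def increment_page_count(doc, page_id, by=1):
    """Increment `count` on the page with the given page_id; return True if found."""
    for page in doc["pages"]:
        if page["page_id"] == page_id:
            page["count"] += by
            return True
    return False


book = {
    "user_id": 1,
    "pages": [{"page_id": 1, "count": 1}, {"page_id": 2, "count": 3}],
}

found = increment_page_count(book, page_id=1)
print(found, book)
```

The return value makes it easy to decide what to do when the page is not in the list yet, for example appending a new entry.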
An example of a simple update in Elasticsearch is as follows:
curl -XPOST localhost:9200/books/book/2 -d '{
"user_id": 1,
"pages": {
"page_1": 1,
"page_2": 2
}
}'
curl -XPOST localhost:9200/books/book/2/_update -d '
{
"script": "ctx._source.pages.page_1+=1"
}'
The document now becomes:
{
"user_id": 1,
"pages": {
"page_1": 2,
"page_2": 2
}
}
However, this simpler document format loses page_id as an explicit field: the id itself acts as the field name, and the value associated with it has no real definition. So this isn't a great solution.
Any ideas on how to update the array accordingly, or on how to structure the data, would be great.
Note: I'm using ES 1.4.4. You also need to add script.disable_dynamic: false to your elasticsearch.yml file.
Assuming I'm understanding your problem correctly, I would probably use a parent/child relationship.
To test it, I set up an index with a "user" parent and "page" child, as follows:
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"user": {
"_id": {
"path": "user_id"
},
"properties": {
"user_id": {
"type": "integer"
}
}
},
"page": {
"_parent": {
"type": "user"
},
"_id": {
"path": "page_id"
},
"properties": {
"page_id": {
"type": "integer"
},
"count": {
"type": "integer"
}
}
}
}
}
(I used the "path" parameter in the "_id"s because it makes the indexing less redundant; the ES docs say that path is deprecated in ES 1.5, but they don't say what it's being replaced with.)
Then I indexed a few docs:
POST /test_index/_bulk
{"index":{"_type":"user"}}
{"user_id":1}
{"index":{"_type":"page","_parent":1}}
{"page_id":1,"count":1}
{"index":{"_type":"page","_parent":1}}
{"page_id":2,"count":1}
Now I can use a scripted partial update to increment the "count" field of a page. Because of the parent/child relationship, I have to use the parent parameter to tell ES how to route the request.
POST /test_index/page/2/_update?parent=1
{
"script": "ctx._source.count+=1"
}
Now if I search for that document, I will see that it was updated as expected:
POST /test_index/page/_search
{
"query": {
"term": {
"page_id": {
"value": "2"
}
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "page",
"_id": "2",
"_score": 1,
"_source": {
"page_id": 2,
"count": 2
}
}
]
}
}
Here is the code all in one place:
http://sense.qbox.io/gist/9c977f15b514ec251aef8e84e9510d3de43aef8a