Slice on a DictField containing a list doesn't work - mongoengine

pymongo: 3.12.0
mongoengine: 0.23.1
I have a document:
from mongoengine import Document, StringField, DictField

class Logs(Document):
    reference_id = StringField(default=None)
    data = DictField(default=None)
In the data field, I have a list, failed_stories. It can contain hundreds of elements, and I want to paginate it, so I wrote this query:
start_idx = 0
page_size = 10
reference_id = 'asdfg345678'
Logs.objects(reference_id=reference_id).fields(slice__data__failed_stories=[start_idx, page_size])
With this, I get one document in which every field is None except the document id (_id).
The following query, by contrast, returns the document with the correct data in its fields:
Logs.objects(reference_id=reference_id).get()
Is there an issue with the way I am writing this?
Note: I would like to do this with mongoengine only, if possible.
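MongoDB's `$slice` projection takes `[skip, limit]`, so `[start_idx, page_size]` is the right shape: the second element is a count, not an end index. The pure-Python sketch below (no database involved; both helper names are hypothetical) builds that projection for a given page and emulates what the server should return for it, which may help when checking the raw result.

```python
def slice_projection(field_path, page, page_size):
    """Build a MongoDB projection returning one page of an array field.

    $slice takes [skip, limit]: the second element is a count, not an end index.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {field_path: {"$slice": [page * page_size, page_size]}}


def emulate_slice(values, skip, limit):
    """Pure-Python emulation of {"$slice": [skip, limit]} applied to a list."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # A negative skip counts back from the end and is clamped at index 0.
    start = skip if skip >= 0 else max(len(values) + skip, 0)
    return values[start:start + limit]


stories = [f"story-{i}" for i in range(25)]
proj = slice_projection("data.failed_stories", page=1, page_size=10)
skip, limit = proj["data.failed_stories"]["$slice"]
print(proj)                                    # {'data.failed_stories': {'$slice': [10, 10]}}
print(emulate_slice(stories, skip, limit)[0])  # story-10
```

If you want to rule mongoengine out, the same projection dict can be passed directly to pymongo, for example `Logs._get_collection().find_one({'reference_id': reference_id}, proj)`, and the two results compared.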


RedisJSON and Python3: JSON.get and ft('index').search(Query('#orig_ip:{192\.168\.210\.27}')) returning no results (matching entry in redis)

I am new to Redis and am attempting to ingest Zeek logging data, create an index over multiple fields, and then search those fields. For the life of me, I cannot get any values returned when searching on the #orig_ip name or when using JSON.GET to retrieve any id.* fields.
UPDATE: I figured this out after more troubleshooting and am updating here to help anyone else struggling with this problem.
Here is my WRONG code for creating the index:
# Options for index creation
index_def = IndexDefinition(
    index_type=IndexType.JSON,
    prefix=['uid:'],
    score=0.5,
    score_field='doc_score'
)
# Schema definition
schema = (
    TagField('$.orig_l2_addr', as_name='orig_mac'),
    TagField('$.id.orig_h', as_name='orig_ip'),  # Wrong field path
    TagField('$.id.resp_h', as_name='resp_ip'),  # Wrong field path
    NumericField('$.orig_bytes', as_name='orig_bytes'),
    NumericField('$.resp_bytes', as_name='resp_bytes'),
    NumericField('$.ts', as_name='timestamp')
)
r.ft('py_conn_idx').create_index(schema, definition=index_def)
Here is the result I kept getting with the above WRONG schema (no results)
search_result4 = r.ft('py_conn_idx').search(Query('#orig_ip:{192\.168\.210\.27}'))
Results for "#orig_ip:{192\.168\.210\.27}":
0
UPDATE: Working schema definition:
It turns out that even though Zeek only uses the . inside field names (rather than using it to create nested objects), the . in the field names was the culprit behind my query failures. I needed to reference the fields for the index as follows:
# Schema definition
schema = (
    TagField('$.orig_l2_addr', as_name='orig_mac'),
    TagField('$.["id.orig_h"]', as_name='orig_ip'),  # Fixed field reference
    TagField('$.["id.resp_h"]', as_name='resp_ip'),  # Fixed field reference
    NumericField('$.orig_bytes', as_name='orig_bytes'),
    NumericField('$.resp_bytes', as_name='resp_bytes'),
    NumericField('$.ts', as_name='timestamp')
)
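Why the dotted path matched nothing can be shown without Redis: `$.id.orig_h` walks nested objects (`doc["id"]["orig_h"]`), while a Zeek record has a single flat key named `"id.orig_h"`. The sketch below is a plain-Python illustration of those two lookup styles, not RedisJSON's actual path engine; the helper names are hypothetical and the `orig_bytes` value is illustrative.

```python
def get_nested(doc, dotted_path):
    """Treat 'id.orig_h' as nested objects, i.e. doc['id']['orig_h']."""
    node = doc
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None  # the path does not exist in this document
        node = node[part]
    return node


def get_literal(doc, key):
    """Look up one key that happens to contain dots, like ["id.orig_h"]."""
    return doc.get(key)


# A flat, Zeek-style conn record: the dot is part of the key name.
record = {"id.orig_h": "192.168.210.27", "orig_bytes": 100}

print(get_nested(record, "id.orig_h"))   # None: there is no "id" object
print(get_literal(record, "id.orig_h"))  # 192.168.210.27
```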
After recreating the index with this schema, I get results with my query:
Results for "#orig_ip:{192\.168\.210\.27}":
Document {'id': 'uid:CPvYfTI4Zb1Afp2l5',....
Thanks to this Stack Overflow question for finally leading me to the cause of my troubles: How to get objects value if its name contains dots?
Putting this answer here so this question gets marked as having one. See the updated question/code above!

How to use Django iterator with value list?

I have a Profile table with a huge number of rows. I am trying to filter profiles by super_category and account_id (both are fields on the Profile model).
Assume I have the lists bulk_account_ids and super_categories:
list_of_ids = Profile.objects.filter(account_id__in=bulk_account_ids, super_category__in=super_categories).values_list('id', flat=True)
list_of_ids = list(list_of_ids)
SomeTask.delay(ids=list_of_ids)
This query times out when it is evaluated on the second line.
Can I use .iterator() at the end of the query to optimize this, i.e. list(list_of_ids.iterator())? If not, what else can I do?
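One point worth separating out: `.iterator()` avoids caching the results on the QuerySet and fetches rows in chunks where the backend supports it, but wrapping it in `list(...)` still builds every id in memory at once. A hedged alternative, not taken from this thread, is to dispatch the ids in fixed-size batches. The sketch shows only the batching logic in plain Python; `dispatch` is a hypothetical stand-in for `SomeTask.delay`, a `range` stands in for the streamed ids, and the batch size of 1000 is an arbitrary assumption.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from any iterable (e.g. qs.iterator())."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


sent = []


def dispatch(ids):
    # Hypothetical stand-in for SomeTask.delay(ids=ids).
    sent.append(ids)


# In Django the source would be something like
#   list_of_ids.iterator()
# here a range stands in for the streamed ids.
for chunk in batched(range(2500), 1000):
    dispatch(chunk)

print([len(c) for c in sent])  # [1000, 1000, 500]
```

Django's `QuerySet.iterator()` also accepts a `chunk_size` argument (Django 2.0+), which controls how many rows are fetched from the database at a time.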

How to convert a Hit into a Document with elasticsearch-dsl?

Consider the following mapping for a document in ES.
from elasticsearch_dsl import Document, InnerDoc, Keyword, Object

class IdInfo(InnerDoc):
    id = Keyword()
    type = Keyword()

class MyDoc(Document):
    id_info = Object(IdInfo)
Using elasticsearch-dsl, there are 2 ways of retrieving a document (that I am interested in):
Using MyDoc.search().query().execute(), that yields Hit objects
Using MyDoc.get(), that yields a MyDoc object
Here is the issue I am experiencing:
When I retrieve the same document from ES, and that document is missing, for example, the type field, I get different behaviours:
When using search(): doc being a Hit object, accessing doc.type raises a KeyError
When using get(): doc being a MyDoc object, accessing doc.type simply returns None
To work around this discrepancy, I would like to convert a Hit instance to a MyDoc instance, so that I can always use the doc.type syntax without any errors being raised.
How can I do that?
Alternatively, is there a way that I could access Hit instances with the same behaviour as MyDoc instances?
dict_hit = hit.to_dict()
doc = YourDocument(**dict_hit)
doc.property1 # you can access the property here
I know it is a bit awkward and annoying; it used to work in versions below 6.
I found a workaround: if you take the raw dictionary from the Elasticsearch response, you can then ask the document class to interpret it, like this:
query = MyDoc.search()
response = query.execute()
my_doc = MyDoc.from_es(response.hits.hits[0])
We were facing this situation. In our case, it was due to the index name in the Index subclass used to configure Document indices. Our model looked more or less like this:
class MyDoc(Document):
my_field = Keyword()
class Index:
name = "my-doc-v1-*"
This way, when querying for documents in indices that match that name (for example "my-doc-v1-2022-07"), hits are automatically instantiated as MyDoc objects.
Then we started generating 'v2' indices, named like "my-doc-v2--000001", and hits from those indices were no longer populated as MyDoc objects.
For that to happen, we had to change Index.name to my-doc-*. That way, documents from both 'v1' and 'v2' indices are always populated automatically by the library, since they match the Index.name expression.
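The behaviour above is what you would expect if the library compares each hit's `_index` with `Index.name` as a shell-style glob. I believe elasticsearch-dsl's `Document` does this with Python's `fnmatch`, but treat that as an assumption and check your version. The stdlib-only sketch below (the helper name is hypothetical) shows why the 'v2' index escaped the old pattern and matches the new one.

```python
from fnmatch import fnmatch


def matches_index(hit_index, index_pattern):
    """Glob-style check of a hit's _index against an Index.name pattern."""
    return fnmatch(hit_index, index_pattern)


for pattern in ("my-doc-v1-*", "my-doc-*"):
    for index in ("my-doc-v1-2022-07", "my-doc-v2--000001"):
        print(f"{pattern:12} {index:20} {matches_index(index, pattern)}")
```

With `my-doc-v1-*` only the 'v1' index matches; with `my-doc-*` both do.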

How to get many-to-many values and store them in an array or list in Python + Django

I have this class in my model.
I want to get the agencys values, a many-to-many field on this class, and store them in a list or array. The relation is stored in a separate table that pairs agency_id with the id of my class.
Agency has its own table as well.
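from django.db import models

class GPSpecial(BaseModel):
    # on_delete is required from Django 2.0 onwards; CASCADE is an assumption.
    hotel = models.ForeignKey('Hotel', on_delete=models.CASCADE)
    rooms = models.ManyToManyField('Room')
    agencys = models.ManyToManyField('Agency')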
You can make it a bit more compact by using the flat=True parameter:
agencys_spe = list(GPSpecial.objects.values_list('agencys', flat=True))
The list(..) part is not necessary: without it, you have a QuerySet that contains the ids, and the query is postponed. By using list(..) we force the data into a list (and the query is executed).
It is possible that multiple GPSpecial objects share a common Agency; in that case it will be repeated. We can use the .distinct() function to prevent that:
agencys_spe = list(GPSpecial.objects.values_list('agencys', flat=True).distinct())
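To make the shapes concrete, here is a plain-Python sketch, with no database and illustrative ids, of what `values_list('agencys')` yields (1-tuples, one per GPSpecial-Agency link), what `flat=True` or the list comprehension `[i[0] for i in ...]` turns that into, and what removing duplicates does. `dict.fromkeys` is used only as an in-memory stand-in for `.distinct()`; note that SQL `DISTINCT` itself makes no ordering promise.

```python
# Illustrative rows as values_list('agencys') might return them:
# agency 32 is linked to two different GPSpecial objects.
rows = [(32,), (7,), (32,)]

# Equivalent to values_list('agencys', flat=True), or [i[0] for i in rows].
flat_ids = [row[0] for row in rows]

# In-memory de-duplication that keeps first-seen order.
unique_ids = list(dict.fromkeys(flat_ids))

print(flat_ids)    # [32, 7, 32]
print(unique_ids)  # [32, 7]
```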
If, however, you are interested in the Agency objects themselves, for example those of GPSpecials that satisfy a certain predicate, you are better off querying the Agency objects directly, for example:
agencies = Agency.objects.filter(gpspecial__is_active=True).distinct()
will produce all Agency objects for which a GPSpecial object exists where is_active is set to True.
I think I found the answer to my question:
agencys_sp = GPSpecial.objects.filter(agencys=32,is_active=True).values_list('agencys')
agencys_spe = [i[0] for i in agencys_sp]

Filtering Haystack (SOLR) results by django_id

With Django/Haystack/SOLR, I'd like to be able to restrict the result of a search to those records within a particular range of django_ids. Getting these IDs is not a problem, but trying to filter by them produces some unexpected effects. The code looks like this (extraneous code trimmed for clarity):
from haystack.forms import FacetedSearchForm
from haystack.query import SearchQuerySet
from haystack.views import SearchView, search_view_factory

def view_results(request, arg):
    # django_ids list is first calculated using arg...
    sqs = SearchQuerySet().facet('example_facet')  # STEP_1
    sqs = sqs.filter(django_id__in=django_ids)  # STEP_2
    view = search_view_factory(
        view_class=SearchView,
        template='search/search-results.html',
        searchqueryset=sqs,
        form_class=FacetedSearchForm
    )
    return view(request)
At the point marked STEP_1 I get all the database records. At STEP_2 the records are successfully narrowed down to the number I'd expect for that list of django_ids. The problem comes when the search results are displayed in cases where the user has specified a search term in the form. Rather than returning all records from STEP_2 which match the term, I get all records from STEP_2 plus all from STEP_1 which match the term.
Presumably, therefore, I need to override one or more of SearchView's methods in haystack/views.py, but which? Can anyone suggest a way of achieving this?
After a bit more thought, I found a way around this. In the code above, the problem was occurring in the view = search_view_factory... line, so I needed to create my own SearchView class and override the get_results(self) method in order to apply the filtering after the search has been run with the user's search terms. The result is code along these lines:
from haystack.views import SearchView

class MySearchView(SearchView):
    def get_results(self):
        # Run the search with the user's terms first.
        search = self.form.search()
        # The ID I need for the database search is at the end of the URL,
        # but this may have some search parameters on and need cleaning up.
        view_id = self.request.path.split("/")[-1]
        view_query = MyView.objects.filter(id=view_id.split("&")[0])
        # At this point the django_ids of the required objects can be found.
        if view_query:
            view_item = view_query[0]
            django_ids = [thing.id for thing in view_item.things.all()]
            # Restrict the results only after the user's search has run.
            search = search.filter_and(django_id__in=django_ids)
        return search
Using search.filter_and rather than search.filter at the end was another thing which turned out to be essential, but which didn't do what I needed when the filtering was being performed before getting to the SearchView.
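The ID-extraction step in get_results() can be checked on its own, without Django. This sketch mirrors the split logic from the answer; both helper names are hypothetical, and the trailing-slash-tolerant variant is an assumption of mine, not part of the answer. It shows that a path ending in "/" gives an empty ID with the original logic.

```python
def trailing_view_id(path):
    # Same logic as get_results(): last path segment, cut at the first "&".
    return path.split("/")[-1].split("&")[0]


def trailing_view_id_tolerant(path):
    # Hypothetical variant that also tolerates a trailing "/".
    return path.rstrip("/").split("/")[-1].split("&")[0]


for p in ("/search/view/42", "/search/view/42&q=hotel", "/search/view/42/"):
    print(repr(trailing_view_id(p)), repr(trailing_view_id_tolerant(p)))
```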
