I want to paginate the results of an LDAP query so that I get 50 users per page. The documentation here http://ldap3.readthedocs.io/searches.html?highlight=generator suggests that using a generator is the simplest way to do this; however, it doesn't provide any detail on how to use it to achieve pagination. When I iterate through the generator object, it prints out every user entry, even though I specified paged_size=5 in my search query. Can anyone explain to me what's going on here? Thanks!!
Try setting the paged_criticality parameter to True. It could be that the server is not capable of performing a paged search. If that's the case and paged_criticality is True, the search will fail rather than returning all users.
This is a similar system that I used:
from ldap3 import Connection, SUBTREE

# Set up your LDAP connection
conn = Connection(*args)

# Create a generator
entry_generator = conn.extend.standard.paged_search(
    search_base=self.dc, search_filter=query[0],
    search_scope=SUBTREE, attributes=self.user_attributes,
    paged_size=1, generator=True)

# Then get your results:
total_entries = 0
results = []
for entry in entry_generator:
    total_entries += 1
    results.append(entry)
    if total_entries % 50 == 0:
        # do something with this batch of 50 results
        pass
Otherwise try setting paged_size to 50 and getting your results like that.
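For example, here is a minimal sketch of that suggestion (the server URL, credentials, search base, filter, and attribute list below are placeholders to adapt to your directory); paged_criticality=True makes the search fail outright instead of silently returning everything if the server can't do paged searches:
from ldap3 import Connection, SUBTREE

# Placeholder connection details; adjust to your directory
conn = Connection('ldap://your-server', user='cn=admin,dc=example,dc=com', password='secret', auto_bind=True)

entry_generator = conn.extend.standard.paged_search(
    search_base='dc=example,dc=com',
    search_filter='(objectClass=person)',
    search_scope=SUBTREE,
    attributes=['cn', 'mail'],
    paged_size=50,
    paged_criticality=True,
    generator=True)

page = []
for entry in entry_generator:
    page.append(entry)
    if len(page) == 50:
        # hand this page of 50 users to whatever consumes it, then start a new page
        page = []
if page:
    # handle the final, possibly shorter, page
    pass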
I am trying to check for the existence of a UUID as a primary key in my Django environment. When it exists, my code works fine, but when it's not present I get a "" is not a valid UUID error.
Here's my code:
uuid_exists = Book.objects.filter(id=self.object.author_pk,is_active="True").first()
I've tried other variations of this with .exists() or .all(), but I keep getting the ['“” is not a valid UUID.'] error.
I did come up with a workaround:
if self.object.author_pk != '':
    author_exists = Book.objects.filter(id=self.object.author_pk, is_active="True").first()
    context['author_exists'] = author_exists
Is this the best way to do this? I was hoping to be able to use a straight filter, without the extra guard logic, but I've worked all afternoon and can't seem to come up with anything better. Thanks in advance for any feedback or comments.
I've had the same issue and this is what I have:
Wrapping it in try/except (in my case it's a view, so it's supposed to return a Response object):
try:
    object = Object.objects.get(id=object_id)
except Exception as e:
    return Response(data={...}, status=status.HTTP_40...
It gets to the exception (the 4th line) but somehow sends the '~your_id~' is not a valid UUID. text instead of the proper data, which might be enough in some cases.
This seems like an oversight, so it might well get a fix soon. I don't have enough time to investigate deeper, unfortunately.
So the solution I came up with is not ideal either, but hopefully it is a bit cleaner and faster than what you're using right now.
# Generate a list of possible object IDs (make use of filters in order to reduce the DB load)
possible_ids = [str(id) for id in Object.objects.filter(~ filters here ~).values_list('id', flat=True)]

# Return an error if the ID is not valid
if ~your_id~ not in possible_ids:
    return Response(data={"error": "Database destruction sequence initialized!"}, status=status.HTTP_401_UNAUTHORIZED)

# Keep working with the object
object = Object.objects.get(id=object_id)
I am working with Google Cloud Datastore using the latest google.cloud.ndb library.
I am trying to implement cursor-based pagination using the following code, but it is not fetching the data correctly.
[1] To Fetch Data:
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5)
This code works fine and fetches 5 entities from MyModel
I want to implement pagination that can be integrated with a web frontend.
[2] To Fetch Next Set of Data
from google.cloud.ndb._datastore_query import Cursor
nextpage_value = "2"
nextcursor = Cursor(cursor=nextpage_value.encode()) # Converts to bytes
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5, start_cursor= nextcursor)
[3] To Fetch Previous Set of Data
previouspage_value = "1"
prevcursor = Cursor(cursor=previouspage_value.encode())
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5, start_cursor=prevcursor)
The code in [2] and [3] does not fetch paginated data; it returns the same results as [1].
Please note I'm working with Python 3 and using the latest "google.cloud.ndb" client library to interact with Datastore.
I have referred to the following link https://github.com/googleapis/python-ndb
I am new to Google Cloud, and appreciate all the help I can get.
Firstly, it seems to me that you are expecting the wrong kind of pagination. You are trying to use numeric page values, whereas Datastore provides cursor-based pagination.
Instead of passing in byte-encoded integer values (like 1 or 2), the datastore is expecting tokens that look similar to this: 'CjsSNWoIb3Z5LXRlc3RyKQsSBFVzZXIYgICAgICAgAoMCxIIQ3ljbGVEYXkiCjIwMjAtMTAtMTYMGAAgAA=='
You can obtain such a cursor from the first call to the fetch_page() method, which returns a tuple (results, cursor, more), where results is a list of query results, cursor is a cursor pointing just after the last result returned, and more indicates whether there are (likely) more results after that.
Secondly, you should be using fetch_page() instead of fetch_page_async(), since the latter does not return the cursors you need for pagination. Internally, fetch_page() calls fetch_page_async() to get your query results.
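As an illustration, here is a minimal sketch of what that could look like (MyModel is the model from your question; the client setup and token handling are assumptions you would adapt to your frontend, and urlsafe cursor handling may differ slightly between library versions):
from google.cloud import ndb

client = ndb.Client()  # assumes default project/credentials

def get_page(limit=5, cursor_token=None):
    # cursor_token is the urlsafe token previously handed to the frontend, if any
    with client.context():
        start = ndb.Cursor(urlsafe=cursor_token) if cursor_token else None
        results, cursor, more = MyModel.query().fetch_page(limit, start_cursor=start)
        # cursor.urlsafe() gives a websafe token (possibly bytes) to send back to the frontend
        next_token = cursor.urlsafe() if (cursor and more) else None
        return results, next_token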
Thirdly and lastly, I am not entirely sure whether the "previous page" use case is doable using the Datastore-provided pagination. It may be that you need to implement that yourself manually, by storing some of the cursors, as sketched below.
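One possible (untested) way to do that, building on the get_page() sketch above and assuming the list of page-start tokens is kept per user, e.g. in the web session:
page_start_tokens = [None]  # page 0 starts with no cursor

def fetch_page_number(page_number, limit=5):
    # "Previous" simply reuses the stored start token for that page; the list
    # grows as the user moves forward, so earlier pages can always be revisited.
    results, next_token = get_page(limit=limit, cursor_token=page_start_tokens[page_number])
    if next_token and len(page_start_tokens) == page_number + 1:
        page_start_tokens.append(next_token)
    return results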
I hope that helps and good luck!
I'm using the simple_salesforce python wrapper for the Salesforce REST API. We have hundreds of thousands of records, and I'd like to split up the pull of the salesforce data so all records are not pulled at the same time.
I've tried passing a query like:
results = salesforce_connection.query_all("SELECT my_field FROM my_model limit 2000 offset 50000")
to see records 50K through 52K but receive an error that offset can only be used for the first 2000 records. How can I use pagination so I don't need to pull all records at once?
You're looking to use salesforce_connection.query(query=SOQL) and then .query_more(nextRecordsUrl, True).
Since .query() only returns 2000 records, you need to use .query_more() to get the next page of results.
From the simple-salesforce docs
SOQL queries are done via:
sf.query("SELECT Id, Email FROM Contact WHERE LastName = 'Jones'")
If, due to an especially large result, Salesforce adds a nextRecordsUrl to your query result, such as "nextRecordsUrl" : "/services/data/v26.0/query/01gD0000002HU6KIAW-2000", you can pull the additional results with either the ID or the full URL (if using the full URL, you must pass ‘True’ as your second argument)
sf.query_more("01gD0000002HU6KIAW-2000")
sf.query_more("/services/data/v26.0/query/01gD0000002HU6KIAW-2000", True)
Here is an example of using this:
data = []  # list to hold all the records
SOQL = "SELECT my_field FROM my_model"
results = sf.query(query=SOQL)  # api call

## loop through the results and add the records
for rec in results['records']:
    rec.pop('attributes', None)  # remove extra data
    data.append(rec)  # add the record to the list

## check the 'done' attribute in the response to see if there are more records
## while 'done' == False (more records to fetch) get the next page of records
while results['done'] == False:
    ## attribute 'nextRecordsUrl' holds the url to the next page of records
    results = sf.query_more(results['nextRecordsUrl'], True)
    ## repeat the loop of adding the records
    for rec in results['records']:
        rec.pop('attributes', None)
        data.append(rec)
Looping through the records and using the data:
## loop through the records and get their attribute values
for rec in data:
    # the attribute name will always be the same as the salesforce api name for that value
    print(rec['my_field'])
Like the other answer says, though, this can start to use up a lot of resources, but it is what you're looking for if you want to achieve pagination.
Maybe create a more focused SOQL statement to get only the records needed for your use case at that specific moment.
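For instance, a narrower (hypothetical) query that only pulls recent records might look like this; the field name and date filter are just examples:
SOQL = "SELECT my_field FROM my_model WHERE CreatedDate = LAST_N_DAYS:30"
results = sf.query(query=SOQL)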
LIMIT and OFFSET aren't really meant to be used like that: what if somebody inserts or deletes a record at an earlier position (not to mention you don't have an ORDER BY in there)? SF will open a proper cursor for you; use it.
The https://pypi.org/project/simple-salesforce/ docs for "Queries" say that you can either call query and then query_more, or you can go with query_all. query_all will loop and keep calling query_more until you exhaust the cursor - but this can easily eat your RAM.
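For completeness, a minimal query_all sketch (assuming sf is the simple_salesforce connection from the question); it is convenient, but it loads every matching record into memory at once:
all_records = sf.query_all("SELECT my_field FROM my_model")
for rec in all_records['records']:
    print(rec['my_field'])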
Alternatively, look into the bulk query stuff; there's some magic in the API, but I don't know if it fits your use case. It would be asynchronous calls and might not be implemented in the library. It's called PK Chunking. I wouldn't bother unless you have millions of records.
I have a problem with the GmailApp search(query) function.
According to the manual it uses Gmail search queries, so all the search operators should be available and return correct results.
var bigThreads = GmailApp.search('in:chats newer_than:1d');
Logger.log(bigThreads.length);
When I use the same filter in the Gmail search field I get 6 results back, but in the log I see a count of 2.
Does anyone have an idea where my mistake is?
I have a python3 script that attempts to reindex certain documents in an existing ElasticSearch index. I can't update the documents because I'm changing from an autogenerated id to an explicitly assigned id.
I'm currently attempting to do this by deleting existing documents using delete_by_query and then indexing once the delete is complete:
self.elasticsearch.delete_by_query(
    index='%s_*' % base_index_name,
    doc_type='type_a',
    conflicts='proceed',
    wait_for_completion=True,
    refresh=True,
    body={}
)
However, the index is massive, and so the delete can take several hours to finish. I'm currently getting a ReadTimeoutError, which is causing the script to crash:
WARNING:elasticsearch:Connection <Urllib3HttpConnection: X> has failed for 2 times in a row, putting on 120 second timeout.
WARNING:elasticsearch:POST X:9200/base_index_name_*/type_a/_delete_by_query?conflicts=proceed&wait_for_completion=true&refresh=true [status:N/A request:140.117s]
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='X', port=9200): Read timed out. (read timeout=140)
Is my approach correct? If so, how can I make my script wait long enough for the delete_by_query to complete? There are two timeout parameters that can be passed to delete_by_query - search_timeout and timeout - but search_timeout defaults to no timeout (which I think is what I want), and timeout doesn't seem to do what I want. Is there some other parameter I can pass to delete_by_query to make it wait as long as it takes for the delete to finish? Or do I need to make my script wait some other way?
Or is there some better way to do this using the ElasticSearch API?
You should set wait_for_completion to False. In that case you'll get the task details back and will be able to track the task's progress using the corresponding Tasks API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-task-api
Just to expand on Random's answer above with some code, for ES/Python newbies like me:
from elasticsearch import Elasticsearch

ES = Elasticsearch(['http://localhost:9200'])
query = {'query': {'match_all': dict()}}
# wait_for_completion=False makes the call return immediately with a task id
task = ES.delete_by_query(index='index_name', doc_type='sample_doc', wait_for_completion=False, body=query, ignore=[400, 404])
task_id = task['task']
response_task = ES.tasks.get(task_id=task_id)  # check if the task is completed
isCompleted = response_task["completed"]  # if the 'completed' key is true, the task is completed
One can write a custom function that checks at some interval, in a while loop, whether the task has completed.
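For example, a minimal polling sketch, assuming the ES client and task_id from the snippet above and that checking every 30 seconds is acceptable:
import time

def wait_for_task(es, task_id, poll_interval=30):
    # Poll the Tasks API until the delete-by-query task reports completion
    while True:
        task_info = es.tasks.get(task_id=task_id)
        if task_info.get("completed"):
            return task_info
        time.sleep(poll_interval)

result = wait_for_task(ES, task_id)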
I have used Python 3.x and Elasticsearch 6.x.
You can use the 'request_timeout' global param. This will override the connection's timeout setting, as mentioned here.
For example:
es.delete_by_query(index=<index_name>, body=<query>, request_timeout=300)
Or set it at the connection level, for example:
es = Elasticsearch(**get_es_connection_parms(), timeout=60)