How to get Salesforce REST API to paginate? - python-3.x

I'm using the simple_salesforce python wrapper for the Salesforce REST API. We have hundreds of thousands of records, and I'd like to split up the pull of the salesforce data so all records are not pulled at the same time.
I've tried passing a query like:
results = salesforce_connection.query_all("SELECT my_field FROM my_model limit 2000 offset 50000")
to see records 50K through 52K but receive an error that offset can only be used for the first 2000 records. How can I use pagination so I don't need to pull all records at once?

You're looking to use salesforce_connection.query(query=SOQL) and then .query_more(nextRecordsUrl, True).
Since .query() only returns 2000 records, you need to use .query_more to get the next page of results.
From the simple-salesforce docs
SOQL queries are done via:
sf.query("SELECT Id, Email FROM Contact WHERE LastName = 'Jones'")
If, due to an especially large result, Salesforce adds a nextRecordsUrl to your query result, such as "nextRecordsUrl" : "/services/data/v26.0/query/01gD0000002HU6KIAW-2000", you can pull the additional results with either the ID or the full URL (if using the full URL, you must pass ‘True’ as your second argument)
sf.query_more("01gD0000002HU6KIAW-2000")
sf.query_more("/services/data/v26.0/query/01gD0000002HU6KIAW-2000", True)
Here is an example of using this
data = []  # list to hold all the records
SOQL = "SELECT my_field FROM my_model"
results = sf.query(query=SOQL)  # api call

## loop through the results and add the records
for rec in results['records']:
    rec.pop('attributes', None)  # remove extra data
    data.append(rec)  # add the record to the list

## check the 'done' attribute in the response to see if there are more records
## while 'done' is False (more records to fetch), get the next page of records
while not results['done']:
    ## attribute 'nextRecordsUrl' holds the url to the next page of records
    results = sf.query_more(results['nextRecordsUrl'], True)
    ## repeat the loop of adding the records
    for rec in results['records']:
        rec.pop('attributes', None)
        data.append(rec)
Looping through the records and using the data
## loop through the records and get their attribute values
for rec in data:
    # the attribute name will always be the same as the salesforce api name for that value
    print(rec['my_field'])
As the other answer says, though, this can start to use up a lot of resources. But it's what you're looking for if you want to achieve pagination.
Maybe create a more focused SOQL statement to get only the records needed for your use case at that specific moment.

LIMIT and OFFSET aren't really meant to be used like that; what if somebody inserts or deletes a record at an earlier position (not to mention you don't have an ORDER BY in there)? SF will open a proper cursor for you, use it.
https://pypi.org/project/simple-salesforce/ docs for "Queries" say that you can either call query and then query_more or you can go query_all. query_all will loop and keep calling query_more until you exhaust the cursor - but this can easily eat your RAM.
Alternatively, look into the bulk query stuff; there's some magic in the API but I don't know if it fits your use case. They'd be asynchronous calls and might not be implemented in the library. It's called PK Chunking. I wouldn't bother unless you have millions of records.
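For reference, a minimal sketch of those two alternatives (sf is the simple_salesforce connection from above; my_model and my_field are the placeholders from the question, and the bulk handler is an assumption about your simple-salesforce version):
# Option 1: query_all() keeps calling query_more() for you until the cursor is
# exhausted, then returns everything in one dict (this can use a lot of RAM)
all_records = sf.query_all("SELECT my_field FROM my_model")['records']

# Option 2 (assumes your simple-salesforce version exposes the bulk handler):
# the Bulk API is better suited to very large extracts
bulk_records = sf.bulk.my_model.query("SELECT my_field FROM my_model")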

Related

Is There A Way To Improve Performance Of Data Dictionary Model To Dict Approach?

I am currently getting a bunch of records for formsets in my Django application with the method below...
line_items = BudgetLineItem.objects.filter(budget_pk=dropdown)
line_item_listofdicts = []
for line_item in line_items:
    line_item_dict = model_to_dict(line_item)
    del line_item_dict['id']
    del line_item_dict['budget']
    del line_item_dict['archive_budget']
    del line_item_dict['new_budget']
    del line_item_dict['update_budget']
    line_item_listofdicts.append(line_item_dict)
UpdateBudgetLineItemFormSet = inlineformset_factory(UpdateBudget,
                                                    UpdateBudgetLineItem,
                                                    form=UpdateBudgetLineItemForm,
                                                    extra=len(line_item_listofdicts),
                                                    can_delete=True,
                                                    can_order=True)
The good news is that it works and does what I want it to. However it's super slow. It takes about 13 seconds to render the data back to my app. Not optimal. I've spent the morning trying to do various prefetches and select_relateds but nothing has worked to improve the time it takes to render these fields back to the screen. The fields in question are largely DecimalFields and I've read that they can be a bit slower. I'm trying to use this data as "input" to my formsets in a CreateView. Again it works...but it's slow. Any ideas on how to make this approach more performant?
Thanks in advance for any thoughts.
Instead of retrieving the models and deleting the fields you don't need, you can just query the models specifying only the list of fields you want,
using the queryset's values(*fields) method, to which you pass the names of the fields you need as strings.
It will automatically return them as a list of dictionaries containing only the specified fields.
# Taking your code as an example, let me assume you only need title and added_date from your model.
Assuming your BudgetLineItem model has the fields title and added_date, your code could look like this:
line_items = BudgetLineItem.objects.filter(budget_pk=dropdown).values('title', 'added_date')
UpdateBudgetLineItemFormSet = inlineformset_factory(UpdateBudget,
                                                    UpdateBudgetLineItem,
                                                    form=UpdateBudgetLineItemForm,
                                                    extra=line_items.count(),
                                                    can_delete=True,
                                                    can_order=True)
From your code, you are doing operations that you don't need:
- Since you only need the number of items, you don't have to evaluate the queryset; you can just call its count() method.
- If some piece of code still needs your line_item_listofdicts variable, you can replace it with line_items, since it is already a list of dictionaries containing only the fields you need, instead of converting your model queryset to a list of model instances, deleting the fields you don't need, and converting it again into another list (these operations are expensive).
You could check out the documentation on values().
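Putting those pieces together, a rough sketch of how the view code could look (this assumes, as above, that title and added_date are the only fields the formset needs, and that the list of dicts was being used as initial data for the extra forms):
line_items = (BudgetLineItem.objects
              .filter(budget_pk=dropdown)
              .values('title', 'added_date'))   # plain dicts straight from the database

UpdateBudgetLineItemFormSet = inlineformset_factory(UpdateBudget,
                                                    UpdateBudgetLineItem,
                                                    form=UpdateBudgetLineItemForm,
                                                    extra=line_items.count(),  # count() runs in SQL
                                                    can_delete=True,
                                                    can_order=True)

# wherever line_item_listofdicts was used, the values() queryset can stand in for it,
# for example as initial data for the extra forms (an assumption about your view):
formset = UpdateBudgetLineItemFormSet(initial=list(line_items))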

Google Cloud Datastore Cursor with google.cloud.ndb

I am working with Google Cloud Datastore using the latest google.cloud.ndb library
I am trying to implement pagination using Cursor with the following code.
It is not fetching the data correctly.
[1] To Fetch Data:
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5)
This code works fine and fetches 5 entities from MyModel
I want to implement pagination that can be integrated with a web frontend
[2] To Fetch Next Set of Data
from google.cloud.ndb._datastore_query import Cursor
nextpage_value = "2"
nextcursor = Cursor(cursor=nextpage_value.encode()) # Converts to bytes
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5, start_cursor= nextcursor)
[3] To Fetch Previous Set of Data
previouspage_value = "1"
prevcursor = Cursor(cursor=previouspage_value.encode())
query_01 = MyModel.query()
f = query_01.fetch_page_async(limit=5, start_cursor=prevcursor)
The [2] & [3] sets of code do not fetch paginated data, but return the same results as codebase [1].
Please note I'm working with Python 3 and using the latest "google.cloud.ndb" client library to interact with Datastore.
I have referred to the following link https://github.com/googleapis/python-ndb
I am new to Google Cloud, and appreciate all the help I can get.
Firstly, it seems to me like you are expecting to use the wrong kind of pagination. You are trying to use numeric values, whereas the datastore cursor is providing cursor-based pagination.
Instead of passing in byte-encoded integer values (like 1 or 2), the datastore is expecting tokens that look similar to this: 'CjsSNWoIb3Z5LXRlc3RyKQsSBFVzZXIYgICAgICAgAoMCxIIQ3ljbGVEYXkiCjIwMjAtMTAtMTYMGAAgAA=='
Such a cursor you can obtain from the first call to the fetch_page() method, which returns a tuple:
(results, cursor, more) where results is a list of query results, cursor is a cursor pointing just after the last result returned, and more indicates whether there are (likely) more results after that
Secondly, you should be using fetch_page() instead of fetch_page_async(), since the second method does not return you the cursors you need for pagination. Internally, fetch_page() is calling fetch_page_async() to get your query results.
Thirdly and lastly, I am not entirely sure whether the "previous page" use-case is doable using the datastore-provided pagination. It may be that you need to implement that yourself manually, by storing some of the cursors.
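Putting the first two points together, here is a rough sketch of a page-fetching helper (the base64 round trip is just one way to make the raw cursor bytes URL-safe for the frontend; cursor.cursor and an active ndb client context are assumptions, not guaranteed parts of the API):
import base64

from google.cloud.ndb._datastore_query import Cursor

def get_page(page_token=None, page_size=5):
    # page_token is the string handed back to the web frontend on the
    # previous call, or None for the very first page
    start_cursor = None
    if page_token:
        start_cursor = Cursor(cursor=base64.urlsafe_b64decode(page_token))

    # assumed to run inside an ndb client context (client.context())
    results, cursor, more = MyModel.query().fetch_page(
        page_size, start_cursor=start_cursor)

    next_token = None
    if more and cursor is not None:
        # assumption: cursor.cursor exposes the raw bytes passed to the constructor
        next_token = base64.urlsafe_b64encode(cursor.cursor).decode('ascii')

    return results, next_token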
I hope that helps and good luck!

Maximo automation script to change status of work order

I have created a non-persistent attribute in my WoActivity table named VDS_COMPLETE. It is a bool that gets changed by a checkbox in one of my applications.
I am trying to write an automation script in Python to change the status of every task of a work order that has been checked when I save the work order.
I don't know why it isn't working but I'm pretty sure I'm close to the answer...
Do you have an idea why it isn't working? I know that I have code in comments, I have done a few experimentations...
from psdi.mbo import MboConstants
from psdi.server import MXServer
mxServer = MXServer.getMXServer()
userInfo = mxServer.getUserInfo(user)
mboSet = mxServer.getMboSet("WORKORDER")
#where1 = "wonum = :wonum"
#mboSet.setWhere(where1)
#mboSet.reset()
workorderSet = mboSet.getMbo(0).getMboSet("WOACTIVITY", "STATUS NOT IN ('FERME' , 'ANNULE' , 'COMPLETE' , 'ATTDOC')")
#where2 = "STATUS NOT IN ('FERME' , 'ANNULE' , 'COMPLETE' , 'ATTDOC')"
#workorderSet.setWhere(where2)
if workorderSet.count() > 0:
    for x in range(0, workorderSet.count()):
        if workorderSet.getString("VDS_COMPLETE") == 1:
            workorder = workorderSet.getMbo(x)
            workorder.changeStatus("COMPLETE", MXServer.getMXServer().getDate(), u"Script d'automatisation", MboConstants.NOACCESSCHECK)
    workorderSet.save()
workorderSet.close()
It looks like your two biggest mistakes here are: 1. trying to read your boolean field (VDS_COMPLETE) off the set (meaning off the collection of records, like the whole table) instead of off an MBO (meaning an actual record, one entry in the table); and 2. fetching your set of data fresh from the database (via that MXServer call), which means you are working with the previously saved data instead of the data from the screen where the pending changes have actually been made (and remember that non-persistent fields do not get saved to the database).
There are some other problems with this script too: your use of count() inside your for loop (or even calling it more than once at all), which is an expensive operation, and the way you are currently (though this may be a result of your debugging) not filtering the work order set before grabbing the first work order (meaning you get a random work order from the table) and then using a dynamic relationship off of that record, instead of using a normal relationship, or skipping the relationship altogether and using just a "where" clause, even though that relationship likely already exists.
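To illustrate the first point, here is a hedged sketch of what the loop could look like if the script runs as an object launch point on WORKORDER, where the implicit mbo variable is the work order being saved from the screen (the WOACTIVITY relationship and the status names are taken from the question):
from psdi.mbo import MboConstants
from psdi.server import MXServer

# "mbo" is the work order being saved (implicit variable in an object launch point script)
taskSet = mbo.getMboSet("WOACTIVITY")
task = taskSet.moveFirst()
while task is not None:
    # read the non-persistent checkbox from the individual task record, not from the set
    if (task.getBoolean("VDS_COMPLETE")
            and task.getString("STATUS") not in ("FERME", "ANNULE", "COMPLETE", "ATTDOC")):
        task.changeStatus("COMPLETE",
                          MXServer.getMXServer().getDate(),
                          "Script d'automatisation",
                          MboConstants.NOACCESSCHECK)
    task = taskSet.moveNext()
# no explicit save() or close(): the tasks stay in the same transaction as the work order being saved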
Here is a Stack Overflow answer describing relationships and "where" clauses in Maximo in more detail: Describe relationship in maximo 7.5
This question also has some more information about getting data from the screen versus fresh from the database: Adding a new row to another table using java in Maximo

NodeJS - azure-storage-node- , how to retrieve addition of two columns, and apply filtering condition

Sorry for being a newbie to Node.js and table queries. My question is:
How can I create a query using the Node.js package "azure-storage-node" that selects the sum of the two columns 'start' and 'period' and, if that sum is greater than a threshold, returns the whole row? My tries, which didn't work, are something like this:
var query = new azure.TableQuery();
total = query.select(['start']) + query.select(['period']);
query.where('total > ?' , 50000);
or maybe something like this:
var query = new azure.TableQuery()
.where('start + period gt 50000');
but it throws an error on the '+'.
Thanks
What you're trying to accomplish is not possible with Azure Tables at least as of today as Azure Tables has limited querying support and support for computed columns (if I may say so) is not there.
There are two possible solutions:
1. Have an attribute called total in your entities that will contain the value, i.e. start + period. You calculate this value when you're inserting or updating the entity and store it at that time.
2. Do this filtering on the client side. For this you will need to download all related entities and then apply the filtering on the client side to the data you fetched.

How to use ldap3 generator for pagination?

I want to paginate the results of an ldap query so that I get 50 users per query for each page. The documentation here http://ldap3.readthedocs.io/searches.html?highlight=generator suggests that using a generator is the simplest way to do this, however it doesn't provide any detail as to how to use it to achieve pagination. When I iterate through the generator object, it prints out every user entry, even though I specified 'paged_size=5' in my search query. Can anyone explain to me what's going on here? Thanks!!
Try setting the paged_criticality parameter to True. It could be that the server is not capable of performing a paged search. If that's the case and paged_criticality is True the search will fail rather than returning all users.
This is a similar system that I used:
# Set up your ldap connection
conn = Connection(*args)

# create a generator
entry_generator = conn.extend.standard.paged_search(
    search_base=self.dc, search_filter=query[0],
    search_scope=SUBTREE, attributes=self.user_attributes,
    paged_size=1, generator=True)

# Then get your results:
total_entries = 0
results = []
for entry in entry_generator:
    total_entries += 1
    results.append(entry)
    if total_entries % 50 == 0:
        # do something with results
        pass
Otherwise, try setting the paged_size to 50 and getting results like that.
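If you would rather fetch an explicit page at a time (for example one page per web request) instead of iterating one big generator, you can also drive the Simple Paged Results control yourself with a plain search and the paging cookie; here is a rough sketch under the same assumptions about self.dc, query and self.user_attributes as the code above:
from ldap3 import Connection, SUBTREE

conn = Connection(*args)  # same connection setup as above

cookie = None
while True:
    conn.search(search_base=self.dc,
                search_filter=query[0],
                search_scope=SUBTREE,
                attributes=self.user_attributes,
                paged_size=50,        # one "page" of 50 users per round trip
                paged_cookie=cookie)
    page = conn.entries               # the 50 (or fewer) users in this page
    # ... do something with this page of results ...
    cookie = conn.result['controls']['1.2.840.113556.1.4.319']['value']['cookie']
    if not cookie:                    # an empty cookie means there are no more pages
        break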
