I need to get data from an API whose URL is based on different IDs, for example:
url = "http://123456789/"
id = "jimmy"
I have a list of IDs; here is my code:
import requests

for id in ID:
    response = requests.get(url + id)
    info = response.json()  # requests decodes the JSON (UTF-8 by default)
    ## save info
But I have 400,000 IDs and it will take too long to grab all the data, so I want to use multiprocessing to finish this job: cut the ID list into 10 or more small lists and run them at the same time.
How can I do that?
Please help, thanks!
You can use Pool.
Cut the list of IDs into many smaller sub-lists and have a different process handle each one.
https://docs.python.org/2/library/multiprocessing.html
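A minimal sketch of that approach, reusing the fetch logic from the question (note that Pool.map splits the ID list across the workers for you, so there is no need to slice it by hand):

from multiprocessing import Pool
import requests

url = "http://123456789/"

def fetch(id):
    response = requests.get(url + id)
    return response.json()

# on Windows, keep the Pool code under an if __name__ == "__main__": guard
pool = Pool(10)              # 10 worker processes
info = pool.map(fetch, ID)   # ID is your list of 400,000 IDs
pool.close()
pool.join()
## save info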
I am working on a Python script which will check in an API request whether there is any property ID. If there is a property ID, the script needs to open a different URL with the ID at the end of the link.
How can I make a proper loop for that?
Right now I have a script that checks all IDs in the API request and stores them in a variable called Property_ID.
Right now there are only 3 IDs there, but there will be more, so at the moment I need to check 3 different URLs:
https://api.something.com/api/loremipsum&id=heregoesid
How can I loop the script as many times as there are IDs stored in the variable?
My code:

from urllib.request import urlopen
import json

JsonUrl = "https://api.something.com/api/loremipsum"
response = urlopen(JsonUrl)
data_json = json.loads(response.read())

for feature in data_json["data"]:
    Property_ID = feature["id"]
    print(Property_ID)
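A possible sketch of the follow-up loop, building on the code above (the &id= query format is copied from the example URL in the question; adapt it to the real API):

for feature in data_json["data"]:
    property_id = feature["id"]
    # open the per-property URL with the ID appended
    detail_url = JsonUrl + "&id=" + str(property_id)
    detail_json = json.loads(urlopen(detail_url).read())
    # work with detail_json here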
I'm using the simple_salesforce Python wrapper for the Salesforce REST API. We have hundreds of thousands of records, and I'd like to split up the pull of the Salesforce data so all records are not pulled at the same time.
I've tried passing a query like:
results = salesforce_connection.query_all("SELECT my_field FROM my_model limit 2000 offset 50000")
to see records 50K through 52K, but I receive an error that OFFSET can only be used for the first 2000 records. How can I use pagination so I don't need to pull all records at once?
You're looking to use salesforce_connection.query(query=SOQL) and then .query_more(nextRecordsUrl, True).
Since .query() only returns 2000 records at a time, you need to use .query_more to get the next page of results.
From the simple-salesforce docs
SOQL queries are done via:
sf.query("SELECT Id, Email FROM Contact WHERE LastName = 'Jones'")
If, due to an especially large result, Salesforce adds a nextRecordsUrl to your query result, such as "nextRecordsUrl" : "/services/data/v26.0/query/01gD0000002HU6KIAW-2000", you can pull the additional results with either the ID or the full URL (if using the full URL, you must pass 'True' as your second argument):
sf.query_more("01gD0000002HU6KIAW-2000")
sf.query_more("/services/data/v26.0/query/01gD0000002HU6KIAW-2000", True)
Here is an example of using this:

data = []  # list to hold all the records
SOQL = "SELECT my_field FROM my_model"
results = sf.query(query=SOQL)  # initial API call

## loop through the results and add the records
for rec in results['records']:
    rec.pop('attributes', None)  # remove extra metadata
    data.append(rec)             # add the record to the list

## check the 'done' attribute in the response to see if there are more records;
## while 'done' is False (more records to fetch), get the next page of records
while not results['done']:
    ## the 'nextRecordsUrl' attribute holds the URL to the next page of records
    results = sf.query_more(results['nextRecordsUrl'], True)
    ## repeat the loop of adding the records
    for rec in results['records']:
        rec.pop('attributes', None)
        data.append(rec)
Looping through the records and using the data:

## loop through the records and get their attribute values
for rec in data:
    # the key will always be the same as the Salesforce API name for that field
    print(rec['my_field'])
Like the other answer says, though, this can start to use up a lot of resources, but it is what you're looking for if you want to achieve pagination.
Maybe create a more focused SOQL statement to get only the records needed for your use case at that specific moment.
LIMIT and OFFSET aren't really meant to be used like that: what if somebody inserts or deletes a record at an earlier position (not to mention you don't have an ORDER BY in there)? SF will open a proper cursor for you; use it.
The https://pypi.org/project/simple-salesforce/ docs for "Queries" say that you can either call query and then query_more, or you can go with query_all. query_all will loop and keep calling query_more until you exhaust the cursor, but this can easily eat your RAM.
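If RAM is the concern, a hedged alternative, assuming your simple-salesforce version is recent enough to expose query_all_iter (it yields records lazily instead of building one big list):

# iterate the whole cursor one record at a time
for rec in sf.query_all_iter("SELECT my_field FROM my_model"):
    handle(rec)  # handle() is a placeholder for your own processing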
Alternatively, look into the bulk query stuff; there's some magic in the API, but I don't know if it fits your use case. It'd be asynchronous calls and might not be implemented in the library. It's called PK chunking. I wouldn't bother unless you have millions of records.
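For what it's worth, recent simple-salesforce releases do ship a bulk handler; a hedged sketch, assuming your installed version includes it (my_model is carried over from the question; verify against your version's docs):

# one Bulk API query instead of many REST pages
records = sf.bulk.my_model.query("SELECT my_field FROM my_model")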
I know how to connect to Infusionsoft with Python 3 and how to process the following simple example:
# set up the contact data we want to add
contact = {}  # blank dictionary
contact["FirstName"] = "John"
contact["LastName"] = "Doe"
contact["Email"] = "john@doe.com"
contact["Company"] = "ACME"
But how do I mass-update the WHOLE database? E.g., what if I want to update ALL the Phone1 fields with an extra bit of code using IF statements?
Using the Infusionsoft API you can only update contact data one contact at a time, sending a separate request per contact. The exact request depends on which type of API you use: REST or XML-RPC.
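As a rough sketch of that one-request-per-contact pattern (update_contact here is a hypothetical stand-in for whichever REST or XML-RPC update call your client exposes, and the Phone1 rule is an invented example):

def update_contact(contact_id, fields):
    # placeholder: send one REST or XML-RPC update request per contact
    ...

for contact in all_contacts:  # all_contacts fetched from the API beforehand
    phone = contact.get("Phone1", "")
    if phone and not phone.startswith("+"):  # example IF rule
        update_contact(contact["Id"], {"Phone1": "+1" + phone})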
I have created a user scenario in Load Impact to simulate a couple of hundred users in our web store.
The problem is that I can't seem to simulate the users in our Azure queue.
The queue is only increasing by +1 user and not the hundreds of users I want :)
I have created a random correlation ID, but it seems like the session is still there.
Is there a way to destroy the session so that a new session is created when the script loops?
I found a Lua reference that mentions destroy:session, but it won't work for me.
function rnd()
    return math.random(0000, 9999)
end

http.request_batch({
    {"POST", "http://STORE.////",
     headers={["Content-Type"]="application/json;charset=UTF-8"},
     data="{\"ChoosenPhoneModelId\":0,\"PricePlanId\":\"phone\",\"CorrelationId\":\"e97bdaf6-ed61-4fb3-".. rnd() .."-d3bb09789feb\",\"ChoosenPhoneColor\":{\"Color\":1,\"Code\":\"#d0d0d4\",\"Name\":\"Silver\",\"DeliveryTime\":\"1-2 veckor\",\"$$hashKey\":\"005\"},\"ChoosenAmortization\":{\"AmortizationLength\":24,\"Price\":312,\"$$hashKey\":\"00H\"},\"ChoosenPriceplan\":{\"IsPostpaid\":true,\"IsStudent\":false,\"IsSenior\":false,\"Title\":\"Fast \",\"Description\":\"Hello.\",\"MonthlyAmount\":149,\"AvailiableDataPackages\":null,\"SubscriptionBinding\":1,\"$$hashKey\":\"00M\"},\"ChoosenDataPackage\":{\"Description\":\"20 GB\",\"PricePerMountInKr\":149,\"DataAmountInGb\":20,\"$$hashKey\":\"00U\"}}",
     auto_decompress=true}
})
Any tips on how to do this?
Thanks in advance.
The correlation ID isn't a random number. It's set by your server in a cookie. Get it and use it like this:
local response = http.request_batch({
    {"GET", "http://store.///step1", auto_decompress=true},
})
-- extract correlation Id
local strCorrelationId = response[1].cookies['corrIdCookie']
{"POST", "http://STORE.////",
headers={["Content-Type"]="application/json;charset=UTF-8"},
data="{\"ChoosenPhoneModelId\":0,\"PricePlanId\":\"phone\",\"CorrelationId\":\"".. strCorrelationId .. "",\"ChoosenPhoneColor\":{\"Color\":1,\"Code\":\"#d0d0d4\",\"Name\":\"Silver\",\"DeliveryTime\":\"1-2 veckor\",\"$$hashKey\":\"005\"},\"ChoosenAmortization\":{\"AmortizationLength\":24,\"Price\":312,\"$$hashKey\":\"00H\"},\"ChoosenPriceplan\":{\"IsPostpaid\":true,\"IsStudent\":false,\"IsSenior\":false,\"Title\":\"Fast \",\"Description\":\"Hello.\",\"MonthlyAmount\":149,\"AvailiableDataPackages\":null,\"SubscriptionBinding\":1,\"$$hashKey\":\"00M\"},\"ChoosenDataPackage\":{\"Description\":\"20
GB\",\"PricePerMountInKr\":149,\"DataAmountInGb\":20,\"$$hashKey\":\"00U\"}}",
auto_decompress=true}
})
That is what makes your user unique. If you set CorrelationId to just any random number, your server will simply not accept the session in your queue.
Once it's unique and correct, your server will accept the POST properly.
Does GetStream provide a way to retrieve the number of activities within a feed? I have a notification feed set up. I can retrieve the activities with paginated GET requests, but I would like to display the number of items within the feed.
Unfortunately not; there is no API endpoint to retrieve the number of activities within a feed.
At the moment there's no way to count activities directly. You can try:
Use a custom feed and use reactions, so you can get the count from the reactions.
The other way is to store the count in your app (cache/db/etc.).
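A minimal sketch of that second option, assuming Redis as the app-side store and the official stream Python client (the key name and helper are invented for illustration):

import redis
import stream

r = redis.Redis()
client = stream.connect("<api_key>", "<api_secret>")

def add_activity_and_count(user_id, activity):
    # add to the Stream feed, then bump our own counter
    client.feed("user", user_id).add_activity(activity)
    r.incr("feed_count:" + user_id)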
It worked for me:

require 'net/http'
require 'uri'
require 'json'
require 'stream'  # stream-ruby gem, provides Stream::Signer

signature = Stream::Signer.create_jwt_token('activities', '*', '<your_secret>', '*')

uri = URI("https://us-east-api.stream-io-api.com/api/v1.0/enrich/activities/?api_key=<api_key>&ids=<id1,id2,...>&withReactionCounts=true")
req = Net::HTTP::Get.new(uri)
req['Content-Type'] = "application/json"
req['Stream-Auth-Type'] = "jwt"
req['Authorization'] = signature

res = Net::HTTP.start(uri.hostname, uri.port, :use_ssl => true) { |http|
  http.request(req)
}

puts JSON.parse(res.body)
References:
Retrieve
reactions_read-feeds
activities
ruby client
Update 2022
You can only get the number of activities in groups within aggregated or notification feeds. Flat feeds are still not supported.
Source
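A hedged sketch of reading those group counts with the official stream Python client (the feed slug and user ID are placeholders; in Stream's aggregation format each group in the results carries an activity_count):

import stream

client = stream.connect("<api_key>", "<api_secret>")
feed = client.feed("notification", "<user_id>")

response = feed.get(limit=100)
# each result is an aggregated group with its own activity_count
total = sum(group["activity_count"] for group in response["results"])
print(total)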
The best solution, as Stream suggests, is to store the important numbers in your own database.
Source