Dump series back into InfluxDB after querying with replaced field value - python-3.x

Scenario
I want to send data to an MQTT Broker (Cloud) by querying measurements from InfluxDB.
I have a field in the schema which is called status. It can either be 1 or 0. status=0 indicated that series has not been sent to the cloud. If I get an acknowlegdment from the MQTT Broker then I wish to rewrite the query back into the database with status=1.
As mentioned in FAQs for InfluxDB regarding Duplicate data If the information has the same timestamp as the previous query but with a different field value => then the update field will be shown.
In order to test this I created the following:
CREATE DATABASE dummy
USE dummy
INSERT meas_1, type=t1, status=0,value=123 1536157064275338300
query:
SELECT * FROM meas_1
provides
time status type value
1536157064275338300 0 t1 234
now if I want to overwrite the series I do the following:
INSERT meas_1, type=t1, status=1,value=123 1536157064275338300
which will overwrite the series
time status type value
1536157064275338300 1 t1 234
(Note: this is not possible via Tags currently in InfluxDB)
Usage
Query some information using the client with "status"=0.
Restructure JSON to be sent to the cloud
Send the information to cloud
If successful then write the output from Step 1. back into the DB but with status=1.
I am using the InfluxDBClient Python3 to create the Application (MQTT + InfluxDB)
Within the write_points API there is a parameter which mentions batch_size which require int as input.
I am not sure how can I use this with the Application that I want. Can someone guide me with this or with the Schema of the DB so that I can upload actual and non-redundant information to the cloud ?

The batch_size is actually the length of the list of the measurements that needs to passed to write_points.
Steps
Create client and query from measurement (here, we query gps information)
client = InfluxDBClient(database='dummy')
op = client.query('SELECT * FROM gps WHERE "status"=0', epoch='ns')
Make the ResultSet into a list:
batch = list(op.get_points('gps'))
create an empty list for update
updated_batch = []
parse through each measurement and change the status flag to 1. Note, default values in InfluxDB are float
for each in batch:
new_mes = {
'measurement': 'gps',
'tags': {
'type': 'gps'
},
'time': each['time'],
'fields': {
'lat': float(each['lat']),
'lon': float(each['lon']),
'alt': float(each['alt']),
'status': float(1)
}
}
updated_batch.append(new_mes)
Finally dump the points back via the client with batch_size as the length of the updated_batch
client.write_points(updated_batch, batch_size=len(updated_batch))
This overwrites the series because it contains the same timestamps with status field set to 1

Related

How can we run loop inside feature file to get data from one api and pass in another api [duplicate]

I am using Karate version 0.8.0.1 and I want to perform following steps to test some responses.
I make a Get to web service 1
find the value for currencies from the response of web service 1 using jsonpath: $.currencies
Step 2 gives me following result: ["USD","HKD","SGD","INR","GBP"]
Now I use Get method for web service 2
From the response of web service 2 I want to get the value of price field with json-path something like below(passing the values from step 3 above):
$.holding[?(#.currency=='USD')].price
$.holding[?(#.currency=='HKD')].price
$.holding[?(#.currency=='SGD')].price
$.holding[?(#.currency=='INR')].price
$.holding[?(#.currency=='GBP')].price
So there are so many currencies but I want to verify price for only the currencies returned by web service 1(which will be always random) and pass it on to the the output of web service 2 to get the price.
Once i get the price I will match each price value with the value returned from DB.
I am not sure if there is any simple way in which I can pass the values returned by service 1 into the json-path of service 2 one by one and get the results required. Any suggestions for doing this will be helpful As this will be the case for most of the web services I will be automating.
There are multiple ways to do this in Karate. The below should give you a few pointers. Note how there is a magic variable _$ when you use match each. And since you can reference any other JSON in scope, you have some very powerful options.
* def expected = { HKD: 1, INR: 2, USD: 3}
* def response1 = ['USD', 'HKD', 'INR']
* def response2 = [{ currency: 'INR', price: 2 }, { currency: 'USD', price: 3 }, { currency: 'HKD', price: 1 }]
* match response2[*].currency contains only response1
* match each response2 contains { price: '#(expected[_$.currency])' }
You probably already have seen how you can call a second feature file in a loop which may be needed for your particular use case. One more piece of the puzzle may be this - it is very easy to transform any JSON array into the form Karate expects for calling a feature file in a loop:
* def response = ['USD', 'HKD', 'INR']
* def data = karate.map(response, function(x){ return { code: x } })
* match data == [{code: 'USD'}, {code: 'HKD'}, {code: 'INR'}]
EDIT - there is a short-cut to convert an array of primitives to an array of objects now: https://stackoverflow.com/a/58985917/143475
Also see this answer: https://stackoverflow.com/a/52845718/143475

inserting rows sql syntax error with Python 3.9

I am trying to insert rows into my database. Establishing a connection to the database is successful. When I try to insert my desired rows I get an error in the sql. The error appears to be coming from my variable "network_number". I am running nested for loops to iterate through the network number ranges from 1.1.1 - 254.254.254 and adding each unique IP to the database. The network number is written as a string so should the column for the network number be set to VARCHAR or TEXT to include full stops/period? The desired output is to populate my database table with each network number. You can find the sql query assigned to the variable sql_query.
def populate_ip_table(ip_ranges):
network_numbers = ["", "", ""]
information = "Populating the IP table..."
total_ips = (len(ip_ranges) * 254**2)
complete = 0
for octet_one in ip_ranges:
network_numbers[0] = str(octet_one)
percentage_complete = round(100 / total_ips * complete, 2)
information = f"{percentage_complete}% complete"
output_information(information)
for octet_two in range(1, 254 + 1):
network_numbers[1] = str(octet_two)
for octet_three in range(1, 254 + 1):
network_numbers[2] = str(octet_three)
network_number = ".".join(network_numbers)
complete += 1
sql_query = f"INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES ({network_number}, false, 0)"
execute_sql_statement(sql_query)
information = "100% complete"
output_information(information)
Output
[ * ] Connecting to the PostgreSQL database...
[ * ] Connection successful
[ * ] Executing SQL statement
[ ! ] syntax error at or near ".50"
LINE 1: ...rd (ip, scanned_status, times_scanned) VALUES (1.1.50, false...
^
As stated by the Docs:
There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used
Postgresql Docs
I think you need to use VARCHAR, due to the small varying length of your ip-string. while, text is effectively avarchar (no limit), but it may have some problems related to indexing if a record with compressed size of greater than 2712 is tried to be inserted.
Actually your problem is, you need to put an extra single qoutes on network_number. To give you a string when inserting the value in postgresql.
To prove this try insert {network_number} as this:
network_number = "'" + ".".join(network_numbers) + "'"
sql_query = f"INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES ({network_number}, false, 0)"
OR:
sql_query = f"INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES ('{network_number}', false, 0)"
You could also, used inet dataType, which will save you this hassle.
As stated by Docs:
PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses. It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions.
PostgreSQL: Network Address Types

Elastic Search via python gives wrong count

I’m new to python and I need to get connected to “Kibana” via python. we’re using Kibana 7.4.1. The requirement is to get them just the count (hits).
Due to some restrictions, I need to use Python 3.6 only. I’ve added the “ElasticSearch” & “ElasticSearch-dsl” library.
I’m able to get connected to the Kibana via the client, but I’m getting the wrong hits count.
Code:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import MultiSearch, Search
from elasticsearch_dsl.query import QueryString, Range, SimpleQueryString
es = Elasticsearch(['host2', 'host2'], http_auth=('usr', 'pass'), port=9200)
s = Search(using=es, index='c*')
s.filter(SimpleQueryString(query="tags:prod AND severity:INFO AND service: finder AND msg:* is processed"))
s.filter(Range(** {'#timestamp': {'gte': 'now-5m', 'lt': 'now'}}))
response = s.execute()
print("Got %d Hits:" % response['hits']['total']['value']) # Always coming as 1000 so this is wrong
Can I get some help with this, please?
First of all a little clarification. You are connecting to Elasticsearch and not Kibana (Kibana is a client, like the program you are writing).
You are receiving always 10000 as result, because your index has more than 10000 hits. It is a documented feature. Indeed, since the count computation is expensive in the general case it is performed only when needed. In order to obtain the right number of results you have two possibilities
to set the query parameter track_total_hits to true
use the count API.
track_total_hits
You can add this extra parameter to the search object as reported here as follows:
s = Search(using=es, index='c*')
s = s.extra(track_total_hits=True)
<the-rest of your code>
Count API approach
Instead of invoking the execute() function, you can simply use the count() function:
s = Search(using=es, index='c*')
s.filter(SimpleQueryString(query="tags:prod AND severity:INFO AND service: finder AND msg:* is processed"))
s.filter(Range(** {'#timestamp': {'gte': 'now-5m', 'lt': 'now'}}))
response = s.cpunt()
print("Got %d Hits:" % response)
Kind regards

When working with the Stripe API, is it better to sort each request or store locally and perform queries?

This is my first post, I've been lurking for a while.
Some context to my question;
I'm working with the Stripe API to pull transaction data and match these with booking numbers from another API source. (property reservations --> funds received for reconciliation)
I started by just making calls to the API and sorting the data in place using python 3, however it started to get very complicated and I thought I should persist the data in a mongodb stored on localhost. I began to do this, however I decided that storing the sorted data was still just as complicated and the request times were getting quite long, I thought, maybe I should pull all the stripe data and store it locally and then query whatever I needed.
So here I am, with a bunch of code I've written for both and still not alot of progress. I'm a bit lost with the next move. I feel like I should probably pick a path and stick with it. I'm a little unsure what is the "best practise" when working with API's, usually I would turn to YouTube, but I haven't been able to find a video which covers this specific scenario. The amount of data being pulled from the API would be around 100kb per request.
Here is the original code which would grab each query. Recently I've learnt I can use the expand method (I think this is what it's called) so I don't need to dig down so many levels in my for loop.
The goal was to get just the metadata which contains the booking reference numbers that can then be match against a response from my property management systems API. My code is a bit embarrassing, I've kinda just learnt it over the last little while in my downtime from work.
import csv
import datetime
import os
import pymongo
import stripe
"""
We need to find a Valid reservation_ref or reservation_id in the booking.com Metadata. Then we need to match this to a property ID from our list of properties in the book file.
"""
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
stripe_payouts = mydb["stripe_payouts"]
stripe.api_key = "sk_live_thisismyprivatekey"
r = stripe.Payout.list(limit=4)
payouts = []
for data in r['data']:
if data['status'] == 'paid':
p_id = data['id']
amount = data['amount']
meta = []
txn = stripe.BalanceTransaction.list(payout=p_id)
amount_str = str(amount)
amount_dollar = str(amount / 100)
txn_len = len(txn['data'])
for x in range(txn_len):
if x != 0:
charge = (txn['data'][x]['source'])
if charge.startswith("ch_"):
meta_req = stripe.Charge.retrieve(charge)
meta = list(meta_req['metadata'])
elif charge.startswith("re_"):
meta_req = stripe.Refund.retrieve(charge)
meta = list(meta_req['metadata'])
if stripe_payouts.find({"_id": p_id}).count() == 0:
payouts.append(
{
"_id": str(p_id),
"payout": str(p_id),
"transactions": txn['data'],
"metadata": {
charge: [meta]
}
}
)
# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
x = stripe_payouts.insert_many(payouts)
print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
print("No entries made")
"_id": str(p_id),
"payout": str(p_id),
"transactions": txn['data'],
"metadata": {
charge: [meta]
This last section doesn't work properly, this is kinda where I stopped and starting calling all the data and storing it in mongodb locally.
I appreciate if you've read this wall of text this far.
Thanks
EDIT:
I'm unsure what the best practise is for adding additional information, but I've messed with the code below per the answer given. I'm now getting a "Key error" when trying to insert the entries into the database. I feel like It's duplicating keys somehow.
payouts = []
def add_metadata(payout_id, transaction_type):
transactions = stripe.BalanceTransaction.list(payout=payout_id, type=transaction_type, expand=['data.source'])
for transaction in transactions.auto_paging_iter():
meta = [transaction.source.metadata]
if stripe_payouts.Collection.count_documents({"_id": payout_id}) == 0:
payouts.append(
{
transaction.id: transaction
}
)
for data in r['data']:
p_id = data['id']
add_metadata(p_id, 'charge')
add_metadata(p_id, 'refund')
# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
x = stripe_payouts.insert_many(payouts)
#print(payouts)
print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
print("No entries made")```
To answer your high level question. If you're frequently accessing the same data and that data isn't changing much then it can make sense to try to keep your local copy of the data in sync and make your frequent queries against your local data.
No need to be embarrassed by your code :) we've all been new at something at some point.
Looking at your code I noticed a few things:
Rather than fetch all payouts, then use an if statement to skip all except paid, instead you can pass another filter to only query those paid payouts.
r = stripe.Payout.list(limit=4, status='paid')
You mentioned the expand [B] feature of the API, but didn't use it so I wanted to share how you can do that here with an example. In this case, you're making 1 API call to get the list of payouts, then 1 API call per payout to get the transactions, then 1 API call per charge or refund to get the metadata for charges or metadata for refunds. This results in 1 * (n payouts) * (m charges or refunds) which is a pretty big number. To cut this down, let's pass expand=['data.source'] when fetching transactions which will include all of the metadata about the charge or refund along with the transaction.
transactions = stripe.BalanceTransaction.list(payout=p_id, expand=['data.source'])
Fetching the BalanceTransaction list like this will only work as long as your results fit on one "page" of results. The API returns paginated [A] results, so if you have more than 10 transactions per payout, this will miss some. Instead, you can use an auto-pagination feature of the stripe-python library to iterate over all results from the BalanceTransaction list.
for transaction in transactions.auto_paging_iter():
I'm not quite sure why we're skipping over index 0 with if x != 0: so that may need to be addressed elsewhere :D
I didn't see how or where amount_str or amount_dollar was actually used.
Rather than determining the type of the object by checking the ID prefix like ch_ or re_ you'll want to use the type attribute. Again in this case, it's better to filter by type so that you only get exactly the data you need from the API:
transactions = stripe.BalanceTransaction.list(payout=p_id, type='charge', expand=['data.source'])
I'm unable to test because I lack the same database that you have, but wanted to share a refactoring of your code that you may consider.
r = stripe.Payout.list(limit=4, status='paid')
payouts = []
for data in r['data']:
p_id = data['id']
amount = data['amount']
meta = []
amount_str = str(amount)
amount_dollar = str(amount / 100)
transactions = stripe.BalanceTransaction.list(payout=p_id, type='charge', expand=['data.source'])
for transaction in transactions.auto_paging_iter():
meta = list(transaction.source.metadata)
if stripe_payouts.find({"_id": p_id}).count() == 0:
payouts.append(
{
"_id": str(p_id),
"payout": str(p_id),
"transactions": transactions,
"metadata": {
charge: [meta]
}
}
)
transactions = stripe.BalanceTransaction.list(payout=p_id, type='refund', expand=['data.source'])
for transaction in transactions.auto_paging_iter():
meta = list(transaction.source.metadata)
if stripe_payouts.find({"_id": p_id}).count() == 0:
payouts.append(
{
"_id": str(p_id),
"payout": str(p_id),
"transactions": transactions,
"metadata": {
charge: [meta]
}
}
)
# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
x = stripe_payouts.insert_many(payouts)
print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
print("No entries made")
Here's a further refactoring using functions defined to encapsulate just the bit adding to the database:
r = stripe.Payout.list(limit=4, status='paid')
payouts = []
def add_metadata(payout_id, transaction_type):
transactions = stripe.BalanceTransaction.list(payout=payout_id, type=transaction_tyep, expand=['data.source'])
for transaction in transactions.auto_paging_iter():
meta = list(transaction.source.metadata)
if stripe_payouts.find({"_id": payout_id}).count() == 0:
payouts.append(
{
"_id": str(payout_id),
"payout": str(payout_id),
"transactions": transactions,
"metadata": {
charge: [meta]
}
}
)
for data in r['data']:
p_id = data['id']
add_metadata('charge')
add_metadata('refund')
# TODO: Add error exception to check for po id already in the database.
if len(payouts) != 0:
x = stripe_payouts.insert_many(payouts)
print("Inserted into Database ", len(x.inserted_ids), x.inserted_ids)
else:
print("No entries made")
[A] https://stripe.com/docs/api/pagination
[B] https://stripe.com/docs/api/expanding_objects

Firebase... Add/Update Firebase Using node.js Script

I have arbitrary JSON that is sensibly laid out like this:
[
{
"id":100,
"name":"Buckeye, AZ",
"status":"OPEN",
"address":{
"street":"416 S Watson RD",
"city":"Buckeye"
...
}
}
]
I've written a node.js script like this for proof of concept (why I'm using node is that the JS API seems better supported than REST or Ruby for this. I could be wrong):
http = require('http')
Firebase = require('firebase')
all_sites_url = "http://supercharge.info/service/supercharge/allSites"
firebase_url = "https://tesla-supercharger.firebaseio.com/"
http.get(all_sites_url, (res) ->
body = ""
res.on "data", (chunk) ->
body += chunk
return
res.on "end", ->
response = JSON.parse(body)
all_sites = response
send_to_firebase(response)
return
return
).on "error", (e) ->
console.log "Got error: ", e
return
send_to_firebase = (response) ->
firebase_ref = new Firebase(firebase_url)
for charger in response
console.log charger
new_child = firebase_ref.push()
new_child.set {id: charger.id, data: charger}, (error) ->
if error
console.log "Data cound not be saved #{error}"
else
console.log "Data saved successfully"
The result is a unique id generated by Firebase, which has as a child a data and an id child. The data child has the expected information like name, status, etc.
What I'd prefer is to generate a key-value pair. E.g., for an id of 100:
- 100
- name
- address
street
city
etc. So my first question is how to accomplish this or if it is even sensible.
After the first time around, this data (call it the data from an external server) will be there and a mobile app will have added some fields. These are not present in the data already there. Next time I fetch data from the external server, I want to update things that have changed that the server would know about, like status. I don't want to tamper with things that only the mobile devices would know about like remote_observations.
I know I'm seeming a bit dense here, but I'm trying to put together a sensible data model that will be updatable from that server using a CRON job and incrementally updatable from a bunch of mobile devices.
Any help is much appreciated.
UPDATE: I have found that this works for getting the structure I want:
send_to_firebase = (response) ->
firebase_ref = new Firebase(firebase_url)
for charger in response
firebase_ref.child(charger.id).update charger, (error) ->
if error
console.log "Data could not be saved #{error}"
else
responses_pending += 1
console.log "Data saved successfully : #{responses_pending} pending"
firebase_ref.on 'value', ->
console.log "value received rp=#{responses_pending}"
process.exit() if (responses_pending -= 1) < 1
So the code I settled on is this:
http = require('http')
Firebase = require('firebase')
firebase_url = '/path/to/your/firebase'
# code to get JSON of the form:
{
"id":100,
"name":"Buckeye, AZ",
"status":"OPEN",
"address":{"street":"416 S Watson RD",
"city":"Buckeye",
"state":"AZ",
"zip":"85326",
"country":"USA"},
... etc.
}
# Asynchronous get of JSON hash from some server or other.
get_my_fine_JSON().on 'complete', (response) ->
send_to_firebase(response)
send_to_firebase = (response) ->
firebase_ref = new Firebase(firebase_url)
length = response.length
for charger in response
firebase_ref.child(charger.id).update charger, (error) ->
if error
console.log "Data could not be saved #{error}"
else
console.log "Data saved successfully"
process.exit() if length -= 1 is 0
Discussion:
The idea was to have a Firebase structure like this:
- 100
- address
street: "123 Main Street"
etc.
That's reason 1 why id is pulled up to be the primary key. Reason 2 is so that I can uniquely identify an object pulled off the external server as the "same" one in my Firebase and apply any updates necessary.
Epiphany 1: Update is more like upsert. If the key is there, whatever hash you supply replaces matching values. If it's not there, then Firebase happily adds it. Which is way cool because it covers both the push and patch cases.
Epiphany 2: This process will hang waiting for events if nothing tells it to stop. That's why the countdown index, length is decremented until the code has upserted (for lack of a better term) each item.
Observation 1: Doing this in node.js is super fast compared with REST using Python or Ruby. And this upsert stuff is wicked cool if I'm understanding it right.
Observation 2: There isn't a ton of wisdom out there as of this writing regarding writing node shell scripts to do this kind of stuff. Maybe it's a good idea, maybe a bad one. I don't know.
Observation 3: Because of the asynchronous nature of node and the Firebase Javascript API (both GOOD THINGs), terminating a process before the last bit is done can be tricky because your process has to hang on just long enough to complete its last request/response with Firebase. This is, as mentioned before, done in the completion handler of the update. Otherwise we wouldn't necessarily be complete when the process exited.
Caveat 1: Related to observation 2, this could be a bad idea, but I haven't been able to find resources that speak to the problem.
Caveat 2: This could be a horrid abuse or misunderstanding of the Firebase update API. I am reporting observed behavior in the limited case of my specific data. YMMV.
Caveat 3: I'm hoping the process lifetime is as I suggest it is in observation 3.
A note to the decaffeinated: The Javascript for this is so trivially different that it shouldn't be too tough to translate. Or go to js2coffee and paste the Coffeescript into the right pane to get real Javascript in the left pane that you can tune.

Resources