I was testing a CSV download of my app events using the API.
I noticed that the CSV had different event counts across calls for the same time period.
The data in each download was correct for my app and the requested time period.
Does anyone know whether Flurry samples the data when it creates the file for download?
Edited to include a sample call, the extraction code, and the results of 2 calls for the same time period.
Call
str_init = '20191101'
str_end = '20191102'
# Call data extraction for Flurry from IOS app
get_csv_from_flurry(str_init, str_end, 'IOS')
Code for Extraction
from datetime import datetime
from dateutil import parser
import requests
import json
import time
from functions.ribon_path import ribon_root_path_join
from functions.ribon_s3_integration import ribon_upload_to_s3
"""
Make CSV extraction from flurry based on initial date (yyyy-mm-dd), end date (yyyy-mm-dd) and platform
Save Uncompressed CSV locally for processing
Save compressed file (parquet) to S3 for backup
"""
def get_csv_from_flurry(str_ini, str_end, str_platform):
    # Convert time period to datetime format
    dt_ini = parser.parse(str_ini)
    dt_end = parser.parse(str_end)

    def unix_time_millis(dt):
        # Convert date periods to unix millisecond epoch
        epoch = datetime.utcfromtimestamp(0)
        return (dt - epoch).total_seconds() * 1000.0

    epoch_ini = unix_time_millis(dt_ini)
    epoch_end = unix_time_millis(dt_end)
    #print(epoch_ini)
    #print(epoch_end)

    if str_platform == 'IOS':
        Flurry_apiKey = 'XXX'
    else:
        Flurry_apiKey = 'YYY'

    # Build the parameters of the post request to the flurry API
    url = 'https://rawdata.flurry.com/pulse/v1/rawData'
    payload = {"data": {
        "type": "rawData",
        "attributes": {
            "startTime": epoch_ini,
            "endTime": epoch_end,
            "outputFormat": "CSV",
            "apiKey": Flurry_apiKey
        }
    }
    }
    headers = {"accept": "application/vnd.api+json",
               "authorization": "Bearer ZZZ",
               "cache-control": "no-cache",
               "content-type": "application/vnd.api+json"
               }
    #print(payload)

    # Make the request
    print('Make Request to Flurry')
    r = requests.post(url, data=json.dumps(payload), headers=headers)
    #print(r.content)

    # Test the return, get the status, download url and request id
    test = r.json()
    #print(test['data']['attributes']['s3URI'])
    #print(test['data']['id'])
    r_s3URI = test['data']['attributes']['s3URI']
    r_id = test['data']['id']

    # Check if the download link is ready
    url = 'https://rawdata.flurry.com/pulse/v1/rawData/' + r_id + '?fields[rawData]=requestStatus,s3URI'
    #print(url)
    payload = {}
    headers = {"accept": "application/vnd.api+json",
               "authorization": "Bearer ZZZ",
               "cache-control": "no-cache",
               "content-type": "application/vnd.api+json"
               }
    print('Request OK')

    # Check each minute if the download link is ready
    print('Start Polling to Check if the File is Ready for Download')
    while r_s3URI is None:
        time.sleep(60)
        # Make the request
        r = requests.get(url, data=json.dumps(payload), headers=headers)
        print(r.content)
        test = r.json()
        #print(test['data']['attributes']['s3URI'])
        r_s3URI = test['data']['attributes']['s3URI']

    # When the download is ready, get the file and save it
    # Set local folder to save the file
    flurry_filename = str_ini + '_' + str_end + '_' + str_platform + '.csv.gz'
    flurry_path_gz = ribon_root_path_join('data', 'Flurry_Download', flurry_filename)

    # Download the file
    print('Start Flurry Download')
    myfile = requests.get(r_s3URI)
    open(flurry_path_gz, 'wb').write(myfile.content)
The linked image shows the 2 files I got; they are not the same size and do not have the same number of records.
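For reference, the counts in the two downloads can be compared with a quick check like this (the file names below are illustrative; they just follow the naming scheme used in the code above):

import gzip

def count_rows(path):
    # Count data rows in a gzipped CSV, excluding the header line
    with gzip.open(path, 'rt', encoding='utf-8', errors='replace') as f:
        return sum(1 for _ in f) - 1

first = count_rows('20191101_20191102_IOS_call1.csv.gz')
second = count_rows('20191101_20191102_IOS_call2.csv.gz')
print('Rows in first call :', first)
print('Rows in second call:', second)
print('Difference         :', second - first)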
With help from Flurry Support, I found out what explains the differences.
For time periods older than 15 days, the API calls return the same numbers every time.
API calls for dates within the last 15 days usually return different results (newer calls contain more records). The older the requested period, the smaller the difference, so I agree with support that this can be attributed to late-arriving events.
Flurry is not a real-time service: it queues data on the mobile device and flushes it to the server later.
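A practical consequence is to treat any window newer than about 15 days as provisional and re-download it later. A small sketch of that idea, reusing get_csv_from_flurry from the question (the 15-day threshold is the one quoted by support; the helper below is only illustrative):

from datetime import datetime, timedelta
from dateutil import parser

SETTLE_DAYS = 15  # per Flurry support, counts stop changing after roughly 15 days

def is_final(str_end):
    # A window is considered final once its end date is older than the settle period
    return parser.parse(str_end) < datetime.now() - timedelta(days=SETTLE_DAYS)

str_init, str_end = '20191101', '20191102'
get_csv_from_flurry(str_init, str_end, 'IOS')
if not is_final(str_end):
    print('Window ends less than 15 days ago; counts may still grow, schedule a re-download.')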
Related
I am trying to upload a file to a website (that has an inbuilt API) using the following code. The code reads a list of medical codes/diagnoses codes etc. (1 column in a text file) and uploads it to the required page.
Issue:
After uploading the file, I noticed that the number of pages is not coming out properly. There can be up to 4000 codes (lines) in the file. The code list page on the website shows 20 lines per page, which means I would expect at least 200 pages to be there after uploading. This is not happening, and I am not sure what mistake I am making.
Also, I am new to Python (my background is primarily SAS) and have been automating bits and pieces of code. One such automation is this exercise. The goal here is to upload multiple files to the said URL; today the team uploads them one by one manually. With the knowledge I picked up from tutorials and other sources, I was able to come up with this.
import requests
import json
import os
import random
import pandas as pd
import time
token = os.environ.get("USER_TOKEN")
user_id = os.environ.get("USER_ID")
user_name = os.environ.get("USER_NAME")
headers = {"X-API-Key": token}
url = 'https://XXXXXXXXXXXX.com/api/code_lists'
session=requests.session()
cl = session.get(url, headers=headers).json()
def uploading_files(file, name, kind, coding_system, rand_id):
    df = pd.read_table(file, converters={0: str}, header=None)
    print("Came In")
    CODES = df[0].astype('str').tolist()
    codes = {"codes": CODES}
    new_cl = {"_id": rand_id, "name": name, "project_group": "TEST BETA", "kind": kind,
              "coding_system": coding_system, "user": user_id, "creator": user_name, "creation_method": "Upload", "is_category_mapping": False,
              "assoc_users": [], "global": True, "readonly": False, "description": "", "num_codes": len(CODES)}
    request_json = json.dumps(new_cl)
    print(request_json)
    codes_json = json.dumps(codes)
    print(codes_json)
    session.post(url, data=request_json)
    session.put(url + '/' + rand_id, data=codes_json)

text_Files = os.listdir(r'C://Users//XXXXXXXXXXXXX//data')
for i in text_Files:
    if ".txt" in i:
        x = i.split("_")
        file = 'C://Users//XXXXXXXXXXXXX//data//' + i
        name = ""
        for j in i[:-4]:
            if j != "_":
                name += j
            elif j == "_":
                name += " "
        kind = x[2]
        coding_system = x[3][:-4]
        rand_id = "".join(random.choice("0123456789abcdef") for i in range(24))
        print("-------------START-----------------")
        print("file : ", file)
        print("name : ", name)
        print("kind : ", kind)
        print("coding system : ", coding_system)
        print("Rand_Id : ", rand_id)
        uploading_files(file, name, kind, coding_system, rand_id)
        time.sleep(2)
        print("---------------END---------------")
        print("")
        break  # to upload only 1 file in the directory
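For debugging, this is the kind of check I plan to add inside uploading_files to confirm the server actually accepts each request (untested sketch; resending the headers on the POST/PUT, and whether GET /api/code_lists/<id> exists and returns num_codes, are my assumptions):

# Inside uploading_files, after building request_json and codes_json:
resp_create = session.post(url, data=request_json, headers=headers)
print("create list :", resp_create.status_code, resp_create.text[:200])

resp_codes = session.put(url + '/' + rand_id, data=codes_json, headers=headers)
print("upload codes:", resp_codes.status_code, resp_codes.text[:200])

# Read the list back and compare counts (endpoint and field name are assumptions)
check = session.get(url + '/' + rand_id, headers=headers)
print("stored:", check.json().get("num_codes"), "sent:", len(CODES))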
Example data in the file (testfile.txt)
C8900
C8901
C8902
C8903
C8904
C8905
C8906
C8907
C8908
C8909
C8910
C8911
C8912
C8913
C8914
C8918
C8919
C8920
C8921
C8922
C8923
C8924
C8925
C8926
C8927
C8928
C8929
C8930
C8931
C8932
C8933
C8934
C8935
C8936
C9723
C9744
C9762
C9763
C9803
D0260
[Screenshots in the original post: sample data snapshot, wrong representation after upload, expected result.]
I am trying to send a POST request twice to www.footlocker.it.
sess = requests.session()
print("start-Point")
bot = BotDetector()
payload = "{\"sensor_data\":\"" + bot.generatesensordata() + "\"}"
d = sess.post(url_ak, headers=headers_ak, data=payload, verify=False, timeout=15)
bot.cookie = sess.cookies["_abck"]
payload = "{\"sensor_data\":\"" + bot.generatesensordata1() + "\"}"
d = sess.post(url_ak, headers=headers_ak, data=payload, verify=False, timeout=15)
print('Status code {},'.format(d.status_code))
print('Header {},'.format(d.headers))
The goal is to get a valid _abck cookie and a "success: true" response.
I have written some custom code for the BotDetector class, but I can't get past the protection with good results.
That most likely means your sensor data is bad. Take a look at the Akamai script for the site and compare it to what you generate now.
This code acts as an early warning system for ADFS failures and works fine when run locally. The problem is that when I run it in Lambda, it loops non-stop.
In short:
lambda_handler() runs pagecheck()
pagecheck() produces the info needed then passes 2 lists (msgdet_list, error_list) and an int (error_count) to notification().
notification() collates and prints the output. The output is two key variables (notificationheader and notificationbody).
I've #commentedOut the SNS piece which would usually email the info, and am using print() instead to send the info to CloudWatch Logs until I can get the loop sorted (the linked screenshot shows the CloudWatch logs).
If I run this locally, it produces a clean single output. In Lambda, the function loops until it times out. It's almost like every time the lists are updated, they're passed to the notification() function and it runs again. I can limit the function timeout, but I would rather fix the code!
Cheers,
tac
# This python/boto3/lambda script sends a request to an Office 365 landing page, parses return details to confirm a successful redirect to /
# the organisation ADFS homepage, authenticates that the homepage is correct, raises any errors, and sends a consolidated report to /
# an AWS SNS topic.
# Run once to produce pageserver and htmlchar values for global variables.
# Import required modules
import boto3
import urllib.request
from urllib.request import Request, urlopen
from datetime import datetime
import time
import re
import sys
# Global variables to be set
url = "https://outlook.com/CONTOSSO.com"
adfslink = "https://sts.CONTOSSO.com/adfs/ls/?client-request-id="
# Input after first run
pageserver = "Microsoft-HTTPAPI/2.0 Microsoft-HTTPAPI/2.0"
htmlchar = 18600
# Input AWS SNS ARN
snsarn = 'arn:aws:sns:ap-southeast-2:XXXXXXXXXXXXX:Daily_Check_Notifications_CONTOSSO'
sns = boto3.client('sns')
def pagecheck():
    # Present the request to the webpage as if coming from a user in a browser
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    values = {'name' : 'user'}
    headers = { 'User-Agent' : user_agent }
    data = urllib.parse.urlencode(values)
    data = data.encode('ascii')

    # "Null" the Message Detail and Error lists
    msgdet_list = []
    error_list = []

    request = Request(url)
    req = urllib.request.Request(url, data, headers)
    response = urlopen(request)
    with urllib.request.urlopen(request) as response:

        # Get the URL. This gets the real URL.
        acturl = response.geturl()
        msgdet_list.append("\nThe Actual URL is:")
        msgdet_list.append(str(acturl))
        if adfslink not in acturl:
            error_list.append(str("Redirect Fail"))

        # Get the HTTP response code
        httpcode = response.code
        msgdet_list.append("\nThe HTTP code is: ")
        msgdet_list.append(str(httpcode))
        if httpcode//200 != 1:
            error_list.append(str("No HTTP 2XX Code"))

        # Get the Headers as a dictionary-like object
        headers = response.info()
        msgdet_list.append("\nThe Headers are:")
        msgdet_list.append(str(headers))
        if response.info() == "":
            error_list.append(str("Header Error"))

        # Get the date of request and compare to UTC (DD MMM YYYY HH MM)
        date = response.info()['date']
        msgdet_list.append("The Date is: ")
        msgdet_list.append(str(date))
        returndate = str(date.split( )[1:5])
        returndate = re.sub(r'[^\w\s]','',returndate)
        returndate = returndate[:-2]
        currentdate = datetime.utcnow()
        currentdate = currentdate.strftime("%d %b %Y %H%M")
        if returndate != currentdate:
            date_error = ("Date Error. Returned Date: ", returndate, "Expected Date: ", currentdate, "Times in UTC (DD MMM YYYY HH MM)")
            date_error = str(date_error)
            date_error = re.sub(r'[^\w\s]','',date_error)
            error_list.append(str(date_error))

        # Get the server
        headerserver = response.info()['server']
        msgdet_list.append("\nThe Server is: ")
        msgdet_list.append(str(headerserver))
        if pageserver not in headerserver:
            error_list.append(str("Server Error"))

        # Get all HTML data and confirm no major change to content size by character length (global var: htmlchar).
        html = response.read()
        htmllength = len(html)
        msgdet_list.append("\nHTML Length is: ")
        msgdet_list.append(str(htmllength))
        msgdet_list.append("\nThe Full HTML is: ")
        msgdet_list.append(str(html))
        msgdet_list.append("\n")
        if htmllength // htmlchar != 1:
            error_list.append(str("Page HTML Error - incorrect # of characters"))
        if adfslink not in str(acturl):
            error_list.append(str("ADFS Link Error"))

    error_list.append("\n")
    error_count = len(error_list)
    if error_count == 1:
        error_list.insert(0, 'No Errors Found.')
    elif error_count == 2:
        error_list.insert(0, 'Error Found:')
    else:
        error_list.insert(0, 'Multiple Errors Found:')

    # Pass completed results and data to the notification() module
    notification(msgdet_list, error_list, error_count)


# Use AWS SNS to create a notification email with the additional data generated
def notification(msgdet_list, error_list, errors):
    datacheck = str("\n".join(msgdet_list))
    errorcheck = str("\n".join(error_list))
    notificationbody = str(errorcheck + datacheck)
    if errors >1:
        result = 'FAILED!'
    else:
        result = 'passed.'
    notificationheader = ('The daily ADFS check has been marked as ' + result + ' ' + str(errors) + ' ' + str(error_list))
    if result != 'passed.':
        # message = sns.publish(
        #     TopicArn = snsarn,
        #     Subject = notificationheader,
        #     Message = notificationbody
        #     )
        # Output result to CloudWatch logstream
        print('Response: ' + notificationheader)
    else:
        print('passed')
    sys.exit()


# Trigger the Lambda handler
def lambda_handler(event, context):
    aws_account_ids = [context.invoked_function_arn.split(":")[4]]
    pagecheck()
    return "Successful"
    sys.exit()
Your CloudWatch logs contain the following error message:
Process exited before completing request
This is caused by invoking sys.exit() in your code. Locally your Python interpreter will just terminate when encountering such a sys.exit().
AWS Lambda, on the other hand, expects a Python function to simply return, and treats sys.exit() as an error. As your function was probably invoked asynchronously, AWS Lambda retries the execution twice.
To solve your problem, you can replace the occurrences of sys.exit() with return, or, even better, just remove the sys.exit() calls, as there are already implicit returns in the places where you use sys.exit().
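To make the suggested change concrete, here is a trimmed sketch of how notification() and the handler can end up once the sys.exit() calls are removed (pagecheck() is the poster's function above; this is illustrative, not a tested drop-in):

def notification(msgdet_list, error_list, errors):
    datacheck = "\n".join(msgdet_list)
    errorcheck = "\n".join(error_list)
    notificationbody = errorcheck + datacheck
    result = 'FAILED!' if errors > 1 else 'passed.'
    notificationheader = 'The daily ADFS check has been marked as ' + result + ' ' + str(errors)
    if result != 'passed.':
        print('Response: ' + notificationheader)
    else:
        print('passed')
    # No sys.exit() here: just fall through so the Lambda invocation can finish cleanly
    return notificationheader, notificationbody

def lambda_handler(event, context):
    pagecheck()          # the poster's function above
    return "Successful"  # and no sys.exit() after the return either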
I am very new to Python. I've written an API call for a large amount of electricity settlement data, largely following the API instructions. The secured API limits requests to 50,000 rows per request. The API instructions offer an HTTP response header, "X-TotalRows", to assist in looping through the entire data set, which may be millions of rows.
How do I write the loop for the Python call to append all data, 50k rows at a time? I've included my code for the initial data set (rows 1-50,000) but do not know how to use the "X-TotalRows" header to append the rest of the data set, 50k at a time.
The instructions recommend looping through the data using the "X-TotalRows" header and altering the startRow parameter to 1 + rowCount on each pass.
This may seem elementary but I've searched and experimented for hours and hours trying to crack this code. Any help is appreciated.
import http.client, urllib.request, urllib.parse, urllib.error, base64

headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': 'xyz', 'content-type': 'application/json'
}

params = urllib.parse.urlencode({
    # Request parameters
    'download': 'true',
    'rowCount': '50000',
    'sort': 'datetime_beginning_ept',
    'order': 'asc',
    'startRow': 1,
    'isActiveMetadata': 'true',
    'fields': 'datetime_beginning_utc, datetime_beginning_ept, pnode_id, pnode_name, voltage, equipment, type, zone, system_energy_price_da, total_lmp_da, congestion_price_da, marginal_loss_price_da, row_is_current, version_nbr',
    #'datetime_beginning_utc': '{string}',
    'datetime_beginning_ept': '1-1-2018 00:00 to 1-31-2018 23:00',
    #'pnode_id': '{number}',
    #'pnode_name': '{string}',
    #'voltage': '{string}',
    #'equipment': '{string}',
    #'type': '{string}',
    'zone': 'aep;comed;pseg'
})

try:
    conn = http.client.HTTPSConnection('api.pjm.com')
    conn.request("GET", "/api/v1/da_hrl_lmps?%s" % params, "{body}", headers)
    response = conn.getresponse()
    print(response.status, response.reason)
    data = response.read()
    #print(data)
    conn.close()
    file = open('output.txt', 'w')
    s = str(data)
    file.write(s)
    file.close()
    print("Go to output.txt")
except Exception as e:
    print("[Errno {0}] {1}".format(e.errno, e.strerror))
I found the answer:
strtotalcount = response.headers['X-TotalRows']
totalcount = int(strtotalcount)
Then loop through the data in rowCount-sized chunks until totalcount rows have been fetched, writing each chunk to the file in append mode, as in the sketch below.
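A minimal sketch of that loop, sticking with http.client as in the question (not tested against the live API; base_params stands for the same parameter dictionary used in the question, minus rowCount and startRow, and headers is the dictionary from the question):

import http.client
import urllib.parse

row_count = 50000
start_row = 1
total_rows = None  # filled in from the X-TotalRows header of the first response

with open('output.txt', 'w') as out:
    while total_rows is None or start_row <= total_rows:
        query = dict(base_params, rowCount=str(row_count), startRow=str(start_row))
        conn = http.client.HTTPSConnection('api.pjm.com')
        conn.request("GET", "/api/v1/da_hrl_lmps?%s" % urllib.parse.urlencode(query), headers=headers)
        response = conn.getresponse()
        if total_rows is None:
            total_rows = int(response.headers['X-TotalRows'])
        # Each chunk is appended to the same open file; repeated CSV header
        # lines from later chunks may need to be stripped afterwards.
        out.write(response.read().decode('utf-8', errors='replace'))
        conn.close()
        start_row += row_count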
I have a problem with the Burp Suite API: I can't find a proper function to print out the response for edited requests. I'm developing a new plugin for Burp Suite with Python. My script simply takes requests from the proxy, edits them, and sends them again.
from burp import IBurpExtender
from burp import IHttpListener
import re,urllib2

class BurpExtender(IBurpExtender, IHttpListener):

    def registerExtenderCallbacks(self, callbacks):
        self._callbacks = callbacks
        self._helpers = callbacks.getHelpers()
        callbacks.setExtensionName("Burp Plugin Python Demo")
        callbacks.registerHttpListener(self)
        return

    def processHttpMessage(self, toolFlag, messageIsRequest, currentRequest):
        # only process requests
        if messageIsRequest:
            requestInfo = self._helpers.analyzeRequest(currentRequest)
            #timestamp = datetime.now()
            #print "Intercepting message at:", timestamp.isoformat()
            headers = requestInfo.getHeaders()
            #print url
            if(requestInfo.getMethod() == "GET"):
                print "GET"
                print requestInfo.getUrl()
                response = urllib2.urlopen(requestInfo.getUrl())
                print response
            elif(requestInfo.getMethod() == "POST"):
                print "POST"
                print requestInfo.getUrl()
                #for header in headers:
                #    print header
                bodyBytes = currentRequest.getRequest()[requestInfo.getBodyOffset():]
                bodyStr = self._helpers.bytesToString(bodyBytes)
                bodyStr = re.sub(r'=(\w+)','=<xss>',bodyStr)
                newMsgBody = bodyStr
                newMessage = self._helpers.buildHttpMessage(headers, newMsgBody)
                print "Sending modified message:"
                print "----------------------------------------------"
                print self._helpers.bytesToString(newMessage)
                print "----------------------------------------------\n\n"
                currentRequest.setRequest(newMessage)
        return
You want to print the response, but you don't do anything when messageIsRequest is false. When messageIsRequest is false, the message passed to processHttpMessage is a response, and you can print it out just as you did for the request. I did it in Java; translated to Python it looks like this:
def processHttpMessage(self, toolFlag, messageIsRequest, httpRequestResponse):
    if messageIsRequest:
        ....
    else:
        HTTPMessage = httpRequestResponse.getResponse()
        print HTTPMessage
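For a slightly more useful dump, the response can also be run through the helpers. A sketch of the same method in Python, inside the BurpExtender class, using analyzeResponse from the standard helpers (untested here):

def processHttpMessage(self, toolFlag, messageIsRequest, httpRequestResponse):
    if messageIsRequest:
        pass  # request handling as in the question
    else:
        responseBytes = httpRequestResponse.getResponse()
        responseInfo = self._helpers.analyzeResponse(responseBytes)
        print "Status code:", responseInfo.getStatusCode()
        # The body starts after the headers
        body = responseBytes[responseInfo.getBodyOffset():]
        print self._helpers.bytesToString(body)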
There is even a method that lets you bind request and response together when using a proxy. It can be found in IInterceptedProxyMessage:
/**
* This method retrieves a unique reference number for this
* request/response.
*
* @return An identifier that is unique to a single request/response pair.
* Extensions can use this to correlate details of requests and responses
* and perform processing on the response message accordingly.
*/
int getMessageReference();
I don't think it is supported for HTTPListeners.
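If the goal is to correlate an edited request with its response, a proxy listener can use that reference number. A rough sketch using the standard IProxyListener registration (untested):

from burp import IBurpExtender, IProxyListener

class BurpExtender(IBurpExtender, IProxyListener):

    def registerExtenderCallbacks(self, callbacks):
        self._helpers = callbacks.getHelpers()
        callbacks.setExtensionName("Proxy correlation demo")
        callbacks.registerProxyListener(self)

    def processProxyMessage(self, messageIsRequest, message):
        # The same reference number appears for a request and its response
        ref = message.getMessageReference()
        if messageIsRequest:
            print "Request", ref
        else:
            responseBytes = message.getMessageInfo().getResponse()
            print "Response", ref, "-", len(responseBytes), "bytes"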
I write my extensions in Java and tried to translate to Python for this answer. I haven't tested this code, and some bugs might have been introduced in translation.