AWS CloudWatch Logs get_query_results returns empty when tried with boto3 - python-3.x

Using AWS's boto3, I am trying to start a query and get the results using the query ID, but it doesn't work as expected in a Python script. It returns the expected JSON output for start_query and I am able to fetch the queryId, but when I try to fetch the query results using that queryId, it returns empty JSON.
<code>
import boto3

client = boto3.client('logs')
executeQuery = client.start_query(
    logGroupName='LOGGROUPNAME',
    startTime=STARTDATE,
    endTime=ENDDATE,
    queryString='fields status',
    limit=10000
)
getQueryId = executeQuery.get('queryId')
getQueryResults = client.get_query_results(
    queryId=getQueryId
)
</code>
It returns the response of get_query_results as
{'results': [], 'statistics': {'recordsMatched': 0.0, 'recordsScanned': 0.0, 'bytesScanned': 0.0}, 'status': 'Running',
But if I try using the AWS CLI with the queryId generated from the script, it returns the JSON output as expected.
Can anyone tell me why it didn't work from the boto3 Python script but worked in the CLI?
Thank you.

The query status is Running in your example; it's not in the Complete status yet.
Running queries is not instantaneous. You have to wait a bit for the query to complete before you can get results.
You can use describe_queries to check whether your query has completed. You can also check whether the logs service has dedicated waiters in boto3 for the results; they would save you from polling the describe_queries API in a loop while waiting for your queries to finish.
When you do this in the CLI, there is probably more time between when you start the query and when you fetch its results.
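For example, a minimal polling sketch reusing the placeholders from the question (the one-second sleep and the check for the Scheduled status are assumptions, not part of the original answer):
<code>
import time
import boto3

client = boto3.client('logs')

executeQuery = client.start_query(
    logGroupName='LOGGROUPNAME',  # placeholder from the question
    startTime=STARTDATE,
    endTime=ENDDATE,
    queryString='fields status',
    limit=10000
)
queryId = executeQuery.get('queryId')

# Poll until the query leaves the Scheduled/Running states.
results = client.get_query_results(queryId=queryId)
while results['status'] in ('Scheduled', 'Running'):
    time.sleep(1)
    results = client.get_query_results(queryId=queryId)

print(results['results'])
</code>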

The other issue you might be encountering is that the syntax for the queryString in the API is different from a query you would type into the CloudWatch console.
Console query syntax example:
{ $.foo = "bar" && $.baz > 0 }
API syntax for same query:
filter foo = "bar" and baz > 0
Source: careful reading and extrapolation from the official documentation plus some trial-and-error.
My logs are in JSON format. YMMV.
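For instance, the API-style filter could be passed through start_query like this (a sketch reusing the question's placeholders):
<code>
executeQuery = client.start_query(
    logGroupName='LOGGROUPNAME',
    startTime=STARTDATE,
    endTime=ENDDATE,
    queryString='filter foo = "bar" and baz > 0'
)
</code>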

Not sure if this problem is resolved. I was facing the same issue with the AWS Java SDK, but when I terminate the thread performing the executeQuery call and perform the get_query_results using a new thread and the old queryId, it seems to work fine.

Adding a sleep will work here. If the query runs longer than the sleep time, it will again show the Running status, so you can write a loop that checks for the Complete status; if the status is still Running, sleep again for some seconds and retry. You can give some retry count here.
Sample, written out as Python (reusing client and queryId from the question; MAX_RETRIES and SLEEP_SECONDS are whatever retry count and sleep interval you choose):
import time

for attempt in range(MAX_RETRIES):
    results = client.get_query_results(queryId=queryId)
    if results['status'] == 'Complete':
        break
    time.sleep(SLEEP_SECONDS)

Related

Google Cloud Bigquery Library Error

I am receiving the error
Cannot set destination table in jobs with DDL statements
when I try to resubmit a job from the job._build_resource() function in the google.cloud.bigquery library.
It seems that the destination table is set to something like this after that function call:
'destinationTable': {'projectId': 'xxx', 'datasetId': 'xxx', 'tableId': 'xxx'},
Am I doing something wrong here? Thanks to anyone who can give me any guidance here.
EDIT:
The job is initially being triggered by this:
query = bq.query(sql_rendered)
We store the job id and use it later to check the status.
We get the job like this:
job = bq.get_job(job_id=job_id)
If it meets a condition (in this case, the job failed due to rate limiting), we retry the job.
We retry the job like this:
di = job._build_resource()
jo = bigquery.Client(project=self.project_client).job_from_resource(di)
jo._begin()
I think that's pretty much all of the code you need, but happy to provide more if needed.
You are seeing this error because you have a DDL statement in your query. What is happening is that the job_config is changing some values after the execution of the first query, particularly job_config.destination. To try to overcome this issue, you could reset the value of job_config.destination to None after each job submission, or use a different job_config for every query.
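Applied to the retry flow from the question, a hypothetical sketch (note that _build_resource is a private method, and the 'configuration'/'query' path into the resource dict is an assumption about the job resource layout):
<code>
# Drop the destination table from the rebuilt resource before
# resubmitting, since DDL jobs may not set one.
di = job._build_resource()
di.get('configuration', {}).get('query', {}).pop('destinationTable', None)
jo = bigquery.Client(project=self.project_client).job_from_resource(di)
jo._begin()
</code>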

How to record the timestamp of an XML request in SoapUI and use it in an assertion?

I have a test case in SoapUI NG Pro which has the following steps:
1. POST REST Request that starts a process
2. JDBC Request where I check that the process Start Date has been logged to a database table
3. Delay (to simulate the time it takes for the process to run)
4. JDBC Request where I check that the End Date and Duration have been logged to the table
I would like to capture the timestamp of the POST Request to use within my assertions in steps 2 and 4.
I have looked around online and some people have mentioned using Events, while others have mentioned using a Script TestStep, but I haven't been able to get either to work.
I can get the POST Response timestamp but am looking for the Request timestamp in particular. I also noticed that there is a timestamp in the Request Log but again I don't know how to access that.
Any help would be greatly appreciated. It's probably also worth mentioning that I am using JavaScript instead of Groovy.
You can add a Script Assertion for the Soap Request test step and add the below statement in order to show the time taken.
log.info messageExchange.response.timeTaken
If you want the above value to be accessible in other steps, then use the below, which stores the value as a test case level property so that it is easy to access in other steps of the same test case:
context.testCase.setPropertyValue('TIME_TAKEN', messageExchange.response.timeTaken.toString())
In the later steps, use Property Expansion to read the test case level property value:
def timeTaken = context.expand('${#TestCase#TIME_TAKEN}') as Integer

Right way to delete and then reindex ES documents

I have a python3 script that attempts to reindex certain documents in an existing ElasticSearch index. I can't update the documents because I'm changing from an autogenerated id to an explicitly assigned id.
I'm currently attempting to do this by deleting existing documents using delete_by_query and then indexing once the delete is complete:
self.elasticsearch.delete_by_query(
    index='%s_*' % base_index_name,
    doc_type='type_a',
    conflicts='proceed',
    wait_for_completion=True,
    refresh=True,
    body={}
)
However, the index is massive, and so the delete can take several hours to finish. I'm currently getting a ReadTimeoutError, which is causing the script to crash:
WARNING:elasticsearch:Connection <Urllib3HttpConnection: X> has failed for 2 times in a row, putting on 120 second timeout.
WARNING:elasticsearch:POST X:9200/base_index_name_*/type_a/_delete_by_query?conflicts=proceed&wait_for_completion=true&refresh=true [status:N/A request:140.117s]
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='X', port=9200): Read timed out. (read timeout=140)
Is my approach correct? If so, how can I make my script wait long enough for the delete_by_query to complete? There are two timeout parameters that can be passed to delete_by_query (search_timeout and timeout), but search_timeout defaults to no timeout (which is, I think, what I want), and timeout doesn't seem to do what I want. Is there some other parameter I can pass to delete_by_query to make it wait as long as it takes for the delete to finish? Or do I need to make my script wait some other way?
Or is there some better way to do this using the ElasticSearch API?
You should set wait_for_completion to False. In that case you'll get back task details and will be able to track the task's progress using the corresponding API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-task-api
Just to expand on Random's answer with code, for ES/Python newbies like me:
ES = Elasticsearch(['http://localhost:9200'])
query = {'query': {'match_all': {}}}
response = ES.delete_by_query(index='index_name', doc_type='sample_doc', wait_for_completion=False, body=query, ignore=[400, 404])
task_id = response['task']  # delete_by_query returns {'task': '<node>:<id>'} when not waiting
response_task = ES.tasks.get(task_id=task_id)  # check if the task is completed
isCompleted = response_task["completed"]  # if the completed key is true, the task is finished
One can write a custom function that checks at some interval, in a while loop, whether the task has completed.
I have used Python 3.x and Elasticsearch 6.x.
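For instance, a minimal sketch of such a polling helper (the five-second interval is an arbitrary choice):
<code>
import time

def wait_for_task(es, task_id, interval=5):
    # Poll the Tasks API until the delete-by-query task reports completed.
    while True:
        response_task = es.tasks.get(task_id=task_id)
        if response_task['completed']:
            return response_task
        time.sleep(interval)
</code>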
You can use the request_timeout global param. This overrides the connection's timeout setting, as mentioned here.
For example:
es.delete_by_query(index=<index_name>, body=<query>, request_timeout=300)
Or set it at the connection level, for example:
es = Elasticsearch(**(get_es_connection_parms()),timeout=60)

http://localhost:8529/_api/query/225 - syntax correct?

I am trying to abort a long-running query (with id 225) with the new API feature for the first time, but I can't get it to kill the query. The API request answers with
{"error":true,"code":404,"errorNum":404,"errorMessage":"not found"}
although the query is still running:
[
  {
    "id": "225",
    "query": [SNIP]
  }
]
What am I doing wrong?
Thanks in advance ...
I can only guess, as the question does not contain full information on what was actually posted and with which HTTP method.
My guess is that you used HTTP GET when you tried killing the query, and not HTTP DELETE. In that case the URL was probably correct but the HTTP method was not, and you will also get a 404 error.
There are two ways to terminate a running query:
using the ArangoShell
First of all, the query id needs to be determined. This can be achieved as follows:
require("org/arangodb/aql/queries").current();
Using the returned id value, the command to kill the query is:
require("org/arangodb/aql/queries").kill(id);
using HTTP
When the query id is known, it can be used in an HTTP DELETE request:
curl -X DELETE http://127.0.0.1:8529/_api/query/id
Again, id needs to be the query's real id.
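The same DELETE from Python, as a minimal sketch using the requests library (authentication omitted; 225 stands in for the real query id):
<code>
import requests

# Kill the running query via HTTP DELETE (a GET on this URL returns 404).
resp = requests.delete('http://127.0.0.1:8529/_api/query/225')
print(resp.status_code, resp.json())
</code>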

StackExchange Redis Client StringSet return false?

We have been using the StackExchange.Redis .NET client for several months without issue. Our logs indicate that StringSet returned false thousands of times over the course of an hour recently, but it is working as expected again.
I can't find what false means anywhere. I assume it means that the value was not put in the cache, but if that is correct, how do I tell why? The client is not throwing an exception. Can someone show me the API specification that describes the return value and how to troubleshoot?
We are running against Redis in Azure, if that matters.
result = cache.StringSet(fullKey, value, GetCacheTime(cacheType));
if (!result)
{
    if (_logger != null)
    {
        _logger.LogError("Failed to Set Cache");
    }
}
http://redis.io/commands/set
Simple string reply: OK if SET was executed correctly. Null reply: a Null Bulk Reply is returned if the SET operation was not performed because the user specified the NX or XX option but the condition was not met.
Though it looks like you are using SETEX (http://redis.io/commands/setex): are you setting a valid TimeSpan as the third argument?
SETEX is atomic, and can be reproduced by using the previous two commands inside a MULTI / EXEC block. It is provided as a faster alternative to the given sequence of operations, because this operation is very common when Redis is used as a cache. An error is returned when seconds is invalid.
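To see the null reply in practice, a minimal Python sketch with redis-py (not the .NET client from the question; redis-py surfaces the null reply as None, which .NET clients report as false):
<code>
import redis

r = redis.Redis()
r.set('key', 'v1')                  # plain SET: returns True (OK)
print(r.set('key', 'v2', nx=True))  # None: NX condition not met, SET not performed
</code>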
