we need create custom last evaluated key
The use case goes Here:
First scan table its having ten records,the tenth record should my last evaluated key when i do second time scan operation
Thanks in advance
#Exclusive start key should be null for your first page
esk = None
#Get the first page
scan_generator = MyTable.scan(Limit=10, exclusive_start_key=esk)
#Get the key for the next page
esk = scan_generator.kwargs['exclusive_start_key'].values()
#Now get page 2
scan_generator = MyTable.scan(Limit=10, exclusive_start_key=esk)
EDIT:
exclusive_start_key (list or tuple) – Primary key of the item from which to continue an earlier query. This would be provided as the LastEvaluatedKey in that query.
For example
exclusive_start_key = ('myhashkey');
or
exclusive_start_key = ('myhashkey', 'myrangekey');
Related
I have a collection which holds documents, with each document having a data observation and the time that the data was captured.
e.g.
{
_key:....,
"data":26,
"timecaptured":1643488638.946702
}
where timecaptured for now is a utc timestamp.
What I want to do is get the duration between consecutive observations, with SQL I could do this with LAG for example, but with ArangoDB and AQL I am struggling to see how to do this at the database. So effectively the difference in timestamps between two documents in time order. I have a lot of data and I don't really want to pull it all into pandas.
Any help really appreciated.
Although the solution provided by CodeManX works, I prefer a different one:
FOR d IN docs
SORT d.timecaptured
WINDOW { preceding: 1 } AGGREGATE s = SUM(d.timecaptured), cnt = COUNT(1)
LET timediff = cnt == 1 ? null : d.timecaptured - (s - d.timecaptured)
RETURN timediff
We simply calculate the sum of the previous and the current document, and by subtracting the current document's timecaptured we can therefore calculate the timecaptured of the previous document. So now we can easily calculate the requested difference.
I only use the COUNT to return null for the first document (which has no predecessor). If you are fine with having a difference of zero for the first document, you can simply remove it.
However, neither approach is very straight forward or obvious. I put on my TODO list to add an APPEND aggregate function that could be used in WINDOW and COLLECT operations.
The WINDOW function doesn't give you direct access to the data in the sliding window but here is a rather clever workaround:
FOR doc IN collection
SORT doc.timecaptured
WINDOW { preceding: 1 }
AGGREGATE d = UNIQUE(KEEP(doc, "_key", "timecaptured"))
LET timediff = doc.timecaptured - d[0].timecaptured
RETURN MERGE(doc, {timediff})
The UNIQUE() function is available for window aggregations and can be used to get at the desired data (previous document). Aggregating full documents might be inefficient, so a projection should do, but remember that UNIQUE() will remove duplicate values. A document _key is unique within a collection, so we can add it to the projection to make sure that UNIQUE() doesn't remove anything.
The time difference is calculated by subtracting the previous' documents timecaptured value from the current document's one. In the case of the first record, d[0] is actually equal to the current document and the difference ends up being 0, which I think is sensible. You could also write d[-1].timecaptured - d[0].timecaptured to achieve the same. d[1].timecaptured - d[0].timecaptured on the other hand will give you the inverted timestamp for the first record because d[1] is null (no previous document) and evaluates to 0.
There is one risk: UNIQUE() may alter the order of the documents. You could use a subquery to sort by timecaptured again:
LET timediff = doc.timecaptured - (
FOR dd IN d SORT dd.timecaptured LIMIT 1 RETURN dd.timecaptured
)[0]
But it's not great for performance to use a subquery. Instead, you can use the aggregation variable d to access both documents and calculate the absolute value of the subtraction so that the order doesn't matter:
LET timediff = ABS(d[-1].timecaptured - d[0].timecaptured)
I have Profile table with a huge number of rows. I was trying to filter out profiles based on super_category and account_id (these are the fields in the model Profile).
Assume I have a list of ids in the form of bulk_account_ids and super_categories
list_of_ids = Profile.objects.filter(account_id__in=bulk_account_ids, super_category__in=super_categories).values_list('id', flat=True))
list_of_ids = list(list_of_ids)
SomeTask.delay(ids=list_of_ids)
This particular query is timing out while it gets evaluated in the second line.
Can I use .iterator() at the end of the query to optimize this?
i.e list(list_of_ids.iterator()), if not what else I can do?
I am trying to check if a value is in SQLite with python to then either update the table if the value exists or create a new value if it is not. I have tried to create a cursor to check rows, append the rows to a list with loop, check if value exists, check the count of the rows... I seems to get hung up on the if statement when trying to access the value initialized from the query. Here is the code:
checkT = db.execute("SELECT COUNT(*) FROM trans WHERE stock=:stock AND id=:user_id", stock=request.form.get("symbol"), user_id=session["user_id"])
if checkT > 0:
print("there")
else:
print("not there")
How can I fix this? Thank you!
From the CS50 Library for Python doc for execute
Returns
for SELECTs, a list of dict objects, each of which represents a row in the result set; for INSERTs, the primary key of a newly inserted row (or None if none); for UPDATEs, the number of rows updated; for DELETEs, the number of rows deleted; for CREATEs, True on success; on error, a RuntimeError is raised
checkT is a list with one element, which is a dict with one key/value pair.
This checkT[0]['COUNT(*)'] will give the number returned from the sql. Counting the rows would not be appropriate in this case because this query will always return one row.
One hint: column names in a SELECT can be aliased, given a different name, like so:
SELECT COUNT(*) as count from....... It would just be typing convenience, because then the key in the returned dict will be count instead of COUNT(*).
Remember: in the flask run terminal there is a traceback with gives more details information on the error received (assuming "hung up" means a 500 Internal Server Error).
I have a Paginated List displayed on the visual force page and in the backend I was using a StandardSetController to control the pagination. However, one column on the table is an aggregated field whose calculation is done in a wrapper class. Recently, I want to sort the paginated list against the calculated field. And unfortunately the calculated result cannot be done on the data model(SObject) level.
So I am thinking to passed a sorted list of SObject to the StandardSetController constructor. That is to sort the record before it has been pass into the StandardSetController.
The code is like below:
List<Job__c> jobs = new List<Job__c>();
List<Job__c> tempJobs = Database.Query(basicQuery + filterExpression);
//sort with values
List<JobWrapper> jws = createJobWrappers(tempJobs);
JobWrapper.sortBy = JobWrapper.SORTBY_CALCULATEDFIELD_ASC;
jws.sort();
for(JobWrapper jw : jws){
jobs.add(jw.JobRecord);
}
jobs = jobs.deepClone(true, true, true);
StandardSetController con = new ApexPages.StandardSetController(jobs);
con.setPageSize(10);
However after executing the last line system throw exception:Modified rows exist in the records collection!
I did not modify any rows in the controller. Could anyone help me understanding the exception?
I am trying to query the WadPerformanceCountersTable generated by Azure Diagnostics which has a PartitionKey based on tick marks accurate up to the minute. This PartitionKey is stored as a string (which I do not have any control over).
I want to be able to query against this table to get data points for every minute, every hour, every day, etc. so I don't have to pull all of the data (I just want a sampling to approximate it). I was hoping to using the modulus operator to do this, but since the PartitionKey is stored as a string and this is an Azure Table, I am having issues.
Is there any way to do this?
Non-working example:
var query =
(from entity in ServiceContext.CreateQuery<PerformanceCountersEntity>("WADPerformanceCountersTable")
where
long.Parse(entity.PartitionKey) % interval == 0 && //bad for a variety of reasons
String.Compare(entity.PartitionKey, partitionKeyEnd, StringComparison.Ordinal) < 0 &&
String.Compare(entity.PartitionKey, partitionKeyStart, StringComparison.Ordinal) > 0
select entity)
.AsTableServiceQuery();
If you just want to get a single row based on two different time interval (now and N time back) you can use the following query which returns the single row as described here:
// 10 minutes span Partition Key
DateTime now = DateTime.UtcNow;
// Current Partition Key
string partitionKeyNow = string.Format("0{0}", now.Ticks.ToString());
DateTime tenMinutesSpan = now.AddMinutes(-10);
string partitionKeyTenMinutesBack = string.Format("0{0}", tenMinutesSpan.Ticks.ToString());
//Get single row sample created last 10 mminutes
CloudTableQuery<WadPerformanceCountersTable> cloudTableQuery =
(
from entity in ServiceContext.CreateQuery<PerformanceCountersEntity>("WADPerformanceCountersTable")
where
entity.PartitionKey.CompareTo(partitionKeyNow) < 0 &&
entity.PartitionKey.CompareTo(partitionKeyTenMinutesBack) > 0
select entity
).Take(1).AsTableServiceQuery();
The only way I can see to do this would be to create a process to keep the Azure table in sync with another version of itself. In this table, I would store the PartitionKey as a number instead of a string. Once done, I could use a method similar to what I wrote in my question to query the data.
However, this is a waste of resources, so I don't recommend it. (I'm not implementing it myself, either.)