I would like to provide one piece of content per day, storing all items in DynamoDB. I will add new content from time to time, but only one piece of content needs to be read per day.
It seems it's not recommended to use an incremental ID as the primary key in DynamoDB.
Here is what I have at the moment:
content_table
id, content_title, content_body, content_author, view_count
1b657df9-8582-4990-8250-f00f2194abe9, title_1, body_1, author_1, view_count_1
810162c7-d954-43ff-84bf-c86741d594ee, title_2, body_2, author_2, view_count_2
4fdac916-0644-4237-8124-e3c5fb97b142, title_3, body_3, author_3, view_count_3
The database will have a low rate of new items being added, as I will add new content myself manually.
How can I get item number XX without querying the whole database in Node.js?
Should I switch back to a MySQL database?
Should I use a homemade auto-increment even if it's an anti-pattern?
Should I use a time-based UUID and run a query like: get all IDs, sort them, and take number X from the array?
Should I use a tool like http://www.stateful.co/ ?
Thanks for your help
I would make the date your hash key; you can then get the content for any particular day with a simple GetItem.
date, content_title, content_body, content_author, view_count
20180208, title_1, body_1, author_1, view_count_1
20180207, title_2, body_2, author_2, view_count_2
20180206, title_3, body_3, author_3, view_count_3
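For example, a minimal Node.js sketch with the AWS SDK DocumentClient (the table name and key value here are just placeholders matching the example above):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// fetch the single content item for a given day with GetItem
const params = {
    TableName: 'content_table',   // placeholder table name
    Key: { date: '20180208' }     // the day you want
};

docClient.get(params).promise()
    .then(result => console.log(result.Item));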
If you think you might have more than one piece of content for any one day in the future, you could add a datetime attribute and make that the range key:
date, datetime, content_title, content_body, content_author, view_count
20180208, 20180208101010, title_1, body_1, author_1, view_count_1
20180208, 20180208111111, title_2, body_2, author_2, view_count_2
20180206, 20180206101010, title_3, body_3, author_3, view_count_3
It's then still very fast and simple to execute a Query to get the content for a particular day.
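For the second design, a rough sketch of that Query (again using the DocumentClient; 'date' is a reserved word in DynamoDB expressions, so it is aliased):

// fetch all content items for one day by partition key
const params = {
    TableName: 'content_table',            // placeholder table name
    KeyConditionExpression: '#d = :day',
    ExpressionAttributeNames: { '#d': 'date' },
    ExpressionAttributeValues: { ':day': '20180208' }
};

docClient.query(params).promise()
    .then(result => console.log(result.Items));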
Note that due to the way DynamoDB distributes throughput, if you choose the second option, you might want to archive old content into another table.
So, the Criteria are already quite powerful. Yet I came across a case I can't seem to replicate on the Criteria object.
I needed to filter out all entries that were no longer timely.
In a world where you could mix SQL with the field definition, it would look like this:
...->addFilter(
new RangeFilter('DATEDIFF(NOW(), INTERVAL createdAt DAY)', [RangeFilter::LTE => 1])
)
Unfortunately, that doesn't work in our world.
When I pass the criteria to a search function, I only get:
"DATEDIFF(NOW(), INTERVAL createdAt DAY)" is not a field on xyz
I tried to do it with ->addExtensions and several other experiments, but I couldn't get it to work. I resorted to using the queryBuilder from Doctrine with queryParts, but the data I'm getting is not very clean and not assigned to an ORMEntity like it should be.
Is it possible to write a criteria that incorporates native SQL filtering?
The DAL is explicitly designed not to accept raw SQL statements; that is a core concept of the abstraction. As the DAL offers extensibility for third-party extensions, it should be preferred over raw SQL in most cases. I would suggest writing a lightweight query that fetches only the IDs using your SQL, and then using these pre-filtered IDs to fetch the complete data sets through the DAL.
$ids = (new QueryBuilder($connection))
    ->select(['LOWER(HEX(id))'])
    ->from('product')
    ->where('...')
    ->execute()
    ->fetchFirstColumn();

$criteria = new Criteria($ids);
This should offer the best of both worlds: the freedom of raw SQL and the extensibility features of the DAL.
In your specific case you could also just take the current day, subtract the number of days that should have passed, and use this threshold date to compare against the creation date:
$now = new \DateTimeImmutable();
$dateInterval = new \DateInterval('P1D');
$thresholdDate = $now->sub($dateInterval);
// filter to get all with a creation date greater than now -1 day
$filter = new RangeFilter(
'createdAt',
[RangeFilter::GTE => $thresholdDate->format(Defaults::STORAGE_DATE_TIME_FORMAT)]
);
I'm using the simple_salesforce Python wrapper for the Salesforce REST API. We have hundreds of thousands of records, and I'd like to split up the pull of the Salesforce data so that all records are not pulled at the same time.
I've tried passing a query like:
results = salesforce_connection.query_all("SELECT my_field FROM my_model limit 2000 offset 50000")
to see records 50K through 52K but receive an error that offset can only be used for the first 2000 records. How can I use pagination so I don't need to pull all records at once?
You're looking to use salesforce_connection.query(query=SOQL) and then .query_more(nextRecordsUrl, True).
Since .query() only returns 2000 records, you need to use .query_more to get the next page of results.
From the simple-salesforce docs
SOQL queries are done via:
sf.query("SELECT Id, Email FROM Contact WHERE LastName = 'Jones'")
If, due to an especially large result, Salesforce adds a nextRecordsUrl to your query result, such as "nextRecordsUrl" : "/services/data/v26.0/query/01gD0000002HU6KIAW-2000", you can pull the additional results with either the ID or the full URL (if using the full URL, you must pass ‘True’ as your second argument)
sf.query_more("01gD0000002HU6KIAW-2000")
sf.query_more("/services/data/v26.0/query/01gD0000002HU6KIAW-2000", True)
Here is an example of using this:
data = []  # list to hold all the records
SOQL = "SELECT my_field FROM my_model"
results = sf.query(query=SOQL)  # initial api call

## loop through the results and add the records
for rec in results['records']:
    rec.pop('attributes', None)  # remove extra data
    data.append(rec)  # add the record to the list

## check the 'done' attribute in the response to see if there are more records;
## while 'done' is False (more records to fetch), get the next page of records
while not results['done']:
    ## the 'nextRecordsUrl' attribute holds the url of the next page of records
    results = sf.query_more(results['nextRecordsUrl'], True)
    ## repeat the loop of adding the records
    for rec in results['records']:
        rec.pop('attributes', None)
        data.append(rec)

Looping through the records and using the data:
## loop through the records and get their attribute values
for rec in data:
    # the attribute name will always be the same as the Salesforce API name for that value
    print(rec['my_field'])
Like the other answer says, though, this can start to use up a lot of resources. But it's what you're looking for if you want to achieve pagination.
Maybe create a more focused SOQL statement to get only the records needed for your use case at that specific moment.
LIMIT and OFFSET aren't really meant to be used like that; what if somebody inserts or deletes a record at an earlier position (not to mention you don't have an ORDER BY in there)? SF will open a proper cursor for you; use it.
The https://pypi.org/project/simple-salesforce/ docs for "Queries" say that you can either call query and then query_more, or you can go with query_all. query_all will loop and keep calling query_more until you exhaust the cursor, but this can easily eat your RAM.
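For example, a minimal query_all sketch (credentials are placeholders; it drives the query/query_more loop for you, but keeps the whole result set in memory):

from simple_salesforce import Salesforce

# placeholder credentials, just for illustration
sf = Salesforce(username='user@example.com', password='...', security_token='...')

# query_all follows nextRecordsUrl internally and returns all records at once
records = sf.query_all("SELECT my_field FROM my_model")['records']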
Alternatively, look into the bulk query stuff; there's some magic in the API, but I don't know if it fits your use case. It'd be asynchronous calls and might not be implemented in the library. It's called PK Chunking. I wouldn't bother unless you have millions of records.
I'm using Node.js and DynamoDB. I've never used DynamoDB before, and I'm primarily a C# developer (where this would simply be a .Where(x => x...) call; not sure why Amazon made it any more complicated than that). I'm trying to simply query the table based on whether an id starts with certain characters. For example, we have the year as the first 2 characters of the Id field, so something like this: 180192, where the year is 2018 (the 20 part is irrelevant, I just wanted to give a human-readable example). So the Id starts with either 18 or 17, and I simply want to query the db for all rows whose Id starts with 18 (for example; it could be 17 or whatever). I did look at the documentation and I'm not sure I fully understand it. Here's what I have so far, which is just returning all results and not the expected results.
let params = {
    TableName: db.table,
    ProjectionExpression: "id,CompetitorName,code",
    KeyConditionExpression: "begins_with(id, :year)",
    ExpressionAttributeValues: {
        ':year': '18'
    }
};

return db.docClient.scan(params).promise();
So as you can see, I'm thinking that this would be a begins_with call, where I look for 18 against the Id. But again, this is returning all results (as if I didn't have KeyConditionExpression at all).
Would love to know where I'm wrong here. Thanks!
UPDATE
So I guess begins_with won't work since it only works on strings and my id is not a string. As per a commenter's suggestion, I can use BETWEEN, but even that is not working. I either get back all the results or a "Query key condition not supported" error (if I use .scan, I get back all results; if I use .query, I get the error).
Here is the code I'm trying.
let params = {
TableName: db.table,
ProjectionExpression: "id,CompetitorName,code",
KeyConditionExpression: "id BETWEEN :start and :end",
ExpressionAttributeValues: {
':start': 18000,
':end': 189999
}
};
return db.docClient.query(params).promise();
It seems as if there's no actual solution for what I was originally trying to do, unfortunately, which is a huge downfall of DynamoDB. There really needs to be some way to do a 'where' using the values of columns, like you can in virtually any other language. However, I have to admit part of the problem was the way that id was structured; you shouldn't have to rely on the id to get info out of it. Anyway, I did find another column, DateofFirstCapture, and using contains with a year like 2018 or 2017 (the dates are not all in the same format, it's a mess) seems to be working.
If you want to fetch data by id, add it as the partition key. If you want to get data by part of a string, you can use begins_with on the sort key.
begins_with (a, substr)— true if the value of attribute a begins with a particular substring.
source: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/Query.html
begins_with and between can only be used on sort keys.
For a Query you must always supply the partition key.
So if you change your design to have a unique partition key (or a unique combination of partition and sort keys) and strings like 180192 as the sort key, you will be able to query with begins_with(sortkey, ...), as sketched below.
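For illustration, a rough sketch of such a query with the DocumentClient, assuming a redesigned table where a made-up attribute recordType is the partition key and the id string is the sort key:

let params = {
    TableName: db.table,
    ProjectionExpression: "id,CompetitorName,code",
    // equality on the partition key is mandatory; begins_with works on the sort key
    KeyConditionExpression: "recordType = :type AND begins_with(id, :year)",
    ExpressionAttributeValues: {
        ':type': 'competitor',   // hypothetical partition key value
        ':year': '18'
    }
};

return db.docClient.query(params).promise();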
Sorry for being a newbie with Node.js and table queries; my question is:
How can I create a query using the Node.js package "azure-storage-node" that selects on the sum of two columns, 'start' and 'period'? If the sum is greater than a threshold, it should return the whole row. My tries, which didn't work, look something like this:
var query = new azure.TableQuery();
total = query.select(['start']) + query.select(['period']);
query.where('total > ?' , 50000);
or maybe something like this:
var query = new azure.TableQuery()
.where('start + period gt 50000');
but it throws an error on the '+'.
Thanks
What you're trying to accomplish is not possible with Azure Tables, at least as of today, as Azure Tables has limited querying support and there is no support for computed columns (if I may say so).
There are two possible solutions:
Have an attribute called total in your entities that contains the value of start + period. You calculate this value when you're inserting or updating the entity and store it at that time.
Do this filtering on the client side. For this you will need to download all related entities and then apply the filter on the client side to the data you fetched, as in the sketch below.
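For the second option, a rough sketch with the azure-storage package (the table name is a placeholder, and start/period are assumed to be stored as numeric properties; by default the SDK wraps each property value, so the raw value sits in the _ field):

const azure = require('azure-storage');
const tableService = azure.createTableService(); // reads the connection string from the environment

// fetch the candidate entities (narrow this with a PartitionKey filter if possible)
const query = new azure.TableQuery();

tableService.queryEntities('myTable', query, null, (error, result) => {
    if (error) throw error;
    // client-side filter: keep rows where start + period exceeds the threshold
    const matching = result.entries.filter(e => e.start._ + e.period._ > 50000);
    console.log(matching);
    // a full implementation would keep calling queryEntities with result.continuationToken
});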
Is it possible to insert a new document into a Couchbase bucket without specifying the document's ID? I would like to use Couchbase's Java SDK to create a document and have Couchbase determine the document's UUID, with Groovy code similar to the following:
import com.couchbase.client.java.CouchbaseCluster
import com.couchbase.client.java.Cluster
import com.couchbase.client.java.Bucket
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.document.json.JsonObject
// Connect to localhost
CouchbaseCluster myCluster = CouchbaseCluster.create()
// Connect to a specific bucket
Bucket myBucket = myCluster.openBucket("default")
// Build the document
JsonObject person = JsonObject.empty()
.put("firstname", "Stephen")
.put("lastname", "Curry")
.put("twitterHandle", "#StephenCurry30")
.put("title", "First Unanimous NBA MVP)
// Create the document
JsonDocument stored = myBucket.upsert(JsonDocument.create(person));
No, Couchbase documents have to have a key; that's the whole point of a key-value store, after all. However, if you don't care what the key is (for example, because you retrieve documents through queries rather than by key), you can just use a UUID or any other unique value when creating the document.
It seems there is no way to have Couchbase generate the document IDs for me. At the suggestion of another developer, I am using UUID.randomUUID() to generate the document IDs in my application. The approach is working well for me so far.
Reference: https://forums.couchbase.com/t/create-a-couchbase-document-without-specifying-an-id/8243/4
As you already found out, generating a UUID is one approach.
If you want to generate a more meaningful ID, for instance a "foo" prefix followed by a sequence number, you can make use of atomic counters in Couchbase.
The atomic counter is a document that contains a long, on which the SDK relies to guarantee a unique, incremented value each time you call bucket.counter("counterKey", 1, 2). This code would take the value of the counter document "counterKey", increment it by 1 atomically and return the incremented value. If the counter doesn't exist, it is created with the initial value 2, which is the value returned.
This is not automatic, but a Couchbase way of creating sequences / IDs.
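For example, a rough sketch building on the Java SDK code from the question (the counter key and "foo" prefix are made up):

// atomically increment the counter document (created with initial value 1 if missing)
long seq = myBucket.counter("foo::counter", 1, 1).content()

// build the document ID from the prefix and the sequence number, e.g. "foo::1", "foo::2", ...
String docId = "foo::" + seq

// store the document under the generated ID
JsonDocument stored = myBucket.upsert(JsonDocument.create(docId, person))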