I have an in memory database that is fairly deep, I think, I have found the best implementation of the code, if anyone has a better way to do a double forEach loop in lodash or if I am using it wrong somehow would be great to know.
The Object:
smallDB = {
user1:{
permissions:[],
locations:[
{ip:'0.0.0.0',messages:[]},
{ip:'',messages:[]}
]
},
user2:{
permissions:[],
locations:[
{ip:'0.0.0.0',messages:[]},
{ip:'0.0.1.0',messages:[{mid:'a unique id','user':'the sender',message:'the text of the message'}]}
]
}
}
sending a message I use:
ld.forEach smallDB, (a)->
ld.forEach a.locations, (b)->
b.messages.push {mid,user,message}
I just wanted to make sure this is the best way to add messages into the mini database for all the users in smallDB and in all their locations
There's one very easy optimization you can do if the objects you create with {mid,user,message} are immutable:
record = {mid,user,message}
ld.forEach smallDB, (a)->
ld.forEach a.locations, (b)->
b.messages.push record
If you do this, you save yourself the cost of creating a new object each time you push. Dropping the forEach in favor of using for would also make things faster.
It would be further optimizable by allowing a more direct access to the arrays that you access as b.messages in your original code, but this would complicate addition and removal of users or locations. I don't think is worth it unless proved by profiling that it is not a premature optimization.
Related
The Situation
I recently started working on a new project using nodejs. I have a background of using Python/Django and C#/.NET (not a huge fan of the latter). Node is awesome, but I must say I miss the ease of building models and automating migrations in Django. I am currently using the AdonisJS framework which leverages Knex. Knex is a powerful library, but the migrations all need to be manually built. Additionally, the AdonisJS ORM that manages the Models is independent of Knex (migration manager). You also do not define field attributes on the Models, which can have benifits for dynamically doing things in the front and back end. All things considered, there is a lot of room for human error, miscommunication and a boat load more typing required. I know the the hot thing these days is to keep it loose and fast, but for this specific project, I am looking for a bit more structure than loosely defined models.
Current State
What I have landed on is building a new Class called tableModel and a field class to define the fields within table model. I have already completed this and I am successfully writing the migration files leveraging mustache. I plan on also automatically writing the Models which I shouldn't have a problem with (fingers crossed).
The Problem
Here is where it gets a little tough and where I need help...I need to track what has been added or removed via migration so I can effectively write ups and downs as the tableModels change over time.
So let's say I add a "tableModel" which creates a migration to create table Foo with fields {id (bigint), user_id(int), name(string255)}
Later I want to add a field called description so I would simply add it to my "tableModel" and then run a build command which would build out the migration.
How do I check what has already been created though so I only do an up() for description?
Then I want to remove the name field so I mark it out in my "tableModel" and run a build migration command. How do I check what has been migrated that now needs to be added in to the down().
Edit: I would add a remove field to the up and the corresponding roll back to the down.
Bonus Round
Let's say I want to change user_id from an int to a bigint, because who makes a foreign key just an int? How do I check not just what needs to be added to the up and down, but also checks if I need to change a property on a field.
Edit: would just write the up. and a corresponding roll back to the down
The Big Question
Basically, how do I define dirty "tableModels" classes
Possible Solution?
I am thinking that maybe I should capture some type of registry or snapshot and then run the comparison when building the migrations and or models, then recapture/snapshot. If this is the route, should I store in a json file, write this to the DB itself, or is there another/better option.
If I create the tableModel instances as constants, could I actually write back to the JS file and capture the snapshot as an attribute? IF this is an option, is Node's file system the way to go and what's the best way to do this? Node keep suprising me so I wouldn't be baffled if any of these are an option.
Help!
If anyone has gone down this path before or knows of any tools I could leverage, I would greatly appreciate it and thank you in advance. Also, if I am headed in a completely wrong direction, then please let me know, I both handle and appreciate all types of feedback.
Example
Something to note, when I define the "tableModel" for a given migration or model, it is an instance of the class, I am not creating an extended class since this is not my orm.
class tableModel {
constructor(tableName, modelName = tableName, fields = []) {
this.tableName = tableName
this.modelName = modelName
this.fields = fields
}
// Bunch of other stuff
}
fooTableModel = new tableModel('fooTable', 'fooModel', fields = [
new tableField.stringField('title'),
new tableField.bigIntField('related_user_id'),
new tableField.textField('description','Testing Default',false,true)
]
)
which equates to:
tableModel {
tableName: 'fooTable',
modelName: 'fooModel',
fields:
[ stringField {
name: 'title',
type: 'string',
_unique: false,
allow_null: null,
fieldAttributes: {},
default_value: null },
bigIntField {
name: 'related_user_id',
type: 'bigInteger',
_unique: false,
allow_null: null,
fieldAttributes: {},
default_value: 0 },
textField {
name: 'description',
type: 'text',
_unique: false,
allow_null: true,
fieldAttributes: {},
default_value: 'Testing Default' } ]
You have the up and down notation mixed up. Those are for migrating the "latest" (runs the up function) and doing rollbacks (runs the down function). Up and down to not relate to dropping or adding table columns.
The migrations up is for any change, and the down is to reverse those changes. So if you wanted to drop a column from some table, you write the command in the up, then write the opposite in the down (you'd add it back in...), such that you can "rollback" and the change is effectively reversed. You have to be careful with such things though, as you can put yourself in a situation where you actually lose data.
Want to add a column? Write it in the up, and drop the column in the down.
One of the major points behind the migrations mechanism is to track the state of changes of your database, as time goes forward. So generally, if you created a table in some migration, then a day or so later you realize you need to drop/add columns, you normally don't go back and edit the existing migration, especially if the migration has already been run. You'd just write a new migration to drop/add your column.
Since you're using knex, there are a couple "knex" tables that get created. By default the one you're looking for is knex_migrations, unless someone specifically modified the settings to change the name of it. This table holds all the migrations that have run against your DB, per batch. From the CLI, assuming you have knex.js installed globally, you can run knex migrate:latest, and that will push all the migrations that exist in your directory to the target database, if they have not yet been run. It does this by way of examining that knex_migrations table. If you roll a change and don't like it, and assuming you've properly done the down function, you can invoke knex migrate:rollback to reverse the change. If there are 3 migration files that have NOT yet been run, invoking knex migrate:latest will run all 3 of those migration files under a new batch #, which is 1 higher than the most recent batch number. Conversely, if you invoke a knex migrate:rollback, it will find the highest batch number (there could be more than 1 migration in a batch...), and invoke the down function on all those files, effectively rollback those changes.
All that said, knex is a "query builder" tool. It's got a ton of helper functions to help build the sql for you. Personally, I find this to be a major distraction. Why spend hours on hours figuring out all the helper functions when I can just go crank out raw SQL and run that. Thus, that's what we've done in our system. we use knex.raw('') and write our own DDL and DML. It works great and does exactly what we need it to. We don't need to go figure out the magic of the query building.
The short answer is that knex will automatically know what has and has not been run for you (again, via that knex_migrations table it creates for you...).
Things can get weird though when it start involving git and different branches. I recommend that if you're writing migrations on some branch, and you need to go do other work, always remember to first perform a rollback of any migrations you've done in that branch BEFORE switching branches. Otherwise you will be in weird DB states that don't coincide with the application code.
I would personally just deal with updating models independently of writing migrations. For example, if I'm adding a description column to some table, then I probably want to manually update the ORM to reflect the change of the new db schema. Generally, I've found trying to use a tool that automagically does that for you (rather, if I change the orm, stuff happens to write all the underlying sql...) usually winds me up in a heap of trouble and I just spend more time trying to un-fudge stuff. But, that's just my 2 cents :)
Here is where it gets a little tough and where I need help...I need to track what has been added or removed via migration so I can effectively write ups and downs as the tableModels change over time.
You could store changes in a DB/txt file and those can act as snapshots. So when you want to rollback to a particular migration, you would find the changes (up/down) made for that mutation and adjust accordingly.
Later I want to add a field called description so I would simply add it to my "tableModel" and then run a build command which would build out the migration. How do I check what has already been created though so I only do an up() for description?
Here you either call the database itself directly and check what fields have already been created. If a field is already their and the attributes are the same, you can either ignore it or stop the transaction all together.
Bonus Round Let's say I want to change user_id from an int to a bigint, because who makes a foreign key just an int? How do I check not just what needs to be added to the up and down, but also checks if I need to change a property on a field.
Again, call the DB itself on the table in question. I know the SQL call would be:
describe [table_name];
After reading the end, I think you answered this yourself, but I think capturing these changes would work best in a NoSql database since you're using Node or PostGres with it's json field.
I would like to store a value in the config file and look it up in the design document for comparing against update values. I'm sure I have seen this but, for the life of me, I can't seem to remember how to do this.
UPDATE
I realize (after the first answer) that there was more than one way to interpret my question. Hopefully this example clears it up a little. Given a configuration:
curl -X PUT http://localhost:5984/_config/shared/token -d '"0123456789"'
I then want to be able to look it up in my design document
{
"_id": "_design/loadsecrets",
"validate_doc_update": {
"test": function (newDoc,oldDoc) {
if (newDoc.supersecret != magicobject.config.shared.token){
throw({unauthorized:"You don't know the super secret"});
}
}
}
}
It's the abilitly to do something like the magicobject.config.shared.token that I am looking for.
UPDATE 2
Another potentially useful (contrived) scenario
curl -X PUT http://trustedemployee:5984/_config/eventlogger/detaillevel -d '"0"'
curl -X PUT http://employee:5984/_config/eventlogger/detaillevel -d '"2"'
curl -X PUT http://vicepresident:5984/_config/eventlogger/detaillevel -d '"10"'
Then on devices tracking employee behaviour:
{
"_id": "_design/logger",
"updates": {
"logger": function (doc,req) {
if (!doc) {
doc = {_id:req.id};
}
if(req.level < magicobject.config.eventlogger.detaillevel ){
doc.details = req.details;
}
return [doc, req.details];
}
}
}
Here's a follow-up to my last answer with more general info:
There is no general way to use configuration, because CouchDB is designed with scalability, stability and predictability in mind. It has been designed using many principles of functional programming and pure functions, avoiding side effects as much as possible. This is a Good Thing™.
However, each type of function has additional parameters that you can use, depending on the context the function is called with:
show, list, update and filter functions are executed for each request, so they get the request object. Here you have the req.secObj and req.userCtx to (ab)use for common configuration. Also, AFAIK the this keyword is set to the current design document, so you can use the design doc to get common configuration (at least up to CouchDB 1.6 it worked).
view functions (map, reduce) don't have additional parameters, because the results of a view are written to disk and reused in subsequent calls. map functions must be pure (so don't use e.g. Math.random()). For shared configuration across view functions within a single design doc you can use CommonJS require(), but only within the views.lib key.
validate doc update functions are not necessarily executed within a user-triggered http request (they are called before each write, which might not be triggered only via http). So they have the userCtx and secObj added as separate parameters in their function signature.
So to sum up, you can use the following places for configuration:
userCtx for user-specific config. Use a special role (e.g. with a prefix) for storing small config bits. For example superLogin does this.
secObj for database-wide config. Use a special member name for small bits (as you should normally use roles instead of explicit user names, secObj.members.names or secObj.admins.names is a good place).
the design doc itself for design-doc-wide config. Best use the this.views.lib.config for this, as you can also read this key from within views. But keep in mind that all views are invalidated as soon as you change this key. So if the view results will stay the same no matter what the config values are, it might be better to use a this.config key.
Hope this helps! I can also add examples if you wish.
I think I know what you're talking about, and if I'm right then what you are asking for is no longer possible. (at least in v1.6 and v2.0, I'm not sure when this feature was removed)
There was a lesser-known trick that allowed a view/show/list/validation/etc function to access the parent design document as this in your function. For example:
{
"_id": "_design/hello-world",
"config": {
"PI": 3.14
},
"views": {
"test": {
"map": "function (doc) { emit(this.config.PI); })"
}
}
}
This was a really crazy idea, and I imagine it was removed because it created a circular dependency between the design document and the code of the view that made the process of invalidating/rebuilding a view index a very tricky affair.
I remember using this trick at some point in the distant past, but the feature is definitely gone now. (and likely to never return)
For your special use-case (validating a document with a secret token), there might be a workaround, but I'm not sure if the token might leak in some place. It all depends what your security requirements are.
You could abuse the 4th parameter to validate_doc_update, the securityObject (see the CouchDB docs) to store the secret token as the first admin name:
{
"test": "function (newDoc, oldDoc, userCtx, secObj) {
var token = secObj.admins.names[0];
if (newDoc.supersecret != token) {
throw({unauthorized:"You don't know the super secret"});
}
}"
}
So if you set the db's security object to {admins: {names: ["s3cr3t-t0k3n"], roles: ["_admin"]}}, you have to pass 's3cr3t-t0k3n' as the doc's supersecret property.
This is obviously a dirty hack, but as far as I remember, the security object may only be read or modified by admins, you wouldn't immediately leak your token to the public. But consider adding a separate layer between the CouchDB and your caller if you need "real" security.
I am scraping an 90K record database using JSON-RPC and I am trying to put in some basic error checking. I want to start by scraping the database twice using two different settings and adding a prefix to the second scrape. This way I can check to ensure that the two settings are not producing different records (due to dropped updates, etc). I wanted to implement the comparison using a view which compares each document from the first scrape with it's twin produced by the second scrape and then emit the names of records with a difference between them.
However, I cannot quite figure out how to pull in another doc in the view, everything I have read only discusses external docs using the emit() function, which is too late to permit me to compare it. In the example below, the lookup() function would grab the referenced document.
Is this just not possible?
function(doc) {
if(doc._id.slice(0,1)!=='$' && doc._id.slice(0,1)!== "_"){
var otherDoc = lookup('$test" + doc._id);
if(otherDoc){
var keys = doc.value.keys();
var same = true;
keys.forEach(function(key) {
if ((key.slice(0,1) !== '_') && (key.slice(0,1) !=='$') && (key!=='expires')) {
if (!Object.equal(otherDoc[key], doc[key])) {
same = false;
}
}
});
if(!same){
emit(doc._id, 1);
}
}
}
}
Context
You are correct that this is not possible in CouchDB. The whole point of the map function is that it must be idempotent, otherwise you lose all the other nice benefits of a pre-calculated index.
This is why you cannot access external resources in the map function, whether they be other records or the clock. Any time you run a map you must always get the same result if you put the same record into it. Since there are no relationships between records in CouchDB, you cannot promise that this is possible.
Solution
However, you can still achieve your end goal, just be different means. Some possibilities...
Assuming there is some meaningful numeric value in each doc, you could use a view to take the sum of all those values and group them by which import you did ({key: <batch id>, value: <meaningful number>}). Then compare the two numbers in your client or the browser to see if they match.
A brute force approach would be to use a view to pair the docs that should match. Each doc is on a different row, but they're grouped by a common field. Then iterate through the entire index comparing the pairs. This would certainly be the quickest to code and doesn't depend on your application or data.
Implement a validation function to enforce a schema on your data. Just be warned that this will reduce your write throughput since each written record will be piped out of Erlang and into the JS engine. Also, this is only applicable if you're worried about properly formed records instead of their precise content, which might not be the case.
Instead of your different batch jobs creating different docs, have them place them into the same doc. The structure might look like this: { "_id": "something meaningful", "batch_one": { ..data.. }, "batch_two": { ..data.. } } Then your validation function could compare them or you could create a view that indexes all the docs that don't match. All depends on where in your pipeline you want to do the error checking and correction.
Personally I like the last option better, but only if you don't plan to use the database as is in production. Ie., you wouldn't want to carry around all that extra data in each record.
Hope that helps.
Cheers.
I have a cakephp 1.3 application and I have run into a 'data leak' security hole. I am looking for the best solution using cake and not just something that will work. The application is a grade tracking system that lets teachers enter grades and students can retrieve their grades. Everything is working as expected but when I started to audit security I found that the basic CRUD operations have leaks. Meaning that student X can see student Y's grades. Students should only see their own grades. I will limit this questions to the read operation.
Using cake, I have a grade_controller.php file with this view function:
function view($id = null) {
// Extra, not related code removed
$this->set('grade', $this->grade->read(null, $id));
}
And
http://localhost/grade/view/5
Shows the grade for student $id=5. That's great. But if student #5 manipulates the URL and changes it to a 6, person #6's grades are shown. The classic data leak security hole.
I had two thoughts on the best way to resolve this. 1) I can add checks to every CRUD operations called in the controller. Or 2) add code to the model (for example using beforeFind()) to check if person X has access to that data element.
Option #1 seems like it is time consuming and error prone.
Option #2 seem like the best way to go. But, it required calling find() before some operations. The read() example above never executes beforeFind() and there is no beforeRead() callback.
Suggestions?
Instead of having a generic read() in your controller, you should move ALL finds, queries..etc into the respective model.
Then, go through each model and add any type of security checks you need on any finds that need to be restricted. 1) it will be much more DRY coding, and 2) you'll better be able to manage security risks like this since you know where all your queries are held.
For your example, I would create a getGrade($id) method in my Grade model and check the student_id field (or whatever) against your Auth user id CakeSession::read("Auth.User.id");
You could also build some generic method(s) similar to is_owner() to re-use the same logic throughout multiple methods.
If CakePHP supports isAuthorized, here's something you could do:
Create a column, that has the types of users (eg. 'student', 'teacher', ...)
Now, it the type of User is 'student', you can limit their access, to view only their data. An example of isAuthorized is as follows. I am allowing the student to edit only their profile information. You can extend the concept.
if ((($role['User']['role'] & $this->user_type['student']) == $this->user_type['student']) {
if (in_array($this->action, array('view')) == true) {
$id = $this->params->pass[0];
if ($id == $user_id) {
return (true);
}
}
}
}
This should be simple, but I am still lost.
There is a very similar post here: How to share data between Drools rules in a map?
but it doesn't fix my problem:
I have a set of rules and, before launching them, I insert a Map<String, Object> as a fact.
In these rules I use the map to write some conclusions like:
when
$map : Map();
something ocurrs;
then
$map.put("conclusion1", 100);
Now I would like to use these intermediate conclusions in other rules, something like:
when
$map : Map(this["conclusion1"] > 50)
then
do something cool;
The problem is that when I execute the rules it is like the second rule doesn't see the conclusions of the first one, and it does not fire.
I have tried putting a break point and analysing the working memory, and, in fact, the Map would contain the conclusion1, 100 after the first rule is fired.
I have also tried by making an update($map) in the conclusion, but that would trigger an infinite loop.
Any idea of why this wouldn't work, or any alternative solution to my problem?
Thanks !
When you modify a fact you need to notify the engine you are doing so. One of the ways of doing that is by using modify(). E.g.:
when
$map : Map();
something ocurrs;
then
modify( $map ) {
put("conclusion1", 100)
}
end