Is it possible to do a Lookup using Kiba - kiba-etl

Is it possible to do a "Lookup" with Kiba.
Since it's quite a normal process in a etl.
Could you show a demo if yes, thanks.

Yes, a lookup can be done with Kiba!
For a tutorial, see this live coding session I recorded, in which I create a lookup transform that looks up extra fields from a given field by tapping into the MovieDB database.
Leveraging this example, you could for instance implement a simple ActiveRecord lookup using a block transform:
# assuming you have a 'country_iso_2' field in the row above
transform do |row|
  country = Country.where(iso_2: row['country_iso_2']).first
  row['country_name'] = country.try(:name) || 'Unknown'
  row
end
or you could extract a more reusable class transform that you would call like this:
transform ActiveRecordLookup, model: Country,
  lookup_on: 'country_iso_2',
  fetch_fields: { 'name' => 'country_name' }
transform DefaultValue, 'country_name' => 'Unknown'
Obviously, if you need to handle large volumes, you will have to implement some improvements (e.g. caching, bulk reading); a sketch of the class transform with a simple cache follows.
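Here is a minimal sketch of what such a reusable class transform could look like. It is not part of Kiba itself; the class name, constructor keywords and the in-memory cache are assumptions, and it assumes the row field and the model column share the same name:

# Hypothetical reusable lookup transform with a simple in-memory cache.
class ActiveRecordLookup
  def initialize(model:, lookup_on:, fetch_fields:)
    @model = model
    @lookup_on = lookup_on
    @fetch_fields = fetch_fields
    @cache = {}
  end

  def process(row)
    key = row[@lookup_on]
    # cache hits (including nil lookups) to avoid re-querying the same key
    record = @cache.fetch(key) do
      @cache[key] = @model.where(@lookup_on => key).first
    end
    @fetch_fields.each do |source_field, target_field|
      row[target_field] = record && record[source_field]
    end
    row
  end
end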
Hope this helps!

Related

Sorting custom columns with django-datatables-view

Using Django 3, Python 3.6, django-datatable-view 1.19.1
I'm trying to build a datatable with columns from my model, plus columns computed before output.
I can render all the values I need, but when I try to sort a custom column I get an error:
Cannot resolve keyword 'XXXX' into field. Choices are: ...
I've found a way to register a column as a virtual column, without a DB source, in the django-datatable-view 0.5.4 docs, but that approach no longer works. In the latest version's documentation, the links with the info I need are unavailable.
Please help me figure out how I can deal with custom computed columns based on my model's fields (sort, render).
This is a little tricky now.
To define computed fields:
class ListJson(BaseDatatableView):
    columns = ["id", "status_code", "computed_field"]
    order_columns = ["id", "status_code"]

    # Override the render_column method
    def render_column(self, row, column):
        if column == "computed_field":
            return row.computed_field()
        else:
            return super(ListJson, self).render_column(row, column)
This allows you to return the value of the computed_field() method.
The situation becomes complicated when we want to sort on calculated fields. In that case, the simplest option is to disable serverSide processing in JavaScript:
$.extend($.fn.dataTable.defaults, {
    serverSide: false,
});
However, you will then have to return all the rows at once, which can kill the server.
If you want to sort on the backend side, you need to build and return a queryset with a virtual (annotated) field, as sketched below.
class ListJson(BaseDatatableView):
    def get_initial_queryset(self):
        # build and return your annotated queryset here
        return qs
Just build your query like THIS.
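For illustration, a minimal sketch of that approach, assuming the computed value can be expressed with database expressions; the Order model and its quantity/unit_price fields are made up, and the import path assumes the django-datatables-view package:

from django.db.models import F

from django_datatables_view.base_datatable_view import BaseDatatableView

from .models import Order  # hypothetical model


class ListJson(BaseDatatableView):
    model = Order
    # the annotation is exposed as a regular, sortable column
    columns = ["id", "status_code", "computed_field"]
    order_columns = ["id", "status_code", "computed_field"]

    def get_initial_queryset(self):
        # the annotation behaves like a virtual field the ORM can ORDER BY
        return Order.objects.annotate(computed_field=F("quantity") * F("unit_price"))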

learning mapreduce in Fauxton

I am brand new to NoSQL, CouchDB, and MapReduce and need some help.
I have the same question as discussed here {How to use reduce in Fauxton}, but I do not understand the answer :(.
I have a working map function:
function (foo) {
  if(foo.type == "blog post");
  emit(foo)
}
which returns 11 individual documents. I want to modify this to return foo.type along with a count of 1.
I have tried:
function (doc) {
  if(doc.type == "blog post");
  return count(doc)
}
and "_count" from the Reduce panel, but clearly am doing something wrong as the View does not return anything.
Thanks in advance for any assistance or guidance!
In Fauxton, the Reduce step is kind of awkward and unintuitive to find.
Select _count in the "Reduce (optional)" popup below where you type in your Map.
Select "Save Document and then Build Index". That will display your map results.
Find the "Options" button at the top next to a gears icon. If you see a green band instead, close the green band with the X.
Select Options, then the "Reduce" check-circle. Select Run Query.
Map
So when you build a map function, you are literally creating a dictionary or map, which is a key:value data structure.
Your map function should emit keys that you will query. You can also emit a value, but if you simply intend to get the associated document, you don't have to emit any value. Why? Because there is a query parameter that can be used to return the associated document (?include_docs=true).
Reduce
Then, you can have a reduce function, which will be called for every result with the same key. Every result with the same key will be processed through your reduce function to reduce the values.
Corrected example
So in your case, I suppose you want to map documents by type.
You could create a function that emits documents that have the type property.
function(doc){
  if(doc.type)
    emit(doc.type);
}
If you query this view, you will see that the key of each row is the type of the document. If you choose the _count reduce function, you should get the number of documents per type.
When querying the view, you have to specify: group=true&reduce=true
Also, you can get all the documents of type "blog post" by querying with this parameter: ?key="blog post"
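For illustration, a minimal sketch of how such a query could look over HTTP against a local CouchDB; the database name ("mydb"), design document ("blog") and view name ("by_type") are assumptions:

// Hypothetical names: database "mydb", design doc "blog", view "by_type".
const url = "http://localhost:5984/mydb/_design/blog/_view/by_type" +
  "?group=true&key=" + encodeURIComponent('"blog post"');

fetch(url)
  .then(res => res.json())
  // with the _count reduce, rows look like: { key: "blog post", value: 11 }
  .then(body => console.log(body.rows));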

loopback relational database hasManyThrough pivot table

I seem to be stuck on a classic ORM issue and don't know really how to handle it, so at this point any help is welcome.
Is there a way to get the pivot table on a hasManyThrough query? Better yet, apply some filter or sort to it. A typical example:
Table products
id,title
Table categories
id,title
Table products_categories
productsId, categoriesId, orderBy, main
So, in the above scenario, say you want to get all categories of product X that are (main = true), or you want to sort the product categories by orderBy.
What happens now is a first SELECT on products to get the product data, a second SELECT on products_categories to get the categoriesId, and a final SELECT on categories to get the actual categories. Ideally, filters and sort should be applied to the 2nd SELECT, like:
SELECT `id`,`productsId`,`categoriesId`,`orderBy`,`main` FROM `products_categories` WHERE `productsId` IN (180) AND `main` = 1 ORDER BY `orderBy` DESC
Another typical example would be wanting to order the product images based on the order the user wants them in,
so you would have a products_images table
id,image,productsID,orderBy
and you would want to
SELECT * FROM `products_images` WHERE `productsId` IN (180) ORDER BY `orderBy` ASC
Is that even possible?
EDIT: Here is the relationship definition through an intermediate table to get what I need, based on my schema.
Products.hasMany(Images, {
  as: "Images",
  foreignKey: "productsId",
  through: ProductsImagesItems,
  scope: function (inst, filter) {
    return { active: 1 };
  }
});
The thing is, the scope function gives me access to the final result and not to the intermediate table.
I am not sure I fully understand your problem(s), but you definitely need to move away from the table concept and express your problem in terms of Models and Relations.
The way I see it, you have two models: Product (properties: title) and Category (properties: main).
Then, you can have relations between the two, potentially:
Product belongsTo Category
Category hasMany Product
This means a product will belong to a single category, while a category may contain many products. There are other relation types available.
Then, using the generated REST API, you can filter GET requests to get items based on their properties (like main in your case), or use the relation endpoints (automatically generated when you add relations) to get, for instance, all products belonging to a specific category, as illustrated below.
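For illustration only, assuming the models above are exposed over REST and the hasMany relation is named "products" (the IDs and property values are made up), such requests could look like this:

// filter items by one of their properties
GET /api/Categories?filter={"where":{"main":true}}
// relation endpoint generated by "Category hasMany Product"
GET /api/Categories/1/products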
Does this help?
Based on what you have here, I'd probably recommend using the scope option when defining the relationship. The LoopBack docs show a very similar example for the "product - category" scenario:
Product.hasMany(Category, {
  as: 'categories',
  scope: function(instance, filter) {
    return { type: instance.type };
  }
});
In the example above, instance is the category being matched, and each product would have a new categories property containing the matching Category entities for that Product. Note that this does not follow your exact data schema, so you may need to play around with it. Also, I think your API query would have to specify that you want the categories related data loaded (it is not included by default):
/api/Products/13?filter={"include":["categories"]}
I suggest you define a custom / remote method in Product.js that does the work for you.
Product.getCategories = function (_productId) {
  // if you are taking the product title as a param instead of _productId,
  // you will first need to find the product ID
  // then execute a find query on products_categories with:
  // 1. a where filter to get only main categories and productId = _productId
  // 2. an include filter to include the product and category objects
  // 3. an order filter to sort items based on the orderBy column
  // now you will get an array of products_categories.
  // Each item / object in the array will have nested Product and Category objects.
};
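A minimal sketch of what that remote method could look like; the ProductsCategories pivot model, its relation names ('product', 'category') and the column names are assumptions based on the schema in the question:

// in common/models/product.js (hypothetical file)
module.exports = function (Product) {
  Product.getCategories = function (productId, cb) {
    // the pivot model is assumed to be named ProductsCategories
    const ProductsCategories = Product.app.models.ProductsCategories;
    ProductsCategories.find({
      where: { productsId: productId, main: 1 }, // filter on the pivot table
      include: ['product', 'category'],          // pull in both related objects
      order: 'orderBy DESC'                      // sort by the pivot column
    }, cb);
  };

  // expose it over REST, e.g. GET /api/Products/categories?productId=180
  Product.remoteMethod('getCategories', {
    accepts: { arg: 'productId', type: 'number', required: true },
    returns: { arg: 'categories', type: 'array', root: true },
    http: { verb: 'get', path: '/categories' }
  });
};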

How to convert Rep[T] to T in slick 3.0?

I use code generated by the Slick code generator.
My table has more than 22 columns, hence it uses HList.
It generates 1 type and 1 function:
type AccountRow
def AccountRow(uuid: java.util.UUID, providerid: String, email: Option[String], ...):AccountRow
How do I write compiled insert code from generated code?
I tried this:
val insertAccountQueryCompiled = {
  def q(uuid: Rep[UUID], providerId: Rep[String], email: Rep[Option[String]], ...) =
    Account += AccountRow(uuid, providerId, email, ...)
  Compiled(q _)
}
I need to convert Rep[T] to T for AccountRow function to work. How do I do that?
Thank you
TL;DR: Not possible.
Explanation
There are two levels of abstraction in Slick: Query and DBIOAction.
When you're dealing with Queries, you work with your schema definitions, rows and Reps; it's quite constrained, as it's the closest level of abstraction to the actual DB you're using. A Rep refers to a hypothetical value in the database, not to a value in your program.
Then you have DBIOActions, which are the next level: not just the definition of a query, but its execution. You usually get DBIOActions when getting information out of a query, like with the result method, or (ta-da!) when inserting rows.
Inserts and updates are not queries, so what you're trying to do is not possible. You're mixing DBIOAction (the += method) with Query-level stuff (the Rep types). The only way to use a query's values inside a DBIOAction is to execute the Query (obtaining a DBIOAction) and then compose both actions using flatMap or a for comprehension (which is the same thing), as sketched below.
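A minimal sketch of that composition, using a simplified two-column table instead of the generated HList-based AccountRow; the profile, table and column names are assumptions:

import java.util.UUID
import scala.concurrent.ExecutionContext.Implicits.global
import slick.jdbc.PostgresProfile.api._  // assumption: Postgres profile

// Simplified stand-in for the generated code
case class AccountRow(uuid: UUID, email: Option[String])

class Accounts(tag: Tag) extends Table[AccountRow](tag, "account") {
  def uuid  = column[UUID]("uuid", O.PrimaryKey)
  def email = column[Option[String]]("email")
  def *     = (uuid, email) <> ((AccountRow.apply _).tupled, AccountRow.unapply)
}
val accounts = TableQuery[Accounts]

// Compose a Query result (a DBIOAction) with an insert (another DBIOAction):
def insertIfMissing(row: AccountRow): DBIO[Int] =
  for {
    existing <- accounts.filter(_.uuid === row.uuid).result.headOption
    inserted <- existing match {
      case Some(_) => DBIO.successful(0) // already there, nothing to insert
      case None    => accounts += row    // += takes plain values, no Rep needed
    }
  } yield inserted

// db.run(insertIfMissing(AccountRow(UUID.randomUUID(), Some("a@b.c"))))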

Couchdb: filter and group in a single view

I have a CouchDB database with documents of the form: { Name, Timestamp, Value }
I have a view that shows a summary grouped by name with the sum of the values. This is a straightforward reduce function.
Now I want to filter the view to only take into account documents whose timestamp occurred in a given range.
AFAIK this means I have to include the timestamp in the emitted key of the map function, e.g. emit([doc.Timestamp, doc.Name], doc)
But as soon as I do that, the reduce function no longer sees the rows grouped together to calculate the sum. If I put the name first, I can group at level 1 only, but how do I filter at level 2?
Is there a way to do this?
I don't think this is possible with only one HTTP fetch and/or without additional logic in your own code.
If you emit([time, name]) you would be able to query startkey=[timeA]&endkey=[timeB]&group_level=2 to get items between timeA and timeB, grouped where their timestamp and name are identical. You could then post-process this to add up the values whenever the names match (see the sketch after the note below), but the initial result set might be larger than you want to handle.
An alternative would be to emit([name,time]). Then you could first query with group_level=1 to get a list of names [if your application doesn't already know what they'll be]. Then for each one of those you would query startkey=[nameN,timeA]&endkey=[nameN,timeB]&group_level=1 to get the time-filtered sum for that name.
(Note that in my query examples I've left the JSON start/end keys unencoded, so as to make them more human readable, but you'll need to apply your language's equivalent of JavaScript's encodeURIComponent on them in actual use.)
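For the first approach, a minimal sketch of the client-side post-processing, assuming rows returned by a group_level=2 query against the [time, name] view:

// rows look like: { key: [timestamp, name], value: sumForThatPair }
function sumByName(rows) {
  const totals = {};
  for (const row of rows) {
    const name = row.key[1];
    totals[name] = (totals[name] || 0) + row.value;
  }
  return totals; // e.g. { "nameA": 42, "nameB": 17 }
}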
You cannot make a view on top of a view. You need to write another map-reduce view that does the filtering and then the grouping. Something like:
map:
function(doc) {
  // 'start' and 'end' are hard-coded bounds for this ad-hoc view
  if (doc.timestamp > start && doc.timestamp < end) {
    emit(doc.name, doc.value);
  }
}
reduce:
function(key, values, rereduce) {
  return sum(values);
}
I suppose you cannot store this view with hard-coded bounds, and will have to run it as an ad-hoc query from your application.
