Clustering algorithm using NodeJS - node.js

I have a set of data:
{ name: 'item1', timeOfDay: '36000', dayOfWeek: '1', dayOfMonth: '15', room: '1'}
{ name: 'item2', timeOfDay: '3600', dayOfWeek: '2', dayOfMonth: '10', room: '2'}
{ name: 'item1', timeOfDay: '18000', dayOfWeek: '3', dayOfMonth: '20', room: '3'}
{ name: 'item3', timeOfDay: '72000', dayOfWeek: '4', dayOfMonth: '5', room: '4'}
Given a new item, I'm looking for an algorithm to find the closest items, ordered by distance:
{ name: 'item2', timeOfDay: '36000', dayOfWeek: '5', dayOfMonth: '3', room: '2'}
First I looked at k-means to organise items around centers, but I have the feeling I need something that places each item at the right position.
A multi-criteria sort? But could that learn which criterion is the strongest?
I don't want to use Array.sort() because:
I need to add a new item without re-sorting the whole array
I need to merge identical (closest) values
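k-means groups items around centers but does not rank them against a new item. What is described here is closer to a k-nearest-neighbours search with a custom distance function. Below is a minimal sketch, assuming equal (hypothetical) `WEIGHTS` for every criterion, cyclic distances for `timeOfDay` (86400 s), `dayOfWeek` (7) and `dayOfMonth` (31, an approximation since months differ), and a simple match/mismatch distance for `name` and `room`. A binary-search insert keeps the result ordered without re-sorting the whole array.

```javascript
// Items from the question (fields are strings, so they are converted with Number()).
const items = [
  { name: 'item1', timeOfDay: '36000', dayOfWeek: '1', dayOfMonth: '15', room: '1' },
  { name: 'item2', timeOfDay: '3600', dayOfWeek: '2', dayOfMonth: '10', room: '2' },
  { name: 'item1', timeOfDay: '18000', dayOfWeek: '3', dayOfMonth: '20', room: '3' },
  { name: 'item3', timeOfDay: '72000', dayOfWeek: '4', dayOfMonth: '5', room: '4' },
];

// Hypothetical per-criterion weights; these would need tuning (or learning).
const WEIGHTS = { timeOfDay: 1, dayOfWeek: 1, dayOfMonth: 1, room: 1, name: 1 };

// Distance between two values on a cycle, normalised to [0, 1]
// (e.g. 23:00 and 01:00 are 2 hours apart, not 22).
function cyclicDiff(a, b, period) {
  const d = Math.abs(a - b) % period;
  return Math.min(d, period - d) / (period / 2);
}

// Weighted distance: cyclic for time fields, 0/1 mismatch for categorical fields.
function distance(a, b) {
  return (
    WEIGHTS.timeOfDay * cyclicDiff(Number(a.timeOfDay), Number(b.timeOfDay), 86400) +
    WEIGHTS.dayOfWeek * cyclicDiff(Number(a.dayOfWeek), Number(b.dayOfWeek), 7) +
    WEIGHTS.dayOfMonth * cyclicDiff(Number(a.dayOfMonth), Number(b.dayOfMonth), 31) +
    WEIGHTS.room * (a.room === b.room ? 0 : 1) +
    WEIGHTS.name * (a.name === b.name ? 0 : 1)
  );
}

// Insert `entry` into `sorted` (ascending by dist) using binary search,
// so adding one item does not require re-sorting the whole array.
function insertSorted(sorted, entry) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid].dist <= entry.dist) lo = mid + 1;
    else hi = mid;
  }
  sorted.splice(lo, 0, entry);
  return sorted;
}

// k nearest neighbours of `query`, closest first.
function nearest(list, query, k = list.length) {
  const result = [];
  for (const item of list) insertSorted(result, { item, dist: distance(item, query) });
  return result.slice(0, k);
}

const query = { name: 'item2', timeOfDay: '36000', dayOfWeek: '5', dayOfMonth: '3', room: '2' };
console.log(nearest(items, query).map((r) => `${r.item.name} (room ${r.item.room}): ${r.dist.toFixed(2)}`));
```

With equal weights, the `item2` in room 2 comes out closest. Learning "which criterion is the strongest" would amount to tuning `WEIGHTS`, which this sketch does not attempt; a distance of 0 could be one way to detect identical items to merge.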

Related

How to create an object from a Python array

I have the following structure, which I load from a .txt file with pandas:
[[000001, 'PEPE ', 'S', 'LAST_NAME ', 'CIP ', 'CELLPHONE'],
[0000002, 'LUIS ', 'S', 'ADRESS ', ' ', 'nan'],
[0000003, 'PEDRO ', 'S', 'STREET ', 'CITY', ' nan']]
My code
import pandas as pd
file = 'C:\\Users\\Admin\\Desktop\\PRUEBA.txt'
columns = ("service", "name", "Active", "reference1", "reference2", "reference3")
df = pd.read_csv(file, sep="|", names=columns, header=None)
cl = df.values.tolist()
print(cl)
But I need to process it further: remove the empty strings and nan values, convert service to int, and create one object per service and reference, like this:
[
{ service: 1, name: 'PEPE', order: 0, ref: 'LAST_NAME' },
{ service: 1, name: 'PEPE', order: 1, ref: 'CIP' },
{ service: 1, name: 'PEPE', order: 2, ref: 'CELLPHONE' },
{ service: 2, name: 'LUIS', order: 0, ref: 'ADRESS' },
{ service: 3, name: 'PEDRO', order: 0, ref: 'STREET' },
{ service: 3, name: 'PEDRO', order: 1, ref: 'CITY' }
]
How can I achieve this? I'm very grateful for your comments.
Key: use df.melt() to unpivot the table, then df.to_dict(orient='records') to convert the DataFrame to a list of record-oriented dicts, as mentioned by @QuangHoang. The rest is regular filtering and minor adjustments.
import pandas as pd

# data
ls = [['000001', 'PEPE ', 'S', 'LAST_NAME ', 'CIP ', 'CELLPHONE'],
      ['0000002', 'LUIS ', 'S', 'ADRESS ', ' ', 'nan'],
      ['0000003', 'PEDRO ', 'S', 'STREET ', 'CITY', ' nan']
      ]
df = pd.DataFrame(ls, columns=("service", "name", "Active", "reference1", "reference2", "reference3"))
# convert service to int and strip whitespace from every other column
for col in df:
    if col == "service":
        df[col] = df[col].astype(int)
    else:
        df[col] = df[col].str.strip()  # .str accessor
# unpivot and adjust; a stable sort keeps reference1..3 in their original order
df2 = (df.melt(id_vars=["service", "name"],
               value_vars=["reference1", "reference2", "reference3"],
               value_name="ref")
         .sort_values(by="service", kind="mergesort")
         .drop("variable", axis=1)
         .reset_index(drop=True))
# filter out empty strings and "nan" (copy to avoid SettingWithCopyWarning below)
df2 = df2[~df2["ref"].isin(["", "nan"])].copy()
# generate order numbering within each service
df2["order"] = df2.groupby("service").cumcount()
df2 = df2[["service", "name", "order", "ref"]]  # reorder columns
# convert to a list of record-oriented dicts
df2.to_dict(orient='records')
Out[99]:
[{'service': 1, 'name': 'PEPE', 'order': 0, 'ref': 'LAST_NAME'},
{'service': 1, 'name': 'PEPE', 'order': 1, 'ref': 'CIP'},
{'service': 1, 'name': 'PEPE', 'order': 2, 'ref': 'CELLPHONE'},
{'service': 2, 'name': 'LUIS', 'order': 0, 'ref': 'ADRESS'},
{'service': 3, 'name': 'PEDRO', 'order': 0, 'ref': 'STREET'},
{'service': 3, 'name': 'PEDRO', 'order': 1, 'ref': 'CITY'}]

How to add condition in mongoose?

I need a Mongoose query that selects only one record per $or condition. There is a collection of blogs, some in English and some in French. Certain blogs are duplicates: they have the same content in a different language, and they share the same 'group_id'.
If I filter blogs by some criterion, like 'category', only blogs in the user's language ('userLanguage') should show; if there is no such version, the other language should be shown instead. How do I write a Mongoose query for this?
I tried:
db.find({ $and: [{ category: 'Health' }, { $or: [{ language: 'en' }, { language: 'fr' }] }] })
But it returns 2 records if both 'en' and 'fr' are present; I need only one of them, 'en' or 'fr'. How can I implement this in one query?
These are some sample documents:
{_id: '1', category_id: 'xyz', language: 'en', group_id: 'aaa'}
{_id: '2', category_id: 'xyz', language: 'fr', group_id: 'aaa'}
{_id: '3', category_id: 'xyz', language: 'en', group_id: 'bbb'}
{_id: '4', category_id: 'xyz', language: 'fr', group_id: 'ccc'}
I request category_id: 'xyz' and language: 'en', so the result should be:
{_id: '1', category_id: 'xyz', language: 'en', group_id: 'aaa'}
{_id: '3', category_id: 'xyz', language: 'en', group_id: 'bbb'}
{_id: '4', category_id: 'xyz', language: 'fr', group_id: 'ccc'}
I have an array of the distinct group_ids, ['aaa', 'bbb', 'ccc'], so the condition is something like:
group_id $in group_ids && language should be 'en', otherwise 'fr'
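Use a findOne query, which returns only one document, like this (note that it returns a single document for the whole query, not one per group_id):
db.findOne({ $and: [{ category: 'Health' }, { $or: [{ language: 'en' }, { language: 'fr' }] }] });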
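To get one document per group_id rather than one document overall, the documents can be grouped by group_id with the preferred language ranked first. The sketch below shows that logic in plain JavaScript on the sample documents (using category_id, as in the samples), plus a builder for an aggregation pipeline expressing the same idea server-side with $match, $sort, $group/$first and $replaceRoot. The function names are hypothetical, and the pipeline is an untested assumption about how this could be passed to Mongoose's Model.aggregate().

```javascript
// Sample documents from the question.
const docs = [
  { _id: '1', category_id: 'xyz', language: 'en', group_id: 'aaa' },
  { _id: '2', category_id: 'xyz', language: 'fr', group_id: 'aaa' },
  { _id: '3', category_id: 'xyz', language: 'en', group_id: 'bbb' },
  { _id: '4', category_id: 'xyz', language: 'fr', group_id: 'ccc' },
];

// Keep one document per group_id, preferring `preferred` over `fallback`.
function onePerGroup(list, categoryId, preferred, fallback) {
  const best = new Map(); // group_id -> chosen document
  for (const doc of list) {
    if (doc.category_id !== categoryId) continue;
    if (doc.language !== preferred && doc.language !== fallback) continue;
    const current = best.get(doc.group_id);
    // Take the doc if the group is still empty, or if it upgrades fallback -> preferred.
    if (!current || (current.language !== preferred && doc.language === preferred)) {
      best.set(doc.group_id, doc);
    }
  }
  return [...best.values()];
}

// Server-side equivalent (assumed shape): rank the preferred language first,
// then keep the first document of each group_id.
function preferredLanguagePipeline(categoryId, preferred, fallback) {
  return [
    { $match: { category_id: categoryId, language: { $in: [preferred, fallback] } } },
    { $addFields: { langRank: { $cond: [{ $eq: ['$language', preferred] }, 0, 1] } } },
    { $sort: { langRank: 1 } },
    { $group: { _id: '$group_id', doc: { $first: '$$ROOT' } } },
    { $replaceRoot: { newRoot: '$doc' } },
    { $project: { langRank: 0 } },
  ];
}

console.log(onePerGroup(docs, 'xyz', 'en', 'fr').map((d) => d._id));
```

The output order of $group is not guaranteed, so a final $sort stage would be needed if the result order matters.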

Querying a result returned by mongoose in nodejs

I am relatively new to Node.js, Mongoose and MongoDB.
I want to implement a filter so users can filter products by the criteria they select.
Is it possible in Node.js to query a response returned by Mongoose?
Here is a sample response from Mongoose:
[ { _id: 589860c21f9997fce3502f10,
title: 'Watch',
brand: 'PUMA2',
store: 'ZARA',
for: 'MALE',
size: '32',
colour: 'RED',
userId: '58a420cd7c77aca4b3ce34cd' },
{ _id: 5899bd33c28dbdf2b938f698,
title: 'Watch 2',
brand: 'PUMA',
store: 'ZARA',
for: 'MALE',
size: '32',
colour: 'RED',
userId: '58a420cd7c77aca4b3ce34cd' },
{ _id: 5899bd59c28dbdf2b938f69a,
title: 'Watch 4',
brand: 'PUMA',
store: 'ZARA',
for: 'MALE',
size: '32',
colour: 'RED',
userId: '5899bde3c28dbdf2b938f69e' }]
Now, how can I query this response to select data based on brand?
For example, to find the PUMA brand, the query would be:
db.collection.find({'brand': 'PUMA'})
Thanks for the help, everyone. I finally found a nice package called array-query, which helped me achieve what I want.
https://www.npmjs.com/package/array-query
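Besides array-query, plain Array.prototype.filter can do this without a dependency. The sketch below uses a trimmed copy of the sample response, with _id shown as strings (Mongoose would return ObjectId instances), and a hypothetical filterBy helper that matches every key/value pair in a criteria object. When the criteria are known before querying, passing them straight to find() instead lets MongoDB do the filtering.

```javascript
// Trimmed sample response from the question.
const products = [
  { _id: '589860c21f9997fce3502f10', title: 'Watch', brand: 'PUMA2', colour: 'RED', userId: '58a420cd7c77aca4b3ce34cd' },
  { _id: '5899bd33c28dbdf2b938f698', title: 'Watch 2', brand: 'PUMA', colour: 'RED', userId: '58a420cd7c77aca4b3ce34cd' },
  { _id: '5899bd59c28dbdf2b938f69a', title: 'Watch 4', brand: 'PUMA', colour: 'RED', userId: '5899bde3c28dbdf2b938f69e' },
];

// Keep only the items whose fields equal every key/value pair in `criteria`.
function filterBy(items, criteria) {
  return items.filter((item) =>
    Object.entries(criteria).every(([key, value]) => item[key] === value)
  );
}

console.log(filterBy(products, { brand: 'PUMA' }).map((p) => p.title)); // [ 'Watch 2', 'Watch 4' ]
```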

merging tables - inserting every nth

I have a case where I need to merge 2 tables into one, with an item from the second table inserted at every nth position of the merged table.
To illustrate: I have a products table, and I need an advert to appear as every nth (say 4th) result of the final output. I have assumed the best way to handle this is to keep the two source tables and merge the desired result into a new table (if there's a better way using just one table and inserting into it, I'm interested).
The main data source, products:
var productDataTable = [
{id: '1', content_type: 'product', name:'t-shirt'},
{id: '2', content_type: 'product', name:'t-shirt2'},
{id: '3', content_type: 'product', name:'t-shirt3'},
// .....
];
The secondary data source, adverts:
var advertsTable = [
{id: 'advert1', content_type: 'advert', name: null},
{id: 'advert2', content_type: 'advert', name: null},
{id: 'advert3', content_type: 'advert', name: null},
// ....
];
Desired output of the final table, with an advert inserted as every nth (here 4th) item:
var feedOutput = [
{id: '1', content_type: 'product', name:'t-shirt'},
{id: '2', content_type: 'product', name:'t-shirt2'},
{id: '3', content_type: 'product', name:'t-shirt3'},
{id: 'advert1', content_type: 'advert', name: null},
{id: '4', content_type: 'product', name:'t-shirt4'},
{id: '5', content_type: 'product', name:'t-shirt5'},
{id: '6', content_type: 'product', name:'t-shirt6'},
{id: 'advert2', content_type: 'advert', name: null},
];
Query example:
var params = {
TableName: 'feedOutput',
};
docClient.scan(params, function(err, data){
console.log(data)
});
Question:
What would be the most appropriate way to join these two tables in this nth manner? I am using Node.js with the AWS SDK.
Thanks in advance.
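One option, rather than materialising a third feedOutput table, is to read both tables and interleave them in Node.js at request time. This is an assumption about what fits the app, not the only design. With DocumentClient.scan the rows arrive in data.Items, and larger tables are paginated via LastEvaluatedKey; a scan also returns items in no guaranteed order, so the products would likely need sorting (for example by id) before merging. The interleave helper below is hypothetical and works on plain arrays built from those results.

```javascript
// Merge two lists: one advert after every `productsPerAdvert` products.
// If adverts run out, the remaining products are appended without adverts.
function interleave(products, adverts, productsPerAdvert) {
  const out = [];
  let nextAdvert = 0;
  products.forEach((product, i) => {
    out.push(product);
    if ((i + 1) % productsPerAdvert === 0 && nextAdvert < adverts.length) {
      out.push(adverts[nextAdvert++]);
    }
  });
  return out;
}

// Sample data shaped like productDataTable / advertsTable in the question.
const productDataTable = Array.from({ length: 6 }, (_, i) => ({
  id: String(i + 1),
  content_type: 'product',
  name: i === 0 ? 't-shirt' : `t-shirt${i + 1}`,
}));
const advertsTable = [1, 2, 3].map((n) => ({ id: `advert${n}`, content_type: 'advert', name: null }));

// Every 4th item of the output is an advert, so 3 products go between adverts.
console.log(interleave(productDataTable, advertsTable, 3).map((x) => x.id));
```

Keeping the merge in code means changing n (or the advert rotation) does not require rewriting a stored table.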

Transform a list of dicts into a simpler dict

I have a list of dicts like this:
[{
'attr': 'bla',
'status': '1',
'id': 'id1'
}, {
'attr': 'bla',
'status': '1',
'id': 'id2'
}, {
'attr': 'bli',
'status': '0',
'id': 'id1'
}, {
'attr': 'bli',
'status': '1',
'id': 'id2'
}]
I want to get a simpler result dict like this:
result = {
'bla' : True,
'bli' : False
}
If both ids have a status of '1' for an attr, the value should be True; otherwise, it should be False.
I've tried:
for elem in dict:
    for key, value in enumerate(elem):
        # ???
But I don't see how to do it. I've also tried something like:
if all(val == '1' for val in list):
    # ..
Here you go:
dicts = [{
'attr': 'bla',
'status': '1',
'id': 'id1'
}, {
'attr': 'bla',
'status': '1',
'id': 'id2'
}, {
'attr': 'bli',
'status': '0',
'id': 'id1'
}, {
'attr': 'bli',
'status': '1',
'id': 'id2'
}]
# First pass: create an entry for every attr in the new dictionary,
# so the `and` operator can be applied to it later.
newDict = {}
for dictio in dicts:
    for key, value in dictio.items():
        if key == 'attr':
            newDict[value] = True
# Second pass: AND together the statuses of each attr
# (relies on 'attr' coming before 'status' in each dict).
for dictio in dicts:
    for key, value in dictio.items():
        if key == 'attr':
            tmpAttr = value
        if key == 'status':
            newDict[tmpAttr] = newDict[tmpAttr] and (value == '1')
print(newDict)
Have a nice day!
