How can I make a list of lists from a Neo4j record? - python-3.x

I am having trouble building a list inside a loop. I am trying to populate a list from Neo4j records:
myquery="""MATCH (c :Customer {walletId:$item})-[:MR|:SENDS_MONEY]-(d)-[:PAYS]->(m)
WHERE NOT (c)-[]-(m)
RETURN c.walletId, m.walletId, m.name, COUNT(m.name) ORDER BY COUNT(m.name) DESC LIMIT 30"""
result = graphdbsessionwallet.run(myquery, item=item)
# print(result)
for record in result:
    print(list(record))
My current result is:
['01302268120', '01685676658', 'Shojon Medical Hall', 6]
['01302268216', '01733243988', 'APEXFOOTWEAR LIMITED', 1]
and so on.
Desired result:
[['01302268120', '01685676658', 'Shojon Medical Hall', 6],['01302268216', '01733243988', 'APEXFOOTWEAR LIMITED', 1]]
I want to put these lists into one list. Kindly help me solve this.

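You can modify your query to return the whole list with the help of the COLLECT() aggregating function. Apply LIMIT before collecting, because COLLECT() reduces the result to a single row and a LIMIT placed after it would no longer restrict the 30 entries:
MATCH (c :Customer {walletId:$item})-[:MR|:SENDS_MONEY]-(d)-[:PAYS]->(m)
WHERE NOT (c)-[]-(m)
WITH c, m, COUNT(m.name) AS cnt
ORDER BY cnt DESC
LIMIT 30
RETURN COLLECT([c.walletId, m.walletId, m.name, cnt])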
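If you would rather keep the original query and build the nested list on the Python side, a list comprehension over the result should do it. This is a minimal sketch, assuming each record converts with list(record) as in the question. records_to_lists is a hypothetical helper name, and plain tuples stand in for driver records so the snippet runs without a database:

```python
def records_to_lists(records):
    """Turn an iterable of records into a list of lists (one inner list per record)."""
    return [list(record) for record in records]


# Tuples stand in for Neo4j records here (an assumption for the demo);
# in the question's code you would pass `result` instead.
sample_records = [
    ('01302268120', '01685676658', 'Shojon Medical Hall', 6),
    ('01302268216', '01733243988', 'APEXFOOTWEAR LIMITED', 1),
]
rows = records_to_lists(sample_records)
print(rows)
```

In the question's loop this amounts to replacing the print loop with rows = [list(record) for record in result].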

Related

Normalising units/Replace substrings based on lists using Python

I am trying to normalize weight units in a string.
E.g. (original - normalized):
1. SUCO MARACUJA COM GENGIBRE PCS 300 Millilitre - SUCO MARACUJA COM GENGIBRE PCS 300 ML
2. OVOS CAIPIRAS ANA MARIA BRAGA 10UN - OVOS CAIPIRAS ANA MARIA BRAGA 10U
3. SUCO MARACUJA MAMAO PCS 300 Gram - SUCO MARACUJA MAMAO PCS 300 G
4. SUCO ABACAXI COM MACA PCS 300Milli litre - SUCO ABACAXI COM MACA PCS 300ML
The keyword table is:
unit = ['Kilo','Kilogram','Gram','Milligram','Millilitre','Milli litre',
        'Dozen','Litre','Un','Und','Unid','Unidad','Unidade','Unidades']
norm_unit = ['KG','KG','G','MG','ML','ML','DZ','L','U','U','U','U','U','U']
I tried to treat these lists as a lookup table, but I am having difficulty comparing two dataframes or tables in Python.
I tried the code below.
import re

unit = ['Kilo','Kilogram','Gram','Milligram','Millilitre','Milli litre',
        'Dozen','Litre','Un','Und','Unid','Unidad','Unidade','Unidades']
norm_unit = ['KG','KG','G','MG','ML','ML','DZ','L','U','U','U','U','U','U']
z = 'SUCO MARACUJA COM GENGIBRE PCS 300 Millilitre'
# for row in mongo_docs:
#     z = row['clean_hntproductname']
for x in unit:
    for y in norm_unit:
        if re.search(r'\s' + x + r'$', z, re.I):
            pass  # placeholder so the block parses; the attempted replacement is commented out below
            # clean_hntproductname = t.lower().replace(x.lower(),y.lower())
            # myquery3 = { "_id" : row['_id']}
            # newvalues3 = { "$set": {"clean_hntproductname" : 'clean_hntproductname'} }
            # ds_hnt_prod_data.update_one(myquery3, newvalues3)
I'm using Python (Jupyter) with MongoDB (Compass), fetching data from Mongo and writing back to it.
From my understanding you want to:
Update all the rows in a table which contain the words in the unit array, to the ones in norm_unit.
(Disclaimer: I'm not familiar with MongoDB or Python.)
What you want is to create a mapping (a hash, which is a dict in Python) from the words you want to change to their replacements.
Here's a trivial solution (i.e. not the best solution, but it should point you in the right direction):
unit_conversions = {
    'Kilo': 'KG',
    'Kilogram': 'KG',
    'Gram': 'G'
}
# pseudo-code
for each row that you want to update
    item_description = get the value of the string in the column
    for each key in unit_conversions (e.g. 'Kilo')
        see if the item_description contains the key
        if it does, replace it with unit_conversions[key] (e.g. 'KG')
    update the row
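The mapping idea can be sketched in runnable Python as follows. This is only one possible reading of the requirement: it assumes the unit always sits at the end of the string and directly follows a number (with or without a space), which covers all four examples in the question, including 10UN. The names UNIT_MAP and normalize_unit are made up for this illustration, and the MongoDB write-back is left out.

```python
import re

# Mapping built from the question's two parallel lists.
UNIT_MAP = {
    'Kilo': 'KG', 'Kilogram': 'KG', 'Gram': 'G', 'Milligram': 'MG',
    'Millilitre': 'ML', 'Milli litre': 'ML', 'Dozen': 'DZ', 'Litre': 'L',
    'Un': 'U', 'Und': 'U', 'Unid': 'U', 'Unidad': 'U',
    'Unidade': 'U', 'Unidades': 'U',
}
# Case-insensitive lookup table for the matched text.
_LOOKUP = {key.lower(): value for key, value in UNIT_MAP.items()}
# Longest alternatives first, so 'Kilogram' is tried before 'Kilo'.
_ALTERNATIVES = '|'.join(
    re.escape(key) for key in sorted(UNIT_MAP, key=len, reverse=True)
)
# Assumption: a digit, optional whitespace, then the unit at the end of the string.
_PATTERN = re.compile(r'(?<=\d)(\s*)(' + _ALTERNATIVES + r')$', re.IGNORECASE)


def normalize_unit(name):
    """Replace a trailing unit word with its normalized abbreviation."""
    return _PATTERN.sub(
        lambda m: m.group(1) + _LOOKUP[m.group(2).lower()], name
    )


for text in [
    'SUCO MARACUJA COM GENGIBRE PCS 300 Millilitre',
    'OVOS CAIPIRAS ANA MARIA BRAGA 10UN',
    'SUCO ABACAXI COM MACA PCS 300Milli litre',
]:
    print(normalize_unit(text))
```

Strings without a trailing unit are returned unchanged, so the function can be applied to every row before writing the value back.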

Numerical integration of a numpy array in incremental time steps

I have two arrays. The first one is time in terms of Age (yrs) and the second one is a parameter that needs to be integrated with respect to time.
age = [5.00000e+08, 5.60322e+08, 6.27922e+08, 7.03678e+08, 7.88572e+08,
8.83709e+08, 9.90324e+08, 1.10980e+09, 1.24369e+09, 1.39374e+09,
1.56188e+09, 1.75032e+09, 1.96148e+09, 2.19813e+09, 2.46332e+09,
2.76050e+09, 3.09354e+09, 3.46676e+09, 3.88501e+09, 4.35371e+09,
4.87897e+09, 5.46759e+09, 6.12722e+09, 6.86644e+09, 7.69484e+09,
8.62318e+09, 9.66352e+09, 1.08294e+10, 1.21359e+10, 1.36000e+10]
sfr = [1.86120543e-02, 1.46680445e-02, 1.07275184e-02, 8.56960274e-03,
6.44041855e-03, 4.93194263e-03, 3.69203448e-05, 2.69813985e-04,
6.17644783e-04, 1.00780427e-02, 1.20645391e-02, 3.05009362e-02,
3.91535011e-02, 5.35479858e-02, 7.36489068e-02, 9.63931263e-02,
1.11108326e-01, 1.47781221e-01, 1.63057763e-01, 2.27429626e-01,
2.20941333e-01, 2.74413180e-01, 2.72010867e-01, 4.32215233e-01,
5.79654549e-01, 7.39362218e-01, 9.41168727e-01, 1.18868347e+00,
1.42839043e+00, 1.91326333e+00]
I want to perform integration of sfr array with respect to age array, but in steps.
For example, the first integration should contain only the first elements of both arrays, the second integration should contain the first 2 elements of both arrays, the third should have first 3 elements of both arrays and so on and so forth. And save the integration result for each step in a single output array.
The exact form of your desired result is not so clear, so here are two possibilities:
age = [5.00000e+08, 5.60322e+08, 6.27922e+08, 7.03678e+08, 7.88572e+08,
8.83709e+08, 9.90324e+08, 1.10980e+09, 1.24369e+09, 1.39374e+09,
1.56188e+09, 1.75032e+09, 1.96148e+09, 2.19813e+09, 2.46332e+09,
2.76050e+09, 3.09354e+09, 3.46676e+09, 3.88501e+09, 4.35371e+09,
4.87897e+09, 5.46759e+09, 6.12722e+09, 6.86644e+09, 7.69484e+09,
8.62318e+09, 9.66352e+09, 1.08294e+10, 1.21359e+10, 1.36000e+10]
sfr = [1.86120543e-02, 1.46680445e-02, 1.07275184e-02, 8.56960274e-03,
6.44041855e-03, 4.93194263e-03, 3.69203448e-05, 2.69813985e-04,
6.17644783e-04, 1.00780427e-02, 1.20645391e-02, 3.05009362e-02,
3.91535011e-02, 5.35479858e-02, 7.36489068e-02, 9.63931263e-02,
1.11108326e-01, 1.47781221e-01, 1.63057763e-01, 2.27429626e-01,
2.20941333e-01, 2.74413180e-01, 2.72010867e-01, 4.32215233e-01,
5.79654549e-01, 7.39362218e-01, 9.41168727e-01, 1.18868347e+00,
1.42839043e+00, 1.91326333e+00]
# Option 1: for each step i, the (age, sfr) pairs of the first i elements.
# The range ends at len(age) + 1 so that the last step covers the full arrays.
integr_pairs = [list(zip(age[:i], sfr[:i])) for i in range(1, len(age) + 1)]
print(integr_pairs)
# [[(500000000.0, 0.0186120543)], [(500000000.0, 0.0186120543), (560322000.0, 0.0146680445)], ....
# Option 2: the same data flattened into one list per step.
integr_list = [[item for pair in zip(age[:i], sfr[:i]) for item in pair] for i in range(1, len(age) + 1)]
print(integr_list)
# [[500000000.0, 0.0186120543], [500000000.0, 0.0186120543, 560322000.0, 0.0146680445], ....
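If the goal is the running value of the integral itself rather than the sliced inputs, a cumulative trapezoidal sum is one way to get it. The question does not say which integration rule to use, so the trapezoidal rule here is an assumption, and cumulative_trapezoid is just a name chosen for this sketch. Entry i of the output integrates the first i + 1 samples, so the first entry (a single point) is 0.0:

```python
def cumulative_trapezoid(x, y):
    """Running trapezoidal integral of y with respect to x.

    Entry i is the integral over the first i + 1 samples, so entry 0 is 0.0.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) == 0:
        return []
    totals = [0.0]
    for i in range(1, len(x)):
        # Area of the trapezoid between sample i - 1 and sample i.
        area = 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1])
        totals.append(totals[-1] + area)
    return totals


# First three samples from the question, used only as a demo.
age_demo = [5.00000e+08, 5.60322e+08, 6.27922e+08]
sfr_demo = [1.86120543e-02, 1.46680445e-02, 1.07275184e-02]
steps = cumulative_trapezoid(age_demo, sfr_demo)
print(steps)
```

If SciPy is available, scipy.integrate.cumulative_trapezoid computes a comparable running integral directly on NumPy arrays.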

How to sum map values based on two keys?

I have a map with two keys (customer and price) like the one shown below:
[
customer: ['Clinton', 'Clinton', 'Mark', 'Antony', 'Clinton', 'Mark'],
price: [15000.0, 27000.0, 28000.0, 56000.0, 21000.0, 61000.0]
]
The customer and price values are matched by index position, i.e. the first name in the customer list maps to the first price, and so on.
Example:
Clinton price 15000.0
Clinton price 27000.0
Mark price 28000.0
Antony price 56000.0
Clinton price 21000.0
Mark price 61000.0
I would like to sum up price grouped by names. Expected output:
Clinton price 63000
Mark price 89000
Antony price 56000
Are there any built-in functions to achieve this in Groovy, or do I need to iterate over the map and sum values by writing my own functions?
You can start with a transpose on both lists to get tuples of customer and price. From there it's basically like the other answers (group by customer, then build a map of customer to summed-up prices). E.g.:
def data = [
    customer: ['Clinton', 'Clinton', 'Mark', 'Antony', 'Clinton', 'Mark'],
    price: [15000.0, 27000.0, 28000.0, 56000.0, 21000.0, 61000.0]
]
println(
    [data.customer, data.price].transpose().groupBy{ c, p -> c }.collectEntries{ c, ps -> [c, ps*.last().sum() ] }
)
// => [Clinton:63000.0, Mark:89000.0, Antony:56000.0]
In problems like this, it is better to store every entry as a separate object inside a list; that makes the data more intuitive and easier to manipulate later.
In that case the same result can be obtained naturally:
def list = [
[customer: 'Clinton', price: 15000.0],
[customer: 'Clinton', price: 27000.0],
[customer: 'Mark', price: 28000.0],
[customer: 'Antony', price: 56000.0],
[customer: 'Clinton', price: 21000.0],
[customer: 'Mark', price: 61000.0]
]
def map = list.groupBy({it.customer}).collectEntries {k, v -> [k, v.price.sum()]}
map.each {println it}
The following creates a map of prices aggregated by customer (here map is the input map holding the customer and price lists):
def vals = (0..(-1+map.customer.size()))
.collect{['name':map.customer[it], 'price': map.price[it]]}
.groupBy{it['name']}
.collectEntries{[(it.key): it.value.collect{it['price']}.sum()]}
That results in:
[Clinton:63000.0, Mark:89000.0, Antony:56000.0]
It's essentially an iteration using a range of numbers from 0 to map.customer.size() - 1, followed by a group-by with sum of values.
This version is derived from @cfrick's answer, with an alternative to the summation. Using a map with default values isn't as "functional"/declarative, but IMHO the code is arguably easier to grok later (i.e. for maintenance):
Given:
def map = [
customer: ['Clinton', 'Clinton', 'Mark', 'Antony', 'Clinton', 'Mark'],
price: [15000.0, 27000.0, 28000.0, 56000.0, 21000.0, 61000.0]
]
Approach:
def getSumMap = { data ->
    def result = [:].withDefault { c -> 0.0 }
    [data.customer, data.price].transpose().each { c, p -> result[c] += p }
    result
}
assert 63000.0 == getSumMap(map)['Clinton']
assert 89000.0 == getSumMap(map)['Mark']
assert 56000.0 == getSumMap(map)['Antony']

Python3: Adding two sets of dictionaries into new format

I have two dictionaries,
MaleDict = {'Jason':[(2014, 394),(2013, 350)...],
'Stephanie':[(2014, 3), (2013, 21),..]....}
FemaleDict = {'Jason':[(2014, 56),(2013, 23)...],
'Stephanie':[(2014, 335), (2013, 217),..]....}
I am attempting to add them so that
CompleteDict = {'Jason':[(2014, 394, 56),(2013, 350, 23)...],
'Stephanie':[(2014, 3, 335), (2013, 21, 217),..]....}
I have created a list comprehension that completes the task when each dictionary has that year present. However, I need the output to include the year even if it is missing from either MaleDict or FemaleDict. For example, if one year was not in MaleDict, the output would read ...'Stephanie':[....., (1999, 0, 389), ....]...
My list comprehensions are:
for name_key in name_keys:
    for year_key in year_keys:
        [BaseDict[name_key].append((year_key, a[1], b[1])) for a in MaleDict[name_key] for b in FemaleDict[name_key] if (year_key == a[0] == b[0])]
        # This is where I am stuck. My list comprehensions don't work when there is no value for a specific year
        [BaseDict[name_key].append((year_key, a[1], 0)) for a in MaleDict[name_key] for b in FemaleDict[name_key] if (year_key == a[0] != b[0])]
        [BaseDict[name_key].append((year_key, 0, b[1])) for a in MaleDict[name_key] for b in FemaleDict[name_key] if (year_key != a[0] == b[0])]
print(BaseDict)
If your data format is not set in stone, I would consider using a defaultdict from collections for the per-name values:
Instead of [(2014, 394),(2013, 350)...]
use collections.defaultdict(int, {2014: 394, 2013: 350}) etc.
Then a missing year simply yields 0, and you can use
for name_key in name_keys:
    for year_key in year_keys:
        # assumes CompleteDict[name_key] already exists as a dict
        CompleteDict[name_key][year_key] = [MaleDict[name_key][year_key], FemaleDict[name_key][year_key]]
CompleteDict['Stephanie'][1999] will then be [0, 389]
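If the list-of-tuples format has to stay, the same zero-fill idea works by turning each name's list into a temporary dict and looking years up with a default. This is a minimal sketch with made-up sample data in the question's layout; merge_counts is a hypothetical helper name, and sorting the years newest first is an assumption based on the ordering shown in the question:

```python
def merge_counts(male, female):
    """Merge {name: [(year, count), ...]} dicts into {name: [(year, m, f), ...]}.

    A year missing on one side is filled with 0 for that side.
    """
    complete = {}
    for name in sorted(set(male) | set(female)):
        male_by_year = dict(male.get(name, []))
        female_by_year = dict(female.get(name, []))
        # Union of years from both sides, newest first (assumed ordering).
        years = sorted(set(male_by_year) | set(female_by_year), reverse=True)
        complete[name] = [
            (year, male_by_year.get(year, 0), female_by_year.get(year, 0))
            for year in years
        ]
    return complete


# Sample data in the question's layout; the 1999 entry is female-only.
male_demo = {'Jason': [(2014, 394), (2013, 350)],
             'Stephanie': [(2014, 3), (2013, 21)]}
female_demo = {'Jason': [(2014, 56), (2013, 23)],
               'Stephanie': [(2014, 335), (2013, 217), (1999, 389)]}
merged = merge_counts(male_demo, female_demo)
print(merged['Stephanie'])
```

Building the two lookup dicts once per name avoids the nested scan over both lists that the comprehensions in the question perform for every year.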

data.frame slicing

I hope this question is not too simple for this board.
I have created a data.frame df:
CAS Name CID
89 13010-47-4 Lomustine 3950
90 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
91 130636-43-0 Nifekalant 268083
92 130929-57-6 Entacapone 5281081
and a vector vec
[1] 5282380 18471829 45923789 44308022 44266812 24883465 24867475 24867460
I would like to extract the rows of df that contain any of the numbers in vec. I tried to solve this problem with this code:
df$GC[(df$CID %in% vec)] = 1
df[df$GC==1,]
The problem with this solution is that I only get the rows that contain a single number in the CID column. Rows that contain several values in CID, like row 90, do not appear.
Is there an elegant solution for this problem?
Thanks in advance
Given your comment on EDi's answer (which I like) I thought I'd make a suggestion.
Squeezing comma separated values into a single column of a data frame is awkward and (in my experience) just leads to frustration. I often find it simpler to keep it in a separate data structure, a list:
dat <- read.table(text = " CAS Name CID
13010-47-4 Lomustine 3950
130209-82-4 Latanoprost 5311221,5282380,46705340,3890
130636-43-0 Nifekalant 268083
130929-57-6 Entacapone 5281081",sep = "",header = TRUE)
cid <- sapply(dat$CID,strsplit,",",USE.NAMES = FALSE)
In this form, things are often easier to work with:
ID <- c(5282380, 18471829, 45923789, 44308022, 44266812, 24883465, 24867475, 24867460, 3950)
dat[sapply(cid,function(x) {any(x %in% as.character(ID))}),]
CAS Name CID
1 13010-47-4 Lomustine 3950
2 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
You can always use rownames in dat and the names of the list to keep each item straight, if you're worried about orderings changing.
(Also note that my anonymous function is assuming that ID will be found eventually by R's scoping rules; you can alter the function to pass in ID explicitly if you like.)
One way is to use grep():
> txt <- " CAS Name CID
+ 13010-47-4 Lomustine 3950
+ 130209-82-4 Latanoprost 5311221,5282380,46705340,3890
+ 130636-43-0 Nifekalant 268083
+ 130929-57-6 Entacapone 5281081
+ "
> con <- textConnection(txt)
> df <- read.table(con, header = TRUE)
> close(con)
> ID <- c(5282380, 18471829, 45923789, 44308022, 44266812, 24883465, 24867475, 24867460, 3950)
> grep(paste("\\b", ID, "\\b", sep="", collapse = "|"), df$CID)
[1] 1 2
