I am running a clustering algorithm that clusters a collection of collections of Doubles. However once the clustering is done, I would like to figure out to which parent class each Double belongs to.
class PatientMRNA {
Patient patient
MRNA mrna
Double value
}
I am querying the database with hql and putting the selected values from the PatientMRNA table into a Collection like so:
[[x11,x12...x1m],[x21,x22...x2m]...[xn1, xn2, xnm]]
This collection gets clustered by a very nice algorithm found here: https://coviello.wordpress.com/2013/03/30/learning-functional-programming-a-k-means-implementation-in-groovy/
Once the clustering is done, our result is as follows:
[[centroid]: [[x?1, x?2...x?m],[x?1,x?2...x?m]...[x?1,x?2...x?m]] [centroid2]: [[],[]...[]]
Where each x array (belonging to a patient) value is randomized depending on the cluster it is assigned to.
My question is: Is there any way to extend the Double class in groovy so that it also has a PatientId property? Or should I really be looking at this problem in some other way?
kocko is wrong, as long as ALL of your code is in Groovy, you can use the instance metaClass as shown here:
// a list of patients
def patients = [ 'a', 'b', 'c', 'd', 'e' ]
// A list of doubles
def doubles = [ 5.0, 6.0, 2.0, 1.0, 9.0 ]
// for each double, set it's patient property to the parent at the same index
def decorated = [doubles, patients].transpose().collect { dbl, patient ->
dbl.metaClass.patient = patient
dbl
}
// sort it for fun, to prove it works
def sortedDecorated = decorated.sort(false)
// and print each out
sortedDecorated.each {
println "$it ${it.patient}"
}
That prints:
1.0 d
2.0 c
5.0 a
6.0 b
9.0 e
Of course, if you pass the list of doubles off to some Java code, then kocko is right as Java has no knowledge of the metaClass, so will just return you a list of plain undecorated Doubles
Related
I have a series of dictionaries which each contain the same keys but their values are different i.e Age in dictionary 1 = 2, Age in dictionary 2 = 4 etc etc but they are broadly identical in structure.
what I would like to do is to randomly select one of these dictionaries and then assign specific values with the dictionary to variables. i.e python randomly chooses Dictionary 1 and then I then want to fill the dictAge variable with the age value from Dictionary 1.
import random
dictList = ['myDict', 'otherDict']
mydict = {
'age' : 10,
'other': "dummy data"
}
.
.
.
randomDict = random.choice(dictList)
dictAge = randomDict['age']
print(dictAge)
In the case of the code above what should happen is:
randomDict is assigned a random value from the distList variable (at the top). This sets which dictionary's values will be used going forward.
I next want the dictAge variable to then be assigned the age value from the selected dictionary. In this case (as mydict is was the only dictionary available) it should be assigned the age value of 10.
The error I am getting is:
TypeError: string indices must be integers
I know this is such a common error but my brain can't quite work out what the best solution is.
(Disclaimer: I haven't used python in ages so I know I am doing something really obviously silly but I can't quite work out what to do).
Right now, you are not actually using the definition of your dicts.
This is because dictList is comprised of strings: ['myDict', 'otherDict'].
So, when doing randomDict = random.choice(dictList), randomDict will either be the string 'myDict', or the string 'otherDict'.
Then you are doing randomDict['age'], which means you are trying to slice a string, with a string. As the error suggests, this can't be done and indices can only be ints.
What you want to do, is move the definition of the dictList to be after the definitions of your dicts, and include references to the dicts themselves, not strings. Something like:
mydict = {
'age' : 10,
'other': "dummy data"
}
.
.
.
dictList = [myDict, otherDict]
In the following piece of code:
dictAge = randomDict['age']
You are trying to index the name of dictionary variable (a string) returned by random.choice function.
To make it work you would need to do it using locals:
locals()[randomDict]['age']
or rather correct the dictList to contain the dictionaries instead of their names:
dictList = [myDict, otherDict]
In the latter case please note that myDict and otherDict should be declared before dictList.
I have a map with differnt keys and multiple values.If there is any matching job among different keys,I have to display only one row and grouping code values.
def data = ['Test1':[[name:'John',dob:'02/20/1970',job:'Testing',code:51],[name:'X',dob:'03/21/1974',job:'QA',code:52]],
'Test2':[name:'Michael',dob:'04/01/1973',job:'Testing',code:52]]
for (Map.Entry<String, List<String>> entry : data.entrySet()) {
String key = entry.getKey();
List<String> values = entry.getValue();
values.eachWithIndex{itr,index->
println("key is:"+key);
println("itr values are:"+itr);
}
}
Expected Result : [job:Testing,code:[51,52]]
First flatten all the relevant maps, so whe just have a flat list of all of them, then basically the same as the others suggested: group by the job and just keep the codes (via the spread operator)
def data = ['Test1':[[name:'John',dob:'02/20/1970',job:'Testing',code:51],[name:'X',dob:'03/21/1974',job:'QA',code:52]], 'Test2':[name:'Michael',dob:'04/01/1973',job:'Testing',code:52]]
assert data.values().flatten().groupBy{it.job}.collectEntries{ [it.key, it.value*.code] } == [Testing: [51, 52], QA: [52]]
Note: the question will be changed according to the comments from the other answers.
Above code will give you the jobs and and their codes.
As of now, it's not clear, what the new expected output should be.
You can use the groovy's collection methods.
First you need to extract the lists, since you dont need the key of the top level element
def jobs = data.values()
Then you can use the groupBy method to group by the key "job"
def groupedJobs = jobs.groupBy { it.job }
The above code will produce the following result with your example
[Testing:[[name:John, dob:02/20/1970, job:Testing, code:51], [name:Michael, dob:04/01/1973, job:Testing, code:52]]]
Now you can get only the codes as values and do appropriate changes to make key as job by the following collect function
def result = groupedJobs.collect {key, value ->
[job: key, code: value.code]
}
The following code (which uses your sample data set):
def data = ['Test1':[name:'John', dob:'02/20/1970', job:'Testing', code:51],
'Test2':[name:'Michael', dob:'04/01/1973', job:'Testing', code:52]]
def matchFor = 'Testing'
def result = [job: matchFor, code: data.findResults { _, m ->
m.job == matchFor ? m.code : null
}]
println result
results in:
~> groovy solution.groovy
[job:Testing, code:[51, 52]]
~>
when run. It uses the groovy Map.findResults method to collect the codes from the matching jobs.
I'm trying to build a string that contains all attributes of a class-object. The object name is jsonData and it has a few attributes, some of them being
jsonData.Serial,
jsonData.InstrumentSerial,
jsonData.Country
I'd like to build a string that has those attribute names in the format of this:
'Serial InstrumentSerial Country'
End goal is to define a schema for a Spark dataframe.
I'm open to alternatives, as long as I know order of the string/object because I need to map the schema to appropriate values.
You'll have to be careful about filtering out unwanted attributes, but try this:
' '.join([x for x in dir(jsonData) if '__' not in x])
That filters out all the "magic methods" like __init__ or __new__.
To include those, do
' '.join(dir(jsonData))
These take advantage of Python's dir method, which returns a list of all attributes of an object.
I don't quite understand why you want to group the attribute names in a single string.
You could simply have a list of attribute names as the order of a python list is persist.
attribute_names = [x for x in dir(jsonData) if '__' not in x]
From there you can create your dataframe. If you don't need to specify the SparkTypes, you can just to:
df = SparkContext.createDataFrame(data, schema = attribute_names)
You could also create a StructType and specify the types in your schema.
I guess that you are going to have a list of jsonData records that you want to consider as Rows.
Let's considered it as a list of objects, but the logic would still be the same.
You can do that as followed:
my_object_list = [
jsonDataClass(Serial = 1, InstrumentSerial = 'TDD', Country = 'France'),
jsonDataClass(Serial = 2, InstrumentSerial = 'TDI', Country = 'Suisse'),
jsonDataClass(Serial = 3, InstrumentSerial = 'TDD', Country = 'Grece')]
def build_record(obj, attr_names):
from operator import attrgetter
return attrgetter(*attr_names)(obj)
So the data attribute referred previously would be constructed as:
data = [build_record(x, attribute_names) for x in my_object_list]
record=['MAT', '90', '62', 'ENG', '92','88']
course='MAT'
suppose i want to get the marks for MAT or ENG what do i do? I just know how to find the index of the course which is new[4:10].index(course). Idk how to get the marks.
Try this:
i = record.index('MAT')
grades = record[i+1:i+3]
In this case i is the index/position of the 'MAT' or whichever course, and grades are the items in a slice comprising the two slots after the course name.
You could also put it in a function:
def get_grades(course):
i = record.index(course)
return record[i+1:i+3]
Then you can just pass in the course name and get back the grades.
>>> get_grades('ENG')
['92', '88']
>>> get_grades('MAT')
['90', '62']
>>>
Edit
If you want to get a string of the two grades together instead of a list with the individual values you can modify the function as follows:
def get_grades(course):
i = record.index(course)
return ' '.join("'{}'".format(g) for g in record[i+1:i+3])
You can use index function ( see this https://stackoverflow.com/a/176921/) and later get next indexes, but I think you should use a dictionary.
I have an Object in Groovy like:
class Person {
def name
def age
}
And a collection of persons stored in a map:
Person a = new Person(name: 'A', age:29)
Person b = new Person(name: 'B', age:15)
Map persons = ['1':a, '2':b]
I'm trying to update the age field for all persons, I know that I can do something like:
persons.each{ k,v -> v.age=0 }
But, I was wondering if is there another way to do it without iterating the entire map. As you can see, all persons should have the same value
You can use the spread operator:
persons.values()*.age = 0