Avoid duplicates in a set - groovy

I have an array of objects which is defined as below:
def list = [[name:'test',grade:1,num:1],[name:'test1',grade:2,num:2],[name:'test',grade:1,num:1]]
I am trying to avoid duplicate num values, so I tried the following:
//Set<String> studentArray = new HashSet<String>(Arrays.asList(studentList.num));
HashSet <String> studentInfo = new HashSet <String>();
studentInfo.addAll(list.num)
println("Information:"+studentInfo);
Now I get distinct values, but in the console each value appears wrapped in an array, like [1]. How can I see only the value?

A HashSet does not allow duplicate values. The code that builds studentArray creates a set containing a single element: the list [1, 2, 1]. If you print studentArray to the console you will see something like this:
[[1, 2, 1]]
And this is correct, because the type of the constructed structure is Set<List<Integer>>. Used this way, the set would only prevent you from adding another list [1,2,1].
If you want to create a set like [1,2], you can coerce studentList.num to a Set with the as operator.
def studentList = [[name:'test',grade:1,num:1],[name:'test1',grade:2,num:2],[name:'test',grade:1,num:1]]
def studentNums = studentList.num as Set
assert studentNums == [1,2] as Set

Related

How to define a function that will convert list into dictionary while also inserting 'None' where there is no value for a key (Python3)?

Say we have a contact_list like this:
[["Joey", 30080],["Miranda"],["Lisa", 30081]]
So essentially "Miranda" doesn't have a zipcode, but with the function I want to define, I'd like it to automatically detect that and add "None" into her value slot, like this:
{
"Joey": 30080,
"Miranda": None,
"Lisa": 30081
}
So far I have this, which just converts the list to a dict:
def user_contacts(contact_list):
    dict_contact = dict(contact_list)
    print(dict_contact)
I'm not sure where to go from here to add the None for "Miranda". Currently, I just get an error saying that element #1 (["Miranda"]) has length 1 but 2 is required.
Eventually I want to pass any list like the one above to the user_contacts function and get the dictionary above as the output.
user_contacts([["Joey", 30080],["Miranda"],["Lisa", 30081]])
Here is what you can do: check whether the length of each element in your list meets the expectation (in this case 2, for name and zipcode). If it does not, append None:
contacts = [["Joey", 30080], ["Miranda"], ["Lisa", 30081]]
contacts_dict = {}
for c in contacts:
    if len(c) < 2:
        c.append(None)
    contacts_dict.update({c[0]: c[1]})
print(contacts_dict)
and the output is:
{'Joey': 30080, 'Miranda': None, 'Lisa': 30081}
Try this:
def user_contacts(contact_list):
    dict_contact = dict((ele[0], ele[1] if len(ele) > 1 else None) for ele in contact_list)
    print(dict_contact)
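A variant of the approaches above that returns the dictionary instead of printing it, and leaves the input list unmodified, might look like the following. This is only a sketch: the star-unpacking and the assumption that every inner list contains at least a name are mine, not part of the question.

```python
def user_contacts(contact_list):
    """Map each name to its zipcode, or to None when the zipcode is missing."""
    contacts = {}
    for name, *rest in contact_list:
        # rest is an empty list when the entry holds only a name
        contacts[name] = rest[0] if rest else None
    return contacts


result = user_contacts([["Joey", 30080], ["Miranda"], ["Lisa", 30081]])
print(result)
```

Returning the dictionary lets the caller decide whether to print it or use it further.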

Assigning specific dictionary values to variables

I have a series of dictionaries that all contain the same keys but with different values, e.g. age in dictionary 1 = 2, age in dictionary 2 = 4, and so on; they are otherwise identical in structure.
What I would like to do is randomly select one of these dictionaries and then assign specific values from that dictionary to variables. For example, Python randomly chooses dictionary 1, and I then want to fill the dictAge variable with the age value from dictionary 1.
import random
dictList = ['myDict', 'otherDict']
mydict = {
    'age' : 10,
    'other': "dummy data"
}
.
.
.
randomDict = random.choice(dictList)
dictAge = randomDict['age']
print(dictAge)
In the case of the code above what should happen is:
randomDict is assigned a random value from the dictList variable (at the top). This sets which dictionary's values will be used going forward.
Next, I want the dictAge variable to be assigned the age value from the selected dictionary. In this case (as mydict was the only dictionary available) it should be assigned the age value of 10.
The error I am getting is:
TypeError: string indices must be integers
I know this is such a common error but my brain can't quite work out what the best solution is.
(Disclaimer: I haven't used python in ages so I know I am doing something really obviously silly but I can't quite work out what to do).
Right now, you are not actually using the definition of your dicts.
This is because dictList is comprised of strings: ['myDict', 'otherDict'].
So, when doing randomDict = random.choice(dictList), randomDict will either be the string 'myDict', or the string 'otherDict'.
Then you are doing randomDict['age'], which means you are trying to index a string with a string. As the error says, this can't be done: string indices must be integers.
What you want to do, is move the definition of the dictList to be after the definitions of your dicts, and include references to the dicts themselves, not strings. Something like:
myDict = {
    'age' : 10,
    'other': "dummy data"
}
.
.
.
dictList = [myDict, otherDict]
In the following piece of code:
dictAge = randomDict['age']
You are trying to index the name of a dictionary variable (a string) returned by the random.choice function.
To make it work as written, you would need to look the name up using locals (at module level, locals() is the same mapping as globals()):
locals()[randomDict]['age']
or, better, correct dictList so that it contains the dictionaries instead of their names:
dictList = [myDict, otherDict]
In the latter case, please note that myDict and otherDict must be defined before dictList.
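Putting the second suggestion together, a minimal end-to-end sketch could look like this. The variable names are restyled and the contents of the second dictionary are invented for illustration, since the question elides them.

```python
import random

# Hypothetical sample data; the second dictionary is made up for this sketch.
my_dict = {"age": 10, "other": "dummy data"}
other_dict = {"age": 20, "other": "more dummy data"}

# The list holds the dictionary objects themselves, not their names as strings.
dict_list = [my_dict, other_dict]

random_dict = random.choice(dict_list)  # one of the two dictionaries
dict_age = random_dict["age"]           # indexing a dict with a string key works
print(dict_age)
```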

Read indices stored in a list

I have a dataset which is stored as a list. I want to be able to retrieve different pieces of the data and alter them. The indices of pieces I need are stored in a different list.
For example:
data_list = [[[1,2],[3,4]],[5,6]]
indices = [[0,0,1],[1,0]]
In this case I might want to retrieve data_list[0][0][1] and data_list[1][0] and change them to value 6, but I cannot simply do data_list[indices[0]] = 6. Is there a good way to do this?
You can walk down the nested lists one index at a time until you reach the list that holds the item you need.
To do this, set a variable to reference data_list, loop over all but the last index of a path, and move the reference one level deeper each time until it points at the innermost list.
Then you can set the item at the last index of that innermost list to whatever value you need.
data_list = [[[1,2],[3,4]],[5,6]]
indices = [[0,0,1],[1,0]]
for *path, final in indices:
    val = data_list
    for i in path:
        val = val[i]
    val[final] = 6
print(data_list)
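If this is needed in more than one place, the same loop can be wrapped in a small helper. The function name set_nested below is hypothetical, chosen only for this sketch.

```python
def set_nested(data, path, value):
    """Assign value at the position in nested lists described by path."""
    *parents, last = path
    target = data
    for index in parents:
        # Step one level deeper; target ends up as the innermost list
        target = target[index]
    target[last] = value


data_list = [[[1, 2], [3, 4]], [5, 6]]
indices = [[0, 0, 1], [1, 0]]
for path in indices:
    set_nested(data_list, path, 6)
print(data_list)
```

Because lists are mutable and target refers to the same inner list as data_list does, the assignment changes data_list in place.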

How do I build a string of variable names?

I'm trying to build a string that contains all attributes of a class-object. The object name is jsonData and it has a few attributes, some of them being
jsonData.Serial,
jsonData.InstrumentSerial,
jsonData.Country
I'd like to build a string that has those attribute names in the format of this:
'Serial InstrumentSerial Country'
End goal is to define a schema for a Spark dataframe.
I'm open to alternatives, as long as I know order of the string/object because I need to map the schema to appropriate values.
You'll have to be careful about filtering out unwanted attributes, but try this:
' '.join([x for x in dir(jsonData) if '__' not in x])
That filters out all the "magic methods" like __init__ or __new__.
To include those, do
' '.join(dir(jsonData))
These take advantage of Python's built-in dir function, which returns an alphabetically sorted list of the names of an object's attributes.
I don't quite understand why you want to group the attribute names in a single string.
You could simply keep a list of attribute names, since a Python list preserves its order.
attribute_names = [x for x in dir(jsonData) if '__' not in x]
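From there you can create your dataframe. If you don't need to specify the Spark types, you can just do the following, where spark is assumed to be a SparkSession (createDataFrame is a method of SparkSession, not of SparkContext):
df = spark.createDataFrame(data, schema=attribute_names)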
You could also create a StructType and specify the types in your schema.
I guess that you are going to have a list of jsonData records that you want to consider as Rows.
Let's consider it as a list of objects, but the logic would still be the same.
You can do that as follows:
my_object_list = [
    jsonDataClass(Serial = 1, InstrumentSerial = 'TDD', Country = 'France'),
    jsonDataClass(Serial = 2, InstrumentSerial = 'TDI', Country = 'Suisse'),
    jsonDataClass(Serial = 3, InstrumentSerial = 'TDD', Country = 'Grece')]
def build_record(obj, attr_names):
    from operator import attrgetter
    return attrgetter(*attr_names)(obj)
So the data attribute referred previously would be constructed as:
data = [build_record(x, attribute_names) for x in my_object_list]
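Since the question cares about attribute order, note that dir sorts names alphabetically. If the attributes are ordinary instance attributes, vars may be a better fit, because it returns the instance dictionary, which keeps assignment order in Python 3.7 and later. The JsonData class below is a hypothetical stand-in for the asker's class, and the sketch assumes the object has a __dict__ (it does not use __slots__).

```python
class JsonData:
    """Hypothetical stand-in for the class behind jsonData."""

    def __init__(self, serial, instrument_serial, country):
        self.Serial = serial
        self.InstrumentSerial = instrument_serial
        self.Country = country


json_data = JsonData(1, "TDD", "France")

# vars() returns the instance __dict__, whose keys keep assignment order
attribute_names = list(vars(json_data))
schema_string = " ".join(attribute_names)
print(schema_string)
```

With dir, the same object would list Country first, because the names are sorted.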

How can I add to Python dictionary value using string keys

I want to add string dictionary keys like this:
x = "%s-%s-%s %s:%s:00"%(dt.year,dt.month,dt.day,dt.hour,dt.minute)
dict[x] +=a1
But it gives me an error like this:
KeyError: '2015-11-26 8:47:00'
If I try print type(x), it prints str.
But if I try this:
dict = {}
x = "abc"
dict[x] = 1
print dict
it prints this:
{'abc': 1}
I don't understand what is the difference.
The first error is that you named your dictionary dict. That name is
already in use; it's the name of the built-in dictionary type. Overwriting
an existing name like this is called "shadowing". Don't do it, it will mess
you up. The examples below use d instead.
You're using +=. This implies that there's already a value associated
with the key, which can be incremented. If that key isn't in the dict
yet, you get a KeyError.
You probably want to set a default value of zero. This can be done in
various ways. The simplest is:
d[x] = d.get(x, 0) + a1
Also see the collections module in the standard library, which has a
defaultdict type.
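As a rough illustration of the defaultdict approach, the sketch below accumulates values under the same kind of timestamp key as in the question. The readings list and the name totals are invented for this example.

```python
from collections import defaultdict
from datetime import datetime

totals = defaultdict(int)  # a missing key starts at int(), i.e. 0

# Hypothetical sample readings: (timestamp, amount)
readings = [
    (datetime(2015, 11, 26, 8, 47), 5),
    (datetime(2015, 11, 26, 8, 47), 3),
    (datetime(2015, 11, 26, 8, 48), 2),
]

for dt, a1 in readings:
    x = "%s-%s-%s %s:%s:00" % (dt.year, dt.month, dt.day, dt.hour, dt.minute)
    totals[x] += a1  # no KeyError, even the first time a key is seen

print(dict(totals))
```

With a defaultdict, += works on the first occurrence of a key because the default value is created on demand.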
