(Julia 1.x) Set value of array index in struct

A Dict containing multiple Array values can have its data altered elementwise by iterating over the corresponding keys and the arrays themselves, along the lines of:
"""dictdemo.jl"""
tempdict = Dict{String, Any}("a"=>zeros(1),
                             "b"=>zeros(2),
                             "c"=>zeros(3),
                             "x"=>zeros(4),
                             "y"=>zeros(5),
                             "z"=>zeros(6))
for var ∈ ["x", "y", "z"]
    for i in eachindex(tempdict[var])
        tempdict[var][i] = rand()
    end
end
for key in sort(collect(keys(tempdict)))
    println("$key: $(tempdict[key])")
end
$> julia dictdemo.jl
a: [0.0]
b: [0.0, 0.0]
c: [0.0, 0.0, 0.0]
x: [0.0444697, 0.715464, 0.703251, 0.0739732]
y: [0.168588, 0.548075, 0.923591, 0.124419, 0.753477]
z: [0.481123, 0.976423, 0.00690676, 0.0602968, 0.326228, 0.448793]
Akin to this, I have a struct which contains multiple fields of type Array and am attempting to alter the values within those arrays elementwise for multiple fields at a time. I know of a few methods by which the array values may be set, but they are all limited in either being unable to iterate over multiple fields (mystruct.field[indices] = value) or being unable to set individual elements (setfield!(mystruct, field, value), mystruct.field = value).
"""structdemo.jl"""
mutable struct MyStruct
    a::Array{Float64,1}
    b::Array{Float64,1}
    c::Array{Float64,1}
    x::Array{Float64,1}
    y::Array{Float64,1}
    z::Array{Float64,1}
    MyStruct() = new(zeros(1),
                     zeros(2),
                     zeros(3),
                     zeros(4),
                     zeros(5),
                     zeros(6))
end
tempstruct = MyStruct()
setfield!(tempstruct, Symbol("x"), [rand(), rand(), rand(), rand()])
tempstruct.y = [rand(), rand(), rand(), rand(), rand()]
for i in eachindex(tempstruct.z)
    tempstruct.z[i] = rand()
end
for f in fieldnames(typeof(tempstruct))
    println("$f: $(getfield(tempstruct, f))")
end
$> julia structdemo.jl
a: [0.0]
b: [0.0, 0.0]
c: [0.0, 0.0, 0.0]
x: [0.222734, 0.796599, 0.565279, 0.0488704]
y: [0.67695, 0.367068, 0.384466, 0.160438, 0.154411]
z: [0.744013, 0.0358193, 0.466726, 0.562945, 0.895279, 0.815217]
I am looking to have something of the form (except my values are not set by rand()):
for var ∈ ["x", "y", "z"]
    for i in eachindex(tempstruct.Symbol(var))
        tempstruct.Symbol(var)[i] = rand()
    end
end
My question is then, is this possible?

You can use the getproperty function like this:
for var ∈ ["x", "y", "z"]
    field = getproperty(tempstruct, Symbol(var))
    for i in eachindex(field)
        field[i] = rand()
    end
end
You can use the propertynames function to get a Tuple of Symbols listing the property names of your struct:
julia> propertynames(tempstruct)
(:a, :b, :c, :x, :y, :z)
Also note that you could just write [:x, :y, :z] instead of ["x", "y", "z"], in which case there is no need for the Symbol(var) conversion.
As additional information, it is good to know that some types override the getproperty function (e.g. have a look at the DataFrame type in DataFrames.jl, where getproperty returns columns rather than fields). In such cases, direct access to the fields of a struct can be gained with the getfield function, and the fieldnames function gives you a list of the field names of your type.

Related

Pick a struct element by field name and some non-sequential index

I mean to use a struct to hold a "table":
% Sample data
% idx idxstr var1 var2 var3
% 1 i01 3.5 21.0 5
% 12 i12 6.5 1.0 3
The first row contains the field names.
Assume I created a struct
ds2 = struct( ...
'idx', { 1, 12 }, ...
'idxstr', { 'i01', 'i12' }, ...
'var1', { 3.5, 6.5 }, ...
'var2', { 21, 1 }, ...
'var3', { 5, 3 } ...
);
How can I retrieve the value for field var2, for the row corresponding to idxstr equal to 'i01'?
Notes:
I cannot ensure the length of idxstr elements will always be 3.
Ideally, I would have a method that also works for columns var2 containing strings, or any other type of variable.
PS: I think https://stackoverflow.com/a/35976320/2707864 can help.
As I mentioned in the comments, I believe you have the wrong kind of struct for this work. Instead of an array of (effectively single-row) structs, you should instead have a single struct with 'array' fields. (numeric or cell, as appropriate).
E.g.
d = struct( ...
  'idx', [1, 12], ...
  'idxstr', {{'i01', 'i12'}}, ...
  'var1', [3.5, 6.5], ...
  'var2', [21, 1], ...
  'var3', [5, 3] ...
);
With this structure, your problem becomes infinitely easier to deal with:
d.var2( strcmp( 'i01', d.idxstr ) )
% ans = 21
This is also far more comparable to R / pandas dataframes functionality (which are also effectively initialised via names and equally-sized arrays like this).
PS. Note carefully the syntax used for the 'idxstr' field: there is an 'outer' cell array with a single element, meaning you're only creating a single struct, rather than an array of structs. This single element happens to be a cell array of strings, where this cell array is of the same size (i.e. has the same number of 'rows') as the numeric arrays.
UPDATE
In response to the comment, adding 'rows' should be fairly straightforward. Here is one approach:
function S = addrow( S, R )
  FieldNames = fieldnames( S ).'; NumFields = length( FieldNames );
  for i = 1 : NumFields,
    S.( FieldNames{i} ) = horzcat( S.( FieldNames{i} ), R{i} );
  end
end
Then you can simply do:
d = addrow( d, {5, 'i011', 2.7, 10, 11} );
Assuming that idxstr can be more than 3 characters (there is a shorter version if it's always 3 chars), this is what I came up with (tested on MATLAB):
logical_index=~cellfun(@isempty,strfind({ds2(:).idxstr},'i01'))
You can access the variables as:
ds2(~cellfun(@isempty,strfind({ds2(:).idxstr},'i01'))).var2;
% using above variable
ds2(logical_index).var2;
You can understand now why MATLAB introduced tables hehe.
Maybe you can try the code like below using strcmp
>> [ds2.var2](strcmp('i01',{ds2.idxstr}))
ans = 21
I put together function
function el = struct_pick(s, cdata, cnames, rname)
  % Pick an element from a struct by column and row name
  coldata = vertcat(s.(cdata));
  colnames = mat2cell(vertcat(s.(cnames)), ones(1, length(s)));
  % This assumes rname is a string
  flt = strcmp(colnames, rname);
  el = coldata(logical(flt));
endfunction
which is called with
% Pick an element by column and row name
cdata = 'var3';
cnames = 'idxstr';
rname = 'i01';
elem = struct_pick(ds2, cdata, cnames, rname);
and it seems to do the job.
I don't know if it is an unnecessarily contrived way of doing it.
Still have to deal with the possibility that the row names are not strings, as with
cnames = 'idx';
rname = 1;
EDIT: If the strings in idxstr are not all of the same length, this throws error: vertcat: cat: dimension mismatch.
The answer by Ander Biguri can handle this case.

Assigning values to discontinued slices in a ndarray

I have a base array that contains data. Some indices in that array need to be re-assigned a new value, and those indices are not contiguous. I'd like to avoid for-looping over all of them and use slice notation instead, as it's likely to be faster.
For instance:
arr = np.zeros(100)
sl_obj_1 = slice(2,5)
arr[sl_obj_1] = 42
Works for a single slice. But I have another discontinued slice to apply to that same array, say
sl_obj_2 = slice(12,29)
arr[sl_obj_2] = 55
I would like to accomplish something along the lines of:
arr[sl_obj_1, sl_obj_2] = 42, 55
Any ideas?
EDIT: changed the example to emphasize that the sequences are of varying lengths.
There isn't a good way to directly extract multiple slices from a NumPy array, much less different-sized slices. But you can cheat by converting your slices into indices, and using an index array.
In the case of 1-dimensional arrays, this is relatively simple using index arrays.
import numpy as np
def slice_indices(some_list, some_slice):
    """Convert a slice into indices of a list"""
    return np.arange(len(some_list))[some_slice]
    # For a non-NumPy solution, use this:
    # return range(*some_slice.indices(len(some_list)))
arr = np.arange(10)
# We'll make [1, 2, 3] and [8, 7] negative.
slice1, new1 = np.s_[1:4], [-1, -2, -3]
slice2, new2 = np.s_[8:6:-1], [-8, -7]
# (Here, np.s_ is just a nicer notation for slices.[1])
# Get indices to replace
idx1 = slice_indices(arr, slice1)
idx2 = slice_indices(arr, slice2)
# Use index arrays to assign to all of the indices
arr[[*idx1, *idx2]] = *new1, *new2
# That line expands to this:
arr[[1, 2, 3, 8, 7]] = -1, -2, -3, -8, -7
Note that this doesn't entirely avoid Python iteration—the star operators still create iterators and the index array is a regular python list. In a case with large amounts of data, this could be noticeably slower than the manual approach, because it calculates each index that will be assigned.
You will also need to make sure the replacement data is already the right shape, or you can use NumPy's manual broadcasting functions (e.g. np.broadcast_to) to fix the shapes. This introduces additional overhead; if you were relying on automatic broadcasting, you're probably better off doing the assignments in a loop.
arr = np.zeros(100)
idx1 = slice_indices(arr, slice(2, 5))
idx2 = slice_indices(arr, slice(12, 29))
new1 = np.broadcast_to(42, len(idx1))
new2 = np.broadcast_to(55, len(idx2))
arr[[*idx1, *idx2]] = *new1, *new2
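As a rough sketch of the same index-array idea without star-unpacking into Python lists, the slices can be converted and joined entirely with NumPy calls. The names slices, values, index_parts, idx and vals are only illustrative, and each slice is assumed to receive a single scalar value:

```python
import numpy as np

arr = np.zeros(100)
slices = [slice(2, 5), slice(12, 29)]  # illustrative: the two slices from the question
values = [42, 55]                      # one scalar per slice (an assumption)

# Turn each slice into an explicit array of indices.
index_parts = [np.arange(*s.indices(len(arr))) for s in slices]
# Join the per-slice indices into one index array.
idx = np.concatenate(index_parts)
# Repeat each value once for every index in its slice.
vals = np.repeat(values, [len(part) for part in index_parts])

arr[idx] = vals
print(arr[:30])
```

This still materializes every index, so for a handful of large slices the plain loop may well remain the faster option.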
To generalize to more dimensions, slice_indices will need to take care of shape, and you'll have to be more careful about joining multiple sets of indices (rather than arr[[i1, i2, i3]], you'll need arr[[i1, i2, i3], [j1, j2, j3]], which can't be concatenated directly).
In practice, if you need to do this a lot, you'd probably be better off using a simple function to encapsulate the loop you're trying to avoid.
def set_slices(arr, *indices_and_values):
    """Set multiple locations in an array to their corresponding values.
    indices_and_values should be a list of index-value pairs.
    """
    for idx, val in indices_and_values:
        arr[idx] = val
# Your example:
arr = np.zeros(100)
set_slices(arr, (np.s_[2:5], 42), (np.s_[12:29], 55))
(If your only goal is making it look like you are using multiple indices simultaneously, here are two functions that try to do everything for you, including broadcasting and handling multidimensional arrays.)
[1] np.s_

updating tuple string and how to optimize my code

I have one list like that :
[('__label__c091cb93-c737-4a67-95d7-49feecc6456c', 0.5), ('__label__96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]
I want to replace tuple element string value like this:
'__label__c091cb93-c737-4a67-95d7-49feecc6456c' to 'c091cb93-c737-4a67-95d7-49feecc6456c'
I try this :
l = [('__label__c091cb93-c737-4a67-95d7-49feecc6456c', 0.5), ('__label__96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]
j = []
for x in l:
    for y in x:
        if type(y) == str:
            z = y.replace('__label__',"")
            j.append((z, x[1]))
print(j)
Output:
[('c091cb93-c737-4a67-95d7-49feecc6456c', 0.5), ('96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]
How can I optimize my code in a Pythonic way? Is there any other way to update a tuple value, given that tuples are immutable?
You are right, tuples are immutable in Python, but lists are not. So you should be able to update the list l in-place.
Moreover, it looks like you already know the position of the element you have to modify and the position of the substring you want to remove, so you can avoid one loop and the replace function which will iterate once more over your string.
for i in range(len(l)):
    the_tuple = l[i]
    if isinstance(the_tuple[0], str) and the_tuple[0].startswith('__label__'):
        l[i] = (the_tuple[0][len('__label__'):], the_tuple[1])
        # you can also replace "len('__label__')" with "9" to improve performance,
        # but I think Python already optimizes it
You can use the map function:
data = [('__label__c091cb93-c737-4a67-95d7-49feecc6456c', 0.5), ('__label__96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]
def f(row): return row[0].replace('__label__', ''), row[1]
print(list(map(f, data)))
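The two answers can also be combined into a list comprehension that builds a new list of tuples. This is only a sketch: the helper strip_label and the constant PREFIX are hypothetical names, and the prefix is removed only when it appears at the start of the string.

```python
PREFIX = '__label__'  # hypothetical constant for the prefix to remove

def strip_label(text, prefix=PREFIX):
    """Return text without the leading prefix, or unchanged if it is absent."""
    return text[len(prefix):] if text.startswith(prefix) else text

data = [('__label__c091cb93-c737-4a67-95d7-49feecc6456c', 0.5),
        ('__label__96693d45-4dec-4b66-a2e2-621329d64b92', 0.498047)]

# Build new tuples; the original tuples are left untouched.
cleaned = [(strip_label(label), score) for label, score in data]
print(cleaned)
```

On Python 3.9 and later, str.removeprefix does the same job as the helper.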

Filling a dictionary with lists of values - why is my nested loop only running once?

I'm trying to create a function that takes an array, bins the data in that array (by quantile), and fills a dictionary with the binned data. In the dictionary that gets produced, I want the keys to correspond to bin numbers, and the values to be lists of data from the input array that fall within the jth and (j+1)th bin limits.
Here is my code:
output = []
def binning(array1):
    d1 = {} # empty dictionary to fill with lists of values
    bin_edges = sp.stats.mstats.mquantiles(array1, prob=[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875,1.00])
    j = 0
    while j < len(bin_edges):
        for i in range(0, len(array1)):
            if float(array1[i]) > bin_edges[j] and float(array1[i]) <= bin_edges[j+1]:
                output.append(array1[i])
        d1["bin_number{0}".format(j)]= output
        j+=1
        return d1
The problem is, the inner loop only runs once, so I'm getting an output like
d1 = {'bin_number0': [value1, value2, etc.]}.
What I want to see is:
d1 = {'bin_number0': [value1, value2, etc.],'bin_number1': [value3, value4, etc.],'bin_number2': [value5, value6, etc.]}
...and so on, so there are 8 keys corresponding to 8 lists of values.
Can anyone tell me why the inner loop only runs once (for j = 0)? I've looked at it so many times I need a fresh pair of eyes.
return d1 should not be indented into the while loop. Unindent it so that it is indented only once. This is why your code only loops once. Hope this helps!!
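Beyond the misplaced return, the question's code appears to have two further issues: every key is assigned the same shared output list, and bin_edges[j+1] runs past the end of the edges on the last iteration. A minimal sketch of one way to address all three, assuming the bin edges have already been computed and are passed in (the bin_edges parameter and the variable names are illustrative, not from the question):

```python
def binning(values, bin_edges):
    """Group values into bins (lo, hi] defined by consecutive bin edges."""
    binned = {}
    # There is one fewer bin than there are edges.
    for j in range(len(bin_edges) - 1):
        lo, hi = bin_edges[j], bin_edges[j + 1]
        # Build a fresh list per bin so the keys do not share one list.
        binned["bin_number{0}".format(j)] = [
            v for v in values if lo < float(v) <= hi
        ]
    return binned  # returned only after every bin has been filled

print(binning([1, 2, 3, 4], [0, 2, 4]))
```

The comparisons are kept as in the question, so a value exactly equal to the lowest edge is not placed in any bin.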

How to get keys from nested dictionary of arbitrary length in Python

I have a dictionary object in Python. Let's call it dict. This object could contain another dictionary, which may in turn contain another dictionary, and so on.
dict = { 'k': v, 'k1': v1, 'dict2':{'k3': v3, 'k4':v4} , 'dict3':{'k5':v5, dict4:{'k6':v6}}}
This is just an example. The length of the outermost dictionary could be anything. I want to extract keys from such a dictionary object in the following two ways:
get list of only keys.
[k,k1,k2,k3,k4,k5,k6]
get list of keys and its parent associated dictionary so something like this :
outer_dict_keys = [k ,dict2, dict3]
dict2_keys = [k3,k4]
dict3_keys = [k5, dict4]
dict4_keys = [k6]
Outermost dictionary dict length is always changing so I can not hard code anything.
What is best way to achieve above result ?
Use a mix of iteration and recursion. After quoting undefined names, making spacing uniform, and removing 'k2' from the first result, I came up with the code below. (Written and tested for 3.4; it should run on any 3.x and might on 2.7.) A key thing to remember is that dict iteration order was not guaranteed before Python 3.7 (which preserves insertion order), so the code sorts the keys to get a reproducible result. Recursion as done here visits sub-dicts in depth-first rather than breadth-first order. For dict0, both are the same, but if dict4 were nested in dict2 rather than dict3, they would not be.
dict0 = {'k0': 0, 'k1': 1, 'dict2':{'k3': 3, 'k4': 4},
'dict3':{'k5': 5, 'dict4':{'k6': 6}}}
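def keys(dic, klist=None):
    # Avoid a mutable default argument so repeated calls start with a fresh list.
    if klist is None:
        klist = []
    subdics = []
    for key in sorted(dic):
        val = dic[key]
        if isinstance(val, dict):
            subdics.append(val)
        else:
            klist.append(key)
    for subdict in subdics:
        keys(subdict, klist)
    return klist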
result = keys(dict0)
print(result, '\n', result == ['k0','k1','k3','k4','k5','k6'])
def keylines(dic, name='outer_dict', lines=None):
    # Avoid a mutable default argument so repeated calls start with a fresh list.
    if lines is None:
        lines = []
    vals = []
    subdics = []
    for key in sorted(dic):
        val = dic[key]
        if isinstance(val, dict):
            subdics.append((key,val))
        else:
            vals.append(key)
    vals.extend(pair[0] for pair in subdics)
    lines.append('{}_keys = {}'.format(name, vals))
    for subdict in subdics:
        keylines(subdict[1], subdict[0], lines)
    return lines
result = keylines(dict0)
for line in result:
    print(line,)
print()
expect = [
    "outer_dict_keys = ['k0', 'k1', 'dict2', 'dict3']",
    "dict2_keys = ['k3', 'k4']",
    "dict3_keys = ['k5', 'dict4']",
    "dict4_keys = ['k6']"]
for actual, want in zip(result, expect):
    if actual != want:
        print(want)
        for i, (c1, c2) in enumerate(zip(actual, want)):
            if c1 != c2:
                print(i, c1, c2)
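The answer notes that its recursion visits sub-dicts depth-first. For comparison, here is a sketch of a breadth-first variant using a queue. The function keys_breadth_first and the nested example dict are hypothetical, chosen so that the two orders differ (dict4 sits inside dict2 here):

```python
from collections import deque

def keys_breadth_first(dic):
    """Collect non-dict keys level by level (breadth-first)."""
    found = []
    queue = deque([dic])
    while queue:
        current = queue.popleft()
        for key in sorted(current):
            val = current[key]
            if isinstance(val, dict):
                # Defer sub-dicts until the current level is finished.
                queue.append(val)
            else:
                found.append(key)
    return found

nested = {'k0': 0, 'dict2': {'k3': 3, 'dict4': {'k6': 6}}, 'dict3': {'k5': 5}}
print(keys_breadth_first(nested))  # ['k0', 'k3', 'k5', 'k6']
```

A depth-first traversal of the same dict would reach 'k6' before 'k5', because dict4 is explored as soon as dict2 is.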

Resources