I have a list like this:
[ '0D',
'0A,0C',
'0C,0A',
'0C,0E,0D,0F',
'0C,0D,0E,0F',
'0B,0G',
'0B,0F'
]
In this list, '0A,0C' and '0C,0A' are duplicates, and '0C,0E,0D,0F' and '0C,0D,0E,0F' are duplicates as well. How can I get the unique items from a list like this? I tried set, but I guess the functionality of set is a bit different.
`set` is good, if you use `split` first:
l = ['0D', '0A,0C', '0C,0A', '0C,0E,0D,0F', '0C,0D,0E,0F', '0B,0G', '0B,0F']
for i in range(len(l)):
    l[i] = ','.join(sorted(l[i].split(',')))
l = set(l)
# {'0A,0C', '0B,0F', '0B,0G', '0C,0D,0E,0F', '0D'}
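If you prefer a one-liner, the same normalisation also fits in a set comprehension (just a compact sketch of the loop above; the variable name unique is only for illustration):

l = ['0D', '0A,0C', '0C,0A', '0C,0E,0D,0F', '0C,0D,0E,0F', '0B,0G', '0B,0F']
unique = {','.join(sorted(item.split(','))) for item in l}
# {'0A,0C', '0B,0F', '0B,0G', '0C,0D,0E,0F', '0D'}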
I have some names and scores as follows
input = {
    'Maths': dict(Mohsen=19, Sadegh=18, Hafez=15),
    'Physics': dict(Sadegh=16, Hafez=17, Mohsen=17),
    'Chemistry': dict(Hafez=13),
    'Literature': dict(Sadegh=14),
    'Biology': dict(Mohsen=16, Sadegh=10),
}
If a person doesn't have a score for a lesson, that score should count as zero. I also want to compute each person's average score and sort the final list by that average. I want to get an output like this:
answer = [
    dict(Name='Sadegh', Literature=14, Chemistry=0, Maths=18, Physics=16, Biology=10, Average=11.6),
    dict(Name='Mohsen', Maths=19, Physics=17, Chemistry=0, Biology=16, Literature=0, Average=10.4),
    dict(Name='Hafez', Chemistry=13, Biology=0, Physics=17, Literature=0, Maths=15, Average=9),
]
How can I do it?
Essentially, you have a dictionary where the information is arranged by subject, and for each subject you have the students' marks. You want to collect all the information related to each student into separate dictionaries.
One approach you can try is as below:
Convert the data you have into student-specific data, and then calculate the average of the marks of all subjects for that student. There is sample code below.
Please note that this is just a sample and you should try out a solution yourself. There are many alternative ways of doing it, and you should explore them.
The code below works with Python 2.7.
from __future__ import division

def convert_subject_data_to_student_data(subject_dict):
    # Re-key the data by student: {student: {subject: mark, ...}}
    student_dict = {}
    for k, v in subject_dict.items():
        for k1, v1 in v.items():
            if k1 not in student_dict:
                student_dict[k1] = {k: v1}
            else:
                student_dict[k1][k] = v1

    # Build one record per student, including the average over the
    # subjects that student actually has marks for
    student_list = []
    for k, v in student_dict.items():
        st_dict = {}
        st_dict['Name'] = k
        st_dict['Average'] = sum(v.itervalues()) / len(v.keys())
        st_dict.update(v)
        student_list.append(st_dict)
    print student_list
if __name__ == "__main__":
    subject_dict = {
        'Maths': dict(Mohsen=19, Sadegh=18, Hafez=15),
        'Physics': dict(Sadegh=16, Hafez=17, Mohsen=17),
        'Chemistry': dict(Hafez=13),
        'Literature': dict(Sadegh=14),
        'Biology': dict(Mohsen=16, Sadegh=10),
    }
    convert_subject_data_to_student_data(subject_dict)
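If you are on Python 3 and also want the two requirements from the question that the sample above skips (missing subjects counted as zero, and the result sorted by average), a minimal sketch could look like this; the function name scores_by_student is just for illustration:

def scores_by_student(subject_dict):
    subjects = list(subject_dict)
    students = {name for marks in subject_dict.values() for name in marks}
    result = []
    for name in students:
        record = {'Name': name}
        # Missing subjects default to 0, as the question requires
        record.update({s: subject_dict[s].get(name, 0) for s in subjects})
        record['Average'] = sum(record[s] for s in subjects) / len(subjects)
        result.append(record)
    # Highest average first
    result.sort(key=lambda r: r['Average'], reverse=True)
    return result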
sample_input = {
    'Maths': dict(Mohsen=19, Sadegh=18, Hafez=15),
    'Physics': dict(Sadegh=16, Hafez=17, Mohsen=17),
    'Chemistry': dict(Hafez=13),
    'Literature': dict(Sadegh=14),
    'Biology': dict(Mohsen=16, Sadegh=10),
}

def foo(lessons):
    result = {}
    for lesson in lessons:
        for user in lessons[lesson]:  # each inner dict maps student -> score
            if result.get(user):
                # print(result.get(user))
                result.get(user).setdefault(lesson, lessons[lesson].get(user, 0))
            else:
                result.setdefault(user, dict(name=user))
                result.get(user).setdefault(lesson, lessons[lesson].get(user, 0))
    # return list(result.values())
    return result.values()

# if __name__ == '__main__':
print(foo(sample_input))
I have a list of database names and I want to exclude those which start with postgres.
So if I have [ "postgres", "post", "postgres2", "custom1", "custom2" ]
the result should be [ "post", "custom1", "custom2" ]
I tried two different variants, but neither yielded the result I was looking for:
either:
f_dbs = [d for d in all_dbs if not d.startswith("postgres")]
or:
f_dbs = list(filter(lambda d: not d.startswith("postgres"), all_dbs))
f_dbs_str = "\n".join(f_dbs)
print(f"Postgres databases to drop:\n{f_dbs_str}")
Neither of them excludes anything from the list.
How should I write this?
Edit:
I updated the question with the additional usage of the filtered list; the output always prints postgres as well.
Edit:
I found the problem: all the items in the list had leading whitespace. After stripping it from each item, the filter works as expected.
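For completeness, the whitespace fix can be folded straight into either variant, e.g. (a small sketch, assuming all_dbs is the raw list as read from the database output):

f_dbs = [d.strip() for d in all_dbs if not d.strip().startswith("postgres")]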
>>> all_dbs = [ "postgres", "post", "postgres2", "custom1", "custom2" ]
>>> [d for d in all_dbs if not d.startswith('postgres')]
['post', 'custom1', 'custom2']
Your first solution works for me. You just need to assign it to a variable:
>>> filtered_dbs = [d for d in all_dbs if not d.startswith('postgres')]
>>> filtered_dbs
['post', 'custom1', 'custom2']
The first of those methods creates a new list rather than modifying the original, and the second creates an iterator, which you can convert to a list fairly easily.
list_of_dbs = [ "postgres", "post", "postgres2", "custom1", "custom2" ]
filtered_list = [item for item in list_of_dbs if not item.startswith("postgres")]
print(filtered_list)
>>> ['post', 'custom1', 'custom2']
filter_iterator = filter(lambda d: not d.startswith("postgres"), list_of_dbs)
print(filter_iterator)
>>> <filter object at 0x10339d470>
print(list(filter_iterator))
>>> ['post', 'custom1', 'custom2']
I have a use case where I have multiple line plots (with legends), and I need to update the line plots based on a column condition. Below is an example with two data sets; based on the country, the ColumnDataSource changes. The issue I am facing is that the number of columns is not fixed for the data source, and even the types can vary. So, when I update the data source in a callback after a new country is selected, I get this error:
Error: attempted to retrieve property array for nonexistent field 'pay_conv_7d.content'.
I am guessing this is because the pay_conv_7d.content column doesn't exist in the new data source, but the lines for it were already in my plot. I have been trying to fix this issue by various means (making common columns for all country selections, adding the missing column to the data source in the callback), but I still get issues.
Is there any clean way to have multiple line plots update from a callback without a lot of hacks? Any insights or help would be really appreciated. Thanks much in advance! :)
def setup_multiline_plots(x_axis, y_axis, title_text, data_source, plot):
    num_categories = len(data_source.data['categories'])
    legends_list = list(data_source.data['categories'])
    colors_list = Spectral11[0:num_categories]
    # xs = [data_source.data['%s.'%x_axis].values] * num_categories
    # ys = [data_source.data[('%s.%s')%(y_axis,column)] for column in data_source.data['categories']]
    # data_source.data['x_series'] = xs
    # data_source.data['y_series'] = ys
    # plot.multi_line('x_series', 'y_series', line_color=colors_list, legend='categories', line_width=3, source=data_source)
    plot_list = []
    for (colr, leg, column) in zip(colors_list, legends_list, data_source.data['categories']):
        xs, ys = '%s.'%x_axis, ('%s.%s')%(y_axis,column)
        plot.line(xs, ys, source=data_source, color=colr, legend=leg, line_width=3, name=ys)
        plot_list.append(ys)
    data_source.data['plot_names'] = data_source.data.get('plot_names', []) + plot_list
    plot.title.text = title_text
def update_plot(country, timeseries_df, timeseries_source,
                aggregate_df, aggregate_source, category,
                plot_pay_7d, plot_r_pay_90d):
    aggregate_metrics = aggregate_df.loc[aggregate_df.country == country]
    aggregate_metrics = aggregate_metrics.nlargest(10, 'cost')
    category_types = list(aggregate_metrics[category].unique())
    timeseries_df = timeseries_df[timeseries_df[category].isin(category_types)]
    timeseries_multi_line_metrics = get_multiline_column_datasource(timeseries_df, category, country)
    # len_series = len(timeseries_multi_line_metrics.data['time.'])
    # previous_legends = timeseries_source.data['plot_names']
    # current_legends = timeseries_multi_line_metrics.data.keys()
    # common_legends = list(set(previous_legends) & set(current_legends))
    # additional_legends_list = list(set(previous_legends) - set(current_legends))
    # for legend in additional_legends_list:
    #     zeros = pd.Series(np.array([0] * len_series), name=legend)
    #     timeseries_multi_line_metrics.add(zeros, legend)
    # timeseries_multi_line_metrics.data['plot_names'] = previous_legends
    timeseries_source.data = timeseries_multi_line_metrics.data
    aggregate_source.data = aggregate_source.from_df(aggregate_metrics)
def get_multiline_column_datasource(df, category, country):
    df_country = df[df.country == country]
    df_pivoted = pd.DataFrame(df_country.pivot_table(index='time', columns=category, aggfunc=np.sum).reset_index())
    df_pivoted.columns = df_pivoted.columns.to_series().str.join('.')
    categories = list(set([column.split('.')[1] for column in list(df_pivoted.columns)]))[1:]
    data_source = ColumnDataSource(df_pivoted)
    data_source.data['categories'] = categories
    return data_source
Recently I had to update data on a Multiline glyph. Check my question if you want to take a look at my algorithm.
I think you can update a ColumnDataSource in at least three ways:
You can create a dataframe to instantiate a new CDS
cds = ColumnDataSource(df_pivoted)
data_source.data = cds.data
You can create a dictionary and assign it to the data attribute directly
d = {
    'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
    'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
    'xs1': [[17.0, 166.0], [17.0, 116.0], [17.0, 126.0]],
    'ys1': [[179.0, 169.0], [179.0, 1169.0], [1729.0, 169.0]],
    'xs2': [[27.0, 276.0], [27.0, 216.0], [27.0, 226.0]],
    'ys2': [[279.0, 269.0], [279.0, 2619.0], [2579.0, 2569.0]]
}
data_source.data = d
Here, if you need columns of different sizes or empty columns, you can fill the gaps with NaN values in order to keep the column sizes equal. I think this is the solution to your question:
import numpy as np
d = {
    'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
    'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
    'xs1': [[17.0, 166.0], [np.nan], [np.nan]],
    'ys1': [[179.0, 169.0], [np.nan], [np.nan]],
    'xs2': [[np.nan], [np.nan], [np.nan]],
    'ys2': [[np.nan], [np.nan], [np.nan]]
}
data_source.data = d
Or if you only need to modify a few values then you can use the method patch. Check the documentation here.
The following example shows how to patch column elements, both individually and with a slice:
source = ColumnDataSource(data=dict(foo=[10, 20, 30], bar=[100, 200, 300]))
patches = {
    'foo' : [ (slice(2), [11, 12]) ],
    'bar' : [ (0, 101), (2, 301) ],
}
source.patch(patches)
After this operation, the value of source.data will be:
dict(foo=[11, 12, 30], bar=[101, 200, 301])
NOTE: It is important to make the update in one go to avoid performance issues
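Applied to your callback, the NaN idea could look roughly like this: before assigning the new data, pad any column that the existing line glyphs still reference (e.g. 'pay_conv_7d.content') but that the newly selected country's pivot does not produce. This is only a sketch; pad_missing_columns and expected_columns are hypothetical names for illustration:

import numpy as np

def pad_missing_columns(new_data, expected_columns):
    # Length of the columns in the new data, so padded columns line up
    n = len(next(iter(new_data.values())))
    for col in expected_columns:
        if col not in new_data:
            # Keep the column the existing glyph refers to, filled with NaN
            new_data[col] = [np.nan] * n
    return new_data

# In update_plot, before the assignment (sketch):
# timeseries_source.data = pad_missing_columns(
#     dict(timeseries_multi_line_metrics.data),
#     timeseries_source.data['plot_names'])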
Could you help me with my issue? Let's say that I have a few lists with the IDs of their members, like below:
team_A = [1,2,3,4,5]
team_B = [6,7,8,9,10]
team_C = [11,12,13,14,15]
and now I have a dictionary with their values:
dictionary = {5:23, 10:68, 15:68, 4:1, 9:37, 14:21, 3:987, 8:3, 13:14, 2:98, 7:74, 12:47, 1:37, 6:82, 11:99}
I would like to take the corresponding elements from the dictionary and create a new dictionary for each of teams A, B, and C, like below:
team_A_values = {5:23, 4:1, 3:987, 2:98, 1:37}
Could you give advice on how to do that? Thanks for your help.
You can do something like below by just iterating through the lists:
team_A = [1,2,3,4,5]
team_B = [6,7,8,9,10]
team_C = [11,12,13,14,15]
dictionary = {5:23, 10:68, 15:68, 4:1, 9:37, 14:21, 3:987, 8:3, 13:14, 2:98, 7:74, 12:47, 1:37, 6:82, 11:99}
team_A_values = {}
for i in team_A:
    team_A_values[i] = dictionary[i]
print(team_A_values)
You can repeat this for team B and team C.
In that case you can do it like this:
team_values = [{i: dictionary[i] for i in team_A}, {i: dictionary[i] for i in team_B}, {i: dictionary[i] for i in team_C}]
teamA, teamB, teamC = team_values
print(team_values)
print(teamA)
print(teamB)
print(teamC)
In one line, you can do it like this:
team_values = [{i: dictionary[i] for i in team} for team in [team_A, team_B, team_C]]
teamA, teamB, teamC = team_values
print(team_values)
print(teamA)
print(teamB)
print(teamC)
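If an ID might be missing from dictionary, a small variation of the same comprehension avoids a KeyError by falling back to a default (the default of 0 here is just an assumption; pick whatever makes sense for your data):

team_values = [{i: dictionary.get(i, 0) for i in team} for team in [team_A, team_B, team_C]]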