Given a list of search terms and a pandas DataFrame, what is the most Pythonic way to print whether each search term is present in the target DataFrame?
search_terms = ["red", "blue", "green", "orange"]
input_df looks like...
    color  count
0     red     15
1    blue     39
2  yellow     40
3   green     21
I want to see...
red = true
blue = true
green = true
orange = false
I know how to filter input_df to only the rows whose color is in search_terms, but that doesn't tell me that "orange" was not found in input_df. search_terms could contain hundreds or thousands of strings.
import pandas as pd
color = ['red', 'blue', 'yellow', 'green']
count = [15,39,40,21]
input_dict = dict(color=color, count=count)
input_df = pd.DataFrame(data=input_dict)
found_df = input_df[input_df['color'].isin(search_terms)]
You can try:
out = dict(zip(search_terms, pd.Series(search_terms).isin(input_df['color'])))
Or:
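import numpy as np
# Compare against the color column only; .tolist() turns numpy bools into plain Python bools
out = dict(zip(search_terms, np.isin(search_terms, input_df['color']).tolist()))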
Output:
{'red': True, 'blue': True, 'green': True, 'orange': False}
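To print the exact "red = true" format from the question, here is a minimal sketch (it rebuilds the same input_df so it runs on its own). It converts the color column to a Python set once, so each membership check is a hash lookup, which stays cheap even with thousands of search terms. A set difference also gives the missing terms directly.

```python
import pandas as pd

search_terms = ["red", "blue", "green", "orange"]
input_df = pd.DataFrame({"color": ["red", "blue", "yellow", "green"],
                         "count": [15, 39, 40, 21]})

# Build the lookup set once; each `in` check is then a hash lookup
present = set(input_df["color"])
out = {term: term in present for term in search_terms}

for term, found in out.items():
    # str(True).lower() gives "true", matching the requested format
    print(f"{term} = {str(found).lower()}")

# Terms that never appear in the column (sets do not preserve order)
missing = set(search_terms) - present
print(missing)  # {'orange'}
```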
I have a list of dictionaries that hold a key, a value, and other info, like so:
mylist = [ {'key': 'captial' , 'value': 'captial of india'},
{'key': 'captial' , 'value': 'captial of usa'},
{'key': 'fruit' , 'value': 'colour of apple'},
{'key': 'fruit' , 'value': 'colour of orange'}]
How do I group the list by key to get the output below?
result = [{'title': 'captial', 'questions': [{'text': 'captial of usa'}, {'text': 'captial of india'}]},
          {'title': 'fruit', 'questions': [{'text': 'colour of apple'}, {'text': 'colour of orange'}]}]
You can use a defaultdict and a list comprehension. The defaultdict groups the values that share the same key in mylist; the list comprehension then reshapes each group into the custom keys your result needs (title, questions, text).
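from collections import defaultdict
mylist = [...]
# Collect every value under its key, e.g. {"captial": [...], "fruit": [...]}
grouped = defaultdict(list)
for item in mylist:
    grouped[item["key"]].append(item["value"])
# Reshape each group into the requested title/questions/text structure
result = [
    {"title": key, "questions": [{"text": v} for v in values]}
    for key, values in grouped.items()
]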
Output:
[
{
"title": "captial",
"questions": [{"text": "captial of india"}, {"text": "captial of usa"}],
},
{
"title": "fruit",
"questions": [{"text": "colour of apple"}, {"text": "colour of orange"}],
},
]
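Because mylist = [...] above is elided, here is a variant you can run as-is, with the data from the question filled in. It swaps defaultdict for a plain dict with dict.setdefault (same grouping, no import needed). Since dicts keep insertion order in Python 3.7+, the groups come out in the order their keys are first seen.

```python
mylist = [
    {"key": "captial", "value": "captial of india"},
    {"key": "captial", "value": "captial of usa"},
    {"key": "fruit", "value": "colour of apple"},
    {"key": "fruit", "value": "colour of orange"},
]

# setdefault returns the existing list for a key, or inserts [] first
grouped = {}
for item in mylist:
    grouped.setdefault(item["key"], []).append(item["value"])

# Same reshaping step as the defaultdict version
result = [
    {"title": key, "questions": [{"text": v} for v in values]}
    for key, values in grouped.items()
]
print(result)
```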
I'm trying to convert a CSV file to a dictionary with the following structure:
{
"0": {"val1": 1, "val2": 2, "val3": 3, ..., "valn": n},
"1": {"val1": 45, "val2": 7, "val3": None, ..., "valn": 68},
}
where val1, val2, and so on are the column header names, and "0" and "1" are the row numbers.
For example, if the CSV content is:
color,property,type,id
red,house,building,02
blue,department,flat,04
the result should be:
{
"0": {"color": "red", "property": "house", "type": "building", ..., "valn": n},
"1": {"color": "blue", "property": "department", "type": "flat", ..., "valn": n},
}
How can I achieve this without using any library? I'd like to implement it from scratch, without the csv module or anything similar.
Thank you.
Try this approach:
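inp = """color,property,type,id
red,house,building,02
blue,department,flat,04
cyan,,flat,10
"""
# Split into lines, dropping empty ones (e.g. from the trailing newline)
lines = [line for line in inp.split('\n') if line.strip()]
# The first line holds the column names
colnames = [name.strip() for name in lines[0].split(',')]
res = {}
for i, line in enumerate(lines[1:]):
    values = [val.strip() for val in line.split(',')]
    # Row keys are strings ("0", "1", ...) as in the question; empty fields become None
    res[str(i)] = {
        colname: val if val != '' else None
        for colname, val in zip(colnames, values)
    }
print(res)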
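One caveat with line.split(','): a quoted field that contains a comma, such as "flat, top floor", gets split in two. If your data can contain such fields, a small hand-written splitter like the hypothetical split_csv_line below (still no library, as the question asks) could replace line.split(',') in the loop. It is only a sketch: it handles quoted commas and doubled quotes (""), but not quoted fields that span several lines.

```python
def split_csv_line(line, sep=','):
    """Split one CSV line on sep, honouring double-quoted fields.

    Hypothetical helper: handles "a,b" and doubled quotes ("") inside
    quotes, but not fields that span multiple lines.
    """
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                # A doubled quote inside a quoted field is a literal quote
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True  # opening quote
        elif ch == sep:
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append(''.join(current))  # last field
    return fields

print(split_csv_line('cyan,"flat, top floor",10'))  # ['cyan', 'flat, top floor', '10']
```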
For additional features such as type deduction, the code gets more complex; you can follow the answers to this question.
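As a starting point for type deduction, here is a minimal sketch with a hypothetical infer_type helper that tries int, then float, and otherwise keeps the string; empty fields become None, as in the code above. You would apply it to each value when building the row dictionaries. Note the trade-offs: "02" becomes 2, so a column like id loses its leading zero, and strings such as "nan" or "inf" would become floats. You may want to skip conversion for such columns.

```python
def infer_type(val):
    """Convert a CSV string field to int, float, or None where possible.

    Hypothetical helper: empty strings become None, values that parse
    as int or float are converted, everything else stays a string.
    """
    if val == '':
        return None
    for cast in (int, float):
        try:
            return cast(val)
        except ValueError:
            pass  # not this type, try the next one
    return val

row = {'color': 'cyan', 'property': '', 'type': 'flat', 'id': '10'}
print({k: infer_type(v) for k, v in row.items()})
```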
The question has been deleted; sorry for any inconvenience to anyone referring to it.
Here's a possible solution that uses a couple of dummy aesthetics.
I called geom_boxplot on a subset of the data that excludes "Diffs" and "rare", and gave it a dummy fill aesthetic (fill = "Type A") so that the boxplot gets a legend.
Then I called geom_point on the subset that contains only "Diffs" and "rare". ggplot expects long-format data, so rather than filtering the values and calling geom_point twice, call geom_point once and map the grouping variables to aesthetics (here color = Categories and shape = section).
The next step was controlling the labels to match what you're after and hiding the dummies. Adding guide = guide_legend(order = 1) or order = 2 sets the order of the legends, so that the one for Method A comes before the one for Method B.
One drawback is that there are two Method B legends, one for color and one for shape, because the two aesthetics map variables with different sets of levels. A workaround might be to map interaction(section, Categories) instead.
library(tidyverse)
library(data.table)  # for %like%
df <- data.frame(
  Categories = rep(c("Aaas", "Bbbs", "Cccs", "Diffs", "rare"), each = 4),
  section = rep(c("red", "blue", "green", "yellow"), times = 5),
  range = rep(c("top1K", "top2K", "top3K", "top4K"), times = 5),
  values = c(20L, 32L, 42L, 32L, 21L, 12L, 14L, 14L, 15L, 13L, 43L, 21L, 2L, 10L, 13L, 11L, 3L, 5L, 7L, 9L),
  stringsAsFactors = FALSE
)
ggplot(df, aes(x = range, y = values)) +
  geom_boxplot(aes(fill = "Type A"), data = df[!df$Categories %in% "Diffs" & !df$Categories %like% "rare", ]) +
  geom_jitter(data = df[!df$Categories %in% "Diffs" & !df$Categories %like% "rare", ]) +
  geom_point(aes(color = Categories, shape = section), data = df[df$Categories %in% "Diffs" | df$Categories %like% "rare", ]) +
  scale_fill_manual(values = "white", guide = guide_legend(order = 1)) +
  scale_color_discrete(guide = guide_legend(order = 2), labels = c("Type B", "Type C")) +
  labs(fill = "Method A", color = "Method B", shape = "Method B")
Created on 2018-04-22 by the reprex package (v0.2.0).
I'm trying to get the dictionary with the biggest value for the key 'points'.
I have the following list, and I'm trying to write a function whose output is the item from the list (in this case a dictionary) with the biggest value for the key 'points'.
[ {"name":"John","points":4} , {"name": "Michael", "points":10} ]
I want the output to be:
{"name": "Michael", "points":10}
I'm not posting any code because I have no idea how to do this.
Thank you for the help!
Try the following:
max(scores, key=lambda item: item['points'])
>>> scores = [ {"name":"John","points":4} , {"name": "Michael", "points":10} ]
>>> max(scores, key=lambda item: item['points'])
{'name': 'Michael', 'points': 10}
>>>
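Since the question asks for a function, here is a minimal sketch wrapping the same idea (the name top_scorer is hypothetical). operator.itemgetter('points') does the same job as the lambda, and default=None makes an empty list return None instead of raising ValueError. On ties, max() returns the first maximal item it encounters.

```python
from operator import itemgetter

def top_scorer(scores):
    """Return the dict with the largest 'points' value, or None if scores is empty."""
    return max(scores, key=itemgetter('points'), default=None)

scores = [{"name": "John", "points": 4}, {"name": "Michael", "points": 10}]
print(top_scorer(scores))  # {'name': 'Michael', 'points': 10}
```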