How to Fix VAR Functions in a SWITCH Statement in Power BI

Newbie here. I'm trying to set up a return value called "Label" based on two criteria in Power BI. We have 23 countries in our company. If a row's country is one of 3 specific countries, I want the label to be "Cars", as notated below; if it does NOT match those countries, I want to assign an alternate label based on my Layer 2 name criteria.
The problem is that SOME of these managers also have headcount in these 3 countries that could be Photos or Cameras, but I want those rows to say Cars instead.
It's just not working for me, unfortunately. Any help would be greatly appreciated.
Here is the DAX I am trying to get right:
Label =
VAR _Country = SWITCH (
    TRUE (),
    ActiveHC[Country Name] = "Turkey", "Cars",
    ActiveHC[Country Name] = "Greece", "Cars",
    ActiveHC[Country Name] = "Italy", "Cars",
    BLANK ()
)
VAR _Segment = SWITCH (
    TRUE (),
    ActiveHC[Layer 2] = "Beth", "Corporate",
    ActiveHC[Layer 2] = "Joanie", "Corporate",
    ActiveHC[Layer 2] = "Dan", "Corporate",
    ActiveHC[Layer 2] = "Bill", "Corporate",
    ActiveHC[Layer 2] = "Christina", "Corporate",
    ActiveHC[Layer 2] = "Steven", "Cars",
    ActiveHC[Layer 2] = "Bobby", "Audio",
    ActiveHC[Layer 2] = "Matt", "Photos",
    ActiveHC[Layer 2] = "Peter", "Photos",
    ActiveHC[Layer 2] = "Edward", "Photos",
    ActiveHC[Layer 2] = "Joey", "Software",
    ActiveHC[Business Unit] = "Cameras", "Cameras",
    BLANK ()
)
RETURN IF(ISBLANK(_Country), _Segment, _Country)
I essentially want to be able to create a table that says this:
Label Count
Cars 7
Imaging 1
Audio 1
Corporate 1
Software 1
Photos 1
I also want to use the field as a filter and also add this calculated column in a table with other data as a records export.
Any help would be much appreciated. Thanks so much!!!
![Data Sample](https://imgur.com/a/Q5ZTgR9)

Create a calculated column in your model like so:
Country Name Filtered =
SWITCH (
    TRUE (),
    ActiveHC[Country Name] = "Turkey", "Cars",
    ActiveHC[Country Name] = "Greece", "Cars",
    ActiveHC[Country Name] = "Italy", "Cars",
    BLANK ()
)
Then, create another calculated column called segment:
Segment =
SWITCH (
    TRUE (),
    ActiveHC[Layer 2] = "Beth", "Corporate",
    ActiveHC[Layer 2] = "Joanie", "Corporate",
    ActiveHC[Layer 2] = "Dan", "Corporate",
    ActiveHC[Layer 2] = "Bill", "Corporate",
    ActiveHC[Layer 2] = "Christina", "Corporate",
    ActiveHC[Layer 2] = "Steven", "Cars",
    ActiveHC[Layer 2] = "Bobby", "Audio",
    ActiveHC[Layer 2] = "Matt", "Photos",
    ActiveHC[Layer 2] = "Peter", "Photos",
    ActiveHC[Layer 2] = "Edward", "Photos",
    ActiveHC[Layer 2] = "Joey", "Software",
    ActiveHC[Business Unit] = "Cameras", "Cameras",
    BLANK ()
)
Combine these columns into another, call it blended or something:
Blended =
IF ( ISBLANK ( ActiveHC[Country Name Filtered] ), ActiveHC[Segment], ActiveHC[Country Name Filtered] )
Now you can reference these values in a slicer or measure much easier:
Count Values :=
COUNT ( ActiveHC[Blended] )
So you would basically create a table or matrix visual using ActiveHC[Blended] as your row values and [Count Values] as your measure.
Not an elegant solution but it will work.
Hope it helps!!!

Related

Using groupby to generate an Excel file with two sheets from two datasets

I have two datasets, df1 and df2. My goal is to create one Excel file per fruit name; inside each file I want two sheets, the first with customer details and the second with vendor details.
df1 = pd.DataFrame({
    "Fruit": ["apple", "orange", "banana", "apple", "orange"],
    "customerName": ["John", "Sam", "David", "Rebeca", "Sydney"],
    "customerID": [877, 546, 767, 887, 890],
    "PurchasePrice": [1, 2, 5, 6, 4]})
df2 = pd.DataFrame({
    "Fruit": ["apple", "orange", "banana", "apple", "orange"],
    "VenderName": ["share", "cami", "sniff", "tom", "Adam"],
    # leading-zero integer literals are a SyntaxError in Python 3, so the IDs are strings
    "VenderID": ["0091", "0092", "0094", "0097", "0076"]})
I know how to do a groupby on one dataset and generate a file per group:
grouped = df.groupby("Fruit")
# run this to generate separate Excel files
for fruit, group in grouped:
    group.to_excel(excel_writer=f"{fruit}.xlsx", sheet_name="Customer", index=False)
Could you please help me solve this?
Use ExcelWriter:
from pandas import ExcelWriter

fruits = set(df1["Fruit"].unique().tolist() + df2["Fruit"].unique().tolist())
for fruit in fruits:
    sheets = {
        "Customer": df1.loc[df1["Fruit"].eq(fruit)],
        "Vendor": df2.loc[df2["Fruit"].eq(fruit)]
    }
    with ExcelWriter(f"{fruit}_.xlsx") as writer:
        for sh_name, table in sheets.items():
            table.to_excel(writer, sheet_name=sh_name, index=False)
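The split-by-key idea behind this answer does not depend on pandas. Here is a minimal stdlib sketch (the rows are made-up samples mirroring df1) that groups rows by fruit the same way each output file would:

```python
from collections import defaultdict

# Hypothetical rows mirroring df1: (fruit, customer name)
customer_rows = [
    ("apple", "John"), ("orange", "Sam"), ("banana", "David"),
    ("apple", "Rebeca"), ("orange", "Sydney"),
]

# Group rows by fruit -- one bucket per output file
by_fruit = defaultdict(list)
for fruit, customer in customer_rows:
    by_fruit[fruit].append(customer)

print(sorted(by_fruit))   # the distinct fruits, one file each
print(by_fruit["apple"])  # rows destined for the apple file
```

Each bucket then corresponds to one `to_excel` call in the pandas version.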

How to filter a 2 level list based on values within sublists?

Assume the following list is a table, where sublist[0] contains the column headers.
data = [
['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'],
['X' , 'X' , 'GRT' , 1, 4 ],
['' , 'X' , 'OIP' , 3, 2 ],
['' , 'X' , 'LKJ' , 2, 7 ],
['X' , '' , 'UBC' , 1, 0 ]
]
I'm trying to filter the list based on the values in "column S1" and "column S2".
I want to get:
a new list "S1" containing the sublists that has an "X" in "column S1"
a new list "S2" containing the sublists that has an "X" in "column S2"
Like this:
S1 = [
['ELEMENT', 'C1', 'C2'],
['GRT', 1, 4 ],
['UBC', 1, 0 ]
]
S2 = [
['ELEMENT', 'C1', 'C2'],
['GRT', 1, 4 ],
['OIP', 3, 2 ],
['LKJ', 2, 7 ]
]
Below is the code I have so far, where I make a copy of the source list data and then remove each sublist that doesn't have an "X" in "column S1". I get the correct content in the new list S1,
but I don't know why the source list data is also being modified, so I cannot use it to build list S2.
S1 = data
for sublist in S1[1:]:
    if sublist[0] != "X":
        S1.remove(sublist)
S2 = data
for sublist in S2[1:]:
    if sublist[1] != "X":
        S2.remove(sublist)
>>> data
[['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'], ['X', 'X', 'GRT', 1, 4], ['X', '', 'UBC', 1, 0]]
>>> S1
[['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'], ['X', 'X', 'GRT', 1, 4], ['X', '', 'UBC', 1, 0]]
>>>
How would be a better way to get lists S1 and S2? Thanks.
Your problem is that simply assigning the list to a new name does not make a copy; S1, S2, and data all refer to the same list object, so removing from one removes from all of them.
You might be able to make your solution work by doing
S1 = data[:] # slicing makes a copy
S2 = data[:]
instead.
Here's a generic solution:
def split_from_columns(ls, i_columns=(), indicator='X'):
    for i in i_columns:
        yield [
            [v for k, v in enumerate(sl) if k not in i_columns]
            for j, sl in enumerate(ls)
            if j == 0 or sl[i] == indicator
        ]
Usage:
>>> S1, S2 = split_from_columns(data, i_columns=(0, 1))
>>> S1
[['ELEMENT', 'C1', 'C2'], ['GRT', 1, 4], ['UBC', 1, 0]]
>>> S2
[['ELEMENT', 'C1', 'C2'], ['GRT', 1, 4], ['OIP', 3, 2], ['LKJ', 2, 7]]
The if j == 0 part makes sure we always copy the header. You can change i_columns to adjust where the indicator columns are.
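If the generator feels indirect, the same filtering can be written as a plain helper function; here is a minimal sketch (the helper name `extract` and its `n_flag_cols` parameter are my own, not from the answer above):

```python
data = [
    ['S1', 'S2 ', 'ELEMENT', 'C1', 'C2'],
    ['X',  'X',   'GRT',     1,    4],
    ['',   'X',   'OIP',     3,    2],
    ['',   'X',   'LKJ',     2,    7],
    ['X',  '',    'UBC',     1,    0],
]

def extract(rows, col, n_flag_cols=2, indicator='X'):
    """Keep the header plus rows flagged in `col`, dropping the flag columns."""
    header = rows[0][n_flag_cols:]
    body = [row[n_flag_cols:] for row in rows[1:] if row[col] == indicator]
    return [header] + body

S1 = extract(data, 0)
S2 = extract(data, 1)
```

Because the result is built from fresh list slices, `data` itself is never modified.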

Find a difference between two DataFrames with the same shape

I am trying to find the differences between two Excel files that have the same number of rows. I first want to sort both workbooks on two columns, then output a third file with the differences. I'm having trouble exporting the difference file properly.
Any help is highly appreciated!!! Thanks in advance!
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['3', '3', '55', '55', '66', '66'],
    'date': [20180102, 20180103, 20180104, 20180105, 20180106, 20180107],
    'age': [0, 1, 9, 4, 2, 3],
})
df2 = pd.DataFrame({
    'ID': ['3', '55', '3', '66', '55', '66'],
    'date': [20180103, 20180104, 20180102, 20180106, 20180105, 20180107],
    'age': [0, 1, 9, 9, 8, 7],
})
df3 = df1.sort_values(by=['ID', 'date'], ascending=False)
df4 = df2.sort_values(by=['ID', 'date'], ascending=False)
dfDiff = df3.copy()
for row in range(dfDiff.shape[0]):
    for col in range(dfDiff.shape[1]):
        value_old = df3.iloc[row, col]
        value_new = df4.iloc[row, col]
        if value_old == value_new:
            dfDiff.iloc[row, col] = df4.iloc[row, col]
        else:
            dfDiff.iloc[row, col] = '{}->{}'.format(value_old, value_new)
writer = pd.ExcelWriter('diff', engine='xlsxwriter')
dfDiff.to_excel(writer, sheet_name='DIFF', index=False)
workbook = writer.book
worksheet = writer.sheets['DIFF']
worksheet.hide_gridlines(2)
writer.save()
I think you are only missing the .xlsx at the end of your file path
df1 = pd.DataFrame({
    'ID': ['3', '3', '55', '55', '66', '66'],
    'date': [20180102, 20180103, 20180104, 20180105, 20180106, 20180107],
    'age': [0, 1, 9, 4, 2, 3],
})
df2 = pd.DataFrame({
    'ID': ['3', '55', '3', '66', '55', '66'],
    'date': [20180103, 20180104, 20180102, 20180106, 20180105, 20180107],
    'age': [0, 1, 9, 9, 8, 7],
})
df3 = df1.sort_values(by=['ID', 'date'], ascending=False)
df4 = df2.sort_values(by=['ID', 'date'], ascending=False)
dfDiff = df3.copy()
for row in range(dfDiff.shape[0]):
    for col in range(dfDiff.shape[1]):
        value_old = df3.iloc[row, col]
        value_new = df4.iloc[row, col]
        if value_old == value_new:
            dfDiff.iloc[row, col] = df4.iloc[row, col]
        else:
            dfDiff.iloc[row, col] = '{}->{}'.format(value_old, value_new)
# added `.xlsx` to the path here
writer = pd.ExcelWriter('diff.xlsx', engine='xlsxwriter')
dfDiff.to_excel(writer, sheet_name='DIFF', index=False)
workbook = writer.book
worksheet = writer.sheets['DIFF']
worksheet.hide_gridlines(2)
writer.save()
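The cell-by-cell comparison itself is independent of Excel. A minimal stdlib sketch of the same "old->new" marking over two equally shaped tables (the sample values are made up for illustration):

```python
# Two equally shaped tables (lists of rows); values are illustrative only
old = [[0, 1], [9, 4]]
new = [[0, 9], [8, 4]]

# Keep matching cells as-is; mark changed cells as "old->new"
diff = [
    [o if o == n else f"{o}->{n}" for o, n in zip(orow, nrow)]
    for orow, nrow in zip(old, new)
]
print(diff)  # [[0, '1->9'], ['9->8', 4]]
```

This is the logic the nested `iloc` loop implements; building the marked table first and writing it once at the end keeps the export step separate from the comparison.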

Mapping lists to individual dictionary keys and values

I have 5 lists. I would like to map them to a list of dictionaries, where dictionary n holds one key/value pair from each of the 5 lists at index n. My first thought is to set up a loop that enumerates each dictionary in the list, but I'm not sure what that would look like. Any thoughts?
name = ["John", "Sally", "Allen", "Nick", "Charles", "Richie", "Derek"]
age = [21, 36, 33, 29, 40, 18, 35]
hometown = ["New York", "Washington", "Philadelphia", "Atlanta", "Miami", "LA", "Seattle"]
favorite_food = ["chicken", "steak", "spaghetti", "fish", "oreos", "hamburger", "cereal"]
pet = ["cat", "fish", "dog", "hamster", "dog", "cat", "snake"]
I want a list of dictionaries such that:
D[0]={'name':'John', 'age':'21', 'hometown': 'New York', 'favorite_food':
'chicken', 'pet': 'cat'}
You can use the built-in function zip and list/dict comprehensions for this:
name = ["John", "Sally", "Allen", "Nick", "Charles", "Richie", "Derek"]
age = [21, 36, 33, 29, 40, 18, 35]
hometown = ["New York", "Washington", "Philadelphia", "Atlanta", "Miami", "LA",
"Seattle"]
favorite_food = ["chicken", "steak", "spaghetti", "fish", "oreos", "hamburger", "cereal"]
pet = ["cat", "fish", "dog", "hamster", "dog", "cat", "snake"]
fields = ["name", "age", "hometown", "favorite_food", "pet"]
zipped = zip(name, age, hometown, favorite_food, pet)
d = [{k: v for k, v in zip(fields,el)} for el in zipped]
The zip function will allow you to "pair" up or tuple up several lists.
For the first three attributes, you can do this to get a tuple:
>>> for i in zip(name, age, hometown):
... print(i)
...
('John', 21, 'New York')
('Sally', 36, 'Washington')
('Allen', 33, 'Philadelphia')
('Nick', 29, 'Atlanta')
('Charles', 40, 'Miami')
('Richie', 18, 'LA')
('Derek', 35, 'Seattle')
If you make a list
L = []
you can add dictionaries to it:
>>> L = []
>>> for t in zip(name, age, hometown):
...     d = {}
...     d['name'] = t[0]
...     d['age'] = t[1]
...     d['hometown'] = t[2]
...     L.append(d)
...
...
That's for the first three - extending to the whole lot should be clear.
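Putting the pieces together, here is a self-contained sketch of the zip-based approach using `dict(zip(...))` (note the ages come out as ints, not strings as in the desired-output example):

```python
name = ["John", "Sally", "Allen", "Nick", "Charles", "Richie", "Derek"]
age = [21, 36, 33, 29, 40, 18, 35]
hometown = ["New York", "Washington", "Philadelphia", "Atlanta", "Miami", "LA", "Seattle"]
favorite_food = ["chicken", "steak", "spaghetti", "fish", "oreos", "hamburger", "cereal"]
pet = ["cat", "fish", "dog", "hamster", "dog", "cat", "snake"]

fields = ["name", "age", "hometown", "favorite_food", "pet"]
# dict(zip(...)) pairs each field name with the value at the same position
D = [dict(zip(fields, record)) for record in zip(name, age, hometown, favorite_food, pet)]
print(D[0])
```

`zip(name, age, ...)` yields one tuple per person, and zipping that tuple against `fields` builds each dictionary in one step.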

Basemap - draw points on a map from coordinates; dot size = number of occurrences

I have a dataset like the following:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    # some ways to create random data
    'Name of City': np.random.choice(["City A", "City B", "City C", "City D", "City E", "City F", "City G"], 22),
    'Name of Country': np.random.choice(["Country A", "Country B", "Country C"], 22),
    'lat': np.random.choice([-41, -20, 1, 19, 34, 66, 81], 22),
    'lon': np.random.choice([-10, 10, 4, 1, -20, 60, 0], 22)
})
where lat/lon denote coordinates and Name of City denotes the city they belong to.
I would like to plot the city coordinates on a world map, with the dot size depending on the number of occurrences of that city in my data set, but I don't know how best to go about it.
Based on this code
import matplotlib.pyplot as plt

for idx, row in df.iterrows():
    x, y = row[['lon', 'lat']]
    plt.annotate(
        str(idx),
        xy=(x, y), xytext=(-20, 20),
        textcoords='offset points', ha='right', va='bottom',
        bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
plt.show()
I managed to plot the dots somehow but cannot figure out how to put them on a map. Can someone point me in the right direction?
Many thanks in advance!
I was not quite clear on how your coordinates relate to your city names, but I assumed the same coordinate pair should be used each time a certain city is mentioned. Based on this, I took some freedom in how to generate a dataset that fulfils these requirements and how to extract data from it. The rest is more or less straightforward using Basemap:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits import basemap
cities = pd.DataFrame({
    'city': ["City A", "City B", "City C", "City D", "City E", "City F", "City G"],
    'lat': [-41, -20, 1, 19, 34, 66, 81],
    'lon': [-10, 10, 4, 1, -20, 60, 0],
})
print(cities)
choices = np.random.choice(range(len(cities.lat)),22)
print(choices)
counts = np.array([list(choices).count(i) for i in range(len(cities.lat))])
print(counts)
fig, ax = plt.subplots()
bmap = basemap.Basemap(ax=ax)
bmap.drawcountries()
bmap.drawcoastlines()
x, y = bmap(cities.lon, cities.lat)
ax.scatter(x, y, s=(2 * counts) ** 2, c='r', label=cities.city)
for idx, row in cities.iterrows():
    x, y = bmap(*row[['lon', 'lat']])
    plt.annotate(
        str(idx),
        xy=(x, y), xytext=(-20, 20),
        textcoords='offset points', ha='right', va='bottom',
        bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
plt.show()
The resulting image shows the world map with red dots whose size reflects how often each city occurs.
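The occurrence counting that drives the dot sizes can also be done with the stdlib alone. A minimal sketch (the city mentions are made-up data, not the random draw above):

```python
from collections import Counter

# Hypothetical draw of 22 city mentions, standing in for the random DataFrame
mentions = ["City A"] * 7 + ["City B"] * 5 + ["City C"] * 10

counts = Counter(mentions)
# Scale each count into a marker area, mirroring s=(2*counts)**2 above
sizes = {city: (2 * n) ** 2 for city, n in counts.items()}
print(sizes)
```

Squaring the scaled count makes the marker *area* grow with the count, which is what matplotlib's `s` parameter expects.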
