Check the missing integers from a range in Python

Given a building info dataframe as follows:
    id  floor     type
0    1     13   office
1    2     12   office
2    3      9   office
3    4      9   office
4    5      7   office
5    6      6   office
6    7      9   office
7    8      5   office
8    9      5   office
9   10      5   office
10  11      4   retail
11  12      3   retail
12  13      2   retail
13  14      1   retail
14  15     -1  parking
15  16     -2  parking
16  17     13   office
I want to check whether any floors are missing from the column floor (floor 0 excepted, since by convention it does not exist).
Code:
set(df['floor'])
Out:
{-2, -1, 1, 2, 3, 4, 5, 6, 7, 9, 12, 13}
For example, for the dataset above (-2, -1, 1, 2, ..., 13), I want an indication that floors 8, 10, and 11 are missing from the dataset; otherwise, just return that there is no missing floor. How could I do that in Pandas or NumPy? Thanks a lot for your help in advance.

Use np.setdiff1d to take the difference against the full range created with np.arange, with 0 omitted:
import numpy as np

# full expected range of floors, excluding floor 0
arr = np.arange(df['floor'].min(), df['floor'].max() + 1)
arr = arr[arr != 0]

# floors in the expected range that never appear in the data
out = np.setdiff1d(arr, df['floor'])
out = ('no missing floor in your dataset'
       if len(out) == 0
       else f'floor(s) {", ".join(out.astype(str))} are missing in your dataset')
print(out)
floor(s) 8, 10, 11 are missing in your dataset
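For reference, a pandas-only sketch of the same check, using Index.difference (assuming the same df as in the question):

import pandas as pd

# expected floor range as an Index; drop 0, then drop the floors present
expected = pd.RangeIndex(df['floor'].min(), df['floor'].max() + 1)
missing = expected.difference([0]).difference(df['floor'])
print(list(missing))  # [8, 10, 11] for the sample data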

Related

Spotfire calculate the difference of values in a single column based on date for each group

I have a requirement to calculate, in Spotfire, the difference between values in a single column based on their order of occurrence.
The data looks like:

ID  Value  Date
1       7  07/01/2021
1       8  09/01/2021
1      10  10/01/2021
1      15  11/01/2021
1       6  12/01/2021
1       3  15/01/2021
2      10  07/01/2021
2      11  08/01/2021
2      12  09/01/2021
The expected output is:

ID  Value  Date        Flag
1       7  07/01/2021  True
1       8  09/01/2021
1      10  10/01/2021
1      15  11/01/2021
1       6  12/01/2021
1       3  15/01/2021
2      10  07/01/2021  False
2      11  08/01/2021
2      12  09/01/2021
The logic: for each ID, compare the first received value with the latest received value.
For ID 1, the first received value is 7 and the latest received value is 3; 7 > 3 is True.
For easier reading I have sorted the ID column. Thanks in advance.
Hi, I think you can do this in two steps.
First step:
Concatenate(String([Value])) OVER ([ID])
Say you name this column test; it will give you (7, 8, 10, 15, 6, 3).
Second step:
Integer(left([test],1)) > Integer(right([test],1))
(Note that left/right with a length of 1 only works while the first and last values are single digits; for multi-digit values you would need to split on the separator instead.)
ID  Value  Date        test                flag
1       7  07/01/2021  7, 8, 10, 15, 6, 3  True
1       8  09/01/2021  7, 8, 10, 15, 6, 3  True
1      10  10/01/2021  7, 8, 10, 15, 6, 3  True
1      15  11/01/2021  7, 8, 10, 15, 6, 3  True
1       6  12/01/2021  7, 8, 10, 15, 6, 3  True
1       3  15/01/2021  7, 8, 10, 15, 6, 3  True
2      10  07/01/2021  10, 11, 12          False
2      11  08/01/2021  10, 11, 12          False
2      12  09/01/2021  10, 11, 12          False
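Outside Spotfire, the same first-versus-latest comparison is easy to sanity-check in pandas; a minimal sketch, assuming the sample data above (dates read day-first):

import pandas as pd

df = pd.DataFrame({
    'ID':    [1, 1, 1, 1, 1, 1, 2, 2, 2],
    'Value': [7, 8, 10, 15, 6, 3, 10, 11, 12],
    'Date':  pd.to_datetime(['07/01/2021', '09/01/2021', '10/01/2021',
                             '11/01/2021', '12/01/2021', '15/01/2021',
                             '07/01/2021', '08/01/2021', '09/01/2021'],
                            dayfirst=True),
})

# compare the first and the latest Value per ID, in date order
df = df.sort_values(['ID', 'Date'])
g = df.groupby('ID')['Value']
df['Flag'] = g.transform('first') > g.transform('last')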

Pandas Min and Max Across Rows

I have a dataframe that looks like the one below. I want to get the min and max value per city, along with which products were ordered the least and the most for that city. Please help.
[image: dataframe of order counts by product and city]
db.min(axis=0) - min value for each column
db.min(axis=1) - min value for each row
Use DataFrame.min and DataFrame.max:

DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
DataFrame.max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
import pandas as pd

matrix = [(22, 16, 23),
          (33, 50, 11),
          (44, 34, 11),
          (55, 35, 60),
          (66, 36, 13)]
dfObj = pd.DataFrame(matrix, index=list('abcde'), columns=list('xyz'))

    x   y   z
a  22  16  23
b  33  50  11
c  44  34  11
d  55  35  60
e  66  36  13
Get a series containing the minimum value of each row:

minValuesObj = dfObj.min(axis=1)
print('minimum value in each row:')
print(minValuesObj)

Output:

minimum value in each row:
a    16
b    11
c    11
d    35
e    13
dtype: int64
MMT Marathi, based on the answers provided by Danil and Sutharp777, you should be able to get to your answer. However, I see you have follow-up questions for them; I'm not sure whether you are looking for a column holding the min/max value for each row.
Here's the full dataframe with the solution; I am merely compiling the answers they have already given.
import pandas as pd

d = [['20in Monitor', 2, 2, 1, 2, 2, 2, 2, 2, 2],
     ['27in 4k Gaming Monitor', 2, 1, 2, 2, 1, 2, 2, 2, 2],
     ['27in FHD Monitor', 2, 2, 2, 2, 2, 2, 2, 2, 2],
     ['34in Ultrawide Monitor', 2, 1, 2, 2, 2, 2, 2, 2, 2],
     ['AA Batteries (4-pack)', 5, 5, 6, 7, 6, 6, 6, 6, 5],
     ['AAA Batteries (4-pack)', 7, 7, 8, 8, 9, 7, 8, 9, 7],
     ['Apple Airpods Headphones', 2, 2, 3, 2, 2, 2, 2, 2, 2],
     ['Bose SoundSport Headphones', 2, 2, 2, 2, 3, 2, 2, 3, 2],
     ['Flatscreen TV', 2, 1, 2, 2, 2, 2, 2, 2, 2]]
c = ['Product', 'Atlanta', 'Austin', 'Boston', 'Dallas', 'Los Angeles',
     'New York City', 'Portland', 'San Francisco', 'Seattle']
df = pd.DataFrame(d, columns=c)

# restrict the row-wise reduction to the numeric city columns; recent pandas
# no longer silently skips the string 'Product' column for axis=1
df['min_value'] = df[c[1:]].min(axis=1)
df['max_value'] = df[c[1:]].max(axis=1)
print(df)
The output of this will be:
                      Product  Atlanta  Austin  ...  Seattle  min_value  max_value
0                20in Monitor        2       2  ...        2          1          2
1      27in 4k Gaming Monitor        2       1  ...        2          1          2
2            27in FHD Monitor        2       2  ...        2          2          2
3      34in Ultrawide Monitor        2       1  ...        2          1          2
4       AA Batteries (4-pack)        5       5  ...        5          5          7
5      AAA Batteries (4-pack)        7       7  ...        7          7          9
6    Apple Airpods Headphones        2       2  ...        2          2          3
7  Bose SoundSport Headphones        2       2  ...        2          2          3
8               Flatscreen TV        2       1  ...        2          1          2
If you want the min and max of each column, then you can do this:
print('min of each column :', df.min(axis=0).to_list()[1:])
print('max of each column :', df.max(axis=0).to_list()[1:])
This will give you:
min of each column : [2, 1, 1, 2, 1, 2, 2, 2, 2, 1, 2]
max of each column : [7, 7, 8, 8, 9, 7, 8, 9, 7, 7, 9]
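The question also asks which products hit the min and max in each city; a hedged sketch building on the df and c defined above, using DataFrame.idxmin/idxmax:

# index by product so idxmin/idxmax return product names per city column
prod = df.set_index('Product')[c[1:]]
summary = pd.DataFrame({
    'min_product': prod.idxmin(),
    'min_value':   prod.min(),
    'max_product': prod.idxmax(),
    'max_value':   prod.max(),
})
print(summary)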

pd.Series(pred).value_counts() how to get the first column in dataframe?

I apply pd.Series(pred).value_counts() and get this output:
0 2084
-1 15
1 13
3 10
4 7
6 4
11 3
8 3
2 3
9 2
7 2
5 2
10 2
dtype: int64
When I create a list, I get only the second column:

c_list = list(pd.Series(pred).value_counts())

Out:

[2084, 15, 13, 10, 7, 4, 3, 3, 3, 2, 2, 2, 2]
How do I ultimately get a dataframe that looks like the one below, including a new column for each class's share of the total size?

class  size  relative_size
0      2084  x%
-1       15  y%
1        13  etc.
3        10
4         7
6         4
11        3
8         3
2         3
9         2
7         2
5         2
10        2
You are very nearly there. Typing this blind, as you didn't provide a sample input:

df = pd.Series(pred).value_counts().to_frame().reset_index()
df.columns = ['class', 'size']
df['relative_size'] = df['size'] / df['size'].sum()  # multiply by 100 for a percentage
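A self-contained variant of the same idea, with a hypothetical pred standing in for the real predictions:

import pandas as pd

pred = [0, 0, 0, 1, 1, -1]  # hypothetical stand-in for the real predictions
s = pd.Series(pred).value_counts()
df = s.rename_axis('class').reset_index(name='size')
df['relative_size'] = df['size'] / df['size'].sum() * 100  # percent of total
print(df)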

Create a dataset from two different dataset

I have two different datasets:

1. state  VDM  MDM  OM
   AP       1    2   5
   GOA      1    2   1
   GU       1    2   4
   KA       1    5   1

2. Attribute:Value  Support  Item
   VDM:1                  4     1
   VDM:2                  0     2
   VDM:3                  0     3
   VDM:4                  0     4
   VDM:5                  0     5
   MDM:1                  0     6
   MDM:2                  3     7
   MDM:3                  0     8
   MDM:4                  0     9
   MDM:5                  1    10
   OM:1                   2    11
   OM:2                   0    12
   OM:3                   0    13
   OM:4                   1    14
   OM:5                   1    15
The first dataset contains only values 1-5.
The second dataset holds each Attribute:Value pair, its number of occurrences (Support), and a sequence number (Item).
I want a dataset that looks like this:

state  Item Number
AP     1, 7, 15
GOA    1, 7, 11
GU     1, 7, 14
KA     1, 10, 11
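For reference, the answers below assume the two tables are loaded as df1 and df2; a sketch reconstructing them from the question:

import pandas as pd

df1 = pd.DataFrame({
    'state': ['AP', 'GOA', 'GU', 'KA'],
    'VDM': [1, 1, 1, 1],
    'MDM': [2, 2, 2, 5],
    'OM':  [5, 1, 4, 1],
})
df2 = pd.DataFrame({
    'Attribute:Value': [f'{c}:{v}' for c in ('VDM', 'MDM', 'OM')
                        for v in range(1, 6)],
    'Support': [4, 0, 0, 0, 0, 0, 3, 0, 0, 1, 2, 0, 0, 1, 1],
    'Item': list(range(1, 16)),
})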
None of these are really appealing to me. But sometimes you just have to thrash about to get your data munged.
Attempt #0
a = dict(zip(df2['Attribute:Value'], df2['Item']))
cols = ['VDM', 'MDM', 'OM']
b = {
    'Item Number':
        [', '.join([str(a[f'{c}:{t._asdict()[c]}']) for c in cols])
         for t in df1.itertuples()]
}
df1[['state']].assign(**b)

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11
Attempt #1
a = dict(zip(df2['Attribute:Value'], df2['Item'].astype(str)))
d1 = df1.set_index('state').astype(str)
r1 = (d1.columns + ':' + d1).replace(a)  # thanks @anky_91
# r1 = (d1.columns + ':' + d1).applymap(a.get)
r1

      VDM MDM  OM
state
AP      1   7  15
GOA     1   7  11
GU      1   7  14
KA      1  10  11
Then

pd.DataFrame({'state': r1.index,
              'Item Number': [*map(', '.join, zip(*map(r1.get, r1)))]})

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11
Attempt #2
a = dict(zip(df2['Attribute:Value'], df2['Item'].astype(str)))
cols = ['VDM', 'MDM', 'OM']
b = {
    'Item Number':
        [*map(', '.join, zip(*[[a[f'{c}:{i}'] for i in df1[c]] for c in cols]))]
}
df1[['state']].assign(**b)

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11
Attempt #3
from itertools import cycle
import numpy as np

a = dict(zip(zip(*df2['Attribute:Value'].str.split(':').str), df2['Item'].astype(str)))
d = df1.set_index('state')
b = {
    'Item Number':
        [*map(', '.join, zip(*[map(a.get, zip(cycle(d), np.ravel(d).astype(str)))] * 3))]
}
df1[['state']].assign(**b)

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11
Attempt #4
a = pd.Series(dict(zip(
    zip(*df2['Attribute:Value'].str.split(':').str),
    df2.Item.astype(str)
)))
df1.set_index('state').stack().astype(str).groupby(level=0).apply(
    lambda s: ', '.join(map(a.get, s.xs(s.name).items()))
).reset_index(name='Item Number')

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11
Here is another approach using stack, map and unstack:

s = df1.set_index('state').stack()
s_map = df2.set_index('Attribute:Value')['Item']
s.loc[:] = (s.index.get_level_values(1) + ':' + s.astype(str)).map(s_map)
s.unstack().astype(str).apply(', '.join, axis=1).reset_index(name='Item Number')

[out]

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11
I feel like this is a merge-and-pivot problem:

s = df2['Attribute:Value'].str.split(':', expand=True).assign(Item=df2.Item)
s[1] = s[1].astype(int)
s1 = df1.melt('state')
s1.merge(s, right_on=[0, 1], left_on=['variable', 'value']).pivot(
    index='state', columns='variable', values='Item')

Out[113]:
variable  MDM  OM  VDM
state
AP          7  15    1
GOA         7  11    1
GU          7  14    1
KA         10  11    1
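To go from that pivot to the requested single Item Number column, one more hedged step (naming the pivot result p):

p = s1.merge(s, right_on=[0, 1], left_on=['variable', 'value']).pivot(
    index='state', columns='variable', values='Item')
# join the three item numbers per state in VDM, MDM, OM order
out = (p[['VDM', 'MDM', 'OM']].astype(str)
       .apply(', '.join, axis=1)
       .reset_index(name='Item Number'))
print(out)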

Compare two matrices and create a matrix of their common values [duplicate]

This question already has an answer here:
Numpy intersect1d with array with matrix as elements
(1 answer)
Closed 5 years ago.
I'm currently trying to compare two matrices and return the matching rows as an "intersection matrix" in Python. Both matrices hold numerical data, and I'm trying to return the rows of their common entries (I have also tried just creating a matrix with matching positional entries along the first column and then creating an accompanying tuple). The matrices do not necessarily have the same dimensions.
Let's say I have two matrices with the same number of columns but arbitrary (possibly very large and unequal) numbers of rows:

23  3  4  5        23  3  4  5
12  6  7  8        45  7  8  9
45  7  8  9        34  5  6  7
67  4  5  6         3  5  6  7
I'd like to create a matrix containing the "intersection", which for this low-dimensional example is:

23  3  4  5
45  7  8  9
Perhaps the input looks like this instead:

1  2  3  4        2   4  6  7
2  4  6  7        4  10  6  9
4  6  7  8        5   6  7  8
5  6  7  8

in which case we only want:

2  4  6  7
5  6  7  8
I've tried things of this nature:
def compare(x):
    # a matrix I created with another function -- purely numerical data of
    # arbitrary size with fixed column length D
    y = n_c(data_cleaner(x))
    # a second matrix to compare it to; note that the sizes are probably not
    # the same, but the column lengths are
    z = data_cleaner(x)
    # an array that will hold the matching values
    compare = []
    # nested for loop that checks a single index in one matrix against all
    # entries in the second matrix
    for i in range(len(y)):
        for j in range(len(z)):
            if y[0][i] == z[0][i]:
                # the row, or the n-tuple (shown here) of the columns with
                # the matching first indexes as shown above
                c_vec = ([0][i], [15][i], [24][i], [0][25], [0][26])
                compare.append(c_vec)
            else:
                pass
    return compare

compare(c_i_w)
Sadly, I'm running into some errors; specifically, it seems that I'm telling Python to reference values improperly.
Consider the arrays a and b:

import numpy as np

a = np.array([
    [23, 3, 4, 5],
    [12, 6, 7, 8],
    [45, 7, 8, 9],
    [67, 4, 5, 6]
])
b = np.array([
    [23, 3, 4, 5],
    [45, 7, 8, 9],
    [34, 5, 6, 7],
    [ 3, 5, 6, 7]
])

print(a)

[[23  3  4  5]
 [12  6  7  8]
 [45  7  8  9]
 [67  4  5  6]]

print(b)

[[23  3  4  5]
 [45  7  8  9]
 [34  5  6  7]
 [ 3  5  6  7]]
Then we can broadcast and get an array of equal rows with:

x = (a[:, None] == b).all(-1)
print(x)

[[ True False False False]
 [False False False False]
 [False  True False False]
 [False False False False]]
Using np.where we can identify the indices
i, j = np.where(x)
Show which rows of a:

print(a[i])

[[23  3  4  5]
 [45  7  8  9]]

And which rows of b:

print(b[j])

[[23  3  4  5]
 [45  7  8  9]]
They are the same! That's good. That's what we wanted.
We can put the results into a pandas dataframe with a MultiIndex, with the row number from a in the first level and the row number from b in the second level:

import pandas as pd

pd.DataFrame(a[i], [i, j])

      0  1  2  3
0 0  23  3  4  5
2 1  45  7  8  9
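If only the common rows are needed, without the index bookkeeping, a void-view trick in the spirit of the linked duplicate lets np.intersect1d compare whole rows at once; a sketch assuming both arrays share dtype and column count:

import numpy as np

def intersect_rows(a, b):
    # view each row as a single void scalar so np.intersect1d treats rows
    # as atomic values
    dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    common = np.intersect1d(np.ascontiguousarray(a).view(dt),
                            np.ascontiguousarray(b).view(dt))
    return common.view(a.dtype).reshape(-1, a.shape[1])

print(intersect_rows(a, b))  # [[23 3 4 5], [45 7 8 9]] for the sample arrays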
