How to get rid of first element of store id using pandas? - python-3.x

My data-frame consists of StoreId which needs to be changed for a particular type of store:
StoreType StoreId
A 105
A 213
B 401
B 402
B 711
B 910
B 913
B 915
In this dataframe, only for StoreType = B, I want to drop the leading 4 when the StoreId starts with 4 (for example, 401 should change to 01 and 402 to 02). For any other StoreId with StoreType = B there is no such rule, so the change needs to be hard-coded: 711 should change to I0, 910 to 801, 913 to 804, and 915 to 814.
How can I write efficient code for this using a pandas dataframe in Python?

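You can use a simple regular expression here, along with where to change only the rows where a B is found in the other series. Pass regex=True explicitly, because recent pandas versions treat the pattern as a literal string by default. This handles only the leading 4; the hard-coded replacements are not covered.
u = df.StoreId.astype(str)
# keep u where StoreType is not 'B'; elsewhere strip a leading '4'
df.assign(StoreId=u.where(df.StoreType.ne('B'), u.str.replace('^4', '', regex=True)))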
StoreType StoreId
0 A 105
1 A 213
2 B 01
3 B 02
4 B 711
5 B 910
6 B 913
7 B 915
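The hard-coded part of the question can be handled with a dictionary lookup layered on top of the regex step. The following is a minimal sketch, not the answerer's code: the names hard_coded, ids, is_b, stripped and new_b are illustrative, and the mapping values are copied from the question as written.

```python
import pandas as pd

df = pd.DataFrame({
    "StoreType": ["A", "A", "B", "B", "B", "B", "B", "B"],
    "StoreId": [105, 213, 401, 402, 711, 910, 913, 915],
})

# Hard-coded replacements taken from the question (illustrative name).
hard_coded = {"711": "I0", "910": "801", "913": "804", "915": "814"}

ids = df["StoreId"].astype(str)
is_b = df["StoreType"].eq("B")

# Strip a single leading '4'; other ids are left unchanged by this step.
stripped = ids.str.replace(r"^4", "", regex=True)

# Look up hard-coded ids; ids without an entry become NaN and fall back
# to the stripped value.
new_b = ids.map(hard_coded).fillna(stripped)

# Apply the new ids only to rows of type B.
result = df.assign(StoreId=ids.where(~is_b, new_b))
print(result)
```

Both steps are vectorised, so no Python-level loop over rows is needed.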

Related

Create two new Dataframes from existing one based on unique and repeated values of a column

colA colB
A 125
B 546
C 4586
D 547
A 869
B 789
A 258
E 123
I want to create two new dataframes: the first should hold the first occurrence of each unique value in 'colA', and the second should hold the rows whose 'colA' value is a repeat. colB has no repeated values. The first output looks like this:
colA colB
A 125
B 546
C 4586
D 547
E 123
The second output is like this:
colA colB
A 869
B 789
A 258
For the first group, use drop_duplicates. For the second group, use duplicated:
print (df.drop_duplicates("colA"))
colA colB
0 A 125
1 B 546
2 C 4586
3 D 547
7 E 123
print (df[df.duplicated("colA")])
colA colB
4 A 869
5 B 789
6 A 258
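The two calls above are complementary: with the default keep="first", duplicated marks exactly the rows that drop_duplicates removes. A self-contained sketch that builds both frames from one boolean mask (the names is_repeat, first_seen and repeats are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "colA": list("ABCDABAE"),
    "colB": [125, 546, 4586, 547, 869, 789, 258, 123],
})

# True for every row whose colA value has already appeared earlier.
is_repeat = df.duplicated("colA")

first_seen = df[~is_repeat]  # same rows as df.drop_duplicates("colA")
repeats = df[is_repeat]

print(first_seen)
print(repeats)
```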

Perform computation on a value in one row and update another row's column with that value

I have a dataframe that looks somewhat like :
Categor_1 Categor_2 Numeric_1 Numeric_2 Numeric_3 Numeric_col4 Month
ABC XYZ 3523 454 4354 565 2018-02
ABC XYZ 333 444 123 565 2018-03
qww ggg 3222 568 123 483976 2018-03
I would like to apply some simple math on a column with a condition and assign it to a different row.
For instance
if Month == 2018-03 & Categor_2 == 'XYZ', perform Numeric_3*2 and assign it to Numeric_3 under month 2018-02.
So the output would be something like :
Categor_1 Categor_2 Numeric_1 Numeric_2 Numeric_3_ Adj Numeric_col4 Month
ABC XYZ 3523 454 246 565 2018-02
ABC XYZ 333 444 123 565 2018-03
qww ggg 3222 568 123 483976 2018-03
I was thinking of taking out the necessary columns, doing a pivot, applying the math, and then reshaping the result back into the original layout.
However, if there is a quicker way, I would be grateful to know.
It depends on the length of the Series you get from the filtered DataFrame. Here it is a one-element Series, so it can be converted to a scalar with next and iter, which also lets you supply a default value if the condition matches nothing:
mask = (df.Month == '2018-03') & (df.Categor_2 == 'XYZ')
print (df.loc[mask, 'Numeric_3'] * 2)
1 246
Name: Numeric_3, dtype: int64
# get the first value of the Series; if the Series is empty, 0 is returned
a = next(iter(df.loc[mask, 'Numeric_3'] * 2), 0)
print (a)
246
df.loc[df.Month == '2018-02', 'Numeric_3'] = a
print (df)
Categor_1 Categor_2 Numeric_1 Numeric_2 Numeric_3 Numeric_col4 Month
0 ABC XYZ 3523 454 246 565 2018-02
1 ABC XYZ 333 444 123 565 2018-03
2 qww ggg 3222 568 123 483976 2018-03
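Note that the assignment above writes to every 2018-02 row, whatever its category. If only the matching category should be updated, the target mask can carry the same Categor_2 condition. A self-contained sketch using the factor of 2 from the question; the names source, target and values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Categor_1": ["ABC", "ABC", "qww"],
    "Categor_2": ["XYZ", "XYZ", "ggg"],
    "Numeric_1": [3523, 333, 3222],
    "Numeric_2": [454, 444, 568],
    "Numeric_3": [4354, 123, 123],
    "Numeric_col4": [565, 565, 483976],
    "Month": ["2018-02", "2018-03", "2018-03"],
})

# Row(s) to read from and row(s) to write to.
source = (df["Month"] == "2018-03") & (df["Categor_2"] == "XYZ")
target = (df["Month"] == "2018-02") & (df["Categor_2"] == "XYZ")

values = df.loc[source, "Numeric_3"] * 2

# Only assign when the source condition matched at least one row;
# the first match is used, as in the answer above.
if not values.empty:
    df.loc[target, "Numeric_3"] = values.iloc[0]

print(df)
```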

Is there a way to find location of the top n elements in a group by

I need to extract an attribute of the top n elements per group in a pandas dataframe.
The input data looks like this:
KEY variable value
0 1 A 0.476970
101 1 B 0.513333
202 1 C 0.376970
203 2 B 0.5667
101 2 A 0.513333
202 2 C 0.376970
...
I need the top two rows per KEY as output, like this:
KEY variable value
1 A 0.476970
1 B 0.513333
2 B 0.5667
2 A 0.513333
...
The code I tried is as follows:
test=pred_melt.groupby(['KEY'])['value'].nlargest(2)
this gives me
KEY
1 101 0.513333
0 0.476970
...
Name: value, Length: 198, dtype: float64
The idea was to join back to the original frame on the inner index (101, 0, etc.) to add the variable column, but I cannot get that index out to produce the desired output above.
Note that the group-by column is KEY, not variable.
Thanks Supratim. Yes, the index is the way to go; I have added the rest of the details that I had to work out. Please comment if you feel something is missing.
test=pred_melt.groupby(['KEY'])['value'].nlargest(2)
test.index
returns MultiIndex
as per
https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html
the structure is
MultiIndex(levels=[[...], [...]],
codes=[[...], [..]],
names=[...])
I am interested in
test.index.levels[1]
which is giving me the second column of this
KEY
1 101 0.513333
0 0.476970
...
Name: value, Length: 198, dtype: float64
as 0, 101, etc., which can be used to get the records from pred_melt
KEY variable value
0 1 A 0.476970
101 1 B 0.513333
202 1 C 0.376970
203 2 B 0.5667
101 2 A 0.513333
202 2 C 0.376970
as
pred_melt.iloc[test.index.levels[1]]
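One caveat with the line above: index.levels[1] holds the sorted unique labels of the inner level, not one label per result row, and iloc selects by position rather than by label. Below is a hedged sketch of two alternatives, assuming pred_melt has a unique default index (the sample rows are rebuilt here with a fresh RangeIndex, and the names by_index and by_sort are illustrative):

```python
import pandas as pd

pred_melt = pd.DataFrame({
    "KEY": [1, 1, 1, 2, 2, 2],
    "variable": ["A", "B", "C", "B", "A", "C"],
    "value": [0.476970, 0.513333, 0.376970, 0.5667, 0.513333, 0.376970],
})

# Alternative 1: take the per-row labels of the inner index level and
# select by label. This relies on the original index being unique.
test = pred_melt.groupby("KEY")["value"].nlargest(2)
by_index = pred_melt.loc[test.index.get_level_values(1)]

# Alternative 2: sort within each KEY by value (descending) and keep the
# first two rows per group. This does not depend on the index at all.
by_sort = (
    pred_melt.sort_values(["KEY", "value"], ascending=[True, False])
    .groupby("KEY")
    .head(2)
)

print(by_index)
print(by_sort)
```

Both variants keep the variable column, so no separate join is needed.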

pandas df merge avoid duplicate column names

When I merge two dfs that both have a column called A, the result is a df with A_x and A_y. How can I keep A from one df and discard the other, so that I don't have to rename A_x to A after the merge?
Just filter your dataframe columns before merging.
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Key':np.arange(12),'A':np.random.randint(0,100,12),'C':list('ABCD')*3})
df2 = pd.DataFrame({'Key':np.arange(12),'A':np.random.randint(100,1000,12),'C':list('ABCD')*3})
df1.merge(df2[['Key','A']], on='Key')
Output: (Note: C is not duplicated)
A_x C Key A_y
0 60 A 0 440
1 65 B 1 731
2 76 C 2 596
3 67 D 3 580
4 44 A 4 477
5 51 B 5 524
6 7 C 6 572
7 88 D 7 984
8 70 A 8 862
9 13 B 9 158
10 28 C 10 593
11 63 D 11 177
It depends on whether you need the columns with duplicated names in the final merged DataFrame.
If you do, add the suffixes parameter to merge:
print (df1.merge(df2, on='Key', suffixes=('', '_')))
--
If not, use @Scott Boston's solution.
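To make the two options concrete, here is a small sketch with fixed values instead of random ones. It assumes the goal is to keep A (and C) from df1 only; the names overlap, merged, both and the suffix "_right" are illustrative choices.

```python
import pandas as pd

df1 = pd.DataFrame({"Key": [0, 1, 2], "A": [60, 65, 76], "C": list("ABC")})
df2 = pd.DataFrame({"Key": [0, 1, 2], "A": [440, 731, 596], "C": list("ABC")})

# Option 1: drop every non-key column of df2 that also exists in df1
# before merging, so no suffix is ever added.
overlap = df2.columns.intersection(df1.columns).difference(["Key"])
merged = df1.merge(df2.drop(columns=overlap), on="Key")

# Option 2: keep both versions, leaving df1's names untouched and
# suffixing only df2's overlapping columns.
both = df1.merge(df2, on="Key", suffixes=("", "_right"))

print(merged.columns.tolist())
print(both.columns.tolist())
```

With an empty left suffix, the columns coming from df1 keep their original names, so no rename is needed afterwards.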

Combining data tables

I have two data tables, similar to the ones below:
table1
index value
a 6352
a 67
a 43
b 7765
b 53
c 243
c 7
c 543
table2
index value
a 425
a 6
b 532
b 125
b 89
b 664
c 314
I would like to combine the data into one table, as in the table below, using the index values. The order is important: within each index, the first batch of values in the combined table must come from table1.
index value
a 6352
a 67
a 43
a 425
a 6
b 7765
b 53
b 532
b 125
b 89
b 664
c 243
c 7
c 543
c 314
I tried to do it using VBA, but I'm sadly a complete novice. Does anyone have any pointers on how to approach writing the code?
Copy the values of the second table (without the headers) under the values of the first table, select the two resultant columns and sort them by index.
Hope it works!
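The question asks about VBA, but the same copy-then-sort idea can be sketched in pandas, the library used elsewhere on this page. This is only an illustration under that assumption: the frames are rebuilt from the sample tables, and a stable sort (mergesort) is chosen so that rows with the same index keep their table1-before-table2 order.

```python
import pandas as pd

table1 = pd.DataFrame({
    "index": list("aaabbccc"),
    "value": [6352, 67, 43, 7765, 53, 243, 7, 543],
})
table2 = pd.DataFrame({
    "index": list("aabbbbc"),
    "value": [425, 6, 532, 125, 89, 664, 314],
})

# Stack table2 under table1, then sort by the index column only.
# A stable sort preserves the existing order among equal keys.
combined = (
    pd.concat([table1, table2], ignore_index=True)
    .sort_values("index", kind="mergesort")
    .reset_index(drop=True)
)

print(combined)
```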
