Combining data tables - Excel

I have two data tables, similar to the ones below:
Table 1
index value
a 6352
a 67
a 43
b 7765
b 53
c 243
c 7
c 543
Table 2
index value
a 425
a 6
b 532
b 125
b 89
b 664
c 314
I would like to combine the data into one table, as shown below, using the index values. The order is important, so within each index the first batch of values in the combined table must come from Table 1:
index value
a 6352
a 67
a 43
a 425
a 6
b 7765
b 53
b 532
b 125
b 89
b 664
c 243
c 7
c 543
c 314
I tried to do it using VBA, but I'm sadly a complete novice, and I was wondering if someone has any pointers on how to approach writing the code?

Copy the values of the second table (without the headers) directly under the values of the first table, select the two resulting columns, and sort them by index. Excel's sort is stable, so within each index the rows that came from Table 1 stay above the rows from Table 2.
Hope it works!
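If you do end up scripting it, here is a minimal pandas sketch of the same idea (the workbook and sheet names are hypothetical placeholders):

import pandas as pd

# Hypothetical file and sheet names; adjust to your workbook.
table1 = pd.read_excel("data.xlsx", sheet_name="table1")
table2 = pd.read_excel("data.xlsx", sheet_name="table2")

# Stack table2 under table1, then stable-sort by index so that,
# within each index, table1's rows stay above table2's.
combined = (
    pd.concat([table1, table2], ignore_index=True)
      .sort_values("index", kind="stable", ignore_index=True)
)
combined.to_excel("combined.xlsx", index=False)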

Related

How to check if a value in a column is found in a list in a column, with Spark SQL?

I have a delta table A as shown below.
point  cluster  points_in_cluster
37     1        [37,32]
45     2        [45,67,84]
67     2        [45,67,84]
84     2        [45,67,84]
32     1        [37,32]
Also I have a table B as shown below.
id   point
101  37
102  67
103  84
I want a query like the following. Here "in" obviously doesn't work, since points_in_cluster is an array column. So, what would be the right syntax?
select b.id, a.point
from A a, B b
where b.point in a.points_in_cluster
As a result I should have a table like the following
id   point
101  37
101  32
102  45
102  67
102  84
103  45
103  67
103  84
Based on your data sample, I'd do an equi-join on the point column and then an explode on points_in_cluster:
from pyspark.sql import functions as F

# assuming A is df_A and B is df_B
df_A.join(
    df_B,
    on="point"
).select(
    "id",
    F.explode("points_in_cluster").alias("point")
)
Otherwise, you can use array_contains:
select b.id, a.point
from A a, B b
where array_contains(a.points_in_cluster, b.point)
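For reference, here is a minimal, self-contained sketch of the array_contains route (the SparkSession setup and the inline sample data are assumptions for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Rebuild the sample tables from the question; column types are assumed.
df_A = spark.createDataFrame(
    [(37, 1, [37, 32]), (45, 2, [45, 67, 84]), (67, 2, [45, 67, 84]),
     (84, 2, [45, 67, 84]), (32, 1, [37, 32])],
    ["point", "cluster", "points_in_cluster"])
df_B = spark.createDataFrame(
    [(101, 37), (102, 67), (103, 84)], ["id", "point"])

df_A.createOrReplaceTempView("A")
df_B.createOrReplaceTempView("B")

spark.sql("""
    select b.id, a.point
    from A a, B b
    where array_contains(a.points_in_cluster, b.point)
""").show()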

Find duplicated Rows based on selected columns condition in a pandas Dataframe

I have a large database converted into a dataframe, where it is difficult to identify the following manually.
The dataframe has columns named from_bus and to_bus, which together identify an element regardless of order; for example, for element 0 (L_ABAN_MACA_0_1) the pair (109, 140) is the same as (140, 109).
   name             from_bus  to_bus  x_ohm_per_km
0  L_ABAN_MACA_0_1       109     140      0.444450
1  L_AGOY_BAÑO_1_1        69      66      0.476683
2  L_AGOY_BAÑO_1_2        69      66      0.476683
3  L_ALAN_INGA_1_1       189     188      0.452790
4  L_ALAN_INGA_1_2       188     189      0.500450
So I want to identify the duplicated pairs and replace them with a single row, whose x_ohm_per_km value is the sum of the duplicated values, as follows:
   name             from_bus  to_bus  x_ohm_per_km
0  L_ABAN_MACA_0_1       109     140      0.444450
1  L_AGOY_BAÑO_1_1        69      66      0.953366
3  L_ALAN_INGA_1_1       189     188      0.953240
Let us try groupby on from_bus and to_bus after sorting the values in these columns along axis=1 (so that (188, 189) and (189, 188) become the same key), then agg to aggregate the result, and optionally reindex to restore the original column order:

import numpy as np

c = ['from_bus', 'to_bus']
df[c] = np.sort(df[c], axis=1)

df.groupby(c, sort=False, as_index=False)\
  .agg({'name': 'first', 'x_ohm_per_km': 'sum'})\
  .reindex(df.columns, axis=1)
Alternative approach:
d = {**dict.fromkeys(df, 'first'), 'x_ohm_per_km': 'sum'}
df.groupby([*np.sort(df[c], axis=1).T], sort=False, as_index=False).agg(d)
name from_bus to_bus x_ohm_per_km
0 L_ABAN_MACA_0_1 109 140 0.444450
1 L_AGOY_BAÑO_1_1 66 69 0.953366
2 L_ALAN_INGA_1_1 188 189 0.953240
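To make this reproducible end to end, here is a self-contained version of the first approach, with the sample data typed in from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['L_ABAN_MACA_0_1', 'L_AGOY_BAÑO_1_1', 'L_AGOY_BAÑO_1_2',
             'L_ALAN_INGA_1_1', 'L_ALAN_INGA_1_2'],
    'from_bus': [109, 69, 69, 189, 188],
    'to_bus': [140, 66, 66, 188, 189],
    'x_ohm_per_km': [0.444450, 0.476683, 0.476683, 0.452790, 0.500450],
})

c = ['from_bus', 'to_bus']
df[c] = np.sort(df[c], axis=1)  # order each pair so (188, 189) == (189, 188)

out = (df.groupby(c, sort=False, as_index=False)
         .agg({'name': 'first', 'x_ohm_per_km': 'sum'})
         .reindex(df.columns, axis=1))
print(out)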

How to get rid of the first element of a store id using pandas?

My dataframe contains a StoreId column, which needs to be changed for a particular type of store:
StoreType StoreId
A 105
A 213
B 401
B 402
B 711
B 910
B 913
B 915
In this dataframe, just for StoreType = B, I want to get rid of the leading 4 whenever the StoreId starts with 4 (for example, 401 should change to 01 and 402 to 02). For any other StoreId with StoreType = B there is no such logic, so it needs to be hard-coded: 711 should change to I0, 910 to 801, 913 to 804, and 915 to 814.
How can I write efficient code for this using a pandas dataframe in Python?
You can use a simple regular expression here, along with where to only change rows where a B is found in the StoreType column.
u = df.StoreId.astype(str)

# Strip a leading 4, but only for rows where StoreType is B.
df.assign(StoreId=u.where(df.StoreType.ne('B'), u.str.replace('^4', '', regex=True)))
StoreType StoreId
0 A 105
1 A 213
2 B 01
3 B 02
4 B 711
5 B 910
6 B 913
7 B 915
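Note that the snippet above only implements the leading-4 rule. The hard-coded replacements from the question could be layered on top with a plain mapping; a sketch, with the mapping values copied verbatim from the question:

# Hard-coded replacements for the remaining StoreType B ids.
hardcoded = {'711': 'I0', '910': '801', '913': '804', '915': '814'}

u = df.StoreId.astype(str)
fixed = u.str.replace('^4', '', regex=True).replace(hardcoded)
df = df.assign(StoreId=u.where(df.StoreType.ne('B'), fixed))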

Line up Column B and its value with Column C via Column A, retaining the original order of Column C; non-matching values go below

Can anyone please help me with this?
I have two main columns, A and B. Column A contains product codes and column B contains their prices. I also have some product codes in column C, and I need their prices in column D while maintaining column C's order.
A B C D
110 $10 115
111 $12 120
112 $18 117
113 $13 111
114 $22
115 $24
116 $98
117 $26
118 $77
119 $34
120 $17
Enter this formula in D1 and drag it down:
=IFERROR(INDEX(B:B,MATCH(C1,A:A,0),1),"")
MATCH finds the row of the code in C1 within column A, INDEX returns the value in that row of column B, and IFERROR blanks the cell for codes that are not found.
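For completeness, the same lookup expressed in pandas, in case the data ever leaves Excel (the frames below are stand-ins for the sheet columns):

import pandas as pd

prices = pd.DataFrame({
    'A': [110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120],
    'B': ['$10', '$12', '$18', '$13', '$22', '$24',
          '$98', '$26', '$77', '$34', '$17'],
})
wanted = pd.DataFrame({'C': [115, 120, 117, 111]})

# A left merge preserves column C's order, like dragging the formula down D.
result = wanted.merge(prices, left_on='C', right_on='A', how='left')[['C', 'B']]
print(result)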

Remove mismatched rows in Excel

I have an Excel file with 2000 records containing a few columns like:
A B C D E
114 5 270 product1 118
117 3 150 product1 190
118 9 300 product2 114
190 6 110 product1
191 11 540 product3
What I want to do is remove the rows whose column A value does not appear anywhere in column E.
Expected Output
A B C D E
114 5 270 product1 114
118 9 300 product2 118
190 6 110 product1 190
Please help me
Assumption: your data table is in Sheet1, your Expected Output Table is in Sheet2.
Steps:
Copy column E of data table (DT) to column A of Expected Output Table (EOT).
Sort col A of EOT in ascending order (e.g. Data ribbon > Sort & Filter).
Formula in B1 (EOT):
=INDEX(Sheet1!B$1:B$5, MATCH(Sheet2!$A1, Sheet1!$A$1:$A$5, 0), 1)
The above formula goes into columns B to D in EOT.
Formula in E1 (EOT):
=$A1
The INDEX/MATCH would work even better if you had column headers; then it would not matter whether the info from col B (DT) also goes into col B in EOT. In any case, remember to adjust the ranges to your actual ones, and be careful with the $ signs.
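For reference, the same filtering is a one-liner in pandas if the sheet is ever loaded as a dataframe (a sketch; the frame below mirrors the sample data):

import pandas as pd

df = pd.DataFrame({
    'A': [114, 117, 118, 190, 191],
    'B': [5, 3, 9, 6, 11],
    'C': [270, 150, 300, 110, 540],
    'D': ['product1', 'product1', 'product2', 'product1', 'product3'],
    'E': [118, 190, 114, None, None],
})

# Keep rows whose A value appears somewhere in column E,
# then set E equal to A, mirroring the expected output.
out = df[df['A'].isin(df['E'])].assign(E=lambda d: d['A'])
print(out)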
