Combine two columns' values into one in Spark (Python)

I have the table below:
FrameForm | Sections | Framefrom_section | FrameFrom_echelon
----------|----------|-------------------|------------------
70 | 11/12 | 11/12 | 50004
70 | 13/14 | 13/14 | 60003
How can I use PySpark to group on the FrameForm column and combine the values of Framefrom_section and FrameFrom_echelon, to obtain this result:
FrameForm | Framefrom_section | FrameFrom_echelon
----------|-------------------|------------------
70 | [11/12,13/14] | [50004,60003]
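One way to get this shape is to group the rows by FrameForm and collect the other two columns into lists. In PySpark this is commonly written with groupBy("FrameForm") followed by agg(...) using collect_list from pyspark.sql.functions on each of the two columns. The sketch below mirrors that logic in plain Python so the expected result can be checked without a Spark session; the function name collect_by_key and the in-memory rows are assumptions based on the sample table, not part of the question.

```python
from collections import defaultdict

# Sample rows from the question:
# (FrameForm, Sections, Framefrom_section, FrameFrom_echelon)
rows = [
    (70, "11/12", "11/12", 50004),
    (70, "13/14", "13/14", 60003),
]


def collect_by_key(rows):
    """Group rows by FrameForm and collect section and echelon values into lists.

    Plain-Python stand-in (hypothetical helper) for a Spark
    groupBy + collect_list aggregation.
    """
    sections = defaultdict(list)
    echelons = defaultdict(list)
    for frame_form, _sections, section, echelon in rows:
        sections[frame_form].append(section)
        echelons[frame_form].append(echelon)
    return {key: (sections[key], echelons[key]) for key in sections}


result = collect_by_key(rows)
print(result)
```

Note that in Spark the order of elements produced by collect_list is not guaranteed after a shuffle, so the two lists may need an explicit ordering step if the pairing by position matters.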

Related

Create descending list including duplicates based on filter criteria

Excel-File
| A | B | C | D | E | F |
---|--------------|-------------------|--------|-----------------|------------|------------|-
1 | Sales | Product | | Product | Sales | |
2 | 20 | Product_A | | Product_D | 100 | Product_D |
3 | 10 | Product_A | | Product_D | 90 | |
4 | 50 | Product_A | | Product_D | 50 | |
5 | 80 | Product_B | | Product_D | 50 | |
6 | 40 | Product_C | | | | |
7 | 30 | Product_C | | | | |
8 | 100 | Product_D | | | | |
9 | 90 | Product_D | | | | |
10 | 50 | Product_D | | | | |
11 | 50 | Product_D | | | | |
12 | | | | | | |
In Column B I have a list of different products with their corresponding sales in Column A.
Products can appear multiple times in the list.
Sales numbers can be equal for multiple products.
I want to use the value in Cell F2 as Filter-Criteria to create a descending list of the products in Column D and Column E sorted by the sales in Column A.
Therefore, I tried to add the FILTER function to the formula from this question:
=INDEX(SORT(FILTER(A2:B11,A2:A11=F2,""),2,-1),SEQUENCE(COUNT(A2:A11)),{2,1})
However, with this formula I get error #VALUE.
How do I need to modify the formula to make it work?
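Filter on the product column (B2:B11=F2) instead of the sales column, and use COUNTIF() inside the SEQUENCE() so that the number of rows requested matches the number of filtered rows: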
=INDEX(SORT(FILTER(A2:B11,B2:B11=F2,""),2,-1),SEQUENCE(COUNTIF(B2:B11,F2)),{2,1})
Current view on OP side:
For an unknown reason, only Column D gets filled.
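To make the intended result concrete, here is a small Python sketch of the same steps: keep only the rows whose product equals the filter value, sort them by sales in descending order, and output the product first and the sales second (the column swap that {2,1} performs). The function name filter_and_sort is hypothetical, and the data is copied from the sample sheet.

```python
# Rows A2:B11 from the sample sheet as (sales, product) pairs.
data = [
    (20, "Product_A"), (10, "Product_A"), (50, "Product_A"),
    (80, "Product_B"), (40, "Product_C"), (30, "Product_C"),
    (100, "Product_D"), (90, "Product_D"), (50, "Product_D"), (50, "Product_D"),
]


def filter_and_sort(data, product):
    """Return (product, sales) pairs for one product, highest sales first."""
    # Keep matching rows and swap the columns, like the {2,1} index array.
    matches = [(name, sales) for sales, name in data if name == product]
    # sorted() is stable, so equal sales values keep their original order.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


result = filter_and_sort(data, "Product_D")
for name, sales in result:
    print(name, sales)
```

Duplicated sales values (the two rows with 50) are kept, which is the behaviour the question asks for.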

Excel - Transform / Extract data from a single column/cell to multiple columns cells

Can someone show how I would format/transform a ton of these types of data...
(single cell, one column)
| 040/2.5OZ |
| 001/20# |
| 012/2.8# |
To this
(three cells, three columns)
| 40 | 2.5 | OZ |
| 01 | 20 | # |
| 12 | 2.8 | # |
Thanks!
You can try the formula below:
=TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"/","</s><s>"),"#","</s><s>#"),"OZ","</s><s>OZ")&"</s></t>","//s"))
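For comparison, the same split can be sketched outside Excel with a regular expression. This is only an illustration under assumptions: every cell is assumed to look like digits, a slash, a number, then a unit, and the sample output is read as keeping the last two digits of the leading code (040 becomes 40, 001 becomes 01). The names PATTERN and split_cell are hypothetical.

```python
import re

# digits "/" number (optionally with a decimal part) then a non-digit unit
PATTERN = re.compile(r"^(\d+)/(\d+(?:\.\d+)?)(\D+)$")


def split_cell(cell):
    """Split a value such as '040/2.5OZ' into (code, amount, unit) strings."""
    match = PATTERN.match(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    code, amount, unit = match.groups()
    # Assumption: the desired output keeps only the last two digits of the code.
    return code[-2:], amount, unit


cells = ["040/2.5OZ", "001/20#", "012/2.8#"]
result = [split_cell(cell) for cell in cells]
for parts in result:
    print(parts)
```

All three parts are returned as text, so leading zeros such as "01" survive; convert the amount with float() if a numeric value is needed.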

MATCH-formula where 'lookup_value' is array

I have 3 Excel-files (automated exports) that contain the following information:
1. The total list of shelves in one particular store:
| Shelf_code |
|------------|
| AB01 |
| AA02 |
2. The total list of all shelves linked to each article
| SKU_code | Shelf_code |
|----------|------------|
| 111 | AA01 |
| 111 | AB01 |
| 111 | AC01 |
| 112 | AA01 |
3. The list of all available SKUs
| SKU_code | Other stuff |
|----------|-------------|
| 111 | ... |
| 112 | ... |
| 113 | ... |
| 114 | ... |
And what I want to do is to link the Shelf_codes from that specific store to the total available SKU-list, so it will look like this:
| SKU_code | Other stuff | Shelf_code_store1 |
|----------|-------------|-------------------|
| 111 | ... | AB01 |
| 112 | ... | |
| 113 | ... | |
| 114 | ... | AB01 |
I tried embedding the MATCH formula in another INDEX/MATCH formula (see code below). This was only partially successful: it works only if the matching shelf_code in file 2 happens to be the first row for that SKU_code.
Since that is usually not the case, it mostly returns a #N/A error:
MATCH(
INDEX({file2_shelfcode},MATCH(file3_skucode,{file2_skucode},0)),
{file1_shelfcode}
)
Does anyone have a solution for this?
Since these files contain over 1000 articles, 200 shelves, and 6 stores, and will be updated frequently, I don't think using a PivotTable on file 2 will fit my needs.
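The lookup being asked for can be stated as: for each SKU, check every shelf linked to it (not just the first one) and return the one that exists in the store's shelf list. A minimal Python sketch of that logic follows, using only the sample rows shown above; the function name shelf_in_store and the variable names are hypothetical.

```python
# File 1: shelves present in this particular store.
store_shelves = {"AB01", "AA02"}

# File 2: every shelf linked to each SKU (sample rows only).
sku_shelves = [
    (111, "AA01"),
    (111, "AB01"),
    (111, "AC01"),
    (112, "AA01"),
]

# File 3: all available SKUs.
skus = [111, 112, 113, 114]


def shelf_in_store(sku, sku_shelves, store_shelves):
    """Return the first shelf linked to `sku` that exists in the store, else ''."""
    for linked_sku, shelf in sku_shelves:
        if linked_sku == sku and shelf in store_shelves:
            return shelf
    return ""


result = {sku: shelf_in_store(sku, sku_shelves, store_shelves) for sku in skus}
for sku, shelf in result.items():
    print(sku, shelf)
```

Unlike the nested INDEX/MATCH attempt, this checks all rows for a SKU, so SKU 111 resolves to AB01 even though AA01 is listed first. With only the sample rows for file 2, the other SKUs have no shelf in this store.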

How to transpose all subfields in front of the parent field of a pivot table in Excel

I have data on automotive spare parts and their multiple storage locations in a warehouse.
All I want to do is list the locations next to each part number, so that it is easy to see all the locations of a specific part number.
The current pivot data looks like this
I've manually transposed a few rows in the image below, but the data contains around 70K rows, so I'm looking for a better solution.
Kindly refer to the table below:
+--------------+-----+-------+-------------+
| Item name | Qty | UoM | Stock |
+--------------+-----+-------+-------------+
| '0450000115 | 324 | piece | G12B04 |
| '0450000A61 | 312 | piece | G12B05 |
| '0450000115 | 336 | piece | G12B06 |
| '0450000A61 | 228 | piece | G12B07 |
| '0450000115 | 336 | piece | G12B08 |
| '0450000115 | 192 | piece | G12B09 |
| '087902E200A | 470 | piece | G12B10 |
| '087902E200A | 760 | piece | G12B13 |
| '087902E200A | 759 | piece | G12B14 |
| '0450000115 | 336 | piece | G12B15 |
| '087902E200A | 400 | piece | G12B16 |
| '087902E200A | 10 | piece | G3B32 |
| '084B410426 | 100 | piece | G3B32 |
| '087902E200A | 300 | piece | G4B08 |
| '0450000A61 | 2 | piece | GDB01 |
| '084B410426 | 60 | piece | GR.04.C.04. |
| '087902E200A | 327 | piece | HD.03.K.05. |
+--------------+-----+-------+-------------+
You need to create a measure using the CONCATENATEX function. For this, your data must be added to the Data Model: check the box "Add this data to the Data Model" at the bottom of the Create PivotTable dialog box.
Right-click the table in the PivotTable Fields pane and select Add Measure. Then create the following measure, named StockText: = CONCATENATEX('table','table'[Stock],", ")
Now put [Item name] on Rows and the measure [StockText] on Values. This should be the result:
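As a rough illustration of what the measure computes, the Python sketch below groups stock locations by item name and joins them with ", ", which is the per-row effect of CONCATENATEX in the pivot. It uses a handful of rows from the table above (with the leading apostrophe of the item names omitted); the function name join_locations is hypothetical.

```python
# A few (item name, stock location) rows from the sample table.
rows = [
    ("0450000115", "G12B04"),
    ("0450000A61", "G12B05"),
    ("0450000115", "G12B06"),
    ("0450000A61", "G12B07"),
    ("084B410426", "G3B32"),
    ("084B410426", "GR.04.C.04."),
]


def join_locations(rows, sep=", "):
    """Collect the locations of each item and join them into one string."""
    grouped = {}
    for item, stock in rows:
        grouped.setdefault(item, []).append(stock)
    return {item: sep.join(stocks) for item, stocks in grouped.items()}


result = join_locations(rows)
for item, locations in result.items():
    print(item, "->", locations)
```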

Count text occurrences in a column in Excel

I have the following list in Excel:
+-------+----------+
| am | ipiresia |
+-------+----------+
| 50470 | 29 |
| 50470 | 43 |
| 50433 | 29 |
| 6417 | 51 |
| 6417 | 52 |
| 6417 | 53 |
| 4960 | 25 |
| 4960 | 26 |
| 5567 | 89 |
| 6716 | 88 |
+-------+----------+
I want to add a column, say 'num', that numbers the occurrences of each value in column 'am', adding one each time the same value appears again, as follows:
+-------+----------+-----+
| am | ipiresia | num |
+-------+----------+-----+
| 50470 | 29 | 1 |
| 50470 | 43 | 2 |
| 50433 | 29 | 1 |
| 6417 | 51 | 1 |
| 6417 | 52 | 2 |
| 6417 | 53 | 3 |
| 4960 | 25 | 1 |
| 4960 | 26 | 2 |
| 5567 | 89 | 1 |
| 6716 | 88 | 1 |
+-------+----------+-----+
Is it possible to get this automatically with a formula in Excel?
Yes. For example, assume your three-column table starts at A1, without a header row.
Enter the value 1 in C1, then enter this formula in C2 (replace the semicolons with commas if your locale uses a comma as the list separator):
=if($A2=$A1;$C1+1;1)
Then drag the fill handle at the bottom-right corner of C2 down as far as you want. Double-clicking the fill handle usually also works and fills the column down to the end of your existing data.
If you need help with AutoFill, press F1 in Excel and the help will explain it in detail.
Assuming the sample table starts at A1 (with headers), the following formula provides the expected results even if the list is not sorted.
=COUNTIF($A$1:$A2,A2)
Enter the formula in cell C2, then copy it down to the last row of the data (or use AutoFill).
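The expanding-range COUNTIF amounts to a running count: each row gets the number of times its value has appeared so far, including the current row. A minimal Python sketch of the same idea, using the 'am' values from the question (the function name running_count is hypothetical):

```python
# Column 'am' from the sample table.
am = [50470, 50470, 50433, 6417, 6417, 6417, 4960, 4960, 5567, 6716]


def running_count(values):
    """For each value, return how many times it has occurred up to that row."""
    seen = {}
    counts = []
    for value in values:
        seen[value] = seen.get(value, 0) + 1
        counts.append(seen[value])
    return counts


result = running_count(am)
print(result)
```

Because the count is kept per value rather than compared with the previous row, this also works when equal values are not adjacent, just like the COUNTIF version.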
