I have a simple data which is
Name Age
Venky 20
Anil 22
Output should be like :
Name : Venky
Age : 20
Name : Anil
Age : 22
Note : For each record should have header values
I have tried multiple ways apart from Macros
Can you please give me you inputs?
Without more specific context from the question, we can combine BYROW, MAP, TOCOL, TEXTJOIN, TEXTSPLIT and SUBSTITUTE to get the expected result (adding the link since some of them are relatively new Excel functions):
=LET(y,SUBSTITUTE(TEXTJOIN(";",,
BYROW(A2:C4, LAMBDA(x, TEXTJOIN(";",,MAP(TOCOL(A1:C1), TOCOL(x),
LAMBDA(a,b, a & ":"& b & ";")))))), ";;",";"),
TEXTSPLIT(LEFT(y, LEN(y) - 1),,";"))
The main idea is create on the fly for each row of input data(A1:B3 including the headers) a new array via MAP with the following structure (let’s call it tempArray:
| Name:name1; |
|-------------|
| Age:age1; |
Note: It works for more than two columns in the header, if that is the case a row per header item will be created.
For example:
=MAP(TOCOL(A1:B1), TOCOL(A2:B2), LAMBDA(a,b, a & ":"& b & ";"))
For the first row of the input data will return:
| Name:Venky; |
| ------------|
| Age:10; |
Then convert tempArray into a text via: TEXTJOIN so every element of BYROW, i.e. x is converted to a string, where each record is delimited by ;. For the first row it will be:
Name:Venky;;Age:10;
SUBSTITUTE is used to remove the extra ; generated by TEXTJOIN. For example for the first row it would be:
Name:Venky;Age:10;
After executing SUBSTITUTE the intermediate result would be:
Name:Venky;Age:20;Name:Anil;Age:22;
We use LEFT to remove the extra ; at the end. Now we have a string that has each row delimited by ;, so the string is ready to be converted back to an array using TEXTSPLIT.
Sample adding an additional record and Last Name as a new header item:
Related
In pyspark, I'm trying to replace multiple text values in a column by the value that are present in the columns which names are present in the calc column (formula).
So to be clear, here is an example :
Input:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |param_1-param_2
|Cell 3 |Cell 4 |param_2/param_1
Output needed:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |Cell 1-Cell 2
|Cell 3 |Cell 4 |Cell 4/Cell 3
In the column calc, the default value is a formula. It can be something as much as simple as the ones provided above or it can be something like "2*(param_8-param_4)/param_2-(param_3/param_7)".
What I'm looking for is something to substitute all the param_x by the values in the related columns regarding the names.
I've tried a lot of things but nothing works at all and most of the time when I use replace or regex_replace with a column for the replacement value, the error the column is not iterable occurs.
Moreover, the columns param_1, param_2, ..., param_x are generated dynamically and the calc column values can some of these columns but not necessary all of them.
Could you help me on the subject with a dynamic solution ?
Thank you so much.
Best regards
Update: Turned out I misunderstood the requirement. This would work:
for exp in ["regexp_replace(calc, '"+col+"', "+col+")" for col in df.schema.names]:
df=df.withColumn("calc", F.expr(exp))
Yet Another Update: To Handle Null Values add coalesce:
for exp in ["coalesce(regexp_replace(calc, '"+col+"', "+col+"), calc)" for col in df.schema.names]:
df=df.withColumn("calc", F.expr(exp))
Input/Output:
------- Keeping the below section for a while just for reference -------
You can't directly do that - as you won't be able to use column value directly unless you collect in a python object (which is obviously not recommended).
This would work with the same:
df = spark.createDataFrame([["1","2", "param_1 - param_2"],["3","4", "2*param_1 + param_2"]]).toDF("param_1", "param_2", "calc");
df.show()
df=df.withColumn("row_num", F.row_number().over(Window.orderBy(F.lit("dummy"))))
as_dict = {row.asDict()["row_num"]:row.asDict()["calc"] for row in df.select("row_num", "calc").collect()}
expression = f"""CASE {' '.join([f"WHEN row_num ='{k}' THEN ({v})" for k,v in as_dict.items()])} \
ELSE NULL END""";
df.withColumn("Result", F.expr(expression)).show();
Input/Output:
Input should be like,
schema name
schema value
ps_din_ghr_${hiveoff:suffix}
post
and output should be like below in new coulmn:
schema name
ps_din_ghr_post
i want to replace after $ part with the value from the side column.
please let me know how to execute this using excel. thanks in advance
EDIT : For English users replace ; by , in the formula
You can use SEARCH to get the position of the $ and REPLACE everything after it.
For example with schema name in column A and schema value in column B -> result in column C.
To find the $ you need to use SEARCH("$";A2) it will give you the position.
Then you can count the number of characters to replace after the "$" by substracting the position to the length of the schema name with LEN(). (+1 to get the last char)
Then you can combine everything :
=REPLACE(A2;SEARCH("$";A2);LEN(A2)-SEARCH("$";A2)+1;B2)
Result in my Excel :
I don't know excel very well and I am trying to take something like this (with a lot of entries):
Field ......Value ....... ID
A .......... blabla1 .......1
B ...........blabla2 .......1
C ...........blabla3 .......1
D ...........blabla4 .......1
A ...........blabla5 .......2
B ...........blabla6 .......2
C ...........blabla7 .......2
D ...........blabla8 .......2
and turn into something more readable like this:
ID -----A -------------B ---------------- C ---------------- D
1 ------blabla1 -----blabla2 -------- blabla3 --------blabla4
2 ------blabla5----- blabla6 -------- blabla7-------- blabla8
Does anyone know a good way to do that? Thank you
(sorry about the bad formatting)
The exact delimiter beween each word is key if text not already split in separate cells..
Assuming there are numerous words in place of '.....', with each word separated by a single space (different delimiter would be required if the blablas represented sentences comprising one / more spaces), then you could achieve the desired table representation as follows
(several function in this soln requires Office 365 compatible version of Excel,
the lookup in step 3 does not require Office 365, but may mean IDs and Fields need to be manually entered or VB could be deployed):
Starting position (after removing bank rows):
Field Value ID
A blabla1 1
B blabla2 1
C blabla3 1
D blabla4 1
A blabla5 2
B blabla6 2
C blabla7 2
D blabla8 2
1) Split cells according to delimiter (skip this step if not relevant)
=TRANSPOSE(FILTERXML("<x><y>"&SUBSTITUTE(F3," ","</y><y>")&"</y></x>","//y"))
(replace the " " inside the substitute function with a different delimiter if required/desired)
2) Obtain unique IDs (rows) and Fields (columns)
=UNIQUE(K4:K11)
=TRANSPOSE(UNIQUE(I4:I11))
3) Index lookup for table content
=INDEX(J4:J11,MATCH(M4#&N3#,K4:K11&I4:I11,0),0)
I have an Excel-file with 2 tabs:
Tab1
ActorID | ActorName
--------------------
4321 | ActorName1
4322 | ActorName2
4323 | ActorName3
4324 | ActorName4
In the second tab I want to put in the name of the actor and see if it's in the array
So I used this formula: =(Tab1!A1:A10="ActorName1"), but I get FALSE. When I use the same formula in the first tab (=(A1:A10="ActorName1")) I get TRUE.
I don't understand why I get FALSE if the formula is used in another tab :/
The formula works on the tab only if the name you are searching is the first. You are trying to compare an array to a single item, Excel will only look at the first.
To search a range of names use MATCH(). To return TRUE/FALSE wrap it in ISNUMBER(), as MATCH will return a number if found or an error if not found.
=ISNUMBER(MATCH("ActorName1",Tab1!A1:A10,0))
I have two tables.
Table one contains: phone number list
Table Two contains: prefix and destination list
I want look up prefix and destination for phone number.
Given below Row data table and result table
Table 01 ( Phone Number List)
Phone Number
------------
12426454407
12865456546
12846546564
14415332165
14426546545
16496564654
16896546564
16413216564
Table 02 (Prefix and Destination List)
PREFIX |COUNTRY
-------+---------------------
1 |Canada_USA_Fixed
1242 |Bahamas
1246 |Barbados
1268 |Antigua
1284 |Tortola
1340 |Virgin Islands - US
1345 |Cayman Island
144153 |Bermuda-Mobile
1473 |Grenada
1649 |Turks and Caicos
1664 |Montserrat
Table 03 (Result)
Phone Number | PREFIX | COUNTRY
--------------+--------+-------------------
12426454407 | 1242 | Bahamas
12865456546 | 1 | Canada_USA_Fixed
12846546564 | 1284 | Tortola
14415332165 | 144153 | Bermuda-Mobile
14426546545 | 1 | Canada_USA_Fixed
16496564654 | 1649 | Turks and Caicos
16896546564 | 1 | Canada_USA_Fixed
16643216564 | 1664 | Montserrat
Lets assume phone numbers are in column A, now in column B you need to extract the prefix. Something like this:
=LEFT(A1, 4)
However your Canada_USA_Fixed creates problems as does the Antigua mobile. I'll let you solve this issue yourself. Start with IF statements.
Now that you have extracted the prefix you can easily use VLOOKUP() to get the country.
Assuming that the longest prefix is 6 digits long you, can add 6 columns (B:G) next to the column with the phone numbers in table 1 (I assume this is column A). In column B you'd show the first 6 characters using =LEFT(A2,6), in the next column you show 5 chars, etc.
Then you add another 6 columns (H:M) , each doing a =MATCH(B2,Table2!A:A,0) to see if this prefix is in the list of prefixes.
Now if any of the 6 potential prefixes match, you'll get the row number of the prefix - else you'll get an #N/A error. Put the following formula in column N: {=INDEX(H2:M2,MATCH(FALSE,ISERROR(H2:M2),0))} - enter the formula as an array formula, i.e. instead of pressing Enter after entering it, press Ctrl-Shift-Enter - you'll see these {} around the formula then, so don't enter those manually!.
Column N now contains the row of the matching prefix or #N/A if no prefix matches. Therefore, put =IF(ISNA(N2,'No matching prefix',INDEX(Table2!B:B,N2)) in the next column and you'll be done.
You could also the above approach with less columns but more complex formulas but I wouldn't recommend it.
I'm also doing longest prefix matches and, like everyone else that Google has turned up, it's also for international phone number prefixes!
My solution is working for my table of 200 prefixes (including world zone 1, ie. having 1 for US/Canada and 1242 for Bahamas, etc).
Firstly you need this array formula (which I'm going to call "X" in the following but you'll want to type out in full)
(LEFT(ValueToFind,LEN(PrefixArray))=PrefixArray)*LEN(PrefixArray)
This uses the trick of multiplying a logical value with an integer so the result is zero if there's no match. You use this find the maximum value in one cell (which I'm calling "MaxValue").
{=MAX(X)}
If MaxValue is more than zero (and therefore some sort of match was found), you can then find the position of the maximum value in your prefix array.
{=MATCH(MaxValue,X,0)}
I've not worried about duplicates here - you can check for them in your PrefixArray separately.
Notes for neophytes:
PrefixArray should be an absolute reference, either stated with lots of $ or as a "named range".
I'm assuming you'll make ValueToFind, MaxValue and the resultant index into PrefixArray as cells on the same row, and therefore have a $ against their column letter but not their row number. This allows easy pasting for lots of rows of ValueToFind.
Array formula are indicated by curly braces, but are entered by typing the text without the curly braces and then hitting Ctrl-Shift-Enter.