Split cells containing ALLCAPS words - excel

I have a .csv file containing a column "First + Last name". I'd like to split the cells to get 2 columns (First name and Last name). In each cell, the last name is written in ALLCAPS. Here is my file right now:
First + Last name
-----------------------------
John DOE
Marie-Helen ANDRE-JACQUES
Jean-Claude DOE
And I'd like to split the cells so that I have:
First name | Last name
--------------------------------------------------
John | DOE
Marie-Helen | ANDRE-JACQUES
Jean-Claude | DOE
How would I do this in Excel (or Numbers)?

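A good way to solve this problem is to use Excel's Flash Fill feature: type the first one or two results by hand in the adjacent column, and Excel fills in the rest by following the pattern.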
https://support.office.com/en-us/article/using-flash-fill-in-excel-3f9bcf1e-db93-4890-94a0-1578341f73f7?ui=en-US&rs=en-US&ad=US
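If Flash Fill is not available (for example, when the .csv is processed outside Excel), the same split can be scripted. The following is only a minimal Python sketch and not part of the answer above. The function name split_name is hypothetical, and it assumes the last name is exactly the trailing run of all-uppercase words in each cell.

```python
def split_name(full_name):
    """Split 'First LAST' into (first, last).

    Assumption: the last name is the trailing run of words written
    entirely in capitals (hyphens are allowed inside a word).
    """
    tokens = full_name.split()
    i = len(tokens)
    # Walk backwards over the words that are fully upper case.
    while i > 0 and tokens[i - 1].isupper():
        i -= 1
    return " ".join(tokens[:i]), " ".join(tokens[i:])


for cell in ["John DOE", "Marie-Helen ANDRE-JACQUES", "Jean-Claude DOE"]:
    first, last = split_name(cell)
    print(first, "|", last)
```

A first name written as a single capital letter (such as "J DOE") would be treated as part of the last name, so such rows would need checking by hand.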

Related

Can you write an IF statement based on a character in a string?

In Excel I have a column that contains given names. Some of those are one word and some are two words. Think of something like this:
FIRST NAME
Emma
Anthony
Anne Marie
John
I want to concatenate this column with another one to create an id, and I am just interested in getting the first word. So my ideal output would look like column 'ID'
FIRST NAME | CODE | ID
Emma | 2D1 | 2D1_Emma
Anthony | 4G3 | 4G3_Anthony
Anne Marie | 8Y2 | 8Y2_Anne
John | 5L9 | 5L9_John
I have tried it with this formula, but it is not working: it retrieves all the text in the first column instead of just the first word:
=CONCAT($B2;"_";IF($A2="* *";(LEFT($A2(FIND(" ";A2;1)-1)));A2))
If I don't use the * as a wildcard, the result I get is the same. Any other combinations I tried give an error.
Any idea how I can get it to pick the text on the left of a blank space if there is any?
Thanks!
Brisa
Append a space to the text that FIND searches, so that it still finds a space when the name is a single word:
=B2&"_"&LEFT(A2,FIND(" ",A2&" ")-1)
Since your version of Excel uses ; as the separator:
=B2&"_"&LEFT(A2;FIND(" ";A2&" ")-1)
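To see why appending the space works, here is a small Python sketch of the same logic. It is an illustration only; the function name make_id is hypothetical. Python's index is 0-based, so the "-1" from the Excel formula is not needed.

```python
def make_id(code, first_name):
    """Build 'CODE_FirstWord', mirroring LEFT(A2, FIND(" ", A2 & " ") - 1)."""
    # Appending a space guarantees that a space is always found,
    # even when the name is a single word.
    padded = first_name + " "
    first_word = first_name[:padded.index(" ")]
    return code + "_" + first_word


print(make_id("2D1", "Emma"))        # single-word name
print(make_id("8Y2", "Anne Marie"))  # two-word name
```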
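Use this formula, assuming E2 holds the code and D2 holds the first name. A space is appended to D2 inside FIND so that single-word names do not return an error:
=E2&"_"&LEFT(D2,FIND(" ",D2&" ")-1)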

Manipulate CSV file: increment cell coordinates/position

I have a csv file with one entry on each line; three entries form a whole dataset. What I need to do now is put each set into the columns of one row. I have difficulty describing the problem (so my search did not turn up a solution), so here's an example.
Sample CSV file:
1 Joe
2 Doe
3 7/7/1990
4 Jane
5 Done
6 6/6/2000
What I want in the end is this:
1 Name Surname Birthdate
2 Joe Doe 7/7/1990
3 Jane Done 6/6/2000
I'm trying to find a solution to make this automatically, as my actual file consists of 480 datasets, each set containing 16 entries, and it would take me days to do it manually.
I was able to fill the first line with Excel's INDIRECT function:
=INDIRECT("A"&COLUMN()-COLUMN($A1))
As COLUMN returns the column number, if I drag the first line down in Excel, obviously this shows exactly the same as the first line:
1 Name Surname Birthdate
2 Joe Doe 7/7/1990
3 Joe Doe 7/7/1990
Now I'm looking for a way to increment the cell position by one:
A B C D
1 Joe =A1 =B1+1 =C1+1
2 Doe =D1+1
3 7/7/1990
4 Jane
Which should lead to:
A B C D
1 Joe =A1 =A2 =A3
2 Doe =A4 =A5 =A6
3 7/7/1990
4 Jane
As you can see in the example given, the cell coordinates for A increment by one, and I have no idea how to do this automatically in Excel. I think there must be a better way than using nested Excel functions, as the task (increment by 1) actually seems pretty easy.
I'm also open to solutions involving sed, awk (of which I only have a very superficial knowledge) or other command line tools.
Your help is very much appreciated!
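awk 'BEGIN { y = 1; x = 1; print "Name Surname Birthdate" }
{
    # Start each output row with its row number.
    if (x == 1) printf "%s", y
    if (x == 3) {
        # Third entry of the set: finish the row and reset the counter.
        printf " %s\n", $2
        y = y + 1
        x = 1
    } else {
        printf " %s", $2
        x = x + 1
    }
}' input_file.txt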
This may work for what you want to do. Your sample does not include commas, so I'm not sure whether they are really in the file. If they are, you will need to modify the code slightly by adding the -F, flag so that awk treats them as field separators.
This second code snippet will provide the output with a comma delimiter. Again, it is assuming that your sample input file did not have commas to delimit the 1 Joe and 2 Doe.
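awk 'BEGIN { y = 1; x = 1; print "Name,Surname,Birthdate" }
{
    # Start each output row with its row number.
    if (x == 1) printf "%s", y
    if (x == 3) {
        # Third entry of the set: finish the row and reset the counter.
        printf ",%s\n", $2
        y = y + 1
        x = 1
    } else {
        printf ",%s", $2
        x = x + 1
    }
}' input_file.txt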
Both of the awk scripts will set x and y variables to one, where the y variable will increment your line numbering. The x variable will count up to 3 and then reset itself back to one. This is so that it prints each line in a row, until it gets to the 3rd item where it will then insert a newline character.
There are other ways to do this, for example with regexes in a language like Perl, but since you mentioned awk, I believe this will work fine.
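Since the real file has 16 entries per set rather than 3, it may help to have the set size as a parameter. The following Python sketch is an illustration under stated assumptions, not part of the answer above: the function name rows_from_lines is hypothetical, and each input line is assumed to look like the sample (a line number, whitespace, then one value without spaces).

```python
def rows_from_lines(lines, width):
    """Group one-entry-per-line input into rows of `width` entries."""
    # Keep only the second whitespace-separated field, like $2 in awk.
    values = [line.split()[1] for line in lines if line.strip()]
    return [values[i:i + width] for i in range(0, len(values), width)]


sample = ["1 Joe", "2 Doe", "3 7/7/1990", "4 Jane", "5 Done", "6 6/6/2000"]
print("Name Surname Birthdate")
for row in rows_from_lines(sample, 3):
    print(" ".join(row))
```

For the real file, the call would use width=16 and a header with 16 names.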

Exact frequency of a specific word in a single cell (excluding suffix and prefix)

I earlier worked out a good solution for this with the help of the community. It works really well, but I found out it can only handle words with a suffix (it doesn't ignore words with a prefix).
Formula:
=IF(B1<>"";(LEN(A1)-LEN(SUBSTITUTE(A1;B1&" ";"")))/(LEN(B1)+1)+IF(RIGHT(A1;LEN(B1))=B1;1;0);"")
A contains sentences, multiple words (without punctuation)
B contains the word I want to count the exact frequency of.
C is where the formula is placed and where I get the result
Sample table:
| A                        | B    | C |
|:------------------------:|:----:|:-:|
| boots                    | shoe | 0 |
| shoe                     | shoe | 1 |
| shoes                    | shoe | 0 |
| ladyshoe dogshoe catshoe | shoe | 3 |
In column C I am getting the correct output in rows 1, 2 and 3, but not in row 4. I want C4 to return 0, not 3.
The problem is that it makes no match for shoexxxxxxxxxxx (correct) but makes a match for xxxxxxxxxxxshoe (wrong).
I only want the formula to count exact matches for shoe; any other word should not be counted.
You want this formula:
=IF(B1<>"",(LEN(A1)-LEN(SUBSTITUTE(A1," "&B1&" ","")))/(LEN(B1)+2)+IF(A1=B1,1,0)+IF(LEFT(A1,LEN(B1)+1)=B1&" ",1,0)+IF(RIGHT(A1,LEN(B1)+1)=" "&B1,1,0),"")
I'll denote a space by * to make the following clearer. There are four cases to consider:
string: the word has no spaces on either side (and is therefore the only word in cell A1).
string*: the word appears at the start of a list of words.
*string: the word appears at the end of a list of words.
*string*: the word is in the middle of a list of words.
First we count the number of occurrences of *string* by replacing every "*string*" with "", subtracting the length of the new string from the length of the old one, and dividing by len(string)+2 (which is the length of *string*).
Then we add one more to our count if A1 is exactly string, with no spaces either side.
Then we add one more if A1 starts with string*, and one more if A1 ends with *string.
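The four cases can be checked outside Excel with a short Python sketch. This is an illustration only, not part of the answer; the function names count_four_cases and count_by_split are hypothetical. The first function mirrors the four cases one by one, and the second is a simpler alternative that splits the cell on spaces and counts whole words.

```python
def count_four_cases(text, word):
    """Mirror the four cases: middle, whole cell, start, end."""
    middle = text.count(" " + word + " ")           # *string*
    only = 1 if text == word else 0                 # string
    start = 1 if text.startswith(word + " ") else 0  # string*
    end = 1 if text.endswith(" " + word) else 0      # *string
    return middle + only + start + end


def count_by_split(text, word):
    """Alternative: split on spaces and count exact word matches."""
    return text.split(" ").count(word)


for cell in ["boots", "shoe", "shoes", "ladyshoe dogshoe catshoe"]:
    print(cell, "->", count_four_cases(cell, "shoe"), count_by_split(cell, "shoe"))
```

Both functions agree on the sample table. In this Python version they differ when the word is repeated back to back in the middle of the cell (for example "a shoe shoe b"): the two middle occurrences share a single space, so the non-overlapping count finds only one of them, while the split-based count finds both.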

How to merge 2 rows into 1 row at the same column using awk

I have just started using UNIX and don't have much experience with scripting. I am struggling to merge 2 rows within the same column. Below is the original data.
The column headers are split across 2 rows but ideally should be in 1 row.
I don't know how to do it.
Original File
User Middle Last
Name Name Name
Htat Ko Lin
John Smith Bill
Trying to achieve:
UserName MiddleName LastName
Htat Ko Lin
John Smith Bill
Thanks!
Htat Ko
This can be done using awk and for loops:
awk 'NR==1{for(i=1;i<=NF;i++)a[i]=$i;next}NR==2{for(i=1;i<=NF;i++)$i=a[i]$i}1' file
Output
UserName MiddleName LastName
Htat Ko Lin
John Smith Bill
Explanation
NR==1
If the record number is 1, i.e. the first record, then execute the next block.
for(i=1;i<=NF;i++)
Loop from one to the number of fields (NF), incrementing by one each time.
a[i]=$i
Using i as the key, set an element of the array a to field i ($i).
next
Skip all further instructions and move to the next record.
NR==2
Same as before, but for record 2.
for(i=1;i<=NF;i++)
Exactly the same as before.
$i=a[i]$i
Set field i to the value stored in the array followed by the field's current value.
1
Always true, so it prints every line unless next has been used.
Additional notes
If you want to keep the columns aligned, the easiest way is to pipe the command into column -t:
awk '...' file | column -t
Reduced version
awk '{for(i=1;i<=NF;i++)(NR==2&&$i=a[i]$i)||a[i]=$i}NR>1' file
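For comparison, the same header merge can be written as a short Python sketch. It is an illustration only; the function name merge_header_rows is hypothetical, and it assumes the first two lines are the split header and that both have the same number of whitespace-separated fields.

```python
def merge_header_rows(lines):
    """Join the first two lines field by field; keep the rest unchanged."""
    first = lines[0].split()
    second = lines[1].split()
    # Concatenate field i of row 1 with field i of row 2, like a[i]$i in awk.
    header = [a + b for a, b in zip(first, second)]
    return [" ".join(header)] + list(lines[2:])


data = [
    "User Middle Last",
    "Name Name Name",
    "Htat Ko Lin",
    "John Smith Bill",
]
for line in merge_header_rows(data):
    print(line)
```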

Pre-pend required number of characters in Excel

I need to prepend field values in an Excel sheet with the required number of characters to equal 5 characters in the field, then concatenate two fields and have all of the characters show in the new field.
Example:
Field 1 | Field 2 | Show as
abc | 123 | 00abc00123
d | 5678 | 0000d05678
Ideas?
I think what you may want is something like:
=REPT(0,5-LEN(A1))&A1&REPT(0,5-LEN(B1))&B1
If the second field is numeric, you could also use TEXT on it:
=REPT(0,5-LEN(A1))&A1&TEXT(B1,"00000")
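The padding logic can be sanity-checked with a small Python sketch. This is an illustration only; the function name pad_and_join and the width parameter are hypothetical. Note one difference in behavior: rjust leaves values longer than the width unchanged, whereas REPT with a negative count returns an error in Excel.

```python
def pad_and_join(field1, field2, width=5):
    """Left-pad both fields with zeros to `width` characters, then concatenate."""
    return str(field1).rjust(width, "0") + str(field2).rjust(width, "0")


print(pad_and_join("abc", "123"))
print(pad_and_join("d", 5678))
```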
