String split and count in Excel

I have the following column K in Excel:
K
2 Apps -
3 Appointed CA - Apps - Assist - Appointed NA - EOD Efficiency
4 Appointed CA -
5 Appointed CA -
I want to split at - and count the number of occurrences of the specific words in the string.
I tried the following formula, which splits my string and returns everything to the left of the first -
=LEFT( K2, FIND( "-", K2 ) - 2 )
But the ideal output should be:
Apps Appointed CA Assist Appointed NA EOD Efficiency
1
1 1 1 1 1
1
1
Based on the above data.
Regards,

Here is a VBA macro that will
Generate a unique list of phrases from all of the data
Create the "header row" containing the phrases for the output
Go through the original data again and generate the counts for each phrase
As written, the macro is case insensitive. To make it case sensitive, one would have to change the method of generating the unique list, using the Dictionary object instead of a Collection.
To enter this Macro (Sub), alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and paste the code below into the window that opens. It should be obvious where to make changes to handle variations in where your source data is located, and where you want the results.
To use this Macro (Sub), alt-F8 opens the macro dialog box. Select the macro by name, and RUN.
It will generate results as per your ideal output above
Option Explicit
Option Compare Text
Sub CountPhrases()
Dim colP As Collection
Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
Dim vSrc As Variant, vRes() As Variant
Dim I As Long, J As Long, K As Long
Dim V As Variant, S As String
'Set Source and Results worksheets and ranges
Set wsSrc = Worksheets("sheet1")
Set wsRes = Worksheets("sheet2")
Set rRes = wsRes.Cells(1, 1) 'Results will start in A1 on results sheet
'Get source data and read into array
With wsSrc
vSrc = .Range("K2", .Cells(.Rows.Count, "K").End(xlUp))
End With
'Collect unique list of phrases
Set colP = New Collection
On Error Resume Next 'duplicates will return an error
For I = 1 To UBound(vSrc, 1)
V = Split(vSrc(I, 1), "-")
For J = 0 To UBound(V)
S = Trim(V(J))
If S <> "" Then colP.Add S, CStr(S)
Next J
Next I
On Error GoTo 0
'Dimension results array
'Row 0 will be for the column headers
ReDim vRes(0 To UBound(vSrc, 1), 1 To colP.Count)
'Populate first row of results array
For J = 1 To colP.Count
vRes(0, J) = colP(J)
Next J
'Count the phrases
For I = 1 To UBound(vSrc, 1)
V = Split(vSrc(I, 1), "-")
For J = 0 To UBound(V)
S = Trim(V(J))
If S <> "" Then
For K = 1 To UBound(vRes, 2)
If S = vRes(0, K) Then _
vRes(I, K) = vRes(I, K) + 1
Next K
End If
Next J
Next I
'write results
Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
With rRes
.EntireColumn.Clear
.Value = vRes
.EntireColumn.AutoFit
End With
End Sub
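For anyone who wants to check the logic outside Excel, here is a minimal Python sketch of the same two passes (build the unique header list, then count per row). It is not part of the macro above; the function name count_phrases and the sample rows list are made up for illustration, and it writes 0 where the macro leaves a cell blank.

```python
def count_phrases(rows, sep="-"):
    """Return (headers, counts); counts[i][j] is how often headers[j] occurs in rows[i]."""
    headers = []
    seen = set()
    # Pass 1: unique phrases in first-seen order, compared case-insensitively
    for row in rows:
        for part in row.split(sep):
            phrase = part.strip()
            if phrase and phrase.lower() not in seen:
                seen.add(phrase.lower())
                headers.append(phrase)
    column = {h.lower(): j for j, h in enumerate(headers)}
    # Pass 2: count each phrase per row
    counts = [[0] * len(headers) for _ in rows]
    for i, row in enumerate(rows):
        for part in row.split(sep):
            phrase = part.strip()
            if phrase:
                counts[i][column[phrase.lower()]] += 1
    return headers, counts


# Sample data copied from the question (column K, rows 2 to 5)
rows = [
    "Apps -",
    "Appointed CA - Apps - Assist - Appointed NA - EOD Efficiency",
    "Appointed CA -",
    "Appointed CA -",
]
headers, counts = count_phrases(rows)
for line in [headers] + counts:
    print(line)
```

The headers come out in first-seen order, which is also the order the Collection gives in the macro.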

Assuming result range starts in column L:
L2: =IF(ISNUMBER(FIND("Apps", K2)), 1, "")
M2: =IF(ISNUMBER(FIND("Appointed CA", K2)), 1, "")
etc.
Autofill downwards.
EDIT:
Assuming all possible string combinations we're looking for are known ahead of time, the following should work. If the possible string combinations are not known, I would recommend building a UDF to sort it all out.
Anyway, assuming the strings are known, following the same principle as above:
L2: =IF(ISNUMBER(FIND("Apps", K2)), (LEN(K2) - LEN(SUBSTITUTE(K2, "Apps", ""))) / LEN("Apps"), "")
M2: =IF(ISNUMBER(FIND("Appointed CA", K2)), (LEN(K2) - LEN(SUBSTITUTE(K2, "Appointed CA", ""))) / LEN("Appointed CA"), "")
Increase for as many strings as you like, autofill downwards.
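The counting idea behind the SUBSTITUTE approach is: remove every occurrence of the phrase, see how many characters disappeared, and divide by the length of the phrase. A rough Python sketch of that arithmetic, purely as an illustration (count_occurrences is a made-up name, and str.replace stands in for SUBSTITUTE; both are case sensitive):

```python
def count_occurrences(text, phrase):
    """Count non-overlapping occurrences of phrase in text via the length difference."""
    # Characters that vanish when every occurrence of phrase is removed
    removed = len(text) - len(text.replace(phrase, ""))
    # Each occurrence accounts for len(phrase) of those characters
    return removed // len(phrase)


text = "Appointed CA - Apps - Assist - Appointed NA - EOD Efficiency"
print(count_occurrences(text, "Apps"))
print(count_occurrences(text, "Appointed"))
```

Note that this counts substrings, so a search for "App" would also match inside "Apps" and "Appointed".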

Related

how to divide data in excel

There is an excel issue where we have one column with values like below and we want the respective values to go into corresponding new columns like allocation, primary purpose etc.
data is like
Allocation: Randomized|Endpoint Classification: Safety/Efficacy Study|Intervention Model: Parallel Assignment|Masking: Double Blind (Subject, Caregiver)|Primary Purpose: Treatment
Allocation: Randomized|Primary Purpose: Treatment
Allocation: Randomized|Intervention Model: Parallel Assignment|Masking: Open Label|Primary Purpose: Treatment
There are many such rows like this.
First use text to columns to split data using | delimiter.
Assuming data layout as in screenshot:
Add the following in A6 and drag across/down as required:
=IFERROR(MID(INDEX(1:1,0,(MATCH("*"&A$5&"*",1:1,0))),FIND(":",INDEX(1:1,0,(MATCH("*"&A$5&"*",1:1,0))),1)+2,1000),"")
It uses the MATCH/INDEX function to get the text of cell containing the heading, then uses MID/FIND function to get the text after the :. The whole formula is then enclosed in IFERROR so that if certain rows do not contain a particular header item, it returns a blank instead of #N/A's
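If it helps to see the lookup logic without the nested worksheet functions, here is a small Python sketch of the same idea: find the field that carries a given heading, return the text after the colon, and return an empty string when the heading is missing. The name value_for is hypothetical, and unlike the wildcard MATCH above it requires the heading to match the field name exactly (ignoring case).

```python
def value_for(row, header):
    """Return the text after 'header:' in a pipe-delimited row, or '' if absent."""
    for field in row.split("|"):
        name, sep, value = field.partition(":")
        if sep and name.strip().lower() == header.lower():
            return value.strip()
    return ""  # same role as the IFERROR fallback


# Two of the sample rows from the question
row1 = ("Allocation: Randomized|Endpoint Classification: Safety/Efficacy Study|"
        "Intervention Model: Parallel Assignment|"
        "Masking: Double Blind (Subject, Caregiver)|Primary Purpose: Treatment")
row2 = "Allocation: Randomized|Primary Purpose: Treatment"

for header in ["Allocation", "Masking", "Primary Purpose"]:
    print(header, "->", repr(value_for(row1, header)), repr(value_for(row2, header)))
```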
You did not ask for a VBA solution, but here is one anyway.
Determine the column headers by examining each line and generate a unique list of the headers, storing it in a dictionary.
You can add a routine to sort or order the headers.
Create a "results" array and write the headers to the first row, using the dictionary to store the column number for later lookup.
Examine each line again and pull out the value associated with each column header, populating the correct slot in the results array.
Write the results array to a "Results" worksheet.
In the code below, you may need to rename the worksheet where the source data resides. The Results worksheet will be added if it does not already exist -- feel free to rename it.
Test this on a copy of your data first, just in case.
Be sure to set the reference to Microsoft Scripting Runtime (Tools --> References) as indicated in the notes in the code.
Option Explicit
'Set References
' Microsoft Scripting Runtime
Sub MakeColumns()
Dim vSrc As Variant, vRes As Variant
Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
Dim dHdrs As Dictionary
Dim V As Variant, W As Variant
Dim I As Long, J As Long
Set wsSrc = Worksheets("Sheet1")
'Get source data
With wsSrc
vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With
'Set results sheet and range
On Error Resume Next
Set wsRes = Worksheets("Results")
If Err.Number = 9 Then
Worksheets.Add.Name = "Results"
End If
On Error GoTo 0
Set wsRes = Worksheets("Results")
Set rRes = wsRes.Cells(1, 1)
'Get list of headers
Set dHdrs = New Dictionary
dHdrs.CompareMode = TextCompare
'Split each line on "|" and then ":" to get header/value pairs
For I = 1 To UBound(vSrc, 1)
V = Split(vSrc(I, 1), "|")
For J = 0 To UBound(V)
W = Split(V(J), ":") 'W(0) will be header
If Not dHdrs.Exists(W(0)) Then _
dHdrs.Add W(0), W(0)
Next J
Next I
'Create results array
ReDim vRes(0 To UBound(vSrc, 1), 1 To dHdrs.Count)
'Populate Headers and determine column number for lookup when populating
'Could sort or order first if desired
J = 0
For Each V In dHdrs
J = J + 1
vRes(0, J) = V
dHdrs(V) = J 'column number
Next V
'Populate the data
For I = 1 To UBound(vSrc, 1)
V = Split(vSrc(I, 1), "|")
For J = 0 To UBound(V)
'W(0) is the header
'The dictionary will have the column number
'W(1) is the value
W = Split(V(J), ":", 2) 'limit of 2 keeps any ":" inside the value
vRes(I, dHdrs(W(0))) = Trim(W(1)) 'Trim drops the space after the colon
Next J
Next I
'Write the results
Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
With rRes
.EntireColumn.Clear
.Value = vRes
With .Rows(1)
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
.EntireColumn.AutoFit
End With
End Sub
If you have not used macros before, to enter this Macro (Sub), alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code above into the window that opens.
To use this Macro (Sub), alt-F8 opens the macro dialog box. Select the macro by name, and RUN.

EXCEL - Fuse text cells and split into different lines

I have a file that looks like this, containing a huge amount of data
>ENSMUSG00000020333|ENSMUST00000000145|Acsl6
AGCTCCAGGAGGGCCCGTCTCAGTCCGATGAACTTTGCAGCAATATTATAGTTATTCGTG
GTTCACAGAATTCCATTAAACATAAAGAAAAAACATAA
>ENSMUSG00000000001|ENSMUST00000000001|Gnai3
GAGGATGGCATAGTAAAAGCTATTACAGGGAGGAGTGTTGAGACCAGATGTCATCTACTG
CTCTGTAATCTAATGTTTAGGGCATATTGAAGTTGAGGTGCTGCCTTCCAGAACTTAAAC
the columns should be transformed so that lines always contain:
ENSMUSG*** ENSMUST*** GeneName Sequence (four separate columns)
the Sequence column should be the lines starting with either A,C,G,or T fused into one text cell, the number of cells to fuse varies from gene to gene.
does anyone have advice how to solve this?
thank you so much for your help!
best wishes
kk
Use the Text to Columns button on the Data tab. Choose Delimited, click Next, then select Other and in the box type the pipe symbol |. Then click Next and Finish.
I believe it is only those with Office 365 subscriptions that have the worksheet function CONCAT, which might be useful in this situation. So I would do this with a VBA macro.
First line -- split using the pipe | delimiter
Then concatenate the next lines until we get to one that does not start with "A", "C", "G", "T"
Store the results in a Collection object
Write the results back to the worksheet.
Since you have a large database, the "work" is done in VBA arrays as this will process much more rapidly.
It is assumed that your data is in Column A, starting in A1; and that your results will be written in columns B:E
If your database is clean, and formatted as you show, it should work OK. If it does not fall into the format you have presented, some error-checking might need to be added.
Option Explicit
Sub Organize()
Dim COL As Collection
Dim vSrc As Variant, vRes As Variant
Dim WS As Worksheet, rRes As Range
Dim V As Variant, W As Variant, S As String
Dim I As Long, J As Long
Set WS = ActiveSheet
With WS
Set rRes = .Cells(1, 2)
vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With
Set COL = New Collection
For J = 1 To UBound(vSrc, 1)
ReDim vRes(0 To 3)
W = Split(Mid(vSrc(J, 1), 2), "|") 'First line; Mid(..., 2) drops the leading ">"
For I = 0 To 2
vRes(I) = W(I)
Next I
S = ""
'Concatenate subsequent lines
'Could look for the ">" but OP gave specific starting letters
' So will use that
Do
Select Case Left(vSrc(J + 1, 1), 1)
Case "A", "C", "G", "T"
S = S & vSrc(J + 1, 1)
Case Else
Exit Do
End Select
J = J + 1
Loop Until J = UBound(vSrc, 1)
vRes(3) = S
COL.Add vRes
Next J
ReDim vRes(1 To COL.Count, 1 To 4)
I = 0
For Each W In COL
I = I + 1
For J = 1 To 4
vRes(I, J) = W(J - 1)
Next J
Next W
Set rRes = rRes.Resize(rowsize:=UBound(vRes, 1), columnsize:=UBound(vRes, 2))
With rRes
.EntireColumn.Clear
.Value = vRes
.EntireColumn.AutoFit
End With
End Sub
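As a cross-check of the approach, here is a minimal Python sketch of the same parsing idea. It is not a translation of the macro: it keys on the leading ">" of the header lines rather than on the A/C/G/T starting letters, and parse_records is a made-up name. It assumes every header line has exactly three pipe-separated fields, as in the sample.

```python
def parse_records(lines):
    """Turn '>id|id|name' header lines plus sequence lines into 4-field records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: drop ">" and split on the pipe, then add an empty sequence
            fields = (line[1:].split("|") + ["", "", ""])[:3]
            records.append(fields + [""])
        elif records:
            # Sequence line: fuse onto the sequence of the current record
            records[-1][3] += line
    return records


# Sample lines copied from the question
lines = [
    ">ENSMUSG00000020333|ENSMUST00000000145|Acsl6",
    "AGCTCCAGGAGGGCCCGTCTCAGTCCGATGAACTTTGCAGCAATATTATAGTTATTCGTG",
    "GTTCACAGAATTCCATTAAACATAAAGAAAAAACATAA",
    ">ENSMUSG00000000001|ENSMUST00000000001|Gnai3",
    "GAGGATGGCATAGTAAAAGCTATTACAGGGAGGAGTGTTGAGACCAGATGTCATCTACTG",
    "CTCTGTAATCTAATGTTTAGGGCATATTGAAGTTGAGGTGCTGCCTTCCAGAACTTAAAC",
]
records = parse_records(lines)
for record in records:
    print(record[:3], len(record[3]))
```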

Splitting specific information in one excel cell to several others

I need to find a way to split some data in Excel, e.g.
If a cell has the following in it: LWPO0001653/1654/1742/1876/241
All of the info after the / should be LWPO000... with that number.
Is there any way of separating them out and adding in the LWPO000? So they come out as LWPO0001653
LWPO0001654
etc etc
I could do it manually, yes, but I have thousands to do so it would take a long time.
Appreciate your help!
Here is a solution using Excel Formulas.
With your original string in A1, and assuming the first seven characters are the ones that get repeated, then:
B1: =LEFT($A1,FIND("/",$A1)-1)
C1: =IF(LEN($A1)-LEN(SUBSTITUTE($A1,"/",""))< COLUMNS($A:A),"",LEFT($A1,7)&TRIM(MID(SUBSTITUTE(MID($A1,8,99),"/",REPT(" ",99)),(COLUMNS($A:A))*99,99)))
Select C1 and fill right as far as required. Then Fill down from Row 1
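To make the intended result concrete, here is a short Python sketch of what the formulas compute for one cell: keep the first piece as it is and put the first seven characters in front of every later piece. The name expand_codes and the prefix_len parameter are hypothetical, and the seven-character prefix is the same assumption as in the formulas.

```python
def expand_codes(cell, prefix_len=7):
    """Split on '/' and prepend the leading prefix to every piece after the first."""
    prefix = cell[:prefix_len]
    first, *rest = cell.split("/")
    return [first] + [prefix + part for part in rest]


print(expand_codes("LWPO0001653/1654/1742/1876/241"))
```

A cell without any / simply comes back as a one-item list.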
EDIT: For a VBA solution, try this code. It assumes the source data is in column A, and puts the results adjacent starting in Column B (easily changed if necessary). It works using arrays within VBA, as doing multiple worksheet read/writes can slow things down. It will handle different numbers of splits in the various cells, although could be shortened if we knew the number of splits was always the same.
Option Explicit
Sub SplitSlash()
Dim vSrc As Variant
Dim rRes As Range, vRes() As Variant
Dim sFirst7 As String
Dim V As Variant
Dim COL As Collection
Dim I As Long, J As Long
Dim lMaxColCount As Long
Set rRes = Range("B1") 'Set to A1 to overwrite
vSrc = Range("a1", Cells(Rows.Count, "A").End(xlUp))
'If only a single cell, vSrc won't be an array, so change it
If Not IsArray(vSrc) Then
ReDim vSrc(1 To 1, 1 To 1)
vSrc(1, 1) = Range("a1")
End If
'use collection since number of columns can vary
Set COL = New Collection
For I = 1 To UBound(vSrc)
sFirst7 = Left(vSrc(I, 1), 7)
V = Split(vSrc(I, 1), "/")
For J = 1 To UBound(V)
V(J) = sFirst7 & V(J)
Next J
lMaxColCount = IIf(lMaxColCount < UBound(V), UBound(V), lMaxColCount)
COL.Add V
Next I
'Results array
ReDim vRes(1 To COL.Count, 1 To lMaxColCount + 1)
For I = 1 To UBound(vRes, 1)
For J = 0 To UBound(COL(I))
vRes(I, J + 1) = COL(I)(J)
Next J
Next I
'Write results to sheet
Set rRes = rRes.Resize(UBound(vRes, 1), UBound(vRes, 2))
With rRes
.EntireColumn.Clear
.Value = vRes
.EntireColumn.AutoFit
End With
End Sub
I'm clearly missing the point :-) but anyway, in B1 and copied down to suit:
=SUBSTITUTE(A1,"/","/"&LEFT(A1,7))
Select ColumnB, Copy and Paste Special, Values over the top.
Apply Text to Columns to ColumnB, Delimited, with / as the delimiter.
There's a couple of ways to solve this. The quickest is probably:
Assuming that the data is in column A:
Highlight the column, go to Data>>Text To Columns
Choose "Delimited" and in the "Other" box, put /
Click ok. You'll have your data split into multiple cells
Insert a column at B and put in the formula =LEFT(A1, 7)
Insert a column at C and put in the formula =RIGHT(A1, LEN(A1)-7)
You'll now have column B with your first 7 characters, and columns C, D, E, F, etc. with the last little bit. You can concatenate the values back together for each column you have with =CONCATENATE(B1,C1), =CONCATENATE(B1,D1), etc.
A quick VBA macro, which does nearly the same thing that @Kevin's does as well. I wrote it before I saw his answer, and I hate to throw away work ;)
Sub breakUpCell()
Dim rngInput As Range, rngInputCell As Range
Dim intColumn As Integer
Dim arrInput() As String
Dim strStart As String
Dim strEnd As Variant
'Set the range for the list of values (Assuming Sheet1 and A1 is the start)
Set rngInput = Sheet1.Range("A1").Resize(Sheet1.Range("A1").End(xlDown).Row)
'Loop through each cell in the range
For Each rngInputCell In rngInput
'Split up the values after the first 7 characters using "/" as the delimiter
arrInput = Split(Right(rngInputCell.Value, Len(rngInputCell.Value) - 7), "/")
'grab the first 7 characters
strStart = Left(rngInputCell.Value, 7)
'We'll be writing out the values starting in column 2 (B)
intColumn = 2
'Loop through each split up value and assign to strEnd
For Each strEnd In arrInput
'Write the concatenated value out starting at column B in the same row as rngInputCell
Sheet1.Cells(rngInputCell.Row, intColumn).Value = strStart & strEnd
'Head to the next column (C, then D, then E, etc)
intColumn = intColumn + 1
Next strEnd
Next rngInputCell
End Sub
Here is how you can do it with a macro:
This is what is happening:
1) Set range to process
2) Loop through each cell in range and check it isn't blank
3) If the cell contains the slash character then split it and process
4) Skip the first record and concatenate "LWPO000" plus the current string to adjacent cells.
Sub CreateLWPO()
On Error Resume Next
Application.ScreenUpdating = False
Dim theRange
Dim cellValue
Dim offset As Integer
Dim fields
'set the range of cells to be processed here
Set theRange = range("A1:A50")
'loop through each cell and if not blank process
For Each c In theRange
offset = 0 'column offset from the source cell; each item found goes one cell further right (change this starting value to shift the first column to be populated)
If c.Value <> "" Then
cellValue = c.Value
If InStr(cellValue, "/") > 0 Then
fields = Split(cellValue, "/")
For i = 1 To UBound(fields)
offset = offset + 1
cellValue = "LWPO000" & fields(i)
'if you need to pad the number of zeros based on length do this and comment the line above
'cellValue = "LWPO" & Right$(String(7, "0") & fields(i), 7)
c.offset(0, offset).Value = cellValue
Next i
End If
End If
Next
Application.ScreenUpdating = True
End Sub

Excel transpose formula

I've been wrapping my head around it for some time and just don't know how to approach this problem. My table consists of groups of data which I want to transpose from rows to columns. Every row has an index number in the first column and all of the rows in one group have the same index.
1 a
1 b
1 c
1 d
1 e
1 f
1 g
1 h
2 as
2 bs
2 cs
5 ma
5 mb
5 mc
5 md
and I want my final result to be:
1 a b c d e f g h
2 as bs cs
5 ma mb mc md
is it possible to do this with formulas or do I have to do it in VBA?
You can also do this using a macro. Here is one method.
To enter this Macro (Sub), alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this Macro (Sub), alt-F8 opens the macro dialog box. Select the macro by name, and RUN.
Option Explicit
Sub ReArrange()
Dim vSrc As Variant, rSrc As Range
Dim vRes As Variant, rRes As Range
Dim I As Long, J As Long, K As Long
Dim lColsCount As Long
Dim Col As Collection
'Upper left cell of results
Set rRes = Range("D1")
'Assume Data in A1:Bn with no labels
Set rSrc = Range("a1", Cells(Rows.Count, "A").End(xlUp)).Resize(columnsize:=2)
'Ensure Data sorted by index number
rSrc.Sort key1:=rSrc.Columns(1), order1:=xlAscending, key2:=rSrc.Columns(2), order2:=xlAscending, MatchCase:=False, _
Header:=xlNo
'Read Source data into array for faster processing
' compared with going back and forth to worksheet
vSrc = rSrc
'Compute Number of rows = unique count of index numbers
'Collection object can only have one entry per key
' otherwise it produces an error, which we skip
Set Col = New Collection
On Error Resume Next
For I = 1 To UBound(vSrc)
Col.Add Item:=vSrc(I, 1), Key:=CStr(vSrc(I, 1))
Next I
On Error GoTo 0
'Compute Maximum Number of columns in results
' Since there is one entry per Index entry, maximum number of
' columns will be equal to the Index that has the most lines
' So we iterate through each Index and check that.
For I = 1 To Col.Count
J = WorksheetFunction.CountIf(rSrc.Columns(1), Col(I))
lColsCount = IIf(J > lColsCount, J, lColsCount)
Next I
'Set up Results array
' Need to add one to the columns to account for the column with the Index labels
ReDim vRes(1 To Col.Count, 1 To lColsCount + 1)
'Now populate the results array
K = 1
For I = 1 To Col.Count
vRes(I, 1) = vSrc(K, 1)
J = 2
Do
vRes(I, J) = vSrc(K, 2)
J = J + 1: K = K + 1
If K > UBound(vSrc) Then Exit Do
Loop Until vSrc(K, 1) <> vRes(I, 1)
Next I
'Set the results range to be the same size as our array
Set rRes = rRes.Resize(rowsize:=UBound(vRes, 1), columnsize:=UBound(vRes, 2))
'Clear the results range and then copy the results array to it
rRes.EntireColumn.Clear
rRes = vRes
'Format the width. Could also format other parameters
rRes.EntireColumn.ColumnWidth = 10
End Sub
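For comparison, the grouping step itself is small when written outside Excel. Here is a minimal Python sketch, only an illustration and not a translation of the macro: rows_to_groups is a made-up name, the output rows are ordered by index like the macro's sort, but values keep their input order instead of being sorted.

```python
def rows_to_groups(pairs):
    """Collect (index, value) pairs into one output row per index."""
    groups = {}
    for index, value in pairs:
        groups.setdefault(index, []).append(value)
    # One row per index: the index first, then its values in input order
    return [[index] + values for index, values in sorted(groups.items())]


# Sample data copied from the question
pairs = [
    (1, "a"), (1, "b"), (1, "c"), (1, "d"), (1, "e"), (1, "f"), (1, "g"), (1, "h"),
    (2, "as"), (2, "bs"), (2, "cs"),
    (5, "ma"), (5, "mb"), (5, "mc"), (5, "md"),
]
result = rows_to_groups(pairs)
for row in result:
    print(*row)
```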
Yes, it's possible. You would need the following functions:
IF
MATCH
ISNA
INDEX
Assume you have the data in sheet 1 in columns A and B:
C1:
place the value "1" in cell C1
C2:
=C1+1
drag down as much as needed
D1:
=MATCH(C1,A:A, 0)
Drag down as far as column C is filled
E1:
=MATCH(C1,A:A, 1)
Drag down as far as column C is filled
Sheet 2:
Now place the following formulas in cell A1 in sheet2:
=IF(ISNA(Sheet1!$D1), "", IF(Sheet1!$D1="", "", IF(COLUMN()-1+Sheet1!$D1 <=Sheet1!$E1, INDEX(Sheet1!$B:$B, COLUMN()-1+Sheet1!$D1), "")))
Drag / Copy it to as many cells as needed:
Also I have an article on my blog about the INDEX function. It might help Excel INDEX Function.
You can also download the complete file here.

how to CONCATENATE different cells based on same sku

I have a csv with 2 columns
sku,color
001,blue
001,red
001,pink
001,yellow
002,blue
002,red
002,pink
002,yellow
etc..
how can i create a new cell and combine the colors based on sku number? like this:
sku,combinedColors
001,"blue,red,pink,yellow"
002,"blue,red,pink,yellow"
thanks
There isn't a single formula that will do this and a macro is the best way.
But there is a way to do it with 2 formulas.
Open the CSV in Excel and you MUST sort column A ascending/descending.
Then in C2, add this formula and drag it down
=IF(A2<>A1,B2,C1 & "," & B2)
In D2, add this formula and drag it down
=IF(A2<>A3,CONCATENATE(A2,",""",C2,""""),"")
Put the Autofilter on Row 1 and select NON BLANKS in column D.
You can then copy column D as you wanted!
You can do this with a VBA Macro. To enter this Macro (Sub), alt-F11 opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
You may want to make some changes depending on the location of your source data and where you want the results written. The macro assumes your source data is in columns A:B with a header row, and that your results will be written in columns D:E
I also assumed, since this is tagged with Excel, that you have imported the csv data into excel and you want the results in two columns.
To use this Macro (Sub), alt-F8 opens the macro dialog box. Select the macro by name, and RUN.
Option Explicit
Sub ConcatColorsBySKU()
Dim colSKU As Collection
Dim vSrc As Variant, vRes() As Variant
Dim I As Long, J As Long
Dim rRes As Range
vSrc = Range("A1", Cells(Rows.Count, "A").End(xlUp)).Resize(columnsize:=2)
Set rRes = Range("D1")
'Unique SKU's
Set colSKU = New Collection
On Error Resume Next
For I = 2 To UBound(vSrc)
colSKU.Add Item:=CStr(vSrc(I, 1)), Key:=CStr(vSrc(I, 1))
Next I
On Error GoTo 0
'Results Array
ReDim vRes(1 To colSKU.Count + 1, 1 To 2)
vRes(1, 1) = "SKU"
vRes(1, 2) = "Combined Colors"
For I = 1 To colSKU.Count
vRes(I + 1, 1) = colSKU(I)
For J = 2 To UBound(vSrc)
If CStr(vSrc(J, 1)) = vRes(I + 1, 1) Then _
vRes(I + 1, 2) = vRes(I + 1, 2) & ", " & vSrc(J, 2) 'CStr so numeric SKUs match the text keys
Next J
vRes(I + 1, 2) = Mid(vRes(I + 1, 2), 3) 'strip the leading ", " (two characters)
Next I
Set rRes = rRes.Resize(UBound(vRes, 1), UBound(vRes, 2))
rRes.NumberFormat = "#"
rRes = vRes
rRes.EntireColumn.AutoFit
End Sub
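Since the data starts life as a csv, here is a minimal Python sketch of the same grouping done directly on the rows, in case that is easier than going through Excel. This is only an illustration under my own assumptions: combine_by_sku and to_csv are made-up names, SKUs are kept as text so the leading zeros survive, and the standard csv module is left to add the quotes around the combined field.

```python
import csv
import io


def combine_by_sku(rows):
    """Group colors by SKU, keeping first-seen SKU order and input color order."""
    combined = {}
    for sku, color in rows:
        combined.setdefault(sku, []).append(color)
    return [(sku, ",".join(colors)) for sku, colors in combined.items()]


def to_csv(pairs):
    """Write the pairs as CSV text; fields that contain a comma get quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sku", "combinedColors"])
    writer.writerows(pairs)
    return buf.getvalue()


# Sample data copied from the question
rows = [
    ("001", "blue"), ("001", "red"), ("001", "pink"), ("001", "yellow"),
    ("002", "blue"), ("002", "red"), ("002", "pink"), ("002", "yellow"),
]
pairs = combine_by_sku(rows)
print(to_csv(pairs), end="")
```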
