Transposing Data in 1 column by custom delimiters - excel

I've converted a PDF to Excel; All the data is in one column like so:
Item1
Header1
String1
String2
Header2
String1
String2
Item 2
Header1
String1
String2
Header2
String1
String2
The headers are consistent throughout the items. Is there an easy way to transpose this data by custom delimiters (Header1, Header2)?
My goal is to transpose this data as such:
| Header1 | Header2 |
Item1 | String1&String2 | String1&String2 |
Item2 | String1&String2 | String1&String2 |
I'm assuming the most straight-forward way would be writing a custom sorting macro that can sort by the Headers as delimiters
Can anyone give me a jump start?

Manually place the headers in Sheet2, cells B1 and C1 then go back to Sheet1 and run this:
Sub cropier()
Dim N As Long, i As Long, j As Long, K As Long
Dim s1 As Worksheet, s2 As Worksheet
Set s1 = Sheets("Sheet1")
Set s2 = Sheets("Sheet2")
N = Cells(Rows.Count, "A").End(xlUp).Row
K = 2
s = ""
For i = 1 To N
v = s1.Cells(i, "A").Value
j = i Mod 7
Cells(i, "B") = j
Select Case j
Case 1
s2.Cells(K, 1) = v
Case 3
t = v
Case 4
t = t & v
s2.Cells(K, 2) = t
s2.Cells(K, 3) = t
t = ""
K = K + 1
End Select
Next i
End Sub

An alternative approach not using VBA, which is more general than the 2 rows, 2 columns and 2 concatenated strings output shown in your example and which selects the column headings as well as the row headings from the input data...
I shall assume that your data has a structure as follows:
the "body" of the required output table has I rows and J columns
each "cell" of the output table is formed from a concatention of K "data strings"
Your input data list will comprise I item labels, IJ header labels (since there are J header labels per item) and IJK data strings, so I + IJ + IJK =
I(1+J(1+K)) values in total. For convenience number the rows of the input data list sequentially, starting at 1.
The approach is based on picking out the positions within the input data list of the i'th item label, j'th header label and the data strings which comprise the (i,j)th cell. Please note i and j are distinct from I and J.
I will deal with the header labels first as that is possibly the easiest to understand.
Suppose the first header label within an item is at position p in
the input data list. The next header label will occur at 1+K
positions further down the list ie at position p+1+K. If J>2 (ie
more than two header labels in the input data) then the third header
label will occur at p+2(1+K). In general, the header labels will
occur at
p+n(1+K) for n=0,1,...,J-1
I will return to the value of p shortly.
Consider now the item labels. Just as the header labels within an
item occurred every 1+K positions, the item labels occur every
1+J+JK = 1+J(1+K)) positions. Since the first position in the data
list is an item label, the positions of the item labels can be
written generally as
1 + n(1+J(1+K)) n=0,1,...,I-1
Returning to the value of p in 1., the first header label within an
item occurs at the position following an item label. Item labels are
listed in 2. which it is convenient to rewrite as 1 +
(i-1)(1+J(1+K)) i=1,2,...,I. Adding 1 to this expression and
substituting it for p in 1. yields an expression for the positions of
the IJ header labels in the input list as 2+(i-1)(1+J(1+K))+n(1+K) i=1,2,...,I n=0,...,J-1. This expression can be slightly rewritten as 2+(i-1)(1+J(1+K)) + (j-1)(1+K) i=1,2,...,I; j=1,2...,J.
The final expression in 3. provides much that is needed for the
positions of the data strings. These occur in the K positions after
the header labels. So the first data string is obtained by adding 1
to this expression, the second by adding 2,...,and the last by
adding K.
This now gives all the positional information to construct the output table.
The row labels (items) are given by the rewritten version of the expression in 2. ie 1 + (i-1)(1+J(1+K)) i=1,2,...,I. These label positions are a function of i.
The column labels (headers) are given by the final expression in 3. and since these labels are repeated for each item, any value of i can be chosen. Taking i as 1 simplifies the formula to 2+(j-1)(1+K) j=1,2...,J. These label positions are a function of j.
The (i,j)th cell of the output table is a concatentation of the strings starting at position 3+(i-1)(1+J(1+K)) + (j-1)(1+K) and ending at position (2+K)+(i-1)(1+J(1+K)) + (j-1)(1+K) (where i is between 1 and I and j is between 1 and J). The positions of the data strings are a function of i and j.
This positional information now allows construction of an output table with I rows and J columns in its body and including row and column header labels. The table can be constructed in the usual way from an I*1 column containing 1,...,I and a 1*J row containing 1,...,J. (Typically this column and this row are positioned to the left of and above the output, respectively, but could be placed anywhere in the workbook.)
UPDATE Worked example based on 3 items, 4 headers and 5 strings per output cell is here

Related

Loop n->x, scan if col f exists, write col a. Always application or object error

My problem is so simple I may be confusing with my rank ignorance. My apologies. I specify as args column to read, row to start, row to stop, pad length, output data per line, column to check if empty. I wish to iterate thru rowstart to row stop and build string from values read... Furthermore it appends quotes to text leaving ints alone. It's an error code table for ESP32 projects.
Cells(intRow, col);
Worksheets(Application.activesheet.name).range(cells(introw,col),cells(introw,col);
Worksheets("errorCodes").Cells(introw,col);
range(cells(introw,col),cells(introw,col)).value;
range(introw, col).value.... more..
getcarray(12,400,800,40,6,12) in immediate window:
Public Function getCArray(col, row, rowend, pad, maxPerRow, checkCol)
...Do...
checkCell = Worksheets("errorCodes").Cells(intRow, col).Value
cccheck = Worksheets("errorCodes").Cells(intRow, checkCol).Value
If (Len(cccheck) > 2)
...
I want a return of cells col from row to rowend where checkcol cell len > 2
returned between "," and a padding of spaces of pad long.

How can I lookup data from one column, when the value I'm referencing changes columns?

I want to do an INDEX-MATCH-like lookup between two documents, except my MATCH's index array doesn't stay in one column.
In Vague-English: I want a value from a known column that matches another value that may be found in any column.
Refer to the image below. Let's call everything to the left of the bold vertical line on column H doc1, and the right side will be doc2.
Doc2 has a column "Find This", which will be the INDEX's array. It is compared with "ID1" from doc1 (Note that the values in "Find This" will not be in the same order as column ID1, but it's easier to undertsand this way).
The "[Result]" column in doc2 will be the value from doc1's "Want This" column from the row that matches "FIND THIS" ...However, sometimes the value from "FIND THIS" is not in the "ID1" column, and is instead in "ID2","ID3", etc.
So, I'm trying to generate Col K from Col J. This would be like pressing Ctrl+F and searching for a value in Col J, then taking the value from Col D in that row and copying it to Col K.
I made identical values from a column the same color in the other doc to make it easier to visualize where they are coming from.
Note also that in column F of doc1, the same value from doc2's "Find This" can be found after some other text.
Also note that the column headers are only there as examples, the ID columns are not actually numbered.
I would simply hard-code the correct column to search from, but I'm not in control of doc1, and I'm worried that future versions may have new "ID" columns, with other's being removed.
I'd prefer this to be a solution in the form of a formula, but VB will do.
To generate column K based on given values of column J then you could use the following:
=INDEX(doc1!$D$2:$D$14,SUMPRODUCT((doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14))-1)
Copy that formula down as far as you need to go.
It basically only returns the row of the where a matching column J is found. we then find that row in the index of your D range to get your value in K.
Proof of concept:
UPDATE:
If you are working with non unique entities n column J. That is the value on its own can be found in multiple rows and columns. Consider using the following to return the Last row where there J value is found:
=INDEX(doc1!$D$2:$D$14,AGGREGATE(14,6,(doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14),1)-1)
UPDATE 2:
And to return the first row where what you are looking in column J is found use:
=INDEX($D$2:$D$14,AGGREGATE(15,6,1/($B$2:$H$14=J2)*ROW($B$2:$H$14)-1,1))
Thanks to Scott Craner for the hint on the minimum formula.
To determine if you have UNIQUE data from column J in your range B2:H14 you can enter this array formula. In order to enter an array formula you need to press CTRL+SHFT+ENTER at the same time and not just ENTER. You will know you have done it right when you see {} around your formula in the formula bar. You cannot at the {} manually.
=IF(MAX(COUNTIF($B$2:$H$14,J2:J14))>1,"DUPLICATES","UNIQUE")
UPDATE 3
AGGREGATE - A relatively new function to me but goes back to Excel 2010. Aggregate is 19 functions rolled into 1. It would be nice if they all worked the same way but they do not. I think it is functions numbered 14 and up that will perform the same way an array formula or a CSE formula if you prefer. The nice thing is you do not need to use CSE when entering or editing them. SUMPRODUCT is another example of a regular formula that performs array formula calculations.
The meat of this explanation I believe is what is happening inside of the AGGREGATE brackets. If you click on the link you will get an idea of what the first two arguments are. The first defines which function you are using, and the second tell AGGREGATE how to deal with Errors, hidden rows, and some other nested functions. That is the relatively easy part. What I believe you want to know is what is happening with this:
(doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14)
For illustrative purpose lets reduce this formula to something a little smaller in scale that does the same thing. I'll avoid starting in A1 as that can make life a little easier when counting since it the 1st row and first column. So by placing the example range outside of it you can see some more special considerations potentially.
What I want to know is what row each of the items list in Column C occurs in column B
| B | C
3 | DOG | PLATYPUS
4 | CAT | DOG
5 | PLATYPUS |
The full formula for our mini example would be:
{=($B$3:$B$5=C2)*ROW($B$3:$B$5)}
And we are going to look at the following as an array
=INDEX($B$3:$B$5,AGGREGATE(14,6,($B$3:$B$5=C2)*ROW($B$3:$B$5),1)-2)
So the first brackets is going to be a Boolean array as you noted. Every cell that is TRUE will TRUE until its forced into a math calculation. When that happens, True becomes 1 and False becomes 0.I that formula was entered as a CSE formula and place in D2, it would break down as follows:
FALSE X 3
FALSE X 4
TRUE X 5
The 3, 4 and 5 come from ROW() returning the value of the row number that it is dealing with at the time of the array math operation. Little trick, we could have had ROW(1:3). Just need to make sure the size of the array matches! This is not matrix math is just straight across multiplication. And since the Boolean is now experiencing a math operation we are now looking at:
0 X 3 = 0
0 X 4 = 0
1 X 5 = 5
So the array of {0,0,5} gets handed back to the aggregate for more processing. The important thing to note here is that it contains ONLY 0 and the individual row numbers where we had a match. So with the first aggregate formula, formula 14 was chosen which is the LARGE function. And we also told it to ignore errors, which in this particular case does not matter. So after providing the array to the aggregate function, there was a ,1) to finish off the aggregate function. The 1 tells the aggregate function that we want the 1st larges number when the array is sorted from smallest to largest. If that number was 2 it would be the 2nd largest number and so on. So the last row or the only row that something is found on is returned. So in our small example it would be 5.
But wait that 5 was buried inside another function called Index. and in our small example that INDEX formula would be:
=INDEX($B$3:$B$5,AGGREGATE(...)-2)
Well we know that the range is only 3 rows long, so asking for the 5th row, would have excel smacking you up side the head with an error because your index number is out of range. So in comes the header row correction of -1 in the original formula or -2 for the small example and what we really see for the small example is:
=INDEX($B$3:$B$5,5-2)
=INDEX($B$3:$B$5,3)
and here is a weird bit of info, That last one does not become PLATYPUS...it becomes the cell reference to =B5 which pulls PLATYPUS. But that little nuance is a story for another time.
Now in the comments Scott essentially told me to invert for the error to get the first row. And this is important step for the aggregate and it had me running in circles for awhile. So the full equation for the first row option in our mini example is
=INDEX($B$3:$B$5,AGGREGATE(15,6,1/($B$3:$B$5=C2)*ROW($B$3:$B$5),1)-2)
And what Scott Craner was actually suggesting which Skips one math step is:
=INDEX($B$3:$B$5,AGGREGATE(15,6,ROW($B$3:$B$5)/($B$3:$B$5=C2),1)-2)
However since I only realized this after writing this all up the explanation will continue with the first of these two equations
So the important thing to note here is the change from function 14 to function 15 which is SMALL. Think of it a finding the minimum. And this time that 6 plays a huge factor along with the 1/. So our array in the middle this time equates to:
1/FALSE X 3
1/FALSE X 4
1/TRUE X 5
Which then becomes:
1/0 X 3
1/0 X 4
1/1 X 5
Which then has excel slapping you up side the head again because you are trying to divide by 0:
#div/0 X 3
#div/0 X 4
1/1 X 5
But you were smart and you protected yourself from that slap upside the head when you told AGGREGATE to ignore error when you used 6 as the second argument/reference! Therefore what is above becomes:
{5}
Since we are performing a SMALL, and we passed ,1) as the closing part of the AGGREGATE, we have essentially said give me the minimum row number or the 1st smallest number of the resulting array when sorted in ascending order.
The rest plays out the same as it did for the LARGE AGGREGATE method. The pitfall I fell into originally is I did not use the 1/ to force an error. As a result, every time I tried getting the SMALL of the array I was getting 0 from all the false results.
SUMPRODUCT works in a very similar fashion, but only works when your result array in the middle only returns 1 non zero answer. The reason being is the last step of the SUMPRODUCT function is to all the individual elements of the resulting array. So if you only have 1 non zero, you get that non zero number. If you had two rows that matched for instance 12 and 31, then the SUMPRODUCT method would return 43 which is not any of the row numbers you wanted, where as aggregate large would have told you 31 and aggregate small would have told you 12.
Something like this maybe, starting in K2 and copied down:
=IFERROR(INDEX(D:D,MAX(IFERROR(MATCH(J2,B:B,0),-1),IFERROR(MATCH(J2,E:E,0),-1),IFERROR(MATCH(J2,G:G,0),-1),IFERROR(MATCH(J2,H:H,0),-1))),"")
If you want to keep the positions of the columns for the Match variable, consider creating generic range names for each column you want to check, like "Col1", "Col2", "Col3". Create a few more range names than you think you will need and reference them to =$B:$B, =$E:$E etc. Plug all range names into Match functions inside the Max() statement as above.
When columns are added or removed from the table, adjust the range name definitions to the columns you want to check.
For example, if you set up the formula with five Matches inside the Max(), and the table changes so you only want to check three columns, point three of the range names to the same column. The Max() will only return one result and one lookup, even if the same column is matched several times.
I came up with a vba solution if I understood correctly:
Sub DisplayActiveRange()
Dim sheetToSearch As Worksheet
Set sheetToSearch = Sheet2
Dim sheetToOutput As Worksheet
Set sheetToOutput = Sheet1
Dim search As Range
Dim output As Range
Dim searchCol As String
searchCol = "J"
Dim outputCol As String
outputCol = "K"
Dim valueCol As String
valueCol = "D"
Dim r As Range
Dim currentRow As Integer
currentRow = 1
Dim maxRow As Integer
maxRow = sheetToOutput.UsedRange.Rows.Count
For currentRow = 1 To maxRow
Set search = Range("J" & currentRow)
For Each r In sheetToSearch.UsedRange
If r.Value <> "" Then
If r.Value = search.Value Then
Set output = sheetToOutput.Range(outputCol & currentRow)
output.Value = sheetToSearch.Range(valueCol & currentRow).Value
currentRow = currentRow + 1
Set search = sheetToOutput.Range(searchCol & currentRow)
End If
End If
Next
Next currentRow
End Sub
There might be better ways of doing it, but this will give you what you want. We assume headers in both "source" and "destination" sheets. You will need to adapt the "Const" declarations according to how your sheets are named. Press Control & G in Excel to bring up the VBA window and copy and paste this code into "This Workbook" under the "VBA Project" group, then select "Run" from the menu:
Option Explicit
Private Const sourceSheet = "Source"
Private Const destSheet = "Destination"
Public Sub FindColumns()
Dim rowCount As Long
Dim foundValue As String
Sheets(destSheet).Select
rowCount = 1 'Assume a header row
Do While Range("J" & rowCount + 1).value <> ""
rowCount = rowCount + 1
foundValue = FncFindText(Range("J" & rowCount).value)
Sheets(destSheet).Select
Range("K" & rowCount).value = foundValue
Loop
End Sub
Private Function FncFindText(value As String) As String
Dim rowLoop As Long
Dim colLoop As Integer
Dim found As Boolean
Dim pos As Long
Sheets(sourceSheet).Select
rowLoop = 1
colLoop = 0
Do While Range(alphaCon(colLoop + 1) & rowLoop + 1).value <> "" And found = False
rowLoop = rowLoop + 1
Do While Range(alphaCon(colLoop + 1) & rowLoop).value <> "" And found = False
colLoop = colLoop + 1
pos = InStr(Range(alphaCon(colLoop) & rowLoop).value, value)
If pos > 0 Then
FncFindText = Mid(Range(alphaCon(colLoop) & rowLoop).value, pos, Len(value))
found = True
End If
Loop
colLoop = 0
Loop
End Function
Private Function alphaCon(aNumber As Integer) As String
Dim letterArray As String
Dim iterations As Integer
letterArray = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
If aNumber <= 26 Then
alphaCon = (Mid$(letterArray, aNumber, 1))
Else
If aNumber Mod 26 = 0 Then
iterations = Int(aNumber / 26)
alphaCon = (Mid$(letterArray, iterations - 1, 1)) & (Mid$(letterArray, 26, 1))
Else
'we deliberately round down using 'Int' as anything with decimal places is not a full iteration.
iterations = Int(aNumber / 26)
alphaCon = (Mid$(letterArray, iterations, 1)) & (Mid$(letterArray, (aNumber - (26 * iterations)), 1))
End If
End If
End Function

Pass from multidimensional array to spreadsheet

I have a multidimensional array which of the form:
where each row represents a sheet and each column represents a type of variable.
What I want to do is export the data into a spreadsheet where each row gets stored in a sheet with its columns. (Maybe it could be easier separate the sim multidimensional array into five matrices and then export those together to a spreadsheet) but I don't know if it's the most efficient way.
Here is a image of the spreadsheet:
Below is my code, which basically reads the sheets from a Excel and make a simulation of 1,000 cases for each type of data:
sim={};
for k = 1 : 5
sheet = xlsread('CB.xlsx', k);
[m, n] = size(sheet)
for i = 1 : n
[f,x] = ecdf(sheet(:,[i]));
[f, dup] = unique(f);
x = x(dup);
randomValues = rand(1, 1000);
sim{k,i} = round(interp1(f,x, randomValues));
end
end
trying to use Mikhail_Sam'answer I got the follow problems:
`sheet = 1
for i = 1:5
for j = 1:5
xlswrite('filename.xls',sim(:,i,j),sheet,strcat(char(64+j),int2str(1)))
end
sheet = sheet+1;
end`
but it says "Index exceeds matrix dimensions", then I tried this code
sheet = 1
for i = 1:5
for j = 1:size(sheet)
xlswrite('filename2.xls',sim{i,j},sheet,'A1:E1000');
end
sheet = sheet+1;
end
This code write the file ith the 5 sheets but it is filled with the first 5 numbers of each row of the multidimensional array. Then I tried to apply num2cell and run your code but it gives "index exceeds matrix dimension" again
I don't know about 'the most efficient way', but I did it this way:
for example I create 3x3x3 matrix and try to write it to excel:
sheet = 1
for i = 1:3
for j = 1:3
xlswrite(filename,x(:,i,j),sheet,strcat(char(64+j),int2str(1)));
end
sheet = sheet+1;
end
And I got what you want - 3 sheets, and data in rows.
replace 3 at your dimensions. But I have one weak place - I don't remember numeric mechanism in multidimensional arrays - is i is a row in your example, or j... Anyway it's work just take a look at data and replace i - j - : if necessary.
Hope, it helps :)
Finally I figure out how pass from multidimensional array to each spreadsheet. I decided to arrange the data in the same multidimensional array to then pass the data to a spreadshe, using A{k}=transpose(vertcat(simulado{k,:})) and then apply a for loop to add the data to each sheet separately.
sheet = 1
for i = 1:5
xlswrite('filename.xls',A{1,i},sheet);
sheet = sheet+1;
end`

Count number of occurences of a string and relabel

I have a n x 1 cell that contains something like this:
chair
chair
chair
chair
table
table
table
table
bike
bike
bike
bike
pen
pen
pen
pen
chair
chair
chair
chair
table
table
etc.
I would like to rename these elements so they will reflect the number of occurrences up to that point. The output should look like this:
chair_1
chair_2
chair_3
chair_4
table_1
table_2
table_3
table_4
bike_1
bike_2
bike_3
bike_4
pen_1
pen_2
pen_3
pen_4
chair_5
chair_6
chair_7
chair_8
table_5
table_6
etc.
Please note that the dash (_) is necessary Could anyone help? Thank you.
Interesting problem! This is the procedure that I would try:
Use unique - the third output parameter in particular to assign each string in your cell array to a unique ID.
Initialize an empty array, then create a for loop that goes through each unique string - given by the first output of unique - and creates a numerical sequence from 1 up to as many times as we have encountered this string. Place this numerical sequence in the corresponding positions where we have found each string.
Use strcat to attach each element in the array created in Step #2 to each cell array element in your problem.
Step #1
Assuming that your cell array is defined as a bunch of strings stored in A, we would call unique this way:
[names, ~, ids] = unique(A, 'stable');
The 'stable' is important as the IDs that get assigned to each unique string are done without re-ordering the elements in alphabetical order, which is important to get the job done. names will store the unique names found in your array A while ids would contain unique IDs for each string that is encountered. For your example, this is what names and ids would be:
names =
'chair'
'table'
'bike'
'pen'
ids =
1
1
1
1
2
2
2
2
3
3
3
3
4
4
4
4
1
1
1
1
2
2
names is actually not needed in this algorithm. However, I have shown it here so you can see how unique works. Also, ids is very useful because it assigns a unique ID for each string that is encountered. As such, chair gets assigned the ID 1, followed by table getting assigned the ID of 2, etc. These IDs will be important because we will use these IDs to find the exact locations of where each unique string is located so that we can assign those linear numerical ranges that you desire. These locations will get stored in an array computed in the next step.
Step #2
Let's pre-allocate this array for efficiency. Let's call it loc. Then, your code would look something like this:
loc = zeros(numel(A), 1);
for idx = 1 : numel(names)
id = find(ids == idx);
loc(id) = 1 : numel(id);
end
As such, for each unique name we find, we look for every location in the ids array that matches this particular name found. find will help us find those locations in ids that match a particular name. Once we find these locations, we simply assign an increasing linear sequence from 1 up to as many names as we have found to these locations in loc. The output of loc in your example would be:
loc =
1
2
3
4
1
2
3
4
1
2
3
4
1
2
3
4
5
6
7
8
5
6
Notice that this corresponds with the numerical sequence (the right most part of each string) of your desired output.
Step #3
Now all we have to do is piece loc together with each string in our cell array. We would thus do it like so:
out = strcat(A, '_', num2str(loc));
What this does is that it takes each element in A, concatenates a _ character and then attaches the corresponding numbers to the end of each element in A. Because we want to output strings, you need to convert the numbers stored in loc into strings. To do this, you must use num2str to convert each number in loc into their corresponding string equivalents. Once you find these, you would concatenate each number in loc with each element in A (with the _ character of course). The output is stored in out, and we thus get:
out =
'chair_1'
'chair_2'
'chair_3'
'chair_4'
'table_1'
'table_2'
'table_3'
'table_4'
'bike_1'
'bike_2'
'bike_3'
'bike_4'
'pen_1'
'pen_2'
'pen_3'
'pen_4'
'chair_5'
'chair_6'
'chair_7'
'chair_8'
'table_5'
'table_6'
For your copying and pasting pleasure, this is the full code. Be advised that I've nulled out the first output of unique as we don't need it for your desired output:
[~, ~, ids] = unique(A, 'stable');
loc = zeros(numel(A), 1);
for idx = 1 : numel(names)
id = find(ids == idx);
loc(id) = 1 : numel(id);
end
out = strcat(A, '_', num2str(loc));
If you want an alternative to unique, you can work with a hash table, which in Matlab would entail to using the containers.Map object. You can then store the occurrences of each individual label and create the new labels on the go, like in the code below.
data={'table','table','chair','bike','bike','bike'};
map=containers.Map(data,zeros(numel(data),1)); % labels=keys, counts=values (zeroed)
new_data=data; % initialize matrix that will have outputs
for ii=1:numel(data)
map(data{ii}) = map(data{ii})+1; % increment counts of current labels
new_data{ii} = sprintf('%s_%d',data{ii},map(data{ii})); % format outputs
end
This is similar to rayryeng's answer but replaces the for loop by bsxfun. After the strings have been reduced to unique labels (line 1 of code below), bsxfun is applied to create a matrix of pairwise comparisons between all (possibly repeated) labels. Keeping only the lower "half" of that matrix and summing along rows gives how many times each label has previously appeared (line 2). Finally, this is appended to each original string (line 3).
Let your cell array of strings be denoted as c.
[~, ~, labels] = unique(c); %// transform each string into a unique label
s = sum(tril(bsxfun(#eq, labels, labels.')), 2); %'// accumulated occurrence number
result = strcat(c, '_', num2str(x)); %// build result
Alternatively, the second line could be replaced by the more memory-efficient
n = numel(labels);
M = cumsum(full(sparse(1:n, labels, 1)));
s = M((1:n).' + (labels-1)*n);
I'll give you a psuedocode, try it yourself, post the code if it doesn't work
Initiate a counter to 1
Iterate over the cell
If counter > 1 check with previous value if the string is same
then increment counter
else
No- reset counter to 1
end
sprintf the string value + counter into a new array
Hope this helps!

Excel VBA - Referring between ranges

Here's my problem:
I have two ranges, r_products and r_ptypes which are from two different sheets, but of same length i.e.
Set r_products = Worksheets("Products").Range("A2:A999")
Set r_ptypes = Worksheets("SKUs").Range("B2:B999")
I'm searching for something in r_products and I've to select the value at the same position in r_ptypes. The result of Find method is being stored in cellfound. Now, consider the following data:
Sheet: Products
A B C D
1 Product
2 S1
3 P1
4 P2
5 S2
6 S3
Sheet: SKUs
A B C D
1 SKU
2 S1-RP003
3 P1-BQ900
4 P2-HE300
5 S2-NB280
6 S3-JN934
Now, when I search for S1, cellfound.Row gives me value 2, which is, as I understand, 2nd row in the total worksheet, but is actually 1st row in the range(A2:A999).
When I use this cellfound.Row value to refer to r_ptypes.cells(cellfound.Row), It is taking it as an Index value and returns B3 (P1-BQ900) instead of what I want, i.e. B2 (S1-RP003).
My question is how'll I find out the index number in cellfound? If not possible, how can I use Row number to extract data from r_ptypes?
Dante's solution above works fine. Also, I managed to get the index value using built in excel function Match instead of using Find method of a range. Listing it here for reference.
indexval = Application.WorksheetFunction.Match("searchvalue", r_products, 0)
Using the above, I'm now able to refer the rows in r_ptypes
skuvalue = r_ptypes.Rows(indexval).Value
Because .Row always returns the absolute row number of a sheet, not the offset (i.e. index) in the range.
So, just do some minus job to deal with it.
For you example,
r_ptypes.Cells(cellfound.Row - r_ptypes.Cells(1).Row + 1)
or a little bit neat (?)
With r_ptypes
.Cells(cellfound.Row - .Cells(1).Row + 1)
End With
That is, get the row difference between cellfound and the first cell and + 1 because Excel counts cells from 1.

Resources