I currently have an input file in the format shown below.
The expected output should
How can I achieve this in Excel?
Here's a VBA solution to your problem. It can handle any number of columns and any number of rows. It would require modification if your columns aren't blank below the significant data shown in your sample.
Sub ColumnsToList()
    ' Lists every non-blank cell below row 1 as a header/value pair in a single column
    Dim Arr As Variant          ' data array
    Dim Fun As Variant          ' output array (n rows x 1 column)
    Dim C As Long               ' loop counter: columns
    Dim R As Long               ' loop counter: rows
    Dim i As Long               ' Fun index (rows written so far)

    With ActiveSheet.UsedRange
        Arr = .Value
        ' A 2-D output array can be written to the sheet directly, which avoids
        ' Application.Transpose (limited to 65,536 elements in some Excel versions)
        ReDim Fun(1 To .Cells.Count * 2, 1 To 1)
    End With

    For C = 1 To UBound(Arr, 2)
        For R = 2 To UBound(Arr)
            If Len(Arr(R, C)) Then
                i = i + 2
                Fun(i - 1, 1) = Arr(1, C)   ' column header
                Fun(i, 1) = Arr(R, C)       ' data item
            End If
        Next R
    Next C

    If i Then
        With ActiveSheet
            ' this specifies the first empty column on the source sheet for output
            ' specify another cell like this:-
            ' Worksheets("Sheet1").Cells(1, "A").Resize(i).Value = Fun
            ' Only the first i rows of Fun are written to the sheet
            .Cells(1, .UsedRange.Columns.Count + 1).Resize(i).Value = Fun
        End With
    End If
End Sub
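To check the logic outside Excel, here is a minimal Python sketch of the same column-by-column walk. It assumes the sheet has already been read into a list of rows, with the headers in row 0 and blank cells as "" or None; the function name `columns_to_list` and the sample values other than PIN1 and main1.txt are hypothetical.

```python
def columns_to_list(grid):
    """Interleave each column header with that column's non-blank data cells.

    grid: list of rows, row 0 holding the headers (like Arr in the VBA code).
    Blank cells are assumed to be "" or None.
    """
    out = []
    for c, header in enumerate(grid[0]):          # For C = 1 To UBound(Arr, 2)
        for row in grid[1:]:                      # For R = 2 To UBound(Arr)
            value = row[c] if c < len(row) else None
            if value not in ("", None):           # If Len(Arr(R, C)) Then
                out.append(header)                # header goes first
                out.append(value)                 # then the data item
    return out


sample = [
    ["PIN1", "PIN2", "PIN3"],
    ["main1.txt", "a.txt", ""],
    ["main2.txt", "", "c.txt"],
]
print(columns_to_list(sample))
# ['PIN1', 'main1.txt', 'PIN1', 'main2.txt', 'PIN2', 'a.txt', 'PIN3', 'c.txt']
```

As in the VBA, columns form the outer loop, so all of one column's items are listed before moving on to the next header.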
Edit, 30 Dec 2020: In response to the claim that the above code returns #N/A errors from column 1750 onward, I used the code below to create a set of data that I believe might be similar to your actual data.
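Private Sub CreateData()
    ' Builds test data: 5000 columns headed "PIN 1" ... "PIN 5000",
    ' each holding 1 to 4 data cells that contain their own address
    Dim C As Long
    Dim R As Long

    Application.ScreenUpdating = False
    For C = 1 To 5000
        Cells(1, C).Value = "PIN " & C
        ' Int(4 * Rnd) + 2 is a random last row between 2 and 5
        For R = 2 To Int(4 * Rnd) + 2
            With Cells(R, C)
                .Value = .Address(0, 0)
            End With
        Next R
    Next C
    Application.ScreenUpdating = True
End Sub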
I then ran the ColumnsToList procedure above on the data thus created. I was amazed at the speed: more than 25000 rows were produced instantly and without any errors.
If you have Office 365, you can try the formula below. Otherwise, VBA is the better bet.
=IFERROR(FILTERXML("<t><d>"&TEXTJOIN("",TRUE,IF($A$2:$A$6<>"",$A$1&"</d><d>"&$A$2:$A$6&"</d><d>",""),IF($B$2:$B$6<>"",$B$1&"</d><d>"&$B$2:$B$6&"</d><d>",""),IF($C$2:$C$6<>"",$C$1&"</d><d>"&$C$2:$C$6&"</d><d>",""),IF($D$2:$D$6<>"",$D$1&"</d><d>"&$D$2:$D$6&"</d><d>",""))&"</d></t>","//d["&ROWS($A$1:$A1)&"]"),"")
The first part of this formula is an IF for each column, which builds a concatenated string of the form PIN1</d><d>main1.txt</d><d> for every non-blank cell:
IF($A$2:$A$6<>"",$A$1&"</d><d>"&$A$2:$A$6&"</d><d>","")
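The four column strings are then joined by TEXTJOIN (which skips the empty strings) into valid XML, and FILTERXML extracts the n-th <d> node, where n = ROWS($A$1:$A1) grows by one each time the formula is filled down. IFERROR hides the error returned once n runs past the last node. This may seem a little complicated at first, but it is fairly straightforward once you read the information in the link below: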
Excel - Extract substring(s) from string using FILTERXML
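To see the TEXTJOIN/FILTERXML idea in isolation, here is a minimal Python sketch in which `xml.etree.ElementTree` stands in for FILTERXML and its 1-based `d[n]` position predicate plays the role of `//d[n]`. The function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_xml(headers, columns):
    """TEXTJOIN step: one "header</d><d>value</d><d>" chunk per non-blank cell."""
    chunks = [
        f"{h}</d><d>{v}</d><d>"
        for h, col in zip(headers, columns)
        for v in col
        if v != ""              # the IF(...<>"", ..., "") test; TEXTJOIN skips ""
    ]
    return "<t><d>" + "".join(chunks) + "</d></t>"


def filterxml_nth(xml, n):
    """FILTERXML(xml, "//d[n]") wrapped in IFERROR(..., "")."""
    node = ET.fromstring(xml).find(f"d[{n}]")   # 1-based position predicate
    return node.text if node is not None and node.text else ""


headers = ["PIN1", "PIN2"]
columns = [["main1.txt", "main2.txt", ""], ["", "b.txt", ""]]
xml = build_xml(headers, columns)
# Filling the formula down corresponds to n = 1, 2, 3, ...
print([filterxml_nth(xml, n) for n in range(1, 8)])
# ['PIN1', 'main1.txt', 'PIN1', 'main2.txt', 'PIN2', 'b.txt', '']
```

Note that, as with FILTERXML, a cell value containing characters such as & or < would make the string invalid XML, so the data would need escaping first.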
Here is a Power Query solution (available in Excel 2010+).
It should adapt to any changes in the number of rows or columns.
It should also ignore blank entries in the table.
See the comments in the code for the algorithm, and explore the Applied Steps window to see what happens at each step.
To open the PQ editor in later versions of Excel:
select some cell in the data table
Data => Get & Transform => From Table/Range
Be sure to change the table name in the Source line (marked by a comment in the code) to match the real name in your workbook.
You can then paste the code into the Advanced Editor, accessible from the Home or Query tab of the UI.
M Code
let
//Change table name in next line to match the REAL table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//Unpivot all the columns to generate a two column table
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {}, "Attribute", "Value"),
//Sort by Attribute (column Header), then by Value (column data)
//May need to create a Custom Sort for the data if it does not sort readily into what you want
#"Sorted Rows" = Table.Sort(#"Unpivoted Columns",{{"Attribute", Order.Ascending}, {"Value", Order.Ascending}}),
//combine the two columns into an alternating List of Header/Data
zipList = List.Zip({Table.Column(#"Sorted Rows","Attribute"),
Table.Column(#"Sorted Rows","Value")}),
#"Converted to Table" = Table.FromList(zipList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
//Expand the list into new rows
#"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1")
in
#"Expanded Column1"
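The query's steps (unpivot, sort, zip, expand) can be mirrored in a few lines of Python. This is only a sketch of the idea, assuming the table is given as a dict of header to cell values with blank cells as None (standing in for Power Query's null); the function name is hypothetical.

```python
from itertools import chain


def unpivot_sort_interleave(table):
    """Mimic the M query on a table given as {header: [cell values]}."""
    # Table.UnpivotOtherColumns: one (Attribute, Value) pair per non-null cell
    pairs = [(attr, v) for attr, col in table.items() for v in col if v is not None]
    # Table.Sort: by Attribute, then by Value (plain text order)
    pairs.sort()
    # List.Zip + Table.FromList + Table.ExpandListColumn: each pair becomes two rows
    return list(chain.from_iterable(pairs))


table = {"PIN2": ["b.txt", None], "PIN1": ["main2.txt", "main1.txt"]}
print(unpivot_sort_interleave(table))
# ['PIN1', 'main1.txt', 'PIN1', 'main2.txt', 'PIN2', 'b.txt']
```

Plain text sorting puts "PIN10" before "PIN2", which is the situation the comment about a custom sort in the M code refers to.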
This is a very fiddly problem to do in Excel efficiently... but I have an inefficient solution for you.
If you know the range you're extracting from, this will provide you with the right answer:
=LET(
InputRange, Sheet1!A1:D7,
InputColumnCount, COLUMNS(InputRange),
CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
InputRowCount, ROWS(InputRange)-1,
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
RowIndexSkip, MOD(RowSequence,2),
RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,
UnfilteredResult,
IF(
RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)
So, what is this doing? I'm using a LET function to make it a bit more readable.
The first steps are to take in the input range, count its columns (InputColumnCount) and its data rows excluding the header (InputRowCount), and build a grid of 1s and 0s marking the non-blank cells (CheckIsBlank_FullRange):
=LET(
InputRange, Sheet1!A1:D7,
InputColumnCount, COLUMNS(InputRange),
CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
InputRowCount, ROWS(InputRange)-1,
The next step is to build some helper arrays. Excel now supports dynamic arrays, so we can generate sequences that help us pick out the right data.
I use a lot of MOD functions. MOD returns the remainder after division, so if you take the MOD of a SEQUENCE that starts at 0, the result is a list of numbers that rises from 0 to one less than the divisor, then goes back to 0 and rises again. So MOD(SEQUENCE(Rows * Columns,, 0), Rows) rises from 0 to Rows - 1, Columns times.
That's what these are for:
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0)
RowIndexSkip, MOD(RowSequence,2)
RowSequenceSkip, (RowSequence-RowIndexSkip)/2
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1
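RowIndexSkip is a list of alternating 0s and 1s that we use below to track whether we're inserting a header (0) or a data item (1). RowSequenceSkip repeats each number twice (0, 0, 1, 1, 2, 2, ...), so every cell is visited once for its header and once for its value. RowIndex cycles through the data rows 2 to InputRowCount + 1, and ColumnIndex moves to the next column every InputRowCount cells.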
And finally, bringing it all together:
UnfilteredResult,
IF(
RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
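UnfilteredResult lists every data item, interleaved with its header, as you requested. However, we still have to apply a filter, because your original data shows that there can be blank cells. FILTER keeps a header/data pair only where CheckIsBlank_FullRange is 1 for that cell, so both entries of a pair are kept or dropped together.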
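The index arithmetic can be traced step by step with a small Python emulation of the first LET formula. It is a sketch under the assumption that the range has been read into a list of rows (row 0 = headers, "" = blank); `let_formula` and the sample values are hypothetical, and the `- 1` offsets only translate Excel's 1-based INDEX into Python's 0-based lists.

```python
def let_formula(grid):
    """Emulate the LET formula on grid (list of rows, row 0 = headers, "" = blank)."""
    input_row_count = len(grid) - 1            # ROWS(InputRange)-1
    input_column_count = len(grid[0])          # COLUMNS(InputRange)
    result = []
    for row_sequence in range(input_row_count * input_column_count * 2):  # SEQUENCE(...,,0)
        row_index_skip = row_sequence % 2                                  # 0 = header, 1 = data
        row_sequence_skip = (row_sequence - row_index_skip) // 2           # 0,0,1,1,2,2,...
        row_index = row_sequence_skip % input_row_count + 2                # 2..InputRowCount+1
        column_index = (row_sequence_skip - row_sequence_skip % input_row_count) // input_row_count + 1
        cell = grid[row_index - 1][column_index - 1]   # INDEX(InputRange,RowIndex,ColumnIndex)
        if cell == "":                                 # FILTER on CheckIsBlank_FullRange
            continue
        # IF(RowIndexSkip=0, header, data item)
        result.append(grid[0][column_index - 1] if row_index_skip == 0 else cell)
    return result


grid = [
    ["PIN1", "PIN2"],
    ["main1.txt", ""],
    ["main2.txt", "b.txt"],
]
print(let_formula(grid))
# ['PIN1', 'main1.txt', 'PIN1', 'main2.txt', 'PIN2', 'b.txt']
```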
If, on the other hand, you know the columns but not how many rows there are, this should do the trick:
=LET(
InputRange, Sheet1!A:D,
InputColumnCount, COLUMNS(InputRange),
CheckIsBlank_FullRange, 1-ISBLANK(InputRange),
CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(COLUMNS(InputRange),,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(InputRange)),CheckIsBlank_ByRow>0))-1,
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
RowIndexSkip, MOD(RowSequence,2),
RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,
UnfilteredResult,
IF(RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)
This time, you can see that I've slightly changed how it finds the final row.
CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(COLUMNS(InputRange),,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(InputRange)),CheckIsBlank_ByRow>0))-1,
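MMULT multiplies the 1/0 grid by a column of ones, which gives the number of non-blank cells in each row; the last row with a non-zero count is the last row of data. That MMULT isn't very efficient, but it ensures that the formula won't be thrown off by blank cells or by columns with different amounts of data.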
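The row-count trick can be sketched in Python with plain lists standing in for the arrays. This assumes the whole-column range has been read into rows with "" for blank cells; the function name and sample data are hypothetical.

```python
def last_used_row(grid):
    """Return the 1-based number of the last row with any non-blank cell."""
    ones = [1] * len(grid[0])                                            # SEQUENCE(COLUMNS(InputRange),,1,0)
    flags = [[0 if cell == "" else 1 for cell in row] for row in grid]   # 1-ISBLANK(InputRange)
    by_row = [sum(f * o for f, o in zip(row, ones)) for row in flags]    # MMULT(flags, ones)
    # MAX(FILTER(SEQUENCE(ROWS(InputRange)), CheckIsBlank_ByRow>0))
    return max(i for i, count in enumerate(by_row, start=1) if count > 0)


grid = [
    ["PIN1", "PIN2"],
    ["a", ""],
    ["", "b"],
    ["", ""],
    ["", ""],
]
print(last_used_row(grid) - 1)   # InputRowCount, header excluded: 2
```

Because every column contributes to the row sum, a column that ends earlier than the others cannot cut the range short.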
And finally, if you don't know how many columns OR rows there are, but only know the name of the sheet, this is how I would do it:
=LET(
InputSheet, "Sheet1",
InputColumnCount, XMATCH(FALSE,ISBLANK(INDIRECT(InputSheet&"!1:1")),0,-1),
Column1Counter, MOD(InputColumnCount-1,26)+1,
Column2Counter, (MOD((InputColumnCount-27-MOD(InputColumnCount-27,26))/26,26)+1)*(InputColumnCount>26),
Column3Counter, (((InputColumnCount-Column1Counter)/26)-Column2Counter)/26,
LastColumnLetters,
IF(Column3Counter=0,"",CHAR(Column3Counter+64))&
IF(Column2Counter=0,"",CHAR(Column2Counter+64))&
CHAR(Column1Counter+64),
LongRange, INDIRECT(InputSheet&"!A:"&LastColumnLetters),
CheckIsBlank_FullRange, 1-ISBLANK(LongRange),
CheckIsBlank_ByRow, MMULT(CheckIsBlank_FullRange,SEQUENCE(InputColumnCount,,1,0)),
InputRowCount, MAX(FILTER(SEQUENCE(ROWS(LongRange)),CheckIsBlank_ByRow>0)),
InputRange, INDIRECT(InputSheet&"!A1:"&LastColumnLetters&InputRowCount),
RowSequence, SEQUENCE(InputRowCount*InputColumnCount*2,,0),
RowIndexSkip, MOD(RowSequence,2),
RowSequenceSkip, (RowSequence-RowIndexSkip)/2,
RowIndex, MOD(RowSequenceSkip,InputRowCount)+2,
ColumnIndex, (RowSequenceSkip-MOD(RowSequenceSkip,InputRowCount))/InputRowCount+1,
UnfilteredResult,
IF(RowIndexSkip=0,
INDEX(InputRange,,ColumnIndex),
INDEX(InputRange,RowIndex,ColumnIndex)
),
FILTER(UnfilteredResult,INDEX(CheckIsBlank_FullRange,RowIndex,ColumnIndex))
)
This formula is even slower and more confusing, because there's so much to be done, but it will check that it has every column. It does so with this:
InputColumnCount, XMATCH(FALSE,ISBLANK(INDIRECT(InputSheet&"!1:1")),0,-1),
This searches the first row of the target worksheet for the last non-blank cell, i.e. the last column. The final argument of XMATCH, -1, sets the search mode to last-to-first, which tells Excel to start looking from the right/bottom and work left/up.
Then I convert that column number into column letters (one, two or three letters) so that the range can be addressed with INDIRECT:
Column1Counter, MOD(InputColumnCount-1,26)+1,
Column2Counter, (MOD((InputColumnCount-27-MOD(InputColumnCount-27,26))/26,26)+1)*(InputColumnCount>26),
Column3Counter, (((InputColumnCount-Column1Counter)/26)-Column2Counter)/26,
LastColumnLetters,
IF(Column3Counter=0,"",CHAR(Column3Counter+64))&
IF(Column2Counter=0,"",CHAR(Column2Counter+64))&
CHAR(Column1Counter+64),
You could use INDEX to generate this range instead, but it would slow things down quite a bit, because Excel would essentially load all 1048576 * ColumnCount cells into memory (and this formula already does that once, so there's no need to do it twice).
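The Column1/2/3Counter arithmetic is easy to get wrong, so here is a Python port of it checked against a conventional bijective base-26 conversion for every column up to XFD (16384). Both function names are hypothetical; Python's % returns a non-negative result for a positive divisor, matching Excel's MOD.

```python
def column_letters(n):
    """Port of Column1/2/3Counter and LastColumnLetters (valid for 1 <= n <= 16384)."""
    c1 = (n - 1) % 26 + 1                                          # last letter
    c2 = (((n - 27 - (n - 27) % 26) // 26) % 26 + 1) * (n > 26)    # middle letter (0 = none)
    c3 = ((n - c1) // 26 - c2) // 26                               # first of three letters (0 = none)
    return (chr(c3 + 64) if c3 else "") + (chr(c2 + 64) if c2 else "") + chr(c1 + 64)


def reference_letters(n):
    """Conventional bijective base-26 conversion, used only as a cross-check."""
    letters = ""
    while n:
        n, r = divmod(n - 1, 26)
        letters = chr(r + 65) + letters
    return letters


assert all(column_letters(n) == reference_letters(n) for n in range(1, 16385))
print(column_letters(1), column_letters(27), column_letters(703), column_letters(16384))
# A AA AAA XFD
```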
(Note: This is my first question. Please let me know if I can improve how I ask or explain)
I have created a module that can look through a pivot table on any Excel worksheet and apply conditional formatting to each column to show quartile performance.
The pivot table can be any size and have different columns at any point, so I can't explicitly reference a particular column name, caption, etc.
Additionally, the quartile reporting identifies the top 25% for a particular figure, the next 25%, the next 25% and the bottom 25%.
For some figures high is better; for others, low is better.
So I list the names of all the fields that are ranked high-to-low in an array, then run a quick "if name in array, rank this way; otherwise, rank that way" check.
All of this works like a dream, until we come to certain fields that produce an ambiguous-name error. It seems that some caption names are similar to the source or database names.
The code is below.
Any ideas how I can refer dynamically to the column name and identify the column beneath it, please?
Dim myColumnNames As Variant
myColumnNames = VBA.Array("CONTRACTGROSSVOLUME", _
"MigrationVolume") 'etc
' Set colour choices for quartiles
Dim myQuartile1 As Long
Dim myQuartile2 As Long
Dim myQuartile3 As Long
Dim myQuartile4 As Long
myQuartile1 = RGB(146, 208, 80) 'Top Quartile - Green
myQuartile2 = RGB(255, 255, 0) '2nd Quartile - yellow
myQuartile3 = RGB(255, 192, 0) '3rd Quartile - orange
myQuartile4 = RGB(255, 0, 0) 'Bottom Quartile - red
Dim myRankingFactor As Boolean
myRankingFactor = True 'true is low to high / false is high to low ranking
Dim myPivotTable As PivotTable
Dim myPivotTableName As String
For Each myPivotTable In ActiveSheet.PivotTables
myPivotTableName = myPivotTable.Name
Next
Dim myPivotField As PivotField
Dim myPivotSourceName As String
Set myPivotTable = ActiveSheet.PivotTables(myPivotTableName)
For Each myPivotField In myPivotTable.DataFields
'get the source name for the pivot field
myPivotSourceName = myPivotField.Name
'Check if column name is in our list to rank high-to-low
If Not IsError(Application.Match(myPivotSourceName, myColumnNames, False)) Then
'This column name is in our list of names that should be ranked high-to-low (ie. Higher is better)
myRankingFactor = True
Else
'This column name is not in our list and should be ranked low-to-high (ie. lower is better)
myRankingFactor = False
End If
The error then comes on the following line:
myPivotTable.PivotSelect (myPivotSourceName), xlDataOnly, True
I've tried referring to the column with .Caption, .Name, etc., to no avail.
Any ideas on what I need to do to dynamically get the column name, check whether it's in the array of names, and then refer to that entire column to apply my formatting, please?
Thanks
Additional info:
The value is passing (apparently) correctly.
The column name displayed is "My Volume" and the variable is displaying "My Volume" as the value.
Its source name is "MYVOLUME" (one word), which I've also tried referencing without success.
The error generated is:
Run-time error '1004': An item name is ambiguous. Another field in the
pivottable report may have an item with the same name. Use the syntax
field [item]. For example, if the item is Oranges and the field is
Product, you would use Product[Oranges].
In addition, I manually changed one of the column names to a unique, single word ("ABCDEFG") that is not present in the database, the object or anywhere in the data output, to see whether it would be picked up.
Changing that alias/caption value worked fine and didn't error.
Summary:
It's behaving as if the column name is already used elsewhere in the pivot, but it is not.
How do I explicitly refer to the column name/caption/label, but on the fly? :)
FIXED!
I ensured that the pivot table column name was passed as a string:
myPivotSourceName = myPivotField.Name
Then rather than referencing the data field with the pivot field object, I referenced the DataRange with the string:
myPivotTable.PivotFields(myPivotSourceName).DataRange.Select
This works perfectly and is completely portable to any PivotTable on any sheet, with any fields.
I could reproduce the error by having a data item with the same name as one of the data fields. For example, suppose you have the following table from which you create a pivot:
Product Price
Cola 123
Fanta 456
Sum of Price 789
If you create a pivot table from it, you will have these items: Cola, Fanta and 'Sum of Price', and these field labels: 'Row Labels' and 'Sum of Price'.
If you try to use 'Sum of Price' in the PivotTable.PivotSelect function, then the error message in the question will appear.
I think the Name parameter of PivotSelect is not clear enough, and I could not find documentation of the naming convention it uses, so I recommend referring to the data field explicitly:
myPivotTable.PivotFields("Sum of Price").DataRange.Select
Note: There are many stylistic errors and superfluous parts in your code, e.g. the parentheses in the line that causes the error are not required, and the For Each loop over the pivot tables just selects the last one.
myPivotTable.PivotFields(myPivotSourceName).DataRange.Select
This references the name of the column as a string. Works on all pivot tables I've tested it on.
I'm working on a social survey project. Due to discrepancies in the data, I'm stuck at a certain point. The volunteers conducting the survey were given tablets with unique IDs. On different dates, the tablets were used in different cities.
Sheet 2 contains a list of thousands of responses for which the city names are missing, and Sheet 1 contains a list of the tablets in use in different cities on different dates.
Sheet 1
City DeviceID StartDate EndDate
Delhi 25 21-08-2014 26-08-2014
Mumbai 39 14-05-2014 21-05-2014
Chennai 91 17-11-2014 21-11-2014
Bangalore 91 11-10-2014 21-10-2014
Delhi 91 26-05-2015 29-05-2015
Hyderabad 25 23-05-2015 28-05-2015
Sheet 2
S.Id DeviceId SurveyDate City
203 91 15-10-2014 ?
204 25 24-08-2014 ?
I need to somehow fill up the values for the city column in Sheet 2.
I tried using VLOOKUP but, being a beginner with Excel, was unable to get it working. I did manage to convert the text in the date columns to dates.
But I am unsure how to take this further.
From my understanding, VLOOKUP requires the date ranges to be continuous, with no missing values in between. That is not the case here: this is real-world data and hence imperfect.
What would be the right approach to this problem? Can this be done with Excel macros?
I also read up a bit on nested IF statements, but as a beginner to Excel formulas and data manipulation I am confused.
There are two ways to do what you want.
The first is to use VBA and create a macro to do the job, BUT you will have to iterate through your data many times (n1*n2 loops in the worst case, where n1 and n2 are the numbers of rows in the two tables), which is really slow if you have a lot of data.
The other way is a little more complicated and uses array formulas, but it is much faster than VBA because it relies on Excel's built-in functions, which are already optimized.
So I will use a much simpler example, which you can adapt to your own data.
I have the following tables:
Table1
city ID start end
A 1 3 5
B 3 4 6
C 3 5 8
Table 2
ID point city
3 5 ?
We want a formula that completes the second table, where the ID matches exactly and point lies between start and end. Assuming Table1 occupies A1:D4 and Table 2's ID and point are in G2 and H2, we can do this with INDEX and MATCH.
Here it is:
=INDEX(A$2:A$4;MATCH(1;(B$2:B$4=G2)*(C$2:C$4<=H2)*(D$2:D$4>=H2);0))
First, after typing the formula, don't press Enter; press Ctrl+Shift+Enter instead to tell Excel to evaluate it as an array formula, otherwise it will not work. (In versions of Excel with dynamic arrays, such as Microsoft 365, Enter alone is enough.)
Now that we've got that out of the way, let me explain what is going on:
The MATCH does the following:
it looks for an exact match of the value 1 in the array built from the three conditions. But how is that array created? Let's take the first condition as an example:
This B$2:B$4=G2 -gives-> {1;3;3}=3 --> {FALSE;TRUE;TRUE}
Similarly, the second condition C$2:C$4<=H2 gives {3;4;5}<=5 --> {TRUE;TRUE;TRUE}, and the third, D$2:D$4>=H2, gives {5;6;8}>=5 --> {TRUE;TRUE;TRUE}.
So now we have (keep in mind that * acts like a logical AND and turns TRUE/FALSE into 1/0):
{FALSE;TRUE;TRUE}*{TRUE;TRUE;TRUE}*{TRUE;TRUE;TRUE} --> {0;1;1}
So we get MATCH(1;{0;1;1};0) --> 2, because MATCH returns the position of the first 1. (Row 3 also matches here, since both ranges for ID 3 contain point 5; only the first match is returned.)
Finally, INDEX returns whatever is in row 2 of A$2:A$4, i.e. B.
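The multiply-the-conditions trick behind this INDEX/MATCH can be sketched in Python with the small example tables above. The function name is hypothetical, and None stands in for Excel's #N/A.

```python
def index_match_multi(cities, ids, starts, ends, lookup_id, point):
    """INDEX(cities, MATCH(1, (ids=lookup_id)*(starts<=point)*(ends>=point), 0))."""
    # Multiplying the three TRUE/FALSE arrays acts as an element-wise AND -> 1/0 flags
    flags = [int(i == lookup_id) * int(s <= point) * int(e >= point)
             for i, s, e in zip(ids, starts, ends)]
    if 1 not in flags:
        return None                        # Excel would return #N/A
    return cities[flags.index(1)]          # MATCH(1, flags, 0) finds the first 1


cities = ["A", "B", "C"]
ids = [1, 3, 3]
starts = [3, 4, 5]
ends = [5, 6, 8]
print(index_match_multi(cities, ids, starts, ends, 3, 5))   # B
```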
You can use the above on your own data to get the expected results.
Good luck!
If the deviceId values should match and the survey date should be between the start date and end date, VLookup won't suffice. The following pointers, however, should get you started:
1) Define the date ranges from which the date comparisons should be made.
2) Use an overlap date checking function to determine if the date in question overlaps the start and end dates.
3) Loop through the date ranges and insert in Sheet2 when a match is found, i.e. when the deviceId values match and the date overlaps.
The following function takes the date to be checked, the start date and the end date, and returns True if dateVal falls between the start and end dates (inclusive):
Function dateOverlap(dateVal As String, startDate As String, endDate As String) As Boolean
    ' True when startDate <= dateVal <= endDate.
    ' The date strings are converted using the system's regional date settings.
    If DateDiff("d", startDate, dateVal) >= 0 And DateDiff("d", endDate, dateVal) <= 0 Then _
        dateOverlap = True
End Function
Example usage
Debug.Print dateOverlap("05-10-2016", "01-10-2016", "10-10-2016")   ' prints True
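The same inclusive date check can be sketched in Python with the date format stated explicitly, so the result does not depend on regional settings. The dd-mm-yyyy format is an assumption based on the dates in the question, and the function name is hypothetical.

```python
from datetime import datetime


def date_overlap(date_val, start_date, end_date, fmt="%d-%m-%Y"):
    """Return True if date_val lies between start_date and end_date (inclusive)."""
    d, s, e = (datetime.strptime(x, fmt).date() for x in (date_val, start_date, end_date))
    return s <= d <= e


print(date_overlap("05-10-2016", "01-10-2016", "10-10-2016"))   # True
print(date_overlap("15-10-2014", "17-11-2014", "21-11-2014"))   # False
```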
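Here we use MEDIAN() as an easy way to test for "in-between": a date x lies between a start date a and an end date b (inclusive) exactly when MEDIAN(x, a, b) = x.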
Sub FillInTheBlanks()
    Dim s1 As Worksheet, s2 As Worksheet
    Dim N1 As Long, N2 As Long, i As Long, j As Long
    Dim rc As Long, DeId As Long, sDate As Date
    Dim wf As WorksheetFunction

    Set s1 = Sheets("Sheet1")   ' City, DeviceID, StartDate, EndDate in columns A:D
    Set s2 = Sheets("Sheet2")   ' S.Id, DeviceId, SurveyDate, City in columns A:D
    Set wf = Application.WorksheetFunction

    ' Last used row in column A of each sheet
    rc = Rows.Count
    N1 = s1.Cells(rc, "A").End(xlUp).Row
    N2 = s2.Cells(rc, "A").End(xlUp).Row

    For i = 2 To N2
        DeId = s2.Cells(i, "B").Value
        sDate = s2.Cells(i, "C").Value
        For j = 2 To N1
            If DeId = s1.Cells(j, 2).Value Then
                ' sDate is between StartDate and EndDate when it is the median of the three
                If sDate = wf.Median(sDate, s1.Cells(j, "C").Value, s1.Cells(j, "D").Value) Then
                    s2.Cells(i, "D").Value = s1.Cells(j, "A").Value
                End If
            End If
        Next j
    Next i
End Sub
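The nested loop with the MEDIAN test can be sketched in Python on the sample data from the question, with `statistics.median` standing in for MEDIAN. The function name and the tuple layout of the rows are assumptions for illustration.

```python
from datetime import date
from statistics import median


def fill_cities(sheet1, sheet2):
    """Look up the city for each Sheet2 survey.

    sheet1 rows: (city, device_id, start_date, end_date)
    sheet2 rows: (s_id, device_id, survey_date)
    Like the VBA loop, a later matching Sheet1 row overwrites an earlier one;
    None means no range matched.
    """
    cities = []
    for _s_id, device_id, survey_date in sheet2:
        city = None
        for name, dev, start, end in sheet1:
            # survey_date is "in between" exactly when it is the median of the three dates
            if dev == device_id and median([survey_date, start, end]) == survey_date:
                city = name
        cities.append(city)
    return cities


sheet1 = [
    ("Delhi", 25, date(2014, 8, 21), date(2014, 8, 26)),
    ("Mumbai", 39, date(2014, 5, 14), date(2014, 5, 21)),
    ("Chennai", 91, date(2014, 11, 17), date(2014, 11, 21)),
    ("Bangalore", 91, date(2014, 10, 11), date(2014, 10, 21)),
    ("Delhi", 91, date(2015, 5, 26), date(2015, 5, 29)),
    ("Hyderabad", 25, date(2015, 5, 23), date(2015, 5, 28)),
]
sheet2 = [
    (203, 91, date(2014, 10, 15)),
    (204, 25, date(2014, 8, 24)),
]
print(fill_cities(sheet1, sheet2))   # ['Bangalore', 'Delhi']
```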