Is there a way to generate comma-delimited values in Excel (optimally using a PivotTable)? Consider the following data:
Object Color
foo Red
foo Blue
bar Red
bar Blue
bar Green
baz Yellow
I'd like to get a table like the following:
Object Count of Color Colors
foo 2 Red,Blue
bar 3 Red,Blue,Green
baz 1 Yellow
Is this possible in Excel? The data is coming from a SQL query, so I could write a UDF with a recursive CTE to calculate, but this was for a single ad-hoc query, and I wanted a quick-and-dirty way to get the denormalized data. In the end, it's probably taken longer to post this than to write the UDF, but...
Here's a much simpler answer, adapted from this superuser answer (HT to @yioann for pointing it out and @F106dart for the original):
Assuming the data is in columns A (Category) and B (Value):
Create a new column (C), and name it "Values". Use this formula, starting in cell C2 and copying all the way down: =IF(A2=A1, C1&","&B2, B2)
Create a second new column (D), and name it "Count". Use this formula, starting in cell D2, and copying all the way down: =IF(A2=A1, D1+1, 1)
Create a third new column (E), and name it "Last Line?". Use this formula, starting in cell E2, and copying all of the way down: =A2<>A3
You can now hide column B (Value) and filter column E (Last Line?) for only the TRUE values.
In summary:
A B C D E
+--------- ----- ----------------------- ------------------- ----------
1| Category Value Values Count Last Line?
2| foo Red =IF(A2=A1,C1&","&B2,B2) =IF(A2=A1, D1+1, 1) =A2<>A3
3| foo Blue =IF(A3=A2,C2&","&B3,B3) =IF(A3=A2, D2+1, 1) =A3<>A4
etc.
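The three helper columns can be sanity-checked outside Excel. Here is a minimal Python sketch of the same running-concatenation idea. Like the formulas, it assumes the rows are already ordered so that equal categories sit on adjacent rows; the function name running_concat and the sample rows are illustrative, not part of the original answer.

```python
def running_concat(rows):
    """Mimic helper columns C (Values), D (Count) and E (Last Line?).

    rows: list of (category, value) pairs in which equal categories are
    adjacent (the same assumption the worksheet formulas make).
    """
    out = []
    prev_category, values, count = None, "", 0
    for i, (category, value) in enumerate(rows):
        if category == prev_category:
            values = values + "," + value   # C: =IF(A2=A1, C1&","&B2, B2)
            count += 1                      # D: =IF(A2=A1, D1+1, 1)
        else:
            values, count = value, 1        # first row of a new category
        # E: =A2<>A3, i.e. true when the next row starts another category
        is_last = i + 1 == len(rows) or rows[i + 1][0] != category
        out.append((category, values, count, is_last))
        prev_category = category
    return out


rows = [("foo", "Red"), ("foo", "Blue"),
        ("bar", "Red"), ("bar", "Blue"), ("bar", "Green"),
        ("baz", "Yellow")]

# Keep only the rows where "Last Line?" is true, as the column E filter does.
summary = [(c, n, v) for c, v, n, last in running_concat(rows) if last]
print(summary)
```

Filtering on the last flag plays the role of filtering column E for TRUE: only the final row of each category carries the complete list and count.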
Yes, you would be much better off using the tools of whatever RDBMS you're running (MS SQL, MySQL, etc.).
Such a pivot table is possible in Excel. But, only if you write a cumbersome VBA module -- which I don't recommend.
However, the task is simpler in MS Access -- which usually comes bundled with Excel. Microsoft makes it "easy" to link Access and Excel and to use the former to run queries on the latter.
So, given the spreadsheet cells as stated:
For best results, sort the table by Object and then by Color.
Make sure the spreadsheet is saved.
Open up MS Access.
Select File --> Open (Ctrl+O)
Under Files of type, select Microsoft Excel
Navigate to and choose your existing spreadsheet.
Choose the worksheet or named range that contains your table.
Give the linked table the name MyPivot.
Open the Visual Basic Editor... Tools --> Macro --> Visual Basic Editor (Alt+F11)
Insert a module and paste in this UDF:
'Concat returns a comma-separated list of items
Public Function Concat (CategoryCol As String, _
ItemCol As String) As String
Static LastCategory As String
Static ItemList As String
If CategoryCol = LastCategory Then
ItemList = ItemList & ", " & ItemCol
Else
LastCategory = CategoryCol
ItemList = ItemCol
End If
Concat = ItemList
End Function
Save the project and close the VB editor
Under Queries, Create a new query in design view.
Switch to the SQL View.
Paste in this SQL:
SELECT
Object,
COUNT (Color) AS [Count of Color],
LAST (Concat (Object, Color)) AS [List 'O Colors]
FROM
MyPivot
GROUP BY
Object
Run the query (Press the red exclamation mark or just select the Datasheet View).
Voilà, done in 15 easy steps! ;)
Results:
Object Count of Color List 'O Colors
bar 3 Blue, Green, Red
baz 1 Yellow
foo 2 Blue, Red
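For comparison, the same GROUP BY plus concatenation can be sketched in a few lines of Python. This is only an illustration of what the query computes, not a replacement for the Access steps; the function name group_concat is made up for this example, and the rows are sorted by Object and then Color as the first step recommends.

```python
from collections import defaultdict


def group_concat(rows, sep=", "):
    """Return {object: (count, joined colors)} for (object, color) pairs."""
    groups = defaultdict(list)
    for obj, color in rows:
        groups[obj].append(color)
    return {obj: (len(colors), sep.join(colors)) for obj, colors in groups.items()}


# Sample data from the question, sorted by Object and then by Color.
rows = sorted([("foo", "Red"), ("foo", "Blue"),
               ("bar", "Red"), ("bar", "Blue"), ("bar", "Green"),
               ("baz", "Yellow")])

result = group_concat(rows)
for obj, (count, colors) in result.items():
    print(obj, count, colors)
```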
An even easier way is to add the data to the data model when you create the pivot table and then use a "measure" (called "Colours") as follows:
=CONCATENATEX(Table1,[Color],", ")
Then add the "Colours" field to the values portion of the pivot.
I have these 2 tables:
In column B I'm trying to get the header name of one of the features that is not empty in Table B, selected at random. The order of the items in Table A can differ from the order of the items in Table B, so I'll need some sort of INDEX MATCH here too.
Excel Version: Office 365
Attempted Formula: I tried to base my formula on this:
=INDEX(datarange,RANDBETWEEN(1,COLUMNS(datarange)),1)
but there are more things to consider, like header name if the index match of the same fruit isn't empty, so I know it is more complex.
Any help will be greatly appreciated.
Assuming you have Excel 365 and a volatile result is acceptable:
=LET(
Fruits, Table_B[Fruit],
Properties, Table_B[[Red]:[Green]],
PropertiesHeaders, Table_B[[#Headers],[Red]:[Green]],
ThisFruit, [@Fruits],
ThisProperties, FILTER(Properties, Fruits = ThisFruit),
ThisPropertiesFiltered, FILTER(PropertiesHeaders, ThisProperties <> 0),
ThisPropertiesCount, COUNTA(ThisPropertiesFiltered),
IndexRand, RANDBETWEEN(1,ThisPropertiesCount),
IFERROR(INDEX(ThisPropertiesFiltered,IndexRand),"-")
)
ThisProperties is the row in Table_B for your fruit. I left out the column for the fruit names.
ThisPropertiesFiltered is the names of the properties that the fruit has. I filtered the header names based on if the fruit row had a non-zero value or not.
IndexRand gets a random number between 1 and the number of available properties. Note, if there are zero available properties, ThisPropertiesFiltered returns #CALC! so ThisPropertiesCount will return 1. This is handled later on.
Last we use INDEX to get the random property name. IFERROR returns "-" if no properties were available.
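The logic of the formula (take the fruit's row, keep the headers whose cells are non-empty, pick one at random, fall back to "-") can be sketched in Python as below. The function name random_feature and the three sample rows are hypothetical and only serve to exercise the logic; they are not the asker's actual Table_B.

```python
import random


def random_feature(fruit, table_b, headers, default="-"):
    """Pick a random header whose cell is non-empty in the fruit's row."""
    row = table_b.get(fruit)              # like FILTER(Properties, Fruits = ThisFruit)
    if row is None:
        return default
    available = [h for h, cell in zip(headers, row) if cell != ""]
    if not available:                     # the IFERROR(..., "-") fallback
        return default
    return random.choice(available)       # INDEX(..., RANDBETWEEN(1, count))


headers = ["Red", "Yellow", "Tropic", "Heavy", "Green"]
# Hypothetical rows, one cell per header, "x" meaning the feature applies.
table_b = {
    "Peach": ["x", "", "", "", ""],
    "Mango": ["", "x", "x", "", ""],
    "Kiwi":  ["", "", "", "", ""],
}

print(random_feature("Peach", table_b, headers))
print(random_feature("Mango", table_b, headers))
print(random_feature("Kiwi", table_b, headers))
```

Unlike the worksheet formula, this does not recalculate on its own; each call simply draws a new random choice.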
Here are the tables:
Table_A:
Fruits Result
Watermelon Heavy
Melon Green
Banana Tropic
Peach Red
Apple Green
Table_B:
Fruit
Red
Yellow
Tropic
Heavy
Green
Apple
x
x
Banana
x
x
Peach
x
Melon
x
Watermelon
x
x
Since you have access to dynamic arrays you could try:
Formula in B2:
=LET(X,FILTER(E$1:I$1,INDEX(E$2:I$6,MATCH(A2,D$2:D$6,0),0)<>"","No Feature"),INDEX(X,RANDBETWEEN(1,COUNTA(X))))
Or without LET():
=@SORT(SORT(CHOOSE({1;2;3},E$1:I$1,FILTER(E$2:I$6,D$2:D$6=A2),RANDARRAY(1,5)),3,1,1),2,-1,1)
If you are working with actual tables, this should spill down results under Random Feature automatically. However, if you are not using tables, you could nest the above in BYROW() if you are a 365 Insider:
=BYROW(A2:A6,LAMBDA(r,LET(X,FILTER(E$1:I$1,INDEX(E$2:I$6,MATCH(r,D$2:D$6,0),0)<>"","No Feature"),INDEX(X,RANDBETWEEN(1,COUNTA(X))))))
This would not work with the 2nd option where we used '@' to parse only the top-left value of our array (implicit intersection).
The idea is that:
A combination of INDEX() & MATCH() will 'slice' the row of interest out of the lookup-table based on our input.
In the 2nd step we'd use FILTER() to keep only those headers where the elements of the array returned by the first step are not empty. If all elements are empty, this function will return the value "No Feature" as a heads-up for the users.
In our final step we combine INDEX() with RANDBETWEEN(). The latter will return a random integer between a lower bound (1 in our case) and an upper bound, which we base on the number of returned elements.
I tried to visualize this below.
Very new to PQ, and I'm pretty sure it can do what I need in this situation, but I need help figuring out how to get there.
I have a timesheet report with 20 columns covering 50 rows that will need to be formatted to a word doc for uploading into a separate system. The original data in the cells range from 0 to any negative 2 digit number (ex: "-20"), but they need to be formatted to a seven-character set ending in ".00".
Examples:
0 will need to become "0000.00"
-4 will need to become "-004.00"
-25 will need to become "-025.00"
I think I should be able to use the Text.Insert function, but I'm not familiar enough with the M language to get it to do what I want.
Any solutions/suggestions?
Here's my previous answer revisited...set up to use a function. You can just invoke the function once for each column you want to reformat. You'll just pass the name of the column you want to reformat to the function as you invoke the function each time.
Create a new blank query:
Open the new query in Advanced Editor and highlight everything in it:
Paste this over the highlighted text in the Advanced Editor:
let
FormatIt = (SourceColumn) =>
let
Base = Number.Round(SourceColumn,2)*.01,
Source = try Text.Start(Text.Range(
if Base < 7 then Text.From(Base) & "001" else
Text.From(Base),0,7),2) & Text.Range(Text.Range(
if Base < 7 then Text.From(Base) & "001" else
Text.From(Base),0,7),3,2) & "." & Text.End(Text.Range(
if Base < 7 then Text.From(Base) & "001" else
Text.From(Base),0,7),2)
otherwise "0000.00"
in
Source
in
FormatIt
...and click Done.
You'll see a new function has been created and listed in the Queries list on the left side of the screen.
Then go to your query with the columns you want to reformat (click on the name of your query that has the numbers you want to change in it, on the left side of the screen) and...
Click Invoke Custom Function
And fill out the pop-up like this:
- You can make up a different New column name than Custom.1.
- Function Query is the name of your query you are calling (the one you just created when you pasted the code)...for me, it's called Query1.
- Source Column is the column with the numbers you want to format.
...and click OK.
You can invoke this function once for each column. It will create a new formatted column for each.
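To check the new columns against the three examples in the question, the target format can be written down as a short Python sketch. This is not a translation of the M function above, just a statement of the expected output under the assumption that inputs are whole numbers from -999 to 0; the name format_hours is made up for this example.

```python
def format_hours(value):
    """Format 0 or a negative whole number as a seven-character string.

    Assumes -999 <= value <= 0, matching the data described in the question.
    """
    number = int(value)
    if number > 0 or number < -999:
        raise ValueError("expected a whole number between -999 and 0")
    lead = "-" if number < 0 else "0"      # sign slot, or a padding zero
    return f"{lead}{abs(number):03d}.00"   # three digits, then ".00"


for sample in (0, -4, -25):
    print(sample, "->", format_hours(sample))
```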
You can use this formula = Text.PadStart(Text.From([Column1]),4,"0")&".00" in PQ to add a new column that looks similar to your needs.
Here's an admittedly "busy" formula to do it:
= Table.AddColumn(#"Changed Type", "Custom", each Text.Start(Text.Range(if Number.Round([Column1],2)*.01 < 7 then Text.From(Number.Round([Column1],2)*.01) & "001" else Text.From(Number.Round([Column1],2)*.01),0,7),2) & Text.Range(Text.Range(if Number.Round([Column1],2)*.01 < 7 then Text.From(Number.Round([Column1],2)*.01) & "001" else Text.From(Number.Round([Column1],2)*.01),0,7),3,2) & "." & Text.End(Text.Range(if Number.Round([Column1],2)*.01 < 7 then Text.From(Number.Round([Column1],2)*.01) & "001" else Text.From(Number.Round([Column1],2)*.01),0,7),2))
It assumes your numbers that you want formatted are in Column1 to start. It creates a new column...Custom...with the formatted result.
To try it out, start with Column1 already populated and loaded into Power Query; then click the Add Column tab and then the Custom Column button, and populate the pop-up window like this:
...and click OK.
With more time, the repetitive parts could be made with variables to shorten this up a bit. This could also be turned into a function, given some time. But I don't have the time right now, so I figured I'd give you at least "something."
I'm trying to create a formula that takes in data formatted thusly:
Ex.
A1 Excel Data --> (Useless Data - 'example')
Then, Grab the data after the 'hyphen' -> MID(A1,FIND("-",A1)+1,1)
Finally, look at the first letter of the selected data and sort it into four Categories -> IF(MID(A1,FIND("-",A1)+1,1)="e""E","Example"
Notes:
Needs to search for both upper and lower-case lettering.
If none of the statements are 'true' it should default to category 'other'
Ideally a cool way to remove formatting such as '(', ' ', ', ' would be cool.
Function so Far:
=IF(MID(A1,FIND("-",A1)+1,1)="a""A","Amex",IF(MID(A1,FIND("-",A1)+1,1)="C""c","Citi Bank",IF(MID(A1,FIND("-",A1)+1,1)="W""w","Wells Fargo", "Other")))
This function will search for both upper and lower case, and will default to other.
=IF(ISNUMBER(SEARCH("a",MID(A1,FIND("-",A1)+1,1))),"Amex",IF(ISNUMBER(SEARCH("c",MID(A1,FIND("-",A1)+1,1))),"Citi Bank",IF(ISNUMBER(SEARCH("w",MID(A1,FIND("-",A1)+1,1))),"Wells Fargo", "Other")))
However, please strongly consider putting a list together that you can reference using an INDEX and MATCH. Consider the following in columns D and E:
D E
-------
a Amex
c Citi Bank
w Wells Fargo
Then just use this formula:
=IFERROR(INDEX(E:E,MATCH(LOWER(MID(A1,FIND("-",A1)+1,1)),D:D,0)),"Other")
If you do need to remove formatting, you can look into the SUBSTITUTE function.
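The lookup-table idea translates directly into a dictionary lookup. The Python sketch below is an illustration only: the name categorize is made up, and, unlike the formulas above, it also skips spaces, single quotes and opening parentheses that follow the hyphen (an assumption about the "remove formatting" wish in the question).

```python
CATEGORIES = {"a": "Amex", "c": "Citi Bank", "w": "Wells Fargo"}


def categorize(cell, strip_chars=" '("):
    """Map the first letter after the first hyphen to a category name."""
    _, hyphen, after = cell.partition("-")
    if not hyphen:                        # no hyphen at all: like IFERROR -> "Other"
        return "Other"
    after = after.lstrip(strip_chars)     # assumption: ignore leading formatting
    if not after:
        return "Other"
    # LOWER(...) makes the match case-insensitive; unknown letters give "Other".
    return CATEGORIES.get(after[0].lower(), "Other")


print(categorize("Junk-Amex"))
print(categorize("(Useless Data - 'citi')"))
print(categorize("(Useless Data - 'example')"))
```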
I have a really large database of tweets. Most of the tweets have multiple #hashtags and @mentions. I want all the #hashtags separated with a space in one column and all the @mentions in another column. I already know how to extract the first occurrence of a #hashtag and a @mention, but I don't know how to get them all. Some of the tweets have as many as 8 #hashtags. Manually going through the tweets and copy/pasting the #hashtags and @mentions seems an impossible task for over 5,000 tweets.
Here is an example of what I want. I have Column A and I want a macro that would populate columns B and C. (I'm on Windows 7, Excel 2010)
Column A
-----------
Dear #DavidStern, @spurs put a quality team on the floor and should have beat the @heat. Leave #Pop alone. #Spurs a classy organization.
Live broadcast from @Nacho_xtreme: "Papelucho Radio"http://mixlr.com nachoxtreme-radio … #mixlr #pop #dance
"Since You Left" by @EmilNow now playing on KGUP 106.5FM. Listen now on http://www.kgup1065.com #Pop #Rock
Family Night #battleofthegenerations Dad has the #Monkeys Mom has #DonnieOsman @michaelbuble for me #Dubstep for the boys#Pop for sissy
@McKinzeepowell @m0ore21 I love that the PNW and the Midwest are on the same page!! #Pop
I want Column B to look like This:
Column B
--------
#DavidStern #Pop #Spurs
#mixlr #pop #dance
#Pop #Rock
#battleofthegenerations #Monkeys #DonnieOsman #Dubstep #Pop
#pop
And Column C to look like this:
Column C:
----------
@spurs @heat
@Nacho_xtreme
@EmilNow
@michaelbuble
@McKinzeepowell @m0ore21
Consider using regular expressions.
You can use regular expressions within VBA by adding a reference to Microsoft VBScript Regular Expressions 5.5 from Tools -> References.
Here is a good starting point, with a number of useful links.
Updated
After adding a reference to the Regular Expressions library, put the following function in a VBA module:
Public Function JoinMatches(text As String, start As String)
Dim re As New RegExp, matches As MatchCollection, match As match
re.pattern = start & "\w*"
re.Global = True
Set matches = re.Execute(text)
For Each match In matches
JoinMatches = JoinMatches & " " & match.Value
Next
JoinMatches = Mid(JoinMatches, 2)
End Function
Then, in cell B1 put the following formula (for the hashtags):
=JoinMatches(A1,"#")
In cell C1 put the following formula (for the mentions):
=JoinMatches(A1,"@")
Now you can copy just the formulas all the way down.
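The same pattern (a prefix character followed by word characters) can be tried outside Excel before committing to the macro. Here is a minimal Python sketch using the standard re module; the name join_matches mirrors the VBA function but is otherwise an assumption, and the sample text is the first tweet from the question with "@" marking the mentions.

```python
import re


def join_matches(text, start):
    """Return every token beginning with `start`, joined by single spaces."""
    # Same idea as the VBA pattern: the prefix followed by word characters.
    pattern = re.escape(start) + r"\w*"
    return " ".join(re.findall(pattern, text, flags=re.ASCII))


tweet = ("Dear #DavidStern, @spurs put a quality team on the floor and "
         "should have beat the @heat. Leave #Pop alone. "
         "#Spurs a classy organization.")

print(join_matches(tweet, "#"))
print(join_matches(tweet, "@"))
```

Because the pattern stops at the first non-word character, trailing punctuation such as the comma after #DavidStern is left out of the result.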
You could use Text to Columns with # as the "Other" delimiter character, then again for @, and then concatenate the rest of the text back together for column A. This is an option if you are not familiar with regular expressions (see @Zev-Spitz).
I have two excel files with the same structure: they both have 1 column with data. One has 800 records and the other has 805 records, but I am not sure which of the 5 in the 805 set are not in the 800 set. Can I find this out using Excel?
VLOOKUP is your friend!
Position your column, one value per row, in column A of each spreadsheet.
In column B of the larger sheet, type
=VLOOKUP(A1,'[Book2.xlsb]SheetName'!$A:$A,1,FALSE)
Then copy the formula down as far as your column of data runs.
Where the result of the formula is #N/A, that data is not in the other worksheet.
It might seem like a hack, but I personally prefer copying the cells as text (or exporting as a CSV) into Winmerge or any other diff tool. Assuming the two sheets contain mostly identical data, Winmerge will show the differences immediately.
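If no diff tool is at hand, the same text-diff idea can be approximated with Python's standard difflib module. Note that this swaps in difflib for WinMerge and is only a sketch; the function name diff_columns and the sample values are illustrative, and reading the two CSV exports into lists is left out.

```python
import difflib


def diff_columns(old_values, new_values):
    """Return only the added ("+ ") and removed ("- ") lines of a text diff."""
    changes = []
    for line in difflib.ndiff(old_values, new_values):
        # ndiff prefixes unchanged lines with "  " and hint lines with "? ".
        if line.startswith(("+ ", "- ")):
            changes.append(line)
    return changes


sheet1 = ["a", "b", "c"]          # e.g. the 800-record column
sheet2 = ["a", "b", "x", "c"]     # e.g. the 805-record column

for change in diff_columns(sheet1, sheet2):
    print(change)
```

As with any line-based diff, this works best when both columns are in the same order; sorting both lists first is a reasonable precaution.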
LibreOffice provides a Workbook Compare feature: Edit -> Compare Document
Easy way: Use a 3rd sheet to check.
Say you want to find differences between Sheet 1 and Sheet 2.
Go to Sheet 3, cell A1, and enter:
=IF(Sheet2!A1<>Sheet1!A1,"difference","")
Then select all cells of Sheet 3, fill down, and fill right. The cells that are different between Sheet 1 and Sheet 2 will now say "difference" in Sheet 3.
You could adjust the formula to show the actual values that were different.
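The cell-by-cell comparison, including the suggestion to show the actual differing values, can be sketched in Python as follows. The sheets are represented as plain lists of rows, which is an assumption made to keep the example self-contained; cell_differences is a made-up name.

```python
from itertools import zip_longest


def cell_differences(sheet1, sheet2):
    """Return {(row, column): (value1, value2)} for every differing cell.

    Rows and columns are 1-based; missing cells are treated as empty strings.
    """
    diffs = {}
    paired_rows = zip_longest(sheet1, sheet2, fillvalue=[])
    for r, (row1, row2) in enumerate(paired_rows, start=1):
        paired_cells = zip_longest(row1, row2, fillvalue="")
        for c, (v1, v2) in enumerate(paired_cells, start=1):
            if v1 != v2:          # same test as Sheet2!A1<>Sheet1!A1
                diffs[(r, c)] = (v1, v2)
    return diffs


sheet1 = [["a", "b"], ["c", "d"]]
sheet2 = [["a", "x"], ["c", "d"], ["e"]]

print(cell_differences(sheet1, sheet2))
```

Like the Sheet 3 formula, this compares cells by position, so a single inserted row makes every row below it show up as different.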
Excel has this built in if you have an excel version with the Inquire add-in.
This link from the Office webpage describes the process of enabling the add-in, if it isn't activated, and how to compare two workbooks, among other things.
The comparison shows both structural differences as well as editorial and a lot of other changes if
http://office.microsoft.com/en-us/excel-help/what-you-can-do-with-spreadsheet-inquire-HA102835926.aspx
You should try this free online tool: www.cloudyexcel.com/compare-excel/
It works well most of the time, though sometimes the results are a little off.
It also gives a good visual output.
You can also download the results in Excel format (you need to sign up for that).
COUNTIF works well for quick difference-checking. And it's easier to remember and simpler to work with than VLOOKUP.
=COUNTIF([Book1]Sheet1!$A:$A, A1)
will give you a column showing 1 if there's a match and zero if there's no match (with the bonus of showing >1 for duplicates within the list itself).
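The COUNTIF check corresponds to counting occurrences in the other list. Here is a minimal Python sketch with collections.Counter; the function name countif_flags and the sample values are made up for illustration.

```python
from collections import Counter


def countif_flags(values, other_list):
    """For each value, count how often it occurs in other_list (like COUNTIF)."""
    counts = Counter(other_list)
    return [(value, counts[value]) for value in values]   # a missing key counts as 0


larger = ["a", "b", "z"]          # values to look up
smaller = ["a", "b", "b"]         # the list to count in (contains a duplicate)

flags = countif_flags(larger, smaller)
missing = [value for value, n in flags if n == 0]
print(flags)
print(missing)
```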
If you have Microsoft Office Professional Plus 2013, you can use Microsoft Spreadsheet Compare to run a report on the differences between two workbooks.
Launch Spreadsheet Compare:
In Windows 7: On the Windows Start menu, under Office 2013 Tools, click Spreadsheet Compare.
In Windows 8: On the Start screen, click Spreadsheet Compare. If you do not see a Spreadsheet Compare tile, begin typing the words Spreadsheet Compare, and then select its tile.
Compare two Excel workbooks:
Click Home > Compare Files.
a. Click the blue folder icon next to the Compare box to browse to the location of the earlier version of your workbook. (In addition to files saved on your computer or on a network, you can enter a web address to a site where your workbooks are saved.)
b. Click the green folder icon next to the To box to browse to the location of the workbook that you want to compare to the earlier version, and then click OK. (TIP You can compare two files with the same name if they're saved in different folders.)
In the left pane, choose the options you want to see in the results of the workbook comparison by checking or unchecking the options, such as Formulas, Macros, or Cell Format. Or, just Select All.
Reference:
https://support.office.com/en-us/article/Basic-tasks-in-Spreadsheet-Compare-f2b20af8-a6d3-4780-8011-f15b3229f5d8
I think your best option is a freeware app called Compare IT! .... absolutely brilliant utility and dead easy to use. http://www.grigsoft.com/wincmp3.htm
Use the vlookup function.
Put both sets of data in the same excel file, on different sheets. Then, in the column next to the 805 row set (which I assume is on sheet2), enter
=if(isna(vlookup(A1, Sheet1!$A$1:$A$800, 1, false)), 0, 1)
The column will contain 0 for values that are not found in the other sheet, and 1 for values that are. You can sort the sheet to find all the missing values.
Since you are using Excel, you can use Spreadsheet Compare from Microsoft. It is available from Office 2013. Yes, I know this question is more than 6 years old, but who knows, maybe someone needs this information today.
The Notepad++ compare plugin works perfectly for this. Just save your sheets as .csv files and compare them in Notepad++. Notepad++ gives you a nice visual diff.
Maybe this reply is too late, but I hope it will help someone looking for a solution.
What I did was save both Excel files as CSV files and compare them with Windiff.
ExcelDiff exports a HTML report in a Divided (Side-by-side) or Merged (Overlay) view highlighting the differences as well as the row and column.
I used Excel Compare. It is payware, but they do have a 15 day trial. It will report amended rows, added rows, and deleted rows. It will match based on the worksheet name (as an option):
http://www.formulasoft.com/excel-compare.html
Use conditional formatting to highlight the differences in excel.
Here's an example.
With just one column of data in each to compare a PivotTable may provide much more information. In the image below ColumnA is in Sheet1 (with a copy in Sheet2 for the sake of the image) and ColumnC in Sheet2. In each sheet a source flag has been added (Columns B and D in the image). The PT has been created with multiple consolidation ranges (Sheet1!$A$1:$B$15 and Sheet2!$C$1:$D$10):
The left hand numeric column shows what is present in Sheet1 (including q twice) and the right what in Sheet2 (again with duplicates – of c and d). d-l are in Sheet1 but not Sheet2 and w and z are in Sheet2 (excluding those there just for the image) but not Sheet1. Add display Show grand totals for columns and control totals would appear.
I found this command line utility that doesn't show the GUI output but gave me what I needed: https://github.com/na-ka-na/ExcelCompare
Sample output (taken from the project's readme file):
> excel_cmp xxx.xlsx yyy.xlsx
DIFF Cell at Sheet1!A1 => 'a' v/s 'aa'
EXTRA Cell in WB1 Sheet1!B1 => 'cc'
DIFF Cell at Sheet1!D4 => '4.0' v/s '14.0'
EXTRA Cell in WB2 Sheet1!J10 => 'j'
EXTRA Cell in WB1 Sheet1!K11 => 'k'
EXTRA Cell in WB1 Sheet2!A1 => 'abc'
EXTRA Cell in WB2 Sheet3!A1 => 'haha'
----------------- DIFF -------------------
Sheets: [Sheet1]
Rows: [1, 4]
Cols: [A, D]
----------------- EXTRA WB1 -------------------
Sheets: [Sheet1, Sheet2]
Rows: [1, 11]
Cols: [B, K, A]
----------------- EXTRA WB2 -------------------
Sheets: [Sheet1, Sheet3]
Rows: [10, 1]
Cols: [J, A]
-----------------------------------------
Excel files xxx.xlsx and yyy.xlsx differ
Tried to find a tool that will help to extract only the different sheets with the cell difference highlighted. Could not find any, so ended up writing one for myself. I hope this helps someone who is looking for similar solution. It takes care of left/right unique sheets, identical/different size sheets.
import pandas as pd
import xlsxwriter
import numpy as np
from openpyxl import load_workbook

# Get a complete list of sheets from both WorkBook
BOOK1 = "Book_1.xlsx"
BOOK2 = "Book_2.xlsx"
xlBook1 = pd.ExcelFile(BOOK1)
sheetsBook1 = xlBook1.sheet_names
xlBook2 = pd.ExcelFile(BOOK2)
sheetsBook2 = xlBook2.sheet_names
sheets = list(set(sheetsBook1 + sheetsBook2))

with pd.ExcelWriter('Difference.xlsx', engine='xlsxwriter', mode='w') as writer:
    for sheet in sheets:
        print (sheet)
        book1 = None
        book2 = None
        book1Exists = True
        book2Exists = True
        try:
            book1 = pd.read_excel(BOOK1,sheet_name=sheet,header=None,index_col=False).fillna(' ')
        except ValueError as ve:
            book1Exists = False
        try:
            book2 = pd.read_excel(BOOK2,sheet_name=sheet,header=None,index_col=False).fillna(' ')
        except ValueError as ve:
            book2Exists = False
        # Case 1: Both sheet exist and they are identical size
        if ( (( (book1Exists == True) and (book2Exists == True) )) and
             ( (len(book1) == len(book2)) and (len(book1.columns) == len(book2.columns)) )):
            comparevalues = book1.values == book2.values
            if False in comparevalues:
                rows,cols = np.where(comparevalues==False)
                for item in zip(rows,cols):
                    book1.iloc[item[0],item[1]] = ' {} --> {} '.format(book1.iloc[item[0], item[1]], book2.iloc[item[0],item[1]])
                book1.to_excel(writer,sheet_name=sheet,index=False,header=False)
                # Get the xlsxwriter workbook and worksheet objects.
                workbook = writer.book
                worksheet = writer.sheets[sheet]
                # Add a format. Light red fill with dark red text.
                format1 = workbook.add_format({'bg_color': '#FFC7CE',
                                               'font_color': '#9C0006'})
                # Apply a conditional format to the cell range.
                worksheet.conditional_format('A1:AZ100',{'type': 'text',
                                                         'criteria': 'containing',
                                                         'value': '-->',
                                                         'format': format1})
        # Case 2: Left unique case
        elif (book1Exists == False):
            book2.to_excel(writer,sheet_name=sheet+" B2U",index=False,header=False)
        # Case 3: Right unique case
        elif (book2Exists == False):
            book1.to_excel(writer,sheet_name=sheet+" B1U",index=False,header=False)
        # Case 4: Both exist but different size
        elif (( (book1Exists == True) and (book2Exists == True) ) and
              ( (len(book1) != len(book2)) or (len(book1.columns) != len(book2.columns)) )):
            if (book1.size > book2.size):
                book1.to_excel(writer,sheet_name=sheet+" SD",index=False,header=False)
            elif (book2.size > book1.size):
                book2.to_excel(writer,sheet_name=sheet+" SD",index=False,header=False)
It is not clear from your question whether you want to identify the values not present in the larger set, or to check for each value in the larger set whether it is present in the shorter one. Here is a solution for both cases:
Values in Subset not in Set
=FILTER(B2:B11,ISNUMBER(MATCH(B2:B11,A2:A6,0)))
Check if value in Set is not in Subset
=IF(ISNUMBER(MATCH(B2:B11,A2:A6,0)), TRUE, FALSE)
Excel Overlay will put both spreadsheets on top of each other (overlay them) and highlight the differences.
http://download.cnet.com/Excel-Overlay/3000-2077_4-10963782.html?tag=mncol