How to eliminate duplicate values in a row in Excel

I have an Excel spreadsheet that contains a customer ID number and all the email addresses we've collected for that customer. There is one row per customer, and close to 10,000 rows. In most cases, the same email address occurs more than once in a row. I would like to reduce each customer's list of addresses to a unique list. I find many helpful articles on how to get unique values from data that occurs in separate rows, but this task needs unique values from separate columns.
I want to take this:
CustomerID | Email1              | Email2            | Email3            | Email4
--------------------------------------------------------------------------------
12345      | myemail@example.com | example@gmail.com | example@gmail.com | myemail@example.com
and turn it into this:
CustomerID | Email1              | Email2            | Email3 | Email4
--------------------------------------------------------------------------------
12345      | myemail@example.com | example@gmail.com |        |
Is there a way to do this in Excel?

As a one-off exercise, you could select all of your data, copy it, and paste it into a new sheet using the 'Transpose' option (to turn the rows into columns). Then use Excel's "Remove Duplicates" option on the Data ribbon to delete the duplicate rows. After the duplicates are removed, copy and paste with Transpose back to your first sheet.
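If the data can be exported (for example to CSV), the same per-row cleanup can be scripted instead of done by hand. This is a minimal sketch, assuming each customer's emails are available as a list of strings; the function name `dedupe_row` and the `rows` dictionary are illustrative, not part of Excel. It keeps the first occurrence of each address in its original order and pads the row with blanks so every row keeps the same number of columns.

```python
def dedupe_row(values, width=None):
    """Keep the first occurrence of each non-blank value, preserving order,
    then pad with blanks so the row keeps its original width."""
    seen = set()
    unique = []
    for v in values:
        if v and v not in seen:  # skip blanks and repeats
            seen.add(v)
            unique.append(v)
    width = len(values) if width is None else width
    return unique + [""] * (width - len(unique))


# Hypothetical data shaped like the example above: customer ID -> emails.
rows = {
    "12345": ["myemail@example.com", "example@gmail.com",
              "example@gmail.com", "myemail@example.com"],
}
cleaned = {cid: dedupe_row(emails) for cid, emails in rows.items()}
print(cleaned["12345"])  # ['myemail@example.com', 'example@gmail.com', '', '']
```

The comparison here is exact and case-sensitive; lowercasing each address before the `seen` check would also treat differently capitalized copies of the same address as duplicates.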

Related

Dual criteria data validation in Excel

Unlike the other questions posted on this topic, my criteria are not simple comparators. I want a dropdown list that includes all values in one named table, excluding those values that meet another criterion. For instance, a table includes employee names in one column and vacation dates in another column. I want the data validation to allow a list of employees who are not on vacation on a variable date drawn from another cell. The general method seems to be to create additional tables where the secondary criterion (in this case, the date) is the column header, populated by items from the first list that satisfy some criterion. It seems impractical to create 365 tables, one named for each date and populated by the employees from the first table who have not requested that date off. Is there another way to accomplish this?
Sample data:
| Employee | Vacation Dates   |      | Work on 1/26/20            |
|----------|------------------|      |----------------------------|
| Bob      | 1/26/20, 1/27/20 |      | <allow only Mike or Cindy> |
| Mike     | 2/20/20, 2/21/20 |
| Cindy    | 2/20/20, 1/28/20 |
I had to transpose my thinking. Rather than a table for each date, I can have a vacation table for each employee. The validation has to be a custom formula rather than a list, so no dropdown selection list is available, but it will work. The error message also cannot distinguish which criterion is being violated: a name that is not on the employee list versus a name from the employee list of someone who is on vacation. It would be great if validation worked like conditional formatting, with different rules applied in sequence.
   | A        | B       | C       | D       | E | F       |
 1 | Employee | Bob     | Mike    | Cindy   |   | 1/26/20 |
 2 | Bob      | 1/26/20 | 2/20/20 | 2/20/20 |   |         |
 3 | Mike     | 1/27/20 | 2/21/20 | 1/28/20 |   |         |
 4 | Cindy    |         |         |         |   |         |
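The validation formula for the "1/26/20" column (column F in the layout above, applied to cell F2) would be:
=AND(COUNTIF($A$2:$A$4,F2)>0,COUNTIF(INDIRECT(ADDRESS(2,MATCH(F2,$B$1:$D$1,0)+1)):INDIRECT(ADDRESS(3,MATCH(F2,$B$1:$D$1,0)+1)),F1)<1)
The first COUNTIF checks that the name entered in F2 is on the employee list; the second finds that employee's vacation column (rows 2 to 3) and checks that the date in F1 does not appear in it.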
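Outside Excel, the two checks in the AND(...) validation formula above can be written out explicitly, which also makes it possible to report which rule failed (something the self-answer notes Excel's validation message cannot do). This is a minimal sketch under stated assumptions: the `vacations` dictionary mirrors the per-employee layout above, and the names `validation_error` and `available` are hypothetical.

```python
from datetime import date

# Hypothetical data mirroring the layout above: one vacation list per employee.
vacations = {
    "Bob": [date(2020, 1, 26), date(2020, 1, 27)],
    "Mike": [date(2020, 2, 20), date(2020, 2, 21)],
    "Cindy": [date(2020, 2, 20), date(2020, 1, 28)],
}


def validation_error(name, work_date):
    """Apply the formula's two checks in sequence; return None if valid,
    otherwise a message naming the rule that failed."""
    if name not in vacations:            # like COUNTIF($A$2:$A$4,F2)>0
        return "not an employee"
    if work_date in vacations[name]:     # like COUNTIF(<vacation column>,F1)<1
        return "on vacation"
    return None


def available(work_date):
    # The list a dropdown would ideally offer: everyone not on vacation that day.
    return [name for name in vacations if validation_error(name, work_date) is None]


print(available(date(2020, 1, 26)))  # ['Mike', 'Cindy']
```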

Excel: how to group and then sort groups in a custom order?

I have a table of data, and I want to group this data and then sort the groups of rows in a custom order.
Example:
I have a table of data like this:
key | group
-------------
BC.AA | BC
AA.AA | AA
CC.DE | CC
AA.CD | AA
And a list of groups like this
group | no. of items
-------------------
BC | 1
CC | 1
AA | 2
How do I create a new table where the rows of the first table are grouped and ordered the same way as the second table? Like this:
key | group
-------------
BC.AA | BC
CC.DE | CC
AA.CD | AA
AA.AA | AA
I'd like to do this with Excel formulas, so it updates automatically when the original table changes. I hope to avoid using macros, but I could write a custom worksheet function.
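You could add a column to your first table with =MATCH(B1,GroupSheet!A:A,0), which returns the row in GroupSheet that matches the row's group, and then sort by that column. The final 0 forces an exact match; without it, MATCH assumes the lookup column is sorted in ascending order, which the group list here is not.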
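The helper-column idea can be sketched in Python to show what the sort key is doing: each row is ranked by the position of its group in the group list (the equivalent of the exact-match MATCH), then sorted by that rank. The variable names are illustrative only.

```python
rows = [("BC.AA", "BC"), ("AA.AA", "AA"), ("CC.DE", "CC"), ("AA.CD", "AA")]
group_order = ["BC", "CC", "AA"]

# Equivalent of the helper column =MATCH(group, GroupSheet!A:A, 0):
# each group's position in the group list.
rank = {g: i for i, g in enumerate(group_order)}

# sorted() is stable, so rows in the same group keep their original order.
ordered = sorted(rows, key=lambda r: rank[r[1]])
for key, group in ordered:
    print(key, group)
```

Because the sort is stable, AA.AA stays ahead of AA.CD (their order in the source table); if a particular order within each group matters, add it as a secondary key, e.g. `key=lambda r: (rank[r[1]], r[0])`.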
You can do this in Excel 2010 by selecting the data you want to sort, going to the Data tab, clicking the Sort icon and then choosing Custom List... under Order. This will be fine for small sorts, but you might need something more powerful for longer lists...

Recreating a non-straightforward Excel 'vlookup'

I'm looking for some thoughts on how you might recreate a 'vlookup' that I currently do in excel.
I have two tables: Data contains a list of datetime values; DateConverter contains a list of calendar dates and their associated "network dates." Think of a business where not every day is a workday: if I want to calculate differences between dates, I'm most interested in the number of workdays that elapsed between them.
Here is what the data might look like:
Data Table DateConverter Table
================= ===================
| Datetime | | Calendar date | Network date |
| ------------- | | ------------- | ------------ |
| 6-1-15 8:00a | | 6-1-15 | 1000 |
| 6-2-15 1:00p | | 6-2-15 | 1001 |
| 6-3-15 7:00a | | 6-3-15 | 1002 |
| 6-10-15 3:00p | | 6-4-15 | 1003 |
| 6-15-15 1:00p | | 6-5-15 | 1004 |
| 6-12-15 2:00a | | 6-8-15 | 1005 | // Skips the weekend
| ... | | ... | ... |
In Excel, I can easily map in the network date for each value in the Datetime field with a variant of VLOOKUP:
// Assume that Datetime values are in Column A, Calendar date values in
// Column C, Network date values in Column D - this formula fills Column B
// Headers are in row 1 - first values are in row 2
B2=OFFSET($D$1,COUNTIFS($C:$C,"<"&A2),)
The formula counts the calendar dates that are less than the lookup value (using COUNTIFS because the values in the search array are dates and the search value is a datetime) and returns the associated network date. This relies on the DateConverter table being sorted by calendar date.
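The COUNTIFS/OFFSET formula above is effectively a binary-search lookup on a sorted list. This is a minimal Python sketch of the same idea, assuming DateConverter is available as two parallel lists sorted by calendar date (the sample values are taken from the table above; `network_date` is a hypothetical name). One deliberate difference: it counts calendar dates on or before the timestamp's date rather than dates strictly before the raw datetime. That gives the same answer for any time after midnight, and also returns the right row when a timestamp falls exactly at midnight, where the strict "<" comparison would land one row early.

```python
from bisect import bisect_right
from datetime import date, datetime

# Hypothetical DateConverter data (sorted ascending), mirroring the table above.
calendar_dates = [date(2015, 6, 1), date(2015, 6, 2), date(2015, 6, 3),
                  date(2015, 6, 4), date(2015, 6, 5), date(2015, 6, 8)]
network_dates = [1000, 1001, 1002, 1003, 1004, 1005]


def network_date(ts):
    """Return the network date of the last calendar date on or before ts."""
    # Number of calendar dates <= the timestamp's date; like the COUNTIFS
    # result, this is the 1-based position of the matching row.
    n = bisect_right(calendar_dates, ts.date())
    if n == 0:
        raise ValueError("timestamp is before the first calendar date")
    return network_dates[n - 1]


print(network_date(datetime(2015, 6, 3, 7, 0)))   # 1002
print(network_date(datetime(2015, 6, 6, 10, 0)))  # Saturday: 1004 (Friday's value)
```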
Is there a way to do this in Tableau? Will it require a calculated field or can I do this with some kind of join?
Thanks in advance for the help! Let me know if there is anything I can clarify.
If the tables are on the same data server, you have the option to use joins, which is usually the most efficient way to combine information from different tables. If the tables are on different servers or platforms, then you can't use a single query to join them.
In either case, you can use Tableau data blending, which is sort of like a client-side join of aggregated results from multiple queries. It's a pretty useful technique, but a little more complex and restricted, and usually less efficient than a server-side join.
So if you have the option to have both tables on the same server, start with that. It will be simpler and likely faster.
Note that if you are going to use a date as a join key, you probably want to define it as a date and not a datetime.
@alex-blakemore's response would normally be adequate, but if you can change the schema, you could simply add the network date to the Data table. The hourly granularity should not cause excessive growth, and you don't need to deal with the join at all.
Then, instead of counting rows and requiring a sorted table, simply subtract one network date from the other and add 1.
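As a worked example of that subtraction, using the network dates from the DateConverter sample: 6-1-15 has network date 1000 and 6-8-15 has 1005, so 1005 - 1000 + 1 = 6 workdays, counting both endpoints (6-1 through 6-5, plus 6-8; the weekend has no network dates). A tiny sketch, with `elapsed_workdays` as a hypothetical helper name:

```python
def elapsed_workdays(start_network, end_network):
    """Workdays from start to end, counting both ends (hence the +1)."""
    return end_network - start_network + 1


# 6-1-15 (network date 1000) to 6-8-15 (network date 1005)
print(elapsed_workdays(1000, 1005))  # 6
```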

Data model for inconsistent data on Cassandra

I am pretty new to NoSQL and Cassandra, but my architecture committee told me to use them. I just want to understand how to convert the RDBMS model to NoSQL.
I have a database where users need to import data from an Excel or CSV file. The file may have different columns each time.
For example, the data in the Excel file might look something like this:
Name| AName| Industry| Interest | Pint |Start Date | End date
x | 111-121 | IT | 2 | 1/1/2011 | 1/2/2011
x | 111-122 | hotel | 1 | "" | ""
y| 111-1000 | IT | 2 | 1/1/2011 | 1/2/2011
After we upload this, the next Excel file might look like this:
Name| AName| Industry| Interest | Pint |Start Date | isTrue | isNegative
x | 111-121 | IT | 2 | 1/1/2011 | 1/2/2011 | yes | no
x | 111-122 | hotel | 1 | "" | no | no
y| 111-1000 |health | 2 | 1/1/2010 | yes|""
I would not know in advance what columns I am going to create when importing data. I am totally confused by NoSQL and don't understand how to import data when I don't know the table structure.
Start with the basic fact that a column family (Cassandra's term for a table) is made up of rows. Each row has a row key and some number of key/value pairs (called columns). For a particular column in a row, the name of the column is the key of the pair and the value of the column is the value of the pair. Just because one row has a column with some name does not mean any other row will have a column with that name.
Internally, row keys, column names and column values are stored as byte arrays and you'll need to use serializers to convert program data to the byte arrays and back again.
It's up to you as to how you define the row key, column name and column value.
One approach would be to have a row in the CF correspond to a row from Excel. You'd have to identify the one Excel column that will provide a unique ID and store that in the row key. The remainder of the Excel columns can be stored in Cassandra columns, one-to-one. This lets you be very flexible about most column names, but you have to have a unique key value somewhere. The unique key requirement will always hold for any storage scheme you use.
There are other storage schemes, but they all boil down to deciding which part of the Excel data is your row key and how you break the rest of the Excel data into key/value pairs.
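To make the one-Excel-row-per-CF-row approach concrete, here is a minimal in-memory sketch in Python: each uploaded row becomes a row key plus a dictionary of column name to value, and empty cells are simply not stored, so rows from different uploads can carry different columns. Assumptions: no Cassandra driver or serializer is involved (this only models the layout), the sample CSV text is a trimmed, hypothetical version of the uploads above, and AName is used as the row key only because its values look unique in the sample. `rows_to_column_family` is a hypothetical name.

```python
import csv
import io


def rows_to_column_family(csv_text, key_column):
    """Map each spreadsheet row to row key -> {column name: value}.

    Empty cells are skipped, so each row only stores the columns it
    actually has, like a Cassandra row with a variable set of columns.
    """
    cf = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        row_key = record.pop(key_column)
        if row_key in cf:
            raise ValueError(f"duplicate row key {row_key!r}")
        # Missing trailing cells come back as None; blank cells as "".
        cf[row_key] = {name: value for name, value in record.items() if value}
    return cf


# Hypothetical uploads with different column sets.
first_upload = """Name,AName,Industry,Interest,Start Date,End date
x,111-121,IT,2,1/1/2011,1/2/2011
x,111-122,hotel,1,,
"""
second_upload = """Name,AName,Industry,isTrue,isNegative
y,111-1000,health,yes,
"""

cf = rows_to_column_family(first_upload, "AName")
cf.update(rows_to_column_family(second_upload, "AName"))
for key, columns in cf.items():
    print(key, columns)
```

The point of the sketch is the shape of the data, not storage: the schema lives in each row's own column names, and the only thing fixed up front is which spreadsheet column supplies the row key.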
Check out some NoSQL patterns; I also highly suggest reading "Building on Quicksand" by Pat Helland.
Some good patterns (with or without using PlayOrm):
http://buffalosw.com/wiki/Patterns-Page/

Excel: max() of count() with column grouping in a pivot table

I have a pivot table fed from a MySQL view. Each returned row is basically an instance of "a person, with a role, at a venue, on a date". Each cell then shows a count of persons (let's call it person_id).
When you pivot this in excel, you get a nice table of the form:
| Dates -->
--------------------------
Venue |
Role | -count of person-
This makes a lot of sense, and the end user likes this format, BUT the requirement has changed to group the columns (dates) into weeks.
When you group them in the normal way, the count is aggregated across the grouped columns as well. This is, of course, logical behaviour, but what I actually want is the max() of the original count().
So the question: Does anyone know how to have cells count(), but the grouping perform a max()?
To illustrate, imagine the columns for one week. Then imagine the max() grouped by week, giving:
Old:
| M | T | W | T | F | S | S ||
--------------------------------------- .... for several weeks
Venue X |
Role Y| 1 | 1 | 2 | 1 | 2 | 3 | 1 ||
New (grouped by week)
| Week 1 | ...
---------------------------
Venue X |
Role Y| 3 | ...
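The aggregation being asked for is two-stage: first count persons per venue, role, and day (the existing pivot cells), then take the maximum of those daily counts within each week. A minimal Python sketch of that logic, assuming the view's rows are available as (venue, role, day, person_id) tuples and that weeks are ISO weeks (Monday to Sunday); `weekly_max` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def weekly_max(records):
    """records: (venue, role, day, person_id) tuples, one per person/role/venue/day.

    Stage 1: count records per (venue, role, day), i.e. the existing pivot cells.
    Stage 2: per (venue, role, ISO year/week), keep the max of those daily counts.
    """
    daily = Counter((venue, role, day) for venue, role, day, _ in records)
    result = defaultdict(int)
    for (venue, role, day), count in daily.items():
        week = tuple(day.isocalendar()[:2])  # (ISO year, ISO week number)
        key = (venue, role, week)
        result[key] = max(result[key], count)
    return dict(result)


# Rebuild the "Old" example: Mon..Sun daily counts 1, 1, 2, 1, 2, 3, 1.
monday = date(2023, 1, 2)  # a Monday, so the seven days form one ISO week
counts = [1, 1, 2, 1, 2, 3, 1]
records = [("X", "Y", monday + timedelta(days=i), f"p{j}")
           for i, n in enumerate(counts) for j in range(n)]
print(weekly_max(records))  # {('X', 'Y', (2023, 1)): 3}
```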
I'm not at my PC, but the steps below should be broadly correct:
You should be able to right-click the date field in the pivot table and select Group.
Then select Days on its own and set the number of days to 7 to get weekly groups (the Grouping dialog has no Week option).
Lastly, right-click the count data you already have, expand Summarize Values By, and select Max.
