Combine multiple rows into single row in Pandas Dataframe - python-3.x

I have a child table. Here is the sample data:
+----+------+----------+----------------+--------+---------+
| ID | Name | City     | Email          | Phone  | Country |
+----+------+----------+----------------+--------+---------+
| 1  | Ted  | Chicago  | abc#gmail.com  | 132321 | USA     |
| 1  | Josh | Richmond | abc#gmail.com  | 435324 | USA     |
| 2  | John | Seattle  | 123#gmail.com  | 322421 | USA     |
| 2  | John | Berkley  | 4723#gmail.com | 322421 | USA     |
| 2  | Mike | Seattle  | 4723#gmail.com | 322421 | USA     |
+----+------+----------+----------------+--------+---------+
The rows for each ID need to be combined into a single row, keeping only the unique values in each column:
+----+---------------+----------------------+----------------------------------+-------------------+---------+
| ID | Name          | City                 | Email                            | Phone             | Country |
+----+---------------+----------------------+----------------------------------+-------------------+---------+
| 1  | 'Ted','Josh'  | 'Chicago','Richmond' | 'abc#gmail.com'                  | '132321','435324' | 'USA'   |
| 2  | 'John','Mike' | 'Seattle','Berkley'  | '123#gmail.com','4723#gmail.com' | '322421'          | 'USA'   |
+----+---------------+----------------------+----------------------------------+-------------------+---------+

If ordering is important, use GroupBy.agg with a lambda function and remove duplicates with a dictionary (dict.fromkeys keeps the first occurrence of each value, in order):
df1 = df.groupby('ID').agg(lambda x: ','.join(dict.fromkeys(x.astype(str)))).reset_index()
# another alternative, but it may be slower on large data
# df1 = df.groupby('ID').agg(lambda x: ','.join(x.astype(str).unique())).reset_index()
print(df1)
ID Name City Email \
0 1 Ted,Josh Chicago,Richmond abc#gmail.com
1 2 John,Mike Seattle,Berkley 123#gmail.com,4723#gmail.com
Phone Country
0 132321,435324 USA
1 322421 USA
If ordering is not important, use a similar solution that removes duplicates with a set (the order of the joined values is then arbitrary and can change between runs):
df2 = df.groupby('ID').agg(lambda x: ','.join(set(x.astype(str)))).reset_index()
print (df2)
ID Name City Email \
0 1 Josh,Ted Richmond,Chicago abc#gmail.com
1 2 John,Mike Berkley,Seattle 4723#gmail.com,123#gmail.com
Phone Country
0 435324,132321 USA
1 322421 USA
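Both answers rely on plain Python deduplication inside the lambda. As a minimal pure-Python sketch of the order-preserving variant (no pandas involved), assuming the sample table is held as a list of tuples, the combine-by-ID step looks like this; `combine_rows` is a hypothetical helper, not a pandas API:

```python
from collections import defaultdict

COLUMNS = ["ID", "Name", "City", "Email", "Phone", "Country"]

# Sample table from the question, one tuple per row.
ROWS = [
    (1, "Ted", "Chicago", "abc#gmail.com", 132321, "USA"),
    (1, "Josh", "Richmond", "abc#gmail.com", 435324, "USA"),
    (2, "John", "Seattle", "123#gmail.com", 322421, "USA"),
    (2, "John", "Berkley", "4723#gmail.com", 322421, "USA"),
    (2, "Mike", "Seattle", "4723#gmail.com", 322421, "USA"),
]


def combine_rows(rows):
    """Hypothetical helper: merge rows sharing an ID into one row.

    Each non-ID column becomes a comma-joined string of its unique values,
    in order of first appearance (dict.fromkeys keeps insertion order).
    """
    groups = defaultdict(list)  # ID -> list of the remaining column values
    for row in rows:
        groups[row[0]].append(row[1:])

    combined = []
    for key, members in groups.items():
        merged = [key]
        for column_values in zip(*members):  # transpose: one tuple per column
            unique = dict.fromkeys(str(v) for v in column_values)
            merged.append(",".join(unique))
        combined.append(tuple(merged))
    return combined


for row in combine_rows(ROWS):
    print(dict(zip(COLUMNS, row)))
```

Swapping `dict.fromkeys(...)` for `set(...)` gives the unordered variant.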

Related

Show text as value Power Pivot using DAX formula

Is there a way to use a DAX measure to create a column that contains text values instead of the numeric sum/count that Power Pivot produces automatically?
In the example below, first_name appears as an aggregated value in the first table, instead of showing the actual name as in the second.
Data table:
+----+------------+------------+---------------+-------+-------+
| id | first_name | last_name  | currency      | Sales | Stock |
+----+------------+------------+---------------+-------+-------+
| 1  | Giovanna   | Christon   | Peso          | 10    | 12    |
| 2  | Roderich   | MacMorland | Peso          | 8     | 10    |
| 3  | Bond       | Arkcoll    | Yuan Renminbi | 4     | 6     |
| 1  | Giovanna   | Christon   | Peso          | 11    | 13    |
| 2  | Roderich   | MacMorland | Peso          | 9     | 11    |
| 3  | Bond       | Arkcoll    | Yuan Renminbi | 5     | 7     |
| 1  | Giovanna   | Christon   | Peso          | 15    | 17    |
| 2  | Roderich   | MacMorland | Peso          | 10    | 12    |
| 3  | Bond       | Arkcoll    | Yuan Renminbi | 6     | 8     |
| 1  | Giovanna   | Christon   | Peso          | 17    | 19    |
| 2  | Roderich   | MacMorland | Peso          | 11    | 13    |
| 3  | Bond       | Arkcoll    | Yuan Renminbi | 7     | 9     |
+----+------------+------------+---------------+-------+-------+
No DAX is needed: put the first_name field on Rows instead of Values, and set the Report Layout to Show in Tabular Form, like this:
After some searching, I found four ways:
Measure 1 (returns BLANK if there is more than one distinct value):
=IF(COUNTROWS(VALUES(Table1[first_name])) > 1, BLANK(), VALUES(Table1[first_name]))
Measure 2 (also returns BLANK if there is more than one distinct value):
=CALCULATE(
    VALUES(Table1[first_name]),
    FILTER(Table1,
        COUNTROWS(VALUES(Table1[first_name])) = 1))
Measure 3 (shows every text value, including repeats; thanks to Rory):
=CONCATENATEX(Table1,[first_name]," ")
For very large datasets, this variant, which concatenates only the distinct values, seems to work better:
=CALCULATE(CONCATENATEX(VALUES(Table1[first_name]),Table1[first_name]," "))
Results:
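To make the difference between these measures concrete, here is a rough Python model of what each one returns for the rows visible in a single pivot cell. It is only an analogy under stated assumptions: the rows in the current filter context are modelled as a plain list of names, BLANK() as `None`, and the function names are hypothetical. Note that DAX does not guarantee the order in which CONCATENATEX visits rows unless you pass its optional order-by arguments, whereas this sketch keeps first-seen order.

```python
def single_value_or_blank(names):
    """Measures 1 and 2: the name if exactly one distinct value is visible, else None (BLANK)."""
    distinct = set(names)
    return next(iter(distinct)) if len(distinct) == 1 else None


def concat_all(names, sep=" "):
    """Measure 3: CONCATENATEX over every row, duplicates included."""
    return sep.join(names)


def concat_distinct(names, sep=" "):
    """Large-dataset variant: CONCATENATEX over VALUES(), i.e. distinct names only."""
    return sep.join(dict.fromkeys(names))


# Rows visible in the pivot cell for id = 1 (four sales rows for Giovanna).
id1 = ["Giovanna"] * 4
# Rows visible in a cell covering ids 1 and 2 (example subset).
mixed = ["Giovanna", "Roderich", "Giovanna", "Roderich"]

print(single_value_or_blank(id1))    # Giovanna
print(single_value_or_blank(mixed))  # None
print(concat_all(id1))               # Giovanna Giovanna Giovanna Giovanna
print(concat_distinct(mixed))        # Giovanna Roderich
```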

How can I grab a value from another sheet based on an ID

Customer
| Cus ID | Cus Name |
| 1      | Bob      |
| 2      | Sam      |
| 3      | Tom      |
Transaction
| TID | Cus ID | Cus Name |
| 1   | 1      | ???      |
| 2   | 2      | ???      |
| 3   | 1      | ???      |
| 4   | 3      | ???      |
Essentially, how can I get the Cus Name from the Cus ID in the Transaction sheet?
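This is a standard VLOOKUP with an exact match (the final 0 means FALSE). Assuming the headers are in row 1, put this in cell C2 of the Transaction sheet:
=VLOOKUP(B2,Customer!A:B,2,0)
Fill the formula down through the used range to populate the column.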
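With 0 as the last argument, VLOOKUP searches the first column of Customer!A:B for an exact match of the Cus ID and returns the value from the second column, or #N/A if the ID is not found. As a rough analogy rather than a model of Excel itself, the same lookup can be sketched in Python with a dictionary keyed by Cus ID, using the sheet data from the question; `lookup_name` is a hypothetical helper, and IDs are assumed to be unique.

```python
# Customer sheet: Cus ID -> Cus Name (IDs assumed unique).
customers = {1: "Bob", 2: "Sam", 3: "Tom"}

# Transaction sheet rows: (TID, Cus ID).
transactions = [(1, 1), (2, 2), (3, 1), (4, 3)]


def lookup_name(cus_id, table, not_found="#N/A"):
    """Exact-match lookup, analogous to VLOOKUP(..., 0); returns not_found if missing."""
    return table.get(cus_id, not_found)


# Equivalent of filling the formula down the Cus Name column.
filled = [(tid, cus_id, lookup_name(cus_id, customers)) for tid, cus_id in transactions]
for row in filled:
    print(row)
```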

PowerPivot Grouped Average DAX

I'm trying to model some outbound calling data in PowerPivot. We have reps across multiple locations, and in general we break down our outbound calling into two periods of the day (before and after 12pm).
Our phone system can export a list of every call made in a day; for example:
+------------+-------------+-------+-----------+-------------+
| Date       | Call Length | Agent | Workgroup | Call Period |
+------------+-------------+-------+-----------+-------------+
| 01.01.2016 | 00:05:26    | Sam   | Sydney    | 1           |
| 01.01.2016 | 00:15:05    | Sam   | Sydney    | 1           |
| 01.01.2016 | 00:55:22    | John  | Sydney    | 2           |
| 01.01.2016 | 00:45:11    | Sam   | Sydney    | 2           |
| 01.01.2016 | 00:04:52    | John  | Sydney    | 1           |
| 01.01.2016 | 00:01:52    | Timmy | London    | 1           |
| 01.01.2016 | 00:02:21    | Timmy | London    | 2           |
| 01.01.2016 | 00:05:21    | Karen | London    | 1           |
| 02.01.2016 | 00:15:21    | Sam   | Sydney    | 1           |
| 02.01.2016 | 00:42:44    | Sam   | Sydney    | 2           |
| 02.01.2016 | 01:52:22    | John  | Sydney    | 1           |
| 02.01.2016 | 00:53:24    | John  | Sydney    | 1           |
| 02.01.2016 | 00:05:53    | Kerry | Sydney    | 2           |
| 02.01.2016 | 00:43:43    | Sam   | Sydney    | 2           |
| 02.01.2016 | 01:08:00    | John  | Sydney    | 2           |
| 02.01.2016 | 00:13:52    | Timmy | London    | 2           |
| 02.01.2016 | 00:25:44    | Timmy | London    | 1           |
| 02.01.2016 | 02:58:31    | Karen | London    | 1           |
| 02.01.2016 | 00:08:37    | Timmy | London    | 2           |
| 02.01.2016 | 00:12:28    | Karen | London    | 2           |
+------------+-------------+-------+-----------+-------------+
What I'm trying to calculate is the average daily time spent on the phone per workgroup, i.e. on average, how long each agent is on the phone at each location.
I'm guessing the arithmetic is as follows:
Measure 1: total talk time for each agent (i.e. the sum of all of that agent's talk time for the day)
Measure 2: average agent total talk time per workgroup (i.e. the sum of Measure 1 over the workgroup, divided by the number of agents in that workgroup)
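The two-step arithmetic described above (total talk time per agent per day, then the mean of those totals across the agents in each workgroup) can be checked outside PowerPivot. Below is a minimal Python sketch over the sample export, assuming durations in hh:mm:ss; Call Period is omitted because it does not affect this calculation, and `parse_hms`, `to_hms` and `average_per_agent` are hypothetical helpers.

```python
from collections import defaultdict

# (Date, Call Length, Agent, Workgroup) from the sample export.
CALLS = [
    ("01.01.2016", "00:05:26", "Sam", "Sydney"),
    ("01.01.2016", "00:15:05", "Sam", "Sydney"),
    ("01.01.2016", "00:55:22", "John", "Sydney"),
    ("01.01.2016", "00:45:11", "Sam", "Sydney"),
    ("01.01.2016", "00:04:52", "John", "Sydney"),
    ("01.01.2016", "00:01:52", "Timmy", "London"),
    ("01.01.2016", "00:02:21", "Timmy", "London"),
    ("01.01.2016", "00:05:21", "Karen", "London"),
    ("02.01.2016", "00:15:21", "Sam", "Sydney"),
    ("02.01.2016", "00:42:44", "Sam", "Sydney"),
    ("02.01.2016", "01:52:22", "John", "Sydney"),
    ("02.01.2016", "00:53:24", "John", "Sydney"),
    ("02.01.2016", "00:05:53", "Kerry", "Sydney"),
    ("02.01.2016", "00:43:43", "Sam", "Sydney"),
    ("02.01.2016", "01:08:00", "John", "Sydney"),
    ("02.01.2016", "00:13:52", "Timmy", "London"),
    ("02.01.2016", "00:25:44", "Timmy", "London"),
    ("02.01.2016", "02:58:31", "Karen", "London"),
    ("02.01.2016", "00:08:37", "Timmy", "London"),
    ("02.01.2016", "00:12:28", "Karen", "London"),
]


def parse_hms(text):
    """Convert 'hh:mm:ss' to seconds."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert seconds to 'hh:mm:ss', rounded to whole seconds."""
    h, rest = divmod(round(seconds), 3600)
    m, s = divmod(rest, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def average_per_agent(calls):
    # Measure 1: total talk time per (date, workgroup, agent).
    per_agent = defaultdict(int)
    for date, length, agent, workgroup in calls:
        per_agent[(date, workgroup, agent)] += parse_hms(length)

    # Measure 2: mean of those agent totals per (date, workgroup).
    totals = defaultdict(list)
    for (date, workgroup, _agent), seconds in per_agent.items():
        totals[(date, workgroup)].append(seconds)
    return {key: sum(v) / len(v) for key, v in totals.items()}


for (date, workgroup), avg in average_per_agent(CALLS).items():
    print(date, workgroup, to_hms(avg))
```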
The output might look something like this (but doesn't have to be):
+------------+-----------+-----------------------+-----------------+-----------------------------+
| Date       | Workgroup | Total Number of Calls | Total Talk Time | Average Talk Time per Agent |
+------------+-----------+-----------------------+-----------------+-----------------------------+
| 01.01.2016 | Sydney    | 11                    | 03:02:42        | 1:34:53                     |
|            | London    | 4                     | 02:24:51        | 01:13:41                    |
| 02.01.2016 | Sydney    | 5                     | 01:52:05        | 00:56:51                    |
|            | London    | 52                    | 10:11:23        | 03:51:11                    |
+------------+-----------+-----------------------+-----------------+-----------------------------+
Apologies if I'm unclear in what I'm asking.
Slicing the data in a pivot table does the grouping, so you only need the following measures:
DurationOfCall := SUM(MyTable[Call Length])
NrOfCalls := COUNTROWS(MyTable)
AvgDuration := DIVIDE([DurationOfCall], [NrOfCalls])
This gives the following result on your sample dataset:
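Note that AvgDuration, as defined, is the average length of a single call (total talk time divided by the number of calls), not the average per agent that the question asks for; for the latter the denominator would be the number of distinct agents, for example DISTINCTCOUNT(MyTable[Agent]). The Python sketch below, which uses only the 01.01.2016 rows of the sample and a hypothetical `pivot_measures` helper, computes what the three measures would show in each date/workgroup cell, with the per-agent figure alongside for comparison.

```python
from collections import defaultdict

# 01.01.2016 rows from the sample: (Date, Call Length, Agent, Workgroup)
CALLS = [
    ("01.01.2016", "00:05:26", "Sam", "Sydney"),
    ("01.01.2016", "00:15:05", "Sam", "Sydney"),
    ("01.01.2016", "00:55:22", "John", "Sydney"),
    ("01.01.2016", "00:45:11", "Sam", "Sydney"),
    ("01.01.2016", "00:04:52", "John", "Sydney"),
    ("01.01.2016", "00:01:52", "Timmy", "London"),
    ("01.01.2016", "00:02:21", "Timmy", "London"),
    ("01.01.2016", "00:05:21", "Karen", "London"),
]


def seconds(hms):
    """Convert 'hh:mm:ss' to seconds."""
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s


def pivot_measures(calls):
    """Per (date, workgroup) cell: the three measures plus an average per distinct agent."""
    cells = defaultdict(lambda: {"duration": 0, "calls": 0, "agents": set()})
    for date, length, agent, workgroup in calls:
        cell = cells[(date, workgroup)]
        cell["duration"] += seconds(length)  # DurationOfCall := SUM(...)
        cell["calls"] += 1                   # NrOfCalls := COUNTROWS(...)
        cell["agents"].add(agent)
    result = {}
    for key, cell in cells.items():
        result[key] = {
            "DurationOfCall": cell["duration"],
            "NrOfCalls": cell["calls"],
            "AvgDuration": cell["duration"] / cell["calls"],         # per call
            "AvgPerAgent": cell["duration"] / len(cell["agents"]),   # per distinct agent
        }
    return result


for key, measures in pivot_measures(CALLS).items():
    print(key, measures)
```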

How to pivot row data using Informatica?

How can I pivot row data using Informatica PowerCenter Designer? Say I have a source file called address.txt:
+---------+--------------+-----------------+
| ADDR_ID | NAME         | ADDRESS         |
+---------+--------------+-----------------+
| 1       | John Smith   | JohnsAddress1   |
| 1       | John Smith   | JohnsAddress2   |
| 2       | Adrian Smith | AdriansAddress1 |
| 2       | Adrian Smith | AdriansAddress2 |
+---------+--------------+-----------------+
I would like to Pivot this data like this:
+---------+--------------+-----------------+-----------------+
| ADDR_ID | NAME         | ADDRESS1        | ADDRESS2        |
+---------+--------------+-----------------+-----------------+
| 1       | John Smith   | JohnsAddress1   | JohnsAddress2   |
| 2       | Adrian Smith | AdriansAddress1 | AdriansAddress2 |
+---------+--------------+-----------------+-----------------+
How can I do this in Informatica?
If every person has exactly two addresses, you can use the FIRST and LAST functions in an Aggregator transformation grouped by ADDR_ID: FIRST(ADDRESS) gives ADDRESS1 and LAST(ADDRESS) gives ADDRESS2.
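Conceptually, the Aggregator groups the rows by ADDR_ID, and within each group FIRST and LAST pick the address from the first and last row it receives, so the result depends on the input order and only fits the case of exactly two addresses per person. A rough Python analogue of that grouping, assuming the rows arrive ordered as in address.txt (`pivot_two_addresses` is a hypothetical name, not an Informatica API):

```python
# Rows from address.txt: (ADDR_ID, NAME, ADDRESS)
ROWS = [
    (1, "John Smith", "JohnsAddress1"),
    (1, "John Smith", "JohnsAddress2"),
    (2, "Adrian Smith", "AdriansAddress1"),
    (2, "Adrian Smith", "AdriansAddress2"),
]


def pivot_two_addresses(rows):
    """Group by ADDR_ID; ADDRESS1 = first address seen, ADDRESS2 = last address seen."""
    groups = {}  # dicts keep insertion order, so groups stay in input order
    for addr_id, name, address in rows:
        if addr_id not in groups:
            groups[addr_id] = [name, address, address]  # NAME, FIRST, LAST
        else:
            groups[addr_id][2] = address  # LAST is overwritten by each later row
    return [(addr_id, *values) for addr_id, values in groups.items()]


for row in pivot_two_addresses(ROWS):
    print(row)
```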

Create columns from column values in Excel

I have data in Excel:
+-------+----------+--------+
| Name  | Category | Number |
+-------+----------+--------+
| Alex  | Portret  | 3      |
| Alex  | Other    | 2      |
| Serge | Animals  | 1      |
| Serge | Portret  | 4      |
+-------+----------+--------+
And I want to transform it to:
+-----------+-----------+-------+---------+
| Name      | Portret   | Other | Animals |
+-----------+-----------+-------+---------+
| Alex      | 3         | 2     | 0       |
| Serge     | 4         | 0     | 1       |
+-----------+-----------+-------+---------+
How can I do this in MS Excel?
You can use a pivot table for that: put Name in Rows, Category in Columns and Number (summed) in Values.
Take a look at http://office.microsoft.com/en-gb/excel-help/pivottable-reports-101-HA001034632.aspx
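As a sketch of what such a pivot table computes (not of Excel itself), here is the same long-to-wide reshaping in plain Python, with missing Name/Category combinations filled with 0 as in the desired output; `pivot` is a hypothetical helper. In Excel the empty cells show as blank unless the PivotTable's "For empty cells show" option is set to 0, and categories are sorted rather than kept in first-seen order as here.

```python
from collections import defaultdict

# Long data from the question: (Name, Category, Number)
DATA = [
    ("Alex", "Portret", 3),
    ("Alex", "Other", 2),
    ("Serge", "Animals", 1),
    ("Serge", "Portret", 4),
]


def pivot(rows):
    """Sum Number per (Name, Category); return a header and wide rows with 0 for missing cells."""
    names = list(dict.fromkeys(name for name, _, _ in rows))
    categories = list(dict.fromkeys(cat for _, cat, _ in rows))
    sums = defaultdict(int)
    for name, cat, number in rows:
        sums[(name, cat)] += number
    header = ["Name", *categories]
    # .get() does not insert into the defaultdict, so absent pairs just read as 0
    table = [[name, *(sums.get((name, cat), 0) for cat in categories)] for name in names]
    return header, table


header, table = pivot(DATA)
print(header)
for row in table:
    print(row)
```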
