Transposing Rows in OpenRefine - pivot

I am using OpenRefine (openrefine-2.6-rc.2) running on Windows, opened in the Chrome browser (65.033225.181).
I have data in text format (.txt) that I have imported into OpenRefine for cleaning and processing. The data entries reside in rows under one column. I would like to "transpose" (pivot) the items in the rows so they appear in columns.
Following is an example of the current state:
Column 1
Mary Smith
Company Name IBM
Location New York
John Davis
Company Name Lockheed-Martin
Location Los Angeles
Jane Segal
Company Name Microsoft
Location Boston
Ideally, by transposing the entries the result would look like this:
Last Name First Name Company Name Location
Smith Mary IBM New York
Davis John Lockheed Los Angeles
Segal Jane Microsoft Boston
I'm just not sure how to do this in OpenRefine.

When creating your OpenRefine project, make sure that empty rows are not imported.
You can delete them later, but it's a little more complicated (see screencast).
Then:
1° Apply Transpose -> Transpose cells in rows into columns, with a value of 3 (each record spans three rows).
2° Delete the words "Company Name" and "Location" using a Transform with formulas like value.replace('Company Name', '').trim() and value.replace('Location', '').trim().
3° Rename the columns.
Here is a visual tutorial.
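For anyone who wants to see the logic of those three steps outside OpenRefine, here is a minimal Python sketch. It is only an illustration, not what OpenRefine runs internally: the function names are made up, and it assumes every record is exactly three rows and every name is "First Last".

```python
# Sketch of "Transpose cells in rows into columns" with a group size of 3,
# followed by the label clean-up. All names here are hypothetical.

def transpose_rows(cells, group_size=3):
    """Split a flat list of cells into rows of `group_size` columns."""
    return [cells[i:i + group_size] for i in range(0, len(cells), group_size)]

def clean_record(row):
    """Strip the 'Company Name' / 'Location' labels and split the name."""
    name, company, location = row
    first, last = name.split(" ", 1)  # assumption: names look like "First Last"
    return {
        "Last Name": last,
        "First Name": first,
        "Company Name": company.replace("Company Name", "").strip(),
        "Location": location.replace("Location", "").strip(),
    }

cells = [
    "Mary Smith", "Company Name IBM", "Location New York",
    "John Davis", "Company Name Lockheed-Martin", "Location Los Angeles",
    "Jane Segal", "Company Name Microsoft", "Location Boston",
]

records = [clean_record(row) for row in transpose_rows(cells)]
for record in records:
    print(record)
```

The name split is an extra step that the answer above does not cover; in OpenRefine it would need its own column split.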

Related

PowerBI multiple values in same column displayed as table format with no data

I'm trying to create a PowerBI dashboard from a SharePoint list. The problem is that one of the columns is a "People or Group" field that accepts multiple names. PowerBI reads this column as a table, and when expanded it returns empty values even though there is data. I have another "People or Group" column that takes only one name, and it works fine, returning values when expanded.
Sample Data
ColA | ColB | ColC
1 | John Doe | John Doe, Tim Apple
2 | Tim Apple | Tim Apple, Steve Cook
3 | Steve Cook | Tim Apple
In the sample above, ColB works fine for data extraction, but ColC returns empty values.
I've attached a PowerBI screenshot for reference.
I was able to figure this out. I used the FieldValuesAsText column, which returns all the list and table values as regular text without any issues. It covers multiple columns at the same time, which saves a lot of time too.

How to merge data of two excel sheets into the third sheet with some cleansing operations

I have a homework assignment where I have to merge the data from two Excel sheets, performing some cleansing operations with formulas.
Sheet 1:
OrderID | Full Name | Customer Status
1001 | Waqar Hussain | Silver
2002 | Ali Moin | Gold
Sheet 2:
OrderID | First Name | Last Name | Customer Status
A1003 | Junaid | Ali | 2
A2004 | Kamran | Hussain | 1
Sheet 3 (Combined Sheet) - Expected:
OrderID | Full Name | Customer Status
1001 | Waqar Hussain | Silver
2002 | Ali Moin | Gold
1003 | Junaid Ali | Silver
2004 | Kamran Hussain | Gold
There are probably a lot of ways to do this. First make sure the data is clean. If you are already 100% positive the data is clean, you can skip this step; if you aren't sure, it's better to be safe than sorry. For each column, create a new column using the CLEAN and TRIM functions to remove any non-printable characters and extra spaces, something like =TRIM(CLEAN(A2)). Then drag the formula down to fill every row.
After this, in order to merge the data together, we need something to join on. The full name seems to make the most sense. On Sheet 2 we'll write a new formula to join the first name and last name together. The =CONCAT formula should work.
=CONCAT(First Name, " ", Last Name), where First Name and Last Name stand for the cell references (for example B2 and C2). Note the space inside the quotes; it makes the result match the Full Name format from Sheet 1. It looks like we'll also need to strip the letter from the Order ID in Sheet 2. I'm going to assume that all Order IDs there are 5 characters long; if this isn't true, you'll need a different solution. You can use =RIGHT(A2,4), which grabs the rightmost 4 characters of the text string.
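If it helps to see the cleansing logic spelled out, here is a rough Python sketch of the same steps. It is not Excel, just an illustration: the helper names are hypothetical, and the status mapping (1 means Gold, 2 means Silver) is an assumption inferred from the expected Sheet 3 rows in the question.

```python
# Assumed mapping, inferred from the expected combined sheet (2 -> Silver, 1 -> Gold).
STATUS_BY_CODE = {"1": "Gold", "2": "Silver"}

def clean_text(value):
    """Rough equivalent of =TRIM(CLEAN(A2))."""
    # CLEAN: drop the non-printing control characters (codes 0-31).
    printable = "".join(ch for ch in value if ord(ch) >= 32)
    # TRIM: remove leading/trailing spaces and collapse runs of spaces.
    return " ".join(part for part in printable.split(" ") if part)

def normalize_sheet2_row(order_id, first_name, last_name, status_code):
    """Bring one Sheet 2 row into the Sheet 1 layout."""
    order_id = clean_text(order_id)[-4:]  # like =RIGHT(A2,4); assumes 5-character IDs
    full_name = f"{clean_text(first_name)} {clean_text(last_name)}"  # like CONCAT with " "
    status = STATUS_BY_CODE[clean_text(status_code)]
    return (order_id, full_name, status)

sheet1 = [("1001", "Waqar Hussain", "Silver"), ("2002", "Ali Moin", "Gold")]
sheet2 = [("A1003", "Junaid", "Ali", "2"), ("A2004", "Kamran", "Hussain", "1")]

combined = [tuple(clean_text(v) for v in row) for row in sheet1]
combined += [normalize_sheet2_row(*row) for row in sheet2]

for row in combined:
    print(row)
```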
At this point let's create a distinct list. Copy the Full Names from Sheet 1 and paste them onto Sheet 3. Copy the Full Names we created on Sheet 2 and paste VALUES onto Sheet 3, below the names from Sheet 1. Then select all the rows in the column, go to the Data tab, and click "Remove Duplicates". This leaves a distinct list of values.
We can now merge the data together using INDEX MATCH. There are lots of great tutorials on how to use INDEX and MATCH in combination. It's a little long to explain in this thread, but this is a great article explaining how it works. It's worth taking 10 minutes to fully understand it because it is a formula you will use thousands of times throughout your life.
https://www.deskbright.com/excel/using-index-match/
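To give a feel for what the distinct list and the INDEX MATCH lookup are doing, here is a small Python analogue. The function names are made up for illustration, it assumes the Sheet 2 status codes have already been translated to Silver/Gold, and it ignores details such as Excel's case-insensitive matching.

```python
def distinct(values):
    """Keep the first occurrence of each value, like Data > Remove Duplicates."""
    return list(dict.fromkeys(values))

def index_match(lookup_value, lookup_column, return_column):
    """Rough analogue of =INDEX(return_column, MATCH(lookup_value, lookup_column, 0))."""
    try:
        position = lookup_column.index(lookup_value)  # exact match, first hit
    except ValueError:
        return None  # Excel would show #N/A here
    return return_column[position]

names_sheet1 = ["Waqar Hussain", "Ali Moin"]
status_sheet1 = ["Silver", "Gold"]
names_sheet2 = ["Junaid Ali", "Kamran Hussain"]
status_sheet2 = ["Silver", "Gold"]  # assumption: codes 2 and 1 already translated

merged = []
for name in distinct(names_sheet1 + names_sheet2):
    # Look on Sheet 1 first, then fall back to Sheet 2.
    status = index_match(name, names_sheet1, status_sheet1)
    if status is None:
        status = index_match(name, names_sheet2, status_sheet2)
    merged.append((name, status))

for row in merged:
    print(row)
```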
Let me know if I can clarify anything.
Best,
Brett

Cleaning full names into first name, last name, etc columns

I have a CSV file that has a single column of full names that are in different formats. Some include suffixes and initials. There are thousands of records.
I want to break each record apart into separate columns for each part of the full name that exists. The final columns would be:
Title
First Name
Middle Name
Last Name
Suffix
Here is an example of what some of the different names look like:
John Smith
Doe, Jane, MBA
Mrs. Sarah Johnson
Steven P Little
Fredericks, J S, D.D.S.
S Morrison, Dr Oscar
Fred Jones, M.B.A.
T. H. Gallatin
Morris Jr, Gary B.
What is a good way to break those out into separate columns given there is no standard format to the full names?

How to split the data in single column to multiple column in excel?

I have thousands of entries in Excel like this:
A
2.Amber Blackwell
3.5899 Township Road
4.19 Glenford, Ohio 43739
5.Phone:XXXXXXXX
6.
7.
8.Alaska Communication / Robert Muncy
9.600 TELEPHONE AVE
10.ANCHORAGE, ALASKA 99503-6010
11.Phone:XXXXXXX
12.
13.
14.RED74 IT Support
15.1 STRAUBE CENTER BLVD
16.PENNINGTON, NJ 08534-1467
17.Phone:XXXXXXX
18.
19.
20.Guru Adivalli
21.1220 E Oak Street
22.Louisville, KY 40204
23.Phone:XXXXXXX
How can I transform them so that the name is in the 1st column, the address in the 2nd column, the city in the 3rd, and the phone number in the 4th?
Also, how do I remove the text "Phone:" so that only the number (xxxxxx) remains in the 4th column?
Kindly answer each question separately.
Name Address City/State/Zip Phone
Thanks,
Regards,
Ruban
You can copy the content and use the Transpose option from Paste Special. Once pasted, the data will be spread across columns as you described.
For your 2nd query, use Text to Columns from the Data menu, with ":" as the delimiter, to split off the text "Phone". Once the column is split, keep the column you need and delete the other one.
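With thousands of entries, doing this by hand gets tedious, so here is a hedged Python sketch of the same idea: group the lines between blank rows into one record, then drop the "Phone:" label. It assumes the leading "2.", "3." and so on in the question are just row numbers (so they are left out of the sample), that records are separated by blank rows, and that each record has exactly four lines. The function names are hypothetical.

```python
def split_records(lines):
    """Group consecutive non-empty lines into records; blank lines separate records."""
    records, current = [], []
    for line in lines:
        text = line.strip()
        if text:
            current.append(text)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records

def to_row(record):
    """Turn one 4-line record into named columns and drop the 'Phone:' label."""
    name, address, city_state_zip, phone = record  # assumes exactly four lines
    if phone.lower().startswith("phone:"):
        phone = phone[len("phone:"):].strip()
    return {
        "Name": name,
        "Address": address,
        "City/State/Zip": city_state_zip,
        "Phone": phone,
    }

lines = [
    "Alaska Communication / Robert Muncy",
    "600 TELEPHONE AVE",
    "ANCHORAGE, ALASKA 99503-6010",
    "Phone:XXXXXXX",
    "",
    "",
    "RED74 IT Support",
    "1 STRAUBE CENTER BLVD",
    "PENNINGTON, NJ 08534-1467",
    "Phone:XXXXXXX",
]

rows = [to_row(record) for record in split_records(lines)]
for row in rows:
    print(row)
```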

Need Help in Excel Pivot Table

I am working on Excel 2007 and I need help with creating a pivot table.
My Excel sheet looks somewhat like this:
Name Date Team Location
John 2011-05-01 Project NY
John 2010-10-12 Information NY
John 2010-02-04 Development CA
Sam 2011-05-01 Development CA
Sam 2010-01-01 Project NY
Sam 2008-01-01 Programmer NY
Brad 2011-04-03 Project NY
Brad 2009-01-01 Info NY
Brad 2007-01-01 Designer CA
Now, suppose I create a pivot table based on the data above and put a filter on "Date" to see who worked where ("Location") and under what "Team", let's say between "2010-01-01" and "2011-12-31".
Then it will count "John" three times, "Sam" twice and "Brad" once, for a total of 6 employees working during "2010-01-01 to 2011-12-31".
Now I want to remove these duplicates so that once "John" is counted, he won't be counted again, even if he switched to a different "Team" or "Location". That way I can count the total number of employees during "2010-01-01 to 2011-12-31" without any duplicates.
I understand that if I want to edit the pivot table and create unique value to remove these duplicates, I need to add another column. But I need help creating this column.
Could anyone help me out here?
Thanks a lot guys!
Tell me if this would work for you.
1) Sort your spreadsheet by 'Name' first and by 'Date' second, newest date first.
2) Add an extra column called 'Old Position'.
3) Go down the sorted list and, for every name with duplicate rows that you encounter, leave the first occurrence alone, but add an 'X' to the 'Old Position' column for all of the older duplicates.
Now you can filter by keeping only the rows whose 'Old Position' column is not equal to 'X'. This should give you just the most recent position for each employee.
As long as there are not two distinct employees with the exact same name, I think this should work (otherwise try to use an employee ID or something unique to each individual instead of their name).
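Here is a small Python sketch of the steps above, in case it makes the 'Old Position' idea clearer. It is only an illustration with made-up function names. One assumption to flag: it applies the date filter first and then flags duplicates, so someone whose newest row falls outside the range is still counted once.

```python
from datetime import date

rows = [
    ("John", date(2011, 5, 1), "Project", "NY"),
    ("John", date(2010, 10, 12), "Information", "NY"),
    ("John", date(2010, 2, 4), "Development", "CA"),
    ("Sam", date(2011, 5, 1), "Development", "CA"),
    ("Sam", date(2010, 1, 1), "Project", "NY"),
    ("Sam", date(2008, 1, 1), "Programmer", "NY"),
    ("Brad", date(2011, 4, 3), "Project", "NY"),
    ("Brad", date(2009, 1, 1), "Info", "NY"),
    ("Brad", date(2007, 1, 1), "Designer", "CA"),
]

def flag_old_positions(rows, start, end):
    """Return rows in [start, end] with an extra 'Old Position' flag ('X' or '')."""
    in_range = [r for r in rows if start <= r[1] <= end]
    # Sort by name, then newest date first, so the first row per name is the latest.
    in_range.sort(key=lambda r: (r[0], -r[1].toordinal()))
    seen = set()
    flagged = []
    for name, day, team, location in in_range:
        old = "X" if name in seen else ""
        seen.add(name)
        flagged.append((name, day, team, location, old))
    return flagged

flagged = flag_old_positions(rows, date(2010, 1, 1), date(2011, 12, 31))
current = [r for r in flagged if r[4] != "X"]

print(len(flagged))  # rows in the date range, duplicates included
print(len(current))  # distinct employees in the date range
```

With the sample data this gives 6 rows in range and 3 distinct employees, matching the counts in the question.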
Put "Date" in the report filter and "Name" in the row labels, and set the filter for "Location" to "NY". "Location" can be placed in either the report filter or the row labels, depending on how you want to see the data.
