Calculate Overlap in Excel Dataset - excel

I have an Excel file with ~500,000+ rows of data, each of which has (amongst other things) an ID and a certain value I'd like to manipulate. I'll use an example consisting of names and foods. The data looks something like this:
Name Food
Alex Melon
Alex Burger
Bruce Apple
Charlie Water
Alice Apple
Bruce Melon
Bruce Apple
Bruce Plum
I'd like to find the overlap in foods between any pair of names, giving me a result that would tell me (for example) that for the pairing of Bruce vs Alex, 2/3 of Bruce's data is unique and 1/3 is the same as Alex's list, whilst for Alex 1/2 of his data is unique and 1/2 is the same as Bruce's.
There is no consistency in the number of foods a person can have listed alongside their name. And it's entirely possible for some people to have foods not found alongside anyone else.
I thought to present it through something like this, where each percentage sign is the overlap for that pairing (read by row, so C2 would be the proportion of Alex's data also found in Alice's, whilst B3 would be the proportion of Alice's data also found in Alex's):
Alex Alice Bruce Charlie
Alex - % % %
Alice % - % %
Bruce % % - %
Charlie % % % -
I've been struggling to think of or find a formula or VBA script that would achieve this and calculate the overlap. I've considered creating (i) a helper column that concatenates the name and food, and (ii) a new de-duplicated list of foods mapped against the helper column. However, as far as I can tell, whilst that will help me summarise which foods go with which person, it won't help me find out the overlap between each person's list of foods.
Any help would be greatly appreciated!

This was fun!
You can use Power Query for this.
Highlight your data and insert a table, making sure you say the first row is column headers.
Go to the Table ribbon and change the table's name to "PersonTag"
Go to the Data ribbon and in the 'Get & Transform' section, click "From Table"
This opens up Power Query, with a new query called "PersonTag"
Highlight both columns, then on the "Home" ribbon, choose "Remove Rows - Remove Duplicates"
In the "Home" ribbon, click "Manage - Reference"
You've just created a new query that refers to the PersonTag query! Rename it to PersonCount.
Highlight the "Name" column, and in the "Transform" ribbon, click "Group By" and group by name, creating a new column called "PersonCount" that is the count of the rows.
Go back to editing the PersonTag query
Create a new query (or copy an existing one, it doesn't matter how), name it "PersonTagPersonTag", and then go to the "Home" ribbon, click "Advanced Editor" and replace whatever's there with the following.
let
Source = PersonTag,
// Self cross join! Now each row contains its own copy of the "PersonTag" table!
#"Added Custom" = Table.AddColumn(Source, "2nd", each PersonTag),
#"Expanded 2nd" = Table.ExpandTableColumn(#"Added Custom", "2nd", {"Name", "Food"}, {"2nd.Name", "2nd.Food"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded 2nd",{{"2nd.Name", type text}, {"2nd.Food", type text}}),
// We only want rows where the foods match and the names don't
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Name] <> [2nd.Name] and [Food] = [2nd.Food]))
in
#"Filtered Rows"
Now we're going to group by Name and 2nd.Name to get the row count of matches in a "PersonPersonCount" column, bring in the PersonCount query we created earlier to get the total foods for each Name, then create a PercentMatch column by dividing PersonPersonCount by PersonCount. Then we can get rid of the Count columns because we don't need them, and pivot by 2nd.Name! Create another query (I named mine "PersonvPerson").
let
Source = PersonTagPersonTag,
#"Grouped Rows" = Table.Group(Source, {"Name", "2nd.Name"}, {{"PersonPersonCount", each Table.RowCount(_), type number}}),
// Bring in PersonCount query
#"Merged Queries" = Table.NestedJoin(#"Grouped Rows",{"Name"},PersonCount,{"Name"},"PersonCount",JoinKind.LeftOuter),
// If you click the column type icon in the column title in the previous step, you get the dialog box you can fill out that does this step for you
#"Expanded PersonCount" = Table.ExpandTableColumn(#"Merged Queries", "PersonCount", {"PersonCount"}, {"PersonCount"}),
#"Added PercentMatch" = Table.AddColumn(#"Expanded PersonCount", "PercentMatch", each [PersonPersonCount] / [PersonCount]),
#"Changed Type" = Table.TransformColumnTypes(#"Added PercentMatch",{{"PercentMatch", Percentage.Type}}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Name", "2nd.Name", "PercentMatch"}),
#"Sorted Rows" = Table.Sort(#"Removed Other Columns",{{"2nd.Name", Order.Ascending}}),
// I did this by highlighting the "2nd.Name" column, going to the "Transform" ribbon, and clicking "Pivot Column"
#"Pivoted Column" = Table.Pivot(#"Sorted Rows", List.Distinct(#"Sorted Rows"[#"2nd.Name"]), "2nd.Name", "PercentMatch", List.Sum)
in
#"Pivoted Column"
Exit the Power Query window and keep your changes. By default, creating new queries automatically creates new tabs in the workbook that contain the data. Delete the tabs you don't want to keep.
If you're anything like I was a few months ago, your jaw is dropping at what you can do with power queries. I gave you the code because I was too lazy to tell you every little click to create the code, but don't be overwhelmed!!! I just clicked around to create each next step and it created the code for me! They made it easy to click around and try/undo things.
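For readers who want to sanity-check the numbers outside Excel, here is a minimal Python sketch of the same calculation, using the sample rows from the question. It is not the Power Query solution above, just the same logic with sets; the function name `overlap_matrix` is made up for illustration. One difference to be aware of: pairs with no shared food show up as 0.0 here, whereas the pivoted query leaves them empty.

```python
from collections import defaultdict

# Sample rows from the question (Name, Food), including the duplicate Bruce/Apple row.
rows = [
    ("Alex", "Melon"), ("Alex", "Burger"), ("Bruce", "Apple"),
    ("Charlie", "Water"), ("Alice", "Apple"), ("Bruce", "Melon"),
    ("Bruce", "Apple"), ("Bruce", "Plum"),
]


def overlap_matrix(pairs):
    """Return {name: {other: share of name's distinct foods also listed for other}}."""
    foods = defaultdict(set)  # a set drops duplicate (name, food) rows
    for name, food in pairs:
        foods[name].add(food)
    names = sorted(foods)
    return {
        a: {b: len(foods[a] & foods[b]) / len(foods[a]) for b in names if b != a}
        for a in names
    }


matrix = overlap_matrix(rows)
print(matrix["Alex"]["Bruce"])   # share of Alex's foods also found in Bruce's list
print(matrix["Bruce"]["Alex"])   # share of Bruce's foods also found in Alex's list
```

Like the self-join in the query, this compares every name with every other name, so the cost grows with the square of the number of distinct names.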

Related

A Frustrating Set of Variables in Excel

My industry (aftermarket auto components) utilizes a data standard for digital distribution, and I am currently attempting to create a living reference document, formatted with the correct information in the correct way, to make updating our standard database a less time consuming process.
My company has a 'Master Data Sheet' which contains every piece of data for all of the 20k+ products that we sell. All of our pricing and tracking sheets call cells or ranges from the Master Sheet, in addition to most of our front-facing web presence.
Here's my problem. The standard requires that our marketing descriptions be broken into separate lines with a specific identifier code and grouped by item ID:
Item ID Desc Code Desc
CHD001A AAA Brake Kit
CHD001A BAA Cross-drilled...
CHD001A BAA All of our...
CAE221B AAA Replacement Part
CAE221B BAA Reinforced with...
Our Master Data sheet has a different structure:
Item ID Desc - AAA Desc - BAA Desc - BAA
CHD001A Brake Kit Cross drilled... All of our...
CAE221B Replacement Part Reinforced with...
I'm completely stuck on how to get the right info into the right slots. I CANNOT alter the structure of the Master Sheet or I will have to remap at least thirty other spreadsheets. A VLOOKUP won't work in the horizontal way it needs to, and IF statements will get 20 nests in and then lack a good way to group things. Please help.
Assuming that your task is to find the description of item CHD001A in the master db, and that you know the description code, you can use INDEX/MATCH. Here's the setup I used for developing the formula.
I created a simulation of your master in A1:D4 (one row more than the example in your question).
I assigned G2 as the cell where I would enter the Item ID and G3 to enter the Desc Code.
Now the formula =IFERROR(MATCH(G2,A1:A4,0), 1) finds the sheet row number by Item ID and =IFERROR(MATCH("Desc - " & G3,A1:D1,0),1) finds the sheet column number by Desc code. Note that both formulas default to 1 if not found.
Now the formula below will return the description.
=INDEX(A1:D4,IFERROR(MATCH(G2,A1:A4,0), 1),IFERROR(MATCH("Desc - " & G3,A1:D1,0),1))
Observe that the db range A1:D4 includes the captions and both range A1:A4 and A1:D1 start from the extreme top or left. This enables a column or row caption to be displayed in case of error (when an Item ID or Desc Code isn't found).
The formula isn't perfect yet but the method is. I take it that you will be able to tweak it to optimize adaptation to your needs. Let me know if you need help or advice with that.
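To make the fallback behaviour of the INDEX/MATCH formula easier to see, here is a rough Python equivalent. It is only an illustration of the logic, not something to paste into Excel; the `master` grid copies the question's sample and the helper names `position` and `lookup` are invented. Positions are zero-based here, so "default to 1" in the formula becomes "default to 0".

```python
# Simulated master range, captions included (as in A1:D3 of the question's sample).
master = [
    ["Item ID", "Desc - AAA", "Desc - BAA", "Desc - BAA"],
    ["CHD001A", "Brake Kit", "Cross drilled...", "All of our..."],
    ["CAE221B", "Replacement Part", "Reinforced with...", ""],
]


def position(values, target):
    """Zero-based analogue of IFERROR(MATCH(target, values, 0), 1)."""
    try:
        return values.index(target)  # first exact match, like MATCH(..., 0)
    except ValueError:
        return 0                     # fall back to the caption row/column


def lookup(grid, item_id, desc_code):
    """Analogue of INDEX(grid, row match, column match)."""
    row = position([r[0] for r in grid], item_id)
    col = position(grid[0], "Desc - " + desc_code)
    return grid[row][col]


print(lookup(master, "CHD001A", "AAA"))  # a normal hit
print(lookup(master, "NOPE", "AAA"))     # unknown Item ID: returns the column caption
```

As with MATCH, a repeated caption such as the second "Desc - BAA" is never reached, because only the first match is returned.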
Very simple to do with Power Query, available in Excel 2010+
Select some cell in the Master Table
If this is not a real Table, it will be changed into one
Data / Get&Transform / From Table/Range
In the PQ editor window that opens, select the Item Id column
Unpivot other columns
Split the resultant Attribute column by "Transition from non-digit to digit"
This will get rid of the automatically created suffixes caused by creating a table with initially identical column headers
If there are digits in the code itself, you'll need to remove the terminal digit(s) using a custom column with a formula.
The best way to do that would depend on the actual structure of your values
Delete the Attribute.2 column (the one with the terminal digits)
Rename the columns appropriately
Here is the generated M Code.
You can just paste this into the Advanced Editor of PQ. If you do, be sure to change the Table Name in Line 2 to whatever your real table name for the Master Data turns out to be.
let
Source = Excel.CurrentWorkbook(){[Name="Master"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Item ID", type text}, {"Desc - AAA", type text}, {"Desc - BAA", type text}, {"Desc - BAA2", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Item ID"}, "Attribute", "Value"),
#"Split Column by Character Transition" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"Attribute.1", "Attribute.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Split Column by Character Transition",{"Attribute.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Attribute.1", "Desc Code"}, {"Value", "Desc"}}),
#"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([Desc] <> ""))
in
#"Filtered Rows"
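The unpivot idea can also be sketched in plain Python, which may help when checking the query's output on a small sample. This is an assumption-laden illustration: the header and rows copy the question's example (with the "BAA2" suffix Excel adds to a duplicate table header), and the function name `unpivot` is made up. Unlike the M code above, the sketch also strips the "Desc - " prefix so the code column holds just "AAA" or "BAA".

```python
import re

# Master sheet as a table; Excel renames the duplicate header to "Desc - BAA2".
header = ["Item ID", "Desc - AAA", "Desc - BAA", "Desc - BAA2"]
records = [
    ["CHD001A", "Brake Kit", "Cross drilled...", "All of our..."],
    ["CAE221B", "Replacement Part", "Reinforced with...", ""],
]

PREFIX = "Desc - "


def unpivot(header, records):
    """Turn one wide row per item into one (Item ID, Desc Code, Desc) row per description."""
    out = []
    for record in records:
        item_id = record[0]
        for column, value in zip(header[1:], record[1:]):
            if value in ("", None):
                continue                        # skip empty description cells
            code = re.sub(r"\d+$", "", column)  # drop the auto-added numeric suffix
            if code.startswith(PREFIX):
                code = code[len(PREFIX):]       # extra step, not in the M code above
            out.append((item_id, code, value))
    return out


long_rows = unpivot(header, records)
for row in long_rows:
    print(row)
```

The same caveat as in the answer applies: stripping trailing digits only works if the description codes themselves never end in a digit.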

Power Query / Power BI get look for data from another excel workbook

I am trying to combine worksheets from two different workbooks with Power Query and I have trouble doing that.
I do not want to merge the two workbooks.
I do not want to create relationships or "joins".
However, I want to get very specific information for one workbook which has only one column. The "ID" column.
The ID column has rows with letter tags : AB or BE.
Following these letters, specific numeric ranges are associated.
For both AB and BE, the numbers range first from 0000 to 3000 and then from 3001 to 6000.
I thus have the following possibilities:
From AB0000 to AB3000
From AB3001 to AB6000
From BE0000 to BE3000
From BE3001 to BE6000
Each category matches a specific item in my geography column, from the other workbook:
From AB0000 to AB3000, it is ItalyZ
From AB3001 to AB6000, it is ItalyB
From BE0000 to BE3000, it is UKY
From BE3001 to BE6000, it is UKM
I am thus trying to find the highest number associated to the first AB category, the second AB category, the first BE category, and the second.
I then want to "bring" this number in the other query and increment it each time that matching country is found in the other workbook.
For example :
AB356 is the highest number in the first workbook.
Once the first "ItalyB" is found, the column beside it writes "AB357".
Once the second "ItalyB" is found, the column beside it writes "AB358".
Here is the one columned worksheet:
Here is the other worksheet with the various countries in geography:
Here is an example of results:
I think that this is something which I should work towards:
I added the index column, with a start as one, because each row (even row zero) should increment either of the four matching code.
In order to keep moving forward I have also been trying to create some sort of mapping in third excel sheet, that I imported in Power BI, but I am not sure that this is a good way forward:
I have the following result when I create a blank query:
After a correction, I still get this result when creating the blank query:
This is not an easy answer, as there are many steps to get to your result. I have chosen m-query because of the complexity.
In PBi click on Transform data, now you are in m-query.
The table with the IDs (I called it "HighestID") needs expansion, because we need to be able to map on prefix.
You need a mapping table ("GeoMapping"), else there is no relation between the prefixes and the geolocation.
We need the new ID on the Geo table (which I called "Geo").
Expand the HighestID table.
Click on the table and open the Advanced Editor, look at your code and compare it to the one below, the last 2 steps are essential, there I add two columns (Prefix and Number) which we need later.
let
Source = Csv.Document(File.Contents("...\HighestID.csv"),[Delimiter=";", Columns=1, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Prefix", each Text.Middle([ID],0,2), type text),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Number", each Number.FromText(Text.Middle([ID],2,5)))
in
#"Added Custom1"
Result:
Create mapping table
Click right button under your last table and click Blank Query:
Paste the source below and ensure the name of the merged table equals the name of your table. As I mentioned, I called it HighestID.
let
Source = #table({"Prefix", "Seq_Start", "Seq_End","GeoLocation"},{{"AB",0,2999,"ItalyZ"},{"AB",3000,6000,"ItalyB"},{"BE",0,2999,"UKY"},{"BE",3000,6000,"UKM"}}),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Seq_Start", Int64.Type}, {"Seq_End", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Prefix"}, HighestID, {"Prefix"}, "HighestID", JoinKind.LeftOuter),
#"Expanded HighestID" = Table.ExpandTableColumn(#"Merged Queries", "HighestID", {"Number"}, {"Number"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded HighestID", each [Number] >= [Seq_Start] and [Number] <= [Seq_End]),
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"Prefix", "Seq_Start", "Seq_End", "GeoLocation"}, {{"NextSeq", each List.Max([Number]) + 1, type number}})
in
#"Grouped Rows"
Result:
Adding the NextSeq Column
This is the hard bit: if I only gave you the code, I am afraid it would not work, so here are the steps you need to do.
1. Select the table, right-click on Geography and click Group By. Select as below:
2. Merge with table GeoMapping as below:
3. Expand the GeoMapping with NextSeq.
4. Add a custom column:
5. Remove the columns not needed, so only the custom column created in step 4 is left.
6. Expand the column (select all). The end result is all the columns you had earlier plus an Index column.
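Since the screenshots for these steps are not reproduced here, the intended end result can be sketched in Python instead. This is a rough illustration under several assumptions, not the Power Query steps themselves: IDs are a two-letter prefix followed by digits, new IDs are zero-padded to four digits, the prefixes are AB and BE as in the question, a range with no existing ID starts at its first number, and the sample data and the name `assign_ids` are invented.

```python
# Mapping of (prefix, first number, last number, geography); BE prefix as in the question.
mapping = [
    ("AB", 0, 2999, "ItalyZ"), ("AB", 3000, 6000, "ItalyB"),
    ("BE", 0, 2999, "UKY"), ("BE", 3000, 6000, "UKM"),
]


def assign_ids(existing_ids, geographies, mapping):
    """Give each geography row the next free ID in its prefix/number range."""
    next_seq = {}
    for prefix, start, end, geo in mapping:
        used = [
            int(i[2:]) for i in existing_ids
            if i[:2] == prefix and start <= int(i[2:]) <= end
        ]
        # Assumption: a range with no existing ID starts at its first number.
        next_seq[geo] = (prefix, max(used) + 1 if used else start)
    out = []
    for geo in geographies:              # an unmapped geography raises KeyError
        prefix, number = next_seq[geo]
        out.append(f"{prefix}{number:04d}")
        next_seq[geo] = (prefix, number + 1)
    return out


# Hypothetical sample data.
existing = ["AB0356", "AB0012", "AB3001", "BE0007"]
geography = ["ItalyZ", "UKY", "ItalyZ", "ItalyB", "UKM"]
new_ids = assign_ids(existing, geography, mapping)
print(new_ids)
```

The per-geography counter plays the role of the Index column in step 6: each time a geography repeats, its number goes up by one from NextSeq.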

How to use Countifs,Or and Sumproduct efficiently

I have a list of accounts with 2 digit modifiers. Some accounts will have more than one modifier. I am looking for accounts with certain combinations of modifiers.
So I have a list of accounts in the B column.
I have the modifiers in C Column
Example
Act # Modifier
111 80
111 56
111
222 55
222
333 51
333 50
333
I have some working code that works great until I get too many rows.
In this sample formula I have 8 Modifier groups.
50,22,51,62
51,22,62
54,50,51
55,50,51
56,50,51
80,50,51
"AS",50,51
59,50
=IF(OR(SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{50,22,51,62}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{51,22,62}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{54,50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{55,50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{56,50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{80,50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{"AS",50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{59,50}))>=2),"Check","")
This code will put check by any account that has 2 or more of the modifiers from any of the 8 groups. It has to be 2 modifiers from the same group though.
I was just wondering if there is a better way to write this? Instead of doing all these or can I just do OR for the different modifier criteria I am looking for?
Something like
=COUNTIFS(H:H,H5,I:I,OR({59,50},{"AS",50,51}))
As requested by @SkysLastChance, I will post my solution using Power Query (PQ) even though the question was tagged to Excel-Formula.
Please note you MUST use Excel 2010 or a later version, otherwise you will not be able to use Power Query. My answer might not be detailed enough for people who have not used PQ before, so feel free to leave a question if you are unclear about any particular step.
Step 1
Convert the Account List and Modifier Group in the example into Table in your excel worksheet. One way of doing that is to highlight the data including headers and press Ctrl+T. Then you should get two tables as shown below. I have named the first table as Tbl_ActList, and named the second one as Tbl_MoGrp.
Please note I have added some data to the Account List table for result testing purpose.
Step 2
Select any cell within a table, go to the Data tab on top of your excel (mine is Excel 2016), click From Table in the Get & Transform section. It will load and add the table to the built-in PQ Editor. You can exit the editor (and keep the changes), and repeat this step to add the second table to the PQ Editor. Alternatively you can add a new query in the PQ Editor and find the second table from your excel worksheet. I will not demonstrate this process as you can google the know-how later on.
Step 3
Once you have added both tables to the editor, you can start editing/transforming data in each table/query using built-in functions and/or advanced coding. In this case I only used built-in functions.
For the Modifier Group table, I want to transform the original data into a 2-Column list with one column showing which Group the modifier belongs to, and the other column showing a single modifier.
Firstly, use the Split Column function in the Transform tab to split the original modifier groups into single value by using , (comma) as the delimiter.
The new table is in matrix structure, which is not ideal for look up purposes, so I used the Unpivot Columns function in the Transform tab to convert it into list structure. What I actually did was highlight the Grp column and select Unpivot Other Columns to get the list. Alternatively you can highlight the first four columns and use Unpivot Columns to get the same list.
Then I renamed the Value column as Modifier, and removed the Attribute column to end up with a 2-Column table.
Please note all data in each table/query in this example have been set to 'Text' format (aka data type). Data type is very sensitive and specific in PQ, and incorrect data type may lead to error.
Here is the full code behind the scene. All steps are performed using the built-in functions without any advanced coding:
let
Source = Excel.CurrentWorkbook(){[Name="Tbl_MoGrp"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Modifier", type text}, {"Grp", type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Modifier", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Modifier.1", "Modifier.2", "Modifier.3", "Modifier.4"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Modifier.1", type text}, {"Modifier.2", type text}, {"Modifier.3", type text}, {"Modifier.4", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Grp"}, "Attribute", "Value"),
#"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Value", "Modifier"}}),
#"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Attribute"})
in
#"Removed Columns"
Step 4
With the Modifier Group list ready, we can look up the modifier group in the Account List table for each modifier using Merge Queries function in the Home tab. The logic is to find the link between two tables to conduct a look up.
Firstly, select/highlight the column (Modifier) that contains the look up value in the origin table (Tbl_ActList), then select the table (Tbl_MoGrp) that you want to look up from, then select/highlight the corresponding column (Modifier) in the second table, and then click OK to continue.
Please note before merging I have filtered the Modifier column in the Account List table to get rid of cells showing null (blank) as they are not useful for the look up.
After merging the queries there is a new column added to the Account List table. It may look like a column but it contains all data from the Modifier Group table stored in Grp column and Modifier column. As we want to look up the modifier group only, we can Expand the column to show the Grp column only.
Click on the little square box on the right hand side of the header of the last column to trigger the Expand function, then select the Grp column only and click OK to continue.
Now we have a table showing account number, modifier, and modifier group. We can then use the Group By function in the Home tab to find out for each account number how many modifiers have appeared in each applicable modifier group.
Please See below screenshot for the settings for the Group By function.
Then I sorted the table ascending by the Act # column, and filtered the Count column to show values greater than or equal to 2, i.e. at least 2 modifiers linked to that account number have appeared in a modifier group.
Here is the full code behind the scene:
let
Source = Excel.CurrentWorkbook(){[Name="Tbl_ActList"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Act #", type text}, {"Modifier", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Modifier] <> null)),
#"Merged Queries" = Table.NestedJoin(#"Filtered Rows", {"Modifier"}, Tbl_MoGrp, {"Modifier"}, "Tbl_Grp", JoinKind.LeftOuter),
#"Expanded Tbl_Grp" = Table.ExpandTableColumn(#"Merged Queries", "Tbl_Grp", {"Grp"}, {"Grp"}),
#"Grouped Rows" = Table.Group(#"Expanded Tbl_Grp", {"Act #", "Grp"}, {{"Count", each Table.RowCount(_), type number}}),
#"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Act #", Order.Ascending}}),
#"Filtered Rows1" = Table.SelectRows(#"Sorted Rows", each [Count] >= 2)
in
#"Filtered Rows1"
Step 5
The answer could stop at Step 4 as the table has shown the account number that we are looking for. However if there are thousands of account numbers, then it is better to Remove Other Columns except the Act # column, and Remove Duplicates within the column, and then Close & Load the result to a new worksheet. The final result may look like this:
A tip here: before you Close & Load any query for the first time, it is better to set the following in your Query Options. It will prevent PQ Editor from loading each of your queries to a separate worksheet by default. Just imagine how long it will take if you have 20 queries in your PQ Editor and each of them has more than a thousand lines of data.
Once you change the default option, PQ Editor will only create connections for your queries after you click Close & Load, and you can manually load a specific query result to a worksheet as shown below:
Conclusion
I believe if this question was tagged as PowerQuery, there may be more concise or 'fancier' answers than mine. Regardless, what I like most about PQ is that it comes with Excel (as an add-in for 2010 and 2013, built in from 2016), and it is scalable, replicable and more powerful when it comes to data cleansing and transforming.
Cheers :)
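The rule being applied (flag an account when at least two of its modifiers fall in the same group) is small enough to sketch in Python, which can be handy for spot-checking the query result. The groups and sample rows come from the question; the function name `accounts_to_check` is invented, and everything is treated as text, matching the note about data types above.

```python
from collections import defaultdict

# The 8 modifier groups from the question, as text.
groups = [
    {"50", "22", "51", "62"}, {"51", "22", "62"}, {"54", "50", "51"},
    {"55", "50", "51"}, {"56", "50", "51"}, {"80", "50", "51"},
    {"AS", "50", "51"}, {"59", "50"},
]

# (Act #, Modifier) rows from the question; "" stands for a blank modifier.
rows = [
    ("111", "80"), ("111", "56"), ("111", ""),
    ("222", "55"), ("222", ""),
    ("333", "51"), ("333", "50"), ("333", ""),
]


def accounts_to_check(rows, groups):
    """Return accounts with 2 or more modifier rows that belong to one group."""
    by_account = defaultdict(list)
    for account, modifier in rows:
        if modifier:                    # drop blanks, like the null filter in Step 4
            by_account[account].append(modifier)
    flagged = set()
    for account, modifiers in by_account.items():
        for group in groups:
            if sum(1 for m in modifiers if m in group) >= 2:
                flagged.add(account)
                break
    return sorted(flagged)


flagged_accounts = accounts_to_check(rows, groups)
print(flagged_accounts)
```

Like COUNTIFS, this counts rows, so a modifier repeated twice on the same account counts twice.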

How do I reconstruct a data-set based on unique ID

Looking for a solution either in excel or IBM SPSS:
I have a dataset with around 95,000 rows. Each row is one response from a participant on a particular question. For example, Row 2 is the response from participant A, on Question 1, where they indicated a score of 2. As pictured.
Ideally I need 1 line of responses per participant as pictured here:
I've tried VLOOKUP and then a macro to delete #N/A and move up the values but memory can't even handle the VLOOKUP, so it's not a viable option.
I feel out of options on what to do, but without laying out my data-set like this, I can't do later analysis (Later I need to average across all participants where Q5 = 80 etc [Q5 is a category code]).
You can do this with a Pivot Table.
Using Power Query (Excel 2010+) (aka Get&Transform in Excel 2016+) gives you a bit more flexibility in, for example, automating the naming of the column Headers.
If you will only ever have five questions, you can use the GUI to just pivot the QuestionNumber column. But if the number of questions might vary from run to run, the code to handle that needs to be done through the Advanced Editor:
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"UserID", type text}, {"QuestionNumber", Int64.Type}, {"Score", Int64.Type}}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Changed Type", {{"QuestionNumber", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Changed Type", {{"QuestionNumber", type text}}, "en-US")[QuestionNumber]), "QuestionNumber", "Score", List.Sum),
Renames = List.Transform(List.Skip(Table.ColumnNames(#"Pivoted Column"),1), each {_, "Q" &_}),
#"New Headers" = Table.RenameColumns(#"Pivoted Column", Renames)
in
#"New Headers"
SPSS ANSWER:
Run this code in a new syntax window:
casestovars /id=userid /index=questionNum /separator="".
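For comparison, the long-to-wide reshaping that both answers perform can be sketched in a few lines of Python. The sample responses are hypothetical (the question's pictures are not reproduced here) and the field order follows the UserID, QuestionNumber, Score columns used in the M code; `to_wide` is an invented name. Note one difference: if a participant answers the same question twice, this sketch keeps the last score, whereas the pivot above sums them.

```python
# Hypothetical long-format responses: (UserID, QuestionNumber, Score).
responses = [
    ("A", 1, 2), ("A", 2, 5),
    ("B", 1, 3), ("B", 3, 4),
]


def to_wide(responses):
    """Return {UserID: {"Q1": score, "Q2": score, ...}} with None for missing answers."""
    questions = sorted({question for _, question, _ in responses})
    wide = {}
    for user, question, score in responses:
        row = wide.setdefault(user, {f"Q{n}": None for n in questions})
        row[f"Q{question}"] = score     # last score wins on duplicates
    return wide


wide = to_wide(responses)
for user, row in wide.items():
    print(user, row)
```

The column list is built from whatever question numbers appear in the data, which mirrors the reason for going through the Advanced Editor when the number of questions varies.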

How can I have the informations from 2 differents columns represented as 1 in a pivot table?

In my data, I have 2 columns that represent a country visited before and a country visited after the cities that I am studying.
Here's a picture of my data sample: https://i.imgur.com/kS4K9uK.png
I'd like to represent in my pivot table all the countries linked to each city (so before and after the city). I'd like to have the cities in my line and all the countries who can possibly be visited before and after as my columns and the count of those in my values.
Here is a picture of what I'd like to achieve, but I can only do it for one of the columns (country after in that case). I'd like the same format but having the data of both before and after (but it's important to know that it's not necessarily the same countries in the 2 columns so I can't just have one of the country columns as the head and both as the values): https://i.imgur.com/PUjhSmB.png
When I place the cities in the line and the 2 country columns in value and columns, it is so difficult to read the table as the before and after are all separate and might even be counted as a pair. and if they are not in the pivot table column they only give me the count of countries before and after but not by the countries, which is not what I'm looking for.
Here is a picture of the result of the pivot table: https://i.imgur.com/3j4BD3k.png
I also tried to create a new field by doing «Country before» + «Country after» but it doesn't seem to work as the data is in text.
OK, I think I understand the output now. You essentially want a count of the number of occurrences of each country in columns B+C, grouped by the city. I'll provide a few ways so you can select what suits you best.
Simplest method
The easiest way I can think of is simply paste the second column under the first column and then pivot on this new table.
COUNTIF
A more repeatable way would be to essentially make your own pivot table and use the COUNTIF function to count the instances of each country.
=COUNTIFS($A$1:$A$6,$F2,$B$1:$B$6,G$1)+COUNTIFS($A$1:$A$6,$F2,$C$1:$C$6,G$1)
Power Query
The most repeatable way is to use Power Query. This will enable you to refresh the data at the click of a button. To do this (assuming you have Excel 2016), go to the Data tab and, with your data selected, click "From Table/Range". The Power Query window will open. On the top left of the screen there is an Advanced Editor button. When you open it you'll see the following code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"City", type text}, {"Country Before", type text}, {"Country After", type text}})
in
#"Changed Type"
Replace the code with the following code. Note that your table may be called something different. You can see what it's called on the second line of the code. The code below uses "Table1" - you can replace this with the name of your table.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"City", type text}, {"Country Before", type text}, {"Country After", type text}}),
#"Before" = Table.RemoveColumns(#"Changed Type",{"Country Before"}),
#"After" = Table.RemoveColumns(#"Changed Type",{"Country After"}),
#"Append" = Table.Combine({#"Before",#"After"}),
#"Inserted Merged Column" = Table.AddColumn(Append, "Country", each Text.Combine({[Country After], [Country Before]}, ""), type text),
#"Removed Columns" = Table.RemoveColumns(#"Inserted Merged Column",{"Country After", "Country Before"})
in
#"Removed Columns"
Hope that helps.
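All three methods boil down to stacking the two country columns and counting (city, country) pairs. A minimal Python sketch of that idea, with made-up sample rows since the question's data is only available as images, and an invented function name `country_counts`:

```python
from collections import Counter

# Hypothetical rows: (City, Country Before, Country After); "" means no country.
trips = [
    ("Paris", "Spain", "Italy"),
    ("Paris", "Italy", "Germany"),
    ("Rome", "Spain", ""),
]


def country_counts(trips):
    """Count how often each country appears before OR after each city."""
    counts = Counter()
    for city, before, after in trips:
        for country in (before, after):  # stack both columns into one
            if country:
                counts[(city, country)] += 1
    return counts


counts = country_counts(trips)
print(counts[("Paris", "Italy")])  # Italy appears once before and once after Paris
```

Pivoting the resulting pairs with cities as rows and countries as columns gives the layout asked for in the question.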
