I'm a new Tableau user and am looking for help/guidance in creating a frequency distribution table from data in an Excel spreadsheet. The data is from a survey, Column A is the respondent's e-mail address. Column B is the location (State). Columns C - N are questions from the survey, and answers are listed in the columns on a scale from 1 - 3. The column names look like this:
E-Mail        State    Question 1    Question 2    Question 3
john.doe      MN       1             2             1
mary.smith    WI       2             3             2
j.doe         MN       3             1             2
I'd like to use Tableau to create a frequency distribution table that would look similar to the following:
                    State 1      State 2      State 3
                    1  2  3      1  2  3      1  2  3
Question 1
Question 2
Question 3, etc.
I have a feeling that my data is not formatted correctly; however, I'm struggling with how to fix that. My questions are: (1) Can I create the frequency distribution table in Tableau using my data in its current format? (2) If not, how should I format it in order to create the frequency distribution table?
Thank you in advance for your help.
It can be done, but not with your data exactly as it is right now.
First part of the solution:
The good news is that you don't need to touch Excel to reshape the data; it can be done in Tableau itself. Select Edit Data Source from the data source context menu.
Select all the question columns (hold Ctrl / Command while clicking) and then choose Pivot. The pivoted data is much more flexible to work with.
Second part: I have uploaded the desired output to Tableau Public. Check it and let us know if it is what you wanted.
Tableau Public Link
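For readers who want to see what the pivot step does to the data, here is a minimal sketch in plain Python rather than Tableau. It uses the three sample rows from the question; the function names `unpivot` and `frequency_table` and the output column names `Question` and `Answer` are assumptions made for illustration only.

```python
from collections import Counter

# The three sample rows from the question, in the original wide layout.
wide_rows = [
    {"E-Mail": "john.doe", "State": "MN", "Question 1": 1, "Question 2": 2, "Question 3": 1},
    {"E-Mail": "mary.smith", "State": "WI", "Question 1": 2, "Question 2": 3, "Question 3": 2},
    {"E-Mail": "j.doe", "State": "MN", "Question 1": 3, "Question 2": 1, "Question 3": 2},
]

ID_COLUMNS = ("E-Mail", "State")  # columns that are NOT pivoted


def unpivot(rows):
    """Turn one row per respondent into one row per (respondent, question)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column not in ID_COLUMNS:
                long_rows.append({
                    "E-Mail": row["E-Mail"],
                    "State": row["State"],
                    "Question": column,
                    "Answer": value,
                })
    return long_rows


def frequency_table(long_rows):
    """Count respondents for every (question, state, answer) combination."""
    return Counter((r["Question"], r["State"], r["Answer"]) for r in long_rows)


long_rows = unpivot(wide_rows)
freq = frequency_table(long_rows)
```

Once the data is in this long shape, the frequency table is just a count per question, state and answer, which is what Tableau computes after the pivot.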
Related
I'm hoping to be pointed in the right direction toward a solution for an interesting request I received recently.
An Excel file was given to me, with 1 sheet per day, so the year file has 365 sheets, each named after its date.
Now the interesting and also annoying part is that each sheet contains roughly 15 tables, which are not formatted as Excel tables but only laid out visually as tables. See the example here:
The desired format is this:
TABLE NAME -- NAME -- VALUE1 -- VALUE2 -- VALUE3 -- SHEETNAME
Luckily the source format is the same on every sheet. My question is, does anyone know a good method to create a new Excel file that takes all this data and combines it into 1 sheet? Which software, language, etc.?
So essentially it would be saying, use cell/row X5&&cell/row Y4 as column 1, cell/rowxxx as column 2 etc. (from all sheets available, combined)
Then what I'd want is to import said data and have it transformed/loaded into 1 big new table as described. I previously used pandas dataframe and tabular to merge PDF tables into 1, but those were already actual tables and thus easier. These are basically just cells, visually shown as tables, making this quite a nightmare.
Would highly appreciate any creative ideas.
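One possible approach, sketched in plain Python: describe each visual table once as a fixed cell position, then apply that description to every sheet and append the sheet name. Everything concrete here is an assumption for illustration: the `LAYOUT` positions, the helper names, and the demo data are made up, and each sheet is represented as a dict of (row, column) to value. In practice the cells would be read with a spreadsheet library.

```python
# Hypothetical layout: each visual table has a title cell and a block of
# data rows holding NAME, VALUE1, VALUE2, VALUE3 in consecutive columns.
# Positions are 1-based (row, column) pairs and are invented for this sketch.
LAYOUT = [
    {"title_cell": (1, 1), "first_row": 3, "last_row": 4, "first_col": 1},
    {"title_cell": (1, 6), "first_row": 3, "last_row": 3, "first_col": 6},
]


def extract_sheet(sheet_name, cells):
    """Flatten one sheet (dict of (row, col) -> value) into output rows."""
    out = []
    for table in LAYOUT:
        title = cells.get(table["title_cell"])
        for r in range(table["first_row"], table["last_row"] + 1):
            c = table["first_col"]
            name = cells.get((r, c))
            if name is None:  # skip empty rows inside the block
                continue
            values = [cells.get((r, c + k)) for k in (1, 2, 3)]
            out.append([title, name, *values, sheet_name])
    return out


def combine(workbook):
    """workbook: dict of sheet name -> cells dict. Returns one flat table."""
    rows = []
    for sheet_name, cells in workbook.items():
        rows.extend(extract_sheet(sheet_name, cells))
    return rows


# Made-up demo workbook with a single sheet.
demo = {
    "2019-01-01": {
        (1, 1): "Table A", (3, 1): "x", (3, 2): 1, (3, 3): 2, (3, 4): 3,
        (1, 6): "Table B", (3, 6): "y", (3, 7): 4, (3, 8): 5, (3, 9): 6,
    },
}
combined = combine(demo)
```

Because the layout is identical on every sheet, the layout description only has to be written once; the output rows follow the desired TABLE NAME, NAME, VALUE1, VALUE2, VALUE3, SHEETNAME order.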
I'm trying to help a colleague with some work in Excel. He has a data set of 40 Organisations, each of which has multiple Key Personnel (KP). Each KP has been assessed against 3 key areas of criteria (where they are given a Y or N), these criteria being:
Geographic Area (Broken down into 26 Geographic Areas)
Industry Experience (Broken down into 18 Industries)
Areas of Expertise (Broken down into 18 Areas)
An example of the data is shown in the linked screenshot.
What I am trying to achieve is to set up a 'filter form' that will allow an individual to put in their requirements (e.g. Aged Care Experience, in All of the West Region) and be provided with an output of the organisations that fit these criteria.
I have attempted to achieve this via utilizing a Pivot table, but have had no luck due to the different criteria and the fact that each organisation has multiple KP.
Any assistance would be much appreciated as to whether this can actually be achieved in Excel and how it could be done. If it can't I was thinking whether an Access Database could be used.
Update:
Please see attached the example data extract as requested by donPablo
Data Extract
From discussions with my colleague, the best outcome for him would be to get the Supplier, the KP and the other Criteria (think of it as filtering to hide all the Organisations and KPs except the ones that meet the criteria).
If this is not achievable, I imagine that having just the names of the organisation and KP that meet the criteria as the output would suffice.
Think about maintenance of the ExampleData...
Adding a new Industry. Adding a new Expertise.
Splitting Industry into 3 Industry-s
Adding new Org with 2 KP
Deleting old KP3 from an org
For now with the initial concept, changes are small.
But soon in growth period there will be many changes.
How do you distribute these changes to all the users?
Thus, some sort of Split solution is needed.
A back-end DB (XLS or MSACCESS or SQLSERVER) ,
and a front-end form for--
Selection(s)
Results
Back-end as XLS could still be as ExampleData...
To be kept in central office.
And a front-end that links or references that db
but does not contain all the detail rows.
I think that the main matrix needs another column
called AreaType, value G or I or E
and that the area heading row needs to say
"ANY Geo" and have all "Y"-s in each column, etc for I and E.
In searching the matrix for Aged Care we should only look at Industry.
The ANY row would be chosen when the user does not choose an area.
I think that "Org" is a separate table
And that "KP" is another separate table.
This allows full details to be stored elsewhere
than the main matrix of areas.
Column heading of matrix would be "Org#~KP#", which would be
parsed on the tilde and separately looked up.
(it is improbable that any org or kp will have a tilde).
Yes, it is possible to search the matrix and retrieve qualified rows.
For ncol = minCol To maxCol
    CountYInG = 0: CountYInI = 0: CountYInE = 0
    For Each AreaType In (G, I, E)
        ' then look at what was selected (gggg/iiii/eeee)
        For Each AreaName In (gggg/iiii/eeee)
            If matrix(AreaName, ncol) = "Y" Then add 1 to the Count for this AreaType
        Next
    Next
    If CountYInG > 0 And CountYInI > 0 And CountYInE > 0 Then
        ' This Org/KP qualifies
    End If
Next
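The search loop above can be sketched as runnable Python. This is only an illustration under assumptions: the matrix is held as a dict keyed by (AreaType, AreaName), the sample rows and the function name `qualifying` are made up, and an empty selection for an area type is treated like the all-"Y" ANY row.

```python
# Hypothetical matrix: (AreaType, AreaName) -> {"Org#~KP#": "Y" or ""}.
matrix = {
    ("G", "West"): {"1~1": "Y", "1~2": "", "2~1": "Y"},
    ("G", "East"): {"1~1": "", "1~2": "Y", "2~1": ""},
    ("I", "Aged Care"): {"1~1": "Y", "1~2": "Y", "2~1": ""},
    ("E", "Audit"): {"1~1": "Y", "1~2": "", "2~1": "Y"},
}


def qualifying(matrix, selected):
    """Return (org, kp) pairs with at least one "Y" per selected area type.

    selected maps "G"/"I"/"E" to a list of area names; an empty or missing
    list means the user chose nothing, which behaves like the ANY row.
    """
    columns = sorted({col for row in matrix.values() for col in row})
    result = []
    for col in columns:
        ok = True
        for area_type in ("G", "I", "E"):
            names = selected.get(area_type, [])
            if not names:  # nothing chosen for this type: always passes
                continue
            count = sum(
                1 for name in names
                if matrix.get((area_type, name), {}).get(col) == "Y"
            )
            if count == 0:
                ok = False
                break
        if ok:
            org, kp = col.split("~")  # parse the "Org#~KP#" column heading
            result.append((org, kp))
    return result


west_aged_care = qualifying(matrix, {"G": ["West"], "I": ["Aged Care"], "E": []})
```

With the sample rows, only column "1~1" has a "Y" for both West and Aged Care, so it is the only pair returned.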
added Pi Day, 20:00
First inclination is NOT to have 3 criteria tables (G/I/E), but rather ONE table.
Let's make several alternative DB designs. Then look at usage, and rank them.
Finally choose one and do it. Good luck, and Bye.
Matrix alternative
MatrixTable--AreaType & AreaName (PK), and one attribute Column for each Org/KP with value 'Y' or blank.
1st row has PK=C-ColHeadings, and each Column has Org#/KP# for that column.
OrgTable--Org# (PK), and OrgName, OrgStreet1, OrgStreet2, OrgCity/State/Zip, OrgPhone, ...
KPTable --KP# (PK), and KPName, KPOrg#, KPPhone
Normalized alternative (Admin would need to do pivot to see matrix view)
DetailTable--Org#(FK)-KP#(FK)-AreaType-AreaName(FK) DetailValue = 'Y' or ('Y' by implication of row existence)
OrgTable--Org# (PK), and OrgName, OrgStreet1, OrgStreet2, OrgCity/State/Zip, OrgPhone, ...
KPTable --KP# (PK), and KPName, KPOrg#, KPPhone
AreaTable--AreaType-AreaName(PK) (so that everyone spells it the same)
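To make the normalized alternative concrete, here is a small sketch using Python's built-in sqlite3 module instead of Access or SQL Server. The column names (OrgNo, KPNo and so on), the sample rows, and the helper `find_kp` are assumptions for illustration; a row in Detail stands for 'Y' by implication of its existence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Org  (OrgNo INTEGER PRIMARY KEY, OrgName TEXT);
CREATE TABLE KP   (KPNo INTEGER PRIMARY KEY, KPName TEXT,
                   KPOrgNo INTEGER REFERENCES Org(OrgNo));
CREATE TABLE Area (AreaType TEXT, AreaName TEXT,
                   PRIMARY KEY (AreaType, AreaName));
CREATE TABLE Detail (
    OrgNo INTEGER REFERENCES Org(OrgNo),
    KPNo INTEGER REFERENCES KP(KPNo),
    AreaType TEXT,
    AreaName TEXT,
    PRIMARY KEY (KPNo, AreaType, AreaName),
    FOREIGN KEY (AreaType, AreaName) REFERENCES Area(AreaType, AreaName)
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO Org VALUES (?, ?)", [(1, "Org A"), (2, "Org B")])
conn.executemany("INSERT INTO KP VALUES (?, ?, ?)",
                 [(1, "KP1", 1), (2, "KP2", 1), (3, "KP3", 2)])
conn.executemany("INSERT INTO Area VALUES (?, ?)",
                 [("G", "West"), ("I", "Aged Care")])
conn.executemany("INSERT INTO Detail VALUES (?, ?, ?, ?)", [
    (1, 1, "G", "West"), (1, 1, "I", "Aged Care"),
    (1, 2, "G", "West"), (2, 3, "I", "Aged Care"),
])

QUERY = """
SELECT o.OrgName, k.KPName
FROM KP k JOIN Org o ON o.OrgNo = k.KPOrgNo
WHERE EXISTS (SELECT 1 FROM Detail d WHERE d.KPNo = k.KPNo
              AND d.AreaType = 'G' AND d.AreaName = ?)
  AND EXISTS (SELECT 1 FROM Detail d WHERE d.KPNo = k.KPNo
              AND d.AreaType = 'I' AND d.AreaName = ?)
ORDER BY o.OrgName, k.KPName
"""


def find_kp(conn, geo, industry):
    """Return (OrgName, KPName) pairs that have both criteria."""
    return conn.execute(QUERY, (geo, industry)).fetchall()


matches = find_kp(conn, "West", "Aged Care")
```

In this shape, adding a new Industry or Expertise is a new row in Area rather than a new spreadsheet column, which speaks to the maintenance concern raised earlier.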
Your favorite design... list the tables, and their fields
I've run into a problem which I'm not able to solve with my Excel "skills", so I'm asking for help.
V. Excel 2016
( https://i.stack.imgur.com/U8VLH.png )
https://files.fm/u/nvkgsf63
http://www.filedropper.com/excel4so
I have 3 tables. The first is a table with the input. The second is something I want to add to the first one, and the third is the output I need to CREATE with functions. The 1st and 2nd tables are going to change every month, and so will the output.
(The 3rd table is just a shortened example; I added only IDs 551 and 77 there to show what I mean.)
Basically what i need is:
If the ID number (Example 551) appears in both the 2nd table and the 1st: add the values (for CASH, 8768,2 + 892,30; the same for HOURS) and put the result into the 3rd table under the same ID.
If ID number does NOT appear in the 2nd table (Example ID 54) - just copy data from the input.
If ID number does NOT appear in the 1st table (Example ID 77) - Add data to the end of the 3rd table.
I hope everything is clear with these examples; if not, just ask and I'll try to explain it better.
Thank you so much guys
Have a nice day :)
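The three rules amount to summing HOURS and CASH per ID across both tables, keeping the first table's order and appending IDs that only exist in the second. A minimal sketch of that logic in Python follows, to make the expected result explicit. Only the CASH values for ID 551 come from the post; all HOURS values and the CASH values for IDs 54 and 77 are made up, and `merge_tables` is a hypothetical helper name.

```python
from decimal import Decimal


def merge_tables(first, second):
    """Sum (hours, cash) per ID; IDs found only in `second` go to the end."""
    merged = {}
    for table in (first, second):
        for row_id, (hours, cash) in table.items():
            old_hours, old_cash = merged.get(row_id, (Decimal(0), Decimal(0)))
            merged[row_id] = (old_hours + hours, old_cash + cash)
    return merged


# ID -> (HOURS, CASH). Decimal avoids floating-point rounding on money.
first = {
    551: (Decimal("160"), Decimal("8768.2")),   # CASH from the post, HOURS made up
    54: (Decimal("120"), Decimal("5000.00")),   # made-up values
}
second = {
    551: (Decimal("16"), Decimal("892.30")),    # CASH from the post, HOURS made up
    77: (Decimal("8"), Decimal("400.00")),      # made-up values
}

output = merge_tables(first, second)
```

Python dicts keep insertion order, so the result lists 551 and 54 in input order and 77 at the end, matching the third rule.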
Currently I have a database that has 2 variables:
Fund with an ID attached to it and
Investor with an ID also attached to it.
The example attached, has 4 funds and 4 investors.
An investor can invest in 1 to 4 funds.
I have a VBA function that transposes the data into an "X & Y axis" format.
If there is a name "Ben & Jerry Fund" and "Ben" is present, it should show a quarter entry for that investor name but if the investor does not invest in the fund, it should just be blank.
Question: Is this possible?
Using the =IF(AND( function would not be practical here, since there are so many funds and investors in the database.
Figure 1 shows the data reference (before transposing).
Figure 2 is the desired result.
If the purpose is to have a dynamic report, meaning you want to append more years later without repeating manual work, follow these steps:
Use "Format as table" for your data. This will allow you to append more data later and it will refresh the functions by itself.
Create a field to extract just the quarter number in the field [Quarter] with =MID([Quarter],1,1)
Create a field to extract the year =RIGHT([Quarter],4) this will allow you to use this field as a filter for future years
Create a pivot table using the new table then organize the data:
*Filter([Year])
*Rows([Fund Name], [Fund Id])
*Columns([Investor], [Investor ID])
*Values ([Quarter])
You will see the numbers for the quarter. If you want to see the letter "Q" you can change the format with right click, and select "Number Format...". There in "Custom" change to Q0
Here is a Tutorial about format as table and pivot tables that I made a few months ago. I am sorry it is in Spanish but I am using the Excel English version.
=IFERROR(INDEX($E$3:$E$12,SMALL(IF($C$3:$C$12=L$5,IF($A$3:$A$12=$K7,ROW($A$3:$A$12)-ROW($A$2))),1)),"")
Building an array of row numbers for INDEX and grabbing the first match (smallest row) based on the 2 if statements...
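The same "first row that matches both conditions, otherwise blank" idea can be sketched in Python to show what the array formula computes for each cell of the grid. The sample rows, the second investor name "Ann", the quarter values, and the helper names are made up for illustration; only "Ben & Jerry Fund" and "Ben" come from the question.

```python
def first_match(rows, row_key, col_key):
    """Return the value of the first row matching both keys, else ""."""
    for r_key, c_key, value in rows:
        if r_key == row_key and c_key == col_key:
            return value
    return ""  # same role as the IFERROR(..., "") wrapper


def cross_tab(rows, row_keys, col_keys):
    """Build the X & Y grid: one cell per (row key, column key) pair."""
    return {
        r: {c: first_match(rows, r, c) for c in col_keys}
        for r in row_keys
    }


# (fund, investor, quarter) records; quarter values are hypothetical.
records = [
    ("Ben & Jerry Fund", "Ben", "Q1"),
    ("Other Fund", "Ann", "Q3"),
    ("Ben & Jerry Fund", "Ben", "Q2"),  # later duplicate is ignored
]

grid = cross_tab(records, ["Ben & Jerry Fund", "Other Fund"], ["Ben", "Ann"])
```

As with SMALL(..., 1) in the formula, only the first matching row is used, so a later duplicate for the same pair does not change the cell.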
Could a pivot table achieve what you want?
The issue is sorting an array that is generated automatically from a data source using a formula that extracts unique data points. (Data points are date/time)
The data is being extracted with this formula.
=INDEX(Table_ExternalData_1[SampleDateTime],MATCH(0,INDEX(COUNTIF($G$2:G2,Table_ExternalData_1[SampleDateTime]),0,0),0))
Once extracted, the data is not sorted right away. The current data is extracted from a database via an SQL string that pulls in data corresponding to the date and time that the data point was created.
Because of this, the extracted points are not in the correct order. I am attempting to sort the extracted data points from earliest to latest to continue with the data sorting, but need the date/times to be sorted in a separate row.
I have attempted to use a pivot table, but it isn't exactly what I need and ends up being a messier end product than I need.
All assistance is appreciated.
Example is below.
1
2
3
5
1
2
3
4
6
5
3
I need this.
1
2
3
4
5
6
I did end up finding a solution that I will be able to modify. Using a single row of a pivot table, I took just the date/time column and had the PivotTable function sort the data to be utilized as necessary.
Thank you.
The fact that the range in the example you give:
1) Consists of entries of a numeric datatype only
2) Does not contain any blanks
means that the solution is relatively simple.
Assuming that data is in A1:A11, first use a single cell somewhere within the worksheet to count the number of expected returns. For example, using B1 for this purpose, enter this formula in that cell:
=SUM(IF(FREQUENCY(A1:A11,A1:A11),1))
Your main formula is then:
=IF(ROWS($1:1)>B$1,"",SMALL(IF(FREQUENCY(A$1:A$11,A$1:A$11),A$1:A$11),ROWS($1:1)))
the latter being copied down until you start to get blanks for the results.
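As a cross-check of what the two formulas should return, here is a small Python sketch of the same idea: take the distinct values, sort them ascending, and pad with blanks once they run out. It uses the sample list from the question; the function name `sorted_distinct` and the `output_rows` parameter are assumptions for illustration.

```python
def sorted_distinct(values, output_rows):
    """Distinct values in ascending order, padded with "" up to output_rows."""
    distinct = sorted(set(values))
    # If output_rows is smaller than the number of distinct values,
    # no padding is added and all distinct values are still returned.
    return distinct + [""] * (output_rows - len(distinct))


data = [1, 2, 3, 5, 1, 2, 3, 4, 6, 5, 3]  # the example from the question
distinct_count = len(set(data))            # plays the role of cell B1
result = sorted_distinct(data, 8)
```

The count of distinct values corresponds to the helper cell, and the padding corresponds to the blanks that appear once the main formula has been copied down past the last distinct value.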
Regards