Change Data layout in Excel

I have Excel data from a vendor.
I want to change the layout so that it is easier to ingest this data into a database (such as PostgreSQL or MySQL).
There are thousands of files in this format, since each file contains the sales for only one date.
Is there a better or faster way to convert this data, or is writing VBA the only way?

We could definitely do this in VBA!
On the other hand, why bother when it can easily be accomplished with Excel formulas?
My assumptions:
You're dealing with data of variable sizes (number of regions and items).
Cells you show as merged are merged in the actual data.
I set this up to work with a max data range of row 100, but that is easy to change.
Step 1)
Identify how many regions and items we have:
=COUNTA($D$4:$K$4) (Regions)
=COUNTA($C$6:$C$100) (Items)
Step 2) List Items, repeating for number of regions.
Use local line numbers rather than worksheet row numbers to make this more portable.
To make it repeat, we divide the line number by the total number of regions, round up, then index the items using the result. Don't forget error handling to identify when to stop listing items. From now on, all the rest of our formulas will have an IF( *item column* <> "", *show stuff* , *don't show stuff* ) wrapper, so we can gloss over that in future steps.
=IFERROR(INDEX($C$6:$C$100,MATCH(ROUNDUP(M5/$P$1,0),$B$6:$B$100,0)),"")
Step 3) Date is super easy:
Just pull the same date for each: =IF($O5<>"",$C$2,"")
Step 4) We're going to use the Mod() function to create a cyclic action with our regions.
It took me a while of playing with the formula to get the adjustments (translation and scaling) right. The -1 and +1 keep the cycle 1-based, and the 2*(...)-1 scaling lands on the first of the two columns under each merged region header.
=IF($O5<>"",INDEX($D$4:$K$4,2*(MOD($M5-1,$P$1)+1)-1),"")
Step 5) "Quantity" and "Price" are the same formula, except "Price" has an extra "+1"
Quantity: =IFERROR(INDEX($D$6:$K$100,MATCH($O5,$C$6:$C$100,0),MATCH($P5,$D$4:$K$4,0)),"")
Price: =IFERROR(INDEX($D$6:$K$100,MATCH($O5,$C$6:$C$100,0),MATCH($P5,$D$4:$K$4,0)+1),"")
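For readers who would rather script the conversion across thousands of files, the same index arithmetic can be sketched in Python. This is only a minimal sketch, not the method from the answer: the `unpivot` helper, the in-memory grid, and the sample region and item names are all hypothetical, and it assumes the layout the formulas imply (each region spans two adjacent columns, quantity first, then price). Reading the actual workbook is left out.

```python
from typing import List, Tuple

def unpivot(date: str, regions: List[str], items: List[str],
            grid: List[List[float]]) -> List[Tuple[str, str, str, float, float]]:
    """Flatten a wide sheet into (date, item, region, quantity, price) rows.

    Assumed layout: grid[i] holds the values for items[i], and each region
    occupies two adjacent columns (quantity first, then price).
    """
    rows = []
    n = len(regions)
    for line in range(1, len(items) * n + 1):   # local line number, 1-based
        item_idx = -(-line // n)                # same as ROUNDUP(line / n, 0)
        region_idx = (line - 1) % n + 1         # same as MOD(line - 1, n) + 1
        qty_col = 2 * region_idx - 1            # first of the region's two columns
        values = grid[item_idx - 1]
        rows.append((date, items[item_idx - 1], regions[region_idx - 1],
                     values[qty_col - 1], values[qty_col]))
    return rows

# Hypothetical sample: two regions, two items.
sample = unpivot(
    "2021-07-01",
    ["North", "South"],
    ["Apple", "Pear"],
    [[10, 1.5, 20, 1.6],
     [5, 2.0, 7, 2.1]],
)
for row in sample:
    print(row)
```

Each tuple corresponds to one row of the long layout, which is the shape a database table would normally expect.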
FINAL PRODUCT:

Related

How do I receive the number of cells based off of three columns in Excel?

I'm not too sure how to word this problem so, I apologize for the vagueness. Here is what I am trying to do though:
I have a large Excel table with a ton of values; however, I only care about 3 columns. The three columns are "Project Name", "Active/Planned", and "Week of Month". Here is an example of some values I would have:
| Project Name | Active/Planned | Week of Month   |
|--------------|----------------|-----------------|
| StoreProj    | Active         | 2021-07 Jul-Wk1 |
| SecProj      | Planned        | 2021-07 Jul-Wk2 |
| StoreProj    | Active         | 2021-07 Jul-Wk1 |
Now, I have used a formula to get the number of projects for a specific week of the month while avoiding duplicate values for the project name. The formula returns the number of projects as an integer. Here is what I used:
=IFERROR(ROWS(UNIQUE(FILTER(Table[Project Name],Table[Week of Month]="2021-07 Jul-Wk1"))), 0)
This works as intended. Now the issue I am running into is that I need to filter through these rows as I did previously, but now I need to include the "Active/Planned" column. So, I want to be able to see how many projects I have based off of the week of the month and return a number of projects (excluding duplicate names), but be able to filter through that integer output based off of the active/planned projects. So in a perfect scenario I can choose the week of month and if the project is "Active" or "Planned" and see the amount of projects I have.
This might be an easy fix so I apologize, I am just stumped, any help would be greatly appreciated. Thanks!

Work through that step by step: you've got the FILTER function, which feeds data to the UNIQUE function, then to the ROWS function, and then your IFERROR. However, the data about whether each row is 'Active' or 'Planned' isn't passed out beyond the FILTER function, so it can't be used by anything further on in that sequence.
With the boring theoretical advice out of the way, try this:
=COUNT(IF(UNIQUE(FILTER( Table[[Project Name]:[Active/Planned]], Table[Week of Month] = "2021-07 Jul-wk1"))= "Active", 1))
Explanation:
FILTER(...) outputs records with the relevant date filter, however it outputs Table[[Project Name]:[Active/Planned]] - both columns, to ensure all relevant data is there.
UNIQUE(...) Then narrows that down to unique values, although by this stage I'm not 100% sure you need this.
IF(... = "Active", 1) then replaces the 'Active' outputs with 1s
COUNT() returns the number of cells in the above that contain a number (the 1s from the IF())
Yes, you can't use COUNTIF on arrays (and all except that last bullet point above are outputting arrays not single values) - and no, I didn't know that before attempting to answer this question, found it over at a different question!
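To make the filter, de-duplicate, then count logic concrete, here is a minimal Python sketch of the same idea. The `count_projects` helper is hypothetical and the rows are the three sample rows from the question; note that Python string comparison is case-sensitive, so the week label must match exactly.

```python
from typing import List, Tuple

def count_projects(rows: List[Tuple[str, str, str]], week: str, status: str) -> int:
    """Count distinct project names for one week and one status.

    Each row is (project_name, status, week), mirroring the three columns.
    """
    names = {name for name, row_status, row_week in rows
             if row_week == week and row_status == status}
    return len(names)

rows = [
    ("StoreProj", "Active", "2021-07 Jul-Wk1"),
    ("SecProj", "Planned", "2021-07 Jul-Wk2"),
    ("StoreProj", "Active", "2021-07 Jul-Wk1"),
]
print(count_projects(rows, "2021-07 Jul-Wk1", "Active"))
```

The set comprehension plays the role of UNIQUE, and both conditions are applied before de-duplication, so the status information is never lost along the way.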

What is the excel equivalent of a COUNTIF with WILDCARD formula in Tableau?

I'm working on a Tableau table where I'd like to:
Isolate specific string records in a dimension; then
Count the instances that a specific string appears in that dimension (so all the records). A record may have multiple but different texts within.
Let's say my Dimension is called "Person Type" and I have 5 records with these respective values:
Employee, Visitor, Employee; Visitor, Applicant, Visitor; Applicant
I understand that I could first create a filter on the dimension to only show the singular record types: Employee, Visitor, Applicant, but where I'm having trouble is creating a calculation that looks at all the records and counts the instances that the word "Employee" is present in and so forth. In excel, a COUNTIF formula with a text wildcard handles this.
Here's an excel screenshot of what I'm trying to accomplish.
Edit: I tried a calculation that was pretty close to the solution. For some reason it's not counting Applicant, when there are clearly two instances of it. Does anyone have any ideas on how to improve the calculation?
IF CONTAINS([Person Types],'Employee') THEN "Employee"
ELSEIF CONTAINS([Person Types],'Visitor') THEN "Visitor"
ELSEIF CONTAINS([Person Types],'Applicant') THEN "Applicant"
END

You can use the CONTAINS function in your LOD expression, which operates like a *X* wildcard in Excel.
So something like:
{ FIXED [NEWFIELD]: SUM(INT(CONTAINS([PERSON],"Employee"))) }
The code
IF CONTAINS([Person Types],'Employee') THEN "Employee"
ELSEIF CONTAINS([Person Types],'Visitor') THEN "Visitor"
ELSEIF CONTAINS([Person Types],'Applicant') THEN "Applicant"
END
will not correctly count "Applicant" if the record also contains "Employee" or "Visitor", since they take precedence (each condition is tested in sequence and the first match wins). For example, "Visitor; Applicant" will return "Visitor".
Also, CONTAINS is case-sensitive, so you might want to convert to all upper case to ensure that any subtle case variances are matched.
IF CONTAINS(UPPER([Person Types]),"APPLICANT") THEN "Applicant"
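The difference between the two approaches can be checked with a small Python sketch that uses the five sample records from the question. This is an illustration only: the variable names and the `first_match` helper are hypothetical, and substring tests stand in for Tableau's CONTAINS.

```python
from collections import Counter

records = ["Employee", "Visitor", "Employee; Visitor", "Applicant", "Visitor; Applicant"]
types = ["Employee", "Visitor", "Applicant"]

# Wildcard-style count: a record counts once for every type it contains.
contains_counts = {t: sum(t.upper() in r.upper() for r in records) for t in types}

def first_match(record):
    """Mimic the IF / ELSEIF chain: only the first matching type is returned."""
    for t in types:
        if t.upper() in record.upper():
            return t
    return None

first_match_counts = Counter(first_match(r) for r in records)
print(contains_counts)
print(dict(first_match_counts))
```

The first dictionary counts Applicant twice, while the first-match version counts it only once because "Visitor; Applicant" is labelled as Visitor.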

I recreated a sample like the one shown in your screenshot for a demonstration (it is always better to paste a few rows of data directly as text).
Steps (Tableau Desktop; this is easier in Tableau Prep):
In the data source pane, after connecting your data, click the down arrow on the Person Type field and click Split.
Two new fields will be created automatically (or more, depending on the maximum number of person types in one record, which is 2 here). Having Tableau Prep is highly advised: in Tableau Prep you can PIVOT these fields directly. See method here. Pivoting on calculated fields is not available in Tableau Desktop.
Union the data with itself n times (n = 2 in this case). See GIF for help.
Create a calculated field, say Person Types (I added an s intentionally to differentiate it), with the following calculation:
// A single person_type for each row.
CASE [Table Name]
WHEN "Sheet1" THEN [Person Type - Split 1]
WHEN "Sheet11" THEN [Person Type - Split 2]
END
Excluding null values, you can then create your desired views.
Needless to say, you have to change the values of [Table Name] in the calculated field to match your case.
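The net effect of the split and union steps is one row per person type, which can then be counted directly. A minimal Python sketch of that reshaping, assuming the types inside a record are separated by semicolons as in the question's sample (the variable names are hypothetical):

```python
from collections import Counter

records = ["Employee", "Visitor", "Employee; Visitor", "Applicant", "Visitor; Applicant"]

# One entry per (record, person type), like splitting the field and stacking the pieces.
long_rows = [part.strip() for r in records for part in r.split(";") if part.strip()]

counts = Counter(long_rows)
print(dict(counts))
```

Records with a single type contribute one row and records with two types contribute two, which matches excluding the null split values in the Tableau version.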

How to scrub data in Excel, specifically removing extreme outliers that are outside of a given range?

I have thousands of cells of data that are output from a model, with my results formatted as follows: cell ID is column header, each row is a timestep, and each cell's results over hundreds of timesteps is printed out in a spreadsheet. I want to analyze the data within certain percentiles. I've identified what the values are for the percentile thresholds of interest, but I'm not finding clear directions on how to ...
a) remove all values that are outside of the range I'm interested in, for the sheet I'm working in
or
b) pull the values within the range of interest out of the sheet and into a separate one for further analysis
The values are numbers with two decimal places.
I need to scrub the data and then analyze it in a separate step. For example, even after removing the extreme max and min in a timeseries, I still want to see the entire timeseries, but with the outliers removed or changed to a null value. How can I select or simply remove all the outliers from this data matrix, leaving the rest of the data intact?

The best way to do it is using the PivotTable feature.
With the PivotTable you will be able to create filter parameters using ranges (the main data and the outliers as well).
Please take a look at this if you don't know how to use a PivotTable:
Create a PivotTable to analyze worksheet data

It may not be "robust", but the easiest way to do this would be to filter your data. Filter out all of your good values. Once only the "bad" cells are visible, use Go To Special to select only the visible cells, then delete them.
To do this efficiently / with keyboard shortcuts, it would be like so:
1) Select the data headers and type alt + h + s + f
2) Click the "value" header, and select the appropriate numbers until only the data you do not want remains visible.
3) Highlight all of the remaining cells (These should be the data points you want deleted, and the row numbers in excel should be blue now)
4) Type alt + h + f + d + s + y to go to special and select visible cells only
5) Type alt + h + d + r to remove the rows (this will take a bit of time, be patient)
I hope it works!
***EDIT: Instead of manually sorting out the data, you can also rank based off of size, and directly cut the data like that. After thinking more about the answer, this method would be easier (I think) and much faster.
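If the goal is to keep every timestep and only blank out the extremes, as the question describes, the idea can be sketched outside Excel as well. This is a minimal Python sketch under the assumption that the lower and upper thresholds are already known; the `mask_outliers` helper and the sample values are hypothetical.

```python
from typing import List, Optional

def mask_outliers(series: List[float], low: float, high: float) -> List[Optional[float]]:
    """Replace values outside [low, high] with None, keeping positions intact."""
    return [v if low <= v <= high else None for v in series]

# Hypothetical timeseries for one cell ID, with two extreme values.
timeseries = [1.25, 3.40, 250.00, 2.75, -90.10, 3.05]
cleaned = mask_outliers(timeseries, low=0.00, high=10.00)
print(cleaned)
```

Unlike deleting rows, this keeps the series the same length, so timesteps stay aligned across columns.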

Break-Down Data in Excel without VBA (Formula Only)

Many times, I am required to provide some type of break-down to customers - an example is shown in the attached figure.
I have a table of data ("TABLE DATA", which is some type of pivot), and the customer provides its official form, whose structure must be preserved (highlighted in yellow). Basically, I need to separate the cost details of CODE "A" and CODE "B" into 2 separate sections.
The customer requires me to provide details for each individual part (the example shows Part A, "Break-Down Part A").
Is there any way to put an "ITEM" from "TABLE DATA" into Code A and Code B? The rest can be solved by VLOOKUP (Price, Quantity). Note: "ITEM" values are non-duplicated. Thank you very much.

Number your rows in the breakout using =1 and =A1+1 and then just use the formula ="B-ITEM"&TEXT(A1,"000"). If you want to skip making a counter column you could use ="B-ITEM"&TEXT(ROW()-1,"000") to just use the current row number (minus 1 or however many you need).
If your items aren't sequential like that, but are still unique, I would recommend adding counters on the original tab similar to what you have, which would let you quickly find the 5th A or 7th B. The counter counts the previous instances of the current row's type and then adds 1. For Row 6 you could do =COUNTIF(A$1:A5,A6)+1.
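The running counter can be illustrated with a short Python sketch. The `running_count` helper and the sample codes are hypothetical; the logic mirrors counting earlier rows with the same code and adding 1.

```python
from collections import defaultdict
from typing import List

def running_count(codes: List[str]) -> List[int]:
    """For each row, return how many times its code has appeared so far (1-based)."""
    seen = defaultdict(int)
    result = []
    for code in codes:
        seen[code] += 1          # previous occurrences + 1
        result.append(seen[code])
    return result

codes = ["A", "B", "A", "A", "B"]
counters = running_count(codes)
print(list(zip(codes, counters)))
```

The pair (code, counter) is unique per row, so it can serve as a lookup key for "the 2nd A" or "the 1st B".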

Nested list in excel

I'm not even sure how to ask this.
I have a database, where each row is a person. Columns are contact info, phone, etc. One column is 'date visited'. There can be multiple dates visited for each person. I don't want to use a comma or stack them all in one field.
Is there a way to have a 'nested' list (not a drop-down menu - just a list of visited dates for each person), such that one person still only consumes one single row?

Yes.
To accomplish this, give each person an ID that is unique and won't change.
Then, on a separate sheet, store the ID and date.
main sheet (ID, Name, Contact Info, phone, etc.)
second sheet (ID, date visited)
In database theory this is called a 'one to many' relationship, and what I'm describing is called 'normalizing your dataset'.
In Excel you can now use formulas to manipulate the data however you need to or can imagine after you split this apart.
As you mentioned in a comment, you want to count all visited dates for a user.
On the main sheet to the right you could use:
=countif(Sheet2!A:A,Sheet1!A1)
This would count all of the IDs in the second sheet that match the current row's ID on your main sheet.
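The two-sheet layout and the per-person count can be sketched in Python as follows. The names, IDs, and dates are hypothetical sample data; the dictionary stands in for the main sheet and the list of pairs for the second sheet.

```python
from collections import Counter

# Main sheet: one row per person, keyed by a stable ID (hypothetical data).
people = {1: "Ana", 2: "Ben"}

# Second sheet: one row per visit, as (person ID, date visited).
visits = [(1, "2021-07-01"), (1, "2021-07-15"), (2, "2021-07-03")]

# Equivalent of the COUNTIF: number of visit rows per ID.
visit_counts = Counter(person_id for person_id, _ in visits)

for person_id, name in people.items():
    print(name, visit_counts[person_id])
```

Adding another visit only appends a row to the second list; the main sheet never grows wider.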
Notes about using one cell:
Storing all the dates in one cell will eventually max it out, and will make it hard to view/search as it grows, so I highly advise against this approach.
If, however, you insist on keeping the dates in there, you could count the visits by counting the total number of commas + 1, like this: =(LEN(G1) - LEN(SUBSTITUTE(G1,",","")))+1 This formula takes the length of all the dates and the length of the dates with commas removed, then subtracts them to get the number of commas; adding 1 gives the number of dates.
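The comma-counting arithmetic is easy to verify with a short Python sketch. The `count_dates` helper and the sample cell text are hypothetical; the expression mirrors the LEN and SUBSTITUTE formula term by term.

```python
def count_dates(cell: str) -> int:
    """Number of commas in the cell plus 1, as in the LEN/SUBSTITUTE formula."""
    return len(cell) - len(cell.replace(",", "")) + 1

print(count_dates("2021-07-01,2021-07-15,2021-08-02"))
print(count_dates(""))
```

One caveat the arithmetic makes visible: an empty cell still yields 1, so a blank check is needed if some people have no visits.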
Notes about using multiple columns:
This approach has the same idea as the one I suggested, where we are associating a number of dates with the row's identity of a person. However, there are a few key limitations and drawbacks.
The main difference is that when we abstract the dates by transposing them to extend vertically, we can manipulate them more easily, and a list of 20 dates for one person becomes much easier to read. By transposing the dates vertically in the second sheet instead of using this approach, we also gain the ability to use Excel's built-in filter. Just storing large amounts of data is useless by itself, while storing it in a way that you can view and manipulate easily makes everything much more powerful.
