I have a data set which I need to convert to longform so that I can use it in a data analysis program (R). The format is standardised for each table, so I'm wondering if there is a way to have Excel transpose the data for me.
Thanks in advance for the help.
Data set
Longform
If you have to do this regularly for a lot of data, writing a macro to loop through everything would be best. A manual workaround that is still quicker for a lot of data is to create a set of formulas that converts all data from one point at one person's place into 8 lines of longform data. Then by changing a reference you can re-use these formulas for every point at every person's place:
Your first 4 columns are manual: Location, Point, Quarter, Type. They have fixed values for every 8 rows. Enter them manually for one data point, they'll all get copied later.
Then have a 5th working column that records the location of an anchor point for every set of data at a point at a person's place. For this example, I'm assuming you have a "NW" value in cell B3 on a sheet called "Data". In your 5th table column, in the first row only (cell E2), put in the text "Data!B3" without an equals sign.
The remaining columns for all 8 rows all refer to this anchor point using the OFFSET and INDIRECT functions. For each column in your data for the first 8 rows, refer to each value in the data set based on their relative position from the anchor point:
The first data column is the NW Shrub Distance value, which is offset by 1 row and 1 column:
=OFFSET(INDIRECT(E2),1,1)
The second data column is NW Shrub Height, which is offset by 1 row and 2 columns:
=OFFSET(INDIRECT(E2),1,2)
Continue through the rest of the columns on that row. Then go to the next row in your table. The first data column there is the NE Shrub Distance, which is offset by 7 rows and 1 column from the anchor NW cell:
=OFFSET(INDIRECT(E2),7,1)
Then the second data column in the 2nd row is the NE Shrub Height, which is offset by 7 rows and 2 columns from the anchor NW cell:
=OFFSET(INDIRECT(E2),7,2)
Prepare these formulas for all columns for all 8 rows. It will take a little while, but once you're done, you can copy the entire chunk and paste it below the first chunk. Update the one anchor value for the whole chunk from Data!B3 to the NW location in the next data chunk, e.g. Data!H3, and all formulas will now pull their values from the cells relative to the new anchor point.
Repeat this for every data chunk and you'll have it in longform fairly quickly.
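If it helps to see the anchor-and-offset idea outside Excel, here is a minimal Python sketch of the same logic. The grid contents and the `offset` / `chunk_to_long` helpers are made up for illustration (the real sheet isn't shown); only the row and column offsets (1 and 7 rows, 1 and 2 columns) come from the steps above.

```python
# Hypothetical stand-in for the "Data" sheet: 9 rows x 3 columns of placeholders.
grid = [[f"r{r}c{c}" for c in range(3)] for r in range(9)]

def offset(grid, anchor, rows, cols):
    """Mimic OFFSET(INDIRECT(anchor), rows, cols) on a list-of-lists grid."""
    r, c = anchor
    return grid[r + rows][c + cols]

def chunk_to_long(grid, anchor, quarter_rows, measures):
    """Return one long-form record per quarter for the chunk starting at anchor."""
    long_rows = []
    for quarter, row_off in quarter_rows.items():
        record = {"Quarter": quarter}
        for name, col_off in measures.items():
            record[name] = offset(grid, anchor, row_off, col_off)
        long_rows.append(record)
    return long_rows

quarter_rows = {"NW": 1, "NE": 7}                      # row offsets from the anchor
measures = {"Shrub Distance": 1, "Shrub Height": 2}   # column offsets from the anchor

# Anchor (0, 0) plays the role of Data!B3; change it to re-use the same offsets
# for the next chunk, just like editing the text in E2.
long_rows = chunk_to_long(grid, (0, 0), quarter_rows, measures)
for record in long_rows:
    print(record)
```

Changing only the anchor tuple re-points every lookup, which is the same trick as changing the single Data!B3 text value.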
I am making a log for work and in one of the columns, I need a formula that fills each cell with the latest text information from another sheet in the same workbook. I have attached a picture of the worksheet I am working in and a picture of the referenced worksheet.
The worksheet I need the formula in
The worksheet with the reference cells
In image 1, there are 2 column titles highlighted. Column 'A' and Column 'S'. Column A is the id of one of my animals, and column S is supposed to have a date/initial in there for my macro to work. However, people forget to fill it in and only fill it in the sheet from image 2 in Column 'P'. Because we reuse the same animal more than once the information that goes into column S in the first image always needs to be the newest information from the reference sheet. I know how to do a VLOOKUP with the dates, I have already done those, but because I need the cell to populate with text instead of numerical values, I am having trouble.
I will list some formulas I have tried that are supposed to go from the bottom to the top but don't work and maybe just need to tweak and some that work if I was going from the top to the bottom.
Formulas that don't work going from the bottom to the top but I'd think would:
=LOOKUP(2,1/(FIND(A18,BREEDING!D:D)),P:P)
=INDEX(BREEDING!P:P,MATCH(A21,BREEDING!D:D,0))
=LOOKUP(2,1/(BREEDING!D:D=A21),BREEDING:P:P)
Formulas that do work but go from the top to the bottom:
=(VLOOKUP(A17,BREEDING!D:Q,13,FALSE))
References:
Column A: Animal ID that is present in the 1st image
Column D: Animal ID that is present in the 2nd image
Column S: 'Date Weaned' cell that will contain the formula and information from the 2nd image should populate into
Column P: The actual date weaned that should go into column S of the 1st image
TL;DR
In image 1, cells in column S should have the latest text information from column P of image 2 if the information from Column A of image 1 matches the information from Column D of image 2
Formula that you could use is below
=INDEX(BREEDING!P:P;AGGREGATE(14;6;ROW(BREEDING!D:D)/(BREEDING!D:D=A2);1))
How it works: the INDEX part is simple; AGGREGATE is the main thing. Its first parameter, 14, selects the LARGE function (return the k-th largest value), and the second parameter, 6, is the most important because it means "ignore errors". This is vital: dividing by (BREEDING!D:D=A2) divides each row number by 1 where the Animal ID matches (TRUE) and by 0 where it doesn't (FALSE), so every non-matching row becomes a division error and only the rows with the matching Animal ID survive. LARGE is then applied to those remaining row numbers, and because of the last part ";1))" it returns the highest row number where the Animal ID matches. This way you should be able to get the value from your second sheet for the newest (highest row number) line.
The formula can be quite resource-intensive so you might have to repaste as values - if you are periodically running a macro on the data you can have it also apply the formula and then change to values if it gets too long to calculate every time you want to change something...
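To make the "highest matching row" idea concrete, here is a small Python sketch of the same logic. The sample IDs and text values are invented, and `last_match` is just an illustrative name, not anything from Excel.

```python
def last_match(ids, values, wanted):
    """Return the value on the highest-numbered row whose ID equals wanted."""
    matching_rows = [row for row, animal_id in enumerate(ids) if animal_id == wanted]
    if not matching_rows:
        return None  # the Excel formula would show an error in this case
    return values[max(matching_rows)]

# Made-up data standing in for BREEDING!D:D (IDs) and BREEDING!P:P (date/initials).
breeding_ids = ["A1", "B7", "A1", "C3", "A1"]
date_weaned = ["1/2 JD", "1/9 KL", "2/14 JD", "", "3/30 MS"]

print(last_match(breeding_ids, date_weaned, "A1"))  # 3/30 MS
print(last_match(breeding_ids, date_weaned, "B7"))  # 1/9 KL
```

Filtering to the matching rows and then taking the maximum row number is what the division-by-zero trick plus LARGE(...;1) achieves in the formula.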
We receive a large CSV data file where each row contains a few columns of metadata and then an arbitrary length sequence of X,Y point values in alternating columns. Different rows may contain different numbers of points. The actual data may contain hundreds of rows, each with many X,Y values (possibly a couple of thousand points).
The format of this data file is not within our control.
As a simple sample for illustration:
Series 1,ID142,2,45,7,21,1,65.5,14,22
Series 2,ID082,11,23,6,15,3,29,13,84,9,78,42,45,15,17
The above example would represent two series: Series 1 with points (2,45), (7,21), (1,65.5), (14,22), and Series 2 with points (11,23), (6,15), (3,29), (13,84), (9,78), (42,45), (15,17).
The most useful way to analyze this data would be multiple series in a scatter plot in Excel. An engineer might be interested in seeing a scatter-plot with row 1 data as series 1 and row 58 data as series 2. That might lead to wanting to see a plot of row 8 and row 97. So, it would not be realistic to have a complicated process to reformat the data depending on the rows of interest.
Is there a way to easily build Excel scatter-plots with multiple series from data where each series is represented by a single row with the multiple X and Y values all in that row in alternating columns (as the sample above)?
You could use the OFFSET function. Open a new sheet, enter this in the first cell (A1), press Ctrl+Shift+Enter, and drag the formula across the columns:
=OFFSET(Sheet1!A1,M,0,1,100)
Similarly, enter this in A2 and drag it across:
=OFFSET(Sheet1!A1,N,0,1,100)
This returns the row that is M rows below row 1 of Sheet1 (that is, row M+1) into the first row, and the row that is N rows below row 1 into the second row. You can point M and N at other cells to make the selection dynamic.
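If you ever want to check the row layout outside Excel, the alternating X,Y structure is easy to unpack. Below is a rough Python sketch using only the standard library; it assumes two metadata columns per row, as in the sample from the question, and `read_series` is a hypothetical helper name.

```python
import csv
import io

def read_series(text, meta_cols=2):
    """Map each row's first column to its list of (x, y) points."""
    series = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        numbers = [float(v) for v in row[meta_cols:] if v.strip()]
        # Even positions are X values, odd positions are Y values.
        series[row[0]] = list(zip(numbers[0::2], numbers[1::2]))
    return series

sample = """Series 1,ID142,2,45,7,21,1,65.5,14,22
Series 2,ID082,11,23,6,15,3,29,13,84,9,78,42,45,15,17
"""

series = read_series(sample)
print(series["Series 1"])  # [(2.0, 45.0), (7.0, 21.0), (1.0, 65.5), (14.0, 22.0)]
```

Picking any two rows to compare then becomes a dictionary lookup, which is the same role M and N play in the OFFSET formulas.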
I have a table that is pulling thousands of rows of data from a very large sheet. Some of the columns in the table are getting their data from every 5th row on that large sheet. In order to speed up the process of creating the cell references, I used an OFFSET formula to grab a cell from every 5th row:
=OFFSET('Large Sheet'!B$2572,(ROW(1:1)-1)*5,,)
=OFFSET('Large Sheet'!B$2572,(ROW(2:2)-1)*5,,)
=OFFSET('Large Sheet'!B$2572,(ROW(3:3)-1)*5,,)
=OFFSET('Large Sheet'!B$2572,(ROW(4:4)-1)*5,,)
=OFFSET('Large Sheet'!B$2572,(ROW(5:5)-1)*5,,)
etc...
OFFSET can eat up resources during calculation of large tables though, and I'm looking for a way to speed up/simplify my formula. Is there any easy way to convert the OFFSET formula into just a simple cell reference like:
='Large Sheet'!B2572
='Large Sheet'!B2577
='Large Sheet'!B2582
='Large Sheet'!B2587
='Large Sheet'!B2592
etc...
I can't just paste values either. This needs to be an active reference, because the large sheet will change.
Thanks for your help.
And here is one last approach to this that does not use VBA or formulas. It's just a quick and dirty use of AutoFilter and deleting rows.
Main idea
Add a reference to a cell =Sheet1!A1 and copy it down to match as many rows as there are in the main data.
Add another formula in B1 to be =MOD(ROW(), 5)
Filter column B and uncheck the 0s (or any single number)
Delete all the rows that are visible
Delete column B
Voila, formulas for every 5th row
Some reference images, these are all taken on Sheet2.
Formulas with AutoFilter ready.
Filtered and ready to delete
Delete all those rows (select A1, CTRL+SHIFT+DOWN ARROW, SHIFT+SPACE, CTRL+MINUS)
Delete column B to get final result with "pure" formulas every 5th row.
If you want to take a VBA approach to this, you can generate the references very quickly using simple For loops.
Here is some very crude code which can get you started. It uses hard-coded sheet names and variables. I am really just trying to show the i*5 part.
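Sub CreateReferences()
    Dim i As Long, j As Long
    ' i counts summary rows, j counts columns
    For i = 0 To 12
        For j = 0 To 5
            ' Summary row i points at master row i * 5, counted from A5
            Sheet2.Range("H1").Offset(i, j).Formula = _
                "=Sheet1!" & Sheet1.Range("A5").Offset(i * 5, j).Address
        Next j
    Next i
End Sub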
It works by building a quick formula using the Address from a reference to a cell on Sheet1. The only key here is to have one index count cells in the "summary" rows and multiply it by 5 to get the reference to the "master" sheet. I am starting at A5 just to match the results from INDEX.
Results show the formula input for H1 and over. I am comparing to the INDEX results generated above.
Here is one approach using INDEX instead of OFFSET. I am not sure if it is faster, I guess you can check. INDEX is not volatile, so you might get some advantage from that.
Picture of ranges, you can see that Sheet1 has a lot of data and Sheet2 is pulling every 5th row from that sheet. The data in Sheet1 goes from A1:F1000 and just reports the address of the current cell.
Formulas use INDEX and are copied down and across from A1 on Sheet2.
=INDEX(Sheet1!$A$1:$F$1000,ROW()*5,COLUMN())
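For reference, the row arithmetic that all of these answers rely on (a start row plus a multiple of 5) can be checked with a short Python sketch that writes out the plain references asked for in the question. The function name `every_nth_reference` is made up; the sheet name, column and start row are the ones from the question.

```python
def every_nth_reference(sheet, column, first_row, count, step=5):
    """Build plain cell-reference formulas for every step-th row."""
    # Sheet names containing spaces must be wrapped in single quotes.
    quoted = f"'{sheet}'" if " " in sheet else sheet
    return [f"={quoted}!{column}{first_row + n * step}" for n in range(count)]

refs = every_nth_reference("Large Sheet", "B", 2572, 5)
for ref in refs:
    print(ref)
# ='Large Sheet'!B2572
# ='Large Sheet'!B2577
# ='Large Sheet'!B2582
# ='Large Sheet'!B2587
# ='Large Sheet'!B2592
```

The resulting strings could be pasted into a column as formulas, which gives live references without OFFSET.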
I have an old Excel table that I need to fill manually, and I made a new Excel table (created with CTRL+T) that fills in data automatically when I type a formula in the first row under the header cell.
My data is vertical in old:
Numbers Average (for last 10 Numbers from Left Row)
5,780.00
5,730.00
6,600.00
7,300.00
6,120.00
5,250.00
5,210.00
5,100.00
5,770.00
6,370.00 5923.00
6,000.00 5945.00
5,480.00 5920.00
5,120.00 5772.00
4,990.00 5541.00
This is how it should look, this is how I made it manually.
Formula is:
=IF(L11<>"",AVERAGE(L2:L11),"")
The formula is in column M (Average) and it checks and calculates over column L (Numbers).
But for Table to auto-fill till last row, formula has to be made in first row, then Excel auto-fills.
Average:
5923.00
is from this numbers:
Numbers
5,780.00
5,730.00
6,600.00
7,300.00
6,120.00
5,250.00
5,210.00
5,100.00
5,770.00
6,370.00
How can the average formula for the 10 numbers in column L be entered in the cells above 5923.00 in column M (Average)?
I do know how to fill a column manually: I could copy the formula, press CTRL + SHIFT + DOWN to find the end of my table and paste the formula. But when new data comes in (an imported CSV that updates), the new rows would not be filled; I need Excel to auto-fill them.
Here is an answer for anyone who needs it. If your data is sorted by date, with old data at the top and new data at the bottom, the average of the last 10 items can't be calculated in the first rows of the Average column without running into errors.
Here is solution for one way direction:
=IF(AND((ROW())>=11,L2<>""),AVERAGE(OFFSET(M2,-9,-1,10)),"")
What OFFSET does is go up 9 rows from the current cell (M2 in the first table row), then 1 column to the left, and from there take a range 10 rows tall.
If the IF condition is false, OFFSET is never evaluated, so it doesn't try to go left and up and there is no error; from row 11 onward the condition is true.
And this is more complicated to use with Table in Excel, when you have sorting by date, when newest date is on top, bottom is data that you can use for average:
=IF($A$2>$A$3,AVERAGE(OFFSET(M2,0,-1,10)),IF(AND((ROW())>=11,L2<>""),AVERAGE(OFFSET(M2,-9,-1,10)),""))
I have first IF (A2>A3) that checks how is table sorted:
- if sorted newest - oldest (1st case) then it takes average from first row on the left, and down 10 places
- if sorted opposite, it goes as said: left one place, and up 9, then takes 10 range.
Works like a charm, bit long, but it works!
You could add a test for which row the formula is in and only return a result if it's 11 or higher. Then you could enter it in the entire table column and it would fill automatically:
=IF(AND(ROW()>=11,L11<>""),AVERAGE(L2:L11),"")
ROW() returns the number of the row the formula is in.
EDIT: Ok, here's a better one. Put this in M2 and copy down:
=IF(AND(ROW()>=11,L2<>""),AVERAGE(OFFSET($L$1,ROW()-10,0):OFFSET($L$1,ROW()-1,0)),"")
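As a sanity check on what these formulas should produce, here is a short Python sketch of the same trailing 10-value average, run against the numbers posted in the question. `trailing_average` is just an illustrative name.

```python
def trailing_average(numbers, window=10):
    """Average of each value and the window-1 values above it, or "" if too few."""
    results = []
    for i in range(len(numbers)):
        if i + 1 < window:
            results.append("")  # mirrors the "" branch of the IF
        else:
            block = numbers[i + 1 - window:i + 1]
            results.append(sum(block) / window)
    return results

numbers = [5780, 5730, 6600, 7300, 6120, 5250, 5210, 5100, 5770, 6370,
           6000, 5480, 5120, 4990]

averages = trailing_average(numbers)
print(averages[9:])  # [5923.0, 5945.0, 5920.0, 5772.0, 5541.0]
```

The first nine entries stay blank and the tenth matches the 5923.00 from the manually built table.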
This is a confusing request.
I have an excel tab with a lot of data, for now I'll focus on 3 points of that data.
Team
Quarter
Task Name
In one tab I have a long list of this data displaying all the tasks for all the teams and what Quarter they will be on.
I WANT to load another tab, take that data (from the original tab) and insert it into a non-list format. So I would have Quarters 1,2,3,4 as columns going across the screen, and Team Groups going down. I want each "task" that is labeled as Q1 to know to list in the Q1 section of that Team's "Block".
So something like this: "If Column A=TeamA,AND Quarter=Q1, then insert Task Name ... here."
Basically, if the formula = true, I want to print a list of those items within that team section of the excel document.
I'd like to be able to add/move things around at the data level, and have things automatically shift in the Display tab. I honestly have no idea where to start.
If there is never a possibility that there could be more than 1 task for a given team and quarter, then you can use a formula solution.
Given a data setup like this (in a sheet named 'Sheet1'):
And expected results like this (in a different sheet):
The formula in cell B2 and copied over and down is:
=IFERROR(INDEX(Sheet1!$C$2:$C$7,MATCH(1,INDEX((Sheet1!$A$2:$A$7=$A2)*(Sheet1!$B$2:$B$7=B$1),),0)),"")
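In case the two-condition MATCH is hard to read, the same "first row where team AND quarter both match" logic looks roughly like this in Python. The task rows are invented sample data, and `first_task` is a hypothetical helper.

```python
def first_task(rows, team, quarter):
    """Return the task on the first row matching both team and quarter, else ""."""
    for row_team, row_quarter, task in rows:
        if row_team == team and row_quarter == quarter:
            return task
    return ""  # plays the role of IFERROR(..., "")

# Made-up stand-in for Sheet1!A2:C7 (Team, Quarter, Task Name).
rows = [
    ("Team A", "Q1", "Design"),
    ("Team A", "Q3", "Build"),
    ("Team B", "Q2", "Test"),
    ("Team B", "Q4", "Launch"),
]
teams = ["Team A", "Team B"]
quarters = ["Q1", "Q2", "Q3", "Q4"]

# One list per team, one entry per quarter, like the display sheet.
display = {team: [first_task(rows, team, q) for q in quarters] for team in teams}
for team, tasks in display.items():
    print(team, tasks)
```

As with the formula, only the first matching task is returned, so this layout only works when each team has at most one task per quarter.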
I came across this situation. When I have to insert the values into a table from an Excel sheet, I need all the information for a product in one row instead of spread over multiple rows. In Excel my data looks like:
ProductID----OrderID
9353510---- 1212259
9650934---- 1381676
9572474---- 1381677
9632365---- 1374217
9353182---- 1212260
9353182---- 1219361
9353182---- 1212815
9353513---- 1130308
9353320---- 1130288
9360957---- 1187479
9353077---- 1104558
9353077---- 1130926
9353124---- 1300853
I wanted single row for each product in shape of
(ProductID,'OrdersIDn1,OrderIDn2,.....')
For quick solution I fix it with a third column ColumnC to number the Sale of Product
=IF(A2<>A1,1,IF(A2=A1,C1+1,1))
and fourth Column D as a placeholder to concatenate with previous row value of same product:
=IF(A2=A1,D1&","&TEXT(B2,"########"),TEXT(B2,"########"))
Then Column E is the final column I required to hide/blank out duplicate row values and keep only the correct one:
=IF(A2<>A3,"("&A2&",'"&D2&"'),","")
Final Output required is only from Column E
ProductID Order Id Sno PlaceHolder Required Column
9353510 1212259 1 1212259 (9353510,'1212259'),
9650934 1381676 1 1381676 (9650934,'1381676'),
9572474 1381677 1 1381677 (9572474,'1381677'),
9632365 1374217 1 1374217 (9632365,'1374217'),
9353182 1212260 1 1212260
9353182 1219361 2 1212260,1219361
9353182 1212815 3 1212260,1219361,1212815 (9353182,'1212260,1219361,1212815'),
9353513 1130308 1 1130308 (9353513,'1130308'),
9353320 1130288 1 1130288 (9353320,'1130288'),
9360957 1187479 1 1187479 (9360957,'1187479'),
9353077 1104558 1 1104558
9353077 1130926 2 1104558,1130926 (9353077,'1104558,1130926')
You will notice that the final values appear only on the row with the maximum Sno for each product, which is what I need to avoid duplication.
In your case Product could be Team and Order could be Quarter and the output could be
(Team,Q1,Q2,....),
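The helper-column trick above amounts to a "group and join" operation. Here is a rough Python sketch of the same result, using a few of the ProductID/OrderID pairs from the table; `group_orders` is an illustrative name. Note one difference: the Excel formulas only merge rows that sit next to each other, whereas this sketch groups by ProductID wherever the rows appear.

```python
def group_orders(pairs):
    """Join all order IDs per product into (ProductID,'id1,id2,...') strings."""
    grouped = {}
    for product, order in pairs:
        grouped.setdefault(product, []).append(str(order))
    return [f"({product},'{','.join(orders)}')" for product, orders in grouped.items()]

# A subset of the pairs listed in the table above.
pairs = [
    (9353510, 1212259),
    (9353182, 1212260),
    (9353182, 1219361),
    (9353182, 1212815),
    (9353513, 1130308),
    (9353077, 1104558),
    (9353077, 1130926),
]

lines = group_orders(pairs)
print(",\n".join(lines))
```

Each product appears once, in the order it was first seen, with its order IDs joined by commas.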
Based on my understanding of your summary above, you want to put non-numerical data into a grid of teams and quarters.
The offset worksheet function will work well for this in conjunction with the match or vlookup functions. I have often done this task by doing the following steps.
In the data table, concatenate the Team and Quarter columns so that you have a unique lookup value in the leftmost column of the table (note: you can eventually hide this column for ease of reading).
Note: You will want to name the input range for best formula management. Ideally use an Excel Table (2007 or greater) or create a dynamically named range with the offset and CountA functions working together (http://tinyurl.com/yfhfsal)
First, VLOOKUP arguments are VLOOKUP(Lookup_Value,Table_Array,Col_Index_num,[Range Lookup]) See http://tinyurl.com/22t64x7
In the first cell of your output area you would have a VLOOKUP formula that would look like this
=VLOOKUP(TeamName&Quarter,Input_List,Column#_Where_Tasks_Are,FALSE)
The lookup value should reference the cells where you have the team names listed down the side and the quarter names across the top. The input list is the range on the sheet where your data is stored. Column#_Where_Tasks_Are is the number of the column in your source data that holds the tasks (for example 3), and FALSE tells the function to accept only an exact match in your output.
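The concatenated-key idea can be sketched in Python as a dictionary keyed on team plus quarter. The rows below are invented sample data, and `build_lookup` is a hypothetical helper; like VLOOKUP with an exact match, it keeps the first row found for each key.

```python
def build_lookup(rows):
    """Key each task by team+quarter, keeping the first row for each key."""
    lookup = {}
    for team, quarter, task in rows:
        # setdefault leaves an existing entry alone, so the first match wins.
        lookup.setdefault(team + quarter, task)
    return lookup

# Made-up stand-in for the source list (Team, Quarter, Task Name).
rows = [
    ("TeamA", "Q1", "Plan budget"),
    ("TeamA", "Q2", "Hire staff"),
    ("TeamB", "Q1", "Audit"),
    ("TeamA", "Q1", "Second Q1 task"),  # ignored: the key TeamAQ1 already exists
]

lookup = build_lookup(rows)
print(lookup.get("TeamA" + "Q1", ""))  # Plan budget
print(lookup.get("TeamB" + "Q3", ""))  # empty string, no such key
```

The last sample row shows the limitation: a second task for the same team and quarter never shows up, which is the same behaviour you get from the VLOOKUP approach.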