Because of the structural nature of the dataset, I plan to use VBA in Excel 2010 to process it. The dataset has three main columns, as illustrated:
FromID, ToID, Amount
10, 10, 50
10, 11, 67
10, 12, 56
11, 10, 60
11, 11, 80
12, 10, 17
12, 11, 57
Of course this is a simplified version of the data; the original data is much more complicated than this. FromID is the point that sends the data and ToID is the point that receives it. Amount indicates the size of the data. What I want is to use FromID and ToID to generate an n*n matrix that stores the dataset in matrix format in Excel.
The matrix should look as follows:
10 11 12
10 --- 50 67 56
11 --- 60 80 17
12 --- ...
I currently have this type of data in columns, but I am a noob in VBA and don't have much experience. Could you give me some suggestions about the logic (in detail, if possible) and, if you can, provide some code snippets with explanations of how to do this?
Many thanks!
You may not need any VBA coding. Excel has a facility called a Pivot Table to create two dimensional tables of this type. See:
Introduction
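For anyone who wants to see the logic a pivot table applies here, the following is a minimal sketch in Python rather than VBA. The function name to_matrix is made up for illustration, and two assumptions are mine, not the asker's: pairs that never occur are filled with 0, and repeated FromID/ToID pairs are summed (as a pivot table set to Sum would do).

```python
# Sample rows from the question: (FromID, ToID, Amount)
rows = [
    (10, 10, 50), (10, 11, 67), (10, 12, 56),
    (11, 10, 60), (11, 11, 80),
    (12, 10, 17), (12, 11, 57),
]

def to_matrix(rows):
    """Build an n*n matrix with FromID on the rows and ToID on the columns."""
    # Every ID that appears as sender or receiver gets one row and one column.
    ids = sorted({r[0] for r in rows} | {r[1] for r in rows})
    position = {node: k for k, node in enumerate(ids)}
    # Assumption: pairs with no data are stored as 0.
    matrix = [[0] * len(ids) for _ in ids]
    for from_id, to_id, amount in rows:
        # Assumption: duplicate pairs are summed.
        matrix[position[from_id]][position[to_id]] += amount
    return ids, matrix

ids, matrix = to_matrix(rows)
for node, line in zip(ids, matrix):
    print(node, line)
```

The same three steps (collect the distinct IDs, map each ID to a row/column position, then drop each amount into its cell) carry over directly to a VBA loop if a macro is preferred.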
I have a dataset like this:
10, 23, 43, 45, 56;
12, 25, 21, 23, 40;
I want to know the average of the difference between the two rows like
mean (10 - 12, 23 - 25, 43 -21 ...)
Of course, this is only an example and the actual rows are hundreds of elements long. I would like to compute the average of the differences without first having to store the differences somewhere and then average them. (The sheet is already pretty big.)
Thanks a lot
Mathematically, what you are asking for is identical to:
=AVERAGE(A1:E1)-AVERAGE(A2:E2)
Regards
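A quick way to convince yourself of this identity, sketched in Python with the two sample rows from the question (the variable names are only illustrative):

```python
from statistics import mean

row1 = [10, 23, 43, 45, 56]
row2 = [12, 25, 21, 23, 40]

# Average of the element-wise differences ...
mean_of_differences = mean(a - b for a, b in zip(row1, row2))
# ... versus the difference of the two row averages.
difference_of_means = mean(row1) - mean(row2)

print(mean_of_differences, difference_of_means)
```

The two values agree up to floating-point rounding, because summing the differences and dividing by the count is the same as dividing each row's sum by that count and subtracting. This only holds when both rows have the same number of values.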
Try,
=AVERAGE(INDEX((A1:E1)-(A2:E2), , ))
If there were missing values in one range or the other, you would need something like
=AVERAGEIFS(A1:G1,A1:G1,"<>",A2:G2,"<>")-AVERAGEIFS(A2:G2,A1:G1,"<>",A2:G2,"<>")
(I have tested it with blanks in G1 and F2)
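The idea behind the AVERAGEIFS version is to keep only the positions where both rows have a value. A rough Python sketch of that logic, assuming blanks are represented as None (the function name mean_paired_difference and the extra values 7 and 9 are made up for the illustration):

```python
def mean_paired_difference(row1, row2):
    """Average of row1 - row2 over positions where neither value is missing."""
    # Keep a pair only when both cells are filled, like the two "<>" criteria.
    pairs = [(a, b) for a, b in zip(row1, row2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no position has a value in both rows")
    return sum(a - b for a, b in pairs) / len(pairs)

# Blanks in the last cell of row 1 and the second-to-last cell of row 2,
# mirroring the test with blanks in G1 and F2.
row1 = [10, 23, 43, 45, 56, 7, None]
row2 = [12, 25, 21, 23, 40, None, 9]
result = mean_paired_difference(row1, row2)
print(result)
```

The unpaired values 7 and 9 are ignored, so the result matches the one for the five complete columns.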
I have this file for work (and 7000 others of the same format) that is very messy and not tidy in any way. I've been reading about tidying data using Pandas but feel I'm spinning my wheels at this point...
Here is the raw data viewed in Excel:
Here is some example text from the CSV:
Section 6. Reserve Summary
Ten Minute Reserve Requirement:, 1801
Ten Minute Reserve Estimate:, 1801
Thirty Minute Reserve Requirement:, 626
Thirty Minute Reserve Estimate:, 1926
Expected Actions of OP 4:, 0
Additional Capacity Available from OP 4 Actions:, 0
Section 7. Interchange Summary
Description, Import Limit MW, Export Limit MW, Scheduled, Contract
Highgate, -225, 0, -225
NB, -550, 200, -432
NYISO AC, -1400, 1200, 0
NYISO CSC, -346, 330, 330
NYISO NNC, -200, 200, 194
Phase 2 -2000 1200 -1501
Section 8. Weather Forecast Summary for the Peak Hour
City, Conditions, Wind, High Temperature (F)
Boston, Partly Cloudy, NE-10, 66
Hartford, Mostly Clear, N-12, 77
You can see column A is useless, so I can remove it. Column B mostly has variable names but also has section names (rows 7, 9, 11...). Sometimes column B has the value, but most of the time the value is listed in column C, and sometimes in column D. Lines 44-54 have some extra formatting going on, where there is a table of variable names and values...
Anyway, I absolutely do not have the skills to turn this into a tidy dataframe and will need to throw this to someone else. However, I'm hoping anyone can give advice on what to do. Is this even called 'data cleaning' or 'data structuring'?
I dropped Col A, then transposed the data, but that is far from setting this dataframe up correctly. What are other techniques to move data into the tidy structure needed?
Any resources shared would be great! I searched for too long on 'tidy data', 'data cleaning', 'data structuring' but all were too simplistic compared to this application.
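One common approach for files like this is to read them line by line, remember which section you are in, and emit one long-format record per value. Below is a minimal sketch using only Python's csv module. It rests on assumptions taken from the sample text above, not from the real files: section titles start with "Section ", single values sit on lines whose name ends with ":", and any other first line in a section is a table header. The name parse_report and the record layout (section, row label, variable, value) are hypothetical.

```python
import csv
import io

def parse_report(text):
    """Turn a sectioned report into (section, row_label, variable, value) records."""
    records = []
    section = None
    header = None  # column names of the table being read, if any
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        cells = [c.strip() for c in row]
        while cells and not cells[0]:
            cells.pop(0)  # drop empty leading cells (the unused column A)
        if not cells:
            continue
        if cells[0].startswith("Section "):
            section, header = cells[0], None
        elif cells[0].endswith(":"):
            # "Name:, value" line; the value may sit one or two columns over.
            value = next((c for c in cells[1:] if c), None)
            records.append((section, "", cells[0].rstrip(":"), value))
        elif header is None:
            header = cells  # assumption: first other line in a section is a header
        else:
            # Table row: pair each value with its column name.
            for name, value in zip(header[1:], cells[1:]):
                if value:
                    records.append((section, cells[0], name, value))
    return records

sample = """\
Section 6. Reserve Summary
Ten Minute Reserve Requirement:, 1801
Thirty Minute Reserve Estimate:, 1926
Section 7. Interchange Summary
Description, Import Limit MW, Export Limit MW, Scheduled, Contract
Highgate, -225, 0, -225
NB, -550, 200, -432
"""

records = parse_report(sample)
for record in records:
    print(record)
```

A list of records in this shape can be handed to pandas.DataFrame(records, columns=[...]) to get a tidy frame, and the same function can be run over every file in a loop. Rows that do not follow the comma layout would need their own handling.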
Okay, so honestly this is a homework question, but I really did my best to find the solution, and I think I partially did.
The question:
We are given a series of cities whose positions are represented by a single coordinate, and we are supposed to place a given number of hospitals in cities so that the sum of each city's distance to its nearest hospital is minimal.
That is, if we are given the cities at 1, 3, 5, 7, 9, 11, 13 and if we are going to put 3 hospitals, the hospitals will be at 3, 7, 11 (actually there could be multiple best solutions for this one, did not check).
We are advised to use dynamic programming and to first consider the case in which we place only one hospital.
I've figured out how to find each subsequent hospital's location. I create a table of cities against cities. In each cell I put the smaller of two values: the distance from the current row's city to the closest hospital already built, or that city's distance to the city of the corresponding column.
For example, if we had already placed a hospital at 1, it would look like:
*-1-3-5-7-9-11-13
1|0|0|0|0|0|0|0|
3|2|0|2|2|2|2|2|
5|4|2|0|2|4|4|4|
..............
then sum the columns and find the next hospital.
The problem is, I cannot figure out the first hospital that I'm supposed to build!!
When I manually add one of the elements of the actual solution, I get the right answer, so my partial solution should be correct.
BTW complexity should be O(CityNum^2), hospitalNum is a constant. So I can't use bruteforce.
An example input and output (from the homework assg):
Input:
10 5 (10 is city num, 5 is hospital num)
1 2 3 6 7 9 11 22 44 50 (coordinates)
Output:
9 (sum of minimum distances)
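For reference, here is one standard way to set up the dynamic programming, sketched in Python. It is not necessarily what the assignment intends. It relies on two facts: with the cities sorted, each hospital serves a contiguous run of cities, and a single hospital serving a run is best placed at the run's median city (which also answers the one-hospital case). The function and variable names are made up for the sketch, and it assumes there are at most as many hospitals as cities.

```python
def min_total_distance(cities, hospitals):
    """Minimum sum of distances from each city to its nearest hospital."""
    xs = sorted(cities)
    n = len(xs)
    prefix = [0]  # prefix[i] = sum of the first i coordinates
    for x in xs:
        prefix.append(prefix[-1] + x)

    def cost(i, j):
        # Cost of serving cities xs[i..j] (inclusive) from their median city.
        m = (i + j) // 2
        left = xs[m] * (m - i) - (prefix[m] - prefix[i])
        right = (prefix[j + 1] - prefix[m + 1]) - xs[m] * (j - m)
        return left + right

    INF = float("inf")
    # best[k][j] = minimum cost of serving the first j cities with k hospitals
    best = [[INF] * (n + 1) for _ in range(hospitals + 1)]
    best[0][0] = 0
    for k in range(1, hospitals + 1):
        for j in range(1, n + 1):
            # The last hospital serves cities i..j-1; the rest use k-1 hospitals.
            for i in range(k - 1, j):
                if best[k - 1][i] < INF:
                    best[k][j] = min(best[k][j], best[k - 1][i] + cost(i, j - 1))
    return best[hospitals][n]

print(min_total_distance([1, 2, 3, 6, 7, 9, 11, 22, 44, 50], 5))
```

On the sample input this prints 9, matching the expected output. Each cost lookup is constant time thanks to the prefix sums, so the loops run in O(hospitalNum * CityNum^2), which is O(CityNum^2) when hospitalNum is treated as a constant.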
I need a formula in Excel that subtracts a given number from the next multiple of 20. For example:
2 would give you 18,
10 would give 10,
23 would give 17,
118 would give 2,
321 would give 19.
It's worth noting that Excel has built-in functions for working with multiples: CEILING and FLOOR (in newer versions you have CEILING.MATH and FLOOR.MATH).
In your case, this should work:
=CEILING(A1,20)-A1
You want to divide it by 20, round it up, and multiply it by 20; the rest is trivial.
=-(A1-ROUNDUP(A1/20,0)*20)
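To check the arithmetic behind both formulas outside Excel, here is a small Python sketch (the function names are made up). The first function follows the divide, round up, multiply recipe; the second uses the modulo operator, which gives the same result for integers.

```python
import math

def gap_to_next_multiple(n, step=20):
    """Round n up to a multiple of step, then subtract n."""
    return math.ceil(n / step) * step - n

def gap_via_modulo(n, step=20):
    """Same result for integers: Python's % always returns a non-negative remainder."""
    return (-n) % step

for n in (2, 10, 23, 118, 321):
    print(n, gap_to_next_multiple(n), gap_via_modulo(n))
```

Note that a number which is already a multiple of 20 (for example 40) gives 0 with either approach, just as =CEILING(A1,20)-A1 does.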
I have an excel sheet that is organized as follows:
COL1 COL2
1 30
2 30
3 29
4 12
5 12
6 12
As you can see above, values are repeated in COL2. I need to group the rows by these values, count each group, and then place each group in a separate workbook. So, for example, the output should be
Total Records: 2
1, 30, ......
2, 30, ......
Total Records: 1
3, 29, ......
Total Records: 3
4, 12, ......
5, 12, ......
6, 12, ......
Once that is calculated, I need the groups exported into separate Excel sheets.
Can someone please help me figure out the best approach? How can this be done in Excel?
You can create pivot tables to do this. Create one pivot table for all your data, then duplicate and separate one pivot per value in Column 2. Put each of those pivots in the new work sheets as you need. If you need help with setting the pivots up, please take a look at this site.
This can also be done in VBA, if this is a large task or something you will need to do on a regular basis. Since you didn't ask for the VBA, I'll assume for now that you mean to do this manually.
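If you do end up scripting it, the grouping logic is short. Here is a rough sketch in Python rather than VBA, using the sample rows from the question; the function names group_by_col2 and format_groups are made up, and the export step is left out.

```python
from collections import defaultdict

# Sample rows from the question: (COL1, COL2)
rows = [(1, 30), (2, 30), (3, 29), (4, 12), (5, 12), (6, 12)]

def group_by_col2(rows):
    """Collect rows under their COL2 value, keeping first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)

def format_groups(groups):
    """Produce the 'Total Records' layout shown in the question."""
    lines = []
    for members in groups.values():
        lines.append(f"Total Records: {len(members)}")
        lines.extend(f"{col1}, {col2}" for col1, col2 in members)
    return lines

groups = group_by_col2(rows)
print("\n".join(format_groups(groups)))
```

From here, each entry in groups could be written to its own file or sheet, for instance one CSV per COL2 value with the csv module.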