Split Excel worksheet using Python - excel

I have excel files (hundreds of them) that look like this (sensor output):
Column1 Column2 Column3
Serial Number:
10004
Ref. Temp:
25C
Ref. Pressure:
1KPa
Time Temp. Pres.
1 21 1
2 22 1.1
3 23 1.2
. . .
. . .
. . .
I want to split this into two parts, the information section (top part) and data section (the rest), something like this:
Information section
Column1 Column2 Column3
Serial Number:
10004
Ref. Temp:
25C
Ref. Pressure:
1KPa
Data section:
Column1 Column2 Column3
Time Temp. Pres.
1 21 1
2 22 1.1
3 23 1.2
. . .
. . .
. . .
If the sheet is converted to a data frame, I don't want the first row and first column to become the header and index of the data frame. I am using Python 2.7 and numpy.

Make two copies of the worksheet.
In copy A, loop down the first column looking for the word Time. Once the loop finds it, delete every row above it.
Remember that row number in a variable.
In copy B, delete everything from the remembered row down to the last row of the sheet (row 2^20, i.e. 1,048,576).
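The steps above can be sketched in Python without touching the workbook itself. This is a minimal sketch, not a definitive implementation: it assumes the sheet has already been loaded as a list of rows (for example with pandas.read_excel(path, header=None).values.tolist(), where header=None keeps the first row from becoming the header), and the function name split_sheet and the sample rows are hypothetical.

```python
def split_sheet(rows, marker="Time"):
    """Split rows into (info, data) at the first row whose first cell equals marker."""
    for i, row in enumerate(rows):
        first = row[0] if row else None
        # Only text cells can match; numeric cells such as 10004 are skipped.
        if hasattr(first, "strip") and first.strip() == marker:
            return rows[:i], rows[i:]
    raise ValueError("marker row not found: %r" % marker)


# Hypothetical rows mirroring the sensor output in the question.
rows = [
    ["Serial Number:", None, None],
    [10004, None, None],
    ["Ref. Temp:", None, None],
    ["25C", None, None],
    ["Ref. Pressure:", None, None],
    ["1KPa", None, None],
    ["Time", "Temp.", "Pres."],
    [1, 21, 1],
    [2, 22, 1.1],
    [3, 23, 1.2],
]

info, data = split_sheet(rows)
```

Here info holds the six information rows and data starts with the Time / Temp. / Pres. row, so neither part needs a header or index taken from the sheet.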

Related

Aggregating records with two main IDs in [VBA macro]

I want to write a macro in Excel that summarizes rows sharing a composite ID built from two ID columns. In my Excel sheet, each row has two ID columns: ID_1 is the main key, and ID_2 is a secondary key of which I only care about the first 2 letters (which I extract using LEFT). I want to group rows with the same ID_1 and the same first 2 letters of ID_2, and report the sums of the value, count, and sum columns.
In the example picture below, I want to turn the data in columns A:J into the data in columns M:V.
In this example there are 6 records with ID_1 = 1015 and 3 different ID_2 prefixes (AB, AZ, AE). I want to collapse them into one row per combination (1015 AB; 1015 AZ; 1015 AE), summing the values of the individual records. For instance, there are 3 records for 1015 AB with VALUE 2, 3 and 4, so the result should be a single row: 1015 AB, 9 (sum of value), 4 (sum of count), 17 (sum of value * count). Note that this 17 doesn't come from 9 * 4; it is =SUM(I4:I6). The matching rows may also be spread out, as in the 1200 FF example below. I am still trying to sort on both keys at once, but I can't get past that.
Add a helper column in D that combines ID_1 and the first 2 characters of ID_2: =A4 & LEFT(C4,2). Copy that down, then go to L4 and type in:
=+INDEX($D$4:$D$25,MATCH(0,COUNTIF(L$3:L3,$D$4:$D$25),0))
and press Ctrl + Shift + Enter to enter it as an array formula. Copy down to get a list of unique combinations, and then split these values into the separate columns.
Finally to pull in the numbers, put this in Q4:
=SUMIFS(E$4:E$25,$A$4:$A$25,M4,$C$4:$C$25,O4 & "*")
and then copy down and across.
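For comparison, the same grouping can be sketched outside Excel in Python. This is only an illustration under stated assumptions: the function name aggregate, the tuple layout (ID_1, ID_2, value, count, total) and the sample records are hypothetical and are not taken from the asker's sheet.

```python
from collections import OrderedDict


def aggregate(records):
    """Sum value, count and total per (ID_1, first two letters of ID_2)."""
    groups = OrderedDict()
    for id_1, id_2, value, count, total in records:
        key = (id_1, id_2[:2])  # same idea as the helper column: A4 & LEFT(C4,2)
        sums = groups.setdefault(key, [0, 0, 0])
        sums[0] += value
        sums[1] += count
        sums[2] += total
    return groups


# Hypothetical records; rows of one group do not need to be adjacent.
records = [
    (1015, "AB01", 2, 1, 2),
    (1015, "AZ07", 5, 2, 10),
    (1015, "AB02", 3, 1, 3),
    (1200, "FF11", 1, 4, 4),
    (1015, "AB03", 4, 2, 8),
    (1200, "FF12", 2, 1, 2),
]

summary = aggregate(records)
```

Each total is summed row by row rather than recomputed from the summed value and count, which matches the point in the question that the summed product is not the product of the sums.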

Grouping rows together in excel if they are scattered based on a field

Let's say we have an Excel sheet with customerid and amount. If I sort the sheet on amount, my customers will be scattered. So I want to sort by amount and then group each customer's amounts together while maintaining that sort.
If I have the data below:
Row 1. X 200
Row 2. Y 245
Row 3. Z 45
Row 4. Y 456
Row 5. Z 23
Row 6. T 5678
I want the output to be:
T 5678
Y 456
Y 245
X 200
Z 45
Z 23
When you select a single column and perform a sort, it disregards values from other columns. So, before sorting your Excel sheet, you need to select the entire sheet:
After that, choose "sort" from the menu, and select sort by column A and then by column B:
Result:
Please see the data sorted on Column B (with the data for Z modified to 999).
Sorted on Column B
Below is the result I want to achieve. That is, after sorting on Column B, the rows should be rearranged so that each customer's data is kept together. In the result below, the second-to-last row, Z 999, had to be moved to sit below Z 23.
Final Result
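The ordering asked for in the question (customers ranked by their largest amount, with each customer's rows kept together) can be sketched in Python as follows. This is a sketch under assumptions, not an Excel feature: the function name group_sorted is hypothetical, and descending order is assumed because that is what the example output in the question shows.

```python
def group_sorted(rows):
    """Order customers by their largest amount (descending) and keep each
    customer's rows together, also sorted by amount descending."""
    best = {}
    for customer, amount in rows:
        best[customer] = max(amount, best.get(customer, amount))
    # Sort key: the customer's best amount, then the customer, then the row amount.
    return sorted(rows, key=lambda r: (-best[r[0]], r[0], -r[1]))


rows = [("X", 200), ("Y", 245), ("Z", 45), ("Y", 456), ("Z", 23), ("T", 5678)]
result = group_sorted(rows)
# result: T 5678, Y 456, Y 245, X 200, Z 45, Z 23
```

In a worksheet, the same idea corresponds to adding a helper column holding each customer's maximum amount and sorting on that column first, then on customer, then on amount.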

Average top n values where next cell equals value

Given the data below, I would like to calculate the average of the top n values, or the top n% of values, in column B where the value in column A equals C1.
I have tried many different formulas but can't get any of them to work.
{=AVERAGEIFS(B1:B14,A1:A14,C1,B1:B14,(IF(B1:B14>PERCENTILE(B1:B14,0.7),B1:B14)))}
{=AVERAGE(IF(A1:A14=C1,IF(B1:B14>PERCENTILE(B1:B14,0.7),B1:B14)))}
A B C
1 a 1 a
2 a 1 cs
3 cs 1 ffs
4 a 1 .
5 a 1 .
6 ffs 1 .
7 a 1 .
8 a 1 .
9 as 1 .
10 a 1 .
11 sfaq 1 .
12 a 1 .
13 aasf 1 .
14 a 1 .
15 a 1 .
16 a 1 .
17 qw 1 .
. . . .
. . . .
. . . .
I think your AVERAGE version is along the right lines, but when you calculate the PERCENTILE you also need to include the column A condition within that calculation, e.g. this array formula:
=AVERAGE(IF(A1:A14=C1,IF(B1:B14>PERCENTILE(IF(A1:A14=C1,B1:B14),0.7),B1:B14)))
confirmed with CTRL+SHIFT+ENTER.
You can get the same result with a regular (non-array) formula using AVERAGEIFS if you use the AGGREGATE function to calculate the PERCENTILE, like this:
=AVERAGEIFS(B1:B14,A1:A14,C1,B1:B14,">"&AGGREGATE(16,6,B1:B14/(A1:A14=C1),0.7))
I expect the two formulas to give the same results.
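The logic of both formulas (filter on column A first, take the percentile of only the matching values, then average what lies above it) can be sketched in Python. This is a minimal illustration: the function names and the sample data are hypothetical, and percentile_inc is a hand-written linear-interpolation percentile intended to behave like Excel's PERCENTILE.

```python
import math


def percentile_inc(values, k):
    """Percentile by linear interpolation at rank k * (n - 1)."""
    ordered = sorted(values)
    rank = k * (len(ordered) - 1)
    low = int(math.floor(rank))
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (rank - low) * (ordered[high] - ordered[low])


def average_top(keys, values, wanted, k=0.7):
    """Average the values for `wanted` that lie above that key's own k-th percentile."""
    matching = [v for key, v in zip(keys, values) if key == wanted]
    if not matching:
        return None
    cutoff = percentile_inc(matching, k)
    top = [v for v in matching if v > cutoff]
    # No value is strictly above the cutoff when, e.g., all values are equal.
    return sum(top) / float(len(top)) if top else None


# Hypothetical data: ten "a" rows plus two "cs" rows that must not affect the cutoff.
keys = ["a"] * 10 + ["cs", "cs"]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 200]
top_average = average_top(keys, values, "a")
```

With this data the cutoff for "a" is computed from the values 1 to 10 only, so the "cs" rows do not shift it, which is the correction the answer makes to the original formula.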

Importing SAS column names from Excel row

I am trying to create a SAS table from an XLSX Excel file that looks like the one below. The SAS column names should come from the 3rd row of the Excel file, and the data should be read starting from the 5th row.
A B C D F ...
1
2
3 Date Period Rate Rate down Rate up ...
4
5 2015-04-30 1 0.25 0.23 0.27 ...
6 2015-05-31 2 0.21 0.19 0.23 ...
. .........................................
. .........................................
I am using proc import to gather the table as below:
proc import datafile = have out=want DBMS = excel;
GETNAMES=YES; MIXED=YES; SCANTEXT=YES; USEDATE=YES; DATAROW=5;
run;
The problem is that PROC IMPORT reads the column names in the 3rd row as numeric, like the rest of the Excel file, so SAS puts "." in place of column names such as Date or Rate because it cannot interpret them as numeric values.
I found PROC IMPORT options such as DATAROW=5 to read the data from the fifth row, MIXED=YES to indicate that the Excel table includes both numeric and character values, GETNAMES=YES to get column names from the table, and SCANTEXT=YES to scan the text columns. However, even with those options I got the SAS table below. The whole SAS table is in numeric format, so it can't resolve the names from Excel:
F1 F2 F3 F4 F5 ...
1 . . . . . ...
2 . . . . . ...
3 30APR2015 1 0.25 0.23 0.27 ...
4 31MAY2015 2 0.21 0.19 0.23 ...
. ...............................
. ...............................
Any idea how to import the 3rd row of the XLSX file as the column names in the SAS table?
OK, I found the solution. I just needed to add a simple option like RANGE='A3:G2000'. Strangely, I got an error with the option DATAROW=5, so I removed it. So the code becomes:
proc import datafile = have out=want DBMS = excel;
GETNAMES=YES; MIXED=YES; SCANTEXT=YES; USEDATE=YES; RANGE='A3:G2000';
run;
Now it works. But the RANGE option is not mentioned on many web pages, so it was difficult to find.
It is also strange that SAS couldn't recognize that character values like "Date" should be in character format. But it does recognize them when you use the RANGE option?

excel map values to same key column

From a text file I copy 10 lines of text in the format productName#qty.
The first time around, the lines could be in the following order. I paste them into Excel and split the data on #.
A#10 -> A 10
D#25 -> D 25
The second time around, they could be in the following order. I do the same as before.
B#10 -> B 10
A#12 -> A 12
I want to merge the 2 sets of data, and I want the output to be something like this:
A 10 12
B 10
D 25
Any help on how to do this? I don't know programming or macros, so a detailed description would be greatly appreciated.
Add a column for 'time around' and create a PivotTable with that column for COLUMNS, Product for ROWS, and Sum of Qty for Σ VALUES.
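For readers who do want a scripted version, the PivotTable layout described above can be sketched in Python. This is a hedged illustration only: the function name pivot is hypothetical, and the sketch assumes each pasted batch is a list of productName#qty strings, with one output column per batch.

```python
def pivot(batches):
    """Build {product: [qty in batch 1, qty in batch 2, ...]}; None marks a missing product."""
    table = {}
    for i, lines in enumerate(batches):
        for line in lines:
            name, qty = line.split("#")
            row = table.setdefault(name, [None] * len(batches))
            # Sum repeated products within one batch, as Sum of Qty would.
            row[i] = (row[i] or 0) + int(qty)
    return table


batches = [["A#10", "D#25"], ["B#10", "A#12"]]
merged = pivot(batches)
for name in sorted(merged):
    print(name, merged[name])
```

Each product gets one row and each 'time around' gets one column, so a product missing from a batch shows up as an empty cell rather than shifting the other quantities.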
