I need to turn this excel sheet (one number per cell):
A B C D E F G
--------------------------------
1 | 1 2 3 4 5 6 7
2 | 8 9 10 11 12 13 14
3 | 15 16 17 18 19 20 21
into this (with all spaces between numbers, each row in one cell):
A
------------
| 1
1 | 2
| 3 4 5
| 6 7
—-line break—-
| 8
2 | 9
| 10 11 12
| 13 14
—-line break—-
| 15
3 | 16
| 17 18 19
| 20 21
Does anyone have any ideas or suggestions?
After a little playing around, I finally found a formula that worked for the above. The CHAR(10)s are the line breaks (the cell needs Wrap Text turned on for them to show).
=A1&CHAR(10)&B1&CHAR(10)&CONCATENATE(C1," ",D1," ",E1)&CHAR(10)&CONCATENATE(F1," ",G1)&CHAR(10)
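For anyone checking the layout outside Excel, here is a minimal Python sketch of the same joining logic. It is not part of the original answer; the function name `format_row` is made up for illustration, and `"\n"` stands in for CHAR(10).

```python
def format_row(values):
    """Join seven cell values into one multi-line string.

    Mirrors the formula: A, then B, then "C D E", then "F G",
    each followed by a line break (the role CHAR(10) plays in Excel).
    """
    a, b, c, d, e, f, g = values
    return f"{a}\n{b}\n{c} {d} {e}\n{f} {g}\n"


rows = [
    [1, 2, 3, 4, 5, 6, 7],
    [8, 9, 10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19, 20, 21],
]
cells = [format_row(row) for row in rows]
print(cells[0])
```

Each element of `cells` corresponds to what one output cell would hold.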
I want to create a new column in Python dataframe with specific requirements from other columns. For example, my python dataframe df:
A | B
-----------
5 | 0
5 | 1
15 | 1
10 | 1
10 | 1
20 | 2
15 | 2
10 | 2
5 | 3
15 | 3
10 | 4
20 | 0
I want to create a new column C with the requirements below:
1. When the value of B = 0, then C = 0.
2. Rows with the same value in B will have the same value in C. The rows sharing a value of B are classified as start, middle, and end. So value 1 has 1 start, 2 middle, and 1 end row; value 3 has 1 start, 0 middle, and 1 end row. The calculation for each section, with threshold = 10 and rows numbered from 1, looks like this for B = 1:
Start :
C.loc[2] = min(threshold, A.loc[1]) + A.loc[2]
Middle :
C.loc[3] = A.loc[3]
C.loc[4] = A.loc[4]
End:
C.loc[5] = min(threshold, A.loc[6])
However, the output value of C will be the sum of the above calculations, here 10 + 15 + 10 + 10 = 45.
3. When the value of B is unique and not 0, for example B = 4 (row 11):
C.loc[11] = min(threshold, A.loc[10]) + min(threshold, A.loc[12])
I can solve the B = 0 case (point 1) and point 3, but I'm struggling to solve point 2.
So, the final output will be:
A | B | C
--------------------
5 | 0 | 0
5 | 1 | 45
15 | 1 | 45
10 | 1 | 45
10 | 1 | 45
20 | 2 | 50
15 | 2 | 50
10 | 2 | 50
5 | 3 | 25
15 | 3 | 25
10 | 4 | 20
20 | 0 | 0
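One possible way to compute C, as a rough pandas sketch rather than a definitive answer. It assumes that rows with the same B are adjacent (so each group is one consecutive run), and that a missing neighbour at the very top or bottom of the frame contributes 0; both are assumptions not stated in the question. Read that way, the rules collapse into one expression per run: min(threshold, A of the row before) + sum of A over every row of the run except the last + min(threshold, A of the row after).

```python
import pandas as pd

df = pd.DataFrame({
    "A": [5, 5, 15, 10, 10, 20, 15, 10, 5, 15, 10, 20],
    "B": [0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 0],
})
threshold = 10

a = df["A"].to_numpy()
b = df["B"].to_numpy()

# Label each run of consecutive equal B values (assumption: equal values are adjacent).
run_id = (df["B"] != df["B"].shift()).cumsum()

totals = {}
for rid, rows in df.groupby(run_id).indices.items():
    first, last = rows[0], rows[-1]          # positional bounds of the run
    if b[first] == 0:
        totals[rid] = 0                      # rule 1: B = 0 gives C = 0
        continue
    # Neighbouring rows are capped at the threshold; a missing neighbour
    # counts as 0 (assumption, the question does not cover this case).
    before = min(threshold, a[first - 1]) if first > 0 else 0
    after = min(threshold, a[last + 1]) if last + 1 < len(a) else 0
    inside = a[first:last].sum()             # start and middle rows, not the end row
    totals[rid] = int(before + inside + after)

# Broadcast each run's total back to all of its rows.
df["C"] = run_id.map(totals)
print(df)
```

For a run of length one, `a[first:last]` is empty, so the result is just the two capped neighbours, which matches the B = 4 case.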
I want to create a new column in python dataframe based on other column values in multiple rows.
For example, my python dataframe df:
A | B
------------
10 | 1
20 | 1
30 | 1
10 | 1
10 | 2
15 | 3
10 | 3
I want to create a variable C that is based on the value of variable A, conditioned on variable B across multiple rows. When variable B has the same value in rows i, i+1, ..., the value of C is the sum of variable A over those rows. In this case, my output data frame will be:
A | B | C
--------------------
10 | 1 | 70
20 | 1 | 70
30 | 1 | 70
10 | 1 | 70
10 | 2 | 10
15 | 3 | 25
10 | 3 | 25
I have no idea what the best way to achieve this is. Can anyone help?
Thanks in advance
First, recreate the data:
import pandas as pd
A = [10,20,30,10,10,15,10]
B = [1,1,1,1,2,3,3]
df = pd.DataFrame({'A':A, 'B':B})
df
A B
0 10 1
1 20 1
2 30 1
3 10 1
4 10 2
5 15 3
6 10 3
Then I'll create a lookup Series from the df:
lookup = df.groupby('B')['A'].sum()
lookup
A
B
1 70
2 10
3 25
And then I'll use that lookup on the df using apply:
df.loc[:,'C'] = df.apply(lambda row: lookup[lookup.index == row['B']].values[0], axis=1)
df
A B C
0 10 1 70
1 20 1 70
2 30 1 70
3 10 1 70
4 10 2 10
5 15 3 25
6 10 3 25
You can use the groupby() method to group the rows on B, then transform('sum') on A so that each group's total is broadcast back to every row of the group.
df['C'] = df.groupby('B')['A'].transform('sum')
I know it's likely possible to do this with awk, but I have no idea how to do it.
Suppose I have the following 2 tab separated files, where there are blank lines that only contain \n:
file1:
A 1 4
B 2 5
C 3 6
D 7 10
E 8 11
A 9 12
file2:
E 13 16
F 14 17
G 15 18
H 19 22
I 20 23
J 21 24
I want to generate a new file which corresponds to the concatenation of the first 2 columns from file 1 with the third column from file 2, and then the third column from file 1:
final file:
A 1 16 4
B 2 17 5
C 3 18 6
D 7 22 10
E 8 23 11
A 9 24 12
Note that, in the final file, it's important that the blank lines should be kept blank, and no tabs should be inserted in there.
Simple paste + awk combination:
paste file1 file2 | awk -v OFS='\t' '!NF{ print "" } NF{ print $1,$2,$6,$3 }'
The output:
A 1 16 4
B 2 17 5
C 3 18 6
D 7 22 10
E 8 23 11
A 9 24 12
awk -v OFS='\t' 'NR==FNR{a[NR]=$3;next} NF{$3=a[FNR] OFS $3} 1' file2 file1
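If the awk one-liners are hard to read, the logic can be sketched in plain Python. This is only an illustration, not one of the original answers: the function name `merge_columns` is made up, the inputs are lists of lines rather than files, and it assumes both files have the same number of lines with blank lines in the same positions.

```python
def merge_columns(file1_lines, file2_lines):
    """Build lines of: col1, col2 of file1, col3 of file2, col3 of file1.

    Blank lines are passed through as empty strings, with no tabs added.
    """
    merged = []
    for left, right in zip(file1_lines, file2_lines):
        if not left.strip():
            merged.append("")            # keep blank lines truly blank
            continue
        l = left.split("\t")
        r = right.split("\t")
        merged.append("\t".join([l[0], l[1], r[2], l[2]]))
    return merged


# Small sample; the position of the blank line is an assumption.
file1 = ["A\t1\t4", "B\t2\t5", "", "D\t7\t10"]
file2 = ["E\t13\t16", "F\t14\t17", "", "H\t19\t22"]
for line in merge_columns(file1, file2):
    print(line)
```

Like the second awk command, this pairs lines by position, so it only makes sense when the two files are aligned line for line.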
There are two parts of my query:
1) I have multiple .xlsx files stored in a folder, a total of 1 year's worth (~365 .xlsx files). They are named according to date: 'A_ddmmmyyyy.xlsx' (e.g. A_01Jan2016.xlsx). Each .xlsx has 5 columns of data: Date, Quantity, Latitude, Longitude, Measurement. The problem is that each .xlsx file consists of about 400,000 rows of data, and although I have scripts in Excel to merge them, the inherent row limit in Excel prevents me from merging all the data together.
(i) Is there a way to read recursively the data from each .xlsx sheet into MATLAB, and specifying the variable name (i.e. Date, Quantity etc) for each column(variable) within MATLAB (there are no column headings in the .xlsx files)?
(ii) How can I merge the data for each column from each .xlsx together?
Thank you
Jefferson
Let's go through this in parts.
First, I do not recommend joining the data from all your files into one column. There is no need to have all this information together; you can work with the files separately, for example by using a datastore.
Working in MATLAB in my directory:
>> pwd
ans =
/home/anquegi/learn/matlab/stackoverflow
I have a folder that contains a subfolder with two sample Excel files:
>> ls
20_hz.jpg big_data_store_analysis.m excel_files octave-workspace sample-file.log
40_hz.jpg chirp_signals.m NewCode.m sample.csv
>> ls excel_files/
A_01Jan2016.xlsx A_02Jan2016.xlsx
the content of each file is :
Date Quantity Latitude Longitude Measurement
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
11 11 11 11 11
12 12 12 12 12
13 13 13 13 13
14 14 14 14 14
15 15 15 15 15
16 16 16 16 16
17 17 17 17 17
18 18 18 18 18
19 19 19 19 19
20 20 20 20 20
21 21 21 21 21
22 22 22 22 22
This is only to show how it will work.
Reading the data:
>> ssds = spreadsheetDatastore('./excel_files')
ssds =
SpreadsheetDatastore with properties:
Files: {
'/home/anquegi/learn/matlab/stackoverflow/excel_files/A_01Jan2016.xlsx';
'/home/anquegi/learn/matlab/stackoverflow/excel_files/A_02Jan2016.xlsx'
}
Sheets: ''
Range: ''
Sheet Format Properties:
NumHeaderLines: 0
ReadVariableNames: true
VariableNames: {'Date', 'Quantity', 'Latitude' ... and 2 more}
VariableTypes: {'double', 'double', 'double' ... and 2 more}
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'Date', 'Quantity', 'Latitude' ... and 2 more}
SelectedVariableTypes: {'double', 'double', 'double' ... and 2 more}
ReadSize: 'file'
Now you have all your data in tables. Let's see a preview:
>> data = preview(ssds)
data =
Date Quantity Latitude Longitude Measurement
____ ________ ________ _________ ___________
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
The preview is a good way to get sample data to work with.
You do not need to merge; you can work through all the elements:
>> ssds.VariableNames
ans =
'Date' 'Quantity' 'Latitude' 'Longitude' 'Measurement'
>> ssds.VariableTypes
ans =
'double' 'double' 'double' 'double' 'double'
% let's get all the Latitude elements that have Date equal to 1; in this case the two files are the same, so we will get two elements with value 1
>> reset(ssds)
accum = [];
while hasdata(ssds)
T = read(ssds);
accum(end +1) = T(T.Date == 1,:).Latitude;
end
>> accum
accum =
1 1
So you need to work with datastores and tables. It is a bit tricky but very useful. You may also want to control the ReadSize and other properties of datastore objects, but this is a good way of working with large data files in MATLAB.
For older versions of MATLAB you can use a more traditional approach:
folder='./excel_files';
filetype='*.xlsx';
f=fullfile(folder,filetype);
d=dir(f);
for k=1:numel(d);
data{k}=xlsread(fullfile(folder,d(k).name));
end
Now you have the data stored in data
data
data =
[22x5 double] [22x5 double]
data{1}
ans =
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
11 11 11 11 11
12 12 12 12 12
13 13 13 13 13
14 14 14 14 14
15 15 15 15 15
16 16 16 16 16
17 17 17 17 17
18 18 18 18 18
19 19 19 19 19
20 20 20 20 20
21 21 21 21 21
22 22 22 22 22
But be careful when there are a lot of large files, since this approach holds all of the data in memory at once.
I have a table full of numbers with headings. I also have a separate list of numbers that are contained in the table. I would like to find the location of each number on the list, in the table. I would then like to use the cell location to provide the corresponding row heading. I have demonstrated what I'm looking for below.
How do I go about doing this? I'm imagining some combination of index/match functions, or perhaps vlookup, but none of the formulas that I've tried have worked so far. I'm completely lost at this point, so any help will be appreciated.
Thanks in advance!
Imagine something like this:
Table:
- Category A 1 2 3 4 5
- Category B 6 7 8 9 10
- Category C 11 12 13 14 15
- Category D 16 17 18 19 20
- Category E 21 22 23 24 25
List:
22
5
10
4
18
6
14
2
Desired Outcome:
- 22 Category E
- 5 Category A
- 10 Category B
- 4 Category A
- 18 Category D
- 6 Category B
- 14 Category C
- 2 Category A
Step 1: Find the row that the matching value is in
You can find the matching row by using a combination of a boolean function and SUMPRODUCT:
SUMPRODUCT((dataRange=22)*ROW(dataRange))
(note that this assumes that the items are all unique; it will not work if you have more than one match)
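Step 2: Find the category for that row
OFFSET(categoryACell, rows, 0)
Here rows is the number of rows below categoryACell. Since ROW() returns the absolute sheet row, subtract ROW(categoryACell) to get that offset, so the resulting function would be:
OFFSET(categoryACell, SUMPRODUCT((dataRange=22)*ROW(dataRange))-ROW(categoryACell), 0)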
A | B | C | D | E | F
_________________________________________________________
1 || Category A | 1 | 2 | 3 | 4 | 5
2 || Category B | 6 | 7 | 8 | 9 | 10
3 || Category C | 11 | 12 | 13 | 14 | 15
4 || Category D | 16 | 17 | 18 | 19 | 20
5 || Category E | 21 | 22 | 23 | 24 | 25
6 ||
7 ||
8 ||
9 ||
10 || 22 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A10)*ROW(B1:F5)))
11 || 5 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A11)*ROW(B1:F5)))
12 || 10 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A12)*ROW(B1:F5)))
13 || 4 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A13)*ROW(B1:F5)))
14 || 18 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A14)*ROW(B1:F5)))
15 || 6 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A15)*ROW(B1:F5)))
16 || 14 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A16)*ROW(B1:F5)))
17 || 2 | =INDIRECT("A"&SUMPRODUCT((B1:F5=A17)*ROW(B1:F5)))
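To see why the SUMPRODUCT trick works, here is a small Python sketch of the same arithmetic. It is only an illustration of the formulas above, not spreadsheet code; the names `table` and `find_heading` are made up. Each comparison yields 0 or 1, it is multiplied by the row number, and the sum equals the matching row number only when the value occurs exactly once.

```python
# Row headings and values, in sheet order (row 1 is Category A).
table = [
    ("Category A", [1, 2, 3, 4, 5]),
    ("Category B", [6, 7, 8, 9, 10]),
    ("Category C", [11, 12, 13, 14, 15]),
    ("Category D", [16, 17, 18, 19, 20]),
    ("Category E", [21, 22, 23, 24, 25]),
]


def find_heading(target):
    """Return the row heading for target, mimicking SUMPRODUCT((range=target)*ROW(range))."""
    # (value == target) is True/False, i.e. 1/0, times the 1-based row number.
    row_number = sum(
        (value == target) * row
        for row, (_, values) in enumerate(table, start=1)
        for value in values
    )
    # Same as INDIRECT("A" & row_number): look up the heading in that row.
    return table[row_number - 1][0]


for number in [22, 5, 10, 4, 18, 6, 14, 2]:
    print(number, find_heading(number))
```

As with the spreadsheet version, a value that appears more than once would add several row numbers together and point at the wrong row, and a value that is absent gives row 0.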