How to convert matrix data into columns? - excel

I have data of 100 x 101. I want to convert them in series e.g. for first row all column data then for 2nd row all column data and so on. It means the result will be three columns only. The first column with row numbers, the 2nd column with column numbers and the 3rd column with the value for that respective row and column.
Could you please help me doing this conversion in MATLAB.
Available data are in ASCII format and it is possible to open in both MATLAB and Excel.

This can be done by find:
A = rand(100,101);
[data(:,1), data(:,2), data(:,3)] = find(A);
data = sortrows(data,[1 2]);
Note that this is highly inefficient, as you are storing 3 values where you only need to store 1 (the element's actual value). For accessing a specific element, say row 31, column 43, you simply do A(31,43), where you index the matrix.
The file size of data is indeed three times larger than that of A:
whos
Name Size Bytes Class Attributes
A 100x101 80800 double
data 10100x3 242400 double

You can use the ind2sub function that is faster and make more sense in this situation:
tic
A = rand(100,101);
[data(:,1), data(:,2), data(:,3)] = find(A);
data = sortrows(data,[1 2]);
toc
tic
B = A' ;
[data_B(:,1), data_B(:,2)] = ind2sub(size(B), 1:length(B(:)));
data_B(:,3) = B(:);
toc
The output for the timing is as follow:
Elapsed time is 0.002130 seconds (first method)
Elapsed time is 0.000525 seconds (second method).

Related

C++ Excel Automation : How to use SetFormula function for the cell elements whose row & column were unknown but will be given by the program later on?

I am working on the heritage codes which use C++ Excel Automation to output our analysis data in the excel spreadsheet. From the following article,
https://support.microsoft.com/en-us/topic/how-to-use-mfc-to-automate-excel-and-create-and-format-a-new-workbook-6f2450bc-ba35-a36a-df2f-c9dd53d7aef1
I knew we can use "range.SetFormula() function to calculate the formula results from some specific cells, for example:
range = sheet.GetRange(COleVariant("C2"), COleVariant("C6"));
range.SetFormula(COleVariant("=A2 & \" \" & B2"));
My question here is how can I use SetFormula function to point to some cell elements whose row & column are unknow but will be determined as the program runs. In specifically, I have a number of cell elements populated as my analysis runs. Different analysis will have different number of elements output to the excel spreadsheet. For example, if I have kw data, then the excel output will be populated in kw row 6 column and I also need to output some summary results based on these element underneath these populated elements. Something like this:
int kw = var_length; // the row changes depending on different analysis
DWORD numElements[2];
Range range;
range = sheet.GetRange(COleVariant(_T("A3")),COleVariant(_T("A3")));
numElements[0]= kw; //Number of rows in the range.
numElements[1]= 6; //Number of columns in the range.
saRet.Create(VT_R8, 2, numElements);
for(int iRow = 0;iRow < kw; iRow++)
{
for (iCol = 0; iCol < 6; iCol++)
{
index[0] = iRow;
index[1] = iCol;
saRet.PutElement(index, &somevalue);
}
}
range.SetValue2(COleVariant(saRet));
CString TStr;
TStr.Format(_T("A%d"), kw+2);
range = sheet.GetRange(COleVariant(TStr), COleVariant(TStr))
CString t1, t2;
t1.Format(_T("A%d"), kw/2);
t2.Format(_T("A%d"), kw);
range.SetFormula(COleVariant(L"=SUM(A&t1: A&t2)")); // Calculate the sum of second half of whole elements, Apparently, this didn't work, How can I fix this?
Here I want to sum the second half of whole elements but in the SetFormula function, I didn't know exactly row number for these element, eg, A25 - A50. The row number is dependent on the kw which is given as input from program. Different analysis, kw is different. I attempted to use TStr format to get the row number but it CAN NOT be used inside SetFormula function. Ideally I want to use formula for my summary data output so that if I change my populated the element values, the summary data output can change accordingly. I searched in your MSDN website but couldn't find any solution on how to resolve this.
Can someone help me with the issue?
Thanks in advance.

{Python} - [Pandas] - How sum columns by condition less than in columns name

First explaining the dataframe, the values of columns '0-156', '156-234', '234-546' .... '> 76830' is the percentage distribution for each range of distances in meters, totaling 100%.
Column 'Cell Name' refers to the data element of the other columns and the column 'Distance' is the column that will trigger the desired sum.
I need to sum the values of the columns '0-156', '156-234', '234-546' .... '> 76830' which are less than the value of the 'Distance' (Meters) column.
Below creation code for testing.
import pandas as pd
# initialize list of lists
data = [['Test1',0.36516562,19.065996,49.15094,24.344206,0.49186087,1.24217,5.2812457,0.05841639,0,0,0,0,158.4122868],
['Test2',0.20406325,10.664485,48.70978,14.885571,0.46103176,8.75815,14.200708,2.1162114,0,0,0,0,192.553074],
['Test3',0.13483211,0.6521175,6.124511,41.61725,45.0036,5.405257,1.0494527,0.012979688,0,0,0,0,1759.480042]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Cell Name','0-156','156-234','234-546','546-1014','1014-1950','1950-3510','3510-6630','6630-14430','14430-30030','30030-53430','53430-76830','>76830','Distance'])
Example of what should be done:
The value of column 'Distance' = 158.412286772863 therefore would have to sum the values <= of the following columns, 0-156, '156-234' totalizing 19.43116162 %.
Thanks so much!
As I understand it, you want to sum up all the percentage values in a row, where the lower value of the column-description (in case of '0-156' it would be 0, in case of '156-234' it would be 156, and so on...) is smaller than the value in the distance column.
First I would suggest, that you transform your string-like column-names into values, as an example:
lowerlimit=df.columns[2]
>>'156-234'
Then read the string only till the '-' and make it a number
int(lowerlimit[:lowerlimit.find('-')])
>> 156
You can loop this through all your columns and make a new row for the lower limits.
For a bit more simplicity I left out the first column for your example, and added another first row with the lower limits of each column, that you could generate as described above. Then this code works:
data = [[0,156,234,546,1014,1950,3510,6630,11430,30030,53430,76830,1e-23],[0.36516562,19.065996,49.15094,24.344206,0.49186087,1.24217,5.2812457,0.05841639,0,0,0,0,158.4122868],
[0.20406325,10.664485,48.70978,14.885571,0.46103176,8.75815,14.200708,2.1162114,0,0,0,0,192.553074],
[0.13483211,0.6521175,6.124511,41.61725,45.0036,5.405257,1.0494527,0.012979688,0,0,0,0,1759.480042]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['0-156','156-234','234-546','546-1014','1014-1950','1950-3510','3510-6630','6630-14430','14430-30030','30030-53430','53430-76830','76830-','Distance'])
df['lastindex']=None
df['sum']=None
After creating basically your dataframe, I add two columns 'lastindex' and 'sum'.
Then I am searching for the last index in every row, that is has its lower limit below the distance given in that row (df.iloc[x,-3]); afterwards I'm summing up the respective columns in that row.
for i in np.arange(1,len(df)):
df.at[i,'lastindex']=np.where(df.iloc[0,:-3]<df.iloc[i,-3])[0][-1]
df.at[i,'sum']=sum(df.iloc[i][0:df.at[i,'lastindex']+1])
I hope, this is helpful. Best, lepakk

append cell row to matrix

I'm reading an excelfile in matlab with
[NUM,TXT,RAW]=xlsread(DATENEXCEL,sSheet_Data);
In the excelfile are different datamatrices in different sheets in the following form
Date Firm1 Firm2 Firm3 ...
1.1.16 12 12 12
... ... ... ...
Currently I'm handling the pure data with the NUM object and the header row with the TXT object. My first issue is how to combine the header row with the data rows. Looping does not work, since I predefine the data matrix with
daten=zeros([length(sDatesequence) size(RAW,2)]);
because I want to be able to add more data from different sources to that object. Predefining with zeros, however, leads Matlab to expect doubles and not characters. Converting the cell array TXT with cell2mat delivers unsatisfying results:
cell2mat(TXT(1,:))=Firm1Firm2Firm3...
hence only a long string vector.
Question: Is there another way to combine character vectors and double matrices?
Regards,
Richard
You can combine them in a cell array.
c{1,1} = 'Firm1';
c{1,2} = datavector;
c{2,1} = 'Firm2';
c{2,2} = datavector;
But as far as I know it is not possible to add text headers to a numerical matrix, unless you do something with typcasting. But I would not recommend that.
d(1:8)='Firm1 '; %must have exactly eight characters (a double has a length of 8 bytes)
y = typecast(uint8(d),'double') %now you have a number that would fit in a matrix of doubles
x=char(typecast(y,'uint8')) %now it's converted back to text

Secifying a common range for xlsread function in Matlab

I am trying to figure out how to specify a common range for xlsread() function in matlab.
Usually I use n=xlsread('filename','#sheet','A1:A10'), but I have quite a bit of data in the same sheet and I'd like to know if I can specify it with one range, i.e . if all my data is between '1:10', I want to specify 1:10 as range, and only call the letter values of each column.
I was thinking to do it as follows:
function [a,b,c]=getdata(filename,'1:10')
a=xlsread(filename,1,'A:A'???)
b=xlsread(filename,1,'B:B'???)
c=xlsread(filename,1,'C:C'???)
end
After some research I could not find any information as to how this is done.
Thanks in advance,
Greg
If you want to read 1 to 10 rows of column A, use:
data = xlsread(filename, 1, 'A1:A10');
If you want to read 1 to 10 rows of all columns, use:
data = xlsread(filename, 1, '1:10');
If you want to read 1 to 10 rows of, say, first three columns A, B, and C, use:
data = xlsread(filename, 1, 'A1:C10');
Using dynamic variable names is always a bad idea. Read this for explanation. But if you still want to create a, b, and c and so on depending on the number of columns in the Excel file, you can use:
for k=1:size(data,2)
assignin('caller', char(96+k), data(:,k)); %or char(64+k) for block letters
end
The above will work if number of columns are less than or equal to 26. This may only be feasible if you're dealing with a few columns. But I still recommend to avoid it.

Reading and Combining Excel Time Series in Matlab- Maintaining Order

I have the following code to read off time series data (contained in sheets 5 to 19 in an excel workbook). Each worksheet is titled "TS" followed by the number of the time series. The process works fine apart from one thing- when I study the returns I find that all the time series are shifted along by 5. i.e. TS 6 becomes the 11th column in the "returns" data and TS 19 becomes the 5th column, TS 15 becomes the 1st column etc. I need them to be in the same order that they are read- such that TS 1 is in the 1st column, TS 2 in the 2nd etc.
This is a problem because I read off the titles of the worksheets ("AssetList") which maintain their actual order throughout subsequent codes. Therefore when I recombine the titles and the returns I find that they do not match. This complicates further manipulation when, for example column 4 is titled "TS 4" but actually contains the data of TS 18.
Is there something in this code that I have wrong?
XL='TimeSeries.xlsx';
formatIn = 'dd/mm/yyyy';
formatOut = 'mmm-dd-yyyy';
Bounds=3;
[Bounds,~] = xlsread(XL,Bounds);
% Determine the number of worksheets in the xls-file:
FirstSheet=5;
[~,AssetList] = xlsfinfo(XL);
lngth=size(AssetList,2);
AssetList(:,1:FirstSheet-1)=[];
% Loop through the number of sheets and RETRIEVE VALUES
merge_count = 1;
for I=FirstSheet:lngth
[FundValues, ~, FundSheet] = xlsread(XL,I);
% EXTRACT DATES AND DATA AND COMBINE
% (TO REMOVE UNNECCESSARY TEXT IN ROWS 1 TO 4)
Fund_dates_data = FundSheet(4:end,1:2);
FundDates = cellstr(datestr(datevec(Fund_dates_data(:,1),...
formatIn),formatOut));
FundData = cell2mat(Fund_dates_data(:,2));
% CREATE TIME SERIES FOR EACH FUND
Fundts{I}=fints(FundDates,FundData,['Fund',num2str(I)]);
if merge_count == 2
Port = merge(Fundts{I-1},Fundts{I},'DateSetMethod','Intersection');
end
if merge_count > 2
Port = merge(Port,Fundts{I},'DateSetMethod','Intersection');
end
merge_count = merge_count + 1;
end
% ANALYSE PORTFOLIO
Returns=tick2ret(Port);
q = Portfolio;
q = q.estimateAssetMoments(Returns)
[qassetmean, qassetcovar] = q.getAssetMoments
This is probably due to merge. By default, it sorts columns alphabetically. Unfortunately, as your naming pattern is "FundN", this means that, for example, Fund10 will normally be sorted before Fund9. So as you're looping over I from 5 to 19, you will have Fund10, through Fund19, followed by Fund4 through Fund9.
One way of solving this would to be always use zero padding (Fund01, Fund02, etc) so that alphabetical order and numerical order are the same. Alternatively, force it to stay in the order you read/merge the data by setting SortColumns to 0:
Port = merge(Port,Fundts{I},'DateSetMethod','Intersection','SortColumns',0);

Resources