apache-poi: Handling cases when the number of columns is more than 26

I have to create Excel files with data coming from a database. So far the number of columns has been less than 26, so the following works fine:
val c = ('A'.toInt + numCols- 1).toChar
val ref = s"A1:${c}1"
CellRangeAddress.valueOf(ref)
But when the number of columns is much higher, I need to map past A-Z to double letters (AA, AB, AC, and so on).
Is there an API in POI that makes this easy?
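For reference, POI's `CellReference.convertNumToColString(int)` converts a 0-based column index to its letters, and the `CellRangeAddress(firstRow, lastRow, firstCol, lastCol)` constructor avoids building a reference string at all. The underlying mapping is a bijective base-26 conversion. Here is a minimal sketch of that logic in Python (not Scala), where `column_letters` and `header_range` are hypothetical helper names used only for illustration:

```python
def column_letters(index: int) -> str:
    """Convert a 0-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    n = index + 1  # switch to 1-based for the bijective base-26 conversion
    letters = []
    while n > 0:
        # Subtract 1 before dividing so that 26 maps to 'Z' instead of 'A0'.
        n, remainder = divmod(n - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def header_range(num_cols: int) -> str:
    """Build the first-row range string for num_cols columns, e.g. 'A1:AD1'."""
    return f"A1:{column_letters(num_cols - 1)}1"


wide_header = header_range(30)
```

The same loop ports directly to Scala if you prefer to keep the string-based `CellRangeAddress.valueOf` call.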

Related

Using value_counts() and filter elements based on number of instances

I use the following code to create two arrays in a histogram, one for the counts (percentages) and the other for values.
df = row.value_counts(normalize=True).mul(100).round(1)
counts = df # contains percentages
values = df.keys().tolist()
So, an output looks like
counts = 66.7, 8.3, 8.3, 8.3, 8.3
values = 1024, 356352, 73728, 16384, 4096
The problem is that some values occur only once, and I would like to ignore them. In the example above, only 1024 is repeated; the others appear once each. I could manually count the occurrences of each value in the row and drop those that appear only once:
df = row.value_counts(normalize=True).mul(100).round(1)
counts = df # contains percentages
values = df.keys().tolist()
for v in values:
    # N = number of occurrences of v in row
    # if N == 1:
    #     remove v from row
    pass
I would like to know if there are other ways for that using the built-in functions in Pandas.
I have asked for some clarification on your question in the comments above.
If keys is a column and you want to retain only the non-duplicated entries, please try:
values=df.loc[~df['keys'].duplicated(keep=False), 'keys'].to_list()
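If the goal is instead to keep only the values that occur more than once, one option is to filter the raw `value_counts()` result before converting to percentages. This is a minimal sketch, assuming `row` is a pandas Series; the sample data is made up to reproduce the percentages in the question, and the percentages stay relative to the full row rather than being renormalized:

```python
import pandas as pd

# Hypothetical sample row: 1024 appears 8 times, the other values once each.
row = pd.Series([1024] * 8 + [356352, 73728, 16384, 4096])

raw_counts = row.value_counts()          # absolute number of occurrences
repeated = raw_counts[raw_counts > 1]    # drop values that appear only once

# Percentages relative to the full row, rounded as in the question.
counts = (repeated / len(row)).mul(100).round(1)
values = counts.index.tolist()
```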

How to extract text from a string where there are multiple entries that meet the criteria, and return all values

This is an example of the string; it can be longer:
1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)GL: Contract Services (510000), COT: Network (N), CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)
The result needs to be column1 600007056 column2 600007070 column3 600007053
I am working in Spotfire and creating calculated columns through transformations, as I need the columns to join to other data sets.
I have tried the below, but it is only picking up the 1st 600.. number not the others, and there can be an undefined amount of those.
Account is the column with the string
Mid([Account],
Find("(000",[Account]) + Len("(000"),
Find("0001)",[Account]) - Find("(000",[Account]) - Len("(000"))
Thank you!
Assuming my guess is correct, and the pattern to look for is:
9 digits, starting with 6, preceded by an opening parenthesis and 3 zeros, and followed by a space, 4 digits and a closing parenthesis
you can grab individual occurrences by:
column1: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',1)
column2: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',2)
etc.
The tricky bit is to find how many columns to define, as you say there can be many. One way to know would be to first calculate a max number of occurrences like this:
maxn: Max((Len([Account]) - Len(RXReplace([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))','','g'))) / 9)
This still assumes that each number to extract has 9 digits. It takes the difference between the length of the original [Account] and its length after the matched patterns are replaced by an empty string, and divides that by 9.
Then you know you can define up to maxn columns; the extra ones will be empty for rows with fewer instances.
Note that Spotfire always wants two backslashes for escaping (I had to add more in the editor to make it render correctly; I hope I have not missed any).
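To check the pattern outside Spotfire, it can be run against the sample string from the question. This is a sketch in Python, where a raw string needs only single backslashes; the `column1`, `column2`, ... keys simply mirror the naming used in the question:

```python
import re

# Same pattern as above, written with single backslashes in a raw string.
PATTERN = re.compile(r"(?<=\(000)6\d{8}(?=\s\d{4}\))")

account = (
    "1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), "
    "PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)"
    "GL: Contract Services (510000), COT: Network (N), "
    "CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)"
)

# All occurrences, in order of appearance.
matches = PATTERN.findall(account)
columns = {f"column{i}": value for i, value in enumerate(matches, start=1)}

# Occurrence count via the length-difference trick (9 digits per match).
occurrences = (len(account) - len(PATTERN.sub("", account))) // 9
```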

printing serial date and data to .csv or excel file using fprintf?

I just started using Matlab (R2015a) a few weeks ago, and although I have searched for an answer to this question (and tried a few workarounds) I haven't had any luck. Hopefully, it's an easy fix!!
I am trying to write one column of serial dates at high precision (I need milliseconds) and many columns of data to a .csv file. I don't want insane precision for everything, just the first column of dates.
Here's what I've found:
- csvwrite doesn't allow for differing precisions.
- xlswrite doesn't have enough precision (even though my serial date is a double, and yes I looked at the spreadsheet cell)
- dlmwrite appends data in row format, so writing the dates and then appending the rest of the data doesn't work (though soooo close!)
Now I'm trying with fprintf:
hz_time is the serial date (double)
data1 and data2 are 4x25 (double) and 4x7 (double) respectively
hz_time = 1.0e+05 *
[7.357583607870371, 7.357583607928241, 7.357583607986110, 7.357583608043980]
STR_data = [data1, data2];
filename = (strcat('Processed_',files(k1).name));
file = fopen(filename,'w');
fprintf(file,'%.20f\n',hz_time);
fprintf(file,'%f%f%f%f%f%f%f%f%\n',STR_data);
fclose('all')
Currently, this code appends data1 and data2 in one cell at the end of the STR_date_time column. When I try concatenating hz_time and the data matrices together (using strcat) I fail:
STR_data = strcat([hz_time, data1, data2])
Warning: Out of range or non-integer values truncated during conversion to character.
I'm sure it's probably my formatting...
My end goal is to export this data (into a .csv or excel spreadsheet or something) so that the first column has the serial date (loads of precision) and columns 2-8 have the other data in it.
Any help would be much appreciated.
Thanks in advance!
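The general idea, independent of MATLAB, is to write the file one row at a time with a high-precision format for the first field and a default format for the rest. Here is a rough sketch of that idea in Python (not MATLAB); `data1` and `data2` hold made-up values and are much narrower than the real 4x25 and 4x7 matrices, and `format_row` is a hypothetical helper:

```python
# Serial dates from the question (1.0e+05 * the displayed values).
hz_time = [735758.3607870371, 735758.3607928241,
           735758.3607986110, 735758.3608043980]

# Hypothetical data, one inner list per row.
data1 = [[1.5, 2.25], [3.0, 4.75], [5.5, 6.125], [7.0, 8.5]]
data2 = [[0.1], [0.2], [0.3], [0.4]]


def format_row(timestamp, values):
    """Ten decimals for the date (finer than 1 ms), six for the data columns."""
    fields = [f"{timestamp:.10f}"] + [f"{v:f}" for v in values]
    return ",".join(fields)


lines = [format_row(t, d1 + d2) for t, d1, d2 in zip(hz_time, data1, data2)]
csv_text = "\n".join(lines) + "\n"
```

One millisecond is about 1.16e-8 of a day, so ten decimal places on the serial date are enough to preserve it.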

Reading and Combining Excel Time Series in Matlab- Maintaining Order

I have the following code to read off time series data (contained in sheets 5 to 19 in an excel workbook). Each worksheet is titled "TS" followed by the number of the time series. The process works fine apart from one thing- when I study the returns I find that all the time series are shifted along by 5. i.e. TS 6 becomes the 11th column in the "returns" data and TS 19 becomes the 5th column, TS 15 becomes the 1st column etc. I need them to be in the same order that they are read- such that TS 1 is in the 1st column, TS 2 in the 2nd etc.
This is a problem because I read off the titles of the worksheets ("AssetList") which maintain their actual order throughout subsequent codes. Therefore when I recombine the titles and the returns I find that they do not match. This complicates further manipulation when, for example column 4 is titled "TS 4" but actually contains the data of TS 18.
Is there something in this code that I have wrong?
XL='TimeSeries.xlsx';
formatIn = 'dd/mm/yyyy';
formatOut = 'mmm-dd-yyyy';
Bounds=3;
[Bounds,~] = xlsread(XL,Bounds);
% Determine the number of worksheets in the xls-file:
FirstSheet=5;
[~,AssetList] = xlsfinfo(XL);
lngth=size(AssetList,2);
AssetList(:,1:FirstSheet-1)=[];
% Loop through the number of sheets and RETRIEVE VALUES
merge_count = 1;
for I=FirstSheet:lngth
    [FundValues, ~, FundSheet] = xlsread(XL,I);
    % EXTRACT DATES AND DATA AND COMBINE
    % (TO REMOVE UNNECESSARY TEXT IN ROWS 1 TO 4)
    Fund_dates_data = FundSheet(4:end,1:2);
    FundDates = cellstr(datestr(datevec(Fund_dates_data(:,1),...
        formatIn),formatOut));
    FundData = cell2mat(Fund_dates_data(:,2));
    % CREATE TIME SERIES FOR EACH FUND
    Fundts{I}=fints(FundDates,FundData,['Fund',num2str(I)]);
    if merge_count == 2
        Port = merge(Fundts{I-1},Fundts{I},'DateSetMethod','Intersection');
    end
    if merge_count > 2
        Port = merge(Port,Fundts{I},'DateSetMethod','Intersection');
    end
    merge_count = merge_count + 1;
end
% ANALYSE PORTFOLIO
Returns=tick2ret(Port);
q = Portfolio;
q = q.estimateAssetMoments(Returns)
[qassetmean, qassetcovar] = q.getAssetMoments
This is probably due to merge. By default, it sorts columns alphabetically. Unfortunately, as your naming pattern is "FundN", this means that, for example, Fund10 will be sorted before Fund9. So as you're looping over I from 5 to 19, you will have Fund10 through Fund19, followed by Fund5 through Fund9.
One way of solving this would be to always use zero padding (Fund01, Fund02, etc.) so that alphabetical order and numerical order are the same. Alternatively, force it to stay in the order you read/merge the data by setting SortColumns to 0:
Port = merge(Port,Fundts{I},'DateSetMethod','Intersection','SortColumns',0);
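The ordering effect is easy to reproduce outside MATLAB. This small Python sketch sorts the names generated for sheets 5 to 19, with and without zero padding; it only illustrates alphabetical sorting and does not call merge itself:

```python
sheet_numbers = range(5, 20)  # sheets 5 to 19, as in the loop above

plain = [f"Fund{i}" for i in sheet_numbers]       # Fund5 ... Fund19
padded = [f"Fund{i:02d}" for i in sheet_numbers]  # Fund05 ... Fund19

# Alphabetical sorting compares character by character, so "Fund10" < "Fund5".
sorted_plain = sorted(plain)

# With zero padding, alphabetical order matches numerical order.
sorted_padded = sorted(padded)
```

With plain names the two-digit funds come first; with padded names the sorted list is identical to the order in which the sheets were read.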

Get multiple column ranges from one row in Cassandra via Hector

Suppose I have one row in Cassandra with multiple columns that each have an Integer as key and some value. Using SliceQuery in Hector, I can get one range of these columns. Is it possible to get multiple ranges with one query?
Example cassandra row:
columns 3, 7, 12, 34, 45, 46, 59, 98, 99
----------------------------------------
values a, f, e, v, a, r, r, o, k
How do I use Hector to get all columns with keys from 20 to 30 and 50 to 90 in one query?
To solve that in the PlayOrm open source project, we do the following.
Does Hector have async calls like Astyanax does, so that you can do this?
for (Query q : queryList) {
    // This call is non-blocking: it sends the request and returns immediately, without waiting for the response
    Future f = q.executeAsync();
    futures.add(f);
}
// Now both column slices are being fetched in parallel at the SAME time
for (Future f : futures) {
    Result r = f.get(); // this will block until the result is available
}
This way, it is just as fast as if the API offered ONE single call, so there is no need for one.
Dean
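The same fan-out pattern can be sketched in Python with `concurrent.futures`. Nothing here talks to Cassandra: `fetch_slice` is a hypothetical stand-in for one slice query, and `ROW` is the example row from the question held in memory:

```python
from concurrent.futures import ThreadPoolExecutor

# Example row from the question: column key -> value.
ROW = {3: "a", 7: "f", 12: "e", 34: "v", 45: "a", 46: "r", 59: "r", 98: "o", 99: "k"}


def fetch_slice(start, end):
    """Hypothetical stand-in for one slice query over keys start..end (inclusive)."""
    return {key: value for key, value in ROW.items() if start <= key <= end}


ranges = [(20, 30), (50, 90)]

with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
    # Submit every slice first so they all run concurrently...
    futures = [pool.submit(fetch_slice, start, end) for start, end in ranges]
    # ...then collect the results; each result() blocks until that slice is done.
    merged = {}
    for future in futures:
        merged.update(future.result())
```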
I don't think so. To keep it in one query you'll have to use a single slice that includes all of the slices of interest and then on the client side decide on a column by column basis whether the columns fall into one of the ranges you're interested in.
You should note that if there's too much data to handle all of the columns in that one big slice in a single call, then you can use pagination to reduce the footprint of the data on the client side, but then you might as well just make individual calls for each of your original slices.
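The single-slice approach amounts to fetching one covering range and discarding the gaps on the client. A minimal sketch in Python, using the example row from the question as an in-memory dict (the `in_any_range` helper is hypothetical, and no Hector call is involved):

```python
# Example row from the question: column key -> value.
ROW = {3: "a", 7: "f", 12: "e", 34: "v", 45: "a", 46: "r", 59: "r", 98: "o", 99: "k"}

wanted_ranges = [(20, 30), (50, 90)]

# One covering slice from the lowest start to the highest end.
low = min(start for start, _ in wanted_ranges)
high = max(end for _, end in wanted_ranges)
covering_slice = {key: value for key, value in ROW.items() if low <= key <= high}


def in_any_range(key):
    """True if the key falls inside at least one of the wanted ranges."""
    return any(start <= key <= end for start, end in wanted_ranges)


# Client-side filtering, column by column.
result = {key: value for key, value in covering_slice.items() if in_any_range(key)}
```

In this example the covering slice returns four columns (34, 45, 46, 59) and only one survives the filter, which shows the extra data this approach transfers.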
