I have a big file with 10,000 rows. Opening it takes an eternity; after 10 minutes I stopped the program.
OPCPackage opcPackage = OPCPackage.open(item.getFilePath());
workbook = new XSSFWorkbook(opcPackage);
sheet = workbook.getSheet(item.sheet);
evaluator = new XSSFFormulaEvaluator(workbook);
itRows = sheet.rowIterator();
The itRows = sheet.rowIterator(); line never finishes; it takes far too much time.
How can I read such a big file?
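For a file this size, the commonly recommended alternative is POI's streaming event (SAX) reader, which avoids building the whole XSSFWorkbook in memory. Below is a minimal sketch, assuming the XSSF event API (XSSFReader, ReadOnlySharedStringsTable, XSSFSheetXMLHandler; verify the class names against your POI version). Note that streaming hands you the cached formula results stored in the file, rather than a live XSSFFormulaEvaluator.
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class StreamingXlsxRead {
    public static void main(String[] args) throws Exception {
        OPCPackage pkg = OPCPackage.open(args[0]);
        XSSFReader reader = new XSSFReader(pkg);
        ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);

        // Rows and cells are delivered one at a time instead of all at once.
        SheetContentsHandler handler = new SheetContentsHandler() {
            public void startRow(int rowNum) { }
            public void endRow(int rowNum) { }
            public void cell(String cellRef, String value, XSSFComment comment) {
                System.out.println(cellRef + " = " + value); // process each cell as it streams by
            }
            public void headerFooter(String text, boolean isHeader, String tagName) { }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // the sheet XML is namespaced
        XMLReader parser = factory.newSAXParser().getXMLReader();
        parser.setContentHandler(new XSSFSheetXMLHandler(
                reader.getStylesTable(), strings, handler, false));

        // Stream the first sheet; iterate further to reach the sheet you need.
        try (InputStream sheet = reader.getSheetsData().next()) {
            parser.parse(new InputSource(sheet));
        }
        pkg.close();
    }
}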
I am creating a port scanner (in Python), and whenever it receives something, the result should be written to an .xlsx file.
What it should do:
Scanner finds port 21 open
Receives data
Writes it to the .xlsx file in row 2
Scanner finds port 80 open
Receives data
Writes it to the .xlsx file in row 3
My code:
wb = load_workbook('scanreport.xlsx')
hitdetails = (str(hostname), str(host), str(port), str(keyword), str(banner))
wb = Workbook()
ws = wb.active
start_row = 1
start_column = 1
for searchresult in hitdetails:
    ws.cell(row=start_row, column=start_column).value = searchresult
    start_column += 1
start_row += 1
wb.save("scanreport.xlsx")
Result: the file ends up holding only the latest scan result; every save overwrites what was there before.
How can I fix this?
@skjoshi, you sir just fixed my problem with overwriting:
write into excel file without overwriting old content with openpyxl (this page)
Because I am already loading an existing file with an existing sheet, I was also creating a new workbook, which overwrote the old one, so I lost my data every time.
In this case I removed the wb = Workbook() and ws = wb.active lines and it worked!
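For reference, a minimal sketch of the working version (my reconstruction, not skjoshi's exact code); the scan fields below are placeholders standing in for the scanner's output:
from openpyxl import load_workbook

# Placeholder values standing in for what the scanner actually receives.
hostname, host, port, keyword, banner = 'example.com', '93.184.216.34', 21, 'ftp', '220 ready'

# Reuse the loaded workbook instead of creating a fresh Workbook().
wb = load_workbook('scanreport.xlsx')
ws = wb.active  # the sheet that already holds the previous results
ws.append((str(hostname), str(host), str(port), str(keyword), str(banner)))  # next empty row
wb.save('scanreport.xlsx')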
I need to optimize the import of .xls files into MATLAB, because xlsread is very time-consuming with a large number of files. My current xlsread script is as follows:
scriptName = mfilename('fullpath');
[currentpath, filename, fileextension]= fileparts(scriptName);
xlsnames = dir(fullfile(currentpath,'*.xls'));
xlscount = length(xlsnames);
xlsimportdata = zeros(7,6,xlscount);
for k = 1:xlscount
    xlsimport = xlsread(xlsnames(k).name,'D31:I37');
    xlsimportdata(:,1:size(xlsimport,2),k) = xlsimport;
end
I have close to 10k files per week that need processing, and at approx. 2 seconds per file on my current workstation, that comes to about 5½ hours.
I have read that ActiveX can be used for this purpose, but that is far beyond my current programming skills and I have not been able to find a solution elsewhere. Any help on how to do this would be appreciated.
If it is simple to do with ActiveX (or another proposed method), I would also be interested in the data in cells D5 and G3, which I am currently grabbing from xlsnames(k,1).name and xlsnames(k,1).date.
EDIT: updated to reflect the solution
% Get path to .m script
scriptName = mfilename('fullpath');
[currentpath, filename, fileextension] = fileparts(scriptName);
% Generate list of .xls file data
xlsnames = dir(fullfile(currentpath,'*.xls'));
xlscount = length(xlsnames);
SampleInfo = cell(xlscount,2);
xlsimportdata = cell(7,6,xlscount);
% Define xls data ranges to import
SampleID = 'G3';
SampleRuntime = 'D5';
data_range = 'D31:I37';
% Initiate progress bar
h = waitbar(0,'Initiating import...');
% Start actxserver
exl = actxserver('excel.application');
exlWkbk = exl.Workbooks;
for k = 1:xlscount
    % Restart actxserver every 100 loops due to limited system memory
    if mod(k,100) == 0
        exl.Quit
        exl = actxserver('excel.application');
        exlWkbk = exl.Workbooks;
    end
    exlFile = exlWkbk.Open([currentpath filesep xlsnames(k).name]);
    exlSheet1 = exlFile.Sheets.Item('Page 0');
    rngObj1 = exlSheet1.Range(SampleID);
    xlsimport_ID = rngObj1.Value;
    rngObj2 = exlSheet1.Range(SampleRuntime);
    xlsimport_Runtime = rngObj2.Value;
    rngObj3 = exlSheet1.Range(data_range);
    xlsimport_data = rngObj3.Value;
    SampleInfo(k,1) = {xlsimport_ID};
    SampleInfo(k,2) = {xlsimport_Runtime};
    xlsimportdata(:,:,k) = xlsimport_data;
    % Progress bar updater
    importtext = sprintf('Importing %d of %d', k, xlscount);
    waitbar(k/xlscount, h, importtext);
    disp(['Import progress: ' num2str(k) '/' num2str(xlscount)]);
end
% Close actxserver
exl.Quit
% Close progress bar
close(h)
Give this a try. I am not an ActiveX Excel guru by any means, but this works for me on my small set of test XLS files (3). I never close exlWkbk, so I don't know whether memory usage builds up or whether it is automatically cleaned up when the next file is opened in its place, so use at your own risk. I am seeing an almost 2.5x speed increase, which seems promising.
>> timeit(@getSomeXLS)
ans =
    1.8641
>> timeit(@getSomeXLS_old)
ans =
    4.6192
Please leave some feedback on whether this works on a large number of Excel sheets; I am curious how it goes.
function xlsimportdata = getSomeXLS()
scriptName = mfilename('fullpath');
[currentpath, filename, fileextension] = fileparts(scriptName);
xlsnames = dir(fullfile(currentpath,'*.xls'));
xlscount = length(xlsnames);
xlsimportdata = zeros(7,6,xlscount);
exl = actxserver('excel.application');
exlWkbk = exl.Workbooks;
dat_range = 'D31:I37';
for k = 1:xlscount
    exlFile = exlWkbk.Open([currentpath filesep xlsnames(k).name]);
    exlSheet1 = exlFile.Sheets.Item('Sheet1'); % whatever your sheet is called
    rngObj = exlSheet1.Range(dat_range);
    xlsimport = cell2mat(rngObj.Value);
    xlsimportdata(:,:,k) = xlsimport;
end
exl.Quit
I have placed two media folders into a single zip archive; in total there are 100k media files in it. I need to play a single file from a particular folder inside the archive. The problem is that the entire contents of the zip are read before the necessary file is reached, so it takes more than 55 seconds to start playing a single file. I need a solution to reduce the time it takes to play the audio files.
Below is my code :
long lStartTime = System.currentTimeMillis();
System.out.println("Time started");
String filePath = File.separator + "sdcard" + File.separator + "Android" + File.separator + "obb" + File.separator + "com.mobifusion.android.ldoce5" + File.separator + "main.9.com.mobifusion.android.ldoce5.zip";
FileInputStream fis = new FileInputStream(filePath);
ZipInputStream zis = new ZipInputStream(fis);
String zipFileName = "media" + File.separator + "aus" + File.separator + fileName;
String usMediaPath = "media" + File.separator + "auk" + File.separator;
ZipEntry entry;
int unzipCounter = 0;
while ((entry = zis.getNextEntry()) != null) {
    unzipCounter++;
    System.out.println(unzipCounter);
    if (entry.getName().endsWith(zipFileName)) {
        File myTemp = File.createTempFile("TCL", ".mp3", getActivity().getCacheDir());
        myTemp.deleteOnExit();
        FileOutputStream fos = new FileOutputStream(myTemp);
        // Copying one byte per read() call is slow; a byte[] buffer would help here.
        for (int c = zis.read(); c != -1; c = zis.read()) {
            fos.write(c);
        }
        fos.close();
        mediaPlayer = new MediaPlayer();
        FileInputStream myFile = new FileInputStream(myTemp);
        mediaPlayer.setDataSource(myFile.getFD());
        mediaPlayer.prepare();
        mediaPlayer.start();
        mediaPlayer.setOnCompletionListener(this);
        long lEndTime = System.currentTimeMillis();
        long difference = lEndTime - lStartTime;
        System.out.println("Elapsed milliseconds: " + difference);
        mediaPlayer.setOnErrorListener(this);
    }
    zis.closeEntry();
}
zis.close();
Try not to re-unzip the archive every time, because it consumes too much time.
Instead of re-unzipping, you can follow these steps:
Unzip the archive the first time the app launches, and set a flag in the preferences recording that the app has been launched before (see the sketch below).
On the next launch, check the flag. If the app has never been launched before, go to the first step. If it has launched before, find the extracted file and play it.
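A minimal sketch of that first-launch check, assuming an Android Context; the preference key and the two helper methods are placeholders, not from the question:
import android.content.Context;
import android.content.SharedPreferences;

void playMedia(Context context, String fileName) {
    SharedPreferences prefs = context.getSharedPreferences("app_prefs", Context.MODE_PRIVATE);
    if (!prefs.getBoolean("media_unzipped", false)) {
        unzipArchiveToStorage();                          // hypothetical one-time extraction
        prefs.edit().putBoolean("media_unzipped", true).apply();
    }
    playFromExtractedFolder(fileName);                    // hypothetical direct file playback
}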
If you really can't use those steps because of your requirements, you can try using the TrueZIP VFS, though be aware that I've never used it before.
Here are the libraries:
https://truezip.java.net/, https://github.com/jruesga/android_external_libtruezip
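Another option worth knowing about (my suggestion, not part of the answer above): java.util.zip.ZipFile looks entries up through the archive's central directory, so a single entry can be opened without streaming past the 100k entries before it:
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

ZipFile zip = new ZipFile(filePath);                    // filePath as in the question
ZipEntry entry = zip.getEntry("media/aus/" + fileName); // full entry name, '/' separators
if (entry != null) {
    InputStream in = zip.getInputStream(entry);
    // ... copy to a temp file and hand it to MediaPlayer as before ...
    in.close();
}
zip.close();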
I want to save the pitch, yaw and roll data in an Excel file for all frames. E.g., if I have 200 frames, then I want to save all 200 frames' information in the Excel file. I have tried, but my code only stores one frame's data.
fitting_model = 'models/Chehra_f1.0.mat';
load(fitting_model);
mov = VideoReader('7_a.avi'); % read video file and create an object
c = mov.NumberOfFrames;
for k = 1:c
    a = read(mov, k);
    img = im2double(a);
    disp('Detecting face');
    faceDetector = vision.CascadeObjectDetector(); % detect face in an image
    bbox = step(faceDetector, img); % create bounding box around the face
    test_init_shape = InitShape(bbox, refShape); % initialize facial points
    test_init_shape = reshape(test_init_shape, 49, 2);
    if size(img,3) == 3
        test_input_image = im2double(rgb2gray(img));
    else
        test_input_image = im2double(img);
    end
    disp('Fitting');
    MaxIter = 6;
    test_points = Fitting(test_input_image, test_init_shape, RegMat, MaxIter);
    load('3D_Shape_Model.mat');
    n = 49;
    test_image = img;
    imshow(test_image); hold on;
    % Compute 3D head pose
    if (n == 49)
        test_shape = test_points;
        [pitch, yaw, roll] = ComputePose(PDM_49, test_shape(:));
        filename = 'framesdata.xlsx';
        header = {'Pitch','Yaw','Roll'};
        new_data = num2cell([pitch(:), yaw(:), roll(:)]);
        output = [header; new_data];
        xlswrite(filename, output); % overwrites the same cells on every frame
    end
    plot(test_shape(:,1), test_shape(:,2), 'b*');
    title([num2str(k), ' : Pitch = ', num2str(pitch), ' ; Yaw = ', num2str(yaw), ' ; Roll = ', num2str(roll)]);
    set(gcf, 'units', 'normalized', 'outerposition', [0 0 1 1]);
    pause(0.5);
    close all;
end
As @excaza stated, you will need to move the xlswrite command out of your loop or specify the cells you are writing to. Please see the xlswrite documentation for more information. The relevant syntax is:
xlswrite(filename,A,xlRange)
The following is the example they provide:
filename = 'testdata.xlsx';
A = {'Time','Temperature'; 12,98; 13,99; 14,97};
sheet = 2;
xlRange = 'E1';
xlswrite(filename,A,sheet,xlRange)
You just need to give xlswrite the address at which to start writing the data.
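For example, here is a sketch (mine, untested) of the accumulate-then-write variant, reusing c, pitch, yaw and roll from the question:
% Gather every frame's pose inside the loop, then write once afterwards
% so no rows are overwritten.
poses = cell(c + 1, 3);
poses(1,:) = {'Pitch','Yaw','Roll'};
for k = 1:c
    % ... existing detection/fitting code producing pitch, yaw, roll ...
    poses(k+1,:) = {pitch, yaw, roll};
end
xlswrite('framesdata.xlsx', poses);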
We are using OpenCSV
(http://opencsv.sourceforge.net/apidocs/au/com/bytecode/opencsv/CSVWriter.html)
to write a report from a file with XML content.
There are two ways to go about this:
i) Write using FileOutputStream
FileOutputStream fos = new FileOutputStream(file);
OutputStreamWriter osr= new OutputStreamWriter(fos);
writer = new CSVWriter(osr);
ii) Write using BufferedWriter
BufferedWriter out = new BufferedWriter(new FileWriter(file));
writer = new CSVWriter(out);
Does anybody know how the performance of writing this report is affected by choosing one option over the other?
To my understanding, OpenCSV does not care as long as it gets a stream it can use.
The delta (difference) in performance would be in the step before it, where the output stream is created from the file.
What is the performance impact of using OutputStreamWriter versus BufferedWriter?
After running some benchmarks with Google Caliper, it appears that the BufferedWriter option is the fastest (but there's really not much of a difference, so I'd just use whichever option you're comfortable with).
How to interpret results:
The FileOutputStreamWriter scenario corresponds with option i
The BufferedWriter scenario corresponds with option ii
The FileWriter scenario is one I added which just uses a plain old FileWriter.
Each benchmark was run 3 times: writing 1000, 10,000, and 100,000 rows.
The tests were run on Linux Mint, i5-2500k (1.6GHz) CPU, 8GB RAM, with Oracle JDK7 (writing to a SATA green HDD). Results would vary with a different setup, but this should be good for comparison purposes.
rows benchmark ms linear runtime
1000 FileOutputStreamWriter 6.10 =
1000 BufferedWriter 5.89 =
1000 FileWriter 5.96 =
10000 FileOutputStreamWriter 50.55 ==
10000 BufferedWriter 50.71 ==
10000 FileWriter 51.64 ==
100000 FileOutputStreamWriter 525.13 =============================
100000 BufferedWriter 505.05 ============================
100000 FileWriter 535.20 ==============================
FYI, opencsv wraps whatever Writer you give it in a PrintWriter.
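For what it's worth, here is a sketch (mine, not part of the benchmark) of a construction that combines both options, explicit charset control plus buffering, before handing the Writer to OpenCSV:
import au.com.bytecode.opencsv.CSVWriter;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

File file = new File("report.csv"); // placeholder path
// OutputStreamWriter pins the charset; BufferedWriter adds the buffering.
Writer w = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(file), StandardCharsets.UTF_8));
CSVWriter writer = new CSVWriter(w);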