Removing rows of an Excel file using MATLAB

I want to remove rows from an Excel file using MATLAB.
Specifically, I have already removed the values that satisfy a condition.
(In this case, I removed the elements that fall outside 2 sigma of the distribution.)
But I get an undesired result, because this only clears the values and leaves the cells empty.
So I am looking for a way to delete the rows, or to shift the remaining elements up so that no gaps are left.
%**Open the file**
fullFileName = [pwd '\Eurostoxx50_7월.xlsm'];
excel = actxserver('Excel.Application');
file = excel.Workbooks.Open(fullFileName);
sheet1=excel.Worksheets.get('Item', 'Inputsheet');
size_of_pd = size(PriceDifference2);
size_of_pd = size_of_pd(1);
%**Get the index where I want to remove**
m = mean(PriceDifference2);
s = std(PriceDifference2);
v1=m+2*s
v2=m-2*s
TF1 = PriceDifference2(:) >= v1 ;
TF2 = PriceDifference2(:) <= v2 ;
% combine them
TFall = TF1 | TF2;
%remove the elements
for i = 1:1:size_of_pd
if TFall(i) > 0
first_cell = strcat('B',num2str(i+34));
last_cell = strcat('Q',num2str(i+34));
range1=get(sheet1,'Range', first_cell,last_cell);
range1.Value=[];
end
end
file.Save;
file.Close;
delete(excel);
More specifically, the result looks like this in the Excel file:
0.002678839 0 0.479452055 3204.381729 2850 41.1 P -1.472671354
0.002678839 0 0.479452055 3204.381729 2900 48.9 P -1.508805266
0.002678839 0 0.479452055 3204.381729 2925 53.3 P -1.341898247
0.002678839 0 0.479452055 3204.381729 3350 210.8 P 12.3246967
0.002678839 0 0.479452055 3204.381729 3375 226.5 P 11.98361578
0.002678839 0 0.479452055 3204.381729 3400 243.1 P 11.31755056
0.002678839 0 0.479452055 3204.381729 3425 260.4 P 10.86345463
0.002678839 0 0.479452055 3204.381729 3350 210.8 P 12.3246967
0.002678839 0 0.479452055 3204.381729 3375 226.5 P 11.98361578
0.002678839 0 0.479452055 3204.381729 3400 243.1 P 11.31755056
0.002678839 0 0.479452055 3204.381729 3425 260.4 P 10.86345463
But I want to remove all the empty rows, like below:
0.002678839 0 0.479452055 3204.381729 2850 41.1 P -1.472671354
0.002678839 0 0.479452055 3204.381729 2900 48.9 P -1.508805266
0.002678839 0 0.479452055 3204.381729 2925 53.3 P -1.341898247
0.002678839 0 0.479452055 3204.381729 3350 210.8 P 12.3246967
0.002678839 0 0.479452055 3204.381729 3375 226.5 P 11.98361578
0.002678839 0 0.479452055 3204.381729 3400 243.1 P 11.31755056
0.002678839 0 0.479452055 3204.381729 3425 260.4 P 10.86345463
0.002678839 0 0.479452055 3204.381729 3350 210.8 P 12.3246967
0.002678839 0 0.479452055 3204.381729 3375 226.5 P 11.98361578
0.002678839 0 0.479452055 3204.381729 3400 243.1 P 11.31755056
0.002678839 0 0.479452055 3204.381729 3425 260.4 P 10.86345463

I solved it in a crude way.
After clearing the outlier rows, I copy the block below each cleared row up by one row, once per cleared row:
fullFileName = [pwd '\Eurostoxx50_7월.xlsm'];
if isempty(fullFileName)
% User clicked Cancel.
return;
end
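excel = actxserver('Excel.Application');
% Open the workbook so that file.Save and file.Close below have a handle to work on
file = excel.Workbooks.Open(fullFileName);
sheet1=excel.Worksheets.get('Item', 'Inputsheet');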
start_point = 34;
size_of_pd = size(PriceDifference2);
size_of_pd = size_of_pd(1);
%last_cell = strcat('Q',num2str(size_of_pd ));
first_cell = strcat('Q',num2str(start_point+1));
last_cell = strcat('Q',num2str(size_of_pd));
range1=get(sheet1,'Range', first_cell,last_cell);
PD_EXCEL = range1.value;
m = mean(PD_EXCEL);
s = std(PD_EXCEL);
v1 = m+2*s;
v2 = m-2*s;
TF1 = PD_EXCEL(:) >= v1 ;
TF2 = PD_EXCEL(:) <= v2 ;
% combine them
TFall = TF1 | TF2;
for i = 1:1:size_of_pd
if TFall(i) > 0
first_cell = strcat('B',num2str(i+start_point));
last_cell = strcat('Q',num2str(i+start_point));
range1= get(sheet1,'Range', first_cell,last_cell);
range1.Value = [];
end
end
for i = size_of_pd:-1:1
if TFall(i)>0
copy_first_cell = strcat('B',num2str(i+start_point+1));
copy_last_cell = strcat('Q',num2str(size_of_pd+start_point));
first_cell = strcat('B',num2str(i+start_point));
last_cell = strcat('Q',num2str(size_of_pd+start_point-1));
range1=get(sheet1,'Range', copy_first_cell,copy_last_cell);
range2 = get(sheet1,'Range',first_cell,last_cell);
range2.Value=range1.value;
end
end
first_cell = strcat('B',num2str(start_point + sum(TFall(:)==0)));
last_cell = strcat('Q',num2str(start_point+size_of_pd));
range1 = get(sheet1,'Range', first_cell,last_cell);
range1.Value = [];
file.Save;
file.Close;
delete(excel);
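The idea behind both versions (flag the rows outside mean ± 2 standard deviations, then close the gaps) can be sketched independently of Excel. The following is a minimal Python sketch, not the code from the post: the function name compact_rows and the sample data are hypothetical, and it assumes the sheet range has already been read into a list of rows. It keeps the rows strictly inside the band, in their original order, and pads the end with empty rows so the block could be written back to the same range in one assignment.

```python
from statistics import mean, stdev

def compact_rows(rows, key_col, n_sigma=2.0):
    """Drop rows whose key_col value is outside mean +/- n_sigma * std, then pad with empty rows."""
    values = [row[key_col] for row in rows]
    m, s = mean(values), stdev(values)          # sample standard deviation (n - 1)
    low, high = m - n_sigma * s, m + n_sigma * s
    # Keep rows strictly inside the band, mirroring the >= / <= removal test in the post.
    kept = [row for row in rows if low < row[key_col] < high]
    # Pad so the result has the same shape as the input block.
    padding = [[None] * len(rows[0]) for _ in range(len(rows) - len(kept))]
    return kept + padding

# Hypothetical data: ten rows, with one obvious outlier in the fourth row.
rows = [[i, 1.0] for i in range(10)]
rows[3][1] = 100.0
result = compact_rows(rows, key_col=1)
```

Because the surviving rows are collected first and written back as one block, no per-row copying is needed.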


Parse Basic Programming

I receive the error below when trying to run this code on Replit. I have already tried declaring the two arrays with separate DIM statements, like this:
20 DIM W$(10)
30 DIM G$(26)
Error: ParseError: Parse error on line 20: Unexpected token DIM
Any suggestions on how to resolve this issue?
10 REM Hangman game
20 REM Set up variables
30 DIM W$(10), G$(26)
40 LET C = 0
50 LET I = 0
60 LET J = 0
70 REM Set up word list
80 W$(1) = "HANGMAN"
90 W$(2) = "BASIC"
100 W$(3) = "COMPUTER"
110 W$(4) = "PROGRAM"
120 W$(5) = "VINTAGE"
130 REM Select a random word
140 LET R = INT(RND(1) * 5) + 1
150 LET W = W$(R)
160 REM Set up guess string
170 FOR I = 1 TO LEN(W)
180 G$(I) = "-"
190 NEXT I
200 REM Main game loop
210 DO
220 CLS
230 PRINT "Hangman"
240 PRINT
250 PRINT "Word: ";
260 FOR I = 1 TO LEN(W)
270 PRINT G$(I);
280 NEXT I
290 PRINT
300 PRINT "Guesses: ";
310 FOR I = 1 TO 26
320 IF G$(I) <> "-" THEN PRINT G$(I);
330 NEXT I
340 PRINT
350 INPUT "Enter a letter: ", L$
360 IF LEN(L$) > 1 THEN 400
370 IF L$ < "A" OR L$ > "Z" THEN 400
380 LET L = ASC(L$) - 64
390 GOTO 420
400 PRINT "Invalid input. Please enter a single letter."
410 GOTO 350
420 REM Check if letter is in word
430 LET F = 0
440 FOR I = 1 TO LEN(W)
450 IF MID$(W, I, 1) = L$ THEN G$(I) = L$: F = 1
460 NEXT I
470 IF F = 0 THEN C = C + 1
480 IF C = 6 THEN 600
490 REM Check for win
500 LET WN = 1
510 FOR I = 1 TO LEN(W)
520 IF G$(I) = "-" THEN WN = 0
530 NEXT I
540 IF WN THEN PRINT "You win!": GOTO 650
550 REM Check for loss
560 IF C = 6 THEN PRINT "You lose. The word was "; W: GOTO 650
570 LOOP
600 REM Draw hangman
610 PRINT " _____"
620 PRINT " | |"
630 IF C > 1 THEN PRINT " O |" ELSE PRINT " |"
640 IF C > 2 THEN PRINT "/|\ |" ELSE PRINT " |"
650 END
I tried declaring the arrays separately. I also tried running it in a Vintage BASIC terminal and received a "type mismatch" error on line 150.

Why do I get a view error when enumerating a Dataframe

Why do I get a "view" error:
ndf = pd.DataFrame()
ndf['Signals'] = [1,1,1,1,1,0,0,0,0,0]
signals_diff = ndf.Signals.diff()
ndf['Revals'] = [101,102,105,104,105,106,107,108,109,109]
ndf['Entry'] = 0
for i,element in enumerate(signals_diff):
    if (i==0):
        ndf.iloc[i]['Entry'] = ndf.iloc[i]['Revals']
    elif (element == 0):
        ndf.iloc[i]['Entry'] = ndf.iloc[i - 1]['Entry']
    else:
        ndf.iloc[i]['Entry'] = ndf.iloc[i]['Revals']
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ndf.iloc[i]['Entry'] = ndf.iloc[i]['Revals']
Instead of chained iloc indexing, use a single loc call:
ndf = pd.DataFrame()
ndf['Signals'] = [1,1,1,1,1,0,0,0,0,0]
signals_diff = ndf.Signals.diff()
ndf['Revals'] = [101,102,105,104,105,106,107,108,109,109]
ndf['Entry'] = 0
for i,element in enumerate(signals_diff):
    if (i==0):
        ndf.loc[i,'Entry'] = ndf.loc[i,'Revals']
    elif (element == 0):
        ndf.loc[i,'Entry'] = ndf.loc[i - 1,'Entry']
    else:
        ndf.loc[i,'Entry'] = ndf.loc[i,'Revals']
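This solves the problem, but loc selects by index label, not by position, so the loop counter i only addresses the right row when the index labels match the positions (as with the default RangeIndex here). With a different index you might not get the expected result.
Do not chain indexers like ndf.iloc[i]['Entry'] when assigning something. See the pandas documentation on returning a view versus a copy for why that does not work.
That said, your code can be rewritten as: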
ndf['Entry'] = ndf['Revals'].where(signals_diff != 0).ffill()
Output:
Signals Revals Entry
0 1 101 101.0
1 1 102 101.0
2 1 105 101.0
3 1 104 101.0
4 1 105 101.0
5 0 106 106.0
6 0 107 106.0
7 0 108 106.0
8 0 109 106.0
9 0 109 106.0
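To make it easier to see what the where(...).ffill() expression computes, here is a plain-Python sketch of the same logic without pandas. The function name entry_prices is hypothetical and only for illustration: a row takes its own Revals value when the signal differs from the previous row (or when it is the first row), and otherwise carries the previous entry forward.

```python
def entry_prices(signals, revals):
    """Carry the value from the last signal change forward, row by row."""
    entries = []
    for i, price in enumerate(revals):
        if i == 0 or signals[i] != signals[i - 1]:
            entries.append(price)        # first row or signal changed: take the current value
        else:
            entries.append(entries[-1])  # signal unchanged: reuse the previous entry
    return entries

signals = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
revals = [101, 102, 105, 104, 105, 106, 107, 108, 109, 109]
print(entry_prices(signals, revals))
# [101, 101, 101, 101, 101, 106, 106, 106, 106, 106]
```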
Alternatively, keep using positional indexing with iloc and look up the column position with get_indexer:
for i,element in enumerate(signals_diff):
    if (i==0):
        ndf.iloc[i,ndf.columns.get_indexer(['Entry'])] = ndf.iloc[i,ndf.columns.get_indexer(['Revals'])]
    elif (element == 0):
        ndf.iloc[i,ndf.columns.get_indexer(['Entry'])] = ndf.iloc[i - 1,ndf.columns.get_indexer(['Entry'])]
    else:
        ndf.iloc[i,ndf.columns.get_indexer(['Entry'])] = ndf.iloc[i,ndf.columns.get_indexer(['Revals'])]

sum one variable (X[i][j]) in one constraint with choco

I am using Choco to solve a CSP, and one of my constraints is that the sum of the variables X[i][j] is less than N = 10, where i and j each range over 1..N.
How do I accomplish this? Thank you for your help.
sum(X[i][j]) = 1 for i=j=1....N
You need to flatten the variables into a 1D array and call model.sum():
import org.chocosolver.solver.Model;
import org.chocosolver.solver.Solver;
import org.chocosolver.solver.variables.BoolVar;
public class SumBooleans {
private final static int N = 10;
public static void main(String[] args) {
Model model = new Model("Boolean sum");
BoolVar[][] vars = new BoolVar[N][N];
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
vars[i][j] = model.boolVar("vars[" + i + "][" + j + "]");
}
}
BoolVar[] flatArray = new BoolVar[N * N];
for (int index = 0; index < N * N; index++) {
int i = index / N;
int j = index % N;
flatArray[index] = vars[i][j];
}
model.sum(flatArray, "=", 1).post();
//model.sum(flatArray, "<", N).post();
//model.sum(flatArray, ">=", 8).post();
Solver solver = model.getSolver();
if (solver.solve()) {
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
System.out.print(vars[i][j].getValue() + " ");
}
System.out.println();
}
}
}
}
Output:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
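The index arithmetic used to flatten the matrix can be checked in isolation. This is a small Python sketch in which plain integers stand in for the BoolVar matrix (an assumption made purely for illustration); it shows that index / N and index % N visit every cell exactly once in row-major order.

```python
N = 10

# Plain integers stand in for the solver variables; cell (2, 4) mirrors the output above.
grid = [[0] * N for _ in range(N)]
grid[2][4] = 1

# Row-major flattening: index // N is the row, index % N is the column.
flat = [grid[index // N][index % N] for index in range(N * N)]

print(len(flat))      # 100, one entry per (i, j) pair
print(sum(flat))      # 1, the total that the "=" constraint enforces
print(flat.index(1))  # 24, i.e. 2 * N + 4
```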

Pandas taking more time to execute or even no execution when large input values

The device.csv has these values (head(5)).
DEVICE_ADDRESS START_TIME UPDATE_TIME
0 00:0A:20:46:86:D2 1528711800 1528764903
1 00:0A:20:6A:17:38 1528659901 1528764905
2 00:0A:20:37:4D:C4 1528578901 1528764901
3 00:0A:20:42:96:E8 1528669200 1528764903
4 00:0A:20:3D:DF:5C 1528728729 1528764906
Each DEVICE_MAC has multiple entries with different START_TIME and UPDATE_TIME values. The CSV file is read into a DataFrame and then sorted in ascending order of device address (DEVICE_MAC). Once sorted, we calculate the LATENCY_MIS, LATENCY_RB and RCOUNT values.
import pandas as pd
from pandas import DataFrame
df = pd.read_csv(r"C:\Tool\Device.csv" ,names = [ "DEVICE_MAC", "START_TIME", "UPDATE_TIME"])
df=df.sort_values(['DEVICE_MAC', 'START_TIME', 'UPDATE_TIME'], ascending=[True, True,True])
df['LATENCY_MIS'],df['LATENCY_RB'], df['RCOUNT'], df['PAD'] = 0, 0, 0, 0
mac_ref = df.loc[0,'DEVICE_MAC']
start_refernce_time = df['UPDATE_TIME'].min()
end_reference_time = df['UPDATE_TIME'].max()
for index, row in df.iterrows():
    if(mac_ref == row['DEVICE_MAC']):
        if(index==0): #Starting of MAC processing
            start_time_ref = row['START_TIME']
            event_time_ref = row['UPDATE_TIME']
            df.loc[index,'RCOUNT'] = 0
            df.loc[index, 'PAD'] = row['UPDATE_TIME'] - start_refernce_time
        elif(row['START_TIME'] == start_time_ref): #The same session prevails
            difference_event_ts = row['UPDATE_TIME']-event_time_ref
            event_time_ref = row['UPDATE_TIME']
            df.loc[index,'LATENCY_MIS'] = difference_event_ts -300
            df.loc[index,'RCOUNT'] = 0
            if(index+1 in df.index):
                if(row['DEVICE_MAC']!= df.loc[index+1,'DEVICE_MAC']):
                    df.loc[index, 'PAD'] = end_reference_time -row['UPDATE_TIME']
            if(index== df.index[-1]):
                df.loc[index, 'PAD'] = end_reference_time -row['UPDATE_TIME']
        elif(row['START_TIME'] != start_time_ref): #New Session Starts
            #difference_event_ts = row['START_TIME']-event_time_ref+(row['UPDATE_TIME']-row['START_TIME']-300)
            df.loc[index,'LATENCY_RB'] = row['START_TIME']-event_time_ref
            df.loc[index, 'LATENCY_MIS']= row['UPDATE_TIME']-row['START_TIME'] #-300*****
            event_time_ref = row['UPDATE_TIME']
            df.loc[index,'RCOUNT'] = 1
            start_time_ref = row['START_TIME']
            event_time_ref = row['UPDATE_TIME']
    else: #Starting of new MAC Processing
        mac_ref = row['DEVICE_MAC']
        start_time_ref = row['START_TIME']
        event_time_ref = row['UPDATE_TIME']
        df.loc[index,'RCOUNT'] = 0
        df.loc[index, 'PAD'] = row['UPDATE_TIME'] - start_refernce_time
Each row's LATENCY_MIS, LATENCY_RB and RCOUNT depend on the START_TIME and UPDATE_TIME values of the previous row and of the next row (except for the first and last rows of each DEVICE_MAC group).
The output looks like this
DEVICE_MAC_ADDRESS START_TIME UPDATE_TIME LATENCY_MIS LATENCY_RB RCOUNT PAD
18228 00:A0:BC:33:04:F0 1527703135 1528787401 1199 0 0 7219
18995 00:A0:BC:33:04:F0 1527703135 1528788601 600 0 0 6019
21007 00:A0:BC:33:04:F0 1527703135 1528791001 1200 0 0 3619
17981 00:A0:BC:37:60:76 1527697084 1528787100 899 0 0 7520
1384 00:A0:BC:3A:91:5C 1528596621 1528766734 599 0 0 27886
2945 00:A0:BC:3A:91:5C 1528596621 1528768533 899 0 0 26087
5832 00:A0:BC:3A:91:5C 1528596621 1528772133 600 0 0 22487
9091 00:A0:BC:3A:91:5C 1528596621 1528776334 600 0 0 18286
11989 00:A0:BC:3A:91:5C 1528596621 1528779934 600 0 0 14686
12880 00:A0:BC:3A:91:5C 1528596621 1528780834 600 0 0 13786
The middle code block that calculates LATENCY_MIS, LATENCY_RB, RCOUNT and PAD takes much longer to execute, or never finishes, when the input CSV is large.
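Writing single cells with df.loc inside an iterrows loop may be a large part of the cost here. As a hedged illustration, the per-device rules described above can be expressed as one pass over sorted tuples that collects results in a list. This sketch is not taken from the post: the function name latencies and the sample records are hypothetical, the 300-second interval is copied from the code above, and PAD is left out for brevity.

```python
from itertools import groupby
from operator import itemgetter

def latencies(records, interval=300):
    """records: iterable of (mac, start_time, update_time) tuples.

    Returns (mac, start, update, latency_mis, latency_rb, rcount) tuples.
    """
    out = []
    # Sorting the tuples orders them by mac, then start_time, then update_time.
    for mac, group in groupby(sorted(records), key=itemgetter(0)):
        prev_start = prev_update = None
        for _, start, update in group:
            if prev_update is None:        # first row of this device
                mis, rb, rcount = 0, 0, 0
            elif start == prev_start:      # the same session continues
                mis, rb, rcount = update - prev_update - interval, 0, 0
            else:                          # a new session starts
                mis, rb, rcount = update - start, start - prev_update, 1
            out.append((mac, start, update, mis, rb, rcount))
            prev_start, prev_update = start, update
    return out

# Hypothetical sample: device "A" has two sessions, device "B" has one row.
sample = [("A", 100, 500), ("A", 100, 1100), ("A", 1200, 1300), ("B", 50, 60)]
rows = latencies(sample)
```

The resulting list could then be turned into a DataFrame in a single step instead of being filled cell by cell.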

attoparsec-iteratee doesn't work when input is larger than buffer size

I have a simple attoparsec-based PDF parser. It works fine until it is used with iteratee and the size of the input exceeds the buffer size.
import qualified Data.ByteString as BS
import qualified Data.Iteratee as I
import qualified Data.Attoparsec as P
import qualified Data.Attoparsec.Iteratee as P
import System.Environment (getArgs)
import Control.Monad
import Pdf.Parser.Value
main :: IO ()
main = do
  [i] <- getArgs
  liftM (P.parseOnly parseValue) (BS.readFile i) >>= print -- works
  I.fileDriverRandomVBuf 2048 (P.parserToIteratee parseValue) i >>= print -- works
  I.fileDriverRandomVBuf 1024 (P.parserToIteratee parseValue) i >>= print -- does NOT work!
Input:
<< /Annots [ 404 0 R 547 0 R ] /ArtBox [ 0.000000 0.000000 612.000000 792.000000 ] /BleedBox [ 0.000000 0.000000 612.000000 792.000000 ] /Contents [ 435 0 R 436 0 R 437 0 R 444 0 R 448 0 R 449 0 R 450 0 R 453 0 R ] /CropBox [ 0.000000 0.000000 612.000000 792.000000 ] /Group 544 0 R /MediaBox [ 0.000000 0.000000 612.000000 792.000000 ] /Parent 239 0 R /Resources << /ColorSpace << /CS0 427 0 R /CS1 427 0 R /CS2 428 0 R >> /ExtGState << /GS0 430 0 R /GS1 431 0 R /GS2 469 0 R /GS3 475 0 R /GS4 439 0 R /GS5 480 0 R /GS6 485 0 R /GS7 491 0 R /GS8 497 0 R >> /Font << /C2_0 447 0 R /T1_0 421 0 R /T1_1 422 0 R /T1_2 423 0 R /T1_3 424 0 R /T1_4 425 0 R /T1_5 426 0 R /T1_6 438 0 R >> /ProcSet [ /PDF /Text /ImageC /ImageI ] /Properties << /MC0 << /Metadata 502 0 R >> >> /XObject << /Fm0 451 0 R /Fm1 504 0 R /Fm2 513 0 R /Fm3 515 0 R /Fm4 517 0 R /Fm5 526 0 R /Fm6 528 0 R /Fm7 537 0 R /Fm8 539 0 R /Im0 540 0 R /Im1 541 0 R /Im2 452 0 R /Im3 542 0 R /Im4 543 0 R >> >> /Rotate 0 /StructParents 1 /TrimBox [ 0.000000 0.000000 612.000000 792.000000 ] /Type /Page >>
So, the parser works without iteratee, works with big enough chunks, but doesn't work with smaller chunks. Bug in iteratee? In attoparsec-iteratee? In my code? Is there any workaround? It is a really urgent issue for me.
Thanks.
Edit 2: I created a new parser in Pdf/Parser/Value
dictOrStream :: Parser PdfValue
dictOrStream = do
  dict <- parseDict
  P.skipSpace
  let s1 = do
        P.string $ fromString "stream"
        content <- P.manyTill P.anyWord8 $ P.endOfLine >> P.string (fromString "endstream")
        return $ PdfValStream (PdfStream dict (BS.pack content))
  s1 <|> return (PdfValDict dict)
then used this parser in parseValue. This works for all your cases. I don't know why choice fails to backtrack properly, maybe an attoparsec bug?
Edit: I notice that, if I replace your top-level parseValue with parseDict, it works. It also works if I remove parseStream from the choices in parseValue. I think attoparsec has committed to "parseStream" after the completion of the top-level dictionary, therefore it's expecting more input (a space, the "stream" token, etc.) leading to this error. At this point there's an ambiguity between these two parsing options that you'll need to resolve. I don't know why it works properly when the entire input is available; I would expect an error to be reported as when your parser is fed chunks.
As of now, I suspect a bug in either your code or possibly attoparsec. I ran the following test by manually reading ByteString chunks and feeding them to your attoparsec parser:
*Main System.IO> h <- openFile "test.pdf" ReadMode
*Main System.IO Data.ByteString> let hget = hGetSome h 1024
*Main System.IO Data.ByteString> b <- hget
*Main System.IO Data.ByteString> let r = P.parse parseValue b
*Main System.IO Data.ByteString> r
Partial _
*Main System.IO Data.ByteString> b <- hget
*Main System.IO Data.ByteString> let r' = P.feed r b
*Main System.IO Data.ByteString> r'
Partial _
*Main System.IO Data.ByteString> b <- hget
*Main System.IO Data.ByteString> Data.ByteString.length b
0
*Main System.IO Data.ByteString> let r'2 = P.feed r' b
*Main System.IO Data.ByteString> r'2
Fail "<< /Annots [ 404 0 R 547 0 R ] /ArtBox [ 0.000000 0.000000 612.000000 792.000000 ] /BleedBox [ 0.000000 0.000000 612.000000 792.000000 ] /Contents [ 435 0 R 436 0 R 437 0 R 444 0 R 448 0 R 449 0 R 450 0 R 453 0 R ] /CropBox [ 0.000000 0.000000 612.000000 792.000000 ] /Group 544 0 R /MediaBox [ 0.000000 0.000000 612.000000 792.000000 ] /Parent 239 0 R /Resources << /ColorSpace << /CS0 427 0 R /CS1 427 0 R /CS2 428 0 R >> /ExtGState << /GS0 430 0 R /GS1 431 0 R /GS2 469 0 R /GS3 475 0 R /GS4 439 0 R /GS5 480 0 R /GS6 485 0 R /GS7 491 0 R /GS8 497 0 R >> /Font << /C2_0 447 0 R /T1_0 421 0 R /T1_1 422 0 R /T1_2 423 0 R /T1_3 424 0 R /T1_4 425 0 R /T1_5 426 0 R /T1_6 438 0 R >> /ProcSet [ /PDF /Text /ImageC /ImageI ] /Properties << /MC0 << /Metadata 502 0 R >> >> /XObject << /Fm0 451 0 R /Fm1 504 0 R /Fm2 513 0 R /Fm3 515 0 R /Fm4 517 0 R /Fm5 526 0 R /Fm6 528 0 R /Fm7 537 0 R /Fm8 539 0 R /Im0 540 0 R /Im1 541 0 R /Im2 452 0 R /Im3 542 0 R /Im4 543 0 R >> >> /Rotate 0 /StructParents 1 /TrimBox [ 0.000000 0.000000" [] "Failed reading: empty"
For some reason, your parser doesn't seem to like receiving data in chunks, and fails upon receiving the third (empty) chunk without consuming any input. I haven't yet figured out where your parser is going wrong, but it's definitely not iteratee or attoparsec-iteratee.
