Let's say I have the array of dates below (not necessarily sorted):
import numpy as np
np.array(["2000Q1", "2000Q2", "2000Q3", "2000Q4", "2001Q1", "2001Q2", "2001Q3", "2001Q4", "2002Q1",
"2002Q2", "2002Q3", "2002Q4", "2003Q1", "2003Q2", "2003Q3", "2003Q4", "2004Q1", "2004Q2", "2004Q3",
"2004Q4", "2005Q1", "2005Q2", "2005Q3", "2005Q4", "2006Q1", "2006Q2", "2006Q3", "2006Q4", "2007Q1",
"2007Q2", "2007Q3", "2007Q4", "2008Q1", "2008Q2", "2008Q3", "2008Q4", "2009Q1", "2009Q2", "2009Q3",
"2009Q4"])
From this I want to create a DataFrame with 2 columns, start-date and end-date, where these dates are the starting date and the ending date of a date range spanning 4 years. This should be repeated for each element of the above array, up to the last element. For example, the first 3 rows of this new DataFrame would look like below
Is there any direct function/method to achieve the above in Python?
Here's one way using PeriodIndex and DateOffset functions in pandas. Note that I named your array arr below:
import pandas as pd
# Quarter start + 4 years 10 months falls in the quarter 19 quarters later
df = pd.DataFrame({'start-date': arr,
                   'end-date': (pd.PeriodIndex(arr, freq='Q').to_timestamp() +
                                pd.DateOffset(years=4, months=10)).to_period('Q')})
Output:
start-date end-date
0 2000Q1 2004Q4
1 2000Q2 2005Q1
2 2000Q3 2005Q2
3 2000Q4 2005Q3
4 2001Q1 2005Q4
5 2001Q2 2006Q1
6 2001Q3 2006Q2
7 2001Q4 2006Q3
8 2002Q1 2006Q4
9 2002Q2 2007Q1
10 2002Q3 2007Q2
11 2002Q4 2007Q3
12 2003Q1 2007Q4
13 2003Q2 2008Q1
14 2003Q3 2008Q2
15 2003Q4 2008Q3
16 2004Q1 2008Q4
17 2004Q2 2009Q1
18 2004Q3 2009Q2
19 2004Q4 2009Q3
20 2005Q1 2009Q4
21 2005Q2 2010Q1
22 2005Q3 2010Q2
23 2005Q4 2010Q3
24 2006Q1 2010Q4
25 2006Q2 2011Q1
26 2006Q3 2011Q2
27 2006Q4 2011Q3
28 2007Q1 2011Q4
29 2007Q2 2012Q1
30 2007Q3 2012Q2
31 2007Q4 2012Q3
32 2008Q1 2012Q4
33 2008Q2 2013Q1
34 2008Q3 2013Q2
35 2008Q4 2013Q3
36 2009Q1 2013Q4
37 2009Q2 2014Q1
38 2009Q3 2014Q2
39 2009Q4 2014Q3
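The offset of 4 years and 10 months above moves every quarter forward by 19 quarters (2000Q1 becomes 2004Q4). As a minimal sketch of that arithmetic in plain Python, without pandas, the hypothetical helper `shift_quarter` below converts a "YYYYQn" label to a running quarter index, adds the shift, and converts back. pandas periods also support integer addition, so adding 19 to the PeriodIndex may be a shorter route, but that variant is not shown in the answer above.

```python
def shift_quarter(label, quarters):
    """Shift a 'YYYYQn' label forward by the given number of quarters."""
    year, quarter = label.split("Q")
    # Running index: quarters elapsed since year 0, with Q1 counted as 0.
    index = int(year) * 4 + (int(quarter) - 1) + quarters
    return f"{index // 4}Q{index % 4 + 1}"


starts = ["2000Q1", "2000Q2", "2009Q4"]
# Pair each start quarter with the quarter 19 quarters later.
rows = [(start, shift_quarter(start, 19)) for start in starts]
print(rows)
```

The pairs produced this way agree with the first, second, and last rows of the output table above.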
For a current task, I am working with input files that contain matrix test cases (for matrix multiplication). An example of an input file:
N M
1 3 5 ... 6 (M columns)
....
5 4 2 ... 1 (N rows)
Until now I have been reading them with a simple read(), but this is not efficient for large files of size > 10^2.
So I would like to know whether there is some way to use processes to do this in parallel.
I was also thinking of using multiple IO readers split by line, so that each process could read a different segment of the file, but I couldn't find any helpful resources.
Thank you.
PS: Current code is using this:
io:fread(IoDev, "", "~d")
Have you considered using the re module? I have not run a performance test, but it may be efficient. In the following example I do not use the first "N M" line, so I did not put it in the matrix.txt file.
matrix file:
1 2 3 4 5 6 7 8 9
11 12 13 14 15 16 17 18 19
21 22 23 24 25 26 27 28 29
31 32 33 34 35 36 37 38 39
I made the conversion in the shell
1> {ok,B} = file:read_file("matrix.txt"). % read the complete file and store it in a binary
{ok,<<"1 2 3 4 5 6 7 8 9\r\n11 12 13 14 15 16 17 18 19\r\n21 22 23 24 25 26 27 28 29\r\n31 32 33 34 35 36 37 38 39">>}
2> {ok,ML} = re:compile("[\r\n]+"). % to split the complete binary into a list of binaries, one for each line
{ok,{re_pattern,0,0,0,
<<69,82,67,80,105,0,0,0,0,0,0,0,1,8,0,0,255,255,255,255,
255,255,...>>}}
3> {ok,MN} = re:compile("[ ]+"). % to split the line into binaries one for each integer
{ok,{re_pattern,0,0,0,
<<69,82,67,80,73,0,0,0,0,0,0,0,17,0,0,0,255,255,255,255,
255,255,...>>}}
4> % a function to split a line and convert each chunk into integer
4> F = fun(Line) -> Nums = re:split(Line,MN), [binary_to_integer(N) || N <- Nums] end.
#Fun<erl_eval.7.126501267>
5> Lines = re:split(B,ML). % split the file into lines
[<<"1 2 3 4 5 6 7 8 9">>,<<"11 12 13 14 15 16 17 18 19">>,
<<"21 22 23 24 25 26 27 28 29">>,
<<"31 32 33 34 35 36 37 38 39">>]
6> lists:map(F,Lines). % map the function over each line
[[1,2,3,4,5,6,7,8,9],
[11,12,13,14,15,16,17,18,19],
[21,22,23,24,25,26,27,28,29],
[31,32,33,34,35,36,37,38,39]]
7>
If you want to check the matrix size (with the "N M" line kept as the first line of the file), you can replace the last line with:
[[NbRows,NbCols]|Matrix] = lists:map(F,Lines),
case (length(Matrix) == NbRows) andalso
lists:foldl(fun(X,Acc) -> Acc andalso (length(X) == NbCols) end,true,Matrix) of
true -> {ok,Matrix};
_ -> {error_size,Matrix}
end.
"is there some way to use processes to do this in parallel."
Of course.
"Also I was thinking of using multiple IO readers based on line, so then each process could read different segments of the file but couldn't find any helpful resources."
You don't seek to positions in a file by line; you seek to byte positions. While a file may look like a bunch of lines, it is actually just one long sequence of bytes. Therefore, you will need to figure out which byte positions you want to seek to in the file.
Check out file:position, file:pread.
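The byte-position idea is independent of the language, so here is a minimal sketch of it in Python rather than Erlang. The function names `line_aligned_ranges` and `parse_range` are hypothetical. The first one picks cut points by byte count and then moves each cut forward to the end of the line it landed in, so no row is split between two readers. The second one seeks to a range and parses only the rows inside it. It assumes the "N M" header line has already been removed from the file.

```python
import os


def line_aligned_ranges(path, n_chunks):
    """Split a file into at most n_chunks byte ranges that end on line boundaries."""
    size = os.path.getsize(path)
    ranges = []
    start = 0
    with open(path, "rb") as f:
        for i in range(1, n_chunks + 1):
            if start >= size:
                break
            # Tentative cut by byte count, never before the current start.
            target = max(start, size * i // n_chunks)
            f.seek(target)
            f.readline()  # advance to the end of the line the cut landed in
            end = f.tell() if i < n_chunks else size
            ranges.append((start, end))
            start = end
    return ranges


def parse_range(path, start, end):
    """Read bytes [start, end) and convert each non-empty line to a list of ints."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return [[int(tok) for tok in line.split()]
            for line in data.splitlines() if line.strip()]
```

Each (start, end) pair could then be handed to a separate worker. Whether that is faster than a single sequential read depends on the file size and the storage, so it is worth measuring before committing to it.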
IE:
23 HL*3*2*23*0
24 PAT*19
25 NM1*QC*1*CUSTOMER*COLE
26 N3*228 PINEAPPLE CIRCLE
27 N4*CORA*PA*15108
28 DMG*D8*19940921*M
29 CLM*945405*5332.54***12>B>1*Y*A*Y*Y*P
30 HI*BK>2533
31 LX*1
32 SV1*HC>J2941*5332.54*UN*84***1
33 DTP*472*RD8*20110511-20110511
34 REF*6R*1099999731
35 NTE*ADD*GENERIC 12MG CARTRIDGE
36 LIN**N4*00013264681
37 CTP****7*UN
I want to populate column C with the value taken from row 29 ("945405"), starting at that row and continuing down to row 37 (the one with the text "CTP" in it). I cannot do this in VBA due to permissions. Is there a formula that will grab this value (it is always CLM*xxxxxx*...) and assign it to column C, using the "CLM" row as the first row and the "CTP" row as the last row, and repeat this all the way through the spreadsheet? I.e.:
23 HL*3*2*23*0
24 PAT*19
25 NM1*QC*1*CUSTOMER*COLE
26 N3*228 PINEAPPLE CIRCLE
27 N4*CORA*PA*15108
28 DMG*D8*19940921*M
29 CLM*945405*5332.54***12>B>1*Y*A*Y*Y*P 945405
30 HI*BK>2533 945405
31 LX*1 945405
32 SV1*HC>J2941*5332.54*UN*84***1 945405
33 DTP*472*RD8*20110511-20110511 945405
34 REF*6R*1099999731 945405
35 NTE*ADD*GENERIC 12MG CARTRIDGE 945405
36 LIN**N4*00013264681 945405
37 CTP****7*UN 945405
38 NM1*DK*1*PATIENT*DEBORAH****XX*1
39 N3*123 MAIN ST*APT B
****Update*****
I was given permissions in VBA. How would I loop this?
Here is a clearer picture of what I am trying to accomplish
You can use the =MID(Source_Cell, Start_Position, Desired_Length) function to pull the substring. In your case it would be:
=MID(B29, 5, 6)
You can then put this formula in all of the cells you'd like it to be in.
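For the looping part of the question, the logic can be sketched as follows. This is an illustration in Python, not VBA, and the function name `tag_claim_ids` is hypothetical. It walks the rows top to bottom, remembers the second field of the most recent CLM row, writes it next to every row up to and including the following CTP row, and then stops until the next CLM row. Splitting on "*" avoids assuming that the value is always 6 characters long.

```python
def tag_claim_ids(segments):
    """Pair each segment with the CLM value of the block it belongs to, or ''."""
    current = ""
    tagged = []
    for segment in segments:
        fields = segment.split("*")
        if fields[0] == "CLM" and len(fields) > 1:
            current = fields[1]      # start of a block: remember its value
        tagged.append((segment, current))
        if fields[0] == "CTP":
            current = ""             # end of the block: stop tagging
    return tagged


rows = [
    "DMG*D8*19940921*M",
    "CLM*945405*5332.54***12>B>1*Y*A*Y*Y*P",
    "HI*BK>2533",
    "CTP****7*UN",
    "NM1*DK*1*PATIENT*DEBORAH****XX*1",
]
for segment, claim in tag_claim_ids(rows):
    print(segment, claim)
```

A VBA loop would follow the same three steps: detect the CLM row, fill the column while inside the block, and reset after the CTP row.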
I have this data in Excel.
A B C
--------------------------------------
Line Number Value #1 Value #2
1 21 35
2 21 27
3 21 18
4 10 47
5 50 5
6 37 68
7 10 21
8 75 21
I am trying to count how many times "21" appears on odd-numbered lines. In this example, the answer should be 3. However, neither " IF(MOD(A1:A8,2)=1,COUNTIF(B1:C8,21)) " nor " {IF(MOD(A1:A8,2)=1,COUNTIF(B1:C8,21))} " worked, and Google didn't yield anything helpful. Could anyone help me? Thanks!!
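This works for odd lines (it takes the total count and subtracts the matches found on even rows):
=SUM(COUNTIF(A:B,21)-SUMPRODUCT((A:B=21)*(MOD(ROW(A:B),2)=0)))
There may be a better way of writing this formula. Note that it tests the worksheet row number (ROW) rather than the Line Number column, and that it looks at columns A:B, so adjust the ranges to match your layout.
Use this to count even lines:
=SUMPRODUCT((A:B=21)*(MOD(ROW(A:B),2)=0))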
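To check a formula against the sample data, the same count can be written out in a few lines of Python. This is only a sketch for verification. It uses the Line Number column from the question to decide which lines are odd, which is an assumption about the intent, since the formulas above use the worksheet row instead.

```python
# (line number, value #1, value #2) rows copied from the question.
data = [
    (1, 21, 35),
    (2, 21, 27),
    (3, 21, 18),
    (4, 10, 47),
    (5, 50, 5),
    (6, 37, 68),
    (7, 10, 21),
    (8, 75, 21),
]


def count_on_lines(rows, target, parity):
    """Count cells equal to target on lines whose number % 2 equals parity."""
    return sum(
        value == target
        for line, *values in rows
        if line % 2 == parity
        for value in values
    )


odd_count = count_on_lines(data, 21, 1)
even_count = count_on_lines(data, 21, 0)
print(odd_count, even_count)
```

The odd-line count matches the answer of 3 expected in the question.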
I've already checked a similar existing topic (How to read numeric data from a string in FORTRAN), but I haven't been able to do what I want.
I need to open a file and read a numeric value from a string. Below is a section of the file in question. I want to read the integer next to 'ELEMENTS:', but so far I haven't been able to do so.
ELEMENT GROUP 2.4.6
GROUP: 1 ELEMENTS: 187169 MATERIAL: 2 NFLAGS: 1
fluid
0
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
Can someone please help me here?
Ok guys, thanks to your answers the program is working!
For further reference, here's the reading part of the code:
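! Read the whole header line into the character variable str
READ(77,'(A)') str
! Position just past the 9-character label "ELEMENTS:"
ipos = INDEX(str,"ELEMENTS:",back=.true.) + 9
! List-directed read of the integer that follows the label
READ (str(1+ipos:),*) k
PRINT*, k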
Thanks for the answers.
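For comparison, the same "find the label, then read the number after it" approach can be sketched in Python. The function name `read_labelled_int` is hypothetical. Like the Fortran code above, it searches for the last occurrence of the label and converts the first whitespace-separated token that follows it.

```python
def read_labelled_int(line, label="ELEMENTS:"):
    """Return the integer that follows the last occurrence of label in line."""
    pos = line.rfind(label)
    if pos < 0:
        raise ValueError(f"label {label!r} not found")
    tokens = line[pos + len(label):].split()
    if not tokens:
        raise ValueError(f"no value after {label!r}")
    return int(tokens[0])


header = "GROUP: 1 ELEMENTS: 187169 MATERIAL: 2 NFLAGS: 1"
print(read_labelled_int(header))
```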