Efficient Reading of Input File

For a task, I am currently working with input files that contain matrix test cases (matrix multiplication). An example input file looks like this:
N M
1 3 5 ... 6 (M columns)
....
5 4 2 ... 1 (N rows)
Until now I have been using a simple read() to access them, but this is not efficient for large files of size > 10^2.
So I wanted to know is there some way to use processes to do this in parallel.
Also I was thinking of using multiple IO readers based on line, so then each process could read different segments of the file but couldn't find any helpful resources.
Thank you.
PS: Current code is using this:
io:fread(IoDev, "", "~d")

Have you considered using the re module? I did not run a performance test, but it may be efficient. In the following example I do not use the first "N M" line, so I left it out of the matrix.txt file.
matrix file:
1 2 3 4 5 6 7 8 9
11 12 13 14 15 16 17 18 19
21 22 23 24 25 26 27 28 29
31 32 33 34 35 36 37 38 39
I made the conversion in the shell
1> {ok,B} = file:read_file("matrix.txt"). % read the complete file and store it in a binary
{ok,<<"1 2 3 4 5 6 7 8 9\r\n11 12 13 14 15 16 17 18 19\r\n21 22 23 24 25 26 27 28 29\r\n31 32 33 34 35 36 37 38 39">>}
2> {ok,ML} = re:compile("[\r\n]+"). % to split the complete binary into a list of binaries, one for each line
{ok,{re_pattern,0,0,0,
<<69,82,67,80,105,0,0,0,0,0,0,0,1,8,0,0,255,255,255,255,
255,255,...>>}}
3> {ok,MN} = re:compile("[ ]+"). % to split the line into binaries one for each integer
{ok,{re_pattern,0,0,0,
<<69,82,67,80,73,0,0,0,0,0,0,0,17,0,0,0,255,255,255,255,
255,255,...>>}}
4> % a function to split a line and convert each chunk into integer
4> F = fun(Line) -> Nums = re:split(Line,MN), [binary_to_integer(N) || N <- Nums] end.
#Fun<erl_eval.7.126501267>
5> Lines = re:split(B,ML). % split the file into lines
[<<"1 2 3 4 5 6 7 8 9">>,<<"11 12 13 14 15 16 17 18 19">>,
<<"21 22 23 24 25 26 27 28 29">>,
<<"31 32 33 34 35 36 37 38 39">>]
6> lists:map(F,Lines). % map the function over each line
[[1,2,3,4,5,6,7,8,9],
[11,12,13,14,15,16,17,18,19],
[21,22,23,24,25,26,27,28,29],
[31,32,33,34,35,36,37,38,39]]
7>
If the file keeps its first "N M" line and you want to check the matrix size, you can replace the last line with:
[[NbRows,NbCols]|Matrix] = lists:map(F,Lines),
case (length(Matrix) == NbRows) andalso
lists:foldl(fun(X,Acc) -> Acc andalso (length(X) == NbCols) end,true,Matrix) of
true -> {ok,Matrix};
_ -> {error_size,Matrix}
end.
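For comparison, the same pipeline (read everything, split into lines, convert to integers, check the size) can be sketched in Python. This is only an illustration of the idea, not a translation of the Erlang session; `parse_matrix` is a hypothetical name, and the sketch assumes the text keeps its leading "N M" line.

```python
def parse_matrix(text):
    """Parse an 'N M' header followed by N rows of M integers."""
    # splitlines() handles both \n and \r\n; blank lines are skipped
    lines = [line for line in text.splitlines() if line.strip()]
    n_rows, n_cols = (int(tok) for tok in lines[0].split())
    matrix = [[int(tok) for tok in line.split()] for line in lines[1:]]
    # Same check as the foldl above: row count and every row length must match
    if len(matrix) != n_rows or any(len(row) != n_cols for row in matrix):
        raise ValueError("matrix does not match the declared size")
    return matrix


if __name__ == "__main__":
    sample = "2 3\r\n1 2 3\r\n4 5 6\r\n"
    print(parse_matrix(sample))
```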

"is there some way to use processes to do this in parallel."
Of course.
"Also I was thinking of using multiple IO readers based on line, so then each process could read different segments of the file but couldn't find any helpful resources."
You don't seek to positions in a file by line; you seek to byte positions. While a file may look like a bunch of lines, it is actually just one long sequence of bytes. Therefore, you will need to figure out which byte positions you want to seek to in the file.
Check out file:position and file:pread.
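The byte-position idea can be sketched as follows. This is a Python illustration rather than Erlang, under a few assumptions: the file has no "N M" header line, the helper names (`chunk_offsets`, `read_rows`, `read_matrix_in_chunks`) are made up for this example, and a thread pool stands in for the worker processes. Each approximate split point is moved forward to the next line boundary so that no row is cut in half.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def chunk_offsets(path, n_chunks):
    """Return (start, end) byte ranges that each begin at the start of a line."""
    size = os.path.getsize(path)
    starts = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)  # jump to an approximate byte position
            f.readline()                  # advance to the next line boundary
            pos = f.tell()
            if starts[-1] < pos < size:   # skip empty or duplicate ranges
                starts.append(pos)
    return list(zip(starts, starts[1:] + [size]))


def read_rows(path, start, end):
    """Read one byte range and parse each line into a list of integers."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return [[int(tok) for tok in line.split()]
            for line in data.splitlines() if line.strip()]


def read_matrix_in_chunks(path, n_chunks=4):
    """Parse the byte ranges concurrently and join the rows in file order."""
    ranges = chunk_offsets(path, n_chunks)
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda r: read_rows(path, *r), ranges)
    return [row for part in parts for row in part]


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("1 2 3\n4 5 6\n7 8 9\n")
    print(read_matrix_in_chunks(tmp.name, n_chunks=2))
    os.remove(tmp.name)
```

In Erlang the same split would presumably use file:position or file:pread per worker, as suggested above; whether it beats a single file:read_file call would have to be measured.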

Related

What do "!" and "." mean in BASIC?

I am trying to translate BASIC code written in the 1990s to Python. I keep coming across two symbols, ! (exclamation mark) and . (period), and I can't find any documentation online on what they do.
I have the code running, but some of the outputs are not as expected. I am wondering if these symbols might be the issue, as I previously assumed that the period was just a typo for a multiplication sign.
Examples:
                                                          |
                                                          v
QWLOST = (((TW-TDAO)/(TWRT-TDAOR))^1.25)*((VISR/VIS)^0.25).(PW+PE)*DT
TFAVE = (TTO+TBO)/2!
                   ^
                   |
In case anyone else needs to know this in the future:
! - a type suffix that marks the value as single precision
. - was just a typo for * (multiplication)
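With that reading, the two example lines might translate to Python roughly as below. This is a sketch, not the original program: the function wrappers and lowercase argument names are assumptions added for illustration. Python floats are double precision, so results can differ slightly from a single-precision BASIC run.

```python
def qwlost(tw, tdao, twrt, tdaor, visr, vis, pw, pe, dt):
    # BASIC's ^ is exponentiation, which is ** in Python (Python's ^ is bitwise XOR).
    # The stray "." in the original line is treated as "*".
    return (((tw - tdao) / (twrt - tdaor)) ** 1.25) * ((visr / vis) ** 0.25) * (pw + pe) * dt


def tfave(tto, tbo):
    # "2!" is the single-precision constant 2; a float literal is the closest equivalent
    return (tto + tbo) / 2.0


if __name__ == "__main__":
    print(tfave(10, 20))
```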

I tried a few things in bwBasic (in Linux, in case that's relevant!).
bwBASIC: list
10: for i = 1 to 20
20: print i, ., . - i
30: next i
40: print ".="; .
This gave me:
bwBASIC: run
1 20 19
2 20 18
3 20 17
4 20 16
5 20 15
6 20 14
7 20 13
8 20 12
9 20 11
10 20 10
11 20 9
12 20 8
13 20 7
14 20 6
15 20 5
16 20 4
17 20 3
18 20 2
19 20 1
20 20 0
.= 20
Which would suggest that . (in bwBasic in any case) is the max number in a for loop.

How to generate 3 natural number that sum to 60 using awk

I am trying to write an awk script that generates 3 natural numbers that sum to 60. I am trying the rand function, but I have a problem getting the numbers to sum to 60.

Here is one way:
awk -v n=60 'BEGIN{srand();a=int(rand()*n);b=int(rand()*(n-a));c=n-a-b;
print a,b,c}'
The idea is:
generate a random number a: 0 <= a < 60
generate a random number b: 0 <= b < 60-a
c = 60-a-b
Note that a and b can be 0 with this approach (c is always at least 1).
Here I set a variable n=60 to make it easy to use a different sum.
If we run this one-liner 10 times, we get output like:
kent$ awk 'BEGIN{srand();for(i=1;i<=10;i++){a=int(rand()*60);b=int(rand()*(60-a));c=60-a-b;print a,b,c}}'
46 7 7
56 1 3
26 15 19
14 12 34
44 6 10
1 36 23
32 1 27
41 0 19
55 1 4
54 1 5
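If zero must be excluded, the same idea works with shifted ranges. Here is a minimal Python sketch of that variant; `three_naturals` is a hypothetical name, and the result is not uniformly distributed over all possible triples (neither is the awk version).

```python
import random


def three_naturals(total=60, rng=random):
    """Return three integers >= 1 that sum to `total` (requires total >= 3)."""
    a = rng.randint(1, total - 2)      # leave at least 2 for b and c
    b = rng.randint(1, total - a - 1)  # leave at least 1 for c
    c = total - a - b
    return a, b, c


if __name__ == "__main__":
    for _ in range(10):
        print(*three_naturals())
```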

Fortran: read numeric data from string

I've already checked a similar existing topic (How to read numeric data from a string in FORTRAN), but I haven't been able to do what I want.
I need to open a file and read a numeric value from a string. Below is a section of the file in question. I want to read the integer next to 'ELEMENTS:', but so far I haven't been able to do so.
ELEMENT GROUP 2.4.6
GROUP: 1 ELEMENTS: 187169 MATERIAL: 2 NFLAGS: 1
fluid
0
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
Can someone please help me here?

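OK, thanks to your answers the program is working!
For future reference, here's the reading part of the code:
! read the whole line into a character variable
READ(77,'(A)') str
! index of the first character after the last "ELEMENTS:" (the keyword is 9 characters long)
ipos = INDEX(str,"ELEMENTS:",back=.true.) + 9
! list-directed read of the first integer in the remainder of the line
READ (str(1+ipos:),*) k
PRINT*, k
Thanks for the answers.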
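The same find-the-keyword-then-parse technique, sketched in Python for readers who want to test the logic outside Fortran. `value_after` is a hypothetical helper; `rfind` plays the role of INDEX with back=.true.

```python
def value_after(line, key="ELEMENTS:"):
    """Return the integer that follows the last occurrence of `key` in `line`."""
    pos = line.rfind(key)  # search from the end, like INDEX(..., back=.true.)
    if pos < 0:
        raise ValueError(f"{key!r} not found")
    # Skip past the keyword, then take the first whitespace-separated token
    return int(line[pos + len(key):].split()[0])


if __name__ == "__main__":
    print(value_after("GROUP: 1 ELEMENTS: 187169 MATERIAL: 2 NFLAGS: 1"))
```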

Find a pattern and modify the next line without modifying the other contents in the file. preferably linux based commands (sed, awk etc)

I have a file which looks like this:
# Hello, welcome to the world
# Trying to modify XXXXXX
# Some more random text
poly RANDOM LAYER{
20 25
18 2
1 5
1 2
5 6
}
poly RANDOM LAYER{
30 50
14 25
15 25
15 26
15 26
15 27
}
I would like to increment the values on the line that follows poly RANDOM LAYER: say, add 10 to the first number (20+10=30) and 20 to the second number (25+20=45). The rest of the contents should stay the same.
This should be done for every line immediately after poly RANDOM LAYER.
The output should look like:
# Hello, welcome to the world
# Trying to modify XXXXXX
# Some more random text
poly RANDOM LAYER{
*30 45*
18 2
1 5
1 2
5 6
}
poly RANDOM LAYER{
*40 70*
14 25
15 25
15 26
15 26
15 27
}

If the specific leading white space is always 4 chars:
$ awk 'f{$1="    "$1+10; $2+=20; f=0} /RANDOM/{f=1} 1' file
# Hello, welcome to the world
# Trying to modify XXXXXX
# Some more random text
poly RANDOM LAYER{
30 45
18 2
1 5
1 2
5 6
}
poly RANDOM LAYER{
40 70
14 25
15 25
15 26
15 26
15 27
}
otherwise use:
$ awk 'f{fmt=$0; gsub(/[^[:space:]]+/,"%s",fmt); $0=sprintf(fmt,$1+10,$2+20); f=0} /RANDOM/{f=1} 1' file
That will reproduce in your output whatever leading, trailing, or inter-field white space you have in your input.

You say (sed, awk, etc). Is perl part of etc?
perl -pe 's/(\d+)/$1+10/ge if($lastLineMatch); $lastLineMatch = m/poly RANDOM/; ' < file
Or if you want to add different values to the two numbers:
perl -pe 's/(\d+)(\D+)(\d+)/($1+10).$2.($3+20)/ge if($lastLineMatch); $lastLineMatch = m/poly RANDOM/; ' < file
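The flag-on-the-previous-line logic shared by the awk and perl answers can also be sketched in Python. `bump_after_marker` and its parameters are hypothetical names for this illustration; only the digits are replaced, so the surrounding white space is left untouched.

```python
import re


def bump_after_marker(lines, marker="poly RANDOM", increments=(10, 20)):
    """Add `increments` to the leading numbers of each line that follows a marker line."""
    out = []
    pending = False  # True when the previous line contained the marker
    for line in lines:
        if pending:
            incs = iter(increments)
            # Replace each integer in turn; numbers beyond the increments stay unchanged
            line = re.sub(r"\d+", lambda m: str(int(m.group()) + next(incs, 0)), line)
        pending = marker in line
        out.append(line)
    return out


if __name__ == "__main__":
    sample = ["poly RANDOM LAYER{", "    20 25", "    18 2", "}"]
    print("\n".join(bump_after_marker(sample)))
```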

File Reading problems in Python

While reading a file in Python using
f = open ("filename.txt")
and accessing the data with
f.read(1)
and finally finding the position of the stream using
f.tell()
at every step, we should get continuous numbering from 0 up to the current position.
The problem I am facing is that I am actually getting a seemingly random number from f.tell() at some positions, after which the numbering continues.
For example, the f.tell() outputs look something like the following:
0
1
2
3
133454568679978
6
7
8...
Any idea why this is happening?
My code:
f=open("temp_mcompress.cpp")
current = ' '
while current != '' :
    print(f.tell())
    current = f.read(1)
f.close()
temp_mcompress.cpp file:
#include <iostream>
int main(int a)
{
}
Output:
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
18446744073709551636
18446744073709551638
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
18446744073709551655
40
41
43
44

It seems I might have found the problem, which may still be applicable to Python 3.x:
source: http://docs.python.org/2.4/lib/bltin-file-objects.html
tell()
Return the file's current position, like stdio's ftell().
Note: On Windows, tell() can return illegal values (after an fgets())
when reading files with Unix-style line-endings. Use binary mode
('rb') to circumvent this problem.
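A minimal sketch of the binary-mode workaround, written for Python 3, where tell() on a text-mode file is documented as returning an opaque number while binary mode gives a plain byte offset. `byte_positions` is a hypothetical helper name.

```python
import os
import tempfile


def byte_positions(path):
    """Read a file one byte at a time in binary mode, recording tell() before each read."""
    positions = []
    with open(path, "rb") as f:   # "rb": no newline translation, tell() is a byte offset
        while True:
            positions.append(f.tell())
            if f.read(1) == b"":  # an empty bytes object signals end of file
                break
    return positions


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".cpp", delete=False) as tmp:
        tmp.write(b"#include <iostream>\nint main(int a)\n{\n}\n")
    print(byte_positions(tmp.name))
    os.remove(tmp.name)
```

In binary mode the recorded positions should simply count up from 0 to the file size, with no jumps at the line endings.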
