Split string in equal chunks in Lua

I need to split a string into equal-sized chunks (the last chunk can be smaller if the string's length can't be divided without a remainder).
Let's say I have a string with 2000 chars. I want to split this string into equal-sized chunks of 500 chars, so I would end up with 4 strings of 500 chars.
How can this be done in Lua when neither the length of the initial string nor the chunk size is fixed?
Example
String: "0123456789" (Length = 10) should be split into strings of 3 characters
Result: "012", "345", "678", "9"
(It doesn't matter whether the result is in a table or returned by an iterator.)

local function splitByChunk(text, chunkSize)
    local s = {}
    for i=1, #text, chunkSize do
        s[#s+1] = text:sub(i,i+chunkSize - 1)
    end
    return s
end
-- usage example
local st = splitByChunk("0123456789",3)
for i,v in ipairs(st) do
    print(i, v)
end
-- outputs
-- 1 012
-- 2 345
-- 3 678
-- 4 9
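Since the question allows an iterator as well, the same idea can be sketched as a closure-based iterator. This is only a minimal sketch: the name chunks is made up for illustration, and it assumes chunkSize is a positive integer.

```lua
-- Hypothetical helper (not part of the answer above): iterate over a string
-- in pieces of at most chunkSize characters.
local function chunks(text, chunkSize)
  assert(chunkSize >= 1, "chunkSize must be a positive integer")
  local pos = 1
  return function()
    if pos > #text then
      return nil -- no characters left, which ends the generic for loop
    end
    local piece = text:sub(pos, pos + chunkSize - 1)
    pos = pos + chunkSize
    return piece
  end
end

for piece in chunks("0123456789", 3) do
  print(piece)
end
```

Compared with the table version, nothing is stored up front, which may matter for long strings that are processed one chunk at a time.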

To split a string into 4 parts, you can compute the chunk size like this:
local str = "0123456789"
local sz = math.ceil(str:len() / 4)
Then the first string is str:sub(1, sz); I'll leave the rest to you.
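One way the rest might look, as a rough sketch: the function name splitIntoParts is an assumption for illustration, not something from the answer. Note that rounding the size up means the result can contain fewer pieces than requested (a 9-character string split "into 4" gives 3 pieces of 3).

```lua
-- Hypothetical helper: split str into at most `parts` pieces whose size is
-- the string length divided by `parts`, rounded up.
local function splitIntoParts(str, parts)
  local size = math.ceil(#str / parts)
  local result = {}
  if size == 0 then
    return result -- empty input: avoid a for loop with step 0
  end
  for i = 1, #str, size do
    result[#result + 1] = str:sub(i, i + size - 1)
  end
  return result
end

for i, v in ipairs(splitIntoParts("0123456789", 4)) do
  print(i, v)
end
```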

> function tt(s)
>> local t={}
>> for p in s:gmatch("..?.?") do
>> t[#t+1]=p
>> end
>>
>> for i,v in ipairs(t) do
>> print(i, v)
>> end
>> end
> tt("0123")
1 012
2 3
> tt("0123456789")
1 012
2 345
3 678
4 9
> tt("012345678901")
1 012
2 345
3 678
4 901
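The pattern "..?.?" is hard-coded for chunks of 3. A sketch of how it could be built for another chunk size follows; chunkPattern and splitByPattern are made-up names. This is probably only sensible for small chunk sizes, since a long run of optional items makes the pattern expensive and some Lua versions may reject it as too complex; the string.sub approach has no such limit.

```lua
-- Hypothetical helper: one mandatory character followed by n-1 optional ones,
-- e.g. n = 3 gives "..?.?".
local function chunkPattern(n)
  assert(n >= 1, "chunk size must be a positive integer")
  return "." .. (".?"):rep(n - 1)
end

local function splitByPattern(s, n)
  local t = {}
  for p in s:gmatch(chunkPattern(n)) do
    t[#t + 1] = p
  end
  return t
end

for i, v in ipairs(splitByPattern("0123456789", 4)) do
  print(i, v)
end
```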

Related

How would I undo the actions of string.gmatch for a certain section of string in lua

I am using Lua and splitting a string by spaces to write a sort of sub-language. I am trying to have it not split anything inside parentheses, and I am already at the stage where I can detect whether there are parentheses. But I want to undo the gmatch splitting for the text inside the parentheses, as I want to preserve the string contained within.
local function split(strng)
    local __s={}
    local all_included={}
    local flag_table={}
    local uncompiled={}
    local flagged=false
    local flagnum=0
    local c=0
    for i in string.gmatch(strng,'%S+') do
        c=c+1
        table.insert(all_included,i)
        if(flagged==false)then
            if(string.find(i,'%('or'%['or'%{'))then
                flagged=true
                flag_table[tostring(c)]=1
                table.insert(uncompiled,i)
                print'flagged'
            else
                table.insert(__s,i)
            end
        elseif(flagged==true)then
            table.insert(uncompiled,i)
            if(string.find(i,'%)' or '%]' or '%}'))then
                flagged=false
                local __=''
                for i=1,#uncompiled do
                    __=__ .. uncompiled[i]
                end
                table.insert(__s,__)
                print'unflagged'
            end
        end
    end
    return __s;
end
This is my splitting code.
I would just not use gmatch for this at all.
local input = " this is a string (containg some (well, many) annoying) parentheses and should be split. The string contains double spaces. What should be done? And what about trailing spaces? "
local pos = 1
local words = {}
local last_start = pos
while pos <= #input do
    local char = string.byte(input, pos)
    if char == string.byte(" ") then
        table.insert(words, string.sub(input, last_start, pos - 1))
        last_start = pos + 1
    elseif char == string.byte("(") then
        local depth = 1
        while depth ~= 0 and pos + 1 < #input do
            local char = string.byte(input, pos + 1)
            if char == string.byte(")") then
                depth = depth - 1
            elseif char == string.byte("(") then
                depth = depth + 1
            end
            pos = pos + 1
        end
    end
    pos = pos + 1
end
table.insert(words, string.sub(input, last_start))
for k, v in pairs(words) do
    print(k, "'" .. v .. "'")
end
Output:
1 ''
2 'this'
3 'is'
4 'a'
5 'string'
6 '(containg some (well, many) annoying)'
7 'parentheses'
8 'and'
9 'should'
10 'be'
11 'split.'
12 'The'
13 'string'
14 'contains'
15 ''
16 'double'
17 ''
18 ''
19 'spaces.'
20 'What'
21 'should'
22 'be'
23 'done?'
24 'And'
25 'what'
26 'about'
27 'trailing'
28 'spaces?'
29 ''
Thinking about trailing spaces and other such problems is left as an exercise for the reader. I tried to highlight some of the possible problems with the example that I used. Also, I only looked at one kind of parenthesis, since I do not want to think about how this (string} should be ]parsed.
Oh, and if nested parentheses are not a concern: most of the code above can be replaced with a call to string.find(input, ")", pos, true) to find the closing parenthesis.
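For the non-nested case, that call might be used roughly as follows. This is a sketch only; parenGroup is a made-up helper name. The fourth argument true turns off pattern matching, so ")" is searched for as plain text and needs no escaping.

```lua
-- Hypothetical helper: given the position of an opening parenthesis, return
-- the text up to and including the next ")" and the position of that ")".
-- Does NOT handle nested parentheses.
local function parenGroup(input, openPos)
  local closePos = string.find(input, ")", openPos, true) -- plain search
  if not closePos then
    return nil -- no closing parenthesis after openPos
  end
  return input:sub(openPos, closePos), closePos
end

local s = "say (hello world) now"
local openPos = s:find("(", 1, true)
print(parenGroup(s, openPos))
```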
Please note that you cannot combine patterns with or or and, as attempted in your code.
"%(" or "%[" equals "%("
Lua evaluates that expression from left to right. Because "%(" is a true value, Lua reduces the expression to "%(", which is logically the same as the full expression.
So string.find(i,'%('or'%['or'%{') will only find ('s in i.
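If the intent is to match any one of several bracket characters, a character class is one way to express that inside a single pattern. A small sketch (the names OPENING, CLOSING, hasOpening and hasClosing are illustrative only):

```lua
-- "or" yields its first truthy operand, so only the first pattern survives.
assert(("%(" or "%[" or "%{") == "%(")

-- A character class matches any single character listed between [ and ].
local OPENING = "[%(%[%{]"
local CLOSING = "[%)%]%}]"

local function hasOpening(word)
  return word:find(OPENING) ~= nil
end

local function hasClosing(word)
  return word:find(CLOSING) ~= nil
end

print(hasOpening("(foo"), hasOpening("[foo"), hasOpening("{foo"), hasOpening("foo"))
print(hasClosing("foo)"), hasClosing("foo]"), hasClosing("foo}"), hasClosing("foo"))
```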
As a similar but slightly different approach to Uli's answer, I would first split by parentheses. Then you can split the odd-numbered fields on whitespace:
split = require("split") -- https://luarocks.org/modules/telemachus/split
split__by_parentheses = function(input)
    local fields = {}
    local level = 0
    local field = ""
    for i = 1, #input do
        local char = input:sub(i, i)
        if char == "(" then
            if level == 0 then
                -- add non-parenthesized field to list
                fields[#fields+1] = field
                field = ""
            end
            level = level + 1
        end
        field = field .. char
        if char == ")" then
            level = level - 1
            assert(level >= 0, 'Mismatched parentheses')
            if level == 0 then
                -- add parenthesized field to list
                fields[#fields+1] = field
                field = ""
            end
        end
    end
    assert(level == 0, 'Mismatched parentheses')
    fields[#fields+1] = field
    return fields
end
input = " this is a string (containg some (well, many) annoying) parentheses and should be split. The string contains double spaces. What should be done? And what about trailing spaces? "
fields = split__by_parentheses(input)
for i, field in ipairs(fields) do
    print(("%d\t'%s'"):format(i, field))
    if i % 2 == 1 then
        for j, word in ipairs(split.split(field)) do
            print(("\t%d\t%s"):format(j, word))
        end
    end
end
outputs
1 ' this is a string '
1
2 this
3 is
4 a
5 string
6
2 '(containg some (well, many) annoying)'
3 ' parentheses and should be split. The string contains double spaces. What should be done? And what about trailing spaces? '
1
2 parentheses
3 and
4 should
5 be
6 split.
7 The
8 string
9 contains
10 double
11 spaces.
12 What
13 should
14 be
15 done?
16 And
17 what
18 about
19 trailing
20 spaces?
21
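As a different technique from the answers above, Lua patterns also have the %b() item, which matches a balanced pair of parentheses including nested ones. A minimal sketch built on it, with no external module, is shown below. The function name splitKeepingParens is made up, and the behaviour for edge cases is an assumption on my part: runs of whitespace are skipped rather than producing empty fields, a group glued to a word (as in foo(bar)) becomes its own field, and an unbalanced "(" is treated as the start of an ordinary word.

```lua
-- Hypothetical helper: split on whitespace, but keep balanced (...) groups,
-- including nested ones, as single fields.
local function splitKeepingParens(input)
  local words = {}
  local pos = 1
  while pos <= #input do
    -- skip any run of whitespace starting at pos
    local _, wsEnd = input:find("^%s+", pos)
    if wsEnd then
      pos = wsEnd + 1
    end
    if pos > #input then
      break
    end
    -- try a balanced group first, then a plain word without "("
    local first, last = input:find("^%b()", pos)
    if not first then
      first, last = input:find("^[^%s(]+", pos)
    end
    if not first then
      -- unbalanced "(": fall back to any run of non-space characters
      first, last = input:find("^%S+", pos)
    end
    words[#words + 1] = input:sub(first, last)
    pos = last + 1
  end
  return words
end

for i, w in ipairs(splitKeepingParens(" this is (a (nested) group)  end ")) do
  print(i, "'" .. w .. "'")
end
```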

Error in reading data into 3x3 matrix in Fortran [duplicate]

I would like to read and store numbers in scientific notation from a formatted txt file in which the numbers are separated by tabs.
This is what I have so far:
IMPLICIT NONE
REAL,ALLOCATABLE,DIMENSION(2) :: data(:,:)
INTEGER :: row,column
INTEGER :: j,i
CHARACTER(len=30) :: filename
CHARACTER(len=30) :: format
filename='data.txt'
open(86,file=filename,err=10)
write(*,*)'open data file'
read(86, *) row
read(86, *) column
allocate(data(row,column))
format='(ES14.7)'
do i=1,row
read(86,format) data(i,:)
enddo
close(86)
This is what the txt file looks like:
200
35
2.9900E-35 2.8000E-35 2.6300E-35 2.4600E-35 2.3100E-35 2.1600E-35 ...
The problem is that it doesn't read and store the correct values from the txt file into the data variable. Is the format causing the problem?
I would also like to know how to count the number of columns in this case. (I can count the rows by using read(86,*) in a loop.)
Yes, your format does not match the data you show. A better one would be something like read(99,'(6(E11.4,1X))') myData(i,:) (standard Fortran requires a count in front of X, hence 1X).
However, I am not sure you really need a format for reading at all.
The following example is pretty close to what you are trying to do, and it works both with and without a format.
program readdata
  implicit none
  real, allocatable :: myData(:,:)
  real :: myLine
  integer :: i, j, myRow, myColumn
  character(len=30) :: myFileName
  character(len=30) :: myFormat
  myFileName='data.dat'
  open(99, file=myFileName)
  write(*,*)'open data file'
  read(99, *) myRow
  read(99, *) myColumn
  allocate(myData(myRow,myColumn))
  do i=1,myRow
    read(99,*) myData(i,:)
    !read(99,'(6(E11.4,1X))') myData(i,:)
    print*, myData(i,:)
  enddo
  close(99)
end program readdata
To test, I assumed that the file always starts with the number of rows and columns, as in your example, so my test data was the following.
2
6
2.9900E-35 2.8000E-35 2.6300E-35 2.4600E-35 2.3100E-35 2.1600E-35
2.9900E-35 2.8000E-35 2.6300E-35 2.4600E-35 2.3100E-35 2.1600E-35
If you really want to read your files with a format and the number of columns is not constant, you may need a format that depends on a variable; please see related discussions here.
Although there is no direct command to count the number of items in a line, we can count the number of periods or exponent letters (E|e|D|d) by using the scan intrinsic. For example,
program main
    implicit none
    character(100) str
    integer n
    read( *, "(a)" ) str
    call countreal( str, n )
    print *, "number of items = ", n
contains
    subroutine countreal( str, num )
        implicit none
        character(*), intent(in) :: str
        integer, intent(out) :: num
        integer pos, offset
        num = 0
        pos = 0
        do
            offset = scan( str( pos + 1 : ), "." ) !! (1) search for periods
            !! offset = scan( str( pos + 1 : ), "EeDd" ) !! (2) search for (E|e|D|d)
            if ( offset > 0 ) then
                pos = pos + offset
                num = num + 1
                print *, "pos=", pos, "num=", num !! just for check
            else
                return
            endif
        enddo
    endsubroutine
end
Please note that pattern (1) works only when all items have periods, while pattern (2) works only when all items have exponents:
# When compiled with (1)
$ echo "2.9900 2.8000E-35 2.6300D-35 2.46 2.31" | ./a.out
pos= 2 num= 1
pos= 10 num= 2
pos= 22 num= 3
pos= 34 num= 4
pos= 40 num= 5
number of items = 5
# When compiled with (2)
$ echo "2.9900E-35 2.8000D-35 2.6300e-35 2.4600d-35" | ./a.out
pos= 7 num= 1
pos= 19 num= 2
pos= 31 num= 3
pos= 43 num= 4
number of items = 4
For more general purposes, it may be more convenient to write a custom "split()" function that separates items on whitespace (or to use an external library that provides a split function).

Importing csv file into J and using them as variable

I saved this data (20 vectors v) into a csv file like this:
v=:<"1 (? 20 2 $ 20)
makecsv v
v writecsv jpath'~temp/position.csv'
]vcsv =: freads jpath '~temp/position.csv'
fixcsv vcsv
and I could import the csv file with
readcsv jpath '~temp/position.csv'
However, it doesn't give the same result if I assign it to a name:
w=: readcsv jpath '~temp/position.csv'
diff=: ([{]) ,. ]
0 diff v
0 diff w
Actually, 0 diff w gives a length error.
Is there another approach I should use to get the same results from both v (original) and w (imported csv data)?
Thank you!
I'm a J beginner so you may get a better answer later, but poking at it I think I have found something.
First, the tables/csv addon docs state that readcsv "Reads csv file into a boxed array," emphasis mine, while writecsv "Writes an array to a csv file." In other words, readcsv and writecsv are not symmetric operations. And the shapes of the values seem to confirm that:
$ w
1 20
$ v
20
This is also why the diff works for v but not w. If you simply unbox the result, it seems to work better:
0 diff 0 { w
┌───┬─────┐
│1 3│1 3 │
├───┼─────┤
...
├───┼─────┤
│1 3│5 8 │
└───┴─────┘
However, the shapes are still not exactly the same:
$ > v
20 2
$ > 0 { w
20 5
I think this is because readcsv doesn't know that your values are numeric; you probably need to throw a ". in there somewhere to decode them.
When you write the CSV file, you just have a bunch of ASCII characters. In this case, you've got numbers, spaces, and commas.
When you read the CSV, J has no guarantees about the format or contents. fixcsv gets your commas and line-breaks translated into a grid of cells, but J boxes it all to be safe, because it's a bunch of variable-length ASCII strings.
If you want to get back to v, you have two things you need to do. The first is to get the dimensions right. CSV files, pretty much by definition, are two-dimensional. If you change your example to write a two-dimensional array to the CSV, you'll find that you have the same shape after fixcsv readcsv.
u =: 4 5 $ v
u writecsv jpath'~temp/position.csv'
104
] t =: fixcsv freads jpath '~temp/position.csv'
┌────┬─────┬────┬────┬─────┐
│9 11│1 4 │8 3 │3 12│5 4 │
├────┼─────┼────┼────┼─────┤
│7 11│10 11│9 10│0 8 │6 16 │
├────┼─────┼────┼────┼─────┤
│13 8│17 12│13 2│5 19│17 14│
├────┼─────┼────┼────┼─────┤
│2 15│19 10│3 1 │12 7│14 13│
└────┴─────┴────┴────┴─────┘
$ v
20
$ u
4 5
$ t
4 5
If you're definitely dealing with a one-dimensional list (albeit of boxed number pairs), then you can Ravel (,) what you read to get it down to one dimension.
$ w
1 20
$ , w
20
Once you have them in the same shape, you need to convert the ASCII text into number arrays. Do that with Numbers (".).
10 * > {. v
90 110
10 * > {. , w
|domain error
| 10 *>{.,w
'a' , > {. , w
a9 11
10 * _ ". > {. , w
90 110

Reconstructing string after parsing and modifying numbers from it in Lua

I have strings like the following (quotation marks are only showing that there may be leading and trailing whitespaces), and I need to extract the numbers from the string, which may be integer or float, negative or non-negative.
" M0 0.5 l 20 0 0 20.34 -20 0q10 0 10 10 t 10 10 54.333 10 h -50 z"
After extracting the numbers I have to multiply them with random numbers, which the following function produces.
-- returns a random float number between the specified boundaries (floats)
function random_in_interval(lower_boundary, upper_boundary)
    return ((math.random() * (upper_boundary - lower_boundary)) + lower_boundary)
end
Finally, I need to reconstruct the string with the characters and the multiplied numbers in the correct order. All of this has to happen in Lua, and I can't use any external libraries, since it will be used in a document compiled with LuaTeX.
The case of the characters must not be changed. Characters may or may not have spaces before and after them, but it would be nice if they did in the output. I have already written a helper function that adds whitespace before and after characters; however, when a character already has whitespace before or after it, this introduces multiple whitespace characters, which I cannot solve at the moment.
-- adds whitespace before and after characters
function pad_characters(str)
local padded_str = ""
if #str ~= 0 then
for i = 1, #str, 1 do
local char = string.sub(str, i, i)
if string.match(char, '%a') ~= nil then
padded_str = padded_str .. " " .. char .. " "
else
padded_str = padded_str .. char
end
end
end
-- remove leading and trailing whitespaces
if #padded_str ~= 0 then
padded_str = string.match(padded_str, "^%s*(.-)%s*$")
end
return padded_str
end
I have no idea how to parse the string, modify its numeric parts, and reconstruct it in the correct order in pure Lua without any external libraries.
Try this. Adapt as needed.
s=" M0 0.5 l 20 0 0 20.34 -20 0q10 0 10 10 t 10 10 54.333 10 h -50 z"
print(s:gsub("%S+",function (x)
    local y=tonumber(x)
    if y then
        return y*math.random()
    else
        return x
    end
end))
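One possible adaptation, as a sketch: the snippet above only touches tokens that are numbers on their own, so the digits in "M0" or "0q10" are left alone. Matching the numbers directly with a pattern reaches those too. The name map_numbers is made up, and the pattern is an assumption about the input: it expects at least one digit before an optional decimal point (so ".5" or "1e3" would not be handled), and "%g" formatting keeps only about six significant digits.

```lua
-- Hypothetical helper: apply fn to every number found in str and put the
-- result back in place, leaving all other characters untouched.
local function map_numbers(str, fn)
  -- optional minus sign, digits, optional fractional part
  return (str:gsub("%-?%d+%.?%d*", function(num)
    return string.format("%g", fn(tonumber(num)))
  end))
end

local s = " M0 0.5 l 20 0 0 20.34 -20 0q10 0 10 10 t 10 10 54.333 10 h -50 z"
-- e.g. multiply every number by a random factor between 0.9 and 1.1
print(map_numbers(s, function(x)
  return x * (0.9 + 0.2 * math.random())
end))
```

Passing the transformation in as a function keeps the parsing separate from the randomization, so something like random_in_interval from the question can be plugged in unchanged.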
I couldn't come up with anything better than processing each character, deciding whether it is part of a number (digit, decimal point, negative sign) or anything else, and acting accordingly.
-- returns a random float number between the specified boundaries (floats)
function random_in_interval(lower_boundary, upper_boundary)
    return ((math.random() * (upper_boundary - lower_boundary)) + lower_boundary)
end
-- note: scaling is applied before randomization
function randomize_and_scale(str, scale_factor, lower_boundary, upper_boundary)
    local previous_was_number = false
    local processed_str = ""
    local number = ""
    for i = 1, #str, 1 do
        local char = string.sub(str, i, i)
        if previous_was_number then
            if string.match(char, '%d') ~= nil or
               char == "." then
                number = number .. char
            else -- scale and randomize
                number = number * scale_factor
                number = number * random_in_interval(lower_boundary, upper_boundary)
                processed_str = processed_str .. number .. char
                number = ""
                previous_was_number = false
            end
        else
            if string.match(char, '%d') ~= nil or
               char == "-" then
                number = number .. char
                previous_was_number = true
            else
                processed_str = processed_str .. char
                -- apply stuff
                previous_was_number = false
            end
        end
    end
    -- a number at the very end of the string is only emitted by the loop when
    -- a following character is seen, so flush it here
    if previous_was_number then
        number = number * scale_factor
        number = number * random_in_interval(lower_boundary, upper_boundary)
        processed_str = processed_str .. number
    end
    return processed_str
end
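For the padding problem mentioned in the question (letters that already have a space next to them end up with two), one simple approach is to pad every letter unconditionally, then collapse runs of whitespace, then trim. A sketch, with pad_letters as a made-up name:

```lua
-- Hypothetical helper: make sure every letter is surrounded by exactly one
-- space, without doubling spaces that were already there.
local function pad_letters(str)
  local padded = str:gsub("%a", " %0 ")  -- %0 is the matched letter
  padded = padded:gsub("%s+", " ")       -- collapse runs of whitespace
  -- trim leading and trailing whitespace; outer parentheses drop gsub's count
  return (padded:gsub("^%s+", ""):gsub("%s+$", ""))
end

print(pad_letters(" M0 0.5 l 20 0q10 z"))
```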
