Reconstructing string after parsing and modifying numbers from it in Lua - string

I have strings like the following (the quotation marks only show that there may be leading and trailing whitespace), and I need to extract the numbers from them. The numbers may be integers or floats, negative or non-negative.
" M0 0.5 l 20 0 0 20.34 -20 0q10 0 10 10 t 10 10 54.333 10 h -50 z"
After extracting the numbers I have to multiply them by random numbers, which the following function produces.
-- returns a random float number between the specified boundaries (floats)
function random_in_interval(lower_boundary, upper_boundary)
return ((math.random() * (upper_boundary - lower_boundary)) + lower_boundary)
end
Finally, the string has to be reconstructed with the characters and the multiplied numbers in the correct order. All of this has to happen in Lua without any external libraries, since it will be used in a document compiled with LuaTeX.
The case of the characters must not be changed. Characters may or may not have spaces before and after them, but it would be nice if they did in the output. I have already written a helper function that adds whitespace before and after characters; however, when a character already has whitespace before or after it, this introduces multiple consecutive spaces, which I cannot solve at the moment.
-- adds whitespace before and after characters
function pad_characters(str)
local padded_str = ""
if #str ~= 0 then
for i = 1, #str, 1 do
local char = string.sub(str, i, i)
if string.match(char, '%a') ~= nil then
padded_str = padded_str .. " " .. char .. " "
else
padded_str = padded_str .. char
end
end
end
-- remove leading and trailing whitespaces
if #padded_str ~= 0 then
padded_str = string.match(padded_str, "^%s*(.-)%s*$")
end
return padded_str
end
I have no idea how to parse the string, modify its numeric parts, and reconstruct it in the correct order, all in pure Lua without any external libraries.

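Try this. It only converts whitespace-delimited tokens, so a number fused to a letter (such as M0 or 0q10) is left unchanged. Adapt as needed.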
s=" M0 0.5 l 20 0 0 20.34 -20 0q10 0 10 10 t 10 10 54.333 10 h -50 z"
print(s:gsub("%S+",function (x)
local y=tonumber(x)
if y then
return y*math.random()
else
return x
end
end))
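A pattern that matches the numbers themselves, rather than whole whitespace-delimited tokens, also covers fused tokens such as M0 and 0q10. The following is a minimal sketch of that idea in Python, not in Lua; the names NUMBER, scale_numbers, and factor_for are hypothetical, and the number syntax (optional minus sign, digits, optional fraction) is an assumption based on the sample string. An analogous Lua pattern for gsub would presumably look like "%-?%d+%.?%d*".

```python
import random
import re

# Assumed number syntax: optional minus sign, then digits with an optional
# fractional part, or a bare fraction such as ".5".
NUMBER = re.compile(r"-?(?:\d+\.?\d*|\.\d+)")


def scale_numbers(path, factor_for):
    """Multiply every number in `path` by the value returned by factor_for()."""
    def replace(match):
        # "g" keeps at most six significant digits and drops trailing zeros.
        return format(float(match.group()) * factor_for(), "g")

    # Everything that is not a number (letters, whitespace) is left untouched.
    return NUMBER.sub(replace, path)


if __name__ == "__main__":
    sample = " M0 0.5 l 20 0 0 20.34 -20 0q10 0 10 10 t 10 10 54.333 10 h -50 z"
    print(scale_numbers(sample, lambda: random.uniform(0.9, 1.1)))
```

Passing the factor as a callable keeps the random source swappable, so the same function can be checked with a fixed factor.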

I couldn't come up with anything better than processing each character, deciding whether it is part of a number (digit, decimal point, negative sign) or anything else, and acting accordingly.
-- returns a random float number between the specified boundaries (floats)
function random_in_interval(lower_boundary, upper_boundary)
return ((math.random() * (upper_boundary - lower_boundary)) + lower_boundary)
end
-- note: scaling is applied before randomization
function randomize_and_scale(str, scale_factor, lower_boundary, upper_boundary)
local previous_was_number = false
local processed_str = ""
local number = ""
for i = 1, #str, 1 do
local char = string.sub(str, i, i)
if previous_was_number then
if string.match(char, '%d') ~= nil or
char == "." then
number = number .. char
else -- scale and randomize
number = number * scale_factor
number = number * random_in_interval(lower_boundary, upper_boundary)
processed_str = processed_str .. number .. char
number = ""
previous_was_number = false
end
else
if string.match(char, '%d') ~= nil or
char == "-" then
number = number .. char
previous_was_number = true
else
processed_str = processed_str .. char
-- apply stuff
previous_was_number = false
end
end
end
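-- flush a number that ends the string (the loop only emits a number once a later character follows it)
if previous_was_number then
local value = tonumber(number)
if value then
number = value * scale_factor * random_in_interval(lower_boundary, upper_boundary)
end
processed_str = processed_str .. number
end
return processed_str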
end

Related

Removing a specific digit from a number that was specified by the user

I tried to write a Python program that removes a specific digit from a number. For example, with a = 12025 and k = 2 the result is 105. None of the guides I found helped me do that; can anybody help me with this?
Conversion to string does not seem elegant.
As pseudo-code:
number without digit (number, digit)
    if number == digit
        0
    else if number < 10
        number
    else if number % 10 == digit
        number without digit (number / 10, digit)
    else
        number without digit (number / 10, digit) * 10 + (number % 10)
Where / is integer division, truncating the remainder, and % is the modulo, remainder.
So it is a matter of recursion.
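The pseudo-code above translates almost line by line into Python. This is a sketch that assumes a non-negative integer and a single digit from 0 to 9; the function name without_digit is made up for illustration.

```python
def without_digit(number, digit):
    """Return `number` with every occurrence of `digit` removed (0 if nothing is left)."""
    if number == digit:
        return 0
    if number < 10:
        return number
    if number % 10 == digit:
        # Drop the last digit and keep processing the rest.
        return without_digit(number // 10, digit)
    # Keep the last digit and append it to the processed rest.
    return without_digit(number // 10, digit) * 10 + number % 10


if __name__ == "__main__":
    print(without_digit(12025, 2))
```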
You have to convert to str, remove the occurrences, and convert back to int:
int(str(a).replace(str(k),''))
a = 12025
k = 2
print(int(str(a).replace(str(k), '')))
If you want to use math rather than converting to and from string you can do
a = 12025
k = 2
result = 0
exp = 0
while a:
    a, remainder = divmod(a, 10)
    if remainder != k:
        # keep this digit at the next free decimal position
        result = result + 10**exp * remainder
        exp += 1
print(result)  # prints 105

How would I undo the actions of string.gmatch for a certain section of string in lua

I am using Lua and splitting a string by spaces to write a sort of sub-language. I am trying to have it not split anything inside parentheses, and I am already at the stage where I can detect whether there are parentheses. But I want to undo the gmatch splitting for the string inside the parentheses, as I want to preserve it as is.
local function split(strng)
local __s={}
local all_included={}
local flag_table={}
local uncompiled={}
local flagged=false
local flagnum=0
local c=0
for i in string.gmatch(strng,'%S+') do
c=c+1
table.insert(all_included,i)
if(flagged==false)then
if(string.find(i,'%('or'%['or'%{'))then
flagged=true
flag_table[tostring(c)]=1
table.insert(uncompiled,i)
print'flagged'
else
table.insert(__s,i)
end
elseif(flagged==true)then
table.insert(uncompiled,i)
if(string.find(i,'%)' or '%]' or '%}'))then
flagged=false
local __=''
for i=1,#uncompiled do
__=__ .. uncompiled[i]
end
table.insert(__s,__)
print'unflagged'
end
end
end
return __s;
end
This is my splitting code
I would just not use gmatch for this at all.
local input = " this is a string (containg some (well, many) annoying) parentheses and should be split. The string contains double spaces. What should be done? And what about trailing spaces? "
local pos = 1
local words = {}
local last_start = pos
while pos <= #input do
local char = string.byte(input, pos)
if char == string.byte(" ") then
table.insert(words, string.sub(input, last_start, pos - 1))
last_start = pos + 1
elseif char == string.byte("(") then
local depth = 1
while depth ~= 0 and pos + 1 < #input do
local char = string.byte(input, pos + 1)
if char == string.byte(")") then
depth = depth - 1
elseif char == string.byte("(") then
depth = depth + 1
end
pos = pos + 1
end
end
pos = pos + 1
end
table.insert(words, string.sub(input, last_start))
for k, v in pairs(words) do
print(k, "'" .. v .. "'")
end
Output:
1 ''
2 'this'
3 'is'
4 'a'
5 'string'
6 '(containg some (well, many) annoying)'
7 'parentheses'
8 'and'
9 'should'
10 'be'
11 'split.'
12 'The'
13 'string'
14 'contains'
15 ''
16 'double'
17 ''
18 ''
19 'spaces.'
20 'What'
21 'should'
22 'be'
23 'done?'
24 'And'
25 'what'
26 'about'
27 'trailing'
28 'spaces?'
29 ''
Thinking about trailing spaces and other such problems is left as an exercise for the reader. I tried to highlight some of the possible problems with the example that I used. Also, I only looked at one kind of parenthesis since I do not want to think about how this (string} should be ]parsed.
Oh, and if nested parentheses are not a concern: most of the code above can be replaced with a call to string.find(input, ")", pos, true) to find the closing parenthesis.
Please note that you cannot combine patterns with or or and, as attempted in your code.
"%(" or "%[" equals "%("
Lua evaluates that expression left to right. Since "%(" is a true value, Lua reduces the expression to "%(", which is logically the same as the full expression.
So string.find(i,'%('or'%['or'%{') will only find ('s in i.
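Python's or short-circuits on strings in the same way, which makes the pitfall easy to show, together with the usual remedy of a single character class. This is only a sketch; has_opening_bracket is a hypothetical helper, and the corresponding Lua character class would presumably be "[%(%[{]".

```python
import re

# `or` returns its first truthy operand, so only "(" would ever be searched for.
first_operand = "(" or "[" or "{"


def has_opening_bracket(token):
    """Return True if the token contains any of (, [ or {."""
    # One character class covers all three brackets in a single search.
    return re.search(r"[(\[{]", token) is not None


if __name__ == "__main__":
    print(first_operand)
    print(has_opening_bracket("[well,"), has_opening_bracket("string"))
```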
As a similar but slightly different approach to Uli's answer, I would first split by parentheses. Then you can split the odd-numbered fields on whitespace:
split = require("split") -- https://luarocks.org/modules/telemachus/split
split__by_parentheses = function(input)
local fields = {}
local level = 0
local field = ""
for i = 1, #input do
local char = input:sub(i, i)
if char == "(" then
if level == 0 then
-- add non-parenthesized field to list
fields[#fields+1] = field
field = ""
end
level = level + 1
end
field = field .. char
if char == ")" then
level = level - 1
assert(level >= 0, 'Mismatched parentheses')
if level == 0 then
-- add parenthesized field to list
fields[#fields+1] = field
field = ""
end
end
end
assert(level == 0, 'Mismatched parentheses')
fields[#fields+1] = field
return fields
end
input = " this is a string (containg some (well, many) annoying) parentheses and should be split. The string contains double spaces. What should be done? And what about trailing spaces? "
fields = split__by_parentheses(input)
for i, field in ipairs(fields) do
print(("%d\t'%s'"):format(i, field))
if i % 2 == 1 then
for j, word in ipairs(split.split(field)) do
print(("\t%d\t%s"):format(j, word))
end
end
end
outputs
1 ' this is a string '
1
2 this
3 is
4 a
5 string
6
2 '(containg some (well, many) annoying)'
3 ' parentheses and should be split. The string contains double spaces. What should be done? And what about trailing spaces? '
1
2 parentheses
3 and
4 should
5 be
6 split.
7 The
8 string
9 contains
10 double
11 spaces.
12 What
13 should
14 be
15 done?
16 And
17 what
18 about
19 trailing
20 spaces?
21

Reading a file of lists of integers in Fortran

I would like to read a data file with a Fortran program, where each line is a list of integers.
Each line has a variable number of integers, separated by a given character (space, comma...).
Sample input:
1,7,3,2
2,8
12,44,13,11
I have a solution to split lines, which I find rather convoluted:
module split
implicit none
contains
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
integer :: i, j, k, n, m, p, r
character(*) :: str
character :: sep, c
character(:), allocatable :: tmp
!First pass: find number of items (m), and maximum length of an item (r)
n = len_trim(str)
m = 1
j = 0
r = 0
do i = 1, n
if(str(i:i) == sep) then
m = m + 1
r = max(r, j)
j = 0
else
j = j + 1
end if
end do
r = max(r, j)
allocate(a(m))
allocate(character(r) :: tmp)
!Second pass: copy each item into temporary string (tmp),
!read an integer from tmp, and write this integer in the output array (a)
tmp(1:r) = " "
j = 0
k = 0
do i = 1, n
c = str(i:i)
if(c == sep) then
k = k + 1
read(tmp, *) p
a(k) = p
tmp(1:r) = " "
j = 0
else
j = j + 1
tmp(j:j) = c
end if
end do
k = k + 1
read(tmp, *) p
a(k) = p
deallocate(tmp)
end function
end module
My question:
Is there a simpler way to do this in Fortran? I mean, reading a list of values where the number of values to read is unknown. The above code looks awkward, and file I/O does not look easy in Fortran.
Also, the main program has to read lines with unknown and unbounded length. I am able to read lines if I assume they are all the same length (see below), but I don't know how to read unbounded lines. I suppose it would need the stream features of Fortran 2003, but I don't know how to write this.
Here is the current program:
program read_data
use split
implicit none
integer :: q
integer, allocatable :: a(:)
character(80) :: line
open(unit=10, file="input.txt", action="read", status="old", form="formatted")
do
read(10, "(A80)", iostat=q) line
if(q /= 0) exit
if(line(1:1) /= "#") then
a = string_to_integers(line, ",")
print *, ubound(a), a
end if
end do
close(10)
end program
A comment about the question: usually I would do this in Python, for example converting a line would be as simple as a = [int(x) for x in line.split(",")], and reading a file is likewise almost a trivial task. And I would do the "real" computing stuff with a Fortran DLL. However, I'd like to improve my Fortran skills on file I/O.
I don't claim it is the shortest possible, but it is much shorter than yours. And once you have it, you can reuse it. I don't completely agree with the claims that Fortran is bad at string processing. I do tokenization, recursive descent parsing and similar stuff just fine in Fortran, although it is easier in some other languages with richer libraries. Sometimes you can use libraries written in other languages (especially C and C++) from Fortran too.
If you always use the comma you can remove the replacing by comma and thus shorten it even more.
function string_to_integers(str, sep) result(a)
integer, allocatable :: a(:)
character(*) :: str
character :: sep
integer :: i, n_sep
n_sep = 0
do i = 1, len_trim(str)
if (str(i:i)==sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
allocate(a(n_sep+1))
read(str,*) a
end function
Potential for shortening: view the str as a character array using equivalence or transfer and use count() inside of allocate to get the size of a.
The code assumes that there is just one separator between each number and there is no separator before the first one. If multiple separators are allowed between two numbers, you have to check whether the preceding character is a separator or not
do i = 2, len_trim(str)
if (str(i:i)==sep .and. str(i-1:i-1)/=sep) then
n_sep = n_sep + 1
str(i:i) = ','
end if
end do
My answer is probably too simplistic for your goals but I have spent a lot of time recently reading in strange text files of numbers. My biggest problem is finding where they start (not hard in your case) then my best friend is the list-directed read.
read(unit=10,fmt=*) a
will read in all of the data into vector 'a', done deal. With this method you will not know which line any piece of data came from. If you want to allocate it then you can read the file once and figure out some algorithm to make the array larger than it needs to be, like maybe count the number of lines and you know a max data amount per line (say 21).
line_counter = 0
do
! a list-directed read with no items just skips one record
read(unit=10, fmt=*, iostat=status)
if (status /= 0) exit
line_counter = line_counter + 1
end do
rewind(10)
allocate(a(line_counter*21))
If you then want to eliminate the unused (zero) entries, you can either remove the zeros, or pre-seed the 'a' vector with a value you don't expect in the data (a negative number, say) and remove all of those afterwards.
Another approach stemming from the other suggestion is to first count the commas then do a read where the loop is controlled by
do j = 1, line_counter ! You determined this on your first read
read(unit=11,fmt=*) a(j,:) ! a is now a 2 dimensional array (line_counter, maxNumberPerLine)
! You have a separate vector numberOfCommas(j) from before
end do
And now you can do whatever you want with these two arrays because you know all the data, which line it came from, and how many data were on each line.

Evaluate equation but ignore order of mathematical operations and parentheses

I'm trying to write a function that ignores the order of mathematical operations and parentheses. The function just evaluates operators from left to right. (for +-*/^)
Example 1: 5 - 3 * 8^2 returns 256.
Example 2: 4 / 2 - 1^2 + (5*3) returns 18.
Here's what I did:
function out = calc(num)
[curNum, num] = strtok(num, '+-*/^');
out = str2num(curNum);
while ~isempty(num)
sign = num(1);
[curNum, num] = strtok(num, '+-*/^');
switch sign
case '+'
out = out + str2num(curNum);
case'-'
out = out - str2num(curNum);
case '*'
out = out.*str2num(curNum);
case '/'
out = out./str2num(curNum);
case '^'
out = out.^str2num(curNum);
end
end
end
My function doesn't ignore the left to right rule. How do I correct for this?
Your first example fails because you are splitting the string with the +-*/ delimiters, and you omitted the ^. You should change this to +-*/^ in lines 2 and 6.
Your second example fails because you aren't telling your program how to ignore the ( and ) characters. You should strip them before you enter the switch statement.
curNum = strrep(curNum,'(','')
curNum = strrep(curNum,')','')
switch sign
...
This is a way without any switch statements.
str = '4 / 2 - 1^2 + (5*3)'
%// get rid of spaces and brackets
str(regexp(str,'[ ()]')) = []
%// get numbers
[numbers, operators] = regexp(str, '\d+', 'match','split')
%// get number of numbers
n = numel(numbers);
%// reorder string with numbers closing brackets and operators
newStr = [numbers; repmat({')'},1,n); operators(2:end)];
%// add opening brackets at the beginning
newStr = [repmat('(',1,n) newStr{:}]
%// evaluate
result = eval(newStr)
str =
4/2-1^2+5*3
newStr =
((((((4)/2)-1)^2)+5)*3)
result =
18
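The same left-to-right evaluation can also be written as a plain fold over the tokens, without eval. This is a sketch in Python rather than MATLAB; the name calc_left_to_right and the token pattern (unsigned integers or decimals, and the operators + - * / ^) are assumptions for illustration.

```python
import operator
import re

# Map each operator character to the function that applies it.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def calc_left_to_right(expr):
    """Evaluate expr strictly left to right, ignoring precedence and parentheses."""
    cleaned = re.sub(r"[\s()]", "", expr)  # drop spaces and brackets
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/^]", cleaned)
    result = float(tokens[0])
    # Tokens alternate number, operator, number, ...; apply each pair in turn.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, float(operand))
    return result


if __name__ == "__main__":
    print(calc_left_to_right("5 - 3 * 8^2"))
    print(calc_left_to_right("4 / 2 - 1^2 + (5*3)"))
```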

How to find all combinations of a multiset in a string in linear time?

I am given a bag B (multiset) of characters of size m and a text string S of size n. Is it possible to find all substrings of S that can be formed from B (at most 4! = 24 orderings in the example below; 12 distinct ones, since c occurs twice) in linear time O(n)?
Example:
S = abdcdbcdadcdcbbcadc (n=19)
B = {b, c, c, d} (m=4)
Result: {cdbc (Position 3), cdcb (Position 10)}
The fastest solution I found is to keep a counter for each character and compare it with the Bag in each step, thus the runtime is O(n*m). Algorithm can be shown if needed.
There is a way to do it in O(n), assuming we're only interested in substrings of length m (otherwise it's impossible, because for the bag that has all characters in the string, you'd have to return all substrings of s, which means a O(n^2) result that can't be computed in O(n)).
The algorithm is as follows:
Convert the bag to a histogram:
hist = []
for c in B do:
hist[c] = hist[c] + 1
Initialize a running histogram that we're going to modify (histrunsum is the total count of characters in histrun):
histrun = []
histrunsum = 0
We need two operations: add a character to the histogram and remove it. They operate as follows:
add(c):
    if hist[c] > 0 and histrun[c] < hist[c] then:
        histrun[c] = histrun[c] + 1
        histrunsum = histrunsum + 1
remove(c):
    if histrun[c] > 0 then:
        histrun[c] = histrun[c] - 1
        histrunsum = histrunsum - 1
Essentially, histrun captures the amount of characters that are present in B in current substring. If histrun is equal to hist, our substring has the same characters as B. histrun is equal to hist iff histrunsum is equal to length of B.
Now add first m characters to histrun; if histrunsum is equal to length of B; emit first substring; now, until we reach the end of string, remove the first character of the current substring and add the next character.
add, remove are O(1) since hist and histrun are arrays; checking if hist is equal to histrun is done by comparing histrunsum to length(B), so it's also O(1). Loop iteration count is O(n), the resulting running time is O(n).
Thanks for the answer. The add() and remove() methods have to be changed to make the algorithm work correctly.
add(c):
    if hist[c] > 0 and histrun[c] < hist[c] then
        histrunsum++
    else
        histrunsum--
    histrun[c] = histrun[c] + 1
remove(c):
    if histrun[c] > hist[c] then
        histrunsum++
    else
        histrunsum--
    histrun[c] = histrun[c] - 1
Explanation:
histrunsum can be seen as a score of how identical both multisets are.
add(c): when there are fewer occurrences of a char in the histrun multiset than in the hist multiset, the additional occurrence of that char has to be "rewarded", since the histrun multiset is getting closer to the hist multiset. If histrun already contains at least as many occurrences of that char as hist, an additional occurrence is penalized.
remove(c): like add(c), except that the removal of a char is weighted positively when its count in the histrun multiset is greater than its count in the hist multiset.
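The scoring scheme described above can be sketched compactly in Python. The name multiset_substrings is hypothetical, positions are 0-based as in the question's example, and a non-empty bag is assumed.

```python
from collections import Counter


def multiset_substrings(text, bag):
    """Return the 0-based start positions of substrings that use exactly the bag."""
    target = Counter(bag)
    m = sum(target.values())
    window = Counter()  # character counts of the current window
    score = 0           # matched characters minus surplus characters
    positions = []
    for i, ch in enumerate(text):
        if i >= m:
            # Slide the window: remove the character that falls out on the left.
            old = text[i - m]
            score += 1 if window[old] > target[old] else -1
            window[old] -= 1
        # Add the new character on the right.
        score += 1 if window[ch] < target[ch] else -1
        window[ch] += 1
        if score == m:
            positions.append(i + 1 - m)
    return positions


if __name__ == "__main__":
    print(multiset_substrings("abdcdbcdadcdcbbcadc", "bccd"))
```

The score reaches m only when all m characters in the window are matched and none is surplus, so each step needs one comparison instead of a full histogram comparison.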
Sample Code (PHP):
function multisetSubstrings($sequence, $mset)
{
$multiSet = array();
$substringLength = 0;
foreach ($mset as $char)
{
$multiSet[$char]++;
$substringLength++;
}
$sum = 0;
$currentSet = array();
$result = array();
for ($i=0;$i<strlen($sequence);$i++)
{
if ($i>=$substringLength)
{
$c = $sequence[$i-$substringLength];
if ($currentSet[$c] > $multiSet[$c])
$sum++;
else
$sum--;
$currentSet[$c]--;
}
$c = $sequence[$i];
if ($currentSet[$c] < $multiSet[$c])
$sum++;
else
$sum--;
$currentSet[$c]++;
echo $sum."<br>";
if ($sum==$substringLength)
$result[] = $i+1-$substringLength;
}
return $result;
}
Use hashing. For each character in the multiset, assign a UNIQUE prime number. Compute the hash for any string by multiplying together the primes associated with its characters, each as many times as that character occurs.
Example : CATTA. Let C = 2, A=3, T = 5. Hash = 2*3*5*5*3 = 450
Hash the multiset (treat it as a string). Now go through the input string and compute the hash of each substring of length k (where k is the number of characters in the multiset). Check if this hash matches the multiset hash. If yes, then it is one such occurrence.
The hashes can be computed very easily in linear time as follows :
Let multiset = { A, A, B, C }, A=2, B=3, C=5.
Multiset hash = 2*2*3*5 = 60
Let text = CABBAACCA
(i) CABB = 5*2*3*3 = 90
(ii) Now, the next letter is A, and the letter discarded is the first one, C. So the new hash = ( 90/5 )*2 = 36
(iii) Now, A is discarded, and A is also added, so new hash = ( 36/2 ) * 2= 36
(iv) Now B is discarded, and C is added, so hash = ( 36/3 ) * 5 = 60 = multiset hash. Thus we have found one such required occurrence - BAAC
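This procedure takes O( n ) time, provided each multiplication and division counts as O( 1 ); the products grow with the window length, so that only holds while the hash fits in a machine word.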
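A rolling version of this prime-product hash might look as follows in Python. This is a sketch under stated assumptions: the names first_primes and prime_hash_matches are hypothetical, every character of the text (not only of the multiset) gets its own prime so that foreign characters cannot collide, and Python's arbitrary-precision integers stand in for the hash.

```python
def first_primes(count):
    """Return the first `count` primes by trial division (fine for small alphabets)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_hash_matches(text, bag):
    """Return 0-based start positions of windows whose prime product equals the bag's."""
    alphabet = sorted(set(text) | set(bag))
    prime_of = dict(zip(alphabet, first_primes(len(alphabet))))
    m = len(bag)
    positions = []
    if m == 0 or m > len(text):
        return positions
    target = 1
    for ch in bag:
        target *= prime_of[ch]
    current = 1
    for ch in text[:m]:
        current *= prime_of[ch]
    if current == target:
        positions.append(0)
    for i in range(m, len(text)):
        # Divide out the character leaving the window, multiply in the new one.
        current = current // prime_of[text[i - m]] * prime_of[text[i]]
        if current == target:
            positions.append(i - m + 1)
    return positions


if __name__ == "__main__":
    print(prime_hash_matches("CABBAACCA", "AABC"))
```

Because prime factorizations are unique, equal products imply equal multisets, so no extra verification of a match is needed.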
