python pyparsing scanString - wrong start/end location

I'm trying to find the start and end locations of a particular kind of token in a text with the scanString function.
text = """
P: INT;
timer2.et == 3423
Q : INT ;
TIME1: TIME;
TIME2: TIME;
TIMER_Q3 : BOOL;
WECHSEL : BOOL;
m : BOOL;
timer.q = 4
"""
From this text I want to find the locations of the XXX.et and XXX.q tokens:
import pyparsing as pp
TK_TIMER_Q_ET = pp.Word(pp.alphanums + "_") + (pp.Literal(".q") | pp.Literal(".et"))
t_end = []
t_match = []
t_start = []
for match, start, end in TK_TIMER_Q_ET.scanString(text):
    t_match.append(match)
    t_start.append(start)
    t_end.append(end)
i = len(t_match) - 1
k = 0
while k <= i:
    print("t_end=", t_end[k])
    print("t_start=", t_start[k])
    print("t_match=", t_match[k])
    print("match=", text[t_start[k]:t_end[k]])
    k += 1
As output I expect "timer2.et" and "timer.q" when I print "match=...", but I get:
t_end= 35
t_start= 26
t_match= ['timer2', '.et']
match= 423
Q
t_end= 189
t_start= 182
t_match= ['timer', '.q']
match=
Would be awesome if somebody could help me with that issue!

What you are missing is the grouping of the characters to make one identifier. Try changing the code the following way:
TK_TIMER_Q_ET = pp.Group(pp.Word(pp.alphanums + "_") + (pp.Literal(".q") | pp.Literal(".et")))
Works for me:
('t_end=', 27)
('t_start=', 18)
('t_match=', ([(['timer2', '.et'], {})], {}))
('match=', 'timer2.et')
('t_end=', 153)
('t_start=', 146)
('t_match=', ([(['timer', '.q'], {})], {}))
('match=', 'timer.q')
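Another possible cause, assuming the original text was indented with tab characters (the post does not show this): by default scanString expands tabs to spaces before scanning, so the reported start/end offsets refer to the expanded string, not to the original one. The sketch below reproduces the effect with the standard library only; the sample text and the regular expression are stand-ins for illustration, not the asker's real input. With pyparsing itself, either slice text.expandtabs() instead of text, or call parseWithTabs() on the expression before scanning.

```python
import re

# Hypothetical input, indented with tab characters (an assumption about the original text).
text = "\tP: INT;\n\ttimer2.et == 3423\n\ttimer.q = 4\n"

# By default scanString scans the tab-expanded string, so its offsets refer to this:
expanded = text.expandtabs()

# Stand-in for the pyparsing expression: an identifier followed by .et or .q
token = re.compile(r"\w+\.(?:et|q)\b")

spans = [m.span() for m in token.finditer(expanded)]
expanded_slices = [expanded[s:e] for s, e in spans]  # offsets applied to the right string
original_slices = [text[s:e] for s, e in spans]      # offsets applied to the wrong string

print(expanded_slices)
print(original_slices)
```

The first list contains the expected tokens; the second contains shifted fragments, much like the "match=" lines in the question.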

Related

MATLAB string 2 number of table

I have 3 years of data stored as strings in tableformat.txt. Three of its lines are given below:
12-13 Jan -10.5
14-15 Jan -9.992
15-16 Jan -8
How can I convert the 3rd column (the strings -10.5, -9.992 and -8) to numbers printed as -10.500, -9.992 and -8.000?
I have made the following script:
clear all; clc;
filename='tableformat.txt';
fid = fopen(filename);
N = 3;
for i = [1:N]
    line = fgetl(fid)
    a = line(10:12);
    na = str2num(a);
    ma(i) = na;
end
ma
which gives:
ma = -1 -9 -8
When I did this change: a = line(10:15);, I got:
Error message: Index exceeds matrix dimensions.

This will work for you.
clear all;
clc;
filename='tableformat.txt';
filename2='tableformat2.txt';
fid = fopen(filename);
fid2 = fopen(filename2,'w');
formatSpec = '%s %s %.3f\n'; % three decimals, e.g. -10.500
N = 3;
for row = [1:N]
    line = fgetl(fid);
    a = strsplit(line,' ');
    a{3}=cellfun(@str2num,a(3)); % convert the third field to a number
    fprintf(fid2, formatSpec,a{1,:});
end
fclose(fid);
fclose(fid2);
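For comparison, here is a minimal Python sketch of the same idea: split each line on whitespace, convert the third field to a number, and print it with three decimals. The function name reformat_line is made up for this illustration, and the lines are given inline instead of being read from tableformat.txt.

```python
def reformat_line(line: str) -> str:
    """Return the line with its third field printed with three decimals."""
    day_range, month, value = line.split()
    return f"{day_range} {month} {float(value):.3f}"


lines = ["12-13 Jan -10.5", "14-15 Jan -9.992", "15-16 Jan -8"]
reformatted = [reformat_line(line) for line in lines]

for row in reformatted:
    print(row)
```

Splitting on whitespace avoids the fixed column positions (10:12, 10:15) that caused the truncated values and the index error in the question.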

Getting all strings in a lua script

I'm trying to encode the strings in my Lua script. Since the script has over 200k characters, I want to automatically replace each string literal in it, such as the examples below,
local string = "stackoverflow"
local string = [[stackoverflow]]
local string = [==[stackoverflow]==]
local string = 'stackoverflow'
to
local string=decode("jkrtbfmviwcfn",519211)
I'm trying to pass all of the string forms above through a gsub that encodes the string text with a random offset number.
So far, I have only been able to match strings in double quotation marks.
function encode(x,offset,a)
  for char in string.gmatch(x, "%a") do
    local encrypted = string.byte(char) + offset
    while encrypted > 122 do
      encrypted = encrypted - 26
    end
    while encrypted < 97 do
      encrypted = encrypted + 26
    end
    a[#a+1] = string.char(encrypted)
  end
  return table.concat(a)
end
luacode=[==[thatstring.Value="Encryptme!" testvalue.Value=[[string with
a linebreak]] string.Text="STOP!"]==]
luacode=luacode:gsub([=["(.-)"]=],function(s)
  print("Caught "..s)
  local offset=math.random(1,4)
  local encoded=encode(s,offset,{})
  return [[decode("]]..encoded..[[",]]..offset..[[)]]
end)
print("\n"..luacode)
With its output being
Caught Encryptme!
Caught STOP!
thatstring.Value=decode("crgvctxqi",4) testvalue.Value=[[string with
a linebreak]] string.Text=decode("opkl",2)
Any better solutions?

local function strings_and_comments(lua_code, callback)
  -- lua_code must be valid Lua code (an error may be raised on syntax error)
  -- callback will be invoked as callback(object_type, value, start_pos, end_pos)
  -- callback("comment", comment_text, start_pos, end_pos) -- for comments
  -- callback("string", string_value, start_pos, end_pos) -- for string literals
  local objects = {} -- possible comments and string literals in the code
  -- search for all start positions of comments (with false positives)
  for pos, br1, eq, br2 in lua_code:gmatch"()%-%-(%-*%[?)(=*)(%[?)" do
    table.insert(objects, {start_pos = pos,
      terminator = br1 == "[" and br2 == "[" and "]"..eq.."]" or "\n"})
  end
  -- search for all start positions of string literals (with false positives)
  for pos, eq in lua_code:gmatch"()%[(=*)%[[%[=]*" do
    table.insert(objects, {is_string = true, start_pos = pos,
      terminator = "]"..eq.."]"})
  end
  for pos, quote in lua_code:gmatch"()(['\"])" do
    table.insert(objects, {is_string = true, start_pos = pos, quote = quote})
  end
  table.sort(objects, function(a, b) return a.start_pos < b.start_pos end)
  local end_pos = 0
  for _, object in ipairs(objects) do
    local start_pos, ok, symbol = object.start_pos
    if start_pos > end_pos then
      if object.terminator == "\n" then
        end_pos = lua_code:find("\n", start_pos + 1, true) or #lua_code
        -- exclude last spaces and newline
        while lua_code:sub(end_pos, end_pos):match"%s" do
          end_pos = end_pos - 1
        end
      elseif object.terminator then
        ok, end_pos = lua_code:find(object.terminator, start_pos + 1, true)
        assert(ok, "Not a valid Lua code")
      else
        end_pos = start_pos
        repeat
          ok, end_pos, symbol = lua_code:find("(\\?.)", end_pos + 1)
          assert(ok, "Not a valid Lua code")
        until symbol == object.quote
      end
      local value = lua_code:sub(start_pos, end_pos):gsub("^%-*%s*", "")
      if object.terminator ~= "\n" then
        value = assert((loadstring or load)("return "..value))()
      end
      callback(object.is_string and "string" or "comment", value, start_pos, end_pos)
    end
  end
end
local inv256
local function encode(str)
  local seed = math.random(0x7FFFFFFF)
  local result = '",'..seed..'))'
  if not inv256 then
    inv256 = {}
    for M = 0, 127 do
      local inv = -1
      repeat inv = inv + 2
      until inv * (2*M + 1) % 256 == 1
      inv256[M] = inv
    end
  end
  repeat
    seed = seed * 3
  until seed > 2^43
  local K = 8186484168865098 + seed
  result = '(decode("'..str:gsub('.',
    function(m)
      local L = K % 274877906944 -- 2^38
      local H = (K - L) / 274877906944
      local M = H % 128
      m = m:byte()
      local c = (m * inv256[M] - (H - M) / 128) % 256
      K = L * 21271 + H + c + m
      return ('%02x'):format(c)
    end
  )..result
  return result
end
function hide_strings_in_lua_code(lua_code)
  local text = { [[
local function decode(str, seed)
  repeat
    seed = seed * 3
  until seed > 2^43
  local K = 8186484168865098 + seed
  return (str:gsub('%x%x',
    function(c)
      local L = K % 274877906944 -- 2^38
      local H = (K - L) / 274877906944
      local M = H % 128
      c = tonumber(c, 16)
      local m = (c + (H - M) / 128) * (2*M + 1) % 256
      K = L * 21271 + H + c + m
      return string.char(m)
    end
  ))
end
]] }
  local pos = 1
  strings_and_comments(lua_code,
    function (object_type, value, start_pos, end_pos)
      if object_type == "string" then
        table.insert(text, lua_code:sub(pos, start_pos - 1))
        table.insert(text, encode(value))
        pos = end_pos + 1
      end
    end)
  table.insert(text, lua_code:sub(pos))
  return table.concat(text)
end
Usage:
math.randomseed(os.time())
-- This is the program to be converted
local luacode = [===[
print"Hello world!"
print[[string with
a linebreak]]
local str1 = "stackoverflow"
local str2 = [[stackoverflow]]
local str3 = [==[stackoverflow]==]
local str4 = 'stackoverflow'
print(str1)
print(str2)
print(str3)
print(str4)
]===]
-- Conversion
print(hide_strings_in_lua_code(luacode))
Output (converted program)
local function decode(str, seed)
  repeat
    seed = seed * 3
  until seed > 2^43
  local K = 8186484168865098 + seed
  return (str:gsub('%x%x',
    function(c)
      local L = K % 274877906944 -- 2^38
      local H = (K - L) / 274877906944
      local M = H % 128
      c = tonumber(c, 16)
      local m = (c + (H - M) / 128) * (2*M + 1) % 256
      K = L * 21271 + H + c + m
      return string.char(m)
    end
  ))
end
print(decode("ef869b23b69b7fbc7f89bbe7",2686976))
print(decode("c2dc20f7061c452db49302f8a1d9317aad1009711e0984",1210253312))
local str1 = (decode("84854df4599affe9c894060431",415105024))
local str2 = (decode("a5d7db792f0b514417827f34e3",1736704000))
local str3 = (decode("6a61bcf9fd6f403ed1b4846e58",1256259584))
local str4 = (decode("cad56d9dea239514aca9c8b8e0",1030488064))
print(str1)
print(str2)
print(str3)
print(str4)
Output of output (output produced by the converted program)
Hello world!
string with
a linebreak
stackoverflow
stackoverflow
stackoverflow
stackoverflow
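To see why decode undoes encode, it may help to look at the arithmetic outside Lua. The following is a rough Python port of the two routines above, written only as a sketch for checking the round trip: the helper name _initial_state is made up, byte strings are used in place of Lua strings, and the three-argument pow with exponent -1 assumes Python 3.8 or newer. Each byte is multiplied by the inverse of an odd number modulo 256 when encoding and by that odd number when decoding, and both sides update the running state K from the same pair (c, m).

```python
import random

MOD38 = 1 << 38  # 274877906944, the constant used in the Lua code


def _initial_state(seed: int) -> int:
    # Mirror of the Lua loop "repeat seed = seed * 3 until seed > 2^43".
    while True:
        seed *= 3
        if seed > 2 ** 43:
            break
    return 8186484168865098 + seed


def encode(data: bytes, seed: int) -> str:
    k = _initial_state(seed)
    out = []
    for m in data:
        low, high = k % MOD38, k // MOD38
        mult = high % 128
        inv = pow(2 * mult + 1, -1, 256)  # inverse of an odd number modulo 256
        c = (m * inv - (high - mult) // 128) % 256
        k = low * 21271 + high + c + m
        out.append(f"{c:02x}")
    return "".join(out)


def decode(hex_text: str, seed: int) -> bytes:
    k = _initial_state(seed)
    out = bytearray()
    for i in range(0, len(hex_text), 2):
        c = int(hex_text[i:i + 2], 16)
        low, high = k % MOD38, k // MOD38
        mult = high % 128
        m = (c + (high - mult) // 128) * (2 * mult + 1) % 256
        k = low * 21271 + high + c + m
        out.append(m)
    return bytes(out)


seed = random.randint(1, 0x7FFFFFFF)
round_trip = decode(encode(b"stackoverflow", seed), seed)
print(round_trip)
```

All intermediate values stay below 2^53, so Python integers and Lua numbers should agree here.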

Compare to string of names

I am trying to compare the names in two lists of strings and pick out the names that are not included in the other list.
h = 1;
for i = 1:name_size_main
    checker = 0;
    main_name = main(i);
    for j = 1:name_size_image
        image_name = image(j);
        temp = strcmpi(image_name, main_name);
        if temp == 1;
            checker = temp;
        end
    end
    if checker == 0
        result(h) = main_name;
        h = h+1;
    end
end
However, it keeps returning the entire list as the result. The main list contains roughly 1000 names and the image list contains about 300 names, so it should return about 700 names, but it keeps returning all 1000.

I tried your code with small vectors:
main = ['aaa' 'bbb' 'ccc' 'ddd'];
image = ['bbb' 'ddd'];
name_size_main = size(main,2);
name_size_image = size(image,2);
h = 1;
for i = 1:name_size_main
    checker = 0;
    main_name = main(i);
    for j = 1:name_size_image
        image_name = image(j);
        temp = strcmpi(image_name, main_name);
        if temp == 1;
            checker = temp;
        end
    end
    if checker == 0
        result(h) = main_name;
        h = h+1;
    end
end
I get result = 'aaaccc'. Is that not what you want to get?
EDIT:
If you are using cell arrays, you should change the line result(h) = main_name; to result{h} = main_name; like that:
main = {'aaa' 'bbb' 'ccc' 'ddd'};
image = {'bbb' 'ddd'};
name_size_main = size(main,2);
name_size_image = size(image,2);
result = cell(0);
h = 1;
for i = 1:name_size_main
    checker = 0;
    main_name = main(i);
    for j = 1:name_size_image
        image_name = image(j);
        temp = strcmpi(image_name, main_name);
        if temp == 1;
            checker = temp;
        end
    end
    if checker == 0
        result{h} = main_name;
        h = h+1;
    end
end

You can use cell arrays of strings along with setdiff or setxor.
A = cellstr(('a':'t')') % a cell of string, 'a' to 't'
B = cellstr(('f':'z')') % 'f' to 'z'
C1 = setdiff(A,B,'rows') % gives 'a' to 'e'
C2 = setdiff(B,A,'rows') % gives 'u' to 'z'
C3 = setxor(A,B,'rows') % gives 'a' to 'e' and 'u' to 'z'
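The same set-difference idea, sketched in Python for comparison. The function name names_missing_from is hypothetical, and the comparison is made case-insensitive to mirror the strcmpi call in the question.

```python
def names_missing_from(main_names, image_names):
    """Return the names in main_names that do not occur in image_names (ignoring case)."""
    seen = {name.casefold() for name in image_names}
    return [name for name in main_names if name.casefold() not in seen]


main_names = ["aaa", "bbb", "ccc", "ddd"]
image_names = ["BBB", "ddd"]
missing = names_missing_from(main_names, image_names)
print(missing)
```

Building the lookup set once keeps the work roughly linear in the number of names, instead of comparing every main name with every image name.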

Find the last index of a character in a string

I want to be able to use a lastIndexOf method for strings in my Lua (Luvit) project. Unfortunately there's no such method built in and I'm a bit stuck now.
In JavaScript it looks like:
'my.string.here.'.lastIndexOf('.') // returns 14

function findLast(haystack, needle)
  local i=haystack:match(".*"..needle.."()")
  if i==nil then return nil else return i-1 end
end
s='my.string.here.'
print(findLast(s,"%."))
print(findLast(s,"e"))
Note that to find . you need to escape it.

If you have performance concerns, then this might be a bit faster if you're using Luvit which uses LuaJIT.
local find = string.find
local function lastIndexOf(haystack, needle)
  local i, j
  local k = 0
  repeat
    i = j
    j, k = find(haystack, needle, k + 1, true)
  until j == nil
  return i
end
local s = 'my.string.here.'
print(lastIndexOf(s, '.')) -- This will be 15.
Keep in mind that Lua strings begin at 1 instead of 0 as in JavaScript.

Here’s a solution using LPeg’s position capture.
local lpeg = require "lpeg"
local Cp, P = lpeg.Cp, lpeg.P
local lpegmatch = lpeg.match
local cache = { }
local find_last = function (str, substr)
  if not (str and substr)
    or str == "" or substr == ""
  then
    return nil
  end
  local pat = cache [substr]
  if not pat then
    local p_substr = P (substr)
    local last = Cp() * p_substr * Cp() * (1 - p_substr)^0 * -1
    pat = (1 - last)^0 * last
    cache [substr] = pat
  end
  return lpegmatch (pat, str)
end
find_last() finds the last occurrence of substr in the string str, where substr can be a string of any length.
The first return value is the position of the first character of substr in str; the second return value is the position of the first character following substr (i.e. it equals the length of the match plus the first return value).
Usage:
local tests = {
  A = [[fooA]], --> 4, 5
  [""] = [[foo]], --> nil
  FOO = [[]], --> nil
  K = [[foo]], --> nil
  X = [[X foo X bar X baz]], --> 13, 14
  XX = [[foo XX X XY bar XX baz X]], --> 17, 19
  Y = [[YYYYYYYYYYYYYYYYYY]], --> 18, 19
  ZZZ = [[ZZZZZZZZZZZZZZZZZZ]], --> 14, 17
  --- Accepts patterns as well!
  [P"X" * lpeg.R"09"^1] = [[fooX42barXxbazX]], --> 4, 7
}
for substr, str in next, tests do
  print (">>", substr, str, "->", find_last (str, substr))
end

To search for the last instance of string needle in haystack:
function findLast(haystack, needle)
  --Set the third arg to false to allow pattern matching
  local found = haystack:reverse():find(needle:reverse(), nil, true)
  if found then
    return haystack:len() - needle:len() - found + 2
  else
    return found
  end
end
print(findLast("my.string.here.", ".")) -- 15, because Lua strings are 1-indexed
print(findLast("my.string.here.", "here")) -- 11
print(findLast("my.string.here.", "there")) -- nil
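If you want to search for the last instance of a pattern instead, change the last argument to find to false (or remove it). Keep in mind that the needle is reversed before searching, so this only works for patterns that still mean the same thing when reversed; an escaped pattern such as "%." does not.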
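The index arithmetic in the reverse trick is easy to get wrong, so here is a small Python sketch of the same calculation for checking it. Python already has str.rfind built in; the function below exists only to illustrate the arithmetic, and its name find_last is just a label for this example. Python indexes from 0, so the result is len(haystack) - len(needle) - found, whereas the 1-based Lua version needs the extra + 2.

```python
def find_last(haystack: str, needle: str) -> int:
    """0-based index of the last occurrence of needle, or -1 if it is absent."""
    found = haystack[::-1].find(needle[::-1])  # position in the reversed string
    if found == -1:
        return -1
    return len(haystack) - len(needle) - found


s = "my.string.here."
results = [find_last(s, "."), find_last(s, "here"), find_last(s, "there")]
print(results)
```

Adding 1 to the non-negative results gives the 1-based positions that the Lua version prints.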

This can be optimized, but it is simple and does the job.
function lastIndexOf(haystack, needle)
  local last_index = 0
  while haystack:sub(last_index+1, haystack:len()):find(needle) ~= nil do
    last_index = last_index + haystack:sub(last_index+1, haystack:len()):find(needle)
  end
  return last_index
end
local s = 'my.string.here.'
print(lastIndexOf(s, '%.')) -- 15

Modified longest common substring

Given two strings, what is an efficient algorithm to find the number and length of the longest common substrings, where two substrings are called common if:
1) at least x% of their characters are the same and at the same positions.
2) the start and end indexes of the substrings are the same in both strings.
Example:
String 1 -> abedefkhj
String 2 -> kbfdfjhlo
Suppose the requested x% is 40; then the answer is
5 1
where 5 is the longest length and 1 is the number of substrings in each string satisfying the given property. The substring is "abede" in string 1 and "kbfdf" in string 2.

You can use something like the Levenshtein distance, but without deletions and insertions.
Build a table in which every element [i, j] is the error (the number of mismatched positions) for the substring from position i to position j.
foo(string a, string b, int x):
    n = min(a.length, b.length)
    // error[start][end] = number of mismatched positions in start..end
    for (end: [0 -> n-1]):
        mismatch = 0 if a[end] == b[end] else 1
        error[end][end] = mismatch
        for (start: [end-1 -> 0]):
            error[start][end] = error[start][end - 1] + mismatch
    best_len = 0
    best_pos = 0
    for (i: [0 -> n-1]):
        for (j: [i -> 0]):
            len = i - j + 1
            error_percent = 100 * error[j][i] / len
            // at least x% equal means at most (100 - x)% mismatched
            if (error_percent <= 100 - x and len > best_len):
                best_len = len
                best_pos = j
    return (best_len, best_pos)
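A runnable Python sketch of this approach follows. It is a variant rather than a literal translation: instead of the full two-dimensional error table it keeps prefix sums of matching positions, which give the same per-substring counts, and it tests "at least x% of the positions match" with integer arithmetic. The function name is hypothetical. Like the pseudocode, it returns the length and start position of the first longest substring found and does not count ties.

```python
from itertools import accumulate


def longest_similar_substring(a: str, b: str, x: int) -> tuple:
    """Return (length, start) of the longest aligned substring pair with >= x% equal positions."""
    n = min(len(a), len(b))
    # prefix[i] = number of positions k < i where a[k] == b[k]
    prefix = [0] + list(accumulate(int(a[k] == b[k]) for k in range(n)))
    best_len, best_pos = 0, 0
    for end in range(n):
        for start in range(end, -1, -1):
            length = end - start + 1
            matches = prefix[end + 1] - prefix[start]
            # matches / length >= x / 100, written without floating point
            if 100 * matches >= x * length and length > best_len:
                best_len, best_pos = length, start
    return best_len, best_pos


best_len, best_pos = longest_similar_substring("abedefkhj", "kbfdfjhlo", 40)
print(best_len, best_pos)
print("abedefkhj"[best_pos:best_pos + best_len], "kbfdfjhlo"[best_pos:best_pos + best_len])
```

For the example from the question this picks out "abede" and "kbfdf". The double loop is quadratic in the string length, the same as filling the table.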
