How to parse a configuration file (kind of a CSV format) using LUA

How to parse a configuration file (kind of a CSV format) using LUA - string

I am using LUA on a small ESP8266 chip, trying to parse a text string that looks like the one below. I am very new at LUA, and tried many similar scripts found at this forum.
data="
-- String to be parsed\r\n
Tiempo1,20\r\n
Tiempo2a,900\r\n
Hora2b,27\r\n
Tiempo2b,20\r\n
Hora2c,29\r\n
Tiempo2c,18\r\n"
My goal would be to parse the string, and return all the configuration pairs (name/value).
If needed, I can modify the syntax of the config file because it is created by me.
I have been trying something like this:
var1,var2 = data:match("([Tiempo2a,]), ([^,]+)")
But it is returning nil,nil. I think I am on the very wrong way to do this.
Thank you very much for any help.

You need to use gmatch and parse the values excluding non-printable characters (\r\n) at the end of the line or use %d+
local data=[[
-- String to be parsed
Tiempo1,20
Tiempo2a,900
Hora2b,27
Tiempo2b,20
Hora2c,29
Tiempo2c,18]]
local t = {}
for k,v in data:gmatch("(%w-),([^%c]+)") do
t[#t+1] = { k, v }
print(k,v)
end

Related

How to get packet.tcp.payload and packet.http.data as string?

The return value for these attributes are in hex format seperated by ':'
Eg : 70:79:f6:2e: something like this.
When I am trying to decode it to plain string ( human readable ) it doesn't work. What encoding is being used? I tried various different methods like codecs.decode(), binascii.unhexlify(), bytes.fromhex() also different encodings ASCII and UTF-8. Nothing worked, any help is appreciated. I am using python 3.6

Thanks for your question! I believe you're wanting to read the payload in chunks of two hex places. The functions you tried are not able to parse the : delimiter out-of-the-box. Something like splitting the string by the : delimiter, converting their values to human-readable characters, and joining the "list" to a string should do the trick.
hex_string = '70:79:f6:2e'
hex_split = hex_string.split(':')
hex_as_chars = map(lambda hex: chr(int(hex, 16)), hex_split)
human_readable = ''.join(hex_as_chars)
print(human_readable)
Is this what you have in mind?

Parsing a non-Unicode string with Flask-RESTful

I have a webhook developed with Flask-RESTful which gets several parameters with POST.
One of the parameters is a non-Unicode string, encoded in cp1251.
Can't find a way to correctly parse this argument using reqparse.
Here is the fragment of my code:
parser = reqparse.RequestParser()
parser.add_argument('text')
msg = parser.parse_args()
Then, I write msg to a text file, and it looks like this:
{"text": "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd !\n\n\ufffd\ufffd\ufffd\ufffd\ufffd\n\n-- \n\ufffd \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd."}
As you can see, Flask somehow replaces all Cyrillic characters with \ufffd. At the same time, non-Cyrillic characters, like ! or \n are processed correctly.
Anything I can do to advise RequestParser with the string encoding?
Here is my code for writing the text to disk:
f = open('log_msg.txt', 'w+')
f.write(json.dumps(msg))
f.close()
I tried f = open('log_msg.txt', 'w+', encoding='cp1251') with the same result.
Then, I tried
f = open('log_msg_ascii.txt', 'w+')
f.write(ascii(json.dumps(msg)))
Also, no difference.
So, I'm pretty sure it's RequestParser() tries to be too smart and can't understand the non-Unicode input.
Thanks!

Okay, I finally found a workaround. Thanks to #lenz for helping me with this issue. It seems that reqparse wrongly assumes that every string parameter comes as UTF-8. So when it sees a non-Unicode input field (among other Unicode fields!), it tries to load it as Unicode and fails. As a result, all characters are U+FFFD (replacement character).
So, to access that non-Unicode field, I did the following trick.
First, I load raw data using get_data(), decode it using cp1251 and parse with a simple regexp.
raw_data = request.get_data()
contents = raw_data.decode('windows-1251')
match = re.search(r'(?P<delim>--\w+\r?\n)Content-Disposition: form-data; name=\"text\"\r?\n(.*?)(?P=delim)', contents, re.MULTILINE | re.DOTALL)
text = match.group(2)
Not the most beautiful solution, but it works.

Problem with multivariables in string formatting

I have several files in a folder named t_000.png, t_001.png, t_002.png and so on.
I have made a for-loop to import them using string formatting. But when I use the for-loop I got the error
No such file or directory: '/file/t_0.png'
This is the code that I have used I think I should use multiple %s but I do not understand how.
for i in range(file.shape[0]):
im = Image.open(dir + 't_%s.png' % str(i))
file[i] = im

You need to pad the string with leading zeroes. With the type of formatting you're currently using, this should work:
im = Image.open(dir + 't_%03d.png' % i)
where the format string %03s means "this should have length 3 characters and empty space should be padded by leading zeroes".
You can also use python's other (more recent) string formatting syntax, which is somewhat more succinct:
im = Image.open(f"{dir}t_{i:03d}")

You are not padding the number with zeros, thus you get t_0.png instead of t_000.png.
The recommended way of doing this in Python 3 is via the str.format function:
for i in range(file.shape[0]):
im = Image.open(dir + 't_{:03d}.png'.format(i))
file[i] = im
You can see more examples in the documentation.
Formatted string literals are also an option if you are using Python 3.6 or a more recent version, see Green Cloak Guy's answer for that.

Try this:
import os
for i in range(file.shape[0]):
im = Image.open(os.path.join(dir, f't_{i:03d}.png'))
file[i] = im
(change: f't_{i:03d}.png' to 't_{:03d}.png'.format(i) or 't_%03d.png' % i for versions of Python prior to 3.6).
The trick was to specify a certain number of leading zeros, take a look at the official docs for more info.
Also, you should replace 'dir + file' with the more robust os.path.join(dir, file), which would work regardless of dir ending with a directory separator (i.e. '/' for your platform) or not.
Note also that both dir and file are reserved names in Python and you may want to rename your variables.
Also check that if file is a NumPy array, file[i] = im may not be working.

multiple variable in python regex

I have seen several related posts and several forums to find an answer for my question, but nothing has come up to what I need.
I am trying to use variable instead of hard-coded values in regex which search for either word in a line.
However i am able to get desired result if i don't use variable.
<http://www.somesite.com/software/sub/a1#Msoffice>
<http://www.somesite.com/software/sub1/a1#vlc>
<http://www.somesite.com/software/sub2/a2#dell>
<http://www.somesite.com/software/sub3/a3#Notepad>
re.search(r"\#Msoffice|#vlc|#Notepad", line)
This regex will return the line which has #Msoffice OR #vlc OR #Notepad.
I tried defining a single variable using re.escape and that worked absolutely fine. However i have tried many combination using | and , (pipe and comma) but no success.
Is there any way i can specify #Msoffice , #vlc and #Notepad in different variables and so later i can change those ?
Thanks in advance!!

If I did understand you the right way you'd like to insert variables in your regex.
You are actually using a raw string using r' ' to make the regex more readable, but if you're using f' ' it allows you to insert any variables using {your_var} then construct your regex as you like:
var1 = '#Msoffice'
var2 = '#vlc'
var3 = '#Notepad'
re.search(f'{var1}|{var2}|{var3}', line)
The most annoying issue is that you will have to add \ to escaped char, to look for \ it will be \\
Hope it helped

import re
lines = ["<http://www.somesite.com/software/sub/a1#Msoffice>",
"<http://www.somesite.com/software/sub1/a1#vlc>",
"<http://www.somesite.com/software/sub2/a2#dell>",
"<http://www.somesite.com/software/sub3/a3#Notepad>"]
for line in lines:
if re.search(r'\b(?:\#{}|\#{}|\#{})\b'.format('Msoffice', 'vlc', 'Notepad'), line):
print(line)
Output :
<http://www.somesite.com/software/sub/a1#Msoffice>
<http://www.somesite.com/software/sub1/a1#vlc>
<http://www.somesite.com/software/sub3/a3#Notepad>

matlab iterative filenames for saving

this question about matlab:
i'm running a loop and each iteration a new set of data is produced, and I want it to be saved in a new file each time. I also overwrite old files by changing the name. Looks like this:
name_each_iter = strrep(some_source,'.string.mat','string_new.(j).mat')
and what I#m struggling here is the iteration so that I obtain files:
...string_new.1.mat
...string_new.2.mat
etc.
I was trying with various combination of () [] {} as well as 'string_new.'j'.mat' (which gave syntax error)
How can it be done?

Strings are just vectors of characters. So if you want to iteratively create filenames here's an example of how you would do it:
for j = 1:10,
filename = ['string_new.' num2str(j) '.mat'];
disp(filename)
end
The above code will create the following output:
string_new.1.mat
string_new.2.mat
string_new.3.mat
string_new.4.mat
string_new.5.mat
string_new.6.mat
string_new.7.mat
string_new.8.mat
string_new.9.mat
string_new.10.mat

You could also generate all file names in advance using NUM2STR:
>> filenames = cellstr(num2str((1:10)','string_new.%02d.mat'))
filenames =
'string_new.01.mat'
'string_new.02.mat'
'string_new.03.mat'
'string_new.04.mat'
'string_new.05.mat'
'string_new.06.mat'
'string_new.07.mat'
'string_new.08.mat'
'string_new.09.mat'
'string_new.10.mat'
Now access the cell array contents as filenames{i} in each iteration

sprintf is very useful for this:
for ii=5:12
filename = sprintf('data_%02d.mat',ii)
end
this assigns the following strings to filename:
data_05.mat
data_06.mat
data_07.mat
data_08.mat
data_09.mat
data_10.mat
data_11.mat
data_12.mat
notice the zero padding. sprintf in general is useful if you want parameterized formatted strings.

For creating a name based of an already existing file, you can use regexp to detect the '_new.(number).mat' and change the string depending on what regexp finds:
original_filename = 'data.string.mat';
im = regexp(original_filename,'_new.\d+.mat')
if isempty(im) % original file, no _new.(j) detected
newname = [original_filename(1:end-4) '_new.1.mat'];
else
num = str2double(original_filename(im(end)+5:end-4));
newname = sprintf('%s_new.%d.mat',original_filename(1:im(end)-1),num+1);
end
This does exactly that, and produces:
data.string_new.1.mat
data.string_new.2.mat
data.string_new.3.mat
...
data.string_new.9.mat
data.string_new.10.mat
data.string_new.11.mat
when iterating the above function, starting with 'data.string.mat'

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to parse a configuration file (kind of a CSV format) using LUA - string

Related

How to get packet.tcp.payload and packet.http.data as string?

Parsing a non-Unicode string with Flask-RESTful

Problem with multivariables in string formatting

multiple variable in python regex

matlab iterative filenames for saving

Categories

Resources