Lua: Line breaks in strings - string

I've been working on a formatter that takes a long string and formats it into a series of lines, each broken at a word boundary within a certain character limit. For instance, "He eats the bread", broken at every 8 characters, would return something like:
He eats
the
bread
This is because "He eats" contains 7 characters, and "the bread" contains 9, so it has to break off at "the" and continue with "bread".
The script itself has been working wonderfully, thanks to the help of other members before. However, now I have a new challenge.
I'm utilizing a free-entry box for entering the series of lines. It's for a program called MUSHclient, if anyone is familiar with it. The utils.editbox opens up an editor text box, and allows free form, and returns the result as a string. For example:
result = utils.editbox("What are you typing?")
saves the typed response into result which is a string type. I want to put line breaks in properly, but I can't figure out how to do that. If I print(result), it returns something like the following:
This is the result of three separate paragraphs.
As you can see, there is a line break between each paragraph.
So the result contains line breaks, or control characters.
Now, I've managed to extract the individual lines, but not the line breaks in between, using the following code:
for line in result:gmatch("[^%c]+%c?") do
  Send(note_wrap(line))
end
How can I extract the line breaks as well as the textual content?
Edit: For reference, the note_wrap function is as follows:
function note_wrap(str, limit, indent, indent1)
  indent = indent or ""
  indent1 = indent1 or indent
  limit = limit or 79
  local here = 1-#indent1
  local last_color = ''
  return indent1..str:gsub("(%s+)()(%S+)()",
    function(sp, st, word, fi)
      local delta = 0
      local color_before_current_word = last_color
      word:gsub('()#([#%a])',
        function(pos, c)
          if c == '#' then
            delta = delta + 1
          elseif c == 'x' then
            delta = delta + 5
            last_color = word:sub(pos, pos+4)
          else
            delta = delta + 2
            last_color = word:sub(pos, pos+1)
          end
        end)
      here = here + delta
      if fi-here > limit then
        here = st - #indent + delta
        return "\n"..indent..color_before_current_word..word
      end
    end)
end
The reason for this code is to be able to take #rHe eats #x123the bread, ignore the color codes (indicated by #<letter> or #x<digits>) and return the aforementioned result of:
He eats
the
bread
but insert the color codes so it then returns:
#rHe eats
#x123the
bread
If this can be modified to recognize new lines and put an actual new line in, that would be fabulous.
Edit 2: I tried Paul's solution, and it's not doing what I need it to. I may not have been clear enough, so I'll try to clear it up in this edit.
When I type into the free form box, I want the information presented exactly as is, but formatted to break correctly AND maintain entered newlines. For example, if I write this in the free form box:
Mary had a little lamb, little lamb, little lamb
Mary had a little lamb
It's fleece was white as snow
using Paul's code, and setting string to 79, I get:
Mary had a little lamb, little lamb, little lamb
Mary had a little lamb
It's fleece was white as snow
And that's not what I want. I'd want it to return it just as it was written, and breaking line as necessary. So if I had a 20 character limit, it'd return:
Mary had a little
lamb, little lamb,
little lamb
Mary had a little
lamb
It's fleece was
white as snow
If I added manual line breaks, I'd want it to respect those, so if I hit return twice after the first line, it'd have a line break under the first line as well. For example, if I wrote this post in the free form box, I'd want it to respect every new paragraph and every proper line break as well. I hope this clears things up.

I use something like this in my code (this assumes that \n is the line separator):
local function formatUpToX(s, x, indent)
  x = x or 79
  indent = indent or ""
  local t = {""}
  local function cleanse(s) return s:gsub("#x%d%d%d",""):gsub("#r","") end
  for prefix, word, suffix, newline in s:gmatch("([ \t]*)(%S*)([ \t]*)(\n?)") do
    if #(cleanse(t[#t])) + #prefix + #cleanse(word) > x and #t > 0 then
      table.insert(t, word..suffix) -- add new element
    else -- add to the last element
      t[#t] = t[#t]..prefix..word..suffix
    end
    if #newline > 0 then table.insert(t, "") end
  end
  return indent..table.concat(t, "\n"..indent)
end
print(formatUpToX(result, 20))
print(formatUpToX("#rHe eats #x123the bread", 8, " "))
The cleanse function removes any markup that needs to be included in the string, but doesn't need to count against the limit.
For the example you have, I get the following output (using 20 as the limit for the first fragment and 8 for the second):
This is the result
of three separate
paragraphs.
As you can see,
there is a line
break between each
paragraph.
So the result
contains line
breaks, or control
characters.
#rHe eats
#x123the
bread
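For readers who want to experiment with the idea outside MUSHclient, here is a minimal Python sketch of the same approach: split on the typed newlines first, then wrap each line on its own while not counting color codes against the limit. The names visible_len and wrap_keep_newlines are made up for this illustration, and the color-code pattern is an assumption based only on the examples in the question (#x plus three digits, or # plus one letter). Unlike note_wrap, it does not repeat the last color code at the start of a continued line.

```python
import re

# Assumed markup, inferred from the question's examples only:
# "#x" followed by three digits, or "#" followed by a single letter.
COLOR_CODE = re.compile(r"#x\d{3}|#[A-Za-z]")

def visible_len(text):
    """Length of text as displayed, with color codes not counted."""
    return len(COLOR_CODE.sub("", text))

def wrap_keep_newlines(text, limit=79):
    """Wrap at word boundaries while keeping every newline the user typed."""
    out = []
    for paragraph in text.split("\n"):
        line = ""
        for word in paragraph.split():
            candidate = word if not line else line + " " + word
            if line and visible_len(candidate) > limit:
                out.append(line)      # current line is full, start a new one
                line = word
            else:
                line = candidate
        out.append(line)              # a blank input line stays a blank line
    return "\n".join(out)

print(wrap_keep_newlines("#rHe eats #x123the bread", 8))
```

Run on the nursery-rhyme input from the question with a limit of 20, this should give the seven lines the asker listed, and an empty line in the input comes back as an empty line in the output.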


Stata flag when word found, not strpos

I have some data with strings, and I want to flag when a word is found. A word would be defined as being at the start of the string, at the end, or separated by a space. strpos will find whenever the string is present, but I am looking for something similar to subinword. Does Stata have a way to use the functionality of subinword without having to replace it, and instead flag the word?
clear
input id str50 strings
1 "the thin th man"
2 "this old then"
3 "th to moon"
4 "moon blank th"
end
gen th_pos = 0
replace th_pos = 1 if strpos(strings, "th") > 0
This above code will flag every observation as they all contain "th", but my desired output is:
ID strings th_sub
1 "the thin th man" 1
2 "this old then" 0
3 "th to moon" 1
4 "moon blank th" 1
A small trick is that "th" as a word will be preceded and followed by a space, except if it occurs at the beginning or the end of string. The exceptions are no challenge really, as
gen wanted = strpos(" " + strings + " ", " th ") > 0
works around them. Otherwise, there is a rich set of regular expression functions to play with.
Incidentally, the code in the question that doesn't do what you want condenses to one line:
gen th_pos = strpos(strings, "th") > 0
A more direct answer is that you don't have to replace anything. You just have to get Stata to tell you what would happen if you did:
gen WANTED = strings != subinword(strings, "th", "", .)
If removing a substring if present changes the string, it must have been present.
Regular expressions can be useful for this type of exercise, with word boundaries allowing you to search for whole words indicated by \b, as in "\bword\b".
gen wanted = ustrregexm(strings, "\bth\b")
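The two ideas above (padding with spaces, and a word-boundary regular expression) are not specific to Stata. As a rough cross-check, here is a small Python sketch that applies both to the sample data from the question; the variable names are made up for the illustration.

```python
import re

strings = ["the thin th man", "this old then", "th to moon", "moon blank th"]

# Padding trick: surround the text with spaces, then look for " th ".
padded_flags = [int(" th " in " " + s + " ") for s in strings]

# Word-boundary regex, the analogue of matching "\bth\b".
regex_flags = [int(re.search(r"\bth\b", s) is not None) for s in strings]

print(padded_flags)  # [1, 0, 1, 1]
print(regex_flags)   # [1, 0, 1, 1]
```

The two approaches agree here, but they can differ when punctuation is involved: a string such as "th, moon" matches the word-boundary pattern but not the padded search for " th ".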

String Operations Confusion? ELI5

I'm extremely new to python and I have no idea why this code gives me this output. I tried searching around for an answer but couldn't find anything because I'm not sure what to search for.
An explain-like-I'm-5 explanation would be greatly appreciated
astring = "hello world"
print(astring[3:7:2])
This gives me : "l"
Also
astring = "hello world"
print(astring[3:7:3])
gives me : "lw"
I can't wrap my head around why.
This is string slicing in python.
Slicing is similar to regular string indexing, but it can return just a section of a string.
Using two parameters in a slice, such as [a:b] will return a string of characters, starting at index a up to, but not including, index b.
For example:
"abcdefg"[2:6] would return "cdef"
Using three parameters performs a similar function, but the slice only returns characters at a fixed interval (the step). For example, [2:6:2] will return every second character beginning at index 2, up to but not including index 6.
i.e. "abcdefg"[2:6:2] will return "ce", as it only takes every second character (indices 2 and 4).
In your case, astring[3:7:3], the slice begins at index 3 (the second l) and moves forward the specified 3 characters (the third parameter) to w at index 6. The next step would land on index 9, which is past the stop index 7, so the slice ends there, returning lw.
In fact when using only two parameters, the third defaults to 1, so astring[2:5] is the same as astring[2:5:1].
Python Central has some more detailed explanations of cutting and slicing strings in python.
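To see which positions a stepped slice actually visits, it can help to list the indices explicitly. The helper below, slice_indices, is a made-up name for this illustration and only handles the simple case of non-negative, in-range start and stop values with a positive step.

```python
astring = "hello world"

def slice_indices(text, start, stop, step=1):
    """Return the (index, character) pairs that text[start:stop:step] visits."""
    return [(i, text[i]) for i in range(start, stop, step)]

print(slice_indices(astring, 3, 7, 2))  # [(3, 'l'), (5, ' ')]
print(slice_indices(astring, 3, 7, 3))  # [(3, 'l'), (6, 'w')]
print(repr(astring[3:7:2]))             # 'l '
```

Note that astring[3:7:2] is really "l" followed by a space (index 5), which is why it looks like a lone "l" when printed.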
I have a feeling you are overcomplicating this slightly.
Since the string astring is set statically, you could instead do the following:
# One variable per character of "hello world", numbered from 1
letter_1 = "h"
letter_2 = "e"
letter_3 = "l"
letter_4 = "l"
letter_5 = "o"
letter_6 = " "
letter_7 = "w"
letter_8 = "o"
letter_9 = "r"
letter_10 = "l"
letter_11 = "d"
# Print the characters you want; letters 4 and 7 correspond to indices 3 and 6
print(letter_4 + letter_7)
This way it's explicit which characters are printed, though it only works for a fixed string.

Python 3.5: Is it possible to align punctuation (e.g. £, $) to the left side of a word using regex?

As part of my code, I need to align things like the pound sign to the left of a string. For example my code starts with:
"A price of £ 8 is roughly the same as $ 10.23!"
and needs to end with:
"A price of £8 is roughly the same as $10.23!"
I've created the following function to solve this however I feel that it is very inefficient and was wondering if there was a way to do this with regular expressions in Python?
for i in sentence:
if i == "(" or i == "{" or i == "[" or i == "£" or i == "$":
if i != len(sentence):
corrected_sentence.append(" ")
corrected_sentence.append(i)
else:
corrected_sentence.append(i)
What this is doing right now is going through the 'sentence' list, where I have split up all of the words and punctuation, and then re-forming it, with each item followed by a space EXCEPT where the listed characters are used, adding the items to another list to be made into a single string again.
I only want to do this with the characters I have listed above (so I need to ignore things like full stops or exclamation marks etc).
Thanks!
I'm not sure what you want to do with the brackets, but from the description you can use a regex to find and replace whitespace preceded by the characters (lookbehind) and followed by a digit (lookahead).
>>> print(re.sub(r"(?<=[\{\[£\$])\s+(?=\d)", "", "A price of £ 8 is roughly the same as $ 10.23!"))
A price of £8 is roughly the same as $10.23!
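As a self-contained version of the same idea, the sketch below wraps the substitution in a function and compiles the pattern once. The function name tighten is made up for this example, and adding "(" to the character class is an assumption based on the characters listed in the question; the one-liner above does not include it.

```python
import re

# Whitespace that follows one of the listed symbols and precedes a digit.
# Lookbehind and lookahead assert the context without consuming it,
# so only the whitespace itself is removed.
GAP_AFTER_SYMBOL = re.compile(r"(?<=[({\[£$])\s+(?=\d)")

def tighten(text):
    """Remove the gap between a currency/bracket symbol and the number after it."""
    return GAP_AFTER_SYMBOL.sub("", text)

print(tighten("A price of £ 8 is roughly the same as $ 10.23!"))
```

Punctuation that is not in the character class, such as full stops or exclamation marks, is left alone.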

Write statement for a complex format / possibility to write more than once on the same excel line

I am presently working on a file to open one by one .txt documents, extract data, to finally fill a .excel document.
Because I did not know how it is possible to write multiple times on the same line of my Excel document after one write statement (because it jumps to the next line), I have created a string of characters which is filled time after time :
Data (data_limite(x),x=1,8)/10, 9, 10, 7, 9, 8, 8, 9/
do file_descr = 1,nombre_fichier,1
taille_data1 = data_limite(file_descr)
nvari = taille_data1-7
write (new_data1,"(A30,A3,A11,A3,F5.1,A3,A7,F4.1,<nvari>(A3))") description,char(9),'T-isotherme',char(9),T_trait,char(9),'d_gamma',taille_Gam,(char(9),i=1,nvari)
ecriture_descr = ecriture_descr//new_data1
end do
Main issue was I want to adapt char(9) amount with the data_limite value so I built a write statement with a variable amount of char(9).
At the end of the do-loop, I have a very complex format of ecriture_descr which has no periodic format due to the change of the nvari value
Now I want to add this to the first line of my .excel :
Open(Unit= 20 ,File='resultats.RES',status='replace')
write(20,100) 'param',char(9),char(9),char(9),char(9),char(9),'*',char(9),'nuances',char(9),'*',char(9),ecriture_descr
100 format (a5,5(a3),a,a3,a7,a,a3,???)
but I do not know how to write this format. It would have been easier if, at each iteration of the do-loop I could fill the first line of my excel and continue to fill the first line at each new new_data1 value.
EDIT : maybe adding advance='no' in my write statement would help me, I am presently trying to add it
EDIT 2 : it did not work with advance='no', but adding a '$' at the end of the format in my write statement suppresses the line return. By moving the write into my do-loop, I guess I can solve my problem :). I am presently trying this.
First of all, your line
ecriture_descr = ecriture_descr//new_data1
is almost certainly not doing what you expect it to do. I assume that both ecriture_descr and new_data1 are of type CHARACTER(len=<some value>), that is, fixed-length strings. If you assign anything to such a string, the value is truncated to that length (if it is too long) or padded with spaces (if it is too short):
program strings
    implicit none
    character(len=8) :: h
    h = "Hello"
    print *, "|" // h // "|" ! Prints "|Hello   |"
    h = "Hello World"
    print *, "|" // h // "|" ! Prints "|Hello Wo|"
end program strings
And this combination will work against you: ecriture_descr will already be padded to the max with spaces, so when you append new_data1 it will be just outside the range of ecriture_descr, a bit like this:
h = "Hello"      ! h is actually "Hello   "
h = h // "World" ! equiv to h = "Hello   " // "World"
                 !            = "Hello   World"
                 !               ^^^^^^^^
                 ! Only this is assigned to h => no change
If you want a string aggregator, you need to use the trim function which removes all trailing spaces:
h = trim(h) // " World"
Secondly, if you want to write to a file, but don't want to have a newline, you can add the option advance='no' into the write statement:
do i = 1, 100
write(*, '(I4)', advance='no') i
end do
This should make your job a lot easier than to create one very long string in memory and then write it all out in one go.
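For readers less familiar with fixed-length CHARACTER variables, the truncate-or-pad behaviour described above can be imitated in a few lines of Python. This is only an emulation for illustration: assign_fixed is a made-up helper, and Python strings themselves do not behave this way.

```python
def assign_fixed(value, length):
    """Mimic assignment to a CHARACTER(len=length) variable:
    truncate if too long, pad with trailing spaces if too short."""
    return value[:length].ljust(length)

h = assign_fixed("Hello", 8)                 # "Hello   "
unchanged = assign_fixed(h + "World", 8)     # like h = h // "World"
print("|" + unchanged + "|")                 # |Hello   |

# Like trim(h) // " World", assigned to a hypothetical len=11 variable
appended = assign_fixed(h.rstrip(" ") + " World", 11)
print("|" + appended + "|")                  # |Hello World|
```

The first result shows why the concatenation appears to do nothing: the appended text falls entirely beyond the eighth character and is cut off on assignment.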

How to quickly edit determinate part of code inside different similar lines

I have this problem: I'm adjusting some code I've written, which has a structure like this:
Apple1 = Fruit("ss","ss",[0.1,0.4],'w')
PineApple = Fruit("ss","ss",[0.315,0.4],'w')
Banana = Fruit("ss","ss",[0.315,0.280],'w')
...
...
Instead of "ss" I would like to type further information like "Golden Delicious". For the moment I'm simply clicking on "ss", deleting it, and then typing the information I want to insert. I'm sure there is a faster way to do it. I've tried something with Vim macros, but I can't figure out how to "raw input" my data.
I've also tried simply substituting it with Spyder, but that is slow because I have to click substitute every time; from what I've tried, it is the same with Vim.
I also wonder how to insert something else after 'w'...
This is an example of the final output, only to make the question easier to understand:
Apple1 = Fruit("Golden Delicous","red",[0.1,0.4],'w')
PineApple = Fruit("Ananas comosus","green",[0.315,0.4],'w')
Banana = Fruit(" Cavendish banana","yellow",[0.315,0.280],'w')
...
...
To reformulate the question: what is the fastest way to change "ss"? For the moment I'm clicking on "ss", deleting "ss" and writing e.g. "Golden Delicous", but that is very slow. What I would like is for the editor to ask me, for every single ss, what to insert in its place.
E.g. for the first ss in the first line, I want to replace it by typing something else, e.g. "Golden Delicous"; for the second ss in the first line, I want to replace it by typing something else, e.g. red. The first ss in the second line I want to replace with something else, e.g. Ananas comosus; the second ss in the second line I want to replace with something else, e.g. green; and so on.
I'm sure there is an answer for this somewhere but I can't find it!
Please, if you downvote, explain why so I can improve the question!
As far as I understand, the data that you want to substitute for "ss" does not have regular structure, so you will need to enter it by hand.
In Vim you would do it like this:
1. Place the cursor over the first "ss", then press * and then N.
2. Press ce, enter the new data (e.g. "Golden Delicious"), then leave Insert mode by pressing Escape.
3. Press n to jump to the next instance of "ss".
4. Repeat steps 2 and 3 ad libitum.
Look up :h * and :h n for more information.
I would do it like this:
:%s/ss/\=input('Replacement: ')/gc
This queries you for each occurrence. With the /c flag, the display is even updated during the loop (at the cost of having to additionally answer y for each occurrence); without the flag, you would need to keep track of where you are yourself.
You can use a function that searches the whole file substituting all "ss" strings with values from arrays populated with the replacement data:
function! ChangeSS()
    let ss1 = ['Golden Delicous', 'Ananas comosus', 'Cavendish banana']
    let ss2 = ['red', 'green', 'yellow']
    call cursor(1, 1)
    let l = "ss2"
    while search('"ss"', 'W') > 0
        if l == "ss1"
            let l = "ss2"
        else
            let l = "ss1"
        endif
        execute 'normal ci"' . remove({l}, 0)
    endwhile
endfunction
It uses a reference variable (l) that exchanges which array you want to extract data from. ss1 is for first appearance of "ss" in the line and ss2 for the second one.
Run it like:
:call ChangeSS()
That (in my test) yields:
Apple1 = Fruit("Golden Delicous","red",[0.1,0.4],'w')
PineApple = Fruit("Ananas comosus","green",[0.315,0.4],'w')
Banana = Fruit("Cavendish banana","yellow",[0.315,0.280],'w')
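If the replacement values are already known, the same "consume the next value for each placeholder" idea can also be scripted outside the editor. The sketch below is a Python illustration, not a Vim feature; fill_placeholders and the in-memory source text are made up for the example, and in practice the text would be read from and written back to the file.

```python
import re

source = '''Apple1 = Fruit("ss","ss",[0.1,0.4],'w')
PineApple = Fruit("ss","ss",[0.315,0.4],'w')
Banana = Fruit("ss","ss",[0.315,0.280],'w')'''

# Replacement values in reading order: first "ss" of line 1, second "ss" of line 1, ...
replacements = iter([
    "Golden Delicious", "red",
    "Ananas comosus", "green",
    "Cavendish banana", "yellow",
])

def fill_placeholders(text, values):
    """Replace each "ss" placeholder with the next value from the iterator."""
    return re.sub(r'"ss"', lambda match: '"' + next(values) + '"', text)

result = fill_placeholders(source, replacements)
print(result)
```

If there are more placeholders than values, next() raises StopIteration, which at least makes the mismatch obvious rather than silently leaving some "ss" entries in place.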
