Lua: how do I split a string (of a varying length) into multiple parts?


I have a string, starting with a number, then a space, then a word of an unknown amount of letters, a space again, and then sometimes another piece of text (which may or may not contain more than one word).
EDIT: the last piece of text is sometimes left out (see example #2)
Using the methods mentioned in the comments, str:find(...) on #2 would return nil.
Example:
"(number) (text) [more text]"
1: "10 HELLO This is a string"
2: "88 BYE"
What I want is to split these strings into a table, inside a table containing more of these split strings, like this:
{
[(number)] = { [1] = (text), [2] = (more text) }
[10] = { [1] = "HELLO", [2] = "This is a string" }
}
I have tried several methods, but none of them give me the desired result.
One of the methods I tried, for example, was splitting the string on whitespaces. But that resulted in:
{
[10] = { [1] = "HELLO", [2] = "This", ... [4] = "string" }
}
Thanks in advance.

Using Lua string patterns, achieving the desired result is quite easy.
For example:
function CustomMatching( sVar )
    local tReturn = {}
    -- %s? and (.*) make the trailing text optional, so "88 BYE" also matches
    local _, _, iNumber, sWord, sRemain = sVar:find( "^(%d+)%s(%a+)%s?(.*)" )
    if iNumber then -- find() returns nil when the string does not match
        tReturn[tonumber(iNumber)] = { sWord, sRemain }
    end
    return tReturn
end
And to call it:
local sVar = "10 HELLO This is a string"
local tMyTable = CustomMatching( sVar )
In the find() method the pattern "^(%d+)%s(%a+)%s?(.*)" means:
Find and store all digits (%d) until a space is encountered.
Find and store all letters (%a) until a space or the end of the string is encountered.
Find and store all remaining characters until the end of the string is reached. This part is the empty string when there is no trailing text, as in "88 BYE".
EDIT
Changed tReturn[iNumber] to tReturn[tonumber(iNumber)] as per the discussion in comments.

You can use the string.match method with an appropriate pattern:
local n, w, str = ('10 HELLO This is a string'):match'^(%d+)%s+(%S+)%s*(.*)$' -- %s* lets "88 BYE" match too, with str == ""
your_table[tonumber(n)] = {w, str}
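For comparison, the same idea can be sketched in Python with a regular expression whose third group is optional. This is only an illustration of the pattern logic, not a Lua solution; the names LINE_PATTERN and parse_line are made up for this example, and it assumes the word never contains whitespace.

```python
import re

# Hypothetical pattern: a number, one word, then optional trailing text.
LINE_PATTERN = re.compile(r"^(\d+)\s+(\S+)(?:\s+(.*))?$")

def parse_line(line, table=None):
    """Store the parsed line in `table` as {number: [word, rest]} and return it."""
    table = {} if table is None else table
    match = LINE_PATTERN.match(line)
    if match is None:
        return table  # the line does not follow the expected layout
    number, word, rest = match.groups()
    entry = [word]
    if rest:  # the trailing text is optional, as in "88 BYE"
        entry.append(rest)
    table[int(number)] = entry
    return table

result = parse_line("10 HELLO This is a string")
parse_line("88 BYE", result)
print(result)  # {10: ['HELLO', 'This is a string'], 88: ['BYE']}
```

Making the whole trailing group optional means a line without extra text still matches instead of yielding no result.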

Related

Looking for efficient string replacement algorithm

I'm trying to create a string replacer that accepts multiple replacements.
The idea is that it would scan the string to find substrings and replace those substrings with another substring.
For example, I should be able to ask it to replace every "foo" with "bar". Doing that is trivial.
The issue starts when I try to add multiple replacements to this function. If I ask it to replace "foo" with "bar" and "bar" with "biz", running those replacements in sequence would result in "foo" turning into "biz", and this behavior is unintended.
I tried splitting the string into words and running each replacement function on each word. However, that's not bulletproof either: it still results in unintended behavior, since you can ask it to replace substrings that are not whole words. I also find it very inefficient.
I'm thinking of running each replacer once over the whole string, storing those changes, and merging them. However, I think I'm overengineering.
Searching the web gives me trivial results on how to use string.replace with regular expressions, which doesn't solve my problem.
Is this a problem already solved? Is there an algorithm that can be used here to do this string manipulation efficiently?

If you modify your string while searching for all occurrences of the substrings to be replaced, you'll end up searching an already-modified string. An easy way out is to collect all the indexes to update first, then iterate over those indexes and make the replacements. That way, the indexes for "bar" have already been computed and won't be affected even if you later replace some other substring with "bar".
Adding a rough Python implementation to give you an idea:
import re

string = "foo bar biz"
replacements = [("foo", "bar"), ("bar", "biz")]

# Record every match position against the original, unmodified string.
matches = []
for old, new in replacements:
    for m in re.finditer(re.escape(old), string):
        matches.append((m.start(), old, new))

# Apply the replacements left to right so a single running offset stays valid.
temp = list(string)
offset = 0
for index, old, new in sorted(matches):
    temp[offset + index:offset + index + len(old)] = list(new)
    offset += len(new) - len(old)

print(''.join(temp))  # "bar biz biz"

Here's the approach I would take.
I start with my text and the set of replacements:
string text = "alpha foo beta bar delta";
Dictionary<string, string> replacements = new()
{
    { "foo", "bar" },
    { "bar", "biz" },
};
Now I create an array of parts that are either "open" or not. Open parts can have their text replaced.
var parts = new List<(string text, bool open)>
{
    (text: text, open: true)
};
Now I run through each replacement and build a new parts list. If the part is open I can do the replacements, if it's closed just add it in untouched. It's this last bit that prevents double mapping of replacements.
Here's the main logic:
foreach (var replacement in replacements)
{
    var parts2 = new List<(string text, bool open)>();
    foreach (var part in parts)
    {
        if (part.open)
        {
            bool skip = true;
            foreach (var split in part.text.Split(new[] { replacement.Key }, StringSplitOptions.None))
            {
                if (skip)
                {
                    skip = false;
                }
                else
                {
                    parts2.Add((text: replacement.Value, open: false));
                }
                parts2.Add((text: split, open: true));
            }
        }
        else
        {
            parts2.Add(part);
        }
    }
    parts = parts2;
}
That produces a list of parts in which every inserted replacement is closed and the text between them is still open.
Now it just needs to be joined back up again:
string result = String.Concat(parts.Select(p => p.text));
That gives:
alpha bar beta biz delta
As requested.
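The same open/closed idea can be sketched in Python. This is a rough port for illustration, not the author's code; the function name replace_all is made up here, and it assumes that no replacement key is the empty string (str.split rejects an empty separator).

```python
def replace_all(text, replacements):
    """Apply every replacement once; replaced text is never rescanned."""
    parts = [(text, True)]  # (text, is_open) pairs; only open parts may change
    for old, new in replacements.items():
        next_parts = []
        for part_text, is_open in parts:
            if not is_open:
                next_parts.append((part_text, False))  # keep closed parts untouched
                continue
            for i, piece in enumerate(part_text.split(old)):
                if i > 0:
                    next_parts.append((new, False))  # the replacement itself is closed
                next_parts.append((piece, True))
        parts = next_parts
    return "".join(part_text for part_text, _ in parts)

print(replace_all("alpha foo beta bar delta", {"foo": "bar", "bar": "biz"}))
# alpha bar beta biz delta
```

Because a closed part is copied through unchanged, the "bar" produced from "foo" is never turned into "biz" by the second replacement.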

Let's suppose your given string were
str = "Mary had fourteen little lambs"
and the desired replacements were given by the following hash (aka hashmap):
h = { "Mary"=>"Butch", "four"=>"three", "little"=>"wee", "lambs"=>"hippos" }
indicating that we want to replace "Mary" (wherever it appears in the string, if at all) with "Butch", and so on. We therefore want to return the following string:
"Butch had fourteen wee hippos"
Notice that we do not want 'fourteen' to be replaced with 'threeteen' and we want the extra spaces between 'fourteen' and 'wee' to be preserved.
First collect the keys of the hash h into an array (or list):
keys = h.keys
#=> ["Mary", "four", "little", "lambs"]
Most languages have a method or function sub or gsub that works something like the following:
str.gsub(/\w+/) do |word|
  if keys.include?(word)
    h[word]
  else
    word
  end
end
#=> "Butch had fourteen wee hippos"
The regular expression /\w+/ (r'\w+' in Python, for example) matches one or more word characters, as many as possible (i.e., a greedy match). Word characters are letters, digits and the underscore ('_'). It therefore will sequentially match 'Mary', 'had', 'fourteen', 'little' and 'lambs'.
Each matched word is passed to the "block" do |word| ... end and is held by the variable word. The block then computes and returns the string that is to replace the value of word in a duplicate of the original string. Different languages use different structures and formats to do this, of course.
The first word passed to the block by gsub is 'Mary'. The following calculation is then performed:
if keys.include?("Mary") # true
  # so replace "Mary" with:
  h[word] #=> "Butch"
else # not executed
  # not executed
end
Next, gsub passes the word 'had' to the block and assigns that string to the variable word. The following calculation is then performed:
if keys.include?("had") # false
  # not executed
else
  # so replace "had" with:
  "had"
  # that is, leave "had" unchanged
end
Similar calculations are made for each word matched by the regular expression.
We see that punctuation and other non-word characters are not a problem:
str = "Mary, had fourteen little lambs!"
str.gsub(/\w+/) do |word|
  if keys.include?(word)
    h[word]
  else
    word
  end
end
#=> "Butch, had fourteen wee hippos!"
We can see that gsub does not perform replacements sequentially:
h = { "foo"=>"bar", "bar"=>"baz" }
keys = h.keys
#=> ["foo", "bar"]
"foo bar".gsub(/\w+/) do |word|
  if keys.include?(word)
    h[word]
  else
    word
  end
end
#=> "bar baz"
Note that a linear search of keys is required to evaluate
keys.include?("Mary")
This could be relatively time-consuming if keys has many elements.
In most languages this can be sped up by making keys a set (an unordered collection of unique elements). Determining whether a set contains a given element is quite fast, comparable to determining if a hash has a given key.
An alternative formulation is to write
str.gsub(/\b(?:Mary|four|little|lambs)\b/) { |word| h[word] }
#=> "Butch had fourteen wee hippos"
where the regular expression is constructed programmatically from h.keys. This regular expression reads, "match one of the four words indicated, preceded and followed by a word boundary (\b)". The trailing word boundary prevents 'four' from matching 'fourteen'. Since gsub now only considers the replacement of those four words, the block can be simplified to { |word| h[word] }.
Again, this preserves punctuation and extra spaces.
If for some reason we wanted to be able to replace parts of words (e.g., to replace 'fourteen' with 'threeteen'), simply remove the word boundaries from the regular expression:
str.gsub(/Mary|four|little|lambs/) { |word| h[word] }
#=> "Butch had threeteen wee hippos"
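As a sketch of how such a regular expression might be constructed programmatically, here is a Python version of the same two variants. The helper name build_replacer and its whole_words flag are made up for this example, and it assumes every key starts and ends with a word character so that \b behaves as expected.

```python
import re

def build_replacer(mapping, whole_words=True):
    """Return a function that applies every replacement in a single pass."""
    # Try longer keys first so a longer key wins over a key that is its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    pattern = rf"\b(?:{alternation})\b" if whole_words else alternation
    regex = re.compile(pattern)
    return lambda text: regex.sub(lambda m: mapping[m.group()], text)

h = {"Mary": "Butch", "four": "three", "little": "wee", "lambs": "hippos"}
text = "Mary had fourteen little lambs"
print(build_replacer(h)(text))                     # Butch had fourteen wee hippos
print(build_replacer(h, whole_words=False)(text))  # Butch had threeteen wee hippos
```

Escaping each key with re.escape keeps keys that contain regex metacharacters from changing the meaning of the pattern.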
Naturally, different languages provide variations of this approach. In Ruby, for example, one could write:
g = Hash.new { |h,k| k }.merge(h)
This creates a hash g that has the same key-value pairs as h but has the additional property that if g does not have a key k, g[k] (the value of key k) returns k. That allows us to write simply:
str.gsub(/\w+/, g)
#=> "Butch had fourteen wee hippos"
See the second version of String#gsub.
A different approach (which I will show is problematic) is to construct an array (or list) of words from the string, replace those words as appropriate and then rejoin the resulting words to form a string. For example,
words = str.split
#=> ["Mary", "had", "fourteen", "little", "lambs"]
arr = words.map do |word|
  if keys.include?(word)
    h[word]
  else
    word
  end
end
#=> ["Butch", "had", "fourteen", "wee", "hippos"]
arr.join(' ')
#=> "Butch had fourteen wee hippos"
This produces similar results except the extra spaces have been removed.
Now suppose the string contained punctuation:
str = "Mary, had fourteen little lambs!"
words = str.split
#=> ["Mary,", "had", "fourteen", "little", "lambs!"]
arr = words.map do |word|
  if keys.include?(word)
    h[word]
  else
    word
  end
end
#=> ["Mary,", "had", "fourteen", "wee", "lambs!"]
arr.join(' ')
#=> "Mary, had fourteen wee lambs!"
We could deal with punctuation by writing
words = str.scan(/\w+/)
#=> ["Mary", "had", "fourteen", "little", "lambs"]
arr = words.map do |word|
  if keys.include?(word)
    h[word]
  else
    word
  end
end
#=> ["Butch", "had", "fourteen", "wee", "hippos"]
Here str.scan returns an array of all matches of the regular expression /\w+/ (one or more word characters). The obvious problem is that all punctuation has been lost by the time arr.join(' ') is executed.

You can achieve this in a simple way by using regular expressions:
import re
replaces = {'foo' : 'bar', 'alfa' : 'beta', 'bar': 'biz'}
original_string = 'foo bar, alfa foo. bar other.'
expected_string = 'bar biz, beta bar. biz other.'
replaced = re.compile(r'\w+').sub(lambda m: replaces[m.group()] if m.group() in replaces else m.group(), original_string)
assert replaced == expected_string
I haven't checked the performance, but I believe it is probably faster than using "nested for loops".

Swift string strip all characters but numbers and decimal point?

I have this string:
Some text: $ 12.3 9
I want to get as a result:
12.39
I have found examples on how to keep only numbers, but here I am wanting to keep the decimal point "."
What's a good way to do this in Swift?

This should work (it's a general approach to filtering on a set of characters) :
[EDIT] simplified and adjusted to Swift3
[EDIT] adjusted to Swift4
let text = "$ 123 . 34 .876"
let decimals = Set("0123456789.")
var filtered = String( text.filter{decimals.contains($0)} )
If you need to ignore anything past the second decimal point add this :
filtered = filtered.components(separatedBy:".") // separate on decimal point
                   .prefix(2)                    // only keep first two parts
                   .joined(separator:".")        // put parts back together
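For readers who want to check the logic outside Swift, the same two steps can be sketched in Python. The names ALLOWED, keep_number_chars and keep_first_decimal_point are made up for this illustration.

```python
ALLOWED = set("0123456789.")

def keep_number_chars(text):
    """Drop every character except digits and the decimal point."""
    return "".join(ch for ch in text if ch in ALLOWED)

def keep_first_decimal_point(filtered):
    """Keep only the parts before and after the first decimal point."""
    return ".".join(filtered.split(".")[:2])

filtered = keep_number_chars("$ 123 . 34 .876")
print(filtered)                                   # 123.34.876
print(keep_first_decimal_point(filtered))         # 123.34
print(keep_number_chars("Some text: $ 12.3 9"))   # 12.39
```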

Easiest and simplest reusable way: you can use this regex replacement option. This removes all characters except 0 to 9 and the dot (.).
import Foundation

let yourString = "$123. 34"
// pattern matches anything except digits and dot
let pattern = "[^0-9.]"
do {
    let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    // replace all unwanted characters with the empty string ""
    let string_With_Just_Numbers_You_Need = regex.stringByReplacingMatches(in: yourString, options: .withTransparentBounds, range: NSRange(yourString.startIndex..., in: yourString), withTemplate: "")
    // your number converted to Double
    let convertedToDouble = Double(string_With_Just_Numbers_You_Need)
} catch {
    print("Can't convert")
}

One possible solution to the question follows below. If you're working with text fields and currency, however, I suggest you take a look at the thread Leo Dabus linked to.
extension String {
    func filterByString(_ myFilter: String) -> String {
        // keep only the characters that also appear in myFilter
        return String(self.filter { myFilter.contains($0) })
    }
}
let a = "$ 12.3 9"
let myFilter = "0123456789.$"
print(a.filterByString(myFilter)) // $12.39

Specman string: How to split a string to a list of its chars?

I need to split a uint into a list of bits (a list of chars, where every char is "0" or "1", is also OK). The way I try to do it is to convert the uint into a string first, using the binary representation for numeric types, bin(), and then to split it using str_split_all():
var num : uint(bits:4) = 0xF; // Can be any number
print str_split_all(bin(num), "/w");
("/w" is string match pattern that means any char).
The output I expect:
"0"
"b"
"1"
"1"
"1"
"1"
But the actual output is:
0. "0b1111"
Why doesn't it work? Thank you for your help.

If you want to split an integer into a list of bits, you can use the %{...} operator:
var num_bits : list of bit = %{num};
You can find a working example on EDAPlayground.
As an extra clarification to your question, "/w" doesn't mean match any character. The string "/\w/" means match any single character in AWK Syntax. If you put that into your match expression, you'll get (almost) the output you want, but with some extra blanks interleaved (the separators).
Regardless, if you want to split a string into its constituent characters, str_split_all(...) isn't the way to go. It's easier to convert the string into ASCII characters and then convert those back to strings again:
extend sys {
    run() is also {
        var num : uint(bits:4) = 0xF; // Can be any number
        var num_bin : string = bin(num);
        var num_bin_chars := num_bin.as_a(list of byte);
        for each (char) in num_bin_chars {
            var char_as_string : string;
            unpack(packing.low, %{8'b0, char}, char_as_string);
            print char_as_string;
        };
    };
};
The unpack(...) syntax is directly from the e Reference Manual, Section 2.8.3 Type Conversion Between Strings and Scalars or Lists of Scalars
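To make the two target shapes concrete (a list of bits versus a list of characters), here is a small Python sketch. It is not e code and says nothing about how Specman orders bits; to_bits is a made-up helper that returns the most significant bit first, which is an assumption of this example.

```python
def to_bits(value, width):
    """Return the bits of value as a list, most significant bit first."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]

num = 0xF
print(to_bits(num, 4))   # [1, 1, 1, 1]
print(list(bin(num)))    # ['0', 'b', '1', '1', '1', '1']
```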

How to detect a number in my Linked List of Strings, and get the value

I need to sort my linked list. The problem is that each element is a String containing a sentence. So the question is: how do I detect the number in each element and get its value?
I tried to split each element so I can go through its words.
private LinkedList<String> list = new LinkedList<String>();
list.add("Number One: 1");
list.add("Number Three: 3");
list.add("Number two:2");
for (Iterator<String> iterator = list.iterator(); iterator.hasNext(); )
{
    String string = iterator.next();
    for (String word : string.split(" ")) {
    }
}
I also tried "if((word.contains("1") || (word.contains("2")...." inside the for loop, then converting "word" to a Double, but I don't think that is very smart.
So my goal is this output (Number One: 1, Number Two: 2, Number Three: 3), and therefore I need the value of each number first.

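Why not try to parse each word as a number? Java has no TryParse, but Integer.parseInt throws a NumberFormatException when the word is not a number:
for (String word : string.split("[ :]+")) { // split on spaces and colons so "two:2" yields "2"
    try {
        int value = Integer.parseInt(word);
        // do something with value
    } catch (NumberFormatException e) {
        // not a number, skip this word
    }
}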
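The overall goal, sorting the sentences by the number each one contains, can be sketched in Python as follows. This is only an illustration of the extract-then-sort idea; trailing_number is a made-up helper, and it assumes the number of interest is the last run of digits in each sentence.

```python
import re

def trailing_number(sentence):
    """Return the last run of digits in the sentence as an int."""
    numbers = re.findall(r"\d+", sentence)
    if not numbers:
        raise ValueError(f"no number found in {sentence!r}")
    return int(numbers[-1])

items = ["Number One: 1", "Number Three: 3", "Number two:2"]
ordered = sorted(items, key=trailing_number)
print(ordered)  # ['Number One: 1', 'Number two:2', 'Number Three: 3']
```

Using the extracted value as the sort key leaves the original strings untouched, so nothing has to be split or rebuilt.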

How to manipulate Strings in Scala while using the Play framework?

I am using the play framework 2.2.1 and I have a question concerning the manipulation of Strings within view templates. Unfortunately I am not very familiar with the Scala programming language nor its APIs. The strings are contained in a List which is passed from the controller to the view and then I use a loop to process each string before they are added to the html. I would like to know how to do the following: trim, toLowerCase and remove spaces. As an example, if I have "My string ", I would like to produce "mystring". More specifically I would actually like to produce "myString", however I'm sure I can figure that out if someone points me in the right direction. Thanks.
UPDATE:
Fiaz provided a great solution, building on his answer and just for interest sake I came up with the following solution using recursion. This example is of course making many assumptions about the input provided.
@formatName(name: String) = @{
    def inner(list: List[String], first: Boolean): String = {
        if (!list.tail.isEmpty && first) list.head + inner(list.tail, false)
        else if (!list.tail.isEmpty && !first) list.head.capitalize + inner(list.tail, false)
        else if (list.tail.isEmpty && !first) list.head.capitalize
        else list.head
    }
    if (!name.trim.isEmpty) inner(name.split(' ').map(_.toLowerCase).toList, true)
    else ""
}

If you want to know how to do just the trimming, lower-casing and joining without spaces, try this perhaps?
// Given that s is your string
s.split(" ").map(_.toLowerCase).mkString
That splits the string into an array of strings at every space. Repeated or leading spaces produce empty strings in the array, but those disappear when the array is joined, so the result contains no spaces. You then map each element in the array with the function (x => x.toLowerCase) (for which the shorthand is (_.toLowerCase)) and join the array back into a single string using the mkString method that collections have.
So let's say you want to capitalize the first letter of each of the space-split bits:
Scala provides a capitalize method on Strings, so you could use that:
s.split(" ").map(_.toLowerCase.capitalize).mkString
See http://www.scala-lang.org/api/current/scala/collection/immutable/StringOps.html
One suggestion as to how you can get the exact output (your example 'myString') you describe:
(s.split(" ").toList match {
    case fst::rest => fst.toLowerCase :: rest.map(_.toLowerCase.capitalize)
    case Nil => Nil
}).mkString
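The same camel-casing logic, sketched in Python for comparison. The function name to_camel_case is made up for this example; unlike splitting on a single space, Python's argument-free split() also discards leading and repeated whitespace.

```python
def to_camel_case(name):
    """Lower-case the first word, capitalize the rest, and join without spaces."""
    words = name.split()  # no argument: splits on runs of whitespace and trims
    if not words:
        return ""
    first, rest = words[0], words[1:]
    # str.capitalize() upper-cases the first letter and lower-cases the others
    return first.lower() + "".join(word.capitalize() for word in rest)

print(to_camel_case("My string "))  # myString
```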

Here is an example of string manipulation in a template:
@stringFormat(value: String) = @{
    value.replace("'", "\\'")
}
@optionStringFormat(description: Option[String]) = @{
    if (description.isDefined) {
        description.get.replace("'", "\\'").replace("\n", "").replace("\r", "")
    } else {
        ""
    }
}
@for(photo <- photos) {
<div id="photo" class="random" onclick="fadeInPhoto(@photo.id, '@photo.filename', '@stringFormat(photo.title)', '@optionStringFormat(photo.description)', '@byTags');">
This example was obtained from https://github.com/joakim-ribier/play2-scala-gallery
