Swift: Test boundary of String.Index for substring function

Wow. Swift makes it really fiddly to copy a substring from a simple String.
Most programming languages allow characters to be indexed by their integer position in the string, so targeting a character or a range is a matter of simple maths. Because Swift strings can hold characters of varying encoded size, a String.Index for each character has to be found first, based on its offset from the start or end of the string. These indexes can then be passed to a String method that returns the substring in that range. I've written a function to do the work:
//arguments: The parent string, number of chars from 1st char in it and total char length of substring
func subStr(str: String, c1: Int, c2: Int) -> String {
    //get string indexes for range of substring based on args
    let ind1 = str.startIndex.advancedBy(c1)
    let ind2 = str.startIndex.advancedBy(c1+c2)
    //calls substring function with a range object passed as an argument set to the index values
    let sub = str.substringWithRange(Range<String.Index>(start: ind1, end: ind2))
    //substring returned
    return sub
}
The problem is that because the substringWithRange function only works with Range objects, it's not obvious how to check whether the substring is out of bounds of the string. For example, calling this function would produce an error:
subStr ("Stack Overflow", c1: 6, c2: 12)
The c1 argument is OK, but the substring length (c2) exceeds the upper boundary of the parent string, causing a fatal error.
How do I refine this function to make it handle bad arguments or otherwise refine this clunky process?
Many thanks for reading.

You can use
let ind1 = str.startIndex.advancedBy(c1, limit: str.endIndex)
let ind2 = str.startIndex.advancedBy(c1+c2, limit: str.endIndex)
to advance the start index by the given amounts, but not beyond the end index of the string. With that modification, your function gives
subStr ("Stack Overflow", c1: 6, c2: 12) // "Overflow"
subStr ("Stack Overflow", c1: 12, c2: 20) // "ow"
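For readers who want to see the clamping idea outside Swift, here is a minimal sketch in Python. The name sub_str and the ValueError for negative arguments are my own assumptions, not part of the answer above; the two min() calls play the role of the limit: argument.

```python
def sub_str(s, c1, c2):
    """Return up to c2 characters of s starting at offset c1, clamped to the end of s."""
    if c1 < 0 or c2 < 0:
        # Assumption: negative arguments are treated as a caller bug
        raise ValueError("offset and length must be non-negative")
    start = min(c1, len(s))      # like advancedBy(c1, limit: str.endIndex)
    end = min(c1 + c2, len(s))   # like advancedBy(c1 + c2, limit: str.endIndex)
    return s[start:end]


print(sub_str("Stack Overflow", 6, 12))  # Overflow
```

An offset past the end simply yields an empty string, because both indexes are clamped to the same position.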

I quickly made a nice substring function for Strings without Foundation dependency:
extension String {
    func substring(from: Int, length: Int) -> String {
        return String(dropFirst(from).prefix(length))
    }
}
If the length is bigger than possible, it just gives all the characters to the end. It can't crash as long as neither argument is negative (it makes sense to crash if any argument is negative, since that would be a major flaw in your code).
Example usage:
"Stack Overflow".substring(from: 6, length: 12)
gives
"Overflow"

Related

Kotlin - Way to get a substring starting from a specified index until another specified index or end of string?

Example
val string = "Large mountain"
I would like to get a substring starting from the index of the "t" character until index of "t" + 7 with the 7 being arbitrary or end of string.
val substring = "tain"
Assuming that the string is larger
val string2 = "Large mountain and lake"
I would like to return
val substring2 = "tain an"
If I were to try substring(indexOf("t"), indexOf("t") + 7) on "Large mountain", I would get an index out of bounds exception.
I don't think there's an especially elegant way to do this.
One fairly short and readable way is:
val substring = string.drop(string.indexOf('t')).take(7)
This uses indexOf() to locate the first 't' in the string, and then drop() to drop all the previous characters, and take() to take (up to) 7 characters from there.
However, it creates a couple of temporary strings, and will give an IllegalArgumentException if there's no 't' in the string.
Improving robustness and efficiency takes more code, e.g.:
import kotlin.math.min

val substring = string.indexOf('t').let {
    if (it >= 0)
        string.substring(it, min(it + 7, string.length))
    else
        string
}
That version lets you control the result when there's no 't' (in the else branch); it also avoids creating any temporary objects. As before, it uses indexOf() to locate the first 't', but then min() to work out how long the substring can be, and substring() to generate it in one go.
If you were doing this a lot, you could of course put it into your own function, e.g.:
fun String.substringFrom(char: Char, maxLen: Int)
    = indexOf(char).let {
        if (it >= 0)
            substring(it, min(it + maxLen, length))
        else
            this
    }
which you could then call with e.g. "Large mountain".substringFrom('t', 7)
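The same "find the start, then clamp the end" logic can be sketched in Python for comparison. The function name substring_from is hypothetical, and returning the whole string when the character is missing just mirrors the else branch above; pick whatever fallback suits you.

```python
def substring_from(s, ch, max_len):
    """Return up to max_len characters of s, starting at the first occurrence of ch."""
    start = s.find(ch)  # -1 when ch is absent, like Kotlin's indexOf
    if start < 0:
        return s        # assumed fallback, same as the else branch above
    # Clamp the end index so it never runs past the end of the string
    return s[start:min(start + max_len, len(s))]


print(substring_from("Large mountain", "t", 7))  # tain
```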

convert string to list of int in kotlin

I have a string "1337" and I want to convert it to a list of Int. I tried taking each character and converting it with string[0].toInt(), but instead of the digit I get its ASCII value. I can do it with Character.getNumericValue(number), but how do I do it without using a built-in function, and with good complexity?
What do you mean "without using a built in function"?
string[0].toInt gives you the ASCII value of the character because the fun get(index: Int) on String has a return type of Char, and a Char behaves closer to a Number than a String. "0".toInt() == 0 will yield true, but '0'.toInt() == 0 will yield false. The difference being the first one is a string and the second is a character.
A oneliner
string.split("").filterNot { it.isBlank() }.map { it.toInt() }
Explanation: split("") will take the string and give you a list of every character as a string. However, it will also give you an empty string at the beginning and at the end, which is why we have filterNot { it.isBlank() }. We can then use map to transform every string in our list to an Int.
If you want something less functional and more imperative, which doesn't use any conversion functions, there is this:
val ints = mutableListOf<Int>() //make a list to store the values in
for (c: Char in "1234") { //go through all of the characters in the string
val numericValue = c - '0' //subtract the character '0' from the character we are looking at
ints.add(numericValue) //add the Int to the list
}
The reason why c - '0' works is because the ASCII values for the digits are all in numerical order starting with 0, and when we subtract one character from another, we get the difference between their ASCII values.
This will give you some funky results if you give it a string that doesn't have only digits in it, but it will not throw any exceptions.
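To make the character arithmetic concrete, here is a small Python sketch of the same trick. Python has no Char type, so ord() stands in for reading the character code; digits_of is a made-up name for illustration.

```python
def digits_of(text):
    """Convert each character to an int by subtracting the code of '0'."""
    zero = ord("0")  # 48 in ASCII
    return [ord(c) - zero for c in text]


print(digits_of("1337"))  # [1, 3, 3, 7]
print(digits_of("a"))     # [49] - a non-digit gives a "funky" value, no exception
```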
As in Java, converting a Char to an Int gives you its ASCII code.
You can instead:
val values = "1337".map { it.toString().toInt() }
println(values[0]) // 1
println(values[1]) // 3
// ...
Maybe like this? Non-digits are filtered out. The digits are then converted into integers:
val string = "1337"
val xs = string.filter{ it.isDigit() }.map{ it.digitToInt() }
Requires Kotlin 1.4.30 or higher and this option:
@OptIn(ExperimentalStdlibApi::class)

find number of repeating substrings in a string

I am looking for an algorithm that will find the number of repeating substrings in a single string.
For this, I was looking for some dynamic programming algorithms but didn't find any that would help me. I just want some tutorial on how to do this.
Let's say I have a string ABCDABCDABCD. The expected output for this would be 3, because there is ABCD 3 times.
For input AAAA, output would be 4, since A is repeated 4 times.
For input ASDF, output would be 1, since every individual character is repeated 1 time only.
I hope that someone can point me in the right direction. Thank you.
I am taking the following assumptions:
The repeating substrings must be consecutive. That is, in case of ABCDABC, ABC would not count as a repeating substring, but it would in case of ABCABC.
The repeating substrings must be non-overlapping. That is, in case of ABABA, ABA would not count as a repeating substring, because its two occurrences share the middle A.
In case of multiple possible answers, we want the one with the maximum value. That is, in the case of AAAA, the answer should be 4 (A is the substring) rather than 2 (AA is the substring).
Under these assumptions, the algorithm is as follows:
Let the input string be denoted as inputString.
Calculate the KMP failure function array for the input string. Let this array be denoted as failure[]. This operation is of linear time complexity with respect to the length of the string. By definition, failure[i] denotes the length of the longest proper prefix of the substring inputString[0....i] that is also a proper suffix of the same substring.
Let len = inputString.length - failure.lastIndexValue. At this point, we know that if there is any repeating string at all, then it has to be of this length len. But we'll need to check for that. First, check whether len perfectly divides inputString.length (that is, inputString.length % len == 0). If yes, then check whether every consecutive (non-overlapping) substring of len characters is the same; this operation is again of linear time complexity with respect to the length of the input string.
If it turns out that every consecutive non-overlapping substring is the same, then the answer is inputString.length / len. Otherwise, the answer is simply 1: no shorter repeating substring is present, so the whole string counts as a single occurrence (as with ASDF in the question).
The overall time complexity would be O(n), where n is the number of characters in the input string.
A sample code for calculating the KMP failure array is given here.
For example,
Let the input string be abcaabcaabca.
Its KMP failure array would be - [0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8].
So, our len = (12 - 8) = 4.
And every consecutive non-overlapping substring of length 4 is the same (abca).
Therefore the answer is 12/4 = 3. That is, abca occurs 3 times in a row.
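The steps above can be sketched in Python roughly as follows. The names failure_array and repeat_count are my own, and returning 1 when no shorter repeating unit exists is an assumption taken from the question's ASDF example. Treat it as an illustration rather than a reference implementation.

```python
def failure_array(s):
    """KMP failure function: failure[i] is the length of the longest proper
    prefix of s[0..i] that is also a suffix of s[0..i]."""
    failure = [0] * len(s)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = failure[k - 1]  # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        failure[i] = k
    return failure


def repeat_count(s):
    """Number of times the shortest repeating unit occurs back to back in s."""
    n = len(s)
    if n == 0:
        return 0
    unit = n - failure_array(s)[-1]  # candidate length of the repeating unit
    if n % unit != 0:
        return 1  # assumption: a non-periodic string counts as one occurrence
    # Verify every consecutive chunk of length `unit` equals the first one
    if any(s[i:i + unit] != s[:unit] for i in range(0, n, unit)):
        return 1
    return n // unit


print(failure_array("abcaabcaabca"))  # [0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8]
print(repeat_count("abcaabcaabca"))   # 3
```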
A solution in C#:
using System;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    public static string CountOfRepeatedSubstring(string str)
    {
        if (str.Length < 2)
        {
            return "-1";
        }
        StringBuilder substr = new StringBuilder();
        // The repeating unit cannot be longer than half of the whole string
        for (int i = 0; i < str.Length / 2; i++)
        {
            // Grow the candidate unit by one character per iteration
            substr.Append(str[i]);
            // Remove every occurrence of the candidate from the whole string;
            // if nothing is left, the string is made only of that candidate
            string clearedOfNewSubstrings = str.Replace(substr.ToString(), "");
            if (clearedOfNewSubstrings.Length == 0)
            {
                // Count the occurrences; Regex.Escape keeps characters such as '.' or '*' literal
                var countOccurrences = Regex.Matches(str, Regex.Escape(substr.ToString())).Count;
                return countOccurrences.ToString();
            }
        }
        return "-1";
    }

    static void Main(string[] args)
    {
        // Input: "abcdaabcdaabcda"  Output: 3
        // Input: "barrybarrybarry"  Output: 3
        var s = "asdf"; // Output will be -1
        Console.WriteLine(CountOfRepeatedSubstring(s));
    }
}
How do you want to specify the "repeating string"? Is it simply the first group of characters up until either a) the first character is found again, b) the pattern begins to repeat, or c) some other criteria?
So, if your string is "ABBAABBA", is that a 2 because "ABBA" repeats twice or is it 1 because you have "ABB" followed by "AAB"? What about "ABCDABCE" -- does "ABC" count (despite the "D" in between repetitions?) In "ABCDABCABCDABC", is the repeating string "ABCD" (1) or "ABCDABC" (2)?
What about "AAABBAAABB" -- is that 3 ("AAA") or 2 ("AAABB")?
If the end of the repeating string is another instance of the first letter, it's pretty simple:
Work your way through the string character by character, putting each character into another variable as you go, until the next character matches the first one. Then, given the length of the substring in your second variable, check the next bit of your string to see if it matches. Continue until it doesn't match or you hit the end of the string.
If you just want to find any length pattern that repeats regardless of whether the first character is repeated within the pattern, it gets more complicated (but, fortunately, it's the sort of thing computers are good at).
You'll need to go character by character building a pattern in another variable as above, but you'll also have to watch for the first character to reappear and start building a second substring as you go, to see if it matches the first. This should probably go in an array as you might encounter a third (or more) instance of the first character which would trigger the need to track yet another possible match.
It's not difficult but there is a lot to keep track of and it's a rather annoying problem. Is there a particular reason you're doing this?

Always find the middle of a string with elements grouped by 3 characters

I have a series of strings that represent airline itineraries:
FLROTP
MADFCOFCOFLR
BLQMADMADUIOUIOMADMADBLQ
MXPJFKJFKMCOJFKMXP
WAWPSAPSAWAW
FLRFRAFRASGNSGNBKKBKKVIEVIEFLR
FLRMUCMUCDELDXBDXBZRHZRHFLR
FLRFRAFRASINSINMELMELSINSINFRAFRAFLR
FLRCDGCDGCANCANJJNZHACANCANCDGCDGFLRWNZCANCANZHAHKGAMSFLR
JFKMTYMTYMEXMEXPTYMDEMDEBOGBOGLIM
PSAISTISTICNICNNRTNRTISTISTPSANRTISTISTPSA
MXPDXBDXBPERPERADLADLMELMELASPASPAYQAYQASPASPSYDSYDDXBDXBMXP
FLRFRAFRAORDORDLASLASBNACLTCLTMUCMUCPSA
FLRCDGCDGBOGBOGBAQBAQBOGBOGCUCCUCBOGBOGMDEMDEBOGBOGUIOGYELIMLIMHAVHAVCDGCDGFLR
FLRFRAFRALAXLAXSEASEAORDORDICTICTORDORDCMHCMHBOSBOSMIAMIAFRAFRAFLR
PSAMUCMUCIADIADGSOGSOCLTCLTMIAMIADFWDFWICTICTDFWDFWCMHCMHPHLPHLALBALBIADIADFRAFRAFLR
FLRFRAFRAEZEEZESCLSCLGRUCGHSDUSDUPOAPOAGRUGRULIMLIMUIOUIOBOGBOGPTYPTYPOSPOSMIAMIAFRAFRAFLR
PSACDGCDGHAVHAVPTYPTYUIOUIOMDEMDEBOGBOGBAQBAQBOGBOGCUCCUCBOGBOGCDGCDGFLR
FLRCDGCDGMEXMEXSJOSJOMEXBJXBJXMEXMEXCDGCDGPSA
I'd like to always be able to find the "middle" of the string (which in 90% of cases is the passenger's destination), but I'm short of ideas. Any help? :)
What you want is not the index at the exact middle of the string, but the closest index to the middle that is a multiple of 3, to index the start of a valid 3-letter code.
You didn't specify a language so I'll just use C++ to illustrate.
std::string code = "MXPJFKJFKMCOJFKMXP";
Find the length of the string:
int length = code.size();
Count how many codes you have:
int codecount = length / 3;
Find the middle code, using integer arithmetic (rounding down), with the codes numbered from zero:
int middlecode = codecount / 2;
Find the start index of your middle code:
int index = middlecode * 3;
Get the middle code:
std::string destination = code.substr(index, 3);
For strings with an even number of codes, this will give the first code in the second half of the string, e.g. MCO in:
MXPJFKJFKMCOJFKMXP
For strings with an odd number of codes, this will give the middle code, e.g. LAS in:
FLRFRAFRAORDORDLASLASBNACLTCLTMUCMUCPSA
(which in the above case looks wrong, but you did say only 90%!)
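Put together as one function, the steps might look like this in Python. The name middle_code and the width parameter are assumptions for illustration; the arithmetic is the same integer division as in the C++ snippets.

```python
def middle_code(itinerary, width=3):
    """Return the middle fixed-width code of an itinerary string."""
    code_count = len(itinerary) // width  # how many 3-letter codes there are
    middle = code_count // 2              # zero-based middle code, rounding down
    start = middle * width                # index where that code begins
    return itinerary[start:start + width]


print(middle_code("MXPJFKJFKMCOJFKMXP"))  # MCO
print(middle_code("FLROTP"))              # OTP
```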

How can I modify individual characters in a String using Swift?

var str: String = "sometext"
for i in str.characters.indices
{
    str[i] = "c"
}
print(str)
I'm getting the following error:
error: cannot assign through subscript: subscript is get-only
You are getting this error because the subscript of a Swift String is get-only, as the error message says.
This is different from an Array.
Array:
array[0] ✅
array[0] = 0 ✅
String:
str[0] ❌
str[0] = "0" ❌
str[str.startIndex.advancedBy(0)] ✅
Use replaceRange for accomplishing your task.
Example:
var value = "green red blue"
value.replaceRange(value.startIndex.advancedBy(6)..<value.startIndex.advancedBy(6 + 3), with: "yellow")
print(value)
Result:
green yellow blue
Also have a look at this superb blog article by Ole Begemann, which explains in detail how Swift Strings work. You will also find the answer to why you can't use integer subscripts on Swift Strings.
Because of the way Swift strings are stored, the String type does not support random access to its Characters via an integer index — there is no direct equivalent to NSStringʼs characterAtIndex: method. Conceptually, a String can be seen as a doubly linked list of characters rather than an array.
In some cases it may be preferable to convert the String to an Array, mutate, then convert back to a String, e.g.:
var chars = Array("sometext".characters)
for i in 0..<chars.count {
    chars[i] = "c"
}
let string = String(chars)
Advantages include:
clarity
better performance on large strings: O(1) time for making each replacement in Array vs O(N) time for making each replacement in String.
Disadvantages include:
higher memory consumption: O(N) for Array vs O(1) for String.
Pick your poison :)
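The convert, mutate, convert back pattern is not specific to Swift. As a rough illustration, here is the same idea in Python, where strings are immutable too; replace_all is a hypothetical helper name.

```python
def replace_all(text, replacement):
    """Overwrite every character of text by going through a mutable list."""
    chars = list(text)            # O(N) extra memory, one element per character
    for i in range(len(chars)):
        chars[i] = replacement    # O(1) per replacement in the list
    return "".join(chars)         # build the final string once


print(replace_all("sometext", "c"))  # cccccccc
```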

Resources