Modifying a character in a string in Lua - string

Is there any way to replace a character at position N in a string in Lua.
This is what I've come up with so far:
function replace_char(pos, str, r)
return str:sub(pos, pos - 1) .. r .. str:sub(pos + 1, str:len())
end
str = replace_char(2, "aaaaaa", "X")
print(str)
I can't use gsub either as that would replace every capture, not just the capture at position N.

Strings in Lua are immutable. That means, that any solution that replaces text in a string must end up constructing a new string with the desired content. For the specific case of replacing a single character with some other content, you will need to split the original string into a prefix part and a postfix part, and concatenate them back together around the new content.
This variation on your code:
function replace_char(pos, str, r)
return str:sub(1, pos-1) .. r .. str:sub(pos+1)
end
is the most direct translation to straightforward Lua. It is probably fast enough for most purposes. I've fixed the bug that the prefix should be the first pos-1 chars, and taken advantage of the fact that if the last argument to string.sub is missing it is assumed to be -1 which is equivalent to the end of the string.
But do note that it creates a number of temporary strings that will hang around in the string store until garbage collection eats them. The temporaries for the prefix and postfix can't be avoided in any solution. But this also has to create a temporary for the first .. operator to be consumed by the second.
It is possible that one of two alternate approaches could be faster. The first is the solution offered by Paŭlo Ebermann, but with one small tweak:
function replace_char2(pos, str, r)
return ("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1))
end
This uses string.format to do the assembly of the result in the hopes that it can guess the final buffer size without needing extra temporary objects.
But do beware that string.format is likely to have issues with any \0 characters in any string that it passes through its %s format. Specifically, since it is implemented in terms of standard C's sprintf() function, it would be reasonable to expect it to terminate the substituted string at the first occurrence of \0. (Noted by user Delusional Logic in a comment.)
A third alternative that comes to mind is this:
function replace_char3(pos, str, r)
return table.concat{str:sub(1,pos-1), r, str:sub(pos+1)}
end
table.concat efficiently concatenates a list of strings into a final result. It has an optional second argument which is text to insert between the strings, which defaults to "" which suits our purpose here.
My guess is that unless your strings are huge and you do this substitution frequently, you won't see any practical performance differences between these methods. However, I've been surprised before, so profile your application to verify there is a bottleneck, and benchmark potential solutions carefully.

You should use pos inside your function instead of literal 1 and 3, but apart from this it looks good. Since Lua strings are immutable you can't really do much better than this.
Maybe
"%s%s%s":format(str:sub(1,pos-1), r, str:sub(pos+1, str:len())
is more efficient than the .. operator, but I doubt it - if it turns out to be a bottleneck, measure it (and then decide to implement this replacement function in C).

With luajit, you can use the FFI library to cast the string to a list of unsigned charts:
local ffi = require 'ffi'
txt = 'test'
ptr = ffi.cast('uint8_t*', txt)
ptr[1] = string.byte('o')

Related

Reading a comma-separated string (not text file) in Matlab

I want to read a string in Matlab (not an external text file) which has numerical values separated by commas, such as
a = {'1,2,3'}
and I'd like to store it in a vector as numbers. Is there any function which does that? I only find processes and functions used to do that with text files.
I think you're looking for sscanf
A = sscanf(str,formatSpec) reads data from str, converts it according
to the format specified by formatSpec, and returns the results in an
array. str is either a character array or a string scalar.
You can try the str2num function:
vec = str2num('1,2,3')
If you have to use the cell a, per your example, it would be: vec=str2num(a{1})
There are some security warnings in the documentation to consider so be cognizant of how your code is being employed.
Another, more flexible, option is textscan. It can handle strings as well as file handles.
Here's an example:
cellResult = textscan('1,2,3', '%f','delimiter',',');
vec = cellResult{1};
I will use the eval function to "evaluate" the vector. If that is the structure, I will also use the cell2mat to get the '1,2,3' text (this can be approached by other methods too.
% Generate the variable "a" that contains the "vector"
a = {'1,2,3'};
% Generate the vector using the eval function
myVector = eval(['[' cell2mat(a) ']']);
Let me know if this solution works for you

Hash function to see if one string is scrambled form/permutation of another?

I want to check if string A is just a reordered version of string B. For example, "abc" = "bca" = "cab"...
There are other solutions here: https://www.geeksforgeeks.org/check-if-two-strings-are-permutation-of-each-other/
However, I was thinking a hash function would be an easy way of doing this, but the typical hash function takes order into consideration. Are there any hash functions that do not care about character order?
Are there any hash functions that do not care about character order?
I don't know of real-world hash functions that have this property, no. Because this is not a problem they are designed to solve.
However, in this specific case, you can make your own "hash" function (a very very bad one) that will indeed ignore order: just sum ASCII codes of characters. This works due to the commutative property of addition (a + b == b + a)
def isAnagram(self,a,b):
sum_a = 0
sum_b = 0
for c in a:
sum_a += ord(c)
for c in b:
sum_b += ord(c)
return sum_a == sum_b
To reiterate, this is absolutely a hack, that only happens to work because input strings are limited in content in the judge system (only have lowercase ASCII characters and do not contain spaces). It will not work (reliably) on arbitrary strings.
For a fast check you could use a kind af hash-funkction
Candidates are:
xor all characters of a String
add all characters of a String
multiply all characters of a String (be careful might lead to overflow for large Strings)
If the hash-value is equal, it could still be a collision of two not 'equal' strings. So you still need to make a dedicated compare. (e.g. sort the characters of each string before comparing them).

Lua unicode, using string.sub() with two-byted chars

As example: I want remove the first 2 letters from the string "ПРИВЕТ" and "HELLO." one of these are containing only two-byted unicode symbols.
Trying to use string.sub("ПРИВЕТ") and string.sub("HELLO.")
Got "РИВЕТ" and "LLO.".
string.sub() removed 2 BYTES(not chars) from these strings. So i want to know how to get the removing of the chars
Something, like utf8.sub()
The key standard function for this task is utf8.offset(s,n), which gives the position in bytes of the start of the n-th character of s.
So try this:
print(string.sub(s,utf8.offset(s,3),-1))
You can define utf8.sub as follows:
function utf8.sub(s,i,j)
i=utf8.offset(s,i)
j=utf8.offset(s,j+1)-1
return string.sub(s,i,j)
end
(This code only works for positive j. See http://lua-users.org/lists/lua-l/2014-04/msg00590.html for the general case.)
There is https://github.com/Stepets/utf8.lua, a pure lua library, which expand standard function to support utf8 string.
I found a simpler solution (the the solution using offset() didnt work for me for all cases):
function utf8.sub(s, i, j)
return utf8.char(utf8.codepoint(s, i, j))
end

String matching without using builtin functions

I want to search for a query (a string) in a subject (another string).
The query may appear in whole or in parts, but will not be rearranged. For instance, if the query is 'da', and the subject is 'dura', it is still a match.
I am not allowed to use string functions like strfind or find.
The constraints make this actually quite straightforward with a single loop. Imagine you have two indices initially pointing at the first character of both strings, now compare them - if they don't match, increment the subject index and try again. If they do, increment both. If you've reached the end of the query at that point, you've found it. The actual implementation should be simple enough, and I don't want to do all the work for you ;)
If this is homework, I suggest you look at the explanation which precedes the code and then try for yourself, before looking at the actual code.
The code below looks for all occurrences of chars of the query string within the subject string (variables m; and related ii, jj). It then tests all possible orders of those occurrences (variable test). An order is "acceptable" if it contains all desired chars (cond1) in increasing positions (cond2). The result (variable result) is affirmative if there is at least one acceptable order.
subject = 'this is a test string';
query = 'ten';
m = bsxfun(#eq, subject.', query);
%'// m: test if each char of query equals each char of subject
[ii jj] = find(m);
jj = jj.'; %'// ii: which char of query is found within subject...
ii = ii.'; %'// jj: ... and at which position
test = nchoosek(1:numel(jj),numel(query)).'; %'// test all possible orders
cond1 = all(jj(test) == repmat((1:numel(query)).',1,size(test,2)));
%'// cond1: for each order, are all chars of query found in subject?
cond2 = all(diff(ii(test))>0);
%// cond2: for each order, are the found chars in increasing positions?
result = any(cond1 & cond2); %// final result: 1 or 0
The code could be improved by using a better approach as regards to test, i.e. not testing all possible orders given by nchoosek.
Matlab allows you to view the source of built-in functions, so you could always try reading the code to see how the Matlab developers did it (although it will probably be very complex). (thanks Luis for the correction)
Finding a string in another string is a basic computer science problem. You can read up on it in any number of resources, such as Wikipedia.
Your requirement of non-rearranging partial matches recalls the bioinformatics problem of mapping splice variants to a genomic sequence.
You may solve your problem by using a sequence alignment algorithm such as Smith-Waterman, modified to work with all English characters and not just DNA bases.
Is this question actually from bioinformatics? If so, you should tag it as such.

Basics of Strings

Ok, i've always kind of known that computers treat strings as a series of numbers under the covers, but i never really looked at the details of how it works. What sort of magic is going on in the average compiler/processor when we do, for instance, the following?
string myString = "foo";
myString += "bar";
print(myString) //replace with printing function of your choice
The answer is completely dependent on the language in question. But C is usually a good language to kind of see how things happen behind the scenes.
In C:
In C strings are array of char with a 0 at the end:
char str[1024];
strcpy(str, "hello ");
strcpy(str, "world!");
Behind the scenes str[0] == 'h' (which has an int value), str[1] == 'e', ...
str[11] == '!', str[12] == '\0';
A char is simply a number which can contain one of 256 values. Each character has a numeric value.
In C++:
strings are supported in the same way as C but you also have a string type which is part of STL.
string literals are part of static storage and cannot be changed directly unless you want undefined behavior.
It's implementation dependent how the string type actually works behind the scenes, but the string objects themselves are mutable.
In C#:
strings are immutable. Which means you can't directly change a string once it's created. When you do += what happen is a new string gets created and your string now references that new string.
The implementation varies between language and compiler of course, but typically for C it's something like the following. Note that strings are essentially syntactical sugar for char arrays (char[]) in C.
1.
string myString = "foo";
Allocate 3 bytes of memory for the array and set the value of the 1st byte to 'f' (its ASCII code rather), the 2nd byte to 'o', the 2rd byte to 'o'.
2.
foo += "bar";
Read existing string (char array) from memory pointed to by foo.
Allocate 6 bytes of memory, fill the first 3 bytes with the read contents of foo, and the next 3 bytes with b, a, and r.
3.
print(foo)
Read the string foo now points to from memory, and print it to the screen.
This is a pretty rough overview, but hopefully should give you the general idea.
Side note: In some languages/compuilers, char != byte - for example, C#, where strings are stored in Unicode format by default, and notably the length of the string is also stored in memory. C++ typically uses null-terminated strings, which solves the problem in another way, though it means determining its length is O(n) rather than O(1).
Its very language dependent. However, in most cases strings are immutable, so doing that is going to allocate a new string and release the old one's memory.
I'm assuming a typo in your sample and that there is only one variable called either foo or myString, not two variables?
I'd say that it'll depend a lot on what compiler you're using. In .Net strings are immutable so when you add "bar" you're not actually adding it but rather creating a new string containing "foobar" and telling it to put that in your variable.
In other languages it will work differently.

Resources