Why Swift strings cannot be indexed by integer values

I learned that Swift strings cannot be indexed by integer values. I remember the rule and follow it, but I have never fully understood the mechanics behind it.
The explanation from the official documentation is as follows:
"Different characters can require different amounts of memory to store, so in order to determine which Character is at a particular position, you must iterate over each Unicode scalar from the start or end of that String. For this reason, Swift strings cannot be indexed by integer values"
I've read it several times, but I still don't quite get the point. Can someone explain in a bit more detail why a Swift String cannot be indexed by integer values?
Many Thanks

A string is stored in memory as an array of bytes.
In a variable-width encoding such as UTF-8, a given character can require 1 to 4 bytes for the base code point, plus any number of combining diacritical marks.
For example, é requires 2 bytes in UTF-8 (as the single code point U+00E9).
Now take the strings efgh and éfgh. To access the second character (f), you look at index 1 of the byte array for the first string, but at index 2 for the second string.
To know that, you need to inspect the first character. More generally, to access any character by its index, you have to go through all the preceding characters to find out how many bytes each one takes.
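The walk described above can be sketched in a few lines. This is only an illustration in Python, not Swift's actual implementation: the helper name byte_offset_of_char is made up for this example, it assumes UTF-8 storage, and it counts code points rather than full Swift Characters (grapheme clusters), which would need even more inspection.

```python
def byte_offset_of_char(text: str, char_index: int) -> int:
    """Return the UTF-8 byte offset of the code point at char_index.

    Hypothetical helper: it has to visit every preceding code point,
    because each one may occupy 1 to 4 bytes.
    """
    offset = 0
    for ch in text[:char_index]:
        offset += len(ch.encode("utf-8"))  # width of this code point in bytes
    return offset


plain = "efgh"
accented = "\u00e9fgh"  # "éfgh" with a precomposed é (2 bytes in UTF-8)

offset_plain = byte_offset_of_char(plain, 1)        # position of "f" in "efgh"
offset_accented = byte_offset_of_char(accented, 1)  # position of "f" in "éfgh"
```

The cost of the lookup grows with char_index, which is why an integer subscript cannot be a constant-time operation on this kind of storage.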

Related

Fixed vs variable length STRING field definitions in ECL

I have a question regarding STRING field definitions.
Am I better off fully qualifying my STRING fields (giving them a fixed length) or allowing them to be variable length?
For example I am working with a data file which contains multiple string data elements which can be up to 1000 characters in length.
When I define the ECL fields as STRING1000 the strings are padded and difficult to view in ECL Watch.
If I define the ECL fields simply as STRING, the string fields are adjusted to the length of the field value and much easier to read in ECL Watch.
With regard to my question, does either option affect the size of my dataset in memory or on disk?
What is the best practice I should follow?
The standard answer to this question is:
If you know the string will always contain exactly n characters (like a US state code or zip code field), or it will always contain 1 to n characters where n is small and the average length of the actual data approaches the maximum (like most street address fields), then define the field as STRINGn. Otherwise, if n is large and the average length of the data is small compared to the maximum, a variable-length STRING is the better choice.
Both options affect the storage and memory size:
Fixed-length fields are always stored at their defined length.
Variable-length STRING fields are stored with a leading 4-byte integer indicating the actual number of characters that follow in that instance (like a Pascal string).
Therefore, if you define a string field that always contains 2 characters as a STRING2 it occupies two bytes of storage, but define it as a STRING and it will occupy six.
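The storage arithmetic above can be checked with a small calculation. This is a sketch in Python rather than ECL; the function names are invented for the example, and it assumes one byte per character and the 4-byte length prefix described in the answer.

```python
LENGTH_PREFIX_BYTES = 4  # size of the leading length field, per the answer above


def fixed_field_bytes(declared_length: int) -> int:
    """A STRINGn field always occupies its declared length."""
    return declared_length


def variable_field_bytes(value: str) -> int:
    """A variable-length STRING occupies a 4-byte prefix plus the data itself."""
    return LENGTH_PREFIX_BYTES + len(value)


state_fixed = fixed_field_bytes(2)            # STRING2 holding "NY"
state_variable = variable_field_bytes("NY")   # STRING holding "NY"

note = "short note"
note_fixed = fixed_field_bytes(1000)          # STRING1000 holding a 10-character value
note_variable = variable_field_bytes(note)    # STRING holding the same value
```

For the two-character field the fixed definition is smaller, while for the mostly empty 1000-character field the variable-length definition is far smaller, which matches the rule of thumb given above.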

Convert string into fixed length numbers and convert it back

I have more than 100 cpp files. I need to assign a unique ID to each of them, and I also need to be able to tell which file it is from its ID. The longest file name contains 64 characters, and the ID can be at most 8 bytes long. Is there an algorithm that can assign a unique ID to each source file in VS2013 in C++ and also let the user know which file it is based on the ID?
Just store a mapping between filename and an integer.
-----Yes, this way is very simple. But every time people create new source files, the mapping needs to be re-coded. So I won't use this approach.
HERE IS THE ORIGINAL QUESTION SO THAT THE COMMENTS BELOW MAKE SENSE
Now I have a bunch of strings, like "AAA" or "ABBCCHH". The longest string contains 64 characters. I need an algorithm that can convert a string into a number (it does not have to be an integer; a double or float is also acceptable), but the length of the number must be fixed. For example, if "A" is converted into 12312 (5 digits), "ABBHGGH" should also have 5 digits after conversion. These numbers must also be convertible back to the original strings. Is there an algorithm that can do that? The converted number cannot exceed 8 bytes, which is why I cannot just use a simple scheme such as the ASCII codes. I don't know which algorithm can do that.
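To generate unique IDs for an arbitrary set of filenames (the actual question here), you could use a cryptographic hash (SHA-1, -256, -384, -512). This produces a fixed-length hexadecimal output that is unique for all practical purposes (collisions are possible in principle but extremely unlikely). Note that these digests are 20 to 64 bytes long, so to fit the 8-byte limit you would have to truncate them, which makes collisions more likely, though still very improbable for a few hundred files. If you can't allow the characters a-f in the output, you can convert the hexadecimal value to decimal.
This process is not reversible, but you can maintain a map (lookup table) between the input values and the IDs.
If you want a simpler solution, just hexadecimal-encode the filenames. This is reversible, but the output is twice as long as the filename, so it will not fit in 8 bytes. (You can add the hex -> decimal conversion here if necessary as well.)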
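Both suggestions can be sketched briefly. The example is in Python rather than C++, and everything in it is an assumption made for illustration: the function names, the choice of SHA-256, and the truncation to the first 8 bytes of the digest. The lookup table is rebuilt from the current file list, so it does not need to be hand-edited when files are added.

```python
import hashlib

ID_BYTES = 8  # the size limit stated in the question


def file_id(filename: str) -> str:
    """Return a fixed-length ID: the first 8 bytes of SHA-256, as 16 hex digits."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return digest[:ID_BYTES].hex()


def build_lookup(filenames):
    """Map each ID back to its filename, refusing to continue on a collision."""
    table = {}
    for name in filenames:
        fid = file_id(name)
        if fid in table and table[fid] != name:
            raise ValueError(f"ID collision between {table[fid]!r} and {name!r}")
        table[fid] = name
    return table


def hex_encode(filename: str) -> str:
    """Reversible alternative: two hex digits per byte of the name."""
    return filename.encode("utf-8").hex()


def hex_decode(encoded: str) -> str:
    return bytes.fromhex(encoded).decode("utf-8")


lookup = build_lookup(["main.cpp", "util.cpp", "parser.cpp"])
recovered = lookup[file_id("util.cpp")]
round_trip = hex_decode(hex_encode("main.cpp"))
```

The hash route gives short IDs but depends on the table to get back to a name; the hex route needs no table but produces IDs twice as long as the names.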

Similar String Comparison Algorithm

Got this question in a recent interview: a basic string comparison with a little twist. I have an input string, STR1 = 'ABC'. I should return "Same/Similar" when the string to compare, STR2, has any one of these values: 'ACB', 'BAC', 'ABC', 'BCA', 'CAB', 'CBA' (that is, the same characters, the same length and the same number of occurrences). The only answer that struck me at that moment was to sort both strings with merge sort or quicksort, since their typical complexity is O(n log n). Is there a better algorithm to achieve the above result?
Sorting both, and comparing the results for equality, is not a bad approach for strings of reasonable lengths.
Another approach is to use a map/dictionary/object (depending on language) from character to number of occurrences. After checking that the two strings have the same length, you iterate over the first string, incrementing the counts, and then iterate over the second string, decrementing them. You can return false as soon as a count goes negative; if none does, the strings match.
And if your set of possible characters is small enough to be considered constant, you can use an array as the "map", resulting in O(n) worst-case complexity.
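A minimal sketch of the increment/decrement idea, written in Python with a plain dictionary as the map. The function name is_similar is invented for this example, and the length check up front is what makes "no negative count" sufficient for a match.

```python
def is_similar(first: str, second: str) -> bool:
    """Return True when both strings contain the same characters the same number of times."""
    if len(first) != len(second):
        return False

    counts = {}
    for ch in first:
        counts[ch] = counts.get(ch, 0) + 1  # count up over the first string

    for ch in second:
        remaining = counts.get(ch, 0) - 1   # count down over the second string
        if remaining < 0:
            return False                    # second has more of ch than first
        counts[ch] = remaining

    return True


same = is_similar("ABC", "CAB")
different = is_similar("ABC", "ABD")
repeated = is_similar("AABB", "ABAB")
```

Each string is scanned once, so the running time is linear in the string length, assuming constant-time dictionary operations.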
Supposing you can use any language, I would opt for a Python dictionary solution. You could build two dictionaries, one per string, keyed by character with the number of occurrences as the value. Then you can compare the dictionaries and return the respective result. This also works for strings with characters that appear more than once.
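One way to write the two-dictionary version is with collections.Counter from the standard library, which is a dictionary subclass that maps each element to its count. The function name below is made up for the example.

```python
from collections import Counter


def same_letters(first: str, second: str) -> bool:
    """Build one character-count dictionary per string and compare them."""
    return Counter(first) == Counter(second)


verdicts = {candidate: same_letters("ABC", candidate)
            for candidate in ["ACB", "BAC", "ABC", "BCA", "CAB", "CBA", "ABB"]}
```

Two Counter objects compare equal only when every character has the same count in both, so repeated characters are handled without extra code.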

Comparing strings in MIPS assembly

I have a bunch of strings in an array that I have defined in the data segment. If I were to take 2 of the strings from the array, is it possible to compare them to see which has a greater value in mips? How would I do this? Basically, I'm looking to rearrange the strings based on alphabetical order.
EDIT: This is less of me trying to get help with a specific problem, and more of just a general question that will help me with my approach to the code. Thanks!
If it were me, I'd create a list of pointers to the strings, that is, a list of the addresses of each string. Then you'd write a subroutine that compares two strings given their pointers. Then, when you need to swap the strings, you simply swap the pointers.
You want to avoid swapping the strings themselves, since they may well be tightly packed, so you'd have to do a lot of shifting to move the holes in memory around. Pointers are simple to swap. You could swap strings more easily if they were all stored in fixed-length slots (padding the shorter ones), because then you wouldn't have to worry about moving the memory holes around.
But sorting the pointer list is really the hot tip.
To compare strings, the simplest way is to iterate over the two strings in parallel and subtract the second string's character from the first's. If the result is 0, the characters are equal and you move on to the next pair. If the result is < 0, the first string comes before the second; if it is > 0, the second string comes first and you would swap them. If you run out of either string before the other, and they're equal all the way to that point, the shorter string is less than the longer one.
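The logic is easier to see in a high-level language before translating it to MIPS. The sketch below is Python, not assembly: list indices stand in for the string addresses, and the names compare_strings and sorted_pointers are invented for the example.

```python
from functools import cmp_to_key


def compare_strings(a: str, b: str) -> int:
    """Return <0 if a sorts before b, 0 if they are equal, >0 if a sorts after b."""
    for ca, cb in zip(a, b):
        diff = ord(ca) - ord(cb)  # subtract the character codes
        if diff != 0:
            return diff           # the first differing pair decides the order
    return len(a) - len(b)        # equal so far: the shorter string comes first


def sorted_pointers(strings):
    """Sort a list of 'pointers' (indices here) without moving the strings themselves."""
    pointers = list(range(len(strings)))
    pointers.sort(key=cmp_to_key(
        lambda i, j: compare_strings(strings[i], strings[j])))
    return pointers


words = ["pear", "apple", "fig", "app"]
order = sorted_pointers(words)
in_order = [words[i] for i in order]
```

The words list is never modified; only the small list of indices is rearranged, which is the point of sorting pointers instead of the strings.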

How can using strings instead of simple types like integers alter the O-notation of operations?

Proposed answer:
Strings are simply arrays of characters so the O-notation will be dependent on the number of characters in the string (if the loop depends on the length of the string). In this case the O-notation wouldn't be affected because the length of the string is a constant.
Any other ideas? Am I reading this question correctly?
This is not true, since arrays that represent integers are not unbounded in length.
In other words, a string that represents a 32-bit integer holds at most 32 bits of information, so at most 10 digits in base 10, and O(10) is a negligible constant that doesn't change the O-notation.
So, in summary, while operations on strings are O(n), basic integer types represented as strings are O(at most 10) = O(1).
I think you need to specify your problem better.
Try thinking about something that operates on an array of integers or an array of strings, clearly in the latter case you have an array of array of a primitive type rather than an array of a primitive type. How does this change things?
That depends entirely on what you are doing with the strings.
If you, for example, copy items from one array to another, the result depends on the implementation. It's still an O(n) operation, but the meaning of n changes. If copying a string causes a new copy to be created, n means the total number of characters in all the strings. If copying a string only copies the reference to it, n means the total number of strings.
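The two meanings of n can be made concrete by counting the elementary steps in each kind of copy. This is an illustrative sketch in Python with made-up function names; the step counters exist only to show what each loop is proportional to.

```python
def copy_references(items):
    """Copy only the references: one step per string."""
    copied, steps = [], 0
    for s in items:
        copied.append(s)  # same string object, no characters touched
        steps += 1
    return copied, steps


def copy_characters(items):
    """Rebuild every string character by character: one step per character."""
    copied, steps = [], 0
    for s in items:
        chars = []
        for ch in s:
            chars.append(ch)
            steps += 1
        copied.append("".join(chars))
    return copied, steps


data = ["ab", "cde", "", "fghij"]
by_reference, reference_steps = copy_references(data)
by_value, value_steps = copy_characters(data)
```

Both copies produce equal lists, but the first loop does work proportional to the number of strings and the second to the total number of characters.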
