Ocaml String.sub: How to get all characters until end of string

Ocaml String.sub: How to get all characters until end of string - string

So I'm using the String.sub function and I'm wondering if there's any way to get all the characters in a string until the end of the string. Since String.sub takes in a string, int (the index of where to start getting chars), and then a number of how many chars, I'm not sure what the easiest way of doing all chars since we want a possibly positive infinite amount

For String.sub you just have to subtract from the length of the string. There's no simpler way, i.e., there's no value for length that has a special meaning. A value larger than the remainder of the string is an error in OCaml (which tends to be strict when checking parameters).
Assume i is >= 0 and < length of the string:
String.sub s i (String.length s - i)
You can use Str.last_chars, but you still need to know how many characters you want. I.e., you still have to subtract from the length of the string.
Str.last_chars s (String.length s - i)

Related

Compare Unicode code point range in Python3

I would like to check if a character is in a certain Unicode range or not, but seems I cannot get the expected answer.
char = "？" # the unicode value is 0xff1f
print(hex(ord(char)))
if hex(ord(char)) in range(0xff01, 0xff60):
print("in range")
else:
print("not in range")
It should print: "in range", but the results show: "not in range". What have I done wrong?

hex() returns a string. To compare integers you should simply use ord:
if ord(char) in range(0xff01, 0xff60):
You could've also written:
if 0xff01 <= ord(char) < 0xff60:

In general for such problems, you can try inspecting the types of your variables.
Typing 0xff01 without quotes, represents a number.
list(range(0xff01, 0xff60)) will give you a list of integers [65281, 65282, .., 65375]. range(0xff01, 0xff60) == range(65281, 65376) evaluates to True.
ord('?') gives you integer 65311.
hex() takes an integer and converts it to '0xff01' (a string).
So, you simply need to use ord(), no need to hex() it.

Just only use ord:
if ord(char) in range(0xff01, 0xff60):
...
hex is not needed.
As mentioned in the docs:
Convert an integer number to a lowercase hexadecimal string prefixed with “0x”.
Obviously that already describes it, it becomes a string instead of what we want, an integer.
Whereas the ord function does what we want, as mentioned in the docs:
Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().

Python. Why the length of the list changes after turning it from int to string?

I have a bunch of users in a list called UserList.
And I do not want the output to have the square brackets, so I run this line:
UserList = [1,2,3,4...]
UserListNoBrackets = str(UserList).strip('[]')
But if I run:
len(UserList) #prints22 (which is correct).
However:
len(UserListNoBrackets) #prints 170 (whaaat?!)
Anyway, the output is actually correct (I'm pretty sure). Just wondering why that happens.

Here:
UserListNoBrackets = str(UserList).strip('[]')
UserListNoBrackets is a string. A string is a sequence of characters, and len(str) returns the numbers of characters in the string. A comma is a character, a white space is a character, and the string represention of an integer has has many characters as there are digits in the integer. So obviously, the length of your UserListNoBrackets string is much greater than the length of you UserList list.

You probably need str.join
Ex:
user_list = [1,2,3,4...]
print(",".join(map(str, user_list)))
Note:
Using map method to convert all int elements in list to string.

Is there a built-in in Python 3 that checks whether a character is a "basic" algebraic symbol?

I know the string methods str.isdigit, str.isdecimal and str.isnumeric.
I'm looking for a built-in method that checks if a character is algebraic, meaning that it can be found in a declaration of a decimal number.
The above mentioned methods return False for '-1' and '1.0'.
I can use isdigit to retrieve a positive integer from a string:
string = 'number=123'
number = ''.join([d for d in string if d.isdigit()]) # returns '123'
But that doesn't work for negative integers or floats.
Imagine a method called isnumber that works like this:
def isnumber(s):
for c in s:
if c not in list('.+-0123456789'):
return False
return True
string1 = 'number=-1'
string2 = 'number=0.1'
number1 = ''.join([d for d in string1 if d.isnumber()]) # returns '-1'
number2 = ''.join([d for d in string2 if d.isnumber()]) # returns '0.1'
The idea is to test against a set of "basic" algebraic characters. The string does not have to contain a valid Python number. It could also be an IP address like 255.255.0.1.
.
Does a handy built-in that works approximately like that exist?
If not, why not? It would be much more efficient than a python function and very useful. I've seen alot of examples on stackoverflow that use str.isdigit() to retrieve a positive integer from a string. Is there a reason why there isn't a built-in like that, although there are three different methods that do almost the same thing?

No such function exists. There are a bunch of odd characters that can be part of number literals in Python, such as o, x and b in the prefix of integers of non-decimal bases, and e to introduce the exponential part of a float. I think those plus the hex digits (0-9 and A-F) and sign characters and the decimal point are all you need.
You can put together a string with the right character yourself and test against it:
from string import hex_digits
num_literal_chars = hex_digits + "oxOX.+-"
That will get a bunch of garbage though if you use it to test against mixed text and numbers:
string1 = "foo. bar. 0xDEADBEEF 10.0.0.1"
print("".join(c for c in string1 if c in num_literal_chars))
# prints "foo.ba.0xDEADBEEF10.0.0.1"
The fact that it gives you a bunch of junk is probably why no builtin function exists to do this. If you want to match a certain kind of number out of a string, write an appropriate regular expression to match that specific kind of number. Don't try to do it character-by-character, or try to match all the different kinds of Python numbers.

Go's LeftStr, RightStr, SubStr

I believe there are no LeftStr(str,n) (take at most n first characters), RightStr(str,n) (take at most n last characters) and SubStr(str,pos,n) (take first n characters after pos) function in Go, so I tried to make one
// take at most n first characters
func Left(str string, num int) string {
if num <= 0 {
return ``
}
if num > len(str) {
num = len(str)
}
return str[:num]
}
// take at most last n characters
func Right(str string, num int) string {
if num <= 0 {
return ``
}
max := len(str)
if num > max {
num = max
}
num = max - num
return str[num:]
}
But I believe those functions will give incorrect output when the string contains unicode characters. What's the fastest solution for those function, is using for range loop is the only way?

As mentioned in already in comments,
combining characters, modifying runes, and other multi-rune
"characters"
can cause difficulties.
Anyone interested in Unicode handling in Go should probably read the Go Blog articles
"Strings, bytes, runes and characters in Go"
and "Text normalization in Go".
In particular, the later talks about the golang.org/x/text/unicode/norm package which can help in handling some of this.
You can consider several levels increasingly of more accurate (or increasingly more Unicode aware) spiting the first (or last) "n characters" from a string.
Just use n bytes.
This may split in the middle of a rune but is O(1), is very simple, and in many cases you know the input consists of only single byte runes.
E.g. str[:n].
Split after n runes.
This may split in the middle of a character. This can be done easily, but at the expense of copying and converting with just string([]rune(str)[:n]).
You can avoid the conversion and copying by using the unicode/utf8 package's DecodeRuneInString (and DecodeLastRuneInString) functions to get the length of each of the first n runes in turn and then return str[:sum] (O(n), no allocation).
Split after the n'th "boundary".
One way to do this is to use
norm.NFC.FirstBoundaryInString(str) repeatedly
or norm.Iter to find the byte position to split at and then return str[:pos].
Consider the displayed string "cafés" which could be represented in Go code as: "cafés", "caf\u00E9s", or "caf\xc3\xa9s" which all result in the identical six bytes. Alternative it could represented as "cafe\u0301s" or "cafe\xcc\x81s" which both result in the identical seven bytes.
The first "method" above may split those into "caf\xc3"+"\xa9s" and cafe\xcc"+"\x81s".
The second may split them into "caf\u00E9"+"s" ("café"+"s") and "cafe"+"\u0301s" ("cafe"+"́s").
The third should split them into "caf\u00E9"+"s" and "cafe\u0301"+"s" (both shown as "café"+"s").

Array of Strings in Fortran 77

I've a question about Fortran 77 and I've not been able to find a solution.
I'm trying to store an array of strings defined as the following:
character matname(255)*255
Which is an array of 255 strings of length 255.
Later I read the list of names from a file and I set the content of the array like this:
matname(matcount) = mname
EDIT: Actually mname value is hardcoded as mname = 'AIR' of type character*255, it is a parameter of a function matadd() which executes the previous line. But this is only for testing, in the future it will be read from a file.
Later on I want to print it with:
write(*,*) matname(matidx)
But it seems to print all the 255 characters, it prints the string I assigned and a lot of garbage.
So that is my question, how can I know the length of the string stored?
Should I have another array with all the lengths?
And how can I know the length of the string read?
Thanks.

You can use this function to get the length (without blank tail)
integer function strlen(st)
integer i
character st*(*)
i = len(st)
do while (st(i:i) .eq. ' ')
i = i - 1
enddo
strlen = i
return
end
Got from here: http://www.ibiblio.org/pub/languages/fortran/ch2-13.html
PS: When you say: matname(matidx) it gets the whole string(256) chars... so that is your string plus blanks or garbage

The function Timotei posted will give you the length of the string as long as the part of the string you are interested in only contains spaces, which, if you are assigning the values in the program should be true as FORTRAN is supposed to initialize the variables to be empty and for characters that means a space.
However, if you are reading in from a file you might pick up other control characters at the end of the lines (particularly carriage return and/or line feed characters, \r and/or \n depending on your OS). You should also toss those out in the function to get the correct string length. Otherwise you could get some funny print statements as those characters are printed as well.
Here is my version of the function that checks for alternate white space characters at the end besides spaces.
function strlen(st)
integer i,strlen
character st*(*)
i = len(st)
do while ((st(i:i).eq.' ').or.(st(i:i).eq.'\r').or.
+ (st(i:i).eq.'\n').or.(st(i:i).eq.'\t'))
i = i - 1
enddo
strlen = i
return
end
If there are other characters in the "garbage" section this still won't work completely.
Assuming that it does work for your data, however, you can then change your write statement to look like this:
write(*,*) matname(matidx)(1:strlen(matname(matidx)))
and it will print out just the actual string.
As to whether or not you should use another array to hold the lengths of the string, that is up to you. the strlen() function is O(n) whereas looking up the length in a table is O(1). If you find yourself computing the lengths of these static strings often, it may improve performance to compute the length once when they are read in, store them in an array and look them up if you need them. However, if you don't notice the slowdown, I wouldn't worry about it.

Depending on the compiler that you are using, you may be able to use the trim() intrinsic function to remove any leading/trailing spaces from a string, then process it as you normally would, i.e.
character(len=25) :: my_string
my_string = 'AIR'
write (*,*) ':', trim(my_string), ':'
should print :AIR:.
Edit:
Better yet, it looks like there is a len_trim() function that returns the length of a string after it has been trimmed.

intel and Compaq Visual Fortran have the intrinsic function LEN_TRIM(STRING) which returns the length without trailing blanks or spaces.
If you want to suppress leading blanks or spaces, use "Adjust Left" i.e. ADJUSTF(STRING)
In these FORTRANs I also note a useful feature: If you pass a string in to a function or subroutine as an argument, and inside the subroutine it is declared as CHARACTER*(*), then
using the LEN(STRING) function in the subroutine retruns the actual string length passed in, and not the length of the string as declared in the calling program.
Example:
CHARACTER*1000 STRING
.
.
CALL SUBNAM(STRING(1:72)
SUBROUTINE SYBNAM(STRING)
CHARACTER*(*) STRING
LEN(STRING) will be 72, not 1000

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Ocaml String.sub: How to get all characters until end of string - string

Related

Compare Unicode code point range in Python3

Python. Why the length of the list changes after turning it from int to string?

Is there a built-in in Python 3 that checks whether a character is a "basic" algebraic symbol?

Go's LeftStr, RightStr, SubStr

Array of Strings in Fortran 77

Categories

Resources