What is happening when I declare a string as: char aString[SIZE*2+1];? - string

I am studying an example program that takes user input, stores it in a string, and then prints each character twice with its case swapped. For example, the input ABCdef would print aabbccDDEEFF.
I'm a little confused about the way the new string is declared. Can anybody help explain what is happening?
char string[MAX_STRING_SIZE+1]; // MAX_STRING_SIZE is defined as 500 and +1 is for the NULL char to terminate string
char stringNew[MAX_STRING_SIZE*2+1]; // Here I do not understand *2+1
EDIT: Just after I posted this question I figured out the answer and realised it may not be useful to the Stack Overflow community, but as the question had already been answered it would be rude to delete it.
In the event that another member is interested in this specific question, I have attempted to make it useful by editing the question title and summarising the answer.
As this program prints 2 occurrences of every character from the user input, the new string needs to be twice the size of the original string. Declaring the new string with *2 simply multiplies the size by 2, and the +1 again leaves room for the terminating null character.
Very simple in hindsight, I hope this can be of use to somebody else.

No pointers are involved in that declaration. You're just declaring another array of chars that holds twice as many characters as the first one, plus the extra byte that will store \0.
It can't be a pointer because there are no identifiers between the square brackets. After the preprocessor has done its job with the source file, that expression will actually become char stringNew[500*2+1];

Related

Why is Julia giving me StringIndex error?

I'm getting a StringIndexError for one particular string out of the 10,000 I am processing. I don't really know what the issue is with this string. I think it is probably a special-character issue.
If I println the string, assign the printed text to txt, and then pass txt to the function, I don't get an error. I am a little baffled.
I am sorry, I can't post the string as it is protected content, and even if I did, copying and pasting the string somehow removes the source of the error. Any suggestions?
Just to expand. The details of how String is represented in Julia are explained in the Julia manual.
You can use eachindex to get an iterator of valid indices into a String. It is an iterator because you cannot efficiently (i.e. in O(1) time) find the index of the i-th character in the string. However, you can use the isascii function on a String to check whether it consists only of ASCII characters (in which case byte and character indices are the same).
Also, if you need to get at a specific character in a string, you usually need more than one character, in which case the first, last and chop functions are useful (in fact last(first(s, n)) gives you the character at position n, although it is not the most efficient approach; iterating over eachindex will allocate less).
In Julia Strings are indexed by bytes rather than characters. You should use for c in str rather than trying to index manually.

Read substrings from a string containing multiplication [duplicate]

This question already has answers here:
'*' and '/' not recognized on input by a read statement
(2 answers)
Closed 4 years ago.
I am a scientist programming in Fortran, and I came across a strange behaviour. In one of my programs I have a string containing several "words", and I want to read all words as substrings. The first word starts with an integer and a wildcard, like "2*something".
When I perform an internal read on that string, I expect to read all words, but instead the READ statement repeatedly reads the first substring. I do not understand why, nor how to avoid this behaviour.
Below is a minimalist sample program that reproduces this behaviour. I would expect it to read the three substrings and to print "3*a b c" on the screen. Instead, I get "a a a".
What am I doing wrong? Can you please help me and explain what is going on?
I am compiling my programs under GNU/Linux x64 with Gfortran 7.3 (7.3.0-27ubuntu1~18.04).
PROGRAM testread
IMPLICIT NONE
CHARACTER(LEN=1024):: string
CHARACTER(LEN=16):: v1, v2, v3
string="3*a b c"
READ(string,*) v1, v2, v3
PRINT*, v1, v2, v3
END PROGRAM testread
You are using list-directed input (the * format specifier). In list-directed input, a number (n) followed by an asterisk means "repeat this item n times", so it is processed as if the input was a a a b c. You would need to have as input '3*a' b c to get what you want.
I will use this as another opportunity to point out that list-directed I/O is sometimes the wrong choice as its inherent flexibility may not be what you want. That it has rules for things like repeat counts, null values, and undelimited strings is often a surprise to programmers. I also often see programmers complaining that list-directed input did not give an error when expected, because the compiler had an extension or the programmer didn't understand just how liberal the feature can be.
I suggest you pick up a Fortran language reference and carefully read the section on list-directed I/O. You may find you need to use an explicit format or change your program's expectations.
Following the answer of @SteveLionel, here is the relevant part of the reference on list-directed sequential READ statements (in this case for Intel Fortran, but you can find the equivalent for your specific compiler and it won't be much different).
A character string does not need delimiting apostrophes or quotation marks if the corresponding I/O list item is of type default character, and the following is true:
The character string does not contain a blank, comma (,), or slash ( / ).
The character string is not continued across a record boundary.
The first nonblank character in the string is not an apostrophe or a quotation mark.
The leading character is not a string of digits followed by an asterisk.
A nondelimited character string is terminated by the first blank, comma, slash, or end-of-record encountered. Apostrophes and quotation marks within nondelimited character strings are transferred as is.
In total, there are 4 forms of sequential read statements in Fortran, and you may choose the option that best fits your need:
Formatted Sequential Read:
To use this, change the * to an actual format specifier. If you know the lengths of the strings in advance, this would be as easy as '(a3,a2,a2)'. Or you could come up with a format specifier that matches your data, but this generally requires knowing the length or format of the input.
Formatted Sequential List-Directed:
You are currently using this option (the * format specifier). As already shown, this kind of I/O comes with a lot of magic and surprising behavior. What is hitting you is the n*cte form, which is interpreted as n repetitions of the literal constant cte.
As said by Steve Lionel, you could put quotation marks around the problematic word so it is parsed as one piece. Or, as proposed by @evets, you could split your string using the intrinsics index or scan. Another option could be changing your wildcard from an asterisk to something else.
Formatted Namelist:
Well, that could be an option if your data was (or could be) presented in the namelist format, but I really think it's not your case.
Unformatted:
This may not apply to your case because you are reading from a character variable, and an internal READ statement can only be formatted.
Otherwise, you could split your string by means of a function instead of an I/O operation. There is no intrinsic for this, but you could come up with one without much trouble (see this thread for reference). As you may have noted already, manipulating strings in Fortran is... awkward, at least. There are some libraries out there (like this) that may be useful if you are doing lots of string stuff in Fortran.

Conversion of list to string - TCL

I encountered the following problem in Tcl. In my application, I read very large text files (some hundreds of MB) into a Tcl list. The list is then returned by the function to the main context and checked for emptiness. Here is the code snippet:
set merged_trace_list [merge_trace_files $exclude_trace_file $trace_filenames ]
if {$merged_trace_list == ""} {
...
I get a crash at the "if" line. The crash seems to be related to memory overflow. I thought that the comparison to "" forces Tcl to convert the list to a string, and since the string is too long, this causes the crash. I then replaced the above "if" line with another one:
if {[lempty $merged_trace_list]} {
and the crash indeed disappeared. In light of the above, I have several questions:
What is the maximum allowed string length in Tcl?
What is the difference between a string and a list in Tcl in terms of memory allocation? Why can I have a very long list, but not the corresponding string?
When the list is first returned by the function into the main scope (the first line), is it not converted to a string first? And if so, why don't I get a crash on that line?
Thanks,
I hope the descriptions and the questions are clear.
Konstantin
The current maximum size of an individual memory object (e.g., a string) is 2GB. This is a known bug (of long standing) on 64-bit platforms, but fixing it requires a significant ABI and API breaking change, so it won't appear until Tcl 9.0.
The difference between strings and lists is that strings are stored in a single block of memory, whereas lists are stored in an array of pointers to elements. You can probably get 256M elements in a list no problem (2GB divided by 8-byte pointers), but after that you might run into problems as the array reaches the 2GB limit.
Tcl's value objects may be simultaneously both lists and strings; the dictum about Tcl that “everything is a string” is not actually true, it's just that everything may be serialized to a string. Returning a list does not force it to be converted to a string (that's actually a fairly slow operation), but comparing the value for equality with a string does force the generation of the string. The lempty command must instead be getting the length of the list (you can use llength to do the same thing) and comparing that to zero.
Can you adjust your program to not need to hold all that data in memory at once? It's living a little dangerously given the bug mentioned above.
This is not really an answer, but it's slightly too much for a comment.
If you want to check if a list is empty, the best option is llength. If the list length is 0, your list has no content. The low-level lookup for this is very cheap.
If you still want to determine if a list is empty by comparing it to the empty string you will have to face the cost of resolving the string representation of the list. In this case, $myLongList eq {} is preferable to $myLongList == {}, since the latter comparison also forces the interpreter to check if the operands are numeric (at least it used to be like that, it might have changed).

MASM - Concatenating strings

For reference, here is the question I'm working from.
Write a procedure named Str_concat that concatenates a source string to the end of a target string. Sufficient space must be available in the target string before this procedure is called.
I'm not looking for code (though I'll take it if it makes it easier to explain). I want to know why it places the caveat that there must be sufficient space in the target string.
I haven't written anything yet because this is my first time really reading the question and I like to plot before I type. I understand that I will need to advance into the target string by Lengthof to point to, and overwrite, the null terminator. Then loop over the length of the source to copy everything over. So why worry about the space in the target? Aren't strings just contiguous arrays of characters that, as such, can be extended indefinitely?
Or have I missed some important note in my reference book?

Interpret strings as variable names in Fortran [duplicate]

This question already has answers here:
Determine variable names dynamically according to a string in Fortran
(4 answers)
Closed 5 years ago.
I'd like to access a real variable with a name equal to a string of characters that I have. Something like this (I'll make the example as clean as possible):
character(len=5) :: some_string
real :: value
value = 100.0
some_string = 'value'
At this point, how do I create an association between the character array value and the name of my real variable, value, so that I can write the value of 100.0 by referring to the string some_string?
That's pretty much not going to happen in Fortran. There are no "dynamic" language features like this available in the language. Variable names are a compile-time only thing, and simply don't exist at runtime (the names have been translated to machine addresses by the compiler).
This is how I work around this:
character(100) :: s
integer :: val
val = 100
write(s,*) val
print *,trim(s)
This prints 100 to the screen. There is some strangeness which I do not understand, however: the character variable s needs to be very large (100 in this case). For instance, if you use 3 instead of 100, it does not work. This is not critical, as the use of trim fixes this, but it would be nice if somebody could explain why this is the case.
Either way, this should work.

Resources