Is there a way to set the maximum number of characters for GtkTextBuffer? What I need is when the maximum number of characters is reached, further typing produces no output. Beep would be nice but not required. Or maybe I need to do something with GtkTextView?
I need my GtkTextView to serve as a word-wrapping single line entry, so I want to disallow Return characters as well.
I'm looking for a solution for GTK+2 and GTK+3.
This question has a similar intent, but in that case the poster wants to display only a certain number of trailing characters. ptomato (who would know more about it than I) recommends using a callback on typing to cut or truncate the string.
Your callback would be very similar, except that you would cut off the trailing characters and strip any newline characters as they were fed in.
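As a rough illustration of what such a callback has to decide, here is a minimal sketch of the filtering logic only, written as plain Python with no GTK calls. The function name filter_insert and its parameters are hypothetical; in a real program this logic would presumably live in a handler for the GtkTextBuffer "insert-text" signal, which is not shown here.

```python
def filter_insert(current_len, incoming, max_len):
    """Decide how much of `incoming` may be added to a buffer.

    current_len: number of characters already in the buffer
    incoming:    text the user typed or pasted
    max_len:     maximum number of characters allowed

    Returns (accepted_text, rejected_something). The flag is True when
    part of the cleaned text did not fit, which is where a beep could go.
    """
    # Strip Return characters so the view behaves like a single-line entry.
    cleaned = incoming.replace("\r", "").replace("\n", "")
    # Keep only as many characters as still fit.
    room = max(0, max_len - current_len)
    accepted = cleaned[:room]
    return accepted, len(accepted) < len(cleaned)


print(filter_insert(8, "ab\ncd", 10))
```

The handler would then insert only the accepted text and, if the flag is set, signal the user.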
You could also use a similar mechanism to the comment fields on SO, where you are notified that your string is too long and either disallow continuing or truncate it to size only when the data is actually being used.
For wrapping, you also want to read about wrapping modes, although you may already know about that.
I am a scientist programming in Fortran, and I came across some strange behaviour. In one of my programs I have a string containing several "words", and I want to read all words as substrings. The first word starts with an integer and a wildcard, like "2*something".
When I perform an internal read on that string, I expect to read all words, but instead the READ statement repeatedly reads the first substring. I do not understand why, nor how to avoid this behaviour.
Below is a minimalist sample program that reproduces this behaviour. I would expect it to read the three substrings and to print "3*a b c" on the screen. Instead, I get "a a a".
What am I doing wrong? Can you please help me and explain what is going on?
I am compiling my programs under GNU/Linux x64 with Gfortran 7.3 (7.3.0-27ubuntu1~18.04).
PROGRAM testread
  IMPLICIT NONE
  CHARACTER(LEN=1024):: string
  CHARACTER(LEN=16):: v1, v2, v3
  string="3*a b c"
  READ(string,*) v1, v2, v3
  PRINT*, v1, v2, v3
END PROGRAM testread
You are using list-directed input (the * format specifier). In list-directed input, a number (n) followed by an asterisk means "repeat this item n times", so it is processed as if the input was a a a b c. You would need to have as input '3*a' b c to get what you want.
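To make the repeat-count rule concrete, here is a toy model in Python (not Fortran) of how such a record is expanded. It is only a sketch under strong assumptions: it handles blank-separated items, a leading digits-then-asterisk repeat count, and simple quote delimiters, and it ignores commas, slashes, null values and everything else the real rules cover. The function name expand_list_directed is made up for this illustration.

```python
import re


def expand_list_directed(record):
    """Toy expansion of a list-directed input record into a list of values."""
    values = []
    for token in record.split():
        # A delimited string is taken literally, without repeat processing.
        if len(token) >= 2 and token[0] in "'\"" and token[-1] == token[0]:
            values.append(token[1:-1])
            continue
        # digits followed by '*' means: repeat the rest that many times.
        match = re.fullmatch(r"(\d+)\*(.*)", token)
        if match:
            values.extend([match.group(2)] * int(match.group(1)))
        else:
            values.append(token)
    return values


print(expand_list_directed("3*a b c")[:3])    # the three items read into v1, v2, v3
print(expand_list_directed("'3*a' b c")[:3])  # with the first word delimited
```

In this model the first call yields a, a, a for the three variables, and the second yields 3*a, b, c.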
I will use this as another opportunity to point out that list-directed I/O is sometimes the wrong choice as its inherent flexibility may not be what you want. That it has rules for things like repeat counts, null values, and undelimited strings is often a surprise to programmers. I also often see programmers complaining that list-directed input did not give an error when expected, because the compiler had an extension or the programmer didn't understand just how liberal the feature can be.
I suggest you pick up a Fortran language reference and carefully read the section on list-directed I/O. You may find you need to use an explicit format or change your program's expectations.
Following the answer of @SteveLionel, here is the relevant part of the reference on list-directed sequential READ statements (in this case, for Intel Fortran, but you could find it for your specific compiler and it won't be much different).
A character string does not need delimiting apostrophes or quotation marks if the corresponding I/O list item is of type default character, and the following is true:
The character string does not contain a blank, comma (,), or slash ( / ).
The character string is not continued across a record boundary.
The first nonblank character in the string is not an apostrophe or a quotation mark.
The leading character is not a string of digits followed by an asterisk.
A nondelimited character string is terminated by the first blank, comma, slash, or end-of-record encountered. Apostrophes and quotation marks within nondelimited character strings are transferred as is.
In total, there are 4 forms of sequential read statements in Fortran, and you may choose the option that best fits your need:
Formatted Sequential Read:
To use this, you change the * to an actual format specifier. If you know the lengths of the strings in advance, this would be as easy as '(a3,a2,a2)'. Otherwise, you could come up with a format specifier that matches your data, but this generally requires knowing the length or format of the input.
Formatted Sequential List-Directed:
You are currently using this option (the * format specifier). As already shown, this kind of I/O comes with a lot of magic and surprising behavior. What is hitting you is the n*cte form, which is interpreted as n repetitions of the literal cte.
As said by Steve Lionel, you could put quotation marks around the problematic word so that it is parsed as one piece. Or, as proposed by @evets, you could split your string using the intrinsics index or scan. Another option could be changing your wildcard from an asterisk to something else.
Formatted Namelist:
Well, that could be an option if your data was (or could be) presented in the namelist format, but I really think it's not your case.
Unformatted:
This may not apply to your case because you are reading from a character variable, and an internal READ statement can only be formatted.
Otherwise, you could split your string by means of a function instead of an I/O operation. There is no intrinsic for this, but you could come up with one without much trouble (see this thread for reference). As you may have noted already, manipulating strings in Fortran is... awkward, at least. There are some libraries out there (like this) that may be useful if you are doing lots of string stuff in Fortran.
I am building a string compressor and for simplicity reasons, I wanted to use some non-printable characters.
1) Is it in some way "bad" to use the 0-31 ASCII characters?
2) Can these characters occur in a normal text string?
If the answer is "partially":
3) Which of them are better to use in this case? I think I will need at most 9 of them.
Well the answer is that it depends on how you're using it. If you're treating the "string" as binary, then binary by definition can have any value. However if it is meant to be read/printed, it could cause serious problems to use characters 0-31.
It isn't too big a deal for the most part, except that 0 means "end of string" on many platforms. Though again, it depends entirely on how you're using it. My advice would be, at the very least, to avoid character 0. If you want the user to be able to copy and paste the string, then none of these would be suitable; in other words, they must all be printable characters.
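If the compressed string is only ever handled as data, one way to pick marker characters is to check which control codes the input does not already use. The sketch below is an illustration under stated assumptions: it skips 0 as advised above, and it also skips tab, line feed and carriage return (9, 10, 13) because those do turn up in ordinary text; that extra exclusion and the function name unused_control_codes are assumptions of this example.

```python
def unused_control_codes(text, needed=9):
    """Return `needed` control characters (codes 1-31) that `text` does not contain."""
    # Assumption: never hand out NUL, tab, LF or CR as markers.
    avoid = {0, 9, 10, 13}
    present = {ord(ch) for ch in text}
    free = [code for code in range(1, 32)
            if code not in avoid and code not in present]
    if len(free) < needed:
        raise ValueError("not enough unused control characters in this text")
    return [chr(code) for code in free[:needed]]


markers = unused_control_codes("hello\tworld\n")
print([ord(m) for m in markers])
```

Checking the actual input like this also answers question 2 for a given string, rather than relying on what "normal" text is expected to contain.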
For a small project, we are implementing a compiler for a subset of C, using Haskell and megaparsec. Overall we have made good progress, but there are still some corner cases that we cannot handle correctly yet. One of them is the treatment of backslashes followed by a newline. To quote from the specification:
Each instance of a backslash character (\) immediately followed by a
new-line character is deleted, splicing physical source lines to form
logical source lines. Only the last backslash on any physical source
line shall be eligible for being part of such a splice.
(§5.1.1., ISO/IEC9899:201x)
So far we came up with two possible approaches to this problem:
1.) Implement a pre-lexing phase in which the initial input is reproduced and every occurrence of \\\n is removed. The big disadvantage we see in this approach is that we lose the accurate error locations which we need.
2.) Implement a special char' combinator that behaves like char but looks an extra character ahead and silently consumes any \\\n. This would give us correct positions. The disadvantage here is that we need to replace every occurrence of char with char' in every parser, even in the megaparsec-provided ones like string, integer, whitespace etc...
Most likely we are not the first people trying to parse a language with such a "quirk" with parsec/megaparsec, so I could imagine that there is some nicer way to do it. Does anyone have an idea?
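For illustration only, here is a minimal sketch of approach 1 that also records, for every character it keeps, where that character came from in the original file, so that error locations could be mapped back afterwards. It is written in Python rather than Haskell and is not tied to megaparsec; the function name splice_lines and the 1-based (line, column) convention are assumptions of this example.

```python
def splice_lines(source):
    """Delete each backslash-newline pair and remember original positions.

    Returns (spliced_text, positions), where positions[i] is the 1-based
    (line, column) in `source` of the character spliced_text[i].
    """
    out = []
    positions = []
    line, col = 1, 1
    i = 0
    while i < len(source):
        ch = source[i]
        # A backslash immediately followed by a newline is dropped entirely.
        if ch == "\\" and i + 1 < len(source) and source[i + 1] == "\n":
            i += 2
            line += 1
            col = 1
            continue
        out.append(ch)
        positions.append((line, col))
        if ch == "\n":
            line += 1
            col = 1
        else:
            col += 1
        i += 1
    return "".join(out), positions


text, pos = splice_lines("int ma\\\nin;\n")
print(repr(text), pos[6])
```

A parser running on the spliced text could then translate an error offset into an original location with a lookup in the positions list.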
Say we have a large sequence of characters that is unable to fit in memory, and we want to find the longest span of characters such that none are repeated. How would you do this? I am familiar with concepts of external sorting, but do not see how we could apply similar techniques to a problem like this, since it seems processing a sequence of characters is entirely dependent on previous sequences.
Start two pointers into the file at position 0, the front pointer and the back pointer.
Then advance the front pointer through the file, and as you do, advance the back pointer as necessary to ensure that the span between the back pointer and the front pointer contains no repeating characters. This will be the longest span of unique characters that ends at the front pointer.
In order to do this, you just maintain a set containing all the characters between the back and front pointers. If you want to advance the front pointer, and the character you pass is already in the set, then you must first advance the back pointer until the duplicate character is removed.
The longest span of characters you encounter in this way will be the longest span of unique characters in the file.
You can implement the two file pointers by opening the same file for reading twice. Alternatively, you can open it just once and use a circular buffer to remember everything between the back and front. There are only 256 (depending on your character type) unique characters so this buffer doesn't have to be too big.
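The steps above can be sketched as follows, using the single-pass variant with a buffer between the back and front pointers. This is a minimal illustration, not a tuned implementation; the names longest_unique_span and iter_bytes are made up for it, and the file is treated as a stream of bytes.

```python
from collections import deque


def iter_bytes(path, chunk_size=1 << 16):
    """Yield the bytes of a file one at a time, reading it in chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return
            yield from chunk


def longest_unique_span(items):
    """Return (start, length) of the longest run with no repeated item."""
    window = deque()   # everything between the back and front pointers
    seen = set()       # the same items, for fast membership tests
    back = 0           # position of the back pointer in the stream
    best_start, best_len = 0, 0
    for item in items:
        # Advance the back pointer until the duplicate has left the window.
        while item in seen:
            seen.discard(window.popleft())
            back += 1
        window.append(item)
        seen.add(item)
        if len(window) > best_len:
            best_start, best_len = back, len(window)
    return best_start, best_len


print(longest_unique_span("pwwkew"))
```

For a large file, the call would be longest_unique_span(iter_bytes(path)); the window never holds more than 256 bytes in that case, so memory use stays constant.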
I found out after testing that Linux allows any character in a file name except for / and null (\0). So what sequences should I not allow in a filename? I heard a leading - may confuse some command line programs, which doesn't matter to me, but it may bother other people if they decide to collect a bunch of files and filter them with some GNU programs.
It was suggested to me to remove leading and trailing spaces and I plan to only because typically the user doesn't mean to have leading/trailing space.
What problematic sequence might there be and what sequence should I consider not allowing?
I am also considering not allowing characters that are illegal in Windows, just for convenience. I think I may not allow dashes at the beginning (dash is a legal Windows character).
Your question is somewhat confusing since you talk at length about Linux, but then in a comment to another answer you say that you are generating filenames for people to download, which presumably means that you have absolutely no control whatsoever over the filesystem and operating system that the files will be stored on, making Linux completely irrelevant.
For the purpose of this answer I'm going to assume that your question is wrong and your comment is correct.
The vast majority of operating systems and filesystems in use today fall roughly into three categories: POSIX, Windows and MacOS.
The POSIX specification is very clear on what a filename that is guaranteed to be portable across all POSIX systems looks like. The characters that you can use are defined in Section 3.276 (Portable Filename Character Set) of the Open Group Base Specification as:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789._-
The maximum filename length that you can rely on is defined in Section 13.23.3.5 (<limits.h> Minimum Values) as 14. (The relevant constant is _POSIX_NAME_MAX.)
So, a filename which is up to 14 characters long and contains only the 65 characters listed above, is safe to use on all POSIX compliant systems, which gives you 24407335764928225040435790 combinations (or roughly 84 bits).
If you don't want to annoy your users, you should add two more restrictions: don't start the filename with a dash or a dot. Filenames starting with a dot are customarily interpreted as "hidden" files and are not displayed in directory listings unless explicitly requested. And filenames starting with a dash may be interpreted as an option by many commands. (Sidenote: it is amazing how many users don't know about the rm ./-rf or rm -- -rf tricks.)
This leaves you at 23656340818315048885345458 combinations (still 84 bits).
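A check for the rules so far (portable character set, at most 14 characters, no leading dash or dot) might look like the sketch below. The function name is_portable_filename is made up for this example.

```python
import re

# First character: letter, digit or underscore (no leading dash or dot).
# Up to 13 further characters from the portable filename character set.
PORTABLE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9._-]{0,13}")


def is_portable_filename(name):
    """True if `name` fits the portable POSIX rules described above."""
    return PORTABLE_NAME.fullmatch(name) is not None


for candidate in ["notes-v2.txt", ".hidden", "-rf", "has space", "a" * 15]:
    print(repr(candidate), is_portable_filename(candidate))
```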
Windows adds a couple of new restrictions to this: filenames cannot end with a dot, and filenames are case-insensitive. This reduces the character set from 65 to 39 characters (37 for the first, 38 for the last character). It doesn't add any length restrictions; Windows can deal with 14 characters just fine.
This reduces the possible combinations to 17866587696996781449603 (73 bits).
Another restriction is that Windows treats everything after the last dot as a filename extension which denotes the type of the file. If you want to avoid potential confusion (say, if you generate a filename like abc.mp3 for a text file), you should avoid dots altogether.
You still have 13090925539866773438463 combinations (73 bits).
If you have to worry about DOS, then additional restrictions apply: the filename consists of one or two parts (separated by a dot), where neither of the two parts can contain a dot. The first part has a maximum length of 8 characters, the second of 3. Again, the second part is usually reserved to indicate the file type, which leaves you only 8 characters.
Now you have 4347792138495 possible filenames or 41 bits.
The good news is that you can use the 3 character extension to actually correctly indicate the file type, without breaking the POSIX filename limit (8+3+1 = 12 < 14).
If you want your users to be able to burn the files onto a CD-R formatted with ISO9660 Level 1, then you have to disallow hyphen anywhere, not just as the first character. Now, the remaining character set looks like
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789_
which gives you 3512479453921 combinations (41 bits).
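The combination counts in this answer can be approximated with a simple counting model: sum over every allowed length, with one pool of characters for the first position and another for the rest. The sketch below is only my reading of how such figures arise, not a statement of how they were originally computed; the function name count_names is made up. Under this model, the DOS base-name case (37 choices for the first character, 38 for the others, up to 8 characters) gives 4347792138495.

```python
def count_names(first_choices, other_choices, max_len):
    """Count names of length 1..max_len.

    first_choices: number of characters allowed in the first position
    other_choices: number of characters allowed in every later position
    """
    return sum(first_choices * other_choices ** (length - 1)
               for length in range(1, max_len + 1))


cases = [
    ("POSIX portable, 14 chars", count_names(65, 65, 14)),
    ("no leading dash or dot", count_names(63, 65, 14)),
    ("DOS base name, 8 chars", count_names(37, 38, 8)),
]
for label, total in cases:
    # bit_length() - 1 is the number of whole bits, i.e. floor(log2(total)).
    print(label, total, "about", total.bit_length() - 1, "bits")
```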
I would leave the determination of what's "valid" up to the OS and filesystem driver. Let the user type whatever they want, and pass it on. Handle errors from the OS in an appropriate manner. The exception is I think it's reasonable to strip leading and trailing spaces. If people want to create filenames with embedded spaces or leading dashes or question marks, and their chosen filesystem allows it, it shouldn't be up to you to try to prevent them.
It's possible to mount different filesystems at different mount points (or drives in Windows) that have different rules regarding legal characters in a file name. Handling this sort of thing inside your application will be much more work than is necessary, because the OS will already do it for you.
Since you seem to be interested primarily in Linux, one thing to avoid is characters that the (typical) shell will try to interpret, for example, as a wildcard. You can create a file named "*" if you insist, but you might have some users who don't appreciate it much.
Are you developing an application where you have to ask the user to create files themselves? If that's what you are doing, then you can set the rules in your application (e.g. only allow [a-zA-Z0-9_.] and reject all other special characters). This is much simpler to enforce.
Urlencode all strings that are to be used as filenames and you'll only have to worry about length. This answer might be worth reading.
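As a sketch of the URL-encoding idea, Python's standard urllib.parse.quote with an empty safe set percent-encodes everything except letters, digits and a few unreserved symbols. The helper names below are made up for this example. Note that dots and hyphens pass through unchanged, so a leading dot or dash, or a name like "..", would still need separate handling.

```python
from urllib.parse import quote, unquote


def to_safe_filename(title):
    """Percent-encode a string so that it contains no '/' or other odd characters."""
    # safe="" makes quote() encode '/' as well.
    return quote(title, safe="")


def from_safe_filename(name):
    """Recover the original string from an encoded filename."""
    return unquote(name)


encoded = to_safe_filename("a/b c?.txt")
print(encoded, from_safe_filename(encoded))
```

Encoding makes names longer (each escaped byte takes three characters), which is why length becomes the remaining concern.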
I'd recommend the use of a set of whitelist characters. In general, symbols in filenames will annoy people.
By all means allow people to use a-z, 0-9 and Unicode characters > 0x80, but do not allow arbitrary symbols. Things like & and , will cause a lot of annoyance, as will full stops in inappropriate places.
I think the ASCII symbols which are safe to allow are: full stop, underscore, hyphen.
Allowing any OTHER ASCII symbols in the filename is asking for trouble.
A filename should also not start with an ASCII symbol. Policy on spaces in filenames is tricky, as users may expect to be able to use them, but some filenames are obviously silly (such as those which START with spaces).
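A whitelist along these lines could be sketched as below. Several details are assumptions of this example rather than part of the advice above: uppercase letters are accepted alongside a-z, spaces are allowed inside a name but not at either end, and the function name is_friendly_filename is made up.

```python
ALLOWED_SYMBOLS = "._-"


def _is_word_char(ch):
    """Letters, digits, or any character above 0x80."""
    return ord(ch) > 0x80 or "a" <= ch.lower() <= "z" or "0" <= ch <= "9"


def is_friendly_filename(name):
    """Whitelist check: word characters plus '.', '_', '-' and inner spaces."""
    if not name:
        return False
    # Must not start with an ASCII symbol or a space.
    if not _is_word_char(name[0]):
        return False
    # Assumption: trailing spaces are as unwanted as leading ones.
    if name.endswith(" "):
        return False
    return all(_is_word_char(ch) or ch in ALLOWED_SYMBOLS or ch == " "
               for ch in name)


for candidate in ["my file.txt", " leading", "-rf", "a&b", "na\u00efve.txt"]:
    print(repr(candidate), is_friendly_filename(candidate))
```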