first="harry"
last="potter"
print(first, first.title())
print(f"Full name: {first.title()} {last.title()}")
print("Full name: {0.title()} {1.title()}".format(first, last))
The first two print statements work fine, which means the 'str' object does have a title() method.
The third print statement raises an error. Why is that?
The str.format() syntax is different from f-string syntax. In particular, while f-strings essentially let you put any expression between the brackets, str.format() is considerably more limited. Per the documentation:
The grammar for a replacement field is as follows:
replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name ::= arg_name ("." attribute_name | "[" element_index "]")*
arg_name ::= [identifier | digit+]
attribute_name ::= identifier
element_index ::= digit+ | index_string
index_string ::= <any source character except "]"> +
conversion ::= "r" | "s" | "a"
format_spec ::= <described in the next section>
You'll note that, while attribute names (via the dot operator .) and indices (via square-brackets []) - in other words, values - are valid, actual method calls (or any other expressions) are not. I hypothesize this is because str.format() does not actually execute the text, but just swaps in an object that already exists.
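A minimal sketch of what the grammar above permits, assuming a throwaway object built with types.SimpleNamespace (the name wizard and its fields are made up for illustration). Attribute and index lookups work inside a replacement field, while a method call is treated as an attribute whose name includes the parentheses:

```python
from types import SimpleNamespace

# Hypothetical object used only for this illustration.
wizard = SimpleNamespace(name="harry", houses=["gryffindor", "slytherin"])

by_attribute = "{0.name}".format(wizard)    # attribute lookup: allowed
by_index = "{0.houses[1]}".format(wizard)   # index lookup: allowed

try:
    "{0.name.title()}".format(wizard)       # method call: not part of the grammar
    call_error = None
except AttributeError as exc:
    # format() looked for an attribute literally named "title()".
    call_error = str(exc)

print(by_attribute)
print(by_index)
print(call_error)
```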
Actual f-strings (your second example) share a similar syntax to the str.format() method, in that they use curly-brackets {} to indicate the areas to replace, but according to the PEP that introduced them,
F-strings provide a way to embed expressions inside string literals, using a minimal syntax. It should be noted that an f-string is really an expression evaluated at run time, not a constant value.
This is clearly different (more complex) than str.format(), which is more of a simple text replacement - an f-string is an expression and is executed as such, and allows full expressions inside its brackets (in fact, you can even nest f-strings inside each other, which is fun).
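For comparison, a short sketch of what f-strings accept, reusing the first and last variables from the question (the other names are made up for illustration). The nested literal uses different quotes so that it should also run on Python versions before 3.12:

```python
first = "harry"
last = "potter"

# Any expression is allowed inside the braces, including method calls and arithmetic.
full_name = f"{first.title()} {last.title()}"
name_length = f"{len(first) + len(last)} letters"

# An f-string nested inside another one; the inner result is right-aligned to width 8.
padded = f"[{f'{first.title()}':>8}]"

print(full_name)
print(name_length)
print(padded)
```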
str.format() substitutes the given objects into the corresponding placeholders, and the '.' syntax lets you look up an attribute of the substituted object by name. That is why {0.title()} fails: format() looks for an attribute literally named "title()" (parentheses included) on the string, finds none, and raises AttributeError.
But if you use
print("Full name: {0.title} {1.title}".format(first, last))
>> Full name: <built-in method title of str object at 0x7f5e42d09630> <built-in method title of str object at 0x7f5e42d096b0>
Here you can see that the placeholder accesses the string's built-in method object itself, without calling it.
If you want to use title() with format() then use like this:
print("Full name: {0} {1}".format(first.title(), last.title()))
>> Full name: Harry Potter
I have a string S = '02143' and a list A = ['a','b','c','d','e']. I want to replace all those digits in 'S' with their corresponding element in list A.
For example, replace 0 with A[0], 2 with A[2] and so on. Final output should be S = 'acbed'.
I tried:
S = re.sub(r'([0-9])', A[int(r'\g<1>')], S)
However this gives an error ValueError: invalid literal for int() with base 10: '\\g<1>'. I guess it is considering backreference '\g<1>' as a string. How can I solve this especially using re.sub and capture-groups, else alternatively?
The reason re.sub(r'([0-9])', A[int(r'\g<1>')], S) does not work is that \g<1> (an unambiguous way to write the first backreference, otherwise written as \1) is only meaningful inside a replacement pattern string that re.sub itself processes. If you pass it to another function, that function just sees the literal string \g<1>, because the re module has had no chance to evaluate it. The re engine only resolves backreferences once it has a match, but the A[int(r'\g<1>')] argument is evaluated before re.sub is even called.
That is why re.sub accepts a callback function as the replacement argument: you can pass the matched group values to any external function for more advanced manipulation.
See the re documentation:
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.
Use
import re
S = '02143'
A = ['a','b','c','d','e']
print(re.sub(r'[0-9]',lambda x: A[int(x.group())],S))
Note you do not need to capture the whole pattern with parentheses, you can access the whole match with x.group().
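To make the evaluation order concrete, here is a small sketch that uses a named function instead of a lambda (the helper name digit_to_letter is made up for illustration). It is followed by a regex-free alternative, which assumes S contains nothing but digits that are valid indices into A:

```python
import re

S = '02143'
A = ['a', 'b', 'c', 'd', 'e']

def digit_to_letter(match):
    # Called by re.sub once per match, after the match has been found,
    # so match.group() holds the actual digit rather than a backreference.
    return A[int(match.group())]

with_regex = re.sub(r'[0-9]', digit_to_letter, S)

# Alternative without re: only valid if every character of S is a digit index into A.
without_regex = ''.join(A[int(ch)] for ch in S)

print(with_regex, without_regex)
```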
I want to use input from a user as a regex pattern for a search over some text. It works, but how I can handle cases where user puts characters that have meaning in regex?
For example, the user wants to search for Word (s): regex engine will take the (s) as a group. I want it to treat it like a string "(s)" . I can run replace on user input and replace the ( with \( and the ) with \) but the problem is I will need to do replace for every possible regex symbol.
Do you know some better way ?
Use the re.escape() function for this:
4.2.3 re Module Contents
escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
A simplistic example: search for any occurrence of the provided string, optionally followed by 's', and return the match object.
import re
def simplistic_plural(word, text):
    word_or_plural = re.escape(word) + 's?'
    # re.search scans the whole text; re.match would only match at the start.
    return re.search(word_or_plural, text)
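A minimal sketch of the situation from the question, assuming a made-up user_input and text: without escaping, the parentheses form a group and the pattern looks for "Word s"; with re.escape() the input is matched character for character.

```python
import re

# Hypothetical user input and text, for illustration only.
user_input = "Word (s)"
text = "Please fill in Word (s) on the form."

# Unescaped: "(s)" is a capture group, so the pattern means "Word s".
unescaped = re.search(user_input, text)

# Escaped: the parentheses are matched as literal characters.
escaped = re.search(re.escape(user_input), text)

print(unescaped)
print(escaped.group())
```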
You can use re.escape():
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'
If you are using a Python version < 3.7, this will escape non-alphanumerics that are not part of regular expression syntax as well.
If you are using a Python version < 3.7 but >= 3.3, this will escape non-alphanumerics that are not part of regular expression syntax, except for specifically underscore (_).
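Unfortunately, re.escape() is not suited for the replacement string (the output below is from a Python version before 3.3, where re.escape() still escaped the underscore):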
>>> re.sub('a', re.escape('_'), 'aa')
'\\_\\_'
A solution is to put the replacement in a lambda:
>>> re.sub('a', lambda _: '_', 'aa')
'__'
because the return value of the lambda is treated by re.sub() as a literal string.
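A small sketch of two ways to insert replacement text literally, assuming a made-up replacement value that contains backslashes. The second variant relies on the fact that a doubled backslash in a replacement template produces a single literal backslash:

```python
import re

# Text we want inserted literally: backslash, 1, dot, backslash, n (5 characters).
replacement = r"\1.\n"

# A function's return value is inserted as-is, with no escape processing.
via_function = re.sub("a", lambda _match: replacement, "aa")

# Alternatively, double every backslash so the template yields single ones.
via_doubling = re.sub("a", replacement.replace("\\", "\\\\"), "aa")

print(via_function)
print(via_doubling)
```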
Usually you escape a string before feeding it into a regex so that the regex treats its characters literally. Remember that what you type into your computer is just a sequence of characters. When you see \n in your editor, it is not a newline until some parser decides it is; it is two characters, a backslash followed by n. In an ordinary Python string literal, the Python parser turns "\n" into a single newline character, which print then displays as a line break. If you write r"\n" instead, Python keeps exactly the two characters you typed (as far as I understand).
To complicate things further, regexes have their own syntax. The regex parser interprets the string it receives by its own rules, which differ from those of Python string literals. I believe this is why we are recommended to pass raw strings like r"(\n+)": so that the regex receives what you actually typed. However, the regex will still treat a parenthesis as grouping syntax and won't match it as a literal parenthesis unless you say so explicitly using the regex's own escaping rules. For that you need something like r"(fun \( x : nat \) :)": here the outer parentheses are not matched literally, since without backslashes they form a capture group, but the inner, backslash-escaped ones are matched as literal parentheses.
Thus we usually call re.escape() on text that we want to be interpreted literally, i.e. characters that the regex parser would otherwise treat as syntax (e.g. parentheses and spaces) get escaped. For example, code I have in my app:
# Escape special characters so the text matches as an arbitrary literal string. I think this also helps distinguish the escaped literal text from the regex syntax we insert on the next line.
__ppt = re.escape(_ppt)  # e.g. a parenthesis ( is then not interpreted as a group but matched literally
e.g. see these strings:
_ppt
Out[4]: '(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
__ppt
Out[5]: '\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
print(rf'{_ppt=}')
_ppt='(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
print(rf'{__ppt=}')
__ppt='\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
The double backslashes, I believe, are there so that the regex receives a literal backslash.
By the way, I am surprised it printed double backslashes instead of a single one. If anyone can comment on that, it would be appreciated. I'm also curious how to match literal backslashes in the regex now. I assume it takes 4 backslashes, but I honestly expected that only 2 would be needed because of the raw string r construct.
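On the double-backslash question, a small sketch suggesting that the doubling comes from how the value is displayed, not from the string itself: Out[...] and the {name=} form both show the repr, which is a Python literal in which every backslash is written twice. The example also shows that a raw two-backslash pattern is enough to match one literal backslash (the path value is made up for illustration):

```python
import re

escaped = re.escape("(x)")

print(escaped)        # shows the actual characters, single backslashes
print(repr(escaped))  # shows a Python literal, so each backslash appears doubled

# Matching one literal backslash: the regex needs two backslash characters,
# written as r"\\" in a raw literal or "\\\\" in a normal literal.
path = r"C:\temp"
found = re.search(r"\\", path)

print(found.group(), found.start())
```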
I cannot understand why my simple String equality test is returning false.
Code is:
boolean isDevelopment() {
    //config.project_stage is set to "Development"
    String cfgvar = "${config.project_stage}"
    String comp = "Development"
    assert cfgvar.equals(comp)
}
Result is:
assert cfgvar.equals(comp)
       |      |      |
       |      false  Development
       Development
I also get false if I do:
assert cfgvar == comp
toString() is not necessary. Most probably you have some trailing
spaces in config.project_stage, so they are also retained in cfgvar.
comp has no extra spaces, as can be seen from your code.
Initially the expression "${config.project_stage}" is of GString
type, but since you assign it to a variable typed as String,
it is coerced just to String, so toString() will not change anything.
It is up to you whether you use equals(...) or ==.
Actually Groovy silently translates the second form to the first.
So, to sum up, you can write assert cfgvar.trim() == comp.
You can also trim cfgvar at the very beginning, writing:
cfgvar = "${config.project_stage}".trim()
and then not worry about any trailing spaces.
Have you checked for trailing spaces? At least your output has one for the first Development. Try a .trim() when you compare those strings (and maybe a .toLowerCase() too).
And remember: .equals() in Groovy is a pointer comparison. What you want to do is ==. Yes, just the opposite of how it is defined in Java, but the Groovy definition makes more sense :-)
Update: see comment by @tim_yates - I mixed .equals() up with .is()
One of the objects you are comparing is not a String but a GString, try:
cfgvar.toString().equals(comp)
However your code works with groovy v. 2.4.5. Which version are you using?
Summary: '{key:spec}'.format_map(dic) formats the value from dic that is accessed by key, and spec says how it should be formatted. However, what if I want the separating colon to be part of the key? How do I tell Python that the colon is not a separator and that the following characters are not a format specification?
Details: I use string templates for transforming XML attributes to another text. Say I have the attributes of an XML element in the attributes dictionary. One of them has the key 'xlink:href' (the literal name of the attribute). When using the .format_map() method, how should the format string be written?
'{xlink:href}'.format_map(attributes) does not work. Python complains with KeyError: 'xlink'. (The href would probably be considered a bad specification, but the exception stops further processing.)
There is no way to escape the colon in {xlink:href}.
You can't specify arbitrary keys in the replacement field:
replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name ::= arg_name ("." attribute_name | "[" element_index "]")*
arg_name ::= [identifier | integer]
attribute_name ::= identifier
element_index ::= integer | index_string
index_string ::= <any source character except "]"> +
conversion ::= "r" | "s" | "a"
format_spec ::= <described in the next section>
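Since the grammar gives no way to put a colon into the field name, one possible workaround (a sketch, not the only option) is to rename the keys before formatting. The attribute values and the underscore convention below are assumptions made for illustration, and the renaming would collide if the dictionary already contained a key such as 'xlink_href':

```python
# Hypothetical attribute dictionary, for illustration only.
attributes = {'xlink:href': 'http://example.com/', 'id': 'a1'}

try:
    '{xlink:href}'.format_map(attributes)
    error = None
except KeyError as exc:
    # The part before the colon is taken as the key, "href" as the format spec.
    error = exc.args[0]

# Workaround sketch: rename the keys so that they contain no colon.
safe_attributes = {key.replace(':', '_'): value for key, value in attributes.items()}
link = '<a href="{xlink_href}" id="{id}">'.format_map(safe_attributes)

print(error)
print(link)
```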
I saw the operator r#"" in Rust but I can't find what it does. It came in handy for creating JSON:
let var1 = "test1";
let json = r#"{"type": "type1", "type2": var1}"#;
println!("{}", json) // => {"type": "type1", "type2": var1}
What's the name of the operator r#""? How do I make var1 evaluate?
I can't find what it does
It has to do with string literals and raw strings. I think it is explained pretty well in this part of the documentation; the code block posted there shows what it does:
"foo"; r"foo"; // foo
"\"foo\""; r#""foo""#; // "foo"
"foo #\"# bar";
r##"foo #"# bar"##; // foo #"# bar
"\x52"; "R"; r"R"; // R
"\\x52"; r"\x52"; // \x52
It negates the need to escape special characters inside the string.
The r character at the start of a string literal denotes a raw string literal. It's not an operator, but rather a prefix.
In a normal string literal, there are some characters that you need to escape to make them part of the string, such as " and \. The " character needs to be escaped because it would otherwise terminate the string, and the \ needs to be escaped because it is the escape character.
In raw string literals, you can put an arbitrary number of # symbols between the r and the opening ". To close the raw string literal, you must have a closing ", followed by the same number of # characters as there are at the start. With zero or more # characters, you can put literal \ characters in the string (\ characters do not have any special meaning). With one or more # characters, you can put literal " characters in the string. If you need a " followed by a sequence of # characters in the string, just use the same number of # characters plus one to delimit the string. For example: r##"foo #"# bar"## represents the string foo #"# bar. The literal doesn't stop at the quote in the middle, because it's only followed by one #, whereas the literal was started with two #.
To answer the last part of your question, there's no way to have a string literal that evaluates variables in the current scope. Some languages, such as PHP, support that, but not Rust. You should consider using the format! macro instead. Note that for JSON, you'll still need to double the braces, even in a raw string literal, because the string is interpreted by the macro.
fn main() {
    let var1 = "test1";
    let json = format!(r#"{{"type": "type1", "type2": {}}}"#, var1);
    println!("{}", json) // => {"type": "type1", "type2": test1}
}
If you need to generate a lot of JSON, there are many crates that will make it easier for you. In particular, with serde_json, you can define regular Rust structs or enums and have them serialized automatically to JSON.
The first time I saw this weird notation was in the glium tutorials (an old crate for graphics management), where it is used to "encapsulate" GLSL (GL Shading Language) code and pass it to the GPU's shaders:
https://github.com/glium/glium/blob/master/book/tuto-02-triangle.md
As far as I understand, the content of r#"..."# is left untouched and is not interpreted in any way. Hence raw string.