I am trying to read coordinates into Databricks in this format:
00°00'0.00"N
However, when I read the data, I get the following output:
00?00'0.00"N
The degree sign comes back as a question mark.
I tried to replace the character, but that did not work; I received the following error:
Dangling meta character '?' near index 0
? is a special character (a quantifier) in regex, and you probably tried to replace it using regexp_replace, which caused the error. Use replace instead: it treats the string to be replaced literally, not as a regex.
from pyspark.sql import functions as F
df2 = df.withColumn('col1', F.expr("replace(col1, '?', '°')"))
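The error comes from the regex engine, not from the data. A minimal plain-Python sketch of the same idea, using Python's re module as a stand-in for Spark's Java regex engine (the rule that a bare ? is a quantifier is the same in both), shows why the pattern fails and two ways around it:

```python
import re

raw = "00?00'0.00\"N"  # the mis-decoded value from the question

# A bare "?" is a quantifier with nothing to repeat, so compiling it fails.
try:
    re.compile("?")
    bare_pattern_ok = True
except re.error:
    bare_pattern_ok = False

# Escaping the metacharacter makes the regex match a literal "?".
via_regex = re.sub(re.escape("?"), "°", raw)

# str.replace never interprets its argument as a pattern.
via_literal = raw.replace("?", "°")

print(bare_pattern_ok, via_regex, via_literal)
```

In PySpark, the escaped-regex route should correspond to F.regexp_replace('col1', r'\?', '°'); the literal replace avoids regex rules altogether.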
I want to build a list of files with extend() while ignoring any file whose name ends with 0, as in _file0.h5. I have this line for collecting all the files into a list:
data_files_0.extend(sorted(glob(f"{directory}*_file{filenum}.h5")))
I am trying to learn how to use regex here, and I tried
filenum = re.match(r'[^0]')
placing it above the previous line, which gives the error
TypeError: match() missing 1 required positional argument: 'string'
I am pretty confused, and none of the examples combining f-strings with regex have helped me.
re.match doesn't search anything on its own: you have to pass it the string to test as a second argument. Leaving that out is what causes the missing-argument error.
For example, re.match('[^0]', "abc0123") checks whether the pattern '[^0]' matches at the start of the string "abc0123".
[^0] is likely the wrong pattern here, since it matches any single character other than 0. You might want something like .*0\.h5$, which (with re.match) matches any string ending with '0.h5'. You can also check out regexr.com, a very helpful site for figuring out how regex patterns work in general.
For the other part of the problem (finding the files), you can get all the filenames first, then keep only the ones that don't end with 0:
import re
from glob import glob
all_files = glob(f"{directory}*_file*.h5")
for f in all_files:
    # Skip names whose last character before .h5 is 0
    if not re.match(r'.*0\.h5$', f):
        data_files_0.append(f)
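A self-contained sketch of just the filtering step, with a hypothetical list of filenames in place of a real glob() call so it runs anywhere. It compares an anchored re.search pattern with two regex-free checks; note that all three treat _file10.h5 as "ending with 0", which may or may not be what you want:

```python
import fnmatch
import re

# Hypothetical names standing in for glob(f"{directory}*_file*.h5").
all_files = ["run_file0.h5", "run_file1.h5", "run_file2.h5", "run_file10.h5"]

# "0" right before ".h5" at the very end; re.search scans the whole string,
# so no leading ".*" is needed (re.match would need one).
ends_in_zero = re.compile(r"0\.h5$")
kept_regex = sorted(f for f in all_files if not ends_in_zero.search(f))

# Regex-free check: drop the extension, then look at the last character.
kept_plain = sorted(f for f in all_files if not f[: -len(".h5")].endswith("0"))

# glob/fnmatch patterns support [!...] (a negated set), so the filter can
# live in the pattern itself, e.g. glob(f"{directory}*_file*[!0].h5").
kept_glob = sorted(fnmatch.filter(all_files, "*_file*[!0].h5"))

print(kept_regex)  # ['run_file1.h5', 'run_file2.h5']
```

The glob-pattern version is probably the simplest fit for the original extend() line, since it needs no regex at all.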
This problem might be very simple, but I find it a bit confusing, which is why I need help.
Related to a question I posted earlier (which has been solved), I have just noticed a new issue.
Source code:
from PyQt5 import QtCore, QtWidgets
app = QtWidgets.QApplication([])
def scroll():
    #QtCore.QRegularExpression(r'\b'+'cat'+'\b')
    item = listWidget.findItems(r'\bcat\b', QtCore.Qt.MatchRegularExpression)
    for d in item:
        print(d.text())
window = QtWidgets.QDialog()
window.setLayout(QtWidgets.QVBoxLayout())
listWidget = QtWidgets.QListWidget()
window.layout().addWidget(listWidget)
cats = ["love my cat","catirization","cat in the clouds","catść"]
for i,cat in enumerate(cats):
    QtWidgets.QListWidgetItem(f"{i} {cat}", listWidget)
btn = QtWidgets.QPushButton('Scroll')
btn.clicked.connect(scroll)
window.layout().addWidget(btn)
window.show()
app.exec_()
As you can see, I am just trying to print the text of the items that match the regex r"\bcat\b" when I press the "Scroll" button, and it seems to work fine:
Output:
0 love my cat
2 cat in the clouds
3 catść
However, item #3 should not be printed, because it does not match the regular expression r"\bcat\b". It is printed anyway, and I suspect the non-ASCII characters ść are what make it match (which they shouldn't, right?).
I'm expecting an output like:
0 love my cat
2 cat in the clouds
What I have tried
I found this question, which mentions \p{L}; according to the answer:
If all you want to match is letters (including "international" letters) you can use \p{L}.
To be honest, I'm not sure how to apply that in PyQt5, but I tried changing the regex to r'\b'+r'\p{cat}'+r'\b' and got this error:
QString::contains: invalid QRegularExpression object
QString::contains: invalid QRegularExpression object
QString::contains: invalid QRegularExpression object
QString::contains: invalid QRegularExpression object
The error clearly says the regex is invalid. Can someone explain how to solve this issue? Thank you!
In general, to make shorthand character classes and word boundaries Unicode-aware, you need to pass the QRegularExpression.UseUnicodePropertiesOption option to the regex compiler. See the QRegularExpression.UseUnicodePropertiesOption reference:
The meaning of the \w, \d, etc., character classes, as well as the meaning of their counterparts (\W, \D, etc.), is changed from matching ASCII characters only to matching any character with the corresponding Unicode property. For instance, \d is changed to match any character with the Unicode Nd (decimal digit) property; \w to match any character with either the Unicode L (letter) or N (digit) property, plus underscore, and so on. This option corresponds to the /u modifier in Perl regular expressions.
In Python, you could declare it as
rx = QtCore.QRegularExpression(r'\bcat\b', QtCore.QRegularExpression.UseUnicodePropertiesOption)
However, since QListWidget.findItems does not accept a QRegularExpression object and only takes the pattern as a string, the alternative is the (*UCP) PCRE verb:
r'(*UCP)\bcat\b'
Make sure it appears at the very beginning of the pattern.
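PyQt5 can't be exercised here, so the following sketch shows the same word-boundary effect with Python's own re module. This is an analogy, not Qt's engine: Python makes \b Unicode-aware by default for str patterns, and the re.ASCII flag approximates PCRE's default ASCII-only behaviour, the one (*UCP) replaces with Unicode semantics.

```python
import re

# The four item texts from the question.
items = ["0 love my cat", "1 catirization", "2 cat in the clouds", "3 catść"]

# ASCII-only semantics (like PCRE without (*UCP)): "ś" is not a word
# character, so there is a word boundary between "t" and "ś".
ascii_hits = [s for s in items if re.search(r"\bcat\b", s, re.ASCII)]

# Unicode semantics (what (*UCP) enables in PCRE, Python's default for str
# patterns): "ś" counts as a letter, so "catść" has no boundary after "cat".
unicode_hits = [s for s in items if re.search(r"\bcat\b", s)]

print(ascii_hits)    # ['0 love my cat', '2 cat in the clouds', '3 catść']
print(unicode_hits)  # ['0 love my cat', '2 cat in the clouds']
```

In the question's code, the corresponding change would be a single line: listWidget.findItems(r'(*UCP)\bcat\b', QtCore.Qt.MatchRegularExpression).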
It's a weird problem. I have this string:
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120"
and the two strings below:
s1="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\[Content_Types].xml"
s2="D:\\Users\\UserKnown\\PycharmProjects\\ProjectKnown\\PT\\collections\\120\\_rels\.rels"
When I run:
s1.strip(to_be_stripped)
s2.strip(to_be_stripped)
I get these outputs:
'[Content_Types].x'
'_rels\\.'
If I use lstrip(), they will be:
'[Content_Types].xml'
'_rels\\.rels'
These are the outputs I expect.
However, if I replace ProjectKnown with zeus_pipeline everywhere:
to_be_stripped="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120"
And:
s2="D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120\\_rels\.rels"
Then s2.lstrip(to_be_stripped) returns '.rels'.
If I use / instead of \\, nothing goes wrong. I am wondering why this problem happens.
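strip isn't meant to remove an exact substring. Rather, its argument is treated as a set of characters, and any of those characters are removed from the start and end of the string until a character not in the set is reached.
In your case, the variable to_be_stripped contains the characters m and l, so those are stripped from the end of s1. However, it doesn't contain the character x, so the stripping stops there and no characters beyond that are removed. lstrip works the same way, only from the left: with zeus_pipeline in the path, to_be_stripped also contains _, and since every other character of _rels\ (r, e, l, s and the backslash) was already in the set, lstrip only stops at the dot.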
Check out this question. The accepted answer is probably more extensive than you need; I like another user's suggestion of using replace instead of strip. This would look like:
s1.replace(to_be_stripped, "")
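A short sketch contrasting the character-set behaviour with two prefix-aware alternatives that are not in the original answer: str.removeprefix (Python 3.9+), which removes the exact prefix once and only at the start (replace would also remove it anywhere else it occurs), and pathlib.PureWindowsPath.relative_to, which expresses "path relative to this folder" directly and works on any OS:

```python
from pathlib import PureWindowsPath

to_be_stripped = "D:\\Users\\UserKnown\\PycharmProjects\\zeus_pipeline\\PT\\collections\\120"
s2 = to_be_stripped + "\\_rels\\.rels"

# lstrip treats its argument as a *set* of characters, not a prefix:
# "_", "r", "e", "l", "s" and "\\" all occur in to_be_stripped, so they go too.
char_set_result = s2.lstrip(to_be_stripped)

# removeprefix removes the exact prefix, once, and only if it is present.
prefix_result = s2.removeprefix(to_be_stripped + "\\")

# For paths, asking for the relative path states the intent directly.
path_result = str(PureWindowsPath(s2).relative_to(to_be_stripped))

print(char_set_result)  # .rels
print(prefix_result)    # _rels\.rels
print(path_result)      # _rels\.rels
```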
I'm using QueryProvider from msticpy.data.data_provider to run a Kusto query in a Jupyter notebook.
The query extracts a specific part of a string that typically has the form session (other text); I want the (other text) part, hence the extract function on line 5.
Since the content of (other text) varies, I used \w+ in the regex.
The query keeps failing with a syntax error. I have tried escaping the characters, but that has no effect; the same error appears. Does anyone have an idea what the issue is, or can you point me to any resources?
Screenshot of current code and error returned
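You need to escape the backslash, or use a verbatim string literal prefixed with @ (see: https://learn.microsoft.com/en-us/azure/kusto/query/scalar-data-types/string#string-literals).
Regardless, you'd be better off using the parse operator. The first query below uses extract; the second does the same with parse: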
print s = "session abc"
| extend session = extract(@"session (\w+)", 1, s)
print s = "session abc"
| parse s with "session " session
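Because the query text is written inside a Python notebook, there are two layers of escaping: Python's and Kusto's. A minimal sketch, assuming the query is held in a Python string before being handed to msticpy: a Python raw string keeps \w intact, and the Kusto @"..." verbatim literal keeps Kusto from treating \w as an escape. The local re check only approximates Kusto's RE2 engine, which should behave the same for this simple pattern; the qry_prov name in the comment is hypothetical and nothing here calls msticpy.

```python
import re

# Python raw string, so the backslash in \w reaches Kusto unchanged;
# inside the query, @"..." makes it a Kusto verbatim string literal.
kql_extract = r'''print s = "session abc"
| extend session = extract(@"session (\w+)", 1, s)'''

# The parse-operator version needs no regex and therefore no escaping.
kql_parse = '''print s = "session abc"
| parse s with "session " session'''

# Local sanity check of the same pattern: group 1 holds the captured text.
m = re.search(r"session (\w+)", "session abc")
print(m.group(1))  # abc

# With a connected QueryProvider (hypothetically named qry_prov), the string
# would then be passed on, e.g. qry_prov.exec_query(kql_parse); not run here.
```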
An XPath identified with Firebug is not working in Protractor. I created the XPath using Firebug. When I locate the element with that XPath in the IDE, it works fine; however, when I use the same XPath in Protractor, it does not work. My element has no id or name, so XPath is my only option.
I need to verify that this element contains the text "IRCTC Attractions".
Could you please help me?
HTML code:
<div style="width:100%;" class="g_hedtext">IRCTC Attractions</div>
Find the element by text and assert it's present:
var elm = element(by.xpath("//div[. = 'IRCTC Attractions']"));
expect(browser.isElementPresent(elm)).toBe(true);
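Protractor can't run here, but the XPath predicate itself can be sanity-checked. A minimal sketch using Python's xml.etree.ElementTree, whose limited XPath subset supports [.='text'] (Python 3.7+) and [@attr='value']; the page fragment is a hypothetical, well-formed stand-in for the HTML in the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment built around the div from the question.
page = ET.fromstring(
    '<body>'
    '<div style="width:100%;" class="g_hedtext">IRCTC Attractions</div>'
    '<div class="other">Something else</div>'
    '</body>'
)

# [.='text'] compares the element's full text content with the literal.
by_text = page.findall(".//div[.='IRCTC Attractions']")

# Locating by class instead should land on the same element.
by_class = page.findall(".//div[@class='g_hedtext']")

print(len(by_text), by_text[0] is by_class[0])  # 1 True
```

If the real element's text carries surrounding whitespace, full XPath 1.0 (as used by browsers and Protractor) offers //div[normalize-space(.)='IRCTC Attractions']. Protractor's by.cssContainingText('.g_hedtext', 'IRCTC Attractions') is another option, though it checks that the text contains the string rather than equals it.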
OK, looking at your error message (in the comment):
Exception loading: SyntaxError:
C:\Users\XXXX\AppData\Roaming\npm\TC_model2.js:7
var disclaimermessage = element(by.xpath('//[#id='disclaimer-message']'));
^^^^^^^^^^ Unexpected identifier
(I'm guessing where the carets before "Unexpected identifier" were aligned. Is that right?)
The problem is that you've used single quotes both to delimit the string 'disclaimer-message' and to delimit the whole XPath expression '//[#id='disclaimer-message']'. So the parser sees the string literal '//[#id=' followed by disclaimer-message as a separate identifier, with no comma or other operator to show what it's doing there.
The solution is to use double quotes inside the XPath expression. XPath accepts either single or double quotes, as long as each opening quote is matched by the same kind of closing quote. Also note that XPath selects attributes with @ (not #) and that each step needs a node test such as *. So change the offending line to
var disclaimermessage = element(by.xpath('//*[@id="disclaimer-message"]'));
and you should be good to go.
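Protractor isn't available here, so this Python sketch mirrors the two points under stated assumptions: writing the broken line as Python source fails to parse for the same reason (the string literal ends at the second single quote), and the fixed expression, with single quotes outside and double quotes inside, selects an element with that id. The page fragment is hypothetical, and ElementTree's limited XPath needs a relative path starting with .// rather than //.

```python
import xml.etree.ElementTree as ET

# The same quoting mistake as in the JavaScript line, written as Python source.
broken_src = "xpath = '//*[@id='disclaimer-message']'"
try:
    compile(broken_src, "<example>", "exec")
    broken_parses = True
except SyntaxError:
    broken_parses = False  # the literal ends at "id='", leaving a bare name

# Fix: single quotes around the whole expression, double quotes inside.
xpath = './/*[@id="disclaimer-message"]'

# Hypothetical page fragment to check the fixed expression finds the element.
page = ET.fromstring('<body><p id="disclaimer-message">Disclaimer</p></body>')
found = page.find(xpath)

print(broken_parses, found.text)  # False Disclaimer
```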
For future reference, this question would have been quicker and easier to answer if you had told us about the error message in the first place.