Remove special characters in sed script - linux

I'm having a hard time removing º and ª in a sentence.
For instance, given this line:
s. ex.ª mandava: e como esse inverno ia seco
I would like to remove "s. ex.ª", ending up with:
mandava: e como esse inverno ia seco
I have tried (without success) the following regexes:
s/s. ex.\ª//g
s/s. ex.ª//g
s/s. ex.[ºª]//g
What am I doing wrong?

Changing the input file's encoding to UTF-8 solved my problem, so I'm marking the question as solved.
Thanks again to @ikegami for pointing me in the right direction.
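The post doesn't say which encoding the file had before. Assume it was Latin-1 (ISO-8859-1) and the sed script was saved as UTF-8. Under that assumption, this Python sketch shows why the patterns never matched: the UTF-8 bytes for ª do not occur in Latin-1 data. It also shows that matching works once the text is decoded with its real encoding. On the command line, `iconv -f ISO-8859-1 -t UTF-8` performs the same conversion. The unescaped `.` in the attempted regexes matches any character. That is harmless here, but `\.` is stricter.

```python
import re

# Assumption (hypothetical): the input was Latin-1 while the script was UTF-8.
# In Latin-1 "ª" is the single byte 0xAA; in UTF-8 it is 0xC2 0xAA,
# so a UTF-8 pattern's bytes never appear in the Latin-1 data.
line = "s. ex.ª mandava: e como esse inverno ia seco\n"
latin1_bytes = line.encode("latin-1")

utf8_pattern_found = "ª".encode("utf-8") in latin1_bytes
latin1_pattern_found = "ª".encode("latin-1") in latin1_bytes

# Decode with the file's real encoding, then match characters, not bytes.
# The dots are escaped so they only match a literal "."; \s* eats the space.
text = latin1_bytes.decode("latin-1")
cleaned = re.sub(r"s\. ex\.[ºª]\s*", "", text)

print(utf8_pattern_found, latin1_pattern_found)  # False True
print(cleaned, end="")  # mandava: e como esse inverno ia seco
```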

Related

Adding and trimming spaces before and after separator

If a string contains text with a single special character (a separator),
how do I add a space after each period before the separator and remove every space after the separator?
Example strings:
.Dolo.rum ipsum primos# ar.deo
J.ust. simple text# h er.e
Another fr.e.e #. exe mpl e
Expected result:
. Dolo. rum ipsum primos#ar.deo
J. ust. simple text#her.e
Another fr. e. e #.exemple
Since an accepted answer can't be deleted, here is the solution mentioned in the first comment:
=SUBSTITUTE(LEFT(A1,FIND("#",A1)),".",". ")&SUBSTITUTE(MID(A1,FIND("#",A1)+1,LEN(A1))," ","")
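The formula splits the cell at the first "#". It adds a space after every period on the left side and deletes every space on the right side. The Python sketch below applies the same idea, assuming the separator is "#". A plain `.` → `. ` substitution turns "ust. simple" into "ust.  simple" (two spaces), so the sketch uses a lookahead to add a space only where none follows. The function name `space_and_trim` is hypothetical.

```python
import re

def space_and_trim(s: str, sep: str = "#") -> str:
    # Split at the first separator only, like FIND("#", A1) in the formula.
    left, found, right = s.partition(sep)
    if not found:
        return s  # assumption: strings without a separator are left unchanged
    # Add a space after each "." that is not already followed by a space.
    left = re.sub(r"\.(?! )", ". ", left)
    # Remove every space after the separator.
    right = right.replace(" ", "")
    return left + sep + right

for s in [".Dolo.rum ipsum primos# ar.deo",
          "J.ust. simple text# h er.e",
          "Another fr.e.e #. exe mpl e"]:
    print(space_and_trim(s))
```

All three sample strings come out as the expected results listed in the question.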

Python String cleanup from spaces and Unknown empty element

I got an attribute from a Selenium element and it contains empty characters or spaces:
When I double-click the result:
In VS Code:
What I tried so far:
string.replace(" ","")  # didn't work
So I came up with this workaround (I know it's bad):
edit1 = ticketID[:1]
ticketF = ticketID.replace(edit1,"")
edit2 = ticketF[:1]
ticketE = ticketF.replace(edit2,"")
edit3 = ticketE[:1]
ticketD = ticketE.replace(edit3,"")
What I'm looking for is: what are those blanks? Tabs? Newlines?
How can I make it better?
Edit:
ticketID.replace("\n","")
ticketID.replace(" ","")
ticketID.strip()
Those are basically whitespace characters; use .strip() to remove leading and trailing whitespace.
In Python, the strip methods can remove leading and trailing whitespace as well as specific characters you pass in. Whitespace here includes blanks, tabs (\t), carriage returns (\r), newlines (\n), and other lesser-known whitespace characters.
If ele is a web element, you can use .text to get its text and then call .strip() on that.
In your code, that would probably be:
ticketID.text.strip()
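To answer "what are those blanks?", `repr()` shows the escape for each invisible character, and `ord()` gives its code point. The sketch below uses a made-up value in place of the Selenium text, and the helper `describe_blanks` is hypothetical. Python string methods such as `replace` and `strip` return a new string rather than changing the original. If the result of `ticketID.replace(...)` is not assigned, it is lost.

```python
from typing import List

def describe_blanks(text: str) -> List[str]:
    """Return repr and code point of each whitespace or non-printable char."""
    return [f"{ch!r} U+{ord(ch):04X}" for ch in text
            if ch.isspace() or not ch.isprintable()]

# Hypothetical value standing in for the Selenium element's text.
ticket_id = "\n\t  TCK-1234 \r\n"

print(repr(ticket_id))             # '\n\t  TCK-1234 \r\n'
print(describe_blanks(ticket_id))

# strip() returns a new string, so assign the result.
cleaned = ticket_id.strip()
print(cleaned)  # TCK-1234
```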
They look like newlines, not spaces, to me:
string.replace("\n","")

How to replace part of a string with an added condition

The problem:
The objective is to convert: "tan(x)*arctan(x)"
Into: "np.tan(x)*np.arctan(x)"
What I've tried:
s = "tan(x)*arctan(x)"
s = s.replace('tan','np.tan')
Out: np.tan(x)*arcnp.tan(x)
However, Python's replace method produced arcnp.tan.
Taking one additional step:
s = s.replace('arcnp.', 'np.arc')
Out: np.tan(x)*np.arctan(x)
Achieves the desired result... but this solution is sloppy and inefficient.
Is there a more efficient solution to this problem?
Any help is appreciated. Thanks in advance.
Here is a way to do the job:
var string = 'tan(x)*arctan(x)';
var res = string.replace(/\b(?:arc)?tan\b/g,'np.$&');
console.log(res);
Explanation:
/ : regex delimiter
\b : word boundary, make sure we don't have any word character before
(?:arc)? : non-capturing group, literally 'arc', optional
tan : literally 'tan'
\b : word boundary, make sure we don't have any word character after
/g : regex delimiter, global flag
Replace:
$& : the whole match, i.e. tan or arctan
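The question uses Python, so here is the same regex with Python's `re.sub`. In the replacement string, `\g<0>` plays the role of JavaScript's `$&` and inserts the whole match. The pattern is compiled once, and the helper name `add_np_prefix` is hypothetical.

```python
import re

# Optional "arc" prefix, then "tan", with word boundaries on both sides,
# so "arctan" is prefixed as one word and "xxxtan" is not touched.
TAN_PATTERN = re.compile(r"\b(?:arc)?tan\b")

def add_np_prefix(expr: str) -> str:
    # \g<0> is the entire match (tan or arctan).
    return TAN_PATTERN.sub(r"np.\g<0>", expr)

print(add_np_prefix("tan(x)*arctan(x)"))  # np.tan(x)*np.arctan(x)
```

The pattern also matches the tan in an existing "np.tan", because there is a word boundary after the dot. If the input may already contain prefixed names, a lookbehind such as `(?<!\.)` in front of `\b` avoids doubling the prefix.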
You can use a regular expression to solve this. The following code is in JavaScript, since you didn't mention which language you are using.
var string = 'tan(x)*arctan(x)*xxxtan(x)';
console.log(string.replace(/([a-z]+)?(tan)/g,'np.$1$2'));

Escape character for parameter in function of EL

I have this element in my page:
<p:panel header="Advies van de dienst aangemaakt op: #{of:formatDate(advies.aangemaaktOp, 'dd/MM/yyyy HHumm')}">
It renders to:
Advies van de dienst aangemaakt op: 30/03/2016 14357
It should be:
Advies van de dienst aangemaakt op: 30/03/2016 14u57
How do I achieve this? I know I need to escape the 'u' character, but since it sits in a function parameter inside an EL expression, I can't find a way to do it.
The suggestion I found online, quoting it with a single quote (tick) ', doesn't work.
I also tried escaping with a backslash, but no luck there either.
The solution is 'dd/MM/yyyy HH\'u\'mm'. I had to escape the tick itself.
Complete element:
<p:panel header="Advies van de dienst aangemaakt op: #{of:formatDate(advies.aangemaaktOp, 'dd/MM/yyyy HH\'u\'mm')}">

What encoding is this and how can I decode it?

I've got an old project file with translations to Portuguese where special characters are broken:
error.text.required=\u00C9 necess\u00E1rio o texto.
error.categoryid.required=\u00C9 necess\u00E1ria a categoria.
error.email.required=\u00C9 necess\u00E1rio o e-mail.
error.email.invalid=O e-mail \u00E9 inv\u00E1lido.
error.fuel.invalid=\u00C9 necess\u00E1rio o tipo de combust\u00EDvel.
error.regdate.invalid=\u00C9 necess\u00E1rio ano de fabrica\u00E7\u00E3o.
error.mileage.invalid=\u00C9 necess\u00E1ria escolher a quilometragem.
error.color.invalid=\u00C9 necess\u00E1ria a cor.
Can you tell me how to decode the file to use the common Portuguese letters?
Thanks
The "\u" is the prefix of a Unicode escape sequence. You can use the strings as-is, and the diacritics will show in the output. In Python it would be something like:
print(u"\u00C9 necess\u00E1rio o texto.")
which outputs:
É necessário o texto.
Otherwise, you need to convert them to their ASCII equivalents. You can do a simple find/replace. I wrote a function like that for converting Romanian diacritics a while ago, but I had dynamic strings coming in...
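A different technique from a hand-written find/replace table is Unicode decomposition. NFKD normalization splits a letter such as "á" into "a" plus a combining accent, and the accent can then be dropped. This is a sketch, and the function name is hypothetical. Letters that have no decomposition, such as "ß" or "ø", are left unchanged.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # Decompose accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("É necessário o texto."))  # E necessario o texto.
```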
Smells like Unicode to me.
\u = prefix of a Unicode character
00E1 = hexadecimal code of the character (a 2-byte number)
I'm not sure what the format is (I would ask the sender), but I would try this approach to decode it.
found it ;)
http://www.fileformat.info/info/unicode/char/20/index.htm
Look at the tables with source code. This could be a C++ source file; this is how you write Unicode characters in source code.
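The decoding described in these answers replaces each `\uXXXX` escape with the character at that hexadecimal code point. The key=value lines look like a Java `.properties` file, where `\uXXXX` escapes are standard and `java.util.Properties.load` decodes them. This is an inference from the layout, not something the question states. To get a readable file, a Python sketch (helper name hypothetical) could rewrite each line like this:

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

def decode_u_escapes(text: str) -> str:
    # Replace each \uXXXX with the character for code point 0xXXXX.
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

line = r"error.email.invalid=O e-mail \u00E9 inv\u00E1lido."
print(decode_u_escapes(line))  # error.email.invalid=O e-mail é inválido.
```

Applied line by line, and written out as UTF-8, this turns the whole file into plain Portuguese text.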
