Python3: convert apostrophe unicode string

I have a string value with an apostrophe like this:
"I\\xE2\\x80\\x99m going now."
How can I get the correct apostrophe value?
"I’m going now."
As you know, \xE2\x80\x99 is the UTF-8 encoding of the Unicode character U+2019 RIGHT SINGLE QUOTATION MARK, but I have a string representation instead of byte...

Perhaps this is what you want:
utf8_apostrophe = b'\xe2\x80\x99'.decode("utf8")
text = "I" + utf8_apostrophe + "m going now"
Aside:
I ran into this when converting a single quotation mark, within a UTF-8-encoded tweet, into a normal single quote.
import re
original_tweet = 'I’m going now'
string_apostrophe = "'"
print(re.sub(utf8_apostrophe, string_apostrophe, original_tweet))
which produces
I'm going now
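The string in the question holds literal backslash escapes rather than bytes, so the escapes have to be interpreted before decoding. A minimal sketch of one way to do that, assuming the string is pure ASCII apart from the \xNN escapes (the variable names are illustrative):

```python
# The input holds literal backslashes: I\xE2\x80\x99m going now.
raw = "I\\xE2\\x80\\x99m going now."

# Step 1: interpret the \xNN escapes. Each one becomes a single code point
# (U+00E2, U+0080, U+0099), which is not yet the character we want.
escaped = raw.encode("ascii").decode("unicode_escape")

# Step 2: map those code points back to bytes (latin-1 is a 1:1 mapping
# for 0x00-0xFF), then decode the bytes as UTF-8.
fixed = escaped.encode("latin-1").decode("utf-8")

print(fixed)  # I’m going now.
```

If the input can contain non-ASCII characters next to the escapes, the first `encode("ascii")` step will raise, so this sketch would need adjusting.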

Related

Regular expression for splitting a comma with the string

I need to split string data on commas.
This is the Excel data:
Please find the excel data
string strCurrentLine = "\"Himalayan Salt Body Scrub with Lychee Essential Oil from Majestic Pure, All Natural Scrub to Exfoliate & Moisturize Skin, 12 oz\",SKU_27,\"Tombow Dual Brush Pen Art Markers, Portrait, 6-Pack\",SKU_27,My Shopify Store 1,Valid,NonInventory";
Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
string[] lstColumnValues = CSVParser.Split(strCurrentLine);
I have attached the image. The problem is that I used the Regex to split the string on commas, but I need output like SKU_27: string[0] and string[2] contain the forward and backward slash. I need them to look like string[1], with the forward and backward slash removed.
The file seems to be a CSV file. A properly formatted CSV file uses double quotes to wrap strings that contain a comma, such as
id, name, date
1,"Some text, that includes comma", 2020/01/01
If you simply split the string on commas, the 2nd column will still carry its double quotes.
I'm not sure whether you are asking how to remove the double-quotes from lstColumnValues[0] and lstColumnValues[2], or add them to lstColumnValues[1].
To remove the double-quotes, just use Replace:
string myString = lstColumnValues[0].Replace("\"", "");
If you need to add them:
string myString = $"\"{lstColumnValues[1]}\"";
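The thread's code is C#, but the same idea (let a CSV parser handle the quotes instead of splitting and then stripping them) can be sketched in Python with the standard csv module. This is only an illustration of the technique, using the line from the question:

```python
import csv

line = ('"Himalayan Salt Body Scrub with Lychee Essential Oil from Majestic Pure, '
        'All Natural Scrub to Exfoliate & Moisturize Skin, 12 oz",SKU_27,'
        '"Tombow Dual Brush Pen Art Markers, Portrait, 6-Pack",SKU_27,'
        'My Shopify Store 1,Valid,NonInventory')

# csv.reader honours the quotes: commas inside them are kept as data,
# and the surrounding quotes are dropped from the parsed fields.
fields = next(csv.reader([line]))

print(len(fields))  # 7
print(fields[1])    # SKU_27
print(fields[2])    # Tombow Dual Brush Pen Art Markers, Portrait, 6-Pack
```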

Nested strings for text(,format) conversion

How do you use TEXT() with a format that has a string inside it?
=TEXT(A1,"Comfi+"#0"(JO)";"Comfi-"#0"(JO)")
I tried tripling the quotes around both inner strings:
=TEXT(A1," """Comfi+"""#0"""(JO)""";"""Comfi-"""#0"(JO)""" ")
I get the same result with &CHAR(34)&.
Similar issue here, but I couldn't transpose the solution to my problem : How to create strings containing double quotes in Excel formulas?
Post-solution edit:
I am building an almanac/calendar with the following (now fixed) formula:
=CONCATENATE(
TEXT(Format!K25,"d"),
" J+",
Format!S25,
" ",
TEXT(Format!AA25,"""Comfi+""#0""(JO)"";""Comfi-""#0""(JO)"""),
" ",
Format!AI25
)
Giving the following output in each cell :
9
J+70
Comfi+21(JO)
CRG
You've got too many quotation marks inside:
=TEXT(A1,"""Comfi+""#0""(JO)"";""Comfi-""#0""(JO)""")
You were tripling many of the inside quotation marks.
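The rule behind the fix is mechanical: inside an Excel string literal every double quote is doubled, and then the whole thing is wrapped in one more pair of quotes. A small Python sketch of that rule, where `excel_string_literal` is a hypothetical helper written for this illustration and not part of any library:

```python
def excel_string_literal(text: str) -> str:
    """Hypothetical helper: wrap text as an Excel formula string literal."""
    # Double every embedded quote, then add the outer pair.
    return '"' + text.replace('"', '""') + '"'


# The number format the question wants to pass to TEXT().
number_format = '"Comfi+"#0"(JO)";"Comfi-"#0"(JO)"'

formula = "=TEXT(A1," + excel_string_literal(number_format) + ")"
print(formula)
# =TEXT(A1,"""Comfi+""#0""(JO)"";""Comfi-""#0""(JO)""")
```

The format starts and ends with a quote of its own, which is why the corrected formula opens and closes with three quotes and uses two everywhere else.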
Personally, doubling up double-quotes within a quoted string is something I try to avoid at all costs. You can 'escape' the text into literals with a backslash.
=TEXT(A1,"\C\o\m\f\i+#0\(\J\O\);\C\o\m\f\i-#0\(\J\O\)")
'alternately
="Comfi"&text(a1, "+#0;-#0")&"(JO)"
Not all of those actually need to be escaped; only reserved characters. However, I usually escape them all and let Excel sort them out.

Escaping quotes and delimiters in CSV files with Excel

I am trying to import a CSV file into Excel, using ; as the delimiter, but some columns contain ; and/or quotes.
My problem is: I can use double quotes to make Excel ignore the delimiters within a specific string, but if there is a double quote inside the string, Excel ignores delimiters up to that first double quote, but not after it.
I don't know if it's clear, it's not that easy to explain.
I will try to explain with a example :
Suppose I have this string this is a;test : I use double quotes around the string, to ignore the delimiter => It works.
Now if this string contains delimiters AND double quotes, my trick doesn't work anymore. For example, if I have the string this; is" a;test : the double quotes I added around the string ignore delimiters for the first part (the delimiter in the part this; is is correctly ignored), but since there is a double quote after it, Excel doesn't ignore the next delimiter in the a;test part.
I tried my best to be as clear as possible, I hope you'll understand what is the problem.
When reading a quoted string in a CSV file, Excel interprets each pair of double quotes ("") as a single double quote (").
so "this; is"" a;test" will be converted to one cell containing this; is" a;test
So replace all double-quotes in your strings with pairs of double quotes.
Excel will reverse this process when exporting as CSV.
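The same doubling convention is what Python's standard csv module applies by default, so it can be used to check what a correctly escaped line looks like. A minimal sketch, assuming ; as the delimiter as in the question (the second column, `plain`, is made up for illustration):

```python
import csv
import io

row = ['this; is" a;test', 'plain']

# Write the row with ';' as delimiter. The writer quotes the first field
# because it contains the delimiter and a quote, and doubles the inner quote.
buffer = io.StringIO()
csv.writer(buffer, delimiter=';').writerow(row)
encoded = buffer.getvalue().rstrip('\r\n')
print(encoded)  # "this; is"" a;test";plain

# Reading the line back reverses the doubling.
decoded = next(csv.reader([encoded], delimiter=';'))
print(decoded == row)  # True
```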
Here is some CSV
a,b,c,d,e
"""test1""",""",te"st2,"test,3",test"4,test5
And this is how it looks after importing into Excel:
Import your Excel file into OpenOffice and export it as CSV (columns escaped with ", unlike Excel CSV; UTF-8; comma instead of ";").

Reading from a string using sscanf in Matlab

I'm trying to read a string in a specific format
<ahref="/teams/spain/real-sociedad-de-futbol/2028/">RealSociedad</a>
this is one example of string and what I want to extract is the name of the team.
I've tried something like this,
houseteam = sscanf(str, '%s');
but it does not work, why?
Your post asks about sscanf, but from the comments you'd like to see this done with regexprep, as in your post above. It takes two nested regexprep calls to retrieve the team name (i.e. RealSociedad), given that str is in the format you have provided:
str = '<ahref="/teams/spain/real-sociedad-de-futbol/2028/">RealSociedad</a>';
houseteam = regexprep(regexprep(str, '^<a(.*)">', ''), '</a>$', '')
This looks very intimidating, but let's break this up. First, look at this statement:
regexprep(str, '^<a(.*)">', '')
How regexprep works is you specify the string you want to analyze, the pattern you are searching for, then what you want to replace this pattern with. The pattern we are looking for is:
^<a(.*)">
This says you are looking for patterns where the string starts with <a. After this, the (.*)"> performs a greedy match: we want the longest sequence of characters up to the characters ">. As such, what the regular expression will match is the following string:
<ahref="/teams/spain/real-sociedad-de-futbol/2028/">
We then replace this with a blank string. As such, the output of the first regexprep call will be this:
RealSociedad</a>
We want to get rid of the </a> string, and so we would make another regexprep call where we look for the </a> at the end of the string, then replace this with the blank string yet again. The pattern you are looking for is thus:
</a>$
The dollar sign ($) symbolizes that this pattern should appear at the end of the string. If we find such a pattern, we will replace it with the blank string. Therefore, what we get in the end is:
RealSociedad
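For readers who want to try the two-step stripping outside MATLAB, here is a rough Python equivalent using re.sub with the same two patterns. The input is assumed to be the anchor tag quoted in the explanation above (matched prefix plus remainder):

```python
import re

s = '<ahref="/teams/spain/real-sociedad-de-futbol/2028/">RealSociedad</a>'

# First pass: remove the opening tag, from the leading <a up to the last ">.
without_open = re.sub(r'^<a(.*)">', '', s)
print(without_open)  # RealSociedad</a>

# Second pass: remove the closing tag anchored at the end of the string.
team = re.sub(r'</a>$', '', without_open)
print(team)  # RealSociedad
```

Because (.*) is greedy, the first pattern would swallow too much if the string contained a second "> later on, so this only suits inputs shaped like the one shown.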
Found a solution. So, %s stops when it finds a space.
str = regexprep(str, '<', ' <');
str = regexprep(str, '>', '> ');
houseteam = sscanf(str, '%*s %s %*s');
This puts spaces around my desired string, so %s can pick it out.
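The same whitespace trick can be sketched in Python: pad the angle brackets with spaces, split on whitespace, and take the middle token. The input string is assumed to be the tag from the question, with no spaces inside it:

```python
s = '<ahref="/teams/spain/real-sociedad-de-futbol/2028/">RealSociedad</a>'

# Put a space before every '<' and after every '>' so the tags and the
# team name become separate whitespace-delimited tokens.
padded = s.replace('<', ' <').replace('>', '> ')

# tokens: opening tag, team name, closing tag
tokens = padded.split()
team = tokens[1]
print(team)  # RealSociedad
```

Like the sscanf version, this breaks if the tag or the team name itself contains a space.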

Is there a way to add quotes to a multi paragraph string

I wrote the following line:
string QuoteTest2 = "Benjamin Netnayahu,\"BB\", said that: \"Israel will not fall\"";
This example went well, but what can I do in case I want to write a multi paragraph string including quotes?
The following example shows that putting '@' before the string doesn't cut it:
string QuoteTest2 = @"Benjamin Netnayahu,\"BB\", said that: \"Israel will not fall\"";
The string ends at the second quote and the rest just gives me errors. What should I do?
In a verbatim string (@"..."), escape each double quote by doubling it ("").
e.g.
string QuoteTest2 = @"Benjamin Netnayahu,""BB"", said that: ""Israel will not fall""";
