How do I replace a German umlaut in search strings (example: Yelp)?
Unfortunately, not all sites treat "ü" and "ue" as equivalent.
Use case: the search query input for a web crawler is not recognized as intended.
Example:
Original search string: Asiatische Fusionsküche
Modified search string: Asiatische Fusionskueche
Stuttgart, Baden-Wuerttemberg
site: yelp.de
----> returns different results! (or rather: no results)
I already tried:
1) ü in UTF
--> \u00fc
--> U+00FC
Was used like this:
String: "Asiatische Fusionsk\u00fcche"
String: "Asiatische FusionskU+00FCche"
2) ü in HTML
ü
String: "Asiatische Fusionsküche"
Any advice from experts?
Thanks
J.
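Two things that are not among the attempts listed above, as a minimal Python sketch: transliterating the umlaut ("ü" to "ue"), and percent-encoding the UTF-8 bytes of the original string, which is how non-ASCII characters are normally carried in a URL. Whether a particular site (Yelp included) accepts either form is an assumption that has to be checked against the site itself. The function names and the transliteration table are made up for illustration.

```python
from urllib.parse import quote, quote_plus

# Illustrative transliteration table (an assumption, not something the
# target site is known to require).
UMLAUT_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate(text: str) -> str:
    """Replace German umlauts and sharp s with their ASCII spellings."""
    return text.translate(UMLAUT_TABLE)


def percent_encode(text: str, plus_for_space: bool = False) -> str:
    """Percent-encode the UTF-8 bytes of text for use in a URL."""
    encoder = quote_plus if plus_for_space else quote
    return encoder(text, encoding="utf-8")


query = "Asiatische Fusionsküche"
print(transliterate(query))                        # Asiatische Fusionskueche
print(percent_encode(query))                       # Asiatische%20Fusionsk%C3%BCche
print(percent_encode(query, plus_for_space=True))  # Asiatische+Fusionsk%C3%BCche
```

Note that "\u00fc" and "U+00FC" are ways of writing the character in source code or documentation; pasted literally into a search box they are just ten or so ordinary characters, which would explain why those attempts returned nothing.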
Related
I want to extract an image URL from a string. The URL must contain a specific string such as J02487. The input string is shown below. It contains multiple URLs, but I only want the one that contains J02487.
{url:"https:\u002F\u002Fuk.louisvuitton.com\u002Fimages\u002Fis\u002Fimage\u002Flv\u002F1\u002FPP_VP_L\u002Flouis-vuitton-bandoulière-monogram-canvas-wallets-and-small-leather-goods--J02485_PM1_Closeup view.jpg"},{url:"https:\u002F\u002Fuk.louisvuitton.com\u002Fimages\u002Fis\u002Fimage\u002Flv\u002F1\u002FPP_VP_L\u002Flouis-vuitton-bandoulière-monogram-canvas-wallets-and-small-leather-goods--J02485_PM1_Back view.jpg"}],mediaUrl:bA,price:aL,priceRaw:bv,currency:aH,color:"Macadamia",size:a,material:bn,disambiguatingDescription:aG,colorIconURL:"https:\u002F\u002Fuk.louisvuitton.com\u002Fimages\u002Fis\u002Fimage\u002Flv\u002F1\u002FLV\u002Flouis-vuitton--MKC-LG-994_rose_clair.jpg",detailedDescription:"\u003Cul\u003E\n \u003Cli\u003EMacadamia Pink\u002FWhite \u003C\u002Fli\u003E\n \u003Cli\u003ENylon and Monogram coated canvas \u003C\u002Fli\u003E\n \u003Cli\u003ENylon lining \u003C\u002Fli\u003E\n \u003Cli\u003EGold-colour hardware\u003C\u002Fli\u003E\n\u003Cli\u003EStrap: Non-removable, adjustable\u003C\u002Fli\u003E\u003Cli\u003EStrap Drop: 32.0 cm\u002F12.6 inches\u003C\u002Fli\u003E\u003Cli\u003EStrap Drop Max: 52.0 cm\u002F20.5 inches\u003C\u002Fli\u003E\u003C\u002Ful\u003E",activateEngraving:b,sellable:d,cscSellable:d,mom:b,locateInStore:d,materialGroupCode:bf,backOrderDisclaimer:d,backOrderFullDeposit:b,dimensions:{values:{depth:aX,height:aw,width:aT},valuesAlt:{depth:bN,height:bL,width:bI},unitText:at,unitTextAlt:ar},productId:H},{identifier:"J02487",name:v,url:"\u002Feng-gb\u002Fproducts\u002Fbandouliere-monogram-nvprod2420049v#J02487",medias:[{url:cq},{url:"https:\u002F\u002Fuk.louisvuitton.com\u002Fimages\u002Fis\u002Fimage\u002Flv\u002F1\u002FPP_VP_L\u002Flouis-vuitton-bandoulière-monogram-canvas-wallets-and-small-leather-goods--J02487_PM1_Closeup view.jpg"}
The output should be this.
https:\u002F\u002Fuk.louisvuitton.com\u002Fimages\u002Fis\u002Fimage\u002Flv\u002F1\u002FPP_VP_L\u002Flouis-vuitton-bandoulière-monogram-canvas-wallets-and-small-leather-goods--J02487_PM1_Closeup view.jpg
I've tried this
\"(http.*?J02487.*?jpg)\"
and many other options but didn't find an accurate solution. Anyone who can help? :)
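You could use a negated character class, so the match cannot run across a closing quote and start at an earlier URL: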
"(https?:[^"]*J02487[^"]*\.jpg)"
" Match "
( Capture group 1
https?: Match http with optional s and :
[^"]*J02487[^"]* Match J02487 between any chars except "
\.jpg Match .jpg (note that the dot must be escaped)
)" Close group 1 and match "
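A small Python sketch of the pattern above, run against a shortened, made-up sample in the same shape as the data in the question (the example.com URLs are placeholders, not the real ones):

```python
import re

# The pattern from the answer: [^"]* cannot cross a double quote,
# so the match stays inside a single quoted value.
PATTERN = re.compile(r'"(https?:[^"]*J02487[^"]*\.jpg)"')

# Shortened, made-up sample in the same shape as the data in the question.
sample = (
    r'{url:"https:\u002F\u002Fexample.com\u002Fimg--J02485_PM1_Back view.jpg"}],'
    r'color:"Macadamia"},{identifier:"J02487",medias:[{url:cq},'
    r'{url:"https:\u002F\u002Fexample.com\u002Fimg--J02487_PM1_Closeup view.jpg"}'
)

match = PATTERN.search(sample)
url = match.group(1) if match else None
print(url)

# Optional: turn the literal \u002F escapes into real slashes.
if url is not None:
    print(url.replace(r"\u002F", "/"))
```

With the original attempt, `.*?` is allowed to match quotes as well, so the match begins at the first `http` in the input and stretches forward until it reaches J02487.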
I have a list of abbreviations (DIN, ISO, BS) that I want to search for within sentences. However, I only want to return a match if the abbreviation appears as a whole word.
EX:
Is this a DIN qualified part?
**return DIN
EX2:
What's for dinner?
**return nothing
Use FIND:
=IF(ISNUMBER(FIND(" "&"DIN"&" "," "&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,",",""),".",""),"?","")&" ")),"DIN","")
If case does not matter and you only need the whole word, change FIND to SEARCH. Add more SUBSTITUTE calls to remove any other punctuation that may be in the sentence.
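For readers who want to see the logic of the formula outside a spreadsheet, here is a rough Python equivalent: strip a few punctuation marks, pad both the sentence and the abbreviation with spaces, then do a case-sensitive substring test (like FIND). The function name and the punctuation set are illustrative assumptions.

```python
def find_abbreviation(sentence, abbreviations=("DIN", "ISO", "BS")):
    """Return the first abbreviation found as a whole word, else ''."""
    cleaned = sentence
    # Same idea as the nested SUBSTITUTE calls: drop punctuation first.
    for mark in ",.?":
        cleaned = cleaned.replace(mark, "")
    # Pad with spaces so a hit at the start or end still has a space
    # on both sides.
    padded = f" {cleaned} "
    for abbr in abbreviations:
        if f" {abbr} " in padded:  # case-sensitive, like FIND
            return abbr
    return ""


print(find_abbreviation("Is this a DIN qualified part?"))  # DIN
print(repr(find_abbreviation("What's for dinner?")))       # ''
```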
The following sentence contains all the accented characters (characters with diacritics) that are used in the Czech language.
příliš žluťoučký kůň úpěl ďábelské ódy
Now I convert this line to uppercase using gUU and I get:
PříLIš žLUťOUčKý Kůň úPěL ďáBELSKé óDY
instead of:
PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY
As you can see the characters with accents don't get converted. What do I have to set in my .vimrc to get it working right?
I have a table in DB2, say METAATTRIBUTE, in which a column, say "content", might contain any special character, including Unicode characters.
For a special character such as "#", I can simply search with:
Select * from METAATTRIBUTE where content like '%#%';
But how do I search for Unicode characters like "u201B" or "u201E"?
Thanks in advance.
Assuming you are talking about DB2 LUW, Unicode string literals are written with the prefix "u&" followed by a regular string literal in single quotes. Inside the literal, Unicode code points are introduced by an escape character, a backslash by default. For example:
$ db2 "values u&'\201b'"
1
---
‛
1 record(s) selected.
So your query would look like:
Select * from METAATTRIBUTE where content like u&'%\201b%';
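If the query is issued from application code rather than the command line, another option is to pass the character as a bound parameter, so no SQL-level escape syntax is needed. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; it is not DB2, and the assumption is that a DB2 driver with parameter binding would accept the same kind of Unicode string.

```python
import sqlite3

# sqlite3 is only a stand-in here (assumption: a DB2 driver with
# parameter binding handles Unicode parameters the same way).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE METAATTRIBUTE (content TEXT)")
conn.executemany(
    "INSERT INTO METAATTRIBUTE (content) VALUES (?)",
    [("plain text",), ("quote \u201b here",), ("low \u201e quote",)],
)

needle = "\u201b"  # U+201B, the character searched for in the question
rows = conn.execute(
    "SELECT content FROM METAATTRIBUTE WHERE content LIKE ?",
    (f"%{needle}%",),
).fetchall()

print(rows)  # only the row containing U+201B
```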
Recently, I had the same problem. This worked for me (MEDEDELINGSZONE is the column in my own table; use your own column name, e.g. content):
select *
from METAATTRIBUTE
where MEDEDELINGSZONE like '%' || UX'201B' || '%'
I've got an old project file with translations to Portuguese where special characters are broken:
error.text.required=\u00C9 necess\u00E1rio o texto.
error.categoryid.required=\u00C9 necess\u00E1ria a categoria.
error.email.required=\u00C9 necess\u00E1rio o e-mail.
error.email.invalid=O e-mail \u00E9 inv\u00E1lido.
error.fuel.invalid=\u00C9 necess\u00E1rio o tipo de combust\u00EDvel.
error.regdate.invalid=\u00C9 necess\u00E1rio ano de fabrica\u00E7\u00E3o.
error.mileage.invalid=\u00C9 necess\u00E1ria escolher a quilometragem.
error.color.invalid=\u00C9 necess\u00E1ria a cor.
Can you tell me how to decode the file to use the common Portuguese letters?
Thanks
The "\u" is the prefix for a Unicode escape. You can use the strings as they are, and the diacritics will show in the output. In Python 3, for example:
print("\u00C9 necess\u00E1rio o texto.")
which outputs:
É necessário o texto.
Otherwise, you need to convert them to their ASCII equivalents. You can do a simple find/replace. I ended up writing a function like that for converting Romanian diacritics a while ago, but I had dynamic strings coming in...
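As an alternative to a hand-written find/replace table, here is a minimal sketch of the ASCII conversion using Unicode normalization. This is a swapped-in technique, not the function mentioned above: it decomposes each accented letter into a base letter plus combining marks and drops the marks. Characters without such a decomposition (for example "ß" or "ø") are left unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("\u00C9 necess\u00E1rio o texto."))  # E necessario o texto.
```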
This looks like Unicode to me.
\u = prefix for a Unicode character
00E1 = hex code of the character (a 2-byte number)
I'm not sure what the file format is. I would ask the sender, but I would try this approach to decode it.
found it ;)
http://www.fileformat.info/info/unicode/char/20/index.htm
Look at the tables with source code. This could be a C++ source file. This is the way you write Unicode characters in source code.
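To convert a whole file rather than a single string, the "\u" plus four hex digits format described in the answers can be decoded line by line. A minimal Python sketch follows; the function name is made up, and it assumes every escape is exactly four hex digits and does not combine surrogate pairs.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(line: str) -> str:
    """Replace each \\uXXXX escape with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), line)


raw = r"error.email.invalid=O e-mail \u00E9 inv\u00E1lido."
print(decode_unicode_escapes(raw))  # error.email.invalid=O e-mail é inválido.
```

Applied to each line of the file and written back as UTF-8, this gives the text with ordinary Portuguese letters. Whether the consuming application then reads the file as UTF-8 is a separate question that depends on how the file is loaded.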