Trouble in decoding the Path of the Blob file in Azure Search

I have set up an Azure Search index for blob storage, and since the path of the file is the key property, it is encoded in Base64.
When searching the index, I need to decode the path and display it in the front end. But when I try to do that, it throws an error in a few scenarios.
int mod4 = base64EncodedData.Length % 4;
if (mod4 > 0)
{
base64EncodedData += new string('=', 4 - mod4);
}
var base64EncodedBytes = System.Convert.FromBase64String(base64EncodedData);
return System.Text.Encoding.ASCII.GetString(base64EncodedBytes);
Please let me know what is the correct way to do it.
Thanks.

Refer to the Base64Encode and Base64Decode mapping functions; the encoding details are documented there.
In particular, if you're using .NET, you should use the HttpServerUtility.UrlTokenDecode method and then decode the resulting bytes as UTF-8, not ASCII.
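As a rough illustration of what that decoding involves (in Python rather than .NET, purely for brevity), the sketch below assumes the key was produced by the URL-token variant of Base64: `-` and `_` stand in for `+` and `/`, and the trailing `=` padding is replaced by a single digit giving the padding count. If that assumption holds, it would explain why padding by length alone fails for some keys. The function names are hypothetical, and the exact format should be checked against the mapping function documentation for your indexer settings.

```python
import base64


def url_token_encode(data: bytes) -> str:
    """Encode bytes in the assumed URL-token format (hypothetical helper)."""
    if not data:
        return ""
    encoded = base64.b64encode(data).decode("ascii")
    stripped = encoded.rstrip("=")
    pad_count = len(encoded) - len(stripped)
    # Swap in URL-safe characters and append the padding count as a digit.
    return stripped.replace("+", "-").replace("/", "_") + str(pad_count)


def url_token_decode(token: str) -> bytes:
    """Decode a token in the assumed URL-token format (hypothetical helper)."""
    if not token:
        return b""
    pad_count = int(token[-1])  # last character is the padding count
    body = token[:-1].replace("-", "+").replace("_", "/")
    return base64.b64decode(body + "=" * pad_count)


def decode_blob_path(token: str) -> str:
    """Recover the original path, decoding the bytes as UTF-8 (not ASCII)."""
    return url_token_decode(token).decode("utf-8")
```

Decoding as UTF-8 matters for paths with non-ASCII characters; an ASCII decoder would replace or reject those bytes.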

Related

Groovy - String created from UTF8 bytes has wrong characters

The problem came up when reading the result of a web service that returns JSON with Greek characters in it, specifically the city name Mykonos. Whatever encoding or conversion I use, it is always displayed as: ΜΎΚΟxCE?ΟΣ. But it should show: ΜΎΚΟΝΟΣ
With Powershell I was able to verify, that the web service is returning the correct characters.
I narrowed the problem down when the byte array gets converted to a String in Groovy. Below is code that reproduces the issue I have. myUTF8String holds the byte array I get from URLConnection.content.text. The UTF8 byte sequence to look at is 0xce, 0x9d. After converting this to a string and back to a byte array the byte sequence for that character is 0xce, 0x3f. The result of below code will show the difference at position 9 of the original byte array and the one from the converted string. For the below test I'm using Groovy Console 4.0.6.
Any hints on this one?
import java.nio.charset.StandardCharsets;
def myUTF8String = "ce9cce8ece9ace9fce9dce9fcea3"
def bytes = myUTF8String.decodeHex();
content = new String(bytes).getBytes()
for ( i = 0; i < content.length; i++ ) {
if ( bytes[i] != content[i] ) {
println "Different... at pos " + i
hex = Long.toUnsignedString( bytes[i], 16).toUpperCase()
print hex.substring(hex.length()-2,hex.length()) + " != "
hex = Long.toUnsignedString( content[i], 16).toUpperCase()
println hex.substring(hex.length()-2,hex.length())
}
}
Thanks a lot
Andreas

You have to specify the charset name when building a String from bytes; otherwise the default Java charset is used, and it's not necessarily UTF-8.
Charset.defaultCharset() - Returns the default charset of this Java virtual machine.
The same problem applies to String.getBytes(): pass the charset parameter to get the correct byte sequence.
Just change the following line in your code and the issue will disappear:
content = new String(bytes, "UTF-8").getBytes("UTF-8")
As an alternative, you can set the default charset for the whole JVM instance with the following command-line parameter:
java -Dfile.encoding=UTF-8 <your application>
But be careful, because it affects the whole JVM instance!
https://docs.oracle.com/en/java/javase/19/intl/supported-encodings.html#GUID-DC83E43D-52F6-41D9-8F16-318F3F39D54F
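The effect can be reproduced outside the JVM. The Python sketch below is only an illustration and assumes the platform default charset was windows-1252 (the question does not say which one it was). In that charset the byte 0x9d has no mapping, so it becomes a replacement character on decoding and a `?` (0x3f) on re-encoding, while an explicit UTF-8 round trip preserves every byte.

```python
# Same hex string as in the question.
raw = bytes.fromhex("ce9cce8ece9ace9fce9dce9fcea3")

# Round trip through an assumed single-byte default charset (cp1252).
# Unmappable bytes are replaced instead of raising, as Java's String does.
lossy = raw.decode("cp1252", errors="replace").encode("cp1252", errors="replace")

# Positions where the round trip changed a byte.
differences = [i for i, (a, b) in enumerate(zip(raw, lossy)) if a != b]

# Round trip with the charset stated explicitly.
exact = raw.decode("utf-8").encode("utf-8")

print("differing positions:", differences)
print("explicit UTF-8 round trip is lossless:", exact == raw)
```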

How to compare filenames with difference in special character encoding?

I am working with a system that syncs files between two vendors. The tooling is written in JavaScript and transforms file names before sending them to the destination. I am trying to fix a bug where it fails to compare file names correctly between the origin and the destination.
The script uses the file name to check whether the file is already on the destination.
For example:
The following file name contains a special character that has different encoding between source and destination.
source: Chinchón.jpg // hex code: ó
destination : Chinchón.jpg // hex code: 0xf3
The function that does the transformation is:
export const normalizeText = (text:string) => text
.normalize('NFC')
.replace(/\p{Diacritic}/gu, "")
.replace(/\u{2019}/gu, "'")
.replace(/\u{ff1a}/gu, ":")
.trim()
and the comparison is happening just like the following:
const array1 = ['Chinchón.jpg'];
console.log(array1.includes('Chinchón.jpg')); // false
Do I reverse the transformation before comparing? What's the best way to do that?

If I understood your question correctly, apply the same normalization to both sides before comparing:
// prepare dictionary
const rawDictionary = ['Chinchón.jpg']
const dictionary = rawDictionary.map(x => normalizeText(x))
...
const rawComparant = 'Chinchón.jpg'
const comparant = normalizeText(rawComparant)
console.log(dictionary.includes(comparant))
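The same idea, sketched in Python for illustration: two strings that render identically can differ in code points (a precomposed `ó` versus `o` followed by a combining acute accent), and they only compare equal after both are brought to the same Unicode normalization form. The example assumes this is the difference between source and destination; `normalize_name` is a hypothetical stand-in for `normalizeText`, not a port of it.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Bring a file name to NFC and trim whitespace (hypothetical helper)."""
    return unicodedata.normalize("NFC", name).strip()


# Same visible name, different code points.
decomposed = "Chincho\u0301n.jpg"   # 'o' + combining acute accent
precomposed = "Chinch\u00f3n.jpg"   # single precomposed character

# Normalize the stored names once, then normalize each candidate the same way.
destination_names = {normalize_name(decomposed)}

print(decomposed == precomposed)                        # raw comparison
print(normalize_name(precomposed) in destination_names)  # normalized comparison
```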

Google cloud vision - product search return null for base64 image

I'm implementing Google Cloud Vision for the first time.
I successfully created a product set and products, and assigned images to the products.
When I try to run a product search by sending a base64-encoded image, the result is always null. But when I try it with an image from Google Cloud Storage, it works. Any idea why it's not working?
$productSearchClient = new ProductSearchClient();
$productSetPath = $productSearchClient->productSetName(config('web.google_vision.project_id'), config('web.google_vision.location'), 2);
# product search specific parameters
$productSearchParams = (new ProductSearchParams())
->setProductSet($productSetPath)
->setProductCategories(['general-v1']);
# search products similar to the image
$imageAnnotatorClient = new ImageAnnotatorClient();
//$image = 'gs://picfly-bucket/wendys-5.jpeg';
$image = base64_encode(file_get_contents(public_path('gallery/test/wendy-search.png')));
$response = $imageAnnotatorClient->productSearch($image, $productSearchParams);
dd($response->getProductSearchResults());

As per this doc, your code reads a local file and queries the API by including the raw image bytes inline in the request. So you should not call base64_encode() explicitly; the Vision library does the base64 encoding by default. You just need to call fopen() to open your local image file. The code would look like:
$image = fopen('gallery/test/wendy-search.png', 'r');
$response = $imageAnnotatorClient->productSearch($image, $productSearchParams);

How to generate a base64 encoded, SHA-512 hash in Appcelerator?

I have been trying this for 2 days without success. We are using Appcelerator 5.1.0.
I'm able to hash a string using the Securely module. However, the resulting string is in hex format and I need it to be a base64-encoded string.
I tried the Ti.Utils.base64encode function, but the result doesn't match what is generated at the backend. Here's my code snippet:
function convertHexToBase64(hexStr){
console.log("hex: "+hexStr);
var hexArray = hexStr
.replace(/\r|\n/g, "")
.replace(/([\da-fA-F]{2}) ?/g, "0x$1 ")
.replace(/ +$/, "")
.split(" ");
var byteString = String.fromCharCode.apply(null, hexArray);
var base64String = Ti.Utils.base64encode(byteString).toString();
console.log("base64 string:"+base64String);
return base64String;
}
I tried to find other modules to use, and Node's Buffer is the closest I can get, but I am not sure how to use a Node class in Appcelerator...
Can anyone shed some light on this? Thanks.

I finally did it with the help of Forge. Putting the steps here for future reference:
Create a folder named forge under the lib folder.
Install the module on your local machine (via node), then copy the whole contents of its js folder into the forge folder.
In the code, create the object:
var forge = require('forge/forge');
Hash the string first to get a buffer object, then encode its raw bytes as a base64 string.
var md = forge.md.sha512.create();
md.update(saltedText);
var buffer = md.digest();
result = forge.util.encode64(buffer.getBytes());
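The key point in the steps above is that base64 is applied to the raw digest bytes, not to the hex text of the digest. A small Python sketch of that distinction, for illustration only (the function names are hypothetical, and UTF-8 input is an assumption about what the backend hashes):

```python
import base64
import hashlib


def sha512_base64(text: str) -> str:
    """SHA-512 hash of the UTF-8 text, base64-encoded from the raw digest bytes."""
    digest = hashlib.sha512(text.encode("utf-8")).digest()  # 64 raw bytes
    return base64.b64encode(digest).decode("ascii")


def hex_digest_to_base64(hex_digest: str) -> str:
    """Convert an existing hex digest to base64 by first recovering the bytes."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


hex_digest = hashlib.sha512("saltedText".encode("utf-8")).hexdigest()

# Both routes agree because both encode the same 64 bytes.
print(sha512_base64("saltedText") == hex_digest_to_base64(hex_digest))

# Encoding the hex characters themselves gives a different, longer string.
wrong = base64.b64encode(hex_digest.encode("ascii")).decode("ascii")
print(len(sha512_base64("saltedText")), len(wrong))
```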

Decode UTF8 symbols

I have a string in swift:
let flag = "CattÃ¬ Ã²"
I am trying to convert the UTF8 symbols.
I have tried using
stringByRemovingPercentEncoding
but nothing changes. How can I convert the symbols properly?

Welcome to the encoding guessing game! It looks like somewhere along the way, your string was decoded with the wrong code page. Here's one way to guess which one:
let flag = "CattÃ¬ Ã²"
let encodings = [NSASCIIStringEncoding,
NSNEXTSTEPStringEncoding,
NSJapaneseEUCStringEncoding,
NSUTF8StringEncoding,
NSISOLatin1StringEncoding,
NSSymbolStringEncoding,
NSNonLossyASCIIStringEncoding,
NSShiftJISStringEncoding,
NSISOLatin2StringEncoding,
NSUnicodeStringEncoding,
NSWindowsCP1251StringEncoding,
NSWindowsCP1252StringEncoding,
NSWindowsCP1253StringEncoding,
NSWindowsCP1254StringEncoding,
NSWindowsCP1250StringEncoding,
NSISO2022JPStringEncoding,
NSMacOSRomanStringEncoding,
NSUTF16StringEncoding,
NSUTF16BigEndianStringEncoding,
NSUTF16LittleEndianStringEncoding,
NSUTF32StringEncoding,
NSUTF32BigEndianStringEncoding,
NSUTF32LittleEndianStringEncoding]
for encoding in encodings {
if let bytes = flag.cStringUsingEncoding(encoding),
flag_utf8 = String(CString: bytes, encoding: NSUTF8StringEncoding) {
print("\(encoding): \(flag_utf8)")
}
}
The array contains all the encodings that Cocoa supports.
From the results, it seems your string was decoded as NSISOLatin1StringEncoding (a.k.a. ISO-8859-1), the default encoding for HTML 4.01. Reinterpreting it gives Cattì ò in UTF-8, which may not exactly match your desired result but is the closest among all the code pages.
Other good candidates are NSWindowsCP1252StringEncoding and NSWindowsCP1254StringEncoding, so I'd suggest you check with other strings.
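The same guessing approach can be sketched in a few lines of Python, shown here only as an illustration: re-encode the garbled text with each candidate single-byte encoding to recover the original bytes, then try to decode those bytes as UTF-8. The candidate list and the function name are assumptions made for this example, not an exhaustive set.

```python
def candidate_repairs(garbled, encodings=("latin-1", "cp1252", "cp1254", "mac_roman")):
    """Return {encoding: repaired text} for every candidate that round-trips."""
    results = {}
    for encoding in encodings:
        try:
            # Recover the bytes the text was wrongly decoded from,
            # then decode them as UTF-8.
            results[encoding] = garbled.encode(encoding).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this candidate cannot explain the garbled text
    return results


# The string from the question, written with escapes to avoid editor issues.
flag = "Catt\u00c3\u00ac \u00c3\u00b2"

for encoding, repaired in candidate_repairs(flag).items():
    print(f"{encoding}: {repaired}")
```

Several candidates can produce the same repair, as the answer notes for the Windows code pages, so testing with more sample strings helps narrow down the real source encoding.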
