Is it possible to convert UTF32 text to UTF16 using only Windows API? - visual-c++

I'm trying to find out whether converting UTF-32 text to/from any code page is possible using the Windows API alone. I cannot use the CLR for this task.
The Code Page Identifiers page on MSDN at http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx lists UTF-32 as being available only to managed applications.
ConvertStringTo/FromUnicode fails when UTF-32 is used.

You can use this function. It takes the UTF-32 code point to be converted as its first argument and returns the equivalent UTF-16 encoding (a single code unit or a surrogate pair, as the case may be); the high and low surrogates are returned by reference through the second and third arguments.
If the code point is below 0x10000, we simply return it in the low surrogate by reference while the high surrogate is set to 0.
If the code point is 0x10000 or above, we calculate the high and low surrogates using the rules given on this Wikipedia page:
https://en.wikipedia.org/wiki/UTF-16#Example_UTF-16_encoding_procedure
Here's the code:
unsigned int convertUTF32ToUTF16(unsigned int cUTF32, unsigned int &h, unsigned int &l)
{
    if (cUTF32 < 0x10000)
    {
        // BMP code point: no surrogate pair needed
        h = 0;
        l = cUTF32;
        return cUTF32;
    }
    unsigned int t = cUTF32 - 0x10000;
    h = (((t << 12) >> 22) + 0xD800);   // top 10 bits of t    -> high surrogate
    l = (((t << 22) >> 22) + 0xDC00);   // bottom 10 bits of t -> low surrogate
    unsigned int ret = ((h << 16) | (l & 0x0000FFFF));
    return ret;
}
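For example, here is a minimal usage sketch that builds a UTF-16 string from a few UTF-32 code points with the function above (the codepoints vector and the std::u16string output are my own illustration, not part of the original answer):
#include <string>
#include <vector>

std::vector<unsigned int> codepoints = { 0x41, 0x10437 }; // 'A' and a non-BMP character
std::u16string utf16;
for (unsigned int cp : codepoints)
{
    unsigned int h = 0, l = 0;
    convertUTF32ToUTF16(cp, h, l);
    if (h != 0)
        utf16 += static_cast<char16_t>(h); // high surrogate first
    utf16 += static_cast<char16_t>(l);     // low surrogate, or the BMP code unit itself
}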

With a bit of knowledge of Unicode you should be able to create a UTF32 to UTF16 converter without using any APIs.
All characters in the range U+0000 to U+FFFF can simply have the upper 16 bits removed.
Values in the range U+10000 to U+10FFFF can be converted into a pair of 16-bit code units, called a surrogate pair:
http://en.wikipedia.org/wiki/UTF-16#Encoding_of_characters_outside_the_BMP
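A minimal sketch of that procedure (my own restatement of the Wikipedia formulas, assuming the code point is already known to be in the U+10000..U+10FFFF range):
unsigned int t    = codepoint - 0x10000;  // 20 significant bits remain
unsigned int high = 0xD800 + (t >> 10);   // top 10 bits    -> 0xD800..0xDBFF
unsigned int low  = 0xDC00 + (t & 0x3FF); // bottom 10 bits -> 0xDC00..0xDFFF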

You can use the iconv library on Windows. It fully supports UTF-32 (both big and little endian).
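For example, a minimal sketch of the iconv calls involved (error handling mostly omitted; the exact encoding names accepted, and whether the input pointer parameter is declared const, can vary between iconv builds):
#include <iconv.h>

// Convert a buffer of UTF-32LE text to UTF-16LE using libiconv.
size_t utf32_to_utf16(const char *in, size_t inBytes, char *out, size_t outBytes)
{
    iconv_t cd = iconv_open("UTF-16LE", "UTF-32LE"); // to-encoding, from-encoding
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inPtr = (char *)in;
    char *outPtr = out;
    size_t inLeft = inBytes, outLeft = outBytes;
    size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft); // (size_t)-1 on error

    iconv_close(cd);
    return (rc == (size_t)-1) ? rc : (outBytes - outLeft); // bytes written
}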

Related

android AudioTrack playback short array (16bit)

I have an application that plays back audio. It takes encoded audio data over RTP and decodes it into a 16-bit array. The decoded 16-bit array is converted to an 8-bit array (byte array), as this is required for some other functionality.
Even though audio playback is working, it breaks up continuously and the output is very hard to recognise. If I listen carefully I can tell it is playing the correct audio.
I suspect this is because I convert the 16-bit data stream into a byte array and use write(byte[], int, int, AudioTrack.WRITE_NON_BLOCKING) of the AudioTrack class for playback.
Therefore I converted the byte array back to a short array and used write(short[], int, int, AudioTrack.WRITE_NON_BLOCKING) to see if it would resolve the problem.
However, now there is no audio at all. In the debug output I can see that the short array has data.
What could be the reason?
Here is the AudioTrack initialization:
sampleRate = AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
minimumBufferSize = AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
        AudioFormat.CHANNEL_OUT_STEREO,
        AudioFormat.ENCODING_PCM_16BIT,
        minimumBufferSize,
        AudioTrack.MODE_STREAM);
Here is the code that converts the short array to a byte array:
for (int i = 0; i < internalBuffer.length; i++) {
    bufferIndex = i * 2;
    buffer[bufferIndex] = shortToByte(internalBuffer[i])[0];
    buffer[bufferIndex + 1] = shortToByte(internalBuffer[i])[1];
}
Here is the method that converts the byte array to a short array:
public short[] getShortAudioBuffer(byte[] b) {
    short audioBuffer[] = null;
    int index = 0;
    int audioSize = 0;
    ByteBuffer byteBuffer = ByteBuffer.allocate(2);
    if ((b == null) && (b.length < 2)) {
        return null;
    } else {
        audioSize = (b.length - (b.length % 2));
        audioBuffer = new short[audioSize / 2];
    }
    if ((audioSize / 2) < 2)
        return null;
    byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < audioSize / 2; i++) {
        index = i * 2;
        byteBuffer.put(b[index]);
        byteBuffer.put(b[index + 1]);
        audioBuffer[i] = byteBuffer.getShort(0);
        byteBuffer.clear();
        System.out.print(Integer.toHexString(audioBuffer[i]) + " ");
    }
    System.out.println();
    return audioBuffer;
}
Audio is decoded using the Opus library, and the configuration is as follows:
opus_decoder_ctl(dec,OPUS_SET_APPLICATION(OPUS_APPLICATION_AUDIO));
opus_decoder_ctl(dec,OPUS_SET_SIGNAL(OPUS_SIGNAL_MUSIC));
opus_decoder_ctl(dec,OPUS_SET_FORCE_CHANNELS(OPUS_AUTO));
opus_decoder_ctl(dec,OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_FULLBAND));
opus_decoder_ctl(dec,OPUS_SET_PACKET_LOSS_PERC(0));
opus_decoder_ctl(dec,OPUS_SET_COMPLEXITY(10)); // highest complexity
opus_decoder_ctl(dec,OPUS_SET_LSB_DEPTH(16)); // 16bit = two byte samples
opus_decoder_ctl(dec,OPUS_SET_DTX(0)); // default - not using discontinuous transmission
opus_decoder_ctl(dec,OPUS_SET_VBR(1)); // use variable bit rate
opus_decoder_ctl(dec,OPUS_SET_VBR_CONSTRAINT(0)); // unconstrained
opus_decoder_ctl(dec,OPUS_SET_INBAND_FEC(0)); // no forward error correction
Let's assume you have a short[] array which contains the 16-bit, one-channel data to be played.
Each sample is then a value between -32768 and 32767 which represents the signal amplitude at that instant, with 0 representing the midpoint (no signal). This array can be passed to the audio track with the ENCODING_PCM_16BIT encoding.
Things get trickier when ENCODING_PCM_8BIT is used (see AudioFormat).
In this case each sample is encoded in one byte, and each byte is unsigned: its value is between 0 and 255, with 128 representing the midpoint.
Java has no unsigned byte type; byte is signed, i.e. the values -128...-1 represent actual values of 128...255. So you have to be careful when converting to the byte array, otherwise the result will be noise with a barely recognisable source sound.
short[] input16 = ... // the source 16-bit audio data;
byte[] output8 = new byte[input16.length];
for (int i = 0; i < input16.length; i++) {
    // To convert a 16-bit signed sample to an 8-bit unsigned one,
    // we add 128 (for rounding), then shift right by 8 positions,
    // then add 128 to move into the range 0..255.
    int sample = ((input16[i] + 128) >> 8) + 128;
    if (sample > 255) sample = 255; // clip overload
    output8[i] = (byte)(sample);    // cast to signed byte type
}
The backward conversion works the same way: each input sample is converted to exactly one sample of the output signal.
byte[] input8 = ... // the source 8-bit unsigned audio data;
short[] output16 = new short[input8.length];
for (int i = 0; i < input8.length; i++) {
    // To convert the signed byte back to an unsigned value, use bitwise AND with 0xFF,
    // then subtract the 128 offset,
    // then scale the value up by 256 to fit the 16-bit range.
    output16[i] = (short)(((input8[i] & 0xFF) - 128) * 256);
}
The issue of not being able to convert data from the byte array to a short array was resolved by using bitwise operators instead of ByteBuffer. It could be due to not setting the correct parameters on the ByteBuffer, or it may simply not be suitable for this kind of conversion.
Nevertheless, implementing the conversion with bitwise operators resolved the problem. Since the original question has been resolved by this approach, please consider this the final answer.
I will raise a separate topic for playback issue.
Thank you for all your support.

How to make crypto.randomBytes() to replace Math.random()

I need a cryptographically secure random number generator to replace Math.random().
I came across crypto.randomBytes(); however, it returns a byte array. What is a way to turn the byte array into a number between 0 and 1 (so that it is compatible with Math.random())?
This should do the trick:
crypto.randomBytes(4).readUInt32LE() / 0xffffffff;
randomBytes generates 4 random bytes, which are then read as a 32-bit unsigned integer in little endian. The max value of a 32bit unsigned int is 0xffffffff (or 4,294,967,295 in decimal). By dividing the randomly generated 32bit int by its max value you get a value between 0 and 1.
You could do something like:
var randomVal = (crypto.randomBytes(1)[0] / 255);

Unicode to char* c++ 11

I want to know whether there is any way to convert a Unicode code point to a string or char in C++11.
I've been trying with the extended Latin letter Á (as an example), which has this encoding:
letter: Á
Unicode: 0x00C1
UTF-8 literal: \xc3\x81
I've been able to do it when it's hardcoded as:
const char* c = u8"\u00C1";
But if I receive the code unit as a short, how can I do the equivalent to get the char* or std::string 'Á'?
EDIT, SOLUTION:
I was finally able to do it; here is the solution if anyone needs it:
std::wstring ws;
for (short input : inputList)
{
    wchar_t wc(input);
    ws += wc;
}
std::wstring_convert<std::codecvt_utf8<wchar_t>> cv;
str = cv.to_bytes(ws);
Thanks for the comments they were very helpful.
The C++11 standard contains codecvt_utf8, which converts between some internal character type (try char16_t if your compiler has it, otherwise wchar_t) and UTF-8 encoding.
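For example, a minimal sketch with char16_t (C++11; note that std::wstring_convert was later deprecated in C++17, and for code points outside the BMP you would use std::codecvt_utf8_utf16 instead):
#include <codecvt>
#include <locale>
#include <string>

short input = 0x00C1; // U+00C1, 'Á', received as a short
std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> conv;
std::string utf8 = conv.to_bytes(static_cast<char16_t>(input)); // yields "\xC3\x81"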
The problem is that char is only one byte long, while these Unicode characters require two bytes.
You can still treat the result as a char*, but you must remember that you are not dealing with an ASCII string (there will be embedded zeros).
You may have to switch to wchar_t.

Converting an int or String to a char array on Arduino

I am getting an int value from one of the analog pins on my Arduino. How do I concatenate this to a String and then convert the String to a char[]?
It was suggested that I try char msg[] = myString.getChars();, but I am receiving a message that getChars does not exist.
To convert and append an integer, use operator += (or member function concat):
String stringOne = "A long integer: ";
stringOne += 123456789;
To get the string as type char[], use toCharArray():
char charBuf[50];
stringOne.toCharArray(charBuf, 50);
In the example, there is only space for 49 characters (presuming it is terminated by null). You may want to make the size dynamic.
Overhead
The cost of bringing in String (it is not included if not used anywhere in the sketch) is approximately 1212 bytes of program memory (flash) and 48 bytes of RAM.
This was measured using Arduino IDE version 1.8.10 (2019-09-13) for an Arduino Leonardo sketch.
Risk
There must be sufficient free RAM available. Otherwise, the result may be lockup/freeze of the application or other strange behaviour (UB).
Just as a reference, below is an example of how to convert between String and char[] with a dynamic length -
// Define
String str = "This is my string";
// Length (with one extra character for the null terminator)
int str_len = str.length() + 1;
// Prepare the character array (the buffer)
char char_array[str_len];
// Copy it over
str.toCharArray(char_array, str_len);
Yes, this is painfully obtuse for something as simple as a type conversion, but somehow it's the easiest way.
You can convert it to char* if you don't need a modifiable string by using:
(char*) yourString.c_str();
This would be very useful when you want to publish a String variable via MQTT on an Arduino.
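For instance, a minimal sketch assuming the common PubSubClient MQTT library with an already connected client object; the topic name is made up for illustration:
String payload = "temperature: " + String(23.5);
// PubSubClient::publish() takes const char*, so pass the String's internal buffer directly.
client.publish("sensors/livingroom", payload.c_str());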
None of that stuff worked. Here's a much simpler way .. the label str is the pointer to what IS an array...
String str = String(yourNumber, DEC); // Obviously .. get your int or byte into the string
str = str + '\r' + '\n';              // Add the required carriage return, optional line feed
byte str_len = str.length();
// Get the length of the whole lot .. C will kindly
// place a null at the end of the string which makes
// it by default an array[].
// The [0] element is the highest digit... so we
// have a separate place counter for the array...
byte arrayPointer = 0;
while (str_len)
{
    // I was outputting the digits to the TX buffer
    if ((UCSR0A & (1 << UDRE0))) // Is the TX buffer empty?
    {
        UDR0 = str[arrayPointer];
        --str_len;
        ++arrayPointer;
    }
}
With all the answers here, I'm surprised no one has brought up using itoa already built in.
It writes the string representation of the integer into the given buffer.
int a = 4625;
char cStr[5]; // number of digits + 1 for null terminator
itoa(a, cStr, 10); // int value, pointer to string, base number
Or if you're unsure of the length of the string:
long b = 80085; // use long: this value would overflow a 16-bit Arduino int
int len = String(b).length();
char cStr[len + 1]; // String.length() does not include the null terminator
ltoa(b, cStr, 10); // or you could use String(b).toCharArray(cStr, len + 1);

LPCSTR, TCHAR, String

I am using the following string types: LPCSTR, TCHAR, String. I want to convert:
from TCHAR to LPCSTR
from String to char
I convert from TCHAR to LPCSTR with this code:
RunPath = TEXT("C:\\1");
LPCSTR Path = (LPCSTR)RunPath;
From String to char I convert with this code:
SaveFileDialog^ saveFileDialog1 = gcnew SaveFileDialog;
saveFileDialog1->Title = "Сохранение файла-настроек";
saveFileDialog1->Filter = "bck files (*.bck)|*.bck";
saveFileDialog1->RestoreDirectory = true;
pin_ptr<const wchar_t> wch = TEXT("");
if ( saveFileDialog1->ShowDialog() == System::Windows::Forms::DialogResult::OK ) {
    wch = PtrToStringChars(saveFileDialog1->FileName);
} else return;
ofstream os(wch, ios::binary);
My problem is that when I set Configuration Properties -> General -> Character Set to "Use Multi-Byte Character Set", the first part of the code works correctly, but the second part returns error C2440. When I set Character Set to "Use Unicode Character Set", the second part works correctly, but the first part returns only the first character when casting from TCHAR to LPCSTR.
I'd suggest you need to be using Unicode the whole way through.
LPCSTR is a "Long Pointer to a C-type String". That's typically not what you want when you're dealing with .Net methods. The char type in .Net is 16bits wide.
You also should not use the TEXT("") macro unless you're planning multiple builds using different character encodings. Try wrapping all your string literals with the L"" prefix instead, and use a pure Unicode build if you can.
See if that helps.
PS. std::wstring is very handy in your scenario.
EDIT
You see only one character because the string is now Unicode but you cast it as a regular narrow string. Most Unicode characters in the ASCII range have the same numeric value as in ASCII but carry a second byte that is zero. So when a UTF-16 string is read as a C string you only see the first character, because C strings are null (zero) terminated. The easy (and wrong) way to deal with this is to use std::wstring, cast it to a std::string and pull the C string out of that. This is not a safe approach, because Unicode has a much larger character space than your standard encoding.
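If you do end up needing a narrow string from a wide (UTF-16) one, here is a minimal sketch using the WideCharToMultiByte API, targeting UTF-8 and with error handling omitted:
#include <windows.h>
#include <string>

std::string toUtf8(const std::wstring &wide)
{
    // First call asks for the required buffer size in bytes.
    int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                                    NULL, 0, NULL, NULL);
    std::string narrow(bytes, '\0');
    // Second call performs the actual conversion into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                        &narrow[0], bytes, NULL, NULL);
    return narrow;
}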
