QtWebkit accents in Linux

I have made an app using QtWebkit. With the same HTML page, accents (Spanish) work fine on Windows, but they do not work on Linux (Ubuntu).
I cannot understand why; Ubuntu handles that same HTML page fine in any other program and any other browser.
The Linux and Windows applications are built from the same Qt source, of course.
Any idea or help?
Thanks.

You are looking for the Qt class QWebSettings. This class has methods like
QWebSettings::setDefaultTextEncoding(const QString & encoding)
From the Qt docs:
Specifies the default text encoding system.
The encoding must be a string describing an encoding such as "utf-8",
"iso-8859-1", etc. If left empty a default value will be used. For a
more extensive list of encoding names see QTextCodec
There is also the getter QString QWebSettings::defaultTextEncoding() const.
It looks like the QtWebkit default codec is not compatible with the encoding of your page. Which encoding you now have to choose is impossible to say from here.
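A minimal sketch of that fix, assuming the HTML page is UTF-8 encoded and the app shows it in a QWebView (the file path is hypothetical):
#include <QApplication>
#include <QUrl>
#include <QWebSettings>
#include <QWebView>
int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    // Tell QtWebkit to decode pages without an explicit charset as UTF-8.
    // "utf-8" is an assumption here; use whatever encoding your HTML really has.
    QWebSettings::globalSettings()->setDefaultTextEncoding("utf-8");
    QWebView view;
    view.load(QUrl("file:///path/to/page.html")); // hypothetical path
    view.show();
    return app.exec();
}
Setting it on QWebSettings::globalSettings() affects every page; you can also set it for a single page via view.page()->settings() if only one page needs it.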

Related

How to successfully parse a UTF-16LE encoded JSON file with Haxe on the PHP target?

I have a third-party application that I have to use, and it generates UTF-16LE encoded JSON files.
When I put these on my server manually and try to parse them with Haxe-generated PHP, it refuses. It seems it can't detect the encoding and convert it to the one Haxe PHP accepts.
I don't know where to start. Converting on the client is impossible: there are too many such files and they need to be parsed too frequently. So I have to use PHP. It would be nice if Haxe had a way to convert them to an encoding it accepts. I have tried RTFM, but so far I haven't found anything that says Haxe can do the conversion. Before I start reinventing some second wheel, I'd rather make sure there isn't some obvious way to do it with Haxe.
I am using Haxe version 4.2.1+bf9ff69
What am I overlooking? Can Haxe PHP solve this, or is going native PHP the only option?
== SOLVED ==
As these JSON files need no emoticon support or any non-English characters, my solution was to strip everything except basic printable ASCII characters.
import sys.io.File;
import php.Syntax;
// some function body
// Strip every byte that is not printable ASCII; this also removes the BOMs
// and the UTF-16 null bytes, leaving plain ASCII JSON for the parser.
return Syntax.code('preg_replace("/[^[:print:]]/", "", {0})', File.getContent(_path));
I couldn't share these files on the web because of privacy concerns. I also discovered these files had ... wait for it ... double BOMs hacked into them. The BOM detector I threw at them reported that the first BOM it found happened to be UTF-16LE.
An enterprise spaghetti monster is probably the reason. I thought I had seen it all, but apparently one never has. The wonders of human ingenuity.
Just a blunt strip instead of writing my own ludicrous code to unfudge that stuff, and justice is served. Hurrah.

getExtension for MIME type "audio/wav" returns empty string using Apache Tika

I'm trying to get the file extension for the valid "audio/wav" mime type.
Using this code
MimeTypes mimeTypes = TikaConfig.getDefaultConfig().getMimeRepository();
String extension = mimeTypes.forName("audio/wav").getExtension();
The extension I get is the empty string.
However, using the "audio/x-wav" MIME type works.
Is this the expected behavior?
TL;DR
Yes, this is the expected behavior.
x- MIME subtypes are usually for formats that are not yet standardized. The MIME types corresponding to the WAV format are audio/vnd.wave, audio/wav, audio/wave, and audio/x-wav (see here). Some browsers accept more of these MIME types than others. Apache servers usually send WAV as audio/x-wav, though I don't know why.
The official MIME type is now audio/vnd.wave, so you might try that and see if it works.
Sources: here
This is a bug in older versions of Tika.
You need to use a newer version of Apache Tika to get the correct behaviour (1.15.1 or 1.16 should do it). As taken from the tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java unit test:
assertType("audio/vnd.wave", "testWAV.wav");
(That unit test verifies that the official MIME type is the one detected; other aliases like audio/wav are generally mapped transparently onto the canonical one.)
Alternatively, if you're stuck on an old Tika version, you should largely be OK swapping out the tika-mimetypes.xml file for the latest version, though if you're swapping it into a much older version of Tika it's best to re-run the unit tests to ensure you haven't broken anything in the process!

Can code page and locale differ?

On Windows 7 (64-bit) I set Japanese as the locale. The command prompt reflects the following:
LC_ALL: English_United States.1252
LC_CTYPE: English_United States.1252
chcp command: Active Code Page: 932 (which is Japanese)
My question: while converting wchar_t* to char* using the ICU library, which default converter should be used? In this case "US-ASCII" is used and I get garbage in the char* result. The input wchar_t* contains Japanese characters.
From the ICU documentation:
Depending on system design, setup and APIs, it may not always be
possible to find a default codepage that fully works as expected...
If you have means of detecting a default codepage name that are more
appropriate for your application, then you should set that name with
ucnv_setDefaultName() as the first ICU function call. This makes sure
that the internally cached default converter will be instantiated from
your preferred name.
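A minimal sketch of that advice, assuming a Windows build where wchar_t is UTF-16 and the active code page is 932 as above (the converter name "cp932" and the helper name are assumptions):
#include <string>
#include <unicode/ucnv.h>
// Convert a UTF-16 wchar_t string to the system code page via ICU.
std::string narrowFromWide(const wchar_t* src, int32_t srcLen)
{
    UErrorCode status = U_ZERO_ERROR;
    // Per the docs quoted above, do this as the first ICU call in the app;
    // it is shown inline here only to keep the sketch self-contained.
    ucnv_setDefaultName("cp932");
    UConverter* conv = ucnv_open(NULL, &status); // NULL opens the default converter
    if (U_FAILURE(status)) return std::string();
    char buf[1024]; // fixed-size buffer for brevity only
    int32_t len = ucnv_fromUChars(conv, buf, sizeof(buf),
                                  reinterpret_cast<const UChar*>(src), srcLen,
                                  &status);
    ucnv_close(conv);
    return U_SUCCESS(status) ? std::string(buf, len) : std::string();
}
On Windows you could also derive the converter name at runtime (e.g. from GetACP()) instead of hard-coding "cp932".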

OpenCV 2.2 imwrite causes exception with message "OpenCV Error: Unspecified error (could not find a writer for the specified extension)" on Windows 7

I'm porting an OpenCV 2.2 app (that works) from Unix onto Windows 7 64-bit, and I receive the following exception when cv::imwrite is called:
"OpenCV Error: Unspecified error (could not find a writer for the specified extension) in unknown function, file highgui\src\loadsave.cpp"
The original Unix app works fine on my Mac and Linux boxes.
Does anyone know what library or compiler config I could be missing that makes this work on Windows?
UPDATE:
I did the following things to get OpenCV running:
Downloaded the Windows binaries for v2.2 from the OpenCV site. I'm using 2.2 because the original app uses it and I don't want to complicate my build at this stage.
I am trying to imwrite to a .png file. I looked at the OpenCV code and noticed that encoders such as PNG or JPEG need external libraries, so I tried writing to .ppm and .bmp, which seem not to require dependencies, but I get the identical error.
An example of my usage is cv::imwrite("out.png", cv_scaled); where cv_scaled is of type cv::Mat with format CV_32FC1.
Please remember the identical code works fine on Unix.
The fact that .bmp and .ppm don't work raises more questions:
Why don't these very simple formats work?
Is there a way to list the installed encoders programmatically?
Thanks again for your kind assistance in helping me debug this problem.
Your current installation of OpenCV doesn't support the file format you are trying to create on disk.
Check that the extension of the file is right. If it is, you'll have to recompile OpenCV to add support for this format, and possibly install the libraries you are missing.
That's all that can be said without more information.
EDIT:
As I have also failed to build an application that uses the C++ interface of OpenCV (v2.3 on VS2005), I ended up using the following workaround: convert the C++ types to the C types when necessary.
Converting from IplImage* to cv::Mat is pretty straightforward:
IplImage* ipl_img = cvLoadImage("test.jpg", CV_LOAD_IMAGE_UNCHANGED);
Mat mat_img(ipl_img);
imshow("window", mat_img);
The conversion from cv::Mat to IplImage* is not so obvious, but it's also simple; the trick is to use an IplImage instead of an IplImage*:
IplImage ipl_from_mat = mat_img; // uses cv::Mat::operator IplImage()
cvNamedWindow("window", CV_WINDOW_AUTOSIZE);
// and then pass the memory address of the variable when you need it as IplImage*
cvShowImage("window", &ipl_from_mat);
Try
IplImage tmp = image; // convert the cv::Mat to a C-style header first
cvSaveImage("test.jpg", &tmp);
instead of
imwrite("test.jpg", image);
This is a known bug in the version you are using.
From the OpenCV 2.2 API:
The function imwrite saves the image to the specified file. The image format is chosen based on the filename extension, see imread for the list of extensions. Only 8-bit (or 16-bit in the case of PNG, JPEG 2000 and TIFF) single-channel or 3-channel (with ‘BGR’ channel order) images can be saved using this function. If the format, depth or channel order is different, use Mat::convertTo , and cvtColor to convert it before saving, or use the universal XML I/O functions to save the image to XML or YAML format.
You might have more luck converting your file to 8 or 16 bits before saving.
However, even with single-channel 8-bit files I have had unknown extension errors when trying to save JPG or PNG files, but I found that BMP works.
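A minimal sketch of that conversion, assuming the CV_32FC1 matrix from the question holds values in [0, 1] (the scale factor and the stand-in data are assumptions):
#include <opencv2/opencv.hpp>
int main()
{
    // Stand-in for the real cv_scaled; imwrite in 2.2 rejects 32-bit floats.
    cv::Mat cv_scaled(480, 640, CV_32FC1, cv::Scalar(0.5));
    cv::Mat out8u;
    cv_scaled.convertTo(out8u, CV_8UC1, 255.0); // map [0,1] floats to [0,255] bytes
    // .bmp needs no external encoder libraries, so it is a good first test.
    return cv::imwrite("out.bmp", out8u) ? 0 : 1;
}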

OpenSSL with unicode paths

I have an implementation of SSL handshake from the client side, by using these functions:
SSL_CTX_load_verify_locations
SSL_CTX_use_certificate_chain_file
SSL_CTX_use_PrivateKey_file
All functions get char* type for the filename parameter.
How can I change it to also support Unicode file locations?
Thanks!
On which platform? OpenSSL under POSIX supports UTF-8 paths, but not on other platforms. Chances are you will have to load the certificate files yourself using standard OS file I/O functions that support Unicode paths, and then parse the raw data and load it into OpenSSL, such as via PEM_read_bio_X509 with sk_X509_NAME_push, PEM_read_bio_PrivateKey/d2i_PrivateKey_bio with SSL_CTX_use_PrivateKey, d2i_X509_bio/PEM_read_bio_X509 with SSL_CTX_use_certificate, etc.
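A minimal sketch of that manual route on Windows, assuming a single PEM certificate (the helper name and the bare-bones error handling are mine):
#include <cstdio>
#include <openssl/pem.h>
#include <openssl/ssl.h>
// Open the file through the Unicode-aware CRT, then feed the parsed
// certificate to OpenSSL instead of handing it a char* path.
bool use_certificate_w(SSL_CTX* ctx, const wchar_t* path)
{
    FILE* fp = _wfopen(path, L"rb");
    if (!fp) return false;
    X509* cert = PEM_read_X509(fp, NULL, NULL, NULL); // parse the PEM data
    fclose(fp);
    if (!cert) return false;
    int ok = SSL_CTX_use_certificate(ctx, cert);
    X509_free(cert); // the ctx keeps its own reference
    return ok == 1;
}
The private key can be loaded the same way with PEM_read_PrivateKey and SSL_CTX_use_PrivateKey.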
I wanted to reply to the answer above instead of creating a new one, but I was not able to, so I am posting a new answer. Based on my testing of SSL_CTX_load_verify_locations and a look at the OpenSSL code, OpenSSL actually uses UTF-8 for file paths on Windows as well. In BIO_new_file, when opening a file, it chooses the UTF-8 path if both _WIN32 and CP_UTF8 are defined, and both are defined on Windows. However, OpenSSL also has code to fall back to the ANSI path if the path is not valid UTF-8. So in practice OpenSSL works with both UTF-8 and ANSI paths on Windows.
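If that holds for your OpenSSL build, it may be enough to convert the wide path to UTF-8 and pass it to the existing char* APIs unchanged; a sketch under that assumption (the helper name is mine):
#include <string>
#include <windows.h>
#include <openssl/ssl.h>
// Convert a wide (UTF-16) Windows path to UTF-8 for OpenSSL's file APIs.
static std::string to_utf8(const wchar_t* wide)
{
    int n = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
    std::string utf8(n > 0 ? n - 1 : 0, '\0'); // n counts the terminating NUL
    if (n > 1)
        WideCharToMultiByte(CP_UTF8, 0, wide, -1, &utf8[0], n, NULL, NULL);
    return utf8;
}
// Hypothetical usage, relying on BIO_new_file preferring UTF-8 as described:
// SSL_CTX_use_certificate_chain_file(ctx, to_utf8(L"C:\\certs\\chain.pem").c_str());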
