ANSI encoded text file in blob storage has corrupted characters when reading it in a logic app - azure

I have an issue with text files that are saved to blob storage and then iterated through using a logic app. The files come from a very old (but unfortunately very important) system. Access to it is highly restricted, so I have zero control over how the files are created.
They are uploaded to our blob storage without any file extension and with ANSI encoding. If I view their contents through the Azure portal, I can see that non-standard characters (in this case åäö) are corrupted:
[corrupted chars][1]
I assume this is because Azure assumes UTF-8 encoding? The problem occurs when I use a logic app to iterate through the blobs, get the file contents, and place them in a string variable. As far as I understand it, Azure logic apps should automatically convert the encoding to UTF-8, but this doesn't seem to happen, since the characters åäö in the string variable are still garbled.
The data from the string variable is used as input data for an Azure function that needs to be able to see the åäö characters properly. Manually converting the files to UTF-8 before uploading them solves the issue, but this is not practical, since this data flow is supposed to be automated.
The file contents are extracted like so:
[file contents to variable][2]
Setting "Infer content type" makes no difference, and neither does renaming the files with a proper .txt extension.
[1]: https://i.stack.imgur.com/5B2ru.png
[2]: https://i.stack.imgur.com/zb6yj.png
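If the Azure function can be given the raw blob bytes (for example base64-encoded, rather than the string variable the logic app has already decoded), it can do the ANSI-to-UTF-8 conversion itself. Once bytes like 0xE5 (å) have been decoded as UTF-8 they are already replaced with U+FFFD, so the conversion has to start from the original bytes. A minimal Python sketch, assuming that "ANSI" here means Windows-1252 (the usual ANSI code page on Swedish Windows systems); the function name `ansi_to_utf8` is hypothetical:

```python
def ansi_to_utf8(raw: bytes, codepage: str = "cp1252") -> bytes:
    """Re-encode bytes written in a Windows "ANSI" code page as UTF-8.

    Assumption: the legacy system writes Windows-1252 (cp1252). Pass a
    different codec name if it turns out to use another code page.
    """
    return raw.decode(codepage).encode("utf-8")


# "åäö" as the old system would write it: one byte per character.
ansi_bytes = "åäö".encode("cp1252")

# Decoding those bytes as UTF-8 cannot recover the text: each byte is
# invalid UTF-8 in this position and becomes U+FFFD (replacement char).
garbled = ansi_bytes.decode("utf-8", errors="replace")

# Converting from the code page first yields proper UTF-8 bytes.
fixed = ansi_to_utf8(ansi_bytes)
```

The same idea applies whatever language the function is written in: decode with the legacy code page, then re-encode as UTF-8.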

Related

Heroku cannot store files temporarily

I am writing a Node.js app that works with fonts. One action it performs is downloading a .ttf font from the web, converting it to a base64 string, deleting the .ttf, and using that string elsewhere. I need the .ttf file stored somewhere just long enough to convert it, which takes about 1-2 seconds. I know Heroku has an ephemeral file system, but I only need to store files for a very short time. Is there any way I can store my files? Using fs.writeFile currently returns this error:
Error: EROFS: read-only file system, open '/app\test.txt']
One idea: write an action that fetches the font, converts it, and stores the result in a global variable before another task uses it.
When you need it again, first check whether that global variable already holds the font buffer.
Reference: Singleton
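The global-variable idea from the answer can be sketched as a small in-process cache, so nothing has to touch the file system at all. This is Python rather than Node.js, and `get_font_base64` plus the `download` callable are hypothetical names; the cache lives only as long as the process, so it is rebuilt after every dyno restart.

```python
import base64
from typing import Callable, Dict

# Process-wide cache, the Python counterpart of a Node.js global variable.
# It survives between requests but is lost whenever the dyno restarts.
_font_cache: Dict[str, str] = {}


def get_font_base64(url: str, download: Callable[[str], bytes]) -> str:
    """Return the base64 form of the font at `url`, converting it only once.

    `download` is a hypothetical callable that returns the raw .ttf bytes;
    the bytes stay in memory and are never written to the file system.
    """
    cached = _font_cache.get(url)
    if cached is not None:
        return cached
    encoded = base64.b64encode(download(url)).decode("ascii")
    _font_cache[url] = encoded
    return encoded
```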
I didn't know you could store files in the /tmp directory. It works for now, but because the dyno file system is ephemeral, it gets cleaned frequently, so I don't know whether it may cause other problems in the long run.
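If a real file is still needed, writing it under the system temp directory and deleting it immediately keeps its lifetime to those 1-2 seconds, so periodic cleanup of /tmp should not matter. A Python sketch of that pattern (in Node.js, `os.tmpdir()` plays the same role); `ttf_to_base64_via_tmp` is a hypothetical name, and `tempfile` picks /tmp on a typical Linux dyno unless TMPDIR points elsewhere:

```python
import base64
import os
import tempfile


def ttf_to_base64_via_tmp(raw: bytes) -> str:
    """Write `raw` to a short-lived temp file, read it back as base64, delete it."""
    # mkstemp creates a uniquely named file in tempfile.gettempdir().
    fd, path = tempfile.mkstemp(suffix=".ttf")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(raw)
        # Any tool that needs a real .ttf path could be pointed at `path` here.
        with open(path, "rb") as fh:
            return base64.b64encode(fh.read()).decode("ascii")
    finally:
        # Remove the file right away instead of relying on /tmp cleanup.
        os.remove(path)
```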

Secondary Tile with Icon stored in Application Temporary Storage

I am trying to create a SecondaryTile using a JPG that I extracted to temporary storage (). When I create a URI from this file, I get an error from the constructor of the SecondaryTile class. The error message I get is very helpful: it says "Incorrect parameter".
I have tried passing in a URI to the filename like this:
new Uri("file://C:/Users/{username}/AppData/Local/Packages/{myAppPAckage}/TempState/{filename}.jpg");
I have also tried ms-appx, even though I don't think that is the right way, given that my file is stored in temporary app storage.
new Uri("ms-appx:///C:/Users/{username}/AppData/Local/Packages/{myAppPAckage}/TempState/{filename}.jpg");
Using ms-appx:/// as the prefix allows the tile to be pinned without error but the image does not display.
The file system path that I am getting is obtained from ApplicationData.Current.TemporaryFolder.
I found documentation that gives the URI prefix for the folder I am using to source the image:
new Uri("ms-appdata:///temp/{filename}.jpg");
Unfortunately, this also gives the Incorrect parameter error.
How do I use an image file that is stored in App Temporary Storage?
I needed to use ApplicationData.Current.LocalFolder and the prefix "ms-appdata:///Local".
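The fix comes down to addressing the image through an ms-appdata alias for the app data folder instead of a file-system path. The documented roots are ms-appdata:///local/, ms-appdata:///roaming/ and ms-appdata:///temp/. Purely to illustrate the string that ends up in `new Uri(...)`, here is a hypothetical Python helper (`appdata_uri`); in the app itself this stays C#:

```python
from urllib.parse import quote

# Documented ms-appdata roots for the app's three data folders.
_APPDATA_ROOTS = {
    "local": "ms-appdata:///local/",
    "roaming": "ms-appdata:///roaming/",
    "temp": "ms-appdata:///temp/",
}


def appdata_uri(folder: str, relative_path: str) -> str:
    """Build an ms-appdata URI for a file inside one of the app data folders."""
    root = _APPDATA_ROOTS[folder.lower()]
    # Percent-encode the path (spaces etc.) but keep "/" as a separator.
    return root + quote(relative_path.lstrip("/"), safe="/")
```

For example, `appdata_uri("Local", "logo.jpg")` gives `ms-appdata:///local/logo.jpg`, which corresponds to a file saved in ApplicationData.Current.LocalFolder.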

How to zip list of files + append custom string to each file on Linux on the fly

I need a Linux solution. I've figured out a way to do this on Windows by modifying a C# implementation, but I'm not sure where to start on Linux. I would like to be able to do the following from the command line:
Run a command providing a list of files to be zipped, an output path, and a custom string
The custom string should be automatically appended to the end of the internal data of each file but not written to the original file. I want it all handled in-stream / in memory.
The data stream is fed to the zip utility, and the zip file is created at the output location with 0 compression (store only)
Explanation: this custom string is used as a watermark to uniquely identify the files in the zip.
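One way to do this on Linux without touching the originals is a short Python script built on the standard zipfile module: each archive entry is written from a stream, the watermark bytes are appended while writing, and ZIP_STORED gives store-only (0 compression). This is a sketch, not a drop-in CLI; `zip_with_watermark` is a hypothetical name, and it assumes entry names are the files' base names (so two inputs with the same base name would collide).

```python
import os
import shutil
import zipfile
from typing import Iterable


def zip_with_watermark(paths: Iterable[str], output_path: str, watermark: str) -> None:
    """Write each file into a store-only zip with `watermark` appended.

    The original files are only read; the watermark exists only inside
    the archive entry, never in the source file on disk.
    """
    mark = watermark.encode("utf-8")
    # ZIP_STORED: entries are stored without compression.
    with zipfile.ZipFile(output_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in paths:
            entry_name = os.path.basename(path)
            # Stream the file into the entry, then append the watermark.
            with open(path, "rb") as src, zf.open(entry_name, "w") as dst:
                shutil.copyfileobj(src, dst)
                dst.write(mark)
```

A thin wrapper (for example with argparse) could take the output path, the custom string, and the file list from the command line. For entries larger than 2 GiB, `zf.open` needs `force_zip64=True`.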

Wrong text encoding when parsing json data

I am curling a website and writing the output to a .json file. This file is the input to my Java code, which parses it using a JSON library and writes the necessary data to a CSV file that I later load into a database.
Since data coming from a website can be in different encodings, I make sure that I read and write in UTF-8, but I still get the wrong output.
For example, Østerriksk becomes �sterriksk.
I am doing all this on Linux. I think there is some encoding problem, because the same code runs fine on Windows but not on Unix/Linux.
I am quite sure my Java code is correct, but I cannot find out what I'm doing wrong.
You're reading the data as ISO 8859-1 but the file is actually UTF-8. I think there's an argument (or setting) to the file reader that should solve that.
Also: curl isn't going to care about the encodings. It's really something in your Java code that's wrong.
What kind of IDE are you using? For example, this can happen if you are using the Eclipse IDE and have not set your default encoding to UTF-8 in the properties.
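The symptom itself hints at which way the mismatch goes, which may help narrow down where to look. A "�" (U+FFFD) typically appears when bytes that are not valid UTF-8 are decoded as UTF-8, while the opposite mismatch produces two-character mojibake instead. A small Python demonstration of both directions (Python only because it is short; Java readers and writers behave the same way):

```python
word = "Østerriksk"

# ISO-8859-1 bytes read as UTF-8: 0xD8 ("Ø") followed by "s" is not valid
# UTF-8, so the decoder substitutes U+FFFD, the "�" seen in the question.
latin1_bytes = word.encode("iso-8859-1")
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as ISO-8859-1: every byte maps to some character,
# so "Ø" (0xC3 0x98) turns into two characters instead of "�".
utf8_bytes = word.encode("utf-8")
as_latin1 = utf8_bytes.decode("iso-8859-1")

print(ascii(as_utf8))    # prints '\ufffdsterriksk'
print(ascii(as_latin1))  # prints '\xc3\x98sterriksk'
```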

Proper way to differentiate pst and dbx files in bash shell

I want to identify the file format of the input file given to my shell script: whether it is a .pst or a .dbx file. I checked "How to check the extension of a filename in a bash script?". That question deals with txt files, and two methods are given there:
check if the extension is txt
check if the mime type is application/text etc.
I tried file -ib <filename> on a .pst and a .dbx file, and it showed application/octet-stream for both. However, if I just run file <filename>, I get
this for the dbx file:
file1.dbx: Microsoft Outlook Express DBX File Message database
and this for the pst file:
file2.pst: Microsoft Outlook binary email folder (Outlook >=2003)
So, my questions are:
Is it better to use MIME type detection every time, when the input can be anything and we need a reliable check?
How do I apply a MIME type check in this case, when both files return "application/octet-stream"?
Update
I didn't want to use extension-based detection, because it seems we just can't be sure on a Unix system that a .dbx file truly is a dbx file. Since file <filename> returns a line that contains the correct information about the file (e.g. "Microsoft Outlook Express DBX File Message database"), the file command is clearly able to identify the file type. So why doesn't file -ib <filename> return the correct information?
Would parsing the string output of file <filename> be fine? Is it advisable, assuming I only need to identify a narrow set of Outlook-family data storage files (MS Outlook Express, MS Office Outlook 2003, 2007, 2010, etc.)? A short text identifier like application/dbx that I could compare against would be all I need.
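Parsing the output of `file` can work for a narrow set of types, as long as the match is on stable substrings rather than the whole line, since the wording can change between versions of the magic database. A Python sketch that maps the two descriptions quoted in the question to short identifiers; `classify_file_description`, `describe`, and the "pst"/"dbx" labels are hypothetical, and `file -b` (brief mode) is used so the filename prefix is omitted:

```python
import subprocess
from typing import Optional

# Substrings taken from the `file` output quoted in the question.
_DESCRIPTION_MARKERS = [
    ("Microsoft Outlook Express DBX File", "dbx"),
    ("Microsoft Outlook binary email folder", "pst"),
]


def classify_file_description(description: str) -> Optional[str]:
    """Return "dbx" or "pst" if a known marker occurs in the description."""
    for marker, label in _DESCRIPTION_MARKERS:
        if marker in description:
            return label
    return None


def describe(path: str) -> str:
    """Run `file -b` on `path` and return its description line."""
    result = subprocess.run(
        ["file", "-b", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Usage would be `classify_file_description(describe("file1.dbx"))`, which returns None for anything the markers do not cover.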
The file command relies on having a file type detection database which includes rules for the file types that you expect to encounter. It may not be possible to recognize these file types if the file content doesn't have a unique code near the beginning of the file.
Note that the -i option to emit mime types actually uses a separate "magic" numbers file to recognize file types rather than translating long descriptions to file types. It is quite possible for these two databases to be out of sync. If your application really needs to recognize these two file types I suggest that you look at the Linux source code for "file" to see how they recognize them and then code this recognition algorithm right into your app.
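Coding the recognition into the app, as the answer suggests, can be as simple as comparing the first bytes of the file. A Python sketch under these assumptions: PST files start with the ASCII bytes "!BDN" (the header magic in Microsoft's [MS-PST] specification), and Outlook Express .dbx files start with 0xCF 0xAD 0x12 0xFE, which is how the format is commonly documented but worth checking against your own samples. `sniff_outlook_store` is a hypothetical name:

```python
from typing import Optional

# Leading signatures (assumptions noted above): PST "!BDN", DBX CF AD 12 FE.
_SIGNATURES = {
    b"!BDN": "pst",
    b"\xcf\xad\x12\xfe": "dbx",
}


def sniff_outlook_store(path: str) -> Optional[str]:
    """Return "pst" or "dbx" based on the first four bytes, else None."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    return _SIGNATURES.get(head)
```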
If you want to do the equivalent of DOS file type detection, then strip the extension off the filename (everything after the last period) and look up that string in your own table where you define the types that you need.
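The extension-table approach from that last paragraph, sketched in Python. As the update in the question points out, this trusts the file name only. The table contents and the `type_from_extension` name are hypothetical placeholders for whatever types you define:

```python
import os
from typing import Optional

# Your own lookup table: extension (lower case, no dot) -> type label.
_EXTENSION_TYPES = {
    "pst": "pst",
    "dbx": "dbx",
}


def type_from_extension(filename: str) -> Optional[str]:
    """Look up everything after the last period of the file name."""
    base = os.path.basename(filename)
    _, dot, ext = base.rpartition(".")
    if not dot:
        return None  # no period at all, so no extension
    return _EXTENSION_TYPES.get(ext.lower())
```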
