API to get MIP label from a file residing on remote share - azure

I need to read the MIP label (if one is present) from a file residing on a remote share such as an SMB/DFS or NFS share. One option is to download the file locally and then read the label using the MIP SDK, but since the data files can be very large, that option is very inefficient.
Is there a better way to read the MIP label from a very large file without downloading the complete file locally?
Thanks,
Bishnu

Unfortunately, there isn't. The SDK needs the entire file.
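For what it's worth, the copy-locally flow that leaves you with looks roughly like this; a minimal Java NIO sketch, where the share path is a placeholder and the actual MIP SDK call is left as a comment because it depends on which SDK binding you use:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class RemoteLabelCheck {
        public static void main(String[] args) throws IOException {
            // UNC path on the SMB/DFS share (placeholder).
            Path remote = Paths.get("\\\\fileserver\\share\\data\\report.xlsx");

            // The SDK needs the whole file, so copy it to local temp storage first.
            Path local = Files.createTempFile("mip-", ".tmp");
            try {
                Files.copy(remote, local, StandardCopyOption.REPLACE_EXISTING);

                // Hand the local copy to the MIP File SDK here (create a file handler
                // for `local` and read its label); the exact call depends on the SDK
                // binding in use, so it is left out of this sketch.
            } finally {
                Files.deleteIfExists(local);
            }
        }
    }

Deleting the temp copy in the finally block at least keeps the cost to bandwidth and temporary disk space rather than leaving stray copies around.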

Related

Changing file name while writing back file in output folder in Data Lake Storage account in Microsoft Azure

It would be very helpful if you could help me with my problem.
My project requires storing files under specific names in Data Lake Store in Microsoft Azure (a cloud-based platform). After performing a transformation or action on the data frame created from the file loaded in the HDInsight cluster, when I write the data frame to a specific folder it gets stored with a name like "part-00000-xxxx", i.e. in the Hadoop format.
But as I have a large number of files, I can't go into the created folder for each file and rename it to the required name every time.
So, can you please help me out with this?
NOTE: After storing the file we can copy it to another folder and give it whatever name we want while copying, but I don't want that solution. I want to give the file a specific name when I write it back to my storage (Data Lake Store) after processing.
You could provide a subclass of the MultipleOutputFormat class to control the pattern of the filenames, but that will need to be in Java, since you can't write OutputFormats with the streaming API.
Another option might be to use the Azure Storage client to merge and rename the output files once the job is over.
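If renaming after the job is acceptable, a rough sketch using the Hadoop FileSystem API (rather than the Azure Storage client directly) could look like this; the output path and target name are placeholders, and adl:// or abfs:// paths work the same way as HDFS paths as long as the cluster has the ADLS connector configured:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RenamePartFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Folder the job wrote to, e.g. "adl://mystore/output/run1" (placeholder).
            Path outputDir = new Path(args[0]);
            String targetName = args[1]; // e.g. "sales.csv" (placeholder)

            FileSystem fs = outputDir.getFileSystem(conf);

            // Find the part files the job produced and rename the single part file.
            FileStatus[] parts = fs.globStatus(new Path(outputDir, "part-*"));
            if (parts.length == 1) {
                fs.rename(parts[0].getPath(), new Path(outputDir, targetName));
            } else {
                // With more than one part file you would need to merge them first
                // (e.g. FileUtil.copyMerge) before giving the result a single name.
                System.err.println("Expected exactly one part file, found " + parts.length);
            }
        }
    }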

File read/write on cloud (Heroku) using node.js

First of all, I am a beginner with node.js.
In node.js, when I use functions such as fs.writeFile(), the file is created and is visible in my repository. But when the same process runs on a cloud platform such as Heroku, no file is visible in the repository (cloned via git). I know the file is being created because I am able to read it, but I cannot view it. Why is this, and how can I view the file?
I had the same issue and found out that Heroku and other cloud services generally prefer that you don't write to their file system; everything you write/save goes to an "ephemeral filesystem" that is thrown away whenever the dyno restarts, so it's like a ghost file system, really.
Usually you would want to use Amazon S3 or Redis for JSON files and the like, and for bigger assets such as MP3s.
It might also work if you rent a remote server, like ECS, with a Linux system and a mounted storage space.
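The idea of persisting output in S3 instead of the dyno's filesystem applies regardless of language; as one illustration, in Java with the AWS SDK v1 it could look like the sketch below, where the bucket, key, and local path are made up:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import java.io.File;

    public class UploadInsteadOfLocalWrite {
        public static void main(String[] args) {
            // Credentials come from the environment (e.g. AWS_ACCESS_KEY_ID /
            // AWS_SECRET_ACCESS_KEY set as Heroku config vars, picked up by the
            // default credential chain).
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

            // Anything written to the dyno's own filesystem disappears on restart,
            // so persist the generated file in S3 instead (bucket/key are placeholders).
            s3.putObject("my-app-uploads", "reports/output.json", new File("/tmp/output.json"));
        }
    }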

Spring Integration - Reading files from multiple locations & putting them at a central repository

I need to transfer file contents from multiple servers to a central repository as soon as there is a change in a file. The requirement is also that only the changed contents should be transferred, not the whole file.
Could someone let me know if this is possible using the Spring Integration file inbound/outbound adapters?
The file adapters only work on local files (but they can work if you can mount the remote filesystems).
The current file adapters do not support transferring parts of files, but we are working on File Tailing Adapters, which should be in the code base soon. These will only work for text files, though (and only if you can mount the remote file system). For Windows (and other platforms that don't have a tail command), there's an Apache Commons Tailer implementation, but, again, it will only work for text files, and only if you can mount the shares.
If you can't mount the remote files, or they are binary, there's no out-of-the-box solution, but if you come up with a custom solution to transfer the data (e.g. google "tailing remote files"), it's easy to then hook it into a Spring Integration flow to write the output.
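As a rough illustration of the Commons IO Tailer approach mentioned above (the file path is a placeholder, and the listener body just marks where you would hand new lines to your integration flow):

    import java.io.File;
    import org.apache.commons.io.input.Tailer;
    import org.apache.commons.io.input.TailerListenerAdapter;

    public class ShareTailer {
        public static void main(String[] args) {
            // Text file on a mounted remote share (placeholder path).
            File watched = new File("/mnt/server1/logs/app.log");

            TailerListenerAdapter listener = new TailerListenerAdapter() {
                @Override
                public void handle(String line) {
                    // Hand each newly appended line to the central repository,
                    // e.g. by sending it to a Spring Integration channel.
                    System.out.println("new line: " + line);
                }
            };

            // Poll every second, starting from the end of the file so only
            // changed contents are picked up.
            Tailer tailer = new Tailer(watched, listener, 1000, true);
            new Thread(tailer).start();
        }
    }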

How does RPC transfer big binary data?

If I want to transfer data using RPC or a component technology, but the data can be very big, how do I deal with this situation?
For example, I want to pass a file to the remote side as a parameter, but I don't want to load the whole file into memory for the transfer. How should I do this?
I think you should consider a file transfer solution, something like establishing an FTP connection in the background and making the operations that are supposed to work on the file data wait until the transfer completes. You should also take care of verifying the correctness of the transferred data, with checksumming for instance. The other solution is probably to mount the remote directory containing the files as a local volume, or even to set up a distributed file system if you have all the files in one place and you are running Linux.
Let me answer my own question.
The answer is MTOM; make sure the framework you are using supports it.
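For reference, a minimal JAX-WS sketch of the MTOM idea; the service name, URL, and method are made up, and on the client side you would pass a DataHandler wrapping a FileDataSource for the big file:

    import javax.activation.DataHandler;
    import javax.jws.WebMethod;
    import javax.jws.WebService;
    import javax.xml.ws.Endpoint;
    import javax.xml.ws.soap.MTOM;

    // Enabling MTOM makes JAX-WS carry the binary content as a separate
    // attachment instead of inlining it as base64 text in the SOAP body.
    @MTOM
    @WebService
    public class FileTransferService {

        @WebMethod
        public void upload(String name, DataHandler content) {
            // Consume content.getInputStream() in chunks and write it to disk;
            // the payload travels as a raw attachment, not inline base64 XML.
        }

        public static void main(String[] args) {
            Endpoint.publish("http://localhost:8080/files", new FileTransferService());
        }
    }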

Creating a stub file on ext3, poor man's HSM in Amazon with EC2, EBS and S3

All -
I'm working on creating a poor man's hierarchical storage management solution for my file servers in Amazon. What I would like to do is move files on my EC2 file server with an atime > 30 days to S3 and leave a stub file behind. When a user, using standard POSIX commands, attempts to access the file in any way, it would be copied back onto the host and then manipulated there; this should be transparent to the user, other than being slow. Am I right in assuming that when a user accesses a stub file I can copy the full file over the stub file transparently to the user?
I'm using ext3, and I can't find any information on creating file system stub files, or on whether I could use a couple of lines of (python|bash) as the "path" to the actual file in S3.
My other option, also a good one (I think), is to mount a POSIX<-->S3 file system on the file server; S3QL [1] or S3FS [2] seem to be good choices. I still need help with creating the stub files, but at least this way the stub file target is on a POSIX file system on the same server, and I let S3QL or S3FS deal with the POSIX<-->S3 interface.
Maybe all I need is a good stub file HOWTO, but any help would be very much appreciated, thank you very, very much.
[1] http://code.google.com/p/s3ql/
[2] http://code.google.com/p/s3fs/wiki/FuseOverAmazon
C
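One possible convention, purely as a sketch: have the stub contain nothing but the S3 key of the archived object, and have a small restore helper download the object over the stub when triggered. This does not intercept POSIX access by itself (you would still need something like inotify/incron or a FUSE layer for transparency); the bucket name and stub format below are assumptions, shown in Java with the AWS SDK:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import java.io.File;
    import java.nio.file.Files;

    public class StubRestore {
        public static void main(String[] args) throws Exception {
            // Assumed convention: the stub left behind on ext3 is a tiny text file
            // whose only content is the S3 key of the archived object.
            File stub = new File(args[0]);
            String key = new String(Files.readAllBytes(stub.toPath())).trim();

            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

            // Download the archived object over the stub, restoring the full file
            // in place (bucket name is a placeholder).
            s3.getObject(new GetObjectRequest("my-hsm-archive", key), stub);
        }
    }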
