Simple file sharing using online file storage

Most of the popular online file hosting services require additional software to perform even simple operations with files; e.g., you can't wget or rsync files from Amazon S3. On the other hand, maintaining your own file share requires administration, and you have to worry about security, data integrity, etc. yourself. Isn't there a simple online file hosting service that lets you use wget, rsync, robocopy, and other common tools?

Related

How do I store images in a distributed system the right way?

I'm trying to create a distributed system consisting of a mobile app, a web user panel, and an API that communicates with the DB. I want the user to be able to upload a profile image from both the mobile app and the web user panel, but what is the best and "right" way to store images across a distributed system? I can't really find anything describing best practices on this topic.
I know that the file path should be stored in the database and the image in a file system. But should that file system be on the API server, or where?
Here is a diagram of what I think the distributed system should look like.
The "right" way to do something complex like image hosting depends on factors like expected traffic and performance expectations. Designing large systems involves a lot of tradeoffs, so it's best to nail down what requirements are for your system are in order to make decisions that serve those requirements.
As for your question, this diagram is roughly correct - you want to store the location of the uploaded image separately from the image itself. If you wanted your solution to be more scalable, one approach would be to turn your file system into its own service with its own API. You would store a hash of the file in your database to reference it rather than its path, then request the image (or a URL to it) from the new storage service by asking that service's API for the file with the stored hash.
The reason this is more scalable is that the storage service is free to become its own distributed system when we don't require that every file has an associated file system path within a single namespace. A hash is a good candidate for a replacement of the filesystem path, but you could come up with your own storage ID scheme depending on your needs.
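A minimal sketch of that flow in Node.js, assuming a hypothetical storageService client and db layer (only the hashing step uses real APIs):

const crypto = require("crypto");
const fs = require("fs");

// The SHA-256 of the file contents becomes the storage ID.
function hashFile(filePath) {
  const data = fs.readFileSync(filePath);
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Upload flow: compute the hash, push the bytes to the storage service,
// then record only the hash against the user row.
async function storeProfileImage(userId, filePath) {
  const hash = hashFile(filePath);
  await storageService.put(hash, fs.createReadStream(filePath)); // hypothetical storage API
  await db.query("UPDATE users SET image_hash = ? WHERE id = ?", [hash, userId]); // hypothetical DB client
  return hash;
}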
However, this may be wildly out of scope for what you are trying to design. If you only expect to have a few thousand users, storing your images and database on your API server in the file system isn't necessarily wrong, but you might experience growing pains if the requirements of your system grow.
Google's Site Reliability Engineering classroom has a lesson on building a distributed image server, which is a problem adjacent to what you're looking to do: https://sre.google/classroom/imageserver/

How to store files in Azure's filesystem

I am writing a simple API that's hosted on Azure, and I need a place to store a config file that can be changed in the code. I want to place this in the webroot.
Before you say this is a terrible practice, I know. This API is very small (the free tier of Azure is actually more than enough for me), and the file is less than 1 MB in size. I don't want to buy Blob storage or other things designed for large projects. I don't really care about scalability; I just want the data to persist.
The question is where and how I can store this file. Can I use the path "D:/home/site/" or something similar? What do I need to do to make this work? And if this is impossible, are there other options for me that hopefully aren't overkill?
It seems that you are hosting your API application on Azure App Service and you'd like to find a place to persist a config file. As you mentioned, you can store your file in d:\home; files there are persistent and shared between all instances of your site. This article can help you understand the Azure App Service file system; please read it.
You can upload the config file when you deploy your API application, or upload it via the Kudu console.
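A hedged sketch of reading and writing such a file from a Node.js app, assuming a Windows-based App Service, where the HOME environment variable resolves to D:\home (the config.json name is made up):

const fs = require("fs");
const path = require("path");

// On a Windows App Service instance, HOME points at the persistent D:\home share.
const configPath = path.join(process.env.HOME || "D:\\home", "site", "config.json");

function loadConfig() {
  if (!fs.existsSync(configPath)) return {};
  return JSON.parse(fs.readFileSync(configPath, "utf8"));
}

function saveConfig(config) {
  fs.writeFileSync(configPath, JSON.stringify(config, null, 2));
}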

Easiest option to query an .mdb (MS Access) file over SMB from Linux

At my workplace the attendance (fingerprint) device uses an .mdb file (stored on a Windows PC), and I have an SMB account for the network share on that computer (smb://10.7.7.x/tas/). The shared folder contains 3 files:
HITFPTA.ldb
HITFPTA.mdb ==> this one
HITFPTA_History.mdb
What is the easiest option for querying that file (in real time), given that the server that should run the queries uses Linux (Arch Linux)? (If possible, using the Go programming language.)
For read-only access to a "live" .mdb database from a mix of Windows and non-Windows clients, I would recommend using Java and the UCanAccess JDBC driver (details here). If you're not keen on writing Java code but have some familiarity with Python, then you could use Jython, as described in my other answer here.
(Jackcess, the data-access layer used by UCanAccess, does not use the Access Database Engine and is not intended to make updates to a live multi-user database. However, it should be able to read the database without incident. For reporting purposes it might be prudent to take a copy of the .mdb file and run the reports against that. Or, stick with Windows clients and use ODBC.)
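If Java isn't appealing and Node.js is acceptable on your Linux server, the npm package mdb-reader (a pure-JavaScript, read-only Access parser) may be worth evaluating. A hedged sketch based on its documented API, assuming the SMB share is mounted at /mnt/tas (the table name is a guess):

// mdb-reader is ESM-only in recent versions.
import { readFileSync } from "fs";
import MDBReader from "mdb-reader";

const buffer = readFileSync("/mnt/tas/HITFPTA.mdb");
const reader = new MDBReader(buffer);

console.log(reader.getTableNames()); // discover what tables the device writes
// const rows = reader.getTable("Attendance").getData(); // "Attendance" is a guess

As with Jackcess, treat this as read-only against a live multi-user file; copying the .mdb first and querying the copy is the safer pattern.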

Managing image workflow with Node.js webapp?

I'm new to Node, and was hoping to use Node.js for a small internal webapp for managing workflow for product photos.
The image files are RAW camera files, and stored on a local NAS device.
The webapp should:
Have a concept of workflow, be able to jump back and forth between states, and handle error states.
Watch certain directories for image files, and react to new files being added, or existing files being moved/removed.
Send out emails in response to events.
Scan photos for QR barcodes, and generate events based on these.
Rename photos based on user-defined batch patterns in response to events.
Questions:
Is Node.js a suitable tool for something like this? Why or why not?
Any libraries to help manage the workflow? I could only find node-workflow (http://kusor.github.io/node-workflow/) - curious about anybody's experiences with this? Alternatives?
Likewise for file watching? I saw many wrappers for fs.watch (e.g. https://github.com/mikeal/watch), as well as some alternatives (e.g. https://github.com/paulmillr/chokidar) - any advice for somebody new to Node?
Apart from using a NAS with a network filesystem, are there any alternative stores I can use for the image files?
I'm open to other alternatives here. I'm worried that the system will get confused or lose track of files.
The Node.js docs also mention that watching files on network file systems might be unreliable (http://nodejs.org/api/fs.html#fs_fs_watch_filename_options_listener). Are there more robust solutions?
Any other tips/suggestions for this project?
Cheers,
Victor
Node.js is a fine platform for building an application like the one you describe. The key question is file storage. You might find this post very interesting:
Storing Images in DB - Yea or Nay?
This other post enumerates a few interesting options for writing workflows:
Workflow engine in Javascript
Shameless plug: I have been working on an event coordination library that you might find useful and interesting.
http://durablejs.org
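On the file-watching question specifically: polling is the usual workaround where native fs.watch events are unreliable on network file systems. A minimal chokidar sketch, assuming the NAS is mounted at /mnt/nas/photos (the path and intervals are illustrative):

const chokidar = require("chokidar");

const watcher = chokidar.watch("/mnt/nas/photos", {
  usePolling: true,       // fall back to stat() polling for network shares
  interval: 5000,         // poll every 5 seconds
  awaitWriteFinish: true  // wait until a file stops growing before reporting it
});

watcher
  .on("add", file => console.log("new photo:", file))
  .on("unlink", file => console.log("photo removed:", file))
  .on("error", err => console.error("watcher error:", err));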

Use SQL Server FILESTREAM or a traditional file server?

I am designing a system that's going to have about 10 million+ users, each with a photo of about 1-2 MB.
We are going to deploy both the database and the web app on Microsoft Azure.
I am wondering how I should store the photos. There are currently two options:
1. Store all photos using SQL Server FILESTREAM
2. Use a file server
I have no experience with such large-scale BLOB data using FILESTREAM.
Can anybody give me any suggestions? The pros and cons?
Input from anyone with Microsoft Azure experience storing large numbers of photos would be really appreciated!
Thx
Ryan.
I vote for neither. Use Windows Azure Blob storage. Simple REST API, $0.15/GB/month. You can even serve the images directly from there, if you make them public (like <img src="http://myaccount.blob.core.windows.net/container/image.jpg" />), meaning you don't have to funnel them through your web app.
A database is almost always a horrible choice for any large-scale binary storage need. Databases are best for relational data; instead, keep references in your database to the actual storage location. There are a few factors you should consider:
Cost - SQL Azure costs quite a lot per GB of storage and has small storage limits (50GB per database), both of which make it a poor choice for binary data. Windows Azure Blob storage is vastly cheaper per GB for serving binary objects (though it has a somewhat more complicated pricing system).
Throughput - SQL Azure has pretty good throughput, as it can scale well; however, Windows Azure Blob storage has even greater throughput, as it can scale to any number of nodes.
Content Delivery Network - A feature not available to SQL Azure (though a complex, custom wrapper could be created), but one that can be set up within minutes to piggyback on your Windows Azure Blob storage, providing effectively limitless bandwidth to your end users, so you never have to worry about your binary objects becoming a bottleneck in your system. CDN costs are similar to those of Blob storage; you can find all the details here: http://www.microsoft.com/windowsazure/pricing/#windows
In other words, there's no reason not to go with Blob storage. It is simple to use, cost-effective, and will scale to any need.
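For comparison with the REST-era details above, a hedged sketch of an upload using the current Azure SDK for Node.js (@azure/storage-blob); the container and file names are made up:

const { BlobServiceClient } = require("@azure/storage-blob");

async function uploadPhoto(localPath, blobName) {
  const service = BlobServiceClient.fromConnectionString(
    process.env.AZURE_STORAGE_CONNECTION_STRING
  );
  const container = service.getContainerClient("photos");
  await container.createIfNotExists({ access: "blob" }); // blobs publicly readable
  const blob = container.getBlockBlobClient(blobName);
  await blob.uploadFile(localPath); // Node.js-only convenience method
  return blob.url; // serve this URL directly; no need to funnel through the web app
}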
I can't speak to anything Azure-related, but for my money the biggest advantage of using FILESTREAM is that the data gets backed up as part of the normal SQL Server backup process. The size of the data you are talking about also suggests that FILESTREAM may be a good choice.
I've worked on an SCM system with an RDBMS back end, and one of our big decisions was whether to store the file deltas on the file system or inside the DB itself. Because it was cross-RDBMS we had to cook up a generic non-FILESTREAM way of doing it, but the ability to do a single-shot backup sold us.
FILESTREAM is a horrible option for storing images. I'm surprised MS ever promoted it.
We're currently using it for the images on our website, mainly user-generated images and any CMS-related content that admins create. The decision to use FILESTREAM was made before I started. The biggest issue is serving the images up: you'd better have a CDN sitting in front. If not, plan on your system coming to a screeching halt. Of course, most sites have a CDN, but you don't want to be at the mercy of that service going down, meaning your system will get overloaded. The amount of stress put on your SQL Server is the main problem here.
In terms of ease of backup, the tradeoff is that your DB is MUCH, MUCH larger, so the backup takes longer (potentially much longer) and the system runs slower during the backup. Not to mention, moving backups around takes longer (i.e., restoring prod data in a dev environment or on local machines for dev purposes). Don't use this as a deciding factor.
Most cloud services have automatic redundancy for any files you store on their system (i.e., AWS's S3 and Azure's Blob storage). If you're on-premises, just make sure you use a shared location for the images and make sure that location is backed up.
I think the best option is to set it up so each image (and other UGC file types too) has an entry in your DB with a path to that file. Going one step further, separate the root path into a config setting and only store the remaining path with the entry. For example, the root path in config might be a base URL, a shared drive or virtual dir, or a blank entry; your entry might then have "/files/images/image.jpg". This way, if you move your file store, you can just update the root config. I would also suggest creating a FileStoreProvider interface (singleton) for managing (saving, deleting, updating) these files. This way, if you switch between AWS, Azure, or on-premises, you can just create a new provider.
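A minimal sketch of that provider idea; every name here (LocalFileStore, the root path, the file names) is illustrative rather than taken from any library:

const fs = require("fs");
const path = require("path");

// One interface, swappable backends; only the relative path goes in the DB row.
class LocalFileStore {
  constructor(rootPath) {
    this.rootPath = rootPath; // the root comes from config, not from DB rows
  }
  save(relativePath, buffer) {
    const full = path.join(this.rootPath, relativePath);
    fs.mkdirSync(path.dirname(full), { recursive: true });
    fs.writeFileSync(full, buffer);
    return relativePath; // store this with the DB entry
  }
  delete(relativePath) {
    fs.unlinkSync(path.join(this.rootPath, relativePath));
  }
}

// Swapping to S3 or Azure Blob means another class with the same save/delete
// surface and a different configured root or connection string.
const store = new LocalFileStore("/srv/files"); // root path from config in practice
store.save("files/images/image.jpg", Buffer.from("example bytes"));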
I have a client-server DB, I manage many files (doc, txt, pdf, ...), and all of them go in a FILESTREAM BLOB. Customers have 50+ MB DBs. If you can do the same in Azure, go for it. Having everything in the DB is a wonderful thing. It is considered good policy for Postgres and MySQL as well.
