Sync a large number of images in ExpressionEngine 2.10.1

I'm using ExpressionEngine 2.10.1 and have a directory with approx 20,000 images in it. All the image files are in a single directory (there are no sub-directories).
When I attempt to sync the files with EE - under Content > Files > File Upload Preferences > Synchronize - the progress bar will get about 25% along and then hang. There are no errors in the server log relating to it.
I guess this is because I have a large number of files, although I haven't read anything about limits on the number of files EE can support.
Is there a way to resolve this, e.g. some way to batch sync the files?

Related

Node JS - How to view progress when using Archiver (zip)

I already searched the Web and found solutions using "entries".
But that solution does not fit when the folder contains many files and one of them is (for example) very big.
The progress seems to stall while the big file is being processed.
I really want the user to see the current progress.
So I thought of another solution:
display the size of the compressed file during processing.
I know it's possible to use "createReadStream" to follow a file upload.
But how do I follow the size of the compressed file?

azure data factory: iterate over millions of files

Previously I had a problem with merging several JSON files into one single file, which I was able to resolve with the answer to this question.
At first I tried with just some files, using wildcards in the file name in the connection section of the input dataset. When I remove the file name, the theory is that all of the files in all folders should be loaded recursively, since I checked the "copy recursively" option in the source section of the copy activity.
The problem is that when I manually trigger the pipeline after removing the file name from the input dataset, only some of the files get loaded: the task ends successfully but loads only around 400+ files, while each folder has 1M+ files. I want to create big CSV files by merging all the small JSON files at the source (I was already able to create a CSV file by mapping the schemas in the copy activity).
It is probably stopping due to a timeout or an out-of-memory exception.
One solution is to loop over the contents of the directory using
Directory.EnumerateFiles(searchDir)
This way you can process all the files without holding the full file listing in memory at once.
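Directory.EnumerateFiles is .NET, but the same lazy-enumeration idea works elsewhere. A minimal standalone Python sketch (the function name is my own, and this runs outside Data Factory) uses os.scandir, which yields entries one at a time instead of materializing the whole list:

```python
import os

def process_files(search_dir):
    """Iterate over search_dir lazily: entries are yielded one at a
    time, so a 1M+ file listing is never held in memory at once."""
    count = 0
    with os.scandir(search_dir) as entries:
        for entry in entries:
            if entry.is_file():
                # merge/append the file's contents here instead of counting
                count += 1
    return count
```

The point is that the iterator pulls directory entries from the OS incrementally, so memory use stays flat no matter how many files the folder holds.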

creating multiple zip files based on size

I have 150 GB of JPGs in around 30 folders. I am trying to import them into the media library of a CMS. The CMS will accept a bulk import of images in a zip file, but there is a limit of 500 MB on the size of the zip (and it won't accept multi-volume zips).
I need to go into each folder and zip the images into a small number of ~500 MB zip files. I am using WinRAR, but it doesn't seem to have the facility to do what I want.
Is there another product that will do what I want?
Thanks
David
This is possible with WinRAR as well. Please see this guide: Create Multi-part Archives to Split Large Files for Emailing, Writing to CD [How To]
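Note that multi-part archives are the multi-volume zips the CMS rejects. An alternative is to greedily pack the files into several independent zips, each staying under the limit. A minimal Python sketch (function name, output naming, and the size estimate are my own assumptions; JPEGs barely compress, so uncompressed sizes are used as the estimate and ZIP_STORED skips recompression):

```python
import os
import zipfile

def pack_into_zips(src_dir, out_dir, limit_bytes=500 * 1024 * 1024):
    """Greedily group the files in src_dir into independent zip
    archives, each kept under limit_bytes (by uncompressed size)."""
    os.makedirs(out_dir, exist_ok=True)
    files = sorted(
        os.path.join(src_dir, name) for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )
    batches, batch, batch_size = [], [], 0
    for path in files:
        size = os.path.getsize(path)
        if batch and batch_size + size > limit_bytes:
            batches.append(batch)          # current batch is full
            batch, batch_size = [], 0
        batch.append(path)
        batch_size += size
    if batch:
        batches.append(batch)
    for i, group in enumerate(batches, 1):
        zip_path = os.path.join(out_dir, f"images_{i:03d}.zip")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
            for path in group:
                zf.write(path, arcname=os.path.basename(path))
    return len(batches)
```

Each resulting archive is a complete standalone zip, so the CMS can import them one at a time.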

Maximum number of images on folder

We are working on an image gallery where we expect 1 million to 40 million photos, and we are thinking of keeping them all in one photo folder.
But can one photo folder hold 40 million photos? If I keep them directly inside the photo folder without creating any subfolders, is there any issue, or do I have to create folders based on the upload date, so that photos uploaded on a given day go into that day's folder?
I don't have any issue creating that structure, but from a knowledge point of view I want to know what the problem is with keeping a few million photos directly in one folder. I have seen a few websites doing this; for example, on this page all images sit under one image folder:
http://www.listal.com/viewimage/4132808
http://iv1.lisimg.com/image/4132808/600full-the-hobbit%3A-an-unexpected-journey-photo.jpg
That site has something like 5 million images, each stored under its respective ID (for example 4132808), which suggests there are more than 5 million subfolders under the images directory. Is it OK to keep that many folders under one directory?
It depends on the filesystem; check the file system comparison page on Wikipedia for details.
However, you will want to sort the files into some structure like
images/[1st 2 chars of some kind of hash]/[2nd 2 chars of hash]/...
With this you create an easily reproducible path while drastically decreasing the number of files in any one folder.
You want to do this because if you (or any application) ever need to list the contents of a folder, a huge directory would cause a serious performance problem.
What you see on other sites is only how they publish those images. Of course they can be served from a seemingly flat URL, but in the underlying structure you still want to partition the files somehow.
Some calculations:
Let's say you use the SHA-256 hash of the filename to create the path. That gives you 64 hex characters in [0-9a-f]. If you choose two-character subfolder names, you have 256 possible folders at each level. Now assume you do this for 3 levels: ab/cd/ef/1234...png. That's 256^3 leaf folders, meaning about 16.7 million, so you'd be fine even with a couple of billion images (only a few hundred files per folder on average).
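The calculation above translates directly into a short path builder. A minimal Python sketch (the function name and the levels parameter are my own):

```python
import hashlib
import os

def shard_path(filename, levels=3, root="images"):
    """Build a reproducible sharded path from the SHA-256 hex digest
    of the filename, e.g. images/ab/cd/ef/<filename> for levels=3."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Take two hex characters per directory level: 256 folders per level.
    parts = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return os.path.join(root, *parts, filename)
```

Because the path is derived purely from the filename, any part of the application can recompute it, and keeping the logic in one function makes it easy to change the sharding scheme later.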
As for serving the files, you can do something like this with Apache + mod_rewrite:
RewriteEngine On
# Skip requests that are already in sharded form (/images/xx/yy/zz/...)
RewriteCond %{REQUEST_URI} !^/images/../../../.*
# Map /images/abcdef<rest> to /images/ab/cd/ef/<rest>
RewriteRule ^/images/(..)(..)(..)(.*)$ /images/$1/$2/$3/$4 [L]
This reroutes requests for the images to the correct place.
See How many files can I put in a directory?.
Don't put all your files into one folder; it does not scale. If you don't want to start with a deep folder hierarchy, start simple and put the logic that builds the path to the folder in one class or method. That allows you to simply rearrange things later if needed.

How to load files in a specific order

I would like to know how I can load some files in a specific order. For instance, I would like to load my files according to their timestamp, in order to make sure that subsequent data updates are replayed in the proper order.
Let's say I have 2 types of files: deal info files and risk files.
I would like to load T1_Info.csv, then T1_Risk.csv, T2_Info.csv, T2_Risk.csv...
I have tried to implement a comparator, as described on Confluence, but it seems that the loadInstructions file takes priority: it orders the Info files and the Risk files independently (loading T1_Info.csv, T2_Info.csv and then T1_Risk.csv, T2_Risk.csv...).
Do I have to implement a custom file loader, or is it possible with an AP configuration?
The loading of the files based on load instructions is done in
com.quartetfs.tech.store.csv.impl.CSVDataModelFactory.load(List<FileLoadDescriptor>). The FileLoadDescriptor list you receive is created directly from the load instructions files.
What you can do is create a simple instructions file with 2 entries, one for deal info and one for risk. Your custom implementation of CSVDataModelFactory will then be called with a list of two items. In that implementation you scan the directory where the files are, sort them in the order you want them parsed, and call super.load() with the list of FileLoadDescriptor objects you created from the directory scan.
If you also want to load files that are placed in this folder later, add a line to your load instructions that matches all files; that will make the super.load() implementation create a directory watcher (you should then probably override createDirectoryWatcher() so it does not watch the files already present in the folder when load is called).
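The key piece is the ordering itself. The interleaved timestamp order the question asks for (T1_Info, T1_Risk, T2_Info, T2_Risk, ...) can be expressed as a sort key. A standalone Python sketch of that comparator logic (the file names are the ones from the question; the regex and helper name are my own assumptions, independent of the ActivePivot API):

```python
import re

# Within the same timestamp, load Info before Risk.
TYPE_ORDER = {"Info": 0, "Risk": 1}

def load_order_key(filename):
    """Sort key yielding T1_Info, T1_Risk, T2_Info, T2_Risk, ...
    Assumes names shaped like T<number>_<Type>.csv."""
    m = re.match(r"T(\d+)_(\w+)\.csv$", filename)
    if not m:
        return (float("inf"), 0, filename)  # unrecognized names sort last
    timestamp, kind = int(m.group(1)), m.group(2)
    return (timestamp, TYPE_ORDER.get(kind, 99), filename)

files = ["T2_Risk.csv", "T1_Risk.csv", "T2_Info.csv", "T1_Info.csv"]
print(sorted(files, key=load_order_key))
# → ['T1_Info.csv', 'T1_Risk.csv', 'T2_Info.csv', 'T2_Risk.csv']
```

In the custom factory described above, this is the sort you would apply to the scanned file list before calling super.load().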
