Scan remote directory and find sequential images with known constants

I have a quick and dirty project I need assistance with.
Outline: A remote server uses randomized file naming for storing JPEGs. All of the JPEGs are stored within the same directory. For example, "website.com/photos/". All of the images in that directory have a 10-digit (0-9) file name, with the suffix .jpg. The images are sequentially named (for example 12XXXXXXXX.jpg then much later the series becomes 13XXXXXXXX.jpg) but not every sequential number is used. Most are not. One image might be 1300055000.jpg but then the next image won't be until 130099000.jpg.
I am looking for a program to scan this directory and 'try' every file name possibility (0000000000.jpg - 9999999999.jpg) and then output a URL sheet (basic HTML) with links to the working JPEGs that are found.
All non-working JPEGs when tried return a 404 not found page. All working JPEGs return a sizable photo.
Your assistance is greatly appreciated! I'd be willing to compensate for the work. Thank you!
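Here is a minimal Python sketch of such a scanner. It assumes the server answers HEAD requests with 404 for missing files; if the server rejects HEAD, switch to GET. BASE_URL is the placeholder from the question, and the helper names are hypothetical. Keep the scale in mind: the full 0000000000-9999999999 range is 10 billion requests, which at 100 requests per second would take over three years. In practice you would probe a narrower range.

```python
import urllib.error
import urllib.request
from html import escape

# Assumption: placeholder directory from the question, not a real endpoint.
BASE_URL = "http://website.com/photos/"

def candidate_names(start, stop):
    """Yield zero-padded 10-digit names like '0000000042.jpg' for start <= n < stop."""
    for n in range(start, stop):
        yield f"{n:010d}.jpg"

def exists(url, timeout=10.0):
    """Return True if a HEAD request succeeds, False on 404; re-raise other HTTP errors."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

def scan(start, stop, base_url=BASE_URL):
    """Probe every name in [start, stop) and return the URLs that exist."""
    return [base_url + name for name in candidate_names(start, stop)
            if exists(base_url + name)]

def build_html(urls):
    """Render the found URLs as a basic HTML page of links."""
    items = "\n".join(
        f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in urls
    )
    return f"<!DOCTYPE html>\n<html><body><ul>\n{items}\n</ul></body></html>\n"
```

To use it, call `scan(start, stop)` on a modest range (for example, part of the 13XXXXXXXX series), then write `build_html(found)` to a file such as found.html.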

Related

How to upload a photo to User Form using PATH and display on other computers?

I built a UserForm with a post-selection configuration that shows an image and text results, depending on the choices made.
I load the pictures into a picture box from a cell that contains the image PATH:
Image1.Picture = LoadPicture(Sheets("XXX").Cells(34, 5))
I have two problems:
The image is loaded upside down. I tried to find a ROTATE command, tried rotating the image 180 degrees in its folder in advance, and even tried loading the images into the picture box directly, one by one, but the problem stays the same. How can I rotate it 180 degrees?
I sent the file to my co-workers and they were able to open the UserForm, but when they clicked the SHOW button, an error said the PATH was not found. I guess this is because the image is in a folder that exists only on my computer. How can I put it in a shared folder and make the PATH valid for everyone (assuming each computer has a different USER)?
Regarding your first problem, there is a way to rotate images with VBA code, but it might be easier to rotate the image in advance so that it is displayed correctly. See this post for more info.
Your second problem can be solved in several ways. You could include the images in the workbook and load them from there. If you don't like that, you can build the path dynamically. In that case, of course, you also have to give your co-workers the image files. If the images are stored in the same folder as the Excel file, you can use
Application.ActiveWorkbook.Path
to find the path of the Excel file dynamically, which is then also the path of the image. You can also store the images in a subfolder of the Excel file's folder and build that path like this:
path = Application.ActiveWorkbook.Path & "\subfoldername"

How to prepare test data for textsum?

I have been able to run the pre-trained TextSum model (TensorFlow 1.2.1) successfully. The output consists of summaries of CNN & Dailymail articles, which are chunked into bin format before testing.
I have also been able to create the bin-format test data for the CNN/Dailymail articles and the vocab file (per instructions here). However, I cannot create my own test data to check how good the summaries are. I have tried modifying the make_datafiles.py code to remove hard-coded values. I can create tokenized files, but the next step seems to fail. It would be great if someone could help me understand what url_lists is used for. Per the GitHub README:
"For each of the url lists all_train.txt, all_val.txt and all_test.txt, the corresponding tokenized stories are read from file, lowercased and written to serialized binary files train.bin, val.bin and test.bin. These will be placed in the newly-created finished_files directory."
How is a URL such as http://web.archive.org/web/20150401100102id_/http://www.cnn.com/2015/04/01/europe/france-germanwings-plane-crash-main/ being mapped to the corresponding story in my data folder? If someone has had success with this, please do let me know how to go about this. Thanks in advance!
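From what I can tell in make_datafiles.py, the URL is never fetched. The script hashes each URL in the list with SHA-1, and the hex digest plus ".story" is the name of the story file in the data folder. That is why the lists only matter for the CNN/Dailymail files, whose names are those hashes. The sketch below reproduces that mapping with hashlib. The function names are hypothetical, not the script's own.

```python
import hashlib

def url_to_story_filename(url):
    """Map an article URL to its story file name: SHA-1 hex digest + '.story'."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + ".story"

def story_filenames(url_list_path):
    """Read a URL list (one URL per line) and return the matching story file names."""
    with open(url_list_path, encoding="utf-8") as fh:
        return [url_to_story_filename(line.strip()) for line in fh if line.strip()]
```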
Update: I was able to figure out how to use own data to create bin files for testing (and avoid using url_lists altogether).
This will be helpful - https://github.com/dondon2475848/make_datafiles_for_pgn
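If you skip url_lists, what you still need to reproduce is the .bin layout itself. As far as I can tell from make_datafiles.py and the data reader, each record is an 8-byte length written with struct format 'q', followed by that many bytes of a serialized tf.train.Example, which holds the article and abstract. The sketch below only handles that framing. It treats each payload as opaque bytes, so it runs without TensorFlow. In real use, each payload would be `example.SerializeToString()`. The helper names are hypothetical.

```python
import struct

def write_bin(path, serialized_examples):
    """Write each record as an 8-byte native 'q' length prefix followed by its bytes."""
    with open(path, "wb") as fh:
        for payload in serialized_examples:
            fh.write(struct.pack("q", len(payload)))
            fh.write(payload)

def read_bin(path):
    """Yield the raw records back from a file written by write_bin."""
    with open(path, "rb") as fh:
        while True:
            header = fh.read(8)
            if not header:
                break  # clean end of file
            (length,) = struct.unpack("q", header)
            yield fh.read(length)
```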
I will update this answer once I figure out how to fix ROUGE scoring for this.

What does a .sprite file refer to?

I'm using Liferay Portal 6. The .sprite file is not referenced in the source code. However, it is included in the URL with a slash-dot, and then it is blocked by a security program.
When I delete those files in theme/docroot/images and deploy the project, they are generated again.
How can I manage or rename those files?
You can open those files: they are combined images (look up "CSS sprite" for thorough documentation). They are used to limit the number of requests that go back to the server. Without sprites, every theme image would be loaded individually. With them, you only need to load the sprite once, which gives a significant performance boost. You want as few HTTP requests per page as possible, and sprites are one automatically handled way to achieve this.
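Here is why one sprite can replace many requests. The individual images are packed into a single file, and each icon is displayed by shifting that file with background-position. The sketch below illustrates the generic technique, not Liferay's own sprite generator. The class names and sprite.png are hypothetical. It stacks images vertically and emits one CSS rule per image.

```python
def sprite_css(images, sprite_url="sprite.png"):
    """Emit one CSS rule per image for a vertically stacked sprite.

    images: list of (class_name, width, height) tuples, in stacking order.
    Each image starts where the previous one ended, so its vertical
    background offset is minus the sum of the heights above it.
    """
    rules = []
    y = 0
    for name, width, height in images:
        rules.append(
            f".{name} {{ background: url({sprite_url}) 0 {-y}px no-repeat; "
            f"width: {width}px; height: {height}px; }}"
        )
        y += height
    return "\n".join(rules)
```

For example, three icons 16, 24, and 8 pixels tall get offsets 0px, -16px, and -40px, and the browser fetches sprite.png only once.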

Maximum number of images on folder

We are working on an image gallery where we expect 1 million to 40 million photos, and we are thinking of keeping them all in one photo folder.
But can one folder hold 40 million photos? If I put them directly inside the photo folder without creating any subfolders, is there an issue? Or do I have to create folders based on upload date, so that the photos uploaded on a given day go into that day's folder?
I have no problem creating that structure, but I want to understand what the problem is with keeping a few million photos directly in one folder. I have seen a few websites that do this. For example, on the page below, all images are under the image folder.
It has about 5 million images, each under its own ID, for example under 4132808. So the images directory apparently contains more than 5 million subfolders. Is it OK to keep that many folders under one directory?
http://www.listal.com/viewimage/4132808
http://iv1.lisimg.com/image/4132808/600full-the-hobbit%3A-an-unexpected-journey-photo.jpg
It depends on the filesystem. Check the "Comparison of file systems" page on Wikipedia.
However, you might want to sort the files into a structure like
images/[1st 2 chars of some hash]/[2nd 2 chars of hash]/...
This gives you an easily reproducible path and drastically decreases the number of files in any one folder.
You want this because whenever you (or any application) need to list the contents of the folder, a huge flat directory causes a serious performance problem.
What you see on other sites is only how they publish those images. They can of course be served from a seemingly flat URL, but in the underlying storage you want to partition the files somehow.
Some calculations:
Let's say you use the SHA-256 hash of the filename to build the path. That gives you 64 chars of [0-9a-f]. If you use 2-character subfolder names, you get 256 folders on each level. Now assume you do this for 3 levels: ab/cd/ef/1234...png. That's 256^3 folders, i.e. about 16.7 million, so you'll be fine even up to a couple of billion images.
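A minimal sketch of that path scheme in Python, assuming SHA-256 of the original file name, three levels of two hex characters, and the original name kept as the leaf. The function name and defaults are hypothetical. Keeping the scheme in one function like this also makes it easy to change the layout later.

```python
import hashlib
import os

def sharded_path(root, filename, levels=3, width=2):
    """Build root/ab/cd/ef/<filename> from the SHA-256 hex digest of filename."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()  # 64 hex chars
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)
```

With the defaults, there are 16^2 = 256 possible folder names per level and 256^3 = 16,777,216 leaf folders.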
As for serving the files, you can do something like this with Apache + mod_rewrite:
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/images/../../../.*
RewriteRule ^/images/(..)(..)(..)(.*)$ /images/$1/$2/$3/$4 [L]
This reroutes requests for the images to the correct place.
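To check what these two directives do, here is a Python re-implementation of the same patterns. It is only an illustration, since Apache evaluates the real rules. The RewriteCond skips URIs that are already split into three 2-character folders, which prevents a rewrite loop. The rule as written assumes server or virtual-host config, because in .htaccess context the leading slash is stripped before RewriteRule matches.

```python
import re

# Same patterns as the RewriteCond and RewriteRule above.
ALREADY_SHARDED = re.compile(r"^/images/../../../.*")
SHARD_RULE = re.compile(r"^/images/(..)(..)(..)(.*)$")

def rewrite(uri):
    """Return the URI Apache would serve under the rules above."""
    if ALREADY_SHARDED.match(uri):
        return uri  # condition fails: already in ab/cd/ef/ form
    m = SHARD_RULE.match(uri)
    if not m:
        return uri  # not an image request
    return "/images/{}/{}/{}/{}".format(*m.groups())
```

For example, `/images/abcdef1234.png` maps to `/images/ab/cd/ef/1234.png`, and that result is left alone on a second pass.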
See "How many files can I put in a directory?".
Don't put all your files into one folder; it does not scale. If you don't want to start with a deep folder hierarchy, start simple, but keep the logic that builds the folder path in one class or method. That makes it easy to rearrange things later if needed.

node.js read images from PDF

I need to use PDF in a way similar to ZIP/RAR: to hold many images (ancient Tibetan Buddhist literature), ideally 60,000 of them. Splitting them into 10-100 volumes is OK, though.
Anything can be used for packing, but unpacking has to be done in Node.js, because the same PDF file must be served on the web and some users will need the whole PDF.
So the question is: which Node module can I use to read any single arbitrary image from a huge PDF? An example would really help.
Every image is a single page (in other words, every page is a single image).
We have been using https://github.com/mirkokiefer/Node-Magick for this, but the PNGs we get out are sometimes fairly low quality.
