Hi, I need to read multiple .bib files (they are plain text) and, essentially, search through them.
I'm wondering if I need multiple Scanners, one for each file,
or whether I should use one Scanner and point it at each new file in turn.
Which is the better practice?
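To make the question concrete, here is the "one reader per file" pattern I have in mind, sketched in Python just to show the shape (the question is about java.util.Scanner; the file names and search term below are placeholders):

    # One reader per file: each file gets its own handle, which is opened,
    # searched, and closed before moving on to the next file.
    def search_bib_files(paths, term):
        hits = []
        for path in paths:                       # one reader per file
            with open(path, encoding="utf-8") as f:
                for line_no, line in enumerate(f, 1):
                    if term in line:
                        hits.append((path, line_no, line.rstrip()))
        return hits

    # placeholder file names and search term
    print(search_bib_files(["refs1.bib", "refs2.bib"], "author"))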
I am currently working on an Excel file to automate a number of actions. About a dozen users each work with their own copy of this file, and I cannot make it a shared file because each user has data that is unique to them and cannot be shared with the others.
The problem is that I make regular updates to my VBA code, and every time I do I have to manually update every user's file.
Is there a way to store my code in a file on the network or on GitHub, so that I could modify my macros in one place and each user's file could "read" the code from there?
I've heard about XLAM files, but I haven't been able to get anything functional with them. Is that solution viable?
An application works with entities that consist of images and texts. For the program, several JPGs, PNGs, TIFFs, text files or JSONs are treated as one whole work, but on disk they are a bunch of separate files. It's convenient for a user to be able to easily copy works, send them to other people, download them, upload them, etc. Since a work is actually several files, handling it is cumbersome for the user.
I could probably zip the required files, but that feels like a heavyweight workaround: I don't need compression, and the application's speed is important.
Maybe it's fine to just read all the data from each file and write it sequentially into one final file, structured as JSON or something else... but I suspect there is a turnkey solution for this.
My question:
How can I pack different files into one, so that the application can go through them quickly and copy, move, delete and edit them in a GUI application like any file manager does, and so that a person can manually copy works, send them to a friend, etc.?
I code in Python.
Thank you in advance.
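For reference, this is roughly what the zip route would look like without compression: a sketch using the standard zipfile module with ZIP_STORED, i.e. a plain container with no compression overhead (the file names below are made up):

    import zipfile
    from pathlib import Path

    # Sketch: bundle one "work" into a single container file using ZIP_STORED
    # (no compression), so packing and unpacking stay cheap. A user can copy or
    # send that one file like any other.
    def pack_work(work_path, members):
        with zipfile.ZipFile(work_path, "w", compression=zipfile.ZIP_STORED) as zf:
            for member in members:
                zf.write(member, arcname=Path(member).name)

    def read_member(work_path, name):
        with zipfile.ZipFile(work_path) as zf:
            return zf.read(name)   # bytes of one image/text, without unpacking the rest

    # placeholder file names
    pack_work("work_001.work", ["cover.jpg", "page1.tiff", "meta.json"])
    print(read_member("work_001.work", "meta.json"))

Is that the kind of thing you'd recommend, or is there a more standard container for this?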
I am making an application that will save information about certain files, and I'm wondering what the best way to keep track of the files is. I was thinking of using a file's absolute path, but that could change if the file is renamed. I found that if you run ls -i, each file has an ID beside it that appears to be unique. Is it OK to use that as a unique file ID?
The inode is unique per device, but I would not recommend using it: imagine your box crashes and you move all the files to a new file system; now all your files have new IDs.
It really depends on your language of choice, but almost all of them include a library for generating UUIDs. While collisions are theoretically possible, in practice they are a non-issue. Generate a UUID, prepend it to the front of your file, and you are in business. As your implementation grows, it will also let you build a hash-table index of your files for quick lookups later.
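For example, in Python (just a sketch: the file name is a placeholder, and real code would need to handle files that already carry an ID):

    import uuid

    # Generate a random UUID and prepend it to the file as a one-line header.
    file_id = str(uuid.uuid4())

    with open("notes.txt", "r+", encoding="utf-8") as f:   # placeholder file
        body = f.read()
        f.seek(0)
        f.write(file_id + "\n" + body)

    index = {file_id: "notes.txt"}   # hash-table index for quick lookups later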
The question is, "unique across what?"
If you need something unique on a given machine at a given point in time, then yes, the inode number + device number is nearly always unique; these can be obtained from stat() or similar in C, or os.stat() in Python. However, if you delete a file and create another, the inode number may be reused. Also, two different hosts may have a completely different idea of what the (device, inode) pairs are.
If you need something that describes the file's content (so two files with the same content get the same ID), look into one of the SHA or RIPEMD hash functions. These are effectively unique: the odds of an accidental collision are astronomically low.
If you need some other form of uniqueness, please elaborate.
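For example, in Python (a sketch; "some_file" is a placeholder):

    import hashlib
    import os

    path = "some_file"                       # placeholder

    # Unique on this machine, right now: (device number, inode number).
    st = os.stat(path)
    local_id = (st.st_dev, st.st_ino)

    # Unique by content: two files with identical bytes get the same digest.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    content_id = h.hexdigest()

    print(local_id, content_id)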
I am trying to build excerpts for each document returned as a search result on my website. I am using the Sphinx search engine and the Apache web server on Linux CentOS. The function within the Sphinx API that I'd like to use is called BuildExcerpts. This function requires you to pass an array of strings, where each string contains a document's contents.
I'm wondering what the best practice is for retrieving the document contents in real time as I serve the results on the web. Currently, these documents are in text files on my system, spread across multiple drives. There are roughly 100MM of them and they take up a few terabytes of space.
It's easy for me to call something like file_get_contents(), but that feels like the wrong way to do this. My databases are already gigantic (100 GB+) and I don't particularly want to throw the document contents in there along with the document attributes that already exist. Perhaps that is the best way to do this, however.
Suggestions?
Well, the source needs to be fetched from somewhere. If you don't want to duplicate it in your database, then you will need to fetch it from the filesystem (using file_get_contents() or similar).
The BuildExcerpts function does give you one extra option, though: "load_files".
With that set, Sphinx will read the data from the file names for you.
What problem are you experiencing with reading the documents from files? Is it too slow? If so, maybe put some caching in front of it, using memcache perhaps.
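Roughly like this with the bundled Python client (sphinxapi.py), going from memory of the API; the PHP client's BuildExcerpts takes the same arguments, and the host, index name, paths and option values here are placeholders:

    import sphinxapi

    # With 'load_files' set, the strings passed as "docs" are treated as file
    # names and Sphinx reads the contents from disk itself, so the documents
    # never have to pass through your web code or the database.
    cl = sphinxapi.SphinxClient()
    cl.SetServer("localhost", 9312)          # placeholder host/port

    docs = ["/data/docs/00/000123.txt", "/data/docs/00/000124.txt"]   # placeholders
    opts = {"load_files": True, "around": 5, "limit": 200}

    excerpts = cl.BuildExcerpts(docs, "documents_index", "search terms here", opts)
    print(excerpts)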
I've got a system in which users select a couple of options and receive an image based on those options. I'm trying to combine multiple generated images (corresponding to those options) into the requested picture. I'm trying to optimize this so that if an image already exists for a certain option (i.e. the file exists), there's no need to compute it and we move on to the next step.
Should I store these images in different folders, where each folder is an option name? Should I store them in the same folder, adding a prefix corresponding to the option to each image's name? Should I store the filenames in a database and check there? Which way is fastest for checking whether a file exists?
I'm using PHP on Linux, but I'm also interested if the answer varies if I change the programming language or the OS.
If you're going to be producing a lot of these images, it doesn't seem very scalable to keep them all in one flat directory. I would go with a hierarchy, which will make them a lot easier to manage.
Checking a database is always going to be quicker than checking whether a file exists, though, so if speed is the primary concern, use a hierarchical folder structure and keep all the filenames in a database.
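A sketch of what that combination could look like, in Python (the question is PHP, but the idea is the same; the base path, DB file and schema are made up):

    import os
    import sqlite3

    BASE = "/var/cache/generated"            # placeholder base directory

    # One sub-folder per option keeps the tree hierarchical instead of flat.
    def image_path(option, name):
        return os.path.join(BASE, option, name)

    # Record generated images in a small DB so the existence check is a single
    # indexed lookup rather than a filesystem hit.
    conn = sqlite3.connect("images.db")      # placeholder DB file
    conn.execute("CREATE TABLE IF NOT EXISTS images (path TEXT PRIMARY KEY)")

    def have_image(path):
        return conn.execute("SELECT 1 FROM images WHERE path = ?", (path,)).fetchone() is not None

    def register_image(path):
        conn.execute("INSERT OR IGNORE INTO images (path) VALUES (?)", (path,))
        conn.commit()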