How to remove a file (blob) from blobdir after the object is deleted from ZODB (Data.fs) - Pyramid

I have uploaded a file (a 4 MB PDF) to the server; it is stored in blobdir and referenced by an object of MYCLASS via its attachment attribute (in ZODB Data.fs). When I delete the MYCLASS object, the object is deleted, but the 4 MB PDF in blobdir is not. How can I delete that blob file after the object is deleted?

The file is part of a past ZODB revision. You'll need to pack your ZODB database to remove historical revisions.
How far back you pack the database is up to you; once you have removed old revisions, you can no longer roll the database back to those states.
How you pack the ZODB depends on your setup. If you are using ZEO, there is a command-line tool (zeopack) that instructs the ZEO server to pack the storage for you.
You can also do it programmatically from your Pyramid app, for example with the db.pack() method:
from pyramid_zodbconn import get_connection

# db() returns the ZODB database object behind the current connection.
db = get_connection(request).db()
# Pack the storage, keeping one week of history.
db.pack(days=7)
I used the days parameter to pack the ZODB while retaining history for the past week. You can also pass a timestamp t (UNIX seconds since the epoch) to specify the exact point in time to pack to, or omit both to remove all old revisions.
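For reference, a minimal sketch of those alternative forms, using the same Pyramid request wiring as above (the one-week cutoff is just an example):

import time
from pyramid_zodbconn import get_connection

db = get_connection(request).db()

# Same one-week cutoff, expressed as a UNIX timestamp instead of days:
db.pack(t=time.time() - 7 * 24 * 3600)

# Or, with no cutoff at all, discard every historical revision:
# db.pack()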
Once the revision that references the blob has been removed, the blob file is still not deleted immediately; whenever you pack, a backup is created in case you need to revert the operation. A subsequent pack replaces the previous backup with a new one, clearing the old blobs for good.

Related

Can AZCopy perform the copy starting from the most recent files and folders

Hi all, hope everyone is safe.
I'm migrating a Windows application that contains 1.8 million files and folders of images and other non-DB dependency files from our on-prem data center to an Azure VM.
The application can tolerate missing images (it displays an X in their place), and I plan to use AZCopy to copy these files to blob storage. However, I noticed that AZCopy has its own way of choosing which files and folders to start with. Is there a way to let AZCopy start with the most recent files and folders? If that is possible, I can do the cutover as soon as AZCopy has copied the last few days' worth of files instead of waiting for the whole copy to complete. So instead of a few days of downtime, it would be only a few hours.
Thanks
We have a way to filter for files modified after a point in time, but not before.
Thank you for this suggestion. We are going to add it to our backlog. Include-before shouldn't take too much work, since the foundation is already there.
As a workaround in the meantime (admittedly not the most elegant one), you could mark the files modified after that point in time with a Windows file attribute that:
isn't set on the other files, and
won't affect your workload,
and then filter by that attribute (see the sketch after the attribute list below).
Unfortunately, all of the file attributes we can filter by do something.
// Available attributes (SMB) include:
// R = Read-only files
// A = Files ready for archiving
// S = System files
// H = Hidden files
// C = Compressed files
// N = Normal files
// E = Encrypted files
// T = Temporary files
// O = Offline files
// I = Non-indexed files
// Reference for File Attribute Constants:
// https://learn.microsoft.com/en-us/windows/win32/fileio/file-attribute-constants
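To make that workaround concrete, here is a rough sketch; it is only an assumption about your setup, not a tested recipe. It marks files modified in the last few days with the non-indexed attribute ("I" in the list above, which is relatively harmless for most workloads, but verify for yours), then runs AZCopy filtered on that attribute. The source path, destination URL, and the exact behavior of --include-attributes should be checked against your AzCopy version.

import ctypes
import subprocess
import time
from pathlib import Path

SOURCE = r"D:\AppData\images"  # hypothetical source tree
DEST = "https://myaccount.blob.core.windows.net/images?<SAS>"  # hypothetical destination + SAS token
CUTOFF = time.time() - 3 * 24 * 3600  # e.g. files modified in the last 3 days

# FILE_ATTRIBUTE_NOT_CONTENT_INDEXED corresponds to the "I" attribute listed above.
FILE_ATTRIBUTE_NOT_CONTENT_INDEXED = 0x2000
kernel32 = ctypes.windll.kernel32

for path in Path(SOURCE).rglob("*"):
    if path.is_file() and path.stat().st_mtime >= CUTOFF:
        attrs = kernel32.GetFileAttributesW(str(path))
        if attrs != -1:  # -1 means the attributes could not be read
            kernel32.SetFileAttributesW(str(path), attrs | FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)

# First pass: copy only the marked (most recent) files; the rest can follow later.
subprocess.run(
    ["azcopy", "copy", SOURCE, DEST, "--recursive", "--include-attributes=I"],
    check=True,
)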

Linux API rename behavior when new refers to an existing file

While reading the documentation for rename on the page https://linux.die.net/man/3/rename, I found the following:
If the link named by the new argument exists, it shall be removed and old renamed to new. In this case, a link named new shall remain visible to other processes throughout the renaming operation and refer either to the file referred to by new or old before the operation began. Write access permission is required for both the directory containing old and the directory containing new.
How should I understand the phrase
refer either to the file referred to by new or old before the operation began
In this case a file with the same name that new points to already exists, so after the rename operation new should point to either the old file or the new one. But the document says "before the operation began", which confuses me.
How should I understand this? Could you give me an example?
What this phrase means is that, during rename, the old new is replaced with the new new atomically.
What this means is that there is no point during the rename operation where trying to access new will result in a file not found error. Every access will result in either the old or the new new being returned.
After the rename is done (assuming it finished successfully), of course the new new will be referenced under that name.
This highlights rename's usefulness for atomically replacing files. If you have a path containing some important file, and you need to update that file such that, no matter what happens, anyone who opens /var/lib/important at any time will get either the old or the new version, this is the sequence of operations you need:
Create an updated version of the file at the path /var/lib/important.new.
Flush and close /var/lib/important.new.
rename("/var/lib/important.new", "/var/lib/important");
Depending on your use case, flush (fsync) the directory /var/lib.
This guarantees that no matter what happens (process crash, power failure, kernel fault), either the old or the new file is available, complete and correct.
That last step (flushing the directory) is only necessary if you need to rely on the new version of the file being the one that is available. If you skip it, a power failure might cause the old file to reappear after a restart. Typical uses don't bother with this step.
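As a concrete illustration, here is a minimal Python sketch of that sequence; os.replace() calls rename(2) under the hood, and the path is the illustrative one from above.

import os

def atomic_write(path, data):
    # Step 1: write the updated contents to a sibling temporary path.
    tmp = path + ".new"
    with open(tmp, "wb") as f:
        f.write(data)
        # Step 2: flush the contents all the way to disk before renaming.
        f.flush()
        os.fsync(f.fileno())
    # Step 3: atomically replace the old file; readers always see a complete file.
    os.replace(tmp, path)
    # Step 4 (optional): fsync the containing directory so the rename itself
    # survives a power failure.
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

atomic_write("/var/lib/important", b"new contents")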

How does `aws s3 sync` determine if a file has been updated?

When I run the command in the terminal back to back, it doesn't sync the second time. Which is great! It shouldn't. But, if I run my build process and run aws s3 sync programmatically, back to back, it syncs all the files both times, as if my build process is changing something differently the second time.
Can't figure out what might be happening. Any ideas?
My build process is basically pug source/ --out static-site/ and stylus -c styles/ --out static-site/styles/
According to the docs - http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html -
s3 sync compares the size of the file and the last modified timestamp to see if a file needs to be synced.
In your case, I'd suspect the build process is producing files with newer timestamps even though the file sizes haven't changed.
AWS CLI sync:
A local file will require uploading if the size of the local file is different than the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.
--size-only (boolean) Makes the size of each key the only criteria used to decide whether to sync from source to destination.
You want the --size-only option, which looks only at the file size, not the last modified date. This is perfect for an asset build that changes the last modified dates frequently without changing the actual contents of the files (I ran into this with webpack builds, where things like fonts kept syncing even though the file contents were identical). If your build doesn't incorporate a hash of the contents into the filename, you could still run into problems (if the build emits a file of the same size but with different contents), so watch out for that.
I did manually test adding a new file that wasn't on the remote bucket and it is indeed added to the remote bucket with --size-only.
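For example, the invocation for the build output above would look something like this (the bucket name is hypothetical):
aws s3 sync static-site/ s3://my-static-site/ --size-only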
This question is a bit dated, but I'll contribute nonetheless for folks arriving here via Google.
I agree with the accepted answer. To add some context: AWS S3 sync behaves differently from typical Linux sync tools in a number of ways. On Linux, an md5 hash can be computed to determine whether a file has changed. S3 sync does not do this, so it can only decide based on size and/or timestamp. What's worse, AWS does not preserve timestamps when transferring in either direction, so the timestamp is ignored when syncing to local and only used when syncing to S3.

Backups in Brightway: how to use them

I am going to make some modifications to methods and the biosphere3 database. As I might break things (I have before), I would like to create backups.
Thankfully, there exist backup() methods for just this. For example:
from brightway2 import Database

myBiosphere = Database('biosphere3')
myBiosphere.backup()
According to the docs, this "Write[s] a backup version of the data to the backups directory." Doing so indeed creates a backup, and the location of this backup is conveniently returned when calling backup().
What I wish to do is to load this backup and replace the database I have broken, if need be. The docs seem to stay silent on this, though the docs on serialize say "filepath (str, optional): Provide an alternate filepath (e.g. for backup)."
How can one restore a database from a saved version?
As a bonus question: how is increment_version(database, number=None) called, and how can one use it to help with database management?
The code to backup is quite simple:
def backup(self):
    """Save a backup to ``backups`` folder.

    Returns:
        File path of backup.
    """
    from bw2io import BW2Package
    return BW2Package.export_obj(self)
So you would restore it the same way as any BW2Package:
from brightway2 import *
BW2Package.import_file(filepath)
However, I would recommend using backup_project_directory(project) and restore_project_directory(filepath) instead, as they don't go through an (older) intermediate format.
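For illustration, a minimal sketch of that route, assuming a recent bw2io; the project name and archive path here are hypothetical, and the exact archive filename and location depend on your bw2io version:

from bw2io import backup_project_directory, restore_project_directory

# Snapshot the whole project directory to a compressed archive.
backup_project_directory("my_lca_project")

# Later, if something breaks, restore from the archive that was written
# (check the path backup_project_directory reported; this one is illustrative).
restore_project_directory("/home/me/brightway2-project-my_lca_project-backup.tar.gz")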
increment_version is only for the single file database backend, and is invoked automatically every time the database is saved. You could add versioning to the sqlite database backend, but this is non-trivial.

MKS Integrity: Getting content of archive (dropped member)

I know that I can use the CLI command si viewrevision to get the content of a versioned file. The downside is that this only works if the member has not been dropped.
Does anyone know a way (other than addfromarchive) to get the content when you know the archive?
I don't believe this is possible
si projectco is documented as "checks out members of a project into working files". If you drop the member from the project, it is no longer part of the project.
At first blush, si viewrevision doesn't explicitly state in its documentation that it requires a project, but if you try to run the command without a project (or a sandbox, which implies a project), you will be prompted for one. Failing to provide one at the prompt exits the command with the message 'A value for "--project" is required.' I tried this while specifying a change package ID that the member was part of, and that still doesn't work.
Your si addfromarchive option is the only published way to do this.
Disclosure: I am a PTC employee.
Why not use add from archive?
You could also use a temporary server location (S:/Server/prj_tmp/project.pj) as the destination, and the member will stay dropped in the original project.
(OK, OK, someone could create a sandbox from S:/Server/prj_tmp/project.pj and generate new versions in the archive of the dropped member, add/delete labels ...)
There might be another possibility if your project has a checkpoint where the file was not yet dropped.
Just create a build sandbox with the project revision of that checkpoint and then:
C:\BuildSandboxes\prjA\src> si viewrevision ..... :)
You might also use something like
C:\Sandboxes\prjA\src> si configuresubproject --subprojectRevision=1.2 --type=build project.pj
view your revision and then go back with
C:\Sandboxes\prjA\src> si configuresubproject --type=default project.pj
but this might affect users who are currently working on the project (e.g. they would not be able to check in while the subproject is configured as a build).
