I find the Jest Snapshot Summary a bit confusing. After running tests in one of our repositories, I get the following Summary:
Snapshot Summary
› 2 snapshots written in 1 test suite.
› 50 obsolete snapshot files found, re-run with `-u` to remove them.
› 3 obsolete snapshots found, re-run with `-u` to remove them.
Snapshot testing means we compare the current tests' output against the output before our changes, to catch side effects.
Hence, if I get it right, the summary means
2 tests are new, no snapshots were available to compare against
50 tests still provide the same output as before
3 tests have been removed, but the snapshots are still around
So running with -u would
Update the time stamp for 50 snapshots, but not change their contents
Delete the files for 3 snapshots that are useless
Is that understanding correct?
It's been a while since I posted this question and by now I can answer it myself:
"Obsolete" refers to snapshots or snapshot files for which no .toMatchSnapshot() exists any more.
Snapshots are organised in one file per test suite. Individual snapshots in those files are stored under the name of their test, as given in Jest's it() function. If you rename a test, the old snapshot is still in the snapshot file, but it is recognised as "obsolete" (e.g. rename it('renders header') to it('renders the header') and the entry stored under the old name becomes obsolete).
› 2 snapshots written in 1 test suite.
⇒ 2 tests are new, no snapshots were available to compare against
This one holds true.
› 50 obsolete snapshot files found
⇒ 50 tests still provide the same output as before
This is wrong: the 50 corresponding test suites have been renamed, moved or removed. Such a high number is unusual, and you should probably find a way to re-map the snapshots to their tests before updating them.
› 3 obsolete snapshots found
⇒ 3 tests have been removed, but the snapshots are still around
So this is only partly right, since the tests might have been renamed, not removed.
I have a directory in S3 containing millions of small files. They are small (<10 MB) and gzip-compressed, and I know this is inefficient for Spark. I am running a simple batch job to convert these files to Parquet format. I've tried two different ways:
spark.read.csv("s3://input_bucket_name/data/")
as well as
spark.read.csv("file1", "file2"..."file8million")
where each file given in the list is located in the same bucket and subfolder.
I notice that when I feed in a whole directory, there isn't as much delay at the beginning for the driver indexing files (it looks like around 20 minutes before the batch starts). In the UI for the single directory, there is 1 task after these 20 minutes, which looks like the conversion itself.
However, with individual filenames, this indexing time increases to 2+ hours, and my conversion job doesn't show up in the UI until then. For the list of files, there are 2 tasks: (1) the first one is listing leaf files for the 8M files, and then (2) a job that looks like the conversion itself.
I'm trying to understand why this is the case. Is there anything different about the underlying read API that would lead to this behaviour?
Spark assumes every path passed in is a directory,
so when given a list of paths, it has to do a list call on each,
which for S3 means: 8M LIST calls against the S3 servers,
which are rate limited to about 3K/second, ignoring details like thread count on the client, HTTP connections, etc.
And with LIST billed at $0.005 per 1,000 calls, 8M requests come to about $40.
Oh, and as the LIST returns nothing, the client falls back to a HEAD, which adds another S3 API call, doubling execution time and adding another $32 to the query cost.
In contrast, listing a directory with 8M entries kicks off a single LIST request for the first 1K entries,
and then 7,999 follow-ups.
S3A releases do async prefetch of the next page of results (faster, especially if the incremental list iterators are used): one thread to fetch, one to process, and it will cost you about 4 cents.
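The S3A connector does that paginated listing for you; purely as an illustration of the mechanics (AWS SDK for Java v2; the bucket and prefix are taken from the question, the class name is made up), it looks roughly like this, where the paginator follows continuation tokens so 8M objects means on the order of 8,000 LIST calls rather than 8M:

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Response;

public class ListBigPrefix {
    public static void main(String[] args) {
        // Default region/credentials provider chain
        try (S3Client s3 = S3Client.create()) {
            ListObjectsV2Request request = ListObjectsV2Request.builder()
                    .bucket("input_bucket_name")   // from the question
                    .prefix("data/")
                    .maxKeys(1000)                 // S3 returns at most 1,000 keys per LIST page
                    .build();

            // One LIST call per page, continuation tokens handled by the paginator:
            // roughly 8,000 sequential calls for 8M objects instead of 8M calls
            long count = 0;
            for (ListObjectsV2Response page : s3.listObjectsV2Paginator(request)) {
                count += page.contents().size();
            }
            System.out.println("Objects listed: " + count);
        }
    }
}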
The big directory listing is the more efficient and cost-effective strategy, even ignoring EC2 server costs.
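In practice that means handing Spark the prefix and letting it do one listing. A minimal sketch of the conversion job under that assumption (the output bucket/path and app name are made up; the s3a:// scheme assumes the open-source Hadoop S3A connector):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class GzToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("gz-to-parquet")
                .getOrCreate();

        // One listing of the whole prefix instead of one LIST (+ HEAD) per file;
        // the .gz files are decompressed transparently by the CSV reader
        Dataset<Row> df = spark.read().csv("s3a://input_bucket_name/data/");

        // Hypothetical output location
        df.write().mode(SaveMode.Overwrite).parquet("s3a://output_bucket_name/data-parquet/");

        spark.stop();
    }
}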
I have a small confusion about the transaction log of Delta Lake. In the documentation it is mentioned that by default the retention policy is 30 days and can be modified by the property delta.logRetentionDuration (an interval string).
But I don't understand when the actual log files are deleted from the _delta_log folder. Is it when we run some operation? Maybe the VACUUM operation? However, it is mentioned that the VACUUM operation only deletes data files and not logs. But will it delete logs older than the specified log retention duration?
Reference: https://docs.databricks.com/delta/delta-batch.html#data-retention
delta-io/delta PROTOCOL.md:
By default, the reference implementation creates a checkpoint every 10 commits.
There is an async process that runs on every 10th commit to the _delta_log folder. It will create a checkpoint file and will clean up the .crc and .json files that are older than delta.logRetentionDuration.
Checkpoints.scala has checkpoint > checkpointAndCleanupDeltaLog > doLogCleanup. MetadataCleanup.scala has doLogCleanup > cleanUpExpiredLogs.
The value of the option is an interval literal. There is no way to specify a literal infinity, and months and years are not allowed for this particular option (for a reason). However, nothing stops you from saying interval 1000000000 weeks; 19 million years is effectively infinite.
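As a small, hedged illustration (the table name events is made up; only the interval literal matters), the property can be set through Spark SQL:

import org.apache.spark.sql.SparkSession;

public class SetDeltaLogRetention {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("set-delta-log-retention")
                .getOrCreate();

        // Weeks are allowed, months/years are not; a huge number of weeks
        // is effectively "keep the log forever"
        spark.sql("ALTER TABLE events SET TBLPROPERTIES ("
                + "'delta.logRetentionDuration' = 'interval 1000000000 weeks')");

        spark.stop();
    }
}

Changing the property does not delete anything by itself; the expired .json and .crc files are only removed when the checkpoint/cleanup process described above next runs.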
If I go to the Test results screen after a run of my pipeline, it shows each test case from my Java/Maven/TestNG automated test project duplicated. One instance of each test case has a blank machine name and its duplicate shows a machine name.
Run 1000122 - JUnit_TestResults_3662
There are several possibilities. First, if you added multiple configurations to a test plan, the test cases will be repeated in the plan with each of the configurations you have assigned.
Another possibility is that you passed parameters to the test method: with multiple parameter sets, the test method is executed once per set and therefore shows up more than once in the results.
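As a rough sketch of that second case (the class, method and data below are made up), a TestNG data provider with two rows makes the same test method run, and therefore be reported, twice:

import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class LoginTests {

    // Two parameter rows -> TestNG runs loginSucceeds once per row,
    // so it appears twice in the published test results
    @DataProvider(name = "browsers")
    public Object[][] browsers() {
        return new Object[][] { { "chrome" }, { "edge" } };
    }

    @Test(dataProvider = "browsers")
    public void loginSucceeds(String browser) {
        Assert.assertNotNull(browser);
    }
}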
The information you provided is not sufficient to tell which of these applies. Can you share the code or screenshots of your test samples?
I'm trying to use Service Fabric backups with Actors:
var backupDescription = new BackupDescription(BackupOption.Full, BackupCallbackAsync);
await BackupAsync(backupDescription, TimeSpan.FromHours(1), cancellationToken);
But I've noticed that one backup may contain several files like:
edb0000036A.log 5120 KB
edb0000036B.log 5120 KB
edb00000366.log 5120 KB
...
I haven't found any info about these files, but it seems that they are just logs and I may not need to include them. Am I right, or must these files be included in the backup?
These files are quite heavy, so I'm trying to reduce the size of the backups.
UPDATE 1:
I have tried to use incremental backup, but it seems that Actors do not support incremental backup, as I have read on MSDN. Moreover, I tested it and got the exception "Invalid backup option. Parameter name: option".
Instead of doing full backups every hour, you can also use incremental backups, which will result in a smaller size (for example, do a full backup every day and incrementals every hour).
The log files are transaction logs; they are not optional for restore. More info here.
I promoted a new JCL member 'TTTTS360' to TST (promotion level 1). I noticed that, as this is new JCL, the member was created in TTTTTST.E998.JCL(TTTTS360), and similarly an entry was created in the parameter lib 'COMPTST.AAAA.PARMLIB(QEEEEAU)'.
Now, once I demote my package to level 0, i.e. development, I still see 'TTTTTST.E998.JCL(TTTTS360)' and 'COMPTST.AAAA.PARMLIB(QEEEEAU)'. Shouldn't they be removed? I was expecting them to be removed altogether.
I see the following steps in the ChangeMan job:
SYSPRINT DEL1CTC
SYSPRINT DEL1JCL
DEL1CTC CHANGEMAN STEP
DELETE QEEEEAU
QEEEEAU WAS DELETED FROM TARGET DATA SET
DEL1JCL CHANGEMAN STEP
DELETE TTTTS360
TTTTS360 WAS DELETED FROM TARGET DATA SET
ChangeMan has the concept of "staging libraries" and "promotion libraries." The former are sometimes referred to as "package datasets" because they are part of your ChangeMan package.
When you promote your package, typically the members from your staging libraries are copied to the corresponding target promotion libraries. When you demote your package, the members that were promoted are deleted from the target promotion libraries.
Your staging libraries aren't cleaned up until after install and baseline have completed as part of your request to install in your production environment. The cleanup may be days or weeks afterward, as a backout requires the staging libraries to be present.
Having said all that, ChangeMan is very configurable as Bruce Martin indicated in his comments. Talk to your ChangeMan Administrator(s) about what behavior you should expect to see.