How can I find out when a dataset was allocated? - mainframe

I need to get the exact time when a dataset was allocated (not opened). I've searched all the SMF records but can't find anything suitable. Any ideas? It was allocated via SVC99 from a vendor STC. I'm pretty sure it's allocated and then OPENed, as the first SMF record referencing the dataset is DISP=OLD.
The dataset is not SMS managed.

I think you are out of luck.
To my knowledge, creation time is not an attribute of any dataset, so looking through the SMF records probably won't help. You might be able to get the time of the first action performed after allocation, but that doesn't sound like it will help you. I found a thread on the IBM Mainframe forum that basically confirms my thinking.
It says:
The only way to get the creation time of a data set is through analyzing SMF data for the date the data set was created; the creation time is not an attribute of the data set.
I'm not even certain you could get the time that way, as no SMF data is written when the data set is created. The best you can do is the time the first I/O to the data set completes and a type 15 SMF record is written.
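If that first-I/O time is good enough, one rough way to dig it out of dumped SMF data: below is a minimal sketch, assuming the SMF records have already been extracted to CSV by a site tool -- the file name and the record_type/dsname/timestamp column names are made-up placeholders, not a real SMF layout.

    # Hedged sketch: find the earliest type 15 record (written at close of a
    # data set opened for output) for a given dataset in a CSV extract of SMF
    # data. File and column names are assumptions; substitute whatever your
    # SMF extraction tool actually produces.
    import csv
    from datetime import datetime

    TARGET_DSN = "VENDOR.STC.DATASET"  # hypothetical dataset name

    earliest = None
    with open("smf_records.csv", newline="") as f:
        for row in csv.DictReader(f):
            if row["record_type"] == "15" and row["dsname"] == TARGET_DSN:
                ts = datetime.fromisoformat(row["timestamp"])
                if earliest is None or ts < earliest:
                    earliest = ts

    print("Earliest type 15 record for", TARGET_DSN, "->", earliest)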

Related

Information Link Caching

I'm working through a problem that I can't seem to find a solution for. I'm attempting to speed up the load time for a report. The idea is to open the report on the Analyst Client, and I've identified one information link that bogs down the load time. Easy enough, I figured I'd cache the information link.
I reloaded the report expecting only the first load to take a while; however, it reloads all the data every time. The amount is less than 2 GB, so that can't be the problem. The only other issue I can think of is that I'm filtering the data in the WHERE clause of my SQL statement. Can you all think of anything I'm missing?

In Microsoft Dataverse, how can I see real-time capacity or data usage? I only see daily values

We are moving to an online D365 environment and we're trying to determine how much data we're using in the Dataverse tables. Under Capacity I can go into the environment details and see how much space each table uses, but it's a daily value. We're trying to remove some data as we're reaching some of our capacity limits -- but I can't find where it shows how much data is being used per table in real time. Thanks for any advice on how to pull this.
This article does mention how to get to the capacity limits and usage, but all values appear to be updated only daily:
https://learn.microsoft.com/en-us/power-platform/admin/capacity-storage?source=docs
I'm trying to find some way to see the data used in real time.
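In case near-real-time row counts per table would be an acceptable proxy (they track record growth, not byte sizes), one option is a FetchXML count aggregate against the Dataverse Web API. Below is a minimal sketch in Python -- the org URL and bearer token are placeholders you must supply, and aggregate queries are capped (50,000 rows by default), so treat this as a rough workaround rather than the storage report itself.

    # Hedged sketch: near-real-time record counts per table via a FetchXML
    # count aggregate on the Dataverse Web API. Returns row counts, not
    # storage bytes; aggregate queries are capped at 50,000 rows by default.
    import urllib.parse
    import requests

    ORG_URL = "https://yourorg.crm.dynamics.com"  # placeholder org URL
    ACCESS_TOKEN = "..."                          # placeholder OAuth bearer token

    def row_count(entity_set, entity_name, id_attr):
        fetch = (
            f'<fetch aggregate="true">'
            f'<entity name="{entity_name}">'
            f'<attribute name="{id_attr}" alias="n" aggregate="count"/>'
            f'</entity></fetch>'
        )
        resp = requests.get(
            f"{ORG_URL}/api/data/v9.2/{entity_set}?fetchXml="
            + urllib.parse.quote(fetch),
            headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        )
        resp.raise_for_status()
        return resp.json()["value"][0]["n"]

    print("accounts:", row_count("accounts", "account", "accountid"))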

Usage data not showing after importing data

After performing two successful imports (one for users and one for follow relationships), the usage data view has not updated with the expected values. Does this mean my records were not created by the import?
I was expecting around 50k user records with as many follow relationships.
Currently, I'm creating user records with only their ID set. When I do this via the get_or_create API I can see the usage update in real time. However, doing this via an import appears to have had no effect, and the same goes for follow relationships.
I've noticed the docs state "An array of users in which each line can have up to 10,000 entries" -- does that mean I'm limited to 10k users per instruction?
Some details about what the issue was: automatic import has a configuration option that disables tracking statistics for the dashboard. The option exists to support very large imports. However, automatic imports are limited to 300 MB by default, which is a manageable amount, so the option doesn't actually need to be disabled for them.
That's why the running configuration was wrong; statistics tracking is enabled now.
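On the 10,000-entries question: if each instruction in the import file can hold at most 10,000 users, the straightforward workaround for ~50k users is to split them across several instructions. A minimal sketch of the chunking (the surrounding instruction layout shown here is an assumption, not Stream's documented import schema):

    # Split a large user list into chunks of at most 10,000 entries so each
    # import instruction stays under the documented limit. The wrapping
    # {"users": batch} layout is illustrative, not Stream's actual schema.
    def chunks(items, size=10_000):
        for i in range(0, len(items), size):
            yield items[i:i + size]

    users = [{"id": str(n)} for n in range(50_000)]  # placeholder users
    instructions = [{"users": batch} for batch in chunks(users)]
    print(len(instructions), "instructions of up to 10k users each")  # 5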

How does Spark Structured Streaming calculate the watermark?

However, to run this query for days, it’s necessary for the system to bound the amount of intermediate in-memory state it accumulates. This means the system needs to know when an old aggregate can be dropped from the in-memory state because the application is not going to receive late data for that aggregate any more. To enable this, in Spark 2.1, we have introduced watermarking, which lets the engine automatically track the current event time in the data and attempt to clean up old state accordingly. You can define the watermark of a query by specifying the event time column and the threshold on how late the data is expected to be in terms of event time. For a specific window starting at time T, the engine will maintain state and allow late data to update the state until (max event time seen by the engine - late threshold > T). In other words, late data within the threshold will be aggregated, but data later than the threshold will start getting dropped (see later in the section for the exact guarantees). Let’s understand this with an example. We can easily define watermarking on the previous example using withWatermark() as shown below.
The above is copied from http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html.
According to the description above ("For a specific window starting at time T"), T is the starting time of a given window.
I think the document is wrong; it should be the ending time of a given window.
I confirmed this by investigating the Spark code: T is the ending time of the window. For example, with a 10-minute late threshold, the state for the window [12:00, 12:10) is kept until the watermark (max event time seen minus 10 minutes) passes 12:10, the window's end -- not 12:00.
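For reference, this is roughly the guide's withWatermark() example in PySpark. It's a sketch: the guide assumes a streaming DataFrame `words` with `timestamp` and `word` columns, and the rate source used below is only a placeholder so the snippet runs standalone.

    # Sketch of the guide's watermarked windowed count in PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import window

    spark = SparkSession.builder.appName("watermark-demo").getOrCreate()

    # Placeholder streaming source; substitute your real one. It yields a
    # `timestamp` column plus a fake `word` column derived from the counter.
    words = (
        spark.readStream.format("rate").load()
        .selectExpr("timestamp", "cast(value % 10 as string) as word")
    )

    windowed_counts = (
        words
        .withWatermark("timestamp", "10 minutes")  # late threshold
        .groupBy(window(words.timestamp, "10 minutes", "5 minutes"), words.word)
        .count()
    )
    # With T read as the window END time: the state for a window is dropped
    # once (max event time seen by the engine) - (10 minutes) > T.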

Cassandra count use case

I'm trying to figure out an appropriate use case for Cassandra's counter functionality. I thought of a situation and I was wondering if it would be feasible. I'm not quite sure because I'm still experimenting with Cassandra, so any advice would be appreciated.
Let's say you had a small video service: you record a log of views in Cassandra, capturing which video was played, which user played it, the country, the referrer, etc. You obviously want to show a count of how many times each video was played -- would incrementing a counter every time you insert a play event be a good solution for this, or would there be a better alternative? Counting all the events on every read would take a pretty big performance hit, and even if you cached the results, the cache would be invalidated pretty quickly on a busy site.
Any advice would be appreciated!
Counters can be used for whatever you need to count within an application -- both "frontend" data and "backend" data. I personally use them to store user-behaviour information (for backend analysis) and frontend ratings (each operation a user performs on my platform gives the user some points). There is no real limitation on use cases -- the limits come from a few technical restrictions, the biggest that come to mind being:
a counter column family can contain only counter columns (apart from the PK, obviously)
counters can't be reset: to set a counter to 0 you have to read it and calculate the delta before writing (with no guarantee that someone else hasn't updated it in the meantime)
no TTL and no indexing/deletion
As far as your video service goes, it all depends on how you choose to model the data -- if you find a valid model that hits only a few partitions on each write/read and you have a good key distribution, I don't see any real problem implementing it.
BTW: you tagged Cassandra 2.0, but if you have to use counters you should consider 2.1, for the reasons described here
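For the video-plays case, here is a minimal sketch of what the counter table and the per-play increment could look like, using the Python cassandra-driver. The keyspace, table, and column names are made up for illustration, and the detailed play log (video, user, country, referrer) would live in a separate, non-counter table.

    # Hedged sketch: a counter table keyed by video, incremented once per
    # play event. All names are illustrative.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS video_service
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    # In a counter table, every non-PK column must be a counter.
    session.execute("""
        CREATE TABLE IF NOT EXISTS video_service.video_plays (
            video_id text PRIMARY KEY,
            plays counter
        )
    """)

    def record_play(video_id):
        # One cheap increment per play event, instead of counting rows on read.
        session.execute(
            "UPDATE video_service.video_plays SET plays = plays + 1 "
            "WHERE video_id = %s",
            (video_id,),
        )

    record_play("intro-video")
    row = session.execute(
        "SELECT plays FROM video_service.video_plays WHERE video_id = %s",
        ("intro-video",),
    ).one()
    print("plays:", row.plays)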
