How to determine the source of a job submission - cron

I am searching for a way to determine how a job was initiated on the HPCC cluster.
There are several ways to submit a job. For example:
1- a manual submission via the ECL IDE / ECL Watch
2- an external cron submission
3- an ECL submission of dynamically built code
4- if a file lands in a directory, it triggers a submission
etc.
I can retrieve some important information by executing STD.System.Workunit.WorkunitList, but I cannot find any function that would give me an attribute indicating the source of that submission.
HPCC is a data-centric platform and ECL reflects that approach, so I am attempting to build a matrix that defines the code in relation to that data. A product is technically a collection of data (files) that is the result of source input -> scrub and transformation processes -> final base files. Those files are then prepped / indexed for external use:
1- Roxie queries
2- PowerBI
3- webpage
4- reports ftp'd or emailed
etc.
I want to build a matrix that defines, by product, the initiating job(s), where they were initiated, any schedule (?), and the associated input/output files (flagging whether they are source/intermediate/base/output). I am trying to design this so that the matrix can be dynamically built, because as we all know:
(1) nowhere does this type of documentation exist, so that if someone new comes in to work on a product they can see the scope and life cycle of the data,
(2) nobody likes to document, and
(3) the second any manual documentation is actually created and saved, it is out of sync with reality.
So far, the design is a collection of files (defined by the level of detail) which would then be JOINed together to yield the final matrix. I am not sure if this would end up as a PowerBI report or a webpage... still tossing that around. Either way, this might prove useful for anyone using HPCC who wants a 30,000 ft view of their product.
I have attempted to programmatically scan a WUID output, looking for the necessary attributes, but I have had little success.
I appreciate any assistance / comments.

No matter what component submits ECL to execute on the platform, they all ultimately end up going through the same WsWorkunits API, which is the public SOAP / REST interface.
While some client applications will leave a fingerprint so you can deduce where it came from, it is not a foolproof mechanism...
For example: in http://play.hpccsystems.com:8010/esp/files/index.html#/workunits/W20221115-075604/xml you can see that the ECL IDE appends some metadata to the Workunit (it stores the IDE version number in the "Application" section).
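As a rough illustration, those Application values can also be pulled programmatically through WsWorkunits rather than by reading the WUID page by hand. A minimal Python sketch follows; the host and WUID are the ones from the example above, but the IncludeApplicationValues flag and the exact JSON shape are assumptions to verify against your ESP version:

import requests  # assumes the requests package is available

ESP = "http://play.hpccsystems.com:8010"   # ESP host from the example above
WUID = "W20221115-075604"

# WUInfo returns the workunit details; the Application values (where the
# ECL IDE leaves its version fingerprint) should appear under
# ApplicationValues when the flag below is set.
resp = requests.get(
    f"{ESP}/WsWorkunits/WUInfo.json",
    params={"Wuid": WUID, "IncludeApplicationValues": 1},
    # auth=("user", "password"),  # if your ESP requires authentication
)
resp.raise_for_status()
wu = resp.json()["WUInfoResponse"]["Workunit"]
for av in wu.get("ApplicationValues", {}).get("ApplicationValue", []):
    print(av.get("Application"), av.get("Name"), av.get("Value"))

A workunit submitted by a client that does not write any Application values would simply print nothing here, which is exactly the "not foolproof" caveat above.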

Automatically update a data source (Excel doc, from ServiceNow) in PowerApps

The question here is: Am I on the right path (this is the first time I'm trying this), and if not, what would be smarter to try? If this is the right path, can you offer suggestions on how to do this best, because if this works, I am going to use it often on a lot of different tasks in this app.
I'm running a PowerApps Canvas app. As part of its program, I want it to be able to reference (read-only) a collection of data. That data is in ServiceNow, and my group is not permitted to access ServiceNow using the API.
During testing of the app, I just had it reference a SharePoint list (which I had filled with some dummy data), but I can re-code those lines as needed to pull from some other data source.
Because I am touching a few different systems here, I am not sure if this is the right way to go and I'm afraid I'll spend too long trying and find out that it would never have worked because of x. Thus my question.
This is what I think will work. Am I headed in the right direction?
1- Set up the scheduled report in ServiceNow. (Done!)
2- Program ServiceNow to email the Excel file output, making sure it always has the same title. (Done!)
3- Build a Power Automate flow to capture that email and save the attached file to a location (OneDrive?) that can be accessed by the app. If there is a file there already, delete it first.
4- Add the Excel file as a data source to the app, and start referencing it as needed.
5- 8-12 hours later, ServiceNow pushes out another scheduled data drop, and the whole thing updates again.
In my perfect world, this system would work completely unattended.
Offhand, one glitch I can see is that ServiceNow generates an Excel file whose data is not formatted as a table, and I think PowerApps can only ingest an Excel file as a data source if the data is in a table. But (shrug) I might be wrong.
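If that does turn out to be the blocker, one workaround might be a small scripted step that wraps the dumped worksheet in a named Excel table before the app reads it. A minimal sketch in Python, assuming openpyxl is available, the data starts at A1 with a header row, and the file and table names shown are made up; where such a step would actually run (a desktop flow, an Azure Function, etc.) is a separate question:

from openpyxl import load_workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
from openpyxl.utils import get_column_letter

wb = load_workbook("servicenow_export.xlsx")   # hypothetical file name
ws = wb.active

# Wrap the used range (header row included) in a named table, which is
# the shape PowerApps expects when an Excel file is used as a data source.
last_col = get_column_letter(ws.max_column)
table = Table(displayName="ServiceNowData", ref=f"A1:{last_col}{ws.max_row}")
table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium9", showRowStripes=True)
ws.add_table(table)
wb.save("servicenow_export.xlsx")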
Am I thinking of this correctly? Is this the best avenue to follow?

Inferring the user intention from the event stream in an event store. Is this even a correct thing to do?

We are using an event store that stores a single aggregate - a user's order (imagine an Amazon order that can be updated at any moment, by either the client or someone in the e-commerce company, before it actually gets dispatched).
For the first time we're going to allow our company's employees to see the order's history, as until now they could only see its current state.
We are now realizing that the events that form the aggregate root don't really show the intent or what the user actually did. They only serve to build the current state of the order when applied sequentially to an empty order. The question is: should they?
Imagine a user that initially had one copy of book X, then removed it and added 2 again. Should we consider this as one event, "User added 1 book", or as the events "User removed 1 book" + "User added 2 books" (we seem to have followed this approach)?
In some cases we have one initial event that is then followed by other events. I, the developer, know for sure that all these events were triggered by a single command, but it seems incredibly brittle to rely on that kind of assumption when generating this "order history" view on the fly for the user. Yet if I don't treat them as a single action, at least in the order history feature, it will look like there were lots of order amendments when in fact there was just one big one.
Should I have "macro" events that contain "micro events" inside? Should I just attach the command's id to the event so I can then easily inferr what event happened at the same and which ones not (an alternative would be relying on timestamps.. but that's disgusting).
What's the standard approach to dealing with this kind of situation? I would like to be able to look at the aggregate's history at any time and generate this report (I don't want to build the report incrementally every time the order is updated).
Thanks
Command names should ideally be descriptive of intent, which should mean it's possible to create event names that make the original intent clear. As a rule of thumb, the events in the event stream should be understandable to the relevant members of the business; they should contain things like 'cartUpdated' etc.
Given the above, I would have expected that showing the event stream should be fine. But I totally get why it may not be ideal in some circumstances, i.e. it may be too detailed, in which case you could create a 'summariser' read model fed by the events.
It is common to include the command's ID in the resulting events' metadata, along with an optional correlation ID (useful for long-running processes). This then makes it easier to build the order history projection. Alternatively, you could just use the event timestamps to correlate batches in whatever way you want (perhaps you might only want one entry even for multiple commands, if they happened within a short window).
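As a rough sketch of that idea (the event type names and metadata fields below are illustrative, not from any particular framework), grouping the stream by the command ID carried in each event's metadata yields one history entry per user action:

from dataclasses import dataclass
from itertools import groupby

@dataclass
class Event:
    type: str                 # e.g. "BookRemoved", "BookAdded"
    data: dict
    command_id: str           # metadata: the command that caused this event
    correlation_id: str = ""  # metadata: optional, for long-running processes

def order_history(events):
    """Fold micro events into one history entry per originating command."""
    history = []
    # Events are assumed to be in commit order, so adjacent events sharing
    # a command_id belong to the same user action.
    for command_id, batch in groupby(events, key=lambda e: e.command_id):
        history.append({
            "command_id": command_id,
            "actions": [(e.type, e.data) for e in batch],
        })
    return history

stream = [
    Event("BookRemoved", {"book": "X", "qty": 1}, command_id="cmd-1"),
    Event("BookAdded",   {"book": "X", "qty": 2}, command_id="cmd-1"),
]
print(order_history(stream))   # one entry, even though two events were stored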
Events (past tense) do not always capture human or system user intent; commands (imperative mood) do. Since command data cannot always be easily reconstructed from the events it generated, keeping a structured log of commands looks like a good option here.

How to ease the updating inferno with web performance test scripts

Updating a performance test script, e.g. with LoadRunner, can take a lot of time and be quite frustrating. If there have been updates to the application, you usually have to run the script, find out what has to be changed, update it, run it again, and so on. Does anyone have concrete best practices for easing this updating inferno? One obvious thing is good communication with developers.
It depends on the kind of update. If the update is dramatic, like adding new fields for the user to fill in, then someone has to manually touch up the test scripts.
If, however, the update is minor, for example, some changes to the hidden fields or changes to the internal names of user-facing fields, then it's possible to write a script that checks the change and automatically updates the test script.
One of the performance test platforms, NetGend, automatically takes care of the hidden fields and the internal names of user-facing fields, so it's very easy to create a script to performance-test an HTML form. The tester only needs to fill in the values they would have to enter using a browser, so no correlation is necessary there. Please send me a message if you need to know more about it.
There are many things you can do to insulate your scripts from build-to-build variability. The higher up the OSI stack you go, the lower the maintenance charge, but the higher the resource cost for the virtual user type. Assuming changes are limited to page-level resources and a few hidden fields here and there for web sites or applications, you can record in HTML mode. You can blast away the EXTRARES sections, since the page parser in HTML mode will automatically parse the page and load its resources even without an explicit reference; it can be a real pain to keep these sections in sync if you have developers who are experimenting quite a bit.
Next up, for forms which have a very high velocity of change, consider using a web_custom_request() for that one form. You can use correlation statements to pick up all of the name|value pairs as needed and build the form submit dynamically. There will be a little more up-front work for this, but you should see payoffs by around the fourth changed build, where you would normally have been rebuilding some scripts.
Take a look at all of the hosts referenced in your code and parameterize them. I have a template that I use for web virtual users which pairs a default value with the ability to change any of the host names via the control panel's extra attributes section. Take a look at the example for lr_get_attrib_string() for how you might implement the pickup, and pair that with a check for NULL and a fallback to a default value in your code.
This is going to seem counterintuitive, but comment your script heavily wherever changes occur often, so you know where to take on the extra labor up front to handle a more dynamic data set.
Almost nothing you do with any tool can save you from structural changes in the design and flow of the app, such as the insertion of a new page in the workflow, but paying attention to the design of the high-change pages, of which there are typically a small number, can result in test code with a very long life.
Of course, if your application is web-services based, then there is a natural long life to the use of exposed public services. Code may change on the back end of the service, but typically the exposed public interface is very stable.

What area of machine learning should I look into to automatically extract certain info from messages

I have an app that extracts information from incoming messages. The messages all contain the same information, but they have different forms depending on the source that sent them.
Example:
Message from source A :
A: You spent $50.00 at Macy's on 2/20/12
Message from source B :
Purchase, $50.00, Macy's, 2Feb2012, Balance $5000.00
Every message from a single source has the same form, though. So at the moment I'm doing it by writing a set of regular expressions: first to identify which message I'm trying to decode (i.e. what source it came from, so I know the form of the message), and then to extract the necessary information from it (in the above example, the transaction amount, the store where the transaction happened, and the date). If I discover a new source of messages, or a source changes the format of its messages (which doesn't happen very often, but could), I need to manually write the regular expressions for that message. I'm sure, however, that I could automate this using some kind of machine learning technique. I just don't know much about machine learning, and I don't know where to even start looking for a technique that would apply to my problem. I would like someone to just point me in the right direction on where to start reading.
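For context, the current approach amounts to roughly this kind of per-source pattern table (a simplified Python sketch, not the real code):

import re

# One pattern per known source; each must expose amount, store and date groups.
PATTERNS = {
    "A": re.compile(r"You spent \$(?P<amount>[\d.]+) at (?P<store>.+?) on (?P<date>\S+)"),
    "B": re.compile(r"Purchase, \$(?P<amount>[\d.]+), (?P<store>.+?), (?P<date>\w+)"),
}

def extract(message):
    for source, pattern in PATTERNS.items():
        m = pattern.search(message)
        if m:
            return source, m.groupdict()
    return None, {}   # unknown source: today this means writing a new regex by hand

print(extract("A: You spent $50.00 at Macy's on 2/20/12"))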
In order to detect and label amounts, dates, person names and similar information, you can use a technique called Named Entity Recognition (NER). The Stanford Named Entity Recognizer comes with pretrained, ready-to-use models.
You can also use whatever labeled data you have generated so far to learn a custom model for your application. The standard techniques for this purpose are Conditional Random Fields (CRFs) or the sequence perceptron. There are many toolkits implementing these models, including:
Wapiti - A simple and fast discriminative sequence labelling toolkit.
Sequor - sequence labeler based on Collins's (2002) perceptron.
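As a concrete (toy-sized) illustration of the CRF route, here is a sketch using the sklearn-crfsuite Python wrapper rather than the toolkits listed above; each message is tokenised, every token gets a feature dict, and the labels (AMOUNT, STORE, DATE, O) are the ones you would bootstrap from your existing regexes. The feature set is deliberately minimal:

import sklearn_crfsuite   # pip install sklearn-crfsuite (assumed available)

def token_features(tokens, i):
    """Simple per-token features; real systems use much richer feature sets."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_dollar": "$" in tok,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def featurize(sentence):
    tokens = sentence.split()
    return [token_features(tokens, i) for i in range(len(tokens))]

# Hand-labelled examples, one label per whitespace token.
train = [
    ("A: You spent $50.00 at Macy's on 2/20/12",
     ["O", "O", "O", "AMOUNT", "O", "STORE", "O", "DATE"]),
    ("Purchase, $50.00, Macy's, 2Feb2012, Balance $5000.00",
     ["O", "AMOUNT", "STORE", "DATE", "O", "O"]),
]
X = [featurize(text) for text, _ in train]
y = [labels for _, labels in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, y)
print(crf.predict([featurize("B: You spent $12.99 at Target on 3/01/12")]))

With only two training sentences this will not generalise, but it shows the shape of the data the CRF toolkits expect: sequences of feature dicts paired with sequences of labels.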

Generate XML feeds or auto export files in SAP?

Is there any way to automatically generate the result of an SAP transaction? Let's say I want to see the production orders for one MRP controller (I have the COOIS transaction for this). Is there any way to generate an XML feed with the result of that transaction and refresh it, let's say, every 10 minutes?
Or to auto-export an .xls file with the result somewhere...? I know I have the jobs and the spools, but I have to manually download the result from the SAP GUI.
I don't have access to ABAP, so I would like to know if there are other methods to get data out of SAP.
Since "a transaction" might be anything from a simple report to a complex interactive application that does not even have a simple "result", I doubt that there's a way to provide any generic tool for this. You might try the following:
Schedule a job and have the result sent to some mailbox instead of being printed. Then use the programming language of your choice to grab and process the mail (see the sketch after these suggestions).
Check whether there are BAPIs available (BAPI_PRODORD_* or something like that; I'm not a CO expert, so I wouldn't know which one to use). You can call these BAPIs from an external program without having to write ABAP yourself; however, you'll most likely need the help of someone who knows ABAP in order to get the interface documentation and understand the concepts.
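For the first option, the mail-grab side can stay fairly small. A minimal Python sketch using only the standard library; the server name, credentials and subject filter are placeholders, and the assumption is that the scheduled SAP job attaches its output to the mail:

import email
import imaplib

# Connect to the mailbox that receives the scheduled SAP job output.
with imaplib.IMAP4_SSL("imap.example.com") as imap:    # placeholder server
    imap.login("sap-reports@example.com", "secret")    # placeholder credentials
    imap.select("INBOX")
    # Find unread mails from the SAP job (subject filter is an assumption).
    _, data = imap.search(None, '(UNSEEN SUBJECT "COOIS production orders")')
    for num in data[0].split():
        _, msg_data = imap.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(msg_data[0][1])
        for part in msg.walk():
            if part.get_filename():                     # the attached report file
                with open(part.get_filename(), "wb") as f:
                    f.write(part.get_payload(decode=True))

From there, converting the saved file to XML or .xls and publishing it on a 10-minute schedule is ordinary scripting outside SAP.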
