Hybris catalog cronjob sync is not working - sap-commerce-cloud

The first time we run the sync cronjob (product/content sync), it runs properly and creates a media dump in the admin tab.
From the next run onward, it just shows as successful, but the sync does not actually happen.
When I go back and clear the media dump from the admin tab, it starts working and creates a media dump again.
So every time, I am forced to manually clear the media dump to make this sync job work.
Please advise.

CatalogVersionSyncJob is designed to run only once per instance. So if we create a sync job instance via ImpEx/HMC, it will work the first time, but on the second execution it won't pick up any new or modified items, and no item will be synced. This means the system needs a new instance for each sync execution!
If we execute the catalog sync from the Catalog Management Tool (HMC/Backoffice), it internally creates a new instance of the selected sync job each time. Hence, it works.
To solve this, write a custom job that does the same thing HMC/Backoffice does internally: create a new instance, assign the sync job, and execute it.
For more information, refer to configure-catalog-sync-cronjob-Hybris

I've encountered this issue, and the workaround was to create another CronJob that would remove those media dumps before the sync runs.
At a high level, we have a CompositeCronJob that does two things in sequence (there are actually more, but I'll just say two for the sake of this issue):
1. Remove the media dump from the Sync CronJob
2. Run the Sync CronJob


How to set up an ADF pipeline that isolates every pipeline run and creates its own compute resources?

I have a simple pipeline in ADF that is triggered by a Logic App every time someone submits a file as a response in Microsoft Forms. The pipeline creates a cluster based on a Docker image and then uses a Databricks notebook to run some calculations that can take several minutes.
The problem is that whenever the pipeline is running and someone submits a new response to the form, it triggers another pipeline run that, for some reason, causes the previous runs to fail.
The last pipeline run always works fine, but earlier runs show this error:
 > Operation on target "notebook" failed: Cluster 0202-171614-fxvtfurn does not exist 
However, checking the parameters of the last pipeline run shows that it uses a different cluster ID, 0202-171917-e616dsng for example.
It seems that, for some reason, the compute resources for the first run are reallocated to be used by the new pipeline run. However, the cluster IDs are different.
I have set the concurrency to 5 in the pipeline's general settings tab, but I am still getting the same error.
[Screenshot: concurrency setup]
Also, in the first connector, which looks up the Docker image files, I have the concurrency set to 15, but this doesn't fix the issue.
[Screenshot: lookup concurrency]
To me, this seems like a very simple and common task when it comes to automation and data workflows, but I cannot figure it out.
I really appreciate any help and suggestions. Thanks in advance.
The best way would be to use an existing pool rather than recreating the pool every time.

How can I view CruiseControl.Net logs in real time?

I use CruiseControl.Net for continuous integration and I would like to read the log output of the current project in real time. For example, if it is running a compile command, I want to be able to see all the compile output so far. I can see where the log files are stored but it looks like they are only created once the project finishes. Is there any way to get the output in real time?
The CCTray app will allow you to see a snapshot of the last 5 or so lines of output of any command on a regular interval.
It's not a live update as that would be too resource intensive, as would be a full output of the log to-date.
Unless you write something to capture and store the snapshots, you're out of luck. Doing this also presents the possibility of missing messages that appear between snapshots, so it would not be entirely reliable. It would, however, give you a slightly better idea of what is going on.
You can run ccnet.exe as a command line application instead of running ccservice as a Windows service. It will output to the terminal as it runs. It's useful for debugging.

How to know which instance deleted my file on a Linux server?

I have a workflow pipeline that periodically generates data files on a Linux server, and a cleanup service that removes data files older than a week.
However, I sometimes find that a newly generated data file, definitely no older than a week, is missing. I'm not sure whether this is a logic bug in the cleanup service or whether another program did it. Currently I have no idea how to investigate this issue. Is there any way to log all file deletion activity along with the related process ID and process name?
Thanks in advance.

Manually start SharePoint timer job

I'd like to invoke a timer job installed on a SharePoint server manually. What would be useful is something along the lines of an stsadm command.
My scenario is this: I've deployed a solution with a bunch of features to a customer's server. I don't want to wait for the weekly schedule to kick a particular timer job to life. I would like to just punch in a command to get the specific job to run immediately. Obviously, in the development environment I've got the schedule set for a few minutes, but I want to do a test run while I'm on site with the customer.
You can develop a custom command-line tool that gets the job's SPJobDefinition from the service.JobDefinitions collection, based on the criteria that identify your job. From there you can execute it using the Execute() method.

Process text files ftp'ed into a set of directories in a hosted server

The situation is as follows:
A series of remote workstations collect field data and upload it to a server via FTP. The data is sent as a CSV file, which is stored in a unique directory for each workstation on the FTP server.
Each workstation sends a new update every 10 minutes, causing the previous data to be overwritten. We would like to somehow concatenate or store this data automatically. The workstation's processing is limited and cannot be extended, as it's an embedded system.
One suggestion offered was to run a cronjob on the FTP server. However, because it's shared hosting, there is a terms-of-service restriction that only allows cronjobs at 30-minute intervals. Given the number of workstations uploading and the 10-minute interval between uploads, it looks like the cronjob's 30-minute limit between calls might be a problem.
Is there any other approach that might be suggested? The available server-side scripting languages are perl, php and python.
Upgrading to a dedicated server might be necessary, but I'd still like to get input on how to solve this problem in the most elegant manner.
Most modern Linux systems support inotify, which lets your process know when the contents of a directory have changed, so you don't even need to poll.
Edit: With regard to the comment below from Mark Baker:
"Be careful though, as you'll be notified as soon as the file is created, not when it's closed. So you'll need some way to make sure you don't pick up partial files."
That will happen with the inotify watch you set at the directory level. The way to make sure you don't pick up a partial file is to set a further inotify watch on the new file and look for the IN_CLOSE event, so that you know the file has been written completely.
Once your process has seen this, you can delete the inotify watch on this new file and process it at your leisure.
You might consider a persistent daemon that keeps polling the target directories:
    grab_lockfile() or exit();
    while (1) {
        if (new_files()) {
            process_new_files();
        }
        sleep(60);
    }
Then your cron job can just try to start the daemon every 30 minutes. If the daemon can't grab the lockfile, it just dies, so there's no worry about multiple daemons running.
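Since Python is one of the languages available on the server, here is a minimal Python sketch of the same lockfile-guarded polling loop. It assumes a Unix host (fcntl is not available on Windows), and the names run, process, watch_dir, lock_path and max_polls are hypothetical, chosen only for illustration. Files are tracked by name and modification time, so an overwritten upload with the same name is treated as new.

```python
import fcntl
import os
import time


def grab_lockfile(lock_path):
    """Return an open, exclusively locked file, or None if the lock is already held."""
    handle = open(lock_path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def new_files(watch_dir, seen):
    """List files in watch_dir whose (name, mtime) pair has not been seen yet."""
    fresh = []
    for entry in sorted(os.scandir(watch_dir), key=lambda e: e.name):
        if not entry.is_file():
            continue
        key = (entry.name, entry.stat().st_mtime_ns)
        if key not in seen:
            seen.add(key)
            fresh.append(entry.path)
    return fresh


def run(watch_dir, lock_path, process, interval=60, max_polls=None):
    """Poll watch_dir, calling process(path) for each new file.

    Returns False at once if another daemon holds the lock.
    Polls forever when max_polls is None.
    """
    lock = grab_lockfile(lock_path)
    if lock is None:
        return False
    seen = set()
    polls = 0
    try:
        while max_polls is None or polls < max_polls:
            for path in new_files(watch_dir, seen):
                process(path)
            polls += 1
            if max_polls is None or polls < max_polls:
                time.sleep(interval)
    finally:
        lock.close()  # closing the file releases the flock
    return True
```

The cron job would then call something like run("/path/to/uploads", "/path/to/daemon.lock", handle_csv) every 30 minutes, where both paths and handle_csv are placeholders. If a daemon is already running, the call returns False right away.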
Another approach to consider would be to submit the files via HTTP POST and then process them via a CGI. This way, you guarantee that they've been dealt with properly at the time of submission.
The 30 minute limitation is pretty silly really. Starting processes in linux is not an expensive operation, so if all you're doing is checking for new files there's no good reason not to do it more often than that. We have cron jobs that run every minute and they don't have any noticeable effect on performance. However, I realise it's not your rule and if you're going to stick with that hosting provider you don't have a choice.
You'll need a long running daemon of some kind. The easy way is to just poll regularly, and probably that's what I'd do. Inotify, so you get notified as soon as a file is created, is a better option.
You can use inotify from perl with Linux::Inotify, or from python with pyinotify.
Be careful though, as you'll be notified as soon as the file is created, not when it's closed. So you'll need some way to make sure you don't pick up partial files.
With polling it's less likely you'll see partial files, but it will happen eventually and will be a nasty hard-to-reproduce bug when it does happen, so better to deal with the problem now.
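One way to guard a polling loop against partial files is to process a file only once its size and modification time are unchanged between two consecutive polls. This is a heuristic of my own, not something inotify or FTP guarantees: a stalled upload can still look stable if the poll interval is too short. The function names below are hypothetical.

```python
import os


def snapshot(watch_dir):
    """Map each regular file's path to its current (size, mtime_ns)."""
    state = {}
    for entry in os.scandir(watch_dir):
        if entry.is_file():
            info = entry.stat()
            state[entry.path] = (info.st_size, info.st_mtime_ns)
    return state


def stable_files(previous, current):
    """Return paths whose size and mtime are identical in both snapshots."""
    return sorted(
        path for path, state in current.items()
        if previous.get(path) == state
    )
```

Take a snapshot on each poll, keep the previous one, and process only what stable_files(previous, current) returns. A file that first appears in the current snapshot is never reported until the next poll confirms it has stopped changing.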
If you're looking to stay with your existing FTP server setup, then I'd advise using something like inotify or a daemonized process to watch the upload directories. If you're OK with moving to a different FTP server, you might take a look at pyftpdlib, which is a Python FTP server library.
I've been part of the pyftpdlib dev team for a while, and one of the more common requests was for a way to "process" files once they've finished uploading. Because of that, we created an on_file_received() callback method that's triggered on completion of an upload (see issue #79 on our issue tracker for details).
If you're comfortable in Python then it might work out well for you to run pyftpdlib as your FTP server and run your processing code from the callback method. Note that pyftpdlib is asynchronous and not multi-threaded, so your callback method can't be blocking. If you need to run long-running tasks I would recommend a separate Python process or thread be used for the actual processing work.
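A minimal sketch of that hand-off, using only the standard library: the callback just puts the path on a queue, and a single worker thread does the slow work. The class UploadProcessor and its method names are hypothetical and not part of pyftpdlib. In a real setup you would call its on_file_received from the on_file_received method of your FTPHandler subclass.

```python
import logging
import queue
import threading


class UploadProcessor:
    """Runs slow file processing on a worker thread so callers never block."""

    def __init__(self, handle_file):
        self._handle_file = handle_file
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def on_file_received(self, path):
        # Cheap and non-blocking: only enqueue the path and return.
        self._pending.put(path)

    def _drain(self):
        while True:
            path = self._pending.get()
            try:
                self._handle_file(path)
            except Exception:
                # Keep the worker alive if one file fails to process.
                logging.exception("processing failed for %s", path)
            finally:
                self._pending.task_done()

    def wait_until_idle(self):
        """Block until every queued path has been processed."""
        self._pending.join()
```

With a single worker, files are processed in the order they were received. For CPU-heavy work, a separate process would avoid competing with the server loop for the interpreter.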
