OPC Publisher module does not start on my Ubuntu VM as an edge module - linux

The OPC Publisher marketplace image runs successfully as a standalone container (albeit with server connection problems). But I am not able to deploy it as an edge module, especially after changing container create options.
Background: In my host laptop I was never able to get the module up so I created a Ubuntu VM. When I tried to deploy the edge module in the VM with default container create options the module did show up in the iotedge module list as "running". I wanted to set the "--op" option to set publishing rate so I changed it in the create options using the portal "Set modules" tab. Since there is no update button I used create button to "recreate" the modules. After this the module did not show up.
After that the OPC publisher module is not showing up on the edge VM. I am following the Microsoft tutorial.
Following is the command:
sudo docker run -v /iiotedge:/appdata mcr.microsoft.com/iotedge/opc-publisher:latest --aa --pf=/appdata/publishednodes.json --c="HostName=<iot hub name>.azure-devices.net;DeviceId=iothubowner;SharedAccessKey=<hub primary key>" --dc="HostName=<edge device id/name>.azure-devices.net;DeviceId=<edge device id/name>;SharedAccessKey=<edge primary key>" --op=10000
Container create options:
{
"Hostname": "opcpublisher",
"Cmd": [
"--pf=/appdata/publishednodes.json",
"--aa",
"--op=10000"
],
"HostConfig": {
"Binds": [
"/iiotedge:/appdata"
]
}
}
I have not specified the connection strings explicitly since the documentation from Microsoft assures that the runtime will pass them automatically.
The relevant iotedge journalctl logs are here.
Oct 06 19:36:05 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:05Z [INFO] - Pulling image mcr.microsoft.com/iotedge/opc-publisher:latest...
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:08Z [INFO] - Successfully pulled image mcr.microsoft.com/iotedge/opc-publisher:latest
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:08Z [INFO] - Creating module OPCPublisher...
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:08Z [INFO] - Starting new listener for module OPCPublisher
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:08Z [ERR!] - Internal server error: Could not create module OPCPublisher
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: caused by: Could not get module OPCPublisher
The logs from iotedge itself is not much useful. Find below anyway.
~$ iotedge logs OPCPublisher
A module runtime error occurred
I have also tried docker container prune just to be sure but it did not help.
Also strangely in the Azure portal when I try to restart the module from the troubleshoot page it throws an error "module not found in the current environment"
Can someone please help me out in troubleshooting this problem? I will be glad to share more details if required.

I raised a support query in Azure portal. After sending support bundles and trying various suggestions like removing DNS configuration, changing bind path to a non-sudo location etc. the team zeroed in on the edge version mismatch.
After re-reading the documentation I uninstalled the earlier iotedge package and installed aziot-edge instead and problem solved!
The team has raised a github issue for public tracking here:
https://github.com/Azure/Industrial-IoT/issues/1425
#asergaz also pointed to the right direction but did not notice since it came a bit later

Related

Listener oracle in pacemaker cluster

I need some help. I have created a cluster with 2 nodes. I created all resources, but listener has errors and the pacemaker cluster status shows Oracle Listener has stopped.
In the web interface I have the following error messages:
Failed to start listener_or on Mon Oct 25 16:30:52 2021 on node lha1: Listener pdb1 appears to have started, but is not running properly:
and
Unable to get metadata for resource agent 'ocf:heartbeat:oralsnr' (timeout)
In pacemaker cluster status I have the following error message:
listener_or_start_0 on lha1 'error' (1): call=35, status='complete', exitreason='Listener pdb1 appears to have started, but is not running properly: ', last-rc-change='2021-10-25 16:30:52 +03:00', queued=0ms, exec=393ms
However, I can connect to the base.
Do you have any ideas how I could solve this issue?
The issue was that the agent couldn't see the Oracle listener.
The solution was to:
Create tnsnames.ora
Write configurtion
Reload services
After this the pcs didn't have any errors.

Anchore Engine - Jenkins CI plugin

We are trying to scan our docker images using Anchore Engine Jenkins plugin.
Currently we create our application docker images, push it in our own private local registry and then deploy it in our test environments.
Now, we want to setup docker image scanning in our CI/CD process to check for any vulnerabilities.
We have installed Anchore Engine using the recommended Docker-Compose yaml method given in the Documentation link:
https://anchore.freshdesk.com/support/solutions/articles/36000020729-install-on-docker-swarm
Post installation, we installed the
Anchore Container Image Scanner Plugin in Jenkins.
We configured the plugin as mentioned in the document link:
https://wiki.jenkins.io/display/JENKINS/Anchore+Container+Image+Scanner+Plugin
However, the scanning fails. Error Message as follows:
2018-10-11T07:01:44.647 INFO AnchoreWorker Analysis request accepted, received image digest sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8
2018-10-11T07:01:44.647 INFO AnchoreWorker Waiting for analysis of 10.180.25.2:5000/hello-world:latest, polling status periodically
2018-10-11T07:01:44.647 DEBUG AnchoreWorker anchore-engine get policy evaluation URL: http://10.180.25.2:8228/v1/images/sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8/check?tag=10.180.25.2:5000/hello-world:latest&detail=true
2018-10-11T07:01:44.648 DEBUG AnchoreWorker Attempting anchore-engine get policy evaluation (1/300)
2018-10-11T07:01:44.675 DEBUG AnchoreWorker anchore-engine get policy evaluation failed. URL: http://10.180.25.2:8228/v1/images/sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8/check?tag=10.180.25.2:5000/hello-world:latest&detail=true, status: HTTP/1.1 404 NOT FOUND, error: {
"detail": {},
"httpcode": 404,
"message": "image is not analyzed - analysis_status: not_analyzed"
}
NOTE:
In Image TAG 10.180.25.2:5000/hello-world:latest, 10.180.25.2:5000 is our local private registry and hello-world:latest is latest hello-world image available in docker hub which we pulled and pushed in our registry to try out image scanning using Anchore-Engine.
Unfortunately we are not able to find much resource online to try and resolve the above mentioned issue.
Anyone who might have worked on Anchore-Engine, please may I request to have a look and help us resolve this issue.
Also, any suggestions or alternatives to anchore-engine or detailed steps in case we might have missed anything would be really appreciated.
End of the output is as follows:
2018-10-15T00:48:43.880 WARN AnchoreWorker anchore-engine get policy evaluation failed. HTTP method: GET, URL: http://10.180.25.2:8228/v1/images/sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8/check?tag=10.180.25.2:5000/hello-world:latest&detail=true, status: 404, error: {
"detail": {},
"httpcode": 404,
"message": "image is not analyzed - analysis_status: not_analyzed"
}
2018-10-15T00:48:43.880 WARN AnchoreWorker Exhausted all attempts polling anchore-engine. Analysis is incomplete for sha256:7d6fb7e5e7a74a4309cc436f6d11c29a96cbf27a4a8cb45a50cb0a326dc32fe8
2018-10-15T00:48:43.880 ERROR AnchorePlugin Failing Anchore Container Image Scanner Plugin step due to errors in plugin execution
hudson.AbortException: Timed out waiting for anchore-engine analysis to complete (increasing engineRetries might help). Check above logs for errors from anchore-engine
at com.anchore.jenkins.plugins.anchore.BuildWorker.runGatesEngine(BuildWorker.java:480)
at com.anchore.jenkins.plugins.anchore.BuildWorker.runGates(BuildWorker.java:343)
at com.anchore.jenkins.plugins.anchore.AnchoreBuilder.perform(AnchoreBuilder.java:338)
at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:81)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
at hudson.model.Build$BuildExecution.build(Build.java:206)
at hudson.model.Build$BuildExecution.doRun(Build.java:163)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
at hudson.model.Run.execute(Run.java:1724)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:421)
I also checked status and found below:
docker run anchore/engine-cli:latest anchore-cli --u admin --p admin123 --url http://172.18.0.1:8228/v1 system status
Service analyzer (dockerhostid-anchore-engine, http://anchore-engine:8084): up
Service catalog (dockerhostid-anchore-engine, http://anchore-engine:8082): up
Service policy_engine (dockerhostid-anchore-engine, http://anchore-engine:8087): down (unavailable)
Service simplequeue (dockerhostid-anchore-engine, http://anchore-engine:8083): up
Service apiext (dockerhostid-anchore-engine, http://anchore-engine:8228): up
Service kubernetes_webhook (dockerhostid-anchore-engine, http://anchore-engine:8338): up
Engine DB Version: 0.0.7
Engine Code Version: 0.2.4
It seems service policy engine is down
Service policy_engine (dockerhostid-anchore-engine, http://anchore-engine:8087): down (unavailable)
I also checked the docker logs . I found below error:
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [DEBUG] service (policy_engine) starting in: 4
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [INFO] Registration complete.
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [INFO] Checking feeds client credentials
[service:policy_engine] 2018-10-15 09:37:46+0000 [-] [bootstrap] [DEBUG] Initializing a feeds client
[service:policy_engine] 2018-10-15 09:37:47+0000 [-] [bootstrap] [DEBUG] init values: [None, None, None, (), None, None]
[service:policy_engine] 2018-10-15 09:37:47+0000 [-] [bootstrap] [DEBUG] using values: ['https://ancho.re/v1/service/feeds', 'https://ancho.re/oauth/token', 'https://ancho.re/v1/account/users', 'anon#ancho.re', 3, 60]
[service:policy_engine] 2018-10-15 09:37:47+0000 [-] [urllib3.connectionpool] [DEBUG] Starting new HTTPS connection (1): ancho.re
[service:policy_engine] 2018-10-15 09:37:50+0000 [-] [bootstrap] [ERROR] Preflight checks failed with error: HTTPSConnectionPool(host='ancho.re', port=443): Max retries exceeded with url: /v1/account/users/anon#ancho.re (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ffa905f0b90>: Failed to establish a new connection: [Errno 113] No route to host',)). Aborting service startup
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/anchore_manager/cli/service.py", line 158, in startup_service
raise Exception("process exited: " + str(rc))
Exception: process exited: 1
[anchore-policy-engine] [anchore_manager.cli.service/startup_service()] [INFO] service process exited at (Mon Oct 15 09:37:50 2018): process exited: 1
[anchore-policy-engine] [anchore_manager.cli.service/startup_service()] [INFO] exiting service thread
Thanks and Regards,
Rohan Shetty
When images are added to anchore-engine, they are queued for analysis which moves them through a simple state machine that starts with ‘not_analyzed’, goes to ‘analyzing’ and finally ends in either ‘analyzed’ or ‘analysis_failed’. Only when an image has reached ‘analyzed’ will a policy evaluation be possible.
The anchore Jenkins plugin will add an image, then poll the engine for image status/evaluation for the configured number of tries (default 300). Once the image goes to ‘analyzed’ (where policy evaluation is possible), the plugin will then receive a policy evaluation result from the engine.
The plugin will fail the build (by default) if the max retries has been performed and the image has not reached ‘analyzed’, if the image does reach ‘analyzed’ but the policy evaluation is producing a ‘fail’ result (meaning the image didn’t pass your configured policy checks). Note that all build failure behavior can be controlled in the plugin (I.e. there are options to allow the plugin to succeed even if the analysis or image eval fails).
You’ll need to look at the end of the output from your build run (instead of just the beginning from your post), and combined with the information above, it should be clear which scenario is causing the plugin to fail the build.
We have resolved the issue.
Root Cause:
We were not able to establish a successful https connection to URL : https://ancho.re from within the anchore-engine docker container.
As a result the service:policy_engine was not able to start.
https://ancho.re is required to download policy feeds and sync-up periodically. Without these policy anchore-engine won't be able to analyse the docker images.
Solution:
1) We passed a HTTPS_PROXY URL as an environment variable in the docker-compose.yaml of anchore-engine.
We used this proxy URL to bypass restrictions in our environment and establish a connection with https://ancho.re url.
2) Restarted the docker containers.
Finally we got all services up and running including Anchore policy-engine.
FYI:
It takes a while to download all the required Feeds depending on your internet speed.
Lastly, Thanks to the Anchore community for quick responses and support over slack.
Hope this helps.
Warm Regards,
Rohan Shetty

Submitting first job to pacemaker

I followed this guide:
https://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/
I stayed with the Active/Passive DRBD file system sharing. I had to reboot my cluster and now I am getting the following error:
Current DC: rbx-1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Tue Nov 28 17:01:14 2017
Last change: Tue Nov 28 16:40:09 2017 by root via cibadmin on rbx-1
2 nodes configured
5 resources configured
Node rbx-2: UNCLEAN (offline)
Online: [ rbx-1 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started rbx-1
WebSite (ocf::heartbeat:apache): Stopped
Master/Slave Set: WebDataClone [WebData]
WebData (ocf::linbit:drbd): FAILED rbx-1 (blocked)
Stopped: [ rbx-2 ]
WebFS (ocf::heartbeat:Filesystem): Stopped
Failed Actions:
* WebData_stop_0 on rbx-1 'invalid parameter' (2): call=20, status=complete, exitreason='none',
last-rc-change='Tue Nov 28 16:27:58 2017', queued=0ms, exec=3ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Any ideas?
Also does anyone have any recommended guides for submitting jobs?
This post is relatively old at this point but I'll leave this here for others to find if they stumble upon the same issue.
This problem has to do with an issue with the DRBD integration script that pacemaker uses. If it's broken, missing, has incorrect permissions, etc. you can get an error like this. In CentOS 7 that script is located at /usr/lib/ocf/resource.d/drbd
Note: This is specifically for the guide mentioned by OP but may help you:
Section 7.1 has a big "IMPORTANT" block that talks about replacing the Pacemaker integration script due to a bug. If you use the command it tells you to there, you actually replace the script with a 404 Error page which obviously doesn't work, causing the error. You can fix this issue by replacing the script with the original, either by reinstalling DRBD...
yum remove -y kmod-drbd84 drbd84-utils
yum install -y kmod-drbd84 drbd84-utils
...or finding just the drbd script elsewhere and adding/replacing it to /usr/lib/ocf/resource.d/drbd. Make sure its permissions are correct and that it is set as executable.
Hope that helps!

(bdutil) Unable to get hadoop/spark cluster working with a fresh install

I'm setting up a tiny cluster in GCE to play around with it but although instances are created some failures prevent to get it working. I'm following the steps in https://cloud.google.com/hadoop/downloads
So far I'm using (as of now) lastest versions of gcloud (143.0.0) and bdutil (1.3.5), freshly installed.
./bdutil deploy -e extensions/spark/spark_env.sh
using debian-8 as image (as bdutil still uses debian-7-backports).
At some point I got
Fri Feb 10 16:19:34 CET 2017: Command failed: wait ${SUBPROC} on line 326.
Fri Feb 10 16:19:34 CET 2017: Exit code of failed command: 1
full debug output is in https://gist.github.com/jlorper/4299a816fc0b140575ed70fe0da1f272
(project id and bucket names changed)
Instances are created, but spark not even installed. Digging a bit I've managed to run spark installation and start hadoop commands in the master after after ssh. But it fails badly when starting the spark-shell:
17/02/10 15:53:20 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop1
17/02/10 15:53:20 INFO gcsio.FileSystemBackedDirectoryListCache: Creating '/hadoop_gcs_connector_metadata_cache' with createDirectories()...
java.lang.RuntimeException: java.lang.RuntimeException: java.nio.file.AccessDeniedException: /hadoop_gcs_connector_metadata_cache
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
and not able to import sparkSQL. For what I've read everything should be started automatically.
Up to this point I'm a bit lost and don't know what else to do.
Am I missing any step? Is any of the commands faulty? Thanks in advance.
Update: solved
As pointed out in accepted solution I cloned the repo and cluster was created without issues. When trying to start the spark-shell though it gave
java.lang.RuntimeException: java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.`
That sounded to me like connectors were not initialized properly, so after running
./bdutil --env_var_files extensions/spark/spark_env.sh,bigquery_env.sh run_command_group install_connectors
it worked as expected.
The last version of bdutil on https://cloud.google.com/hadoop/downloads is a bit stale and I'd instead recommend using the version of bdutil at head on github: https://github.com/GoogleCloudPlatform/bdutil.

Unable to start titan server with embedded cassandra and rexter

I am trying to run Titan with embedded cassandra and rexster. Downloaded Titan distribution titan-all-0.3.2 and unpacked on a linux box. After unpacking this is what i ran the command
$ ./bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties
This is what i see in the logs
After starting RexPro services its unable to deploy and start grizzly. Has anyone had this issue?
Exception stack trace:
13/10/18 14:51:31 INFO server.RexProRexsterServer: RexPro serving on port: [8184]
Oct 18, 2013 2:51:31 PM org.glassfish.grizzly.servlet.WebappContext deploy
INFO: Starting application [jersey] ...
Oct 18, 2013 2:51:31 PM org.glassfish.grizzly.servlet.WebappContext deploy
SEVERE: [jersey] Exception deploying application. See stack trace for details.
java.lang.RuntimeException: com.sun.jersey.api.container.ContainerException: No WebApplication provider is present
at org.glassfish.grizzly.servlet.WebappContext.initServlets(WebappContext.java:1479)
at org.glassfish.grizzly.servlet.WebappContext.deploy(WebappContext.java:265)
There were some packaging problems in some of the 0.3.2 zip files. You basically need to replace a jar file or two around Jersey to get it to work (or I think use the titan-cassandra distribution instead of titan-all).
You can read more about the issue here and its solution (also reported here), but the answer is:
You should be able to patch 0.3.2 by replacing this jar file in the
Titan lib directory:
jersey-core-1.8.jar
with:
jersey-core-1.17
(http://repo1.maven.org/maven2/com/sun/jersey/jersey-core/1.17/jersey-core-1.17.jar)

Resources