I am working on getting C# applications written for Windows to run on Linux using Mono. I am using Mono 5.18.0.240 from the Mono repository, on Ubuntu 18.04.1.
My understanding is that Mono includes a local file-based event logger. By setting the environment variable MONO_EVENTLOG_TYPE to local (followed by an optional path), events are logged to a file-based log. However, logged events do not appear to be sorted into the per-source directories that get created. Instead, all events end up in the same directory, which makes it harder to navigate through the files when many events are logged.
Consider this C# program that just logs two events each for three event sources:
using System;
using System.Diagnostics;

namespace EventLogTest
{
    class Program
    {
        public static void Main()
        {
            var sources = new string[] { "source1", "source2", "source3" };

            foreach (var source in sources)
            {
                if (!EventLog.SourceExists(source))
                    EventLog.CreateEventSource(source, "Application");

                EventLog log = new EventLog();
                log.Source = source;
                log.WriteEntry("some event");
                log.WriteEntry("another event");
            }
        }
    }
}
We can build the program into an executable and then run it:
$ csc events.cs
Microsoft (R) Visual C# Compiler version 2.8.2.62916 (2ad4aabc)
Copyright (C) Microsoft Corporation. All rights reserved.
$ MONO_EVENTLOG_TYPE=local:./eventlog mono ./events.exe
The resulting structure of the eventlog directory looks like this:
$ tree ./eventlog
./eventlog
└── Application
    ├── 1.log
    ├── 2.log
    ├── 3.log
    ├── 4.log
    ├── 5.log
    ├── 6.log
    ├── Application
    ├── source1
    ├── source2
    └── source3

5 directories, 6 files
Note that the directories source1, source2, and source3 were created, but the six log files were placed in the top level Application directory instead of the source directories. If we look at the source field of each log file, we can see that the source is correct:
$ grep -a Source ./eventlog/Application/*.log
eventlog/Application/1.log:Source: source1
eventlog/Application/2.log:Source: source1
eventlog/Application/3.log:Source: source2
eventlog/Application/4.log:Source: source2
eventlog/Application/5.log:Source: source3
eventlog/Application/6.log:Source: source3
My expectation is that the above directory structure should look like this instead, considering that each event log source had two events written (and I don't see the point of the second Application directory):
./eventlog
└── Application
    ├── source1
    │   ├── 1.log
    │   └── 2.log
    ├── source2
    │   ├── 1.log
    │   └── 2.log
    └── source3
        ├── 1.log
        └── 2.log
Now, I know that the obvious solution might be to use a logging solution other than Mono's built-in event logging. However, at this point, it is important that I stick with the built-in tools available.
Is there a way to configure Mono's built-in local event logging to save the events to log files in the relevant source directory, or is this possibly a bug in Mono?
I'm using PySpark, but I guess this is valid for Scala as well.
My data is stored on S3 in the following structure:
main_folder
└── year=2022
    └── month=03
        ├── day=01
        │   ├── valid=false
        │   │   └── example1.parquet
        │   └── valid=true
        │       └── example2.parquet
        └── day=02
            ├── valid=false
            │   └── example3.parquet
            └── valid=true
                └── example4.parquet
(For simplicity, there is only one file in each folder and only two days; in reality, there can be thousands of files and many days/months/years.)
The files under the valid=true and valid=false partitions have completely different schemas, and I only want to read the files in the valid=true partition.
I tried using the glob filter, but it fails with "AnalysisException: Unable to infer schema for Parquet. It must be specified manually.", which is a symptom of having no data (so no files matched):
spark.read.parquet('s3://main_folder', pathGlobFilter='*valid=true*')
I noticed that something like this works
spark.read.parquet('s3://main_folder', pathGlobFilter='*example4*')
however, as soon as I try to use a slash or do something above the bottom level it fails.
spark.read.parquet('s3://main_folder', pathGlobFilter='*/example4*')
spark.read.parquet('s3://main_folder', pathGlobFilter='*valid=true*example4*')
I did try to replace the * with ** in all locations, but it didn't work
pathGlobFilter seems to apply only to the trailing filename. For subdirectories you can try the wildcard path below, though it may ignore partition discovery. To keep partition discovery, add the basePath property as a load option:
spark.read.format("parquet")\
.option("basePath","s3://main_folder")\
.load("s3://main_folder/*/*/*/valid=true/*")
However, I am not sure whether you can combine wildcarding and pathGlobFilter if you want to match on both subdirectories and end filenames.
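If you do want to attempt combining them, here is an untested sketch (the *.parquet filename pattern is only an assumption for illustration):
# Untested sketch: wildcard the partition directories in the load path and
# additionally filter the trailing filenames with pathGlobFilter.
df = (spark.read.format("parquet")
      .option("basePath", "s3://main_folder")
      .option("pathGlobFilter", "*.parquet")
      .load("s3://main_folder/*/*/*/valid=true/*"))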
Reference:
https://simplernerd.com/java-spark-read-multiple-files-with-glob/
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
This is what the command help says:
--archives=[ARCHIVE,...]
Comma separated list of archives to be extracted into the working
directory of each executor. Must be one of the following file formats:
.zip, .tar, .tar.gz, or .tgz.
and this answer here tells me that --archives will only be extracted on worker nodes.
I am testing the --archives behavior the following way:
tl;dr - 1. I create an archive and zip it. 2. I create a simple RDD and map its elements to os.walk('./'). 3. The archive.zip gets listed as a directory, but os.walk does not traverse down this branch.
My archive directory:
.
├── archive
│   ├── a1.py
│   ├── a1.txt
│   └── archive1
│       ├── a1_in.py
│       └── a1_in.txt
├── archive.zip
└── main.py

2 directories, 6 files
Testing code:
import os
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
rdd = sc.parallelize(range(1))
walk_worker = rdd.map(lambda x: str(list(os.walk('./')))).distinct().collect()
walk_driver = list(os.walk('./'))
print('driver walk:', walk_driver)
print('worker walk:',walk_worker)
Dataproc run command:
gcloud dataproc jobs submit pyspark main.py --cluster pyspark-monsoon31 --region us-central1 --archives archive.zip
output:
driver walk: [('./', [], ['.main.py.crc', 'archive.zip', 'main.py', '.archive.zip.crc'])]
worker walk: ["[('./', ['archive.zip', '__spark_conf__', 'tmp'], ['pyspark.zip', '.default_container_executor.sh.crc', '.container_tokens.crc', 'default_container_executor.sh', 'launch_container.sh', '.launch_container.sh.crc', 'default_container_executor_session.sh', '.default_container_executor_session.sh.crc', 'py4j-0.10.9-src.zip', 'container_tokens']), ('./tmp', [], ['liblz4-java-5701923559211144129.so.lck', 'liblz4-java-5701923559211144129.so'])]"]
The output for the driver node: archive.zip is available but not extracted - EXPECTED
The output for the worker node: os.walk lists archive.zip as an extracted directory. The 3 directories available are ['archive.zip', '__spark_conf__', 'tmp']. But, to my surprise, only ./tmp is traversed further and that is it.
I have checked using os.listdir that archive.zip actually is a directory and not a zip. Its structure is:
└── archive.zip
    └── archive
        ├── a1.py
        ├── a1.txt
        └── archive1
            ├── a1_in.py
            └── a1_in.txt
So, why is os.walk not walking down the archive.zip directory?
archive.zip is added as a symlink to worker nodes. Symlinks are not traversed by default.
If you change to walk_worker = rdd.map(lambda x: str(list(os.walk('./', followlinks=True)))).distinct().collect() you will get the output you are looking for:
worker walk: ["[('./', ['__spark_conf__', 'tmp', 'archive.zip'], ...
('./archive.zip', ['archive'], []), ('./archive.zip/archive', ['archive1'], ['a1.txt', 'a1.py']), ...."]
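As a hypothetical follow-up (the file path is taken from the tree above, not verified on a live cluster), the extracted files can then be read on the workers through the symlinked directory:
# Hypothetical sketch: read a file shipped inside archive.zip from a worker,
# going through the ./archive.zip symlink in the working directory.
content_worker = rdd.map(
    lambda x: open('./archive.zip/archive/a1.txt').read()
).distinct().collect()
print(content_worker)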
After reading this comment proving that it's possible to write custom apps for the new Nokia 3310 3G (TA-1006), I'm trying to get my own app running.
After doing a lot of reading about what MIDP, CLDC etc. are, I installed the Java ME SDK (on a fresh Ubuntu installation since Oracle only supports that or Windows), Eclipse and the Sun Wireless toolkit.
First off, I couldn't find any information on which versions of MIDP and CLDC are supported by the device, so I went ahead and tried a few possible permutations; these are my results:
CLDC \ MIDP | 1.0 | 2.0 | 2.1 |
        1.0 |  *  |  *  |  X  |
        1.1 |  *  |  *  |  ?  |
        1.8 |  X  |  X  |  ?  |
The ? ones I have not tried, since MIDP 2.1 does not work and there is nothing to be gained; the X ones give the error "Can't install [MIDlet name] because it doesn't work with this phone".
So it seems like the phone supports the MIDP 2.0 profile and the CLDC 1.1 configuration. However, when I try to install my app (with any of the * configurations) it always goes like this:
"[MIDlet name] is untrusted. Continue anyway?" > Ok (this was expected)
"Can't compile the file" (here is where I'm stuck)
What I tried so far (besides the various version permutations)
Initially I tried with a very basic MIDlet subclass:
public void startApp()
{
    Form form = new Form("Hello");
    form.append(new StringItem("Hello", "World!"));
    Display.getDisplay(this).setCurrent(form);
}
Next, I tried using these templates provided by the Eclipse plugin:
Splash MIDlet Template
Hello World Midlet Template
When selecting the runtime configuration (always picked DefaultColorPhone) I adjusted the version profile from MIDP-2.1 to MIDP-2.0
Tried the other configs MediaControlSkin and QwertyDevice
I always produced the *.jar and *.jad files by clicking the "Packaging > Create Package" button in the "Application Descriptor" view.
At some point it turned into experimenting with various settings that I didn't have much confidence would work, reading up, and rinse-repeat. When looking for alternatives, the whole journey became quite frustrating since a lot of links are either on dodgy websites, 404s, or about the old 3310 phone.
TL;DR
What configuration and build steps are necessary to get a simple (unsigned) application compiled for the new Nokia 3310?
Here are the full contents of the simplest failing example, which in my opinion should work:
$ tree
.
├── Application Descriptor
├── bin
│   └── com
│       └── stackoverflow
│           └── kvn
│               └── test
│                   └── SOExample.class
├── build.properties
├── deployed
│   └── DefaultColorPhoneM2.0
│       ├── SOTest.jad
│       └── SOTest.jar
├── res
└── src
    └── com
        └── stackoverflow
            └── kvn
                └── test
                    └── SOExample.java

13 directories, 6 files
$ cat Application\ Descriptor
MIDlet-1: SOExample,,com.stackoverflow.kvn.test.SOExample
MIDlet-Jar-URL: SOTest.jar
MIDlet-Name: SOTest MIDlet Suite
MIDlet-Vendor: MIDlet Suite Vendor
MIDlet-Version: 1.0.0
MicroEdition-Configuration: CLDC-1.1
MicroEdition-Profile: MIDP-2.0
$ cat build.properties
# MTJ Build Properties
DefaultColorPhoneM2.0.includes=src/com/stackoverflow/kvn/test/SOExample.java\
DefaultColorPhoneM2.0.excludes=\
$ cat src/com/stackoverflow/kvn/test/SOExample.java
package com.stackoverflow.kvn.test;

import javax.microedition.lcdui.*;
import javax.microedition.midlet.*;

public class SOExample extends MIDlet {
    private Form form;

    protected void destroyApp(boolean unconditional)
            throws MIDletStateChangeException { /* nop */ }

    protected void pauseApp() { /* nop */ }

    protected void startApp() throws MIDletStateChangeException {
        form = new Form("Hello");
        form.append(new StringItem("Hello", "World!"));
        Display.getDisplay(this).setCurrent(form);
    }
}
Software info of the device: Model: TA-1006; Software: 15.0.0.17.00; OS version: MOCOR_W17.44.3_Release; Firmware number: sc7701_barphone
I got a question that relates to (and may be a duplicate of) this question here.
I am trying to write a pandas DataFrame to an Excel file (which does not exist beforehand) at a given path. Since I have to do this quite a few times, I am wrapping it in a function. Here is what I do:
import pandas as pd

df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

def excel_to_path(frame, path):
    writer = pd.ExcelWriter(path, engine='xlsxwriter')
    frame.to_excel(writer, sheet_name='Output')
    writer.save()

excel_to_path(df, "../foo/bar/myfile.xlsx")
I get the error [Errno 2] No such file or directory: '../foo/bar/myfile.xlsx'. How come, and how can I fix it?
EDIT: It works as long as the defined path is inside the current working directory. But I'd like to specify any given path instead. Ideas?
I usually get bitten by forgetting to create the directories. Perhaps the path ../foo/bar/ doesn't exist yet? Pandas will create the file for you, but not the parent directories.
To elaborate, I'm guessing that your setup looks like this:
.
└── src
    ├── foo
    │   └── bar
    └── your_script.py
with src being your working directory, so that foo/bar exists relative to you, but ../foo/bar does not - yet!
So you should add the foo/bar directories one level up:
.
├── foo_should_go_here
│   └── bar_should_go_here
└── src
    ├── foo
    │   └── bar
    └── your_script.py
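Alternatively, you can create the missing parent directories from code before writing. A minimal sketch, reusing excel_to_path from the question and adding os.makedirs (an assumption on my part, not something your setup requires):
import os
import pandas as pd

def excel_to_path(frame, path):
    # Create ../foo/bar (and any other missing parents) before writing.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    writer = pd.ExcelWriter(path, engine='xlsxwriter')
    frame.to_excel(writer, sheet_name='Output')
    writer.save()

excel_to_path(pd.DataFrame({'Data': [10, 20, 30]}), "../foo/bar/myfile.xlsx")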
If we have a folder folder containing only .txt files, we can read them all using sc.textFile("folder/*.txt"). But what if I have a folder folder containing even more folders named datewise, like 03, 04, ..., which in turn contain some .log files? How do I read these in Spark?
In my case, the structure is even more nested & complex, so a general answer is preferred.
If the directory structure is regular, let's say something like this:
folder
├── a
│   ├── a
│   │   └── aa.txt
│   └── b
│       └── ab.txt
└── b
    ├── a
    │   └── ba.txt
    └── b
        └── bb.txt
you can use the * wildcard for each level of nesting, as shown below:
>>> sc.wholeTextFiles("/folder/*/*/*.txt").map(lambda x: x[0]).collect()
[u'file:/folder/a/a/aa.txt',
u'file:/folder/a/b/ab.txt',
u'file:/folder/b/a/ba.txt',
u'file:/folder/b/b/bb.txt']
Spark 3.0 provides an option recursiveFileLookup to load files from recursive subfolders.
val df = sparkSession.read
  .option("recursiveFileLookup", "true")
  .option("header", "true")
  .csv("src/main/resources/nested")
This recursively loads the files from src/main/resources/nested and its subfolders.
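For reference, a rough PySpark equivalent (assuming an existing SparkSession named spark) would be:
# Rough PySpark equivalent of the Scala snippet above (Spark 3.0+).
df = (spark.read
      .option("recursiveFileLookup", "true")
      .option("header", "true")
      .csv("src/main/resources/nested"))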
If you want to use only files whose names start with "a", you can use
sc.wholeTextFiles("/folder/a*/*/*.txt") or sc.wholeTextFiles("/folder/a*/a*/*.txt")
as well. We can use * as a wildcard.
sc.wholeTextFiles("/directory/201910*/part-*.lzo") get all match files name, not files content.
if you want to load the contents of all matched files in a directory, you should use
sc.textFile("/directory/201910*/part-*.lzo")
and setting reading directory recursive!
sc._jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
Tip: Scala differs from Python; the setting below is for Scala:
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
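Putting the two Python pieces together as a sketch (the order matters: set the Hadoop configuration before reading):
# Sketch: enable recursive directory listing, then load the contents
# of every file matched by the glob.
sc._jsc.hadoopConfiguration().set(
    "mapreduce.input.fileinputformat.input.dir.recursive", "true")
lines = sc.textFile("/directory/201910*/part-*.lzo")
print(lines.count())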