Running a Scala script with line breaks in spark-shell - apache-spark

I'm trying to run a scala script through spark shell using the following command:
spark-shell -i myScriptFile.scala
I can get the above command to work when I have single-line commands, but if I have any line breaks in the script (for readability), the spark-shell (or the REPL?) interprets each line as a complete statement. Here is a sample of my script:
import org.apache.spark.sql.types._
import java.util.Calendar
import java.text.SimpleDateFormat
// *********************** This is for Dev ***********************
val dataRootPath = "/dev/test_data"
// *********************** End of DEV specific paths ***************
val format = new SimpleDateFormat("yyyy-MM-dd")
val currentDate = format.format(Calendar.getInstance().getTime()).toString
val cc_df = spark.read.parquet(s"${dataRootPath}/cc_txns")
.filter($"TXN_DT" >= date_sub(lit(current_date), 365) && $"TXN_DT" < lit(current_date))
.filter($"AMT" >= 0)
....
System.exit(0)
When running the spark-shell with this script, I get the following error:
<console>:1: error: illegal start of definition
The syntax of the script is correct: if I start the shell and manually paste this code in with :paste, everything works fine.
I have tried ending all multi-line commands with a backslash \ but that didn't work either.
Does anyone have any suggestions on how I can keep my script multi-lined but still be able to pass it to spark-shell as an argument at startup?

Try wrapping each multi-line statement in braces, so the REPL parses the whole block as a single unit, e.g. applied to the snippet above:
val cc_df = {
  spark.read.parquet(s"${dataRootPath}/cc_txns")
    .filter($"TXN_DT" >= date_sub(lit(currentDate), 365) && $"TXN_DT" < lit(currentDate))
    .filter($"AMT" >= 0)
}

Alternatively, you can enter :paste mode, paste the block (Ctrl+V), and then press Ctrl+D in the shell to evaluate it.

Related

How to forward the value of a variable created in the script in Nextflow to a value output channel?

I have a process that generates a value. I want to forward this value into a value output channel, but I cannot seem to get it working in one "go" - I always have to write a file to the output and then define a new channel from the first:
process calculate {
    input:
    file div from json_ch.collect()
    path "metadata.csv" from meta_ch

    output:
    file "dir/file.txt" into inter_ch

    script:
    """
    echo ${div} > alljsons.txt
    mkdir dir
    python3 $baseDir/scripts/calculate.py alljsons.txt metadata.csv dir/
    """
}
ch = inter_ch.map{ file(it).text }
ch.view()
How do I fix this?
Thanks!
Best, t.
If your script performs a non-trivial calculation, writing the result to a file like you've done is absolutely fine - there's nothing really wrong with this approach. However, since the 'inter_ch' channel already emits files (or paths), you could simply use:
ch = inter_ch.map { it.text }
It's not entirely clear what the objective is here. If the desire is to reduce the number of channels created, consider instead switching to the new DSL 2. This won't let you avoid writing your calculated result to a file, but it may let you avoid the intermediary channel.
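A minimal sketch of what that might look like under DSL 2 (the params.jsons / params.metadata channel definitions are assumptions - adapt them to however json_ch and meta_ch are actually built; the process body is unchanged):
nextflow.enable.dsl=2

process calculate {
    input:
    path div
    path 'metadata.csv'

    output:
    path 'dir/file.txt'

    script:
    """
    echo ${div} > alljsons.txt
    mkdir dir
    python3 $baseDir/scripts/calculate.py alljsons.txt metadata.csv dir/
    """
}

workflow {
    json_ch = Channel.fromPath( params.jsons )      // assumed source of the JSON files
    meta_ch = Channel.fromPath( params.metadata )   // assumed source of metadata.csv
    calculate( json_ch.collect(), meta_ch )
    calculate.out.map { it.text }.view()            // read the result without naming an intermediary channel
}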
On the other hand, if your Python script actually does something rather trivial and can be refactored away, it might be possible to assign a (global) variable (below the script: keyword) such that it can be referenced in your output declaration, like the line x = ... in the example below:
Valid output values are value literals, input value identifiers, variables accessible in the process scope and value expressions. For example:
process foo {
    input:
    file fasta from 'dummy'

    output:
    val x into var_channel
    val 'BB11' into str_channel
    val "${fasta.baseName}.out" into exp_channel

    script:
    x = fasta.name
    """
    cat $x > file
    """
}
Other than that, your options are limited. You might have considered using the env output qualifier, but this just adds some syntactic sugar to your shell script at runtime, such that an output file is still created:
Contents of test.nf:
process test {
    output:
    env myval into out_ch

    script:
    '''
    myval=$(calc.py)
    '''
}
out_ch.view()
Contents of bin/calc.py (chmod +x):
#!/usr/bin/env python
print('foobarbaz')
Run with:
$ nextflow run test.nf
N E X T F L O W ~ version 21.04.3
Launching `test.nf` [magical_bassi] - revision: ba61633d9d
executor > local (1)
[bf/48815a] process > test [100%] 1 of 1 ✔
foobarbaz
$ cat work/bf/48815aeefecdac110ef464928f0471/.command.sh
#!/bin/bash -ue
myval=$(calc.py)
# capture process environment
set +u
echo myval=$myval > .command.env

Executing multiple linux commands using karate.fork()

Is it possible to run multiple commands using karate.fork()? I tried adding the commands using ; or && separators, but the second command doesn't seem to get executed.
I am trying to cd to a particular directory before executing bash on a shell script.
* def command =
  """
  function(line) {
    var proc = karate.fork({ redirectErrorStream: false, useShell: true, line: line });
    proc.waitSync();
    karate.set('sysOut', proc.sysOut);
    karate.set('sysErr', proc.sysErr);
    karate.set('exitCode', proc.exitCode);
  }
  """
* call command('cd ../testDirectory ; bash example.sh')
Note that instead of line, passing args as an array of command-line arguments is supported, so try that as well - e.g. something like:
karate.fork({ args: ['cd', 'foo;', 'bash', 'example.sh'] })
But yes, this may need some investigation. You can always put all the commands in a single batch file, which should work.
It would be good if you can try the 1.0 RC, since some improvements may have been added: https://github.com/intuit/karate/wiki/1.0-upgrade-guide
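A sketch of the batch-file approach, reusing the function defined above (the wrapper name run.sh and its location are assumptions):
#!/bin/bash
# run.sh - bundle the commands so karate.fork() only has to launch one process
cd ../testDirectory
bash example.sh
Then in the feature file:
* call command('bash run.sh')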

How to execute an exe file, supplying an input text file, and get the output in a variable in Python 3 on Windows?

I have an exe file mycode.exe at "D:\projFolder\mycode.exe" and an input text file in.txt at "D:\projFolder\in.txt".
I am writing a Python 3 script which will execute this exe file with the supplied input text file and compare the output.
To achieve it, I am just trying to execute Windows command as :
cmd> "D:\projFolder\mycode.exe" < "D:\projFolder\in.txt"
and want to save the result of the above command in a variable say, resultstdout, and later use it to compare with an expected output file out.txt.
My problem: how do I execute the Windows command "D:\projFolder\mycode.exe" < "D:\projFolder\in.txt" in a Python 3 script?
I was previously working in Python 2, where I achieved it as follows:
baseDirectory = "D:/projFolder"
( stat, consoleoutput ) = subprocess.getstatusoutput(baseDirectory + "/mycode.exe"+ " < " + contestDirectory+ "/" +in.txt)
if(stat == 0):
# Perform result comparision
else:
# Some execption while executing the command.
However, I am not sure how to refactor the above code in Python 3.
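One way to do this in Python 3 is to feed the input file to the exe's stdin with subprocess.run and capture stdout - a minimal sketch (capture_output and text require Python 3.7+):
import subprocess

baseDirectory = "D:/projFolder"

# Feed in.txt to the exe's stdin and capture its stdout/stderr as text
with open(baseDirectory + "/in.txt") as infile:
    result = subprocess.run([baseDirectory + "/mycode.exe"],
                            stdin=infile,
                            capture_output=True,
                            text=True)

if result.returncode == 0:
    resultstdout = result.stdout   # compare this with the contents of out.txt
else:
    print("Execution failed:", result.stderr)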

With SBT, a compiled Scala string can't be '${foo.bar}'?

I want a Scala string to be ${foo.bar} (literally, for testing some variable substitution later).
I tried:
val str = "${foo.bar}"
val str = """${foo.bar}"""
val str = "\${foo.bar}"
val str = "$${foo.bar}"
val str = "$\{foo.bar}"
All of them give compile errors like Error:(19, 15) possible missing interpolator: detected an interpolated expression or invalid escape character.
This is not a question about string interpolation (or variable substitution); that normally works without problems. Starting the Scala REPL (Scala 2.11.3, Java 1.8) works as expected. Somewhere there must be an SBT setting (other than -Xlint, or a hidden -Xlint) which apparently is causing this behavior (from the command line and in IntelliJ).
The s or f interpolator will emit a constant:
$ scala -Xlint
Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions for evaluation. Or try :help.
scala> "${foo.bar}"
<console>:12: warning: possible missing interpolator: detected an interpolated expression
"${foo.bar}"
^
res0: String = ${foo.bar}
scala> f"$${foo.bar}"
res1: String = ${foo.bar}
It's usual to use -Xfatal-warnings to turn the warning into an error. IntelliJ reports it as an error at the source position, whereas scalac reports it as a warning, but with a summary error message that will fail a build.
\$ and \{ are invalid escape characters and will not compile. The other versions compile just fine on 2.12.6, though perhaps there are problems in earlier versions.
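If -Xfatal-warnings is what turns this warning into a build failure, one option (a sketch; match it to whatever flags your build actually sets) is to remove the flag in build.sbt:
// build.sbt - stop escalating the lint warning to an error
scalacOptions --= Seq("-Xfatal-warnings")
Alternatively, keep the flags and write the string with the f interpolator, as shown above: val str = f"$${foo.bar}".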

Problems with optparse and Python 3.4

Since upgrading to Python 3.4.3, optparse doesn't appear to recognise command-line options. As a simple test, I run this (from the optparse examples):
# test_optparse.py
def main():
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option("-f", "--file", dest="filename",
                      help="write report to FILE", metavar="FILE")
    parser.add_option("-q", "--quiet",
                      action="store_false", dest="verbose", default=True,
                      help="don't print status messages to stdout")
    (options, args) = parser.parse_args()
    print(options)

if __name__ == '__main__':
    main()
When I run test_optparse.py -f test I get
{'verbose': True, 'filename': None}
But running within my IDE I get
{'filename': 'test', 'verbose': True}
I first noted this in a script where I concatenated a run command, for example:
run_cmd = 'python.exe ' + '<path to script>' + ' -q ' + '<query_name>'
res = os.system(run_cmd)
But when I displayed the run_cmd string, it was shown in the interpreter over two lines:
print(run_cmd)
'python.exe <path to script> -q '
' <query_name>'
So it may be that the command line is being fragmented somewhere and only the first section is being passed (hence no query name), so the Python script being run fails with 'no query specified'.
I've changed all this to use subprocess.call to get around it, but it would be useful to have the run_query script for command-line use as it was. Any ideas or suggestions?
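Since subprocess is already in play, passing the command as a list of arguments (rather than one concatenated string) sidesteps any fragmentation, because each argument reaches the script intact. A sketch, with the same placeholders as above:
import subprocess

# Each argument is its own list element, so no string quoting or splitting is involved
res = subprocess.call(['python.exe', '<path to script>', '-q', '<query_name>'])
if res != 0:
    print('query script reported an error')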
