I am working in Scala. I want to hash an entire DataFrame column with SHA-2 and a salt. I have implemented the following UDF, which should take a MessageDigest and the input string to be hashed.
val md = MessageDigest.getInstance("SHA-256")
val random = new SecureRandom();
val salt: Array[Byte] = new Array[Byte](16)
random.nextBytes(salt)
md.update(salt)
dataFrame.withColumn("ColumnName", Sqlfunc(md, col("ColumnName")))
....some other code....
val HashValue: ((MessageDigest, String) => String) = (md: MessageDigest, input: String) =>
{
val hashedPassword: Array[Byte] = md.digest(input.getBytes(StandardCharsets.UTF_8))
val sb: StringBuilder = new StringBuilder
for (b <- hashedPassword) {sb.append(String.format("%02x", Byte.box(b)))}
sb.toString();
}
val Sqlfunc = udf(HashValue)
However, the above code does not work, because I don't know how to pass the MessageDigest to this function, so I am running into the following error:
<<< ERROR!
java.lang.ClassCastException: com...................$$anonfun$9 cannot be cast to scala.Function1
Can someone tell me what I am doing wrong?
I am also a novice at cryptography, so feel free to suggest anything you can. We have to use SHA-2 and a salt.
What do you think about the performance here?
Thanks
The MessageDigest is not in your data. It's just context for the UDF evaluation. This type of context is provided via closures.
There are many ways to achieve the desired effect. The following is a useful pattern that uses function currying:
object X extends Serializable {
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
def foo(context: String)(arg1: Int, arg2: Int): String =
context.slice(arg1, arg2)
def udfFoo(context: String): UserDefinedFunction =
udf(foo(context) _)
}
Trying it out:
spark.range(1).toDF
.select(X.udfFoo("Hello, there!")('id, 'id + 5))
.show(false)
generates
+-----------------+
|UDF(id, (id + 5))|
+-----------------+
|Hello |
+-----------------+
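Applied to the salted-hash question, the same currying pattern could look like the sketch below. It is plain Scala with no Spark dependency, and the names SaltedHash, newSalt and hashWithSalt are hypothetical. The salt is the closed-over context, and a fresh MessageDigest is created on every call, because MessageDigest is stateful and not Serializable, and digest() resets it (so a salt added once with update() up front would only affect the first hashed value).

```scala
import java.nio.charset.StandardCharsets
import java.security.{MessageDigest, SecureRandom}

// Hypothetical helper object; not part of the original question or answer.
object SaltedHash extends Serializable {

  // Generate a random salt once, on the driver side.
  def newSalt(length: Int = 16): Array[Byte] = {
    val salt = new Array[Byte](length)
    new SecureRandom().nextBytes(salt)
    salt
  }

  // Curried: the salt is the context, the input string is the per-row data.
  def hashWithSalt(salt: Array[Byte])(input: String): String = {
    // A new digest per call avoids sharing mutable state between rows.
    val md = MessageDigest.getInstance("SHA-256")
    md.update(salt)
    md.digest(input.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b & 0xff)) // two lowercase hex digits per byte
      .mkString
  }
}
```

Following the pattern above, this would presumably be wrapped as udf(SaltedHash.hashWithSalt(salt) _) and applied to col("ColumnName"); that wiring is an assumption and is not exercised here. Null column values would also need handling before calling getBytes.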
Hi, I'm trying to create a Map[String, String] from a text file. The file contains arbitrary lines beginning with ";;;" that my function ignores; the lines I don't ignore are the key -> value pairs, with key and value separated by 2 spaces.
Whenever I run my code I get an error saying the expected type Map[String,String] isn't the required type, even though my conversions seem correct.
def createMap(filename: String): Map[String,String] = {
for (line <- Source.fromFile(filename).getLines) {
if (line.nonEmpty && !line.startsWith(";;;")) {
val string: String = line.toString
val splits: Array[String] = string.split(" ")
splits.map(arr => arr(0) -> arr(1)).toMap
}
}
}
I expect it to return a (String -> String) map, but instead I get a bunch of errors. How would I fix this?
Your if statement sits inside a plain for-loop, which returns Unit, so the map you build on each iteration is discarded. To return a result, turn the loop into a for-comprehension with yield and use the if as a filter (guard). You can then convert the resulting key-value pairs to a Map.
import scala.io.Source
def createMap(filename: String): Map[String,String] = {
val keyValuePairs = for (line <- Source.fromFile(filename).getLines; if line.nonEmpty && !line.startsWith(";;;")) yield {
val string = line.toString
val splits: Array[String] = string.split(" ")
splits(0) -> splits(1)
}
keyValuePairs.toMap
}
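The difference between the plain loop and the for-comprehension can be seen in isolation in the small sketch below, which works on an in-memory list instead of a file (the object name ForYieldDemo and the sample lines are made up for illustration):

```scala
// Hypothetical demo object; the sample lines are invented.
object ForYieldDemo {
  val lines = List("a 1", ";;; comment", "b 2")

  // Without yield the for-loop is a statement: its value is Unit,
  // whatever the body computes.
  val loopResult: Unit = for (line <- lines) { line.toUpperCase }

  // With a guard and yield the same traversal becomes an expression
  // that produces a new collection.
  val kept: List[String] =
    for (line <- lines if line.nonEmpty && !line.startsWith(";;;")) yield line
}
```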
Okay, so I took a second look. It looks like the file has some corrupt encodings. You can try this as a solution. It worked in my Scala REPL:
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}
def createMap(filename: String): Map[String,String] = {
val decoder = Codec.UTF8.decoder.onMalformedInput(CodingErrorAction.IGNORE)
Source.fromFile(filename)(decoder).getLines()
.filter(line => line.nonEmpty && !line.startsWith(";;;"))
.flatMap(line => {
val arr = line.split("\\s+")
arr match {
case Array(key, value) => Some(key -> value)
case Array(key, values @ _*) => Some(key -> values.mkString(" "))
case _ => None
}
}).toMap
}
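To try the filtering and splitting without a file, the parsing step can be separated from the I/O. This is only a sketch under the assumption that the first run of whitespace separates the key from the rest of the line; KeyValueLines and parse are hypothetical names:

```scala
// Hypothetical helper; takes any iterator of lines, e.g. Source.fromFile(...).getLines().
object KeyValueLines {
  def parse(lines: Iterator[String]): Map[String, String] =
    lines
      .filter(line => line.nonEmpty && !line.startsWith(";;;"))
      .map(_.split("\\s+", 2)) // split at the first whitespace run only
      .collect { case Array(key, value) => key -> value } // drop lines without a value
      .toMap
}
```

Lines that contain only a key are silently dropped here; whether that is the right behaviour depends on the file format.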
Interested in getting some Chisel2 code to work within the Chisel3 ecosystem, I managed to get Chisel2 chisel-tutorial examples like FullAdder:
class FullAdder extends Module {
val io = new Bundle {
val a = UInt(INPUT, 1)
val b = UInt(INPUT, 1)
val cin = UInt(INPUT, 1)
val sum = UInt(OUTPUT, 1)
val cout = UInt(OUTPUT, 1)
}
// Generate the sum
val a_xor_b = io.a ^ io.b
io.sum := a_xor_b ^ io.cin
// Generate the carry
val a_and_b = io.a & io.b
val b_and_cin = io.b & io.cin
val a_and_cin = io.a & io.cin
io.cout := a_and_b | b_and_cin | a_and_cin
}
up and running with the command:
>test:runMain examples.Launcher FullAdder
using a bit of magic dust contained in the line:
import Chisel._
However, once I tried instantiating that FullAdder in this example (adding, of course, import Chisel._):
class Adder(val n:Int) extends Module {
val io = new Bundle {
val A = UInt(INPUT, n)
val B = UInt(INPUT, n)
val Cin = UInt(INPUT, 1)
val Sum = UInt(OUTPUT, n)
val Cout = UInt(OUTPUT, 1)
}
//create a vector of FullAdders
val FAs = Vec(n, Module(new FullAdder()).io)
val carry = Wire(Vec(n+1, UInt(width = 1)))
val sum = Wire(Vec(n, Bool()))
//first carry is the top level carry in
carry(0) := io.Cin
//wire up the ports of the full adders
for (i <- 0 until n) {
FAs(i).a := io.A(i)
FAs(i).b := io.B(i)
FAs(i).cin := carry(i)
carry(i+1) := FAs(i).cout
sum(i) := FAs(i).sum.toBool()
}
io.Sum := sum.toBits.toUInt()
io.Cout := carry(n)
}
I got an error concerning this line:
io.Sum := sum.toBits.toUInt()
as follows:
[error] /home/apaj/testing-learning-journey/learning-journey/src/main/scala/examples/Adder.scala:32: not enough arguments for method toUInt: (implicit compileOptions: chisel3.core.CompileOptions)chisel3.core.UInt.
[error] Unspecified value parameter compileOptions.
[error] io.Sum := sum.toBits.toUInt()
Information found here and here enabled me to conclude that I should try with asUInt() instead of toUInt().
However, that leads to the following output in response to my request:
> test:run-main examples.Launcher Adder
[info] Running examples.Launcher Adder
Starting tutorial Adder
[info] [0.001] Elaborating design...
chisel3.core.Binding$ExpectedHardwareException: bits to be indexed 'chisel3.core.UInt@30' must be hardware, not a bare Chisel type
which is followed by a lot of java-like complaints and concludes with:
================================================================================
Errors: 1: in the following tutorials
Tutorial Adder: exception bits to be indexed 'chisel3.core.UInt@30' must be hardware, not a bare Chisel type
================================================================================
The only relevant resource I could find is this bug report, but I am really at a loss as to how to implement its advice and where exactly I should attack this problem of 'chisel3.core.UInt@30' must be hardware.
I guess I am missing something else I should import to enable the correct translation of asUInt() in this context, but I am afraid I am not seeing it. Kindly provide help if possible or at least directions for further reading - either is highly appreciated, thank you!
I'm a little rusty on the Chisel2 parts, but I think the problem (after your correct fix to use io.Sum := sum.asUInt()) is the line
val FAs = Vec(n, Module(new FullAdder()).io)
This is not instantiating a Vec of FullAdders but merely creating a Vec with elements of that type. The following compiles for me. It creates the Vec from a Seq of instantiated FullAdders.
val FAs = VecInit(Seq.fill(n)(Module(new FullAdder()).io))
It is trying to disambiguate code like this that pushed the somewhat different API of chisel3. I hope this helps.
I'm searching for a TextPad syntax file for Groovy. There is none on the TextPad Syntax Definitions page (http://www.textpad.com/add-ons/syna2g.html).
All I have found so far are links to a file that was on Codehaus (http://docs.codehaus.org/download/attachments/2747/groovy.syn). Now that Codehaus is gone, where do I find that file? Does anybody still have it installed and can post it here?
It's in the internet archive:
https://web.archive.org/web/20150508150805/http://docs.codehaus.org/download/attachments/2747/groovy.syn
I'll post it here as well -- though at time of writing, it's 3 years old, and probably needs updating ;-)
; (c) July 2004, Guillaume Laforge
; Groovy, a scripting language for the JVM, is hosted at Codehaus
; This file is a Groovy Syntax for TextPad,
; inspired from the Java Syntax file provided with TextPad
C=1
[Syntax]
Namespace1 = 6
IgnoreCase = No
InitKeyWordChars = A-Za-z_
KeyWordChars = A-Za-z0-9_
BracketChars = {[()]}
OperatorChars = -+*/<>!~%^&|=.
PreprocStart =
SyntaxStart =
SyntaxEnd =
HexPrefix = 0x
CommentStart = /*
CommentEnd = */
CommentStartAlt = """
CommentEndAlt = """
SingleComment = //
SingleCommentCol =
SingleCommentAlt =
SingleCommentColAlt =
SingleCommentEsc =
StringsSpanLines = Yes
StringStart = "
StringEnd = "
StringAlt =
StringEsc = \
CharStart = '
CharEnd = '
CharEsc = \
[Keywords 1]
; Keywords and common classes
as
assert
Boolean
Byte
Character
Class
Double
Float
Integer
Long
Number
Object
Short
String
property
void
abstract
assert
boolean
break
byte
case
catch
char
class
const
continue
default
do
double
else
extends
false
final
finally
float
for
goto
if
implements
import
instanceof
in
int
interface
long
native
new
null
package
private
protected
public
return
short
static
strictfp
super
switch
synchronized
this
throw
throws
transient
true
try
void
volatile
while
[Keywords 2]
abs
accept
allProperties
and
any
append
asImmutable
asSynchronized
asWritable
center
collect
compareTo
contains
count
decodeBase64
div
dump
each
eachByte
eachFile
eachFileRecurse
eachLine
eachMatch
eachProperty
eachPropertyName
eachWithIndex
encodeBase64
every
execute
filterLine
find
findAll
findIndexOf
flatten
getErr
getIn
getOut
getText
inject
inspect
intersect
intdiv
invokeMethod
isCase
join
leftShift
max
min
minus
mod
multiply
negate
newInputStream
newOutputStream
newPrintWriter
newReader
newWriter
next
or
padLeft
padRight
plus
pop
previous
print
println
readBytes
readLine
readLines
reverse
reverseEach
rightShift
rightShiftUnsigned
round
size
sort
splitEachLine
step
subMap
times
toDouble
toFloat
toInteger
tokenize
toList
toLong
toURL
transformChar
transformLine
upto
use
waitForOrKill
withInputStream
withOutputStream
withPrintWriter
withReader
withStream
withStreams
withWriter
withWriterAppend
write
writeLine
I tried to use readInt() to read two integers from the same line but that is not how it works.
val x = readInt()
val y = readInt()
With an input of 1 727 I get the following exception at runtime:
Exception in thread "main" java.lang.NumberFormatException: For input string: "1 727"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:231)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at scala.Console$.readInt(Console.scala:356)
at scala.Predef$.readInt(Predef.scala:201)
at Main$$anonfun$main$1.apply$mcVI$sp(Main.scala:11)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:75)
at Main$.main(Main.scala:10)
at Main.main(Main.scala)
I got the program to work by using readf but it seems pretty awkward and ugly to me:
val (x,y) = readf2("{0,number} {1,number}")
val a = x.asInstanceOf[Int]
val b = y.asInstanceOf[Int]
println(function(a,b))
Someone suggested that I just use Java's Scanner class (Scanner.nextInt()), but is there a nice idiomatic way to do it in Scala?
Edit:
My solution following paradigmatic's example:
val Array(a,b) = readLine().split(" ").map(_.toInt)
Followup Question: If there were a mix of types in the String how would you extract it? (Say a word, an int and a percentage as a Double)
If you mean how would you convert val s = "Hello 69 13.5%" into a (String, Int, Double) then the most obvious way is
val tokens = s.split(" ")
(tokens(0).toString,
tokens(1).toInt,
tokens(2).init.toDouble / 100)
// (java.lang.String, Int, Double) = (Hello,69,0.135)
Or as mentioned you could match using a regex:
val R = """(.*) (\d+) (\d*\.?\d*)%""".r
s match {
case R(str, int, dbl) => (str, int.toInt, dbl.toDouble / 100)
}
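Note that the match above throws a MatchError when the line does not fit the pattern. A variant that returns an Option instead might look like this sketch; MixedLine and parse are hypothetical names, and the pattern is tightened slightly so that the percentage must contain at least one digit:

```scala
// Hypothetical helper; returns None instead of throwing on a non-matching line.
object MixedLine {
  // word, integer, percentage (e.g. "Hello 69 13.5%")
  private val Line = """(\S+) (\d+) (\d*\.?\d+)%""".r

  def parse(s: String): Option[(String, Int, Double)] = s match {
    case Line(word, int, pct) => Some((word, int.toInt, pct.toDouble / 100))
    case _                    => None
  }
}
```

An integer too large for Int would still throw in toInt; that case is not handled in this sketch.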
If you don't actually know what data is going to be in the String, then there probably isn't much reason to convert it from a String to the type it represents, since how can you use something that might be a String and might be an Int? Still, you could do something like this:
val int = """(\d+)""".r
val pct = """(\d*\.?\d*)%""".r
val res = s.split(" ").map {
case int(x) => x.toInt
case pct(x) => x.toDouble / 100
case str => str
} // Array[Any] = Array(Hello, 69, 0.135)
Now, to do anything useful, you'll need to match on your values by type:
res.map {
case x: Int => println("It's an Int!")
case x: Double => println("It's a Double!")
case x: String => println("It's a String!")
case _ => println("It's a Fail!")
}
Or if you wanted to take things a bit further, you could define some extractors which will do the conversion for you:
abstract class StringExtractor[A] {
def conversion(s: String): A
def unapply(s: String): Option[A] = try { Some(conversion(s)) }
catch { case _: Exception => None } // failed conversions (including MatchError) mean "no match"
}
val intEx = new StringExtractor[Int] {
def conversion(s: String) = s.toInt
}
val pctEx = new StringExtractor[Double] {
val pct = """(\d*\.?\d*)%""".r
def conversion(s: String) = s match { case pct(x) => x.toDouble / 100 }
}
and use:
"Hello 69 13.5%".split(" ").map {
case intEx(x) => println(x + " is Int: " + x.isInstanceOf[Int])
case pctEx(x) => println(x + " is Double: " + x.isInstanceOf[Double])
case str => println(str)
}
prints
Hello
69 is Int: true
0.135 is Double: true
Of course, you can make the extractors match on anything you want (currency mnemonic, name beginning with 'J', URL) and return whatever type you want. You're not limited to matching Strings either, if instead of StringExtractor[A] you make it Extractor[A, B].
You can read the line as a whole, split it using spaces and then convert each element (or the one you want) to ints:
scala> "1 727".split(" ").map( _.toInt )
res1: Array[Int] = Array(1, 727)
For more complex inputs, you can have a look at parser combinators.
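The split-and-convert approach throws a NumberFormatException as soon as one token is not an integer. If that matters, the conversion can be wrapped in Try, as in this sketch (ReadInts and parseInts are hypothetical names):

```scala
import scala.util.Try

// Hypothetical helper; None signals that some token was not a valid Int.
object ReadInts {
  def parseInts(line: String): Option[List[Int]] =
    Try(line.trim.split("\\s+").map(_.toInt).toList).toOption
}
```

A caller can then pattern match on the result, for example case Some(List(x, y)) => ..., to require exactly two numbers.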
The input you are describing is not two Ints but a String which just happens to be two Ints. Hence you need to read the String, split by the space and convert the individual Strings into Ints as suggested by @paradigmatic.
One way would be splitting and mapping:
// Assuming whatever is being read is assigned to "input"
val input = "1 727"
val Array(x, y) = input split " " map (_.toInt)
Or, if you have things a bit more complicated than that, a regular expression is usually good enough.
val twoInts = """^\s*(\d+)\s*(\d+)""".r
val Some((x, y)) = for (twoInts(a, b) <- twoInts findFirstIn input) yield (a, b)
There are other ways to use regex. See the Scala API docs about them.
Anyway, if regex patterns are becoming too complicated, then you should turn to Scala Parser Combinators. Since you can combine both, you don't lose any of regex's power.
import scala.util.parsing.combinator._
object MyParser extends JavaTokenParsers {
def twoInts = wholeNumber ~ wholeNumber ^^ { case a ~ b => (a.toInt, b.toInt) }
}
val MyParser.Success((x, y), _) = MyParser.parse(MyParser.twoInts, input)
The first example is simpler, but harder to adapt to more complex patterns and more vulnerable to invalid input.
I find that extractors provide some machinery that makes this type of processing nicer, and I think it works well up to a certain point.
object Tokens {
def unapplySeq(line: String): Option[Seq[String]] =
Some(line.split("\\s+").toSeq)
}
class RegexToken[T](pattern: String, convert: (String) => T) {
val pat = pattern.r
def unapply(token: String): Option[T] = token match {
case pat(s) => Some(convert(s))
case _ => None
}
}
object IntInput extends RegexToken[Int]("^([0-9]+)$", _.toInt)
object Word extends RegexToken[String]("^([A-Za-z]+)$", identity)
object Percent extends RegexToken[Double](
"""^([0-9]+\.?[0-9]*)%$""", _.toDouble / 100)
Now how to use:
List("1 727", "uptime 365 99.999%") collect {
case Tokens(IntInput(x), IntInput(y)) => "sum " + (x + y)
case Tokens(Word(w), IntInput(i), Percent(p)) => w + " " + (i * p)
}
// List[java.lang.String] = List(sum 728, uptime 364.99634999999995)
To use for reading lines at the console:
Iterator.continually(readLine("prompt> ")).collect{
case Tokens(IntInput(x), IntInput(y)) => "sum " + (x + y)
case Tokens(Word(w), IntInput(i), Percent(p)) => w + " " + (i * p)
case Tokens(Word("done")) => "done"
}.takeWhile(_ != "done").foreach(println)
// type any input and enter, type "done" and enter to finish
The nice thing about extractors and pattern matching is that you can add case clauses as necessary, and you can use Tokens(a, b, _*) to ignore some tokens. I think they combine together nicely (for instance with literals, as I did with done).
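As a small illustration of Tokens(a, b, _*) and of matching a literal token, here is a self-contained sketch. Tokens is redefined so the block stands alone (with an added trim), and FirstTwo and describe are hypothetical names:

```scala
// Same idea as the Tokens extractor above, repeated so the example is self-contained.
object Tokens {
  def unapplySeq(line: String): Option[Seq[String]] =
    Some(line.trim.split("\\s+").toSeq)
}

// Hypothetical usage: keep the first two tokens, ignore any others.
object FirstTwo {
  def describe(line: String): String = line match {
    case Tokens("done")   => "finished"                // exactly one token, the literal "done"
    case Tokens(a, b, _*) => s"first=$a second=$b"     // two or more tokens; the rest are ignored
    case _                => "too few tokens"
  }
}
```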