Java and Python codes give different output? - rounding

This is the Java code that outputs 897986030:
import java.util.Arrays;
import java.util.Scanner;
class Algorithm {
    public static void main(String args[]) throws Exception {
        int mod = 1000000007;
        long factor = 900414279;
        long p1 = 883069911;
        long p2 = 32;
        long val = 560076994;
        val = (val%mod+factor*p1*p2%mod)%mod;
        System.out.println(val);
    }
}
This is the equivalent Python code that outputs 480330031:
factor = 900414279
p1 = 883069911
p2 = 32
val = 560076994
mod = 1000000007
val = (val%mod+factor*p1*p2%mod)%mod
print(val)
Please help. Thanks!

The answer lies in the fact that you are using primitive types in Java, which are prone to overflow.
Let me explain in case you are not already familiar with this concept. In Java, C, C++ and similar languages, primitive types have a fixed amount of space allocated for them, and a variable cannot use more than that. This means there is a maximum number that the long data type can store (in Java, Long.MAX_VALUE is 2^63 - 1, about 9.22e18). This is done for performance reasons.
That is what happens in the code above: factor * p1 (about 7.95e17) still fits in a long, but multiplying it by p2 = 32 gives about 2.54e19, which is larger than the maximum a long can store. The multiplication overflows: the bits that don't fit are silently discarded and the value wraps around (no exception is thrown), so the result of the expression is wrong.
With Python this is not an issue, because Python's int is arbitrary-precision: it grows as needed, so integer arithmetic never overflows (it just gets slower as the numbers get very large). This is one reason why things like cryptographic applications, where big numbers are used, are easy to write in Python.
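A minimal Java sketch of how the original expression could be kept in range (the class and method names are made up for illustration, and non-negative inputs are assumed, as in the question): reduce modulo mod after every multiplication, so each intermediate product stays below mod * mod (about 1e18) and fits in a long. BigInteger is only used as a cross-check that behaves like Python's int, and Math.multiplyExact shows how to make an overflow throw instead of wrapping silently.

```java
import java.math.BigInteger;

public class ModMulDemo {
    static final long MOD = 1_000_000_007L;

    // Same expression as the question: factor * p1 * p2 overflows a long
    // before the final % is applied, so the value silently wraps around.
    public static long naive(long val, long factor, long p1, long p2) {
        return (val % MOD + factor * p1 * p2 % MOD) % MOD;
    }

    // Reduce after every multiplication. Each operand is then below MOD,
    // so every intermediate product is below MOD * MOD (about 1e18),
    // which fits in a long (max about 9.22e18). Assumes non-negative inputs.
    public static long safe(long val, long factor, long p1, long p2) {
        long prod = (factor % MOD) * (p1 % MOD) % MOD;
        prod = prod * (p2 % MOD) % MOD;
        return (val % MOD + prod) % MOD;
    }

    // Arbitrary-precision cross-check, comparable to Python's int.
    public static long exact(long val, long factor, long p1, long p2) {
        BigInteger prod = BigInteger.valueOf(factor)
                .multiply(BigInteger.valueOf(p1))
                .multiply(BigInteger.valueOf(p2));
        return BigInteger.valueOf(val).add(prod)
                .mod(BigInteger.valueOf(MOD))
                .longValueExact();
    }

    public static void main(String[] args) {
        long factor = 900414279L, p1 = 883069911L, p2 = 32L, val = 560076994L;
        System.out.println(naive(val, factor, p1, p2)); // 897986030 (wrapped)
        System.out.println(safe(val, factor, p1, p2));  // 480330031
        System.out.println(exact(val, factor, p1, p2)); // 480330031
        try {
            // factor * p1 still fits; multiplying by p2 does not, so this throws.
            Math.multiplyExact(factor * p1, p2);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

The stepwise reduction is valid because (a * b) mod m equals ((a mod m) * (b mod m)) mod m, so it gives the same answer as the Python version without ever needing more than 64 bits.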

Related

Is there a better way to convert a stream of code points into a string in Kotlin?

I have a sequence of code points as Sequence<Int>.
I want to get this into a String.
What I currently do is this:
val string = codePoints
.map { codePoint -> String(intArrayOf(codePoint), 0, 1) }
.joinToString("")
But it feels extremely hairy to create a string for each code point just to concatenate them immediately after. Is there a more direct way to do this?
So far the best I was able to do was something like this:
val string2 = codePoints.toList().toIntArray()
.let { codePoints -> String(codePoints, 0, codePoints.size) }
The amount of code isn't really any better, and it has a toList().toIntArray() which I'm not completely fond of. But it at least avoids the packaging of everything into dozens of one-code-point strings, and the logic is still written in the logical order.
You can either go for the simple:
val string = codePoints.joinToString("") { Character.toString(it) }
// or
val string = codePoints.joinToString("", transform = Character::toString)
Or use a string builder:
fun Sequence<Int>.codePointsToString(): String = buildString {
    this@codePointsToString.forEach { cp ->
        appendCodePoint(cp)
    }
}
This second one expresses exactly what you want, and may benefit from future optimizations in the string builder.
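For comparison, the same StringBuilder idea in plain Java, as a small sketch (the class and method names are illustrative only): IntStream.collect feeds each code point to StringBuilder.appendCodePoint, and when the code points are already in an int[], the String(int[], int, int) constructor used in the question does it in one call.

```java
import java.util.stream.IntStream;

public class CodePoints {
    // Appends each code point directly; supplementary code points (e.g. emoji)
    // are written as surrogate pairs by StringBuilder.appendCodePoint.
    public static String fromCodePoints(IntStream codePoints) {
        return codePoints.collect(
                StringBuilder::new,             // supplier
                StringBuilder::appendCodePoint, // accumulator
                StringBuilder::append)          // combiner (only used by parallel streams)
                .toString();
    }

    // When the code points are already in an int[], a single constructor call suffices.
    public static String fromArray(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        int[] cps = {'a', 0x1F603, 0x1F604, 0x1F605};
        System.out.println(fromCodePoints(IntStream.of(cps)));
        System.out.println(fromArray(cps));
    }
}
```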
it feels extremely hairy to create a string for each code point just to concatenate them immediately after
Did you really measure a performance issue with the extra string objects created here? Using toList() would also create a bunch of object arrays behind the scenes (one for each resize), which is a bit less, but not tremendously better. And as you pointed out toIntArray on top of that is yet another array creation.
Unless you know the number of elements in the sequence up front, I don't believe there is much you can do about that (the string builder approach will also likely use a resizable array behind the scenes, but at least you don't need extra array copies).
val result = codePoints.map { Character.toString(it) }.joinToString("")
Edit, based on Joffrey's comment below:
val result = codePoints.joinToString("") { Character.toString(it) }
Additional edit, full example:
val codePoints: Sequence<Int> = sequenceOf(
'a'.code,
Character.toCodePoint(0xD83D.toChar(), 0xDE03.toChar()),
Character.toCodePoint(0xD83D.toChar(), 0xDE04.toChar()),
Character.toCodePoint(0xD83D.toChar(), 0xDE05.toChar())
)
val result = codePoints.joinToString("") { Character.toString(it) }
println(result)
This will print: a😃😄😅

Is StringBuffer the Kotlin way to handle multiple string concatenation like in java?

What would be the kotlin way to handle multiple string concatenation?
Edit: here is the piece of code that led me to this question:
import kotlin.random.Random

fun getNCharsFromRange(n: Int, range: CharRange): String {
    val chars = range.toList()
    val buffer = StringBuffer()
    while (buffer.length < n) {
        // the upper bound of nextInt is exclusive, so use size (not lastIndex) to reach every char
        val randomInt = Random.Default.nextInt(0, chars.size)
        val newchar = chars[randomInt]
        val lastChar = buffer.lastOrNull() ?: ""
        if (newchar != lastChar) {
            buffer.append(newchar)
        }
    }
    return buffer.toString()
}
A StringBuilder is the standard way to construct a String in Kotlin, as in Java.
(Unless it can be done in one line, of course, where a string template is usually better than Java-style concatenation.)
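For comparison, here is roughly what the question's loop looks like in Java with a StringBuilder. This is a sketch rather than a drop-in port: it takes the range as two chars plus an explicit java.util.Random (both assumptions, so the result can be seeded), and like the original it only prevents consecutive repeats.

```java
import java.util.Random;

public class RandomChars {
    // Builds n characters drawn from [first, last], never repeating the
    // immediately preceding character (non-consecutive repeats are allowed).
    // Requires at least two characters; with one, the loop could never finish.
    public static String getNCharsFromRange(int n, char first, char last, Random random) {
        if (last <= first) throw new IllegalArgumentException("need at least two characters");
        StringBuilder sb = new StringBuilder(n); // final length is known, so pre-size
        while (sb.length() < n) {
            // nextInt's bound is exclusive, so +1 makes `last` reachable
            char c = (char) (first + random.nextInt(last - first + 1));
            if (sb.length() == 0 || sb.charAt(sb.length() - 1) != c) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(getNCharsFromRange(10, 'a', 'e', new Random(42)));
    }
}
```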
Kotlin has one improvement, though: you can use buildString to handle that implicitly, which can make the code a little more concise.  For example, your code can be written as:
import kotlin.random.Random

fun getNCharsFromRange(n: Int, range: CharRange): String {
    val chars = range.toList()
    return buildString {
        while (length < n) {
            // the upper bound of nextInt is exclusive, so use size (not lastIndex)
            val randomInt = Random.Default.nextInt(0, chars.size)
            val newChar = chars[randomInt]
            val lastChar = lastOrNull() ?: ""
            if (newChar != lastChar)
                append(newChar)
        }
    }
}
That has no mention of buffer: instead, buildString creates a StringBuilder, makes it available as this, and then returns the resulting String.  (So length, lastOrNull(), and append refer to the StringBuilder.)
For short code, this can be significantly more concise and clearer; though the benefits are much less clear with longer code.  (Your code may be in the grey area between…)
Worth pointing out that the function name is misleading: it avoids repeated characters, but allows duplicates that are not consecutive.  If that's deliberate, then it would be worth making clear in the function name (and/or its doc comment).  Alternatively, if the intent is to avoid all duplicates, then there's an approach which is much simpler and/or more efficient: to shuffle the range (or at least part of it).
Using existing library functions, and making it an extension function on CharRange, the whole thing could be as simple as:
fun CharRange.randomChars(n: Int) = shuffled().take(n).joinToString("")
That shuffles the whole list, even if only a few characters are needed.  So it would be even more efficient to shuffle just the part needed.  But there's no library function for that, so you'd have to write that manually.  I'll leave it as an exercise!
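One way to do that exercise, sketched in Java rather than Kotlin (names are illustrative): a partial Fisher-Yates shuffle swaps each of the first n positions with a randomly chosen element from the part not yet chosen, then stops. Filling the pool is still proportional to the size of the range, but the random work is only proportional to n.

```java
import java.util.Random;

public class PartialShuffle {
    // Returns n distinct characters from [first, last] in random order,
    // shuffling only the first n positions of the pool.
    public static String randomChars(char first, char last, int n, Random random) {
        int size = last - first + 1;
        if (n < 0 || n > size) throw new IllegalArgumentException("n out of range");
        char[] pool = new char[size];
        for (int i = 0; i < size; i++) pool[i] = (char) (first + i);
        for (int i = 0; i < n; i++) {
            // pick uniformly from the not-yet-chosen tail pool[i..size-1]
            int j = i + random.nextInt(size - i);
            char tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return new String(pool, 0, n); // only the shuffled prefix is used
    }

    public static void main(String[] args) {
        System.out.println(randomChars('a', 'z', 5, new Random()));
    }
}
```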

Scala Stream tail laziness and synchronization

In one of his videos (concerning Scala's lazy evaluation, namely lazy keyword), Martin Odersky shows the following implementation of cons operation used to construct a Stream:
def cons[T](hd: T, tl: => Stream[T]) = new Stream[T] {
  def head = hd
  lazy val tail = tl
  ...
}
So the tail operation is written concisely using the lazy evaluation feature of the language.
But in reality (in Scala 2.11.7), the implementation of tail is a bit less elegant:
@volatile private[this] var tlVal: Stream[A] = _
@volatile private[this] var tlGen = tl _
def tailDefined: Boolean = tlGen eq null
override def tail: Stream[A] = {
  if (!tailDefined)
    synchronized {
      if (!tailDefined) {
        tlVal = tlGen()
        tlGen = null
      }
    }
  tlVal
}
Double-checked locking and two volatile fields: that's roughly how you would implement a thread-safe lazy computation in Java.
So the questions are:
Doesn't Scala's lazy keyword provide an 'evaluated at most once' guarantee in the multi-threaded case?
Is the pattern used in the real tail implementation an idiomatic way to do a thread-safe lazy evaluation in Scala?
Doesn't Scala's lazy keyword provide an 'evaluated at most once' guarantee in the multi-threaded case?
Yes, it does, as others have stated.
Is the pattern used in the real tail implementation an idiomatic way to do a thread-safe lazy evaluation in Scala?
Edit:
I think I have the actual answer as to why lazy val is not used. Stream has public-facing API methods such as hasDefiniteSize, inherited from TraversableOnce. In order to know whether a Stream has a finite size or not, we need a way of checking without materializing the underlying Stream tail. Since lazy val doesn't expose its underlying "initialized" bit, we can't do that.
This is backed by SI-1220
To strengthen this point, @Jasper-M points out that the new LazyList API in strawman (the Scala 2.13 collections makeover) no longer has this issue, since the entire collection hierarchy has been reworked and there are no longer such concerns.
Performance related concerns
I would say "it depends" on which angle you're looking at this problem from. From a LOB (line-of-business) point of view, I'd say definitely go with lazy val for conciseness and clarity of implementation. But if you look at it from the point of view of a Scala collections library author, things start to look different. Think of it this way: you're creating a library which will potentially be used by many people and run on many machines across the world. This means that you should be thinking about the memory overhead of each structure, especially if you're creating such an essential data structure yourself.
I say this because when you use lazy val, by design you generate an additional Boolean field which flags whether the value has been initialized, and I assume this is what the library authors were aiming to avoid. The size of a Boolean on the JVM is of course VM-dependent, but even a byte is something to consider, especially when people are generating large Streams of data. Again, this is definitely not something I would usually consider; it is a micro-optimization of memory usage.
The reason I think performance is one of the key points here is SI-7266, which fixes a memory leak in Stream. Note how important it is to track the bytecode to make sure no extra values are retained inside the generated class.
The difference in the implementation is that the definition of tail being initialized or not is a method implementation which checks the generator:
def tailDefined: Boolean = tlGen eq null
Instead of a field on the class.
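To make the comparison concrete, here is a minimal Java sketch of the same pattern as the tlVal/tlGen code in the question (LazyRef is a made-up name, not a library class). The generator field doubles as the "not yet evaluated" flag, so no separate Boolean field is needed, and clearing it after use also lets the captured closure be garbage-collected.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public final class LazyRef<T> {
    private volatile T value;
    private volatile Supplier<? extends T> generator; // null once evaluated

    public LazyRef(Supplier<? extends T> generator) {
        this.generator = generator;
    }

    // Analogue of tailDefined: no extra flag, just a null check on the generator.
    public boolean isDefined() {
        return generator == null;
    }

    public T get() {
        if (!isDefined()) {                 // first check, without the lock
            synchronized (this) {
                if (!isDefined()) {         // second check, under the lock
                    value = generator.get();
                    generator = null;       // volatile write publishes `value`
                }
            }
        }
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        LazyRef<Integer> ref = new LazyRef<>(() -> {
            calls.incrementAndGet();
            return 42;
        });
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(ref::get);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(ref.get() + " computed " + calls.get() + " time(s)"); // 42 computed 1 time(s)
    }
}
```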
Scala lazy values are evaluated only once in multi-threaded cases. This is because the evaluation of lazy members is actually wrapped in a synchronized block in the generated code.
Let's take a look at a simple class:
class LazyTest {
lazy val x = 5
}
Now, let's compile this with scalac:
scalac -Xprint:all LazyTest.scala
This will result in,
package <empty> {
class LazyTest extends Object {
final <synthetic> lazy private[this] var x: Int = _;
@volatile private[this] var bitmap$0: Boolean = _;
private def x$lzycompute(): Int = {
LazyTest.this.synchronized(if (LazyTest.this.bitmap$0.unary_!())
{
LazyTest.this.x = (5: Int);
LazyTest.this.bitmap$0 = true
});
LazyTest.this.x
};
<stable> <accessor> lazy def x(): Int = if (LazyTest.this.bitmap$0.unary_!())
LazyTest.this.x$lzycompute()
else
LazyTest.this.x;
def <init>(): LazyTest = {
LazyTest.super.<init>();
()
}
}
}
From this you should be able to see that the lazy evaluation is thread-safe. You will also see some similarity to the "less elegant" implementation in Scala 2.11.7.
You can also experiment with tests similar to the following:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
case class A(i: Int) {
  lazy val j = {
    println("calculating j")
    i + 1
  }
}
def checkLazyInMultiThread(): Unit = {
  val a = A(6)
  val futuresList = Range(1, 20).toList.map(i => Future {
    println(s"Future $i :: ${a.j}")
  })
  Future.sequence(futuresList).onComplete(_ => println("completed"))
}
checkLazyInMultiThread()
Now, the implementation in standard library avoids using lazy because they are able to provide a more efficient solution than this generic lazy translation.
You are correct, lazy vals use locking precisely to guard against double evaluation when accessed at the same time by two threads. Future developments, furthermore, will give the same guarantees without locking.
What is idiomatic, in my humble opinion, is a highly debatable subject when it comes to a language that, by design, allows for a wide range of different idioms to be adopted. In general, however, application code tends to be considered idiomatic when going more into the direction of pure functional programming, as it gives a series of interesting advantages in terms of ease of testing and reasoning that would make sense to give up only in case of serious concerns. This concern can be one of performance, which is why the current implementation of the Scala Collection API, while exposing in most cases a functional interface, makes heavy use (internally and in restricted scopes) of vars, while loops and established patterns from imperative programming (as the one you highlighted in your question).

Convert Rcpp Armadillo matrix to double*

In RcppArmadillo, I need to know how to convert an arma::mat to a C-style array double * for use in other functions.
When I run the following functions, the computer crashes:
R part:
nn3 <- function(x){
results=.Call("KNNCV", PACKAGE = "KODAMA", x)
results
}
C++ part:
double KNNCV(arma::mat x) {
    double *cvpred = x.memptr();
    return cvpred[1];
}
and at the end, I try:
nn3(as.matrix(iris[,-5]))
Can you help me to find the errors, please?
First, there is no such thing as a vector stored in a double*. You can cast to a C-style pointer to double, but without length information that does not buy you much.
By convention, most similar C++ classes give you a .begin() iterator to the beginning of the memory block (which Armadillo happens to guarantee to be contiguous, just like std::vector) so you can try that.
Other than that the (very fine indeed) Armadillo documentation tells you about memptr() which is probably what you want here. Straight copy from the example there:
mat A = randu<mat>(5,5);
const mat B = randu<mat>(5,5);
double* A_mem = A.memptr();
const double* B_mem = B.memptr();

Best way to build object from delimited string (hopefully not looped case)

This question feels like it would have been asked already, but I haven't found anything, so here goes...
I have a constructor which is handed a delimited string. From that string I need to populate an object's instance variables. I can easily split the string by the delimiter to give me an array of strings. I know I can simply iterate through the array and set my instance variables using ifs or a switch/case statement based on the current array index; however, that just feels a bit nasty. Pseudo code:
String[] tokens = <from generic string tokenizer>;
for (int i = 0; i < tokens.length; i++) {
    switch (i) {
        case 0: instanceVariableA = tokens[i]; break; // without break, cases fall through
        case 1: instanceVariableB = tokens[i]; break;
        ...
    }
}
Does anyone have any ideas of how I do this better/nicer?
For what it's worth, I'm working in Java, but I guess this is language-independent.
Uhm... "nasty" is in the way the constructor handles the parameters. If you can't change that then your code snippet is as good as it may be.
You could get rid of the for loop, though...
instanceVariableA = tokens[0];
instanceVariableB = tokens[1];
and then introduce constants (for readability):
instanceVariableA = tokens[VARIABLE_A_INDEX];
instanceVariableB = tokens[VARIABLE_B_INDEX];
NOTE: if you could change the string parameter syntax you could introduce a simple parser and, with a little bit of reflection, handle this thing in a slightly more elegant way:
String inputString = "instanceVariableA=some_stuff|instanceVariableB=some other stuff";
String[] tokens = inputString.split("\\|"); // split() takes a regex, so '|' must be escaped
for (String token : tokens)
{
    String[] elements = token.split("=");
    String propertyName = elements[0];
    String propertyValue = elements[1];
    invokeSetter(this, propertyName, propertyValue); // TODO write method
}
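If you do go that way, one possible shape for the invokeSetter TODO is sketched below in Java. Everything here (the Target class, its setters, and the setXxx naming rule) is an assumption for illustration: it looks up a public setXxx(String) method by reflection and calls it, and it assumes every token contains an '='.

```java
import java.lang.reflect.Method;

public class SetterParsing {
    // Hypothetical target class; its property names match the example input.
    public static class Target {
        private String instanceVariableA;
        private String instanceVariableB;
        public void setInstanceVariableA(String v) { instanceVariableA = v; }
        public void setInstanceVariableB(String v) { instanceVariableB = v; }
        public String getInstanceVariableA() { return instanceVariableA; }
        public String getInstanceVariableB() { return instanceVariableB; }
    }

    // One possible invokeSetter: finds a public "setXxx(String)" method by name
    // and calls it; throws NoSuchMethodException if there is no such setter.
    static void invokeSetter(Object target, String propertyName, String value)
            throws ReflectiveOperationException {
        String setterName = "set" + Character.toUpperCase(propertyName.charAt(0))
                + propertyName.substring(1);
        Method setter = target.getClass().getMethod(setterName, String.class);
        setter.invoke(target, value);
    }

    public static Target parse(String input) throws ReflectiveOperationException {
        Target t = new Target();
        for (String token : input.split("\\|")) {   // '|' escaped: split takes a regex
            String[] elements = token.split("=", 2); // limit 2: values may contain '='
            invokeSetter(t, elements[0], elements[1]);
        }
        return t;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Target t = parse("instanceVariableA=some_stuff|instanceVariableB=some other stuff");
        System.out.println(t.getInstanceVariableA()); // some_stuff
        System.out.println(t.getInstanceVariableB()); // some other stuff
    }
}
```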
Could you not use a "for-each" loop to eliminate much of the clutter?
I really think the way you are doing it is fine, and Manrico makes a good suggestion about using constants as well.
Another method would be to create a HashMap with integer keys and string values where the key is the index and the value is the name of the property. You could then use a simple loop and some reflection to set the properties. The reflection part might make this a bit slow, but in another language (say, PHP for example) this would be much cleaner.
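A variation on that map idea, sketched in Java with made-up names: since the keys are just 0, 1, 2, ..., a List can play the role of the map, and storing setter lambdas instead of property names removes the need for reflection altogether. That is a swap from the reflection approach described above; the upside is that a typo in a field name becomes a compile error rather than a runtime one.

```java
import java.util.List;
import java.util.function.BiConsumer;

public class PositionalParsing {
    // Hypothetical target class with two string properties.
    public static class Target {
        public String a;
        public String b;
    }

    // Token i is handled by SETTERS.get(i); no reflection involved.
    private static final List<BiConsumer<Target, String>> SETTERS = List.of(
            (t, v) -> t.a = v,  // token 0
            (t, v) -> t.b = v); // token 1

    public static Target parse(String line, String delimiterRegex) {
        String[] tokens = line.split(delimiterRegex);
        Target target = new Target();
        // Like Python's zip, stop at whichever runs out first.
        int n = Math.min(tokens.length, SETTERS.size());
        for (int i = 0; i < n; i++) {
            SETTERS.get(i).accept(target, tokens[i]);
        }
        return target;
    }

    public static void main(String[] args) {
        Target t = parse("some_stuff|some other stuff", "\\|");
        System.out.println(t.a + " / " + t.b); // some_stuff / some other stuff
    }
}
```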
Just an untested idea:
keep the original tokens array...
String[] tokens = <from generic string tokenizer>;
then create index variables
int instanceVariableA = 0;
int instanceVariableB = 1;
and when you need a value, just use
tokens[instanceVariableA];
hence no more loops, no more VARIABLE_A_INDEX...
Maybe JSON would help?
Python-specific solution:
Let's say params = ["instanceVariableA", "instanceVariableB"]. Then:
self.__dict__.update(dict(zip(params, tokens)))
should work; that's roughly equivalent to
for k, v in zip(params, tokens):
    setattr(self, k, v)
depending on the presence/absence of accessors.
In a non-dynamic language, you could accomplish the same effect building a mapping from strings to references/accessors of some kind.
(Also beware that zip stops when either list runs out.)
