Hashing a tuple not matching the expected value in Python

I am trying to solve the below problem:
Given an integer, n , and n space-separated integers as input, create a tuple, t , of those n integers. Then compute and print the result of hash(t).
I am using python 3.
My code is
if __name__ == '__main__':
    n = int(input())
    integer_list = map(int, input().split())
    t = tuple(integer_list)
    print(hash(t))
The expected output is 3713081631934410656 but I am getting -3550055125485641917. I think my code is correct. Why am I getting a different output?
With PyPy3 I get the expected output 3713081631934410656, but not with Python 3.

Python doesn't promise that tuple hashing will produce any particular output. There is no such thing as the "correct" output for hash(some_tuple). The tuple hash implementation is free to change, and it has changed in Python 3.8.
Your assignment was likely written for a different Python version than the one you're testing on, without consideration of the fact that the tuple hash algorithm is an implementation detail.
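As a minimal sketch of what can and cannot be relied on: the snippet below prints the interpreter name and version next to the hash, so results from different interpreters can be compared side by side. The tuple (1, 2) is only an illustrative input, not the one from the assignment, and no particular hash value is assumed.

```python
import sys

t = (1, 2)  # illustrative input, not the assignment's data
h = hash(t)

# What Python does guarantee: equal tuples hash equally within one process.
same_within_process = hash((1, 2)) == h

# The numeric value itself may differ between interpreters and versions.
print(sys.implementation.name, sys.version_info[:2], h)
```

Running this under CPython and under PyPy shows which interpreter produced which value.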


How to use conditions while operating on dataframes in julia

I am trying to find the mean of the values in one column of a dataframe over the rows where either of two conditions is true. For example:
Using Statistics
df = DataFrame(value, xi, xj)
resulted_mean = []
for i in range(ncol(df))
push!(resulted_mean, mean(df[:value], (:xi == i | :xj == i)))
Here, I am checking when either xi or xj is equal to i then find the mean of the all the corresponding values stored in [:value] column. This mean will later be pushed to the array -> resulted_mean
However, this code is not producing the desired output.
Please suggest the optimal approach to fix this code snippet.
Thanks in advance.
I agree with Bogumił's comment, you should really consult the Julia documentation to get a basic understanding of the language, and then run through the DataFrames tutorials. I will however annotate your code to point out some of the issues so you might be able to target your learning a bit better:
Using Statistics
Julia (like most other languages) is case sensitive, so writing Using is not the same as the reserved keyword using, which is used to bring package definitions into your namespace. The relevant docs entry is here
Note also that you are using the DataFrames package, so to make your code reproducible you would have had to do using DataFrames, Statistics.
df = DataFrame(value, xi, xj)
It's unclear what this line is supposed to do as the arguments passed to the constructor are undefined, but assuming value, xi and xj are vectors of numbers, this isn't a correct way to construct a DataFrame:
julia> value = rand(10); xi = repeat(1:2, 5); xj = rand(1:2, 10);
julia> df = DataFrame(value, xi, xj)
ERROR: MethodError: no method matching DataFrame(::Vector{Float64}, ::Vector{Int64}, ::Vector{Int64})
You can read about constructors in the docs here. The most common approach for a DataFrame with only a few columns, as here, would probably be:
julia> df = DataFrame(value = value, xi = xi, xj = xj)
10×3 DataFrame
 Row │ value     xi     xj
     │ Float64   Int64  Int64
─────┼────────────────────────
   1 │ 0.539533      1      2
   2 │ 0.652752      2      1
   3 │ 0.481461      1      2
...
Then you have
resulted_mean = []
I would say in this case the overall approach of preallocating a vector and pushing to it in a loop isn't ideal as it adds a lot of verbosity for no reason (see below), but as a general remark you should avoid untyped arrays in Julia:
julia> resulted_mean = []
Any[]
Here the Any means that the array can hold values of any type (floating point numbers, integers, strings, probability distributions...), which means the compiler cannot anticipate what the actual content will be from looking at the code, leading to suboptimal machine code being generated. In doing so, you negate the main advantage that Julia has over e.g. base Python: the rich type system combined with a lot of compiler optimizations allow generation of highly efficient machine code while keeping the language dynamic. In this case, you know that you want to push the results of the mean function to the results vector, which will be a floating point number, so you should use:
julia> resulted_mean = Float64[]
Float64[]
That said, I wouldn't recommend pushing in a loop here at all (see below).
Your loop is:
for i in range(ncol(df))
...
A few issues with this:
Loops in Julia require an end, unlike in Python where their end is determined based on code indentation
range is a different function in Julia than in Python:
julia> range(5)
ERROR: ArgumentError: At least one of `length` or `stop` must be specified
You can learn about functions using the REPL help mode (type ? at the REPL prompt to access it):
help?> range
search: range LinRange UnitRange StepRange StepRangeLen trailing_zeros AbstractRange trailing_ones OrdinalRange AbstractUnitRange AbstractString
range(start[, stop]; length, stop, step=1)
Given a starting value, construct a range either by length or from start to stop, optionally with a given step (defaults to 1, a UnitRange). One of length or stop is required. If length, stop, and step are all specified, they must agree.
...
So you'd need to do something like
julia> range(1, 5, step = 1)
1:1:5
That said, for simple ranges like this you can use the colon operator: 1:5 is the same as range(1, 5, step = 1).
You then iterate over integers from 1 to ncol(df) - you might want to check whether this is what you're actually after, as it seems unusual to me that the values in the xi and xj columns (on which you filter in the loop) would be related to the number of columns in your DataFrame (which is 3).
In the loop, you do
push!(resulted_mean, mean(df[:value], (:xi == i | :xj == i)))
which again has a few problems: first of all you are passing the subsetting condition for your DataFrame to the mean function, which doesn't work:
julia> mean(rand(10), rand(Bool, 10))
ERROR: MethodError: objects of type Vector{Float64} are not callable
The subsetting condition itself has two issues as well: when you write :xi, there is no way for Julia to know that you are referring to the DataFrame column xi, so all you're doing is comparing the Symbol :xi to the value of i, which will always return false:
julia> :xi == 2
false
Furthermore, note that | has a higher precedence than ==, so if you want to combine two equality checks with or you need brackets:
julia> 1 == 1 | 2 == 2
false
julia> (1 == 1) | (2 == 2)
true
More things could be said about your code snippet, but I hope this gives you an idea of where your gaps in understanding are and how you might go about closing them.
For completeness, here's how I would approach your problem - I'm interpreting your code to mean "calculate the mean of the value column, grouped by each value of xi and xj, but only where xi equals xj":
julia> combine(groupby(df[df.xi .== df.xj, :], [:xi, :xj], sort = true), :value => mean => :resulted_mean)
2×3 DataFrame
 Row │ xi     xj     resulted_mean
     │ Int64  Int64  Float64
─────┼─────────────────────────────
   1 │     1      1       0.356811
   2 │     2      2       0.977041
This is probably the most common analysis pattern for DataFrames, and is explained in the tutorial that Bogumił mentioned as well as in the DataFrames docs here.
As I said up front, if you want to use Julia productively, I recommend that you spend some time reading the documentation both for the language itself and for any of the key packages you're using. While Julia has some similarities to Python, and some bits of the DataFrames package have an API that resembles things you might have seen in R, it is a language in its own right that is fundamentally different from both Python and R (or any other language for that matter), and there's no way around familiarizing yourself with how it actually works.

On a dataset made up of dictionaries, how do I multiply the elements of each dictionary with Python

I started coding in Python 4 days ago, so I'm a complete newbie. I have a dataset that comprises an undefined number of dictionaries. Each dictionary is the x and y of a point in the coordinates.
I'm trying to compute the summatory of xy by nesting the loop that multiplies xy within the loop that sums the products.
However I haven't been able to figure out how to multiply the values for the two keys in each dictionary (so far I only got to multiply all the x*y)
So far I've got this:
If my data set were to be d= [{'x':0, 'y':0}, {'x':1, 'y':1}, {'x':2, 'y':3}]
I've got the code for the function that calculates the product of each pair of x and y:
def product_xy (product_x_per_y):
    prod_xy =[]
    n = 0
    for i in range (len(d)):
        result = d[n]['x']*d[n]['y']
        prod_xy.append(result)
        n+1
    return prod_xy
I also have the function to add up the elements of a list (like prod_xy):
def total_xy_prod (sum_prod):
    all = 0
    for s in sum_prod:
        all+= s
    return all
I've been trying to find a way to nest these two functions so that I can iterate through the multiplication of each x*y and then add up all the products.
Make sure your code works as expected
First, your functions have a few mistakes. For example, in product_xy, you assign n=0, and later do n + 1; you probably meant to do n += 1 instead of n + 1. But n is also completely unnecessary; you can simply use the i from the range iteration to replace n like so: result = d[i]['x']*d[i]['y']
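A sketch of the loop version with those fixes applied might look like the following. The parameter name points is an assumption introduced here so that the function works on its argument rather than on the global d; the sample data is the d from the question.

```python
def product_xy(points):
    """Return a list with the product x*y for each point in `points`."""
    prod_xy = []
    for i in range(len(points)):
        # Use the loop variable i directly; no separate counter is needed.
        result = points[i]['x'] * points[i]['y']
        prod_xy.append(result)
    return prod_xy


d = [{'x': 0, 'y': 0}, {'x': 1, 'y': 1}, {'x': 2, 'y': 3}]
products = product_xy(d)
print(products)
```

With the original n + 1 line, n never changes, so every iteration multiplies the first point's values.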
Nesting these two functions: part 1
To answer your question, it's fairly straightforward to get the sum of the products of the elements from your current code:
coord_sum = total_xy_prod(product_xy(d))
Nesting these two functions: part 2
However, there is a much shorter and more efficient way to tackle this problem. For one, Python provides the built-in function sum() to sum the elements of a list (and other iterables), so there's no need to create total_xy_prod. Our code could at this point read as follows:
coord_sum = sum(product_xy(d))
But product_xy is also unnecessarily long and inefficient, and we could also replace it entirely with a shorter expression. In this case, the shortening comes from generator expressions, which are basically compact for-loops. The Python docs give some of the basic details of how the syntax works at list comprehensions, which are distinct, but closely related to generator expressions. For the purposes of answering this question, I will simply present the final, most simplified form of your desired result:
coord_sum = sum(e['x'] * e['y'] for e in d)
Here, the generator expression iterates through every element in d (using for e in d) and multiplies the numbers stored under the dictionary keys 'x' and 'y' of each element (using e['x'] * e['y']); sum() then adds up those products over the entire sequence.
There is also some documentation on generator expressions, but it's a bit technical, so it's probably not approachable for the Python beginner.

How to correctly use enumerate with two inputs and three expected outputs in python spark

I've been trying to replicate the code in http://www.data-intuitive.com/2015/01/transposing-a-spark-rdd/ to transpose an RDD in pyspark. I am able to load my RDD correctly and apply the zipWithIndex method to it as follows:
m1.rdd.zipWithIndex().collect()
[(Row(c1_1=1, c1_2=2, c1_3=3), 0),
(Row(c1_1=4, c1_2=5, c1_3=6), 1),
(Row(c1_1=7, c1_2=8, c1_3=9), 2)]
But when I try to apply a flatMap to it with a lambda that enumerates each row, either the syntax is invalid:
m1.rdd.zipWithIndex().flatMap(lambda (x,i): [(i,j,e) for (j,e) in enumerate(x)]).take(1)
Or, the positional argument i appears as missing:
m1.rdd.zipWithIndex().flatMap(lambda x,i: [(i,j,e) for (j,e) in enumerate(x)]).take(1)
When I run the lambda in python, it needs an extra index parameter to catch the function.
aa = m1.rdd.zipWithIndex().collect()
g = lambda x,i: [(i,j,e) for (j,e) in enumerate(x)]
g(aa,3) #extra parameter
Which seems to me unnecessary as the index has been calculated previously.
I'm quite an amateur in python and spark and I would like to know what is the issue with the indexes and why neither spark nor python are catching them. Thank you.
First let's take a look at the signature of RDD.flatMap (preservesPartitioning parameter removed for clarity):
flatMap(self: RDD[T], f: Callable[[T], Iterable[U]]) -> RDD[U]: ...
As you can see, flatMap expects a unary function.
Going back to your code:
lambda x, i: ... is a binary function, so clearly it won't work.
lambda (x, i): ... used to be the syntax for a unary function with tuple argument unpacking. It used structural matching to destructure (unpack in Python nomenclature) a single input argument (here Tuple[Any, Any]). This syntax was brittle and has been removed in Python 3. A correct way to achieve the same result in Python 3 is indexing:
lambda xi: ((xi[1], j, e) for j, e in enumerate(xi[0]))
If you prefer structural matching just use standard function:
def flatten(xsi):
    xs, i = xsi
    for j, x in enumerate(xs):
        yield i, j, x

rdd.flatMap(flatten)
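To check the logic without a Spark cluster, here is a minimal sketch in plain Python. It assumes that a list of (row, index) tuples can stand in for the output of zipWithIndex, and it uses itertools.chain.from_iterable as a local substitute for flatMap; indexed_rows is hypothetical sample data.

```python
from itertools import chain


def flatten(xsi):
    xs, i = xsi                      # unpack the single (row, index) argument
    for j, x in enumerate(xs):
        yield i, j, x                # (row index, column index, element)


# Hypothetical stand-in for rdd.zipWithIndex().collect()
indexed_rows = [((1, 2, 3), 0), ((4, 5, 6), 1)]

# Local substitute for flatMap: apply the function to every element,
# then concatenate the resulting iterables.
triples = list(chain.from_iterable(flatten(pair) for pair in indexed_rows))

# The same result with an indexing lambda instead of a named function.
index_lambda = lambda xi: ((xi[1], j, e) for j, e in enumerate(xi[0]))
triples_from_lambda = list(chain.from_iterable(map(index_lambda, indexed_rows)))

print(triples)
```

Both variants take exactly one argument, which is what flatMap passes to its function.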

Python For Loop - Switching from PHP

I am new to Python and usually use PHP. Can someone please explain the structure of the for loop in Python?
numbers = int(x) for x in numbers
In PHP, everything related to a for loop goes inside the loop body. I can't understand why the method call comes before the for in Python.
First of all the statement is missing brackets:
numbers = [int(x) for x in numbers]
This is called a list comprehension and it is the equivalent of:
result = []
for x in numbers:
    result.append(int(x))
numbers = result
Note that you can also use comprehensions as generator expressions, in which case the [] become ():
numbers = (int(x) for x in numbers)
which is the equivalent of:
def numbers(N):
    for x in N:
        yield int(x)
This means that the for loop only runs as far as one yield at a time, as each value is requested. In other words, while the first example builds a whole list in memory, a generator produces one element at a time as it is iterated. This is great for processing large inputs, where you can generate one element at a time without loading everything into memory (e.g. processing a file line-by-line).
So as you can see comprehensions and generator expressions are a great way to reduce the amount of code required to process lists, tuples and any other iterable.
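The difference between the two forms can be seen in a small sketch. The list raw is made-up sample data and the variable names are chosen only for this illustration.

```python
raw = ["1", "2", "3"]                # made-up sample input

as_list = [int(x) for x in raw]      # list comprehension: converted immediately
as_gen = (int(x) for x in raw)       # generator expression: nothing converted yet

first = next(as_gen)                 # converts only the first element
rest = list(as_gen)                  # consumes the remaining elements
leftover = list(as_gen)              # a generator can only be consumed once

print(as_list, first, rest, leftover)
```

A list can be iterated as often as needed, whereas the generator is exhausted after one pass.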

While loop in Python converting integer to a string?

So I can't seem to get this. Without the 'while' loop this code works fine, but as soon as I apply the loop it stops working right. For some reason it's treating x as a string. For example, if x were 2 it would print y as '2222' instead of 8. I'm still new at this; can someone tell me why? Thanks!
go = 'y'
while go == 'y':
    print('enter x')
    x = input()
    y = x * 4
    print(y)
print('go again?')
go = input()
Python 3's input function always returns a string. This is a change from Python 2, where input returned different kinds of Python objects depending on what was entered by the user. Python 3's version is equivalent to Python 2's raw_input.
With that background in mind, it's easy to fix your code. Just call the int constructor to turn your string into an integer. Or if you want to support non-integer values (like 1.4), use float instead.
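A minimal sketch of the conversion follows. The helper name quadruple is made up for this example, and it takes the text as an argument instead of calling input() so that it can be run without a prompt.

```python
def quadruple(text):
    """Convert the entered text to an integer, then multiply by 4."""
    return int(text) * 4


repeated = "2" * 4            # multiplying a string repeats it
converted = quadruple("2")    # multiplying an int does arithmetic

print(repeated, converted)
```

In the original loop this would correspond to x = int(input()), or x = float(input()) for non-integer values.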
As an aside, as your code is currently formatted in the question, it has an infinite loop. Is your logic to change the go variable really at top level? If so, it won't ever change during the loop, which will run forever.
This actually depends on your Python version. In Python 2, input evaluates what the user types, so it automatically converts the text to an integer if it looks like one. To prevent this, use the raw_input function in Python < 3. In Python 3 and above, input always returns a string, which is what raw_input did.
