How to select rows and assign them new values with SparkR? - apache-spark

In R, I can do the following:
x <- c(1, 8, 3, 5, 6)
y <- rep("Down",5)
y[x>5] <- "Up"
This results in y being ("Down", "Up", "Down", "Down", "Up").
Now my x sequence is the output of the predict function applied to a fitted linear model. In R, predict returns a vector, whereas in Spark predict returns a DataFrame containing the columns of the test dataset plus the label and prediction columns.
By running
y[x$prediction > .5]
I get the error:
Error in y[x$prediction > 0.5] : invalid subscript type 'S4'
How would I solve this problem?

On selecting rows:
Your approach will not work, since x, as the output of Spark's predict, is a Spark (not an R) dataframe; you should use SparkR's filter function instead. Here is a reproducible example using the iris dataset:
library(SparkR)
sparkR.version()
# "2.2.1"
df <- as.DataFrame(iris)
df
# SparkDataFrame[Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double, Species:string]
nrow(df)
# 150
# Let's keep only the records with Petal_Width > 0.2:
df2 <- filter(df, df$Petal_Width > 0.2)
nrow(df2)
# 116
See also the example in the docs.
On replacing row values:
The standard practice for replacing row values in Spark dataframes is first to create a new column based on the required condition, and then, if needed, to drop the old column. Here is an example where we replace values of Petal_Width greater than 0.2 with 0 in the df defined above:
newDF <- withColumn(df, "new_PetalWidth", ifelse(df$Petal_Width > 0.2, 0, df$Petal_Width))
head(newDF)
# result:
Sepal_Length Sepal_Width Petal_Length Petal_Width Species new_PetalWidth
1 5.1 3.5 1.4 0.2 setosa 0.2
2 4.9 3.0 1.4 0.2 setosa 0.2
3 4.7 3.2 1.3 0.2 setosa 0.2
4 4.6 3.1 1.5 0.2 setosa 0.2
5 5.0 3.6 1.4 0.2 setosa 0.2
6 5.4 3.9 1.7 0.4 setosa 0.0 # <- value changed
# drop the old column:
newDF <- drop(newDF, "Petal_Width")
head(newDF)
# result:
Sepal_Length Sepal_Width Petal_Length Species new_PetalWidth
1 5.1 3.5 1.4 setosa 0.2
2 4.9 3.0 1.4 setosa 0.2
3 4.7 3.2 1.3 setosa 0.2
4 4.6 3.1 1.5 setosa 0.2
5 5.0 3.6 1.4 setosa 0.2
6 5.4 3.9 1.7 setosa 0.0
The method also works across different columns; here is an example of a new column that takes the value 0 or Petal_Width, depending on a condition on Petal_Length:
newDF2 <- withColumn(df, "something_here", ifelse(df$Petal_Length > 1.4, 0, df$Petal_Width))
head(newDF2)
# result:
Sepal_Length Sepal_Width Petal_Length Petal_Width Species something_here
1 5.1 3.5 1.4 0.2 setosa 0.2
2 4.9 3.0 1.4 0.2 setosa 0.2
3 4.7 3.2 1.3 0.2 setosa 0.2
4 4.6 3.1 1.5 0.2 setosa 0.0
5 5.0 3.6 1.4 0.2 setosa 0.2
6 5.4 3.9 1.7 0.4 setosa 0.0

Related

How to merge or group multiple rows that have the same value in one of the columns

I need help with the following problem.
Problem:
I have the dataframe below. Its first row is the header, and its first column (Phi) is a label column. The Phi column cycles through three values, 5.0, 10.0 and 20.0, and each row holds a different set of values.
df_combined
Phi 0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0
5.0 -6.7 5.6 -2.7 -1.0 4.4 -6.4 6.3 -4.2
10.0 -3.8 3.1 -1.5 -0.5 2.5 -3.6 3.6 -2.4
20.0 6.3 -5.3 2.6 0.9 -4.2 6.1 -6.0 4.0
5.0 -1.7 5.6 -6.7 -7.0 1.4 -0.4 3.3 -4.2
10.0 -3.8 3.1 -1.5 -4.5 2.5 -1.6 2.6 -4.4
20.0 6.3 -1.3 2.6 0.9 -4.2 6.1 -7.0 4.0
5.0 -0.7 5.6 -6.7 -7.0 1.4 -0.4 3.3 -4.2
10.0 -3.8 3.1 -6.5 -2.5 6.5 -8.6 4.6 -3.4
20.0 6.3 -1.3 2.6 3.9 -3.2 4.1 -5.0 9.0
Expected output:
I want my dataframe to look like this, with all the rows for 5.0 placed together, and likewise for 10.0 and 20.0. I do not want to aggregate, count or sum these values; I just want the matching rows to sit next to each other.
Phi 0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0
-6.7 5.6 -2.7 -1.0 4.4 -6.4 6.3 -4.2
5.0 -1.7 5.6 -6.7 -7.0 1.4 -0.4 3.3 -4.2
-0.7 5.6 -6.7 -7.0 1.4 -0.4 3.3 -4.2
10.0 -3.8 3.1 -1.5 -4.5 2.5 -1.6 2.6 -4.4
-3.8 3.1 -6.5 -2.5 6.5 -8.6 4.6 -3.4
6.3 -5.3 2.6 0.9 -4.2 6.1 -6.0 4.0
20.0 6.3 -1.3 2.6 0.9 -4.2 6.1 -7.0 4.0
6.3 -1.3 2.6 3.9 -3.2 4.1 -5.0 9.0
I tried the groupby function like this:
df_combined2 = df_combined.groupby(df_combined['Phi'])
But instead of a dataframe, I only got this:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001BE9EC2BDC0>
What should I do?
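Sort the values by Phi, then blank out the repeated values (here df is your df_combined):
# sort the dataframe (mergesort is stable, so rows keep their original order within each Phi value)
df = df.sort_values('Phi', kind='mergesort')
# blank out Phi wherever it repeats the value in the previous row
df['Phi'] = df['Phi'].mask(df['Phi'].eq(df['Phi'].shift(1)), "")
df
OR
# sort the dataframe (mergesort is stable, so rows keep their original order within each Phi value)
df = df.sort_values('Phi', kind='mergesort')
# blank out Phi wherever its difference from the previous value is 0
df['Phi'] = df['Phi'].mask(df['Phi'].diff().eq(0), '')
df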
Phi 0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0
0 5.0 -6.7 5.6 -2.7 -1.0 4.4 -6.4 6.3 -4.2
3 -1.7 5.6 -6.7 -7.0 1.4 -0.4 3.3 -4.2
6 -0.7 5.6 -6.7 -7.0 1.4 -0.4 3.3 -4.2
1 10.0 -3.8 3.1 -1.5 -0.5 2.5 -3.6 3.6 -2.4
4 -3.8 3.1 -1.5 -4.5 2.5 -1.6 2.6 -4.4
7 -3.8 3.1 -6.5 -2.5 6.5 -8.6 4.6 -3.4
2 20.0 6.3 -5.3 2.6 0.9 -4.2 6.1 -6.0 4.0
5 6.3 -1.3 2.6 0.9 -4.2 6.1 -7.0 4.0
8 6.3 -1.3 2.6 3.9 -3.2 4.1 -5.0 9.0
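To make the groupby behaviour concrete and to keep the original Phi column numeric, here is a minimal sketch assuming pandas and a small subset of the question's data (nine rows, three of the angle columns, with the column labels written as strings). The helper name `group_rows_for_display` is hypothetical. Printing `groupby()` only shows the lazy DataFrameGroupBy object, which is what the question saw; iterating over it yields one sub-frame per Phi value. The helper does the sort-and-blank on a copy, because replacing numbers with "" turns Phi into an object column.

```python
import pandas as pd

# Nine rows and three of the angle columns copied from df_combined in the question
# (column labels written as strings for simplicity).
df_combined = pd.DataFrame(
    {
        "Phi": [5.0, 10.0, 20.0, 5.0, 10.0, 20.0, 5.0, 10.0, 20.0],
        "0.0": [-6.7, -3.8, 6.3, -1.7, -3.8, 6.3, -0.7, -3.8, 6.3],
        "10.0": [5.6, 3.1, -5.3, 5.6, 3.1, -1.3, 5.6, 3.1, -1.3],
        "20.0": [-2.7, -1.5, 2.6, -6.7, -1.5, 2.6, -6.7, -6.5, 2.6],
    }
)

# groupby() alone is lazy: printing it only shows the DataFrameGroupBy object.
# Iterating over it yields (key, sub-DataFrame) pairs in sorted key order.
for phi, rows in df_combined.groupby("Phi"):
    print(phi, list(rows.index))


def group_rows_for_display(df, key="Phi"):
    """Hypothetical helper: put equal keys next to each other and blank the repeats."""
    # mergesort is stable, so rows keep their original order within each key;
    # .copy() leaves the caller's frame (and its numeric key column) untouched
    out = df.sort_values(key, kind="mergesort").copy()
    # blank the key on every row that repeats the previous row's key
    out[key] = out[key].mask(out[key].eq(out[key].shift()), "")
    return out


display_df = group_rows_for_display(df_combined)
print(display_df)
```

Keeping df_combined numeric means it can still be used for real aggregations later, while display_df is only meant for viewing or exporting.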

boxplot does not show the plots

Following this tutorial, I used the first few statements to show the distribution of the iris data:
from sklearn.datasets import load_iris
from pandas import DataFrame
import numpy as np
iris = load_iris()
colors = ["blue", "red", "green"]
df = DataFrame(
data=np.c_[iris["data"], iris["target"]], columns=iris["feature_names"] + ["target"]
)
print (df)
df.boxplot(by="target", layout=(2, 2), figsize=(10, 10))
The problem is that no boxplot appears, although df is not empty:
$ python3 pca_iris.py
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
0 5.1 3.5 1.4 0.2 0.0
1 4.9 3.0 1.4 0.2 0.0
2 4.7 3.2 1.3 0.2 0.0
3 4.6 3.1 1.5 0.2 0.0
4 5.0 3.6 1.4 0.2 0.0
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 2.0
146 6.3 2.5 5.0 1.9 2.0
147 6.5 3.0 5.2 2.0 2.0
148 6.2 3.4 5.4 2.3 2.0
149 5.9 3.0 5.1 1.8 2.0
[150 rows x 5 columns]
$
How can I debug this further to find out what is wrong with the boxplot?
I'm using Jupyter Notebook, and the same code shows me the plots there. If you run it as a plain Python script, you have to call
matplotlib.pyplot.show()
to see the plots.
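Here is a minimal sketch of the script with the missing call added. In Jupyter the inline backend renders figures automatically at the end of a cell, which is why the same code works there; a plain script only draws into a figure until `plt.show()` is called. To keep the sketch free of the sklearn dependency, it uses the ten rows printed in the question instead of `load_iris()`. The `savefig` line and the file name `iris_boxplot.png` are hypothetical additions, meant as a debugging aid that also works on a machine without a display.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Ten rows copied from the printed output in the question (rows 0-4 and 145-149),
# used instead of load_iris() so the sketch only needs pandas and matplotlib.
columns = ["sepal length (cm)", "sepal width (cm)",
           "petal length (cm)", "petal width (cm)", "target"]
rows = [
    [5.1, 3.5, 1.4, 0.2, 0.0],
    [4.9, 3.0, 1.4, 0.2, 0.0],
    [4.7, 3.2, 1.3, 0.2, 0.0],
    [4.6, 3.1, 1.5, 0.2, 0.0],
    [5.0, 3.6, 1.4, 0.2, 0.0],
    [6.7, 3.0, 5.2, 2.3, 2.0],
    [6.3, 2.5, 5.0, 1.9, 2.0],
    [6.5, 3.0, 5.2, 2.0, 2.0],
    [6.2, 3.4, 5.4, 2.3, 2.0],
    [5.9, 3.0, 5.1, 1.8, 2.0],
]
df = pd.DataFrame(rows, columns=columns)

# boxplot() only draws onto a matplotlib figure; in a script nothing is displayed yet.
axes = df.boxplot(by="target", layout=(2, 2), figsize=(10, 10))

# Hypothetical debugging aid: write the figure to disk, which works even without a display.
plt.savefig("iris_boxplot.png")

# In a plain script, this call is what actually opens the plot window.
plt.show()
```

If the PNG file contains the four boxplots but no window opens, the plotting code is fine and the issue is the backend or display setup rather than the DataFrame.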

There's an error in my <path> code and it's affecting the rest of my web page. Can anyone spot it? If so, your help would be much appreciated

The SVG file is an outline map of the Highland region of Scotland. The closing tag looks like an error and is shown in red in Brackets. I'm using an SVG of the UK as a reference... If anyone has experience with SVG files and sees where I've gone wrong, your help would be much appreciated.
<svg class="img-fluid" baseprofile="tiny" fill="#008000" stroke="black" version="1.2" viewbox="0 0 1000 1870" width="100%" xmlns="http://www.w3.org/2000/svg">
<path d="M226.8 714.8l0 1.3 2.5 0-1.3 6.4-1.8 3.7-2.7 1.6-3.6 0.4-1.7-1-1.6-4.2-1.4-1-1.4-0.6-3-3.2-1.8-1 5.7-6.7 3-2.4 3.8-0.5 5 3.2 2.1 2.4-1.8 1.6z m-23.6-5.8l-3.1 0.5-3.1-0.1-2.5-0.6 1.5-1.3 3.3-2 1.8-0.4 1.1 1 2.3 0.6 1.2 0.9-2.5 1.4z m56.7-42.7l-3.2 1.3-2.5-1.1-2.2-2.3-0.6-2.5 2.3-1.4 3.3 0.7 2.5 2.5 0.4 2.8z m-5.5-36.7l-0.4 2.7-1.6 5.3-1.4 6.2 0.1 5.4 0.8 1.3 1.3 3 0.5 2.9-1.6 1.3-1.6 0.7-1.5 1.1-1.6 0.6-1.6-1.1 0.3-1.2 0.4-1.3-1-1.5-0.2-3 0.8-11.4 0.7-3 1.5-1.4 2.8-0.4 0-1.2-1.2-1.8-0.8-1.9 1.1-1.1 1-0.1 1.9 1.2 0-1.3-1.3-2.4 0.8 0.2 0.7 0.5 0.6 0.7 0.5 1z m1.4-6.5l0 2.8-1.5-0.8-0.6-1.8 0.2-2.4 1.1-2.3-0.4-0.5-0.2-0.8 0.6-1.1 0.7-0.8 0.7-0.1 0.6 0.9-1.2 6.9z m-33-30.3l0.9 1.9 0.8 1.1 2.3 1.9 0.9 1.2 2.3 4.1 0.4 0.8 4.3 3.7 1.8 2.4 1 1.7 0.4 1.5 0.9 12.5 0.9 5.3 0.2 2.5-0.3 2.7-0.8 4-0.2 2.4-0.5 1.4-2.6 2.5 0.3 0.4 1.6-1.1 1.7-0.6 1.5 0.5 0.8 1.8-0.6 1.7-1 1.5-0.6 1 0.5 3 1.4 1.8 1.5 1.3 0.6 1.3-0.7 1.6-3.3 2-1.3 1.2 8.3-1.1 2.1 0.6 0.5 1.5-1.2 1.6-2.4 1.2 0 1.2 2.6-0.3 2.1-1.1 2-0.2 2.1 2.2 0.9 0.7 1.3 0.1 2.4-0.2 1.5 0.7 0.9 1.6 0.6 1.7 0.6 0.9 1.4 0.2 0.9-0.8 0.6-1.2 0.5-0.7 4 0.2 2.7-2.8 2.7-1 2.8 0 1.6 2.3 7.2 0-1.6 4.4 0 1.1 0.4 2-0.3 1.7-0.8 1.4-1.9 2.7-3.5 3-1.5 0.7-3.8 0.4-0.6 1.2-0.2 3.9-0.9 2.5-2 1.6-2.1 1.2-0.9 0.8-0.8 1.4-5.6 8-1.5 1.6-1.8 1.2-5.1 1-1.4-0.6-0.9-2.3 0.3-3 2.6-3.9-0.3-1.5 0-1.3 0.9-0.8 0.7-2.5 0.6-0.8 1.8-3.4 0.8-1.2 1.1-1 9.3-4.8 0-1.2-2.7 1.5-2.7 0.8-8.2 0.2-0.5-0.5-0.2-2.3-0.2-1-2.2-3.9-1.3-1.5-1.1-0.7 0.4 1.9 0.4 1.4 1.1 2.8-1 2.6-0.6 1.2-1 1 0.7 1.4-1.2 1.4-0.8 2-0.9 1.7-1.4 0.8-1.2-1.4-0.6-3.1-0.4-3.6-0.1-2.7-0.8-0.9-0.5-0.3-1.4 1.2-0.7-1.1-0.7-0.7-0.9-0.5-0.9-0.1 0 1.1 1.4 1.4-1.7 1.2-13.6 3.6 0-1.4 1.1-0.9 2.9-3.9 0-1.1-1.8-0.1-1.7 0.5-1.6 0.8-1.5 1.2-1.8-2.8-1.5-3.3 1.1-0.5 0.9-0.7 0.7-0.9 0.6-1.5-4.4 0.5-1.6-0.5-4.5-6.1-1.1-3.1-1.2-1.7-0.3-1.3 4.4-3.2 1.9-0.8 2.2 0.3 1.7 1.6 1.7 2.1 2 1.8 2.5 0.6-1.2-2.3-6-5.1-1.3-1.5-0.6-0.6-1.1-0.2-1.1 0.2-2.1 0.9-1.2 0 1.4-1.1 0-1.3-0.7-0.2-1.9-1 1-0.6 0.2-0.7-0.8-0.4-0.7-1-0.2-1.4 
0.5-2-1.4 0.4-1.9 2.4-1.3 0.8 0.5-1.5 0.1-1.7-0.4-1.6-0.9-1.4-0.8 2.4-1.1 0.7-1.1 0.5-0.9 1.4-0.1 2.1 1 3.2-0.2 2-1.5 1.4-8.9-3.1-2.2-1.5-1.8-2.1-0.8-2.6-0.9-4.6-3.9-3.1-1.1-4 5.9 0-1.3-1.5-0.9-1.9-0.1-2.2 0.7-2.3 1.3-1.9 1-0.5 0.9 0.6 5 7.5 0.6 1.7 0.9 1.1 1.8 1.4 1.2 0.3-0.5-2.3 0-1.1 1.2-1.2-1.6-2-1.4-2.3-0.2-0.7 0.2-0.8 0.9-0.3 0.2-0.7 0.8-0.8 1.7 0.2 2.8 1.3-0.9-3.2-2.5-2.9-2.9-1.9-2.4-0.7 1.3-1.7 1-2.7 0.2-2.6-1.1-1.5 0-1.4 2 0.1 1.3 1.3 1.1 1.6 2.1 1.5 0.4 1.4 0.2 1.7 0.2 1.1 0.9 1.2 0.7 0.5 4.5 2.5 1 0.9 1.1 1.7 1.4 1.5 1.2-0.1 1.4-0.7 1.6-0.2-2.2 5.4 1 0.3 2-2.7 1.2-3.5 1.2 0.7 7.3 9.7 0-1.2-0.4-0.6-0.9-1.8 0-2.5-1.3-2.2-3.3-4-1.6-3.7 0.3-1.9 1.3-1.7 1.2-3.7-1.1 0.2-1.1-0.2-0.9-0.5-0.7-0.7 0.4-2.5-0.5-1.4-0.8-1.1-0.5-1.2-0.1-2.1 0.2-0.6 0.7-0.2 3.8-2.5 0.6-0.7 0.9-2.1 0.5-2.2 0.7-1.3 1.3 0.6 1-0.6 0.3-0.6z m245.5 13.7l1.6 3.7 0.9 3.7 1.9 1.4 0 2-0.7 1.2 0.6 1.8-0.1 3 2.3 1.8 0.6 2.6 0.2 2.7-0.8 5.6 0.2 2.8-3.3 4.1-0.4 2.1 0.5 1 1.6-0.4 3.4-4.1 5.4-0.6 1.4-1.9 1.7-0.7 7.7-1.6 5.7 5.6 3.9 0.9 1.3 7.1-0.6 2-2.7 2.2-2.6 5.1-2.6 2.1-1.3 2.2 0.3 1.1 3.6 2.3-0.9 8.7 3.3 2.5-0.9 1.8 0.3 1.8-0.9 2.5-1.6 1.9-2.1-0.1-5.4 2.6-2.4 3.2-3 1.9-2.6 3.1-2.6 0.3-3.3 3.4-0.3 0.2-5.6 3.2-0.7 1.9 0.5 1.1-0.9 5.7 0.3 6.4-0.8 1.8-2.6 3-1.5 6-1.9-0.7-5.9 1.7-2-0.6-4-3.4-0.9 0.7-1 2.5-0.6 5.2-0.9 0.7-5.5-1.2-6.4 1.3-6.2-2.4-1.3 0.2-2.3 2.6 0.6 3.1-0.5 1-5 2.9-4.8 1.4-2.9 3.7-3.2-0.5-1.1 0.8-0.3 0.5-1.2 1.1-4.2 6.1-1 0-2.6-1.3-1.9-2.9-1.9-0.2-2.9 4.5-0.7 2.5-2.6 1.5 0.5 4.8-2 2-0.3 1.3 1.1 2.8-5.7 3-5.5 0.9-0.6 1.3 0 3.8-4.6-0.4-3.3-1.8-7 1.2-3-1.7-2.8-0.5-4.5 2.7-4.7 0.6-1.8 3-2.1 0.9-1.4 1.9-6.7-1.6-2.3-2.5-12.3-1.6-0.7 0 0.6-1.1-1.9-1.3 0-1.3 0.9-0.5 2.1-1.9 2.4 0.3 1.3-0.1 2.8-2.6 2.2 0 4.1 1.3 2-0.6 3.8-2.5 7.7-1.8 1.7-1.2-15.9 3.5-7.4-3.6-0.1-1 1-2.5 0.6-0.8 6.5-7.2 1.7-2.9 1.6-3.4-5.7 5.1-0.8 1.4-0.4 1.1-1 1-1.2 0.7-1 0.3-0.8 0.5-0.3 1.3-0.2 1.5-0.3 1.4-1.3 2.6-1.2 1.5-1.5 0.7-1.7 0.1-0.8 0.4-0.9 1.6-2 0.7-0.8 1.1-1.1 2.1-2.7 3.5-1.5 1-2 0.4-1 0.7-1.4 3.2-0.9 0.7-1.2 0.4-1.1 
0.9-1 1.2-3.9 5.8-6.3 7.7-6.6 5.3-0.6 0.3-0.9-0.7-1.1-2.3-1-0.5-1.6-0.3-1.8-0.8-1.7-1.2-1.1-1.4 1.4-2.8 0.4-1.4 0.2-1.7-0.7 0-3.6 5-5.8-1.3-14.3-11.9-0.6-1.3 0.1-2.8 1-1.3 1.4-0.5 4.8-0.4 3.3 0.7 3.1 1.6 2.6 2.7 0.8-1.3-2.3-1.9-1-1.2 0-1.7 10.2-6.9 2.3-0.3 6 3.2 2.5 0.5 5.1-0.4 2.8-0.7 2-1.4-8.9 1.2-2.1-0.4-4.8-3.1-4-1-10.5 5.8-3.6 1.1-1 0-1.2-0.5-2.2-1.6-1.2-0.3-6.7 0.5-7.1-1.9-2.4 0.2-1.9 1.9-1.7 1.1-2.4-0.5-2.2-1.3-1.3-1.2-0.9-1.9-0.3-1-0.1-1.2 0.2-1.7 0.5-0.1 0.6 0.4 0.7-0.5 2.1-3.3 1.3-1 14.7-1.6 1.2-0.6 3.3-2.6 1.4-0.5 1.7 0.4 1 0.9 0.9 1.3 1.4 1.1 5.1 2.2 0.8 0.7-0.1-1.9-0.5-1.9-1-1.3-1.7-0.2 0-1.3 1.2-0.2 1.2 0.1 1.1 0.5 1 0.9 1.4-1.8 2.2-0.8 4.4 1.3-7.2-3.6-1.9-2.3 0-1.9 1.3-1.3 10-1.5 3.9-1.7 3.1-3.2-1.2-0.6-1.8 0.2-1.6 0.8-1.5 3-2-0.1-3.7-1 0.8-1.8 3.8-4.2-4.3-0.8-9.1 2-4.4-1.2 3.2-0.9 3.4-0.2 0-1.3-0.7-0.5-2-2 1.7-2.1 1.1-3.6 0.8-4.1 1.1-3.5 1.7-2.2 2.7-1.9 2.9-1.1 2.5 0.5 0.9 1.1 1.9 3.5 1.1 1.4 1.4 1.1 1.5 0.7 3.2 0.5 5.2-1.2 4.6-2.4 0-1.2-1.7 0.5-1.5 0.9-1.4 0.5-1.3-0.7-2.2 1.5-2-0.5-2.1-1.4-1.9-0.8-1.5-1 0.4-2.2 0.1-2.5-2.1-1.5-0.8 0.1-2 0.9-1.1 0.2-0.9-0.4-1.7-1.7-1-0.3-1-1.3 1.4-3 4-5.2 1.1-1.9 0.7-0.7 1.2-0.1 2.8 0.3 1-0.8 1.7-1 2.4 0.9 4.1 3.1 1.9 1.1 9.5 0.1 1.2-1.8 1.6-0.1 1.6 0.4 0.6 0.3 0.7 0 0-1.3-1.9-1.2-8.1 1.4-2.2 0.9-1.3 0.2-0.7-0.6-2.4-3.2-1.1-1-8.1-2.5-1.2-1.8 0.7-3 3.4-4.9 1.3-2.4-1.8-3 2.3-3.4 4-2.6 3.3-0.7 3.1 2.3 3 3.7 3.3 2.5 3.7-1.3 0-1.2-4-0.9-0.9-0.8-1.4-2.6-0.9-1.2-1-0.5-0.9-1.2 1.3-2.7 2.3-2.5 2.3-0.9 0-1.3-2.3-1-1.6 1.4-1.2 2.2-1.2 1.1-1.6 0.6-3.4 2.4-1.9 0.7-2-0.3-3.7-1.7-1.9-0.5-4.1 1.4-2-0.1-0.8-2.5 0.4-0.6 2.2-5.5 0.8-0.8 2.1-1.3 0.8-0.9 0.9-0.8 0.7 0.7 0.5 1.2 0.5 0.6 2.3-0.4 10.7-5.1 4.3-4.1 0.8-5-1.6-0.4-2.6 2.7-4.4 6.2-2.5 2-2.2-0.4-2.3-1.1-2.8-0.5 2.3-1.5 0.8-1.2-0.2-1.5-0.8-2.2-0.2-1.2-0.4 0.1-1.5 1.5-6.5 8.6-2.4 1.2-5 0.8-2.2-1.2-1.7-7 0.3 0.2 0.9-0.9 0.1-0.5-1.1-2-0.2-0.6 0-2.4 0.3-1.1 1-1.2-1-0.6-2.9-0.9-0.7-0.4-0.2-1.9-0.9-3.1-0.2-2.3 0.3-1.3 1.3-3 0.4-1.2 0.1-2.5 0-2.3 0.4-1.4 1.5 0-0.8-3.6 1.8-0.7 1.7 0.3 
1.6 1.1 5.2 5.4 1.6 1.2 0.8-0.7 1.7 2.6 0.8 0.6 2.9 2.1 1.1 0.3-0.5-1-0.8-2.7 6.9-1.2 4 0.7 2.2-0.1 0.4-1.2-2.9-2.5-4.4-1-4.2 0.6-2.7 2.4-3.3-3.9 0.6-0.6 0.7-1.7-3.2-0.7-1.6-2.8-1.2-3.5-1.8-2.9-1.5-0.6-1.5-0.2-1.2-0.7-0.5-2.2 0.6-2.2 1.9-2.2-0.5-1.7 0-1.3 3.6-2.1 1.8 0 2.8 3.2 1.2-0.2 1-1.4 0.8-1.9-1.1-0.4-0.6-0.9-0.7-1.2-0.9-1.1-1.2-0.7-8-1.9-0.3-1.9 0.5-4.7-0.2-2.1 0.3-1.5-0.8-3.6-0.2-2.3 0.4-1.9 0.7-1.8 1-1.5 1.3-1.1 2.4-0.9 2.5-0.1 2.5 0.8 2.4 1.6-0.4 2.4 0.3 3.6 0.7 3.6 0.7 2.7 1 1.9 1.2 1.3 1.5 0.8 1.6-0.1-0.8-0.8-0.4-0.9-0.7-2.2 1.2-0.1 0.9 0.3 1.8 1.1-0.6-3.5 0.4-2.6 0.1-2.3-1.3-2.8-3.7-3.5-1.2-2 1-1.9-0.8-0.2-0.5-1 0.9-1.7 2.2-3 0.8-1.5 1.4 1.8 1 0.7 3.6 0-0.6 5 2.7 3.6 4 2.1 3 0.4-0.5-1.2 1.1-1.6 1.2-5.1 0.9-2 1.2 0.1 4.4 1.9 1.3 1.1 2.7 3.3 3.2 2 7 2.7-1.5-2.5-1-1.1-2.8-1.3-4.3-4.4-6.1-2.8-0.8-0.8-0.4-1.9 0.4-1.2 0.6-0.1 0.7 0.8 0.7 0.1 0.6-2.1 1.8 1.1 2.7 4 2.1 1.1 1.1 0 1.8-1 1-0.2 1.2 0.4 2.2 1.6 0.9 0.4 1.2 0.7 3.6 3.5 4 1.9 2.8 5 1.9 1.3-0.5-1-0.7-1.6-0.5-1.7-0.3-1.3-0.5-1.2-9.6-8.5-2.3-0.6 0-1.4 1.9 0.1 0.7-1.4-0.1-2.1-0.9-2.2-1.2-1.1-3.1-0.6-1.3-1.3-2.9-0.7-3.9-5.1-2.5-0.5 0.7-1.1-1.4-2.1-1.9-0.7-3.9 0.3 0.7-2 0.6-0.6 0-1.3-3.2-5.8 1.8-2.4 3.7 1 2.3 4.8 2.1-1.8 2 0.7 2.2 0.3 2.3-2.9-0.6 0 0.4-1.8 0-1.7-0.5-1.5-0.7-1.2 1.4 0 0-1.3-2-1.3 1.3-1 2.7-0.7 1.3-0.7 0-1.4-3.2 0.2-1.5-0.4-1.3-1 0.7-1.2-0.7-0.9-0.8-0.7-0.9-0.5-0.9-0.3 0.5-3 0.2-0.9-1.2 0-1-0.7-4.4-6.9 0.1-3 1.5-0.5 2.2 0.9 4.1 2.9 1.6 0.4 1.6-0.4 3.9-2.6 2-0.7 2.1 0.2 2 1.7 0.7-1.3 1.3-1.2 1.4-0.9 0.8-0.3 1.7 0.5 2.9 2.6 1.3 0.6 4.8-0.4 1.5 0.4 6.6 5.1-0.9-1.9-1.1-1.2-2.6-2 1.2-1.1 2.8-0.5 1.3-0.9-6.1-1.2-1.9 0-0.8 0.5-1.1 1.6-1 0.3-1-0.3-2-1.7-1-0.4-1.4-1-2.9-4.3-0.9-1-0.2-0.7-1.8-2.8-0.3-0.3-0.8-2.4-0.5-2.6 1.9 0-1-2 0.1-1.8 0.7-1.8 0.9-2 0.1-0.9-0.2-2.3 0.4-0.4 0.7 0.2 1.3 0.8 0.6 0.1 5.6 4 4-0.2-2.7-2.2-0.9-1.2 0.3-1.5-0.6-1.3-0.9 0.9-1 0.5-1 0.1-1.1-0.2 0-1.3 1.3 0 0-1.3-0.7-0.1-1.3-1.3 5.3-3.1 2.4 1.1 2.9 2-10.6-9.3 0-3.4 0.2-1.5 0.7-0.9 5.1-3.1 1.5-1.5 1.1-2.9-0.7 0 0.8-3.8 0.3-3.7 
0.7-3.2 1.6-2 12.3 2.8 2.3 1.5 1.7 2.3 0 0.9 0.5 2.9 0.6 2.5 0.3-0.2-0.1 2.1-0.2 1.6-0.5 1.4-0.6 1.3 1.8-1.4 0.7-1.7 0.2-1.9 0-1.9 0.4-2.7 0.8-1.2 0.6-1.3-0.5-3.1 1.7 1.5 3 4 1.6 0.8 2.7 2.1 1.2 0.4 1 0.7 0.1 1.5-0.4 1.7-0.4 1.2-1.1 1.5-3.4 3.3-1.8 4.7-1.8 3.4-0.9 2.5 1.8-0.3 5.7-6.4 0.9-0.5 0.6-2.9 1.6-1.3 1.9-0.9 1.8-1.8 1.1-3.1 0.5-3.4 1-2.6 2.3-1 4.2 0.8 4 1.8 4.7 3.5 0 1.4-0.2 0.8 0.2 1.8-2.2 2.7-1.5 3.3-1.2 3.8-1.7 4 1.7-0.7 1.6-1.6 2.6-3.9 0.1-0.8-0.1-1.5 0.8-0.6 3.9-4.3 0.2-0.4 3.8-2.6 2.5-1.2 1-0.2 1.9 0.7 2.9 2.7 2.1 0.6-1.3-1.3 1-0.8 1-2.2 0.6-0.7 1.1-0.2 1.9 1 1.3 0.2 0.8-0.5 0.8-2.4 1-0.6 3.7 0 2 0.6 0.9 0.1 1-0.7 0.9-1.3 1.2-3.1 0.9-0.6 0.8-0.2 0.3 0.2 1.8 3.6 0.5 0.5 2.2-0.5 8.2 1.4 8.2-1.4 4.2-1.9 6.5-5.1 3.7-1.8 6-1.1 3.1 0 2.2 1.1-1.3 1.3 2.3 2.2 2.7-0.3 5.5-1.9 5.3 2.4 2.6 0 1.3-3.7-1.6-0.7-1.8-1.3-1.5-1.9-1-2.5 2.1-3 1.5-1 1.6 0.3 0.4 1 0.2 1.7 0.4 1.7 0.7 0.7 2.9 0 6.3-2 3.9-0.4 5 3.6 3.2 0.1 6.3-1.3 3.3 1.4-0.6 3.4-1.9 3.8-0.4 2.7-1.1 1.5-1.4 3.4-0.9 1.5-2.5 2.5-0.7 1.2-1.1 3.5 0 3 1 2.2 2 1.3 1.2-0.1 1.1-0.7 0.9-0.2 0.8 1 0.3 2.2-0.4 1.6-0.9 1-1 0.3-0.9 6.5-4.2 8.3-5 7.4-3.7 4.2-4 2.4-8.4 3-3.4 2.1-5.1 5.2-0.8 1.7-0.8 3-1.8 3.1-4 4.5-21.1 17.5-7.1 3.7-1.8 1.8-0.8 1.9-0.9 3.7-0.6 1.3-0.8 0.9-2.4 1.4-3.1 0.7-6.1 4.3-1 0.6-0.6 1.1-0.4 1.5 0 1.9-1.6-0.9-0.4-0.4 0.6-1.2-1.3-0.7-2.5-0.3-2.7-1.7-0.5 0.8 0.3 1.5 0.8 1.6 1.5 1.1 2.9 0.1 1.5 0.7 0.7 0.8 2 3.5-2 3.4-0.5 1.9 0.5 2-1.5 0-2.9-1.1-1.5 0-0.8 0.7-1.7 2.3-0.8 0.7-2.7-0.1-5.5-2.9-2.7-0.7-7.7 2.1-2.1-1.5-3.1-4.4-1.9-1.7-1.9-0.7 5.1 7.7 3 2.6 3.7 0.8 3.9-1.8 2-0.4 1.7 1.6 1.6 2.1 2.3 1.6 1.6 0-0.6-3.1 6.2 4.3 3.3 0.9 5.5-4.3 3 0.7 1.7 2.3-1.3 2.3 3.2 0.9 3.5-1.6 3.5-3.3 2.7-4 1.4-1.7 1.2 0.2 0.7 1.7-0.3 3-0.8 1.6-13 17.2-2.2 1.8-2.5 3.9-1.3 1.3-1.9 0.4-1.5-0.7-0.5-1.8 1.2-2.8-2.6-2.5-3.3 1.2-7.4 6.8-1.3 0.6-2.5 0.1-6.2 1.8-1.8-0.3-1.6 0.1-0.7 2-0.7 2.5-1.4 1.2-1.4 0.6-6.2 6.3-0.9 1.2-0.8 1.8-0.8 2.8 0.5 0.1 1.3-1 2.9-1.2 1.4-1.5 1.4-1.8 0.9-1.7 1.2-1.7 4.4-2.7 3.7-3.9 4.1-2.1 0.8-0.1 1.1 0.5 1.2 
1.6 4.9 1.1 2.1-0.1 1.5-1.2 1.2-1.3 4.2-2.7 1.5-0.3 0.4 2.3-2 3.7-2.7 3.5-2.8 2.3-3.4 5.4 0.6 1.6 0.7 0.9-4.5 0.4-1.8 1.1-1.6 2.1 0.9 1.7-0.7 0.2-2.2-0.7-5.2 0 0 1.4 2.7-0.1 1.3 0.3 1.2 0.9-1.8 4.1-1 1.8-1.1 1.4 3.4 0.9 1.8 0.1 2-1 1.5-2 0.8-1.1 0-0.6 2.2-0.1 1-0.6 1.8-2.6 1.4-1.4 1.5-1 1.4-0.4 1.7-0.8-1.1-2-3.6-3.3 2.7 0.2 8.8-1.4 9.2 0.9 3-0.9 7.9-6z" id="GBR2745" name="Highland">
</path>
</svg>

Type mismatch error for filter function with dplyr over a spark data frame

I am currently working in RStudio on a RHEL cluster.
I use Spark 2.0.2 in YARN client mode and have installed the following versions of sparklyr and dplyr:
sparklyr_0.5.4 ;
dplyr_0.5.0
A simple test with the following lines results in an error:
data = copy_to(sc, iris)
filter(data , Sepal_Length >5)
Error in filter(data, Sepal_Length > 5) :
(list) object cannot be coerced to type 'double'
I checked the data that was read, and it all looks fine:
head(data)
Source: query [6 x 5]
Database: spark connection master=yarn-client app=sparklyr local=FALSE
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Is this a known bug, and is there a known fix for it?
It's not a bug. You have to specify that you want the filter function from the dplyr package. You are probably calling the filter function from the stats package, which is why you get that error. You can select the right one explicitly with dplyr::filter:
res <- dplyr::filter(data, Sepal_Length > 5) %>% dplyr::collect()
head(res)
# A tibble: 6 x 5
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.1 3.5 1.4 0.2 setosa
2 5.4 3.9 1.7 0.4 setosa
3 5.4 3.7 1.5 0.2 setosa
4 5.8 4.0 1.2 0.2 setosa
5 5.7 4.4 1.5 0.4 setosa
6 5.4 3.9 1.3 0.4 setosa
To be sure, type filter (or any other function name) in the RStudio console and check the pop-up that appears with the function name. On the right, you can see the package that will be used if you don't name the package explicitly with ::.

How does Excel's FREQUENCY function work?

With the following function:
=FREQUENCY(C2:C724,D2:D37)
The second parameter is the bins array.
What I don't understand is why Excel increments the bins range for the rest of the cells. The bins do not change! They are the same bins. Yet when I paste the formula for all my values, it does this:
=FREQUENCY(C2:C724,D2:D37)
=FREQUENCY(C2:C724,D3:D38)
=FREQUENCY(C2:C724,D4:D39)
The last column is what was generated (and it is correct, but it does not make sense to me!):
Etoh bin frequency
15.9 20 0
14.6 19 0
14.1 18 0
13.9 17 0
13.3 16 0
13.3 15 0
13.2 14 1
12.6 13 2
12.1 12 3
11.8 11 6
11.5 10 4
11.2 9 4
11 8 8
10.5 7 10
10.3 6 26
10.3 5 27
10.2 4 40
10.1 3 89
9.8 2 151
9.7 1 205
9.5 0 102
9.1 -1 17
8.9 -2 7
8.3 -3 3
8.1 -4 2
8.1 -5 0
7.9 -6 3
7.6 -7 2
7.5 -8 2
7.5 -9 1
7.5 -10 1
7.4 -11 0
7.2 -12 0
7.1 -13 1
7.0 -14 0
7.0 -15 0
6.8
6.7
6.6
6.5
6.4
6.2
6.2
6.1
6.0
5.9
5.8
5.8
5.7
5.7
5.7
5.5
5.5
5.5
5.4
5.3
5.3
5.3
5.3
5.3
5.3
5.3
5.2
5.2
5.2
5.1
5.1
5.1
5.1
5.1
5.0
5.0
5.0
5.0
5.0
4.9
4.9
4.8
4.8
4.8
4.7
4.7
4.6
4.6
4.6
4.5
4.5
4.5
4.5
4.4
4.3
4.1
4.1
4.1
4.1
4.1
4.1
4.0
4.0
4.0
4.0
4.0
3.9
3.9
3.9
3.9
3.9
3.8
3.8
3.7
3.6
3.6
3.6
3.6
3.6
3.5
3.5
3.4
3.4
3.4
3.4
3.3
3.3
3.3
3.3
3.2
3.2
3.2
3.2
3.2
3.2
3.1
3.1
3.1
3.1
3.1
3.1
3.0
3.0
3.0
3.0
3.0
3.0
3.0
3.0
3.0
3.0
3.0
2.9
2.9
2.9
2.8
2.8
2.8
2.8
2.8
2.8
2.8
2.8
2.7
2.7
2.7
2.7
2.7
2.7
2.7
2.7
2.6
2.6
2.6
2.6
2.6
2.6
2.6
2.6
2.6
2.5
2.5
2.5
2.5
2.5
2.4
2.4
2.4
2.4
2.4
2.4
2.4
2.4
2.3
2.3
2.3
2.3
2.3
2.3
2.3
2.3
2.3
2.3
2.3
2.3
2.2
2.2
2.2
2.2
2.2
2.2
2.2
2.2
2.2
2.2
2.2
2.2
2.1
2.1
2.1
2.1
2.1
2.1
2.1
2.1
2.1
2.1
2.1
2.1
2.1
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
1.9
1.9
1.9
1.9
1.9
1.9
1.9
1.9
1.9
1.9
1.9
1.9
1.9
1.9
1.9
1.8
1.8
1.8
1.8
1.8
1.8
1.8
1.8
1.8
1.8
1.8
1.8
1.8
1.8
1.8
1.7
1.7
1.7
1.7
1.7
1.7
1.7
1.6
1.6
1.6
1.6
1.6
1.6
1.6
1.6
1.6
1.6
1.6
1.6
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.5
1.4
1.4
1.4
1.4
1.4
1.4
1.4
1.4
1.4
1.4
1.4
1.4
1.4
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.3
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.2
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.1
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.9
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.8
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.7
0.6
0.6
0.6
0.6
0.6
0.6
0.6
0.6
0.6
0.6
0.6
0.6
0.6
0.6
0.6
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.3
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.2
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
-0.1
-0.1
-0.1
-0.1
-0.1
-0.1
-0.1
-0.1
-0.1
-0.1
-0.1
-0.1
-0.1
-0.1
-0.2
-0.2
-0.2
-0.2
-0.2
-0.2
-0.2
-0.2
-0.2
-0.2
-0.2
-0.3
-0.3
-0.3
-0.3
-0.3
-0.3
-0.3
-0.3
-0.3
-0.3
-0.4
-0.4
-0.4
-0.4
-0.4
-0.4
-0.4
-0.4
-0.4
-0.4
-0.4
-0.5
-0.5
-0.5
-0.5
-0.5
-0.5
-0.6
-0.6
-0.6
-0.6
-0.6
-0.6
-0.7
-0.7
-0.7
-0.7
-0.7
-0.7
-0.7
-0.7
-0.8
-0.8
-0.8
-0.8
-0.8
-0.8
-0.8
-0.8
-0.9
-0.9
-0.9
-0.9
-0.9
-0.9
-1.0
-1.0
-1.0
-1.0
-1.0
-1.0
-1.1
-1.1
-1.2
-1.2
-1.2
-1.3
-1.3
-1.7
-1.8
-1.9
-1.9
-2.1
-2.2
-2.4
-2.4
-2.5
-2.5
-2.6
-3.0
-3.2
-3.7
-4.6
-4.6
-6.1
-6.3
-6.3
-7.0
-7.8
-8.1
-8.5
-9.0
-10.2
-13.2
If I do as both of you suggested with the $, I am getting the WRONG results:
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
It takes a bit of keyboard work rather than mouse work, but do exactly this:
1. In cell E2, enter =FREQUENCY(C2:C724,D2:D37) and press Enter. You should now be on cell E3.
2. Press the up arrow once to return to E2 (it will show 0).
3. Hold down the Shift key and press the down arrow until you reach cell E37. E2 through E37 will be highlighted.
4. Don't do anything else; just press the F2 key.
5. Now press and hold Ctrl + Shift, and with those two held down, press Enter.
Voila! Every cell from E2 to E37 should now show {=FREQUENCY(C2:C724,D2:D37)} in the formula bar (notice the {} braces), and the formula works. This is what makes it an array formula.
It looks like you tried to copy and paste the function from the first cell into each of the subsequent cells.
Because it's an array function, what you need to do is:
Copy the text of the first cell (from the formula bar).
Select all the cells you want the results to appear in.
Paste the copied text into the formula bar.
Press Ctrl+Shift+Enter.
This puts the same array function into all the cells.
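To see why pasting the formula with shifting bin ranges happened to give correct numbers, here is a rough Python model of FREQUENCY. It is a sketch based on Excel's documented behaviour (one count per bin, plus one extra count for values above the highest bin), not Excel's implementation; the names `frequency`, `etoh` and `bins` are hypothetical, and it assumes numeric data and distinct bin values. A non-array formula only displays the first element of the returned array. Because the bins in the question are sorted in descending order, the first element of FREQUENCY over D_k:D_(k+35) counts exactly the values between the next-lower bin and D_k, which is the same number the array formula puts in row k (assuming the blank cells below D37 are ignored). With ascending bins, the shortcut would instead produce cumulative counts.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Rough model of Excel's FREQUENCY(data_array, bins_array).

    Assumes numeric data and distinct bin values, given in any order.
    Returns len(bins) + 1 counts: entry i counts values v with
    (next smaller bin) < v <= bins[i]; the extra last entry counts
    values above the largest bin.
    """
    sorted_bins = sorted(bins)
    counts = [0] * (len(sorted_bins) + 1)
    for v in data:
        # index of the first sorted bin >= v; len(sorted_bins) means "above every bin"
        counts[bisect_left(sorted_bins, v)] += 1
    rank = {b: i for i, b in enumerate(sorted_bins)}
    # report counts in the caller's bin order, followed by the overflow count
    return [counts[rank[b]] for b in bins] + [counts[-1]]


# A few Etoh values from the question, with its descending bins 20, 19, ..., -15 (D2:D37)
etoh = [15.9, 14.6, 14.1, 13.9, 13.3, 9.5, -0.7, -13.2]
bins = list(range(20, -16, -1))

# Entered once as an array formula over E2:E37: one count per bin
# (the overflow entry would need a 37th cell)
array_result = frequency(etoh, bins)

# Pasted cell by cell with shifting ranges: each cell shows only the first element
# of FREQUENCY over the remaining, smaller bins
shifted_result = [frequency(etoh, bins[k:])[0] for k in range(len(bins))]

print(array_result[:-1] == shifted_result)
```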
Unsure what you mean, but it sounds like you need to use absolute addressing: prefix the column with $.
You need to use absolute reference for the second argument only =FREQUENCY(C2:C$724,D2:D37)
