Spark (Python): not able to add values in a column

I have created an RDD named Distance. All of its values are floats. I want to add them up and find the total.
print(Distance.take(5))
Output : [802.0, 1055.0, 919.0, 204.0, 951.0]
print(sum(Distance.take(5)))
Output : 3931.0
totalDistance=Distance.reduce(lambda x,y:(x+y))
Output: Py4JJavaError Traceback (most recent call last)
<ipython-input-29-874f9e382e38> in <module>()
2 # Reduce takes a function that acts on two elements and returns an object of same type.
3
----> 4 totalDistance=Distance.reduce(lambda x,y:(x+y))
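The traceback above is cut off before the underlying error, so the real cause is not visible here. One common cause, assumed for this sketch, is that a record further into the RDD is not a clean float (for example an empty string or None), which take(5) never reaches. The helper name safe_float and the sample records are hypothetical. The reduction runs on a plain list so that it works without a cluster, and the closing comment shows how the same idea might be applied to Distance.

```python
from functools import reduce

def safe_float(value):
    """Return value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

# Hypothetical sample: the five values from the question plus two bad records.
records = [802.0, 1055.0, 919.0, 204.0, 951.0, "", None]

# Drop anything that could not be parsed, then add up the rest.
clean = [v for v in map(safe_float, records) if v is not None]
total = reduce(lambda x, y: x + y, clean, 0.0)
print(total)

# On the RDD the same idea might look like this (untested against the asker's data):
# totalDistance = Distance.map(safe_float).filter(lambda v: v is not None).sum()
```

If the total comes out as expected after filtering, the next step would be to inspect the records that safe_float rejected.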

Related

AttributeError: module 'cirq' has no attribute 'GridQubit'

I tried the following code on an NVIDIA DGX-2 machine.
import cirq
# Pick a qubit.
qubit = cirq.GridQubit(0, 0)
# Create a circuit
circuit = cirq.Circuit(
    cirq.X(qubit)**0.5,  # Square root of NOT.
    cirq.measure(qubit, key='m')  # Measurement.
)
print("Circuit:")
print(circuit)
# Simulate the circuit several times.
simulator = cirq.Simulator()
result = simulator.run(circuit, repetitions=20)
print("Results:")
print(result)
But I get an AttributeError:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_36197/3759634386.py in <module>
2
3 # Pick a qubit.
----> 4 qubit = cirq.GridQubit(0, 0)
5
6 # Create a circuit
AttributeError: module 'cirq' has no attribute 'GridQubit'
Any solution to this issue?
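The question does not show the environment, so the following is only a guess. This kind of error often means that import cirq is picking up something other than the installed library, such as a local file or folder named cirq, or a broken install. Below is a minimal sketch for checking where a module name resolves. module_origin is a hypothetical helper, demonstrated on the standard-library json module so that it runs anywhere.

```python
import importlib.util

def module_origin(name):
    """Return the file path a top-level module name resolves to, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the name is not importable at all
    return spec.origin

print(module_origin("json"))                     # a path inside the standard library
print(module_origin("no_such_module_for_demo"))  # None

# For the question: print(module_origin("cirq")) and check that the path
# points into site-packages rather than at a local cirq.py or cirq/ folder.
```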

How can I make a list of three sentences to a string?

I have a target word plus its left and right context, and I need to join the three together. I am using pandas. I am trying to collect the two context sentences and the target word into a list and then turn that list into a single string so that it works with my vectorizer. Basically, I am just trying to turn a list of three sentences into a string.
This is the error that I get:
AttributeError Traceback (most recent call last)
<ipython-input-195-ae09731d3572> in <module>()
3
4 vectorizer=CountVectorizer(max_features=100000,binary=True,ngram_range=(1,2))
----> 5 feature_matrix=vectorizer.fit_transform(trainTexts)
6 print("shape=",feature_matrix.shape)
3 frames
/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/text.py in _preprocess(doc, accent_function, lower)
66 """
67 if lower:
---> 68 doc = doc.lower()
69 if accent_function is not None:
70 doc = accent_function(doc)
AttributeError: 'list' object has no attribute 'lower'
I have tried using .join and .split, but they are not working for me, so I must be doing something wrong.
import sys
import csv
import random
csv.field_size_limit(sys.maxsize)
trainLabels = []
trainTexts = []
with open("myTsvFile.tsv") as train:
    trainData = [row for row in csv.reader(train, delimiter='\t')]
random.shuffle(trainData)
for example in trainData:
    trainLabels.append(example[1])
    trainTexts.append(example[3:6])
The slice example[3:6] selects index 3 (the left context), index 4 (the target word), and index 5 (the right context).
print('Text:', trainTexts[3])
print('Label:', trainLabels[1])
A few of the printed lines (edited):
['Visa electron käy aika monessa paikassa luottokortista . Mukaanlukien ', 'Paypal', ' , mikä avaa taas lisää ovia .']
['Nyt pistän pääni pölkyllä : ', 'WinForms', ' on ihan ok .']
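The error arises because CountVectorizer calls .lower() on each document, so every element of trainTexts has to be a str, while example[3:6] produces a list of three strings. A minimal sketch of joining each row into one string, using the two rows printed above. The names train_rows and train_texts are stand-ins for illustration.

```python
# One row as printed in the question: [left context, target word, right context].
parts = ['Nyt pistän pääni pölkyllä : ', 'WinForms', ' on ihan ok .']

# The pieces already carry their own spacing, so join with an empty separator.
sentence = "".join(parts)
print(sentence)

# Applied to every row (train_rows stands in for the list of lists in trainTexts).
train_rows = [
    ['Visa electron käy aika monessa paikassa luottokortista . Mukaanlukien ',
     'Paypal', ' , mikä avaa taas lisää ovia .'],
    parts,
]
train_texts = ["".join(row) for row in train_rows]
print(train_texts)
```

In the original loop, the equivalent change would be to append "".join(example[3:6]) instead of example[3:6].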

Pandas & Dataframe: ValueError: can only convert an array of size 1 to a Python scalar

I tried to find a similar question but still couldn't find a solution.
I am working on a dataframe with pandas.
The following code is not working. It works only for the first row of the dataframe; for the second row I already get the error below. Maybe somebody sees the mistake and can help :)
census2=census_df[census_df["SUMLEV"]==50]
list=census2["CTYNAME"].tolist()
max=0
for county1 in list:
    countylist=[]
    df1=census2[census2["CTYNAME"]==county1]
    countylist.append(df1["POPESTIMATE2010"].item())
    countylist.append(df1["POPESTIMATE2011"].item())
    countylist.append(df1["POPESTIMATE2012"].item())
    countylist.append(df1["POPESTIMATE2013"].item())
    countylist.append(df1["POPESTIMATE2014"].item())
    countylist.append(df1["POPESTIMATE2015"].item())
    countylist.sort()
    difference=countylist[5]-countylist[0]
    if difference > max:
        max=difference
        maxcounty=county1
print(maxcounty)
print(max)
[54660, 55253, 55175, 55038, 55290, 55347]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-340aeaf28039> in <module>()
12 countylist=[]
13 df1=census2[census2["CTYNAME"]==county1]
---> 14 countylist.append(df1["POPESTIMATE2010"].item())
15 countylist.append(df1["POPESTIMATE2011"].item())
16 countylist.append(df1["POPESTIMATE2012"].item())
/opt/conda/lib/python3.6/site-packages/pandas/core/base.py in item(self)
829 """
830 try:
--> 831 return self.values.item()
832 except IndexError:
833 # copy numpy's message here because Py26 raises an IndexError
ValueError: can only convert an array of size 1 to a Python scalar
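.item() succeeds only when the Series holds exactly one element. A likely cause, assumed here, is that several counties share the same CTYNAME, so filtering by name returns more than one row. The sketch below uses a small made-up frame (the county names and numbers are hypothetical, and only three of the six estimate columns are included) to show the failure condition and a row-wise alternative that avoids both the loop and .item().

```python
import pandas as pd

# Hypothetical miniature of census2: two different counties share a name.
census2 = pd.DataFrame({
    "CTYNAME": ["County A", "County B", "County B"],
    "POPESTIMATE2010": [54660, 100, 500],
    "POPESTIMATE2011": [55253, 150, 400],
    "POPESTIMATE2012": [55175, 90, 900],
})

# Filtering by name can return more than one row, which is when .item() raises.
matches = census2[census2["CTYNAME"] == "County B"]
print(len(matches))

# Row-wise alternative: largest minus smallest estimate for each row.
cols = ["POPESTIMATE2010", "POPESTIMATE2011", "POPESTIMATE2012"]
spread = census2[cols].max(axis=1) - census2[cols].min(axis=1)

best = spread.idxmax()  # index label of the row with the largest spread
print(census2.loc[best, "CTYNAME"], spread.loc[best])
```

Working per row rather than per name also keeps counties with the same name separate instead of mixing their figures.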

model can predict new image

ValueError Traceback (most recent call last) in ()
----> 1 prediction = model.predict(image_resized.reshape(1,50,50,3))
2 print('Prediction Score:\n',prediction[0])
ValueError: cannot reshape array of size 2352 into shape (1,50,50,3)

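Judging only from what you have posted, the array holds 2352 values, which is 28 × 28 × 3, so you should replace image_resized.reshape(1,50,50,3) with image_resized.reshape(1,28,28,3). This only helps if the model accepts 28 × 28 inputs; if it expects 50 × 50, the image has to be resized to 50 × 50 before reshaping.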
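A reshape only succeeds when the product of the target dimensions equals the number of elements in the array. The arithmetic behind the error message can be checked with a small sketch; fits is a hypothetical helper, and math.prod requires Python 3.8 or later.

```python
from math import prod

def fits(size, shape):
    """True if an array with `size` elements can be reshaped to `shape`."""
    return prod(shape) == size

size = 2352  # element count reported in the error message

print(prod((1, 50, 50, 3)), fits(size, (1, 50, 50, 3)))  # 7500 False
print(prod((1, 28, 28, 3)), fits(size, (1, 28, 28, 3)))  # 2352 True
```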

Perform a frequency distribution count on a generator, and return values that are greater than n

Is there a way to perform a count on a generator object that is pointing to a list of lists? If so, can I make the count operation output a generator object (of counted items) of previous generator object? I then would like to get a frequency count. I am using generators to conserve memory and prevent crashes. My real data set/list is enormous!
I have a generator object, 'test2', created from a list of lists. Here is what the list looks like when printed:
In [1]: ll = [(('color'), ('blue')), (('food'), ('grapes')), (('color'), ('blue'))]
# create generator object 'test2'
In [2]: test2 = (each for each in ll)
# create a generator object with counted items
In [3]: count = (test2.count((i), i) for i in test2)
# list count
In [4]: list(count)
This creates the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-72-83b1c94e3edd> in <module>()
----> 1 list(count)
<ipython-input-70-829ea68a1314> in <genexpr>(.0)
----> 1 count = (test2.count((i), i) for i in test2)
AttributeError: 'generator' object has no attribute 'count'
So I am stuck here. If I can resolve this, I can move onto getting a frequency count (in the form of a generator object) which would look something like:
[(2, ('color', 'blue')), (1, ('food', 'grapes')), (2, ('color', 'blue'))]
Then I would only want to save items with values greater than 2, for visual analysis.
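Generators have no count method, and a generator can be consumed only once, so counting has to happen in a single pass. A minimal sketch using collections.Counter, which accepts any iterable; frequent_items is a hypothetical helper name. Note that the Counter keeps one entry per distinct item in memory, so this saves memory only when the data contains many repeats.

```python
from collections import Counter

ll = [('color', 'blue'), ('food', 'grapes'), ('color', 'blue')]
gen = (each for each in ll)  # stands in for the large generator

def frequent_items(items, n):
    """Return a generator of (count, item) pairs for items seen more than n times."""
    counts = Counter(items)  # consumes the input iterable exactly once
    return ((count, item) for item, count in counts.items() if count > n)

result = list(frequent_items(gen, 1))
print(result)  # [(2, ('color', 'blue'))]
```

Each distinct item appears once in the result, rather than once per occurrence as in the list shown above.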
