Cannot from pandas import Dataframe - python-3.x

from pandas import Dataframe
ImportError Traceback (most recent call last)
in ()
----> 1 from pandas import Dataframe
ImportError: cannot import name 'Dataframe'
I understand there are workarounds, but I need to do this for an assignment. I am using Jupyter with Python 3.6.
Thanks in advance.

from pandas import DataFrame
Note the capitalization: the class is named DataFrame, with a capital F.
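A minimal sketch of the corrected import, assuming pandas is installed. The column names and values are made up for illustration. Python names are case-sensitive, so only the spelling DataFrame resolves:

```python
import pandas
from pandas import DataFrame  # capital "F"; the name "Dataframe" does not exist

# Illustrative data only (hypothetical column names).
df = DataFrame({"a": [1, 2], "b": [3, 4]})
print(df.shape)

# Check which spelling the pandas module actually exposes.
has_wrong_name = hasattr(pandas, "Dataframe")
has_right_name = hasattr(pandas, "DataFrame")
print(has_wrong_name, has_right_name)
```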

Related

deleting pandas dataframe rows not working

import numpy as np
import pandas as pd
randArr = np.random.randint(0,100,20).reshape(5,4)
df = pd.DataFrame(randArr, np.arange(101, 106, 1), ['PDS', 'Algo', 'SE', 'INS'])
df.drop('103',inplace=True)
This code is not working:
Traceback (most recent call last):
File "D:\Education\4th year\1st sem\Machine Learning Lab\1st Lab\python\pandas\pdDataFrame.py", line 25, in <module>
df.drop('103',inplace=True)
The string '103' isn't in the index, but the integer 103 is.
Replace df.drop('103', inplace=True) with df.drop(103, inplace=True).
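The fix above can be checked with a short sketch that rebuilds the same frame and tests both label types before dropping. It assumes numpy and pandas are installed; the variable name remaining is introduced here only for illustration:

```python
import numpy as np
import pandas as pd

rand_arr = np.random.randint(0, 100, 20).reshape(5, 4)
df = pd.DataFrame(
    rand_arr,
    index=np.arange(101, 106),            # integer labels 101..105
    columns=["PDS", "Algo", "SE", "INS"],
)

# The index holds integers, so the string label is not found.
print(103 in df.index, "103" in df.index)

df.drop(103, inplace=True)  # drop the row labelled with the integer 103
remaining = df.index.tolist()
print(remaining)
```

If the labels really were strings, the original call with '103' would be the right one; the label passed to drop has to match the index's type.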

'SparkSession' object has no attribute 'textFile'

I am currently using SparkSession and was told that SparkContext is available within SparkSession. However, when I run my code, I get an error saying that SparkSession has no attribute 'textFile'.
Below is the code I have written:
import findspark
findspark.init()
from pyspark.sql import SparkSession, Row
import collections
spark = SparkSession.builder.config("spark.sql.warehouse.dir", "file://C:/temp").appName("SparkSQL").getOrCreate()
lines = spark.textFile('C:/Users/file.xslx')
The error is as follows:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_59944/722806425.py in <module>
----> 1 lines = spark.textFile('C:/Users/samue/bt4221_spark/exercise/week5/customer-orders.xslx')
AttributeError: 'SparkSession' object has no attribute 'textFile'
My current versions:
findspark: 1.4.2
pyspark: 3.0.3
I don't think it's related to a version issue. Any help is greatly appreciated! :)
textFile is a method of the SparkContext class, not of SparkSession. Reach it through the session's sparkContext attribute:
spark.sparkContext.textFile('filepath')

Pandas ImportError

Whenever I try to import pandas as pd, it shows this error:
ImportError: cannot import name 'window' from 'pandas.core' (//anaconda3/lib/python3.7/site-packages/pandas/core/__init__.py)

Facing 'list' object has no attribute 'find' error while parsing all documents inside a collection

Hi, I have a database named Tmobile, and it has a collection with two files, Mynew and 30dc..
Now I am trying to loop through the files and print each element/file from it.
Below is my code:
import pandas as pd
import json
from pandas.io.json import json_normalize
from unittest.mock import inplace
from pymongo import MongoClient
from idlelib.rpc import response_queue
from pandas import DataFrame
connection = MongoClient('localhost', 27017)
db=connection.Tmobile
collection=db.Collections
#print(collection)
for collection1 in collection.find():
    print(collection1)
But when I try this, I get the error below:
Traceback (most recent call last):
File "C:\Users\esrilka\eclipse-workspace\My First PyDev Project\dbsample.py", line 13, in <module>
['30dc6d110c7a0d3d371177ac0a3624bc_1', 'Mynew']
for collection in collection.find():
AttributeError: 'list' object has no attribute 'find'

module 'pyspark_csv' has no attribute 'csvToDataframe'

I am new to Spark and am facing an error while converting a .csv file to a DataFrame. I am using the pyspark_csv module for the conversion, but it gives an error saying "module 'pyspark_csv' has no attribute 'csvToDataframe'".
Here is my code:
import findspark
findspark.init()
findspark.find()
import pyspark
sc=pyspark.SparkContext(appName="myAppName")
sqlCtx = pyspark.SQLContext
#csv to dataframe
sc.addPyFile('/usr/spark-1.5.0/python/pyspark_csv.py')
sc.addPyFile('https://raw.githubusercontent.com/seahboonsiew/pyspark-csv/master/pyspark_csv.py')
import pyspark_csv as pycsv
#skipping the header
def skip_header(idx, iterator):
    if idx == 0:
        next(iterator)
    return iterator
#loading the dataset
data=sc.textFile('gdeltdata/20160427.CSV')
data_header = data.first()
data_body = data.mapPartitionsWithIndex(skip_header)
data_df = pycsv.csvToDataframe(sqlctx, data_body, sep=",", columns=data_header.split('\t'))
AttributeError Traceback (most recent call last)
<ipython-input-10-8e47cd9759e6> in <module>()
----> 1 data_df = pycsv.csvToDataframe(sqlctx, data_body, sep=",", columns=data_header.split('\t'))
AttributeError: module 'pyspark_csv' has no attribute 'csvToDataframe'
As mentioned in https://github.com/seahboonsiew/pyspark-csv, the function is named
csvToDataFrame
with Frame (capital F) instead of frame.
