AttributeError: 'function' object has no attribute - python-3.x

I see there are many questions with this title or about this issue, but I still don't understand why it occurs.
I imported pandas and NumPy.
Then I read my file using pd.read_excel.
Then I viewed the head of my file using .head().
After I sliced my data, the .head() method still worked fine. But now it suddenly throws an AttributeError, which goes away once I re-import my file, and then after some time it gives me the same error again. What am I doing wrong? I don't understand this error clearly.
import pandas as pd
import numpy as np
sales = pd.read_excel('SALESC.xlsx', header=0)
sales.isnull().sum()
sales["Date"] = pd.to_datetime(sales['Date of document'])
sales = sales[pd.notnull(sales['Quantity sold']) & pd.notnull(sales['Unit selling price including tax'])]
sales = sales.iloc[:,[3,6,8,9,10,11,19,35,39]]
sales.head(5)
Can someone explain the problem and how to resolve it? Thanks in advance.
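A note on the likely cause (my assumption, since the posted code alone should not fail): this error usually means the name sales no longer refers to a DataFrame but to a function, typically after an accidental rebinding elsewhere in the session. A hypothetical reproduction:
import pandas as pd
sales = pd.read_excel   # slip: assigns the function itself instead of calling it
sales.head(5)           # AttributeError: 'function' object has no attribute 'head'
Re-running the read_excel call rebinds sales to a DataFrame again, which would explain why re-importing the file makes the error go away for a while.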

Related

Quantstats - TypeError: Invalid comparison between dtype=datetime64[ns, America/New_York] and datetime

I'm trying to use some of the Quantstats modules, specifically the quantstats.reports module, in Anaconda to get some metrics reports on a portfolio I've designed. I'm fairly new to Python/Quantstats and am really just trying to get a feel for the library.
I've written the following code to utilize the report module to spit out a complete html report and save it under the Output folder:
import quantstats as qs
qs.extend_pandas()
stock = qs.utils.download_returns('GLD')
qs.reports.html(stock, output='Output/GLD.html')
I then get the following TypeError:
TypeError: Invalid comparison between dtype=datetime64[ns, America/New_York] and datetime
I believe this may be a result of the datetime64 class being localized to my timezone and datetime remaining TZ naive. Frankly, digging through the Quantstats code has been a little beyond my current skillset.
If anybody has any recommendations for fixes, I would greatly appreciate it.
I came upon this while DDGing exactly the same issue.
Not sure which of your columns has the timezone localisation in it, but
df['date'] = df['date'].dt.tz_localize(None)
will get rid of localization for the column df['date']
Incidentally, the usual situation is that the index of a pandas time series contains np.datetime64 values, but when you assign it to a column via
df['date'] = df.index
the resulting column contains pandas Timestamps.
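Applying that to the original example, a plausible fix (an untested sketch on my part) is to strip the timezone from the returns index before generating the report:
import quantstats as qs
qs.extend_pandas()
stock = qs.utils.download_returns('GLD')
# drop the timezone so comparisons against naive datetimes
# inside quantstats no longer raise the TypeError
stock.index = stock.index.tz_localize(None)
qs.reports.html(stock, output='Output/GLD.html')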
I got this issue resolved after downgrading yfinance from the latest version, 0.1.87, to 0.1.74.
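If you go the downgrade route, pinning the version explicitly with pip install yfinance==0.1.74 should reproduce that setup.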

AttributeError: 'DatasetDict' object has no attribute 'to_tf_dataset'

I am working on fine-tuning data for an NLP project using the Hugging Face library.
Here is the code I am having trouble with. Has anyone been able to solve this problem?
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
tf_dataset = testdata.to_tf_dataset(
    columns=["input_ids", "token_type_ids", "attention_mask"],
    label_cols=["labels"],
    batch_size=2,
    collate_fn=data_collator,
    shuffle=True
)
NB: I have seen suggestions about upgrading to the latest versions, and I have done that, but the problem persists.
I faced the same problem. In my case I was working with a csv file. I used the following code to load the dataset:
from datasets import load_dataset
dataset_training = load_dataset("csv", data_files="file.csv")
Then the method to_tf_dataset returned:
AttributeError: 'DatasetDict' object has no attribute 'to_tf_dataset'
To overcome this issue, I loaded the content as a pandas DataFrame and then loaded it again using another method:
import pandas as pd
data = pd.read_csv("file.csv")
from datasets import Dataset
dataset = Dataset.from_pandas(data)
After that, the to_tf_dataset method worked correctly. I have no explanation for it, but it worked for me.
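A likely explanation (my addition, not the original answerer's): load_dataset returns a DatasetDict keyed by split, while to_tf_dataset is defined on individual Dataset objects, so selecting a split first should also work:
from datasets import load_dataset
# load_dataset returns a DatasetDict such as {"train": Dataset};
# index into a split to get a Dataset, which does have to_tf_dataset()
dataset_training = load_dataset("csv", data_files="file.csv")
train_dataset = dataset_training["train"]
tf_dataset = train_dataset.to_tf_dataset(
    columns=["input_ids", "token_type_ids", "attention_mask"],
    label_cols=["labels"],
    batch_size=2,
    collate_fn=data_collator,   # reusing the data_collator from the question
    shuffle=True
)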

How can I correctly output a dataframe with the values(?) of a JSON as columns?

First time posting. I am learning how to use Python and decided to do so with the Riot Games API.
Anyway, I'm trying to output a Legends of Runeterra leaderboard into a DataFrame, but my DataFrame is not mapping 'correctly'. I've done a lot of Googling and have finally given up and thought I'd just ask.
I'm betting it's something obvious!
This is my current query, nice and simple... (this took me 2 hours :P)
import requests
import pandas
response = requests.get("https://europe.api.riotgames.com/lor/ranked/v1/leaderboards?api_key=RGAPI-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx")
response_file = response.json()
data = pandas.DataFrame.from_dict(response_file,orient='columns')
print(data)
It outputs: (screenshot omitted; the DataFrame has a single 'players' column)
I don't want the 'players' key to be the column. I want the Name, Rank, and LP to be the columns. I believe these are called values? But I cannot seem to figure out how to do this.
Any help, or links to posts that I have missed that help resolve this would be amazing.
Thank you
EDIT 13:20 13/02/2021
Attached JSON file as requested
https://pastebin.com/ks4AaXQp
I couldn't figure out how to attach a file here, so I threw it in Pastebin.
Try this:
data = pd.read_json(response.text)
If this does not work, post the response_file as a .json file and I'll try to assist you further.
I have managed to resolve this myself.
I didn't specify the 'players' key when creating the DataFrame.
The corrected Code is:
import requests
import pandas
response = requests.get("https://europe.api.riotgames.com/lor/ranked/v1/leaderboards?api_key=RGAPI-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx")
response_file = response.json()
data = pandas.DataFrame(response_file['players'])
print(data)
Output: (screenshot omitted; the player fields now appear as columns)
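As an aside, for nested JSON pandas also provides pandas.json_normalize, which flattens records into columns; for this response it should give the same result (a minimal sketch reusing response_file from above):
import pandas
data = pandas.json_normalize(response_file['players'])
print(data)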

NameError: name 'split' is not defined with Spark

I have been working on a big dataset with Spark. Last week the following lines of code worked perfectly; now they throw an error: NameError: name 'split' is not defined. Can somebody explain why this is not working and what I should do? Should I define the method myself? Is it a dependency that I should import? The documentation doesn't say I have to import anything in order to use the split method. The code is below.
test_df = spark_df.withColumn(
    "Keywords",
    split(col("Keywords"), "\\|")
)
You can use pyspark.sql.functions.split(), but you first need to import this function:
from pyspark.sql.functions import split
It's better to explicitly import just the functions you need. Do not do from pyspark.sql.functions import *.
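Note that col in the snippet comes from the same module, so the complete fix presumably needs both imports:
from pyspark.sql.functions import col, split

test_df = spark_df.withColumn(
    "Keywords",
    split(col("Keywords"), "\\|")   # the pipe is escaped because split() takes a regex
)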

Python pandas-datareader fails on comma

I am trying to get the price of a stock from Google using pandas-datareader.data, but when I try to call Amazon (whose price right now is over 1,000) it gives me a ValueError. I assume it is because of the comma in the price: the string is automatically converted to a float, so I have no opportunity to use a .replace function.
ValueError: could not convert string to float: '1,001.30'
I cannot seem to find a workaround to this issue, so any help would be very appreciated. Thanks.
import pandas_datareader.data as web

def money(stock):
    # df = web.DataReader(stock, "google", start=start, end=end)
    df2 = web.get_quote_google(stock)
There currently seems to be a compatibility issue between pandas and pandas_datareader. However, this might solve your problem using yahoo-finance:
Use pip install yahoo-finance to install the module, then run:
import yahoo_finance
import pandas as pd
symbol = yahoo_finance.Share("AMZN")
google_df = symbol.get_price()
This gives me the price of Amazon with no error.
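Separately, if you can get hold of the raw quote string, the comma itself is easy to strip before converting (a minimal sketch, independent of either library):
price = float('1,001.30'.replace(',', ''))   # 1001.3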
