I have seen many questions with this title and about this issue, but I still don't understand why it occurs.
I have imported Pandas and Numpy.
Then I read my file using pd.read_excel.
Then I viewed the head of my file using .head()
After I sliced my data, the .head() method still worked fine at first. But then it suddenly started throwing an AttributeError. The error goes away once I re-import my file, but after a while it comes back. What am I doing wrong? I don't understand this error clearly.
import pandas as pd
import numpy as np
sales = pd.read_excel('SALESC.xlsx', header=0)
sales.isnull().sum()
sales["Date"] = pd.to_datetime(sales['Date of document'])
sales = sales[pd.notnull(sales['Quantity sold']) & pd.notnull(sales['Unit selling price including tax'])]
sales = sales.iloc[:,[3,6,8,9,10,11,19,35,39]]
sales.head(5)
Can someone explain the problem and how to resolve it? Thanks in advance.
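One notebook pattern that can produce errors like this (a guess, since the full traceback isn't shown): re-running the cell that reassigns sales applies the slice to an already-sliced frame. A minimal sketch with made-up column names, since SALESC.xlsx isn't available here:

```python
import pandas as pd

# Stand-in frame; the real SALESC.xlsx is not available, so these
# column names are hypothetical.
sales = pd.DataFrame({
    "Date of document": ["2021-01-01", "2021-01-02"],
    "Quantity sold": [3, None],
    "Unit selling price including tax": [9.99, 5.00],
})

# Re-running a cell like `sales = sales.iloc[:, [3, 6, ...]]` mutates
# `sales`, so the second run operates on the already-sliced frame and
# fails. Assigning the slice to a new name keeps re-runs safe:
sales_subset = sales[pd.notnull(sales["Quantity sold"])]
print(sales_subset.shape)  # the original `sales` is untouched
```

If the error persists after isolating the reassignment like this, posting the full traceback would pinpoint exactly which attribute lookup fails.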
I'm trying to use some of the Quantstats modules, specifically the quantstats.reports module, in Anaconda to get some metrics reports on a portfolio I've designed. I'm fairly new to Python/Quantstats and am really just trying to get a feel for the library.
I've written the following code to utilize the report module to spit out a complete html report and save it under the Output folder:
import quantstats as qs
qs.extend_pandas()
stock = qs.utils.download_returns('GLD')
qs.reports.html(stock, output='Output/GLD.html')
I then get the following TypeError:
TypeError: Invalid comparison between dtype=datetime64[ns, America/New_York] and datetime
I believe this may be a result of the datetime64 class being localized to my timezone and datetime remaining TZ naive. Frankly, digging through the Quantstats code has been a little beyond my current skillset.
If anybody has any recommendations for fixes, I would greatly appreciate it.
I came upon this while DDGing exactly the same issue.
Not sure which of your columns has the timezone localisation in it, but
df['date'] = df['date'].dt.tz_localize(None)
will get rid of localization for the column df['date']
Incidentally, the usual situation is that the index of a pandas time series contains np.datetime64 values, but when you assign it to a column via
df['date'] = df.index
the resulting column contains pandas Timestamps.
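Both points can be illustrated with a small self-contained example (the index and return values below are made up):

```python
import pandas as pd

# A tz-aware datetime index, like the one built from downloaded returns.
idx = pd.date_range("2021-01-04", periods=3, freq="D", tz="America/New_York")
df = pd.DataFrame({"ret": [0.01, -0.02, 0.005]}, index=idx)

# Copying the index into a column yields pandas Timestamps.
df["date"] = df.index

# Dropping the localisation makes the column tz-naive, so comparisons
# against plain datetime objects are valid again.
df["date"] = df["date"].dt.tz_localize(None)
print(df["date"].dt.tz)  # None
```

After tz_localize(None), comparisons against naive datetime objects no longer raise the "Invalid comparison" TypeError.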
I got this issue resolved by downgrading yfinance from the latest version (0.1.87) to 0.1.74:
pip install yfinance==0.1.74
I am working on fine-tuning data for an NLP project using the Hugging Face library.
Here is the code I am having trouble with. Has anyone been able to solve this problem?
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
tf_dataset = testdata.to_tf_dataset(
    columns=["input_ids", "token_type_ids", "attention_mask"],
    label_cols=["labels"],
    batch_size=2,
    collate_fn=data_collator,
    shuffle=True
)
NB: I have seen suggestions about upgrading to the latest versions, and I have done that, but the problem persists.
I faced the same problem. In my case I was working with a csv file. I used the following code to load the dataset:
from datasets import load_dataset
dataset_training = load_dataset("csv", data_files="file.csv")
Then the method to_tf_dataset returned:
AttributeError: 'DatasetDict' object has no attribute 'to_tf_dataset'
To overcome this issue I loaded the content as a pandas DataFrame and then converted it with another method:
import pandas as pd
data = pd.read_csv("file.csv")
from datasets import Dataset
dataset = Dataset.from_pandas(data)
After that, the to_tf_dataset method worked correctly. The likely explanation: load_dataset returns a DatasetDict keyed by split name, while to_tf_dataset is defined on individual Dataset objects. Dataset.from_pandas returns a plain Dataset, which sidesteps the problem; selecting a split first, e.g. dataset_training["train"], should also work.
First time posting. I am learning how to use Python and decided I would do so using the Riot Games API.
Anyway, I'm trying to output a Legends of Runeterra leaderboard into a DataFrame, but my DataFrame is not mapping 'correctly'. I've done a lot of Googling and have finally given up, so I thought I'd just ask.
I'm betting it's something obvious!
This is my current query. Nice and simple... (this took me 2 hours :P)
import requests
import pandas
response = requests.get("https://europe.api.riotgames.com/lor/ranked/v1/leaderboards?api_key=RGAPI-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx")
response_file = response.json()
data = pandas.DataFrame.from_dict(response_file,orient='columns')
print(data)
It outputs a DataFrame with a single 'players' column, where each row holds a whole player record. I don't want the 'players' key to be the column; I want the Name, Rank and LP to be the columns. I believe these are called values? But I cannot seem to figure out how to do this.
Any help, or links to posts that I have missed that help resolve this would be amazing.
Thank you
EDIT 13:20 13/02/2021
Attached JSON file as requested
https://pastebin.com/ks4AaXQp
I couldn't figure out how to attach a file here, so I threw it in Pastebin.
Try this (note that read_json expects a JSON string or file path, not an already-parsed dict, so pass the raw response text):
data = pd.read_json(response.text)
If this does not work, post the response_file as a .json file and I'll try to assist you further.
I have managed to resolve this myself.
I didn't specify the 'Key' when creating the DataFrame:
The corrected Code is:
import requests
import pandas
response = requests.get("https://europe.api.riotgames.com/lor/ranked/v1/leaderboards?api_key=RGAPI-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx")
response_file = response.json()
data = pandas.DataFrame(response_file['players'])
print(data)
Output (image not shown): a DataFrame with one column per player field.
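To make the difference between the two constructors concrete, here is a sketch with a mocked-up response; the field names are guesses at the leaderboard schema, not the real API output:

```python
import pandas as pd

# A mocked response shaped like the leaderboard JSON: a dict whose
# "players" key holds a list of records. Field names are illustrative.
response_file = {
    "players": [
        {"name": "PlayerA", "rank": 1, "lp": 1203},
        {"name": "PlayerB", "rank": 2, "lp": 1187},
    ]
}

# Passing the whole dict makes "players" the single column...
whole = pd.DataFrame.from_dict(response_file, orient="columns")

# ...while passing the list under "players" expands each record's
# fields into their own columns.
data = pd.DataFrame(response_file["players"])
print(list(data.columns))  # ['name', 'rank', 'lp']
```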
I have been working on a big dataset with Spark. Last week the following lines of code worked perfectly; now they throw an error: NameError: name 'split' is not defined. Can somebody explain why this is not working and what I should do? Should I define the method? Is it a dependency I should import? The documentation doesn't say I have to import anything in order to use the split method. The code is below.
test_df = spark_df.withColumn(
    "Keywords",
    split(col("Keywords"), "\\|")
)
You can use pyspark.sql.functions.split(), but you first need to import this function (col comes from the same module, so import it too if you haven't already):
from pyspark.sql.functions import col, split
It's better to explicitly import just the functions you need. Do not do from pyspark.sql.functions import *.
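Incidentally, the second argument of split() is a regular expression, which is why the pipe in the question is escaped as \|: an unescaped | is the alternation operator. The same behaviour can be demonstrated with Python's own re module (outside Spark):

```python
import re

# Escaped: the pipe is treated as a literal separator.
print(re.split(r"\|", "a|b|c"))  # ['a', 'b', 'c']

# Unescaped: "|" is regex alternation of two empty patterns, so it
# matches between every character (Python 3.7+ behaviour).
print(re.split("|", "abc"))      # ['', 'a', 'b', 'c', '']
```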
I am trying to get the price of a stock from Google using pandas-datareader.data, but when I call Amazon (whose price right now is over 1,000) it gives me a ValueError. I assume it is because of the comma in the price: the library automatically attempts to convert the string to a float, so I have no opportunity to use a .replace() first.
ValueError: could not convert string to float: '1,001.30'
I cannot seem to find a workaround for this issue, so any help would be much appreciated. Thanks.
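For the conversion itself, stripping the comma before calling float() works; a minimal illustration in plain Python:

```python
# Python's float() rejects thousands separators outright...
try:
    float("1,001.30")
except ValueError as e:
    print(e)  # could not convert string to float: '1,001.30'

# ...so strip the comma first:
price = float("1,001.30".replace(",", ""))
print(price)  # 1001.3
```

When you control the parsing step yourself, pandas.read_csv(..., thousands=',') handles this at read time; the difficulty here is that pandas-datareader does the conversion internally.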
import pandas_datareader.data as web
def money(stock):
    # df = web.DataReader(stock, "google", start=start, end=end)
    df2 = web.get_quote_google(stock)
    return df2
I think there is currently a compatibility issue between pandas and pandas_datareader. However, this might solve your problem using yahoo-finance:
use pip install yahoo-finance to install the module and then run
import yahoo_finance
import pandas as pd
symbol = yahoo_finance.Share("AMZN")
google_df = symbol.get_price()
This gives me no error on the price of Amazon