'DeltaMergeBuilder' object has no attribute 'WhenNotMatchedInsert' - databricks

While doing a merge with a Delta table, I am getting the error below.
'DeltaMergeBuilder' object has no attribute 'WhenNotMatchedInsert'
from delta.tables import *
delta_df = DeltaTable.forPath(spark,"dbfs:/user/hive/warehouse/FileStore/tables/stream_write2")
delta_df.alias("t").merge(
df.alias("s"),
"target.empid=source.empid").whenMatchedUpdate(set =
{
"name":"source.name",
"city":"source.city",
"country":"source.country",
"contactno":"source.contactno"
}
).WhenNotMatchedInsert(Values =
{
"empid":"source.empid",
"name":"source.name",
"city":"source.city",
"country":"source.country",
"contactno":"source.contactno"
}
)
.execute()
error:
AttributeError: 'DeltaMergeBuilder' object has no attribute 'WhenNotMatchedInsert'
AttributeError Traceback (most recent call last)
<command-3810732791373279> in <cell line: 1>()
1 delta_df.alias("t").merge(
2 df.alias("s"),
3 "target.empid=source.empid").whenMatchedUpdate(set =
4 {
5 "name":"source.name",
AttributeError: 'DeltaMergeBuilder' object has no attribute 'WhenNotMatchedInsert'
Command took 0.21 seconds -- byat 1/5/2023, 6:09:03 PM on Cluster
I am working on an upsert into a Delta table but am getting the error above.

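The error is simple: your method name starts with an upper-case W, but it should be lower-case: whenNotMatchedInsert. The keyword argument should likewise be lower-case values, not Values. Also note that the merge condition refers to target and source while the aliases you defined are t and s, so the condition would fail to resolve once the method name is fixed.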
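A minimal sketch of the corrected call chain, assuming the same column names as in the question. The helper name upsert_employees is hypothetical; it expects a DeltaTable and a source DataFrame that you have already created.

```python
def upsert_employees(delta_table, source_df):
    """Upsert source_df into delta_table, matching rows on empid.

    Hypothetical helper: delta_table is expected to be a delta.tables.DeltaTable
    and source_df a Spark DataFrame with the columns used in the question.
    """
    columns = ["name", "city", "country", "contactno"]
    # Map each target column to the matching column of the source alias "s".
    update_map = {col: f"s.{col}" for col in columns}
    insert_map = {"empid": "s.empid", **update_map}

    (
        delta_table.alias("t")
        # The condition must use the aliases defined here ("t" and "s").
        .merge(source_df.alias("s"), "t.empid = s.empid")
        .whenMatchedUpdate(set=update_map)
        # Lower-case "w" in the method name and lower-case "values" keyword.
        .whenNotMatchedInsert(values=insert_map)
        .execute()
    )
```

Wrapping the whole chain in parentheses also lets .execute() sit on its own line, which the original snippet could not do.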

Related

AttributeError: 'function' object has no attribute 'isna'

I applied isna to my original dataset and saved the result to a new variable.
The new variable does not behave like a DataFrame: the output shows this error (AttributeError: 'function' object has no attribute 'isna') when I check its shape.
When I print the new variable, the output shows a description of it.
df1Books = df1.dropna
print(df1Books)
This prints a description of the new variable df1Books.
And
df1Books.head()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-20-8f073fd0d9cc> in <module>
----> 1 df1Books.head()
AttributeError: 'function' object has no attribute 'head'
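The snippet above assigns df1.dropna without parentheses, which stores the method itself rather than its result; that is probably why the variable does not behave like a DataFrame. A minimal sketch, using a made-up two-row frame (the data and column names are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical stand-in for the original dataset.
df1 = pd.DataFrame({"title": ["A", None], "pages": [100, 200]})

# Without parentheses this is only a reference to the method.
not_called = df1.dropna

# With parentheses the method runs and returns a new DataFrame.
df1_books = df1.dropna()

print(type(not_called))
print(df1_books.head())
```

Printing not_called shows a description of the method, which matches what the question reports for print(df1Books).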

Using zoneinfo with pandas.date_range

I am trying to use zoneinfo instead of pytz. I am running into a problem using zoneinfo to initiate dates and passing it on to pd.date_range.
Below is an example of doing exactly the same thing with pytz and with zoneinfo. However, passing the zoneinfo dates to pd.date_range raises an error.
pytz example:
start_date = datetime(2021, 1, 1, 0, 0, 0)
end_date = datetime(2024, 1, 1, 0, 0, 0) # exclusive end range
pt = pytz.timezone('Canada/Pacific')
start_date = pt.localize(start_date)
end_date = pt.localize(end_date)
pd.date_range(start_date, end_date-timedelta(days=1), freq='d')
zoneinfo example:
start_date1 = '2021-01-01 00:00:00'
start_date1 = datetime.strptime(start_date1, '%Y-%m-%d %H:%M:%S').replace(microsecond=0, second=0, minute=0, tzinfo=ZoneInfo("America/Vancouver"))
end_date1 = start_date1 + relativedelta(years=3)
pd.date_range(start_date1, end_date1-timedelta(days=1), freq='d')
Yet, when using zoneinfo I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/Documents/GitHub/virtual/lib/python3.9/site-packages/pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslibs.tzconversion.tz_convert_from_utc_single'
Traceback (most recent call last):
File "pandas/_libs/tslibs/timezones.pyx", line 266, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/Documents/GitHub/virtual/lib/python3.9/site-packages/pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslibs.tzconversion.tz_convert_from_utc_single'
Traceback (most recent call last):
File "pandas/_libs/tslibs/timezones.pyx", line 266, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/var/folders/vp/7ptlp5l934vdh1lvmpgk4qyc0000gn/T/ipykernel_67190/3566591779.py in <module>
5 end_date1 = start_date1 + relativedelta(years=3)
6
----> 7 pd.date_range(start_date1, end_date1-timedelta(days=1), freq='d')
8
9 # Because certain distributions will be a result of combined distributions,
~/Documents/GitHub/virtual/lib/python3.9/site-packages/pandas/core/indexes/datetimes.py in date_range(start, end, periods, freq, tz, normalize, name, closed, **kwargs)
1095 freq = "D"
1096
-> 1097 dtarr = DatetimeArray._generate_range(
1098 start=start,
1099 end=end,
~/Documents/GitHub/virtual/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in _generate_range(cls, start, end, periods, freq, tz, normalize, ambiguous, nonexistent, closed)
450
451 if tz is not None and index.tz is None:
--> 452 arr = tzconversion.tz_localize_to_utc(
453 index.asi8, tz, ambiguous=ambiguous, nonexistent=nonexistent
454 )
~/Documents/GitHub/virtual/lib/python3.9/site-packages/pandas/_libs/tslibs/tzconversion.pyx in pandas._libs.tslibs.tzconversion.tz_localize_to_utc()
~/Documents/GitHub/virtual/lib/python3.9/site-packages/pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Testing the parameters:
start_date==start_date1
and
end_date==end_date1
Both tests result in True.
If I understand correctly, you want to create a date range (1D frequency) using ZoneInfo. If so, I see a few things going on with your code.
#1 When dealing with datetimes, make sure the object has the correct dtype. I believe datetime64 will work better.
#2 From the provided code, I don't think 'strptime' or 'replace' is needed. To use "America/Vancouver" with ZoneInfo, you can parse start_date1 into year, month, day, hour and minute.
#3 Once start_date1 is parsed, you can add 3 (or another number) to the year to create the end date.
The steps above will create a DatetimeIndex over the specified range.
Datetimes are always tricky. As always, you can get to the same destination by different paths; this is just one of them.
import datetime as dt
from zoneinfo import ZoneInfo
import pandas as pd
start_date_str = '2021-01-01 00:00:00'
start_date_datetime64 = pd.to_datetime(start_date_str) # change dtype to datetime64
# split the timestamp into its components
year = start_date_datetime64.year
month = start_date_datetime64.month
day = start_date_datetime64.day
hour = start_date_datetime64.hour
minute = start_date_datetime64.minute
# rebuild timezone-aware start and end datetimes; the end is 3 years later
start_date_formatted = dt.datetime(year, month, day, hour, minute, tzinfo=ZoneInfo("America/Vancouver"))
end_date_formatted = dt.datetime(year + 3, month, day, hour, minute, tzinfo=ZoneInfo("America/Vancouver"))
result = pd.date_range(start_date_formatted, end_date_formatted-pd.Timedelta(days=1), freq='d')
Output: a DatetimeIndex with dtype='datetime64[ns, America/Vancouver]', length=1095, freq='D'
This error was the result of an incompatibility between the pandas version and the nbformat version. Once I updated both to the newest version, the code worked with no error.
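For comparison, a shorter sketch of the same daily range that passes the zone through the tz parameter of pd.date_range instead of building aware datetimes by hand. It assumes a recent pandas release that accepts ZoneInfo objects; the variable names are illustrative only.

```python
from zoneinfo import ZoneInfo

import pandas as pd

tz = ZoneInfo("America/Vancouver")

# Naive start/end strings are interpreted in the given time zone.
# The end is inclusive, so 2023-12-31 is the last day before 2024-01-01.
days = pd.date_range(start="2021-01-01", end="2023-12-31", freq="D", tz=tz)

print(len(days))
print(days[0], days[-1])
```

The three years 2021 to 2023 contain no leap day, so the range holds 1095 days, the same length reported above.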

I got an error like this: DataFrame constructor not properly called

I get an error when I try to create a DataFrame after cleaning the data. The code is as follows:
data_clean = pd.DataFrame(cleaner_data,columns=['tweet'])
data_clean.head()
The error info:
ValueError Traceback (most recent call last)
<ipython-input-62-1d07a4d30120> in <module>
----> 1 data_clean = pd.DataFrame(cleaner_data,columns=['tweet'])
2 data_clean.head()
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
507 )
508 else:
--> 509 raise ValueError("DataFrame constructor not properly called!")
510
511 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
I don't know how to solve it. It says the DataFrame constructor was not properly called.
Do it like this:
df_clean = cleaner_data['tweet']
df_clean.head()
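The question does not show what cleaner_data contains. One common way to hit this error is to pass a single string (a scalar) to the constructor, so the sketch below assumes that case; cleaner_text and cleaner_list are hypothetical stand-ins.

```python
import pandas as pd

# Assumption: the cleaned data ended up as one plain string.
cleaner_text = "just one cleaned tweet"
try:
    pd.DataFrame(cleaner_text, columns=["tweet"])
except ValueError as exc:
    message = str(exc)
    print(message)

# A list of strings (one per row) is accepted by the constructor.
cleaner_list = ["first cleaned tweet", "second cleaned tweet"]
data_clean = pd.DataFrame(cleaner_list, columns=["tweet"])
print(data_clean.head())
```

Checking type(cleaner_data) before building the DataFrame is a quick way to see which case applies.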

AttributeError: 'str' object has no attribute 'date'

I have written code as below:
import datetime as dt
d = dt.date(2000, 1, 15)
But I am getting an error:
AttributeError Traceback (most recent call last)
<ipython-input-23-669287944c85> in <module>
1 # for loop - to convert float->str date to month-year format
2 rnum = 0
----> 3 d = dt.date(2000, 1, 15)
4 data_2["Date_F"] = d
5 for dt in data_2["strDate"]:
AttributeError: 'str' object has no attribute 'date'
I am using Jupyter. Please tell me how to resolve this error.
I don't know why you are getting this. I get the output datetime.date(2000, 1, 15).
You may want to check your indentation or something similar.
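One likely cause is visible in the traceback: line 5 of the cell is a for loop that reuses the name dt as its loop variable. Once that loop has run in a Jupyter session, dt refers to a string rather than the datetime module, so running the cell again fails at dt.date(...) with exactly this error. A minimal sketch, using a hypothetical list in place of data_2["strDate"]:

```python
import datetime as dt

# Hypothetical stand-in for data_2["strDate"].
str_dates = ["2000-01-15", "2000-02-15"]

# Problem: the loop variable rebinds the name dt to each string in turn.
for dt in str_dates:
    pass

try:
    dt.date(2000, 1, 15)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Fix: restore the module alias and use a different loop variable name.
import datetime as dt

parsed = [dt.datetime.strptime(s, "%Y-%m-%d").date() for s in str_dates]
print(parsed[0])
```

In a notebook, restarting the kernel or re-running the import cell restores dt after renaming the loop variable.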

AttributeError: 'DataFrame' object has no attribute 'droplevel' in pandas

I am getting a strange (to my understanding) message when I try to drop a level from a multi-indexed pandas dataframe.
For a reproducible example:
toy.to_json()
'{"["ISRG","EPS_diluted"]":{"2004-12-31":0.33,"2005-01-28":0.33,"2005-03-31":0.25,"2005-04-01":0.25,"2005-04-29":0.25},"["DHR","EPS_diluted"]":{"2004-12-31":0.67,"2005-01-28":0.67,"2005-03-31":0.67,"2005-04-01":0.58,"2005-04-29":0.58},"["BDX","EPS_diluted"]":{"2004-12-31":0.75,"2005-01-28":0.75,"2005-03-31":0.72,"2005-04-01":0.72,"2005-04-29":0.72},"["SYK","EPS_diluted"]":{"2004-12-31":0.4,"2005-01-28":0.4,"2005-03-31":0.42,"2005-04-01":0.42,"2005-04-29":0.42},"["BSX","EPS_diluted"]":{"2004-12-31":0.35,"2005-01-28":0.35,"2005-03-31":0.42,"2005-04-01":0.42,"2005-04-29":0.42},"["BAX","EPS_diluted"]":{"2004-12-31":0.18,"2005-01-28":0.18,"2005-03-31":0.36,"2005-04-01":0.36,"2005-04-29":0.36},"["EW","EPS_diluted"]":{"2004-12-31":0.4,"2005-01-28":0.4,"2005-03-31":0.5,"2005-04-01":0.5,"2005-04-29":0.5},"["MDT","EPS_diluted"]":{"2004-12-31":0.44,"2005-01-28":0.45,"2005-03-31":0.45,"2005-04-01":0.45,"2005-04-29":0.16},"["ABT","EPS_diluted"]":{"2004-12-31":0.63,"2005-01-28":0.63,"2005-03-31":0.53,"2005-04-01":0.53,"2005-04-29":0.53}}'
toy.droplevel(level = 1, axis = 1)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-33-982eee5ba162> in <module>()
----> 1 toy.droplevel(level = 1, axis = 1)
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
4370 if self._info_axis._can_hold_identifiers_and_holds_name(name):
4371 return self[name]
-> 4372 return object.__getattribute__(self, name)
4373
4374 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'droplevel'
The problem is that you are using an older pandas version; the documentation for DataFrame.droplevel says:
New in version 0.24.0.
The solution is to use MultiIndex.droplevel:
toy.columns = toy.columns.droplevel(level = 1)
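A small sketch of this workaround on a two-column slice of the toy data (values taken from the JSON above; the variable name flat is illustrative):

```python
import pandas as pd

# Two-level column index: (ticker, metric), as in the question's frame.
columns = pd.MultiIndex.from_tuples(
    [("ISRG", "EPS_diluted"), ("DHR", "EPS_diluted")]
)
toy = pd.DataFrame(
    [[0.33, 0.67], [0.25, 0.58]],
    index=["2004-12-31", "2005-04-01"],
    columns=columns,
)

# Drop the second level from the column index; this does not rely on
# DataFrame.droplevel, only on MultiIndex.droplevel.
flat = toy.copy()
flat.columns = flat.columns.droplevel(1)

print(flat)
```

Working on a copy keeps the original two-level frame available in case the metric level is needed later.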
