I get an error when I try to make a DataFrame after cleaning my data. The code is as follows:
data_clean = pd.DataFrame(cleaner_data,columns=['tweet'])
data_clean.head()
and the error info:
ValueError Traceback (most recent call last)
<ipython-input-62-1d07a4d30120> in <module>
----> 1 data_clean = pd.DataFrame(cleaner_data,columns=['tweet'])
2 data_clean.head()
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
507 )
508 else:
--> 509 raise ValueError("DataFrame constructor not properly called!")
510
511 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
I don't know how to solve it. It says the DataFrame constructor was not properly called.
Do it like this:
df_clean = cleaner_data['tweet']
df_clean.head()
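For context, that ValueError is typically raised when the data argument is something the DataFrame constructor does not recognize, such as a plain string. A minimal sketch with hypothetical data (not the asker's actual cleaner_data):

import pandas as pd

tweets = ["first cleaned tweet", "second cleaned tweet"]   # hypothetical list of cleaned strings
data_clean = pd.DataFrame(tweets, columns=['tweet'])       # list-like data is accepted
# pd.DataFrame("first cleaned tweet", columns=['tweet'])   # a bare string raises the error above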
For the Python API for AzureML tabular datasets (azureml.data.TabularDataset), two experimental methods have been introduced:
download(stream_column, target_path=None, overwrite=False, ignore_not_found=True)
mount(stream_column, mount_point=None)
The parameter stream_column is defined as "The stream column to mount or download."
What is the actual meaning of stream_column? I don't see an example anywhere.
Any pointer would be helpful.
The stack trace:
Method download: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_11561/3904436543.py in <module>
----> 1 tab_dataset.download(target_path="../data/tabular")
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/_base_sdk_common/_docstring_wrapper.py in wrapped(*args, **kwargs)
50 def wrapped(*args, **kwargs):
51 module_logger.warning("Method {0}: {1} {2}".format(func.__name__, _method_msg, _experimental_link_msg))
---> 52 return func(*args, **kwargs)
53 return wrapped
54
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
130 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
131 try:
--> 132 return func(*args, **kwargs)
133 except Exception as e:
134 if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):
TypeError: download() missing 1 required positional argument: 'stream_column'
Update on 5 March 2022
I posted this as a support ticket with Azure. The following is the answer I received:
As you can see from our documentation of TabularDataset Class,
the “stream_column” parameter is required. So, that error is occurring
because you are not passing any parameters when you are calling the
download method. The “stream_column” parameter should have the
stream column to download/mount. So, you need to pass the column name
that contains the paths from which the data will be streamed.
Please find an example here.
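Putting the signatures above together with the support answer, the call might look like the sketch below; "image_path" is a purely hypothetical name for the column that holds the paths from which the data is streamed:

# Hypothetical: 'image_path' stands for whatever column in tab_dataset contains the stream paths.
tab_dataset.download(stream_column="image_path", target_path="../data/tabular")

# Or mount the stream column instead of downloading it:
# tab_dataset.mount(stream_column="image_path", mount_point="/tmp/tabular_mount")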
My initial data is a pandas DataFrame with columns 'title' and 'label'. I want to make a custom dataset from it, so I made the dataset like below. I'm working in Google Colab.
class newsDataset(torch.utils.data.Dataset):
    def __init__(self, train=True, transform=None):
        if train:
            self.file = ttrain
        else:
            self.file = ttest
        self.text_list = self.file['title'].values.tolist()
        self.class_list = self.file['label'].values.tolist()

    def __len__(self):
        return len(self.text_list)

    def __getitem__(self, idx):
        label = self.class_list[idx]
        text = self.text_list[idx]
        if self.transform is not None:
            text = self.transform(text)
        return label, text
and this is how I call the DataLoader:
trainset=newsDataset()
train_iter = DataLoader(trainset)
iter(train_iter).next()
and it gives
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-153-9872744bc8a9> in <module>()
----> 1 iter(train_iter).next()
5 frames
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataset.py in __getattr__(self, attribute_name)
81 return function
82 else:
---> 83 raise AttributeError
84
85 #classmethod
AttributeError:
There was no exact error message. Can anybody help me?
Please add the following missing line to your __init__ function:
self.transform = transform
You don't have a self.transform attribute, so you need to initialize it in the __init__ method.
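A minimal sketch of the corrected __init__ (everything else stays the same):

    def __init__(self, train=True, transform=None):
        if train:
            self.file = ttrain
        else:
            self.file = ttest
        self.transform = transform  # store the transform so __getitem__ can find it
        self.text_list = self.file['title'].values.tolist()
        self.class_list = self.file['label'].values.tolist()

With self.transform set as a normal instance attribute, __getitem__ finds it directly and Dataset's __getattr__ fallback (which raises the bare AttributeError in the traceback) is never reached.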
I am working in a Jupyter notebook using Python 3. I am trying to load an .asc file containing 2 columns with lin_data1 = np.genfromtxt(outdir+"/test_22/mp_harmonic_im_r8.00.ph.asc"), and I am getting the following error:
ValueError Traceback (most recent call last)
<ipython-input-152-5d1a4cbeab20> in <module>
1 # format of the path: SIMULATION-NAME/output-NNNN/PARFILE-NAME
2
----> 3 lin_data1 = np.genfromtxt(outdir+"/test_22/mp_harmonic_im_r8.00.ph.asc")
~/.local/lib/python3.8/site-packages/numpy/lib/npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows, encoding)
2078 # Raise an exception ?
2079 if invalid_raise:
-> 2080 raise ValueError(errmsg)
2081 # Issue a warning ?
2082 else:
ValueError: Some errors were detected !
Line #2 (got 2 columns instead of 3)
Line #3 (got 2 columns instead of 3)
DATA SET HERE https://drive.google.com/open?id=1r24rrKWcIpA1x34tPY8olJFMtjzl0IRn
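The error means genfromtxt expected 3 columns (inferred from the first line it parsed) but found only 2 on later lines, which usually points to a header, comment, or otherwise inconsistent line near the top of the file. A minimal sketch of two ways to investigate, reusing the same path (outdir assumed to be defined as in the question):

import numpy as np

path = outdir + "/test_22/mp_harmonic_im_r8.00.ph.asc"

# Look at the first few raw lines to see where the field counts differ.
with open(path) as f:
    for _ in range(5):
        print(repr(f.readline()))

# Or skip the inconsistent lines instead of raising (they are reported as a warning).
lin_data1 = np.genfromtxt(path, invalid_raise=False)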
I am trying to convert my time series to datetime. To do that I needed to make all the numbers, e.g. (1256, 430, 7), the same width, e.g. (1256, 0430, 0007), for to_datetime() to work.
So first I separated the entries according to their length, added the required number of zeros, and concatenated the separated Series back into one.
First error
This error was sorted out by using Series.append(). Then I tried to_datetime().
Second error
I can't figure out what I am doing wrong.
I updated my pandas library to the latest version, but the problem remains. I also tried this on Google Colab, thinking there might be some problem with my local pandas installation.
a='0'+arr_time[arr_time.astype(str).str.len()==3].astype(int).astype(str)
b='0'+dep_time[dep_time.astype(str).str.len()==3].astype(int).astype(str)
c='00'+arr_time[arr_time.astype(str).str.len()==2].astype(int).astype(str)
d='00'+dep_time[dep_time.astype(str).str.len()==2].astype(int).astype(str)
e='000'+arr_time[arr_time.astype(str).str.len()==1].astype(int).astype(str)
f='000'+dep_time[dep_time.astype(str).str.len()==1].astype(int).astype(str)
g=arr_time[arr_time.astype(str).str.len()==4].astype(int).astype(str)
h=dep_time[dep_time.astype(str).str.len()==4].astype(int).astype(str)
arr_time=pd.concat([a,c,e,g])
dep_time=pd.concat([b,d,f,h])
'''concat() was then replaced by append(); the ERROR detail is below:
{AttributeError Traceback (most recent call last)
<ipython-input-20-61e7a2e98b70> in <module>()
----> 1 arr_time=pd.concat([aa,ba,ca,pa])
2 dep_time=pd.concat([ad,bd,cd,pa])
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5066 return self[name]
-> 5067 return object.__getattribute__(self, name)
5068
5069 def __setattr__(self, name, value):
AttributeError: 'Series' object has no attribute 'concat'}'''
arr_time=a.append(c).append(e).append(g)
dep_time=b.append(d).append(f).append(h)
datetime=arr_time.to_datetime(format="%H%M")
'''second error (BOTH OF THEM LOOK ALIKE):
{AttributeError Traceback (most recent call last)
<ipython-input-13-5a63dad5c284> in <module>
----> 1 datetime=arr_time.to_datetime(format="%H%M")
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5066 return self[name]
-> 5067 return object.__getattribute__(self, name)
5068
5069 def __setattr__(self, name, value):
AttributeError: 'Series' object has no attribute 'to_datetime'}'''
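For reference, to_datetime is a top-level pandas function, not a Series method, and the zero-padding can be done in one step with str.zfill. A minimal sketch, assuming arr_time is a Series of integer times such as 1256, 430, 7 (hypothetical values):

import pandas as pd

arr_time = pd.Series([1256, 430, 7])                      # hypothetical example data

arr_str = arr_time.astype(int).astype(str).str.zfill(4)   # pad every value to 4 digits
arr_dt = pd.to_datetime(arr_str, format="%H%M")           # call the module-level function
print(arr_dt)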
I am trying to select sensors by placing a box around their geographic coordinates:
In [1]: lat_min, lat_max = lats(data)
lon_min, lon_max = lons(data)
print(np.around(np.array([lat_min, lat_max, lon_min, lon_max]), 5))
Out[1]: [ 32.87248 33.10181 -94.37297 -94.21224]
In [2]: select_sens = sens[(lat_min<=sens['LATITUDE']) & (sens['LATITUDE']<=lat_max) &
(lon_min<=sens['LONGITUDE']) & (sens['LONGITUDE']<=lon_max)].copy()
Out[2]: ---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-7881f6717415> in <module>()
4 lon_min, lon_max = lons(data)
5 select_sens = sens[(lat_min<=sens['LATITUDE']) & (sens['LATITUDE']<=lat_max) &
----> 6 (lon_min<=sens['LONGITUDE']) & (sens['LONGITUDE']<=lon_max)].copy()
7 sens_data = data[data['ID'].isin(select_sens['ID'])].copy()
8 sens_data.describe()
/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
703 return NotImplemented
704 elif isinstance(other, (np.ndarray, pd.Index)):
--> 705 if len(self) != len(other):
706 raise ValueError('Lengths must match to compare')
707 return self._constructor(na_op(self.values, np.asarray(other)),
TypeError: len() of unsized object
Of course, sens is a pandas DataFrame. Even when I use .where() it raises the same error. I am completely stumped, because it is a simple comparison that shouldn't raise any errors. Even the data types match:
In [3]: sens.dtypes
Out[3]: ID object
COUNTRY object
STATE object
COUNTY object
LENGTH float64
NUMBER object
NAME object
LATITUDE float64
LONGITUDE float64
dtype: object
So what is going on?!?
-----EDIT------
As per Ethan Furman's answer, I made the following changes:
In [2]: select_sens = sens[([lat_min]<=sens['LATITUDE']) & (sens['LATITUDE']<=[lat_max]) &
([lon_min]<=sens['LONGITUDE']) & (sens['LONGITUDE']<=[lon_max])].copy()
And (drumroll) it worked... But why?
I'm not familiar with NumPy or Pandas, but the error is saying that one of the objects in the comparison if len(self) != len(other) does not have a __len__ method and therefore has no length.
Try doing print(sens_data) to see if you get a similar error.
I found a similar issue and think the problem may be related to the Python version you are using.
I wrote my code in Spyder with
Python 3.6.1 | Anaconda 4.4.0 (64-bit)
but then passed it to someone also using Spyder, but with
Python 3.5.2 | Anaconda 4.2.0 (64-bit)
I had a numpy.float64 object, MinWD.MinWD[i] (as far as I understand, similar to lat_min, lat_max, lon_min and lon_max in your code):
In [92]: type(MinWD.MinWD[i])
Out[92]: numpy.float64
and a Pandas data frame WatDemandCur with one column called Percentages
In [96]: type(WatDemandCur)
Out[96]: pandas.core.frame.DataFrame
In [98]: type(WatDemandCur['Percentages'])
Out[98]: pandas.core.series.Series
and I wanted to do the following comparison:
In [99]: MinWD.MinWD[i]==WatDemandCur.Percentages
There was no problem with this line when running the code on my machine (Python 3.6.1).
But my friend got something similar to your error with Python 3.5.2:
MinWD.MinWD[i]==WatDemandCur.Percentages
Traceback (most recent call last):
File "<ipython-input-99-3e762b849176>", line 1, in <module>
MinWD.MinWD[i]==WatDemandCur.Percentages
File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\ops.py", line 741, in wrapper
if len(self) != len(other):
TypeError: len() of unsized object
My solution to his problem was to change the code to
[MinWD.MinWD[i]==x for x in WatDemandCur.Percentages]
and it worked in both versions!
With this and your evidence, I would assume that it is not possible to compare numpy.float64 (and perhaps NumPy integer) objects with a pandas Series in that version, and this could be partly related to the fact that the former have no len() function.
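A quick sketch that checks that claim (hasattr simply tests whether __len__ exists):

import numpy as np

x = np.float64(1.0)
print(hasattr(x, '__len__'))    # False: a NumPy scalar has no length
print(hasattr([x], '__len__'))  # True: wrapping it in a list gives it one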
Just out of curiosity, I did some tests with plain float and int objects (note the difference from the numpy.float64 object):
In [122]: Temp=1
In [123]: Temp2=1.0
In [124]: type(Temp)
Out[124]: int
In [125]: type(Temp2)
Out[125]: float
In [126]: len(Temp)
Traceback (most recent call last):
File "<ipython-input-126-dc80ab11ca9c>", line 1, in <module>
len(Temp)
TypeError: object of type 'int' has no len()
In [127]: len(Temp2)
Traceback (most recent call last):
File "<ipython-input-127-a1b836f351d2>", line 1, in <module>
len(Temp2)
TypeError: object of type 'float' has no len()
Temp==WatDemandCur.Percentages
Temp2==WatDemandCur.Percentages
Both worked!
Conclusions
In another Python version your code should work!
The problem with the comparison is specific to NumPy floats and perhaps NumPy integers.
When you include [] or when I create the list with my solution, the type of the object changes from a numpy.float64 to a list, and in that way it works fine.
Although the problem seems to be related to the fact that numpy.float64 objects have no len() function, plain floats and integers, which do not have a len() function either, do work.
Hope some of this works for you or someone else facing a similar issue.