Deleting index in elasticsearch - python-3.x

I want to delete an entire Elasticsearch index that I created with the following code in a Python notebook:
es.index(index='para', doc_type='people', id=1, body={
    "name": "Farid ullah",
    "height": "160",
    "age": "23",
    "gender": "male",
    "date of birth": "04/02/1994",
    "Qualification": "BS in Software engineering"
})
The delete command is as follows:
es.delete(index='para', doc_type='people')
but I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-7-26c24345ae23> in <module>()
----> 1 es.delete(index='para', doc_type='people')
C:\Users\Farid ullah\Anaconda3\lib\site-packages\elasticsearch\client\utils.py in _wrapped(*args, **kwargs)
71 if p in kwargs:
72 params[p] = kwargs.pop(p)
---> 73 return func(*args, params=params, **kwargs)
74 return _wrapped
75 return _wrapper
TypeError: delete() missing 1 required positional argument: 'id'
Can I not delete the entire index?
Is there any way to delete it without specifying the id of a particular document?

In your case, 'people' is not an index, it's a type. The index name is 'para'.
I don't know the Python API well, but you should try something like:
es.delete(index='para')
In this doc:
http://elasticsearch-py.readthedocs.io/en/master/api.html
it is suggested to use something like:
es.indices.delete(index='para')
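For reference, a minimal sketch of deleting the whole index with the elasticsearch-py client. The local default connection and the ignore=[404] flag (so a missing index does not raise) are my additions, not part of the question:
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local cluster on the default port

# Delete the entire 'para' index; suppress the error if it does not exist
es.indices.delete(index='para', ignore=[404])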

Related

Azure ML Tabular Dataset : missing 1 required positional argument: 'stream_column'

For the Python API for tabular datasets in Azure ML (azureml.data.TabularDataset), two experimental methods have been introduced:
download(stream_column, target_path=None, overwrite=False, ignore_not_found=True)
mount(stream_column, mount_point=None)
The parameter stream_column is defined as "The stream column to mount or download."
What is the actual meaning of stream_column? I don't see an example anywhere.
Any pointer will be helpful.
The stack trace:
Method download: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_11561/3904436543.py in <module>
----> 1 tab_dataset.download(target_path="../data/tabular")
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/_base_sdk_common/_docstring_wrapper.py in wrapped(*args, **kwargs)
50 def wrapped(*args, **kwargs):
51 module_logger.warning("Method {0}: {1} {2}".format(func.__name__, _method_msg, _experimental_link_msg))
---> 52 return func(*args, **kwargs)
53 return wrapped
54
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
130 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
131 try:
--> 132 return func(*args, **kwargs)
133 except Exception as e:
134 if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):
TypeError: download() missing 1 required positional argument: 'stream_column'
Update on 5th March, 2022
I posted this as a support ticket with Azure. Following is the answer I have received:
As you can see from our documentation of TabularDataset Class,
the “stream_column” parameter is required. So, that error is occurring
because you are not passing any parameters when you are calling the
download method. The “stream_column” parameter should have the
stream column to download/mount. So, you need to pass the column name
that contains the paths from which the data will be streamed.
Please find an example here.
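Based on that answer, a hedged sketch of the call. The column name 'image_path' is a made-up placeholder for whichever column in your tabular dataset holds the file paths to be streamed:
# 'image_path' is hypothetical: the column containing the paths to stream
tab_dataset.download(stream_column="image_path", target_path="../data/tabular")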

Why does my custom dataset give an attribute error?

My initial data is a pandas DataFrame with the columns 'title' and 'label'. I want to make a custom dataset from it, so I built the dataset as below. I'm working on Google Colab.
class newsDataset(torch.utils.data.Dataset):
    def __init__(self, train=True, transform=None):
        if train:
            self.file = ttrain
        else:
            self.file = ttest
        self.text_list = self.file['title'].values.tolist()
        self.class_list = self.file['label'].values.tolist()

    def __len__(self):
        return len(self.text_list)

    def __getitem__(self, idx):
        label = self.class_list[idx]
        text = self.text_list[idx]
        if self.transform is not None:
            text = self.transform(text)
        return label, text
and this is how I call the DataLoader:
trainset = newsDataset()
train_iter = DataLoader(trainset)
iter(train_iter).next()
and it gives:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-153-9872744bc8a9> in <module>()
----> 1 iter(train_iter).next()
5 frames
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataset.py in __getattr__(self, attribute_name)
81 return function
82 else:
---> 83 raise AttributeError
84
85 #classmethod
AttributeError:
There was no exact error message. Can anybody help me?
Please add the following missing line to your __init__ function:
self.transform = transform
You don't have a self.transform attribute, so you need to initialize it in the __init__ method.
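A minimal sketch of the corrected constructor, assuming ttrain and ttest are the pandas DataFrames referenced in the question:
def __init__(self, train=True, transform=None):
    self.file = ttrain if train else ttest  # ttrain/ttest assumed to be DataFrames defined elsewhere
    self.transform = transform              # the missing assignment that causes the AttributeError
    self.text_list = self.file['title'].values.tolist()
    self.class_list = self.file['label'].values.tolist()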

I got an error like this: DataFrame constructor not properly called

I get an error when I try to build a DataFrame after cleaning the data. The code is as follows:
data_clean = pd.DataFrame(cleaner_data,columns=['tweet'])
data_clean.head()
and the error info:
ValueError Traceback (most recent call last)
<ipython-input-62-1d07a4d30120> in <module>
----> 1 data_clean = pd.DataFrame(cleaner_data,columns=['tweet'])
2 data_clean.head()
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
507 )
508 else:
--> 509 raise ValueError("DataFrame constructor not properly called!")
510
511 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
I don't know how to solve it. The message says the DataFrame constructor was not called properly.
Do it like this:
df_clean = cleaner_data['tweet']
df_clean.head()
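For completeness, this ValueError usually means the first argument to pd.DataFrame is not something the constructor accepts directly, such as a plain string or a generator. A minimal sketch, assuming cleaner_data is an iterable of cleaned tweet strings rather than an existing DataFrame:
import pandas as pd

# Materialise the iterable into a list before handing it to the constructor
data_clean = pd.DataFrame({'tweet': list(cleaner_data)})
data_clean.head()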

Import CSV - Writing data to a file from the dictionary - Error

I am new to Python. I am trying to write data from a dictionary to a CSV file.
def write_info(self):
    fname = 'userinfo.csv'
    field_names = ['Username', 'Password']
    with open(fname, 'w') as op_file:
        op_writer = csv.DictWriter(op_file, fieldnames=field_names)
        op_writer.writeheader()
        for row in self.user_dict:
            op_writer.writerow(row)
Can you tell me how to read the dictionary and write it to the file? When I print the dictionary self.user_dict I can see the values.
When execution reaches
for row in self.user_dict:
    op_writer.writerow(row)
I get the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-44-e1069dd9aafc> in <module>()
28
29 # writing to file
---> 30 auth.write_info()
<ipython-input-44-e1069dd9aafc> in write_info(self)
17 op_writer.writeheader()
18 for row in self.user_dict:
---> 19 op_writer.writerow(row)
20
21 # fill in your code
~\AppData\Local\Continuum\anaconda3\lib\csv.py in writerow(self, rowdict)
153
154 def writerow(self, rowdict):
--> 155 return self.writer.writerow(self._dict_to_list(rowdict))
156
157 def writerows(self, rowdicts):
~\AppData\Local\Continuum\anaconda3\lib\csv.py in _dict_to_list(self, rowdict)
146 def _dict_to_list(self, rowdict):
147 if self.extrasaction == "raise":
--> 148 wrong_fields = rowdict.keys() - self.fieldnames
149 if wrong_fields:
150 raise ValueError("dict contains fields not in fieldnames: "
AttributeError: 'str' object has no attribute 'keys'
The self.user_dict variable does not contain a dict.
The way you want it, user_dict should be a list of dicts:
user_dict = []
user_dict.append({'username': 'joe', 'password': 'test'})
user_dict.append({'username': 'doe', 'password': 'test'})
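A minimal sketch of the corrected method under that assumption. Note that the dict keys must match the writer's fieldnames exactly (DictWriter raises ValueError for unknown keys by default), and newline='' is the usual idiom when opening CSV files for the csv module:
import csv

def write_info(self):
    fname = 'userinfo.csv'
    field_names = ['Username', 'Password']
    with open(fname, 'w', newline='') as op_file:
        op_writer = csv.DictWriter(op_file, fieldnames=field_names)
        op_writer.writeheader()
        # self.user_dict is assumed to be a list of dicts keyed by 'Username' and 'Password'
        op_writer.writerows(self.user_dict)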

How do you use dask + distributed for NFS files?

Working from Matthew Rocklin's post on distributed data frames with Dask, I'm trying to distribute some summary statistics calculations across my cluster. Setting up the cluster with dcluster ... works fine. Inside a notebook,
import dask.dataframe as dd
from distributed import Executor, progress
e = Executor('...:8786')
df = dd.read_csv(...)
The file I'm reading is on an NFS mount that all the worker machines have access to. At this point I can look at df.head() for example and everything looks correct. From the blog post, I think I should be able to do this:
df_future = e.persist(df)
progress(df_future)
# ... wait for everything to load ...
df_future.head()
But that's an error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-26-8d59adace8bf> in <module>()
----> 1 fraudf.head()
/work/analytics2/analytics/python/envs/analytics/lib/python3.5/site-packages/dask/dataframe/core.py in head(self, n, compute)
358
359 if compute:
--> 360 result = result.compute()
361 return result
362
/work/analytics2/analytics/python/envs/analytics/lib/python3.5/site-packages/dask/base.py in compute(self, **kwargs)
35
36 def compute(self, **kwargs):
---> 37 return compute(self, **kwargs)[0]
38
39 #classmethod
/work/analytics2/analytics/python/envs/analytics/lib/python3.5/site-packages/dask/base.py in compute(*args, **kwargs)
108 for opt, val in groups.items()])
109 keys = [var._keys() for var in variables]
--> 110 results = get(dsk, keys, **kwargs)
111
112 results_iter = iter(results)
/work/analytics2/analytics/python/envs/analytics/lib/python3.5/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
55 results = get_async(pool.apply_async, len(pool._pool), dsk, result,
56 cache=cache, queue=queue, get_id=_thread_get_id,
---> 57 **kwargs)
58
59 return results
/work/analytics2/analytics/python/envs/analytics/lib/python3.5/site-packages/dask/async.py in get_async(apply_async, num_workers, dsk, result, cache, queue, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, **kwargs)
479 _execute_task(task, data) # Re-execute locally
480 else:
--> 481 raise(remote_exception(res, tb))
482 state['cache'][key] = res
483 finish_task(dsk, key, state, results, keyorder.get)
AttributeError: 'Future' object has no attribute 'head'
Traceback
---------
File "/work/analytics2/analytics/python/envs/analytics/lib/python3.5/site-packages/dask/async.py", line 264, in execute_task
result = _execute_task(task, data)
File "/work/analytics2/analytics/python/envs/analytics/lib/python3.5/site-packages/dask/async.py", line 246, in _execute_task
return func(*args2)
File "/work/analytics2/analytics/python/envs/analytics/lib/python3.5/site-packages/dask/dataframe/core.py", line 354, in <lambda>
dsk = {(name, 0): (lambda x, n: x.head(n=n), (self._name, 0), n)}
What's the right approach to distributing a data frame when it comes from a normal file system instead of HDFS?
Dask is trying to use the single-machine scheduler, which is the default when you create a dataframe with the plain dask library. Switch the default to your cluster with the following lines:
import dask
dask.set_options(get=e.get)
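Putting it together, a sketch under the older dask/distributed API used in the question (newer releases replace Executor with Client and dask.set_options with dask.config, so treat this as matching the question's versions rather than current ones):
import dask
import dask.dataframe as dd
from distributed import Executor, progress

e = Executor('scheduler-host:8786')    # hypothetical scheduler address
dask.set_options(get=e.get)            # route dask computations through the cluster

df = dd.read_csv('/nfs/data/*.csv')    # hypothetical NFS path visible to all workers
df = e.persist(df)                     # persist returns a dask collection backed by futures
progress(df)
df.head()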
