Dataset + Experiment Run tracking - azure-machine-learning-service

Can't see associated run in Dataset pages in Azure ML Workspace
I want to see when and where a dataset was used in an experiment, but I can't see it. I tried the tutorial notebook.
run.get_details()
Output:
'containerInstance': {'cpuCores': 2, 'memoryGb': 3.5, 'region': None},
'data': {},
'dataReferences': {},...
The detail log doesn't show any dataset information, as shown above, and there is no associated run on the dataset page.
I want to confirm that my dataset was used in the experiment, either from run.get_details() or from the workspace Dataset page.
Is there any workaround for this?

Thanks for the feedback. We are currently working on adding tracking so that the dataset used can be retrieved from an experiment run. We will publish documentation and tutorials once the feature is released.

Thanks for the question. We are adding this logging to get_by_name(). As a workaround, in your training code you can call
Dataset.get()
instead of
Dataset.get_by_name()
and the dataset will then show up. In the run details you should see something like:
'azureml.dataset.get.titanic dataset:1': '{"name": "titanic dataset", "definition": "1", "snapshot": ""}'}
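As a rough sketch of what that workaround looks like inside a training script (this assumes the now-deprecated Dataset.get() signature from the SDK version this thread is about, and reuses the example dataset name from above):

from azureml.core import Run
from azureml.core.dataset import Dataset

# Inside the submitted training script: recover the workspace from the run context.
run = Run.get_context()
ws = run.experiment.workspace

# Dataset.get() (rather than Dataset.get_by_name()) records the dataset reference,
# so it shows up in run.get_details() and on the dataset's run list.
dataset = Dataset.get(ws, name='titanic dataset')
df = dataset.to_pandas_dataframe()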

Related

Setting up Visual Studio Code to run models from Hugging Face

I am trying to import models from Hugging Face and use them in Visual Studio Code.
I installed transformers, tensorflow, and torch.
I have tried looking at multiple tutorials online but have found nothing.
I am trying to run the following code:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
result = classifier("I hate it when I'm sitting under a tree and an apple hits my head.")
print(result)
However, I get the following error:
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Traceback (most recent call last):
File "c:\Users\user\Desktop\Artificial Intelligence\transformers\Workshops\workshop_3.py", line 4, in <module>
classifier = pipeline('sentiment-analysis')
File "C:\Users\user\Desktop\Artificial Intelligence\transformers\src\transformers\pipelines\__init__.py", line 702, in pipeline
framework, model = infer_framework_load_model(
File "C:\Users\user\Desktop\Artificial Intelligence\transformers\src\transformers\pipelines\base.py", line 266, in infer_framework_load_model
raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model distilbert-base-uncased-finetuned-sst-2-english with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForSequenceClassification'>, <class 'transformers.models.distilbert.modeling_distilbert.DistilBertForSequenceClassification'>, <class 'transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertForSequenceClassification'>).
I have already searched online for ways to set up transformers to use in Visual Studio Code but nothing is helping.
Does anyone know how to fix this error, or how to successfully use models from Hugging Face in my code? Any help would be appreciated.
This question is a little less about Hugging Face itself and more about your installation steps (and potentially your program's access to the cache directory that the models are automatically downloaded to).
From what I am seeing, it is either that:
1/ your program is unable to access the model
2/ your program is throwing specific value errors in a bit of an edge case
If 1/, take a look here: https://huggingface.co/docs/transformers/installation#cache-setup
Notice that the docs walk through where the pre-trained models are downloaded. Check that the model was actually downloaded to C:\Users\username\.cache\huggingface\hub (with your own username, of course), or to whichever cache location the docs mention for your setup.
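If you prefer to check from Python rather than the file explorer, here is a small sketch (this assumes the default cache path; HF_HOME or TRANSFORMERS_CACHE would point somewhere else if you have set them):

from pathlib import Path

# Default Hugging Face cache: C:\Users\<username>\.cache\huggingface\hub on Windows.
cache = Path.home() / ".cache" / "huggingface" / "hub"
print("cache exists:", cache.exists())
for entry in sorted(cache.glob("*")):
    # Downloaded checkpoints show up as models--<org>--<name> folders.
    print(entry.name)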
Second, if for some reason there is an issue with downloading, you can try downloading the model manually and running in offline mode (this is more to get it up and running): https://huggingface.co/docs/transformers/installation#offline-mode
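For the offline-mode route, the documented environment variables need to be set before transformers is imported; a minimal sketch (the model must already be in the local cache for this to work):

import os

# With these set, transformers/huggingface_hub only look at the local cache
# and never try to reach the Hub.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import pipeline  # import after setting the variables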
Third, if it is downloaded, do you have the right permissions to access the .cache directory? If it is a program you trust, try running it from Windows Terminal as an administrator. There are various ways to do this; find one you're comfortable with. A couple of hints from Stack Overflow/StackExchange: Opening up Windows Terminal with elevated privileges, from within Windows Terminal, or this: https://superuser.com/questions/1560049/open-windows-terminal-as-admin-with-winr
If 2/, I have seen people bring up very similar issues about specific classes not being found (not the same as yours, but close), and the issue was solved by installing PyTorch, because some models only exist as PyTorch models. You can see the full response from @YokoHono here: Transformers model from Hugging Face throws error that specific classes couldn't be loaded
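If it does turn out to be the missing-backend case, then after pip install torch a quick way to re-test is to pin the model and revision explicitly instead of relying on the default; a sketch using the checkpoint named in the error message:

from transformers import pipeline

# Same checkpoint the pipeline defaulted to, but pinned explicitly,
# which also silences the "No model was supplied" warning.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    revision="af0f99b",
)
print(classifier("I hate it when I'm sitting under a tree and an apple hits my head."))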

How can I use Brightway2 with US LCI database?

Short version:
I am trying to upload the US LCI database to Brightway2 and I am failing miserably. Has anyone succeeded? If so, could you share it with me? :D
Long version:
I am following the IO - Importing the US LCI database notebook and I am having a lot of problems. I am aware that, as the notebook indicates, it is a work in progress. Anyhow, I wanted to give it a try:
I tried uploading every ecospold version of the database found here, following the method from the notebook. The only one that gave me similar results was version FY20.Q3.02. However, right off the bat I get the following differences/errors:
Same as in the notebook, I get this error: Couldn't apply strategy link_technosphere_by_activity_hash: Object in source database can't be uniquely linked to target database, along with two activities that are linked. When I follow the instructions to ignore these datasets, it throws the same error over and over again.
Trying to move on with the tutorial, I get more errors and at the end I end up with all exchanges unlinked:
633 datasets
37513 exchanges
37505 unlinked exchanges
Finally, after running the code in cell [15]:
import functools
f = functools.partial(
    link_iterable_by_fields,
    other=Database(config.biosphere),
    kind='biosphere',
)
sp.apply_strategy(f)
sp.statistics(f)
I end up with:
0 datasets
0 exchanges
0 unlinked exchanges
Which is hilarious and sad at the same time. Since I am new to Python and BW, my troubleshooting is clumsy and probably erroneous (I promise I googled a lot and went through the code). I concluded that I am failing and it is time to ask questions:
Has anybody succeeded uploading the US LCI database to Brightway2?
If so, how? Which file did you use?
Thank you!!!!
This is an excellent question. I have added text to the offending notebook to note that it is obsolete.
In general, I think trying to import the ecospold files is a fool's errand: although they are labeled ecospold2, they are actually ecospold1 (which is a totally different format):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ecoSpold xmlns="http://www.EcoInvent.org/EcoSpold01">
The most recent export also raises an error when I try the ecospold1 importer:
AttributeError: no such child: {http://www.EcoInvent.org/EcoSpold01}modellingAndValidation
This is a required attribute in ecospold1.
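For reference, "the ecospold1 importer" here means a call along these lines (the importer class is from bw2io; the local path is just a placeholder):

from bw2io import SingleOutputEcospold1Importer

# Placeholder path to the unpacked US LCI ecospold export.
sp = SingleOutputEcospold1Importer("/path/to/uslci_ecospold", "US LCI")
sp.apply_strategies()
sp.statistics()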
I think the best way forward would be to consume the JSON-LD directly. Note that it is important not to run bw2setup(), as you would also want to use their list of elementary flows and LCIA methods. Currently the experimental JSON-LD importer fails because the provided datasets need allocation, but don't provide a set of consistent allocation methods. When I use the git checkout of bw2io and do the following:
uslci = JSONLDImporter(
    "/Users/cmutel/Downloads/National_Renewable_Energy_Laboratory-USLCI_Database/",
    "US LCI",
    preferred_allocation="CAUSAL_ALLOCATION"
)
uslci.apply_strategies()
I get the following error:
UnallocatableDataset: We currently only support exchange-specific CAUSAL_ALLOCATION
This is fixable, but someone would need to step through this and fix the allocation procedure, and I don't have the time to do that now.

How should I process data in a JSON/dataframe format so that it is suitable for Rasa chatbots?

I'm new to NLP and the Rasa API. I'm trying to prepare the data so that it can be used as training data for intent recognition. The function that I'm trying to use is:
from rasa_nlu.training_data import load_data #Import function
train_data_rasa=load_data('/content/data_file.json') #Json file
However, the following error pops up:
AttributeError: 'str' object has no attribute 'get'
The JSON file is the result of using the pandas.to_json() function. The original dataset is the ATIS flight intent dataframe, which has two columns: the text and the intent.
Here is a preview of the json file:
{"Intent":{"0":"atis_flight","1":"atis_flight_time","2":"atis_airfare","3":"atis_airfare","4":"atis_flight","5":"atis_aircraft","6" ........
I don't really know what is going on, as the dataset seems to be clean. I have also tried multiple alternatives, such as the markdown (md) format, but it does not seem to work.
Thank you in advance !!
I would suggest trying the rasa data convert command (which converts your training data from JSON to YAML format) and then trying to train (with rasa train from the CLI) to see if you get the same error. The Training Data Format page in the docs might also be a useful resource, since it explains the types of training data and their expected structure. Another idea would be to post your question on the Rasa forum, where there might be more people that have encountered the same error, like here. That way you might get more ideas on how to solve your issue, or more people will jump in and help.
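The underlying problem is likely that pandas.to_json() writes a column-oriented dump rather than the structure the Rasa loader expects, so load_data() hits plain strings where it expects dictionaries. As a rough sketch (the column names, file names, and the legacy rasa_nlu_data layout are assumptions based on the question), you could build the training JSON yourself before converting or training:

import json
import pandas as pd

# Assumed: a dataframe with "Intent" and "Text" columns, as described above.
df = pd.read_csv("atis_intents.csv", names=["Intent", "Text"])

examples = [
    {"text": row["Text"], "intent": row["Intent"], "entities": []}
    for _, row in df.iterrows()
]

# Legacy Rasa NLU JSON layout: a top-level "rasa_nlu_data" key with "common_examples".
with open("data_file.json", "w") as f:
    json.dump({"rasa_nlu_data": {"common_examples": examples}}, f, indent=2)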

Pygsheets and AWS Lambda

I am sure the suggestions here will be to use an S3 bucket, and I am aware of that option. My question is a bit more difficult, from what I am gathering, in that I want to use pygsheets, a Python library, to write to a Google Sheet. However, after getting through all the deployment and layer steps, what is stopping me is that a pesky .json file needs to be read by one of the functions in pygsheets. I do believe it reads and writes something else on the fly, which may not be allowed in and of itself, but I am asking regardless.
Here is a link to the function that needs to be used in conjunction with the secret.json from Google: pygsheets GitHub
Sample code:
print("-->Using the library pygsheets to update...")
print(f"-->Accessing client_secret.json")
gc = pygsheets.authorize(service_file='client_secret.json')
print(f"-->Opening Google Sheets")
#open the google spreadsheet
sh = gc.open_by_url('https://...')
print(f"-->Accessing")
#select the first sheet
wks = sh[0]
print(f"-->Updating selected cells... ")
#update the first sheet with df, starting at cell A11.
wks.set_dataframe(df, 'J14')
Again, I am so close to my final product of automating my sheets using this script/library/Lambda that I can taste it :). If the absolute best workaround is S3, please be gentle; I am a first-year analyst trying to get my feet wet. My superior also tells me it would take a while to hook up a connection to S3, so that is another reason to avoid it. Thanks!
Fixed. I simply added the .json creds to the deployment package. I had run into an issue with pandas, so I have a blend of layers and a deployment package with my .py script (and, again, with the secret.json). Thanks!
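For anyone hitting the same thing: once the credentials file is bundled into the deployment package it sits under the Lambda task root, so something like this sketch should find it (the env var is the standard Lambda one; the file name is the one from the sample code above):

import os
import pygsheets

# The deployment package is unpacked at LAMBDA_TASK_ROOT, so the bundled
# client_secret.json can be referenced relative to that directory.
service_file = os.path.join(os.environ.get("LAMBDA_TASK_ROOT", "."), "client_secret.json")
gc = pygsheets.authorize(service_file=service_file)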

Error uploading a CSV file on a cloud Jupyter notebook

I have set up a Google Cloud account. I want to run my deep learning much faster in a Jupyter notebook, but I cannot find a way to read my CSV file. I downloaded it with wget from my GitHub account and afterwards I tried
dataset = pd.read_csv('/home/user/.jupyter/SIEMENSTRAIN.csv')
but I get the following error
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12
Why? When I read it on my laptop in my Jupyter notebook, everything runs fine.
Any suggestions?
I tried the recommended solutions for this error and I got the following warning:
/home/user/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
  if __name__ == '__main__':
When I ran dataset.head() this is what appeared
Any help please?
There are a number of possibilities that could be causing the problem. I would first make sure that your pandas (pd) version is up to date and compatible.
The more likely cause is that the CSV itself is not right, so pd.read_csv() is not able to work correctly (hence the parser error). This may have something to do with the headers, though I'm not sure what your original CSV file looks like. It's worth playing around with read_csv, for example:
df = pandas.read_csv(fileName, sep='delimiter', header=None)  # replace 'delimiter' with the actual separator, e.g. ',' or ';'
This adjusts two things: the delimiter, and whether pandas reads a header row from the CSV or not.
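A minimal sketch of that kind of experimentation, assuming the path from the question (printing the first few raw lines first shows what delimiter, and how many fields, the file actually contains):

import pandas as pd

path = '/home/user/.jupyter/SIEMENSTRAIN.csv'

# Inspect the raw lines before parsing to see what the file really looks like.
with open(path) as f:
    for _ in range(5):
        print(repr(f.readline()))

# Then try read_csv with an explicit separator and no header row, e.g.:
df = pd.read_csv(path, sep=';', header=None)
print(df.head())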
I go through some pd.read_csv() material in my book about Stock Prediction (another cool Machine Learning problem) and Deep Learning; feel free to check it out.
Good Luck!
I tried what you proposed and this is what I got.
So, any suggestions?
I suppose the path is OK, but the file just won't be read properly, or am I wrong?

Resources