How to load Stanfordnlp pipeline without printing the load processor messages - python-3.x

I am trying to get dependency relations for words using StanfordNLP. I have downloaded the English models and am able to load them to get the dependency relations for the words in a text. However, the pipeline also prints all of the load-process messages.
Sample code:
import stanfordnlp

config = {
    'processors': 'tokenize,pos,lemma,depparse',  # Comma-separated list of processors to use
    'lang': 'en',  # Language code for the language to build the Pipeline in
    'tokenize_model_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt_tokenizer.pt',
    'pos_model_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt_tagger.pt',
    'pos_pretrain_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt.pretrain.pt',
    'lemma_model_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt_lemmatizer.pt',
    'depparse_model_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt_parser.pt',
    'depparse_pretrain_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt.pretrain.pt'
}
text = 'The weather is nice today.'
# This downloads the English models for the neural pipeline
nlp = stanfordnlp.Pipeline(**config)  # This sets up a default neural pipeline in English
doc = nlp(text)
doc.sentences[0].print_dependencies()
>>>
Use device: cpu
---
Loading: tokenize
With settings:
{'model_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt_tokenizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: pos
With settings:
{'model_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt_tagger.pt', 'pretrain_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
---
Loading: lemma
With settings:
{'model_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt_lemmatizer.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
[Running seq2seq lemmatizer with edit classifier]
---
Loading: depparse
With settings:
{'model_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt_parser.pt', 'pretrain_path': 'C:\\path\\stanfordnlp_resources\\en_ewt_models\\en_ewt.pretrain.pt', 'lang': 'en', 'shorthand': 'en_ewt', 'mode': 'predict'}
Done loading processors!
---
('The', '2', 'det')
('weather', '4', 'nsubj')
('is', '4', 'cop')
('nice', '0', 'root')
('today', '4', 'obl:tmod')
('.', '4', 'punct')
I installed StanfordNLP using Anaconda and am working in Jupyter notebooks. Is there a way to skip the messages, as I only need the dependencies?

If you only want to get rid of those lines in Jupyter notebooks, you can simply clear the output right after creating the pipeline:
from IPython.display import clear_output
...
nlp = stanfordnlp.Pipeline(**config)
clear_output()
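A more general option, which also works outside Jupyter, is to silence the console output while the pipeline is being built. Below is a minimal sketch assuming the load messages are written to stdout (if some of them go to stderr, redirect that stream too with contextlib.redirect_stderr):
import os
from contextlib import redirect_stdout

import stanfordnlp

# Discard everything printed to stdout while the processors are loading.
# `config` is the same dictionary as in the question.
with open(os.devnull, 'w') as devnull, redirect_stdout(devnull):
    nlp = stanfordnlp.Pipeline(**config)

doc = nlp('The weather is nice today.')
doc.sentences[0].print_dependencies()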

Related

API to pull the list of supported languages in AWS Translate

I am currently working on a project where I need to translate customer comments from the source language into English on AWS. It is easy to do so using AWS Translate, but before I call the translate API, I want to check whether the source language is supported by AWS Translate at all.
One solution is to put all the language codes supported by AWS Translate into a list and then check the source language against that list. This is easy, but it is going to be messy, and I want to make it more dynamic.
So I am thinking of code like this:
import boto3

def translateUserComment(source_language):
    translate = boto3.client(service_name='translate', region_name='region', use_ssl=True)
    languages_supported = translate.<SomeMethod>()
    if source_language in languages_supported:
        result = translate.translate_text(Text="Hello, World",
                                          SourceLanguageCode=source_language,
                                          TargetLanguageCode="en")
        print('TranslatedText: ' + result.get('TranslatedText'))
        print('SourceLanguageCode: ' + result.get('SourceLanguageCode'))
        print('TargetLanguageCode: ' + result.get('TargetLanguageCode'))
    else:
        print("The source language is not supported by AWS Translate")
The problem is that I am not able to find any API call that returns the list of languages/language codes supported by AWS Translate in the first place.
Before I posted this question, I searched for similar questions on Stack Overflow and went through the AWS Translate Developer Guide, but still no luck.
Any suggestion or redirection to the right approach is highly appreciated.
Currently there is no API call for this in the service, but the code below will work: it defines a class, Translate_lang, with all the language names and codes taken from https://docs.aws.amazon.com/translate/latest/dg/what-is.html. You can import this class into your program and use it by creating an instance of the class:
translate_lang_check.py
class Translate_lang:
    def __init__(self):
        self.t_lang = {'Afrikaans': 'af', 'Albanian': 'sq', 'Amharic': 'am',
                       'Arabic': 'ar', 'Armenian': 'hy', 'Azerbaijani': 'az',
                       'Bengali': 'bn', 'Bosnian': 'bs', 'Bulgarian': 'bg',
                       'Catalan': 'ca', 'Chinese (Simplified)': 'zh',
                       'Chinese (Traditional)': 'zh-TW', 'Croatian': 'hr',
                       'Czech': 'cs', 'Danish': 'da', 'Dari': 'fa-AF',
                       'Dutch': 'nl', 'English': 'en', 'Estonian': 'et',
                       'Farsi (Persian)': 'fa', 'Filipino Tagalog': 'tl',
                       'Finnish': 'fi', 'French': 'fr', 'French (Canada)': 'fr-CA',
                       'Georgian': 'ka', 'German': 'de', 'Greek': 'el', 'Gujarati': 'gu',
                       'Haitian Creole': 'ht', 'Hausa': 'ha', 'Hebrew': 'he', 'Hindi': 'hi',
                       'Hungarian': 'hu', 'Icelandic': 'is', 'Indonesian': 'id', 'Italian': 'it',
                       'Japanese': 'ja', 'Kannada': 'kn', 'Kazakh': 'kk', 'Korean': 'ko',
                       'Latvian': 'lv', 'Lithuanian': 'lt', 'Macedonian': 'mk', 'Malay': 'ms',
                       'Malayalam': 'ml', 'Maltese': 'mt', 'Mongolian': 'mn', 'Norwegian': 'no',
                       'Persian': 'fa', 'Pashto': 'ps', 'Polish': 'pl', 'Portuguese': 'pt',
                       'Romanian': 'ro', 'Russian': 'ru', 'Serbian': 'sr', 'Sinhala': 'si',
                       'Slovak': 'sk', 'Slovenian': 'sl', 'Somali': 'so', 'Spanish': 'es',
                       'Spanish (Mexico)': 'es-MX', 'Swahili': 'sw', 'Swedish': 'sv',
                       'Tagalog': 'tl', 'Tamil': 'ta', 'Telugu': 'te', 'Thai': 'th',
                       'Turkish': 'tr', 'Ukrainian': 'uk', 'Urdu': 'ur', 'Uzbek': 'uz',
                       'Vietnamese': 'vi', 'Welsh': 'cy'}

    def check_lang_in_translate(self, given_country):
        # Return the language code for a language name, or None if unsupported.
        if given_country in self.t_lang:
            return self.t_lang[given_country]
        return None

    def check_lang_code_in_translate(self, given_lang):
        # Return True if the given language code is in the supported list.
        return given_lang in self.t_lang.values()
You can then check your language codes using the methods of this class:
from translate_lang_check import Translate_lang
tl = Translate_lang()
print(tl.check_lang_code_in_translate('en'))
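To tie this back to the translateUserComment function from the question, here is a rough sketch of how the class could gate the API call (the region name and sample text are placeholder assumptions, not values from the question):
import boto3
from translate_lang_check import Translate_lang

def translateUserComment(text, source_language):
    # Check the code against the locally maintained list before calling AWS.
    if not Translate_lang().check_lang_code_in_translate(source_language):
        print("The source language is not supported by AWS Translate")
        return None
    translate = boto3.client(service_name='translate', region_name='us-east-1')
    result = translate.translate_text(Text=text,
                                      SourceLanguageCode=source_language,
                                      TargetLanguageCode='en')
    return result.get('TranslatedText')

print(translateUserComment('Bonjour le monde', 'fr'))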

HuggingFace - GPT2 Tokenizer configuration in config.json

A fine-tuned GPT-2 model has been uploaded to the Hugging Face model hub for inference.
The error below is observed during inference:
Can't load tokenizer using from_pretrained, please update its configuration: Can't load tokenizer for 'bala1802/model_1_test'. Make sure that: - 'bala1802/model_1_test' is a correct model identifier listed on 'https://huggingface.co/models' - or 'bala1802/model_1_test' is the correct path to a directory containing relevant tokenizer files
Below is the config.json file of the fine-tuned Hugging Face model:
{
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.3.2",
  "use_cache": true,
  "vocab_size": 50257
}
Should I configure the GPT-2 tokenizer in config.json, similar to the "model_type": "gpt2" entry?
Your repository does not contain the files required to create a tokenizer; it seems you have only uploaded the model files. Create an object of the tokenizer you used for training the model and save the required files with save_pretrained():
from transformers import GPT2Tokenizer
t = GPT2Tokenizer.from_pretrained("gpt2")
t.save_pretrained('/SOMEFOLDER/')
Output:
('/SOMEFOLDER/tokenizer_config.json',
'/SOMEFOLDER/special_tokens_map.json',
'/SOMEFOLDER/vocab.json',
'/SOMEFOLDER/merges.txt',
'/SOMEFOLDER/added_tokens.json')
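Uploading the files listed above to the bala1802/model_1_test repository, next to the model files, should resolve the error. As a quick local sanity check (using the placeholder folder from the answer), the saved files can be loaded back the same way the hub would load them:
from transformers import GPT2Tokenizer

# Reload the tokenizer from the folder that save_pretrained() wrote to.
t = GPT2Tokenizer.from_pretrained('/SOMEFOLDER/')
print(t.tokenize("Hello world"))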

Abnormal behavior of python package eve

I have installed the eve package on my Windows machine, but every time I shut down the machine and then try to load the eve package, I get a module-not-found error.
On a re-installation attempt (by the way, I used the latest pip version to install), I get:
from eve import Eve
app=Eve()
app.run()
The error points to the second line.
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-79-46d1b24866c8> in <module>()
30 # host = '127.0.0.1'
31
---> 32 app = Eve()
33 # app.run()
34
~\AppData\Local\Continuum\anaconda3\lib\site-packages\eve\flaskapp.py in __init__(self, import_name, settings, validator, data, auth, redis, url_converters, json_encoder, media, **kwargs)
158 self.settings = settings
159
--> 160 self.load_config()
161 self.validate_domain_struct()
162
~\AppData\Local\Continuum\anaconda3\lib\site-packages\eve\flaskapp.py in load_config(self)
275
276 try:
--> 277 self.config.from_pyfile(pyfile)
278 except:
279 raise
~\AppData\Local\Continuum\anaconda3\lib\site-packages\flask\config.py in from_pyfile(self, filename, silent)
128 try:
129 with open(filename, mode='rb') as config_file:
--> 130 exec(compile(config_file.read(), filename, 'exec'), d.__dict__)
131 except IOError as e:
132 if silent and e.errno in (
~\AppData\Local\Continuum\anaconda3\lib\site-packages\bokeh\settings.py in <module>()
9 from os.path import join, abspath, isdir
10
---> 11 from .util.paths import ROOT_DIR, bokehjsdir
12
13
ModuleNotFoundError: No module named 'config'
Moreover, I find that there is no folder "lib", only "Lib". If this is the problem, how do I rectify it?
However, the code below runs, but only for microseconds, not like a back-end server serving APIs:
from eve import Eve
app=Eve
app.run
The settings.py file:
# Let's just use the local mongod instance. Edit as needed.
# Please note that MONGO_HOST and MONGO_PORT could very well be left
# out as they already default to a bare bones local 'mongod' instance.
MONGO_HOST = 'localhost'
MONGO_PORT = 27017
MONGO_DBNAME = 'apitest'
# Enable reads (GET), inserts (POST) and DELETE for resources/collections
# (if you omit this line, the API will default to ['GET'] and provide
# read-only access to the endpoint).
RESOURCE_METHODS = ['GET', 'POST', 'DELETE']
# Enable reads (GET), edits (PATCH), replacements (PUT) and deletes of
# individual items (defaults to read-only item access).
ITEM_METHODS = ['GET', 'PATCH', 'PUT', 'DELETE']
people = {
    # 'title' tag used in item links.
    'item_title': 'person',
    # by default the standard item entry point is defined as
    # '/people/<ObjectId>/'. We leave it untouched, and we also enable an
    # additional read-only entry point. This way consumers can also perform GET
    # requests at '/people/<lastname>/'.
    'additional_lookup': {
        'url': 'regex("[\w]+")',
        'field': 'lastname'
    },
    'cache_control': 'max-age=10,must-revalidate',
    'cache_expires': 10,
    'resource_methods': ['GET', 'POST'],
    # Schema definition, based on Cerberus grammar. Check the Cerberus project
    # (https://github.com/pyeve/cerberus) for details.
    'schema': {
        'firstname': {
            'type': 'string',
            'minlength': 1,
            'maxlength': 10,
        },
        'lastname': {
            'type': 'string',
            'minlength': 1,
            'maxlength': 15,
            'required': True,
            # talk about hard constraints! For the purpose of the demo
            # 'lastname' is an API entry-point, so we need it to be unique.
            'unique': True,
        },
        # 'role' is a list, and can only contain values from 'allowed'.
        'role': {
            'type': 'list',
            'allowed': ["author", "contributor", "copy"],
        },
        # An embedded 'strongly-typed' dictionary.
        'location': {
            'type': 'dict',
            'schema': {
                'address': {'type': 'string'},
                'city': {'type': 'string'}
            },
        },
        'born': {
            'type': 'datetime',
        },
    }
}

DOMAIN = {
    'people': people,
}
So, what could be the solution to this problem?
Any help is appreciated.
I couldn't reproduce this issue in a quick test. Let me share all the steps with you; let me know if anything is different.
1) Enter Anaconda Prompt
2) conda create -n eswar python=3.6
3) conda activate eswar
4) pip install eve
5) python
5.1) import eve
5.2) exit()
6) shutdown windows machine
7) restart windows machine
8) enter anaconda prompt
9) conda activate eswar
10) python
11) from eve import Eve
12) everything looks fine.
Did you forget to activate your environment after the restart?
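For completeness, once the environment is active, a minimal launch script looks like the sketch below (assuming settings.py from the question sits in the same directory). Note that Eve and run must be called with parentheses; the app = Eve / app.run variant in the question never actually starts the server, which is why it exits after microseconds:
# run.py - minimal sketch; assumes settings.py (as shown in the question) is in the same folder
from eve import Eve

app = Eve()  # the parentheses instantiate the app and load settings.py

if __name__ == '__main__':
    app.run()  # starts the development server and keeps it running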

ModuleNotFoundError: No module named 'loglevels'

# -*- coding: utf-8 -*-
{
    'name': "myfirstModel",

    'summary': """
        Short (1 phrase/line) summary of the module's purpose, used as
        subtitle on modules listing or apps.openerp.com""",

    'description': """
        Long description of module's purpose
    """,

    'author': "My Company",
    'website': "http://www.yourcompany.com",

    # Categories can be used to filter modules in modules listing
    # Check https://github.com/odoo/odoo/blob/master/odoo/addons/base/module/module_data.xml
    # for the full list
    'category': 'Uncategorized',
    'version': '0.1',

    # any module necessary for this one to work correctly
    'depends': ['base'],

    # always loaded
    'data': [
        # 'security/ir.model.access.csv',
        'views/views.xml',
        'views/templates.xml',
    ],
    # only loaded in demonstration mode
    'demo': [
        'demo/demo.xml',
    ],
}
I got this error while creating a new module in Odoo. I am unable to import 'loglevels' in PyCharm.
Any help is appreciated.

YAML list of text/url pairs to be used in Grav/Twig

I would like to represent the following JSON structure in YAML.
[
  {"text": "Text1", "url": "Url1"},
  {"text": "Text2", "url": "Url2"},
  {"text": "Text3", "url": "Url3"}
]
I have tried without success:
-
  text: Text1
  url: Url1
-
  text: Text2
  url: Url2
-
  text: Text3
  url: Url3
In case it might be relevant, the structure is going to be used in Grav/Twig, although I think it is a pure YAML issue.
Grav now provides two new filters for YAML encoding and decoding.
Example usage: {{ page.header.myarray|yaml_encode }}
These filters will be available in the upcoming 1.4 release as per this commit: https://github.com/getgrav/grav/commit/c721be8787b09aab1dce6bd012c8d43d1a985558
Hope it helps
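As a side note, the block sequence from the question is already valid YAML for that structure; a quick check outside Grav (this uses PyYAML, which is an assumption and not part of Grav/Twig) parses it to the desired list of mappings:
import yaml  # PyYAML

data = yaml.safe_load("""
- text: Text1
  url: Url1
- text: Text2
  url: Url2
- text: Text3
  url: Url3
""")

print(data)
# [{'text': 'Text1', 'url': 'Url1'}, {'text': 'Text2', 'url': 'Url2'}, {'text': 'Text3', 'url': 'Url3'}]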
