How to override an attribute of a BERT class - nlp

I am using BertJapaneseTokenizer for an NLP project.
I am running into the error "The unidic_lite dictionary is not installed". Checking under the hood, one of the elif conditions shows that this error is thrown only if the variable mecab_dic is set to "unidic_lite" (please see this link: https://huggingface.co/transformers/v4.11.3/_modules/transformers/models/bert_japanese/tokenization_bert_japanese.html)
Furthermore, due to some unavoidable constraints I cannot install the unidic_lite module (let's just assume this is a constraint I have). However, I can install unidic. So my question is: is there a way to do a manual override, like mecab_dic = "unidic", so that the elif statement for unidic_lite is bypassed altogether?
(Relatedly, we can see from the constructor of MecabTokenizer that mecab_dic should have the default value "ipadic", so why is it set to "unidic_lite" anyway!?)
Any suggestion would be super helpful. Thanks!
(P.S.: I am not super tech-savvy, so this was an attempt at a crude workaround.)
What I tried: I downloaded the tokenization_bert_japanese.py code to my local machine, created a static variable mecab_dic = "unidic" in it, and tried to import BertJapaneseTokenizer from this customized Python script instead of the standard "from transformers import BertJapaneseTokenizer"
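A less invasive route may exist: recent transformers versions accept a mecab_kwargs dictionary that is forwarded to MecabTokenizer, which should let you request "unidic" without editing library source. (This may also answer the side question: some pretrained checkpoints ship a tokenizer config that injects mecab_dic="unidic_lite", overriding the "ipadic" constructor default.) A minimal sketch using stand-in classes so it runs anywhere; the real call, hedged against your transformers version, appears in the comment:

```python
# Sketch of the override, with stand-in classes. With the real library,
# the equivalent (check your transformers version) would be roughly:
#   from transformers import BertJapaneseTokenizer
#   tok = BertJapaneseTokenizer.from_pretrained(
#       "cl-tohoku/bert-base-japanese-v2",
#       mecab_kwargs={"mecab_dic": "unidic"},
#   )

class MecabTokenizerStandIn:
    """Mimics MecabTokenizer's constructor defaults."""
    def __init__(self, mecab_dic="ipadic", mecab_option=None):
        self.mecab_dic = mecab_dic

class BertJapaneseTokenizerStandIn:
    """Mimics how BertJapaneseTokenizer forwards mecab_kwargs."""
    def __init__(self, mecab_kwargs=None):
        # A pretrained config may inject {"mecab_dic": "unidic_lite"};
        # passing mecab_kwargs explicitly overrides that default.
        self.word_tokenizer = MecabTokenizerStandIn(**(mecab_kwargs or {}))

tok = BertJapaneseTokenizerStandIn(mecab_kwargs={"mecab_dic": "unidic"})
print(tok.word_tokenizer.mecab_dic)  # unidic
```

If the mecab_kwargs argument exists in your installed version, this avoids maintaining a forked copy of tokenization_bert_japanese.py entirely.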

Related

Azure ML release bug AZUREML_COMPUTE_USE_COMMON_RUNTIME

On 2021-10-13, our application on the Azure ML platform started hitting a new error that causes failures in pipeline steps: Python module import failures, with a warning stack that leads to a pipeline runtime error.
We needed to set it to false. Why is it failing? What are the exact (and long-term) consequences of opting out? Also, Azure ML users: do you think it was rolled out appropriately?
Try adding a new variable to your environment like this:
environment.environment_variables = {"AZUREML_COMPUTE_USE_COMMON_RUNTIME":"false"}
Long term (throughout 2022), AzureML will be fully migrating to the new Common Runtime on AmlCompute. Short term, this change is a large undertaking, and we're on the lookout for tricky functionality of the old Compute Runtime we're not yet handling correctly.
One small note on disabling Common Runtime: it can be more efficient (it avoids an Environment rebuild) to add the environment variable directly to the RunConfig:
run_config.environment_variables["AZUREML_COMPUTE_USE_COMMON_RUNTIME"] = "false"
We'd like to get more details about the import failures, so we can fix the regression. Are you setting the PYTHONPATH environment variable to make your custom scripts importable? If so, this is something we're aware isn't working as expected and are looking to fix it within the next two weeks.
We identified the issue and have rolled out a hotfix on our end addressing the issue. There are two problems that could've caused the import issue. One is that we are overwriting the PYTHONPATH environment variable. The second is that we are not adding the python script's containing directory to python's module search path if the containing directory is not the current working directory.
It would be great if you could try again without setting the AZUREML_COMPUTE_USE_COMMON_RUNTIME environment variable and see if the problem is still there. If it is, please reply to either Lucas's thread or mine with a minimal repro, or a description of where the module you are trying to import is located relative to the script being run and the root of the snapshot (which is the current working directory).
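Until that fix lands, one hedged workaround is for the entry script to make its own directory importable explicitly instead of relying on PYTHONPATH. ensure_importable below is a hypothetical helper, not an Azure ML API:

```python
import os
import sys

def ensure_importable(script_path):
    """Put the script's containing directory at the front of Python's
    module search path, if it is not already there."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    if script_dir not in sys.path:
        sys.path.insert(0, script_dir)
    return script_dir

# Typically called at the top of the entry script as
# ensure_importable(__file__); shown here with an example path:
script_dir = ensure_importable(os.path.join("/tmp", "example_job", "train.py"))
print(script_dir)
```

This makes sibling modules importable regardless of the current working directory or any PYTHONPATH overwriting by the runtime.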

Viewing the parameters of the branch predictor in gem5

This is a two-part question. First, how do I configure the size of a branch predictor?
I can see that I can set the type using the se.py config script and the --bp-type argument. (In my case I'm setting it to LTAGE), but how do I change the size of the tables? And is there an easy way to see the total size of all tables?
Second, looking at the code, I don't understand the LTAGE constructor:
LTAGE::LTAGE(const LTAGEParams *params)
    : TAGE(params), loopPredictor(params->loop_predictor)
{
}
The LTAGEParams doesn't appear to be defined anywhere except here:
LTAGE*
LTAGEParams::create()
{
    return new LTAGE(this);
}
How can I see what all the members of LTAGEParams are?
About the size: have a look at m5out/config.ini after running a simulation; it contains all SimObject parameters.
In the case of branch predictors, all Python-visible parameters of each implemented predictor will be defined in its respective class declaration at src/cpu/pred/BranchPredictor.py. They also inherit the parameters of their base class. For example, LTAGE has all the parameters of the TAGE class - out of which the tage object is re-assigned to be an instance of LTAGE_TAGE - and a new parameter, loop_predictor, which is a LoopPredictor instance.
You can then just set any of the values in that file from your Python config and they will be used (double check in config.ini after re-running).
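For example, a fragment of an se.py-style config; the parameter names here are assumptions to be checked against src/cpu/pred/BranchPredictor.py and the resulting config.ini:

```python
# gem5 config fragment (not standalone): "system" is the usual System
# object from the surrounding script; parameter names are assumptions.
from m5.objects import LTAGE, LTAGE_TAGE

bp = LTAGE()
bp.tage = LTAGE_TAGE()           # LTAGE re-assigns tage (see above)
bp.tage.nHistoryTables = 12      # number of tagged tables (assumed name)
bp.tage.minHist = 4              # shortest history length (assumed name)
system.cpu[0].branchPred = bp    # attach to the CPU being configured
```

After re-running, confirm in m5out/config.ini that the values were picked up.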
The C++ constructor of a SimObject takes a params object like LTAGEParams; that object is autogenerated from the corresponding Python SimObject file, and it passes values from the Python config to C++ via pybind11.
gem5 has a lot of code generation, so whenever you can't find a definition, grep inside the build directory. This is also why I recommend setting up Eclipse inside the build directory: How to setup Eclipse IDE for gem5 development?
SimObject autogeneration is described further at: https://cirosantilli.com/linux-kernel-module-cheat/#gem5-python-c-interaction

Unable to change model parameter in the classic_control environments in OpenAI gym

I am working with the CartPole-v1 environment and I am trying to change some of the model parameters (such as force_mag) in order to test the robustness of my algorithms w.r.t model variations. I am trying to do so with the following code:
env = gym.make('myCartPole-v1')
env.force_mag = -10.0  # nominal is +10.0
but I do not see any change in the model's behavior when testing it with my learnt policy (which should fail or worsen, but does not). Nothing happens when changing other parameters either (e.g. masscart, length...). Am I missing something? What is the best way to make local changes to the model?
The problem was solved by creating an instance of the environment and accessing the underlying model via env.unwrapped. The code in the example should look like: env = gym.make('myCartPole-v1').unwrapped
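The reason this matters: gym.make returns the environment wrapped (e.g. in a TimeLimit wrapper), and attribute assignments land on the wrapper instead of the underlying physics model. A self-contained sketch of the effect, with stand-in classes instead of gym itself:

```python
class CartPoleStandIn:
    """Stand-in for the real CartPoleEnv physics model."""
    def __init__(self):
        self.force_mag = 10.0

class TimeLimitStandIn:
    """Stand-in for gym's TimeLimit wrapper around the env."""
    def __init__(self, env):
        self.env = env

    @property
    def unwrapped(self):
        return self.env

env = TimeLimitStandIn(CartPoleStandIn())

env.force_mag = -10.0                  # only sets an attribute on the wrapper
shadowed = env.unwrapped.force_mag     # 10.0 -- the model never changed

env.unwrapped.force_mag = -10.0        # this reaches the actual model
print(shadowed, env.unwrapped.force_mag)  # 10.0 -10.0
```

The first assignment silently creates a new attribute on the wrapper, which is why the learnt policy kept behaving as if nothing had changed.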

Executing a python script in snaplogic

I am trying to run a Python script through the Script Snap in SnapLogic. I am facing some issues where it asks me to declare a script hook variable. Can you please help me with that?
With the Script Snap you should use the "Edit Script" button on the snap itself. This will open a script editor and generate a skeleton script in the language you've selected (Python in this case).
In the skeleton you can see the baseline methods and functions we define, along with the usage and comments for the scripthook variable. If you have an existing script, I would recommend writing it into the skeleton's execute method rather than trying to implement scripthook in your existing code. You can also define your own methods and functions within the confines of the skeleton class and reference them with "self." notation.
For faster answers on SnapLogic related questions I'd recommend visiting the SnapLogic Community site.
As explained by @dwhite0101, within the Script Snap, when you click Edit Script you get an option to generate a code template.
ScriptHook is an interface implemented as a callback mechanism for the Script Snap to call into the script.
It helps you deal with input and output rows. The constructor below initializes the input, output, error, and log variables.
The self object is similar to this in C++: it holds the current row's values.
class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log
You can perform transformations in execute method:
def execute(self):
    self.log.info("Executing Transform script")
    while self.input.hasNext():
        in_doc = self.input.next()
        wrapper = java.util.HashMap()
        for field in in_doc:
            # your code
Next step is to store your results in an object and output it:
wrapper['original'] = result
self.output.write(result, wrapper)
Make sure to indent your code properly.

How to get SMAC3 working for Python 3x on Windows

This is a great package for Bayesian optimization of hyperparameters (especially mixed integer/continuous/categorical...and has shown to be better than Spearmint in benchmarks). However, clearly it is meant for Linux. What do I do...?
First you need to download swig.exe (the whole package) and unzip it. Then put it somewhere and add the folder to your PATH so that the SMAC3 installer can call swig.exe.
Next, the resource module is going to cause issues, because it is only available on Linux. It is used specifically by pynisher. You'll need to comment out import pynisher in the execute_func.py module. Then set use_pynisher: bool = False in the def __init__(self, ...) of the same module. The default is True.
Then go down to the middle of the module, where an if self.use_pynisher ... else statement exists. Our code now enters the else branch, but it is not set up correctly. Change result = self.ta(config, **obj_kwargs) to result = self.ta(list(config.get_dictionary().values())). This part may still need to be adjusted depending on what kind of inputs your function handles, but essentially you can see that this enables the basic example shown in the included branin_fmin.py module. If you're doing the random forest example, don't change it at all, etc.
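To see why that change works: config.get_dictionary() yields a name-to-value mapping, while a plain objective such as branin expects positional arguments. A self-contained sketch, with branin implemented locally and an ordinary dict standing in for the SMAC3 Configuration object:

```python
import math

def branin(x):
    """Branin test function; x = [x1, x2]. Global minimum ~0.397887."""
    x1, x2 = x
    a, b, c = 1.0, 5.1 / (4 * math.pi ** 2), 5.0 / math.pi
    r, s, t = 6.0, 10.0, 1.0 / (8 * math.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 \
        + s * (1 - t) * math.cos(x1) + s

# Stand-in for config.get_dictionary() on a SMAC3 Configuration:
config_dict = {"x1": math.pi, "x2": 2.275}

# The edited line in execute_func.py does the equivalent of:
result = branin(list(config_dict.values()))
print(round(result, 6))  # 0.397887
```

Note this relies on the dictionary's value order matching the function's positional parameters, which is why the adaptation may need further tweaking for other objective functions.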
