I have a game I am working on. There is one random seed generated at some point. At a later point, I want to use this seed to derive some attributes, like the attack or defense of a character.
I want the attributes not to be correlated with each other. One way I thought of is using hash functions, like so:
const { createHash } = require('crypto');

const attackHex = createHash('md5')
  .update(seed)
  .update('attack')
  .digest('hex');
const attack = Number(attackHex);
console.log('attack', attackHex, attack);

const defenseHex = createHash('md5')
  .update(seed)
  .update('defense')
  .digest('hex');
const defense = Number(defenseHex);
console.log('defense', defenseHex, defense);
Outputs:
attack 73341812d1bd6fc73c022b4971618c27 NaN
defense 0620fbd637b7cf2f7d83dc3c8d5f8528 NaN
But the number conversion is not too happy... I guess it is too big of a number.
I would appreciate any help.
Oooh, I should do const defense = Number('0x' + defenseHex);... I also ended up making the number smaller with:
const defense = Number('0x' + defenseHex.slice(0, 6));
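For anyone who wants the full shape of this idea, here is a quick sketch in Python (the attribute labels and the 0-100 range are placeholders, not anything from the question): hash the seed together with a per-attribute label, keep a short prefix of the digest, and map it into whatever range you need.

import hashlib

def derive_stat(seed, label, lo=0, hi=100):
    # Hash the seed plus a per-attribute label so each stat gets an
    # independent-looking value derived from the same seed.
    digest = hashlib.md5((seed + label).encode()).hexdigest()
    # Keep a short prefix so the integer stays small, then map into [lo, hi].
    return lo + int(digest[:6], 16) % (hi - lo + 1)

seed = "some-random-seed"
print(derive_stat(seed, "attack"), derive_stat(seed, "defense"))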
I am building an app around GPT-3, and I would like to know how many tokens every request I make uses. Is this possible, and how?
Counting Tokens with an Actual Tokenizer
To do this in Python, first install the transformers package to enable the GPT-2 Tokenizer, which is the same tokenizer used for GPT-3:
pip install transformers
Then, to tokenize the string "Hello world", you have a choice of using GPT2TokenizerFast or GPT2Tokenizer.
from transformers import GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
number_of_tokens = len(tokenizer("Hello world")['input_ids'])
or
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
number_of_tokens = len(tokenizer("Hello world")['input_ids'])
In either case, tokenizer() produces a Python list of tokens representing the string, which can then be counted with len(). The documentation doesn't mention any differences in behavior between the two methods. I tested both methods on both text and code and they gave the same numbers. The from_pretrained methods are unpleasantly slow: 28s for GPT2Tokenizer, and 56s for GPT2TokenizerFast. The load time dominates the experience, so I suggest NOT using the "fast" method. (Note: the first time you run either of the from_pretrained methods, a 3MB model will be downloaded and installed, which takes a couple of minutes.)
Approximating Token Counts
The tokenizers are slow and heavy, but approximations can be used to go back and forth between character counts and token counts, using nothing but the number of characters or tokens. I developed the following approximations by observing the behavior of the GPT-2 tokenizer. They hold well for English text and Python code. The 3rd and 4th functions are perhaps the most useful since they let us quickly fit a text within GPT-3's token limit.
import math

init_offset = 2  # character offset used by the estimate; mirrors the "+ 2" in ntokens_to_nchars_approx below

def nchars_to_ntokens_approx(nchars):
    # returns an estimate of #tokens corresponding to #characters nchars
    return max(0, int((nchars - init_offset) * math.exp(-1)))

def ntokens_to_nchars_approx(ntokens):
    # returns an estimate of #characters corresponding to #tokens ntokens
    return max(0, int(ntokens * math.exp(1)) + 2)

def nchars_leq_ntokens_approx(maxTokens):
    # returns a number of characters very likely to correspond to <= maxTokens
    sqrt_margin = 0.5
    lin_margin = 1.010175047  # = e - 1.001 - sqrt(1 - sqrt_margin); ensures a return of 1 when maxTokens=1
    return max(0, int(maxTokens * math.exp(1) - lin_margin - math.sqrt(max(0, maxTokens - sqrt_margin))))

def truncate_text_to_maxTokens_approx(text, maxTokens):
    # returns a truncation of text to make it (likely) fit within a token limit
    # So the output string is very likely to have <= maxTokens, no guarantees though.
    char_index = min(len(text), nchars_leq_ntokens_approx(maxTokens))
    return text[:char_index]
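For example, with the functions above (the sample text is just a placeholder):

sample_text = "The quick brown fox jumps over the lazy dog. " * 50
print(nchars_to_ntokens_approx(len(sample_text)))                # rough token estimate for the whole text
truncated = truncate_text_to_maxTokens_approx(sample_text, 100)
print(len(truncated), nchars_to_ntokens_approx(len(truncated)))  # characters kept, and their rough token count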
OpenAI charges for GPT-3 usage by tokens, and this counts both the prompt and the answer. For OpenAI, 750 words are roughly equivalent to 1000 tokens, i.e. a token-to-word ratio of about 1.33. Pricing per token depends on the plan you are on.
I do not know of more accurate ways of estimating cost. Perhaps using the GPT-2 tokenizer from Hugging Face can help. I know the tokens from the GPT-2 tokenizer are accepted when passed to GPT-3 in the logit bias array, so there is a degree of equivalence between GPT-2 tokens and GPT-3 tokens.
However, the GPT-2 and GPT-3 models are different, and GPT-3 famously has more parameters than GPT-2, so GPT-2 estimates are probably lower token-wise. I am sure you can write a simple program that estimates the price by comparing prompts and token usage, but that might take some time.
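As a very rough illustration of that kind of estimate, using the 750-words-to-1000-tokens rule of thumb from above (the per-1K-token price is a placeholder you would take from your own plan):

def estimate_cost(prompt, completion, price_per_1k_tokens):
    # ~750 words per 1000 tokens, i.e. roughly 1.33 tokens per word
    words = len(prompt.split()) + len(completion.split())
    est_tokens = words * 1000 / 750
    return est_tokens / 1000 * price_per_1k_tokens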
Here is an example from openai-cookbook that worked perfectly for me:
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string("tiktoken is great!", "gpt2")
# 6
Code to count how many tokens a GPT-3 request used:
from transformers import GPT2TokenizerFast

def count_tokens(input: str):
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    res = tokenizer(input)['input_ids']
    return len(res)

print(count_tokens("Hello world"))
I have two text files. One is a list of key-value pairs, and the other is an input file against which the key-value pairs are to be matched. If a match is found, it is marked with its corresponding value in the input file.
For example:
my list file:
food = ###food123
food microbiology = ###food mircobiology456
mirco organism = ###micro organims789
geo tagging = ###geo tagging614
gross income = ###gross income630
fermentation = fermentation###929
contamination = contamination##878
Salmonella species = Salmonella species###786
Lactic acid bacteria = Lactic acid bacteria###654
input file:
There are certain attributes necessary for fermentation of meat.
It should be fresh, with low level of contamination, the processing should be hygienic and the refrigeration should be resorted at different stages.
Elimination of pathogens like Coliform, Staphylococci, Salmonella species may be undertaken either by heat or by irradiation. There is need to lower the water activity and this can be achieved by either drying or addition of the salts.
Inoculation with an effective, efficient inoculum consisting of Lactic acid bacteria and, or Micrococci which produces lactic acid and also contributes to the flavor development of the product.
Effective controlled time, temperature humidity during the production is essential.
And, Salt ensures the low pH value and extends the shelf-life of the fermented meats like Sausages.
Expected Output:
There are certain attributes necessary for ((fermentation###929)) of meat.
It should be fresh, with low level of ((contamination##878)), the processing should be hygienic and the refrigeration should be resorted at different stages.
Elimination of pathogens like Coliform, Staphylococci, ((Salmonella species###786)) may be undertaken either by heat or by irradiation. There is need to lower the water activity and this can be achieved by either drying or addition of the salts.
Inoculation with an effective, efficient inoculum consisting of ((Lactic acid bacteria###654)) and, or Micrococci which produces lactic acid and also contributes to the flavor development of the product.
Effective controlled time, temperature humidity during the production is essential.
And, Salt ensures the low pH value and extends the shelf-life of the fermented meats like Sausages.
For this I am using Python 3, parsing the list file and storing it in a hash. The hash has all the elements of the list as key-value pairs. Then each line of the input file is matched against all keys present in the hash, and when a match is found the corresponding hash value is substituted, as shown in the output.
This method works fine when the input and the list are small, but when both the list and the input grow it takes a lot of time.
How can I improve the time complexity of this matching method?
The algorithm I am using:
#parse list and store in hash
for l in list:
    ll = l.split("=")
    hash[ll[0]] = ll[1]

#iterate input and match with each key
keys = hash.keys()
for line in lines:
    if(line != ""):
        for key in keys:
            my_regex = r"([,\"\'\( \/\-\|])" + key + r"([ ,\.!\"ред\'\/\-)])"
            if((re.search(my_regex, line, re.IGNORECASE|re.UNICODE))):
                line = re.sub(my_regex, r"\1" + "((" + hash[key] + "))" + r"\2", line)
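One way to cut the per-line work is to build a single alternation regex from all the keys (longest first, so longer phrases like "food microbiology" win over "food") and scan each line once instead of once per key. A rough sketch, assuming hash maps the key phrases to their replacement strings and lines holds the input lines (the boundary character classes are simplified from the original):

import re

# Lowercase lookup so IGNORECASE matches can still find their replacement value.
lookup = {k.lower(): v for k, v in hash.items()}

# One combined pattern: keys sorted longest-first so longer phrases match first.
alternation = "|".join(re.escape(k) for k in sorted(hash, key=len, reverse=True))
pattern = re.compile(r"([,\"\'\( \/\-\|])(" + alternation + r")([ ,\.!\"\'\/\-)])",
                     re.IGNORECASE | re.UNICODE)

def mark_line(line):
    # One scan per line: wrap the matched key's value in (( )).
    return pattern.sub(lambda m: m.group(1) + "((" + lookup[m.group(2).lower()] + "))" + m.group(3), line)

lines = [mark_line(line) if line != "" else line for line in lines]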
I wanted to calculate the time differences (in minutes). However, the data I got is not in a conventional time format; it uses the format "yyyy-mm-dd-HH-MM-ss" in UTC time. It seems I can't use this directly in moment or other libraries. What is the recommended way to handle this specific format?
How can I use a library such as "moment" to calculate the time differences with this time format from my data?
Not sure, but possibly try it with moment, something like:
const moment = require('moment');
const yourSpecialFormat = 'YYYY-MM-DD-HH-mm-ss';
const someDateInYourFormat = '2020-02-22-05-58-57';
let now = moment(moment().format(yourSpecialFormat), yourSpecialFormat);
let then = moment(someDateInYourFormat, yourSpecialFormat);
console.log('Hours ----> ', moment.duration(now.diff(then)).asHours());
console.log('Minutes ----> ', moment.duration(now.diff(then)).asMinutes());
I am testing an ERC20 Token on the Rinkeby testnet.
I am sending transfer transactions of 1e23 units.
The response from web3 says that I should pass numbers as strings or BigNumber objects.
I have tried converting the amount to a string using the JavaScript toString method, and converting it with web3.utils.toHex(). Both return the same error:
dat = token.methods.transfer(web3.utils.toHex(to), amount.toString()).encodeABI()
/*
OR
dat = token.methods.transfer(web3.utils.toHex(to), web3.utils.toHex(amount)).encodeABI()
*/
web3.eth.sendTransaction({from: from, to: TOKEN_ADDRESS, data: dat, gas: gasLimit()}, (err, txhash) => {
    if (err) throw err
    console.log(txhash)
    callback(txhash)
})
Uncaught Error: Please pass numbers as strings or BigNumber objects to avoid precision errors.
TLDR
Use the built-in util functions to convert ether to wei:
var amount = web3.utils.toWei('1000000','ether');
Old answer below:
Literally just follow the advice in the error.
The to value should be a string from the start, because the JavaScript number type is too small to store addresses without losing precision.
If the amount starts off as a reasonable number, then convert it to a BigNumber using a bignumber library. Web3 internally uses bn.js as its bignumber library, so for full compatibility you should use the same, but bignum is also compatible in my experience:
const BN = require('bn.js');
token.methods.transfer(new BN(to),new BN(amount)).encodeABI()
Based on your comment it appears you are trying to pass 1e+24 as a number. The problem is it is too large to fit in a double without losing precision. Web3 is refusing to use the number because it has already lost precision even before web3 has a chance to process it. The fix is to use a string instead:
var amount = '1000000000000000000000000';
token.methods.transfer(to,amount).encodeABI()
If you really don't want to type 24 zeros you can use string operations:
var amount = '1' + '0'.repeat(24);
Or if this amount is really a million ether, it's better to use the built-in util functions to show what you really mean:
var amount = web3.utils.toWei('1000000','ether');
I know this is old, but I was having trouble with some tests for Solidity using chai, and I added this comment:
/* global BigInt */
With that, you can use big numbers:
const testValue = 2000000000000000000n;
I'm new to numpy, I have googled a lot, but it is hard for me (at the moment) to speed my code up any further. I optimized my code as much as I could using #profile and numba, but it is still very slow for a large number of documents, and it needs a lot of memory. I'm pretty sure I'm not using numpy the right (fast) way. Because I want to learn, I hope some of you can help me improve my code.
You can find my whole code here:
my code on bitbucket
The very slow part is the log-entropy-weight calculation in the file CreateMatrix.py (create_log_entropy_weight_matrix and __create_np_p_ij_matrix_forLEW)
The profiling results for the two methods can be viewed here.
Here are the two methods:
#profile
#jit
def create_log_entropy_weight_matrix(self, np_freq_matrix_ordered):
    print(' * Create Log-Entropy-Weight-Matrix')
    np_p_ij_matrix = self.__create_np_p_ij_matrix_forLEW(np_freq_matrix_ordered)
    np_p_ij_matrix_sum = np_p_ij_matrix.sum(0)
    np_log_entropy_weight_matrix = np.zeros(np_freq_matrix_ordered.shape, dtype=np.float32)
    n_doc = int(np_freq_matrix_ordered.shape[0])
    row_len, col_len = np_freq_matrix_ordered.shape
    negative_value = False
    for col_i, np_p_ij_matrix_sum_i in enumerate(np_p_ij_matrix_sum):
        for row_i in range(row_len):
            local_weight_i = math.log(np_freq_matrix_ordered[row_i][col_i] + 1)
            if not np_p_ij_matrix[row_i][col_i]:
                np_log_entropy_weight_matrix[row_i][col_i] = local_weight_i
            else:
                global_weight_i = 1 + (np_p_ij_matrix_sum_i / math.log(n_doc))
                np_log_entropy_weight_matrix[row_i][col_i] = local_weight_i * global_weight_i
            # if np_log_entropy_weight_matrix[row_i][col_i] < 0:
            #     negative_value = True
    #print(' - - test negative_value:', negative_value)
    return(np_log_entropy_weight_matrix)

##profile
#jit
def __create_np_p_ij_matrix_forLEW(self, np_freq_matrix_ordered):
    np_freq_matrix_ordered_sum = np_freq_matrix_ordered.sum(0)
    np_p_ij_matrix = np.zeros(np_freq_matrix_ordered.shape, dtype=np.float32)
    row_len, col_len = np_freq_matrix_ordered.shape
    for col_i, ft_freq_sum_i in enumerate(np_freq_matrix_ordered_sum):
        for row_i in range(row_len):
            p_ij = division_lew(np_freq_matrix_ordered[row_i][col_i], ft_freq_sum_i)
            if p_ij:
                np_p_ij_matrix[row_i][col_i] = p_ij * math.log(p_ij)
    return(np_p_ij_matrix)
Hope someone can help me to improve my code :)
Here's a stab at removing one level of iteration:
doc_log = math.log(n_doc)
local_weight = np.log(np_freq_matrix_ordered + 1)  # local weights for the whole matrix at once
for col_i, np_p_ij_matrix_sum_i in enumerate(np_p_ij_matrix_sum):
    local_weight_j = local_weight[:, col_i]
    ind = np_p_ij_matrix[:, col_i] != 0            # rows where the global weight applies
    local_weight_j[ind] *= 1 + np_p_ij_matrix_sum_i / doc_log
    np_log_entropy_weight_matrix[:, col_i] = local_weight_j
I haven't run any tests; I just read through your code and replaced things that were unnecessarily iterative.
Without fully understanding your code, it looks like it is performing things that can be done on the whole array at once: *, +, log, etc. The only if is avoiding log(0). I replaced one if with the ind masking.
The variable names are long and descriptive. At some level that is good, but it often is easier to read code with shorter names. It takes more concentration to distinguish np_p_ij_matrix from np_p_ij_matrix_sum_i than to distinguish x from y.
Notice I also replaced the [][] indexing with [,] style. Not necessarily faster, but easier to read.
But I haven't used numba enough to know whether these changes improve its performance. numba lets you get by with an iterative style of coding that makes an old-time MATLAB coder blanch.
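For what it's worth, here is a rough, untested sketch of how both steps could be written with no Python-level loops at all, assuming np_freq_matrix_ordered is a dense 2-D documents-by-terms array (the division_lew helper is replaced by a plain guarded division):

import numpy as np

def log_entropy_weight_vectorized(freq):
    # freq: documents x terms frequency matrix
    n_doc = freq.shape[0]
    col_sums = freq.sum(axis=0)                           # per-term frequency totals
    with np.errstate(divide='ignore', invalid='ignore'):
        p = np.where(col_sums > 0, freq / col_sums, 0.0)  # p_ij, 0 where a term never occurs
        p_log_p = np.where(p > 0, p * np.log(p), 0.0)     # avoid log(0)
    entropy_sum = p_log_p.sum(axis=0)                     # column sums of p*log(p), all <= 0
    global_weight = 1 + entropy_sum / np.log(n_doc)       # per-term global weight
    local_weight = np.log(freq + 1)                       # local weight for the whole matrix
    # Where p_ij is 0 the original code keeps just the local weight.
    return np.where(p_log_p != 0, local_weight * global_weight, local_weight).astype(np.float32)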