Presto fails to read hexadecimal string: Not a valid base-16 number - presto

Is there a way for Presto to check whether a string is valid hex? The following expression keeps failing:
from_base(hexstring, 16)
with the error:
> /usr/local/lib/python3.7/dist-packages/pyhive/presto.py in _process_response(self, response)
> 347 self._state = self._STATE_FINISHED
> 348 if 'error' in response_json:
> --> 349 raise DatabaseError(response_json['error'])
> 350
> 351
>
> DatabaseError: {'message': 'Not a valid base-16 number:
> ffffffffffdfae90', 'errorCode': 7, 'errorName':
> 'INVALID_FUNCTION_ARGUMENT', 'errorType': 'USER_ERROR', 'failureInfo':
> {'type': 'io.prestosql.spi.PrestoException', 'message': 'Not a valid
> base-16 number: ffffffffffdfae90', 'cause': {'type':
> 'java.lang.NumberFormatException', 'message': 'For input string:
> "ffffffffffdfae90"', 'suppressed': [], 'stack':
>
However, Python accepts the same string:
int('ffffffffffdfae90',16)
returns
18446744073707433616

from_base returns a BIGINT, which can hold values up to 2^63 - 1, i.e. 9223372036854775807. That is less than 18446744073707433616, whereas Python's int is unbounded, so this particular number is simply too big for Presto.
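
The range check described above can be reproduced in Python. This is a minimal sketch, not Presto's implementation: the helper names fits_in_bigint and to_signed_64 are hypothetical, and reading the value as a two's-complement 64-bit integer is only an assumption about what the hex string was meant to encode.

```python
BIGINT_MAX = 2**63 - 1  # largest value a signed 64-bit BIGINT can hold


def fits_in_bigint(hexstring):
    """Return True if the unsigned hex value fits in a signed 64-bit integer."""
    return int(hexstring, 16) <= BIGINT_MAX


def to_signed_64(hexstring):
    """Reinterpret up to 16 hex digits as a two's-complement 64-bit integer.

    Assumption: the string encodes a signed 64-bit value.
    """
    value = int(hexstring, 16)
    if value > BIGINT_MAX:
        value -= 2**64
    return value


hexstring = "ffffffffffdfae90"
print(fits_in_bigint(hexstring))  # False
print(to_signed_64(hexstring))    # -2118000
```

If the data really is a signed 64-bit value, the negative result is probably the intended number; if it is unsigned, it cannot be represented as a BIGINT at all.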

Related

Groovy Script to find a string in console output and make the build failure

I have the following entry in the Jenkins console: "Output: × 35 of 45 failed (78%) 06:13 247 3 38 66 140". I need to extract the number 78 and fail the build if the number is >= 30, using a Groovy script.
Can someone please help me with this?
def result= manager.logContains('%)')
println result*.toString() // Read a line which has the failed test cases count
String test = result
def failedtest = test.substring(test.indexOf("%") - 2)
def failednumber = failedtest.split("%")[0].toInteger() as int
println failednumber

type error in functions to run point in polygon query on RAPIDS

I want to create a point-in-polygon query for 14 million NYC taxi trips and find out in which of the 263 taxi zones each trip was located.
I want to run the code on RAPIDS cuSpatial. I read a few forums and posts and came across a cuSpatial limitation: users can only query 32 polygons per run. So I did the following to split my polygons into batches.
This is my taxi zone polygon file
cusptaxizone
(0 0
1 1
2 34
3 35
4 36
...
258 348
259 349
260 350
261 351
262 353
Name: f_pos, Length: 263, dtype: int32,
0 0
1 232
2 1113
3 1121
4 1137
...
349 97690
350 97962
351 98032
352 98114
353 98144
Name: r_pos, Length: 354, dtype: int32,
x y
0 933100.918353 192536.085697
1 932771.395560 191317.004138
2 932693.871591 191245.031174
3 932566.381345 191150.211914
4 932326.317026 190934.311748
... ... ...
98187 996215.756543 221620.885314
98188 996078.332519 221372.066989
98189 996698.728091 221027.461362
98190 997355.264443 220664.404123
98191 997493.322715 220912.386162
[98192 rows x 2 columns])
There are 263 polygons (taxi zones) in total. I want to run the queries in 11 batches of up to 24 polygons each.
def create_iterations(start, end, batches):
    iterations = list(np.arange(start, end, batches))
    iterations.append(end)
    return iterations
pip_iterations = create_iterations(0, 264, 24)
#loop to do point in polygon query in a table
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
    cuda_df['borough'] = " "
    for i in range(len(iter_batch)-1):
        start = pip_iterations[i]
        end = pip_iterations[i+1]
        pip = cuspatial.point_in_polygon(cuda_df['pickup_longitude'], cuda_df['pickup_latitude'],
                                         cuspatial_data[0][start:end], #poly_offsets
                                         cuspatial_data[1], #poly_ring_offsets
                                         cuspatial_data[2]['x'], #poly_points_x
                                         cuspatial_data[2]['y'] #poly_points_y
                                         )
        for i in pip.columns:
            cuda_df['borough'].loc[pip[i]] = polygon_name[i]
    return cuda_df
When I ran the function I received a type error. I wonder what might cause the issue?
pip_pickup = perform_pip(cutaxi, cusptaxizone, pip_iterations)
TypeError: perform_pip() missing 1 required positional argument: 'iter_batch'
You are passing cutaxi for cuda_df, cusptaxizone for cuspatial_data, and pip_iterations for polygon_name. Nothing is passed for iter_batch, the fourth parameter in the function definition:
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
Hence the error, which states that iter_batch is missing. As the comment above also points out, the call does not supply the right number of arguments. Once you call perform_pip with all four arguments, the error
TypeError: perform_pip() missing 1 required positional argument: 'iter_batch'
is resolved.
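
To illustrate, here is a minimal sketch that reproduces the error without cuSpatial. The function body is a stand-in that only echoes its arguments, and zone_names is a hypothetical list of polygon names; neither comes from the original code.

```python
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
    """Stand-in with the same signature as the function in the question."""
    return {
        "cuda_df": cuda_df,
        "cuspatial_data": cuspatial_data,
        "polygon_name": polygon_name,
        "iter_batch": iter_batch,
    }


pip_iterations = [0, 24, 48]
zone_names = ["zone_a", "zone_b"]  # hypothetical polygon names

# Three arguments: pip_iterations is bound to polygon_name, iter_batch is missing.
try:
    perform_pip("cutaxi", "cusptaxizone", pip_iterations)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Four arguments: every parameter receives a value.
result = perform_pip("cutaxi", "cusptaxizone", zone_names, pip_iterations)
```

Passing the arguments by keyword, for example iter_batch=pip_iterations, makes this kind of mismatch easier to spot.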

re.sub : How to solve TypeError: expected string or bytes-like object

I have a dataframe called tweet that looks like this:
Id Text
0 1281015183687720961 #AngelaRuchTruck has #BubbaWallace beat, by fa...
1 1281015160803667968 I’m an old, white male. I marched in the 60s a...
2 1281014374744891392 This is me and I am saying #EnoughIsEnoughNS L...
3 1281014363193819139 The Ultimate Fighter Finale! Join in on the fu...
4 1281014339433095169 This #blm $hit is about done
... ... ...
12529 1279207822207725569 First thing I see, getting here #BLM #BLMDC #B...
12530 1279206857253543936 So here’s a thought for all of you #BLM people...
12531 1279206802035539969 #campingworld #Hamilton #BreakTheSilenceForSus...
12532 1279205845474127872 #Day 3.168 . . #artmenow #drawmenow #nodapl #n...
12533 1279205399535792128 Oh but wait ....... Breonna Taylor! #BreonnaTa...
I am trying to clean the text in tweet['Text'] using the following code:
tweet['cleaned_text'] = re.sub(r"(?:\#RT|http?\://|https?\://|www)\S+", "", tweet['Text'])
tweet['cleaned_text']= re.sub(r'^RT[\s]+', '', tweet['cleaned_text'])
But I get this error:
~\AppData\Local\Continuum\anaconda3\lib\re.py in sub(pattern, repl, string, count, flags)
190 a callable, it's passed the Match object and must return
191 a replacement string to be used."""
--> 192 return _compile(pattern, flags).sub(repl, string, count)
193
194 def subn(pattern, repl, string, count=0, flags=0):
TypeError: expected string or bytes-like object
A suggested answer is to use the following code:
cleaned = []
txt = list(tweet['Text'])
for i in txt:
    cleaned.append(re.sub(r"(?:\#RT|http?\://|https?\://|www)\S+", "", i))
tweet['cleaned_text'] = cleaned
The code works fine. However, tweet['cleaned_text'] is still not a string. For example, when I use the following code:
Blobtweet = TextBlob(tweet["cleaned_text"])
I get this error
~\AppData\Local\Continuum\anaconda3\lib\site-packages\textblob\blob.py in __init__(self, text, tokenizer, pos_tagger, np_extractor, analyzer, parser, classifier, clean_html)
368 if not isinstance(text, basestring):
369 raise TypeError('The `text` argument passed to `__init__(text)` '
--> 370 'must be a string, not {0}'.format(type(text)))
371 if clean_html:
372 raise NotImplementedError("clean_html has been deprecated. "
TypeError: The `text` argument passed to `__init__(text)` must be a string, not <class 'pandas.core.series.Series'>
###########
or
text=tweet['cleaned_text']
text = text.lower()
tokens = tokenizer.tokenize(text)
I get the following error:
AttributeError: 'Series' object has no attribute 'lower'
All of these examples work fine when I pass a plain string.
tweet['cleaned_text'] returns a column (a pandas Series), not a string, so you have to iterate through each element of the column:
import re

cleaned = []
txt = list(tweet['Text'])
for i in txt:
    # remove links and "#RT"-prefixed tokens first, then a leading "RT "
    t = re.sub(r"(?:\#RT|http?\://|https?\://|www)\S+", "", i)
    cleaned.append(re.sub(r'^RT[\s]+', '', t))
tweet['cleaned_text'] = cleaned
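
The same idea can be packaged as a per-string function. This is a minimal sketch using only the standard library: clean_text and sample_tweets are hypothetical names and data for illustration, and the two patterns are copied from the question unchanged.

```python
import re

# Patterns taken from the question: links / "#RT"-prefixed tokens, and a leading "RT ".
LINK_PATTERN = re.compile(r"(?:\#RT|http?\://|https?\://|www)\S+")
RETWEET_PREFIX = re.compile(r"^RT[\s]+")


def clean_text(text):
    """Clean a single tweet; re.sub accepts one str (or bytes), not a whole column."""
    without_links = LINK_PATTERN.sub("", text)
    return RETWEET_PREFIX.sub("", without_links)


sample_tweets = [  # hypothetical data
    "RT check this https://example.com/abc now",
    "This #blm thread www.example.org is long",
]

# Passing the whole collection reproduces the TypeError from the question.
try:
    LINK_PATTERN.sub("", sample_tweets)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Applying the function element by element works.
cleaned = [clean_text(t) for t in sample_tweets]
print(cleaned)
```

With pandas, the same function can be applied per row with tweet['Text'].apply(clean_text). The later errors have the same cause: TextBlob and lower() expect one string, so they also need to be applied per element, for example tweet['cleaned_text'].apply(TextBlob) or tweet['cleaned_text'].str.lower().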

with select value in one column:SyntaxError: invalid syntax

This is my code:
url = "E:\dataset\state_dataset\drug.csv"
dataframe = read_csv(url)
df=dataframe.loc[:,['Product Name','Number of Prescriptions','Total Amount Reimbursed','Medicaid Amount Reimbursed']]
df[(df.Number of Prescriptions >= 100)]
and I got this error:
File "", line 10
df[(df.Number of Prescriptions >= 100)]
^
SyntaxError: invalid syntax
How can I fix this error?

Comparing two values evaluates to false instead of true

I'm using Node.js with Express.js and Redis. I'm recording the uptime of a site component by incrementing a Redis key. I want to update the uptimerecord:tracker key once the current uptime exceeds the current uptime record, but it is not being updated: uptimeTracker > uptimeRecordTracker evaluates to false even though it should be true.
Is there anything I'm missing?
Thanks!
db.get("uptime:tracker", function(err, uptimeTracker) {
    db.get("uptimerecord:tracker", function(err, uptimeRecordTracker) {
        console.log("[Stats] uptimeTracker: " + uptimeTracker)
        console.log("[Stats] uptimeRecordTracker: " + uptimeRecordTracker)
        console.log("[Stats] Compare: " + (uptimeTracker > uptimeRecordTracker))
        if(uptimeTracker > uptimeRecordTracker) {
            console.log("[Stats] Tracker Records updated")
            db.set('uptimerecord:tracker', uptimeTracker)
        }
    });
});
The console output:
[Stats] uptimeTracker: 213
[Stats] uptimeRecordTracker: 99
[Stats] Compare: false
It looks like you're comparing strings instead of integers. Strings are compared character by character, and "2" sorts before "9", so:
"213" > "99" == false
while
213 > 99 == true
Try converting them to integers (with an explicit radix) before doing the comparison:
parseInt(uptimeTracker, 10) > parseInt(uptimeRecordTracker, 10)
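
The pitfall is not specific to JavaScript. As a small illustrative sketch in Python (the variable names mirror the question, and the values are the ones from the console output), string comparison is lexicographic there as well:

```python
# Values as they might arrive from a Redis client: strings, not numbers.
uptime_tracker = "213"
uptime_record_tracker = "99"

# Lexicographic comparison: "2" sorts before "9", so this is False.
as_strings = uptime_tracker > uptime_record_tracker

# Numeric comparison after conversion: 213 > 99, so this is True.
as_integers = int(uptime_tracker) > int(uptime_record_tracker)

print(as_strings, as_integers)  # False True
```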

Resources