Spark SQL 1.x: NotImplementedError: no parse rules - apache-spark
When I run the query below with Spark SQL (HiveContext), it fails with the exception shown.
Which part of the syntax causes this? I am using Spark 1.6 and Hive 1.2.
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Unsupported language features in query: SELECT i.*,
from_unixtime(unix_timestamp('20170221','yyyyMMdd'),"yyyy-MM-dd'T'HH:mm:ssZ") bounce_date
FROM
(SELECT country,
device_id,
os_name,
app_ver
FROM jpl_band_orc
WHERE yyyymmdd='20170221'
AND scene_id='app_intro'
AND action_id='scene_enter'
AND classifier='app_intro'
GROUP BY country, device_id, os_name, app_ver ) i
LEFT JOIN
(SELECT device_id
FROM jpl_band_orc
WHERE yyyymmdd='20170221'
AND scene_id='band_list'
AND action_id='scene_enter'
AND device_id IN
(SELECT DISTINCT device_id
FROM jpl_band_orc x
WHERE yyyymmdd='20170221'
AND scene_id='app_intro'
AND action_id='scene_enter'
AND classifier='app_intro' ) ) s
ON i.device_id = s.device_id
WHERE s.device_id is null
TOK_QUERY 8, 0,425, 10
TOK_FROM 8, 28,412, 10
TOK_LEFTOUTERJOIN 8, 36,412, 10
TOK_SUBQUERY 8, 36,186, 10
TOK_QUERY 8, 37,182, 10
TOK_FROM 8, 91,93, 10
TOK_TABREF 8, 93,93, 10
TOK_TABNAME 8, 93,93, 10
jpl_band_orc 8, 93,93, 10
TOK_INSERT 0, -1,182, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 4, 37,84, 13
TOK_SELEXPR 4, 39,39, 13
TOK_TABLE_OR_COL 4, 39,39, 13
country 4, 39,39, 13
TOK_SELEXPR 5, 54,54, 12
TOK_TABLE_OR_COL 5, 54,54, 12
device_id 5, 54,54, 12
TOK_SELEXPR 6, 69,69, 12
TOK_TABLE_OR_COL 6, 69,69, 12
os_name 6, 69,69, 12
TOK_SELEXPR 7, 84,84, 12
TOK_TABLE_OR_COL 7, 84,84, 12
app_ver 7, 84,84, 12
TOK_WHERE 12, 100,161, 13
AND 12, 102,161, 13
AND 11, 138,138, 13
AND 10, 119,119, 13
= 9, 102,104, 19
TOK_TABLE_OR_COL 9, 102,102, 11
yyyymmdd 9, 102,102, 11
'20170221' 9, 104,104, 20
= 10, 121,123, 25
TOK_TABLE_OR_COL 10, 121,121, 17
scene_id 10, 121,121, 17
'app_intro' 10, 123,123, 26
= 11, 140,142, 26
TOK_TABLE_OR_COL 11, 140,140, 17
action_id 11, 140,140, 17
'scene_enter' 11, 142,142, 27
= 12, 159,161, 27
TOK_TABLE_OR_COL 12, 159,159, 17
classifier 12, 159,159, 17
'app_intro' 12, 161,161, 28
TOK_GROUPBY 13, 168,182, 15
TOK_TABLE_OR_COL 13, 173,173, 15
country 13, 173,173, 15
TOK_TABLE_OR_COL 13, 176,176, 24
device_id 13, 176,176, 24
TOK_TABLE_OR_COL 13, 179,179, 35
os_name 13, 179,179, 35
TOK_TABLE_OR_COL 13, 182,182, 44
app_ver 13, 182,182, 44
i 13, 186,186, 54
TOK_SUBQUERY 16, 201,391, 10
TOK_QUERY 16, 202,387, 10
TOK_FROM 16, 211,213, 10
TOK_TABREF 16, 213,213, 10
TOK_TABNAME 16, 213,213, 10
jpl_band_orc 16, 213,213, 10
TOK_INSERT 0, -1,387, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 15, 202,204, 13
TOK_SELEXPR 15, 204,204, 13
TOK_TABLE_OR_COL 15, 204,204, 13
device_id 15, 204,204, 13
TOK_WHERE 20, 220,387, 13
AND 20, 222,387, 13
AND 19, 258,258, 13
AND 18, 239,239, 13
= 17, 222,224, 19
TOK_TABLE_OR_COL 17, 222,222, 11
yyyymmdd 17, 222,222, 11
'20170221' 17, 224,224, 20
= 18, 241,243, 25
TOK_TABLE_OR_COL 18, 241,241, 17
scene_id 18, 241,241, 17
'band_list' 18, 243,243, 26
= 19, 260,262, 26
TOK_TABLE_OR_COL 19, 260,260, 17
action_id 19, 260,260, 17
'scene_enter' 19, 262,262, 27
TOK_SUBQUERY_EXPR 20, 279,387, 27
TOK_SUBQUERY_OP 20, 281,281, 27
IN 20, 281,281, 27
TOK_QUERY 22, 291,387, 12
TOK_FROM 22, 305,309, 12
TOK_TABREF 22, 307,309, 12
TOK_TABNAME 22, 307,307, 12
jpl_band_orc 22, 307,307, 12
x 22, 309,309, 25
TOK_INSERT 0, -1,385, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECTDI 21, 292,296, 24
TOK_SELEXPR 21, 296,296, 24
TOK_TABLE_OR_COL 21, 296,296, 24
device_id 21, 296,296, 24
TOK_WHERE 26, 318,385, 15
AND 26, 320,385, 15
AND 25, 360,360, 15
AND 24, 339,339, 15
= 23, 320,322, 21
TOK_TABLE_OR_COL 23, 320,320, 13
yyyymmdd 23, 320,320, 13
'20170221' 23, 322,322, 22
= 24, 341,343, 27
TOK_TABLE_OR_COL 24, 341,341, 19
scene_id 24, 341,341, 19
'app_intro' 24, 343,343, 28
= 25, 362,364, 28
TOK_TABLE_OR_COL 25, 362,362, 19
action_id 25, 362,362, 19
'scene_enter' 25, 364,364, 29
= 26, 383,385, 29
TOK_TABLE_OR_COL 26, 383,383, 19
classifier 26, 383,383, 19
'app_intro' 26, 385,385, 30
TOK_TABLE_OR_COL 20, 279,279, 17
device_id 20, 279,279, 17
s 26, 391,391, 46
= 27, 404,412, 24
. 27, 404,406, 13
TOK_TABLE_OR_COL 27, 404,404, 12
i 27, 404,404, 12
device_id 27, 406,406, 14
. 27, 410,412, 27
TOK_TABLE_OR_COL 27, 410,410, 26
s 27, 410,410, 26
device_id 27, 412,412, 28
TOK_INSERT 0, -1,425, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 1, 0,23, 7
TOK_SELEXPR 1, 2,4, 7
TOK_ALLCOLREF 1, 2,4, 7
TOK_TABNAME 1, 2,2, 7
i 1, 2,2, 7
TOK_SELEXPR 2, 11,23, 3
TOK_FUNCTION 2, 11,21, 3
from_unixtime 2, 11,11, 3
TOK_FUNCTION 2, 13,18, 17
unix_timestamp 2, 13,13, 17
'20170221' 2, 15,15, 32
'yyyyMMdd' 2, 17,17, 43
"yyyy-MM-dd'T'HH:mm:ssZ" 2, 20,20, 55
bounce_date 2, 23,23, 81
TOK_WHERE 0, 417,425, 0
TOK_FUNCTION 0, 419,425, 0
TOK_ISNULL 0, 425,425, 0
. 28, 419,421, 10
TOK_TABLE_OR_COL 28, 419,419, 9
s 28, 419,419, 9
device_id 28, 421,421, 11
scala.NotImplementedError: No parse rules for ASTNode type: 864, text: TOK_SUBQUERY_EXPR :
A subquery in the WHERE clause (such as the device_id IN (SELECT ...) predicate above) is not supported in Spark 1.6; support was added in Spark 2.0.
Reference:
https://issues.apache.org/jira/browse/SPARK-4226
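Until you can move to 2.0, one workaround on 1.6 is to express the IN predicate as a LEFT SEMI JOIN, which the Hive parser in Spark 1.6 does accept. A rough sketch (the PySpark scaffolding is illustrative; table and column names are taken from the question, and the rewrite is untested against your data):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="semi-join-workaround")
sqlContext = HiveContext(sc)

# the inner "device_id IN (SELECT ...)" predicate, rewritten as a semi join
s = sqlContext.sql("""
    SELECT b.device_id
    FROM jpl_band_orc b
    LEFT SEMI JOIN
      (SELECT DISTINCT device_id
       FROM jpl_band_orc
       WHERE yyyymmdd = '20170221'
         AND scene_id = 'app_intro'
         AND action_id = 'scene_enter'
         AND classifier = 'app_intro') x
    ON b.device_id = x.device_id
    WHERE b.yyyymmdd = '20170221'
      AND b.scene_id = 'band_list'
      AND b.action_id = 'scene_enter'
""")

Using this in place of the s subquery keeps the rest of the outer query (the LEFT JOIN plus WHERE s.device_id IS NULL anti-join) unchanged. A LEFT SEMI JOIN only exposes columns of the left table, which matches how the IN predicate uses the right side here.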
Related
Creating a TXT file and seeking a position in Python
I am given the following variables:

signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [
    [1, 4, 5, 12],
    [-5, 8, 9, 0],
    [-6, 7, 11, 19],
    [1, 4, 5, 12],
    [-5, 8, 9, 0],
    [-6, 7, 11, 19],
    [1, 4, 5, 12],
    [-5, 8, 9, 0],
    [-6, 7, 11, 19],
]

I want to create a .TXT file with tab-separated values that would look like this:

                speed>=   0     10    20    30
                speed<    10    20    30    40
rpm>=   rpm<
0       500               1     4     5     12
500     1000              -5    8     9     0
1000    1500              -6    7     11    19
1500    2000              1     4     5     12
2000    2500              -5    8     9     0
2500    3000              -6    7     11    19
3000    3500              1     4     5     12
3500    4000              -5    8     9     0
4000    4500              -6    7     11    19

I have written the following code:

#!/usr/bin/env python3
import os
from datetime import datetime
import time

signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [
    [1, 4, 5, 12],
    [-5, 8, 9, 0],
    [-6, 7, 11, 19],
    [1, 4, 5, 12],
    [-5, 8, 9, 0],
    [-6, 7, 11, 19],
    [1, 4, 5, 12],
    [-5, 8, 9, 0],
    [-6, 7, 11, 19],
]

filename = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{signal1}_results.TXT"

with open(filename, 'w') as f:
    # write the bin1 range
    f.write('\n\n\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '>=')
    for bin in bins1[:-1]:
        f.write('\t' + str(bin))
    f.write('\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '<')
    for bin in bins1[1:]:
        f.write('\t' + str(bin))
    f.write('\n')

    # write the bin2 range
    f.write('\t\t')
    f.write(signal2 + '>=' + '\t' + signal2 + '<' + '\n')
    f.write('\t\t')

    # store the cursor position from where hist result will be written line by line
    track_cursor_pos = []
    curr = bins2[0]
    for next in bins2[1:]:
        f.write(str(curr) + '\t' + str(next))
        track_cursor_pos.append(f.tell())
        f.write('\n\t\t')
        curr = next
    f.write('\n')

    print(track_cursor_pos)

    i = 0
    # Everything is fine until here
    # Code below doesn't work as expected!?
    for result in hist_result:
        f.seek(track_cursor_pos[i], os.SEEK_SET)
        for r in result:
            f.write('\t' + str(r))
        f.write('\n')
        i += 1

But this produces a TXT file whose contents look like this:

                speed>=   0     10    20    30
                speed<    10    20    30    40
rpm>=   rpm<
0       500     1     4     5     12
0       -5      8     9     0
00      -6      7     11    19
        1       4     5     12
00      -5      8     9     0
00      -6      7     11    19
        1       4     5     12
00      -5      8     9     0
00      -6      7     11    19

I think I am not using f.seek() properly. Any suggestion would be appreciated. Thanks in advance.
You don't have to seek inside the file to print your data:

signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [
    [1, 4, 5, 12],
    [-5, 8, 9, 0],
    [-6, 7, 11, 19],
    [1, 4, 5, 12],
    [-5, 8, 9, 0],
    [-6, 7, 11, 19],
    [1, 4, 5, 12],
    [-5, 8, 9, 0],
    [-6, 7, 11, 19],
]

with open('data.txt', 'w') as f_out:
    print('\t{signal1}>=\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str, bins1[:-1]))), file=f_out)
    print('\t{signal1}<\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str, bins1[1:]))), file=f_out)
    # write the rpm header row to the file as well
    print('{signal2}>=\t{signal2}<'.format(signal2=signal2), file=f_out)
    for a, b, data in zip(bins2[:-1], bins2[1:], hist_result):
        print(a, b, *data, sep='\t', file=f_out)

Creates data.txt:

        speed>= 0       10      20      30
        speed<  10      20      30      40
rpm>=   rpm<
0       500     1       4       5       12
500     1000    -5      8       9       0
1000    1500    -6      7       11      19
1500    2000    1       4       5       12
2000    2500    -5      8       9       0
2500    3000    -6      7       11      19
3000    3500    1       4       5       12
3500    4000    -5      8       9       0
4000    4500    -6      7       11      19
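As for why the seek() approach misbehaves: writing at a seeked position overwrites bytes in place rather than inserting them, so each histogram row clobbers the '\n\t\t' and bin labels written after the recorded positions. A minimal sketch of the effect (my own illustration, with a hypothetical demo.txt file):

# Writing after seek() overwrites in place; it does not insert.
with open('demo.txt', 'w+') as f:
    f.write('hello world')
    f.seek(5)
    f.write('XXX')    # replaces ' wo', does not push it right
    f.seek(0)
    print(f.read())   # -> helloXXXrld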
transform integer value patterns in a column to a group
DataFrame

df = pd.DataFrame({'occurance': [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],
                   'value': [45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3]})
df

Expected output

df = pd.DataFrame({'occurance': [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],
                   'value': [45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3],
                   'group': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 4, 100, 5, 5, 5, 5]})
df

I need to transform the dataframe into the expected output. I am after a wildcard that determines that a 1 is the start of a new group, where a group consists of exactly one 1 followed by n zeroes. If the group criteria are not met, the group should be marked 100. I tried along the lines of:

bs = df[df.occurance.eq(1).any(1) & df.occurance.shift(-1).eq(0).any(1)].squeeze()
bs

This, even when broken down, could only bool-select the group starts and nothing more. Any help?
Create a mask comparing each 1 with a 1 in the next row, filter those rows out of occurance, build the group ids with a cumulative sum via Series.cumsum, and finally restore the removed positions as 100 via Series.reindex:

m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
df['group'] = df.loc[~m, 'occurance'].cumsum().reindex(df.index, fill_value=100)
print(df)

    occurance  value  group
0           1     45      1
1           0      3      1
2           0      2      1
3           0     12      1
4           1     14      2
5           0     32      2
6           0      1      2
7           0      1      2
8           0      6      2
9           0      4      2
10          1      9      3
11          0     32      3
12          1     78    100
13          1     96      4
14          0     12      4
15          0      6      4
16          0      3      4
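To see why row 12 ends up as 100: the mask flags a 1 that is immediately followed by another 1, i.e. a "group" with no trailing zeros. A quick inspection sketch (my own addition, assuming the df from the question):

import pandas as pd

df = pd.DataFrame({'occurance': [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],
                   'value': [45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3]})

m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
print(df.assign(flagged=m).loc[10:14])  # only row 12 is flagged; it is excluded
                                        # from the cumsum and reindexed to 100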
Creating a vector containing the next 10 row-column values for each pandas row
I am trying to create a vector of the previous 10 values from a pandas column and insert it back into the data frame as a list in a cell. The code below works, but I need to do this for a dataframe of over 30 million rows, so a loop will take too long. Can someone please help me convert this to a numpy function that I can apply? I would also like to be able to apply this function in a groupby.

import pandas as pd

df = pd.DataFrame(list(range(1, 20)), columns=['A'])
df.insert(0, 'Vector', '')
df['Vector'] = df['Vector'].astype(object)

for index, row in df.iterrows():
    df['Vector'].iloc[index] = list(df['A'].iloc[(index - 10):index])

I have tried multiple ways but have not been able to get it to work. Any help would be appreciated.
IIUC

df['New'] = [df.A.tolist()[max(0, x - 10):x] for x in range(len(df))]
df

Out[123]:
     A                                      New
0    1                                       []
1    2                                      [1]
2    3                                   [1, 2]
3    4                                [1, 2, 3]
4    5                             [1, 2, 3, 4]
5    6                          [1, 2, 3, 4, 5]
6    7                       [1, 2, 3, 4, 5, 6]
7    8                    [1, 2, 3, 4, 5, 6, 7]
8    9                 [1, 2, 3, 4, 5, 6, 7, 8]
9   10              [1, 2, 3, 4, 5, 6, 7, 8, 9]
10  11          [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
11  12         [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
12  13        [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
13  14       [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
14  15      [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
15  16     [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
16  17    [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
17  18   [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
18  19  [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
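Since the question also asks about groupby: the same comprehension can be run per group and the pieces stitched back together. A hedged sketch (my own addition, assuming a hypothetical grouping column 'key'):

import pandas as pd

# hypothetical frame with a grouping column 'key'
df = pd.DataFrame({'key': ['a'] * 10 + ['b'] * 9, 'A': range(1, 20)})

pieces = []
for _, s in df.groupby('key')['A']:
    vals = s.tolist()
    pieces.append(pd.Series([vals[max(0, i - 10):i] for i in range(len(vals))],
                            index=s.index))
df['New'] = pd.concat(pieces).sort_index()  # windows never cross group boundaries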
Image.frombytes not writing squares
I have a numpy array:

[[12 13 12  5  6  5 14  4  6 11 11 10  8 11  8 11  7  8  0  0  0]
 [ 5 14  4  6 11 11 10  8 11  8 11  8 11  8 11  7  8  0  0  0  0]
 [ 5 14  4  6 11 10 10  8 11  8 11  8 11  8 11  8 11  7  8  0  0]
 [ 5 14  4  6 11 11 10  7  8  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 5 14  4  6 11 11 10  8 11  8 11  8 11  8 11  8 11  8 11  7  8]
 [ 5 14  4  6 11 10  8 11 10  8 11 10  8 11 10  7  8  0  0  0  0]
 [ 5 14  4  6 11 10 10  8 11  8 11  7  8  0  0  0  0  0  0  0  0]
 [ 5 14  4  6 11 11 10  1 11  1 11  7  8  0  0  0  0  0  0  0  0]
 [ 5 14  4  6 11 10 10  1 11  1 11  1 11  7  8  0  0  0  0  0  0]
 [ 5 14  4  6 11 10 10  8 11  8 11  8 11  7  8  0  0  0  0  0  0]
 [ 5 14  4  6 11 10  8 11 10  8 11 10  8 11 10  8 11  7  7  0  0]]

And a colors dictionary:

{0: (0, 0, 0), 1: (17, 17, 17), 2: (34, 34, 34), 3: (51, 51, 51),
 4: (68, 68, 68), 5: (85, 85, 85), 6: (102, 102, 102), 7: (119, 119, 119),
 8: (136, 136, 136), 9: (153, 153, 153), 10: (170, 170, 170),
 11: (187, 187, 187), 12: (204, 204, 204), 13: (221, 221, 221),
 14: (238, 238, 238)}

I'm trying to pass the array through the dictionary, then write those colors in 10x10 blocks to a .png file. So far I have:

rows = []
for row in arr:
    for j in range(10):
        for col in row:
            for i in range(10):
                rows.extend(colors[col])

rows = bytes(rows)
img = Image.frombytes('RGB', (110, 120), rows)
img.save("generated.png")

But the image this writes has lines instead of the 10x10 blocks I was trying to write. It seems to me as though the blocks are shifted somehow, but I can't figure out how to un-shift them. Why is this behavior happening?
I believe you only need to change the size parameter to obtain the result you want. Replacing this line should correct the error:

# img = Image.frombytes('RGB', (110, 120), rows)
img = Image.frombytes('RGB', (210, 110), rows)

Size should be a 2-tuple of the width and height of the image in pixels. The rows list you are creating describes an image that is (210, 110) pixels, but you are drawing it onto an image that is (110, 120) pixels. This causes the image to break to a new row every 110 pixels. Here is a working example:

from PIL import Image

array = [
    [12, 13, 12, 5, 6, 5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0],
    [5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0],
    [5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0],
    [5, 14, 4, 6, 11, 11, 10, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8],
    [5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 7, 8, 0, 0, 0, 0],
    [5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
    [5, 14, 4, 6, 11, 11, 10, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
    [5, 14, 4, 6, 11, 10, 10, 1, 11, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0],
    [5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0],
    [5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 7, 7, 0, 0],
]

colors = {
    0: (0, 0, 0), 1: (17, 17, 17), 2: (34, 34, 34), 3: (51, 51, 51),
    4: (68, 68, 68), 5: (85, 85, 85), 6: (102, 102, 102), 7: (119, 119, 119),
    8: (136, 136, 136), 9: (153, 153, 153), 10: (170, 170, 170),
    11: (187, 187, 187), 12: (204, 204, 204), 13: (221, 221, 221),
    14: (238, 238, 238),
}

rows = []
for row in array:
    for _ in range(10):
        for col in row:
            for _ in range(10):
                rows.extend(colors[col])

rows = bytes(rows)
img = Image.frombytes('RGB', (210, 110), rows)
img.save("generated.png")
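As a side note, a numpy variant of the same idea (my own sketch, reusing the array and colors defined in the example above) avoids the nested loops entirely:

import numpy as np
from PIL import Image

arr = np.array(array, dtype=np.uint8)                       # the label grid
palette = np.array([colors[i] for i in range(15)], dtype=np.uint8)
pixels = palette[arr]                                       # (11, 21, 3) RGB values
pixels = pixels.repeat(10, axis=0).repeat(10, axis=1)       # scale each cell to 10x10
Image.fromarray(pixels).save("generated.png")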
Can we update a row in Hive Table Using Spark-SQL
Tried like this:

HiveContext hiveql = new org.apache.spark.sql.hive.HiveContext(ctx);
hiveql.sql("UPDATE sparkexamples.employee SET empname='Sreeharsha' WHERE empid='1210'");

Submitting the job:

./bin/spark-submit --class com.spark.examples.SparkUpdateHiveContext --master local[4] /home/hadoop/SparkHIveUpdate.jar

I get the following error. Any suggestions please?

16/07/01 11:45:38 INFO parse.ParseDriver: Parse Completed
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Unsupported language features in query: UPDATE sparkexamples.employee SET empname='Sreeharsha' WHERE empid='1210'
TOK_UPDATE_TABLE 1, 0,16, 7
TOK_TABNAME 1, 2,4, 7
sparkexamples 1, 2,2, 7
employee 1, 4,4, 21
TOK_SET_COLUMNS_CLAUSE 1, 6,10, 41
= 1, 8,10, 41
TOK_TABLE_OR_COL 1, 8,8, 34
empname 1, 8,8, 34
'Sreeharsha' 1, 10,10, 42
TOK_WHERE 1, 12,16, 66
= 1, 14,16, 66
TOK_TABLE_OR_COL 1, 14,14, 61
empid 1, 14,14, 61
'1210' 1, 16,16, 67

scala.NotImplementedError: No parse rules for TOK_UPDATE_TABLE:
TOK_UPDATE_TABLE 1, 0,16, 7
TOK_TABNAME 1, 2,4, 7
sparkexamples 1, 2,2, 7
employee 1, 4,4, 21
TOK_SET_COLUMNS_CLAUSE 1, 6,10, 41
= 1, 8,10, 41
TOK_TABLE_OR_COL 1, 8,8, 34
empname 1, 8,8, 34
'Sreeharsha' 1, 10,10, 42
TOK_WHERE 1, 12,16, 66
= 1, 14,16, 66
TOK_TABLE_OR_COL 1, 14,14, 61
empid 1, 14,14, 61
'1210' 1, 16,16, 67

org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:1086)
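This is the same "No parse rules" failure as the question above: the Hive parser bundled with Spark 1.x has no rule for TOK_UPDATE_TABLE, so UPDATE statements cannot be executed through HiveContext at all. A common workaround is to read the table, apply the change as a transformation, and write the result back. A hedged sketch in PySpark (table and column names are taken from the question; the staging table is my own assumption, not a confirmed recipe):

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql import functions as F

sc = SparkContext(appName="hive-update-workaround")
hc = HiveContext(sc)

# apply the "update" as a column transformation
df = hc.table("sparkexamples.employee")
updated = df.withColumn(
    "empname",
    F.when(F.col("empid") == "1210", "Sreeharsha").otherwise(F.col("empname")),
)

# stage the result, then overwrite the original table; writing directly back
# to a table that is also being read can fail, hence the intermediate table
updated.write.saveAsTable("sparkexamples.employee_staging")
hc.sql("INSERT OVERWRITE TABLE sparkexamples.employee "
       "SELECT * FROM sparkexamples.employee_staging")
hc.sql("DROP TABLE sparkexamples.employee_staging")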