I am given the following variables:
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
I want to create a .TXT file with tab-separated values that looks like this:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
500 1000 -5 8 9 0
1000 1500 -6 7 11 19
1500 2000 1 4 5 12
2000 2500 -5 8 9 0
2500 3000 -6 7 11 19
3000 3500 1 4 5 12
3500 4000 -5 8 9 0
4000 4500 -6 7 11 19
I have written the following code:
#!/usr/bin/env python3
import os
from datetime import datetime
import time
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
filename = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{signal1}_results.TXT"
with open(filename, 'w') as f:
    # write the bin1 range
    f.write('\n\n\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '>=')
    for bin in bins1[:-1]:
        f.write('\t' + str(bin))
    f.write('\n')
    f.write('\t\t\t\t')
    f.write(signal1 + '<')
    for bin in bins1[1:]:
        f.write('\t' + str(bin))
    f.write('\n')
    # write the bin2 range
    f.write('\t\t')
    f.write(signal2 + '>=' + '\t' + signal2 + '<' + '\n')
    f.write('\t\t')
    # store the cursor position from where hist result will be written line by line
    track_cursor_pos = []
    curr = bins2[0]
    for next in bins2[1:]:
        f.write(str(curr) + '\t' + str(next))
        track_cursor_pos.append(f.tell())
        f.write('\n\t\t')
        curr = next
    f.write('\n')
    print(track_cursor_pos)
    i = 0
    # Everything is fine until here
    # Code below doesn't work as expected!?
    for result in hist_result:
        f.seek(track_cursor_pos[i], os.SEEK_SET)
        for r in result:
            f.write('\t' + str(r))
        f.write('\n')
        i += 1
But this produces a TXT file whose contents look like this:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
0 -5 8 9 0
00 -6 7 11 19
1 4 5 12
00 -5 8 9 0
00 -6 7 11 19
1 4 5 12
00 -5 8 9 0
00 -6 7 11 19
I think I am not using f.seek() properly. Any suggestions would be appreciated. Thanks in advance.
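You don't have to seek inside the file to print your data. Writing after f.seek() overwrites whatever is already stored at that position; it does not insert, which is why your result rows clobber the bin labels written earlier. Build each line completely and write it once: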
signal1 = 'speed'
bins1 = [0, 10, 20, 30, 40]
signal2 = 'rpm'
bins2 = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
hist_result = [ [1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
]
with open('data.txt', 'w') as f_out:
    # header rows: lower and upper edges of the signal1 bins
    print('\t{signal1}>=\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str, bins1[:-1]))), file=f_out)
    print('\t{signal1}<\t{bins}'.format(signal1=signal1, bins='\t'.join(map(str, bins1[1:]))), file=f_out)
    # column labels for the signal2 bin edges
    print('{signal2}>=\t{signal2}<'.format(signal2=signal2), file=f_out)
    # one row per signal2 bin: lower edge, upper edge, then the histogram counts
    for a, b, data in zip(bins2[:-1], bins2[1:], hist_result):
        print(a, b, *data, sep='\t', file=f_out)
Creates data.txt:
speed>= 0 10 20 30
speed< 10 20 30 40
rpm>= rpm<
0 500 1 4 5 12
500 1000 -5 8 9 0
1000 1500 -6 7 11 19
1500 2000 1 4 5 12
2000 2500 -5 8 9 0
2500 3000 -6 7 11 19
3000 3500 1 4 5 12
3500 4000 -5 8 9 0
4000 4500 -6 7 11 19
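To see why the seek-based attempt garbles the file, here is a small sketch (the file name demo.txt and the function overwrite_demo are made up for illustration). It writes two lines, seeks back to the end of the first one, and writes more text. The new text replaces the start of the second line instead of being inserted before it.

```python
import os
import tempfile


def overwrite_demo():
    """Write two lines, seek back to the end of the first, then write again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")  # hypothetical file name
        # newline="" disables newline translation so offsets match on every OS
        with open(path, "w", newline="") as f:
            f.write("0\t500")
            pos = f.tell()              # position right after the first bin pair
            f.write("\n500\t1000\n")    # the second line now follows that position
            f.seek(pos, os.SEEK_SET)    # jump back to the saved position
            f.write("\t1\t4")           # overwrites "\n500", does not insert
        with open(path, newline="") as f:
            return f.read()


print(repr(overwrite_demo()))  # prints '0\t500\t1\t4\t1000\n'
```

This is the same effect as in the question: each row of results lands on top of the bin labels that were written after the saved position.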
DataFrame
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3]})
df
Expected output
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3],'group':[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 100, 4, 4, 4, 4]})
df
I need to transform the dataframe into the expected output. I am looking for a pattern where a 1 marks the start of a new group, and a group consists of exactly one 1 followed by n zeroes. If a row does not meet the group criterion, assign it to group 100.
I tried something along the lines of:
bs=df[df.occurance.eq(1).any(1)&df.occurance.shift(-1).eq(0).any(1)].squeeze()
bs
Even when broken down, this could only produce a boolean selection of the group starts and nothing more.
Any help?
Build a mask that flags every 1 that is immediately followed by another 1. Then take occurance for all rows outside the mask, compute the cumulative sum with Series.cumsum, and finally use Series.reindex with fill_value=100 to bring the masked rows back with the value 100:
m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
df['group'] = df.loc[~m, 'occurance'].cumsum().reindex(df.index, fill_value=100)
print(df)
    occurance  value  group
0           1     45      1
1           0      3      1
2           0      2      1
3           0     12      1
4           1     14      2
5           0     32      2
6           0      1      2
7           0      1      2
8           0      6      2
9           0      4      2
10          1      9      3
11          0     32      3
12          1     78    100
13          1     96      4
14          0     12      4
15          0      6      4
16          0      3      4
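The rule that the mask implements can also be spelled out as an explicit loop, which may make the logic easier to follow. This is only an illustrative sketch in plain Python, not a replacement for the vectorised version above; the function name label_groups and its invalid parameter are hypothetical.

```python
def label_groups(occurrence, invalid=100):
    """Label each row with its group number.

    A 1 that is immediately followed by another 1 gets the `invalid` label.
    Every other row gets the running count of the remaining 1s seen so far,
    which is what the cumulative sum computes.
    """
    labels = []
    group = 0
    for i, flag in enumerate(occurrence):
        next_flag = occurrence[i + 1] if i + 1 < len(occurrence) else None
        if flag == 1 and next_flag == 1:
            labels.append(invalid)   # masked row: not counted in the sum
        else:
            group += flag            # a valid 1 opens the next group
            labels.append(group)
    return labels


occurrence = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
print(label_groups(occurrence))
# prints [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 100, 4, 4, 4, 4]
```

As with the cumulative sum, any zeroes before the first 1 would end up in group 0.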
When I run a query through Spark SQL (HiveContext), it fails with the error below.
Which part of the syntax causes this?
I am using Spark 1.6 and Hive 1.2.
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Unsupported language features in query: SELECT i.*,
from_unixtime(unix_timestamp('20170221','yyyyMMdd'),"yyyy-MM-dd'T'HH:mm:ssZ") bounce_date
FROM
(SELECT country,
device_id,
os_name,
app_ver
FROM jpl_band_orc
WHERE yyyymmdd='20170221'
AND scene_id='app_intro'
AND action_id='scene_enter'
AND classifier='app_intro'
GROUP BY country, device_id, os_name, app_ver ) i
LEFT JOIN
(SELECT device_id
FROM jpl_band_orc
WHERE yyyymmdd='20170221'
AND scene_id='band_list'
AND action_id='scene_enter'
AND device_id IN
(SELECT DISTINCT device_id
FROM jpl_band_orc x
WHERE yyyymmdd='20170221'
AND scene_id='app_intro'
AND action_id='scene_enter'
AND classifier='app_intro' ) ) s
ON i.device_id = s.device_id
WHERE s.device_id is null
TOK_QUERY 8, 0,425, 10
TOK_FROM 8, 28,412, 10
TOK_LEFTOUTERJOIN 8, 36,412, 10
TOK_SUBQUERY 8, 36,186, 10
TOK_QUERY 8, 37,182, 10
TOK_FROM 8, 91,93, 10
TOK_TABREF 8, 93,93, 10
TOK_TABNAME 8, 93,93, 10
jpl_band_orc 8, 93,93, 10
TOK_INSERT 0, -1,182, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 4, 37,84, 13
TOK_SELEXPR 4, 39,39, 13
TOK_TABLE_OR_COL 4, 39,39, 13
country 4, 39,39, 13
TOK_SELEXPR 5, 54,54, 12
TOK_TABLE_OR_COL 5, 54,54, 12
device_id 5, 54,54, 12
TOK_SELEXPR 6, 69,69, 12
TOK_TABLE_OR_COL 6, 69,69, 12
os_name 6, 69,69, 12
TOK_SELEXPR 7, 84,84, 12
TOK_TABLE_OR_COL 7, 84,84, 12
app_ver 7, 84,84, 12
TOK_WHERE 12, 100,161, 13
AND 12, 102,161, 13
AND 11, 138,138, 13
AND 10, 119,119, 13
= 9, 102,104, 19
TOK_TABLE_OR_COL 9, 102,102, 11
yyyymmdd 9, 102,102, 11
'20170221' 9, 104,104, 20
= 10, 121,123, 25
TOK_TABLE_OR_COL 10, 121,121, 17
scene_id 10, 121,121, 17
'app_intro' 10, 123,123, 26
= 11, 140,142, 26
TOK_TABLE_OR_COL 11, 140,140, 17
action_id 11, 140,140, 17
'scene_enter' 11, 142,142, 27
= 12, 159,161, 27
TOK_TABLE_OR_COL 12, 159,159, 17
classifier 12, 159,159, 17
'app_intro' 12, 161,161, 28
TOK_GROUPBY 13, 168,182, 15
TOK_TABLE_OR_COL 13, 173,173, 15
country 13, 173,173, 15
TOK_TABLE_OR_COL 13, 176,176, 24
device_id 13, 176,176, 24
TOK_TABLE_OR_COL 13, 179,179, 35
os_name 13, 179,179, 35
TOK_TABLE_OR_COL 13, 182,182, 44
app_ver 13, 182,182, 44
i 13, 186,186, 54
TOK_SUBQUERY 16, 201,391, 10
TOK_QUERY 16, 202,387, 10
TOK_FROM 16, 211,213, 10
TOK_TABREF 16, 213,213, 10
TOK_TABNAME 16, 213,213, 10
jpl_band_orc 16, 213,213, 10
TOK_INSERT 0, -1,387, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 15, 202,204, 13
TOK_SELEXPR 15, 204,204, 13
TOK_TABLE_OR_COL 15, 204,204, 13
device_id 15, 204,204, 13
TOK_WHERE 20, 220,387, 13
AND 20, 222,387, 13
AND 19, 258,258, 13
AND 18, 239,239, 13
= 17, 222,224, 19
TOK_TABLE_OR_COL 17, 222,222, 11
yyyymmdd 17, 222,222, 11
'20170221' 17, 224,224, 20
= 18, 241,243, 25
TOK_TABLE_OR_COL 18, 241,241, 17
scene_id 18, 241,241, 17
'band_list' 18, 243,243, 26
= 19, 260,262, 26
TOK_TABLE_OR_COL 19, 260,260, 17
action_id 19, 260,260, 17
'scene_enter' 19, 262,262, 27
TOK_SUBQUERY_EXPR 20, 279,387, 27
TOK_SUBQUERY_OP 20, 281,281, 27
IN 20, 281,281, 27
TOK_QUERY 22, 291,387, 12
TOK_FROM 22, 305,309, 12
TOK_TABREF 22, 307,309, 12
TOK_TABNAME 22, 307,307, 12
jpl_band_orc 22, 307,307, 12
x 22, 309,309, 25
TOK_INSERT 0, -1,385, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECTDI 21, 292,296, 24
TOK_SELEXPR 21, 296,296, 24
TOK_TABLE_OR_COL 21, 296,296, 24
device_id 21, 296,296, 24
TOK_WHERE 26, 318,385, 15
AND 26, 320,385, 15
AND 25, 360,360, 15
AND 24, 339,339, 15
= 23, 320,322, 21
TOK_TABLE_OR_COL 23, 320,320, 13
yyyymmdd 23, 320,320, 13
'20170221' 23, 322,322, 22
= 24, 341,343, 27
TOK_TABLE_OR_COL 24, 341,341, 19
scene_id 24, 341,341, 19
'app_intro' 24, 343,343, 28
= 25, 362,364, 28
TOK_TABLE_OR_COL 25, 362,362, 19
action_id 25, 362,362, 19
'scene_enter' 25, 364,364, 29
= 26, 383,385, 29
TOK_TABLE_OR_COL 26, 383,383, 19
classifier 26, 383,383, 19
'app_intro' 26, 385,385, 30
TOK_TABLE_OR_COL 20, 279,279, 17
device_id 20, 279,279, 17
s 26, 391,391, 46
= 27, 404,412, 24
. 27, 404,406, 13
TOK_TABLE_OR_COL 27, 404,404, 12
i 27, 404,404, 12
device_id 27, 406,406, 14
. 27, 410,412, 27
TOK_TABLE_OR_COL 27, 410,410, 26
s 27, 410,410, 26
device_id 27, 412,412, 28
TOK_INSERT 0, -1,425, 0
TOK_DESTINATION 0, -1,-1, 0
TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 1, 0,23, 7
TOK_SELEXPR 1, 2,4, 7
TOK_ALLCOLREF 1, 2,4, 7
TOK_TABNAME 1, 2,2, 7
i 1, 2,2, 7
TOK_SELEXPR 2, 11,23, 3
TOK_FUNCTION 2, 11,21, 3
from_unixtime 2, 11,11, 3
TOK_FUNCTION 2, 13,18, 17
unix_timestamp 2, 13,13, 17
'20170221' 2, 15,15, 32
'yyyyMMdd' 2, 17,17, 43
"yyyy-MM-dd'T'HH:mm:ssZ" 2, 20,20, 55
bounce_date 2, 23,23, 81
TOK_WHERE 0, 417,425, 0
TOK_FUNCTION 0, 419,425, 0
TOK_ISNULL 0, 425,425, 0
. 28, 419,421, 10
TOK_TABLE_OR_COL 28, 419,419, 9
s 28, 419,419, 9
device_id 28, 421,421, 11
scala.NotImplementedError: No parse rules for ASTNode type: 864, text: TOK_SUBQUERY_EXPR :
" +
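A subquery in the WHERE clause (here, device_id IN (SELECT DISTINCT device_id ...)) is not supported in Spark 1.6, which is why the parser has no rule for TOK_SUBQUERY_EXPR. It is supported from Spark 2.0 onward.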
Reference:
https://issues.apache.org/jira/browse/SPARK-4226
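If upgrading is not an option, one possible workaround is to express the IN filter as a LEFT SEMI JOIN, which is ordinary HiveQL join syntax rather than a predicate subquery. The sketch below only builds the rewritten query as a Python string; it has not been run against Spark 1.6, and the names REWRITTEN_QUERY and run_query, as well as the table aliases b and x, are assumptions made for illustration.

```python
# Same query as in the question, with the WHERE ... IN (SELECT ...) filter
# replaced by a LEFT SEMI JOIN against the same inner SELECT.
REWRITTEN_QUERY = """
SELECT i.*,
       from_unixtime(unix_timestamp('20170221','yyyyMMdd'),"yyyy-MM-dd'T'HH:mm:ssZ") bounce_date
FROM
  (SELECT country, device_id, os_name, app_ver
   FROM jpl_band_orc
   WHERE yyyymmdd='20170221'
     AND scene_id='app_intro'
     AND action_id='scene_enter'
     AND classifier='app_intro'
   GROUP BY country, device_id, os_name, app_ver) i
LEFT JOIN
  (SELECT b.device_id
   FROM jpl_band_orc b
   LEFT SEMI JOIN
     (SELECT DISTINCT device_id
      FROM jpl_band_orc
      WHERE yyyymmdd='20170221'
        AND scene_id='app_intro'
        AND action_id='scene_enter'
        AND classifier='app_intro') x
   ON b.device_id = x.device_id
   WHERE b.yyyymmdd='20170221'
     AND b.scene_id='band_list'
     AND b.action_id='scene_enter') s
ON i.device_id = s.device_id
WHERE s.device_id IS NULL
"""


def run_query(sql_context, query=REWRITTEN_QUERY):
    """Run the query on an existing context object.

    sql_context is assumed to be an already constructed HiveContext
    (anything with a .sql(str) method works for this sketch).
    """
    return sql_context.sql(query)
```

With a live context this would be called as run_query(sqlContext). The semi join keeps only rows of b whose device_id appears in x, which is the same filtering the IN subquery was meant to do.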