I am using the configuration file below:
"trial":{
    "stage_table": "trial_stg",
    "folder_location": "Trial",
    "column_mapping": [
        {
            "source_column": "(split(INPUT__FILE__NAME, '\\/')[11])",
            "source_datatype": "text",
            "target_column": "indication",
            "target_datatype": "text",
            "transform_type": "expression",
            "validate": false
        }
I am trying to get the name of the input file using INPUT__FILE__NAME in PySpark, but it fails.
Below is the code that runs after this config file has been read:
def query_expression_builder(mapping):
    print("Inside query_expression_builder")
    print("mapping :", mapping)

    def match_transform_type(map_col):
        print("Inside match_transform_type")
        if map_col.get('transform_type') is None:
            print("transform_type is", map_col.get('transform_type'))
            print("map_col inside if :", map_col)
            return f"`{map_col['source_column']}` AS {map_col['target_column']}"
        elif str(map_col.get('transform_type')) == 'expression':
            print("transform_type is", map_col.get('transform_type'))
            print("map_col inside elif :", map_col)
            return f"{map_col['source_column']} AS {map_col['target_column']}"
        else:
            print("transform_type is", map_col.get('transform_type'))
            print("map_col inside else :", map_col)
            return f"`{map_col['source_column']}` AS {map_col['target_column']}"

    if mapping is None:
        print("Check for mapping is None")
        return []
    else:
        print("Mapping is not None")
        return list(map(lambda col_mapping: match_transform_type(map_col=col_mapping), mapping))

def main():
    query = query_expression_builder(
        mapping=config['file_table_mapping'][tbl]['column_mapping'])
    print(f"Table = {tbl} Executing query {query}")
    file_path = f"s3://{config['raw_bucket']}/{config['landing_directory']}/{config['file_table_mapping'][tbl]['folder_location']}/{config_audit['watermark_timestamp']}*.csv"
    write_df = spark.read.csv(path=file_path, header=True, inferSchema=False) \
        .selectExpr(query) \
        .withColumn("prcs_run_id", func.lit(config_audit['prcs_run_id'])) \
        .withColumn("job_run_id", func.lit(config_audit['job_run_id'])) \
        .withColumn("ins_ts", func.lit(ins_ts)) \
        .withColumn("rec_crt_user", func.lit(config["username"]))
    write_df.show()
This is the error I get:
"cannot resolve '`INPUT__FILE__NAME`' given input columns: [Pediatric Patients included (Y/N), Trial registry number, Number of patients, Sponsor, Number of treatment arms, Multicenter study, Trial Conclusion, Clinical Phase, Study Population, Country Codes, Exclusion criteria, Trial ID, Trial AcronymDerived, Comments, Countries, Trial registry name, Sample size calculation details, Randomisation, Blinding, Trial Comments, Trial start year, Trial end year, Inclusion criteria, Study treatment, Trial design, Controlled trial, Trial Acronym, Trial Control, Asymptomatic patients, Analysis method details]; line 1 pos 7;\n'Project ['split('INPUT__FILE__NAME, /)[11] AS indication#4346, Trial ID#4286 AS trial_id#4347, Trial Acronym#4287 AS trial_acronym#4348, Trial AcronymDerived#4288 AS trial_acronym_derived#4349, Sponsor#4289 AS sponsor#4350, Asymptomatic patients#4290 AS asymptomatic_patients#4351, Pediatric Patients included (Y/N)#4291 AS pediatric_patients_included#4352, Number of patients#4292 AS num_of_patients#4353, Number of treatment arms#4293 AS num_of_treatment_arms#4354, Trial start year#4294 AS trial_strt_yr#4355, Trial end year#4295 AS trial_end_yr#4356, Clinical Phase#4296 AS clinical_phase#4357, Study Population#4297 AS study_population#4358, Study treatment#4298 AS study_treatment#4359, Randomisation#4299 AS randomization#4360, Controlled trial#4300 AS controlled_trial#4361, Trial Control#4301 AS trial_control#4362, Blinding#4302 AS blinding#4363, Trial registry name#4303 AS trial_registry_name#4364, Trial registry number#4304 AS trial_registry_num#4365, Countries#4305 AS countries#4366, Country Codes#4306 AS country_codes#4367, Trial design#4307 AS trial_design#4368, Multicenter study#4308 AS multicenter_study#4369, ... 
7 more fields]\n+- Relation[Trial ID#4286,Trial Acronym#4287,Trial AcronymDerived#4288,Sponsor#4289,Asymptomatic patients#4290,Pediatric Patients included (Y/N)#4291,Number of patients#4292,Number of treatment arms#4293,Trial start year#4294,Trial end year#4295,Clinical Phase#4296,Study Population#4297,Study treatment#4298,Randomisation#4299,Controlled trial#4300,Trial Control#4301,Blinding#4302,Trial registry name#4303,Trial registry number#4304,Countries#4305,Country Codes#4306,Trial design#4307,Multicenter study#4308,Inclusion criteria#4309,... 6 more fields] csv\n"
Traceback (most recent call last):
File "/mnt/yarn/usercache/root/appcache/application_1594568207850_0001/container_1594568207850_0001_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/mnt/yarn/usercache/root/appcache/application_1594568207850_0001/container_1594568207850_0001_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o529.selectExpr.
: org.apache.spark.sql.AnalysisException: cannot resolve '`INPUT__FILE__NAME`' given input columns:
How can I use INPUT__FILE__NAME? I have already enabled Hive support in my code. Or is there another way to do this? I cannot find anything online about how to use it.
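Try Spark's input_file_name() function, written with single underscores and called as a function, instead of INPUT__FILE__NAME with double underscores. As the error shows, INPUT__FILE__NAME is not resolved as a column of a DataFrame read from CSV.
Example:
from pyspark.sql.functions import input_file_name
spark.sql("select *, input_file_name() from tmp")
# or
df.withColumn("filename", input_file_name()).show()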
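Applied to the configuration from the question, the fix might look like the sketch below. It is a trimmed-down, hypothetical version of the mapping and builder (build_select_exprs and the two-entry column_mapping are illustrative names, not the asker's full code), and it only builds the expression strings, so it runs without Spark. The index 11 is kept from the question and depends on how deep the S3 path is.

```python
# Hypothetical, trimmed-down version of the mapping from the question.
# The only change is the source_column expression: INPUT__FILE__NAME is
# replaced by a call to Spark's input_file_name() function.
column_mapping = [
    {
        "source_column": "split(input_file_name(), '/')[11]",
        "target_column": "indication",
        "transform_type": "expression",
    },
    {
        "source_column": "Trial ID",
        "target_column": "trial_id",
    },
]


def build_select_exprs(mapping):
    """Turn mapping entries into strings suitable for DataFrame.selectExpr."""
    exprs = []
    for col in mapping or []:
        if col.get("transform_type") == "expression":
            # Expressions are passed through untouched.
            exprs.append(f"{col['source_column']} AS {col['target_column']}")
        else:
            # Plain column names are back-quoted so names with spaces work.
            exprs.append(f"`{col['source_column']}` AS {col['target_column']}")
    return exprs


select_exprs = build_select_exprs(column_mapping)
for expr in select_exprs:
    print(expr)
# With a SparkSession available, these strings could then be used as:
#   spark.read.csv(path=file_path, header=True).selectExpr(*select_exprs)
```

Keeping the function call inside the config string means the rest of the pipeline does not have to change; only the expression text does.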
I'm getting the following OpenCV-Python error while running a face recognition module in Python 3.8.2:
cv2.error: OpenCV(4.2.0) /io/opencv/modules/imgproc/src/demosaicing.cpp:1721: error: (-215:Assertion failed) scn == 1 && (dcn == 3 || dcn == 4) in function 'demosaicing'
Could someone explain the cause of this error and the solution to it?
Here is the code:
known_faces = []
known_names = []
for name in os.listdir(KNOWN_FACES_DIR):
    for filename in os.listdir(f"{KNOWN_FACES_DIR}/{name}"):
        image = face_recognition.load_image_file(f"{KNOWN_FACES_DIR}/{name}/{filename}")
        encoding = face_recognition.face_encodings(image)[0]
        known_faces.append(encoding)
        known_names.append(name)
print("processing unknown faces!")
for filename in os.listdir(UNKNOWN_FACES_DIR):
    print(filename)
    image = face_recognition.load_image_file(f"{UNKNOWN_FACES_DIR}/{filename}")
    locations = face_recognition.face_locations(image, model=MODEL)
    encodings = face_recognition.face_encodings(image, locations)
    image = cv2.cvtColor(image, cv2.COLOR_BAYER_BG2BGR)
I did a bit of testing and searching. I think the error is caused by the format of the image data that I pass to cv2.cvtColor.
I found this definition on Wikipedia:
A demosaicing (also de-mosaicing, demosaicking or debayering) algorithm is a digital image process used to reconstruct a full color image from the incomplete color samples output from an image sensor overlaid with a color filter array (CFA). It is also known as CFA interpolation or color reconstruction.
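I tried changing the code, but to no avail. After reading the definition, I think the input is the problem: the assertion scn == 1 means the conversion expects a single-channel source image (raw sensor data), whereas a picture loaded the usual way has three color channels, so COLOR_BAYER_BG2BGR is the wrong conversion for it.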
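To make the channel reasoning concrete, here is a minimal sketch. It assumes the image comes from face_recognition.load_image_file, which returns an RGB array of shape (height, width, 3), and that any single-channel array is raw Bayer BG data. pick_conversion is a hypothetical helper that only inspects the array shape and returns the name of a cv2 flag, so it runs without OpenCV installed.

```python
def pick_conversion(shape):
    """Return the name of a cv2 color-conversion flag for an array shape.

    Hypothetical helper for illustration; it only looks at the shape.
    """
    if len(shape) == 2 or (len(shape) == 3 and shape[2] == 1):
        # Single-channel data: the only input demosaicing accepts (scn == 1).
        # Assumption: such data is a raw Bayer BG mosaic.
        return "COLOR_BAYER_BG2BGR"
    if len(shape) == 3 and shape[2] == 3:
        # Three-channel RGB, as returned by face_recognition.load_image_file.
        return "COLOR_RGB2BGR"
    raise ValueError(f"unsupported image shape: {shape}")


print(pick_conversion((480, 640, 3)))
print(pick_conversion((480, 640)))
# With OpenCV installed, the flag could then be applied as:
#   image = cv2.cvtColor(image, getattr(cv2, pick_conversion(image.shape)))
```

For the code in the question this suggests replacing cv2.COLOR_BAYER_BG2BGR with cv2.COLOR_RGB2BGR.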
I am not sure why I keep on receiving this error. Any help please?
students = ['Jacob', 'Joseph', 'Tony']
for student in students:
    print(student)
students = ['Jacob', 'Joseph', 'Tony']
for student in students:
    print(magician.title() + ", you got an amazing score on you exam!"
Then it says "Syntax Error: unexpected EOF while parsing" on line 9, but there isn't even a line 9. I have no idea why this keeps on occurring.
You didn't close the parenthesis of the print call. Also note that the loop variable is student, not magician, so use:
    print(student.title() + ", you got an amazing score on you exam!")
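Why the error points past the end of the file: with a parenthesis still open, the parser keeps reading in search of the closing one and only gives up when the input runs out. A small sketch using the built-in compile() shows this without running the loop. The exact message and line number depend on the Python version (newer releases report that '(' was never closed), so the sketch prints them rather than assuming them; syntax_error_of is a hypothetical helper name.

```python
# The question's code, verbatim, with the closing parenthesis missing.
broken = '''students = ['Jacob', 'Joseph', 'Tony']
for student in students:
    print(magician.title() + ", you got an amazing score on you exam!"
'''
# The same code with the parenthesis closed.
fixed = broken.rstrip("\n") + ")\n"


def syntax_error_of(source):
    """Return the SyntaxError raised when compiling source, or None."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        return exc
    return None


err = syntax_error_of(broken)
print(type(err).__name__, "-", err.msg, "- reported at line", err.lineno)
print("fixed version:", syntax_error_of(fixed))
```

compile() only parses the code, so the undefined name magician does not matter here; it would surface as a NameError once the loop actually runs.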
I am facing an issue when executing the following Groovy script snippet.
GroovyShell sh = new GroovyShell();
sh.evaluate("\"abcd\".length() >= .34");
I am getting the following exceptions. The entire stack trace is mentioned below.
Exception in thread "main" org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script1.groovy: 1: unexpected token: >= @ line 1, column 17.
"abcd".length() >= .34d
If I change .34 to 0.34, it works. However, because of some limitations, I am not able to change the script content.
Any help in overcoming this will be appreciated.
I am getting the following exceptions
Exception in thread "main" org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script1.groovy: 1: unexpected token: >= @ line 1, column 17.
"abcd".length() >= .34d
^
1 error
at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:310)
at org.codehaus.groovy.control.ErrorCollector.addFatalError(ErrorCollector.java:150)
at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:120)
at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:132)
at org.codehaus.groovy.control.SourceUnit.addError(SourceUnit.java:350)
at org.codehaus.groovy.antlr.AntlrParserPlugin.transformCSTIntoAST(AntlrParserPlugin.java:144)
at org.codehaus.groovy.antlr.AntlrParserPlugin.parseCST(AntlrParserPlugin.java:110)
at org.codehaus.groovy.control.SourceUnit.parse(SourceUnit.java:234)
at org.codehaus.groovy.control.CompilationUnit$1.call(CompilationUnit.java:168)
at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:943)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:605)
at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:581)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:558)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:298)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:268)
at groovy.lang.GroovyShell.parseClass(GroovyShell.java:688)
at groovy.lang.GroovyShell.parse(GroovyShell.java:700)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:584)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:623)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:594)
at groovytest.Testtest.main(Testtest.java:18)
Your Groovy snippet is incorrect: Groovy does not support decimal literals without a leading zero (such as .34) for numbers smaller than 1.0. If you try to compile the following expression directly with groovyc:
"abcd".length() >= .34
compilation fails with an error like:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
test.groovy: 2: Unexpected input: '.' @ line 2, column 20.
"abcd".length() >= .34
^
1 error
Java supports this notation; however, Groovy from 2.x up to 3.0.0-alpha-3 does not.
Solid solution
Fix the input Groovy snippet so that it contains only valid, compilable code. Any invalid Groovy statement or expression will lead to compilation errors.
Workaround: add leading zeros with replaceAll() method
The only way to compile such an incorrect snippet is to preprocess it and replace every match of ' \.(\d+)' (a space, then a dot, then one or more digits) with ' 0.$1', where $1 is the captured run of digits. Consider the following example:
def snippet = "\"abcd\".length() >= .34; \"efgh\".length() >= .22; \"xyz\".length() >= 0.11;"
println snippet.replaceAll(' \\.(\\d+)', ' 0.$1')
It adds a 0 to every decimal number that is missing its leading zero. Running this example prints the following output to the console:
"abcd".length() >= 0.34; "efgh".length() >= 0.22; "xyz".length() >= 0.11;
If you pass the modified snippet to the GroovyShell.evaluate() method, it runs with no errors.
Of course, this is not a rock-solid solution; it is just a way to automatically fix one kind of syntax error in the code snippet. There are corner cases where this workaround causes side effects, and you have to be aware of them.
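To show what such corner cases can look like, here is the same substitution sketched in Python with re.sub (the regular expression is identical; only the replacement syntax differs, \1 instead of $1). add_leading_zeros is a hypothetical helper name, and the two extra inputs are made-up examples rather than anything from the original script.

```python
import re

snippet = '"abcd".length() >= .34; "efgh".length() >= .22; "xyz".length() >= 0.11;'


def add_leading_zeros(source):
    """Insert a 0 before every ' .<digits>' (space, dot, digits)."""
    return re.sub(r" \.(\d+)", r" 0.\1", source)


fixed = add_leading_zeros(snippet)
print(fixed)

# Corner case 1: text inside a string literal is rewritten as well.
side_effect = add_leading_zeros('println "ratio is .5"')
print(side_effect)

# Corner case 2: a literal that is not preceded by a space is left untouched,
# so this snippet would still fail to compile.
missed = add_leading_zeros('"abcd".length() >=.34')
print(missed)
```

The first corner case silently changes program output, and the second leaves the original compilation error in place, which is why fixing the input snippets themselves remains the safer route.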
I have a syntax problem in COBOL. I'm using the open-cobol package on Ubuntu 4.2.0-16-generic, and I get this error:
~/cobol$ cobc -free -x -o cal cal.cbl
cal.cbl:6: Error: syntax error, unexpected $undefined, expecting "end of file"
My cal.cbl file:
IDENTIFICATION DIVISION.
PROGRAM-ID. cal.
ENVIRONMENT DIVISION.
DATA DIVISION.
?? OPTION PIC 9 VALUE ZERO.
?? NUM1 PIC 9(5)V9(2) VALUE ZERO.
?? NUM2 PIC 9(5)V9(2) VALUE ZERO.
?? RESULT PIC 9(10)V9(2) VALUE ZERO.
PROCEDURE DIVISION.
ACCEPT OPTION.
DISPLAY "INSERT FIRST OPTION".
ACCEPT NUM1.
DISPLAY "INSERT SECOND OPTION".
ACCEPT NUM2.
STOP RUN.
I'm new to COBOL. I know a little about its column rules, which is why I compile with the -free flag, but this error makes no sense to me.
Why does this error occur? Please help :)
?? is neither a valid COBOL word nor a level number (a level number is required at the start of line 6). These messages come from OpenCOBOL/GnuCOBOL 1.1.
Newer GnuCOBOL versions are much better in many ways, including their error messages (here with GC 2.2):
cal.cob: 6: Error: Invalid symbol: ? - Skipping word
cal.cob: 6: Error: PROCEDURE DIVISION header missing
cal.cob: 6: Error: syntax error, unexpected Identifier
cal.cob: 7: Error: Invalid symbol: ? - Skipping word
cal.cob: 7: Error: syntax error, unexpected Identifier
cal.cob: 8: Error: Invalid symbol: ? - Skipping word
cal.cob: 8: Error: syntax error, unexpected Identifier
cal.cob: 9: Error: Invalid symbol: ? - Skipping word
cal.cob: 9: Error: syntax error, unexpected Identifier
cal.cob: 11: Error: syntax error, unexpected PROCEDURE
cal.cob: 12: Error: 'OPTION' is not defined
cal.cob: 15: Error: 'NUM1' is not defined
cal.cob: 17: Error: 'NUM2' is not defined
Change ?? to 01 or 77 and that error goes away. Then insert WORKING-STORAGE SECTION or LOCAL-STORAGE SECTION after DATA DIVISION and your program compiles fine.
Read the Programmer's Guide to learn more about COBOL.