Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs: - apache-spark

I want to run a process with Bash. I uploaded the jar file, and I have checked my .conf and shell files. This process is meant to write to a database table, not to an HDFS path.
I don't know whether I need to add an extra HDFS path in the code before generating the jar. Below is part of the exception:
22/09/07 21:50:42 INFO yarn.Client: Deleted staging directory hdfs://nn/user/srv_remozo_equip/.sparkStaging/application_1661633254168_93772
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://nn:8020/applications/recup_remozo_equipos/Logistica_Carga_Input_SimpleData/logistica_carga_input_simpledata_2.11-0.1.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1742)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1757)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:386)
I don't know where the exception is coming from, or whether it is related to the .conf and shell files.
Below is part of the code (some paths and names are in Spanish) that contains the logic and writes to the database.
// ================= START OF PROCESS LOGIC =================
// Create the df from the daily input file at the indicated path
val df_csv = spark.read.format("csv")
  .option("header", "true")
  .option("sep", ";")
  .option("mode", "dropmalformed")
  .load("/applications/recup_remozo_equipos/equipos_por_recuperar/output/agendamientos_sin_pet_2")

val df_final = df_csv.select(
  $"RutSinDV".as("RUT_SIN_DV"),
  $"dv".as("DV"),
  $"Agendado".as("AGENDADO"),
  to_date(col("Dia_Agendado"), "yyyyMMdd").as("DIA_AGENDADO"),
  $"Horario_Agendado".as("HORARIO_AGENDADO"),
  $"Nombre_Agendamiento".as("NOMBRE_AGENDAMIENTO"),
  $"Telefono_Agendamiento".as("TELEFONO_AGENDAMIENTO"),
  $"Email".substr(0, 49).as("EMAIL"),
  $"Region_Agendamiento".substr(0, 29).as("REGION_AGENDAMIENTO"),
  $"Comuna_Agendamiento".as("COMUNA_AGENDAMIENTO"),
  $"Direccion_Agendamiento".as("DIRECCION_AGENDAMIENTO"),
  $"Numero_Agendamiento".substr(0, 5).as("NUMERO_AGENDAMIENTO"),
  $"Depto_Agendamiento".substr(0, 9).as("DEPTO_AGENDAMIENTO"),
  to_timestamp(col("fecha_registro")).as("FECHA_REGISTRO"),
  to_timestamp(col("Fecha_Proceso")).as("FECHA_PROCESO")
)
// ================== END OF PROCESS LOGIC ==================
// Cleanup in EXADATA
println("[INFO] Se inicia la limpieza por reproceso en EXADATA")
val query_particiones = "(SELECT * FROM (WITH DATA AS (select table_name,partition_name,to_date(trim('''' " +
  "from regexp_substr(extractvalue(dbms_xmlgen.getxmltype('select high_value from all_tab_partitions " +
  "where table_name='''|| table_name|| ''' and table_owner = '''|| table_owner|| ''' and partition_name = '''" +
  "|| partition_name|| ''''),'//text()'),'''.*?''')),'syyyy-mm-dd hh24:mi:ss') high_value_in_date_format " +
  "FROM all_tab_partitions WHERE table_name = '" + table_name + "' AND table_owner = '" + table_owner + "')" +
  "SELECT partition_name FROM DATA WHERE high_value_in_date_format > DATE '" + startDateYear + "-" + startDateMonth + "-" + startDateDay + "' " +
  "AND high_value_in_date_format <= DATE '" + endDateYear + "-" + endDateMonth + "-" + endDateDay + "') A)"

Class.forName(driver_jdbc)
val db = DriverManager.getConnection(url_jdbc, user_jdbc, pass_jdbc)
val st = db.createStatement()
try {
  val consultaParticiones = spark.read.format("jdbc")
    .option("url", url_jdbc)
    .option("driver", driver_jdbc)
    .option("dbTable", query_particiones)
    .option("user", user_jdbc)
    .option("password", pass_jdbc)
    .load()
    .collect()
  for (partition <- consultaParticiones) {
    st.executeUpdate("call " + table_owner + ".DO_THE_TRUNCATE_PARTITION('" + table + "','" + partition.getString(0) + "')")
  }
} catch {
  case e: Exception =>
    println("[ERROR TRUNCATE] " + e)
}
st.close()
db.close()
println("[INFO] Se inicia la inserción en EXADATA")
df_final.filter($"DIA_AGENDADO" >= "2022-08-01")
.repartition(repartition).write.mode("append")
.jdbc(url_jdbc, table, utils.jdbcProperties(driver_jdbc, user_jdbc, pass_jdbc))
println("[INFO] Inserción en EXADATA completada con éxito")
println("[INFO] Proceso Logistica Carga Input SimpleData")

Related

How to use placeholder in spanner

My Query:
Select * from (Select id,name, salary from Emp ORDER BY %s %s) AS AL
BETWEEN OFFSET :OFFSET AND LIMIT: LIMIT
%s, %s represent created and ASC.
It is not working in Spanner.
How can I implement this query in Spanner from the Java side?
It seems your query has a syntax error: it is missing a WHERE <column name> before the BETWEEN.
With the query fixed, you could do something like this in the Java client:
try (ResultSet rs = databaseClient
    .singleUse()
    .executeQuery(Statement
        .newBuilder(String.format("SELECT *"
            + " FROM ("
            + "   SELECT id, name, salary FROM Emp ORDER BY %s %s"
            + " ) AS e"
            + " WHERE e.id BETWEEN @offset AND @limit",
            "id", "ASC"))
        .bind("offset")
        .to(1L)
        .bind("limit")
        .to(10L)
        .build())) {
  while (rs.next()) {
    System.out.println(rs.getLong("id") + ", " + rs.getString("name") + ", " + rs.getBigDecimal("salary"));
  }
}

cosmos db cannot query using IN, spring data native query and array or collection(Java)

I am trying to create (using a Spring native query) the findAllById method for the reactive Spring Data Cosmos DB repository, since it is not implemented for ReactiveCosmosRepository.
@Query(value = " SELECT *\n" +
    " FROM container_name km\n" +
    " WHERE km.id IN (@ids) \n" +
    " ORDER BY km.createdDate DESC ")
Flux<ContainerData> findAllById(@Param("ids") String[] ids);
or even
@Query(value = " SELECT *\n" +
    " FROM container_name km\n" +
    " WHERE km.id IN (@ids) \n" +
    " ORDER BY km.createdDate DESC ")
Flux<ContainerData> findAllById(@Param("ids") Iterable<String> ids);
but it is not retrieving any results, and it is not throwing any exception either.
So the question is: how do I use the IN operator with a Spring Data native query and a collection or array in Cosmos DB, out of the box and without a workaround?
You should use array_contains
@Query(value = " SELECT *\n" +
    " FROM container_name km\n" +
    " WHERE array_contains(@ids, km.id, true) \n" +
    " ORDER BY km.createdDate DESC ")
Flux<ContainerData> findAllById(@Param("ids") Iterable<String> ids);
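For illustration, a hypothetical call to the repository method above; the repository field name and the IDs are placeholders, not from the original post:
// Hypothetical usage; containerDataRepository is an injected bean that declares the findAllById method above.
Flux<ContainerData> results =
    containerDataRepository.findAllById(Arrays.asList("id-1", "id-2", "id-3"));
results.subscribe(item -> System.out.println(item));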

converting sql query to equivalent spark query

I am using spark-sql-2.4.1v with Java 8.
I have a scenario/snippet like the one below:
Dataset<Row> df = ...; // loaded data from a csv file
// it has columns like "code1","code2","code3","code4","code5","code6", and "class"
df.createOrReplaceTempView("temp_tab");

List<String> codesList = Arrays.asList("code1", "code5"); // codes of interest to be calculated
codesList.stream().forEach(code -> {
    String query = "select "
        + " avg(" + code + ") as mean, "
        + " percentile(" + code + ", 0.25) as p25 "
        + " from temp_tab "
        + " group by class";
    Dataset<Row> resultDs = sparkSession.sql(query);
});
How can this be written using functions.expr() and functions.agg()?
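No answer is included in this snippet; as a rough sketch of my own (not from the thread), the same aggregation could be expressed with the DataFrame API along these lines, assuming the column names above:
import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.expr;

import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Sketch only: "df" is the DataFrame loaded from the CSV file above.
List<String> codesList = Arrays.asList("code1", "code5");
for (String code : codesList) {
    Dataset<Row> resultDs = df.groupBy("class")
        .agg(avg(code).alias("mean"),
             expr("percentile(" + code + ", 0.25)").alias("p25"));
    resultDs.show();
}
percentile goes through expr() here because, as far as I know, Spark 2.4 has no dedicated percentile function in org.apache.spark.sql.functions.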

Vertica python client export to s3 parquet

I am trying to export data from Vertica to S3 parquet with the Vertica Python client. The command works great from vsql, and all other commands work fine for me through the Python client. Only the specific one below doesn't want to work; it gives exceptions.
query = "EXPORT TO PARQUET(directory = 's3://medibeedb/unified/" + table_name + "/" + site_name + \
"', fileSizeMB=1024) AS SELECT '" + site_name + "' as site_name, *" \
" from medibee." + r[0] + ";" #+ i[0] + \
cur_unified.execute(query)
I prefer not to use Vertica's s3export. Many thanks in advance.

Exception: execute_cql3_query failed: out of sequence response

I tried Cassandra for storing the data but got stuck inserting huge amounts of data. My problem is this:
I am inserting data into Cassandra using an INSERT query. I have huge data stored in a file, so I am reading the file line by line and inserting into Cassandra, but I am getting this exception:
execute_cql3_query failed: out of sequence response
query = "insert into " + tableName[0]
+ "(sn,ts,dn,ma,br,re,ds,oi,lo,cl) values " + "('"
+ entry.getSn() + "'," + entry.getTs() + ",'"
+ entry.getDn() + "','" + entry.getMa() + "','"
+ entry.getBr() + "','" + entry.getRe() + "','"
+ entry.getDs() + "','" + entry.getOi() + "','"
+ entry.getLo() + "','" + entry.getCl() + "')";
Can you please help me out?
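No answer is included in this snippet. For what it's worth, the same row-by-row insert can be written with a prepared statement instead of string concatenation; the sketch below is my own (not from the post) and assumes the DataStax Java driver, with the contact point, keyspace, and table name as placeholders:
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Placeholder contact point, keyspace, and table name; "entry" stands for one parsed line of the file.
try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
     Session session = cluster.connect("my_keyspace")) {
    PreparedStatement ps = session.prepare(
        "INSERT INTO my_table (sn, ts, dn, ma, br, re, ds, oi, lo, cl) "
            + "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)");
    // Bind and execute one statement per line read from the file.
    session.execute(ps.bind(
        entry.getSn(), entry.getTs(), entry.getDn(), entry.getMa(), entry.getBr(),
        entry.getRe(), entry.getDs(), entry.getOi(), entry.getLo(), entry.getCl()));
}
Binding values this way also avoids the quoting and escaping problems that the concatenated CQL string can run into.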
