Vertica python client export to s3 parquet - python-3.x

I am trying to export data from Vertica to S3 Parquet with the Vertica Python client. The command works great from vsql, and all other commands work fine for me through the Python client. This specific one below does not work and raises exceptions.
query = "EXPORT TO PARQUET(directory = 's3://medibeedb/unified/" + table_name + "/" + site_name + \
"', fileSizeMB=1024) AS SELECT '" + site_name + "' as site_name, *" \
" from medibee." + r[0] + ";" #+ i[0] + \
cur_unified.execute(query)
I prefer not to use Vertica's s3export. Many thanks in advance.

Related

Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs:

I want to run a process with Bash. I uploaded the JAR file and have checked my .conf and shell files. In general, this process is meant to write to a database table, not to an HDFS path.
I don't know whether I need to add an additional HDFS path in the code before generating the JAR. Below is part of the exception:
22/09/07 21:50:42 INFO yarn.Client: Deleted staging directory hdfs://nn/user/srv_remozo_equip/.sparkStaging/application_1661633254168_93772
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://nn:8020/applications/recup_remozo_equipos/Logistica_Carga_Input_SimpleData/logistica_carga_input_simpledata_2.11-0.1.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1742)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1757)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:386)
I don't know where the exception is coming from, or whether it is related to the .conf and shell files.
Below I share part of the code (some paths and names are in Spanish) that contains the logic and writes to the database.
// ================= START OF PROCESS LOGIC =================
// Create a df from the daily input file at the indicated path
val df_csv = spark.read.format("csv").option("header","true").option("sep",";").option("mode","dropmalformed").load("/applications/recup_remozo_equipos/equipos_por_recuperar/output/agendamientos_sin_pet_2")
val df_final = df_csv.select($"RutSinDV".as("RUT_SIN_DV"),
  $"dv".as("DV"),
  $"Agendado".as("AGENDADO"),
  to_date(col("Dia_Agendado"), "yyyyMMdd").as("DIA_AGENDADO"),
  $"Horario_Agendado".as("HORARIO_AGENDADO"),
  $"Nombre_Agendamiento".as("NOMBRE_AGENDAMIENTO"),
  $"Telefono_Agendamiento".as("TELEFONO_AGENDAMIENTO"),
  $"Email".substr(0,49).as("EMAIL"),
  $"Region_Agendamiento".substr(0,29).as("REGION_AGENDAMIENTO"),
  $"Comuna_Agendamiento".as("COMUNA_AGENDAMIENTO"),
  $"Direccion_Agendamiento".as("DIRECCION_AGENDAMIENTO"),
  $"Numero_Agendamiento".substr(0,5).as("NUMERO_AGENDAMIENTO"),
  $"Depto_Agendamiento".substr(0,9).as("DEPTO_AGENDAMIENTO"),
  to_timestamp(col("fecha_registro")).as("FECHA_REGISTRO"),
  to_timestamp(col("Fecha_Proceso")).as("FECHA_PROCESO")
)
// ================== END OF PROCESS LOGIC ==================
// Cleanup in EXADATA
println("[INFO] Starting the cleanup for reprocessing in EXADATA")
val query_particiones = "(SELECT * FROM (WITH DATA AS (select table_name,partition_name,to_date(trim('''' " +
"from regexp_substr(extractvalue(dbms_xmlgen.getxmltype('select high_value from all_tab_partitions " +
"where table_name='''|| table_name|| ''' and table_owner = '''|| table_owner|| ''' and partition_name = '''" +
"|| partition_name|| ''''),'//text()'),'''.*?''')),'syyyy-mm-dd hh24:mi:ss') high_value_in_date_format " +
"FROM all_tab_partitions WHERE table_name = '" + table_name + "' AND table_owner = '" + table_owner + "')" +
"SELECT partition_name FROM DATA WHERE high_value_in_date_format > DATE '" + startDateYear + "-" + startDateMonth + "-" + startDateDay + "' " +
"AND high_value_in_date_format <= DATE '" + endDateYear + "-" + endDateMonth + "-" + endDateDay + "') A)"
Class.forName(driver_jdbc)
val db = DriverManager.getConnection(url_jdbc, user_jdbc, pass_jdbc)
val st = db.createStatement()
try {
  val consultaParticiones = spark.read.format("jdbc")
    .option("url", url_jdbc)
    .option("driver", driver_jdbc)
    .option("dbTable", query_particiones)
    .option("user", user_jdbc)
    .option("password", pass_jdbc)
    .load()
    .collect()
  for (partition <- consultaParticiones) {
    st.executeUpdate("call " + table_owner + ".DO_THE_TRUNCATE_PARTITION('" + table + "','" + partition.getString(0) + "')")
  }
} catch {
  case e: Exception =>
    println("[ERROR TRUNCATE] " + e)
}
st.close()
db.close()
println("[INFO] Se inicia la inserción en EXADATA")
df_final.filter($"DIA_AGENDADO" >= "2022-08-01")
.repartition(repartition).write.mode("append")
.jdbc(url_jdbc, table, utils.jdbcProperties(driver_jdbc, user_jdbc, pass_jdbc))
println("[INFO] Inserción en EXADATA completada con éxito")
println("[INFO] Proceso Logistica Carga Input SimpleData")

Insert Maximo data into another database server

I want to insert some Maximo data into a table on another database server. I created an automation script like this:
from psdi.security import UserInfo
from psdi.server import MXServer
from psdi.util import MXApplicationException
from psdi.util import MXException
from java.rmi import RemoteException
from java.lang import System
from java.text import Format, DateFormat, SimpleDateFormat
from java.lang import System
mx = MXServer.getMXServer();
ui = mx.getSystemUserInfo();
url= "jdbc:sqlserver://MAXIMODEMO:1433; database=IntegrationTest; user=maxadmin; password=password; encrypt=false; trustServerCertificate=false; loginTimeout=30;";
from java.lang import Class
from java.sql import DriverManager,SQLException
#load driver and register
Class.forName(jdbc_driver).newInstance()
DriverManager.registerDriver(Class.forName(jdbc_driver).newInstance())
#get Connection
#connection = DriverManager.getConnection(jdbc_url, jdbc_user), jdbc_password)
connection = DriverManager.getConnection(url)
#find if item exist
sql = "Select itemnum from item where itemnum='"+mbo.getString("ITEMNUM")+"'"
result = connection.createStatement().executeQuery(sql)
sqlinsert = ""
sdf = SimpleDateFormat("yyyy-MM--dd");
if(result.next()) :
    sqlinsert = "Update Item set description='"+mbo.getString("DESCRIPTION")+"', orderunit='"+mbo.getString("ORDERUNIT")+"', status='"+mbo.getString("STATUS")+"' where ItemNum='"+mbo.getString("ITEMNUM")+"'"
else:
    sqlInsert="Insert into item(itemnum, description, orderunit, statusdate, status, groupname) values('" + mbo.getString("ITEMNUM")+ "','" + mbo.getString("Description") + "', '" + mbo.getString("ORDERUNIT") + "','" + sdf.format(mbo.getDate("STATUSDATE")) + "', '" + mbo.getString("STATUS") + "','" + mbo.getString("GROUPNAME") + "') "
result.close()
result = connection.createStatement().executeQuery(sqlinsert)
connection.close()
There's no error, but the data is not inserted. The select query works fine and returns the value, but the insert/update does not work.
Did I miss something in executing the insert/update query?
You are missing a connection.commit() before your connection.close().
Also, you should be calling executeUpdate() with your insert or update statement text instead of calling executeQuery() with it. The JavaDocs for java.sql.Statement.executeUpdate() say, "Executes the given SQL statement, which may be an INSERT, UPDATE, or DELETE statement or an SQL statement that returns nothing, such as an SQL DDL statement."
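For illustration only, here is roughly what those two changes look like in plain JDBC, using the same java.sql classes the automation script calls; the connection string, table, and values are placeholders rather than anything taken from the script above:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class UpsertSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; reuse whatever URL the script already builds.
        String url = "jdbc:sqlserver://MAXIMODEMO:1433;database=IntegrationTest;"
                + "user=maxadmin;password=password;encrypt=false;";

        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement()) {
            // Manage the transaction explicitly so the commit() below is meaningful.
            connection.setAutoCommit(false);

            // executeUpdate(), not executeQuery(), for INSERT/UPDATE/DELETE;
            // it returns the number of affected rows instead of a ResultSet.
            int rows = statement.executeUpdate(
                    "UPDATE item SET description = 'demo' WHERE itemnum = 'ITEM-1'");

            // Without this commit the change may be rolled back when the connection closes.
            connection.commit();
            System.out.println(rows + " row(s) updated");
        }
    }
}
In the Jython script the same two calls apply: replace the second executeQuery(sqlinsert) with executeUpdate(sqlinsert) and call connection.commit() before connection.close().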

cosmos db cannot query using IN, spring data native query and array or collection(Java)

I am trying to create (using a Spring native query) the findAllById method for a reactive Spring Data Cosmos DB repository, since it is not implemented for ReactiveCosmosRepository.
@Query(value = " SELECT *\n" +
        " FROM container_name km\n" +
        " WHERE km.id IN (@ids) \n" +
        " ORDER BY km.createdDate DESC ")
Flux<ContainerData> findAllById(@Param("ids") String[] ids);
or even
@Query(value = " SELECT *\n" +
        " FROM container_name km\n" +
        " WHERE km.id IN (@ids) \n" +
        " ORDER BY km.createdDate DESC ")
Flux<ContainerData> findAllById(@Param("ids") Iterable<String> ids);
but it does not retrieve any results, and it does not throw any exception either.
So the question is: how can the IN operator be used with a Spring Data native query and a collection or array in Cosmos DB, out of the box and without a workaround?
You should use array_contains
@Query(value = " SELECT *\n" +
        " FROM container_name km\n" +
        " WHERE array_contains(@ids, km.id, true) \n" +
        " ORDER BY km.createdDate DESC ")
Flux<ContainerData> findAllById(@Param("ids") Iterable<String> ids);

converting sql query to equivalent spark query

I am using spark-sql-2.4.1v with Java 8.
I have a scenario/snippet like the one below:
Dataset<Row> df =//loaded data from a csv file
// this has columns like "code1","code2","code3","code4","code5","code6", and "class"
df.createOrReplaceTempView("temp_tab");
List<String> codesList = Arrays.asList("code1","code5"); // codes of interest to be calculated.
codesList.stream().forEach(code -> {
    String query = "select "
            + " avg(" + code + ") as mean, "
            + "percentile(" + code + ",0.25) as p25 "
            + "from " + temp_tab
            + " group by class";
    Dataset<Row> resultDs = sparkSession.sql(query);
});
How can this be written using functions.expr() and functions.agg()?
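One possible shape, offered as a sketch rather than a verified solution (the CSV path below is a placeholder and the column names are taken from the description): build one Column per statistic with functions.expr() and pass them all to a single agg() after groupBy, which also avoids issuing one SQL query per code.
import static org.apache.spark.sql.functions.expr;

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CodeStatsSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("code-stats").getOrCreate();

        // Placeholder load; the real job reads a CSV with columns code1..code6 and class.
        Dataset<Row> df = spark.read().option("header", "true").csv("codes.csv");

        List<String> codesList = Arrays.asList("code1", "code5"); // codes of interest

        // One mean and one 25th-percentile Column per code, built from SQL expressions.
        List<Column> aggs = codesList.stream()
                .flatMap(code -> Stream.of(
                        expr("avg(" + code + ")").alias(code + "_mean"),
                        expr("percentile(" + code + ", 0.25)").alias(code + "_p25")))
                .collect(Collectors.toList());

        // agg(Column, Column...) takes the first expression plus the rest as varargs,
        // so all statistics come back from a single grouped pass over the data.
        Dataset<Row> resultDs = df.groupBy("class")
                .agg(aggs.get(0), aggs.subList(1, aggs.size()).toArray(new Column[0]));

        resultDs.show();
    }
}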

Exception:execute_cql3_query failed: out of sequence response

I tried Cassandra for storing the data but got stuck inserting a huge amount of data. My problem is as follows:
I am inserting data into Cassandra using an INSERT query. I have a huge amount of data stored in a file, so I am reading the file line by line and inserting it into Cassandra, but I am getting this exception:
execute_cql3_query failed: out of sequence response
query = "insert into " + tableName[0]
+ "(sn,ts,dn,ma,br,re,ds,oi,lo,cl) values " + "('"
+ entry.getSn() + "'," + entry.getTs() + ",'"
+ entry.getDn() + "','" + entry.getMa() + "','"
+ entry.getBr() + "','" + entry.getRe() + "','"
+ entry.getDs() + "','" + entry.getOi() + "','"
+ entry.getLo() + "','" + entry.getCl() + "')";
Can you please help me out?
