If the input for a schema attribute period (type xs:duration) is 'P14M', is it transformed to 'P1Y2M'? How can this transformation be prevented?
I need the format 'P14M'.
Thanks
I'm using GeoSpark (Sedona) with PySpark: is it possible to read an SDO_GEOMETRY type from Oracle and write it into an Oracle table with an SDO_GEOMETRY field?
In my app I'm able to read:
db_table = "(SELECT sdo_util.to_wktgeometry(geom_32632) geom FROM geodss_dev.CATASTO_GALLERIE cg WHERE rownum <10)"  # query on the Oracle DB
df_oracle = spark.read.jdbc(db_url, db_table, properties=db_properties)
df_oracle.show()
df_oracle.printSchema()
but when I write:
df_oracle.createOrReplaceTempView("gallerie")
df_write = spark.sql("select ST_AsBinary(st_geomfromwkt(geom)) geom_32632 from gallerie")  # query using the Sedona library on the temp view gallerie
print(df_write.dtypes)
df_write.write.jdbc(db_url, "geodss_dev.gallerie_test", properties=db_properties,mode="append")
I get this error:
ORA-00932: inconsistent data types: expected MDSYS.SDO_GEOMETRY, got BINARY
Is there a solution for writing the SDO_GEOMETRY type?
Thanks
Regards
You are reading the geometries in serialized formats: WKT (text) in your first example, WKB (binary) in the second.
If you want to write those back as SDO_GEOMETRY objects, you will need to deserialize them. This can be done in two ways:
Using the SDO_GEOMETRY constructor:
insert into my_table(my_geom) values (sdo_geometry(:wkb))
or
insert into my_table(my_geom) values (sdo_geometry(:wkt))
Using the explicit conversion functions:
insert into my_table(my_geom) values (sdo_util.from_wkbgeometry(:wkb))
or
insert into my_table(my_geom) values (sdo_util.from_wktgeometry(:wkt))
I have no idea how you can express this using GeoSpark. I assume it does allow you to specify things like a list of columns to write to and a list of input values?
What definitely does not happen is an automatic transformation from the serialized format (binary or text) to a geometry object. There are actually a number of serialized formats in addition to the oldish WKT and WKB: GML and GeoJSON are the main alternatives. But those also need explicit calls to the transformation functions.
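If the JDBC writer cannot express such an INSERT directly, one possible workaround is to stage the WKB bytes and convert them server-side with SDO_UTIL.FROM_WKBGEOMETRY. This is only a sketch in Scala (the same idea applies from PySpark); the staging table geodss_dev.gallerie_stage is hypothetical, and its single BLOB column is assumed to match the DataFrame column name geom_32632:
// Sketch only: dfWrite, dbUrl and dbProperties mirror the question's
// df_write, db_url and db_properties.
dfWrite.write
  .mode("append")
  .jdbc(dbUrl, "geodss_dev.gallerie_stage", dbProperties)
// Then, on the Oracle side (for example as a follow-up statement):
//   INSERT INTO geodss_dev.gallerie_test (geom_32632)
//   SELECT sdo_util.from_wkbgeometry(geom_32632) FROM geodss_dev.gallerie_stage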
EDIT: About your second example: instead of stacking two function calls, you can just do:
SELECT sdo_util.to_wkbgeometry(geom_32632) geom ...
Also, in both examples, you can use the object methods instead of the function calls. The result will be the same (the methods just call those same functions anyway), but the syntax is a bit more compact. IMPORTANT: this requires using table aliases!
SELECT cg.geom_32632.get_wkt() geom
FROM geodss_dev.CATASTO_GALLERIE cg
WHERE rownum <10
SELECT cg.geom_32632.get_wkb() geom
FROM geodss_dev.CATASTO_GALLERIE cg
WHERE rownum <10
I have a Dataset<Row> that comes from reading a Parquet file. One of its columns, InfoMap, is of type Map.
Now I want to update this column, but when I use withColumn it tells me that I cannot put a HashMap inside because it is not a literal.
I want to know the correct way to update a column of type Map in a Dataset.
Try using typedLit instead of lit
typedLit
"...The difference between this function and lit() is that this
function can handle parameterized scala types e.g.: List, Seq and Map"
import org.apache.spark.sql.functions.typedLit

data.withColumn("dictionary", typedLit(Map("foo" -> 1, "bar" -> 2)))
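A hedged sketch of updating rather than replacing the column, assuming Spark 2.4+ (where map_concat is available) and a hypothetical Map[String, String] column named InfoMap:
import org.apache.spark.sql.functions.{col, map_concat, typedLit}

// Merge new entries into the existing map instead of overwriting the column.
// (On Spark 3.x, duplicate keys may require spark.sql.mapKeyDedupPolicy=LAST_WIN.)
val updated = data.withColumn(
  "InfoMap",
  map_concat(col("InfoMap"), typedLit(Map("newKey" -> "newValue")))
)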
I have a Spark dataset of the following type:
org.apache.spark.sql.Dataset[Array[Double]]
I want to map the array to a Vector so that I can use it as the input dataset for ml.clustering.KMeans.fit(...). So I try to do something like this:
val featureVectors = vectors.map(r => Vectors.dense(r))
But this fails with the following error:
error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
I guess I need to specify an encoder for the map operation, but I'm struggling to find a way to do it. Any ideas?
You need the encoder to be available as implicit evidence:
def map[U : Encoder](func: T => U): Dataset[U]
breaks down to:
def map[U](func: T => U)(implicit evidence$1: Encoder[U]): Dataset[U]
So, you need to pass it in or have it available implicitly.
That said, I do not believe that Vector is supported as of yet, so you might have to drop to a DataFrame.
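A minimal sketch of that DataFrame route, assuming a newer Spark (2.x+, where the ml Vector UDT is registered) and a SparkSession named spark: wrapping the vector in a tuple gives the built-in product encoder from spark.implicits._ something it can serialize.
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import spark.implicits._

// Tuple1 lets the product encoder handle the Vector column;
// "features" is the default input column name expected by KMeans.
val featureVectors = vectors
  .map(arr => Tuple1(Vectors.dense(arr)))
  .toDF("features")

val kmeansModel = new KMeans().setK(2).fit(featureVectors)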
I have a question about ordering a DateTime RDD, finding the holes it contains and filling them. For example, suppose we have these records in my database:
20160410,"info1"
20160409,"info2"
20160407,"info3"
20160404,"info4"
Basically, for my purposes I also need the holes, because they impact my calculations, so I would like something like this at the end:
Some(20160410,"info1")
Some(20160409,"info2")
None
Some(20160407,"info3")
None
None
Some(20160404,"info4")
What is the best strategy to do that?
This is a small, incomplete code excerpt:
val records = bdao // RDD[(String,List[RecordPO])]
.findRecords
.filter(_.getRecDate >= startDate)
.filter(_.getRecDate < endDate)
.keyBy(_.getId)
.aggregateByKey(List[RecordPO]())((list, value) => value +: list, _ ++ _)
...
/* transformations */
...
val finalRecords = .... // RDD[(String,List[Option[RecordPO]])]
Thanks in advance
You will need to create a DataFrame of all the dates you want to see in the resulting dataset (for example, all dates from 20160404 to 20160410). Then perform a left outer join of this dataset with your records, and you will get None (null) where you expect it.
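A minimal sketch of that idea, using the yyyyMMdd string dates from the example (column names such as recDate and info are made up for illustration):
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.desc
import spark.implicits._

val fmt = DateTimeFormatter.BASIC_ISO_DATE  // yyyyMMdd

// One row per calendar day in the range of interest.
val allDates = Iterator.iterate(LocalDate.of(2016, 4, 4))(_.plusDays(1))
  .takeWhile(!_.isAfter(LocalDate.of(2016, 4, 10)))
  .map(_.format(fmt))
  .toSeq
  .toDF("recDate")

val records = Seq(("20160410", "info1"), ("20160409", "info2"),
                  ("20160407", "info3"), ("20160404", "info4"))
  .toDF("recDate", "info")

// Days with no matching record come back with a null info value:
// those are the "holes" (None) you want to see.
val filled = allDates.join(records, Seq("recDate"), "left_outer")
  .orderBy(desc("recDate"))
filled.show()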
I am trying to run some basic Spark applications.
Can we apply an action on the result of another action, or can actions be applied only on a transformed RDD?
val numbersRDD = sc.parallelize(Array(1,2,3,4,5));
val topnumbersRDD = numbersRDD.take(2)
scala> topnumbersRDD.count
<console>:17: error: missing arguments for method count in trait TraversableOnce;
follow this method with `_' if you want to treat it as a partially applied function
topnumbersRDD.count
^
I would like to know why I am getting the above error.
Also, what can I do if I want to find the count of the first 2 numbers? I need the output to be 2.
Actions can be applied to an RDD or a DataFrame. The take method returns an Array, so you can use the array's length (or size) to count its elements.
If you want to select data that matches a condition, you can use filter, which is a transformation and returns a new RDD.
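A minimal sketch of the difference, using the example values:
val numbersRDD = sc.parallelize(Array(1, 2, 3, 4, 5))

// take(2) is an action: it returns a plain local Array[Int], not an RDD,
// so counting its elements uses the Scala collection method length (or size).
val topNumbers: Array[Int] = numbersRDD.take(2)
println(topNumbers.length)        // 2

// To stay with RDDs, apply a transformation such as filter (which returns a
// new RDD) and then call the RDD action count() on the result.
val smallNumbersRDD = numbersRDD.filter(_ <= 2)
println(smallNumbersRDD.count())  // 2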