Pandas executemany nulling out columns - python-3.x

I am using executemany to load data from a pandas DataFrame into a DB2 database. I am having the following issue: if I insert one record with no NULLs, everything works fine; if I insert multiple records but one has NULLs in some of the fields, it messes up all of the records being inserted.
Edit: After some additional research, it appears the problem with one of these fields is that pandas can't handle null integers very well. It looks like I am getting some sort of overflow when it tries to cast a null float to int; when I do an explicit cast I get these same values. Is there any way to use executemany with null integers?
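For anyone reproducing this, a minimal sketch of the symptom (the frame and column names here are illustrative, not from the original post):

import numpy as np
import pandas as pd

# Any column holding a NaN is promoted to float64 by pandas.
df = pd.DataFrame({"id": [1, 2], "qty": [5.0, np.nan]})
print(df.dtypes)  # id: int64, qty: float64

# Forcing the NaN through a NumPy integer cast silently produces an
# arbitrary sentinel value instead of a NULL (platform-dependent).
print(np.array([np.nan]).astype("int64"))  # e.g. [-9223372036854775808]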
Edit2: I've tried playing around with the Int64 extension dtype that some answers reference, but still had no luck, since the type still ends up as float.
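A minimal sketch of one common workaround (the cursor cur, the table, and the column names are hypothetical): map missing values to None before calling executemany, so the DB-API driver binds a SQL NULL instead of trying to coerce the value to an integer.

import pandas as pd

# None is translated to SQL NULL by DB-API drivers; pd.isna covers
# both NaN and pd.NA scalars.
params = [
    tuple(None if pd.isna(v) else v for v in row)
    for row in df.itertuples(index=False, name=None)
]

# Hypothetical cursor and statement:
# cur.executemany("INSERT INTO mytable (id, qty) VALUES (?, ?)", params)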

Related

Assigning indexes across rows within python DataFrame

I'm currently trying to assign a unique index across rows, rather than along columns. The main constraint is that these values can never repeat, and they must be preserved across every monthly report that I run.
I've thought about merging the columns and assigning an index to that, but my concern is that I won't be able to easily modify the DataFrame and still preserve the same index values for each cell with this method.
I'm expecting my df to look something like this below:
[image: Sample DataFrame]
I haven't yet found a viable solution, so I don't have any code to show yet. Any solutions would be much appreciated. Thank you.
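One rough sketch of an approach, assuming each row carries some stable key to match on (record_key, row_id, and the function name are all illustrative): keep the IDs assigned in previous runs in a lookup frame and only mint new IDs above the current maximum, so values never repeat between monthly reports.

import pandas as pd

def assign_persistent_ids(new_df, id_map, key="record_key", id_col="row_id"):
    # Reattach IDs assigned in earlier runs via the stable key.
    out = new_df.merge(id_map[[key, id_col]], on=key, how="left")
    # Mint fresh IDs starting above the highest ID ever handed out.
    next_id = int(id_map[id_col].max()) + 1 if len(id_map) else 0
    missing = out[id_col].isna()
    out.loc[missing, id_col] = range(next_id, next_id + int(missing.sum()))
    out[id_col] = out[id_col].astype("int64")
    return out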

Converting Oracle RAW types with Spark

I have a table in an Oracle DB that contains a column stored as a RAW type. I'm making a JDBC connection to read that column and, when I print the schema of the resulting dataframe, I notice that I have a column with a binary data type. This was what I was expecting to happen.
The thing is that I need to be able to read that column as a String so I thought that a simple data type conversion would solve it.
df.select("COLUMN").withColumn("COL_AS_STRING", col("COLUMN").cast(StringType)).show
But what I got was a bunch of random characters. Since I'm dealing with a RAW type, it's possible that a string representation of this data simply doesn't exist, so, just to be safe, I ran a simple select to get the first rows from the source (using sqoop eval), and somehow sqoop can display this column as a string.
I then thought that this could be an encoding problem so I tried this:
df.selectExpr("decode(COLUMN,'utf-8')").show
I tried utf-8 and a bunch of other encodings, but again all I got was random characters.
Does anyone know how I can do this data type conversion?
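One guess worth checking, since sqoop typically renders an Oracle RAW column as its hexadecimal representation: the "string" sqoop showed may just be the hex encoding of the bytes, which Spark's built-in hex function reproduces. A PySpark sketch:

from pyspark.sql.functions import col, hex as sql_hex

# RAW arrives in Spark as binary; hex() renders the bytes as the
# hexadecimal string that Oracle tooling usually displays.
df_hex = df.withColumn("COL_AS_STRING", sql_hex(col("COLUMN")))
df_hex.select("COLUMN", "COL_AS_STRING").show(truncate=False)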

Generic Inquiry time format

In a Generic Inquiry, I'm trying to format the time part of a DateTime field in the results. I don't currently see any way to do this without parsing the date as a string, but I must be missing something. Using the Format() function, running the query tells me "The method or operation is not implemented". Using the Minute() function gets the minutes part, but using the Hour() function says "Unsupported formula operator Hour".
Add the CRCase table under your tables and don't join it with any other table under the Relations tab; then, in the results grid, under the Schema field, use CRCase.CreatedDateTime. With this you will get the result in DateTime format.
Let me know if I've misunderstood your question.

Auto infer schema from parquet/ selectively convert string to float

I have a parquet file with 400+ columns. When I read it, the default datatype attached to a lot of the columns is String (maybe due to the schema specified by someone else).
I was not able to find a parameter similar to
inferSchema=True  # present for spark.read.csv, but not for spark.read.parquet
I tried setting
mergeSchema=True  # but it doesn't improve the results
To manually cast columns as float, I used
df_temp.select(*(col(c).cast("float").alias(c) for c in df_temp.columns))
This runs without error, but it converts all of the values in the genuinely string columns to null. I can't wrap it in a try/except block, since it doesn't throw any error.
Is there a way I can check whether a column contains only integer/float values and selectively cast those columns to float?
Parquet columns are typed, so there is no such thing as schema inference when loading Parquet files.
Is there a way I can check whether a column contains only integer/float values and selectively cast those columns to float?
You can use the same logic Spark itself uses: define a preferred type hierarchy and attempt the casts until you reach the most selective type that parses all values in the column (a sketch follows the links below).
How to force inferSchema for CSV to consider integers as dates (with "dateFormat" option)?
Spark data type guesser UDAF
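A rough PySpark sketch of that cast-and-check idea (it triggers a Spark job per string column, so it is slow on 400+ columns, but it illustrates the approach):

from pyspark.sql import functions as F

def cast_numeric_strings(df):
    for c, dtype in df.dtypes:
        if dtype != "string":
            continue
        casted = F.col(c).cast("float")
        # Non-null values that fail the cast become null; if none do,
        # every value in the column was numeric and the cast is safe.
        bad = df.filter(F.col(c).isNotNull() & casted.isNull()).count()
        if bad == 0:
            df = df.withColumn(c, casted)
    return df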
There's no easy way currently. There's an existing GitHub issue that can be referred to:
https://github.com/databricks/spark-csv/issues/264
Something like https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala exists for Scala; the equivalent could be created for PySpark.

How to order by a varchar column numerically in vertica database?

For example, in Oracle we can add a +0 in the ORDER BY clause to sort a varchar column numerically.
Thanks!
Use a cast, as in
select x from foo order by cast(x as int);
You will get an error if not all values can be cast to an int.
I haven't done this before in Vertica, but my advice is the same as for this type of problem elsewhere: try to figure out how PostgreSQL does it and try that, since Vertica utilizes a lot of PostgreSQL functionality.
I just did a quick search and came up with this as a possible solution: http://archives.postgresql.org/pgsql-general/2002-01/msg01057.php
A more thorough search may get you better answers.
If the data is truly numeric, the +0 will do the conversion as you have requested, but if there are any values that cannot be converted, the query will return an error like the following one:
ERROR: Could not convert "200 ... something" from column table_name.column_name to a number
