Text data type not supported - Databricks

I have a huge data file, and one of its columns is text with very large values.
I tried to create the column with a text data type, but it is not supported.
How do I bring text-type data over to Databricks?
Please guide.

Here's the reference: Databricks data types
For CHAR, VARCHAR, NVARCHAR, TEXT and, in general, character strings of any size, just use STRING.
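For example, here is a minimal PySpark sketch (file path, table and column names are hypothetical) that loads a CSV with a large free-text column and declares that column as a plain string; the spark session object is the one Databricks notebooks provide:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Hypothetical schema: the big text column is simply declared as StringType,
# which covers CHAR/VARCHAR/NVARCHAR/TEXT of any size in Databricks.
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("body", StringType(), True),   # the large text column
])

df = (spark.read
      .option("header", "true")
      .schema(schema)
      .csv("/mnt/raw/huge_file.csv"))          # hypothetical path

# Save as a managed table; the column type shows up as STRING.
df.write.format("delta").mode("overwrite").saveAsTable("my_db.my_table")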

Related

How to copy a numeric array from Parquet to Postgres using Azure Data Factory

We are trying to copy a Parquet file from Blob storage to a Postgres table. The problem is that my source Parquet file has some columns with number arrays, which ADF complains are not supported; if I change those to a string data type, Postgres says it is expecting a number array.
Is there a solution or workaround to tackle this?
The workaround for the problem would be to change the type of those columns from array type to string in your Postgres table. This can be done using the following code:
ALTER TABLE <table_name> ALTER COLUMN <column_name> TYPE text;
I have taken a sample table player1 consisting of 2 array columns: position (integer array) and role (text array).
The type of these columns can be changed as follows:
ALTER TABLE player1 ALTER COLUMN position TYPE varchar(40);
ALTER TABLE player1 ALTER COLUMN role TYPE varchar(40);
You can now complete the copy activity in ADF without getting any errors.
If there are any existing records, the array values are converted to string type as part of the change, and the copy activity still completes without errors. The following screenshots show this case.
Initial table data (array type columns): https://i.stack.imgur.com/O6ErV.png
Convert to String type: https://i.stack.imgur.com/Xy69B.png
After using ADF copy activity: https://i.stack.imgur.com/U8pFg.png
NOTE:
Considering you have changed the array column to string type in the source file: if you can change the source so that the lists of values are enclosed within {} rather than [], then you can convert the column type back to an array type using an ALTER query (a small conversion sketch follows this note).
If the list of elements is enclosed within [] and you try to convert the columns back to an array type in your table, it throws the following error.
ERROR: malformed array literal: "[1,1,0]"
DETAIL: Missing "]" after array dimensions.
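If the source values are already in the []-style form, a small Python sketch like the following (file and column names hypothetical, roughly matching the player1 example) can rewrite them into the {} form that Postgres accepts as array literals:
import csv

# Hypothetical input/output files; the 'position' and 'role' columns hold
# list values serialized as "[1,1,0]"-style strings.
with open("players_raw.csv", newline="") as src, \
     open("players_pg.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in ("position", "role"):
            # Turn "[1,1,0]" into "{1,1,0}" so Postgres accepts it as an array literal.
            row[col] = row[col].replace("[", "{").replace("]", "}")
        writer.writerow(row)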

Convert a Hive extract file into a mainframe file layout

I have generated a Hive extract. For instance, it has the below columns:
fields --> a1, a2, a3, b, c, d, e1, e2, f1, f2
I need to combine the a1, a2, a3 fields into one field, 'a'.
Once they are combined, I have to take each record and apply vector elements to some fields when it is migrated to the mainframe. Since vector fields are not applicable in Hive, we create the source table with separate columns for the number of vector occurrences, like e1, e2 and f1, f2.
For example, this is the format that I need:
record
ebcdic string e;
ebcdic string f;
end [2]
Now what I need to do is write a Hive query to transform the normal file layout in Hive into the above format. Since I am not familiar with this, can anyone suggest some logic to solve this?
Thanks in advance.
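For the field-combining part only, here is a rough sketch of the kind of query involved, run here through PySpark's spark.sql (the source table name hive_extract and target table name are hypothetical; the occurrence columns are just carried through, and mapping them onto the mainframe record layout still has to be handled separately):
# Sketch only: concatenate a1, a2, a3 into a single field 'a' and keep the
# occurrence columns (e1, e2, f1, f2) unchanged. Assumes the Hive table
# hive_extract is visible to the Spark session.
combined = spark.sql("""
    SELECT concat(a1, a2, a3) AS a,
           b, c, d,
           e1, e2,
           f1, f2
    FROM hive_extract
""")

combined.write.mode("overwrite").saveAsTable("mainframe_staging")  # hypothetical target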

Converting Oracle RAW types with Spark

I have a table in an Oracle DB that contains a column stored as a RAW type. I'm making a JDBC connection to read that column and, when I print the schema of the resulting dataframe, I notice that I have a column with a binary data type. This was what I was expecting to happen.
The thing is that I need to be able to read that column as a String so I thought that a simple data type conversion would solve it.
df.select("COLUMN").withColumn("COL_AS_STRING", col("COLUMN").cast(StringType)).show
But what I got was a bunch of random characters. As I'm dealing with a RAW type, it's possible that a string representation of this data doesn't exist, so, just to be safe, I did a simple select to get the first rows from the source (using sqoop-eval), and somehow sqoop can display this column as a string.
I then thought that this could be an encoding problem so I tried this:
df.selectExpr("decode(COLUMN,'utf-8')").show
I tried utf-8 and a bunch of other encodings, but again all I got was random characters.
Does anyone know how I can do this data type conversion?
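One hedged way to at least diagnose what the binary column contains (not necessarily the final conversion) is to render it as hex and compare the byte values with what sqoop-eval shows; a PySpark sketch, reusing the df and COLUMN names from the question:
from pyspark.sql.functions import col, hex

# Show the raw byte values as a hex string next to the original binary column
# so they can be compared with the output sqoop-eval displays.
df.select(
    col("COLUMN"),
    hex(col("COLUMN")).alias("COL_AS_HEX"),
).show(10, truncate=False)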

Cassandra 2.2.11 add new map column from text column

Let's say I have a table with 2 columns:
primary key: id - type varchar
non-primary key: data - type text
The data column consists only of JSON values, for example:
{
"name":"John",
"age":30
}
I know that I cannot alter this column to a map type, but maybe I can add a new map column with values from the data column, or maybe you have some other idea?
What can I do about it? I want to get a map column in this table with values from data.
You might want to make use of the CQL COPY command to export all your data to a CSV file.
Then alter your table and create a new column of type map.
Convert the exported data to another file containing UPDATE statements where you only update the newly created column with values converted from JSON to a map. For conversion use a tool or language of your choice (be it bash, python, perl or whatever).
BTW, be aware that with a map you specify the data type of the map's key and the data type of the map's value. So you will most probably be limited to strings if you want to be generic, i.e. a map<text, text>. Consider whether this is appropriate for your use case.
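As a rough Python sketch of the conversion step (keyspace, table, column and file names are hypothetical), reading the COPY export and emitting UPDATE statements that fill a new map<text, text> column:
import csv
import json

# Hypothetical: the CQL COPY export has two columns (id, data) and the new
# column added with ALTER TABLE is data_map of type map<text, text>.
with open("export.csv", newline="") as src, open("updates.cql", "w") as out:
    for row_id, data in csv.reader(src):
        doc = json.loads(data)
        # Render every value as text so it fits a map<text, text> column.
        pairs = ", ".join(
            "'{}': '{}'".format(k, str(v).replace("'", "''")) for k, v in doc.items()
        )
        out.write(
            "UPDATE my_keyspace.my_table SET data_map = {{{}}} WHERE id = '{}';\n".format(
                pairs, row_id.replace("'", "''")
            )
        )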

SQLite 3 CSV Import to Table

I am using this as a resource to get me started - http://www.pantz.org/software/sqlite/sqlite_commands_and_general_usage.html
Currently I am working on creating an AIR program making use of the built-in SQLite database. I could be considered a complete noob at making SQL queries.
table column types
I have a rather large Excel file (14K rows) that I have exported to a CSV file. It has 65 columns of varying data types (mostly ints, floats and short strings, MAYBE a few bools). I have no idea about the proper form of importing so as to preserve the column structure, nor do I know the best data formats to choose per db column. I could use some input on this.
table creation utils
Is there a util that can read an XLS file and, based on the column headers, generate a quick query statement to ease the pain of writing the query manually? I saw this post, but it seems geared towards a preexisting CSV file and makes use of Python (something I am also a noob at).
Thank you in advance for your time.
J
SQLite3's column types basically boil down to:
TEXT
NUMERIC (REAL, FLOAT)
INTEGER (the various lengths of integer; but INT will normally do)
BLOB (binary objects)
Generally in a CSV file you will encounter strings (TEXT), decimal numbers (FLOAT), and integers (INT). If performance isn't critical, those are pretty much the only three column types you need. (CHAR(80) is smaller on disk than TEXT but for a few thousand rows it's not so much of an issue.)
As far as putting data into the columns is concerned, SQLite3 uses type coercion to convert the input data type to the column type wherever the conversion makes sense. So all you have to do is specify the correct column type, and SQLite will take care of storing it in the correct way.
For example, the number -1230.00, the string "-1230.00", and the string "-1.23e3" will all coerce to the number -1230 when stored in a FLOAT column.
Note that if SQLite3 can't apply a meaningful type conversion, it will just store the original data without attempting to convert it at all. SQLite3 is quite happy to insert "Hello World!" into a FLOAT column. This is usually a Bad Thing.
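A quick way to see this coercion in action from Python's built-in sqlite3 module (table name hypothetical):
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (val FLOAT)")

# Strings that look like numbers are coerced to the column's numeric affinity...
con.execute("INSERT INTO demo VALUES (?)", ("-1230.00",))
con.execute("INSERT INTO demo VALUES (?)", ("-1.23e3",))
# ...but a string that cannot be converted is stored as-is (usually a Bad Thing).
con.execute("INSERT INTO demo VALUES (?)", ("Hello World!",))

for val, type_name in con.execute("SELECT val, typeof(val) FROM demo"):
    print(val, type_name)
# Prints: -1230.0 real / -1230.0 real / Hello World! text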
See the SQLite3 documentation on column types and conversion for gems such as:
Type Affinity
In order to maximize compatibility between SQLite and other database engines, SQLite supports the concept of "type affinity" on columns. The type affinity of a column is the recommended type for data stored in that column. The important idea here is that the type is recommended, not required. Any column can still store any type of data. It is just that some columns, given the choice, will prefer to use one storage class over another. The preferred storage class for a column is called its "affinity".
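For the "table creation utils" part of the question, a rough Python sketch (file and table names hypothetical) that builds the CREATE TABLE statement from the CSV header, guesses a type per column from the first data row, and bulk-inserts the rows, relying on the coercion described above:
import csv
import sqlite3

# Crude type guess from a sample value; adjust or hard-code types as needed.
def guess_type(value):
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "FLOAT"
    except ValueError:
        return "TEXT"

with open("export.csv", newline="") as f:
    rows = list(csv.reader(f))

header, first, data = rows[0], rows[1], rows[1:]
columns = ", ".join(
    '"{}" {}'.format(name, guess_type(value)) for name, value in zip(header, first)
)

con = sqlite3.connect("import.db")
con.execute("CREATE TABLE my_table ({})".format(columns))
placeholders = ", ".join("?" for _ in header)
con.executemany("INSERT INTO my_table VALUES ({})".format(placeholders), data)
con.commit()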
