NiFi - Loading XML data into Cassandra

I am trying to insert XML data into Cassandra DB. Can somebody please suggest the flow in NiFi? I have a JMS queue to which the message data is posted; I need to consume it and then insert the data into Cassandra.

I'm not sure you can directly ingest XML into Cassandra. However, you could convert the XML to JSON using the TransformXml processor (with an XML-to-JSON XSLT), or, as of NiFi 1.2.0, you can use ConvertRecord by specifying the input and output schemas.
If there are multiple XML records per flow file and you need one CQL statement per record, you may need SplitJson or SplitRecord after the XML-to-JSON conversion has taken place.
Then you can use ReplaceText to form a CQL statement to insert the JSON, then PutCassandraQL to push to Cassandra. Alternatively you can use CQL map syntax to insert into a map field, etc.
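As a rough illustration of what this flow produces (this is not NiFi code; the xmltodict package, the keyspace, and the table name below are made-up assumptions), converting one XML record to JSON and wrapping it in a CQL INSERT ... JSON statement could look like this in Python:

import json
import xmltodict  # third-party package: pip install xmltodict

# One XML record, as it might arrive from the JMS queue.
xml_record = "<user><id>42</id><name>alice</name></user>"

# What the TransformXml/ConvertRecord step accomplishes: XML -> JSON.
record = xmltodict.parse(xml_record)["user"]
json_payload = json.dumps(record)

# What the ReplaceText step accomplishes: wrap the JSON in a CQL
# statement that PutCassandraQL can execute.
cql = f"INSERT INTO my_keyspace.users JSON '{json_payload}'"
print(cql)
# INSERT INTO my_keyspace.users JSON '{"id": "42", "name": "alice"}'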

Related

How to read hive managed table data using spark?

I am able to read a Hive external table using spark-shell, but when I try to read data from a Hive managed table it only shows the column names.
Please find my queries here:
Could you please try using the database name along with the table name?
sql("select * from db_name.test_managed")
If the result is still the same, please share the output of describe formatted for both tables.
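A minimal PySpark sketch of this suggestion, assuming a Hive-enabled SparkSession; db_name.test_managed comes from the question, and db_name.test_external is a placeholder name for the external table:

from pyspark.sql import SparkSession

# Hive support is required to read managed (and external) Hive tables.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Qualify the table with its database name.
spark.sql("select * from db_name.test_managed").show()

# If the result is still only column names, compare the metadata of both tables.
spark.sql("describe formatted db_name.test_managed").show(truncate=False)
spark.sql("describe formatted db_name.test_external").show(truncate=False)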

How to convert user natural query into SQL query?

I am trying to build a chatbot in Rasa/Dialogflow. The problem I am facing is converting English into a SQL query, so that what the user writes in English can be turned into SQL, used to fetch data from a MySQL database, and the result displayed to the user.
Can someone suggest how to do it?
Ideally this is only possible through solutions like Seq2SQL (link here for reference), but I implemented it in a workaround fashion:
1. I got the JSON using tracker.latest_message.
2. I then processed that JSON to make my own structured JSON, like:
   [{'column_name': 'a', 'operator': '=', 'value': '100'},
    {'column_name': 'b', 'operator': '>', 'value': '100'}]
3. The structure above was used to form the WHERE clause of the query.
4. In the same way I made a custom JSON for the SELECT part as well:
   [{'sum': 'column1'}, {'count': 'column2'}]
5. Then I looped through the JSON I had created and built the queries (a sketch of this step follows below).
Note: this JSON structure will not cover all possible scenarios, but it worked decently for me.
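A minimal Python sketch of steps 3-5, assuming a hypothetical table name my_table and numeric values (so they are left unquoted); this is only an illustration of the approach, not the code from the answer:

# Custom JSON structures as described above.
select_spec = [{"sum": "column1"}, {"count": "column2"}]
where_spec = [
    {"column_name": "a", "operator": "=", "value": "100"},
    {"column_name": "b", "operator": ">", "value": "100"},
]

# SELECT part: one aggregate per entry, e.g. sum(column1), count(column2).
select_clause = ", ".join(
    f"{func}({column})" for item in select_spec for func, column in item.items()
)

# WHERE part: column <operator> value, joined with AND.
where_clause = " AND ".join(
    f"{cond['column_name']} {cond['operator']} {cond['value']}" for cond in where_spec
)

query = f"SELECT {select_clause} FROM my_table WHERE {where_clause}"
print(query)
# SELECT sum(column1), count(column2) FROM my_table WHERE a = 100 AND b > 100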

How to overwrite data with PySpark's JDBC without losing schema?

I have a DataFrame that I want to write to a PostgreSQL database. If I simply use the "overwrite" mode, like:
df.write.jdbc(url=DATABASE_URL, table=DATABASE_TABLE, mode="overwrite", properties=DATABASE_PROPERTIES)
The table is recreated and the data is saved. But the problem is that I'd like to keep the PRIMARY KEY and Indexes in the table. So, I'd like to either overwrite only the data, keeping the table schema or to add the primary key constraint and indexes afterward. Can either one be done with PySpark? Or do I need to connect to the PostgreSQL and execute the commands to add the indexes myself?
The default behavior for mode="overwrite" is to first drop the table and then recreate it with the new data. You can instead truncate the existing data by including option("truncate", "true") and then push your own:
df.write.option("truncate", "true").jdbc(url=DATABASE_URL, table=DATABASE_TABLE, mode="overwrite", properties=DATABASE_PROPERTIES)
This way, you are not recreating the table so it shouldn't make any modifications to your schema.

Schema crawler reading data from table

I understand that we can read data from a table using a command in SchemaCrawler.
How can I do that programmatically in Java? I could see examples for reading the schema, tables, etc., but how do I get the data?
Thanks in advance.
SchemaCrawler allows you to obtain database metadata, including result set metadata. Standard JDBC provides a way to get data by using java.sql.ResultSet, and you can use SchemaCrawler for obtaining result set metadata using schemacrawler.utility.SchemaCrawlerUtility.getResultColumns(ResultSet).
Sualeh Fatehi, SchemaCrawler

How do I store the contents of an array in tablestorage

I need to store the contents of an array in Azure Table Storage. The array will have between 0 and 100 entries. I don't want to have to create 100 different elements, so is there a way I can pack up the array, store it, and unpack it later? Any examples would be much appreciated. I just don't know where to start :-(
You need to serialize the array into binary or xml and then use the appropriate column type to store the data (binary object or xml.)
XML will be the most flexible because you can still query the values while they are in storage. (You can't query binary data. Not easily anyway.) Here is an example of serializing and here is one for inserting the value into a table.
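The linked serialization example is .NET-based; as a rough illustration of the same idea in Python (the element names here are made up), serializing an array to an XML string and back could look like:

import xml.etree.ElementTree as ET

values = ["red", "green", "blue"]  # the array to store

# Serialize the array to an XML string before writing it to storage...
root = ET.Element("items")
for v in values:
    ET.SubElement(root, "item").text = v
xml_blob = ET.tostring(root, encoding="unicode")
# '<items><item>red</item><item>green</item><item>blue</item></items>'

# ...and deserialize it again after reading it back.
restored = [item.text for item in ET.fromstring(xml_blob).findall("item")]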
Some detail on XML support in Azure:
The xml Data Type
SQL Azure Database supports the xml data type, which stores XML data. You can store xml instances in a column or in a variable of the xml type.
Support for XML Data Modification Language
The XML data modification language (XML DML) is an extension of the XQuery language. XML DML adds the following case-sensitive keywords to XQuery, and they are supported in SQL Azure Database:
insert (XML DML)
delete (XML DML)
replace value of (XML DML)
Support for xml Data Type Methods
You can use the xml data type methods to query an XML instance stored in a variable or column of the xml type. SQL Azure Database supports the following xml data type methods:
query() Method (xml data type)
value() Method (xml data type)
exist() Method (xml data type)
modify() Method (xml data type)
nodes() Method (xml data type)
If you really are starting out in Azure Table Storage, then there are a few nice "simple" tutorials around - e.g. http://blogs.msdn.com/b/jnak/archive/2008/10/28/walkthrough-simple-table-storage.aspx
Once you are happy with reading/writing entities then there are several ways you can map your array to Table Storage.
1. If you ever want to access each element of your array separately from the persistent storage, then you should create 0 to 99 separate entities, each stored as its own row in the Table store.
2. If you don't ever want to access them separately, then you can just store the array in a single entity (row) in the table, e.g. using PartitionKey="MyArrays", RowKey="" and another column that contains the array serialised to e.g. JSON (see the sketch after this list).
3. As a variation on 2, you could also store the array items 0 to 99 in separate columns ("Array_0", ..., "Array_99") in the row. There are ways you could map this to a nice C# array property using the Reading/Writing events on the table storage entity, but this might not be the best place to start if you're beginning with Azure.
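As a rough illustration of option 2, here is a minimal Python sketch using the json module and the azure-data-tables package; the connection string, table name, and keys are assumptions for the example (the original answer predates this SDK), and the table is assumed to already exist:

import json
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<your-storage-connection-string>", table_name="MyTable"
)

values = [1, 2, 3, 42]  # up to 100 entries

# Pack the whole array into one string property of a single entity (row).
entity = {
    "PartitionKey": "MyArrays",
    "RowKey": "array-001",
    "Data": json.dumps(values),
}
table.upsert_entity(entity)

# Later: read the entity back and unpack the array.
stored = table.get_entity(partition_key="MyArrays", row_key="array-001")
restored = json.loads(stored["Data"])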
Be careful: besides the 1 MB entity limit there is a per-field limit as well (I think it's 64 KB).
Your best bet is to use the Lokad Fat Entity
http://code.google.com/p/lokad-cloud/wiki/FatEntities
