Description
In the MySQL world, one can create an SQL file and execute it from the command line:
mysql -h hostname -u user database < path/to/sqlfile.sql
This is especially useful for test data.
I've checked the ArangoDB documentation, and the best I can find for loading test data is this cookbook recipe:
https://docs.arangodb.com/3.2/Cookbook/AQL/CreatingTestData.html
Is it possible to write an AQL file and execute it from the command line, as with MySQL?
In contrast to MySQL's SQL, which contains both DML and DDL language elements, AQL by definition only contains DML statements. Therefore "executing AQL" is probably not quite what you need.
arangosh can be used to execute JavaScript files from the filesystem, which you can then use to send AQL queries or to create collections and indices.
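A minimal sketch of that approach driven from Python, assuming arangosh is on your PATH; the endpoint, credentials, collection name, and AQL below are placeholders:

import subprocess
import tempfile

# A small setup script in arangosh's JavaScript dialect: create a collection,
# add an index, and run an AQL INSERT.
setup_js = """
if (!db._collection("users")) { db._create("users"); }
db.users.ensureIndex({ type: "hash", fields: ["email"], unique: true });
db._query('INSERT { name: "test", email: "test@example.com" } INTO users');
"""

with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
    f.write(setup_js)
    script_path = f.name

# Equivalent to running: arangosh --server.database mydb --javascript.execute setup.js
subprocess.run([
    "arangosh",
    "--server.endpoint", "tcp://127.0.0.1:8529",
    "--server.database", "mydb",
    "--server.username", "root",
    "--server.password", "",
    "--javascript.execute", script_path,
], check=True)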
If you want a reproducible state, you could create the data set you like (for example from a CSV imported with arangoimp), create indices and so on, then dump the database with arangodump and restore it into your SUT with arangorestore.
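A sketch of that workflow using the same pattern; the file names, directories, and database names are placeholders, and on newer ArangoDB versions arangoimp has been renamed arangoimport:

import subprocess

DB = "mydb"            # database you prepare the data in (placeholder)
TARGET_DB = "test_db"  # database to set up on the SUT (placeholder)

# 1. Import a CSV with the test data into a collection.
#    Add --server.username/--server.password as needed.
subprocess.run([
    "arangoimp", "--server.database", DB,
    "--file", "users.csv", "--type", "csv",
    "--collection", "users", "--create-collection", "true",
], check=True)

# 2. Dump the prepared database to a directory.
subprocess.run([
    "arangodump", "--server.database", DB,
    "--output-directory", "dump",
], check=True)

# 3. Restore that dump into the system under test.
subprocess.run([
    "arangorestore", "--server.database", TARGET_DB,
    "--input-directory", "dump", "--create-database", "true",
], check=True)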
There are also tools by community members, Migrant Verde and ArangoMiGO, that enable schema evolution and similar workflows, which may be of interest for this purpose too.
Related
I want to write data from a PySpark DataFrame to an external database, say an Azure MySQL database. So far, I have managed to do this using .write.jdbc():
spark_df.write.jdbc(url=mysql_url, table=mysql_table, mode="append", properties={"user":mysql_user, "password": mysql_password, "driver": "com.mysql.cj.jdbc.Driver" })
Here, if I am not mistaken, the only options available for mode are append and overwrite; however, I want more control over how the data is written. For example, I want to be able to perform update and delete operations.
How can I do this? Is it possible to, say, write SQL queries to write data to the external database? If so, please give me an example.
First, I suggest you use the dedicated Azure SQL connector: https://learn.microsoft.com/en-us/azure/azure-sql/database/spark-connector.
Then I recommend you use bulk mode, as row-by-row mode is slow and can incur unexpected charges if you have Log Analytics turned on.
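A rough sketch of a bulk append with that connector; the format name and option keys are quoted from memory of the connector docs, so double-check them, and spark_df, sql_user, and sql_password are assumed to exist in your session (server, database, and table names are placeholders):

# Bulk-append a DataFrame with the Spark connector for SQL Server / Azure SQL.
url = "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb"

(spark_df.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("append")
    .option("url", url)
    .option("dbtable", "dbo.staging_table")
    .option("user", sql_user)
    .option("password", sql_password)
    .option("tableLock", "true")      # take a table lock for the bulk insert
    .option("batchsize", "100000")
    .save())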
Lastly, for any kind of data transformation, you should use an ELT pattern:
Load raw data into an empty staging table
Run SQL code, or even better, a stored procedure that performs the required logic (for example, merging into a final table or running other DML), as sketched below
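For that last step, one option is to call the stored procedure over a plain database connection once the staging load has finished. A sketch with pyodbc; the connection details and the procedure name usp_merge_staging are hypothetical:

import pyodbc

# Connection details are placeholders; use your own server and credentials.
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;UID=sql_user;PWD=sql_password"
)

conn = pyodbc.connect(conn_str, autocommit=True)
# Merge the freshly loaded staging table into the final table,
# performing the updates/deletes that mode="append" cannot express.
conn.execute("EXEC dbo.usp_merge_staging")
conn.close()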
I have explored the cassandra-stress tool a bit using a YAML profile, and it is working fine. I just wanted to know: is there any way to specify the location of an external CSV file in the YAML profile, so that cassandra-stress inserts that data into the Cassandra table?
In other words, instead of random data, I want to see the cassandra-stress test results for a specific data load on this data model.
Standard cassandra-stress doesn't have such functionality, but you can use the NoSQLBench tool that was recently open sourced by DataStax. It also uses YAML to describe workloads, but it's much more flexible, and has a number of functions for sampling data from CSV files.
P.S. There is also a separate Slack workspace for this project (to get an invite, fill out this form).
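If all you need is to have that particular CSV in the table before measuring, another option (outside cassandra-stress and NoSQLBench) is a short pre-load script with the DataStax Python driver and then running the read workload against it. A sketch, where the keyspace, table, and column names are made up and the CSV is assumed to have a header row:

import csv

from cassandra.cluster import Cluster

# Keyspace, table, and column names are placeholders for your data model.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

insert = session.prepare(
    "INSERT INTO users (user_id, name, email) VALUES (?, ?, ?)"
)

with open("users.csv", newline="") as f:
    for row in csv.DictReader(f):
        session.execute(insert, (int(row["user_id"]), row["name"], row["email"]))

cluster.shutdown()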
I am trying to learn SQLyog Job Agent (SJA).
I am on a Linux machine and use SJA from within a bash script via the command line: ./sja myschema.xml
I need to sync a database of almost 100 tables with its local clone.
Since a single table stores some config data that I do not wish to sync, it seems I need to write a myschema.xml that lists all the remaining 99 tables.
The question is: is there a way to sync all the tables except a single one?
I hope my question is clear. I appreciate your help.
If you are using the latest version of SQLyog: at the end of the database synchronisation wizard you can pick the tables to sync, and you are given the option to generate an XML job file reflecting the operation you've opted to perform. This will in effect list the other 99 tables in the XML file itself for you, but it will give you what you are looking for, and I don't think you would be doing anything in particular with an individual table, since you are specifying all the tables in a single element.
Is it possible to run more than one Cassandra query from a single Cassandra file?
That way, if I share that file, others can run it to replicate the database on all their systems.
The easiest way is to pass the file containing CQL statements either to cqlsh (using the -f option) or to DevCenter.
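For example, a minimal way to drive the -f option from a script; the host, port, and file name are placeholders:

import subprocess

# Equivalent to typing: cqlsh 127.0.0.1 9042 -f statements.cql
# Every CQL statement in the file is executed against the given node.
subprocess.run(["cqlsh", "127.0.0.1", "9042", "-f", "statements.cql"], check=True)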
If you are using Java, the Achilles framework has a class called ScriptExecutor that you can use to run CQL statements from a file and even plug in parameters to dynamically change the statements during execution.
ScriptExecutor documentation
I'm a total newb when it comes to MongoDB, but I do have previous experience with NoSQL stores like HBase and Accumulo. When I used those other NoSQL platforms, I ended up writing my own data ingest frameworks (typically in Java) to perform ETL-like functions, plus inline enrichment.
I haven't found a tool that has similar functionality for Mongo, but maybe I'm missing it.
To date, I have a Logstash instance that collects logs from multiple sources and saves them to disk as JSON. I know there is a MongoDB output plugin for Logstash, but it doesn't have any options for configuring how the records should be indexed (i.e. aggregate documents, etc.).
For my needs, I would like to create multiple aggregated documents for each event that arrives via Logstash -- which requires some preprocessing and specific inserts into Mongo.
Bottom line -- before I go build ingest tooling (probably in python, or node) -- is there something that exists already?
Try node-datapumps, an ETL tool for Node.js. Just fill the input buffer with JSON objects, enrich the data in .process(), and use the Mongo mixin to write to MongoDB.
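If you do end up rolling your own in Python instead, the preprocessing described in the question can stay fairly small with pymongo; the file path, field names, and aggregation logic below are made up for illustration:

import json

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
db = client["logs"]

ops = []
# Read the JSON-per-line events that Logstash wrote to disk.
with open("/var/log/logstash/events.json") as f:
    for line in f:
        event = json.loads(line)

        # Inline enrichment: derive extra fields before insert.
        event["source_host"] = event.get("host", "unknown")

        # The raw event goes into one collection...
        db.events.insert_one(event)

        # ...and an aggregated per-host counter document is upserted as well.
        ops.append(UpdateOne(
            {"_id": event["source_host"]},
            {"$inc": {"event_count": 1}},
            upsert=True,
        ))

if ops:
    db.host_summaries.bulk_write(ops)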
Pentaho ETL has good support for MongoDB functionality.
You can have a look at http://community.pentaho.com/projects/data-integration/
http://wiki.pentaho.com/display/EAI/MongoDB+Output
I just found another ETL tool, Talend Open Studio, which has support for many file formats. I just uploaded multiple XML files to MongoDB using Talend. It is also backed by a Talend forum where many Q&As can be found.