I am new to Solr. My problem is linking multiple CSV files together via a single field in Solr.
I have indexed a CSV file of more than 5 GB containing more than 250 fields (one field being taxonomyid) per document, and I am querying it successfully. Now I have to add one more CSV file with the fields (taxonomyid, taxonomyvalue, description) and link it to the already indexed CSV file via the field taxonomyid. Kindly point me in the right direction for what I should research in Solr.
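For reference, Solr's join query parser is one common way to relate documents on a shared field; a hedged sketch, assuming the taxonomy CSV is indexed into its own core named taxonomy (with plants as a placeholder value):

q={!join from=taxonomyid to=taxonomyid fromIndex=taxonomy}taxonomyvalue:plants

This returns documents from the main index whose taxonomyid matches the taxonomy documents satisfying the inner query. The usual alternative is to denormalize, i.e. merge the taxonomy columns into the main documents at index time, since Solr is not a relational store.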
Related
I have some CSV files in my data lake which are updated quite frequently by another process. Ideally, I would like to be able to query these files through spark-sql without having to run an equally frequent batch process to load all the new files into a Spark table.
Looking at the documentation, I'm unsure, as all the examples show views that query existing tables or other views rather than loose files stored in a data lake.
You can do something like this if your CSV is in S3 under the location s3://bucket/folder:
spark.sql(
  """
  CREATE TABLE test2
  (a string, b string)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
  LOCATION 's3://bucket/folder'
  """
)
You will have to adapt the field names and the field separator, though.
To test it, you can first run:
Seq(("1","a"), ("2","b"), ("3","a"), ("4","b")).toDF("num", "char").repartition(1).write.mode("overwrite").csv("s3://bucket/folder")
I need to copy the names of the Excel files that sit as blobs in my Azure Storage and put these names into a SQL Server table using ADF. A file path as the name of a file is fine, but the hardest thing is that in the dataset that takes all the files from one specific folder I have to select a sheet name, and these sheet names are different for each file, so it returns an error. Is there a way to create a collective dataset without indicating the sheet name?
So, if I understand your question correctly, you are looking for a way to write all Excel file names to a SQL database using ADF.
You can use the generic Get Metadata activity with a Binary dataset as the source, and select Child items as the field to retrieve. This will list all files in the folder, and because a Binary dataset never opens the files, no sheet name is needed. Then add a Filter activity to keep only the Excel file types, for example as sketched below.
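A minimal sketch of the Filter activity settings (assuming the Get Metadata activity is named Get Metadata1; adjust the names to your pipeline):

Items:     @activity('Get Metadata1').output.childItems
Condition: @endswith(item().name, '.xlsx')

A ForEach over the filter's output can then write each name to the SQL table, e.g. via a Stored Procedure activity.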
Hope that this gets you on the right track.
When I try to add an indexer to index a CSV file from blob storage, it automatically picks some of the fields from the CSV file to add as field names in the index schema but leaves out some of the fields. That is probably because the left-out fields have spaces or characters that are not allowed in Azure Search field names. Is there any way I can index all of the fields defined in the CSV without changing the file itself?
Since some column names are invalid as index field names, they'll need to be explicitly renamed, and the indexer has to be informed of the associations.
1. Create the index with valid field names corresponding to each column you're interested in (https://learn.microsoft.com/en-us/rest/api/searchservice/create-index).
2. Create the data source pointing to the storage account containing the CSV files (https://learn.microsoft.com/en-us/rest/api/searchservice/create-data-source).
3. Create an indexer using the above data source and index, and also set the fieldMappings to do the renaming (https://learn.microsoft.com/en-us/rest/api/searchservice/create-indexer); a sample is sketched below.
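A hedged sketch of the fieldMappings part of the indexer definition (the CSV column names here are made up for illustration):

"fieldMappings": [
  { "sourceFieldName": "Taxonomy ID", "targetFieldName": "taxonomyId" },
  { "sourceFieldName": "Description (long)", "targetFieldName": "longDescription" }
]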
I have a question about working with Oracle, TypeORM, and NodeJS.
How can I insert the roughly 1 million records found in a CSV file?
The point is that I have to load the content of an xlsx file into a table in Oracle, and this file has around 1 million rows or more.
The way I was doing this task was converting from xlsx to JSON and then saving that array to the database, but it was taking too long.
So now I transform to CSV, but how can I insert all the records from the CSV file into the Oracle table?
I am using TypeORM for the connection between Oracle and NodeJS.
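For what it's worth, a common pattern here is to stream the CSV and insert in batches rather than saving one huge array in a single call; a rough sketch follows (the table name, columns, and the csv-parse package are assumptions, not from the question):

import { createReadStream } from "fs";
import { parse } from "csv-parse";
import { dataSource } from "./data-source"; // your configured TypeORM DataSource

async function loadCsv(path: string): Promise<void> {
  // parse the file as a stream so the whole CSV is never held in memory
  const parser = createReadStream(path).pipe(parse({ columns: true }));
  let batch: Record<string, string>[] = [];
  for await (const row of parser) {
    batch.push(row);
    if (batch.length >= 1000) {
      // one multi-row INSERT per batch is far faster than row-by-row saves
      await dataSource.createQueryBuilder().insert().into("users").values(batch).execute();
      batch = [];
    }
  }
  if (batch.length > 0) {
    await dataSource.createQueryBuilder().insert().into("users").values(batch).execute();
  }
}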
I want to download the results fetched from a SQL query in Excel (CSV) and PDF formats.
SQL Query
$sql = 'SELECT name, address, phone, city FROM users';
I want these records in CSV and PDF. When the user clicks Excel, the Excel file should be downloaded, and when they click PDF, the PDF file should be downloaded.
Please help me to do this. Thanks.
First, you need to convert the results of the SQL query into the format required for each. CSV is relatively simple, as you just need a comma-separated list of values and a row separator (\n will work just nicely).
You'll want to set the content-type of the file before returning the result:
header('Content-Type:text/csv');
// code to output sql in csv format here
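For instance, that placeholder could be filled in along these lines, assuming a PDO connection $pdo (not shown in the question); a Content-Disposition: attachment header is added so the browser downloads the file:

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="users.csv"');
$out = fopen('php://output', 'w');
fputcsv($out, ['name', 'address', 'phone', 'city']); // header row
foreach ($pdo->query($sql, PDO::FETCH_ASSOC) as $row) {
    fputcsv($out, $row); // one CSV line per result row
}
fclose($out);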
PDF will require some kind of library, such as TCPDF: https://tcpdf.org/examples/example_011/
The premise remains the same, though: make your query, then use the results to add rows to the table using the PDF library.
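A rough sketch along the same lines with TCPDF (again assuming the $pdo connection; see the linked example for proper table formatting):

require_once 'tcpdf/tcpdf.php';

$pdf = new TCPDF();
$pdf->AddPage();
$pdf->SetFont('helvetica', '', 10);
foreach ($pdo->query($sql, PDO::FETCH_NUM) as $row) {
    $pdf->Cell(0, 8, implode(' | ', $row), 0, 1); // one result row per PDF line
}
$pdf->Output('users.pdf', 'D'); // 'D' sends the file as a download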
If you are looking for a long-term solution, the best option would be to use SQL Server Reporting Services.
The user will then have the option to download the result sets in all the formats you mentioned.