meteorjs (CollectionFS) read the file's source - node.js

I am using CollectionFS for my Meteor application and am trying to upload a file and then read its contents to do other things.
The whole list of steps I want to perform is the following:
Upload the file
Read out the source
Depending on the content, delete the file or keep it
Steps 1 and 3 are no problem, but I don't understand how to do the second step. There are some callbacks where I can "access" the file (for example by building a readStream), but I do not understand how to use them to read out the whole contents.
In my case I upload a CSV file and, after uploading it, I want to analyse its content. How can I solve step 2 using CollectionFS?

Related

How to get LiveResponse library file history?

I uploaded files to the LiveResponse library using the https://github.com/MicrosoftDocs/microsoft-365-docs/blob/public/microsoft-365/security/defender-endpoint/live-response-library-methods.md API.
I overwrote a specific file by mistake (and I have no local backup of it). How can I recover the content of the file from the LiveResponse library as it was before the overwrite?
When I list the library files using GET /api/libraryfiles I get:
{"#odata.context":"https://api.securitycenter.microsoft.com/api/$metadata#LibraryFiles","value":[{"fileName":"script.ps1","sha256":"8b0....f1c63abad95d6bda","description":"1","creationTime":"2022-08-08T15:45:05.170374Z","lastUpdatedTime":"2022-08-08T15:45:05.170374Z","createdBy":"MyUser","hasParameters":false,"parametersDescription":null}
...
And I don't see any indication of the file history, etc.

Snowpipe doesn't load the files after the error has been rectified

I am using Snowpipe to load files from an S3 bucket. It worked well for 2 files.
But then, to check how Snowpipe behaves when an error occurs during file loading, I intentionally changed the file format (changed the delimiter to '|' although the file is CSV) so that the COPY command would not work, and uploaded a 3rd CSV file to S3. It was not loaded due to the file format error, which was expected up to this point.
Later I recreated the file format with the correct delimiter, i.e. ',', but since the notification for the 3rd file had already been sent, it was not loaded into the table. I then uploaded a 4th CSV file and it was loaded successfully. So my question is: how do I take care of loading the 3rd file, whose event notification was generated while the file format was wrong?
Let me know if any more details are required.
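A hedged sketch of one common way to pick up such a skipped file: Snowflake's ALTER PIPE ... REFRESH re-queues recently staged files that have not been loaded yet. The pipe name my_pipe and the connection parameters below are placeholders, shown here through the Python connector:
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder connection details for illustration only.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
    schema="public",
)
try:
    cur = conn.cursor()
    # Re-scan the pipe's stage and enqueue files that were staged recently
    # but never loaded (e.g. the 3rd file rejected by the old file format).
    cur.execute("ALTER PIPE my_pipe REFRESH")
    # Optional: inspect the pipe's current state.
    cur.execute("SELECT SYSTEM$PIPE_STATUS('my_pipe')")
    print(cur.fetchone()[0])
finally:
    conn.close()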

Reading GeoJSON in databricks, no mount point set

We have recently made changes to how we connect to ADLS from Databricks which have removed mount points that were previously established within the environment. We are using databricks to find points in polygons, as laid out in the databricks blog here: https://databricks.com/blog/2019/12/05/processing-geospatial-data-at-scale-with-databricks.html
Previously, a chunk of code read a GeoJSON file from ADLS into the notebook and then broadcast it to the cluster(s):
nights = gpd.read_file("/dbfs/mnt/X/X/GeoSpatial/Hex_Nights_400Buffer.geojson")
a_nights = sc.broadcast(nights)
However, the new changes that have been made have removed the mount point and we are now reading files in using the string:
"wasbs://Z#Y.blob.core.windows.net/X/Personnel/*.csv"
This works fine for CSV and Parquet files, but will not load a GeoJSON! When we try this, we get an error saying "File not found". We have checked and the file is still within ADLS.
We then tried to copy the file temporarily to "dbfs" which was the only way we had managed to read files previously, as follows:
dbutils.fs.cp("wasbs://Z#Y.blob.core.windows.net/X/GeoSpatial/Nights_new.geojson", "/dbfs/tmp/temp_nights")
nights = gpd.read_file(filename="/dbfs/tmp/temp_nights")
dbutils.fs.rm("/dbfs/tmp/temp_nights")
a_nights = sc.broadcast(nights)
This works fine on the first use within the code, but then a second GeoJSON run immediately after (which we tried to write to temp_days) fails at the gpd.read_file stage, saying file not found! We have checked with dbutils.fs.ls() and can see the file in the temp location.
So some questions for you kind folks:
Why were we previously having to use "/dbfs/" when reading in GeoJSON but not csv files, pre-changes to our environment?
What is the correct way to read in GeoJSON files into databricks without a mount point set?
Why does our process fail upon trying to read the second created temp GeoJSON file?
Thanks in advance for any assistance - very new to Databricks...!
Pandas (and GeoPandas) use the local file API for accessing files, and you accessed files on DBFS via /dbfs, which exposes DBFS through that local file API. In your specific case, the problem is that when you used dbutils.fs.cp you didn't specify that you wanted to copy the file locally, so by default it was copied onto DBFS under the path /dbfs/tmp/temp_nights (actually dbfs:/dbfs/tmp/temp_nights), and as a result the local file API doesn't see it - you would need to use /dbfs/dbfs/tmp/temp_nights instead, or copy the file into /tmp/temp_nights.
But the better way is to copy the file locally - you just need to specify that the destination is local, which is done with the file:// prefix, like this:
dbutils.fs.cp("wasbs://Z#Y.blob.core.windows.net/...Nights_new.geojson",
"file:///tmp/temp_nights")
and then read the file from /tmp/temp_nights:
nights = gpd.read_file(filename="/tmp/temp_nights")
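Putting that together with the rest of the original workflow, the sequence would look roughly like this (a sketch only; the paths and variable names are taken from the question):
# Copy the GeoJSON from blob storage to the driver's local disk (file:// = local path).
dbutils.fs.cp("wasbs://Z#Y.blob.core.windows.net/X/GeoSpatial/Nights_new.geojson",
              "file:///tmp/temp_nights")
# Read it with GeoPandas through the local file API.
nights = gpd.read_file("/tmp/temp_nights")
# Broadcast to the executors as before, then remove the local copy.
a_nights = sc.broadcast(nights)
dbutils.fs.rm("file:///tmp/temp_nights")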

How to append files in GCS with the same schema?

Is there any way one can append two files in GCS? Suppose file one is a full load and the second file is an incremental load; what is the way to append the two?
Secondly, using gsutil compose will append the two files including the attribute names (headers) as well, whereas in the final file I just want the data of the two files.
You can append two separate files using compose in the Google Cloud Shell and name the output file the same as the first file, like this:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/obj1
This command is meant for parallel uploads, in which you divide a large file into smaller objects, upload them to Google Cloud Storage, and then compose them to get the original file back. You can find more information on Composite Objects and Parallel Uploads.
I've come up with two possible solutions:
Google Cloud Function solution
The option I would go for is using a Cloud Function, doing something like the following (a sketch of such a function follows this list):
Create an empty bucket like append_bucket.
Upload the first file.
Create a Cloud Function to be triggered by newly uploaded files on the bucket.
Upload the second file.
Read the first and the second file (you will have to download them as strings first).
Perform the append operation.
Upload the result to the bucket.
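A minimal sketch of such a function using the google-cloud-storage Python client, assuming both files are plain CSV with the same header; the object names full_load.csv and result.csv and the function name append_csv are placeholders:
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()

def append_csv(event, context):
    # Background Cloud Function triggered when an object is finalized in the bucket.
    name = event["name"]
    # Skip the files this function manages itself so it does not re-trigger on its own output.
    if name in ("full_load.csv", "result.csv"):
        return

    bucket = client.bucket(event["bucket"])
    incremental = bucket.blob(name)        # the file that was just uploaded
    full = bucket.blob("full_load.csv")    # placeholder: the full-load file
    result = bucket.blob("result.csv")     # placeholder: the merged output

    full_text = full.download_as_text()
    incr_lines = incremental.download_as_text().splitlines()

    # Drop the incremental file's header row so the attribute names appear only once.
    merged = full_text.rstrip("\n") + "\n" + "\n".join(incr_lines[1:]) + "\n"
    result.upload_from_string(merged, content_type="text/csv")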
Google Dataflow solution
You can also do it with Dataflow for BigQuery (keep in mind it's still in beta); a sketch of the final export step follows the list.
Create a BigQuery dataset and table.
Create a Dataflow instance, from the template Cloud Storage Text to BigQuery.
Create a JavaScript file with the logic to transform the text.
Upload your files in JSON format to the bucket.
Dataflow will read the JSON file, execute the JavaScript code and append the new data to the BigQuery dataset.
Finally, export the BigQuery query result to Cloud Storage.
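For that last export step, a rough sketch with the BigQuery Python client; the project, dataset, table, and bucket names are placeholders:
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# Placeholder identifiers for illustration.
table_id = "my_project.my_dataset.merged_table"
destination_uri = "gs://append_bucket/export/merged-*.csv"

# Extract the table contents to CSV files in Cloud Storage and wait for completion.
extract_job = client.extract_table(table_id, destination_uri)
extract_job.result()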

Line 2 ERROR The file NuxeoCSV-USERDOC.pdf does not exist

When I want to add an attachment to a File document via CSV using the Nuxeo CSV import addon, I get this issue:
Line 2 ERROR The file NuxeoCSV-USERDOC.pdf does not exist
This is the CSV file:
name,"type","dc:title","dc:description","file:content","dc:nature","dc:source"
nuxeo-csv-userdoc,"File","Nuxeo CSV User documentation","This is the user guide for Nuxeo CSV","NuxeoCSV-USERDOC.pdf","procedure","http://doc.nuxeo.com"
Nuxeo-csv-sample-3,"File","Nuxeo CSV Sample","This a second file imported with Nuxeo CSV","Nuxeo-csv-sample-3.odt","article","http://doc.nuxeo.com"
The documentation asks you to make some changes in the conf file, but I don't understand the last line. How am I supposed to add the path, and how can I add nuxeo.csv.blobs.folder - just by pasting it in?
Configuration:
The Nuxeo CSV addon enables users to create file documents and upload their main attachment at the same time. This requires configuring where the server will take the attachments from. This is done by adding the parameter nuxeo.csv.blobs.folder in the server's nuxeo.conf and giving it a value that is a local path to a folder that can be accessed by the server.
Thanks in advance.
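For reference, that parameter is a plain key=value line in nuxeo.conf. A hedged example, where the folder path is only an illustration and must be a directory on the server that contains NuxeoCSV-USERDOC.pdf and the other attachments:
nuxeo.csv.blobs.folder=/opt/nuxeo/csv-attachments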
