Cassandra COPY FROM file pattern gives error

My Cassandra version is 2.0.17.
I am following this https://www.datastax.com/dev/blog/new-features-in-cqlsh-copy post to copy all of the CSV files in a folder into a Cassandra table, but it fails with an error saying No such file or directory.
Copying an individual file with the command below works fine:
COPY table FROM '/home/folder1/a.csv' WITH DELIMITER=',' AND HEADER=FALSE;
There are multiple CSV files in /home/folder1, so I tried to copy all of them in a single go with:
COPY table FROM '/home/folder1/*.csv' WITH DELIMITER=',' AND HEADER=FALSE;
When I run the above command it gives the following error:
Can't open '/home/folder1/*.csv' for reading: [Errno 2] No such file or directory: '/home/folder1/*.csv'
How can I fix this?

The blog post says
We will review these new features in this post; they will be available in the following cassandra releases: 2.1.13, 2.2.5, 3.0.3 and 3.2.
So 2.0.17 doesn't have this functionality. If you want to load all .csv files from a directory, you can use:
for i in /home/folder1/*.csv ; do
  echo "COPY table FROM '$i' WITH DELIMITER=',' AND HEADER=FALSE;" | cqlsh
done
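Note that this starts a new cqlsh session for every file. A variant (just a sketch; load_all.cql is a placeholder name, and it assumes cqlsh can reach the cluster with its default connection settings) is to generate one script and execute it in a single session:
# Build one CQL script with a COPY statement per file, then run it once.
for i in /home/folder1/*.csv ; do
  echo "COPY table FROM '$i' WITH DELIMITER=',' AND HEADER=FALSE;"
done > load_all.cql
cqlsh -f load_all.cql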

Related

Writing file denied

I am getting an error writing a file, and it is driving me crazy.
I have a C# .NET 5 application running on Red Hat Linux.
I mounted a shared (Windows) folder using: sudo mount -t cifs -o username=MyDomainUsername,password=MyDomainUsernamePassword,domain=MyDomain,dir_mode=0777,file_mode=0777 //ipv4_from_destination/Reports /fileshare/Reports
Then I run the app using just ./WebApi --urls=http://+:8060
The read/write test executes the following steps:
Create a text file.
Write to the text file.
Delete the text file.
Create a directory.
Create a text file inside that directory.
Write to that text file.
Delete the text file.
Delete the directory.
Now the problem:
The text file is created
The write operation fails.
Here is part of the log:
Creating file: /fileshare/Reports/test.616db7d1-07fb-4599-a0cf-749e6a8b34ec.tmp...Ok
Writing file: /fileshare/Reports/test.616db7d1-07fb-4599-a0cf-749e6a8b34ec.tmp...[16:22:20 ERR] ID:87988856-a765-4474-9ed9-2f04aef35771 PATH:/api/about ERROR:System.UnauthorizedAccessException:Access to the path '/fileshare/Reports/test.616db7d1-07fb-4599-a0cf-749e6a8b34ec.tmp' is denied. TRACE: at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
at System.IO.FileStream.FlushWriteBuffer()
at System.IO.FileStream.FlushInternalBuffer()
at System.IO.FileStream.Flush(Boolean flushToDisk)
at System.IO.FileStream.Flush()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.IO.StreamWriter.Flush()
at WebApi.Controllers.ApplicationController.TestFileSystem(String folder) in xxxxxxx\WebApi\Controllers\ApplicationController.cs:line 116
What I discovered so far:
I can create and delete the files and directories.
I cannot write to files.
Can someone give me a hint on this?
Solved using the cifs mount option nobrl.
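In other words, add nobrl to the existing mount options (a sketch based on the mount command from the question):
sudo mount -t cifs -o username=MyDomainUsername,password=MyDomainUsernamePassword,domain=MyDomain,dir_mode=0777,file_mode=0777,nobrl //ipv4_from_destination/Reports /fileshare/Reports
The nobrl option stops the client from sending byte-range lock requests to the server, which is what the failing writes were tripping over here.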

CentOS | Apache Spark error: file already exists (SparkContext)

I am unable to write to the file which I create. On Windows it works fine; on CentOS it says the file already exists and does not write anything.
File tempFile= new File("temp/tempfile.parquet");
tempFile.createNewFile();
parquetDataSet.write().parquet(tempFile.getAbsolutePath());
The following error is logged: file already exists
2020-02-29 07:01:18.007 ERROR 1 --- [nio-8090-exec-1] c.gehc.odp.util.JsonToParquetConverter : Stack Trace: {}org.apache.spark.sql.AnalysisException: path file:/temp/myfile.parquet already exists.;
2020-02-29 07:01:18.007 ERROR 1 --- [nio-8090-exec-1] c.gehc.odp.util.JsonToParquetConverter : sparkcontext close
The default save mode in Spark is ErrorIfExists. This means that if output with the same name you intend to write already exists, Spark throws an exception like the one above. This is happening in your case because you create the file yourself rather than leaving that task to Spark. There are two ways to resolve the situation:
1) Specify the save mode as "overwrite" or "append" in the write command:
parquetDataSet.write().mode("overwrite").parquet(tempFile.getAbsolutePath());
2) Or remove the createNewFile() call and pass the destination path directly to the Spark write command:
parquetDataSet.write().parquet("temp/tempfile.parquet");
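As a slightly fuller sketch (assuming parquetDataSet is a Dataset<Row> and the Java API is used, as in the question), option 2 combined with an explicit save mode looks like this:
import org.apache.spark.sql.SaveMode;

// Let Spark create the output path itself; do not call createNewFile() first.
// SaveMode.Overwrite replaces whatever a previous run left behind.
parquetDataSet.write()
        .mode(SaveMode.Overwrite)
        .parquet("temp/tempfile.parquet");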

Delete files in the drop location

I am trying to delete the folder named "repro" and its contents in my build drop location. I have configured my Delete Files step as below:
Source Folder: $(BuildDropLocation)\$(BuildNumber)\CTrest\lime
Contents:
**/repro/*
The repro folder resides here:
$(BuildDropLocation)\$(BuildNumber)\CTrest\lime\version\package\code\repro\..
Is there something that I am missing here?
Here is the documentation for the task: Delete Files task. Examples of Contents patterns:
**/temp/* deletes all files in any sub-folder named temp.
**/temp* deletes any file or folder with a name that begins with temp.
I think **/repro* will be more suitable in your case.
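If the pipeline is defined in YAML, the step would look roughly like this (a sketch; the SourceFolder value is copied from the question):
- task: DeleteFiles@1
  inputs:
    SourceFolder: '$(BuildDropLocation)\$(BuildNumber)\CTrest\lime'
    Contents: '**/repro*'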

CSV file unable to upload

I am trying to load a CSV file while doing simple linear regression. When I run the code, the error says the file does not exist as a file/directory. Do I need to save the file in a particular folder or directory?
Try to use the fully qualified path, or put the file in the same directory as the main program.
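For example, assuming the file is read with pandas (the question does not say which library is used; the file name below is a placeholder), an absolute path removes any dependence on the current working directory:
import os
import pandas as pd

# Absolute path: works regardless of where the script is started from.
df = pd.read_csv("/home/user/data/regression_data.csv")

# Or resolve the path relative to the script's own location.
here = os.path.dirname(os.path.abspath(__file__))
df = pd.read_csv(os.path.join(here, "regression_data.csv"))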

MLlib not saving the model data in Spark 2.1

We have a machine learning model that looks roughly like this:
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.feature import StringIndexer

sc = SparkContext(appName="MLModel")
sqlCtx = SQLContext(sc)
df = sqlCtx.createDataFrame(data_res_promo)
# where data_res_promo comes from a pandas dataframe
indexer = StringIndexer(inputCol="Fecha_Code", outputCol="Fecha_Index")
train_indexer = indexer.fit(df)
train_indexer.save('ALSIndexer') # this saves the indexer architecture
On my machine, when I run it locally, it generates a folder ALSIndexer/ that has the parquet files and all the information on the model.
When I run it on our Azure Spark cluster, it does not generate the folder on the master node (nor on the workers). However, if we try to rewrite it, it says:
cannot overwrite folder
Which means it is somewhere, but we can't find it.
Would you have any pointers?
Spark will by default save files to the distributed filesystem (probably HDFS). The files will therefore not be visible on the local filesystem of the nodes but, as they are present, you get the "cannot overwrite folder" error message.
You can easily copy the files from HDFS to the master node. This can be done on the command line with one of these commands:
1. hadoop fs -get <HDFS file path> <Local system directory path>
2. hadoop fs -copyToLocal <HDFS file path> <Local system directory path>
It can also be done programmatically by importing org.apache.hadoop.fs.FileSystem and using the methods available there.
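For example, to pull the saved indexer onto the master node (a sketch; this assumes ALSIndexer was written under your HDFS home directory):
hadoop fs -ls                          # confirm the ALSIndexer folder exists in HDFS
hadoop fs -get ALSIndexer ./ALSIndexer # copy it to the current local directory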
