Inquiry about opening a FileChannel - io

Why does opening a FileChannel in the following way:
FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.APPEND);
throw an exception?
I know that this is specified by the API. However, I would like to know why the combination of READ and WRITE is allowed while READ and APPEND is not.
Thanks in advance.

Because it doesn't mean anything. You can't append to a file while only reading from it.

Related

Reading whole directory with Spark except one file

I have the following directory containing these CSV files:
/data/one.csv
/data/two.csv
/data/three.csv
/data/four.csv
If I want to read everything, I can simply do:
/data/*.csv
but I cannot seem to read everything except four.csv.
This:
/data/*[^four]*.csv
seemed to work, but I suspect that with a bigger list of files this pattern would probably go wrong (because of the double wildcards).
Is there a good way to do this? I am also aware that:
/data/{one,two,three,^four}.csv
would solve this specific case, but I need a general way to exclude files for future needs.
Thank you very much!
I am not 100% sure that this method will work, but you can try it. You can use a Bash or Python script (or any other script) to list all the CSV files in the folder, skipping four.csv.
The input for Spark will then be (assuming you have the files one.csv, two.csv, three.csv, four.csv, five.csv, ... up to n.csv):
PathToFiles = ["/data/one.csv", "/data/two.csv", "/data/three.csv", "/data/five.csv", ..., "/data/n.csv"]
Then you can use (the code is in Python):
filesRDD = spark.sparkContext.wholeTextFiles(",".join(PathToFiles))
I have written similar code in Java and, as far as I can tell, it works, so you can try it.
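The listing step the answer describes can be sketched in Python like this. It is a minimal sketch that assumes the CSV files sit on a filesystem Python can list directly (for HDFS or S3 you would need the matching client instead), and the helper name `csv_paths_except` is hypothetical.

```python
from pathlib import Path


def csv_paths_except(directory, excluded_names):
    """Return the sorted paths of the *.csv files in `directory`,
    skipping any file whose name is in `excluded_names`.

    Assumption: `directory` is on a local (or locally mounted) filesystem.
    """
    excluded = set(excluded_names)
    return sorted(
        str(p)
        for p in Path(directory).glob("*.csv")
        if p.name not in excluded
    )


# Hypothetical usage with an existing SparkSession named `spark`:
# paths = csv_paths_except("/data", {"four.csv"})
# filesRDD = spark.sparkContext.wholeTextFiles(",".join(paths))
```

Because the exclusion is an exact file-name comparison, it avoids the pitfall of `[^four]`, which is a character class matching one character other than f, o, u or r, not "anything except the word four". If you want a DataFrame rather than an RDD, PySpark's `spark.read.csv` also accepts a list of paths directly.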

python 3.x readlines()

After a break I have started working with Python again, but right at the start I ran into an annoying (and, at least for me, unsolvable) problem. I want to open a normal .txt file with tabular content so I can iterate over specific 'columns' to gather all the information I need. The problem is that I don't get each line of the document as a list; instead Python gives me each line as a string. I also tried .readlines(), but that doesn't work either.
I work on a Win7 PC and the code is as follows:
with open('C:\\filepath...\\file.txt') as file:
    for f in file:
        print(f[0])
I should add that I have worked with Python in the past and never encountered such a problem, so if anyone knows a solution I would really appreciate some help. Thank you in advance.
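You just need to split each line on the tab character:
TheList = []
with open('C:\\filepath...\\file.txt') as file:
    for f in file:
        # strip the trailing newline first, otherwise it sticks to the last column
        TheList.append(f.rstrip('\n').split('\t'))
Each element of TheList is then a list of the fields on that line, so TheList[i][j] is column j of line i.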
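An alternative to splitting by hand is the standard library's `csv` module with a tab delimiter, which yields each line as a list and drops the line ending for you. This is a sketch assuming a plain tab-separated file; the function name `read_column` is hypothetical, and note that `csv.reader` also interprets double quotes as quoting characters, which plain `split('\t')` does not.

```python
import csv


def read_column(path, column_index):
    """Return the values of one column from a tab-separated text file.

    csv.reader turns each line into a list of fields; lines that are
    too short to contain the requested column (e.g. blank lines) are skipped.
    """
    values = []
    # newline="" is what the csv module documentation recommends for files it reads
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) > column_index:
                values.append(row[column_index])
    return values
```

For example, `read_column(r'C:\filepath...\file.txt', 2)` would collect the third column of every line.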

How to force file processing on one node without splitting?

Is it possible to force file processing on one node without splitting? I tried setting AtomicFileProcessing to true, but it doesn't work.
Setting
[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
should work. Can you please contact me directly (mrys at msft) and provide more information on what does not work, so that we can investigate?
In one of my jobs I use custom user-defined operators for data processing, and they do not split the data. You can also try this option.

Writing PDF binary file from stream yields malformed PDF

Dear Stack Overflow users,
I would appreciate your kind help with the following problem:
We have an Apache server functioning as a forward proxy, with ext_filter configured: whenever the response has a PDF MIME type, the filter (a Perl script) is called, and the PDF's content can be read from STDIN. We read the PDF from STDIN, write it to a file, and that's all. This almost always works well, but on one specific website the PDF is malformed when written in the following way:
my $input_file = shift;
binmode STDIN;
open(OUT, ">" . $input_file);
binmode OUT;
foreach my $line (<STDIN>) {
    print OUT $line;
}
close OUT;
If we instead call 'tee' (i.e. set the filter to use 'tee'), the file is written correctly. Analyzing the malformed PDF shows that the xref table in the PDF we write is broken, and Adobe Reader fails to open it. We have already tried sysopen, sysread, etc., ":raw", and several other ways of writing a binary file properly (code copied and pasted from the documentation for writing binary files), and nothing worked. Only when using the Linux 'tee' utility as the filter was the file written correctly. This doesn't help us: we need to be able to write the file from STDIN as part of the Perl script. Any suggestions? If there were a way to call 'tee' with a system call and give it the Perl program's STDIN, that might work. Many thanks in advance.
Well, although the code was basically correct, putting it inside "eval" somehow ruined the PDF.
I still don't understand why, but removing the eval solved the problem.
The Perl script is called from the context of Apache's ext_filter module.
I'll investigate this further and update this answer when I find an explanation.
Thanks, everyone.
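For comparison, the tee-style byte-for-byte copy the question is after can be sketched in Python (this is not the poster's Perl script). Reading fixed-size chunks means no line splitting or newline translation ever touches the data. The function name `save_stream` and the 64 KiB default chunk size are assumptions for illustration.

```python
import shutil


def save_stream(src, dst_path, chunk_size=64 * 1024):
    """Copy the binary stream `src` into the file `dst_path` unchanged.

    Behaves like `tee dst_path` minus the echo to stdout: data is moved in
    fixed-size chunks, so "lines" and line endings are never interpreted.
    """
    with open(dst_path, "wb") as out:  # "wb": binary mode, no newline translation
        shutil.copyfileobj(src, out, chunk_size)


# Hypothetical use inside a filter script, reading the raw (binary) stdin:
#   import sys
#   save_stream(sys.stdin.buffer, sys.argv[1])
```

The key design choice is `sys.stdin.buffer` plus `"wb"`: both sides stay in bytes, which is the Python counterpart of `binmode` on both handles.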

What is the standard way to handle users opening incorrect file types?

I hope my question is clear. I am curious about the typical way to handle someone clicking File|Open and selecting a file that is inappropriate for the program, like someone using a word processor and trying to open a binary file.
In my case, my files have multiple streams streamed together. I'm unsure how to have the code validate whether an improper file was selected before the app throws a stream read exception. (Or is the way to handle the situation to just write code to catch a stream read exception?)
Thanks, as always.
I think it's quite common to have code that just tries to open the file and, if that fails, shows an error to the user. Most file formats have some kind of header with a "magic number", so the reader can tell very quickly, after reading the first few bytes, whether it's the wrong kind of file.
A magic number at the start of the file generally helps, if you have control over the file format.
Otherwise, yeah, catch the exception and put up a dialog.
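The two suggestions above (check a magic number up front, and otherwise catch the exception and report it) can be combined as in this sketch. The signature `MYAP\x01`, the `WrongFileTypeError` class and both function names are hypothetical; returning the rest of the file stands in for the real multi-stream parsing.

```python
MAGIC = b"MYAP\x01"  # hypothetical signature written at the start of every saved file


class WrongFileTypeError(Exception):
    """Raised when a file does not start with the expected magic number."""


def open_document(path):
    with open(path, "rb") as fh:
        # Check the first few bytes before any stream parsing starts.
        if fh.read(len(MAGIC)) != MAGIC:
            raise WrongFileTypeError(f"{path} is not a document this program can open")
        return fh.read()  # stand-in for the real stream reading


def try_open(path, show_error):
    """Open `path`, reporting failures through `show_error` instead of crashing."""
    try:
        return open_document(path)
    except (WrongFileTypeError, OSError) as exc:
        show_error(str(exc))  # in a GUI this would display a message box
        return None
```

Catching `OSError` as well covers missing or unreadable files, so the File|Open handler has a single place where every failure turns into a dialog.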
