Compressing a SequenceFile in Spark?

I'm trying to save an RDD as a compressed SequenceFile. I'm able to save a non-compressed file by calling:
counts.saveAsSequenceFile(output)
where counts is my RDD of (IntWritable, Text) pairs. However, I didn't manage to compress the output. I tried several configurations and always got an exception:
counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
<console>:21: error: type mismatch;
found : Class[org.apache.hadoop.io.compress.SnappyCodec](classOf[org.apache.hadoop.io.compress.SnappyCodec])
required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
<console>:21: error: type mismatch;
found : Class[org.apache.spark.io.SnappyCompressionCodec](classOf[org.apache.spark.io.SnappyCompressionCodec])
required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
It doesn't work for Gzip either:
counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
<console>:21: error: type mismatch;
found : Class[org.apache.hadoop.io.compress.GzipCodec](classOf[org.apache.hadoop.io.compress.GzipCodec])
required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
Could you please suggest a solution? Also, I couldn't find how to specify compression parameters (i.e. the compression type for Snappy).

The signature of saveAsSequenceFile is def saveAsSequenceFile(path: String, codec: Option[Class[_ <: CompressionCodec]] = None). You need to pass an Option[Class[_ <: CompressionCodec]] as the codec. E.g.,
counts.saveAsSequenceFile(output, Some(classOf[org.apache.hadoop.io.compress.SnappyCodec]))
If you read the type-mismatch error carefully, you can see this yourself: the compiler expected an Option[Class[...]], not a bare Class.
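As for the compression type (RECORD vs. BLOCK), saveAsSequenceFile itself takes no such parameter; the SequenceFile writer picks it up from the Hadoop configuration instead. A minimal sketch, assuming the old mapred-API configuration key that Spark's saveAsHadoopFile path reads (worth verifying against your Hadoop version):
// Assumption: saveAsSequenceFile builds its JobConf from sc.hadoopConfiguration,
// so settings placed there reach the SequenceFile writer.
sc.hadoopConfiguration.set("mapred.output.compression.type", "BLOCK") // NONE, RECORD, or BLOCK
counts.saveAsSequenceFile(output, Some(classOf[org.apache.hadoop.io.compress.SnappyCodec]))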

Related

Use *.pth model in C++

I want to run inference in C++ using a YOLOv3 model I trained with PyTorch. I am unable to convert it with the tracing and scripting facilities provided by PyTorch; I get this error during conversion:
First diverging operator:
Node diff:
- %2 : __torch__.torch.nn.modules.container.ModuleList = prim::GetAttr[name="module_list"](%self.1)
+ %2 : __torch__.torch.nn.modules.container.___torch_mangle_139.ModuleList = prim::GetAttr[name="module_list"](%self.1)
? ++++++++++++++++++++
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
Node:
%358 : Tensor = prim::Constant[value=<Tensor>](), scope: __module.module_list.16.yolo_16

Providing an integer parameter to the %fs magic command

I'm trying to run the command below, but there is a type mismatch.
%fs head dbfs:/databricks-datasets/README.md 6000
Error:
notebook:1: error: type mismatch;
found : String("6000")
required: Int
println(dbutils.fs.head("dbfs:/databricks-datasets/README.md", "6000")) // SAFE COMMAND FROM MACRO
Is there no way that I can provide integer parameters at the magic command level?
There is a way. This works:
%fs head dbfs:/databricks-datasets/README.md --maxBytes=6000
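Alternatively, you can call the underlying function directly from a Scala cell and pass the byte count as an Int rather than a String; this sketch assumes the standard dbutils.fs.head(file, maxBytes) signature:
// maxBytes is an Int here, avoiding the String("6000") type mismatch above
println(dbutils.fs.head("dbfs:/databricks-datasets/README.md", 6000))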

Folium library error in choropleth

I'm using the folium library with an open data set from Kaggle:
map.choropleth(geo_path=country_geo, data=plot_data,
               columns=['CountryCode', 'Value'],
               key_on='feature.id',
               fill_color='YlGnBu', fill_opacity=0.7, line_opacity=0.2,
               legend_name=hist_indicator)
The above part of the code is giving me the following error:
TypeError: choropleth() got an unexpected keyword argument 'geo_path'
When I replace geo_path with geo_data I get this error:
JSONDecodeError: Expecting value: line 7 column 1 (char 6)
Is the issue related to "UCSanDiegoX: DSE200x Python for Data Science"? I took Cody's advice and renamed geo_path to geo_data in the map.choropleth call.
At the GitHub repository, take care to use the RAW data, which is in fact a file structured in the GeoJSON format. Its first lines should start like the code provided below:
{"type":"FeatureCollection","features":[
{"type":"Feature","properties":{"name":"Afghanistan"},"geometry":
{"type":"Polygon","coordinates":[[[61.210817,35.650072],.....
geo_path doesn't work because it is not a parameter of choropleth. You are correct in replacing it with geo_data.
Your second error is likely due to a non-existent or incorrectly formatted GeoJSON file.
According to http://python-visualization.github.io/folium/docs-master/modules.html?highlight=chor# your argument for geo_data needs to be a "URL, file path, or data (json, dict, geopandas, etc) to your GeoJSON geometries".
GeoJSON formatted files follow this structure from geojson.org:
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}

How to uri-encode unicode strings in Racket

The following code gives me an error:
(uri-encode "Kidô senkan Nadeshiko")
which is,
vector-ref: index is out of range
index: 244
valid range: [0, 127]
vector: '#("%00" "%01" "%02" "%03" "%04" "%05" "%06" "%07" "%08" "%09" "%0A" "%0B" "%0C" "%0D" "%0E" "%0F" "%10" "%11" "%12" "%13" "%...
context...:
/usr/lib/racket/collects/net/uri-codec.rkt:197:6: for-loop
/usr/lib/racket/collects/net/uri-codec.rkt:195:0: encode
/usr/lib/racket/collects/racket/private/misc.rkt:87:7
I guess uri-encode and uri-decode only support ASCII, which I infer from the source of some tests, here.
So my question is: is there a library on GitHub or elsewhere that will uri-encode Unicode strings properly? Or do I have to roll my own?
It might have something to do with the way you're running the program, or the version of Racket in use. I tested this in Racket 5.2.1 and it works for me:
#lang racket
(require net/uri-codec)
(uri-encode "Kidô senkan Nadeshiko")
=> "Kid%C3%B4%20senkan%20Nadeshiko"

Add data into Prolog with text

:- dynamic(person/5).

setup :-
    seeing(S),
    see('people.txt'),
    read_data,
    write('data read'),
    nl,
    seen,
    see(S).

read_data :-
    read(A),
    process(A).

process(A) :- A == end_of_file.
process(A) :-
    A \== end_of_file,
    write('1'),
    read(B),
    read(C),
    read(D),
    read(E),
    assertz(person(A,B,C,D,E)),
    read_data.
and the text file contains:
john.will.30.london.doctor.
martha.will.33.portsea.doctor.
henry.smith.26.manchester.doctor.
The result comes out as:
?- setup.
* Syntax Error
* Syntax Error
* Syntax Error
* Syntax Error
* Syntax Error
data read
yes
What is happening? What did I do wrong?
You are reading with read/1, which expects valid Prolog text as input. However, your data is
john.will.30.london.doctor.
which is invalid. Write something like
person(john,will,30,london,doctor).
instead. Most often, people do not read in such data manually. Instead, they load the file with ['datafile.pl'] or other commands.
