Spark EC2 - Broken Pipe error - apache-spark

I get the following error while spinning up a Spark 1.6 cluster on EC2 with the spark-ec2 script. I tried the --resume option, but the error persists.
packet_write_wait: Connection to <<host>>: Broken pipe
Traceback (most recent call last):
File "./spark_ec2.py", line 1535, in <module>
main()
File "./spark_ec2.py", line 1527, in main
real_main()
File "./spark_ec2.py", line 1363, in real_main
setup_cluster(conn, master_nodes, slave_nodes, opts, True)
File "./spark_ec2.py", line 811, in setup_cluster
dot_ssh_tar = ssh_read(master, opts, ['tar', 'c', '.ssh'])
File "./spark_ec2.py", line 1209, in ssh_read
ssh_command(opts) + ['%s@%s' % (opts.user, host), stringify_command(command)])
File "./spark_ec2.py", line 1203, in _check_output
raise subprocess.CalledProcessError(retcode, cmd, output=output)
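packet_write_wait: ... Broken pipe is the OpenSSH client reporting that its connection dropped; the failing ssh subprocess then surfaces in spark_ec2.py as the CalledProcessError at the bottom of the traceback. If the drop is only intermittent, retrying the command may get past it. The sketch below is a hypothetical retry wrapper around subprocess.check_output, assuming a transient failure; it is not part of spark_ec2.py, and the attempt count and delay are arbitrary choices.

```python
import subprocess
import sys
import time


def check_output_with_retry(cmd, attempts=3, delay=5.0):
    """Run cmd and return its stdout (bytes), retrying on a non-zero exit.

    Hypothetical helper, not part of spark_ec2.py. If every attempt fails,
    the last subprocess.CalledProcessError is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return subprocess.check_output(cmd)
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise
            # Give a flaky SSH connection a moment before trying again.
            time.sleep(delay)


if __name__ == "__main__":
    # Stand-in for the ssh command; a real call would pass the ssh argv list.
    print(check_output_with_retry([sys.executable, "-c", "print('ok')"]))
```

Retrying only helps with transient drops; a connection that fails every time points to a network or SSH configuration problem that a retry will not fix.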

Related

How to read .hql file (to run hive query) in pyspark

I have an .hql file with a large number of queries, and it runs slowly in Hive. I want to read and run the .hql file using PySpark/Spark SQL.
I tried count = sqlContext.sql(open("file.hql").read()).count(), but it gives the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/CDH-5.7.1-1.cdh5.7.1/lib/spark/python/pyspark/sql/context.py", line 580, in sql
return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
File "/data/CDH-5.7.1-1.cdh5.7.1/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/data/CDH-5.7.1-1.cdh5.7.1/lib/spark/python/pyspark/sql/utils.py", line 51, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"missing EOF at ';' near 'db'; line 1 pos 36"
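The "missing EOF at ';'" message suggests that sqlContext.sql parsed one statement and then stopped at the first ';', because the whole file was passed as a single string. One way around this is to split the file into individual statements and run them one at a time. The sketch below assumes statements are separated by ';'; it keeps semicolons inside quoted string literals and drops '--' line comments, but it does not handle backslash escapes or block comments. The helper name split_hql_statements is hypothetical.

```python
def split_hql_statements(text):
    """Split HiveQL text into individual statements (hypothetical helper).

    Assumes ';' separates statements. Semicolons inside single- or
    double-quoted literals are kept; '--' line comments outside quotes are
    dropped. Backslash escapes and block comments are not handled.
    """
    statements = []
    current = []
    quote = None  # the active quote character, or None outside a literal
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None  # closing quote ends the literal
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif text.startswith("--", i):
            # Skip the comment up to (but not including) the newline.
            end = text.find("\n", i)
            if end == -1:
                break
            i = end
            continue
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

With the question's sqlContext, each statement could then be run in turn, for example: for stmt in split_hql_statements(open("file.hql").read()): sqlContext.sql(stmt).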

"RequirementParseError: Expected ',' or end-of-list in gitpython >=2.1.9<2.2 at <2.2"

I am trying to install "missinglink" using:
python -m pip install missinglink
However, I get the error:
Exception:
Traceback (most recent call last):
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2851, in _dep_map
return self.__dep_map
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2685, in __getattr__
raise AttributeError(attr)
AttributeError: _DistInfoDistribution__dep_map
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\basecommand.py", line 209, in main
status = self.run(options, args)
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\commands\install.py", line 310, in run
wb.build(autobuilding=True)
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\wheel.py", line 748, in build
self.requirement_set.prepare_files(self.finder)
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\req\req_set.py", line 360, in prepare_files
ignore_dependencies=self.ignore_dependencies))
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\req\req_set.py", line 647, in _prepare_file
set(req_to_install.extras) - set(dist.extras)
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2810, in extras
return [dep for dep in self._dep_map if dep]
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2853, in _dep_map
self.__dep_map = self._compute_dependencies()
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2877, in _compute_dependencies
parsed = next(parse_requirements(distvers))
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2980, in parse_requirements
"version spec")
File "C:\Users\kntsaluba001\AppData\Local\Continuum\miniconda3\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2956, in scan_list
raise RequirementParseError(msg, line, "at", line[p:])
pip._vendor.pkg_resources.RequirementParseError: Expected ',' or end-of-list in gitpython >=2.1.9<2.2 at <2.2
I tried uninstalling gitpython in an attempt to fix the issue, thinking it would simply be reinstalled since it's a dependency. However, I still get the same issue.
Consider upgrading pip (python -m pip install --upgrade pip). This should solve the issue.
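The error says the parser expected ',' or end-of-list at <2.2, meaning the two clauses of gitpython >=2.1.9<2.2 are not separated by a comma; PEP 440 separates the clauses of a version specifier with commas. The sketch below is a hypothetical illustration that uses a simplified regular expression (not the full PEP 440 grammar, and not the parser pip uses) to contrast the malformed specifier with the comma-separated form.

```python
import re

# Simplified pattern for one version clause: a comparison operator followed
# by a version string. An assumption for illustration only.
_CLAUSE = re.compile(r"(~=|===|==|!=|<=|>=|<|>)\s*([A-Za-z0-9.*+_-]+)")


def specifier_clauses(spec):
    """Return the (operator, version) pairs found in a specifier string."""
    return _CLAUSE.findall(spec)


def is_comma_separated(spec):
    """True if spec has clauses and each adjacent pair is separated by a comma."""
    matches = list(_CLAUSE.finditer(spec))
    for prev, nxt in zip(matches, matches[1:]):
        if spec[prev.end():nxt.start()].strip() != ",":
            return False
    return bool(matches)


def with_commas(spec):
    """Rebuild the specifier with commas between its clauses."""
    return ",".join(op + version for op, version in specifier_clauses(spec))
```

For example, with_commas(">=2.1.9<2.2") returns ">=2.1.9,<2.2", the comma-separated form the error message is asking for.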

FileExistsError: [Errno 17] File exists: '/root/analytics/venv-nerapi/lib/python3.6/lib-dynload' while creating virtual environment

$ virtualenv -p python3.6m ../venv-nerapi
The above command gives the following error:
Running virtualenv with interpreter /usr/local/bin/python3.6m
Using base prefix '/usr/local'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/virtualenv.py", line 352, in copyfile
os.symlink(srcpath, dest)
FileExistsError: [Errno 17] File exists: '/usr/local/lib/python3.6/lib-dynload' -> '/root/analytics/venv-nerapi/lib/python3.6/lib-dynload'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/virtualenv.py", line 2343, in <module>
main()
File "/usr/local/lib/python3.6/site-packages/virtualenv.py", line 712, in main
symlink=options.symlink)
File "/usr/local/lib/python3.6/site-packages/virtualenv.py", line 927, in create_environment
site_packages=site_packages, clear=clear, symlink=symlink))
File "/usr/local/lib/python3.6/site-packages/virtualenv.py", line 1132, in install_python
copyfile(join(stdlib_dir, fn), join(lib_dir, fn), symlink)
File "/usr/local/lib/python3.6/site-packages/virtualenv.py", line 355, in copyfile
copyfileordir(src, dest, symlink)
File "/usr/local/lib/python3.6/site-packages/virtualenv.py", line 330, in copyfileordir
shutil.copytree(src, dest, symlink)
File "/usr/local/lib/python3.6/shutil.py", line 315, in copytree
os.makedirs(dst)
File "/usr/local/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/root/analytics/venv-nerapi/lib/python3.6/lib-dynload'
I followed a GitHub issue thread about this error but didn't have any luck.
Could anybody tell me what went wrong?
Thanks
It turns out that a venv-nerapi directory already existed. Deleting the existing virtual environment and re-creating it worked fine.
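The delete-and-recreate fix can be scripted. The sketch below swaps in the standard-library venv module for the virtualenv package used in the question, and passes with_pip=False so it needs no network access; recreate_env is a hypothetical helper name.

```python
import shutil
import tempfile
import venv
from pathlib import Path


def recreate_env(env_dir):
    """Delete env_dir if it already exists, then build a fresh environment.

    Hypothetical helper using the standard-library venv module rather than
    the virtualenv package from the question.
    """
    env_path = Path(env_dir)
    if env_path.exists():
        # Per the answer, a leftover venv-nerapi directory caused the
        # FileExistsError, so remove it completely before recreating.
        shutil.rmtree(env_path)
    venv.create(env_path, with_pip=False)
    return env_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "venv-nerapi"
        (target / "lib").mkdir(parents=True)  # simulate a stale environment
        recreate_env(target)
        print((target / "pyvenv.cfg").exists())
```

The venv module can also clear an existing directory itself via venv.create(env_dir, clear=True); the explicit shutil.rmtree call just mirrors the manual steps described in the answer.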

Python Assertion error even after pythonpath is set up properly

I have installed virtualenv properly, but it shows me an AssertionError. I have also added PYTHONPATH. Please help me with this.
C:\Users\welcome>pip install virtualenv
Requirement already satisfied: virtualenv in d:\lib\site-packages (16.0.0)
C:\Users\welcome>cd desktop
C:\Users\welcome\Desktop>virtualenv env
Using base prefix 'd:'
Traceback (most recent call last):
File "D:\Scripts\virtualenv-script.py", line 11, in <module>
load_entry_point('virtualenv==1.10.1', 'console_scripts', 'virtualenv')()
File "d:\lib\site-packages\virtualenv-1.10.1-py3.7.egg\virtualenv.py", line 821, in main
symlink=options.symlink)
File "d:\lib\site-packages\virtualenv-1.10.1-py3.7.egg\virtualenv.py", line 956, in create_environment
site_packages=site_packages, clear=clear, symlink=symlink))
File "d:\lib\site-packages\virtualenv-1.10.1-py3.7.egg\virtualenv.py", line 1151, in install_python
copy_required_modules(home_dir, symlink)
File "d:\lib\site-packages\virtualenv-1.10.1-py3.7.egg\virtualenv.py", line 1089, in copy_required_modules
dst_filename = change_prefix(filename, dst_prefix)
File "d:\lib\site-packages\virtualenv-1.10.1-py3.7.egg\virtualenv.py", line 1054, in change_prefix
(filename, prefixes)
AssertionError: Filename d:\lib\os.py does not start with any of these prefixes: ['D:\\', 'D:\\']
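One plausible reading of the message: change_prefix rejects d:\lib\os.py even though the listed prefix is D:\, and the two differ only in the case of the drive letter, while str.startswith is case-sensitive. The sketch below is a hypothetical helper, not virtualenv's own code, that reproduces the comparison with and without Windows-style case folding (ntpath.normcase, which also runs on non-Windows systems). Separately, the traceback runs virtualenv-1.10.1-py3.7.egg, whereas pip reports virtualenv 16.0.0 as installed.

```python
import ntpath


def starts_with_any(filename, prefixes, normalize=False):
    """Return True if filename begins with one of prefixes.

    With normalize=True, both sides go through ntpath.normcase, which
    lowercases the path and converts forward slashes to backslashes.
    """
    if normalize:
        filename = ntpath.normcase(filename)
        prefixes = [ntpath.normcase(p) for p in prefixes]
    return any(filename.startswith(p) for p in prefixes)


# Values taken from the AssertionError message in the question.
filename = r"d:\lib\os.py"
prefixes = ["D:\\", "D:\\"]
print(starts_with_any(filename, prefixes))                  # False
print(starts_with_any(filename, prefixes, normalize=True))  # True
```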

KeyError Exception when import data in Cassandra

I am using Cassandra 2.2 and created the table 'teste':
CREATE TABLE teste(
idflight UUID,
unique_carrier text,
taiNumber text,
airTime int,
depDelay int,
PRIMARY KEY (unique_carrier, depDelay, idflight)
);
Then I tried to import data from a CSV file using the COPY FROM command:
COPY teste(idflight, unique_carrier, tailNumber, airTime, depDelay) FROM 'out.csv' WITH HEADER = true AND DELIMITER = ',';
Running this command produces the following error:
Process ImportProcess-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/bin/cqlsh.py", line 2306, in run
cqltypes = [table_meta.columns[name].typestring for name in self.columns]
KeyError: 'tailnumber'
Process ImportProcess-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/bin/cqlsh.py", line 2306, in run
cqltypes = [table_meta.columns[name].typestring for name in self.columns]
KeyError: 'tailnumber'
Process ImportProcess-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/bin/cqlsh.py", line 2306, in run
cqltypes = [table_meta.columns[name].typestring for name in self.columns]
KeyError: 'tailnumber'
What should I do to correct this problem?
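The KeyError names tailnumber, and the two statements in the question disagree on that column: CREATE TABLE defines taiNumber (without the second "l") while COPY lists tailNumber. Unquoted CQL identifiers are case-insensitive and stored in lowercase, so the lookup appears to be for tailnumber, which the table does not define. The sketch below is a hypothetical helper, not part of cqlsh, that compares the two column lists the same way and reports the mismatch.

```python
def missing_copy_columns(table_columns, copy_columns):
    """Return the COPY columns (lowercased) that the table does not define.

    Unquoted CQL identifiers are stored in lowercase, so both lists are
    lowercased before comparing. Quoted, case-sensitive identifiers are
    not handled in this sketch.
    """
    defined = {name.lower() for name in table_columns}
    return [name.lower() for name in copy_columns if name.lower() not in defined]


# Column lists copied from the CREATE TABLE and COPY statements in the question.
table_columns = ["idflight", "unique_carrier", "taiNumber", "airTime", "depDelay"]
copy_columns = ["idflight", "unique_carrier", "tailNumber", "airTime", "depDelay"]
print(missing_copy_columns(table_columns, copy_columns))  # ['tailnumber']
```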
