Dagster - Tail process exception # compute logs - python-3.x

I am new to using dagster. After much tinkering, I managed to create a partition pipeline.
However, when I tried to run >10 backfills using the dagit UI, I encounter this error below.
Additionally, I have 5 ops but only 2 managed to run to completion and the remaining 3 ops are skipped; this resulted in the UI display success even though it should have failed.
I do not encounter this issue if I run < 5 backfills at 1 go.
Any kind souls able to help on this? I will try to provide more info if necessary.
Not sure if anything to do with dagster.yaml but I did include this section:
run_coordinator:
module: dagster.core.run_coordinator
class: QueuedRunCoordinator
config:
max_concurrent_runs: 3
Exception:
Exception: Timed out waiting for tail process to start
File "o:\sin\ca1e\04_data science stuff\virtual environments\dagsterpipeline\lib\site-packages\dagster\core\execution\plan\execute_plan.py", line 96, in inner_plan_execution_iterator
stack.close()
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 533, in close
self.__exit__(None, None, None)
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 525, in __exit__
raise exc_details[1]
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 510, in __exit__
if cb(*exc_details):
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 120, in __exit__
next(self.gen)
File "o:\sin\ca1e\04_data science stuff\virtual environments\dagsterpipeline\lib\site-packages\dagster\core\storage\compute_log_manager.py", line 70, in watch
yield
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 120, in __exit__
next(self.gen)
File "o:\sin\ca1e\04_data science stuff\virtual environments\dagsterpipeline\lib\site-packages\dagster\core\storage\local_compute_log_manager.py", line 52, in _watch_logs
yield
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 120, in __exit__
next(self.gen)
File "o:\sin\ca1e\04_data science stuff\virtual environments\dagsterpipeline\lib\site-packages\dagster\core\execution\compute_logs.py", line 31, in mirror_stream_to_file
yield pids
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 120, in __exit__
next(self.gen)
File "o:\sin\ca1e\04_data science stuff\virtual environments\dagsterpipeline\lib\site-packages\dagster\core\execution\compute_logs.py", line 75, in tail_to_stream
yield pids
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 120, in __exit__
next(self.gen)
File "o:\sin\ca1e\04_data science stuff\virtual environments\dagsterpipeline\lib\site-packages\dagster\core\execution\compute_logs.py", line 104, in execute_windows_tail
raise Exception("Timed out waiting for tail process to start")

Related

Creating an Expectation Suite With an Automated Profiler Great Expectation

I am a newbie to great expectations and trying to set up but facing the below issue while creating an expectation Suite with an Automated Profiler.
C:\Users\user\great_expectations>great_expectations --v3-api suite new
Using v3 (Batch Request) API
How would you like to create your Expectation Suite?
1. Manually, without interacting with a sample batch of data (default)
2. Interactively, with a sample batch of data
3. Automatically, using a profiler
: 3
A batch of data is required to edit the suite - let's help you to specify it.
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\Scripts\great_expectations.exe\__main__.py", line 7, in <module>
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\great_expectations\cli\cli.py", line 190, in main
cli()
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\click\decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\great_expectations\cli\suite.py", line 151, in suite_new
_suite_new_workflow(
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\great_expectations\cli\suite.py", line 335, in _suite_new_workflow
raise e
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\great_expectations\cli\suite.py", line 268, in _suite_new_workflow
suite: ExpectationSuite = toolkit.get_or_create_expectation_suite(
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\great_expectations\cli\toolkit.py", line 82, in get_or_create_expectation_suite
default_expectation_suite_name: str = get_default_expectation_suite_name(
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\great_expectations\cli\toolkit.py", line 131, in get_default_expectation_suite_name
suite_name = f"batch-{BatchRequest(**batch_request).id}"
TypeError: BatchRequest.__init__() missing 1 required positional argument: 'data_asset_name'
C:\Users\user\great_expectations>
I had the same issue and, for me, the problem came from a badly configured data source. What I suggest you to do is to test your data source config and see how many datasets it found:
from ruamel import yaml
import great_expectations as ge
context = ge.get_context()
datasource_config = {...}
context.test_yaml_config(yaml.dump(datasource_config))
When running this, the test_yaml_config will output a report on how many assets it found.
If it didn't find any, then you'll run into the issue you're describing when you'll try to create a suite on your data.

Python3: how to detect if /dev/shm (or ProcessPoolPexecutor) can be used?

I have some code that gets improvement from being multi-processed, however in AWS Lambda, /dev/shm is not available so ProcessPoolExecutor fail with the cryptic error message:
File "/var/task/black.py", line 529, in reformat_many
executor = ProcessPoolExecutor(max_workers=worker_count)
File "/var/lang/lib/python3.7/concurrent/futures/process.py", line 556, in __init__
pending_work_items=self._pending_work_items)
File "/var/lang/lib/python3.7/concurrent/futures/process.py", line 165, in __init__
super().__init__(max_size, ctx=ctx)
File "/var/lang/lib/python3.7/multiprocessing/queues.py", line 42, in __init__
self._rlock = ctx.Lock()
File "/var/lang/lib/python3.7/multiprocessing/context.py", line 67, in Lock
return Lock(ctx=self.get_context())
File "/var/lang/lib/python3.7/multiprocessing/synchronize.py", line 162, in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
File "/var/lang/lib/python3.7/multiprocessing/synchronize.py", line 59, in __init__
unlink_now)
OSError: [Errno 38] Function not implemented
is there a portable way to detect that it would have failed ?
You could use a try/except with your exception : https://docs.python.org/fr/3/tutorial/errors.html

How do I append rows to an excel spreadsheet with openpyxl and then save the excel file?

I'm writing a script that adds new data to an existing excel spreadsheet. Currently, the spreadsheet is 500k+ rows long. I've been using openpyxl to open the spreadsheet as xlsxwriter doesn't currently have any editing capabilities. However, when I use the provided append() method as explained in this answer to a similar problem.
I'm currently running Python 3.7.3 with openpyxl 2.6.2 on a Windows 7 computer.
from openpyxl import load_workbook, Workbook
records = [object list] #this is just a list of objects
file_name = 'existing_excel_file.xlsx'
excel_workbook = load_workbook(file_name, read_only=False)
worksheet = excel_workbook.active
row_list = []
for record in records:
row_number += 1
row_list.append([
str(record.weekno),
str(record.date1),
str(record.code),
str(record.customer),
str(record.date2
])
for row in row_list:
worksheet.append(row)
excel_workbook.save(file_name )
Obviously, it's supposed to save the file with the appended lines.
append() is working alright, but when I try to execute the save() method, I receive this error:
ValueError: I/O operation on closed file.
EDIT: At the suggestion of #CharlieClark, I grabbed the full traceback. I also noticed that there is a MemoryError that I simply didn't notice before (careless, I know) which might be the source of my issue; until this is resolved, I'm researching how to increase the memory being used in openpyxl as I'm sure that's probably the key. Regardless, here's the dump. Warning: it's a big hairy traceback.
Traceback (most recent call last):
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\xml\etree\ElementTree.py", line 836, in _get_writer
yield file.write
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\xml\etree\ElementTree.py", line 777, in write
short_empty_elements=short_empty_elements)
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\xml\etree\ElementTree.py", line 942, in _serialize_xml
short_empty_elements=short_empty_elements)
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\xml\etree\ElementTree.py", line 942, in _serialize_xml
short_empty_elements=short_empty_elements)
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\xml\etree\ElementTree.py", line 942, in _serialize_xml
short_empty_elements=short_empty_elements)
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\xml\etree\ElementTree.py", line 935, in _serialize_xml
write(" %s=\"%s\"" % (qnames[k], v))
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "manage.py", line 21, in <module>
main()
File "manage.py", line 17, in main
execute_from_command_line(sys.argv)
File "C:\Users\davidm\projects\web_admin\venv\lib\site-packages\django\core\management\__init__.py", line 381, in execute_from_command_line
utility.execute()
File "C:\Users\davidm\projects\web_admin\venv\lib\site-packages\django\core\management\__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "C:\Users\davidm\projects\web_admin\venv\lib\site-packages\django\core\management\base.py", line 316, in run_from_argv
self.execute(*args, **cmd_options)
File "C:\Users\davidm\projects\web_admin\venv\lib\site-packages\django\core\management\base.py", line 353, in execute
output = self.handle(*args, **options)
File "C:\Users\davidm\projects\web_admin\web_admin\ezcorp\management\commands\codes.py", line 183, in handle
File "C:\Users\davidm\projects\web_admin\venv\lib\site-packages\openpyxl\workbook\workbook.py", line 397, in save
save_workbook(self, filename)
File "C:\Users\davidm\projects\web_admin\venv\lib\site-packages\openpyxl\writer\excel.py", line 294, in save_workbook
writer.save()
File "C:\Users\davidm\projects\web_admin\venv\lib\site-packages\openpyxl\writer\excel.py", line 276, in save
self.write_data()
File "C:\Users\davidm\projects\rdm_admin\venv\lib\site-packages\openpyxl\writer\excel.py", line 76, in write_data
self._write_worksheets()
File "C:\Users\davidm\projects\rdm_admin\venv\lib\site-packages\openpyxl\writer\excel.py", line 216, in _write_worksheets
self.write_worksheet(ws)
File "C:\Users\davidm\projects\rdm_admin\venv\lib\site-packages\openpyxl\writer\excel.py", line 201, in write_worksheet
writer.write()
File "C:\Users\davidm\projects\rdm_admin\venv\lib\site-packages\openpyxl\worksheet\_writer.py", line 358, in write
self.close()
File "C:\Users\davidm\projects\rdm_admin\venv\lib\site-packages\openpyxl\worksheet\_writer.py", line 366, in close
self.xf.close()
File "C:\Users\davidm\projects\rdm_admin\venv\lib\site-packages\openpyxl\worksheet\_writer.py", line 297, in get_stream
pass
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\contextlib.py", line 119, in __exit__
next(self.gen)
File "C:\Users\davidm\projects\rdm_admin\venv\lib\site-packages\et_xmlfile\xmlfile.py", line 50, in element
self._write_element(el)
File "C:\Users\davidm\projects\rdm_admin\venv\lib\site-packages\et_xmlfile\xmlfile.py", line 77, in _write_element
xml = tostring(element)
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\xml\etree\ElementTree.py", line 1136, in tostring
short_empty_elements=short_empty_elements)
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\xml\etree\ElementTree.py", line 777, in write
short_empty_elements=short_empty_elements)
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\xml\etree\ElementTree.py", line 836, in _get_writer
yield file.write
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\contextlib.py", line 511, in __exit__
raise exc_details[1]
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\contextlib.py", line 496, in __exit__
if cb(*exc_details):
File "C:\Users\davidm\AppData\Local\Programs\Python\Python37-32\Lib\contextlib.py", line 383, in _exit_wrapper
callback(*args, **kwds)
ValueError: I/O operation on closed file.

PyTorch Getting Started example not working

I followed this tutorial in the Getting Started section on the PyTorch website: "Deep Learning with PyTorch: A 60 Minute Blitz" and I downloaded the code for "Training a Classifier" on the bottom of the page and I ran it, and it's not working for me. I'm using the CPU version of PyTorch if that makes a difference. I'm new to Python and basically learning it for Pytorch. Here's the error message, Control + K isn't working for me because I think the editing interface is different for the first few posts and Stack Overflow needs to fix it. Or it could just be my browser:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\Anonymous\PycharmProjects\pytorchHelloWorld\train_network.py", line 100, in <module>
dataiter = iter(trainloader)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __iter__
return _DataLoaderIter(self)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 560, in __init__
w.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "C:/Users/Anonymous/PycharmProjects/pytorchHelloWorld/train_network.py", line 100, in <module>
dataiter = iter(trainloader)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __iter__
return _DataLoaderIter(self)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 560, in __init__
w.start()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
The error is likely due to multiprocessing in DataLoader and Windows since the tutorial is using num_workers=2. Python3 documentation shares some guidelines on this:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process).
You can either set num_workers=0 or you need to wrap your code within if __name__ == '__main__'
# Safe DataLoader multiprocessing with Windows
if __name__ == '__main__':
# Code to load the data with num_workers > 1
Check this reply on PyTorch forum for more details and this issue on GitHub.

Gremlin paging for big dataset query

I am using gremlin server, I have a big data set and I performing the gremlin paging. Following is the sample of query:
query = """g.V().both().both().count()"""
data = execute_query(query)
for x in range(0,int(data[0]/10000)+1):
print(x*10000, " - ",(x+1)*10000)
query = """g.V().both().both().range({0}*10000, {1}*10000)""".format(x,x+1)
data = execute_query(query)
def execute_query(query):
"""query execution"""
Above query is working fine, for pagination i have to know the rang where to stop the execution of the query. for getting the range i have to first fetch the count of the query and pass to the for loop. Is there any other to use the pagination of gremlin.
-- Pagination is required, because its fails when fetching 100k data in a single ex. g.V().both().both().count()
if we don't use pagination then its giving me this following error:
ERROR:tornado.application:Uncaught exception, closing connection.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 554, in wrapper
return callback(*args)
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 343, in wrapped
raise_exc_info(exc)
File "<string>", line 3, in raise_exc_info
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 807, in _on_frame_data
self._receive_frame()
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 697, in _receive_frame
self.stream.read_bytes(2, self._on_frame_start)
File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 312, in read_bytes
assert isinstance(num_bytes, numbers.Integral)
File "/usr/lib/python3.5/abc.py", line 182, in __instancecheck__
if subclass in cls._abc_cache:
File "/usr/lib/python3.5/_weakrefset.py", line 75, in __contains__
return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison
ERROR:tornado.application:Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f3e1c409ae8>)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tornado/ioloop.py", line 604, in _run_callback
ret = callback()
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 554, in wrapper
return callback(*args)
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 343, in wrapped
raise_exc_info(exc)
File "<string>", line 3, in raise_exc_info
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 807, in _on_frame_data
self._receive_frame()
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 697, in _receive_frame
self.stream.read_bytes(2, self._on_frame_start)
File "/usr/local/lib/python3.5/dist-packages/tornado/iostream.py", line 312, in read_bytes
assert isinstance(num_bytes, numbers.Integral)
File "/usr/lib/python3.5/abc.py", line 182, in __instancecheck__
if subclass in cls._abc_cache:
File "/usr/lib/python3.5/_weakrefset.py", line 75, in __contains__
return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison
Traceback (most recent call last):
File "/home/rgupta/Documents/BitBucket/ecodrone/ecodrone/test2.py", line 59, in <module>
data = execute_query(query)
File "/home/rgupta/Documents/BitBucket/ecodrone/ecodrone/test2.py", line 53, in execute_query
results = future_results.result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 405, in result
return self.__get_result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/resultset.py", line 81, in cb
f.result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 398, in result
return self.__get_result()
File "/usr/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
raise self._exception
File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/connection.py", line 77, in _receive
self._protocol.data_received(data, self._results)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
self.data_received(data, results_dict)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
self.data_received(data, results_dict)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
self.data_received(data, results_dict)
File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
this line repeats 100 times File "/usr/local/lib/python3.5/dist-packages/gremlin_python/driver/protocol.py", line 100, in data_received
This question is largely answered here but I'll add some more comment.
Your approach to pagination is really expensive as I'm not aware of any graphs that will optimize that particular traversal and you're basically iterating all that data a lot of times. You do it once for the count(), then you iterate the first 10000, then for the second 10000, you iterate the first 10000 followed by the second 10000, and then on the third 10000, you iterate the first 20000 followed by the third 10000 and so on...
I'm not sure if there is more to your logic, but what you have looks like a form of "batching" to get smaller bunches of results. There isn't much need to do it that way as Gremlin Server is already doing that for you internally. Were you to just send g.V().both().both() Gremlin Server is going to batch up results given the resultIterationBatchSize configuration option.
Anyway, there isn't really a better way to make paging work that I am aware of beyond what was explained in the other question that I mentioned.

Resources