I am trying to create a win-percentage calculator using a multiclass decision forest.
I have two data sets with the same column structure: one of existing win/loss data and another of pending items. Win, loss, and pending are all values in a single column called status.
The experiment runs without problems on the test data (win/loss) and has about a 90% accuracy rating,
but when I move it over to a web service and try to run it with the other data set, I get this error:
"Apply Transformation Error Cannot process column "NAICS" of type System.Double. The type is not supported by the module. (Error 0017)"
The NAICS code is no different in one data set than in the other.
I am lost; any help would be greatly appreciated.
As you may already know, error 0017 means that one or more specified columns have a type that is unsupported by the current module (https://msdn.microsoft.com/en-us/library/azure/dn905850.aspx).
Following the suggested resolution there, you can use Convert to Dataset.
I am trying to create some activities using the Excel importer. My activity has a technosphere flow of 0.4584 MWh of Production of electricity by gas from the previously imported EXIOBASE 3.3.17 hybrid database. In that database, the Production of electricity by gas activity is expressed in TJ.
The import ran without problems, with something like:
ei = ExcelImporter(path_to_my_excel)
ei.apply_strategies()
ei.match_database(fields = ['name','location'])
ei.match_database(db_name = 'EXIOBASE 3.3.17 hybrid', fields = ['name','location'])
ei.match_database(db_name = 'biosphere3', fields = ['name','categories'])
ei.write_project_parameters()
ei.write_database(activate_parameters=True)
But if I iterate over the technosphere flows of my activity consuming natural gas, it says it uses 0.4584 TJ of Production of electricity by gas: the same unit as the Production of electricity by gas activity, but the same amount I entered in MWh. I was hoping for some unit conversion under the hood, perhaps using bw2io.units.UNITS_NORMALIZATION.
Should we always express the units of exchanges in the same units as the activity they link to? Is there an existing strategy that does the unit conversion for us? Thanks!
This line: ei.match_database(db_name = 'EXIOBASE 3.3.17 hybrid', fields = ['name','location']) tells the importer to match on name and location, but not on units.
You can get the desired result with a migration; see an example here (in the section Fixing units for passenger cars).
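To illustrate the arithmetic such a unit fix has to apply, here is a minimal sketch in plain Python. It is not bw2io's API: the TO_TJ table, the convert_exchange_to_tj function, and the dict layout are hypothetical names used only for illustration. The one fact it relies on is that 1 MWh = 3.6 GJ = 0.0036 TJ.

```python
# Multipliers that rescale an amount in the given unit to TJ.
# 1 MWh = 3.6 GJ = 0.0036 TJ. This table is an illustrative assumption,
# not something read from bw2io.
TO_TJ = {"TJ": 1.0, "GJ": 1e-3, "MJ": 1e-6, "MWh": 3.6e-3, "kWh": 3.6e-6}


def convert_exchange_to_tj(exchange):
    """Return a copy of an exchange with its amount rescaled to TJ.

    `exchange` is a hypothetical plain dict with 'amount' and 'unit' keys,
    standing in for one exchange row from the Excel file.
    """
    factor = TO_TJ[exchange["unit"]]  # raises KeyError for unlisted units
    converted = dict(exchange)        # leave the input untouched
    converted["amount"] = exchange["amount"] * factor
    converted["unit"] = "TJ"
    return converted


exchange = {
    "name": "Production of electricity by gas",
    "amount": 0.4584,
    "unit": "MWh",
}
converted = convert_exchange_to_tj(exchange)
print(converted["amount"], converted["unit"])
```

With this factor, 0.4584 MWh comes out at roughly 0.00165 TJ, which is the amount the exchange should carry once it is linked to an activity expressed in TJ.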
I am trying to sum a continuous stream of numbers from a file using Hazelcast Jet:
pipe
.drawFrom(Sources.fileWatcher(<dir>))
.map(s->Integer.parseInt(s))
.addTimestamps()
.window(WindowDefinition.sliding(10000,1000))
.aggregate(AggregateOperations.summingDouble(x->x))
.drainTo(Sinks.logger());
A few questions:
1. It doesn't give the expected output. My expectation is that as soon as a new number appears in the file, it is added to the existing sum.
2. Why do I need to specify a window and call addTimestamps for this? I just need to sum an infinite stream.
3. How can we achieve fault tolerance? That is, if the server restarts, will it save the aggregated result and, when it comes back up, continue aggregating from the last computed sum?
4. If the server is down and a few numbers arrive in the file, will it resume reading from the point where it went down once it comes back up, or will it miss the numbers added while it was down and read only the numbers that arrive after it is back up?
Answer to Q1 & Q2:
You're looking for rollingAggregate; you don't need timestamps or windows.
pipe
.drawFrom(Sources.fileWatcher(<dir>))
.rollingAggregate(AggregateOperations.summingDouble(Double::parseDouble))
.drainTo(Sinks.logger());
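To make the difference from a windowed sum concrete, here is a small sketch of the rolling-aggregate idea in plain Python rather than Jet. It assumes a rolling aggregate emits one updated total per incoming item; the rolling_sum name and the sample lines are made up for illustration.

```python
from itertools import accumulate


def rolling_sum(lines):
    """Yield the running total after each incoming line, parsed as a float."""
    return accumulate(float(line) for line in lines)


# Hypothetical lines appended to the watched file, in arrival order.
incoming = ["1", "2.5", "4"]
totals = list(rolling_sum(incoming))
print(totals)  # [1.0, 3.5, 7.5]
```

Each new line produces one new output holding the sum of everything seen so far, with no timestamps or window boundaries involved.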
Answer to Q3 & Q4: the fileWatcher source isn't fault tolerant. The reason is that it reads local files, and when a member dies, its local files won't be available anyway. When the job restarts, it will start reading from the current position and will miss numbers added while the job was down.
Also, since you use global aggregation, data from all files will be routed to a single cluster member and the other members will be idle.
I have a question regarding the Python API of Interactive Brokers.
Can multiple asset and stock contracts be passed into the reqMktData() function to obtain their last prices? (I can set snapshot = TRUE in reqMktData to get the last price. You can assume that I have subscribed to the appropriate data services.)
To put things in perspective, this is what I am trying to do:
1) Call reqMktData, get last prices for multiple assets.
2) Feed the data into my prediction engine, and do something
3) Go to step 1.
When I contacted Interactive Brokers, they said:
"Only one contract can be passed to reqMktData() at one time, so there is no bulk request feature in requesting real time data."
Obviously, one way to get around this is to use a loop, but this is too slow. Another way is multithreading, but this is a lot of work, plus I can't afford the extra expense of a new computer. I am not interested in either one.
Any suggestions?
You can only specify 1 contract in each reqMktData call. There is no choice but to use a loop of some type. The speed shouldn't be an issue as you can make up to 50 requests per second, maybe even more for snapshots.
If speed is a problem, it could be that you want too much data (> 50/s) or that you're using an old version of the IB Python API. Check connection.py for lock.acquire; I've deleted all of them. Also, if there has been no trade for more than 10 seconds, IB will wait for a trade before sending a snapshot, so test with active symbols.
However, what you should do is request live streaming data by setting snapshot to false and just keep track of the last price in the stream. You can stream up to 100 tickers with the default minimums. You keep them separate by using unique ticker ids.
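The bookkeeping for that can be sketched as below. This is a standalone illustration, not the IB client: LastPriceTracker, register, and on_tick_price are hypothetical names, the symbols and prices are made-up sample data, and it assumes tick type 4 denotes the last price, as in IB's tick type table. In a real client you would call something like on_tick_price from the price callback.

```python
LAST_PRICE = 4  # assumed tick type id for "last price"


class LastPriceTracker:
    """Keep the most recent last price per symbol, keyed by ticker id."""

    def __init__(self):
        self.symbols = {}      # ticker id -> symbol
        self.last_prices = {}  # symbol -> most recent last price

    def register(self, ticker_id, symbol):
        # Call once per streaming request, using a unique ticker id each time.
        self.symbols[ticker_id] = symbol

    def on_tick_price(self, ticker_id, tick_type, price):
        # Ignore bid/ask and other tick types; only last-price ticks matter here.
        if tick_type == LAST_PRICE and ticker_id in self.symbols:
            self.last_prices[self.symbols[ticker_id]] = price

    def snapshot(self):
        # Copy, so the prediction step sees a stable view while ticks keep arriving.
        return dict(self.last_prices)


tracker = LastPriceTracker()
tracker.register(1, "AAPL")
tracker.register(2, "MSFT")

# Made-up ticks as (ticker id, tick type, price); tick type 1 is not a last price.
for tick in [(1, 4, 150.10), (2, 4, 300.00), (1, 1, 150.05), (1, 4, 150.20)]:
    tracker.on_tick_price(*tick)

prices = tracker.snapshot()
print(prices)  # {'AAPL': 150.2, 'MSFT': 300.0}
```

The prediction loop then reads snapshot() whenever it needs the latest prices, instead of issuing a new request per contract on every iteration.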
When passing a reference data field as the duration in TumblingWindow, I get a compile-time error saying that the window duration requires a positive float constant.
Can anyone please guide me?
group by TumblingWindow(minute, referencetable.EntryTime)
At the moment we don't support variable time windows, so you need to set the time explicitly and not load it from the reference data. Sorry for the inconvenience.
A workaround, if you have only a few different time durations, would be to have a separate step/subquery for each duration and use a where clause to decide whether that step produces an output.
Let me know if you have further questions.
JS (from the Azure Stream Analytics team)
I am running a U-SQL Activity as part of a Pipeline in Azure Data Factory for a defined time slice. The U-SQL Activity runs a sequence of U-SQL scripts that read in and process data stored in Azure Data Lake. While the data processes successfully in my local run, it throws a System Out of Memory exception when running in the Azure Data Factory cloud environment.
The input data is approximately 200 MB, which should not be a problem to process, as bigger data sets have been processed previously.
Since memory management is assumed to scale as needed, it is surprising to see an Out of Memory exception in an Azure cloud environment. Below are exception snapshots from two runs on the same input data; the only difference is the time at which they occurred.
Exception Snapshot - 1
Exception Snapshot - 2
Any assistance is highly appreciated, thanks.
Further update: on further investigation, we observed that skipping the header row using variable skipNRow:1 resolved the issue. Our U-SQL code-behind snippet has a loop conditioned on a date comparison, and it's possible the loop wasn't terminating because of an invalid date-time cast of the header row column, given that the snippet processes a DateTime-typed column as input. That should ideally give an invalid date-time format exception, but we see an Out of Memory exception instead.
It looks like something in the user code is causing the exception. You can try the failed vertex debug feature in VS: open the failed job in VS, and it should show an error bar in the job overview that lets you kick off that process. It will download the failed portion to your desktop and let you step through it.