Emission compartments in Brightway and how they are implemented

I wanted to identify the emissions to air of an activity in Brightway, but exploring this led me to a more general question: are compartments and subcompartments taken into account in the implementation of impact assessment methods?
In principle the characterization factors depend on the compartment; for example, emitting formaldehyde to water is not the same as emitting it to air. Take the IMPACT 2002+ endpoint human health method as an example: according to the spreadsheet provided with the ecoinvent LCIA implementation 3.3, the CF is 3 orders of magnitude higher for air. If I check the implementation of the same method in Brightway:
import brightway2 as bw
import pandas as pd

m_name = [m for m in bw.methods
          if '2002' in str(m) and 'human toxicity' in str(m)][0]
m = bw.Method(m_name)

# Generate the dictionary using a comprehension:
m_dict = {bw.get_activity(ef[0])['name']: ef[1] for ef in m.load()}

# Put the whole thing in a neat Pandas series
m_series = pd.Series(m_dict,
                     name="{}, {}".format(m.name, m.metadata['unit']))
m_series[m_series.index.str.contains('Formaldehyde')]
I only find the value corresponding to emission to water, but no info on the compartment / subcompartment. What am I missing?

There aren't any real requirements for what metadata should be associated with a biosphere flow (or any node) in Brightway2, but the categories key is populated in the default metadata:
In [1]: import brightway2 as bw

In [2]: for flow in bw.Database("biosphere3"):
   ...:     if 'formaldehyde' in flow['name'].lower():
   ...:         print(flow['name'], flow['categories'])
   ...:
Formaldehyde ('water',)
Formaldehyde ('air', 'lower stratosphere + upper troposphere')
Formaldehyde ('water', 'ocean')
Formaldehyde ('water', 'surface water')
Formaldehyde ('water', 'ground-')
Formaldehyde ('air', 'low population density, long-term')
Formaldehyde ('water', 'ground-, long-term')
Formaldehyde ('air', 'urban air close to ground')
Formaldehyde ('air',)
Formaldehyde ('air', 'non-urban air or from high stacks')
Both emissions to air and water are characterized in ('IMPACT 2002+ (Endpoint)', 'human health', 'total'):
In [3]: name = ('IMPACT 2002+ (Endpoint)', 'human health', 'total')
In [4]: for key, cf in bw.Method(name).load():
   ...:     flow = bw.get_activity(key)
   ...:     if 'formaldehyde' in flow['name'].lower():
   ...:         print(flow, cf)
   ...:
'Formaldehyde' (kilogram, None, ('air', 'low population density, long-term')) 0.00180414
'Formaldehyde' (kilogram, None, ('air', 'non-urban air or from high stacks')) 0.00180414
'Formaldehyde' (kilogram, None, ('air',)) 0.00180414
'Formaldehyde' (kilogram, None, ('air', 'urban air close to ground')) 0.00180414
'Formaldehyde' (kilogram, None, ('water', 'surface water')) 8.1879e-06
'Formaldehyde' (kilogram, None, ('water',)) 8.1879e-06
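The reason the question's Series shows only one formaldehyde value is that the dictionary comprehension keys on the flow name alone, so flows that share a name but sit in different compartments overwrite one another. A minimal sketch of one way to keep the compartment visible, assuming the biosphere3 database and the method above are installed (variable names are illustrative only):

import brightway2 as bw
import pandas as pd

name = ('IMPACT 2002+ (Endpoint)', 'human health', 'total')
method = bw.Method(name)

# Keep the name and the categories tuple together so compartments stay distinct
rows = []
for key, cf in method.load():
    flow = bw.get_activity(key)
    rows.append((flow['name'], flow['categories'], cf))

df = pd.DataFrame(rows, columns=['name', 'categories', 'cf'])

# One row per compartment/subcompartment, CFs in the method's unit
print(df[df['name'].str.contains('Formaldehyde')])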

Related

What is the role of using the OneVsRestClassifier wrapper around XGBClassifier?

I have a multiclass classification problem with 3 classes.
0 - on a given day (24h) my laptop battery did not die
1 - on a given day my laptop battery died before 12AM
2 - on a given day my laptop battery died at or after 12AM
(Note that these categories are mutually exclusive. The battery is not recharged once it has died.)
I am interested in the predicted probability for each of the 3 classes. More specifically, I intend to derive 2 types of warning:
If the prediction for class 1 is higher than a threshold x: 'Your battery is at risk of dying in the morning.'
If the prediction for class 2 is higher than a threshold y: 'Your battery is at risk of dying in the afternoon.'
I can generate the probabilities by using xgboost.XGBClassifier with the appropriate parameters for a multiclass problem.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from xgboost import XGBClassifier

X = np.array([
    [10, 10],
    [8, 10],
    [-5, 5.5],
    [-5.4, 5.5],
    [-20, -20],
    [-15, -20]
])
y = np.array([0, 1, 1, 1, 2, 2])

clf1 = XGBClassifier(objective='multi:softprob', num_class=3, seed=42)
clf1.fit(X, y)
clf1.predict_proba([[-19, -20]])
Results:
array([[0.15134096, 0.3304505 , 0.51820856]], dtype=float32)
But I can also wrap this with sklearn.multiclass.OneVsRestClassifier. Which then produces slightly different results:
clf2 = OneVsRestClassifier(XGBClassifier(objective = 'multi:softprob', num_class = 3, seed = 42))
clf2.fit(X, y)
clf2.predict_proba([[-19, -20]])
Results:
array([[0.10356173, 0.34510303, 0.5513352 ]], dtype=float32)
I was expecting the two approaches to produce the same results. My understanding was that XGBClassifier is also based on a one-vs-rest approach in the multiclass case, since there are 3 probabilities in the output and they sum up to 1.
Can you tell me where the difference comes from, and how the respective results should be interpreted? And most importantly, which approach is better suited to solve my problem?
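For context on why the numbers differ (a sketch of my own, not a quoted answer): with objective='multi:softprob', XGBClassifier fits a single multiclass model whose per-class scores pass through a softmax, whereas OneVsRestClassifier fits three independent binary XGBoost models and, in the single-label case, rescales their positive-class probabilities so each row sums to 1. A rough sketch of that rescaling, reusing the fitted clf2 from the question:

import numpy as np

X_new = np.array([[-19, -20]])

# Positive-class probability from each independent class-vs-rest model
binary_probs = np.column_stack(
    [est.predict_proba(X_new)[:, 1] for est in clf2.estimators_]
)

# OneVsRestClassifier normalizes these rows to sum to 1 for single-label data
normalized = binary_probs / binary_probs.sum(axis=1, keepdims=True)
print(normalized)  # should match clf2.predict_proba(X_new)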

Graphing a log file with Python

To start with, I don't usually use any scripting language other than bash. However, I have a requirement to graph the data from a large number of environmental monitoring log files from our server room monitor, and I feel that Python will work best. I am using Python 3.7 for this and have installed it, and all of the libraries required so far, via MacPorts and pip.
I want to end up with at least 7 graphs, each with multiple lines. Four of the graphs are the temperature and humidity data for each physical measurement point. Two of them are for air flow, hot and cold, and the last is for line voltage.
I have attempted to start this on my own and have gotten decently far. I open the log files and extract the required data. However, getting the data into a graph seems to be beyond me. The data to be graphed is a date and time stamp as X, and a number (a dotted decimal that should always be positive) as Y.
When extracting the date I am using time.strptime and time.mktime to turn it into a Unix epoch, which works just fine. When extracting the data I am using re.findall to strip the non-numerical portions. I plan to convert the epoch back to a date and time, but that can come later.
When I get to the graphing portion is where I am having the issue.
I first tried graphing the data directly which gave me the error: TypeError: unhashable type: 'numpy.ndarray'
I have also tried using a Pandas dataframe. This gave me the error: TypeError: unhashable type: 'list'
I have even tried converting the lists to tuples, both with and without the dataframe; the same errors were given.
Based on the output of my lists, I think the issue is with using append for the values that will be the Y axis. However, I cannot seem to Google well enough to find a solution.
The code, outputs seen, and input data is below. The comments are there from the last run, I use them for testing various portions.
Code so far:
# Import needed libraries
import re
import time
import matplotlib.pyplot as plt
import pandas as pd
#import matplotlib.dates as mpd

# Need to initialize these or append doesn't work
hvacepoch = []
hvacnum = []
endepoch = []
endnum = []

# Known static variables
datepattern = '%m-%d-%Y %H:%M:%S'

# Open the files
coldairfile = open("air-cold.log", "r")

# Grab the data and do some initial conversions
for coldairline in coldairfile:
    fields = coldairline.split()
    colddate = fields[0] + " " + fields[1]
    # coldepoch = mpd.epoch2num(int(time.mktime(time.strptime(colddate, datepattern))))
    coldepoch = int(time.mktime(time.strptime(colddate, datepattern)))
    coldnum = re.findall(r'\d*\.?\d+', fields[4])
    coldloc = fields[9]
    if coldloc == "HVAC":
        hvacepoch.append(coldepoch)
        hvacnum.append(coldnum)
    if coldloc == "Cold":
        endepoch.append(coldepoch)
        endnum.append(coldnum)

# Convert the lists to a tuple. Do I need this?
hvacepocht = tuple(hvacepoch)
hvacnumt = tuple(hvacnum)
endepocht = tuple(endepoch)
endnumt = tuple(endnum)

# Testing output
print(f'HVAC air flow date and time: {hvacepoch}')
print(f'HVAC air flow date and time tuple: {hvacepocht}')
print(f'HVAC air flow numbers: {hvacnum}')
print(f'HVAC air flow numbers tuple: {hvacnumt}')
print(f'Cold end air flow date and time: {endepoch}')
print(f'Cold end air flow date and time tuple: {endepocht}')
print(f'Cold end air flow numbers: {endnum}')
print(f'Cold end air flow numbers tuple: {endnumt}')

# Graph it. How to do for multiple graphs?
# With a Pandas dataframe as a list.
#colddata = pd.DataFrame({'x': endepoch, 'y1': endnum, 'y2': hvacnum})
#plt.plot('x', 'y1', data=colddata, marker='', color='blue', linewidth=2, label="Cold Aisle End")
#plt.plot('x', 'y2', data=colddata, marker='', color='skyblue', linewidth=2, label="HVAC")
# With a Pandas dataframe as a tuple.
#colddata = pd.DataFrame({'x': endepocht, 'y1': endnumt, 'y2': hvacnumt})
#plt.plot('x', 'y1', data=colddata, marker='', color='blue', linewidth=2, label="Cold Aisle End")
#plt.plot('x', 'y2', data=colddata, marker='', color='skyblue', linewidth=2, label="HVAC")
# Without a Pandas dataframe as a list.
#plt.plot(hvacepoch, hvacnum, label="HVAC")
#plt.plot(endepoch, endnum, label="Cold End")
# Without a Pandas dataframe as a tuple.
#plt.plot(hvacepocht, hvacnumt, label="HVAC")
#plt.plot(endepocht, endnumt, label="Cold End")
# Needed regardless
#plt.title('Airflow\nUnder Floor')
#plt.legend()
#plt.show()

# Close the files
coldairfile.close()
The output from the print lines(truncated):
HVAC air flow date and time: [1588531379, 1588531389, 1588531399]
HVAC air flow date and time tuple: (1588531379, 1588531389, 1588531399)
HVAC air flow numbers: [['0.14'], ['0.15'], ['0.15']]
HVAC air flow numbers tuple: (['0.14'], ['0.15'], ['0.15'])
Cold end air flow date and time: [1588531379, 1588531389, 1588531399]
Cold end air flow date and time tuple: (1588531379, 1588531389, 1588531399)
Cold end air flow numbers: [['0.10'], ['0.09'], ['0.07']]
Cold end air flow numbers tuple: (['0.10'], ['0.09'], ['0.07'])
The input(truncated):
05-03-2020 14:42:59 Air Velocit 0.14m/ Under Floor Air Flow HVAC
05-03-2020 14:42:59 Air Velocit 0.10m/ Under Floor Air Flow Cold End
05-03-2020 14:43:09 Air Velocit 0.15m/ Under Floor Air Flow HVAC
05-03-2020 14:43:09 Air Velocit 0.09m/ Under Floor Air Flow Cold End
05-03-2020 14:43:19 Air Velocit 0.15m/ Under Floor Air Flow HVAC
05-03-2020 14:43:19 Air Velocit 0.07m/ Under Floor Air Flow Cold End
I just checked your data and it looks like the issue is that endnum and hvacnum are not lists of values. They're lists of lists, as you can see below:
In [1]: colddata.head()
Out[1]:
x y1 y2
0 1588531379 [0.10] [0.14]
1 1588531389 [0.09] [0.15]
2 1588531399 [0.07] [0.15]
So, when you go to plot the data, matplotlib doesn't know how to plot those rows. What you can do is use a list comprehension to unpack the lists.
In [2]:
print(endnum)
print(hvacnum)
Out[2]:
[['0.10'], ['0.09'], ['0.07']]
[['0.14'], ['0.15'], ['0.15']]
In [3]:
endnum = [i[0] for i in endnum]
hvacnum = [i[0] for i in hvacnum]
print(endnum)
print(hvacnum)
Out[3]:
['0.10', '0.09', '0.07']
['0.14', '0.15', '0.15']
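One extra step that may help (my addition, not part of the original answer): the unpacked values are still strings, so instead of the plain unpacking above you could convert to floats at the same time, which gives matplotlib unambiguous numeric data:

# Unpack the single-element lists and convert the strings to floats
endnum = [float(i[0]) for i in endnum]
hvacnum = [float(i[0]) for i in hvacnum]

print(endnum)   # [0.1, 0.09, 0.07]
print(hvacnum)  # [0.14, 0.15, 0.15]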
Given your log file, you can use pd.read_fwf with specific colspecs:
df = pd.read_fwf('/home/quang/projects/untitled.txt', header=None,
                 colspecs=[[0, 20], [22, 34], [35, 39], [42, 54], [54, -1]],  # modify this to fit your needs
                 parse_dates=[0],
                 names=['time', 'veloc', 'value', 'location', 'type']  # also modify this
                 )
which gives you a dataframe like this:
time veloc value location type
0 2020-05-03 14:42:59 Air Velocit 0.14 Under Floor Air Flow HVAC
1 2020-05-03 14:42:59 Air Velocit 0.10 Under Floor Air Flow Cold End
2 2020-05-03 14:43:09 Air Velocit 0.15 Under Floor Air Flow HVAC
3 2020-05-03 14:43:09 Air Velocit 0.09 Under Floor Air Flow Cold End
4 2020-05-03 14:43:19 Air Velocit 0.15 Under Floor Air Flow HVAC
5 2020-05-03 14:43:19 Air Velocit 0.07 Under Floor Air Flow Cold End
And you can plot it with seaborn (assuming import seaborn as sns):
sns.lineplot(data=df, x='time', y='value', hue='type')
Output: (line plot of value against time, one line per type)
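If you prefer to stay with plain matplotlib, which the question already imports, a minimal sketch of the same plot, assuming the df built by read_fwf above, draws one line per measurement type:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for flow_type, group in df.groupby('type'):
    # one line each for 'HVAC' and 'Cold End', timestamps on the x axis
    ax.plot(group['time'], group['value'], label=flow_type)

ax.set_title('Airflow\nUnder Floor')
ax.legend()
plt.show()

The same pattern extends to the other graphs: one dataframe per log file, one figure per measurement.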

Identify best GridsearchCV scoring metric for food prediction in XGBoost

I am using GridSearchCV to find the best parameters to help me tune XGBoost for a food prediction algorithm.
I am struggling to identify the scoring metric that would result in the best profit (sales margin minus wastage costs), as that is ultimately what I am looking for. Running the script below and applying it to my data (I reserved some data for testing only), I noticed that a model selected for a better R2 seems to yield a higher profit than one selected for a better RMSE. But I am struggling to find an explanation that would guide me to the best scoring method.
Here is some info on the situation:
It costs me 6 USD to produce the product and I sell it for 9 USD, so my margin is 3 USD. My wastage is therefore 6 USD multiplied by (production quantity minus sales quantity), whereas my earnings are the sales quantity multiplied by 3.
Example: I produce 100, sell 70, and waste 30; my earnings are 70*3 - 30*6 = 30.
So I have an imbalance between sales and wastage.
Main question: which scoring metric puts a higher penalty weight on over-prediction?
My current code:
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

X = consumption[feature_names]
y = consumption['Meal1']
data_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter grid: gbm_param_grid
gbm_param_grid = {
    'min_child_weight': [1, 2],
    'gamma': [0.05, 0.06],
    'colsample_bytree': [0.22, 0.23],
    'n_estimators': range(28, 29),
    'max_depth': range(3, 8),
    'reg_alpha': range(1, 2),
    'reg_lambda': range(1, 2),
    'subsample': [0.7, 0.8, 0.9],
    'learning_rate': [0.1, 0.2],
}
fixed_params = {'objective': 'reg:squarederror', 'booster': 'gbtree'}

# Instantiate the regressor: gbm
gbm = xgb.XGBRegressor(**fixed_params)

# Perform grid search: grid_mse
grid_mse = GridSearchCV(estimator=gbm, param_grid=gbm_param_grid, scoring="r2", cv=5, verbose=1)

# Fit grid_mse to the data
grid_mse.fit(X, y)

# Print the best parameters and the best score
# (the sqrt/abs wrapping only makes sense when scoring with a negated MSE)
print("Best parameters found: ", grid_mse.best_params_)
print("Lowest Score found: ", np.sqrt(np.abs(grid_mse.best_score_)))

gv.Polygons DataError When Using OSGB Projection

I have 2 shapefiles for the UK:
In [3]: # SHAPEFILE 1:
   ...: # WESTMINISTER PARLIAMENTARY CONSTITUENCY UK SHAPEFILE
   ...: shapefile1 = "../Westminster_Parliamentary_Constituencies_December_2017_UK_BSC_SUPER_SMALL/Westminster_Parliamentary_Constituencies_December_2017_UK_BSC.shp"

In [4]: # SHAPEFILE 2:
   ...: # LAD19 UK SHAPEFILE
   ...: shapefile2 = "../03_Maps_March_2020/level3_LAD19_CONTAINS_4_LEVELS_OF_DETAIL/Local_Authority_Districts_December_2019_Boundaries_UK_BUC/Local_Authority_Districts_December_2019_Boundaries_UK_BUC.shp"
In [6]: # LOAD SHAPEFILE 1 INTO GEOPANDAS
...: parl_con = gpd.read_file(shapefile1)
...: parl_con.head()
Out[6]:
FID PCON17CD PCON17NM BNG_E BNG_N LONG LAT Shape__Are Shape__Len geometry
0 11 E14000540 Barking 546099 184533 0.105346 51.5408 5.225347e+07 44697.210277 MULTIPOLYGON (((0.07106 51.53715, 0.07551 51.5...
1 12 E14000541 Barnsley Central 433719 408537 -1.492280 53.5724 1.377661e+08 72932.918783 POLYGON ((-1.42490 53.60448, -1.43298 53.59652...
2 13 E14000542 Barnsley East 439730 404883 -1.401980 53.5391 2.460912e+08 87932.525762 POLYGON ((-1.34873 53.58335, -1.33215 53.56286...
3 14 E14000543 Barrow and Furness 325384 484663 -3.146730 54.2522 8.203002e+08 283121.334647 MULTIPOLYGON (((-3.20064 54.06488, -3.20111 54...
4 15 E14000544 Basildon and Billericay 569070 192467 0.440099 51.6057 1.567962e+08 57385.722178 POLYGON ((0.49457 51.62362, 0.50044 51.61807, ...
In [7]: # SHAPEFILE 1 PROJECTION:
...: parl_con.crs
Out[7]: {'init': 'epsg:4326'}
In [12]: # LOAD SHAPEFILE 2 INTO GEOPANDAS
...: lad19 = gpd.read_file(shapefile2)
...: lad19.head()
Out[12]:
objectid lad19cd lad19nm lad19nmw bng_e bng_n long lat st_areasha st_lengths geometry
0 1 E06000001 Hartlepool None 447160 531474 -1.27018 54.676140 9.684551e+07 50305.325058 POLYGON ((448986.025 536729.674, 453194.600 53...
1 2 E06000002 Middlesbrough None 451141 516887 -1.21099 54.544670 5.290846e+07 34964.406313 POLYGON ((451752.698 520561.900, 452424.399 52...
2 3 E06000003 Redcar and Cleveland None 464361 519597 -1.00608 54.567520 2.486791e+08 83939.752513 POLYGON ((451965.636 521061.756, 454348.400 52...
3 4 E06000004 Stockton-on-Tees None 444940 518183 -1.30664 54.556911 2.071591e+08 87075.860824 POLYGON ((451965.636 521061.756, 451752.698 52...
4 5 E06000005 Darlington None 428029 515648 -1.56835 54.535339 1.988128e+08 91926.839545 POLYGON ((419709.299 515678.298, 419162.998 51...
In [13]: # SHAPEFILE 2 PROJECTION:
...: lad19.crs
Out[13]: {'init': 'epsg:27700'}
With the shapefile using WGS 84 projection, I can successfully plot my choropleth using gv.Polygons:
In [14]: # USE GEOPANDAS DATAFRAME WITH gv.Polygons TO PRODUCE INTERACTIVE CHOROPLETH:
    ...: gv.Polygons(parl_con, vdims='PCON17NM'
    ...:            ).opts(tools=['hover','tap'],
    ...:                   width=450, height=600
    ...:                   )
Out[14]: :Polygons [Longitude,Latitude] (PCON17NM)
However, if I use the shapefile with the OSGB projection, I get an error:
In [15]: # USE GEOPANDAS DATAFRAME WITH gv.Polygons TO PRODUCE INTERACTIVE CHOROPLETH:
    ...: gv.Polygons(lad19, vdims='lad19_name',
    ...:            ).opts(tools=['hover','tap'],
    ...:                   width=450, height=600
    ...:                   )
DataError: Expected Polygons instance to declare two key dimensions corresponding to the geometry coordinates but 3 dimensions were found which did not refer to any columns.
GeoPandasInterface expects a list of tabular data, for more information on supported datatypes see http://holoviews.org/user_guide/Tabular_Datasets.html
I tried converting the projection, but I just got the same error when I ran gv.Polygons again:
In [16]: lad19.crs
Out[16]: {'init': 'epsg:27700'}
In [17]: lad19.crs = {'init': 'epsg:4326'}
...: lad19.crs
Out[17]: {'init': 'epsg:4326'}
In [19]: # USE GEOPANDAS DATAFRAME WITH gv.Polygons TO PRODUCE INTERACTIVE CHOROPLETH:
    ...: gv.Polygons(lad19, vdims='lad19_name',
    ...:            ).opts(tools=['hover','tap'],
    ...:                   width=450, height=600
    ...:                   )
DataError: Expected Polygons instance to declare two key dimensions corresponding to the geometry coordinates but 3 dimensions were found which did not refer to any columns.
GeoPandasInterface expects a list of tabular data, for more information on supported datatypes see http://holoviews.org/user_guide/Tabular_Datasets.html
Note that I can successfully plot choropleths for both of these shapefiles using gv.Shape. The only difference using gv.Shape is that with shapefile 1 I don’t need to specify the projection used whereas with shapefile 2 I have to specify crs=ccrs.OSGB().
Does anyone know what’s going on here?
Thanks
Shapefile download links:
Shapefile 1:
https://geoportal.statistics.gov.uk/datasets/westminster-parliamentary-constituencies-december-2017-uk-bsc
Shapefile 2:
https://geoportal.statistics.gov.uk/datasets/local-authority-districts-december-2019-boundaries-uk-buc
My issue turned out to be caused by my reprojection step from OSGB to WGS 84.
# THE ORIGINAL PROJECTION ON THE SHAPEFILE
In [16]: lad19.crs
Out[16]: {'init': 'epsg:27700'}
While the result of the following command would suggest that the reprojection step worked
In [17]: lad19.crs = {'init': 'epsg:4326'}
...: lad19.crs
Out[17]: {'init': 'epsg:4326'}
if you look at the geometry attribute you can see that it is still made up of eastings and northings and not longitudes and latitudes as you would expect after reprojecting:
In [8]: lad19["geometry"].head()
Out[8]:
0 POLYGON ((448986.025 536729.674, 453194.600 53...
1 POLYGON ((451752.698 520561.900, 452424.399 52...
2 POLYGON ((451965.636 521061.756, 454348.400 52...
3 POLYGON ((451965.636 521061.756, 451752.698 52...
4 POLYGON ((419709.299 515678.298, 419162.998 51...
Name: geometry, dtype: geometry
The solution was instead to reproject from the original to the desired projection with the to_crs method, which actually transforms the coordinates (called here with inplace=True so the GeoDataFrame is modified in place):
In [11]: lad19.to_crs({'init': 'epsg:4326'},inplace=True)
...: lad19.crs
Out[11]: {'init': 'epsg:4326'}
The eastings and northings contained in the geometry column have now been converted to longitudes and latitudes
In [12]: lad19["geometry"].head()
Out[12]:
0 POLYGON ((-1.24098 54.72318, -1.17615 54.69768...
1 POLYGON ((-1.20088 54.57763, -1.19055 54.57496...
2 POLYGON ((-1.19750 54.58210, -1.16017 54.60449...
3 POLYGON ((-1.19750 54.58210, -1.20088 54.57763...
4 POLYGON ((-1.69692 54.53600, -1.70526 54.54916...
Name: geometry, dtype: geometry
and now gv.Polygons can use this shapefile to successfully produce a choropleth map:
In [13]: gv.Polygons(lad19, vdims='lad19nm',
...: ).opts(tools=['hover','tap'],
...: width=450, height=600
...: )
Out[13]: :Polygons [Longitude,Latitude] (lad19nm)
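As a side note, and purely my addition under the assumption of a recent GeoPandas/GeoViews install: the {'init': 'epsg:...'} dicts are deprecated in newer GeoPandas, and GeoViews can also be told the source projection directly, which avoids reprojecting at all:

import geopandas as gpd
import geoviews as gv
import cartopy.crs as ccrs

lad19 = gpd.read_file(shapefile2)

# Option 1: reproject to WGS 84 with the modern epsg keyword
lad19_wgs84 = lad19.to_crs(epsg=4326)

# Option 2: keep the OSGB coordinates and declare the CRS to GeoViews
polys = gv.Polygons(lad19, vdims='lad19nm', crs=ccrs.OSGB()).opts(
    tools=['hover', 'tap'], width=450, height=600)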

sklearn Cross validation score gives the same results for every number of folds

I can't figure out why cross validation always gives me the same accuracy (0.92), no matter how many folds I use.
Even when I delete the parameter cv=10, it gives me the same result.
import ast
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# read preprocessed data
traindata = ast.literal_eval(open('pretprocesirano.txt').read())
testdata = ast.literal_eval(open('pretprocesiranoTEST.txt').read())

# create word vector
vectorizer = CountVectorizer(tokenizer=lambda x: x.split(), min_df=3, max_features=300)
traindataCV = vectorizer.fit_transform(traindata)

# save wordlist
wordlist = vectorizer.vocabulary_

# save vectorizer
SavedVectorizer = CountVectorizer(vocabulary=wordlist)

# transform test data
testdataCV = SavedVectorizer.transform(testdata)

# modeling - Naive Bayes
clf = MultinomialNB()
clf.fit(traindataCV, label_train)

# cross validation score
CrossValScore = cross_val_score(clf, traindataCV, label_train, cv=10)
print("Accuracy CrossValScore: %0.3f" % CrossValScore.mean())
I tried it this way too, and I also got the same result (0.92). This happens even when I change the number of folds or remove the argument.
from sklearn.model_selection import KFold
CrossValScore = cross_val_score(clf, traindataCV, label_train, cv=KFold(10, shuffle=False, random_state=0))
print("Accuracy CrossValScore: %0.3f" %CrossValScore.mean())
Here are some samples:
traindata= ['ucg investment bank studying unicredit intesa paschi merger sole', 'mtoken sredstva autentifikacije intesa line umesto mini cda cega line vise moze koristi aktivacija', 'pll intesa', 'intesa and unicredit banka asset management the leading italia lenders are both after more fee income but url', 'about write intesa scene colbie cailat fosterthepeople that involves sexy taj between these url']
testdata= ['naumovic samo privilegovani nije delatnosti moci imati hit nama traziti depozit rimuje mentionpositive', 'breaking unicredit board okays launch bad loans vehicle with intesa kkr read more url', 'postoji promocija kupovina telefon rate telefon banka popust pretplata url', 'direktor politike haha struja obecao stan svi zaposliti kredit komercijalna banka', 'forex update unicredit and intesa pool bln euros bad loans kkr vehicle url']
label_train = [0, 1, 0, 0, 0]
label_test = [1, 0, 1, 1, 0]
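No answer is shown for this question, but one quick diagnostic, offered as my suggestion rather than quoted content, is to check the class balance of label_train: an accuracy that sits at about 0.92 no matter how the folds are configured is exactly what you would see if roughly 92% of the labels belong to a single class and the classifier mostly predicts that majority class.

import numpy as np

label_train = np.asarray(label_train)

# Fraction of samples per class; a ~0.92 majority share would explain a
# constant 0.92 accuracy across different cv settings
classes, counts = np.unique(label_train, return_counts=True)
for c, n in zip(classes, counts):
    print(c, n / len(label_train))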
