Decoding Target values as "NEGATIVE" and "POSITIVE" - nlp

I am trying to decode the target values (0: Negative, 2: Neutral, 4: Positive):
df.target.values
Output: array(['NEGATIVE', 'NEGATIVE', 'NEGATIVE', ..., 'POSITIVE', 'POSITIVE',
'POSITIVE'], dtype=object)
So I tried:
decode_map = {0: 'NEGATIVE', 2: 'NEUTRAL', 4: 'POSITIVE'}
def decode_sentiment(label):
    return decode_map[int(label)]
df.target = df.target.apply(lambda x: decode_sentiment(x))
But it gives me this error:
ValueError: invalid literal for int() with base 10: 'NEGATIVE'
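The traceback means df.target already holds the string labels ('NEGATIVE', ...), not the numeric codes, so int(label) fails on the first row. A minimal sketch, reusing the question's decode_map, that simply passes already-decoded values through (the string check is my own addition):

```python
decode_map = {0: 'NEGATIVE', 2: 'NEUTRAL', 4: 'POSITIVE'}

def decode_sentiment(label):
    # Pass string labels through untouched; decode only numeric codes.
    if isinstance(label, str) and not label.lstrip('-').isdigit():
        return label
    return decode_map[int(label)]

# df.target = df.target.apply(decode_sentiment)  # now safe to re-run
```

This also makes the transformation idempotent, so running the cell twice no longer crashes.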


How to drop rows with string <NA> value and trim strings from pandas data frame

I have the below python code:
import streamlit as st
import subprocess
import pandas as pd
git_output = subprocess.run(['git', 'worktree', 'list', '--porcelain'], cwd='F:/myenv/',
                            capture_output=True,
                            text=True).stdout
df = pd.DataFrame([
    {line.split()[0]: line.rsplit(" ", 1) for line in block.splitlines()}
    for block in git_output.split("\n\n")])
st.table(df.filter(items=['worktree', 'branch']))
and the output is:
worktree branch
0 ["worktree","F:/demo/a"] <NA>
1 ["worktree","F:/demo/b"] ["branch","refs/heads/dev/demo/b"]
2 ["worktree","F:/demo/c"] ["branch","refs/heads/dev/demo/c"]
3 <NA> <NA>
which actions I can do on the df object to get this output:
worktree branch
0 [F:/demo/b] [refs/heads/dev/demo/b]
1 [F:/demo/c] [refs/heads/dev/demo/c]
Per the comments, also added Dictionary value:
{'worktree': {0: ['worktree', 'F:/myenv'], 1: ['worktree', 'F:/demo/a'], 2: ['worktree', 'F:/demo/b'], 3: ['worktree', 'F:/demo/c'], 4: nan}, 'bare': {0: ['bare'], 1: nan, 2: nan, 3: nan, 4: nan}, 'HEAD': {0: nan, 1: ['HEAD', '48cfcf49e277bafad'], 2: ['HEAD', '21eae7bc2694a3aaaf'], 3: ['HEAD', '28755aad57bf4820ca5'], 4: nan}, 'branch': {0: nan, 1: ['branch', 'refs/heads/dev/demo/a'], 2: ['branch', 'refs/heads/dev/demo/b'], 3: ['branch', 'refs/heads/dev/demo/c'], 4: nan}, 'prunable': {0: nan, 1: ['prunable gitdir file points to non-existent', 'location'], 2: nan, 3: nan, 4: nan}}
This will work:
import ast
df = df.dropna().astype(str).apply(lambda col: col.apply(lambda x: ast.literal_eval(x)[-1]))
Output:
>>> df
worktree branch
1 F:/demo/b refs/heads/dev/demo/b
2 F:/demo/c refs/heads/dev/demo/c
If you're sure that the columns contain real list objects and not just strings, you can omit the astype(str) and ast steps:
df = df.dropna().apply(lambda col: col.str[-1])
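To illustrate the .str[-1] shortcut on list-valued columns, here is a tiny self-contained sketch with made-up rows mirroring the question's data:

```python
import pandas as pd

# Hypothetical columns that hold real Python lists (not their string repr)
df = pd.DataFrame({
    'worktree': [['worktree', 'F:/demo/b'], ['worktree', 'F:/demo/c']],
    'branch': [['branch', 'refs/heads/dev/demo/b'],
               ['branch', 'refs/heads/dev/demo/c']],
})

# .str[-1] indexes elementwise, so each cell becomes the last list element
out = df.apply(lambda col: col.str[-1])
```

The .str accessor does positional indexing on any sequence, not just strings, which is why it works on list cells here.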
Using the comments and suggested answer this is what worked for me:
import streamlit as st
import subprocess
import pandas as pd
import ast
git_output = subprocess.run(['git', 'worktree', 'list', '--porcelain'], cwd='F:/views/g/MonoCentral',
                            capture_output=True,
                            text=True).stdout
df = pd.DataFrame([
    {line.split()[0]: line.rsplit(" ", 1) for line in block.splitlines()}
    for block in git_output.split("\n\n")])
df_new = df.dropna(subset=['worktree', 'HEAD', 'branch'])
df_filter = df_new.filter(items=['worktree', 'branch'])
df_filter['worktree'] = df_filter['worktree'].str[1]
df_filter['branch'] = df_filter['branch'].str[1]
st.table(df_filter)

Python3, Pandas, Extract associated Data from Dataframe to dictionaries

I've been working with a CSV file. I created a dataframe from the file, added column names, and printed it:
#Here is the code before failed computations.
import pandas as pd
import csv
colName = ['carIndex', 'carMake', 'Floatnum']
data2 = pd.read_csv('cars.csv', names=colName)
print(data2)
I have been trying to extract data to reach my goals although I have had some difficulty. My goals are as follows:
extract data and write alphabetical Dictionary with "carIndex" as key and a value of 0 - 2 (associated with Car A - C)
extract data and write alphabetical Dictionary with "carMake" as key and a value of 0 - 1 (Associated With Make X & Y)
Create (three) key-value pairs for the make "X" & "Y"'s values (associated with carIndex A-C) If a value doesn't exist the index should be None. append all three to a list of lists.
Finally take all three fields (First Dictionary, Second Dictionary, List-of-lists) and add them to a tuple for exportation
Anyone have suggestions for how I can extract the data as I want? Thanks in advance.
In Response to:
Will you please add two things to the question: 1. a text version of your dataframe (preferably from print(df.to_dict())), and 2. a sample dataframe containing your expected output?
print(data2.to_dict()) (outputs) --> {'carIndex': {0: 'Car C', 1: 'Car A', 2: 'Car B', 3: 'Car B', 4: 'Car A'}, 'carMake': {0: ' Make X', 1: ' Make X', 2: ' Make X', 3: ' Make Y', 4: ' Make Y'}, 'Floatnum': {0: 2.0, 1: 2.5, 2: 1.5, 3: 4.0, 4: 3.5}}
Output Tuple I want: print(my_tup) (outputs) -->
({'Car A': 0, 'Car B': 1, 'Car C': 2}, {'Make X': 0, ' Make Y': 1}, [[2.5, 3.5], [1.5, 4.0], [1.0, None]])
Extract data and write alphabetical Dictionary with "carIndex" as key and a value of 0 - 2 (associated with Car A - C)
sorted = data2.sort_values('carIndex').drop_duplicates(subset='carIndex').reset_index()
carIndexDict = sorted['carIndex'].to_dict()
This will output
{0: 'Car A', 1: 'Car B', 2: 'Car C'}
Extract data and write alphabetical Dictionary with "carMake" as key and a value of 0 - 1 (Associated With Make X & Y)
Use the same strategy:
sorted = data2.sort_values('carMake').drop_duplicates(subset='carMake').reset_index()
carMakeDict = sorted['carMake'].to_dict()
Output:
{0: 'Make X', 1: 'Make Y'}
To make the list:
carIndexes = carIndexDict.values()
carMakes = carMakeDict.values()
full_list = []
for idx in carIndexes:
    idx_search = data2.loc[data2['carIndex'] == idx]
    car_list = []
    for make in carMakes:
        make_search = idx_search.loc[idx_search['carMake'] == make]
        if not make_search.empty:
            car_list.append(make_search['Floatnum'].iloc[0])
        else:
            car_list.append(None)
    full_list.append(car_list)
Outputs:
[[2.5, 3.5], [1.5, 4.0], [2.0, None]]
And finally the tuple:
myTuple = (carIndexDict, carMakeDict, full_list)
Outputs:
({0: 'Car A', 1: 'Car B', 2: 'Car C'}, {0: 'Make X', 1: 'Make Y'}, [[2.5, 3.5], [1.5, 4.0], [2.0, None]])
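As an alternative to the nested loops, pandas can build the same list of lists in one pivot. A sketch assuming the data shown in print(data2.to_dict()), with the leading whitespace stripped from carMake; it also produces dictionaries keyed by name, matching the tuple format the question asked for:

```python
import pandas as pd

# Sample data copied from the question's to_dict() output
data2 = pd.DataFrame({
    'carIndex': ['Car C', 'Car A', 'Car B', 'Car B', 'Car A'],
    'carMake': [' Make X', ' Make X', ' Make X', ' Make Y', ' Make Y'],
    'Floatnum': [2.0, 2.5, 1.5, 4.0, 3.5],
})
data2['carMake'] = data2['carMake'].str.strip()

# One row per carIndex, one column per carMake, both sorted alphabetically;
# missing (index, make) pairs become NaN
pivot = data2.pivot(index='carIndex', columns='carMake', values='Floatnum')

# Convert NaN to None while flattening to a list of lists
full_list = [[None if pd.isna(v) else v for v in row]
             for row in pivot.values.tolist()]

carIndexDict = {name: i for i, name in enumerate(pivot.index)}
carMakeDict = {name: i for i, name in enumerate(pivot.columns)}
my_tup = (carIndexDict, carMakeDict, full_list)
```

Note pivot requires each (carIndex, carMake) pair to be unique, which holds for this data.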

Different results in eigenvector centrality numpy

The following example gives different results obtained with eigenvector_centrality and eigenvector_centrality_numpy. Is there a way to make such calculation more robust? I'm using networkx 2.4, numpy 1.18.5 and scipy 1.5.0.
import numpy as np
import networkx as nx
AdjacencyMatrix = {
    0: {
        1: 0.6,
    },
    1: {
        2: 0,
        3: 0,
    },
    2: {
        4: 0.5,
        5: 0.5,
    },
    3: {
        6: 0.5,
        7: 0.5,
        8: 0.5,
    },
    4: {},
    5: {},
    6: {},
    7: {},
    8: {},
}
G = nx.DiGraph()
for nodeID in AdjacencyMatrix.keys():
    G.add_node(nodeID)
for k1 in AdjacencyMatrix.keys():
    for k2 in AdjacencyMatrix[k1]:
        weight = AdjacencyMatrix[k1][k2]
        split_factor = len(AdjacencyMatrix[k1])
        G.add_edge(k1, k2, weight=weight / split_factor, reciprocal=1.0 / (split_factor * weight) if weight != 0 else np.inf)
eigenvector_centrality = {v[0]: v[1] for v in sorted(nx.eigenvector_centrality(G.reverse() if G.is_directed() else G, max_iter=10000, weight="weight").items(), key=lambda x: x[1], reverse=True)}
print(eigenvector_centrality)
eigenvector_centrality_numpy = {v[0]: v[1] for v in sorted(nx.eigenvector_centrality_numpy(G.reverse() if G.is_directed() else G, max_iter=10000, weight="weight").items(), key=lambda x: x[1], reverse=True)}
print(eigenvector_centrality_numpy)
Here's my output:
{0: 0.6468489798823026, 3: 0.5392481399595738, 2: 0.5392481399595732, 1: 0.0012439403459275048, 4: 0.0012439403459275048, 5: 0.0012439403459275048, 6: 0.0012439403459275048, 7: 0.0012439403459275048, 8: 0.0012439403459275048}
{3: 0.9637027924175013, 0: 0.0031436862826891288, 6: 9.593026373266866e-11, 8: 3.5132785569658154e-11, 4: 1.2627565659784068e-11, 1: 9.433263632036004e-14, 7: -2.6958851817582286e-11, 5: -3.185304797703736e-11, 2: -0.26695888283266833}
Edit: see the response by dshult below; he's one of the main people who maintain networkx.
I think this may be a bug, but not the way you think. This graph is directed and acyclic. So for this graph, I don't think there is a nonzero eigenvalue.
It looks like the algorithm seems to implicitly assume an undirected graph, or at least that if it's directed it has cycles. And I would expect the algorithm to break if there's no cycle.
I'm going to encourage the networkx people to look at this in more detail.
I'm actually surprised that it converges for the non-numpy version.
Joel is right to say that eigenvector_centrality isn't a useful measure for directed acyclic graphs. See this nice description of centrality. This should be useless for both the numpy and non-numpy versions of the code.
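Since eigenvector centrality is ill-defined on a directed acyclic graph, a common alternative is Katz centrality, which adds a constant baseline so scores stay well-defined without cycles. A minimal plain-Python sketch on the question's graph (the choice of alpha and beta is arbitrary, for illustration only):

```python
# Adjacency dict from the question (node -> {successor: weight})
adj = {0: {1: 0.6}, 1: {2: 0.0, 3: 0.0}, 2: {4: 0.5, 5: 0.5},
       3: {6: 0.5, 7: 0.5, 8: 0.5}, 4: {}, 5: {}, 6: {}, 7: {}, 8: {}}

alpha, beta = 0.1, 1.0  # attenuation factor and baseline (illustrative values)
x = {n: 1.0 for n in adj}
for _ in range(100):
    # Katz update: each node gets beta plus alpha-weighted scores of its
    # in-neighbors; the fixed point exists since alpha is small
    x = {n: beta + alpha * sum(w * x[src]
                               for src, nbrs in adj.items()
                               for dst, w in nbrs.items() if dst == n)
         for n in adj}
# x[0] -> 1.0 (no incoming edges), x[1] -> 1.06
```

networkx also ships nx.katz_centrality, which implements the same iteration with normalization; the hand-rolled loop above just makes the recurrence explicit.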

list indices must be integers or slices, not dict - in specified case?

First time posting a question on Stack Overflow! I'd like to go straight to my problem.
Line 114 gives the error list indices must be integers or slices, not dict.
This is the line with the error (the full code is below):
if dist[c]>=dist[v]+dataAry[v][c]:
I just need a solution for this error; the rest of the code works as I intended.
dataJustified="{:>3}"
dataAry = {
    0: [{1: 5}, {2: 4}, {5: 5}],
    1: [{3: 3}],
    2: [{7: 6}],
    3: [{8: 8}],
    4: [{8: 4}],
    5: [{4: 2}, {6: 4}, {9: 3}],
    6: [{7: 7}, {10: 5}],
    7: [],
    8: [{9: 7}],
    9: [],
    10: [],
}
print("Display Dijkstra's algorithm Shortest Path")
dist=[]
prev=[]
s=0
v=0
vl=[]
for c in range(len(dataAry)):
    if c==0:
        dist.append(0)
        prev.append(-1)
    else:
        dist.append(999)
        prev.append(0)
while len(vl)!=len(dataAry)-1:
    vl.append(v)
    u=[]
    del u[:]
    for index in dataAry[v]:
        u.append(index)
    for c in u:
        if dist[c]>=dist[v]+dataAry[v][c]:
            dist[c]=dist[v]+dataAry[v][c]
            prev[c]=v
    new_dist = dist[:]
    for x in vl:
        new_dist.remove(dist[x])
    minV = min(new_dist)
    if dist.index(minV) not in vl:
        v = dist.index(minV)
    else:
        new_dist_2 = dist[:]
        new_dist_2[dist.index(minV)] = 999
        v = new_dist_2.index(minV)
print("dist : ", dist)
print("prev : ", prev)
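One way past the error, as a sketch: dataAry[v] is a *list of single-entry dicts*, so iterating it yields dicts, and dataAry[v][c] then indexes the list with a dict. Merging each node's list into one flat dict gives exactly the weight lookup the failing line expects:

```python
dataAry = {
    0: [{1: 5}, {2: 4}, {5: 5}],
    1: [{3: 3}],
    2: [{7: 6}],
    3: [{8: 8}],
    4: [{8: 4}],
    5: [{4: 2}, {6: 4}, {9: 3}],
    6: [{7: 7}, {10: 5}],
    7: [],
    8: [{9: 7}],
    9: [],
    10: [],
}

# Merge each node's list of single-entry dicts into one {neighbor: weight} dict
graph = {node: {n: w for edge in edges for n, w in edge.items()}
         for node, edges in dataAry.items()}

# Now graph[v] iterates neighbor IDs and graph[v][c] is the edge weight, e.g.:
# for c in graph[v]:
#     if dist[c] >= dist[v] + graph[v][c]: ...
```

With this structure the `u` list becomes unnecessary, since `for c in graph[v]` already yields integer neighbor IDs.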

convert integer to string then back to integer

I'm doing a Python exercise where a user keys in an integer and the function should rearrange its digits in descending order. I decided to convert the integer to a string first so I can iterate over it, storing the result in a list. But when I try to convert it back to an integer, it doesn't get converted.
As shown in my code below, I printed the types of my variables to see whether they were being converted.
def conversion(nums):
    int_to_str = str(nums)
    list_int = []
    ans = []
    for x in int_to_str:
        list_int.append(x)
    list_int.sort(reverse=True)
    ans = list_int
    print(type(ans))
    print(ans)
    ans = ''.join(list_int)
    print(type(ans))
    print(ans)
    str_to_int = [int(x) for x in list_int]  # list comprehension to convert
                                             # strings back to integers
    print(type(str_to_int))
    print(str_to_int)
    final = ''.join(str_to_int)
    print(type(final))
    print(final)
Output:
<class 'list'>
['9', '5', '4', '2', '1', '0']
<class 'str'>
954210
<class 'list'>
[9, 5, 4, 2, 1, 0]
TypeError: sequence item 0: expected str instance, int found
If I understood your question, you are receiving an input (assuming a string representation of some int) and you want to convert it to a list of integers, reverse-sort it, and return it. If that is the case:
def reverse_numeric_input(x):
    try:
        if type(x) != str:
            x = str(x)
        lst = [int(i) for i in x]
        lst.sort(reverse=True)
        return "".join([str(i) for i in lst])
    except Exception as e:
        print("%s error converting your input caused by: %s" % (e.__class__.__name__, str(e)))
The problem in the code you posted lies in this line: final = ''.join(str_to_int). When you call join, the joined items must be strings, so cast each one with str() first. Hope that helps.
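For completeness, a compact sketch of the whole exercise (the function name is my own): since sorted() accepts any iterable, the digit characters can be sorted directly and int() called only once, at the very end:

```python
def descending_digits(nums):
    # Sort the digit characters of a non-negative int in descending order,
    # then rebuild the integer from the joined string.
    digits = sorted(str(nums), reverse=True)
    return int(''.join(digits))

print(descending_digits(145092))  # 954210
```

Keeping everything as strings until the final int() avoids the mixed str/int list that triggered the TypeError.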
