Unable to replace values using Dict on DataFrame column - python-3.x

I am creating a dictionary code_data by loading a CSV file to a data frame and converting it with to_dict method. This is a fragment of my code:
import pandas as pd
path = r"E:\Knoema_Work_Dataset\hrfpfwd\MetaData\MetaData_num_person.csv"
code_data = pd.read_csv(path, usecols=['value', 'display_value'], dtype=object)
code_data = code_data.set_index('value')['display_value'].to_dict()
In the following line I am attempting to replace its values:
data["Number of Deaths"] = data["Number of Deaths"].replace(code_data)
Sadly, it leads to an error:
Cannot compare types 'ndarray(dtype=int64)' and 'str'
Could you provide me with some assistance with regards to my problem?
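The error message suggests a type mismatch: reading the CSV with dtype=object makes every dictionary key a str, while the "Number of Deaths" column appears to be int64. Below is a minimal sketch of one possible fix, assuming the codes are integer-like. The sample dictionary and DataFrame are made up for illustration and stand in for the real CSV and data.

```python
import pandas as pd

# Hypothetical stand-in for the lookup built from the CSV.
# Keys are strings because the file was read with dtype=object.
code_data = {"1": "one", "2": "two"}

# Hypothetical stand-in for the real data; the column is int64.
data = pd.DataFrame({"Number of Deaths": [1, 2, 3]})

# Convert the keys to int so they are comparable with the int64 column.
int_codes = {int(key): label for key, label in code_data.items()}

# Values without a matching key (3 here) are left unchanged.
data["Number of Deaths"] = data["Number of Deaths"].replace(int_codes)
```

An alternative under the same assumption is to cast the column to str first (data["Number of Deaths"].astype(str)) and keep the string keys.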

Related

AttributeError: 'float' object has no attribute 'log' / TypeError: ufunc 'log' not supported for the input types

I have a series of fluorescence intensity data in a column ('2.4M'). I tried to create a new column 'ln_2.4M' by taking the natural log of column '2.4M', but I got an error:
AttributeError: 'float' object has no attribute 'log'
df["ln_2.4M"] = np.log(df["2.4M"])
I tried using a for loop to apply the log to each fluorescence value in the column "2.4M":
ln2_4M = []
for x in df["2.4M"]:
    ln2_4M = np.log(x)
    print(ln2_4M)
Although it printed the log of each value in column "2.4M" correctly, I cannot use the result because it also raised a TypeError:
ufunc 'log' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I am not sure why. Any help in understanding what is happening and how to fix this problem is appreciated. Thanks.
I then tried using the method below and it worked:
df["2.4M"] = pd.to_numeric(df["2.4M"],errors = 'coerce')
df["ln_24M"] = np.log(df["2.4M"])
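A likely explanation, stated as an assumption since the original data is not shown: the column has dtype object (for example, numbers stored as strings), and np.log cannot operate on an object array. The sketch below reproduces that situation with made-up values and shows the pd.to_numeric conversion resolving it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: numbers stored as strings, plus one unparseable entry.
df = pd.DataFrame({"2.4M": ["1.0", "2.5", "n/a"]}, dtype=object)

# errors="coerce" turns anything that cannot be parsed into NaN.
df["2.4M"] = pd.to_numeric(df["2.4M"], errors="coerce")

# With a float64 column, np.log works element-wise and propagates NaN.
df["ln_2.4M"] = np.log(df["2.4M"])
```

Checking df.dtypes before calling np.log is a quick way to confirm whether a column is numeric.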

Extract a specific formatted output from a nested list in python

Using a CSV extract from a registration system, I am attempting to format the data for use as a contact/distribution list import into a virtual meeting application. Using the following function I am able to pull the needed data into a nested list ([[name1, email1], [name2, email2], ...]).
def createDistributionList():
    with open(fileOpen) as readFile, open('test2.txt', 'w') as writeFile:
        data = pd.read_csv(readFile)
        df = pd.DataFrame(data, columns= ['Attendee Name', 'Attendee Email'])
        distList = df.values.tolist()
        print(' '.join(map(str, distList)))
The format I need the data in is one long string - name1(email1);name2(email2);...
I have been unable to get the output that I am looking for. Any assistance or a pointer to a relevant reference would be greatly appreciated.
You can use a list comprehension for that:
tup = (["name1", "email1"], ["name2", "email2"], ["name3", "email3"])
print(";".join(["{}({})".format(l[0], l[1]) for l in tup]))
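Applied to the DataFrame from the question, the same idea might look like the sketch below. The attendee names and e-mail addresses are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the CSV extract.
df = pd.DataFrame(
    {
        "Attendee Name": ["Ann", "Bob"],
        "Attendee Email": ["ann@example.com", "bob@example.com"],
    }
)

# Each row becomes a [name, email] pair, as in the question.
dist_list = df[["Attendee Name", "Attendee Email"]].values.tolist()

# Format each pair as name(email) and join the pieces with semicolons.
result = ";".join("{}({})".format(name, email) for name, email in dist_list)
print(result)
```

Unpacking each pair into name and email avoids indexing with l[0] and l[1] and keeps the format string readable.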

Issues deleting discrete vals from transformed sparse vector list using regex in python

I'm trying to remove all values with indices 1, 2, and 3 from a list of strings like
['1:1', '2:100.0', '3:100.0',...]. The data is in sparse vector format and was loaded as a pandas dataframe. I used an online regex tester to match the first three positions of this list with success.
But as it exists in my program, the same regex doesn't work. On running:
data = pd.read_csv(r"c:\data.csv")
for index, row in data.iterrows():
    line = parseline(row)
def parseline(line):
    line = line.values.flatten() # data like: ['1:1 2:100.0 3:100.0...']
    stringLine = listToString(line) # data like: 1:1 2:100.0 3:100.0...
    splitLine = stringLine.split(" ") # data like: ['1:1', '2:100.0', '3:100.0',...]
    remove = re.findall(r"'1:1'|'[2,3]:\d+.\d+'")
    splitLine.remove(remove)
    print(splitLine)
I get the following error:
TypeError: findall() missing 1 required positional argument: 'string'
Does anyone have any ideas? Thanks in advance.
The splitLine object was a list, but re.findall() (and re.sub(), which is what I ended up using) takes a pattern plus a string to search, not a list. I was simply operating on the wrong data structure. Ultimately:
def parseline(line):
    line = line.values.flatten().tolist()
    stringLine = listToString(line)
    stringLine = re.sub(r"1:1 |2:\d+.\d+ ", "", stringLine)
    ...
did the trick.
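For comparison, here is a self-contained sketch of two ways to drop the entries with indices 1, 2, and 3. The sample line is invented, and the regex is a variant written for this illustration, not the exact pattern from the answer above.

```python
import re

# Hypothetical sample line in the sparse "index:value" format from the question.
string_line = "1:1 2:100.0 3:100.0 4:0.5 11:7"

# Variant 1: regex on the whole string. The lookbehind requires the index to
# start a token, so "11:7" is not mistaken for index 1.
cleaned = re.sub(r"(?<!\S)[123]:\S+\s*", "", string_line)

# Variant 2: split first, then filter tokens by their index part (no regex).
unwanted = {"1", "2", "3"}
kept = [tok for tok in string_line.split(" ") if tok.split(":")[0] not in unwanted]
```

The second variant works directly on the list, which sidesteps the string-versus-list confusion described above.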

AttributeError: 'tuple' object has no attribute 'value' PYTHON TO EXCEL

I am currently trying to write data to an excel spreadsheet using Python3 and openpyxl. I have figured out how to do it when assigning one single value, but for some reason when I introduce a For loop it is giving me an error. This program will eventually be used to filter through a python dictionary and print out the keys and values from the python dictionary. For now, I am just trying to create a loop that will input a random integer in the spreadsheet for every key listed in the dictionary (not including nested keys). If anyone can help me determine why this error is coming up it would be much appreciated. Thanks in advance!
# Writing Dictionary to excel spreadsheet
import openpyxl
import random
wb = openpyxl.load_workbook("ExampleSheet.xlsx")
sheet = wb.get_sheet_by_name("Sheet1")
sheet["B1"].value = "Price" #This works and assigns the B1 value to "price" in the spreadsheet
my_dict = {'key': {'key2' : 'value1', 'key3' : 'value2'} 'key4' : {'key5' : 'value3', 'key6' : 'value4'}} #an example dictionary
n = len(my_dict)
for i in range(0,n):
    sheet['A'+ str(i)].value = random.randint(1,10) #This does not work and gives an error
wb.save('ExampleSheet.xlsx')
OUTPUT >>> AttributeError: 'tuple' object has no attribute 'value'
Row numbers in openpyxl are one-based, so 'A0' is not a valid cell reference. If you change your loop to start at 1, for example range(1, n + 1), your issue should be resolved.
Using .format(i) instead of 'A' + str(i) in your code may also work well.
By the way, your my_dict literal raises a syntax error (a comma is missing between the 'key' and 'key4' entries).
For example:
for i in range(1, 11):
    sheet['A{}'.format(i)].value = 'xx'
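Putting both answers together, a minimal sketch might look like the following. It assumes a fresh in-memory Workbook instead of loading ExampleSheet.xlsx, and it writes each top-level key next to its random integer; both choices are illustrative rather than taken from the question.

```python
import random

from openpyxl import Workbook

# In-memory workbook used here instead of the question's ExampleSheet.xlsx.
wb = Workbook()
sheet = wb.active

# The example dictionary from the question, with the missing comma added.
my_dict = {
    "key": {"key2": "value1", "key3": "value2"},
    "key4": {"key5": "value3", "key6": "value4"},
}

# Rows are 1-based, so start counting at 1 to avoid the invalid "A0" reference.
for row, key in enumerate(my_dict, start=1):
    sheet["A{}".format(row)].value = key
    sheet["B{}".format(row)].value = random.randint(1, 10)
```

Calling wb.save with a file name afterwards would write the result to disk.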

AttributeError: 'numpy.ndarray' object has no attribute 'rolling'

When I am trying to do MA or rolling average with log transformed data I get this error. Where am I going wrong?
This version, with the original data, worked fine:
# Rolling statistics
rolmean = data.rolling(window=120).mean()
rolSTD = data.rolling(window=120).std()
With the log-transformed data it fails:
MA = X.rolling(window=120).mean()
MSTD = X.rolling(window=120).std()
AttributeError: 'numpy.ndarray' object has no attribute 'rolling'
You have to convert the NumPy array to a pandas DataFrame (or Series) to use the rolling method.
The change could look something like this:
dataframe = pd.DataFrame(data)
rolmean = dataframe.rolling(120).mean()
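A self-contained sketch of that conversion follows. The input values and the window of 3 are invented to keep the example small, whereas the question uses a window of 120.

```python
import numpy as np
import pandas as pd

# Hypothetical original data as a pandas Series.
data = pd.Series(np.arange(1.0, 11.0))

# Taking the log of the underlying array yields a plain ndarray,
# which has no .rolling method.
X = np.log(data.to_numpy())

# Wrap the array in a Series (reusing the original index) to get .rolling back.
X_series = pd.Series(X, index=data.index)
MA = X_series.rolling(window=3).mean()
MSTD = X_series.rolling(window=3).std()
```

Calling np.log directly on a Series or DataFrame returns a pandas object as well, so keeping the data in pandas throughout avoids the conversion entirely.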
Try this instead:
numpy.roll(your_array, shift, axis = None)
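NumPy arrays have no rolling attribute. Note, however, that numpy.roll only shifts the elements circularly along an axis; it does not compute a windowed mean or standard deviation, so converting to a pandas object is still needed for a moving average.
Hope this helps.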
