Pandas File Update/Replace values from another reference file - python-3.x

Please help me update a file based on values from another file.
The file I received is "todays_file1.csv" and contains the table below:
name day a_col b_col c_col
alex 22-05 rep 68 67
stacy 22-05 sme 79 81
penny 22-05 rep 74 77
gabbi 22-05 rep 59 61
I need to copy the values from only the ['day', 'b_col', 'c_col'] columns into the second file, "my_file.csv", which has many other columns:
name day a_col a_foo b_col b_foo c_col
penny 21-May rep 2 69 31 69
alex 21-May rep 2 71 34 62
gabbi 21-May rep 1 62 32 66
stacy 21-May sme 3 73 38 78
The code I have so far is below:
df1 = pd.read_csv("todays_file1.csv")
df2 = pd.read_csv("my_file.csv")
df2.replace(to_replace=df2['day', 'b_col', 'c_col'], value= df1['day', 'b_col', 'c_col'], inplace=True)
Please help me replace these 3 columns based on the 'name' column, which is common to both files but may be in a different row order.
I get the error below:
Traceback (most recent call last):
File "D:\TESTING\Trial.py", line 93, in <module>
df2.replace(to_replace=df2['day', 'b_col', 'c_col'], value= df1['day', 'b_col', 'c_col'], inplace=True)
File "C:\Winpy\WPy64-3770\python-3.7.7.amd64\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Winpy\WPy64-3770\python-3.7.7.amd64\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('day', 'b_col', 'c_col')

"anky" has provided the solution through the comments, and I am ever grateful.
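The code below solves the problem.
import pandas as pd
df1 = pd.read_csv("todays_file1.csv")
df2 = pd.read_csv("my_file.csv")
# set_index returns a new DataFrame, so the result must be assigned back
df1 = df1.set_index('name')
df2 = df2.set_index('name')
# update aligns rows on the 'name' index; restrict df1 to the columns to copy
df2.update(df1[['day', 'b_col', 'c_col']])
# 'name' is now the index, so restore it as a column before writing
df2.reset_index().to_csv("my_file.csv", index=False)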
Thank you again Anky :)
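For anyone who wants to try the idea without the CSV files, here is a minimal sketch that rebuilds the two tables from the question in memory. It assumes 'name' is unique in both tables; the variable name cols is just for illustration.

```python
import pandas as pd

# Today's values (same rows as todays_file1.csv in the question).
df1 = pd.DataFrame({
    "name": ["alex", "stacy", "penny", "gabbi"],
    "day": ["22-05", "22-05", "22-05", "22-05"],
    "a_col": ["rep", "sme", "rep", "rep"],
    "b_col": [68, 79, 74, 59],
    "c_col": [67, 81, 77, 61],
})

# The file to be updated (same rows as my_file.csv in the question).
df2 = pd.DataFrame({
    "name": ["penny", "alex", "gabbi", "stacy"],
    "day": ["21-May", "21-May", "21-May", "21-May"],
    "a_col": ["rep", "rep", "rep", "sme"],
    "a_foo": [2, 2, 1, 3],
    "b_col": [69, 71, 62, 73],
    "b_foo": [31, 34, 32, 38],
    "c_col": [69, 62, 66, 78],
})

cols = ["day", "b_col", "c_col"]  # only these columns are copied over

# Index both frames by 'name' so update() matches rows by name, not position.
df2 = df2.set_index("name")
df2.update(df1.set_index("name")[cols])

# Turn 'name' back into an ordinary column; df2 keeps its original row order.
df2 = df2.reset_index()
print(df2)
```

The row order of df2 and the columns a_col, a_foo and b_foo are left untouched; only the three listed columns take the values from df1.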

Related

Applying function on columns of pandas data.frame is generating error

Let's say I have the pandas DataFrame below:
>>> Data
Col1 Col2
53 08.02.2020 2020-02-14
55 01.02.2020 2020-02-13
335 30.01.2020 2020-02-14
365 14.02.2020 2020-02-16
446 11.02.2020 2020-02-15
476 03.02.2020 2020-02-18
504 08.02.2020 2020-02-10
557 01.02.2020 2020-02-15
668 10.02.2020 2020-02-15
756 07.02.2020 2020-02-08
Next, I have the function below:
is_ten_char = lambda x: x.str.len().eq(10)
But applying this function to the columns to check the number of characters raises an error:
Data[is_ten_char(Data.Col1) & is_ten_char(Data.Col2)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <lambda>
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/generic.py", line 5270, in __getattr__
return object.__getattribute__(self, name)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/accessor.py", line 187, in __get__
accessor_obj = self._accessor(obj)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/strings.py", line 2041, in __init__
self._inferred_dtype = self._validate(data)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/strings.py", line 2098, in _validate
raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!
Any pointers on what is going wrong here would be very helpful.
Col1 is clearly not in a datetime format, as shown.
Col2 is probably a datetime column, so to compare it as a string, do the following:
is_ten_char = lambda x: x.str.len().eq(10)
Data[is_ten_char(Data.Col1) & is_ten_char(Data.Col2.dt.strftime('%Y-%m-%d'))]
However, this does not convert Col2 itself to a string:
print(Data['Col2'][53]) >>> Timestamp('2020-02-14 00:00:00')
If you want Col2 converted to a string, use a four-digit year (%Y) so that each value has 10 characters:
Data.Col2 = Data.Col2.dt.strftime('%Y-%m-%d')
Then use the original code.
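A small runnable sketch of the same check, using the first three rows from the question and assuming Col1 holds strings and Col2 holds datetimes (the name col2_as_text is only for illustration):

```python
import pandas as pd

# First three rows of the question's data; Col2 is built as a datetime column.
Data = pd.DataFrame(
    {
        "Col1": ["08.02.2020", "01.02.2020", "30.01.2020"],
        "Col2": pd.to_datetime(["2020-02-14", "2020-02-13", "2020-02-14"]),
    },
    index=[53, 55, 335],
)

is_ten_char = lambda x: x.str.len().eq(10)

# .str only works on text, so format the datetimes as text first.
col2_as_text = Data["Col2"].dt.strftime("%Y-%m-%d")

result = Data[is_ten_char(Data["Col1"]) & is_ten_char(col2_as_text)]
print(result)
```

Calling is_ten_char(Data["Col2"]) directly on the datetime column is what raises the AttributeError from the question.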

pandas/_libs/index.pyx KeyError: 'United States'

I was looking up the row whose index label is 'United States', and I get this error
when I try to assign the result to a new DataFrame. But I can print it. Any ideas? Thanks.
This gives a KeyError: df = df.loc[country.strip(), :].to_frame()
The label is clearly in the index: United States
Traceback (most recent call last):
File "/Users/feiwhang/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 133, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine._get_loc_duplicates
File "pandas/_libs/index.pyx", line 180, in pandas._libs.index.IndexEngine._maybe_get_bool_indexer
KeyError: 'United States'
But I can print it:
print(df.loc[country.strip(), :].to_frame())
United States
Confirmed 7783
Recovered 0
Death 118
The problem is the data.
I had the same problem because my data consisted of both integers and floats.
An array or matrix can hold only one type of data,
so it can be fixed by changing the data to be all floats or all integers.
I updated the original Excel file to floats, saved it as CSV (for some reason it won't update my file to floats), and ran your program.
Hope it helps.
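As a rough sketch of that suggestion: the conversion can also be done in pandas instead of in Excel. This assumes a frame indexed by country name with the three columns from the question's output; the second row and the decimal value are made up purely for illustration, and whether this cures the original KeyError depends on the real data.

```python
import pandas as pd

# 'United States' values are from the question; 'Elsewhere' is a made-up row.
df = pd.DataFrame(
    {"Confirmed": [7783, 100], "Recovered": [0, 12.5], "Death": [118, 3]},
    index=["United States", "Elsewhere"],
)

# Give every column the same numeric type, as the answer suggests.
df = df.astype(float)

country = " United States "
# Keep the result under a new name so df itself is not overwritten.
row = df.loc[country.strip(), :].to_frame()
print(row)
```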

OpenCV Image Denoising gives: Error: -215:Assertion failed

Trying to denoise a really simple image, using the code below. When printing out the array of data I get the following structure, which is expected as the image is greyscale:
[[ 62 62 63 ... 29 16 6]
[ 75 90 103 ... 21 16 12]
[ 77 100 118 ... 29 29 30]
...
[ 84 68 56 ... 47 50 53]
[101 94 89 ... 40 44 48]
Here is the code and the associated error. At this point I'm a little stuck. Any suggestions?
import cv2
from matplotlib import pyplot as plt
img = cv2.imread(path,0)
dst = cv2.fastNlMeansDenoising(img,None,10,10,7,21)
plt.subplot(211),plt.imshow(dst)
plt.subplot(212),plt.imshow(img)
plt.show()
____________________________________________________________________
runfile(___, wdir='G:/James Alexander/Python Programs')
Traceback (most recent call last):
File "<ipython-input-127-ce832752c183>", line 1, in <module>
runfile('G:/James Alexander/Python Programs/Noiseremoval.py', wdir=___)
File "___", line 704, in runfile
execfile(filename, namespace)
File "___", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "___", line 13, in <module>
dst = cv2.fastNlMeansDenoising(img,None,10,10,7,21)
error: OpenCV(4.1.0) C:\projects\opencv-python\opencv\modules\photo\src\denoising.cpp:120: error: (-215:Assertion failed) hn == 1 || hn == cn in function 'cv::fastNlMeansDenoising'
Read the documentation for the denoising function that you're using. There are two ways to call the function, and you seem to be using a combination of the two:
dst = cv.fastNlMeansDenoising(src[, dst[, h[, templateWindowSize[, searchWindowSize]]]])
or
dst = cv.fastNlMeansDenoising(src, h[, dst[, templateWindowSize[, searchWindowSize[, normType]]]])
You are calling it with (src, dst, h, templateWindowSize, searchWindowSize, normType), which either has too many parameters or has them in the wrong order, depending on which form you want to use.
Change your call to:
dst = cv2.fastNlMeansDenoising(img, None, 30, 7, 21)

Why can't I select one column from a DataFrame with print(df['column1']) this time?

I can select one column from a DataFrame; for example, print(df['201809']) works:
df = pd.read_csv('xxxx.csv', low_memory=False)
print(df.info())
<class 'pandas.core.frame.DataFrame'>
Int64Index: 11 entries, 0 to 10
Data columns (total 4 columns):
BO_product2 11 non-null object
201808 11 non-null float64
201809 11 non-null float64
4 11 non-null float64
dtypes: float64(3), object(1)
memory usage: 440.0+ bytes
print(df['201809']) # works fine
None
0 1.634931e+06
1 2.653640e+08
2 7.475315e+07
3 9.710830e+06
4 3.023899e+08
5 1.087862e+08
6 2.031106e+08
7 3.556234e+08
8 5.830665e+06
9 8.766841e+08
10 7.544689e+07
Name: 201809, dtype: float64
However, print(df['4']) doesn't. Any tips or ideas?
PS: if I save the DataFrame to a local CSV file with df.to_csv('yy.csv'), print(df['4']) works after df = pd.read_csv('yy.csv').
print(df['4'])
Traceback (most recent call last):
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 3063, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '4'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:/Python/2.py", line 45, in <module>
he()
File "E:/Python/2.py", line 26, in he
print(a['4'])
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2685, in __getitem__
return self._getitem_column(key)
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2692, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 2486, in _get_item_cache
values = self._data.get(item)
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "C:\Users\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 3065, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '4'
If you execute the below:
[type(i) for i in df.columns]
#[str, str, str, int]
For columns whose label is an int, you should select the column with df[4] and not df['4'].
After the round trip through to_csv and read_csv, the label is read back from the header row as the string '4', which is why df['4'] works then. The quoting option may also play a part in how values are written as strings. From the docs:
quoting : optional constant from csv module
defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric
Hope this helps.
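A short sketch of the difference, using a made-up two-row frame with one string label and one int label (the names label_types, reloaded and buffer are only for illustration):

```python
import io

import pandas as pd

# One column labelled with a string, one labelled with the integer 4.
df = pd.DataFrame({"201809": [1.0, 2.0], 4: [3.0, 4.0]})

label_types = [type(c) for c in df.columns]
print(label_types)

by_int = df[4]  # works; df['4'] would raise KeyError here

# Round trip through CSV text: header labels come back as strings.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
by_str = reloaded["4"]  # now the string label is the one that works

# Optional: make every label a string up front so df['4'] always works.
df.columns = [str(c) for c in df.columns]
```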

read one value from read_csv error

tst = pd.read_csv('/Users/me/Desktop/stuff/Et2Load.csv', header=0,delimiter="\t", quoting=3)
print(tst.head(2)) # ok
#print(tst['date'][0])
I made up this file with a one-line header, 3 columns, and 2 data lines:
id,date,coldata
0 1,August 18 2016,"With all this stuff going do...
1 2,August 19 2016,this is a great movie. The mu...
I cannot access a specific "cell":
print(tst['date'][0]) gives this error:
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'date'
Well, this is the big secret:
set_index('<name of the column to be used as the id>')
It was difficult to find this one.
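A separate observation, offered as a guess from the head() output in the question rather than as part of the answer above: each row appears as one comma-joined field, which suggests the file is comma-separated while read_csv was told delimiter="\t". The sketch below uses made-up file contents with the same header to show the effect.

```python
import io

import pandas as pd

# Made-up contents with the question's header; the real file is not shown in full.
raw = (
    "id,date,coldata\n"
    '1,August 18 2016,"first review, with a comma"\n'
    "2,August 19 2016,this is a great movie\n"
)

# Tab delimiter on comma-separated text: everything lands in a single column.
wrong = pd.read_csv(io.StringIO(raw), header=0, delimiter="\t", quoting=3)
print(list(wrong.columns))

# Default comma delimiter: three columns, so 'date' exists.
right = pd.read_csv(io.StringIO(raw), header=0)
print(right["date"][0])
```

In the first frame the only column label is the whole header line, so tst['date'] has nothing to find.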
