Append numpy arrays with unequal rows in a loop - python-3.x

I want to append several arrays of different sizes. However, I don't want to merge them together, just store them in one big list. Here is a simplified version of my code which tries to reproduce my problem:
import numpy as np

total_wavel = 5
tot_values = []
for i in range(total_wavel):
    size = int(np.random.uniform(low=2, high=7))
    values = np.array(np.random.uniform(low=1, high=6, size=(size,)))
    tot_values = np.append(tot_values, values)
Example output:
array([4.88776545, 4.86006097, 1.80835575, 3.52393214, 2.88971373,
1.62978552, 4.06880898, 4.10556672, 1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801, 1.7403673 ,
4.90459377, 3.44198867, 5.03055533, 3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])
Expected Output :
np.array([np.array([4.88776545, 4.86006097, 1.80835575, 3.52393214]),
          np.array([2.88971373, 1.62978552, 4.06880898, 4.10556672]),
          np.array([1.33428321, 3.81505999, 3.95533471, 2.18424975, 5.15665168, 5.38251801]),
          np.array([1.7403673 , 4.90459377, 3.44198867, 5.03055533]),
          np.array([3.96271897, 1.93934124, 5.60657218, 1.24646798, 3.14179412])])
Or
np.array([[4.88776545, 4.86006097, 1.80835575, 3.52393214],
          [2.88971373, 1.62978552, 4.06880898, 4.10556672],
          [1.33428321, 3.81505999, 3.95533471, 2.18424975, 5.15665168, 5.38251801],
          [1.7403673 , 4.90459377, 3.44198867, 5.03055533],
          [3.96271897, 1.93934124, 5.60657218, 1.24646798, 3.14179412]])
Thank you in advance

In the loop, use tot_values.append(list(values)); after the loop, build the ragged array with tot_np = np.array(tot_values, dtype=object).
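A runnable sketch of this approach (variable names follow the question; here the arrays themselves are appended so the sub-arrays stay intact):

```python
import numpy as np

total_wavel = 5
tot_values = []  # a plain Python list, not a numpy array
for i in range(total_wavel):
    size = int(np.random.uniform(low=2, high=7))
    values = np.random.uniform(low=1, high=6, size=(size,))
    tot_values.append(values)  # keep each array separate instead of np.append

# dtype=object lets numpy hold sub-arrays of different lengths
tot_np = np.array(tot_values, dtype=object)
print(len(tot_np))  # 5 sub-arrays of differing lengths
```

Unlike np.append, which flattens everything into one 1-D array, the list keeps each draw separate until the final object-array conversion.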

How to read in pandas column as column of lists?

Probably a simple solution, but I couldn't find a fix scrolling through previous questions, so I thought I would ask.
I'm reading in a csv using pd.read_csv(). One column is giving me issues:
0 ['Bupa', 'O2', 'EE', 'Thomas Cook', 'YO! Sushi...
1 ['Marriott', 'Evans']
2 ['Toni & Guy', 'Holland & Barrett']
3 []
4 ['Royal Mail', 'Royal Mail']
It looks fine here, but when I reference the first value in the column I get:
df['brand_list'][0]
Out : '[\'Bupa\', \'O2\', \'EE\', \'Thomas Cook\', \'YO! Sushi\', \'Costa\', \'Starbucks\', \'Apple Store\', \'HMV\', \'Marks & Spencer\', "Sainsbury\'s", \'Superdrug\', \'HSBC UK\', \'Boots\', \'3 Store\', \'Vodafone\', \'Marks & Spencer\', \'Clarks\', \'Carphone Warehouse\', \'Lloyds Bank\', \'Pret A Manger\', \'Sports Direct\', \'Currys PC World\', \'Warrens Bakery\', \'Primark\', "McDonald\'s", \'HSBC UK\', \'Aldi\', \'Premier Inn\', \'Starbucks\', \'Pizza Hut\', \'Ladbrokes\', \'Metro Bank\', \'Cotswold Outdoor\', \'Pret A Manger\', \'Wetherspoon\', \'Halfords\', \'John Lewis\', \'Waitrose\', \'Jessops\', \'Costa\', \'Lush\', \'Holland & Barrett\']'
Which is obviously a string not a list as expected. How can I retain the list type when I read in this data?
I've tried the import ast method I've seen in other posts, df['brand_list_new'] = df['brand_list'].apply(lambda x: ast.literal_eval(x)), which didn't work.
I've also tried to replicate with dummy dataframes:
df1 = pd.DataFrame({'a': [['test', 'test1', 'test3'], ['test59'], ['test'], ['rhg', 'wreg']],
                    'b': [['erg', 'retbn', 'ert', 'eb'], ['g', 'eg', 'egr'], ['erg'], 'eg']})
df1['a'][0]
Out: ['test', 'test1', 'test3']
Which works as I would expect. This suggests to me that the solution lies in how I am importing the data.
Apologies, I was being stupid. The following should work:
import ast
df['brand_list_new'] = df['brand_list'].apply(lambda x: ast.literal_eval(x))
df['brand_list_new'][0]
Out: ['Bupa','O2','EE','Thomas Cook','YO! Sushi',...]
As desired
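An alternative sketch: parse the column at read time with the converters argument of pd.read_csv, so the list type is restored in one step. The inline csv text and column name below are stand-ins for the real file:

```python
import ast
import io
import pandas as pd

# stand-in for the real csv file from the question
csv_data = 'brand_list\n"[\'Bupa\', \'O2\', \'EE\']"\n"[\'Marriott\', \'Evans\']"\n"[]"\n'

# converters applies the parser to each cell as the file is read
df = pd.read_csv(io.StringIO(csv_data), converters={'brand_list': ast.literal_eval})
print(df['brand_list'][0])  # ['Bupa', 'O2', 'EE'] -- a real list, not a string
```

This avoids the extra post-processing column entirely; the trade-off is that a single malformed cell will raise inside read_csv rather than afterwards.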

Convert a deeply nested list to csv in Python

I wanted to write a deeply nested list (lists within lists within lists) to csv, but it always collapses my list and prints it with ..., so I'm unable to retrieve the hidden values.
The list stores frames of videos and goes up to 5 layers deep (length of each layer in parentheses): videos (number of videos) > frames (8) > width (200) > height (200) > pixels of 3 channels (3).
I tried converting the list to a data frame before writing it to csv, but was still unable to solve this problem.
"[array([[[0.23137255, 0.26666668, 0.27058825],
[0.23921569, 0.27450982, 0.2784314 ],
[0.23529412, 0.27058825, 0.27450982],
...,
[0.25882354, 0.29411766, 0.2901961 ],
[0.25490198, 0.2901961 , 0.28627452],
[0.25490198, 0.2901961 , 0.28627452]],
[[0.20392157, 0.23921569, 0.24313726],
[0.21568628, 0.2509804 , 0.25490198],
[0.21568628, 0.2509804 , 0.25490198],
...,
[0.26666668, 0.3019608 , 0.29803923],
[0.26666668, 0.3019608 , 0.29803923],
[0.2627451 , 0.29803923, 0.29411766]],
[[0.1882353 , 0.22352941, 0.22745098],
[0.2 , 0.23529412, 0.23921569],
[0.20392157, 0.23921569, 0.24313726],
...,
[0.27450982, 0.30980393, 0.30588236],
[0.27058825, 0.30588236, 0.3019608 ],
[0.27058825, 0.30588236, 0.3019608 ]],
...,
I'd try one of the following:
Dump the whole object into json:
import numpy as np
import json

with open('my_saved_file.json', 'w+') as out_file:
    # numpy arrays are not json-serializable; .tolist() converts them to plain lists
    json.dump([np.asarray(video).tolist() for video in list_of_lists_of_lists],
              out_file, indent=2)
What I'd also try is storing all of your images as actual image files and referencing them in an index (which could be a csv):
import numpy as np
from PIL import Image

with open('reference.csv', 'w+') as out_csv:
    out_csv.write("video,frame_set,frame1,frame2,frame3,frame4,frame5,frame6,frame7,frame8\n")
    for video_no, video in enumerate(list_of_lists_of_lists):
        for frame_set_no, frames in enumerate(video):
            row = [str(video_no), str(frame_set_no)]
            for frame_no, frame in enumerate(frames):
                # pixel values in the question are floats in [0, 1]; scale to uint8 for JPEG
                im = Image.fromarray((np.asarray(frame) * 255).astype(np.uint8))
                frame_name = f"{video_no}-{frame_set_no}-{frame_no}.jpeg"
                row.append(frame_name)
                im.save(frame_name)
            out_csv.write(",".join(row) + "\n")
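Since the data is a dense numeric 5-D structure, another option worth sketching is to skip csv entirely and save the whole thing as one numpy array. The shapes and file name below are assumptions based on the dimensions described in the question:

```python
import numpy as np

# toy stand-in: 2 videos x 8 frames x 200 width x 200 height x 3 channels
videos = np.random.rand(2, 8, 200, 200, 3).astype(np.float32)

np.save('videos.npy', videos)   # binary, lossless, nothing truncated
restored = np.load('videos.npy')
print(restored.shape)  # (2, 8, 200, 200, 3)
```

Unlike csv, the .npy format preserves shape and dtype exactly, so no values are hidden behind ... on the way back in.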

Why doesn't this code print a single value?

We have the eng_stress and eng_strain arrays, taken from an excel file:
eng_stress = np.array(eng_stress)
eng_strain = np.array(strain_percent / 100)
eng_strain = eng_strain + 1
true_stress = np.multiply(eng_stress, eng_strain)
true_strain = np.log(eng_strain)
print(true_stress[10])
When I try to access a certain index, something like the following happens instead of a single value:
[466.12834181 466.2044319  466.27916323 466.35480041 466.43043758
 466.50562183 466.58125901 466.65689618 466.73208043 466.80771761
 ...
 479.3390307  479.41376203 479.48985212 479.5650363......... 532]
Maybe eng_stress is a 2D array?
Try:
print(eng_stress.shape)
to find out the shape of the arrays you are working with :)
If your array has the shape (X,1) then it might be in the wrong direction and you could do a quick fix by changing your code to:
eng_stress = np.array(eng_stress).T[0]
eng_strain = np.array(strain_percent / 100)
eng_strain = eng_strain + 1
true_stress = np.multiply(eng_stress, eng_strain)
true_strain = np.log(eng_strain)
print(true_stress[10])
Your numpy arrays may be 2-dimensional. That's why it's printing an array rather than a value. To access a single value of column x, try print(true_stress[10][x]).
The other thing you can do is make sure you multiply two 1-D numpy arrays; in that case, indexing the result gives a single value.
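A small sketch of why the symptom appears (the values here are made up; only the shapes matter): if eng_stress comes out of the spreadsheet as a column with shape (N, 1), multiplying it by a length-N eng_strain broadcasts to an (N, N) matrix, so true_stress[10] is a whole row rather than one number:

```python
import numpy as np

eng_stress = np.arange(5.0).reshape(5, 1)  # column vector, shape (5, 1)
eng_strain = np.linspace(1.0, 1.4, 5)      # 1-D array, shape (5,)

true_stress = np.multiply(eng_stress, eng_strain)
print(true_stress.shape)   # (5, 5) -- broadcasting, not elementwise
print(true_stress[2])      # a whole row of 5 values

# the fix suggested above: flatten the column first
true_stress_fixed = np.multiply(eng_stress.T[0], eng_strain)
print(true_stress_fixed.shape)  # (5,)
print(true_stress_fixed[2])     # a single number
```

Checking .shape before multiplying, as suggested, is the quickest way to spot this.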

Numpy Array value setting issues

I have a data set that spans a certain length of time, with data points for each of these time points. I would like to create a much more detailed timescale and fill the empty data points with zero. I wrote a piece of code to do this, but it isn't doing what I want. I tried a sample case, though, and it seems to work. Below are the two pieces of code.
This piece of code does not do what I want it to.
import numpy as np
TD_t = np.array([36000, 36500, 37000, 37500, 38000, 38500, 39000, 39500, 40000, 40500, 41000, 41500, 42000, 42500,
43000, 43500, 44000, 44500, 45000, 45500, 46000, 46500, 47000, 47500, 48000, 48500, 49000, 49500,
50000, 50500, 51000, 51500, 52000, 52500, 53000, 53500, 54000, 54500, 55000, 55500, 56000, 56500,
57000, 57500, 58000, 58500, 59000, 59500, 60000, 60500, 61000, 61500, 62000, 62500, 63000, 63500,
64000, 64500, 65000, 65500, 66000])
TD_d = np.array([-0.05466527, -0.04238242, -0.04477601, -0.02453717, -0.01662798, -0.02548617, -0.02339215,
-0.01186576, -0.0029057 , -0.01094671, -0.0095005 , -0.0190277 , -0.01215644, -0.01997112,
-0.01384497, -0.01610656, -0.01927564, -0.02119056, -0.011634 , -0.00544096, -0.00046568,
-0.0017769 , -0.0007341, 0.00193066, 0.01359107, 0.02054919, 0.01420335, 0.01550565,
0.0132394 , 0.01371563, 0.01959774, 0.0165316 , 0.01881992, 0.01554435, 0.01409003,
0.01898334, 0.02300266, 0.03045158, 0.02869013, 0.0238423 , 0.02902356, 0.02568908,
0.02954539, 0.02537967, 0.02927247, 0.02138605, 0.02815635, 0.02733237, 0.03321588,
0.03063803, 0.03783137, 0.04110955, 0.0451221 , 0.04646263, 0.04472884, 0.04935833,
0.03372911, 0.04031406, 0.04165237, 0.03940343, 0.03805504])
time = np.arange(0, 100001, 1)
data = np.zeros_like(time)
for i in range(0, len(TD_t)):
    t = TD_t[i]
    data[t] = TD_d[i]
    print(i, t, TD_d[i], data[t])
But for some reason this code works.
import numpy
nums = numpy.array([0,1,2,3])
data = numpy.zeros_like(nums)
data[0] = nums[2]
data[0], nums[2]
Any help will be much appreciated!!
It's because the dtype of data is being set to int64 (inherited from time), so when you assign a float to one of the data elements, it gets truncated to zero.
Try changing the line to:
data = np.zeros_like(time, dtype=float)
and it should work (or use whatever dtype the TD_d array is)
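A two-line sketch of the failure mode, for anyone hitting the same thing:

```python
import numpy as np

time = np.arange(5)
data_int = np.zeros_like(time)   # inherits the integer dtype of time
data_int[2] = -0.054             # silently truncated to an int
print(data_int[2])               # 0

data_float = np.zeros_like(time, dtype=float)  # the fix from the answer
data_float[2] = -0.054
print(data_float[2])             # -0.054
```

zeros_like copies the dtype of its argument by design, which is why the sample case with integer nums worked while the float TD_d data vanished.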

python - cannot make corr work

I'm struggling to get a simple correlation done. I've tried everything that was suggested under similar questions.
Here are the relevant parts of the code, the various attempts I've made and their results.
import numpy as np
import pandas as pd
try01 = data[['ESA Index_close_px', 'CCMP Index_close_px' ]].corr(method='pearson')
print (try01)
Out:
Empty DataFrame
Columns: []
Index: []
try04 = data['ESA Index_close_px'][5:50].corr(data['CCMP Index_close_px'][5:50])
print (try04)
Out:
**AttributeError: 'float' object has no attribute 'sqrt'**
using numpy
try05 = np.corrcoef(data['ESA Index_close_px'],data['CCMP Index_close_px'])
print (try05)
Out:
AttributeError: 'float' object has no attribute 'sqrt'
converting the columns to lists
ESA_Index_close_px_list = list()
start_value = 1
end_value = len(data['ESA Index_close_px']) + 1
for items in data['ESA Index_close_px']:
    ESA_Index_close_px_list.append(items)
    start_value = start_value + 1
    if start_value == end_value:
        break
    else:
        continue

CCMP_Index_close_px_list = list()
start_value = 1
end_value = len(data['CCMP Index_close_px']) + 1
for items in data['CCMP Index_close_px']:
    CCMP_Index_close_px_list.append(items)
    start_value = start_value + 1
    if start_value == end_value:
        break
    else:
        continue
try06 = np.corrcoef(['ESA_Index_close_px_list','CCMP_Index_close_px_list'])
print (try06)
Out:
**TypeError: cannot perform reduce with flexible type**
Also tried .astype, but it didn't make any difference:
data['ESA Index_close_px'].astype(float)
data['CCMP Index_close_px'].astype(float)
Using Python 3.5, pandas 0.18.1 and numpy 1.11.1
Would really appreciate any suggestion.
**edit 1:**
Data is coming from an excel spreadsheet
data = pd.read_excel('C:\\Users\\Ako\\Desktop\\ako_files\\for_corr_tool.xlsx')
Prior to the correlation attempts, there are only column renames and
data = data.drop(data.index[0])
to get rid of a line.
regarding the types:
print (type (data['ESA Index_close_px']))
print (type (data['ESA Index_close_px'][1]))
Out:
**edit 2:**
parts of the data:
print (data['ESA Index_close_px'][1:10])
print (data['CCMP Index_close_px'][1:10])
Out:
2 2137
3 2138
4 2132
5 2123
6 2127
7 2126.25
8 2131.5
9 2134.5
10 2159
Name: ESA Index_close_px, dtype: object
2 5241.83
3 5246.41
4 5243.84
5 5199.82
6 5214.16
7 5213.33
8 5239.02
9 5246.79
10 5328.67
Name: CCMP Index_close_px, dtype: object
Well, I encountered the same problem today.
Try using .astype('float64') to make the types correct:
data['ESA Index_close_px'][5:50].astype('float64').corr(data['CCMP Index_close_px'][5:50].astype('float64'))
This works well for me. Hope it can help you as well.
You can try the following:
Top15['Citable docs per capita']=(Top15['Citable docs per capita']*100000)
Top15['Citable docs per capita'].astype('int').corr(Top15['Energy Supply per Capita'].astype('int'))
It worked for me.
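The same idea in a self-contained sketch (the data here is made up, mimicking the object-dtype columns shown in edit 2): corr silently drops non-numeric columns, which is why try01 returned an empty DataFrame. pd.to_numeric is a slightly more robust alternative to .astype('float64') because it can coerce bad cells to NaN instead of raising:

```python
import pandas as pd

# numbers stored as strings -> object dtype, as in the question
data = pd.DataFrame({'ESA Index_close_px':  ['2137', '2138', '2132', '2123', '2127'],
                     'CCMP Index_close_px': ['5241.83', '5246.41', '5243.84', '5199.82', '5214.16']})
print(data.dtypes)  # both object -> .corr() would return an empty DataFrame

for col in data.columns:
    data[col] = pd.to_numeric(data[col], errors='coerce')  # bad cells become NaN

corr = data['ESA Index_close_px'].corr(data['CCMP Index_close_px'])
print(round(corr, 3))
```

Note that .astype(float) alone, as tried in the question, returns a new Series; it must be assigned back (or chained directly into .corr) to have any effect.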