How to create a dataframe with different random numbers on each column? - python-3.x
I'm trying to generate different random numbers for each column, but every column ends up with the same values. How can I fix this, ideally in one line?
CODE:
import random
import pandas as pd

yuju = pd.DataFrame()
column_price_x = [random.uniform(65.5, 140.5) for i in range(20)]
for i in range(1990, 2020):
    yuju[i] = column_price_x  # the same list is assigned to every column
yuju
RESULT: every column contains the same 20 values.
EXPECTED: different values in each column.
How can I deal with it?
It's much easier than you think.
In [12]: import numpy as np
In [13]: df = pd.DataFrame(np.random.rand(5,5))
In [14]: df
Out[14]:
0 1 2 3 4
0 0.463645 0.818606 0.520964 0.016413 0.286529
1 0.701693 0.556813 0.352911 0.738017 0.148805
2 0.899378 0.626350 0.821576 0.917648 0.404706
3 0.985617 0.336138 0.443910 0.690457 0.627859
4 0.121281 0.784853 0.799065 0.102332 0.156317
np.random.rand samples from the standard uniform distribution (over [0, 1)).
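For a different range you can also rescale the [0, 1) output yourself — a minimal sketch, not part of the original answer:

pd.DataFrame(65.5 + (140.5 - 65.5) * np.random.rand(5, 5))  # affine rescale of [0, 1) to [65.5, 140.5)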
Edit
If you want a uniform distribution over a given range, use np.random.uniform:
In [16]: pd.DataFrame(np.random.uniform(low=65.5, high=140.5, size=(5, 5)))
Out[16]:
0 1 2 3 4
0 124.356069 96.718934 100.587485 136.670313 124.134073
1 68.109675 105.677037 86.084935 109.284336 108.393333
2 120.445978 125.036895 92.557137 105.864824 95.297450
3 91.027931 140.040051 94.362951 80.870850 70.106912
4 107.404708 92.472469 84.748544 82.116756 129.313166
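Applied to the original question, this gives the requested one-liner — a sketch assuming 20 rows and year columns 1990-2019, matching the question's loop:

yuju = pd.DataFrame(np.random.uniform(65.5, 140.5, size=(20, 30)), columns=range(1990, 2020))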
Here is the solution: on each iteration you should draw new random numbers, so each column gets its own values.
yuju = pd.DataFrame()
for year in range(1990, 2020):
    yuju[year] = [random.uniform(65.5, 140.5) for _ in range(20)]  # fresh draws per column
yuju
Output:
1990 1991 1992 1993 1994 1995 1996 1997 ...
0 73.117785 104.158470 76.704672 136.295814 106.008801 88.129275 96.843800 118.172649 ... 106.08
1 77.146977 131.584449 112.781430 113.071448 118.806880 140.301281 132.196554 136.222878 ... 74.85
2 67.976294 90.571586 137.313729 126.388545 134.941530 119.544528 119.692859 124.883332 ... 82.48
3 76.577618 102.765745 137.014399 84.696234 70.087628 86.180974 121.070030 87.991356 ... 71.67
4 104.675987 134.869611 120.221701 69.652423 105.650834 107.308007 122.372708 80.037225 ... 90.58
5 107.093326 124.649323 138.961846 84.312784 98.964176 87.691698 120.426266 79.888018 ... 97.46
6 97.375159 97.607740 119.027947 77.545403 81.365235 119.204719 75.426836 132.545121 ... 120.15
7 81.099338 94.315767 123.389789 85.734648 134.746295 99.196135 65.963834 72.895016 ... 135.63
8 129.577824 118.482358 137.838454 83.338883 68.603851 138.657750 85.155046 73.311065 ... 91.12
9 129.321333 134.598491 138.810883 119.487502 75.794849 125.314185 118.499014 126.969947 ... 74.86
10 122.704160 118.282868 114.196318 69.668442 112.237553 68.953530 115.395672 114.560736 ... 88.21
11 112.653109 109.635751 78.470715 81.973892 111.413094 76.918852 76.318205 129.423737 ... 103.06
12 80.984595 136.170595 83.258407 112.248942 96.730922 84.922575 104.984614 127.646325 ... 103.24
13 82.658896 97.066191 95.096705 107.757428 93.767250 93.958438 115.113325 98.931509 ... 105.32
14 85.173060 77.257117 72.668875 87.061919 130.088992 80.001858 104.526423 85.237558 ... 87.86
15 68.428850 79.948204 107.060400 92.962859 133.393354 93.806838 99.258857 138.314982 ... 86.80
16 115.105281 110.567551 119.868457 139.482290 103.235046 128.805920 140.131489 107.568099 ... 98.16
17 71.318147 119.965667 97.135972 90.174975 125.738171 115.655945 86.333461 114.574965 ... 134.80
18 134.000260 121.417473 104.832999 129.277671 139.932955 122.623911 92.369881 109.523118 ... 137.47
19 104.444951 111.712214 130.602922 119.446700 88.256841 110.316280 74.611164 88.364896 ... 115.32
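As a side note (not part of the original answer), if you need reproducible results you can seed the generators first:

import random
import numpy as np

random.seed(42)     # makes random.uniform reproducible across runs
np.random.seed(42)  # likewise for the numpy-based examples above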
Related
Panda returns 50x1 matrix instead of 50x7? (read_csv gone wrong)
I'm quite new to Python. I'm trying to load a .csv file with Pandas, but it returns a 50x1 matrix instead of the expected 50x7. I'm a bit uncertain whether it is because my data contains numbers with "," (although I thought the quotechar argument would solve that problem). EDIT: I should perhaps mention that adding sep=',' doesn't solve the issue. My code looks like this:

df = pd.read_csv('data.csv', header=None, quotechar='"')
print(df.head)
print(len(df.columns))
print(len(df.index))

Any ideas? Thanks in advance. Here is a subset of the data as text:

10-01-2021,813,116927,"2,01",-,-,-
11-01-2021,657,117584,"2,02",-,-,-
12-01-2021,462,118046,"2,03",-,-,-
13-01-2021,12728,130774,"2,24",-,-,-
14-01-2021,17895,148669,"2,55",-,-,-
15-01-2021,15206,163875,"2,81",5,5,"0,0001"
16-01-2021,4612,168487,"2,89",7,12,"0,0002"
17-01-2021,2536,171023,"2,93",717,729,"0,01"
18-01-2021,3883,174906,"3,00",2147,2876,"0,05"

Here is the output of the head function:

                                    0
0   27-12-2020,6492,6492,"0,11",-,-,-
1   28-12-2020,1987,8479,"0,15",-,-,-
2  29-12-2020,8961,17440,"0,30",-,-,-
3  30-12-2020,11477,28917,"0,50",-,-,-
4   31-12-2020,6197,35114,"0,60",-,-,-
5   01-01-2021,2344,37458,"0,64",-,-,-
6   02-01-2021,8895,46353,"0,80",-,-,-
7   03-01-2021,6024,52377,"0,90",-,-,-
8   04-01-2021,2403,54780,"0,94",-,-,-
Using your data I got the expected result (even without quotechar='"'). Could you maybe show us your output?

import pandas as pd
df = pd.read_csv('data.csv', header=None)
print(df)

>             0      1       2     3     4     5       6
> 0  10-01-2021    813  116927  2,01     -     -       -
> 1  11-01-2021    657  117584  2,02     -     -       -
> 2  12-01-2021    462  118046  2,03     -     -       -
> 3  13-01-2021  12728  130774  2,24     -     -       -
> 4  14-01-2021  17895  148669  2,55     -     -       -
> 5  15-01-2021  15206  163875  2,81     5     5  0,0001
> 6  16-01-2021   4612  168487  2,89     7    12  0,0002
> 7  17-01-2021   2536  171023  2,93   717   729    0,01
> 8  18-01-2021   3883  174906  3,00  2147  2876    0,05
You need to define the separator and delimiter, like this:

df = pd.read_csv('data.csv', header=None, sep=',', delimiter=',', quotechar='"')
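As a side note, since the sample data uses a decimal comma inside quoted fields ("2,01"), a sketch like the following would also parse those columns as numbers — decimal and na_values are standard pandas options, though this is untested against the full file:

df = pd.read_csv('data.csv', header=None, quotechar='"',
                 decimal=',',     # treat "2,01" as the float 2.01
                 na_values='-')   # treat bare dashes as missing values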
Organise and create new dataframe from existing one in Python
I have a list like this. ['9_1_152', '9_2_129', '9_3_22', '9_3_140', '10_3_28', '10_3_134', '10_3_147', '10_5_15', '11_3_18', '11_3_32', '11_3_137', '11_4_150', '12_2_13', '12_3_25', '12_3_151', '12_4_138', '13_4_13', '13_4_27', '13_5_139', '13_5_151', '14_4_16', '14_4_30', '14_4_134', '14_4_146', '15_1_92', '15_2_25', '15_2_122', '15_3_11', '15_4_40', '15_4_73', '15_5_197', '15_6_60', '15_6_103', '15_6_210', '16_1_19', '16_1_34', '16_2_8', '16_2_161', '16_4_51', '16_4_61', '16_4_85', '16_4_109', '16_5_73', '16_7_208', '16_8_213', '17_2_77', '17_4_5', '17_4_44', '17_5_30', '17_5_59', '17_5_97', '17_5_111', '17_5_157', '17_6_177', '17_6_189', '17_9_217', '18_1_22', '18_2_177', '18_2_205', '18_3_163', '18_5_11', '18_5_78', '18_5_107', '18_6_55', '18_6_65', '18_6_89', '18_6_98', '19_1_16', '19_1_68', '19_1_121', '19_1_155', '19_2_181', '19_3_77', '19_3_101', '19_4_37', '19_4_89', '19_5_54', '20_1_22', '20_1_131', '20_1_145', '20_2_172', '20_3_49', '20_6_84', '20_6_159', '20_6_217', '21_2_25', '21_2_139', '21_3_66', '21_4_40', '21_4_191', '21_5_204', '21_6_93', '21_6_108', '22_1_49', '22_1_61', '22_1_134', '22_1_160', '22_1_181', '22_4_1', '22_4_93', '22_5_102', '22_5_211', '22_6_196', '22_6_203', '22_7_12', '22_8_22', '23_3_192', '23_5_92', '23_6_122', '23_6_182', '24_1_87', '24_1_137', '24_2_111', '24_4_76', '24_5_1', '24_6_41', '24_7_12', '24_8_22', '25_1_101', '25_1_137', '25_2_10', '25_2_91', '25_4_165', '25_5_68', '25_6_79', '25_6_113', '25_8_217', '26_2_34', '26_2_66', '26_2_82', '26_2_106', '26_2_117', '26_2_214', '26_4_97', '26_6_172', '26_9_197', '26_10_201', '27_2_34', '27_2_86', '27_4_9', '27_5_49', '27_5_63', '27_5_163', '27_5_190', '27_9_209', '27_10_213', '28_1_205', '28_2_17', '28_2_151', '28_4_58', '28_4_113', '28_4_124', '28_5_169', '28_6_69', '29_1_34', '29_1_81', '29_1_134', '29_1_155', '29_1_173', '29_2_51', '29_6_8', '29_6_21', '30_1_8', '30_1_37', '30_1_126', '30_1_164', '30_2_151', '30_4_65', '30_5_83', '30_5_176', '30_6_50', '31_1_19', '31_1_141', '31_2_58', '31_3_81', '31_5_116', '31_6_45', '32_2_45', '32_2_71', '32_2_97', '32_5_87', '32_5_121', '32_6_21', '32_6_166', '33_1_30', '33_1_55', '33_2_17', '33_2_102', '33_2_166', '33_5_6', '33_5_44', '33_6_117', '34_1_4', '34_1_16', '34_1_43', '34_1_75', '34_1_107', '34_1_116', '34_2_139', '34_5_30', '34_5_183', '35_1_12', '35_3_1', '35_3_39', '35_3_52', '35_3_63', '35_3_73', '35_3_91', '35_3_109', '35_3_118', '35_3_159', '35_3_198', '35_3_210', '35_4_82', '35_4_100', '35_4_131', '35_4_171', '35_4_184', '35_4_222', '35_4_229', '35_5_25', '35_5_145', '37_1_145', '37_1_197', '37_2_132', '37_3_8', '37_3_42', '37_3_56', '37_3_85', '37_3_94', '37_3_112', '37_3_122', '37_3_172', '37_3_186', '37_3_204', '37_3_224', '37_4_103', '37_4_160', '37_4_216', '37_5_25', '37_6_74', '39_1_169', '39_2_157', '39_2_189', '39_3_4', '39_3_15', '39_3_70', '39_3_88', '39_3_97', '39_3_115', '39_3_126', '39_3_179', '39_4_54', '39_4_106', '39_4_142', '39_4_198', '39_4_210', '39_5_39', '42_1_30', '42_1_96', '42_1_141', '42_1_189', '42_2_154', '42_2_197', '42_3_4', '42_3_15', '42_3_46', '42_3_59', '42_3_105', '42_3_166', '42_3_217', '42_4_69', '42_4_79', '42_4_117', '42_4_177', '42_4_204', '42_6_129', '53_3_130', '53_3_143', '53_4_34', '53_4_47', '53_4_156', '53_5_20', '54_4_121', '54_6_13', '54_6_36', '54_6_135', '54_6_147', '55_1_112', '55_2_28', '55_2_143', '55_3_156', '55_5_127', '55_7_3', '55_8_14', '56_3_35', '56_4_20', '56_5_133', '56_6_153', '57_2_21', '57_2_125', '57_2_135', '57_2_147', '57_5_35', '58_2_40', '58_4_23', '58_4_127', '58_4_153', 
'58_6_141', '166_1_149', '166_2_30', '175_6_17', '175_6_31', '176_6_26', '180_1_26']

I create a dataframe from this list:

            x
0     9_1_152
1     9_2_129
2      9_3_22
3     9_3_140
4     10_3_28
..        ...
310  166_2_30
311  175_6_17
312  175_6_31
313  176_6_26
314  180_1_26

I split this dataframe:

x[['i','r','p']] = x['x'].str.split('_', expand=True)
x['i'] = pd.to_numeric(x['i'], downcast='integer')
x['r'] = pd.to_numeric(x['r'], downcast='integer')
x['p'] = pd.to_numeric(x['p'], downcast='integer')
print(x)

and obtain this one:

            x    i  r    p
0     9_1_152    9  1  152
1     9_2_129    9  2  129
2      9_3_22    9  3   22
3     9_3_140    9  3  140
4     10_3_28   10  3   28
..        ...  ...  .  ...
310  166_2_30  166  2   30
311  175_6_17  175  6   17
312  175_6_31  175  6   31
313  176_6_26  176  6   26
314  180_1_26  180  1   26

[315 rows x 4 columns]

What I would like to do is create a new dataframe: the new elements are column 'i', the new columns are column 'r', and the new indexes are column 'p'. Like this:

      1   2   3   4   5    6
17                       175
22            9
28           28
129       9
152   9
This might be what you're looking for:

x_pivot = x.pivot_table(index="p", columns="r", values="i", aggfunc="sum", fill_value="")
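For completeness, a minimal end-to-end sketch — the variable data stands in for the full list from the question:

import pandas as pd

data = ['9_1_152', '9_2_129', '9_3_22']  # hypothetical subset of the full list

x = pd.DataFrame({'x': data})
x[['i', 'r', 'p']] = x['x'].str.split('_', expand=True).astype(int)

# rows indexed by p, columns by r, cells filled with i
x_pivot = x.pivot_table(index='p', columns='r', values='i',
                        aggfunc='sum', fill_value='')
print(x_pivot)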
Counting the number of times the values are more than the mean for a specific column in Dataframe
I'm trying to find the number of times the value in a certain column (in this case under "AveragePrice") is more than its mean and median. I calculated them using the below:

mean_AveragePrice = avocadodf["AveragePrice"].mean(axis=0)
median_AveragePrice = avocadodf["AveragePrice"].median(axis=0)

How do I count the number of times the values were more than the mean? Sample of the dataframe:

         Date  AveragePrice  Total Volume  PLU4046    PLU4225  PLU4770  Total Bags
0  27/12/2015          1.33      64236.62  1036.74   54454.85    48.16     8696.87
1  20/12/2015          1.35      54876.98   674.28   44638.81    58.33     9505.56
2  13/12/2015          0.93     118220.22   794.70  109149.67   130.50     8145.35
3  06/12/2015          1.08      78992.15  1132.00   71976.41    72.58     5811.16
4  29/11/2015          1.28      51039.60   941.48   43838.39    75.78     6183.95
5  22/11/2015          1.26      55979.78  1184.27   48067.99    43.61     6683.91
6  15/11/2015          0.99      83453.76  1368.92   73672.72    93.26     8318.86
7  08/11/2015          0.98     109428.33   703.75  101815.36    80.00     6829.22
8  01/11/2015          1.02      99811.42  1022.15   87315.57    85.34    11388.36
So you have the data you need; now you need the test, and np.where will help you out:

import numpy as np

mean_AveragePrice = avocadodf["AveragePrice"].mean(axis=0)
median_AveragePrice = avocadodf["AveragePrice"].median(axis=0)

# 1 where the price exceeds both the mean and the median, 0 otherwise
where_bigger = np.where((avocadodf["AveragePrice"] > mean_AveragePrice) &
                        (avocadodf["AveragePrice"] > median_AveragePrice), 1, 0)
where_bigger.sum()  # count of rows satisfying both conditions
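If you only need the count of values above the mean (the question as stated), a plain boolean sum is a simpler sketch of the same idea:

above_mean = (avocadodf["AveragePrice"] > mean_AveragePrice).sum()      # True counts as 1
above_median = (avocadodf["AveragePrice"] > median_AveragePrice).sum()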
Find the shortest paths between connected nodes (csv files)
I'm trying to find the longest shortest path(s) between 2 counties. I was given 2 .txt files, one with all of the nodes (county ID, population, latitude and longitude, and commuters inside the county) and one with the links (source county, destination county, distance, number of commuters).

01001 43671 32.523283 -86.577176 7871
01003 140415 30.592781 -87.748260 45208
01005 29038 31.856515 -85.331312 8370
01007 20826 33.040054 -87.123243 3199
01009 51024 33.978461 -86.554768 8966
01011 11714 32.098285 -85.704915 2237
01013 21399 31.735884 -86.662232 5708
01015 112249 33.741989 -85.817544 39856
01017 36583 32.891233 -85.288745 9281
01019 23988 34.184158 -85.621930 4645
01021 39593 32.852554 -86.689982 8115
01023 15922 32.027681 -88.257855 3472
01025 27867 31.688155 -87.834164 7705
...

01001 01001 0 7871
01001 01007 76.8615966430995 7
01001 01013 87.9182871130127 37
01001 01015 152.858742124667 5
01001 01021 38.1039665382023 350
01001 01031 140.051395101308 8
01001 01037 57.6726084645634 12
01001 01047 48.517875245493 585
01001 01051 38.9559472915165 741
01001 01053 169.524277177911 5
01001 01059 245.323879285783 7
01001 01065 102.775324022097 2
01001 01073 114.124721221283 142
...
01003 48439 932.019063970525 9
01003 53033 3478.13978129133 11
01003 54081 997.783781484149 10
01005 01005 0.000134258785931453 8370
01005 01011 44.3219329413987 72
01005 01021 168.973302699063 7
...

The first file with the nodes is called "THE_NODES.txt" and the second is "THE_LINKS.txt". How would I use Python code to find the longest shortest path(s) between any two counties? I assume I start by making a graph of the network, and since the second file has the connections, use 'THE_LINKS.txt' for the edges (would the weights be the distance?). Also, I think these files can only be read as a csv (correct me if I'm wrong), so I can't (or don't know how to) use networkx for this problem.
You can use the read_table function with the | separator to read .txt files:

node = pd.read_table('node.txt', sep='|', header=None)
links = pd.read_table('links.txt', sep='|', header=None)

Then you need to find the location of the counties (please refer to this link: How to select rows from a DataFrame based on column values?). Then you have to calculate the distance between the counties. What have you tried so far? Include that too.
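Since the sample data looks whitespace-separated, and networkx can consume data loaded via pandas just fine, a sketch along these lines might work — the column names below are made up, and the layout is assumed from the sample:

import networkx as nx
import pandas as pd

# Assumed layout from the sample: source, destination, distance, commuters.
# dtype=str keeps the leading zeros in the county FIPS codes.
links = pd.read_csv('THE_LINKS.txt', sep=r'\s+', header=None,
                    names=['src', 'dst', 'dist', 'commuters'],
                    dtype={'src': str, 'dst': str})

G = nx.Graph()
for row in links.itertuples(index=False):
    G.add_edge(row.src, row.dst, weight=row.dist)

# The longest shortest path is the weighted diameter: take the maximum
# over every node's shortest-path lengths (assumes a connected graph).
lengths = dict(nx.all_pairs_dijkstra_path_length(G, weight='weight'))
diameter = max(max(d.values()) for d in lengths.values())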
Vim loading and formatting slow
I'm using Vim with many plugins; the .vimrc file lists a large number of them, yet it used to be very fast. Suddenly, for some reason, it isn't any more (possibly after I started using eslint): it takes about two seconds every time I save a file or open a file in a new tab. Is there any way I can find which plugin is causing all that delay?

FUNCTIONS SORTED ON TOTAL TIME
count  total (s)   self (s)  function
    1   1.193202   0.000100  <SNR>67_BufWritePostHook()
    1   1.192941   0.000469  <SNR>67_UpdateErrors()
    1   1.188491   0.000942  <SNR>67_CacheErrors()
    1   1.184402   0.000040  287()
    1   1.184210   0.000254  286()
    1   1.183663   0.000135  SyntaxCheckers_javascript_eslint_GetLocList()
    1   1.182540   0.000674  SyntasticMake()
    1   1.181614   0.000483  syntastic#util#system()
    3   0.023413   0.000393  airline#extensions#tabline#get()
    3   0.023020   0.001615  airline#extensions#tabline#tabs#get()
   12   0.022842   0.010081  <SNR>180_parse_screen()
    2   0.018991   0.003055  381()
   12              0.012386  <SNR>180_create_matches()
   12   0.011903   0.002183  <SNR>172_OnCursorMovedNormalMode()
    8   0.011681   0.000275  <SNR>157_get_seperator()
   14   0.010961   0.010719  <SNR>172_OnFileReadyToParse()
   46   0.009832   0.003413  airline#highlighter#get_highlight()
   10   0.009530   0.000443  <SNR>157_get_transitioned_seperator()
   10   0.009087   0.000364  airline#highlighter#add_separator()
   10   0.008723   0.000830  <SNR>153_exec_separator()

FUNCTIONS SORTED ON SELF TIME
count  total (s)   self (s)  function
   12              0.012386  <SNR>180_create_matches()
   14   0.010961   0.010719  <SNR>172_OnFileReadyToParse()
   12   0.022842   0.010081  <SNR>180_parse_screen()
   92              0.005751  <SNR>153_get_syn()
   46   0.009832   0.003413  airline#highlighter#get_highlight()
   13              0.003297  <SNR>123_Highlight_Matching_Pair()
    2   0.018991   0.003055  381()
    1   0.002336   0.002331  gitgutter#sign#remove_signs()
   12   0.011903   0.002183  <SNR>172_OnCursorMovedNormalMode()
    3              0.001629  airline#extensions#tabline#tabs#map_keys()
    3   0.023020   0.001615  airline#extensions#tabline#tabs#get()
    1   0.001750   0.001601  gitgutter#async#execute()
   12              0.001445  <SNR>146_update()
    1   0.001305   0.001302  gitgutter#sign#find_current_signs()
   12              0.001276  <SNR>157_get_accented_line()
   14              0.001155  <SNR>172_AllowedToCompleteInBuffer()
   10   0.003474   0.001059  airline#highlighter#exec()
    2   0.001717   0.000985  xolox#misc#cursorhold#autocmd()
    1   0.002802   0.000983  347()
    1   0.001208   0.000966  gitgutter#sign#upsert_new_gitgutter_signs()