Convert a deeply nested list to csv in Python - python-3.x

I wanted to write a deeply nested list (lists within lists within lists) to CSV, but it always collapses my list and prints it with ..., so I am unable to retrieve the hidden values.
The list stores frames of videos and goes 5 layers deep (length of each layer in parentheses): videos (number of videos) > frames (8) > width (200) > height (200) > pixels of 3 channels (3).
I tried converting the list to a dataframe before writing it to CSV, but that still did not solve the problem.
"[array([[[0.23137255, 0.26666668, 0.27058825],
[0.23921569, 0.27450982, 0.2784314 ],
[0.23529412, 0.27058825, 0.27450982],
...,
[0.25882354, 0.29411766, 0.2901961 ],
[0.25490198, 0.2901961 , 0.28627452],
[0.25490198, 0.2901961 , 0.28627452]],
[[0.20392157, 0.23921569, 0.24313726],
[0.21568628, 0.2509804 , 0.25490198],
[0.21568628, 0.2509804 , 0.25490198],
...,
[0.26666668, 0.3019608 , 0.29803923],
[0.26666668, 0.3019608 , 0.29803923],
[0.2627451 , 0.29803923, 0.29411766]],
[[0.1882353 , 0.22352941, 0.22745098],
[0.2 , 0.23529412, 0.23921569],
[0.20392157, 0.23921569, 0.24313726],
...,
[0.27450982, 0.30980393, 0.30588236],
[0.27058825, 0.30588236, 0.3019608 ],
[0.27058825, 0.30588236, 0.3019608 ]],
...,

I'd try one of the following:
dump the whole object into JSON:
import json

# numpy arrays are not JSON serializable, so convert each array to a plain list first
with open('my_saved_file.json', 'w+') as out_file:
    json.dump([arr.tolist() for arr in list_of_lists_of_lists], out_file, indent=2)
Alternatively, what I'd try is storing all of your frames as image files and referencing them from an index (which could be a CSV):
import numpy as np
from PIL import Image

with open('reference.csv', 'w+') as out_csv:
    out_csv.write("video,frame_set,frame1,frame2,frame3,frame4,frame5,frame6,frame7,frame8\n")
    for video_no, video in enumerate(list_of_lists_of_lists):
        for frame_set_no, frames in enumerate(video):
            # one CSV row per set of 8 frames, matching the header above
            row = [str(video_no), str(frame_set_no)]
            for frame_no, frame in enumerate(frames):
                # PIL expects 8-bit pixels, so scale the 0-1 floats up first
                im = Image.fromarray((np.asarray(frame) * 255).astype(np.uint8))
                frame_name = f"{video_no}-{frame_set_no}-{frame_no}.jpeg"
                row.append(frame_name)
                im.save(frame_name)
            out_csv.write(",".join(row) + "\n")
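To read a frame back later, a sketch along these lines should work (it assumes the reference.csv layout and file names written above; note that JPEG is lossy, so the recovered pixel values will only approximate the originals, while PNG would round-trip exactly):
import csv
import numpy as np
from PIL import Image

with open('reference.csv') as in_csv:
    rows = list(csv.reader(in_csv))

# rows[0] is the header; columns 2 onwards of each data row hold the frame file names
first_frame = np.asarray(Image.open(rows[1][2])) / 255.0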

Related

Append numpy arrays with unequal rows in a loop

I want to append several arrays of different sizes. However, I don't want to merge them together, just store them in a mega-list. Here is a simplified version of my code which reproduces the problem:
import numpy as np

total_wavel = 5
tot_values = []
for i in range(total_wavel):
    size = int(np.random.uniform(low=2, high=7))
    values = np.array(np.random.uniform(low=1, high=6, size=(size,)))
    tot_values = np.append(tot_values, values)
Example output:
array([4.88776545, 4.86006097, 1.80835575, 3.52393214, 2.88971373,
1.62978552, 4.06880898, 4.10556672, 1.33428321, 3.81505999,
3.95533471, 2.18424975, 5.15665168, 5.38251801, 1.7403673 ,
4.90459377, 3.44198867, 5.03055533, 3.96271897, 1.93934124,
5.60657218, 1.24646798, 3.14179412])
Expected output:
np.array([np.array([4.88776545, 4.86006097, 1.80835575, 3.52393214]),
          np.array([2.88971373, 1.62978552, 4.06880898, 4.10556672]),
          np.array([1.33428321, 3.81505999, 3.95533471, 2.18424975, 5.15665168, 5.38251801]),
          np.array([1.7403673 , 4.90459377, 3.44198867, 5.03055533]),
          np.array([3.96271897, 1.93934124, 5.60657218, 1.24646798, 3.14179412])])
Or
np.array([[4.88776545, 4.86006097, 1.80835575, 3.52393214],
          [2.88971373, 1.62978552, 4.06880898, 4.10556672],
          [1.33428321, 3.81505999, 3.95533471, 2.18424975, 5.15665168, 5.38251801],
          [1.7403673 , 4.90459377, 3.44198867, 5.03055533],
          [3.96271897, 1.93934124, 5.60657218, 1.24646798, 3.14179412]])
Thank you in advance
In the loop, use tot_values.append(list(values)) instead of np.append, and after the loop build the array with tot_np = np.array(tot_values, dtype=object).
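Applied to the question's loop, that looks roughly like this (dtype=object keeps the ragged rows as separate lists instead of raising an error on recent NumPy versions):
import numpy as np

total_wavel = 5
tot_values = []
for i in range(total_wavel):
    size = int(np.random.uniform(low=2, high=7))
    values = np.random.uniform(low=1, high=6, size=(size,))
    tot_values.append(list(values))   # keep each iteration's values separate

tot_np = np.array(tot_values, dtype=object)   # 5 variable-length rows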

trouble with transpose in pd.read_csv

I have data in a CSV file structured like this:
Installation Manufacturing Sales & Distribution Project Development Other
43,934 24,916 11,744 - 12,908
52,503 24,064 17,722 - 5,948
57,177 29,742 16,005 7,988 8,105
69,658 29,851 19,771 12,169 11,248
97,031 32,490 20,185 15,112 8,989
119,931 30,282 24,377 22,452 11,816
137,133 38,121 32,147 34,400 18,274
154,175 40,434 39,387 34,227 18,111
I want to skip the header and transpose the data like this:
43934 52503 57177 69658 97031 119931 137133 154175
24916 24064 29742 29851 32490 30282 38121 40434
11744 17722 16005 19771 20185 24377 32147 39387
0 0 7988 12169 15112 22452 34400 34227
12908 5948 8105 11248 8989 11816 18274 18111
Here is my code:
import pandas as pd
import csv
FileName = "C:/Users/kesid/Documents/Pthon/Pthon.csv"
data = pd.read_csv(FileName, header= None)
data = list(map(list, zip(*data)))
print(data)
I am getting the error "TypeError: zip argument #1 must support iteration". Any help is much appreciated.
You can read_csv in "normal" mode:
df = pd.read_csv('input.csv')
(the column names will be dropped later).
Start processing by replacing NaN with 0:
df.fillna(0, inplace=True)
Then use either df.values.T or df.values.T.tolist(), whichever
better suits your needs.
You should use skiprows=[0] to skip reading the first row, and .T to transpose:
df = pd.read_csv(filename, skiprows=[0], header=None).T
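Putting both suggestions together, a minimal sketch (assuming pandas sees each number as a single field, e.g. the comma-grouped values are quoted in the file; 'input.csv' is a placeholder path):
import pandas as pd

# Skip the header row, parse comma-grouped numbers like 43,934, and treat '-' as missing
df = pd.read_csv('input.csv', skiprows=[0], header=None, thousands=',', na_values='-')
df = df.fillna(0).astype(int).T
print(df.values.tolist())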

how to perform calculations for all elements of a list in parallel to reduce the run time?

I am extracting a dataframe from the Yahoo API for a list of companies, but I want to do it in parallel (i.e., the dataframes for all the companies should be extracted simultaneously) so that it reduces my run time.
My code:
import pandas_datareader.data as web
import pandas as pd
import datetime
end_date = datetime.datetime.now().strftime('%d/%m/%Y')
temp = datetime.datetime.now() - datetime.timedelta(6*365/12)
start_date = temp.strftime('%d/%m/%Y')
f = web.DataReader('ACC.NS', 'yahoo', start_date, end_date)
print(f)
The output of this code is a dataframe (not reproduced here). This I have done for a single company.
I want to do it for a list of companies which is:
Company_Names = ['ACC', 'ADANIENT', 'ADANIPORTS', 'ADANIPOWER', 'AJANTPHARM', 'ALBK', 'AMARAJABAT', 'AMBUJACEM', 'APOLLOHOSP', 'APOLLOTYRE', 'ARVIND', 'ASHOKLEY', 'ASIANPAINT', 'AUROPHARMA', 'AXISBANK',
'BAJAJ-AUTO', 'BAJFINANCE', 'BAJAJFINSV', 'BALKRISIND', 'BANKBARODA', 'BANKINDIA', 'BATAINDIA', 'BEML', 'BERGEPAINT', 'BEL', 'BHARATFIN', 'BHARATFORG', 'BPCL', 'BHARTIARTL', 'INFRATEL', 'BHEL', 'BIOCON', 'BOSCHLTD', 'BRITANNIA',
'CADILAHC', 'CANFINHOME', 'CANBK', 'CAPF', 'CASTROLIND', 'CEATLTD', 'CENTURYTEX', 'CESC', 'CGPOWER', 'CHENNPETRO', 'CHOLAFIN', 'CIPLA', 'COALINDIA', 'COLPAL', 'CONCOR', 'CUMMINSIND', 'DABUR', 'DCBBANK',
'DHFL', 'DISHTV', 'DIVISLAB', 'DLF', 'DRREDDY', 'EICHERMOT', 'ENGINERSIN', 'EQUITAS', 'ESCORTS', 'EXIDEIND',
'FEDERALBNK', 'GAIL', 'GLENMARK', 'GMRINFRA', 'GODFRYPHLP', 'GODREJCP', 'GODREJIND', 'GRANULES', 'GRASIM', 'GSFC', 'HAVELLS', 'HCLTECH', 'HDFCBANK', 'HDFC', 'HEROMOTOCO', 'HEXAWARE', 'HINDALCO', 'HCC', 'HINDPETRO', 'HINDUNILVR',
'HINDZINC', 'ICICIBANK', 'ICICIPRULI', 'IDBI', 'IDEA', 'IDFCBANK', 'IDFC', 'IFCI', 'IBULHSGFIN', 'INDIANB', 'IOC', 'IGL', 'INDUSINDBK', 'INFIBEAM', 'INFY', 'INDIGO', 'IRB', 'ITC', 'JISLJALEQS', 'JPASSOCIAT', 'JETAIRWAYS', 'JINDALSTEL',
'JSWSTEEL', 'JUBLFOOD', 'JUSTDIAL', 'KAJARIACER', 'KTKBANK', 'KSCL', 'KOTAKBANK', 'KPIT', 'L&TFH', 'LT', 'LICHSGFIN', 'LUPIN', 'M&MFIN', 'MGL', 'M&M', 'MANAPPURAM', 'MRPL', 'MARICO', 'MARUTI', 'MFSL', 'MINDTREE', 'MOTHERSUMI', 'MRF', 'MCX',
'MUTHOOTFIN', 'NATIONALUM', 'NBCC', 'NCC', 'NESTLEIND', 'NHPC', 'NIITTECH', 'NMDC', 'NTPC', 'ONGC', 'OIL', 'OFSS', 'ORIENTBANK', 'PAGEIND', 'PCJEWELLER', 'PETRONET', 'PIDILITIND', 'PEL', 'PFC', 'POWERGRID', 'PTC', 'PNB', 'PVR', 'RAYMOND',
'RBLBANK', 'RELCAPITAL', 'RCOM', 'RELIANCE', 'RELINFRA', 'RPOWER', 'REPCOHOME', 'RECLTD', 'SHREECEM', 'SRTRANSFIN', 'SIEMENS', 'SREINFRA', 'SRF', 'SBIN', 'SAIL', 'STAR', 'SUNPHARMA', 'SUNTV', 'SUZLON', 'SYNDIBANK', 'TATACHEM', 'TATACOMM', 'TCS',
'TATAELXSI', 'TATAGLOBAL', 'TATAMTRDVR', 'TATAMOTORS', 'TATAPOWER', 'TATASTEEL', 'TECHM', 'INDIACEM', 'RAMCOCEM', 'SOUTHBANK', 'TITAN', 'TORNTPHARM', 'TORNTPOWER', 'TV18BRDCST', 'TVSMOTOR', 'UJJIVAN', 'ULTRACEMCO', 'UNIONBANK', 'UBL', 'UPL',
'VEDL', 'VGUARD', 'VOLTAS', 'WIPRO', 'WOCKPHARMA', 'YESBANK', 'ZEEL']
For all these companies the dataframe 'f' should be extracted in parallel in order to save run time. Can anyone help me solve this?
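Since the work is network-bound, one common approach is a thread pool. A sketch (max_workers and the frames dict are my own choices, not from the question):
import datetime
from concurrent.futures import ThreadPoolExecutor
import pandas_datareader.data as web

end_date = datetime.datetime.now()
start_date = end_date - datetime.timedelta(days=182)

def fetch(symbol):
    # NSE symbols need the '.NS' suffix, as in the single-company example
    return symbol, web.DataReader(symbol + '.NS', 'yahoo', start_date, end_date)

with ThreadPoolExecutor(max_workers=10) as pool:
    frames = dict(pool.map(fetch, Company_Names))   # one dataframe per symbol
Whether this actually saves time depends on the API's rate limits, and any failed symbol will raise out of pool.map, so real code would want per-symbol error handling.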

Overwrite GPS coordinates in Image Exif using Python 3.6

I am trying to transform image geotags so that images and ground control points lie in the same coordinate system inside my software (Pix4D mapper).
The answer here says:
Exif data is standardized, and GPS data must be encoded using
geographical coordinates (minutes, seconds, etc) described above
instead of a fraction. Unless it's encoded in that format in the exif
tag, it won't stick.
Here is my code:
import os, piexif, pyproj
from PIL import Image
img = Image.open(os.path.join(dirPath,fn))
exif_dict = piexif.load(img.info['exif'])
breite = exif_dict['GPS'][piexif.GPSIFD.GPSLatitude]
lange = exif_dict['GPS'][piexif.GPSIFD.GPSLongitude]
breite = breite[0][0] / breite[0][1] + breite[1][0] / (breite[1][1] * 60) + breite[2][0] / (breite[2][1] * 3600)
lange = lange[0][0] / lange[0][1] + lange[1][0] / (lange[1][1] * 60) + lange[2][0] / (lange[2][1] * 3600)
print(breite) #48.81368778730952
print(lange) #9.954511162420633
x, y = pyproj.transform(wgs84, gk3, lange, breite) #from WGS84 to GaussKrüger zone 3
print(x) #3570178.732528623
print(y) #5408908.20172699
exif_dict['GPS'][piexif.GPSIFD.GPSLatitude] = [ ( (int)(round(y,6) * 1000000), 1000000 ), (0, 1), (0, 1) ]
exif_bytes = piexif.dump(exif_dict) #error here
img.save(os.path.join(outPath,fn), "jpeg", exif=exif_bytes)
I am getting struct.error: argument out of range in the dump method. The original GPSInfo tag looks like: {0: b'\x02\x03\x00\x00', 1: 'N', 2: ((48, 1), (48, 1), (3449322402, 70000000)), 3: 'E', 4: ((9, 1), (57, 1), (1136812930, 70000000)), 5: b'\x00', 6: (3659, 10)}
I am guessing I have to offset the values and encode them properly before writing, but I have no idea what needs to be done.
It looks like you are already using PIL and Python 3.x. I am not sure if you want to continue using piexif, but either way, you may find it easier to convert the degrees, minutes, and seconds into a decimal value first. It looks like you are trying to do that already, but putting it in a separate function may be clearer and lets you account for the direction reference.
Here's an example:
def get_decimal_from_dms(dms, ref):
    degrees = dms[0][0] / dms[0][1]
    minutes = dms[1][0] / dms[1][1] / 60.0
    seconds = dms[2][0] / dms[2][1] / 3600.0

    if ref in ['S', 'W']:
        degrees = -degrees
        minutes = -minutes
        seconds = -seconds

    return round(degrees + minutes + seconds, 5)

def get_coordinates(geotags):
    lat = get_decimal_from_dms(geotags['GPSLatitude'], geotags['GPSLatitudeRef'])
    lon = get_decimal_from_dms(geotags['GPSLongitude'], geotags['GPSLongitudeRef'])
    return (lat, lon)
The geotags in this example is a dictionary with the GPSTAGS as keys instead of the numeric codes for readability. You can find more detail and the complete example from this blog post: Getting Started with Geocoding Exif Image Metadata in Python 3
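For the reverse direction, note that each EXIF rational is a pair of unsigned 32-bit integers, which is almost certainly why piexif.dump raises struct.error here: round(y, 6) * 1000000 for a Gauss-Krüger coordinate like 5408908.2 produces a numerator far beyond the 32-bit range. A sketch of encoding a decimal coordinate back into degrees/minutes/seconds rationals (my own helper, not part of piexif):
def get_dms_from_decimal(value):
    # EXIF stores each component as (numerator, denominator) pairs of
    # unsigned 32-bit integers, so keep the denominators small
    value = abs(value)
    degrees = int(value)
    minutes = int((value - degrees) * 60)
    seconds = round((value - degrees - minutes / 60.0) * 3600 * 10000)
    return ((degrees, 1), (minutes, 1), (seconds, 10000))
This only works for genuine geographic coordinates; a projected Gauss-Krüger value in metres cannot be stored in the GPS tags this way.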
After much hemming and hawing I reached the pages of the py3exiv2 image metadata manipulation library. You will find exhaustive lists of the metadata tags as you read through, but here is the list of EXIF tags to save a few clicks.
It runs smoothly on Linux and provides many ways to edit image headers. The documentation is also quite clear. I recommend it as a solution and am interested to know whether it solves everyone else's problems as well.

Numpy Array value setting issues

I have a data set that spans a certain length of time, with data points for each of these time points. I would like to create a much more detailed timescale and fill the empty data points with zeros. I wrote a piece of code to do this, but it isn't doing what I want. I tried a sample case, though, and it seems to work. Below are the two pieces of code.
This piece of code does not do what I want it to.
import numpy as np
TD_t = np.array([36000, 36500, 37000, 37500, 38000, 38500, 39000, 39500, 40000, 40500, 41000, 41500, 42000, 42500,
43000, 43500, 44000, 44500, 45000, 45500, 46000, 46500, 47000, 47500, 48000, 48500, 49000, 49500,
50000, 50500, 51000, 51500, 52000, 52500, 53000, 53500, 54000, 54500, 55000, 55500, 56000, 56500,
57000, 57500, 58000, 58500, 59000, 59500, 60000, 60500, 61000, 61500, 62000, 62500, 63000, 63500,
64000, 64500, 65000, 65500, 66000])
TD_d = np.array([-0.05466527, -0.04238242, -0.04477601, -0.02453717, -0.01662798, -0.02548617, -0.02339215,
-0.01186576, -0.0029057 , -0.01094671, -0.0095005 , -0.0190277 , -0.01215644, -0.01997112,
-0.01384497, -0.01610656, -0.01927564, -0.02119056, -0.011634 , -0.00544096, -0.00046568,
-0.0017769 , -0.0007341, 0.00193066, 0.01359107, 0.02054919, 0.01420335, 0.01550565,
0.0132394 , 0.01371563, 0.01959774, 0.0165316 , 0.01881992, 0.01554435, 0.01409003,
0.01898334, 0.02300266, 0.03045158, 0.02869013, 0.0238423 , 0.02902356, 0.02568908,
0.02954539, 0.02537967, 0.02927247, 0.02138605, 0.02815635, 0.02733237, 0.03321588,
0.03063803, 0.03783137, 0.04110955, 0.0451221 , 0.04646263, 0.04472884, 0.04935833,
0.03372911, 0.04031406, 0.04165237, 0.03940343, 0.03805504])
time = np.arange(0, 100001, 1)
data = np.zeros_like(time)
for i in range(0, len(TD_t)):
    t = TD_t[i]
    data[t] = TD_d[i]
    print(i, t, TD_d[i], data[t])
But for some reason this code works.
import numpy
nums = numpy.array([0,1,2,3])
data = numpy.zeros_like(nums)
data[0] = nums[2]
data[0], nums[2]
Any help will be much appreciated!!
It's because the dtype of data is being set to int64 (np.zeros_like copies the dtype of time), so when you try to assign one of the data elements, the small float value gets truncated to zero.
Try changing the line to:
data = np.zeros_like(time, dtype=float)
and it should work (or use whatever dtype the TD_d array has).
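A minimal before/after to illustrate (the test value is taken from the question's TD_d):
import numpy as np

time = np.arange(0, 100001, 1)

bad = np.zeros_like(time)                # inherits time's integer dtype
bad[36000] = -0.05466527
print(bad[36000])                        # prints 0: the float was truncated

good = np.zeros_like(time, dtype=float)  # float slots keep the small values
good[36000] = -0.05466527
print(good[36000])                       # prints -0.05466527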
