I'm trying to get information about the unique flightlines appearing in a block of LIDAR data using laspy.
I have already tried running lasinfo on the whole block, but what I get is just the min and max point_source_id values, as opposed to the list of individual flightlines, which I need.
This is what I've tried so far:
import laspy
import glob

las_files_list = glob.glob(r'PATH\*.las')
print(las_files_list)

las_source_id_set = set()
for f in las_files_list:
    las_file = laspy.file.File(f, mode='r')
    las_source_id_list = las_file.pt_src_id
    for i in las_source_id_list:
        las_source_id_set.add(i)
    las_file.close()
    print(las_source_id_set, ' ', f)

print(las_source_id_set)
with open('point_source_id.txt', 'w') as f:
    f.write(las_source_id_set)
Unfortunately the whole process is rather slow, and with a larger dataset I get a stack overflow error and never get to the 'write a file' part.
The process is slower than it needs to be because you are looping over every point in Python. There is a numpy function you can use to make this faster: numpy.unique.
Your script would become:
import laspy
import glob
import numpy as np

las_files_list = glob.glob(r'PATH\*.las')
print(las_files_list)

las_source_id_set = set()
for f in las_files_list:
    with laspy.file.File(f, mode='r') as las:
        # np.unique collapses the per-point array to its distinct values at C speed
        las_source_id_set.update(np.unique(las.pt_src_id))
    print(las_source_id_set, ' ', f)

print(las_source_id_set)
with open('point_source_id.txt', 'w') as f:
    # write() needs a string, so join the ids one per line
    f.write('\n'.join(str(i) for i in sorted(las_source_id_set)))
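Note that this uses the laspy 1.x API. In laspy 2.x the File class was removed; assuming the 2.x API, the loop body would become something like:

# laspy 2.x sketch: read the whole file, then take the unique ids
las = laspy.read(f)
las_source_id_set.update(np.unique(las.point_source_id))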
I'm trying to create a metadata scraper to enrich my e-book collection, but am experiencing some problems. I want to create a dict (or whatever gets the job done) to store the index (only while testing), the path and the series name. This is the code I've written so far:
from bs4 import BeautifulSoup

def get_opf_path():
    opffile = variables.items
    pathdict = {'index': [], 'path': [], 'series': []}
    safe = []
    x = 0
    for f in opffile:
        x += 1
        pathdict['path'] = f
        pathdict['index'] = x
        with open(f, 'r') as fi:
            soup = BeautifulSoup(fi, 'lxml')
            for meta in soup.find_all('meta'):
                if meta.get('name') == 'calibre:series':
                    pathdict['series'] = meta.get('content')
        safe.append(pathdict)
        print(pathdict)
    print(safe)
this code is able to go through all the opf files and get the series, index and path. I'm sure of this, since the console output (posted as a screenshot in the original question) shows the correct values for each file.
However, when I try to store pathdict in safe, no matter where I put the safe.append(pathdict), the output is a list in which every entry is the same dict, holding only the values of the last file processed.
What do I have to do so that safe holds the per-file data that the console output shows?
I have tried everything I could think of, but nothing worked.
Any help is appreciated.
I believe this is the correct way:
from bs4 import BeautifulSoup

def get_opf_path():
    opffile = variables.items
    pathdict = {'index': [], 'path': [], 'series': []}
    safe = []
    x = 0
    for f in opffile:
        x += 1
        pathdict['path'] = f
        pathdict['index'] = x
        with open(f, 'r') as fi:
            soup = BeautifulSoup(fi, 'lxml')
            for meta in soup.find_all('meta'):
                if meta.get('name') == 'calibre:series':
                    pathdict['series'] = meta.get('content')
                    print(pathdict)
                    safe.append(pathdict.copy())
    print(safe)
For two main reasons:
When you do:
pathdict['series'] = meta.get('content')
you are overwriting the last value in pathdict['series'], so I believe this is where you should save.
You also need to make a copy of it; if you don't, it will also change inside the list. When you store the dict you are really storing a reference to it (in this case, a reference to the variable pathdict).
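Here is a minimal, self-contained illustration of that reference pitfall (toy values, not taken from the scraper):

d = {'series': None}
safe = []

d['series'] = 'A'
safe.append(d)  # stores a reference to d, not a snapshot
d['series'] = 'B'
safe.append(d)  # the same reference again

print(safe)  # [{'series': 'B'}, {'series': 'B'}]; both entries follow d

d['series'] = 'C'
safe.append(d.copy())  # an independent snapshot; later changes to d won't touch it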
Note
If you want to print the elements of the list on separate lines, you can do something like this:
print(*safe, sep="\n")
Hi everyone, I am fairly new to using Python for data analysis, so apologies for silly questions:
IDE: PyCharm
What I have: a massive .xyz file (with 4 columns) which is a combination of several datasets. Each dataset can be identified by the third column of the file, which goes from 10,000 to -10,000 with 0 in between and 100 as the spacing, and then repeats (so every 201 rows is one dataset).
What I want to do: split the massive file into its individual datasets (201 rows each) and save each file under a different name.
What I have done so far:
# Import packages
import os
import pandas as pd
import numpy as np  # for next steps
import math  # for next steps

# Check and change directory
path = 'C:/Clayton/lines/profiles_aufmod'
os.chdir(path)
print(os.getcwd())  # correct path is printed

# split the xyz file into different files for each profile
main_xyz = 'bathy_SPO_1984_50x50_profile.xyz'
number_lines = sum(1 for row in (open(main_xyz)))
print(number_lines)  # 10854 is the output

rowsize = 201
for i in range(number_lines, rowsize):
    profile_raw_df = pd.read_csv(main_xyz, delimiter=',', header=None,
                                 nrows=rowsize, skiprows=i)
    out_xyz = 'Profile' + str(i) + '.xyz'
    profile_raw_df.to_csv(out_xyz, index=False, header=False, mode='a')
Problems I am facing:
The for loop was at first producing output files (a screenshot of the output was attached to the original question), but now it does not produce any outputs, and it is not rewriting the previous files either. The other mystery is that I am not getting an error either; the code executes without complaint.
What I tried to fix the issue:
I updated all the packages and restarted PyCharm.
I ran each line of code one by one, and everything works until the for loop.
While counting the number of rows in
number_lines = sum(1 for row in (open(main_xyz)))
you exhaust the iterator that loops over the lines of the file, and you never close the file. This should not prevent Pandas from reading the same file again, though.
A better idiom would be:
with open(main_xyz) as fh:
    number_lines = sum(1 for row in fh)
Your for loop as it stands does not do what you probably want. I guess you want:
for i in range(0, number_lines, rowsize):
Here rowsize is the step size, not the end value of the loop.
If you want to number the output files by dataset, keep a count of the dataset, like this:
data_set = 0
for i in range(0, number_lines, rowsize):
    data_set += 1
    ...
    out_xyz = f"Profile{data_set}.xyz"
    ...
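Putting these fixes together, a minimal corrected version of the whole splitting loop might look like this (same file name and 201-row layout assumed; mode='a' is dropped because each output file is now written exactly once):

import pandas as pd

main_xyz = 'bathy_SPO_1984_50x50_profile.xyz'
rowsize = 201

# count the lines inside a with-block so the file handle is closed afterwards
with open(main_xyz) as fh:
    number_lines = sum(1 for row in fh)

data_set = 0
for i in range(0, number_lines, rowsize):  # rowsize is the step, not the end
    data_set += 1
    profile_raw_df = pd.read_csv(main_xyz, delimiter=',', header=None,
                                 nrows=rowsize, skiprows=i)
    out_xyz = f"Profile{data_set}.xyz"
    profile_raw_df.to_csv(out_xyz, index=False, header=False)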
I am trying to convert .pdf data to a spreadsheet. Based on some research, some people recommended transforming it into CSV first in order to avoid errors.
So I wrote the code below, which gives me:
"TypeError: cannot concatenate object of type ''; only Series and DataFrame objs are valid"
The error appears at the pd.concat command.
import tabula
import pandas as pd
import glob

path = r'C:\Users\REC.AC'
all_files = glob.glob(path + "/*.pdf")
print(all_files)

df = pd.concat(tabula.read_pdf(f1) for f1 in all_files)
df.to_csv("output.csv", index=False)
Since this might be a common issue, I am posting the solution I found.

dfs = []
for f1 in all_files:
    # combine the tables found within each PDF first
    dfs.append(pd.concat(tabula.read_pdf(f1)))
df = pd.concat(dfs)

I believe that breaking the iteration into two parts (first combining the tables within each PDF, then combining the per-file results) generates the dataframe as needed, and therefore it works.
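For what it's worth, the original one-liner most likely failed because tabula.read_pdf usually returns a list of DataFrames (one per detected table), so pd.concat was handed a generator of lists rather than of DataFrames. Under that assumption, flattening the nested lists also works as a single expression:

# iterate over the tables inside each file's list of DataFrames
df = pd.concat(t for f1 in all_files for t in tabula.read_pdf(f1))
df.to_csv("output.csv", index=False)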
I'm a beginner at Python and went through some related questions explaining the differences between calling open directly and using the 'with open' command.
But because of my lack of knowledge of Python 3, I still don't get the difference between the two and couldn't figure out how to make my code run.
In the tutorial I am studying now, the answer is as below.
import csv
import matplotlib.pyplot as plt

x = []
y = []

with open('example.txt', 'r') as csvfile:
    plot = csv.reader(csvfile, delimiter=',')
    for row in plot:
        x.append(int(row[0]))
        y.append(int(row[1]))

plt.plot(x, y, label='file')
plt.show()
What I tried to do was use open and csv.reader like below:

import csv
import matplotlib.pyplot as plt

plotdata = open('testing.csv')
reader = csv.reader(plotdata, delimiter=',')

hx = []
hy = []
for x in reader:
    hx.append(reader[0])
    hy.append(reader[1])

plt.plot(hx, hy)
While the first one, using the "with" command, works, the one below without the "with" command doesn't. Just in case, I added a close() command at the end, but it keeps showing the error message
"TypeError: '_csv.reader' object is not subscriptable"
What was wrong?
I think the problem is that in the second piece of code you are subscripting the reader directly (reader[0]), whereas in the first one you are subscripting its iterated value (row in the loop) and converting it: int(row[0]).
This should work:
for x in reader:
    hx.append(int(x[0]))
    hy.append(int(x[1]))

plt.plot(hx, hy)
My ideas are the following:
a) always use with for file operations
b) except for exercises, separate your reading of data from your manipulation of data
c) plot (a bad name) and reader are iterators; in a loop you consume the iterator to obtain row or x, which are the elements the iterator provides. reader itself cannot be addressed with [0]; you must have wanted x[0] and x[1] instead.
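For completeness, a minimal sketch of the non-with version done safely (same hypothetical testing.csv; the try/finally stands in for what the with statement does automatically):

import csv
import matplotlib.pyplot as plt

hx = []
hy = []

plotdata = open('testing.csv')  # no "with": closing the file is now our job
try:
    reader = csv.reader(plotdata, delimiter=',')
    for row in reader:  # iterate; the reader object itself is not subscriptable
        hx.append(int(row[0]))
        hy.append(int(row[1]))
finally:
    plotdata.close()  # this is what "with open(...)" guarantees

plt.plot(hx, hy)
plt.show()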
I have a piece of code. When I run it, it compiles but does not show any printed result. I want to print the values returned from this function. Can someone please guide me where I'm wrong?
def input_data(prefix):
    datafiles = os.listdir('/home/zeri/Desktop/check2')
    dictData = {}
    for df in datafiles:
        if re.match(prefix, df) and os.path.isfile('/home/zeri/Desktop/check2' + '/' + df):
            hmax = locale.atof(df[3:])
            print hmax
            data = np.genfromtxt(df, delimiter=' ')
            dictData[hmax] = data
    return dictData, len(data[0])

int main():
    a = input_data('xyz')
    print a
Python is not C, so "int main()" does not work. It is better to remove this line altogether, although you can define a function called "main".
But probably you mainly have an indentation issue. I tried to fix this in the code below.
import locale
import os
import re

import numpy as np

def input_data(prefix):
    datafiles = os.listdir('/home/zeri/Desktop/check2')
    dictData = {}
    for df in datafiles:
        if re.match(prefix, df) and os.path.isfile('/home/zeri/Desktop/check2' + '/' + df):
            hmax = locale.atof(df[3:])
            print hmax  # use "print(hmax)" on Python 3
            data = np.genfromtxt(df, delimiter=' ')
            dictData[hmax] = data
    return dictData, len(data[0])

a = input_data('xyz')
print a  # use "print(a)" on Python 3
By the way, I would not use regular expressions to filter files.
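For example, here is a sketch of the same filtering done with glob instead of a regular expression (assuming the file names really do start with the literal prefix, e.g. 'xyz'):

import glob
import os

# match files whose names start with the prefix
for path in glob.glob(os.path.join('/home/zeri/Desktop/check2', 'xyz*')):
    if os.path.isfile(path):
        df = os.path.basename(path)
        ...  # process df as before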