I am using the spatstat package in R to read my road network shapefile, which also has some additional attributes.
When I read my shapefiles and convert them with as.psp() (before I make them a linnet object), n columns of the data frame are discarded. I do not understand why. The discarded columns are my covariates for the linear network, so I cannot bring them into my analysis.
Could someone give me an idea why this happens and how to correct it?
Why it happens:
I would guess that we (spatstat authors) need to spend a bit of time discussing with the maptools guys how to handle the additional info in the SpatialLinesDataFrame object, and we haven't done that yet.
How to correct it:
You have to write some code on your own at the moment. You can extract the data from the SpatialLinesDataFrame object by accessing the @data slot. Please share your specific data and how you need to use the additional columns (what format you need them in) if you need more help. You can find a few helpful commands here: https://cran.r-project.org/web/packages/spatstat/vignettes/shapefiles.pdf
I'm trying to build a map to show spending. Our vendor data is not entirely clean, and we could have 5 or 6 definitions of the same place. We also have hierarchical POI issues (think: the data gives an entire shopping strip's geometry instead of individual stores). I'd like to know if there are any common algorithms to identify duplicates at the geometry level.
My data are in GeoPandas DataFrames with a 2-D polygon to represent each place, plus other metadata (store name, category, etc.). Sometimes a record comes with an address, but not always, and not in a very consistent postal format.
I'm working in Python specifically, but I'm open to any suggestions, not limited to Python. Thank you in advance for any help or suggestions.
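One common starting point (not from the thread, just a sketch) is to treat polygon pairs with a high intersection-over-union (IoU) as duplicate candidates; the threshold below is a guess you would tune for your data:

```python
# A hedged sketch: flag candidate duplicate places by polygon overlap.
# Requires a recent GeoPandas with spatial-index query support.
import geopandas as gpd

def iou(a, b):
    # Ratio of shared area to combined area; 1.0 means identical footprints.
    union = a.union(b).area
    return a.intersection(b).area / union if union else 0.0

def candidate_duplicates(gdf: gpd.GeoDataFrame, threshold=0.8):
    # Use the spatial index so we only compare geometries that actually intersect.
    sindex = gdf.sindex
    pairs = set()
    for i, geom in enumerate(gdf.geometry):
        for j in sindex.query(geom, predicate="intersects"):
            if i < j and iou(geom, gdf.geometry.iloc[j]) >= threshold:
                pairs.add((i, j))
    return pairs
```

A low IoU with near-total containment (a small polygon mostly inside a big one) is a reasonable signal for the shopping-strip/parent-POI case, and combining the geometry test with fuzzy matching on store name usually helps separate true duplicates from close neighbours.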
I need to use this public dataset:
https://catalog.data.gov/dataset/2015-2016-demographic-data-grades-k-8-school
You can view the data in this table viewer:
https://data.cityofnewyork.us/Education/2015-2016-Demographic-Data-Grades-K-8-School/7yc5-fec2
Many cells have "No data" in them, which is clear to me. However, many others have "s", which is unclear. I didn't find any explanation.
I guess it may be some standard way of indicating why data is absent from that cell. Or maybe not, and only the authors of the data know.
Please tell me what "s" means or may mean.
There is no "universal" meaning behind this.
It could be a number of things:
Data not present
Not applicable
Use case specific information
Error in the dataset itself
...
If you want to create value from this data, don't make assumptions; look for documentation or a description of the dataset that explains its columns, types, and expected content. The owner or creator of the dataset should of course also be able to tell you what it represents.
I want to use the dataset at https://ingmec.ual.es/datasets/lidar3d-pf-benchmark/ in my project. The available map is a .simplemap file. As I understand it, it stores both the map and the robot poses. I want to get a point cloud representation of this map (which I can later convert into an octomap) as well as the vehicle's ground-truth pose in the map.
I have been able to get the CPose3DPDF, from which I obtained a CPose3D, which I believe is the desired vehicle's ground-truth pose. Please correct me if I am wrong. Now I have two problems. First, the trajectory length is just 97 poses, which makes me suspicious of my code for obtaining it. Second is the CSensoryFrame that I obtain along with the CPose3DPDF. When I get a CObservation via CSensoryFrame->getObservationByIndex and write it to a file, it suggests that it stores Velodyne readings, but I am unable to recover a point cloud from it. Could anyone please point me to a tool which can convert a .simplemap into a point cloud or octomap representation, and also obtain the vehicle's pose from it? Many thanks in advance.
For the record: this one was answered as follows:
Your assumptions were all correct.
I realized the full UAL campus map was not included in the downloads. It's now available to download inside 2018-02-26-ual-campus-map.zip, at the bottom of the dataset page.
You can also regenerate the point cloud or octomap from the .simplemap using the observations2map application.
Example .ini files can be found under MRPT/share/mrpt/config_files/*
You can also visually inspect .simplemap files with the robot-map-gui app.
I am currently exploring the idea of using Iris in a project to read forecast GRIB2 files with Python.
My aim is to load/convert a GRIB message into an Iris cube based on a GRIB message key having a specific value.
I have experimented with iris-grib, which uses gribapi. Using iris-grib I have not been able to find the key in the GRIB2 file, although the key is visible with 'grib_ls -w...' via the CLI.
gribapi does the job, but I am not sure how to interface it with iris (which is what, I assume, iris-grib is for).
I was wondering if anyone knew of a way to get a message into an iris cube based on a grib message key having a specific value. Thank you
You can get at anything that the gribapi understands through the low-level grib interface in iris-grib, which is the iris_grib.GribMessage class.
Typically you would iterate with for msg in GribMessage.messages_from_filename(xxx): and then access keys like msg.sections[4]['productDefinitionTemplateNumber'] or msg.sections[4]['parameterNumber'], and so on.
You can use this to identify the required messages, and then convert them to cubes with iris_grib.load_pairs_from_fields(), as sketched below.
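Putting those two pieces together, here is a minimal sketch of filtering on a key and converting only the matches; the filename, section number, key, and value are placeholders, and the exact import path of GribMessage may vary between iris-grib versions:

```python
# A sketch only: select GRIB messages by a section/key value, then convert them.
import iris_grib
from iris_grib.message import GribMessage  # per the answer; may also be exposed as iris_grib.GribMessage

def cubes_where(filename, section, key, value):
    # Keep only the messages whose key matches the requested value.
    wanted = (
        msg for msg in GribMessage.messages_from_filename(filename)
        if msg.sections[section][key] == value
    )
    # load_pairs_from_fields yields (cube, message) pairs for the selected fields.
    return [cube for cube, _ in iris_grib.load_pairs_from_fields(wanted)]

# Hypothetical usage: all fields whose section 4 parameterNumber equals 2.
cubes = cubes_where('forecast.grib2', 4, 'parameterNumber', 2)
```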
However, iris-grib only knows how to translate specific encodings into cubes: it is quite strict about exactly what it recognises, and will fail on anything else. So if your data uses any unrecognised templates or data encodings, it will definitely fail to load.
I'm just anticipating that you may have something unusual here, so that might be an issue?
You can check your expected message contents against the translation code in iris_grib/_load_convert.py, starting at the convert() routine.
To get an Iris cube out of something not yet supported, you would either:
(a) extend the translation rules (i.e. a GitHub PR), or
(b) sometimes you can modify the message so that it looks like something that can be recognised.
Failing that, you can
(c) simply build an Iris cube yourself from the data found in your GribMessage; that can be a little simpler than using gribapi directly (or possibly not, depending on the details). See the sketch below.
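For option (c), a minimal hand-built cube might look like the following; all data values, coordinate values, names, and units are hypothetical placeholders for whatever you decode from your GribMessage:

```python
# A sketch of option (c): build a cube directly, bypassing the translation rules.
import numpy as np
from iris.cube import Cube
from iris.coords import DimCoord

data = np.zeros((3, 4))  # stand-in for the decoded field values
lat = DimCoord(np.array([50.0, 51.0, 52.0]),
               standard_name='latitude', units='degrees')
lon = DimCoord(np.array([0.0, 1.0, 2.0, 3.0]),
               standard_name='longitude', units='degrees')
cube = Cube(data, standard_name='air_temperature', units='K',
            dim_coords_and_dims=[(lat, 0), (lon, 1)])
```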
If you have a problem like that, you should definitely raise it as an issue on the GitHub project (iris-grib issues), and we will try to help.
P.S. As you have registered a Python 3 interest, you may want to be aware that the newer "ecCodes" replacement for gribapi should shortly be available, making Python 3 support for GRIB data possible at last.
However, the Python 3 version is still in beta and we are presently experiencing some problems with it, now raised with ECMWF, so it is still almost-but-not-quite achievable.
So, a bit of a general question. I work as a data analyst for a startup. My primary task is taking a client's existing customer data and cleansing/normalizing it to fit our platform as part of our onboarding process. A member of our team exports the data from the system the client is transitioning from or, if they kept track of it in house, we receive the Excel log they used to track it. It is always in a different format and requires extensive cleansing (on average 1 min/record). We take what is usually one large table (.xlsx format) and, after cleansing, split it into four .csv files, which we load as four tables on our platform.
I feel I have optimized the process quite well in terms of the process steps and cleansing with Excel functions (IF, CONCAT, text-to-columns, etc.). I have beginner-to-intermediate skills in VBA and SQL and have just scratched the surface of R. What is frustrating is that I know there is potential to automate this process, but I just don't know where to start. If anyone has experience with something like this, then code, a link to an article or another thread, or just some general direction would be much appreciated. Please ask for clarification where you feel it is needed. Thanks.
This will be really hard to do in Excel. If you have the time, you can try out Optimus, a data-cleansing library written in Python and PySpark (you don't need to know Spark). Here is the webpage: https://hioptimus.com.
You can create data pipelines with it, and I recommend that you do that: try to generalize your processes, and ask the client for a more structured way of passing the data.
The good thing is that you don't need Big Data to run Optimus, but if you have it some day, the same code will work.
Check out the documentation for more:
http://optimus-ironmussa.readthedocs.io/en/latest/
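Whatever library you end up using, the shape of the automation is the same: one reusable cleaning step, then a split into the four tables. A minimal sketch in plain pandas (all file names, column names, and cleaning rules here are hypothetical placeholders for your own):

```python
# A sketch of the pipeline idea in plain pandas (not Optimus-specific).
import pandas as pd

def clean(df):
    # Normalize column names, trim and title-case text, drop exact duplicates.
    df = df.copy()
    df.columns = [c.strip().lower().replace(' ', '_') for c in df.columns]
    df['customer_name'] = df['customer_name'].str.strip().str.title()
    return df.drop_duplicates()

raw = pd.read_excel('client_export.xlsx')   # the one large client table
tidy = clean(raw)

# Split the single table into the four tables the platform expects.
tables = {
    'customers.csv': ['customer_id', 'customer_name'],
    'orders.csv':    ['order_id', 'customer_id', 'order_date'],
    'items.csv':     ['order_id', 'sku', 'quantity'],
    'payments.csv':  ['order_id', 'amount', 'method'],
}
for filename, cols in tables.items():
    tidy[cols].drop_duplicates().to_csv(filename, index=False)
```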
Let me know if you have doubts!