Find Max Value in a field of a shapefile - python-3.x

I have a shapefile (mich_co.shp) in which I am trying to find the county with the maximum population. My idea was to use the max() function, but it doesn't work. Here is my code so far:
from osgeo import ogr
import os
shapefile = "C:/Users/root/Python/mich_co.shp"
driver = ogr.GetDriverByName("ESRI Shapefile")
dataSource = driver.Open(shapefile, 0)
layer = dataSource.GetLayer()
for feature in layer:
    print(feature.GetField("pop"))
layer.ResetReading()
The code above, however, only prints all the values of the "pop" field, like this:
10635.0
9541.0
112039.0
29234.0
23406.0
15477.0
8683.0
58990.0
106935.0
17465.0
156067.0
43868.0
135099.0
I tried:
print(max(feature.GetField("pop")))
but it returns TypeError: 'float' object is not iterable. For this, I've also tried:
for feature in range(layer):
and it returns TypeError: 'Layer' object cannot be interpreted as an integer.
Any help or hints would be much appreciated.
Thank you!

max() needs an iterable, such as a list. Try building a list:
pops = [feature.GetField("pop") for feature in layer]
print(max(pops))
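If you also want the winning feature rather than just the value, max() accepts a key function. A minimal sketch with plain dicts standing in for OGR features (the field access and county names here are illustrative, not from the original data):

```python
# Stand-ins for OGR features; each exposes a "pop" field like GetField("pop").
features = [
    {"name": "Alcona", "pop": 10635.0},
    {"name": "Calhoun", "pop": 156067.0},
    {"name": "Eaton", "pop": 43868.0},
]

# Largest population value, via a generator expression (no list needed).
print(max(f["pop"] for f in features))  # 156067.0

# Whole feature with the largest population, via a key function.
biggest = max(features, key=lambda f: f["pop"])
print(biggest["name"])  # Calhoun
```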

Related

nunique() not producing correct output in aggregate functions

I am using an aggregation on the following data frame:
df = pd.DataFrame({'col1': ['team1', 'team1', 'team2', 'team3'],
                   'col2': [23, 4, 5, 6],
                   'col3': ['user1', 'user1', 'user2', 'user2']})
gb = df.groupby('col1')
gb.agg({'col2': np.sum,
        'col3': nunique()})
But it seems nunique() is not compatible with groupby. Please see the following output:
NameError: name 'nunique' is not defined
May I know how we can use nunique() for this example? Help is appreciated.
Using NumPy:
gb = df.groupby('col1')
gb.agg({'col2': np.sum,
        'col3': np.nunique()})
gives a new error: AttributeError: module 'numpy' has no attribute 'nunique'
You need to use:
gb.agg({'col2':np.sum, 'col3':lambda x: len(np.unique(x))})
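As an alternative, pandas also accepts the string 'nunique' directly in agg, which avoids the lambda; a quick sketch on the question's frame (assuming a reasonably recent pandas):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['team1', 'team1', 'team2', 'team3'],
                   'col2': [23, 4, 5, 6],
                   'col3': ['user1', 'user1', 'user2', 'user2']})

# 'nunique' is resolved to the built-in Series.nunique aggregation.
result = df.groupby('col1').agg({'col2': 'sum', 'col3': 'nunique'})
print(result)
```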

TypeError: 'OneHotEncoder' object is not iterable

I'm trying to use OneHotEncoding on the categorical variables of the following dataset.
First, I'm trying to convert the 'Geography' column.
Here's what I've done so far:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(['Geography',OneHotEncoder(categories='auto'),[1]],remainder='passthrough')
df_ = ct.fit_transform(df.values)
However, when I try this, I get the TypeError shown in the title.
Can someone help me understand why this error occurs and how to solve it?
There is a syntax error in the input parameter to the ColumnTransformer: it expects a list of tuples.
transformers : list of tuples
    List of (name, transformer, columns) tuples specifying the transformer objects to be applied to subsets of the data.
Try fixing it by converting the encoder parameters to a tuple:
ct = ColumnTransformer([('Geography', OneHotEncoder(categories='auto'), [1])], remainder='passthrough')
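For reference, here is a runnable sketch of the corrected call on a toy frame (the column names and values are made up; only 'Geography' at position 1 mirrors the question):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the question's dataset; 'Geography' is column index 1.
df = pd.DataFrame({'CreditScore': [600, 700, 650],
                   'Geography': ['France', 'Spain', 'France'],
                   'Age': [40, 35, 50]})

# Note the tuple inside the list: (name, transformer, columns).
ct = ColumnTransformer([('Geography', OneHotEncoder(categories='auto'), [1])],
                       remainder='passthrough')
df_ = ct.fit_transform(df.values)
print(df_.shape)  # 2 one-hot columns + 2 passthrough columns
```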

AttributeError: 'numpy.ndarray' object has no attribute 'rolling'

When I try to compute an MA (rolling average) on log-transformed data, I get this error. Where am I going wrong?
This version with the original data worked fine:
# Rolling statistics
rolmean = data.rolling(window=120).mean()
rolSTD = data.rolling(window=120).std()
With the log-transformed data:
MA = X.rolling(window=120).mean()
MSTD = X.rolling(window=120).std()
AttributeError: 'numpy.ndarray' object has no attribute 'rolling'
You have to convert the NumPy array to a pandas DataFrame (or Series) to use the pandas rolling method.
The change could be something like this:
dataframe = pd.DataFrame(data)
rolmean = dataframe.rolling(120).mean()
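A self-contained sketch of that fix, with a made-up log-transformed array standing in for X:

```python
import numpy as np
import pandas as pd

# The log transform produces a plain NumPy array, which has no .rolling method.
X = np.log(np.arange(1, 241, dtype=float))

# Wrapping it in a pandas Series (or DataFrame) restores .rolling.
s = pd.Series(X)
MA = s.rolling(window=120).mean()
MSTD = s.rolling(window=120).std()

print(MA.iloc[119])  # first non-NaN value: mean of the first 120 points
```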
Try this instead:
numpy.roll(your_array, shift, axis=None)
There is no rolling attribute in NumPy. Note, though, that numpy.roll only shifts the array elements; it does not compute rolling statistics, so for a moving average you should use the pandas approach above.
Hope this helps.

Get TypeError when using TfidfVectorizer in python

I'm new to Python and I need your help.
I'm working on NLP, and I want to classify a field that is a string.
I read the dataset:
data = pd.read_csv("dataset.csv", sep=';', encoding='latin-1', error_bad_lines=False)
and tokenize the field:
data['campo'] = data['campo'].str.split()
The output is:
1- [Su, inexperto, personal] 2- [Atención, al, cliente]
In most tutorials I've checked on the internet, tokenizing returns the separated words quoted (with apostrophes).
The problem is that when I try to vectorize (TfidfVectorizer), I get an error, and I think this is where my problem is.
Can you help me? Why don't I get the tokens with apostrophes?
After that, I try to vectorize the field:
Tfidf_vect = TfidfVectorizer(max_features=5000)
Tfidf_vect.fit(data['campo'])
which raises the error:
AttributeError: 'list' object has no attribute 'lower'
I thought the problem came from the lowercasing, so I added:
Tfidf_vect = TfidfVectorizer(lowercase=False, max_features=5000)
Tfidf_vect.fit(data['campo'])
and then it raises:
TypeError: expected string or bytes-like object
Do you know what the problem is?
Do not tokenize your text before feeding it into TfidfVectorizer(); that means you have to remove the following line from your code:
data['campo'] = data['campo'].str.split()
TfidfVectorizer does the tokenization internally. Try the following lines directly:
Tfidf_vect = TfidfVectorizer(max_features=5000)
Tfidf_vect.fit(data['campo'])
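A minimal runnable sketch of that fix, with made-up strings standing in for the 'campo' column:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Raw strings, NOT pre-split token lists.
data = pd.DataFrame({'campo': ['Su inexperto personal',
                               'Atención al cliente',
                               'Muy buen servicio']})

Tfidf_vect = TfidfVectorizer(max_features=5000)
X = Tfidf_vect.fit_transform(data['campo'])  # tokenizes internally

print(X.shape)  # one row per document, one column per vocabulary term
```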

How to use select() transformation in Apache Spark?

I am following the Intro to Spark course on edX. However, I can't understand a few things; the following is a lab assignment. FYI, I am not looking for the solution.
I am not able to understand why I am receiving the error:
TypeError: 'Column' object is not callable
Following is the code
from pyspark.sql.functions import regexp_replace, trim, col, lower

def removePunctuation(column):
    """
    Args:
        column (Column): A Column containing a sentence.
    """
    # The following line gives the error. I believe I am selecting all the rows
    # from the dataframe 'column' where the attribute is named 'sentence'.
    result = column.select('sentence')
    return result

sentenceDF = sqlContext.createDataFrame([('Hi, you!',),
                                         (' No under_score!',),
                                         (' * Remove punctuation then spaces * ',)],
                                        ['sentence'])
sentenceDF.show(truncate=False)

(sentenceDF
    .select(removePunctuation(col('sentence')))
    .show(truncate=False))
Could you elaborate a little? TIA.
The column parameter is a Column, not a DataFrame object, and therefore does not have access to the select method. You'll need to use other functions to solve this problem.
Hint: Look at the import statement.
