PySpark Can't substring into variable - apache-spark

I'm trying to get the value from a column so I can pass it later as a parameter. The date format is DDMMYYYY, so I need to take a substring to get the correct values.
But when I try to apply substring to the resulting variable, I get a Column object instead. Any suggestions?

You can't call Spark functions on Python strings. Once the value has been collected, it is a plain Python string, so use Python string operations such as slicing, e.g.
print(dataCollect[:3])
which should give '301'.
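A minimal sketch, assuming dataCollect already holds the collected value as a plain Python str in DDMMYYYY format (the sample value 30102020 is hypothetical). Passing such a string to pyspark.sql.functions.substring() treats it as a column name and builds a Column expression, which is likely where the Column object came from. Slicing the string, or parsing it with datetime.strptime, stays in plain Python:

```python
from datetime import datetime

# Hypothetical value standing in for whatever was collected from the DataFrame;
# after collecting, it is an ordinary Python str in DDMMYYYY format.
dataCollect = "30102020"

# Plain slicing works on str (Spark's substring() only builds Column expressions).
day = dataCollect[:2]
month = dataCollect[2:4]
year = dataCollect[4:]

# Alternatively, parse the whole value into a date object.
parsed = datetime.strptime(dataCollect, "%d%m%Y").date()

print(dataCollect[:3])     # 301
print(day, month, year)    # 30 10 2020
print(parsed.isoformat())  # 2020-10-30
```

Parsing with strptime has the side benefit of rejecting malformed values such as a month of 13, which plain slicing would pass through silently.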

Related

Change data in Pandas dataframe by column

I have some data I imported from an Excel spreadsheet as a CSV. I created a dataframe using Pandas and want to change a specific column. The column contains strings such as "5.15.1.0.0". I want to change these strings to floats like "5.15100".
So far I've tried using the method "replace" to change every instance in that column:
df['Fix versions'].replace("5.15.1.0.0", 5.15100)
This, however, does not work. When I print the dataframe again after calling replace, it shows the same dataframe with no changes. Is it not possible to change a string to a float using replace? If not, does anyone know another way to do this?
I could parse each string and remove the ".", but I'd prefer not to, as some of the strings represent numbers of different lengths and decimal places.
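Add the inplace parameter, which defaults to False. By default, replace() returns a new Series and leaves the dataframe untouched, which is why reprinting showed no change. With inplace=True the column is modified in place, and it can then be cast to float: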
df['Fix versions'].replace(to_replace="5.15.1.0.0", value="5.15100", inplace=True)
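An alternative sketch that assigns the result back instead of using inplace=True, then casts to float in the same step. In recent pandas versions, calling an inplace method on df['col'] counts as chained assignment and may emit a warning, so reassigning is the more portable pattern. The sample data and the mapping dictionary below are hypothetical:

```python
import pandas as pd

# Hypothetical sample data standing in for the imported CSV.
df = pd.DataFrame({"Fix versions": ["5.15.1.0.0", "6.2.0.0.0", "5.15.1.0.0"]})

# Hypothetical mapping from each version string to its numeric spelling.
mapping = {"5.15.1.0.0": "5.15100", "6.2.0.0.0": "6.2000"}

# replace() returns a new Series; assigning it back updates the DataFrame.
# astype(float) then converts the replaced strings to floats.
df["Fix versions"] = df["Fix versions"].replace(mapping).astype(float)

print(df["Fix versions"].tolist())  # [5.151, 6.2, 5.151]
```

Note that astype(float) raises a ValueError if any value in the column was not covered by the mapping, which makes unmapped versions easy to spot.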

How to convert recordset.field to a string

I am currently trying to compare certain values in a column from an Access query against a vector of strings, looking for a match between any two values.
I used recordset.fields("column1") to access specific records from the column I need, but I can't get any matches, apparently because the values are of different data types.
How do I convert the records from recordset.fields("column1") into a string?
Thanks!
If you are working in VBA, wrap the value in the CStr() function, which returns the value converted to a String.

Python not all arguments converted during string formatting

I have the following python code
table += '<tr><td>{0}</td><td>{1}</td></tr>'% format(str(key)), format(str(value))
TypeError: not all arguments converted during string formatting
What could be the problem?
You are mixing three different methods of converting values to strings. To use the {...} placeholders, call the str.format() method directly on the string literal that serves as the template:
table += '<tr><td>{0}</td><td>{1}</td></tr>'.format(key, value)
There's no need to use the str() or format() functions, and you don't need to use % either.
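To show the corrected line in context, here is a small sketch that builds a whole table in a loop; the rows dictionary is hypothetical sample data standing in for wherever key and value come from:

```python
rows = {"apples": 3, "pears": 1.5}  # hypothetical key/value data

table = "<table>"
for key, value in rows.items():
    # str.format() converts key and value itself; no str(), format() or % needed.
    table += '<tr><td>{0}</td><td>{1}</td></tr>'.format(key, value)
table += "</table>"

print(table)
# <table><tr><td>apples</td><td>3</td></tr><tr><td>pears</td><td>1.5</td></tr></table>
```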
The format() function turns objects into a string using the format specification mini language, the same language that lets you format the values in a str.format() placeholder. It's the formatting part of the template language, available as a stand-alone function.
str() turns an object into a string without specific formatting. You'll always get the same result. Because str.format() already converts objects to strings (but with formatting controls), you don't need to use it. And if you need it anyway (because, say, you want to apply string formatting controls rather than those of another object type), str.format() has the !s conversion option.
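A short illustration of the difference between format(), str() and the !s conversion. The values are arbitrary examples; datetime.date is used because it interprets a non-empty format spec as a strftime pattern, which makes the effect of !s visible:

```python
from datetime import date

pi = 3.14159

# format() applies a format spec, the same mini-language as a {} placeholder.
print(format(pi, ".2f"))       # 3.14
print("{0:.2f}".format(pi))    # 3.14

# str() has no formatting controls and always gives the same result.
print(str(pi))                 # 3.14159

d = date(2020, 1, 2)
# date treats the spec ">12" as a strftime pattern, so it comes back literally.
print("{0:>12}".format(d))     # >12
# !s converts to str first, so ">12" is applied as string padding instead.
print("{0!s:>12}".format(d))   # "  2020-01-02" (right-aligned in 12 characters)
```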
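Applying % to a string object gives you a different string formatting operation called printf-style formatting. This method is much less flexible than str.format(). It only fills %-style placeholders such as %s, and since your template contains none, Python complains that not all arguments were converted.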
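A small sketch reproducing the error and showing what printf-style formatting expects instead; key and value are hypothetical sample values:

```python
key, value = "apples", 3  # hypothetical values

template = '<tr><td>{0}</td><td>{1}</td></tr>'

# The template has no %-style placeholders, so % has nowhere to put the argument.
error = None
try:
    template % format(str(key))
except TypeError as exc:
    error = str(exc)
print(error)  # not all arguments converted during string formatting

# printf-style formatting needs %s placeholders and a tuple of values.
row = '<tr><td>%s</td><td>%s</td></tr>' % (key, value)
print(row)  # <tr><td>apples</td><td>3</td></tr>
```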
If you are using Python 3.6 or newer, I recommend the faster f-strings (formatted string literals):
table += f'<tr><td>{key}</td><td>{value}</td></tr>'
You have to call the .format() method on the string, like this:
table += '<tr><td>{key}</td><td>{value}</td></tr>'.format(key=str(key), value=str(value))

Using excel and modifying string based on search function

I am trying to extract a value (or all values) from cells like the ones below in Excel:
#123 maybe some text and date 12/17/209
#048309 maybe some text and date 12/17/209
#9385 maybe some text and date 12/17/209
I want to get the value following the #, but I'm not sure whether there is an easier function. I want it to find the # and then get however many digits follow it. I am familiar with regex, but unfortunately not with Excel functions.
Sorry for the vagueness:
I was trying to use IF() with a FIND() on the # character, but I couldn't manage to get the number out, because I was using RIGHT() to take what comes after the #. RIGHT() expects a character count, which would have to be dynamic here, so I dropped that idea.
This formula will get the numbers directly after #:
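=--MID(A1,FIND("#",A1)+1,FIND(" ",A1,FIND("#",A1))-FIND("#",A1)-1)
The leading -- converts the extracted text to a number, so #048309 yields 48309; drop the -- if you need to keep leading zeros as text.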
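Since the asker knows regex, here is a Python sketch (not Excel) that makes the position arithmetic concrete: the digits sit strictly between the # and the next space, so their count is the space position minus the # position minus one. The helpers excel_find and excel_mid are hypothetical stand-ins that only approximate Excel's 1-based FIND and MID; the regex version is shown alongside for comparison:

```python
import re

def excel_find(needle, text, start=1):
    """Approximate Excel FIND: 1-based position of needle, searching from start."""
    idx = text.find(needle, start - 1)
    if idx == -1:
        raise ValueError("#VALUE!")  # Excel FIND returns #VALUE! when nothing is found
    return idx + 1

def excel_mid(text, start, length):
    """Approximate Excel MID: length characters starting at 1-based position start."""
    return text[start - 1:start - 1 + length]

def digits_after_hash(cell):
    hash_pos = excel_find("#", cell)
    space_pos = excel_find(" ", cell, hash_pos)
    # Characters strictly between '#' and the next space.
    return excel_mid(cell, hash_pos + 1, space_pos - hash_pos - 1)

def digits_after_hash_regex(cell):
    # One or more digits immediately after '#'.
    match = re.search(r"#(\d+)", cell)
    return match.group(1) if match else None

for cell in ["#123 maybe some text and date 12/17/209",
             "#048309 maybe some text and date 12/17/209",
             "#9385 maybe some text and date 12/17/209"]:
    print(digits_after_hash(cell), digits_after_hash_regex(cell))
# 123 123
# 048309 048309
# 9385 9385
```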

converting strings to formula objects in Julia

I have a dataframe in Julia with fewer than 10 columns. I want to generate a list of all possible formulas that could be fed into a linear model (e.g., [Y~X1+X2+X3, Y~X1+X2, ...]). I can accomplish this easily with combinations() and string versions of the column names. However, when I try to convert the strings into Formula objects, it breaks down. Judging by the DataFrames.jl documentation, it seems one can only construct Formulas from "expressions", and I can indeed make a list of individual column names as expressions. Is there any way to programmatically join a bunch of different expressions with the "+" operator, so that the resulting composite expression can be passed as the RHS of the Formula constructor? My impulse is to look for a function that converts an arbitrary string into the equivalent expression, but I'm not sure if that is correct.
The function parse takes a string, parses it, and returns an expression. I see nothing wrong with using it for what you're talking about.
Here is some working code, since I had been struggling to get a similar problem to work. Note that this is Julia 1.3.1, so parse is now Meta.parse, and instead of combinations I used IterTools.subsets.
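using RDatasets, DataFrames, IterTools, GLM
# Rename Solar.R so it is a valid Julia identifier
airquality = rename(dataset("datasets", "airquality"), "Solar.R" => "Solar_R")
# Every column except the response; propertynames returns Symbols
predictors = setdiff(propertynames(airquality), [:Temp])
for combination in subsets(predictors)
    # Build Temp ~ x1 + x2 + ... directly from Terms, without parsing strings
    formula = FormulaTerm(Term(:Temp), Tuple(Term.(combination)))
    if length(combination) > 0  # skip the empty subset
        @show lm(formula, airquality)
    end
end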
