I have some data I imported from an Excel spreadsheet as a CSV. I created a dataframe using pandas and want to change a specific column. The column contains strings such as "5.15.1.0.0". I want to change these strings to floats like "5.15100".
So far I've tried using the method "replace" to change every instance in that column:
df['Fix versions'].replace("5.15.1.0.0", 5.15100)
This, however, does not work. When I print the dataframe again after calling replace, it shows the same dataframe with no changes. Is it not possible to change a string to a float using replace? If not, does anyone know another way to do this?
I could parse each string and remove the "." but I'd prefer not to do it this way as some of the strings represent numbers of different lengths and decimal place values.
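By default, "replace" returns a new object and leaves the original untouched, because its "inplace" parameter defaults to False. Setting it to True changes the dataframe in place, after which the column can be cast to float. (Assigning the result back to the column works as well.)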
df['Fix versions'].replace(to_replace="5.15.1.0.0", value="5.15100", inplace=True)
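If the column holds many different version strings, listing every replacement by hand gets tedious. Below is a minimal sketch of a generic conversion, assuming the rule implied by the single example in the question (keep the first dot, drop the rest). The helper name version_to_float is hypothetical and not part of pandas.

```python
def version_to_float(version):
    """Convert a dotted version string to a float.

    Assumed rule (inferred from the one example in the question):
    keep the first dot as the decimal point and drop all later dots.
    """
    head, sep, tail = version.partition(".")
    return float(head + sep + tail.replace(".", ""))


# Hypothetical sample values standing in for the 'Fix versions' column.
samples = ["5.15.1.0.0", "4.2.0", "7"]
converted = [version_to_float(v) for v in samples]
print(converted)
```

With pandas, the same helper could be applied to the column with df['Fix versions'].map(version_to_float). Note that this rule is lossy: different versions such as "5.1.0" and "5.10" end up as the same float.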
I'm trying to get the value from a column to feed it later as a parameter. I need to substring it to get the correct values as the date format is DDMMYYYY.
But when I try applying the substring to the resulting variable, a Column object is generated. Any suggestions?
You can't call Spark functions on Python strings. You need to use Python string methods, e.g.
print(dataCollect[:3])
which should give '301'.
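As a rough illustration of plain Python string handling on a DDMMYYYY value, here is a small sketch. The value "30102019" is a made-up stand-in for the collected string, not data from the question.

```python
from datetime import datetime

# Hypothetical DDMMYYYY string standing in for the collected value.
data_collect = "30102019"

# Ordinary slicing works because this is a Python str, not a Spark Column.
day = data_collect[:2]
month = data_collect[2:4]
year = data_collect[4:]

# Alternatively, let the standard library parse the whole value at once.
parsed = datetime.strptime(data_collect, "%d%m%Y")

print(day, month, year, parsed.date())
```

Slicing keeps the parts as strings, while strptime validates the date and returns a datetime object, which may be more convenient if the value is later passed on as a parameter.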
Is there a character to mask STRING values within the Excel TEXT function?
Attempting to use a mask of "0000-000000-00000-0000" seems to convert a string to a number. I simply want to add hyphens in between a specific number of characters.
I have also tried "####-######-#####-####" and "####-######-#####-####" but to no avail.
Background:
In a previous question, it was determined that a particular custom number mask could not be applied to a string because of the 15 significant digit limitation in Excel.
The goal was to convert a TEXT value of 5145350002005000080 to 5145-350002-00500-0080 using the following formula:
=text(A1,"0000-000000-00000-0000")
The output produced was:
5145-350002-00500-0000
You will need to use Excel string functions.
This works, though it is not the usual way of getting the job done:
=REPLACE(REPLACE(REPLACE(A1,16,0,"-"),11,0,"-"),5,0,"-")
The more typical method:
=LEFT(A1,4)&"-"&MID(A1,5,6)&"-"&MID(A1,11,5)&"-"&RIGHT(A1,4)
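To double-check the character positions such formulas need, the intended 4-6-5-4 grouping can be sketched outside Excel. This is only an illustration in Python; the function name insert_hyphens is hypothetical.

```python
def insert_hyphens(text, widths=(4, 6, 5, 4)):
    """Split text into consecutive groups of the given widths, joined by '-'."""
    parts = []
    start = 0
    for width in widths:
        parts.append(text[start:start + width])
        start += width
    return "-".join(parts)


# The 19-character value from the question, kept as a string throughout.
value = "5145350002005000080"
print(insert_hyphens(value))
```

Because the value never becomes a number, no digits are lost. In 1-based Excel terms, the groups start at positions 1, 5, 11 and 16.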
Unfortunately, it's impossible to apply a format mask to a string value using TEXT. According to the TEXT function description, it works only for numbers:
The TEXT function converts a numeric value to text and lets you
specify the display formatting by using special format strings.
Syntax
TEXT(value, format_text)
The TEXT function syntax has the following arguments:
value Required. A numeric value, a formula that evaluates to a numeric value, or a reference to a cell containing a numeric value.
So it looks like the only way to achieve what you want is to apply the recommended string functions.
Select the cells, press Ctrl+1, then on the Number tab of the Format Cells dialog select "Custom" and paste the following into the Type box:
"Boxes";"Boxes";"Boxes";"Boxes"
I'm trying to load the following dataset:
Afghanistan,5,1,648,16,10,2,0,3,5,1,1,0,1,1,1,0,green,0,0,0,0,1,0,0,1,0,0,black,green
Albania,3,1,29,3,6,6,0,0,3,1,0,0,1,0,1,0,red,0,0,0,0,1,0,0,0,1,0,red,red
Algeria,4,1,2388,20,8,2,2,0,3,1,1,0,0,1,0,0,green,0,0,0,0,1,1,0,0,0,0,green,white
...
The problem is that it contains both integers and strings.
I found some information on how to extract the integers only,
but I haven't been able to find out whether there's a way to get all the data.
Is that possible?
If it is not, is there any way to find the numbers on each line and throw everything else away without having to choose the columns?
I need this specifically because it seems I cannot use str2num on a whole line at a time.
Almost anything is possible, you just have to define your goal accurately.
Assuming that your database is stored as a text file, you can parse it line by line using textread, and then apply regexp to filter only the numerical fields (this does not require having prior knowledge about the columns):
C = textread('database.txt', '%s', 'delimiter', '\n');
C = cellfun(@(x)regexp(x, '\d+', 'match'), C, 'UniformOutput', false);
The result here is a cell array of cell arrays of strings, where each string corresponds to a numerical field in a specific line.
Since the numbers are still stored as strings, you'd probably need to convert them to actual numerical values. There's a multitude of ways to do that, but you can use str2num in a tricky way: it can convert delimited strings into an array of numbers. This means that if you concatenate all strings in a specific line back into one string, and put spaces in between, you can apply str2num on all of them at once, like so:
C = cellfun(@(x)str2num(sprintf('%s ', x{:})), C, 'UniformOutput', false);
The resulting C is a cell array of vectors, each vector containing the values of all numerical fields in the corresponding line. To access a specific vector, you can use curly braces ({}). For instance, to access the numbers of the second line, you would use C{2}.
All the non-numerical fields are discarded in the process of parsing, of course. If you want to keep them as well, you should use a different regular expression with regexp.
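For comparison, the same idea (match runs of digits on each line, then convert them) can be sketched in Python. This is only an illustration of the technique, not a translation of the MATLAB code. The second helper, parse_fields, is a hypothetical way of keeping the text fields too, assuming the fields are comma-separated as in the sample.

```python
import re

# Two sample lines copied from the dataset in the question.
lines = [
    "Afghanistan,5,1,648,16,10,2,0,3,5,1,1,0,1,1,1,0,green,0,0,0,0,1,0,0,1,0,0,black,green",
    "Albania,3,1,29,3,6,6,0,0,3,1,0,0,1,0,1,0,red,0,0,0,0,1,0,0,0,1,0,red,red",
]

# Numbers only: find every run of digits on a line and convert it to int.
numbers = [[int(tok) for tok in re.findall(r"\d+", line)] for line in lines]


def parse_fields(line):
    """Keep every field: digit-only fields become int, the rest stay str."""
    return [int(f) if f.isdigit() else f for f in line.split(",")]


records = [parse_fields(line) for line in lines]

print(numbers[0][:4])
print(records[0][:4])
```

The pattern \d+ only covers unsigned integers, which is enough for this dataset. Negative or decimal values would need a different expression.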
Good luck!
I have a matrix where the first column contains dates and the first row contains maturities which are alpha/numeric (e.g. 16year).
The rest of the cells contain the rates for each day, which are double precision numbers.
Now I believe xlsread() can only handle numeric data so I think I will need something else or a combination of functions?
I would like to be able to read the table from excel into MATLAB as one array or perhaps a struct() so that I can keep all the data together.
The other problem is that some of the rates are given as '#N/A'. I want the cells where these values are stored to be kept but would like to change the value to blank=" ".
What is the best way to do this? Can it be done as part of the input process?
Well, from looking at the MATLAB reference for xlsread, you can use the syntax
[num,txt,raw] = xlsread(FILENAME)
Then num holds a matrix of your numeric data, txt holds the text fields (i.e. your text headers), and raw holds all of your data unprocessed, including the text headers.
So I guess you could use the raw array, or a combination of the num and txt.
For your other problem, if your rates are 'pulled' from some other source, you can use
=IFERROR(RATE DATA,"")
and then there will be a blank instead of the error code #N/A.
Another solution (Windows only) is the xlsread() syntax that runs a function on the imported data,
[num,txt,raw,custom] = xlsread(filename,sheet,xlRange,'',functionHandler)
and let the function replace the NaN values with blanks. The output is then returned in the custom array.
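The post-processing step on an unprocessed table like raw can be sketched in a language-neutral way. The Python snippet below is only an illustration with invented sample values. It assumes the first row holds the maturities, the first column holds the dates, and missing rates appear as the text '#N/A'.

```python
# Hypothetical stand-in for the unprocessed table (header row + data rows).
raw = [
    ["Date", "1year", "16year"],
    ["2020-01-02", 1.5, "#N/A"],
    ["2020-01-03", "#N/A", 2.25],
]

# Split the table into its three parts so they can be kept together.
maturities = raw[0][1:]
dates = [row[0] for row in raw[1:]]

# Keep every cell in place, but swap the '#N/A' marker for a blank.
rates = [["" if cell == "#N/A" else cell for cell in row[1:]]
         for row in raw[1:]]

table = {"maturities": maturities, "dates": dates, "rates": rates}
print(table)
```

Keeping the three parts in one dictionary plays the role a struct would play in MATLAB: the headers, dates and rates stay together while the rate cells keep their positions.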