cut important parts of string in a column - string

I have a column called Dateiname that contains strings. My goal is to extract only the word Gruen, Gelb or Orange from each value and store it in a new column, so that each row shows which of the three words it contains.
I tried this code:
result['Y'] = result.Dateiname.str[-10:-4]
Because these words are not equally long, I get stray characters such as 4_ or 1_ or just _, depending on whether it is Gruen or Gelb that I want to slice out. Is there a way to extract Gruen, Gelb or Orange from the column Dateiname and save it in the column Y?
The goal would be this:

Use str.extract:
result['Y'] = result.Dateiname.str[-10:-4].str.extract('(Gruen|Gelb|Orange)')
Another solution is to split on _ or . and take the second value from the end by indexing:
result.Dateiname.str.split(r'_|\.').str[-2]
Or, if you want to search the whole string:
result['Y'] = result.Dateiname.str.extract('(Gruen|Gelb|Orange)')
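To see what the alternation pattern matches without loading a DataFrame, the same regex can be tried with Python's standard re module. This is only a sketch: the file names below are hypothetical (the question's real data is not shown), and extract_color is an illustrative helper, not part of pandas.

```python
import re

# Hypothetical file names; the question's real data is not shown.
filenames = [
    "messung_01_4_Gelb.csv",
    "messung_02_1_Gruen.csv",
    "messung_03_Orange.csv",
]

# Same alternation as in the str.extract call above.
COLOR_PATTERN = re.compile(r"(Gruen|Gelb|Orange)")


def extract_color(name):
    """Return the first of Gruen/Gelb/Orange found in name, or None."""
    match = COLOR_PATTERN.search(name)
    return match.group(1) if match else None


colors = [extract_color(name) for name in filenames]
print(colors)
```

Because the pattern searches the whole string, it does not matter how long the word is or what precedes it, which is what the fixed slice could not handle.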

If your data always has the required word immediately followed by .csv, use str.extract with a regex:
For example:
result = pd.DataFrame({'Dateiname':['asdfjaskld_3242_34.fsdf_450_Violet.csv',
'asdfjaskld_3242_34.fsdf_450_Green.csv',
'asdfjaskld_3242_34.fsdf_450_Indigo.csv',
'asdfjaskld_3242_34.fsdf_450_Red.csv']})
result['Y'] = result.Dateiname.str.extract(r'([a-zA-Z]+)\.csv')
print(result)
                                Dateiname       Y
0  asdfjaskld_3242_34.fsdf_450_Violet.csv  Violet
1   asdfjaskld_3242_34.fsdf_450_Green.csv   Green
2  asdfjaskld_3242_34.fsdf_450_Indigo.csv  Indigo
3     asdfjaskld_3242_34.fsdf_450_Red.csv     Red

You can use:
result['Y'] = result['Dateiname'].str.split('_').str[-1].str[:-4]

Related

Display 2 decimal places, and use comma as separator in pandas?

Is there any way to replace the dot in a float with a comma and keep a precision of 2 decimal places?
Example 1 : 105 ---> 105,00
Example 2 : 99.2 ---> 99,20
I used a lambda function: df['abc'] = df['abc'].apply(lambda x: f"{x:.2f}".replace('.', ',')). But then the values have an invalid format in Excel.
I'm updating a specific sheet in Excel, so I'm using:
wb = load_workbook(filename)
ws = wb["FULL"]
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)
Let us try
out = (s//1).astype(int).astype(str)+','+(s%1*100).astype(int).astype(str).str.zfill(2)
0 105,00
1 99,20
dtype: object
Input data
s=pd.Series([105,99.2])
s = pd.Series([105, 99.22]).apply(lambda x: f"{x:.2f}".replace('.', ','))
First, .apply takes a function and calls it on each element of the Series.
The f-string f"{x:.2f}" turns the float into a string with two decimal places, using '.' as the decimal separator.
After that, .replace('.', ',') replaces the '.' with ','.
You can replace pd.Series([105, 99.22]) with the relevant column of your DataFrame.
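The formatting step itself can be checked in plain Python, independent of pandas. The sketch below assumes a hypothetical helper name, format_decimal_comma; it is the same f-string plus replace idea, applied to single values.

```python
def format_decimal_comma(value, places=2):
    """Return value as text with a fixed number of decimals and ',' as separator."""
    # Format with '.' first, then swap the decimal separator.
    return f"{value:.{places}f}".replace(".", ",")


formatted = [format_decimal_comma(v) for v in (105, 99.2, 99.22)]
print(formatted)
```

Note that the result is a string, so a spreadsheet will treat it as text rather than as a number.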
I think there is a misunderstanding here. In Excel you can set the display format, i.e. the format in which numbers are shown (the icon with +/-0).
But that is not the format of the cell's value: the cell stays numeric either way. Your approach changes only the cell value, not its formatting. In your question you save the value as a string, so Excel reads it as a string.
That said, don't format the value. Upgrade pandas (if you haven't done so already) and try something along these lines: https://stackoverflow.com/a/51072652/11610186
To elaborate, try replacing your for loop with:
i = 1  # row counter; matches the appended row only if the sheet was empty before the loop
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)
    # replace D with the letter of the column you want to see formatted:
    ws[f'D{i}'].number_format = '#,##0.00'
    i += 1
Well, I found another way to set the float format directly in Excel, using this code:
for col_cell in ws['S':'CP']:
    for i in col_cell:
        i.number_format = '0.00'

Excel Formula - Match substrings of List to List

I have two Lists in an excel spreadsheet.
The first list has strings such as
1234 blue 6 abc
xyz blue/white 1234
abc yellow 123
The other list contains substrings of the first list
yellow
blue/white
blue
Result
1234 blue 6 abc | blue
xyz blue/white 1234 | blue/white
abc yellow 123 | yellow
Now I need some kind of match formula to assign the correct value from the second list to each entry of the first. One problem: there is no specific pattern that determines where the color substring is positioned. The other problem: the values are not entirely unique. As my example above shows, the lookup needs to follow an order (checking for "blue/white" before checking for "blue").
I played around with formulas like MATCH and FIND, also using the wildcard *, but couldn't get a result.
A similar question asked here on SO covers the opposite case: How to find if substring exists in a list of strings (and return full value in list if so)
Any help is appreciated. A formula would be cool, but using VBA is also okay.
=INDEX(D$7:D$9, AGGREGATE(15, 7, ROW($1:$3)/ISNUMBER(SEARCH(D$7:D$9, A2)), 1))
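Here is a solution with VBA.
List 1 (strings) is in column A.
List 2 (substrings) is in column C, ordered so that more specific entries ("blue/white") come before less specific ones ("blue").
The code consists of two nested loops that check whether each substring occurs inside the string, stopping at the first match.
Sub MatchColors()
    Dim row_1 As Long, row_2 As Long, color As String
    With ActiveSheet ' assumption: both lists are on the active sheet
        row_1 = 1
        Do While .Cells(row_1, "A") <> ""
            row_2 = 1
            Do While .Cells(row_2, "C") <> ""
                color = .Cells(row_2, "C").Value
                If InStr(1, .Cells(row_1, "A"), color, vbBinaryCompare) > 0 Then
                    .Cells(row_1, "B") = color
                    Exit Do ' keep the first match so "blue" cannot overwrite "blue/white"
                End If
                row_2 = row_2 + 1
            Loop
            row_1 = row_1 + 1
        Loop
    End With
End Sub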
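The ordered lookup the question asks for (check "blue/white" before "blue") can also be sketched in Python to make the logic explicit. This is an illustration only; first_matching_substring is a hypothetical helper name, and the data is the sample from the question.

```python
def first_matching_substring(text, candidates):
    """Return the first candidate contained in text, or None if none match."""
    for candidate in candidates:
        if candidate in text:
            return candidate
    return None


# Sample data from the question; more specific entries come first.
strings = ["1234 blue 6 abc", "xyz blue/white 1234", "abc yellow 123"]
colors = ["yellow", "blue/white", "blue"]

matches = {s: first_matching_substring(s, colors) for s in strings}
print(matches)
```

Returning on the first hit is what makes the order of the second list matter: if "blue" came before "blue/white", the second string would be labelled "blue".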

Separating Data in the same excel column

I have a column of data with multiple value types in it. I am trying to separate each value type into its own column. Below is an example of the data:
6 - Cutler, Jay (Ovr: 83)
22 - Forte, Matt (Ovr: 88)
86 - Miller, Zach (Ovr: 80)
I tried to separate the data by a) going to Data and clicking Text to Columns; however, this does not separate "Ovr" from 80 in the "Ovr: 80" portion of the data. I also tried b) converting to a .csv file, but again was unable to separate "Ovr" from "80". Is there a formula I can use to separate this portion of the data from the rest?
I would like the data to be separated into different columns as shown below:
6 | Cutler | Jay | Ovr | 83
22 | Forte | Matt | Ovr | 88
86 | Miller | Zach | Ovr | 80
Any insight is much appreciated!
Select the cells you wish to process and run this macro to place results in the cells to the right of the selected cells:
Option Explicit
Sub dural()
    Dim r As Range, s As String, ary
    Dim i As Long, a
    For Each r In Selection
        s = r.Value
        If s <> "" Then
            s = Replace(Replace(s, "-", " "), ",", " ")
            s = Replace(Replace(s, "(", " "), ")", " ")
            s = Application.WorksheetFunction.Trim(Replace(s, ":", " "))
            ary = Split(s, " ")
            i = 1
            For Each a In ary
                r.Offset(0, i).Value = a
                i = i + 1
            Next a
        End If
    Next r
End Sub
Using the method above, you could do something like this...
First clean the text so it's more manageable. With this formula, copied down a column, you can turn each entry into a space-delimited string:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"- ",""),",",""),"(",""),")",""),":","")
From there, copy the values the formula gives you (to a new sheet, maybe) and then use Text to Columns to split them into columns.
For the record, I do not recommend this method if you are willing to use the Text to Columns option.
Functions used for this solution are:
LEFT function
FIND function
MID function
For your first column, use the following:
=LEFT(A1,FIND(" ",A1))*1
That pulls out the first number, presuming you do not have any leading spaces. The *1 converts the text to a number.
For your second column of last names, use the following:
=MID(A1,FIND("-",A1)+2,FIND(",",A1)-(FIND("-",A1)+2))
Provided you have a comma and a dash as in your example data, you will not get an error, and it should pull the last name without the comma.
For your third column of first names, follow the same general technique as for last names:
=MID(A1,FIND(",",A1)+2,FIND("(",A1)-2-(FIND(",",A1)+2)+1)
Follow a similar pattern to get your Ovr column:
=MID(A1,FIND("(",A1)+1,FIND(":",A1)-1-(FIND("(",A1)+1)+1)
And finally, to get the number after "Ovr:" into your last column, use this:
=MID(A1,FIND(":",A1)+2,FIND(")",A1)-1-(FIND(":",A1)+2)+1)
Copy the above formulas down as far as you need to go.
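For comparison, the same five fields can be pulled out in one pass with a regular expression. The Python sketch below assumes every line follows the exact shape of the sample data ("number - last, first (label: number)"); ROW_PATTERN and split_row are illustrative names.

```python
import re

# Pattern for lines like "6 - Cutler, Jay (Ovr: 83)" (format taken from the question).
ROW_PATTERN = re.compile(r"(\d+)\s*-\s*([^,]+),\s*(.+?)\s*\((\w+):\s*(\d+)\)")


def split_row(text):
    """Split one line into number, last name, first name, label and rating."""
    match = ROW_PATTERN.fullmatch(text.strip())
    if match is None:
        return None  # line does not follow the assumed format
    number, last, first, label, rating = match.groups()
    return int(number), last.strip(), first, label, int(rating)


rows = [split_row(s) for s in (
    "6 - Cutler, Jay (Ovr: 83)",
    "22 - Forte, Matt (Ovr: 88)",
    "86 - Miller, Zach (Ovr: 80)",
)]
print(rows)
```

Each capture group corresponds to one of the MID/FIND formulas: the delimiters (dash, comma, parenthesis, colon) mark where one field ends and the next begins.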

Split Cell by Numbers Within Cell

I have some fields that need to be split up into different cells. They are in the following format:
Numbers on Mission 21 0 21
Numbers on Mission 5 1 6
The desired output would be 4 separate cells. The first would contain the words in the string "Numbers on Mission" and the subsequent cells would have each number, which is determined by a space. So for the first example the numbers to extract would be 21, 0, 21. Each would be in its own cell next to the string value. And for the second: 5, 1, 6.
I tried using a split function but wasn't sure how to target the numbers specifically, and to identify the numbers based on the spaces separating them.
Pertinent to your first case (Numbers on Mission), the simple solution could be as shown below:
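Sub SplitCells()
    Const RowHeader As String = "Numbers on Mission"
    Dim ArrNum As Variant
    Dim i As Long
    ' Remove the header text, then split the remainder on spaces;
    ' element 0 is empty because the remainder starts with a space.
    ArrNum = Split(Replace(Range("A1"), RowHeader, ""), " ")
    For i = 1 To UBound(ArrNum)
        Cells(1, i + 2) = ArrNum(i)
    Next
    Cells(1, 2) = RowHeader
End Sub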
The same logic is applicable to your second case. Hope this may help.
Unless I'm overlooking something, you may not need VBA at all. Have you tried the "Text to Columns" option? Select the cell(s) with the information you would like to split up and go to Data -> Text to Columns. There, you can choose "Delimited" and pick a space as the delimiter, which will split your data into multiple cells at each space.
Edit: Just realized that this will also split up your string. In that case, when you are in the third step of Text to Columns, choose a destination cell that isn't the cell with your data (i.e. if your data is in A1, choose B1 as the destination, and it'll put the split info there). Then just combine the text columns with something like =B1&" "&C1&" "&D1
I was able to properly split the values using the following:
If i.Value Like "*on Mission*" Then
    x = Split(i, " ")
    For y = 0 To UBound(x)
        i.Offset(0, y + 1).Value = x(y)
    Next y
End If
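If the goal is to keep the words together and split off only the trailing numbers, the idea can be sketched in Python with a regular expression. This is an illustration under the assumption that every cell ends in space-separated integers; split_label_and_numbers is a hypothetical helper name.

```python
import re


def split_label_and_numbers(text):
    """Return [label, n1, n2, ...] for text ending in space-separated integers."""
    # Capture the leading label lazily, then the trailing run of integers.
    match = re.fullmatch(r"(.*?)((?:\s+\d+)+)\s*", text)
    if match is None:
        return [text]  # no trailing numbers found; keep the text as one field
    label, numbers = match.groups()
    return [label.strip()] + [int(n) for n in numbers.split()]


cells = [split_label_and_numbers(s) for s in (
    "Numbers on Mission 21 0 21",
    "Numbers on Mission 5 1 6",
)]
print(cells)
```

Unlike a plain split on spaces, this keeps "Numbers on Mission" as a single first field, so no recombining step is needed afterwards.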

What is the most efficient format for storing strings from a for loop?

I have a script that runs through a series of strings and using regex pulls out certain strings (approx 4 output strings per input string).
e.g. HelloStackOverflowWorld
-> Hello; Stack; Overflow; World;
The final output would ideally be a table where I can filter based upon the strings in the columns. Using the case above, column 1 row 1 would have 'Hello', column 2 row 1 would have 'Stack' and so on.
The problem is, the size of the output will change depending on the input so I am unsure of what output format to use.
At the moment I used something similar to this:
if strfind(missing{ii},'hello')
    miss.exch = [miss.exch;'hello'];
    temp.exc = regexp(missing{ii},'(?<=\d[Q|T])(\w*?)(?=[q])','match');
    miss.exc = [miss.exc;temp.exc];
    temp.TQ= regexp(missing{ii},'(Qc|Tc)','match');
    if strcmp(temp.TQ{1,1}, 'Tc')
        miss.TQ = [miss.TQ;'variableA'];
    elseif temp.TQ{1,1} == 'Qc'
        miss.TQ = [miss.TQ;'variableB'];
    end
else if .........
end
Which obviously results in a 1x1 struct consisting of a number of fields each with many cells. This makes filtering on strings an issue!
How can I define and add data into a 'table of strings' that I can then filter?
I think you are just looking for a cell array. Here is a simple example of what they can do:
C = {'Abc','Bcd';'Cde',[]}
strcmp(C,'Cde')
Results in:
ans =
0 0
1 0
Make sure to check doc cell to see how you can access them.
