Trying to remove only the characters F or C following numbers - node.js

I'm trying to find a regex (that will work in node.js) to remove the Fahrenheit and Celsius letters and replace them with " degrees" from the below weather forecast string.
"Clear skies. Low 46F. NNW winds shifting to ENE at 10 to 15 mph."
I want the above string to read as below:
"Clear skies. Low 46 degrees. NNW winds shifting to ENE at 10 to 15 mph."
There could be more than one instance of a temperature in the string.
NOTE: I only want to remove the F or C if it's immediately following a number with no space in-between. If "Florida" were in the above string, I'd want the letter "F" left untouched.
I've tried the below regex, but it finds the entire 46F. I just want it changed to 46 degrees.
/\d+[FC]/g
Thanks.

That is because the pattern lacks a capturing group (parentheses), so the replacement has no way to refer back to the digits.
Use this:
/(\d+)[FC]/g
$1 refers to the first capturing group, which is the \d+ in this case.
speechOutput = speechOutput.replace(/(\d+)[FC]/g, '$1 degrees');
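For comparison, the same capture-group idea can be sketched in Python, where the backreference is written \1 instead of $1. The function name and the sample forecast are made up for illustration, and the trailing \b is an extra assumption not present in the answer above: it only accepts the letter when it ends the word.

```python
import re

def spell_out_degrees(text: str) -> str:
    # (\d+) captures the digits so they can be reused as \1.
    # [FC] consumes the unit letter, which is dropped from the result.
    # \b (an added assumption) requires the letter to end the word.
    return re.sub(r"(\d+)[FC]\b", r"\1 degrees", text)

forecast = "Clear skies. Low 46F. High 21C. Florida stays warm."
print(spell_out_degrees(forecast))
# Clear skies. Low 46 degrees. High 21 degrees. Florida stays warm.
```

Because the pattern requires at least one digit directly before the letter, the "F" in "Florida" is never touched.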

Related

Python Regex for pattern 2 digits to 2 digits like - 26 to 40

Please help, regex has blown my mind.
I am cleaning data in Pandas dataframe (python 3).
I tried so many combos of regex found on the web for digits but none work for my case. I can't seem to figure out how to write my own regex for pattern 2 digits space to space 2 digits (example 26 to 40).
My challenge is to extract from pandas column BLOOM (scraped data) number of petals. Frequently petals are specified as "dd to dd petals". I know that 2 digits in regex are \d\d or \d{2} but how do I incorporate split by "to"? It will also be good to have a condition that the pattern is followed by word "petals".
Surely I am not the first person that needs regex in python for pattern \d\d to \d\d.
Edit:
I realised that my question without a sample dataframe is a bit confusing. So here is a sample dataframe.
import pandas as pd
import re
# initialize list of lists
data = [['Evert van Dijk', 'Carmine-pink, salmon-pink streaks, stripes, flecks. Warm pink, clear carmine pink, rose pink shaded salmon. Mild fragrance. Large, very double, in small clusters, high-centered bloom form. Blooms in flushes throughout the season.'],
['Every Good Gift', 'Red. Flowers velvety red. Moderate fragrance. Average diameter 4". Medium-large, full (26-40 petals), borne mostly solitary bloom form. Blooms in flushes throughout the season.'],
['Evghenya', 'Orange-pink. 75 petals. Large, very double bloom form. Blooms in flushes throughout the season.'],
['Evita', 'White or white blend. None to mild fragrance. 35 petals. Large, full (26-40 petals), high-centered bloom form. Blooms in flushes throughout the season.'],
['Evrathin', 'Light pink. [Deep pink.] Outer petals white. Expand rarely. Mild fragrance. 35 to 40 petals. Average diameter 2.5". Medium, double (17-25 petals), full (26-40 petals), cluster-flowered, in small clusters bloom form. Prolific, once-blooming spring or summer. Glandular sepals, leafy sepals, long sepals buds.'],
['Evita 2', 'White, blush shading. Mild, wild rose fragrance. 20 to 25 petals. Average diameter 1.25". Small, very double, cluster-flowered bloom form. Blooms in flushes throughout the season.']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['NAME', 'BLOOM'])
# print dataframe.
df
This worked for me:
import re
sample = '2 digits (example 26 to 40 petals) and 16 to 43 petals.'
re.compile(r"\d{2}\sto\s\d{2}\spetals").findall(sample)
Output:
['26 to 40 petals', '16 to 43 petals']
As you have stated, \d{2} finds 2 digit numbers, \sto\s finds the word 'to' surrounded by blank spaces, then \d{2} again for the second 2-digit number, followed by a space (\s) and the word 'petals'.
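If the two numbers are needed separately rather than as one string, a possible variation is to wrap each \d{2} in its own capturing group. This is only a sketch on the same sample string; the variable names are illustrative.

```python
import re

sample = "2 digits (example 26 to 40 petals) and 16 to 43 petals."

# With two groups, findall returns one (low, high) tuple per match.
pairs = re.findall(r"(\d{2})\sto\s(\d{2})\spetals", sample)

# Convert the captured strings to integers for later numeric work.
ranges = [(int(low), int(high)) for low, high in pairs]
print(ranges)
# [(26, 40), (16, 43)]
```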
You can use
df['res_col'] = df['src_col'].str.extract(r'(?<!\d)(\d{2}\s+to\s+\d{2})\s*petal', expand=False)
Details
(?<!\d) - a negative lookbehind making sure there is no digit immediately on the left of the current location
(\d{2}\s+to\s+\d{2}) - Group 1 (the actual return of str.extract):
\d{2} - two digits
\s+to\s+ - 1+ whitespaces, to string, 1+ whitespaces
\d{2} - two digits
\s*petal - 0+ whitespaces followed with petal.
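To see what the (?<!\d) lookbehind contributes, here is a small sketch using the standard re module instead of pandas. The test string is invented: it contains a three-digit number so that the difference shows up.

```python
import re

text = "135 to 40 petals; 35 to 40 petals"

# With the lookbehind, "35 to 40" inside "135 to 40" is rejected
# because a digit sits immediately to its left.
with_guard = re.findall(r"(?<!\d)(\d{2}\s+to\s+\d{2})\s*petal", text)

# Without it, the tail of "135" is accepted as a two-digit number.
without_guard = re.findall(r"(\d{2}\s+to\s+\d{2})\s*petal", text)

print(with_guard)     # ['35 to 40']
print(without_guard)  # ['35 to 40', '35 to 40']
```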
Posting an answer to show how I solved petals data extraction from column BLOOM. I had to use multiple regex to get all data that I wanted. This question was only covering one of the regex I used.
Sample dataframe looks like this when printed:
I created those columns before I ran into the issue that led to this post. My initial approach was to get all the data in the brackets.
# copy the content of the first pair of brackets in column BLOOM into new column PETALS
df['PETALS'] = df['BLOOM'].str.extract('(\\(.*?)\\)', expand=False).str.strip()
# the captured text still starts with "("; remove it as a literal character
df['PETALS'] = df['PETALS'].str.replace("(", "", regex=False)
# copy the content of all brackets in column BLOOM into new column ALL_PETALS_BRACKETS
df['ALL_PETALS_BRACKETS'] = df['BLOOM'].str.findall('(\\(.*?)\\)')
df[['NAME','BLOOM','PETALS', 'ALL_PETALS_BRACKETS']]
I later realised that this approach only gets petal values for some rows. Petals can be specified in column BLOOM in more than one way. Another common pattern is "2 digits to 2 digits". There is also the pattern "2 digits petals.".
# solution provided by Wiktor Stribiżew
df['PETALS_Wiktor_S'] = df['BLOOM'].str.extract(r'(?<!\d)(\d{2}\s+to\s+\d{2})\s*petal', expand=False)
# my modification that worked on the main df and not only on the test one.
# now let's copy the part of column BLOOM that matches the regex pattern two digits to two digits
df['PETALS5'] = df['BLOOM'].str.extract(r'(\d{2}\s+to\s+\d{2})', expand=False).str.strip()
# also came across cases where the pattern is two digits followed by the word "petals"
# now let's copy the part of column BLOOM that matches the regex pattern two digits followed by the word "petals"
df['PETALS6'] = df['BLOOM'].str.extract(r'(\d{2}\s+petals+\.)', expand=False).str.strip()
df
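Since I was after the pattern "2 digits petals." I had to make my regex look for a literal dot by escaping it as \. in r'(\d{2}\s+petals+\.)'. The + after "petals" is not what does this; it only allows the final "s" to repeat. If the regex is written as r'(\d{2}\s+petals.)', the unescaped . matches any character, so it also grabs cases where the word petals is followed by a bracket instead of a period.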
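The difference between an escaped and an unescaped dot can be checked with a short sketch. The test string below is a shortened, made-up BLOOM value, and the patterns omit the extra + after "petals" for clarity.

```python
import re

bloom = "35 to 40 petals. Medium, double (17-25 petals), full."

# \. matches only a literal period after "petals".
escaped = re.findall(r"\d{2}\s+petals\.", bloom)

# A bare . matches any single character, including ")".
unescaped = re.findall(r"\d{2}\s+petals.", bloom)

print(escaped)    # ['40 petals.']
print(unescaped)  # ['40 petals.', '25 petals)']
```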

delete numbers but not if they are part of a string

I want to delete numbers contained in the text with Python. But I don't want to delete the numbers that are part of a string.
For example,
strings = ["There are 55 cars in 45trt avenue"]
In this case 55 should be deleted, but not 45trt, which should remain the same.
Thanks in advance,
You could try searching for numbers which are surrounded by word boundaries:
import re
inp = "There are 55 cars in 45trt avenue"
output = re.sub(r'\s*\b\d+\b\s*', ' ', inp).strip()
print(output)
This prints:
There are cars in 45trt avenue
The logic here is to actually replace with a single space, to ensure that the resulting string is still properly spaced. This opens an edge case for numbers which might happen to appear at the very beginning or end, leaving behind an extra space. To handle that, we trim using strip().
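The edge case with numbers at the very beginning or end can be checked with a small sketch. The helper name and the test sentence are invented for illustration; the pattern is the one from the answer.

```python
import re

def drop_standalone_numbers(text: str) -> str:
    # \b\d+\b matches digits only when they form a whole word, so
    # "45trt" is skipped. Surrounding whitespace collapses to one space,
    # and strip() removes the space left at either end of the string.
    return re.sub(r"\s*\b\d+\b\s*", " ", text).strip()

print(drop_standalone_numbers("55 cars in 45trt avenue 12"))
# cars in 45trt avenue
```

One limitation to be aware of: a decimal such as 3.5 is seen as two separate numbers around a period, so the period is left behind.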

COBOL: How to count all characters after trimming all the spaces before and after Input

STRING FUNCTION TRIMR(EINA01 OF FORMAT1)
DELIMITED BY SIZE
INTO WORTTXT1
END-STRING.
MOVE FUNCTION REVERSE (WORTTXT1) TO WORTTXT2.
STRING FUNCTION TRIMR(WORTTXT2)
DELIMITED BY SIZE
INTO WORTTXT3
END-STRING.
INSPECT WORTTXT3 TALLYING LOO FOR CHARACTERS
BEFORE INITIAL SPACES.
MOVE EINN01 OF FORMAT1 TO X.
MOVE EINN02 OF FORMAT1 TO Y.
MOVE EINA01 OF FORMAT1 (X:Y)
TO AUSA01 OF FORMAT1.
Our problem is that if we exceed the length of the variable EINA01, which is 50, the program crashes.
Our idea was to trim all the spaces from left and right and count all characters of the input given.
The problem we face is that we have no way to count all the characters, since we would usually do it with "Inspect count all characters before initial spaces".
But if we have an input like "Hello World", for example, it would only count everything up to the first space after "Hello".
If you want to get the length of a string, there are a couple of different methods to do this:
METHOD 1
a simple loop:
01 WS-INPUT-STRING PIC X(100) VALUE "12345678901234567890".
01 WS-OUTPUT-STRING PIC X(50).
01 WS-POS PIC 9(4) COMP.
*> Scan backwards for the last non-space character.
*> WS-POS < 1 is tested first so that position 0 is not referenced.
PERFORM VARYING WS-POS
FROM 100 BY -1
UNTIL WS-POS < 1 OR
WS-INPUT-STRING(WS-POS:1)
NOT EQUAL SPACE
END-PERFORM
IF WS-POS > 0 AND WS-POS <= 50
MOVE WS-INPUT-STRING(1:WS-POS) TO WS-OUTPUT-STRING
END-IF
METHOD 2
inspect tallying
01 WS-INPUT-STRING PIC X(100) VALUE "12345678901234567890".
01 WS-OUTPUT-STRING PIC X(50).
01 WS-BLANK-COUNT PIC 9(4) COMP VALUE 0.
01 WS-IN-MAX PIC 9(4) COMP VALUE 100.
*> TALLYING adds to the counter, so it must start at zero.
INSPECT FUNCTION REVERSE (WS-INPUT-STRING)
TALLYING WS-BLANK-COUNT FOR LEADING SPACES
IF (WS-IN-MAX - WS-BLANK-COUNT) <= 50
MOVE WS-INPUT-STRING(1:WS-IN-MAX - WS-BLANK-COUNT)
TO WS-OUTPUT-STRING
END-IF
Both of these are viable options. I prefer the loop myself.
Also remember that leading spaces are typically important. I wouldn't recommend trimming them unless you are 100% sure they are not required.

replace value in a string with different values where the string changes

I have a 5 character code that needs to be converted to a 4 character code. Additionally, the 5th character is either a 1, 2 or 5, and I need to convert them to 1, 5 or 9. As an example, if my query returns '20155', I need to translate that to '2159'. So far I have:
select substr(fieldname,1,1) || substr(fieldname,3,2) || substr(fieldname,-1,1) as newfieldname
This converts it from 5 to 4 characters. What I don't know how to do is also change the last character to the new value as described above.
A sample of what I want it to achieve is:
20141 becomes 2141
20142 becomes 2145
20145 becomes 2149
20151 becomes 2151
20152 becomes 2155
20155 becomes 2159
Any assistance would be appreciated. I am not a computer programmer - I am a functional analyst that has to validate over 500,000 rows of data, each row containing the fieldname above.
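You can use a CASE expression on the last character to map it to the new value:
select substr(fieldname,1,1) || substr(fieldname,3,2) ||
case substr(fieldname,-1,1) when '1' then '1' when '2' then '5' when '5' then '9' end as newfieldname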
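For spot-checking query output outside the database, the conversion described in the question can also be sketched in Python. The function and dictionary names are made up for illustration. The slicing follows the question's worked example, '20155' becoming '2159': first character, then characters 3 and 4, then the remapped last character.

```python
# Mapping of the old fifth character to the new last character.
LAST_DIGIT_MAP = {"1": "1", "2": "5", "5": "9"}

def convert_code(code: str) -> str:
    # Reject anything that is not a 5-character code ending in 1, 2 or 5.
    if len(code) != 5 or code[-1] not in LAST_DIGIT_MAP:
        raise ValueError(f"unexpected code: {code!r}")
    # Keep character 1 and characters 3-4, then append the mapped digit.
    return code[0] + code[2:4] + LAST_DIGIT_MAP[code[-1]]

for old in ["20141", "20142", "20145", "20151", "20152", "20155"]:
    print(old, "becomes", convert_code(old))
```

Raising an error on unexpected input is a design choice for validation work: rows that do not fit the stated rules are surfaced instead of being silently converted.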

Vim multiple filtering of a file, with 2 filters based upon number values

I do not know if that title will sound adequate …
Let us say I have a file (> 1000 lines) with a homogeneous structure throughout consisting of three "fields" separated by a space :
1. an integer (negative or positive)
<space>
2. another integer (negative or positive)
<space>
3. some text (description)
The integers are >-10000 and < 10000
My problem is : how can I
a) filter this file with criteria such as "1st integer <= 1000" AND "2nd integer >=250" AND "text contains : Boston OR New-York"
b) and put the subset in a new buffer, allowing me to read the results and only the results of the filter(s) ?
I wish to do that with Vim only, not knowing if it is feasible or reasonable (anyway it is above my skills)
Thanks
@FDinoff: sorry, I should have done what you suggest, of course:
It could be a chronology with a StartDate, an EndDate, and a Description :
1 -200 -50 Period one in Italy
2 -150 250 Period one in Greece
3 -50 40 Period two in Italy
4 10 10 Some event in Italy
5 20 20 Event two in Greece
The filter could be : Filter the items where (to mimic SQL) StartDate <=-50 AND EndDate >=0 AND Description contains Greece, with a resulting filter => line 2
The following generic form will match the numeric parts of your format:
^\s*-\?\d\+\s\+-\?\d\+
To implement restrictions on the numbers, replace each -\?\d\+ with a more specific pattern. For example, for <= -50:
-\([5-9][0-9]\|[1-9][0-9]\{2,}\)
That is, - followed by either a 2 digit number where the first digit is >= 5, or a >= 3 digit number.
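Similarly, for >= 250:
\(2[5-9][0-9]\|[3-9][0-9]\{2}\|[1-9][0-9]\{3,}\)
That is, 250 to 299, or a 3 digit number where the first digit is >= 3, or any number with 4 or more digits. (A single [3-9][0-9]\{2,} alternative would miss 1000 to 2999.)
Combining the two:
^\s*-\([5-9][0-9]\|[1-9][0-9]\{2,}\)\s\+\(2[5-9][0-9]\|[3-9][0-9]\{2}\|[1-9][0-9]\{3,}\)
If you also need to filter by some pattern in the description, append that:
^\s*-\([5-9][0-9]\|[1-9][0-9]\{2,}\)\s\+\(2[5-9][0-9]\|[3-9][0-9]\{2}\|[1-9][0-9]\{3,}\)\s\+.\{-}Greece
.\{-} is the lazy version of .*.
To filter by this pattern and append each matching line to a file, use the following. The >> appends instead of overwriting, and the ! allows the file to be created if it does not exist yet:
:g/pattern/.w! >> filename
Thus, to filter by "first number <= -50 AND second number >= 250 AND 'Greece' in description" and write the output to greece.out (start with an empty or nonexistent greece.out, since lines are appended):
:g/^\s*-\([5-9][0-9]\|[1-9][0-9]\{2,}\)\s\+\(2[5-9][0-9]\|[3-9][0-9]\{2}\|[1-9][0-9]\{3,}\)\s\+.\{-}Greece/.w! >> greece.out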
More complex ranges quickly make this even more ridiculous; you're probably better off parsing the file and filtering with something other than regex.
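As a rough sketch of that alternative, the same filter can be written in Python by parsing the two integers and comparing them numerically. The function name, parameter names and sample lines are assumptions for illustration. The sample follows the chronology in the question, without the leading item numbers, and uses the question's own criteria (StartDate <= -50, EndDate >= 0, description contains Greece).

```python
def filter_lines(lines, max_start, min_end, keywords):
    """Keep lines whose first integer <= max_start, second integer >= min_end,
    and whose description contains at least one of the keywords."""
    matches = []
    for line in lines:
        # Split into at most three fields: start, end, description.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            start, end = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # skip lines that do not begin with two integers
        description = parts[2]
        if (start <= max_start and end >= min_end
                and any(word in description for word in keywords)):
            matches.append(line.rstrip("\n"))
    return matches

chronology = [
    "-200 -50 Period one in Italy",
    "-150 250 Period one in Greece",
    "-50 40 Period two in Italy",
    "10 10 Some event in Italy",
    "20 20 Event two in Greece",
]
result = filter_lines(chronology, max_start=-50, min_end=0, keywords=["Greece"])
print(result)
# ['-150 250 Period one in Greece']
```

Changing a threshold here means changing one number, whereas the regex approach needs a new digit pattern for every bound.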
