Java How to Extract a Double from a String from a text File - java.util.scanner

My Java assignment is to to write a Java application that uses data from a text file with earth quake information and generate a report with:
the count of the number of values read,
the total sum of the magnitudes,
the average magnitude value (to 2 decimal places),
the maximum magnitude value, where it happened and when
the minimum magnitude value, where it happened and when
I can get the file to open using File and Scanner classes but I can't get it to read extract only a double (i.e the magnitudes below: 1.6, 1.8, ect.). I can get it to count the number of lines. I've tried deliminators with the Scanner and Pattern classes and then using Double parse but it won;t work. I'm a rookie for sure and this class just took a leap.
Example of input from text file:
1.6,"Southern California","Wednesday, January 18, 2012 19:19:12 UTC"
1.8,"Southern California","Wednesday, January 18, 2012 19:03:00 UTC"
1.8,"Southern California","Wednesday, January 18, 2012 18:46:53 UTC"
4.7,"Bonin Islands, Japan region","Wednesday, January 18, 2012 18:20:40 UTC"
1.6,"Southern California","Wednesday, January 18, 2012 17:58:07 UTC"

Try this
String str = "1.6,california,wednesday,whatever";
String[] results = str.split(",");
Double testDouble = Double.valueOf(str[0]);

Related

How to convert a string into datetime with format directives from another Timezone with Python 3

I have function in my Django application that pulls informations from SSH. I get a raw date string like the following (French language):
'mar. 27 mars 2012 17:51:42 CEST'
My issue is that I need to convert this string to a datetime object before passing it to update one of my model, but I get an error.
ssh_output = 'mar. 27 mars 2012 17:51:42 CEST'
my_datetime = datetime.datetime.strptime(ssh_output,"%a. %d %B %Y %H:%M:%S CEST")
Above code throws an exception:
time data 'mar. 27 mars 2012 17:51:42 CEST' does not match format '%a.
%d %B %Y %H:%M:%S CEST'
I'm confused because locale.getlocale() outputs: "
('fr_FR', 'UTF-8')
But testing datetime format with datetime.datetime.now().strftime("%a %d %B %Y %H:%M:%S %Z") outputs (English language):
Wed 08 July 2020 15:46:11
Also, all date format directives seems correct (plus literal strings '.' and 'CEST'):
%a : Weekday as locale’s abbreviated name.
%d : Day of the month as a zero-padded decimal number.
%B : Month as locale’s full name.
%Y : Year with century as a decimal number.
%H : Hour (24-hour clock) as a zero-padded decimal number.
%M : Minute as a zero-padded decimal number.
%S : Second as a zero-padded decimal number.
So it seems that raw datetime string grabbed from SSH output 'mar. 27 mars 2012 17:51:42 CEST' can't be parsed/converted because it does not match current timezone...
How can I convert this raw datetime string into datetime in this case?
To parse CEST to the correct timezone, you could split the ssh output into the datetime part and the timezone part. After that you can apply a mapping dict:
import datetime
import dateutil
import locale
locale.setlocale(locale.LC_ALL, 'fr_FR')
ssh_output = 'mar. 27 mars 2012 17:51:42 CEST'
# assuming timezone abbreviation separated by space
parts = ssh_output.split(' ')
timestr, tzstr = ' '.join(parts[:-1]), parts[-1]
# create a custom mapping dict
tzmapping = {'CEST': dateutil.tz.gettz('Europe/Paris')}
my_datetime = datetime.datetime.strptime(timestr,"%a %d %B %Y %H:%M:%S")
my_datetime = my_datetime.replace(tzinfo=tzmapping[tzstr])
# datetime.datetime(2012, 3, 27, 17, 51, 42, tzinfo=tzfile('Europe/Paris'))

python3: Split time series by diurnal periods

I have the following dataset:
01/05/2020,00,26.3,27.5,26.3,80,81,73,22.5,22.7,22.0,993.7,993.7,993.0,0.0,178,1.2,-3.53,0.0
01/05/2020,01,26.1,26.8,26.1,79,80,75,22.2,22.4,21.9,994.4,994.4,993.7,1.1,22,2.0,-3.54,0.0
01/05/2020,02,25.4,26.1,25.4,80,81,79,21.6,22.3,21.6,994.7,994.7,994.4,0.1,335,2.3,-3.54,0.0
01/05/2020,03,23.3,25.4,23.3,90,90,80,21.6,21.8,21.5,994.7,994.8,994.6,0.9,263,1.5,-3.54,0.0
01/05/2020,04,22.9,24.2,22.9,89,90,86,21.0,22.1,21.0,994.2,994.7,994.2,0.3,268,2.0,-3.54,0.0
01/05/2020,05,22.8,23.1,22.8,90,91,89,21.0,21.4,20.9,993.6,994.2,993.6,0.7,264,1.5,-3.54,0.0
01/05/2020,06,22.2,22.8,22.2,92,92,90,20.9,21.2,20.8,993.6,993.6,993.4,0.8,272,1.6,-3.54,0.0
01/05/2020,07,22.6,22.6,22.0,91,93,91,21.0,21.2,20.7,993.4,993.6,993.4,0.4,284,2.3,-3.49,0.0
01/05/2020,08,21.6,22.6,21.5,92,92,90,20.2,20.9,20.1,993.8,993.8,993.4,0.4,197,2.1,-3.54,0.0
01/05/2020,09,22.0,22.1,21.5,92,93,92,20.7,20.8,20.2,994.3,994.3,993.7,0.0,125,2.1,-3.53,0.0
01/05/2020,10,22.7,22.7,21.9,91,92,91,21.2,21.2,20.5,995.0,995.0,994.3,0.0,354,0.0,70.99,0.0
01/05/2020,11,25.0,25.0,22.7,83,91,82,21.8,22.1,21.1,995.5,995.5,995.0,0.8,262,1.5,744.8,0.0
01/05/2020,12,27.9,28.1,24.9,72,83,70,22.3,22.8,21.6,996.1,996.1,995.5,0.7,228,1.9,1392.,0.0
01/05/2020,13,30.4,30.4,27.7,58,72,55,21.1,22.6,20.4,995.9,996.2,995.9,1.6,134,3.7,1910.,0.0
01/05/2020,14,31.7,32.3,30.1,50,58,48,20.2,21.3,19.7,995.8,996.1,995.8,3.0,114,5.4,2577.,0.0
01/05/2020,15,32.9,33.2,31.8,44,50,43,19.1,20.5,18.6,994.9,995.8,994.9,0.0,128,5.6,2853.,0.0
01/05/2020,16,33.2,34.4,32.0,46,48,41,20.0,20.0,18.2,994.0,994.9,994.0,0.0,125,4.3,2700.,0.0
01/05/2020,17,33.1,34.5,32.7,44,46,39,19.2,19.9,18.5,993.4,994.1,993.4,0.0,170,1.6,2806.,0.0
01/05/2020,18,33.6,34.2,32.6,41,47,40,18.5,20.0,18.3,992.6,993.4,992.6,0.0,149,0.0,2319.,0.0
01/05/2020,19,33.5,34.7,32.1,43,49,39,19.2,20.4,18.3,992.3,992.6,992.3,0.3,168,4.1,1907.,0.0
01/05/2020,20,32.1,33.9,32.1,49,51,41,20.2,20.7,18.5,992.4,992.4,992.3,0.1,192,3.7,1203.,0.0
01/05/2020,21,29.9,32.2,29.9,62,62,49,21.8,21.9,20.2,992.3,992.4,992.2,0.0,188,2.9,408.0,0.0
01/05/2020,22,28.5,29.9,28.4,67,67,62,21.8,22.0,21.7,992.5,992.5,992.3,0.4,181,2.3,6.817,0.0
01/05/2020,23,27.8,28.5,27.8,71,71,66,22.1,22.1,21.5,993.1,993.1,992.5,0.0,225,1.6,-3.39,0.0
02/05/2020,00,27.4,28.2,27.3,75,75,68,22.5,22.5,21.7,993.7,993.7,993.1,0.5,139,1.5,-3.54,0.0
02/05/2020,01,27.3,27.7,27.3,72,75,72,21.9,22.6,21.9,994.3,994.3,993.7,0.0,126,1.1,-3.54,0.0
02/05/2020,02,25.4,27.3,25.2,85,85,72,22.6,22.8,21.9,994.4,994.5,994.3,0.1,256,2.6,-3.54,0.0
02/05/2020,03,25.5,25.6,25.3,84,85,82,22.5,22.7,22.1,994.3,994.4,994.2,0.0,329,0.7,-3.54,0.0
02/05/2020,04,24.5,25.5,24.5,86,86,82,22.0,22.5,21.9,993.9,994.3,993.9,0.0,290,1.2,-3.54,0.0
02/05/2020,05,24.0,24.5,23.5,87,88,86,21.6,22.1,21.3,993.6,993.9,993.6,0.7,285,1.3,-3.54,0.0
02/05/2020,06,23.7,24.1,23.7,87,87,85,21.3,21.6,21.3,993.1,993.6,993.1,0.1,305,1.1,-3.51,0.0
02/05/2020,07,22.7,24.1,22.5,91,91,86,21.0,21.7,20.7,993.1,993.3,993.1,0.6,220,1.1,-3.54,0.0
02/05/2020,08,22.9,22.9,22.6,92,92,91,21.5,21.5,21.0,993.2,993.2,987.6,0.0,239,1.5,-3.53,0.0
02/05/2020,09,22.9,23.0,22.8,93,93,92,21.7,21.7,21.4,993.6,993.6,993.2,0.0,289,0.4,-3.53,0.0
02/05/2020,10,23.5,23.5,22.8,92,93,92,22.1,22.1,21.6,994.3,994.3,993.6,0.0,256,0.0,91.75,0.0
02/05/2020,11,26.1,26.2,23.5,80,92,80,22.4,23.1,22.2,995.0,995.0,994.3,1.1,141,1.9,789.0,0.0
02/05/2020,12,28.7,28.7,26.1,69,80,68,22.4,22.7,22.1,995.5,995.5,995.0,0.0,116,2.2,1468.,0.0
02/05/2020,13,31.4,31.4,28.6,56,69,56,21.6,22.9,21.0,995.5,995.7,995.4,0.0,65,0.0,1762.,0.0
02/05/2020,14,32.1,32.4,30.6,48,58,47,19.8,22.0,19.3,995.0,995.6,990.6,0.0,105,0.0,2657.,0.0
02/05/2020,15,34.0,34.2,31.7,43,48,42,19.6,20.1,18.6,993.9,995.0,993.9,3.0,71,6.0,2846.,0.0
02/05/2020,16,34.7,34.7,32.3,38,48,38,18.4,20.3,18.3,992.7,993.9,992.7,1.4,63,6.3,2959.,0.0
02/05/2020,17,34.0,34.7,32.7,42,46,38,19.2,20.0,18.4,991.7,992.7,991.7,2.2,103,4.8,2493.,0.0
02/05/2020,18,34.3,34.7,33.6,41,42,38,19.1,19.4,18.0,991.2,991.7,991.2,2.0,141,4.8,2593.,0.0
02/05/2020,19,33.5,34.5,32.5,42,47,39,18.7,20.0,18.4,990.7,991.4,989.9,1.8,132,4.2,1317.,0.0
02/05/2020,20,32.5,34.2,32.5,47,48,40,19.7,20.3,18.7,990.5,990.7,989.8,1.3,191,4.2,1250.,0.0
02/05/2020,21,30.5,32.5,30.5,59,59,47,21.5,21.6,20.0,979.8,990.5,979.5,0.1,157,2.9,345.5,0.0
02/05/2020,22,28.6,30.5,28.6,67,67,59,21.9,21.9,21.5,978.9,980.1,978.7,0.6,166,2.2,1.122,0.0
02/05/2020,23,27.2,28.7,27.2,74,74,66,22.1,22.2,21.6,978.9,979.3,978.6,0.0,246,1.7,-3.54,0.0
03/05/2020,00,26.5,27.2,26.0,77,80,74,22.2,22.5,22.0,979.0,979.1,978.7,0.0,179,1.4,-3.54,0.0
03/05/2020,01,26.0,26.6,26.0,80,80,77,22.4,22.5,22.1,979.1,992.4,978.7,0.0,276,0.6,-3.54,0.0
03/05/2020,02,26.0,26.5,26.0,79,81,75,22.1,22.5,21.7,978.8,979.1,978.5,0.0,290,0.6,-3.53,0.0
03/05/2020,03,25.3,26.0,25.3,83,83,79,22.2,22.4,21.8,978.6,989.4,978.5,0.5,303,1.0,-3.54,0.0
03/05/2020,04,25.3,25.6,24.6,81,85,81,21.9,22.5,21.7,978.1,992.7,977.9,0.7,288,1.5,-3.00,0.0
03/05/2020,05,23.7,25.3,23.7,88,88,81,21.5,21.9,21.5,977.6,991.8,977.3,1.2,256,1.8,-3.54,0.0
03/05/2020,06,23.3,23.7,23.3,91,91,88,21.7,21.7,21.5,976.9,977.6,976.7,0.4,245,1.8,-3.54,0.0
03/05/2020,07,23.0,23.6,23.0,91,91,89,21.4,21.9,21.3,976.7,977.0,976.4,0.9,257,1.9,-3.54,0.0
03/05/2020,08,23.4,23.4,22.9,90,92,90,21.7,21.7,21.3,976.8,976.9,976.5,0.4,294,1.6,-3.52,0.0
03/05/2020,09,23.0,23.5,23.0,88,90,87,21.0,21.6,20.9,992.1,992.1,976.7,0.8,263,1.6,-3.54,0.0
03/05/2020,10,23.2,23.2,22.5,91,92,88,21.6,21.6,20.8,993.0,993.0,992.2,0.1,226,1.5,29.03,0.0
03/05/2020,11,26.0,26.1,23.2,77,91,76,21.6,22.1,21.5,993.8,993.8,982.1,0.0,120,0.9,458.1,0.0
03/05/2020,12,26.6,27.0,25.5,76,80,76,22.1,22.5,21.4,982.7,994.3,982.6,0.3,121,2.3,765.3,0.0
03/05/2020,13,28.5,28.7,26.6,66,77,65,21.5,23.1,21.2,982.5,994.2,982.4,1.4,130,3.2,1219.,0.0
03/05/2020,14,31.1,31.1,28.5,55,66,53,21.0,21.8,19.9,982.3,982.7,982.1,1.2,129,3.7,1743.,0.0
03/05/2020,15,31.6,31.8,30.7,50,55,49,19.8,20.8,19.2,992.9,993.5,982.2,1.1,119,5.1,1958.,0.0
03/05/2020,16,32.7,32.8,31.1,46,52,46,19.6,20.7,19.2,991.9,992.9,991.9,0.8,122,4.4,1953.,0.0
03/05/2020,17,32.3,33.3,32.0,44,49,42,18.6,20.2,18.2,990.7,991.9,979.0,2.6,133,5.9,2463.,0.0
03/05/2020,18,33.1,33.3,31.9,44,50,44,19.3,20.8,18.9,989.9,990.7,989.9,1.1,170,5.4,2033.,0.0
03/05/2020,19,32.4,33.2,32.2,47,47,44,19.7,20.0,18.7,989.5,989.9,989.5,2.4,152,5.2,1581.,0.0
03/05/2020,20,31.2,32.5,31.2,53,53,46,20.6,20.7,19.4,989.5,989.7,989.5,1.7,159,4.6,968.6,0.0
03/05/2020,21,29.7,32.0,29.7,62,62,51,21.8,21.8,20.5,989.7,989.7,989.4,0.8,154,4.0,414.2,0.0
03/05/2020,22,28.3,29.7,28.3,69,69,62,22.1,22.1,21.7,989.9,989.9,989.7,0.3,174,2.0,6.459,0.0
03/05/2020,23,26.9,28.5,26.9,75,75,67,22.1,22.5,21.7,990.5,990.5,989.8,0.2,183,1.0,-3.54,0.0
The second column is time (hour). I want to separate the dataset by morning (06-11), afternoon (12-17), evening (18-23) and night (00-05). How I can do it?
You can use pd.cut:
bins = [-1,5,11,17,24]
labels = ['morning', 'afternoon', 'evening', 'night']
df['day_part'] = pd.cut(df['hour'], bins=bins, labels=labels)
I added column names, including Hour for the second column.
Then I used read_csv which reads the source text, "dropping" leading
zeroes, so that Hour column is just int.
To split rows (add a column marking the diurnal period), use:
df['period'] = pd.cut(df.Hour, bins=[0, 6, 12, 18, 24], right=False,
labels=['night', 'morning', 'afternoon', 'evening'])
Then you can e.g. use groupby to process your groups.
Because I used right=False parameter, the bins are closed on the left
side, thus bin limits are more natural (no need for -1 as an hour).
And bin limits (except for the last) are just starting hours of
each period - quite natural notation.

How to reconstruct original text from spaCy tokens, even in cases with complicated whitespacing and punctuation

' '.join(token_list) does not reconstruct the original text in cases with multiple whitespaces and punctuation in a row.
For example:
from spacy.tokenizer import Tokenizer
from spacy.lang.en import English
nlp = English()
# Create a blank Tokenizer with just the English vocab
tokenizerSpaCy = Tokenizer(nlp.vocab)
context_text = 'this is a test \n \n \t\t test for \n testing - ./l \t'
contextSpaCyToksSpaCyObj = tokenizerSpaCy(context_text)
spaCy_toks = [i.text for i in contextSpaCyToksSpaCyObj]
reconstruct = ' '.join(spaCy_toks)
reconstruct == context_text
>False
Is there an established way of reconstructing original text from spaCy tokens?
Established answer should work with this edge case text (you can directly get the source from clicking the 'improve this question' button)
" UNCLASSIFIED U.S. Department of State Case No. F-2014-20439 Doc No. C05795279 Date: 01/07/2016\n\n\n RELEASE IN PART\n B5, B6\n\n\n\n\nFrom: H <hrod17#clintonemail.com>\nSent: Monday, July 23, 2012 7:26 AM\nTo: 'millscd #state.gov'\nCc: 'DanielJJ#state.gov.; 'hanleymr#state.gov'\nSubject Re: S speech this morning\n\n\n\n Waiting to hear if Monica can come by and pick up at 8 to take to Josh. If I don't hear from her, can you send B5\nsomeone else?\n\n Original Message ----\nFrom: Mills, Cheryl D [MillsCD#state.gov]\nSent: Monday, July 23, 2012 07:23 AM\nTo: H\nCc: Daniel, Joshua J <Daniel1.1#state.gov>\nSubject: FW: S speech this morning\n\nSee below\n\n B5\n\ncdm\n\n Original Message\nFrom: Shah, Rajiv (AID/A) B6\nSent: Monday, July 23, 2012 7:19 AM\nTo: Mills, Cheryl D\nCc: Daniel, Joshua.'\nSubject: S speech this morning\n\nHi cheryl,\n\nI look fwd to attending the speech this morning.\n\nI had one last minute request - I understand that in the final version there is no reference to the child survival call to\naction, but their is a reference to family planning efforts. Could you and josh try to make sure there is some specific\nreference to the call to action?\n\nAlso, in terms of acknowledgements it would be good to note torn friedan's leadership as everyone is sensitive to our ghi\ntransition and we want to continue to send the usaid-pepfar-cdc working together public message. I don't know if he is\nthere, but wanted to flag.\n\nLook forward to it.\n\nRaj\n\n\n\n\n UNCLASSIFIED U.S. Department of State Case No. F-2014-20439 Doc No. C05795279 Date: 01/07/2016\n\x0c"
You can very easily accomplish this by changing two lines in your code:
spaCy_toks = [i.text + i.whitespace_ for i in contextSpaCyToksSpaCyObj]
reconstruct = ''.join(spaCy_toks)
Basically, each token in spaCy knows whether it is followed by whitespace or not. So you call token.whitespace_ instead of joining them on space by default.

How to convert Azure date into friendly date?

In azure I am using an api, and I get back this in the json response.
Date(1533024552000)
Does anyone know how to convert that into a regular date like July 2 2018 for example?
Thanks
You can use UnixDateTimeConverter class. Converts a DateTime object to and from JSON. DateTime is represented as the total number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds (in ISO 8601: 1970-01-01T00:00:00Z).
public class AzureResponse
{
[JsonConverter(typeof(UnixDateTimeConverter))]
public DateTime Date;
}
static void Main(string[] args)
{
AzureResponse input = new AzureResponse() { Date = new DateTime(2018,7,31,10,09,12)};
string output = JsonConvert.SerializeObject(input);
// "{\"Date\":1533031752}"
AzureResponse readBack = JsonConvert.DeserializeObject<AzureResponse>(output);
// Date = {31.07.2018 10:09:12}
}
Epoch, also known as Unix timestamps, is the number of seconds (not milliseconds!) that have elapsed since January 1, 1970 at 00:00:00 GMT (1970-01-01 00:00:00 GMT). https://www.freeformatter.com/epoch-timestamp-to-date-converter.html

Format the time from moment js

When I am trying to receive mail from gmail, I get time in this format (Mon, 12 Jun 2017 10:29:07 +0530). I want to calculate time minus current time. How to do so?
var testDate = moment("Mon, 12 Jun 2017 10:29:07 +0530");
//Relative to time in human readble format
testDate.fromNow(); //3 days ago
You can simply call the moment constructor with the given Date format. moment.js is smart enough to parse it for you. To get the difference you can convert it into unix based time format and subtract it.
const givenTime = moment("Mon, 12 Jun 2017 10:29:07 +0530").unix()
const currentTime = moment().unix()
//Difference in milliseconds
const diff = givenTime - currentTime

Resources