Mutate: character string is not in a standard unambiguous format - string

I have a column titled started_at, which is formatted like this: 4/12/2021 18:25. When I try to run my code, I get the following error:
Error in mutate():
! Problem while computing day_of_week = wday(start_time, label = TRUE).
Caused by error in as.POSIXlt.character():
! character string is not in a standard unambiguous format
This is the code I am trying to run:
library(dplyr)      # %>%, mutate(), group_by(), summarise(), arrange()
library(lubridate)  # wday()
cyclistic_trips_merge_v2 %>%
  mutate(day_of_week = wday(start_time, label = TRUE)) %>%  # creates weekday field using wday()
  group_by(usertype, day_of_week) %>%                        # groups by usertype and weekday
  summarise(number_of_rides = n(),                           # calculates the number of rides
            average_duration = mean(ride_length)) %>%        # calculates the average duration
  arrange(usertype, day_of_week)
I am new to R, and this is a capstone project. I usually get unstuck by searching the web, but right now I am stumped. The date/time column shown above is stored as a character string. I believe that is the problem, but what type do I need to convert it to, and how do I do that? Can anyone please help? Losing my mind.
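The general fix for this error is to parse the string with an explicit format instead of letting the date parser guess. As a language-neutral illustration (a Python sketch, not R code), the snippet below parses one value and derives the weekday. It assumes month/day/year order with a 24-hour clock, which fits US-style trip data but is an assumption: "4/12/2021" could also mean 4 December. In R, lubridate's mdy_hm() performs the equivalent month-day-year parse, and its result can then be passed to wday().

```python
from datetime import datetime

# Fixed English labels so the output does not depend on the system locale
DAY_LABELS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

def day_of_week(started_at: str) -> str:
    # Assumption: month/day/year with a 24-hour clock, e.g. "4/12/2021 18:25".
    # strptime accepts month and day values without leading zeros.
    parsed = datetime.strptime(started_at, "%m/%d/%Y %H:%M")
    return DAY_LABELS[parsed.weekday()]  # weekday(): Monday == 0

print(day_of_week("4/12/2021 18:25"))  # Mon
```

The key point carries over to any language: once the format is stated explicitly, the parser no longer has to decide whether the first number is the month or the day.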

Related

Altair's selection and transform_filter via binding_range slider for datetime values doesn't seem to work with equality condition or selector itself

I want to bind a range slider with datetime values to filter a chart down to a single date. Using the stocks data, I want the x-axis to show the companies and the y-axis to show the stock price on the day the user selects with the slider.
Based on inputs from this answer and this issue, I have the following code, which shows something when the slider is moved past one particular value (with an inequality condition in transform_filter), but is empty for the rest.
What is peculiar is that with an inequality operator it at least shows something, but everything is empty when it's ==.
import altair as alt
import pandas as pd
from vega_datasets import data
source = data.stocks()
def timestamp(t):
    return pd.to_datetime(t).timestamp()
slider = alt.binding_range(step=86400, min=timestamp(min(source['date'])), max=timestamp(max(source['date']))) # 86400 is the number of seconds between consecutive days
select_date = alt.selection_single(fields=['date'], bind=slider, init={'date': timestamp(min(source['date']))})
alt.Chart(source).mark_bar().encode(
x='symbol',
y='price',
).add_selection(select_date).transform_filter(alt.datum.date == select_date.date)
Since the output is empty, I am inclined to conclude that transform_filter is causing the issue, but I have been at it for more than 6 hours now and have tried every combination of alt.expr.toDate and other conversions, and I cannot get it to work.
I also tried just transform_filter(select_date.date) and transform_filter(date), along with other things, but nothing quite works.
The expected output is that the heights of the bars change (because the data is filtered by date) as the user drags the slider.
Any help would be really appreciated.
There are several issues here:
In Vega-Lite, timestamps are expressed in milliseconds, not seconds.
You are filtering on equality between a numerical timestamp and a string representation of a date.
Even if you parse the date in the filter expression, Python date parsing and JavaScript date parsing behave differently, and the results will generally not match. Even within JavaScript, date parsing behavior can vary from browser to browser; all this means that filtering on equality of a Python timestamp and a JavaScript timestamp is generally problematic.
The data you are using has monthly timestamps, so the slider step should account for this.
Keeping all that in mind, the best course would probably be to adjust the slider values and filter on matching year and month, rather than trying to achieve equality in the exact timestamp. The result looks like this:
import altair as alt
from vega_datasets import data
import pandas as pd
source = data.stocks()
def timestamp(t):
    return pd.to_datetime(t).timestamp() * 1000
slider = alt.binding_range(
step=30 * 24 * 60 * 60 * 1000, # 30 days in milliseconds
min=timestamp(min(source['date'])),
max=timestamp(max(source['date'])))
select_date = alt.selection_single(
fields=['date'],
bind=slider,
init={'date': timestamp(min(source['date']))},
name='slider')
alt.Chart(source).mark_bar().encode(
x='symbol',
y='price',
).add_selection(select_date).transform_filter(
"(year(datum.date) == year(slider.date[0])) && "
"(month(datum.date) == month(slider.date[0]))"
)
You can view the result here: vega editor.
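The two ideas in this answer (use epoch milliseconds, and match on year and month rather than exact equality) can be checked outside the browser with a small standard-library Python sketch. It works in UTC throughout, which is an assumption: Vega's year() and month() expression functions use local time, while utcyear() and utcmonth() use UTC. The function names here are illustrative, not part of Altair.

```python
from datetime import datetime, timezone

def to_vega_ms(dt: datetime) -> float:
    # datetime.timestamp() returns epoch seconds; Vega-Lite expects epoch milliseconds
    return dt.timestamp() * 1000

def same_year_month(a_ms: float, b_ms: float) -> bool:
    # Compare only the calendar year and month (in UTC), like the answer's filter
    a = datetime.fromtimestamp(a_ms / 1000, tz=timezone.utc)
    b = datetime.fromtimestamp(b_ms / 1000, tz=timezone.utc)
    return (a.year, a.month) == (b.year, b.month)

data_point = to_vega_ms(datetime(2000, 3, 1, tzinfo=timezone.utc))     # a monthly data row
slider_value = to_vega_ms(datetime(2000, 3, 15, tzinfo=timezone.utc))  # wherever the slider lands

print(data_point == slider_value)                 # False: exact equality misses
print(same_year_month(data_point, slider_value))  # True: same month still matches
```

Because the slider steps by a fixed number of days while the data is monthly, the slider value will rarely hit a data timestamp exactly; comparing coarser calendar fields is what makes the filter tolerant of that.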

Query date range and product size from xlsx file

I'm using Python 3.6 to do this. Below are a few of the columns I want to query.
Auto-Gen Index : Product Container : Ship Date :.......
0 : Large Box : 2017-01-09:.......
1 : Large Box : 2012-07-15:.......
2 : Small Box : 2012-07-18:.......
3 : Large Box : 2012-07-31:.......
I would like to select the rows whose product container is Large Box and whose ship date falls in July 2012.
import pandas as pd
file_name = r'Sample-Superstore-Subset-Excel.xlsx'
df = pd.read_excel(file_name, sheet_name=my_sheet)  # my_sheet holds the worksheet name (not shown)
lb = df.loc[df['Product Container'] == 'Large Box']  # get large boxes
july = lb[(lb['Ship Date'] >= '2012-07-01') & (lb['Ship Date'] < '2012-08-01')]  # shipped in July 2012
How can I do the same thing with query() or a where condition (DataFrame.query())?
If your question is when to use loc vs where, see my answer here:
Think of loc as a filter: give me only the parts of the df that conform to a condition.
where originally comes from NumPy. It runs over an array and checks whether each element fits a condition, so it gives you back the entire array, with either the original value or NaN. A nice feature of where is that you can also get back something different, e.g. df2 = df.where(df['Goals'] > 10, other=0), to replace values that don't meet the condition with 0.
If you are asking when to use query, AFAIK there is no real reason to do so besides performance: on a very large dataset, query is expected to be faster. More on high-level performance here.
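For the original question, a minimal DataFrame.query() sketch follows. It rebuilds the four sample rows shown in the question instead of reading the Excel file, and it assumes "Ship Date" ends up as a datetime column (pd.read_excel usually does this for Excel date cells, but it is worth checking the dtype). Column names containing spaces are wrapped in backticks inside the query string, and @name refers to a local Python variable.

```python
import pandas as pd

# Sample rows copied from the question (other columns omitted)
df = pd.DataFrame({
    "Product Container": ["Large Box", "Large Box", "Small Box", "Large Box"],
    "Ship Date": pd.to_datetime(["2017-01-09", "2012-07-15", "2012-07-18", "2012-07-31"]),
})

start = pd.Timestamp("2012-07-01")
end = pd.Timestamp("2012-08-01")  # exclusive upper bound, so July 31 is included

july_large = df.query(
    "`Product Container` == 'Large Box' and `Ship Date` >= @start and `Ship Date` < @end"
)

# The same selection written as a boolean mask with .loc
mask = (df["Product Container"] == "Large Box") & (df["Ship Date"] >= start) & (df["Ship Date"] < end)

print(july_large)  # rows 1 and 3
```

Using the first day of the next month as an exclusive bound avoids dropping shipments dated on the last day of July, which a strict < '2012-07-31' comparison would miss.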

How do I subtract two arrays of cells in Matlab

I am trying to get some variables and numbers out of an Excel table using Matlab.
The variables below named "diffZ_trial1" to "diffZ_trial4" should be calculated as the difference between two columns (17 and 11), over the rows between "start" and "finish". However, I get the error:
Undefined operator '-' for input arguments of type 'cell'.
I have read that it could be related to the fact that I get {} output instead of [], and that maybe I need to use cell2mat or convert the output somehow. But I must have done that wrongly, as it did not work!
Question: How can I calculate the difference between two columns below?
clear all, close all
[num,txt,raw] = xlsread('test.xlsx');
start = find(strcmp(raw,'HNO'));
finish = find(strcmp(raw,'End Trial: '));
%%% TIMELINE EACH TRIAL
time_trial1 = raw(start(1):finish(1),8);
time_trial2 = raw(start(2):finish(2),8);
time_trial3 = raw(start(3):finish(3),8);
time_trial4 = raw(start(4):finish(4),8);
%%%MOVEMENT EACH TRIAL
diffZ_trial1 = raw(start(1):finish(1),17)-raw(start(1):finish(1),11);
diffZ_trial2 = raw(start(2):finish(2),17)-raw(start(2):finish(2),11);
diffZ_trial3 = raw(start(3):finish(3),17)-raw(start(3):finish(3),11);
diffZ_trial4 = raw(start(4):finish(4),17)-raw(start(4):finish(4),11);
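You are right, raw contains data of all types, including text (http://uk.mathworks.com/help/matlab/ref/xlsread.html#outputarg_raw). You should use num, which is a numeric matrix. Note that num drops leading rows and columns of text, so row and column indices found by searching raw may not line up with num.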
Alternatively, if you have an updated version of Matlab, you can try readtable (https://uk.mathworks.com/help/matlab/ref/readtable.html), which I think is more flexible. It creates a table from an excel file, containing both text and numbers.

passing Jan of selected year by default in prompt

I have 2 year-month prompts. If I don't select any year-month in the 1st prompt, the report should by default run from January of the year selected in the 2nd prompt. My prompts are value prompts and have string values. Please help me implement this requirement. I have already tried # prompt macros, ?prompt?, CASE WHEN, etc. I am not sure if JavaScript would help.
I'm going to assume your underlying date fields are not stored as DATE value types, since you're using strings. This may be easier if split into 4 prompts: from month, from year, to month, to year.
The filter would then be an implied if:
(
(?FROM_YEAR? = '' or ?FROM_MONTH? = '') and
[database_from_month] = '01' and
[database_from_year] = ?TO_YEAR? and
[database_to_month] = ?TO_MONTH? and
[database_to_year] = ?TO_YEAR?
)
OR
(
(?FROM_YEAR? <> '' and ?FROM_MONTH? <> '') and
[database_from_month] = ?FROM_MONTH? and
[database_from_year] = ?FROM_YEAR? and
[database_to_month] = ?TO_MONTH? and
[database_to_year] = ?TO_YEAR?
)
The above style filter is superior for many reasons:
More likely to be sargeable
Easy to understand
Uses simple built-in Cognos functions; more likely to be cross-version compliant
No issues with cross-browser support you would get with Javascript
Code snippet would work in other Cognos studios (Business Insight, etc.)
You've likely seen that CASE statements in filters throw an error. The CASE statement is passed to SQL as-is instead of being compiled into a SQL statement by Cognos, so it is not seen as valid syntax.
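To make the two branches of the filter concrete, here is the same defaulting rule written as a plain Python function. This is only a sketch of the logic: Cognos does not run Python, and the parameter names are hypothetical stand-ins for the four prompts. For any combination of inputs exactly one branch applies, which is the property the Cognos filter's two OR-ed blocks need as well.

```python
def effective_range(from_year: str, from_month: str, to_year: str, to_month: str):
    # Branch 1: either "from" part left blank -> start in January ('01') of the "to" year
    if from_year == "" or from_month == "":
        return (to_year, "01"), (to_year, to_month)
    # Branch 2: both "from" parts supplied -> use them as given
    return (from_year, from_month), (to_year, to_month)

print(effective_range("", "", "2023", "06"))        # (('2023', '01'), ('2023', '06'))
print(effective_range("2022", "11", "2023", "06"))  # (('2022', '11'), ('2023', '06'))
```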

Comparing a string with a manually added string

I have been cracking my head over this for a week now:
We have an assignment where our program has 2 options. With option 1, the program asks for a name and a date, and then it generates an email addressed to the given name, with that date.
With the second option, we paste text into the program, and it tells us whether the 'template' from option 1 was used and, if so, gives you the name and date.
My question is: how do I compare the pasted string with the template so that it still reports a match even though the name and date differ (the date could be "2nd of October", "10/02", "Sunday the 2nd", basically anything that isn't the same as in the template)?
My idea was to cut the strings up and compare them word for word, but then what, and how?
Since I do not know what language you are programming in, I will give you some examples of what you ask in the languages I know.
Python (2.7):
x = raw_input('Manual String')  # get user input; this can be replaced
y = 'this is a string: ' + str(x)  # use str() in case x is a number or another type
if y == 'this is a string: doubleo':
    print "The strings are equal!"
C:
Use strcmp(); see this page:
http://www.tutorialspoint.com/ansi_c/c_strcmp.htm
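For the template question itself, one common approach is to turn the template into a regular expression: the fixed wording must match exactly, while each placeholder becomes a capture group that accepts any text. The sketch below uses Python 3 and its re module; the template wording and the {name}/{date} placeholder syntax are hypothetical, since the assignment's actual email text is not shown. Because the date group accepts any text, "2nd of October", "10/02" and "Sunday the 2nd" all match.

```python
import re
from typing import Dict, Optional

# Hypothetical template: the assignment's real email text is not shown in the question
TEMPLATE = "Dear {name}, your appointment is on {date}. Kind regards."

def template_to_regex(template: str):
    # re.split with a capturing group keeps the field names:
    # even indexes are fixed text, odd indexes are placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(pattern, re.DOTALL)

def match_template(text: str) -> Optional[Dict[str, str]]:
    # fullmatch: the whole pasted text must follow the template
    m = template_to_regex(TEMPLATE).fullmatch(text.strip())
    return m.groupdict() if m else None

print(match_template("Dear Alice, your appointment is on Sunday the 2nd. Kind regards."))
# {'name': 'Alice', 'date': 'Sunday the 2nd'}
print(match_template("Hi Bob, see you tomorrow."))
# None
```

This avoids splitting and comparing word by word: the regex engine checks the fixed parts and hands back whatever text filled each placeholder.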
