I have the following code:
from synapse.ml.cognitive import *

input_df = spark.createDataFrame([
    ("I am so happy today, its sunny!", "en-US"),
    ("I am frustrated by this rush hour traffic", "en-US"),
    ("The cognitive services on spark aint bad", "en-US"),
], ["text", "language"])

sentiment_df = (TextSentiment()
    .setTextCol("text")
    .setLocation("eastus")
    .setUrl(end_point)
    .setSubscriptionKey(service_key)
    .setOutputCol("sentiment")
    .setErrorCol("error")
    .setLanguageCol("language")
    .transform(input_df))
I created an Azure Cognitive Services Text Analytics resource and used its endpoint and key,
but when I run the code I get the following error. Any help is highly appreciated.
import synapse.ml
from synapse.ml.cognitive import *
from pyspark.sql.functions import col

input_df = spark.createDataFrame([
    ("I am so happy today, its sunny!", "en-US"),
    ("I am frustrated by this rush hour traffic", "en-US"),
    ("The cognitive services on spark aint bad", "en-US"),
], ["text", "language"])

# Build the transformer (linked service plus the existing endpoint/key settings).
sentiment = (TextSentiment()
    .setLinkedService(linked_service_name)
    .setTextCol("text")
    .setLocation("eastus")
    .setUrl(end_point)
    .setSubscriptionKey(service_key)
    .setOutputCol("sentiment")
    .setErrorCol("error")
    .setLanguageCol("language"))

# The output column holds an array of document results; pull the sentiment
# label out of the first element before displaying.
display(sentiment.transform(input_df).select(
    "text",
    col("sentiment")[0].getItem("sentiment").alias("sentiment"),
    "language"))
The error occurred because of how the transform result was selected and displayed. Replace the code block below with the code block shown above.
from synapse.ml.cognitive import *

input_df = spark.createDataFrame([
    ("I am so happy today, its sunny!", "en-US"),
    ("I am frustrated by this rush hour traffic", "en-US"),
    ("The cognitive services on spark aint bad", "en-US"),
], ["text", "language"])

sentiment_df = (TextSentiment()
    .setTextCol("text")
    .setLocation("eastus")
    .setUrl(end_point)
    .setSubscriptionKey(service_key)
    .setOutputCol("sentiment")
    .setErrorCol("error")
    .setLanguageCol("language")
    .transform(input_df))
The expected output will look like the following.
Related
Hi all, I'm new to Dask.
I ran into an error when I tried using read_sql_query to get data from an Oracle database.
Here is my Python script:
from sqlalchemy.sql import select, text
from dask.dataframe import read_sql_query

# SQLAlchemy connection URL for cx_Oracle (placeholders for credentials and host).
con_str = "oracle+cx_oracle://{UserID}:{Password}@{Domain}/?service_name={Servicename}"

# Column list and FROM/WHERE clauses only; the leading SELECT keyword was removed
# as suggested in the post referenced below.
sql = """
column_a, column_b
from
database.tablename
where
mydatetime >= to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
"""

sa_query = select(text(sql))
ddf = read_sql_query(sql=sa_query, con=con_str, index_col="index", head_rows=5)
I referred to this post: Reading an SQL query into a Dask DataFrame, and removed the "select" keyword from my query as it suggests.
But I got a cx_Oracle.DatabaseError with a missing expression: [SQL: SELECT FROM DUAL WHERE ROWNUM <= 5]
I don't understand where that query came from; it seems like it did not execute the SQL I provided, and I'm not sure which part I configured incorrectly.
*Note: pandas.read_sql works fine; it only fails with dask.dataframe.read_sql_query.
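Not a definitive fix, but for reference, dask.dataframe.read_sql_query expects a SQLAlchemy selectable whose column list includes the index column. A minimal sketch along those lines, assuming SQLAlchemy 1.4+ and reusing the table/column names and con_str from the question (the "index" column is only illustrative), would be:

from sqlalchemy import select, column, table, text
from dask.dataframe import read_sql_query

# Build an explicit selectable instead of wrapping the whole string in text();
# the column used as index_col has to appear in the selected columns.
query = (
    select(column("index"), column("column_a"), column("column_b"))
    .select_from(table("tablename", schema="database"))
    .where(text("mydatetime >= to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')"))
)

ddf = read_sql_query(sql=query, con=con_str, index_col="index", head_rows=5)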
In Power Query, I cannot use the + operator to add type date to type time natively. What are some ways to combine a date and a time into a datetime value?
Definitely not intuitive for an Excel user, but the Power Query method is:
date & time
let
    Source = Table.FromRecords(
        {[date = #date(2022,1,1), time = #time(1,15,0)]},
        type table [date = date, time = time]),
    #"Added Custom" = Table.AddColumn(Source, "datetime", each [date] & [time], type datetime)
in
    #"Added Custom"
The MS documentation for Power Query operators shows x & y, where x is a date and y is a time, producing a merged datetime.
There are some code snippets that work. You can create a custom function as well if you use these often.
DateTime.FromText(Text.Combine({Text.From(DateValue), " ", Text.From(TimeValue)}))
Another is to convert them to durations and process them that way.
(DateValue - #date(1900,1,1))+(TimeValue - #time(0,0,0)) + #datetime(1900,1,1,0,0,0)
OR
List.Sum({DateValue - #date(1900,1,1), TimeValue - #time(0,0,0),#datetime(1900,1,1,0,0,0)})
Finally
#datetime(Date.Year(DateValue), Date.Month(DateValue), Date.Day(DateValue), Time.Hour(TimeValue), Time.Minute(TimeValue), Time.Second(TimeValue))
I am trying to build a simple demand forecasting model using Azure AutoML in Synapse Notebook using Spark and SQL Context.
After aggregating the item quantity with respect to date and item id, this is what my data looks like in the event_file_processed.parquet file:
The date range is from 2020-08-13 to 2021-02-08.
I am following this documentation by MS: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-forecast
Here's how I have divided my train_data and test_data parquet files:
%%sql
CREATE OR REPLACE TEMPORARY VIEW train_data
AS SELECT
*
FROM
event_file_processed
WHERE
the_date <= '2020-12-20'
ORDER BY
    the_date ASC
%%sql
CREATE OR REPLACE TEMPORARY VIEW test_data
AS SELECT
*
FROM
event_file_processed
WHERE
the_date > '2020-12-20'
ORDER BY
    the_date ASC
%%pyspark
train_data = spark.sql("SELECT * FROM train_data")
train_data.write.parquet("train_data.parquet")
test_data = spark.sql("SELECT * FROM test_data")
test_data.write.parquet("test_data.parquet")
Below are my AutoML settings and run submission:
from azureml.automl.core.forecasting_parameters import ForecastingParameters
forecasting_parameters = ForecastingParameters(time_column_name='the_date',
forecast_horizon=44,
time_series_id_column_names=["items_id"],
freq='W',
target_lags='auto',
target_aggregation_function = 'sum',
target_rolling_window_size = 3,
short_series_handling_configuration = 'auto'
)
train_data = spark.read.parquet("train_data.parquet")
train_data.createOrReplaceTempView("train_data")
label = "total_item_qty"
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
import logging
automl_config = AutoMLConfig(task='forecasting',
primary_metric='normalized_root_mean_squared_error',
experiment_timeout_minutes=15,
enable_early_stopping=True,
training_data=train_data,
label_column_name=label,
n_cross_validations=3,
enable_ensembling=False,
verbosity=logging.INFO,
forecasting_parameters = forecasting_parameters)
from azureml.core import Workspace, Datastore
# Enter your workspace subscription, resource group, name, and region.
subscription_id = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" #you should be owner or contributor
resource_group = "XXXXXXXXXXX" #you should be owner or contributor
workspace_name = "XXXXXXXXXXX" #your workspace name
ws = Workspace(workspace_name = workspace_name,
subscription_id = subscription_id,
resource_group = resource_group)
experiment = Experiment(ws, "AML-demand-forecasting-synapse")
local_run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = local_run.get_output()
I am badly stuck on the error below:
Error:
DataException: DataException:
Message: An invalid value for argument [y] was provided.
InnerException: InvalidValueException: InvalidValueException:
Message: Assertion Failed. Argument y is null. Target: y. Reference Code: b7440909-05a8-4220-b927-9fcb43fbf939
InnerException: None
ErrorResponse
I have checked that there are no null or rogue values in total_item_qty, and the schema types for the three columns are also correct.
If you can please give some suggestions, I'll be obliged.
Thanks,
Shantanu Jain
I'm assuming you are not using the notebooks that the Synapse UI generates. If you use the wizard in Synapse, it will actually generate a PySpark notebook that you can run and tweak.
That experience is described here: https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-automl
There are two issues:
Since you are running from Synapse, you are probably intending to run AutoML on Spark compute. In this case, you need to pass a spark context to the AutoMLConfig constructor: spark_context=sc
Second, you seem to pass a Spark DataFrame to AutoML as the training data. AutoML only supports AML Dataset (TabularDataset) input types in the Spark scenario right now. You can make a conversion like this:
from azureml.core import Datastore
from azureml.data.dataset_factory import TabularDatasetFactory

# Register the Spark DataFrame as an AML TabularDataset in the default datastore.
df = spark.sql("SELECT * FROM default.nyc_taxi_train")
datastore = Datastore.get_default(ws)
dataset = TabularDatasetFactory.register_spark_dataframe(df, datastore, name=experiment_name + "-dataset")
automl_config = AutoMLConfig(spark_context=sc, ....)
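Putting the two together, a rough sketch for your case (reusing ws, sc, label, and forecasting_parameters from the code above; the dataset name is only illustrative, and this is not a verified configuration) could look like:

from azureml.core import Datastore
from azureml.core.experiment import Experiment
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.train.automl import AutoMLConfig

# Register the Spark training data as an AML TabularDataset.
train_df = spark.sql("SELECT * FROM train_data")
datastore = Datastore.get_default(ws)
train_dataset = TabularDatasetFactory.register_spark_dataframe(
    train_df, datastore, name="demand-forecasting-train")

# Same AutoML settings as before, but with the registered dataset and the Spark context.
automl_config = AutoMLConfig(task='forecasting',
                             primary_metric='normalized_root_mean_squared_error',
                             experiment_timeout_minutes=15,
                             enable_early_stopping=True,
                             training_data=train_dataset,
                             label_column_name=label,
                             n_cross_validations=3,
                             enable_ensembling=False,
                             forecasting_parameters=forecasting_parameters,
                             spark_context=sc)

experiment = Experiment(ws, "AML-demand-forecasting-synapse")
run = experiment.submit(automl_config, show_output=True)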
Also curious to learn more about your use case and how you intend to use AutoML in Synapse. Please let me know if you would be interested to connect on that topic.
Thanks,
Nellie (from the Azure Synapse Team)
I am collecting the Twitter stream and storing it in an SQLite database. Since tweets keep streaming in and the database keeps getting bigger, I added a command to delete tweets that are older than a minute. But those tweets are still there and the database keeps growing. Please help, as I am new to SQLite.
Here's the code:
class listener(StreamListener):
    def on_data(self, data):
        try:
            data = json.loads(data)
            tweet = unidecode(data['text'])
            text = preprocess(tweet)
            score = predict(text)['score']
            created_at = data['created_at']
            # Store the tweet, then try to purge rows older than a minute.
            c.execute('INSERT INTO sentiment (created_at, tweet, score) VALUES (?,?,?)',
                      (created_at, tweet, score))
            conn.commit()
            c.execute('''DELETE FROM sentiment WHERE created_at IN
                         (SELECT created_at FROM
                            (SELECT created_at,
                                    strftime("%s","now") - strftime("%s",created_at) AS passed_time
                             FROM sentiment WHERE passed_time >= 60))''')
            conn.commit()
        except Exception as e:
            print(str(e))
You are testing an IN subquery, which in turn has a subquery,
and you're complaining that this complex approach didn't work:
IN found no matches among your "seconds since 1970" timestamps.
OK. Your spec is much simpler than that. You said you want
"a command to delete the tweets that are older than a minute".
Piece of cake. Just follow that English sentence and turn it into SQL:
DELETE FROM sentiment WHERE created_at < strftime('%s', 'now') - 60;
Current time minus sixty seconds is a minute ago,
and the WHERE clause asks for rows older than that.
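If it helps, here is a minimal sketch of that cleanup inside the stream handler, assuming created_at is stored as Unix epoch seconds (Twitter's created_at string would need converting first); store_and_prune is just a hypothetical helper name:

import time

def store_and_prune(c, conn, tweet, score):
    # Store the tweet with an epoch-seconds timestamp so the age check is trivial.
    c.execute('INSERT INTO sentiment (created_at, tweet, score) VALUES (?,?,?)',
              (int(time.time()), tweet, score))
    # Drop anything older than sixty seconds in a single statement.
    c.execute("DELETE FROM sentiment WHERE created_at < strftime('%s','now') - 60")
    conn.commit()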
I am using the query below, but it is not extracting defects that are linked to test steps of test cases.
So none of the defects linked to test steps appear in the output.
SELECT DISTINCT BUG.BG_BUG_ID AS "Defect ID", BUG.BG_DETECTED_BY AS "Detected By", BUG.BG_SUMMARY AS "Defect Name",
       BUG.BG_USER_03 AS "Application", BUG.BG_USER_01 AS "Category", LINK.LN_ENTITY_TYPE AS "Type",
       TEST.TS_TEST_ID AS "Test ID", TEST.TS_NAME AS "Test Name", TEST.TS_USER_03 AS "Test Category"
FROM LINK, BUG, TEST, STEP
WHERE LINK.LN_BUG_ID = BUG.BG_BUG_ID
  AND LINK.LN_ENTITY_ID = TS_TEST_ID
  --AND BUG.BG_STATUS IN ('Closed', 'Canceled', 'Duplicate')
  AND BUG.BG_DETECTED_IN_RCYC IN (SELECT RELEASE_CYCLES.RCYC_ID FROM RELEASE_CYCLES
                                  WHERE RELEASE_CYCLES.RCYC_NAME IN ('UAT - 3.1 MVA'))
ORDER BY BUG.BG_BUG_ID