I am using the following
Product objProduct = new Product("active_flag","true");
This one will result multiple row, how can I access the multple rows? ObjProduct will have only one row?
If you're using 2.1 or above you can do the following:
ProductCollection products = DB.Select().From(Product.Schema)
.Where(Product.Columns.active_flag).IsEqualTo(true)
.ExecuteAsCollection<ProductCollection>();
Another way...
ProductCollection products = new ProductCollection()
.Where(Product.Columns.active_flag, true)
.Load();
Related
Iam new to python please help me in below problem
I have a dictionary as below
city = {"AP":"VIZAG","TELANGANA":"HYDERABAD"}
and also I have a list which I need to loop for all state tables as below
states=['AP','HYDERABAD']
for st in states:
df = spark.sql(f"""select * from {st} where city = {city}["{st}"]""")
In above df I am trying to filter city based on dictionary value as per state. But I am not able to do it
New answer
By combining two filter conditions you can do the expected filtering.
selected_city = 'AP'
df = df.filter(
(F.col('city') == selected_city)
& (F.col('state') == cities[selected_city])
)
Old answer
It is a simple change: You can use isin to filter a column based on a list [Docs].
cities = list(city.keys())
df = df.filter(F.col('city').isin(cities))
If you want to construct more complex conditions based on a dictionary see this question.
[Edit] Updated answer based on OPs comment. Will leave the old one in there for completeness.
So as far as I know Apache Spark doesn't has a functionality that imitates the update SQL command. Like, I can change a single value in a column given a certain condition. The only way around that is to use the following command I was instructed to use (here in Stackoverflow): withColumn(columnName, where('condition', value));
However, the condition should be of column type, meaning I have to use the built in column filtering functions apache has (equalTo, isin, lt, gt, etc). Is there a way I can instead use an SQL statement instead of those built in functions?
The problem is I'm given a text file with SQL statements, like WHERE ID > 5 or WHERE AGE != 50, etc. Then I have to label values based on those conditions, and I thought of following the withColumn() approach but I can't plug-in an SQL statement in that function. Any idea of how I can go around this?
I found a way to go around this:
You want to split your dataset into two sets: the values you want to update and the values you don't want to update
Dataset<Row> valuesToUpdate = dataset.filter('conditionToFilterValues');
Dataset<Row> valuesNotToUpdate = dataset.except(valuesToUpdate);
valueToUpdate = valueToUpdate.withColumn('updatedColumn', lit('updateValue'));
Dataset<Row> updatedDataset = valuesNotToUpdate.union(valueToUpdate);
This, however, doesn't keep the same order of records as the original dataset, so if order is of importance to you, this won't suffice your needs.
In PySpark you have to use .subtract instead of .except
If you are using DataFrame, you can register that dataframe as temp table,
using df.registerTempTable("events")
Then you can query like,
sqlContext.sql("SELECT * FROM events "+)
when clause translates into case clause which you can relate to SQL case clause.
Example
scala> val condition_1 = when(col("col_1").isNull,"NA").otherwise("AVAILABLE")
condition_1: org.apache.spark.sql.Column = CASE WHEN (col_1 IS NULL) THEN NA ELSE AVAILABLE END
or you can chain when clause as well
scala> val condition_2 = when(col("col_1") === col("col_2"),"EQUAL").when(col("col_1") > col("col_2"),"GREATER").
| otherwise("LESS")
condition_2: org.apache.spark.sql.Column = CASE WHEN (col_1 = col_2) THEN EQUAL WHEN (col_1 > col_2) THEN GREATER ELSE LESS END
scala> val new_df = df.withColumn("condition_1",condition_1).withColumn("condition_2",condition_2)
Still if you want to use table, then you can register your dataframe / dataset as temperory table and perform sql queries
df.createOrReplaceTempView("tempTable")//spark 2.1 +
df.registerTempTable("tempTable")//spark 1.6
Now, you can perform sql queries
spark.sql("your queries goes here with case clause and where condition!!!")//spark 2.1
sqlContest.sql("your queries goes here with case clause and where condition!!!")//spark 1.6
If you are using java dataset
you can update dataset by below.
here is the code
Dataset ratesFinal1 = ratesFinal.filter(" on_behalf_of_comp_id != 'COMM_DERIVS' ");
ratesFinal1 = ratesFinal1.filter(" status != 'Hit/Lift' ");
Dataset ratesFinalSwap = ratesFinal1.filter (" on_behalf_of_comp_id in ('SAPPHIRE','BOND') and cash_derivative != 'cash'");
ratesFinalSwap = ratesFinalSwap.withColumn("ins_type_str",functions.lit("SWAP"));
adding new column with value from existing column
ratesFinalSTW = ratesFinalSTW.withColumn("action", ratesFinalSTW.col("status"));
I have an "iplRDD" which is a json, and I do below steps and query through hivecontext. I get the results but without columns headers. Is there is a way to get the columns names along with the values?
val teamRDD = hiveContext.jsonRDD(iplRDD)
teamRDD.registerTempTable("teams")
hiveContext.cacheTable("teams")
val result = hiveContext.sql("select * from teams where team_name = "KKR" )
result.collect.foreach(println)
Any thoughts please ?
teamRDD.schema.fieldNames should contain the header names.
You can get it by using:
result.schema().fields();
you can save your dataframe 'result' like this with header as csv file:
result.write().format("com.databricks.spark.csv").option("header", "true").save(outputPath);
I've got a question regarding how to retrieve the auto-increment or identity value for a column in SQL Server 2005, when said column is not the first declared column in a table.
I can get the generated value for a table just by issuing the following code:
MyTable newRecord = new MyTable();
newRecord.SomeColumn = 2;
newRecord.Save();
return newRecord.MyIdColumn;
Which works fine regardles of how many other columns make up the primary key of that particular table, but the first column declared MUST be the identity column, otherwise this doesn't work.
My problem is that I have to integrate my code with other tables that are out of my reach, and they have identity columns which are NOT the first columns in those tables, so I was wondering if there is a proper workaround to my problem, or if I'm stuck using something along the lines of SELECT ##IDENTITY to manually get that value?
Many thanks in advance for all your help!
From the "ewww gross" department:
Here's my workaround for now, hopefully someone may propose a better solution to what I did.
MyTable newRecord = new MyTable();
newRecord.SomeColumn = 2;
newRecord.Save();
CodingHorror horror = new CodingHorror();
string SQL = "SELECT IDENT_CURRENT(#tableName)";
int newId = horror.ExecuteScalar<int>(SQL, "MyTable");
newRecord.MyIdColumn = newId;
newRecord.MarkClean();
return newRecord.MyIdColumn;
I want to perform a simple join on two tables (BusinessUnit and UserBusinessUnit), so I can get a list of all BusinessUnits allocated to a given user.
The first attempt works, but there's no override of Select which allows me to restrict the columns returned (I get all columns from both tables):
var db = new KensDB();
SqlQuery query = db.Select
.From<BusinessUnit>()
.InnerJoin<UserBusinessUnit>( BusinessUnitTable.IdColumn, UserBusinessUnitTable.BusinessUnitIdColumn )
.Where( BusinessUnitTable.RecordStatusColumn ).IsEqualTo( 1 )
.And( UserBusinessUnitTable.UserIdColumn ).IsEqualTo( userId );
The second attept allows the column name restriction, but the generated sql contains pluralised table names (?)
SqlQuery query = new Select( new string[] { BusinessUnitTable.IdColumn, BusinessUnitTable.NameColumn } )
.From<BusinessUnit>()
.InnerJoin<UserBusinessUnit>( BusinessUnitTable.IdColumn, UserBusinessUnitTable.BusinessUnitIdColumn )
.Where( BusinessUnitTable.RecordStatusColumn ).IsEqualTo( 1 )
.And( UserBusinessUnitTable.UserIdColumn ).IsEqualTo( userId );
Produces...
SELECT [BusinessUnits].[Id], [BusinessUnits].[Name]
FROM [BusinessUnits]
INNER JOIN [UserBusinessUnits]
ON [BusinessUnits].[Id] = [UserBusinessUnits].[BusinessUnitId]
WHERE [BusinessUnits].[RecordStatus] = #0
AND [UserBusinessUnits].[UserId] = #1
So, two questions:
- How do I restrict the columns returned in method 1?
- Why does method 2 pluralise the column names in the generated SQL (and can I get round this?)
I'm using 3.0.0.3...
So far my experience with 3.0.0.3 suggests that this is not possible yet with the query tool, although it is with version 2.
I think the preferred method (so far) with version 3 is to use a linq query with something like:
var busUnits = from b in BusinessUnit.All()
join u in UserBusinessUnit.All() on b.Id equals u.BusinessUnitId
select b;
I ran into the pluralized table names myself, but it was because I'd only re-run one template after making schema changes.
Once I re-ran all the templates, the plural table names went away.
Try re-running all 4 templates and see if that solves it for you.