I have a table with a few key columns created as nvarchar(80), i.e. Unicode.
I can list the full dataset with a SELECT * statement on Table1 and can confirm the values I need to filter on are there.
However, I can't get any results from that table if I filter rows using alphabet characters as input on any column.
The columns in Table1 store values in Cyrillic characters.
I know it must have to do with character encoding: what I see in the result list is not what I use as input characters.
The Unicode nvarchar type should resolve this character-type mismatch automatically.
What do you suggest I do in order to get results?
Thank you very much.
Paulo
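If this is SQL Server, a common cause is passing the filter value as a non-Unicode literal; a minimal sketch, assuming a hypothetical column Name and a sample Cyrillic value:
-- Without the N prefix the literal is varchar: the Cyrillic characters are
-- converted through the database's default code page (often to '?') before
-- the comparison runs, so nothing matches.
SELECT * FROM Table1 WHERE Name = 'Москва';
-- With the N prefix the literal stays nvarchar (Unicode) and matches:
SELECT * FROM Table1 WHERE Name = N'Москва';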
I have a Hive table with data stored as ORC.
I write empty values (blank, "") into some fields, but sometimes when I run a select query on this table the empty string columns are shown as NULL in the query result.
I would like to see the empty values I entered. How is this possible?
If you want to see empty values instead of NULL in a Hive table, you can use the NVL function, which produces a default value for NULL column values.
Below is the syntax:
NVL(arg1, arg2) - here arg1 is an expression or column, and arg2 is the default value returned when arg1 is NULL.
e.g. Query - SELECT NVL(blank, '') AS blank_1 FROM db.table;
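As a hedged alternative, the standard COALESCE function behaves the same way for two arguments, so the equivalent query would be:
SELECT COALESCE(blank, '') AS blank_1 FROM db.table;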
In SQLite we have table1 with column column1.
There are 4 rows with the following values for column1:
(p1, p10, p11, p20)
DROP TABLE IF EXISTS table1;
CREATE TABLE table1(column1 NVARCHAR);
INSERT INTO table1 (column1) values ('p1'),('p10'),('p11'),('p20');
Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',','+column1+',') from table1;
We have to get the position of each value of column1 in the given string:
,p112,p108,p124,p204,p11,p1124,p1,p10,p20,
The query
Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',column1) from table1;
returns the values
(2, 7, 2, 17)
which is not what we want.
The query
Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',','+column1+',') from table1;
returns 9 for all rows; it turned out that this is the position of the first '0' character.
How can we get the exact positions of the column1 values in the given string in SQLite?
In SQLite the concatenation operator is || and not + (as in SQL Server), so do this:
Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',',' || column1 || ',') from table1;
What your code did was numeric addition, which resulted in 0 because none of the string operands could successfully be converted to a number,
so instr() was searching for '0' and always found it at position 9 of the string ',p112,p108,p124,p204,p11,p1124,p1,p10,p20,'.
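For reference, counting by hand against the sample string, the corrected query should return the position of each delimited value:
p1  -> 31
p10 -> 34
p11 -> 21
p20 -> 38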
Here is my dataframe
      Word  1_gram-Probability
0   ('A',)            0.001461
1  ('45',)            0.000730
Now I just want to select the row where Word is 45. I tried
print(simple_df.loc[simple_df['Word']=='45'])
but I get
Empty DataFrame
What am I missing? Is this the correct way of accessing the row? I also tried ('45',) as the value, but that did not work either.
It appears that you have the literal string value "('45',)" in the cells of your dataframe, so you must match it exactly:
simple_df.loc[simple_df['Word']=="('45',)"]
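If the cells instead hold actual Python tuples (which print exactly like the strings above), a direct == comparison against a one-element tuple can fail, so compare element-wise instead; a hedged sketch, assuming tuple cells:
# Hypothetical: each cell of 'Word' is a tuple such as ('45',)
simple_df.loc[simple_df['Word'].apply(lambda w: w == ('45',))]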
Is there any difference in semantics between df.na().drop() and df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull() && !df.col("onlyColumnInOneColumnDataFrame").isNaN()), where df is an Apache Spark DataFrame?
Or shall I consider it a bug if the first one does NOT afterwards return null (not a string "null", but simply a null value) in the column onlyColumnInOneColumnDataFrame while the second one does?
EDIT: added !isNaN() as well. onlyColumnInOneColumnDataFrame is the only column in the given DataFrame. Let's say its type is Integer.
With df.na.drop() you drop the rows containing any null or NaN values.
With df.filter(df.col("onlyColumnInOneColumnDataFrame").isNotNull()) you drop those rows which have null only in the column onlyColumnInOneColumnDataFrame.
If you want to achieve the same thing, that would be df.na.drop(["onlyColumnInOneColumnDataFrame"]).
In one case, I had to select records that had NAs or nulls or were >= 0. I could do so using only the coalesce function, and none of the three functions above:
rdd.filter("coalesce(index_column, 1000) >= 0")
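A minimal PySpark sketch of the equivalence, assuming a hypothetical single-column DataFrame (for an Integer column NaN cannot occur, so isNotNull alone matches na.drop on that column):
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (None,)], ["onlyColumnInOneColumnDataFrame"])
# Both keep only the row containing 1:
df.na.drop(subset=["onlyColumnInOneColumnDataFrame"]).show()
df.filter(df["onlyColumnInOneColumnDataFrame"].isNotNull()).show()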