I followed the steps in this question
Append text to column data based on the column in PostgreSQL
But I made a mistake in the SIMILAR TO clause and the text got added to fields it shouldn't have. How can I reverse it?
The query I ran was:
update metadatavalue set text_value = 'Fil: ' || text_value where metadata_field_id = 136 and text_value not similar to 'Fill:%';
How can I remove the extra characters from those fields?
Thanks a lot in advance.
You can trim the prepended string off and update the column with the result:
UPDATE metadatavalue
SET text_value = regexp_replace(text_value, '^Fil: ', '')
WHERE metadata_field_id = 136
AND text_value LIKE 'Fil: %';
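To see what the anchored pattern does, here is a small Python sketch of the same replacement. Python's re.sub stands in for PostgreSQL's regexp_replace, and the sample values and the function name are made up for illustration. Only a single leading 'Fil: ' is removed, so a value that already carried the prefix before the bad update keeps its original one, and values starting with 'Fill:' are left alone.

```python
import re


def strip_one_prefix(value: str) -> str:
    # The ^ anchor plus count=1 removes one leading "Fil: " and nothing else.
    return re.sub(r"^Fil: ", "", value, count=1)


# Hypothetical sample values, not taken from the real table.
samples = [
    "Fil: Smith, John",       # prefix added by the bad update
    "Fil: Fil: Doe, Jane",    # already had the prefix, then got a second one
    "Fill: untouched",        # skipped by the original UPDATE
    "No prefix: Fil: here",   # "Fil: " not at the start, so not matched
]

for s in samples:
    print(repr(s), "->", repr(strip_one_prefix(s)))
```

It may be worth running the equivalent SELECT on a few rows first to confirm the result before updating the table.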
I have had to look up hundreds (if not thousands) of free-text answers on Google, making notes in Excel along the way and inserting SAS code around the answers as a last step.
The output looks like this:
This output contains an unnecessary number of blank spaces, which seems to confuse SAS's search to the point where the observations can't be properly located.
It works if I manually erase the superfluous spaces, but that will probably take hours. Is there an automated fix for this, either in SAS or in Excel?
I tried using the STRIP function, to no avail:
else if R_res_ort_txt=strip(" arild ") and R_kom_lan=strip(" skåne ") then R_kommun=strip(" Höganäs " );
If you want to generate a string like:
if R_res_ort_txt="arild" and R_kom_lan="skåne" then R_kommun="Höganäs";
from three variables, let's call them A B C, then just use code like:
string=catx(' ','if R_res_ort_txt=',quote(trim(A))
,'and R_kom_lan=',quote(trim(B))
,'then R_kommun=',quote(trim(C)),';') ;
Or, if you are just writing that string to a file, use this PUT statement syntax:
put 'if R_res_ort_txt=' A :$quote. 'and R_kom_lan=' B :$quote.
'then R_kommun=' C :$quote. ';' ;
A saner solution would be to continue using the free-text answers as data and perform your matching criteria for transformations with a left join.
proc import out=answers datafile='my-free-text-answers.xlsx';
data have;
attrib R_res_ort_txt R_kom_lan length=$100;
input R_res_ort_txt ...;
datalines4;
... whatever all those transforms will be performed on...
;;;;
proc sql;
create table want as
select
have.* ,
answers.R_kommun_answer as R_kommun
from
have
left join
answers
on
have.R_res_ort_txt = answers.res_ort_answer
& have.R_kom_lan = answers.kom_lan_answer
;
I solved this by adding quotes in Excel using the Flash Fill feature:
https://www.youtube.com/watch?v=nE65QeDoepc
I have to create a clean list wherein names with 'Trust' or 'Trustee' in rows get deleted.
I'm using the following code, but I'm not getting the desired result:
df_clean = df[~df['Row Labels'].str.contains('trusteeship')]
E.g.: if 'Row Labels' contains a row with ABC Trust or XYTrusteeshipZ, then the whole row should get deleted.
df_clean = df[~df['Row Labels'].str.contains('Trust')]
df_clean = df[~df['Row Labels'].str.lower().str.contains('trust')]
You can pass the case=False parameter to ignore case when matching:
df_clean = df[~df['Row Labels'].str.contains('trust', case=False)]
Or first convert the values to lowercase, as @anon01 mentioned in the comments:
df_clean = df[~df['Row Labels'].str.lower().str.contains('trust')]
I'm trying to find the number of words in this table:
Download Table here: http://www.mediafire.com/file/m81vtdo6bdd7bw8/Table_RandomInfoMiddle.mat/file
Words are indicated by the "Type" criteria, being "letters". The key thing to notice is that not everything in the table is a word, and that the entry "<missing>" registers as a word. In other words, I need to determine the number of words by counting only "letters", except where the entry is "<missing>".
Here is my attempt (so far unsuccessful; note the two lines marked "Problem area"):
for col=1:size(Table_RandomInfoMiddle,2)
column_name = sprintf('Words count for column %d',col);
MiddleWordsType_table.(column_name) = nnz(ismember(Table_RandomInfoMiddle(:,col).Variables,{'letters'}));
MiddleWordsExclusionType_table.(column_name) = nnz(ismember(Table_RandomInfoMiddle(:,col).Variables,{'<missing>'})); %Problem area
end
%Call data from table
MiddleWordsType = table2array(MiddleWordsType_table);
MiddleWordsExclusionType = table2array(MiddleWordsExclusionType_table); %Problem area
%Take out zeros where "Type" was
MiddleWordsTotal_Nr = MiddleWordsType(MiddleWordsType~=0);
MiddleWordsExclusionTotal_Nr = MiddleWordsExclusionType(MiddleWordsExclusionType~=0);
%Final answer
FinalMiddleWordsTotal_Nr = MiddleWordsTotal_Nr-MiddleWordsExclusionTotal_Nr;
Any help will be appreciated. Thank you!
You can get the unique values from column 1 when column 2 satisfies some condition using
MiddleWordsType = numel( unique( ...
Table_RandomInfoMiddle{ismember(Table_RandomInfoMiddle{:,2}, 'letters'), 1} ) );
<missing> is a keyword in a categorical array, not literally the string "<missing>". That's why it appears blue and italicised in the workspace. If you want to check specifically for missing values, you can use this instead of ismember:
ismissing( Table_RandomInfoMiddle{:,1} )
I am trying to clean text strings containing any ' or its encoded form &#39 (which ends with a ; but if I add it here you will see just ' again, because that code is also rendered by Stack Overflow). The string content contains ' and when it does there is an error.
When I insert the string into my database I get this error:
psycopg2.ProgrammingError: syntax error at or near "s"
LINE 1: ...tment and has commenced a search for mr. whitnell's
The original string looks like this:
...a search for mr. whitnell's...
To remove the ' and &#39 ; I use:
stripped_content = stringcontent.replace("'","")
stripped_content = stringcontent.replace("&#39 ;","")
Any advice is welcome. Best regards.
When you call replace("&#39 ;","") it literally searches for occurrences of "&#39 ;" in the string. You need to convert "&#39 ;" to its character equivalent. Try this:
s = "That's how we 'roll"
r = s.replace(chr(int('&#39'[2:])), "")
Here chr(int('&#39'[2:])) evaluates to the ' character.
Output:
Thats how we roll
Note
str.replace returns a new string, so if you run s.replace(chr(int('&#39'[2:])), "") without saving the result in a variable, your original string is not affected.
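If the text really does contain HTML character references (an assumption, since the exact input isn't shown), Python's standard html.unescape decodes them in one step, after which a plain replace works. A minimal sketch; the sample string and the function name are made up for illustration:

```python
import html


def decode_and_strip(text: str) -> str:
    # Turn character references such as &#39; back into real characters,
    # then drop the apostrophes.
    decoded = html.unescape(text)
    return decoded.replace("'", "")


# Hypothetical sample input containing an encoded apostrophe.
raw = "a search for mr. whitnell&#39;s office"
print(html.unescape(raw))
print(decode_and_strip(raw))
```

If the goal is only to get rid of the database error, passing the string to psycopg2 as a query parameter, as in cursor.execute(sql, (value,)), lets the driver handle the quoting, so the apostrophe would not need to be removed at all.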
I need to clean up an address field in PostgreSQL 8.4 by removing everything to the right of a street name. This includes dropping suites ("100 Broadway Street Suite 100") and correcting names that have unit numbers appended to the street name ("100 Broadway Street100") so that the result in both cases would be "100 Broadway Street".
Essentially I am trying to remove everything to the right of "Street". I can't seem to get a replace function to work without individually coding for each case. An rtrim function also doesn't work because the characters I want removed vary, so they would need a wildcard.
Here is what I am trying to get to work:
update *tablename* set *fieldname* = replace (*fieldname*, '%STREET%', '%STREET')
This SQL below works, but I don't want to code each possible combination:
UPDATE *tablename* set *fieldname* = replace (*fieldname*, ' SUITE 100', '');
UPDATE *tablename* set *fieldname* = replace (*fieldname*, ' STREET100', ' STREET');
How can I remove everything to the right of a string "Street" without explicitly specifying what follows "Street"?
Thanks for the help.
Try something like this:
SELECT regexp_replace('100 broadway street 100', '(.*)(Street).*', '\1\2', 'i');
The above is basically looking for anything followed by "Street" (case-insensitively, per the last 'i' argument), and then stripping off everything after the "Street", which I think is what you're asking for. See http://www.postgresql.org/docs/current/static/functions-matching.html for more details.
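The same pattern can be tried out quickly in Python, with re.sub standing in for regexp_replace. This is only a sketch; the function name and the sample addresses are made up. Because the leading (.*) is greedy, the text is cut after the last "Street" in the string, not the first.

```python
import re


def truncate_after_street(address: str) -> str:
    # Greedy (.*) makes the match end at the LAST "street" in the string;
    # re.IGNORECASE plays the role of the 'i' flag in the SQL call.
    return re.sub(r"(.*)(street).*", r"\1\2", address, count=1, flags=re.IGNORECASE)


# Hypothetical sample addresses.
for a in ["100 Broadway Street Suite 100",
          "100 Broadway Street100",
          "1 Street Rd Street 5",
          "12 Main Ave"]:
    print(repr(a), "->", repr(truncate_after_street(a)))
```

Addresses without "Street" are returned unchanged, since the pattern does not match them.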
This truncates after the first instance of 'STREET':
UPDATE tablename
SET fieldname = SUBSTR(fieldname, 1, position('STREET' IN fieldname) + 5)
WHERE fieldname LIKE '%STREET%'
Update: If the desire is to have a case-insensitive search of "STREET":
UPDATE tablename
SET fieldname = SUBSTR(fieldname, 1, position('STREET' IN UPPER(fieldname)) + 5)
WHERE UPPER(fieldname) LIKE '%STREET%'
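The "+ 5" comes from position() being 1-based and 'STREET' being six characters long: keeping position + 5 characters ends exactly on the final T. The sketch below mirrors that arithmetic in Python, where find() is 0-based, so the slice ends at index + 6. The function name and sample values are made up, and it assumes upper() does not change the string length, which holds for ASCII addresses.

```python
def truncate_at_first(address: str, marker: str = "STREET") -> str:
    # 0-based index of the first case-insensitive match, or -1 if absent.
    idx = address.upper().find(marker)
    if idx == -1:
        # Mirrors the WHERE ... LIKE '%STREET%' filter: leave other rows alone.
        return address
    # SQL keeps position + 5 characters (1-based); here that is idx + len(marker).
    return address[: idx + len(marker)]


# Hypothetical sample addresses.
for a in ["100 BROADWAY STREET SUITE 100",
          "100 Broadway Street100",
          "1 Street Rd Street 5",
          "12 Main Ave"]:
    print(repr(a), "->", repr(truncate_at_first(a)))
```

Unlike the regular-expression version, this cuts after the first occurrence of the marker.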