Extracting SQL INSERTs with sed, line cuts - Linux

I've been reading on Stack Overflow about using sed to extract data from SQL dumps. To be more precise, the final goal is to extract the INSERT statements for a specific table in order to restore only that table.
I’m using this:
sed -n '/LOCK TABLES `TABLE_NAME`/,/UNLOCK TABLES/p' dump.sql > output.sql
The problem I'm having is that some INSERT statements sit on a single line that is more than 50 MB long, so while extracting the insert, the output gets cut before the end of the line.
Like this:
......
(4
3458,'0Y25565137SEOEJ','001','PREPAR',1330525937741,
NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL),
(43459,'666
I tried awk and even plain grep, and the result is the same: the line gets cut.
Edit: I'm using this on an SQL dump from MySQL, and the system I'm working on is CentOS 5.2.

You can try awk and see if it does better (I think it will):
awk '/LOCK TABLES `TABLE_NAME`/,/UNLOCK TABLES/' dump.sql > output.sql

But if it's an Oracle dump file created with exp, you can import only the needed tables with:
imp user/pass tables=table1,table2 ...
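Since the edit says the source is a MySQL dump, another option (assuming you still have access to the source server; the user and database names below are placeholders) is to skip the text extraction entirely and dump just that one table with mysqldump:
mysqldump -u user -p mydatabase TABLE_NAME > table_only.sql
The resulting file can then be restored on its own, with no sed/awk filtering and no risk of truncating a long INSERT line.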

Related

Split single record into Multiple records in Unix shell Script

I have a record.
Example:
EMP_ID|EMP_NAME|AGE|SALARAy
123456|XXXXXXXXX|30|10000000
Is there a way I can split the record into multiple records? The example output should look like:
EMP_ID|Attributes
123456|XXXXXXX
123456|30
123456|10000000
I want to split the same record into multiple records. Here the employee ID is my unique column, and I want to loop over the remaining 3 columns to create 3 records: EMP_ID|EMP_NAME, EMP_ID|AGE, EMP_ID|SALARY. I may have more columns as well, but for the sample I have provided 3 columns along with the employee ID.
Please help me with any suggestions.
With bash:
record='123456|XXXXXXXXX|30|10000000'
IFS='|' read -ra fields <<<"$record"
for ((i=1; i < "${#fields[@]}"; i++)); do
printf "%s|%s\n" "${fields[0]}" "${fields[i]}"
done
123456|XXXXXXXXX
123456|30
123456|10000000
For the whole file:
{
IFS= read -r header
while IFS='|' read -ra fields; do
for ((i=1; i < "${#fields[@]}"; i++)); do
printf "%s|%s\n" "${fields[0]}" "${fields[i]}"
done
done
} < filename
Records with fields separated by a special delimiter character such as | can be manipulated with basic Unix command line tools such as awk. For example, with your input records in the file records.txt:
awk -F\| 'NR>1{for(i=2;i<=NF;i++){print $1"|"$(i)}}' records.txt
I recommend reading an awk tutorial and playing around with it. Related command line tools worth learning include grep, sort, wc, uniq, head, tail, and cut. If you regularly process delimiter-separated files, you will likely need them on a daily basis. As soon as your data format gets more complex (e.g. CSV with the possibility of the delimiter character also appearing in field values), you need more specific tools; for instance, see this question on CSV tools, or jq for processing JSON. Still, knowledge of the basic Unix command line tools will save you a lot of time.
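If you also want the EMP_ID|Attributes header shown in your desired output, a small variation of the same one-liner (a sketch, assuming the header is the first line of records.txt) prints it before processing the data rows:
awk -F\| 'NR==1{print "EMP_ID|Attributes"; next} {for(i=2;i<=NF;i++){print $1"|"$(i)}}' records.txt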

CQL3.2: DROP TABLE with certain prefix?

I have a Cassandra 2.1.8 database with a bunch of tables, all in the form of either "prefix1_tablename" or "prefix2_tablename".
I want to DROP every table that begins with prefix1_ and leave anything else alone.
I know I can grab table names using the query:
SELECT columnfamily_name FROM system.schema_columnfamilies
WHERE keyspace_name='mykeyspace'
And I thought about filtering the results somehow to get only prefix1_ tables, putting them into a table with DROP TABLE prepended to each one, then executing all the statements in my new table. It was similar thinking to strategies I've seen for people solving the same problem with MySQL or Oracle.
With CQL 3.2, though, I don't have access to user-defined functions (at least according to the docs I've read...), and I don't know how to do something like execute statements off of a query result, or even how to filter out the prefix1_ tables with no LIKE operator in Cassandra.
Is there a way to accomplish this?
I came up with a Bash shell script to solve my own issue. Once I realized that I could export the column families table to a CSV file, it made more sense to me to perform the filtering and text manipulation with grep and awk as opposed to finding a 'pure' cqlsh method.
The script I used:
#!/bin/bash
# No need for a USE command by making delimiter a period
cqlsh -e "COPY system.schema_columnfamilies (keyspace_name, columnfamily_name)
TO 'alltables.csv' WITH DELIMITER = '.';"
cat alltables.csv | grep -e '^mykeyspace.prefix1_' \
| awk '{print "DROP TABLE " $0 ";"}' >> remove_prefix1.cql
cqlsh -f 'remove_prefix1.cql'
rm alltables.csv remove_prefix1.cql
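For illustration (these table names are hypothetical), the generated remove_prefix1.cql ends up with one statement per matching table, e.g.:
DROP TABLE mykeyspace.prefix1_users;
DROP TABLE mykeyspace.prefix1_events;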

How do we build a normalized table from a denormalized text file?

Thanks for your replies/time.
We need to build a normalized DB table from a denormalized text file. We explored a couple of options, such as Unix shell and PostgreSQL. I am looking to learn better ideas for a solution from this community.
The input text file has comma-delimited records of varying length. The content may look like this:
XXXXXXXXXX , YYYYYYYYYY, TTTTTTTTTTT, UUUUUUUUUU, RRRRRRRRR,JJJJJJJJJ
111111111111, 22222222222, 333333333333, 44444444, 5555555, 666666
EEEEEEEE,WWWWWW,QQQQQQQ,PPPPPPPP
We would like to normalize it as follows (split & pair):
XXXXXXXXXX , YYYYYYYYYY
TTTTTTTTTTT, UUUUUUUUUU
RRRRRRRRR,JJJJJJJJJ
111111111111, 22222222222
333333333333, 44444444
5555555, 666666
EEEEEEEE,WWWWWW
QQQQQQQ,PPPPPPPP
Do we need to go with a pre-process-the-text-and-load approach?
If yes, what is the best way to pre-process?
Is there any single SQL/function approach to get the above?
Thanks for helping.
Using GNU awk (because of the regex RS):
awk '{$1=$1} NR%2==1 {printf "%s,",$0} NR%2==0' RS="[,\n]" file
XXXXXXXXXX,YYYYYYYYYY
TTTTTTTTTTT,UUUUUUUUUU
RRRRRRRRR,JJJJJJJJJ
111111111111,22222222222
333333333333,44444444
5555555,666666
EEEEEEEE,WWWWWW
QQQQQQQ,PPPPPPPP
{$1=$1} cleans up and removes extra spaces
NR%2==1 {printf "%s,",$0} prints the odd-numbered records followed by a comma
NR%2==0 prints the even-numbered records followed by a newline
RS="[,\n]" sets the record separator to a comma or a newline
Here is an update. This is what I did on the Linux server:
sed -i 's/,,//g' inputfile <------ clean up lots of extra commas
awk '{$1=$1} NR%2==1 {printf "%s,",$0} NR%2==0' RS="[,\n]" inputfile <---- Jotne's idea
dos2unix -q -n inputfile outputfile <------ to remove ^M in some records
The output file is then ready to process in comma-delimited format.
Any thoughts on improving the above steps further?
Thanks for helping.
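For what it's worth, here is a sketch that chains the same three steps into a single pipeline (same placeholder file names as above; tr -d '\r' stands in for the dos2unix step, and -v RS=... sets the record separator up front so awk reads the pipe):
tr -d '\r' < inputfile | sed 's/,,//g' | awk -v RS='[,\n]' '{$1=$1} NR%2==1 {printf "%s,",$0} NR%2==0' > outputfile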

Cassandra selective copy

I want to copy selected rows from a column family to a .csv file. The COPY command can only dump a column or the entire table to a file, with no WHERE clause. Is there a way to use a WHERE clause in the COPY command?
Another way I thought of was:
Do "INSERT INTO table2 () VALUES (SELECT * FROM table1 WHERE <where_clause>);" and then dump table2 to .csv, which is also not possible.
Any help would be much appreciated.
There is no way to use a WHERE clause in COPY, but you can use this method:
echo "select c1,c2.... FROM keySpace.Table where ;" | bin/cqlsh > output.csv
It allows you to save your result in the output.csv file.
No, there is no built-in support for a "where" clause when exporting to a CSV file.
One alternative would be to write your own script using one of the drivers. In the script you would do the "select", then read the results and write out to a CSV file.
In addition to Amine CHERIFI's answer:
| sed -E 's/^\s+//; s/\s*\|\s*/,/g; /^-{3,}|^$|^\(.+\)$/d'
Removes leading spaces
Replaces | with ,
Removes header separator, empty and summary lines
Other ways to run the query with a filter and redirect the output to CSV:
1) Inside cqlsh, use the CAPTURE command and redirect the output to a file. You need to turn tracing on before executing the command.
Example: CAPTURE 'output.txt' -- the output of any query executed after this command gets captured into the output.txt file
2) If you would like to redirect the query output to a file from outside cqlsh:
./cqlsh -e'select * from keyspaceName.tableName' > fileName.txt -- hostname
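Combining the two ideas above, here is a sketch that applies the filter with cqlsh -e and cleans the output into CSV with the sed from the earlier answer (the keyspace, table, column names, and value are placeholders; filtering on a non-key column may additionally need ALLOW FILTERING):
./cqlsh -e "SELECT c1, c2 FROM mykeyspace.mytable WHERE c1 = 'somevalue';" | sed -E 's/^\s+//; s/\s*\|\s*/,/g; /^-{3,}|^$|^\(.+\)$/d' > output.csv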

Is it possible to load a subset of columns using Sybase 15 bcp?

I have a CSV file with 20 or so columns and I want to load it into a table with only 9 columns - I want to throw away the rest.
Can I do it directly with bcp or do I need to preprocess the file to strip it down to just what I need?
The manual does not seem to detail it.
But then I seem to have options that aren't in the manual, e.g. -labeled?
Thanks in advance, Chris
No, this isn't possible with bcp.
You can combine pipes, awk and bcp.
For example, in the first shell:
mknod bcp.pipe p
# select only the needed columns (hypothetically the first 9; data.csv is a placeholder name), tab-separated to match bcp -c defaults
awk -F, -v OFS='\t' '{print $1,$2,$3,$4,$5,$6,$7,$8,$9}' data.csv > bcp.pipe
In the second shell:
bcp db..table in bcp.pipe -c -U ...
You could create a view on the table which only includes the columns you want. Then bcp out the view instead of the table.
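A minimal sketch of that approach (the view name, database name, and flags are only an example; the view would be created beforehand with just the nine wanted columns):
bcp mydb..my_table_v9 out subset.csv -c -t, -U user -S SERVER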
