.\"     Title: bulk_loader
.\"    Author: 
.\" Generator: DocBook XSL Stylesheets v1.73.2 <http://docbook.sf.net/>
.\"      Date: 09/22/2008
.\"    Manual: 
.\"    Source: 
.\"
.TH "BULK_LOADER" "1" "09/22/2008" "" ""
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.SH "NAME"
bulk_loader \- PgQ consumer that loads urlencoded records into slow databases
.SH "SYNOPSIS"
.sp
.RS 4
.nf
bulk_loader\.py [switches] config\.ini
.fi
.RE
.SH "DESCRIPTION"
bulk_loader is a PgQ consumer that reads urlencoded records from a source queue and writes them into tables according to its configuration file\. It is targeted at slow databases that cannot handle applying each row as a separate statement\. It was originally written for BizgresMPP/greenplumDB, which have very high per\-statement overhead, but it can also be used to load a regular PostgreSQL database that cannot cope with regular replication\.
.sp
Behaviour properties:
.sp
.RS 4
\h'-04'\(bu\h'+03'reads urlencoded "logutriga" records\.
.RE
.sp
.RS 4
\h'-04'\(bu\h'+03'does not do partitioning, but optionally allows redirecting table events\.
.RE
.sp
.RS 4
\h'-04'\(bu\h'+03'does not keep event order\.
.RE
.sp
.RS 4
\h'-04'\(bu\h'+03'always loads data with COPY, either directly into the main table (INSERTs) or into temp tables (UPDATE/COPY) and then applies it from there\.
.RE
.sp
Events are usually produced by pgq\.logutriga()\. Logutriga adds all the data of the record into the event (also in the case of updates and deletes)\.
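.sp
The grouping idea behind these properties can be sketched in Python\. This is a simplified illustration, not bulk_loader's own code: the Event tuple and the group_batch function are hypothetical, and the sketch only shows how the events of one batch could be collected per table and per operation so that each group can be applied with COPY or a single set\-based statement\. Because rows are grouped by operation, the original interleaving of events is lost, which is consistent with the "does not keep event order" property\.
.sp
.RS 4
.nf
```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical in-memory form of one PgQ event."""

    ev_type: str    # operation and primary key columns, e.g. "I:id"
    ev_data: str    # urlencoded row, e.g. "id=1&name=a"
    ev_extra1: str  # fully qualified table name


def group_batch(events):
    """Collect event data per table and per operation (I, U or D)."""
    groups = defaultdict(lambda: {"I": [], "U": [], "D": []})
    for ev in events:
        op = ev.ev_type.split(":", 1)[0]
        groups[ev.ev_extra1][op].append(ev.ev_data)
    return dict(groups)


batch = [
    Event("I:id", "id=1&name=a", "public.some_table"),
    Event("U:id", "id=1&name=b", "public.some_table"),
    Event("I:id", "id=2&name=c", "public.some_table"),
]
for table, ops in group_batch(batch).items():
    print(table, ops)
```
.fi
.RE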
.sp
.SH "QUICK-START"
Basic bulk_loader setup and usage can be summarized by the following steps:
.sp
.sp
.RS 4
\h'-04' 1.\h'+02'PgQ and logutriga must be installed in the source databases\. See the pgqadm man page for details\. The target database must also have the pgq_ext schema installed\.
.RE
.sp
.RS 4
\h'-04' 2.\h'+02'Edit a bulk_loader configuration file, for example bulk_loader_sample\.ini\.
.RE
.sp
.RS 4
\h'-04' 3.\h'+02'Create the source queue:
.sp
.RS 4
.nf
$ pgqadm\.py ticker\.ini create <queue>
.fi
.RE
.RE
.sp
.RS 4
\h'-04' 4.\h'+02'Tune the source queue to produce big batches:
.sp
.RS 4
.nf
$ pgqadm\.py ticker\.ini config <queue> ticker_max_count="10000" ticker_max_lag="10 minutes" ticker_idle_period="10 minutes"
.fi
.RE
.RE
.sp
.RS 4
\h'-04' 5.\h'+02'Create the target database and the tables in it\.
.RE
.sp
.RS 4
\h'-04' 6.\h'+02'Launch bulk_loader in daemon mode:
.sp
.RS 4
.nf
$ bulk_loader\.py \-d bulk_loader_sample\.ini
.fi
.RE
.RE
.sp
.RS 4
\h'-04' 7.\h'+02'Start producing events by creating logutriga triggers on the source tables:
.sp
.RS 4
.nf
CREATE TRIGGER trig_bulk_replica AFTER INSERT OR UPDATE OR DELETE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE pgq\.logutriga('\fI<queue>\fR');
.fi
.RE
.RE
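.sp
With many source tables, the DDL for a logutriga trigger like the one in the last step can be generated rather than typed by hand\. A minimal Python sketch, assuming plain (optionally schema\-qualified) table names and a queue name without quote characters; trigger_ddl is a hypothetical helper, and the table and queue names are placeholders\. Review the generated statements before running them\.
.sp
.RS 4
.nf
```python
def trigger_ddl(table, queue, trigger="trig_bulk_replica"):
    """Build CREATE TRIGGER DDL that sends TABLE's row changes to QUEUE.

    Assumes TABLE is a plain identifier and QUEUE has no quote characters.
    """
    # Fires on DELETE as well, so that deletes also reach the queue.
    return (
        f"CREATE TRIGGER {trigger} "
        f"AFTER INSERT OR UPDATE OR DELETE ON {table} "
        f"FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('{queue}');"
    )


# Placeholder table and queue names.
for tbl in ["public.orders", "public.customers"]:
    print(trigger_ddl(tbl, "bulk_queue"))
```
.fi
.RE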
.SH "CONFIG"
.SS "Common configuration parameters"
.PP
job_name
.RS 4
Name for the particular job the script does\. The script logs under this name to logdb/logserver\. The name is also used as the default PgQ consumer name\. It should be unique\.
.RE
.PP
pidfile
.RS 4
Location of the pid file\. If not given, the script is not allowed to daemonize\.
.RE
.PP
logfile
.RS 4
Location for log file\.
.RE
.PP
loop_delay
.RS 4
For a continuously running process, how long to sleep after each work loop, in seconds\. Default: 1\.
.RE
.PP
connection_lifetime
.RS 4
Close and reconnect database connections that are older than this many seconds\.
.RE
.PP
use_skylog
.RS 4
If set, logging is configured from a skylog\.ini file\.
.RE
.SS "Common PgQ consumer parameters"
.PP
pgq_queue_name
.RS 4
Queue name to attach to\. No default\.
.RE
.PP
pgq_consumer_id
.RS 4
Consumer ID to use when registering\. Default: %(job_name)s
.RE
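.sp
Skytools configuration files are ini files whose values may refer to other keys, as in the %(job_name)s default above\. The following Python sketch uses the standard configparser module to show how such a file resolves; the [bulk_loader] section name and all values are placeholder assumptions, and bulk_loader itself may apply defaults differently from this code\.
.sp
.RS 4
.nf
```python
import configparser

# Placeholder sample: the [bulk_loader] section name, connect strings,
# queue name and paths are assumptions, not values from a real setup.
SAMPLE_INI = """
[bulk_loader]
job_name = bulk_loader_sample
src_db = dbname=sourcedb
dst_db = dbname=targetdb
pgq_queue_name = bulk_queue
logfile = ~/log/%(job_name)s.log
pidfile = ~/pid/%(job_name)s.pid
"""

cf = configparser.ConfigParser()  # default BasicInterpolation expands %(name)s
cf.read_string(SAMPLE_INI)
sect = cf["bulk_loader"]

# pgq_consumer_id is not set, so fall back to job_name as documented above;
# loop_delay is not set either, so use its documented default of 1.
consumer_id = sect.get("pgq_consumer_id", fallback=sect["job_name"])
loop_delay = sect.getint("loop_delay", fallback=1)
print(consumer_id, sect["logfile"], loop_delay)
```
.fi
.RE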
.SS "Config options specific to bulk_loader"
.PP
src_db
.RS 4
Connect string for the source database where the queue resides\.
.RE
.PP
dst_db
.RS 4
Connect string for the target database where the data is loaded\. The target tables must already exist there\.
.RE
.PP
remap_tables
.RS 4
Optional parameter for table redirection\. Contains a comma\-separated list of <oldname>:<newname> pairs, for example:
oldtable1:newtable1, oldtable2:newtable2\.
.RE
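.sp
A minimal Python sketch of parsing such a value and applying it to an event's table name (ev_extra1)\. parse_remap and target_table are hypothetical helpers, and the sketch assumes exact, case\-sensitive matching of table names, which may differ from bulk_loader's own lookup\.
.sp
.RS 4
.nf
```python
def parse_remap(value):
    """Parse "old1:new1, old2:new2" into {"old1": "new1", "old2": "new2"}."""
    remap = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        old, new = pair.split(":", 1)
        remap[old.strip()] = new.strip()
    return remap


def target_table(ev_extra1, remap):
    """Return the table an event should be loaded into."""
    return remap.get(ev_extra1, ev_extra1)


remap = parse_remap("oldtable1:newtable1, oldtable2:newtable2")
print(remap)
print(target_table("oldtable1", remap), target_table("othertable", remap))
```
.fi
.RE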
.PP
load_method
.RS 4
Optional parameter for load method selection\. Available options:
.TS
tab(:);
lt lt
lt lt
lt lt.
T{
0
T}:T{
UPDATE as UPDATE from the temp table\. This is the default\.
T}
T{
1
T}:T{
UPDATE as DELETE+COPY from the temp table\.
T}
T{
2
T}:T{
Merge INSERTs with UPDATEs, then do DELETE+COPY from the temp table\.
T}
.TE
.sp
.RE
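.sp
To make the three methods more concrete, the following Python sketch prints the kind of set\-based SQL each one implies for a table whose changed rows have already been copied into a temp table\. It is an assumption\-based illustration, not the SQL bulk_loader generates: the table, temp table and column names are placeholders, and the second step of the DELETE+COPY methods is written as INSERT \.\.\. SELECT for brevity\.
.sp
.RS 4
.nf
```python
def load_statements(method, table, tmp, pkey, cols):
    """Return illustrative SQL that applies rows staged in TMP to TABLE.

    Assumes TMP was already filled with COPY and has the same columns,
    in the same order, as TABLE; PKEY and COLS are plain column names.
    """
    join = " AND ".join(f"{table}.{k} = {tmp}.{k}" for k in pkey)
    if method == 0:
        # UPDATE in place from the temp table.
        sets = ", ".join(f"{c} = {tmp}.{c}" for c in cols if c not in pkey)
        return [f"UPDATE {table} SET {sets} FROM {tmp} WHERE {join}"]
    if method in (1, 2):
        # Delete the old row versions, then load the new ones.  For
        # method 2 the temp table would also hold the batch's INSERTs.
        return [
            f"DELETE FROM {table} USING {tmp} WHERE {join}",
            f"INSERT INTO {table} SELECT * FROM {tmp}",
        ]
    raise ValueError(f"unknown load_method: {method}")


for sql in load_statements(1, "public.tbl", "tmp_tbl", ["id"], ["id", "name"]):
    print(sql)
```
.fi
.RE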
.SH "LOGUTRIGA EVENT FORMAT"
The PgQ trigger function pgq\.logutriga() sends table change events into the queue in the following format:
.PP
ev_type
.RS 4

(op || ":" || pkey_fields), where op is "I", "U" or "D", corresponding to insert, update or delete, and
pkey_fields
is a comma\-separated list of the table's primary key fields\. The operation type is always present, but the pkey_fields list can be empty if the table has no primary key\. Example:
I:col1,col2
.RE
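.sp
A small Python sketch of splitting ev_type into the operation and the primary key list; parse_ev_type is a hypothetical helper\. It accepts an empty key list, as produced for tables without a primary key, and rejects operation codes other than I, U and D\.
.sp
.RS 4
.nf
```python
def parse_ev_type(ev_type):
    """Split a logutriga ev_type such as "I:col1,col2" into (op, pkeys)."""
    op, _, keys = ev_type.partition(":")
    if op not in ("I", "U", "D"):
        raise ValueError(f"unexpected operation: {op!r}")
    pkeys = [k for k in keys.split(",") if k]  # empty when no primary key
    return op, pkeys


print(parse_ev_type("I:col1,col2"))  # ('I', ['col1', 'col2'])
print(parse_ev_type("D:"))           # ('D', [])
```
.fi
.RE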
.PP
ev_data
.RS 4
Urlencoded record of data\. It uses a db\-specific urlencoding in which the presence of
\fI=\fR
is meaningful: a missing
\fI=\fR
means NULL, while a present
\fI=\fR
means a literal value (which may be empty)\. Example:
id=3&name=str&nullvalue&emptyvalue=
.RE
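.sp
Python's urllib\.parse\.parse_qsl either drops a field without = or turns it into an empty string, so it cannot express the NULL rule above\. The following Python sketch decodes ev_data with that rule; decode_row is a hypothetical helper, and it assumes that percent escapes and + (for space) follow the usual form encoding\.
.sp
.RS 4
.nf
```python
from urllib.parse import unquote_plus


def decode_row(ev_data):
    """Decode urlencoded ev_data into a dict of column -> value.

    A field without "=" becomes None (SQL NULL); "name=" becomes "".
    """
    row = {}
    for item in ev_data.split("&"):
        if not item:
            continue
        if "=" in item:
            key, value = item.split("=", 1)
            row[unquote_plus(key)] = unquote_plus(value)
        else:
            row[unquote_plus(item)] = None
    return row


print(decode_row("id=3&name=str&nullvalue&emptyvalue="))
# {'id': '3', 'name': 'str', 'nullvalue': None, 'emptyvalue': ''}
```
.fi
.RE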
.PP
ev_extra1
.RS 4
Fully qualified table name\.
.RE
.SH "COMMAND LINE SWITCHES"
The following switches are common to all skytools\.DBScript\-based Python programs\.
.PP
\-h, \-\-help
.RS 4
show help message and exit
.RE
.PP
\-q, \-\-quiet
.RS 4
make program silent
.RE
.PP
\-v, \-\-verbose
.RS 4
make program more verbose
.RE
.PP
\-d, \-\-daemon
.RS 4
run the program in the background
.RE
.sp
The following switches are used to control an already running process\. The pidfile is read from the config, then a signal is sent to the process ID specified there\.
.PP
\-r, \-\-reload
.RS 4
reload config (send SIGHUP)
.RE
.PP
\-s, \-\-stop
.RS 4
stop program safely (send SIGINT)
.RE
.PP
\-k, \-\-kill
.RS 4
kill program immediately (send SIGTERM)
.RE
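.sp
The control switches amount to reading a PID and sending a signal\. A minimal Python sketch of that behaviour, based only on the description above; CONTROL_SIGNALS, pid_from_file and send_control_signal are hypothetical names, not the DBScript API, and a real tool should also handle a missing or stale pidfile\.
.sp
.RS 4
.nf
```python
import os
import signal

# Switch-to-signal mapping as listed in this section (POSIX only).
CONTROL_SIGNALS = {
    "-r": signal.SIGHUP,   # --reload: reload config
    "-s": signal.SIGINT,   # --stop: stop program safely
    "-k": signal.SIGTERM,  # --kill: kill program immediately
}


def pid_from_file(pidfile):
    """Read the process ID written to PIDFILE by the running daemon."""
    with open(os.path.expanduser(pidfile)) as f:
        return int(f.read().strip())


def send_control_signal(pid, switch, kill=os.kill):
    """Send the signal for SWITCH to PID; KILL is injectable for testing."""
    sig = CONTROL_SIGNALS[switch]
    kill(pid, sig)
    return sig


# Usage (hypothetical pidfile path):
# send_control_signal(pid_from_file("~/pid/bulk_loader_sample.pid"), "-r")
```
.fi
.RE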